Building and distributing a single combined Metanorma artifact using Data URIs
Introduction
Metanorma supports two types of output XML formats: a single-combined Metanorma XML output, where only a single XML file is generated as a compilation artifact, and a file tree composed of an XML file with additional file links, with the links pointing to the original included files. The single-combined format is useful for distribution as it encapsulates all data within one file, while the file tree format maintains references to external files, which can be beneficial for managing and updating individual components.
This post discusses the benefits of using a single combined Metanorma XML output and the use of Data URIs to represent media files within the document.
Unified Metanorma XML output
Metanorma supports generation of a single, combined XML output, which differs
from having a tree of files where files link with each other. This is achieved
by using the :data-uri-image: true
option which is enabled by default.
Unlike Microsoft Word, which stores all media files inside an archive, Metanorma does not currently specify a compressed archive format.
Instead, it uses Data URIs to combine all the data files into a single Metanorma XML file.
This approach is beneficial for distribution since there are no moving parts (a single file) that can result in broken links.
Encoding as Data URI (default)
When :data-uri-image: true
is set, Metanorma encodes images, audio files, and
video files as inline Data URIs.
As a result, the Metanorma Semantic XML output embeds all media files within the single XML file as Data URIs.
The advantage to this internal representation of files for distributing Metanorma documents:
-
When generating an HTML document, the generated HTML can be sent anywhere as a single file, without needing to take care of the separate media files or file attachments it invokes.
-
When distributing the authoritative Semantic XML file, you do not need to worry about the media files being lost or misplaced, as they are all bundled within the XML file.
For Word documents and for PDFs, the media files are converted into internally bundled representations.
For file attachments, Metanorma uses the same approach, but as an XML element rather than a URI. This is similar to the representation of file attachments in HTML, which Alex Dyuzhev recently wrote about in PDF Attachments.
Note
|
Attachments are just as valid for HTML as for PDF output. |
Limitations of using Data URIs
There are limitations to using Data URIs though:
-
If media files are too large, presentation software may have trouble processing those URIs.
-
Browsers can routinely handle URIs up to 1 MB without issues, but larger URIs, such as those representing video files of 100 MB or 1 GB, can cause problems.
-
Performance and stability issues may arise when dealing with excessively large Data URIs.
Disabling Data URI encoding
General
There are valid reasons to disable Data URI encoding, such as when media files are too large.
Given that Metanorma supports multiple types of output, including PDF, HTML and Word, it is essential to consider the implications of disabling Data URI encoding.
When Data URI encoding is disabled, media files are referenced as links to external files in the Metanorma XML files and the HTML output, rather than bundling them inside the file.
In this case, the Word and PDF outputs will need to convert the media files into internally bundled representations.
Disabling Data URI encoding for media files
Encoding media files as Data URIs can be disabled by setting the document
attribute :data-uri-image: false
.
This means that all media files in the document are referenced, in the Metanorma XML files and the HTML output, as links to those external files, rather than bundling them inside the file.
The implications of this approach are:
-
XML: When distributing the authoritative Semantic XML file, you will need to take care to include those media files to ensure the external file links are maintained.
-
HTML: When distributing the generated HTML file, you will need to take care to include those media files to ensure the external file links are maintained.
-
Word and PDF: Unaffected by this setting, as the media files are bundled internally in these representations.
Disabling Data URI encoding for attachments
Disabling Data URI encoding for file attachments can be achieved by setting the
document attribute :data-uri-attachments: false
.
In this case, any file attachments will be referenced as links, rather than bundling them inside the file, and you will need to handle them the same way you handle attachments.
The catch is that, unlike media files, HTML cannot make sense of Data URI encoding for an arbitrary attachment, so you will have to distribute the HTML file with its attachments as separate files anyway.
The implications of this approach are:
-
XML: When distributing the authoritative Semantic XML file, you will need to take care to include those file attachments to ensure the external file links are maintained.
-
HTML: When distributing the generated HTML file, you will need to take care to include those file attachments to ensure the external file links are maintained. All attachments bundled with the file are exported to a folder called
_{document-name}_attachments
. -
Word and PDF: Unaffected by this setting, as the file attachments are bundled internally in these representations.
Extending the Data URI size limit
To prevent users from inadvertently generating Data URIs too big for a browser to handle, Metanorma sets the maximum allowed Data URI size by default to 14 MB (corresponding to a 10 MB media file).
If the Data URI needed to represent a media file is bigger than that, Metanorma aborts execution with a warning that you need to change file configuration.
You can deal with this warning in one of three ways:
-
Set
:data-uri-attachments: false
-
Set
data-uri-maxsize
to a byte size big enough to capture your file. Remember that Data URI encodings are one third larger than the binary files they encode. So if you have a 1 GB media file, you will need to setdata-uri-maxsize: 1400000000
, to prevent aborting. -
Set
data-uri-maxsize: 0
, if you want to throw caution to the winds, and have no maximum Data URI size for your document.
Conclusion
Using Data URIs in Metanorma provides a streamlined way to distribute documents as a single file, avoiding issues with broken links. However, it is essential to be aware of the limitations, such as performance issues with large files, and configure the settings appropriately to handle larger files effectively.
By understanding and utilizing the options to disable Data URI encoding or extend the Data URI size limit, users can ensure their documents are both efficient and reliable for distribution.