Metanorma: Aequitate Verum

Building and distributing a single combined Metanorma artifact using Data URIs

Nick Nicholas on 05 Oct 2024

Introduction

Metanorma supports two XML output formats: a single combined Metanorma XML output, where one XML file is generated as the compilation artifact, and a file tree, composed of an XML file whose links point to the original included files. The single combined format is useful for distribution, as it encapsulates all data within one file. The file tree format maintains references to external files, which can be beneficial for managing and updating individual components.

This post discusses the benefits of using a single combined Metanorma XML output and the use of Data URIs to represent media files within the document.

Unified Metanorma XML output

Metanorma supports generation of a single, combined XML output, as opposed to a tree of files that link to each other. This is achieved with the :data-uri-image: true option, which is enabled by default.

Unlike Microsoft Word, which stores all media files inside an archive, Metanorma does not currently specify a compressed archive format.

Instead, it uses Data URIs to combine all the data files into a single Metanorma XML file.

This approach is beneficial for distribution: with a single file there are no moving parts, and so no links that can break.

Encoding as Data URI (default)

When :data-uri-image: true is set, Metanorma encodes images, audio files, and video files as inline Data URIs.

As a result, the Metanorma Semantic XML output embeds all media files within the single XML file as Data URIs.
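To make the encoding concrete, here is a minimal Python sketch of how a binary file becomes a base64 Data URI and back again. It illustrates the general Data URI scheme only, not Metanorma's own implementation; the function names are hypothetical and chosen for this example.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Return a base64 Data URI for the given bytes."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def from_data_uri(uri: str) -> tuple[str, bytes]:
    """Split a base64 Data URI back into (mime_type, bytes)."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 Data URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


def file_to_data_uri(path: str) -> str:
    """Encode a file on disk, guessing its MIME type from the extension."""
    mime_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    return to_data_uri(Path(path).read_bytes(), mime_type)
```

For instance, to_data_uri(b"abc", "image/png") yields data:image/png;base64,YWJj, and from_data_uri recovers the original bytes from that string.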

This internal representation of files has two advantages for distributing Metanorma documents:

  • When generating an HTML document, the generated HTML can be sent anywhere as a single file, without needing to take care of the separate media files or file attachments it invokes.

  • When distributing the authoritative Semantic XML file, you do not need to worry about the media files being lost or misplaced, as they are all bundled within the XML file.

For Word documents and for PDFs, the media files are converted into internally bundled representations.

For file attachments, Metanorma uses the same approach, but as an XML element rather than a URI. This is similar to the representation of file attachments in HTML, which Alex Dyuzhev recently wrote about in PDF Attachments.

Note
Attachments are just as valid for HTML as for PDF output.

Limitations of using Data URIs

There are limitations to using Data URIs though:

  • If media files are too large, presentation software may have trouble processing those URIs.

  • Browsers can routinely handle URIs up to 1 MB without issues, but larger URIs, such as those representing video files of 100 MB or 1 GB, can cause problems.

  • Performance and stability issues may arise when dealing with excessively large Data URIs.
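If you want to know whether a generated file contains Data URIs large enough to cause trouble, you can scan it yourself. The following Python sketch is one way to do that. It is not part of Metanorma: the function name is hypothetical, the regular expression only recognises base64 Data URIs, and the default threshold of 1,000,000 characters is an assumption based on the 1 MB figure above.

```python
import re

# Matches base64 Data URIs such as data:image/png;base64,iVBORw0...
DATA_URI = re.compile(r"data:([\w.+-]+/[\w.+-]+);base64,([A-Za-z0-9+/=]+)")


def large_data_uris(text: str, limit: int = 1_000_000) -> list[tuple[str, int]]:
    """Return (mime_type, uri_length) for each Data URI longer than `limit` characters."""
    found = []
    for match in DATA_URI.finditer(text):
        length = len(match.group(0))
        if length > limit:
            found.append((match.group(1), length))
    return found
```

Run against the text of an HTML or XML output, this lists the MIME type and length of each oversized Data URI, which tells you which media files are candidates for external linking.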

Disabling Data URI encoding

General

There are valid reasons to disable Data URI encoding, such as when media files are too large.

Given that Metanorma supports multiple types of output, including PDF, HTML and Word, it is essential to consider the implications of disabling Data URI encoding.

When Data URI encoding is disabled, media files are referenced as links to external files in the Metanorma XML files and the HTML output, rather than bundling them inside the file.

In this case, the Word and PDF outputs still convert the media files into internally bundled representations.

Disabling Data URI encoding for media files

Encoding media files as Data URIs can be disabled by setting the document attribute :data-uri-image: false.

This means that all media files in the document are referenced, in the Metanorma XML files and the HTML output, as links to those external files, rather than bundling them inside the file.

The implications of this approach are:

  • XML: When distributing the authoritative Semantic XML file, you will need to take care to include those media files to ensure the external file links are maintained.

  • HTML: When distributing the generated HTML file, you will need to take care to include those media files to ensure the external file links are maintained.

  • Word and PDF: Unaffected by this setting, as the media files are bundled internally in these representations.
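When you distribute HTML generated with externally linked media, it helps to have a list of the files that must travel with it. The Python sketch below collects the local files referenced by media elements in an HTML document. It is an illustration under assumptions, not a Metanorma tool: the class and function names are hypothetical, and it only looks at the src attribute of img, audio, video and source elements.

```python
from html.parser import HTMLParser


class ExternalMediaFinder(HTMLParser):
    """Collect local file references from media elements in an HTML document."""

    MEDIA_TAGS = {"img", "audio", "video", "source"}

    def __init__(self) -> None:
        super().__init__()
        self.files: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.MEDIA_TAGS:
            return
        src = dict(attrs).get("src")
        # Skip embedded Data URIs and remote URLs; keep relative or local paths.
        if src and not src.startswith(("data:", "http://", "https://")):
            self.files.append(src)


def external_media(html: str) -> list[str]:
    """Return the local media paths an HTML document depends on."""
    finder = ExternalMediaFinder()
    finder.feed(html)
    finder.close()
    return finder.files
```

Every path returned by external_media needs to be shipped alongside the HTML file, at the same relative location, for the links to keep working.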

Disabling Data URI encoding for attachments

Disabling Data URI encoding for file attachments can be achieved by setting the document attribute :data-uri-attachments: false.

In this case, any file attachments will be referenced as links, rather than bundling them inside the file, and you will need to handle them the same way you handle externally linked media files.

The catch is that, unlike media files, HTML cannot make sense of Data URI encoding for an arbitrary attachment, so you will have to distribute the HTML file with its attachments as separate files anyway.

The implications of this approach are:

  • XML: When distributing the authoritative Semantic XML file, you will need to take care to include those file attachments to ensure the external file links are maintained.

  • HTML: When distributing the generated HTML file, you will need to take care to include those file attachments to ensure the external file links are maintained. All attachments bundled with the file are exported to a folder called _{document-name}_attachments.

  • Word and PDF: Unaffected by this setting, as the file attachments are bundled internally in these representations.

Extending the Data URI size limit

To prevent users from inadvertently generating Data URIs too big for a browser to handle, Metanorma sets the maximum allowed Data URI size by default to 14 MB (corresponding to a 10 MB media file).

If the Data URI needed to represent a media file is bigger than that, Metanorma aborts execution with a warning that you need to change file configuration.

You can deal with this warning in one of three ways:

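  • Set :data-uri-image: false for an oversized media file (:data-uri-attachments: false is the corresponding setting for file attachments), so that the file is linked rather than embedded.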

  • Set :data-uri-maxsize: to a byte size big enough to capture your file. Remember that Data URI encodings are one third larger than the binary files they encode. So if you have a 1 GB (10^9 bytes) media file, you will need to set :data-uri-maxsize: 1400000000 to prevent aborting.

  • Set :data-uri-maxsize: 0, if you want to throw caution to the winds and have no maximum Data URI size for your document.
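The one-third overhead can be worked out before you pick a value. The Python sketch below estimates the length of the base64 Data URI for a file of a given size and checks it against a limit. The function names are hypothetical, and the sketch assumes that the limit is compared against the full URI length including the data: prefix; how Metanorma measures this exactly is not stated here, so leave yourself some margin.

```python
def data_uri_length(file_size: int, mime_type: str = "application/octet-stream") -> int:
    """Length in characters of the base64 Data URI for a file of `file_size` bytes."""
    prefix = len(f"data:{mime_type};base64,")
    # Base64 emits 4 characters for every 3 input bytes, padding the final group.
    return prefix + 4 * ((file_size + 2) // 3)


def fits(file_size: int, maxsize: int, mime_type: str = "application/octet-stream") -> bool:
    """True if the Data URI fits within `maxsize`; 0 is treated as no limit."""
    return maxsize == 0 or data_uri_length(file_size, mime_type) <= maxsize
```

Under these assumptions a file of 10^9 bytes fits within 1400000000, but a file of 2^30 bytes (1 GiB) does not, so check which unit your file size is reported in before settling on a value.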

Conclusion

Using Data URIs in Metanorma provides a streamlined way to distribute documents as a single file, avoiding issues with broken links. However, it is essential to be aware of the limitations, such as performance issues with large files, and configure the settings appropriately to handle larger files effectively.

By understanding and utilizing the options to disable Data URI encoding or extend the Data URI size limit, users can ensure their documents are both efficient and reliable for distribution.