Accessibility tagging in Metanorma PDFs

Need for accessibility tagging

Assistive technologies (AT), and screen readers in particular, are important tools that enable visually impaired people to read documents, including standardization documents prepared using Metanorma.

Many organizations that use the Metanorma suite are legally required to provide “accessible” output, that is, output that carries the additional information needed to make its content usable by AT.

Metanorma is committed to supporting people who use assistive technologies. In this post we introduce the PDF accessibility features that are built into every PDF generated using Metanorma.

Note	Common legal requirements include the US Federal Government’s Section 508 and the European Accessibility Act.

Introducing the PDF tag tree

Those who have read the previous post about PDF math accessibility will know that the PDF format provides two kinds of information hierarchies, namely:

the content tree is the representation of the layout of content, providing a hierarchy of data elements that reflect the selectable text of a PDF file;
the tag tree is the representation of the logical structure of the document and its content, providing a hierarchy of data elements intended for accessibility applications.

Accessibility features in PDF commonly rely on information embedded in the tag tree.

The PDF tag tree is implemented as a hierarchy of tags with metadata that each relate to a visual element on the page, in effect "tagging" a content element with additional information. Each tag defines the structural role of the content element, for example whether it is a section title or a list label.

In fact, a correctly populated and structured tag tree is a key requirement for screen readers and other assistive technologies to work properly with a PDF document.

Tags are immensely useful for accessibility. They:

allow the identification of document elements and their roles, such as headers, paragraphs, lists, and external elements (outside of the PDF file) on a PDF page, in effect making the content accessible;
provide a meaningful reading order for screen readers, text-to-speech tools and other assistive technology tools;
facilitate document resizing/reflowing for viewing with non-default font-size or smaller screens.
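
To make the idea of a reading order concrete, here is a minimal Python sketch that models a tag tree as nested nodes and walks it depth-first. The `Tag` class, the `reading_order` function and the sample content are all hypothetical and exist only for illustration. They do not reflect how mn2pdf or any PDF library represents tags internally.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Tag:
    """Hypothetical stand-in for one node of a PDF tag tree."""
    role: str                      # structural role, e.g. "H1", "P", "L"
    text: str = ""                 # content attached to this tag, if any
    children: List["Tag"] = field(default_factory=list)


def reading_order(tag: Tag) -> Iterator[Tuple[str, str]]:
    """Yield (role, text) pairs depth-first, visiting a parent before its children."""
    if tag.text:
        yield (tag.role, tag.text)
    for child in tag.children:
        yield from reading_order(child)


# Made-up sample content, not taken from a real document.
document = Tag("Part", children=[
    Tag("H1", "Scope"),
    Tag("P", "An introductory paragraph."),
    Tag("L", children=[
        Tag("LI", "first item"),
        Tag("LI", "second item"),
    ]),
])

for role, text in reading_order(document):
    print(f"{role}: {text}")
```

Because the traversal follows the logical hierarchy rather than the position of text on the page, a tool reading the tree in this way encounters the heading first, then the paragraph, then each list item in turn.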

Note	PDF tags are specified in the PDF 1.7 standard, ISO 32000-1:2008.

Basic structural tagging

As described in the previous post, Metanorma generates PDFs through mn2pdf, a Java PDF processor based on the open-source Apache FOP (Formatting Objects Processor), a print formatter driven by XSL formatting objects (XSL-FO) technology.

While Apache FOP provides a default mapping for Formatting Objects (FO) to PDF tags, the mapping is basic and does not fully meet the needs of modern assistive technologies.

In the following sections we illustrate how Metanorma performs tagging.

Table 1. Metanorma formatting object mapping to PDF tags (identical to Apache FOP)
Meaning	Formatting object element	PDF tag value
Major division, clause/section	`fo:page-sequence`	`Part`
Block	`fo:block-container`	`Div`
Paragraph	`fo:block`	`P`

Figure 1. The ISO Rice document with an accurately populated tag tree

Detailed structural tagging

Lists and list items

The PDF standard also provides list and list item tags to identify those roles within rendered content. In Metanorma, we extend the mapping to cover them.

Table 2. List-related mapping to PDF tags
Meaning	Formatting object element	PDF tag value
List	`fo:list-block`	`L`
List item	`fo:list-item`	`LI`

The following example demonstrates the tagged list and list items in a generated PDF document.

Figure 2. Tags with L and LI for list and list items in the ISO Rice document
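
The mappings in Tables 1 and 2 can be pictured as a simple lookup table. The Python sketch below is only an illustration of that idea: the dictionary contents come from the two tables, while the names `FO_TO_PDF_TAG` and `pdf_tag_for`, and the choice to return `None` for unmapped elements, are assumptions made for this example and not part of mn2pdf or Apache FOP.

```python
from typing import Optional

# Formatting object element -> PDF tag value, as listed in Tables 1 and 2.
FO_TO_PDF_TAG = {
    "fo:page-sequence": "Part",
    "fo:block-container": "Div",
    "fo:block": "P",
    "fo:list-block": "L",
    "fo:list-item": "LI",
}


def pdf_tag_for(fo_element: str) -> Optional[str]:
    """Return the PDF tag for a formatting object, or None if the tables list none.

    Returning None for unknown elements is an assumption of this sketch.
    """
    return FO_TO_PDF_TAG.get(fo_element)


for element in ("fo:block", "fo:list-block", "fo:list-item"):
    print(element, "->", pdf_tag_for(element))
```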

We’ve customized the mapping further to make the tagging more accurate, as the following sections describe.

Headings, sub-headings and more

The PDF standard provides a series of heading tags that distinguish the levels of headings, and the Metanorma PDF generation engine supports them automatically.

These tags are not mapped from Formatting Objects but are set directly by the generation engine in its output.

Table 3. Heading mapping to PDF tags
Meaning	PDF tag value
Header 1, clause heading	`H1`
Header 2, sub-clause heading	`H2`
Header 3, second-level sub-clause	`H3`
Header 4, third-level sub-clause	`H4`
Header 5, fourth-level sub-clause	`H5`
Header 6, fifth-level sub-clause	`H6`

Figure 3. Tags H1 to H6 for clause and sub-clause headings
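
The relationship in Table 3 between clause depth and heading tag is regular enough to express in a few lines. The Python sketch below is purely illustrative: `heading_tag` is a hypothetical helper, and raising an error for depths outside 1 to 6 is an assumption of this example, since the table does not say how deeper levels are handled.

```python
def heading_tag(depth: int) -> str:
    """Return the PDF heading tag for a clause depth, where 1 is a clause heading.

    Table 3 covers depths 1 to 6 only; rejecting other values is an
    assumption of this sketch, not documented Metanorma behaviour.
    """
    if not 1 <= depth <= 6:
        raise ValueError(f"no heading tag listed for depth {depth}")
    return f"H{depth}"


for depth in (1, 2, 6):
    print(depth, "->", heading_tag(depth))
```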

Table 4. Table of contents mapping to PDF tags
Meaning	PDF tag value
Table of contents section	`TOC`
Table of contents individual entry	`TOCI`

Block quotes

The BlockQuote tag is provided by the PDF standard to tag quotations in block form.

Table 5. Block quote mapping to PDF tags
Meaning	PDF tag value
Block quote	`BlockQuote`

Figure 5. Tag BlockQuote for block quotations

Index

While not every document contains an index, the PDF standard helpfully provides a special tag Index to indicate a document’s index content.

Table 6. Index section mapping to PDF tags
Meaning	PDF tag value
Index section	`Index`
Index individual entry	`P`

Figure 6. Tag Index for the document’s Index

Source code

The PDF standard provides the Code tag to indicate that the tagged content is software source code.

Table 7. Source code mapping to PDF tags
Meaning	PDF tag value
Source code inline or block	`Code`

Figure 7. Tag Code to indicate source code

Summary

Metanorma supports PDF accessibility features out of the box. In particular, every generated PDF carries an accurate and fully structured tag tree that facilitates the use of assistive technologies.

If you have any further accessibility needs with Metanorma, please do not hesitate to contact us!