Accessibility tagging in Metanorma PDFs
Need for accessibility tagging
Assistive technologies (AT), and screen readers in particular, are important tools that help visually impaired people read documents, including standardization documents prepared using Metanorma.
Many organizations that use the Metanorma suite are legally required to provide “accessible” output, in other words, output that carries the additional information needed to make its content usable by AT tools.
Metanorma is committed to supporting people who use assistive technologies. In this post we introduce the PDF accessibility features that are built into every PDF generated using Metanorma.
Note: Common legal requirements include the US Federal Government’s Section 508 and the European Accessibility Act.
Introducing the PDF tag tree
Those who have read the previous post about PDF math accessibility will know that the PDF format provides two kinds of information hierarchy:
- the content tree represents the layout of content, providing a hierarchy of data elements that reflect the selectable text of a PDF file;
- the tag tree represents the logical structure of the document and its content, providing a hierarchy of data elements intended for accessibility applications.
Accessibility features in PDF commonly rely on information embedded in the tag tree.
The PDF tag tree is implemented as a hierarchy of tags, each carrying metadata and relating to a visual element on the page, in effect "tagging" a content element with additional information. Each tag defines the structural role of its content element, such as a section title or a list label.
A correctly populated and structured tag tree is a key requirement for screen readers and other assistive technologies to work properly with a PDF document.
Tags are immensely useful for accessibility. They:
- allow the identification of document elements and their roles, such as headers, paragraphs, lists, and external elements (outside of the PDF file) on a PDF page, in effect making the content accessible;
- provide a meaningful reading order for screen readers, text-to-speech tools and other assistive technology tools;
- facilitate document resizing and reflowing for viewing with a non-default font size or on smaller screens.
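To make the idea concrete, here is a minimal Python sketch of a tag tree and of how a reading order can be derived from it. The `Tag` class, the `reading_order` function and the sample text are illustrative assumptions, not part of Metanorma or of any PDF library; a real PDF stores this information in its structure tree.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Tag:
    """One node of a simplified tag tree (hypothetical model, not a PDF API)."""
    role: str                      # structural role, e.g. "H1", "P", "L", "LI"
    text: str = ""                 # content attached to this tag, if any
    children: List["Tag"] = field(default_factory=list)


def reading_order(tag: Tag) -> Iterator[str]:
    """Yield "role: text" strings by walking the tree depth-first."""
    if tag.text:
        yield f"{tag.role}: {tag.text}"
    for child in tag.children:
        yield from reading_order(child)


# Made-up sample content, used only to exercise the traversal.
document = Tag("Document", children=[
    Tag("H1", "Scope"),
    Tag("P", "This document specifies requirements for rice."),
    Tag("L", children=[
        Tag("LI", "husked rice"),
        Tag("LI", "milled rice"),
    ]),
])

for line in reading_order(document):
    print(line)
```

Because the traversal follows the logical structure rather than the position of text on the page, the heading is announced before its paragraph and the list items stay together, whatever the page layout looks like.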
Note: PDF tags are specified in the PDF 1.7 standard, ISO 32000-1:2008.
Basic structural tagging
As described in the previous post, Metanorma generates PDFs through mn2pdf, a Java PDF processor based on the open-source Apache FOP (Formatting Objects Processor), a print formatter driven by XSL formatting objects (XSL-FO) technology.
While Apache FOP provides a default mapping from Formatting Objects (FO) to PDF tags, that mapping is basic and does not fully meet the needs of modern assistive technologies.
In the following sections we illustrate how Metanorma performs tagging.
Meaning | Formatting object element | PDF tag value |
---|---|---|
Major division, clause/section | | |
Block | | |
Paragraph | | |
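As general background, Apache FOP lets a stylesheet override the default tag of a formatting object through the `role` property. The sketch below builds such `fo:block` elements with Python's standard library. The helper name `fo_block` and the sample text are hypothetical, and this post does not say that mn2pdf relies on this mechanism; treat it as an illustration of the FOP feature only.

```python
import xml.etree.ElementTree as ET
from typing import Optional

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo_block(text: str, role: Optional[str] = None) -> ET.Element:
    """Build an fo:block; when `role` is given, request that PDF tag for it."""
    block = ET.Element(f"{{{FO_NS}}}block")
    if role is not None:
        # The `role` property asks FOP to use this tag instead of its default.
        block.set("role", role)
    block.text = text
    return block


quote = fo_block("Quoted text from another standard.", role="BlockQuote")
paragraph = fo_block("An ordinary paragraph that keeps the default mapping.")

print(ET.tostring(quote, encoding="unicode"))
print(ET.tostring(paragraph, encoding="unicode"))
```

Tags are only written to the PDF when accessibility is enabled in FOP, so a fragment like this has no visible effect on the tag tree otherwise.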
Detailed structural tagging
Lists and list items
The PDF standard also provides list and list item tags to identify those roles within rendered content. In Metanorma we extend the mapping to cover them.
Meaning | Formatting object element | PDF tag value |
---|---|---|
List | `fo:list-block` | `L` |
List item | `fo:list-item` | `LI` |
The following example demonstrates the tagged list and list items in a generated PDF document.
Figure: `L` and `LI` tags for lists and list items in the ISO Rice document
We have customized the mapping to make the tagging more accurate, as the following sections show.
Headings, sub-headings and more
The PDF standard provides a series of heading tags that distinguish headings by level of importance, and the Metanorma PDF generation engine supports them automatically.
These tags are not mapped from Formatting Objects; the generation engine sets them directly in the output.
Meaning | PDF tag value |
---|---|
Header 1, clause heading | `H1` |
Header 2, sub-clause heading | `H2` |
Header 3, second-level sub-clause | `H3` |
Header 4, third-level sub-clause | `H4` |
Header 5, fourth-level sub-clause | `H5` |
Header 6, fifth-level sub-clause | `H6` |
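The relationship between clause depth and heading tag can be sketched in a few lines of Python. The function names are hypothetical, deriving the depth from a dotted clause number is a simplification, and clamping anything deeper than six levels to `H6` is an assumption of this sketch rather than documented Metanorma behaviour.

```python
def depth_of(clause_number: str) -> int:
    """Derive the nesting depth from a dotted clause number such as "4.2.1"."""
    return len(clause_number.split("."))


def heading_tag(depth: int) -> str:
    """Return the heading tag for a clause depth (1 = top-level clause)."""
    if depth < 1:
        raise ValueError("depth must be 1 or greater")
    # Numbered heading tags stop at H6; deeper levels are clamped to H6
    # here, which is an assumption of this sketch.
    return f"H{min(depth, 6)}"


for number in ("4", "4.2", "4.2.1"):
    print(number, heading_tag(depth_of(number)))
# prints:
# 4 H1
# 4.2 H2
# 4.2.1 H3
```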
Figure: `H1` to `H6` tags for clause and sub-clause headings
Table of contents
The PDF standard provides the `TOC` and `TOCI` tags for the "Table of Contents" section and for each individual entry within the table of contents.
Meaning | PDF tag value |
---|---|
Table of contents section | `TOC` |
Table of contents individual entry | `TOCI` |
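Like `L` and `LI`, the `TOC` and `TOCI` tags form a container and entry pair, so an entry tag only makes sense inside its container. The following sketch checks that property on a tag tree written as nested `(role, children)` tuples. That representation and the `misplaced` function are hypothetical stand-ins for a real tag tree, not something Metanorma exposes.

```python
# Entry tags and the container tag each one is expected to sit in.
REQUIRED_PARENT = {"LI": "L", "TOCI": "TOC"}


def misplaced(node, parent_role=None):
    """Return the roles of entry tags whose parent is not the expected container."""
    role, children = node
    problems = []
    expected = REQUIRED_PARENT.get(role)
    if expected is not None and parent_role != expected:
        problems.append(role)
    for child in children:
        problems.extend(misplaced(child, role))
    return problems


good = ("Document", [("TOC", [("TOCI", []), ("TOCI", [])])])
bad = ("Document", [("TOCI", [])])

print(misplaced(good))  # []
print(misplaced(bad))   # ['TOCI']
```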
Figure: `TOC` and `TOCI` tags for the Table of Contents
Block quotes
The PDF standard provides the `BlockQuote` tag for quotations in block form.
Meaning | PDF tag value |
---|---|
Block quote | `BlockQuote` |
Figure: `BlockQuote` tag for block quotations
Index
While not every document contains an index, the PDF standard helpfully provides a special tag, `Index`, to indicate a document’s index content.
Meaning | PDF tag value |
---|---|
Index section | `Index` |
Index individual entry | |
Figure: `Index` tag for the document’s index
Source code
The PDF standard provides the `Code` tag to indicate that the tagged content is software source code.
Meaning | PDF tag value |
---|---|
Source code inline or block | `Code` |
Figure: `Code` tag to indicate source code
Summary
Metanorma supports PDF accessibility features out of the box. In particular, it provides an accurate and fully structured tag tree in every generated PDF to facilitate the use of assistive technologies.
If you have any further accessibility needs with Metanorma, please do not hesitate to contact us!
References
- ISO 32000-1:2008, the PDF 1.7 standard