Convert ISO/NISO STS documents to Metanorma using mnconvert
Introduction
An increasingly number of National Standards Bodies (NSBs), as members of ISO and IEC, now use Metanorma to re-publish their adoptions of ISO and IEC standards.
Occasionally, these NSBs also re-publish "corrections" of the ISO and IEC standards when the source files happen to be erroneous or corrupt.
Typically, a somewhat manual process was needed to convert ISO STS and NISO STS files to Metanorma.
To address this need, Metanorma now provides mnconvert
, a command line tool
that helps automatically convert ISO STS or NISO STS formats into Metanorma.
With mnconvert
, STS users can easily convert and republish their ISO- and
IEC-sourced documents using Metanorma.
ISO STS and NISO STS formats
The ISO Standards Tag Set (ISO STS) was originally developed by ISO with its NSB members as an internal XML format to facilitate its XML-based publishing workflow.
The ISO STS XML schema defines a set of elements and attributes that allow encoding of metadata and content of an ISO/IEC standard document in XML.
The format was originally developed based on a tailored set of elements from the Journal Article Tag Set (JATS) schema, which was originally developed by the National Institute of Health (NIH) for usage in scientific publishing.
As ISO STS came into use, several NSBs wanted to generalize a format for national standards publishing. They took ISO STS to the National Information Standards Organization (NISO) to standardize it.
The result was NISO Standards Tag Set (NISO STS), a generalized tag set that supports notions of national and regional adoption of standards, used by NSBs.
Starting in 2022, ISO has began its migration from ISO STS to NISO STS. IEC, which held out from the previous XML exercise, has started to migrate its publication workflow to NISO STS starting in 2023.
Migrating from ISO STS and NISO STS formats to Metanorma
ISO and IEC now distribute standards to its members using the NISO STS format.
However, NISO STS is an "encoding format" meant to be read-only.
The usage of NISO STS does not facilitate the:
-
authoring or editing of standards;
-
adoption of standards into national or regional standards formats with modifications.
In addition, not all NSBs wished to adopt the same set of tools used by ISO and IEC to work with NISO STS, due to cost, quality, flexibility and integration issues.
An increasing number of NSBs have adopted Metanorma for their STS processing toolchain, since Metanorma enables ingestion of ISO STS and NISO STS files, authoring and editing of the standards, and the re-publication or adoption into the national / regional formats, or other presentational formats like HTML and Word DOC.
An ISO STS or NISO STS XML file, especially when at length, are generally difficult to manually migrate into the Metanorma format, even with advanced text editors.
For NSBs that work with both ISO and IEC publishing, the incompatibilities between the NISO STS files from both organizations further exacerbates the issue. Despite both using NISO STS, ISO and IEC use slightly different conventions and encodings in NISO STS. This presents an additional barrier for NSBs trying to use a unified XML toolchain to handle documents from both ISO and IEC.
The encoding differences between ISO’s NISO STS and IEC’s NISO STS is documented in the NISO STS 1.0 — IEC/ISO Coding Guidelines document.
Introducing mnconvert
General
The newly released mnconvert
is a
command-line tool that allows conversion of ISO STS and NISO STS XML content
into Metanorma AsciiDoc.
Instead of a simple XML transformation process, mnconvert
produces a
directory of files following best practices in Metanorma document editing,
providing split sections and an immediate toolset for re-publishing or
modification of content in the same ISO and IEC presentation.
mnconvert
is a Java program packed into a JAR file for execution.
Quick guidance on how to use mnconvert
is provided below.
Note
|
For further information on how to use the command, please see
mnconvert documentation.
|
Prerequisites
A system needs to meet the following prerequisites to run mnconvert
.
-
Presence of a Java JRE. Java typically comes with your system, with the official versions at Oracle (install Java from java.com) or from OpenJDK.
NoteIf you think you have the Java JRE already but unsure, execute the command java -version
to confirm. -
Download the latest JAR release of
mnconvert
from themnconvert
releases page;
If you wanted to extend mnconvert
, you will also need to install:
-
a Java Development Kit (JDK), with the most popular one OpenJDK;
-
the
make
andmaven
build tools, as per these instructions.
Using mnconvert
The command structure of mnconvert
is:
mnconvert
java -jar <location to mnconvert JAR file> <path to STS XML file> [options]
The default behavior of mnconvert
is to convert the given XML file to
Metanorma AsciiDoc:
-
The converted files will be placed at the same directory as the XML file.
-
The filename of the main document will be kept the same as in the provided XML file, but with the
.adoc
extension. -
Sections and images (if any) will be placed in separate subfolders, named
sections
andimages
, respectively, in accordance with Metanorma best practices in authoring (see samples).
Note
|
Optional parameters to mnconvert are listed and explained in the section
below.
|
Optional parameters for conversion
The full list of mnconvert
options are described at
mnconvert
documentation.
The following options are commonly used:
--output
or-o
-
to specify an output path different to the XML file.
--output-format
or-of
-
to specify the output format. Values are:
adoc
-
(default) Metanorma AsciiDoc output.
xml
-
Metanorma Presentational XML output.
--imagesdir
or-img
-
to specify an
:imagesdir:
attribute value which determines where the images will be extracted to. Defaults toimages
. --mathml
or-m
-
to specify the MathML version of the XML STS. Values are:
2
-
for MathML version 2
3
-
for MathML version 3
--check-type
or-ct
-
to perform a validation check of the XML file against ISO STS or NISO STS.
Values are:
xsd-niso
-
check against NISO XSD
dtd-niso
-
check against NISO DTD
iso-dtd
-
check against ISO DTD
Converting ISO STS and NISO STS into Metanorma AsciiDoc
General
While mnconvert
is the main processor in converting STS documents into
Metanorma AsciiDoc, there is a quality process surrounding conversion.
These are the major steps in converting an ISO STS or NISO STS document into Metanorma AsciiDoc:
-
Run
mnconvert
-
Validate and perform semantic checks
-
Generate converted output
Running mnconvert
For illustrative purposes, let’s assume that:
-
the JAR and XML files are located in the same folder;
-
we are using JAR release 1.56.0, i.e.
mnconvert-1.56.0.jar
; -
the XML STS file we want to convert is named
standard.xml
.
Taking these assumptions into account, the command becomes:
mnconvert
on standard.xml
java -jar mnconvert-1.56.0.jar standard.xml [options]
Validate and perform semantic checks
General
Once the initial conversion with mnconvert
tool is performed, it is likely
that still some adjustments need to be made to obtain a well-encoded
Metanorma AsciiDoc.
The reasons are given below:
-
ISO STS and NISO STS are "presentational" formats that are geared towards instructions to display or render content, the content itself does not come with detailed semantics. Metanorma, however, is a semantic publishing engine, which requires the user to encode content with proper semantics. This means that during the conversion, when
mnconvert
does not know the meaning of content, the user needs to supplement them post-conversion. -
ISO- and IEC- sourced XML files can contain errors and occasionally miss critical data, for example the encoding of the committees and working groups. Such missing data require manual supplementation.
-
In order to generate the document, the document needs to pass all Metanorma validation rules, so a remedial process is necessary.
Metanorma documentation will be our best ally in this chore.
The complexity of remediation depends on the source XML file itself,
but is typically a quick validation — given that mnconvert
has done
the majority of the work.
The following manual validation checks are useful in ensuring the correctness of the resulting document.
A quality checklist is provided at Annex: Conversion quality checklist for users to keep track of validation steps per conversion.
Check: Verify document attributes
Ensure the correctness of document attributes.
-
If the STS XML source is encoded properly, the document attributes will be correctly encoded.
-
Frequently the sourced XML files will not provide contribution information or publication attributes, such as publication date. Ensure that all attributes used for describing (sub)committees are available. Any missing information should be manually supplemented.
-
Metanorma supports conversion of STS files at any stage (at ISO DIS and IS stages utilize STS). Check whether the document is published as a
draft
or not.
Check: Remove unnecessary non-breaking spaces
XML STS documents tend to include numerous non-breaking spaces to avoid the split of continuity of some expressions by a line break.
Some of these non-breaking spaces are unnecessary in Metanorma encoding. These unnecessary non-breaking spaces should be removed.
For example:
-
white space in a percentage expression (e.g.,
75 %
) should not be broken by a line break; -
non-breaking spaces in reference declarations are not needed.
Check: Validate "Normative references" clause boilerplate
Verify if the initial text of the "Normative references" clause corresponds to the usual predefined text, or if it is any other different.
In the "Normative references" clause, the initial paragraph will be
encoded like this after using mnconvert
:
[NOTE,type=boilerplate]
====
The following referenced documents are indispensable for the application
of this document. For dated references, only the edition cited applies.
For undated references, the latest edition of the referenced document
(including any amendments) applies.
====
Occasionally, ISO and IEC documents will contain a non-standard (non-compliant to ISO/IEC DIR 2) boilerplate.
-
If the converted boilerplate is compliant to ISO/IEC DIR 2, we can remove this NOTE block, as this text is provided automatically by Metanorma.
-
If the text differs, it means that the ISO/IEC DIR 2 boilerplate was overridden. In which case, this markup should be retained.
Check: Validate "Terms and definitions" clause encoding
Verify the encoding of the "Terms and definitions" clause.
-
For a "Terms and definitions" clauses that imports terms from another document, it is advised to use
[source="REFERENCE_ANCHOR1,REFERENCE_ANCHOR_2"]
syntax before the clause definition to generate a predefined text for the "Terms and definitions" clause. -
When a "Terms and definitions" clause contains sub-clauses, care must be taken to ensure the correctness of the clause.
-
Concept mentions should be checked to ensure that terms are encoded in the
{{…}}
syntax.
Enhancement: Encoding of source code blocks
Encoding of any "source code" blocks should be checked.
Metanorma supports [source]
content blocks as containers for source code.
In ISO STS and NISO STS, there is no such provision for source code, they are
just treated as plain text with particular formatting styles.
-
Source code blocks mainly contain the code without any line breaks.
-
Insert necessary line breaks and indentations where needed (or, if there are too many lines, just copy and paste the whole code from the XML source)
-
Check whether all source code blocks are encoded properly with the
[source]
declaration. It is preferable to specify the programming language used in the code block. -
In some cases, source code blocks are applied with formatting. If any source code text is boldfaced, italicized, or underlined, the text should be surrounded by delimiters
{{{…}}}
for enabling markup. You also have the option of thesubs
attribute configured assubs="verbatim,quotes"
, to enable rich text inside code.
Check: Validate references in "Bibliography" and "Normative references"
Check the references in "Bibliography" and Normative references sections.
-
Occasionally, in ISO or IEC sourced STS documents, an identical reference can be listed in both sections, despite violating the rules of the ISO/IEC DIR 2.
If that is the case, remove the reference from the "Bibliography" and make sure that all the cross-references are then made with the anchor given in the "Normative references" section.
-
Ensure that there are no cross-references pointing to the document itself (it happens).
-
Ensure that correct identifiers are given for auto-fetching references.
Check: Verify cross-references
All cross-references should be validated.
Original text often refers to the bibliographic entries differently, which makes
it hard for mnconvert
to convert all of them perfectly. That is mostly
noticeable in references to some specific clause, or even paragraph from some
bibliographic entry. Therefore, make sure to avoid hanging clauses in
cross-references.
Anchors mostly use underscores for dividing separate words/numbers which
specify some reference. In the AsciiDoc files generated by mnconvert
, it is
possible that an anchor will include unnecessary whitespace before or after
an underscore, or two underscores instead of one. It is recommended to
confirm that the same anchor is used for the same reference throughout the
document and remove any needless whitespace to avoid broken cross-references.
Check: Validate document encoding
Perform a quick validation of the document’s encoding in general.
Ensure the document complies with the best practices for encoding a document in Metanorma AsciiDoc:
-
Remove non-ASCII characters;
NoteIt is a good practice to use regular expressions for finding non-ASCII characters. In this type of conversions, it is especially handy since it finds unnoticeable non-ASCII whitespaces. -
Remove
width
parameters in tables; -
Check
width
andheight
parameters given to images, most are not needed; -
Validate text encoding; some ISO and IEC sourced STS XML files originate from OCR, and they will contain OCR errors;
NoteIt is possible that characters are wrongly encoded in the source XML file, e.g. number 1
written asl
and vice versa. It is advised to check whether there are such typos in the converted AsciiDoc and to correct them. -
Optionally, split lines longer than 100 characters into shorter ones for ease of editing.
Enhancement: Re-encode math into stem
blocks
XML STS documents frequently contain math or numbers not in MathML but as textual encoding.
These entries can be migrated into the \$...\$
blocks used in
Metanorma for proper formatting.
Generating the resulting document
After the validation process, it is time to attempt generating the document through Metanorma.
$ metanorma site generate
Voila!
Conclusion
Metanorma now provides mnconvert
that helps users convert existing
ISO STS and NISO STS content into Metanorma for adoption or republication.
In fact, the same process can also be used to "normalize" STS XML files, since Metanorma also produces STS XML output, and some NSBs are finding that useful!
Annex: Conversion quality checklist
Conversion quality checklist:
-
Verify document attributes
-
Remove unnecessary non-breaking spaces
-
Validate "Normative references" clause boilerplate
-
Validate "Terms and definitions" clause encoding
-
Validate references in "Bibliography" and "Normative references"
-
Verify cross-references
-
Validate document encoding