Metanorma: Aequitate Verum

Validation of Metanorma XML output

General

Metanorma documents are compiled into the authoritative Metanorma Semantic XML format, which is validated against an XML schema for document structure and against style rules for content.

Validation message output channels

Validation messages are output to the following channels:

  • console (standard error, STDERR);

  • an error log file (the filename of the current document, suffixed with .err.html).

Validation log structure

All errors that are logged by Metanorma have the following properties:

Error Category (see Validation error classes)

An error category can include multiple error messages

Error ID

a single error message can be reported multiple times in a document, for different locations in the document

Severity

each error message is assigned a level of severity

Location

some error messages are reported against one or more specific locations in the document

The error log file has the following structure:

Error class listing

Error class

Line number in Asciidoc source (where available)

ID of element in Metanorma XML (where available)

Error (identifier)

Message

Context (by default, Metanorma XML)

Severity

For example:

  • Bibliography: Severity 1: 1 error

  • Anchors: Severity 1: 1 error

  • Metanorma XML Syntax: Severity 2: 3 errors

Bibliography

Line: 000012
ID: Anchor1
Error: STANDOC_12
Message: Reference RNP is missing a document identifier (docid)
Context:
    <clause id="Anchor1" inline-header="false" bibitem="true" obligation="informative">
    <dl id="_8923fd28-1ac2-4b9a-94ea-d2b5efd4a467"> <dt>id</dt>
    <dt>contributor</dt>
Severity: 1

Anchors

Line: 00009
ID: Anchor2
Error: STANDOC_38
Message: Cross-reference target iso123 is undefined
Context:
    <xref target="iso123"/>
Severity: 1

Metanorma XML Syntax

Line: XML Line 000005:11
ID: (not available)
Error: STANDOC_7
Message: element "copyright" incomplete; missing required element "owner"
Context: (not available)
Severity: 2
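
The class listing at the top of this example is a count of messages per error class and severity. The following is a minimal Python sketch of that tally, not Metanorma's own code. It assumes each logged message has been reduced to a hypothetical (class, severity) pair; the function name class_listing is also illustrative.

```python
from collections import Counter


def class_listing(messages):
    """Summarise (error class, severity) pairs as listing lines.

    `messages` is a hypothetical list of (class, severity) tuples;
    Metanorma's internal log representation is not shown in this document.
    """
    counts = Counter(messages)  # keeps first-seen order of each pair
    lines = []
    for (error_class, severity), total in counts.items():
        noun = "error" if total == 1 else "errors"
        lines.append(f"{error_class}: Severity {severity}: {total} {noun}")
    return lines


# The messages behind the example listing above.
EXAMPLE = (
    [("Bibliography", 1), ("Anchors", 1)]
    + [("Metanorma XML Syntax", 2)] * 3
)

for line in class_listing(EXAMPLE):
    print(line)
```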

Error structure

Severity

There are four severity values, 0 through 3:

Severity 0
  • Causes execution to abort.

  • Alerts about fatal errors are also output to the console.

  • Must be addressed to get documents to compile.

Severity 1
  • Should be addressed to ensure document correctness.

Severity 2
  • Less critical but should be investigated if the document is visibly wrong.

Severity 3
  • Information-only warnings.

These severity values are applicable to all flavours of Metanorma.

Location

Errors are identified at different stages of Metanorma processing, and because of that, not all of them can be pinpointed to a particular location in the document in the same way.

  • Some errors apply to the document as a whole and cannot be pinned down to one location in the text; for example, an inconsistency in how the document is structured, or in its metadata as expressed through document attributes.

  • Some errors are identified while the Asciidoc source of the document is being processed. Where possible, these are identified against the line number of the Asciidoc document. If the source document has been split into multiple files through include::[], the line number is that of the resolved, concatenated single Asciidoc document; this is output to the file *.asciidoc.log.txt, which you can check to work out what the line number refers to.

  • Some Asciidoc-level errors instead need to be reported against the structure of the Asciidoc document, by identifying the Asciidoc section (clause) containing the error. These are reported as Section: …​.

  • Errors involving parsing the generated Metanorma XML against the Metanorma XML schema are identified by line number of the Metanorma XML document; these are reported as XML Line …​.

  • Most errors are identified against the nearest node anchor in the generated XML (or the user-provided anchor in the source Asciidoc, which is the same); if no user-provided anchor is available, the GUID automatically generated by Metanorma is provided instead. Because these anchors can be tracked in the generated HTML output, those errors are hyperlinked from the error log to the corresponding location in the generated HTML output for the document.
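
When post-processing a log, the kinds of location listed above can be told apart by their textual form. The following Python sketch is illustrative only. It assumes the location is available as a single string, and the function name and the returned labels are hypothetical rather than part of Metanorma.

```python
import re


def location_kind(location):
    """Classify a location string by the forms described above.

    Assumption: an empty value means the error applies to the whole
    document, and an all-digit value is an Asciidoc line number.
    """
    if not location:
        return "document"
    if location.startswith("XML Line "):
        return "xml-line"        # schema validation of the generated XML
    if location.startswith("Section: "):
        return "section"         # Asciidoc section (clause)
    if re.fullmatch(r"\d+", location):
        return "asciidoc-line"   # line in the concatenated Asciidoc source
    return "anchor"              # user-provided anchor or generated GUID
```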

Error ID

All errors logged by Metanorma have an identifier, consisting of the flavour of Metanorma, followed by an underscore and a number. These Error IDs are given in the error log.

For example: ISO_7, IEEE_15.

Errors that are generic to Metanorma are prefixed by the name of the gem generating them.

For example: STANDOC_7, RELATON_4, METANORMA_2, ISODOC_1.
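
The identifier format can be checked mechanically. The following Python sketch splits an Error ID into its prefix and number. It assumes prefixes consist of uppercase letters only, as in every example in this document; the function name is hypothetical.

```python
import re

# Prefix (flavour or gem name), underscore, number.
# Assumption: prefixes are uppercase letters only.
ERROR_ID = re.compile(r"^([A-Z]+)_(\d+)$")


def split_error_id(error_id):
    """Return (prefix, number) for an Error ID such as 'STANDOC_7'."""
    match = ERROR_ID.match(error_id)
    if match is None:
        raise ValueError(f"not a recognised Error ID: {error_id!r}")
    return match.group(1), int(match.group(2))
```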

The available errors for a Metanorma flavour can also be reviewed by the following command:

$ metanorma -L -t {flavor}

Where,

{flavor}

is the Metanorma flavour being used

The command metanorma -L -t iso will list the available error messages by error class for the ISO flavour of Metanorma.

Note
Some error messages are parameterized as templates, with blanks filled in specific to a location; in the raw error messages displayed, these blanks are indicated by %s.
Excerpt of ISO style messages
ISO_44 : Single terms clause in vocabulary document should have normal Terms and definitions heading
ISO_45 : Multiple terms clauses in vocabulary document should have 'Terms related to' heading
ISO_46 : 'see %s' is pointing to a normative section
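
To see how a template such as ISO_46 reads once its blank is filled in, the following Python sketch substitutes a value for %s. The value "Clause 4" is a made-up example and the function name is hypothetical; how Metanorma itself performs the substitution is not shown here.

```python
# Raw template text of ISO_46, as given in the excerpt above.
TEMPLATE = "'see %s' is pointing to a normative section"


def fill(template, *values):
    """Fill each %s placeholder, in order, with a location-specific value."""
    return template % values


print(fill(TEMPLATE, "Clause 4"))
```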

Validation error classes

General

Each error class is a category of error that Metanorma can detect. The error classes are not exhaustive, and new error classes may be added as new functionality is added to Metanorma.

Style validation error classes

Style

These are typically specific to the standards development organization (SDO), and reflect requirements on content set by the SDO editorial group. These issues will not prevent compilation, and the warnings are not always correct, but they do catch the kinds of issues that SDO editorial review is supposed to identify.

For example, ISO Content style validation lists the ISO-specific content style rules that Metanorma warns about when compiling ISO documents, derived from ISO/IEC DIR 2 and from the ISO House Style specification.

Markup validation error classes

Markup issues are typically more serious than style issues, and may prevent the document from being well-formed.

Markup issues usually need to be resolved for the document to be properly compiled.

Note
Deciphering what has gone wrong with markup issues may take more effort than style issues.
Anchors

Issue with identifiers of document elements, or resources (including URIs)

Severity 0 (fatal)
STANDOC_8

Malformed URL

STANDOC_36

Duplicate, ambiguous anchor in file

AsciiDoc Input

Issue with AsciiDoc markup, likely to prevent parsing of document

Bibliography

Issue with bibliographic markup

Severity 0 (fatal)
STANDOC_9

Nominated attachment file does not exist

STANDOC_19

Missing reference in local Relaton data source file

STANDOC_37

Invalid format of local Relaton data source file

STANDOC_52

Error in specification of bibliographic annotation spans

STANDOC_54

Missing local Relaton data source file; see Importing bibliographic records from other formats.

Relaton

Issue with externally fetched bibliographic record, via the Relaton software library

Severity 0 (fatal)
RELATON_1

Fatal error in the Relaton software library

RELATON_5

Reference to an IEV term (International Electrotechnical Vocabulary) that does not exist; see Sourcing concepts from termbases.

Cross-references

Issue with cross-reference to document elements

Severity 0 (fatal)
STANDOC_3

Invalid specification of index term (too many attributes, suggests missing quotation marks around a term containing a comma)

STANDOC_31

Illegal connective between cross-references (other than and, or, from, to)

STANDOC_47

Mismatch of callouts and annotations on sourcecode snippet

Document Attributes

Issue with content of AsciiDoc document attributes

Images

Issue with images

Severity 0 (fatal)
STANDOC_44

Image file not found

STANDOC_46

Image file too large to be encoded as Data URI

Include

Issue with includes

Severity 0 (fatal)
STANDOC_41

The specified file indicated in the include command does not exist.

Note

"Block comments" (comments delimited by ////) do not comment out the include command.

If an include command is given in a block comment, the include command will still be processed and the contents included in the commented out text. This means that if the included file does not exist, the "missing include file" error will be raised, as Metanorma is more strict in enforcing the existence of included files than a typical AsciiDoc processor.

To prevent bad includes from aborting execution, either:

  • skip checking for fatal errors entirely by putting a :novalid: document attribute in the document; or

  • comment out the include command with a "line comment" (a line starting with //) instead of a "block comment", as follows:

     // include::missing-file[]

    instead of

     ////
     include::missing-file[]
     ////
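
Include commands hidden inside block comments can be found before compiling. The following Python sketch scans Asciidoc source text and reports them. It is a simplified illustration, not part of Metanorma: it assumes a block comment is opened and closed by identical lines of four or more slashes, and it ignores other block types. The function name is hypothetical.

```python
import re

# An include command on a line of its own: include::target[attributes]
INCLUDE = re.compile(r"^include::(\S+?)\[.*\]\s*$")


def includes_in_block_comments(source):
    """Return (line number, target) for each include inside a //// block."""
    hits = []
    open_delimiter = None  # the //// line that opened the current block
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.rstrip()
        if open_delimiter is None:
            if re.fullmatch(r"/{4,}", stripped):
                open_delimiter = stripped
        elif stripped == open_delimiter:
            open_delimiter = None
        else:
            match = INCLUDE.match(stripped)
            if match:
                hits.append((number, match.group(1)))
    return hits
```

Each reported include will still be processed, so either the target file must exist or the command should be moved into a line comment.
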
STANDOC_1

Specified boilerplate file does not exist (:boilerplate-authority:) [added in https://github.com/metanorma/metanorma-standoc/releases/tag/v3.0.7].

Maths

Issue with mathematical expressions

Severity 0 (fatal)
STANDOC_6

Malformed MathML expression (whether entered as MathML, or after being converted from any math syntax)

STANDOC_33

Invalid MathML expression

Requirements

Issue with Metanorma requirements markup

Severity 0 (fatal)
MODSPEC_3

(In Modspec) requirement identifier is used more than once

Table

Issue with syntax of table declarations.

Severity 0 (fatal)
STANDOC_2

Empty table

STANDOC_4

Inconsistent number of rows specified (rowspan)

STANDOC_5

Inconsistent number of columns specified (colspan)

Terms

Issue with syntax in the terms and definitions clauses.

Severity 0 (fatal)
STANDOC_23

Concept markup ({{…​}}) points to something which is not a term or symbol

STANDOC_25

Designation markup (preferred:[], admitted:[], deprecated:[]) used in a clause not recognised as a terms clause

Metanorma XML Syntax

Issue with validation of Metanorma Semantic XML.

Severity 0 (fatal)
STANDOC_42

Passthrough markup has been specified as Metanorma XML (with no format attribute), but it contains non-Metanorma elements. If a different XML format is intended, format= should be used. [added in https://github.com/metanorma/metanorma-standoc/releases/tag/v3.0.5]

Severity 2 (info)

These errors deal with such things as restrictions on what kinds of text can appear where, pointers within the document that are orphaned, and elements that appear in the wrong sequence.

Metanorma will usually generate HTML and Word output despite the presence of those errors.

These errors can proliferate as the schema is quite strict, and should be investigated only when the document is visibly wrong.

Filtering

Global

The error file can get quite large, and it is possible to filter certain classes of log messages from the error log:

Example 1. Filtering out all messages of severity 3, all messages from the listed categories, and all messages with the listed Error IDs
:log-filter-severity: 3
:log-filter-category: Cross-references,Document Attributes,Metanorma XML Syntax
:log-filter-error-ids: STANDOC_12,ISO_7
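
The effect of these three attributes can be modelled as follows. This Python sketch is an illustration, not Metanorma's implementation. The LogMessage type and the function names are hypothetical, and the sketch assumes that a message is dropped when it matches any one of the three filters and that all three attributes accept comma-separated values (the example above shows only a single severity).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogMessage:
    """Hypothetical stand-in for one entry in the error log."""
    category: str
    error_id: str
    severity: int


def parse_list(value):
    """Split a comma-separated attribute value into a set of items."""
    return {item.strip() for item in value.split(",") if item.strip()}


def apply_filters(messages, attributes):
    """Drop messages matching any log-filter-* attribute (assumed semantics)."""
    severities = {int(s) for s in parse_list(attributes.get("log-filter-severity", ""))}
    categories = parse_list(attributes.get("log-filter-category", ""))
    error_ids = parse_list(attributes.get("log-filter-error-ids", ""))
    return [
        m for m in messages
        if m.severity not in severities
        and m.category not in categories
        and m.error_id not in error_ids
    ]


# The attribute values from Example 1.
EXAMPLE_ATTRIBUTES = {
    "log-filter-severity": "3",
    "log-filter-category": "Cross-references,Document Attributes,Metanorma XML Syntax",
    "log-filter-error-ids": "STANDOC_12,ISO_7",
}
```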

By location

It is also possible to filter error messages by location in the generated XML file, by reference to user-defined anchors (which persist in the XML file).

  • The filtering only applies to error messages which are specific to locations identified by anchor. It will not filter out errors in XML validation (which are located by XML line); errors that are identified in Asciidoc processing outside of specific nodes (e.g. include errors); or errors that are global to the document and thus have no specific location.

  • Filtering only applies to the generated log file, and it takes place when that file is output to disk. Many errors are also displayed to console as they are encountered; this filtering will not prevent that from happening.

The directive to filter error messages out by location is embedded in the Asciidoc document as a reviewer comment of type ignore-log. These comments are removed from the generated Metanorma output, and only apply to the Metanorma log. As with reviewer comments in general, the from argument of the comment specifies the node to which the filter applies, and the optional to argument specifies the final node in a range of nodes to which the filter applies. If a node is specified for a filter, the filter applies to all child nodes of that node.

So for a document that looks like:

[[clause1]]
== Initial clause

[[clause11]]
=== First subclause

[[clause12]]
=== Second subclause

[[clause2]]
== Second clause

[[clause21]]
=== First subclause of Second clause

[[clause3]]
== Third clause

[[clause31]]
=== First subclause of Third clause

[[clause32]]
=== Second subclause of Third clause
  • from=clause1 applies to "Initial clause" and its subclauses, but not "Second clause" or "Third clause"

  • from=clause1,to=clause31 applies to all nodes inclusive between "Initial clause" and "First subclause of Third clause", including "Second clause". It does not apply to "Second subclause of Third clause".
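
The two cases above can be reproduced with a small model of the document. The following Python sketch is an interpretation of the described behaviour, not Metanorma's code. It assumes the covered range runs in document order from the from node to the last descendant of the to node (or of the from node when to is omitted); the function names are hypothetical.

```python
# Anchors of the example document in document order, with heading depth
# (1 = clause, 2 = subclause).
DOC = [
    ("clause1", 1), ("clause11", 2), ("clause12", 2),
    ("clause2", 1), ("clause21", 2),
    ("clause3", 1), ("clause31", 2), ("clause32", 2),
]


def subtree_end(doc, index):
    """Index of the last descendant of doc[index], or index if it has none."""
    depth = doc[index][1]
    end = index
    for i in range(index + 1, len(doc)):
        if doc[i][1] <= depth:
            break
        end = i
    return end


def covered(doc, from_anchor, to_anchor=None):
    """Anchors covered by an ignore-log comment (assumed semantics)."""
    anchors = [anchor for anchor, _ in doc]
    start = anchors.index(from_anchor)
    last = anchors.index(from_anchor if to_anchor is None else to_anchor)
    return anchors[start:subtree_end(doc, last) + 1]


print(covered(DOC, "clause1"))
print(covered(DOC, "clause1", "clause31"))
```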

If the review comment is empty, then all reported errors specific to the identified range will be skipped in the generated error log:

[from=clause1,type=ignore-log]
****
****

If the review comment contains a comma-delimited list of Error IDs (see Error ID), only those errors will be skipped:

[from=clause1,type=ignore-log]
****
STANDOC_39, STANDOC_38
****