Metanorma: Aequitate Verum

Introducing NIST PubID, and the migration of publication identifiers

Ronald Tse and Artur Komarov on 09 Jan 2022

Summary

This article introduces the NIST PubID and describes the nist-pubid conversion tool for migrating existing NIST publication identifiers to the NIST PubID scheme.

Publication identifiers

Purpose

Publication identifiers are used to uniquely identify publications, and they are everywhere.

Take the ISBN for example. The ISBN (International Standard Book Number), one of a family of identifiers created by ISO/TC 46, provides a unique number for every title. The ISBN is an identifier of 10 or 13 digits (the final check character of an ISBN-10 may be "X"), such as 978-0691097503 (ISBN-13) or 069109750X (ISBN-10).

The DOI (Digital Object Identifier) is another identification scheme (also by ISO/TC 46) that is commonly used for digital publications because of its PURL (Persistent URL) properties. A DOI takes a form such as 10.6028/NIST.SP.800-116r1, where the portion before the slash is a registered namespace (10.6028 represents NIST), and the portion after the slash is the publication identifier within that namespace. The DOI can be used as a PURL through the official DOI website, which redirects a DOI to a full URL.

Example 1. Example of DOI mechanism
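As a rough illustration of the DOI structure described above, the following Ruby sketch splits a DOI into its namespace prefix and its suffix, and builds a resolver URL. The helper name `split_doi` is hypothetical, and `https://doi.org/` is assumed here as the resolver base of the official DOI website.

```ruby
# Hypothetical helper: split a DOI into its namespace prefix and its suffix.
def split_doi(doi)
  prefix, suffix = doi.split("/", 2)
  # A DOI prefix starts with "10." and a non-empty suffix must follow the slash.
  unless prefix&.start_with?("10.") && suffix && !suffix.empty?
    raise ArgumentError, "not a DOI: #{doi.inspect}"
  end

  # Assumption: https://doi.org/ is used as the resolver base URL.
  { prefix: prefix, suffix: suffix, url: "https://doi.org/#{doi}" }
end

parts = split_doi("10.6028/NIST.SP.800-116r1")
puts parts[:prefix]
puts parts[:suffix]
puts parts[:url]
```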

Publication identifiers for standards

Some identifier schemes not only provide a unique reference to a publication but also convey additional information about the publication itself.

  • The ISBN identifier is a series of digits. While it is easy to read, a series of 10 or 13 digits is difficult to remember. The ISBN identifier is not meant to convey any information about the book itself other than providing a unique reference.

  • The DOI identifier is a string of characters. While the scheme requires nothing more than an arbitrary string of characters, many publishers have tried adding meaning to that string. For example, in the case of 10.6028/NIST.SP.800-116r1, one might be able to tell that this identifier is for the publication “NIST Special Publication 800-116 Revision 1” given some knowledge about the NIST SP series (even without knowing 10.6028 stands for NIST).

Standards organizations typically use organization-specific publication identifiers that convey additional information about their own publications such as:

  • the type of the standard

  • the development stage of the standard

  • obligation or the normative status of the standard

  • edition and/or publication date of the standard

  • revision information

For example, an ISO publication identifier like "ISO/DIS 34000:2021" indicates that the standard is published by ISO, that its number is "34000", that its development stage is "DIS" (Draft International Standard), and that it was published in 2021.
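To make the breakdown concrete, here is a minimal Ruby sketch that pulls those four pieces out of the identifier. The pattern `ISO_ID` and the helper `parse_iso_id` are hypothetical and cover only this simple shape. Real ISO identifiers have more forms (joint publishers, parts, amendments) that this sketch would misread or reject.

```ruby
# Simplified, hypothetical pattern: publisher, optional stage, number, optional year.
ISO_ID = %r{\A(?<publisher>[A-Z]+)(?:/(?<stage>[A-Z]+))? (?<number>\d+)(?::(?<year>\d{4}))?\z}

# Returns a hash of the named parts, or nil when the text does not match.
def parse_iso_id(id)
  match = ISO_ID.match(id)
  return nil unless match

  match.named_captures.transform_keys(&:to_sym)
end

parsed = parse_iso_id("ISO/DIS 34000:2021")
parsed.each { |name, value| puts "#{name}: #{value}" }
```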

There are many complex standards identifiers that can be saved for another post!

Today we focus on the NIST PubID, which is a standard identifier for NIST Tech Pubs.

NIST PubID

Background

NIST has recently published a draft of their “Publication Identifier Syntax for NIST Technical Series Publications”, issued by the NIST Information Services Office.

This document describes a new syntax called “PubID” that can uniquely identify individual publications within the NIST Technical Series Publications (commonly referred to as “NIST Tech Pubs”).

Note
Hereinafter we refer to this syntax as the "NIST PubID".

The PubID scheme is jointly developed by the NIST Information Services Office and the NIST Computer Security Division.

NIST held a public comment period on PubID that concluded in late 2021, and the Ribose/Metanorma team submitted a formal response to the request for comments.

Note
The Ribose/Metanorma response is openly published at https://riboseinc.github.io/report-nist-pubid/ (source at GitHub).

Prior to PubID, identifiers of NIST Tech Pubs had some conventional structure but were not formally defined. Given that the National Bureau of Standards, the predecessor of NIST, first published what is now called the NIST Tech Pubs series of documents in 1901, there is a long history of inconsistent and specially assigned identifiers that did not fulfill the roles needed for an identifier for standards.

In addition, with the increased expectation of public transparency for standards in development, NIST is trying to publish and circulate earlier drafts outside of NIST and to the public for comments. Given that there are multiple development stages, it is important that every iteration of a standard document can be uniquely referenced and distinguished from its other stages.

Purpose

The NIST PubID intends to consolidate all NIST Tech Pubs with a unified identification scheme that can convey standardization status information to its readers.

The PubID team built on the existing NIST publication identifier convention and took inspiration from the ISO and IEC standard identifiers.

The eventual PubID scheme should work not only with newly created publications, but also the full collection of Tech Pubs — the full NIST Library catalog of 19,333 NIST Tech Publications dating from 1901 to 2022.

Note
The 19,333 figure is as of 2022-01-10 according to the NIST Tech Pubs XML metadata records published on GitHub.

NIST PubID styles

The NIST PubID addresses four aspects of document identifiers used in NIST Tech Pubs.

A traditional NIST Tech Pubs practice is to provide variants of the publication identifier inside the document itself. We call these variants “styles”.

Full style

used in the title and the bibliography for citations

Example 2. Example of full style PubID

"National Institute of Standards and Technology Special Publication 800-27, Revision A"

Abbreviated style

used in the "Authority" section

Example 3. Example of abbreviated style PubID

"Natl. Inst. Stand. Technol. Spec. Publ. 800-57 Part 1, Revision 4"

Short style

used for inline citations

Example 4. Example of short style PubID

"In Section 3.2 of SP 800-187…​"

In recent years, NIST Tech Pubs have been assigned individual DOIs, and the newly published documents often have their own DOI embedded within the documents. So we have a fourth variant:

Machine-readable (MR) style

used for the DOI and the DOI URL

Example 5. Example of machine-readable PubID

"NIST.SP.800-116r1"

One important goal of the NIST PubID is to be able to automatically convert any given variant into any other, through a defined set of metadata data models.

This usage is shown in the diagram from our response to NIST during the PubID comment period (Comments on the “Publication Identifier Syntax for NIST Technical Series Publications”).

PubID interchange and outputs
Figure 1. PubID core data elements and its rendered outputs
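The idea of rendering several styles from one set of data elements can be sketched in a few lines of Ruby. This is an illustration only, not the nist-pubid implementation. The `SimplePubId` struct and the lookup tables are hypothetical and cover just publisher, series and report number. The style names follow the tool's options (`long` corresponds to the full style).

```ruby
# Hypothetical lookup tables; only the entries needed for this example.
PUBLISHERS = {
  "NIST" => { long: "National Institute of Standards and Technology",
              abbrev: "Natl. Inst. Stand. Technol." }
}.freeze

SERIES = {
  "SP" => { long: "Special Publication", abbrev: "Spec. Publ." }
}.freeze

# Minimal identifier record: publisher, series and report number only.
SimplePubId = Struct.new(:publisher, :series, :number, keyword_init: true) do
  def render(style)
    case style
    when :short then [publisher, series, number].join(" ")
    when :mr    then [publisher, series, number].join(".")
    when :long, :abbrev
      [PUBLISHERS.fetch(publisher).fetch(style),
       SERIES.fetch(series).fetch(style),
       number].join(" ")
    else
      raise ArgumentError, "unknown style: #{style}"
    end
  end
end

id = SimplePubId.new(publisher: "NIST", series: "SP", number: "800-53A")
%i[short abbrev long mr].each { |style| puts "#{style}: #{id.render(style)}" }
```

Because every style is derived from the same record, changing one element (say, the report number) updates all four renderings consistently.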

Core data elements

The PubID is an ambitious attempt at encoding metadata within a human-readable identifier while still allowing machines to extract that metadata.

To make this happen, a core set of data elements is defined, from which the PubID is built.

These data elements include:

Publisher

NIST and its predecessor NBS have each published documents under their own abbreviation.

Series

The publication series. There are at least 53 publication series in NIST Tech Pubs.

Stage

Some groups within NIST, such as the Computer Security Division, publish early drafts for external circulation and public preview/review. Having the standardization stage encoded allows reviewers to uniquely identify drafts for citations as well as prevent misidentification with final publications.

Report number

The identification of a publication within a NIST series.

Part

Some publications consist of multiple parts or volumes, and they should be identified as such.

Edition

Publications get revised and often get published in multiple editions. This element supports revision numbers, publication dates and versions.

Translation

Publications that are published in languages other than English are assigned a specific code.

Update

The update number indicates that a publication has been updated since its first publication. In NIST Tech Pubs, an "updated" publication means it incorporates changes from previously published errata.

Note
In contrast with ISO or IEC, NIST typically does not publish corrigenda or errata as individual Tech Pubs. Instead, it publishes "updated" Tech Pubs that incorporate the corrections.

Detailed information on these elements can be found at: https://github.com/metanorma/nist-pubid
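One way to picture these elements is as a single record in which most fields are optional. The Ruby sketch below is a hypothetical model, not the data model of the nist-pubid gem. The field names mirror the list above, and the sample values (including the `"r3"` edition string) are assumptions made for illustration.

```ruby
# Hypothetical record holding the eight core data elements named above.
PubIdElements = Struct.new(
  :publisher, :series, :stage, :number, :part, :edition, :translation, :update,
  keyword_init: true
) do
  # Only the elements that are actually set for this publication.
  def present
    to_h.compact
  end
end

# Assumed sample values for a publication with a part and a revision.
sp = PubIdElements.new(publisher: "NIST", series: "SP", number: "800-57",
                       part: 1, edition: "r3")
puts sp.present.keys.join(", ")
```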

Planning the migration to NIST PubIDs

Historic compatibility and testing

In order to adopt the NIST PubID scheme, one important aspect is to be able to retroactively apply the scheme to previously published documents, so that the users of the new scheme can identify legacy documents using the new scheme. That’s converting a total of 19,333 identifiers!

The NIST Library (thanks to Kathryn and Kate) has very helpfully published the raw data they have of the NIST Tech Pubs in XML format on GitHub (link: https://github.com/usnistgov/NIST-Tech-Pubs).

While data elements in the XML do not fully cover those needed for the NIST PubID scheme (it is a new scheme after all!), we can extract information from the existing publication identifiers and corresponding DOIs for the missing values.

One of the most visible changes will be in the series identifiers, where legacy series identifiers like “NISTIR” and “NISTGCR” will officially be relabeled as “NIST IR” and “NIST GCR”.
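The relabeling itself is a simple prefix substitution, as the following Ruby sketch suggests. The table `SERIES_RELABEL` and the helper `relabel_series` are hypothetical and include only the two series named above. The actual tool handles many more cases.

```ruby
# Legacy series labels and their PubID replacements (only the two named above).
SERIES_RELABEL = { "NISTIR" => "NIST IR", "NISTGCR" => "NIST GCR" }.freeze

# Hypothetical helper: rewrite a leading legacy series label, leave the rest as-is.
def relabel_series(legacy_id)
  legacy_id.sub(/\A(NISTIR|NISTGCR)\b/) { SERIES_RELABEL.fetch(Regexp.last_match(1)) }
end

puts relabel_series("NISTIR 8379")
puts relabel_series("NIST SP 800-53")
```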

Assessing migration impact

To assess the impact of the change and demonstrate the visual differences between the pre-PubID and post-PubID identifiers, a conversion and bulk comparison tool is necessary.

In particular, we wish to do the following:

  1. Parse a NIST publication identifier into a NIST PubID object;

  2. If the NIST publication identifier does not contain sufficient information, parse the DOI and supplement that information into the PubID.

We also wish to generate a comparison table (e.g. CSV) to allow easy comparison between legacy and new PubID identifiers.
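The two steps above can be sketched in Ruby as follows. This is a simplified illustration under stated assumptions, not how nist-pubid works internally. The two patterns and the helper names are hypothetical, they recognize only a "publisher series number" shape, and no case normalization is applied. Elements found in the legacy identifier take precedence, and the DOI fills in whatever is missing.

```ruby
# Hypothetical, simplified patterns for a legacy identifier and a DOI suffix.
LEGACY_ID  = /\A(?<publisher>NIST|NBS) (?<series>[A-Z]+) (?<number>\S+)\z/
DOI_SUFFIX = /\A(?<publisher>NIST|NBS)\.(?<series>[A-Z]+)\.(?<number>.+)\z/

# Extract named elements from text, or an empty hash when nothing matches.
def elements_from(text, pattern)
  match = pattern.match(text)
  match ? match.named_captures.transform_keys(&:to_sym) : {}
end

# Step 1: parse the legacy identifier. Step 2: supplement from the DOI.
def to_short_pubid(legacy_id, doi_suffix)
  elements = elements_from(doi_suffix, DOI_SUFFIX)
             .merge(elements_from(legacy_id, LEGACY_ID))
  return nil unless %i[publisher series number].all? { |key| elements[key] }

  elements.values_at(:publisher, :series, :number).join(" ")
end

# The legacy identifier lacks a usable series code, so the DOI supplies it.
puts to_short_pubid("NBS report ; 2751", "NBS.RPT.2751")
```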

A conversion tool for NIST PubIDs: nist-pubid

Introduction

The data elements required by the new PubID scheme are not consistently provided in current NIST document identifiers, so generating the new NIST PubIDs for existing documents requires the full metadata of those documents.

We implemented an open-source conversion tool that extracts the required PubID data elements from existing NIST Tech Pubs metadata, such as the legacy identifier, DOI, edition and publication date information, to generate the new PubID.

This tool is realized as a Ruby gem called nist-pubid.

nist-pubid provides a CLI (Command-Line Interface) and a Ruby library that can be used to create and manipulate PubID objects.

In this post we will show how to generate and convert NIST PubIDs through the CLI.

Installation

The only prerequisite is to have Ruby installed. Please refer to the official Ruby installation guide.

The nist-pubid tool can be installed as follows.

$ gem install nist-pubid

Now you should be able to use the nist-pubid command.

When called without arguments (or as nist-pubid help) the help screen will be shown.

$ nist-pubid
Commands:
  nist-pubid convert         # Convert legacy NIST Tech Pubs ID to NIST PubID
  nist-pubid help [COMMAND]  # Describe available commands or one specific command
  nist-pubid report          # Create report for NIST Tech Pubs database (fetches from GitHub)

Converting a legacy identifier to NIST PubID

The nist-pubid command provides a convert subcommand that converts a legacy NIST Tech Pubs identifier into the NIST PubID format.

Here’s how it can be used:

$ nist-pubid help convert
Usage:
  nist-pubid convert

Options:
  -s, [--style=STYLE]    # Convert to PubID style (short|long|mr|abbrev)
                         # Default: short
  -f, [--format=FORMAT]  # Render in format (JSON, string)
                         # Default: string

Convert legacy NIST Tech Pubs ID to NIST PubID

$ nist-pubid convert "NIST SP 800-53a"
NIST SP 800-53A
$ nist-pubid convert "NIST SP 800-57p1r3"
NIST SP 800-57pt1r3

The convert command also supports DOI conversion.

$ nist-pubid convert "NIST.SP.800-57p1r3"
NIST SP 800-57pt1r3

In addition to outputting PubID short style, we can also output other styles and formats of the resulting PubID.

$ nist-pubid convert -s mr "NIST SP 800-53a"
NIST.SP.800-53A
$ nist-pubid convert -s long -f json "NIST SP 800-53a" | jq
{
  "styles": {
    "short": "NIST SP 800-53A",
    "abbrev": "Natl. Inst. Stand. Technol. Spec. Publ. 800-53A",
    "long": "National Institute of Standards and Technology Special Publication 800-53A",
    "mr": "NIST.SP.800-53A"
  },
  "publisher": "NIST",
  "serie": "NIST SP",
  "code": "800-53A"
}
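The JSON form is convenient when another program needs to consume the result. As a hedged sketch, the Ruby snippet below parses a copy of the JSON shown above and picks out one style. The helper `style_from_json` is hypothetical, and the JSON shape is assumed to be exactly what the example output shows.

```ruby
require "json"

# JSON copied from the `convert -f json` output shown above.
SAMPLE_JSON = <<~JSON
  {
    "styles": {
      "short": "NIST SP 800-53A",
      "abbrev": "Natl. Inst. Stand. Technol. Spec. Publ. 800-53A",
      "long": "National Institute of Standards and Technology Special Publication 800-53A",
      "mr": "NIST.SP.800-53A"
    },
    "publisher": "NIST",
    "serie": "NIST SP",
    "code": "800-53A"
  }
JSON

# Hypothetical helper: pick one rendered style out of the JSON document.
# Raises KeyError if the style is not present.
def style_from_json(json_text, style)
  JSON.parse(json_text).fetch("styles").fetch(style)
end

puts style_from_json(SAMPLE_JSON, "mr")
```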

Generating the bulk NIST Tech Pubs migration report

This is the nice part: a single command that generates the full table of converted PubIDs from the NIST Tech Pubs database, comprising 19,333 entries.

The report command can be used as follows:

$ nist-pubid help report
Usage:
  nist-pubid report

Options:
  [--csv], [--no-csv]  # Export to CSV format

Create report for NIST Tech Pubs database (fetches from GitHub)

The purpose of this command is to aid the NIST PubID team in assessing the impact and type of changes to be made in enacting this new scheme.

By default, the report command generates a table to indicate which migrated identifiers have changed, focusing on changes of two styles:

  • PubID in short style vs legacy publication ID

  • PubID in machine-readable style vs legacy DOI

As seen in the following output, a ✅ or a - is shown in the corresponding "changed?" column.

$ nist-pubid report
ID changed? | New PubID | Document ID | DOI changed? | New PubID-MR | DOI | Title
 - | NBS BH 1 | NBS BH 1 |  - | NBS.BH.1 | NBS.BH.1 | Recommended minimum requirements for small dwelling construction : report of Building Code Committee July 20, 1922
 - | NBS BH 10 | NBS BH 10 |  - | NBS.BH.10 | NBS.BH.10 | A city planning primer by the advisory committee on zoning appointed by Secretary Hoover
 ...
✅ | NBS BH 3A | NBS BH 3a | ✅ | NBS.BH.3A | NBS.BH.3a | A zoning primer by the advisory committee on zoning appointed by Secretary Hoover (Revised)
 - | NBS BH 4 | NBS BH 4 |  - | NBS.BH.4 | NBS.BH.4 | How to own your home : a handbook for prospective home owners
✅ | NBS BH 5A | NBS BH 5a | ✅ | NBS.BH.5A | NBS.BH.5a | A standard state zoning enabling act under which municipalities may adopt zoning regulations by the advisory committee on zoning appointed by Secretary Hoover (revised edition 1926)
...
✅ | NBS RPT 2751 | NBS report ; 2751 |  - | NBS.RPT.2751 | NBS.RPT.2751 | Stochastic search for the maximum of a function
 ...
✅ | NBS RPT 2831 | NBS report ; 2831 |  - | NBS.RPT.2831 | NBS.RPT.2831 | Error bounds for eigenvalues of symmetric integral equations

Better yet, the report command supports CSV output. The "changed?" fields display true or false accordingly.

$ nist-pubid report --csv
ID changed?,New PubID,Document ID,DOI changed?,New PubID-MR,DOI,Title
false,NBS BH 1,NBS BH 1,false,NBS.BH.1,NBS.BH.1,"Recommended minimum requirements for small dwelling construction : report of Building Code Committee July 20, 1922"
false,NBS BH 10,NBS BH 10,false,NBS.BH.10,NBS.BH.10,A city planning primer by the advisory committee on zoning appointed by Secretary Hoover
false,NBS BH 11,NBS BH 11,false,NBS.BH.11,NBS.BH.11,A standard city planning enabling act by the advisory committee on city planning and zoning appointed by secretary Hoover
...
true,NIST SP 260-214,NIST SP 260-14,false,NIST.SP.260-214,NIST.SP.260-214,"Analysis of Seafood Reference Materials: RM 8256, RM 8257, RM 8258 and RM 8259, Wild-Caught Coho Salmon (RM 8256), Aquacultured Coho Salmon (RM 8257), Wild-Caught Shrimp (RM 8258), Aquacultured Shrimp (RM 8259)"
false,NIST SP 260-14,NIST SP 260-14,false,NIST.SP.260-14,NIST.SP.260-14,"Analysis of Seafood Reference Materials: RM 8256, RM 8257, RM 8258 and RM 8259, Wild-Caught Coho Salmon (RM 8256), Aquacultured Coho Salmon (RM 8257), Wild-Caught Shrimp (RM 8258), Aquacultured Shrimp (RM 8259)"
true,NIST IR 8379,NISTIR 8379,false,NIST.IR.8379,NIST.IR.8379,Summary Report for the Virtual Workshop Addressing Public Comment on NIST Cybersecurity for IoT Guidance

The best part is that this CSV works properly with spreadsheet editors like Excel and Numbers. All you need to do is redirect the CSV output to a file and open it in your favorite program.

$ nist-pubid report --csv > myreport.csv

It is easy to filter these columns in Microsoft Excel with the following steps:

  1. Open the CSV file in Excel

  2. Convert the header row into a filter row: first highlight the header row, then click on "Data > Filter"

  3. Filter the columns accordingly

PubID conversion report
Figure 2. PubID conversion report, showing mapping between legacy publication identifiers and NIST PubIDs
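The same filtering can be done without a spreadsheet. The Ruby sketch below uses the standard `csv` library to keep only the rows where an identifier changed. The helper `changed_rows` is hypothetical, the column names are taken from the CSV header shown earlier, and the sample rows are copied from that output.

```ruby
require "csv"

# Header and two rows copied from the CSV report output shown earlier.
SAMPLE_REPORT = <<~CSV
  ID changed?,New PubID,Document ID,DOI changed?,New PubID-MR,DOI,Title
  false,NBS BH 10,NBS BH 10,false,NBS.BH.10,NBS.BH.10,A city planning primer by the advisory committee on zoning appointed by Secretary Hoover
  true,NIST IR 8379,NISTIR 8379,false,NIST.IR.8379,NIST.IR.8379,Summary Report for the Virtual Workshop Addressing Public Comment on NIST Cybersecurity for IoT Guidance
CSV

# Hypothetical helper: keep rows where either "changed?" column is "true".
def changed_rows(csv_text)
  CSV.parse(csv_text, headers: true).select do |row|
    row["ID changed?"] == "true" || row["DOI changed?"] == "true"
  end
end

changed_rows(SAMPLE_REPORT).each do |row|
  puts "#{row['Document ID']} -> #{row['New PubID']}"
end
```

To run this against a full report, pass the file contents instead of the sample, for example `changed_rows(File.read("myreport.csv"))`.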

Now it’s easy to screen through the anomalies and surface the data issues!

Conclusion

NIST has taken an innovative first step in formalizing a standards publication identification scheme.

NIST PubID is a very well thought-out approach for implementing a standards publication identifier that works well for humans and machines. We hope that it sets a precedent for other SDOs to build their own documented identifier schemes based on the NIST experience.

We look forward to its finalization in 2022, and let’s see if other SDOs follow suit!

Special thanks

Special thanks to Jim Foti of the Computer Security Division (CSD), ITL, and Kathryn Miller and Kate Bucher of the Information Services Office (ISO), Management Resources, for developing the PubID scheme. We really appreciate the mention in the acknowledgments section!