Metanorma: Aequitate Verum

Ruby encoding for East Asian languages now available

Nick Nicholas and Ronald Tse on 19 Dec 2023

Introduction

Metanorma supports the authoring of Japanese Industrial Standards (JIS).

In order to consolidate its support of East Asian languages such as Japanese, Metanorma has started supporting Ruby annotation of text.

Ruby characters

Ruby characters are small annotations that appear alongside characters in East Asian languages. They are primarily used to indicate the pronunciation of characters, and are particularly common in that function in Japanese, where they are referred to as furigana:

Figure 1. Example of furigana used to annotate the Japanese kanji for "Tokyo" with pronunciation guides.

Because annotations are meant to explain characters, they are not themselves expected to be ideographic characters, and are instead given in syllabaries or alphabets associated with phonetic guides, such as Hiragana, Katakana, or Romaji (Latin script) in Japanese, and Pinyin or Bopomofo in Chinese.

Ruby representing semantic information or pronouncing instructions

It is possible to annotate characters with text indicating the meaning instead of, or as well as, the pronunciation of the character.

Example 1. Example of semantic annotation in Ruby: 親友 shin’yū "close friend" annotated with the English loanword ライバル raibaru "rival", to mean "a rival who is also a friend" (cf. English frenemy)

Double-sided ruby annotations

Annotations can also be double-sided, with an annotation on either side of the characters being annotated:

Example 2. Example of "double-sided" Ruby, 護れ mamore "protect" annotated phonetically in two languages: マモ mamo in Japanese, and プロテゴ protego in English

It is possible for double-sided annotations to have different scope. In the example above, mamo annotates only the Kanji 護, while protego annotates the full word 護れ.

In the following example, each character is phonetically glossed: 東 as とう tō, 南 as なん nan. But the entire name is glossed as たつみ tatsumi: tōnan is the expected Japanese pronunciation of 東南 "southeast", while tatsumi is an archaic Japanese word for "southeast", and therefore a legitimate reading of the same two characters.

Example 3. tō + nan = tatsumi no hōgaku: "in the direction of the southeast"

Digital support for ruby characters

Ruby support has proven quite challenging in digital typography:

  • HTML

  • CSS has taken on a much greater role in the support of Ruby than was originally envisioned. Browser support of Ruby styling remains uneven as of this writing (e.g. for ruby-position).

  • Microsoft Word does not natively support double-sided Ruby annotations: it presupposes only one annotation per set of characters.

  • PDF generation tools based on XSL-FO, such as Apache FOP, do not natively support Ruby at all.

Metanorma implementation of ruby characters

Requirements

Metanorma requirements on the implementation of ruby characters are as follows:

  • provide a model for encoding ruby characters that is not tethered to an obsolete or overly complicated encoding model;

  • enable semantic treatment and processing of annotations;

  • allow a reasonable range of rendering options.

In terms of feature support, specifically:

  • supports distinction between semantic and phonetic annotations

  • supports identification of script and language of annotation

  • supports double-sided ruby characters

  • supports partly-overlapping annotations

While edge cases will require full annotation markup and bookmarks to succeed, the model we have arrived at provides reasonable coverage of any ruby characters likely to arise in standards documents such as Japanese Industrial Standards.

Encoding syntax

In the simplest case, Ruby annotations are marked up as:

Syntax for encoding ruby characters
ruby:{annotation}[{annotated character(s)}]

Simple ruby characters

The phonetic guides to "Tokyo" above would be marked up as follows:

Example 4. Example of encoding simple ruby characters
ruby:とうきょう[東京]
ruby:トウキョウ[東京]
ruby:Tōkyō[東京]

Per character ruby

Annotations can be broken down per character:

Example 5. Example of encoding ruby annotations broken down to character level
ruby:とう[東]ruby:きょう[京]
ruby:トウ[東]ruby:キョウ[京]
ruby:Tō[東]ruby:kyō[京]
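
To make the shape of the simple and per-character forms concrete, here is a minimal Python sketch that rewrites non-nested `ruby:` macros as HTML `<ruby>` elements. It is an illustration only, not Metanorma's own processing: the function name `ruby_to_html` and the regular expression are hypothetical, and the sketch assumes that neither the annotation nor the annotated text contains square brackets or attributes.

```python
import re
from html import escape

# Matches one non-nested macro of the form ruby:ANNOTATION[BASE].
# Assumption for this sketch: no nesting, no escapes, no attributes.
SIMPLE_RUBY = re.compile(r"ruby:([^\[\]]+)\[([^\[\]]+)\]")


def ruby_to_html(text: str) -> str:
    """Replace each simple ruby macro in text with an HTML <ruby> element."""

    def replace(match: re.Match) -> str:
        annotation, base = match.group(1), match.group(2)
        return f"<ruby>{escape(base)}<rt>{escape(annotation)}</rt></ruby>"

    return SIMPLE_RUBY.sub(replace, text)


# Whole-word annotation, then the same word annotated per character.
print(ruby_to_html("ruby:とうきょう[東京]"))
print(ruby_to_html("ruby:とう[東]ruby:きょう[京]"))
```

The per-character form yields two adjacent `<ruby>` elements, one per kanji, whereas the whole-word form yields a single element spanning both characters.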

Double-sided ruby characters

Double-sided ruby is supported, with ruby:[] macro nesting. This approach is consistent with HTML 5, and is the only approach supported in Living HTML.

In any nesting of AsciiDoc macros, the closing bracket of the nested macro instance needs to be escaped.

The syntax for double-sided ruby is therefore:

Syntax for double-sided ruby
ruby:[ ... ruby:[...\] ... ]

Metanorma assumes that in double-sided ruby, the outer annotation appears before the characters annotated, and the inner annotation appears after them.

Note
In Word, where double-sided Ruby is not supported, the inner annotation appears after the characters in brackets, as a workaround.

Encoding double-sided ruby from above examples
ruby:プロテゴ[ruby:まも[護\]{blank}れ]!
ruby:たつみ[ruby:とう[東\]ruby:なん[南\]]
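
The escaping rule is easy to get wrong by hand, so the following Python sketch generates the two encodings above programmatically. The helper names `ruby` and `nested` are hypothetical, and the sketch assumes a single level of nesting (it would escape already-escaped brackets a second time if applied repeatedly).

```python
def ruby(annotation: str, base: str) -> str:
    """Build one ruby macro: ruby:ANNOTATION[BASE]."""
    return f"ruby:{annotation}[{base}]"


def nested(macro: str) -> str:
    """Escape closing brackets so the macro can be placed inside another macro.

    Assumption for this sketch: only one level of nesting is needed.
    """
    return macro.replace("]", "\\]")


# Inner annotation covers only 護; outer annotation covers 護れ.
protect = ruby("プロテゴ", nested(ruby("まも", "護")) + "{blank}れ") + "!"

# Each character glossed separately, with one outer gloss over both.
southeast = ruby("たつみ", nested(ruby("とう", "東")) + nested(ruby("なん", "南")))

print(protect)
print(southeast)
```

Only the inner macros pass through `nested`; the outer macro keeps its unescaped closing bracket.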

Encoding additional information to the ruby characters

Additional information can be provided optionally for the ruby characters.

  • script code: The ISO 15924 code can be entered with the prefix script=.

    Example 6. Example of encoding ruby characters with script code
    ruby:とうきょう[script=Hira,東京]
  • language code: The ISO 639 code is entered with the prefix lang=.

    Example 7. Example of encoding ruby characters with language code
    ruby:Tōkyō[lang=ja,script=Latn,東京]
    ruby:トウキョウ[script=Kana,lang=ja,東京]
  • type of ruby: Either pronunciation (default) or annotation.

    Example 8. Example of encoding ruby characters indicated as annotation
    ruby:しんゆう[親友] // by default type `pronunciation`
    ruby:しんゆう[type=pronunciation,親友]
    ruby:ライバル[type=annotation,親友]
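
The optional attributes above can be read as key-value pairs that precede the annotated text. The following Python sketch models that reading; it is not Metanorma's parser. The class `RubyAnnotation` and the function `parse_ruby` are hypothetical names, and the sketch assumes a non-nested macro whose annotated text contains no commas.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RubyAnnotation:
    annotation: str
    base: str
    type: str = "pronunciation"   # documented default
    lang: Optional[str] = None    # ISO 639 code, if given
    script: Optional[str] = None  # ISO 15924 code, if given


def parse_ruby(macro: str) -> RubyAnnotation:
    """Parse one non-nested macro: ruby:ANNOTATION[key=value,...,BASE]."""
    if not (macro.startswith("ruby:") and macro.endswith("]")):
        raise ValueError(f"not a ruby macro: {macro!r}")
    annotation, _, body = macro[len("ruby:"):-1].partition("[")
    # Everything before the last comma is treated as an attribute.
    *options, base = body.split(",")
    result = RubyAnnotation(annotation=annotation, base=base)
    for option in options:
        key, _, value = option.strip().partition("=")
        if key not in ("type", "lang", "script"):
            raise ValueError(f"unknown attribute: {key!r}")
        setattr(result, key, value)
    return result


print(parse_ruby("ruby:Tōkyō[lang=ja,script=Latn,東京]"))
print(parse_ruby("ruby:ライバル[type=annotation,親友]"))
```

Because attributes are matched by name, their order does not matter in this sketch, which mirrors the two orderings shown in Example 7.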

Conclusion

Metanorma now supports encoding of ruby characters with a mature implementation, and this functionality is now available across all Metanorma flavors.