Ruby encoding for East Asian languages now available
Introduction
Metanorma supports the authoring of Japanese Industrial Standards (JIS).
In order to consolidate its support of East Asian languages such as Japanese, Metanorma has started supporting Ruby annotation of text.
Ruby characters
Ruby characters are small annotations that appear alongside characters in East Asian languages. They are primarily used to indicate the pronunciation of characters, and are particularly common in that function in Japanese, where they are referred to as furigana.
Because annotations are meant to explain characters, they are not themselves expected to be ideographic characters, and are instead given in syllabaries or alphabets associated with phonetic guides, such as Hiragana, Katakana, or Romaji (Latin script) in Japanese, and Pinyin or Bopomofo in Chinese.
Ruby representing semantic information or pronouncing instructions
It is possible to annotate characters with text indicating the meaning instead of, or as well as, the pronunciation of the character.
Double-sided ruby annotations
Annotations can be double-sided, with an annotation on either side of the characters being annotated.
Double-sided annotations can also differ in scope. For example, when 護れ is annotated with both まも (mamo) and プロテゴ (protego), mamo annotates only the Kanji 護, while protego annotates the full word 護れ.
In the name 東南, for example, each character is phonetically glossed: 東 as とう tō, and 南 as なん nan. The entire name, however, is glossed as たつみ tatsumi: tōnan is the expected Japanese pronunciation of 東南 "southeast", but tatsumi is an archaic Japanese word for "southeast", and therefore a legitimate reading of the same two characters.
Digital support for ruby characters
Ruby support has proven quite challenging in digital typography:
- HTML
  - The initial W3C specification on HTML support of Ruby (2001) has proven very complex to implement.
  - Even the HTML5 approach (2016) has not been taken up by browsers.
  - The contemporary approach taken in the Living HTML specification is drastically simpler; see "The situation with <ruby>" (2020) for details on the challenges encountered.
- CSS has taken on a much greater role in Ruby support than was originally envisioned. Browser support of Ruby styling remains uneven as of this writing (e.g. for ruby-position).
- Microsoft Word does not natively support double-sided Ruby annotations: it presupposes only one annotation per set of characters.
- PDF generation tools based on XSL-FO, such as Apache FOP, do not natively support Ruby at all.
Metanorma implementation of ruby characters
Requirements
Metanorma requirements on the implementation of ruby characters are as follows:
- provide a model for encoding ruby characters that is not tethered to an obsolete or overly complicated encoding model;
- enable semantic treatment and processing of annotations;
- allow a reasonable range of rendering options.
In terms of feature support, specifically:
- supports distinction between semantic and phonetic annotations
- supports identification of script and language of annotation
- supports double-sided ruby characters
- supports partly-overlapping annotations
While edge cases will require full annotation markup and bookmarks to succeed, the model we have arrived at provides reasonable coverage of any ruby characters likely to arise in standards documents such as Japanese Industrial Standards.
Encoding syntax
In the simplest case, Ruby annotations are marked up as:
ruby:{annotation}[{annotated character(s)}]
Simple ruby characters
The phonetic guides to 東京 (Tokyo) in Hiragana, Katakana, and Romaji are marked up as follows:
ruby:とうきょう[東京]
ruby:トウキョウ[東京]
ruby:Tōkyō[東京]
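To make the shape of the simple syntax concrete, the following Python sketch converts it to an HTML <ruby> element. It is an illustration only, not Metanorma's implementation: the regular expression, the to_html function name, and the use of <rp> fallback parentheses are all assumptions made for this example, and only the simplest form (no nesting, no options) is handled.

```python
import re
from html import escape

# Assumption: matches only the simplest form, ruby:ANNOTATION[BASE],
# with no nested macros and no options inside the brackets.
SIMPLE_RUBY = re.compile(r"ruby:([^\[\]]+)\[([^\[\]]+)\]")


def to_html(text):
    """Replace each simple ruby macro in text with an HTML <ruby> element."""

    def render(match):
        annotation, base = match.group(1), match.group(2)
        # <rp> supplies fallback parentheses for renderers without ruby support.
        return (
            f"<ruby>{escape(base)}<rp>(</rp>"
            f"<rt>{escape(annotation)}</rt><rp>)</rp></ruby>"
        )

    return SIMPLE_RUBY.sub(render, text)


print(to_html("ruby:とうきょう[東京]"))
# prints <ruby>東京<rp>(</rp><rt>とうきょう</rt><rp>)</rp></ruby>
```

The same function works unchanged for the Katakana and Romaji variants, since the annotation is treated as opaque text.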
Per character ruby
Annotations can be broken down per character:
ruby:とう[東]ruby:きょう[京]
ruby:トウ[東]ruby:キョウ[京]
ruby:Tō[東]ruby:kyō[京]
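Per-character and whole-word markup carry the same base text and the same reading; only the alignment between the two differs. The Python sketch below makes that visible by stripping the macros and returning both strings. The split_ruby name and the regular expression are hypothetical helpers for this illustration, not part of Metanorma.

```python
import re

# Assumption: simple, non-nested macros only, as in the examples above.
SIMPLE_RUBY = re.compile(r"ruby:([^\[\]]+)\[([^\[\]]+)\]")


def split_ruby(text):
    """Return (base text, reading) with the ruby macros removed.

    Text outside any macro is copied into both strings unchanged.
    """
    base_parts, reading_parts = [], []
    position = 0
    for match in SIMPLE_RUBY.finditer(text):
        outside = text[position:match.start()]
        base_parts += [outside, match.group(2)]
        reading_parts += [outside, match.group(1)]
        position = match.end()
    tail = text[position:]
    return "".join(base_parts) + tail, "".join(reading_parts) + tail


print(split_ruby("ruby:とう[東]ruby:きょう[京]"))
# prints ('東京', 'とうきょう')
```

A helper of this kind could be useful wherever only the plain text is needed, such as search indexing, though how Metanorma handles such cases is not covered here.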
Double-sided ruby characters
Double-sided ruby is supported with ruby:[] macro nesting. This approach is consistent with HTML 5, and is the only approach supported in Living HTML.
In any nesting of AsciiDoc macros, the closing bracket of the nested macro instance needs to be escaped.
The syntax for double-sided ruby is therefore:
ruby:[ ... ruby:[...\] ... ]
Metanorma assumes that in double-sided ruby, the outer annotation appears before the characters annotated, and the inner annotation appears after them.
Note: In Word, where double-sided Ruby is not supported, the inner annotation appears in brackets after the characters, as a workaround.
ruby:プロテゴ[ruby:まも[護\]{blank}れ]!
ruby:たつみ[ruby:とう[東\]ruby:なん[南\]]
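The nesting and escaping rules can be illustrated with a small Python sketch that takes a double-sided macro apart. It assumes exactly two levels of nesting, treats {blank} as the AsciiDoc attribute that expands to nothing, and uses hypothetical names (OUTER, INNER, parse_double_sided); it is not how Metanorma itself parses the macro.

```python
import re

# Outer macro: the body may contain escaped closing brackets (\]),
# and ends at the first unescaped ].
OUTER = re.compile(r"ruby:([^\[\]]+)\[((?:\\\]|[^\]])*)\]")
# Inner macro, matched after the escapes have been removed.
INNER = re.compile(r"ruby:([^\[\]]+)\[([^\[\]]+)\]")


def parse_double_sided(text):
    """Return (outer annotation, [(base, inner annotation or None), ...])."""
    match = OUTER.search(text)
    if match is None:
        raise ValueError("no ruby macro found")
    outer, body = match.group(1), match.group(2)
    # Undo the escaping of nested closing brackets and drop {blank}.
    body = body.replace("\\]", "]").replace("{blank}", "")
    segments = []
    position = 0
    for inner in INNER.finditer(body):
        if inner.start() > position:
            # Characters covered by the outer annotation only.
            segments.append((body[position:inner.start()], None))
        segments.append((inner.group(2), inner.group(1)))
        position = inner.end()
    if position < len(body):
        segments.append((body[position:], None))
    return outer, segments


print(parse_double_sided(r"ruby:プロテゴ[ruby:まも[護\]{blank}れ]!"))
# prints ('プロテゴ', [('護', 'まも'), ('れ', None)])
```

The result reflects the difference in scope: the outer annotation applies to every segment, while the inner annotation is attached only to the segment it wraps.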
Encoding additional information to the ruby characters
Additional information can be provided optionally for the ruby characters.
- script code: The ISO 15924 code is entered with the prefix script=.

  Example 6. Example of encoding ruby characters with script code

  ruby:とうきょう[script=Hira,東京]

- language code: The ISO 639 code is entered with the prefix lang=.

  Example 7. Example of encoding ruby characters with language code

  ruby:Tōkyō[lang=ja,script=Latn,東京]
  ruby:トウキョウ[script=Kana,lang=ja,東京]

- type of ruby: Either pronunciation (default) or annotation, entered with the prefix type=.

  Example 8. Example of encoding ruby characters indicated as annotation

  ruby:しんゆう[親友] // by default type `pronunciation`
  ruby:しんゆう[type=pronunciation,親友]
  ruby:ライバル[type=annotation,親友]
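As a rough illustration of how these options sit inside the brackets, the Python sketch below separates the leading key=value entries from the annotated characters. It assumes, based only on the examples above, that options precede the annotated text and are comma-separated; the parse_ruby function, the KNOWN_OPTIONS set, and the returned dictionary layout are hypothetical and do not describe Metanorma's internals.

```python
import re

# Assumption: a single, non-nested macro of the form ruby:ANNOTATION[...].
RUBY = re.compile(r"ruby:([^\[\]]+)\[([^\[\]]+)\]")
KNOWN_OPTIONS = {"script", "lang", "type"}
KNOWN_TYPES = {"pronunciation", "annotation"}


def parse_ruby(macro):
    """Split a ruby macro into its annotation, options, and base text."""
    match = RUBY.fullmatch(macro)
    if match is None:
        raise ValueError(f"not a simple ruby macro: {macro!r}")
    annotation, content = match.group(1), match.group(2)
    # The type defaults to pronunciation when it is not given.
    result = {"annotation": annotation, "type": "pronunciation"}
    parts = content.split(",")
    # Consume leading key=value entries; the last part is always base text.
    while len(parts) > 1:
        key, separator, value = parts[0].partition("=")
        if not separator or key not in KNOWN_OPTIONS:
            break
        result[key] = value
        parts.pop(0)
    result["base"] = ",".join(parts)
    if result["type"] not in KNOWN_TYPES:
        raise ValueError(f"unknown ruby type: {result['type']!r}")
    return result


print(parse_ruby("ruby:ライバル[type=annotation,親友]"))
# prints {'annotation': 'ライバル', 'type': 'annotation', 'base': '親友'}
```

Because the options are looked up by name, their order does not matter, which matches the two orderings of lang= and script= shown in Example 7.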
Conclusion
Metanorma now supports encoding of ruby characters with a mature implementation, and this functionality is now available across all Metanorma flavors.