Support for Japanese internationalisation
Introduction
Metanorma supports a number of flavours of standardisation documents, many of which use languages other than English. As a result, internationalisation of content is a core concern of Metanorma — particularly with automatically generated content, such as captions, crossreferences, and autonumbering.
Scripts other than Latin pose their own challenges in internationalisation, including RTL (right-to-left) scripts like Arabic and Hebrew, and CJK (Chinese, Japanese, Korean), ideographic scripts. Our recent move to support documents in Japanese has led to a good deal of effort in CJK scripts specifically. We have already written here on the work we have done with Ruby annotation. We summarise here some of our recent work.
JIS and Plateau
Metanorma has done some work in support of Guóbiāo (Chinese national) standards in the past, and it supports Chinese as one of the six working languages of the ITU, alongside Arabic, English, Spanish, French, and Russian. However, the most extensive work we have done on internationalisation has been with Japanese, promoted by our expansion of Metanorma to two flavours primarily using Japanese:
-
metanorma-jis
supports Japanese Industrial Standards (JIS), published by the Japanese Standards Association (JSA). The JSA as a national standards body coordinates with ISO and IEC, and its format is closely aligned to ISO. Its documents are published in both Japanese and English. -
metanorma-plateau
supports the Plateau project of the Japanese Ministry of Land, Infrastructure, Transport and Tourism. The flavour is implemented to derive frommetanorma-jis
, but overrides its formatting in several instances.
Vertical printing
The default for Japanese standardisation documents follows the Western convention of writing text left-to-right, top-down; this is particularly preferred as standardisation documents typically include mathematical formulas, and Western-language text. However, the traditional Japanese practice of writing Japanese top-to-bottom, right-to-left remains common, particularly in legal text. Metanorma is currently working on implementing vertical writing in CJK in the PDF format of JIS docuemnts, as a rendering option.
Japanese calender
Japan uses the Western Gregorian calendar alongside the traditional Japanese calendar, which uses regnal years for the year rather than Anno Domini dating. In official contexts, the Japanese calendar is used: that includes indication of when documents were created and published.
Metanorma uses ISO 8601, which is founded on the Gregorian calendar, to enter its dates as metadata; so the date
a document was published will be indicated as something like created-date: 2020-10-11
. When the date
is displayed in the document frontispiece, it is rendered in the Japanese calendar, as 令和二年10月11日
[Year 2 of the Reiwa era — the reign of emperor Naruhito; month 10 day 11].
Japanese numbering
As with vertical printing, Japanese standardisation documents typically use Arabic numerals for automated numbering (clause numbers, ordered list numbers) and in metadata (edition numbers, dates of publication). However more conservatively formatted documents such as legal documents, that tend to use vertical writing, also tend to use Japanese numbering (properly speaking, Chinese numbering) in those contexts. Metanorma has recently added functionality in JIS and Plateau to use Japanese numbering instead of Arabic numbering in those contexts. So by default, a Japanese document equivalent to
Published: 2020-10-11
*1. Introduction.
1.1. Scope.
The following topics are in scope of this document:
Japanese numbers.
Arabic numbers.
Conversion between Japanese and Arabic numbers.
would be:
公開日: 令和二年10月11日
1 はじめに。
1.1 範囲。
このドキュメントの範囲は以下のトピックです:
日本語の数字。
アラビア数字。
日本語とアラビア数字の変換。
If Japanese numbering is set:
:presentation-metadata-autonumbering-style: japanese
the document will instead look like:
公開日: 令和二年十月十一日
一 はじめに。
一・一 範囲。
このドキュメントの範囲は以下のトピックです:
一. 日本語の数字。 二. アラビア数字。 三. 日本語とアラビア数字の変換。
Note
|
the dot between numbers in clause numbers is a middle dot in Japanese numbering. |
Full-width punctuation
Punctiation in CJK scripts is different from that of Roman script, even where CJK has adopted Western punctuation. In order to fit in with the ideographic characters of Chinese, Japanese and Korean, the punctuation of CJK needs to be of the same width as an ideographic character ("full-width punctuation"). Automatically populated text in Metanorma will be automatically populated by default with Roman punctuation; any such punctuation needs to be adjusted to be full-width. The context where this is most apparent is in bibliographic references, which are populated by template out of a bibliographic database; but this also applies to cross-references and captions.
For instance, a list item cross-reference will by default end in a closing parenthesis: 1). If Japanese numbering is being used, that needs to be rendered not as 一), but as 一), with a full-width parenthesis.
Often in technical documents, Roman text and Arabic numberals are interspersed with CJK text. In such cases, full-width punctuation should not be applied when it adjoins Roman text, but only with CJK text. So in the example above, if Arabic numbering is used for lists, 1) should be left alone, and not converted to 1).
CJK carriage return
In Roman text, the carriage return at the end of a line in Asciidoc is interpreted as space; so a text entered as
Now is the time for all good men
to come to the aid of the party.
is reflowed in Metanorma XML (and thus Metanorma outputs) as
<p>Now is the time for all good men to come to the aid of the party.</p>
Space is used much more sparingly in CJK; as a result, a carriage return in CJK Asciidoc text is not interpreted as space; so
今こそ、すべての善良な人々が
政党を支援する時です。
is reflowed in Metanorma XML as
<p>今こそ、すべての善良な人々が政党を支援する時です。</p>
with no Roman or CJK space introduced between 人々が and 政党を.
However, as with punctuation, any lines ending with Roman text have the space respected:
実施は中村秀子氏と John
Smith 氏の間で交渉されました。
reflows to
<p>実施は中村秀子氏と John Smith 氏の間で交渉されました</p>
Extended space
In CJK scripts, titles consisting of only a few characters are rendered in extended spacing; so Foreword as a title is not rendered as 序文, but as 序 文. This behaviour has been implemented in Metanorma for all section titles consisting of four characters or less.