HTML and MS Word HTML Output
In order to create CSS stylesheets for the HTML and Microsoft Word HTML output of Metanorma, it is necessary to understand the structure of the HTML it generates.
HTML
Top-Level Structure
The head
of the HTML document contains a single stylesheet
(the :htmlstylesheet
parameter of HtmlConvert.new()
),
and a few script calls that are embedded in the Ruby code
(initialising jQuery, including webfonts).
The body
of the HTML document is divided into the following parts:
-
A title section (
<div class="title-section">
), comprising identifying information about the document, such as appears in a title page in print.-
The section is populated with an HTML template (the
:htmlcoverpage
parameter ofHtmlConvert.new()
). The information in this section is sourced from document metadata, rather than document content proper; the gem uses Liquid Template to populate the HTML template. Different fields usually have distinct class names for CSS styling; these can vary by gem. -
For example, ISO documents have
coverpage_docnumber
(for the document ID),coverpage_techcommittee
(for the technical committee responsible for the document),doctitle-en
(for the English-language title of the document),doctitle-fr
(for the French title),title, subtitle, part
(for the three components of the document title), andcoverpage_docstage
(for the stage of publication of the document).
-
-
A prefatory section (
<div class="prefatory-section">
), comprising predefined text which also does not come from document content proper. This is typically restricted to a copyright statement (<div class="copyright">
), contact details, and a table of contents<div id="toc">
.-
The section is also populated with a Liquid HTML template (the
:htmlintropage
parameter ofHtmlConvert.new()
). -
The table of contents in the HTML template is a placeholder; it is populated by a table of contents script included among the scripts loaded into the HTML body.
-
-
The main section of the document (
<main class="main-section">
), which is populated with the document content. -
Scripts. These are populated from a static file (the
:scripts
parameter ofHtmlConvert.new()
). These are expected to include MathJax, a Table of Contents generator, and a script for handling footnotes.
Body markup
Within the body of the document, different blocks and inline spans of the Metanorma document model (Metanorma Standoc XML, BasicDoc XML) are represented by different CSS classes, as follows:
Sections
- Symbols and abbreviated terms
-
<div class="Symbols">
(contents are a definition list) - Appendix title
-
<h1 class="Annex">
- Appendix, Bibliography, Introduction
-
<div class="Section3">
- Introduction title
-
<h1 class="IntroTitle">
- Foreword title
-
<h1 class="ForewordTitle">
- Deprecated term
-
<p class="DeprecatedTerms">
- Alternative term
-
<p class="AltTerms">
- Primary term
-
<p class="Terms">
- Term header
-
<p class="TermNum">
- Document title (in body)
-
<p class="zzSTDTitle1">
Blocks
- Note
-
<div class="Note">
- Note label
-
<span class="note_label">
- Figure
-
<div class="figure">
- Figure title
-
<span class="FigureTitle">
- Example
-
<table class="example">
or<div class="example">
- Example label
-
<span class="example_label">
- Sourcecode
-
<p class="Sourcecode">
- Admonition
-
<div class="Admonition">
- Formula
-
<div class="formula">
- Blockquote
-
<div class="Quote">
- Blockquote attribution
-
<p class="QuoteAttribution">
- Footnote
-
<aside class="footnote">
- Ordered list
-
<ol>
- Unordered list
-
<ul>
- Definition list
-
<dl>
- Normative reference
-
<p class="NormRef">
- Informative reference
-
<p class="Biblio">
- Table
-
<table>
- Table title
-
<p class="TableTitle">
- Table head
-
<thead>
- Table body
-
<tbody>
- Table foot
-
<tfoot>
Inline
- Hyperlink
-
<a>
- Cross-Reference
-
<a>
- Stem expression
-
<span class="stem">
- Small caps
-
<span style="font-variant:small-caps;">
- Emphasis
-
<i>
- Strong
-
<b>
- Superscript
-
<sup>
- Subscript
-
<sub>
- Monospace
-
<tt>
- Strikethrough
-
<s>
- Line Break
-
<br>
- Horizontal Rule
-
<hr>
- Page Break
-
<br>
(realised as page break in Word HTML)
Images
All images for an HTML document {filename}.html
are moved to the folder {filename}_htmlimages
, and renamed with GUIDs. This is to ensure that all images are available in the one location, making it easier to package the HTML output and upload it elsewhere.
Word HTML
Read a general intro to Microsoft Word output format.
Top-Level Structure
The headers and footers of a Word document are defined in Word HTML in a separate file, header.html
(the :header
parameter of WordConvert.new()
), which is included in the file manifest for the document. The header.html file is cross-referenced to the Word HTML CSS file, and contains a separate div
for each header and footer type; refer to the instances in the gems for illustration.
The head
of the Word HTML document contains two stylesheets (the :wordstylesheet
and :standardsheet
parameter of WordConvert.new()
). The :wordstylesheet
is intended as generic Word markup, while :standardsheet
is intended to contain styling specific to the standard. No scripts are supported in Word HTML.
The other elements of the Word HTML head are populated by the html2doc gem: a reference to a manifest of included files (specifically images and the header file), and settings to open the document in Print View at 100% magnification.
The body
of the Word HTML document is divided into the following parts:
-
A title section (
<div class="WordSection1">
), comprising identifying information about the document, such as appears in a title page in print.-
The section is populated with an HTML template (the
:wordcoverpage
parameter ofWordConvert.new()
). As with HTML, the information in this section is sourced from document metadata, rather than document content proper; and the gem uses Liquid Template to populate the HTML template.
-
-
A prefatory section (
<div class="WordSection2">
), comprising predefined text which does not come from document content proper (such as a Table of Contents shell), as well as prefatory material from the document content. The prefatory section is set in the CSS stylesheet to have Roman numerals for its pagination.-
Because of the requirement for Roman numerals, prefatory material from the document is sent to this section, whereas all document content in the HTML document is sent to the main section.
-
-
The main section of the document (
<div class="WordSection3">
), which is populated with the remaining document content. The main section is set in the CSS stylesheet to have Arabic numerals for its pagination. -
Optionally, a colophon (
<div class="colophon">
), which is populated with predefined text and/or document metadata. (Currently colophons in Metanorma gems appear only in Word output.)
Body markup
With the exception of the top-level document sections, discussed above, the Word HTML generated by the gem use the same CSS classes as the HTML proper. As already noted, the quirks of Word HTML CSS mean that classes need to be repeated on descendant elements that are not required in normal CSS.
The handling of footnotes and comments in Word HTML uses idiosyncratic Word HTML markup, including custom CSS, and is generated separately from their HTML counterparts in the gems.