A few days ago, during a discussion in
EFN about interesting books to read
about copyright and the data retention directive, a suggestion to read
the 1968 short story Kodémus by
Tore Åge Bringsværd
came up. The text was only available in old paper books, and thus not
easily available for current and future generations. Some of the
people participating in the discussion contacted the author, and
reported back 2013-03-19 that the author was OK with releasing the
short story using a Creative
Commons license. The text was quickly scanned and OCR-ed, and we
were ready to start on the editing and typesetting.
As I already had some experience formatting text in my project to
provide a Norwegian version of the Free Culture book by Lawrence
Lessig, I chipped in and set up a
DocBook processing framework to
generate PDF, HTML and EPUB versions of the short story. The tools to
transform DocBook to different formats are already in my Linux
distribution of choice, Debian, so
all I had to do was to use the
dblatex,
dbtoepub
and xmlto tools to do the
conversion. After a few days, we decided to replace dblatex with
xsltproc/fop (aka
docbook-xsl),
to get the copyright information to show up in the PDF and to get a
nicer <variablelist> typesetting, but that is just a minor
technical detail.
There were a few challenges, of course. We wanted to typeset the
short story to look like the original, and that requires fairly good
control over the layout. The original short story has three
parts/scenes separated by a single horizontally centred star (*), and
the paragraphs do not contain only flowing text, but also dialogue and
text that starts on a new line in the middle of the paragraph.
I initially solved the first challenge by using a paragraph with a
single star in it, i.e. <para>*</para>, which at least made sure a
placeholder indicated where the scene shifted. This did not look too
good without the centring. The next approach was to create a new
processing instruction <?newscene?>, mapping to "<hr/>"
for HTML and "<fo:block text-align="center"><fo:leader
leader-pattern="rule" rule-thickness="0.5pt"/></fo:block>"
for FO/PDF output (I did not try to implement this in dblatex, as we
had switched by this time). The HTML XSL file looked like this:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'>
<xsl:template match="processing-instruction('newscene')">
<hr/>
</xsl:template>
</xsl:stylesheet>
And the FO/PDF XSL file looked like this:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'
xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:template match="processing-instruction('newscene')">
<fo:block text-align="center">
<fo:leader leader-pattern="rule" rule-thickness="0.5pt"/>
</fo:block>
</xsl:template>
</xsl:stylesheet>
Finally, I came across the <bridgehead> tag, which seems to be
a good fit for the task at hand, and I replaced <?newscene?>
with <bridgehead>*</bridgehead>. It isn't centred, but we
can fix that with an XSL rule if the current visual layout isn't
good enough.
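As a rough, untested sketch of what such an XSL rule could look like for the FO/PDF output: the template below overrides the stock bridgehead formatting wholesale and centres the content. It assumes the non-namespaced DocBook XSL stylesheets (a namespaced DocBook 5 setup would need to match d:bridgehead instead), and the spacing values are arbitrary.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'
                xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Sketch only: render every bridgehead as a centred block, replacing
       the stock heading formatting. Spacing values are illustrative. -->
  <xsl:template match="bridgehead">
    <fo:block text-align="center" space-before="1em" space-after="1em"
              keep-with-next.within-column="always">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>
</xsl:stylesheet>
```

For the HTML output, the equivalent would presumably be a template emitting a centred element (for instance via a CSS class) around the same apply-templates call.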
I did not find a good DocBook compliant way to solve the
linebreak/paragraph challenge, so I ended up creating a new processing
instruction <?linebreak?>, mapping to <br/> in HTML, and
<fo:block/> in FO/PDF. I suspect there are better ways to do
this, and welcome ideas and patches on github. The HTML XSL file now
looks like this:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'>
<xsl:template match="processing-instruction('linebreak')">
<br/>
</xsl:template>
</xsl:stylesheet>
And the FO/PDF XSL file looks like this:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'
xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:template match="processing-instruction('linebreak')">
<fo:block/>
</xsl:template>
</xsl:stylesheet>
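For illustration, a paragraph in the DocBook source using this processing instruction could look something like the following. The text is placeholder text, not taken from the short story.

```xml
<para>
  Placeholder text for the flowing part of the paragraph.<?linebreak?>
  Placeholder text that should start on a new line within the same paragraph.
</para>
```

With the templates above, the processing instruction becomes a <br/> in the HTML output and an empty <fo:block/> in the FO output, while the surrounding <para> stays a single paragraph in the source.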
One unsolved challenge is our wish to expose different ISBN numbers
per publication format, while keeping all of them in some conditional
structure in the DocBook source. No idea how to do this, so we ended
up listing all the ISBN numbers next to their format in the colophon
page.
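One avenue that might be worth exploring is the profiling (conditional text) support in the DocBook XSL stylesheets, where elements are tagged with an attribute such as condition and filtered using the profile.condition parameter when the document is processed with the profile-docbook.xsl variant of the stylesheets. A minimal, untested sketch follows. The condition values pdf, html and epub are names made up for illustration, and the ISBN values are placeholders.

```xml
<!-- Inside the info element of the document. Only the entry whose
     condition matches profile.condition survives profiling.
     The ISBN values are placeholders, not real numbers. -->
<biblioid class="isbn" condition="pdf">ISBN-FOR-PDF-EDITION</biblioid>
<biblioid class="isbn" condition="html">ISBN-FOR-HTML-EDITION</biblioid>
<biblioid class="isbn" condition="epub">ISBN-FOR-EPUB-EDITION</biblioid>
```

The PDF build would then pass something like --stringparam profile.condition pdf to xsltproc and use fo/profile-docbook.xsl in place of fo/docbook.xsl. Whether the EPUB tool chain can be made to do the same would need checking.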
If you want to see the finished result, check out the
source repository at
github
(future/new/official
repository). We expect it to be ready and announced in a few
days.