Petter Reinholdtsen

Typesetting a short story using docbook for PDF, HTML and EPUB
24th March 2013

A few days ago, during a discussion in EFN about interesting books to read about copyright and the data retention directive, someone suggested reading the 1968 short story Kodémus by Tore Åge Bringsværd. The text was only available in old paper books, and thus not easily available for current and future generations. Some of the people participating in the discussion contacted the author, and reported back on 2013-03-19 that the author was OK with releasing the short story using a Creative Commons license. The text was quickly scanned and OCR-ed, and we were ready to start on the editing and typesetting.

As I already had some experience formatting text in my project to provide a Norwegian version of the Free Culture book by Lawrence Lessig, I chipped in and set up a DocBook processing framework to generate PDF, HTML and EPUB versions of the short story. The tools to transform DocBook to different formats are already in my Linux distribution of choice, Debian, so all I had to do was to use the dblatex, dbtoepub and xmlto tools to do the conversion. After a few days, we decided to replace dblatex with xsltproc/fop (aka docbook-xsl), to get the copyright information to show up in the PDF and to get a nicer <variablelist> typesetting, but that is just a minor technical detail.

There were a few challenges, of course. We want to typeset the short story to look like the original, and that requires fairly good control over the layout. The original short story has three parts/scenes separated by a single horizontally centred star (*), and the paragraphs do not contain only flowing text, but also dialogue and text that starts on a new line in the middle of the paragraph.

I initially solved the first challenge by using a paragraph with a single star in it, i.e. <para>*</para>, which at least made sure a placeholder indicated where the scene shifted. This did not look too good without the centring. The next approach was to create a new processing instruction <?newscene?>, mapping to "<hr/>" for HTML and "<fo:block text-align="center"><fo:leader leader-pattern="rule" rule-thickness="0.5pt"/></fo:block>" for FO/PDF output (I did not try to implement this in dblatex, as we had switched by this time). The HTML XSL file looked like this:

<?xml version='1.0'?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'>
  <xsl:template match="processing-instruction('newscene')">
    <hr/>
  </xsl:template>
</xsl:stylesheet> 

And the FO/PDF XSL file looked like this:

<?xml version='1.0'?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'
  xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:template match="processing-instruction('newscene')">
    <fo:block text-align="center">
      <fo:leader leader-pattern="rule" rule-thickness="0.5pt"/>
    </fo:block>
  </xsl:template>
</xsl:stylesheet> 
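
For illustration, this is roughly how the processing instruction would sit in the DocBook source. It is only a sketch: the paragraph texts are placeholders, not taken from the short story.

```xml
<!-- Sketch of a DocBook fragment; the paragraph contents are placeholders. -->
<para>Last paragraph of the first scene.</para>
<!-- Scene shift: rendered as <hr/> in HTML and as a centred rule in FO/PDF. -->
<?newscene?>
<para>First paragraph of the second scene.</para>
```

As the processing instruction is placed between two paragraphs, the templates above emit block level markup in both output formats.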

Finally, I came across the <bridgehead> tag, which seems to be a good fit for the task at hand, and I replaced <?newscene?> with <bridgehead>*</bridgehead>. It isn't centred, but we can fix it with some XSL rule if the current visual layout isn't enough.
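
A minimal sketch of what such an XSL rule could look like for the FO/PDF output is shown here. It is not something we have in the repository. It assumes the template is used from a customization layer on top of the stock docbook-xsl FO stylesheet and that the source uses DocBook without a namespace prefix. The spacing values are arbitrary.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'
  xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Assumption: overrides the stock bridgehead template for every
       bridgehead in the document, not only the scene separators. -->
  <xsl:template match="bridgehead">
    <!-- Centre the content; the spacing values are arbitrary choices. -->
    <fo:block text-align="center" space-before="1em" space-after="1em">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>
</xsl:stylesheet>
```

Note that this replaces the heading style the stock stylesheet would otherwise apply to <bridgehead>, so the font size and weight will follow the surrounding text unless set explicitly.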

I did not find a good DocBook compliant way to solve the linebreak/paragraph challenge, so I ended up creating a new processing instruction <?linebreak?>, mapping to <br/> in HTML, and <fo:block/> in FO/PDF. I suspect there are better ways to do this, and welcome ideas and patches on github. The HTML XSL file now looks like this:

<?xml version='1.0'?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'>
  <xsl:template match="processing-instruction('linebreak')">
    <br/>
  </xsl:template>
</xsl:stylesheet> 

And the FO/PDF XSL file looks like this:

<?xml version='1.0'?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'
  xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:template match="processing-instruction('linebreak')">
    <fo:block/>
  </xsl:template>
</xsl:stylesheet> 
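
In the DocBook source, the processing instruction goes inside the paragraph at the point where the new line should start. A small sketch, with placeholder text instead of text from the short story:

```xml
<!-- Sketch of a DocBook fragment; the text is a placeholder. -->
<para>The first line of the paragraph.<?linebreak?>
A line that starts on a new line, but still belongs to the same paragraph.</para>
```

In the FO output the empty <fo:block/> ends the current line, and in HTML the <br/> does the same.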

One unsolved challenge is our wish to expose different ISBN numbers per publication format, while keeping all of them in some conditional structure in the DocBook source. No idea how to do this, so we ended up listing all the ISBN numbers next to their format in the colophon page.
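
One possible approach, which we have not tried, is the profiling support in docbook-xsl. The idea is to mark each ISBN with the common DocBook attribute condition, and let the profiling stylesheet keep only the elements matching the requested value. The sketch below assumes <biblioid> is used for the ISBNs. The condition values and the ISBN texts are placeholders.

```xml
<!-- Sketch of a DocBook fragment; ISBN values and condition names are
     placeholders, not the real ones. -->
<biblioid class="isbn" condition="pdf">ISBN-FOR-PDF-EDITION</biblioid>
<biblioid class="isbn" condition="epub">ISBN-FOR-EPUB-EDITION</biblioid>
<biblioid class="isbn" condition="html">ISBN-FOR-HTML-EDITION</biblioid>
```

With xsltproc, the selection would be done by using the profile-docbook.xsl variant of the stylesheet and setting the profile.condition parameter to the wanted value, for example pdf. Whether dbtoepub and xmlto can be told to do the same is something I have not checked.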

If you want to see the finished result, check out the source repository at github (future/new/official repository). We expect it to be ready and announced in a few days.

Tags: docbook, english, freeculture, opphavsrett.
