Petter Reinholdtsen

From English wiki to translated PDF and epub via Docbook
17th June 2014

The Debian Edu / Skolelinux project provide an instruction manual for teachers, system administrators and other users that contain useful tips for setting up and maintaining a Debian Edu installation. This text is about how the text processing of this manual is handled in the project.

One goal of the project is to provide information in the native language of its users, and for this we need to handle translations. But we also want to make sure each language contain the same information, so for this we need a good way to keep the translations in sync. And we want it to be easy for our users to improve the documentation, avoiding the need to learn special formats or tools to contribute, and the obvious way to do this is to make it possible to edit the documentation using a web browser. We also want it to be easy for translators to keep the translation up to date, and give them help in figuring out what need to be translated. Here is the list of tools and the process we have found trying to reach all these goals.

We maintain the authoritative source of our manual in the Debian wiki, as several wiki pages written in English. It consist of one front page with references to the different chapters, several pages for each chapter, and finally one "collection page" gluing all the chapters together into one large web page (aka the AllInOne page). The AllInOne page is the one used for further processing and translations. Thanks to the fact that the MoinMoin installation on wiki.debian.org support exporting pages in the Docbook format, we can fetch the list of pages to export using the raw version of the AllInOne page, loop over each of them to generate a Docbook XML version of the manual. This process also download images and transform image references to use the locally downloaded images. The generated Docbook XML files are slightly broken, so some post-processing is done using the documentation/scripts/get_manual program, and the result is a nice Docbook XML file (debian-edu-wheezy-manual.xml) and a handfull of images. The XML file can now be used to generate PDF, HTML and epub versions of the English manual. This is the basic step of our process, making PDF (using dblatex), HTML (using xsltproc) and epub (using dbtoepub) version from Docbook XML, and the resulting files are placed in the debian-edu-doc-en binary package.

But English documentation is not enough for us. We want translated documentation too, and we want to make it easy for translators to track the English original. For this we use the poxml package, which allow us to transform the English Docbook XML file into a translation file (a .pot file), usable with the normal gettext based translation tools used by those translating free software. The pot file is used to create and maintain translation files (several .po files), which the translations update with the native language translations of all titles, paragraphs and blocks of text in the original. The next step is combining the original English Docbook XML and the translation file (say debian-edu-wheezy-manual.nb.po), to create a translated Docbook XML file (in this case debian-edu-wheezy-manual.nb.xml). This translated (or partly translated, if the translation is not complete) Docbook XML file can then be used like the original to create a PDF, HTML and epub version of the documentation.

The translators use different tools to edit the .po files. We recommend using lokalize, while some use emacs and vi, others can use web based editors like Poodle or Transifex. All we care about is where the .po file end up, in our git repository. Updated translations can either be committed directly to git, or submitted as bug reports against the debian-edu-doc package.

One challenge is images, which both might need to be translated (if they show translated user applications), and are needed in different formats when creating PDF and HTML versions (epub is a HTML version in this regard). For this we transform the original PNG images to the needed density and format during build, and have a way to provide translated images by storing translated versions in images/$LANGUAGECODE/. I am a bit unsure about the details here. The package maintainers know more.

If you wonder what the result look like, we provide the content of the documentation packages on the web. See for example the Italian PDF version or the German HTML version. We do not yet build the epub version by default, but perhaps it will be done in the future.

To learn more, check out the debian-edu-doc package, the manual on the wiki and the translation instructions in the manual.

Tags: debian, debian edu, docbook, english.

Created by Chronicle v4.6