Petter Reinholdtsen

Entries from June 2014.

From English wiki to translated PDF and epub via Docbook
17th June 2014

The Debian Edu / Skolelinux project provides an instruction manual for teachers, system administrators and other users that contains useful tips for setting up and maintaining a Debian Edu installation. This text is about how the text processing of this manual is handled in the project.

One goal of the project is to provide information in the native language of its users, and for this we need to handle translations. But we also want to make sure each language contains the same information, so we need a good way to keep the translations in sync. And we want it to be easy for our users to improve the documentation, avoiding the need to learn special formats or tools to contribute, and the obvious way to do this is to make it possible to edit the documentation using a web browser. We also want it to be easy for translators to keep the translation up to date, and give them help in figuring out what needs to be translated. Here is the list of tools and the process we have found trying to reach all these goals.

We maintain the authoritative source of our manual in the Debian wiki, as several wiki pages written in English. It consists of one front page with references to the different chapters, several pages for each chapter, and finally one "collection page" gluing all the chapters together into one large web page (aka the AllInOne page). The AllInOne page is the one used for further processing and translations. Thanks to the fact that the MoinMoin installation on wiki.debian.org supports exporting pages in the Docbook format, we can fetch the list of pages to export using the raw version of the AllInOne page, and loop over each of them to generate a Docbook XML version of the manual. This process also downloads images and transforms image references to use the locally downloaded images. The generated Docbook XML files are slightly broken, so some post-processing is done using the documentation/scripts/get_manual program, and the result is a nice Docbook XML file (debian-edu-wheezy-manual.xml) and a handful of images. The XML file can now be used to generate PDF, HTML and epub versions of the English manual. This is the basic step of our process, making PDF (using dblatex), HTML (using xsltproc) and epub (using dbtoepub) versions from Docbook XML, and the resulting files are placed in the debian-edu-doc-en binary package.
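The first step, finding the pages to export from the raw AllInOne page, can be sketched roughly like this. It is only an illustration and not the code in get_manual: it assumes the AllInOne page pulls in its chapters with MoinMoin Include macros, the sample page names are made up, and the export query string in the usage part is an assumption that should be checked against the real script.

```python
import re
from urllib.parse import quote

# MoinMoin Include macros look like <<Include(Some/Page)>> in current syntax
# or [[Include(Some/Page)]] in the older syntax; both forms are matched.
INCLUDE_RE = re.compile(r"(?:<<|\[\[)Include\(([^,)]+)[^)]*\)(?:>>|\]\])")


def included_pages(raw_wiki_text):
    """Return the page names named in Include macros, in document order."""
    return [name.strip() for name in INCLUDE_RE.findall(raw_wiki_text)]


def export_urls(base_url, pages, export_query):
    """Build one URL per page; the query string is supplied by the caller."""
    return [f"{base_url}/{quote(page)}?{export_query}" for page in pages]


if __name__ == "__main__":
    # Illustrative raw wiki text; the page names are hypothetical.
    sample = (
        "<<Include(DebianEdu/Documentation/Wheezy/About)>>\n"
        "Some text between the includes\n"
        "<<Include(DebianEdu/Documentation/Wheezy/Architecture)>>\n"
    )
    pages = included_pages(sample)
    # Assumed query string for Docbook export; verify before relying on it.
    for url in export_urls("https://wiki.debian.org", pages,
                           "action=show&mimetype=text/docbook"):
        print(url)
```

Each resulting URL would then be fetched and post-processed in turn. The sketch deliberately stops before any network access, so it can be tried without touching the wiki.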

But English documentation is not enough for us. We want translated documentation too, and we want to make it easy for translators to track the English original. For this we use the poxml package, which allows us to transform the English Docbook XML file into a translation file (a .pot file), usable with the normal gettext based translation tools used by those translating free software. The pot file is used to create and maintain translation files (several .po files), which the translators update with the native language translations of all titles, paragraphs and blocks of text in the original. The next step is combining the original English Docbook XML and the translation file (say debian-edu-wheezy-manual.nb.po), to create a translated Docbook XML file (in this case debian-edu-wheezy-manual.nb.xml). This translated (or partly translated, if the translation is not complete) Docbook XML file can then be used like the original to create a PDF, HTML and epub version of the documentation.
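The translation round trip can be sketched as three commands. This is a minimal illustration, not the project's build rules: xml2pot and po2xml come from the poxml package and write to standard output, while the use of msgmerge from gettext to refresh an existing .po file, and the .pot file name, are assumptions on my part.

```python
import subprocess


def xml2pot_cmd(english_xml):
    """Command extracting translatable strings; the .pot goes to stdout."""
    return ["xml2pot", english_xml]


def msgmerge_cmd(po_file, pot_file):
    """Command refreshing an existing .po file in place (assumed step)."""
    return ["msgmerge", "--update", po_file, pot_file]


def po2xml_cmd(english_xml, po_file):
    """Command merging a translation into the original; XML goes to stdout."""
    return ["po2xml", english_xml, po_file]


def run_to_file(cmd, out_path):
    """Run a command and store what it writes to stdout in out_path."""
    with open(out_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)


if __name__ == "__main__":
    xml = "debian-edu-wheezy-manual.xml"
    pot = "debian-edu-wheezy-manual.pot"  # assumed file name
    po = "debian-edu-wheezy-manual.nb.po"
    # Only print the commands here; use run_to_file() to actually run
    # the two that write their result to standard output.
    print(" ".join(xml2pot_cmd(xml)), ">", pot)
    print(" ".join(msgmerge_cmd(po, pot)))
    print(" ".join(po2xml_cmd(xml, po)), ">", "debian-edu-wheezy-manual.nb.xml")
```

Keeping the command construction separate from the execution makes it easy to inspect what would be run for each language before touching any files.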

The translators use different tools to edit the .po files. We recommend using lokalize, while some use emacs and vi, and others can use web based editors like Pootle or Transifex. All we care about is where the .po file ends up: in our git repository. Updated translations can either be committed directly to git, or submitted as bug reports against the debian-edu-doc package.

One challenge is images, which both might need to be translated (if they show translated user applications) and are needed in different formats when creating PDF and HTML versions (epub is an HTML version in this regard). For this we transform the original PNG images to the needed density and format during build, and have a way to provide translated images by storing translated versions in images/$LANGUAGECODE/. I am a bit unsure about the details here. The package maintainers know more.
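One way such a lookup could work is sketched below. This is a guess at the idea and not the actual build logic: it assumes the untranslated originals sit directly in images/ and that a translated image has the same file name under images/$LANGUAGECODE/. The function name pick_image is hypothetical.

```python
from pathlib import Path


def pick_image(images_dir, language_code, filename):
    """Prefer images/<language_code>/<filename>, else fall back to images/<filename>."""
    translated = Path(images_dir) / language_code / filename
    if translated.is_file():
        return translated
    # Assumed layout: the untranslated original lives directly in images_dir.
    return Path(images_dir) / filename


if __name__ == "__main__":
    # Prints the translated path if it exists, otherwise the original path.
    print(pick_image("images", "nb", "example.png"))
```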

If you wonder what the result looks like, we provide the content of the documentation packages on the web. See for example the Italian PDF version or the German HTML version. We do not yet build the epub version by default, but perhaps it will be done in the future.

To learn more, check out the debian-edu-doc package, the manual on the wiki and the translation instructions in the manual.

Tags: debian, debian edu, docbook, english.

How to easily download films from NRK with the "new" solution
16th June 2014

I still need to be able to download clips from NRK's website now and then, to watch later when I am offline, but my recipe from 2011 stopped working when NRK changed its playback method. Today I finally got around to looking for an updated solution, and I am very happy to report that the easiest way to download clips is to use the latest version, 2014.06.07, of youtube-dl. The support in youtube-dl arrived 23 days ago, and the version in Debian also works fine as a backport to Debian Wheezy. There is one small problem: it only handles URLs written in lower case. If you have a URL with upper case letters, you can simply convert them all to lower case to get youtube-dl to download it. I just reported the problem to the developers, and assume they will fix it soon.
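The lower case workaround is simple enough to sketch in a few lines. This assumes youtube-dl is installed and on the PATH, and the URL in the usage part is a made-up placeholder, not a real programme address.

```python
import subprocess


def lowercase_url(url):
    """Return the URL with every upper case letter converted to lower case."""
    return url.lower()


def download(url):
    """Hand the lower cased URL to youtube-dl, failing loudly on errors."""
    subprocess.run(["youtube-dl", lowercase_url(url)], check=True)


if __name__ == "__main__":
    # Hypothetical URL, used only to show the conversion.
    print(lowercase_url("http://tv.nrk.no/Program/EXAMPLE123"))
```

Calling download() with a real URL would start the actual download; the usage part above only shows the conversion.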

With that, everything is ready for downloading the documentaries "USAs hemmelige avlytting" (on the USA's secret surveillance) and "Selskapene bak USAs avlytting" (on the companies behind that surveillance), in addition to the interview with Edward Snowden done by the German TV channel ARD. I recommend that everyone watch these, together with Jacob Appelbaum's talk at the latest CCC conference, to understand more about how the surveillance of citizens is spreading.

Thanks to good friends on the NUUG association's IRC channel #nuug on irc.freenode.net for the tips that got me there.

Update 2014-06-17: After I published this, I was tipped about the blog post "Downloading HD content from tv.nrk.no" by Ingvar Hagelund, which has an alternative implementation and tips for creating an mkv file with the subtitles included. Perhaps it suits you better? In addition, the bug in youtube-dl was fixed later in the day yesterday, and youtube-dl also gained support for downloading subtitles. Thanks to Anders Einar Hilden for a great effort and to the youtube-dl developers for the quick response.

Tags: multimedia, norsk, video, web.
