Petter Reinholdtsen

Entries from March 2017.

Free software archive system Nikita now able to store documents
19th March 2017

The Nikita Noark 5 core project is implementing the Norwegian standard for keeping an electronic archive of government documents. The Noark 5 standard document the requirement for data systems used by the archives in the Norwegian government, and the Noark 5 web interface specification document a REST web service for storing, searching and retrieving documents and metadata in such archive. I've been involved in the project since a few weeks before Christmas, when the Norwegian Unix User Group announced it supported the project. I believe this is an important project, and hope it can make it possible for the government archives in the future to use free software to keep the archives we citizens depend on. But as I do not hold such archive myself, personally my first use case is to store and analyse public mail journal metadata published from the government. I find it useful to have a clear use case in mind when developing, to make sure the system scratches one of my itches.

If you would like to help make sure there is a free software alternatives for the archives, please join our IRC channel (#nikita on and the project mailing list.

When I got involved, the web service could store metadata about documents. But a few weeks ago, a new milestone was reached when it became possible to store full text documents too. Yesterday, I completed an implementation of a command line tool archive-pdf to upload a PDF file to the archive using this API. The tool is very simple at the moment, and find existing fonds, series and files while asking the user to select which one to use if more than one exist. Once a file is identified, the PDF is associated with the file and uploaded, using the title extracted from the PDF itself. The process is fairly similar to visiting the archive, opening a cabinet, locating a file and storing a piece of paper in the archive. Here is a test run directly after populating the database with test data using our API tester:

~/src//noark5-tester$ ./archive-pdf mangelmelding/mangler.pdf
using arkiv: Title of the test fonds created 2017-03-18T23:49:32.103446
using arkivdel: Title of the test series created 2017-03-18T23:49:32.103446

 0 - Title of the test case file created 2017-03-18T23:49:32.103446
 1 - Title of the test file created 2017-03-18T23:49:32.103446
Select which mappe you want (or search term): 0
Uploading mangelmelding/mangler.pdf
  PDF title: Mangler i spesifikasjonsdokumentet for NOARK 5 Tjenestegrensesnitt
  File 2017/1: Title of the test case file created 2017-03-18T23:49:32.103446

You can see here how the fonds (arkiv) and serie (arkivdel) only had one option, while the user need to choose which file (mappe) to use among the two created by the API tester. The archive-pdf tool can be found in the git repository for the API tester.

In the project, I have been mostly working on the API tester so far, while getting to know the code base. The API tester currently use the HATEOAS links to traverse the entire exposed service API and verify that the exposed operations and objects match the specification, as well as trying to create objects holding metadata and uploading a simple XML file to store. The tester has proved very useful for finding flaws in our implementation, as well as flaws in the reference site and the specification.

The test document I uploaded is a summary of all the specification defects we have collected so far while implementing the web service. There are several unclear and conflicting parts of the specification, and we have started writing down the questions we get from implementing it. We use a format inspired by how The Austin Group collect defect reports for the POSIX standard with their instructions for the MANTIS defect tracker system, in lack of an official way to structure defect reports for Noark 5 (our first submitted defect report was a request for a procedure for submitting defect reports :).

The Nikita project is implemented using Java and Spring, and is fairly easy to get up and running using Docker containers for those that want to test the current code base. The API tester is implemented in Python.

Tags: english, nuug, offentlig innsyn, standard.
Detecting NFS hangs on Linux without hanging yourself...
9th March 2017

Over the years, administrating thousand of NFS mounting linux computers at the time, I often needed a way to detect if the machine was experiencing NFS hang. If you try to use df or look at a file or directory affected by the hang, the process (and possibly the shell) will hang too. So you want to be able to detect this without risking the detection process getting stuck too. It has not been obvious how to do this. When the hang has lasted a while, it is possible to find messages like these in dmesg:

nfs: server nfsserver not responding, still trying
nfs: server nfsserver OK

It is hard to know if the hang is still going on, and it is hard to be sure looking in dmesg is going to work. If there are lots of other messages in dmesg the lines might have rotated out of site before they are noticed.

While reading through the nfs client implementation in linux kernel code, I came across some statistics that seem to give a way to detect it. The om_timeouts sunrpc value in the kernel will increase every time the above log entry is inserted into dmesg. And after digging a bit further, I discovered that this value show up in /proc/self/mountstats on Linux.

The mountstats content seem to be shared between files using the same file system context, so it is enough to check one of the mountstats files to get the state of the mount point for the machine. I assume this will not show lazy umounted NFS points, nor NFS mount points in a different process context (ie with a different filesystem view), but that does not worry me.

The content for a NFS mount point look similar to this:

device /dev/mapper/Debian-var mounted on /var with fstype ext3
device nfsserver:/mnt/nfsserver/home0 mounted on /mnt/nfsserver/home0 with fstype nfs statvers=1.1
        opts:   rw,vers=3,rsize=65536,wsize=65536,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,soft,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=,mountvers=3,mountport=4048,mountproto=udp,local_lock=all
        age:    7863311
        caps:   caps=0x3fe7,wtmult=4096,dtsize=8192,bsize=0,namlen=255
        sec:    flavor=1,pseudoflavor=1
        events: 61063112 732346265 1028140 35486205 16220064 8162542 761447191 71714012 37189 3891185 45561809 110486139 4850138 420353 15449177 296502 52736725 13523379 0 52182 9016896 1231 0 0 0 0 0 
        bytes:  166253035039 219519120027 0 0 40783504807 185466229638 11677877 45561809 
        RPC iostats version: 1.0  p/v: 100003/3 (nfs)
        xprt:   tcp 925 1 6810 0 0 111505412 111480497 109 2672418560317 0 248 53869103 22481820
        per-op statistics
                NULL: 0 0 0 0 0 0 0 0
             GETATTR: 61063106 61063108 0 9621383060 6839064400 453650 77291321 78926132
             SETATTR: 463469 463470 0 92005440 66739536 63787 603235 687943
              LOOKUP: 17021657 17021657 0 3354097764 4013442928 57216 35125459 35566511
              ACCESS: 14281703 14290009 5 2318400592 1713803640 1709282 4865144 7130140
            READLINK: 125 125 0 20472 18620 0 1112 1118
                READ: 4214236 4214237 0 715608524 41328653212 89884 22622768 22806693
               WRITE: 8479010 8494376 22 187695798568 1356087148 178264904 51506907 231671771
              CREATE: 171708 171708 0 38084748 46702272 873 1041833 1050398
               MKDIR: 3680 3680 0 773980 993920 26 23990 24245
             SYMLINK: 903 903 0 233428 245488 6 5865 5917
               MKNOD: 80 80 0 20148 21760 0 299 304
              REMOVE: 429921 429921 0 79796004 61908192 3313 2710416 2741636
               RMDIR: 3367 3367 0 645112 484848 22 5782 6002
              RENAME: 466201 466201 0 130026184 121212260 7075 5935207 5961288
                LINK: 289155 289155 0 72775556 67083960 2199 2565060 2585579
             READDIR: 2933237 2933237 0 516506204 13973833412 10385 3190199 3297917
         READDIRPLUS: 1652839 1652839 0 298640972 6895997744 84735 14307895 14448937
              FSSTAT: 6144 6144 0 1010516 1032192 51 9654 10022
              FSINFO: 2 2 0 232 328 0 1 1
            PATHCONF: 1 1 0 116 140 0 0 0
              COMMIT: 0 0 0 0 0 0 0 0

device binfmt_misc mounted on /proc/sys/fs/binfmt_misc with fstype binfmt_misc

The key number to look at is the third number in the per-op list. It is the number of NFS timeouts experiences per file system operation. Here 22 write timeouts and 5 access timeouts. If these numbers are increasing, I believe the machine is experiencing NFS hang. Unfortunately the timeout value do not start to increase right away. The NFS operations need to time out first, and this can take a while. The exact timeout value depend on the setup. For example the defaults for TCP and UDP mount points are quite different, and the timeout value is affected by the soft, hard, timeo and retrans NFS mount options.

The only way I have been able to get working on Debian and RedHat Enterprise Linux for getting the timeout count is to peek in /proc/. But according to Solaris 10 System Administration Guide: Network Services, the 'nfsstat -c' command can be used to get these timeout values. But this do not work on Linux, as far as I can tell. I asked Debian about this, but have not seen any replies yet.

Is there a better way to figure out if a Linux NFS client is experiencing NFS hangs? Is there a way to detect which processes are affected? Is there a way to get the NFS mount going quickly once the network problem causing the NFS hang has been cleared? I would very much welcome some clues, as we regularly run into NFS hangs.

Tags: debian, english, sysadmin.
How does it feel to be wiretapped, when you should be doing the wiretapping...
8th March 2017

So the new president in the United States of America claim to be surprised to discover that he was wiretapped during the election before he was elected president. He even claim this must be illegal. Well, doh, if it is one thing the confirmations from Snowden documented, it is that the entire population in USA is wiretapped, one way or another. Of course the president candidates were wiretapped, alongside the senators, judges and the rest of the people in USA.

Next, the Federal Bureau of Investigation ask the Department of Justice to go public rejecting the claims that Donald Trump was wiretapped illegally. I fail to see the relevance, given that I am sure the surveillance industry in USA believe they have all the legal backing they need to conduct mass surveillance on the entire world.

There is even the director of the FBI stating that he never saw an order requesting wiretapping of Donald Trump. That is not very surprising, given how the FISA court work, with all its activity being secret. Perhaps he only heard about it?

What I find most sad in this story is how Norwegian journalists present it. In a news reports the other day in the radio from the Norwegian National broadcasting Company (NRK), I heard the journalist claim that 'the FBI denies any wiretapping', while the reality is that 'the FBI denies any illegal wiretapping'. There is a fundamental and important difference, and it make me sad that the journalists are unable to grasp it.

Update 2017-03-13: Look like The Intercept report that US Senator Rand Paul confirm what I state above.

Tags: english, surveillance.
Norwegian Bokmål translation of The Debian Administrator's Handbook complete, proofreading in progress
3rd March 2017

For almost a year now, we have been working on making a Norwegian Bokmål edition of The Debian Administrator's Handbook. Now, thanks to the tireless effort of Ole-Erik, Ingrid and Andreas, the initial translation is complete, and we are working on the proof reading to ensure consistent language and use of correct computer science terms. The plan is to make the book available on paper, as well as in electronic form. For that to happen, the proof reading must be completed and all the figures need to be translated. If you want to help out, get in touch.

A fresh PDF edition in A4 format (the final book will have smaller pages) of the book created every morning is available for proofreading. If you find any errors, please visit Weblate and correct the error. The state of the translation including figures is a useful source for those provide Norwegian bokmål screen shots and figures.

Tags: debian, debian-handbook, english.
Unlimited randomness with the ChaosKey?
1st March 2017

A few days ago I ordered a small batch of the ChaosKey, a small USB dongle for generating entropy created by Bdale Garbee and Keith Packard. Yesterday it arrived, and I am very happy to report that it work great! According to its designers, to get it to work out of the box, you need the Linux kernel version 4.1 or later. I tested on a Debian Stretch machine (kernel version 4.9), and there it worked just fine, increasing the available entropy very quickly. I wrote a small test oneliner to test. It first print the current entropy level, drain /dev/random, and then print the entropy level for five seconds. Here is the situation without the ChaosKey inserted:

% cat /proc/sys/kernel/random/entropy_avail; \
  dd bs=1M if=/dev/random of=/dev/null count=1; \
  for n in $(seq 1 5); do \
     cat /proc/sys/kernel/random/entropy_avail; \
     sleep 1; \
0+1 oppføringer inn
0+1 oppføringer ut
28 byte kopiert, 0,000264565 s, 106 kB/s

The entropy level increases by 3-4 every second. In such case any application requiring random bits (like a HTTPS enabled web server) will halt and wait for more entrpy. And here is the situation with the ChaosKey inserted:

% cat /proc/sys/kernel/random/entropy_avail; \
  dd bs=1M if=/dev/random of=/dev/null count=1; \
  for n in $(seq 1 5); do \
     cat /proc/sys/kernel/random/entropy_avail; \
     sleep 1; \
0+1 oppføringer inn
0+1 oppføringer ut
104 byte kopiert, 0,000487647 s, 213 kB/s

Quite the difference. :) I bought a few more than I need, in case someone want to buy one here in Norway. :)

Update: The dongle was presented at Debconf last year. You might find the talk recording illuminating. It explains exactly what the source of randomness is, if you are unable to spot it from the schema drawing available from the ChaosKey web site linked at the start of this blog post.

Tags: debian, english.

RSS Feed

Created by Chronicle v4.6