Petter Reinholdtsen

How to figure out which RAID disk to replace when it fails
14th February 2012

Once in a while my home server has disk problems. Thanks to Linux Software RAID, I have not lost data yet (but I was close this summer :). But once a disk starts to behave funny, a practical problem presents itself: how do I get from the Linux device name (like /dev/sdd) to something that can be used to identify the disk when the computer is turned off? In my case I have SATA disks with a unique ID printed on the label. All I need is a way to query the disk to get the ID out.

After fumbling a bit, I found that hdparm -I will report the disk serial number, which is printed on the disk label. The following (almost) one-liner can be used to look up the ID of all the failed disks:

for d in $(cat /proc/mdstat |grep '(F)'|tr ' ' "\n"|grep '(F)'|cut -d\[ -f1|sort -u); do
    printf 'Failed disk %s: ' "$d"
    hdparm -I "/dev/$d" |grep 'Serial Num'
done

Putting it here to make sure I do not have to search for it the next time, and in case others find it useful.

At the moment I have two failing disks. :(

Failed disk sdd1:       Serial Number:      WD-WCASJ1860823
Failed disk sdd2:       Serial Number:      WD-WCASJ1860823
Failed disk sde2:       Serial Number:      WD-WCASJ1840589
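
As the output shows, several failed partitions can sit on the same physical disk, so the same serial number is printed more than once. A minimal sketch of a variant that reports each whole disk only once could look like this. The function names failed_disks and report_failed_serials are my own invention, and stripping the trailing digits assumes sdX-style device names (it will not do the right thing for names like nvme0n1p1).

```shell
#!/bin/sh
# List whole disks with at least one RAID member marked (F).
# Usage: failed_disks [FILE]   (FILE defaults to /proc/mdstat)
# Assumption: partitions are named like sdd1, so removing the
# trailing digits yields the disk name (sdd).
failed_disks() {
    tr ' ' '\n' < "${1:-/proc/mdstat}" |
        grep '(F)' |
        cut -d'[' -f1 |
        sed 's/[0-9]*$//' |
        sort -u
}

# Print the serial number of every failed disk, once per disk.
# Needs hdparm and usually root access.
report_failed_serials() {
    for d in $(failed_disks "$@"); do
        printf 'Failed disk %s: ' "$d"
        hdparm -I "/dev/$d" | grep 'Serial Num'
    done
}
```

Calling report_failed_serials without arguments reads /proc/mdstat. Passing a file name instead makes it possible to try the parsing on a saved copy of mdstat.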

The last time I had failing disks, I added the serial number on labels I printed and stuck on the short sides of each disk, to be able to figure out which disk to take out of the box without having to remove each disk to look at the physical vendor label. The vendor label is at the top of the disk, which is hidden when the disks are mounted inside my box.

I really wish the check_linux_raid Nagios plugin for checking Linux Software RAID in the nagios-plugins-standard Debian package would look up this value automatically, as it would make the plugin a lot more useful when my disks fail. At the moment it only reports a failure when there are no more spares left (it really should warn as soon as a disk is failing), and it does not tell me which disks are failing when the RAID is running short on disks.
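
To illustrate the behaviour I would like, here is a rough sketch of a check that warns as soon as any RAID member is marked (F) and names the affected devices. This is not how check_linux_raid works. The function name check_md_failed is made up, and the only Nagios convention I rely on is that return code 0 means OK and 1 means WARNING. It only looks at the (F) markers, so an array that is degraded because a disk was already removed would not be noticed.

```shell
#!/bin/sh
# Warn when any RAID member is marked (F) in an mdstat file.
# Usage: check_md_failed [FILE]   (FILE defaults to /proc/mdstat)
# Returns 0 (OK) or 1 (WARNING), following Nagios plugin return codes.
check_md_failed() {
    # Collect the failed member names on one line, separated by spaces.
    failed=$(tr ' ' '\n' < "${1:-/proc/mdstat}" |
        grep '(F)' | cut -d'[' -f1 | sort -u | tr '\n' ' ')
    if [ -n "$failed" ]; then
        # ${failed% } drops the trailing space left by the last tr.
        echo "WARNING: failed RAID members: ${failed% }"
        return 1
    fi
    echo "OK: no failed RAID members"
    return 0
}
```

A real plugin would call this from a small wrapper script and exit with the returned status, and could feed each listed device to hdparm -I to include the serial number in the message.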

Tags: english, raid.
