Feeding bad blocks reported by the kernel into e2fsck -l

Intro

Today I once again backed up a machine that has served me well for years. As happens with old hardware, some hard drive sectors have apparently gone bad over time. I got these messages in syslog:

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=23181294, sector=23181191
ide: failed opcode was: unknown
end_request: I/O error, dev hda, sector 23181191
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=23181294, sector=23181199
ide: failed opcode was: unknown
end_request: I/O error, dev hda, sector 23181199

These sectors are “bad sectors” and they should be added to the bad blocks inode of the ext3 filesystem the machine is using. A quick Google search revealed tons of tutorials on how to run badblocks to find bad sectors, but I didn’t find one that tells you how to add the ones the kernel has already reported. So here it comes.

A little background on how ext3 sits on your hard drive

A very good introduction is http://www2.uic.edu/~aciani1/sector_blues.html. Below is my own humble explanation in case the page is gone.

Your hard drive is presented by the kernel as one long file. In my case that file is /dev/hda. Due to the way hard drives work, the unit you access is the sector, which is typically 512 bytes long. That means if you want to modify byte 1 of that file, you have to read the first sector (which contains bytes 0 to 511), modify byte 1, and then write the whole sector (bytes 0 to 511) back to the disk. Although only one byte changed, we read 512 bytes in and wrote 512 bytes out. That’s why a hard drive is called a block device: it reads and writes blocks of bytes. These blocks are called physical blocks.
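
As a quick illustration of that arithmetic, the sector holding a given byte is the byte offset divided by the sector size. This is a minimal sketch, assuming 512-byte sectors; the function name sector_of_byte is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the sector that contains a given byte offset.
# Assumes 512-byte sectors unless a second argument says otherwise.
sector_of_byte() {
    offset=$1
    sector_size=${2:-512}
    echo $(( offset / sector_size ))
}

sector_of_byte 1      # byte 1 lives in sector 0
sector_of_byte 511    # last byte of sector 0
sector_of_byte 512    # first byte of sector 1
```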

There are normally several partitions on a hard drive that further divide that file into smaller sections. fdisk and parted are two common Linux tools to view and manipulate partitions. Partitions can be subdivided further, a common example being the “extended partition” of MS-DOS, or they can contain filesystems. On the machine in question there are two partitions: the first (/dev/hda1) holds the root filesystem and the second (/dev/hda2) is the swap partition.

The filesystems themselves further subdivide their partition into logical blocks. That is, they group a fixed number of consecutive physical blocks into one logical block. ext2/3/4 is a so-called inode-based filesystem, which means a file is referred to simply by a number (its inode number) and not by a filename. Filenames are entries in directories that point to inodes. Some of the logical blocks store data about the filesystem itself, like the superblock. Other blocks store information on files or directories.

Generating a badblocks file for ext2/3/4

To mark the reported physical block as “bad” and thus prevent its usage, we have to go through these steps:

  1. Map the physical block of the hard-drive to a physical block of the partition
  2. Map the physical block of the partition to a logical block of the partition
  3. (Optionally) check what file this block belongs to
  4. Add the block to the “bad blocks” inode.

The first step can be done with fdisk by listing the boundaries of the partition in physical blocks.

root@host ~ $ fdisk -lu /dev/hda

Disk /dev/hda: 20.0 GB, 20003880960 bytes
255 heads, 63 sectors/track, 2432 cylinders, total 39070080 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0xaffeaffe

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *          63    38090114    19045026   83  Linux
/dev/hda2        38090115    39070079      489982+  82  Linux swap / Solaris

The table above means that the first physical block (“sector”) of /dev/hda1 is block 63 of /dev/hda. Both sector 23181191 and sector 23181199 from the error messages at the beginning of the page are on /dev/hda1, since they lie between Start and End of that partition. To get the physical block address of these two sectors relative to the partition, we need to subtract 63 from each.

root@host ~ $ echo $((23181191 - 63))
23181128
root@host ~ $ echo $((23181199 - 63))
23181136
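
The same boundary check and subtraction can be wrapped in a small shell function. This is only a sketch: partition_sector is a hypothetical name, and the Start and End values are copied by hand from the fdisk output above.

```shell
#!/bin/sh
# Hypothetical helper: translate a whole-disk sector into a sector
# relative to the start of a partition.
# Usage: partition_sector DISK_SECTOR PART_START PART_END
partition_sector() {
    sector=$1 start=$2 end=$3
    # Refuse sectors that lie outside the partition boundaries.
    if [ "$sector" -lt "$start" ] || [ "$sector" -gt "$end" ]; then
        echo "sector $sector is not inside $start..$end" >&2
        return 1
    fi
    echo $(( sector - start ))
}

# Start and End of /dev/hda1 as listed by fdisk -lu
partition_sector 23181191 63 38090114
partition_sector 23181199 63 38090114
```

A sector that falls into /dev/hda2 (38090115 or higher) makes the function fail instead of printing a misleading number.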

The second step is to map these two physical blocks of the partition to logical ones. The logical block size can be determined using dumpe2fs:

root@host ~ $ /sbin/dumpe2fs /dev/hda1 | grep size
Block size:               4096
Fragment size:            4096
Inode size:               128
Journal size:             128M

In this case the logical block size is 4096. Since the physical block size is 512, every 8 (4096/512) physical blocks are grouped into one logical block. The first block of an ext2/3/4 partition is block 0, so we can calculate the logical block number like this:

root@host ~ $ echo $((23181128 / 8))
2897641
root@host ~ $ echo $((23181136 / 8))
2897642
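
To avoid reading the block size off the screen, it can be pulled out of the dumpe2fs output and used directly. This is a sketch under a few assumptions: the “Block size:” line has the format shown above, sectors are 512 bytes, and both function names are invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: read dumpe2fs output on stdin and print the
# value of the "Block size:" line.
parse_block_size() {
    awk -F: '/^Block size:/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Hypothetical helper: convert a partition-relative sector into a
# filesystem block number, assuming 512-byte sectors.
# Usage: fs_block PARTITION_SECTOR BLOCK_SIZE
fs_block() {
    sectors_per_block=$(( $2 / 512 ))
    echo $(( $1 / sectors_per_block ))
}

# On a real system the block size would come from something like:
#   block_size=$(/sbin/dumpe2fs -h /dev/hda1 2>/dev/null | parse_block_size)
block_size=4096
fs_block 23181128 "$block_size"
fs_block 23181136 "$block_size"
```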

Now these two logical blocks can be marked bad with e2fsck and its -l parameter. The man page says: “The format of this file is the same as the one generated by the badblocks(8) program.” Testing suggests that this format is simply a list of bad block numbers separated by newline characters (\n). In our example, the bad block list would thus look like this:

2897641
2897642

On my system I created such a file using this command:

grep UncorrectableError /var/log/messages | grep -v grep | sed 's!.*sector=\([0-9]*\)$!\1!' | sort | uniq | while read sector; do echo $(( ($sector - 63) / 8 )); done | sort | uniq > /tmp/badblocks

If you want to run the command above on your machine, you have to replace the 63 with the first sector of your partition, and the 8 with your filesystem block size divided by 512.
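
For reuse on other machines, the pipeline can be spelled out with the two machine-specific numbers as parameters. The following is a sketch, not the command used above: extract_bad_blocks is a made-up name, and it assumes 512-byte sectors and log lines that end in sector=<number>, as in the messages at the top of this page.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print the
# filesystem block number for every sector reported as uncorrectable.
# Usage: extract_bad_blocks PART_START BLOCK_SIZE < logfile
extract_bad_blocks() {
    part_start=$1
    sectors_per_block=$(( $2 / 512 ))   # assumes 512-byte sectors
    grep UncorrectableError |
        sed -n 's/.*sector=\([0-9][0-9]*\)$/\1/p' |
        while read -r sector; do
            echo $(( (sector - part_start) / sectors_per_block ))
        done |
        sort -n -u
}

# Example: partition starts at sector 63, filesystem block size is 4096.
#   extract_bad_blocks 63 4096 < /var/log/messages > /tmp/badblocks
```

Sorting numerically and dropping duplicates at the end keeps the list short when the kernel reports the same sector many times.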

Determining names of damaged files

If you want to know which files have been damaged, you can find out using debugfs. An excellent tutorial on how to do this is http://www.linuxjournal.com/article/0193.

To use the same command as I used on my system, you have to replace /dev/hda1 with the device name of whatever partition your bad blocks are on:

root@host ~ # xargs echo -n icheck < /tmp/badblocks | debugfs /dev/hda1
debugfs:  Block Inode number
2897641 1436251
2897642 1436251
root@host ~ # echo ncheck 1436251 | debugfs /dev/hda1
debugfs:  Inode Pathname
1436251 /home/user/ondemand-mp3.dradio.de/file/dradio/2007/02/06/dlf_200702061647.mp3
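
Both debugfs calls can also be issued non-interactively with the -R option, which runs a single request and exits. This is a sketch, assuming /tmp/badblocks holds one block number per line; icheck_request is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: turn a bad block list (one number per line)
# into a single debugfs "icheck" request string.
icheck_request() {
    echo "icheck $(tr '\n' ' ' < "$1" | sed 's/ *$//')"
}

# On the real system this would be run as root, for example:
#   debugfs -R "$(icheck_request /tmp/badblocks)" /dev/hda1
#   debugfs -R "ncheck 1436251" /dev/hda1
```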

Marking the blocks bad
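
With the list in /tmp/badblocks, the remaining step is to hand it to e2fsck with -l, which adds the listed blocks to the filesystem’s existing bad block list. The following is a hedged sketch, not a tested recipe. It assumes the filesystem is unmounted (for a root filesystem that usually means booting from rescue media) and that the numbers in the list are filesystem blocks as computed above. The mark_bad wrapper and its DRY_RUN switch are invented for this example.

```shell
#!/bin/sh
# Hypothetical wrapper around e2fsck -l.
# Usage: mark_bad BADBLOCKS_FILE DEVICE
# With DRY_RUN=1 the command is only printed, not executed.
mark_bad() {
    list=$1 device=$2
    # -f forces a check even if the filesystem is marked clean,
    # -l adds the blocks listed in the file to the bad block list.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "e2fsck -f -l $list $device"
    else
        e2fsck -f -l "$list" "$device"
    fi
}

# Print the command first; run without DRY_RUN once it looks right.
DRY_RUN=1 mark_bad /tmp/badblocks /dev/hda1
```

Afterwards, dumpe2fs -b /dev/hda1 should list the blocks that are now reserved as bad.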

Further reading

Modern disks know quite well how healthy they are, and there is a standard for querying a disk about its state: S.M.A.R.T.

Under Linux, smartmontools is the package to access this data, and there’s a tutorial on how to use S.M.A.R.T. reports to mark blocks bad: http://smartmontools.sourceforge.net/badblockhowto.html.
