I was so annoyed yesterday when I tried to set up a new TV recording on our MythTV system and it failed with an error message on the server.
This meant that we couldn’t watch any of our recordings and had to resort to recording/watching Sky+, which mere mortals think is fantastic but which is like going back to the dark ages once you’re used to a more capable system like MythTV.
When I started investigating the problem this morning, I found that the system disk had errors and had been remounted read-only. I knew that this meant that a full disk check would be started on the next reboot, so I decided I should ‘go for it’ and reboot the system.
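For anyone wanting to confirm the same symptom, here is a minimal sketch of one way to check whether a file system is currently mounted read-only, by reading the options column of /proc/mounts. The function names mount_options and is_read_only are made up for this example, and it assumes a Linux system with awk, tr and grep available.

```shell
#!/bin/sh
# Print the mount options for a mount point, reading a table in
# /proc/mounts format: device mountpoint fstype options dump pass.
# $1 = mount point, $2 = mounts file (optional, defaults to /proc/mounts).
mount_options() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' \
        "${2:-/proc/mounts}"
}

# Exit status 0 when the mount point carries the plain "ro" option.
# Options such as "errors=remount-ro" do not count, because grep -x
# only matches a whole line.
is_read_only() {
    mount_options "$1" "$2" | tr ',' '\n' | grep -qx 'ro'
}

# Example: report the state of the root file system.
if is_read_only /; then
    echo "/ is mounted read-only"
else
    echo "/ is mounted read-write (or was not found)"
fi
```

If the root file system shows up as read-only on a machine that normally runs read-write, the kernel log is usually the next place to look for the underlying disk errors.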
The server is ‘headless’ (i.e. has no display or keyboard attached) and as I knew that the disk checking process may need some interaction I moved the server next to the TV, attached a display cable and keyboard, and turned on the power.
The normal ‘Disk check in progress – please wait‘ screen was soon replaced by ‘Serious errors have been detected while checking the disk – Ignore, Skip or Manual recovery‘ – Oh ****!!
I thought I’d go for manual recovery and run ‘fsck’ manually; that way I should see what the serious errors were. At this point I came across something that I’d not seen before: the output looked like this:
Pass 1: Checking inodes, blocks, and sizes
...
Illegal block #9 (463536155) in inode 14109880. CLEARED.
Illegal block #11 (1275199487) in inode 14109880. CLEARED.
Error storing directory block information (inode=14109880, block=0, num=471166008): Memory allocation failed
Erm, what? Memory Allocation Failed?
After some googling I found a mailing list post from back in 2014 saying that in some instances a particular pattern of corrupt data can crash the disk checking program ‘e2fsck’ itself, and suggesting a ‘hack’ to overwrite the specific corrupt data that is causing the crash.
In this instance the command:
debugfs -w -R "clri <14109880>" /dev/sda1
‘zaps’ the contents of the offending inode, overwriting the corruption that was causing e2fsck to try to allocate too much memory, and then fail.
Once I had run this command and checked the file system again, the remaining errors were fixed and the system booted normally. Phew!
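The whole sequence can be sketched as a small script. This is only an illustration under some assumptions: the device and inode number are the ones from my error output and will differ on another system, the run helper and the APPLY switch are invented for this example, and the two read-only debugfs requests (stat and ncheck) are an extra precaution I have added here rather than part of the original fix. By default the script only prints the commands. Anything that writes to the disk should only be run while the file system is unmounted or mounted read-only, and clri throws away whatever file the inode belonged to.

```shell
#!/bin/sh
# Values taken from the error output above; substitute your own.
DEVICE="${DEVICE:-/dev/sda1}"
INODE="${INODE:-14109880}"

# Print each command (without its original quoting). Only execute it
# when APPLY=1 is set; APPLY is a made-up safety switch for this sketch.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

repair_steps() {
    # Read-only: show what the inode currently holds before destroying it.
    run debugfs -R "stat <$INODE>" "$DEVICE"
    # Read-only: list path names that point at the inode, to identify the lost file.
    run debugfs -R "ncheck $INODE" "$DEVICE"
    # Destructive: zero the inode so e2fsck no longer trips over its corrupt contents.
    run debugfs -w -R "clri <$INODE>" "$DEVICE"
    # Force a full check so the remaining errors get repaired.
    run e2fsck -f "$DEVICE"
}

repair_steps
```

Running it as it stands just lists the four commands. Running it with APPLY=1 in the environment, as root from the recovery shell, would actually execute them.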
The MythTV system is so much more capable than Sky+ that once you’re used to it, it’s really hard to go back to something so much more basic and primitive.
Hopefully the disk corruption was a ‘one off’ caused by a power glitch and the system is now back to its usual reliability.