Recovering Data From A Partially Formatted ext4 Drive

I screwed up several times.  I usually maintain two redundant backups: a raid and a single 5TB drive.  A few months ago, a customer on a limited budget needed a high capacity raid for something-or-another.   So I told myself: I have a huge raid and a cheap, single USB drive backup and I’ll be ok with the single drive for a couple of months.  Off goes the raid with NTFS (yuk!) formatted and zero’d drives.

A month and a half passes.  The single drive is plugged in, weekly project backups are made, the drive is unplugged. No problems.

Fast forward to a week before the raid is slated to return: a legacy file is required. Plug in the drive, grab the file, and continue to work.  I do not unplug the drive.

A day or so later, I’ve got a new thinclient build configured and its ISO is ready to be copied to a flash drive.  A girlfriend calls and we’re yapping about the previous weekend’s events.  I lean over, plug the flash drive in, fire up a root terminal, cd to the build directory and type:

# dd if=mybuild.iso of=/dev/sdc bs=4M

A yellow flag immediately goes up. dd instantly returned, fastest 300MB ever written to flash.  Then I hear the backup drive click as data is transferred from cache to the platters.  Instant dread. Before I can finish saying the word “fuck” in my mind, I’m pulling its power cord.

Lessons learned:

  • Don’t have sexy talk when performing critical functions.
  • Verify the block device is the device you want to write to.
  • Unplug the backup drive when not in use.
  • Don’t use a USB drive as a backup on a production machine.  If you can afford it, use a network backup solution.
  • Maintain two backups. Back up your backup!
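
The second lesson can be made mechanical. What follows is a minimal sketch, not what was actually run: it assumes GNU dd plus blockdev and lsblk from util-linux, and the function names size_ok and safe_dd and the 64GB ceiling are invented for illustration. The idea is to refuse any target bigger than the largest flash drive you own, show what the device really is, and ask before writing.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original session): size_ok, safe_dd.

# size_ok SIZE_BYTES MAX_BYTES
# Succeeds only if the device is no bigger than the stated ceiling.
size_ok() {
    [ "$1" -le "$2" ]
}

# safe_dd IMAGE DEVICE MAX_BYTES
# Checks the device size, shows what the device is, then asks before writing.
safe_dd() {
    image=$1 dev=$2 max=$3
    size=$(blockdev --getsize64 "$dev") || return 1
    if ! size_ok "$size" "$max"; then
        echo "refusing: $dev holds $size bytes, more than $max" >&2
        return 1
    fi
    lsblk -dno NAME,SIZE,MODEL,TRAN "$dev"
    printf 'Write %s to %s? Type yes: ' "$image" "$dev"
    read -r answer
    [ "$answer" = "yes" ] || return 1
    dd if="$image" of="$dev" bs=4M conv=fsync
}
```

Called as safe_dd mybuild.iso /dev/sdc 64000000000, a 5TB backup drive sitting on sdc fails the size check before dd ever runs.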

The fix:

  1. Built a test rig: Dell PowerEdge R710 2x Xeon E5530 with 32GB RAM and 6 3TB drives. In the PERC, create three virtual drives.  VD1: 3TB. VD2: 2x 3TB, striped. VD3: 3x 3TB, striped. Install CentOS 7 Linux on VD1 (sda).
  2. dd’d the blown drive onto VD2 (sdb) (that’s the 2x 3’s, for those who aren’t keeping up).  41 hours later, the copy completes. That’s 5TB transferred at 31MB/s over USB2. The blown drive is unplugged.
  3. The first thing I do is check the partition. Neither fdisk nor parted knows what to make of it. So run mount.  Sure enough, it mounts as the build ISO. Unmount.
  4. Run fsck.ext4 for giggles.  Naturally without a partition, it fails. Not valid, not unix, bad superblock.
  5. Run dumpe2fs.  It says it can’t find a superblock (zero)
  6. Check for superblocks with dumpe2fs. It returns a dozen or so superblocks.  Great.
  7. Try running dumpe2fs with an alternative superblock. It fails. “Bad magic number” # dumpe2fs -o superblock=20480000 /dev/sdb
  8. Again run e2fsck with a superblock against a device and not a partition. FAIL. # e2fsck -b 20480000 /dev/sdb
  9. debugfs doesn’t help.
  10. What the hell, try mounting the drive and point to a ‘good’ superblock. Useless without ext4 partition info.
  11. What the heck, attempt raw file recovery:
  12. Install autopsy.  No help. FAIL.
  13. Install foremost.  After an hour it finds 3 files. FAIL.
  14. Install sleuthkit. After two seconds, decide it’s not the tool. FAIL.
  15. Install testdisk. It manages to find the original drive label on a distant block. It won’t provide a block number, it just reports the finding. After dorking with it for 2 hours, quit.  FAIL.
  16. Install photorec. After 3 hours it finds 500,000+ files.  99% corrupt.  Interestingly, the only files which are valid are from a Macintosh VMware drive container. FAIL.
  17. Almost cry.  The cat looks at me and with a single disgusted look says, “And you call me pussy.” Hardened by the cat, I get a grip.
  18. Over on the production rig, plug in a flash drive and dd the build iso on. Note amount of data written (289MB)
  19. Back on the test rig, grab another 5TB drive. With gparted, create a gpt partition table and format it as a single ext4 partition.  Note the superblock numbers.  They mostly matched what dumpe2fs initially returned.
  20. Run dd and copy the first 289MB onto VD2. dd actually copied 302MB.  Not bitching about 13 MB out of 5TB. # dd if=/dev/sdd of=/dev/sdb bs=1M count=289
  21. Run parted and fdisk on VD2 and they report a good 5TB partition. Yay!
  22. Run e2fsck with a known good superblock.    # e2fsck -b 20480000 /dev/sdb1
  23. Say, “Fuckin’-A it found a good superblock!” e2fsck is in interactive mode so it’s asking to fix this and that.  Tens of thousands of Enter key taps (OK, I placed a weight on the enter kkkkkey) and about 45 minutes later, it stops.
  24. Try to mount the partition without switches. Mount returns without error. # mount /dev/sdb1 /mnt/a
  25. # ls -l /mnt/a      returns a single directory, lost+found. *Chub*
  26. # ls -al /mnt/a/lost+found/*    FILES WITH PRESERVED FILE NAMES AND DIRECTORIES. HELL YEAH!  The only thing that appears to be missing is the root file names.  There are probably a few missing files, but that’s ok.  Tested a dozen or so files and they all appear good.
  27. Copied the lost+found files from VD2 to VD3 with  # cp -R /mnt/a/lost+found/  /mnt/w/   Four hours later… FILES!
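
The superblock number 20480000 used in steps 7, 8 and 22 is not arbitrary. With the sparse_super feature, ext4 keeps backup superblocks in block group 1 and in groups that are powers of 3, 5 and 7, and 20480000 is group 625 (5 to the 4th) times 32768 blocks per group. The sketch below reproduces that arithmetic. It assumes 4096-byte blocks and 32768 blocks per group, which is what mke2fs normally picks for a drive this size, and the function name backup_superblocks is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list backup superblock block numbers.
# Assumes sparse_super, 4 KiB blocks (first data block is 0),
# so the backup for group g sits at block g * BLOCKS_PER_GROUP.

# backup_superblocks BLOCKS_PER_GROUP GROUP_COUNT
backup_superblocks() {
    bpg=$1 ngroups=$2
    {
        # Group 1 always carries a backup if it exists.
        [ "$ngroups" -gt 1 ] && echo 1
        # So do groups that are powers of 3, 5 and 7.
        for base in 3 5 7; do
            g=$base
            while [ "$g" -lt "$ngroups" ]; do
                echo "$g"
                g=$((g * base))
            done
        done
    } | sort -n | while read -r g; do
        echo $((g * bpg))
    done
}

# A 5TB filesystem has roughly 37253 groups of 128MiB each.
backup_superblocks 32768 37253
```

The list starts 32768, 98304, 163840 and includes 20480000. mke2fs -n prints the same locations for a real device without writing anything, as long as it is given the same parameters the filesystem was created with; leave off the -n and it formats the drive. For step 23, e2fsck -y answers yes to every question, which saves the weight on the Enter key.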

The big problem was that the OS couldn’t find partition info on the drive.  Without the partition info it wouldn’t have a starting point for superblock zero.  Copying the “header” from the newly formatted drive gave the OS the required partition info and then the regular tools were able to repair my screw up.
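
Put as arithmetic: filesystem block numbers are counted from the start of the partition, not the start of the disk, so block 20480000 on /dev/sdb and block 20480000 on /dev/sdb1 are different places. The sketch below works out where a backup superblock would sit on the raw disk and peeks at its magic number. Everything specific here is an assumption rather than something taken from the recovery: a partition starting at sector 2048 (a common default), 512-byte sectors, 4096-byte blocks, a backup superblock that begins at the start of its block, and the helper names sb_byte_offset and read_magic.

```shell
#!/bin/sh
# Hypothetical helpers for illustration: sb_byte_offset, read_magic.

# sb_byte_offset START_SECTOR FS_BLOCK
# Absolute byte position on the whole disk of a filesystem block,
# assuming 512-byte sectors and 4096-byte filesystem blocks.
sb_byte_offset() {
    echo $(( $1 * 512 + $2 * 4096 ))
}

# read_magic FILE_OR_DEVICE SUPERBLOCK_BYTE_OFFSET
# Prints the two magic bytes stored 56 bytes into a superblock.
# A valid ext2/3/4 superblock prints 53ef (0xEF53, little-endian).
read_magic() {
    dd if="$1" bs=1 skip=$(( $2 + 56 )) count=2 2>/dev/null |
        od -An -tx1 | tr -d ' \n'
    echo
}

# With the assumed start sector of 2048:
sb_byte_offset 2048 20480000
```

Under those assumptions, read_magic /dev/sdb "$(sb_byte_offset 2048 20480000)" reads only two bytes and writes nothing, so it is a cheap way to test a guessed partition start before touching the disk.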

76 hours, beginning to recovery. 4.1TB of 4.8TB recovered by a distant superblock.  Next, I’ll dd closer to the 289MB mark and try a superblock much closer to zero.  More later.  Maybe.
