HOWTO - Reconstruct failed RAID 0 arrays for fun and profit.
Hey, I had an experience I couldn't resist blogging about. :-D
WARNING! Ultra-Geeky Story Ahead! No, Really! You've Been Warned!
So, this teacher brought in his computer from home. It was a high-end Dell XPS, about 5 years old. He had ordered it with two 120GB hard disks striped in RAID 0, and recently he went to boot it up (into Windows XP) and it would not work; some very important system file was corrupt. He went round and round with Dell support before bringing it in to us to see what we could do.
At first glance, the situation was grim. He'd run diagnostics and knew there was a physically bad sector on the second disk of the array. Popping in the XP install disk and telling it to "R"ecover brought no luck, nor did booting up to a Bart PE disk; neither knew there was a disk there. Windows basically wanted nothing to do with the failed array.
So, in went the Ubuntu Live CD. It saw a disk, and I thought, "Great! We can at least get the data off the array, and get SOMETHING out of it. If nothing else, there's dd and " Except, after trying to mount the Dell utility partition, listing its contents gave us crazy garbage.
"Oh, well," I thought, "dd it is." So I ran dd if=/dev/sda of=/mnt/dev.sda (having NFS-mounted a network drive on /mnt) and left for the day (it takes a while...).
The next morning, I saw the file's size was 120GB... which is HALF of what I expected to see. Only THEN did I notice that Ubuntu was also seeing a
See, the two disks were plugged into a "RAID controller" card that we had all assumed made this "Hardware RAID", so that the OS didn't have to know anything to use it. Apparently, though, that "RAID controller" card relied on some software component under Windows to function properly, and Ubuntu did not know the two disks were supposed to be a RAID array.
Vern suggested we throw the disks into a software RAID utility to see what came out. Unfortunately, nothing.
So we felt pretty much defeated, since even if we got all the data off both disks, it'd be scrambled and useless. Still, for some reason, I couldn't quit thinking about this problem. It seemed like there should be SOMETHING we could do with the data from the disks to recombine them into something useful.
I needled Vern with questions about how RAID 0 works, and looked it up online myself. IF RAID 0 worked as simply as I figured it SHOULD, I should be able to figure out how to solve this problem. I had a fuzzy idea to work from, no experience with anything like this, and I could not say I'd succeed. However, I figured, in order to save several years' worth of this teacher's work, it was worth going for it and giving it my best shot.
I would write a Python program to "de-interlace" the two disks into one image. I had already written a program that worked with the sorts of tools I would need to use (though it was written with the opposite problem in mind) during my trek through the Python Challenge, so I looked it up, modified it, and came up with a simple program that would:
- Open two input files to read from (in binary mode).
- Open an output file to write to (in binary mode).
- Take a certain amount of data from the first file and write it to the output file, then take a certain amount of data from the second file and write it to the output file.
- Wash, rinse, repeat until there's no more to read from either input file.
- Close out all files.
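To make the idea concrete, here is a tiny in-memory sketch of the round trip, assuming RAID 0 really is as simple as "hand out fixed-size chunks to each disk in turn". The names stripe and deinterlace are made up for this illustration, and real controllers may well add metadata or offsets that this toy ignores.

```python
def stripe(data, chunk_size, num_disks=2):
    """Split data round-robin across num_disks, chunk_size bytes at a time."""
    disks = [bytearray() for _ in range(num_disks)]
    for n, start in enumerate(range(0, len(data), chunk_size)):
        disks[n % num_disks] += data[start:start + chunk_size]
    return [bytes(d) for d in disks]


def deinterlace(disks, chunk_size):
    """Rebuild one byte stream by taking a chunk from each disk in turn."""
    out = bytearray()
    offsets = [0] * len(disks)
    i = 0
    while True:
        d = i % len(disks)
        chunk = disks[d][offsets[d]:offsets[d] + chunk_size]
        if not chunk:
            # The disk whose turn it is has run dry: nothing left to copy.
            break
        out += chunk
        offsets[d] += chunk_size
        i += 1
    return bytes(out)
```

With the right chunk size the original bytes come back intact; with the wrong one you get the same bytes in scrambled order, which is pretty much what a garbage directory listing looks like.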
All I needed to know was, what is "a certain amount" supposed to be? Through Google, I could only find offers to sell me commercial solutions and vague descriptions of RAID theory, no concrete implementation details. I decided it couldn't be THAT hard to figure out, right? ;-).
Well, this being a computer, we figured it probably had to be a power of 2. I grabbed the first 100,000 sectors of each drive's image to play with (drive 2, the failed one, STILL has dd grinding away at it as I type; it's taking at least an order of magnitude longer to recover that data than from the first, functional, disk). I tried guessing a few times, 512, 4096, etc., to no avail. Each time I went to mount the Dell utility partition, ls's output came back scrambled.
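In hindsight, the guessing could have been automated. The following is only a sketch of that idea: it writes one trial image per power-of-2 chunk size so each can be test-mounted in turn. The function names, the "trial" file prefix, and the 512-byte-to-1-MiB range are all assumptions of mine, not anything the controller documents.

```python
def deinterlace_files(input_paths, output_path, chunk_size):
    """Interleave chunk_size-byte chunks from each input file into output_path."""
    inputs = [open(p, "rb") for p in input_paths]
    try:
        with open(output_path, "wb") as out:
            i = 0
            while True:
                chunk = inputs[i % len(inputs)].read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
                i += 1
    finally:
        for f in inputs:
            f.close()


def build_trial_images(input_paths, candidate_sizes, prefix="trial"):
    """Write one trial image per candidate chunk size; return {size: path}."""
    trials = {}
    for size in candidate_sizes:
        path = "%s.%d.img" % (prefix, size)
        deinterlace_files(input_paths, path, size)
        trials[size] = path
    return trials


# Powers of 2 from 512 bytes (one sector) up to 1 MiB. The range is a guess.
CANDIDATE_SIZES = [2 ** n for n in range(9, 21)]
```

Pointed at the two 100,000-sector samples, something like build_trial_images(["sda.sample", "sdb.sample"], CANDIDATE_SIZES) (hypothetical file names) would leave a dozen small images to try mounting, rather than one guess at a time.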
I decided to instead pull out a hex editor and take a look at the raw data, to see if I could find any patterns. I found all KINDS of patterns :-), none of which seemed to pan out when it came to testing them, until I found this strange sequence of bytes, a pattern at the start of
00000000 01 40 02 40 03 40 04 40 05 40 06 40 07 40 08 40 |.@.@.@.@.@.@.@.@|
00000010 09 40 0a 40 0b 40 0c 40 0d 40 0e 40 0f 40 10 40 |.@.@.@.@.@.@.@.@|
00000020 11 40 12 40 13 40 14 40 15 40 16 40 17 40 18 40 |.@.@.@.@.@.@.@.@|
00000030 19 40 1a 40 1b 40 1c 40 1d 40 1e 40 1f 40 20 40 |.@.@.@.@.@.@.@ @|
00000040 21 40 22 40 23 40 24 40 25 40 26 40 27 40 28 40 |!@"@#@$@%@&@'@(@|
...[etc.]...
Then, later at some point, I noticed this strange sequence in
...[etc.]...
0000ffd0 e9 3f ea 3f eb 3f ec 3f ed 3f ee 3f ef 3f f0 3f |.?.?.?.?.?.?.?.?|
0000ffe0 f1 3f f2 3f f3 3f f4 3f f5 3f f6 3f f7 3f f8 3f |.?.?.?.?.?.?.?.?|
0000fff0 f9 3f fa 3f fb 3f fc 3f fd 3f fe 3f ff 3f 00 40 |.?.?.?.?.?.?.?.@|
00010000 91 89 76 7c 8c 5e 7e 66 c7 86 80 00 ff ff ff ff |..v|.^~f........|
00010010 8b d6 03 56 0b 89 56 78 8c 5e 7a 1e b8 70 00 8e |...V..Vx.^z..p..|
00010020 d8 8e c0 33 ff f3 a4 1f 89 7e 72 8c 46 74 81 c7 |...3.....~r.Ft..|
...[etc.]...
A strange break in a strange pattern at precisely a power of 2! On further investigation, the strange pattern would continue flawlessly if the start of /dev/sdb continued it. So I took ffff, which is 65535 in decimal, and plugged that into my script. This time the ls looked much more promising but still weird, with filenames like onfig s.ys and
D'oh! I was one byte off! So I added a byte and Bam! the Dell utility partition worked precisely as it should.
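The off-by-one comes from confusing the offset of the last byte in a chunk (ffff) with the number of bytes in it (10000 hex, or 65536). Here is a sketch of how the break could be found by machine instead of by eyeball. It assumes the strange pattern is a 16-bit little-endian counter going up by one each step, which is how the hex dumps above read to me; find_counter_break is a made-up helper name.

```python
import struct


def find_counter_break(data, start=0):
    """Return the offset of the first 16-bit little-endian word that is not
    the previous word plus one, or None if the run never breaks.

    start must be an offset that already lies inside the counter run.
    """
    (prev,) = struct.unpack_from("<H", data, start)
    for offset in range(start + 2, len(data) - 1, 2):
        (word,) = struct.unpack_from("<H", data, offset)
        if word != (prev + 1) & 0xFFFF:
            return offset
        prev = word
    return None
```

The offset it returns is the first byte AFTER the run, not the last byte in it, so if the run ends at a chunk boundary the result can be used as the chunk size directly, no extra byte needed.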
So, strictly speaking, I don't know if we'll succeed in getting the data off the array. We'll have to wait a few days to find out how the rest of the second hard drive fares, as it's taking an inordinately long time to get that data off. However, whatever happens from here, my theory paid off!
That's a beautiful thing for a geek like me. :-D
Oh, and it's cool that we might be able to save all/some of that data, too. ;-)
#!/usr/bin/python
# -*- coding: utf-8 -*-
#
# deinterlace.py
#
# INTENT = This is a script for deinterlacing two raw dd images
# taken from a failed RAID 0 array into one "valid" image file
# that we hope to be able to recover data from.
#
# This is strictly experimental.
#
# Thursday, May 1, 2008 -Simón A. Ruiz
#
inputFiles = [open("dev.sda","rb"),open("dev.sdb","rb")]
outputFile = open("output","wb")
# Stripe size in bytes: 0x10000, i.e. one more than the ffff offset.
chunkSize = 65536

# And, so as not to have to figure this out every time through the loop...
numFiles = len(inputFiles)

i = 0
while True:
    # Alternate between the input images, one chunk at a time.
    nextChunk = inputFiles[i%numFiles].read(chunkSize)
    if not nextChunk:
        # Parenthesized so this line works as a print statement or a function call.
        print('Done! No more data.')
        break
    outputFile.write(nextChunk)
    i += 1

outputFile.close()
for file in inputFiles:
    file.close()
Stay tuned for a post on mounting disk images as if they were real disks.