Wednesday, May 14, 2008

HOWTO - Work with Disk Images

Warning! Extra Geeky Content Ahead!

What is a disk image?

The way I'm using it, a disk image is a bit-by-bit copy of the information on a data storage device. For a broader exploration of the topic, check out the "Disk image" article over at the Wikipedia.

I'll talk about hard disk images, and CD images (the infamous .iso file).

Why would I care about disk images?

Maybe you wouldn't. Mostly, I think we want to interact with our disks through the standard filesystem tools to work with files.

There ARE several situations where working with disk images might be useful. Among them might be:

  • Downloading and burning the latest Ubuntu Live CD.
  • Copying a CD, to distribute or to have a backup copy just in case.
  • Accessing the contents of these CD images without burning them onto CD.
  • Duplicating smaller disks (e.g. SD cards, CompactFlash cards)
  • Saving a perfect data copy of a disk as a first step in sensitive situations involving data (e.g., data recovery or forensics work).
  • Keeping an archival copy of a disk (e.g. I have a disk image of the 120MB hard disk from my old 386 computer for nostalgia purposes; one of these days I'm going to figure out how to emulate it...).

Note that while technically possible, imaging a large number of computers this way is a very long process compared to tools that are specifically designed for that sort of work (such as partimage and the SystemImager suite).

What tools do we need?

In UNIX (and Linux, BSD, OS X, etc.), we will use the dd utility to read and write disk images. dd accesses the data on the storage device directly, as opposed to going through standard file manipulation, which is (thankfully) abstracted from direct access through concepts like partitioning and filesystems.

dd, or a version of it, is available for every platform, I believe.

I'll also discuss using the UNIX mount utility to mount the disk images as if they were real disks. I believe Mac OSX has a similar functionality built right in, and I know there are all kinds of programs for mounting virtual drives in Windows.

Creating a Disk Image

You'll need the disk you're copying from to not be mounted. You'll also need a place to save the image file that has sufficient free space. This may mean:

  • You're copying from a smaller disk to a larger disk (one with more free space than the smaller disk's total capacity), and the smaller disk is not mounted on your computer.
  • You're booted into a Live CD environment so that you have un-mounted access to your main (operating system) disk and can copy it off onto an external storage device, be that over USB or over the network.

Whatever the case, once you have things ready, the syntax we use is:

# dd if=input of=output

Where input is the disk device node and output is the file you want to write the disk's data to.

SO, if I want to take /dev/sda and make a disk image of it called sda.dd in the current working directory, I run:

# dd if=/dev/sda of=sda.dd

And now I have a file named sda.dd which contains an exact bit-by-bit copy of my /dev/sda disk!
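
For the curious, here is roughly what dd is doing when it makes that copy, as a minimal Python sketch. The function name copy_image and the 1 MiB default block size are my own choices for illustration (dd itself defaults to 512-byte blocks), and the demonstration runs on a small temporary file standing in for a disk rather than on a real device node.

```python
import os
import tempfile

def copy_image(source_path, image_path, block_size=1024 * 1024):
    """Copy source_path to image_path in fixed-size blocks; return bytes copied."""
    copied = 0
    with open(source_path, "rb") as source, open(image_path, "wb") as image:
        while True:
            block = source.read(block_size)
            if not block:          # an empty read means end of the device or file
                break
            image.write(block)
            copied += len(block)
    return copied

# Demonstration on an ordinary temporary file standing in for a disk.
workdir = tempfile.mkdtemp()
fake_disk = os.path.join(workdir, "fake_disk")
fake_image = os.path.join(workdir, "fake_disk.dd")
with open(fake_disk, "wb") as f:
    f.write(os.urandom(3 * 512 + 100))   # deliberately not a multiple of the block size
copied = copy_image(fake_disk, fake_image, block_size=512)
print(copied)
```

Pointed at a real device, the call would look like copy_image("/dev/sda", "sda.dd"), run as root. In practice I'd still reach for dd.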

Writing a Disk Image to a Disk

So, you created a disk image of your drive, and then you did something stupid and ruined the contents of your drive? Or your drive died and you got a new one? Or (more optimistically) you're just duplicating the hard disk and that's why you have an image? No worries, we can simply write the image right back onto that disk!

We'll need to have access to the disk image file, and the destination disk will need to be available and unmounted.

The pattern for dd stays the same:

# dd if=input of=output

Except now input is the disk image file and output is the disk device node you want to write the disk's data to.

SO, if I want to take /dev/sda and write a disk image called sda.dd onto it, I run:

# dd if=sda.dd of=/dev/sda
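
If you want some reassurance that the write went through, one option is to read the disk back and compare it against the image. Here is a minimal sketch, assuming both are opened as binary file objects. The name matches_image is hypothetical, and the comparison only covers the length of the image, since the destination disk may be larger than the image.

```python
import io

def matches_image(image, device, block_size=1024 * 1024):
    """Return True if the device starts with exactly the bytes of the image."""
    while True:
        expected = image.read(block_size)
        if not expected:           # reached the end of the image: everything matched
            return True
        if device.read(len(expected)) != expected:
            return False

# In-memory stand-ins for the image file and the destination disk.
image_bytes = b"pretend this is sda.dd" * 100
bigger_disk = image_bytes + b"extra space on a bigger disk"
same = matches_image(io.BytesIO(image_bytes), io.BytesIO(bigger_disk))
different = matches_image(io.BytesIO(image_bytes), io.BytesIO(b"X" + image_bytes[1:]))
print(same, different)
```

With real files, that would be matches_image(open("sda.dd", "rb"), open("/dev/sda", "rb")), run as root.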

Partitions

Incidentally, you can also do this with a partition rather than a full disk, by giving dd a partition node instead of a disk node (e.g. /dev/sda1 is the first partition on /dev/sda), so dd will create an image of just that partition rather than one of the full disk.

CD Images

Compact Disc images have been made really easy to work with.

In Ubuntu, you need only to right-click on a .iso file in order to be presented with the option to Write to Disc.... Also, using Brasero, you can run a "Disc copy" project and copy your disc to a "File image".

Alternatively, on the command-line, CD images work just like you might expect from the above. In order to create them you can just:

# dd if=/dev/scd0 of=discimage.iso

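Burning the disc image to a blank disc is not simply the reverse, though: dd can't write to a blank CD-R, because burning needs a dedicated program to drive the recorder. A tool such as wodim does the job:

# wodim dev=/dev/scd0 discimage.iso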

Accessing Disk Image Partitions

So, we can write the disk images onto disks...

We can also access and manipulate them directly by treating the image file as a disk. I'll be discussing the use of the UNIX mount utility, which is responsible for mounting disks onto the UNIX file system, to do this.

The pattern for using mount (for this) is:

# mount -t fstype -o options device directory

Where:

  • fstype is the type of filesystem you're trying to mount; necessary if you're working with non-native filesystems, like NTFS.
  • options are extra options you may need to use, e.g. loop is the option we feed mount to let it know we're feeding it a disk image file rather than a real disk. If we want to mount a partition from inside a full disk image, we'll also need to use the offset option to let it know where the partition starts.
  • device is the device (in our case disk image) to be mounted.
  • directory is the destination directory in the UNIX file system where you want the image mounted.

So, in order to mount a CD iso, we just do:

# mount -o loop cdimage.iso /media/iso

And now running ls /media/iso will show us the root directory of the CD.
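
Since the image is just a file, you can also peek inside it without mounting anything. As a small illustration, here is a Python sketch that reads the volume label straight out of an ISO 9660 image. The function name iso_volume_label is hypothetical, and it assumes the primary volume descriptor sits at sector 16, where it normally does. The demonstration builds a tiny fake image in memory, with a made-up label, rather than needing a real .iso.

```python
import io

SECTOR_SIZE = 2048                 # logical sector size used by ISO 9660
PVD_OFFSET = 16 * SECTOR_SIZE      # the first 16 sectors are the "system area"

def iso_volume_label(image):
    """Return the volume label from an ISO 9660 image opened in binary mode."""
    image.seek(PVD_OFFSET)
    descriptor = image.read(SECTOR_SIZE)
    # Byte 0 is the descriptor type (1 = primary), bytes 1-5 spell "CD001".
    if (len(descriptor) < SECTOR_SIZE
            or descriptor[0:1] != b"\x01"
            or descriptor[1:6] != b"CD001"):
        raise ValueError("no ISO 9660 primary volume descriptor found")
    # The volume identifier is 32 space-padded characters at bytes 40-71.
    return descriptor[40:72].decode("ascii").rstrip()

# Build a tiny fake image in memory so the sketch runs without a real .iso.
fake_pvd = bytearray(SECTOR_SIZE)
fake_pvd[0:6] = b"\x01CD001"
fake_pvd[40:72] = b"Ubuntu 8.04 i386".ljust(32)   # made-up label for the demo
fake_iso = io.BytesIO(bytes(PVD_OFFSET) + bytes(fake_pvd))
label = iso_volume_label(fake_iso)
print(label)
```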

On Ubuntu 8.04 (and others), so long as you mount it in the /media directory, an icon will show up on the GNOME desktop to represent the "disc".

If we have the image of a disk partition, we can similarly mount it as above with:

# mount -o loop partition.img /media/partition

However, if we want to mount a partition from inside a full disk image, we'll first need to locate the spot in the image where the partition begins.

We need to use fdisk to get the needed information out of the disk image, with the -l option to list partition information and the -u option to show us the sizes in sectors.

For example, as I was working on recovering data off of a teacher's dying hard drives, I got this response:

root@om:/mnt# fdisk -lu output 
You must set cylinders.
You can do this from the extra functions menu.

Disk output: 0 MB, 0 bytes
255 heads, 63 sectors/track, 0 cylinders, total 0 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x9dc96e9e

 Device Boot      Start         End      Blocks   Id  System
output1              63       80324       40131   de  Dell Utility
output2   *       80325   464840774   232380225    7  HPFS/NTFS
Partition 2 has different physical/logical endings:
     phys=(1023, 254, 63) logical=(28934, 254, 63)

So, from the above we can tell that the second partition in the disk image "output" starts at sector 80325.
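
fdisk is the right tool here, but the numbers it prints come from a small table near the end of the disk's first sector, and you can read that table yourself. Below is a minimal Python sketch that does so. The function name read_mbr_partitions is hypothetical, and it assumes a classic MBR layout with only primary partitions (no extended partitions, no GPT). The fake MBR in the demonstration is filled in with the start sectors and sizes from the fdisk output above.

```python
import io
import struct

SECTOR_SIZE = 512

def read_mbr_partitions(image):
    """Return (number, bootable, type_id, start_sector, sector_count) per used MBR entry."""
    image.seek(0)
    mbr = image.read(SECTOR_SIZE)
    if len(mbr) < SECTOR_SIZE or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    partitions = []
    for index in range(4):
        entry = mbr[446 + 16 * index : 446 + 16 * (index + 1)]
        # status byte, 3 bytes CHS start, type byte, 3 bytes CHS end,
        # then little-endian 32-bit start sector and sector count
        status, type_id, start, count = struct.unpack("<B3xB3xII", entry)
        if type_id != 0:           # type 0 marks an unused slot
            partitions.append((index + 1, status == 0x80, type_id, start, count))
    return partitions

# A fake MBR carrying the two partitions from the fdisk listing.
fake_mbr = bytearray(SECTOR_SIZE)
fake_mbr[446:462] = struct.pack("<B3xB3xII", 0x00, 0xDE, 63, 80262)
fake_mbr[462:478] = struct.pack("<B3xB3xII", 0x80, 0x07, 80325, 464760450)
fake_mbr[510:512] = b"\x55\xaa"
partitions = read_mbr_partitions(io.BytesIO(bytes(fake_mbr)))
for number, bootable, type_id, start, count in partitions:
    print(number, "*" if bootable else " ", hex(type_id), start, count)
```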

We also know that each sector is 512 bytes.

Multiply the two and we know that the second partition starts at byte number 41126400.
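
Spelled out in Python, so there's no doubt about the arithmetic (the variable names are just mine):

```python
SECTOR_SIZE = 512          # from the "Units = sectors of 1 * 512" line
start_sector = 80325       # the Start column for the second partition

offset = start_sector * SECTOR_SIZE
mount_options = "loop,offset=%d" % offset
print(mount_options)       # loop,offset=41126400
```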

We can also tell the partition type is NTFS (It's a Windows XP partition...).

I created a directory called /mnt/C to be the mount point for the NTFS partition, and plugged all the information in the right order for mount:

root@om:/mnt# mount -o loop,offset=41126400 -t ntfs output C

And voilà!

The second partition of the disk image file "output" is now mounted on my system under the /mnt/C folder!

I proceed to grip all that teacher's years of work and yank it back from the jaws of oblivion. (For this, I just used graphical filesystem tools to copy out his Documents and Settings folder)

Conclusion

There are certain situations where knowing how to work with raw disk data can be useful.

These are the things I had to learn to figure out how to do it right. Now I've got a reference to look back on the next time I have to do it.

Also, hopefully others will find it useful or at least mildly entertaining.

:-)

Thursday, May 01, 2008

HOWTO - Reconstruct failed RAID 0 arrays for fun and profit.

Hey, I had an experience I couldn't resist blogging about. :-D

WARNING! Ultra-Geeky Story Ahead! No, Really! You've Been Warned!

So, this teacher brought in his computer from home. It was a high-end Dell XPS, about 5 years old. He had ordered it with two 120GB hard disks striped in RAID 0, and recently he went to boot it up (into Windows XP) and it would not work; some very important system file was corrupt. He went round and round with Dell support before bringing it in to us to see what we could do.

At first glance, the situation was grim. He'd run diagnostics and knew there was a physically bad sector on the second disk of the array. Popping in the XP install disk and telling it to "R"ecover brought no luck, nor did booting up to a Bart PE disk; neither knew there was a disk there. Windows basically wanted nothing to do with the failed array.

So, in went the Ubuntu Live CD. It saw a disk, and I thought, "Great! We can at least get the data off the array, and get SOMETHING out of it. If nothing else, there's dd and photorec..." Except, after trying to mount the Dell utility partition, listing its contents gave us crazy garbage.

"Oh, well," I thought, "dd it is." So I ran dd if=/dev/sda of=/mnt/dev.sda (having NFS mounted a network drive on /mnt) and left for the day (it takes a while...).

The next morning, I saw the file's size was 120GB...which is HALF of what I expected to see. Only THEN did I notice that Ubuntu was also seeing a /dev/sdb drive.

See, the two disks were plugged into a "RAID controller" card that we had all assumed made this "Hardware RAID", so that the OS didn't have to know anything to use it. Apparently, though, that "RAID controller" card relied on some software component under Windows to function right, and Ubuntu did not know it was supposed to be RAID.

Vern suggested we throw the disks into a software RAID utility to see what came out. Unfortunately, nothing.

So we felt pretty much defeated, since even if we got all the data off both disks, it'd be scrambled and useless. Still, for some reason, I couldn't quit thinking about this problem. It seemed like there should be SOMETHING we could do with the data from the disks to recombine them into something useful.

I needled Vern with questions about how RAID 0 works, and looked it up online myself. IF RAID 0 worked as simply as I figured it SHOULD, I should be able to figure out how to solve this problem. I had a fuzzy idea to work from, no experience with anything like this, and I could not say I'd succeed. However, I figured that saving several years' worth of this teacher's work was worth giving it my best shot.

I would write a Python program to "de-interlace" the two disks into one image. I had already written a program that worked with the sorts of tools I would need to use (though it was written with the opposite problem in mind) during my trek through the Python Challenge, so I looked it up, modified it, and came up with a simple program that would:

  • Open two input files to read from (in binary mode).
  • Open an output file to write to (in binary mode).
  • Take a certain amount of data from the first file and write it to the output file, then take a certain amount of data from the second file and write it to the output file.
  • Wash, rinse, repeat until there's no more to read from either input file.
  • Close out all files.

All I needed to know was, what is "a certain amount" supposed to be? I could only find offers to sell me commercial solutions and vague descriptions of RAID theory, no concrete implementation details, through Google. I decided it can't be THAT hard to figure out, right? ;-).

Well, computers being computers, we figured it probably had to be a power of 2. I grabbed the first 100,000 sectors of each drive's image to play with (drive 2, the failed one, STILL has dd grinding away at it as I type; recovering its data is taking at least an order of magnitude longer than it did for the first, functional, disk). I tried a few guesses (512, 4096, etc.) to no avail: I would go to mount the Dell utility partition and ls's output came back scrambled.

I decided to instead pull out a hex editor and take a look at the raw data, to see if I could find any patterns. I found all KINDS of patterns :-), none of which seemed to pan out when it came to testing them, until I found this strange sequence of bytes, a pattern at the start of /dev/sdb:

00000000  01 40 02 40 03 40 04 40  05 40 06 40 07 40 08 40  |.@.@.@.@.@.@.@.@|
00000010  09 40 0a 40 0b 40 0c 40  0d 40 0e 40 0f 40 10 40  |.@.@.@.@.@.@.@.@|
00000020  11 40 12 40 13 40 14 40  15 40 16 40 17 40 18 40  |.@.@.@.@.@.@.@.@|
00000030  19 40 1a 40 1b 40 1c 40  1d 40 1e 40 1f 40 20 40  |.@.@.@.@.@.@.@ @|
00000040  21 40 22 40 23 40 24 40  25 40 26 40 27 40 28 40  |!@"@#@$@%@&@'@(@|
...[etc.]...

Then, later at some point, I noticed this strange sequence in /dev/sda:

...[etc.]...
0000ffd0  e9 3f ea 3f eb 3f ec 3f  ed 3f ee 3f ef 3f f0 3f  |.?.?.?.?.?.?.?.?|
0000ffe0  f1 3f f2 3f f3 3f f4 3f  f5 3f f6 3f f7 3f f8 3f  |.?.?.?.?.?.?.?.?|
0000fff0  f9 3f fa 3f fb 3f fc 3f  fd 3f fe 3f ff 3f 00 40  |.?.?.?.?.?.?.?.@|
00010000  91 89 76 7c 8c 5e 7e 66  c7 86 80 00 ff ff ff ff  |..v|.^~f........|
00010010  8b d6 03 56 0b 89 56 78  8c 5e 7a 1e b8 70 00 8e  |...V..Vx.^z..p..|
00010020  d8 8e c0 33 ff f3 a4 1f  89 7e 72 8c 46 74 81 c7  |...3.....~r.Ft..|
...[etc.]...

A strange break in a strange pattern at precisely a power of 2! On further investigation, the start of /dev/sdb picked up the strange pattern exactly where /dev/sda left off. So I took ffff, which is 65535 in decimal, and plugged that into my script. This time the ls output looked much more promising, though still weird, with filenames like sdos s.ys, onfig s.ys and utoexecb.at.

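D'oh! I was one byte off! (ffff is the offset of the LAST byte in the chunk, so the chunk itself is 10000 in hex, or 65536 bytes, long.) So I added a byte and Bam! The Dell utility partition worked precisely as it should.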
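
I found that break by eye in a hex editor, but the same check can be sketched in Python. The pattern in the dumps reads as 16-bit little-endian numbers counting up by one (e9 3f is 3fe9, ea 3f is 3fea, and so on), so the hypothetical helper below walks along such a run and reports the offset where it stops. The fake data is a stand-in I built to mimic the dumps above, not the real disk contents.

```python
import struct

def counter_run_end(data, start=0):
    """Return the offset where a run of incrementing 16-bit little-endian words stops."""
    previous = struct.unpack_from("<H", data, start)[0]
    offset = start + 2
    while offset + 2 <= len(data):
        current = struct.unpack_from("<H", data, offset)[0]
        if current != (previous + 1) & 0xFFFF:
            break                  # this word does not continue the count
        previous = current
        offset += 2
    return offset

# Fake /dev/sda: zero padding, then the counter running up to 4000 (hex)
# right before offset 10000 (hex), then unrelated bytes as in the dump.
run_start = 0x10000 - 2 * 256
fake_sda = (bytes(run_start)
            + b"".join(struct.pack("<H", v) for v in range(0x3F01, 0x4001))
            + bytes.fromhex("9189767c8c5e7e66"))
# Fake /dev/sdb: the counter picking up at 4001 (hex).
fake_sdb = b"".join(struct.pack("<H", v) for v in range(0x4001, 0x4011))

stripe_size = counter_run_end(fake_sda, start=run_start)
last_word_sda = struct.unpack_from("<H", fake_sda, stripe_size - 2)[0]
first_word_sdb = struct.unpack_from("<H", fake_sdb, 0)[0]
continues_on_sdb = first_word_sdb == last_word_sda + 1
print(hex(stripe_size), stripe_size, continues_on_sdb)
```

The offset it reports is where the NEXT chunk begins, which is the chunk length itself. That sidesteps the off-by-one I tripped over.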

So, strictly speaking, I don't know if we'll succeed in getting the data off the array. We'll have to wait a few days to find out how the rest of the second hard drive fares, as it's taking an inordinately long time to get that data off. However, whatever happens from here, my theory paid off!

That's a beautiful thing for a geek like me. :-D

Oh, and it's cool that we might be able to save all/some of that data, too. ;-)

Source

#!/usr/bin/python
# -*- coding: utf-8 -*-
#
# deinterlace.py
#
# INTENT = This is a script for deinterlacing two raw dd images
#     taken from a failed RAID 0 array into one "valid" image file
#     that we hope to be able to recover data from.
#
#          This is strictly experimental.
#
#                               Thursday, May 1, 2008 -Simón A. Ruiz
#

inputFiles = [open("dev.sda","rb"),open("dev.sdb","rb")]
outputFile = open("output","wb")
chunkSize = 65536

# And, so as not to have to figure this out every time through the loop...
numFiles = len(inputFiles)

i = 0
while True:
    nextChunk = inputFiles[i%numFiles].read(chunkSize)
    if not nextChunk:
        print('Done! No more data.')
        break
    outputFile.write(nextChunk)
    i += 1

outputFile.close()
for f in inputFiles:
    f.close()
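
For anyone who wants to play with the idea without a dead RAID array handy, here is the same loop reworked as a function, plus its opposite, as a sketch. The names stripe and deinterlace are just for illustration. The demonstration works on in-memory data with a made-up chunk size of 64 bytes. It splits the data across two fake "disks", puts it back together, and then tries again with the wrong chunk size to show the kind of scrambling I kept running into.

```python
import io

def stripe(source, outputs, chunk_size):
    """Deal chunk_size pieces of source out to each output in turn, RAID 0 style."""
    i = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        outputs[i % len(outputs)].write(chunk)
        i += 1

def deinterlace(inputs, output, chunk_size):
    """Read chunk_size pieces from each input in turn and append them to output."""
    i = 0
    while True:
        chunk = inputs[i % len(inputs)].read(chunk_size)
        if not chunk:
            break
        output.write(chunk)
        i += 1

original = bytes(range(256)) * 10          # 2560 bytes of test data
disks = [io.BytesIO(), io.BytesIO()]
stripe(io.BytesIO(original), disks, chunk_size=64)
for disk in disks:
    disk.seek(0)                           # rewind before reading back

rebuilt = io.BytesIO()
deinterlace(disks, rebuilt, chunk_size=64)
round_trip_ok = rebuilt.getvalue() == original

# With a chunk size that is one byte off, the data comes back shuffled.
for disk in disks:
    disk.seek(0)
scrambled = io.BytesIO()
deinterlace(disks, scrambled, chunk_size=63)
wrong_size_ok = scrambled.getvalue() == original
print(round_trip_ok, wrong_size_ok)
```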

Stay tuned for a post on mounting disk images as if they were real disks.

[My next post talks about disk images in depth]