
NetBSD RAIDFRAME Recovery

Posted: Fri May 01, 2009 2:41 am
by cynjut
I've got a 4x250G RAID 5 array that I'm trying to recover after a bad sector in the superblock of one of the drive partitions. I've got the RAID array loaded into a machine that I'm dual-booting between NetBSD and Windows XP.

The configuration of the array is 1 row, 4 columns, 0 spares, 63 sectors per Stripe Unit, 1 SU per Parity Unit, 1 SU per recon unit, and RAID-5 parity. We chose 63 sectors per stripe unit because that's the size of 1 track.

On the RAID there are two partitions: one starting at sector 3024 and running for 20971520 sectors (10 Gig), and one starting at sector 20974544 and running through the end of the disk (1444216582 sectors).
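For anyone checking the arithmetic, here is a minimal Python sketch of the geometry described above. It assumes 512-byte sectors (the post does not state the sector size) and that a 4-column RAID 5 carries 3 stripe units of data per stripe. The constant and function names are made up for illustration.

```python
# Geometry taken from the post. The 512-byte sector size is an assumption.
SECTOR_BYTES = 512
SECTORS_PER_SU = 63           # sectors per stripe unit
COLUMNS = 4
DATA_COLUMNS = COLUMNS - 1    # RAID 5: one stripe unit per stripe holds parity

PART1_START = 3024
PART1_LEN = 20971520
PART2_START = 20974544


def stripe_unit_bytes():
    """Size of one stripe unit in bytes."""
    return SECTORS_PER_SU * SECTOR_BYTES


def sectors_per_full_stripe():
    """Data sectors in one full stripe across all data columns."""
    return SECTORS_PER_SU * DATA_COLUMNS


def offset_in_stripe(sector):
    """Sector offset from the start of the full stripe containing `sector`."""
    return sector % sectors_per_full_stripe()


if __name__ == "__main__":
    print("stripe unit:", stripe_unit_bytes() / 1024, "KiB")
    print("partition 1 ends at sector", PART1_START + PART1_LEN)
    print("partition 1 offset in stripe:", offset_in_stripe(PART1_START))
    print("partition 2 offset in stripe:", offset_in_stripe(PART2_START))
```

Under those assumptions the stripe unit comes out at 31.5 KiB, the first partition starts exactly on a full-stripe boundary, and the second partition does not.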

The second partition had the bad superblock, and when I set a region so that it corresponds to that partition, the system finds a superblock some negative number of sectors away (I didn't write the number down, but it seems to me it was something like -168 sectors).

I'm in the process of copying the RAID array onto a 1 TB drive, but that takes about 20 hours. Once I'm done copying the fs image to the new drive, I expect the software will be able to handle the drive just fine (since the RAID components will be obscured).

Am I on the right track? Is the software just misinterpreting the block size (32K instead of 31.5K)? Or am I going completely the wrong way? I really need to get started pulling the files off this disk ASAP, and waiting 22 hours for a 1 TB disk-to-disk copy is a lot of time if there is some other solution.
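On the block-size question, a rough Python sketch of how the same logical sector lands in a different stripe unit if a tool rounds the stripe unit up from 63 to 64 sectors. This only illustrates the arithmetic. Whether any recovery tool actually does this rounding is an assumption, and the function names are hypothetical.

```python
def locate(logical_sector, sectors_per_su):
    """Return (stripe unit index, sector offset inside that stripe unit)."""
    return divmod(logical_sector, sectors_per_su)


def drift(su_index, real=63, assumed=64):
    """Sectors by which the start of stripe unit `su_index` is misplaced
    when `assumed` sectors per stripe unit are used instead of `real`."""
    return su_index * (assumed - real)


if __name__ == "__main__":
    # Start of the first partition from the post, under both assumptions.
    print(locate(3024, 63))
    print(locate(3024, 64))
    print(drift(48))
```

The point of the sketch is that the error is not constant: it grows by one sector for every stripe unit, so a wrong stripe size would put structures further off the deeper into the array they sit.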

Re: NetBSD RAIDFRAME Recovery

Posted: Sat May 02, 2009 4:18 am
by Alt
Which software do you use to copy the RAID? Is this R-Studio creating an image of the RAID?

Re: NetBSD RAIDFRAME Recovery

Posted: Sat May 02, 2009 11:49 am
by cynjut
I used 'dd' - my 'recovery console' boots NetBSD and XP, so I can easily switch back and forth between the two systems.

The exact command is "dd conv=noerror,sync progress=16 if=/dev/raid0d of=/dev/wd2d"

This way, if there's an error, the system will ignore it and copy NULs into what would have been a bad sector. It prints a '.' once per Gig of data. Since I'm using the alternate superblock for everything I can, the fact that there was a bad sector in the primary superblock should be OK. It would be cool if I could specify a different superblock for UFS1 and UFS2 drives.
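To make the effect of conv=noerror,sync concrete, here is a minimal Python sketch of the same idea: keep going after a read error and pad the failed block with NULs so every later offset in the copy still lines up with the source. It is not how dd is implemented. `copy_zero_fill` and the `FlakySource` test stand-in are hypothetical names, and the 512-byte block size is an assumption.

```python
import io


def copy_zero_fill(src, dst, total_blocks, block_size=512):
    """Copy `total_blocks` blocks from src to dst.

    A block that raises OSError on read is replaced with NUL bytes, and a
    short read is padded with NULs, so dst offsets stay aligned with src.
    Returns the list of block indexes that could not be read.
    """
    bad = []
    for index in range(total_blocks):
        try:
            src.seek(index * block_size)
            chunk = src.read(block_size)
        except OSError:
            chunk = b""
            bad.append(index)
        dst.write(chunk.ljust(block_size, b"\0"))
    return bad


class FlakySource(io.BytesIO):
    """Stand-in for a disk: any read starting in `bad_block` fails."""

    def __init__(self, data, bad_block, block_size=512):
        super().__init__(data)
        self.bad_block = bad_block
        self.block_size = block_size

    def read(self, size=-1):
        if self.tell() // self.block_size == self.bad_block:
            raise OSError("simulated unreadable sector")
        return super().read(size)


if __name__ == "__main__":
    data = b"\x01" * 512 + b"\x02" * 512 + b"\x03" * 512
    out = io.BytesIO()
    print("unreadable blocks:", copy_zero_fill(FlakySource(data, 1), out, 3))
```

Returning the list of bad block indexes is a design choice of the sketch. It gives a record of which regions of the image are zero fill rather than real data.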

I might have used the R-TT tools, but with the "odd" settings, the drive wasn't being interpreted correctly. The 31.5K stripe unit size might have confused it a little bit.

On to the next question - it looks like a lot of my directory nodes are damaged, so a LOT of my files are ending up in the R-TT version of "Lost+Found" with 'inode number' as their file name. I'm getting ready to write a filename finder that scans the drive and tries to find all of the inodes associated with things that look like filenames. I can't really trust the directory structure - it looks pretty seriously hosed. Since all of the files on the hard drive have names in the same format "200[789][01][0-9][0-3][0-9]-[012][0-9][0-6][0-9][0-6][0-9]-[0-9]*.[0-9]*.mp3", I should be able to scan the drive, find anything that looks like that, back up 8 bytes and display the 4 bytes there as an unsigned long. That will give me the equivalent of "ls -ri /" without crashing the server with "Page fault in supervisor mode - bad directory entry".
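A minimal Python sketch of the scanner described above, under several assumptions: the pattern from the post is read as a regular expression with the dots escaped, the inode number is stored little-endian (i386), and it sits 8 bytes before the name, which matches the usual FFS directory entry layout (4-byte inode, 2-byte record length, 1-byte type, 1-byte name length, then the name). `find_entries` is a hypothetical name, and the `back` distance is a parameter in case the layout differs.

```python
import re
import struct

# The filename pattern from the post, treated as a regex (dots escaped).
NAME_RE = re.compile(
    rb"200[789][01][0-9][0-3][0-9]-[012][0-9][0-6][0-9][0-6][0-9]"
    rb"-[0-9]*\.[0-9]*\.mp3"
)


def find_entries(image, back=8):
    """Yield (inode, name, name_offset) for each filename match in `image`.

    `image` is any bytes-like object. The inode is read as a 4-byte
    little-endian unsigned integer `back` bytes before the name.
    """
    for match in NAME_RE.finditer(image):
        start = match.start()
        if start < back:
            continue  # not enough room before the name for an inode field
        (inode,) = struct.unpack_from("<I", image, start - back)
        yield inode, match.group().decode("ascii"), start


if __name__ == "__main__":
    name = b"20080412-153045-5551234.101.mp3"
    # Fake directory entry: inode, record length, type, name length, name.
    entry = struct.pack("<IHBB", 4242, 44, 8, len(name)) + name + b"\0"
    for inode, found, offset in find_entries(b"\0" * 100 + entry):
        print(inode, found, hex(offset))
```

The sketch works on a buffer already in memory. For a whole drive you would probably memory-map the image or read it in overlapping chunks instead.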

If someone's already written that program, I'd love to hear from you....

Re: NetBSD RAIDFRAME Recovery

Posted: Sun May 03, 2009 12:09 pm
by cynjut
Posting to my own reply....

I've written a one-line shell script (strings -tx /dev/wd2f | grep '200[789]...' > files) that grabs all of the file names meeting my search criteria out of the file system, along with their byte offsets into the file system. After I get them all into a file, I'll write a Perl script to take the offsets and file names, convert each offset to a decimal number (for lseek), subtract 16 from it, and grab the next 4 bytes as an unsigned long. That will give me the inode number for the file (if I'm reading the source code right). I'll then search for the files named "inode whatever" and rename them according to the names from the strings file. I could probably do it all in one swell foop if I wanted to (use 'cut' and 'expr' on the first argument, grab the second arg unchanged, and create a command to do the rename on the fly).
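The post-processing step could be sketched in Python rather than Perl roughly as follows. Assumptions: each line of the strings file is a hex offset followed by the name, the inode is little-endian, and the recovered files are literally named "inode N" (a placeholder, since the exact naming used by the recovery tool isn't given). The backward distance defaults to the 16 bytes mentioned here, but the earlier post says 8, so it is left as a parameter. All function names are hypothetical.

```python
import io
import struct


def parse_strings_line(line):
    """Split one `strings -tx` output line into (byte offset, text)."""
    offset_hex, text = line.split(None, 1)
    return int(offset_hex, 16), text.rstrip("\n")


def read_inode(image, name_offset, back=16):
    """Read the 4-byte little-endian value `back` bytes before the name."""
    image.seek(name_offset - back)
    (inode,) = struct.unpack("<I", image.read(4))
    return inode


def rename_plan(lines, image, back=16, recovered_fmt="inode {}"):
    """Return (recovered name, real name) pairs. Nothing is renamed here."""
    plan = []
    for line in lines:
        offset, name = parse_strings_line(line)
        inode = read_inode(image, offset, back)
        plan.append((recovered_fmt.format(inode), name))
    return plan


if __name__ == "__main__":
    name = "20080412-153045-5551234.101.mp3"
    # Fake image: inode value 777 stored 16 bytes before the name at 0x30.
    fake = io.BytesIO(
        b"\0" * 32 + struct.pack("<I", 777) + b"\0" * 12 + name.encode()
    )
    for old, new in rename_plan(["     30 " + name + "\n"], fake):
        print(old, "->", new)
```

Building a plan of (old, new) pairs instead of renaming on the fly keeps with the stated goal of seeing what's happening before touching any recovered file.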

This way, I can see what's happening and not chance screwing up the files that I've managed to recover any more than I have.

BTW, has anyone written an XML definition for a UFS file system directory entry?