Fedora Core 5 LV Disk recovery or When the proverbial S..t hits the fan

 

My case:

A 160GB drive's loose power cable caused the drive to eventually stop running, and some serious damage was done to the file system. When I managed to get the drive back online and ran fsck, it took about an hour and a half for all the corrections to be applied, and I had to run fsck several times before I had a clean drive again.

All data was recovered.

 

Just to add a few things to the excellent article below:

I'm assuming a standard installation as set up by the automatic partitioning of the Fedora Core 5 installer, and no RAID.

Understand that it's not /dev/hda2 that you will be addressing but /dev/VolGroup00/LogVol00: in a standard Fedora Core 5 installation, /dev/hda2 is the physical volume behind VolGroup00, which holds both the root logical volume and the swap logical volume.
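For example, from the rescue shell you can confirm which device actually holds the root file system. This is a minimal sketch assuming the default Fedora Core 5 names; in the rescue environment the LVM tools are usually reached through the lvm wrapper command.

lvm vgscan                               # look for volume groups
lvm vgchange -a y VolGroup00             # activate the group
lvm lvscan                               # shows /dev/VolGroup00/LogVol00 (root) and LogVol01 (swap)
fsck -y -f /dev/VolGroup00/LogVol00      # fsck the logical volume, not /dev/hda2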

Note that the article below was written for a somewhat older Fedora version, so not everything is exactly as described, but it's close enough that you should be able to figure things out. If you cannot, email me, Skype me or message me.

email: Anthony.Dawson@thelasis.com 
Skype ID: aegdawson
ICQ ID: 23-227-727
Messenger: aegdawson
MSN: anthony.dawson@hotmail.com

Any fsck -y -f should be run against the LV, so when you are done with the recovery process you will probably want to do a final clean-up pass against it.

Also, if the file system superblock is totally screwed you will need to use the -b option to select one of the backup superblock copies (of which there are many throughout the disk). If you don't know where they are, use:

mke2fs -n /dev/hda2: this will list the locations of a whole bunch of superblock copies without actually creating a file system.
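For instance, something along these lines (the block number is illustrative; use one of the values mke2fs -n actually prints, run it against the same device you intend to fsck, and note that the reported locations assume the file system was created with default parameters):

mke2fs -n /dev/VolGroup00/LogVol00               # dry run: prints backup superblock locations, writes nothing
e2fsck -f -y -b 32768 /dev/VolGroup00/LogVol00   # repair using the backup superblock at block 32768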

Apart from the rest, I had to issue a pvcreate command within the lvm environment in order to re-create the LVM physical volume label, which I had previously deleted in ignorance…

Also note that the article renames the volume group so as not to conflict with the one on the sane installation. I booted Linux off the first installation disc in "linux rescue" mode and repaired the disk to the point where I could finally boot the system up again.

In my case the process was the following (a condensed command sketch follows the list):

1)    Boot off the Fedora boot disk, press F5 and type in linux rescue.

2)    Use parted to check that the partitions still exist.

3)    Run fsck -y -f /dev/hda1; note that /dev/hda1 (/boot) is not an LV.

4)    Use dd to extract the volume information.

5)    Use vi to edit the volume information and create a VolGroup00 configuration file.

6)    Use pvcreate with the right drive ID to label the volume with the correct UUID, which I got from the volume information extracted in step 4.

7)    Use vgcfgrestore to restore the volume description.

8)    Use vgchange to make the volume active.

9)    Use fsck -y -f /dev/VolGroup00/LogVol00 to repair the file system.

10)   If that fails, use mke2fs -n /dev/hda2 to find the locations of the backup superblocks, and then

11)   use fsck -y -f -b nnnn /dev/VolGroup00/LogVol00, where nnnn is the block number of the backup superblock.

12)   Reboot.
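Put together as commands, the sequence looks roughly like this. This is a sketch only: the device names assume the default Fedora Core 5 layout, the file name and the UUID placeholder are mine, and the exact LVM options may vary with the rescue disc's LVM2 version.

# 2) confirm the partition table survived
parted /dev/hda print

# 3) /boot lives on a plain partition, so fsck it directly
fsck -y -f /dev/hda1

# 4) copy the LVM metadata area from the start of the physical volume
dd if=/dev/hda2 bs=512 count=255 skip=1 of=/tmp/hda2-meta

# 5) trim /tmp/hda2-meta down to just the VolGroup00 { ... } text block
vi /tmp/hda2-meta

# 6) relabel the physical volume with the UUID recorded in that metadata
lvm pvcreate -u <pv-uuid-from-the-metadata> /dev/hda2

# 7) restore the volume group description from the edited file
lvm vgcfgrestore -f /tmp/hda2-meta VolGroup00

# 8) activate the volume group
lvm vgchange -a y VolGroup00

# 9) repair the root file system on the logical volume
fsck -y -f /dev/VolGroup00/LogVol00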

 

If you're lucky, you're back in business. :-)

 

Recovery of RAID and LVM2 Volumes

From Issue #146
June 2006

Apr 28, 2006  By Richard Bullington-McGuire

RAID and Logical Volume Managers are great, until you lose data.

Restoring Access to the RAID Array Members

To recover, the first thing to do is to move the drive to another machine. You can do this pretty easily by putting the drive in a USB2 hard drive enclosure. It then will show up as a SCSI hard disk device, for example, /dev/sda, when you plug it in to your recovery computer. This reduces the risk of damaging the recovery machine while attempting to install the hardware from the original computer.

The challenge then is to get the RAID setup recognized and to gain access to the logical volumes within. You can use sfdisk -l /dev/sda to check that the partitions on the old drive are still there.

To get the RAID setup recognized, use mdadm to scan the devices for their raid volume UUID signatures, as shown in Listing 3.

Listing 3. Scanning a Disk for RAID Array Members

[root@recoverybox ~]# mdadm --examine --scan  /dev/sda1 /dev/sda2 /dev/sda3

ARRAY /dev/md2 level=raid1 num-devices=2

 UUID=532502de:90e44fb0:242f485f:f02a2565

   devices=/dev/sda3

ARRAY /dev/md1 level=raid1 num-devices=2

 UUID=75fa22aa:9a11bcad:b42ed14a:b5f8da3c

   devices=/dev/sda2

ARRAY /dev/md0 level=raid1 num-devices=2

 UUID=b3cd99e7:d02be486:b0ea429a:e18ccf65

   devices=/dev/sda1

This format is very close to the format of the /etc/mdadm.conf file that the mdadm tool uses. You need to redirect the output of mdadm to a file, join the device lines onto the ARRAY lines and put in a nonexistent second device to get a RAID1 configuration. Viewing the md array in degraded mode will allow data recovery:

[root@recoverybox ~]# mdadm --examine --scan  /dev/sda1

 /dev/sda2 /dev/sda3 >> /etc/mdadm.conf

[root@recoverybox ~]# vi /etc/mdadm.conf

Edit /etc/mdadm.conf so that the devices statements are on the same lines as the ARRAY statements, as they are in Listing 4. Add the “missing” device to the devices entry for each array member to fill out the raid1 complement of two devices per array. Don't forget to renumber the md entries if the recovery computer already has md devices and ARRAY statements in /etc/mdadm.conf. (A scripted version of the line-joining step is sketched after the listing.)

Listing 4. /etc/mdadm.conf

DEVICE partitions

ARRAY /dev/md0 level=raid1 num-devices=2

 UUID=b3cd99e7:d02be486:b0ea429a:e18ccf65

 devices=/dev/sda1,missing

ARRAY /dev/md1 level=raid1 num-devices=2

 UUID=75fa22aa:9a11bcad:b42ed14a:b5f8da3c

 devices=/dev/sda2,missing

ARRAY /dev/md2 level=raid1 num-devices=2

 UUID=532502de:90e44fb0:242f485f:f02a2565

 devices=/dev/sda3,missing
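If you would rather script the line-joining step than do it in an editor, something along these lines should work. It is only a sketch, assuming the scan output uses indented continuation lines as in Listing 3; you still append ,missing and renumber the md entries by hand.

mdadm --examine --scan /dev/sda1 /dev/sda2 /dev/sda3 > /tmp/md-scan
# fold every indented continuation line up onto the line before it
sed -e ':a' -e 'N' -e '$!ba' -e 's/\n[[:space:]][[:space:]]*/ /g' /tmp/md-scan >> /etc/mdadm.conf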

Then, activate the new md devices with mdadm -A -s, and check /proc/mdstat to verify that the RAID array is active. Listing 5 shows how the raid array should look.

Listing 5. Reactivating the RAID Array

 

[root@recoverybox ~]# mdadm -A -s

[root@recoverybox ~]# cat /proc/mdstat

Personalities : [raid1]

md2 : active raid1 sda3[1]

      77521536 blocks [2/1] [_U]

 

md1 : active raid1 sda2[1]

      522048 blocks [2/1] [_U]

 

md0 : active raid1 sda1[1]

      104320 blocks [2/1] [_U]

 

unused devices: <none>

 

If md devices show up in /proc/mdstat, all is well, and you can move on to getting the LVM volumes mounted again.

Recovering and Renaming the LVM2 Volume

The next hurdle is that the system now will have two sets of LVM2 disks with VolGroup00 in them. Typically, the vgchange -a y command would allow LVM2 to recognize a new volume group. That won't work if devices containing identical volume group names are present, though. Issuing vgchange -a y will report that VolGroup00 is inconsistent, and the VolGroup00 on the RAID device will be invisible. To fix this, you need to rename the volume group that you are about to mount on the system by hand-editing its LVM configuration file.

If you made a backup of the files in /etc on raidbox, you can edit a copy of the file /etc/lvm/backup/VolGroup00 so that it reads VolGroup01 or RestoreVG or whatever you want the group to be named on the system you are going to restore under; be sure to rename the volume group inside the file itself, not just the file.

If you don't have a backup, you can re-create the equivalent of an LVM2 backup file by examining the LVM2 header on the disk and editing out the binary stuff. LVM2 typically keeps copies of the metadata configuration at the beginning of the disk, in the first 255 sectors following the partition table in sector 1 of the disk. See /etc/lvm/lvm.conf and man lvm.conf for more details. Because each disk sector is typically 512 bytes, reading this area will yield a 128KB file. LVM2 may have stored several different text representations of the configuration within that first 128KB of the partition itself. Extract these to an ordinary file as follows, then edit the file:

dd if=/dev/md2 bs=512 count=255 skip=1 of=/tmp/md2-raw-start

vi /tmp/md2-raw-start

You will see some binary gibberish, but look for the bits of plain text. LVM treats this metadata area as a ring buffer, so there may be multiple configuration entries on the disk. On my disk, the first entry had only the details for the physical volume and volume group, and the next entry had the logical volume information. Look for the block of text with the most recent timestamp, and edit out everything except the block of plain text that contains the LVM declarations. This has the volume group declarations, including the logical volume information. Fix up the physical device declarations if needed. If in doubt, look at the existing /etc/lvm/backup/VolGroup00 file to see what is there. On disk, the text entries are not as nicely formatted and are in a different order than in the normal backup file, but they will do. Save the trimmed configuration as VolGroup01. This file should then look like Listing 6.
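If picking the text out of the binary in vi is tedious, the strings utility can help you locate the configuration blocks first. The file name matches the dd command above; you still assemble the chosen block into the new file by hand.

strings /tmp/md2-raw-start | less                       # page through just the printable text
strings /tmp/md2-raw-start | grep 'Generated by LVM2'   # list the timestamps of the stored configurations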

 

Listing 6. Modified Volume Group Configuration File

VolGroup01 {

id = "xQZqTG-V4wn-DLeQ-bJ0J-GEHB-4teF-A4PPBv"

seqno = 1

status = ["RESIZEABLE", "READ", "WRITE"]

extent_size = 65536

max_lv = 0

max_pv = 0

 

physical_volumes {

 

pv0 {

id = "tRACEy-cstP-kk18-zQFZ-ErG5-QAIV-YqHItA"

device = "/dev/md2"

 

status = ["ALLOCATABLE"]

pe_start = 384

pe_count = 2365

}

}

 

# Generated by LVM2: Sun Feb  5 22:57:19 2006

Once you have a volume group configuration file, migrate the volume group to this system with vgcfgrestore, as Listing 7 shows.

Listing 7. Activating the Recovered LVM2 Volume

[root@recoverybox ~]# vgcfgrestore -f VolGroup01 VolGroup01

[root@recoverybox ~]# vgscan

  Reading all physical volumes.  This may take a while...

  Found volume group "VolGroup01" using metadata type lvm2

  Found volume group "VolGroup00" using metadata type lvm2

[root@recoverybox ~]# pvscan

  PV /dev/md2    VG VolGroup01   lvm2 [73.91 GB / 32.00 MB free]

  PV /dev/hda2   VG VolGroup00   lvm2 [18.91 GB / 32.00 MB free]

  Total: 2 [92.81 GB] / in use: 2 [92.81 GB] / in no VG: 0 [0   ]

[root@recoverybox ~]# vgchange VolGroup01 -a y

  1 logical volume(s) in volume group "VolGroup01" now active

[root@recoverybox ~]# lvscan

  ACTIVE            '/dev/VolGroup01/LogVol00' [73.88 GB] inherit

  ACTIVE            '/dev/VolGroup00/LogVol00' [18.38 GB] inherit

  ACTIVE            '/dev/VolGroup00/LogVol01' [512.00 MB] inherit

At this point, you can now mount the old volume on the new system, and gain access to the files within, as shown in Listing 8.

Listing 8. Mounting the Recovered Volume

[root@recoverybox ~]# mount /dev/VolGroup01/LogVol00 /mnt

[root@recoverybox ~]# df -h

Filesystem            Size  Used Avail Use% Mounted on

/dev/mapper/VolGroup00-LogVol00

                       19G  4.7G   13G  28% /

/dev/hda1              99M   12M   82M  13% /boot

none                  126M     0  126M   0% /dev/shm

/dev/mapper/VolGroup01-LogVol00

                       73G  2.5G   67G   4% /mnt

# ls -l /mnt

total 200

drwxr-xr-x   2 root root  4096 Feb  6 02:36 bin

drwxr-xr-x   2 root root  4096 Feb  5 18:03 boot

drwxr-xr-x   4 root root  4096 Feb  5 18:03 dev

drwxr-xr-x  79 root root 12288 Feb  6 23:54 etc

drwxr-xr-x   3 root root  4096 Feb  6 01:11 home

drwxr-xr-x   2 root root  4096 Feb 21  2005 initrd

drwxr-xr-x  11 root root  4096 Feb  6 02:36 lib

drwx------   2 root root 16384 Feb  5 17:59 lost+found

drwxr-xr-x   3 root root  4096 Feb  6 22:12 media

drwxr-xr-x   2 root root  4096 Oct  7 09:03 misc

drwxr-xr-x   2 root root  4096 Feb 21  2005 mnt

drwxr-xr-x   2 root root  4096 Feb 21  2005 opt

drwxr-xr-x   2 root root  4096 Feb  5 18:03 proc

drwxr-x---   5 root root  4096 Feb  7 00:19 root

drwxr-xr-x   2 root root 12288 Feb  6 22:37 sbin

drwxr-xr-x   2 root root  4096 Feb  5 23:04 selinux

drwxr-xr-x   2 root root  4096 Feb 21  2005 srv

drwxr-xr-x   2 root root  4096 Feb  5 18:03 sys

drwxr-xr-x   3 root root  4096 Feb  6 00:22 tftpboot

drwxrwxrwt   5 root root  4096 Feb  7 00:21 tmp

drwxr-xr-x  15 root root  4096 Feb  6 22:33 usr

drwxr-xr-x  20 root root  4096 Feb  5 23:15 var

Now that you have access to your data, a prudent final step would be to back up the volume group information with vgcfgbackup, as Listing 9 shows.

Listing 9. Backing Up Recovered Volume Group Configuration

[root@teapot-new ~]# vgcfgbackup

Volume group "VolGroup01" successfully backed up.

Volume group "VolGroup00" successfully backed up.

[root@teapot-new ~]# ls -l /etc/lvm/backup/

total 24

-rw-------  1 root root 1350 Feb 10 09:09 VolGroup00

-rw-------  1 root root 1051 Feb 10 09:09 VolGroup01

Conclusion

LVM2 and Linux software RAID make it possible to create economical, reliable storage solutions with commodity hardware. One trade-off involved is that some procedures for recovering from failure situations may not be clear. A tool that reliably extracted old volume group information directly from the disk would make recovery easier. Fortunately, the designers of the LVM2 system had the wisdom to keep plain-text backup copies of the configuration on the disk itself. With a little patience and some research, I was able to regain access to the logical volume I thought was lost; may you have as much success with your LVM2 and RAID installation.

Resources for this article: /article/8948.

Richard Bullington-McGuire is the Managing Partner of PKR Internet, LLC, a software and systems consulting firm in Arlington, Virginia, specializing in Linux, Open Source and Java. He has been a Linux sysadmin since 1994. You can reach him at rbulling@pkrinternet.com.