I referenced these instructions to remind me how to replace a drive.
In my case the output of /proc/mdstat looks like this:
# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda2[0] sdb2[1]
33615936 blocks [2/2] [UU]
md1 : active raid1 sda3[2](F) sdb3[1]
2096384 blocks [2/1] [_U]
md0 : active raid1 sda1[0] sdb1[1]
128384 blocks [2/2] [UU]
unused devices: &lt;none&gt;
So I have three RAID 1 arrays, each mirroring a pair of matching partitions across the two drives, and sda3 is failing. This is the message I received by email:
This is an automatically generated mail message from mdadm running on host.domain.com
A Fail event had been detected on md device /dev/md1.
Faithfully yours, etc.
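To confirm which component actually failed before touching anything, mdadm can report the array state directly; the exact output varies with the mdadm version, but something like this shows sda3 marked faulty and the array running degraded:
mdadm --detail /dev/md1
The alert mail itself comes from mdadm's monitor mode, which on most distributions runs as a daemon and mails whatever address is set with MAILADDR in the mdadm configuration file (commonly /etc/mdadm.conf or /etc/mdadm/mdadm.conf).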
For reference, here is the partition layout on both drives and the sizes of the three RAID devices:
Device Boot Start End Blocks Id System
/dev/sda1 * 1 16 128488+ fd Linux raid autodetect
/dev/sda2 17 4201 33616012+ fd Linux raid autodetect
/dev/sda3 4202 4462 2096482+ fd Linux raid autodetect
Device Boot Start End Blocks Id System
/dev/sdb1 * 1 16 128488+ fd Linux raid autodetect
/dev/sdb2 17 4201 33616012+ fd Linux raid autodetect
/dev/sdb3 4202 4462 2096482+ fd Linux raid autodetect
Disk /dev/md0: 131 MB, 131465216 bytes
Disk /dev/md1: 2146 MB, 2146697216 bytes
Disk /dev/md2: 34.4 GB, 34422718464 bytes
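Before deciding to pull a disk, it can also be worth confirming that the physical drive itself is going bad, not just one partition. If smartmontools happens to be installed, something along these lines reports the drive's overall SMART health and its recent error log:
smartctl -H /dev/sda          # overall health self-assessment (PASSED/FAILED)
smartctl -l error /dev/sda    # recent errors logged by the drive itself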
Removing the failed partition(s) and disk:
I used the mdadm command to first mark the failing drive's partitions as failed in each array:
mdadm --manage /dev/md0 --fail /dev/sda1
mdadm --manage /dev/md1 --fail /dev/sda3
mdadm --manage /dev/md2 --fail /dev/sda2
and then remove them from the arrays:
mdadm --manage /dev/md0 --remove /dev/sda1
mdadm --manage /dev/md1 --remove /dev/sda3
mdadm --manage /dev/md2 --remove /dev/sda2
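Before shutting down, I'd double-check that all three of the failing drive's partitions are really out of the arrays; a quick sanity check looks like this:
cat /proc/mdstat           # sda1, sda2 and sda3 should no longer be listed
mdadm --detail /dev/md1    # the failed slot should now show as removed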
Then I shut down the system
shutdown -h now
and replaced the drive with a new one. Then I tried to reboot, but because the failed drive was the first drive in the SCSI chain, the system refused to boot and gave the message:
No Operating System Present
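If I did this again, I would note the failing drive's serial number before shutting down, so there is no guessing about which physical disk to pull. Assuming hdparm or smartmontools is available (smartctl is the safer bet on a SCSI drive), either of these prints the model and serial:
hdparm -i /dev/sda | grep -i serial    # SerialNo= line from the drive's identify data
smartctl -i /dev/sda                   # model, serial number and firmware revision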
Adding the new disk to the RAID Array:
So I ended up having to switch the drives, putting sdb in as sda, and then proceeding. I used sfdisk to copy the partition table from the surviving drive to the new one:
sfdisk -d /dev/sda | sfdisk /dev/sdb
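To make sure the copy took, the two partition tables can be compared before re-adding anything; a rough check (the /tmp paths are just examples):
sfdisk -d /dev/sda > /tmp/sda.dump    # dump the surviving drive's table
sfdisk -d /dev/sdb > /tmp/sdb.dump    # dump the new drive's table
diff /tmp/sda.dump /tmp/sdb.dump      # only the device names should differ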
Add the partitions back into the RAID Arrays:
mdadm --manage /dev/md0 --add /dev/sdb1
mdadm --manage /dev/md1 --add /dev/sdb3
mdadm --manage /dev/md2 --add /dev/sdb2
cat /proc/mdstat
I could see the arrays rebuilding. When they finished, I hot-swapped out sda and did the whole process over again, this time without rebooting, since the system uses hot-swap drives. It worked fine, and I ended up with both drives up and running. In hindsight, I could have done the whole process without rebooting the machine at all.
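While the arrays are rebuilding, progress can be watched with either of these; the second is handy for keeping an eye on a single array:
watch -n 5 cat /proc/mdstat                           # refreshes the recovery percentage every 5 seconds
mdadm --detail /dev/md2 | grep -iE 'state|rebuild'    # State and Rebuild Status lines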
Install Grub on new hard drive MBR:
# grub
grub> find /grub/stage1
(hd0,0)
grub> device (hd0) /dev/sdb
grub> root (hd0,0)
grub> setup (hd0)
grub> find /grub/stage1
(hd0,0)
(hd1,0)
grub> quit
So now I have the boot loader installed in the MBR of both drives. I can boot from either drive on its own and it will work fine.
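One caveat: the transcript above is for legacy GRUB. On a newer system that boots with GRUB 2, the same end result, a bootable MBR on each disk, would instead be done roughly like this:
grub-install /dev/sda    # install GRUB 2 to the first drive's MBR
grub-install /dev/sdb    # and to the second, so either disk can boot on its own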