
ClearFoundation

TOPIC: RAID1 Degraded
#26683
Re: RAID1 Degraded 2 Years, 1 Month ago  
To install grub to both drives, run the following from the console:
Code:

grub
root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit


Note that usually sda = hd0, sdb = hd1, and so on.

Then edit /etc/grub.conf and make sure there are two menu entries, one for each drive (identified by the root (hdN,0) line in each entry). Note that with SATA drives, the numbering can change on reboot depending on the boot order specified in the BIOS.
Code:

title Linux-DRIVE1 (2.6.18-194.8.1.v5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.8.1.v5 ro root=/dev/md2 vga=0x317 nomodeset panic=5
        initrd /initrd-2.6.18-194.8.1.v5.img
title Linux-DRIVE2 (2.6.18-194.8.1.v5)
        root (hd1,0)
        kernel /vmlinuz-2.6.18-194.8.1.v5 ro root=/dev/md2 vga=0x317 nomodeset panic=5
        initrd /initrd-2.6.18-194.8.1.v5.img
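
As a quick check that the file really lists an entry for each drive, something like the following could be used. It is only a sketch: `grub_roots` is a made-up helper name, and it assumes the entries use the `root (hdN,M)` form shown above.

```shell
# grub_roots [FILE]: print each distinct GRUB root device named in a menu entry.
# FILE defaults to /etc/grub.conf; the argument is there so a copy can be checked.
# (Hypothetical helper, not part of grub or ClearOS.)
grub_roots() {
    # Only lines whose first word is "root" count; "kernel ... root=/dev/md2"
    # lines are skipped because their first word is "kernel".
    awk '$1 == "root" && $2 ~ /^\(hd[0-9]+,[0-9]+\)$/ { print $2 }' \
        "${1:-/etc/grub.conf}" | sort -u
}
```

With the two entries above, `grub_roots` should list both (hd0,0) and (hd1,0). If only one appears, the second menu entry is missing.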

Tim Burgess
Moderator
Posts: 5802
 
#46125
Re: RAID1 Degraded 8 Months, 1 Week ago  
I seem to be having the exact same situation described here. The array /dev/md2 is degraded, with /dev/sdb1 showing as failed, but /proc/mdstat shows sdb1 as the only active drive on that array. The array should be RAID-1 with two drives.

Here is the output of the commands. If anyone can advise, I would appreciate it.

cat /proc/mdstat
Code:

Personalities : [raid1]
md0 : active raid1 hdd1[1] hdb1[0]
      104320 blocks [2/2] [UU]

md2 : active raid1 sdb1[1]
      976759936 blocks [2/1] [_U]

md1 : active raid1 hdd3[1] hdb3[0]
      77023552 blocks [2/2] [UU]

unused devices: <none>


mdadm --detail /dev/md2
Code:


/dev/md2:
        Version : 0.90
  Creation Time : Wed May 26 19:28:40 2010
     Raid Level : raid1
     Array Size : 976759936 (931.51 GiB 1000.20 GB)
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Sat Sep 15 16:08:58 2012
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 9945cd5a:685f7b3e:8dec7fed:4cdf3e65
         Events : 0.1275212

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       17        1      active sync   /dev/sdb1

Joshua
Fresh Boarder
Posts: 7
Last Edit: 2012/09/15 17:04 By Bardwell.
 
#46131
Re: RAID1 Degraded 8 Months, 1 Week ago  
If (big IF) your missing partition from raid array md2 is sda1 you would add it back to the array using something like
Code:

mdadm --add /dev/md2 /dev/sda1


Then monitor the resync process in /proc/mdstat
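
For what it's worth, a degraded array shows an underscore in its status map in /proc/mdstat (e.g. [_U] instead of [UU]). A rough sketch for picking those out is below. `list_degraded` is a name invented for this example, and it assumes the mdstat layout shown earlier in this thread.

```shell
# list_degraded [FILE]: print the name of every md array whose status map
# contains "_" (a missing member). FILE defaults to /proc/mdstat.
# (Hypothetical helper, written against the mdstat layout in this thread.)
list_degraded() {
    awk '
        /^md[0-9]+ :/          { name = $1 }    # header line, e.g. "md2 : active raid1 sdb1[1]"
        /blocks/ && $NF ~ /_/  { print name }   # status line ending in e.g. "[_U]"
    ' "${1:-/proc/mdstat}"
}
```

Once it prints nothing, every array has all of its members again. To watch the resync itself, `watch -n 5 cat /proc/mdstat` re-reads the file every five seconds.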

Where is sdb1 shown as failed? I would trust the mdadm output, which suggests it's fine. You might also want to check the SMART data for your failed drive, or look for clues in 'dmesg' as to why it was removed from the array.
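
To make the dmesg check a bit more concrete, here is a rough sketch that tallies kernel I/O errors per sector for one disk. `io_error_sectors` is a made-up name, and it assumes the older "end_request: I/O error, dev sdX, sector N" message format; newer kernels word these lines differently.

```shell
# io_error_sectors DEV: read kernel log text on stdin and print "SECTOR COUNT"
# for every sector of DEV that logged an I/O error, sorted by sector number.
# (Hypothetical helper; assumes the sector number is the last word on the line.)
io_error_sectors() {
    awk -v dev="$1" '
        index($0, "I/O error, dev " dev ", sector ") { count[$NF]++ }
        END { for (s in count) print s, count[s] }
    ' | sort -n
}

# Typical use (not run here):
#   dmesg | io_error_sectors sda
```

For the SMART side, `smartctl -H /dev/sda` gives the overall health verdict and `smartctl -a /dev/sda` the full attribute list, assuming smartmontools is installed and /dev/sda is the suspect disk.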
Tim Burgess
Moderator
Posts: 5802
 
#46138
Re: RAID1 Degraded 8 Months, 1 Week ago  
The missing partition is /dev/sda1, yes.

It's in the webUI that /dev/sdb1 shows as failed. Specifically, "System - Hardware - RAID". See attached image.



I think you're onto something with dmesg. I'm not an expert on these topics, but even I can see that this doesn't look good.

Code:



Buffer I/O error on device sda, logical block 0
Buffer I/O error on device sda, logical block 1
Buffer I/O error on device sda, logical block 2
Buffer I/O error on device sda, logical block 3
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 0
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 0
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 128
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 0
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 0
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 0
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 0
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 0



Testing the drive with the manufacturer's utility is on my to-do list; I plan to do it after my son goes to bed.

I still am not sure why the webUI shows /dev/sdb1 as down, but it seems more and more clear that it's actually /dev/sda1.

Any further advice is most welcome. If the device tests out okay can I just go ahead and add it back in without creating further problems? If the device tests bad, I will need to replace the hard drive. If that's the case, I will definitely need further instructions, and would welcome any pointers you have to resources.

Thanks.
Joshua
Fresh Boarder
Posts: 7
Last Edit: 2012/09/15 20:57 By Bardwell.
 
#46165
Re: RAID1 Degraded 8 Months, 1 Week ago  
Okay. I have run the manufacturer's "thorough" test utility on both drives, and both drives passed. That involved pulling the drives from the ClearOS machine and moving them to a different machine. At this point, I'm a bit befuddled, because I can't reconcile the passed tests with the error messages in dmesg. Maybe it was a loose cable or something. Or maybe the SATA controller itself is going bad in the ClearOS machine.

At this point, my plan is to reinstall the drive in the ClearOS machine and rebuild the RAID array using mdadm --add. I may also try switching the cables on the drives if there are any further errors, to see if that's the issue. Will that cause any corruption of the partitions or the RAID array? Do the physical volumes need to remain in the same order on the bus, or can the O/S sort it out?

Any further advice is welcome.
Joshua
Fresh Boarder
Posts: 7
Last Edit: 2012/09/16 17:31 By Bardwell.
 
#46166
Re: RAID1 Degraded 8 Months, 1 Week ago  
Additional information: I just finished running badblocks on the first drive that passed physical testing. It came back with zero bad blocks. I re-mounted the drive and checked the contents, and it appears to be stuff from about a year ago! So my best guess is that this is the drive that dropped out of the array, that the problem was not related to the drive itself, and that it has been basically offline since then. How can I be sure that when I rebuild the array, the data from the correct drive gets treated as current? Is this something that RAID tracks internally?
Joshua
Fresh Boarder
Posts: 7
 
#46192
Re: RAID1 Degraded 8 Months, 1 Week ago  
Well, I just wanted to give you the final resolution. Both disks checked out on the surface scan and with badblocks, so I reinstalled them and used mdadm --add as you described previously, and I seem to be back to 100%. I have no idea what caused the errors. Anyway, I'm back up and running, and now I know a lot more about troubleshooting software RAID than I did before, so thanks for the advice.
Joshua
Fresh Boarder
Posts: 7
 
#46235
Re: RAID1 Degraded 8 Months ago  
Glad you got it working - the drive probably got booted out of the array because of controller or I/O errors (e.g. a dodgy cable?)

Mdadm is clever about resyncing old drives to an existing array. If your array is not running and you want to add a drive that has been used or changed since, then it gets trickier, and you would have to be more specific about marking 'stale' members of the array.
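
On the question of how RAID knows which copy is current: each member's superblock carries an event counter (the "Events" line in the mdadm output earlier in the thread), and a member with a lower count than its partner is the stale one. Below is a minimal sketch for pulling that value out so the two members can be compared by eye. `events_of` is a name invented for this example, and the device names in the usage comment are assumptions based on this thread.

```shell
# events_of: read `mdadm --examine` or `mdadm --detail` text on stdin and
# print the value of its first "Events :" line.
# (Hypothetical helper, not part of mdadm.)
events_of() {
    awk -F' : ' '/^ *Events : / { print $2; exit }'
}

# Typical use (not run here; needs root and real member partitions):
#   mdadm --examine /dev/sda1 | events_of
#   mdadm --examine /dev/sdb1 | events_of
```

This only reads the counters; it changes nothing on disk.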
Tim Burgess
Moderator
Posts: 5802
 