ClearFoundation

Forums
TOPIC: Beware of 2 TB drives
#15489
Beware of 2 TB drives 4 Years, 2 Months ago  
Friends,

This is not a ClearOS issue in any way, just an attempt to keep others from experiencing my disasters. I have had so many 2TB drives die in the last month, that I have managed to lose both main and backup systems in close sequence, thus losing data despite considerable precaution. Fortunately, I've managed to recover most of it through other methods, but I have lost some family photos and other items permanently.

It was about time to retire and expand my RAID 5 of 500GB drives, so I naturally thought I'd go with a nice big 2 TB unit and save the complexity of RAID. I have a backup system that mirrors this server nightly, so it seemed justified. I read many reviews and saw only positive reports. Of course, these reviews tend to (very foolishly, IMHO) concentrate on a few percent difference in speed between models. I should have done *much* more homework.

My conclusion now is that the current class of 2 TB drives is inherently unreliable and suffers from intolerable levels of DOA and early failure.

I have now had four 2 TB drives die within a week of installation. All 5 of my 500GB (Seagate, SATA) drives ran without failure for 5 years. I'm a scientist and an engineer, so I didn't just go by this anecdotal evidence. I did a (brief, admittedly) meta-analysis of many retailer user reviews and found a reported failure/return rate of between 15% and 25% among brands. This seemed to be true of both Seagate and WD drives. Failure of 2 TB Seagate drives has been the subject of many articles and field reports, but the WD stats looked only slightly better. Reports for Hitachi and Samsung were too few to get a clear picture. A similar study of 1 TB drives gave much better results.

Bear in mind, unhappy people write more reviews, so the failure rate is undoubtedly not as high as noted, but the numbers are still VERY concerning. Reports included apparent failures for no reason, issues with firmware revision from some vendors, some OS compatibility issues, so called "Green" features causing problems, and SMART features causing drives to drop out of RAID arrays. Simple DOA/early failure was the biggest category.

My problems were with various WD drives, since I had stayed away from Seagate due to the many reports. After the 4th dead WD drive, I purchased a Hitachi unit, but it is too soon to know of results. If I have no issues with one drive, remember that this proves nothing. Of course, many drives just work fine. Is 80% or 90% good enough for you? I have no doubt that the problem is industry wide.

One possible solution was to buy so-called enterprise-level drives. These are not pure hype, as many suggest; they do add some RAID-friendly features, use better bearings, and add other vibration-tolerance features. Manufacturers rate them with better error rates and MTBF (mean time between failures). So I had hope, but a review similar to the one I had undertaken for the consumer drives showed a complaint/failure rate similar to that of the consumer units. Perhaps better design, but apparently no better Quality Assurance.

I suspect that these units are working near their density/accuracy limits, and will get much better with some time. Right now, frankly, they suck. I have some hope for the Hitachi, as it uses 5 platters rather than 4, with a density of 285 (GB/in²) rather than 400 for WD and Seagate. The competitors tout higher density as an advancement and a feature, but the more conservative design of the Hitachi unit seems wise given the evidence.

If you need a new large drive, I recommend:

1. If reliability is critical, and you need a drive now, consider a 1 or 1.5 TB drive.

2. Or, wait a year and see how things improve.

3. Do your homework. Don't be swayed by anecdotal evidence. It doesn't matter that your Uncle Joe bought brand X and it was fine.

4. Backup, backup, backup. Don't change too many things at once as I did. Assure confidence with one system before moving on to another.

5. Don't expect RAID to save your hide. Under these circumstances, it is almost as likely to burn you as to save you. Don't take my word, run through the statistical analysis; it's why RAID 6 is coming on board. And RAID 6 is only a bit better.
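For anyone who wants to run the numbers in item 5, here is a minimal sketch. It assumes each drive fails independently with the same probability during the period of interest and that no rebuild completes in that period; both are simplifications, and the function names are only illustrative. The 15% and 25% figures are the review-based rates quoted above, and five drives matches my old array.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability that at least k of n independent drives fail,
    when each one fails with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def array_loss_probability(n_drives, p_fail, parity_drives):
    """An array that tolerates `parity_drives` failures is lost as soon as
    one more drive than that has failed (RAID 5 -> 1, RAID 6 -> 2)."""
    return prob_at_least(parity_drives + 1, n_drives, p_fail)


if __name__ == "__main__":
    n = 5
    for p in (0.15, 0.25):
        raid5 = array_loss_probability(n, p, parity_drives=1)
        raid6 = array_loss_probability(n, p, parity_drives=2)
        print(f"p={p:.2f}  single drive={p:.3f}  RAID 5={raid5:.3f}  RAID 6={raid6:.3f}")
```

Under these assumptions, a five-drive RAID 5 at a 15% per-drive failure rate comes out at about 16.5%, slightly worse than the 15% for one drive on its own. The model ignores read errors during a rebuild and correlated failures from a bad batch, both of which make the real picture worse.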

Thanks for listening to my rant. I sincerely hope it saves someone some pain.

Let's Be Careful Out There!
Drew
Drew Vonada-Smith
Platinum Boarder
Posts: 493
Last Edit: 2010/10/28 07:01 By NatureDude.
 
#15506
Re:Beware of 2 TB drives 4 Years, 2 Months ago  
I agree with much of what you have written here, excellent post! However, I would change one of your recommendations. I follow the linux-raid newsgroup for the excellent discussion there, especially the comments from Neil Brown, who is the maintainer of the mdadm program. There have been many reports there of 1.5 TB drives also having problems, and the recommendation, as you mentioned, is to use only RAID 6 or 10 with 1.5 TB and larger drives.

The big problem with using only RAID 5 is that if you lose a disk, every remaining disk must be read in full to rebuild onto the replacement. With disks this large, the chance of a read error on one of the remaining disks during the rebuild is high enough that one of the original disks may be kicked out of the array, leaving too few disks for a viable array. With RAID 6 you still have one level of redundancy left.
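A rough sketch of that rebuild risk follows. It assumes an unrecoverable read error rate of 1 per 10^14 bits read (a figure commonly printed on consumer drive data sheets, used here as an assumption rather than a measurement) and that errors are independent. The function name is illustrative.

```python
from math import expm1, log1p


def rebuild_read_error_probability(surviving_drives, drive_bytes, bit_error_rate):
    """Chance of at least one unrecoverable read error while reading every
    surviving drive end to end during a RAID 5 rebuild."""
    bits_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - rate)^bits, written with log1p/expm1 to avoid rounding loss
    return -expm1(bits_read * log1p(-bit_error_rate))


if __name__ == "__main__":
    two_tb = 2e12  # bytes in a 2 TB drive
    for rate in (1e-14, 1e-15):
        p = rebuild_read_error_probability(4, two_tb, rate)
        print(f"5-drive array, error rate {rate:g}/bit: {p:.1%} chance of a read error")
```

With four surviving 2 TB drives and the assumed 10^-14 rate, the result is a little under 50%. Real drives may do better or worse than their data-sheet rate, so treat this as an order-of-magnitude estimate.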

One other factor is heat. Most desktop cases are not suitable for a large number of drives, even though they might have the bays to fit them. In most cases the drives are too close together and/or the air flow is totally inadequate. Therefore they run too hot and suffer premature failure.

All modern drives have a temperature sensor which can be accessed very easily. Thus programs such as mrtg (as used in ClearOS) can be easily enhanced to record and graph drive temperatures. In my experience, the fans are among the most unreliable parts in a server...
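As a minimal sketch of reading that sensor, the function below pulls the temperature out of the attribute table that `smartctl -A /dev/sdX` prints (smartmontools, run as root). It assumes the drive reports SMART attribute 194 (Temperature_Celsius) with the temperature as the first number of the raw value; not every drive does. The sample text is illustrative, not captured from a real drive, and the function name is made up for this example.

```python
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1234
194 Temperature_Celsius     0x0002   157   157   000    Old_age   Always       -       38 (Min/Max 20/45)
"""


def parse_drive_temperature(smartctl_output):
    """Return the temperature in degrees C from `smartctl -A` text,
    or None if attribute 194 is not present."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Columns: ID, name, flag, value, worst, thresh, type, updated,
        # when_failed, raw value (the tenth field onward).
        if len(fields) >= 10 and fields[0] == "194":
            return int(fields[9])
    return None


if __name__ == "__main__":
    print(parse_drive_temperature(SAMPLE))
```

The number it returns is the kind of value a graphing tool such as mrtg could record at each polling interval.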
Tony Ellis
Platinum Boarder
Posts: 1258
Last Edit: 2010/08/08 21:26 By track.
 
#15507
Re:Beware of 2 TB drives 4 Years, 2 Months ago  
Tony,

Thank you, and I indeed stand corrected. After looking again, 1.5 TB units do look better than 2 TB, but I agree that 1 TB is the sweet spot and more is trouble.

I also agree that the most unreliable parts of any system are:
1. Fans
2. Power Supply

Many fans and dual redundant supplies are quite useful in a server! A simple MTBF analysis will bear that out; the science dates back to the Minuteman program. (I'm an electrical engineer.) Hard drives are probably third, and frankly, it is just amazing that they are as good as they are.
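A minimal sketch of that kind of analysis is below. It assumes constant failure rates (so the chance of a part surviving t hours is exp(-t/MTBF)), independent failures, and no repair during the period. The MTBF values are hypothetical placeholders chosen only to show the arithmetic, not figures for any real part.

```python
from math import exp


def reliability(mtbf_hours, hours):
    """Probability that one part with the given MTBF survives `hours`."""
    return exp(-hours / mtbf_hours)


def redundant(r, copies=2):
    """Reliability of `copies` identical parts where any one is enough."""
    return 1 - (1 - r) ** copies


def series(*parts):
    """Reliability of a chain in which every part must survive."""
    product = 1.0
    for r in parts:
        product *= r
    return product


if __name__ == "__main__":
    year = 8760  # hours
    # Hypothetical MTBF values, for illustration only.
    fan = reliability(50_000, year)
    psu = reliability(100_000, year)
    drive = reliability(500_000, year)

    print(f"single fan and supply:    {series(fan, psu, drive):.3f}")
    print(f"redundant fan and supply: {series(redundant(fan), redundant(psu), drive):.3f}")
```

Whatever numbers you plug in, doubling up the weakest parts moves the total far more than improving the part that was already the most reliable.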

Google did a very interesting study on HD failure; you can find the paper online. A few abbreviated conclusions are:

1. Heat matters, but not until quite hot. The rise from 25C to 35C is ok.
2. SMART is not a good predictor. However, when one DOES get a SMART error, even a single sector remap, the drive is dramatically more likely to fail.
3. Number one predictor of reliability is brand and model. The Google study does not disclose the winners and losers.

P.S. Google primarily uses Hitachi drives in their servers. *wink wink*

Drew
Drew Vonada-Smith
Last Edit: 2010/10/28 07:03 By NatureDude.
 
#15690
Re:Beware of 2 TB drives - but good Hitachi news? 4 Years, 2 Months ago  
EDIT - This vendor ran out of stock in about a week. The early bird catches the worm!

FYI for all interested. My 2TB Hitachi is working fine. This doesn't prove much, of course, but at least this one is rock steady.

Another flash: for those of you who want to take the plunge, a vendor on Amazon, uBuy, is selling the Hitachi 2TB Ultrastar for $150. That's about half of what everyone else charges, and about the same as the consumer version. This is the enterprise version suitable for RAID, etc. They don't seem to know what they have. I just received two and can confirm that they are indeed Ultrastars. Just search for model A7K2000 or HUA722020ALA330.

Drew
Drew Vonada-Smith
Last Edit: 2010/10/28 07:16 By NatureDude.
 
#19584
Re:Beware of 2 TB drives 3 Years, 12 Months ago  
You should use WD RE drives; normal drives are not suitable for RAID configurations due to the lack of "TLER" (time-limited error recovery).

Normal WD drives take up to 60 seconds to recover from a read error, but RAID gives a drive only about 6 to 25 seconds to recover.
Because of this, the drive gets booted out of the RAID array and you can lose your data.

So there is nothing wrong with ClearOS RAID or the new big Advanced Format drives (4K-sector drives).
Use RAID drives for RAID, and align new Advanced Format drives before putting any data on them.
MagicOnline
Fresh Boarder
Posts: 2
 
#19585
Re:Beware of 2 TB drives 3 Years, 12 Months ago  
Magic,

No one noted a problem with ClearOS RAID. The thread did also note use of enterprise level drives; it is not only WD that makes them. The Hitachi Ultrastar is a similar beast with TLER. The subject was poor reliability of 2 TB drives generally, not RAID, TLER or sector alignment.


All,

My current experience with the Hitachi Ultrastar and Deskstar units is very good, once I was able to find the cause of the incompatibility with my ICH7R controller. While this controller does support NCQ, it does not support the non-zero buffer offset used for out-of-order data delivery. Hitachi says that they make full use of this feature in the 2TB drives. This causes serious errors on ICH7 unless NCQ is disabled, which is easily done via queue_depth. The performance hit is negligible in my case, and some report a performance gain. Apparently the non-zero buffer offset limitation can also be intercepted and corrected in BIOS, which my motherboard apparently does not implement.
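For anyone wanting to try the queue_depth change, here is a minimal sketch. On Linux, libata exposes the setting as `/sys/block/<device>/device/queue_depth`, and writing 1 there turns NCQ off for that drive. The function name and the `sysfs_root` parameter are only illustrative (the parameter exists so the function can be tried against a scratch directory), and `sda` is a placeholder device name.

```python
from pathlib import Path


def set_queue_depth(device, depth, sysfs_root="/sys"):
    """Write a new queue depth for a block device and return the old one.

    A depth of 1 disables NCQ. Needs root on a real system.
    """
    path = Path(sysfs_root) / "block" / device / "device" / "queue_depth"
    previous = int(path.read_text().strip())
    path.write_text(f"{depth}\n")
    return previous


if __name__ == "__main__":
    # "sda" is a placeholder; substitute the affected drive.
    old = set_queue_depth("sda", 1)
    print(f"queue_depth was {old}, now 1 (NCQ disabled)")
```

The setting does not survive a reboot, so it has to be reapplied at startup, for example from a boot script.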

Regarding reliability, as always, your mileage may vary.

Drew

Drew Vonada-Smith