5 04 2015
How to reset HGST Hitachi SMART values S.M.A.R.T. Spin_Retry_Count degraded mdadm raid array problem raid5 raid6 raid1 raid01 raid10 smartctl ultrastar deskstar a7k2000 a7k3000 a7k1000
Problem: your RAID array has degraded because one or more disks are failing S.M.A.R.T. tests on Spin_Retry_Count. All the data is actually OK; the spin retries just keep happening. The drives are perfectly fine, but the S.M.A.R.T. value makes them almost useless, because the BIOS and/or mdadm won’t accept them for your RAID array anymore. This can get costly, as quite a few drives will fail before you finally find out that the power supply is the likely cause.
Reason: probably an overloaded power supply (still trying to test this out).
1) Download the Drive Fitness Test (DFT) tool from http://www.hgst.com/support/hdd-support.
In your BIOS, you might want to change the SATA mode to “IDE”. If the HDD is not found by the Drive Fitness Test tool, connect it to a different SATA port.
2) Disable the SMART features using Utilities->ATA FUNCTIONS->S.M.A.R.T. Operations, then hit Alt-X to exit the tool. You don’t need to run the disk tests at this time; they would only show the bad SMART status.
3) Switch off the PC and put the disk into an ICY BOX IB-120CL-U3 enclosure connected via USB to an Ubuntu Linux machine (in my case, Ubuntu running in a VM on OS X). This sometimes takes 2-3 tries, but then suddenly some of the S.M.A.R.T. values seem to be reset, including Spin_Retry_Count.
On the command line, run:
smartctl -H /dev/sdb -d sat
On my machines, this repeatedly reported the SMART health status as “PASSED” and automatically re-enabled SMART. You can then reconnect the drive to the Drive Fitness Test tool, and it will show the drive as “Good”, too. Don’t ask me why, but this way I could reset the SMART values for multiple disks. I discovered this by accident and then tried to work out how and when the values got reset.
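The -H check only gives the overall verdict. To see whether Spin_Retry_Count itself was reset, it helps to look at the attribute table printed by smartctl -A. Here is a minimal sketch of how that could be done. The helper name smart_attr is made up for illustration, and it assumes the usual ten-column layout of the attribute table (ID, name, flag, value, worst, threshold, type, updated, when_failed, raw value).

```shell
#!/bin/sh
# smart_attr NAME (hypothetical helper, not part of smartmontools):
# reads `smartctl -A` output on stdin and prints, for the attribute NAME,
# "VALUE WORST THRESH WHEN_FAILED RAW_VALUE".
# Assumption: the attribute name is in column 2 and the raw value starts
# in column 10, as in the standard smartctl attribute table.
smart_attr() {
    awk -v name="$1" '$2 == name { print $4, $5, $6, $9, $10 }'
}

# Example call, using the same device and -d sat option as above:
#   smartctl -A /dev/sdb -d sat | smart_attr Spin_Retry_Count
```

If the reset worked, I would expect the WHEN_FAILED column to show "-" and the normalized value to sit above the threshold again, but check this against what your own drive reports.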
After fixing those S.M.A.R.T. values, the drive should work in the mdadm array again. ATTENTION: don’t try this for Reallocated_Sector_Ct etc. (those mean your drive really is broken). The Spin_Retry_Count in my case is definitely a non-issue, and the drives were perfectly fine (it even happened to drives with less than 2 power-on hours!).
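To avoid resetting a drive that is really dying, a quick check before putting it back into the array might look like the sketch below. The function name has_real_damage is hypothetical, and also checking Current_Pending_Sector is my own reading of the "etc." above. The array name /dev/md0 and partition /dev/sdb1 in the commented usage are placeholders for your own setup.

```shell
#!/bin/sh
# has_real_damage (hypothetical helper): reads `smartctl -A` output on stdin.
# Exit status 0 ("yes, damaged") if Reallocated_Sector_Ct or
# Current_Pending_Sector has a non-zero raw value (column 10), else 1.
has_real_damage() {
    awk '($2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector") \
             && $10 + 0 > 0 { bad = 1 }
         END { exit (bad ? 0 : 1) }'
}

# Example usage (placeholders: /dev/sdb, /dev/sdb1 and /dev/md0):
#   if smartctl -A /dev/sdb | has_real_damage; then
#       echo "drive has bad sectors, do not put it back into the array"
#   else
#       mdadm --manage /dev/md0 --add /dev/sdb1   # add the disk back
#       cat /proc/mdstat                          # watch the rebuild
#   fi
```

Depending on how the disk dropped out of the array, mdadm --re-add may be enough instead of a full --add, but I have not compared the two here.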
Nevertheless, this has happened to me many times. I have no idea whether it is connected to restarting the server, to its power supply, or simply to a firmware bug, but at least this solution is working fine for me. I’ve now reduced the drive count on my server to test whether an overloaded power supply was the cause.