Hard Drives Randomly Becoming Disabled in unRAID

Recently, hard drives kept becoming "disabled" on my new ASRock EP2C602-4L/D16. At first I figured it was because I pulled the unRAID USB drive out of a Quanta S210-X22RQ and plugged it straight into the ASRock motherboard. Probably not the wisest move, but that part worked out: unRAID booted on the new hardware and worked its magic to recognize all of the previous devices.

Symptoms

Well, I actually wasn't experiencing any visible issues. unRAID is pretty freakin resilient, so I didn't notice anything was wrong until I checked the dashboard and saw this...

unraid.png

Troubleshooting

At first glance I thought one of my disks had failed, but at just over 2 years old it felt a bit too early for them to start causing issues. Nonetheless, I had to run a SMART scan to make sure the drives were okay.

  • The SMART scan showed the disabled drives were good, so I had to try something else
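For anyone who wants to script the same check across many drives, here is a minimal sketch. It assumes smartmontools 7.0 or newer, where `smartctl -H -j /dev/sdX` prints a JSON report containing a `smart_status.passed` field; the sample report and the helper name `smart_passed` are made up for illustration. Keep in mind that a pass only covers the drive's own self-assessment and says nothing about cables or controllers.

```python
import json


def smart_passed(smartctl_json: str) -> bool:
    """Return True if a smartctl JSON report says the overall health check passed."""
    report = json.loads(smartctl_json)
    # "smart_status" can be absent when a device does not report SMART health,
    # so treat a missing field as "not confirmed healthy".
    return report.get("smart_status", {}).get("passed", False) is True


# Trimmed, made-up example of what `smartctl -H -j /dev/sdb` might print.
sample = '{"device": {"name": "/dev/sdb"}, "smart_status": {"passed": true}}'
print(smart_passed(sample))
```

On the server itself you would feed the function the real output, for example by capturing `smartctl -H -j /dev/sdb` with `subprocess.run(..., capture_output=True, text=True)`.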

While I'm not too familiar with unRAID, despite having used it for almost 4 years now, some quick research led me to believe it might be faulty RAM. The RAM had always been fine in the past, and in fact it also came from the working Quanta S210-X22RQ, but folks online strongly suggested I run MemTest86 to see if it produced any errors. I let the tester run for just over 24 hours so it could test all 64 GB of RAM.

  • MemTest86 didn't show any problems with the RAM

Hmmm, what else could it be... I was starting to get a bit worried. The server had already been down for 4 days and I really needed it operational. After scouring the web for others with similar issues, I was starting to feel a bit defeated, because nothing I found from other people solved my problem.

Some of the things other people with similar issues did to fix their servers:

  1. Replaced failed RAM

  2. Replaced a failed drive identified by SMART

  3. Tried different SATA ports on the motherboard

  4. Swapped out HBA / SAS cards

  5. Exchanged SATA cables

  6. Tried different Molex or SATA power cables

  7. Swapped power supplies

  8. Stopped using a backplane, or replaced it

Even after I tried most of those things, the drives would still get disabled during a Parity Sync or some time after being added back to the array.

Now, at this point I was real worried. I was thinking that maybe my ASRock motherboard was bad and I was going to have to get a replacement. Finally I stopped looking for answers and just started asking around online. Eventually, someone pointedly asked me which SATA controller types I was using. Odd thing to ask, I thought, but I told them I was using the on-board Intel, the on-board Marvell, and an LSI 9211-8i card. Then, as if the answer had always been that simple, he/she told me this:

u/HoodleToodle

Well, first thing is unRAID doesn't work terribly well with Marvell controllers, as Marvell drivers suck donkey balls. I'd turn that off and move all your drives to the LSI and see if that cures your problem.

Screenshot.png
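If you are not sure which controller each of your drives hangs off, the sketch below is one way to find out on Linux. It relies on the fact that each entry in `/sys/block` resolves to a device path containing the PCI addresses the disk sits behind; the function names and the sample path in the usage note are my own illustration, not something unRAID provides.

```python
import os
import re
from typing import Dict, Optional

# Matches PCI addresses such as 0000:03:00.0 inside a sysfs device path.
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_pci_address(sysfs_path: str) -> Optional[str]:
    """Return the last PCI address in the path: the controller closest to the disk."""
    matches = PCI_ADDR.findall(sysfs_path)
    return matches[-1] if matches else None


def list_disk_controllers(sys_block: str = "/sys/block") -> Dict[str, Optional[str]]:
    """Map each sdX device to the PCI address of its controller (Linux only)."""
    result = {}
    for name in sorted(os.listdir(sys_block)):
        if name.startswith("sd"):
            # /sys/block/sdX is a symlink; resolving it exposes the PCI path.
            real = os.path.realpath(os.path.join(sys_block, name))
            result[name] = controller_pci_address(real)
    return result
```

For a hypothetical path like `/sys/devices/pci0000:00/0000:00:1c.0/0000:03:00.0/ata7/host6/target6:0:0/6:0:0:0/block/sdf`, `controller_pci_address` returns `0000:03:00.0`. Running `lspci -s 0000:03:00.0` should then tell you whether that address is an Intel, Marvell, or LSI device. USB sticks show up as `sdX` too, so the unRAID flash drive will map to a USB controller.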

Solution

After reading his/her reply, I immediately pulled all the drives off the Marvell SATA ports and used only the on-board Intel ports and my 9211-8i. And just like that, everything started working the way I expected, with only 1 problem: Parity Sync was estimated to take 2 Days and 17 Hours!!!! That was certainly not normal.

IMG-2EE2ACF1B362-1.jpg

I checked unRAID first thing in the morning to see if any disks failed during Parity Sync overnight.  FINALLY!!!! No disabled disks.

Apparently it is also normal for a Parity Sync to take an abnormally long time to finish after certain events:

  1. Hard Shutdown

  2. Shutting down while a Sync is occurring

  3. Basically interrupting a Sync in any way
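To put an estimate like 2 days and 17 hours in perspective: a Parity Sync has to cover the whole parity disk, so its duration is roughly the parity disk size divided by the average speed. Here is a back-of-the-envelope sketch. The 8 TB parity size and the 150 MB/s comparison speed are hypothetical numbers, since the real ones depend on your drives.

```python
def implied_speed_mb_s(parity_size_tb: float, days: int, hours: int) -> float:
    """Average speed in MB/s (decimal units) implied by a parity size and an ETA."""
    seconds = (days * 24 + hours) * 3600
    return parity_size_tb * 1e12 / seconds / 1e6


def estimated_hours(parity_size_tb: float, speed_mb_s: float) -> float:
    """Hours needed to cover the parity disk at a given average speed."""
    seconds = parity_size_tb * 1e12 / (speed_mb_s * 1e6)
    return seconds / 3600


# 8 TB is a hypothetical parity size; plug in your own.
print(round(implied_speed_mb_s(8, days=2, hours=17), 1))  # speed implied by the ETA
print(round(estimated_hours(8, 150), 1))                  # ETA at an assumed 150 MB/s
```

If the implied speed comes out far below what your drives normally sustain, the estimate is telling you something is slowing the sync down.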

Everything seems to be running fine for now. So if you feel like you have tried everything and still have a problem, maybe, just maybe, Marvell is screwing you over too.

System Configuration

I am appending this information to help give you a better idea of my setup.  Just in case.