I've been taking some personal time over the holidays. Technically still doing that, but decided to drop in for a minute and see what's been going on.
The issue affected my setup, but I'm running 8TB Seagate Exos (kissin' cousin to IronWolfs) and a Marvell-based controller. I think the problem is both bigger and smaller than reported: bigger in that more drives and controllers are affected than you see in that thread, and smaller in that the overall impact of this issue seems to be a very small population of Unraid users.
In general, I would say to tread cautiously if you have any 8TB or 10TB Seagate drives, regardless of what's printed on the label or what controller you have them connected to. My guess is that this is probably a drive firmware incompatibility with a Linux feature change, and that it has been misinterpreted by the user base, and as such it could rear its head with any size drive assuming it has the firmware issue. I think this is why the problem is less widely reported than you would expect, since not every 8/10 TB drive was affected.
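If you want a quick way to see which drives in a box fall into that "tread cautiously" bucket, here is a rough Python sketch. It assumes you feed it the JSON text from `lsblk -J -b -d -o NAME,MODEL,SIZE`, and it guesses "Seagate" from the model string starting with "ST" or containing "Seagate", which is only a heuristic. The function name and the size tolerance are made up for illustration. It only flags candidates by size and brand; it cannot tell you whether a given drive actually has the firmware issue.

```python
import json
from typing import List

# Capacities (in TB) the post suggests treating with caution.
CAUTION_SIZES_TB = (8, 10)


def flag_drives(lsblk_json: str, tolerance_tb: float = 0.5) -> List[str]:
    """Return device names of Seagate-looking drives near 8 TB or 10 TB.

    Hypothetical helper: expects the JSON produced by
    `lsblk -J -b -d -o NAME,MODEL,SIZE` (sizes in bytes).
    """
    flagged = []
    for dev in json.loads(lsblk_json).get("blockdevices", []):
        model = (dev.get("model") or "").strip().upper()
        # Older lsblk versions emit sizes as strings, so normalise with int().
        size_tb = int(dev.get("size") or 0) / 1e12
        # Assumption: Seagate model numbers start with "ST" (e.g. ST8000NM...).
        looks_seagate = model.startswith("ST") or "SEAGATE" in model
        near_size = any(abs(size_tb - s) <= tolerance_tb for s in CAUTION_SIZES_TB)
        if looks_seagate and near_size:
            flagged.append(dev["name"])
    return flagged
```

Anything it lists is just a drive worth keeping an eye on during the first parity check after an upgrade, not a confirmed problem.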
If you upgrade to v6.9.x, the problems should appear within a few days, quicker if you do a parity check. Essentially, you won't be able to make it through a parity check (or rebuild). False errors are reported, and Unraid takes the drives offline. The good news is that your data is not altered, so nothing is lost, but it is a pain to recover from this scenario. Essentially, I rolled back to 6.8, then told Unraid to trust my array was good and just adopt it as-is, and I was back to healthy. But it was a nerve-racking experience.
That's my take on this too. I don't want to be stuck on 6.8, there's some great stuff coming in 6.10 and later. Whatever is causing the problem seems to be part of the newer Linux kernels, and doesn't really seem to be on Lime Tech's radar. There's always the possibility it gets resolved, but I think this is the new normal so I plan to try the fix myself. I've just been too busy to take on yet another project at the moment.
I'm conflicted on this. On the one hand, this seems like sage advice. But on the other, it is not common practice to tweak these settings, so in general I would say avoid this unless necessary.
The downside is that you have to upgrade to 6.9.x (or newer) just to see if the problem affects you, and then you get the fun of dealing with it. If you can upgrade to 6.9.x and complete a parity check, you're probably in the clear.
There's really not a win scenario here. You can either make questionable changes to your drives' configuration without knowing if they are needed, or you can upgrade and see if they are needed by waiting for the drives to get kicked offline. Manni's probably right, but I think you have to make your own judgement call about which path you prefer.
Just remember, having Unraid kick a drive offline does not mean you lost data, only that Unraid lost confidence in a drive. If it happens, don't panic, and definitely don't take any actions that could cause data to be wiped out. If you're unsure, just stop and ask for directions.
Also keep in mind that because the fix is applied at the drive level, any new or replacement drives will need the same treatment going forward. You won't have the luxury of fixing them before upgrading to 6.9 or newer, since you'll already be on it.
I think I've got 6 of these drives now. Two confirmed with the issue, two installed after reverting to 6.8.3, and two more pre-cleared but otherwise unused, installing soon. I had begun buying them because they were a great value - sometimes it's worth buying a better drive.
That's also one of the benefits of doing a pre-clear on your actual server, vs. doing one on a separate machine. It could be a bad drive, bad power, bad cabling, bad controller, or even some incompatible combination of parts. On my first server, I identified a couple drive bays that were flaky, and eventually learned to never put any drives in them. Never figured out the root problem, but the end result was randomly consistent.
Manni wrote: Fri Dec 31, 2021 10:30 pm
Interestingly enough, I’m currently running a couple 6TB I’m not sure about through a pre-clear, and weirdly one of them seems to hang the unraid, despite the fact that both went through the WD Diag write zero process without any problem (except that it took a bit longer than usual).
If you haven't already, you should take a few extra steps to help trap this issue. By default, logs are written to RAM, and are lost on reboot. You can configure them to write to your thumb drive. You can also stay logged into the console and monitor traffic on it. Typically, if there is a crash there are errors output on the console, and sometimes these errors don't make it into the log.
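To illustrate the idea of getting log lines off RAM and onto persistent storage before a hang, here is a minimal Python sketch. It is not Unraid's built-in setting, just the general technique: remember how far you have read in the source log, and append anything new to a file elsewhere. The function name is hypothetical, and the paths in the note below (`/var/log/syslog` for the RAM-backed log, `/boot/logs/` for the thumb drive) are my assumptions about a typical setup.

```python
import os


def mirror_new_lines(src_path, dst_path, offset=0):
    """Append bytes written to src_path since `offset` onto dst_path.

    Returns the new offset to pass in on the next call.
    """
    size = os.path.getsize(src_path)
    if size < offset:
        # Source was rotated or truncated, so start again from the top.
        offset = 0
    with open(src_path, "rb") as src:
        src.seek(offset)
        chunk = src.read()
    if chunk:
        os.makedirs(os.path.dirname(dst_path) or ".", exist_ok=True)
        with open(dst_path, "ab") as dst:
            dst.write(chunk)
            dst.flush()
            # Force the data to disk so it survives a hard hang.
            os.fsync(dst.fileno())
    return offset + len(chunk)
```

You would call this in a loop every few seconds, something like `offset = mirror_new_lines("/var/log/syslog", "/boot/logs/syslog-mirror.log", offset)`, carrying the returned offset forward each time. The `fsync` is the important part, since a crash can otherwise take the last few buffered lines with it.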
I'm not sure if I know the best way, but I can share the steps I took.
I created a Win10 Pro VM that autostarts with Unraid. I installed My Movies and restored a DB copy from my old My Movies server. My Movies still accesses my Unraid shares as if it was on a different PC: no fancy local mappings, I just treat it like it really is a separate PC in another room.
Once that is running, then I just installed CCC & CME - for these I just copied those two sub folders from the zip file to C:\. This does mean I have to manually upgrade both of these, since I don't run CMC on this VM, but that's super easy to do. I didn't do anything special in configuring CCC or CME, other than maybe using localhost as the My Movies server name.
The only other thing I did special was using a USB over IP adaptor that allows me to have an external USB based UHD Blu-ray drive in my office that is connected to the My Movies VM over the network (since my server is far away from my office). If I didn't do this, then I would need to run a My Movies client on my office desktop in order to read disc titles/chapters. I know this sounds like overkill, but it works amazingly well. I can rip discs using this drive over my network at pretty much the same speed as ripping locally.
I'm not sure if I covered what you were after - everything I did is pretty much basic standard fare, nothing special.