Big scary Raw SMART value, is my drive dying?!
The other week I noticed this post on superuser.com:
4295032833, now that is a big scary number! That’s a lot of timeout errors!
Does this mean the drive is bad or even dying? Then how come the normalized values seem to indicate that nothing’s wrong? Some will answer that last question by telling you that you shouldn’t rely on normalized values but on RAW values instead. So then we’d go with the big scary number?
No. Usually when we see huge numbers we are dealing with a vendor-specific RAW value: rather than one value, it packs several values that we need to ‘break up’. Sometimes hard drive manufacturers provide the documentation that can help us do this. In this case we’re dealing with a Seagate drive and this document provides further info: http://t1.daumcdn.net/brunch/service/user/axm/file/zRYOdwPu3OMoKYmBOby1fEEQEbU.pdf.
What does this big scary number mean?
From the document we learn:
3.11 Attribute ID 188: Command Timeout Count
Normalized Command Timeout Count = 100 – Command Timeout Count.
This attribute tracks the number of command time outs as defined by an active command being interrupted by a HRESET and COMRESET or SRST or another command.
The normalized value is only computed when the number of commands is in the range 10^3 to 10^4. The CommandCount and ErroCount are cleared when Number Of Commands reaches 10^4. The error count used to compute normalized value is not reported in attribute Raw value. It is reported in vendor info area of Attribute sector, bytes 474:475. If Command Timeout Count is > 99, normalize value of 1 is reported. The initial Worst Value is set to 0xFD as a special case.
Raw [1 – 0] = Total # of command timeouts, with Max hold of FFFFh
Raw [3 – 2] = Total # of commands with > 5 second completion, including those > 7.5 seconds
Raw [5 – 4] = Total # of commands with > 7.5 second completion
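To make the normalization rule quoted above a bit more concrete, here is a minimal Python sketch of it. The function name and its argument are my own, purely for illustration. It only covers the "100 minus count, floor of 1" part and deliberately ignores the command-count window. Also keep in mind that, according to the document, the error count that feeds this calculation lives in the vendor info area and is not the same thing as the Raw value.

```python
def normalized_command_timeout(error_count: int) -> int:
    """Hypothetical helper: apply '100 - Command Timeout Count',
    reporting 1 once the count exceeds 99, as the quoted text describes."""
    if error_count < 0:
        raise ValueError("error_count must not be negative")
    if error_count > 99:
        return 1
    return 100 - error_count


print(normalized_command_timeout(0))    # 100
print(normalized_command_timeout(1))    # 99
print(normalized_command_timeout(250))  # 1
```

So a handful of timeouts within the counting window would barely move the normalized value, which fits with a normalized value that looks perfectly healthy.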
Decoding the huge Raw SMART value
Okay, we need that last section “Raw usage” to interpret our big scary number. We see it is not one value.
Our big scary number is 4295032833. We need to break that into three values, and for that we first convert it to hexadecimal, which gives us 0x100010001.
It may just be me, but that looks less scary already!
We need to break this into three separate 16-bit word values, so we pad it to 0x000100010001 and get 0x0001, 0x0001 and 0x0001. Using the Seagate document I then decode this as: one command timeout, one command that took > 5 seconds to complete, and one command that took > 7.5 seconds to complete. Because the > 5 second count includes commands that took > 7.5 seconds, those last two are the same command. Or in other words, we had one error that took longer than 7.5 seconds to complete.
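If you'd rather not do the hex conversion by hand, the same split can be sketched in a few lines of Python. The function name and the dictionary keys are made up by me for illustration. The byte layout is the one from the Seagate document quoted earlier, and I'm assuming the raw value is the usual 48-bit (6-byte) field with bytes 1–0 being the lowest word.

```python
def decode_command_timeout_raw(raw: int) -> dict:
    """Hypothetical helper: split a 48-bit raw value into three 16-bit words,
    following the Raw [1-0], [3-2], [5-4] layout from the Seagate document."""
    if not 0 <= raw < (1 << 48):
        raise ValueError("raw value does not fit in 48 bits")
    return {
        "timeouts": raw & 0xFFFF,           # Raw [1-0]
        "over_5s": (raw >> 16) & 0xFFFF,    # Raw [3-2], includes > 7.5 s
        "over_7_5s": (raw >> 32) & 0xFFFF,  # Raw [5-4]
    }


raw = 4295032833
print(hex(raw))                         # 0x100010001
print(decode_command_timeout_raw(raw))  # {'timeouts': 1, 'over_5s': 1, 'over_7_5s': 1}
```

Shifting and masking avoids any fiddling with zero-padded hex strings, and it gives the same three ones we got by hand.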
Summarizing, the big scary decimal Raw SMART value 4295032833 tells us that a single command timeout occurred, one that took longer than 7.5 seconds to complete, which I would not worry about too much.