Is my disk dying?

Hi!
I bought a server, and after running YABS I see the disk speed is not as fast as on another server I have.

I’m using 2x240GB SSD in soft RAID1.

YABS

# ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## #
#              Yet-Another-Bench-Script              #
#                     v2020-09-21                    #
# https://github.com/masonr/yet-another-bench-script #
# ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## #

Thu 05 Nov 2020 12:29:42 PM EST

Basic System Information:
---------------------------------
Processor  : Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
CPU cores  : 12 @ 1522.058 MHz
AES-NI     : ✔ Enabled
VM-x/AMD-V : ✔ Enabled
RAM        : 62Gi
Swap       : 14Gi
Disk       : 205G

fio Disk Speed Tests (Mixed R/W 50/50):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 956.00 KB/s    (239) | 15.01 MB/s     (234)
Write      | 991.00 KB/s    (247) | 15.56 MB/s     (243)
Total      | 1.94 MB/s      (486) | 30.57 MB/s     (477)
           |                      |                     
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 58.17 MB/s     (113) | 62.31 MB/s      (60)
Write      | 61.02 MB/s     (119) | 66.75 MB/s      (65)
Total      | 119.19 MB/s    (232) | 129.07 MB/s    (125)

This is the smartctl output:

sda

smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-12-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Device Model:     FCCT240M500SSD1
Serial Number:    140409659E78
LU WWN Device Id: 5 00a075 109659e78
Firmware Version: MU05
User Capacity:    240,057,409,536 bytes [240 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Nov  5 12:27:36 2020 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x03)	Offline data collection activity
					is in progress.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		( 1030) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  18) minutes.
Conveyance self-test routine
recommended polling time: 	 (   3) minutes.
SCT capabilities: 	       (0x0035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       36204
  5 Reallocated_Sector_Ct   0x0033   092   092   000    Pre-fail  Always       -       336
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       24414
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       56
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   181   181   000    Old_age   Always       -       12947
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       56
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   000   000   000    Pre-fail  Always       -       3730
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       26014
194 Temperature_Celsius     0x0022   077   068   000    Old_age   Always       -       23 (Min/Max 0/32)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       336
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Unknown_SSD_Attribute   0x0031   181   181   000    Pre-fail  Offline      -       431
206 Unknown_SSD_Attribute   0x000e   100   100   000    Old_age   Always       -       0
210 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       77
246 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       53036564077
247 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       1657950280
248 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       4000476292
 
SMART Error Log Version: 1
Warning: ATA error count 0 inconsistent with error log pointer 4
 
ATA Error Count: 0
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
 
Error 0 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 f7 fe b7 40  Error: UNC at LBA = 0x00b7fef7 = 12058359
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 d5 01 f7 fe b7 40 00  21d+16:43:14.080  READ FPDMA QUEUED
  60 da 01 f6 fe b7 40 00  21d+16:43:14.080  READ FPDMA QUEUED
  60 00 01 f5 fe b7 40 00  21d+16:43:14.080  READ FPDMA QUEUED
  60 00 01 f4 fe b7 40 00  21d+16:43:14.080  READ FPDMA QUEUED
  60 00 01 f3 fe b7 40 00  21d+16:43:14.080  READ FPDMA QUEUED
 
Error -1 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 00 fe b7 40  Error: UNC at LBA = 0x00b7fe00 = 12058112
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 00 fe b7 40 00  21d+16:43:14.080  READ FPDMA QUEUED
  2f 00 01 10 00 00 e0 00  21d+16:43:14.080  READ LOG EXT
  60 da 00 00 fe b7 40 00  21d+16:43:14.080  READ FPDMA QUEUED
  61 00 80 b0 4c f1 40 00  21d+16:43:14.080  WRITE FPDMA QUEUED
  e5 00 00 00 00 00 00 00  21d+16:43:14.080  CHECK POWER MODE
 
Error -2 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 00 fe b7 40  Error: UNC at LBA = 0x00b7fe00 = 12058112
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 da 00 00 fe b7 40 00  21d+16:43:14.080  READ FPDMA QUEUED
  61 00 80 b0 4c f1 40 00  21d+16:43:14.080  WRITE FPDMA QUEUED
  e5 00 00 00 00 00 00 00  21d+16:43:14.080  CHECK POWER MODE
  61 d5 01 f6 fe b7 40 00  21d+16:43:14.080  WRITE FPDMA QUEUED
  60 da 01 c8 a7 f1 40 00  21d+16:43:14.080  READ FPDMA QUEUED
 
Error -3 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 f6 fe b7 40  Error: UNC at LBA = 0x00b7fef6 = 12058358
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 da 01 f6 fe b7 40 00  21d+16:43:14.080  READ FPDMA QUEUED
  60 00 01 f5 fe b7 40 00  21d+16:43:14.080  READ FPDMA QUEUED
  60 00 01 f4 fe b7 40 00  21d+16:43:14.080  READ FPDMA QUEUED
  60 00 01 f3 fe b7 40 00  21d+16:43:14.080  READ FPDMA QUEUED
  60 d5 01 f2 fe b7 40 00  21d+16:43:14.080  READ FPDMA QUEUED
 
Error -4 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 00 fe b7 40  Error: UNC at LBA = 0x00b7fe00 = 12058112
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 00 fe b7 40 00  21d+16:43:14.080  READ FPDMA QUEUED
  2f 00 01 10 00 00 e0 00  21d+16:43:14.080  READ LOG EXT
  60 da 00 00 fe b7 40 00  21d+16:43:14.080  READ FPDMA QUEUED
  61 00 80 b0 4c f1 40 00  21d+16:43:14.080  WRITE FPDMA QUEUED
  e5 00 00 00 00 00 00 00  21d+16:43:14.080  CHECK POWER MODE
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Vendor (0xff)       Completed without error       00%     24408         -
# 2  Vendor (0xff)       Completed without error       00%     24394         -
# 3  Vendor (0xff)       Completed without error       00%     24391         -
# 4  Vendor (0xff)       Completed without error       00%     24370         -
# 5  Vendor (0xff)       Completed without error       00%     24364         -
# 6  Vendor (0xff)       Completed without error       00%     24361         -
# 7  Vendor (0xff)       Completed without error       00%     24360         -
# 8  Vendor (0xff)       Completed without error       00%     24358         -
# 9  Vendor (0xff)       Completed without error       00%     24357         -
#10  Vendor (0xff)       Completed without error       00%     24355         -
#11  Vendor (0xff)       Completed without error       00%     24355         -
#12  Vendor (0xff)       Completed without error       00%     24353         -
#13  Vendor (0xff)       Completed without error       00%     24332         -
#14  Vendor (0xff)       Completed without error       00%     24326         -
#15  Vendor (0xff)       Completed without error       00%     24309         -
#16  Vendor (0xff)       Completed without error       00%     24267         -
#17  Vendor (0xff)       Completed without error       00%     24246         -
#18  Vendor (0xff)       Completed without error       00%        41         -
 
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

sdb

smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-12-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Device Model:     EDGE SE847-V SSD
Serial Number:    CD1606131001EE041
LU WWN Device Id: 5 888914 1001ee041
Firmware Version: P0330AA
User Capacity:    240,057,409,536 bytes [240 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Nov  5 12:27:33 2020 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x71) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0002)	Does not save SMART data before
					entering power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 (   2) minutes.
Conveyance self-test routine
recommended polling time: 	 (   1) minutes.
SCT capabilities: 	       (0x0035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0000   100   100   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0000   100   100   000    Old_age   Offline      -       1333
 12 Power_Cycle_Count       0x0000   100   100   000    Old_age   Offline      -       46
160 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
161 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       30
163 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       331
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       263340
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       3601
150 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       3401
151 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       3558
164 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       348000
165 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       314
166 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       215
167 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       271
169 Unknown_Attribute       0x0000   100   100   001    Old_age   Offline      -       86
181 Program_Fail_Cnt_Total  0x0000   100   100   000    Old_age   Offline      -       0
182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      -       0
192 Power-Off_Retract_Count 0x0000   100   100   000    Old_age   Offline      -       21
194 Temperature_Celsius     0x0000   100   100   070    Old_age   Offline      -       16 (24 23 25 27 0)
199 UDMA_CRC_Error_Count    0x0000   100   100   000    Old_age   Offline      -       0
232 Available_Reservd_Space 0x0000   100   100   000    Old_age   Offline      -       100
241 Total_LBAs_Written      0x0000   100   100   000    Old_age   Offline      -       480232
242 Total_LBAs_Read         0x0000   100   100   000    Old_age   Offline      -       226662
245 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       2055375
246 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       526680
247 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
 
SMART Error Log Version: 1
No Errors Logged
 
Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]
 
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
    6        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I should learn to interpret this stuff.

Looks like your first disk (sda) is starting to go:

Ideally, Reallocated_Sector_Ct would be at 0, but you're at 336 reallocated sectors currently. Usually when sectors start to fail like this, it’ll snowball until the disk is unusable.
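
If you want to check this kind of thing without reading the whole table by eye, here is a minimal sketch that scans the attribute rows of smartctl output and reports the ones that are nonzero. The set of watched IDs (5, 187, 196, 197, 198) and the names `WATCHED_IDS`, `ROW`, and `flag_attributes` are my own assumptions for illustration; vendors define these attributes differently, so treat the result as a hint, not a verdict. The sample rows are copied from the sda output above.

```python
import re

# Attribute IDs often treated as failure indicators. This list is an
# assumption for the sketch; vendors define and scale attributes differently.
WATCHED_IDS = {5, 187, 196, 197, 198}

# One row of the "Vendor Specific SMART Attributes" table:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def flag_attributes(smartctl_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in smartctl_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # not an attribute row
        attr_id, name, raw = int(match.group(1)), match.group(2), int(match.group(3))
        if attr_id in WATCHED_IDS and raw > 0:
            flagged[name] = raw
    return flagged


# Rows copied from the sda output in this thread.
SDA_SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   092   092   000    Pre-fail  Always       -       336
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       26014
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       336
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
"""

print(flag_attributes(SDA_SAMPLE))
```

On the sda rows this flags Reallocated_Sector_Ct, Reported_Uncorrect, and Reallocated_Event_Count, and skips the two attributes that are still at 0.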


As Mason said, the first drive seems bad; I would get that disk replaced.

sda is an older-generation Crucial SSD; there would be nothing wrong with it if it were still error-free. sdb is an EDGE SE847-V; I've never heard of that brand before: https://m.cdw.com/product/edge-se847-v-solid-state-drive-250-gb-sata-6gb-s/4236251


Thanks guys for the confirmation. Sent a message to the provider and they will replace the disk.

Seems slow for an SSD. However, make sure the RAID-1 is no longer running its initial sync while you are benchmarking, because otherwise the numbers won’t reflect reality. cat /proc/mdstat is your friend.
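
For anyone who wants to script that check, here is a rough sketch that looks through the text of /proc/mdstat for a sync operation in progress. The function name `arrays_still_syncing` is made up, and `SAMPLE_MDSTAT` is an invented example in the usual mdstat layout, not output from the server in this thread.

```python
import re


def arrays_still_syncing(mdstat_text):
    """Return {array_name: operation} for md arrays with a sync in progress."""
    busy = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            current = header.group(1)  # remember which array the next lines belong to
            continue
        # Progress lines look like "[==>....]  resync = 12.6% (...)";
        # delayed or pending ones look like "resync=DELAYED".
        progress = re.search(r"\b(resync|recovery|reshape|check)\s*=", line)
        if progress and current:
            busy[current] = progress.group(1)
    return busy


# Hypothetical sample, invented for illustration.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      234297920 blocks super 1.2 [2/2] [UU]
      [==>..................]  resync = 12.6% (29646848/234297920) finish=17.3min speed=196912K/sec

unused devices: <none>
"""

print(arrays_still_syncing(SAMPLE_MDSTAT))
```

On a live Linux box you would pass in the contents of /proc/mdstat instead of the sample. An empty result means no array reported a sync, and benchmark numbers should not be skewed by one.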


RAID status was normal.

Reallocated sectors can be a normal part of the aging process on SSDs. Usually, SSDs have an LBAs-written SMART attribute, which you can translate into TB written (TBW) and compare against the TBW rating on the manufacturer’s spec sheet for the drive. Once you go over the TBW that the manufacturer says is the life of the drive, continuing to use it can be a bit of a gamble. Some of the more premium manufacturers build a significant buffer into their TBW ratings; for instance, some Samsung drives go well beyond their TBW rating before dying.
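
The LBAs-to-TBW conversion can be sketched like this. Several things here are assumptions: that the attribute counts 512-byte LBAs (some drives count in larger units, so check the vendor's documentation), that attribute 246 on sda is a host-writes counter (smartctl lists it as unknown in the output above, so this is a guess used only for illustration), and the rated TBW of 100, which is a placeholder and not this drive's real rating. The function names are made up too.

```python
def lbas_to_terabytes(lbas_written, bytes_per_lba=512):
    """Convert an LBAs-written counter to decimal terabytes (1 TB = 1e12 bytes)."""
    return lbas_written * bytes_per_lba / 1e12


def rated_life_used(lbas_written, rated_tbw, bytes_per_lba=512):
    """Fraction of the manufacturer's TBW rating already consumed."""
    return lbas_to_terabytes(lbas_written, bytes_per_lba) / rated_tbw


# Raw value of attribute 246 from the sda output in this thread.
# ASSUMPTION: it counts 512-byte sectors written by the host.
raw_246 = 53036564077

# ASSUMPTION: placeholder rating for illustration; look up the real
# figure on the drive's spec sheet.
hypothetical_rated_tbw = 100.0

written_tb = lbas_to_terabytes(raw_246)
print(f"{written_tb:.2f} TB written")
print(f"{rated_life_used(raw_246, hypothetical_rated_tbw):.0%} of the placeholder rating used")
```

If the assumption about the unit is wrong, the result is off by that factor, so it is worth sanity-checking the number against how the drive has actually been used.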

Some manufacturers also have a life remaining SMART attribute, which counts down from 100% to 0% as the drive ages. This is another method of monitoring the drive’s health.


Thank you all!

Disk was replaced and I get better results now.

YABS:

# ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## #
#              Yet-Another-Bench-Script              #
#                     v2020-09-21                    #
# https://github.com/masonr/yet-another-bench-script #
# ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## #

Fri 06 Nov 2020 12:05:36 PM EST

Basic System Information:
---------------------------------
Processor  : Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
CPU cores  : 12 @ 1751.951 MHz
AES-NI     : ✔ Enabled
VM-x/AMD-V : ✔ Enabled
RAM        : 62Gi
Swap       : 14Gi
Disk       : 205G

fio Disk Speed Tests (Mixed R/W 50/50):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 48.66 MB/s   (12.1k) | 61.23 MB/s     (956)
Write      | 48.73 MB/s   (12.1k) | 61.62 MB/s     (962)
Total      | 97.40 MB/s   (24.3k) | 122.86 MB/s   (1.9k)
           |                      |                     
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 77.64 MB/s     (151) | 83.02 MB/s      (81)
Write      | 81.77 MB/s     (159) | 88.55 MB/s      (86)
Total      | 159.41 MB/s    (310) | 171.58 MB/s    (167)

iperf3 Network Speed Tests (IPv6):
---------------------------------
Provider        | Location (Link)           | Send Speed      | Recv Speed     
                |                           |                 |                
Clouvider       | London, UK (10G)          | 862 Mbits/sec   | 274 Mbits/sec  
Online.net      | Paris, FR (10G)           | 857 Mbits/sec   | 179 Mbits/sec  
WorldStream     | The Netherlands (10G)     | 846 Mbits/sec   | 174 Mbits/sec  
Wifx            | Zurich, CH (10G)          | busy            | busy           
Clouvider       | NYC, NY, US (10G)         | 908 Mbits/sec   | 585 Mbits/sec  
Clouvider       | Los Angeles, CA, US (10G) | 59.2 Mbits/sec  | 799 Mbits/sec  

Geekbench 5 Benchmark Test:
---------------------------------
Test            | Value                         
                |                               
Single Core     | 576                           
Multi Core      | 2932                          
Full Test       | https://browser.geekbench.com/v5/cpu/4569540

Not the best disk speeds but it will work.

Bought this server for $160 on LET from LevelOneServers, and it’s being colocated in Dallas. I hope it lasts at least a year :smiley:


Bought, as in you own the hardware?

That’s what they said.