I am running Ubuntu 14.04.3 with kernel 3.13.0-65-generic on an Intel 535 series SSD that I bought back in September. Ever since I bought it I have been unable to run SMART self-tests on it; they always abort ("Aborted by host" in the self-test log). I thought this might be a firmware thing, so I just let it go. Recently, however, my server has been locking up while I am connected over SSH and while playing a video over Samba. The lockups do not last very long, but they are annoying and should not happen. I started out on the RG20 firmware but updated this morning to RG21 (using the SSD Toolbox on Windows), which does not seem to make a difference. I also ran a full diagnostic test, which passed 100% but does not show up under smartctl. I have replaced the SATA cable as well, which did not make a difference either. This is the error:
Oct 16 09:37:54 LANbox kernel: [ 107.868025] ata3: lost interrupt (Status 0x50)
Oct 16 09:37:54 LANbox kernel: [ 107.868041] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 16 09:37:54 LANbox kernel: [ 107.868046] ata3.00: failed command: READ DMA
Oct 16 09:37:54 LANbox kernel: [ 107.868051] ata3.00: cmd c8/00:08:f8:10:48/00:00:00:00:00/e7 tag 0 dma 4096 in
Oct 16 09:37:54 LANbox kernel: [ 107.868051] res 40/00:01:00:00:00/00:00:00:00:00/50 Emask 0x4 (timeout)
Oct 16 09:37:54 LANbox kernel: [ 107.868055] ata3.00: status: { DRDY }
Oct 16 09:37:54 LANbox kernel: [ 107.868062] ata3: soft resetting link
Oct 16 09:37:55 LANbox kernel: [ 108.125815] ata3.00: configured for UDMA/133
Oct 16 09:37:55 LANbox kernel: [ 108.140593] ata3.01: configured for UDMA/133
Oct 16 09:37:55 LANbox kernel: [ 108.140600] ata3.00: device reported invalid CHS sector 0
Oct 16 09:37:55 LANbox kernel: [ 108.140606] ata3: EH complete
Most of the errors are WRITE DMA, READ DMA, or FLUSH CACHE errors. I would like to know a few things: Is being unable to run self-tests on a drive with smartmontools a firmware issue? Does it make a difference whether the drive is mounted or not? Could this be an Ubuntu kernel issue?
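To put numbers on which commands fail most often, something like the following Python sketch could tally them from the kernel log. It assumes every failure is logged in the same "ataN.NN: failed command: NAME" form as in the excerpt above. The function name tally_failed_commands and the regular expression are my own illustration, not part of any tool.

```python
import re
from collections import Counter

# Matches libata lines such as "ata3.00: failed command: READ DMA".
# Assumption: the command name runs to the end of the line.
FAILED_CMD = re.compile(r"(ata\d+\.\d+): failed command: (.+?)\s*$")


def tally_failed_commands(lines):
    """Count (device, command) pairs across an iterable of kernel log lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_CMD.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


if __name__ == "__main__":
    # Two lines taken from the log excerpt; only the first should be counted.
    sample = [
        "Oct 16 09:37:54 LANbox kernel: [ 107.868046] ata3.00: failed command: READ DMA",
        "Oct 16 09:37:54 LANbox kernel: [ 107.868062] ata3: soft resetting link",
    ]
    for (device, command), n in tally_failed_commands(sample).items():
        print(device, command, n)
```

Feeding it the full kernel log (for example the lines of /var/log/kern.log, if that is where the messages end up on your system) would show whether one command type dominates.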
smartctl -a /dev/sda gives the following:
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-65-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: INTEL SSDSC2BW120H6
Serial Number: CVTR518200KG120AGN
LU WWN Device Id: 5 5cd2e4 14c81df6b
Firmware Version: RG21
User Capacity: 120,034,123,776 bytes [120 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 (minor revision not indicated)
SATA Version is: SATA >3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Fri Oct 16 10:43:17 2015 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 2930) seconds.
Offline data collection
capabilities: (0x7f) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Abort Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 58) minutes.
Conveyance self-test routine
recommended polling time: ( 4) minutes.
SCT capabilities: (0x0025) SCT Status supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       629
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
170 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       2
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       9
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   029   100   000    Old_age   Always       -       29 (Min/Max 20/36)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       1
225 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       5785
226 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       65535
227 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       60
228 Power-off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       65535
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       5785
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       10620
249 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       2519
SMART Error Log not supported
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%       629         -
# 2  Extended offline    Aborted by host               90%        17         -
# 3  Short offline       Aborted by host               90%        16         -
# 4  Extended offline    Aborted by host               90%         5         -
# 5  Short offline       Aborted by host               90%         5         -
# 6  Extended offline    Aborted by host               90%         5         -
# 7  Extended offline    Aborted by host               90%         5         -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
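To confirm that every logged self-test ended the same way, the self-test log above could be summarized with a short Python sketch along these lines. It assumes the test description is always two words ending in "offline" or "captive", as in the entries shown here; other descriptions would be skipped. The name summarize_self_tests is hypothetical.

```python
import re
from collections import Counter

# One self-test log entry: "# N  <description>  <status>  NN%  <hours>  <LBA>".
# Assumption: the description is "<word> offline" or "<word> captive".
ENTRY = re.compile(
    r"^#\s*(\d+)\s+(\w+ (?:offline|captive))\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def summarize_self_tests(lines):
    """Count how often each status appears in smartctl self-test log lines."""
    statuses = Counter()
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            statuses[match.group(3)] += 1
    return statuses


if __name__ == "__main__":
    # Two entries copied from the self-test log in this post.
    sample = [
        "# 1 Extended offline Aborted by host 90% 629 -",
        "# 3 Short offline Aborted by host 90% 16 -",
    ]
    for status, n in summarize_self_tests(sample).items():
        print(status, n)
```

The pattern tolerates both single spaces and column-aligned spacing between fields, so it should work on output pasted from a terminal as well.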