Reading Intel SSD lifetime behind a RAID controller with smartctl
Original article. If you repost it, please credit: reposted from 系统技术非业余研究
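We run a large number of Intel SSDs in production. As everyone knows, SSDs have a limited lifetime, so being able to monitor them in real use is very helpful.
We usually build RAID 10 arrays from Intel SSDs behind a RAID controller, and ordinary tools have a hard time reading lifetime information through the controller. Thanks to the efforts of Intel and the community, the code for reading Intel SSD lifetime has been integrated into smartctl, which makes this easy.
First, our environment:
Hardware and operating system: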
System | Huawei Technologies Co., Ltd.; Tecal RH2285; vV100R001 (Main Server Chassis)
Release | Red Hat Enterprise Linux Server release 5.4 (Tikanga)
Kernel | 2.6.18-164.el5
Architecture | CPU = 64-bit, OS = 64-bit
RAID controller:
# RAID Controller ############################################
Controller | LSI Logic MegaRAID SAS
Model | MegaRAID SAS PCI Express(TM) ROMB, PCIE interface, 8 ports
Cache | 256MB Memory, BBU
BBU | 96% Charged, Temperature 23C, isSOHGood=
VirtualDev Size RAID Level Disks SpnDpth Stripe Status Cache
========== ========= ========== ===== ======= ====== ======= =========
0 0 (:-1-0) 0 Depth-1 WB, no RA
1 0 (:-5-3) 0 Depth-1 WB, no RA
PhysiclDev Type State Errors Vendor Model Size
========== ==== ======= ====== ======= ============ ===========
Hard Disk SAS Online, 0/0/0 SEAGATE ST3300657SS 279.396
Hard Disk SAS Online, 0/0/0 SEAGATE ST3300657SS 279.396
Solid Stat SATA Online, 0/901/0 CVPO007400S4160AGN INTEL 149.049
Solid Stat SATA Online, 0/900/0 CVPO010400AR160AGN INTEL 149.049
Solid Stat SATA Online, 0/900/0 CVPO007000T3160AGN INTEL 149.049
Solid Stat SATA Online, 0/900/0 CVPO009002DN160AGN INTEL 149.049
Solid Stat SATA Online, 0/900/0 CVPO0104017E160AGN INTEL 149.049
Solid Stat SATA Online, 0/899/0 CVPO010200KS160AGN INTEL 149.049
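Now let's walk through how to use it:
STEP1: Because we use an LSI RAID controller, we first need to install the controller's utility:
First, download megacli-2.00.11-2.x86_64.rpm from the LSI website.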
$ sudo rpm -i megacli-2.00.11-2.x86_64.rpm
# must be run with root privileges
$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -aALL

Adapter #0

Number of Virtual Disks: 2
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 278.464 GB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 2
Span Depth : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Access Policy : Read/Write
Disk Cache Policy : Enabled
Encryption Type : None
Number of Spans: 1
Span: 0 - Number of PDs: 2
PD: 0 Information
Enclosure Device ID: 12
Slot Number: 0
Enclosure position: 0
Device Id: 13
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.464 GB [0x22cee000 Sectors]
Firmware state: Online, Spun Up
SAS Address(0): 0x5000c5002bdd95fd
SAS Address(1): 0x0
Connected Port Number: 4(path0)
Inquiry Data: SEAGATE ST3300657SS 00066SJ017HA
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 3.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :39C (102.20 F)
...
PD: 4 Information
Enclosure Device ID: 12
Slot Number: 6
Enclosure position: 0
Device Id: 29
Sequence Number: 2
Media Error Count: 0
Other Error Count: 901
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 149.049 GB [0x12a19eb0 Sectors]
Non Coerced Size: 148.549 GB [0x12919eb0 Sectors]
Coerced Size: 148.080 GB [0x12829000 Sectors]
Firmware state: Online, Spun Up
SAS Address(0): 0x5286ed44f702c006
Connected Port Number: 4(path0)
Inquiry Data: CVPO0104017E160AGN INTEL SSDSA2M160G2GN 2CV102HD
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 3.0Gb/s
Link Speed: 3.0Gb/s
Media Type: Solid State Device
Drive Temperature :0C (32.00 F)

PD: 5 Information
Enclosure Device ID: 12
Slot Number: 7
Enclosure position: 0
Device Id: 30
Sequence Number: 2
Media Error Count: 0
Other Error Count: 900
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 149.049 GB [0x12a19eb0 Sectors]
Non Coerced Size: 148.549 GB [0x12919eb0 Sectors]
Coerced Size: 148.080 GB [0x12829000 Sectors]
Firmware state: Online, Spun Up
SAS Address(0): 0x5286ed44f702c007
Connected Port Number: 4(path0)
Inquiry Data: CVPO010200KS160AGN INTEL SSDSA2M160G2GN 2CV102HD
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 3.0Gb/s
Link Speed: 3.0Gb/s
Media Type: Solid State Device
Drive Temperature :0C (32.00 F)
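Note the "Device Id: xxx" values above; we will need them below. Seeing them also confirms that the RAID controller is reading the drive information correctly.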
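As a rough sketch of how the Device Ids of just the Intel drives could be pulled out of that listing, the helper below filters MegaCli text on stdin. The function name `list_intel_device_ids` is made up for illustration, and the code assumes the one-field-per-line layout that MegaCli prints; the sample data is trimmed from the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of MegaCli): print the Device Id of every
# physical drive whose "Inquiry Data" line mentions INTEL.
# Assumes one "Key: Value" field per line, as MegaCli prints them.
list_intel_device_ids() {
  awk '
    # Remember the most recent Device Id. "Enclosure Device ID" does not
    # match because the pattern is anchored at the start of the line.
    /^[[:space:]]*Device Id:/ { id = $3 }
    # Emit the remembered id when the drive identifies itself as INTEL.
    /^[[:space:]]*Inquiry Data:/ && /INTEL/ { print id }
  '
}

# Trimmed sample taken from the listing above.
sample='PD: 0 Information
Enclosure Device ID: 12
Device Id: 13
Inquiry Data: SEAGATE ST3300657SS 00066SJ017HA
PD: 4 Information
Enclosure Device ID: 12
Device Id: 29
Inquiry Data: CVPO0104017E160AGN INTEL SSDSA2M160G2GN 2CV102HD'

printf '%s\n' "$sample" | list_intel_device_ids
```

On a real machine the same function could be fed from the tool directly, for example `sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | list_intel_device_ids`.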
STEP2:
We also need a fairly recent smartmontools, 5.39 or 5.40:
http://sourceforge.net/projects/smartmontools/files/smartmontools/
After compiling and installing it, we have smartctl.
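To confirm that the installed build is new enough, a small check along these lines may help. It is only a sketch: `smartctl_version_ok` is a hypothetical helper name, and it assumes the version number appears as the first `N.NN` token on the first line of the `smartctl -V` banner.

```shell
#!/bin/sh
# Hypothetical helper: read the smartctl banner on stdin and succeed
# (exit status 0) when the version is 5.39 or newer.
# Assumption: the first "N.NN"-shaped token on line 1 is the version.
smartctl_version_ok() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^[0-9]+\.[0-9]+$/) { split($i, v, "."); break }
      }
      # Compare major and minor parts numerically.
      ok = (v[1] + 0 > 5 || (v[1] + 0 == 5 && v[2] + 0 >= 39))
    }
    END { exit(ok ? 0 : 1) }
  '
}

# Banner line copied from the smartctl output shown in this post.
banner='smartctl 5.40 2010-10-16 r3189 [x86_64-unknown-linux-gnu] (local build)'
if printf '%s\n' "$banner" | smartctl_version_ok; then
  echo "smartctl is new enough"
else
  echo "smartctl is too old, need 5.39 or later"
fi
```

On a live system the input would come from `smartctl -V | smartctl_version_ok`.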
$ dmesg
...
scsi0 : LSI SAS based MegaRAID driver
 Vendor: PMC Model: 8399 Rev: 1
 Type: Enclosure ANSI SCSI revision: 05
 Vendor: SEAGATE Model: ST3300657SS Rev: 0006
 Type: Direct-Access ANSI SCSI revision: 05
 Vendor: SEAGATE Model: ST3300657SS Rev: 0006
 Type: Direct-Access ANSI SCSI revision: 05
 Vendor: ATA Model: INTEL SSDSA2M160 Rev: 02HD
 Type: Direct-Access ANSI SCSI revision: 05
 Vendor: ATA Model: INTEL SSDSA2M160 Rev: 02HD
 Type: Direct-Access ANSI SCSI revision: 05
 Vendor: ATA Model: INTEL SSDSA2M160 Rev: 02HD
 Type: Direct-Access ANSI SCSI revision: 05
 Vendor: ATA Model: INTEL SSDSA2M160 Rev: 02HD
 Type: Direct-Access ANSI SCSI revision: 05
 Vendor: ATA Model: INTEL SSDSA2M160 Rev: 02HD
 Type: Direct-Access ANSI SCSI revision: 05
 Vendor: ATA Model: INTEL SSDSA2M160 Rev: 02HD
 Type: Direct-Access ANSI SCSI revision: 05
 Vendor: LSI Model: MegaRAID SAS RMB Rev: 1.40
 Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 583983104 512-byte hdwr sectors (298999 MB)
sda: Write Protect is off
sda: Mode Sense: 1f 00 00 08
SCSI device sda: drive cache: write back
SCSI device sda: 583983104 512-byte hdwr sectors (298999 MB)
sda: Write Protect is off
sda: Mode Sense: 1f 00 00 08
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 >
sd 0:2:0:0: Attached scsi disk sda
 Vendor: LSI Model: MegaRAID SAS RMB Rev: 1.40
 Type: Direct-Access ANSI SCSI revision: 05
SCSI device sdb: 1242184960 512-byte hdwr sectors (635999 MB)
sdb: Write Protect is off
sdb: Mode Sense: 1f 00 00 08
SCSI device sdb: drive cache: write back
SCSI device sdb: 1242184960 512-byte hdwr sectors (635999 MB)
sdb: Write Protect is off
sdb: Mode Sense: 1f 00 00 08
SCSI device sdb: drive cache: write back
 sdb: sdb1
sd 0:2:1:0: Attached scsi disk sdb
...
# With this information we can get to work:
# /dev/sda is the device created by the RAID controller; you can see it in dmesg.
# 29 is the device id of the drive you want to inspect.
$ sudo smartctl -a -d megaraid,29 /dev/sda
smartctl 5.40 2010-10-16 r3189 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

/dev/sda [megaraid_disk_29] [SAT]: Device open changed type from 'megaraid' to 'sat'
=== START OF INFORMATION SECTION ===
Model Family: Intel X18-M/X25-M/X25-V G2 SSDs
Device Model: INTEL SSDSA2M160G2GN
Serial Number: CVPO0104017E160AGN
Firmware Version: 2CV102HD
User Capacity: 160,041,885,696 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 1
Local Time is: Mon Mar 14 16:33:26 2011 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started.
 Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 1) seconds.
Offline data collection capabilities: (0x75) SMART execute Offline immediate.
 No Auto Offline data collection support.
 Abort Offline collection upon new command.
 No Offline surface scan supported.
 Self-test supported.
 Conveyance Self-test supported.
 Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
 Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
 General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 1) minutes.
Conveyance self-test routine recommended polling time: ( 1) minutes.
SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0020   100   100   000    Old_age   Offline      -       0
  4 Start_Stop_Count        0x0030   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       3
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2423
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       37
192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       26
225 Host_Writes_32MiB       0x0030   200   200   000    Old_age   Offline      -       258771
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       519
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       1
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       3426092101
232 Available_Reservd_Space 0x0033   099   099   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   097   097   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   099    Pre-fail  Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

Note: selective self-test log revision number (0) not 1 implies that no selective self-test has ever been run
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
    1       0       0 Not_testing
    2       0       0 Not_testing
    3       0       0 Not_testing
    4       0       0 Not_testing
    5       0       0 Not_testing
Selective self-test flags (0x0):
 After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
For example: 233 Media_Wearout_Indicator 0x0032 097 097 000 Old_age Always - 0
The VALUE column reads 097, which tells us this drive has 97% of its life remaining.
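Reading that number by eye does not scale to many drives, so here is a minimal sketch of extracting it and comparing it against a floor. The helper names `wearout_value` and `check_wearout` are made up for illustration, and the floor of 20 used in the example is an arbitrary assumption, not a figure from Intel or smartmontools.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -a` output on stdin and print the
# normalized VALUE column of attribute 233 as a plain integer (097 -> 97).
wearout_value() {
  awk '$1 == 233 && $2 == "Media_Wearout_Indicator" { print $4 + 0; exit }'
}

# Hypothetical helper: check_wearout MIN
# Returns 0 when the value is at least MIN, 1 when it is below MIN,
# and 2 when attribute 233 is not present in the input.
check_wearout() {
  min=$1
  value=$(wearout_value)
  if [ -z "$value" ]; then
    echo "attribute 233 not found" >&2
    return 2
  fi
  if [ "$value" -lt "$min" ]; then
    echo "WARNING: wearout value $value is below $min"
    return 1
  fi
  echo "OK: wearout value $value"
}

# Attribute line copied from the smartctl output above.
line='233 Media_Wearout_Indicator 0x0032   097   097   000    Old_age   Always       -       0'

printf '%s\n' "$line" | wearout_value      # prints 97
printf '%s\n' "$line" | check_wearout 20   # 20 is an arbitrary example floor
```

Against a real drive the input would come from the command shown earlier, for example `sudo smartctl -a -d megaraid,29 /dev/sda | check_wearout 20`.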
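Recommended reading:
Our DBA has written a more detailed explanation of the Intel SSD information; see here!
I also wrote a script snippet that automatically collects the lifetime value of every Intel SSD: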
# Output file and sampling interval in seconds; both can be overridden from the environment.
OPT_l="${OPT_l:-/tmp/intel_ssd_life}"; export OPT_l;
OPT_s="${OPT_s:-60}"; export OPT_s;
export PATH="${PATH}:/usr/local/bin:/usr/bin:/bin:/usr/libexec"
export PATH="${PATH}:/usr/local/sbin:/usr/sbin:/sbin"
export PATH="${PATH}:/opt/MegaRAID/MegaCli/"
PID=$$
date +"TS %s.%N" > ${OPT_l}
while true; do
  if MegaCli64 -PDList -aALL >/tmp/isl 2>/dev/null; then
    for dev in $(awk '/Device Id/{print $3}' /tmp/isl); do
      # Pick the paragraph for this device and keep its id only if it is an Intel drive.
      idev="$(sed -e '/./{H;$!d;}' -e "x;/Device Id: ${dev}/!d;" /tmp/isl\
        |awk '
          /Device Id/ {d=$3}
          /Inquiry Data/ {if ($0 ~ /INTEL/) {printf("%d\n", d)}}
        ')"
      if [ -n "${idev}" ]; then
        (echo -n "Device Id[${idev}]:" && smartctl -a -d megaraid,${idev} /dev/sda |grep "Media_Wearout_Indicator") >> ${OPT_l} 2>/dev/null
      fi
    done
    date +"TS %s.%N" >> ${OPT_l}
    # Stop sampling once the parent shell has exited.
    if ! ps -p ${PID} >/dev/null 2>&1; then
      break;
    fi
  fi
  sleep ${OPT_s}
done &
echo "running in background, see result please type: cat /tmp/intel_ssd_life"
# do something issue ios here
# ...
# end
The result looks something like this:
$cat /tmp/intel_ssd_life
…
TS 1299786377.246216000
Device Id[25]:233 Media_Wearout_Indicator 0x0032 096 096 000 Old_age Always - 0
Device Id[26]:233 Media_Wearout_Indicator 0x0032 096 096 000 Old_age Always - 0
Device Id[27]:233 Media_Wearout_Indicator 0x0032 096 096 000 Old_age Always - 0
Device Id[28]:233 Media_Wearout_Indicator 0x0032 096 096 000 Old_age Always - 0
Device Id[29]:233 Media_Wearout_Indicator 0x0032 096 096 000 Old_age Always - 0
Device Id[30]:233 Media_Wearout_Indicator 0x0032 096 096 000 Old_age Always - 0
TS 1299786438.160778000
…
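A log in this shape is easy to turn into rows for graphing. The sketch below converts it to `timestamp,device_id,value` lines; `life_log_to_csv` is a hypothetical helper name. It assigns each batch of Device Id lines to the TS line that follows it, on the assumption that the collecting script writes the timestamp after it finishes a pass over the drives.

```shell
#!/bin/sh
# Hypothetical helper: read the lifetime log on stdin and print one CSV row
# ("timestamp,device_id,value") per sample.
# Assumption: a "TS" line closes the batch of "Device Id[...]" lines before it.
life_log_to_csv() {
  awk '
    /^Device Id\[/ {
      # Extract the number between the square brackets.
      id = $0
      sub(/^Device Id\[/, "", id)
      sub(/\].*/, "", id)
      # The normalized VALUE column sits two fields after the attribute name.
      val = ""
      for (f = 1; f <= NF; f++) {
        if ($f == "Media_Wearout_Indicator") { val = $(f + 2) + 0 }
      }
      ids[++n] = id
      vals[n] = val
    }
    /^TS / {
      for (i = 1; i <= n; i++) { print $2 "," ids[i] "," vals[i] }
      n = 0
    }
  '
}

# Small sample in the same shape as the log shown above.
log='TS 1299786377.246216000
Device Id[29]:233 Media_Wearout_Indicator 0x0032 096 096 000 Old_age Always - 0
Device Id[30]:233 Media_Wearout_Indicator 0x0032 096 096 000 Old_age Always - 0
TS 1299786438.160778000'

printf '%s\n' "$log" | life_log_to_csv
```

With the real file this would be `life_log_to_csv < /tmp/intel_ssd_life`.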
If this is useful to you, adapt the code to collect whichever data points you need.
Have fun.
It looks like dmesg now also shows the device id of the SSDs:
# dmesg | grep -i "INTEL SSD"
scsi 0:0:12:0: Direct-Access ATA INTEL SSDSC2CW24 400i PQ: 0 ANSI: 5
scsi 0:0:13:0: Direct-Access ATA INTEL SSDSC2CW24 400i PQ: 0 ANSI: 5
Yu Feng Reply:
April 18th, 2013 at 10:49 am
As long as you can get it, that's fine.
One more question:
#smartctl -a -d sat+megaraid,12 -d /dev/sda
I would like to replace /dev/sda with the virtual id. Do you know how to write that?