
Reading Intel SSD Lifetime Behind a RAID Controller with smartctl

March 14th, 2011

Original article. When reposting, please credit: reposted from 系统技术非业余研究


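We run a large number of Intel SSDs in production. As is well known, SSDs have a limited lifetime, so being able to monitor them while they are in service is very helpful.

We usually put the Intel SSDs behind a RAID controller in a RAID 10 array, and ordinary tools have a hard time reading lifetime information through the controller. Thanks to work by Intel and the community, support for reading Intel SSD lifetime data has been integrated into smartctl, which makes this much easier.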

First, our environment:

Hardware and operating system:
System | Huawei Technologies Co., Ltd.; Tecal RH2285; vV100R001 (Main Server Chassis)
Release | Red Hat Enterprise Linux Server release 5.4 (Tikanga)
Kernel | 2.6.18-164.el5
Architecture | CPU = 64-bit, OS = 64-bit

RAID controller:
# RAID Controller ############################################
Controller | LSI Logic MegaRAID SAS
Model | MegaRAID SAS PCI Express(TM) ROMB, PCIE interface, 8 ports
Cache | 256MB Memory, BBU
BBU | 96% Charged, Temperature 23C, isSOHGood=

VirtualDev Size RAID Level Disks SpnDpth Stripe Status Cache
========== ========= ========== ===== ======= ====== ======= =========
0 0 (:-1-0) 0 Depth-1 WB, no RA
1 0 (:-5-3) 0 Depth-1 WB, no RA

PhysiclDev Type State Errors Vendor Model Size
========== ==== ======= ====== ======= ============ ===========
Hard Disk SAS Online, 0/0/0 SEAGATE ST3300657SS 279.396
Hard Disk SAS Online, 0/0/0 SEAGATE ST3300657SS 279.396
Solid Stat SATA Online, 0/901/0 CVPO007400S4160AGN INTEL 149.049
Solid Stat SATA Online, 0/900/0 CVPO010400AR160AGN INTEL 149.049
Solid Stat SATA Online, 0/900/0 CVPO007000T3160AGN INTEL 149.049
Solid Stat SATA Online, 0/900/0 CVPO009002DN160AGN INTEL 149.049
Solid Stat SATA Online, 0/900/0 CVPO0104017E160AGN INTEL 149.049
Solid Stat SATA Online, 0/899/0 CVPO010200KS160AGN INTEL 149.049

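Now let's walk through how to use it:

STEP1: Because we use an LSI RAID controller, we first need to install the controller's utility:

First, download megacli-2.00.11-2.x86_64.rpm from the LSI website.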

$ sudo rpm -i megacli-2.00.11-2.x86_64.rpm 

# Must be run with root privileges
$ sudo  /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -aALL  
Adapter #0

Number of Virtual Disks: 2
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 278.464 GB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 2
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Access Policy       : Read/Write
Disk Cache Policy   : Enabled
Encryption Type     : None
Number of Spans: 1
Span: 0 - Number of PDs: 2

PD: 0 Information
Enclosure Device ID: 12
Slot Number: 0
Enclosure position: 0
Device Id: 13
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.464 GB [0x22cee000 Sectors]
Firmware state: Online, Spun Up
SAS Address(0): 0x5000c5002bdd95fd
SAS Address(1): 0x0
Connected Port Number: 4(path0) 
Inquiry Data: SEAGATE ST3300657SS     00066SJ017HA            
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Hard Disk Device
Drive Temperature :39C (102.20 F)

...

PD: 4 Information
Enclosure Device ID: 12
Slot Number: 6
Enclosure position: 0
Device Id: 29
Sequence Number: 2
Media Error Count: 0
Other Error Count: 901
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 149.049 GB [0x12a19eb0 Sectors]
Non Coerced Size: 148.549 GB [0x12919eb0 Sectors]
Coerced Size: 148.080 GB [0x12829000 Sectors]
Firmware state: Online, Spun Up
SAS Address(0): 0x5286ed44f702c006
Connected Port Number: 4(path0) 
Inquiry Data: CVPO0104017E160AGN  INTEL SSDSA2M160G2GN                    2CV102HD
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Solid State Device
Drive Temperature :0C (32.00 F)




PD: 5 Information
Enclosure Device ID: 12
Slot Number: 7
Enclosure position: 0
Device Id: 30
Sequence Number: 2
Media Error Count: 0
Other Error Count: 900
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 149.049 GB [0x12a19eb0 Sectors]
Non Coerced Size: 148.549 GB [0x12919eb0 Sectors]
Coerced Size: 148.080 GB [0x12829000 Sectors]
Firmware state: Online, Spun Up
SAS Address(0): 0x5286ed44f702c007
Connected Port Number: 4(path0) 
Inquiry Data: CVPO010200KS160AGN  INTEL SSDSA2M160G2GN                    2CV102HD
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Solid State Device
Drive Temperature :0C (32.00 F)

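Note the "Device Id: xxx" values above; we will need them below. Getting this output also shows that the RAID controller is reporting drive information correctly.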
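As a minimal sketch of collecting those IDs automatically, the function below scans MegaCli output and prints the Device Id together with the Inquiry Data of every drive whose inquiry string contains INTEL. The name `list_intel_device_ids` is an assumption of this example, not part of MegaCli; it relies only on the `Device Id:` and `Inquiry Data:` lines shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of MegaCli): print "<device id> <inquiry data>"
# for every physical drive whose Inquiry Data mentions INTEL.
# Reads MegaCli64 -LdPdInfo / -PDList style text on stdin.
list_intel_device_ids() {
    awk '
        /^Device Id:/    { id = $3 }                 # remember the current drive id
        /^Inquiry Data:/ {
            if ($0 ~ /INTEL/) {
                sub(/^Inquiry Data:[ \t]*/, "")      # keep only the inquiry string
                print id, $0
            }
        }
    '
}
```

It could be used as `sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | list_intel_device_ids`. Anchoring on `^Device Id:` keeps the `Enclosure Device ID:` line from being mistaken for a drive ID.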

STEP2:

We also need a reasonably recent smartmontools, version 5.39 or 5.40:
http://sourceforge.net/projects/smartmontools/files/smartmontools/

After compiling and installing it, we have smartctl.
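To confirm that the installed build is recent enough, a check along the following lines may help. This is only a sketch: the function name `smartctl_new_enough` is hypothetical, and it assumes the version banner has the form `smartctl 5.40 2010-10-16 ...` as printed further below. A banner in any other layout is treated as too old.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the smartctl banner on stdin reports
# a version of at least 5.39 (the minimum this article recommends).
smartctl_new_enough() {
    awk '
        $1 == "smartctl" && $2 ~ /^[0-9]+\.[0-9]+/ {
            split($2, v, ".")                        # v[1] = major, v[2] = minor
            ok = (v[1] + 0 > 5 || (v[1] + 0 == 5 && v[2] + 0 >= 39))
            found = 1
            exit                                     # jump to END with the verdict
        }
        END { if (found && ok) exit 0; exit 1 }
    '
}
```

For example: `smartctl -V | smartctl_new_enough && echo "smartctl is recent enough"`.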

$ dmesg
...
scsi0 : LSI SAS based MegaRAID driver
  Vendor: PMC       Model: 8399              Rev: 1
  Type:   Enclosure                          ANSI SCSI revision: 05
  Vendor: SEAGATE   Model: ST3300657SS       Rev: 0006
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: SEAGATE   Model: ST3300657SS       Rev: 0006
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: INTEL SSDSA2M160  Rev: 02HD
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: INTEL SSDSA2M160  Rev: 02HD
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: INTEL SSDSA2M160  Rev: 02HD
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: INTEL SSDSA2M160  Rev: 02HD
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: INTEL SSDSA2M160  Rev: 02HD
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: INTEL SSDSA2M160  Rev: 02HD
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: LSI       Model: MegaRAID SAS RMB  Rev: 1.40
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 583983104 512-byte hdwr sectors (298999 MB)
sda: Write Protect is off
sda: Mode Sense: 1f 00 00 08
SCSI device sda: drive cache: write back
SCSI device sda: 583983104 512-byte hdwr sectors (298999 MB)
sda: Write Protect is off
sda: Mode Sense: 1f 00 00 08
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 >
sd 0:2:0:0: Attached scsi disk sda
  Vendor: LSI       Model: MegaRAID SAS RMB  Rev: 1.40
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdb: 1242184960 512-byte hdwr sectors (635999 MB)
sdb: Write Protect is off
sdb: Mode Sense: 1f 00 00 08
SCSI device sdb: drive cache: write back
SCSI device sdb: 1242184960 512-byte hdwr sectors (635999 MB)
sdb: Write Protect is off
sdb: Mode Sense: 1f 00 00 08
SCSI device sdb: drive cache: write back
 sdb: sdb1
sd 0:2:1:0: Attached scsi disk sdb
...

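# With this information we can get to work:
# /dev/sda is the device created by your RAID controller; it shows up in dmesg.
# 29 is the device id of the disk you want to inspect.

$ sudo smartctl -a -d megaraid,29 /dev/sda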
smartctl 5.40 2010-10-16 r3189 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

/dev/sda [megaraid_disk_29] [SAT]: Device open changed type from 'megaraid' to 'sat'
=== START OF INFORMATION SECTION ===
Model Family:     Intel X18-M/X25-M/X25-V G2 SSDs
Device Model:     INTEL SSDSA2M160G2GN
Serial Number:    CVPO0104017E160AGN
Firmware Version: 2CV102HD
User Capacity:    160,041,885,696 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 1
Local Time is:    Mon Mar 14 16:33:26 2011 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                 (   1) seconds.
Offline data collection
capabilities:                    (0x75) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   1) minutes.
Conveyance self-test routine
recommended polling time:        (   1) minutes.

SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0020   100   100   000    Old_age   Offline      -       0
  4 Start_Stop_Count        0x0030   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       3
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2423
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       37
192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       26
225 Host_Writes_32MiB       0x0030   200   200   000    Old_age   Offline      -       258771
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       519
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       1
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       3426092101
232 Available_Reservd_Space 0x0033   099   099   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   097   097   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   099    Pre-fail  Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


Note: selective self-test log revision number (0) not 1 implies that no selective self-test has ever been run
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

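For example, take this line: 233 Media_Wearout_Indicator 0x0032 097 097 000 Old_age Always - 0
The normalized VALUE column reads 097, which tells us the drive has 97% of its life remaining.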
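Reading that number by eye does not scale to many drives, so here is a minimal sketch that extracts it. The function name `wearout_percent` is an assumption of this example; it expects smartctl attribute output on stdin in the column layout shown above, where the fourth column is the normalized VALUE.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -a` (or -A) output on stdin and print
# the normalized VALUE column of attribute 233 as an integer percentage.
wearout_percent() {
    awk '$1 == 233 && $2 == "Media_Wearout_Indicator" { print $4 + 0; exit }'
}
```

For example: `sudo smartctl -a -d megaraid,29 /dev/sda | wearout_percent`. Adding 0 strips the leading zero, so `097` is printed as `97`.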

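Recommended reading:
Our DBA has written a more detailed explanation of the Intel SSD SMART information; see here!

I also wrote a script fragment that automatically collects the wear indicator of each Intel SSD: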

    # Output file and sampling interval in seconds (override via the environment).
    OPT_l="${OPT_l:-/tmp/intel_ssd_life}"; export OPT_l;
    OPT_s="${OPT_s:-60}"; export OPT_s;

    export PATH="${PATH}:/usr/local/bin:/usr/bin:/bin:/usr/libexec"
    export PATH="${PATH}:/usr/local/sbin:/usr/sbin:/sbin"
    export PATH="${PATH}:/opt/MegaRAID/MegaCli/"

    # PID of this script; the background loop exits once it is gone.
    PID=$$

    date +"TS %s.%N" > "${OPT_l}"

    while true; do
        if MegaCli64 -PDList -aALL >/tmp/isl 2>/dev/null; then
            # Remember each "Device Id" and print it when the same drive's
            # "Inquiry Data" line mentions INTEL.
            for idev in $(awk '
                          /^Device Id/                          {d=$3}
                          /^Inquiry Data/                       {if ($0 ~ /INTEL/) {printf("%d\n", d)}}
                        ' /tmp/isl); do
                (echo -n "Device Id[${idev}]:" && smartctl -a -d megaraid,${idev} /dev/sda |grep "Media_Wearout_Indicator") >> "${OPT_l}" 2>/dev/null
            done
            date +"TS %s.%N" >> "${OPT_l}"
        fi
        # Stop sampling once the parent script has exited.
        if ! ps -p ${PID} >/dev/null 2>&1; then
            break;
        fi
        sleep ${OPT_s}
    done &

echo "running in background, to see the result type: cat ${OPT_l}"

# do something that issues I/O here
# ...
# end

The result looks like this:

$ cat /tmp/intel_ssd_life

TS 1299786377.246216000
Device Id[25]:233 Media_Wearout_Indicator 0x0032 096 096 000 Old_age Always - 0
Device Id[26]:233 Media_Wearout_Indicator 0x0032 096 096 000 Old_age Always - 0
Device Id[27]:233 Media_Wearout_Indicator 0x0032 096 096 000 Old_age Always - 0
Device Id[28]:233 Media_Wearout_Indicator 0x0032 096 096 000 Old_age Always - 0
Device Id[29]:233 Media_Wearout_Indicator 0x0032 096 096 000 Old_age Always - 0
Device Id[30]:233 Media_Wearout_Indicator 0x0032 096 096 000 Old_age Always - 0
TS 1299786438.160778000

If you find this useful, adapt the code to collect whichever attributes you need.
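One possible adaptation, sketched under assumptions rather than taken from the original script, is a check over the resulting log that reports samples whose normalized wear value has fallen below a threshold. The function name `report_worn_drives` and the default threshold of 10 are purely illustrative. The log path is the first argument and the optional threshold is the second.

```shell
#!/bin/sh
# Hypothetical helper: scan a log in the format shown above and print
# "<device id> <value>" for each sample whose normalized value is below
# the threshold given as $2 (default 10, an arbitrary illustration).
report_worn_drives() {
    awk -v limit="${2:-10}" '
        /^Device Id\[[0-9]+\]:/ && /Media_Wearout_Indicator/ {
            id = $2                   # looks like "Id[25]:233"
            sub(/^Id\[/, "", id)      # strip the leading "Id["
            sub(/\].*$/, "", id)      # strip "]:233"
            if ($5 + 0 < limit + 0) print id, $5 + 0
        }
    ' "$1"
}
```

For example, `report_worn_drives /tmp/intel_ssd_life 20` would list every logged sample below 20. A drive that stays below the threshold appears once per sampling interval, so pipe the output through `sort -u` if one line per drive is enough.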

Have fun.


  1. yu
    April 18th, 2013 at 10:37 | #1

    It seems that the SSDs' device IDs can now be seen with dmesg as well:
    # dmesg |grep -i "INTEL SSD"
    scsi 0:0:12:0: Direct-Access ATA INTEL SSDSC2CW24 400i PQ: 0 ANSI: 5
    scsi 0:0:13:0: Direct-Access ATA INTEL SSDSC2CW24 400i PQ: 0 ANSI: 5

    Yu Feng Reply:

    As long as you can get the ID, that works.

  2. yu
    April 18th, 2013 at 11:39 | #2

    One more question:
    #smartctl -a -d sat+megaraid,12 -d /dev/sda
    I would like to replace /dev/sda with the virtual id. Do you know how to write that?
