How To Replace A Failed SVM Disk

Before you replace what you believe is a failed Solaris Volume Manager (SVM) disk, establish whether it has actually failed or is still in the process of failing. The distinction matters because replacing a failed SVM disk takes a little less time than replacing a failing one.

Read How To Tell The Difference Between A Failed Disk And A Failing Disk to find out which one your disk is. If your disk hasn’t quite failed yet, see How To Replace A Failing SVM Disk instead.

Now that you have established that you do have a failed SVM disk, find out whether the disk contains SVM metadatabase replicas and, if it does, delete them. The examples below assume that the failed disk is c1t1d0.

# metadb | grep c1t1d0
      W   p  l          16              8192            /dev/dsk/c1t1d0s7
      W   p  l          8208            8192            /dev/dsk/c1t1d0s7
      W   p  l          16400           8192            /dev/dsk/c1t1d0s7
#
# metadb -d c1t1d0s7
#
# metadb
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c1t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c1t0d0s7
     a    p  luo        16400           8192            /dev/dsk/c1t0d0s7
#
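
The replica check above can also be scripted. The following is a minimal sketch rather than part of the original procedure: count_replicas is a hypothetical helper name, and it assumes a POSIX-compatible shell (ksh or bash, not the legacy /bin/sh) with the output of metadb piped in on stdin.

```shell
# Count the metadatabase replicas that metadb lists for one disk.
# Reads `metadb` output on stdin; $1 is a disk name such as c1t1d0.
# (count_replicas is an illustrative name, not an SVM command.)
count_replicas() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when that number is 0, so "|| :" keeps the function's status clean.
    grep -c "/dev/dsk/${1}s" || :
}
```

Run as metadb | count_replicas c1t1d0, this would print 3 before the metadb -d step above and 0 after it.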

Unconfigure the failed SVM disk.

# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 CD-ROM       connected    configured   unknown
c1                             scsi-bus     connected    configured   unknown
c1::dsk/c1t0d0                 disk         connected    configured   unknown
c1::dsk/c1t1d0                 disk         connected    configured   unknown
c1::dsk/c1t2d0                 disk         connected    configured   unknown
c1::dsk/c1t3d0                 disk         connected    configured   unknown
c2                             scsi-bus     connected    unconfigured unknown
c3                             fc-fabric    connected    configured   unknown
c3::5006016239a02018           disk         connected    configured   unknown
c3::5006016b39a02018           disk         connected    configured   unknown
c3::5006048452a70c17           disk         connected    configured   unknown
c3::5006048c52a70c07           disk         connected    configured   unknown
c4                             fc-fabric    connected    configured   unknown
c4::5006016339a02018           disk         connected    configured   unknown
c4::5006016a39a02018           disk         connected    configured   unknown
c4::5006048452a70c18           disk         connected    configured   unknown
c4::5006048c52a70c08           disk         connected    configured   unknown
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
usb1/1                         unknown      empty        unconfigured ok
usb1/2                         unknown      empty        unconfigured ok
#
# cfgadm -c unconfigure c1::dsk/c1t1d0
cfgadm: Component system is busy, try again: failed to offline:
     Resource              Information
------------------  -------------------------
/dev/dsk/c1t1d0s2   Device being used by VxVM
#

Note: This host uses SVM to manage internal disks and Veritas Volume Manager (VxVM) to manage SAN attached disks. VxVM keeps track of the internal disks – even if it doesn’t actually manage them – and may not allow you to unconfigure them. To get around this restriction, you may need to forcibly unconfigure the failed SVM disk by specifying the -f parameter to cfgadm.

# cfgadm -f -c unconfigure c1::dsk/c1t1d0
#
# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 CD-ROM       connected    configured   unknown
c1                             scsi-bus     connected    configured   unknown
c1::dsk/c1t0d0                 disk         connected    configured   unknown
c1::dsk/c1t1d0                 disk         connected    unconfigured unknown
c1::dsk/c1t2d0                 disk         connected    configured   unknown
c1::dsk/c1t3d0                 disk         connected    configured   unknown
c2                             scsi-bus     connected    unconfigured unknown
c3                             fc-fabric    connected    configured   unknown
c3::5006016239a02018           disk         connected    configured   unknown
c3::5006016b39a02018           disk         connected    configured   unknown
c3::5006048452a70c17           disk         connected    configured   unknown
c3::5006048c52a70c07           disk         connected    configured   unknown
c4                             fc-fabric    connected    configured   unknown
c4::5006016339a02018           disk         connected    configured   unknown
c4::5006016a39a02018           disk         connected    configured   unknown
c4::5006048452a70c18           disk         connected    configured   unknown
c4::5006048c52a70c08           disk         connected    configured   unknown
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
usb1/1                         unknown      empty        unconfigured ok
usb1/2                         unknown      empty        unconfigured ok
#

Verify that the failed SVM disk is marked “unconfigured” as above. Sun servers with hot-swappable disks will also have the disk’s blue “ready to remove” LED lit.
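
Rather than scanning the long cfgadm listing by eye, the Occupant column for a single attachment point can be pulled out with a small helper. This is only a sketch under stated assumptions: occupant_state is a hypothetical name, the shell is assumed to be POSIX-compatible (ksh or bash), and the input is assumed to have the five-column layout shown above.

```shell
# Print the Occupant column for one attachment point.
# Reads `cfgadm -al` output on stdin; $1 is an Ap_Id such as c1::dsk/c1t1d0.
# (occupant_state is an illustrative name, not a cfgadm feature.)
occupant_state() {
    # Columns: Ap_Id, Type, Receptacle, Occupant, Condition
    while read -r ap_id type receptacle occupant condition; do
        if [ "$ap_id" = "$1" ]; then
            echo "$occupant"
        fi
    done
}
```

At this point in the procedure, cfgadm -al | occupant_state c1::dsk/c1t1d0 should print unconfigured. The same call should print configured once the new disk has been configured.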

Pull the failed SVM disk out of the drive bay and insert the new disk. The following message will come up in /var/adm/messages.

Jul 20 14:46:09 eap52 rmclomv: [ID 978967 kern.error] DISK @ HDD1 has been inserted.

Configure the new disk.

# cfgadm -c configure c1::dsk/c1t1d0
#
# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 CD-ROM       connected    configured   unknown
c1                             scsi-bus     connected    configured   unknown
c1::dsk/c1t0d0                 disk         connected    configured   unknown
c1::dsk/c1t1d0                 disk         connected    configured   unknown
c1::dsk/c1t2d0                 disk         connected    configured   unknown
c1::dsk/c1t3d0                 disk         connected    configured   unknown
c2                             scsi-bus     connected    unconfigured unknown
c3                             fc-fabric    connected    configured   unknown
c3::5006016239a02018           disk         connected    configured   unknown
c3::5006016b39a02018           disk         connected    configured   unknown
c3::5006048452a70c17           disk         connected    configured   unknown
c3::5006048c52a70c07           disk         connected    configured   unknown
c4                             fc-fabric    connected    configured   unknown
c4::5006016339a02018           disk         connected    configured   unknown
c4::5006016a39a02018           disk         connected    configured   unknown
c4::5006048452a70c18           disk         connected    configured   unknown
c4::5006048c52a70c08           disk         connected    configured   unknown
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
usb1/1                         unknown      empty        unconfigured ok
usb1/2                         unknown      empty        unconfigured ok
#

Verify that the new disk has been configured as above.

Copy the volume table of contents (VTOC) from the other disk in the mirror set, c1t0d0, onto the new disk.

# prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
fmthard:  New volume table of contents now in place.
#

If the command fails with an error similar to “/dev/rdsk/c1t1d0s2: Cannot get disk geometry”, run format to label the new disk, then repeat the VTOC copy.

# format
Searching for disks...done

c1t1d0: configured with capacity of 72.36GB

AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@2/sd@0,0
       1. c1t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@2/sd@1,0
       2. c1t2d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@2/sd@2,0
       3. c1t3d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@2/sd@3,0
Specify disk (enter its number): 1
selecting c1t1d0
[disk formatted]
Disk not labeled.  Label it now? y

FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> q
#
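
Once the VTOC has been copied, it may be worth confirming that both disks now carry the same partition layout. The sketch below is an assumption-laden illustration, not part of the original procedure: vtoc_entries and same_vtoc are hypothetical helper names, the shell is assumed to be POSIX-compatible (ksh or bash), and the prtvtoc output of each disk is assumed to have been saved to a file first. Only the six numeric columns are compared, because the comment header names the device and so always differs between disks.

```shell
# Reduce prtvtoc output to its partition entries.
# Drops the comment lines (they start with "*") and keeps the first six
# columns: partition, tag, flags, first sector, sector count, last sector.
vtoc_entries() {
    grep -v '^\*' | awk 'NF >= 6 { print $1, $2, $3, $4, $5, $6 }'
}

# Succeed if two saved prtvtoc outputs describe the same partitions.
# $1 and $2 are files holding the prtvtoc output for each disk.
same_vtoc() {
    [ "$(vtoc_entries < "$1")" = "$(vtoc_entries < "$2")" ]
}
```

For example, after saving prtvtoc /dev/rdsk/c1t0d0s2 and prtvtoc /dev/rdsk/c1t1d0s2 to two files, same_vtoc on those files returns success when the layouts match.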

Recreate the metadatabase replicas on the new disk.

# metadb -a -c 3 c1t1d0s7
#
# metadb
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c1t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c1t0d0s7
     a    p  luo        16400           8192            /dev/dsk/c1t0d0s7
     a        u         16              8192            /dev/dsk/c1t1d0s7
     a        u         8208            8192            /dev/dsk/c1t1d0s7
     a        u         16400           8192            /dev/dsk/c1t1d0s7
#

Update the new disk’s device ID entry in SVM. This step may not be required but it’s a good idea to do it just in case.

# metadevadm -u c1t1d0
Updating Solaris Volume Manager device relocation information for c1t1d0
Old device reloc information:
        id1,sd@THITACHI_HUS103073FL3800_V3X6MDDA
New device reloc information:
        id1,sd@THITACHI_HUS103073FL3800_V3X6MDDA
#

Enable the submirrors on the replacement disk. Start with the swap partition, since a problem there won’t affect any data. You may enable the submirrors on the new disk in parallel or in sequence: if the I/O load on the system is heavy, do it in sequence; otherwise, enable them in parallel.

# metareplace -e d1 c1t1d0s1
d1: device c1t1d0s1 is enabled
# metastat d1
d1: Mirror
    Submirror 0: d11
      State: Okay
    Submirror 1: d21
      State: Resyncing
    Resync in progress: 0 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 10491456 blocks (5.0 GB)

d11: Submirror of d1
    State: Okay
    Size: 10491456 blocks (5.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s1          0     No            Okay   Yes

d21: Submirror of d1
    State: Resyncing
    Size: 10491456 blocks (5.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s1          0     No       Resyncing   Yes

Device Relocation Information:
Device   Reloc  Device ID
c1t0d0   Yes    id1,sd@SFUJITSU_MAW3073NCSUN72G_000707B0KHT4____DAN0P720KHT4
c1t1d0   Yes    id1,sd@THITACHI_HUS103073FL3800_V3X6MDDA
#

SVM will resync the submirrors as soon as they are enabled. This is done in the background and may take a fair amount of time depending on the size of the submirrors. Now is a good time to go for a cup of coffee. Don’t forget to check the progress of the resync when you return.
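
Checking on the resync can be left to a small polling loop. What follows is a rough sketch, not a tested tool: resync_percent and wait_for_resync are hypothetical names, the shell is assumed to be POSIX-compatible (ksh or bash), and the parsing assumes metastat prints a "Resync in progress: N % done" line exactly as in the listing above.

```shell
# Print the percentage from metastat's "Resync in progress: N % done" line.
# Reads metastat output on stdin; prints nothing when no resync is reported.
resync_percent() {
    sed -n 's/^ *Resync in progress: *\([0-9][0-9]*\) *% done.*/\1/p'
}

# Poll one mirror until metastat no longer reports a resync.
# $1 is the mirror (for example d1); $2 is the polling interval in
# seconds and defaults to 60.
wait_for_resync() {
    while pct=$(metastat "$1" | resync_percent) && [ -n "$pct" ]; do
        echo "$1: resync $pct % done"
        sleep "${2:-60}"
    done
}
```

Calling wait_for_resync d1 right after metareplace -e d1 c1t1d0s1 blocks until the resync line disappears. That also gives one way to enable submirrors in sequence on a busy system: wait for each mirror before enabling the next.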