VxVM Troubleshooting – Procedure to Replace Internal FibreChannel (FC) Disks controlled by VxVM

Hot swapping a failed disk is a fairly straightforward procedure when the disks are regular SCSI disks, but Fibre Channel (FC) disks require a different hot-swap procedure.

Use the specific procedure below when replacing one of the internal disks in a system with internal fibre drives (Sun Fire 280R, Sun Fire V480, Sun Fire V490, Sun Fire V880, Sun Fire V890), especially if the disk is under Veritas Volume Manager (VxVM) control.

Although the disks are hot-swappable, the procedure below ensures that VxVM is alerted to the fact that the drive is being replaced. Failure to follow this procedure could result in a duplicate entry for the replaced disk in the output of the ‘vxdisk list’ command.

For example:

# vxdisk list
DEVICE       TYPE      DISK         GROUP        STATUS
c1t0d0s2     sliced    rootdisk     rootdg       online
c1t1d0s2     sliced    -            -            error
c1t1d0s2     sliced    -            -            error

The easy way to remove the duplicate entries shown above is to reboot the server. Following the procedure below prevents the duplicate device from being created in the first place.
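The duplicate entry shown above can also be spotted with a small script instead of by eye. The following is only a sketch: find_duplicate_devices is a hypothetical helper name, and it assumes the column layout of the ‘vxdisk list’ example above.

```shell
# find_duplicate_devices is a hypothetical helper, not a VxVM command.
# It reads 'vxdisk list' output on stdin and prints every device name
# that appears more than once (header and "-" placeholder rows are skipped).
find_duplicate_devices() {
    awk 'NF && $1 != "DEVICE" && $1 != "-" { seen[$1]++ }
         END { for (d in seen) if (seen[d] > 1) print d }'
}

# Sample data copied from the example above. On a live system the real
# command would be piped in:  vxdisk list | find_duplicate_devices
sample='DEVICE       TYPE      DISK         GROUP        STATUS
c1t0d0s2     sliced    rootdisk     rootdg       online
c1t1d0s2     sliced    -            -            error
c1t1d0s2     sliced    -            -            error'

printf '%s\n' "$sample" | find_duplicate_devices   # prints c1t1d0s2
```

No output means that no device name is listed twice.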

Please note: if the disk is not under VxVM control, you can skip steps 3, 5, 10, 11 and 12.

Procedure To Replace FC Disk which is under VxVM Control

Step 1. Collect the information

NOTE: All data on these devices should have been backed up. Before replacing any disk under VxVM control, it should be in either a ‘failed’ or ‘removed’ state:

# vxdisk list

DEVICE       TYPE      DISK         GROUP        STATUS
c1t0d0s2     sliced    rootdisk     rootdg       online
c1t1d0s2     sliced    -            -            online
-            -         disk01       rootdg       failed was:c1t1d0s2

If the disk does not show up as “failed was”, as shown above, then you should run ‘vxdiskadm’ and choose option #4 to remove the disk for replacement. After running ‘vxdiskadm’, the output should look like this:

# vxdisk list

DEVICE       TYPE      DISK         GROUP        STATUS
c1t0d0s2     sliced    rootdisk     rootdg       online
c1t1d0s2     sliced    -            -            online
-            -         disk01       rootdg       removed was:c1t1d0s2
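The "failed was" or "removed was" check described above could be scripted roughly as follows. This is a minimal sketch, not part of VxVM: ready_for_replacement is a hypothetical helper, and it assumes the record layout shown in the examples above (state in the second-to-last field, "was:<device>" in the last).

```shell
# ready_for_replacement is a hypothetical helper, not a VxVM command.
# It reads 'vxdisk list' output on stdin and exits 0 only when a record
# ends in "failed was:<device>" or "removed was:<device>".
# On Solaris, use nawk or /usr/xpg4/bin/awk, because -v needs a POSIX awk.
ready_for_replacement() {
    awk -v dev="$1" '
        NF >= 2 && ($(NF-1) == "failed" || $(NF-1) == "removed") && $NF == ("was:" dev) { found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Sample data copied from the example above. On a live system:
#   vxdisk list | ready_for_replacement c1t1d0s2
sample='DEVICE       TYPE      DISK         GROUP        STATUS
c1t0d0s2     sliced    rootdisk     rootdg       online
c1t1d0s2     sliced    -            -            online
-            -         disk01       rootdg       removed was:c1t1d0s2'

if printf '%s\n' "$sample" | ready_for_replacement c1t1d0s2; then
    echo "c1t1d0s2 is in a failed or removed state"
else
    echo "c1t1d0s2 still needs vxdiskadm option 4"
fi
```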

NOTE: If this is a root-disk or root-mirror, record the following information about the disk being removed before this operation. This information is needed later to change nvramrc.

WWN information:

For example,

# ls -al /dev/rdsk/c1t0d0s0

lrwxrwxrwx 1 root root 74 Mar 6 2003 c1t0d0s0 -> ../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19920,0:a,raw

devalias and boot-device in nvramrc

For example,

# eeprom nvramrc

devalias rootdisk /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19920,0:a
devalias mirrdisk /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19838,0:a
boot-device=rootdisk mirrdisk
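To record the WWN without copying it by hand, it can be cut out of the device link target. A minimal sketch, assuming the "ssd@w<WWN>,<lun>" path form shown above; wwn_from_path is a hypothetical helper name.

```shell
# wwn_from_path is a hypothetical helper. It prints the WWN embedded in
# a device path of the form .../ssd@w<WWN>,<lun>:<slice>,raw
wwn_from_path() {
    printf '%s\n' "$1" | sed -n 's/.*ssd@w\([0-9a-fA-F]*\),.*/\1/p'
}

# Link target copied from the ls example above. On a live system the
# whole ls line can be passed in:  wwn_from_path "$(ls -l /dev/rdsk/c1t0d0s0)"
link='../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19920,0:a,raw'
wwn_from_path "$link"   # prints 21000004cfa19920
```

The printed value can then be compared with the devalias lines reported by eeprom nvramrc.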

Step 2. If this is a root-disk or root-mirror, use the dumpadm command to ensure that the dump-device is not on the failed disk. If it is, move it to the good side of the mirror, for example:

# dumpadm -d /dev/dsk/c1t0d0s1
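Whether the dump device sits on the failed disk can be checked from the dumpadm output. The sketch below assumes the usual "Dump device: <path> (<type>)" line; dump_device_on_disk is a hypothetical helper and the sample text is illustrative, not captured from a real system.

```shell
# dump_device_on_disk is a hypothetical helper. It reads dumpadm output
# on stdin and exits 0 when the "Dump device:" line names a slice of the
# given disk (for example c1t1d0).
# On Solaris, use nawk or /usr/xpg4/bin/awk, because -v needs a POSIX awk.
dump_device_on_disk() {
    awk -v disk="$1" '
        /Dump device:/ { if (index($3, "/dev/dsk/" disk "s") == 1) hit = 1 }
        END { exit (hit ? 0 : 1) }'
}

# Illustrative sample (assumed layout). On a live system:
#   dumpadm | dump_device_on_disk c1t1d0
sample='      Dump content: kernel pages
       Dump device: /dev/dsk/c1t1d0s1 (swap)
Savecore directory: /var/crash/host1
  Savecore enabled: yes'

if printf '%s\n' "$sample" | dump_device_on_disk c1t1d0; then
    echo "dump device is on c1t1d0, move it with dumpadm -d"
fi
```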

Step 3. If vxdiskadm option 4 is used to remove the disk for replacement, instruct VxVM to re-read the device tree by running the command

# vxdctl enable

Step 4. Put the disk into the “offline” state with the following command:

# vxdisk offline c1t1d0s2

Step 5. Verify the disk has been marked “offline” with “vxdisk list”:

# vxdisk list

DEVICE       TYPE      DISK         GROUP        STATUS
c1t0d0s2     sliced    rootdisk     rootdg       online
c1t1d0s2     sliced    -            -            offline
-            -         disk01       rootdg       removed was:c1t1d0s2
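If this verification is part of a script, the STATUS column for a single device can be pulled out of the listing. A minimal sketch under the assumption that the status starts in the fifth column, as in the output above; device_status is a hypothetical helper.

```shell
# device_status is a hypothetical helper. It reads 'vxdisk list' output
# on stdin and prints the STATUS text (field 5 onward) for one device.
# On Solaris, use nawk or /usr/xpg4/bin/awk, because -v needs a POSIX awk.
device_status() {
    awk -v dev="$1" '$1 == dev {
        s = $5
        for (i = 6; i <= NF; i++) s = s " " $i
        print s
    }'
}

# Sample data copied from the output above. On a live system:
#   vxdisk list | device_status c1t1d0s2
sample='DEVICE       TYPE      DISK         GROUP        STATUS
c1t0d0s2     sliced    rootdisk     rootdg       online
c1t1d0s2     sliced    -            -            offline
-            -         disk01       rootdg       removed was:c1t1d0s2'

printf '%s\n' "$sample" | device_status c1t1d0s2   # prints offline
```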

Step 6. Once Veritas has recognized the disk as offline and ready for replacement, you need to tell the operating system. This is done as follows:

# /usr/sbin/luxadm remove_device /dev/rdsk/c1t1d0s2

This will produce output similar to the following:

WARNING!!! Please ensure that no file systems are mounted on these device(s).

All data on these devices should have been backed up.

The list of devices which will be removed is:

1: Device name: /dev/rdsk/c1t1d0s2 Node WWN: 20000020371b1f31
Device Type: Disk device
Device Paths: /dev/rdsk/c1t1d0s2
Please verify the above list of devices and then enter 'c' or <CR> to Continue or 'q' to Quit. [Default: c]:c

stopping: /dev/rdsk/c1t1d0s2…. Done
offlining: /dev/rdsk/c1t1d0s2…. Done
The drives are now off-line and spun down.

Physically remove the disk and press the Return key.

Hit <Return> after removing the device(s).
picld[87]: Device DISK1 removed
Device: /dev/rdsk/c1t1d0s2
No FC devices found. – /dev/rdsk/c1t1d0s2

NOTE:  The picld daemon notifies the system that the disk has been removed.

If no errors are printed, continue to step 7. Otherwise, if you receive any errors during this step:

Physically pull the bad disk from the host, then run the commands:

# vxdisk rm c1t1d0s2
# luxadm -e offline /dev/rdsk/c1t1d0s2

If the disk is multipathed, run ‘luxadm -e offline’ on the second path as well.

Step 7. Initiate devfsadm cleanup subroutines by entering the following command:

# /usr/sbin/devfsadm -C -c disk

By default, devfsadm attempts to load every driver in the system and attach these drivers to all possible device instances. It then creates device special files in the /devices directory and logical links in /dev.

With the “-c disk” option, devfsadm will only update disk device files. This saves time and is important on systems that have tape devices attached.

Rebuilding these tape devices could cause undesirable results on non-Sun hardware.

The -C option cleans up the /dev directory, and removes any lingering logical links to the device link names. This should remove all the device paths for this particular disk. This can be verified with:

# ls -ld /dev/dsk/c1t1d*

This should return no devices.
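The same check can be scripted by counting the entries that still match the disk name. This is a sketch for a POSIX shell such as ksh or bash; count_disk_links is a hypothetical helper, and the directory is passed as an argument so it can be pointed at /dev/dsk or /dev/rdsk.

```shell
# count_disk_links is a hypothetical helper. It prints how many entries
# in <dir> have names starting with <disk>; 0 means the links are gone.
count_disk_links() {
    dir=$1
    disk=$2
    count=0
    for link in "$dir/$disk"*; do
        # -e follows symlinks, so -L is tested too in order to count
        # dangling links; an unmatched pattern fails both tests.
        if [ -e "$link" ] || [ -L "$link" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# After 'devfsadm -C -c disk' this is expected to print 0.
count_disk_links /dev/dsk c1t1d
```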

Step 8. Verify that the reference to this disk is gone by running the commands

# vxdisk list (if the disk is under VxVM control)

# format

It is now safe to physically replace the disk.

Step 9. After replacing the disk, create the necessary entries in the Solaris OS device tree with one of the following commands:

# devfsadm

or

# /usr/sbin/luxadm insert_device <enclosure_name,sx>

where sx is the slot number.

NOTE: In many cases, luxadm insert_device does not require the enclosure name and slot number.

Use the following to find the slot number:

# luxadm display

To find the enclosure name, use:

# luxadm probe

Run “ls -ld /dev/dsk/c1t1d*” to verify that the new device paths have been created.

NOTE: After inserting the disk and running devfsadm (or luxadm), the ssd instance ID of the disk changes from the old one to a new one. This change is expected and can be ignored.

For example, an error is first reported on the following disk (ssd3):

  • WARNING: /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19920,0 (ssd3):
  • Error for Command: read(10) Error Level: Retryable
  • Requested Block: 15392944 Error Block: 15392958

(After inserting disk)

  • picld[287]: [ID 727222 daemon.error] Device DISK0 inserted
  • qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(2): Loop ONLINE
  • scsi: [ID 799468 kern.info] ssd10 at fp2: name w21000011c63f0c94,0, bus address ef
  • genunix: [ID 936769 kern.info] ssd10 is /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0
  • scsi: [ID 365881 kern.info]
  • genunix: [ID 408114 kern.info] /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0 (ssd10) online

Step 10. Label the disk using the format command.

If the disk is under VxVM control, be sure to write an SMI label (Solaris 9 4/03 OS or later):

# format -e /dev/rdsk/c1t1d0s2

format> l

[0] SMI Label

[1] EFI Label

Specify Label type[1]: 0

Auto configuration via format.dat[no]? no

Auto configuration via generic SCSI-2[no]? yes

Ready to label disk, continue? yes

If the disk is not under VxVM control, label the disk according to local requirements; if there are none, it can be labeled with a standard VTOC. Steps 9a to 9c are only required if this is a system running SunCluster.

Note: Errors may be reported for c0t0d0, which is the CD-ROM/DVD drive on the Sun Fire V480, V880, etc.

Step 11. Instruct VxVM to re-read the device tree by running the command

# vxdctl enable

Step 12. The disk will remain in the “offline” state until the new disk is initialized.

To initialize it, use the command line first:

# vxdisksetup -i c1t1d0

Then, use ‘vxdiskadm’ and choose option #5 to replace the failed or removed disk.

– OR –

Run ‘vxdiskadm’ and choose option #5 to initialize it and replace the failed or removed disk. If the ‘vxdiskadm’ command is run and option #5 is chosen, it will show that “Access is disabled” for this new disk (because it is still “offline”), and you will be asked whether or not you wish to “enable access” to it. Answer ‘yes’ to this question.

Step 13. The disk should now be online and functional, within the operating system and VxVM. Confirm this with “vxdisk list”.

NOTE: Do not reboot the system or perform Step 16 (modify nvramrc) until synchronization has completed. If it is rebooted early, the system cannot boot from the new disk or modify devalias. Check the progress with “vxtask list”:

# vxtask list
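Waiting for the synchronization to finish can be automated by polling vxtask list until no tasks remain. The sketch below is an assumption-laden illustration: tasks_running is a hypothetical helper, and the sample header and task line are illustrative, since the exact columns may differ between VxVM versions.

```shell
# tasks_running is a hypothetical helper. It reads 'vxtask list' output
# on stdin and exits 0 while at least one task line is present
# (blank lines and the TASKID header line are ignored).
tasks_running() {
    awk 'NF && $1 != "TASKID" { n++ } END { exit (n > 0 ? 0 : 1) }'
}

# Illustrative sample (assumed layout, not captured from a real system).
sample='TASKID  PTID TYPE/STATE    PCT   PROGRESS
   162       ATCOPY/R 04.77% 0/41943040/2000896 PLXATT rootvol rootvol-02'

if printf '%s\n' "$sample" | tasks_running; then
    echo "synchronization still in progress"
fi

# On a live system the loop could look like this:
#   while vxtask list | tasks_running; do sleep 60; done
```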

Step 14. If the dump device had to be moved in Step 2, move it back, for example:

# dumpadm -d /dev/dsk/c1t1d0s1

Step 15. If this was a root-disk or a root-mirror, then make sure to run the /etc/vx/bin/vxbootsetup command. The vxbootsetup utility configures a disk by writing a boot track at the beginning of the disk and by creating physical disk partitions in the UNIX VTOC that match the mirrors of the root, swap, /usr and /var.

# /etc/vx/bin/vxbootsetup -g rootdg rootdisk

Step 16. If this was a root-disk or root-mirror, then ensure the nvram aliases are updated so you can boot.

# ls -al /dev/rdsk/<device>s0

example: ls -al /dev/rdsk/c1t1d0s0

Compare the WWN in the ls output with the appropriate root alias entries in the NVRAM (eeprom nvramrc), looking at the rootdisk or mirrdisk entries.

NOTE: To change a devalias in nvramrc, replace the removed disk information with the new disk information.

For example,

– List before modifying nvramrc. (removed disk information)

# eeprom nvramrc

devalias rootdisk /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19920,0:a

devalias mirrdisk /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19838,0:a

– List the new disk information

# ls -al /dev/rdsk/c1t0d0s0

lrwxrwxrwx 1 root root 74 Mar 6 2003 c1t0d0s0 -> ../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0:a,raw

– Modify nvramrc

(This example is written in the Bourne shell)

# eeprom nvramrc='devalias rootdisk /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0:a [enter once]
devalias mirrdisk /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19838,0:a' [enter second time]

– List after modifying nvramrc.

# eeprom nvramrc

devalias rootdisk /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0:a

devalias mirrdisk /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19838,0:a

NOTE: If this is a root-disk or rootmirror, the device path contains the WWN of the new disk. It is necessary to update the nvramrc devalias entries to the new device path, so the system will be able to boot from the newly-replaced rootdisk or rootmirror.
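The new devalias line can be derived from the /dev/rdsk link target instead of being typed by hand. A minimal sketch, assuming the link target has the "../../devices/...,raw" form shown in the ls output above; obp_path_from_link is a hypothetical helper, and the alias name rootdisk is taken from the nvramrc example.

```shell
# obp_path_from_link is a hypothetical helper. It turns a /dev/rdsk link
# target into the device path used in nvramrc by dropping everything up
# to and including "/devices" and the trailing ",raw".
obp_path_from_link() {
    printf '%s\n' "$1" | sed -e 's|^.*/devices/|/|' -e 's|,raw$||'
}

# Link target of the new disk, copied from the ls example above.
link='../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0:a,raw'
echo "devalias rootdisk $(obp_path_from_link "$link")"
```

The printed line is only a candidate; it should be checked against the eeprom nvramrc output before the aliases are rewritten.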
