Solaris 10 OS Multipathing Options for iSCSI Devices

Multipathing for iSCSI devices can be implemented at several levels of the Solaris storage protocol stack.

The following figure shows the Solaris block I/O stack.

Note – The iSCSI Multiple Connections per Session (MC/S) is currently not supported in Solaris but might be available in a future release.

iSCSI is built on the Solaris IP stack, and multipathing can be provided at several points in that stack:

• IP multipathing (IPMP) over TCP/IP
• Above IPMP, iSCSI provides native multipathing using MC/S
• At a higher level, independent of the transport layer, Solaris provides multipathing software (MPxIO). Because MPxIO is independent of the transport, it can multipath a target that is visible on both iSCSI and FC ports.

Because of their location in the network protocol stack, each multipath solution is useful for different purposes.

IP Multipath

IP Multipath (IPMP) is a native Solaris system facility for network multipathing. Operating at the IP layer in the networking stack, IPMP provides for fail-over and aggregation over two or more NICs. For more information about IPMP, see the Solaris 10 System Administration Guide: IP Services, at http://docs.sun.com/app/docs/doc/816-4554.

To implement IPMP, a system administrator selects NICs that are on the same subnet and places them in logical IPMP groups. A daemon (part of the IPMP system) monitors the health of the ports and can be configured to monitor connections to specific iSCSI targets. In the event of a port failure, the IP address of the failed NIC is taken over by another port in the same group, and the iSCSI connection continues uninterrupted.
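As a minimal sketch of such a grouping (the interface names ce0 and ce1, the group name ipmp0, and the addresses are illustrative assumptions; see the IP Services guide referenced above for the authoritative syntax), the NICs can be placed in an IPMP group at boot time through /etc/hostname.* files:

/etc/hostname.ce0 (data address plus a test address used by the in.mpathd probe daemon):
192.168.10.10 netmask + broadcast + group ipmp0 up \
addif 192.168.10.11 deprecated -failover netmask + broadcast + up

/etc/hostname.ce1 (standby interface with its own test address, in the same group):
192.168.10.12 netmask + broadcast + deprecated -failover group ipmp0 standby up

After a reboot (or the equivalent ifconfig commands), a failure of ce0 moves the data address to ce1 without disturbing established iSCSI sessions.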

The following figure shows a sample configuration for IPMP.


Figure 2. IP Multipathing (IPMP)

IPMP participates in dynamic reconfiguration (DR). On systems that support DR, administrators can replace NICs without disrupting networking traffic. When a NIC is replaced, it is added back to the IPMP group and used thereafter for I/O.

When used in combination with iSCSI, the major limitation of IPMP is that multiple target ports are not multipathed. IPMP provides redundancy across host ports, but it cannot fail over across multiple target ports.

iSCSI Native Multipathing

The iSCSI specification addresses the requirement for redundant physical connections. While FC SANs simply support multiple physical paths, for iSCSI the specification itself defines what is supported.

In iSCSI, a connection is a TCP/IP connection between two portals. A session is the association between an initiator and a target, either of which may have one or more portals. Multiple Connections per Session (MC/S) allows initiator portals to communicate with target portals in a coordinated manner. Both target portal and initiator portal redundancy are supported, as is link aggregation. The following figure shows one configuration that supports MC/S.


Figure 3. Multiple Connection/Session (MC/S)

MC/S also allows (but does not require) more sophisticated error handling than simply retrying a command. This error recovery allows commands from a failed connection to be recovered quickly by other good connections in the same session. The SCSI layer is not aware of the error.

In general, iSCSI vendors do not yet support MC/S. Therefore, MC/S is not supported in the Solaris 10 Update 1 release of the Solaris software iSCSI initiator, but it might be supported in a future release.

Sun Multipathing Software (MPxIO)

MPxIO is the Solaris component that supports multiple physical paths to storage, and it is the current Solaris mechanism for multipathing FC connections. Because MPxIO operates above the transport layer (at the SCSI protocol layer), it can support FC, InfiniBand (IB), and iSCSI in certain configurations. For more information about MPxIO, see http://www.sun.com/products-n-solutions/hardware/docs/Software/Storage_Software/Sun_StorEdge_Traffic_Manager/.

FC and iSCSI drivers register logical units (LUNs) with MPxIO. MPxIO matches paths to the same logical unit at the SCSI protocol layer by querying the unique SCSI per LUN identifier from each device. MPxIO collapses duplicate paths to one device so that the target driver and layers above know only of the one device.
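On the initiator side, whether the iSCSI driver registers its LUNs with MPxIO is controlled by the driver's mpxio-disable property. The following is only a sketch, assuming the Solaris 10 Update 1 behavior (on some releases this setting is already the default); verify against the documentation for your release before editing driver configuration files:

In /kernel/drv/iscsi.conf, allow iSCSI LUNs to be registered with MPxIO:
mpxio-disable="no";

After a reboot, multipathed LUNs show up once, on the scsi_vhci node:
# ls -l /dev/rdsk/c*t*d*s2 | grep scsi_vhci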

The iSCSI initiator driver determines which device(s) to register by examining the SCSI target port identifier of the target. The target port identifier consists of two parts:

• target node name

• target portal group tag (TPGT)

These two parts are concatenated, as shown in the following example target port identifier.

iqn.1921-02.com.sun.12432+[1]

where the target node name is iqn.1921-02.com.sun.12432 and the TPGT is 1.

The iSCSI initiator registers an instance with MPxIO for each LUN for every unique target port identifier.
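The node name and TPGT that make up the target port identifier can be inspected with iscsiadm. The following is an abbreviated sketch; the target name is a hypothetical example and the exact output format varies by release:

# iscsiadm list target
Target: iqn.2001-05.com.example:array01
        TPGT: 1
        ISID: 4000002a0000
        Connections: 1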

MPxIO and Multiple SCSI Target Port Identifiers

MPxIO might seem to be the ideal solution to the current lack of native iSCSI multipathing support in the Solaris initiator. However, in order for MPxIO to support an iSCSI target, the target must support configuring different SCSI target port identifiers for each portal. One method of doing this, as shown in the following figure, is to allocate portals into multiple target portal groups so that the TPGT makes the target port identifier unique.


Figure 4. MPxIO with Multiple Target Port Identifiers

Another method is to simply have different iSCSI target names per portal. To create unique names, array vendors can choose either of these approaches.

The target’s port configuration determines whether MC/S or MPxIO can be used for multipathing.

• If an iSCSI target supports MC/S, it presents all of its target portals in a single target portal group. With such a target, all target portals form one logical SCSI target port, and the Solaris iSCSI driver therefore registers only one instance of a LUN with MPxIO.

• If an iSCSI target supports MPxIO, it has multiple target portal groups. Different target portal groups force different sessions, so MC/S cannot be used for target port redundancy.

MPxIO with Dual SCSI/FC Bridges

MPxIO can also be used when there are dual iSCSI-to-FC bridges into a Fibre Channel SAN, as shown in the following figure.


As in the previous example, each LUN has a different target identifier because the iSCSI specification requires unique names for different devices. iSCSI presents both instances to MPxIO, and then MPxIO matches the unique SCSI per LUN identifier, finds that they are identical, and presents one target to the target driver.

MPxIO with Different Transports to the Same Device

Because MPxIO is above the transport layer, MPxIO can support different transports to the same device. In the example configuration shown in the following figure, one LUN appears to the host via FC and iSCSI paths. In this configuration, MPxIO will utilize both paths.


Figure 6. MPxIO with IP/FC Bridge

LUN 0 at the disk array appears to the host through both the IP NIC and the FC HBA. MPxIO consolidates the two paths into one and presents it to the target drivers. This is how bridges work today; arrays that natively support both FC and iSCSI connections can use the same mechanism.

Note that, in this configuration, MPxIO performs its default load balancing. For a symmetric-access device, this is generally round-robin load balancing, so that I/O requests alternate between active links regardless of the relative performance of those links. Because load balancing is round robin, MPxIO is most useful in configurations in which all links between initiator and target have equal bandwidth and latency.
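The default policy can be seen in the MPxIO driver configuration file. This is only a sketch of where the setting lives; check your release's scsi_vhci(7D) documentation before changing it:

In /kernel/drv/scsi_vhci.conf:
load-balance="round-robin";
(setting this to "none" disables alternation and keeps I/O on a single active path)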

References

Publications

• Sun Microsystems, Inc. "Configuring iSCSI Initiators," in System Administration Guide: Devices and File Systems, Solaris 10 Product Documentation.
  http://docs.sun.com/app/docs/doc/819-2723/6n50a1n01?a=view

• Sun Microsystems, Inc. Solaris Fibre Channel and Storage Multipathing Administration Guide, Solaris 10 Product Documentation.
  http://docs.sun.com/source/819-0139/index.html

• Sun Microsystems, Inc. System Administration Guide: IP Services, Solaris 10 Product Documentation.
  http://docs.sun.com/app/docs/doc/816-4554

• Mark Garner, Internet Protocol Network Multipathing (Updated), Sun BluePrints™ OnLine, November 2002.
  http://www.sun.com/blueprints/1102/806-7230.pdf

• Enterprise Network Design Patterns: High Availability, Sun BluePrints OnLine, December 2003.
  http://www.sun.com/blueprints/1203/817-4683.pdf

Linux: Red Hat Top Performance Monitor Tools

CPU Tools

1 – top

2 – vmstat

3 – mpstat -P ALL

4 – ps -ef

5 – sar -u

6 – procinfo

7 – iostat

8 – gnome-system-monitor

9 – KDE-monitor

10 – oprofile

Memory Tools

1 – top

2 – vmstat -s

3 – ipcs

4 – ps -o vsz,rss

5 – sar -r -B -W

6 – cat /proc/meminfo

7 – free

8 – gnome-system-monitor

9 – KDE-monitor

10 – oprofile

Process Tools

1 – top

2 – ps -o pmem

3 – gprof

4 – strace,ltrace

5 – sar

Disk Tools

1 – iostat -x

2 – vmstat -D

3 – sar -d

4 – nfsstat

5 – NEED MORE!
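As a quick example of putting the disk tools above to use, the following checks whether a device is saturated. The interval and count arguments are just one reasonable choice, and sda is a placeholder device name:

# iostat -x 5 3    (extended per-device statistics, three samples at 5-second intervals;
                    watch %util and await for the busy device, e.g. sda)
# sar -d 5 3       (the same view from the sar data collector)
# vmstat -D        (cumulative disk summary counters)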

Solaris: Useful commands at OK prompt


General

banner – shows system hardware information: model, architecture, processor, keyboard, OpenBoot version, serial number, Ethernet address, and host ID

Diagnostics

test floppy – test the floppy disk drive
test net – test network loopbacks
test scsi – test the SCSI interface
test-all – test all devices that have a self-test method
watch-clock – show ticks of the real-time clock
watch-net – monitor network broadcast packets
watch-net-all – monitor broadcast packets on all network interfaces
probe-scsi – show attached SCSI devices
probe-scsi-all – show attached SCSI devices on all host adapters, internal and external

Booting

boot – boot the kernel from the default device; the factory default is to boot from disk if present, otherwise from the network
boot net – boot the kernel from the network
boot cdrom – boot the kernel from CD-ROM
boot disk1:h – boot from disk1, partition h
boot tape – boot the default file from tape
boot disk myunix -as – boot myunix from disk with the flags "-as"

DEVALIAS

ok show-devs
ok cd /pci@1f,4000/scsi@3
ok .properties
ok ls
f00809d8 tape
f007ecdc disk
ok .speed
CPU Speed : 200.00MHz
UPA Speed : 100.00MHz
PCI Bus A : 66MHz
PCI Bus B : 33MHz

printenv – display all variables and their current values
setenv <variable> <value> – set variable to the given value
set-default <variable> – reset the value of variable to the factory default
set-defaults – reset all variable values to the factory defaults


Key Sequences

These commands are disabled if the PROM security is on. Also, if your system has full security enabled, you cannot apply any of the suggested commands unless you have the password to get to the ok prompt.

Stop – Bypass POST. This command does not depend on security-mode. (Note: some systems bypass POST by default; in such cases, use Stop-D to start POST.)

Stop-A – Abort.

Stop-D – Enter diagnostic mode (set diag-switch? to true).

Stop-F – Enter Forth on TTYA instead of probing. Use fexit to continue with the initialization sequence. Useful if hardware is broken.

Stop-N – Reset NVRAM contents to the default values.

Start an OpenBoot Diagnostics

<Stop-A>
OK setenv diag-switch? true
OK setenv auto-boot? false
OK reset-all
OK test-all (or obdiag)

Configure Graphics Console (e.g. Sun XVR-100 Graphics Accelerator) instead of serial TTYA

OK show-displays
Select the graphics accelerator, e.g. b

OK nvalias mydev <CTRL-Y>
OK setenv output-device mydev
OK setenv use-nvramrc? true
OK reset-all

Solaris: Set Up a Solaris Mail Relay with sendmail

Set up solaris mail relay: sendmail 8.12.10+

    1. # cd /usr/lib/mail/cf
    2. # cp subsidiary.mc myhost.mc
    3. # vi myhost.mc
       a. change DOMAIN(`solaris-generic')dnl to DOMAIN(`solaris-antispam')dnl
       b. remove any reference to DAEMON_OPTIONS
       c. add FEATURE(`access_db')dnl before the MAILER lines
    4. # /usr/ccs/bin/make myhost.cf
    5. # /etc/init.d/sendmail stop
    6. # cp /etc/mail/sendmail.cf /etc/mail/sendmail.cf.save
    7. # cp myhost.cf /etc/mail/sendmail.cf
    8. # cd /etc/mail
    9. # vi access
       a. add the IP address of each host that is allowed to relay, followed by the keyword RELAY, for example:
          3.177.70.71 RELAY
    10. # makemap hash /etc/mail/access < /etc/mail/access
        (this creates the access.db file)
    11. # /etc/init.d/sendmail start
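To confirm that the relay now accepts mail from a permitted client, a manual SMTP session can be run from that client. This is only a sketch; myhost is the relay configured above, and the client and recipient addresses are hypothetical placeholders:

$ telnet myhost 25
HELO client.example.com
MAIL FROM:<test@client.example.com>
RCPT TO:<someone@example.com>
DATA
Subject: relay test

test message
.
QUIT

If the access database is working, the RCPT TO command is answered with a 250 response; a client that is not listed in the access file is refused with a "Relaying denied" error.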

Set up a mail client:

Solaris sendmail 8.12.10+

  1. # cd /usr/lib/mail/cf
  2. # cp subsidiary.mc myhost.mc
  3. # vi myhost.mc
     a. change DOMAIN(`solaris-generic')dnl to DOMAIN(`solaris-antispam')dnl
     b. remove any reference to DAEMON_OPTIONS
  4. # /usr/ccs/bin/make myhost.cf
  5. # /etc/init.d/sendmail stop
  6. # cp /etc/mail/sendmail.cf /etc/mail/sendmail.cf.save
  7. # cp myhost.cf /etc/mail/sendmail.cf
  8. # cd /etc/mail
  9. # rm access access.db
  10. # /etc/init.d/sendmail start

Linux sendmail 8.12.11

  1. ensure the sendmail-cf package is installed:
     # rpm -i sendmail-cf-8.12.11-4.RHEL3.1.i386.rpm
  2. # cp /etc/mail/sendmail.mc /etc/mail/sendmail.mc.bak
  3. # vi /etc/mail/sendmail.mc
  4. be sure the following lines are configured:
     define(`SMART_HOST', `mailhost.erc.ge.com')dnl
     define(`LOCAL_RELAY', `mailhost')dnl
     LOCAL_DOMAIN(`erc.ge.com')dnl
     MASQUERADE_AS(`ercgroup.com')dnl
     FEATURE(masquerade_envelope)dnl
  5. be sure mailhost is defined in /etc/hosts or is resolvable
  6. # service sendmail stop
  7. # service sendmail start
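A quick way to check that the client actually hands mail to the smart host (the recipient address is a hypothetical placeholder, and the log path assumes the default Red Hat syslog configuration):

# echo "relay test" | mail -s "relay test" someone@example.com
# tail /var/log/maillog

The log entry for the message should show relay=mailhost... rather than an attempt to deliver directly to the recipient's domain.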

Solaris 2.6 sendmail 8.6

  1. be sure mailhost is defined in /etc/hosts or is resolvable
  2. copy /etc/mail/subsidiary.cf /etc/mail/sendmail.cf
  3. stop sendmail
  4. start sendmail

Veritas: DISK OPERATIONS

Initialize a disk:
  vxdisksetup -i device                  (CDS disk)
  vxdisksetup -i device format=sliced    (sliced disk)

List disks owned by local and remote hosts:
  vxdisk -o alldgs list

List a disk header:
  vxdisk list diskname|device

Evacuate a disk:
  vxevac -g diskgroup from_disk to_disk

Rename a disk:
  vxedit -g diskgroup rename oldname newname

Set spare, no hot relocation, or reserved space on a disk:
  vxedit -g diskgroup set {spare|nohotuse|reserve}=on|off diskname

Unrelocate a disk:
  vxunreloc -g diskgroup original_diskname
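As a quick illustration of the commands above, the following sketch initializes a new disk and adds it to an existing disk group. The device name c1t2d0, the disk group datadg, and the disk names are hypothetical:

# vxdisksetup -i c1t2d0                    (initialize the new disk as a CDS disk)
# vxdg -g datadg adddisk datadg02=c1t2d0   (add it to datadg under the name datadg02)
# vxdisk -o alldgs list                    (confirm the disk now shows up in datadg)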

DiskSuite: Create the metadb replica databases

Step 3:

Because you must have at least two metadb replicas available for the
system to work, I suggest that your chosen slice hold two metadb
replicas on each drive. That way you have four metadbs in total,
and you can lose either drive and keep running.

su to root and set up your path for sanity …

PATH=/usr/sbin:/usr/bin:/usr/opt/SUNWmd/sbin;export PATH

and then create the metadbs in /dev/dsk/c0t1d0s4 and /dev/dsk/c0t3d0s4
and make sure your boot drive has the first metadb replica db …

# metadb -a -c 2 -f /dev/dsk/c0t3d0s4 /dev/dsk/c0t1d0s4

and that creates the four needed databases (note the "-c 2" option): two databases per slice.

Check your /etc/system file and you will see some new stuff at the
bottom :

* Begin MDD database info (do not edit)
set md:mddb_bootlist1="sd:28:16 sd:28:1050 sd:12:16 sd:12:1050"
* End MDD database info (do not edit)

At this point, if you want, you can reboot. You will see a stack of
WARNING messages, not errors, WARNINGS only (hand waving again), about
failures due to raid this and trans that not loading. Ignore
this. Everyone has fits when they see it, thinking it means some
catastrophe has happened to their system. Not yet. :) If you run
metadb -i you should see:

# metadb -i
flags first blk block count
a m p luo 16 1034 /dev/dsk/c0t3d0s4
a p luo 1050 1034 /dev/dsk/c0t3d0s4
a p luo 16 1034 /dev/dsk/c0t1d0s4
a p luo 1050 1034 /dev/dsk/c0t1d0s4
o – replica active prior to last mddb configuration change
u – replica is up to date
l – locator for this replica was read successfully
c – replica's location was in /etc/opt/SUNWmd/mddb.cf
p – replica's location was patched in kernel
m – replica is master, this is replica selected as input
W – replica has device write errors
a – replica is active, commits are occurring to this replica
M – replica had problem with master blocks
D – replica had problem with data blocks
F – replica had format problems
S – replica is too small to hold current data base
R – replica had device read errors

Veritas: Mirroring The Root Disk

Create the mirrors
1. Give Veritas control of the disk that will hold the mirror
/usr/lib/vxvm/bin/vxdisksetup -i solaris-disk-name

2. Add the new disk to the rootdg disk group and give it a Veritas name
vxdg -g rootdg adddisk rootmirror=solaris-disk-name
3. Mirror the root partition
/etc/vx/bin/vxrootmir rootmirror

4. Mirror the swap space
vxassist -g rootdg mirror swapvol rootmirror
5. Mirror the var volume
vxassist -g rootdg mirror var rootmirror
6. Mirror the opt volume
vxassist -g rootdg mirror opt rootmirror
Ensure that each volume on the rootdisk is tied to a hard partition
1. Identify which hard partition on the disk you wish to tie volume to by checking out /etc/vfstab. See what lines volume manager commented out.

2. Identify which subdisk each volume is on that you want to tie back to the hard partition

3. Run the command to link the subdisk to a hard partition
a. for swap
vxmksdpart -g rootdg subdisk hard-partition 0x03 0x01
b. for var
vxmksdpart -g rootdg subdisk hard-partition 0x07 0x00
c. for opt
vxmksdpart -g rootdg subdisk hard-partition 0x00 0x00
d. for /
vxmksdpart -g rootdg subdisk hard-partition 0x02 0x00
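To find the subdisk names needed in step 2 above, vxprint can list the subdisks in rootdg. This is only a sketch and assumes the disk media name rootmirror used earlier:

# vxprint -g rootdg -st | grep rootmirror    (list the subdisks that live on the mirror disk)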

Veritas: Remove A Disk Group

1. remove any volumes from the disks
a. vxassist -g disk-group remove volume volume-name
b. repeat the above procedure for each volume on the disk

2. remove disks from the disk group
a. vxdg -g disk-group rmdisk vxvm-disk-name
b. repeat the above procedure until there is one disk remaining in the disk group; you cannot remove the last disk from a disk group. At this point you will just need to remove the disk group itself.

3. remove the disk group
a. vxdg destroy disk-group
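Putting the steps together, a complete removal might look like the following sketch. The disk group datadg, the volume vol01, and the disk name datadg02 are hypothetical:

# vxassist -g datadg remove volume vol01   (repeat for each volume in the group)
# vxdg -g datadg rmdisk datadg02           (repeat for all but the last disk)
# vxdg destroy datadg                      (destroys the group and releases the last disk)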