How to Mount NFS From Linux To AIX

Linux :

  1. SSH to the Linux box
  2. Add an entry in the /etc/hosts to add the AIX server IP and hostname, if it is not already there
  3. Specify the file system to be exported in the /etc/exports file (export options go in parentheses)
    /data X.X.X.X(rw,sync)
  4. Start the nfs service, if it has not already been started
    [root@linux ~]# service nfs status
    rpc.mountd (pid 3170) is running...
    nfsd (pid 3167 3166 3165 3164 3163 3162 3161 3160) is running...
    rpc.rquotad (pid 3145) is running...
    [root@linux ~]# service nfs restart
    Shutting down NFS mountd: [ OK ]
    Shutting down NFS daemon: [ OK ]
    Shutting down NFS quotas: [ OK ]
    Shutting down NFS services: [ OK ]
    Starting NFS services: [ OK ]
    Starting NFS quotas: [ OK ]
    Starting NFS daemon: [ OK ]
    Starting NFS mountd: [ OK ]
  5. Run the exportfs command to export the file system
    # exportfs -a
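
To double-check what the server is exporting, exportfs -v lists the active exports together with their effective options (the exact options shown vary by distribution):

    # exportfs -v
    /data           X.X.X.X(rw,sync,wdelay,root_squash)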

AIX :

  1. Add an entry in the /etc/hosts for the Linux box, if it is not already there.
  2. Run the showmount command to check if AIX can see the exported file system
    # showmount -e X.X.X.X 
  3. Create a directory to mount the NFS
    # mkdir /data 
  4. Run the mount command to mount the exported NFS
    # mount X.X.X.X:/data /data 
  5. If the mount fails with the error message "vmount: operation not permitted", run the command below. By default the AIX NFS client uses non-reserved source ports (above 1023), which the Linux NFS server rejects unless the export is marked insecure; the nfso tunable below makes AIX use reserved ports instead.
    #  nfso -p -o nfs_use_reserved_ports=1
  6. Try the mount command again.
  7. If successful, try to access the directory.
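
To make the mount permanent across reboots, an NFS stanza can be added to /etc/filesystems on the AIX side. A minimal sketch, assuming the Linux server is X.X.X.X and typical NFS client options:

    /data:
            dev             = /data
            vfs             = nfs
            nodename        = X.X.X.X
            mount           = true
            options         = bg,hard,intr
            account         = false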

AIX Booting Process

Loading the boot image of AIX
  • After POST, the firmware (System Read Only Storage) detects the first bootable device stored in the bootlist (here it is hdisk0).
  • Then the bootstrap code (software ROS), i.e. the first 512 bytes of the hard disk, is loaded into RAM.
  • The bootstrap code locates the Boot Logical Volume (BLV = hd5) on the hard disk.
  • The BLV contains the AIX kernel, the rc.boot script, a reduced ODM and the boot commands.
  • The BLV is then uncompressed in RAM and the kernel is extracted from it.
  • The AIX kernel gets control.
  • The AIX kernel creates a RAM file system (rootvg is not activated yet).
  • The kernel starts the init process from the BLV.
  • init executes the rc.boot script from the BLV in RAM.
  • init with rc.boot 1 configures the base devices.

rc.boot 1 in detail

  • The init process from the RAMFS executes rc.boot 1 (on any error, LED=c06).
  • The restbase command copies the ODM from the BLV to the RAMFS (success LED=510, error LED=548).
  • cfgmgr -f calls Config_Rules (those with phase=1) and activates all base devices.
  • The bootinfo -b command is run to check the last boot device (success LED=511).

Then

  • rc.boot 2 activates rootvg from the hard disk.

rc.boot 2 in detail

  • rc.boot 2 (LED 551)
  • ipl_varyon activates rootvg (success LED=517, error LED=552, 554, 556).
  • fsck -f /dev/hd4 checks whether "/" was unmounted cleanly at the last shutdown (error LED=555).
  • /dev/hd4 ("/") is mounted into the RAMFS (error LED=557, e.g. due to a corrupted jfslog).
  • fsck -f /dev/hd2, i.e. check "/usr" (error LED=518).
  • /dev/hd2 is mounted in the RAMFS.
  • fsck -f /dev/hd9var, i.e. check "/var".
  • mount /var
  • The copycore command checks whether a dump occurred; if so, the dump is copied from the primary dump device, the paging space /dev/hd6, to /var/adm/ras/.
  • unmount /var
  • swapon /dev/hd6, i.e. activate the primary paging space.

At this point /dev/hd4 is mounted on / in the RAMFS, and cfgmgr -f has configured all base devices, so the configuration data has been written to the ODM of the RAMFS.

  • mergedev is called and copies /dev from the RAMFS to disk.
  • The customized ODM is copied from the RAMFS to the hard disk (at this stage the ODMs from hd5 and hd4 are in sync).
  • mount /var.
  • Boot messages are copied to a file on the hard disk (/var/adm/ras/bootlog); use alog -t boot -o to view the boot log.

Now /, /usr and /var are mounted in rootvg on the hard disk. Then

  • The kernel removes the RAMFS.
  • The init process starts from / in rootvg.

This completes rc.boot 2. At this point the kernel has removed the RAMFS and is accessing the rootvg file systems from the hard disk; the init from the BLV has been replaced by the init from the hard disk.

  • In rc.boot 3, init processes the /etc/inittab file and the remaining devices are configured.

rc.boot 3 in detail

  • /etc/init starts and reads /etc/inittab (LED=553).
  • It runs /sbin/rc.boot 3.
  • fsck -f /dev/hd3, i.e. check /tmp.
  • mount /tmp
  • syncvg rootvg &, i.e. run syncvg in the background and report stale PPs.
  • cfgmgr -p2, i.e. run cfgmgr phase 2 in a normal startup (cfgmgr -p3 in service mode).
  • All remaining devices are configured now.
  • cfgcon configures the console (LED=c31 select console, c32 LFT, c33 TTY, c34 file on disk). If CDE is mentioned in /etc/inittab, we get a graphical console.
  • savebase is called to sync the ODM from the BLV with the / file system (i.e. /etc/objrepos).
  • The syncd daemon is started; it flushes data from memory to disk every 60 seconds.
  • errdemon is started for error logging.
  • The LED display is turned off.
  • rm /etc/nologin; if this file were not removed, login would not be possible.
  • If any devices are in the missing state (chgstatus=3 in CuDv), they are displayed.
  • The message "system initialization completed" is displayed.

Then execute next line from /etc/inittab


DETAILED
=======

Boot process of AIX in detail

I. The boot process in AIX

As a system administrator you should have a general understanding of the boot process. This knowledge is useful for solving problems that can prevent a system from booting properly; these problems can be either software or hardware. We also recommend that you be familiar with the hardware configuration of your system.

Booting involves the following steps:

The initial step in booting a system is named Power On Self Test (POST). Its purpose is to verify that the basic hardware is in a functional state. The memory, keyboard, communication, and audio devices are also initialized. You can see an image for each of these devices displayed on the screen. It is during this step that you can press a function key to choose a different boot list. The LED values displayed during this phase are model specific. Both hardware and software problems can prevent the system from booting.

System Read Only Storage (ROS) is specific to each system type. It is necessary for AIX 5L Version 5.3 to boot, but it does not build the data structures required for booting. It will locate and load bootstrap code. System ROS contains generic boot information and is operating system independent. Software ROS (also named bootstrap) forms an IPL control block which is compatible with AIX 5L Version 5.3, takes control, and builds AIX 5L specific boot information. A special file system located in memory and named the RAMFS file system is created. Software ROS then locates, loads, and turns control over to the AIX 5L Boot Logical Volume (BLV). Software ROS is AIX 5L information created based on machine type and is responsible for completing machine preparation to enable it to start the AIX 5L kernel. A complete list of files that are part of the BLV can be obtained from the directory /usr/lib/boot.

The most important components are the following:

– The AIX 5L kernel
– Boot commands called during the boot process such as bootinfo, cfgmgr
– A reduced version of the ODM. Many devices need to be configured before hd4 (/) is made available, so their corresponding methods have to be stored in the BLV. These devices are marked as base in PdDv.
– The rc.boot script

Note: Old systems based on the MCA architecture execute an additional step before this, the so-called Built In Self Test (BIST). This step is no longer required for systems based on the PCI architecture.

The AIX 5L kernel is loaded and takes control. The system will display 0299 on the LED panel. All previous codes are hardware-related. The kernel will complete the boot process by configuring devices and starting the init process. LED codes displayed during this stage will be generic AIX 5L codes. So far, the system has tested the hardware, found a BLV, created the RAMFS, and started the init process from the BLV. The rootvg has not yet been activated. From now on, the rc.boot script will be called three times, each time being passed a different parameter.

1. Boot phase 1

During this phase, the following steps are taken:

The init process started from the RAMFS executes the boot script rc.boot with parameter 1. If the init process fails for some reason, code c06 is shown on the LED display.

At this stage, the restbase command is called to copy a partial image of the ODM from the BLV into the RAMFS. If this operation is successful, the LED display shows 510; otherwise, LED code 548 is shown.

After this, the cfgmgr -f command reads the Config_Rules class from the reduced ODM. In this class, devices with the attribute phase=1 are considered base devices. Base devices are all devices that are necessary to access rootvg.
For example, if the rootvg is located on a hard disk, all devices from the motherboard up to the disk will have to be initialized. The corresponding methods are called so that rootvg can be activated in the next boot phase. At the end of boot phase 1, the bootinfo -b command is called to determine the last boot device. At this stage, the LED shows 511.

2. Boot phase 2

In this phase, the rc.boot script is passed the parameter 2. During this phase, the following steps are taken:

a) The rootvg volume group is varied on with ipl_varyon, a special version of the varyonvg command. If this command is successful, the system displays 517; otherwise, one of the following LED codes will appear: 552, 554 or 556, and the boot process is halted.

b) The root file system hd4 is checked using the fsck -f command. This verifies only whether the file system was unmounted cleanly before the last shutdown. If this command fails, the system will display code 555.

c) The root file system (/dev/hd4) is mounted on a temporary mount point (/mnt) in the RAMFS. If this fails, 557 will appear on the LED display.

d) The /usr file system is verified using fsck -f and then mounted. The same is done for /var; then the copycore command checks if a dump occurred. If it did, it is copied from the default dump device, /dev/hd6, to the default copy directory, /var/adm/ras. After this, /var is unmounted.

e) The primary paging space from rootvg, /dev/hd6, is activated.

f) The mergedev process is called and /dev files from RAMFS are copied to disk.

g) All customized ODM files from the RAMFS are copied to disk. Both ODM versions, from hd4 and hd5, are synchronized.

h) Finally, the root file system from rootvg (on disk) is mounted over the root file system from the RAMFS. The mount points for the rootvg file systems become available. Now the /var and /usr file systems from rootvg are mounted again on their ordinary mount points. There is no console available at this stage, so all boot messages are copied to alog. The alog command maintains and manages logs.

3. Boot phase 3

After phase 2 is completed, rootvg is activated and the following steps are taken:

a. The /etc/init process is started. It reads the /etc/inittab file and calls rc.boot with argument 3.

b. The /tmp filesystem is mounted.

c. The rootvg is synchronized by calling the syncvg command and launching it as a background process. As a result, all stale partitions from rootvg are updated. At this stage, LED code 553 is shown.

d. At this stage, the cfgmgr command is called. If the system is booted in normal mode, cfgmgr is called with option -p2; in service mode, with option -p3. The cfgmgr command reads the Config_Rules class from the ODM and calls all methods corresponding to either phase 2 or phase 3. All other devices that are not base devices are configured at this time.

e. Next, the console is configured by calling the cfgcon command. After the configuration of the console, boot messages are sent to the console if no stdout redirection is made. However, all missed messages can be found in /var/adm/ras/conslog. LED codes that can be displayed at this time are:

c31 = console not yet configured.
c32 = console is an LFT terminal.
c33 = console is a TTY.
c34 = console is a file on the disk.

f. Finally, the synchronization of the ODM in the BLV with the ODM from the / (root) file system is done by the savebase command.

g. The syncd and errdemon daemons are started.

h. LED display is turned off.

i. If /etc/nologin exists, it is removed.

j. If there are devices marked as missing in CuDv a message is displayed on the console.

k. The message "system initialization completed" is sent to the console. The execution of rc.boot has completed. The init process will continue processing the next entry from /etc/inittab.

II. system initialization

During system startup, after the root file system has been mounted in the pre-initialization process, the following sequence of events occurs:

1. The init command is run as the last step of the startup process.
2. The init command attempts to read the /etc/inittab file.
3. If the /etc/inittab file exists, the init command attempts to locate an initdefault entry in the /etc/inittab file.

a. If the initdefault entry exists, the init command uses the specified runlevel as the initial system run level.
b. If the initdefault entry does not exist, the init command requests that the user enter a run level from the system console (/dev/console).
c. If the user enters an S, s, M, or m run level, the init command enters the maintenance run level. These are the only runlevels that do not require a properly formatted /etc/inittab file.

4. If the /etc/inittab file does not exist, the init command places the system in the maintenance run level by default.
5. The init command rereads the /etc/inittab file every 60 seconds. If the /etc/inittab file has changed since the last time the init command read it, the new commands in the /etc/inittab file are executed.
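
You do not need to wait for that 60-second cycle: telinit q asks init to re-examine /etc/inittab immediately.

# telinit q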

III. The /etc/inittab file

The /etc/inittab file controls the initialization process.

The /etc/inittab file supplies the script to the init command’s role as a general process dispatcher. The process that constitutes the majority of the init command’s process dispatching activities is the /etc/getty line process, which initiates individual terminal lines. Other processes typically dispatched by the init command are daemons and the shell.

The /etc/inittab file is composed of entries that are position-dependent and have the following format,

/etc/inittab format = Identifier:RunLevel:Action:Command

Each entry is delimited by a newline character. A backslash (\) preceding a newline character indicates the continuation of an entry. There are no limits (other than maximum entry size) on the number of entries in the /etc/inittab file.

The maximum entry size is 1024 characters.
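
As a concrete example, the console getty entry shipped on AIX looks roughly like this (path and run levels may differ slightly by release):

cons:0123456789:respawn:/usr/sbin/getty /dev/console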

The entry fields are:

Identifier
A one to fourteen character field that uniquely identifies an object.

RunLevel
The run level at which this entry can be processed. The run level has the following attributes:

-Run levels effectively correspond to a configuration of processes in the system.

-Each process started by the init command is assigned one or more run levels in which it can exist.

-Run levels are represented by the numbers 0 through 9.

E.g.: if the system is in run level 1, only those entries with a 1 in the run-level field are started.

-When you request the init command to change run levels, all processes without a matching entry in the run-level field for the target run level receive a warning signal (SIGTERM). There is a 20-second grace period before processes are forcibly terminated by the kill signal (SIGKILL).

-The run-level field can define multiple run levels for a process by selecting more than one run level in any combination from 0 through 9. If no run level is specified, the process is assumed to be valid at all run levels.

-There are four other values that appear in the run-level field, even though they are not true run levels: a, b, c and h. Entries that have these characters in the run level field are processed only when the telinit command requests them to be run (regardless of the current run level of the system). They differ from run levels in that the init command can never enter run level a, b, c or h. Also, a request for the execution of any of these processes does not change the current run level. Furthermore, a process started by an a, b, or c command is not killed when the init command changes levels. They are only killed if their line in the /etc/inittab file is marked off in the action field, their line is deleted entirely from /etc/inittab, or the init command goes into single-user mode.

Action
It tells the init command how to treat the process specified in the process field. The following actions are recognized by the init command:

respawn If the process does not exist, start the process. Do not wait for its termination (continue scanning the /etc/inittab file). Restart the process when it dies. If the process exists, do nothing and continue scanning the /etc/inittab file.

wait When the init command enters the run level that matches the entry’s run level, start the process and wait for its termination. All subsequent reads of the /etc/inittab file, while the init command is in the same run level, will cause the init command to ignore this entry.

once When the init command enters a run level that matches the entry’s run level, start the process, and do not wait for termination. When it dies, do not restart the process. When the system enters a new run level, and the process is still running from a previous run level change, the program will not be restarted.

boot Process the entry only during system boot, which is when the init command reads the /etc/inittab file during system startup. Start the process, do not wait for its termination, and when it dies, do not restart the process. In order for the instruction to be meaningful, the run level should be the default or it must match the init command’s run level at boot time. This action is useful for an initialization function following a hardware reboot of the system.

bootwait Process the entry the first time that the init command goes from single-user to multi-user state after the system is booted. Start the process, wait for its termination, and when it dies, do not restart the process. If the initdefault is 2, run the process right after boot.

powerfail Execute the process associated with this entry only when the init command receives a power fail signal ( SIGPWR).

powerwait Execute the process associated with this entry only when the init command receives a power fail signal (SIGPWR), and wait until it terminates before continuing to process the /etc/inittab file.

off If the process associated with this entry is currently running, send the warning signal (SIGTERM), and wait 20 seconds before terminating the process with the kill signal (SIGKILL). If the process is not running, ignore this entry.

ondemand Functionally identical to respawn, except this action applies to the a, b, or c values, not to run levels.

initdefault An entry with this action is only scanned when the init command is initially invoked. The init command uses this entry, if it exists, to determine which run level to enter initially. It does this by taking the highest run level specified in the run-level field and using that as its initial state. If the run-level field is empty, this is interpreted as 0123456789; therefore, the init command enters run level 9. Additionally, if the init command does not find an initdefault entry in the /etc/inittab file, it requests an initial run level from the user at boot time.

sysinit Entries of this type are executed before the init command tries to access the console before login. It is expected that this entry will only be used to initialize devices on which the init command might try to ask the run level question. These entries are executed and waited for before continuing.

Command
A shell command to execute. The entire command field is prefixed with exec and passed to a forked sh as sh -c exec command. Any legal sh command syntax can appear in this field. Comments can be inserted with the # comment syntax.

The getty command writes over the output of any commands that appear before it in the /etc/inittab file. To record the output of these commands to the boot log, pipe their output to the alog -t boot command. The stdin, stdout, and stderr file descriptors may not be available while the init command is processing inittab entries. Any entries writing to stdout or stderr may not work predictably unless they redirect their output to a file or to /dev/console.
The following commands are the only supported methods for modifying the records in the /etc/inittab file.

lsitab Lists records in the /etc/inittab file.
mkitab Adds records to the /etc/inittab file.
chitab Changes records in the /etc/inittab file.
rmitab Removes records from the /etc/inittab file.

Eg:

If you want to add a record to the /etc/inittab file that runs the find command at run level 2 and restarts it whenever it finishes:

1. Run the ps command and display only those processes that contain the word find:
# ps -ef | grep find
root 19750 13964 0 10:47:23 pts/0 0:00 grep find
#
2. Add a record named xcmd on the /etc/inittab using the mkitab command:
# mkitab "xcmd:2:respawn:find / -type f > /dev/null 2>&1"
3. Show the new record with the lsitab command:
# lsitab xcmd
xcmd:2:respawn:find / -type f > /dev/null 2>&1
#
4. Display the processes:
# ps -ef | grep find
root 25462 1 6 10:56:58 – 0:00 find / -type f
root 28002 13964 0 10:57:00 pts/0 0:00 grep find
#
5. Cancel the find command process:
# kill 25462
6. Display the processes:
# ps -ef | grep find
root 23538 13964 0 10:58:24 pts/0 0:00 grep find
root 28966 1 4 10:58:21 – 0:00 find / -type f
#

Since the action field is configured as respawn, a new process (28966 in this example) is started each time its predecessor finishes. The process will keep re-spawning unless you change the action field:

Eg:

1. Change the action field on the record xcmd from respawn to once:
# chitab "xcmd:2:once:find / -type f > /dev/null 2>&1"
2. Display the processes:
# ps -ef | grep find
root 20378 13964 0 11:07:20 pts/0 0:00 grep find
root 28970 1 4 11:05:46 – 0:03 find / -type f
3. Cancel the find command process:
# kill 28970
4. Display the processes:
# ps -ef | grep find
root 28972 13964 0 11:07:33 pts/0 0:00 grep find
#

To delete this record from the /etc/inittab file, use the rmitab command.

Eg:

# rmitab xcmd
# lsitab xcmd
#

Order of the /etc/inittab entries

The base process entries in the /etc/inittab file are ordered as follows:

1. initdefault
2. sysinit
3. Powerfailure Detection (powerfail)
4. Multiuser check (rc)
5. /etc/firstboot (fbcheck)
6. System Resource Controller (srcmstr)
7. Start TCP/IP daemons (rctcpip)
8. Start NFS daemons (rcnfs)
9. cron
10. pb cleanup (piobe)
11. getty for the console (cons)

The System Resource Controller (SRC) has to be started near the beginning of the /etc/inittab file, since the SRC daemon is needed to start other processes. Since NFS requires TCP/IP daemons to run correctly, TCP/IP daemons are started ahead of the NFS daemons. The entries in the /etc/inittab file are ordered according to dependencies, meaning that if a process (process2) requires another process (process1) to be present for it to operate normally, then the entry for process1 comes before the entry for process2 in the /etc/inittab file.

IOPS calculation for your FAST Pool

I will provide a worked example of calculating the required spindles in combination with a known skew. Capacity will not be addressed in this post; I will base the sizing purely on IOPS / throughput and apply it to a mixed FAST VP pool.

We all know about the write penalty, which is the following:

  • RAID10: 2
  • RAID5: 4
  • RAID6: 6

What if we have an environment with a skew of 80% and a required 50000 IOPS? Besides this, we know that there are 80% reads and only 20% writes. Remember that flash is a good reader.

Now that we know there is a skew of 80%, we can calculate the amount of flash we need inside the pool:

0.80 * 50000 = 40000 IOPS that we need inside the highest tier of our FAST VP pool. For the remaining 10000 IOPS, we keep the rule of thumb of putting 80% on SAS and 20% on NLSAS:

0.2 * 10000 = 2000 IOPS for NLSAS

0.8 * 10000 = 8000 IOPS for SAS

Now, without the write penalty applied, we need to get the following in our pool:

  • Flash: 40000 IOPS
  • SAS: 8000 IOPS
  • NLSAS: 2000 IOPS

Write Penalty

But what about the backend load? By backend load, I mean the load with the write penalty included, which is what we need for calculating the exact number of spindles. Remember that we have about 20% writes in this environment:

(0.8 * 40000) + (2 * 0.2 * 40000) = 32000 + 16000 = 48000 IOPS for FASTCache which is in RAID10

or..

(0.8 * 40000) + (4 * 0.2 * 40000) = 32000 + 32000 = 64000 IOPS for Flash in our pool on RAID5

(0.8 * 8000) + (4 * 0.2 * 8000) = 6400 + 6400 = 12800 IOPS for SAS in RAID5

(0.8 * 2000) + (6 * 0.2 * 2000) = 1600 + 2400 =  4000 IOPS for NLSAS in RAID6
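
All four lines above follow the same pattern:

Backend IOPS = (Read% * Frontend IOPS) + (Write Penalty * Write% * Frontend IOPS)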

How many drives do I need per tier?

We keep the following rules of thumb in mind for the IOPS capacity per drive:

  • Flash: 3500 IOPS
  • SAS 15k: 180 IOPS
  • NLSAS: 90 IOPS

To make sure you are ready for bursts, you could apply "Little's law", which here means using only about 70% of these rule-of-thumb numbers so you always have an extra buffer; but this is up to you, as we will also round up the number of disks to fit the RAID layouts.

64000 / 3500 = 19 disks, which we would round up to 20 when we want flash to be in a RAID5 configuration

12800 / 180 = 72 disks, which we would round up to 75 to keep RAID5 best practices again

4000 / 90 = 45 disks, which we would round up to 48 if we want to keep 6+2 RAID6 sets for example

Keep in mind that this calculation does not include any capacity sizing in TB or GB, only IOPS!

IOPS calculation

What is IOPS?
IOPS (Input/Output Operations Per Second) is a common performance metric used for comparing and measuring the performance of storage systems such as HDDs, SSDs, and SANs.

Quick Calculation sheet

RPM     IOPS
15K     175
10K     125
7.2K    75
5.4K    50
How to Calculate IOPS requirement?

We will consider the 600 GB Seagate Cheetah 15K RPM HDD.

http://www.seagate.com/files/docs/pdf/datasheet/disc/cheetah-15k.7-ds1677.3-1007us.pdf

Read/write seek time: 3.4 / 3.9 ms (taken as 3.65 on average)
Average latency: 2.0 ms
IOPS = 1000 / (average latency in ms + average seek time in ms)
     = 1000 / (2 + 3.65)
     = 176.99 IOPS

RAID level and write penalty

RAID Level          Write Penalty
RAID 0              1 (writes incur no extra I/O)
RAID 1 / RAID 10    2
RAID 5              4
RAID 6              6

Total IOPS = Disk Speed IOPS * Number of disks
Actual IOPS = ((Total IOPS * Write%) / RAID Penalty) + (Total IOPS * Read%)

Suppose we have 8 of the Seagate Cheetah 15K hard drives.
Total IOPS = 8 * 176.99
           = 1415.92 IOPS (for RAID 0)
           = ~1400 IOPS

Considering RAID Overheads

Work load details:
Write load = 30%
Read load = 70%
RAID level = 10
Actual IOPS = ((1400 * 0.30) / 2) + (1400 * 0.70)
            = 210 + 980
            = 1190 IOPS
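
The same arithmetic is easy to script for quick what-if checks. A minimal sketch using bc, with the numbers from the example above:

# total=1400; read=0.70; write=0.30; penalty=2
# echo "scale=2; ($total * $write) / $penalty + ($total * $read)" | bc
1190.00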

Calculating the number of disks required (reverse calculation)

Requirement: I need 1190 IOPS with RAID 10, a 30% write load and a 70% read load.
Actual IOPS = 1190
Total IOPS = (Actual IOPS * RAID Penalty) / (Write% + RAID Penalty - (RAID Penalty * Write%))
           = (1190 * 2) / (0.3 + 2 - (0.3 * 2))
           = 2380 / 1.7
           = 1400 IOPS
Number of disks = Total IOPS / IOPS per disk = 1400 / 176.99 ≈ 8 drives

System Panic During Boot Logging the Error “NOTICE: zfs_parse_bootfs: error 19”

Today, while migrating a SAN, I faced this issue; I hope it will help others too.

The system panics during boot, logging the error:

{0} ok boot 56024-disk
Boot device: /virtual-devices@100/channel-devices@200/disk@1 File and args:
SunOS Release 5.10 Version Generic_147440-01 64-bit
Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
NOTICE: zfs_parse_bootfs: error 19
Cannot mount root on rpool/68 fstype zfs
panic[cpu0]/thread=180e000: vfs_mountroot: cannot mount root

Changes

This issue usually occurs when the system is trying to boot a ZFS rpool and the path to the disk has changed, or when you are trying to boot the system from a cloned disk (that is, a disk that is a copy of another boot disk).

Cause

The issue is caused by a mismatch between the current path of the disk you are trying to boot from and the path stored in the ZFS label of the same disk:

ok boot 56024-disk
Boot device: /virtual-devices@100/channel-devices@200/disk@1 File and args:

 

# zdb -l /dev/rdsk/c0d1s0
--------------------------------------------
LABEL 0
--------------------------------------------
version: 29
name: 'rpool'
state: 0
txg: 1906
pool_guid: 3917355013518575342
hostid: 2231083589
hostname: ''
top_guid: 3457717657893349899
guid: 3457717657893349899
vdev_children: 1
vdev_tree:
type: 'disk'
id: 0
guid: 3457717657893349899
path: '/dev/dsk/c0d0s0'
devid: 'id1,vdc@f85a3722e4e96b600000e056e0049/a'
phys_path: '/virtual-devices@100/channel-devices@200/disk@0:a'
whole_disk: 0
metaslab_array: 31
metaslab_shift: 27
ashift: 9
asize: 21361065984
is_log: 0
create_txg: 4

As you can see we are trying to boot the path disk@1 but in the ZFS label the path is disk@0.

Solution

To fix the issue, boot the system in failsafe mode or from CD-ROM and import the rpool on that disk to force ZFS to correct the path:

# zpool import -R /mnt rpool
cannot mount '/mnt/export': failed to create mountpoint
cannot mount '/mnt/export/home': failed to create mountpoint
cannot mount '/mnt/rpool': failed to create mountpoint

# zdb -l /dev/rdsk/c0d1s0
--------------------------------------------
LABEL 0
--------------------------------------------
version: 29
name: 'rpool'
state: 0
txg: 1923
pool_guid: 3917355013518575342
hostid: 2230848911
hostname: ''
top_guid: 3457717657893349899
guid: 3457717657893349899
vdev_children: 1
vdev_tree:
type: 'disk'
id: 0
guid: 3457717657893349899
path: '/dev/dsk/c0d1s0'
devid: 'id1,vdc@f85a3722e4e96b600000e056e0049/a'
phys_path: '/virtual-devices@100/channel-devices@200/disk@1:a'
whole_disk: 0
metaslab_array: 31
metaslab_shift: 27
ashift: 9
asize: 21361065984
is_log: 0
create_txg: 4

As you can see, the path has been corrected. However, you also have to remove the zpool.cache file; otherwise, after boot the ZFS commands will still show the disk as c0d0:

# zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
rpool                       5.86G  13.7G   106K  /mnt/rpool
rpool/ROOT                  4.35G  13.7G    31K  legacy
rpool/ROOT/s10s_u10wos_17b  4.35G  13.7G  4.35G  /mnt
rpool/dump                  1.00G  13.7G  1.00G  -
rpool/export                  63K  13.7G    32K  /mnt/export
rpool/export/home             31K  13.7G    31K  /mnt/export/home
rpool/swap                   528M  14.1G   114M  -

# zfs mount rpool/ROOT/s10s_u10wos_17b
# cd /mnt/etc/zfs
# rm zpool.cache
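
One possible wrap-up, assuming the pool is still imported under /mnt as above: export the pool cleanly and reboot from the corrected boot path (adjust the boot device to your environment):

# cd /
# zpool export rpool
# init 6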

IBM VIOS CDROM – DVDROM (Virtual Optical Device)

CDROM – DVDROM (Virtual Optical Device):

Any optical device equipped on the Virtual I/O Server partition (CD-ROM, DVD-ROM, or DVD-RAM) can be virtualized and assigned to any logical partition, one at a time, using the same virtual SCSI adapter provided for virtual disks. Virtual optical devices can be used to install the operating system and, in the case of DVD-RAM, to make backups.

Creating Virtual Optical Device:

1. On the VIO Server create a SCSI server adapter. This adapter is set to "Any client partition can connect."
This dedicated adapter for the virtual optical device helps make things easier from a system management point of view.

2. On the client LPAR: create a SCSI client adapter, mapping its ID to the server adapter above.

3. cfgdev (on the VIOS) will bring up a new vhostX;
cfgmgr (on the client) will bring up a new vscsiX.

4. On VIO Server create optical device:

– For physical CDs and DVDs, create an optical device:
$ mkvdev -vdev cd0 -vadapter vhost4 -dev vcd
vcd Available

$ lsdev -virtual

vcd             Available  Virtual Target Device - Optical Media

– For a file-backed (ISO image) optical device:
$ mkvdev -fbo -vadapter vhost1
vtopt0 Available

$lsdev -virtual

vtopt0           Available   Virtual Target Device - File-backed Optical

(Copy the ISO image to /var/vio/VMLibrary; 'lsrep' will show the media repository content.)
(lssp -> mkrep -sp rootvg -size 4G    <- this will create the media repository)
(Creating an ISO image: mkvopt -name <filename>.iso -dev cd0 -ro)

Load the image into the vtopt0 device: loadopt -vtd vtopt0 -disk dvd.1022A4_OBETA_710.iso
(lsmap -all will show it)

Or you can check it:
padmin@vios1 : /home/padmin # lsvopt
VTD             Media                                   Size(mb)
vtopt0          AIX_7100-00-01_DVD_1_of_2_102010.iso        3206

If another disc is needed later, you can unload the current image with this command: unloadopt -vtd vtopt0
If we don't need the image anymore at all, we can remove it from the repository: rmvopt -name AIX_7100-00-01.iso

5. On the client LPAR run cfgmgr and create the CD-ROM filesystem.
In the AIX client partition run the cfgmgr command to assign the virtual optical drive to it. If the drive is already assigned to another partition, you will get an error message and will have to release the drive from the partition holding it.

create mount point: mkdir /cdrom

create cdrom filesystem: smitty fs -> add cdrom filesystems:
device name: cd0
mount point: /cdrom
mount automatically

mount the filesystem: mount -v cdrfs -r /dev/cd0 /cdrom
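
When finished with the media, the reverse steps release it for another partition:

unmount on the client: umount /cdrom
unload the image on the VIOS: unloadopt -vtd vtopt0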

Brocade SAN Switch NTP: setting the date and time

DS_300B_234:admin> date

Tue Dec 31 06:43:16 UTC 2013

DS_300B_234:admin> tstimezone

Time Zone Hour Offset: 0

Time Zone Minute Offset: 0

DS_300B_234:admin> date

Tue Dec 31 06:45:59 UTC 2013

DS_300B_234:admin> tstimezone --interactive

Please identify a location so that time zone rules can be set correctly.

Please select a continent or ocean.

1) Africa

2) Americas

3) Antarctica

4) Arctic Ocean

5) Asia

6) Atlantic Ocean

7) Australia

8) Europe

9) Indian Ocean

10) Pacific Ocean

11) none – I want to specify the time zone using the POSIX TZ format.

Enter number or control-D to quit ?5

Please select a country.

1) Afghanistan           18) Israel                35) Palestine

2) Armenia               19) Japan                 36) Philippines

3) Azerbaijan            20) Jordan                37) Qatar

4) Bahrain               21) Kazakhstan            38) Russia

5) Bangladesh            22) Korea (North)         39) Saudi Arabia

6) Bhutan                23) Korea (South)         40) Singapore

7) Brunei                24) Kuwait                41) Sri Lanka

8) Cambodia              25) Kyrgyzstan            42) Syria

9) China                 26) Laos                  43) Taiwan

10) Cyprus                27) Lebanon               44) Tajikistan

11) East Timor            28) Macau                 45) Thailand

12) Georgia               29) Malaysia              46) Turkmenistan

13) Hong Kong             30) Mongolia              47) United Arab Emirates

14) India                 31) Myanmar (Burma)       48) Uzbekistan

15) Indonesia             32) Nepal                 49) Vietnam

16) Iran                  33) Oman                  50) Yemen

17) Iraq                  34) Pakistan

Enter number or control-D to quit ?14

 

The following information has been given:

 

India

 

Therefore TZ=’Asia/Kolkata’ will be used.

Local time is now:      Tue Dec 31 12:16:40 IST 2013.

Universal Time is now:  Tue Dec 31 06:46:40 UTC 2013.

Is the above information OK?

1) Yes

2) No

Enter number or control-D to quit ?1

System Time Zone change will take effect at next reboot

DS_300B_234:admin> tsclockserver "10.X.X.X.14"

Updating Clock Server configuration…done.

Updated with the NTP server.

DS_300B_234:admin> date

Tue Dec 31 11:44:42 IST 2013
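
To verify the setting later, run tsclockserver with no arguments; it prints the configured clock server (the exact output format varies by Fabric OS release, so treat this as an assumption):

DS_300B_234:admin> tsclockserver
10.X.X.X.14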

 

AIX: using mkcd to create a bootable ISO image from mksysb images

Create an ISO image of AIXLPAR2 from an existing mksysb file. I had already created a mksysb file of AIXLPAR2 in /usr/sap/put/AIXLPAR2-mksysb.
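
For reference, a mksysb file like that can be created with something along these lines (-i regenerates the image.data file before the backup):

root@AIXLPAR2 / # mksysb -i /usr/sap/put/AIXLPAR2-mksysb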

root@AIXLPAR2 /usr/sap/put # mkcd -L -S -I /usr/sap/put/image -m /usr/sap/put/AIXLPAR2-mksysb

Initializing mkcd log: /var/adm/ras/mkcd.log...

Verifying command parameters...

Creating temporary file system: /mkcd/cd_fs...

Populating the CD or DVD file system...

Building chrp boot image...

Copying backup to the CD or DVD file system...

Creating Rock Ridge format image: /usr/sap/put/image/cd_image_712892

Running mkisofs ...

....

mkrr_fs was successful.

Making the CD or DVD image bootable...

Removing temporary file system: /mkcd/cd_fs...

root@AIXLPAR2 /usr/sap/put # ls -ltr

total 4369112

drwxr-xr-x    2 root     system          256 Aug 20 10:31 lost+found

-rw-r--r--    1 root     system   2236979200 Dec 10 09:16 AIXLPAR2-mksysb

drwxr-xr-x    2 root     system          256 Dec 10 09:23 image

– Confirm the ISO image has been created.

root@AIXLPAR2 /usr/sap/put # cd image

root@AIXLPAR2 /usr/sap/put/image # ls -ltr

total 4483256

-rw-r--r--    1 root     system   2295425024 Dec 10 09:24 cd_image_712892

 – Copy the image to the VIOS virtual media library directory.

 # df -m .

Filesystem    MB blocks      Free %Used    Iused %Iused Mounted on

/dev/VMLibrary_LV   7168.00   2768.93   62%        6     1% /var/vio/VMLibrary

# scp 10.X.X.X:/usr/sap/put/image/cd_image_712892 .

root@10.X.X.X's password:

cd_image_712892                               100% 2189MB  40.5MB/s   00:54

 

# ls -ltr

total 13432312

drwxr-xr-x    2 root     system          256 Dec 03 11:08 lost+found

-rw-r--r--    1 root     staff    3857645568 Dec 04 08:53 AIX61_DVD_1.iso

-rw-r--r--    1 root     staff     724271104 Dec 04 10:09 AIX61_DVD_2.iso

-rw-r--r--    1 root     staff    2295425024 Dec 10 09:33 cd_image_712892

– Rename the image to a more meaningful name.

# mv cd_image_712892 AIXLPAR2_mksysb.iso

# ls -ltr

total 13432312

drwxr-xr-x    2 root     system          256 Dec 03 11:08 lost+found

-rw-r--r--    1 root     staff    3857645568 Dec 04 08:53 AIX61_DVD_1.iso

-rw-r--r--    1 root     staff     724271104 Dec 04 10:09 AIX61_DVD_2.iso

-rw-r--r--    1 root     staff    2295425024 Dec 10 09:33 AIXLPAR2_mksysb.iso

– Map a virtual optical device to the client LPAR.

$ lsmap -vadapter vhost1

SVSA            Physloc                                      Client Partition ID

--------------- -------------------------------------------- ------------------

vhost1          U7998.61X.10071DA-V1-C13                     0x00000000

VTD                   vtscsi3

Status                Available

LUN                   0x8100000000000000

Backing device        lp2vd1

Physloc

$ mkvdev -fbo -vadapter vhost1

vtopt0 Available

 

$ lsmap -vadapter vhost1

SVSA            Physloc                                      Client Partition ID

--------------- -------------------------------------------- ------------------

vhost1          U7998.61X.10071DA-V1-C13                     0x00000000

 

VTD                   vtopt0

Status                Available

LUN                   0x8200000000000000

Backing device

Physloc

 

VTD                   vtscsi3

Status                Available

LUN                   0x8100000000000000

Backing device        lp2vd1

Physloc

 

$ lsrep

Size(mb) Free(mb) Parent Pool         Parent Size      Parent Free

7139      579 rootvg                   139776            57344

 

Name                                    File Size Optical         Access

AIX61_DVD_1.iso                              3679 None            rw

AIX61_DVD_2.iso                               691 None            rw

AIXLPAR2_mksysb.iso                           2190 None            rw

 

$ loadopt -f -vtd vtopt0 -disk AIXLPAR2_mksysb.iso

 

$ lsmap -vadapter vhost1

SVSA            Physloc                                      Client Partition ID

--------------- -------------------------------------------- ------------------

vhost1          U7998.61X.10071DA-V1-C13                     0x00000000

 

VTD                   vtopt0

Status                Available

LUN                   0x8200000000000000

Backing device        /var/vio/VMLibrary/AIXLPAR2_mksysb.iso

Physloc

 

VTD                   vtscsi3

Status                Available

LUN                   0x8100000000000000

Backing device        lp2vd1

Physloc

– Boot the LPAR from the virtual “SCSI CD” and install the image as “normal”.

How to create an EtherChannel with SEA on IBM VIOS

Configuring your SEA with EtherChannel.

First, check the shared virtual Ethernet adapters you have:

$ lsmap -all -net
SVEA   Physloc
------ --------------------------------------------
ent8   U8231.E2D.06C83BT-V1-C11-T1

SEA                   ent11
Backing device        ent10
Status                Available
Physloc

SVEA   Physloc
------ --------------------------------------------
ent9   U8231.E2D.06C83BT-V1-C12-T1

SEA                 NO SHARED ETHERNET ADAPTER FOUND

$

Then we need to check which adapters we can use; in this case we will use the first port on each of the two 4-port 1Gb cards (ent0 and ent4):

$ lsdev -type adapter
name             status      description
ent0             Available   4-Port Gigabit Ethernet PCI-Express Adapter (e414571614102004)
ent1             Available   4-Port Gigabit Ethernet PCI-Express Adapter (e414571614102004)
ent2             Available   4-Port Gigabit Ethernet PCI-Express Adapter (e414571614102004)
ent3             Available   4-Port Gigabit Ethernet PCI-Express Adapter (e414571614102004)
ent4             Available   4-Port Gigabit Ethernet PCI-Express Adapter (e414571614102004)
ent5             Available   4-Port Gigabit Ethernet PCI-Express Adapter (e414571614102004)
ent6             Available   4-Port Gigabit Ethernet PCI-Express Adapter (e414571614102004)
ent7             Available   4-Port Gigabit Ethernet PCI-Express Adapter (e414571614102004)
ent8             Available   Virtual I/O Ethernet Adapter (l-lan)
ent9             Available   Virtual I/O Ethernet Adapter (l-lan)
ent10            Available   EtherChannel / IEEE 802.3ad Link Aggregation
ent11            Available   Shared Ethernet Adapter
fcs0             Available   8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
fcs1             Available   8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
fcs2             Available   8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
fcs3             Available   8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
pkcs11           Available   PKCS#11 Device
sissas0          Available   PCIe x4 Planar 3Gb SAS RAID Adapter
sissas1          Available   PCIe x4 Internal 3Gb SAS RAID Adapter
usbhc0           Available   USB Host Controller (33103500)
usbhc1           Available   USB Host Controller (33103500)
usbhc2           Available   USB Enhanced Host Controller (3310e000)
vfchost0         Available   Virtual FC Server Adapter
vfchost1         Available   Virtual FC Server Adapter
vfchost2         Available   Virtual FC Server Adapter
vfchost3         Available   Virtual FC Server Adapter
vfchost4         Available   Virtual FC Server Adapter
vfchost5         Available   Virtual FC Server Adapter
vfchost6         Available   Virtual FC Server Adapter
vfchost7         Available   Virtual FC Server Adapter
vfchost8         Available   Virtual FC Server Adapter
vfchost9         Available   Virtual FC Server Adapter
vfchost10        Available   Virtual FC Server Adapter
vfchost11        Available   Virtual FC Server Adapter
vfchost12        Available   Virtual FC Server Adapter
vfchost13        Available   Virtual FC Server Adapter
vfchost14        Available   Virtual FC Server Adapter
vfchost15        Available   Virtual FC Server Adapter
vhost0           Available   Virtual SCSI Server Adapter
vhost1           Available   Virtual SCSI Server Adapter
vhost2           Available   Virtual SCSI Server Adapter
vhost3           Available   Virtual SCSI Server Adapter
vhost4           Available   Virtual SCSI Server Adapter
vhost5           Available   Virtual SCSI Server Adapter
vhost6           Available   Virtual SCSI Server Adapter
vhost7           Available   Virtual SCSI Server Adapter
vsa0             Available   LPAR Virtual Serial Adapter
$

Then we need to create the Etherchannel device –

# mkvdev -lnagg ent0,ent4

ent10 Available

This creates the device in standard mode, though you can switch it over to round robin –

# chdev -l ent10 -a mode=round_robin

Then we can create a SEA "bridge" between the physical EtherChannel device ent10 and the virtual adapter ent8:

# mkvdev -sea ent10 -vadapter ent8 -default ent8 -defaultid 1

ent11 Available

Once that is done, you can set up the initial TCP/IP config (en11 is the interface for the SEA ent11):

# mktcpip -hostname <vio-name> -inetaddr <ip-address> -interface en11 -start -netmask <subnet> -gateway <gateway-ip>

Now your server is ready to go.
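
To confirm the aggregation and the SEA are healthy, entstat on the SEA also reports the state of the underlying EtherChannel ports (the output is lengthy and varies by adapter, so filter it):

$ entstat -all ent11 | grep -i state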