
System Panic During Boot Logging the Error “NOTICE: zfs_parse_bootfs: error 19”

I faced this issue today while migrating a SAN; I hope it helps others too.

The system panics during boot, logging the following error:

{0} ok boot 56024-disk
Boot device: /virtual-devices@100/channel-devices@200/disk@1 File and args:
SunOS Release 5.10 Version Generic_147440-01 64-bit
Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
NOTICE: zfs_parse_bootfs: error 19
Cannot mount root on rpool/68 fstype zfs
panic[cpu0]/thread=180e000: vfs_mountroot: cannot mount root

Changes

This issue usually occurs when the system is trying to boot a ZFS rpool and the path to the disk has changed, or when the system is being booted from a cloned disk (that is, a disk that is a copy of another boot disk).

Cause

The issue is caused by a mismatch between the current path of the disk you are trying to boot from and the path stored in the ZFS label of the same disk:

ok boot 56024-disk
Boot device: /virtual-devices@100/channel-devices@200/disk@1 File and args:

 

# zdb -l /dev/rdsk/c0d1s0
--------------------------------------------
LABEL 0
--------------------------------------------
version: 29
name: 'rpool'
state: 0
txg: 1906
pool_guid: 3917355013518575342
hostid: 2231083589
hostname: ''
top_guid: 3457717657893349899
guid: 3457717657893349899
vdev_children: 1
vdev_tree:
type: 'disk'
id: 0
guid: 3457717657893349899
path: '/dev/dsk/c0d0s0'
devid: 'id1,vdc@f85a3722e4e96b600000e056e0049/a'
phys_path: '/virtual-devices@100/channel-devices@200/disk@0:a'
whole_disk: 0
metaslab_array: 31
metaslab_shift: 27
ashift: 9
asize: 21361065984
is_log: 0
create_txg: 4

As you can see, we are trying to boot the path disk@1, but in the ZFS label the path is disk@0.

Solution

To fix the issue, you have to boot the system in failsafe mode or from CD-ROM and import the rpool on that disk to force ZFS to correct the path.
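From the OBP this looks something like the following (a hedged sketch: boot -F failsafe and boot cdrom -s are the standard Solaris 10 SPARC invocations, but your device aliases and media may differ):

{0} ok boot -F failsafe
(or, from installation media)
{0} ok boot cdrom -s

Once at a shell, import the pool under an alternate root: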

# zpool import -R /mnt rpool
cannot mount '/mnt/export': failed to create mountpoint
cannot mount '/mnt/export/home': failed to create mountpoint
cannot mount '/mnt/rpool': failed to create mountpoint

# zdb -l /dev/rdsk/c0d1s0
--------------------------------------------
LABEL 0
--------------------------------------------
version: 29
name: 'rpool'
state: 0
txg: 1923
pool_guid: 3917355013518575342
hostid: 2230848911
hostname: ''
top_guid: 3457717657893349899
guid: 3457717657893349899
vdev_children: 1
vdev_tree:
type: 'disk'
id: 0
guid: 3457717657893349899
path: '/dev/dsk/c0d1s0'
devid: 'id1,vdc@f85a3722e4e96b600000e056e0049/a'
phys_path: '/virtual-devices@100/channel-devices@200/disk@1:a'
whole_disk: 0
metaslab_array: 31
metaslab_shift: 27
ashift: 9
asize: 21361065984
is_log: 0
create_txg: 4

As you can see, the path has been corrected. However, you also have to remove the zpool.cache file; otherwise, after boot, ZFS commands will still show the disk as c0d0:

# zfs list
NAME USED AVAIL REFER MOUNTPOINT
rpool 5.86G 13.7G 106K /mnt/rpool
rpool/ROOT 4.35G 13.7G 31K legacy
rpool/ROOT/s10s_u10wos_17b 4.35G 13.7G 4.35G /mnt
rpool/dump 1.00G 13.7G 1.00G -
rpool/export 63K 13.7G 32K /mnt/export
rpool/export/home 31K 13.7G 31K /mnt/export/home
rpool/swap 528M 14.1G 114M -

# zfs mount rpool/ROOT/s10s_u10wos_17b
# cd /mnt/etc/zfs
# rm zpool.cache
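To finish (a hedged wrap-up; the exact sequence may vary with your setup), step out of the mount point and restart from the repaired disk:

# cd /
# init 6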

How to Encrypt a File System in AIX?

Encrypting a filesystem on AIX 6.1 using the Encrypted File System (EFS).

EFS offers two modes of operation:

Root Admin mode
This is the default mode. Root can reset user and group keystore passwords.

Root Guard mode
Root does not have access to users' encrypted files and cannot change their keystore passwords.

Note: NFS exports of EFS filesystems are not supported.

1. Prerequisites:
RBAC has to be enabled; it should be enabled by default on AIX 6.1. If not, use chdev to enable it (see the sketch after the check below).

# lsattr -El sys0 | grep RBAC
enhanced_RBAC   true         Enhanced RBAC Mode        True
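If enhanced RBAC is turned off, a hedged sketch of enabling it with chdev (the attribute change requires a reboot to take effect):

# chdev -l sys0 -a enhanced_RBAC=true
# shutdown -Fr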

CryptoLite needs to be installed; verify using the command below:

bash-3.2# lslpp -l | grep  CryptoLite
  clic.rte.kernext           4.7.0.1  COMMITTED  CryptoLite for C Kernel
  clic.rte.lib               4.7.0.1  COMMITTED  CryptoLite for C Library
  clic.rte.kernext           4.7.0.1  COMMITTED  CryptoLite for C Kernel

2. EFS Commands:

efsenable – Enables EFS on a given system. This is run only once
efskeymgr – Encryption Key Management tool
efsmgr – File encryption and decryption

3. Setup:
To enable EFS on the system use:

# efsenable -a
Enter password to protect your initial keystore:
Enter the same password again:

If your EFS password is identical to your login password, the EFS kernel extension will be loaded automatically into the kernel, so you will be able to access your encrypted files without having to provide a password.
Otherwise, `efskeymgr -o ksh` has to be executed in order to load the keys.
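For example (a hedged sketch; the exact prompt wording may differ):

# efskeymgr -o ksh
Enter password to open root's keystore: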

In order to be able to encrypt files, the filesystem that will hold these files needs to be EFS enabled (efs=yes), and Extended Attribute format V2 has to be activated.

This can be verified using lsfs -q:

# lsfs -q /test
Name            Nodename   Mount Pt               VFS   Size    Options    Auto Accounting
/dev/fslv12     --         /test               jfs2  262144  rw         yes  no
  (lv size: 262144, fs size: 262144, block size: 4096, sparse files: yes, inline log: no, inline log size: 0, EAformat: v1, Quota: no, DMAPI: no, VIX: yes, EFS: no, ISNAPSHOT: no, MAXEXT: 0, MountGuard: no)

# chfs -a efs=yes /test

# lsfs -q /test
Name            Nodename   Mount Pt               VFS   Size    Options    Auto Accounting
/dev/fslv12     --         /test               jfs2  262144  rw         yes  no
  (lv size: 262144, fs size: 262144, block size: 4096, sparse files: yes, inline log: no, inline log size: 0, EAformat: v2, Quota: no, DMAPI: no, VIX: yes, EFS: yes, ISNAPSHOT: no, MAXEXT: 0, MountGuard: no)

Now we will have a look at the keys associated  with the current shell.

# efskeymgr -V
List of keys loaded in the current process:
 Key #0:
                           Kind ..................... User key
                           Id   (uid / gid) ......... 0
                           Type ..................... Private key
                           Algorithm ................ RSA_1024
                           Validity ................. Key is valid
                           Fingerprint .............. 00f06152:be7cae83:a02379a0:82e30ab8:f6295ea1
 Key #1:
                           Kind ..................... Group key
                           Id   (uid / gid) ......... 7
                           Type ..................... Private key
                           Algorithm ................ RSA_1024
                           Validity ................. Key is valid
                           Fingerprint .............. 12928ecb:353f4268:e19078be:268c7d56:18928ecb
 Key #2:
                           Kind ..................... Admin key
                           Id   (uid / gid) ......... 0
                           Type ..................... Private key
                           Algorithm ................ RSA_1024
                           Validity ................. Key is valid
                           Fingerprint .............. 940201f9:89h618ac:2e555ac4:60fdb6b5:268c7d56

4. Encrypt file

Now we will create a file, try to encrypt it, run into a problem with umask, and finally encrypt the file.

# echo "I like black tea with milk." > secret.txt
# ls -U
total 8
-rw-r------    1 root     system           30 May 8  11:18 secret.txt
drwxr-xr-x-    2 root     system          256 Apr 30 14:10 tmp

        Encrypt file
          |
# efsmgr -e secret.txt
./.efs.LZacya: Security authentication is denied.

# umask 077

# efsmgr -e secret.txt
# ls -U
total 16
drwxr-xr-x-    2 root     system          256 May 5 12:13 lost+found
-rw-r-----e    1 root     system           30 May 8 11:18 secret.txt
          |
          Indicates that this file is encrypted

Display file encryption information:

# efsmgr -l secret.txt
EFS File information:
 Algorithm: AES_128_CBC
List of keys that can open the file:
 Key #1:
  Algorithm       : RSA_1024
  Who             : uid 0
  Key fingerprint : 00f06152:be7cae83:a02379a0:82e30ab8:f6295ea1

Now I set the file permissions to 644 and try to read the file as another user.

# chmod 644 secret.txt
# ls -la
-rw-r--r--    1 root     system          145 May 8 11:19 secret.txt

user1 # file secret.txt
secret.txt: 0653-902 Cannot open the specified file for reading.
user1 # cat secret.txt
cat: 0652-050 Cannot open secret.txt.

As root, we will list the inode number of the file, get the block pointer, and read directly from the filesystem using fsdb to see whether the file is stored encrypted.

      Display inode no.
      |
# ls -iU
total 32

    5 -rw-r--r--e    1 root     system          145 May 8 11:19 secret.txt

# istat 5 /dev/fslv12
Inode 5 on device 10/27 File
Protection: rw-r--r--
Owner: 0(root)          Group: 0(system)
Link count:   1         Length 145 bytes

Last updated:   Tue May 8 11:18:23 GMT+02:00 2012
Last modified:  Tue May 8 11:18:52 GMT+02:00 2012
Last accessed:  Tue May 8 11:18:52 GMT+02:00 2012

Block pointers (hexadecimal):
29
# fsdb /dev/fslv12
Filesystem /dev/fslv12 is mounted.  Modification is not permitted.

File System:                    /dev/fslv12

File System Size:               261728  (512 byte blocks)
Aggregate Block Size:           4096
Allocation Group Size:          8192    (aggregate blocks)

> display 0x29
Block: 41     Real Address 0x29000
00000000:  119CB74E 637C6FE0 C0BF2DCD 36B775BB   |...Nc|o...-.6.u.|
00000010:  569B5A6C 43476ED3 F4BFE938 7C662A3B   |V.ZlCGn....8|f*;|
00000020:  B5D89C51 FA2BE7B6 CEAF2D3E 555EAA06   |...Q.+....->U^..|
00000030:  4FF23413 B11D1170 982690B3 5F1BCA9A   |O.4....p.&.._...|
00000040:  4AD3CEA5 A3CBFAD9 C730EE00 9BD1F409   |J........0......|
00000050:  71203B85 A51320C6 04A97DA4 43002DA7   |q ;... ...}.C.-.|
00000060:  994CC67B A1AC31DF 2C8201AD 3E5B50F7   |.L.{..1.,...>[P.|
00000070:  6BA7B01D EC5CB918 17E13F46 2935FA98   |k........?F)5..|
00000080:  718DF155 D6E69A41 EF592B60 EA5F7B24   |q..U...A.Y+`._{$|
00000090:  32521FE2 7AD8EC61 1A94413D A8338A26   |2R..z..a..A=.3.&|
000000a0:  62E4A319 D6251A66 F19D4739 2FC7E83A   |b....%.f..G9/..:|
000000b0:  DE0F878A 1F95AB89 5C7F3520 C65B7896   |.........5 .[x.|
000000c0:  915A7655 EC269DFF 68E2B08A 871114A9   |.ZvU.&..h.......|
000000d0:  E30B195F 280F7DCD 4F8BE094 4B5603D8   |..._(.}.O...KV..|
000000e0:  962303B0 D957A2A5 24A2A3A5 6260EA5E   |.#...W..$...b`.^|
000000f0:  A4C62B7D FB9B1841 893D253F 72E61065   |..+}...A.=%?r..e|
-hit enter for more-
00000100:  01A150FD AD54677D A856E9B1 320257E1   |..P..Tg}.V..2.W.|
00000110:  5F023AA3 0191E0D6 4B64583B D9F2A4C7   |_.:.....KdX;....|
00000120:  F988937A E0117EB2 26E61976 E4860D7D   |...z..~.&..v...}|
00000130:  0C724A4E 50616226 BDE06FEB 10A19564   |.rJNPab&..o....d|
00000140:  17C90BB7 774338B3 8525ED90 5EADFD8B   |....wC8..%..^...|
00000150:  636FC1AF D46C2E64 6AC37082 3B0168BE   |co...l.dj.p.;.h.|
00000160:  24C0CD2E D8587254 F6DBC1BA 93BE6AD6   |$....XrT......j.|
00000170:  E89EEFF9 08000B07 E3827C10 AE0FD7DB   |..........|.....|
00000180:  162D0E6D EF94D85A 3F09CD85 A19A31FF   |.-.m...Z?.....1.|
00000190:  49E13BFC 5328F670 E0B50878 942CC4BB   |I.;.S(.p...x.,..|
000001a0:  BF1D6C4F 9DA72F3D 8DC90691 328A7053   |..lO../=....2.pS|
000001b0:  99C31EEB 1CD2208A CBF609C1 4DB86819   |...... .....M.h.|
000001c0:  E2746288 5E152ECA 0E2BD9DF D1D1D210   |.tb.^....+......|
000001d0:  7ADDF0EC 522E93E2 CAA0A36F B3CBFB05   |z...R......o....|
000001e0:  4EA56F3C ECBA1A0C AA132269 2024E065   |N.o<......"i $.e|
000001f0:  00BC51B0 88BBCD8A 9C644F66 6A16DBC8   |..Q......dOfj...|

Above we see that the file on the disk is encrypted.

5. Decrypting a file

Decrypt file
          |
# efsmgr -d secret.txt
# ls -U
total 24

-rw-r--r---    1 root     system          145 May 8 12:23 secret.txt

6. Encryption Inheritance

If you enable Encryption Inheritance on a directory all newly created files in that directory will be automatically encrypted.

To enable Encryption inheritance use:

# efsmgr -E /archive

# ls -U / | grep archive
drwxr-xr-xe    3 root     system          256 Jul 17 12:09 archive

# touch next.txt

# ls -U
total 32

-rw-------e    1 root     system            0 May 8 11:10 next.txt
-rw-r--r---    1 root     system          145 May 8 12:25 secret.txt

7. Grant access to another user
Say we are user1 and want to see who has EFS access to the file.

user1 $ efsmgr -l secret.txt
EFS File information:
 Algorithm: AES_128_CBC
List of keys that can open the file:
 Key #1:
  Algorithm       : RSA_1024
  Who             : uid 0
  Key fingerprint : 00f06152:be7cae83:a02379a0:82e30ab8:f6295ea1

To grant access to a user use:

Add access to the specified file to a user or group(u/g)
          |
# efsmgr -a secret.txt -u user1
                        |
                        Add user to EFS access list

user1 $ cat secret.txt
I like black tea with milk.
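The same -a flag also accepts a group, per the u/g note above; a hedged example with a hypothetical group named staff:

# efsmgr -a secret.txt -g staff
# efsmgr -l secret.txt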

Reference Redbooks:

AIX 6.1 Differences Guide, SG24-7559-00, page 40
AIX V6 Advanced Security Features, SG24-7430-00, page 59

Tuning the Unix Operating System and Platform

This chapter discusses tuning the operating system (OS) for optimum performance. It discusses the following topics:

  • Server Scaling
  • Solaris 10 Platform-Specific Tuning Information
  • Tuning for the Solaris OS
  • Tuning for Solaris on x86
  • Tuning for Linux platforms
  • Tuning UltraSPARC CMT-Based Systems

Server Scaling

This section provides recommendations for scaling the server for optimal performance, covering the following server subsystems:

  • Processors
  • Memory
  • Disk Space
  • Networking
  • UDP Buffer Sizes

Processors

The GlassFish Server automatically takes advantage of multiple CPUs. In general, the effectiveness of multiple CPUs varies with the operating system and the workload, but more processors will generally improve dynamic content performance.

Static content involves mostly input/output (I/O) rather than CPU activity. If the server is tuned properly, increasing primary memory will increase its content caching and thus increase the relative amount of time it spends in I/O versus CPU activity. Studies have shown that doubling the number of CPUs increases servlet performance by 50 to 80 percent.

Memory

See the section Hardware and Software Requirements in the GlassFish Server Release Notes for specific memory recommendations for each supported operating system.

Disk Space

It is best to have enough disk space for the OS, document tree, and log files. In most cases 2GB total is sufficient.

Put the OS, swap/paging file, GlassFish Server logs, and document tree each on separate hard drives. This way, if the log files fill up the log drive, the OS does not suffer. Also, it's easy to tell if the OS paging file is causing drive activity, for example.

OS vendors generally provide specific recommendations for how much swap or paging space to allocate. Based on Oracle testing, GlassFish Server performs best with swap space equal to RAM, plus enough to map the document tree.

Networking

To determine the bandwidth the application needs, determine the following values:

  • The number of peak concurrent users (Npeak) the server needs to handle.
  • The average request size on your site, r. The average request can include multiple documents. When in doubt, use the home page and all its associated files and graphics.
  • Decide how long, t, the average user will be willing to wait for a document at peak utilization.

Then, the bandwidth required is:

Bandwidth required = (Npeak × r) / t

For example, to support a peak of 50 users with an average document size of 24 Kbytes, and transferring each document in an average of 5 seconds, requires a bandwidth of 240 Kbytes per second (1920 Kbit per second). So the site needs two T1 lines (each 1544 Kbit/s). This bandwidth also allows some overhead for growth.

The server’s network interface card must support more than the WAN to which it is connected. For example, if you have up to three T1 lines, you can get by with a 10BaseT interface. Up to a T3 line (45 Mbit/s), you can use 100BaseT. But if you have more than 50 Mbit/s of WAN bandwidth, consider configuring multiple 100BaseT interfaces, or look at Gigabit Ethernet technology.

UDP Buffer Sizes

GlassFish Server uses User Datagram Protocol (UDP) for the transmission of multicast messages to GlassFish Server instances in a cluster. For peak performance from a GlassFish Server cluster that uses UDP multicast, limit the need to retransmit UDP messages. To limit the need to retransmit UDP messages, set the size of the UDP buffer to avoid excessive UDP datagram loss.

To Determine an Optimal UDP Buffer Size

The size of UDP buffer that is required to prevent excessive UDP datagram loss depends on many factors, such as:

  • The number of instances in the cluster
  • The number of instances on each host
  • The number of processors
  • The amount of memory
  • The speed of the hard disk for virtual memory

If only one instance is running on each host in your cluster, the default UDP buffer size should suffice. If several instances are running on each host, determine whether the UDP buffer is large enough by testing for the loss of UDP packets.

Note:

On Linux systems, the default UDP buffer size might be insufficient even if only one instance is running on each host. In this situation, set the UDP buffer size as explained in To Set the UDP Buffer Size on Linux Systems.

  1. Ensure that no GlassFish Server clusters are running.

If necessary, stop any running clusters as explained in “To Stop a Cluster” in Oracle GlassFish Server High Availability Administration Guide.

  2. Determine the absolute number of lost UDP packets when no clusters are running.

How you determine the number of lost packets depends on the operating system. For example:

  • On Linux systems, use the netstat -su command and look for the packet receive errors count in the Udp section.
  • On AIX systems, use the netstat -s command and look for the fragments dropped (dup or out of space) count in the ip section.

  3. Start all the clusters that are configured for your installation of GlassFish Server.

Start each cluster as explained in “To Start a Cluster” in Oracle GlassFish Server High Availability Administration Guide.

  4. Determine the absolute number of lost UDP packets after the clusters are started.

  5. If the difference in the number of lost packets is significant, increase the size of the UDP buffer.

To Set the UDP Buffer Size on Linux Systems

On Linux systems, a default UDP buffer size is set for the client, but not for the server. Therefore, on Linux systems, the UDP buffer size might have to be increased. Setting the UDP buffer size involves setting the following kernel parameters:

  • net.core.rmem_max
  • net.core.wmem_max
  • net.core.rmem_default
  • net.core.wmem_default

Set the kernel parameters in the /etc/sysctl.conf file or at runtime.

If you set the parameters in the /etc/sysctl.conf file, the settings are preserved when the system is rebooted. If you set the parameters at runtime, the settings are not preserved when the system is rebooted.

  • To set the parameters in the /etc/sysctl.conf file, add or edit the following lines in the file:

net.core.rmem_max=rmem-max
net.core.wmem_max=wmem-max
net.core.rmem_default=rmem-default
net.core.wmem_default=wmem-default

  • To set the parameters at runtime, use the sysctl command:

$ /sbin/sysctl -w net.core.rmem_max=rmem-max \
net.core.wmem_max=wmem-max \
net.core.rmem_default=rmem-default \
net.core.wmem_default=wmem-default

Example 5-1 Setting the UDP Buffer Size in the /etc/sysctl.conf File

This example shows the lines in the /etc/sysctl.conf file for setting the kernel parameters for controlling the UDP buffer size to 524288.

net.core.rmem_max=524288
net.core.wmem_max=524288
net.core.rmem_default=524288
net.core.wmem_default=524288

Example 5-2 Setting the UDP Buffer Size at Runtime

This example sets the kernel parameters for controlling the UDP buffer size to 524288 at runtime.

$ /sbin/sysctl -w net.core.rmem_max=524288 \
net.core.wmem_max=524288 \
net.core.rmem_default=524288 \
net.core.wmem_default=524288
net.core.rmem_max = 524288
net.core.wmem_max = 524288
net.core.rmem_default = 524288
net.core.wmem_default = 524288

Solaris 10 Platform-Specific Tuning Information

Solaris Dynamic Tracing (DTrace) is a comprehensive dynamic tracing framework for the Solaris Operating System (OS). You can use the DTrace Toolkit to monitor the system. The DTrace Toolkit is available through the OpenSolaris project from the DTraceToolkit page.

Tuning for the Solaris OS

  • Tuning Parameters
  • File Descriptor Setting

Tuning Parameters

Tuning Solaris TCP/IP settings benefits programs that open and close many sockets. Since the GlassFish Server operates with a small fixed set of connections, the performance gain might not be significant.

The following table shows Solaris tuning parameters that affect performance and scalability benchmarking. These values are examples of how to tune your system for best performance.

Table 5-1 Tuning Parameters for Solaris

Parameter Scope Default Tuned Value Comments
rlim_fd_max /etc/system 65536 65536 Limit of process open file descriptors. Set to account for expected load (for associated sockets, files, and pipes if any).
rlim_fd_cur /etc/system 1024 8192
sq_max_size /etc/system 2 0 Controls streams driver queue size; setting to 0 makes it infinite so the performance runs won’t be hit by lack of buffer space. Set on clients too. Note that setting sq_max_size to 0 might not be optimal for production systems with high network traffic.
tcp_close_wait_interval ndd /dev/tcp 240000 60000 Set on clients too.
tcp_time_wait_interval ndd /dev/tcp 240000 60000 Set on clients too.
tcp_conn_req_max_q ndd /dev/tcp 128 1024
tcp_conn_req_max_q0 ndd /dev/tcp 1024 4096
tcp_ip_abort_interval ndd /dev/tcp 480000 60000
tcp_keepalive_interval ndd /dev/tcp 7200000 900000 For high traffic web sites, lower this value.
tcp_rexmit_interval_initial ndd /dev/tcp 3000 3000 If retransmission is greater than 30-40%, you should increase this value.
tcp_rexmit_interval_max ndd /dev/tcp 240000 10000
tcp_rexmit_interval_min ndd /dev/tcp 200 3000
tcp_smallest_anon_port ndd /dev/tcp 32768 1024 Set on clients too.
tcp_slow_start_initial ndd /dev/tcp 1 2 Slightly faster transmission of small amounts of data.
tcp_xmit_hiwat ndd /dev/tcp 8129 32768 Size of transmit buffer.
tcp_recv_hiwat ndd /dev/tcp 8129 32768 Size of receive buffer.
tcp_conn_hash_size ndd /dev/tcp 512 8192 Size of connection hash table. See Sizing the Connection Hash Table.

 

Sizing the Connection Hash Table

The connection hash table keeps all the information for active TCP connections. Use the following command to get the size of the connection hash table:

ndd -get /dev/tcp tcp_conn_hash

This value does not limit the number of connections, but it can cause connection hashing to take longer. The default size is 512.

To make lookups more efficient, set the value to half of the number of concurrent TCP connections that are expected on the server. You can set this value only in /etc/system, and it becomes effective at boot time.
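For example, to size the table for roughly 16384 expected concurrent connections, add the following line to /etc/system (the same setting appears in the x86 file descriptor settings later in this chapter):

set tcp:tcp_conn_hash_size=8192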

Use the following command to get the current number of TCP connections.

netstat -nP tcp|wc -l

File Descriptor Setting

On the Solaris OS, setting the maximum number of open files property using ulimit has the biggest impact on efforts to support the maximum number of RMI/IIOP clients.

To increase the hard limit, add the following command to /etc/system and reboot once:

set rlim_fd_max = 8192

Verify this hard limit by using the following command:

ulimit -a -H

Once the above hard limit is set, increase the value of this property explicitly (up to this limit) using the following command:

ulimit -n 8192

Verify this limit by using the following command:

ulimit -a

For example, with the default ulimit of 64, a simple test driver can support only 25 concurrent clients, but with ulimit set to 8192, the same test driver can support 120 concurrent clients. The test driver spawned multiple threads, each of which performed a JNDI lookup and repeatedly called the same business method with a think (delay) time of 500 ms between business method calls, exchanging data of about 100 KB. These settings apply to RMI/IIOP clients on the Solaris OS.

Tuning for Solaris on x86

The following are some options to consider when tuning Solaris on x86 for GlassFish Server:

  • File Descriptors
  • IP Stack Settings

Some of the values depend on the system resources available. After making any changes to /etc/system, reboot the machines.

File Descriptors

Add (or edit) the following lines in the /etc/system file:

set rlim_fd_max=65536
set rlim_fd_cur=65536
set sq_max_size=0
set tcp:tcp_conn_hash_size=8192
set autoup=60
set pcisch:pci_stream_buf_enable=0

These settings affect the file descriptors.

IP Stack Settings

Add (or edit) the following lines in the /etc/system file:

set ip:tcp_squeue_wput=1
set ip:tcp_squeue_close=1
set ip:ip_squeue_bind=1
set ip:ip_squeue_worker_wait=10
set ip:ip_squeue_profile=0

These settings tune the IP stack.

To preserve the changes to the file between system reboots, place the following changes to the default TCP variables in a startup script that gets executed when the system reboots:

ndd -set /dev/tcp tcp_time_wait_interval 60000
ndd -set /dev/tcp tcp_conn_req_max_q 16384
ndd -set /dev/tcp tcp_conn_req_max_q0 16384
ndd -set /dev/tcp tcp_ip_abort_interval 60000
ndd -set /dev/tcp tcp_keepalive_interval 7200000
ndd -set /dev/tcp tcp_rexmit_interval_initial 4000
ndd -set /dev/tcp tcp_rexmit_interval_min 3000
ndd -set /dev/tcp tcp_rexmit_interval_max 10000
ndd -set /dev/tcp tcp_smallest_anon_port 32768
ndd -set /dev/tcp tcp_slow_start_initial 2
ndd -set /dev/tcp tcp_xmit_hiwat 32768
ndd -set /dev/tcp tcp_recv_hiwat 32768

Tuning for Linux platforms

To tune for maximum performance on Linux, you need to make adjustments to the following:

  • Startup Files
  • File Descriptors
  • Virtual Memory
  • Network Interface
  • Disk I/O Settings
  • TCP/IP Settings

Startup Files

The following parameters must be added to the /etc/rc.d/rc.local file that gets executed during system startup.

<-- begin
#max file count updated ~256 descriptors per 4Mb.
#Specify number of file descriptors based on the amount of system RAM.
echo "65536" > /proc/sys/fs/file-max
#inode-max 3-4 times the file-max
#file not present!!!!!
#echo "262144" > /proc/sys/fs/inode-max
#make more local ports available
echo 1024 25000 > /proc/sys/net/ipv4/ip_local_port_range
#increase the memory available with socket buffers
echo 262143 > /proc/sys/net/core/rmem_max
echo 262143 > /proc/sys/net/core/rmem_default
#above configuration for 2.4.X kernels
echo 4096 131072 262143 > /proc/sys/net/ipv4/tcp_rmem
echo 4096 131072 262143 > /proc/sys/net/ipv4/tcp_wmem
#disable "RFC2018 TCP Selective Acknowledgements" and "RFC1323 TCP timestamps"
echo 0 > /proc/sys/net/ipv4/tcp_sack
echo 0 > /proc/sys/net/ipv4/tcp_timestamps
#double maximum amount of memory allocated to shm at runtime
echo "67108864" > /proc/sys/kernel/shmmax
#improve virtual memory VM subsystem of the Linux
echo "100 1200 128 512 15 5000 500 1884 2" > /proc/sys/vm/bdflush
#we also do a sysctl
sysctl -p /etc/sysctl.conf
-- end -->

Additionally, create an /etc/sysctl.conf file and append the following values:

<-- begin
#Disables packet forwarding
net.ipv4.ip_forward = 0
#Enables source route verification
net.ipv4.conf.default.rp_filter = 1
#Disables the magic-sysrq key
kernel.sysrq = 0
fs.file-max = 65536
vm.bdflush = 100 1200 128 512 15 5000 500 1884 2
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_max = 262143
net.core.rmem_default = 262143
net.ipv4.tcp_rmem = 4096 131072 262143
net.ipv4.tcp_wmem = 4096 131072 262143
net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 0
kernel.shmmax = 67108864
-- end -->

File Descriptors

You may need to increase the number of file descriptors from the default. Having a higher number of file descriptors ensures that the server can open sockets under high load and not abort requests coming in from clients.

Start by checking system limits for file descriptors with this command:

cat /proc/sys/fs/file-max
8192

The current limit shown is 8192. To increase it to 65535, use the following command (as root):

echo "65535"> /proc/sys/fs/file-max

To make this value survive a system reboot, add it to /etc/sysctl.conf and specify the maximum number of open files permitted:

fs.file-max = 65535

Note that the parameter is not proc.sys.fs.file-max, as one might expect.

To list the available parameters that can be modified using sysctl:

sysctl -a

To load new values from the sysctl.conf file:

sysctl -p /etc/sysctl.conf
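To verify a value after loading it (a quick check; the output format is standard sysctl behavior):

$ /sbin/sysctl fs.file-max
fs.file-max = 65535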

To check and modify limits per shell, use the limit command (a C-shell built-in; under sh/bash, use ulimit -a):

limit

The output will look something like this:

cputime         unlimited
filesize        unlimited
datasize        unlimited
stacksize       8192 kbytes
coredumpsize    0 kbytes
memoryuse       unlimited
descriptors     1024
memorylocked    unlimited
maxproc         8146
openfiles       1024

The openfiles and descriptors entries show a limit of 1024. To increase the limit to 65535 for all users, edit /etc/security/limits.conf as root, and modify or add the nofile (number of open files) entries:

*         soft    nofile                     65535
*         hard    nofile                     65535

The character “*” is a wildcard that identifies all users. You could also specify a user ID instead, as in the example below.
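For example, to raise the limit only for a single, hypothetical user named oracle instead of all users:

oracle    soft    nofile                     65535
oracle    hard    nofile                     65535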

Then edit /etc/pam.d/login and add the line:

session required /lib/security/pam_limits.so

On Red Hat, you also need to edit /etc/pam.d/sshd and add the following line:

session required /lib/security/pam_limits.so

On many systems, this procedure will be sufficient. Log in as a regular user and try it before doing the remaining steps. The remaining steps might not be required, depending on how pluggable authentication modules (PAM) and secure shell (SSH) are configured.

Virtual Memory

To change virtual memory settings, add the following to /etc/rc.local:

echo "100 1200 128 512 15 5000 500 1884 2" > /proc/sys/vm/bdflush

For more information, view the man pages for bdflush.

Network Interface

To ensure that the network interface is operating in full duplex mode, add the following entry into /etc/rc.local:

mii-tool -F 100baseTx-FD eth0

where eth0 is the name of the network interface card (NIC).

Disk I/O Settings

 

To tune disk I/O performance for non-SCSI disks

  1. Test the disk speed.

Use this command:

/sbin/hdparm -t /dev/hdX

  2. Enable direct memory access (DMA).

Use this command:

/sbin/hdparm -d1 /dev/hdX

  3. Check the speed again using the hdparm command.

Given that DMA is not enabled by default, the transfer rate might have improved considerably. In order to do this at every reboot, add the /sbin/hdparm -d1 /dev/hdX line to /etc/conf.d/local.start, /etc/init.d/rc.local, or whatever the startup script is called.

For information on SCSI disks, see: System Tuning for Linux Servers — SCSI.

TCP/IP Settings

 

To tune the TCP/IP settings

  1. Add the following entries to /etc/rc.local:

echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
echo 60000 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 15000 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

  2. Add the following to /etc/sysctl.conf:

# Disables packet forwarding
net.ipv4.ip_forward = 0
# Enables source route verification
net.ipv4.conf.default.rp_filter = 1
# Disables the magic-sysrq key
kernel.sysrq = 0
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_max = 262140
net.core.rmem_default = 262140
net.ipv4.tcp_rmem = 4096 131072 262140
net.ipv4.tcp_wmem = 4096 131072 262140
net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_window_scaling = 0
net.ipv4.tcp_keepalive_time = 60000
net.ipv4.tcp_keepalive_intvl = 15000
net.ipv4.tcp_fin_timeout = 30

  3. Add the following as the last entry in /etc/rc.local:

sysctl -p /etc/sysctl.conf

  4. Reboot the system.

  5. Use this command to increase the size of the transmit buffer:

ndd -set /dev/tcp tcp_xmit_hiwat 32768

Tuning UltraSPARC CMT-Based Systems

Use a combination of tunable parameters and other parameters to tune UltraSPARC CMT-based systems. These values are an example of how you might tune your system to achieve the desired result.

Tuning Operating System and TCP Settings

The following table shows the operating system tuning for Solaris 10 used when benchmarking for performance and scalability on UltraSPARC CMT-based systems (64-bit systems).

Table 5-2 Tuning 64-bit Systems for Performance Benchmarking

Parameter Scope Default Value Tuned Value Comments
rlim_fd_max /etc/system 65536 260000 Process open file descriptors limit; should account for the expected load (for the associated sockets, files, pipes if any).
hires_tick /etc/system 1
sq_max_size /etc/system 2 0 Controls streams driver queue size; setting to 0 makes it infinite so the performance runs won’t be hit by lack of buffer space. Set on clients too. Note that setting sq_max_size to 0 might not be optimal for production systems with high network traffic.
ip:ip_squeue_bind 0
ip:ip_squeue_fanout 1
ipge:ipge_taskq_disable /etc/system 0
ipge:ipge_tx_ring_size /etc/system 2048
ipge:ipge_srv_fifo_depth /etc/system 2048
ipge:ipge_bcopy_thresh /etc/system 384
ipge:ipge_dvma_thresh /etc/system 384
ipge:ipge_tx_syncq /etc/system 1
tcp_conn_req_max_q ndd /dev/tcp 128 3000
tcp_conn_req_max_q0 ndd /dev/tcp 1024 3000
tcp_max_buf ndd /dev/tcp 4194304
tcp_cwnd_max ndd /dev/tcp 2097152
tcp_xmit_hiwat ndd /dev/tcp 8129 400000 To increase the transmit buffer.
tcp_recv_hiwat ndd /dev/tcp 8129 400000 To increase the receive buffer.

 

Note that the IPGE driver version is 1.25.25.

Disk Configuration

If HTTP access is logged, follow these guidelines for the disk:

  • Write access logs on faster disks or attached storage.
  • If running multiple instances, move the logs for each instance onto separate disks as much as possible.
  • Enable the disk read/write cache. Note that if you enable write cache on the disk, some writes might be lost if the disk fails.
  • Consider mounting the disks with the following options, which might yield better disk performance: nologging, directio, noatime.
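A hedged example of such a mount for a Solaris UFS log filesystem (the device path and mount point are hypothetical, and UFS spells the direct I/O option forcedirectio):

# mount -F ufs -o nologging,noatime,forcedirectio /dev/dsk/c1t1d0s6 /weblogs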

Network Configuration

If more than one network interface card is used, make sure the network interrupts are not all going to the same core. Run the following script to disable interrupts:

allpsr=`/usr/sbin/psrinfo | grep -v off-line | awk '{ print $1 }'`
   set $allpsr
   numpsr=$#
   while [ $numpsr -gt 0 ];
   do
       shift
       numpsr=`expr $numpsr - 1`
       tmp=1
       while [ $tmp -ne 4 ];
       do
           /usr/sbin/psradm -i $1
           shift
           numpsr=`expr $numpsr - 1`
           tmp=`expr $tmp + 1`
       done
   done

Put all network interfaces into a single group. For example:

$ ifconfig ipge0 group webserver
$ ifconfig ipge1 group webserver

Operating System Tuning for Oracle Database

This chapter describes how to tune Oracle Database. It contains the following sections:

  • Importance of Tuning
  • Operating System Tools
  • Tuning Memory Management
  • Tuning Disk I/O
  • Monitoring Disk Performance
  • System Global Area
  • Tuning the Operating System Buffer Cache

1.1 Importance of Tuning

Oracle Database is a highly optimizable software product. Frequent tuning optimizes system performance and prevents data bottlenecks.

Before tuning the database, you must observe its normal behavior by using the tools described in the “Operating System Tools” section.

1.2 Operating System Tools

Several operating system tools are available to enable you to assess database performance and determine database requirements. In addition to providing statistics for Oracle processes, these tools provide statistics for CPU usage, interrupts, swapping, paging, context switching, and I/O for the entire system.

This section provides information about the following common tools:

  • vmstat
  • sar
  • iostat
  • swap, swapinfo, swapon, or lsps
  • AIX Tools
  • HP-UX Tools
  • Linux Tools
  • Solaris Tools
  • Mac OS X Tools

See Also:

The operating system documentation and man pages for more information about these tools

1.2.1 vmstat

Note:

On Mac OS X, the vm_stat command displays virtual memory information. Refer to the vm_stat man page for more information about using this command.

Use the vmstat command to view process, virtual memory, disk, trap, and CPU activity, depending on the switches that you supply with the command. Run one of the following commands to display a summary of CPU activity six times, at five-second intervals:

  • On HP-UX and Solaris:

$ vmstat -S 5 6

  • On AIX, Linux, and Tru64 UNIX:

$ vmstat 5 6

The following is sample output of this command on HP-UX:

procs     memory            page            disk          faults      cpu
 r b w   swap  free  si  so pi po fr de sr f0 s0 s1 s3   in   sy   cs us sy id
 0 0 0   1892  5864   0   0  0  0  0  0  0  0  0  0  0   90   74   24  0  0 99
 0 0 0  85356  8372   0   0  0  0  0  0  0  0  0  0  0   46   25   21  0  0 100
 0 0 0  85356  8372   0   0  0  0  0  0  0  0  0  0  0   47   20   18  0  0 100
 0 0 0  85356  8372   0   0  0  0  0  0  0  0  0  0  2   53   22   20  0  0 100
 0 0 0  85356  8372   0   0  0  0  0  0  0  0  0  0  0   87   23   21  0  0 100
 0 0 0  85356  8372   0   0  0  0  0  0  0  0  0  0  0   48   41   23  0  0 100

The w subcolumn, under the procs column, shows the number of potential processes that have been swapped out and written to disk. If the value is not zero, then swapping occurs and the system is short of memory.

The si and so columns under the page column indicate the number of swap-ins and swap-outs per second, respectively. Swap-ins and swap-outs should always be zero.

The sr column under the page column indicates the scan rate. High scan rates are caused by a shortage of available memory.

The pi and po columns under the page column indicate the number of page-ins and page-outs per second, respectively. It is normal for the number of page-ins and page-outs to increase. Some paging always occurs even on systems with sufficient available memory.

Note:

The output from the vmstat command differs across platforms.

See Also:

Refer to the man page for information about interpreting the output

1.2.2 sar

Depending on the switches that you supply with the command, use the sar (system activity reporter) command to display cumulative activity counters in the operating system.

Note:

On Tru64 UNIX systems, the sar command is available in the UNIX SVID2 compatibility subset, OSFSVID.

On an HP-UX system, the following command displays a summary of I/O activity ten times, at ten-second intervals:

$ sar -b 10 10

The following example shows the output of this command:

13:32:45 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
13:32:55       0      14     100       3      10      69       0       0
13:33:05       0      12     100       4       4       5       0       0
13:33:15       0       1     100       0       0       0       0       0
13:33:25       0       1     100       0       0       0       0       0
13:33:35       0      17     100       5       6       7       0       0
13:33:45       0       1     100       0       0       0       0       0
13:33:55       0       9     100       2       8      80       0       0
13:34:05       0      10     100       4       4       5       0       0
13:34:15       0       7     100       2       2       0       0       0
13:34:25       0       0     100       0       0     100       0       0

Average        0       7     100       2       4      41       0       0

The sar output provides a snapshot of system I/O activity at a given point in time. If you specify the interval time with more than one option, then the output can become difficult to read. If you specify an interval time of less than 5, then the sar activity itself can affect the output.

See Also:

The man page for more information about sar

1.2.3 iostat

Use the iostat command to view terminal and disk activity, depending on the switches that you supply with the command. The output from the iostat command does not include disk request queues, but it shows which disks are busy. This information can be used to balance I/O loads.

The following command displays terminal and disk activity five times, at five-second intervals:

$ iostat 5 5

The following is sample output of the command on Solaris:

tty          fd0           sd0           sd1           sd3          cpu
 tin tout Kps tps serv  Kps tps serv  Kps tps serv  Kps tps serv  us sy wt id
   0    1   0   0    0    0   0   31    0   0   18    3   0   42   0  0  0 99
   0   16   0   0    0    0   0    0    0   0    0    1   0   14   0  0  0 100
   0   16   0   0    0    0   0    0    0   0    0    0   0    0   0  0  0 100
   0   16   0   0    0    0   0    0    0   0    0    0   0    0   0  0  0 100
   0   16   0   0    0    0   0    0    2   0   14   12   2   47   0  0  1 98

Use the iostat command to look for large disk request queues. A request queue shows how long the I/O requests on a particular disk device must wait to be serviced. Request queues are caused by a high volume of I/O requests to that disk or by I/O with long average seek times. Ideally, disk request queues should be at or near zero.

1.2.4 swap, swapinfo, swapon, or lsps

See Also:

“Determining Available and Used Swap Space” for information about swap space on Mac OS X systems

Use the swap, swapinfo, swapon, or lsps command to report information about swap space usage. A shortage of swap space can stop processes from responding, leading to process failures with Out of Memory errors. The following table lists the appropriate command to use for each platform.

Platform Command
AIX lsps -a
HP-UX swapinfo -m
Linux and Tru64 UNIX swapon -s
Solaris swap -l and swap -s

 

The following example shows sample output from the swap -l command on Solaris:

swapfile             dev        swaplo blocks        free
/dev/dsk/c0t3d0s1    32,25      8      197592        162136

1.2.5 AIX Tools

The following sections describe tools available on AIX systems.

  • Base Operating System Tools
  • Performance Toolbox
  • System Management Interface Tool

See Also:

The AIX operating system documentation and man pages for more information about these tools

1.2.5.1 Base Operating System Tools

The AIX Base Operating System (BOS) contains performance tools that are historically part of UNIX systems or are required to manage the implementation-specific features of AIX. The following table lists the most important BOS tools.

Tool Function
lsattr Displays the attributes of devices
lslv Displays information about a logical volume or the logical volume allocations of a physical volume
netstat Displays the contents of network-related data structures
nfsstat Displays statistics about Network File System (NFS) and Remote Procedure Call (RPC) activity
nice Changes the initial priority of a process
no Displays or sets network options
ps Displays the status of one or more processes
reorgvg Reorganizes the physical-partition allocation within a volume group
time Displays the elapsed execution, user CPU processing, and system CPU processing time
trace Records and reports selected system events
vmo Manages Virtual Memory Manager tunable parameters

 

1.2.5.2 Performance Toolbox

The AIX Performance Toolbox (PTX) contains tools for monitoring and tuning system activity locally and remotely. PTX consists of two main components, the PTX Manager and the PTX Agent. The PTX Manager collects and displays data from various systems in the configuration by using the xmperf utility. The PTX Agent collects and transmits data to the PTX Manager by using the xmserd daemon. The PTX Agent is also available as a separate product called Performance Aide for AIX.

Both PTX and Performance Aide include the monitoring and tuning tools listed in the following table.

Tool Description
fdpr Optimizes an executable program for a particular workload
filemon Uses the trace facility to monitor and report the activity of the file system
fileplace Displays the placement of blocks of a file within logical or physical volumes
lockstat Displays statistics about contention for kernel locks
lvedit Facilitates interactive placement of logical volumes within a volume group
netpmon Uses the trace facility to report on network I/O and network-related CPU usage
rmss Simulates systems with various memory sizes for performance testing
svmon Captures and analyzes information about virtual-memory usage
syscalls Records and counts system calls
tprof Uses the trace facility to report CPU usage at module and source-code-statement levels
BigFoot Reports the memory access patterns of processes
stem Permits subroutine-level entry and exit instrumentation of existing executables

 

See Also:

  • Performance Toolbox for AIX Guide and Reference for information about these tools
  • AIX 5L Performance Management Guide for information about the syntax of some of these tools

1.2.5.3 System Management Interface Tool

The AIX System Management Interface Tool (SMIT) provides a menu-driven interface to various system administrative and performance tools. By using SMIT, you can navigate through large numbers of tools and focus on the jobs that you want to perform.

1.2.6 HP-UX Tools

The following performance analysis tools are available on HP-UX systems:

  • GlancePlus/UX

This HP-UX utility is an online diagnostic tool that measures the activities of the system. GlancePlus displays information about how system resources are used. It displays dynamic information about the system I/O, CPU, and memory usage on a series of screens. You can use the utility to monitor how individual processes are using resources.

  • HP PAK

HP Programmer’s Analysis Kit (HP PAK) consists of the following tools:

  • Puma

This tool collects performance statistics during a program run. It provides several graphical displays for viewing and analyzing the collected statistics.

  • Thread Trace Visualizer (TTV)

This tool displays trace files produced by the instrumented thread library, libpthread_tr.sl, in a graphical format. It enables you to view how threads are interacting and to find where threads are blocked waiting for resources.

HP PAK is bundled with the HP Fortran 77, HP Fortran 90, HP C, HP C++, HP ANSI C++, and HP Pascal compilers.

The following table lists the performance tuning tools that you can use for additional performance tuning on HP-UX.

Tools Function
caliper (Itanium only) Collects run-time application data for system analysis tasks such as cache misses, translation look-aside buffer (TLB) or instruction cycles, along with fast dynamic instrumentation. It is a dynamic performance measurement tool for C, C++, Fortran, and assembly applications.
gprof Creates an execution profile for programs.
monitor Monitors the program counter and calls to certain functions.
netfmt Monitors the network.
netstat Reports statistics on network performance.
nfsstat Displays statistics about Network File System (NFS) and Remote Procedure Call (RPC) activity.
nettl Captures network events or packets by logging and tracing.
prof Creates an execution profile of C programs and displays performance statistics for your program, showing where your program is spending most of its execution time.
profil Copies program counter information into a buffer.
top Displays the top processes on the system and periodically updates the information.

 

1.2.7 Linux Tools

On Linux systems, use the top, free, and cat /proc/meminfo commands to view information about swap space, memory, and buffer usage.

1.2.8 Solaris Tools

On Solaris systems, use the mpstat command to view statistics for each processor in a multiprocessor system. Each row of the table represents the activity of one processor. The first row summarizes all activity since the last system restart. Each subsequent row summarizes activity for the preceding interval. All values are events per second unless otherwise noted. The arguments are for time intervals between statistics and number of iterations.

The following example shows sample output from the mpstat command:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    1    71   21   23    0    0    0    0    55    0   0   0  99
  2    0   0    1    71   21   22    0    0    0    0    54    0   0   0  99
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0    61   16   25    0    0    0    0    57    0   0   0 100
  2    1   0    0    72   16   24    0    0    0    0    59    0   0   0 100

1.2.9 Mac OS X Tools

You can use the following additional performance tuning tools:

  • Use the top command to display information about running processes and memory usage.
  • Use the Apple Computer Hardware Understanding Developer (CHUD) tools, such as Shark and BigTop, to monitor system activity and tune applications.

See Also:

For more information about the CHUD tools, refer to

http://developer.apple.com/library/mac/#documentation/Performance/Conceptual/PerformanceOverview/Introduction/Introduction.html

1.3 Tuning Memory Management

Start the memory tuning process by measuring paging and swapping space to determine how much memory is available. After you determine your system memory usage, tune the Oracle buffer cache.

The Oracle buffer manager ensures that the most frequently accessed data is cached longer. If you monitor the buffer manager and tune the buffer cache, then you can significantly improve Oracle Database performance. The optimal Oracle Database buffer size for your system depends on the overall system load and the relative priority of Oracle Database over other applications.

This section includes the following topics:

  • Allocating Sufficient Swap Space
  • Controlling Paging
  • Adjusting Oracle Block Size

1.3.1 Allocating Sufficient Swap Space

Try to minimize swapping because it causes significant operating system overhead. To check for swapping, use the sar or vmstat commands. For information about the appropriate options to use with these commands, refer to the man pages.

If your system is swapping and you must conserve memory, then:

  • Avoid running unnecessary system daemon processes or application processes.
  • Decrease the number of database buffers to free some memory.
  • Decrease the number of operating system file buffers, especially if you are using raw devices.

Note:

On Mac OS X systems, swap space is allocated dynamically. If the operating system requires more swap space, then it creates additional swap files in the /private/var/vm directory. Ensure that the file system that contains this directory has sufficient free disk space to accommodate additional swap files. Refer to “Determining Available and Used Swap Space” for more information on allocating swap space.

To determine the amount of swap space, run one of the following commands, depending on your platform:

Platform Command
AIX lsps -a
HP-UX swapinfo -m
Linux swapon -s
Solaris swap -l and swap -s
Tru64 UNIX swapon -s

 

To add swap space to your system, run one of the following commands, depending on your platform:

Platform Command
AIX chps or mkps
HP-UX swapon
Linux swapon -a
Solaris swap -a
Tru64 UNIX swapon -a

 

Set the swap space to between two and four times the physical memory. Monitor the use of swap space, and increase it as required.

See Also:

The operating system documentation for more information about these commands

1.3.2 Controlling Paging

Paging may not present as serious a problem as swapping, because an entire program does not have to be stored in memory to run. A small number of page-outs may not noticeably affect the performance of your system.

To detect excessive paging, run measurements during periods of fast response or idle time to compare against measurements from periods of slow response.

Use the vmstat (vm_stat on Mac OS X) or sar command to monitor paging.

See Also:

The man pages or your operating system documentation for information about interpreting the results for your platform

The following table lists the important columns from the output of these commands.

Platform Column Function
Solaris vflt/s Indicates the number of address translation page faults. Address translation faults occur when a process refers to a valid page not in memory.
Solaris rclm/s Indicates the number of valid pages that have been reclaimed and added to the free list by page-out activity. This value should be zero.
HP-UX at Indicates the number of address translation page faults. Address translation faults occur when a process refers to a valid page not in memory.
HP-UX re Indicates the number of valid pages that have been reclaimed and added to the free list by page-out activity. This value should be zero.

 

If your system consistently has excessive page-out activity, then consider the following solutions:

  • Install more memory.
  • Move some of the work to another system.
  • Configure the System Global Area (SGA) to use less memory.

1.3.3 Adjusting Oracle Block Size

During read operations, entire operating system blocks are read from the disk. If the database block size is smaller than the operating system file system block size, then I/O bandwidth is inefficient. If you set Oracle Database block size to be a multiple of the file system block size, then you can increase performance by up to 5 percent.

The DB_BLOCK_SIZE initialization parameter sets the database block size. However, to change the value of this parameter, you must re-create the database.

To see the current value of the DB_BLOCK_SIZE parameter, run the SHOW PARAMETER DB_BLOCK_SIZE command in SQL*Plus.
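A hedged SQL*Plus sketch (the value shown is only illustrative):

SQL> SHOW PARAMETER DB_BLOCK_SIZE

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
db_block_size                        integer     8192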

1.4 Tuning Disk I/O

Balance I/O evenly across all available disks to reduce disk access times. For smaller databases and those not using RAID, ensure that different data files and tablespaces are distributed across the available disks.

1.4.1 Using Automatic Storage Management

If you choose to use Automatic Storage Management for database storage, then all database I/O is balanced across all available disk devices in the Automatic Storage Management disk group. Automatic Storage Management provides the performance of raw device I/O without the inconvenience of managing raw devices.

By using Automatic Storage Management, you avoid manually tuning disk I/O.

1.4.2 Choosing the Appropriate File System Type

Depending on your operating system, you can choose from a range of file system types. Each file system type has different characteristics. This fact can have a substantial impact on database performance. The following table lists common file system types.

File System Platform Description
S5 HP-UX and Solaris UNIX System V file system
UFS AIX, HP-UX, Mac OS X, Solaris, Tru64 UNIX Unified file system, derived from BSD UNIX. Note: On Mac OS X, Oracle does not recommend the use of the UFS file system for either software or database files.
VxFS AIX, HP-UX, and Solaris VERITAS file system
None All Raw devices (no file system)
ext2/ext3 Linux Extended file system for Linux
OCFS Linux Oracle cluster file system
AdvFS Tru64 UNIX Advanced file system
CFS Tru64 UNIX Cluster file system
JFS/JFS2 AIX Journaled file system
HFS Plus, HFSX Mac OS X HFS Plus is the standard hierarchical file system used by Mac OS X. HFSX is an extension to HFS Plus that enables case-sensitive file names.
GPFS AIX General parallel file system

 

The suitability of a file system for an application is usually not documented. For example, even different implementations of the Unified file system are hard to compare. Depending on the file system that you choose, performance differences can be up to 20 percent. If you choose to use a file system, then:

  • Make a new file system partition to ensure that the hard disk is clean and unfragmented.
  • Perform a file system check on the partition before using it for database files.
  • Distribute disk I/O as evenly as possible.
  • If you are not using a logical volume manager or a RAID device, then consider placing log files on a different file system from data files.

1.5 Monitoring Disk Performance

The following sections describe the procedure for monitoring disk performance.

Monitoring Disk Performance on Mac OS X

Use the iostat and sar commands to monitor disk performance. For more information about using these commands, refer to the man pages.

Monitoring Disk Performance on Other Operating Systems

To monitor disk performance, use the sar -b and sar -u commands.

The following table describes the columns of the sar -b command output that are significant for analyzing disk performance.

Column              Description
------              -----------
bread/s, bwrit/s    Blocks read and blocks written per second (important for file system databases)
pread/s, pwrit/s    Reads and writes per second from or to raw character devices

An important sar -u column for analyzing disk performance is %wio, the percentage of CPU time spent waiting on blocked I/O.

Note:

Not all Linux distributions display the %wio column in the output of the sar -u command. For detailed I/O statistics, you can use the iostat -x command.
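For example, to take three 5-second samples (the interval and count here are arbitrary):

$ sar -b 5 3      # buffer activity, including bread/s and bwrit/s
$ sar -u 5 3      # CPU utilization, including %wio where supported
$ iostat -x 5 3   # extended per-device statistics on Linux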

Key indicators are:

  • The sum of the bread, bwrit, pread, and pwrit column values indicates the level of activity of the disk I/O subsystem. The higher the sum, the busier the I/O subsystem. The larger the number of physical drives, the higher the sum threshold number can be. A good default value is no more than 40 for 2 drives and no more than 60 for 4 to 8 drives.
  • The %rcache column value should be greater than 90 and the %wcache column value should be greater than 60. Otherwise, the system may be disk I/O bound.
  • If the %wio column value is consistently greater than 20, then the system is I/O bound.

1.6 System Global Area

The SGA is the Oracle structure that is located in shared memory. It contains static data structures, locks, and data buffers. Sufficient shared memory must be available to each Oracle process to address the entire SGA.

The maximum size of a single shared memory segment is specified by the shmmax (shm_max on Tru64 UNIX) kernel parameter.

The following table shows the recommended value for this parameter, depending on your platform.

Platform                  Recommended Value
--------                  -----------------
AIX                       NA
HP-UX                     The size of the physical memory installed on the system.
                          See Also: HP-UX Shared Memory Segments for an Oracle Instance
                          for information about the shmmax parameter on HP-UX.
Linux                     Half the size of the physical memory installed on the system
Mac OS X                  Half the size of the physical memory installed on the system
Solaris and Tru64 UNIX    4294967295 or 4 GB minus 16 MB.
                          Note: The value of the shm_max parameter must be at least 16 MB
                          for the Oracle Database instance to start. If your system runs
                          both Oracle9i Database and Oracle Database 10g instances, set
                          this parameter to 2 GB minus 16 MB. On Solaris, this value can
                          be greater than 4 GB on 64-bit systems.

If the size of the SGA exceeds the maximum size of a shared memory segment (shmmax or shm_max), then Oracle Database attempts to attach more contiguous segments to fulfill the requested SGA size. The shmseg kernel parameter (shm_seg on Tru64 UNIX) specifies the maximum number of segments that can be attached by any process. Set the following initialization parameters to control the size of the SGA:

  • DB_CACHE_SIZE
  • DB_BLOCK_SIZE
  • JAVA_POOL_SIZE
  • LARGE_POOL_SIZE
  • LOG_BUFFER
  • SHARED_POOL_SIZE

Alternatively, set the SGA_TARGET initialization parameter to enable automatic tuning of the SGA size.
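For example, a minimal parameter file fragment that enables automatic SGA tuning might look like the following sketch (the sizes are placeholders, not recommendations):

sga_target=1G          # total SGA size, tuned automatically
sga_max_size=1G        # hard upper limit for the SGA
shared_pool_size=0     # 0 lets SGA_TARGET size this component
db_cache_size=0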

Use caution when setting values for these parameters. When values are set too high, too much of the physical memory is devoted to shared memory. This results in poor performance.

An Oracle Database configured with Shared Server requires a higher setting for the SHARED_POOL_SIZE initialization parameter, or a custom configuration that uses the LARGE_POOL_SIZE initialization parameter. If you installed the database with Oracle Universal Installer, then the value of the SHARED_POOL_SIZE parameter is set automatically by Oracle Database Configuration Assistant. However, if you created a database manually, then increase the value of the SHARED_POOL_SIZE parameter in the parameter file by 1 KB for each concurrent user.

1.6.1 Determining the Size of the SGA

You can determine the SGA size in one of the following ways:

  • Run the following SQL*Plus command to display the size of the SGA for a running database:

    SQL> SHOW SGA

    The result is shown in bytes.

  • When you start your database instance, the size of the SGA is displayed next to the Total System Global Area heading.
  • On systems other than Mac OS X, run the ipcs command as the oracle user.
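For example, the following command lists the shared memory segments; look for the segments owned by the oracle user (the SGA is usually the largest):

$ ipcs -m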

1.6.2 Shared Memory on AIX

Note:

The information in this section applies only to AIX.

Shared memory uses common virtual memory resources across processes. Processes share virtual memory segments through a common set of virtual memory translation resources, for example, tables and cached entries, for improved performance.

Shared memory can be pinned to prevent paging and to reduce I/O overhead. To do this, set the LOCK_SGA parameter to true. On AIX 5L, the same parameter activates the large page feature whenever the underlying hardware supports it.

Run the following command to make pinned memory available to Oracle Database:

$ /usr/sbin/vmo -r -o v_pinshm=1

Run a command similar to the following to set the maximum percentage of real memory available for pinned memory, where percent_of_real_memory is the maximum percent of real memory that you want to set:

$ /usr/sbin/vmo -r -o maxpin%=percent_of_real_memory

When using the maxpin% option, set the value so that the amount of memory that can be pinned exceeds the Oracle SGA size by at least 3 percent of the real memory on the system, leaving free pinnable memory for use by the kernel. For example, if you have 2 GB of physical memory and you want to pin a 400 MB SGA (20 percent of the RAM), then run the following command:

$ /usr/sbin/vmo -r -o maxpin%=23

Use the svmon command to monitor the use of pinned memory during the operation of the system. Oracle Database attempts to pin memory only if the LOCK_SGA parameter is set to true.
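For example, the global report from svmon includes pin statistics showing how much memory is currently pinned (the exact layout varies by AIX release):

$ svmon -G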

Large Page Feature on AIX POWER4- and POWER5-Based Systems

To turn on and reserve 10 large pages, each 16 MB in size, on a POWER4 or POWER5 system, run the following command:

$ /usr/sbin/vmo -r -o lgpg_regions=10 -o lgpg_size=16777216

This command prompts you to run bosboot and warns that a restart is required for the changes to take effect.

Oracle recommends specifying enough large pages to contain the entire SGA. The Oracle Database instance attempts to allocate large pages when the LOCK_SGA parameter is set to true. If the SGA size exceeds the size of memory available for pinning, or large pages, then the portion of the SGA exceeding these sizes is allocated to ordinary shared memory.
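After the restart, you can confirm that the large page pool was configured by listing the related tunables; a small sketch:

$ /usr/sbin/vmo -a | grep lgpg

This should report the lgpg_regions and lgpg_size values set earlier.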

See Also:

The AIX documentation for more information about enabling and tuning pinned memory and large pages

1.7 Tuning the Operating System Buffer Cache

To take full advantage of raw devices, adjust the size of Oracle Database buffer cache. If memory is limited, then adjust the operating system buffer cache.

The operating system buffer cache holds blocks of data in memory while they are being transferred from memory to disk, or from disk to memory.

Oracle Database buffer cache is the area in memory that stores Oracle Database buffers. When Oracle Database uses raw devices, it does not use the operating system buffer cache.

If you use raw devices, then increase the size of Oracle Database buffer cache. If the amount of memory on the system is limited, then make a corresponding decrease in the operating system buffer cache size.

Use the sar command to determine which buffer caches you must increase or decrease.

See Also:

The man page on Tru64 UNIX for more information about the sar command

Note:

On Tru64 UNIX, do not reduce the operating system buffer cache, because the operating system automatically resizes the amount of memory that it requires for buffering file system I/O. Restricting the operating system buffer cache can cause performance issues.

Boot the IBM AIX system into Service mode

This document describes how to boot the system into Service mode (also known as Maintenance mode) to install the machine, restore an operating system backup, or perform maintenance on the rootvg volume group.

The information in this document applies to AIX Versions 3.x, 4.x and 5.x.

Booting microchannel systems into Service mode
Booting PCI-based systems into Service mode
PCI machine-specific information
Accessing rootvg and mounting file systems
Related documentation

——————————————————————————–

Booting microchannel systems into Service mode

To boot microchannel systems into Service mode, turn the key to the Maintenance position and press the yellow reset button twice. You must boot from bootable media of the correct level for the machine, such as an installation CD-ROM, an installation tape, or a bootable backup tape made with the mksysb command or the Sysback product.

For AIX Version 3.2, you may use bootable bosboot diskettes. To boot from these, insert the first bosboot diskette into the diskette drive. When you see LED c07, insert the next diskette, which is usually the display extensions diskette. After this diskette is read, you should receive a menu prompting you for the installation diskette.

For information on accessing your rootvg volume group, see the section entitled "Accessing rootvg and mounting file systems".

The preceding discussion assumes that the Service mode bootlist has not been modified from the default bootlist. If the bootlist has been modified, it must be reset so that one of the boot media types from the preceding selections comes before the standard boot media, such as the hard disk.
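On AIX 4 and later, you can display and reset the Service mode bootlist from the running system with the bootlist command. A minimal sketch, assuming typical device names (cd0, rmt0, hdisk0):

# bootlist -m service -o
# bootlist -m service cd0 rmt0 hdisk0

The first command displays the current Service mode bootlist; the second puts the CD-ROM and tape drive ahead of the hard disk.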

If the machine is an SMP model (7012-Gxx, 7013-Jxx, and 7015-Rxx) and the Autoservice IPL flag is disabled, then a menu like the following will display when it is booting in Service mode:

MAINTENANCE MENU (Rev. 04.03)
0> DISPLAY CONFIGURATION
1> DISPLAY BUMP ERROR LOG
2> ENABLE SERVICE CONSOLE
3> DISABLE SERVICE CONSOLE
4> RESET
5> POWER OFF
6> SYSTEM BOOT
7> OFF-LINE TESTS
8> SET PARAMETERS
9> SET NATIONAL LANGUAGE
SELECT:

You can boot these machines into Service mode, or even Normal mode, with the Fast IPL flag set. If you do not, the machine can take anywhere from 30 to 60 minutes to boot. There are a few ways to set the Fast IPL flag on these machines.

NOTE: The console must be an ASCII type connected to the S1 port of the system. Graphics monitors will not work.

Use the following instructions to boot SMP machines into Service mode with the Fast IPL flag set.

1.  Insert bootable media of the same OS level (for example, a mksysb tape or an installation CD-ROM).
2.  Turn off the machine by pressing the white button on the front.
3.  Turn the key to the Wrench (Service) position. The LCD should read STAND-BY.
4.  Press the Enter key on the console. A greater-than prompt (>) should display on the monitor.
5.  Type sbb followed by the Enter key. The Stand By Menu should now display.
6.  Select 1 Set Flags. This takes you to another set of menus.
7.  Select 6 Fast IPL. This should change to enabled after it is selected.
8.  Enter x to exit the second set of menus, then x again to exit the first menu.
9.  At a blank screen, press the Enter key to obtain the greater-than prompt (>).
10. Type power followed by the Enter key.
11. Turn the machine back on. It should start to boot. A prompt may display asking if you want to update the firmware. Do not respond; let it continue.
12. You may now be at the Maintenance Menu with 10 options displayed, 0 through 9. If so, select option 6, SYSTEM BOOT. This takes you to another menu; select option 0, Boot from the list.
13. The Standard Maintenance Menu should display. System recovery and maintenance can be completed from here.
14. After system recovery and maintenance have been performed, the system is ready to be rebooted into Normal mode. Enter the command mpcfg -cf 11 1 at the command line prompt, then press Enter. This sets the Fast IPL flag.
15. Turn the key back to the OK (Normal) position.
16. Enter shutdown -Fr, followed by the Enter key.


——————————————————————————–

Booting PCI-based systems into Service mode

When booting a PowerPC into Service mode, cd0 or rmt0 must be before the hdisk in the bootlist. If not, change the bootlist at boot time. On some models, you can set the machine to use a default bootlist that includes both cd0 and rmt0. If a bootable CD or tape is in the CD-ROM or tape drive, the machine will boot from this device.

For most of the newer PCI-based models, selecting the default bootlist, with a bootable tape or CD loaded in the machine, causes the system to automatically boot from that device. Generally, the next menu on the screen asks the administrator to define the system console.

For all machines discussed here, if you are using a graphical terminal, you will use a function key such as F5. If you are using an ASCII terminal, use an equivalent number key such as 5. Use the numbers across the top of the keyboard, not the numbers on the numeric keypad. On ASCII terminals, the icons may not be displayed on the screen; the number can be pressed between the second and third beeps, the second beep being a series of three clicks.


——————————————————————————–

PCI machine-specific information
The following systems all use the F5 or 5 key to read from the default boot list, which is written into the system firmware:

MODEL     7017      7024      7025      7026      7043       7137
-------   -------   -------   -------   -------   -------    -------
TYPE      S70       E20       F30       H10       43P-140    F3L
          S7A       E30       F40       H50       43P-150
          S80                 F50       H70       43P-240
                                        B80       43P-260

On these machines, use 5 (on the keyboard, not the keypad) if you are using an ASCII terminal. On a locally attached graphics console, use the F5 function key. The F5 or 5 key must be pressed just after the keyboard icon or message is displayed on the console. If you have a 7026-M80, 7026-H80, or 7025-F80, then the 5 key is the default whether you have an ASCII or a graphics console.

The following systems use the F1 key to enter System Management Services mode (SMS):

MODEL     6040      7042      7247      7249
-------   -------   -------   -------   -------
TYPE      620       850       82x       860

You should be in an Easy-Setup menu. Select the Start Up menu. Clear the current bootlist settings and then select the CD-ROM for choice 1 and hdd (the hard disk) for choice 2. Select OK. Insert the CD-ROM and select the EXIT icon. The machine should now boot from the CD-ROM.

The following systems use the F2 key to enter SMS:

MODEL     6015      6050      6070      7020      7248
-------   -------   -------   -------   -------   -------
TYPE      440       830       850       40P       43P

Select Select Boot Device from the initial menu on the screen, and then select Restore Default Settings from the list. Press the Esc key to exit all the menus, and then reboot the machine. The system should boot from your bootable media.

For information on accessing the rootvg volume group, see the next section in this document.

——————————————————————————–

Accessing rootvg and mounting file systems
For AIX Version 3, choose the limited function maintenance shell (option 5 for AIX 3.1, option 4 for AIX 3.2).

If you only have one disk on the system, then hdisk0 will be used in the execution of the getrootfs or /etc/continue commands, which follow. If you have more than one disk, determine which disk contains the boot logical volume in this manner:

AIX 3.2.4 or AIX 3.2.5:

Run getrootfs; the output will indicate which disk contains the hd5 logical volume.

AIX 3.1 to AIX 3.2.3e:

Run lqueryvg -Ltp hdisk# for each hdisk. You can obtain a listing of these with the command lsdev -Cc disk. Repeat this command until you get output similar to the following:

00005264feb3631c.2  hd5  1

If more than one disk contains this output, use any disk when running getrootfs.

Now, access the rootvg volume group by running one of the following commands, using the disk you obtained in the preceding step:

AIX 3.1:                     /etc/continue hdisk#
AIX 3.2.0-3.2.3e:            getrootfs -f hdisk#
AIX 3.2.4-3.2.5:             getrootfs hdisk#

NOTE: If you want to leave the primary OS file systems (/, /usr, /tmp, and /var) unmounted after this command has completed, to run fsck, for instance, place a space and the letters sh after the hdisk in the preceding command. For example:

getrootfs hdisk0 sh
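For example, with the file systems still unmounted you can check the primary logical volumes before mounting them; a sketch (on a standard install, hd4 holds / and hd2 holds /usr):

# fsck -y /dev/hd4
# fsck -y /dev/hd2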

For AIX Versions 4 and 5, choose Start Maintenance Mode for System Recovery, option 3. The next screen is called Maintenance; select option 1, Access a Root Volume Group. At the next screen, type 0 to continue, and select the appropriate volume group by typing the number next to it. A screen like the following will display.
Example:

Access a Root Volume Group

Type the number for a volume group to display the logical volume information and press Enter.

1)  Volume Group 0073656f2608e46a contains these disks:
hdisk0  2063 04-C0-00-4,0

Once a volume group has been selected, information will be displayed about that volume group.

Example:

Volume Group Information
——————————————————————————
Volume Group ID 0073656f2608e46a includes the following logical volumes:
hd6         hd5         hd8         hd4         hd2      hd9var
hd3         hd1
——————————————————————————

Type the number of your choice and press Enter.

1) Access this Volume Group and start a shell
2) Access this Volume Group and start a shell before mounting filesystems
99) Previous Menu

If the logical volumes listed do not include logical volumes like hd4, hd2, hd3, and so on, you may have selected the wrong volume group. Press 99 to back up one screen and select again.

Now you may select one of two options: Access this Volume Group and start a shell, option 1, or Access this Volume Group and start a shell before mounting filesystems, option 2. Option 2 allows you to perform file system maintenance on /, /usr, /tmp, and /var before mounting them.

NOTE: If you intend to use SMIT or vi, set your terminal type in preparation for editing the file. xx stands for a terminal type such as lft, ibm3151, or vt100.

TERM=<xx>
export TERM

Errors from these steps may indicate failed or corrupt disks in rootvg. These problems should be corrected. For additional assistance, contact your vendor, your local branch office, or your AIX support center.

——————————————————————————–

Related documentation
For more in-depth coverage of this subject, the following IBM publications are recommended:
AIX Version 4.3 System Management Guide: Operating System and Devices
AIX Version 5.1 System Management Guide: Operating System and Devices

IBM documentation can also be accessed online through the following URL:
http://www.rs6000.ibm.com/resource/aix_resource/Pubs/index.html

Similar documents can be accessed through the following URL:
http://techsupport.services.ibm.com/server/support?view=pSeries

Add a RAM File System in Aix

Create a RAM disk of 10 MB

# mkramdisk 10M

/dev/rramdisk0

Create a JFS File System on this RAM disk

# mkfs -V jfs /dev/rramdisk0

mkfs: destroy /dev/rramdisk0 (yes)? y

Create Mountpoint

# mkdir /ramdisk

Mount the RAM File System

# mount -V jfs -o nointegrity /dev/ramdisk0 /ramdisk

The mkramdisk command creates file systems directly in memory, which is useful for applications that create many temporary files. Use a RAM disk only for data that can be lost: the RAM disk is destroyed at each reboot, and the file system must be re-created afterwards.
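When the RAM disk is no longer needed, it can be unmounted and released; a short sketch:

# umount /ramdisk
# rmramdisk ramdisk0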

Collecting Unix System Information

At the very least, collect the following information for each system that you have:

 1. Hostname:
    % hostname
 2. Hostname aliases:
    % grep `hostname` /etc/hosts | awk '{ print $3 }'
 3. Host network addresses:
    % grep `hostname` /etc/hosts | awk '{ print $1 }'
 4. Host ID:
    % hostid
 5. System serial number:
    On the back of most computers.
 6. Manufacturer of the system's hardware:
    On the front of most computers.
 7. System model name:
    On the front of most computers.
 8. CPU type:
    % uname -a
 9. Application architecture:
    % uname -a
10. Kernel architecture:
    % uname -a
11. Amount of main memory:
    Reported at boot time; check with:
    % dmesg
12. Operating system name:
    % uname -a
13. Operating system version:
    % uname -a
14. Kernel version:
    % uname -a
15. Disk configuration:
    % df
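A minimal collection script along these lines might look like the following sketch (commands vary slightly by platform, and dmesg does not report memory uniformly):

#!/bin/sh
# Sketch: gather the basic identity of one system in a single pass
hostname                        # hostname
hostid                          # host ID
grep `hostname` /etc/hosts      # hostname aliases and network addresses
uname -a                        # OS name/version, kernel, CPU and architecture
df                              # disk configuration
dmesg | grep -i mem             # main memory, where boot messages are retained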

Solaris Cleaning up the Operating System device tree after removing LUNs

To clean up the device tree after you remove LUNs

  1. The removed devices show up as drive not available in the output of the format command:
    413. c3t5006016841e02f0Cd252 <drive not available>
            /pci@1d,700000/SUNW,qlc@1,1/fp@0,0/ssd@w5006016841e02f0c,fc
  2. After the LUNs are unmapped using the array management software or the command line, Solaris displays the devices as either unusable or failing.
    bash-3.00# cfgadm -al -o show_SCSI_LUN | grep -i unusable
    
    c6::5006016141e02f08,0         disk         connected    configured   unusable
    c6::5006016141e02f08,1         disk         connected    configured   unusable
    c6::5006016141e02f08,2         disk         connected    configured   unusable
    c6::5006016141e02f08,3         disk         connected    configured   unusable
    c6::5006016141e02f08,4         disk         connected    configured   unusable
    c6::5006016141e02f08,5         disk         connected    configured   unusable
    c6::5006016141e02f08,6         disk         connected    configured   unusable
    c6::5006016141e02f08,7         disk         connected    configured   unusable
    c6::5006016141e02f08,8         disk         connected    configured   unusable
    c6::5006016141e02f08,9         disk         connected    configured   unusable
    c6::5006016141e02f08,10        disk         connected    configured   unusable
    c6::5006016141e02f08,11        disk         connected    configured   unusable
    c6::5006016841e02f08,0         disk         connected    configured   unusable
    c6::5006016841e02f08,1         disk         connected    configured   unusable

    bash-3.00# cfgadm -al -o show_SCSI_LUN | grep -i failing
      c2::5006016841e02f03,71    disk  connected configured  failing
      c3::5006016841e02f0c,252   disk  connected configured  failing
  3. If the removed LUNs show up as failing, you need to force a LIP on the HBA. This operation probes the targets again, so that the device shows up as unusable. Unless the device shows up as unusable, it cannot be removed from the device tree.
    luxadm -e forcelip /devices/pci@1d,700000/SUNW,qlc@1,1/fp@0,0:devctl
  4. To remove the device from the cfgadm database, run the following commands on the HBA:
    cfgadm -c unconfigure -o unusable_SCSI_LUN c2::5006016841e02f03
    cfgadm -c unconfigure -o unusable_SCSI_LUN c3::5006016841e02f0c
  5. Repeat step 2 to verify that the LUNs have been removed.
  6. Clean up the device tree. The following command removes the /dev/rdsk… links to /devices.
    # devfsadm -Cv