Category Archives: Oracle

How to Rescan New LUNs Added in Linux, HP-UX, AIX, and Solaris

HP-UX

1. Rescan the devices:

ioscan -fnC <disk|tape>

2. Generate device files:

 insf -e

3. Verify the new devices:

 ioscan -funC <disk|tape>

AIX

1. Rescan the devices:

 cfgmgr -vl fcsx

Where x is the FC adapter number.

2. Verify the new devices:

 lsdev -Cc <disk|tape>

Linux

The rescan in Linux is HBA-specific.

For QLogic:

echo scsi-qlascan > /proc/scsi/qla<model#>/<adapter instance>

For Emulex:

 sh force_lpfc_scan.sh lpfc<adapter-instance>

For each identified device, run the following:

echo "scsi add-single-device <host> <channel> <ID> <lun>" > /proc/scsi/scsi
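
For example, if the new LUN appears at host 0, channel 0, SCSI ID 1, LUN 2 (hypothetical values), the command would be:

echo "scsi add-single-device 0 0 1 2" > /proc/scsi/scsi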

Solaris

1. Determine the FC channels:

 cfgadm -al

2. Force a rescan:

 cfgadm -o force_update -c configure cx

Where x is the FC channel number

3. Force rescan at HBA port level:

 luxadm -e forcelip /dev/fc/fpx

4. Force rescan on all FC devices:

 cfgadm -al -o show_FCP_dev

5. Install device files:

 devfsadm

6. Display all QLogic HBA ports:

 luxadm -e port

7. Display HBA port information:

 luxadm -v display <WWPN>

8. Display the FC loop map:

 luxadm -e dump_map

Note: If one specific SAN client is missing a drive, verify that your zoning is correct. Also make sure the host initiator and the VTL's target ports show as online on the Fibre Channel switch. (Check the HBA link light and check the cable.)

 

Configuring X-Server Display For Oracle Solaris 11

What is X Window ?

The X Window System, commonly referred to as X, is a network-based graphical window system.
The X Window System uses a client-server architecture. It enables multiple programs to share and access a common set of hardware.
This hardware includes both input and display devices such as mouse, keyboards, video adapters, and monitors that are connected to the server.

The X Window System consists of X server and X clients.
The X clients are application programs that do not have direct access to the display.
They communicate with the X server which provides the display.

A proper X client-server setup is required for Oracle Universal Installer (OUI), because OUI is an X Window client that
needs an X server display to connect to for its GUI.

What Packages are required ?

 

Package Description
pkg://solaris/x11/x11-server-utilities X11 server state utilities
pkg://solaris/x11/session/xauth X authority file utility
pkg://solaris/library/motif Used for Motif-based Toolkit
pkg:/system/library/c++-runtime Sun Workshop Compilers Bundled libC
pkg:/system/library/math Math Libraries
pkg:/desktop/window-manager/twm Tab Window Manager for the X Window
pkg:/system/library System Lib files
pkg:/library/zlib The Zip compression library
pkg:/x11/library/libx11 X11 core protocol client library
pkg:/x11/library/libxau X authorization database library
pkg:/x11/library/libxdmcp X Display Manager Control Protocol
pkg:/x11/library/libxext X11 protocol common extensions client
pkg:/x11/library/libxtst libXtst – X Test and Record library
pkg:/x11/library/toolkit/libxt X Toolkit Intrinsics library
pkg://solaris/terminal/xterm Terminal emulator for X
pkg://solaris/x11/xclock Displays the time in analog or digital form.

How to configure Xorg included in the Oracle Solaris OS ?

Xorg is installed by default and is located in the /usr/bin directory.
The Xorg server is designed to configure itself automatically and can run in most situations without the need to edit configuration files.
When configuration is needed, the Xorg server gathers configuration details from xorg.conf, the Xorg server configuration file, in the /etc/X11/ directory.

Note :
Xorg packages are included on the Live Media, but not with the text installer. Symbolic links for
X Window System software compatibility with other releases are also present in the /usr/X11R6 directory.

pkg info x11/server/xorg

Name: x11/server/xorg
Summary: Xorg – X11R7 X server
Publisher: solaris
Version: 1.12.2
Build Release: 5.11

How to Configure Xming ?

Download Xming and install it on your Windows PC. You can download it from http://sourceforge.net/projects/xming/files/ .

Make the following change to the Xming configuration on your Windows PC:

Add a new entry in C:\Program Files\Xming\X0.hosts

For example:

localhost
<IP address of server>

Log in to the server through a PuTTY client:
export DISPLAY=<windows-pc-IP>:0.0
xhost +

To check whether the X Window server-client connection is working, you can use the xclock application:

xclock – Displays the time in analog or digital form.

A clock GUI should then pop up on your Windows PC.
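
For example, assuming the Windows PC's IP address is 192.168.1.50 (a hypothetical address), the full test from the PuTTY session would be:

export DISPLAY=192.168.1.50:0.0
xhost +
xclock &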

Which Toolkit does OUI use ?

Solaris continues to use MToolkit (the Motif-based Toolkit) as the default in J2SE 5.0, but it will eventually be replaced by XToolkit in JDK 7.

The new XToolkit implementation provides the following advantages:
-Removes the dependency on Motif and Xt libraries.
-Interoperates better with other GUI Toolkits.
-Provides better performance and quality.

Setting the toolkit for an application:
Use an environment variable; it needs to be set before starting the JVM.
csh:
setenv AWT_TOOLKIT XToolkit #selects the XToolkit
setenv AWT_TOOLKIT MToolkit #selects the MToolkit (Motif-based Toolkit)
sh/bash/ksh:
export AWT_TOOLKIT=XToolkit
export AWT_TOOLKIT=MToolkit
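
For example, to launch Oracle Universal Installer with the XToolkit selected (a sketch; the runInstaller path depends on where the installation media is staged):

export AWT_TOOLKIT=XToolkit
./runInstaller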

How to configure VNC if the Xorg does not work ?

If for some reason the above method does not work, the easiest alternative is VNC, because
it removes the requirement of having an X server installed on your client platform.
Install the VNC server on your Oracle server platform as described in the next step.
Xvnc:
Runs a VNC session that can be connected by using a VNC client. Virtual Network Computing (VNC) is a
remote software application that enables you to view and interact with one computer desktop,
the Xvnc server, by using the VNC viewer on another computer desktop.
The two computers do not have to be running the same type of operating system. Xvnc provides a guest domain graphical login.

pkg install xvnc

pkg info xvnc – lists the installed package.

How to Start VNC Manually ?

Run the VNC server as the oracle user.
Start the VNC server.
$ /usr/bin/vncserver
Enter the VNC server password.
Password:
Verify:
New ‘myhost:2 ()’ desktop is myhost:2

Creating default startup script /home/oracle/.vnc/xstartup
Starting applications specified in /home/oracle/.vnc/xstartup
Log file is /home/oracle/.vnc/myhost:2.log

From your client (likely Windows), run the VNC viewer and connect to the oracle user's VNC server display.
# vncviewer hostname:displaynumber
For example:

# vncviewer myhost:2
Type the password you provided to the vncserver script.
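
When the session is no longer needed, it can be stopped from the oracle account:

$ vncserver -kill :2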

Tuning the Unix Operating System and Platform

This chapter discusses tuning the operating system (OS) for optimum performance. It discusses the following topics:

  • Server Scaling
  • Solaris 10 Platform-Specific Tuning Information
  • Tuning for the Solaris OS
  • Tuning for Solaris on x86
  • Tuning for Linux platforms
  • Tuning UltraSPARC CMT-Based Systems

Server Scaling

This section provides recommendations for optimal performance scaling of the following server subsystems:

  • Processors
  • Memory
  • Disk Space
  • Networking
  • UDP Buffer Sizes

Processors

The GlassFish Server automatically takes advantage of multiple CPUs. In general, the effectiveness of multiple CPUs varies with the operating system and the workload, but more processors will generally improve dynamic content performance.

Static content involves mostly input/output (I/O) rather than CPU activity. If the server is tuned properly, increasing primary memory will increase its content caching and thus increase the relative amount of time it spends in I/O versus CPU activity. Studies have shown that doubling the number of CPUs increases servlet performance by 50 to 80 percent.

Memory

See the section Hardware and Software Requirements in the GlassFish Server Release Notes for specific memory recommendations for each supported operating system.

Disk Space

It is best to have enough disk space for the OS, document tree, and log files. In most cases 2GB total is sufficient.

Put the OS, swap/paging file, GlassFish Server logs, and document tree each on separate hard drives. This way, if the log files fill up the log drive, the OS does not suffer. Also, it's easy to tell if the OS paging file is causing drive activity, for example.

OS vendors generally provide specific recommendations for how much swap or paging space to allocate. Based on Oracle testing, GlassFish Server performs best with swap space equal to RAM, plus enough to map the document tree.

Networking

To determine the bandwidth the application needs, determine the following values:

  • The number of peak concurrent users (Npeak) the server needs to handle.
  • The average request size on your site, r. The average request can include multiple documents. When in doubt, use the home page and all its associated files and graphics.
  • Decide how long, t, the average user will be willing to wait for a document at peak utilization.

Then, the bandwidth required is:

(Npeak × r) / t

For example, supporting a peak of 50 users with an average document size of 24 KBytes, with each document transferred in an average of 5 seconds, requires 240 KBytes per second (1920 Kbit/s). So the site needs two T1 lines (each 1544 Kbit/s). This bandwidth also allows some overhead for growth.
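
The same arithmetic can be checked with a small shell calculation (a minimal sketch using the example values above):

NPEAK=50      # peak concurrent users
R=24          # average request size, in KBytes
T=5           # acceptable transfer time, in seconds
KBPS=$(( NPEAK * R / T ))
echo "Required bandwidth: ${KBPS} KBytes/s ($(( KBPS * 8 )) Kbit/s)"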

The server's network interface card must support more bandwidth than the WAN to which it is connected. For example, if you have up to three T1 lines, you can get by with a 10BaseT interface. Up to a T3 line (45 Mbit/s), you can use 100BaseT. But if you have more than 50 Mbit/s of WAN bandwidth, consider configuring multiple 100BaseT interfaces, or look at Gigabit Ethernet technology.

UDP Buffer Sizes

GlassFish Server uses User Datagram Protocol (UDP) for the transmission of multicast messages to GlassFish Server instances in a cluster. For peak performance from a GlassFish Server cluster that uses UDP multicast, limit the need to retransmit UDP messages. To limit the need to retransmit UDP messages, set the size of the UDP buffer to avoid excessive UDP datagram loss.

To Determine an Optimal UDP Buffer Size

The size of UDP buffer that is required to prevent excessive UDP datagram loss depends on many factors, such as:

  • The number of instances in the cluster
  • The number of instances on each host
  • The number of processors
  • The amount of memory
  • The speed of the hard disk for virtual memory

If only one instance is running on each host in your cluster, the default UDP buffer size should suffice. If several instances are running on each host, determine whether the UDP buffer is large enough by testing for the loss of UDP packets.

Note:

On Linux systems, the default UDP buffer size might be insufficient even if only one instance is running on each host. In this situation, set the UDP buffer size as explained in To Set the UDP Buffer Size on Linux Systems.

  1. Ensure that no GlassFish Server clusters are running.

If necessary, stop any running clusters as explained in “To Stop a Cluster” in Oracle GlassFish Server High Availability Administration Guide.

  2. Determine the absolute number of lost UDP packets when no clusters are running (see the example after this list).

How you determine the number of lost packets depends on the operating system. For example:

  • On Linux systems, use the netstat -su command and look for the packet receive errors count in the Udp section.
  • On AIX systems, use the netstat -s command and look for the fragments dropped (dup or out of space) count in the ip section.
  3. Start all the clusters that are configured for your installation of GlassFish Server.

Start each cluster as explained in “To Start a Cluster” in Oracle GlassFish Server High Availability Administration Guide.

  4. Determine the absolute number of lost UDP packets after the clusters are started.
  5. If the difference in the number of lost packets is significant, increase the size of the UDP buffer.
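
For example, on a Linux host the relevant counter can be captured before and after the clusters are started (a sketch; the exact label in the netstat output can vary by distribution):

# with no clusters running
netstat -su | grep -i "packet receive errors"
# after starting the clusters and applying load
netstat -su | grep -i "packet receive errors"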

To Set the UDP Buffer Size on Linux Systems

On Linux systems, a default UDP buffer size is set for the client, but not for the server. Therefore, on Linux systems, the UDP buffer size might have to be increased. Setting the UDP buffer size involves setting the following kernel parameters:

  • net.core.rmem_max
  • net.core.wmem_max
  • net.core.rmem_default
  • net.core.wmem_default

Set the kernel parameters in the /etc/sysctl.conf file or at runtime.

If you set the parameters in the /etc/sysctl.conf file, the settings are preserved when the system is rebooted. If you set the parameters at runtime, the settings are not preserved when the system is rebooted.

  • To set the parameters in the /etc/sysctl.conf file, add or edit the following lines in the file:

net.core.rmem_max=rmem-max
net.core.wmem_max=wmem-max
net.core.rmem_default=rmem-default
net.core.wmem_default=wmem-default

  • To set the parameters at runtime, use the sysctl command:

$ /sbin/sysctl -w net.core.rmem_max=rmem-max \
net.core.wmem_max=wmem-max \
net.core.rmem_default=rmem-default \
net.core.wmem_default=wmem-default

Example 5-1 Setting the UDP Buffer Size in the /etc/sysctl.conf File

This example shows the lines in the /etc/sysctl.conf file for setting the kernel parameters for controlling the UDP buffer size to 524288.

net.core.rmem_max=524288
net.core.wmem_max=524288
net.core.rmem_default=524288
net.core.wmem_default=524288

Example 5-2 Setting the UDP Buffer Size at Runtime

This example sets the kernel parameters for controlling the UDP buffer size to 524288 at runtime.

$ /sbin/sysctl -w net.core.rmem_max=524288 \
net.core.wmem_max=524288 \
net.core.rmem_default=524288 \
net.core.wmem_default=524288
net.core.rmem_max = 524288
net.core.wmem_max = 524288
net.core.rmem_default = 524288
net.core.wmem_default = 524288

Solaris 10 Platform-Specific Tuning Information

Solaris Dynamic Tracing (DTrace) is a comprehensive dynamic tracing framework for the Solaris Operating System (OS). You can use the DTrace Toolkit to monitor the system. The DTrace Toolkit is available through the OpenSolaris project from the DTraceToolkit page.

Tuning for the Solaris OS

  • Tuning Parameters
  • File Descriptor Setting

Tuning Parameters

Tuning Solaris TCP/IP settings benefits programs that open and close many sockets. Since the GlassFish Server operates with a small fixed set of connections, the performance gain might not be significant.

The following table shows Solaris tuning parameters that affect performance and scalability benchmarking. These values are examples of how to tune your system for best performance.

Table 5-1 Tuning Parameters for Solaris

Parameter Scope Default Tuned Value Comments
rlim_fd_max /etc/system 65536 65536 Limit of process open file descriptors. Set to account for expected load (for associated sockets, files, and pipes if any).
rlim_fd_cur /etc/system 1024 8192
sq_max_size /etc/system 2 0 Controls streams driver queue size; setting to 0 makes it infinite so the performance runs won’t be hit by lack of buffer space. Set on clients too. Note that setting sq_max_size to 0 might not be optimal for production systems with high network traffic.
tcp_close_wait_interval ndd /dev/tcp 240000 60000 Set on clients too.
tcp_time_wait_interval ndd /dev/tcp 240000 60000 Set on clients too.
tcp_conn_req_max_q ndd /dev/tcp 128 1024
tcp_conn_req_max_q0 ndd /dev/tcp 1024 4096
tcp_ip_abort_interval ndd /dev/tcp 480000 60000
tcp_keepalive_interval ndd /dev/tcp 7200000 900000 For high traffic web sites, lower this value.
tcp_rexmit_interval_initial ndd /dev/tcp 3000 3000 If retransmission is greater than 30-40%, you should increase this value.
tcp_rexmit_interval_max ndd /dev/tcp 240000 10000
tcp_rexmit_interval_min ndd /dev/tcp 200 3000
tcp_smallest_anon_port ndd /dev/tcp 32768 1024 Set on clients too.
tcp_slow_start_initial ndd /dev/tcp 1 2 Slightly faster transmission of small amounts of data.
tcp_xmit_hiwat ndd /dev/tcp 8129 32768 Size of transmit buffer.
tcp_recv_hiwat ndd /dev/tcp 8129 32768 Size of receive buffer.
tcp_conn_hash_size ndd /dev/tcp 512 8192 Size of connection hash table. See Sizing the Connection Hash Table.

 

Sizing the Connection Hash Table

The connection hash table keeps all the information for active TCP connections. Use the following command to get the size of the connection hash table:

ndd -get /dev/tcp tcp_conn_hash

This value does not limit the number of connections, but it can cause connection hashing to take longer. The default size is 512.

To make lookups more efficient, set the value to half of the number of concurrent TCP connections that are expected on the server. You can set this value only in /etc/system, and it becomes effective at boot time.

Use the following command to get the current number of TCP connections.

netstat -nP tcp | wc -l
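
For example, if the command above reports roughly 16,000 concurrent connections at peak, a reasonable value would be about half of that (a sketch; 8192 is illustrative):

# add to /etc/system, then reboot for the change to take effect
set tcp:tcp_conn_hash_size=8192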

File Descriptor Setting

On the Solaris OS, setting the maximum number of open files property using ulimit has the biggest impact on efforts to support the maximum number of RMI/IIOP clients.

To increase the hard limit, add the following command to /etc/system and reboot the system once:

set rlim_fd_max = 8192

Verify this hard limit by using the following command:

ulimit -a -H

Once the above hard limit is set, increase the value of this property explicitly (up to this limit) using the following command:

ulimit -n 8192

Verify this limit by using the following command:

ulimit -a

For example, with the default ulimit of 64, a simple test driver can support only 25 concurrent clients, but with ulimit set to 8192, the same test driver can support 120 concurrent clients. The test driver spawned multiple threads, each of which performed a JNDI lookup and repeatedly called the same business method with a think (delay) time of 500 ms between business method calls, exchanging data of about 100 KB. These settings apply to RMI/IIOP clients on the Solaris OS.

Tuning for Solaris on x86

The following are some options to consider when tuning Solaris on x86 for GlassFish Server:

  • File Descriptors
  • IP Stack Settings

Some of the values depend on the system resources available. After making any changes to /etc/system, reboot the machines.

File Descriptors

Add (or edit) the following lines in the /etc/system file:

set rlim_fd_max=65536
set rlim_fd_cur=65536
set sq_max_size=0
set tcp:tcp_conn_hash_size=8192
set autoup=60
set pcisch:pci_stream_buf_enable=0

These settings affect the file descriptors.

IP Stack Settings

Add (or edit) the following lines in the /etc/system file:

set ip:tcp_squeue_wput=1
set ip:tcp_squeue_close=1
set ip:ip_squeue_bind=1
set ip:ip_squeue_worker_wait=10
set ip:ip_squeue_profile=0

These settings tune the IP stack.

To preserve the changes to the file between system reboots, place the following changes to the default TCP variables in a startup script that gets executed when the system reboots:

ndd -set /dev/tcp tcp_time_wait_interval 60000
ndd -set /dev/tcp tcp_conn_req_max_q 16384
ndd -set /dev/tcp tcp_conn_req_max_q0 16384
ndd -set /dev/tcp tcp_ip_abort_interval 60000
ndd -set /dev/tcp tcp_keepalive_interval 7200000
ndd -set /dev/tcp tcp_rexmit_interval_initial 4000
ndd -set /dev/tcp tcp_rexmit_interval_min 3000
ndd -set /dev/tcp tcp_rexmit_interval_max 10000
ndd -set /dev/tcp tcp_smallest_anon_port 32768
ndd -set /dev/tcp tcp_slow_start_initial 2
ndd -set /dev/tcp tcp_xmit_hiwat 32768
ndd -set /dev/tcp tcp_recv_hiwat 32768

Tuning for Linux platforms

To tune for maximum performance on Linux, you need to make adjustments to the following:

  • Startup Files
  • File Descriptors
  • Virtual Memory
  • Network Interface
  • Disk I/O Settings
  • TCP/IP Settings

Startup Files

The following parameters must be added to the /etc/rc.d/rc.local file that gets executed during system startup.

#max file count updated ~256 descriptors per 4Mb.
#Specify the number of file descriptors based on the amount of system RAM.
echo "65536" > /proc/sys/fs/file-max
#inode-max 3-4 times the file-max
#file not present!!!!!
#echo "262144" > /proc/sys/fs/inode-max
#make more local ports available
echo 1024 25000 > /proc/sys/net/ipv4/ip_local_port_range
#increase the memory available for socket buffers
echo 262143 > /proc/sys/net/core/rmem_max
echo 262143 > /proc/sys/net/core/rmem_default
#above configuration is for 2.4.x kernels
echo 4096 131072 262143 > /proc/sys/net/ipv4/tcp_rmem
echo 4096 131072 262143 > /proc/sys/net/ipv4/tcp_wmem
#disable "RFC2018 TCP Selective Acknowledgements" and "RFC1323 TCP timestamps"
echo 0 > /proc/sys/net/ipv4/tcp_sack
echo 0 > /proc/sys/net/ipv4/tcp_timestamps
#double the maximum amount of memory allocated to shm at runtime
echo "67108864" > /proc/sys/kernel/shmmax
#improve the virtual memory (VM) subsystem of Linux
echo "100 1200 128 512 15 5000 500 1884 2" > /proc/sys/vm/bdflush
#we also do a sysctl
sysctl -p /etc/sysctl.conf

Additionally, create an /etc/sysctl.conf file and append the following values to it:

 #Disables packet forwarding
net.ipv4.ip_forward = 0
#Enables source route verification
net.ipv4.conf.default.rp_filter = 1
#Disables the magic-sysrq key
kernel.sysrq = 0
fs.file-max=65536
vm.bdflush = 100 1200 128 512 15 5000 500 1884 2
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_max= 262143
net.core.rmem_default = 262143
net.ipv4.tcp_rmem = 4096 131072 262143
net.ipv4.tcp_wmem = 4096 131072 262143
net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 0
kernel.shmmax = 67108864

File Descriptors

You may need to increase the number of file descriptors from the default. Having a higher number of file descriptors ensures that the server can open sockets under high load and not abort requests coming in from clients.

Start by checking system limits for file descriptors with this command:

cat /proc/sys/fs/file-max
8192

The current limit shown is 8192. To increase it to 65535, use the following command (as root):

echo "65535"> /proc/sys/fs/file-max

To make this value survive a system reboot, add it to /etc/sysctl.conf and specify the maximum number of open files permitted:

fs.file-max = 65535

Note that the parameter is not proc.sys.fs.file-max, as one might expect.

To list the available parameters that can be modified using sysctl:

sysctl -a

To load new values from the sysctl.conf file:

sysctl -p /etc/sysctl.conf

To check and modify limits per shell, use the following command (limit is a csh built-in; in bash or ksh, use ulimit -a):

limit

The output will look something like this:

cputime         unlimited
filesize        unlimited
datasize        unlimited
stacksize       8192 kbytes
coredumpsize    0 kbytes
memoryuse       unlimited
descriptors     1024
memorylocked    unlimited
maxproc         8146
openfiles       1024

The openfiles and descriptors values show a limit of 1024. To increase the limit to 65535 for all users, edit /etc/security/limits.conf as root, and modify or add the nofile (number of open files) entries:

*         soft    nofile                     65535
*         hard    nofile                     65535

The character “*” is a wildcard that identifies all users. You could also specify a user ID instead.

Then edit /etc/pam.d/login and add the line:

session required /lib/security/pam_limits.so

On Red Hat, you also need to edit /etc/pam.d/sshd and add the following line:

session required /lib/security/pam_limits.so

On many systems, this procedure will be sufficient. Log in as a regular user and try it before doing the remaining steps. The remaining steps might not be required, depending on how pluggable authentication modules (PAM) and secure shell (SSH) are configured.
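
After logging in again, the new limits can be verified from the user's shell:

ulimit -Hn    # hard limit, should report 65535
ulimit -n     # soft limit, should report 65535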

Virtual Memory

To change virtual memory settings, add the following to /etc/rc.local:

echo "100 1200 128 512 15 5000 500 1884 2" > /proc/sys/vm/bdflush

For more information, view the man pages for bdflush.

Network Interface

To ensure that the network interface is operating in full duplex mode, add the following entry into /etc/rc.local:

mii-tool -F 100baseTx-FD eth0

where eth0 is the name of the network interface card (NIC).
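
To confirm the negotiated link mode afterwards, query the interface without forcing a mode:

mii-tool eth0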

Disk I/O Settings

 

To tune disk I/O performance for non-SCSI disks

  1. Test the disk speed.

Use this command:

/sbin/hdparm -t /dev/hdX
  2. Enable direct memory access (DMA).

Use this command:

/sbin/hdparm -d1 /dev/hdX
  3. Check the speed again using the hdparm command.

Given that DMA is not enabled by default, the transfer rate might have improved considerably. To apply this change at every reboot, add the /sbin/hdparm -d1 /dev/hdX line to /etc/conf.d/local.start, /etc/init.d/rc.local, or whatever your distribution's startup script is called.
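
For example, on the first IDE disk (hypothetically /dev/hda), the before-and-after check looks like this:

/sbin/hdparm -t /dev/hda    # baseline read timing
/sbin/hdparm -d1 /dev/hda   # enable DMA
/sbin/hdparm -t /dev/hda    # repeat the timing test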

For information on SCSI disks, see: System Tuning for Linux Servers — SCSI.

TCP/IP Settings

 

To tune the TCP/IP settings

  1. Add the following entries to /etc/rc.local:

echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
echo 60000 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 15000 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

  2. Add the following to /etc/sysctl.conf:

# Disables packet forwarding
net.ipv4.ip_forward = 0
# Enables source route verification
net.ipv4.conf.default.rp_filter = 1
# Disables the magic-sysrq key
kernel.sysrq = 0
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_max = 262140
net.core.rmem_default = 262140
net.ipv4.tcp_rmem = 4096 131072 262140
net.ipv4.tcp_wmem = 4096 131072 262140
net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_window_scaling = 0
net.ipv4.tcp_keepalive_time = 60000
net.ipv4.tcp_keepalive_intvl = 15000
net.ipv4.tcp_fin_timeout = 30

  3. Add the following as the last entry in /etc/rc.local:

sysctl -p /etc/sysctl.conf

  4. Reboot the system.

  5. Use this command to increase the size of the transmit buffer:

tcp_recv_hiwat ndd /dev/tcp 8129 32768

Tuning UltraSPARC CMT-Based Systems

Use a combination of tunable parameters and other parameters to tune UltraSPARC CMT-based systems. These values are an example of how you might tune your system to achieve the desired result.

Tuning Operating System and TCP Settings

The following table shows the operating system tuning for Solaris 10 used when benchmarking for performance and scalability on UltraSPARC CMT-based systems (64-bit systems).

Table 5-2 Tuning 64-bit Systems for Performance Benchmarking

Parameter Scope Default Value Tuned Value Comments
rlim_fd_max /etc/system 65536 260000 Process open file descriptors limit; should account for the expected load (for the associated sockets, files, pipes if any).
hires_tick /etc/system 1
sq_max_size /etc/system 2 0 Controls streams driver queue size; setting to 0 makes it infinite so the performance runs won’t be hit by lack of buffer space. Set on clients too. Note that setting sq_max_size to 0 might not be optimal for production systems with high network traffic.
ip:ip_squeue_bind 0
ip:ip_squeue_fanout 1
ipge:ipge_taskq_disable /etc/system 0
ipge:ipge_tx_ring_size /etc/system 2048
ipge:ipge_srv_fifo_depth /etc/system 2048
ipge:ipge_bcopy_thresh /etc/system 384
ipge:ipge_dvma_thresh /etc/system 384
ipge:ipge_tx_syncq /etc/system 1
tcp_conn_req_max_q ndd /dev/tcp 128 3000
tcp_conn_req_max_q0 ndd /dev/tcp 1024 3000
tcp_max_buf ndd /dev/tcp 4194304
tcp_cwnd_max ndd/dev/tcp 2097152
tcp_xmit_hiwat ndd /dev/tcp 8129 400000 To increase the transmit buffer.
tcp_recv_hiwat ndd /dev/tcp 8129 400000 To increase the receive buffer.

 

Note that the IPGE driver version is 1.25.25.

Disk Configuration

If HTTP access is logged, follow these guidelines for the disk:

  • Write access logs on faster disks or attached storage.
  • If running multiple instances, move the logs for each instance onto separate disks as much as possible.
  • Enable the disk read/write cache. Note that if you enable write cache on the disk, some writes might be lost if the disk fails.
  • Consider mounting the disks with the following options, which might yield better disk performance: nologging, directio, noatime.

Network Configuration

If more than one network interface card is used, make sure the network interrupts are not all going to the same core. Run the following script to disable interrupt handling on selected processors:

allpsr=`/usr/sbin/psrinfo | grep -v off-line | awk '{ print $1 }'`
   set $allpsr
   numpsr=$#
   while [ $numpsr -gt 0 ];
   do
       shift
       numpsr=`expr $numpsr - 1`
       tmp=1
       while [ $tmp -ne 4 ];
       do
           /usr/sbin/psradm -i $1
           shift
           numpsr=`expr $numpsr - 1`
           tmp=`expr $tmp + 1`
       done
   done
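
To verify which processors have been excluded from interrupt handling, check psrinfo afterwards:

/usr/sbin/psrinfo | grep no-intr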

Put all network interfaces into a single group. For example:

$ ifconfig ipge0 group webserver
$ ifconfig ipge1 group webserver

Operating System Tuning for Oracle Database

This chapter describes how to tune Oracle Database. It contains the following sections:

  • Importance of Tuning
  • Operating System Tools
  • Tuning Memory Management
  • Tuning Disk I/O
  • Monitoring Disk Performance
  • System Global Area
  • Tuning the Operating System Buffer Cache

1.1 Importance of Tuning

Oracle Database is a highly optimizable software product. Frequent tuning optimizes system performance and prevents data bottlenecks.

Before tuning the database, you must observe its normal behavior by using the tools described in the “Operating System Tools” section.

1.2 Operating System Tools

Several operating system tools are available to enable you to assess database performance and determine database requirements. In addition to providing statistics for Oracle processes, these tools provide statistics for CPU usage, interrupts, swapping, paging, context switching, and I/O for the entire system.

This section provides information about the following common tools:

  • vmstat
  • sar
  • iostat
  • swap, swapinfo, swapon, or lsps
  • AIX Tools
  • HP-UX Tools
  • Linux Tools
  • Solaris Tools
  • Mac OS X Tools

See Also:

The operating system documentation and man pages for more information about these tools

1.2.1 vmstat

Note:

On Mac OS X, the vm_stat command displays virtual memory information. Refer to the vm_stat man page for more information about using this command.

Use the vmstat command to view process, virtual memory, disk, trap, and CPU activity, depending on the switches that you supply with the command. Run one of the following commands to display a summary of CPU activity six times, at five-second intervals:

  • On HP-UX and Solaris:

$ vmstat -S 5 6

  • On AIX, Linux, and Tru64 UNIX:

$ vmstat 5 6

The following is sample output of this command on HP-UX:

procs     memory            page            disk          faults      cpu
 r b w   swap  free  si  so pi po fr de sr f0 s0 s1 s3   in   sy   cs us sy id
 0 0 0   1892  5864   0   0  0  0  0  0  0  0  0  0  0   90   74   24  0  0 99
 0 0 0  85356  8372   0   0  0  0  0  0  0  0  0  0  0   46   25   21  0  0 100
 0 0 0  85356  8372   0   0  0  0  0  0  0  0  0  0  0   47   20   18  0  0 100
 0 0 0  85356  8372   0   0  0  0  0  0  0  0  0  0  2   53   22   20  0  0 100
 0 0 0  85356  8372   0   0  0  0  0  0  0  0  0  0  0   87   23   21  0  0 100
 0 0 0  85356  8372   0   0  0  0  0  0  0  0  0  0  0   48   41   23  0  0 100

The w sub column, under the procs column, shows the number of potential processes that have been swapped out and written to disk. If the value is not zero, then swapping occurs and the system is short of memory.

The si and so columns under the page column indicate the number of swap-ins and swap-outs per second, respectively. Swap-ins and swap-outs should always be zero.

The sr column under the page column indicates the scan rate. High scan rates are caused by a shortage of available memory.

The pi and po columns under the page column indicate the number of page-ins and page-outs per second, respectively. It is normal for the number of page-ins and page-outs to increase. Some paging always occurs even on systems with sufficient available memory.

Note:

The output from the vmstat command differs across platforms.

See Also:

Refer to the man page for information about interpreting the output

1.2.2 sar

Depending on the switches that you supply with the command, use the sar (system activity reporter) command to display cumulative activity counters in the operating system.

Note:

On Tru64 UNIX systems, the sar command is available in the UNIX SVID2 compatibility subset, OSFSVID.

On an HP-UX system, the following command displays a summary of I/O activity ten times, at ten-second intervals:

$ sar -b 10 10

The following example shows the output of this command:

13:32:45 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
13:32:55       0      14     100       3      10      69       0       0
13:33:05       0      12     100       4       4       5       0       0
13:33:15       0       1     100       0       0       0       0       0
13:33:25       0       1     100       0       0       0       0       0
13:33:35       0      17     100       5       6       7       0       0
13:33:45       0       1     100       0       0       0       0       0
13:33:55       0       9     100       2       8      80       0       0
13:34:05       0      10     100       4       4       5       0       0
13:34:15       0       7     100       2       2       0       0       0
13:34:25       0       0     100       0       0     100       0       0

Average        0       7     100       2       4      41       0       0

The sar output provides a snapshot of system I/O activity at a given point in time. If you specify the interval time with more than one option, then the output can become difficult to read. If you specify an interval time of less than 5, then the sar activity itself can affect the output.

See Also:

The man page for more information about sar

1.2.3 iostat

Use the iostat command to view terminal and disk activity, depending on the switches that you supply with the command. The output from the iostat command does not include disk request queues, but it shows which disks are busy. This information can be used to balance I/O loads.

The following command displays terminal and disk activity five times, at five-second intervals:

$ iostat 5 5

The following is sample output of the command on Solaris:

tty          fd0           sd0           sd1           sd3          cpu
 tin tout Kps tps serv  Kps tps serv  Kps tps serv  Kps tps serv  us sy wt id
   0    1   0   0    0    0   0   31    0   0   18    3   0   42   0  0  0 99
   0   16   0   0    0    0   0    0    0   0    0    1   0   14   0  0  0 100
   0   16   0   0    0    0   0    0    0   0    0    0   0    0   0  0  0 100
   0   16   0   0    0    0   0    0    0   0    0    0   0    0   0  0  0 100
   0   16   0   0    0    0   0    0    2   0   14   12   2   47   0  0  1 98

Use the iostat command to look for large disk request queues. A request queue shows how long the I/O requests on a particular disk device must wait to be serviced. Request queues are caused by a high volume of I/O requests to that disk or by I/O with long average seek times. Ideally, disk request queues should be at or near zero.

1.2.4 swap, swapinfo, swapon, or lsps

See Also:

“Determining Available and Used Swap Space” for information about swap space on Mac OS X systems

Use the swap, swapinfo, swapon, or lsps command to report information about swap space usage. A shortage of swap space can stop processes responding, leading to process failures with Out of Memory errors. The following table lists the appropriate command to use for each platform.

Platform Command
AIX lsps -a
HP-UX swapinfo -m
Linux and Tru64 UNIX swapon -s
Solaris swap -l and swap -s

 

The following example shows sample output from the swap -l command on Solaris:

swapfile             dev        swaplo blocks        free
/dev/dsk/c0t3d0s1    32,25      8      197592        162136

1.2.5 AIX Tools

The following sections describe tools available on AIX systems.

  • Base Operation System Tools
  • Performance Toolbox
  • System Management Interface Tool

See Also:

The AIX operating system documentation and man pages for more information about these tools

1.2.5.1 Base Operation System Tools

The AIX Base Operation System (BOS) contains performance tools that are historically part of UNIX systems or are required to manage the implementation-specific features of AIX. The following table lists the most important BOS tools.

Tool Function
lsattr Displays the attributes of devices
lslv Displays information about a logical volume or the logical volume allocations of a physical volume
netstat Displays the contents of network-related data structures
nfsstat Displays statistics about Network File System (NFS) and Remote Procedure Call (RPC) activity
nice Changes the initial priority of a process
no Displays or sets network options
ps Displays the status of one or more processes
reorgvg Reorganizes the physical-partition allocation within a volume group
time Displays the elapsed execution, user CPU processing, and system CPU processing time
trace Records and reports selected system events
vmo Manages Virtual Memory Manager tunable parameters

 

1.2.5.2 Performance Toolbox

The AIX Performance Toolbox (PTX) contains tools for monitoring and tuning system activity locally and remotely. PTX consists of two main components, the PTX Manager and the PTX Agent. The PTX Manager collects and displays data from various systems in the configuration by using the xmperf utility. The PTX Agent collects and transmits data to the PTX Manager by using the xmserd daemon. The PTX Agent is also available as a separate product called Performance Aide for AIX.

Both PTX and Performance Aide include the monitoring and tuning tools listed in the following table.

Tool Description
fdpr Optimizes an executable program for a particular workload
filemon Uses the trace facility to monitor and report the activity of the file system
fileplace Displays the placement of blocks of a file within logical or physical volumes
lockstat Displays statistics about contention for kernel locks
lvedit Facilitates interactive placement of logical volumes within a volume group
netpmon Uses the trace facility to report on network I/O and network-related CPU usage
rmss Simulates systems with various memory sizes for performance testing
svmon Captures and analyzes information about virtual-memory usage
syscalls Records and counts system calls
tprof Uses the trace facility to report CPU usage at module and source-code-statement levels
BigFoot Reports the memory access patterns of processes
stem Permits subroutine-level entry and exit instrumentation of existing executables

 

See Also:

  • Performance Toolbox for AIX Guide and Reference for information about these tools
  • AIX 5L Performance Management Guide for information about the syntax of some of these tools

1.2.5.3 System Management Interface Tool

The AIX System Management Interface Tool (SMIT) provides a menu-driven interface to various system administrative and performance tools. By using SMIT, you can navigate through large numbers of tools and focus on the jobs that you want to perform.

1.2.6 HP-UX Tools

The following performance analysis tools are available on HP-UX systems:

  • GlancePlus/UX

This HP-UX utility is an online diagnostic tool that measures the activities of the system. GlancePlus displays information about how system resources are used. It displays dynamic information about the system I/O, CPU, and memory usage on a series of screens. You can use the utility to monitor how individual processes are using resources.

  • HP PAK

HP Programmer’s Analysis Kit (HP PAK) consists of the following tools:

  • Puma

This tool collects performance statistics during a program run. It provides several graphical displays for viewing and analyzing the collected statistics.

  • Thread Trace Visualizer (TTV)

This tool displays trace files produced by the instrumented thread library, libpthread_tr.sl, in a graphical format. It enables you to view how threads are interacting and to find where threads are blocked waiting for resources.

HP PAK is bundled with the HP Fortran 77, HP Fortran 90, HP C, HP C++, HP ANSI C++, and HP Pascal compilers.

The following table lists the performance tuning tools that you can use for additional performance tuning on HP-UX.

Tools Function
caliper (Itanium only) Collects run-time application data for system analysis tasks such as cache misses, translation look-aside buffer (TLB) or instruction cycles, along with fast dynamic instrumentation. It is a dynamic performance measurement tool for C, C++, Fortran, and assembly applications.
gprof Creates an execution profile for programs.
monitor Monitors the program counter and calls to certain functions.
netfmt Monitors the network.
netstat Reports statistics on network performance.
nfsstat Displays statistics about Network File System (NFS) and Remote Procedure Call (RPC) activity.
nettl Captures network events or packets by logging and tracing.
prof Creates an execution profile of C programs and displays performance statistics for your program, showing where your program is spending most of its execution time.
profil Copies program counter information into a buffer.
top Displays the top processes on the system and periodically updates the information.

 

1.2.7 Linux Tools

On Linux systems, use the top, free, and cat /proc/meminfo commands to view information about swap space, memory, and buffer usage.

1.2.8 Solaris Tools

On Solaris systems, use the mpstat command to view statistics for each processor in a multiprocessor system. Each row of the table represents the activity of one processor. The first row summarizes all activity since the last system restart. Each subsequent row summarizes activity for the preceding interval. All values are events per second unless otherwise noted. The arguments are for time intervals between statistics and number of iterations.

The following example shows sample output from the mpstat command:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    1    71   21   23    0    0    0    0    55    0   0   0  99
  2    0   0    1    71   21   22    0    0    0    0    54    0   0   0  99
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0    61   16   25    0    0    0    0    57    0   0   0 100
  2    1   0    0    72   16   24    0    0    0    0    59    0   0   0 100

1.2.9 Mac OS X Tools

You can use the following additional performance tuning tools:

  • Use the top command to display information about running processes and memory usage.
  • Use the Apple Computer Hardware Understanding Developer (CHUD) tools, such as Shark and BigTop, to monitor system activity and tune applications.

See Also:

For more information about the CHUD tools, refer to

http://developer.apple.com/library/mac/#documentation/Performance/Conceptual/PerformanceOverview/Introduction/Introduction.html

1.3 Tuning Memory Management

Start the memory tuning process by measuring paging and swapping space to determine how much memory is available. After you determine your system memory usage, tune the Oracle buffer cache.

The Oracle buffer manager ensures that the most frequently accessed data is cached longer. If you monitor the buffer manager and tune the buffer cache, then you can significantly improve Oracle Database performance. The optimal Oracle Database buffer size for your system depends on the overall system load and the relative priority of Oracle Database over other applications.

This section includes the following topics:

  • Allocating Sufficient Swap Space
  • Controlling Paging
  • Adjusting Oracle Block Size

1.3.1 Allocating Sufficient Swap Space

Try to minimize swapping because it causes significant operating system overhead. To check for swapping, use the sar or vmstat commands. For information about the appropriate options to use with these commands, refer to the man pages.

If your system is swapping and you must conserve memory, then:

  • Avoid running unnecessary system daemon processes or application processes.
  • Decrease the number of database buffers to free some memory.
  • Decrease the number of operating system file buffers, especially if you are using raw devices.

Note:

On Mac OS X systems, swap space is allocated dynamically. If the operating system requires more swap space, then it creates additional swap files in the /private/var/vm directory. Ensure that the file system that contains this directory has sufficient free disk space to accommodate additional swap files. Refer to "Determining Available and Used Swap Space" for more information about allocating swap space.

To determine the amount of swap space, run one of the following commands, depending on your platform:

Platform Command
AIX lsps -a
HP-UX swapinfo -m
Linux swapon -s
Solaris swap -l and swap -s
Tru64 UNIX swapon -s

 

To add swap space to your system, run one of the following commands, depending on your platform:

Platform Command
AIX chps or mkps
HP-UX swapon
Linux swapon -a
Solaris swap -a
Tru64 UNIX swapon -a

 

Set the swap space to between two and four times the physical memory. Monitor the use of swap space, and increase it as required.

See Also:

The operating system documentation for more information about these commands

1.3.2 Controlling Paging

Paging may not present as serious a problem as swapping, because an entire program does not have to be stored in memory to run. A small number of page-outs may not noticeably affect the performance of your system.

To detect excessive paging, run measurements during periods of fast response or idle time to compare against measurements from periods of slow response.

Use the vmstat (vm_stat on Mac OS X) or sar command to monitor paging.

See Also:

The man pages or your operating system documentation for information about interpreting the results for your platform

The following table lists the important columns from the output of these commands.

Platform Column Function
Solaris vflt/s Indicates the number of address translation page faults. Address translation faults occur when a process refers to a valid page not in memory.
Solaris rclm/s Indicates the number of valid pages that have been reclaimed and added to the free list by page-out activity. This value should be zero.
HP-UX at Indicates the number of address translation page faults. Address translation faults occur when a process refers to a valid page not in memory.
HP-UX re Indicates the number of valid pages that have been reclaimed and added to the free list by page-out activity. This value should be zero.

 

If your system consistently has excessive page-out activity, then consider the following solutions:

  • Install more memory.
  • Move some of the work to another system.
  • Configure the System Global Area (SGA) to use less memory.

1.3.3 Adjusting Oracle Block Size

During read operations, entire operating system blocks are read from the disk. If the database block size is smaller than the operating system file system block size, then I/O bandwidth is inefficient. If you set Oracle Database block size to be a multiple of the file system block size, then you can increase performance by up to 5 percent.

The DB_BLOCK_SIZE initialization parameter sets the database block size. However, to change the value of this parameter, you must re-create the database.

To see the current value of the DB_BLOCK_SIZE parameter, run the SHOW PARAMETER DB_BLOCK_SIZE command in SQL*Plus.
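
For example, from a shell on the database server (a sketch assuming OS authentication as SYSDBA):

sqlplus -s / as sysdba <<EOF
SHOW PARAMETER DB_BLOCK_SIZE
EXIT
EOF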

1.4 Tuning Disk I/O

Balance I/O evenly across all available disks to reduce disk access times. For smaller databases and those not using RAID, ensure that different data files and tablespaces are distributed across the available disks.

1.4.1 Using Automatic Storage Management

If you choose to use Automatic Storage Management for database storage, then all database I/O is balanced across all available disk devices in the Automatic Storage Management disk group. Automatic Storage Management provides the performance of raw device I/O without the inconvenience of managing raw devices.

By using Automatic Storage Management, you avoid manually tuning disk I/O.

1.4.2 Choosing the Appropriate File System Type

Depending on your operating system, you can choose from a range of file system types. Each file system type has different characteristics. This fact can have a substantial impact on database performance. The following table lists common file system types.

File System Platform Description
S5 HP-UX and Solaris UNIX System V file system
UFS AIX, HP-UX, Mac OS X, Solaris, Tru64 UNIX Unified file system, derived from BSD UNIX. Note: On Mac OS X, Oracle does not recommend the use of the UFS file system for either software or database files.
VxFS AIX, HP-UX, and Solaris VERITAS file system
None All Raw devices (no file system)
ext2/ext3 Linux Extended file system for Linux
OCFS Linux Oracle cluster file system
AdvFS Tru64 UNIX Advanced file system
CFS Tru64 UNIX Cluster file system
JFS/JFS2 AIX Journaled file system
HFS Plus, HFSX Mac OS X HFS Plus is the standard hierarchical file system used by Mac OS X. HFSX is an extension to HFS Plus that enables case-sensitive file names.
GPFS AIX General parallel file system

 

The suitability of a file system for an application is usually not documented. For example, even different implementations of the Unified file system are hard to compare. Depending on the file system that you choose, performance differences can be up to 20 percent. If you choose to use a file system, then:

  • Make a new file system partition to ensure that the hard disk is clean and unfragmented.
  • Perform a file system check on the partition before using it for database files.
  • Distribute disk I/O as evenly as possible.
  • If you are not using a logical volume manager or a RAID device, then consider placing log files on a different file system from data files.

1.5 Monitoring Disk Performance

The following sections describe the procedure for monitoring disk performance.

Monitoring Disk Performance on Mac OS X

Use the iostat and sar commands to monitor disk performance. For more information about using these commands, refer to the man pages.

Monitoring Disk Performance on Other Operating Systems

To monitor disk performance, use the sar -b and sar -u commands.

The following table describes the columns of the sar -b command output that are significant for analyzing disk performance.

Columns Description
bread/s, bwrit/s Blocks read and blocks written per second (important for file system databases)
pread/s, pwrit/s Number of reads and writes per second from or to raw character devices.

 

An important sar -u column for analyzing disk performance is %wio, the percentage of CPU time spent waiting on blocked I/O.

Note:

Not all Linux distributions display the %wio column in the output of the sar -u command. For detailed I/O statistics, you can use the iostat -x command.

Key indicators are:

  • The sum of the bread, bwrit, pread, and pwrit column values indicates the level of activity of the disk I/O subsystem. The higher the sum, the busier the I/O subsystem. The larger the number of physical drives, the higher the sum threshold number can be. A good default value is no more than 40 for 2 drives and no more than 60 for 4 to 8 drives.
  • The %rcache column value should be greater than 90 and the %wcache column value should be greater than 60. Otherwise, the system may be disk I/O bound.
  • If the %wio column value is consistently greater than 20, then the system is I/O bound.

1.6 System Global Area

The SGA is the Oracle structure that is located in shared memory. It contains static data structures, locks, and data buffers. Sufficient shared memory must be available to each Oracle process to address the entire SGA.

The maximum size of a single shared memory segment is specified by the shmmax (shm_max on Tru64 UNIX) kernel parameter.

The following table shows the recommended value for this parameter, depending on your platform.

Platform Recommended Value
AIX NA
HP-UX The size of the physical memory installed on the system. See Also: "HP-UX Shared Memory Segments for an Oracle Instance" for information about the shmmax parameter on HP-UX.
Linux Half the size of the physical memory installed on the system
Mac OS X Half the size of the physical memory installed on the system
Solaris and Tru64 UNIX 4294967295, or 4 GB minus 16 MB. Note: The value of the shm_max parameter must be at least 16 MB for the Oracle Database instance to start. If your system runs both Oracle9i Database and Oracle Database 10g instances, then you must set the value of this parameter to 2 GB minus 16 MB. On Solaris, this value can be greater than 4 GB on 64-bit systems.

 

If the size of the SGA exceeds the maximum size of a shared memory segment (shmmax or shm_max), then Oracle Database attempts to attach more contiguous segments to fulfill the requested SGA size. The shmseg kernel parameter (shm_seg on Tru64 UNIX) specifies the maximum number of segments that can be attached by any process. Set the following initialization parameters to control the size of the SGA:

  • DB_CACHE_SIZE
  • DB_BLOCK_SIZE
  • JAVA_POOL_SIZE
  • LARGE_POOL_SIZE
  • LOG_BUFFERS
  • SHARED_POOL_SIZE

Alternatively, set the SGA_TARGET initialization parameter to enable automatic tuning of the SGA size.

Use caution when setting values for these parameters. When values are set too high, too much of the physical memory is devoted to shared memory. This results in poor performance.

An Oracle Database configured with Shared Server requires a higher setting for the SHARED_POOL_SIZE initialization parameter, or a custom configuration that uses the LARGE_POOL_SIZE initialization parameter. If you installed the database with Oracle Universal Installer, then the value of the SHARED_POOL_SIZE parameter is set automatically by Oracle Database Configuration Assistant. However, if you created a database manually, then increase the value of the SHARED_POOL_SIZE parameter in the parameter file by 1 KB for each concurrent user.

1.6.1 Determining the Size of the SGA

You can determine the SGA size in one of the following ways:

  • Run the following SQL*Plus command to display the size of the SGA for a running database:
SQL> SHOW SGA

The result is shown in bytes.

  • When you start your database instance, the size of the SGA is displayed next to the Total System Global Area heading.
  • On systems other than Mac OS X, run the ipcs command as the oracle user.
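
For example, on Linux or Solaris the shared memory segments that make up the SGA can be listed as follows:

$ ipcs -m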

1.6.2 Shared Memory on AIX

Note:

The information in this section applies only to AIX.

Shared memory uses common virtual memory resources across processes. Processes share virtual memory segments through a common set of virtual memory translation resources, for example, tables and cached entries, for improved performance.

Shared memory can be pinned to prevent paging and to reduce I/O overhead. To do this, set the LOCK_SGA parameter to true. On AIX 5L, the same parameter activates the large page feature whenever the underlying hardware supports it.

Run the following command to make pinned memory available to Oracle Database:

$ /usr/sbin/vmo -r -o v_pinshm=1

Run a command similar to the following to set the maximum percentage of real memory available for pinned memory, where percent_of_real_memory is the maximum percent of real memory that you want to set:

$ /usr/sbin/vmo -r -o maxpin%=percent_of_real_memory

When using the maxpin% option, it is important that the amount of pinned memory exceeds the Oracle SGA size by at least 3 percent of the real memory on the system, enabling free pinnable memory for use by the kernel. For example, if you have 2 GB of physical memory and you want to pin the SGA by 400 MB (20 percent of the RAM), then run the following command:

$ /usr/sbin/vmo -r -o maxpin%=23
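
The percentage in this example can be derived with a quick calculation (a minimal sketch of the arithmetic, rounding up and adding the 3 percent headroom):

RAM_MB=2048    # physical memory
SGA_MB=400     # SGA size to pin
PCT=$(( (SGA_MB * 100 + RAM_MB - 1) / RAM_MB + 3 ))
echo "set maxpin% to at least ${PCT}"    # prints 23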

Use the svmon command to monitor the use of pinned memory during the operation of the system. Oracle Database attempts to pin memory only if the LOCK_SGA parameter is set to true.

Large Page Feature on AIX POWER4- and POWER5-Based Systems

To turn on and reserve 10 large pages, each 16 MB in size, on a POWER4- or POWER5-based system, run the following command:

$ /usr/sbin/vmo -r -o lgpg_regions=10 -o lgpg_size=16777216

This command proposes bosboot and warns that a restart is required for the changes to take effect.

Oracle recommends specifying enough large pages to contain the entire SGA. The Oracle Database instance attempts to allocate large pages when the LOCK_SGA parameter is set to true. If the SGA size exceeds the size of memory available for pinning, or large pages, then the portion of the SGA exceeding these sizes is allocated to ordinary shared memory.

See Also:

The AIX documentation for more information about enabling and tuning pinned memory and large pages

1.7 Tuning the Operating System Buffer Cache

To take full advantage of raw devices, adjust the size of Oracle Database buffer cache. If memory is limited, then adjust the operating system buffer cache.

The operating system buffer cache holds blocks of data in memory while they are being transferred from memory to disk, or from disk to memory.

Oracle Database buffer cache is the area in memory that stores Oracle Database buffers. Because Oracle Database can use raw devices, it does not use the operating system buffer cache.

If you use raw devices, then increase the size of Oracle Database buffer cache. If the amount of memory on the system is limited, then make a corresponding decrease in the operating system buffer cache size.

Use the sar command to determine which buffer caches you must increase or decrease.

See Also:

The man page on Tru64 UNIX for more information about the sar command

Note:

On Tru64 UNIX, do not reduce the operating system buffer cache, because the operating system automatically resizes the amount of memory that it requires for buffering file system I/O. Restricting the operating system buffer cache can cause performance issues.

Tuning AIX for Oracle Database

  • Memory and Paging
  • Disk I/O Issues
  • CPU Scheduling and Process Priorities
  • Oracle Real Application Clusters Information
  • Setting the AIXTHREAD_SCOPE Environment Variable

Memory and Paging

Memory contention occurs when processes require more memory than is available. To cope with the shortage, the system pages programs and data between memory and disks.

Controlling Buffer-Cache Paging Activity

Excessive paging activity decreases performance substantially. This can become a problem with database files created on journaled file systems (JFS and JFS2). In this situation, a large number of SGA data buffers might also have analogous file system buffers containing the most frequently referenced data. The behavior of the AIX file buffer cache manager can have a significant impact on performance. It can cause an I/O bottleneck, resulting in lower overall system throughput.

On AIX, tuning buffer-cache paging activity is possible but you must do it carefully and infrequently. Use the /usr/samples/kernel/vmtune command to tune the following AIX system parameters:

Parameter Description
minfree The minimum free-list size. If the free-list space in the buffer falls below this size, the system uses page stealing to replenish the free list.
maxfree The maximum free-list size. If the free-list space in the buffer exceeds this size, the system stops using page stealing to replenish the free list.
minperm The minimum number of permanent buffer pages for file I/O.
maxperm The maximum number of permanent buffer pages for file I/O.

 

See Also:

For more information about AIX system parameters, see the AIX 5L Performance Management Guide.

Tuning the AIX File Buffer Cache

The purpose of the AIX file buffer cache is to reduce disk access frequency when journaled file systems are used. If this cache is too small, disk usage increases and potentially saturates one or more disks. If the cache is too large, memory is wasted.

See Also:

For more information about the implications of increasing the AIX file buffer cache, see “Controlling Buffer-Cache Paging Activity”.

You can configure the AIX file buffer cache by adjusting the minperm and maxperm parameters. In general, if the buffer hit ratio is low (less than 90 percent), as determined by the sar -b command, increasing the minperm parameter value might help. If maintaining a high buffer hit ratio is not critical, decreasing the minperm parameter value increases the physical memory available. Refer to the AIX documentation for more information about increasing the size of the AIX file buffer cache.
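For example, a hedged way to sample the buffer hit ratio over five 10-second intervals; the %rcache and %wcache columns report the read and write cache hit ratios:

$ sar -b 10 5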

The performance gain cannot be quantified easily, because it depends on the degree of multiprogramming and the I/O characteristics of the workload.

Tuning the minperm and maxperm Parameters

AIX provides a mechanism for you to loosely control the ratio of page frames used for files rather than those used for computational (working or program text) segments by adjusting the minperm and maxperm values according to the following guidelines:

  • If the percentage of real memory occupied by file pages falls below the minperm value, the virtual memory manager (VMM) page-replacement algorithm steals both file and computational pages, regardless of repage rates.
  • If the percentage of real memory occupied by file pages rises above the maxperm value, the virtual memory manager page-replacement algorithm steals both file and computational pages.
  • If the percentage of real memory occupied by file pages is between the minperm and maxperm parameter values, the virtual memory manager normally steals only file pages, but if the repaging rate for file pages is higher than the repaging rate for computational pages, the computational pages are stolen as well.

Use the following algorithm to calculate the default values:

  • minperm (in pages) = ((number of page frames)-1024) * 0.2
  • maxperm (in pages) = ((number of page frames)-1024) * 0.8
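As a rough worked example of this algorithm, consider a hypothetical system with 1,048,576 page frames (4 GB of RAM with a 4 KB page size):

minperm = (1048576 - 1024) * 0.2 ≈ 209,510 pages (approximately 818 MB)
maxperm = (1048576 - 1024) * 0.8 ≈ 838,042 pages (approximately 3.2 GB)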

Use the following command to change the value of the minperm parameter to 5 percent of the total number of page frames, and the value of the maxperm parameter to 20 percent of the total number of page frames:

# /usr/samples/kernel/vmtune -p 5 -P 20

The default values are 20 percent and 80 percent, respectively.

To optimize for quick response when opening new database connections, adjust the minfree parameter to maintain enough free pages in the system to load the application into memory without adding additional pages to the free list. To determine the real memory size (resident set size, working set) of a process, use the following command:

$ ps v process_id

Set the minfree parameter to this value or to 8 frames, whichever is larger.

If the database files are on raw devices, or if you are using Direct I/O, you can set the minperm and maxperm parameters to low values, for example 5 percent and 20 percent, respectively. This is because the AIX file buffer cache is not used either for raw devices or for Direct I/O. The memory might be better used for other purposes, such as for the Oracle System Global Area.

Allocating Sufficient Paging Space (Swap Space)

Inadequate paging space (swap space) usually causes the system to hang or suffer abnormally slow response times. On AIX, you can dynamically add paging space on raw disk partitions. The amount of paging space you should configure depends on the amount of physical memory present and the paging space requirements of your applications. Use the lsps command to monitor paging space use and the vmstat command to monitor system paging activities. To increase the paging space, use the smit pgsp command.

On platforms where paging space is pre-allocated, Oracle recommends that you set the paging space to a value larger than the amount of RAM. But on AIX paging space is not allocated until needed. The system uses swap space only if it runs out of real memory. If the memory is sized correctly, there is no paging and the page space can be small. Workloads where the demand for pages does not fluctuate significantly perform well with a small paging space. Workloads likely to have peak periods of increased paging require enough paging space to handle the peak number of pages.

As a general rule, an initial setting for the paging space is half the size of RAM plus 4 GB, with an upper limit of 32 GB. Monitor the paging space use with the lsps -a command, and increase or decrease the paging space size accordingly. The metric %Used in the output of lsps -a is typically less than 25% on a healthy system. A properly sized deployment should require very little paging space and an excessive amount of swapping is an indication that the RAM on the system might be undersized.

Caution:

Do not undersize the paging space. If you do, the system can terminate active processes when it runs out of space. However, over-sizing the paging space has little or no negative impact.

Controlling Paging

Constant and excessive paging indicates that the real memory is over-committed. In general, you should:

  • Avoid constant paging unless the system is equipped with very fast expanded storage that makes paging between memory and expanded storage much faster than Oracle can read and write data between the SGA and disks.
  • Allocate limited memory resource to where it is most beneficial to system performance. It is sometimes a recursive process of balancing the memory resource requirements and trade-offs.
  • If memory is not adequate, build a prioritized list of memory-requiring processes and elements of the system. Assign memory to where the performance gains are the greatest. A prioritized list might look like:
  1. OS and RDBMS kernels
  2. User and application processes
  3. Redo log buffer
  4. PGAs and shared pool
  5. Database block buffer caches

For instance, if you query Oracle dynamic performance tables and views and find that both the shared pool and database buffer cache require more memory, assigning the limited spare memory to the shared pool might be more beneficial than assigning it to the database block buffer caches.

The following AIX commands provide paging status and statistics:

  • vmstat -s
  • vmstat interval [repeats]
  • sar -r interval [repeats]
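For example, a hedged way to use these commands is to take ten 5-second samples; in the vmstat output, sustained non-zero values in the pi and po columns (pages paged in from and out to paging space) indicate that real memory is over-committed:

$ vmstat 5 10
$ sar -r 5 10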

Setting the Database Block Size

You can configure the Oracle database block size for better I/O throughput. On AIX, you can set the value of the DB_BLOCK_SIZE initialization parameter to between 2 KB and 32 KB, with a default of 4 KB. If the Oracle database is installed on a journaled file system, then the block size should be a multiple of the file system block size (4 KB on JFS, 16 KB to 1 MB on GPFS). For databases on raw partitions, the Oracle database block size should be a multiple of the operating system physical block size (512 bytes on AIX).

Oracle recommends smaller Oracle database block sizes (2 KB or 4 KB) for online transaction processing (OLTP) or mixed workload environments and larger block sizes (8 KB, 16 KB, or 32 KB) for decision support system (DSS) workload environments.

Tuning the Log Archive Buffers

By increasing the LOG_BUFFER size you might be able to improve the speed of archiving the database, particularly if transactions are long or numerous. Monitor the log file I/O activity and system throughput to determine the optimum LOG_BUFFER size. Tune the LOG_BUFFER parameter carefully to ensure that the overall performance of normal database activity does not degrade.
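One hedged way to judge whether the LOG_BUFFER value is adequate is to watch how often sessions wait for space in the redo log buffer; steadily increasing values for these statistics suggest that the buffer is too small:

SQL> SELECT name, value FROM v$sysstat WHERE name IN ('redo buffer allocation retries', 'redo log space requests');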

Note:

The LOG_ARCHIVE_BUFFER_SIZE parameter was obsoleted with Oracle8i.

I/O Buffers and SQL*Loader

For high-speed data loading, such as using the SQL*Loader direct path option in addition to loading data in parallel, the CPU spends most of its time waiting for I/O to complete. By increasing the number of buffers, you can usually push the CPU usage harder, thereby increasing overall throughput.

The number of buffers (set by the SQL*Loader BUFFERS parameter) you choose depends on the amount of available memory and how hard you want to push CPU usage. See Oracle Database Utilities for information about adjusting the file processing options string for the BUFFERS parameter.

The performance gains depend on CPU usage and the degree of parallelism that you use when loading data.
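As a minimal hedged sketch (the credentials and control file name are placeholders), a direct path load started from the command line might look like the following; for a parallel direct load, you would start several such sessions, each loading its own input file:

$ sqlldr userid=scott/tiger control=load_data.ctl direct=true parallel=true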

See Also:

For more generic information about the SQL*Loader utility, see Oracle Database Utilities.

BUFFER Parameter for the Import Utility

The BUFFER parameter for the Import utility should be set to a large value to optimize the performance of high-speed networks when they are used. For instance, if you use the IBM RS/6000 Scalable POWERparallel Systems (SP) switch, you should set the BUFFER parameter to a value of at least 1 MB.
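For example, a hedged invocation of the original Import utility with a 1 MB buffer (the credentials and dump file name are placeholders):

$ imp system/manager file=expdat.dmp full=y buffer=1048576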

Disk I/O Issues

Disk I/O contention can result from poor memory management (with subsequent paging and swapping), or poor distribution of tablespaces and files across disks.

Make sure that the I/O activity is distributed evenly across multiple disk drives by using AIX utilities such as filemon, sar, iostat, and other performance tools to identify any disks with high I/O activity.
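For example, a hedged way to sample disk activity is to take a few 5-second samples; disks that consistently show a much higher % tm_act value than the others are candidates for redistributing data files:

$ iostat 5 3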

AIX Logical Volume Manager

The AIX Logical Volume Manager (LVM) can stripe data across multiple disks to reduce disk contention. The primary objective of striping is to achieve high performance when reading and writing large sequential files. Effective use of the striping features in the LVM allows you to spread I/O more evenly across disks, resulting in greater overall performance.

Note:

Do not add logical volumes to Automatic Storage Management (ASM) disk groups. ASM works best when you add raw disk devices to disk groups. If you are using ASM, do not use LVM for striping. Automatic Storage Management implements striping and mirroring.

Design a Striped Logical Volume

When you define a striped logical volume, you must specify the following items:

Item Recommended Settings
Drives At least two physical drives. The drives should have minimal activity when performance-critical sequential I/O is executed. Sometimes you might need to stripe the logical volume between two or more adapters.
Stripe unit size Although the stripe unit size can be any power of two from 2 KB to 128 KB, stripe sizes of 32 KB and 64 KB are good values for most workloads. For Oracle database files, the stripe size must be a multiple of the database block size.
Size The number of physical partitions allocated to the logical volume must be a multiple of the number of disk drives used.
Attributes Cannot be mirrored. Set the copies attribute to a value of 1.
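A hedged sketch of creating such a striped logical volume, assuming a 64 KB stripe unit, 128 logical partitions, and four disks in a volume group named datavg (all of these values are placeholders; verify the mklv options for your AIX level):

# mklv -y oradata01 -S 64K datavg 128 hdisk2 hdisk3 hdisk4 hdisk5

The number of logical partitions (128) is a multiple of the number of disks (4), and the copies attribute defaults to 1, as recommended above.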

 

Other Considerations

Performance gains from effective use of the LVM can vary greatly, depending on the LVM you use and the characteristics of the workload. For DSS workloads, you can see substantial improvement. For OLTP-type or mixed workloads, you can still expect significant performance gains.

Using Journaled File Systems Compared to Raw Logical Volumes

Note the following considerations when you are deciding whether to use journaled file systems or raw logical volumes:

  • File systems are continually being improved, as are various file system implementations. In some cases, file systems provide better I/O performance than raw devices.
  • File systems require some additional configuration (the AIX minservers and maxservers parameters) and add a small CPU overhead, because asynchronous I/O on file systems is serviced outside the kernel.
  • Different vendors implement the file system layer in different ways to exploit the strengths of different disks. This makes it difficult to compare file systems across platforms.
  • The introduction of more powerful LVM interfaces substantially reduces the tasks of configuring and backing up logical disks based on raw logical volumes.
  • The Direct I/O and Concurrent I/O feature included in AIX 5L improves file system performance to a level comparable to raw logical volumes.

If you use a journaled file system, it is easier to manage and maintain database files than if you use raw devices. In earlier versions of AIX, file systems supported only buffered read and write and added extra contention because of imperfect inode locking. These two issues are solved by the JFS2 Concurrent I/O feature and the GPFS Direct I/O feature, enabling file systems to be used instead of raw devices, even when optimal performance is required.

Note:

To use the Oracle Real Application Clusters option, you must place data files in an ASM disk group on raw devices or on a GPFS file system. You cannot use JFS or JFS2. Direct I/O is implicitly enabled when you use GPFS.

File System Options

AIX 5L includes Direct I/O and Concurrent I/O support. Direct I/O and Concurrent I/O support allows database files to exist on file systems while bypassing the operating system buffer cache and removing inode locking operations that are redundant with the features provided by Oracle Database.

Where possible, Oracle recommends enabling Concurrent I/O or Direct I/O on file systems containing Oracle data files. The following table lists file systems available on AIX and the recommended setting.

File System Option Description
JFS dio Concurrent I/O is not available on JFS. Direct I/O (dio) is available, but performance is degraded compared to JFS2 with Concurrent I/O.
JFS large file none Oracle does not recommend using JFS large file for Oracle Database because its 128 KB alignment constraint prevents you from using Direct I/O.
JFS2 cio Concurrent I/O (cio) is a better setting than Direct I/O (dio) on JFS2 because it has support for multiple concurrent readers and writers on the same file.
GPFS N/A Oracle Database silently enables Direct I/O on GPFS for optimum performance. GPFS’ Direct I/O already supports multiple readers and writers on multiple nodes. Therefore, Direct I/O and Concurrent I/O are the same thing on GPFS.

 

Considerations for JFS and JFS2

If you are placing Oracle Database logs on a JFS2 file system, the optimal configuration is to create the file system using the agblksize=512 option and to mount it with the cio option. This delivers logging performance within a few percentage points of the performance of a raw device.

Before Oracle Database 10g, Direct I/O and Concurrent I/O could not be enabled at the file level on JFS/JFS2. Therefore, the Oracle home directory and data files had to be placed in separate file systems for optimal performance: the Oracle home directory on a file system mounted with default options, and the data files and logs on file systems mounted using the dio or cio options.

With Oracle Database 10g, you can enable Direct I/O and Concurrent I/O on JFS/JFS2 at the individual file level. You can do this by setting the FILESYSTEMIO_OPTIONS parameter in the server parameter file to setall (the default) or directIO. This enables Concurrent I/O on JFS2 and Direct I/O on JFS for all data file I/O. As a result, you can place data files on the same JFS/JFS2 file system as the Oracle home directory. As mentioned above, you should still place Oracle Database logs on a separate JFS2 file system for optimal performance.
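A hedged sketch of this layout (the volume group, mount point, and size are placeholders, and the crfs size syntax varies by AIX level): create a JFS2 file system for the logs with a 512-byte allocation block size, mount it with Concurrent I/O, and let the database enable Concurrent I/O on the data files through the FILESYSTEMIO_OPTIONS parameter:

# crfs -v jfs2 -g datavg -m /oralog -a size=4G -a agblksize=512
# mount -o cio /oralog
SQL> ALTER SYSTEM SET FILESYSTEMIO_OPTIONS = SETALL SCOPE=SPFILE;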

Considerations for GPFS

If you are using GPFS, you can use the same file system for all purposes including the Oracle home directory, data files, and logs. For optimal performance, you should use a large GPFS block size (typically at least 512 KB). GPFS is designed for scalability and there is no requirement to create multiple GPFS file systems as long as the amount of data fits in a single GPFS file system.

Moving from a Journaled File System to Raw Logical Volumes

To move from a journaled file system to raw devices without having to manually reload all of the data, perform the following as the root user:

  1. Create a raw device (preferably in a Big VG) using the new raw logical volume device type (-T O), which allows putting the first Oracle block at offset zero for optimal performance:

# mklv -T O -y new_raw_device VolumeGroup NumberOfPartitions

Note:

The raw device should be larger than the existing file. Keep the size of the new raw device in mind to prevent wasting space.

  2. Set the permissions on the raw device.
  3. Use dd to convert and copy the contents of the JFS file to the new raw device, as follows:

# dd if=old_JFS_file of=new_raw_device bs=1m

  4. Rename the data file.

Moving from Raw Logical Volumes to a Journaled File System

The first Oracle block on a raw logical volume is not necessarily at offset zero, whereas the first Oracle block on a file system is always at offset zero. To determine the offset and locate the first block on a raw logical volume, use the $ORACLE_HOME/bin/offset command. The offset can be 4096 bytes or 128 KB on AIX logical volumes or zero on AIX logical volumes created with the mklv -T O option.

When you have determined the offset, you can copy over data from a raw logical volume to a file system using the dd command and skipping the offset. The following example assumes an offset of 4096 bytes:

# dd if=old_raw_device bs=4k skip=1|dd of=new_file bs=256k

You can instruct Oracle Database to use a number of blocks smaller than the maximum capacity of a raw logical volume. If you do, add a count clause to ensure that only data that contains Oracle blocks is copied. The following example assumes an offset of 4096 bytes, an Oracle block size of 8 KB, and 150000 blocks:

# dd if=old_raw_device bs=4k skip=1|dd bs=8k count=150000|dd of=new_file bs=256k

Using Asynchronous I/O

Oracle Database takes full advantage of asynchronous I/O (AIO) provided by AIX, resulting in faster database access.

AIX 5L supports asynchronous I/O (AIO) for database files created both on file system partitions and on raw devices. AIO on raw devices is implemented fully in the AIX kernel and does not require database processes to service the AIO requests. When using AIO on file systems, the kernel server processes (aioserver) control each request from the time a request is taken off the queue until it completes. The kernel server processes are also used for I/O with virtual shared disks (VSDs) and HSDs when FastPath is disabled. By default, FastPath is enabled. The number of aioserver processes determines the number of AIO requests that can be executed in the system concurrently, so it is important to tune the number of aioserver processes when using file systems to store Oracle Database data files.

Note:

If you are using AIO with VSDs and HSDs with AIO FastPath enabled (the default), the maximum buddy buffer size must be greater than or equal to 128 KB.

Use one of the following commands to set the number of servers. This applies only when using asynchronous I/O on file systems rather than raw devices:

  • smit aio
  • chdev -l aio0 -a maxservers='m' -a minservers='n'
See Also:

For more information about SMIT, see the System Management Interface Tool (SMIT) online help, and for more information about the smit aio and chdev commands, see the man pages.

Note:

Starting with AIX 5L version 5.2, there are two AIO subsystems available. Oracle Database 10g uses Legacy AIO (aio0), even though the Oracle pre-installation script enables Legacy AIO (aio0) and POSIX AIO (posix_aio0). Both AIO subsystems have the same performance characteristics.

Set the minimum value to the number of servers to be started at system boot. Set the maximum value to the number of servers that can be started in response to a large number of concurrent requests. These parameters apply only to file systems; they do not apply to raw devices.

The default value for the minimum number of servers is 1. The default value for the maximum number of servers is 10. These values are usually too low to run Oracle Database on large systems with 4 CPUs or more, if you are not using kernelized AIO. Oracle recommends that you set the parameters to the values listed in the following table:

Parameter Value
minservers Oracle recommends an initial value equal to the number of CPUs on the system or 10, whichever is lower.
maxservers Starting with AIX 5L version 5.2, this parameter counts the maximum number of AIO servers per CPU, whereas on previous versions of AIX it was a system-wide value. If you are using GPFS, set maxservers to worker1threads divided by the number of CPUs. This is the optimal setting and increasing maxservers will not lead to additional I/O performance. If you are using JFS/JFS2, set the initial value to (10 * number of logical disks / number of CPUs) and monitor the actual number of aioservers started during a typical workload using the pstat or ps commands. If the actual number of active aioservers is equal to the maxservers, then increase the maxservers value.
maxreqs Set the initial value to (4 * number of logical disks * queue depth). You can determine the queue depth (typically 3), by running the following command:
$ lsattr -E -l hdiskxx
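As a hedged worked example of these formulas for a JFS2 configuration with 20 logical disks, 4 CPUs, and a queue depth of 3 (adjust for your own system, and note that changing aio0 attributes may require the -P flag and a restart if AIO is already active):

maxservers = (10 * 20) / 4 = 50 (per CPU on AIX 5L version 5.2 or later)
maxreqs = 4 * 20 * 3 = 240
# chdev -l aio0 -a minservers=10 -a maxservers=50 -a maxreqs=240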

 

If the value of the maxservers or maxreqs parameter is set too low, you will see the following warning messages repeated:

Warning: lio_listio returned EAGAIN
Performance degradation may be seen.

You can avoid these errors by increasing the value of the maxservers parameter. To display the number of AIO servers running, enter the following commands as the root user:

# pstat -a | grep -c aios
# ps -k | grep aioserver

Check the number of active AIO servers periodically and change the values of the minservers and maxservers parameters if necessary. The changes take effect when the system restarts.

I/O Slaves

I/O Slaves are specialized Oracle processes that perform only I/O. They are rarely used on AIX, as asynchronous I/O is the default and recommended way for Oracle to perform I/O operations on AIX. I/O Slaves are allocated from shared memory buffers. I/O Slaves use a set of initialization parameters, listed in the following table.

Parameter Range of Values Default Value
DISK_ASYNCH_IO true/false true
TAPE_ASYNCH_IO true/false true
BACKUP_TAPE_IO_SLAVES true/false false
DBWR_IO_SLAVES 0 – 999 0
DB_WRITER_PROCESSES 1-20 1

 

Generally, you do not need to adjust the parameters in the preceding table. However, on large workloads, the database writer might become a bottleneck. If it does, increase DB_WRITER_PROCESSES. As a general rule, do not increase the number of database writer processes above one for each 2 CPUs in the system or partition.

There are times when you need to turn off asynchronous I/O, for example, if instructed to do so by Oracle Support for debugging. You can use the DISK_ASYNCH_IO and TAPE_ASYNCH_IO parameters to switch off asynchronous I/O for disk or tape devices. Because the number of I/O slaves for each process type defaults to zero, by default no I/O Slaves are deployed.

Set the DBWR_IO_SLAVES parameter to greater than 0 only if the DISK_ASYNCH_IO or TAPE_ASYNCH_IO parameter is set to false. Otherwise, the database writer process (DBWR) becomes a bottleneck. In this case, the optimal value on AIX for the DBWR_IO_SLAVES parameter is 4.
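A hedged initialization parameter sketch for this debugging scenario, with asynchronous I/O disabled for disk and four database writer I/O slaves (the values are illustrative only):

disk_asynch_io = false
dbwr_io_slaves = 4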

Using the DB_FILE_MULTIBLOCK_READ_COUNT Parameter

By default, Oracle Database 10g uses Direct I/O or Concurrent I/O when available, and therefore the file system does not perform any read-ahead on sequential scans. The read ahead is performed by Oracle Database as specified by the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter.

Setting a large value for the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter usually yields better I/O throughput on sequential scans. On AIX, this parameter ranges from 1 to 512, but using a value higher than 16 usually does not provide additional performance gain.

Set this parameter so that its value when multiplied by the value of the DB_BLOCK_SIZE parameter produces a number larger than the LVM stripe size. Such a setting causes more disks to be used.
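As a hedged worked example: with an 8 KB DB_BLOCK_SIZE and a 64 KB LVM stripe size, setting DB_FILE_MULTIBLOCK_READ_COUNT to 16 gives 16 * 8 KB = 128 KB per multiblock read, which is larger than the stripe size and therefore spreads each read across more than one disk:

db_file_multiblock_read_count = 16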

Using Write Behind

The write behind feature enables the operating system to group write I/Os together up to the size of a partition. Doing this increases performance because the number of I/O operations is reduced. The file system divides each file into 16 KB partitions to increase write performance, limit the number of dirty pages in memory, and minimize disk fragmentation. The pages of a particular partition are not written to disk until the program writes the first byte of the next 16 KB partition. To set the size of the buffer for write behind to eight 16 KB partitions, enter the following command:

# /usr/samples/kernel/vmtune -c 8

To disable write behind, enter the following command:

# /usr/samples/kernel/vmtune -c 0

Tuning Sequential Read Ahead

The information in this section applies only to file systems, and only when neither Direct I/O nor Concurrent I/O are used.

The Virtual Memory Manager (VMM) anticipates the need for pages of a sequential file. It observes the pattern in which a process accesses a file. When the process accesses two successive pages of the file, the VMM assumes that the program will continue to access the file sequentially, and schedules additional sequential reads of the file. These reads overlap the program processing and make data available to the program sooner. Two VMM thresholds, implemented as kernel parameters, determine the number of pages it reads ahead:

  • minpgahead

The number of pages read ahead when the VMM first detects the sequential access pattern

  • maxpgahead

The maximum number of pages that VMM reads ahead in a sequential file

Set the minpgahead and maxpgahead parameters to appropriate values for your application. The default values are 2 and 8 respectively. Use the /usr/samples/kernel/vmtune command to change these values. You can use higher values for the maxpgahead parameter in systems where the sequential performance of striped logical volumes is of paramount importance. To set the minpgahead parameter to 32 pages and the maxpgahead parameter to 64 pages, enter the following command as the root user:

# /usr/samples/kernel/vmtune -r 32 -R 64

Set both the minpgahead and maxpgahead parameters to a power of two, for example, 2, 4, 8, … 512, 1024, and so on.

Tuning Disk I/O Pacing

Disk I/O pacing is an AIX mechanism that allows the system administrator to limit the number of pending I/O requests to a file. This prevents disk I/O intensive processes from saturating the CPU. Therefore, the response time of interactive and CPU-intensive processes does not deteriorate.

You can achieve disk I/O pacing by adjusting two system parameters: the high-water mark and the low-water mark. When a process writes to a file that already has high-water mark pending I/O requests, the process is put to sleep. The process wakes up when the number of outstanding I/O requests falls to or below the low-water mark.

You can use the smit command to change the high and low-water marks. Determine the water marks through trial-and-error. Use caution when setting the water marks because they affect performance. Tuning the high and low-water marks has less effect on disk I/O larger than 4 KB.
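Instead of smit, you can also set the water marks directly as attributes of sys0. A hedged sketch using the classic starting values of 33 (high) and 24 (low), which you should validate against your own workload:

# chdev -l sys0 -a maxpout=33 -a minpout=24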

You can determine disk I/O saturation by analyzing the result of iostat, in particular, the percentage of iowait and tm_act. A high iowait percentage combined with high tm_act percentages on specific disks is an indication of disk saturation. Note that a high iowait alone is not necessarily an indication of I/O bottleneck.

Minimizing Remote I/O Operations

Oracle Real Application Clusters running on the SP architecture uses VSDs or HSDs as the common storage that is accessible from all instances on different nodes. If an I/O request is to a VSD where the logical volume is local to the node, local I/O is performed. The I/O traffic to VSDs that are not local goes through network communication layers.

For better performance, it is important to minimize remote I/O as much as possible. Redo logs of each instance should be placed on the VSDs that are on local logical volumes. Each instance should have its own undo segments that are on VSDs mapped to local logical volumes if updates and insertions are intensive.

In each session, each user is allowed only one temporary tablespace. The temporary tablespaces should each contain at least one data file local to each of the nodes.

Carefully design applications and databases (by partitioning applications and databases, for instance) to minimize remote I/O.

Resilvering with Oracle Database

If you disable mirror write consistency (MWC) for an Oracle data file allocated on a raw logical volume (LV), the Oracle Database crash recovery process uses resilvering to recover after a system crash. This resilvering process prevents database inconsistencies or corruption.

During crash recovery, if a data file is allocated on a logical volume with more than one copy, the resilvering process performs a checksum on the data blocks of all of the copies. It then performs one of the following:

  • If the data blocks in a copy have valid checksums, the resilvering process uses that copy to update the copies that have invalid checksums.
  • If all copies have blocks with invalid checksums, the resilvering process rebuilds the blocks using information from the redo log file. It then writes the data file to the logical volume and updates all of the copies.

On AIX, the resilvering process works only for data files allocated on raw logical volumes for which MWC is disabled. Resilvering is not required for data files on mirrored logical volumes with MWC enabled, because MWC ensures that all copies are synchronized.

If the system crashes while you are upgrading a previous release of Oracle Database that used data files on logical volumes for which MWC was disabled, enter the syncvg command to synchronize the mirrored LV before starting Oracle Database. If you do not synchronize the mirrored LV before starting the database, Oracle Database might read incorrect data from an LV copy.

Note:

If a disk drive fails, resilvering does not occur. You must enter the syncvg command before you can reactivate the LV.

Caution:

Oracle supports resilvering for data files only. Do not disable MWC for redo log files.

Backing Up Raw Devices

Oracle recommends that you use RMAN to back up raw devices. If you do use the dd command to back up raw devices, use it with caution, as follows.

The offset of the first Oracle block on a raw device may be 0, 4K or 128K depending on the device type. You can use the offset command to determine the proper offset.

When creating a logical volume, Oracle recommends using an offset of zero, which is possible if you use the -T O option. However, existing raw logical volumes created with earlier versions of Oracle Database typically have a non-zero offset. The following example shows how to back up and restore a raw device whose first Oracle block is at offset 4K:

$ dd if=/dev/raw_device of=/dev/rmt0.1 bs=256k

To restore the raw device from tape, enter commands similar to the following:

$ dd if=/dev/rmt0.1 of=/dev/raw_device count=63 seek=1 skip=1 bs=4k
$ mt -f /dev/rmt0.1 bsf 1
$ dd if=/dev/rmt0.1 of=/dev/raw_device seek=1 skip=1 bs=256k

CPU Scheduling and Process Priorities

The CPU is another system component for which processes might contend. Although the AIX kernel allocates CPU effectively most of the time, many processes compete for CPU cycles. If your system has more than one CPU (SMP), there might be different levels of contention on each CPU.

Changing Process Running Time Slice

The default value for the runtime slice of the AIX RR dispatcher is ten milliseconds. Use the schedtune command to change the time slice. However, be careful when using this command. A longer time slice causes a lower context switch rate if the applications’ average voluntary switch rate is lower. As a result, fewer CPU cycles are spent on context-switching for a process and the system throughput should improve.

However, a longer runtime slice can deteriorate response time, especially on a uniprocessor system. The default runtime slice is usually acceptable for most applications. When the run queue is high and most of the applications and Oracle shadow processes are capable of running a much longer duration, you might want to increase the time slice by entering the following command:

# /usr/samples/kernel/schedtune -t n

In the previous command, choosing a value for n of 0 results in a slice of 10 milliseconds (ms), choosing a value of 1 results in a slice of 20 ms, choosing a value of 2 results in a slice of 30 ms, and so on.

Using Processor Binding on SMP Systems

Binding certain processes to a processor can improve performance substantially on an SMP system. Processor binding is available and fully functional on AIX 5L.

However, starting with AIX 5L version 5.2, specific improvements in the AIX scheduler allow Oracle Database processes to be scheduled optimally without the need for processor binding. Therefore, Oracle no longer recommends binding processes to processors when running on AIX 5L version 5.2 or later.

Oracle Real Application Clusters Information

The following sections provide information about Oracle Real Application Clusters.

UDP Tuning

Oracle Real Application Clusters uses User Datagram Protocol (UDP) for interprocess communications on AIX. You can tune UDP kernel settings to improve Oracle performance. You can modify kernel UDP buffering on AIX by changing the udp_sendspace and udp_recvspace parameters. The udp_sendspace value must always be greater than the value of the Oracle Database DB_BLOCK_SIZE initialization parameter. Otherwise, one or more of the Oracle Real Application Clusters instances will fail at startup. Use the following guidelines when tuning these parameters:

  • Set the value of the udp_sendspace parameter to the product of DB_BLOCK_SIZE and DB_FILE_MULTIBLOCK_READ_COUNT, plus 4 KB. For example, with a 16 KB block size and a DB_FILE_MULTIBLOCK_READ_COUNT of 16, set udp_sendspace to 260 KB, that is, 266240 bytes (a sketch applying these guidelines follows this list).
  • Set the value of the udp_recvspace parameter to at least ten times the value of the udp_sendspace parameter.
  • The value of the udp_recvspace parameter must be less than the value of the sb_max parameter.
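A hedged sketch of applying these guidelines with the no command, continuing the 16 KB block size example above (check sb_max first and raise it if it is below the new udp_recvspace value; whether the settings persist across a restart depends on the AIX level, so verify whether you need the -p or -r flag or an entry in /etc/rc.net):

# no -a | grep sb_max
# no -o udp_sendspace=266240
# no -o udp_recvspace=2662400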

To monitor the suitability of the udp_recvspace parameter settings, enter the following command:

$ netstat -p udp | grep "socket buffer overflows"

If the number of overflows is not zero, increase the value of the udp_recvspace parameter. You can use the following command to reset error counters before monitoring again:

$ netstat -Zs -p udp
See Also:

For information about setting these parameters, see the Oracle Real Application Clusters Installation and Configuration Guide. For additional information about AIX tuning parameters, see the AIX 5L Performance Management Guide.

Network Tuning for Transparent Application Failover

If you are experiencing Transparent Application Failover time of more than 10 minutes, consider tuning network parameters rto_length, rto_low, and rto_high to reduce the failover time.

The lengthy Transparent Application Failover time is caused by a TCP timeout and retransmission problem in which clients connected to a crashed node do not receive acknowledgement from the failed instance. Consequently, the client continues to retransmit the same packet again and again using an Exponential Backoff algorithm (refer to TCP/IP documentation for more information).

On AIX, the default timeout value is set to approximately 9 minutes. You can use the no command to tune this parameter using the load time attributes rto_length, rto_low, and rto_high. Using these parameters, you can control how often and how many times a client should retransmit the same packet before it gives up. The rto_low (default is 1 second) and rto_high (default is 64 seconds) parameters control how often to transmit the packet, while the rto_length (default is 13) parameter controls how many times to transmit the packet.

For example, using the Exponential Backoff algorithm with the AIX default values, the timeout value is set to approximately 9.3 minutes. However, using the same algorithm, and setting rto_length to 7, the timeout value is reduced to 2.5 minutes.
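A hedged sketch of the rto_length change described above (rto_length, rto_low, and rto_high are load-time attributes, so on most AIX 5L levels the change requires the -r flag and takes effect after a restart; verify the procedure for your AIX level):

# no -r -o rto_length=7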

Note:

Check the quality of the network transmission before setting any of the parameters described in this section. You can check the quality of the network transmission using the netstat command. Bad quality network transmissions might require a longer timeout value.

Oracle Real Application Clusters and HACMP or PSSP

With Oracle Database 10g, Real Application Clusters (RAC) uses the group services provided by the AIX 5L RSCT Peer Domains (RPD). RAC no longer relies on specific services provided by HACMP or PSSP. In particular, there is no need to configure the PGSD_SUBSYS variable in the information repository.

RAC remains compatible with HACMP and PSSP. HACMP is typically present when shared logical raw volumes are used instead of a GPFS file system. PSSP is present when an SP Switch or SP Switch 2 is used as the interconnect.

If you are using an IP-based interconnect, such as Gigabit Ethernet, IEEE 802.3ad Link Aggregation, EtherChannel, or IP over SP Switch, RAC determines the name of the interface(s) to use, as specified by the CLUSTER_INTERCONNECTS parameter in the server parameter file.

Oracle Real Application Clusters and Fault Tolerant IPC

When the interconnect (IPC) used by Real Application Clusters 10g is based on the Internet Protocol (IP), RAC takes advantage of the fault tolerance and link aggregation that is built in AIX 5L via the IEEE 802.3ad Link Aggregation and/or EtherChannel technologies. This replaces the Fault Tolerant IPC feature (FT-IPC) that was used in previous versions of Real Application Clusters.

Link Aggregation using 802.3ad provides the same level of fault tolerance and adds support for bandwidth aggregation. It also simplifies the configuration of Real Application Clusters.

RAC determines which IP interface(s) to use by looking up the server parameter file for the CLUSTER_INTERCONNECTS parameter. This parameter typically contains only the name of IP interface created through IEEE 802.3ad Link Aggregation or EtherChannel. For more information refer to the AIX System Management Guide: Communications and Networks: EtherChannel and IEEE 802.3ad Link Aggregation.

Setting the AIXTHREAD_SCOPE Environment Variable

Threads in AIX can run with process-wide contention scope (M:N) or with system-wide contention scope (1:1). The AIXTHREAD_SCOPE environment variable controls which contention scope is used.

The default value of the AIXTHREAD_SCOPE environment variable is P, which specifies process-wide contention scope. When using process-wide contention scope, Oracle threads are mapped to a pool of kernel threads. When Oracle is waiting on an event and its thread is swapped out, it may return on a different kernel thread with a different thread ID. Oracle uses the thread ID to post waiting processes, so it is important for the thread ID to remain the same. When using system-wide contention scope, Oracle threads are mapped to kernel threads statically, one to one. For this reason, Oracle recommends using system-wide contention scope. The use of system-wide contention scope is especially critical for Oracle Real Application Clusters (RAC) instances. Additionally, on AIX 5L version 5.2 or later, if you set system-wide contention scope, significantly less memory is allocated to each Oracle process.

Oracle recommends that you set the value of the AIXTHREAD_SCOPE environment variable to S in the environment script that you use to set the ORACLE_HOME or ORACLE_SID environment variables for an Oracle database instance or an Oracle Net listener process, as follows:

  • Bourne, Bash, or Korn shell:

Add the following line to the ~/.profile or /usr/local/bin/oraenv script:

AIXTHREAD_SCOPE=S; export AIXTHREAD_SCOPE
  • C shell:

Add the following line to the ~/.login or /usr/local/bin/coraenv script:

setenv AIXTHREAD_SCOPE S

Doing this enables system-wide thread scope for running all Oracle processes.

References :  Oracle® Database Administrator’s Reference
10g Release 1 (10.1) for UNIX Systems: AIX-Based Systems, Apple Mac OS X, hp HP-UX, hp Tru64 UNIX, Linux, and Solaris Operating System
Part No. B10812-06