
Setup AWS Cloudwatch Memory and Drive Monitoring on RHEL

Download Scripts

Install Prerequisite Packages

sudo yum install wget unzip perl-core perl-DateTime perl-Sys-Syslog perl-CPAN perl-libwww-perl perl-Crypt-SMIME perl-Crypt-SSLeay

Install LWP Perl Bundles

  1. Launch cpan
    sudo perl -MCPAN -e shell
  2. Install Bundle
    install Bundle::LWP6 LWP YAML

Install Script

wget http://aws-cloudwatch.s3.amazonaws.com/downloads/CloudWatchMonitoringScripts-1.2.1.zip
unzip CloudWatchMonitoringScripts-1.2.1.zip -d /opt
rm -f CloudWatchMonitoringScripts-1.2.1.zip

Setup Credentials

API Access Key (Option 1)

This is good for testing, but it’s better to use IAM roles covered in Option 2.

  1. Copy awscreds template
    cp /opt/aws-scripts-mon/awscreds.template /opt/aws-scripts-mon/awscreds.conf
  2. Add access key id and secret access key
    vim /opt/aws-scripts-mon/awscreds.conf
  3. Lock down file access
    chmod 0400 /opt/aws-scripts-mon/awscreds.conf
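The resulting awscreds.conf is just a two-line key/value file, following the shipped awscreds.template (placeholder values shown here, not real keys):

```
AWSAccessKeyId=YOUR_ACCESS_KEY_ID
AWSSecretKey=YOUR_SECRET_ACCESS_KEY
```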

IAM Role (Option 2)

  1. Login to AWS web console
  2. Select Identity & Access Management
  3. Select Roles | Create New Role
  4. Enter Role Name
    1. i.e. ec2-cloudwatch
  5. Select Next Step
  6. Select Amazon EC2
  7. Search for cloudwatch
  8. Select CloudwatchFullAccess
  9. Select Next Step | Create Role
  10. Launch a new instance and assign the ec2-cloudwatch IAM role

You cannot add an IAM role to an existing EC2 instance; you can only specify a role when you launch a new instance.



Verify Setup

Run the script with --verify first; it collects the metrics and builds the request but won’t send data to Cloudwatch.

/opt/aws-scripts-mon/mon-put-instance-data.pl --mem-util --verify --verbose


MemoryUtilization: 31.7258903184253 (Percent)
Using AWS credentials file <./awscreds.conf>
Endpoint: https://monitoring.us-west-2.amazonaws.com
Payload: {"MetricData":[{"Timestamp":1443537153,"Dimensions":[{"Value":"i-12e1fac4","Name":"InstanceId"}],"Value":31.7258903184253,"Unit":"Percent","MetricName":"MemoryUtilization"}],"Namespace":"System/Linux","__type":"com.amazonaws.cloudwatch.v2010_08_01#PutMetricDataInput"}

Verification completed successfully. No actual metrics sent to CloudWatch.

Report to Cloudwatch Test

Test that communication to Cloudwatch works and design the command you’ll want to cron out in the next step.

/opt/aws-scripts-mon/mon-put-instance-data.pl --mem-util --mem-used --mem-avail

After you run this command, a single point-in-time metric should show up for the instance under Cloudwatch | Linux System.

Create Cron Task (as root)

Now that you’ve tested the command and figured out what you want to report, it’s time to add a cron task so it runs every X minutes. Usually 5 minutes is good.

  1. Edit cron table
    crontab -e
  2. Add cron job
    */5 * * * * /opt/aws-scripts-mon/mon-put-instance-data.pl --mem-util --mem-used --mem-avail --disk-space-util --disk-path=/ --from-cron
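If the */5 syntax is unfamiliar: it matches every minute evenly divisible by 5. A quick sketch of which minutes of an hour would fire:

```shell
# List the minutes of an hour matched by the cron minute field "*/5".
seq 0 59 | awk '$1 % 5 == 0' | paste -sd' ' -
# -> 0 5 10 15 20 25 30 35 40 45 50 55
```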

Create Cron Task (as other user)

You may want to create a dedicated user to run the cron job. Here’s an example using a user named cloudwatch.

  1. Create user
    useradd cloudwatch
  2. Disable user login
    usermod -s /sbin/nologin cloudwatch
  3. Set ownership
    chown -R cloudwatch:cloudwatch /opt/aws-scripts-mon
  4. Edit cron table
    crontab -e -u cloudwatch
  5. Add cron job
    */5 * * * * /opt/aws-scripts-mon/mon-put-instance-data.pl --mem-util --mem-used --mem-avail --swap-used --disk-space-util --disk-path=/ --from-cron

Verify Cron Job Ran

One way to verify the cron job ran is to look in the cron log.

less /var/log/cron
tail -f /var/log/cron


Monitor Script Arguments

Name                      Description
--mem-util                Collects and sends the MemoryUtilization metric in percentages. This option reports only memory allocated by applications and the operating system, and excludes memory in cache and buffers.
--mem-used                Collects and sends the MemoryUsed metric, reported in megabytes. This option reports only memory allocated by applications and the operating system, and excludes memory in cache and buffers.
--mem-avail               Collects and sends the MemoryAvailable metric, reported in megabytes. This option reports memory available for use by applications and the operating system.
--swap-util               Collects and sends the SwapUtilization metric, reported in percentages.
--swap-used               Collects and sends the SwapUsed metric, reported in megabytes.
--disk-path=PATH          Selects the disk on which to report. PATH can specify a mount point or any file located on a mount point for the filesystem that needs to be reported. To select multiple disks, specify a --disk-path=PATH for each one. For example, to select the filesystems mounted on / and /home, use:
                          --disk-path=/ --disk-path=/home
--disk-space-util         Collects and sends the DiskSpaceUtilization metric for the selected disks, reported in percentages.
--disk-space-used         Collects and sends the DiskSpaceUsed metric for the selected disks, reported by default in gigabytes. Due to reserved disk space in Linux operating systems, disk space used and disk space available might not add up exactly to the total disk space.
--disk-space-avail        Collects and sends the DiskSpaceAvailable metric for the selected disks, reported in gigabytes. Due to reserved disk space in Linux operating systems, disk space used and disk space available might not add up exactly to the total disk space.
--memory-units=UNITS      Specifies the units in which to report memory usage. If not specified, memory is reported in megabytes. UNITS may be one of: bytes, kilobytes, megabytes, gigabytes.
--disk-space-units=UNITS  Specifies the units in which to report disk space usage. If not specified, disk space is reported in gigabytes. UNITS may be one of: bytes, kilobytes, megabytes, gigabytes.
--aws-credential-file=PATH  Provides the location of the file containing AWS credentials. This parameter cannot be used with the --aws-access-key-id and --aws-secret-key parameters.
--aws-access-key-id=VALUE   Specifies the AWS access key ID used to identify the caller. Must be used together with --aws-secret-key. Do not use this option with --aws-credential-file.
--aws-secret-key=VALUE      Specifies the AWS secret access key used to sign the request to CloudWatch. Must be used together with --aws-access-key-id. Do not use this option with --aws-credential-file.
--verify                  Performs a test run that collects the metrics and prepares a complete HTTP request, but does not actually call CloudWatch. This option also checks that credentials are provided. When run in verbose mode, it outputs the metrics that would be sent to CloudWatch.
--from-cron               Use this option when calling the script from cron. All diagnostic output is suppressed, but error messages are sent to the local system log of the user account.
--verbose                 Displays detailed information about what the script is doing.
--help                    Displays usage information.
--version                 Displays the version number of the script.

High Memory Utilized by ZFS File Data in Solaris

Today, on one of the servers, 42% of the physical memory was allocated to ZFS File Data.

root@qctsun02:/tmp# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     337924              2640   16%
ZFS File Data              884736              6912   42%
Anon                       694708              5427   33%
Exec and libs               27627               215    1%
Page cache                  67434               526    3%
Free (cachelist)              976                 7    0%
Free (freelist)             83747               654    4%
Total                     2097152             16384

However, a huge ZFS cache can cause problems for applications that allocate a lot of memory at startup (for example, Java-based applications).

The ideal size of the ZFS cache depends on what type of data you have, and how much data you have.

You can limit the size of the ZFS file cache by setting zfs_arc_max in /etc/system as per your requirement. For example, to limit the ARC to 2 GB:

* limit ZFS ARC cache to 2 GB max
set zfs:zfs_arc_max = 2147483648
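zfs_arc_max takes a byte count, so the value is just the desired cap multiplied out (the 2 GB figure here matches the example above):

```shell
# Bytes for a 2 GB ARC cap: 2 * 1024^3
arc_max_gb=2
echo $((arc_max_gb * 1024 * 1024 * 1024))
# -> 2147483648
```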

Observe with MDB and KStats:

root@qctsun02:/tmp# echo "::arc" | mdb -k
hits                      =    116086563
misses                    =     18255374
demand_data_hits          =     68163338
demand_data_misses        =     10123941
demand_metadata_hits      =     34673799
demand_metadata_misses    =       168035
prefetch_data_hits        =     12734721
prefetch_data_misses      =      7917615
prefetch_metadata_hits    =       514705
prefetch_metadata_misses  =        45783
mru_hits                  =    110075513
mru_ghost_hits            =            0
mfu_hits                  =     64792021
mfu_ghost_hits            =            0
deleted                   =     17522433
mutex_miss                =        19971
hash_elements             = 18446744073704262340
hash_elements_max         = 18446744073709551615
hash_collisions           =     25979272
hash_chains               = 18446744073708099047
hash_chain_max            =            8
p                         =           65 MB
c                         =           72 MB
c_min                     =           64 MB
c_max                     =         2048 MB
size                      =           70 MB
buf_size                  =            1 MB
data_size                 =           47 MB
other_size                =           20 MB
evict_mfu                 =      1491880 MB
evict_mru                 =       621150 MB
evict_l2_cached           =            0 MB
evict_l2_eligible         =      1801428 MB
evict_l2_ineligible       =       311601 MB
l2_hits                   =            0
l2_misses                 =     10291976
l2_feeds                  =            0
l2_rw_clash               =            0
l2_read_bytes             =            0 MB
l2_write_bytes            =            0 MB
l2_writes_sent            =            0
l2_writes_done            =            0
l2_writes_error           =            0
l2_writes_hdr_miss        =            0
l2_evict_lock_retry       =            0
l2_evict_reading          =            0
l2_abort_lowmem           =            0
l2_cksum_bad              =            0
l2_io_error               =            0
l2_hdr_size               =            0 MB
memory_throttle_count     =          321
meta_used                 =           22 MB
meta_max                  =          121 MB
meta_limit                =            0 MB
arc_no_grow               =            0
arc_tempreserve           =            0 MB

AIX Memory / RAM performance monitoring


Memory Leak: Caused by a program that repeatedly allocates memory without freeing it.

When a process exits, its working storage is freed up immediately and its associated memory frames are put back on the free list.
However any files the process may have opened can stay in memory.

AIX tries to use the maximum amount of free memory for file caching.

A high level of file system cache usually means either that this is simply how the application runs (you have to decide whether this is expected by understanding the workload), or that AIX can’t find anything else to do with the memory and figures it might as well save disk I/O CPU cycles by caching. This is normal and a good idea.

Some notes regarding memory leak:

When a process gets busy, it calls malloc() (memory allocation) to get more memory, so its memory usage grows. Memory requests are satisfied by allocating portions from a large pool of memory called the heap. When the process goes idle, it calls free(), but that does not actually remove the memory from the process; it just releases it back into the heap area.

AIX keeps a list of the pages in the heap area that were used before but are free now. New malloc() requests are served from the heap first; only if the heap shrinks to a very small size are requests issued for new memory pages. When heap pages are not used for a long time, AIX will page them out to disk.

RSS is the actual memory occupied by the process in RAM (RSS can be active pages or other pages in the heap). RSS pages are paged out only if memory is getting short. If there is free memory, AIX will not page these out, because it may be useful to keep them in RAM.

So it usually turns out there is no memory leak at all, just normal memory-usage behaviour!


topas -P    This does not show how much of the application is paged out, but how much of the application's memory is backed by paging space.
(Things in memory (working segments) should be backed by paging space up to the actual in-memory size of the process.)
svmon -Pt15 | perl -e 'while(<>){print if($.==2||$&&&!$s++);$.=0 if(/^-+$/)}'        top 15 processes using the most memory
ps aux | head -1 ; ps aux | sort -rn +3 | head -20                                   top memory processes (the above is better)
ps -ef | grep -c LOCAL=NO        shows the number of oracle client connections (each connection takes up memory, so if it is high then…)

svmon -Pg -t 1 | grep Pid ; svmon -Pg -t 10 | grep "N"                               top 10 processes using the most paging space
svmon -P -O sortseg=pgsp                                                             shows paging space usage of processes


# ps gv | head -n 1; ps gv | egrep -v "RSS" | sort +6b -7 -n -r
393428      - A    10:23 2070 54752 54840 32768    69    88  0.0  5.0 /var/opt
364774      - A     0:08  579 28888 28940 32768    32    52  0.0  3.0 [cimserve]
397542      - A     0:18  472  6468  7212    xx   526   744  0.0  1.0 /usr/sbi
344246      - A     0:02   44  7132  7204 32768    50    72  0.0  1.0 /opt/ibm

RSS:    The amount of RAM used for the text and data segments per process. PID 393428 is using 54840k. (RSS:resident set size)
%MEM:    The actual amount of the RSS / Total RAM. Watch for processes that consume 40-70 percent of %MEM.
TRS:    The amount of RAM used for the text segment of a process in kilobytes.
SIZE:    The actual amount of paging space (virtual mem. size) allocated for this process (text and data).
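%MEM is essentially RSS divided by total RAM. A quick sketch of the arithmetic, using the RSS of PID 393428 above and an assumed 16 GB of RAM (the total is hypothetical, not taken from the listing):

```shell
# %MEM = 100 * RSS / total real memory (both in KB)
rss_kb=54840                      # RSS of PID 393428 from the listing above
total_kb=$((16 * 1024 * 1024))    # assumed 16 GB machine
awk -v r="$rss_kb" -v t="$total_kb" 'BEGIN { printf "%.1f%%\n", 100 * r / t }'
```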

How big is the process in memory? It is the RSS size.

Checking memory usage with nmon:

nmon --> t (top processes) --> 4 (order by process size)

PID       %CPU     Size      Res     Res      Res     Char    RAM      Paging         Command
Used       KB      Set     Text     Data     I/O     Use   io   other repage
16580722     0.0   226280   322004   280640    41364        0    5%      0      0      0 oracle
9371840      0.0   204324   300904   280640    20264        0    5%      0      0      0 oracle
10551416     0.0   198988   305656   280640    25016        0    5%      0      0      0 oracle
8650824      0.0   198756   305428   280640    24788        0    5%      0      0      0 oracle

Size KB: program on disk size
ResSize: Resident Set Size – how big it is in memory (excluding the pages still in the file system (like code) and some parts on paging disks)
ResText: code pages of the Resident Set
ResData: data and stack pages of the Resident Set


Regarding Oracle:
ps -ef | grep -c LOCAL=NO

This shows how many client connections we have. Each connection takes up some memory; sometimes, when there are memory problems, too many logged-in users are the cause of the trouble.

Shared memory segments:

root@aix2: /root #  ipcs -bm
IPC status from /dev/mem as of Sat Sep 17 10:04:28 CDT 2011
T        ID     KEY        MODE       OWNER    GROUP     SEGSZ
Shared Memory:
m   1048576 0x010060f0 --rw-rw-rw-     root   system       980
m   1048577 0xffffffff D-rw-rw-rw-     root   system       944
m   4194306 0x78000238 --rw-rw-rw-     root   system  16777216
m   1048579 0x010060f2 --rw-rw-rw-     root   system       976
m        12 0x0c6629c9 --rw-r-----     root   system   1663028
m        13 0x31000002 --rw-rw-rw-     root   system    131164
m 425721870 0x81fc461c --rw-r-----   oracle oinstall 130027520
m        15 0x010060fa --rw-rw-rw-     root   system      1010
m   2097168 0x849c6158 --rw-rw----   oracle oinstall 18253647872

It shows our memory segments, who owns them, and their size in bytes. The size shown is the maximum a segment may grow to; it does not mean that much is actually allocated. The exception is Oracle (and DB2): the Oracle line shows the SGA, and this memory is allocated for Oracle (18 GB in this case).


IBM script for checking what is causing paging space activity:
(it runs until the po column reaches 50, then saves the process list and svmon output, and exits)

/usr/bin/renice -n -20 -p $$
while true; do
    vmstat -I 1 1 | tail -1 | awk '{print $9}' | read po
    if [[ $po -gt 50 ]]; then
        ps -ef > ps.out &
        svmon -G > svmon.G &
        exit 0
    fi
done

My script for monitoring memory, paging activity:

/usr/bin/renice -n -20 -p $$

while true; do
    echo `date` "-->" `svmon -G | head -2 | tail -1` "-->" `vmstat -v | grep numperm` >> svmon.out &
    echo `date` "-->" `svmon -G | head -3 | tail -1` >> paging.out &
    echo `vmstat -Iwt 1 1 | tail -1` >> vmstat.out &
    sleep 60
done