System performance monitoring tools are always handy for sysadmins who need to monitor, troubleshoot, and take corrective action on their systems. Many such tools are available, and we wanted to see how they behave when a system is heavily loaded in terms of CPU, memory, and swapping to disk (I/O). We skipped top, vmstat, sar, iostat, and mpstat, since most of us use them day in and day out. Instead, we got our hands dirty with other open source command-line performance analysis tools for Linux.
We used the Linux stress utility and my own awk-based CPU loader to generate load on my Fedora Linux system.
Here is the stress command that we used:
# stress --cpu 2 --io 2 --vm 2 --vm-bytes 1800M --timeout 30s --verbose
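The awk-based CPU loader itself is not reproduced in this article. As a rough idea of how such a loader can work, here is a minimal sketch; the function names cpu_burn and cpu_load and the iteration count are assumptions made for illustration, not the script we actually ran.

```shell
#!/bin/sh
# Hypothetical awk-based CPU loader (illustrative only).

# Run a busy arithmetic loop in awk.
# $1 = number of loop iterations (hypothetical parameter)
cpu_burn() {
    awk -v n="$1" 'BEGIN { s = 0; for (i = 1; i <= n; i++) s += i % 7; print s }'
}

# Start one background worker per requested CPU, then wait for all of them.
# $1 = number of workers, $2 = iterations per worker
cpu_load() {
    workers="$1"
    iterations="$2"
    i=0
    while [ "$i" -lt "$workers" ]; do
        cpu_burn "$iterations" > /dev/null &
        i=$((i + 1))
    done
    wait
}

# Two workers, mirroring the "--cpu 2" setting of the stress command.
cpu_load 2 1000000
```

Raising the iteration count keeps the CPUs busy for longer, which gives the monitoring tools more time to show the load.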
Now let us take a look at the different tools and their output.
Glances : http://nicolargo.github.io/glances/
This tool presents the maximum amount of information in limited terminal space. It dynamically adapts the displayed information to the terminal size. Glances can also work in client/server mode, and remote monitoring can be done via a terminal or a web interface.
It packs a lot of information into a single view and also highlights warning and critical alerts.
Dstat gives you detailed, selective information in columns and clearly indicates the magnitude and unit in which the output is displayed. Dstat allows you to view all of your system resources instantly; for example, you can compare disk usage in combination with interrupts from your IDE controller, or compare network bandwidth numbers directly with disk throughput (in the same interval). You can customize the output by passing different sets of arguments.
If you want to monitor what is going on at regular intervals, dstat is the right tool. It has a lot of options to display exactly what you want.
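As an illustration of interval-based sampling, the sketch below asks dstat for a few chosen columns at a fixed interval. The wrapper name sample_dstat and the interval and count values are assumptions for this example; the column flags are standard dstat options, and the wrapper simply skips the run if dstat is not installed.

```shell
#!/bin/sh
# Sample selected resources with dstat at a fixed interval.
# sample_dstat is a hypothetical helper name, not part of dstat.
sample_dstat() {
    interval="$1"   # seconds between samples
    count="$2"      # number of updates before dstat exits
    if command -v dstat >/dev/null 2>&1; then
        # -t time, -c cpu, -m memory, -s swap, -d disk, -n network
        dstat -t -c -m -s -d -n "$interval" "$count"
    else
        echo "dstat not found; skipping" >&2
    fi
}

# One sample per second, five updates.
sample_dstat 1 5
```

Dropping or adding column flags changes which subsystems appear, so the same wrapper can be narrowed to, say, only disk and network.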
htop : http://hisham.hm/htop/
Htop is an advanced, interactive, real-time Linux process monitoring tool. It is similar to the Linux “top” command, but with some richer features. It also supports vertical and horizontal scrolling through the process list.
htop provides more information on a single page and also has filters and a tree-based view. You can even kill a process from within htop.
collectl : http://collectl.sourceforge.net/
Unlike most monitoring tools, which either focus on a small set of statistics, format their output in only one way, or run either interactively or as a daemon but not both, collectl tries to do it all. You can choose to monitor any of a broad set of subsystems, which currently include buddyinfo, cpu, disk, inodes, infiniband, lustre, memory, network, nfs, processes, quadrics, slabs, sockets and tcp.
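To show what choosing subsystems looks like in practice, here is a small sketch that restricts collectl to CPU, disk, memory and network. The wrapper name sample_collectl and the interval and count values are assumptions for this example; the run is skipped if collectl is not installed.

```shell
#!/bin/sh
# Collect a few samples from selected collectl subsystems.
# sample_collectl is a hypothetical helper name, not part of collectl.
sample_collectl() {
    interval="$1"   # seconds between samples
    count="$2"      # number of samples to take
    if command -v collectl >/dev/null 2>&1; then
        # -s selects subsystems: c = cpu, d = disk, m = memory, n = network
        collectl -scdmn -i "$interval" -c "$count"
    else
        echo "collectl not found; skipping" >&2
    fi
}

# One sample per second, five samples.
sample_collectl 1 5
```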
nmon : http://nmon.sourceforge.net/pmwiki.php
nmon is short for Nigel’s Performance Monitor for Linux. Based on the name, we initially thought it was a network performance monitoring tool, but it covers CPU, memory, network, disks, file systems, NFS, top processes, resources and more.
nmon lets you switch between different sets of reports with keystrokes and provides some very detailed information for advanced sysadmins.
iotop : http://guichaz.free.fr/iotop/
iotop watches I/O usage information output by the Linux kernel and displays a table of current I/O usage by processes or threads on the system. iotop displays columns for the I/O bandwidth read and written by each process/thread during the sampling period. It also displays the percentage of time the thread/process spent while swapping in and while waiting on I/O. For each process, its I/O priority (class/level) is shown.
atop : http://www.atoptool.nl/
Atop is an ASCII full-screen performance monitor that is capable of reporting the activity of all processes (even if processes have finished during the interval), daily logging of system and process activity for long-term analysis, highlighting overloaded system resources by using colors, etc. At regular intervals, it shows system-level activity related to the CPU, memory, swap, disks (including LVM) and network layers, and for every process (and thread) it shows e.g. the CPU utilization, memory growth, disk utilization, priority, username, state, and exit code.
atop provides more detail on a single page and is helpful for deeper performance troubleshooting.
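The logging feature mentioned above can be tried out with a short recording. This is a minimal sketch: the wrapper name record_atop, the temporary log file and the interval and sample values are assumptions for illustration, and the run is skipped if atop is not installed.

```shell
#!/bin/sh
# Record a short atop session to a raw log file for later analysis.
# record_atop is a hypothetical helper name, not part of atop.
record_atop() {
    log="$1"        # raw log file to write
    interval="$2"   # seconds between samples
    samples="$3"    # number of samples to record
    if command -v atop >/dev/null 2>&1; then
        # -w writes samples in atop's raw format instead of showing them
        atop -w "$log" "$interval" "$samples"
    else
        echo "atop not found; skipping" >&2
    fi
}

logfile=$(mktemp)
record_atop "$logfile" 1 3
echo "raw log: $logfile"
```

A log recorded this way can be replayed later with atop -r followed by the log file name, which is useful for looking back at a load spike after the fact.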