How to troubleshoot disk IO performance issues in Linux confidently

This article walks through how to troubleshoot disk input/output (I/O) performance problems on Linux. Disk I/O performance, which determines how quickly the system can read and write data, has a large effect on overall system performance. Poor disk I/O causes slowdowns and bottlenecks that degrade both throughput and the user experience. We’ll go over a few tried-and-true methods for locating and fixing disk I/O performance problems, using tools like iostat, iotop, and hdparm to monitor and measure the system’s disks.

Step 1: Find the drive or disk that is the cause of the performance problem

The iostat command can be used to track drive utilization and determine which drives are receiving a lot of traffic.

 iostat -hymx 1 4

On a Linux system, the iostat command can be used to display statistics about the input/output (I/O) activity of drives and other devices. The following describes how the -hymx parameters change how the iostat command behaves:

-h: Display the output in a “human-readable” format. Values are scaled and printed with a unit suffix (for example 11.8M or 4.0k), and the extended report is split into narrower sections with the device name in the last column.
-y: Omit the first report, which contains statistics since system boot, when multiple reports are displayed at a given interval. Every report shown then reflects only the most recent interval.
-m: Display throughput in megabytes per second instead of the default kilobytes per second.
-x: Include extended statistics in the output. These statistics provide more detailed information about the I/O activity of the disks.

The 1 and 4 at the end of the command are the interval and count parameters, respectively. The count option defines how many updates should be displayed, while the interval parameter specifies how many seconds should pass between updates.

The command iostat -hymx 1 4 will, for instance, print four reports of I/O statistics for all of the system’s disks, one second apart. Throughput is shown in megabytes per second, the output is human-readable with extended statistics, and the since-boot report is skipped.

With these arguments, you can use iostat to track the I/O activity of your system’s disks and spot potential bottlenecks. To limit the report to one device, pass its name as an argument; the -d option restricts the output to the device report (omitting the CPU summary), and the -p option also lists the partitions of the named device. For instance, the command iostat -hymx -d /dev/sda 1 4 tracks the I/O activity of /dev/sda only.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.8%    0.0%    6.6%   36.9%   25.4%   30.3%

     r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz Device
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop0
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop1
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop2
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop3
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop4
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop5
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop6
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop7
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop8
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop9
    0.00      0.0k     0.00   0.0%    0.00     0.0k sda
    0.00      0.0k     0.00   0.0%    0.00     0.0k sdb
 3032.00     11.8M     0.00   0.0%   73.68     4.0k sdc

     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz Device
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop0
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop1
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop2
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop3
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop4
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop5
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop6
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop7
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop8
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop9
    1.00      4.0k     0.00   0.0%    2.00     4.0k sda
    0.00      0.0k     0.00   0.0%    0.00     0.0k sdb
    0.00      0.0k     0.00   0.0%    0.00     0.0k sdc

     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz Device
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop0
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop1
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop2
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop3
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop4
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop5
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop6
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop7
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop8
    0.00      0.0k     0.00   0.0%    0.00     0.0k loop9
    0.00      0.0k     0.00   0.0%    0.00     0.0k sda
    0.00      0.0k     0.00   0.0%    0.00     0.0k sdb
    0.00      0.0k     0.00   0.0%    0.00     0.0k sdc

     f/s f_await  aqu-sz  %util Device
    0.00    0.00    0.00   0.0% loop0
    0.00    0.00    0.00   0.0% loop1
    0.00    0.00    0.00   0.0% loop2
    0.00    0.00    0.00   0.0% loop3
    0.00    0.00    0.00   0.0% loop4
    0.00    0.00    0.00   0.0% loop5
    0.00    0.00    0.00   0.0% loop6
    0.00    0.00    0.00   0.0% loop7
    0.00    0.00    0.00   0.0% loop8
    0.00    0.00    0.00   0.0% loop9
    0.00    0.00    0.00   0.4% sda
    0.00    0.00    0.00   0.0% sdb
    0.00    0.00  223.39  96.4% sdc

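In the sample above, sdc shows a %util of 96.4%, meaning the device was busy servicing requests for almost the entire interval. It is also handling about 3,032 reads per second with an average read latency (r_await) of 73.68 ms and an average queue size (aqu-sz) of 223.39, so sdc is the disk to investigate.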
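Picking out the busy device by eye gets tedious on systems with many disks, so the check can be scripted. The following is a minimal sketch rather than part of the original walkthrough: busy_devices is a hypothetical helper name, and the threshold of 80 is an arbitrary example value. It finds the Device and %util columns by header name, because their positions differ between the plain and the -h layouts.

```shell
# busy_devices THRESHOLD
# Reads `iostat -x` output on stdin and prints "device %util" for every
# device whose %util is at or above THRESHOLD (hypothetical helper).
busy_devices() {
    awk -v limit="$1" '
        # A blank line ends a section, so forget the column positions.
        NF == 0 { dev = 0; util = 0; next }
        # Header row: record where the Device and %util columns are.
        /Device/ {
            dev = 0; util = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "Device" || $i == "Device:") dev = i
                if ($i == "%util") util = i
            }
            next
        }
        # Data row in a section that has a %util column.
        dev && util && NF >= util && NF >= dev {
            u = $util
            sub(/%$/, "", u)          # -h appends a percent sign
            if (u + 0 >= limit + 0) printf "%s %.1f\n", $dev, u
        }
    '
}

# Demo on three rows copied from the sample report above.
busy_devices 80 <<'EOF'
     f/s f_await  aqu-sz  %util Device
    0.00    0.00    0.00   0.4% sda
    0.00    0.00    0.00   0.0% sdb
    0.00    0.00  223.39  96.4% sdc
EOF
```

On a live system the same function could be fed directly, for example with iostat -dxy 1 1 | busy_devices 80.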

Step 2: Identify processes that are causing the most disk IO

Use the iotop command to see which processes are causing the most disk IO. This can help you determine whether the issue is caused by a specific application or process.

iotop is a Linux command-line utility that allows you to monitor the input/output (I/O) activity of processes on your system in real-time. It displays a list of processes that are currently performing I/O operations, along with the amount of I/O traffic generated by each process.

The -o option tells iotop to only show processes that are actually doing I/O, rather than all processes on the system. This can be useful if you want to focus on processes that are actively using the disk and ignore processes that are idle or have low I/O activity.

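For example, the command iotop -o will display a list of the threads that are currently performing I/O, sorted by the column marked with > in the header (IO by default). The output includes the following information for each entry:

The thread ID (TID), or the process ID if iotop is run with -P
The I/O scheduling class and priority (PRIO)
The user that owns the process (USER)
The read and write rates of the process (DISK READ and DISK WRITE)
The percentage of time the process spent swapping in and waiting on I/O (SWAPIN and IO)
The command line of the process (COMMAND)

In the sample below, the SWAPIN and IO columns show ?unavailable?, which typically means the kernel is not collecting per-task delay accounting statistics.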

sudo iotop -o

Total DISK READ:        11.88 M/s | Total DISK WRITE:         0.00 B/s
Current DISK READ:      11.88 M/s | Current DISK WRITE:       0.00 B/s
    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 155762 be/4 root        3.04 M/s    0.00 B/s  ?unavailable?  fio --filename=/dev/sdc --direct=1 --rw~iops-test-job --eta-newline=1 --readonly
 155763 be/4 root        2.88 M/s    0.00 B/s  ?unavailable?  fio --filename=/dev/sdc --direct=1 --rw~iops-test-job --eta-newline=1 --readonly
 155764 be/4 root        3.00 M/s    0.00 B/s  ?unavailable?  fio --filename=/dev/sdc --direct=1 --rw~iops-test-job --eta-newline=1 --readonly
 155765 be/4 root        2.96 M/s    0.00 B/s  ?unavailable?  fio --filename=/dev/sdc --direct=1 --rw~iops-test-job --eta-newline=1 --readonly

You can also use other options, such as -b to run iotop in batch mode, -d to specify a delay between updates, and -n to specify the number of updates to display.

For example, the command iotop -o -b -d 1 -n 5 will run iotop in batch mode, displaying five updates of I/O activity with a one-second delay between each update, and only showing processes that are actively performing I/O. This can be useful for monitoring the I/O activity of specific processes over time and identifying potential performance issues.
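Batch output is easy to post-process. As a hedged sketch, the function below averages the DISK READ rate of each thread over all samples in which it appears. The name mean_read_by_tid is hypothetical, and the code assumes the column layout shown in the sample above (TID, PRIO, USER, then value and unit pairs) and that the K, M, and G suffixes are multiples of 1024.

```shell
# mean_read_by_tid
# Reads `iotop -b` output on stdin and prints "TID mean-read-in-K/s",
# highest first (hypothetical helper).
mean_read_by_tid() {
    awk '
        # Convert a value with an iotop unit suffix to K/s.
        function kib(v, unit) {
            if (unit == "B/s") return v / 1024
            if (unit == "K/s") return v
            if (unit == "M/s") return v * 1024
            if (unit == "G/s") return v * 1024 * 1024
            return 0
        }
        # Data rows start with a numeric TID; $4 and $5 are DISK READ.
        $1 ~ /^[0-9]+$/ && $5 ~ /^[BKMG]\/s$/ {
            sum[$1] += kib($4, $5)
            n[$1]++
        }
        END { for (t in sum) printf "%s %.2f\n", t, sum[t] / n[t] }
    ' | sort -k2,2nr
}

# Demo on two rows copied from the sample output above.
mean_read_by_tid <<'EOF'
 155762 be/4 root        3.04 M/s    0.00 B/s  ?unavailable?  fio --filename=/dev/sdc --direct=1 --rw~iops-test-job --eta-newline=1 --readonly
 155763 be/4 root        2.88 M/s    0.00 B/s  ?unavailable?  fio --filename=/dev/sdc --direct=1 --rw~iops-test-job --eta-newline=1 --readonly
EOF
```

On a live system it could be used as sudo iotop -o -b -d 1 -n 5 | mean_read_by_tid.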

Step 3: Check the disk’s queue length using the iostat command

A high queue length can indicate that the disk is being overloaded and is unable to keep up with the demand.

ubuntu@w3devops-test1:~$ sudo iostat -x /dev/sdc
Linux 5.15.0-1022-oracle (w3devops-test1) 	12/18/22 	_x86_64_	(2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.00    0.02    0.03    0.05   99.84

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc              0.84      3.45     0.00   0.01   73.08     4.10    0.00      0.60     0.00  49.23   97.67   913.88    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.06   0.03

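The “aqu-sz” column displays the average queue length of the disk. Here it is 0.06, which shows that requests spend very little time waiting in the queue. Note that this command was run without an interval, so the report shows averages since the system booted rather than current activity; add -y with an interval and count (for example sudo iostat -xy /dev/sdc 1 4) to see what the disk is doing now.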

On the other hand, a persistently long queue (values of 10 or more are a common rule of thumb) can be a sign that the drive is under heavy load and cannot keep up with incoming I/O requests. Requests then wait longer before they are serviced, which can slow down the system as a whole.
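The queue length is closely tied to the request rate and latency columns: the average number of requests in flight is roughly the request rate multiplied by the average time each request takes. The sketch below applies that relationship to the figures printed in this article. est_queue is a hypothetical helper, and the formula is an approximation that ignores discard and flush requests.

```shell
# est_queue READS_PER_S R_AWAIT_MS WRITES_PER_S W_AWAIT_MS
# Estimate the average queue size as rate * latency, summed over reads
# and writes. Latencies are in milliseconds, hence the division by 1000.
est_queue() {
    awk -v rs="$1" -v ra="$2" -v ws="$3" -v wa="$4" \
        'BEGIN { printf "%.2f\n", (rs * ra + ws * wa) / 1000 }'
}

# sdc in the Step 1 sample: 3032 reads/s at 73.68 ms, no writes.
est_queue 3032.00 73.68 0.00 0.00

# sdc in the since-boot report above: 0.84 reads/s at 73.08 ms.
est_queue 0.84 73.08 0.00 97.67
```

The two results, about 223.4 and 0.06, agree with the aqu-sz values of 223.39 and 0.06 that iostat reported, which is a useful sanity check when reading the report: a deep queue always comes with a high request rate, high latency, or both.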

Note: The iostat command reports a wide range of details on a drive’s I/O activity, including the rate of read and write requests, the volume of data read and written each second, and the average size of requests. It can be used to track a disk’s performance over time and spot I/O performance problems.

Step 4: Check the disk’s read speed

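Use the hdparm command to measure how quickly data can be read from the drive. The -t option times sequential reads from the disk through the buffer cache without any prior caching of data, and the -T option times reads served from the Linux buffer cache without touching the disk.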

ubuntu@w3devops-test1:~$ sudo hdparm -tT /dev/sdc

/dev/sdc:
 Timing cached reads:   8712 MB in  1.95 seconds = 4470.05 MB/sec
 Timing buffered disk reads: 132 MB in  3.03 seconds =  43.63 MB/sec

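In this illustration, the disk delivers sequential (buffered) reads at 43.63 MB/s. The 4470.05 MB/s figure is the cached read rate, which reflects the throughput of the processor, cache, and memory rather than the disk itself. Neither test measures write speed; hdparm -tT performs read timings only.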

Note: A disk’s actual read and write speeds can change based on a variety of variables, such as the disk’s make and model, the connection used to access it, and the workload being applied to it. Although the hdparm command is a valuable tool for evaluating a disk’s raw performance, it may not always accurately represent the speeds that you will really experience when utilizing the device in a practical setting.
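Because the two timing lines are easy to mix up, it can help to relabel them when logging results. The following is a small sketch that assumes the output format shown above; hdparm_rates is a hypothetical helper name, and the labels are this sketch’s own wording.

```shell
# hdparm_rates
# Reads `hdparm -tT` output on stdin and prints each timing with a
# label that says what it measures (hypothetical helper).
hdparm_rates() {
    awk '
        # The rate and its unit are the last two fields of each line.
        /Timing cached reads:/        { print "cached (memory) read:", $(NF - 1), $NF }
        /Timing buffered disk reads:/ { print "buffered disk read:", $(NF - 1), $NF }
    '
}

# Demo on the sample output above.
hdparm_rates <<'EOF'
/dev/sdc:
 Timing cached reads:   8712 MB in  1.95 seconds = 4470.05 MB/sec
 Timing buffered disk reads: 132 MB in  3.03 seconds =  43.63 MB/sec
EOF
```

On a live system it could be used as sudo hdparm -tT /dev/sdc | hdparm_rates. Running the test several times on an otherwise idle system gives more stable numbers.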

Conclusion

In conclusion, disk I/O performance is a crucial component of a Linux system’s total performance. By using tools like iostat, iotop, and hdparm, you can track the I/O activity of your system’s disks and spot performance problems or bottlenecks.

To enhance performance, think about using a faster drive, like an SSD, or adding extra disks to your system. If the problem continues, think about switching to a different filesystem or altering the disk’s settings, such as the read-ahead value, to boost performance.
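Before changing a setting such as read-ahead, it is worth recording the current values. The sketch below reads them from sysfs, where the kernel exposes per-device queue settings under /sys/block/DEVICE/queue/. show_queue_settings is a hypothetical helper, and the optional second argument (an alternative sysfs root) exists only to make the function easy to try against a test directory.

```shell
# show_queue_settings DEVICE [SYSFS_ROOT]
# Print the read-ahead size (in KB), the I/O scheduler, and the request
# queue depth of a block device (hypothetical helper).
show_queue_settings() {
    dev=$1
    root=${2:-/sys}
    for name in read_ahead_kb scheduler nr_requests; do
        file="$root/block/$dev/queue/$name"
        if [ -r "$file" ]; then
            printf '%s: %s\n' "$name" "$(cat "$file")"
        else
            printf '%s: unavailable\n' "$name"
        fi
    done
}

# Example: inspect the disk used throughout this article.
show_queue_settings sdc
```

To change the read-ahead value, one option is to write a new number of kilobytes to the same file, for example echo 256 | sudo tee /sys/block/sdc/queue/read_ahead_kb. The value 256 is only an illustration; the right setting depends on the workload, and the change does not persist across reboots.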

Once these tools have shown where the bottleneck is, you can tune block-device settings such as the read-ahead value or the request queue depth and then measure again to confirm the effect. By following best practices for managing and optimizing the performance of your disks, you can ensure that your Linux system is running smoothly and efficiently.

It’s crucial to keep in mind that there are a variety of factors that can influence disk IO performance, so the procedures above might not work in every instance. Additionally, it’s always a good idea to consult with a system administrator or an experienced IT professional if you’re unsure how to proceed.
