
Performance: the file system

2022-06-25 10:27:00 Ash technology

This article is the last in the performance series. It is a set of study notes, with examples excerpted from other articles, and its main purpose is to explain how to use the tools and read the corresponding metrics. The topic is the file system, and more specifically the disk.

As in the earlier articles, the material is organized as: basics → common commands and tools → troubleshooting approach.

I. Basics

1. File system: built on top of the disk, the file system provides a tree structure for managing files; it is the mechanism for organizing and managing the files on a storage device.

For ease of management, the Linux file system assigns two data structures to each file: the index node (inode) and the directory entry (dentry). They record a file's metadata and the file system's directory structure, respectively. The inode is the unique identifier of a file, while directory entries maintain the tree structure of the file system. The relationship between directory entries and inodes is many-to-one; put simply, a file can have multiple aliases.

Index node: inode for short. It records a file's metadata, such as the inode number, file size, access permissions, modification time, and the location of the data. Inodes correspond one-to-one with files and, like the file contents, are persisted to disk. So remember: inodes also consume disk space.

Directory entry: dentry for short. It records a file's name, a pointer to its inode, and its relationships to other directory entries. Linked directory entries form the directory structure of the file system. Unlike inodes, however, directory entries are an in-memory data structure maintained by the kernel, which is why they are also called the directory entry cache.
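The many-to-one relationship is easy to observe with a hard link, which creates a second directory entry pointing at the same inode. A minimal sketch (the inode number shown is only illustrative):

$ touch demo.txt
$ ln demo.txt alias.txt          #  create a second name (directory entry) for the same file
$ ls -i demo.txt alias.txt       #  -i prints the inode number
404123 alias.txt  404123 demo.txt
$ stat demo.txt | grep Inode     #  Links: 2 means two directory entries reference this inode
Device: 801h/2049d      Inode: 404123      Links: 2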

2. Virtual File System (VFS): to support many different file systems, the Linux kernel introduces an extra abstraction layer between user processes and the file systems. VFS defines a set of data structures and standard interfaces that all file systems support; user processes and the kernel's other subsystems only interact with the unified interface provided by VFS and never need to care about the implementation details of the underlying file systems.
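A quick way to see the VFS at work is /proc/filesystems, which lists every file system type the running kernel has registered; all of them are driven through the same VFS interface (output abridged; "nodev" marks types that are not backed by a block device):

$ cat /proc/filesystems
nodev   sysfs
nodev   tmpfs
nodev   proc
        ext4
...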

3. File system I/O: VFS provides a set of standard file access interfaces, exposed to applications as system calls such as open(), read(), and write(). File I/O can be classified along the following four dimensions (a short dd sketch after the four lists shows what some of these modes look like from the shell).

First, by whether the standard library cache is used, file I/O can be divided into buffered I/O and unbuffered I/O.

1.  Buffered I/O uses the standard library's cache to speed up file access; the standard library in turn accesses files through system calls.
2.  Unbuffered I/O accesses files directly through system calls, without going through the standard library cache.

Second, by whether the operating system's page cache is used, file I/O can be divided into direct I/O and non-direct I/O.

1.  Direct I/O skips the operating system's page cache and interacts with the file system directly; it is usually requested by passing the O_DIRECT flag in the system call.
2.  Non-direct I/O is the opposite: reads and writes go through the page cache first, and the kernel (or an additional system call) actually writes the data to disk later.

Third, by whether the application blocks itself, file I/O can be divided into blocking I/O and non-blocking I/O.

1.  Blocking I/O means that after issuing an I/O operation, the application blocks the current thread until a response arrives, so it naturally cannot perform other tasks.
2.  Non-blocking I/O means that after issuing an I/O operation, the application does not block the current thread; it can carry on with other tasks and later fetch the result by polling or via event notification.

Fourth, by whether the application waits for the result, file I/O can be divided into synchronous and asynchronous I/O:

1.  Synchronous I/O means that after issuing an I/O operation, the application waits until the whole I/O completes before it gets the response.
2.  Asynchronous I/O means that after issuing an I/O operation, the application simply continues without waiting for completion or the response; when the I/O eventually completes, the application is informed through an event notification.
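As mentioned above, some of these modes can be observed directly from the shell with dd. This is a rough sketch, not a rigorous benchmark; /tmp/ddtest is a scratch file chosen for illustration, and it must live on a disk-backed file system for O_DIRECT to work:

#  non-direct write: the data lands in the page cache first (this is the default)
$ dd if=/dev/zero of=/tmp/ddtest bs=4k count=25600
#  direct I/O: oflag=direct opens the target with O_DIRECT, bypassing the page cache
$ dd if=/dev/zero of=/tmp/ddtest bs=4k count=25600 oflag=direct
#  synchronous flavor: conv=fdatasync forces the data to disk before dd reports its timing
$ dd if=/dev/zero of=/tmp/ddtest bs=4k count=25600 conv=fdatasync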

4. Disk performance metrics (a quick sanity check on how they relate follows the list):

1.  Utilization: the percentage of time the disk spends processing I/O. High utilization (for example, above 80%) usually means the disk has an I/O performance bottleneck.
2.  Saturation: how busy the disk is. High saturation means the disk has a serious performance bottleneck; at 100% saturation the disk can no longer accept new I/O requests.
3.  IOPS (Input/Output operations Per Second): the number of I/O requests per second.
4.  Throughput: the amount of I/O data transferred per second.
5.  Response time: the interval between sending an I/O request and receiving its response.
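These metrics are related: throughput ≈ IOPS × average request size. As a rough sanity check, plug in the numbers from the fio report shown later in this article (4257 random reads per second, 4 KiB each):

#  IOPS × request size (KiB) / 1024 = throughput in MiB/s
$ awk 'BEGIN { print 4257 * 4 / 1024 }'
16.6289
#  ≈ 16.6 MiB/s, which matches the BW value fio reports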

II. Common commands and tools

1. df // view disk space usage
# -h prints sizes in human-readable units; check the space used on /dev/sda1
$ df -h /dev/sda1 
Filesystem      Size  Used Avail Use% Mounted on 
/dev/sda1        29G  3.1G   26G  11% / 

# -i shows inode usage instead of block usage
$ df -i /dev/sda1 
Filesystem      Inodes  IUsed   IFree IUse% Mounted on 
/dev/sda1      3870720 157460 3713260    5% /

2. Viewing cache sizes

The kernel uses the Slab mechanism to manage the directory entry and inode caches. /proc/meminfo only gives the overall Slab size; to see each individual Slab cache, you have to look at /proc/slabinfo.

#  view the overall Slab size
$ cat /proc/meminfo | grep -E "SReclaimable|Cached" 
Cached:           748316 kB 
SwapCached:            0 kB 
SReclaimable:     179508 kB 
#  view the size of each Slab cache
$ cat /proc/slabinfo | grep -E '^#|dentry|inode' 
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail> 
xfs_inode              0      0    960   17    4 : tunables    0    0    0 : slabdata      0      0      0 
... 
ext4_inode_cache   32104  34590   1088   15    4 : tunables    0    0    0 : slabdata   2306   2306      0 
hugetlbfs_inode_cache     13     13    624   13    2 : tunables    0    0    0 : slabdata      1      1      0 
sock_inode_cache    1190   1242    704   23    4 : tunables    0    0    0 : slabdata     54     54      0 
shmem_inode_cache   1622   2139    712   23    4 : tunables    0    0    0 : slabdata     93     93      0 
proc_inode_cache    3560   4080    680   12    2 : tunables    0    0    0 : slabdata    340    340      0 
inode_cache        25172  25818    608   13    2 : tunables    0    0    0 : slabdata   1986   1986      0 
dentry             76050 121296    192   21    1 : tunables    0    0    0 : slabdata   5776   5776      0 
#  view cache usage by type
#  press c to sort by cache size, press a to sort by the number of active objects
$ slabtop 
Active / Total Objects (% used)    : 277970 / 358914 (77.4%) 
Active / Total Slabs (% used)      : 12414 / 12414 (100.0%) 
Active / Total Caches (% used)     : 83 / 135 (61.5%) 
Active / Total Size (% used)       : 57816.88K / 73307.70K (78.9%) 
Minimum / Average / Maximum Object : 0.01K / 0.20K / 22.88K 

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME 
69804  23094   0%    0.19K   3324       21     13296K dentry 
16380  15854   0%    0.59K   1260       13     10080K inode_cache 
58260  55397   0%    0.13K   1942       30      7768K kernfs_node_cache 
   485    413   0%    5.69K     97        5      3104K task_struct 
  1472   1397   0%    2.00K     92       16      2944K kmalloc-2048

3. iostat // view I/O metrics

# r/s: read requests issued to the device per second
# w/s: write requests issued to the device per second
# rkB/s: kilobytes read from the device per second
# wkB/s: kilobytes written to the device per second
# rrqm/s: read requests merged per second
# wrqm/s: write requests merged per second
# r_await: average time for read requests to complete, including queue wait time plus device service time, in milliseconds
# w_await: average time for write requests to complete, including queue wait time plus device service time, in milliseconds
# aqu-sz: average request queue length
# rareq-sz: average read request size, in KB
# wareq-sz: average write request size, in KB
# svctm: average time to service an I/O request, excluding wait time, in milliseconds
# %util: percentage of time the disk spends processing I/O
# -d -x together show the I/O metrics of every device
# note: -d displays the device I/O performance report;
#       -x displays extended statistics (i.e. all of the I/O metrics above)
$ iostat -d -x 1 
Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util 
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00 
loop1            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00 
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00 
sdb              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00

In terms of the metrics from part I: %util is the disk I/O utilization; r/s + w/s is the IOPS; rkB/s + wkB/s is the throughput; and r_await + w_await is the response time.
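With the column layout shown in the header above (it differs between sysstat versions), the headline metrics can be pulled out of a live iostat run with a throwaway awk filter; this is a sketch for interactive use, not a monitoring solution:

#  print IOPS and throughput for sda once per interval
$ iostat -d -x 1 sda | awk '/^sda/ {print "IOPS:", $2+$3, "  throughput:", $4+$5, "kB/s"}'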

4. iotop, pidstat // rank processes by I/O

$ iotop
Total DISK READ :       0.00 B/s | Total DISK WRITE :       7.85 K/s 
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s 
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND 
15055 be/3 root        0.00 B/s    7.85 K/s  0.00 %  0.00 % systemd-journald
# -d shows each process's disk I/O statistics
$ pidstat -d 1 
15:08:35      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command 
15:08:36        0     18940      0.00  45816.00      0.00      96  python 

15:08:36      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command 
15:08:37        0       354      0.00      0.00      0.00     350  jbd2/sda1-8 
15:08:37        0     18940      0.00  46000.00      0.00      96  python 
15:08:37        0     20065      0.00      0.00      0.00    1503  kworker/u4:2

5. strace // observe system calls

# 18940 is the PID of the process being traced
$ strace -p 18940 
strace: Process 18940 attached 
...
mmap(NULL, 314576896, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0f7aee9000 
mmap(NULL, 314576896, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0f682e8000 
#  here you can see a single write() of roughly 300 MB of data
write(3, "2018-12-05 15:23:01,709 - __main"..., 314572844 
) = 314572844 
munmap(0x7f0f682e8000, 314576896)       = 0 
write(3, "\n", 1)                       = 1 
munmap(0x7f0f7aee9000, 314576896)       = 0 
close(3)                                = 0 
#  and here the process fetches the status of /tmp/logtest.txt.1 via stat()
stat("/tmp/logtest.txt.1", {st_mode=S_IFREG|0644, st_size=943718535, ...}) = 0 

6. lsof // view the files a process has open

# FD: the file descriptor
# TYPE: the file type
# NAME: the file path
$ lsof -p 18940 
COMMAND   PID USER   FD   TYPE DEVICE  SIZE/OFF    NODE NAME 
python  18940 root  cwd    DIR   0,50      4096 1549389 / 
python  18940 root  rtd    DIR   0,50      4096 1549389 / 
… 
python  18940 root    2u   CHR  136,0       0t0       3 /dev/pts/0 
python  18940 root    3w   REG    8,1 117944320     303 /tmp/logtest.txt

7. fio

# direct: whether to skip the system cache; 1 means skip it.
# iodepth: when using asynchronous I/O (AIO), the upper limit on the number of I/O requests in flight at once.
# rw: the I/O pattern. In these examples, read/write mean sequential read/write, while randread/randwrite mean random read/write.
# ioengine: the I/O engine. fio supports synchronous (sync), asynchronous (libaio), memory-mapped (mmap), network (net), and many other I/O engines; libaio means asynchronous I/O.
# bs: the I/O block size, here 4K (which is also the default).
# filename: the target path. It can be a disk path (to test disk performance) or a file path (to test file system performance). But be careful: a write test against a disk path destroys the file system on that disk, so back up your data before running one.
#  random read
fio -name=randread -direct=1 -iodepth=64 -rw=randread -ioengine=libaio -bs=4k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=/dev/sdb
#  random write
fio -name=randwrite -direct=1 -iodepth=64 -rw=randwrite -ioengine=libaio -bs=4k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=/dev/sdb
#  sequential read
fio -name=read -direct=1 -iodepth=64 -rw=read -ioengine=libaio -bs=4k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=/dev/sdb
#  sequential write
fio -name=write -direct=1 -iodepth=64 -rw=write -ioengine=libaio -bs=4k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=/dev/sdb 

The report looks like this:

read: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=16.7MiB/s,w=0KiB/s][r=4280,w=0 IOPS][eta 00m:00s]
read: (groupid=0, jobs=1): err= 0: pid=17966: Sun Dec 30 08:31:48 2018
   read: IOPS=4257, BW=16.6MiB/s (17.4MB/s)(1024MiB/61568msec)
   # slat: submission latency, the time from I/O submission until the actual I/O starts;
   # clat: completion latency, the time from I/O submission until the I/O completes;
   # lat: total latency, from fio creating the I/O until the I/O completes.
    slat (usec): min=2, max=2566, avg= 4.29, stdev=21.76
    clat (usec): min=228, max=407360, avg=15024.30, stdev=20524.39
     lat (usec): min=243, max=407363, avg=15029.12, stdev=20524.26
    clat percentiles (usec):
     |  1.00th=[   498],  5.00th=[  1020], 10.00th=[  1319], 20.00th=[  1713],
     | 30.00th=[  1991], 40.00th=[  2212], 50.00th=[  2540], 60.00th=[  2933],
     | 70.00th=[  5407], 80.00th=[ 44303], 90.00th=[ 45351], 95.00th=[ 45876],
     | 99.00th=[ 46924], 99.50th=[ 46924], 99.90th=[ 48497], 99.95th=[ 49021],
     | 99.99th=[404751]
    # bw: the throughput
   bw (  KiB/s): min= 8208, max=18832, per=99.85%, avg=17005.35, stdev=998.94, samples=123
   # iops: the number of I/O operations per second
   iops        : min= 2052, max= 4708, avg=4251.30, stdev=249.74, samples=123
  lat (usec)   : 250=0.01%, 500=1.03%, 750=1.69%, 1000=2.07%
  lat (msec)   : 2=25.64%, 4=37.58%, 10=2.08%, 20=0.02%, 50=29.86%
  lat (msec)   : 100=0.01%, 500=0.02%
  cpu          : usr=1.02%, sys=2.97%, ctx=33312, majf=0, minf=75
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwt: total=262144,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=16.6MiB/s (17.4MB/s), 16.6MiB/s-16.6MiB/s (17.4MB/s-17.4MB/s), io=1024MiB (1074MB), run=61568-61568msec

Disk stats (read/write):
  sdb: ios=261897/0, merge=0/0, ticks=3912108/0, in_queue=3474336, util=90.09%

Note: fio (Flexible I/O Tester) is the most commonly used benchmarking tool for file system and disk I/O performance: https://github.com/axboe/fio

III. I/O performance troubleshooting

Troubleshooting and locating a problem can follow the steps below, as sketched after the list:

1.  Use iostat to find the disk I/O performance bottleneck;
2.  Use pidstat to locate the process causing the bottleneck;
3.  Analyze that process's I/O behavior;
4.  Combine this with how the application works to figure out where the I/O comes from.
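Put together, a typical session looks like the sketch below (PID 18940 is carried over from the earlier examples; in practice you would use whatever PID pidstat points at):

# 1. confirm the bottleneck: which device is close to 100% util?
$ iostat -d -x 1 3
# 2. find the culprit: which process is generating the reads/writes?
$ pidstat -d 1 5
# 3. inspect that process's I/O behavior (file-descriptor syscalls only)
$ strace -p 18940 -e trace=desc
# 4. map its file descriptors back to concrete paths
$ lsof -p 18940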

References:

https://github.com/axboe/fio

https://www.thomas-krenn.com/en/wiki/Linux_Storage_Stack_Diagram

Linux Performance Optimization in Practice
