Why is the ls command stuck when there are too many files?
2020-11-06 21:04:00 【Zhang Yanfei Allen】
Have you ever run ls in a folder containing a huge number of files and had to wait a long time before anything appeared? If so, have you wondered why, and how to fix it? To understand the cause properly, we need to start with the disk space that the folder itself occupies.
Verifying inode consumption
In "How much disk space does a new empty file take?" I mentioned that every file consumes a little space in its parent folder. A folder also consumes an inode. Let's look at the current inode usage:
# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
......
/dev/sdb1 2147361984 12785020 2134576964 1% /search
Now create an empty folder and check again:
# mkdir temp
# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
......
/dev/sdb1 2147361984 12785021 2134576963 1% /search
From IUsed you can see that, just like an empty file, an empty folder consumes one inode. But an inode is tiny, only 256 bytes on my machine, so it can't be the culprit behind ls getting stuck.
Verifying block consumption
So where is the folder's name stored? As with the files in "How much disk space does a new empty file take?", it takes up one ext4_dir_entry_2 (I'm using ext4 as the example today; the structure is defined in fs/ext4/ext4.h in the Linux source), which is placed in a block of its parent directory. From this you can probably guess that if a pile of files is created under the folder, their entries will take up space in the folder's own blocks. Let's verify:
# mkdir test
# cd test
# du -h
4.0K .
The 4KB here means that one block has been consumed. An empty file doesn't consume a block, so why does an empty directory start out with one? Because it must carry two default directory entries, "." and "..". The 4K isn't necessarily the same on your machine: it is really one block size, which was fixed when the filesystem was formatted.
Let's create two empty files and check again:
# touch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
# touch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
# du -h
4.0K .
Nothing seems to have changed. This is because:
- First, a new empty file doesn't occupy any block, so what is shown is still only the block occupied by the directory.
- Second, the 4KB allocated when the folder was created still has free space, enough for these two directory entries.
So let's create more, using a script to create 100 empty files whose names are 32 bytes long.
#!/bin/bash
# Create 100 empty files whose names are 32-digit, zero-padded numbers.
for ((i = 1; i <= 100; i++)); do
    file="tempDir/$(printf '%032d' "$i")"
    echo "$file"
    touch "$file"
done
# du -h
12K .
Now we can see that the disk space occupied by the directory has grown to 3 blocks. And once we have created 10000 files:
# du -h
548K .
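The experiment above can be repeated in one go. The following is only a minimal sketch, not the script used for the numbers in this article: the function names dir_kb and fill_and_measure are made up for illustration, the work happens in a throwaway directory from mktemp, and the sizes printed depend on your filesystem and block size, so they may not match the figures shown here.

```shell
#!/bin/bash
# Sketch: watch a directory's own disk usage grow as empty files are added.
# dir_kb and fill_and_measure are illustrative names, not standard tools.

# Disk usage of a directory in KB (empty files add no blocks of their own).
dir_kb() {
    du -sk "$1" | cut -f1
}

# Create <count> empty files with 32-character names in <dir>, then report usage.
fill_and_measure() {
    local i=1
    while [ "$i" -le "$2" ]; do
        : > "$1/$(printf '%032d' "$i")"
        i=$((i + 1))
    done
    dir_kb "$1"
}

tmp=$(mktemp -d)
echo "empty directory: $(dir_kb "$tmp") KB"
echo "after 1000 files: $(fill_and_measure "$tmp" 1000) KB"
rm -rf "$tmp"
```

Raising the count to 10000 should show the same trend as the du output above, assuming the temporary directory lives on an ext4 filesystem.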
Besides the file name, each ext4_dir_entry_2 also records the inode number and some other information. The full definition is:
struct ext4_dir_entry_2 {
__le32 inode; /* Inode number */
__le16 rec_len; /* Directory entry length */
__u8 name_len; /* Name length */
__u8 file_type;
char name[EXT4_NAME_LEN]; /* File name */
};
Let's do the math: average space per file = 548K / 10000 ≈ 56 bytes (taking 1K as 1024 bytes). That is somewhat more than our 32-byte file name, which is about right. We also learn a fact here: the longer the file name, the more space it consumes in its parent directory.
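To see where the per-file figure comes from, here is a small sketch of the entry-size arithmetic. It assumes the classic ext4 rule that an entry takes 8 bytes of fixed fields (inode 4 + rec_len 2 + name_len 1 + file_type 1) plus the name, rounded up to a multiple of 4; the function name dir_entry_len is made up for illustration.

```shell
#!/bin/bash
# Sketch: minimum bytes one ext4 directory entry needs for a given name length.
# Assumes 8 bytes of fixed fields plus the name, rounded up to a multiple of 4.
dir_entry_len() {
    echo $(( ($1 + 8 + 3) & ~3 ))
}

dir_entry_len 32    # prints 40
dir_entry_len 255   # prints 264
```

So a 32-byte name needs at least 40 bytes. The measured average is higher, plausibly because directory blocks are not packed completely full, but the trend is the same: longer names mean bigger entries.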
Conclusions
A folder does, of course, consume disk space.
- First, it consumes one inode, which is 256 bytes on my machine.
- It consumes one directory entry, an ext4_dir_entry_2, in its parent directory, which holds its own inode number and directory name.
- If folders or files are created under it, it has to keep an array of ext4_dir_entry_2 in its own blocks.
The more files/subdirectories a directory holds, the more blocks the directory needs. In addition, the size of an ext4_dir_entry_2 is not fixed: the longer the file/subdirectory name, the more space a single directory entry consumes.
As for the opening question, I think you now understand why: the problem lies in the folder's blocks. When a folder holds a great many files, especially with long names, it consumes a lot of blocks. When you traverse the folder, any block you need that isn't in the Page Cache means going all the way down to the disk for real IO. From your point of view, you run ls and it gets stuck.
You'll surely ask: what if I really do need to store that many files? That's simple too: create more folders and don't put too many files in any one directory, and the problem goes away. In engineering practice, the usual approach is to spread files across multiple directories through one or even two levels of hashing, keeping each directory at 100,000 files or fewer.
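A two-level hash layout could look roughly like the sketch below. It is only an illustration under assumptions: the function name shard_path and the sample file name are hypothetical, and cksum is used simply because it is a stable checksum available everywhere; any hash would do.

```shell
#!/bin/bash
# Sketch: map a file name to a two-level bucket path such as "3f/a2/<name>".
# shard_path is an illustrative name; cksum only serves as a stable hash.
shard_path() {
    local crc
    crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
    # Low byte picks the first-level directory, next byte the second level.
    printf '%02x/%02x/%s\n' $(( crc % 256 )) $(( (crc / 256) % 256 )) "$1"
}

shard_path "photo_000123.jpg"
```

With 256 × 256 buckets, even hundreds of millions of files would average only a few thousand per directory. The caller would still need to mkdir -p the bucket directory before writing the file.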
A bug in ext
That would seem to wrap up today's experiment. Now let's delete all the files we just created and look again.
# rm -f *
# du -h
72K .
Wait, what? The files in the folder have all been deleted, so why does the folder still take up 72K of disk space? This puzzled me for a long time before I finally worked it out. The key is rec_len in ext4_dir_entry_2. This field stores the length of the whole ext4_dir_entry_2 entry, so when the operating system traverses a folder, it can add this length to the current pointer to find the next dir_entry in the folder. The advantage is that traversal is very convenient; it's a bit like a linked list, one entry after another. Deleting a file is more awkward, though: the entry can't simply be cut out, or the chain would break. When Linux deletes a file, it merely sets inode to 0 in the directory (or, when the entry is not the first in its block, folds its space into the previous entry's rec_len); it does not give the space occupied by the ext4_dir_entry_2 back. This is the same soft delete we often use in our own projects. The XFS file system doesn't seem to have this little problem any more, but I haven't studied how it solves it yet. If you know the answer, please leave a comment!
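The rec_len walk and the soft delete can be imitated with a toy model. This is only a sketch: it keeps one pretend 4096-byte block in bash arrays indexed by byte offset, all names (build_block, list_block, delete_entry, the ent_* arrays, the sample file names) are made up for illustration, and it models only the inode-zeroing case, not the merging of freed space into the previous entry. It does not read a real filesystem.

```shell
#!/bin/bash
# Toy model of one directory block: parallel arrays indexed by byte offset.
# All names here are illustrative; no real filesystem is read.
BLOCK_SIZE=4096
declare -a ent_inode ent_rec_len ent_name

# Lay entries out back to back; the last rec_len stretches to the block end.
build_block() {
    local off=0 ino=100 last=0 name len
    (( $# > 0 )) || return 1
    ent_inode=() ent_rec_len=() ent_name=()
    for name in "$@"; do
        len=$(( (${#name} + 8 + 3) & ~3 ))
        ent_inode[off]=$ino
        ent_rec_len[off]=$len
        ent_name[off]=$name
        last=$off
        off=$(( off + len ))
        ino=$(( ino + 1 ))
    done
    ent_rec_len[last]=$(( BLOCK_SIZE - last ))
}

# Traverse like a directory scan: hop by rec_len, skip entries with inode 0.
list_block() {
    local off=0
    while (( off < BLOCK_SIZE )); do
        if (( ent_inode[off] != 0 )); then
            echo "${ent_name[off]}"
        fi
        off=$(( off + ent_rec_len[off] ))
    done
}

# Soft delete: zero the inode but keep rec_len, so the chain stays intact.
delete_entry() {
    local off=0
    while (( off < BLOCK_SIZE )); do
        if (( ent_inode[off] != 0 )) && [[ ${ent_name[off]} == "$1" ]]; then
            ent_inode[off]=0
            return 0
        fi
        off=$(( off + ent_rec_len[off] ))
    done
    return 1
}

build_block a.txt bb.txt ccc.txt
delete_entry bb.txt
list_block   # prints a.txt and ccc.txt
```

After the delete, the entry for bb.txt still occupies its 16 bytes in the block; the walk simply steps over it. Scale that up to thousands of entries and you get a directory that stays large after its files are gone.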

The hard disk album of the "Develop internal skill and practice" series:
- 1. Opening up a disk: take off the hard shell of the mechanical hard disk!
- 2. Disk partitioning also implies technical skills
- 3. How can we solve the problem that mechanical hard disks are slow and break down easily?
- 4. Disassemble the SSD structure
- 5. How much disk space does a new empty file take?
- 6. How much disk space does a file of only 1 byte actually take up?
- 7. Why is the ls command stuck when there are too many files?
- 8. Understand the principle of formatting
- 9. How much disk IO actually takes place when you read one byte of a file?
- 10. When is the disk IO issued after you write one byte of a file?
- 11. Mechanical hard disk random IO is slower than you think
- 12. How much faster is a server equipped with an SSD than one with a mechanical hard disk?
My official account is 「Develop internal skill and practice」. Here I don't only talk about technical theory, and I don't only talk about practical experience. I combine theory with practice, using practice to deepen the understanding of theory and theory to improve practical skill. You're welcome to follow my official account, and please share it with your friends too.
Copyright notice
This article was written by [Zhang Yanfei Allen]. Please include a link to the original when reposting. Thanks.