当前位置:网站首页>Address mapping of virtual memory paging mechanism
Address mapping of virtual memory paging mechanism
2022-06-24 21:12:00 【The smell of tobacco】
summary
Previous articles Virtual memory The paging mechanism is briefly introduced . There is another question , That is how to map the logical address in the virtual memory to the physical address ? Let's make a brief analysis today .
For a paged address , It usually contains two elements :
- Page number : Page
- Offset : The number of bytes of the current page
The following addr_virtual(p, o)
Represents a logical address , With addr_real(p, o)
Represents a physical address ( The physical address is also paged ).
A page table
The first step is to think about , If you want to find the corresponding physical address according to a logical address , Then the corresponding relationship must be stored somewhere , Because mapping is irregular . What data structure should be used to store ?
Because in paging , page Is a minimum unit , Therefore, we only need the mapping relationship of page numbers , Logical and physical addresses have the same page size , The offset is exactly the same .
according to key
seek value
, This is not a map
Well . Look at this again map
Of key
, Page numbers are all numbers , And it's sequential . This is just an array . beautiful , It's an array .
in other words , This page table is indexed by logical page numbers , One dimensional array with physical page number as value . The process of address translation is as follows :
The elements in the page table do not just store physical page numbers , Some additional flag bits are also stored , Used to identify the properties of the current page , A few simple examples :
- Existential bit : Whether it is currently loaded into memory . If it is not loaded, the operating system is required to load
- Modify bit : Whether the current page has been modified in memory . If modified , When reclaiming physical memory, you need to write it to the hard disk
- Kernel permissions : Whether the current page needs kernel mode to access
- Whether to read bits
- Whether bits can be written
- Is it executable
- wait
Because each process has its own virtual memory , So each process needs its own page table .
In order to improve operation efficiency , This translation process is completed by hardware , Both CPU
Memory management unit in mmu
To complete .
Is it easy to see ? good , Introduction completed , This concludes the article .
Problems and solutions
ha-ha , To make fun of . How could it be so easy to end . Now let's briefly analyze the problems of this simple model . According to the experience of the algorithm , Most algorithms are implemented , Or the time complexity is too high , Or the space complexity is too high .
Time issues
Imagine the steps to access a memory :
- Look up the page table to find the corresponding physical address
- Access physical address
The operation of looking up the page table is also a memory access . in other words , CPU
Each memory access requires an additional memory access . The execution time is doubled directly .
Solution
The solution is that we have now used rotten : cache . Memory to CPU
There are already L1
L2
cache , stay mmu
There is also a cache of page tables in TLB
. The steps for each address translation are as follows ( Ignore missing pages ):
- see
TLB
Whether there is a cache in , If there is direct access - if
TLB
Does not exist in the , Get from the memory page table and putTLB
TLB
The premise of existence , Is that the access of the program is local . finally , It is the locality of the program that saves us .
Space problem
Let's simply calculate how much space is needed to store this page table .
stay 32 position CPU in , The accessible logical address space is 4G
. Suppose the page size is : 4kb
, Then the total number of pages is :
4G / 4kb = (2^32) / (2^12) = 2^20 = 1mb
suppose , Each element of the page table needs 4 Bytes , Then the total space required is : 4mb
. Each process just needs to store the page table 4mb
. And it's still 32 position , If it is 64 Who? ? You can calculate it , The result is exaggerated .
Solution
Learn from the idea of memory paging , We divide memory into n
A page , You can load it on demand . Again , You can also divide a large page table into n
A small page table , Then you can partially load , Both Multi level page table
Explain with the simplest two-level page table , The virtual memory is roughly divided as follows :
The structure of the page table is roughly as follows :
Be careful , At this time, the page number in the logical address stores two contents :
- Index of the first level page table
- Index of secondary page table
Why does multi-level page table solve the problem of space ? Again, according to the local principle of the program , Most of the corresponding values in the first level page table are empty , Most of the secondary page tables are not loaded into memory .
Now let's calculate again , still 32 position CPU
, The size of the page is still 4kb
, The element size in the page is still 4 byte . At this point, assume that each element of the first level page table is responsible for 4mb
Space . Then the total number of pages occupied by the first level page table is : 4G / 4mb = (2^32) / (2^22) = 2^10
. The space occupied by the first level page table is : (2^10) * (2^2)=4kb
The total number of pages in each secondary page table is : 4mb / 4kb = (2^22) / 2(12) = 2^10 = 1024
, Occupancy space : (2^10) * (2^2) = 4kb
Only the first level page table is resident in memory , The second level page table only needs to load part of it . The space drops directly .
however , It brings a new problem , Now get a physical address , Need to access memory twice , This is slower than before ? Don't forget what just happened TLB
, With this layer of cache , Most of the visits are in mmu
It's going on inside . Again , The locality principle of the program saved us .
Multi level page table , Expand the secondary page table further , You can get a multi-level page table , I won't repeat .
The locality of the program
Know how the address is mapped , How does it help us to write programs in ordinary times ?
Page conversion is based on the locality of the program , So when we write code , Try to ensure that what you write is local , for instance :
int main() {
int i, j;
int arr[1024][1024];
// The first way
for(i = 0; i < 1024; i++){
for(j = 0; j < 1024; j++){
global_arr[i][j] = 0;
}
}
// The second way
for(j = 0; j < 1024; j++){
for(i = 0; i < 1024; i++){
global_arr[i][j] = 0;
}
}
}
The purpose of the above code is very simple , Give me a 1024*1024
Initializes a two-dimensional array of . Can you see the difference between the two methods ?
Traversal methods are different , Mode one Is a line by line traversal , Mode two Is a column by column traversal .
We know , Two dimensional arrays are stored sequentially in memory . in other words , A two-dimensional array : [[1, 2, 3], [4, 5, 6]]
, The storage order in memory is : 1, 2, 3, 4, 5, 6
.
And our array , Each row 1024 individual int
Elements , Is precisely 4kb
The size of a page .
therefore , Mode one The order in which pages are accessed is : page1, page1 ... page1024, page1024
, Per page access 1024 Next time , Switch to the next page , Co occurring 1024 Next page switching
and , Mode two The order in which pages are accessed is : page1, page2...page1024 ... page1, page2...page1024
, Visit each page in turn , Per page access 1024 Time , Co occurring 1024*1024 Next page switching
High performance, lower judgment , Mode one More in line with the principle of locality , Mode two Your visit is too jumping .
Of course , Now when the memory is large , Everything is loaded into memory , meanwhile TLB
Cached all page mappings , There is no difference between the two methods . however :
- if
TLB
Capacity is insufficient , The new cache will eliminate the old cache , Frequent access to different pages will cause more cache invalidations - If memory capacity is insufficient , Writing a new page will weed out the old page , Frequent access to different pages will result in more memory swapping in and out .
Words alone are no proof
Of course , Words alone are no proof , In order to have an intuitive feeling about the switching mechanism of the above page , We go through getrusage
Function to get the page switching information of the program . The code is as follows :
#include <stdio.h>
#include <sys/resource.h>
const int M = 1024;
// Increase the size of the column , To make the effect obvious . 10mb
const int N = 1024*10;
// Because the stack size is limited , Therefore, the variable is promoted to global , Put it in the pile
int global_arr[1024][1024*10];
int main() {
int i, j;
// The first way
for(i = 0; i < M; i++){
for(j = 0; j < N; j++){
global_arr[i][j] = 0;
}
}
// The second way
// for(j = 0; j < N; j++){
// for(i = 0; i < M; i++){
// global_arr[i][j] = 0;
// }
// }
struct rusage usage;
getrusage(RUSAGE_SELF, &usage);
printf(" Page recycle times : %ld\n", usage.ru_minflt);
printf(" Number of page breaks : %ld\n", usage.ru_majflt);
}
It is still relatively simple for computers to run such a small program , It won't make any difference , Therefore, the memory of the process should be limited . I am through restriction docker
Available memory :
docker run -it -m 6m --memory-swap -1 debian bash
good , Everything is available. , Let's see the results :
Mode one
Mode two
You can see , Mode one Want to compare Mode two Is much better .
so , Procedures with high performance requirements , When you have no optimization direction , Locality may help you .
Link to the original text : https://hujingnb.com/archives/698
边栏推荐
- JUnit unit test
- Codeforces Round #720 (Div. 2)
- Leetcode (146) - LRU cache
- Second understanding permutation and combination
- Undo log and redo log must be clear this time
- 等保备案是等保测评吗?两者是什么关系?
- Limit summary (under update)
- Reflection - class object function - get method (case)
- Pytest test framework II
- Shell script
猜你喜欢
Sleep revolution - find the right length of rest
虚拟化是什么意思?包含哪些技术?与私有云有什么区别?
After a few years in the testing industry, do you still know a little?
Basic database syntax learning
How Fiddler works
JMeter response assertion
Network flow 24 questions (round table questions)
Web automation: summary of special scenario processing methods
Interpreter mode -- formulas for dating
[performance tuning basics] performance tuning strategy
随机推荐
Packaging_ Conversion between basic type and string type
Nifi fast authentication configuration
Shell script
Leetcode (455) - distribute cookies
More than ten years' work experience is recommended at the bottom of the box: how much does it cost to find a job? See here! Brothers and sisters are recommended to collect and pay attention
Leetcode (135) - distribute candy
Memo mode - game archiving
伯克利、MIT、剑桥、DeepMind等业内大佬线上讲座:迈向安全可靠可控的AI
After screwing the screws in the factory for two years, I earned more than 10000 yuan a month by "testing" and counterattacked
Procedural life: a few things you should know when entering the workplace
网络安全审查办公室对知网启动网络安全审查
A/B测试助力游戏业务增长
maptalks:数据归一化处理与分层设色图层加载
Record a deletion bash_ Profile file
IDEA Dashboard
Postman assertion
Hongxiang Yunteng is compatible with dragon lizard operating system, and the product runs stably
After 5 months' test, it took 15K to come for an interview. When I asked, it was not worth even 5K. It was really
Format method and parse method of dateformat class
Axi DMA IP core operation process