当前位置：网站首页>Address mapping of virtual memory paging mechanism

Address mapping of virtual memory paging mechanism

2022-06-24 21:12:00 【The smell of tobacco】

summary

Previous articles Virtual memory The paging mechanism is briefly introduced . There is another question , That is how to map the logical address in the virtual memory to the physical address ? Let's make a brief analysis today .

For a paged address , It usually contains two elements :

Page number : Page
Offset : The number of bytes of the current page

The following addr_virtual(p, o) Represents a logical address , With addr_real(p, o) Represents a physical address ( The physical address is also paged ).

A page table

The first step is to think about , If you want to find the corresponding physical address according to a logical address , Then the corresponding relationship must be stored somewhere , Because mapping is irregular . What data structure should be used to store ?

Because in paging , page Is a minimum unit , Therefore, we only need the mapping relationship of page numbers , Logical and physical addresses have the same page size , The offset is exactly the same .

according to key seek value, This is not a map Well . Look at this again map Of key, Page numbers are all numbers , And it's sequential . This is just an array . beautiful , It's an array .

in other words , This page table is indexed by logical page numbers , One dimensional array with physical page number as value . The process of address translation is as follows :

The elements in the page table do not just store physical page numbers , Some additional flag bits are also stored , Used to identify the properties of the current page , A few simple examples :

Existential bit : Whether it is currently loaded into memory . If it is not loaded, the operating system is required to load
Modify bit : Whether the current page has been modified in memory . If modified , When reclaiming physical memory, you need to write it to the hard disk
Kernel permissions : Whether the current page needs kernel mode to access
Whether to read bits
Whether bits can be written
Is it executable
wait

Because each process has its own virtual memory , So each process needs its own page table .

In order to improve operation efficiency , This translation process is completed by hardware , Both CPU Memory management unit in mmu To complete .

Is it easy to see ? good , Introduction completed , This concludes the article .

Problems and solutions

ha-ha , To make fun of . How could it be so easy to end . Now let's briefly analyze the problems of this simple model . According to the experience of the algorithm , Most algorithms are implemented , Or the time complexity is too high , Or the space complexity is too high .

Time issues

Imagine the steps to access a memory :

Look up the page table to find the corresponding physical address
Access physical address

The operation of looking up the page table is also a memory access . in other words , CPU Each memory access requires an additional memory access . The execution time is doubled directly .

Solution

The solution is that we have now used rotten : cache . Memory to CPU There are already L1L2 cache , stay mmu There is also a cache of page tables in TLB. The steps for each address translation are as follows ( Ignore missing pages ):

see TLB Whether there is a cache in , If there is direct access
if TLB Does not exist in the , Get from the memory page table and put TLB

TLB The premise of existence , Is that the access of the program is local . finally , It is the locality of the program that saves us .

Space problem

Let's simply calculate how much space is needed to store this page table .

stay 32 position CPU in , The accessible logical address space is 4G. Suppose the page size is : 4kb, Then the total number of pages is :

4G / 4kb = (2^32) / (2^12) = 2^20 = 1mb

suppose , Each element of the page table needs 4 Bytes , Then the total space required is : 4mb. Each process just needs to store the page table 4mb. And it's still 32 position , If it is 64 Who? ? You can calculate it , The result is exaggerated .

Solution

Learn from the idea of memory paging , We divide memory into n A page , You can load it on demand . Again , You can also divide a large page table into n A small page table , Then you can partially load , Both Multi level page table

Explain with the simplest two-level page table , The virtual memory is roughly divided as follows :

The structure of the page table is roughly as follows :

Be careful , At this time, the page number in the logical address stores two contents :

Index of the first level page table
Index of secondary page table

Why does multi-level page table solve the problem of space ? Again, according to the local principle of the program , Most of the corresponding values in the first level page table are empty , Most of the secondary page tables are not loaded into memory .

Now let's calculate again , still 32 position CPU, The size of the page is still 4kb, The element size in the page is still 4 byte . At this point, assume that each element of the first level page table is responsible for 4mb Space . Then the total number of pages occupied by the first level page table is : 4G / 4mb = (2^32) / (2^22) = 2^10. The space occupied by the first level page table is : (2^10) * (2^2)=4kb

The total number of pages in each secondary page table is : 4mb / 4kb = (2^22) / 2(12) = 2^10 = 1024, Occupancy space : (2^10) * (2^2) = 4kb

Only the first level page table is resident in memory , The second level page table only needs to load part of it . The space drops directly .

however , It brings a new problem , Now get a physical address , Need to access memory twice , This is slower than before ? Don't forget what just happened TLB, With this layer of cache , Most of the visits are in mmu It's going on inside . Again , The locality principle of the program saved us .

Multi level page table , Expand the secondary page table further , You can get a multi-level page table , I won't repeat .

The locality of the program

Know how the address is mapped , How does it help us to write programs in ordinary times ?

Page conversion is based on the locality of the program , So when we write code , Try to ensure that what you write is local , for instance :

int main() {
    
    int i, j;
    int arr[1024][1024];
    //  The first way 
    for(i = 0; i < 1024; i++){
    
        for(j = 0; j < 1024; j++){
    
            global_arr[i][j] = 0;
        }
    }
    //  The second way 
    for(j = 0; j < 1024; j++){
    
        for(i = 0; i < 1024; i++){
    
            global_arr[i][j] = 0;
        }
    }
}

The purpose of the above code is very simple , Give me a 1024*1024 Initializes a two-dimensional array of . Can you see the difference between the two methods ?

Traversal methods are different , Mode one Is a line by line traversal , Mode two Is a column by column traversal .

We know , Two dimensional arrays are stored sequentially in memory . in other words , A two-dimensional array : [[1, 2, 3], [4, 5, 6]], The storage order in memory is : 1, 2, 3, 4, 5, 6.

And our array , Each row 1024 individual int Elements , Is precisely 4kb The size of a page .

therefore , Mode one The order in which pages are accessed is : page1, page1 ... page1024, page1024, Per page access 1024 Next time , Switch to the next page , Co occurring 1024 Next page switching

and , Mode two The order in which pages are accessed is : page1, page2...page1024 ... page1, page2...page1024, Visit each page in turn , Per page access 1024 Time , Co occurring 1024*1024 Next page switching

High performance, lower judgment , Mode one More in line with the principle of locality , Mode two Your visit is too jumping .

Of course , Now when the memory is large , Everything is loaded into memory , meanwhile TLB Cached all page mappings , There is no difference between the two methods . however :

if TLB Capacity is insufficient , The new cache will eliminate the old cache , Frequent access to different pages will cause more cache invalidations
If memory capacity is insufficient , Writing a new page will weed out the old page , Frequent access to different pages will result in more memory swapping in and out .

Words alone are no proof

Of course , Words alone are no proof , In order to have an intuitive feeling about the switching mechanism of the above page , We go through getrusage Function to get the page switching information of the program . The code is as follows :

#include <stdio.h>
#include <sys/resource.h>

const int M = 1024;
//  Increase the size of the column ,  To make the effect obvious . 10mb
const int N = 1024*10;
//  Because the stack size is limited ,  Therefore, the variable is promoted to global ,  Put it in the pile 
int global_arr[1024][1024*10];

int main() {
    
    int i, j;
    //  The first way 
    for(i = 0; i < M; i++){
    
        for(j = 0; j < N; j++){
    
            global_arr[i][j] = 0;
        }
    }
    //  The second way 
// for(j = 0; j < N; j++){
    
// for(i = 0; i < M; i++){
    
// global_arr[i][j] = 0;
// }
// }

    struct rusage usage;
    getrusage(RUSAGE_SELF, &usage);
    printf(" Page recycle times : %ld\n", usage.ru_minflt);
    printf(" Number of page breaks : %ld\n", usage.ru_majflt);
}

It is still relatively simple for computers to run such a small program , It won't make any difference , Therefore, the memory of the process should be limited . I am through restriction docker Available memory :

docker run -it -m 6m --memory-swap -1 debian bash

good , Everything is available. , Let's see the results :

Mode one