Linux memory management

Reference

BEFORE MEMORY WAS VIRTUAL
Memory part 3: Virtual Memory

Contents

Logic gates: SRAM, DRAM
What is data/contrl/addr bus?
What problems we will meet in memory management?
Memory allocation and release. Bootmem and Buddy system is pretty good. Can we eliminate Bootmem?
How to distribute these memory to processes? Virtual memory.
How to translate linear address to physical address? Page table.
Exchange data between primary memory and second memory. Paging.

What are the pitfall of manuplate directly physcial memory

Memory pagge cache and buffer cache.

page cache for memory, buffer cache for fs(block size is dependent on filesystem).
address_pace->page_tree: why radix tree, read ahead,
do_page_fault->read_page

For buffer cache: buffer cache is only a wrapper of page for fs operations.
buffer_head is temporary data released in a deeper function than the function allocing the buffer_head.
__block_write_full_page
block_read_full_page

I can not cover every corner of kernel, so If need, I will learn it.
The coherency problem, fs-writeback
Flushing out pdflush

Swapping
User space process: anonymous mapping(stack,heap,mmap), IPC-share memory(anonymous?), private mapping

Shrink cache
LRU cache

page

An introduction to compound pages

Memroy mangement

GFP flags

__GFP_IO: allow disk IO
__GFP_FS: allow fs operations, depend on io.
more details in lwn, lkd

Virtual memory

vm的提出是为了解决。easy to use。
1. decoupling physical memory 符号集合。programmer 不需要关注底层细节。 任务转给操作系统。
2. VM相对物理内存增加了表达能力, 有了更多表达符号。着减少了swap or 不必要的页表抖动。

Vmalloc

may sleep.

Hwo Vmalloc works?

Work in HIGHMEM and NORMALMEM
The skeleton is rbtree, root is global variable vmap_area_root.rb_node.
struct vm_struct likes struct address_space, functionlly;
struct vmap_area likes struct vm_area_struct.
map_vm_area 页表映射
the page in ZONE_NORMAL will not use directly mapping pfn address! It use VMALLOC address!

##Process virtual memory
* struct vm_area_struct: The intervals of legal address are called memory areas is permitted to access.
* struct address_space: To establish an association between the regions of the vm and the places where the related data are located.
i_mmap: how many processes opened this file.
https://lkml.org/lkml/2012/8/7/46
* sturct mm_struct: how many files(vm_area_struct) does this process opened.

  • Memory mappings
    syscall remap_file_pages Nolinear mappings is deprecated, since Linux 3.16

  • link
    a virtual address and physical address. –page tale
    a memory region of a process and its virtual page addresses. –vm_area_struct
    a region of file(one physical) and all virtual address spaces(many virtual) into which the region is mapped. address_space->i_mmap.
    a physical page and the processes that share the page(used in swap case)

#Physical memory
* NUMA/UMA pg_data_t: My PC is UMA, numatop, numastat, numactl
* ZONE(DMA/NORMAL/HIGHMEM) struct zone:
* struct page is the basic unit of kernel mm knowns as page frame.
The goal of strcut page is to describe physical memory, not the data contained therein.
* The buddy system is per-zone struct free_area
* Physical address is connected to Virtual address by pfn = page - mem_map;

page allocator

alloc_pages()
##Page/buffer cache
struct address_space->page_tree
##Page writeback
data synchronization, the flush threads, pdflush
##Page swap
The available RAM memory in a computer is never enough to meet user needs or to always satisfy memory-intensive applications.

#FAQ
* Where is Per-CPU variable?
static Per-CPU in .data(?) below high_memory!
runtime Per-CPU, it’s GFP_KERNEL in pcpu_create_chunk()

  • Memory mode
    flat mem -> uma
    discontig -> NUMA
    sparse -> Hotplug + NUMA

  • When does kernel alloc these struct pages in x86_64?
    http://lwn.net/Articles/229670/
    vmemmap silimar to memmap

  • When kmap_atomic() BUG_ON effect?

  • How cpu resolve address below high_memory?
    Cpu-spicific!
    x86 used page table to all address!
    Mips cpu can be aware of this address!

  • How to deal with useless page? : > /home/firo/bigdata

  • pfmemalloc – skb 表示申请了紧急内存!
    page free

  • compound pages
    18fa11efc279c20af5eefff2bbe814ca067

    Memory initialization onset:

    先从bios 拿信息 main -> detect_memory save in boot_params.e820_map
    之后real -> protected -> long mode
    启动 protected? mode. What does protected mode mean
    setup_arch
    setup_memory_map -> default_machine_specific_memory_setup // Save into struct e820map e820; from boot_params.e820_map. That’s all.
    max_pfn = e820_end_of_ram_pfn(); // max_pfn BIOS-e820: mem 0x0000000100000000-0x00000003227fffff usable and last_pfn = 0x322800(12840MB), so last_pfn is invalid address, use it with <.
    mtrr update max_pfn, see Processor supplementary capability
    trim_low_memory_range // reserve 64k
    max_low_pfn = e820_end_of_low_ram_pfn(); //4GB以下的end of block
    memblock_x86_fill// copy e820 to memblock, reconstructs direct memory mapping and setups the direct mapping of the physical memory at PAGE_OFFSET
    early_trap_pf_init // X86_TRAP_PF, page_fault) => do_page_fault
    init_mem_mapping //set page table and cr3.
    initmem_init ; NUMA init
    x86_init.paging.pagetable_init();= paging_init //x86_64 ->zone_sizes_init->…free_area_init_core
    mm_init
    memblock the implementations of memblock is quite simple. static initialization with variable memblock.
    bootmem is discarded by ARM and x86
    a little history e820_register_active_region replaced by lmb replaced by memblock
    reserve_initrd ; // RAMDISK
    总结下, 内存初始化需要的基础.

  • e820 get memory region.

  • set PF trap do_page_fault.

  • set page table and cr3.
    这就完了. 之后开始开始加工.
    arch 相关的x86_init.paging.pagetable_init = native_pagetable_init = paging_init ->
    sparse_init
    {
    memblock_virt_alloc // sizeof(struct page *) * NR_MEM_SECTIONS;NR_MEM_SECTIONS = 1 << 19, Is it big?
    // alloc memory region will be marked in memory_reserved.
    }
    zone_sizes_init.
    {
    calculate_node_totalpages // 每个zone的page数.
    free_area_init_node-> alloc_node_mem_map // alloc mem_map for FLAT
    free_area_init_core
    {
    calc_memmap_size // 1/4k of spanned or present used for memmap. heuristic?
    zone_pcp_init // init percpu pageset with boot_pageset
    set_pageblock_migratetype(page, MIGRATE_MOVABLE);// 512 per zone
    memmap_init_zone-> __init_single_page // init every page in a zone.
    }
    }
    arch independent,
    build_all_zonelists
    page_alloc_init // drain percpu pageset when cpu dead or dead frozen for CPU hotplug
    mm_init
    Zone watermarks core_initcall(init_per_zone_wmark_min)