The computing processes



registers and RAM

Subroutine and procedure

a group of instructions

interrupt handler

procedure + optional own stack

softirq handler

procedure + optional own stack

kernel thread

thread vs event
change of processes: associates a change structure with an object and has more states; event-driven -> scheduled
space: +kernel address space, more kinds of resource space
change: implicit changes.


change: linkage of user space code
space: whole address space


share process address space
Linux thread TLS (Thread-Local Storage) and the role of the GS segment register


Change of processes


pc + 1

Subroutine and procedure

save: push pc + 1; optional caller saved registers
entry: save rbp; callee saved registers
return: mov rbp -> rsp; pop rbp; ret


system call

The Definitive Guide to Linux System Calls
Measurements of system call performance and overhead
AMD vs Intel and syscall vs sysenter
System Call Optimization with the SYSENTER Instruction
Sysenter Based System Call Mechanism in Linux 2.6
kernel documentation
Meltdown and Spectre


  • save: pc + 1, old rsp, registers
    pc + 1-> RCX
  • entry: pc
    IA32_LSTAR -> pc

    kernel implementations

    64-bit long mode: syscall; check syscall_init
    64-bit compatible kernel: sysenter, syscall, or int 0x80; check __kernel_vsyscall and def_idts
    ??32-bit kernel: int 0x80, sysenter;

    vDSO and vsyscall

    On vsyscalls and the vDSO
    linux syscalls on x86 64


  • 64-bit without COMPAT32/compatible kernel
    [ 730.583700] traps: int80[1697] general protection ip:4000c4 sp:7ffd84b59730 error:402 in int80[400000+1000]
    Segmentation fault (core dumped)

  • 64-bit syscall
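
The 64-bit entry path can be exercised directly from user space. What follows is a minimal sketch, assuming x86-64 Linux and GCC/Clang inline assembly; `raw_syscall0` is a hypothetical helper name, not a libc function. The clobber list mirrors the note above: the `syscall` instruction stores pc + 1 in RCX (and the saved RFLAGS in R11). On other architectures the sketch simply falls back to libc's `syscall()`.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue a system call that takes no arguments. */
static long raw_syscall0(long nr)
{
#if defined(__x86_64__)
    long ret;
    /* rax carries the syscall number in and the result out.
     * The syscall instruction overwrites rcx (return address)
     * and r11 (saved rflags), so both are listed as clobbers. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Not x86-64: let libc choose the entry instruction. */
    return syscall(nr);
#endif
}
```

Calling `raw_syscall0(SYS_getpid)` should return the same value as `getpid()`.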



history of interrupts
Another History of interrupts with video
Interrupts: asynchronous (passively received), external
Exceptions: synchronous (actively detected), internal
Software interrupts: these are traps. int/int3, into, bound.


Mask exception

RF in EFLAGS for masking #DB


Thread switch

Al Viro’s new execve/kernel_thread design

call+jump+ret - 0100301bfdf56a2a370c7157b5ab0fbf9313e1cd

((last) = __switch_to_asm((prev), (next))); #=====> call

jmp __switch_to #=====> jmp + ret

Old version switch_to - push+jmp+ret

- asm volatile("pushl %%ebp\n\t" /* save EBP */
- "movl %%esp,%[prev_sp]\n\t" /* save ESP */ \ #=====>PREV: Save ESP into task struct thread.
- "movl %[next_sp],%%esp\n\t" /* restore ESP */ \ #=====>NEXT: Set up stack for linkage from task struct thread.
- "movl $1f,%[prev_ip]\n\t" /* save EIP */ \ #=====>PREV: Save PC into task struct thread.
- "pushl %[next_ip]\n\t" /* restore EIP */ \ #=====>NEXT: push - Store PC on stack from task struct thread;
- "jmp __switch_to\n" /* regparm call */ \ #=====>NEXT: jmp + ret - Return and restore PC.
- "1:\t"
- "popl %%ebp\n\t" /* restore EBP */
- /* output parameters */
- : [prev_sp] "=m" (prev->thread.sp),
- [prev_ip] "=m" (prev->thread.ip),
- "=a" (last),
- /* clobbered output registers: */
- "=b" (ebx), "=c" (ecx), "=d" (edx),
- "=S" (esi), "=D" (edi)
- /* input parameters: */
- : [next_sp] "m" (next->thread.sp),
- [next_ip] "m" (next->thread.ip),
Why does switch_to use push+jmp+ret to change EIP, instead of jmp directly?
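
The save-SP / restore-SP / resume-PC pattern has a user-space analogy. The sketch below is only that: it uses glibc's `getcontext`/`makecontext`/`swapcontext` rather than anything from the kernel, and the names `task_fn`, `switch_demo` and `trace` are hypothetical. Each `swapcontext` call stores the current stack pointer and resume address in the first context and loads them from the second, which is roughly what the old `switch_to` does with `thread.sp` and `thread.ip`.

```c
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];   /* private stack for the second flow */
static int trace[4];
static int trace_len;

static void task_fn(void)
{
    trace[trace_len++] = 2;
    /* Save this flow's sp/pc in task_ctx, resume main_ctx. */
    swapcontext(&task_ctx, &main_ctx);
    trace[trace_len++] = 4;
    /* Returning here resumes uc_link (main_ctx). */
}

/* Returns the number of recorded steps, or -1 on error. */
static int switch_demo(void)
{
    trace_len = 0;
    if (getcontext(&task_ctx) != 0)
        return -1;
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;
    makecontext(&task_ctx, task_fn, 0);

    trace[trace_len++] = 1;
    swapcontext(&main_ctx, &task_ctx);   /* first switch: task_fn starts */
    trace[trace_len++] = 3;
    swapcontext(&main_ctx, &task_ctx);   /* second switch: task_fn resumes */
    return trace_len;
}
```

After `switch_demo()` the `trace` array records the interleaving of the two flows in the order 1, 2, 3, 4.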

Kernel thread


Task switching

DBG: Softlockup

 ps aux | grep watchdog
root 13 0.0 0.0 0 0 ? S 08:23 0:00 [watchdog/0]
root 16 0.0 0.0 0 0 ? S 08:23 0:00 [watchdog/1]
root 22 0.0 0.0 0 0 ? S 08:23 0:00 [watchdog/2]
root 28 0.0 0.0 0 0 ? S 08:23 0:00 [watchdog/3]

DBG: Hung tasks bugs

think for myself

A kernel bug causes a task to be stuck in “D” state indefinitely.
1. A D state task wait list.
2. Hung task timeout.
3. Timestamp on adding task to “D” state wait list.
4. Kernel thread for detecting hung tasks - schedule timeout; why kthread?


How could I find all the D state tasks?
1. kernel must use specific functions to put D-task on wait list.
2. Embed code in those functions to catch 'D' state tasks and put them on the wait list for hung-task detection.
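
From user space, the same question can be answered by walking /proc instead of a kernel wait list. This is a minimal sketch under that assumption; `task_state` and `count_d_state` are hypothetical helper names. It only looks at thread-group leaders (the numeric directories directly under /proc), not at the individual threads under /proc/<pid>/task.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Return the state letter from /proc/<pid>/stat, or 0 on error. */
static char task_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return 0;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* The comm field may itself contain ')' or spaces,
     * so locate the state after the last ')'. */
    char *p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return 0;
    return p[2];
}

/* Count processes currently in 'D' state; -1 if /proc is unavailable. */
static int count_d_state(void)
{
    DIR *d = opendir("/proc");
    if (!d)
        return -1;
    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (!isdigit((unsigned char)e->d_name[0]))
            continue;               /* skip non-PID entries */
        if (task_state((pid_t)atol(e->d_name)) == 'D')
            count++;
    }
    closedir(d);
    return count;
}
```

A snapshot like this is racy by nature: a task may enter or leave 'D' state between the directory read and the stat read.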

Kernel implementation

* differences
1: kernel task list - init_task.tasks and p->signal->thread_head in copy_process
3: t->nvcsw + t->nivcsw, t->last_switch_count and timeout
cat /proc/self/status | grep ctxt_switches
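
The switch-count idea in item 3 can be modelled in a few lines. This is a user-space model and not the kernel's code: `struct model_task`, `model_check_hung` and the `last_switch_time` field are assumptions made for illustration, with time measured in arbitrary ticks. A task is reported only if it is in 'D' state and its total switch count has not moved for a full timeout.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fields the detector looks at. */
struct model_task {
    char state;                       /* 'D' = uninterruptible sleep */
    unsigned long nvcsw, nivcsw;      /* voluntary / involuntary switches */
    unsigned long last_switch_count;  /* count seen at the previous check */
    unsigned long last_switch_time;   /* tick of the last observed change */
};

/* Returns true when the task looks hung at tick 'now'. */
static bool model_check_hung(struct model_task *t,
                             unsigned long now, unsigned long timeout)
{
    if (t->state != 'D')
        return false;

    unsigned long switch_count = t->nvcsw + t->nivcsw;
    if (switch_count != t->last_switch_count) {
        /* The task was scheduled since the last check: restart the clock. */
        t->last_switch_count = switch_count;
        t->last_switch_time = now;
        return false;
    }
    /* No switch since last_switch_time: hung once the timeout has passed. */
    return now - t->last_switch_time >= timeout;
}
```

Using the switch count instead of a timestamp taken when the task goes to sleep means nothing has to be recorded on the sleep path; the detector does all the bookkeeping.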


User preemption - user mode on Linux is always preemptible.

  • When returning to user-space from a system call.
  • When returning to user-space from an interrupt handler.

    Kernel mode is cooperative when CONFIG_PREEMPT is not set.

    blocked (which results in a call to schedule())
    If a task in the kernel explicitly calls schedule() while still runnable, the switch is counted as involuntary (nivcsw)!

    Kernel mode is cooperative + preemptive when CONFIG_PREEMPT is set.

  • When an interrupt handler exits, before returning to kernel-space.

  • need_resched - When kernel code becomes preemptible again.

  • set_tsk_need_resched() in resched_curr
    tick: check_preempt_tick or entity_tick
    fork: wake_up_new_task->check_preempt_curr->check_preempt_wakeup
    wakeup: check_preempt_wakeup

  • if (need_resched()) cond_resched();


  • if (!preempt && prev->state) in __schedule: why prev->state?
    it’s because of need_resched
    ?? schedule_idle
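
The voluntary/involuntary split discussed above is visible from user space through /proc/self/status. A minimal sketch, assuming a Linux /proc; `read_ctxt_switches` is a hypothetical helper name. It reads the two counters that correspond to nvcsw and nivcsw for the calling process.

```c
#include <stdio.h>

/* Fill in both counters from /proc/self/status.
 * Returns 0 on success, -1 if the file or either field is missing. */
static int read_ctxt_switches(long *voluntary, long *nonvoluntary)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return -1;

    char line[256];
    int found = 0;
    while (fgets(line, sizeof line, f)) {
        /* A "nonvoluntary..." line cannot match the first pattern
         * because its first character differs. */
        if (sscanf(line, "voluntary_ctxt_switches: %ld", voluntary) == 1)
            found |= 1;
        else if (sscanf(line, "nonvoluntary_ctxt_switches: %ld",
                        nonvoluntary) == 1)
            found |= 2;
    }
    fclose(f);
    return found == 3 ? 0 : -1;
}
```

Reading the counters before and after a blocking call such as a short sleep is a quick way to see which kind of switch a given code path produces.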


Procedure and subroutine and linkage method
* 1945 Turing on subroutines in Proposed electronic calculator.
In Chapter 6. Outline of Logical Control.
We also wish to be able to arrange for the splitting up of operations into…
When we wish to start on a subsidiary operation we need only make a note
of where we left off the major operation…
* 1952 The use of sub-routines in programmes
The above remarks may be summarized by saying sub-routines are very useful — although not absolutely necessary — and that the prime objectives to be born in mind when constructing them are simplicity of use, correctness of codes and accuracy of description. All complexities should—if possible—be buried out of sight.
* 1960 Dijkstra, E. W. (1960). “Recursive Programming”


Structured programming

Structured Programming - Dijkstra




Periodical: schedule_timeout

inter-process info

Call stack

Fork a new process

What does the child process need from parent?
sched_fork: set up scheduling state
memory: copy parent’s mm
How is memory shared with the parent process?
Linux uses the COW (copy-on-write) technique to do this.
How does COW work?
Why does Linux just share pages?
How to diverge the child execution flow from parent?
what is the first instruction executed by the child process?
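
The user-visible side of these questions can be shown with a small experiment. This is a sketch, not an answer to how the kernel implements it; `fork_cow_demo` is a hypothetical name. The two flows diverge on the return value of `fork()`, and a write in the child does not show up in the parent because the child ends up with its own private copy of the page.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the value the child saw after writing (via its exit status),
 * and stores the parent's view of the same variable in *parent_value.
 * Returns -1 on error. */
static int fork_cow_demo(int *parent_value)
{
    int value = 1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: fork() returned 0 here. The write lands in the
         * child's private copy; the parent's memory is untouched. */
        value = 42;
        _exit(value);
    }

    /* Parent: fork() returned the child's pid. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    *parent_value = value;            /* still 1 in the parent */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The function returns 42 (the child's view) while `*parent_value` stays 1 (the parent's view).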

Inspect process status

Kernel mapping: tgid_base_stuff show_map_vma


3A: Chapter 5


Check glibc sysdeps/unix/sysv/linux/x86_64/clone.S for creating a new thread.

idle kernel stack

The kernel stack of the master idle process is init_thread_union in init/init_task.c.
The kernel stacks of all other processes are created by fork.
this_cpu_write(kernel_stack,(unsigned long)task_stack_page(next_p) +THREAD_SIZE);
this_cpu_write(cpu_current_top_of_stack,(unsigned long)task_stack_page(next_p) +THREAD_SIZE);
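The idle task on the boot processor evolves from the original process (pid=0). The idle tasks on the secondary processors are forked by the init process, but their pids are all 0; see init_idle.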

Zombie process

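A forked child that has exited but has not been reaped by its parent stays hooked in the process list.
If the parent is killed and exits, the zombie is reparented and then reaped.
So a zombie indicates that the parent is alive but is not reaping its child.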