How does the Linux Kernel start a Process

In this article, you’ll learn what happens inside the Linux Kernel when a process calls execve(), how the Kernel prepares the stack and how control is then passed to the userland process for execution.
I had to learn this for the development of Zapper, a Linux tool to delete all command line options from any process (without needing root).
Overview
- The Kernel receives SYS_execve() from a userland program.
- The Kernel reads the executable file (specific sections) into specific memory areas.
- The Kernel prepares the stack, heap, signals, …
- The Kernel passes execution to the userland program.
Inspecting a binary
Let us start with a simple Linux C program:
int main(int argc, char *argv[]) {
return 0;
}
Compile it with gcc -static -o none none.c and find out some details:
$ readelf -h none
ELF Header:
Magic: 7f 45 4c 46 02 01 01 03 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - GNU
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x4014f0
Start of program headers: 64 (bytes into file)
Start of section headers: 760112 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 10
Size of section headers: 64 (bytes)
Number of section headers: 30
Section header string table index: 29
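The same fields can be read without readelf. The following is a minimal sketch, assuming a 64-bit little-endian ELF file; the function name parse_elf64_header is made up for this example, and the demo header is rebuilt from the values in the readelf output above rather than read from the real binary.

```python
import struct

# ELF64 header, little endian: 16-byte e_ident followed by e_type, e_machine,
# e_version, e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize,
# e_phnum, e_shentsize, e_shnum, e_shstrndx (64 bytes in total).
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


def parse_elf64_header(data):
    """Return selected fields from the first 64 bytes of an ELF64 file."""
    fields = ELF64_HEADER.unpack_from(data)
    ident = fields[0]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled here")
    return {
        "entry": fields[4],
        "phoff": fields[5],
        "shoff": fields[6],
        "phentsize": fields[9],
        "phnum": fields[10],
        "shnum": fields[12],
    }


# Demo header rebuilt from the readelf values shown above (not the real file).
header = ELF64_HEADER.pack(
    bytes.fromhex("7f454c46020101030000000000000000"),
    2, 62, 1, 0x4014F0, 64, 760112, 0, 64, 56, 10, 64, 30, 29)

info = parse_elf64_header(header)
print(hex(info["entry"]))  # 0x4014f0
```

To try it on the real binary, pass the first 64 bytes of the file instead: parse_elf64_header(open("none", "rb").read(64)).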
The first instructions start at the ‘Entry Point’ at 0x4014f0. These instructions were created by the compiler (gcc, go, etc.). They differ by compiler.
Let’s load the binary into gdb and disassemble the instructions with disass 0x4014f0. The instructions perform a bit of housekeeping but eventually will call main() (or the GoLang equivalent).
Let’s set a break-point on the Entry Point (0x4014f0) and run the app with two command line options (firstarg and secondarg):
gdb ./none
pwndbg> disass 0x4014f0
pwndbg> br *0x4014f0
pwndbg> r firstarg secondarg
► 0x4014f0 <_start> xor ebp, ebp
0x4014f2 <_start+2> mov r9, rdx
0x4014f5 <_start+5> pop rsi
[...]
──────────────────────[ STACK ]──────────────────────
00:0000│ rsp 0x7ffca4229540 ◂— 0x3
01:0008│ 0x7ffca4229548 —▸ 0x7ffca422a4b3 ◂— '/sec/root/none'
02:0010│ 0x7ffca4229550 —▸ 0x7ffca422a4c2 ◂— 'firstarg'
03:0018│ 0x7ffca4229558 —▸ 0x7ffca422a4cb ◂— 'secondarg'
04:0020│ 0x7ffca4229560 ◂— 0x0
05:0028│ 0x7ffca4229568 —▸ 0x7ffca422a4d5 ◂— 'BASH_ENV=/etc/shellrc'
[...]
(If you are using gdb without pwndbg then you may need x/64a $rsp to list the first 64 entries from the stack.)
The Stack Pointer rsp is at 0x7ffd4f48bd10 (this value comes from a different run than the gdb listing above; stack addresses vary between runs when address randomization is enabled). Let’s find out the end of the stack with grep -F '[stack]' /proc/$(pidof none)/maps:
7ffd4f46c000-7ffd4f48d000 rw-p 00000000 00:00 0 [stack]
The Kernel has allocated the stack memory from 0x7ffd4f46c000 to 0x7ffd4f48d000: a total of 132 KB. It will grow dynamically up to 8MB (ulimit -s reports the limit in kilobytes). Our program (so far; see rsp) only uses the stack from the rsp address (0x7ffd4f48bd10) down to the very end of the stack (0x7ffd4f48d000): a total of 4,848 bytes (echo $((0x7ffd4f48d000 - 0x7ffd4f48bd10)) == 4848).
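The arithmetic above can be double-checked with a few lines. This is only a sketch; parse_maps_line is a helper name invented here, and the input is the single maps line quoted above.

```python
def parse_maps_line(line):
    """Return (start, end) of the address range in one /proc/<pid>/maps line."""
    addr_range = line.split()[0]
    start, end = addr_range.split("-")
    return int(start, 16), int(end, 16)


MAPS_LINE = "7ffd4f46c000-7ffd4f48d000 rw-p 00000000 00:00 0 [stack]"
RSP = 0x7FFD4F48BD10  # stack pointer at the entry point, from the text above

start, end = parse_maps_line(MAPS_LINE)
mapped_kib = (end - start) // 1024   # size of the mapped stack region
used_bytes = end - RSP               # bytes between rsp and the end of the stack

print(mapped_kib, "KiB mapped")      # 132 KiB mapped
print(used_bytes, "bytes in use")    # 4848 bytes in use
```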
This is the ‘birth’ of the execution: The Kernel, in all its bravery, has passed control to our program. Our program is about to execute its very first instruction, to take its very first step (so to say).
What’s on the stack right now is all the information the program gets from the Kernel to run. It contains the argument list, the environment variables and plenty of other interesting information.
For Zapper we had to manipulate the argument list, move stack values around, adjust the pointers and then pass control back to the program without it falling over. It was prudent to understand a bit better what the Kernel had put on the stack.
Let’s dump the stack:
pwndbg> dump binary memory stack.dat $rsp 0x7ffd4f48d000
and load it into hd <stack.dat or xxd <stack.dat and have a little sneak preview…
03 00 00 00 00 00 00 00 b3 a4 22 a4 fc 7f 00 00 |..........".....|
c2 a4 22 a4 fc 7f 00 00 cb a4 22 a4 fc 7f 00 00 |..".......".....|
00 00 00 00 00 00 00 00 d5 a4 22 a4 fc 7f 00 00 |..........".....|
eb a4 22 a4 fc 7f 00 00 11 a5 22 a4 fc 7f 00 00 |..".......".....|
25 a5 22 a4 fc 7f 00 00 30 a5 22 a4 fc 7f 00 00 |%.".....0.".....|
[...]
b4 5c 18 e0 ed f9 fb 0d 30 78 38 36 5f 36 34 00 |.\.....0x86_64.|
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00 00 00 2f 73 65 63 2f 72 6f 6f 74 2f 6e 6f 6e |.../sec/root/non|
65 00 66 69 72 73 74 61 72 67 00 73 65 63 6f 6e |e.firstarg.secon|
64 61 72 67 00 42 41 53 48 5f 45 4e 56 3d 2f 65 |darg.BASH_ENV=/e|
74 63 2f 73 68 65 6c 6c 72 63 00 43 48 45 41 54 |tc/shellrc.CHEAT|
5f 43 4f 4e 46 49 47 5f 50 41 54 48 3d 2f 65 74 |_CONFIG_PATH=/et|
[...]
55 4d 4e 53 3d 31 31 38 00 2f 73 65 63 2f 72 6f |UMNS=118./sec/ro|
6f 74 2f 6e 6f 6e 65 00 00 00 00 00 00 00 00 00 |ot/none.........|
Lots of pointers. Lots of strings. Lots of unknowns.
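The start of that dump is easy to decode by hand: one 8-byte argc, then the argv pointers, a NULL, then the env pointers. Here is a small sketch of that decoding, assuming 8-byte little-endian words; decode_initial_stack is a name made up for this example and the input is only the first five lines of the hexdump above, so the env list is cut short.

```python
import struct


def decode_initial_stack(data):
    """Split the start of a stack dump into argc, argv pointers, env pointers.

    Stops at the NULL that ends the env pointers, or at the end of `data`
    if the dump is truncated.
    """
    usable = len(data) // 8 * 8
    words = [w for (w,) in struct.iter_unpack("<Q", data[:usable])]
    argc = words[0]
    if 1 + argc >= len(words):
        raise ValueError("dump too short for argc=%d" % argc)
    argv = words[1:1 + argc]
    if words[1 + argc] != 0:
        raise ValueError("argv is not NULL-terminated")
    envp = []
    for w in words[2 + argc:]:
        if w == 0:
            break
        envp.append(w)
    return argc, argv, envp


# First 80 bytes of the hexdump shown above.
DUMP = bytes.fromhex(
    "03 00 00 00 00 00 00 00 b3 a4 22 a4 fc 7f 00 00"
    "c2 a4 22 a4 fc 7f 00 00 cb a4 22 a4 fc 7f 00 00"
    "00 00 00 00 00 00 00 00 d5 a4 22 a4 fc 7f 00 00"
    "eb a4 22 a4 fc 7f 00 00 11 a5 22 a4 fc 7f 00 00"
    "25 a5 22 a4 fc 7f 00 00 30 a5 22 a4 fc 7f 00 00"
)

argc, argv, envp = decode_initial_stack(DUMP)
print(argc)                      # 3
print([hex(p) for p in argv])    # the three argv pointers from the gdb listing
print(hex(envp[0]))              # pointer to the first environment string
```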
Let’s follow the call from execve() to the new program’s entry point.
execve() enters the Kernel via a syscall, which then calls do_execve().
Eventually, this ends up in do_execveat_common(). The bprm structure is created and assigned all sorts of information about the program (see binfmts.h).
Important to us: the program’s filename, environment variables and options (argv) are copied onto the new process’s stack (the filename from Kernel memory; the envp and argv strings from the calling process’s memory). The stack grows UP towards the lower addresses: The first item put on the stack (the bprm->filename) is thus at the largest address on the stack (the bottom) and above it (i.e. at smaller addresses) come the envp and then the argv.
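The addresses from the gdb listing earlier fit this picture. The sketch below is a simplified model and not the kernel’s code: it assumes the strings are packed back to back, each with a terminating NUL, and that the last string is placed first so that argv[0] ends up at the lowest address. place_strings_below is a hypothetical helper name.

```python
def place_strings_below(top, strings):
    """Pack NUL-terminated strings directly below `top`, last string first.

    Returns the start address of every string, in the order of `strings`.
    """
    addrs = []
    p = top
    for s in reversed(strings):
        p -= len(s.encode()) + 1   # +1 for the terminating NUL
        addrs.append(p)
    addrs.reverse()
    return addrs


# The first environment string sits at 0x7ffca422a4d5 in the gdb listing,
# so the argv strings must end right below that address.
ENV_STRINGS_START = 0x7FFCA422A4D5
argv_addrs = place_strings_below(
    ENV_STRINGS_START, ["/sec/root/none", "firstarg", "secondarg"])

print([hex(a) for a in argv_addrs])
# ['0x7ffca422a4b3', '0x7ffca422a4c2', '0x7ffca422a4cb']
```

These are exactly the three argv pointers that gdb showed on the stack.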
We follow on to bprm_execve(), where some checks are done before calling exec_binprm(). From there we go into search_binary_handler(), where the Kernel checks if the binary is ELF, a shebang (#!) or any other type registered via the binfmt-misc module. The kernel then calls the appropriate function to load the binary.
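As an illustration of that check, here is a rough sketch that looks only at the first bytes of a file. It is a simplification: the kernel walks a list of registered binary format handlers and lets each one decide, whereas classify_executable (a name invented here) hard-codes just the two magic values mentioned above.

```python
def classify_executable(head):
    """Guess the binary format from the first bytes of a file."""
    if head.startswith(b"\x7fELF"):
        return "elf"
    if head.startswith(b"#!"):
        return "script"
    return "other"   # e.g. something a binfmt-misc rule might claim


print(classify_executable(b"\x7fELF\x02\x01\x01\x03"))  # elf
print(classify_executable(b"#!/bin/sh\n"))               # script
print(classify_executable(b"MZ\x90\x00"))                # other
```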
In our case, it’s an ELF binary and so load_elf_binary() is called. The kernel creates the memory and thereafter maps the sections from the binary file into memory and also calls begin_new_exec() to set all credentials and permissions for the new process.
The kernel then checks if the ELF binary should be loaded by an interpreter (ld.so) or, in the case of a static binary like ours, loaded directly without an interpreter.
Finally, create_elf_tables() is called. This is where all the stack magic happens that we’re interested in.
First, the function arch_align_stack() adds a random amount of zeros to the stack (i.e. stack randomization) to make (some) buffer overflow exploits work (a little) less reliably. It also aligns the stack to 16 bytes.
The kernel then puts the platform string x86_64 onto the stack and next adds 16 bytes of random data on top (which libc uses as a seed for its PRNG).
The kernel then creates the ELF auxiliary table: a set of (id, value) pairs that describe useful information about the program being run and the environment it’s running in, communicated from the kernel to user space.
The list ends with a zero identifier and a zero value (i.e. 16 bytes of 0x00). There are about 20 entries (320 bytes) in the list.
The table starts with ARCH_DLINFO (which expands to AT_SYSINFO_EHDR + AT_MINSIGSTKSZ). The ‘Identifiers’ are defined in auxvec.h.
In gdb the ELF auxiliary table looks like this:
[... above is argc ...]
[... above is argvp ...]
[... above here is envp ...]
0x7ffca42296a8: 0x21 0x7ffca4351000 <-- AT_SYSINFO_EHDR
0x7ffca42296b8: 0x33 0xd30 <-- AT_MINSIGSTKSZ
0x7ffca42296c8: 0x10 0x178bfbff <-- AT_HWCAP
0x7ffca42296d8: 0x6 0x1000 <-- AT_PAGESZ
0x7ffca42296e8: 0x11 0x64
0x7ffca42296f8: 0x3 0x400040
0x7ffca4229708: 0x4 0x38
0x7ffca4229718: 0x5 0xa
0x7ffca4229728: 0x7 0x0
0x7ffca4229738: 0x8 0x0
0x7ffca4229748: 0x9 0x4014f0 <-- Our entry level
0x7ffca4229758: 0xb 0x0
0x7ffca4229768: 0xc 0x0
0x7ffca4229778: 0xd 0x0
0x7ffca4229788: 0xe 0x0
0x7ffca4229798: 0x17 0x0
0x7ffca42297a8: 0x19 0x7ffca42297f9 <-- Ptr to Random
0x7ffca42297b8: 0x1a 0x2
0x7ffca42297c8: 0x1f 0x7ffca422afe9 <-- Ptr to filename
0x7ffca42297d8: 0xf 0x7ffca4229809 <-- Ptr to x86_64
0x7ffca42297e8: 0x0 0x0 <-- NULL + NULL
[... ELF table stops here ...]
0x7ffca42297f8: 0xe8e8de3a49831f00 0xdfbf9ede0185cb4 <-- RND16
0x7ffca4229808: 0x34365f363878af 0x0 <-- "x86_64" + '\0'
[... below is empty space from stack randomization ...]
[... below are argv strings ...]
[... below are env strings ...]
[... last is the filename (/root/none) ...]
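To put names on the remaining identifiers, the table can be decoded programmatically. The following is a sketch, assuming 8-byte little-endian (id, value) pairs; the identifier numbers are the ones from the kernel’s auxvec.h headers, parse_auxv is a name chosen for this example, and the input is rebuilt from the gdb listing above instead of being read from a live process.

```python
import struct

# Identifier names for the ids that appear in the listing above.
AT_NAMES = {
    0: "AT_NULL", 3: "AT_PHDR", 4: "AT_PHENT", 5: "AT_PHNUM",
    6: "AT_PAGESZ", 7: "AT_BASE", 8: "AT_FLAGS", 9: "AT_ENTRY",
    11: "AT_UID", 12: "AT_EUID", 13: "AT_GID", 14: "AT_EGID",
    15: "AT_PLATFORM", 16: "AT_HWCAP", 17: "AT_CLKTCK", 23: "AT_SECURE",
    25: "AT_RANDOM", 26: "AT_HWCAP2", 31: "AT_EXECFN",
    33: "AT_SYSINFO_EHDR", 51: "AT_MINSIGSTKSZ",
}


def parse_auxv(data):
    """Decode (id, value) pairs until the terminating AT_NULL entry."""
    entries = []
    usable = len(data) // 16 * 16
    for a_type, a_val in struct.iter_unpack("<QQ", data[:usable]):
        if a_type == 0:
            break
        entries.append((AT_NAMES.get(a_type, hex(a_type)), a_val))
    return entries


# The pairs exactly as they appear in the gdb listing above.
LISTING = [
    (0x21, 0x7FFCA4351000), (0x33, 0xD30), (0x10, 0x178BFBFF), (0x6, 0x1000),
    (0x11, 0x64), (0x3, 0x400040), (0x4, 0x38), (0x5, 0xA), (0x7, 0x0),
    (0x8, 0x0), (0x9, 0x4014F0), (0xB, 0x0), (0xC, 0x0), (0xD, 0x0),
    (0xE, 0x0), (0x17, 0x0), (0x19, 0x7FFCA42297F9), (0x1A, 0x2),
    (0x1F, 0x7FFCA422AFE9), (0xF, 0x7FFCA4229809), (0x0, 0x0),
]
raw = b"".join(struct.pack("<QQ", t, v) for t, v in LISTING)

for name, value in parse_auxv(raw):
    print(f"{name:16} {value:#x}")
```

AT_PHDR, AT_PHENT and AT_PHNUM (0x400040, 0x38, 0xa) line up with the readelf output from the beginning: the program headers start 64 bytes into the file, each is 56 bytes and there are 10 of them. On Linux the same binary format can be read from /proc/self/auxv.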
Thereafter the kernel allocates stack memory to store the elf-aux-table, the argv- and env-pointers and the argc value (+1) and aligns the top of the stack to 16 bytes…
…and then puts the argc, argv-pointers and env-pointers onto the stack…
…and then copies the elf_info (from above) onto the stack (aligned; below the env pointers).
(The clever reader may have noticed that ‘RND16’ does not start at an aligned address, 0x7ffca42297f9: It’s because RND16 was put on the stack before the STACK_ROUND() call that makes room for the elf-info table and env/argv pointers.)
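The numbers from the gdb session can be used to check this. The sketch below is a model of the steps as described above, not the kernel’s code: it assumes 8-byte words, reserves two words per auxiliary entry (including the terminating pair), then one word per argv/env pointer plus their NULLs plus argc, and rounds down to 16 bytes. final_stack_pointer is a name made up here; the env count is derived from the addresses in the listings.

```python
WORD = 8  # bytes per stack slot on x86_64


def final_stack_pointer(p, argc, envc, auxv_pairs):
    """Model where rsp ends up, starting from address `p` (here: RND16)."""
    aux_words = 2 * auxv_pairs                 # (id, value) per entry
    items = (argc + 1) + (envc + 1) + 1        # argv+NULL, envp+NULL, argc
    return (p - WORD * (aux_words + items)) & ~0xF   # round down to 16 bytes


RND16 = 0x7FFCA42297F9        # unaligned address of the 16 random bytes
ENVP_START = 0x7FFCA4229568   # first env pointer in the gdb stack listing
AUXV_START = 0x7FFCA42296A8   # first auxiliary entry in the gdb listing

# Env pointer slots sit between ENVP_START and AUXV_START; one is the NULL.
envc = (AUXV_START - ENVP_START) // WORD - 1

sp = final_stack_pointer(RND16, argc=3, envc=envc, auxv_pairs=21)
print(envc)      # 39
print(hex(sp))   # 0x7ffca4229540
```

The result matches the rsp value gdb reported at the entry point, which is also why the unaligned RND16 address does not matter: the rounding happens afterwards.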
Now back in load_elf_binary(), the kernel sets the registers, clears some stuff and finally (!) calls START_THREAD() to start the program.
Afterthought: Someone pointed out the article at https://lwn.net/Articles/631631/ with better ASCII art than mine. The layout of the stack before execution (upside down; starting at the largest address and growing downwards towards the smaller addresses):
How to Zapper
Wheeee. What a ride. For Zapper we ptrace() on the entry-point, increase the stack to make a copy of the argv/env-strings, adjust all the pointers to point to ‘our’ copy of the argv/env-strings and ZERO the original ones to 0x00. The kernel doesn’t know about it and still references the now zero’d argv/env-strings and …wush.. they’re gone from the process list.
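The idea can be played through on a toy model. This is not how Zapper is implemented (there is no ptrace() here); it is a simulation on a plain bytearray, with a made-up base address and a hypothetical helper zap_strings, that only shows the copy, re-point and zero steps.

```python
def zap_strings(mem, base, ptrs, new_area):
    """Copy the NUL-terminated strings at `ptrs` to `new_area`, zero the
    originals in place and return the pointers to the copies.

    `new_area` must not overlap the original strings.
    """
    new_ptrs = []
    dst = new_area - base
    for ptr in ptrs:
        start = ptr - base
        end = mem.index(0, start) + 1          # one past the terminating NUL
        size = end - start
        mem[dst:dst + size] = mem[start:end]   # copy the string incl. NUL
        mem[start:end] = bytes(size)           # overwrite the original with 0x00
        new_ptrs.append(base + dst)
        dst += size
    return new_ptrs


BASE = 0x1000                       # hypothetical address of mem[0]
ORIGINAL = b"none\0firstarg\0secondarg\0"
mem = bytearray(64)
mem[32:56] = ORIGINAL               # the 'kernel-provided' strings
argv = [BASE + 32, BASE + 37, BASE + 46]

argv = zap_strings(mem, BASE, argv, new_area=BASE)
print([hex(p) for p in argv])       # pointers now reference the copy
print(bytes(mem[32:56]))            # the original location is all zeros
```

The program keeps working because it only ever follows the (updated) pointers, while anything that still looks at the original location sees zeros.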
Join us on Telegram: https://t.me/thcorg