Learn x86-64 assembly by writing a GUI from scratch

2023-06-01 11:11:07

Most people assume assembly is only used to write toy programs for learning purposes, or to write a highly optimized version of a specific function inside a codebase written in a high-level language.

Well, what if we wrote a whole program in assembly that opens a GUI window? It will be the hello world of the GUI world, but that still counts. Here is what we are working towards:

Result

I wanted to expand my knowledge of assembly by doing something fun and motivating. It all originated from the observation that so many program binaries today are very big, often over 30 MiB (!), and I asked myself: How small can a binary be for a (very simplistic) GUI? Well, it turns out, very small. Spoiler alert: around 1 KiB!

I am by no means an expert in assembly or in X11. I just hope to provide an entertaining, approachable article, something a beginner can understand. Something I wished I had found when I was learning these topics. If you spot an error, please open a GitHub issue!

What do we need?

I will be using the nasm assembler, which is simple, cross-platform, fast, and has quite a readable syntax.

For the GUI, I will be using X11 since I am based on Linux and it has some interesting properties that make it easy to do without external libraries. If you are running Wayland, it should work with XWayland out of the box, and perhaps also on macOS with XQuartz, but I have not tested those.

Note that the only difference between *nix operating systems in the context of this program is the system call values. Since I am based on Linux I will be using the Linux system call values, but ‘porting’ this program to, say, FreeBSD, would only require changing those values, possibly using the nasm macros:

%ifdef linux
  %define SYSCALL_EXIT 60
%elifdef freebsd
  %define SYSCALL_EXIT 1
%endif

%define and its variants are part of the macro system in nasm, which is powerful, but we will only use it here to define constants, just like in C: #define FOO 3.

No need for additional tooling to cross-compile, issues with dynamic libraries, libc differences, etc. Just compile on Linux by defining the right variable on the command line, send the binary to your friend on FreeBSD, and it just works(tm). That’s refreshing.

So let’s dive in!

X11 fundamentals

X11 is a server accessible over the network that handles windowing and rendering inside those windows. A client opens a socket, connects to the server, and sends commands in a specific format to open a window, draw shapes, text, etc. The server sends messages about errors or events to the client.

Most applications will want to use libX11 or libxcb which offer a C API, but we want to do that ourselves.

Where the server lives is actually not relevant for a client, it might run on the same machine or in a datacenter far far away. Of course, in the context of a desktop computer in 2023, it will be running on the same machine, but that’s a detail.

The official documentation is pretty good, so when in doubt we can refer to it.

Main in x64 assembly

Let’s start slow with a minimal program that simply exits with 0, and build from there.

First, we tell nasm we are writing a 64 bit program and that we target x86_64. Then, we need a main function, which we call _start and which needs to be visible since this is the entry point of our program (hence the global keyword):

; Comments start with a semicolon!
BITS 64 ; 64 bits.
CPU X64 ; Target the x86_64 family of CPUs.

section .text
global _start
_start:
  xor rax, rax ; Set rax to 0. Not actually needed, it's just to avoid having an empty body.

section .text is telling nasm and the linker that what follows is code that should be placed in the text section of the executable.

We will soon have a section .data for our global variables.

Note that these sections usually get mapped by the OS to different pages in memory with different permissions (visible with readelf -l) so that the text section is not writable and the data section is not executable, but that varies from OS to OS.

The _start function has a body that does nothing for now, but not for long. The actual name of the main function is up to us, it’s just that start or _start is usual.

We construct and run our little program like this:

$ nasm -f elf64 -g main.nasm && ld main.o -static -o main

nasm actually only produces an object file, so to get an executable out of it, we need to invoke the linker ld. The flag -g is telling nasm to produce debugging information which is immensely useful when writing raw assembly, since firing up the debugger is often our only recourse in the face of a bug.

To remove the debugging information, we can pass -s to the linker, for example when we are about to ship our program and want to save a few KiB.

We finally have an executable:

$ file ./main
main: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, with debug_info, not stripped

We can see the different sections with readelf -a ./main, and it tells us that the .text section, which contains our code, is only 3 bytes long.

Now, if we try to run our program, it will segfault. That’s because we are expected by the operating system to exit (using the exit system call) ourselves. That’s what libc does for us in C programs, so let’s handle that:

%define SYSCALL_EXIT 60

global _start
_start:
  mov rax, SYSCALL_EXIT
  mov rdi, 0
  syscall

nasm uses the Intel syntax: <instruction> <destination>, <source>, so mov rdi, 0 puts 0 into the register rdi. Other assemblers use the AT&T syntax which swaps the source and destination. My advice: pick one syntax and one assembler and stick to it, both syntaxes are fine and most tools have some support for both.

Following the System V ABI, which is required on Linux and other Unices for system calls, invoking a system call requires us to put the system call code in the register rax, the parameters to the syscall (up to 6) in the registers rdi, rsi, rdx, rcx, r8, r9, and additional parameters, if any, on the stack (which will not happen in this program so we can forget about it).
We then use the instruction syscall and check rax for the return value, 0 usually meaning: no error.

Note that Linux has a ‘fun’ difference, which is that the fourth parameter of a system call is actually passed using the register r10.
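For readers who think in C, here is a rough sketch of the same idea seen from the other side. It is not part of the program we are building: it assumes Linux and uses the libc syscall() wrapper, which loads the system call number and the arguments into the registers described above on our behalf. The helper name raw_write is made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h> /* SYS_write */
#include <unistd.h>      /* syscall() */

/* Hypothetical helper: perform write(2) through the generic syscall() wrapper.
 * The wrapper puts SYS_write in rax and fd, buf, len in rdi, rsi, rdx.
 * Returns the number of bytes written, or -1 on error (errno is set). */
long raw_write(int fd, const void *buf, size_t len) {
  assert(buf != NULL || len == 0);
  return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "hello\n", 6) should behave like the plain write() function from libc.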

Note that the System V ABI is required when making system calls and when interfacing with C, but we are free to use whatever conventions we want in our own assembly code. For a long time, Go was using a different calling convention than the System V ABI, for example, when calling functions (passing arguments on the stack). Most tools (debuggers, profilers) expect the System V ABI though, so I recommend sticking to it.

Back to our program: when we run it, we see…nothing. That’s because everything went well, true to the UNIX philosophy!

We can check the exit code:

Changing mov rdi, 0 to mov rdi, 8 will now result in:

Another way to observe system calls made by a program is with strace, which will also prove very useful when troubleshooting. On some BSDs, its equivalent is truss or dtruss.


$ strace ./main
execve("./main", ["./main"], 0x7ffc60e6bf10 /* 60 vars */) = 0
exit(8)                                 = ?
+++ exited with 8 +++

Let’s change it back to 0 and continue.

A stack primer

Before we can continue, we need to know the basics of how the stack works in assembly since we have no friendly compiler to do that for us.

The three most important things about the stack are:

  • It grows downwards: to reserve more space on the stack, we decrease the value of rsp
  • A function must restore the stack pointer to its original value before the function returns, meaning, either remember the original value and set rsp to this, or, match every decrement by an increment of the same value.
  • Before a function call, the stack pointer needs to be 16 bytes aligned, according to the System V ABI. Also, at the very beginning of a function, the stack pointer value is: 16*N + 8. That’s because before the function call, its value was 16 byte aligned, i.e. 16*N, and the call instruction pushes on the stack the current location (the register rip, which is 8 bytes long), to know where to jump when the called function returns.

Not abiding by these rules will result in nasty crashes, so be warned. That’s because the location of where to jump when the function returns will likely be overwritten and the program will jump to the wrong location. That, or the stack content will be overwritten and the program will operate on wrong values. Bad either way.

A small stack instance

Let’s write a function that prints hello to the standard out, using the stack, to learn the ropes.

We need to reserve (at least) 5 bytes on the stack, since that’s the length in bytes of hello.

The stack looks like this:

And rsp points to the bottom of it.

Here’s how we access each element:

Memory location (example)   Assembly code   Stack element
0x1016
0x1015                      rsp + 5         rbp
0x1014                      rsp + 4         o
0x1013                      rsp + 3         l
0x1012                      rsp + 2         l
0x1011                      rsp + 1         e
0x1010                      rsp + 0         h

We then pass the address on the stack of the beginning of the string to the write syscall, as well as its length:

%define SYSCALL_WRITE 1
%define STDOUT 1

print_hello:
  push rbp ; Save rbp on the stack to be able to restore it at the end of the function.
  mov rbp, rsp ; Set rbp to rsp

  sub rsp, 5 ; Reserve 5 bytes of space on the stack.
  mov BYTE [rsp + 0], 'h' ; Set each byte on the stack to a string character.
  mov BYTE [rsp + 1], 'e'
  mov BYTE [rsp + 2], 'l'
  mov BYTE [rsp + 3], 'l'
  mov BYTE [rsp + 4], 'o'

  ; Make the write syscall
  mov rax, SYSCALL_WRITE
  mov rdi, STDOUT ; Write to stdout.
  lea rsi, [rsp] ; Address on the stack of the string.
  mov rdx, 5 ; Pass the length of the string which is 5.
  syscall

  add rsp, 5 ; Restore the stack to its original value.

  pop rbp ; Restore rbp
  ret

lea destination, source loads the effective address of the source into the destination, which is how C pointers are implemented. To dereference a memory location we use square brackets. So, assuming we just have loaded an address into rdi with lea, e.g. lea rdi, [hello_world], and we want to store the value at the address into rax, we do: mov rax, [rdi]. We usually have to tell nasm how many bytes to dereference with BYTE, WORD, DWORD, QWORD, and the register width has to match that size, so for 4 bytes: mov eax, DWORD [rdi], because nasm does not keep track of the sizes of each variable. That’s also what the C compiler does when we dereference an int8_t, int16_t, int32_t, and int64_t pointer, respectively.

There is a lot to unpack here.

First, what is rbp? That’s a register like any other. But, you can choose to follow the convention of not using this register like the other registers, to store arbitrary values, and instead, use it to store a linked list of call frames. That’s a lot of words.

Basically, at the very beginning of a function, the value of rbp is stored on the stack (that’s push rbp). Since rbp stores an address (the address of the frame that called us), we are storing on the stack the address of the caller frame in a known location.

Immediately after that, we set rbp to rsp, that is, to the stack pointer at the beginning of the function. push rbp and mov rbp, rsp are thus usually known as the function prolog.

For the rest of the function body, we treat rbp as a constant and only decrease rsp if we need to reserve space on the stack.

So if function A calls function B which in turn calls function C, and each function stores on the stack the address of the caller frame, we know where to find on the stack the address of each. Thus, we can print a stack trace in any location of our program simply by inspecting the stack. Pretty nifty. That’s already very useful to profilers and other similar tools.

We must not forget of course, just before we exit the function, to restore rbp to its original value (which is still on the stack at that point): that’s pop rbp. This is also known as the function epilog. Another way to look at it is that we remove the last element of the linked list of call frames, since we are exiting the leaf function.
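To make the linked list idea more concrete, here is a small model in C. It is only a sketch: the struct frame type and the stack_depth function are invented for illustration, and the outermost frame is assumed to store a null saved rbp, which is a modelling choice rather than something the article guarantees.

```c
#include <assert.h>
#include <stddef.h>

/* Model of what the prolog leaves in memory: [rbp] holds the saved rbp of the
 * caller and [rbp + 8] holds the return address pushed by `call`. */
struct frame {
  const struct frame *caller; /* saved rbp of the caller */
  const void *return_address; /* pushed by the call instruction */
};

/* Walk the chain the way a debugger or profiler would, counting the frames
 * until the outermost one (modelled here with a NULL caller). */
size_t stack_depth(const struct frame *rbp) {
  size_t depth = 0;
  for (const struct frame *f = rbp; f != NULL; f = f->caller) {
    /* The stack grows downwards, so a caller's frame sits at a higher address. */
    assert(f->caller == NULL || f->caller > f);
    depth++;
  }
  return depth;
}
```

Printing f->return_address at each step instead of counting would give the raw addresses of a stack trace.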

Don’t worry if you have not fully understood everything, just remember to always have the function prologs and epilogs and you’ll be fine:

my_function:
  push rbp
  mov rbp, rsp

  sub rsp, N

  [...]


  add rsp, N
  pop rbp
  ret

Note: There is an optimization method that uses rbp as a standard register (with a C compiler, that’s the flag -fomit-frame-pointer), which means we lose the information about the call stack. My advice is: never do this, it’s not worth it.

Wait, but didn’t you say the stack needs to be 16 byte aligned (that is, a multiple of 16)? Last time I checked, 5 is not really a multiple of 16!

Good catch! The only reason why this program works is that print_hello is a leaf function, meaning it does not call another function. Remember, the stack needs to be 16 bytes aligned when we do a call!

So the correct way would be:

print_hello:
  push rbp
  mov rbp, rsp

  sub rsp, 16
  mov BYTE [rsp + 0], 'h'
  mov BYTE [rsp + 1], 'e'
  mov BYTE [rsp + 2], 'l'
  mov BYTE [rsp + 3], 'l'
  mov BYTE [rsp + 4], 'o'

  mov rax, SYSCALL_WRITE
  mov rdi, STDOUT
  lea rsi, [rsp]
  mov rdx, 5
  syscall

  call print_world

  add rsp, 16

  pop rbp
  ret

Since when we enter the function, the value of rsp is 16*N+8, and pushing rbp decrements it by 8, the stack pointer is 16 bytes aligned at the point of sub rsp, 16. Decrementing it by 16 (or a multiple of 16) keeps it 16 bytes aligned.
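The arithmetic can be checked with a tiny model in C that treats rsp as a plain number. The function names are invented for this sketch, and the addresses used with it are arbitrary examples.

```c
#include <assert.h>
#include <stdint.h>

/* Effect of `call` on rsp: the 8 byte return address is pushed. */
uint64_t rsp_after_call(uint64_t rsp) {
  assert(rsp % 16 == 0); /* The ABI wants 16 byte alignment before a call. */
  return rsp - 8;
}

/* Effect of the prolog `push rbp` followed by `sub rsp, locals`. */
uint64_t rsp_after_prolog(uint64_t rsp_on_entry, uint64_t locals) {
  uint64_t rsp = rsp_on_entry - 8; /* push rbp */
  return rsp - locals;             /* sub rsp, locals */
}
```

Starting from an aligned value such as 0x1000, the call leaves 0xff8 (that is 16*N + 8), and the prolog with 16 bytes of locals leaves 0xfe0, which is aligned again. With 5 bytes of locals it leaves 0xfeb, which is not.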

We now can safely call another function from within print_hello:

print_world:
  push rbp
  mov rbp, rsp

  sub rsp, 16
  mov BYTE [rsp + 0], ' '
  mov BYTE [rsp + 1], 'w'
  mov BYTE [rsp + 2], 'o'
  mov BYTE [rsp + 3], 'r'
  mov BYTE [rsp + 4], 'l'
  mov BYTE [rsp + 5], 'd'

  mov rax, SYSCALL_WRITE
  mov rdi, STDOUT
  lea rsi, [rsp]
  mov rdx, 6
  syscall

  add rsp, 16

  pop rbp
  ret

print_hello:
  push rbp
  mov rbp, rsp

  sub rsp, 16
  mov BYTE [rsp + 0], 'h'
  mov BYTE [rsp + 1], 'e'
  mov BYTE [rsp + 2], 'l'
  mov BYTE [rsp + 3], 'l'
  mov BYTE [rsp + 4], 'o'

  mov rax, SYSCALL_WRITE
  mov rdi, STDOUT
  lea rsi, [rsp]
  mov rdx, 5
  syscall

  call print_world

  add rsp, 16

  pop rbp
  ret

And we get hello world as an output.

Now, try to do sub rsp, 5 in print_hello, and your program may crash. There is no guarantee, that’s what makes it hard to track down.

My advice is:

  • Always use the standard function prologs and epilogs
  • Always increment/decrement rsp by (a multiple of) 16
  • If you have to decrement rsp by a value that’s unknown at compile time (similar to how alloca() works in C), you can and rsp, -16 to 16 bytes align it.
  • Address items on the stack relative to rsp, i.e. mov BYTE [rsp + 4], 'o'

And you’ll be safe.

The and rsp, -16 trick is interesting, see for yourself:

(gdb) p -100 & -16
$1 = -112
(gdb) p -112 & -16
$2 = -112

Which translates in assembly to:

sub rsp, 100
and rsp, -16
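Here is the same trick as a small C sketch, treating rsp as a number. The function name reserve_aligned is made up for illustration. Since -16 is all ones except for the low 4 bits, the and rounds the stack pointer down to a multiple of 16, so we always end up with at least the requested space.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `sub rsp, size` followed by `and rsp, -16`. */
uint64_t reserve_aligned(uint64_t rsp, uint64_t size) {
  rsp -= size;           /* sub rsp, size */
  rsp &= ~UINT64_C(15);  /* and rsp, -16: clear the low 4 bits */
  assert(rsp % 16 == 0);
  return rsp;
}
```

For example, reserving 100 bytes from 0x1000 moves the stack pointer down by 112 bytes, the same 112 that gdb printed above.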

Finally, following these conventions means that our assembly functions can be safely called from C or other languages following the System V ABI, without any modification, which is great.

I have not talked about the red zone, which is a 128 byte region at the bottom of the stack which our program is free to use as it pleases without having to change the stack pointer. In my opinion, it is not helpful and creates hard to track bugs, so I do not recommend using it. To disable it entirely, run: nasm -f elf64 -g main.nasm && cc main.o -static -o main -mno-red-zone -nostdlib.

Opening a socket

We now are ready to open a socket with the socket(2) syscall, so we add a few constants, taken from the libc headers (note that these values might actually be different on a different Unix, I have not checked. Again, a few %ifdef can easily remedy this discrepancy):

%define AF_UNIX 1
%define SOCK_STREAM 1

%define SYSCALL_SOCKET 41

The AF_UNIX constant means we want a Unix domain socket, and SOCK_STREAM means a reliable, connection-oriented byte stream (TCP-like). We use a domain socket since we know that our server is running on the same machine and it should be faster, but we could change it to AF_INET to connect to a remote IPv4 address for example.

We then fill the relevant registers with those values and invoke the system call:

  mov rax, SYSCALL_SOCKET
  mov rdi, AF_UNIX ; Unix socket.
  mov rsi, SOCK_STREAM ; Tcp-like.
  mov rdx, 0 ; Automatic protocol.
  syscall

The C equivalent would be: socket(AF_UNIX, SOCK_STREAM, 0);. So you see that if we fill the registers in the same order as the C function parameters, we stay close to what C code would do.

The whole program now looks like this:

BITS 64 ; 64 bits.
CPU X64 ; Target the x86_64 family of CPUs.

section .text

%define AF_UNIX 1
%define SOCK_STREAM 1

%define SYSCALL_SOCKET 41
%define SYSCALL_EXIT 60

global _start
_start:
  ; open a unix socket.
  mov rax, SYSCALL_SOCKET
  mov rdi, AF_UNIX ; Unix socket.
  mov rsi, SOCK_STREAM ; Tcp-like.
  mov rdx, 0 ; automatic protocol.
  syscall


  ; The end.
  mov rax, SYSCALL_EXIT
  mov rdi, 0
  syscall

Building and running it under strace shows that it works and we get a socket with the file descriptor 3 (in this case, it might be different for you if you are following at home):

$ nasm -f elf64 -g main.nasm && ld main.o -static -o main
$ strace ./main
execve("./main", ["./main"], 0x7ffe54dfe550 /* 60 vars */) = 0
socket(AF_UNIX, SOCK_STREAM, 0)         = 3
exit(0)                                 = ?
+++ exited with 0 +++

Connecting to the server

Now that we have created a socket, we can connect to the server with the connect(2) system call.

It’s a good time to extract that logic in its own little function, just like in any other high-level language.

x11_connect_to_server:
  ; TODO

In assembly, a function is simply a label we can jump to. But for clarity, both for readers of the code and tools, we can add a hint that this is a real function we can call, like this: call x11_connect_to_server. This will improve the call stack for example when using strace -k. This hint has the form (in nasm): static <name of the function>:function.

Of course, we also need to add our standard function prolog and epilog:

x11_connect_to_server:
static x11_connect_to_server:function
  push rbp
  mov rbp, rsp
  
  pop rbp
  ret

An additional help when reading functions in assembly code is adding comments describing what parameters they accept and what the return value is, if any. Since there is no language level feature for this, we resort to comments:

; Create a UNIX domain socket and connect to the X11 server.
; @returns The socket file descriptor.
x11_connect_to_server:
static x11_connect_to_server:function
  push rbp
  mov rbp, rsp
  
  pop rbp
  ret

First, let’s move the socket creation logic to our function and call it in the program:

; Create a UNIX domain socket and connect to the X11 server.
; @returns The socket file descriptor.
x11_connect_to_server:
static x11_connect_to_server:function
  push rbp
  mov rbp, rsp
  
  ; Open a Unix socket: socket(2).
  mov rax, SYSCALL_SOCKET
  mov rdi, AF_UNIX ; Unix socket.
  mov rsi, SOCK_STREAM ; Tcp-like.
  mov rdx, 0 ; Automatic protocol.
  syscall

  cmp rax, 0
  jle die

  mov rdi, rax ; Store socket fd in `rdi` for the remainder of the function.

  pop rbp
  ret

die:
  mov rax, SYSCALL_EXIT
  mov rdi, 1
  syscall

_start:
global _start:function
  call x11_connect_to_server
  
  ; The end.
  mov rax, SYSCALL_EXIT
  mov rdi, 0
  syscall

The error checking is very simplistic: we only check that the return value of the system call (in rax) is what we expect, otherwise we exit the program with a non-zero code by jumping to the die section.

jle is a conditional jump, which inspects global flags, hopefully set just before with cmp or test, and jumps to a label if the condition is true. Here, we compare the returned value with 0, and if it is lower or equal to 0, we jump to the error label. That’s how we implement conditionals and loops.


Ok, we can finally connect to the server now. The connect(2) system call takes the address of a sockaddr_un structure as the second argument. This structure is too big to fit in a register.

This is the first syscall we encounter that needs to be passed a pointer to a structure, in other words, the address of a region in memory. That region can be on the stack or on the heap, or even be our own executable mapped in memory. That’s assembly, we get to do whatever we want.

Since we want to keep things simple and fast, we will store everything in this program on the stack. And since we have 8 MiB of it (according to limit, on my machine, that is), it’ll be plenty enough. Actually, the most space we will need on the stack in this program will be 32 KiB.

The size of the sockaddr_un structure is 110 bytes, so we reserve 112 to align rsp to 16 bytes.

Nasm does have structs, but they are rather a way to define offsets with a name, than structures like in C with a specific syntax to address a specific field. For the sake of simplicity, I will use the manual way, without nasm structs.

We set the first 2 bytes of this structure to AF_UNIX since this is a domain socket. Then comes the path of the Unix domain socket which X11 expects to be in a certain format. We want to display our window on the first monitor starting at 0, so the string is: /tmp/.X11-unix/X0.

In C, we would do:

  const struct sockaddr_un addr = {.sun_family = AF_UNIX,
                                   .sun_path = "/tmp/.X11-unix/X0"};
  const int res =
      connect(x11_socket_fd, (const struct sockaddr *)&addr, sizeof(addr));

How do we translate that to assembly, especially the string part?

We could set each byte to each character of the string in the structure, on the stack, manually, one by one. Another way to do it is to use the rep movsb idiom, which instructs the CPU to copy a character from a string A to another string B, N times. This is exactly what we need!

The way it works is:

  • We put the string in the .rodata section (same as the data section but read-only)
  • We load its address in rsi (it’s the source)
  • We load the address of the string in the structure on the stack in rdi (it’s the destination)
  • We set rcx to the number of bytes to be copied
  • We use cld to clear the DF flag to ensure the copy is done forwards (since it can also be done backwards)
  • We call rep movsb and voila

It’s basically memcpy from C.
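To make the comparison with memcpy explicit, here is a rough C sketch of the steps above. The function name make_x11_addr is invented for illustration, and zeroing the whole structure first is my own addition, not something the assembly does.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Build the address of the X11 socket for display 0. The memcpy plays the
 * role of `rep movsb`: it copies `rcx` bytes from `[rsi]` to `[rdi]`. */
struct sockaddr_un make_x11_addr(void) {
  static const char path[] = "/tmp/.X11-unix/X0"; /* 17 characters + NUL */
  struct sockaddr_un addr;

  memset(&addr, 0, sizeof(addr)); /* Assumption: start from a clean structure. */
  addr.sun_family = AF_UNIX;

  assert(sizeof(path) <= sizeof(addr.sun_path));
  /* rsi = path, rdi = addr.sun_path, rcx = sizeof(path) */
  memcpy(addr.sun_path, path, sizeof(path));
  return addr;
}
```

The returned structure can then be passed to connect(2) with a cast to struct sockaddr *, as in the C snippet shown earlier.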

This is an interesting case: we can see that some instructions expect some of their operands to be in certain registers and there is no way around it. So, we have to plan ahead and expect those registers to be overwritten. If we need to keep their original values around, we have to store those values elsewhere, for example on the stack (that’s called spilling) or in other registers. This is a broader topic of register allocation which is NP-hard! In small functions, it’s manageable though.

First, the .rodata section:

section .rodata

sun_path: db "/tmp/.X11-unix/X0", 0
static sun_path:data

Then we copy the string:

  mov WORD [rsp], AF_UNIX ; Set sockaddr_un.sun_family to AF_UNIX
  ; Fill sockaddr_un.sun_path with: "/tmp/.X11-unix/X0".
  lea rsi, sun_path
  mov r12, rdi ; Save the socket file descriptor in `rdi` in `r12`.
  lea rdi, [rsp + 2]
  cld ; Move forward
  mov ecx, 18 ; Length is 18 with the null terminator (17 characters + 1).
  rep movsb ; Copy.

ecx is the 32 bit form of the register rcx. This handy table lists all of the forms for all of the registers. On x86-64, writing to a 32 bit form such as ecx also zeroes the upper 32 bits of the full register, but writing to an 8 or 16 bit form (cl, ch, cx) leaves the remaining bits untouched. So be cautious of the pitfall of only setting a value in an 8 or 16 bit part of a register, and then using the whole register later. The rest of the bits that have not been set will contain some old value, which is hard to troubleshoot. The solution is to use movzx to zero extend, meaning setting the rest of the bits to 0. A good way to visualize this is to use info registers within gdb, which displays for each register the value for each of its forms, e.g. for rcx, it will display the value for rcx, ecx, cx, ch, cl.
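As a rough mental model, partial register writes on x86-64 can be mimicked in C with 64 bit integers. The function names below are invented for illustration, they only model what the corresponding instructions do to rcx.

```c
#include <stdint.h>

/* `mov cx, value`: only the low 16 bits change, the upper 48 bits keep
 * whatever old value they had. */
uint64_t write_cx(uint64_t rcx, uint16_t value) {
  return (rcx & ~UINT64_C(0xFFFF)) | value;
}

/* `mov ecx, value`: on x86-64, a 32 bit write clears the upper 32 bits. */
uint64_t write_ecx(uint64_t rcx, uint32_t value) {
  (void)rcx; /* The old content is discarded entirely. */
  return value;
}

/* `movzx rcx, cx`: keep the low 16 bits and zero everything else. */
uint64_t movzx_rcx_cx(uint64_t rcx) {
  return rcx & UINT64_C(0xFFFF);
}
```

With an old value of 0x1122334455667788 in rcx, write_cx with 0x13 gives 0x1122334455660013, which is the stale-bits pitfall, and applying movzx_rcx_cx to that result gives the expected 0x13.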

Then, we do the syscall, check the returned value, exit the program if the value is not 0, and finally return the socket file descriptor, which will be used every time in the rest of the program when talking to the X11 server.

All together, it looks like:

; Create a UNIX domain socket and connect to the X11 server.
; @returns The socket file descriptor.
x11_connect_to_server:
static x11_connect_to_server:function
  push rbp
  mov rbp, rsp 

  ; Open a Unix socket: socket(2).
  mov rax, SYSCALL_SOCKET
  mov rdi, AF_UNIX ; Unix socket.
  mov rsi, SOCK_STREAM ; Tcp-like.
  mov rdx, 0 ; Automatic protocol.
  syscall

  cmp rax, 0
  jle die

  mov rdi, rax ; Store socket fd in `rdi` for the remainder of the function.

  sub rsp, 112 ; Store struct sockaddr_un on the stack.

  mov WORD [rsp], AF_UNIX ; Set sockaddr_un.sun_family to AF_UNIX
  ; Fill sockaddr_un.sun_path with: "/tmp/.X11-unix/X0".
  lea rsi, sun_path
  mov r12, rdi ; Save the socket file descriptor in `rdi` in `r12`.
  lea rdi, [rsp + 2]
  cld ; Move forward
  mov ecx, 18 ; Length is 18 with the null terminator (17 characters + 1).
  rep movsb ; Copy.

  ; Connect to the server: connect(2).
  mov rax, SYSCALL_CONNECT
  mov rdi, r12
  lea rsi, [rsp]
  %define SIZEOF_SOCKADDR_UN 2+108
  mov rdx, SIZEOF_SOCKADDR_UN
  syscall

  cmp rax, 0
  jne die

  mov rax, rdi ; Return the socket fd.

  add rsp, 112
  pop rbp
  ret

We are ready to talk to the X11 server!

Sending data over the socket

There is the send(2) syscall to do this, but we can keep it simple and use the generic write(2) syscall instead. Either way works.

%define SYSCALL_WRITE 1

The C structure for the handshake looks like this:

typedef struct {
  u8 order;
  u8 pad1;
  u16 major, minor;
  u16 auth_proto, auth_data;
  u16 pad2;
} x11_connection_req_t;

pad* fields can be ignored since they are padding and their value is not read by the server.

For our handshake, we need to set the order to be l, that is, little-endian, since X11 can be told to interpret messages as big or little endian. Since x64 is little-endian, we do not want to have an endianness translation layer and so we stick to little-endian.

We also need to set the major field, which is the version, to 11. I will leave it to the reader to guess why.

In C, we would do:

  x11_connection_req_t req = {.order = 'l', .major = 11};

This structure is only 12 bytes long, but since we will have to read the response from the server which is quite big (around 14 KiB during my testing), we will right away reserve a lot of space on the stack, 32 KiB, to be safe:

  sub rsp, 1<<15
  mov BYTE [rsp + 0], 'l' ; Set order to 'l'.
  mov WORD [rsp + 2], 11 ; Set major version to 11.
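For comparison, here is a hedged C sketch that builds the same 12 bytes explicitly. The names x11_build_handshake and X11_HANDSHAKE_SIZE are made up for illustration. Unlike the assembly, which only sets two fields and relies on whatever the stack contains for the others, this sketch zeroes the remaining fields, which is my assumption of the intent.

```c
#include <stdint.h>
#include <string.h>

enum { X11_HANDSHAKE_SIZE = 12 };

/* Serialize the connection request byte by byte so the result does not
 * depend on the host endianness or on struct padding. */
void x11_build_handshake(uint8_t out[X11_HANDSHAKE_SIZE]) {
  memset(out, 0, X11_HANDSHAKE_SIZE); /* pad1, minor, auth_proto, auth_data, pad2 */
  out[0] = 'l'; /* order: little-endian */
  out[2] = 11;  /* major version, low byte */
  out[3] = 0;   /* major version, high byte */
}
```

The resulting buffer is what gets handed to write(2), with a length of X11_HANDSHAKE_SIZE bytes.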

Then we ship it to the server:

  ; Send the handshake to the server: write(2).
  mov rax, SYSCALL_WRITE
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, 12 ; The handshake is 12 bytes long.
  syscall

  cmp rax, 12 ; Check that all bytes were written.
  jnz die

After that, we read the server response, which should be at first 8 bytes:

  ; Read the server response: read(2).
  ; Use the stack for the read buffer.
  ; The X11 server first replies with 8 bytes. Once these are read, it replies with a much bigger message.
  mov rax, SYSCALL_READ
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, 8
  syscall

  cmp rax, 8 ; Check that the server replied with 8 bytes.
  jnz die

  cmp BYTE [rsp], 1 ; Check that the server sent 'success' (first byte is 1).
  jnz die

The first byte in the server response is 0 for failure and 1 for success (and 2 for authentication but we will not need it here).

The server then sends a big message with a lot of general information, which we will need for later, so we store certain fields in global variables located in the data section.

First we add these variables, each 4 bytes big:

section .data

id: dd 0
static id:data

id_base: dd 0
static id_base:data

id_mask: dd 0
static id_mask:data

root_visual_id: dd 0
static root_visual_id:data

Then we read the server response, and skip over the parts we are not interested in. This boils down to incrementing a pointer by a dynamic value, a few times. Note that since we do not do any checks here, that would be a great attack vector to trigger a stack overflow or such in our program.

  ; Read the rest of the server response: read(2).
  ; Use the stack for the read buffer.
  mov rax, SYSCALL_READ
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, 1<<15
  syscall

  cmp rax, 0 ; Check that the server replied with something.
  jle die

  ; Set id_base globally.
  mov edx, DWORD [rsp + 4]
  mov DWORD [id_base], edx

  ; Set id_mask globally.
  mov edx, DWORD [rsp + 8]
  mov DWORD [id_mask], edx

  ; Read the information we need, skip over the rest.
  lea rdi, [rsp] ; Pointer that will skip over some data.
  
  mov cx, WORD [rsp + 16] ; Vendor length (v).
  movzx rcx, cx

  mov al, BYTE [rsp + 21]; Number of formats (n).
  movzx rax, al ; Fill the rest of the register with zeroes to avoid garbage values.
  imul rax, 8 ; sizeof(format) == 8

  add rdi, 32 ; Skip the connection setup
  add rdi, rcx ; Skip over the vendor information (v).
  add rdi, rax ; Skip over the format information (n*8).

  mov eax, DWORD [rdi] ; Store (and return) the window root id.

  ; Set the root_visual_id globally.
  mov edx, DWORD [rdi + 32]
  mov DWORD [root_visual_id], edx
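The pointer skipping is perhaps easier to follow in C. The sketch below parses the same offsets out of a byte buffer. The type x11_setup_info and the function x11_parse_setup are invented for illustration. It differs from the assembly in two ways, both my own additions: it checks the buffer length before reading, and it rounds the vendor length up to a multiple of 4, since the X11 protocol pads that string.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields the assembly stores globally. */
typedef struct {
  uint32_t id_base, id_mask, root_id, root_visual_id;
} x11_setup_info;

static uint16_t read_u16_le(const uint8_t *p) {
  return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32_le(const uint8_t *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) |
         ((uint32_t)p[3] << 24);
}

/* Parse the part of the reply that follows the first 8 bytes.
 * Returns 0 on success, -1 if the buffer is too short. */
int x11_parse_setup(const uint8_t *buf, size_t len, x11_setup_info *out) {
  if (len < 32) return -1; /* Fixed part of the connection setup. */

  size_t vendor_len = read_u16_le(buf + 16);     /* v */
  vendor_len = (vendor_len + 3) & ~(size_t)3;    /* Padded to 4 bytes. */
  size_t formats_len = (size_t)buf[21] * 8;      /* n * sizeof(format) */
  size_t screen = 32 + vendor_len + formats_len; /* Start of the first screen. */

  if (len < screen + 36) return -1; /* Root id at +0, root visual at +32. */

  out->id_base = read_u32_le(buf + 4);
  out->id_mask = read_u32_le(buf + 8);
  out->root_id = read_u32_le(buf + screen);
  out->root_visual_id = read_u32_le(buf + screen + 32);
  return 0;
}
```

The len argument would be the value returned by read(2), so a truncated or malicious reply is rejected instead of being read past its end.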

All collectively:

; Ship the handshake to the X11 server and browse the returned system data.
; @param rdi The socket file descriptor
; @returns The window root id (uint32_t) in rax.
x11_send_handshake:
static x11_send_handshake:perform
  push rbp
  mov rbp, rsp

  sub rsp, 1<<15
  mov BYTE [rsp + 0], 'l' ; Set order to 'l'.
  mov WORD [rsp + 2], 11 ; Set main model to 11.

  ; Ship the handshake to the server: write(2).
  mov rax, SYSCALL_WRITE
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, 12*8
  syscall

  cmp rax, 12*8 ; Examine that every one bytes had been written.
  jnz die

  ; Read the server response: read(2).
  ; Use the stack for the read buffer.
  ; The X11 server first replies with 8 bytes. Once these are read, it replies with a much bigger message.
  mov rax, SYSCALL_READ
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, 8
  syscall

  cmp rax, 8 ; Check that the server replied with 8 bytes.
  jnz die

  cmp BYTE [rsp], 1 ; Check that the server sent 'success' (first byte is 1).
  jnz die

  ; Read the rest of the server response: read(2).
  ; Use the stack for the read buffer.
  mov rax, SYSCALL_READ
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, 1<<15
  syscall

  cmp rax, 0 ; Check that the server replied with something.
  jle die

  ; Set id_base globally.
  mov edx, DWORD [rsp + 4]
  mov DWORD [id_base], edx

  ; Set id_mask globally.
  mov edx, DWORD [rsp + 8]
  mov DWORD [id_mask], edx

  ; Read the information we need, skip over the rest.
  lea rdi, [rsp] ; Pointer that will skip over some data.

  mov cx, WORD [rsp + 16] ; Vendor length (v).
  movzx rcx, cx

  mov al, BYTE [rsp + 21] ; Number of formats (n).
  movzx rax, al ; Fill the rest of the register with zeroes to avoid garbage values.
  imul rax, 8 ; sizeof(format) == 8

  add rdi, 32 ; Skip the connection setup
  add rdi, rcx ; Skip over the vendor information (v).
  add rdi, rax ; Skip over the format information (n*8).

  mov eax, DWORD [rdi] ; Store (and return) the window root id.

  ; Set the root_visual_id globally.
  mov edx, DWORD [rdi + 32]
  mov DWORD [root_visual_id], edx

  add rsp, 1<<15
  pop rbp
  ret

From this point on, I will assume you are familiar with the basics of assembly and X11 and will not go into as much detail.

Generating ids

When creating resources on the server side, we usually first generate an id on the client side, and send that id to the server when creating the resource.

We store the current id in a global variable and increment it each time a new id is generated.

This is how we do it:

; Increment the global id.
; @return The new id.
x11_next_id:
static x11_next_id:function
  push rbp
  mov rbp, rsp

  mov eax, DWORD [id] ; Load global id.

  mov edi, DWORD [id_base] ; Load global id_base.
  mov edx, DWORD [id_mask] ; Load global id_mask.

  ; Return: id_mask & (id) | id_base
  and eax, edx
  or eax, edi

  add DWORD [id], 1 ; Increment id.

  pop rbp
  ret
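
For comparison, the same logic can be sketched in C. This is only an illustration: the struct `x11_id_gen` is a hypothetical stand-in for the three globals `id`, `id_base` and `id_mask`.

```c
#include <stdint.h>

/* Hypothetical stand-in for the globals `id`, `id_base` and `id_mask`. */
typedef struct {
    uint32_t id;
    uint32_t id_base;
    uint32_t id_mask;
} x11_id_gen;

/* Returns (id & id_mask) | id_base, then increments the counter. */
static uint32_t x11_next_id(x11_id_gen *gen)
{
    uint32_t result = (gen->id & gen->id_mask) | gen->id_base;
    gen->id += 1;
    return result;
}
```

The mask keeps the counter inside the range of ids the server allotted to us, and the base tags each id as belonging to our connection.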

Opening a font

To open a font, which is a prerequisite to draw text, we send a message to the server specifying (part of) the name of the font we want, and the server will select a matching font.

To play with another font, you can use xfontsel, which displays all the font names that the X11 server knows about.

First, we generate an id for the font locally, and then we send it along with the font name.

; Open the font on the server side.
; @param rdi The socket file descriptor.
; @param esi The font id.
x11_open_font:
static x11_open_font:function
  push rbp
  mov rbp, rsp

  %define OPEN_FONT_NAME_BYTE_COUNT 5
  %define OPEN_FONT_PADDING ((4 - (OPEN_FONT_NAME_BYTE_COUNT % 4)) % 4)
  %define OPEN_FONT_PACKET_U32_COUNT (3 + (OPEN_FONT_NAME_BYTE_COUNT + OPEN_FONT_PADDING) / 4)
  %define X11_OP_REQ_OPEN_FONT 0x2d

  sub rsp, 6*8
  mov DWORD [rsp + 0*4], X11_OP_REQ_OPEN_FONT | (OPEN_FONT_PACKET_U32_COUNT << 16) ; Opcode, then the request length in u32 units (5 here, same value as the name length).
  mov DWORD [rsp + 1*4], esi
  mov DWORD [rsp + 2*4], OPEN_FONT_NAME_BYTE_COUNT
  mov BYTE [rsp + 3*4 + 0], 'f'
  mov BYTE [rsp + 3*4 + 1], 'i'
  mov BYTE [rsp + 3*4 + 2], 'x'
  mov BYTE [rsp + 3*4 + 3], 'e'
  mov BYTE [rsp + 3*4 + 4], 'd'


  mov rax, SYSCALL_WRITE
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, OPEN_FONT_PACKET_U32_COUNT*4
  syscall

  cmp rax, OPEN_FONT_PACKET_U32_COUNT*4
  jnz die

  add rsp, 6*8

  pop rbp
  ret
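
The macros above hard-code the name "fixed". As a sketch of how the same packet could be built for any font name, here is a C version. It assumes a little-endian host, and `x11_build_open_font` is a name invented for this example, not something from a library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X11_OP_REQ_OPEN_FONT 0x2d

/* Writes an OpenFont request into `out` and returns its size in bytes,
 * or 0 if `cap` is too small. Assumes a little-endian host. */
static size_t x11_build_open_font(uint8_t *out, size_t cap, uint32_t font_id, const char *name)
{
    size_t name_len = strlen(name);
    size_t padding = (4 - (name_len % 4)) % 4;
    size_t u32_count = 3 + (name_len + padding) / 4;
    size_t total = u32_count * 4;

    if (total > cap || u32_count > UINT16_MAX) return 0;

    uint16_t length16 = (uint16_t)u32_count;
    uint16_t name_len16 = (uint16_t)name_len;

    memset(out, 0, total);              /* zero the unused and padding bytes */
    out[0] = X11_OP_REQ_OPEN_FONT;      /* opcode */
    memcpy(out + 2, &length16, 2);      /* request length in u32 units */
    memcpy(out + 4, &font_id, 4);       /* font id generated client-side */
    memcpy(out + 8, &name_len16, 2);    /* name length in bytes */
    memcpy(out + 12, name, name_len);   /* name, followed by the padding */
    return total;
}
```

With the name "fixed" this yields the same 20 bytes as the assembly: three header words, then the five name bytes padded to eight.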

Creating a graphical context

Since an application in X11 can have multiple windows, we first need to create a graphical context containing the general information. When we create a window, we refer to this graphical context by id.

Again, we need to generate an id for the graphical context to be.

X11 stores a hierarchy of windows, so when creating the graphical context, we also need to give it the root window id (i.e. the parent id).

; Create a X11 graphical context.
; @param rdi The socket file descriptor.
; @param esi The graphical context id.
; @param edx The window root id.
; @param ecx The font id.
x11_create_gc:
static x11_create_gc:function
  push rbp
  mov rbp, rsp

  sub rsp, 8*8

%define X11_OP_REQ_CREATE_GC 0x37
%define X11_FLAG_GC_BG 0x00000004
%define X11_FLAG_GC_FG 0x00000008
%define X11_FLAG_GC_FONT 0x00004000
%define X11_FLAG_GC_EXPOSE 0x00010000

%define CREATE_GC_FLAGS X11_FLAG_GC_BG | X11_FLAG_GC_FG | X11_FLAG_GC_FONT
%define CREATE_GC_PACKET_FLAG_COUNT 3
%define CREATE_GC_PACKET_U32_COUNT (4 + CREATE_GC_PACKET_FLAG_COUNT)
%define MY_COLOR_RGB 0x0000ffff

  mov DWORD [rsp + 0*4], X11_OP_REQ_CREATE_GC | (CREATE_GC_PACKET_U32_COUNT<<16)
  mov DWORD [rsp + 1*4], esi
  mov DWORD [rsp + 2*4], edx
  mov DWORD [rsp + 3*4], CREATE_GC_FLAGS
  mov DWORD [rsp + 4*4], MY_COLOR_RGB
  mov DWORD [rsp + 5*4], 0
  mov DWORD [rsp + 6*4], ecx

  mov rax, SYSCALL_WRITE
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, CREATE_GC_PACKET_U32_COUNT*4
  syscall

  cmp rax, CREATE_GC_PACKET_U32_COUNT*4
  jnz die
  
  add rsp, 8*8

  pop rbp
  ret
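
The packet length depends on how many flags are set: one 32-bit value follows the mask for each bit, which is where `4 + CREATE_GC_PACKET_FLAG_COUNT` comes from. Here is a rough C sketch of that relationship, assuming a little-endian host so the words can be written to the socket as they are. The helper names `count_bits` and `x11_build_create_gc` are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X11_OP_REQ_CREATE_GC 0x37

/* Counts the bits set in the value mask: one u32 value follows per bit. */
static unsigned count_bits(uint32_t mask)
{
    unsigned n = 0;
    for (; mask != 0; mask &= mask - 1) n++;
    return n;
}

/* Fills `words` with a CreateGC request and returns the number of u32 written,
 * or 0 if `cap` (in u32 units) is too small. `values` must hold one entry per
 * bit set in `mask`, ordered from the lowest bit to the highest. */
static size_t x11_build_create_gc(uint32_t *words, size_t cap, uint32_t gc_id,
                                  uint32_t drawable_id, uint32_t mask,
                                  const uint32_t *values)
{
    size_t value_count = count_bits(mask);
    size_t u32_count = 4 + value_count;

    if (u32_count > cap) return 0;

    words[0] = X11_OP_REQ_CREATE_GC | ((uint32_t)u32_count << 16);
    words[1] = gc_id;
    words[2] = drawable_id;
    words[3] = mask;
    if (value_count > 0)
        memcpy(words + 4, values, value_count * sizeof *values);
    return u32_count;
}
```

Called with the three flags used above, it produces a 7-word packet, the same as `CREATE_GC_PACKET_U32_COUNT`.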

Creating the window

We can now create the window, which refers to the freshly created graphical context.
We also provide the desired x and y coordinates of the window, as well as the desired dimensions (width and height).

Note that these are simply hints and the resulting window may well have different coordinates and dimensions, for example when using a tiling window manager, or when resizing the window.

; Create the X11 window.
; @param rdi The socket file descriptor.
; @param esi The new window id.
; @param edx The window root id.
; @param ecx The root visual id.
; @param r8d Packed x and y.
; @param r9d Packed w and h.
x11_create_window:
static x11_create_window:function
  push rbp
  mov rbp, rsp

  %define X11_OP_REQ_CREATE_WINDOW 0x01
  %define X11_FLAG_WIN_BG_COLOR 0x00000002
  %define X11_EVENT_FLAG_KEY_RELEASE 0x0002
  %define X11_EVENT_FLAG_EXPOSURE 0x8000
  %define X11_FLAG_WIN_EVENT 0x00000800

  %define CREATE_WINDOW_FLAG_COUNT 2
  %define CREATE_WINDOW_PACKET_U32_COUNT (8 + CREATE_WINDOW_FLAG_COUNT)
  %define CREATE_WINDOW_BORDER 1
  %define CREATE_WINDOW_GROUP 1

  sub rsp, 12*8

  mov DWORD [rsp + 0*4], X11_OP_REQ_CREATE_WINDOW | (CREATE_WINDOW_PACKET_U32_COUNT << 16)
  mov DWORD [rsp + 1*4], esi
  mov DWORD [rsp + 2*4], edx
  mov DWORD [rsp + 3*4], r8d
  mov DWORD [rsp + 4*4], r9d
  mov DWORD [rsp + 5*4], CREATE_WINDOW_GROUP | (CREATE_WINDOW_BORDER << 16)
  mov DWORD [rsp + 6*4], ecx
  mov DWORD [rsp + 7*4], X11_FLAG_WIN_BG_COLOR | X11_FLAG_WIN_EVENT
  mov DWORD [rsp + 8*4], 0
  mov DWORD [rsp + 9*4], X11_EVENT_FLAG_KEY_RELEASE | X11_EVENT_FLAG_EXPOSURE


  mov rax, SYSCALL_WRITE
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, CREATE_WINDOW_PACKET_U32_COUNT*4
  syscall

  cmp rax, CREATE_WINDOW_PACKET_U32_COUNT*4
  jnz die

  add rsp, 12*8

  pop rbp
  ret
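
The "packed" parameters deserve a word: two 16-bit fields are squeezed into one 32-bit register so that a single `mov DWORD` stores both. A small C sketch of the packing follows; `pack_u16_pair` is a hypothetical helper name, and a little-endian layout is assumed.

```c
#include <stdint.h>

/* Packs two 16-bit fields into one u32. On a little-endian machine, `low`
 * ends up first in memory, so it is the field sent first on the wire. */
static uint32_t pack_u16_pair(uint16_t low, uint16_t high)
{
    return (uint32_t)low | ((uint32_t)high << 16);
}
```

For an 800 by 600 window, `pack_u16_pair(800, 600)` gives the value passed in `r9d`, and `pack_u16_pair(200, 200)` the position passed in `r8d`.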

Mapping the window

If you are following along at home and just ran the program, you will have noticed that nothing is displayed.

That is because X11 does not show the window until we have mapped it. This is a simple message to send:

; Map a X11 window.
; @param rdi The socket file descriptor.
; @param esi The window id.
x11_map_window:
static x11_map_window:function
  push rbp
  mov rbp, rsp

  sub rsp, 16

  %define X11_OP_REQ_MAP_WINDOW 0x08
  mov DWORD [rsp + 0*4], X11_OP_REQ_MAP_WINDOW | (2<<16)
  mov DWORD [rsp + 1*4], esi

  mov rax, SYSCALL_WRITE
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, 2*4
  syscall

  cmp rax, 2*4
  jnz die

  add rsp, 16

  pop rbp
  ret

We now have a black window:

Black window

Yay!

Polling for server messages

We would like to draw text in the window now, but we have to wait for the Expose event to be sent to us, which signals that the window is visible, before we can start drawing on it.

We actually want to listen for all server messages, be it errors or events, for example when the user presses a key on the keyboard.

If we do a simple blocking read(2) but the server sends nothing, the program will appear unresponsive. Not good.
The solution is to use the poll(2) system call to be woken up by the operating system whenever there is data to be read on the socket, à la NodeJS or Nginx.

First, we need to mark the socket as ‘non-blocking’, since it is in blocking mode by default:

; Set a file descriptor in non-blocking mode.
; @param rdi The file descriptor.
set_fd_non_blocking:
static set_fd_non_blocking:function
  push rbp
  mov rbp, rsp

  mov rax, SYSCALL_FCNTL
  mov rdi, rdi 
  mov rsi, F_GETFL
  mov rdx, 0
  syscall

  cmp rax, 0
  jl die

  ; `or` the current file status flag with O_NONBLOCK.
  mov rdx, rax
  or rdx, O_NONBLOCK

  mov rax, SYSCALL_FCNTL
  mov rdi, rdi 
  mov rsi, F_SETFL
  mov rdx, rdx
  syscall

  cmp rax, 0
  jl die

  pop rbp
  ret

Then, we write a small function to read data on the socket. For simplicity, we only read 32 bytes of data, because most messages from X11 are of this size. We also return the first byte, which contains the event type.

; Read the X11 server reply.
; @return The message code in al.
x11_read_reply:
static x11_read_reply:function
  push rbp
  mov rbp, rsp

  sub rsp, 32
  
  mov rax, SYSCALL_READ
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, 32
  syscall

  cmp rax, 1
  jle die

  mov al, BYTE [rsp]

  add rsp, 32

  pop rbp
  ret

We can now poll. If an error occurs or the other side has closed their end of the socket, we exit the program.

; Poll indefinitely for messages from the X11 server with poll(2).
; @param rdi The socket file descriptor.
; @param esi The window id.
; @param edx The gc id.
poll_messages:
static poll_messages:function
  push rbp
  mov rbp, rsp

  sub rsp, 32

  %define POLLIN 0x001
  %define POLLPRI 0x002
  %define POLLOUT 0x004
  %define POLLERR  0x008
  %define POLLHUP  0x010
  %define POLLNVAL 0x020

  mov DWORD [rsp + 0*4], edi
  mov DWORD [rsp + 1*4], POLLIN

  mov DWORD [rsp + 16], esi ; window id
  mov DWORD [rsp + 20], edx ; gc id

  .loop:
    mov rax, SYSCALL_POLL
    lea rdi, [rsp]
    mov rsi, 1
    mov rdx, -1
    syscall

    cmp rax, 0
    jle die

    cmp DWORD [rsp + 2*4], POLLERR  
    je die

    cmp DWORD [rsp + 2*4], POLLHUP  
    je die

    mov rdi, [rsp + 0*4]
    call x11_read_reply

    jmp .loop

  add rsp, 32
  pop rbp
  ret
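
To make the `pollfd` layout on the stack less mysterious, here is a C sketch of a single wait. The helper `wait_readable` is invented for this example, and it takes a timeout where the assembly passes -1 to wait forever. It also tests `revents` as a bitmask, since several flags can be set at once.

```c
#include <poll.h>

/* Waits until `fd` is readable. Returns 1 if data can be read, 0 on timeout,
 * -1 on error or if the other side hung up. */
static int wait_readable(int fd, int timeout_ms)
{
    /* struct pollfd is: int fd; short events; short revents. */
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };

    int res = poll(&pfd, 1, timeout_ms);
    if (res < 0) return -1;
    if (res == 0) return 0;

    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL)) return -1;
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```

A loop in the spirit of `poll_messages` would call `wait_readable(fd, -1)` and read one message each time it returns 1.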

Drawing text

At last, we can draw text. The small difficulty here is that the text is of unknown length in the general case, so we have to compute the size of the X11 message, including the padding at the end. So far, we only had messages of fixed size.

The official documentation has formulas to compute these values.

; Draw text in a X11 window with server-side text rendering.
; @param rdi The socket file descriptor.
; @param rsi The text string.
; @param edx The text string length in bytes.
; @param ecx The window id.
; @param r8d The gc id.
; @param r9d Packed x and y.
x11_draw_text:
static x11_draw_text:function
  push rbp
  mov rbp, rsp

  sub rsp, 1024

  mov DWORD [rsp + 1*4], ecx ; Store the window id directly in the packet data on the stack.
  mov DWORD [rsp + 2*4], r8d ; Store the gc id directly in the packet data on the stack.
  mov DWORD [rsp + 3*4], r9d ; Store x, y directly in the packet data on the stack.

  mov r8d, edx ; Store the string length in r8 since edx will be overwritten next.
  mov QWORD [rsp + 1024 - 8], rdi ; Store the socket file descriptor on the stack to free the register.

  ; Compute padding and packet u32 count with division and modulo 4.
  mov eax, edx ; Put dividend in eax.
  mov ecx, 4 ; Put divisor in ecx.
  cdq ; Sign extend.
  idiv ecx ; Compute eax / ecx, and put the remainder (i.e. modulo) in edx.
  ; LLVM optimizer magic: `(4-x)%4 == -x & 3`, for some reason.
  neg edx
  and edx, 3
  mov r9d, edx ; Store padding in r9.

  mov eax, r8d 
  add eax, r9d
  shr eax, 2 ; Compute: eax /= 4
  add eax, 4 ; eax now contains the packet u32 count.


  %define X11_OP_REQ_IMAGE_TEXT8 0x4c
  mov DWORD [rsp + 0*4], r8d
  shl DWORD [rsp + 0*4], 8
  or DWORD [rsp + 0*4], X11_OP_REQ_IMAGE_TEXT8
  mov ecx, eax
  shl ecx, 16
  or [rsp + 0*4], ecx

  ; Copy the text string into the packet data on the stack.
  mov rsi, rsi ; Source string in rsi.
  lea rdi, [rsp + 4*4] ; Destination
  cld ; Move forward
  mov ecx, r8d ; String length.
  rep movsb ; Copy.

  mov rdx, rax ; packet u32 count
  imul rdx, 4
  mov rax, SYSCALL_WRITE
  mov rdi, QWORD [rsp + 1024 - 8] ; fd
  lea rsi, [rsp]
  syscall

  cmp rax, rdx
  jnz die

  add rsp, 1024

  pop rbp
  ret
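
The size computation is easier to check in C. The sketch below uses hypothetical helper names and mirrors the arithmetic above. The `-x & 3` trick works because `& 3` keeps the two low bits, which is the same as taking an unsigned value modulo 4, and negating `x` modulo 4 gives exactly the number of bytes missing to reach the next multiple of 4.

```c
#include <stdint.h>

#define X11_OP_REQ_IMAGE_TEXT8 0x4c

/* Padding bytes needed to round `len` up to a multiple of 4. */
static uint32_t x11_pad(uint32_t len)
{
    return (0u - len) & 3u;
}

/* Size of an ImageText8 request in u32 units: 4 header words plus the padded string. */
static uint32_t x11_image_text8_u32_count(uint32_t len)
{
    return 4 + (len + x11_pad(len)) / 4;
}

/* First word of the request: opcode in byte 0, string length in byte 1,
 * request length (in u32 units) in the upper 16 bits. */
static uint32_t x11_image_text8_header(uint8_t len)
{
    return X11_OP_REQ_IMAGE_TEXT8
         | ((uint32_t)len << 8)
         | (x11_image_text8_u32_count(len) << 16);
}
```

For the 13 bytes of our string, the padding is 3 and the request is 8 words (32 bytes) long. Since the string length lives in a single byte of the header, one request can carry at most 255 bytes of text.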

We then call this function inside the polling loop, and we store the ‘exposed’ state in a boolean on the stack to know whether we should render the text or not:

    %define X11_EVENT_EXPOSURE 0xc
    cmp eax, X11_EVENT_EXPOSURE
    jnz .received_other_event

    .received_exposed_event:
    mov BYTE [rsp + 24], 1 ; Mark as exposed.

    .received_other_event:

    cmp BYTE [rsp + 24], 1 ; exposed?
    jnz .loop

    .draw_text:
      mov rdi, [rsp + 0*4] ; socket fd
      lea rsi, [hello_world] ; string
      mov edx, 13 ; length
      mov ecx, [rsp + 16] ; window id
      mov r8d, [rsp + 20] ; gc id
      mov r9d, 100 ; x
      shl r9d, 16
      or r9d, 100 ; y
      call x11_draw_text

Finally, we see our Hello, world! text displayed inside the window:

Result

The end

Wow, that was a lot. But we did it! We wrote an (albeit simplistic) GUI program in pure assembly, with no dependencies, and that is just 600 lines of code in the end.

How did we fare on the executable size part?

  • With debug information: 10744 bytes (10 KiB)
  • Without debug information (stripped): 8592 bytes (8 KiB)
  • Stripped and OMAGIC (--omagic linker flag, from the man page: Set the text and data sections to be readable and writable. Also, do not page-align the data segment): 1776 bytes (1 KiB)

Not too shabby, a GUI program in 1 KiB.

Where to go from there?

  • We could move text rendering client-side. Doing it server-side has a number of limitations.
  • We could add shape rendering, such as quads and circles
  • We could listen to keyboard and mouse events (the polling loop is easy to extend to do that)

I hope that you had as much fun as I did!

Addendum: the full code

; Build with: nasm -f elf64 -g main.nasm && ld main.o -static -o main 

BITS 64 ; 64 bits.
CPU X64 ; Target the x86_64 family of CPUs.

section .rodata

sun_path: db "/tmp/.X11-unix/X0", 0
static sun_path:data

hello_world: db "Hello, world!"
static hello_world:data

section .data

id: dd 0
static id:data

id_base: dd 0
static id_base:data

id_mask: dd 0
static id_mask:data

root_visual_id: dd 0
static root_visual_id:data


section .text

%define AF_UNIX 1
%define SOCK_STREAM 1

%define SYSCALL_READ 0
%define SYSCALL_WRITE 1
%define SYSCALL_POLL 7
%define SYSCALL_SOCKET 41
%define SYSCALL_CONNECT 42
%define SYSCALL_EXIT 60
%define SYSCALL_FCNTL 72

; Create a UNIX domain socket and connect to the X11 server.
; @returns The socket file descriptor.
x11_connect_to_server:
static x11_connect_to_server:function
  push rbp
  mov rbp, rsp 

  ; Open a Unix socket: socket(2).
  mov rax, SYSCALL_SOCKET
  mov rdi, AF_UNIX ; Unix socket.
  mov rsi, SOCK_STREAM ; Tcp-like.
  mov rdx, 0 ; Automatic protocol.
  syscall

  cmp rax, 0
  jle die

  mov rdi, rax ; Store socket fd in `rdi` for the remainder of the function.

  sub rsp, 112 ; Store struct sockaddr_un on the stack.

  mov WORD [rsp], AF_UNIX ; Set sockaddr_un.sun_family to AF_UNIX
  ; Fill sockaddr_un.sun_path with: "/tmp/.X11-unix/X0".
  lea rsi, sun_path
  mov r12, rdi ; Save the socket file descriptor in `rdi` in `r12`.
  lea rdi, [rsp + 2]
  cld ; Move forward
  mov ecx, 19 ; Length is 19 with the null terminator.
  rep movsb ; Copy.

  ; Connect to the server: connect(2).
  mov rax, SYSCALL_CONNECT
  mov rdi, r12
  lea rsi, [rsp]
  %define SIZEOF_SOCKADDR_UN 2+108
  mov rdx, SIZEOF_SOCKADDR_UN
  syscall

  cmp rax, 0
  jne die

  mov rax, rdi ; Return the socket fd.

  add rsp, 112
  pop rbp
  ret

; Send the handshake to the X11 server and read the returned system information.
; @param rdi The socket file descriptor
; @returns The window root id (uint32_t) in rax.
x11_send_handshake:
static x11_send_handshake:function
  push rbp
  mov rbp, rsp

  sub rsp, 1<<15
  mov BYTE [rsp + 0], 'l' ; Set order to 'l'.
  mov WORD [rsp + 2], 11 ; Set major version to 11.

  ; Send the handshake to the server: write(2).
  mov rax, SYSCALL_WRITE
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, 12*8
  syscall

  cmp rax, 12*8 ; Check that all bytes were written.
  jnz die

  ; Read the server response: read(2).
  ; Use the stack for the read buffer.
  ; The X11 server first replies with 8 bytes. Once these are read, it replies with a much bigger message.
  mov rax, SYSCALL_READ
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, 8
  syscall

  cmp rax, 8 ; Check that the server replied with 8 bytes.
  jnz die

  cmp BYTE [rsp], 1 ; Check that the server sent 'success' (first byte is 1).
  jnz die

  ; Read the rest of the server response: read(2).
  ; Use the stack for the read buffer.
  mov rax, SYSCALL_READ
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, 1<<15
  syscall

  cmp rax, 0 ; Check that the server replied with something.
  jle die

  ; Set id_base globally.
  mov edx, DWORD [rsp + 4]
  mov DWORD [id_base], edx

  ; Set id_mask globally.
  mov edx, DWORD [rsp + 8]
  mov DWORD [id_mask], edx

  ; Read the information we need, skip over the rest.
  lea rdi, [rsp] ; Pointer that will skip over some data.

  mov cx, WORD [rsp + 16] ; Vendor length (v).
  movzx rcx, cx

  mov al, BYTE [rsp + 21] ; Number of formats (n).
  movzx rax, al ; Fill the rest of the register with zeroes to avoid garbage values.
  imul rax, 8 ; sizeof(format) == 8

  add rdi, 32 ; Skip the connection setup
  add rdi, rcx ; Skip over the vendor information (v).
  add rdi, rax ; Skip over the format information (n*8).

  mov eax, DWORD [rdi] ; Store (and return) the window root id.

  ; Set the root_visual_id globally.
  mov edx, DWORD [rdi + 32]
  mov DWORD [root_visual_id], edx

  add rsp, 1<<15
  pop rbp
  ret

; Increment the global id.
; @return The new id.
x11_next_id:
static x11_next_id:function
  push rbp
  mov rbp, rsp

  mov eax, DWORD [id] ; Load global id.

  mov edi, DWORD [id_base] ; Load global id_base.
  mov edx, DWORD [id_mask] ; Load global id_mask.

  ; Return: id_mask & (id) | id_base
  and eax, edx
  or eax, edi

  add DWORD [id], 1 ; Increment id.

  pop rbp
  ret

; Open the font on the server side.
; @param rdi The socket file descriptor.
; @param esi The font id.
x11_open_font:
static x11_open_font:function
  push rbp
  mov rbp, rsp

  %define OPEN_FONT_NAME_BYTE_COUNT 5
  %define OPEN_FONT_PADDING ((4 - (OPEN_FONT_NAME_BYTE_COUNT % 4)) % 4)
  %define OPEN_FONT_PACKET_U32_COUNT (3 + (OPEN_FONT_NAME_BYTE_COUNT + OPEN_FONT_PADDING) / 4)
  %define X11_OP_REQ_OPEN_FONT 0x2d

  sub rsp, 6*8
  mov DWORD [rsp + 0*4], X11_OP_REQ_OPEN_FONT | (OPEN_FONT_PACKET_U32_COUNT << 16) ; Opcode, then the request length in u32 units (5 here, same value as the name length).
  mov DWORD [rsp + 1*4], esi
  mov DWORD [rsp + 2*4], OPEN_FONT_NAME_BYTE_COUNT
  mov BYTE [rsp + 3*4 + 0], 'f'
  mov BYTE [rsp + 3*4 + 1], 'i'
  mov BYTE [rsp + 3*4 + 2], 'x'
  mov BYTE [rsp + 3*4 + 3], 'e'
  mov BYTE [rsp + 3*4 + 4], 'd'


  mov rax, SYSCALL_WRITE
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, OPEN_FONT_PACKET_U32_COUNT*4
  syscall

  cmp rax, OPEN_FONT_PACKET_U32_COUNT*4
  jnz die

  add rsp, 6*8

  pop rbp
  ret

; Create a X11 graphical context.
; @param rdi The socket file descriptor.
; @param esi The graphical context id.
; @param edx The window root id.
; @param ecx The font id.
x11_create_gc:
static x11_create_gc:function
  push rbp
  mov rbp, rsp

  sub rsp, 8*8

%define X11_OP_REQ_CREATE_GC 0x37
%define X11_FLAG_GC_BG 0x00000004
%define X11_FLAG_GC_FG 0x00000008
%define X11_FLAG_GC_FONT 0x00004000
%define X11_FLAG_GC_EXPOSE 0x00010000

%define CREATE_GC_FLAGS X11_FLAG_GC_BG | X11_FLAG_GC_FG | X11_FLAG_GC_FONT
%define CREATE_GC_PACKET_FLAG_COUNT 3
%define CREATE_GC_PACKET_U32_COUNT (4 + CREATE_GC_PACKET_FLAG_COUNT)
%define MY_COLOR_RGB 0x0000ffff

  mov DWORD [rsp + 0*4], X11_OP_REQ_CREATE_GC | (CREATE_GC_PACKET_U32_COUNT<<16)
  mov DWORD [rsp + 1*4], esi
  mov DWORD [rsp + 2*4], edx
  mov DWORD [rsp + 3*4], CREATE_GC_FLAGS
  mov DWORD [rsp + 4*4], MY_COLOR_RGB
  mov DWORD [rsp + 5*4], 0
  mov DWORD [rsp + 6*4], ecx

  mov rax, SYSCALL_WRITE
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, CREATE_GC_PACKET_U32_COUNT*4
  syscall

  cmp rax, CREATE_GC_PACKET_U32_COUNT*4
  jnz die
  
  add rsp, 8*8

  pop rbp
  ret

; Create the X11 window.
; @param rdi The socket file descriptor.
; @param esi The new window id.
; @param edx The window root id.
; @param ecx The root visual id.
; @param r8d Packed x and y.
; @param r9d Packed w and h.
x11_create_window:
static x11_create_window:function
  push rbp
  mov rbp, rsp

  %define X11_OP_REQ_CREATE_WINDOW 0x01
  %define X11_FLAG_WIN_BG_COLOR 0x00000002
  %define X11_EVENT_FLAG_KEY_RELEASE 0x0002
  %define X11_EVENT_FLAG_EXPOSURE 0x8000
  %define X11_FLAG_WIN_EVENT 0x00000800

  %define CREATE_WINDOW_FLAG_COUNT 2
  %define CREATE_WINDOW_PACKET_U32_COUNT (8 + CREATE_WINDOW_FLAG_COUNT)
  %define CREATE_WINDOW_BORDER 1
  %define CREATE_WINDOW_GROUP 1

  sub rsp, 12*8

  mov DWORD [rsp + 0*4], X11_OP_REQ_CREATE_WINDOW | (CREATE_WINDOW_PACKET_U32_COUNT << 16)
  mov DWORD [rsp + 1*4], esi
  mov DWORD [rsp + 2*4], edx
  mov DWORD [rsp + 3*4], r8d
  mov DWORD [rsp + 4*4], r9d
  mov DWORD [rsp + 5*4], CREATE_WINDOW_GROUP | (CREATE_WINDOW_BORDER << 16)
  mov DWORD [rsp + 6*4], ecx
  mov DWORD [rsp + 7*4], X11_FLAG_WIN_BG_COLOR | X11_FLAG_WIN_EVENT
  mov DWORD [rsp + 8*4], 0
  mov DWORD [rsp + 9*4], X11_EVENT_FLAG_KEY_RELEASE | X11_EVENT_FLAG_EXPOSURE


  mov rax, SYSCALL_WRITE
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, CREATE_WINDOW_PACKET_U32_COUNT*4
  syscall

  cmp rax, CREATE_WINDOW_PACKET_U32_COUNT*4
  jnz die

  add rsp, 12*8

  pop rbp
  ret

; Map a X11 window.
; @param rdi The socket file descriptor.
; @param esi The window id.
x11_map_window:
static x11_map_window:function
  push rbp
  mov rbp, rsp

  sub rsp, 16

  %define X11_OP_REQ_MAP_WINDOW 0x08
  mov DWORD [rsp + 0*4], X11_OP_REQ_MAP_WINDOW | (2<<16)
  mov DWORD [rsp + 1*4], esi

  mov rax, SYSCALL_WRITE
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, 2*4
  syscall

  cmp rax, 2*4
  jnz die

  add rsp, 16

  pop rbp
  ret

; Read the X11 server reply.
; @return The message code in al.
x11_read_reply:
static x11_read_reply:function
  push rbp
  mov rbp, rsp

  sub rsp, 32
  
  mov rax, SYSCALL_READ
  mov rdi, rdi
  lea rsi, [rsp]
  mov rdx, 32
  syscall

  cmp rax, 1
  jle die

  mov al, BYTE [rsp]

  add rsp, 32

  pop rbp
  ret

die:
  mov rax, SYSCALL_EXIT
  mov rdi, 1
  syscall


; Set a file descriptor in non-blocking mode.
; @param rdi The file descriptor.
set_fd_non_blocking:
static set_fd_non_blocking:function
  push rbp
  mov rbp, rsp

  %define F_GETFL 3
  %define F_SETFL 4

  %define O_NONBLOCK 2048

  mov rax, SYSCALL_FCNTL
  mov rdi, rdi 
  mov rsi, F_GETFL
  mov rdx, 0
  syscall

  cmp rax, 0
  jl die

  ; `or` the current file status flag with O_NONBLOCK.
  mov rdx, rax
  or rdx, O_NONBLOCK

  mov rax, SYSCALL_FCNTL
  mov rdi, rdi 
  mov rsi, F_SETFL
  mov rdx, rdx
  syscall

  cmp rax, 0
  jl die

  pop rbp
  ret

; Poll indefinitely for messages from the X11 server with poll(2).
; @param rdi The socket file descriptor.
; @param esi The window id.
; @param edx The gc id.
poll_messages:
static poll_messages:function
  push rbp
  mov rbp, rsp

  sub rsp, 32

  %define POLLIN 0x001
  %define POLLPRI 0x002
  %define POLLOUT 0x004
  %define POLLERR  0x008
  %define POLLHUP  0x010
  %define POLLNVAL 0x020

  mov DWORD [rsp + 0*4], edi
  mov DWORD [rsp + 1*4], POLLIN

  mov DWORD [rsp + 16], esi ; window id
  mov DWORD [rsp + 20], edx ; gc id
  mov BYTE [rsp + 24], 0 ; exposed? (boolean)

  .loop:
    mov rax, SYSCALL_POLL
    lea rdi, [rsp]
    mov rsi, 1
    mov rdx, -1
    syscall

    cmp rax, 0
    jle die

    cmp DWORD [rsp + 2*4], POLLERR  
    je die

    cmp DWORD [rsp + 2*4], POLLHUP  
    je die

    mov rdi, [rsp + 0*4]
    call x11_read_reply

    %define X11_EVENT_EXPOSURE 0xc
    cmp eax, X11_EVENT_EXPOSURE
    jnz .received_other_event

    .received_exposed_event:
    mov BYTE [rsp + 24], 1 ; Mark as exposed.

    .received_other_event:

    cmp BYTE [rsp + 24], 1 ; exposed?
    jnz .loop

    .draw_text:
      mov rdi, [rsp + 0*4] ; socket fd
      lea rsi, [hello_world] ; string
      mov edx, 13 ; length
      mov ecx, [rsp + 16] ; window id
      mov r8d, [rsp + 20] ; gc id
      mov r9d, 100 ; x
      shl r9d, 16
      or r9d, 100 ; y
      call x11_draw_text


    jmp .loop


  add rsp, 32
  pop rbp
  ret

; Draw text in a X11 window with server-side text rendering.
; @param rdi The socket file descriptor.
; @param rsi The text string.
; @param edx The text string length in bytes.
; @param ecx The window id.
; @param r8d The gc id.
; @param r9d Packed x and y.
x11_draw_text:
static x11_draw_text:function
  push rbp
  mov rbp, rsp

  sub rsp, 1024

  mov DWORD [rsp + 1*4], ecx ; Store the window id directly in the packet data on the stack.
  mov DWORD [rsp + 2*4], r8d ; Store the gc id directly in the packet data on the stack.
  mov DWORD [rsp + 3*4], r9d ; Store x, y directly in the packet data on the stack.

  mov r8d, edx ; Store the string length in r8 since edx will be overwritten next.
  mov QWORD [rsp + 1024 - 8], rdi ; Store the socket file descriptor on the stack to free the register.

  ; Compute padding and packet u32 count with division and modulo 4.
  mov eax, edx ; Put dividend in eax.
  mov ecx, 4 ; Put divisor in ecx.
  cdq ; Sign extend.
  idiv ecx ; Compute eax / ecx, and put the remainder (i.e. modulo) in edx.
  ; LLVM optimizer magic: `(4-x)%4 == -x & 3`, for some reason.
  neg edx
  and edx, 3
  mov r9d, edx ; Store padding in r9.

  mov eax, r8d 
  add eax, r9d
  shr eax, 2 ; Compute: eax /= 4
  add eax, 4 ; eax now contains the packet u32 count.


  %define X11_OP_REQ_IMAGE_TEXT8 0x4c
  mov DWORD [rsp + 0*4], r8d
  shl DWORD [rsp + 0*4], 8
  or DWORD [rsp + 0*4], X11_OP_REQ_IMAGE_TEXT8
  mov ecx, eax
  shl ecx, 16
  or [rsp + 0*4], ecx

  ; Copy the text string into the packet data on the stack.
  mov rsi, rsi ; Source string in rsi.
  lea rdi, [rsp + 4*4] ; Destination
  cld ; Move forward
  mov ecx, r8d ; String length.
  rep movsb ; Copy.

  mov rdx, rax ; packet u32 count
  imul rdx, 4
  mov rax, SYSCALL_WRITE
  mov rdi, QWORD [rsp + 1024 - 8] ; fd
  lea rsi, [rsp]
  syscall

  cmp rax, rdx
  jnz die

  add rsp, 1024

  pop rbp
  ret

_start:
global _start:function
  call x11_connect_to_server
  mov r15, rax ; Store the socket file descriptor in r15.

  mov rdi, rax
  call x11_send_handshake

  mov r12d, eax ; Store the window root id in r12.

  call x11_next_id
  mov r13d, eax ; Store the gc_id in r13.

  call x11_next_id
  mov r14d, eax ; Store the font_id in r14.

  mov rdi, r15
  mov esi, r14d
  call x11_open_font


  mov rdi, r15
  mov esi, r13d
  mov edx, r12d
  mov ecx, r14d
  call x11_create_gc

  call x11_next_id

  mov ebx, eax ; Store the window id in ebx.

  mov rdi, r15 ; socket fd
  mov esi, eax
  mov edx, r12d
  mov ecx, [root_visual_id]
  mov r8d, 200 | (200 << 16) ; x and y are 200
  %define WINDOW_W 800
  %define WINDOW_H 600
  mov r9d, WINDOW_W | (WINDOW_H << 16)
  call x11_create_window

  mov rdi, r15 ; socket fd
  mov esi, ebx
  call x11_map_window

  mov rdi, r15 ; socket fd
  call set_fd_non_blocking

  mov rdi, r15 ; socket fd
  mov esi, ebx ; window id
  mov edx, r13d ; gc id
  call poll_messages

  ; The end.
  mov rax, SYSCALL_EXIT
  mov rdi, 0
  syscall
  
