An Encoding Diagram Attempt | Kenan Bölükbaşı
I tried to create a diagram of the x86-64 legacy encoding based on what I know so far.
In the last post, we dove into the x86-64 binary we generated earlier, and discovered the encoding.
[ Check out all posts in “low-level” series here. ]
Recently, based on my limited (and possibly faulty) understanding, I drew an encoding diagram to help myself hand-decode x86-64. Today, I decided to share that diagram.
We will also test it with a few instruction variations. Hopefully, it will at least help someone build a minimal mental model.
Addressing Modes
Understanding the complete encoding requires understanding, in detail, all possible “addressing modes” used in the x86 family. However, the nomenclature regarding addressing modes varies.
I’m pretty much a beginner in assembly, so I’d rather avoid adding to the confusion by trying to document addressing modes. I just skimmed this StackOverflow answer and this blog post; both seem to be good, detailed descriptions.
But I’ll mention a few “building block” terms to help read the diagram.
These three terms will show up in the SIB byte:
- Base: A base address stored in a register.
- Index: A numeric offset, possibly scaled, stored in a register.
- Scale: A scaling factor stored in the instruction; the options are 1, 2, 4, and 8.
While at it, I also often encounter these terms, which refer to static displacements encoded in the instruction:
- Absolute: Means there is a static address literally encoded in the instruction.
- Offset: Means there is a static numeric offset literally encoded in the instruction.
If you read up further on addressing modes, note that there are also cases where two addressing methods sound identical, while they are subtly different because of encoding limitations.
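As a rough illustration of how these building blocks fit together, here is a minimal Python sketch that combines a base, a scaled index, and a displacement into one address. The function name and the register values are made up for this example, and the sketch assumes 64-bit addressing where the result wraps around at 64 bits.

```python
def effective_address(base=0, index=0, scale=1, displacement=0):
    """Combine the building blocks: base + index * scale + displacement."""
    # The scale field can only express these four factors.
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    # Assumption: 64-bit addressing, so the sum wraps at 64 bits.
    return (base + index * scale + displacement) & 0xFFFFFFFFFFFFFFFF


# Hypothetical register contents, chosen only for illustration.
rbx = 0x1000  # base
rsi = 3       # index
print(hex(effective_address(base=rbx, index=rsi, scale=8, displacement=0x10)))
# prints 0x1028
```

Leaving out the base, the index, or the displacement (by passing zero) gives the simpler forms that the various addressing mode names describe.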
Encoding Diagram
Below is the diagram. It doesn’t document VEX encoding. This is just the legacy encoding with the REX prefix. The ModR/M and SIB bytes are placed vertically (least-significant bit at the top), to emphasize that the REX byte “extends” the values stored in those bytes. Their actual placement in the instruction format should also be clear.
The source of this diagram is (and future iterations will be) available here in SVG form.
Right click and open the image in a new tab to see the diagram in full size.
Other Resources
Of course, this is inherently complicated. The x86-only and simpler diagrams on the Encoding Real x86 Instructions page are super useful for learning the format without the added complexity of the REX prefix.
As usual, if you need reliable information on this, you should check out Intel’s Software Developer’s Manual and AMD’s AMD64 Architecture Programmer’s Manual volumes.
Testing
Now, let’s start testing with an instruction, and make changes based on the diagram.
We start with mov r8,rcx. If we assemble this, we get 49 89 c8.
The 4 (b0100) at the beginning matches the REX Fixed Bit Pattern.
The second byte is not 0F, so this instruction has a single-byte opcode. Therefore, 89 is the opcode.
According to the reference table, opcode 89 doesn’t store any register values itself. But it states that ModR/M.reg stores a register.
Therefore:
REX : 49
OPCODE: 89
MODR/M: c8
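The same split can be sketched in Python. This is only a minimal sketch under strong assumptions: no legacy prefixes, a single-byte opcode, and nothing after the ModR/M byte (no SIB, displacement, or immediate). The function name is hypothetical.

```python
def split_simple_instruction(code: bytes) -> dict:
    """Split REX / opcode / ModR/M for a one-byte-opcode instruction.

    Assumes no legacy prefixes, no SIB byte, no displacement, no immediate.
    """
    pos = 0
    rex = None
    # A REX prefix has 0100 in its high four bits (40 to 4F).
    if code[pos] >> 4 == 0b0100:
        rex = code[pos]
        pos += 1
    # 0F would start a multi-byte opcode, which this sketch does not handle.
    if code[pos] == 0x0F:
        raise NotImplementedError("multi-byte opcodes are out of scope")
    opcode = code[pos]
    modrm = code[pos + 1]
    return {"rex": rex, "opcode": opcode, "modrm": modrm}


parts = split_simple_instruction(bytes.fromhex("49 89 c8"))
print({name: format(value, "02x") for name, value in parts.items()})
# prints {'rex': '49', 'opcode': '89', 'modrm': 'c8'}
```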
Now that we know the overall instruction format, let’s figure out the details.
REX Prefix
Looking at the REX byte (49) in binary:
0100 1001
We said 4 (0100) is the REX Fixed Bit Pattern. The remaining bits (1001) of the REX byte are:
REX.W: 1
REX.R: 0
REX.X: 0
REX.B: 1
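For readers who prefer code over bit diagrams, the four flags can be pulled out with shifts and masks. This is a small sketch; the function name is hypothetical, and the bit positions follow the W, R, X, B order listed above.

```python
def decode_rex(rex: int) -> dict:
    """Extract the W, R, X, and B flags from a REX prefix byte."""
    # The high four bits must be the fixed pattern 0100.
    if rex >> 4 != 0b0100:
        raise ValueError("not a REX prefix")
    return {
        "W": (rex >> 3) & 1,  # 64-bit operand size
        "R": (rex >> 2) & 1,  # extends ModR/M.reg
        "X": (rex >> 1) & 1,  # extends SIB.index
        "B": rex & 1,         # extends ModR/M.rm, SIB.base, or an opcode register
    }


print(decode_rex(0x49))
# prints {'W': 1, 'R': 0, 'X': 0, 'B': 1}
```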
The REX.W flag being set means the instruction uses 64-bit operands. REX.B is going to extend some register code somewhere, effectively adding 8 to its value.
Since the opcode doesn’t encode a register itself, and since there is no SIB byte, I assume it’s ModR/M.rm being extended. (I’m actually not sure if this reasoning is solid enough to cover all cases.)
It’s easy to visually confirm whether there is an “extended” register code, because it should show up as a register that has a numeric name pattern from R8* to R15* instead of something like RDX.
Remember, our original instruction was 49 89 c8, which is mov r8,rcx.
- We can see the effect of REX.W being set, as 64-bit registers are used.
- If we disassemble the version that has REX.W unset (41 89 c8), we get mov r8d, ecx, which has 32-bit register operands.
- And if we toggle REX.B instead (48 89 c8), we get mov rax, rcx, which replaced r8 with rax. The register code for rax is 0, and the register code for r8 is 8.
- We can also try to toggle REX.R, which is supposed to extend the ModR/M.reg slot. Disassembling 4d 89 c8 will give us mov r8, r9. So this time, rcx became r9.
- Since there is no SIB byte, I assume REX.X doesn’t have an effect for this instruction. (I’m not sure if toggling that is valid or not.)
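The byte variations in the list above can be produced mechanically by flipping one REX bit at a time. The sketch below does that with XOR masks. The helper name and mask constants are hypothetical, and it assumes the REX prefix is the first byte of the instruction.

```python
# Bit masks for the low four bits of the REX prefix.
REX_W, REX_R, REX_X, REX_B = 0b1000, 0b0100, 0b0010, 0b0001


def toggle_rex(code: bytes, mask: int) -> bytes:
    """Return a copy of the instruction with the given REX flag bits flipped."""
    # Assumption: the REX prefix is the first byte.
    if code[0] >> 4 != 0b0100:
        raise ValueError("instruction does not start with a REX prefix")
    return bytes([code[0] ^ mask]) + code[1:]


original = bytes.fromhex("49 89 c8")
for name, mask in [("W", REX_W), ("B", REX_B), ("R", REX_R)]:
    variant = toggle_rex(original, mask)
    print(name, " ".join(f"{byte:02x}" for byte in variant))
# prints:
# W 41 89 c8
# B 48 89 c8
# R 4d 89 c8
```

Each printed variant can then be fed to a disassembler to confirm the results described in the list.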
So the mystery of the REX byte is solved, and there is nothing to decode in the opcode itself.
We can move on to the ModR/M byte.
ModR/M Byte
We’ll split the ModR/M byte (c8), in binary form, into octal groups:
MOD: 11
REG: 001
RM : 000
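The 2-3-3 split is why octal is convenient here: each octal digit of the byte is exactly one field. A small sketch (the function name is hypothetical):

```python
def decode_modrm(modrm: int) -> dict:
    """Split a ModR/M byte into its mod (2 bits), reg (3 bits), and rm (3 bits) fields."""
    return {
        "mod": (modrm >> 6) & 0b11,
        "reg": (modrm >> 3) & 0b111,
        "rm": modrm & 0b111,
    }


print(decode_modrm(0xC8))
# prints {'mod': 3, 'reg': 1, 'rm': 0}
print(oct(0xC8))
# prints 0o310, the same three fields read as octal digits
```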
The opcode already established that ModR/M.reg stores a register, not an opcode extension.
REX.R extends ModR/M.reg, but it’s unset. So the effective register code for ModR/M.reg is 1, which means rcx.
REX.B extends ModR/M.rm, and it’s set. So the effective register code for ModR/M.rm is 8, which means r8.
- If we bump ModR/M.rm by 1 (49 89 c9), we get mov r9, rcx.
- If we instead bump ModR/M.reg by 1 (49 89 d0), we get mov r8, rdx.
So the overall encoding looks like this:
         __________________
        /                  \
0100 1001 10001001 11 001 (B)000
     |  |   MOV    |  %RCX , %R8
     |  |          |
64-bit  extend     register addressing
        ModR/M.rm
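Putting the whole walkthrough together, here is a toy decoder in Python. It is a sketch that only handles the exact shape examined in this post: a REX prefix, opcode 89, and a ModR/M byte with mod set to 11 (register-direct). The function name is hypothetical; the register tables follow the standard x86-64 register numbering.

```python
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
         "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d"]


def disassemble_mov_89(code: bytes) -> str:
    """Decode REX + 89 + ModR/M (mod == 11 only) into Intel-syntax text."""
    if len(code) != 3:
        raise NotImplementedError("only three-byte encodings are handled")
    rex, opcode, modrm = code
    if rex >> 4 != 0b0100 or opcode != 0x89 or modrm >> 6 != 0b11:
        raise NotImplementedError("only REX + 89 + register-direct ModR/M")
    w = (rex >> 3) & 1
    r = (rex >> 2) & 1
    b = rex & 1
    # REX.R and REX.B each supply a fourth, most significant bit.
    reg = (r << 3) | ((modrm >> 3) & 0b111)
    rm = (b << 3) | (modrm & 0b111)
    names = REG64 if w else REG32
    # For opcode 89, ModR/M.rm is the destination and ModR/M.reg is the source.
    return f"mov {names[rm]}, {names[reg]}"


for text in ["49 89 c8", "41 89 c8", "48 89 c8", "4d 89 c8", "49 89 c9", "49 89 d0"]:
    print(text, "->", disassemble_mov_89(bytes.fromhex(text)))
```

Running it on the byte sequences tried in this post should reproduce the disassembly results quoted above, which makes it a quick way to check one’s reading of the diagram.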
The End
That’s it!
I think this is all I want to write about instruction encoding for a while. But I learned a lot in the process.
In the next post, I’ll share a list of resources that I found while writing these posts, and wrap up this topic. But the “low-level” series will continue.
Thanks for reading!