An Encoding Diagram Try | Kenan Bölükbaşı

2024-01-07 03:38:41

I tried to draw a diagram of the x86-64 legacy encoding, based on what I know so far.

In the last post, we dove into the x86-64 binary we generated earlier, and discovered the encoding.

[ Check out all posts in “low-level” series here. ]

Recently, based on my limited (and possibly faulty) understanding, I drew an encoding diagram to help myself hand-decode x86-64. Today, I decided to share that diagram.

We will also test it with a few instruction variations. Hopefully, it will at least help someone build a minimal mental model.

Addressing Modes

Understanding the whole encoding requires understanding all possible “addressing modes” used in the x86 family, in detail. However, the nomenclature regarding addressing modes varies.

I’m pretty much a beginner in assembly, so I’d rather avoid adding further to the confusion by trying to document addressing modes. I just skimmed this StackOverflow answer and this blog post; both seem to be good, detailed descriptions.

But I’ll mention a few “building block” terms to help read the diagram.

These three terms will show up in the SIB byte:

  • Base: A base address stored in a register.
  • Index: A (possibly scaled) numeric offset stored in a register.
  • Scale: A scaling factor, the options being 1, 2, 4, and 8, stored in the instruction.

While at it, I also often encounter these terms that refer to static displacements encoded in the instruction:

  • Absolute: Means there is a static address literally encoded in the instruction.
  • Offset: Means there is a static numeric offset literally encoded in the instruction.
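
To see how these building blocks combine, here is a minimal Python sketch of the usual "base + index * scale + displacement" address computation. The function name and the example register values are hypothetical and chosen only for illustration; the sketch ignores segment bases and the special cases the encoding imposes.

```python
def effective_address(base: int, index: int, scale: int, disp: int = 0) -> int:
    """Compute base + index * scale + displacement, wrapped to 64 bits."""
    if scale not in (1, 2, 4, 8):
        # Only these four scaling factors can be encoded.
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFFFFFFFFFF


# Hypothetical values: base register holds 0x1000, index register holds 3,
# elements are 8 bytes wide, and the instruction carries an offset of 0x10.
print(hex(effective_address(0x1000, 3, 8, 0x10)))
```

Here the base and index would come from registers at run time, while the scale and the displacement are fixed in the instruction bytes.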

If you read up further on addressing modes, note that there are also cases where two addressing methods sound identical, while they are subtly different because of encoding limitations.

Encoding Diagram

Below is the diagram. It doesn’t document VEX encoding. This is just the legacy encoding with the REX prefix. The ModR/M and SIB bytes are drawn vertically (least-significant bit at the top), to emphasize that the REX byte “extends” the values stored in those bytes. Their actual placement in the instruction format should also be clear.

Legacy (non-VEX) x86-64 Instruction Encoding

The source of this diagram is (and future iterations will be) available here in SVG form.

Right-click and open the image in a new tab to see the diagram at full size.

Other Resources

Of course, this is inherently complicated. The x86-only, simpler diagrams on the Encoding Real x86 Instructions page are super useful for learning the format without the added complexity of the REX prefix.

As usual, if you need reliable information on this, you should check out Intel’s Software Developer’s Manual and AMD’s AMD64 Architecture Programmer’s Manual volumes.

Testing

Now, let’s start testing with an instruction, and make changes based on the diagram.

We start with mov r8, rcx. If we assemble this, we get 49 89 c8.

The 4 (0b0100) at the start matches the REX Fixed Bit Pattern.

The second byte is not 0F, so this instruction has a single-byte opcode. Therefore, 89 is the opcode.

According to the reference table, opcode 89 doesn’t store any register values itself. But it states that ModR/M.reg stores a register.

Therefore:

REX   : 49
OPCODE: 89
MODR/M: c8
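
The splitting steps above can be sketched in Python. This is only an illustration under narrow assumptions: 64-bit mode, no legacy prefixes, a single-byte opcode that is followed by a ModR/M byte, and no SIB byte, displacement, or immediate. The helper name split_simple is made up for this example.

```python
def split_simple(code: bytes) -> dict:
    """Split an (optional REX) + one-byte opcode + ModR/M instruction."""
    pos = 0
    rex = None
    if code[pos] >> 4 == 0b0100:   # high nibble 0100 is the REX fixed bit pattern
        rex = code[pos]
        pos += 1
    if code[pos] == 0x0F:          # 0F escapes to the multi-byte opcode maps
        raise NotImplementedError("multi-byte opcodes are out of scope here")
    opcode = code[pos]
    modrm = code[pos + 1]          # assumed present for this opcode
    return {"rex": rex, "opcode": opcode, "modrm": modrm}


parts = split_simple(bytes.fromhex("4989c8"))
for name, value in parts.items():
    print(f"{name.upper():6}: {value:02x}")
```

A real decoder has to consult an opcode table to know whether a ModR/M byte follows at all; the sketch simply assumes it does.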

Now that we know the overall instruction format, let’s figure out the details.

REX Prefix

Looking at the REX byte (49) in binary:

0100 1001

We said 4 (0100) is the REX Fixed Bit Pattern. The remaining bits (1001) of the REX byte are:

REX.W: 1
REX.R: 0
REX.X: 0
REX.B: 1
  • The REX.W flag being set means the instruction uses 64-bit operands.
  • REX.B is going to extend some register code somewhere, effectively adding 8 to its value.
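
As a small sketch, the four flag bits can be pulled out of the REX byte with shifts and masks. The function name decode_rex is an assumption made for this example, not something taken from a library.

```python
def decode_rex(rex: int) -> dict:
    """Return the W, R, X, B flags of a REX prefix byte."""
    if rex >> 4 != 0b0100:
        raise ValueError("high nibble is not the REX fixed bit pattern")
    return {
        "W": (rex >> 3) & 1,  # 64-bit operand size
        "R": (rex >> 2) & 1,  # extends ModR/M.reg
        "X": (rex >> 1) & 1,  # extends SIB.index
        "B": rex & 1,         # extends ModR/M.rm, SIB.base, or an opcode register
    }


for flag, bit in decode_rex(0x49).items():
    print(f"REX.{flag}: {bit}")
```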

Since the opcode doesn’t encode a register itself, and since there is no SIB byte, I assume it’s ModR/M.rm being extended. (I’m actually not sure if this reasoning is solid enough to cover all cases.)

It’s easy to visually check if there is an “extended” register code, because it should show up as a register that has a numeric name pattern from R8* to R15* instead of something like RDX.

Remember, our original instruction was 49 89 c8, which is mov r8, rcx.

  • We can see the effect of REX.W being set, as 64-bit registers are used.
  • If we disassemble the version that has REX.W unset (41 89 c8), we get mov r8d, ecx, which has 32-bit register operands.
  • And if we toggle REX.B instead (48 89 c8), we get mov rax, rcx, which replaced r8 with rax. The register code for rax is 0, and the register code for r8 is 8.
  • We can also try to toggle REX.R, which is supposed to extend the ModR/M.reg slot. Disassembling 4d 89 c8 will give us mov r8, r9. So this time, rcx became r9.
  • Since there is no SIB byte, I assume REX.X doesn’t have an effect for this instruction. (I’m not sure if toggling that is valid or not.)
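
The byte variants in this list can be reproduced by flipping one bit of the REX byte at a time. The sketch below only builds the byte sequences; it does not disassemble them. The mask names and the toggle helper are made up for illustration.

```python
# Bit masks for the four REX flags (low nibble of the REX byte).
REX_W, REX_R, REX_X, REX_B = 0x08, 0x04, 0x02, 0x01


def toggle(code: bytes, mask: int) -> bytes:
    """Flip one REX flag in the first byte, leaving the rest untouched."""
    return bytes([code[0] ^ mask]) + code[1:]


original = bytes.fromhex("4989c8")  # mov r8, rcx
for name, mask in [("W", REX_W), ("B", REX_B), ("R", REX_R)]:
    print(f"toggle REX.{name}: {toggle(original, mask).hex(' ')}")
```

Feeding each resulting byte string to a disassembler is then a quick way to check the effect of a single flag.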

So the mystery of the REX is solved, and there is nothing to decode in the opcode itself.

We can move on to the ModR/M byte.

ModR/M Byte

We’ll split the ModR/M byte (c8), in binary form, into octal groups:

MOD:  11
REG: 001
RM : 000
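
The same 2 + 3 + 3 split can be sketched with shifts and masks. Printing the byte in octal is a handy cross-check, since its three octal digits line up with the three fields. The function name decode_modrm is an assumption for this example.

```python
def decode_modrm(modrm: int) -> tuple:
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    mod = (modrm >> 6) & 0b11   # top two bits
    reg = (modrm >> 3) & 0b111  # middle three bits
    rm = modrm & 0b111          # bottom three bits
    return mod, reg, rm


mod, reg, rm = decode_modrm(0xC8)
print(f"MOD: {mod:02b}")
print(f"REG: {reg:03b}")
print(f"RM : {rm:03b}")
print(oct(0xC8))  # the octal digits mirror mod, reg, rm
```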

The opcode already established that ModR/M.reg stores a register, not an opcode extension.

REX.R extends ModR/M.reg, but it’s unset. So the effective register code for ModR/M.reg is 1, which means rcx.

REX.B extends ModR/M.rm, and it’s set. So the effective register code for ModR/M.rm is 8, which means r8.

  • If we bump ModR/M.rm by 1 (49 89 c9), we get mov r9, rcx.
  • If we instead bump ModR/M.reg by 1 (49 89 d0), we get mov r8, rdx.
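
Putting the REX and ModR/M pieces together, here is a minimal sketch of a decoder for exactly the form used in this post: a REX prefix, opcode 89, and a register-direct ModR/M byte (MOD = 11). It assumes no other prefixes are present, and the function and table names are made up for this example. It is nowhere near a general disassembler.

```python
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
REGS64 += [f"r{n}" for n in range(8, 16)]
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REGS32 += [f"r{n}d" for n in range(8, 16)]


def decode_mov_89(code: bytes) -> str:
    """Decode REX + 89 + ModR/M (MOD = 11 only) into Intel syntax."""
    rex, opcode, modrm = code  # exactly three bytes expected
    if rex >> 4 != 0b0100 or opcode != 0x89 or modrm >> 6 != 0b11:
        raise ValueError("only REX + 89 with MOD = 11 is handled")
    w, r, b = (rex >> 3) & 1, (rex >> 2) & 1, rex & 1
    reg = ((modrm >> 3) & 0b111) | (r << 3)  # REX.R adds 8 to ModR/M.reg
    rm = (modrm & 0b111) | (b << 3)          # REX.B adds 8 to ModR/M.rm
    names = REGS64 if w else REGS32          # REX.W selects 64-bit operands
    # For opcode 89, ModR/M.rm is the destination and ModR/M.reg the source.
    return f"mov {names[rm]}, {names[reg]}"


for text in ["4989c8", "4189c8", "4889c8", "4d89c8", "4989c9", "4989d0"]:
    print(text, "->", decode_mov_89(bytes.fromhex(text)))
```

Running it over the variants tried in this post is a quick way to compare the diagram's rules against a real disassembler's output.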

So the overall encoding looks like this:

          ____________________
         /                    \
0100 1001  10001001  11  001  (B)000
     |  |  MOV        | %RCX ,   %R8
     |  |             |
64-bit  extend        register addressing
        ModR/M.rm

The End

That’s it!

I think this is all I want to write about instruction encoding for a while. But I learned a lot in the process.

In the next post, I’ll share a list of resources that I found while writing these posts, and wrap up this topic. But the “low-level” series will continue.

Thanks for reading!
