totally_safe_transmute, line-by-line

2024-01-08 15:38:45



Programming, philosophy, pedaling.


Mar 16, 2021

   


Tags: curiosity, programming, rust

   

This post is at least a year old.

A few weeks ago, Twitter deigned to share this with me:

The file linked
(written by Benjamin Herr) in the tweet purports to implement a
version of Rust’s std::mem::transmute
without any use of unsafe. If you run it, you’ll find that it does indeed work!

#[test]
fn main() {
    let v: Vec<u8> = b"foo".to_vec();
    let v: String = totally_safe_transmute(v);
    assert_eq!(&v, "foo");
}

Yields:

$ git clone https://github.com/ben0x539/totally-safe-transmute
$ cargo build
$ cargo test
   Compiling totally-safe-transmute v0.0.3 (/tmp/totally-safe-transmute)
    Finished test [unoptimized + debuginfo] target(s) in 0.49s
     Running target/debug/deps/totally_safe_transmute-be2ea6d9a3f8d258

running 1 test
test main ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests totally-safe-transmute

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

This blog post will go through that implementation, line-by-line, and explain
how it works. Nothing about it is especially complicated; I just got a big
kick out of it and figured I’d provide a detailed explanation. Rust newcomers
are the intended audience.

Quick background: transmutation

Most unsafe languages have mechanisms for transmuting (or reinterpreting)
the data at some memory address as an entirely new type. C includes
reinterpretation under its casting syntax; C++
provides the more explicit reinterpret_cast<T>
(with lots of warnings
about when reinterpret_cast is well-defined).

Reinterpretation has lots of use cases:

  • C-style “generic” APIs typically produce results in the form of a void *,
    with the caller being expected to cast the void * to a suitable type.
    Callers are responsible for ensuring that the destination type
    is identical to or compatible with the type that was originally cast to void *.

  • C and C++ callback patterns frequently provide a void * parameter, allowing
    users to supply additional data or context between callbacks. Each callback
    is then responsible for casting to the appropriate type.

  • Pointer values occasionally need to be round-tripped through an
    integral type. C++ specifically allows this, so long as the destination integral
    type has at least sufficient width to represent all possible pointer values.

  • Polymorphism: the Berkeley sockets API
    specifies connect(2)
    as accepting a struct sockaddr *, which is actually reinterpreted internally
    as one of the family-specific sockaddr structures (like sockaddr_in for IPv4
    sockets). C++ also explicitly allows this under its “similarity” rules.

  • Cheap object serialization or conversion: related to the above, but slightly
    different: both C and C++ are okay with you converting pretty much any object
    to char *. This allows objects to be treated as bags of bytes,
    which is useful when writing a hash table (you don’t care what the contents
    are, you just want to uniquely identify them) or when serializing structures
    in a host-specific format.
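
The pointer round-trip from the third bullet has a direct counterpart in Rust, and it needs no unsafe: only dereferencing a raw pointer is unsafe, not creating one or turning it into an integer. A minimal sketch, where the helper name address_of is made up for illustration:

```rust
/// Hypothetical helper: returns the address of `value` as a plain integer,
/// using only safe casts (reference -> raw pointer -> integer).
fn address_of<T>(value: &T) -> usize {
    value as *const T as usize
}

fn main() {
    let x: i32 = 42;

    // reference -> raw pointer -> integer
    let addr = address_of(&x);

    // integer -> raw pointer again; creating the pointer is safe,
    // dereferencing it would require `unsafe`
    let p = addr as *const i32;

    assert_eq!(p, &x as *const i32);
    println!("x lives at {:#x}", addr);
}
```

The same cast chain shows up in totally_safe_transmute itself, where the address of a local variable is turned into a file offset.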

Each of the above is useful, but incredibly unsafe: transmutation is not
an operation at runtime that turns one type into another, but rather a
directive at compile time to treat some position in memory as if its type
is different. The result: most possible transmutations between types
result in undefined behavior.

Transmutation in Rust

Rust needs to interface with C, so Rust supports transmutation. It does so via
std::mem::transmute. But transmutation
is a fundamentally unsafe operation, so Rust forbids the use of mem::transmute except
in explicitly unsafe contexts:

use std::mem;

#[repr(C)]
pub struct Foo {
    pub a: u8,
    pub b: u8,
    pub c: u8,
    pub d: u8
}

#[repr(C)]
pub struct Bar {
    pub a: u32
}

fn main() {
    let foo = Foo { a: 0xaa, b: 0xbb, c: 0xcc, d: 0xdd };
    let bar: Bar = unsafe { mem::transmute(foo) };

    // output (on x86-64): bar.a = 0xddccbbaa
    println!("bar.a = {:x}", bar.a);
}

(View it on Godbolt.)
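
For this particular pair of types there happens to be a safe way to get the same number: the standard library’s u32::from_ne_bytes builds a u32 from four bytes in the host’s native byte order. The sketch below redefines Foo so that it stands alone; the helper name foo_to_u32 is invented here and is not part of the original example.

```rust
pub struct Foo {
    pub a: u8,
    pub b: u8,
    pub c: u8,
    pub d: u8,
}

/// Hypothetical helper: packs the four fields in declaration order,
/// using the host's native byte order.
fn foo_to_u32(foo: &Foo) -> u32 {
    u32::from_ne_bytes([foo.a, foo.b, foo.c, foo.d])
}

fn main() {
    let foo = Foo { a: 0xaa, b: 0xbb, c: 0xcc, d: 0xdd };
    // On a little-endian host such as x86-64 this prints ddccbbaa.
    println!("{:x}", foo_to_u32(&foo));
}
```

This only works because the conversion is spelled out field by field; it is not a general replacement for transmute.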

transmute can, of course, be wrapped into safe contexts. But the underlying operation
will always be fundamentally unsafe, and shouldn’t be possible in otherwise safe Rust code.
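
As an illustration of such a wrapper, here is a sketch that hides a transmute behind a safe signature. The function name f32_bits is made up; the wrapper is sound only because f32 and u32 have the same size and every bit pattern is a valid u32. The standard library already offers this exact conversion as f32::to_bits.

```rust
use std::mem;

/// Hypothetical safe wrapper around an unsafe transmute.
fn f32_bits(x: f32) -> u32 {
    // SAFETY: f32 and u32 are both 4 bytes wide, and any bit pattern
    // is a valid u32.
    unsafe { mem::transmute::<f32, u32>(x) }
}

fn main() {
    // The wrapper agrees with the standard library's safe equivalent.
    assert_eq!(f32_bits(1.0), 1.0f32.to_bits());
    println!("{:#010x}", f32_bits(1.0));
}
```

The caller never writes unsafe, but the unsafe block is still there for anyone auditing the code, which is exactly what totally_safe_transmute avoids.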

So, how does totally_safe_transmute do it?

Breakdown

First, here’s the entirety of totally_safe_transmute:

#![forbid(unsafe_code)]

use std::{io::{self, Write, Seek}, fs};

pub fn totally_safe_transmute<T, U>(v: T) -> U {
    #[repr(C)]
    enum E<T, U> {
        T(T),
        #[allow(dead_code)] U(U),
    }
    let v = E::T(v);

    let mut f = fs::OpenOptions::new()
        .write(true)
        .open("/proc/self/mem").expect("welp");

    f.seek(io::SeekFrom::Start(&v as *const _ as u64)).expect("oof");
    f.write(&[1]).expect("darn");

    if let E::U(v) = v {
        return v;
    }

    panic!("rip");
}

Let’s go through it, (mostly) line-by-line.

#![forbid(unsafe_code)]

forbid is an attribute that controls the rustc linter (along with allow, warn, and deny).
In this case, we’re telling rustc to forbid anything that trips the
unsafe_code lint,
which does exactly what it says on the tin: catches use of unsafe.

In this case, forbidding use of unsafe doesn’t do anything: a quick read of the code shows
that unsafe never shows up. But it’s a top-level proof to the reader that, if rustc accepts
the code (and it does), then there is no use of unsafe.

totally_safe_transmute

Here’s our signature:

pub fn totally_safe_transmute<T, U>(v: T) -> U { ... }

In sum: totally_safe_transmute takes two type parameters: T and U.

It then takes one concrete parameter, v, which is of type T. Finally, it
returns a U.

We know that the job of a transmutation function is to reinterpret a value of some
type as another type, so we can rewrite this signature as:

pub fn totally_safe_transmute<SrcTy, DstTy>(v: SrcTy) -> DstTy { ... }

enum E

Our next bit is a terse enum with some funky attributes. Rewritten with our friendly type
parameters:

#[repr(C)]
enum E<SrcTy, DstTy> {
    T(SrcTy),
    #[allow(dead_code)] U(DstTy),
}
let v = E::T(v);

First, we’re marking E as repr(C). This is an
ABI-modifying attribute: it tells
rustc to lay E out using the platform’s C ABI rather than the (intentionally) unstable Rust ABI.

What does this actually mean? For enums with fields (like this one), Rust
uses a “tagged union” representation.
In effect, E becomes something like this (in C syntax):

struct E {
  int discriminant;
  union {
    SrcTy T;
    DstTy U;
  } data;
};

We’ll see why that’s important in a bit.
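
One way to convince ourselves of that layout is to compare sizes. The sketch below uses concrete field types (u64 and [u8; 3], chosen arbitrarily) in place of the generic parameters, and a hand-written tag-plus-union struct named Manual that is not part of the original code. It then checks that the repr(C) enum and the hand-written struct agree on size and alignment; the actual numbers depend on the platform’s C ABI.

```rust
use std::mem::{align_of, size_of};

// A concrete stand-in for E<SrcTy, DstTy>.
#[allow(dead_code)]
#[repr(C)]
enum E {
    T(u64),
    U([u8; 3]),
}

// The same thing written out by hand: a C-style tag...
#[allow(dead_code)]
#[repr(C)]
enum Tag {
    T,
    U,
}

// ...a union of the two payloads...
#[allow(dead_code)]
#[repr(C)]
union Payload {
    t: u64,
    u: [u8; 3],
}

// ...and a struct that puts the tag first.
#[allow(dead_code)]
#[repr(C)]
struct Manual {
    tag: Tag,
    payload: Payload,
}

/// Returns (size, alignment) of a type in bytes.
fn layout<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

fn main() {
    assert_eq!(layout::<E>(), layout::<Manual>());
    let (size, align) = layout::<E>();
    println!("size = {}, align = {}", size, align);
}
```

Note that defining the union is safe; only reading one of its fields would need unsafe, and this sketch never does.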


Next: E has two variants: the first holds a value of type SrcTy, and the other holds a value
of DstTy.

But wait! Another rustc linter annotation: this time, we’re telling rustc that it’s okay
for the U variant to fail the
dead_code lint.
Normally, rustc would warn us upon statically inferring that U is never used;
allowing dead_code silences that warning. Like the ABI layout, we’ll see why that’s important
shortly.

Finally, we shadow our v parameter with a new binding. v was already of type T, so
creating an E::T from it is no problem at all.

I/O

This is where the (main) magic happens:

let mut f = fs::OpenOptions::new()
    .write(true)
    .open("/proc/self/mem").expect("welp");

f.seek(io::SeekFrom::Start(&v as *const _ as u64)).expect("oof");
f.write(&[1]).expect("darn");

First, we’re opening a file. Specifically, we’re opening /proc/self/mem in write mode.

/proc/self/mem is a very special file: it presents a view of the current process’s
memory, sparsely mapped by virtual address ranges.

As a quick hack, we can prove this to ourselves in Python by inspecting the
in-memory representation of a str object:

>>> x = "this string is long enough to prevent any string interning"
>>> # in cpython, an object's id is (usually) its pointer
>>> x_addr = id(x)
>>> hex(x_addr)
'0x7ff1bc7cfce0'
>>> mem = open("/proc/self/mem", mode="rb")
>>> mem.seek(x_addr)
>>> mem.read(len(x) * 4)
b'[SNIP] "thi\x00\x00\x00\x00\x00\x00\x00\x00this string is long enough to prevent any string interning\x00e'[SNIP]'

(I trimmed the output a bit. You get the point.)
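
The same read can be sketched in safe Rust, which is a little closer to what totally_safe_transmute does. The helper name peek is hypothetical, and this assumes a Linux system where the process is allowed to open its own /proc/self/mem:

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical helper: reads `len` bytes of this process's own memory
/// starting at virtual address `addr`. Linux-only.
fn peek(addr: u64, len: usize) -> io::Result<Vec<u8>> {
    let mut mem = File::open("/proc/self/mem")?;
    // The file offset is interpreted as a virtual address.
    mem.seek(SeekFrom::Start(addr))?;
    let mut buf = vec![0u8; len];
    mem.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let message = b"look ma, no hands".to_vec();

    // Pointer -> integer is a safe cast; we never dereference the pointer.
    let seen = peek(message.as_ptr() as u64, message.len())?;

    assert_eq!(seen, message);
    println!("{}", String::from_utf8_lossy(&seen));
    Ok(())
}
```

Nothing here needs unsafe: as far as rustc is concerned, this is ordinary file I/O.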

We can even poke memory by writing into /proc/self/mem:

>>> # using ctypes to avoid the layout muckery we saw above
>>> import ctypes
>>> cstr = ctypes.c_char_p(b"look ma, no hands")
>>> cstr_addr = ctypes.cast(cstr, ctypes.c_void_p).value
>>> hex(cstr_addr)
'0x7f47f3e9c790'
>>> mem = open("/proc/self/mem", mode="r+b")
>>> mem.seek(cstr_addr)
>>> mem.read(len(cstr.value))
b'look ma, no hands'
>>> mem.seek(cstr_addr + 5)
>>> mem.write(b'p')
>>> mem.seek(cstr_addr)
>>> mem.read(len(cstr.value))
b'look pa, no hands'

The next two pieces of totally_safe_transmute should now make sense: we seek
to the address of our v variable (which is now a variant of E) within our own running process,
and we write a single u8 to it ([1]).

But why 1? Recall our C ABI representation of E above! The first piece of E is our
union discriminator. When data is SrcTy, discriminant is 0. When we forcefully
overwrite it to 1, data is now interpreted as DstTy!
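
To make the role of the discriminant concrete, here is a safe model of the idea that does not touch /proc/self/mem at all. Tagged, View, and interpret are names invented for this sketch, and the payload is fixed at four bytes for simplicity: the payload bytes never change, and only the tag decides how they are read.

```rust
/// A hand-rolled stand-in for E: an explicit tag plus four payload bytes.
struct Tagged {
    tag: u8,
    payload: [u8; 4],
}

#[derive(Debug, PartialEq)]
enum View {
    Bytes([u8; 4]), // tag 0: the "SrcTy" reading
    Word(u32),      // tag 1: the "DstTy" reading
}

/// Interprets the payload according to the tag, like matching on an enum.
fn interpret(t: &Tagged) -> View {
    match t.tag {
        0 => View::Bytes(t.payload),
        1 => View::Word(u32::from_ne_bytes(t.payload)),
        other => panic!("unknown tag {}", other),
    }
}

fn main() {
    let mut t = Tagged { tag: 0, payload: [0xaa, 0xbb, 0xcc, 0xdd] };
    assert_eq!(interpret(&t), View::Bytes([0xaa, 0xbb, 0xcc, 0xdd]));

    // Flip only the tag; the payload bytes are untouched.
    t.tag = 1;
    assert_eq!(
        interpret(&t),
        View::Word(u32::from_ne_bytes([0xaa, 0xbb, 0xcc, 0xdd]))
    );
}
```

The difference in the real hack is that the tag is flipped from outside the language, through the file, so the compiler never sees the assignment.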

The final bit

Okay, so we’ve poked memory and turned our E::T into an E::U. Let’s see how we get it out:

if let E::U(v) = v {
    return v;
}

panic!("rip");

At first glance, there’s nothing special about this: we’re simply discarding the enum wrapper
that we added earlier so that we can return our newly-minted value of DstTy.

But this is actually deceptively clever, and involves fooling the compiler:

  • The compiler knows that totally_safe_transmute must return DstTy.
  • …but the only way to return a DstTy is for v to be an E::U.
  • …but v was unconditionally initialized as an E::T, so that return is never reached.
  • …so, as far as Rust is concerned, this function always unconditionally panic!s.

This is why we needed allow(dead_code) earlier: no E::U is ever constructed in a manner that
could possibly reach the return statement, so there’s simply no need for it as a variant.
And indeed, we can confirm this by removing the allow attribute:

$ cargo build
   Compiling totally-safe-transmute v0.0.3 (/tmp/totally-safe-transmute)
warning: variant is never constructed: `U`
 --> src/lib.rs:9:9
  |
9 |         U(U),
  |         ^^^^
  |
  = note: `#[warn(dead_code)]` on by default

warning: 1 warning emitted

    Finished dev [unoptimized + debuginfo] target(s) in 0.15s

But alas: it’s not really dead code: the compiler is “wrong,” and we pop
an E::U into existence at runtime by modifying the program’s own memory. We then hit
our impossible condition, and return our transmuted value.

Wrapup

totally_safe_transmute is a nice hack that demonstrates a key limitation when reasoning
about a program’s behavior: every behavior model is contingent on an environmental model and
how the program (or the program’s runtime, or the compiler, or whatever else) chooses (or doesn’t
choose) to handle seemingly impossible conditions in said environment.

The ability to do this doesn’t reflect fundamental unsafety in Rust, any more than it does
in any other safe language: from Rust’s perspective, what totally_safe_transmute does is impossible
and therefore undefined; there’s no point in handling something that cannot happen.

Some other interesting bits:

  • As mentioned previously, this hack only works on Linux due to its dependency on /proc/self/mem.
    Other OSes may have similar mechanisms.
  • I haven’t tested this, but I’m pretty sure it only works on little-endian architectures (like x86).
    On big-endian architectures, the write would probably need to be adjusted.
  • If we’re being extremely pedantic: this technically isn’t a transmutation. Semantically,
    transmutation is an operationless change in types at compile time; totally_safe_transmute
    rewrites the in-memory representation of the program to accomplish equivalent behavior at runtime.
    I don’t think this is a distinction that makes a difference.
  • Because totally_safe_transmute relies on undefined behavior (an impossible program state),
    Rust would be correct in erasing the E::U branch altogether and reducing the function to an
    unconditional panic!. It doesn’t do that in my testing (even in release mode), but there’s
    absolutely nothing in the program semantics that prevents it from doing so. But maybe
    in the future it will, and totally_safe_transmute will stop working!
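
The endianness point can be checked with a little arithmetic. The sketch below assumes a 4-byte discriminant (the usual size of a C int, but still an assumption) that starts out as zero, and models writing the single byte 1 at its lowest address; tag_after_write is a made-up name.

```rust
/// Models writing the byte `1` at the lowest address of a zeroed
/// 4-byte discriminant, then reading the discriminant back as an integer.
fn tag_after_write(big_endian: bool) -> u32 {
    let mut bytes = [0u8; 4];
    bytes[0] = 1;
    if big_endian {
        u32::from_be_bytes(bytes)
    } else {
        u32::from_le_bytes(bytes)
    }
}

fn main() {
    // Little-endian: the lowest address holds the least significant byte.
    assert_eq!(tag_after_write(false), 1);
    // Big-endian: the same write lands in the most significant byte.
    assert_eq!(tag_after_write(true), 0x0100_0000);
}
```

On a big-endian machine the discriminant would end up as 0x01000000 rather than 1, which matches neither variant, so the write would have to target the last byte of the discriminant instead.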



Discussions:

Reddit



