totally_safe_transmute, line-by-line
Programming, philosophy, pedaling.
Mar 16, 2021
This post is at least a year old.
A few weeks ago, Twitter deigned to share this with me:
The file linked (written by Benjamin Herr) in the tweet purports to implement a
version of Rust’s std::mem::transmute without any use of unsafe.
If you run it, you’ll find that it does indeed work!
#[test]
fn main() {
    let v: Vec<u8> = b"foo".to_vec();
    let v: String = totally_safe_transmute(v);
    assert_eq!(&v, "foo");
}
Yields:
$ git clone https://github.com/ben0x539/totally-safe-transmute
$ cargo build
$ cargo test
   Compiling totally-safe-transmute v0.0.3 (/tmp/totally-safe-transmute)
    Finished test [unoptimized + debuginfo] target(s) in 0.49s
     Running target/debug/deps/totally_safe_transmute-be2ea6d9a3f8d258
running 1 test
test main ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
   Doc-tests totally-safe-transmute
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
This blog post will go through that implementation, line-by-line, and explain
how it works. Nothing about it is especially complicated; I just got a huge
kick out of it and figured I’d provide a detailed explanation. Rust newcomers
are the intended audience.
Quick background: transmutation
Most unsafe languages have mechanisms for transmuting (or reinterpreting)
the data at some memory address as an entirely new type. C includes
reinterpretation under its casting syntax; C++ provides the more explicit
reinterpret_cast<T> (with lots of warnings about when reinterpret_cast
is well-defined).
Reinterpretation has lots of use cases:
- C-style “generic” APIs typically produce results in the form of a void *,
  with the caller being expected to cast the void * to a suitable type.
  Callers are responsible for ensuring that the destination type
  is identical to or compatible with the type that was originally cast to void *.
- C and C++ callback patterns frequently provide a void * parameter, allowing
  users to supply additional data or context between callbacks. Each callback
  is then responsible for casting to the appropriate type.
- Pointer values occasionally need to be round-tripped through an
  integral type. C++ specifically allows this, so long as the destination integral
  type has at least sufficient width to represent all possible pointer values.
- Polymorphism: the Berkeley sockets API specifies connect(2) as accepting a
  struct sockaddr *, which is actually reinterpreted internally
  as one of the family-specific sockaddr structures (like sockaddr_in for IPv4
  sockets). C++ also explicitly allows this under its “similarity” rules.
- Cheap object serialization or conversion: related to the above, but slightly
  different: both C and C++ are okay with you converting pretty much any object
  to char *. This allows objects to be treated as bags of bytes,
  which is useful when writing a hash table (you don’t care what the contents
  are, you just want to uniquely identify them) or when serializing structures
  in a host-specific format.
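The pointer round-trip case carries over to Rust and matters later in this post, because casting a pointer to an integer needs no unsafe there. A minimal sketch (round_trip is a hypothetical helper name, not part of the original file):

```rust
/// Hypothetical helper: round-trips a pointer through an integer in safe Rust.
/// Only *dereferencing* a raw pointer needs `unsafe`; casting does not.
pub fn round_trip(value: &i32) -> bool {
    let ptr = value as *const i32; // reference -> raw pointer
    let addr = ptr as usize;       // raw pointer -> integer (the address)
    let back = addr as *const i32; // integer -> raw pointer
    back == ptr                    // comparing raw pointers is also safe
}
```

The same kind of cast, just with u64 as the integer type, is how totally_safe_transmute obtains an address to seek to.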
Each of the above is useful, but extremely unsafe: transmutation is not
an operation at runtime that turns one type into another, but rather a
directive at compile time to treat some position in memory as if its type
is different. The result: most possible transmutations between types
result in undefined behavior.
Transmutation in Rust
Rust needs to interface with C, so Rust supports transmutation. It does so via
std::mem::transmute. But transmutation is a fundamentally unsafe operation,
so Rust forbids the use of mem::transmute except in explicitly unsafe contexts:
use std::mem;

#[repr(C)]
pub struct Foo {
    pub a: u8,
    pub b: u8,
    pub c: u8,
    pub d: u8
}

#[repr(C)]
pub struct Bar {
    pub a: u32
}

fn main() {
    let foo = Foo { a: 0xaa, b: 0xbb, c: 0xcc, d: 0xdd };
    let bar: Bar = unsafe { mem::transmute(foo) };
    // output (on x86-64): bar.a = 0xddccbbaa
    println!("bar.a = {:x}", bar.a);
}
(View it on Godbolt.)
transmute can, of course, be wrapped into safe contexts. But the underlying operation
will always be fundamentally unsafe, and shouldn’t be possible in otherwise safe Rust code.
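To make “wrapped into safe contexts” concrete, here is a minimal sketch of a safe function with a transmute inside it. The function names are hypothetical; the wrapper is only sound because every 4-byte pattern is a valid u32, and the standard library already offers u32::from_ne_bytes for exactly this conversion.

```rust
use std::mem;

/// Hypothetical safe wrapper around a transmute.
pub fn bytes_to_u32(bytes: [u8; 4]) -> u32 {
    // SAFETY: [u8; 4] and u32 have the same size, and u32 has no invalid
    // bit patterns, so any input produces a valid value.
    unsafe { mem::transmute::<[u8; 4], u32>(bytes) }
}

/// The same conversion using only safe standard-library code.
pub fn bytes_to_u32_std(bytes: [u8; 4]) -> u32 {
    u32::from_ne_bytes(bytes)
}
```

The caller never writes unsafe, but the unsafe block is still there in the source, which is precisely what totally_safe_transmute claims to avoid.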
So, how does totally_safe_transmute do it?
Breakdown
First, here’s the entirety of totally_safe_transmute:
#![forbid(unsafe_code)]

use std::{io::{self, Write, Seek}, fs};

pub fn totally_safe_transmute<T, U>(v: T) -> U {
    #[repr(C)]
    enum E<T, U> {
        T(T),
        #[allow(dead_code)] U(U),
    }
    let v = E::T(v);

    let mut f = fs::OpenOptions::new()
        .write(true)
        .open("/proc/self/mem").expect("welp");

    f.seek(io::SeekFrom::Start(&v as *const _ as u64)).expect("oof");
    f.write(&[1]).expect("darn");

    if let E::U(v) = v {
        return v;
    }

    panic!("rip");
}
Let’s go through it, (mostly) line-by-line.
#![forbid(unsafe_code)]
forbid is an attribute that controls the rustc linter (along with allow, warn, and deny).
In this case, we’re telling rustc to forbid anything that trips the unsafe_code lint,
which does exactly what it says on the tin: catches use of unsafe.
In this case, forbidding use of unsafe doesn’t do anything: a quick read of the code reveals
that unsafe never shows up. But it’s a top-level proof to the reader that, if rustc accepts
the code (and it does), then there is no use of unsafe.
totally_safe_transmute
Here’s our signature:
pub fn totally_safe_transmute<T, U>(v: T) -> U { ... }
In sum: totally_safe_transmute takes two type parameters: T and U.
It then takes one concrete parameter, v, which is of type T. Finally, it
returns a U.
We know that the job of a transmutation function is to reinterpret a value of some
type as a value of another type, so we can rewrite this signature as:
pub fn totally_safe_transmute<SrcTy, DstTy>(v: SrcTy) -> DstTy { ... }
enum E
Our next bit is a terse enum with some funky attributes. Rewritten with our friendly type
parameters:
#[repr(C)]
enum E<SrcTy, DstTy> {
    T(SrcTy),
    #[allow(dead_code)] U(DstTy),
}
let v = E::T(v);
First, we’re marking E as repr(C). This is an ABI-modifying attribute: it tells
rustc to lay E out using the platform’s C ABI rather than the (intentionally) unstable Rust ABI.
What does this actually mean? For enums with fields (like this one), Rust
uses a “tagged union” representation.
In effect, E becomes something like this (in C syntax):
struct E {
    int discriminant;
    union {
        SrcTy T;
        DstTy U;
    } data;
};
We’ll see why that’s important in a bit.
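We can sanity-check the tagged-union picture from Rust itself. The sketch below spells out the tag, the union, and the wrapping struct by hand for one concrete pair of types (u32 and f32, chosen arbitrarily) and compares size and alignment against the enum. Tag, Data, Repr and layouts_match are hypothetical names for illustration, and matching size and alignment is only evidence of the layout, not a full proof of it.

```rust
use std::mem::{align_of, size_of};

// The enum from the post, with its original type parameter names.
#[allow(dead_code)]
#[repr(C)]
enum E<T, U> {
    T(T),
    U(U),
}

// Hand-written equivalent of the C struct shown above.
#[allow(dead_code)]
#[repr(C)]
enum Tag {
    T,
    U,
}

#[allow(dead_code)]
#[repr(C)]
union Data {
    t: u32,
    u: f32,
}

#[allow(dead_code)]
#[repr(C)]
struct Repr {
    tag: Tag,   // the discriminant comes first
    data: Data, // followed by the payload union
}

/// Returns true if the enum and the hand-written struct agree on size and alignment.
pub fn layouts_match() -> bool {
    size_of::<E<u32, f32>>() == size_of::<Repr>()
        && align_of::<E<u32, f32>>() == align_of::<Repr>()
}
```

Declaring a union is allowed in safe code; only reading its fields would require unsafe, and this sketch never does.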
Next: E has two variants: the first holds a value of type SrcTy, and the other holds a value
of DstTy.
But wait! Another rustc linter annotation: this time, we’re telling rustc that it’s okay
for the U variant to fail the dead_code lint.
Normally, rustc would warn us upon statically inferring that U is never used; with
allow(dead_code) in place, that warning is silenced. Like the ABI layout, we’ll see why that’s important
shortly.
Finally, we shadow our v parameter with a new binding. v was already of type T, so
creating an E::T from it is no problem at all.
I/O
This is where the (main) magic happens:
let mut f = fs::OpenOptions::new()
    .write(true)
    .open("/proc/self/mem").expect("welp");

f.seek(io::SeekFrom::Start(&v as *const _ as u64)).expect("oof");
f.write(&[1]).expect("darn");
First, we’re opening a file. Specifically, we’re opening /proc/self/mem in write mode.
/proc/self/mem is a very special file: it presents a view of the current process’s
memory, sparsely mapped by virtual address ranges.
As a quick hack, we can prove this to ourselves in Python by inspecting the
in-memory representation of a str object:
>>> x = "this string is long enough to prevent any string interning"
>>> # in cpython, an object's id is (usually) its pointer
>>> x_addr = id(x)
>>> hex(x_addr)
'0x7ff1bc7cfce0'
>>> mem = open("/proc/self/mem", mode="rb")
>>> mem.seek(x_addr)
>>> mem.read(len(x) * 4)
b'[SNIP] "thi\x00\x00\x00\x00\x00\x00\x00\x00this string is long enough to prevent any string interning\x00e'[SNIP]'
(I trimmed the output a bit. You get the point.)
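The same peek can be sketched in Rust without any unsafe. This is a minimal illustration, assuming Linux and a readable /proc/self/mem; peek_self is a hypothetical helper, not something from the original file.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical helper: reads `len` bytes of this process's own memory,
/// starting at virtual address `addr`, via /proc/self/mem (Linux only).
pub fn peek_self(addr: u64, len: usize) -> io::Result<Vec<u8>> {
    let mut mem = File::open("/proc/self/mem")?;
    // The file offset is interpreted as a virtual address.
    mem.seek(SeekFrom::Start(addr))?;
    let mut buf = vec![0u8; len];
    mem.read_exact(&mut buf)?;
    Ok(buf)
}
```

Calling peek_self(data.as_ptr() as u64, data.len()) on a byte array should hand back a copy of that array. On systems without /proc/self/mem, or where access to it is restricted, the call returns an Err instead.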
We can even poke memory by writing into /proc/self/mem:
>>> # using ctypes to avoid the layout muckery we saw above
>>> import ctypes
>>> cstr = ctypes.c_char_p(b"look ma, no hands")
>>> cstr_addr = ctypes.cast(cstr, ctypes.c_void_p).value
>>> hex(cstr_addr)
'0x7f47f3e9c790'
>>> mem = open("/proc/self/mem", mode="r+b")
>>> mem.seek(cstr_addr)
>>> mem.read(len(cstr.value))
b'look ma, no hands'
>>> mem.seek(cstr_addr + 5)
>>> mem.write(b'p')
>>> mem.seek(cstr_addr)
>>> mem.read(len(cstr.value))
b'look pa, no hands'
The next two pieces of totally_safe_transmute should now make sense: we seek
to the address of our v variable (which is now a variant of E) inside our own running process,
and we write a single u8 to it ([1]).
But why 1? Recall our C ABI representation of E above! The first piece of E is our
union discriminator. When data is SrcTy, discriminant is 0. When we forcefully
overwrite it to 1, data is now interpreted as DstTy!
The final bit
Okay, so we’ve poked memory and turned our E::T into an E::U. Let’s see how we get it out:
if let E::U(v) = v {
    return v;
}

panic!("rip");
At first glance, there’s nothing special about this: we’re simply discarding the enum wrapper
that we added earlier so that we can return our newly-minted value of DstTy.
But this is actually deceptively clever, and involves fooling the compiler:
- The compiler knows that totally_safe_transmute must return DstTy.
- …but the only way to return a DstTy is for v to be an E::U.
- …but v was unconditionally initialized as an E::T, so that return is never reached.
- …so, as far as Rust is concerned, this function always unconditionally panic!s.
This is why we needed allow(dead_code) earlier: no E::U is ever constructed in a manner that
could possibly reach the return statement, so there’s simply no need for it as a variant.
And indeed, we can confirm this by removing the allow attribute:
$ cargo build
   Compiling totally-safe-transmute v0.0.3 (/tmp/totally-safe-transmute)
warning: variant is never constructed: `U`
 --> src/lib.rs:9:9
  |
9 |         U(U),
  |         ^^^^
  |
  = note: `#[warn(dead_code)]` on by default

warning: 1 warning emitted

    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
But alas: it’s not really dead code: the compiler is “wrong,” and we pop
an E::U into existence at runtime by modifying the program’s own memory. We then hit
our impossible condition, and return our transmuted value.
Wrapup
totally_safe_transmute is a delightful hack that demonstrates a key limitation when reasoning
about a program’s behavior: every behavior model is contingent on an environmental model and
how the program (or the program’s runtime, or the compiler, or whatever else) chooses (or doesn’t
choose) to handle seemingly impossible conditions in said environment.
The ability to do this doesn’t reflect fundamental unsafety in Rust, any more than it does
any safe language: from Rust’s perspective, what totally_safe_transmute does is impossible
and therefore undefined; there’s no point in handling something that cannot happen.
Some other interesting bits:
- As mentioned previously, this hack only works on Linux due to its dependency on
  /proc/self/mem. Other OSes may have similar mechanisms.
- I haven’t tested this, but I’m pretty sure it only works on little-endian architectures (like x86).
  On big-endian architectures, the write would probably need to be adjusted.
- If we’re being extremely pedantic: this technically isn’t a transmutation. Semantically,
  transmutation is an operationless change in types at compile time; totally_safe_transmute
  rewrites the in-memory representation of the program to accomplish equivalent behavior at runtime.
  I don’t think this is a distinction that makes a difference.
- Because totally_safe_transmute relies on undefined behavior (an impossible program state),
  Rust would be correct in erasing the E::U branch altogether and reducing the function to an
  unconditional panic!. It doesn’t do that in my testing (even in release mode), but there’s
  absolutely nothing in the program semantics that prevents it from doing so. But maybe
  one day it will, and totally_safe_transmute will stop working!
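The endianness point can be illustrated without touching real memory. The sketch below assumes a 4-byte discriminant that starts at 0 (the E::T variant) and simulates overwriting only its first byte with 1, as the hack does; poke_first_byte is a hypothetical helper for illustration.

```rust
/// Simulates writing the byte 1 over the first byte of a zeroed 4-byte
/// discriminant, then reads the result back under the chosen byte order.
pub fn poke_first_byte(little_endian: bool) -> u32 {
    let mut bytes = [0u8; 4]; // discriminant 0, i.e. the E::T variant
    bytes[0] = 1;             // what f.write(&[1]) does at the enum's address
    if little_endian {
        u32::from_le_bytes(bytes)
    } else {
        u32::from_be_bytes(bytes)
    }
}
```

On a little-endian layout the discriminant becomes 1, which selects E::U. On a big-endian layout the same write produces 0x01000000, which matches neither variant, so the write would have to target the last byte of the discriminant instead.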