My first superoptimizer – Austin Z. Henley
Austin Z. Henley

I work on AI + dev tools.



5/29/2023

A screenshot of the code that generates all possible programs for the language.

See the discussion of this post on Hacker News.

It's time for another useless project.

For any given code snippet, is it possible to find the absolute optimal code that produces the same output? Years ago I stumbled upon this idea, known as superoptimization. It isn't practical, but the idea has been stuck in my head.

How it works: generate every possible permutation of code instructions and test each generated program for equivalence to the original program. That's basically it. As you can imagine, the space of possible programs quickly explodes, and testing two programs for equivalence also isn't easy. But if it was possible on computers back in 1987, then surely my computer can handle it.

I'll try building a toy superoptimizer.


To make this feasible, I'll make up a fictional assembly language with only a few instructions. I'll need to write an emulator that executes programs in order to show whether two programs are equivalent. The optimizer will then generate every possible program in this language given some set of constraints and return the shortest one.

I broke the project up into chunks:

  • Design a simple assembly language.
  • Write an emulator that executes an assembly program and returns the final state.
  • Write a function that tests the equivalence of two programs based on their before/after state.
  • Write an assembler that can translate to and from human-readable assembly and the internal representation.
  • Write an optimizer that generates every program of length 1 to n instructions with every possible operand from 0 to k with x memory cells.

The source code can be found on GitHub.

Assembly language

I want to limit this to just a few instructions, but make sure they're interesting enough that the superoptimizer can find optimizations that aren't obvious to me.

Remember, the goal is not to come up with a comprehensive, real-world assembly language, but rather to learn about superoptimizers.

To start, I somewhat arbitrarily chose:

  • LOAD val which loads the immediate value into memory location 0.
  • SWAP mem, mem which swaps the values of the two memory locations.
  • XOR mem, mem which performs a bitwise XOR on the values of the memory locations and stores the result in the first's location.
  • INC mem which increments the value at the memory location.

You'll see that these are pretty boring. There isn't even a control flow instruction! It will be trivial to add more later though.

Emulator

Writing the emulator is surprisingly easy. The memory state is a list of numbers and a program is a list of instructions with operands. Execute all the instructions and return the final state of the memory as the result.

class CPU:
    def __init__(self, max_mem_cells):
        self.max_mem_cells = max_mem_cells
        self.state = [0] * max_mem_cells
        self.ops = {'LOAD': self.load, 'SWAP': self.swap, 'XOR': self.xor, 'INC': self.inc}

    def execute(self, program):
        state = self.state.copy()
        for instruction in program:
            op = instruction[0]
            args = list(instruction[1:])
            args.insert(0, state)
            state = op(*args)
        return state

An instruction is a reference to the function that performs the operation:

    def load(self, state, val):
        state[0] = val
        return state
    
    def swap(self, state, mem1, mem2):
        state[mem1], state[mem2] = state[mem2], state[mem1]
        return state
    
    def xor(self, state, mem1, mem2):
        state[mem1] = state[mem1] ^ state[mem2]
        return state
    
    def inc(self, state, mem):
        state[mem] += 1
        return state

That’s it for the emulator.
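As a quick sanity check, here is the emulator exercised end to end (the CPU class is repeated so the snippet runs standalone; the three-instruction program and its expected state are my own example, not from the post):

```python
# Self-contained copy of the emulator above, plus a tiny example run.
class CPU:
    def __init__(self, max_mem_cells):
        self.state = [0] * max_mem_cells

    def execute(self, program):
        state = self.state.copy()
        for op, *args in program:  # each instruction: (function, operand, ...)
            state = op(state, *args)
        return state

    def load(self, state, val):
        state[0] = val
        return state

    def swap(self, state, mem1, mem2):
        state[mem1], state[mem2] = state[mem2], state[mem1]
        return state

    def xor(self, state, mem1, mem2):
        state[mem1] ^= state[mem2]
        return state

    def inc(self, state, mem):
        state[mem] += 1
        return state

cpu = CPU(4)
# LOAD 3; SWAP 0, 1; INC 0
program = [(cpu.load, 3), (cpu.swap, 0, 1), (cpu.inc, 0)]
print(cpu.execute(program))  # [1, 3, 0, 0]
```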

To make it easier to use, I also implemented a function that takes human-readable assembly and converts it to a program, as well as a function that takes a program and converts it back to human-readable assembly. They aren't particularly interesting, but you can see them here: assembler.py.
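The real translation functions live in assembler.py; as a rough illustration of the idea only, a minimal text-to-tuples round trip might look like this (the function names and tuple format here are my guesses, not the repo's actual code):

```python
# Hypothetical sketch of the assembler: text <-> (OP, operand, ...) tuples.
# The real version in assembler.py also maps each mnemonic to its CPU method.
def assemble(source):
    program = []
    for line in source.strip().splitlines():
        op, _, rest = line.strip().partition(' ')
        args = tuple(int(a) for a in rest.replace(',', ' ').split())
        program.append((op, *args))
    return program

def disassemble(program):
    return '\n'.join(
        op + (' ' + ', '.join(str(a) for a in args) if args else '')
        for op, *args in program
    )

text = "LOAD 3\nSWAP 0, 1"
print(assemble(text))               # [('LOAD', 3), ('SWAP', 0, 1)]
print(disassemble(assemble(text)))  # round-trips back to the original text
```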

Optimizer

Now for the fun part. Generate every possible program.

from itertools import product

class Superoptimizer:
    def __init__(self):
        self.program_cache = {}  # maps final state -> shortest program found

    def generate_programs(self, cpu, max_length, max_mem, max_val):
        for length in range(1, max_length + 1):
            for prog in product(cpu.ops.values(), repeat=length):
                arg_sets = []
                for op in prog:
                    if op == cpu.load:
                        arg_sets.append([tuple([val]) for val in range(max_val + 1)])
                    elif op == cpu.swap or op == cpu.xor:
                        arg_sets.append(product(range(max_mem), repeat=2))
                    elif op == cpu.inc:
                        arg_sets.append([tuple([val]) for val in range(max_mem)])
                for arg_set in product(*arg_sets):
                    program = [(op, *args) for op, args in zip(prog, arg_set)]
                    yield program

Don't let this function overwhelm you. It's generating every possible program based on a few variables: program length, number of instructions, and number of possible operands (size of memory or the maximum value). I turned it into a generator because the first version was causing my laptop to run out of memory.

What’s the computational complexity of this? Horrible.
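To put a rough number on "horrible": each instruction slot can be filled in (max_val + 1) ways for LOAD, max_mem² ways each for SWAP and XOR, and max_mem ways for INC, so the candidate count is a geometric series in the program length. A back-of-the-envelope calculation (my own arithmetic) for the search that gets killed in the Performance section below:

```python
# Count candidate programs: per-slot operand choices, summed over program lengths.
def num_programs(max_length, max_mem, max_val):
    per_slot = (max_val + 1) + 2 * max_mem**2 + max_mem  # LOAD + SWAP/XOR + INC
    return sum(per_slot ** length for length in range(1, max_length + 1))

# 6 memory cells, values up to 2, programs up to 6 instructions long:
print(f"{num_programs(6, 6, 2):,}")  # 285,959,905,686
```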

We'll use this to test every program:

    def search(self, max_length, max_mem, max_val, target_state):
        cpu = CPU(max_mem)
        for program in self.generate_programs(cpu, max_length, max_mem, max_val):
            state = cpu.execute(program)
            if state == target_state:
                state = tuple(state) 
                if state not in self.program_cache or len(program) < len(self.program_cache[state]):
                    self.program_cache[state] = program
        return self.program_cache.get(tuple(target_state), None)

The search function takes the constraints and the target state as parameters. It considers two programs equivalent if their final memory states are identical. We can loosen this restriction later. It saves the shortest program that produces the correct state and returns it when the search is complete.
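Putting the pieces together, here is a condensed, runnable version of the whole loop on a deliberately tiny search space (2 memory cells, values up to 2, at most 2 instructions), so it finishes instantly; the tiny target state is my own example:

```python
from itertools import product

# Condensed, runnable copy of the CPU and Superoptimizer from above.
class CPU:
    def __init__(self, n):
        self.state = [0] * n
        self.ops = {'LOAD': self.load, 'SWAP': self.swap, 'XOR': self.xor, 'INC': self.inc}
    def execute(self, program):
        state = self.state.copy()
        for op, *args in program:
            state = op(state, *args)
        return state
    def load(self, s, v): s[0] = v; return s
    def swap(self, s, a, b): s[a], s[b] = s[b], s[a]; return s
    def xor(self, s, a, b): s[a] ^= s[b]; return s
    def inc(self, s, a): s[a] += 1; return s

class Superoptimizer:
    def __init__(self):
        self.program_cache = {}
    def generate_programs(self, cpu, max_length, max_mem, max_val):
        for length in range(1, max_length + 1):
            for prog in product(cpu.ops.values(), repeat=length):
                arg_sets = []
                for op in prog:
                    if op == cpu.load:
                        arg_sets.append([(v,) for v in range(max_val + 1)])
                    elif op in (cpu.swap, cpu.xor):
                        arg_sets.append(product(range(max_mem), repeat=2))
                    else:  # inc
                        arg_sets.append([(m,) for m in range(max_mem)])
                for args in product(*arg_sets):
                    yield [(op, *a) for op, a in zip(prog, args)]
    def search(self, max_length, max_mem, max_val, target_state):
        cpu = CPU(max_mem)
        for program in self.generate_programs(cpu, max_length, max_mem, max_val):
            if cpu.execute(program) == target_state:
                key = tuple(target_state)
                if key not in self.program_cache or len(program) < len(self.program_cache[key]):
                    self.program_cache[key] = program
        return self.program_cache.get(tuple(target_state))

opt = Superoptimizer()
best = opt.search(max_length=2, max_mem=2, max_val=2, target_state=[0, 1])
print(len(best))  # 1 -- a single INC 1 reaches [0, 1]
```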

Giving it a go

It's time to put the superoptimizer to the test. Given a program, can it find the shortest program with the same end result?

My first program in the made-up assembly language:

LOAD 3
SWAP 0, 1
LOAD 3
SWAP 0, 2
LOAD 3
SWAP 0, 3
LOAD 3

The state of the finished program is [3, 3, 3, 3]. It's filling the finite memory with 3s.

What's the optimal code according to the superoptimizer?

The terminal output of running the first test, which shows the assembly input, the final memory state, and the optimal code.

LOAD 3
XOR 1, 0
XOR 2, 0
XOR 3, 0

It works!

The superoptimizer is using the XOR instruction to duplicate values and store them in the appropriate locations instead of using LOAD and SWAP. Very cool, I didn't think of that.

Now for another test.

It's also possible to give the superoptimizer just the desired final state, without first writing a program that produces it. So I can tell it to find the shortest program that results in the state [0, 2, 1]. The optimal program:

LOAD 2
SWAP 0, 1
INC 2

It used the LOAD and SWAP pattern for the 2, but it used INC for the 1 since that takes one less instruction. Victory, again!

Performance

The performance is not good by any means. Searching for the final state [1, 0, 1, 0, 1, 0] with no more than 6 instructions and values only up to 2 caused the superoptimizer to generate 1,580,000,000 programs over 30+ minutes on my laptop before I killed it.

The terminal showing that the superoptimizer still has not completed after trying 1.5 billion different programs.

Future work

There are many improvements to be made:

Start state. The emulator assumes the start state is always the same: 0 for all memory locations. But really we want to allow different start states so that programs can take inputs, like fib(n), which would compute the n'th Fibonacci number. We can do this by letting the emulator take a start-state parameter that sets the first subset of memory values.
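A sketch of that change (the parameter name and the free-function form are mine; the real emulator would grow an argument on CPU.execute):

```python
# Sketch: seed the first memory cells from start_state so programs can take inputs.
def execute(program, max_mem, start_state=()):
    state = [0] * max_mem
    state[:len(start_state)] = list(start_state)  # inputs live in low memory
    for op, *args in program:
        state = op(state, *args)
    return state

def inc(state, mem):
    state[mem] += 1
    return state

# INC 0 on input [4, 7]:
print(execute([(inc, 0)], max_mem=3, start_state=[4, 7]))  # [5, 7, 0]
```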

Program equivalence. To better support program equivalence, including inputs and outputs, like showing fib(n) is equivalent to opt_fib(n), we also need a notion of output. This could be another parameter that defines the output as the first x memory cells when the program finishes. Additionally, we need a function to show that the equivalence holds across multiple inputs. It's probably sufficient to test a predefined set of integers for each input rather than testing them all exhaustively.
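A sketch of what that check could look like (all names here are illustrative; a handful of sampled inputs stands in for exhaustive testing):

```python
# Sketch: programs are "equivalent" if the first out_cells memory cells match
# after running on every sampled input (spot checks, not a proof).
def execute(program, max_mem, start_state=()):
    state = [0] * max_mem
    state[:len(start_state)] = list(start_state)
    for op, *args in program:
        state = op(state, *args)
    return state

def equivalent(prog_a, prog_b, max_mem, out_cells, sample_inputs):
    return all(
        execute(prog_a, max_mem, i)[:out_cells] == execute(prog_b, max_mem, i)[:out_cells]
        for i in sample_inputs
    )

def inc(s, m): s[m] += 1; return s
def xor(s, a, b): s[a] ^= s[b]; return s

p1 = [(inc, 0), (inc, 0)]               # adds 2 to the input in cell 0
p2 = [(inc, 0), (xor, 1, 1), (inc, 0)]  # same output, plus a useless XOR
print(equivalent(p1, p2, max_mem=2, out_cells=1, sample_inputs=[(0,), (3,), (9,)]))  # True
```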

Pruning. The superoptimizer is searching through a ridiculous number of programs, many of which are nonsensical. For example, I peeked at thousands of generated programs and they were XORing zeros together repeatedly and overwriting unused values. Pruning will complicate the generation function, but it will reduce the search space dramatically.

More instructions. Given the limited instruction set, the superoptimizer isn't able to discover more interesting patterns. The first thing I'd add is some form of conditional instruction so it can act on inputs, like CMP mem, mem, which sets memory location 0 to a particular value depending on whether the first operand's value is greater than, less than, or equal to the second. Then I'd add whatever else is needed to implement classic functions like fib(n) and max(a, b), probably ADD and SUB. Compilers are great at replacing conditionals with arithmetic, so that would be a good test for a superoptimizer. Be careful with emulating jumps though, since you won't know if the program ever terminates!
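One possible semantics for that CMP (the -1/0/1 encoding is my own choice, not settled in the post):

```python
# Hypothetical CMP mem, mem: store -1, 0, or 1 in cell 0 for <, ==, > respectively.
def cmp_op(state, mem1, mem2):
    a, b = state[mem1], state[mem2]
    state[0] = (a > b) - (a < b)  # Python idiom for a three-way comparison
    return state

print(cmp_op([0, 5, 3], 1, 2))  # [1, 5, 3]  (5 > 3)
```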


Please let me know if you manage to make any improvements to the superoptimizer. It would be fun to continue working on it. You can find all of the source on GitHub.

If you're interested in learning more about compilers, see my tutorial series: Let's make a Teeny Tiny compiler.
