When Optimising Code, Measure


2024-01-15 05:59:44





@ColinTheMathmo



When optimising code, never guess, always measure.

This is a truism that a lot of people quote, but it can be hard to
remember, especially in the heat of battle (as it were). Rather
fortunately it came to mind just when it was needed, as I found
something completely unexpected.

I was writing a simple implementation of the Fermat difference of
squares method of factoring. This involves writing the number to
be factored as – you guessed it – the difference of two squares.
If $n=a^2-b^2$ then $n=(a-b)(a+b)$ and we have a factorisation
(provided we don't have $a-b=1$).
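
As a tiny illustration, $21=5^2-2^2$, which gives $21=(5-2)(5+2)=3\times 7$.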


Version 0

So the obvious thing to do is to take the square root of $n$, round
it up to get a candidate for $a$, then look at $a^2-n$. If that is
a square then we're done.
 
def factor_fermat0(n):
    # Start at the ceiling of sqrt(n) and walk upwards until
    # a*a - n is itself a perfect square.
    a = square_root_ceil(n)
    while True:
        d = a*a - n
        b = square_root_ceil(d)
        if b**2 == d:
            return a-b, a+b
        a += 1
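
The helper square_root_ceil isn't shown in the post; a minimal sketch of
what it might look like, assuming Python 3.8+ for math.isqrt, is:

import math

def square_root_ceil(n):
    # math.isqrt gives the floor of the integer square root;
    # bump it up by one when n is not a perfect square.
    r = math.isqrt(n)
    return r if r*r == n else r + 1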


Version 1

We can improve on this. We can increase $a$ until we have $a^2-b^2>n$,
and then increase $b$ until $a^2-b^2<n$. If $n$ is odd then we'll get
equality at some point, so we just continue until that happens, and
we're done.
 
def factor_fermat1(n):
    a = square_root_ceil(n)
    b = square_root_ceil(a**2 - n)

    while a**2 - b**2 != n:
        while a**2 - b**2 < n: a += 1
        while a**2 - b**2 > n: b += 1

    return a-b, a+b
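
As a quick sanity check (using one of the numbers timed below, and the
square_root_ceil sketch above):

print(factor_fermat1(4736906887651))    # expected: (1211141, 3911111)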


Version 2

Now we notice that we are repeatedly computing the squares of $a$
and $b$, so we create variables to hold the squares and update them
by remembering that $(x+1)^2=x^2+2x+1$.
 
def factor_fermat2(n):
    a = square_root_ceil(n)
    a2 = a**2
    b = square_root_ceil(a**2 - n)
    b2 = b**2
    while a2-b2 != n:
        while a2-b2 < n: a2, a = a2+a+a+1, a+1
        while a2-b2 > n: b2, b = b2+b+b+1, b+1

    return a-b, a+b


Version 3

We can improve that slightly further by having $c=2a+1$ and $d=2b+1$,
and the code now becomes:
 
def factor_fermat3(n):
    a = square_root_ceil(n)
    a2 = a**2
    c = 2*a + 1
    b = square_root_ceil(a**2 - n)
    b2 = b**2
    d = 2*b + 1
    while a2-b2 != n:
        while a2-b2 < n: a2, c = a2+c, c+2
        while a2-b2 > n: b2, d = b2+d, d+2

    # c-d and c+d are both even, so integer division is exact
    return (c-d)//2, (c+d)//2 - 1
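
To see why the final line is right: throughout the loop $c=2a+1$ and
$d=2b+1$, so $(c-d)/2=a-b$ and $(c+d)/2-1=a+b$, and we return the same
factor pair as before.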


Let's see how we're doing

These improvements all seem obvious and straightforward, and we can
check by doing some timing tests. The times, and indeed the ratios
of times, will vary according to the number being factored, so we can
do two different numbers and see what the trend is. If you choose to
analyse the algorithm then you'll see what's going on. In any case, we
factored two different numbers:

4736906887651 = 1211141 x 3911111:

Routine     0       1       2       3
Run 0     9.223   0.409   0.258   0.188
Run 1     9.138   0.408   0.258   0.188
Run 2     9.146   0.407   0.262   0.194
Run 3     9.149   0.416   0.258   0.188
Run 4     9.228   0.408   0.272   0.189
Run 5     9.187   0.407   0.258   0.188
Run 6     9.154   0.421   0.259   0.188
Run 7     9.157   0.407   0.259   0.189


47368008965429 = 3911111 x 12111139:

Routine      0       1       2       3
Run 0     34.922   1.230   0.774   0.564
Run 1     34.276   1.214   0.775   0.562
Run 2     34.191   1.215   0.777   0.564
Run 3     34.198   1.214   0.776   0.564
Run 4     34.182   1.230   0.776   0.563
Run 5     34.226   1.244   0.775   0.563
Run 6     34.144   1.235   0.773   0.564
Run 7     34.104   1.233   0.775   0.565


We can see that the first routine is rubbish, but that's no surprise,
since it's computing a square root every time through the loop. For the
others we can see a clear improvement as we make these optimisations.
It did surprise me just how big the improvement was from #2 to #3,
but it was nice to see.
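
The post doesn't show the timing harness behind these tables; a minimal
sketch of the kind of measurement involved (the helper name here is an
assumption, not the author's code) might be:

import time

def time_factor(fn, n, runs=8):
    # Time several runs of a factoring routine, roughly mirroring
    # the "Run 0" to "Run 7" rows in the tables above.
    for run in range(runs):
        start = time.perf_counter()
        fn(n)
        print(f"Run {run}: {time.perf_counter() - start:.3f}s")

time_factor(factor_fermat3, 47368008965429)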


Version 4

Then I made a small change to the code layout, ready for another shot
at an optimisation. Quite uncontroversial – I ran the timing tests
again to make sure there was no difference, using a larger number,
and not bothering to use the reference "version 0" since that was so
slow.
 
def factor_fermat4(n):
    a = square_root_ceil(n)
    a2 = a**2
    c = 2*a + 1
    b = square_root_ceil(a**2 - n)
    b2 = b**2
    d = 2*b + 1
    while a2-b2 != n:
        while a2-b2 < n:
            a2 += c
            c += 2
        while a2-b2 > n:
            b2 += d
            d += 2

    return (c-d)//2, (c+d)//2 - 1


47368008965429 = 3911111 x 12111139:

Routine     1       2       3       4
Run 0     1.225   0.774   0.562   0.622
Run 1     1.223   0.772   0.562   0.624
Run 2     1.238   0.776   0.567   0.625
Run 3     1.233   0.777   0.566   0.626
Run 4     1.229   0.769   0.563   0.624
Run 5     1.231   0.773   0.562   0.624
Run 6     1.225   0.771   0.563   0.625
Run 7     1.219   0.771   0.563   0.622


Wait – what? It's slower? How can that be? I would've thought it would
compile to exactly the same bytecode, and so execute in pretty much the
same time.
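
One way to test that expectation (not something the post does) is to
compare the bytecode of the two forms with the standard dis module:

import dis

# Inspect the bytecode each variant compiles to and see whether
# the instruction sequences really do match.
dis.dis(compile("a2, c = a2+c, c+2", "<tuple-assign>", "exec"))
dis.dis(compile("a2 += c\nc += 2", "<augmented>", "exec"))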

Just to be sure, I ran it on an even bigger number.

4736791755876611 = 39111113 x 121111147:

Routine      1       2       3       4
Run 0     12.388   7.775   5.672   6.245
Run 1     12.215   7.767   5.660   6.256
Run 2     12.295   7.910   5.674   6.301
Run 3     12.196   7.749   5.645   6.247
Run 4     12.221   7.731   5.646   6.294
Run 5     12.227   7.750   5.638   6.232
Run 6     12.226   7.798   5.691   6.250
Run 7     12.164   7.753   5.646   6.264


There is no doubt:


 
a2, c = a2+c, c+2

is faster than

 
a2 += c
c += 2

That caught me off guard, but it really does go to show:


When optimising,
measure, measure,
measure.

My guess is that in the first case the two evaluations and assignments
can happen in parallel, and so might happen on different cores, whereas
in the second case they have to happen in series, but that's just a guess.
Maybe someone reading this knows for sure.

Let me know.

I've got more changes to make, but one thing is for sure – I'll be
measuring the improvements as I go to make sure they really do make
things better.




Comments

The following comment was sent, but I don't know if the person is happy
to be mentioned by name – if it's you, please let me know. In any case,
they said this (paraphrased):

Using the standard Python interpreter, the GIL would prevent

 
a2, c = a2+c, c+2

from executing in parallel.

Another big optimization would be to replace a**2 with a*a. Multiplication
is much faster than the power operator for exponents below 4. It's possible
the interpreter does this automatically, but I don't think so.

And to whomever it was – thanks for the response.
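
A quick way to check the claim about a*a versus a**2 (not something from
the post) is a timeit micro-benchmark:

import timeit

setup = "a = 123456789123456789"
print(timeit.timeit("a*a",  setup=setup))   # plain multiplication
print(timeit.timeit("a**2", setup=setup))   # the power operator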



