When Optimising Code, Measure
When optimising code, never guess, always measure.

This is a truism that a lot of people quote, but it can be hard to remember, especially in the heat of battle (as it were). Rather fortunately it came to mind just when needed, as I discovered something completely unexpected.
I was writing a simple implementation of the Fermat difference of squares method of factoring. This involves writing the number to be factored as – you guessed it – the difference of two squares. If $n=a^2-b^2$ then $n=(a-b)(a+b)$ and we have a factorisation (provided we don't have $a-b=1$).
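All of the versions below use a helper called `square_root_ceil`, whose definition isn't shown here. A minimal version, using Python's `math.isqrt` (available from Python 3.8), might look like this:

```python
import math

def square_root_ceil(n):
    """Smallest non-negative integer a such that a*a >= n."""
    a = math.isqrt(n)          # floor of the square root
    return a if a * a == n else a + 1
```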
Version 0

So the obvious thing to do is to take the square root of $n$, round it up to get a candidate for $a$, then look at $a^2-n$. If that's a square then we're done.

```python
def factor_fermat0(n):
    a = square_root_ceil(n)
    while True:
        d = a*a - n
        b = square_root_ceil(d)
        if b**2 == d:
            return a-b, a+b
        a += 1
```
Version 1

We can improve on this. We can increase $a$ until we have $a^2-b^2>n$, and then increase $b$ until $a^2-b^2<n$. If $n$ is odd then we'll get equality at some point, so we just continue until that happens, and we're done.

```python
def factor_fermat1(n):
    a = square_root_ceil(n)
    b = square_root_ceil(a**2 - n)
    while a**2 - b**2 != n:
        while a**2 - b**2 < n:
            a += 1
        while a**2 - b**2 > n:
            b += 1
    return a-b, a+b
```
Version 2

Now we notice that we're repeatedly computing the squares of $a$ and $b$, so we create variables to hold the squares and update them by remembering that $(x+1)^2=x^2+2x+1$.

```python
def factor_fermat2(n):
    a = square_root_ceil(n)
    a2 = a**2
    b = square_root_ceil(a**2 - n)
    b2 = b**2
    while a2 - b2 != n:
        while a2 - b2 < n:
            a2, a = a2+a+a+1, a+1
        while a2 - b2 > n:
            b2, b = b2+b+b+1, b+1
    return a-b, a+b
```
Version 3

We can improve that slightly further by keeping $c=2a+1$ and $d=2b+1$, and the code now becomes:

```python
def factor_fermat3(n):
    a = square_root_ceil(n)
    a2 = a**2
    c = 2*a + 1
    b = square_root_ceil(a**2 - n)
    b2 = b**2
    d = 2*b + 1
    while a2 - b2 != n:
        while a2 - b2 < n:
            a2, c = a2+c, c+2
        while a2 - b2 > n:
            b2, d = b2+d, d+2
    return (c-d)//2, (c+d)//2 - 1
```
Let's see how we're doing

These improvements all seem obvious and straightforward, and we can check by doing some timing tests. The times, and indeed the ratios of times, will vary according to the number being factored, so we can try two different numbers and see what the trend is. If you choose to analyse the algorithm then you'll see what's going on. Anyway, we factored two different numbers:
4736906887651 = 1211141 x 3911111:

*[timing table not preserved]*

47368008965429 = 3911111 x 12111139:

*[timing table not preserved]*
We can see that the first routine is rubbish, but that's no surprise, since it's computing a square root every time through the loop. For the others we can see a clear improvement as we make these optimisations. It did surprise me just how big the improvement was from #2 to #3, but it was nice to see.
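For reference, a minimal harness for this kind of timing test might look like the sketch below (`time_factor` is my name for it, not from the original code; absolute times will of course depend on your machine):

```python
import time

def time_factor(fn, n, repeats=3):
    """Best-of-`repeats` wall-clock time for a single call fn(n), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(n)
        best = min(best, time.perf_counter() - start)
    return best

# Example: time_factor(factor_fermat3, 47368008965429)
```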
Version 4

Then I made a small change to the code layout ready for another shot at an optimisation. Quite uncontroversial – I ran the timing tests again to make sure there was no difference, using a larger number, and not bothering to use the reference version 0 since that was so slow.
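The version 4 code itself doesn't survive in this copy of the post; judging from the snippet quoted later, the layout change was presumably to split each tuple assignment into a pair of augmented assignments. A sketch of what it would have looked like (with `square_root_ceil` filled in via `math.isqrt`, since its definition isn't shown either):

```python
import math

def square_root_ceil(n):
    a = math.isqrt(n)
    return a if a * a == n else a + 1

def factor_fermat4(n):
    # Same as version 3, but with each tuple assignment
    # rewritten as two augmented assignments.
    a = square_root_ceil(n)
    a2 = a**2
    c = 2*a + 1
    b = square_root_ceil(a**2 - n)
    b2 = b**2
    d = 2*b + 1
    while a2 - b2 != n:
        while a2 - b2 < n:
            a2 += c
            c += 2
        while a2 - b2 > n:
            b2 += d
            d += 2
    return (c-d)//2, (c+d)//2 - 1
```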
47368008965429 = 3911111 x 12111139:

*[timing table not preserved]*
Wait – what? It's slower? How can that be? I would've thought it would compile to exactly the same bytecode, and so execute in pretty much the same time.

Just to be sure I ran it on an even bigger number.
4736791755876611 = 39111113 x 121111147:

*[timing table not preserved]*
There is no doubt:

```python
a2, c = a2+c, c+2
```

is faster than

```python
a2 += c
c += 2
```
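That difference is easy to re-measure with the standard `timeit` module; a quick sketch (results depend heavily on the Python version and the machine, so don't take anyone's numbers on trust):

```python
import timeit

setup = "a2, c = 0, 1"
t_tuple = timeit.timeit("a2, c = a2+c, c+2", setup=setup, number=1_000_000)
t_augmented = timeit.timeit("a2 += c\nc += 2", setup=setup, number=1_000_000)
print(f"tuple assignment:      {t_tuple:.3f} s")
print(f"augmented assignments: {t_augmented:.3f} s")
```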
That caught me by surprise, but it really does go to show:

When optimising, measure, measure, measure.
My guess is that in the first case the two evaluations and assignments can happen in parallel, and so might happen on different cores, whereas in the second case they must happen in serial, but that's just a guess. Maybe someone reading this knows for sure. Let me know.
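One thing we can check directly is the "same bytecode" assumption: wrapping the two forms in functions and disassembling them with the standard `dis` module shows that they do not compile to the same instructions (the exact opcodes vary between Python versions):

```python
import dis

def with_tuple(a2, c):
    a2, c = a2+c, c+2
    return a2, c

def with_augmented(a2, c):
    a2 += c
    c += 2
    return a2, c

dis.dis(with_tuple)      # evaluates both right-hand values, then stores both
dis.dis(with_augmented)  # two separate add-and-store sequences
```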
I've got more changes to make, but one thing is for sure – I'll be measuring the improvements as I go to make sure they really do make things better.
Comments

The following comment was sent, but I don't know if the person is happy to be mentioned by name – if it's you, please let me know. However, they said this (paraphrased):

> Using the standard Python interpreter the GIL would prevent `a2, c = a2+c, c+2` from executing in parallel. Another big optimisation would be to replace `a**2` with `a*a`. Multiplication …

And to whoever it was – thanks for the response.
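In the spirit of the post, that last suggestion is also easy to measure rather than take on trust; a quick `timeit` sketch:

```python
import timeit

setup = "a = 3911111"
t_pow = timeit.timeit("a**2", setup=setup, number=1_000_000)
t_mul = timeit.timeit("a*a", setup=setup, number=1_000_000)
print(f"a**2: {t_pow:.3f} s")
print(f"a*a:  {t_mul:.3f} s")
```

On CPython `a*a` generally does come out ahead, since `**` goes through the more general power machinery, but as ever: check on your own setup.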