Compressing Text into Images – Terence Eden’s Blog


2024-01-13 06:58:14

(This is, I think, a silly idea. But sometimes the silliest things lead to surprising results.)

The text of Shakespeare’s Romeo and Juliet is about 146,000 characters long. Because it is plain English, each character can be represented by a single byte. So a plain Unicode (UTF-8) text file of the play is about 142KB.
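As a quick check on that arithmetic, the sketch below confirms that ASCII-only text takes exactly one byte per character in UTF-8, and converts the quoted character count to kilobytes. The sample line and the helper names `utf8_size` and `to_kib` are stand-ins of mine; the play's actual file is not loaded here.

```python
def utf8_size(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def to_kib(byte_count: int) -> float:
    """Convert a byte count to kibibytes (1 KiB = 1024 bytes)."""
    return byte_count / 1024


# A stand-in line; any ASCII-only text behaves the same way.
sample = "But, soft! what light through yonder window breaks?"
print(utf8_size(sample) == len(sample))  # one byte per character

# The approximate character count quoted in the post.
print(f"{to_kib(146_000):.1f} KiB")
```

A non-ASCII character such as é takes two bytes in UTF-8, so the one-byte-per-character shortcut only holds while the text stays within ASCII.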

In Adventures With Compression, JamesG discusses a competition to compress text and poses an interesting thought:

Encoding the text as an image and compressing the image. I would need to use a lossless image compressor, and using RGB would increase the number of values associated with each word. Perhaps if I changed the image to greyscale? Or perhaps that isn’t worth exploring.

Image compression algorithms are, generally, pretty good at finding patterns in images and squashing them down. So if we convert text to an image, will image compression help?

The English language and its punctuation are not very complicated, so the play only contains 77 unique symbols. The ASCII value of each character spans from 0 to 127. So let’s create a greyscale image in which each pixel has the same greyness as the ASCII value of the corresponding character.
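The post only shows the decoding side, so here is one way the encoding step might look. This is a sketch, not the author's script: the near-square layout, the zero-byte padding, and the helper names `layout` and `text_to_png` are all my assumptions. It relies on Pillow's `Image.frombytes` to build the 8-bit greyscale image.

```python
import math


def layout(text: str) -> tuple[int, int, bytes]:
    """Return (width, height, pixel_bytes) for a near-square greyscale image.

    Assumption: the post does not say how the image dimensions were chosen.
    Here the width is the square root of the length, rounded up, and the
    final row is padded with zero bytes (NUL).
    """
    data = text.encode("ascii")  # raises UnicodeEncodeError on non-ASCII input
    width = math.isqrt(len(data))
    if width * width < len(data):
        width += 1
    width = max(width, 1)
    height = max(math.ceil(len(data) / width), 1)
    return width, height, data.ljust(width * height, b"\x00")


def text_to_png(text: str, path: str) -> None:
    """Write the text as an 8-bit greyscale ("L" mode) PNG using Pillow."""
    # Imported here so layout() can be used without Pillow installed.
    from PIL import Image

    width, height, pixels = layout(text)
    Image.frombytes("L", (width, height), pixels).save(path, optimize=True)


# Show the dimensions chosen for a short stand-in string.
width, height, pixels = layout("O Romeo, Romeo!")
print(width, height, len(pixels))
```

With this padding scheme the decoded text would end in NUL characters, which `rstrip("\x00")` removes.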

Here’s what it looks like when losslessly compressed to a PNG:

[Image: random grey noise.]

That’s down to 55KB! About 40% of the size of the original file. It is slightly smaller than ZIP, and about 9 bytes larger than Brotli compression.
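Part of the reason the PNG lands so close to ZIP is that both formats normally rely on DEFLATE. The sketch below uses Python's `zlib` module, which implements DEFLATE, to measure how far a piece of text shrinks. The sample is a stand-in line repeated many times, so its ratio says nothing about the play's real figures, and `deflate_ratio` is a name I made up for illustration.

```python
import zlib


def deflate_ratio(data: bytes, level: int = 9) -> float:
    """Compressed size divided by original size, using zlib's DEFLATE."""
    if not data:
        raise ValueError("data must not be empty")
    return len(zlib.compress(data, level)) / len(data)


# Stand-in text; highly repetitive, so it compresses far better than a play.
sample = b"Two households, both alike in dignity,\n" * 200
print(f"compressed to {deflate_ratio(sample):.1%} of the original size")

# Compression is lossless: decompressing returns the exact input.
assert zlib.decompress(zlib.compress(sample, 9)) == sample
```

To compare against a real file, read it in binary mode and pass the bytes to `deflate_ratio`.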

The file can be read with the following Python:

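from PIL import Image

# Open the greyscale PNG; each pixel's grey level is one character code.
image  = Image.open("ascii_grey.png")
pixels = list(image.getdata())
# Turn each pixel value back into its ASCII character.
ascii  = "".join([chr(pixel) for pixel in pixels])
with open("rj.txt", "w") as file:
    file.write(ascii)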

But, even with the latest image compression algorithms, it is unlikely to compress much further; the image looks like random noise. Yes, you and I know there’s data in there. And a statistician looking for entropy would probably determine that the file contains readable data. But image compressors work in a different realm. They look for solid blocks, or predictable gradients, or other statistical features.
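The entropy remark can be made concrete. The sketch below computes Shannon entropy in bits per byte, treating each byte as an independent symbol. That zeroth-order model is my simplification, and it ignores the letter-to-letter patterns a real compressor exploits. English text scores well under the 8 bits per byte of uniformly random data, which is the statistical hint that the noise has structure.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Zeroth-order Shannon entropy of the data, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Stand-in text; the play itself is not loaded here.
sample = b"What's in a name? That which we call a rose"
print(f"{shannon_entropy(sample):.2f} bits per byte")

# Every byte value appearing equally often is the 8-bit worst case.
print(f"{shannon_entropy(bytes(range(256))):.2f} bits per byte")
```

A file containing only 77 distinct symbols can never exceed log2(77), roughly 6.3 bits per byte, under this measure.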

But there you go! A lossless image is a pretty efficient way to compress ASCII text.
