The revenge of Unicode – The Eclectic Light Company

2024-01-20 02:04:30

Unicode is an epitome of human achievement: an excellent idea that has grown out of control to the point where no human can grok it all any more. I often wonder how many of its 149,813 ‘characters’ any one human is likely to use, and suspect for most that’s in the low hundreds or less. All those ‘characters’ enable deliberate misuse, where visual similarities are exploited to spoof people over identity or worse. Let me explain how you can get Unicode revenge without harming a soul.

We still do a great deal in life using text that can be searched rapidly and readily. Sometimes it pays to obfuscate that text so that only humans reading it will understand what it says. Whether it’s an eavesdropper bulk-scanning emails, or someone’s AI crawler building your words into its next Large Language Model (LLM), you can make their job inconveniently difficult by recasting its Unicode. For example, the following obfuscated version of a paragraph from one of my recent articles reads clearly to the human eye:

[Image: dystextia1]

But look more closely at those characters, like
Αlthоugh thе shір's bоаts hаd оrіgіnаllу іntеndеd tо tоw thе оvеrlоаdеd аnd раrtіаllу submеrgеd rаft
These aren’t what they seem, and ordinary text searches for them will draw a blank.
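
As a rough illustration of why such a search fails, here is a minimal Python sketch. The `LOOKALIKES` table and the `obfuscate` function are hypothetical names invented for this example, and the mapping is only a small guess at the kind of substitution involved, not the table that any particular tool uses.

```python
import unicodedata

# Hypothetical lookalike table (an assumption for illustration only):
# Latin letter -> visually similar Greek or Cyrillic letter.
LOOKALIKES = {
    "A": "\u0391",  # GREEK CAPITAL LETTER ALPHA
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "i": "\u0456",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "y": "\u0443",  # CYRILLIC SMALL LETTER U
}


def obfuscate(text):
    """Swap each Latin letter in the table for its lookalike."""
    return text.translate(str.maketrans(LOOKALIKES))


plain = "Although the ship's boats had originally intended"
disguised = obfuscate(plain)

print(disguised)            # looks much the same to a human reader
print("ship" in plain)      # a plain substring search finds the word
print("ship" in disguised)  # the same search fails on the disguised text

# Listing code points and names shows what the first word really contains.
for ch in disguised[:8]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The search fails because string comparison works on code points, not on how glyphs are drawn.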

Interestingly, some searches now make allowance for that degree of light obfuscation. To make things far harder for them, try the more extreme
Αⅼ𝚝𝚑о𝚞ɡ𝚑 𝚝𝚑е 𝚜𝚑ірᛌ𝚜 bоа𝚝𝚜 𝚑аⅾ о𝚛іɡі𝚗аⅼⅼу і𝚗𝚝е𝚗ⅾеⅾ 𝚝о 𝚝о𝚠 𝚝𝚑е о𝚟е𝚛ⅼоаⅾеⅾ а𝚗ⅾ ра𝚛𝚝іаⅼⅼу 𝚜𝚞b𝚖е𝚛ɡеⅾ 𝚛а𝚏𝚝
which remains perfectly understandable to humans, but makes most machines give up in confusion.
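
To see what that more extreme line is made of, the following Python sketch lists each character's code point and name, alongside its form after Unicode compatibility normalization (NFKC). The `describe` helper is a hypothetical name for this example. Whether any real search engine normalizes text this way is an assumption on my part; the sketch only shows that normalization would undo some of these substitutions and leave others alone.

```python
import unicodedata


def describe(text):
    """Return (code point, name, NFKC-folded form) for each character."""
    rows = []
    for ch in text:
        name = unicodedata.name(ch, "<unnamed>")
        folded = unicodedata.normalize("NFKC", ch)
        rows.append((f"U+{ord(ch):04X}", name, folded))
    return rows


# The first word of the heavily obfuscated line, copied from the article.
word = "Αⅼ𝚝𝚑о𝚞ɡ𝚑"
for codepoint, name, folded in describe(word):
    print(codepoint, name, "->", repr(folded))

# Compatibility characters fold back to plain Latin letters...
print(unicodedata.normalize("NFKC", "\U0001D69D"))  # mathematical monospace t
print(unicodedata.normalize("NFKC", "\u217C"))      # small roman numeral fifty
# ...but a letter from another script is left untouched.
print(unicodedata.normalize("NFKC", "\u043E") == "\u043E")  # Cyrillic o
```

On this evidence, mixing letters from other scripts with compatibility characters means that no single normalization step recovers the plain Latin text.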

[Image: dystextia2]

There are now ways around this obfuscation. Apple’s Live Text does an excellent job of recognition on both these screenshots, but going the extra mile of converting all your obfuscated text into images, then using text recognition on them, isn’t something that many will try, and it imposes a significant computational burden on the eavesdropper or crawler.

Obfuscation is of course no substitute for encryption: if the text contains secrets that you don’t want others to see at all, then you should encrypt it using a robust method. But for keeping off those who are just going to use normal text searching, it should be effective.

Almost seven years ago, I wrote a little utility for obfuscating Latin text at the two levels shown above. Dystextia is fairly basic, but runs a treat in macOS from Sierra to Sonoma. You can also use it to obfuscate shorter sections of text. While Internet domains that include non-standard characters are converted into ‘Punycode’ that makes them difficult to spoof, the rest of the URL is left in its original Unicode, thus preserving any obfuscation.
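
The difference between the domain and the rest of the URL can be sketched in Python using its built-in `idna` codec, which implements the older IDNA 2003 rules. The URL here is made up for the example, and browsers may apply newer rules, so treat this as an approximation of the behaviour described above.

```python
from urllib.parse import urlsplit

# Made-up URL for illustration: the host and the path each contain
# a Cyrillic small letter a (U+0430) in place of a Latin "a".
url = "https://ex\u0430mple.com/r\u0430ft"
parts = urlsplit(url)

# The host name is converted to its ASCII Punycode form, which makes
# the substitution obvious to anyone who looks at it.
ascii_host = parts.hostname.encode("idna").decode("ascii")
print(ascii_host)

# The path gets no such conversion here and keeps the lookalike letter.
print(parts.path)
print("\u0430" in parts.path)
```

The converted host starts with the `xn--` prefix that marks a Punycode label, while the path still contains the Cyrillic letter.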

Perhaps it’s time to see whether you can use Unicode’s code points to conceal other text in steganography.
