Image Stacks and iPhone Racks – Building an Internet Scale Meme Search Engine

2023-01-09 14:46:43

Anyone who’s spent any amount of time on the Internet has a good idea of
how prevalent meme usage has become in online discourse. Finding fresh
memes about the latest happenings and sharing them with various friend
groups to share in the humor is a long-enjoyed pastime of mine. Working
in tech and in the InfoSec field has netted me an unsurprisingly
“terminally-online” group of friends who all do the same.

However, there’s an ironic duality to most memes: the more niche they
are, the funnier they tend to be. Some of the best memes are just
stupid in-jokes between my friend groups, or from the extremely niche
InfoSec industry.

This presented an extremely frequent problem: I could never find the niche
memes I wanted to send people when I needed them most. Mid-conversation,
spur-of-the-moment memes were always impossible to find. Scrolling
through hundreds of saved images on my phone is not efficient searching,
as it turns out, so I decided to try to better solve the problem.

Previous attempts at writing a meme search engine ultimately ran into one
core blocking issue: the lack of scalable OCR. All the existing solutions
were either extremely poor at recognizing the warped and highly-varied
text of most memes, or were prohibitively expensive.

For example, Tesseract OCR is a free open-source library for extracting
text from images. When testing with this library, it was OK at
recognizing memes with very standard fonts and color schemes:

Example easy-to-OCR meme, Tesseract result: i’m supposed to feel
refreshed after waking up from a nap but instead i end up feeling like
this

However, the remixing, watermarking, and resharing of memes makes their
format anything but standard. Take the following meme, for example:


Tesseract states that the OCR-ed text for this meme is: 30 BLUE
man41;? S4-5?'flew/ — V [IL ' . ",2; g" .'Sj /B"f;T"EArmDand [red] mvslmunlm: sawmills
. That is quite far from the actual text, as any human could tell. It
seemed that my options were either expensive cloud OCR services, or
poor-performing solutions like this.

However, one night I had a big realization while trying to send
someone an example of an old-school CAPTCHA image from my iPhone:

Accidentally selecting the obfuscated text in a previous-generation
reCAPTCHA image.

To my surprise, iOS was more than happy to highlight the
intentionally-scrambled and warped text of the CAPTCHA image. Even more
surprising, it decoded the text perfectly:

Pasting the copied reCAPTCHA text.

If it did this well with intentionally obfuscated text images, how would
it fare with the various formats that most memes come in? After testing
the OCR on a bunch of saved memes on my phone, the answer appeared to be
“extremely well”.

Better yet, after some quick Googling I found that this functionality is
exposed in the iOS Vision framework. That meant this OCR could be fully
automated in the form of a custom iOS app. Finally, it seemed there was
a scalable OCR solution to the problem I had been facing!
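
For reference, the heart of this capability boils down to a handful of Vision framework calls. The following is a minimal sketch; the helper name `ocrLines` and its structure are my own invention, not code from the actual app:

```swift
import UIKit
import Vision

// Minimal sketch: run the Vision framework's text recognizer over a UIImage
// and return the recognized lines. Synchronous, so call it off the main thread.
// The name `ocrLines` is mine, not from the actual app.
func ocrLines(from image: UIImage) -> [String] {
    guard let cgImage = image.cgImage else { return [] }

    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate    // slower, but far better on warped meme text
    request.usesLanguageCorrection = true

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])

    let observations = request.results as? [VNRecognizedTextObservation] ?? []
    // Keep only the single highest-confidence transcription of each detected line.
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```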

Though I’ve written a lot of code, I had never written anything serious
in Swift or Objective-C. I wasn’t able to find any Vision framework
plugins for Apache Cordova, so I couldn’t just write
the app in JavaScript either. It looked like it was time to bite the
bullet and write an OCR iOS server in Swift.

By combining the power of intense Googling, reverse engineering various
Swift repos on GitHub, and the occasional Xcode question to my iOS
friend, I was able to cobble together a working solution:

A very basic iOS Vision OCR server running on an iPhone.

My initial speed tests were fairly slow on my MacBook. However, once
I deployed the app to an actual iPhone, the OCR speed was extremely
promising (likely due to the Vision framework using the GPU).
I was then able to perform extremely accurate OCR on thousands of images
in no time at all, even on budget iPhone models like the 2nd-gen SE.

Overall, the API server built on top of GCDWebServer
worked fairly well but did suffer from a slight memory leak. After
20K-40K images were OCR-ed the app would usually crash, which was a
pretty big annoyance. Again, my familiarity with Swift was about on par
with a golden retriever’s understanding of finance, so debugging the
problem proved quite difficult. After investigating more “hacky” options, I
realized that I could utilize “Guided Access” on iOS to
automatically restart the app when it crashed. This essentially operated
as a daemon to ensure the OCR server would continue to serve requests,
and also guarded against other unknown crashes from corrupt images halting
the pipeline.
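
For the curious, wiring the Vision helper into an HTTP endpoint with GCDWebServer takes surprisingly little code. Here is a minimal sketch, not the app’s actual source; the /ocr route, port, and JSON response shape are my own assumptions, and `ocrLines` is the helper from the earlier snippet:

```swift
import GCDWebServer
import UIKit

// Hedged sketch of an OCR endpoint on top of GCDWebServer. The /ocr route,
// port, and response shape are assumptions, not the app's actual interface.
let server = GCDWebServer()

server.addHandler(forMethod: "POST", path: "/ocr",
                  request: GCDWebServerDataRequest.self) { request in
    guard let body = (request as? GCDWebServerDataRequest)?.data,
          let image = UIImage(data: body) else {
        return GCDWebServerDataResponse(jsonObject: ["error": "could not decode image"])
    }
    // GCDWebServer runs process blocks off the main thread, so a
    // synchronous Vision pass here is fine.
    let lines = ocrLines(from: image)   // helper from the earlier snippet
    return GCDWebServerDataResponse(jsonObject: ["text": lines.joined(separator: "\n")])
}

server.start(withPort: 8080, bonjourName: nil)
```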

With a way to properly extract the text from all meme images, the
problem became searching through a huge corpus of text quickly.
Initial testing with the Postgres Full Text Search
indexing functionality proved unusably slow at the scale of anything
over a million images, even when allocated the appropriate hardware
resources.

I decided to give ElasticSearch
a try, as it’s basically custom-built for exactly this problem. After
reading the docs, doing some early testing, and reading blog posts about
real-world usage, I came to a few conclusions about implementing it for
my use case:

  • ElasticSearch is a glutton for RAM and system resources, especially if run with multiple nodes. Having multiple nodes allows for resilience against failures when they occur, as is standard in any distributed system.

  • I could run ElasticSearch as a one-node cluster, since the combined text of even millions of memes was still relatively small by ElasticSearch’s usual scale. This would be cost-effective, but of course came at the expense of reliability.

  • Since I was using Postgres for the rest of the memes’ structured data (e.g. context, source, etc.), having the meme text stored in ElasticSearch concerned me in that it would muddy the “single source of truth” paradigm. In past experience, having to ensure that two sources of truth stay aligned can be the source of extreme complexity and headache.

  • Upon doing some searching, I found that I could utilize PGSync to automatically sync select Postgres columns to ElasticSearch. This seemed an excellent tradeoff: keep a single source of truth (Postgres) while running ElasticSearch affordably in a single-node configuration. If there was ever any data loss, I could blow ElasticSearch away and PGSync would let me easily rebuild the text search index.

  • ElasticSearch has a huge amount of configurability for text search and a full REST API, allowing me to easily integrate it into my service (a sample query is sketched after this list).
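
To give a flavor of that REST API, below is a minimal sketch of a full-text query, written in Swift only for consistency with the rest of this post. The index name (“memes”), field name (“meme_text”), and fuzziness setting are illustrative assumptions, not the site’s actual schema:

```swift
import Foundation

// Hedged sketch of a full-text query against a single-node ElasticSearch.
// The index name ("memes") and field name ("meme_text") are assumptions.
func searchMemes(matching phrase: String, completion: @escaping (Data?) -> Void) {
    var request = URLRequest(url: URL(string: "http://localhost:9200/memes/_search")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let query: [String: Any] = [
        "size": 50,
        "query": [
            "match": [
                "meme_text": ["query": phrase, "fuzziness": "AUTO"]
            ]
        ]
    ]
    request.httpBody = try? JSONSerialization.data(withJSONObject: query)

    URLSession.shared.dataTask(with: request) { data, _, _ in
        completion(data)  // raw ElasticSearch hits JSON; parse as needed
    }.resume()
}
```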

My final design looked something like the following:

Extremely haphazard diagram of the final implementation infra.

Testing with generated datasets showed that it scaled very well,
allowing searches across millions of memes in less than a second, even
on relatively modest hardware. At the time of this writing I’m able to
index and search the text of around ~17 million memes on a shared
Linode instance with only 6 cores and 16GB of RAM. This keeps the costs
relatively low, which is important for side projects if you intend on
keeping them running for any period of time.

As it turns out, memes are not only images. Many memes are now
video, complete with audio tracks as well. This is no doubt due to
improvements in mobile networks allowing quick delivery of larger files.
In some cases videos are even better than GIFs, since they have much
better compression and can thus be much smaller in size.

In order to index memes of this type, the videos had to be chopped up
into sets of screenshots which could then be OCR-ed just like regular
memes. To handle this I wrote a small microservice which does the
following (a sketch follows the list):

  • Takes an input video file.

  • Using ffmpeg (via a library), pulls out ten evenly spaced screenshots from the video.

  • Sends the screenshot files off to the iPhone OCR service.

  • Returns the result set after OCRing each screenshot from the video file.
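
Here’s a rough sketch of the frame-extraction step. It shells out to the ffmpeg and ffprobe CLIs rather than going through a library binding like the real service does, and the paths and function name are illustrative:

```swift
import Foundation

// Hedged sketch: pull N evenly spaced frames out of a video by shelling out
// to ffprobe (for the duration) and ffmpeg (for the frames). Paths are illustrative.
func extractFrames(from video: String, count: Int = 10, outDir: String = "/tmp") throws -> [String] {
    // Ask ffprobe for the clip duration, in seconds.
    let probe = Process()
    probe.executableURL = URL(fileURLWithPath: "/usr/bin/ffprobe")
    probe.arguments = ["-v", "error", "-show_entries", "format=duration",
                       "-of", "csv=p=0", video]
    let pipe = Pipe()
    probe.standardOutput = pipe
    try probe.run()
    probe.waitUntilExit()

    let raw = String(data: pipe.fileHandleForReading.readDataToEndOfFile(), encoding: .utf8) ?? ""
    guard let duration = Double(raw.trimmingCharacters(in: .whitespacesAndNewlines)),
          duration > 0 else { return [] }

    // Grab one frame at the midpoint of each of `count` evenly spaced slices.
    var frames: [String] = []
    for i in 0..<count {
        let timestamp = duration * (Double(i) + 0.5) / Double(count)
        let outPath = "\(outDir)/frame_\(i).jpg"
        let ffmpeg = Process()
        ffmpeg.executableURL = URL(fileURLWithPath: "/usr/bin/ffmpeg")
        ffmpeg.arguments = ["-y", "-ss", String(timestamp), "-i", video,
                            "-frames:v", "1", outPath]
        try ffmpeg.run()
        ffmpeg.waitUntilExit()
        frames.append(outPath)
    }
    return frames   // these then get POSTed to the iPhone OCR service
}
```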

Upgrading the iPhone OCR Service Into An OCR Cluster

Unsurprisingly, this increased the load on the OCR service
significantly: every video meme was essentially 10x the OCR work.
Despite the speed of the OCR app server, this became a major
bottleneck, so I ultimately opted to upgrade the iOS OCR service into a
cluster:


Don’t worry, there’s a fan keeping them cool.

This setup looks quite expensive because of the many iPhones in use.
However, a few things played in my favor to make this much
cheaper than you’d expect:

  • Since these are dedicated to doing OCR via the iOS Vision API, I could use older (and cheaper) iPhone models such as the iPhone SE (2nd generation).

  • I have the benefit of not caring about things such as screen cracks, scratches, and other cosmetic issues, which lowers the price even further.

  • Better yet, I don’t even want to use them as phones, so even iPhones that are IMEI-banned or locked to unpopular networks are perfectly fine for my use.

Taking all these factors into account, I was able to find iPhones
at a considerably cheaper price. For example, here’s a listing that
matches my criteria and is quite affordable:

This phone likely went for only $40 because it was locked to an
unpopular US carrier (Cricket), and thus most people wouldn’t want to be
stuck with it.

How do these costs compare to cloud OCR services anyway? GCP’s Cloud
Vision API charges you $1.50 for every thousand images you OCR. That
means that by using this homebrewed solution, we’d eclipse the cost of
the iPhone after about $40 ÷ ($1.50 per 1,000 images) ≈ 27K images. Of
course, maybe GCP’s OCR service is much better quality-wise, but in my
testing the results seemed very comparable for this use case. At the
scale of tens of millions of OCR requests, the Cloud API’s cost would
have been prohibitive for this project.

Keeping a close eye on eBay auctions, I bought up any iPhones that went
for dirt-cheap prices like this. Using an old Raspberry Pi I had around
the house, I configured it to act as an Nginx load balancer spreading
requests evenly across the iPhones (a sample config is sketched below).
After adding in some networking, and a cheap fan to keep everything
cool, I had a working OCR cluster that could easily handle much larger
demand.
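
For illustration, the load-balancing piece only needs a few lines of Nginx config on the Pi. This is a hedged sketch; the phone IPs, port, and balancing strategy are assumptions, not the actual setup:

```nginx
# Hedged sketch of the Raspberry Pi's Nginx config. The iPhone addresses
# and port are illustrative, not the real network layout.
upstream iphone_ocr {
    least_conn;                  # send each request to the least-busy phone
    server 192.168.1.101:8080;   # iPhone SE #1
    server 192.168.1.102:8080;   # iPhone SE #2
    server 192.168.1.103:8080;   # iPhone SE #3
}

server {
    listen 80;

    location / {
        proxy_pass http://iphone_ocr;
        proxy_connect_timeout 5s;           # fail over quickly if a phone has crashed
        proxy_next_upstream error timeout;  # retry the request on another phone
    }
}
```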

The final architecture looks something like the diagram above.
While there’s definitely some extra complexity due to my cost
optimizations, the cheaper infrastructure will allow me to run this side
project for much longer.

Overall, it was a fun project with the bonus of a large amount of
personal utility. I learned a lot about a variety of topics, including
ElasticSearch configuration, iOS app development, and a bit of
machine learning. In future posts I hope to elaborate on some of the
other features I’ve built for it, including things like:

  • “Search by Image”/”Image Similarity” searching at the scale of millions of memes.

  • Automated detection and labeling of NSFW memes.

  • Building out the scraping infrastructure to actually index all the memes.

If you’d like to give it a try, check out the site at https://findthatmeme.com
and let me know what you think!

@IAmMandatory


