Now Reading
Making my bookshelves clickable | James’ Espresso Weblog

Making my bookshelves clickable | James’ Espresso Weblog

2024-02-15 02:58:56

You may make areas of a picture clickable with various strategies, from overlyaing an SVG that incorporates onclick JavaScript handlers all the way in which to utilizing picture maps. I really like this concept. I began to consider how I may create a picture of my bookshelves that you might click on to be taught extra about every ebook I’m studying. This could be extra participating than a standard checklist of textual content.

I constructed a script that takes in a picture of a bookshelf and makes every ebook clickable. If you click on on a ebook, you’re taken to the Google Books web page related to the ebook. You don’t want to manually annotate any ebook or map every ebook to its title. This occurs routinely. You may try a demo of a clickable bookshelf on GitHub.

Here’s a video of my clicking via totally different books on my bookshelf:

The crimson border signifies the polygon whose contents are clickable.

On this weblog publish, I’m going to debate how I made this venture. This publish makes use of pc imaginative and prescient, however I’ll do my greatest to clarify all jargon. You should not want a pc imaginative and prescient background to get pleasure from this publish. If any particulars don’t make sense, e mail me at readers [at] jamesg [dot] weblog. If you wish to learn to use this instrument your self, refer to the project GitHub repository setup instructions.

With out additional ado, let’s get began!

The issue and resolution

The issue: I needed to make a picture of bookshelves clickable.

How may I’m going about addressing this downside? Listed here are the steps I had in thoughts once I began to work on this venture:

  1. Isolate the area of every ebook within the picture.
  2. Retrieve the title for every ebook utilizing Optical Character Recognition (OCR).
  3. Retrieve the Google Books URL for every ebook.
  4. Map every URL to its respective area.
  5. Create an SVG that may be overlaid onto a picture.

Let’s discuss via every of those steps.

Isolating ebook areas

We have to know the place books are in a picture earlier than we are able to make them clickable. We may manually annotate every ebook. I made a instrument for manually drawing polygons referred to as PolygonZone that you should use to manually annotate areas. However, I needed to make an answer that’s computerized. For that, I wanted a pc imaginative and prescient mannequin.

For this venture, I made a decision to make use of a mixture of two fashions: Grounding DINO and Section Something (SAM). The mix is known as Grounded SAM.

If you haven’t any expertise in pc imaginative and prescient, stick with me!

Grounding DINO enables you to establish objects in photos. You may give Grounding DINO a textual content immediate (i.e. “ebook backbone”) and the mannequin will attempt to establish all cases of that object in a picture. Right here is an instance of the consequence from Grounding DINO when passing via a picture of my bookshelf:

Book bounding boxes

There’s a field round (most of) the books within the picture.

That is nice! We now know the place every ebook is. However, every field is bigger than the ebook it represents. It’s because each ebook is angled within the picture. We may use these containers to make every ebook clickable, however some areas would overlap. This could be complicated and unintuitive.

We will use a segmentation mannequin to establish the actual area of every ebook. That is the place the Section Something Mannequin (SAM) is available in. We will use SAM to retrieve masks for every ebook. You may convert masks into polygons to get the define of an object.

Right here is an instance of the bookshelf processed with Grounding DINO then SAM:

Book segmentation masks

The purple areas are polygons. Should you look intently, you’ll be able to see boundaries between every ebook that aren’t in purple. This reveals our mannequin is segmenting particular person books.

There are some areas highlighted that aren’t books. These areas do not need textual content in them. Thus, GPT will be unable to search out information for them. We will solely plot polygons for which we are able to retrieve a title to make sure that solely related polygons are displayed within the output. As well as, a couple of books aren’t highlighted. This implies the mannequin we’re utilizing — a mixture of Grounding DINO and SAM — couldn’t isolate a area for the ebook. This could possibly be manually corrected utilizing a polygon annotation instrument, however just isn’t supreme. I must suppose via what resolution could be best for customers.

The method of producing masks takes ~15 seconds on an M1 Macbook Air.

Retrieving ebook titles

We now know the place our books are in a picture. Subsequent, we have to work out the title and creator of every ebook. This entails a couple of steps. First, we have to isolate every ebook. Then, we have to learn the characters on every ebook. At minimal, we should always get the title of a ebook. We may additionally get the creator identify, relying on if the creator identify is on the backbone. We will then use this data to seek for a ebook on Google Books.

Studying characters in a picture is a website referred to as Optical Character Recognition (OCR). There are various methods to do OCR, however for this venture I selected to make use of GPT-4 with Imaginative and prescient, which has been correct in lots of OCR assessments I’ve run and seen run. GPT-4 with Imaginative and prescient lets you ask questions on photos. On this case, I may request the mannequin establish the characters in every ebook picture.

Earlier than sending a picture to GPT-4 with Imaginative and prescient, I remoted the area of every ebook. I then rotated the ebook to the left by 90 levels so it might be horizontal as a substitute of vertical. This could enhance OCR efficiency. Right here is an instance of a picture despatched to GPT-4 with Imaginative and prescient:

Isolated book

On this picture, one particular ebook is remoted. We will ship this picture to GPT-4 with Imaginative and prescient to retrieve the characters on the ebook.

I used the next immediate with the picture:

Learn the textual content on the ebook backbone. Solely say the ebook cowl title and creator if you could find them. Say the ebook that’s most distinguished. Return the format Making my bookshelves clickable | James’ Espresso Weblog , with no punctuation.

Right here is an instance response:

The Poetry Pharmacy Returns William Sieghart

With this data, we are able to search for the ebook with the Google Books API. The Google Books search API makes use of the next syntax:

See Also


https://www.googleapis.com/books/v1/volumes?q={ebook}

You may add any textual content within the {ebook} part above. On this script, we ship the ebook identify and, if accessible, the creator identify. I did not separate them out. Together with each items of knowledge appeared to work properly.

This API returns a number of items of details about a ebook. For my script, I gathered the:

  • Creator identify
  • ISBN
  • Google Books URL

Right here is an instance books URL for the Google Books itemizing URL for the ebook matching The Poetry Pharmacy Returns William Sieghart:


https://play.google.com/retailer/books/particulars?id=vdOXDwAAQBAJ&supply=gbs_api

This complete course of — calling the GPT-4 with Imaginative and prescient and Google Books API — takes a couple of seconds per ebook.

Create a clickable SVG

All the data collected with GPT-4 with Imaginative and prescient and the Google Books search API is related to every ebook and area within the picture. Every masks — the shape returned by Section Something — is transformed to a polygon so it may be utilized in an SVG that I can overlay over my picture. Utilizing these polygons, I can generate a HTML file with two elements:

  1. The supply picture, and;
  2. An SVG file that may be overlaid over the picture.

The SVG can embrace JavaScript. For this venture, I’ve an onclick handler that opens the Google Books URL related to every ebook.

I generate a HTML file with an SVG. Within the HTML file, I embed my supply picture and overlay the SVG. The SVG makes use of polygons to symbolize every ebook area. Every polygon usually has dozens of factors. The onclick handler redirects the consumer to the corresponding Google Books web page when a ebook is clicked. Here’s a screenshot of the ensuing web page (the books aren’t clickable as a result of it is a screenshot):

Result

You can try the demo — and click the books! — on GitHub pages.

Conclusion and Subsequent Steps

My system to make clickable bookshelves is designed to be autonomous. It is best to have the ability to add an arbitrary bookshelf and generate clickable areas as above. With that stated, there are limitations. If a ebook title is difficult to learn, the GPT-4 with Imaginative and prescient API might wrestle to run OCR. Thus, you will be unable to affiliate a area with a ebook URL. If a ebook just isn’t on Google Books, you would wish to make use of one other URL. In a single take a look at, a ebook URL was completely improper as a result of the ebook wasn’t accessible on Google Books.

In fact, Google Books might be swapped with any supply. When you have a weblog, the supply of URLs could possibly be your weblog. You may make every ebook clickable and take the consumer to your evaluate of the ebook.

There are a couple of enhancements I bear in mind that I want to make:

  1. Use EfficientSAM, a sooner model of SAM.
  2. Make the polygons look nicer.
  3. Possibly add a handbook correction system so if the system cannot learn the textual content of a ebook I can repair it.

When you have questions on this venture, e mail me at readers [at] jamesg [dot] weblog.

Tagged in IndieWeb.

Source Link

What's Your Reaction?
Excited
0
Happy
0
In Love
0
Not Sure
0
Silly
0
View Comments (0)

Leave a Reply

Your email address will not be published.

2022 Blinking Robots.
WordPress by Doejo

Scroll To Top