Reading thermometer temperatures over time from a video
Natalie has been experimenting with using a microwave as a kiln for pottery, in particular for Raku.
She wanted to understand how long to run the microwave to reach the desired temperature for the Raku glazes to work.
But… her thermometer can't go in the microwave while it's running. So she needs to remove the kiln element from the microwave, attach it to the thermometer and measure how it cools over time in order to calculate its peak temperature while the microwave is running.
The tedious way to do this would be to watch the thermometer for over an hour as it cools down, making notes on the temperature at different points and then plotting these on a graph to derive the original temperature.
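The post doesn't show the curve-fitting step, but for reference: a hot object cools roughly according to Newton's law of cooling, T(t) = T_env + (T₀ − T_env)·e^(−kt), so fitting that curve to the readings lets you extrapolate back to the peak temperature. A minimal sketch of the idea using scipy – hypothetical numbers, not Natalie's actual data or method:

```python
import numpy as np
from scipy.optimize import curve_fit

# Newton's law of cooling: T(t) = t_env + (t0 - t_env) * exp(-k * t)
def cooling(t, t_env, t0, k):
    return t_env + (t0 - t_env) * np.exp(-k * t)

# Hypothetical readings: seconds since the first reading, temperature in °F
times = np.array([0, 60, 120, 180, 240, 300])
temps = np.array([830, 743, 667, 600, 542, 491])

(t_env, t0, k), _ = curve_fit(cooling, times, temps, p0=(70.0, 850.0, 0.002))

# Extrapolate back to when the element left the microwave, assuming
# (hypothetically) a 30 second delay before the first reading was taken
print(f"Estimated peak temperature: {cooling(-30, t_env, t0, k):.0f}°F")
```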
Or… how about recording a video of the thermometer and extracting the readings via OCR instead?
We rigged up an iPhone on a stand, pointed it at the digital LCD on the thermometer, and left it running for nearly two hours.
Splitting a video into frames every 10s
GPT-4 prompt:
Write a zsh script to run on macOS with ffmpeg available that takes a path to a video file and then spits out a single frame from every 10s of video in a file with a filename that reflects the time within the video
It gave me back this:
```zsh
#!/bin/zsh

# Check if input file is provided
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <input_video_file>"
    exit 1
fi

# Check if ffmpeg is installed
if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg is required but not installed. Please install ffmpeg first."
    exit 1
fi

INPUT_VIDEO="$1"
OUTPUT_DIR="frames"

# Create the output directory if it doesn't exist
mkdir -p "$OUTPUT_DIR"

# Get the duration of the video in seconds
DURATION=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$INPUT_VIDEO")
DURATION=$(printf "%.0f" "$DURATION")

# Extract a frame every 10 seconds
for ((i=0; i<=DURATION; i+=10)); do
    OUTPUT_FILENAME=$(printf "${OUTPUT_DIR}/frame_%02d:%02d:%02d.jpg" $((i/3600)) $(((i/60)%60)) $((i%60)))
    ffmpeg -ss "$i" -i "$INPUT_VIDEO" -vf "select=eq(pict_type,PICT_TYPE_I)" -vframes 1 -q:v 2 "$OUTPUT_FILENAME" -y
done

echo "Frames extracted successfully in the ${OUTPUT_DIR} directory."
```
Executed like this:

```zsh
./extract_frames.sh IMG_9684.MOV
```

This ran perfectly, without any modifications. It generated a folder full of JPG images, taken at 10s intervals from the raw video.
Each image looked like this:
Running OCR
This next step proved harder than I expected.
My first attempt used textra by Dylan Freedman. This is a fantastic little CLI tool which runs on macOS and uses the powerful VisionKit APIs on that platform to run OCR and audio extraction.
```
% textra frame_01:00:00.jpg -s
DM6801A+
```
Just one problem: it picked up the text at the top of the image, but it didn't get the output of the LCD display itself with the temperature.
Next I tried my own s3-ocr tool, which uses AWS Textract. I've had amazing results with this in the past, but while it did better it still wasn't nearly reliable enough at extracting the numeric readouts from this particular sequence of images.
I hadn't tried it myself, but I'd heard good things about Google Cloud Vision – so I gave that a go using their online demo:
This is exactly what I need!
Next challenge: how to automate it?
Using the Cloud Vision API
GPT-4 prompt:
I want to use the google cloud vision API to run OCR from a python script
First help me understand what access token credentials I need for this project and how to create them
Full transcript here. Asking it about access token credentials was a bit of a stretch: I hate figuring out how to make these, but I was quite confident that any steps it gave me here would have been out-dated by changes Google had made to their console interface since the training cut-off for GPT-4 of September 2021.
My skepticism was misplaced! It gave me step-by-step instructions which mostly worked – and gave me enough information to figure out how to get hold of a JSON file representing credentials for a service account that could call the Cloud Vision API.
The code it gave me was useful too. I fired up a Jupyter notebook and evolved it until it looked like this:
```python
import os
from google.cloud import vision
from google.oauth2 import service_account

def ocr_image(image_path, credentials_path):
    credentials = service_account.Credentials.from_service_account_file(credentials_path)
    client = vision.ImageAnnotatorClient(credentials=credentials)
    with open(image_path, 'rb') as image_file:
        content = image_file.read()
    image = vision.Image(content=content)
    response = client.text_detection(image=image)
    texts = response.text_annotations
    return texts[0].description if texts else ''

image_path = "ocr-video-frames/frames-9679/frame_00:00:30.jpg"
credentials_path = "cloud-vision-ocr-382418-fee63d63124b.json"
result = ocr_image(image_path, credentials_path)
print(result)
```
Then I wrote my own code to run it against every image in my directory:
```python
import pathlib
import sqlite_utils

root = pathlib.Path("ocr-video-frames/")
db = sqlite_utils.Database("ocr.db")

# OCR each frame and store the result keyed by its relative path
for path in root.glob("*/*.jpg"):
    relative = str(path.relative_to(root))
    text = ocr_image(path, credentials_path)
    db["results"].insert({
        "key": relative,
        "text": text
    }, pk="key")
```
This gave me a SQLite database file containing the extracted text from every one of my images.
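One caveat worth noting for a long batch run like this: the Vision API reports per-image failures on the response object rather than raising an exception, so a failed call can quietly come back as an empty string. A hedged variant of the ocr_image() function above (the ocr_image_checked name is my own) that surfaces those errors:

```python
from google.cloud import vision
from google.oauth2 import service_account

def ocr_image_checked(image_path, credentials_path):
    # Same as ocr_image(), but raises if the API reports a failure -
    # errors come back in response.error rather than as exceptions
    credentials = service_account.Credentials.from_service_account_file(credentials_path)
    client = vision.ImageAnnotatorClient(credentials=credentials)
    with open(image_path, "rb") as image_file:
        content = image_file.read()
    response = client.text_detection(image=vision.Image(content=content))
    if response.error.message:
        raise RuntimeError(f"Vision API error for {image_path}: {response.error.message}")
    texts = response.text_annotations
    return texts[0].description if texts else ""
```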
Extracting the temperatures in Datasette with a regular expression
The OCR extracted text data wasn't just the temperatures I needed. For some of the frames it looked more like this:
```
DM6801A+
180
POWER
HOLD
0.1°
F
°F
TI
THERMOMETER
TYPE-K
1.0°
°C
```
The bit I care about is the “180” – three digits with a newline before and after them.
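In Python terms the pattern looks like this – a quick illustration of what the SQL query below does (the extract_temperature() helper is hypothetical):

```python
import re

def extract_temperature(ocr_text):
    # Match exactly three digits with a newline before and after them
    match = re.search(r"\n(\d{3})\n", ocr_text)
    return match.group(1) if match else None

print(extract_temperature("DM6801A+\n180\nPOWER\nHOLD"))  # -> 180
```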
I installed the new datasette-sqlite-regex plugin by Alex Garcia:

```
datasette install datasette-sqlite-regex
```

Then I opened up my SQLite database in Datasette and constructed the following query:
```sql
select
  regex_find('[0-9]{2}:[0-9]{2}:[0-9]{2}', key) as time,
  trim(regex_find('\n[0-9]{3}\n', text), char(10)) as temperature
from
  results
where key like 'frames/%' and temperature is not null
order by
  [time]
```
The resulting table looked like this:
time | temperature
---|---
00:02:00 | 830
00:02:10 | 834
00:02:20 | 836
00:02:40 | 834
00:03:20 | 820
00:03:30 | 816
00:03:40 | 812
I used datasette-copyable to copy the data out to a Google Sheet. Here's my first attempt at charting the data, which makes it easy to spot places where the OCR got the wrong results:
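The same chart can also be produced locally straight from ocr.db – a minimal sketch using matplotlib and the standard-library sqlite3 module (my own variation, not the Google Sheets approach used above):

```python
import re
import sqlite3

import matplotlib.pyplot as plt

conn = sqlite3.connect("ocr.db")
times, temps = [], []
for key, text in conn.execute(
    "select key, text from results where key like 'frames/%' order by key"
):
    # Same extraction logic as the SQL query, using Python's re module
    time_match = re.search(r"[0-9]{2}:[0-9]{2}:[0-9]{2}", key)
    temp_match = re.search(r"\n([0-9]{3})\n", text)
    if time_match and temp_match:
        times.append(time_match.group(0))
        temps.append(int(temp_match.group(1)))

plt.plot(times, temps)
# With one frame every 10s, labelling every 12th point gives a
# readable tick every two minutes
positions = range(0, len(times), 12)
plt.xticks(positions, [times[i] for i in positions], rotation=45)
plt.xlabel("time")
plt.ylabel("temperature (°F)")
plt.tight_layout()
plt.show()
```

Outliers where the OCR misread the display show up immediately as spikes in the plotted line.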