To Audio & Back Again


2023-09-12 06:43:52








If you’re anything like me, you have wondered what would happen if you used a compression algorithm
for something it was not designed for. A lot of the time, the results are boring: you just get
really large output.

However, image and audio codecs are special. These days, psychoacoustic and psychovisual codecs
are used. That is, codecs that are tuned specifically for human perception, and throw away
information that humans will not notice. A great example of this is that most audio codecs will
simply discard all audio frequencies above 20kHz. But the characteristics of the visual system
and the auditory system are very different.

So this presents two obvious ideas: what if we use an audio codec to encode imagery, or an
image codec to encode audio? Unfortunately, doing anything strange with audio tends to
produce horrible screeching, and that generally makes you want to duck and cover or run away
screaming. Imagery is a lot different; for most people, a garishly colored or otherwise corrupted
image can be viewed safely. So this page presents images that have been compressed with lossy
psychoacoustic audio codecs, and then converted back to images.

In fact, these serve as interesting visualizations of the various kinds of changes that audio
codecs make to their bitstreams. None of these are useful means of comparison (for that you
need an ABX test) but they’re at least entertaining.

Methodology

For the uninitiated, the very idea here makes no sense. How do you expect to encode an image as
audio? They’re completely different formats!

Well, on a computer, everything is at the end of the day a stream of bits, ones and zeroes.
Without the proper tagging information, you can “lie” to the computer that some data is in a
different format than what it actually is. 8 bits make up a byte, which we can represent
as a number from 0 to 255, or 00 to FF in hexadecimal. Calling it lying is overselling it a bit.
Computers don’t really “know” anything; data is data.

To oversimplify, raw imagery looks something like this (in hex), one pixel per row, with one byte
each for red, green, and blue:

FF AA 00
00 AA FF
55 66 77

Or just FFAA0000AAFF556677 as a continuous stream. These represent a gold color, a
light-blue color, and a blue-gray color respectively.
You just repeat that for as many pixels as you want to represent, so a 640×480 image is just
307,200 such triplets in a row, or 921,600 bytes. PNG, JPEG, WebP, JXL, etc. are all just different ways to encode
that without needing all that raw data.
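
As a small illustration of the paragraph above, here is a sketch that splits that hex stream into
RGB triplets and computes the raw size of an image. The function names are made up for this
example, and it assumes 8 bits per channel with no padding between rows.

```python
def pixels_from_hex(stream: str) -> list[tuple[int, int, int]]:
    """Split a hex string into (red, green, blue) triplets, one byte per channel."""
    raw = bytes.fromhex(stream)
    if len(raw) % 3 != 0:
        raise ValueError("stream length is not a whole number of RGB pixels")
    return [tuple(raw[i:i + 3]) for i in range(0, len(raw), 3)]


def raw_rgb_size(width: int, height: int) -> int:
    """Bytes needed for an uncompressed 8-bit RGB image (assumes no row padding)."""
    return width * height * 3


print(pixels_from_hex("FFAA0000AAFF556677"))
# [(255, 170, 0), (0, 170, 255), (85, 102, 119)]
print(raw_rgb_size(640, 480))
# 921600
```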

Meanwhile, audio is encoded as PCM: just a list of amplitudes, one number per sample.

For example, B8DCF4FFFDEBCA944B2008020B254E as a continuous stream is fifteen 8-bit samples.
Interpreted as colors, the same fifteen bytes would form five pixels.

So, 10 seconds of 8-bit 48kHz mono audio (fairly standard) is just… 480,000 bytes in a row. 16-bit
and 24-bit audio are generally more common these days, but 8-bit is easier to discuss so I’ll
stop there. Opus, Vorbis, MP3, AAC, etc. are all just ways to encode that in less space.
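
To make the PCM arithmetic concrete, here is a hedged sketch that computes the size of a raw PCM
stream and reads the example bytes both ways. It assumes unsigned 8-bit samples centered on 128
(the convention of the u8 format used in the pipeline at the end of this page); the helper names
are invented for the example.

```python
def pcm_size_bytes(seconds: float, sample_rate: int,
                   channels: int = 1, bytes_per_sample: int = 1) -> int:
    """Size of an uncompressed PCM stream."""
    return int(seconds * sample_rate * channels * bytes_per_sample)


def u8_to_amplitudes(raw: bytes) -> list[int]:
    """Unsigned 8-bit PCM puts silence at 128, so subtract it to get signed amplitudes."""
    return [b - 128 for b in raw]


def as_rgb(raw: bytes) -> list[tuple[int, int, int]]:
    """Read the very same bytes as RGB pixels instead."""
    return [tuple(raw[i:i + 3]) for i in range(0, len(raw) - len(raw) % 3, 3)]


samples = bytes.fromhex("B8DCF4FFFDEBCA944B2008020B254E")
print(pcm_size_bytes(10, 48000))   # 480000
print(u8_to_amplitudes(samples))   # rises to 127, then falls to -126: one wave cycle
print(as_rgb(samples))             # five pixels from the same fifteen bytes
```

Nothing about the bytes changes between the last two calls; only the interpretation does.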

Okay, long-winded diversion over. Hopefully you have some idea of what I mean now: it’s all just
bits, and bits can group up into numbers, and numbers are how audio and images both work. So, what
happens if we take a raw image stream, and give it to an audio encoder as if it were audio?

Well, this is what happens. We’ll use Bliss, the famous Windows XP wallpaper, as an example. No
particular reason, it’s just what I was playing with when I first posted about this on Mastodon
three years ago. We’ll re-interpret the raw image data as raw audio data, with 8-bit samples,
in stereo, at 48kHz.
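
The reinterpretation itself can be sketched with Python's standard wave module. This is not the
script used for this page, only an illustration: it wraps arbitrary bytes in a WAV header as
8-bit stereo at 48kHz and reads them back unchanged. The 854×480 size is taken from the pipeline
at the end of the page, and the "image" here is a synthetic byte pattern standing in for Bliss.

```python
import io
import wave

SAMPLE_RATE = 48000
CHANNELS = 2


def duration_seconds(num_bytes: int, sample_rate: int = SAMPLE_RATE,
                     channels: int = CHANNELS) -> float:
    """How long a byte stream lasts when played as 8-bit PCM."""
    return num_bytes / (sample_rate * channels)


def raw_to_wav(raw: bytes, sample_rate: int = SAMPLE_RATE,
               channels: int = CHANNELS) -> bytes:
    """Label raw bytes as unsigned 8-bit PCM; the payload itself is not modified."""
    usable = len(raw) - len(raw) % channels  # drop a trailing partial frame, if any
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(1)  # 1 byte per sample = 8-bit
        wav.setframerate(sample_rate)
        wav.writeframes(raw[:usable])
    return buf.getvalue()


def wav_to_raw(wav_bytes: bytes) -> bytes:
    """Strip the header again, recovering the original byte stream."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.readframes(wav.getnframes())


fake_image = bytes(i % 256 for i in range(854 * 480 * 3))  # stand-in for raw RGB pixels
wav_bytes = raw_to_wav(fake_image)
print(duration_seconds(len(fake_image)))
print(wav_to_raw(wav_bytes) == fake_image)
```

An 854×480 RGB image comes out at a little under 13 seconds of stereo audio. Without a lossy
codec in the middle, the round trip is exact.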

The audio generated by this process is surprisingly listenable, but still very glitchy and has
some high-pitched whines. I’ve reduced the volume a lot, but listen at your discretion.

Wait, YUV?

YUV is an alternate way to represent RGB that compresses better. Instead of red, green, and blue,
which share a lot of redundant data (e.g. gray is the same value for all three) you encode
the brightness and two color coordinates. Gray in YUV is one number followed by two zeroes.

But, well… It’s not that simple. YUV is “planar”, where RGB is interleaved. To cut a long story
short, this means it’s represented in a stream as YYYYYYUUUUUUVVVVVV instead of
RGBRGBRGBRGBRGBRGB. This makes it behave fairly differently when fed to the audio codecs.
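
The difference between the two layouts is only the order of the bytes. A minimal sketch, using
letters in place of real sample values and function names invented for this example:

```python
def interleaved_to_planar(data: bytes, channels: int = 3) -> bytes:
    """ABCABCABC -> AAABBBCCC: gather each channel into its own contiguous plane."""
    return b"".join(data[c::channels] for c in range(channels))


def planar_to_interleaved(data: bytes, channels: int = 3) -> bytes:
    """AAABBBCCC -> ABCABCABC: the inverse (length must divide evenly by channels)."""
    plane = len(data) // channels
    planes = [data[c * plane:(c + 1) * plane] for c in range(channels)]
    return bytes(value for group in zip(*planes) for value in group)


print(interleaved_to_planar(b"YUVYUVYUVYUV"))  # b'YYYYUUUUVVVV'
print(planar_to_interleaved(b"RRRRGGGGBBBB"))  # b'RGBRGBRGBRGB'
```

In the planar form, neighboring bytes belong to the same channel, so anything that smears
adjacent samples together stays within one channel.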

To make things more interesting, NV12 is an alternate form of YUV that’s “chroma subsampled”
(color data is a quarter the resolution of luminance data) and is encoded in a semi-planar
format: YYYYYYUVUVUV, roughly.
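
As a rough sketch of what that layout means for the byte count, assuming 8-bit NV12 with one U
and one V sample per 2×2 block of pixels (the function name is made up for this example):

```python
def nv12_plane_sizes(width: int, height: int) -> tuple[int, int]:
    """Return (luma_bytes, chroma_bytes) for an 8-bit NV12 frame."""
    luma = width * height                    # one Y byte per pixel
    chroma_w = (width + 1) // 2              # half resolution, rounded up
    chroma_h = (height + 1) // 2
    chroma = chroma_w * chroma_h * 2         # interleaved U and V bytes
    return luma, chroma


luma, chroma = nv12_plane_sizes(854, 480)
print(luma, chroma, luma + chroma)           # 409920 204960 614880
print(854 * 480 * 3)                         # 1229760 bytes for the same frame as raw RGB
```

So the NV12 stream is half the length of the RGB one, and the last third of it is the
interleaved UVUVUV part.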

Finally, the demo


Alright, so we’ve turned our image into audio. Now let’s just decode the audio back into raw
data, and reinterpret that data as an image again. There are a number of ways we can decide to
do this, so a few options are provided here. Fewer channels and lower sample rates are more
forgiving, as they make the data more “audio-like”, so to speak.

Some of the images are offset, like they were on a roll and were only turned part-way around.
This is due to these codecs having no standard way to remove junk data at the start of the
stream; audio codecs take a few moments to “warm up” and start emitting real data. Opus and
Vorbis encode this into the stream, so they’re perfectly aligned. MP3 and AAC are particularly
notorious for this; in the real audio world, this causes issues with “gapless playback”. If the
offset is not divisible by 3, this also causes the colors to be corrupted by exchanging the RGB
channels: if the sky is green and the ground is pink, then it has been offset two too far,
making the RGB be read as GBR. Planar YUV is less susceptible to this due to the planar format.
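
The channel swap is easy to reproduce. This sketch prepends a few junk bytes to an interleaved
RGB stream and re-reads it as pixels; the zero bytes stand in for whatever a codec emits while
warming up, and the helper names are invented for the example.

```python
def read_rgb(stream: bytes) -> list[tuple[int, int, int]]:
    """Read an interleaved stream as RGB pixels, ignoring a trailing partial pixel."""
    usable = len(stream) - len(stream) % 3
    return [tuple(stream[i:i + 3]) for i in range(0, usable, 3)]


def with_leading_junk(stream: bytes, junk_bytes: int) -> bytes:
    """Simulate codec delay by prepending zero bytes (real junk would not be zero)."""
    return bytes(junk_bytes) + stream


sky = bytes([80, 160, 255]) * 4                # four sky-blue pixels: R=80, G=160, B=255
print(read_rgb(with_leading_junk(sky, 2))[1])  # (160, 255, 80): G, B, R, so the sky reads green
print(read_rgb(with_leading_junk(sky, 3))[1])  # (80, 160, 255): shifted one pixel, colors intact
```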

The way some of the codecs turn interleaved RGB gray demonstrates their removal of high-frequency
information; in essence, the nearby pixel values get averaged together, making them gray. The
separate planes of YUV localize the corruption and smoothing to one channel, making the corruption
generally less bad. Opus has particularly bad behavior in RGB stereo here, as it constantly gets
the values not-quite-right and gets misaligned with the RGB order, introducing colorful noise.
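
To see why averaging neighbors produces gray, here is a deliberately crude stand-in for a
low-pass filter: a 3-tap moving average over the interleaved bytes. Real codecs work in the
frequency domain and do nothing this simple; the sketch only shows the direction of the effect.

```python
def box_lowpass(stream: bytes, taps: int = 3) -> bytes:
    """Replace each byte with the average of itself and its neighbors."""
    half = taps // 2
    out = bytearray()
    for i in range(len(stream)):
        window = stream[max(0, i - half):i + half + 1]
        out.append(sum(window) // len(window))
    return bytes(out)


gold = bytes([255, 170, 0]) * 4           # four gold pixels, interleaved as RGBRGB...
smoothed = box_lowpass(gold)
print(list(smoothed[3:6]))                # [141, 141, 141]: an interior pixel is now gray
```

Each interior window holds exactly one red, one green, and one blue byte, so every output value
is the mean of the three channels. Run the same filter over a planar stream and it would only
blur each channel along the row.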

In general, the better a codec is at being an audio codec, the worse it is at being an image
codec. It sounds trivial when I put it like that, but notice how the Vorbis and MP3 output looks mostly
fine, while Opus is the worst.

Part of why Opus is so interesting is that it does a high-pass in addition to a low-pass.
That is, it removes extremely low frequencies as well as extremely high ones. In YUV, this ends up
removing and corrupting slow minute changes in color, such as the sky gradient, but preserves
sharp changes such as the outline of the hill and the clouds. The other codecs here do only a
low-pass.
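
A toy high-pass shows the same pattern on a single row of brightness values. Subtracting a
moving average is only an assumed stand-in for illustration, not how Opus actually filters, but
it has the same character: slow ramps vanish and sharp steps survive.

```python
def moving_average(values: list[float], taps: int) -> list[float]:
    """Average each value with its neighbors; windows shrink at the ends."""
    half = taps // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out


def highpass(values: list[float], taps: int = 9) -> list[float]:
    """Keep only what the moving average misses: the fast changes."""
    return [v - a for v, a in zip(values, moving_average(values, taps))]


gradient = list(range(20))                # a smooth ramp, like the sky
edge = [50] * 10 + [200] * 10             # a hard step, like the outline of the hill
print(highpass(gradient)[10])             # 0.0: the ramp is gone in the interior
print(round(highpass(edge)[10], 2))       # 66.67: the step is still clearly there
```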

Bonus: This is roughly the shell pipeline used to produce the images seen above:

	
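# 1) image -> raw pixels, 2) raw bytes read as 8-bit stereo PCM and encoded,
# 3) audio decoded back to raw 8-bit PCM, 4) raw bytes read as pixels again
ffmpeg -i ~/Pictures/bliss.jxl -s 854x480 -pix_fmt "$pixfmt" -f rawvideo - \
	| ffmpeg -f u8 -ar 48000 -ac 2 -i - -f matroska -strict -2 -acodec "$codec" -ab "$bitrate" - \
	| ffmpeg -f matroska -i - -f u8 -ar 48000 -ac 2 - \
	| ffmpeg -f rawvideo -s 854x480 -pix_fmt "$pixfmt" -i - -pix_fmt rgba -frames 1 "$output"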

-strict -2 enables experimental codecs, such as FFmpeg’s built-in Vorbis
encoder. Do not use this option for general use. This snippet is the script I used to generate
the first version of this page, with only two pixel formats and no option for sample rate,
channel count, etc. The new script is… much more complex, but the essence is the same.


This page does not have advertisements. Please consider supporting me
if you enjoy my works!

You can comment on the Fediverse,
Cohost, or just
send me an email.


This page Copyright © 2023 Una Thompson (unascribed)
Bliss image is property of Microsoft Corporation. Its low-resolution usage here as a
demonstration of compression artifacts is fair use.

