A Perceptually Significant Audio Visualizer | by David Lu
Audioscope: What you see is what you hear.
I pay a lot of attention to details in sound. I wanted to be able to see these details, and also point them out while describing sounds to people. Unfortunately, most audio visualizers don’t reveal these details.
So I created Audioscope, and made a video and soundtrack to demonstrate how some of these fine sonic details are made visible and obvious:
tl;dr it turns sine waves into circles
Technically: The y axis is the raw audio signal, and the x axis is the signal filtered such that every frequency is phase shifted by 90˚.
Here’s the visual explanation:
Sound Is Made of Sine Waves
We can decompose signals/sound waves into sine waves/pure frequency components. These components have an amplitude and phase.
By summing them together, we can get the original signal back.
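A minimal sketch of that round trip, using a naive discrete Fourier transform written out by hand. The toy signal, the names `dft` and `idft`, and the buffer length of 64 are assumptions chosen for illustration; this is not Audioscope's code.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform: one complex coefficient per frequency."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(coeffs):
    """Inverse transform: sum the components back into a signal."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

# A toy signal: two sine waves with different amplitudes and phases.
N = 64
signal = [
    1.0 * math.sin(2 * math.pi * 3 * t / N)
    + 0.5 * math.sin(2 * math.pi * 7 * t / N + 1.0)
    for t in range(N)
]

coeffs = dft(signal)
# Amplitude and phase of each component (the factor 2 applies to bins 1 and up).
amplitudes = [2 * abs(c) / N for c in coeffs[: N // 2]]
phases = [cmath.phase(c) for c in coeffs[: N // 2]]
# Summing every component again recovers the original samples.
rebuilt = [z.real for z in idft(coeffs)]
```

Here `amplitudes[3]` and `amplitudes[7]` come out as the two amplitudes that went in, and `rebuilt` matches `signal` up to rounding error.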
Sine Waves Are Made of Circles
We can get a sine wave by tracing out a circle and plotting the y axis.
We can do the same using the x axis.
These two waves are the same, except they’re 90˚ out of phase.
If we put a sine wave on the y axis and combine it with a 90˚ phase-shifted version on the x axis, we trace out a circle.
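As a quick check of the circle claim, here is a small sketch that builds both waves and measures the distance of every point from the origin. The function name `circle_points` and its parameters are hypothetical, and shifting forward by 90˚ (rather than backward) is an arbitrary choice; the other direction gives the same circle traced the opposite way.

```python
import math

def circle_points(amplitude, cycles, samples):
    """Pair a sine wave (y) with a 90-degree-shifted copy of itself (x)."""
    points = []
    for i in range(samples):
        phase = 2 * math.pi * cycles * i / samples
        y = amplitude * math.sin(phase)                # the sine wave
        x = amplitude * math.sin(phase + math.pi / 2)  # same wave, shifted by 90 degrees
        points.append((x, y))
    return points

points = circle_points(amplitude=0.8, cycles=1, samples=100)
# Every point sits at the same distance from the origin: the amplitude.
radii = [math.hypot(x, y) for x, y in points]
```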
If we make phase/time a dimension on the z axis, we trace out a helix.
Helices turn out to be a mathematically simple way of working with signals. In my opinion, it’s also a more natural way of interpreting audio signals. Given a pure sine wave sound, while converting it to a helix requires you to add an imaginary component to the signal, the resulting helix is more representative of the purity of the sound, because the radius/magnitude is constant.
But let’s keep time in the time dimension and keep the visuals two-dimensional.
Turning Waves into Circles
Because we can decompose signals into component sine waves, and convert sine waves into circles, we can convert every component sine wave of a signal into a circle, and represent the signal as a sum of circles, where the y axis is the original signal, and the x axis is the signal with every component sine wave phase shifted by 90˚.
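One way to sketch this for a short buffer is to do the phase shift in the frequency domain: take a DFT, rotate every positive-frequency component by 90˚ (and mirror that for the negative frequencies), and transform back. This is the textbook Hilbert transform applied to one whole buffer, not the streaming FIR filter described later in this post, and the names `phase_shift_90` and `beam` are assumptions made for illustration.

```python
import cmath
import math

def dft(signal):
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def phase_shift_90(signal):
    """Shift every sine component by 90 degrees (a Hilbert transform via the DFT)."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            spectrum[k] = 0        # DC and Nyquist have no 90-degree partner
        elif k < n / 2:
            spectrum[k] *= -1j     # positive frequencies: lag by 90 degrees
        else:
            spectrum[k] *= 1j      # negative frequencies: the mirror image
    return [z.real for z in idft(spectrum)]

def beam(signal):
    """(x, y) points: y is the raw signal, x is the phase-shifted copy."""
    return list(zip(phase_shift_90(signal), signal))

N = 64
pure = [0.7 * math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
pure_radii = [math.hypot(x, y) for x, y in beam(pure)]   # constant: a circle

mixed = [math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 7 * t / N + 1.0)
         for t in range(N)]
mixed_shifted = phase_shift_90(mixed)   # each sine becomes a negative cosine
```

For the pure tone, every radius equals the amplitude 0.7. For the two-tone signal, each component is shifted on its own, so the beam is the sum of two circles.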
This results in a visualization of the signal that’s one part real and one part imaginary, but also perceptually significant:
- Loud sounds have big shapes, and quiet sounds have small shapes. Near silence is a dot in the middle, and pure silence is a plain black screen.
- A pure sine wave is just a circle, where the radius corresponds to amplitude.
- Purer sounds are very round because they’re made of only a few sine waves.
- Brighter sounds end up looking spiky because they have many frequency components and also digital sound has limited resolution/is “pixelated”.
- Percussive/transient sounds flash on the screen because these signals are very short.
- Sustained tones create sustained shapes because tones are periodic signals: their repeating parts have the same shape, and these shapes keep getting traced out over and over.
- Multiple tones in perfect harmony also have sustained shapes because perfect harmony means the frequencies are in integer ratios of each other. In other words, the combination of these periodic signals is also a periodic signal.
- Multiple tones in imperfect harmony have shifting/vibrating shapes because of something to do with interference and beating: the combined signal just isn’t periodic, so the same shape doesn’t get repeated. Also, most music uses imperfect harmony, so whenever there are multiple tones it’s probably gonna look messy. Sorry, this deserves a dedicated post.
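The two harmony bullets can be made concrete with a little arithmetic. For tones at whole-number frequencies, the sum repeats at the greatest common divisor of the two frequencies. In this sketch the function names are hypothetical, and 659 Hz is a rounded stand-in for an equal-tempered fifth above 440 Hz (about 659.26 Hz, whose ratio to 440 is irrational, so the true sum never repeats exactly).

```python
import math

def repeat_rate_hz(f1_hz, f2_hz):
    """How many times per second the sum of two whole-number-frequency tones repeats."""
    return math.gcd(f1_hz, f2_hz)

def two_tones(t, f1_hz, f2_hz):
    """Sum of two unit-amplitude sine waves at time t (seconds)."""
    return math.sin(2 * math.pi * f1_hz * t) + math.sin(2 * math.pi * f2_hz * t)

# A perfect fifth with an exact 3:2 ratio: the shape repeats 220 times per second.
perfect = repeat_rate_hz(440, 660)
# The rounded tempered fifth: the shape only repeats once per second,
# so from one cycle of the 440 Hz tone to the next it visibly drifts.
tempered = repeat_rate_hz(440, 659)
```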
Thickness, Hue, and Saturation
The beam of the Audioscope visualizer has variable thickness and color. These things are more subtle and unpredictable, but if you’re curious and comfortable with more math, read on.
- Thickness: inversely proportional to speed.
- Hue: instantaneous pitch, derived from angular velocity.
- Saturation: inversely proportional to amount of noise.
The thickness decreases as the beam moves faster. This causes high-frequency sounds or loud sounds to look thinner.
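A rough sketch of the thickness rule, assuming speed is measured as the distance between consecutive beam points. The names `beam_thickness`, `base_thickness`, and `min_speed` (a guard against dividing by zero when the beam stands still) are hypothetical and not taken from Audioscope.

```python
import math

def beam_thickness(points, base_thickness=1.0, min_speed=1e-6):
    """Thickness of each segment between consecutive (x, y) samples."""
    thickness = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        speed = math.hypot(x1 - x0, y1 - y0)   # distance travelled in one sample
        thickness.append(base_thickness / max(speed, min_speed))
    return thickness

def circle(amplitude, radians_per_sample, count):
    """Beam points for a pure sine wave of the given amplitude and frequency."""
    return [(amplitude * math.cos(radians_per_sample * i),
             amplitude * math.sin(radians_per_sample * i)) for i in range(count)]

quiet_low = beam_thickness(circle(0.5, 0.05, 50))
quiet_high = beam_thickness(circle(0.5, 0.10, 50))   # higher frequency: faster, thinner
loud_low = beam_thickness(circle(1.0, 0.05, 50))     # louder: bigger circle, faster, thinner
```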
The color is even more complicated. Using the HSV color space:
- Hue relates to pitch (more technically, pitch class). Pitch is circular, and hue is circular, so this is a natural mapping to make.
- Saturation corresponds to amount of noise, where: more noise → more white, less noise → purer colors.
- Value is maxed out because I want only the brightest colors.
At a high level: the hue of the color roughly corresponds to the pitch of the locally largest frequency component. If we’re dealing with pure sine waves, it directly corresponds to the pitch of the sine wave. This means, if a 440Hz (A4) sine wave is purple, 220Hz (A3) and 880Hz (A5) are also purple. A sine wave going from 440Hz to 880Hz would start at purple, cycle through every color of the rainbow, and end up at purple.
pitch ≈ log_2(frequency)
pitch class ≈ pitch mod 1
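The two formulas translate almost directly into code. In this sketch, `reference_hz` is a hypothetical offset that decides which hue a given note lands on; the actual offset used by Audioscope is not stated here, so the specific colors are illustrative only. The conversion to RGB uses Python's standard `colorsys` module.

```python
import colorsys
import math

def hue_from_frequency(frequency_hz, reference_hz=440.0):
    """Pitch class in [0, 1), used directly as an HSV hue."""
    pitch = math.log2(frequency_hz / reference_hz)   # pitch in octaves
    return pitch % 1.0                               # pitch class: drop the octave

def beam_color(frequency_hz, saturation=1.0):
    """RGB color for a frequency; value is always maxed out."""
    return colorsys.hsv_to_rgb(hue_from_frequency(frequency_hz), saturation, 1.0)

# Octaves of the same note share a hue.
a3, a4, a5 = (hue_from_frequency(f) for f in (220.0, 440.0, 880.0))
# Half an octave up lands on the opposite side of the color wheel.
tritone = hue_from_frequency(440.0 * math.sqrt(2))
```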
Technically: At a given point in the beam, we have the angular velocity ω (how fast the beam is turning at that point) (this is distinct from instantaneous frequency). For a pure sine wave, ω corresponds to frequency; if the beam turns twice as fast, the frequency doubles. Interpreting ω as frequency, we can use the above formula to convert it to something analogous to pitch (class), and use the result as the hue of the color at this point.
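One possible reading of "how fast the beam is turning" is the change in the beam's direction of travel from one sample to the next. The sketch below follows that interpretation, which is an assumption on my part: it treats each point as a complex number, takes differences to get the direction of travel, and measures the angle between consecutive directions. The function name and the 48 kHz sample rate are hypothetical.

```python
import cmath
import math

def angular_velocity(points):
    """Turning rate in radians per sample, one value per three consecutive points."""
    z = [complex(x, y) for x, y in points]
    velocity = [b - a for a, b in zip(z, z[1:])]      # direction of travel
    # Angle between consecutive directions; the sign gives the turning direction.
    return [cmath.phase(v1 * v0.conjugate()) for v0, v1 in zip(velocity, velocity[1:])]

sample_rate_hz = 48000
frequency_hz = 440
step = 2 * math.pi * frequency_hz / sample_rate_hz
points = [(math.cos(step * n), math.sin(step * n)) for n in range(200)]

omega = angular_velocity(points)
# For a pure sine wave, the turning rate converts straight back to the frequency.
estimated_hz = [w * sample_rate_hz / (2 * math.pi) for w in omega]
```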
Even more technically: For small values of ω, the effects of noise are much more prominent, so there’s actually a filtering step at the end that basically gets the average hue and amount of noise. However, this type of noise isn’t directly related to noise in the signal; it’s related to the amount of noise in the angular velocity over time. Well, it should be related, but the current formula needs improvement.
After all this, the colors only have apparent meaning in exceptional cases (pure frequencies). But it does make for nice rainbows that entirely depend on the sound.
Filter Design and Implementation
I’m able to describe the concept of phase shifting every frequency by 90˚ while avoiding heavy mathematics. But actually creating the filter that does this for arbitrary signals requires domain knowledge. This is for those who are familiar with digital signal processing.
I created a generator for an FIR filter that removes all negative frequencies and also DC and Nyquist. I could have just used the plain Hilbert transform, but I wanted to make sure that, for the lower transition band, the magnitude of the real part decreases roughly in step with the magnitude of the imaginary part, and similarly for the transition band near Nyquist, so that the results will be as circular as possible (as opposed to having vertically oriented ellipses). Low frequencies are very important in digital music.
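For reference, here is a sketch of the plain alternative mentioned above: a textbook FIR Hilbert transformer, windowed with a Hamming window, paired with a simple delay on the raw signal. It is not the generator described in this post, since it leaves the real part untouched instead of shaping it to match the imaginary part. The names `hilbert_fir` and `beam_points` and the 63-tap length are assumptions.

```python
import math

def hilbert_fir(num_taps):
    """Hamming-windowed FIR Hilbert transformer (the 90-degree-shifted branch)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    half = num_taps // 2
    taps = []
    for i in range(num_taps):
        n = i - half
        # Ideal response: 2 / (pi * n) at odd n, zero at even n (including n = 0).
        ideal = 2.0 / (math.pi * n) if n % 2 else 0.0
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def beam_points(signal, taps):
    """(x, y) points: x is the filtered signal, y the raw signal delayed to match."""
    half = len(taps) // 2
    points = []
    for n in range(len(taps) - 1, len(signal)):
        x = sum(taps[i] * signal[n - i] for i in range(len(taps)))
        y = signal[n - half]   # the filter delays its output by `half` samples
        points.append((x, y))
    return points

taps = hilbert_fir(63)
mid_tone = [math.sin(0.4 * math.pi * n) for n in range(300)]
low_tone = [math.sin(0.02 * n) for n in range(700)]
mid_beam = beam_points(mid_tone, taps)   # close to a unit circle
low_beam = beam_points(low_tone, taps)   # x shrinks: a tall, thin ellipse
```

With this plain design, the mid-frequency tone traces an almost perfect circle, while the low tone falls inside the filter's transition band, where x is attenuated and y is not. That is the vertically oriented ellipse mentioned above.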
Rust is still relatively new and it seems no one has implemented an efficient convolution using the FFT yet, so I just implemented overlap-save on the spot, and made the filter length as large as possible (and also odd) depending on the FFT size. I generated the impulse response for a bandpass filter with the real part removing DC and Nyquist and the imaginary part being the Hilbert transform, and windowed it with a Hamming window.
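For readers who haven't met overlap-save, here is a compact sketch in Python rather than Rust, with a small recursive radix-2 FFT so it has no dependencies. The function names, the block layout, and the choice to return complex output (so that a complex filter could supply both axes in one pass) are assumptions for illustration, not a port of the Audioscope code.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def overlap_save(signal, taps, fft_size):
    """Filter `signal` with FIR `taps` block by block; returns complex samples."""
    m = len(taps)
    hop = fft_size - (m - 1)          # new samples consumed per block
    if fft_size & (fft_size - 1) or hop <= 0:
        raise ValueError("fft_size must be a power of two and at least len(taps)")
    filter_spectrum = fft(list(taps) + [0.0] * (fft_size - m))
    padded = [0.0] * (m - 1) + list(signal)   # history for the first block
    out = []
    for start in range(0, len(signal), hop):
        block = padded[start:start + fft_size]
        block += [0.0] * (fft_size - len(block))
        product = [a * b for a, b in zip(fft(block), filter_spectrum)]
        result = fft(product, inverse=True)
        # The first m - 1 outputs are corrupted by circular wrap-around: discard them.
        out.extend(v / fft_size for v in result[m - 1:])
    return out[:len(signal)]
```

Each block reuses the last `len(taps) - 1` input samples of the previous block, which is why a longer filter leaves fewer new samples per FFT.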
It was an option to use a pair of IIR filters that used less memory and had a better magnitude response, but I saw the group delays for the lows and felt they were unacceptably long for an application that needs to be as responsive as possible. Also, I wasn’t okay with the idea of non-linear phase, which I imagine would break the integrity of the waveform.
As for the filters for getting the hue and saturation: I just implemented my own biquad lowpass (as in, I copypasted the formula). As I mentioned, I think there’s room for improvement. Currently, I take the angular velocity, take the logarithm of it, and then filter it, because my reasoning was that taking the log would cause the noise to be amplified and the filter would more strongly remove it. But isn’t there some invariance in that ordering? idk I didn’t want to think too much about math tbh and also was too rushed to really take a look at the waveform and spectrum of the angular velocity BUT IT WOULD BE NICE
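For context, a biquad lowpass looks roughly like the sketch below. The coefficients follow the widely circulated Audio EQ Cookbook by Robert Bristow-Johnson; whether that is the formula that was copypasted here is an assumption, as are the class name, the cutoff, and the default Q.

```python
import math

class BiquadLowpass:
    """Second-order lowpass (Audio EQ Cookbook coefficients, direct form I)."""

    def __init__(self, cutoff_hz, sample_rate_hz, q=math.sqrt(0.5)):
        w0 = 2 * math.pi * cutoff_hz / sample_rate_hz
        alpha = math.sin(w0) / (2 * q)
        cos_w0 = math.cos(w0)
        a0 = 1 + alpha                       # normalize everything by a0
        self.b0 = (1 - cos_w0) / 2 / a0
        self.b1 = (1 - cos_w0) / a0
        self.b2 = (1 - cos_w0) / 2 / a0
        self.a1 = -2 * cos_w0 / a0
        self.a2 = (1 - alpha) / a0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def process(self, x):
        """Filter one sample and return the smoothed value."""
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y

# Smoothing the log of a jittery angular velocity (made-up values, in Hz).
smoother = BiquadLowpass(cutoff_hz=10.0, sample_rate_hz=1000.0)
jittery = [440.0 * (1.05 if n % 2 else 0.95) for n in range(2000)]
smoothed_log = [smoother.process(math.log2(w)) for w in jittery]
```

The filter passes a constant input through unchanged and removes a sample-to-sample alternation entirely, so `smoothed_log` settles near the average of the two log values.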