This is a follow-up to my previous post.
I described how sound can be digitized and put on an audio CD. In short, the numbers on the CD describe exactly how a speaker membrane needs to move in order to reproduce the sound. These numbers are physically stored as minuscule pits along a spiral track on the CD, and 1,411,200 bits need to be read every second to reproduce CD-quality sound.
This is also known as the bitrate, usually expressed for music in thousands of bits per second (kbps). Thus, the raw audio data on a CD supplies about 1,400 kbps.
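For the curious, here is where that number comes from. The sketch below assumes the standard CD audio parameters (44,100 samples per second, 16 bits per sample, two channels), which this post doesn't spell out. The function name is purely illustrative.

```python
# Standard CD audio parameters (assumed here, not stated in the post).
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BITS_PER_SAMPLE = 16      # size of each stored number
CHANNELS = 2              # stereo: left and right


def cd_bitrate_bps(sample_rate=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE, channels=CHANNELS):
    """Raw bits per second needed to play uncompressed audio."""
    return sample_rate * bits * channels


bitrate = cd_bitrate_bps()
print(bitrate)           # 1411200
print(bitrate // 1000)   # 1411, i.e. about 1,400 kbps
```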
The audio CD standard uses raw data (that is, uncompressed and unprocessed) because when the standard was being developed, it was less expensive to build consumer devices that did not need a powerful processor to decode compressed audio data.
Today, the cost of microprocessors has decreased enough that my Discman can read compressed (MP3) music, but still at a price: it consumes three times the power that it needs for standard audio CDs.
So why do we compress music? Simply for storage reasons: instead of requiring 1,400 kbps, we can get reasonably equivalent results at a tenth of the bitrate. This means that for every uncompressed album you store on your hard drive or MP3 player, you could have stored ten compressed albums.
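To put rough numbers on that, here is a small back-of-the-envelope sketch. The 45-minute album length is a made-up figure for illustration, and 140 kbps simply stands in for "a tenth of the bitrate".

```python
def audio_size_bytes(bitrate_kbps, seconds):
    """Storage needed for audio at a constant bitrate (1 kbps = 1,000 bits/s)."""
    return bitrate_kbps * 1000 * seconds // 8


ALBUM_SECONDS = 45 * 60   # hypothetical 45-minute album

raw = audio_size_bytes(1400, ALBUM_SECONDS)         # uncompressed, CD-style
compressed = audio_size_bytes(140, ALBUM_SECONDS)   # a tenth of the bitrate

print(raw)                 # 472500000 bytes
print(compressed)          # 47250000 bytes
print(raw // compressed)   # 10 compressed albums per uncompressed one
```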
So how do we compress music? First, we’ll visit another mind-blowing concept: the frequency domain.
John Cage famously composed a song, 4′33″, that consists of roughly four and a half minutes of nothing but silence. If you put this song on a CD, your stereo will still read 1,400 kbps from the disc, for a total of about 46 megabytes. All of those numbers will, however, be zero. Zero, zero, zero.
You could recreate the entire track exactly with the following statement: 273 seconds of silence. I’ve compressed the song from 46 megabytes to two dozen letters without losing any information at all.
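The trick of replacing a long run of identical numbers with "this value, that many times" has a name: run-length encoding. Here is a minimal sketch of it, using a single second of stereo silence rather than the whole track so that it runs quickly. The function names are my own, and real audio formats are far more sophisticated than this.

```python
def rle_encode(samples):
    """Collapse runs of identical values into [value, run_length] pairs."""
    runs = []
    for s in samples:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1       # same value as before: extend the run
        else:
            runs.append([s, 1])    # new value: start a new run
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original samples."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


# One second of stereo silence at the CD rate: 44,100 samples x 2 channels.
silence = [0] * (44_100 * 2)
encoded = rle_encode(silence)

print(encoded)                          # [[0, 88200]]
print(rle_decode(encoded) == silence)   # True: nothing was lost
```

Note that this only pays off when the data contains long runs. On ordinary music, where neighbouring samples almost always differ, the encoded list would be bigger than the original.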
That’s a trivial example, and here’s another. I’ve composed a song that started an infinitely long time ago and will continue forever. It’s just a single pure tone, a wave that oscillates 440 times a second and never stops. I call it Fonzie.
Fonzie can’t be stored on an audio CD because it would take an infinitely large CD. You couldn’t use the graphing device from my first post because it would take an infinite amount of paper. You could excerpt it of course, but then it’s no longer Fonzie.
On the other hand, you can graph it another way. Instead of describing the sound wave with respect to time (and wasting paper), you can lay out all the tones we can hear along the X axis, from slowly oscillating waves on the left (low-frequency or bass tones) to high-frequency waves (treble tones) on the right. At exactly 440 Hz, place a single dot at the height matching the volume at which you hear Fonzie. (440 hertz is another way of saying 440 times a second.)
Here we’ve taken an infinitely long song that goes on forever in the time domain and compressed it down to a tiny graph in the frequency domain, without losing a speck of information.
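In code, that tiny graph boils down to two numbers. The sketch below stores Fonzie as a frequency and a volume, then regenerates the time-domain samples for any excerpt on demand. The amplitude of 0.5 and the names are arbitrary choices for illustration.

```python
import math

# The whole of Fonzie in the frequency domain: one frequency, one volume.
# The amplitude value is arbitrary and chosen only for illustration.
FONZIE = {"frequency_hz": 440.0, "amplitude": 0.5}


def render_excerpt(tone, start_s, duration_s, sample_rate=44_100):
    """Return time-domain samples for any stretch of the endless tone."""
    n = round(duration_s * sample_rate)
    w = 2 * math.pi * tone["frequency_hz"]
    return [
        tone["amplitude"] * math.sin(w * (start_s + i / sample_rate))
        for i in range(n)
    ]


# Ten milliseconds of Fonzie, starting at time zero.
excerpt = render_excerpt(FONZIE, start_s=0.0, duration_s=0.01)
print(len(excerpt))   # 441 samples
```

The excerpt can start anywhere and last as long as you like. The description itself never gets any bigger.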
You’ve seen this type of graph before, in the equalizer of your stereo. The bars dance, showing the distribution of energy along the audible frequencies: a song with heavy bass shows big bars on the left. If you were to play a morsel of Fonzie on your stereo, there would be a single, unmoving bar where 440 Hz is.
Obviously, my song Fonzie is very simple, regardless of the representation. However, changing your point of view of sound to the frequency domain is one of the most important concepts in analyzing and processing sound.
My song, Fonzie, is trivial to decompose into a single sinusoidal wave. Real sounds, however, are much more complicated. Imagine that a special acoustic performance of Fonzie (abridged) is given at Carnegie Hall, interpreted by the celebrated flautist, Henry Winkler.
When Henry plays the note on his flute, the sound he makes isn’t going to be a perfect wave. The graph of his performance in the frequency domain isn’t going to be a single point, but a steep hill centered around 440 Hz. This shape defines the characteristics of the flute sound. We can also expect little hills at 880 Hz, 1320 Hz and other multiples of 440. These are “harmonics” and are characteristic of most analog musical instruments.
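Here is a rough sketch of what those harmonics look like numerically. It builds a crude "flute-like" note out of three sine waves, then probes how much of a given frequency is present. The harmonic volumes are invented for illustration, not measured from a real flute, and the sample rate is lowered to keep the example fast.

```python
import math

SAMPLE_RATE = 8_000   # lower than the CD rate to keep the example fast
N = 800               # 0.1 s of sound, so every multiple of 10 Hz fits whole cycles

# Hypothetical harmonic recipe: frequency in Hz mapped to a made-up amplitude.
PARTIALS = {440: 1.0, 880: 0.4, 1320: 0.2}

signal = [
    sum(a * math.sin(2 * math.pi * f * n / SAMPLE_RATE) for f, a in PARTIALS.items())
    for n in range(N)
]


def amplitude_at(samples, freq_hz, sample_rate=SAMPLE_RATE):
    """Amplitude of the sinusoid at freq_hz, found by correlating the
    samples with a cosine and a sine at that frequency."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / sample_rate)
             for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / sample_rate)
             for n, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)


# Probe the fundamental, a frequency that is absent, and the two harmonics.
for f in (440, 660, 880, 1320):
    print(f, round(amplitude_at(signal, f), 3))   # roughly 1.0, 0.0, 0.4, 0.2
```

A real instrument would show hills rather than exact spikes, because its note is not perfectly steady.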
My belch in the frequency domain won’t have nice and tidy spikes, which are characteristic of lovely tonal instruments. I imagine that it will be large and flat with a bit of energy in most of the audible range. This is characteristic of atonal instruments, such as percussion and distorted electric guitars.
One mathematical way to turn a signal in the time domain (such as the raw data coming off an audio CD) into a representation in the frequency domain (such as the display in the equalizer on your stereo) is to use the Fourier Transform. This formula provides a method to decompose any signal into sinusoidal waves and vice versa.
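As a minimal sketch of the idea, here is the discrete version of the Fourier Transform written out the slow, naive way and pointed at a short excerpt of a 440 Hz tone. The sample rate and excerpt length are assumptions picked so that 440 Hz falls exactly on one of the computed frequencies. Real software uses the much faster FFT algorithm instead.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform: O(N^2), fine for a small demo."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


SAMPLE_RATE = 4_400   # chosen so that 440 Hz lands exactly on a frequency bin
N = 200               # bin spacing = SAMPLE_RATE / N = 22 Hz

tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(N)]
spectrum = dft(tone)

# For a real-valued signal, only the first half of the bins are distinct.
magnitudes = [abs(c) for c in spectrum[: N // 2]]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)

print(peak_bin * SAMPLE_RATE / N)   # 440.0
```

Every bin other than the peak comes out at essentially zero, which is the single dot on the frequency graph.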
Mathematically, any signal or sound can be expressed as the sum of these pure waves. It’s somewhat of a paradox, however, since the sine wave is infinitely long and repetitive, and most sounds aren’t.
A newer branch of signal analysis proposes the decomposition of the time domain function into “wavelets” instead of sine waves. Where a sine function is like a constant wave along a long stretch of ocean, a wavelet is like ripples spreading out from a thrown rock: higher in the centre and gradually fading to nothing.
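To make the ripple picture concrete, here is a small sketch comparing a sine wave with one possible wavelet shape: a cosine squeezed under a bell curve, loosely modelled on the Morlet wavelet. The width of 5 milliseconds is an arbitrary value, and there are many other wavelet families this does not cover.

```python
import math


def sine(t, freq_hz=440.0):
    """A pure tone: the same height no matter how far out you look."""
    return math.sin(2 * math.pi * freq_hz * t)


def envelope(t, width_s=0.005):
    """Bell curve that is 1.0 at the centre and fades towards zero."""
    return math.exp(-((t / width_s) ** 2) / 2)


def wavelet(t, freq_hz=440.0, width_s=0.005):
    """A ripple: an oscillation that is strongest at t = 0 and dies away."""
    return envelope(t, width_s) * math.cos(2 * math.pi * freq_hz * t)


print(wavelet(0.0))   # 1.0 at the centre of the ripple

# The wavelet's envelope shrinks quickly as we move away from the centre.
for t in (0.0, 0.005, 0.01, 0.05):
    print(t, envelope(t))
```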
While wavelets have many interesting properties, the most important thing to take from this post is that a sound signal can be converted into another representation, and converted back without losing any information.
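The round trip is easy to try on a handful of numbers. The sketch below uses a naive discrete Fourier transform and its inverse on some made-up sample values. Floating-point arithmetic introduces rounding errors far smaller than one unit, so rounding back to whole numbers recovers the original samples.

```python
import cmath


def dft(x):
    """Time domain -> frequency domain (naive O(N^2) version)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Frequency domain -> time domain (the inverse of dft)."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


samples = [0, 3, -2, 7, 7, -5, 1, 0]   # arbitrary made-up sample values

spectrum = dft(samples)
restored = [round(c.real) for c in idft(spectrum)]

print(restored == samples)   # True
```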
The fun comes from manipulating the signal in the frequency domain and seeing or hearing the changes in the time domain!
Should I go on?