The Inqualified Scientist: Frequency Domain

February 1st, 2005

This is a follow-up to my previous post.

I described how sound can be digitized and put on an audio CD. In short, the numbers on the CD describe exactly how a speaker membrane needs to move in order to reproduce the sound. These numbers are physically stored as minuscule pits burned along a spiral track around the CD, and 1,411,200 bits of audio data need to be read every second in order to reproduce CD-quality sound.

This is also known as the bitrate, usually expressed for music in thousands of bits per second (kbps). Thus, the raw audio data on a CD supplies about 1,400 kbps.
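
For anyone wondering where 1,411,200 comes from, here is a quick Python sketch of the arithmetic. It assumes the standard audio CD parameters (44,100 samples per second, 16 bits per sample, two channels), which are not spelled out in this post.

```python
# Assumed standard audio CD parameters (not stated in this post).
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BITS_PER_SAMPLE = 16      # size of each number
CHANNELS = 2              # stereo: left and right

bits_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS
kbps = bits_per_second / 1000

print(bits_per_second)  # 1411200
print(kbps)             # 1411.2
```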

The audio CD standard uses raw data (which means uncompressed and untreated) because when the standard was being developed, it was less expensive to build consumer devices without a powerful processor to decode compressed audio data.

Today, the cost of microprocessors has decreased enough that my Discman can read compressed (MP3) music, but still at a price: it consumes three times the power it needs to read standard audio CDs.

So why do we compress music? Simply for storage reasons: instead of requiring 1,400 kbps, we can get reasonably equivalent results at a tenth of the bitrate. This means that for every uncompressed album that you store on your hard drive or MP3 player, you could have stored ten compressed albums.

So how do we compress music? First, we’ll visit another mindblowing concept — the frequency domain.

John Cage famously composed a piece, a little over four and a half minutes long, with nothing but silence. If you put this piece on a CD, your stereo will continue to read 1,400 kbps from the disc, for a total of 46 megabytes. All of those numbers will, however, be zero. Zero, zero, zero.

You could recreate the entire track exactly with the following statement: 274 seconds of silence. I’ve compressed the song from 46 megabytes to two dozen letters without losing any information at all.
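
Here is a minimal Python sketch of that idea. It checks the 46 megabyte figure and then squeezes a run of zeros into a single pair of numbers using run-length encoding. The helper names `run_length_encode` and `run_length_decode` are made up for this illustration, and the figure of 88,200 samples per second assumes standard stereo CD audio.

```python
BYTES_PER_SECOND = 1_411_200 // 8   # 176,400 bytes of raw CD audio per second
DURATION_S = 274

raw_size = BYTES_PER_SECOND * DURATION_S
print(raw_size)                     # 48333600 bytes
print(round(raw_size / 2**20, 1))   # 46.1 megabytes (of 2**20 bytes each)

def run_length_encode(data):
    """Collapse runs of equal values into [value, count] pairs."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs

def run_length_decode(runs):
    """Expand [value, count] pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]

one_second = [0] * 88_200           # one second of silent stereo samples (assumed layout)
encoded = run_length_encode(one_second)
print(encoded)                      # [[0, 88200]]
print(run_length_decode(encoded) == one_second)  # True: nothing was lost
```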

That’s a trivial example, and here’s another. I’ve composed a song that started an infinitely long time ago and will continue forever. It’s just a single pure tone, a wave that oscillates 440 times a second and never stops. I call it Fonzie.

Fonzie can’t be stored on an audio CD because it would take an infinitely large CD. You couldn’t use the graphing device from my first post because it would take an infinite amount of paper. You could excerpt it of course, but then it’s no longer Fonzie.

On the other hand, you can graph it another way. Instead of describing the sound wave with respect to time (and wasting paper), you can put all the possible tones that we can hear along the X axis, with the waves that oscillate slowly on the left (low frequency or bass tones) going to high frequency waves (treble tones) on the right. At exactly 440 Hz, place a single dot at the volume you hear Fonzie. (440 hertz is another way of saying 440 times a second.)

Here we’ve taken an infinitely long song that goes on forever in the time domain and compressed it down to a tiny graph in the frequency domain, without losing a speck of information.
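
A computer can't hold all of Fonzie either, so the Python sketch below cheats and works on a short excerpt. It uses a naive discrete Fourier transform written from scratch; the 8,000 Hz sample rate and the 0.1 second length are assumptions picked so that 440 Hz lands exactly on one frequency slot. Treat it as an illustration rather than how real audio software does it.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed, to keep the example small)
N = 800              # 0.1 s excerpt, so each frequency slot is 10 Hz wide

# A short excerpt of a pure 440 Hz tone in the time domain.
signal = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(N)]

def dft_magnitudes(x):
    """Naive discrete Fourier transform; magnitudes from 0 Hz up to half the sample rate."""
    n_samples = len(x)
    mags = []
    for k in range(n_samples // 2):
        total = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n in range(n_samples))
        mags.append(abs(total) / n_samples)
    return mags

mags = dft_magnitudes(signal)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
peak_hz = peak_bin * SAMPLE_RATE / N
print(peak_hz)   # 440.0

# Every other slot is empty apart from floating-point rounding noise.
others = [m for k, m in enumerate(mags) if k != peak_bin]
print(max(others) < 1e-9)   # True
```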

You’ve seen this type of graph before, in the equalizer of your stereo. The bars dance, showing the distribution of energy along the audible frequencies: a song with heavy bass shows big bars on the left. If you were to play a morsel of Fonzie on your stereo, there would be a single, unmoving bar where 440 Hz is.

Obviously, my song Fonzie is very simple, regardless of the representation. However, changing your point of view of sound to the frequency domain is one of the most important concepts in analyzing and treating sound.

My song, Fonzie, is trivial to decompose into a single sinusoidal wave. Real noises, however, are much more complicated. Imagine that a special acoustic performance of Fonzie (abridged) is given at Carnegie Hall, interpreted by the celebrated flautist, Henry Winkler.

When Henry plays the note on his flute, the sound he makes isn’t going to be a perfect wave. The graph of his performance in the frequency domain isn’t going to be a single point, but a steep hill centered around 440 Hz. This shape defines the characteristics of the flute sound. We can also expect little hills at 880 Hz, 1320 Hz and other multiples of 440. These are “harmonics” and are characteristic of most analog musical instruments.
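
As a rough Python sketch of harmonics, the snippet below builds a fake "flute" note out of three pure tones and then measures how much of each frequency is present. The amplitudes in `PARTIALS` are invented for the example, and a real flute would give hills rather than the perfect spikes produced here. The helper `amplitude_at` is a made-up name; it simply correlates the signal with a sinusoid at the requested frequency.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
N = 800              # 0.1 s, so each test frequency fits a whole number of cycles

# Hypothetical harmonic strengths: fundamental plus two overtones.
PARTIALS = {440: 1.0, 880: 0.5, 1320: 0.25}

note = [sum(amp * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
            for freq, amp in PARTIALS.items())
        for n in range(N)]

def amplitude_at(x, freq_hz):
    """Estimate the amplitude of the sinusoid at freq_hz contained in x."""
    total = sum(x[n] * cmath.exp(-2j * math.pi * freq_hz * n / SAMPLE_RATE)
                for n in range(len(x)))
    return 2 * abs(total) / len(x)

for freq in (440, 660, 880, 1320):
    print(freq, round(amplitude_at(note, freq), 3))
# 440, 880 and 1320 Hz come back with the amplitudes set above;
# 660 Hz, which is not a multiple of 440, comes back as (essentially) zero.
```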

My belch in the frequency domain won’t have nice and tidy spikes, which are characteristic of lovely tonal instruments. I imagine that it will be large and flat with a bit of energy in most of the audible range. This is characteristic of atonal instruments, such as percussion and distorted electric guitars.
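
To see the difference between "spiky" and "flat", here is a small Python sketch that compares a pure tone with random noise. The noise is only a stand-in for a belch (an assumption, since no belch was recorded for this post), and `energy_share_of_biggest_bin` is a made-up helper that reports how much of the total energy sits in the single strongest frequency slot.

```python
import cmath
import math
import random

random.seed(0)   # fixed seed so the noise is the same on every run
N = 256

def energy_share_of_biggest_bin(x):
    """Fraction of the spectrum's energy held by its strongest frequency slot."""
    n = len(x)
    energies = []
    for k in range(1, n // 2):
        total = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        energies.append(abs(total) ** 2)
    return max(energies) / sum(energies)

tone = [math.sin(2 * math.pi * 16 * i / N) for i in range(N)]   # 16 cycles in the window
noise = [random.gauss(0, 1) for _ in range(N)]                  # stand-in for a belch

print(energy_share_of_biggest_bin(tone) > 0.99)   # True: one spike holds everything
print(energy_share_of_biggest_bin(noise) < 0.2)   # True: energy is spread around
```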

One mathematical way to turn a signal in the time domain (such as the raw data coming off an audio CD) into a representation in the frequency domain (such as the display in the equalizer on your stereo) is to use the Fourier Transform. This formula provides a method to decompose any signal into sinusoidal waves and vice versa.
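
For reference, one common way of writing the transform and its inverse is shown below. Here x(t) is the signal in the time domain and X(f) is its representation in the frequency domain; other texts use slightly different conventions for the signs and scaling factors, so take this as one standard form rather than the only one.

```latex
X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-i 2\pi f t}\, dt
\qquad\qquad
x(t) = \int_{-\infty}^{\infty} X(f)\, e^{\,i 2\pi f t}\, df
```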

Mathematically, any signal or sound can be expressed as the sum of these pure waves. It’s somewhat of a paradox, however, since the sine wave is infinitely long and repetitive, and most sounds aren’t.

A newer branch of signal analysis proposes the decomposition of the time domain function into “wavelets” instead of sine waves. Where a sine function is like a constant wave along a long stretch of ocean, a wavelet is like ripples spreading out from a thrown rock: higher in the centre and gradually fading to nothing.
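
The ocean-versus-ripple picture can be sketched in a few lines of Python. The `wavelet` function below is a simplified, Morlet-style shape (a cosine multiplied by a bell curve), chosen here purely as an illustration; it is only one of many wavelet families, and the `freq` and `width` parameters are assumptions of this sketch.

```python
import math

def sine(t, freq=1.0):
    """A pure wave: the same height no matter how far out you go."""
    return math.sin(2 * math.pi * freq * t)

def wavelet(t, freq=1.0, width=1.0):
    """A simplified Morlet-style wavelet: a wave squeezed under a bell curve."""
    return math.cos(2 * math.pi * freq * t) * math.exp(-t * t / (2 * width * width))

# Compare a crest near the centre with a crest five cycles away.
print(abs(sine(0.25)), abs(sine(5.25)))      # both crests are (about) 1.0
print(abs(wavelet(0.0)), abs(wavelet(5.0)))  # 1.0 at the centre, nearly 0 far away
```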

While wavelets have many interesting properties, the most important thing to take from this post is that a sound signal can be converted into another representation, and converted back without losing any information.
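
Here is a minimal Python sketch of that round trip, using a naive discrete Fourier transform and its inverse written from scratch. The eight sample values are arbitrary made-up numbers. On a real computer the result matches the original up to tiny floating-point rounding errors, which is why the check uses a tolerance instead of exact equality.

```python
import cmath
import math

def dft(x):
    """Time domain -> frequency domain (naive discrete Fourier transform)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    """Frequency domain -> time domain (inverse of dft above)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

samples = [0.0, 0.8, -0.3, 0.5, 1.0, -0.9, 0.2, -0.4]   # arbitrary example signal
restored = inverse_dft(dft(samples))

max_error = max(abs(a - b) for a, b in zip(samples, restored))
print(max_error < 1e-12)   # True: the round trip gives the signal back
```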

The fun comes from manipulating the signal in the frequency domain and seeing or hearing the changes in the time domain!
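
As a taste of that kind of manipulation, the Python sketch below mixes a 440 Hz tone with a 2,000 Hz tone, erases everything above 1,000 Hz in the frequency domain, and converts back. All of the numbers (sample rate, length, cutoff, the second tone) are invented for the example, and the transform is a naive one written from scratch.

```python
import cmath
import math

SAMPLE_RATE = 8000           # samples per second (assumed)
N = 400                      # 0.05 s of sound
BIN_HZ = SAMPLE_RATE / N     # each frequency slot is 20 Hz wide
CUTOFF_HZ = 1000             # hypothetical cutoff: keep only what lies below

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

low = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(N)]
high = [0.5 * math.sin(2 * math.pi * 2000 * t / SAMPLE_RATE) for t in range(N)]
mixed = [a + b for a, b in zip(low, high)]

spectrum = dft(mixed)
for k in range(N):
    # Slots in the upper half mirror the lower half, so measure from the nearer end.
    freq = min(k, N - k) * BIN_HZ
    if freq > CUTOFF_HZ:
        spectrum[k] = 0

filtered = inverse_dft(spectrum)
max_error = max(abs(a - b) for a, b in zip(filtered, low))
print(max_error < 1e-9)   # True: only the 440 Hz tone is left
```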

Should I go on?

Categories: Technology
  1. February 1st, 2005 at 15:34 | #1

    Dr. Inqualified, your lectures are most illuminating. We would like to hear more (although some of this is over my head).

    I still want to understand exactly what information we are getting rid of when we compress a song from 1,400 kbps to something less, say 192 kbps. I’ve heard it explained before, but it was done in a “simplified for morons” kind of way.

  2. February 1st, 2005 at 18:26 | #2

    I think that the sound of chicken (around 600Hz) is the best.

  3. February 1st, 2005 at 22:51 | #3

    I’m not an expert like the Inqualified Scientist, but I can happily summarize some information I read on the Internet here.

    As TinFoiled stated, the logic of why mp3s sound as good as CDs boils down to compression, specifically two types: one lossy, one lossless.

    Lossless compression? What’s that? Well consider a zip file on your computer. The zip of a file is EXACTLY the same information as the original – just way smaller. An mp3 uses similar methods to achieve some reduction in file size.

    Lossy compression?
    Well – lossless compression on its own isn’t quite good enough – so there are additional decisions made. Mp3s rely on the fact that humans hear a certain way: they remove frequencies beyond our hearing capacity, and they also remove some quieter noises in the background of louder noises (which we wouldn’t hear). The amount of lossy compression is determined by the final bitrate. 64Kbps = lots and lots of lossy compression, while 192Kbps = not very much lossy compression (so it sounds good).

    But why is 192 great and 128 terrible? No reason. It just turned out that way. So why does the world use 128 more than 192? Cause they’re dumb? 192 rocks!

    Does that make sense?

  4. February 2nd, 2005 at 03:13 | #4

    Thank you Doctor Skraba – if you are up for it, I’d love you to continue in your lucid elucidation of all things of the frequent domain.

    To Gned and others who are humming and hawing over which codec to use, and at which bitrate, and did I really notice the hi-hat missing on that last song, here is a range of tests done from different slants:
    http://ff123.net/128test/interim.html
    http://www.xiph.org/ogg/vorbis/listen.html
    http://ekei.com/audio/

    And a good overall page is here:
    http://ff123.net/

    And, again, please Ryan continue, sir, continue!
