
It occurred to me the other day as I was scrubbing through an audio track and writing down the time stamps for major transitions, “There has got to be an easier way to do this.” When I first started playing around with audio responsive visuals, I simply relied on the FFT data and nothing more. This was fine for a while but soon I started to want a bit more control.
I knew that accurate beat detection was beyond my grasp and as far as I can tell, it is beyond the grasp of even MIT PhDs. Echo Nest was mentioned recently and I gave it a whirl. It offers info like pitch data, beat estimation, timbre, loudness, and a ton of other features that I don’t fully understand. But the beat detection didn’t seem much more accurate than what I and many others on the Processing forum have been able to achieve. I am not trying to poop on their work, which is substantial and seems to shine in their audio editing examples involving restoration and looping. More than anything, their work proved to me that if MIT PhDs can’t make beat detection work to a level of accuracy that would satisfy me, then perhaps I should stop trying. I mean really… where is the beat here?

Once you add vocals to the equation, the beat often gets obscured to a point where it is nearly impossible for folk like me to find it. I imagine I would have better luck if I looked into predictive beat detection, but my mind just isn’t up for that particular task.
Strangely, a new level of awesomeness came as a result of this decision. When working on the Advanced Beauty video piece, I wrote down about two dozen transition points where something big needed to happen. This was a bit of a chore. Actually, this was a horribly annoying chore. And it was one that I repeated a few times afterwards. It resulted in incredibly messy code which took the form of:
if( currentTime > 15.3 && currentTime <= 15.3334 ){
  // do this thing
} else if( currentTime > 37.9 && currentTime <= 37.9334 ){
  // do this thing
} else if ...
I know, I know. It's horrible. And all the while I knew what I should have been doing, but it would have required taking a step back and doing something a bit tedious. But it was finally time. I had had enough of the horrible multi-page-long if-statements.

In theory, it's simple enough. The execution though… that was a bit of a head scratcher and was probably the reason I put it off for so long. It was a mathematics mess. What I needed to make was an application that would allow me to scrub through an audio track and mark transitions with key presses. Then I would just save out the cue points and pop them into my next visualizer. Voila!
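For anyone wondering what the visualizer side of this might look like once the cue points are saved, here is a minimal sketch. It is not the code from the actual project: the `CueList` class, its `advance` method, and the 30 fps loop are names and values made up for illustration, and it assumes the cue times are sorted and that playback only moves forward. The two cue times are the ones from the if-statement above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical cue list: fires each cue once, when playback time passes it. */
final class CueList {
    private final double[] times;   // cue times in seconds, assumed ascending
    private final String[] labels;  // what should happen at each cue
    private int next = 0;           // index of the next cue that has not fired yet

    CueList(double[] times, String[] labels) {
        if (times.length != labels.length) {
            throw new IllegalArgumentException("times and labels must have the same length");
        }
        this.times = times.clone();
        this.labels = labels.clone();
    }

    /** Call once per rendered frame; returns the cues reached since the previous call. */
    List<String> advance(double currentTime) {
        List<String> fired = new ArrayList<>();
        while (next < times.length && times[next] <= currentTime) {
            fired.add(labels[next]);
            next++;
        }
        return fired;
    }

    public static void main(String[] args) {
        CueList cues = new CueList(
            new double[] {15.3, 37.9},
            new String[] {"first transition", "second transition"});
        double frameDuration = 1.0 / 30.0;  // assumed 30 fps render loop
        for (int frame = 0; frame < 30 * 40; frame++) {
            double t = frame * frameDuration;
            for (String label : cues.advance(t)) {
                System.out.println(label + " at frame " + frame);
            }
        }
    }
}
```

Because a cue fires as soon as the clock passes it, there is no need for the one-frame-wide time windows, and a cue cannot be skipped if a frame lands slightly late. Scrubbing backwards would need the `next` index to be reset, which this sketch does not do.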
What I hadn’t anticipated was the confusion with the variable names. Here is a summary. Since I was dealing with video, I had to refer to FRAMERATE. The FRAMERATE is the number of FRAMES per second. The audio file uses a SAMPLE object. The SAMPLE has a SAMPLERATE. The SAMPLERATE is the number of FRAMES or SAMPLES every second. A common SAMPLERATE is 44100 which means the audio has 44100 SAMPLES per second, but the SAMPLES can also be called FRAMES. DAMN!
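To make the naming problem concrete, here is a small sketch of the ratio math with the two kinds of "frame" kept apart by name. It is only an illustration, not the code from my application: `FrameMath` and its method names are made up, and the 30 fps video rate is an assumption (the 44100 sample rate is the one mentioned above).

```java
/** Hypothetical helpers for converting between video frames and audio sample frames. */
final class FrameMath {
    /** Index of the audio sample that plays at the start of the given video frame. */
    static long videoFrameToSample(long videoFrame, double frameRate, int sampleRate) {
        return Math.round(videoFrame * (double) sampleRate / frameRate);
    }

    /** Index of the video frame that is on screen while the given audio sample plays. */
    static long sampleToVideoFrame(long sampleIndex, double frameRate, int sampleRate) {
        return (long) Math.floor(sampleIndex * frameRate / sampleRate);
    }

    /** Position of an audio sample in seconds. */
    static double sampleToSeconds(long sampleIndex, int sampleRate) {
        return sampleIndex / (double) sampleRate;
    }

    public static void main(String[] args) {
        double frameRate = 30.0;  // assumed video frame rate
        int sampleRate = 44100;   // audio samples per second
        long sample = videoFrameToSample(459, frameRate, sampleRate);
        System.out.println("video frame 459 starts at audio sample " + sample);
        System.out.println("which is " + sampleToSeconds(sample, sampleRate) + " seconds in");
    }
}
```

At these rates each video frame covers 44100 / 30 = 1470 audio samples, which is the ratio that everything else hangs on.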
After a bunch of variable renaming and a mess of ratio problems, I had my application. But as I was working on it, I realized I was limiting myself. I could do much more than just punch in major transitions. I could also punch in the beat! Manual beat analysis! Should I even try it?

And try it I did, and I was surprised to find it was actually quite easy. I use the number keys to assign different cue points. For this first piece, I used ‘1’ for the snare, ‘2’ for the bass, and ‘3’ for the whistling. Simple enough. It required a couple of passes but it took no longer than 10 minutes. Since I was also able to slow down, stop, and reverse the playback, it was easy to go back and correct some badly timed key presses.
Then another thought: Why not do the vocals too? Why not indeed. So as I was punching the ‘4’ key every time Goldfrapp sang a syllable, it seemed that I might as well go one more step and add the lyric data. A quick Google search got me the actual lyrics of the song. I went back through and used the ‘5’ key to add extended cue zones: whenever she was singing, I held down ‘5’. I also reentered the ‘4’ data so that it marked the start of each new word instead of every syllable, as it had the first time through. Combining the two let me sync the lyric data with the visuals. Whee! All in all, it took less than half an hour. Compared to my older process, which could take over a day and still gave really weak beat detection, this was a huge step forward.
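Here is one way the ‘4’ and ‘5’ data could be combined, as a rough sketch rather than the code I actually used. The `LyricSync` class and `wordAt` method are hypothetical, the times and the placeholder words in `main` are invented for the example, and it assumes the word onsets are sorted in ascending order. The idea: a word is shown only while the playback time sits inside a held-key zone, and the word shown is the most recent one whose onset falls inside that same zone.

```java
/** Hypothetical lyric lookup built from word onsets ('4' key) and held zones ('5' key). */
final class LyricSync {
    /**
     * Returns the word to display at time t (seconds), or null when nothing is being sung.
     * onsets[i] is the time words[i] starts; zoneStarts/zoneEnds are the held-key spans.
     */
    static String wordAt(double t, double[] onsets, String[] words,
                         double[] zoneStarts, double[] zoneEnds) {
        for (int z = 0; z < zoneStarts.length; z++) {
            if (t >= zoneStarts[z] && t < zoneEnds[z]) {
                String current = null;
                // Onsets are ascending, so the last match is the most recent word.
                for (int i = 0; i < onsets.length; i++) {
                    if (onsets[i] >= zoneStarts[z] && onsets[i] <= t) {
                        current = words[i];
                    }
                }
                return current;
            }
        }
        return null;  // not inside any zone: the vocals are silent
    }

    public static void main(String[] args) {
        double[] onsets = {1.0, 1.5, 4.0};               // made-up '4' key presses
        String[] words = {"word1", "word2", "word3"};     // placeholders, not the real lyrics
        double[] zoneStarts = {0.9, 3.9};                 // made-up '5' key press times
        double[] zoneEnds = {2.0, 5.0};                   // made-up '5' key release times
        for (double t = 0.0; t < 5.0; t += 0.5) {
            System.out.println(t + " -> " + wordAt(t, onsets, words, zoneStarts, zoneEnds));
        }
    }
}
```

Keeping the zones separate from the onsets means a word disappears when the key is released instead of lingering until the next word starts.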

Audio is by Goldfrapp (“Lovely Head” from her first album). There are a couple of places where the text failed to show. This is a bug in my code and is being addressed. Sadly, the word ‘Frankenstein’ appears for but a frame near the end of the song. Sad!
Solar, with lyrics. from flight404 on Vimeo.
Click ^ to go to the HD version.

Have you looked at the Serato software package? I’m a DJ and it provides a lot of visual data other than just the waveform, as you talked about, with specific color coding for bass / treble / mid sound data, so that a DJ can isolate the beat within the waveform.
You can sort of see what’s going on here:
http://www.youtube.com/watch?v=oE17Fq9uRHI
Awesome work! I’m not too familiar with the research that you’re referring to at MIT for beat detection, but has anyone used Sparse-Time-Frequency-Analysis to make beat detection software? It seems as though it’d make the problem a lot easier by allowing you/the software to deconvolve which spikes in amplitude to actually focus on for marking.
http://www.pnas.org/cgi/content/abstract/103/16/6094?maxtoshow=&HITS=10&hits=10&RESULTFORMAT=&fulltext=magnasco+frequency&searchid=1&FIRSTINDEX=0&resourcetype=HWCIT
just a random thought. take care and keep it up!
I usually don’t post …
probably never
but now, I can’t just watch this and pass
man, this is great !!
really great !!
Jaw. Drops. Open.
This is really awesome work.
It should get its own gallery installation.
Wow! Stunning stuff! The addition of the lyrics elevates this way above the previous version, and the song itself is one of my absolute favourites. Thanks for sharing.
BTW: Goldfrapp are a band, with members Alison Goldfrapp and Will Gregory.
thank you very much for allowing me to see this
Where can I buy a copy on bluray/download a 720p version?! I’m serious!
Flight404: You said you were too lazy to look up references. Here is one link, with a lot of related work. Hundreds of papers, in fact:
http://www.ismir.net/proceedings/
And while I am not claiming that the following reference is the only or even necessarily the best work on music signal segmentation, let me point you to some of my own work that looks at this issue:
http://ismir2005.ismir.net/proceedings/1038.pdf
In this work, my co-author and I used a combination of beat tracking and harmonic analysis to automatically label and segment the various sections of songs (see Section 6, for example). We found that it generally performed quite well, with pleasing results. This is extremely similar to the notion of using beat tracking plus lyrics — in fact, harmonic progression changes often coincided with lyrical changes (e.g. chorus, verse, etc.)
But you are correct; implementing this did take a lot of effort, with some pretty serious mathematics.
Other authors who have explored similar notions, if you have further interest, include but are not limited to Christopher Raphael and David Temperley. Take a look at some of Temperley’s work. It would be very interesting to play around with visualizations for some of his “metrical levels” computations. In other words, he goes beyond just looking at the beat, to automatically analyzing the time signature of a piece (4/4, 3/4, 6/8, 11/16, etc). Then, you not only know the beat/tactus of a piece, but you know *which* beat is the downbeat, *which* beat is the upbeat, etc. That knowledge could give you very cool visualizations.
geee this is important… gosh too many things to do
geee this is important… gosh I’d love to do another video with the song of a friend of mine. do you plan to release a tool 4 beginners sometime???
Flight404: Radiohead music video and Solar…
Weird Fishes: Arpeggi from flight404 on Vimeo.
Radiohead opened a video contest in March, which I forgot to mention in my last post about them. Anyway, Robert Hodgin of Flight404 created the video above, for the song Weird Fishes, entirely …
I know at least that Thomas said it earlier, but I’ll mention it again..
I’m a musician, DJ, and producer so I stare at graphical representations of sound all the time, be it samples of voice, or drums, or complex songs. I would be hard pressed to identify a single beat in most songs from the waveform. As I understand it, waveforms are basically just functions of intensity (loudness).
On the other hand, I generally use a Spectrographic representation of the sound which *very* clearly shows beats, snares and hats can be deciphered from the visual cacophony as well. Certain speech and syllables can be distinguished by someone with a learned eye. It would be easier for a program to do if it knew the BPM at the beginning.
The best example of automatic BPM and beat detection I’ve seen is in Traktor, a DJ studio application by Native Instruments. It’s usually dead-on.
My email is available to you Flight404/Robert.
use a combo of transient (time-domain), fft (freq-domain), and filterbanks. triangulate and “fuzzy-logic” confidence between the three and you can get amazing results most of the time. i did some playing in MaxMSP but not yet Processing. the problem is all the damn compression, har. most people here don’t understand that you are trying to pull the instruments out of a stereo mix. it’s not the bpm as that rarely changes these days…
off to check out that Analyze API…
cheers you are an inspiration.
I don’t know if this has been posted or not because, as much as you don’t like math, I don’t like reading strangers’ ramblings. But as far as lyrics making beat detection near impossible, why not get the karaoke version or the instrumentals?
Also, bands like NIN are now releasing multi-track songs. Some of them even encourage you to remix their stuff.
Love the Goldfrapp video though. Unbelievable work.
this stuff is amazing i must say, i haven’t any idea about processing but it looks amazing.
i downloaded the software and can just about manage to make a damn pyramid … what next ? haha
This is really awesome work. HD version is better.
Wow! I just got into Processing and I didn’t know you could do anything like that? Do you have source code for this or any of your other great pieces (namely magnetic ink, solar, etc)?
Thanks!
[...] Solar, with lyrics is another such piece; the HD version is on vime.com. [...]
[...] loved this song for a long time, but combined with Processing it’s out of this [...]
[...] animation produced by flight404, in which the band Goldfrapp (album “Lovely Head”) presents the song Solar. It is an animation that combines [...]
I had long forgotten about (never tried it either) Processing. Today I was reading an article about it and got here. Mesmerizing visualization. Amazing work.
[...] // Quoted from “Processing: A Programming Handbook for Visual Designers and Artists”. Processing homepage. Both videos are from flight404. [...]
[...] Here is an excellent example by Robert Hodgin of Flight404. [...]
[...] Clip made in Processing by one of the guys from the Barbarian Group for a Goldfrapp song (“Lovely Head” from the first album). More about it on the flight404 blog… [...]
[...] all manner of distractions » Blog Archive » Solar, with lyrics (tags: cool math audio graphics visualization animation processing visualisation lyrics) [...]
[...] on a way to integrate lyrics into the visualizers. Read about my process here… flight404.com/blog/?p=111 Posted by Mate » Animation, Musica, [...]