Visual DJ Software

Visual DJ


Visual DJ was a project that I intended to implement on my own. I started learning about audio programming, and in general was all gung ho about doing it. I had a lot of ideas about what the marriage of digital technology and DJing could do.

Nobody really seems to have gotten it right yet, as far as I'm concerned, mostly because they haven't combined good algorithms with an easy UI.

I gave up on writing Visual DJ myself. Why? Well, it just seemed like too much work for too little payoff. But I put up this webpage in the hope that it may influence other software designers to implement a few of these ideas in their own products. If this page influences your design, all I ask in return is that you provide me with a lifetime of free licenses to the software (smile).

This document is a little rough. I'd like to add some more to it eventually, such as illustrations and so on. We'll see if it happens!

Description of the problem

Do you know very much about DJing? A basic description of the task would be:

High level:

Select from a set of songs a sequence of music that "sounds good" together, and produces the desired effect in the audience. It often "tells a story".

Low level:

You have two turntables, and a box in the middle which can mix the sounds from these turntables. While turntable #1 is playing, place a record on turntable #2, and

  1. match the speed of the two songs.
  2. match up the structure of the two songs so that when they overlap, they sound good together.
  3. gradually or quickly mix the two songs together, bringing in record #2 and eventually fading out record #1.

If you look at the structure of a musical piece (especially electronic music), most of it follows very clear underlying patterns, generally referred to as musical phrases, of 8, 16, or 24 measures.

If we represent measures as # and a switch in the pattern as |, then

record #1 ########|########|################|########|########|
record #2                                    ########|########|########|########
          (record #1 is playing)             (both records)    (record #2)
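
As a rough sketch of how the diagram above could be read by a program: the snippet below splits a "#" and "|" pattern into phrase lengths and works out the measure at which each phrase begins. The function names are made up for illustration, and the conversion to seconds assumes four beats per measure and a constant tempo, which the diagram itself does not state.

```python
def phrase_lengths(pattern: str) -> list[int]:
    """Split a '#'/'|' diagram into phrase lengths, counted in measures."""
    return [len(phrase) for phrase in pattern.strip().split("|") if phrase]


def phrase_starts(pattern: str) -> list[int]:
    """Return the measure number at which each phrase begins."""
    starts, position = [], 0
    for length in phrase_lengths(pattern):
        starts.append(position)
        position += length
    return starts


def measures_to_seconds(measures: int, bpm: float, beats_per_measure: int = 4) -> float:
    """Convert a measure count to seconds (assumes a fixed tempo)."""
    return measures * beats_per_measure * 60.0 / bpm


# Build a pattern shaped like record #1 in the diagram: 8, 8, 16, 8, 8 measures.
record_1 = "|".join("#" * n for n in (8, 8, 16, 8, 8)) + "|"

print(phrase_lengths(record_1))  # [8, 8, 16, 8, 8]
print(phrase_starts(record_1))   # [0, 8, 16, 32, 40]

# In the diagram, record #2 enters where the fourth phrase of record #1 begins.
print(measures_to_seconds(32, bpm=120.0))  # 64.0
```

The useful output is the list of phrase starts: those are the only places where bringing in record #2 lines the two structures up.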

The problem with vinyl

Here are the problems I see with the way DJs currently work (on vinyl):

Problem #1: making two records go the same speed

The primary skill they work on for about the first year is "beat matching", or making the two records go the same speed and have the beats hit at the same time.

This is essentially hand-to-ear coordination. It is a surprisingly difficult task to master.

The input device is a pitch control on both turntables, which adjusts the pitch of the record from -8% to +8%. You can also directly manipulate the record, by moving it forwards or backwards. This gives you both a position control and velocity control. As you can imagine, controlling the velocity is difficult.

Let's say the record you are trying to match is around 129.3 beats per minute (bpm). You will move the pitch on the second record back and forth, until you zero in on the correct pitch. (An interesting side project would be to find out if this works like Fitts' law. Does it work the same way for audio? It seems to be an acquired skill, with better DJs able to do it much quicker than novices.)
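
To make the arithmetic concrete, here is a minimal sketch of the pitch setting a computer would have to find. It assumes the pitch control scales playback speed (and therefore BPM) linearly. The 126.0 bpm record is a made-up example, and the function names are hypothetical.

```python
def pitch_percent(source_bpm: float, target_bpm: float) -> float:
    """Percent speed change that brings source_bpm up or down to target_bpm."""
    return (target_bpm / source_bpm - 1.0) * 100.0


def can_match(source_bpm: float, target_bpm: float, pitch_range: float = 8.0) -> bool:
    """True if the required change fits within a +/- pitch_range percent control."""
    return abs(pitch_percent(source_bpm, target_bpm)) <= pitch_range


# Hypothetical second record at 126.0 bpm, matched to the 129.3 bpm record.
print(round(pitch_percent(126.0, 129.3), 2))  # 2.62
print(can_match(126.0, 129.3))                # True
print(can_match(115.0, 129.3))                # False: needs more than +8%
```

The DJ finds this number by ear, nudging the slider back and forth; the computer can compute it directly once both tempos are known.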

I've observed that DJs tend to get very bogged down in the technical details of their craft, and as a result are not able to concentrate on the aesthetic portions. It simply requires too much brain power to focus on the technical details and the aesthetic details simultaneously.

They are concentrating on how to do the task, rather than on the task itself. Sounds like something from Bill Buxton's Chunking and Phrasing paper?

Problem #2: phrasing and music structure

After the first year, most DJs will have figured out how to match beats and make two records go the same speed. What they have usually not mastered is phrasing, which means knowing how the structure of songs works and matching the phrases of the two records together.

If a DJ understands phrasing, they will know how to add in a new song when it is appropriate. Because most electronic music has similar phrasing, it allows the DJ to layer music and have the resulting effect sound correct.

Phrasing requires an understanding of the structure of the music, and using it requires a deep memory of the structure of thousands of songs already played.

This means that phrasing is something that some DJs don't learn for several years. Precocious DJs will, and they will stand out because their performances will sound better. But again, this is an example of an area where DJs spend so much time mastering a technical skill that they're unable to focus on the task rather than the mechanics of the task.

Surprisingly, nobody seems to have developed a notation system for the structure of the music. At least there is not a system that DJs use.

Visual DJing

My project is to have a computer handle as much of the technical aspects of DJing as possible, leaving the DJ to concentrate on the task rather than the mechanics of the task.

Essentially, the task is reduced to two steps:

  1. marking up the music
  2. dragging (perhaps) and dropping to mix records

Marking up the music

This is the more interesting problem area, which I don't think has been covered very well elsewhere.

In this stage, the DJ marks up the music, showing as much useful information about the song as possible. In addition, the computer attempts to mark up any part of the music it can by algorithmic means. For example, there are probably algorithms for determining the BPM for a song, and the placement of the beats.

Here are the types of markup that might be useful:

  • Beat patterns (perhaps similar to what drummers use)
  • Location of changes in the music (phrasing). I don't believe anybody has developed a concise notation for musical phrasing, although I've been working on it.
  • BPM (you could teach the computer by tapping the shift key for a few measures, or use one of the algorithms that determine it automatically, which vary in reliability)
  • Visualizations of where the highs and lows are in the music, patterns in the music, etc. This portion would require some creativity, because we're attempting to make this as meaningful and yet as concise as possible. This is immensely useful for a DJ, because they often cut out the highs or lows while they are mixing two records together. Yet I haven't seen a really good way of visualizing the sounds over time. Mostly what I've seen in music programs is just the frequencies mapped against time, which gives too much information and does not display it in a meaningful way.

Here is a good website for music and sound visualization
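
The tap-the-key idea for BPM in the list above can be sketched in a few lines. This assumes the program has recorded a timestamp (in seconds) for each tap, one tap per beat. Using the median interval is my own choice here, so that a single mistimed tap does not throw off the estimate, and the function name is hypothetical.

```python
from statistics import median


def bpm_from_taps(tap_times: list[float]) -> float:
    """Estimate BPM from the timestamps (in seconds) of taps made on each beat."""
    if len(tap_times) < 2:
        raise ValueError("need at least two taps to measure an interval")
    intervals = [later - earlier for earlier, later in zip(tap_times, tap_times[1:])]
    # The median ignores the occasional early or late tap.
    return 60.0 / median(intervals)


# Eight taps exactly half a second apart.
steady_taps = [i * 0.5 for i in range(8)]
print(bpm_from_taps(steady_taps))  # 120.0
```

A tapped estimate like this is only a starting point; the exact tempo and the placement of each beat would still need refining against the audio itself.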


Here is the way I currently see it working, although I am probably going to change the design quite a bit:

Out of the entire collection of a DJ's music, there is a set of music which sounds good with the current song that is playing.

The DJ is shown a list of music that he has identified as sounding good with the current song (song #1-3). He can also choose any other song by name from his collection.

Information on the structure of the current song, and songs that would go well with the current song, is displayed on the screen.
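
One possible shape for the information behind such a screen, as a sketch only: each song carries its markup plus the titles the DJ has flagged as sounding good with it. The class, its field names, and the song titles are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SongMarkup:
    title: str
    bpm: float
    phrase_lengths: list[int]  # phrase lengths in measures, in playing order
    goes_well_with: set[str] = field(default_factory=set)  # titles chosen by the DJ


def suggestions(current: SongMarkup, library: list[SongMarkup]) -> list[SongMarkup]:
    """Songs from the library that the DJ has marked as matching the current song."""
    return [song for song in library if song.title in current.goes_well_with]


# Hypothetical three-song collection.
library = [
    SongMarkup("Song A", 129.3, [8, 8, 16, 8, 8], {"Song B"}),
    SongMarkup("Song B", 126.0, [8, 8, 8, 8]),
    SongMarkup("Song C", 140.0, [16, 16]),
]

print([song.title for song in suggestions(library[0], library)])  # ['Song B']
```

The DJ could still pick any other song by name; the suggestion list only narrows down what is shown first.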

The DJ can see whether the beat structure matches up, because the markup shows where the beats are. They can also see any other visualization we're able to represent meaningfully. (What can be done meaningfully is a good research question.)

The DJ just drags the desired song to the location indicated on the current song, and then they're able to mix the two records.

The computer handles the task of beat matching, and the phrasing is taken care of because the music markup reveals all the structure that phrasing requires. When the DJ drags the second song onto the first, it jumps to spots in the song that represent phrase changes.
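
The jump to phrase changes could be as simple as snapping the drop position to the nearest marked phrase start. A minimal sketch, assuming positions are measured in measures and the phrase starts come from the markup step; the function name and the example values are hypothetical.

```python
def snap_to_phrase(drop_measure: float, phrase_starts: list[int]) -> int:
    """Return the marked phrase start closest to where the song was dropped."""
    if not phrase_starts:
        raise ValueError("the song has no phrase markup")
    # On an exact tie, min() keeps the earlier phrase start.
    return min(phrase_starts, key=lambda start: abs(start - drop_measure))


# Phrase starts for a song with phrases of 8, 8, 16, 8 and 8 measures.
starts = [0, 8, 16, 32, 40]

print(snap_to_phrase(30.2, starts))  # 32
print(snap_to_phrase(11.0, starts))  # 8
```

With snapping like this, a sloppy drag still lands the incoming song on a phrase boundary, which is exactly the part that takes DJs years to learn by ear.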

The DJ thus has a far easier task than at present. Instead of trying to master hand-to-ear coordination with beat matching, they go through a simple markup process.

They transform the task from how to do it to what to do to sound good.


A Guide to Bird Songs, by Aretas A. Saunders. 1951, Doubleday & Company, Garden City, New York.

The songs of birds vary according to five characters. These are time, pitch, loudness, quality, and phonetics. By combination of these characters we get form, rhythm, accent, and other attributes.

Musical notation describes pitch, time, and loudness, and to such a record notes on quality and phonetics may be added.

I have devised a method by which the time, pitch, and loudness of bird songs may be recorded as accurately as by musical notation, and at the same time more quickly and without the aid of detailed knowledge of musical symbols.

Very clever: it uses lines to represent the sounds.
Thickness of lines represents loudness.
Position vertically represents pitch.
Position horizontally represents time.
Gaps represent rhythm.
Wavy lines represent trills.
Curves represent slurs.
Loops represent liquid consonants.
Quality of song is represented in text.