Surround-Sound Mixing

It’s been 15 years now since I took the plunge and invested in a complete Dynaudio Air 5.1 surround sound monitor system. At the time, I thought that surround sound on DVD-Audio and SACD discs was going to take off among music listeners. Unfortunately, it didn’t happen. Back then, I could walk into Tower Records or the Virgin Mega store in Union Square, NYC and find DVD-Audio discs from the likes of Seal or John Hiatt, but the last time I visited Virgin, I had to talk to a whole bunch of store people before I found someone who even knew what a DVD-Audio disc was and how it differed from a regular DVD-Video disc with music content on it. It was probably ten years ago, and the answer I got was something like “Hmm, yeah, I think I do remember those. No one ever bought them.”

Now, Tower Records is no more, and I’m actually not sure about the Virgin Mega store in Union Square that I used to visit. I’m guessing it’s not there anymore. SACD and DVD-Audio discs are still available on Amazon of course, but I doubt they sell in any significant numbers.

Looking back, there were a few obvious reasons why SACD and DVD-Audio were commercial failures, some of them dependent on each other in a sort of vicious circle:

  • The format war between SACD and DVD-Audio never got resolved. Without a clear winner, the market remained split. They’re both still around, but at best, they occupy small niches in the market.
  • Legal music distribution moved online, first to the iTunes store, and later to streaming services like Spotify and Apple Music, neither of which initially offered music in surround sound.
  • A large installed base of surround-sound capable playback equipment for SACD and DVD-Audio discs never appeared.
  • The DVD-Audio standard includes the CPPM copy prevention system, and SACDs do not use Pulse Code Modulation (PCM) digital audio but a completely different system that cannot natively be played back on most computers. Why would consumers want to invest in a new technology that restricts making backup copies or moving audio between various playback units? I think that without CPPM, DVD-Audio would have quickly beaten SACD. That failure was completely self-inflicted.
  • There is no great way to play back standard 5.1 (or X.Y) audio through regular headphones, and so as portable audio became the dominant means of listening with the rise of the iPod and later iPhone and iPhone clone (Android etc) smartphones, standard surround sound was initially relegated to DVD and Blu-Ray discs and later to streaming film formats.


Before I purchased my own surround system, I’d hardly ever heard surround sound outside of movie theaters, so when I finally got my system all set up, I had a blast watching, and especially listening to, all three Lord of the Rings DVDs and many other big blockbuster action movies. Well-mixed surround sound really is an amazing enhancement of the movie-watching experience, and that elevated my already high expectations for surround-sound music even further.

But after purchasing my first couple of DVD-Audio discs, I was kind of disappointed by the whole experience. It took hours and hours of headache to figure out how to even play them back on my Mac. To this day I can only play back Dolby Digital and DTS streams through VLC, and I have to resort to obscure command-line DVD-Audio utilities running in Windows under VMware Fusion to rip the CPPM-protected Meridian Lossless Packing (MLP) streams to hard disk. But more significantly, I was disappointed by how the music was mixed. It seemed to me that most mixes could be classified into one of two groups:

  • Conservative mixes
  • Gimmicky mixes.

The conservative mixes were mostly enhancements of the stereo-listening experience with a bit of ambience and some of everything added to the rear channels with lead vocals and a few other elements getting featured in the center channel. It was possible to get a similar experience by simply sending a plain old stereo mix to both the front and rear channels. The mixes surrounded the listener, but they didn’t really bring the listener into the music the way I had felt drawn into the films by those great film soundtrack mixes. I had been expecting much more depth, clarity, and more of a feeling of envelopment.

The gimmicky mixes felt like the engineers had simply spread all the elements out between the different speakers. There’d be a guitar in the rear right speaker, a synth sound in the rear left, vocals exclusively in the center channel, tons of bass and kick in the sub channel and so on. Then you’d have pan moves where the guitar solo would move from the front to the back or spin around you. Kind of cool, and it worked decently for some styles of music, but I wouldn’t call it tasteful and I fairly quickly grew tired of them.

I knew there was potential for beautiful, clear, and enveloping experiences but none of that was realized. And so I set out to try and create better-sounding surround-sound mixes.

Recording and Mixing in Stereo

There are many approaches to creating a stereo mix. For example, when mixing multiple mono sources, such as electric guitars, an electric bass, a saxophone, and a vocalist, they can be placed across the stereo image with the pan knobs found on all mixing consoles and in all DAWs. The pan knob divides the signal between the two speakers. Turn the knob to the left and it will send more of the signal to the left speaker and less to the right. Because our ears and brains are sensitive to volume differences between what is heard by the left and the right ear, we perceive the sound as if it originated somewhere to the left of the midpoint between the speakers.
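To make the pan knob concrete, here is a minimal Python sketch of how a mono signal might be divided between two speakers. The constant-power (sine/cosine) pan law used here is an assumption chosen for illustration; consoles and DAWs use a variety of pan laws, and the function name is hypothetical.

```python
import math

def constant_power_pan(position):
    """Return (left_gain, right_gain) for a pan position in [-1.0, 1.0].

    -1.0 is hard left, 0.0 is center, +1.0 is hard right.
    Assumed pan law: sine/cosine, so left^2 + right^2 is always 1.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be between -1.0 and 1.0")
    # Map the position onto a quarter circle: 0 (hard left) to pi/2 (hard right).
    angle = (position + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)

for pos in (-1.0, -0.5, 0.0, 0.5, 1.0):
    left, right = constant_power_pan(pos)
    print(f"pan {pos:+.1f}: L={left:.3f}  R={right:.3f}")
```

With this particular law, turning the knob left raises the left gain and lowers the right gain while keeping the total power constant, which is one common way of avoiding a level dip or bump as a source moves across the image.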

Another way to achieve a similar result is through the use of delay. For a sound source at a certain location, such as someone clapping their hands in front of you, but a little to the left, the sound waves reach your left ear slightly before they reach your right one. This happens because the distance from the source to the right ear is greater than the distance from the source to the left ear. To achieve a similar effect while mixing mono sources in stereo, the mix engineer can delay the signal being sent to the right speaker slightly. There are limits to how much delay can be used though, because at a certain point, we start distinguishing the two signals as separate signals instead.
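Delay-based panning can be sketched in a few lines of Python. This is only an illustration: the function name, the 48 kHz sample rate, and the 0.5 ms delay are assumptions, not values taken from any particular console or plugin.

```python
def delay_pan(mono, delay_ms, sample_rate=48000):
    """Return (left, right) where the right channel lags the left by delay_ms.

    Both channels carry the same samples at the same level; only the
    arrival time differs, which pulls the image towards the left.
    """
    delay_samples = round(delay_ms * sample_rate / 1000.0)
    left = list(mono) + [0.0] * delay_samples   # pad the end so lengths match
    right = [0.0] * delay_samples + list(mono)  # pad the start to create the lag
    return left, right

click = [1.0, 0.0, 0.0, 0.0]
left, right = delay_pan(click, delay_ms=0.5)
# Sample index at which the click appears in each channel.
print(left.index(1.0), right.index(1.0))
```

At 48 kHz, a 0.5 ms delay corresponds to 24 samples, so the click appears at index 0 on the left and index 24 on the right.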

True stereo recordings, done with multiple microphones, take advantage of one or both of these effects during the recording. One example is placing a pair of microphones in front of an orchestra in a so-called ORTF configuration. French audio engineers at the Office de Radiodiffusion Télévision Française came up with this approach in the 1960s, trying to mimic how human ears work. Two cardioid microphones are spaced about 17 cm (roughly 7 inches) apart, approximating the distance between human ears, and aimed away from each other at a 110-degree angle, thereby taking advantage of both arrival-time differences and volume differences between the microphones’ on-axis and off-axis responses.

Another example, one that relies only on volume differences, is the X/Y stereo configuration, involving two microphones placed extremely close together (eliminating arrival-time differences) but aimed away from each other at a 90-degree angle. This approach uses volume differences created by the microphones’ on-axis and off-axis responses in combination with the 90-degree angle, and gives a fairly realistic stereo image while still maintaining good mono compatibility.
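The volume differences in an X/Y pair can be estimated with a small Python sketch. It assumes ideal cardioid microphones with the textbook response 0.5 × (1 + cos θ); real microphones deviate from this, especially at high frequencies, and the function names are hypothetical.

```python
import math

def cardioid_gain(angle_deg):
    """Sensitivity of an ideal cardioid for a source angle_deg off its axis."""
    return 0.5 * (1.0 + math.cos(math.radians(angle_deg)))

def xy_level_difference_db(source_deg, included_angle_deg=90.0):
    """Level difference (left minus right, in dB) for a coincident pair.

    source_deg is the source direction: 0 is straight ahead, negative is
    to the left. Only meaningful while both mics still pick up the source.
    """
    half = included_angle_deg / 2.0
    left = cardioid_gain(source_deg + half)   # left mic is aimed at -half
    right = cardioid_gain(source_deg - half)  # right mic is aimed at +half
    return 20.0 * math.log10(left / right)

for source in (-45, -30, -15, 0, 15, 30, 45):
    diff = xy_level_difference_db(source)
    print(f"source at {source:+3d} degrees: L-R = {diff:+.2f} dB")
```

Under these assumptions, a source straight ahead reaches both microphones at the same level, while a source directly on the left microphone’s axis is about 6 dB louder in the left channel than in the right.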

Summing a stereo image that includes arrival-time differences to mono, whether it was recorded that way or created by using delay during mixing, can often lead to undesirable effects such as comb filtering and loss of sound at certain frequencies.
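The comb filtering that appears when a signal is summed with a delayed copy of itself can be worked out directly. The sketch below assumes the simplest case: the right channel is an exact copy of the left, delayed by a fixed time, and the mono sum is (L + R) / 2. The function names and the 0.5 ms example delay are illustrative assumptions.

```python
import math

def mono_sum_gain_db(freq_hz, delay_ms):
    """Gain of (L + R) / 2 when R is a copy of L delayed by delay_ms."""
    # |1 + e^(-j*2*pi*f*tau)| / 2 simplifies to |cos(pi * f * tau)|.
    magnitude = abs(math.cos(math.pi * freq_hz * delay_ms / 1000.0))
    if magnitude < 1e-12:
        return float("-inf")  # complete cancellation
    return 20.0 * math.log10(magnitude)

def notch_frequencies(delay_ms, max_hz=20000.0):
    """Frequencies up to max_hz that cancel completely in the mono sum."""
    tau = delay_ms / 1000.0
    notches = []
    k = 0
    # Cancellation happens where the delay equals an odd number of half periods.
    while (2 * k + 1) / (2.0 * tau) <= max_hz:
        notches.append((2 * k + 1) / (2.0 * tau))
        k += 1
    return notches

print(notch_frequencies(0.5, max_hz=6000.0))
print(mono_sum_gain_db(500.0, 0.5))
```

For a 0.5 ms delay, the notches fall at odd multiples of 1 kHz, which is why the result sounds hollow rather than simply quieter.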

Frequency differences between what is picked up by the left and the right ear are also used by our brains for localisation. This is related to how the shapes of our skulls and noses filter frequencies. Additionally, the shape of our ears, and how that shape affects sound waves on their way into the ear, is used by our brains to further enhance localisation.

A lot of these characteristics are used in so-called “binaural recordings”. A typical binaural mic uses a “dummy” head with fake ears made of silicone. Microphones are placed a bit into the fake ear canals, mimicking human hearing as closely as possible. That way, all the different aspects of how we localise sounds are used. The main downside of binaural recordings is that they only work well when played back over headphones. Additionally, the shape of everyone’s ears is unique, so the best quality would be achieved by using silicone replicas of one individual’s ears. Of course, the recording would then only work optimally for that one individual.

There are other stereo recording techniques such as M/S and Blumlein which I won’t discuss here.

Generalising, I would say that stereo recording techniques provide more realistic sounding results than stereo images created during mixing. That being said, what works best in any given situation is of course completely at the discretion of the engineer and will depend on his or her artistic goals.

Mixing in Surround

Now what about surround? Obviously, the panning of mono sources in a standard 5.1 (or X.Y) surround-sound mixing environment works very similarly to the way it works in stereo. It gets more complex of course, as the panner has to divide signals between more speakers, and you now have a two-dimensional sound field with both an x- and a y-axis, rather than just a simple one-dimensional line between the left and right speakers. Panning through delay works too (with the above-mentioned limitations), and naturally, the two can be combined.

There are issues with mixing this way though. Since the same signal gets sent to many speakers (which is what happens with the surround-sound panners that are built into most DAW software and into digital or analog consoles capable of surround-sound mixing), the location of the listener becomes very important. Unless he or she stays exactly in the center of the surround field, you’re going to end up with arrival-time differences between the signals originating from the different speakers. And this time, these are not intentional differences decided upon by the engineer. Rather, they depend upon the listener’s location, which of course the engineer cannot control. The result is that pan positions change when the listener moves. This is true for stereo as well, but now there are more speakers reproducing the same signal, resulting in more comb-filtering effects and ultimately a more cluttered and less clean-sounding listening experience.
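To get a feel for how quickly these unintended arrival-time differences build up, here is a rough Python sketch. The layout is an assumption: five speakers on a 2 m circle at angles loosely following the ITU-R BS.775 recommendation (front left/right at ±30 degrees, rears at ±110 degrees), with sound travelling at about 343 m/s. All names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly, at room temperature
RADIUS_M = 2.0          # assumed distance from the sweet spot to each speaker

# Assumed speaker angles in degrees; 0 is straight ahead, positive is right.
SPEAKER_ANGLES_DEG = {
    "left front": -30, "center": 0, "right front": 30,
    "left rear": -110, "right rear": 110,
}

def speaker_position(angle_deg, radius=RADIUS_M):
    """(x, y) in metres; the listener at (0, 0) faces the +y direction."""
    a = math.radians(angle_deg)
    return radius * math.sin(a), radius * math.cos(a)

def arrival_times_ms(listener_xy):
    """Time of flight from each speaker to the listener, in milliseconds."""
    lx, ly = listener_xy
    times = {}
    for name, angle in SPEAKER_ANGLES_DEG.items():
        sx, sy = speaker_position(angle)
        times[name] = math.hypot(sx - lx, sy - ly) / SPEED_OF_SOUND * 1000.0
    return times

def spread_ms(listener_xy):
    """Difference between the latest and earliest arrival, in milliseconds."""
    times = arrival_times_ms(listener_xy)
    return max(times.values()) - min(times.values())

for spot in ((0.0, 0.0), (0.5, 0.0), (0.0, -0.5)):
    print(spot, f"spread = {spread_ms(spot):.2f} ms")
```

In the exact center the spread is zero. Moving half a metre to the right in this assumed room already puts the nearest and farthest speakers a couple of milliseconds apart, which is well within the range where a signal shared between speakers starts to comb-filter.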

There are also surround-sound recording techniques equivalent to the X/Y and ORTF methods using multiple microphones, sometimes combined into surround-sound microphones. These can be great, but they don’t leave much room for creative choices during mixing, since the panning position of each audio source is largely decided by where it was placed in the space where the recording took place. Sure, levels sent to the different surround speakers can be tweaked and manipulated during mixing, but only so much, because at some point, where the level gets too low or too high in some of the speakers, you lose the realistic feeling of the original recording, which defeats the purpose of having recorded that way in the first place.

MSS – Multi-Stereo Surround

With all of these issues and potential problems in mind, I developed a new surround-sound mixing approach that can help achieve better results. A good friend of mine later came up with a name for it: MSS, for Multi-Stereo Surround, since the technique involves creating multiple stereo images that are distributed between the various stereo pairs that are possible in a system with many speakers. Recording in stereo using the previously mentioned stereo techniques gives excellent results, and unlike their surround equivalents, they are not very complicated to set up. Using two microphones for a piano, drum overheads, acoustic guitars, etc., is very common, and most recording engineers already do so. Additionally, all existing plugin and external reverb and delay effects can output their results in stereo.

So the approach is basically to create multiple virtual stereo pairs between all the speakers. In a standard 5.1-channel surround-sound system (leaving the subwoofer channel out of the equation since it is only used for very low frequencies) there are a total of 10 possible stereo pairs:

Figure 01

This means that you can have 10 stereo sources playing between 10 different speaker pairs (and therefore not competing with each other very much), theoretically resulting in very clean and spacious-sounding results.
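The count of 10 is simply the number of ways to choose 2 speakers out of 5. A short Python sketch lists them; the speaker names are the ones used in this article, and the ordering within each pair is arbitrary.

```python
from itertools import combinations

# The five full-range speakers of a 5.1 system (subwoofer left out).
SPEAKERS = ["left front", "center", "right front", "left rear", "right rear"]

# Every unordered pair of two different speakers is a possible stereo pair.
stereo_pairs = list(combinations(SPEAKERS, 2))

for number, (first, second) in enumerate(stereo_pairs, start=1):
    print(f"{number:2d}. {first} + {second}")

print(len(stereo_pairs), "possible stereo pairs")
```

The same approach scales to larger layouts: seven full-range speakers, for example, would give 21 possible pairs.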

For example, I could mix a simple jazz recording as described below:

For the piano (recorded with an ORTF stereo array), the left mic is sent to the left rear speaker and the right mic is sent to the center speaker. The stereo piano channel is also sent to a stereo reverb effect with its stereo output configured so that the left channel is sent to the left front speaker and the right channel to the right rear speaker. I really like the kind of sound this creates. It’s definitely not a realistic sound in the true meaning of the word, but having the close-miked piano come from two speakers while having the ambience of it coming from two other speakers in different locations does create a very clear, enveloping, and, arguably, natural sound.

The overhead stereo pair for the drums is treated similarly and sent to the left front and the right rear speaker, with the reverb from them sent to the left rear and right front speakers. The mono kick drum and snare drum mics are panned in the center between the left and right front speakers, with some being sent to the center speaker as well. The snare is also sent to a reverb that outputs to the left and right rear speakers.

The acoustic bass (which was recorded in mono) is panned between the center and right front speaker. It’s also sent to a stereo reverb with the output being sent to the left rear and left front speakers.

The vocals (recorded in mono) are kept in the middle, sent to the center speaker as well as panned center between the left and right front speakers. The vocals would also be treated with some reverb, and the output of the reverb would be sent mostly to the rear left and right speakers with only some to the front left and right and the center speaker to keep the front from sounding “too dry”.

Finally, the saxophone is placed in almost the same location as the bass, between the center and right speakers, since those two instruments occupy quite different frequencies and hardly compete with each other. Putting the piano where I did kind of tilts the whole mix towards the left, so having the saxophone and bass towards the right helps compensate for that. Some saxophone is also sent to a simple short delay, imitating a natural slapback, with the output sent to the left rear and the left front speakers. The saxophone also gets some reverb with the output sent to the left front and right rear speakers.
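The routing described above can be written down as a small table, which makes it easy to see how the sources are spread over the available speaker pairs. This Python sketch only covers the two-speaker routes; the kick, snare, and vocals (and the vocal reverb) use three or more speakers and are left out. Reading the saxophone’s “right” speaker as the right front speaker is an assumption, and the labels are just shorthand.

```python
from collections import Counter

# (source, first speaker, second speaker), as described in the text.
STEREO_ROUTES = [
    ("piano mics",         "left rear",  "center"),
    ("piano reverb",       "left front", "right rear"),
    ("drum overheads",     "left front", "right rear"),
    ("overhead reverb",    "left rear",  "right front"),
    ("snare reverb",       "left rear",  "right rear"),
    ("bass (panned mono)", "center",     "right front"),
    ("bass reverb",        "left rear",  "left front"),
    ("sax (panned mono)",  "center",     "right front"),  # assumed: right front
    ("sax delay",          "left rear",  "left front"),
    ("sax reverb",         "left front", "right rear"),
]

# Count how many sources share each speaker pair, ignoring left/right order.
pair_usage = Counter(frozenset(route[1:]) for route in STEREO_ROUTES)

for pair, count in pair_usage.most_common():
    names = " + ".join(sorted(pair))
    print(f"{names}: {count} source(s)")

print(len(pair_usage), "of the 10 possible pairs are in use")
```

A tally like this is a quick way to spot pairs that are getting crowded and pairs that are still free when deciding where to place the next source.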

Take a listen for yourself!

In case this all sounds a bit too abstract and hard to understand, the mix I described above is a real song that I actually did mix that way. You can download it and check it out for yourself. However, since online distribution of surround-sound content is still in its infancy (and might stay that way indefinitely), it’s not as easy as I’d like it to be. Apart from the obvious basic requirement of you, the listener, actually having a surround-sound playback system, there are three different ways for you to get the content from my server to your ears, and the way that is best for you depends on what kind of equipment you own and how it is set up.

You can download the file right here: My Little Brown Book DTS.wav

Note! Just right-click on the link and choose Save As, Save Target As, Download Linked File, or something like that. DO NOT just left-click on the file as your browser might try to play it back directly, which could cause a lot of loud noise to come out of your speakers. It also likely would not play back in surround sound.

Downloading the file is only the first step. Now you have to decide how to play it back. And just like when you were downloading the file, DO NOT just double-click on it, since doing so could cause a lot of noise to come out of your speakers. That’s because this file is encoded in DTS format.

Now, the three ways of playing this file back:

You own a multichannel sound card for your computer that outputs each channel (left, right, center, LFE, left rear, and right rear) on separate analog output connectors, hooked up to either six analog inputs on an amplifier or directly to each powered speaker. If this is the case, all you need to do is download the free (as in both beer and freedom) media player VLC and configure it to work with the surround outputs of your sound card.

Your computer has either an optical or a coaxial digital audio output (S/PDIF) connector, and you have it connected to a receiver with the appropriate input, and that receiver has a DTS logo on it. If this is the case, you should be able to play back the file through any media player software, such as iTunes, QuickTime, Windows Media Player, Winamp, etc. If you still only get noise, you might have to enable something like Bitstream Passthrough, DTS Passthrough, Dolby Digital Passthrough, or S/PDIF Passthrough, either in your playback software or your sound card settings or both.

You own a CD or DVD player with the same kinds of digital audio outputs I described earlier, and you have it connected to a receiver with a DTS logo on it. If this is the case, just burn the file onto a CD with any burning software (Toast, iTunes, Easy CD-Creator, etc.) and you should be able to play it back on your system. Just be sure you make a regular Audio CD and not an MP3 or data CD or something like that.