Ambiophonics, 2nd Edition: Replacing Stereophonics to Achieve Concert-Hall Realism
Preface
Ralph Glasgal, Founder, Ambiophonics Institute, Rockleigh, New Jersey
July 1999

www.ambiophonics.org

Dedication

To Johann Sebastian Bach, Richard Wagner, Gustav Mahler, and Lauritz Melchior, without whom we would never have bothered, and to Manfred Schroeder, Don Keele Jr., Bob Carver, Yoichi Ando, and Ole Kirkeby, without whose collective research we would never have succeeded.

Introduction

Ambiophonics: Recreating the Concert Hall Experience at Home

There are essentially only two ways for music lovers to enjoy music performed for them on traditional acoustic instruments. One is by going to a concert hall or other auditorium, and the second is by staying home and playing the radio/TV or a recording. This book and the techniques it describes are dedicated to helping you make the home music-listening experience as audibly exciting as the live experience. Those audiophiles who share the dream of recreating a concert-hall sound field in their home, and who constantly strive to create a sense of "you-are-there," we have christened "ambiophiles". We call the science and technology used to create such an acoustic illusion "Ambiophonics".

Defining the Problem

Barry Willis wrote in Stereophile Magazine (August, 1994), "The idea that any musical event can be reproduced accurately through a two-channel home-audio system in a room that in no way resembles the space in which the original event took place is ludicrous."

Mr. Willis was absolutely correct when he wrote those words, but is much less so now, because Ambiophonics works, and its purpose is precisely to make the home-audio room resemble the space in which the original event took place. He goes on, "At present even the best discrete multichannel surround systems can offer only an illusion of being there." Experienced ambiophiles (a rare breed) would agree, but would also point out that surround sound is deliberately designed to produce the illusion of "they-are-here-around-you" which, while exciting for movies, is always going to be the antithesis of "being there". Finally, thoroughly despondent, Mr. Willis writes, "Tonal accuracy is the best that can be hoped for in a traditional audio system; true spatial accuracy will never happen. Audio products should come bearing this disclaimer: WARNING: IMAGE PRESENTED IS LESS THAN LIFELIKE."

The rest of us need not despair. Ten years of experiments have been devoted to demonstrating that "lifelike" can happen, and with exceptional fidelity to the original, from just two standard LP or CD channels. Yes, the Ambiophonic method described in this book may not always precisely duplicate a particular hall, but it can create a hall and a vibrant stage that could exist architecturally, that rings true, and is lifelike enough to mimic a good seat at a live musical event.

Traditional Audiophile Articles of Faith

Many, if not most, serious audio enthusiasts presently believe that it is possible to achieve a solid stage image that may even extend beyond the loudspeaker positions, by employing the usual arrangement where two loudspeakers and the listener form something close to an equilateral triangle. They have faith that the perfect loudspeaker, amplifier, CD, LP, or 96/24 DVD player, and special cables will produce that wide, sharp imaging, stage depth, and ambient clarity that we all seek. Many also believe that audiophile-grade equipment, properly selected and tweaked, combined with signal path minimalism, is more likely than simple acoustic listening room treatments to produce a higher fidelity sound field with enhanced width and depth. Some audio hobbyists prefer to listen primarily to small-ensemble "they-are-here" jazz-combo sounds such as those found in the Chesky catalog and thus have no need or even desire to achieve a realistic orchestral or operatic sound field. They feel strongly that large scale symphonic or operatic classical sound reproduction is not what the high end should concern itself with, and this view is reinforced at hi-fi shows and showrooms where almost all demos use recordings of small combos, often consisting of just a voice, a guitar and a little percussion. Many devoted home listeners also hold that the rear hall reverberation captured by the recording microphones is being properly reproduced when it comes, together with the direct sounds, from the front loudspeakers.

A new breed of video-age audiophile is in the majority and is convinced that hall ambience, extracted from specially encoded recordings or taken directly from multichannel recordings, and steered or fed to two or even four surround speakers, can achieve the "you-are-there" illusion. This latter group is at odds with those who hold that any such processing, or any non-minimal microphone technique, is anathema.

Considering these prevailing and conflicting conceptions and misconceptions, it is remarkable how good, and even exciting, a sound can be produced by such ad-hoc equipment and still basically stereophonic methods. The musical sound generated by products from the overwhelming majority of serious stereo reproduction equipment manufacturers is truly first class as far as it goes. But the traditional methods of deploying this superb equipment at home have reached a dead end as far as closing that last yawning gap between perfect but flat fidelity and true spatial realism.

Ambiophonics: the Next Audiophile Paradigm

I believe that Ambiophonics, not 5.1 or any similar surround-sound method, is the logical successor to Stereophonics. I also believe the majority of serious home music listeners are closet ambiophiles who really want to be in a realistic, electronically created concert hall, church, jazz club, theater or opera house when listening to recorded music at home. The purpose of this book is to pass on the results of the research and experiments that I and others have performed. Ambiophiles everywhere can take comfort in the fact that it is theoretically possible, possible in practice, and reasonable in cost to achieve the formerly impossible dream of recreating a "you-are-there" soundfield from standard unencoded LPs, CDs or DVDs in virtually any properly treated room at home. In Ambiophonic parlance, when we say "real" we mean that an acoustic space of appropriate size and stage width has been created that is realistic enough to fool the human ear-brain system into believing that it is within that space with the performers on stage clearly delineated in front. The nice thing about Ambiophonics and existing two channel recordings is that so-called stereo recordings are not inherently stereophonic. That is, the microphones act like ears. They don't know that their signals are going to be played back in an untreated room and subjected to crosstalk, pinna angle distortion, and the other ills described below. Thus virtually any two channel recording of acoustic music, unless panned or multi-mic'ed to death, will respond well to Ambiophonic processing and reproduction.

The Ambiophonic techniques described in the following chapters produce a sound stage as wide as that seen by the recording microphones, an early reflection sound pattern that defines the hall size and the character of the recording space, the listener's position within that hall, and finally a reverberant field that complements the content of the music and the original recording venue.

Although Ambiophonics does not rely on decoders, matrices or ambience extraction, it does incorporate commercially available digital signal processors, which are essentially special-purpose computers, to recreate the appropriate ambience signals. It is therefore a prime article of ambiophile faith that while such signal generators are always subject to improvement, they have already reached an audiophile level of performance if one uses them Ambiophonically as described in the chapters that follow. It is also not the belief of the author that there is only one fixed way to achieve the Ambiophonic result. But I believe the Ambiophonic principles enumerated below can form a better foundation to build on than the now seventy-year-old stereo and its, unfortunately, closely related surround-sound technology.

In brief, Ambiophonics uses room treatment, radical front channel loudspeaker positioning, computer generation of early reflections and the later reverberant fields, and additional loudspeakers, strategically placed, to accurately propagate such ambience. Not every audiophile will be able or willing to do all that I suggest, but as each feature of the Ambiophonic system is implemented the improvement in realism will be easily audible and clearly rewarding.

If any science can be called ancient, acoustics certainly can. The literature on acoustics, concert-hall design and sound recording is so vast that I am prepared to concede in advance that every individual fact or idea in the chapters below has already appeared, at some time, in some journal. I can only hope that the concatenation of all the ideas and devices that define Ambiophonics has some modicum of novelty. While I don't need to credit pioneers as far back as Helmholtz and Berliner, I would like to acknowledge my debt to such relatively recent researchers as W. B. Snow, James Moir, Don Keele Jr., stereo dipole-ist Ole Kirkeby, Manfred R. Schroeder, and his former colleague Yoichi Ando, whose ideas on how to build better public concert halls inspired me to adapt his methods to create fine virtual halls for at-home concerts.

Preface to 2nd edition

Whither Stereo In A Surround-Sound World?

The Psychoacoustic Flaws in Both Stereo and 5.1 Music Reproduction and Why Multi-Channel Recording Cannot Correct For Them

As we approach the 21st century, it seems reasonable for audiophiles to ask where the bridge from stereo music reproduction to the next sonic millennium is leading or even if there is such a bridge. Stereophonic sound reproduction dates from 1931 and unfortunately as we shall see in this book has serious unredeemable flaws. But it only makes sense to replace it if there is something better that is reasonably practical and of true high-end quality. Fortunately, there is such a paradigm as described in the chapters that follow. I want to make it clear from the outset that this book is often technical and should be of interest only to those who are not satisfied with the performance of their present stereo or surround systems, possess keen ears, and have an insatiable desire to replicate the concert-hall experience when reproducing serious music from standard LPs, DVDs (DTS or AC-3) and CDs, a happening that has almost certainly eluded them thus far. Ambiophonics does not deal with movie or video sound reproduction where direct sound effects come from locations off-stage.

What Is Realism in Sound Reproduction

In this book, realism in staged music sound reproduction is understood to mean the generation of a sound field realistic enough to satisfy any normal ear-brain system that it is in the same space as the performers, that this is a space that could physically exist, and that the sound sources in this space are as full bodied and as easy to locate as at a live concert. Realism does not necessarily equate to accuracy. For instance, a recording made in Symphony Hall but reproduced as if it were in Carnegie Hall is still realistic even if inaccurate. In a similar vein, realism achieved carelessly does not always mean perfection. If a full symphony orchestra is recorded in Carnegie Hall but played back as if it were crammed into Carnegie Recital Hall, one may have achieved realism but certainly not perfection. Likewise, as long as localization is as effortless as in real life, the reproduced locations of discrete sound sources might not have precisely the same sometimes exaggerated perspective as at the recording site to meet the standards of realism discussed here. An example of this occurs if a recording site has a stage width of 120 degrees but is played back on a stage that is only 90 degrees wide. What this really means in the context of realism is that the listener has moved back in the reproduced auditorium some twenty rows from the first row, but either stage perspective can be legitimately real. Finally, mere localization of a sound source does not guarantee that such a source will sound real. For example, a piano reproduced entirely via one loudspeaker, as in mono, or by two in stereo is easy to localize but almost never sounds real. The mantra goes, Mere Localization Is No Guarantor of Realism. Interestingly, one can have monophonic realism, as when you hear a live orchestra from the last row of the balcony but can't tell whether the horns are left, right, or center.

Since most of us are quite familiar with what live music in an auditorium sounds like, we soon realize that something is missing in our stereo systems. What is missing is soundfield completeness and psychoacoustic consistency. One can only achieve realism if all of the ear's hearing mechanisms are simultaneously satisfied without contradictions. If we assume that we know exactly how the ears work, then we could conceivably come up with a sound recording and reproduction system that would be quite realistic. But if we take the position that we don't know all the ear's characteristics or more significantly that we don't know how much they vary from one individual to another or that we don't know the relative importance of the hearing mechanisms we do know about, then the only thing we can do, until a greater understanding dawns, is what Manfred Schroeder suggested over a quarter of a century ago, and deliver to the remote ears an exact replica of what those same ears would have heard if present where and when the sound was originally generated. The old saw that, since we only have two ears, we only need two channels in reproduction has been justly disparaged. I would rephrase this hazy axiom to read, that since humans have only two ear canals, to achieve realism in reproduction, we need only provide the same sound pressure at the entrance to a particular listener's ear canal, even in the presence of head movement, that this same listener would have experienced at his ear canals had he himself been present at the recording session. Fortunately, it does turn out that only two recorded channels are in fact needed for realistic music reproduction (more are actually detrimental) and it is the purpose of this book to show why this is so and how to do it.

This axiom requires that all reproduced, higher frequency direct or ambient sound come from as close to the correct direction as possible so as to reach the ear canal over a path that traverses the normal pinna structures and head parts. Thus home reproduced hall reverberation should reach the ears from many sideward and rearward locations and the early reflections from a variety of appropriate front, side and rear directions. This is why just the two rear surround speakers of 5.1 can never provide psychoacoustically satisfying hall ambience. Likewise central sound sources should come from straight ahead rather than from two speakers spanning 60 degrees. (A center speaker is no help in this regard as we will show below). Another precept that must be kept in mind is that your pinnae are unique like fingerprints. Using somebody else's pinna or pinna response, unless you get desperate, is not a good audiophile practice. A case in point is the use of dummy head microphones with pinnae. If the sound is reproduced by loudspeakers then all the sounds pass by two pinnae one of which is not even yours, and the result is strange and often in your head. If you listen, using normal pinna compressing earphones, then you are listening with someone else's pinnae and there is no proper directional component at higher frequencies. The usual result is that the sound seems to be inside your head. If the dummy head doesn't have molded pinnae, and you listen with earphones, there are no pinnae at all and the sound again seems to be inside your head or strange. You can't fool Mother Nature.

Perfecting Stereo

While there are some widely held hi-end beliefs that may have to give way to psychoacoustic reality, the basic audiophile ideal that two channel recordings can deliver concert-hall caliber musical realism is not that far off the mark. However, having only two recorded channels does not mean being limited to only two playback loudspeakers. I call the coming replacement for today's stereo 'ambio'. It is optimized, but uncompromised, for the recording and reproduction of frontal acoustic musical performances such as concerts or operas rather than movies and video. By definition, and as substantiated below, where audiophile purity is concerned, multi-channel recording, especially with a center front channel, not only is not needed but is actually psychoacoustically counterproductive. The concert hall genie cannot be squeezed into the 5.1, 6.1, 7.1, or 12.2 moving picture surround sound bottle.

There are two basic theoretical technologies that are prime candidates to replace stereo in elite high-end music reproduction, where mass marketing and complex technical concepts should not be (but of course are) major stumbling blocks. One is the wavefront reconstruction method often employing hundreds of microphones and speaker walls or, where recording is involved, Ambisonics. The Ambisonic wavefront reconstruction method generates the correct sound pressure and sound direction in a region that at least encompasses one listener's head. The other is the binaural technology method that more directly duplicates the live experience, independently, at each ear. Of course, both technologies aim to deliver to the entrance of your ear canal an accurate replica of the original sound field. The Ambisonic method does have the advantage that it can reproduce direct sound sources from any angle and so is quite well suited to non-concert events or movies. But since the Ambisonic wavefront reconstruction method requires a special microphone, a minimum of three (or, better, four) recording channels and a very complex decoder, is not as accurate as the binaural technology methods, and does nothing for the existing library of LPs and CDs, it will not be considered further here.

As we shall show, the advantages of a binaural technology method such as Ambiophonics are that only two recorded channels, two front loudspeakers, and a scalable number of non-critical ambience speakers are necessary. Also, although using a single pinnaless dummy head microphone (such as the Schoeps KFM-6, see below) works best, this new playback technology does not obsolete the vast library of LPs and CDs; it enhances most of them almost beyond belief. Binaural Technology is also room shape and decorator friendly in that the front speakers can be very close together and thus be placed almost anywhere in a room. Another difference between direct wavefront reconstruction such as Ambisonics and binaural field synthesis such as Ambiophonics is that in the latter case one can season the experience by moving one's virtual seat or changing the space entirely to suit the music or one's taste. As explained in later chapters, this is not logical with 5.1 multi-channel recording systems, since to make such changes you would be incurring the expense of a processor to undo the original expense of recording and storing the now superfluous center and rear surround tracks.

I Vant To Be Alone, Or, The Listening Mob Fallacy

The concept that dedicated music listening in the concert hall, theater, jazz venue, or at home is a group activity is superficial. Yes, there may be 2500 people in the opera house, but while the curtain is up there is, ideally, no interaction between them. Each member of the audience might just as well be sitting alone unless you believe in ESP. Listeners in a concert hall are also restricted as to the size of their sweet spot. They can't slouch to the floor or stand up, their permitted side to side or back to front movement is not extensive and there are plenty of seats in most halls where the sound and the view are not quite so sweet.

At home, how often does the gang come over to sit with you for five hours of Die Goetterdaemmerung? Certainly, serious home listening to classical music and, to a lesser extent, longer popular genres such as Broadway shows, new age, movie scores, jazz concerts, etc., is, sad to say, a solitary or at most a two person pursuit. Of course we all want to demonstrate our great reproduction systems to friends and family, but since these sessions usually last just a few minutes, one can show off the system to one or two people at a time, and after everyone has heard it at its best, from the sweet spot, the party can go on.

The point here is that it is difficult enough to correct the inherent defects of stereo and create a concert-hall caliber soundfield at home without making compromises in the design in order to unduly enlarge the sweet spot. Note that stereo, Ambisonics, VMAx, 5.1, 7.1, DTS, Ambiophonics, etc. all have listening box limitations that one must live with. Creating larger listening areas is the province of those concerned with reproducing the rapidly changing and moving surround sounds of video and movies both in theaters and homes. In the case of movies, compromises in fidelity to achieve 360 degree localization over larger listening areas at the expense of realism are barely noticeable. Likewise, compromises made to improve localization in PC virtual reality/multimedia systems (another solitary situation) are justified and often quite ingenious. However, the technologies and compromises appropriate for surround sound or virtual reality are not suited to the high-end caliber reproduction of recorded classical or popular music. The rules are simple. Let us see how they apply to two channel sound reproduction in general.

Why Stereo Can't Deliver Realism Without Some Fixing

By now, everyone in the industry has recognized that when a two channel recording is played back through two loudspeakers that form a 60 or 90 degree angle from the listener, each such speaker communicates with both ears, producing interaural crosstalk. The deleterious effects of this crosstalk at higher frequencies have been greatly underappreciated. For openers, crosstalk is what almost always prevents any sound source from appearing to come from beyond the angular position of the loudspeakers. This result is intuitively obvious, since if we postulate an extreme-right sound source, and can safely ignore the contribution from the left speaker, we can now hear the right speaker by itself, as usual with both ears, and no matter how we turn our heads the sound will always be localized to the right speaker as in any normal hearing situation. However, if we keep the right speaker sound from getting so easily to the left ear, then the brain thinks that the sound must be at a larger angle to the right, well beyond the, say, 30 degree position of the loudspeaker, since, as in the concert hall, the lesser sound reaching the left ear is now fully filtered by the head and the left pinna. So, for starters, stereo, because of its crosstalk, inadvertently compresses the width of its own sound stage.

A second, perhaps more serious defect is also caused by this same crosstalk. For centrally located sound sources, two almost equally loud acoustic signals reach each ear (instead of one as in the concert hall), but one of these signals, in the normal stereo listening setup, travels about half a head's width, or 300 usec., longer than the sound from the nearer speaker. This produces multiple peaks and nulls in the frequency response at each ear from 1500 Hz up, known as comb filtering. Since the nulls are narrow, and are muddied by even later crosstalk coming around the back or over the top of the head, and since the other ear is also getting a similar but not precisely identical set of peaks and nulls, the ear seldom perceives this comb filtering as a change in timbre. But it can and does perceive these gratuitous dips and peaks as a kind of second, but foreign, pinna function, and this causes confusion in the brain's mechanism for locating musical transients. Remember, in real halls, the ear can hear a one degree shift in angular position, but not if strong comb-filter effects occur in the same 2-10 kHz region where the ear is most sensitive to its own intrapinna convolution effects and interpinna intensity differences. As long as this wrongful interaural crosstalk is allowed to persist, the sound stage will never be as natural or as tactile as it could be, and for some people such listening is fatiguing after a while and all 60 (or similar) degree stereo reproduction sounds canned to them.
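The comb filtering described above can be illustrated with a short numerical sketch. It uses the 300 usec. delay quoted in the text and models each ear's signal as the direct sound plus one delayed copy; the function names and the 0.7 crosstalk gain are illustrative assumptions, not measured values, and real heads add the later, messier crosstalk paths that this model ignores.

```python
import numpy as np

DELAY_S = 300e-6  # extra travel time of the crosstalk signal, as quoted in the text

def comb_null_frequencies(delay_s, f_max_hz):
    """Frequencies (Hz) at which a sound and its delayed copy arrive in antiphase."""
    nulls = []
    k = 0
    while (2 * k + 1) / (2 * delay_s) <= f_max_hz:
        nulls.append((2 * k + 1) / (2 * delay_s))
        k += 1
    return nulls

def comb_magnitude_db(freq_hz, delay_s, crosstalk_gain=1.0):
    """Level in dB of 1 + g*exp(-j*2*pi*f*tau): direct sound plus one delayed copy."""
    freq_hz = np.asarray(freq_hz, dtype=float)
    response = 1.0 + crosstalk_gain * np.exp(-2j * np.pi * freq_hz * delay_s)
    # Floor the magnitude so a perfect null does not produce log10(0).
    return 20.0 * np.log10(np.maximum(np.abs(response), 1e-12))

nulls = comb_null_frequencies(DELAY_S, 10_000)
print([round(f) for f in nulls])

# Assumed, illustrative head shadow: crosstalk arrives at 0.7 of the direct level.
print(comb_magnitude_db(nulls, DELAY_S, crosstalk_gain=0.7))
```

Under these assumptions the first null falls a little above 1500 Hz, with further nulls near 5 kHz and 8.3 kHz, all inside the region where the text says pinna cues matter most.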

Pinna-Sensitive Front Speaker Positioning

Just as there are optical illusions, so there are sonic illusions. One can create sonic illusions by using complex filters to create virtual sound sources that float in mid air or rise up in front of you. As with optical illusions some people detect them and some people don't. The most prominent audio illusion is in stereo where phantom images are created between two speakers. You may have observed that most optical illusions are two-dimensional drawings, that imply three dimensions. Likewise there is something indistinct about the stereo phantom illusion. This is because the phantom image is largely based on lower frequency interaural cues and barely succeeds in the face of the higher frequency head and pinna localization contradictions.

The fact that earphone systems such as Toltec based processors, a host of PC virtual reality systems, SRS, Lexicon, etc. can move images in circles just by manipulating high frequency head and pinna response curves, even if not of great high-end quality, does show that these hearing characteristics are of considerable importance. Thus the direction from which complex sounds over 1500 Hz originate, particularly from the reproduced stage, should be as close to correct as possible.

Most stage sounds, particularly soloists and small ensembles, originate in the center twenty degrees or so. Remember that we want to launch sounds as much as possible from the directions they originate. Thus it makes much more sense to move the front channel speakers to where the angle between them to the listening position is perhaps ten degrees. This eliminates the pinna processing error for the bulk of the stage. But, of course, if the speakers are so close together, what happens to the separation? The answer is that with the crosstalk eliminated, as is necessary anyway, separation, as in earphone binaural, is no longer dependent on angular speaker spacing.

The Stereo Dipole

Crosstalk elimination is not a new concept to Ambiophonics. Most of the older electronic crosstalk elimination circuits such as those of Lexicon, Carver, Polk etc. assume the stereo triangle and have, therefore, had to make compromises to enlarge the sweet spot size over which they are effective. I would hesitate to class any of them as high-end components, especially as they still promote pinna position errors. But now a new crosstalk canceller from England, called a Stereo Dipole, assumes that the speakers are practically touching. Usually good crosstalk cancellers require complex compensation for the fact that the crosstalk signal being canceled has had to go around the head and over the pinna on its way to the remote ear. Since VMAx, Lexicon, etc. don't know what your particular head and pinna are like, they assume an average response and thus can't do a very good job of cancellation at high frequencies. If they try, most listeners experience phasiness, a sort of unease or pressure, particularly if they move about. But when the speakers are in front of you there is not much of the head to get in the way, and so the head response functions are much simpler, less deleterious if ignored or averaged, and head motions make little difference. Electronic stereo dipoles are just now appearing, but you can easily achieve an inexpensive and truly high-end result using a simple three-foot-square, six-inch-thick absorbent panel set on edge at the listening position. You get used to the panel rather quickly, and it is a high-end tweak that needs no cables and produces no grunge. There is already a commercially produced panel from Echobusters that folds up for storage when not in use. Either electronic stereo dipole processors or panels allow complete freedom of head motion without audible effect and afford more squirm room at the listening position than one has in a concert hall. Two people can be accommodated comfortably, but usually one needs to be directly behind the other for optimum results, not unlike stereo.

One can have video with either arrangement but adding a picture can have its down side. One reason that so many listeners are impressed with the realism of movie surround sound systems is the presence of the visual image. While the research in this field is not definitive, it stands to reason that a brain preoccupied with processing a fast moving visual image is not going to have too much processing power left over to detect fine nuances of sound. Certainly, if you close your eyes while listening to any system, your sensitivity to the faults of the sound field is heightened. Thus when a seemingly great home theater system is used to play music only, without a picture, the experience is often less than thrilling. Adding a picture to Ambio seems to make fine adjustments to the ambient field much less audible, but one must observe that most people keep their eyes open at concerts and so perhaps an image is desirable to provide the ultimate home musical experience.

Nothing we have done to make the front stage image more realistic and psychoacoustically correct has required any extra recorded channels. I call all these changes to standard stereophonics Ambiophonics, or Ambio. Ambio does not rely on the fluky phantom image mechanism. But there still remains one further difficulty with the stereo triangle, and that is that we need a proper ambient field coming from more directions than just those of our now crosstalk-free, pinna-correct front speakers.

The Case For Ambience By Reconstruction

Like the federal budget agreement, a method of achieving that air, space, and appropriate concert hall ambience at home, has technical devils in its details. The most obvious suggestion, based on movie and video surround-sound techniques, to just stick the ambient sound on additional DVD multi-channel tracks, on closer examination, just can't do it for hi-enders. The problem with using third, fourth or fifth microphones at or facing the rear of the hall and then recording these signals on a multi-channel DVD, is that these microphones inevitably pick up direct sound which, when played back from the rear or side speakers, causes crosstalk, pinna angle confusion, and comb filter notching. It is also pinnatically incorrect to have all rear hall ambience coming from just two point sources even if these surround speakers are THX dipoles. Remember, using rear dipoles implies a live listening room, which will thus also increase unwanted early reflections from the front speakers. Additionally, recording hall ambience directly is really not cost effective or necessary. Unlike movies, the acoustical signature of Carnegie Hall (despite its always ongoing renovations) does not change with every measure so why waste bits recording its very static ambience over and over again? It is much more cost and acoustically effective to measure the hall response once from the best seat (or several) for say five, left, right, and center positions on the stage (If the hall is symmetric, the measurement process is simpler) and either include this data in a preamble on the DVD, store it in your playback system or provide it as part of a DVD-ROM library of the best ambient fields of the world.

The process of combining a frontal, two-channel (I hope) recording with the hall impulse response is called convolution. Convolution is the job of the ambience regenerator, which may be a PC, a special purpose DSP computer, or a part of the DVD/CD DAC. The use of ambience reconstruction would obviate the need for ARA-type, 8-channel recordings, at least where classical music is concerned, and would leave plenty of DVD room for 24-bit words and 96 kHz sampling. Unlike frontal sound, ambience can and should come from as many speakers as one can afford or has room for. Crosstalk and comb-filtering are not problems with ambient sound sources if these signals are uncorrelated (unrelated closely in time, amplitude, frequency response, duration, etc.), which is normally the case both with concert halls and good ambience convolvers.
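As a rough illustration of the convolution step described above, the following Python sketch convolves each channel of a two-channel signal with a hall impulse response to produce one ambience feed per side. Everything in it is an assumption made for illustration: the function names, the 8 kHz sample rate, the test tones, and above all the impulse responses, which are synthetic decaying noise standing in for responses measured in a real hall.

```python
import numpy as np

def synthetic_hall_ir(fs, rt60, seed):
    """Hypothetical stand-in for a measured hall impulse response:
    white noise under an envelope that falls 60 dB in rt60 seconds."""
    rng = np.random.default_rng(seed)
    n = int(fs * rt60)
    t = np.arange(n) / fs
    envelope = 10.0 ** (-3.0 * t / rt60)
    return rng.standard_normal(n) * envelope

def ambience_feed(channel, hall_ir):
    """Convolve one recorded channel with one hall impulse response."""
    return np.convolve(channel, hall_ir)

fs = 8000                                  # assumed, kept low so the demo is quick
t = np.arange(fs // 2) / fs                # half a second of test signal
left = np.sin(2 * np.pi * 440 * t)         # placeholder for the left channel
right = np.sin(2 * np.pi * 660 * t)        # placeholder for the right channel

# Different responses per side keep the two ambience feeds from being identical.
ir_left = synthetic_hall_ir(fs, rt60=2.0, seed=1)
ir_right = synthetic_hall_ir(fs, rt60=2.0, seed=2)

left_ambience = ambience_feed(left, ir_left)
right_ambience = ambience_feed(right, ir_right)
```

Each feed is longer than the dry signal by the length of the impulse response, which is the reverberant tail ringing on after the music stops. Direct convolution is used here only for clarity; practical convolvers generally use FFT-based methods to keep the processing load manageable.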

An Uphill Political Struggle

The cause of concert-hall early reflection and reverberation tail synthesis by digital signal processors (DSP) in computers or audio products was set back by the late Michael Gerzon, the Oxford Ambisonics pioneer, who wrote in 1974, "Ideally, one would like a surround-sound system [yes, he did use this term in 1974] to recreate exactly, over a reasonable listening area, the original sound field of the concert hall.... Unfortunately, arguments from information theory can be used to show that to recreate a sound field over a two-meter diameter listening area for frequencies up to 20 kHz, one would need 400,000 channels and loudspeakers. These would occupy 8 GHz of bandwidth equivalent to the space used by 1000, 625-line television channels!" This article of faith was quoted recently at an AES meeting in New York by Bob Stuart of Meridian (and the ARA) and is still a widely held belief in audio engineering circles.

Later, however, Gerzon did not let information theory prevent him from capturing a 98% complete concert-hall sound field using a single coincident array of four microphones. Indeed the complete impulse response of a hall can be measured and stored on one floppy disk by placing an orthogonal array of three microphone pairs at the best seat in the house and launching a test signal from the stage during the recording session or at any time.
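The floppy disk figure is easy to sanity-check with a little arithmetic. The Python sketch below does so under assumptions that are not stated in the text: a two-second response per microphone, CD-style 44.1 kHz sampling, 16-bit samples, and a nominal 1.44 MB high-density floppy. The function name is hypothetical.

```python
MICROPHONES = 6             # three orthogonal pairs, as described in the text
FLOPPY_BYTES = 1_474_560    # nominal capacity of a "1.44 MB" floppy (assumed medium)

def ir_storage_bytes(n_mics, seconds, sample_rate, bytes_per_sample):
    """Uncompressed size of one impulse response per microphone."""
    return n_mics * int(seconds * sample_rate) * bytes_per_sample

# Assumed: 2.0 s responses, 44.1 kHz sampling, 16-bit (2-byte) samples.
size = ir_storage_bytes(MICROPHONES, 2.0, 44_100, 2)
fits = size <= FLOPPY_BYTES
print(size, fits)
```

Under these assumptions the six responses come to a little over one million bytes, uncompressed. Longer responses or higher resolution would need more room, but the order of magnitude is a far cry from 400,000 channels.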

Convolution to The Rescue

An audiophile-friendly approach to ambience reconstruction is to derive the surround speaker feeds by convolution of a two channel recording, preferably made using the microphone technique described below, that limits rear hall pickup. The questions to be asked are these:

How many channels of early reflections do we really need to reconstruct and where in the listening room should these speakers be placed?
How many channels of late reverberation do we absolutely need to generate and where should these speakers be placed?
How realistic sounding is the software now available to generate these fields?
What about the problem that each instrument on the stage produces a different set of early reflections?
There may never be a definitive answer to the first two questions. Just as there is no sure recipe for physical concert hall design, there is no best virtual concert hall specification. But, adjusting the number, placement, and shape of early reflections is easily more audible than changing amplifiers or cables and offers a tweaker delights that can last a lifetime. I can only say that in my own experience, just as there are thousands of real concert halls that differ in spite of being real, so there are thousands of ambience combinations that sound perfectly realistic even if not perfect. How do you get more real than real? Remember, absolute, particular hall parameter accuracy is not essential to achieve realism. By analogy, even if one sits on the side, in the last row of the balcony at Carnegie Hall where the ambience is lopsided, the sonic experience is still real.

In my opinion the best software for this purpose is based on impulse response measurements made in actual concert halls, as was done by JVC and Yamaha some ten years ago for consumer products and is being done all the time by acoustical architects tuning auditoriums. Others, such as Dr. Dave Griesinger at Lexicon, create ambience signals using an imaginary model. I am not talking here about professional effects synthesizers that generate artifacts never heard by anybody in any physically existing space. Someday, I presume, we will have a DVD-ROM that contains the ambient parameters of Leo Beranek's 76 greatest concert houses of the world and with a simple mouse click, tweaking will yield to selection. With enough hall impulse responses stored, you could even select a seat and a stage width. (If it's a solo recital, one wants only centrally derived early reflections; if a symphony orchestra, the works, etc.)

While I may not be the best one at executing my own theories, I have gotten startlingly good results using the relatively primitive convolvers available. It is a rare AES convention that does not describe advances in the state of this art. Another important point is that ambience regeneration is scalable. As computers get faster and cheaper, and as convolution software gets better, it is easy to upgrade or add more ambience speakers. The hall ambience storage method is also inherently tolerant of speaker type; the precise location and response of each speaker matter little, and changing them is akin to repainting the balcony or curving a wall in the concert hall.

The fact is that the brain is not all that sensitive to whether there are 30 early reflections from the right and only 25 from the left or whether they come from 50 degrees instead of the concert-hall ideal (according to Ando) of 55 degrees. If the reverberant field is not precisely diffuse or decays in 1.8 seconds instead of 2.0 seconds, that may only mean you are in Carnegie Hall instead of Symphony Hall. I make no claim to be an authority on setting ambience hall parameters, and I am sure many audiophiles could do better at this game. I now use two large area speakers at +/- 30 degrees, two at +/- 55 degrees, two at +/- 70 degrees, and two at +/- 90 degrees to launch left and right early reflections. I tilt these long speakers so that they really span some five degrees about these points to better mimic real halls. I similarly use eight large area, tilted speakers to the sides and rear to provide a reverberation field as diffuse as possible.

Since central early proscenium reflections come from the recording via the main front speakers, these need not be regenerated and of course by definition they are natural and are coming from the right direction. By using the left channel to recreate left-leaning early reflections (some of which may end up coming from the right) and the right channel to produce a set of right reflections, the early reflection patterns for different instruments on the stage have enough diversity to exceed the threshold of the brain's reality barrier. I doubt that additional front recorded channels would help in this regard; they would very much aggravate the crosstalk problem and destroy the frontal, binaural coherence discussed in the first part of this preface.

Some Recent Developments On The Ambience Front

The NHK Science and Technical Research Laboratories in Japan have developed a method of reproducing concert hall ambience using walls of small loudspeakers covering the rear and the sides of the listening room. These prefabricated panels can contain hundreds of small loudspeakers depending on the size of the room. The panels also serve as room treatment and are easy to install.

To feed the speaker walls, the Japanese have devised a new mathematical randomizing method for generating many uncorrelated reverberation trains from a single real impulse response measured in a concert hall. Normally (information theory again), one would have to measure a separate impulse response at hundreds of locations in the hall for these hundreds of loudspeakers in order to ensure that the reverberant field produced by all these speakers was truly diffuse (the same power but not identical in all directions). If too many speakers are fed exactly the same signal there will be comb filter effects and little interaural excitement. Some forty ambient signals were generated, using a scrambling algorithm, and applied to groups of physically close speakers. Even if one does not want to buy into speaker walls, the wave pressures measured throughout the NHK room show that it is possible to simulate a realistic, diffuse reverberant field from a stored room impulse response starting with a simple stereo signal. They have also conclusively demonstrated that one can have realistic ambience without requiring gigabits of storage.
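The text does not describe the NHK scrambling algorithm, so the Python sketch below substitutes a much simpler technique of my own choosing, random sign flipping of the impulse response samples, purely to show that many mutually uncorrelated feeds can in principle be derived from one stored response. The function names, the synthetic response, the 16 kHz rate, and the seeds are all hypothetical.

```python
import numpy as np

def measured_ir_stand_in(fs=16_000, seconds=2.0, rt60=2.0, seed=0):
    """Synthetic decaying noise, standing in for one measured hall response."""
    rng = np.random.default_rng(seed)
    n = int(fs * seconds)
    t = np.arange(n) / fs
    return rng.standard_normal(n) * 10.0 ** (-3.0 * t / rt60)

def scrambled_tails(ir, n_feeds, seed=1):
    """Derive n_feeds variants of one response by independent random sign flips.
    This is NOT the NHK method, only a simple illustrative substitute."""
    rng = np.random.default_rng(seed)
    signs = rng.choice(np.array([-1.0, 1.0]), size=(n_feeds, ir.size))
    return signs * ir   # sample magnitudes unchanged, so the decay envelope is kept

def max_zero_lag_correlation(feeds):
    """Largest normalized correlation (at zero lag only) between any two feeds."""
    energy = np.sum(feeds[0] ** 2)   # every feed has the same energy
    corr = feeds @ feeds.T / energy
    np.fill_diagonal(corr, 0.0)      # ignore each feed's correlation with itself
    return float(np.max(np.abs(corr)))

ir = measured_ir_stand_in()
feeds = scrambled_tails(ir, n_feeds=40)   # "some forty ambient signals"
print(max_zero_lag_correlation(feeds))
```

Each derived response has the same power and decay as the original, yet any two are nearly uncorrelated, which is the property the speaker walls need. Sign flipping also flattens the tonal coloration of the original response, so a practical scheme would have to be more careful than this one.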

Another Japanese group, this time from Matsushita, has devised a compression algorithm to simplify the generation of ambient fields without loss of audible realism. Their technique reduces the number of processing cycles necessary to convolve the front signals to get natural sounding reverberation. I suspect, however, that computers will continue to improve, obviating the need to use such compression. For the present, however, the commercial development of any ambience products for home, music-loving audiophiles has been slowed in the face of the overwhelming demand for multi-channel surround sound for video, movies, and PCs.

Whither Recording In An Ambiophonic Hi-End World

Audiophiles do not often concern themselves with recording techniques, over which they have little control, but almost any LP or CD made with spaced microphones is greatly enhanced by Ambio playback. One can nevertheless heighten the accuracy, if not gild the lily of realism, by taking advantage, in the microphone arrangement, of the knowledge that in playback the rear half and side part of the hall ambience will be synthesized, that there is no crosstalk, that the front loudspeakers are relatively close together, and that listening room reflections are minimized.

To make a long story short, exceptionally realistic "You are there" recordings can be made by using a head shaped, pinnaless ball with holes at the ear canal positions to hold the microphones. The Schoeps KFM-6 is a good example of such a microphone even though it is a sphere and an oval would be slightly better. However, for best results, this microphone should be well baffled to prevent most rear hall ambience pickup. KFM-6 recordings are a feature of the PGM label, produced by the late Gabe Wiener who was a staunch advocate of this recording method, first expounded by Guenther Theile. As expected, these PGM recordings are exceptionally lifelike when played back Ambiophonically so as to be free of crosstalk or pinna distortion.

The reason such a microphone is optimum is that, particularly for central sounds, the sound rays reach the ears almost as they do in the concert hall. That is, one ray from a central instrument reaches the left ear of the microphone and goes to the left speaker, where it is sent straight ahead to the left pinna and ear. The fact that the head-related transfer function (HRTF) of the microphone is not the same as the listener's is not significant for central sound sources that don't cross either head. For side sources the microphone ball becomes a substitute for the listener's HRTF, but at least there is still only one HRTF and one real pinna in the chain. Eventually listeners will be able to store their personal HRTF and pinna function in the playback computer and correct for any microphone HRTF anomalies they hear. In my own experience, I hear no deterioration until the sound stage width exceeds about 120 degrees. Perhaps the hardest part of migrating to Ambio will be to convince specialty audiophile recording engineers, who are usually rugged individualists, to use microphones and positions that are Ambio compatible.

Law of the First Impression

No matter how many great stereo systems I listen to, they still never have the impact that my first Emory Cook stereo disc had. Likewise, I still compare the multichannel systems I hear now to the mental image of air and presence I retain of the first RCA CD-4 true discrete quad LP of Mahler's 2nd I heard in the early 70's. The moral of this phenomenon is that the first time anyone hears a major upgrade in reproduction, particularly when going beyond two speakers for the first time, they are always very favorably impressed. Dissatisfaction with systems like the Hafler arrangement, SQ, Dolby Pro Logic, etc. only sets in later. Unfortunately, this will be the scenario with the new discrete multi-channel format for music as well. At first 5.1 or even 7.1 sounds really exciting and a great contrast to stereo, but in the end it fails as a realistic replica of the live music concert-hall experience.

Conquering Anechoaphobia

Audiophiles must get over their fear of room treatment (anechoaphobia) and embrace it enthusiastically. Stereo, Ambiophonics, Ambisonics, VMAx, surround-sound, multi-channel, etc. can all be substantially improved by eliminating early room reflections and reverb that conflict with what is on the recording. Finally, after the multi-channel furor has subsided, I believe some form of two-channel binaural technology, if not Ambiophonics per se, will emerge to serve audiophile music lovers.
