Headphone Soundstage and Imaging: Definitions & Measurements

samvafaei · Apr 23, 2018

Hey guys,

So like last year, we've made a video explaining the philosophy and testing methodology behind a couple of our tests, and this one is about soundstage and imaging. I would appreciate any comments or suggestions.

Sam

maverickronin · Apr 23, 2018

There's some really good stuff in there, other things I'm unsure about, and some things that are just wrong, though most of the latter apply to testing elements other than your soundstage methodology.

The PRTF is an absolutely excellent idea. I've always found that the extent to which a driver/enclosure interacts with my pinna, usually when the driver is farther away from my ear, has a large effect on perceived soundstage, and you've found a pretty good way to measure it.

I'm not so sure about how you go about assigning subcategory scores and weightings to everything though. All of that data you've collected has value, but the way it's combined into one score is pretty arbitrary.

What I know I don't like is using Harman-esque bass boost in the target curve. To a large degree, bass response in a headphone is pure preference since bass in headphones can never match the full body impact of either speakers or real life. There is always a trade off between sounding accurate and attempting to feel accurate so the best procedure is to leave the bass compensation flat and let individuals decide on their own preference.

ultrabike · Apr 23, 2018

I will check the video. But I have yet to get a good feel for soundstage in the absence of cross channels, and indeed motion (or any kind of cues) if one wants better placement resolution.

samvafaei · Apr 24, 2018

maverickronin said: ↑

I'm not so sure about how you go about assigning subcategory scores and weightings to everything though. All of that data you've collected has value, but the way it's combined into one score is pretty arbitrary.

What I know I don't like is using Harman-esque bass boost in the target curve. To a large degree, bass response in a headphone is pure preference since bass in headphones can never match the full body impact of either speakers or real life. There is always a trade off between sounding accurate and attempting to feel accurate so the best procedure is to leave the bass compensation flat and let individuals decide on their own preference.

That's true about assigning weights to the values in each category to come up with a score. What makes it even harder is that soundstage is quite difficult to isolate and judge, so a certain amount of guesswork has gone into our scoring. But unless you make an attempt to solve a problem, you won't know what needs improvement.

The other thing is that some aspects of imaging/soundstage are influenced by other factors, like the treble response. For example, a headphone with poor or lacking treble will have poor imaging. But if such a headphone has very good L/R driver matching, it will score well in our imaging calculations. This is a limitation of our current scoring system, but we have some ideas on how to improve it.
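A rough sketch of that limitation, assuming the imaging score is a plain weighted average of subcategory scores. The subcategory names, the weights, and the 0-10 scale are all made up for illustration and are not the actual formula behind the published scores.

```python
# Hypothetical weighted scoring: names, weights and the 0-10 scale are
# illustrative assumptions, not the real scoring system.

def weighted_score(subscores, weights):
    """Combine subcategory scores (0-10) into one score with a weighted average."""
    total_weight = sum(weights[name] for name in subscores)
    return sum(subscores[name] * weights[name] for name in subscores) / total_weight

# Assumed imaging subcategories. Note that none of them looks at treble response.
imaging_weights = {"lr_amplitude_match": 0.4, "lr_phase_match": 0.4, "group_delay": 0.2}

# A headphone with rolled-off treble but well matched drivers.
dull_but_matched = {"lr_amplitude_match": 9.0, "lr_phase_match": 9.0, "group_delay": 8.0}

imaging = weighted_score(dull_but_matched, imaging_weights)
print(f"imaging score: {imaging:.1f}")
```

Because treble never enters the calculation, the dull headphone still lands near the top of the scale, which is exactly the blind spot described above.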

We have an updated target curve in the works. In addition to updating our "official" neutral target response, we are planning on adding other targets too (like the B&K) so the user can sort the tables based on that. We are also thinking about allowing the user to adjust the bass and treble of each target by a few dB. So hopefully this will cover individual preferences.
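One way such a bass/treble adjustment could work is sketched below, assuming a target stored as (frequency, level in dB) points and a deliberately simplified shelf that ramps linearly on a log-frequency axis. The function names, corner frequencies, and target values are hypothetical placeholders, not the planned implementation.

```python
import math

def shelf_gain(freq_hz, gain_db, corner_hz, width_octaves=1.0, low_shelf=True):
    """Gain in dB of a simplified shelf: full gain on one side of the corner,
    ramping linearly (on a log-frequency axis) down to 0 dB across width_octaves."""
    octaves_into_shelf = math.log2(freq_hz / corner_hz)
    if low_shelf:
        octaves_into_shelf = -octaves_into_shelf
    # 0.0 = outside the shelf, 0.5 = at the corner, 1.0 = fully inside the shelf
    position = min(1.0, max(0.0, octaves_into_shelf / width_octaves + 0.5))
    return gain_db * position

def adjust_target(target, bass_db=0.0, treble_db=0.0,
                  bass_corner_hz=150.0, treble_corner_hz=5000.0):
    """Return a copy of target (list of (freq_hz, level_db)) with both shelves applied."""
    return [
        (f, level
            + shelf_gain(f, bass_db, bass_corner_hz, low_shelf=True)
            + shelf_gain(f, treble_db, treble_corner_hz, low_shelf=False))
        for f, level in target
    ]

# Placeholder target points, not a real measured target curve.
target = [(20.0, 0.0), (1000.0, 0.0), (3000.0, 10.0), (20000.0, 0.0)]
adjusted = adjust_target(target, bass_db=3.0, treble_db=-2.0)
for freq_hz, level_db in adjusted:
    print(f"{freq_hz:8.0f} Hz  {level_db:+.1f} dB")
```

Keeping the adjustment as an offset on top of a fixed target means the underlying measurements never change; only the reference they are compared against moves.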

samvafaei · Apr 24, 2018

ultrabike said: ↑

I will check the video. But I have yet to get a good feel for soundstage in the absence of cross channels, and indeed motion (or any kind of cues) if one wants better placement resolution.

Yes, crosstalk makes a big difference. I have yet to measure a headphone that has significant and useful crosstalk.

maverickronin · Apr 24, 2018

samvafaei said: ↑

But unless you make an attempt to solve a problem, you won't know what needs improvement.

I certainly agree with that.

samvafaei said: ↑

The other thing is that some aspects of imaging/soundstage are influenced by other factors, like the treble response. For example, a headphone with poor or lacking treble will have poor imaging. But if such a headphone has very good L/R driver matching, it will score well in our imaging calculations. This is a limitation of our current scoring system, but we have some ideas on how to improve it.

Not sure how this relates but I do notice that many IEMs, especially balanced armatures, have excellent imaging even if the treble is shelved or rolled off. Right now, I'm having a hard time thinking of full size 'phones with similar traits.

samvafaei said: ↑

We have an updated target curve in the works. In addition to updating our "official" neutral target response, we are planning on adding other targets too (like the B&K) so the user can sort the tables based on that. We are also thinking about allowing the user to adjust the bass and treble of each target by a few dB. So hopefully this will cover individual preferences.

That's actually a pretty cool idea, but the programming sounds like a lot of work. I was thinking more along the lines of having a section explaining the trade-offs and how to compare graphs manually but an interactive system would be damned cool.

I think diffuse field and free field target curves are the best choices since they are the closest to the perceived frequency response of real life.

samvafaei said: ↑

Yes, crosstalk makes a big difference. I have yet to measure a headphone that has significant and useful crosstalk.

Are you referring to crossfeed circuits?

Acoustic crosstalk is diffraction of sound waves around your head and body which decreases with frequency. Standard electrical crosstalk in circuits is caused by capacitive coupling between the channels which increases with frequency, the opposite of what is needed. The only way any headphone or amplifier will mimic proper acoustic crosstalk is if it has a built in filter circuit for this purpose. I know of at least one headphone which does have such a filter built in, the discontinued Sony MA900.
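The opposite frequency trends can be sketched with two crude first-order models: a stray capacitance leaking into the other channel's load (a high-pass), and a low-pass standing in for sound diffracting around the head. The component values and the 700 Hz corner are arbitrary assumptions chosen only to show the direction of each slope, not measurements of any real amplifier or head.

```python
import math

def capacitive_crosstalk_db(freq_hz, r_ohms=32.0, c_farads=100e-12):
    """Leakage through a stray capacitance C into a load R (first-order high-pass).
    R and C are placeholder values."""
    x = 2 * math.pi * freq_hz * r_ohms * c_farads
    return 20 * math.log10(x / math.sqrt(1 + x * x))

def head_shadow_db(freq_hz, corner_hz=700.0):
    """Crude first-order low-pass stand-in for acoustic crosstalk around the head.
    The corner frequency is an assumption, not a measured value."""
    return -10 * math.log10(1 + (freq_hz / corner_hz) ** 2)

for f in (100, 1000, 10000):
    print(f"{f:6d} Hz  electrical {capacitive_crosstalk_db(f):7.1f} dB"
          f"  acoustic {head_shadow_db(f):6.1f} dB")
```

The electrical leakage climbs by about 20 dB per decade while the acoustic path falls away, so plain circuit crosstalk cannot stand in for the real thing without a dedicated filter.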

I'm a huge fan of crossfeed and HRTF DSPs. For some reason they aren't very popular, even though they're essential to get anything resembling natural imaging out of a headphone, since 99+% of audio content is mixed for speakers and assumes levels of acoustic crosstalk and HRTF interactions which headphones lack. Listening to anything with much stereo separation on headphones will give me a headache pretty quickly unless I use some sort of crossfeed.
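For anyone unfamiliar with the idea, a bare-bones crossfeed can be sketched as follows: each ear gets its own channel plus a quieter, low-passed, slightly delayed copy of the opposite channel. This is only a minimal illustration with assumed parameter values (cutoff, level, delay); it is not the algorithm used by any particular plugin or DAC, and real implementations are considerably more refined.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Simple one-pole low-pass filter."""
    a = math.exp(-2 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in samples:
        state = (1 - a) * x + a * state
        out.append(state)
    return out

def delay(samples, n):
    """Delay by n samples, keeping the original length."""
    return ([0.0] * n + list(samples))[:len(samples)]

def crossfeed(left, right, sample_rate=44100,
              cutoff_hz=700.0, level_db=-6.0, delay_us=300.0):
    """Mix a filtered, attenuated, delayed copy of each channel into the other.
    All default parameter values are illustrative assumptions."""
    gain = 10 ** (level_db / 20)
    n = round(delay_us * 1e-6 * sample_rate)
    feed_from_left = delay(one_pole_lowpass(left, cutoff_hz, sample_rate), n)
    feed_from_right = delay(one_pole_lowpass(right, cutoff_hz, sample_rate), n)
    out_left = [l + gain * f for l, f in zip(left, feed_from_right)]
    out_right = [r + gain * f for r, f in zip(right, feed_from_left)]
    return out_left, out_right

# A click hard-panned to the left now also reaches the right ear, later and duller.
left = [1.0] + [0.0] * 99
right = [0.0] * 100
out_left, out_right = crossfeed(left, right)
first_arrival = next(i for i, v in enumerate(out_right) if v != 0.0)
print("right ear first hears the click at sample", first_arrival)
```

The sketch does no level normalization, so centered content gets slightly louder; a real implementation would compensate for that.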

Having said all that, I think it's a bad idea to integrate a crossfeed filter directly into the headphone itself. It would work for something that's already a fully active DSP powered model like modern ANC sets but on regular audiophile headphones a passive circuit won't have the quality of an active one, let alone a DSP, and even adding a switch for it, you'd probably mess with the impedance, frequency response, and efficiency as well.

The reason you'd want a switch is so you could use a better crossfeed implementation when available from your source. I play music out of foobar2K with the TB Isone HRTF simulator VST plugin. In my video player I use a different HRTF simulator which will downmix 5.1 channels to binaural. For the rare binaural recording or game with a good binaural downmix I'd want to turn it off. For everything else my RME ADI-2 DAC has a DSP with a Bauer Binaural crossfeed implementation.

Basically, this is just a long way of saying that any crosstalk in the headphones themselves should probably count against their imaging and soundstage score since it interferes with any attempt to more precisely control it upstream and can limit what content or sources it can be used with.

ultrabike · Apr 24, 2018

maverickronin said: ↑

Not sure how this relates but I do notice that many IEMs, especially balanced armatures, have excellent imaging even if the treble is shelved or rolled off. Right now, I'm having a hard time thinking of full size 'phones with similar traits.

I'm uneasy about assigning specific imaging attributes to balanced armatures in general. I don't think all balanced armature drivers covering the same frequency range behave the same.

I believe imaging (as in localization) is somewhat frequency response (FR) dependent. I'm no expert on sound localization (at least when it comes to human perception), but it seems that as far as treble is concerned, interaural intensity difference is most important from 1500 Hz and up. If drivers are not matched and the frequency response is shelved in the treble region, localization will probably suffer. And by extension, imaging.

If the claim that "treble is shelved or rolled off" is based on FR measurements, then I would question the measurements (and specifically the rig used to perform them). If the claim is based on tone sweeps or similar, I would caution that perceived equal loudness is not a linear function of frequency. Indeed, such an equal-loudness curve may even be individual dependent.

maverickronin said: ↑

I think diffuse field and free field target curves are the best choices since they are the closest to the perceived frequency response of real life.

I believe that diffuse and free field target curves are sufficiently off, such that even if one feels one is closest to real life, the other is necessarily off.

maverickronin said: ↑

Acoustic crosstalk is diffraction of sound waves around your head and body which decreases with frequency. Standard electrical crosstalk in circuits is caused by capacitive coupling between the channels which increases with frequency, the opposite of what is needed. The only way any headphone or amplifier will mimic proper acoustic crosstalk is if it has a built in filter circuit for this purpose. I know of at least one headphone which does have such a filter built in, the discontinued Sony MA900.

I'm a huge fan of crossfeed and HRTF DSPs. For some reason they aren't very popular, even though they're essential to get anything resembling natural imaging out of a headphone, since 99+% of audio content is mixed for speakers and assumes levels of acoustic crosstalk and HRTF interactions which headphones lack. Listening to anything with much stereo separation on headphones will give me a headache pretty quickly unless I use some sort of crossfeed.

Having said all that, I think it's a bad idea to integrate a crossfeed filter directly into the headphone itself. It would work for something that's already a fully active DSP powered model like modern ANC sets but on regular audiophile headphones a passive circuit won't have the quality of an active one, let alone a DSP, and even adding a switch for it, you'd probably mess with the impedance, frequency response, and efficiency as well.

The reason you'd want a switch is so you could use a better crossfeed implementation when available from your source. I play music out of foobar2K with the TB Isone HRTF simulator VST plugin. In my video player I use a different HRTF simulator which will downmix 5.1 channels to binaural. For the rare binaural recording or game with a good binaural downmix I'd want to turn it off. For everything else my RME ADI-2 DAC has a DSP with a Bauer Binaural crossfeed implementation.

Basically, this is just a long way of saying that any crosstalk in the headphones themselves should probably count against their imaging and soundstage score since it interferes with any attempt to more precisely control it upstream and can limit what content or sources it can be used with.

I agree fully the source is the most practical and perhaps best place to implement DSP algorithms to improve soundstage and imaging.

maverickronin · Apr 24, 2018

ultrabike said: ↑

I'm uneasy about assigning specific imaging attributes to balanced armatures in general. I don't think all balanced armature drivers covering the same frequency range behave the same.

I believe imaging (as in localization) is somewhat frequency response (FR) dependent. I'm no expert on sound localization (at least when it comes to human perception), but it seems that as far as treble is concerned, interaural intensity difference is most important from 1500 Hz and up. If drivers are not matched and the frequency response is shelved in the treble region, localization will probably suffer. And by extension, imaging.

If the claim that "treble is shelved or rolled off" is based on FR measurements, then I would question the measurements (and specifically the rig used to perform them). If the claim is based on tone sweeps or similar, I would caution that perceived equal loudness is not a linear function of frequency. Indeed, such an equal-loudness curve may even be individual dependent.

It could definitely use further investigating, but all BA IEMs I've owned/heard that I and other forum goers usually consider dark or mid centric also have excellent imaging. Since you really have to listen to something to judge imaging at this point, my own determination is made from listening to them with music. Off the top of my head I can think of the SoundMagic PL50, MEE A151, and all the Shure BAs. Brighter ones I've heard continue the trend as well.

Not sure if this is something inherent to the drivers, a coincidence in the BA IEMs I've heard, or a coincidence in the DD IEMs I've heard and am comparing the BAs to. BAs tend to have a very specific sound though, so I think the imaging could be related.

ultrabike said: ↑

I believe that diffuse and free field target curves are sufficiently off, such that even if one feels one is closest to real life, the other is necessarily off.

They are quite different curves, but they're averages of acoustics in different environments. I don't remember the exact conditions off the top of my head but diffuse field is supposed to represent a closed room with semi-reflective surfaces while free field is either an anechoic chamber or just outdoors with no walls or ceiling.

Neither is perfect, and personally I prefer DF as it tends to match most recordings/mixes better than FF, but a better argument can be made for either FF or DF than the Harman curve since they're based on actual measurements instead of just listener preferences. OTOH, if you get rid of the bass boost, the Harman curve isn't that far away from DF.

ultrabike · Apr 24, 2018

Very interesting. Indeed, if I recall correctly, the few BA based IEMs I've heard had a pretty characteristic sound to them. It may be down to their distortion characteristics.

As far as target curves, and which one is accurate, it's a bit controversial. Measurements used to arrive at DF and FF target curves may be based on average human characteristics, which may poorly represent any given individual. Similar arguments could be made about listener preferences. Furthermore, I'm not fully convinced that a driver should reproduce room characteristics which are not already available in the recording.

However, as far as IEM performance evaluation is concerned, a case could be made about relative differences. For example, one may have difficulty making an absolute statement about how a particular IEM sounds relative to real life sounds given individual head dependencies (much is bypassed when using IEMs and volume driven may be different depending on individual). But perhaps one can make a few statements about how one IEM sounds relative to another.

maverickronin · Apr 24, 2018

ultrabike said: ↑

Very interesting. Indeed, if I recall correctly, the few BA based IEMs I've heard had a pretty characteristic sound to them. It may be down to their distortion characteristics.

When comparing any measurements which include distortion spectra, that's the only categorical difference which jumps out at me but I'm unsure how that would translate to perceived imaging.

If I'm looking at Tyll's measurements, with only total levels of distortion (and if I don't look at the impedance curve), it gets a lot tougher to tell them apart. A ginormous bass boost probably means a dynamic driver, but BAs can still do bass I'd consider unreasonable. IIRC BAs never or very rarely show any "jaggies" in their FR on Tyll's rig, but plenty of DDs don't have those either.

I'm having a hard time thinking of any other differences...

ultrabike said: ↑

As far as target curves, and which one is accurate, it's a bit controversial. Measurements used to arrive at DF and FF target curves may be based on average human characteristics, which may poorly represent any given individual. Similar arguments could be made about listener preferences.

Yeah. They're not perfect. They probably don't fit anyone perfectly and won't work at all for some people. OTOH, if you use a measurement rig with a pinna and want to release your results to the public, you have to use a compensation curve if you don't want every other comment asking about the extra 12dB in the upper mids. And if you have to pick one, I think that DF and FF have the best reasoning behind them.

At least of the ones I know about. Always hopeful something better will come along.

ultrabike said: ↑

However, a case could be made about relative differences. For example, one may have difficulty making an absolute statement about how a particular IEM sounds relative to real life sounds. But perhaps one can make a few statements about how one IEM sounds relative to another.

I think trying to compare headphones to "real life" with music is usually a fool's errand. Even minimally produced audiophile label stuff tends to have more variables than can be easily controlled for. I only mention DF and FF as being closest to real life since they were originally based on averages of real people, but like I said above, they aren't perfect. Relative comparisons are much better since absolute comparisons are a lost cause without some kind of really elaborate protocol.

samvafaei · Apr 25, 2018

maverickronin said: ↑

Basically, this is just a long way of saying that any crosstalk in the headphones themselves should probably count against their imaging and soundstage score since it interferes with any attempt to more precisely control it upstream and can limit what content or sources it can be used with.

It doesn't really matter how crosstalk is created, as long as it is good crosstalk (a certain amount of roll-off in the treble is needed to avoid comb filtering). Since the goal is to have a speaker-like soundstage, crosstalk is one of the major things to add. But yes, most likely it will have to be added using DSP, which would then be crossfeed and not crosstalk.

maverickronin said: ↑

Neither is perfect, and personally I prefer DF as it tends to match most recordings/mixes better than FF, but a better argument can be made for either FF or DF than the Harman curve since they're based on actual measurements instead of just listener preferences. OTOH, if you get rid of the bass boost, the Harman curve isn't that far away from DF.

DF is the opposite of FF. So I wouldn't call it a semi-reflective environment, but it obviously is not infinitely reflective either, since such a thing doesn't really exist!

Having said that, neither of these curves sounds good without any modification. I prefer the DF curve over the FF curve too. But the FF and DF curves that come with the HMS have their 3kHz peak at around 16dB (and flat on the 0dB line from 500Hz downwards). This is going to sound super bright! The most recent Harman over-ear curve peaks at 10dB, and their in-ear curve peaks at 12dB (and they have a lot of bass too!). So in the end, you have to multiply the DF/FF curve by a specific "slope" in order to get a more balanced sound. It seems some people don't like the slope to add any bass to the final curve, but it should definitely roll some treble off.
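On a dB scale, multiplying by a "slope" amounts to adding a tilt. A minimal sketch, assuming a constant tilt in dB per octave around a 1 kHz pivot, with an option to leave everything below the pivot untouched for those who don't want the slope to add bass. The tilt value, the pivot, and every curve point other than the 16dB peak at 3kHz and the flat region from 500Hz down are placeholders.

```python
import math

def apply_tilt(curve, db_per_octave, pivot_hz=1000.0, tilt_bass=True):
    """Add a straight-line tilt (in dB per octave, relative to pivot_hz) to a curve
    given as a list of (freq_hz, level_db). With tilt_bass=False the region below
    the pivot is left unchanged, so a downward tilt only rolls off treble."""
    tilted = []
    for freq_hz, level_db in curve:
        octaves = math.log2(freq_hz / pivot_hz)
        if octaves < 0 and not tilt_bass:
            octaves = 0.0
        tilted.append((freq_hz, level_db + db_per_octave * octaves))
    return tilted

# Rough DF-like shape: flat below 500 Hz, 16 dB peak at 3 kHz, other points made up.
df_like = [(125.0, 0.0), (500.0, 0.0), (1000.0, 2.0), (3000.0, 16.0), (8000.0, 8.0)]

treble_only = apply_tilt(df_like, db_per_octave=-1.5, tilt_bass=False)
full_tilt = apply_tilt(df_like, db_per_octave=-1.5, tilt_bass=True)
for (f, a), (_, b) in zip(treble_only, full_tilt):
    print(f"{f:7.0f} Hz  treble-only {a:5.1f} dB   full tilt {b:5.1f} dB")
```

With the full tilt the same slope that tames the peak also lifts the bass, which is the part people disagree about.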

maverickronin · Apr 25, 2018

samvafaei said: ↑

Having said that, neither of these curves sounds good without any modification. I prefer the DF curve over the FF curve too. But the FF and DF curves that come with the HMS have their 3kHz peak at around 16dB (and flat on the 0dB line from 500Hz downwards). This is going to sound super bright! The most recent Harman over-ear curve peaks at 10dB, and their in-ear curve peaks at 12dB (and they have a lot of bass too!). So in the end, you have to multiply the DF/FF curve by a specific "slope" in order to get a more balanced sound. It seems some people don't like the slope to add any bass to the final curve, but it should definitely roll some treble off.

You're right about them being pretty bright but there are headphones that are pretty close to either DF or FF and have plenty of fans. Even Etymotic takes a few dB off the ER4SR and ER4S. They say it's to work better with the way music is mastered for speakers and have an ER4B model with full DF equalization they say is intended for pure binaural recordings.

ultrabike · Apr 25, 2018

I've tried a few of the Etymotic offerings. I know some folks like them. I personally think they suck horribly. IMO they are bassless bright pieces of shit that should be given to people you hate. But I've been told my lack of appreciation for them is probably due to my rather large ear canals. Which actually makes sense to me.

skem · Apr 25, 2018

@ultrabike : The default rubbery ear tips suck. If you use the foam comply tips of appropriate diameter, performance is much helped. Which did you try?

maverickronin · Apr 25, 2018

I quite like the frequency response on the ER4S and SR. Mids and vocals on them seem very natural to me. I had a pair of ER4s but got rid of them for other reasons.

First was the comfort. You need to use triple flanges or the extra long Complys and stick them in practically up to your eardrum or they'll get even brighter than they're supposed to be. I don't see large ear canals helping much. On top of that, they still stick out too far the other way and I can't loop them over my ears comfortably. I get annoying microphonics and they wobble around, irritating my inner ear.

Second is the fact that to get that frequency response from a single driver they push it too far. Tyll's distortion plots at 90dB and 100dB show rising distortion with level, which fits the observation that they can't keep it together when I crank it because something awesome came up on my random play. OTOH, good multi-drivers can get cleaner at higher levels and single drivers with less aggressive tunings don't usually have quite as much rise in distortion either.

The XR has more low end, but likes high volumes even less.

I totally get why some people hate them. You need a really good fit to get the intended sound signature, which isn't even all that popular to begin with, and anything less than a perfect fit makes it sound considerably worse. A good fit for the sound may not be very comfortable either, so many people who could fit them properly probably settle on a fit which is more comfortable but makes them even brighter.

The "ER" actually stands for "Ear Rape"

skem · Apr 25, 2018

On microphonics, I solved this by putting IEMs in so the strain relief points up and the wire loops over the top of my pinna and comes down the back. This damps all wire microphonics. I can even jog with them without hearing anything annoying.

ultrabike · Apr 25, 2018

As far as tips, I think I tried whatever OJ had there, including triple flanges. Probably the triple flanges were too narrow and did not provide a seal. Of whatever there was that did provide a seal, they sounded like shit. And I don't mean that as just "Meh" shit. I mean that as total utter shit. Complete failure. Horrid in fact.

OJ loved them, probably because the tips worked for him and stuff.

There were other IEMs that we tried that did not require triple flanges. The ones that others thought were bassy, I thought were just right. Again, proly my inner ears are roomy.

In terms of comfort and looks the Etymotic IEMs are also a complete failure. They look like torture devices. They sound like torture devices (to me). They feel like torture devices.

I can picture Theon Greyjoy strapped to a chair with Ramsay Bolton laughing while holding the Etymotic case.

EDIT: This is somewhat why I think IEM stuff is difficult. Even when folks have similar tastes in full size headphones, they may disagree about IEMs. Relative impressions still seem to hold though.

briskly · Apr 25, 2018

For this sort of discussion, I would prefer a write-up over a video to respond to so I can more easily keep track of points asserted. This is more a personal gripe and not a suggestion that should necessarily be followed.

Comments/questions:

Proximity would be mostly related to lower frequency boundary effects related to the shoulders and torso. The pinna appears to have a much smaller effect, if any, in this role.

Is the phase response component a representation of the all-pass component of a headphone, after minimum-phase correction and compensation?

What is the function of PRTF here? It looks like ear decomposition, but the cochlea only detects the audio signal fed to it at the very end, not the intermediate spatial properties of the wave.

CSDs are a bit of a strange deal. I would think that a logarithmic approach would be more sensible. Perhaps more importantly, what useful information would be uniquely spotlighted in joint time-frequency analysis? I get less clear on this with time.

Other things:

A diffuse sound field has no relation to any natural environment. The classic approximation would be a specialized reverb room. Or, you could surround the listener with speakers and play them all back at once. It might look something like this.

Balanced armatures are inherently non-linear by conception, unlike the other driver methods that are based on linear principles. They have quite a few internal resonances, some for the open canal resonance, whereas others do not coincide so neatly. Air does not normally behave non-linearly at the usual sound pressure range, so I don't see much of a fit into sound source determination.

samvafaei · Apr 25, 2018

briskly said: ↑

Proximity would be mostly related to lower frequency boundary effects related to the shoulders and torso. The pinna appears to have a much smaller effect, if any, in this role.

Is the phase response component a representation of the all-pass component of a headphone, after minimum-phase correction and compensation?

What is the function of PRTF here? It looks like ear decomposition, but the cochlea only detects the audio signal fed to it at the very end, not the intermediate spatial properties of the wave.

CSDs are a bit of a strange deal. I would think that a logarithmic approach would be more sensible. Perhaps more importantly, what useful information would be uniquely spotlighted in joint time-frequency analysis? I get less clear on this with time.


Not sure what you're referring to regarding proximity. Are you talking about "distance of the sound source" or the bass bump in our compensation curve?

For phase, we only look at the inter-channel phase performance, that is, L/R phase matching. So although we don't perform any compensation on it, it shouldn't really matter. But we do calculate the headphones' group delay from their phase response.
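Group delay is the negative slope of phase versus angular frequency, τ(f) = -(1/2π)·dφ/df. A minimal sketch of deriving it from a measured phase response by finite differences is shown below; the function names are hypothetical and this is not necessarily how the published numbers are computed. It assumes the frequency points are close enough together that the true phase changes by less than π between neighbours.

```python
import math

def unwrap(phases_rad):
    """Remove 2*pi jumps so the phase is continuous from point to point."""
    out = [phases_rad[0]]
    for p in phases_rad[1:]:
        step = p - out[-1]
        step -= 2 * math.pi * round(step / (2 * math.pi))
        out.append(out[-1] + step)
    return out

def group_delay(freqs_hz, phases_rad):
    """Group delay in seconds between each pair of neighbouring frequency points."""
    phi = unwrap(phases_rad)
    delays = []
    for i in range(len(freqs_hz) - 1):
        dphi = phi[i + 1] - phi[i]
        df = freqs_hz[i + 1] - freqs_hz[i]
        delays.append(-dphi / (2 * math.pi * df))
    return delays

# Sanity check with a pure 1 ms delay, whose phase is -2*pi*f*tau wrapped to +/- pi.
tau = 0.001
freqs = [100.0 * k for k in range(1, 11)]
wrapped = [math.atan2(math.sin(-2 * math.pi * f * tau),
                      math.cos(-2 * math.pi * f * tau)) for f in freqs]
delays = group_delay(freqs, wrapped)
print([round(d * 1000, 3) for d in delays], "ms")
```

Running the same calculation on the left and right channels separately and subtracting the results would give an inter-channel comparison in the spirit of the L/R matching described above.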

PRTF is basically trying to find out how much the final frequency response (measured at the eardrum), was created using the pinna. In-ears don't interact with the pinna at all, so the spatial cues that you would get from your pinna are missing. Not only that, our reference PRTF is from a loudspeaker at 30 degrees (like in a stereo setup), so we also want to see how much the pinna interaction resembles a loudspeaker at 30 degrees.
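The exact calculation is not spelled out here, so the following is only one plausible sketch of the idea: treat the pinna contribution as the dB difference between a response measured with the pinna in play and one measured without it, then score how far that contribution sits from a reference. The function names, the RMS metric, and all of the numbers are assumptions for illustration.

```python
import math

def prtf_db(with_pinna_db, without_pinna_db):
    """Pinna contribution per frequency point, as a difference of dB responses."""
    return [a - b for a, b in zip(with_pinna_db, without_pinna_db)]

def rms_deviation_db(prtf, reference_prtf):
    """Root-mean-square distance (in dB) between a PRTF and a reference PRTF."""
    squared = [(a - b) ** 2 for a, b in zip(prtf, reference_prtf)]
    return math.sqrt(sum(squared) / len(squared))

# Placeholder reference: pinna effect of a loudspeaker at 30 degrees (made-up values).
reference = [0.0, 3.0, -4.0]

# An in-ear bypasses the pinna, so its response is the same with or without it.
in_ear = prtf_db([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])

# A hypothetical over-ear whose pinna interaction partly follows the reference.
over_ear = prtf_db([1.0, 4.5, 0.0], [1.0, 2.0, 3.0])

print("in-ear deviation:  ", round(rms_deviation_db(in_ear, reference), 2), "dB")
print("over-ear deviation:", round(rms_deviation_db(over_ear, reference), 2), "dB")
```

In this toy setup the in-ear shows no pinna contribution at all, so its deviation from the reference is simply the size of the reference itself.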

We haven't implemented CSD yet, but chances are if we do it, it'll end up in the Imaging category. It's kind of like the opposite of group delay. Or another way of thinking of it would be in terms of an ADSR envelope. Group delay is the attack, and CSD is the decay/release. Thinking of extreme examples usually helps to clarify these things: For a headphone with a group delay of 1s at 120Hz, the bass is going to be 1 second late compared to the rest of the range. In the case of CSD, a decay of 1s at 120Hz would mean that the bass would linger on 1 second after the rest of the frequencies have faded out.
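To illustrate the decay side, here is a minimal sketch of a single frequency slice of a cumulative spectral decay: the level at one frequency is recomputed from what is left of the impulse response after progressively later start times. The synthetic 120Hz ringing, its 50ms time constant, and the function names are assumptions for the example; a full CSD would also apply a window and repeat this across all frequencies.

```python
import cmath
import math

def bin_magnitude_db(samples, freq_hz, sample_rate):
    """Level in dB of a single DFT bin evaluated at freq_hz."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate)
              for n, x in enumerate(samples))
    return 20 * math.log10(abs(acc) + 1e-12)

def csd_slice(impulse_response, freq_hz, sample_rate, start_times_s):
    """Level at freq_hz of the remaining impulse response after each start time."""
    return [bin_magnitude_db(impulse_response[round(t * sample_rate):],
                             freq_hz, sample_rate)
            for t in start_times_s]

# Synthetic impulse response: a 120 Hz resonance ringing with a 50 ms time constant.
sample_rate = 8000
time_constant_s = 0.05
impulse_response = [
    math.exp(-n / (time_constant_s * sample_rate))
    * math.sin(2 * math.pi * 120 * n / sample_rate)
    for n in range(sample_rate // 2)
]

levels = csd_slice(impulse_response, 120.0, sample_rate, [0.0, 0.05, 0.1])
print([round(level - levels[0], 1) for level in levels], "dB relative to t = 0")
```

A slow fall in these levels at 120Hz is the "lingering bass" case described above, whereas group delay would show up as the energy arriving late in the first place.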

Real-life environments fall somewhere between a diffuse field and a free field for the most part...

ultrabike · Apr 25, 2018

At this point with IEMs, I think if sound is coming from somewhere near the center of one's brain, maybe kudos. If things are coming from the right or left lobe, maybe not kudos. Depending on recording of course.

(Unless maybe one goes wild with DSP stuff)
