Table of Contents
Intro
Standardized vs. Subjective Language
1. THE BIG PICTURE: UNDERSTANDING FREQUENCY RESPONSE
The Sound Quality Glossary
1. A
2. B
3. C
4. D
5. F
6. G
7. H
8. L
9. M
10. N
11. O
12. P
13. R
14. S
15. T
16. V
17. W
Conclusion

Sound Quality Glossary

Speakers

Updated Jul 04, 2023 at 12:20 pm

By Becca Fischer

If you're not in the know, it can be hard to make sense of all the terms used to describe how a product sounds. Words like 'hollow', 'crisp', and 'lush' often feel akin to a sommelier describing wine notes: abstract, poetic, and personal. Many terms used in the audio community are also subjective—if you ask one person what 'boomy' means, you may get a bit of a different answer than another person.

You're in the right place if you don't know what these words mean or want to brush up on your audio lexicon! This article covers the general landscape of descriptive audio language and broad terms related to frequency response. At the end of this article, there's also an audio glossary with definitions of commonly used sound descriptors. As we mentioned, many of these terms are subjective, and some don't even have commonly agreed-upon definitions. They don't exist in a vacuum either and are often related (and used) in combination. However, knowing these terms can help orient you when you want to pick up a new speaker, especially if you can't listen to it before buying.

Standardized vs. Subjective Language

If you think there has to be a better, more consistent way of describing sound quality, you're not alone. There has been a movement in recent years to standardize it. Enter Report ITU-R BS.2399-0, or "Methods for selecting and describing attributes and terms, in the preparation of subjective tests" (2017). This document was created by the International Telecommunication Union (ITU) to build a consensus vocabulary: a lexicon of agreed-upon terms attributed to sound reproduction. The goal is that, wherever this vocabulary is used, words like 'nasal' or 'full' always refer to the same phenomenon.

The audio wheel created by the ITU, found in Report ITU-R BS.2399-0 (03/2017).

However, just because there's a standard doesn't mean everyone uses it. Consensus vocabulary is useful when objectively assessing sound professionally, as the language is precise. Everyday listeners have a different relationship with sound quality than industry experts. As a result, consensus vocabulary doesn't neatly match common usage. For example, what the ITU calls 'boomy' may differ from what audiophile communities mean by the same word. Unless there's a push to use these terms outside of formal settings, people will continue to use whatever words are available.

Even if casual definitions don't neatly align with the standardized definition, it doesn't delegitimize their usage or meaning. As you'll see below, our audio quality glossary covers both terms used in the 2017 report and common terms in the audiophile community. It's not an exhaustive list of terms; not all terms are agreed upon. However, if you're shopping for a new speaker, you'll come across these terms somewhere.

THE BIG PICTURE: UNDERSTANDING FREQUENCY RESPONSE

Before we jump into the nitty-gritty, it's helpful to take a step back and focus on how we talk about the sound profile. If you're wondering how we test sound, you'll want to check out our article on frequency response accuracy. Reading the sound profile or frequency response graphs can help determine if a sound is right for you. However, you'll also want to check out our soundstage and dynamics tests to get a feel for a product's spatial capabilities and the overall clarity of audio reproduction. These aspects of sound quality have their own terminology, which you can also find below this section.

There are some common forms a sound profile can take. We'll cover the following:

Neutral
Bass-heavy
V-shaped/excited

Neutral

The Sonos Era 300 has a neutral sound profile.

A neutral frequency response has balanced bass, mid, and treble. Each range is proportional to the next, ensuring one doesn't dominate the other. This means that audio sounds as natural and accurate as possible, as intended by the audio engineers who mixed the content. This sound profile is also good if you're looking to mix and master audio. It's important to note that a sound profile that follows our target curve may not be the most neutral choice for your needs, though, since our target is based on what most people will enjoy. Also, some speakers with a neutral sound profile, like the Sonos Era 300, lack a touch of low-bass, though this is pretty common for speakers with smaller and more compact designs.

Bass-Heavy

The Sony SRS-XP500 is a party speaker with a bass-heavy sound profile.

A bass-heavy sound profile has more bass than mid and treble. Some users prefer this sound as it helps further emphasize the thump, rumble, and boom in genres like EDM and hip-hop. That said, if the extra bass extends into the mid-range, it can affect the clarity and presence of vocals and instruments. When there's a lot of bass and a lot less treble, this is a dark sound profile. Many casual-use speakers, like the Ultimate Ears MEGABOOM 3, have bass-heavy sound profiles, as do more party-oriented speakers like the Sony SRS-XP500.

V-Shaped

The SOUNDBOKS (Gen. 3) has a V-shaped sound profile.

A V-shaped sound profile, also known as an excited sound profile, has overemphasized bass and treble. The extra bass adds thump, rumble, and boom, while the extra treble keeps vocals and instruments from getting lost in the mix by drawing attention to their detail and articulation. This sound is well-suited for audio with many highs and lows like rock and pop. A less severe V-shaped sound profile is called a U-shaped sound profile. Its mids are less recessed than those of a V-shaped profile, resulting in a more balanced mix. Although there's a difference between a V-shaped and a U-shaped sound profile, the two terms are often used interchangeably.
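To make the three shapes above concrete, here is a minimal Python sketch that labels a sound profile from the average level of each range. The function name `classify_profile`, the use of one averaged dB value per range, and the 2 dB tolerance are all assumptions made for illustration; they aren't taken from our test methodology.

```python
def classify_profile(bass_db, mid_db, treble_db, tolerance_db=2.0):
    """Label a sound profile from average band levels, in dB.

    tolerance_db is an illustrative threshold, not a measured standard.
    """
    bass_up = bass_db - mid_db > tolerance_db
    treble_up = treble_db - mid_db > tolerance_db
    bass_even = abs(bass_db - mid_db) <= tolerance_db
    treble_even = abs(treble_db - mid_db) <= tolerance_db

    if bass_up and treble_up:
        return "v-shaped"    # bass and treble both sit above the mids
    if bass_up:
        return "bass-heavy"  # only the bass sits above the mids
    if bass_even and treble_even:
        return "neutral"     # every range is within tolerance of the mids
    return "other"           # e.g. bright or bass-light profiles


print(classify_profile(bass_db=6.0, mid_db=0.0, treble_db=0.5))
```

Real frequency responses are curves rather than three numbers, so a sketch like this only captures the broad shape.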

The Sound Quality Glossary

So, now that you have an idea regarding general sound profile trends, it's time to dive deep into more specific language to help us fine-tune our sound quality knowledge. Remember that just like wine tasting, people will experience and describe sound differently. As a result, some terms aren't completely agreed upon or utilized similarly by users.

A

Airy: A spacious and open sound in the high-treble (from 11kHz-20kHz).
Analytical: A bright sound that’s very detailed. Ensures that vocals, instruments, and sibilants stand out from the mix. However, some people may find this sound fatiguing.
Articulate: Clear vocal and instrument detail.

B

Bassy: Emphasized bass from the low-bass to the low-mid (20Hz-500Hz). Mixes have extra thump, punch, and boom, good for bass-heavy genres like EDM and hip-hop.
Balance: 1) Related to tuning the sound profile: when a sound is well-balanced, all frequency ranges have similar levels, and one range doesn't dominate another. 2) Related to the soundstage: refers to how centered the soundstage is. Sound is well-placed and isn't skewed to the left or right.
Bloat: Excessive amplitude around the high-bass to low-mid (250Hz). It isn't very pleasing to the ear since it negatively affects the presence of vocals and instruments.
Boomy: Excessive amplitude around the mid to high-bass (125Hz). Some people may like this when it comes to gaming since it can help emphasize sound effects like gunshots.
Boxy: An over-emphasis around low-mid (250-500Hz). This sound has resonances as if the music is enclosed in a box.
Bright: Overemphasized response in the treble range. It's not necessarily bad; it can help make vocals and instruments sound clearer in the mix.

C

Cool: The progressive attenuation of frequencies below the high-bass (150Hz). Mixes lack body and warmth and sound light on bass.
Clean: Audio with clear and accurate reproduction of vocals and instruments. There are no distortion or compression artifacts.
Clear/clarity: Adequate high-mid to low-treble (1kHz-5kHz). Details like the articulation of woodwind instruments are easy to hear in the mix.
Cluttered: Related to muddy. An overemphasis of low-mid (200-500Hz) that makes the mix sound less clear.
Clipping: Distortion artifacts like a hiss, click, or skip that are overly pronounced in your audio. It usually occurs when you push your audio too loud.
Crisp: Related to clean. Audio reproduced with clarity and accuracy. Lacks any distortion or noise.
Compression artifacts: Noise that appears when there's too much dynamic range compression. The greater the difference in frequency response between moderate and high volume, the more likely you are to experience compression artifacts like pumping and hissing.
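In signal terms, clipping generally means a waveform gets flattened at the maximum level a system can handle, which is why it tends to show up when you push the volume too high. The Python sketch below illustrates that idea with a sine wave. The helper names (`sine_wave`, `hard_clip`), the 48kHz sample rate, and the limit of 1.0 are assumptions chosen for the example.

```python
import math


def sine_wave(freq_hz, amplitude=1.0, sample_rate=48000, n_samples=480):
    """Generate a sine wave as a list of float samples."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(n_samples)
    ]


def hard_clip(samples, limit=1.0):
    """Flatten any sample that exceeds +/- limit (hard clipping)."""
    return [max(-limit, min(limit, s)) for s in samples]


# A 1kHz tone pushed to 1.5x the level the (hypothetical) system can handle.
loud = sine_wave(1000, amplitude=1.5)
clipped = hard_clip(loud)

# Count how many samples were flattened by the limit.
flattened = sum(1 for before, after in zip(loud, clipped) if before != after)
print(f"{flattened} of {len(loud)} samples were clipped")
```

A tone that stays under the limit passes through unchanged, while the louder one has its peaks cut off, and that flattening is what you hear as distortion.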

D

Dark/Dull: Underemphasized low-treble (2kHz-5kHz), creating a tonal balance that tilts downwards. This results in vocals and instruments that lack detail and clarity.
Delicate: A smooth treble range extending to the high-treble (15-20kHz) without peaks. The treble is pleasing to the ear.
Detail: The delicate and subtle parts of the audio, like how an instrument reverberates in a room or a musician's breathing. Detail is easy to lose in mixes. Detail is related to an adequate low-treble (2kHz-5kHz), which isn't overemphasized.
Depth: The perceived distance of sounds away from the listener. A lot of depth makes sounds feel like they're coming from around you. When the depth is shallower, audio feels like it's coming from inside your head.
Directivity: Related to soundstage. Refers to whether a speaker is front-facing or omnidirectional. An omnidirectional speaker provides a consistent sound from all angles, while a front-facing speaker is inconsistent at certain angles.
Distortion: Unwanted frequencies that play at the same time as desired frequencies, usually at high volumes. This results in an unclear and impure sound. Examples of distortion are clipping and hissing.

F

Forward: Overemphasized mid-mid (500Hz-1kHz). This pushes vocals and instruments to the front of the mix above the rest. If there's a lot of overemphasis, it can also sound aggressive.
Full/Full-Bodied: Strong fundamentals relative to harmonics. Adequate mid-bass to low-mid (around 100-300Hz) response. Male voices feel full in the high-bass range (around 125Hz), while female voices and some instruments like violins are full in the high-bass to low-mid (around 250Hz).

G

Gentle: Treble range is a bit underemphasized or not exaggerated. The underemphasis isn't enough to weaken the detail of vocals and instruments, but it rounds off their edges.

H

Hard: Too much overemphasis in the low-treble (around 3kHz), which makes the upper harmonics of vocals and instruments much more present and detailed compared to the rest of the response.
Harsh: Peaks in the low to mid-treble (around 2 and 6kHz). The upper harmonics of vocals and instruments have overemphasized detail and articulation, which is unpleasant to hear.
Hissing: A kind of distortion artifact that sounds like a sizzling noise.
Hollow: A recessed mid-range that hurts the presence and clarity of vocals and instruments, making them sound thinned out and pushed to the back of your mix.
Honky: Over-emphasized mid to high-mid (500Hz-2kHz) ranges. Vocals and instruments are pushed forward in the mix, further emphasizing their intensity and clarity.

L

Laid-back: Slightly underemphasized mid-range, making vocals and instruments less present and clear in the mix. This can be desired when paired with a strong bass. The term is also related to hollow, which is more extreme and hurts the presence and clarity of vocals and instruments.
Lush: Strong mid-bass to low-mid (around 100-300Hz). It's pleasing to the ear as bass instruments, like drums, sound full-bodied and warm.

M

Mellow: Slightly underemphasized high frequencies, but doesn't completely weaken vocals, instruments, or sibilants. Vocals and instruments still sound present but soft.
Muddy: Overemphasized bass, which negatively affects vocals and instruments, rendering the mix unclear.
Muffled: Weak high frequencies. Vocals and instruments in the treble range lack detail and clarity, making them unclear or fuzzy.

N

Narrow: The perceived lack of distance of sounds in the soundstage. It feels like sound is just coming from in front of you rather than all around you. The opposite of this term is wide.
Nasal: A bump in the mid-mid (around 600Hz), pushing forward vocals and instruments ahead of the rest of the mix.
Natural: Realistic or true-to-life sound. You can use this term to describe any sound that's accurately reproduced but tends to be used around vocals and instruments in and around the mid-range.

O

Open: The depth and width of the sound as it relates to the soundstage. Also refers to a perception of wide space between instruments, which helps them sound more distinct in the mix.

P

Piercing: Narrow peaks in the low to mid-treble (around 3-10kHz). Can hurt to hear.
Punchy: 1) Overemphasis in high-bass (around 200Hz), which makes the fundamentals of most instruments stand out. 2) Overemphasis between the low to mid-treble (around 5kHz). The upper harmonics of vocals and instruments are detailed and crisp.
Presence: Dependent on the low to mid-mid (around 220Hz-1kHz). Vocals and instruments that are clearly defined in the mix and aren’t cluttered by other sounds.
Pumping: Caused by compression, this artifact makes audio content quickly get louder and then quieter.

R

Recessed: When a range is underemphasized compared to other ranges.
Reverberation: Sound that continues, even though the audio source has stopped.
Roll-off: Usually occurs at the beginning or end of the frequency response as it tapers off and becomes underemphasized. Bass roll-off can cause tracks to feel light on thump, rumble, and boom. In contrast, treble roll-off makes mixes sound veiled and dark.
Rumbling: Balanced low-bass (35Hz range and below). Genres like EDM and hip-hop produce these sounds, which can be mostly felt and not heard.
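Bass roll-off, as described in the Roll-off entry above, can be estimated from a list of frequency and level pairs. The Python sketch below finds the lowest frequency that stays within a set drop of a reference level. The function name `low_frequency_extension`, the 3 dB drop (a common convention, but an assumption here), and the response data are all hypothetical.

```python
def low_frequency_extension(response, reference_db=0.0, drop_db=3.0):
    """Return the lowest frequency (Hz) whose level is within drop_db
    of reference_db, scanning from the lowest frequency upward.

    response is a list of (frequency_hz, level_db) pairs.
    Returns None if no point is within the allowed drop.
    """
    for freq_hz, level_db in sorted(response):
        if level_db >= reference_db - drop_db:
            return freq_hz
    return None


# Hypothetical measurements for a small speaker that rolls off in the bass.
small_speaker = [
    (20, -12.0),
    (40, -7.0),
    (60, -3.5),
    (80, -2.0),
    (100, -0.5),
    (200, 0.0),
]

print(low_frequency_extension(small_speaker))
```

The higher this number, the more thump and rumble a speaker is likely to lose.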

S

Shrill: Very sharp mid-treble (around 5kHz-10kHz), which makes sibilants like S and T sounds painful to hear.
Sibilant: Related to sibilants, which are a kind of consonant with high amplitude and pitch, like S and T sounds. This sound quality comes from an overemphasized mid-treble (around 5kHz-10kHz), making sibilants and instruments like cymbals sound piercing.
Smooth: Flat response, especially in the mid-range, where vocals and instruments reside. This sound is easy on the ears.
Sparkle: Overemphasized treble response that feels energetic and positive.
Sweet: Flat treble response with low distortion. No peaks in the response, so it doesn't feel piercing or harsh.

T

Thumpy: Over-emphasized low-bass (35Hz range and below). It's mostly felt rather than heard.
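Timbre: The tonal character of a note. It's what lets you tell apart two sources playing the same pitch at the same loudness, like a piano and a violin.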

V

Veiled: Recessed low-treble (around 2kHz-5kHz), which weakens the details and clarity of vocals and instruments, making them sound softer and rounder.

W

Warm: Less emphasis in the treble range, but more emphasis in the high-bass to mid-range. Warm mixes pack extra bass without completely overwhelming vocals and instruments. When there's too much warmth, it can start to sound dark since the overemphasis in the bass to mid-range is much more prominent than the treble range.
Weighty: Good frequency response in the low-bass (below 50Hz). A sense of substance produced by deep and controlled bass.
Wide: Related to soundstage. A wide soundstage makes it seem like audio is coming from all around you rather than just in front of you. The opposite of this term is narrow.
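Since many of the glossary entries above tie a descriptor to a frequency range, the ranges can be collected into a small lookup. The Python sketch below lists which overemphasis terms could apply to a peak at a given frequency. The ranges come from the definitions above, but the table name `OVEREMPHASIS_TERMS`, the function `terms_for_peak`, and the choice to treat each range as a hard boundary are assumptions. In practice, these ranges are approximate and vary between listeners.

```python
# Overemphasis descriptors and their approximate ranges in Hz,
# taken from the glossary entries above.
OVEREMPHASIS_TERMS = {
    "bassy": (20, 500),
    "boxy": (250, 500),
    "cluttered": (200, 500),
    "forward": (500, 1000),
    "honky": (500, 2000),
    "piercing": (3000, 10000),
    "shrill": (5000, 10000),
    "sibilant": (5000, 10000),
}


def terms_for_peak(freq_hz):
    """Return the descriptors whose range contains freq_hz, sorted by name."""
    return sorted(
        term
        for term, (low_hz, high_hz) in OVEREMPHASIS_TERMS.items()
        if low_hz <= freq_hz <= high_hz
    )


print(terms_for_peak(300))
print(terms_for_peak(7000))
```

Overlapping results are expected: as noted earlier, these terms are related and are often used in combination.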

Conclusion

There are a lot of ways to describe sound. While there has been work done to standardize audio terminology, a variety of terms are still used by the community. It's important to take written descriptions of sound with a grain of salt. Ultimately, your preferences will decide whether a speaker's sound quality suits your needs.
