A song, an algorithm, and the end of the analog world

There's a moment in the history of technology when everything changes. We don't always recognise it. Sometimes it takes years to understand that a small spark, an apparently insignificant detail, ignited a revolution that would forever change the way we live, communicate, and consume culture. In 1987, an American singer-songwriter named Suzanne Vega released a minimalist track called “Tom's Diner”. Two minutes and nine seconds of a cappella vocals, no instrumental accompaniment, no special effects. Just a voice telling the story of an ordinary morning in a New York diner. A song so essential, so pure in its simplicity, that someone on the other side of the world – a German engineer obsessed with #audio compression – would use it as a benchmark to create a technology that would shake the global music industry to its core. That technology was called #MP3. And that voice, that “warm a cappella voice” as Karlheinz Brandenburg would later describe it, would become the ultimate test to determine whether a compression algorithm actually worked or not.

This is the story – part documented reality, part urban legend – of how a folk song became the unwitting mother of the greatest revolution in music distribution since vinyl. A story that has always fascinated me because it contains all the contradictions of our digital age: innovation and destruction, democratization and loss of quality, openness and control. And yes, it's also because I've always had a soft spot for stories that intertwine in unexpected ways. Perhaps because I too, during my years in radio, saw first-hand what it means to work with audio, manipulate it, compress it, broadcast it. Perhaps because, like many of us who lived through the transition from analog to digital, I still carry the memory of those first MP3 collections downloaded via a 56k modem (crimes do become time-barred after 20 years, right?). But above all, this story fascinates me because it reminds us that behind every technological innovation there's always a human element: a voice, an aesthetic choice, an obsession. And in the case of MP3, that human element was precisely Suzanne Vega's voice singing about coffee and rain on a November morning.

Late 1980s: the race for compression

To understand how “Tom's Diner” ended up in the laboratories of the #Fraunhofer Institute, we need to step back and understand what was happening in the world of digital audio in the late 1980s. The CD had reached the shops in 1982, bringing the promise of perfect audio quality, crystalline, immune to scratches and the wear of time. But there was a massive problem: digital audio files were enormous. A three-minute stereo song, encoded in PCM (Pulse-Code Modulation) format at 44.1 kHz and 16 bits, occupied around 30-35 megabytes. An entire album? Over 600 megabytes.
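Those figures are easy to check by hand. Here is a minimal Python sketch, assuming stereo audio (two channels, as on a CD), decimal megabytes, and a 60-minute album; the function name is mine, purely for illustration:

```python
def pcm_size_bytes(seconds, sample_rate=44_100, bits_per_sample=16, channels=2):
    """Size of raw, uncompressed PCM audio in bytes (no file header)."""
    return seconds * sample_rate * channels * bits_per_sample // 8

song = pcm_size_bytes(3 * 60)     # a three-minute song
album = pcm_size_bytes(60 * 60)   # assumption: a 60-minute album

print(f"song:  {song / 1e6:.1f} MB")
print(f"album: {album / 1e6:.1f} MB")
```

The song lands at just under 32 MB and the album at about 635 MB, which is roughly the capacity of a CD.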

To put this in perspective: in the 1980s, the portable listening revolution was the Sony Walkman, which played analog cassettes. With the arrival of CDs, Sony launched the Discman, but these portable CD players were bulky, drained batteries, and skipped at the slightest movement. The idea of carrying an entire record collection was still science fiction.

In an era when a 40MB hard drive was considered gigantic, these numbers were simply impractical. You couldn't think of transmitting music via the internet – which was still an academic and military network – nor of efficiently archiving it on home computers. A radical solution was needed: audio had to be compressed while maintaining acceptable quality. This is where the small city of Erlangen, in Bavaria, enters the scene. Not exactly Silicon Valley, but a German town with a long tradition of scientific excellence. Here was the headquarters of the Fraunhofer Institute for Integrated Circuits, a research centre that would forever change the way we listen to music. The team was led by a man named Dieter Seitzer, who had worked for years on psychoacoustics – that branch of science studying how humans perceive sounds. Seitzer had a vision: to find a way to transmit high-quality music through ISDN telephone lines. It seemed like science fiction, but his doctoral student, a young engineer named Karlheinz Brandenburg, was convinced it was possible. The underlying idea was elegant in its simplicity: the human ear isn't perfect. There are frequencies we don't hear, sounds that get “masked” by louder ones, sonic details that our brain simply discards. Why waste disk space for information we can't perceive anyway?

The goal, therefore, was to create an algorithm that eliminated everything the human ear couldn't distinguish, reducing an audio file to a tenth of its original size without the average listener noticing the difference. But the competition was fierce. In 1989, when the Moving Picture Experts Group (MPEG) – the international standardisation organisation – issued a call for audio codec proposals, 14 candidates arrived from around the world. Among them were AT&T Bell Labs in the United States, Thomson in France, Philips in the Netherlands, and naturally the Erlangen team with their algorithm called ASPEC (Adaptive Spectral Perceptual Entropy Coding). It was a race where whoever demonstrated the most efficient algorithm won: maximum compression, minimum perceptible quality loss. And to prove it, tests were needed. Many tests. Obsessive, maniacal tests, repeated hundreds, thousands of times. In other words, a reference song was needed. A song that would put the algorithm to the most ruthless test possible.

Why that voice?

Several versions exist of how Brandenburg discovered “Tom's Diner”. In one interview, he tells of hearing it on the radio while walking down a corridor. In another, he says he read about this song in a hi-fi magazine that used it to test high-quality speakers. The stories change, overlap, contradict each other. Brandenburg himself has given different versions over the years. But one thing is certain: when he heard that voice, he immediately knew he had found his ultimate test.

“I was ready to fine-tune my compression algorithm,” Brandenburg recalls in a 2009 interview, “and somewhere down the corridor a radio was playing Tom's Diner. I was electrified. I knew it would be nearly impossible to compress this warm a cappella voice.”

And it's precisely in that phrase – “nearly impossible” – that you understand the challenge. The human voice is the most difficult instrument to compress. Evolutionarily, our ears are optimised to recognise voices. We evolved to hear nuances, emotions, the micro-tonal variations that distinguish one person from another, that tell us if someone is happy or sad, sincere or lying. Voice is the primary interface of human communication, and our brain has developed sophisticated mechanisms to analyse it. For this reason, any artifact, any distortion introduced by compression, immediately jumps out when dealing with voice. If MP3 could faithfully reproduce Suzanne Vega's voice, then it could handle anything.

But why “Tom's Diner” specifically? What made this song so special?

First: it's an a cappella recording. There are no instruments to mask or distract. There's no powerful bass covering the low frequencies, no electric guitars filling the mid-range. It's just voice. Naked, exposed, with nowhere to hide. Second: it's an exceptionally high-quality recording. It was recorded at A&M Records studio with professional equipment, meaning it captures all the nuances, all the breaths, all the details of Vega's performance. There's no background noise that might mask compression artifacts. Third: Suzanne Vega's voice has a particular timbre – warm, intimate, with that touch of huskiness that makes it instantly recognisable. It has an interesting dynamic range, with more whispered passages and more assertive ones. It is, in essence, an acoustically “complex” voice.

Brandenburg began working obsessively on that song. He listened to it hundreds of times a day, modifying the algorithm, listening again, modifying again. It was an exhausting, maniacal process. Every time he made a change to the code, he had to listen again to verify whether the result was acceptable or not. The problem was that where instrumental music still sounded acceptable, the voice became a disaster.

Brandenburg had to keep refining, optimising, adjusting the algorithm until that voice sounded good, until he managed to capture that warmth, that intimacy, that human quality that made “Tom's Diner” so special. To be fair, “Tom's Diner” wasn't the only song used in testing. Brandenburg and his team also used other tracks: “Mountains O' Things” by Tracy Chapman, “In All Languages” by Ornette Coleman, “Diamonds on the Soles of Her Shoes” by Paul Simon. James Johnston, from the AT&T team working on a competing algorithm, also used some of these tracks. But “Tom's Diner” became the symbol, the ultimate test, the benchmark. If the algorithm could reproduce that voice, it could reproduce anything.

1992: the MPEG Audio Layer-3 Standard is born

The hard work paid off. In 1992, after years of comparative testing conducted by independent institutes, the MPEG committee approved the MPEG-1 Audio Layer-3 standard. Brandenburg's team had won the competition. Their algorithm had proven superior to the others, capable of compressing audio by a factor of 10-12 while maintaining quality that most listeners judged “indistinguishable” from the original. But no one, at that moment, could imagine what was about to happen. MPEG-1 included three audio encoding layers: Layer-1, Layer-2, and Layer-3. Layer-3 was the most complex and most efficient, but also the most computationally demanding. In the early 1990s, home computers were still too slow to encode audio in Layer-3 in real time. It was cutting-edge technology, but without immediate practical applications. Layer-2, simpler and less efficient, was adopted for Digital Audio Broadcasting (DAB) in Europe. It seemed that Layer-3 – what would later become MP3 – was destined for a marginal role, a technical curiosity for audiophiles with powerful computers.

Brandenburg himself had already developed a successor called Advanced Audio Coding (AAC), which was even more efficient than MP3. It seemed Layer-3 was destined for oblivion before it even took off. And then 1995 arrived. Two things changed everything: the World Wide Web and Windows 95. The Web was exploding. Suddenly, millions of people had internet access and wanted to share things: images, texts, and naturally, music. But connections were incredibly slow – 28.8k modems, if you were lucky, that took hours to download files of just a few megabytes. A format was needed that allowed music sharing in reasonable sizes. Windows 95 brought increasingly powerful computers into millions of homes, with processors capable of decoding compressed audio in real time. And, crucially, Windows used three-character file extensions to identify file types. On 14 July 1995, with a simple internal email at the Fraunhofer Institute, Layer-3 got its definitive name: .mp3

Date: Fri, 14 Jul 1995 12:29:49 +0200
Subject: File extension for Layer 3: .mp3
Hello, In light of the overwhelming consensus of the survey participants, 
the file extension for ISO MPEG Audio Layer 3 is .mp3

Three letters that would change the history of music.

But MP3 still needed a catalyst to take off. That catalyst arrived in the form of software. Brandenburg and his team, perhaps sensing the possibilities, perhaps just to experiment, developed a software player for Windows. They released it for free. Other developers began creating MP3 encoders, some legal with Fraunhofer licenses, others less so. The format spread virally, completely beyond its creators' control. And when #Napster arrived in 1999 – the peer-to-peer file sharing service – MP3 became the standard format for large-scale music piracy. The record industry, caught completely off guard, cried scandal. Metallica protested (anyone who remembers that period raise your hand...). But it was too late. The genie was out of the bottle.

The Irony: A Lossy Technology to Democratise Music

There's a profound irony in all this. MP3 is a “lossy” technology – with loss of information. Every time you compress an audio file to MP3, data is lost. Permanently. It's not reversible. An MP3, technically speaking, is a degraded version of the original. Yet this “imperfect” technology democratised access to music in a way no one could have predicted. It made it possible to have an entire record collection in your pocket. It allowed millions of people to discover artists they would never have listened to otherwise. It gave independent artists the ability to distribute their music without needing record labels. Brandenburg himself always had mixed feelings about MP3's success. On one hand, he was proud that his technology had had such an enormous impact. On the other, he was frustrated that many people used low bitrates – 128 kbps or less – that produced obvious sonic artifacts.

MP3 at 320 kbps sounded excellent, practically indistinguishable from the original for most listeners. But for reasons of space and download speed, many settled for lower quality. And then there was the piracy question. Brandenburg had never imagined his technology would be used primarily to violate copyright on an industrial scale. The Fraunhofer team had worked for years on copy protection systems, DRM, digital watermarking. But none of these technologies were ever effectively implemented in the MP3 ecosystem that developed in the wild (but beautiful) west of the internet at the end of the '90s. In a 1994 interview, Ricky Adar – an Indo-British entrepreneur – said to Brandenburg: “Do you know that you will destroy the music industry?”

Brandenburg, at the time, thought it was an exaggeration. It wasn't. MP3 didn't destroy the music industry in the literal sense – music still exists, artists continue to create, people continue to listen. But it radically transformed it. The business model based on selling physical albums collapsed. Record labels lost their power, only to reorganise and regain it in subsequent years. Distribution became democratised. And all this thanks to a mathematical formula that eliminated frequencies the human ear struggles to perceive.

How MP3 compression actually works

Behind the “magic” of MP3 lies solid mathematics. The algorithm is based on four fundamental pillars:

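MDCT Transform The audio signal is cut into frames of 1152 samples, each split into two granules of 576 samples, and every granule is transformed from the time domain to the frequency domain (a polyphase filter bank followed by the MDCT). Basically, instead of having a waveform, we get a spectrum of 576 frequency lines.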

Psychoacoustics The algorithm calculates which frequencies are “masked” by louder ones. Example: if there's a very powerful drum at 100 Hz, our ear won't hear a weak sound at 110 Hz. Why waste bits encoding it? The psychoacoustic model groups the spectrum into critical bands (about two dozen on the Bark scale) that approximate the frequency resolution of the human ear; the often-quoted figure of 32 belongs to the equal-width subbands of the filter bank, not to the ear.

Quantisation The “important” frequencies (those we hear) are encoded with more bits. Those masked or barely audible are coarsely quantised or eliminated entirely. A sound at 15 kHz, almost at the limit of audibility, might be represented with 2-3 bits instead of 16.

Huffman Coding The already compressed data is further compressed with entropy coding. More frequent patterns get shorter codes.
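To make the last two pillars a little more concrete, here is a toy Python sketch of coarse quantisation followed by Huffman coding. It is not the MP3 algorithm: a real encoder uses a non-uniform quantiser, picks its step sizes from the psychoacoustic model, and chooses among Huffman tables fixed by the standard. The spectrum values, the step size, and all function names below are made up for illustration.

```python
import heapq
from collections import Counter

def quantise(values, step):
    """Coarse uniform quantisation: keep only the nearest multiple of `step`."""
    return [round(v / step) for v in values]

def dequantise(indices, step):
    """Best-effort reconstruction; the rounding error is gone for good."""
    return [i * step for i in indices]

def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

# Made-up spectral coefficients: a few strong lines, many weak ones.
spectrum = [0.9, -0.4, 0.05, 0.02, -0.03, 0.6, 0.01, 0.0, -0.02, 0.04, 0.03, -0.01]
step = 0.25  # assumed step size; bigger means smaller files and more distortion

indices = quantise(spectrum, step)
restored = dequantise(indices, step)
code = huffman_code(indices)
bitstream = "".join(code[i] for i in indices)

print(indices)
print(code)
print(len(bitstream), "bits instead of", 16 * len(spectrum))
```

Two things are worth noticing: all the weak coefficients collapse to zero, so the most common symbol costs a single bit, and `restored` no longer matches `spectrum`. That second point is what “lossy” means in practice.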

Numerical result:

PCM audio: 44100 samples/sec × 16 bits × 2 channels = 1411.2 kbps
MP3 at 128 kbps: compression ratio 11:1
MP3 at 320 kbps: compression ratio 4.4:1
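The same ratios, plus what they mean for an actual file, in a few lines of Python. This is a sketch assuming constant-bitrate MP3 and decimal megabytes; the helper names are illustrative:

```python
PCM_KBPS = 44_100 * 16 * 2 / 1000  # CD-quality stereo PCM: 1411.2 kbps

def compression_ratio(mp3_kbps):
    """How many times smaller the MP3 stream is than the PCM stream."""
    return PCM_KBPS / mp3_kbps

def file_size_mb(kbps, seconds):
    """File size in decimal megabytes at a constant bitrate."""
    return kbps * 1000 * seconds / 8 / 1e6

for kbps in (128, 320):
    print(f"{kbps} kbps: {compression_ratio(kbps):.1f}:1, "
          f"three-minute song = {file_size_mb(kbps, 180):.2f} MB")
```

At 128 kbps a three-minute song shrinks from roughly 32 MB to under 3 MB, which is the difference between unthinkable and merely slow on a dial-up modem.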

Suzanne Vega discovers she's the mother of MP3s

For years, Suzanne Vega had no idea of the role her song had played in MP3 development. It was the year 2000. Vega, by then an established artist with a consolidated career, was taking her daughter to nursery school. A father approached and congratulated her on being “the mother of the MP3”. Vega had no idea what he was talking about. The man explained he had read an article – hyperbolically titled “Ich Bin Ein Paradigm Shifter: The MP3 Format is a Product of Suzanne Vega's Voice and This Man's Ears” – that recounted how Brandenburg had used “Tom's Diner” to develop the compression algorithm. Vega was astonished. Her song, that small intimate track she had written in the 1980s while attending Barnard College, had become a fundamental piece in the history of digital technology.

In 2007, Vega was invited to the Fraunhofer Institute in Erlangen. Brandenburg and his team played her how “Tom's Diner” sounded in the early versions of the algorithm, before it was refined. It was, in Brandenburg's own words, “horrible”. The voice was distorted, full of artifacts, almost unrecognisable. They then showed her how they had worked for months, iteration after iteration, to capture that vocal quality that made the track special. They explained the psychoacoustics, the listening tests, the obsession with detail. Vega, who had always been attentive to the quality of her recordings, appreciated the irony: a song recorded with maniacal care had helped develop a compression technology that, in a sense, sacrificed part of that quality for practical reasons.

And there's another irony in this story. In 2012, Vega was invited to the Thomas Edison National Historical Park in New Jersey. There, she sang “Tom's Diner” – the song that had become the symbol of the digital revolution – recording it onto an Edison cylinder, one of the oldest and most analog recording technologies in existence. It was a symbolic gesture: bringing the song back to its analog roots, recording it with technology that predated even vinyl by decades. And naturally, someone took that Edison cylinder recording and converted it to MP3, closing the circle in a way that only modern technology could allow. The Museum of Portable Sound made that MP3 file available – an analog wax recording of the track that defined digital audio compression – as a gift for enthusiasts. An act that symbolically connects the Edison era to the Spotify era.

From Walkman to Spotify, via iPod

Before the iPod: for twenty years, from 1979, the Sony Walkman had dominated portable listening. First with cassettes, then with the Discman for CDs. But you always had a physical limit: one cassette, one CD at a time. Pre-iPod MP3 players – like the MPMan F10 of 1998 – promised to solve this problem, but with only 32MB of storage (about 8 songs at 128kbps) they were little more than technological curiosities.
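The “about 8 songs” figure can be checked the same way. A small Python sketch, assuming decimal megabytes, constant-bitrate files, and an average song length of four minutes (my assumption, not a spec of the player):

```python
def playback_seconds(capacity_mb, kbps):
    """Seconds of audio that fit in `capacity_mb` megabytes at `kbps`."""
    return capacity_mb * 1e6 * 8 / (kbps * 1000)

minutes = playback_seconds(32, 128) / 60
songs = minutes / 4  # assumption: an average song lasts 4 minutes

print(f"{minutes:.1f} minutes of music, about {songs:.0f} songs")
```

A little over half an hour of music: less than a single cassette held.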

1999: Napster arrives. Shawn Fanning, a nineteen-year-old student, creates software that allows MP3 files to be shared directly between users, with central servers acting only as an index of who has which file. Within months, millions of people are downloading music for free. The record industry panics. Lawsuits follow, court battles. Napster is shut down in 2001, but it's too late. The model has been established: music can circulate freely online.

2001: Apple launches the iPod. “1000 songs in your pocket” is the slogan. The definitive MP3 player, elegant, with an intuitive interface. The iPod wasn't the first MP3 player – there were already dozens on the market – but it was the one that made the idea mainstream. Suddenly, having your entire music collection in your pocket wasn't a nerd's dream anymore, it was a consumer reality.

2003: Apple launches the iTunes Music Store. Finally, a legal way to buy digital music. 99 cents per song, reasonable quality, DRM (FairPlay) that mostly stays out of the way. It doesn't solve the piracy problem, but it offers a valid alternative. Within a few years, iTunes becomes the world's largest music retailer.

2008: Spotify launches in Sweden. A new model: streaming, not downloading. Unlimited access to millions of tracks for a monthly fee (or free with ads). The MP3 as a file to own slowly begins to become obsolete. Why have files on your hard drive when you can have instant access to everything?

2017: the last MP3 patents expire. The Fraunhofer Institute ends its MP3 licensing programme, which the press promptly reports as the “death” of MP3, and points to more modern codecs like AAC and MPEG-H. But it's a purely technical death: MP3 continues to be used everywhere, a legacy format that will probably never completely die.

Throughout all these years, Fraunhofer earned hundreds of millions of euros in royalties from MP3 patents. That money was reinvested in research, creating new generations of ever more efficient audio codecs: AAC (used by Apple), MPEG-H (for immersive audio), EVS (for 5G calls). Brandenburg, who in 2000 received the prestigious “Deutscher Zukunftspreis” (the German innovation prize), never stopped. Today he leads Brandenburg Labs, a startup working on advanced audio technologies like immersive audio for headphones, trying to create sonic experiences indistinguishable from reality. The original Fraunhofer team – Brandenburg, Bernhard Grill, Jürgen Herre, Harald Popp, Ernst Eberlein – has been awarded prizes and recognition worldwide. They've entered the Internet Hall of Fame. The CE Hall of Fame. The German Research Hall of Fame. But perhaps the most significant recognition is the simplest: go to any corner of the world, ask someone of any age what an “MP3” is, and they'll know. A format that defined an entire era of digital culture.

FLAC, OGG, vinyl, and the return of quality

And here we arrive at one of the most interesting parts of this story. Because not everyone embraced MP3. Not everyone embraced streaming. Not everyone settled for convenience at the expense of freedom and control. In the 2000s, while MP3 dominated and Fraunhofer profited from patents, there was already a counterculture growing silently.

#OGG Vorbis – released in 2000 by the Xiph.Org Foundation – was the open source community's response to the MP3 monopoly. While Fraunhofer and Thomson required licenses and royalties for MP3 encoders, OGG was completely free, without patents, without restrictions. Not only that: at the same bitrate, OGG often offered quality superior to MP3. It was technically better and philosophically consistent with free software ethics. For those who believed in open source, for those who rejected the idea of paying royalties on an audio format, for those who wanted full control over their tools, OGG became the format of choice. It wasn't just a technical matter: it was a matter of principle. The same spirit that had animated the free software movement in the 1980s – the GPL, the Free Software Foundation, all of Stallman's work – now extended to the world of audio codecs.

And then there were those who completely rejected lossy compression. #FLAC – Free Lossless Audio Codec, released in 2001 – offered compression without data loss. Larger files, sure, but decoding to audio bit-for-bit identical to the original. For the most uncompromising audiophiles, FLAC was the only acceptable choice. But it wasn't just about digital formats. Just as digital seemed to have won, vinyl records began making a comeback. Sales, which had collapsed in the '90s and 2000s, started growing again. In 2020, for the first time since the 1980s, vinyl brought in more revenue than CDs in the United States.

Nostalgia, certainly. The charm of the physical object, the large cover, the ritual of putting the record on the turntable, certainly. But there's also a “visceral” element: owning a vinyl, or a CD, means owning something real, tangible. Something that can't be deleted from a server, revoked by a streaming service, lost in a hard drive crash.

For years now, I have chosen to stay out of streaming services. I buy physical CDs (almost always used), rip them to OGG, tag them properly, and put them on my FreeBSD NAS with ZFS. My #Navidrome server, which reads them over NFS, does the rest. I've chosen to maintain control over my data, to favour a free and open source format over proprietary convenience. It's a choice that requires time (and a few scattered curses...), hard drives to manage, docker compose files to update, backups to make, players to configure. But it's also a choice that gives me a sense of ownership and control that streaming cannot provide.

There's an irony in all this: the technology that “Tom's Diner” helped create – MP3, lossy compression, the idea that “good enough” is sufficient – triggered two types of resistance. Those who rejected it for quality reasons (audiophiles with FLAC), and those who rejected it for freedom reasons (the open source community with OGG). And often, these two souls overlapped.

But this choice is only possible because hard drives have become enormous, internet connections fast, storage cheap. The same technologies that made MP3 obsolete have made it possible to collect OGG or FLAC without thinking twice. In a sense, MP3 created the conditions for its own obsolescence – and for the birth of freer and often better alternatives.

Some Lessons to Take Away

This story has taught us several things. It taught us that convenience often beats perfection. It taught us that technologies developed for one purpose (professional transmission via ISDN) can end up being used in completely different ways (mass file sharing). It taught us that established industries can be disrupted by technologies that initially seem marginal or niche. But perhaps the most important lesson is this: technology is always, at its core, a human matter. MP3 isn't just a mathematical algorithm. It's Suzanne Vega's voice singing about coffee and rain.

I am sitting in the morning
At the diner on the corner
I am waiting at the counter
For the man to pour the coffee

It's Brandenburg's obsession with capturing that warm vocal tonality. We are living, in other words, with the consequences of those thousands of repeated listens to “Tom's Diner”, of that obsession with detail, of that search for perfect compression.

And if Suzanne Vega hadn't written that song? If Brandenburg had chosen another track for his tests? Probably MP3 would have been developed anyway. The technology was in the air, the problem of audio compression had to be solved. But perhaps it would have taken longer. Perhaps the algorithm would have been slightly different. Perhaps history would have taken a different turn.

I like to think that technological progress is inevitable, deterministic, that it follows an unstoppable internal logic. But stories like this remind us how random it is, how much it depends on individual choices, on coincidences.

And now, if you'll excuse me, I'm going to update the latest release of Navidrome on my Proxmox server. With Docker, obviously.

#MP3 #DigitalAudio #SuzanneVega #TomsDiner #Fraunhofer #MusicHistory #AudioCompression #OpenSource #FLAC #TechHistory
