MP3 is a popular digital audio encoding and lossy compression format. It was designed to greatly reduce the amount of data (10:1 compression is common) required to represent audio, yet still sound like a faithful reproduction of the original uncompressed audio to most listeners. In popular usage, MP3 also refers to files of sound or music recordings stored in the MP3 format on computers.
The name is derived from "MPEG-1 Audio Layer 3", more formally known as "MPEG-1 Part 3 Layer 3" or "ISO/IEC 11172–3 Layer 3". Reportedly, the ".mp3" filename extension is also sometimes used on audio files encoded using the newer "MPEG-2 Audio Layer 3" standard (a.k.a. "MPEG-2 Part 3 Layer 3" or "ISO/IEC 13818–3 Layer 3").
MP3 is a lossy compression format. It provides a representation of pulse-code modulation-encoded (PCM) audio data in a much smaller size by discarding portions that are considered "less important" to human hearing. (This is similar in concept to JPEG lossy compression for images.) A number of techniques are employed in MP3 to determine what portions of the audio can be discarded, including psychoacoustics. MP3 audio can be compressed with different bit rates, providing a range of tradeoffs between data size and sound quality.
MPEG-1 Audio Layer 2 encoding started life as the Digital Audio Broadcast (DAB) project initiated by the Fraunhofer Society. This project was financed by the European Union as a part of the EUREKA research program where it was commonly known as EU-147.
EU-147 ran from 1987 to 1994. In 1991, there were two proposals available: Musicam (known as Layer II) and ASPEC (Adaptive Spectral Perceptual Entropy Coding) with similarities to MP3. Musicam was chosen due to its simplicity and error robustness.
A working group around Karlheinz Brandenburg and Jürgen Herre took ideas from Musicam and ASPEC, added some of their own ideas and created MP3, which was designed to achieve the same quality at 128 kbit/s as MP2 at 192 kbit/s.
Both algorithms were finalized in 1992 as part of MPEG-1, the first standard suite by MPEG, which resulted in the international standard ISO/IEC 11172–3, published in 1993. Further work on MPEG audio was finalized in 1994 as part of the second suite of MPEG standards, MPEG-2, more formally known as international standard ISO/IEC 13818–3, originally published in 1995.
Compression efficiency of lossy compression encoders is typically defined by the bit rate, because the compression ratio depends on the bit depth and sampling rate of the input signal. Nevertheless, compression ratios are often published which use the CD parameters as a reference (44.1 kHz, 2 channels at 16 bits per channel, or 2x16 bit). Sometimes the Digital Audio Tape (DAT) SP parameters are used (48 kHz, 2x16 bit). Compression ratios for this reference are higher, which demonstrates the problem of the term "compression ratio" for lossy encoders.
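The dependence on the reference signal can be illustrated with a short calculation. The following Python sketch is illustrative only; the function names are invented here, and 1 kbit is taken as 1000 bits.

```python
def pcm_bit_rate_kbps(sample_rate_hz, channels, bits_per_sample):
    """Bit rate of uncompressed PCM audio in kbit/s (1 kbit = 1000 bits)."""
    return sample_rate_hz * channels * bits_per_sample / 1000.0


def compression_ratio(reference_kbps, encoded_kbps):
    """Ratio of the reference PCM bit rate to the encoded bit rate."""
    return reference_kbps / encoded_kbps


# Reference bit rates for the two parameter sets named in the text.
CD_KBPS = pcm_bit_rate_kbps(44100, 2, 16)   # 1411.2 kbit/s
DAT_KBPS = pcm_bit_rate_kbps(48000, 2, 16)  # 1536.0 kbit/s

# The same 128 kbit/s stream yields two different "compression ratios".
ratio_vs_cd = compression_ratio(CD_KBPS, 128)    # about 11:1
ratio_vs_dat = compression_ratio(DAT_KBPS, 128)  # 12:1
```

The encoded stream is identical in both cases; only the assumed source changes, which is why the bit rate is the less ambiguous figure.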
Karlheinz Brandenburg used a CD recording of Suzanne Vega's song "Tom's Diner" as his model for the MP3 compression algorithm. This song was chosen because of its softness and simplicity, making it easier to hear imperfections in the compression format during playbacks.
Fraunhofer Gesellschaft (abbreviated FhG) publish on their official webpage the following compression ratios and data rates for MPEG-1 Layer 1, 2 and 3, intended for comparison:
- Layer 1: 384 kbit/s, compression 4:1
- Layer 2: 192...256 kbit/s, compression 6:1...8:1
- Layer 3: 112...128 kbit/s, compression 10:1...12:1
These values are probably overly optimistic (likely influenced by public relations) because the quality depends not only on the encoding file format, but also on the quality of the psychoacoustic algorithms used by the encoder. Typical Layer 1 encoders use simple psychoacoustics, which results in a higher bit rate being needed for transparent encoding.
- Layer 1 encoding at 384 kbit/s, even with these simple psychoacoustics, is better than Layer 2 at 192...256 kbit/s
- Layer 3 encoding at 112...128 kbit/s is worse than Layer 2 at 192...256 kbit/s.
That is to say, the assumed bit rates are not equivalent in quality and the qualities are not necessarily optimal (it is generally agreed that 112 to 128 kbit/s Layer 3 is not excellent sound) and therefore the comparison is probably not reliable as an objective source.
More realistic bit rates are:
- Layer 1: excellent at 384 kbit/s
- Layer 2: excellent at 256...384 kbit/s, very good at 224...256 kbit/s, good at 192...224 kbit/s
- Layer 3: excellent at 224...320 kbit/s, very good at 192...224 kbit/s, good at 128...192 kbit/s
Comparisons of a new file format are typically made between a medium-quality encoder for the old format and a highly tuned encoder for the new format.
The MP3 format uses, at its heart, a hybrid transform to transform a time domain signal into a frequency domain signal:
- 32 band polyphase quadrature filter
- 36 or 12 tap MDCT; the size can be selected independently for subbands 0...1 and 2...31
- aliasing reduction postprocessing
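As a rough illustration of the second stage, the following Python sketch computes a textbook MDCT with a sine window, turning 36 input samples into 18 coefficients (or 12 into 6). It is a sketch under stated assumptions: the function names are invented here, the window and scaling are generic textbook choices rather than the normative MP3 definitions, and the polyphase filter bank and aliasing reduction stages are omitted.

```python
import math


def sine_window(length):
    """Generic sine window (an assumption; MP3 defines its own window shapes)."""
    return [math.sin(math.pi / length * (n + 0.5)) for n in range(length)]


def mdct(block):
    """Textbook MDCT: 2N windowed input samples -> N coefficients."""
    two_n = len(block)
    if two_n == 0 or two_n % 2:
        raise ValueError("block length must be a positive even number")
    n_half = two_n // 2
    window = sine_window(two_n)
    coeffs = []
    for k in range(n_half):
        acc = 0.0
        for n in range(two_n):
            phase = math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5)
            acc += window[n] * block[n] * math.cos(phase)
        coeffs.append(acc)
    return coeffs


# A long block (36 samples) and a short block (12 samples) of one subband.
long_coeffs = mdct([1.0] * 36)   # 18 coefficients
short_coeffs = mdct([1.0] * 12)  # 6 coefficients
```

The shorter transform trades frequency resolution for time resolution, which is why an encoder may prefer it for transient passages.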
In terms of the MPEG specifications, Advanced Audio Coding (AAC) from MPEG-4 is intended to be the successor of the MP3 format, although there has been a significant movement to create and popularize other audio formats. Nevertheless, any 'succession' is not likely to happen for a significant amount of time due to MP3's overwhelming popularity. MP3 enjoys extremely wide support, not just from end-users and software but also from hardware such as DVD and CD players.
MP2 and MP3 and the Internet
In October 1993, MP2 ("MPEG-1 Audio Layer 2") files appeared on the Internet and were often played back using "Xing MPEG Audio Player", and later in a program for Unix by Tobias Bading called MAPlay initially released on February 22, 1994. (MAPlay was also ported to Microsoft Windows.)
Initially the only encoder available for MP2 production was the Xing Encoder, accompanied by the program CDDA2WAV, a CD ripper that copied CD audio to hard disks. Internet Underground Music Archive (IUMA) is generally recognized as the start of the on-line music revolution. IUMA was the Internet's first high-fidelity music web site, hosting thousands of authorized MP2 recordings before MP3 or the web were popularized. IUMA was started by Rob Lord (who later headed pioneering Nullsoft) and Jeff Patterson, both from University of California, Santa Cruz, in 1993. Other founding members include Jon Luini, Brandee Selck and Ahin Savara.
In the first half of 1995, MP3 files began flourishing on the Internet. Their popularity was mostly due to, and closely tied to, the successes of companies and software packages like Nullsoft's Winamp, mpg123 and the now Roxio-owned Napster. Since 2003, the number of MP3 blogs has exploded, while largely avoiding a backlash from record companies.
Controversies regarding peer to peer file sharing of MP3 files have flourished in recent years — largely because high compression enables sharing of files that would otherwise be too large and cumbersome to share.
Quality of MP3 audio
Many listeners accept the MP3 bitrate of 128 kilobits per second (kbit/s) as near enough to compact disc quality for them. This provides a compression ratio of approximately 11:1, although listening tests show that with a bit of practice, many listeners can reliably distinguish 128 kbit/s MP3s from CD originals. To some listeners, 128 kbit/s is unacceptably low quality. Even though differences may be perceptible, this is acceptable for some listeners in some listening environments, such as a noisy car or train.
A few possible encoders:
- LAME: first created by Mike Cheng in early 1998, it is (in contrast to others) a fully LGPL'd MP3 encoder with excellent speed and quality, rivaling even MP3's technological successors.
- Fraunhofer Gesellschaft: Some encoders are good, some have bugs.
Some early encoders are not widely used any more: ISO dist10 reference code, Xing, BladeEnc, and ACM Producer Pro.
The quality of MP3 files depends on the quality of the encoder and the difficulty of the signal which must be encoded.
- Good encoders produce acceptable quality at 128 to 160 kbit/s and near-transparency at 160 to 192 kbit/s.
- Low quality encoders may never reach transparency, not even at 320 kbit/s.
So it is pointless to speak of 128 kbit/s or 192 kbit/s quality, except in the context of a particular encoder or of the best available coders. A 128 kbit/s MP3 produced by a good encoder might sound better than a 192 kbit/s MP3 file produced by a bad encoder.
Additionally, it is important to note that this is subjective. A given bitrate suffices for some listeners but not for others. The numbers given above are rough guidelines that work for many people, but in the field of lossy audio compression, the only true measure of the quality of a compression process is to listen to the results.
An important feature of MP3 is that it is lossy, meaning that it removes information from the input in order to save space (and bandwidth when the file is transferred). As with most modern lossy encoders, MP3 algorithms work hard to ensure that the parts they remove cannot be detected by human listeners, by modeling characteristics of human hearing (e.g., noise masking). The importance of this is that it can gain huge savings in storage space with reasonable and acceptable (although detectable) losses in fidelity.
If your aim is to archive sound files with no loss of quality (or to work on the sound files in a studio) you should consider lossless compression such as:
- Monkey's Audio (APE)
- Shorten (SHN)
- Free Lossless Audio Codec (FLAC)
- Wavpack (WV)
- True Audio (TTA)
- Lossless Predictive Audio Compression (LPAC)
- Apple Lossless
These are capable of compressing 16-bit PCM audio to 38 to 80% of its original size (depending upon the characteristics of the audio itself), leaving the audio bit-for-bit identical to the original (ergo "lossless").
It is important to understand the difference between those who use audio for further processing (later work on samples for example) and those who merely listen to it. Audio professionals will use lossless formats (in any pre-mastering stage) as well as people who trade live recordings. Nevertheless, individual acoustic perception may vary so it is not evident that a certain psychoacoustic model can give satisfactory results for everyone. Merely changing the conditions of listening, such as the audio playing system or environment, can expose unwanted distortions caused by lossy compression. Lossless formats will produce the best result for any person and hardware resolution.
If MP3 audio needs to be decoded and re-encoded another time, for example when it will be aired on radio, cascading lossy compression stages can significantly reduce the quality of the end-result. To prevent this, keep audio data in its original state if further operating on it is necessary. If any operation needs to be done on MP3 data, such as cutting or merging audio, or lowering bitrate, it is preferable to use software that works directly with the encoded data (such as mp3DirectCut and MP3Gain) and prevent extra decoding-encoding steps.
The bit rate is variable for MP3 files. The general rule is that the higher the bitrate, the more information is included from the original sound file, and thus the higher the quality of played back audio. In the early days of MP3 encoding, a fixed bit rate was used for the entire file.
Bit rates available in MPEG-1 Layer 3 are 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s, and the available sample frequencies are 32, 44.1 and 48 kHz. 44.1 kHz is almost always used (it coincides with the sampling rate of compact discs), and 128 kbit/s has become the de facto "good enough" standard, although 192 kbit/s is becoming increasingly popular on peer-to-peer file sharing programs. MPEG-2 and the (non-official) MPEG-2.5 add more bitrates: 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160 kbit/s.
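The MPEG-1 Layer 3 tables above can be written down directly, together with a rough estimate of the audio payload of a constant-bit-rate file. This is a minimal sketch: the names are invented for illustration, 1 kbit is taken as 1000 bits, and tags and other non-audio data are ignored.

```python
# Values listed in the text for MPEG-1 Layer 3.
MPEG1_LAYER3_BITRATES_KBPS = (32, 40, 48, 56, 64, 80, 96, 112,
                              128, 160, 192, 224, 256, 320)
MPEG1_SAMPLE_RATES_HZ = (32000, 44100, 48000)


def is_valid_mpeg1_layer3(bitrate_kbps, sample_rate_hz):
    """True if both values appear in the MPEG-1 Layer 3 tables above."""
    return (bitrate_kbps in MPEG1_LAYER3_BITRATES_KBPS
            and sample_rate_hz in MPEG1_SAMPLE_RATES_HZ)


def cbr_audio_size_bytes(bitrate_kbps, duration_s):
    """Approximate audio payload of a constant-bit-rate stream, in bytes."""
    return bitrate_kbps * 1000 * duration_s / 8


# A four-minute track at 128 kbit/s: 3,840,000 bytes of audio data.
four_minutes_at_128 = cbr_audio_size_bytes(128, 240)
```

By the same arithmetic, moving from 128 to 192 kbit/s increases the size of every file by half.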
As already mentioned, variable bit rates (VBR) are also possible. Audio in MP3 files is divided into frames (which have their own bit rate), so it is possible to change the bit rate dynamically as the file is encoded. This was not originally done, but VBR is in extensive use today. This technique makes it possible to use more bits for parts of the sound with high dynamics (much "sound movement") and fewer bits for parts with low dynamics, increasing quality and decreasing storage space further. The method is comparable to a sound-activated tape recorder, which saves tape during silence for the times when sound is being heard. Some encoders utilize this technique to a great extent.
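The per-frame bit rate can be made concrete with a small sketch. It relies on two details that are not stated in this article and are therefore assumptions here: that an MPEG-1 Layer 3 frame carries 1152 samples per channel, and that its length in bytes is 144 × bit rate / sample rate, rounded down, plus an optional padding byte. The function names are likewise illustrative.

```python
SAMPLES_PER_FRAME = 1152  # assumed value for MPEG-1 Layer 3


def frame_length_bytes(bitrate_kbps, sample_rate_hz, padding=0):
    """Frame length in bytes under the assumed 144 * bitrate / rate rule."""
    return 144 * bitrate_kbps * 1000 // sample_rate_hz + padding


def frame_duration_s(sample_rate_hz):
    """Every frame covers the same time span, whatever its bit rate."""
    return SAMPLES_PER_FRAME / sample_rate_hz


def average_bitrate_kbps(frame_bitrates_kbps):
    """Average bit rate of a VBR stream.

    Because all frames have equal duration, this is the plain mean
    of the per-frame bit rates.
    """
    if not frame_bitrates_kbps:
        raise ValueError("at least one frame is required")
    return sum(frame_bitrates_kbps) / len(frame_bitrates_kbps)


# Two demanding frames at 128 kbit/s and two quiet ones at 32 kbit/s.
vbr_average = average_bitrate_kbps([128, 128, 32, 32])  # 80.0 kbit/s
```

Under these assumptions a 128 kbit/s frame at 44.1 kHz occupies 417 bytes (418 with padding), while a 32 kbit/s frame of the same duration occupies 104 bytes, which is where the space saving of VBR comes from.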
Design limitations of MP3
There are several limitations inherent to the MP3 format that cannot be overcome by using a better encoder. More recent audio-compression formats such as Vorbis, AAC, Musepack and WMA no longer have these limitations. In technical terms, MP3 is limited in the following ways:
- In constant bitrates, bitrates are limited to a maximum of 320 kbit/s
- Time resolution can be too low for highly transient signals
- Encoder/decoder overall delay is not defined
- No scale factor band for frequencies above 15.5/15.8 kHz
- Joint stereo is done on a frame-to-frame basis
Nevertheless a well-tuned MP3 encoder can perform competitively even with these restrictions.
Encoding of MP3 audio
The MPEG-1 standard does not include a precise specification for an MP3 encoder. The decoding algorithm and file format, by contrast, are well defined. Implementors of the standard were supposed to devise their own algorithms suitable for removing parts of the information in the raw audio (or rather its MDCT representation in the frequency domain). This is the domain of psychoacoustics, which aims at understanding how human acoustical perception works (both in our ears and in our brain).
As a result, there are many different MP3 encoders available, each producing files of differing quality. Comparisons are widely available, so it is easy for a prospective user of an encoder to research the best choice. It must be kept in mind that an encoder that is proficient at encoding at higher bitrates (such as LAME, which is in widespread use for encoding at higher bitrates) is not necessarily as good at other, lower bitrates.
Decoding of MP3 audio
Decoding, on the other hand, is carefully defined in the standard. Most decoders are "bitstream compliant", meaning that the uncompressed output they produce from a given MP3 file will be the same (within a specified degree of rounding tolerance) as the output specified mathematically in the standard document. Therefore, for the most part, comparison of decoders is almost exclusively based on how computationally efficient they are (i.e., how much memory or CPU time they use in the decoding process).
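A compliance check of this kind amounts to comparing a decoder's output with a reference output and bounding the difference. The sketch below is a hypothetical illustration: the function names are invented, samples are assumed to be floating-point values on a common scale, and the tolerance values are left to the caller because this article does not state the limits the standard uses.

```python
import math


def rms_difference(reference, decoded):
    """Root-mean-square difference between two equal-length sample lists."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    return math.sqrt(total / len(reference))


def max_abs_difference(reference, decoded):
    """Largest absolute sample difference between the two outputs."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return max(abs(r - d) for r, d in zip(reference, decoded))


def within_tolerance(reference, decoded, max_rms, max_abs):
    """True if both error measures stay inside the caller-supplied limits."""
    return (rms_difference(reference, decoded) <= max_rms
            and max_abs_difference(reference, decoded) <= max_abs)
```

Two decoders that both pass such a check differ only by rounding-level noise, which is why speed and memory use remain the main points of comparison.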
ID3 and other tags
A "tag" is data stored in an MP3 file (as well as in other formats) that holds metadata such as the title, artist, album, track number or other information about the file. The most widespread standard tag formats are currently the ID3v1 and ID3v2 tags, and the more recent APEv2 tag.
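Of these formats, ID3v1 is the simplest to read: a fixed 128-byte block at the end of the file that begins with the characters "TAG". The following Python sketch parses that block. The function name is invented here, Latin-1 text encoding is an assumption, and the ID3v1.1 convention of storing a track number in the comment field is not handled.

```python
def parse_id3v1(data):
    """Return the ID3v1 fields found in the last 128 bytes of `data`.

    Returns None if the data is too short or carries no ID3v1 tag.
    """
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(raw):
        # Fields are padded with NUL bytes (some taggers use spaces).
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    return {
        "title": text(tag[3:33]),      # 30 bytes
        "artist": text(tag[33:63]),    # 30 bytes
        "album": text(tag[63:93]),     # 30 bytes
        "year": text(tag[93:97]),      # 4 bytes
        "comment": text(tag[97:127]),  # 30 bytes
        "genre": tag[127],             # 1 byte, index into a genre list
    }
```

A caller would typically read a file in binary mode and pass its contents (or just its final 128 bytes) to the function. ID3v2 and APEv2 tags use more flexible, variable-length layouts and need a different parser.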
As compact discs and other various sources are recorded and mastered at different volumes, it is useful to store volume information about a file in the tag so that at playback time, the volume can be dynamically adjusted.
A few standards for encoding the gain of an MP3 file have been proposed. The idea is to normalize the volume (not the volume peaks) of audio files, so that the volume does not change between consecutive tracks.
The most popular and widely-used solution for storing replay gain is known simply as "Replay Gain". Typically, the average volume and clipping information about an audio track is stored in the metadata tag.
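At playback time, a stored gain value can be applied as a simple scale factor. The sketch below assumes that the gain is stored in decibels, that samples are floating-point values in the range -1.0 to 1.0, and that the stored peak is the largest absolute sample value on that scale; the function names are illustrative and do not come from the Replay Gain proposal itself.

```python
def gain_to_scale(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def apply_replay_gain(samples, gain_db, peak=None):
    """Scale samples by the stored gain, limiting it to avoid clipping.

    If the track peak is known and the requested gain would push it
    above full scale (1.0), the gain is reduced so the peak lands at 1.0.
    """
    scale = gain_to_scale(gain_db)
    if peak is not None and peak > 0 and scale * peak > 1.0:
        scale = 1.0 / peak
    return [s * scale for s in samples]


# A quiet track is raised by 6 dB; a track already at full scale is not.
louder = apply_replay_gain([0.1, -0.2], 6.0, peak=0.2)
unchanged = apply_replay_gain([0.5, -1.0], 6.0, peak=1.0)
```

Because the adjustment happens in the player, the encoded audio data itself is left untouched.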
Many other lossy audio codecs exist, including:
- MPEG-4 AAC, used by Apple's iTunes Music Store and iPod
- AC-3, used in Dolby Digital and DVD;
- ATRAC, used in Sony's Minidisc;
- MPEG-1/2 Audio Layer 2 (MP2), MP3's predecessor;
- mp3PRO from Thomson Multimedia combining MP3 with SBR;
- MPC, also known as Musepack (formerly MP+), a derivative of MP2;
- Ogg Vorbis from the Xiph.org Foundation, a free software and patent free codec.
- QDesign, used in QuickTime at low bitrates;
- AMR-WB+ Enhanced Adaptive Multi Rate WideBand codec, optimized for cellular and other limited bandwidth use;
- RealAudio from RealNetworks, frequently in use for streaming on websites;
- Windows Media Audio (WMA) from Microsoft, which like MP3 is widely supported by hardware devices;
mp3PRO, MP3, AAC, and MP2 are all members of the same technological family and depend on roughly similar psychoacoustic models. The Fraunhofer Gesellschaft owns many of the basic patents underlying these codecs, with Dolby Labs, Sony, Thomson Consumer Electronics, and AT&T holding other key patents.
There are also some lossless audio compression methods used on the Internet, such as FLAC and Monkey's Audio. While they are not similar to MP3, they are good examples of other compression schemes available.
Listening tests have attempted to find the best-quality lossy audio codecs at certain bitrates. The tests have suggested that for some audio samples, newer audio codecs including Ogg Vorbis, mp3PRO, AC-3, Windows Media Audio, MPC and RealAudio perform better than MP3. Generally, these codecs achieve the equivalent of MP3 at 128 kbit/s at around 80 kbit/s. At 128 kbit/s, Ogg Vorbis and MPC performed marginally better than other codecs. At 64 kbit/s, AAC and mp3PRO performed marginally better than other codecs. At high bitrates (128 kbit/s and above), most people do not hear significant differences. What is considered 'CD quality' is quite subjective; for some, 128 kbit/s MP3 is sufficient, while for others 192 kbit/s MP3 is necessary.
Though proponents of newer codecs such as WMA and RealAudio have asserted that their respective algorithms can achieve CD quality at 64 kbit/s, listening tests have shown otherwise; however, the quality of these codecs at 64 kbit/s is definitely superior to MP3 at the same bandwidth. The developers of the patent-free Ogg Vorbis codec claim that their algorithm surpasses MP3, RealAudio and WMA sound quality, and the listening tests mentioned above support that claim. Thomson claims that its mp3PRO codec achieves CD quality at 64 kbit/s, but listeners have reported that a 64 kbit/s mp3PRO file compares in quality to a 112 kbit/s MP3 file and does not come reasonably close to CD quality until about 80 kbit/s.
MP3, which was designed and tuned for use alongside MPEG-1/2 Video, generally performs poorly on monaural data at less than 48 kbit/s or in stereo at less than 80 kbit/s.
Licensing and patent issues
Thomson Consumer Electronics controls licensing of the MPEG-1/2 Layer 3 patents in countries such as the United States of America and Japan that recognize software patents. Thomson has decided to attempt to collect royalties for the patents.
In September 1998, the Fraunhofer Institute sent a letter to several developers of MP3 software stating that a license was required to "distribute and/or sell decoders and/or encoders". The letter claimed that unlicensed products "infringe the patent rights of Fraunhofer and THOMSON. To make, sell and/or distribute products using the [MPEG Layer-3] standard and thus our patents, you need to obtain a license under these patents from us."
These patent issues significantly slowed the development of unlicensed MP3 software and led to increased focus on creating and popularising alternatives such as WMA and Ogg Vorbis. Microsoft, the makers of the Windows operating system, chose to move away from MP3 to their own proprietary Windows Media formats to avoid the licensing issues associated with the patents. Until the key patents expire, open source / free software encoders and players appear to be illegal for commercial use in countries that recognize software patents.
In spite of the patent restrictions, the perpetuation of the MP3 format continues; the reasons for this appear to be the network effects caused by:
- familiarity with the format, not knowing alternatives exist,
- the large quantity of music now available in the MP3 format,
- the wide variety of existing software and hardware that takes advantage of the file format that revolutionized the music industry and copyright law.
Online music resources
Tools such as iRate try to make it easier to find music that matches the listener's tastes. There are several online music stores. Apple's iTunes store is presently the most popular commercial online music offering. A controversial MP3 portal is the Russian site AllOfMP3.com, which offers downloads of thousands of albums and video clips by mainstream artists, priced at $20 per gigabyte. There are also several online columnists who edit news sites focused on digital music and the grassroots community it spawned. They include Richard Menta's MP3newswire.net, an early MP3 news site started in 1998, Jon Newton's P2Pnet, and Thomas Mennecke's Slyck.com. Finally there are sites like Download.com and Vitaminic.com which allow artists to choose to post their own music for free download.