
Audio File Format

PCM audio file format


1.WAV

WAV is an audio file format developed by Microsoft and IBM for storing audio streams on personal computers, and it is widely supported by application software on the Windows platform. Because it typically stores uncompressed PCM data, WAV is generally regarded as a lossless format.
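
Because a WAV file is essentially a small header wrapped around raw PCM samples, Python's standard-library `wave` module can produce one directly. A minimal sketch (the 440 Hz tone and the filename `tone.wav` are arbitrary choices for illustration):

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # CD-quality sample rate, in Hz

# Write one second of a 440 Hz sine wave as 16-bit mono PCM.
with wave.open("tone.wav", "wb") as f:
    f.setnchannels(1)           # mono
    f.setsampwidth(2)           # 2 bytes per sample = 16-bit
    f.setframerate(SAMPLE_RATE)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)))
        for n in range(SAMPLE_RATE)
    )
    f.writeframes(frames)
```

Reading the file back with `wave.open("tone.wav", "rb")` returns the same parameters and sample frames, which is what makes WAV convenient as a lossless interchange format.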

References: https://zh.wikipedia.org/wiki/WAV


2.AIFF

Audio Interchange File Format (AIFF) is an audio format used to store audio data on personal computers and other electronic audio devices. The format was developed by Apple in 1988, based on the Interchange File Format (IFF, widely used on Amiga systems), and is used on Apple's OS X operating system.

References: https://zh.wikipedia.org/zh-tw/%E9%9F%B3%E9%A2%91%E4%BA%A4%E6%8D%A2%E6%96%87%E4%BB%B6%E6%A0%BC%E5%BC%8F


3.APE

Monkey's Audio is a common lossless audio compression format that uses the file extension .ape. Monkey's Audio is a fast and convenient way to compress digital music. Unlike traditional lossy methods such as MP3, Ogg, or WMA, which permanently discard quality to save space, Monkey's Audio reproduces the music perfectly, so it always sounds exactly the same as the original. Even though the sound is perfect, it still saves a lot of space (think of it as WinZip™ for your music). Another benefit is that you can always decompress a Monkey's Audio file back to the exact original file. That way you can switch formats without re-ripping your CD collection, and you can always recreate the original music CD perfectly.
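
Monkey's Audio itself is not available in the Python standard library, but the lossless round-trip property described above can be illustrated with zlib standing in as a generic lossless compressor (the synthetic 16-bit samples are made up for the example):

```python
import struct
import zlib

# Fake PCM payload: 100 signed 16-bit samples forming a ramp.
pcm = struct.pack("<100h", *range(100))

# A lossless codec may shrink the data...
compressed = zlib.compress(pcm, level=9)

# ...but decompressing always restores a byte-for-byte copy,
# which is exactly the guarantee Monkey's Audio makes for music.
restored = zlib.decompress(compressed)
assert restored == pcm
```

Lossy codecs such as MP3 cannot offer this property: once the perceptually "unimportant" data has been discarded, the original bytes are unrecoverable.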

Reference documentation: https://zh.wikipedia.org/wiki/Monkey%27s_Audio https://monkeysaudio.com/index.html https://monkeysaudio.com/theory.html


4.FLAC

FLAC (pronounced /ˈflæk/; full name: Free Lossless Audio Codec) is a free audio compression codec (note: "free" here means free as in free software, not merely free of charge). Its defining characteristic is lossless compression of audio files: unlike lossy codecs such as MP3 and AAC, it causes no loss of sound quality after compression. It is now supported by many software and hardware audio products.

Reference documentation: https://zh.wikipedia.org/wiki/FLAC https://xiph.org/flac https://xiph.org/flac/format.html https://xiph.org/flac/comparison.html


5.ALAC

Apple Lossless Audio Codec (ALAC) is Apple's lossless audio compression format. It can compress uncompressed audio formats (WAV, AIFF) to roughly 40% to 60% of their original size, and both encoding and decoding are very fast. Because the compression is lossless, the decoded audio sounds exactly the same as the original file and does not degrade through cycles of compression and decompression.

Reference documentation: https://zh.wikipedia.org/wiki/Apple_Lossless


6.ASF

Advanced Systems Format (formerly Advanced Streaming Format and Active Streaming Format) is Microsoft's proprietary digital audio/video container format, especially suited to streaming media. ASF is part of the Media Foundation framework.

ASF is based on serialized objects, which are essentially byte sequences identified by GUID tags. The format does not specify how the video or audio should be encoded (i.e. which codec is used); it only specifies the structure of the video/audio stream. In this respect it is similar to the QuickTime file format, AVI, and Ogg. One of ASF's goals is to support playback from digital media servers, HTTP servers, and local storage devices such as hard disks.

The most common media carried in ASF files are Windows Media Audio (WMA) and Windows Media Video (WMV). The most common file extensions for ASF files are .wma (audio-only files using the Windows Media Audio codec, MIME type audio/x-ms-wma) and .wmv (files containing video, using the Windows Media Audio and Video codecs, MIME type video/x-ms-wmv). These files are identical to the older .asf files apart from the extension and MIME type; using different extensions makes it easier to identify the content of a media file.

An ASF file can also contain objects representing metadata, such as the artist, title, album, and genre of an audio track, or the director of a video track, much like the ID3 tags of MP3 files. It supports extensible media types and stream prioritization, making it a format optimized for streaming.

ASF containers provide a framework for digital rights management in Windows Media Audio and Windows Media Video. An analysis of an older scheme used in WMA showed that it uses a combination of elliptic-curve cryptography for key exchange, the DES block cipher, a custom block cipher, the RC4 stream cipher, and the SHA-1 hash function.

Media in ASF containers are sometimes still transmitted over the Internet via the MMS or RTSP protocols. Most of the time, however, they contain material encoded for "progressive download", which can be distributed by any web server yet offers the same advantages as streaming: the file starts playing as soon as a minimum number of bytes has been received, and the rest of the download continues in the background while the user watches or listens. The Library of Congress digital preservation project regards ASF as, in effect, the successor to RIFF; in 2010 Google chose RIFF as the container format for WebP.
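
The GUID-tagged object structure can be sketched in a few lines. The Header Object GUID below (75B22630-668E-11CF-A6D9-00AA0062CE6C) is the well-known top-level identifier from the ASF specification, stored on disk in little-endian GUID byte order; verify it against the specification before relying on it in real code:

```python
import uuid

# Top-level ASF Header Object GUID, as given in the ASF specification.
ASF_HEADER_GUID = uuid.UUID("75B22630-668E-11CF-A6D9-00AA0062CE6C")

def looks_like_asf(first_bytes: bytes) -> bool:
    """Return True if a buffer starts with the ASF Header Object GUID."""
    # ASF stores GUIDs with the first three fields little-endian,
    # which is exactly what uuid's bytes_le layout provides.
    return first_bytes[:16] == ASF_HEADER_GUID.bytes_le

# Simulated first bytes of a .wma/.wmv file (a real parser would
# read the 8-byte object size that follows the GUID):
fake_header = ASF_HEADER_GUID.bytes_le + b"\x00" * 8
```

The same GUID-dispatch pattern applies to every object in the file: read 16 bytes, look up the GUID, then parse the object body accordingly.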

Reference documentation: https://en.wikipedia.org/wiki/Advanced_Systems_Format


7.WavPack (WV)

WavPack is a free, open-source lossless audio compression format developed by David Bryant, using the file extension .wv.

Reference documentation: https://zh.wikipedia.org/wiki/WavPack


8.WMA

WMA (Windows Media Audio) is a family of audio codecs developed by Microsoft, and the name also refers to the corresponding digital audio encoding formats. WMA comprises four distinct codecs: (1) WMA, the original codec, created as a competitor to the MP3 and RealAudio codecs; (2) WMA Pro, which supports more channels and higher-quality audio; (3) WMA Lossless, a lossless codec; and (4) WMA Voice, used to store speech at low bit rates. Audio-only ASF files whose content is entirely encoded in Windows Media Audio also use WMA as their extension.

The WMA format was originally developed by Microsoft, but with support from many players it has become one of the MP3 format's competitors. It is compatible with MP3's ID3 metadata tags and supports additional tags. In addition, at the same sound quality, WMA audio generally has a smaller file size than MP3.

WMA-encoded audio can be carried in files of several formats. Applications can use the Windows Media Format SDK to encode and decode WMA. Common applications with WMA support include Windows Media Player, Windows Media Encoder, RealPlayer, Winamp, and more. The format is also supported on other platforms, such as Linux, and by software on mobile devices.

Reference documentation: https://zh.wikipedia.org/zh-hans/Windows_Media_Audio


9.MP3

MPEG-1 or MPEG-2 Audio Layer III, commonly referred to as MP3, is a popular digital audio encoding and lossy compression format. It is designed to drastically reduce the amount of audio data by discarding the parts of the PCM audio data that matter little to human hearing, thereby compressing the audio into a much smaller file. For most users' listening experience, the sound quality of MP3 shows no significant degradation compared to the original uncompressed audio. It was invented and standardized in 1991 by a team of engineers from the Fraunhofer Society, a research organization based in Erlangen, Germany. The popularity of MP3 has had a profound impact on the music industry. LAME (a recursive acronym for "LAME Ain't an MP3 Encoder") is a widely used open-source MP3 encoder.

MP3 is a data compression format. It discards the parts of the pulse-code modulation (PCM) audio data that matter little to human hearing (similar to JPEG, a lossy compression format for images), thereby achieving much smaller file sizes.

MP3 uses a number of techniques, including psychoacoustics, to decide which parts of the audio can be discarded. MP3 audio can be compressed at different bit rates, providing a trade-off between data size and sound quality.
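
The size/quality trade-off is simple arithmetic: a constant-bit-rate file occupies roughly bitrate × duration ÷ 8 bytes (real files add a little overhead for frame headers and ID3 tags). A quick sketch:

```python
def mp3_size_mb(bitrate_kbps: float, seconds: float) -> float:
    """Approximate size of a constant-bit-rate MP3, in megabytes."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000

# A 4-minute (240 s) track at common bit rates:
for kbps in (128, 192, 320):
    print(f"{kbps} kbps -> {mp3_size_mb(kbps, 240):.2f} MB")
    # prints 3.84, 5.76 and 9.60 MB respectively
```

Halving the bit rate halves the file size; the psychoacoustic model then decides how audible that loss is.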

The MP3 format uses a hybrid filter bank to transform the time-domain signal into the frequency domain:

- 32-band polyphase quadrature filter (PQF)
- 36- or 12-tap modified discrete cosine transform (MDCT); the block size can be selected independently for subbands 0...1 and 2...31
- alias reduction post-processing

Despite significant efforts to create and promote other formats, such as AAC (Advanced Audio Coding) in the MPEG standards and Opus as an open IETF standard, MP3's unprecedented ubiquity makes it unlikely that other formats will threaten its position for now. MP3 enjoys not only broad software support on the user side but also extensive hardware support, such as portable digital audio players (commonly just called MP3 players), mobile phones, DVD players, and CD players.
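
The MDCT stage of the filter bank above can be written out explicitly. This is the standard MDCT definition (not specific to any particular encoder); the 36-tap long-block case corresponds to N = 18 and the 12-tap short-block case to N = 6:

```latex
X_k = \sum_{n=0}^{2N-1} x_n \cos\!\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right],
\qquad k = 0, 1, \ldots, N-1
```

Mapping 2N input samples to N output coefficients, with 50% overlap between successive blocks, lets the decoder cancel windowing artifacts when the inverse transforms are overlap-added.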

Reference documentation: https://zh.wikipedia.org/wiki/MP3 https://zh.wikipedia.org/wiki/LAME


10.AAC

Advanced Audio Coding (AAC), introduced in 1997, is a patented lossy digital audio compression standard based on MPEG-2, developed by Fraunhofer IIS, Dolby Laboratories, AT&T, Sony, Nokia, and other companies. In 2000, the MPEG-4 standard added technologies such as PNS (Perceptual Noise Substitution) on top of the original design and provided a variety of extension tools; to distinguish it from traditional MPEG-2 AAC, this version is also called MPEG-4 AAC. AAC was designed as the successor to MP3, and at the same bit rate it can usually achieve better sound quality than MP3.

AAC is standardized by the International Organization for Standardization and the International Electrotechnical Commission as part of the MPEG-2 and MPEG-4 specifications. AAC and HE-AAC (AAC+) are part of MPEG-4 Audio and are used in the digital radio standards DAB+ and Digital Radio Mondiale, as well as in the mobile TV standards DVB-H and ATSC-M/H.

Reference documentation: https://zh.wikipedia.org/wiki/%E9%80%B2%E9%9A%8E%E9%9F%B3%E8%A8%8A%E7%B7%A8%E7%A2%BC


11.Ogg

Ogg is a free and open standard multimedia file format maintained by the Xiph.Org Foundation. The Ogg format is not limited by software patents and is designed to efficiently stream and process high-quality digital multimedia.

"Ogg" refers to a container format that can multiplex a variety of free and open-source codecs, covering audio, video, text (such as subtitles), and metadata.

Within Ogg's multimedia framework, Theora provides the lossy video layer, while the music-oriented Vorbis codec usually serves as the audio layer. The speech-oriented Speex codec and the lossless audio codecs FLAC and OggPCM may also be used for the audio layer.

Reference documentation: https://zh.wikipedia.org/wiki/Ogg


12.Vorbis

Vorbis is a lossy audio compression format, produced by a free and open-source software project led by the Xiph.Org Foundation. The project defines the audio encoding format and provides a reference encoder/decoder (codec) for lossy audio compression. Vorbis usually uses Ogg as its container format, so the combination is often called Ogg Vorbis.

Reference documentation: https://zh.wikipedia.org/wiki/Vorbis


13.Opus

Opus is a lossy audio coding format developed by the Xiph.Org Foundation and later standardized by the Internet Engineering Task Force (IETF). Its goal is to replace both Speex and Vorbis with a single format suitable for low-latency, real-time audio transmission over the network. The standard is defined in RFC 6716. Opus is an open format, royalty-free and without restrictions on use.

Opus integrates two audio coding technologies: the speech-oriented SILK and the low-latency CELT. Opus can adjust its bit rate seamlessly: internally, the encoder uses linear predictive coding at lower bit rates and transform coding at higher bit rates (a combination of the two is used in the transition region). Opus has very low algorithmic delay (22.5 ms by default), making it well suited to low-latency voice calls such as real-time audio streaming over the network and live synchronized voice narration. It can also trade encoding bit rate for even lower algorithmic delay, down to a minimum of 5 ms. In several blind listening tests, Opus achieved lower latency and better audio compression than common formats such as MP3, AAC, and HE-AAC.

Reference documentation: https://zh.wikipedia.org/wiki/Opus_(%E9%9F%B3%E9%A2%91%E6%A0%BC%E5%BC%8F%29


14.DTS

Digital Theater Systems (DTS), developed by DTS Inc. (NASDAQ: DTSI), is one of the multi-channel audio formats and is widely used for DVD soundtracks; its most common configuration is 5.1 channels. Its main competitor is Dolby Digital. Producing DTS audio output requires both hardware and software that conform to the DTS specification, and most such products carry the DTS trademark.

Reference documentation: https://zh.wikipedia.org/zh/DTS


15.DXD

Digital eXtreme Definition (DXD) is a digital audio format originally developed for editing high-resolution recordings made in DSD (the audio standard used on SACD). Because the 1-bit DSD format used on SACD is not well suited to editing, alternative formats such as DXD or DSD-Wide must be used during the mastering phase.

Reference documentation: https://en.wikipedia.org/wiki/Digital_eXtreme_Definition


16.HLS

HTTP Live Streaming, abbreviated HLS, is an HTTP-based streaming media transmission protocol proposed by Apple. It is part of Apple's QuickTime X and iPhone software systems. It works by dividing the whole stream into a sequence of small HTTP-based file downloads, fetching only a few at a time. While the media stream is playing, the client can choose to download the same resource at different rates from a number of alternate sources, allowing the streaming session to adapt to different data rates. When starting a streaming session, the client downloads an extended M3U (m3u8) playlist file containing metadata, which it uses to find the available media streams.
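
The playlist mechanism can be sketched with a minimal master playlist. The `#EXT-X-STREAM-INF` tag and its `BANDWIDTH` attribute are defined in RFC 8216; the URIs, bandwidth figures, and the selection policy below are made-up illustrations, not Apple's implementation:

```python
# A tiny master playlist listing the same content at three bit rates.
MASTER_PLAYLIST = """\
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
high/index.m3u8
"""

def parse_variants(playlist: str) -> list[tuple[int, str]]:
    """Return (bandwidth, uri) pairs from a master playlist."""
    variants = []
    lines = playlist.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF:"):
            attrs = line.split(":", 1)[1]
            bw = next(int(a.split("=")[1]) for a in attrs.split(",")
                      if a.startswith("BANDWIDTH="))
            variants.append((bw, lines[i + 1]))  # URI follows the tag line
    return variants

def choose(variants: list[tuple[int, str]], measured_bps: int) -> tuple[int, str]:
    """Naive adaptation policy: the best variant the network can sustain."""
    fitting = [v for v in variants if v[0] <= measured_bps]
    return max(fitting) if fitting else min(variants)
```

A real client re-measures throughput as segments arrive and switches variants mid-session, which is what "adapting to different data rates" means in practice.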

HLS uses only standard HTTP transactions. Unlike the Real-time Transport Protocol (RTP), HLS can traverse any firewall or proxy server that lets HTTP traffic through. It is also easy to use content delivery networks to distribute the media streams.

Apple submitted the HLS protocol to the IETF as an Internet-Draft, the first step toward publication as an informational standard. In August 2017, RFC 8216 was published, describing version 7 of the HLS protocol.

Reference documentation: https://zh.wikipedia.org/zh-hans/HTTP_Live_Streaming


17.TS

MPEG-2 Transport Stream (also known as MPEG-TS, MTS, or TS) is a standard digital container format used to transmit and store video, audio, channel, and program information, and it is used in digital TV broadcast systems such as DVB, ATSC, ISDB, and IPTV.

References: https://zh.wikipedia.org/zh-hans/MPEG2-TS

DSD audio file format

1.DFF

DFF stands for DSDIFF (Direct Stream Digital Interchange File Format), a file format defined by Philips that encapsulates a DSD stream, much as WAV encapsulates PCM.

2.DSF

DSF is the abbreviation of DSD Stream File, a file format defined by Sony that encapsulates a DSD stream.

3.SACD

Super Audio CD (SACD) is a read-only optical disc format for audio storage, launched in 1999. It was jointly developed by Sony and Philips Electronics as the successor to the compact disc (CD) format. The SACD format allows multiple audio channels (i.e. surround or multi-channel sound) and provides higher bit rates and longer playback times than a conventional CD. SACDs are designed to be played on SACD players; a hybrid SACD additionally contains a Red Book Compact Disc Digital Audio (CDDA) layer that can be played on a standard CD player.

Digital rips of SACDs generally carry the .iso extension: the disc is captured and saved as an ISO image.

References: https://en.wikipedia.org/wiki/Super_Audio_CD

4.DST

To reduce the space and bandwidth requirements of DSD, a lossless data compression method called Direct Stream Transfer (DST) is used. DST compression is mandatory for the multi-channel area and optional for the stereo area. It typically compresses by a factor of 2 to 3, allowing a disc to hold 80 minutes of both 2-channel and 5.1-channel sound. Direct Stream Transfer was standardized in 2005 as an amendment to the MPEG-4 audio standard (ISO/IEC 14496-3:2001/Amd 6:2005 – Lossless coding of oversampled audio), which contains the DSD and DST definitions described in the SACD specification. MPEG-4 DST provides lossless coding of oversampled audio signals; its target applications are archiving and storage of 1-bit oversampled audio signals and SA-CD. In 2007, the reference implementation of MPEG-4 DST was published as ISO/IEC 14496-5:2001/Amd.10:2007.

References: https://en.wikipedia.org/wiki/Super_Audio_CD#Direct_Stream_Transfer

References

https://www.owlapps.net/owlapps_apps/articles?id=2316&lang=en