FLAC File
BASICS
The basic structure of a FLAC stream is:
• The four-byte string "fLaC"
• The STREAMINFO metadata block
• Zero or more other metadata blocks
• One or more audio frames
The first four bytes identify the FLAC stream. The metadata that follows contains all the information about the stream except for the audio data itself. After the metadata comes the encoded audio data.
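The layout described above can be walked with a few lines of Python. This is a minimal sketch, not the reference implementation. It assumes the metadata block header layout from the FLAC format specification (one byte holding a last-block flag and a 7-bit block type, followed by a 24-bit big-endian length), and the function name list_metadata_blocks is made up for illustration.

```python
# Block type numbers as listed in the FLAC format specification.
BLOCK_TYPES = {
    0: "STREAMINFO", 1: "PADDING", 2: "APPLICATION", 3: "SEEKTABLE",
    4: "VORBIS_COMMENT", 5: "CUESHEET", 6: "PICTURE",
}

def list_metadata_blocks(data: bytes):
    """Return ([(type_name, length), ...], offset_of_first_audio_frame)."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream: missing 'fLaC' marker")
    pos = 4
    blocks = []
    while True:
        if pos + 4 > len(data):
            raise ValueError("truncated metadata block header")
        header = data[pos]
        is_last = bool(header & 0x80)        # top bit: last metadata block
        block_type = header & 0x7F           # low 7 bits: block type
        length = int.from_bytes(data[pos + 1:pos + 4], "big")
        name = BLOCK_TYPES.get(block_type, f"UNKNOWN({block_type})")
        blocks.append((name, length))
        pos += 4 + length                    # unknown block types are simply skipped
        if is_last:
            break
    if blocks[0][0] != "STREAMINFO":
        raise ValueError("first metadata block must be STREAMINFO")
    return blocks, pos

# A hand-built stream: STREAMINFO (34 bytes), then a final 8-byte PADDING block.
demo = (b"fLaC" + bytes([0x00, 0, 0, 34]) + bytes(34)
        + bytes([0x81, 0, 0, 8]) + bytes(8) + b"\xff\xf8")
print(list_metadata_blocks(demo))
```

Because each header carries its own length, a reader can step over block types it does not understand without parsing their contents.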
METADATA
FLAC defines several types of metadata blocks (see the format page for the complete list). Metadata blocks can be any length and new ones can be defined. A decoder is allowed to skip any metadata types it does not understand. Only one is mandatory: the STREAMINFO block. This block has information like the sample rate, number of channels, etc., and data that can help the decoder manage its buffers, like the minimum and maximum data rate and minimum and maximum block size. Also included in the STREAMINFO block is the MD5 signature of the unencoded audio data. This is useful for checking an entire stream for transmission errors.
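As an illustration of what STREAMINFO carries, here is a sketch that unpacks its 34-byte body. The field widths are taken from the FLAC format specification (16-bit block sizes, 24-bit frame sizes, then 20 + 3 + 5 + 36 bits packed together, then the 128-bit MD5). Note that the specification stores minimum and maximum frame sizes in bytes, which is what serves as the data-rate hint mentioned above. The function name and dictionary keys are illustrative choices.

```python
def parse_streaminfo(body: bytes) -> dict:
    """Decode the 34-byte body of a STREAMINFO metadata block."""
    if len(body) != 34:
        raise ValueError("STREAMINFO body must be exactly 34 bytes")
    # 64 bits: sample rate (20), channels - 1 (3), bits per sample - 1 (5),
    # total samples (36).
    packed = int.from_bytes(body[10:18], "big")
    return {
        "min_block_size": int.from_bytes(body[0:2], "big"),
        "max_block_size": int.from_bytes(body[2:4], "big"),
        "min_frame_size": int.from_bytes(body[4:7], "big"),
        "max_frame_size": int.from_bytes(body[7:10], "big"),
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
        "md5": body[18:34].hex(),
    }

# Build a sample body: 4096-sample blocks, 44.1 kHz, stereo, 16-bit.
packed = (44100 << 44) | (1 << 41) | (15 << 36) | 1_000_000
sample_body = (
    (4096).to_bytes(2, "big") + (4096).to_bytes(2, "big")
    + (14).to_bytes(3, "big") + (9000).to_bytes(3, "big")
    + packed.to_bytes(8, "big") + bytes(16)
)
print(parse_streaminfo(sample_body))
```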
Other blocks allow for padding, seek tables, tags, cuesheets, and application-specific data. There are flac options for adding PADDING blocks or specifying seek points. FLAC does not require seek points for seeking but they can speed up seeks, or be used for cueing in editing applications.
Also, if you need a custom metadata block, you can define your own and request an ID for it. Then you can reserve a PADDING block of the correct size when encoding, and overwrite the padding block with your APPLICATION block after encoding. The resulting stream will be FLAC compatible; decoders that are aware of your metadata can use it and the rest will safely ignore it.
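The reserve-then-overwrite idea can be sketched as an in-place edit of the stream bytes. This is a simplified illustration that assumes the block layout from the FLAC format specification (block type 1 is PADDING, type 2 is APPLICATION, and an APPLICATION body starts with a 4-byte ID). The function name and the ID b"test" are hypothetical, and the sketch only handles the case where the new block fills the padding exactly.

```python
def overwrite_padding_with_application(stream: bytearray, offset: int,
                                       app_id: bytes, payload: bytes) -> None:
    """Turn the PADDING block whose header starts at `offset` into an APPLICATION block."""
    header = stream[offset]
    if header & 0x7F != 1:
        raise ValueError("block at offset is not a PADDING block")
    if len(app_id) != 4:
        raise ValueError("application ID must be 4 bytes")
    length = int.from_bytes(stream[offset + 1:offset + 4], "big")
    if 4 + len(payload) != length:
        raise ValueError("ID plus payload must fill the padding exactly")
    # Keep the last-block flag, change only the block type to APPLICATION (2).
    stream[offset] = (header & 0x80) | 2
    # Same-length slice assignment: the stream size and later offsets do not change.
    stream[offset + 4:offset + 4 + length] = app_id + payload

# STREAMINFO (34 bytes) followed by a final 8-byte PADDING block at offset 42.
stream = bytearray(b"fLaC" + bytes([0x00, 0, 0, 34]) + bytes(34)
                   + bytes([0x81, 0, 0, 8]) + bytes(8))
overwrite_padding_with_application(stream, 42, b"test", b"ABCD")
print(stream[42:54])
```

Since nothing moves, the audio frames after the metadata do not need to be rewritten, which is the point of reserving the padding in advance.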
AUDIO DATA
After the metadata comes the encoded audio data. Audio data and metadata are not interleaved. Like most audio codecs, FLAC splits the unencoded audio data into blocks, and encodes each block separately. The encoded block is packed into a frame and appended to the stream. The reference encoder uses a single block size for the whole stream but the FLAC format does not require it.
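A minimal sketch of the blocking step with a single fixed block size, as the reference encoder is described to use. The helper name is illustrative, and the sketch assumes the final block is simply allowed to be shorter than the rest.

```python
def split_into_blocks(samples, block_size=4096):
    """Split a sequence of samples into consecutive blocks of block_size samples."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Each block would then be modeled, coded, and packed into its own frame.
    return [samples[i:i + block_size] for i in range(0, len(samples), block_size)]

blocks = split_into_blocks(list(range(10)), block_size=4)
print(blocks)
```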
BLOCKING
The block size is an important encoding parameter. If it is too small, the frame overhead will lower the compression. If it is too large, the modeling stage of the compressor will not be able to generate an efficient model. Understanding FLAC's modeling will help you to improve compression for some kinds of input by varying the block size. In the most general case, using linear prediction on 44.1kHz audio, the optimal block size will be between 2 and 6 ksamples. flac defaults to a block size of 4096 in this case. Using the fast fixed predictors, a smaller block size is usually preferable because of the smaller frame header.
INTER-CHANNEL DECORRELATION
In the case of stereo input, once the data is blocked it is optionally passed through an inter-channel decorrelation stage. The left and right channels are converted to mid and side channels through the following transformation: mid = (left + right) / 2, side = left - right. This is a lossless process, unlike joint stereo. For normal CD audio this can result in significant extra compression. flac has two options for this: -m always compresses both the left-right and mid-side versions of the block and takes the smallest frame, and -M, which adaptively switches between left-right and mid-side.
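The division by two looks lossy at first glance, so a small round trip helps show why it is not. The sketch below assumes integer samples and a floor division implemented as a right shift. The dropped low bit of left + right always equals the low bit of side, so it can be restored on decode. The function names are illustrative.

```python
def to_mid_side(left, right):
    """Forward transform: mid = (left + right) >> 1, side = left - right."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def from_mid_side(mid, side):
    """Inverse transform, recovering left and right exactly."""
    left, right = [], []
    for m, s in zip(mid, side):
        # left + right and left - right have the same parity, so the bit
        # lost by the shift is the low bit of side.
        total = (m << 1) | (s & 1)
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right

left = [3, 5, -7, 1000, -32768]
right = [-4, 2, -7, 998, 32767]
mid, side = to_mid_side(left, right)
print(from_mid_side(mid, side) == (left, right))
```

When the two channels are similar, the side channel stays close to zero and costs few bits, which is where the extra compression comes from.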
MODELING
In the next stage, the encoder tries to approximate the signal with a function in such a way that when the approximation is subtracted, the result (called the residual, residue, or error) requires fewer bits-per-sample to encode. The function's parameters also have to be transmitted so they should not be so complex as to eat up the savings. FLAC has two methods of forming approximations: 1) fitting a simple polynomial to the signal; and 2) general linear predictive coding (LPC). I will not go into the details here, only some generalities that involve the encoding options.
First, fixed polynomial prediction (specified with -l 0) is much faster, but less accurate than LPC. The higher the maximum LPC order, the slower, but more accurate, the model will be. However, there are diminishing returns with increasing orders. Also, at some point (usually around order 9) the part of the encoder that guesses the best order to use will start to get it wrong and the compression will actually decrease slightly; at that point you will have to use the exhaustive search option -e to overcome this, which is significantly slower.
Second, the parameters for the fixed predictors can be transmitted in 3 bits whereas the parameters for the LPC model depend on the bits-per-sample and LPC order. This means the frame header length varies depending on the method and order you choose and can affect the optimal block size.
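To make the fixed-predictor idea concrete, here is a sketch that computes residuals for polynomial predictors of order 0 to 4 and picks the order with the smallest total absolute residual. The coefficient table follows the fixed predictors in the FLAC format specification. The selection rule is a simplification chosen for illustration and is not necessarily what the reference encoder does.

```python
# Prediction coefficients for fixed orders 0-4, applied to the most recent samples.
FIXED_COEFFS = {
    0: [],
    1: [1],
    2: [2, -1],
    3: [3, -3, 1],
    4: [4, -6, 4, -1],
}

def fixed_residual(samples, order):
    """Residual after fixed prediction; the first `order` samples are warm-up."""
    coeffs = FIXED_COEFFS[order]
    residual = []
    for n in range(order, len(samples)):
        prediction = sum(c * samples[n - 1 - i] for i, c in enumerate(coeffs))
        residual.append(samples[n] - prediction)
    return residual

def best_fixed_order(samples):
    """Pick the order whose residual has the smallest sum of magnitudes."""
    return min(FIXED_COEFFS,
               key=lambda order: sum(abs(e) for e in fixed_residual(samples, order)))

ramp = [0, 3, 6, 9, 12, 15, 18]
print(fixed_residual(ramp, 1), fixed_residual(ramp, 2), best_fixed_order(ramp))
```

Only the order itself has to be sent to the decoder, since the coefficients are fixed by the format. That is why the fixed predictors fit in a few bits.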
RESIDUAL CODING
Once the model is generated, the encoder subtracts the approximation from the original signal to get the residual (error) signal. The error signal is then losslessly coded. To do this, FLAC takes advantage of the fact that the error signal generally has a Laplacian (two-sided geometric) distribution, and that there is a set of special Huffman codes called Rice codes that can be used to efficiently encode this kind of signal quickly and without needing a dictionary.
Rice coding involves finding a single parameter that matches a signal's distribution, then using that parameter to generate the codes. As the distribution changes, the optimal parameter changes, so FLAC supports a method that allows the parameter to change as needed. The residual can be broken into several contexts or partitions, each with its own Rice parameter. flac allows you to specify how the partitioning is done with the -r option. The residual can be broken into 2^n partitions, by using the option -r n,n. The parameter n is called the partition order. Furthermore, the encoder can be made to search through m to n partition orders, taking the best one, by specifying -r m,n. Generally, the choice of n does not affect encoding speed but m,n does. The larger the difference between m and n, the more time it will take the encoder to search for the best order. The block size will also affect the optimal order.
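The following sketch shows Rice coding with a parameter k and a simplified form of partitioning. It works on bit strings for readability and assumes signed residuals are first folded to non-negative integers (zigzag mapping), with the quotient written in unary as zeros followed by a stop bit. The even partition split and all function names are illustrative simplifications and do not reproduce the exact FLAC bitstream.

```python
def zigzag(value):
    """Fold a signed integer to a non-negative one: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return (value << 1) if value >= 0 else ((-value << 1) - 1)

def unzigzag(u):
    return (u >> 1) if u & 1 == 0 else -((u + 1) >> 1)

def rice_encode(residual, k):
    """Encode each value as a unary quotient, a stop bit, and k remainder bits."""
    bits = []
    for value in residual:
        u = zigzag(value)
        bits.append("0" * (u >> k) + "1")
        if k:
            bits.append(format(u & ((1 << k) - 1), f"0{k}b"))
    return "".join(bits)

def rice_decode(bits, k, count):
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "0":
            q += 1
            pos += 1
        pos += 1                                   # skip the stop bit
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append(unzigzag((q << k) | r))
    return values

def best_rice_parameter(residual, max_k=14):
    """Brute-force search for the parameter giving the shortest code."""
    return min(range(max_k + 1), key=lambda k: len(rice_encode(residual, k)))

def encode_partitioned(residual, partition_order):
    """Split into 2**partition_order equal parts, each with its own parameter."""
    parts = 1 << partition_order
    if len(residual) % parts:
        raise ValueError("residual length must be divisible by the partition count")
    size = len(residual) // parts
    encoded = []
    for i in range(parts):
        chunk = residual[i * size:(i + 1) * size]
        k = best_rice_parameter(chunk)
        encoded.append((k, rice_encode(chunk, k)))
    return encoded

# A residual that is quiet in the first half and loud in the second.
residual = [0, 1, -1, 0] * 4 + [100, -120, 90, -110] * 4
single_bits = len(rice_encode(residual, best_rice_parameter(residual)))
split_bits = sum(len(bits) for _, bits in encode_partitioned(residual, 1))
print(single_bits, split_bits)
```

With one parameter for the whole block, the quiet half pays for remainder bits it does not need. With two partitions, each half gets a parameter that suits it, which is why partitioning helps when the distribution changes within a block.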
FRAMING
An audio frame is preceded by a frame header and trailed by a frame footer. The header starts with a sync code, and contains the minimum information necessary for a decoder to play the stream, like sample rate, bits per sample, etc. It also contains the block or sample number and an 8-bit CRC of the frame header. The sync code, frame header CRC, and block/sample number allow resynchronization and seeking even in the absence of seek points. The frame footer contains a 16-bit CRC of the entire encoded frame for error detection. If the reference decoder detects a CRC error it will generate a silent block.
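The two checksums can be sketched with plain bitwise CRC routines. The polynomials and zero initial values used here are the ones given in the FLAC format specification (x^8 + x^2 + x + 1 for the header CRC-8, x^16 + x^15 + x^2 + 1 for the frame CRC-16). The helper frame_is_intact and the demo payload are illustrative, and the sketch assumes the CRC-16 is stored big-endian in the last two bytes of the frame.

```python
def crc8(data: bytes) -> int:
    """CRC-8 with polynomial 0x07 and initial value 0 (frame header check)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

def crc16(data: bytes) -> int:
    """CRC-16 with polynomial 0x8005 and initial value 0 (whole-frame check)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8005) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def frame_is_intact(frame: bytes) -> bool:
    """Compare the stored footer CRC with one computed over the rest of the frame."""
    if len(frame) < 2:
        return False
    return crc16(frame[:-2]) == int.from_bytes(frame[-2:], "big")

# Demo with made-up frame contents (not a real encoded frame).
payload = b"\xff\xf8 example frame bytes"
frame = payload + crc16(payload).to_bytes(2, "big")
damaged = bytes([frame[0], frame[1], frame[2] ^ 0xFF]) + frame[3:]
print(frame_is_intact(frame), frame_is_intact(damaged))
```

A decoder that finds a sync code can use the header CRC-8 to reject false positives before trusting the rest of the header, which is what makes resynchronization in the middle of a stream practical.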
MISCELLANEOUS
As a convenience, the reference decoder knows how to skip ID3v1 and ID3v2 tags. Note however that the FLAC specification does not require compliant implementations to support ID3 in any form and their use is strongly discouraged.
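For readers who still encounter files with a leading ID3v2 tag, skipping it is a matter of reading the tag's 10-byte header. The sketch below relies on the ID3v2 header layout ("ID3", two version bytes, one flags byte, and a 4-byte synchsafe size with 7 useful bits per byte, plus an optional 10-byte footer signaled by flag bit 0x10). The function name is illustrative and this is not how the reference decoder is implemented.

```python
def skip_id3v2(data: bytes) -> int:
    """Return the offset of the first byte after a leading ID3v2 tag, or 0 if none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    flags = data[5]
    size_bytes = data[6:10]
    if any(b & 0x80 for b in size_bytes):
        return 0                      # not a valid synchsafe size, treat as no tag
    size = 0
    for b in size_bytes:
        size = (size << 7) | b        # 7 payload bits per byte
    footer = 10 if flags & 0x10 else 0
    return 10 + size + footer

# A fake 133-byte tag body followed by the FLAC stream marker.
tagged = b"ID3" + bytes([4, 0, 0]) + bytes([0, 0, 1, 5]) + bytes(133) + b"fLaC"
start = skip_id3v2(tagged)
print(start, tagged[start:start + 4])
```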
flac has a verify option -V that verifies the output while encoding. With this option, a decoder is run in parallel to the encoder and its output is compared against the original input. If a difference is found flac will stop with an error.
References
https://xiph.org/flac/
https://xiph.org/flac/documentation_format_overview.html
https://xiph.org/flac/format.html