💾 Archived View for hecanjog.com › research › 20210410-linear-predictive-coding.gmi captured on 2024-08-24 at 23:55:12. Gemini links have been rewritten to link to archived content
⬅️ Previous capture (2023-01-29)
-=-=-=-=-=-=-
Posted on 2021-04-10 12:00
Linear predictive coding is a process of deriving the shape of a filter from a particle of sound that can be used to later reconstruct that particle by feeding either a pulse stream or white noise into a filter with the derived coefficients. It is typically used as a compression scheme for speech, because the pulse/noise into a filter roughly models the human vocal tract (pulse train -> filter) and associated sibilants et al (noise -> filter) fairly well.
This is a very nice introduction to the topic by Hyung-Suk Kim.
To read:
Source code
another lecture slide deck from Lawrence Rabiner at UCSB looks useful for comparative reading
Digital Speech Processing Course (winter 2015) homepage
soundpipe's OpenLPC-based lpc module.
the source looks very readable
Hyung-Suk, Kim. Linear Predictive Coding Is All-Pole Resonance Modeling, Center for Computer Research in Music and Acoustics, Stanford University, 4 Dec. 2020, ccrma.stanford.edu/~hskim08/lpc/.
Linear Predictive Coding is All-Pole Resonance Modeling
Probably wrong python translation of the first math block in section 3:
p = 10 a = [0] * p # should be filled with coefficients? o = [0] * p for i in range(numsamples): for j in range(1, p+1): o[j] = 1.0 / (a[j] * math.pow(i, -j))
Alright the matrix transformation stuff in section 4 is too math for me. It seems like the basic idea is that the block of P coefficients needs to be computed for every sample though. The end of the section says the process is the same as a linear regression, so some future comparative reading on that topic might help clarify this part.
The final equation form looks very simple but I don't know how to translate it:
a = (A1 A)-1 AT b
Apparantly this part:
(A1 A)-1 AT
Is known as the Moore-Penrose pseudoinverse so there's some nice jargon-salt to add to this wound.
Anyway, this is apparantly the way to compute the coefficients A.
Next in the recipe is to extract the residuals using A -- this process is also unclear to me but seems to involve at least these additional steps per block:
1. Determine if the current block of samples is pitched & get a pitch contour if so (I can let aubio handle this for me)
2. In either case determine the "power of the source signal" -- so, what is this the spectral magnitude? It's described here as a variance which is also clear as mud to me.
Three formants are apparantly enough (6 poles or p=6) for voice synthesis, so playing with p could also be interesting...
Here we repeat the process using the usual overlap-add, though I'm curious to read more about older approaches. For example IIRC Lansky doesn't use any windowing in his implementation.
The examples here (and every LPC usage I've seen so far) use pretty small block sizes, so I assume the smaller the block the less advantage to doing the computation as an FFT but who knows. Here Kim is using a blocksize of 240 samples, Lansky was using 250 sample blocks in the 80s, and even Charles Dodge used a block size of 125 samples for In Celebration. (See: Composers & The Computer ed. by Curtis Roads)
OK, so: decoding.
Doing the analysis process with p=6 leaves us with 6 coefficient values (for A) and one variance value (which I still don't really get) for every block.
This is the fun part -- to synthesize this again, we can feed either white noise or a pulse train into a filter using the given coefficients (using the value of the variance somehow to decide if it should be the pulse train -- pitched -- or white noise -- unpitched) and get a full block of "resynthesized" sound to overlap-add into an output signal.
Aside: so is this standard for math notation? A lowercase ak means the value in collection a at index k or a[k]...
and the uppercase A refers to the entire collection (array) as a single unit? Math notation still confuses me.
Cross-synthesis!
This is one thing I really want to play with. A low-overhead cross-synthesis via LPC would be a very nice little tool to add to the collection.
The idea is pretty simple: instead of doing the analysis step once for one sound, do it twice for two sounds, one being the pitch source and the other being the noise source. So we do basically everything else the same, except in the end each analysis block produces P values for the filter which are derived from the pitch source, and 1 value for the variance (which I get represents the pitched/unpitched state of the signal but I don't follow exactly how to derive it right now) which is derived from the noise source. Taking these franken-blocks and resynthesizing a new signal from them will give us the pitched content from one sound with the sibilance/noise from the other. That's a neat way to do cross-synthesis!
Finally, the closing section has some good thoughts:
The core LPC model is understandable enough that it should be possible to fuck with many aspects of it and hopefully find some nice edges.
Lansky, P., and Steiglitz, K. 1981. "Synthesis of Timbral Families by Warped Linear Prediction." Computer Music Journal 5(3), 45-49.
Warped linear prediction tries to make use of LPC resynthesis to derive similar-sounding but distinct families of instruments from a single analysis source. Paul Lansky's Pine Ridge from the album Folk Images is a beautiful example and also the specific piece discussed in this paper:
PS: This lovely 1995 album of computer music and more is still available to purchase from Bridge Records. Lansky has also posted liner notes for the album to his website.
still available to purchase from Bridge Records
Aside: An earlier interview with Richard Moore in the CMJ collection I'm reading from (The Music Machine ed. Roads) mentions that Kenneth Steiglitz
wrote the fortran library that Paul Lansky used for his LPC-based work, at least initially. I wonder if that library is available somewhere still...
he is also the author of one of the most readable books on DSP in my opinion.
one of the most readable books on DSP in my opinion.
OK, more filter jargon right off the bat. They mention using a transfer function of 1/D(z) for the filter. I don't remember what that means. :)
Update: talking a bit with J, he says it's a shorthand for describing the essential math routine for given filter -- the filter kernel maybe? Not totally sure, but that seems to track...
In any case, the general approach to the LPC procedure described so far seems to match what I read in the Kim, in this case though the idea is to use a corpus of recorded notes as the basis of an instrumental family to be able to synthesize variations from the original by means of:
The input for Pine Ridge was an 11 second long phrase played by violinist Cyrus Stevens. I'm assuming they did manual segmentation to split the sound into logical "note sections" but I don't remember if that's discussed further... I also don't remember if there is much discussion about how Lansky approached managing that corpus of notes during the composition phase...
Anyway, in this case the particular flavor of LPC being used employs the covariance method and they say they don't window the blocks during analysis, though they still overlap by 50% in the normal overlap-add fashion...
Interlude
So, as an aside, there appear to be two basic approaches to LPC...
Update: nope! there are others... "A new autoregressive spectrum analysis algorithm" by L. Marple apparantly describes "Burg's method".
Lansky & Steiglitz say they use the covariance method with no windowing. This method is "well known" to produce unstable frames which they compensate for by adding the last half of the previous frame to the first half of the next.
In other words:
<imagine each digit is 1/4 the block size> 1234 5678 1234 3456 5678 7800
(The zero padding is an assumption! Not sure how they're handling the final block...)
They say the covariance method is also well known for producing unstable filter frames... to compensate they check each filter coefficient to see if the pole has a radius greater than 0.998, and move it to 0.998 if so...
I'm confused by that, it seems to imply they're using an FFT-based approach if they are able to convert the coefficient frequences into polar form?
I'm still also confused as to why this approach doesn't need to do any windowing!
Ellis, Dan. 2013. "Lecture 6: Linear Prediction (LPC)." E4896 Music Signal Processing, Columbia University.
OK right away these slides make the connection between resonances and the poles of a filter, which is interesting...
The pseudocode implementation of a second order IIR bandpass filter is the first example given of "simple resonance" -- copied directly from the slide:
Q = 10; omega = pi*0.1; r = 1 - omega/(2*Q); a = [1 -2*r*cos(omega) r*r]; y = filter(1,a,[1,zeros(1,100)]); stem(y)
slides covering more obvious aspects of LPC design
OK, page 6 shows the actual LPC prediction equation...
p is the number of filter poles. {ak} are the filter coefficients e[n] is the prediction error...
After some help from W reading the sigma notation this actually makes a bit of sense...
Re-written as python pseudocode:
length = 100 poles = 3 # prediction error e = [0] * length # filter coefficients a = [0] * poles # input s = [0] * length # output o = [0] * (length + poles + 1) for n in range(length): value = 0 for k in range(p): value += a[k] * s[max(0, n - k + 1)] + e[n] o[i] = value
I should re-read Charles Dodge's chapter on In Celebration from the Roads book Composers & the Computer which was my introduction to LPC...
LPCTorch uses this.