libm17 has received some polish recently, bumping its version up to 1.1.9. Changes include:
improved Viterbi decoder (in terms of speed, memory use, and robustness)
slight speed improvement of the Golay decoder
improved symbol slicer (should be faster)
improved lat/lon/radius encoding
tiny API changes – using char* for callsigns now
stricter unit tests
cmake fix
various cosmetic clean-ups
gr-m17 uses libm17 1.1.9 now. Running make should result in a clean compilation. Nice! We are in the process of adding Codec2 Out of Tree blocks, as the built-in set of Codec2 blocks depends on libcodec2. The native blocks cannot be reset externally; they are not meant for “repeated” use. For the LinHT, we need improved blocks – computationally optimized and externally resettable. This is what we provide within gr-m17’s dev branch. As soon as the work is done, the experimental dev branch will be merged into main.
After a few days of tweaking Codec2 guts, one question appeared: how much worse is the LSP quantizer in ACELP (EN 300 395-2) compared to that in Codec2? Both use the same excitation->filter model. The latter has excellent spectral distortion, well below the threshold of what could be called “transparent”. I remember comparing them back in 2022, but the results were never published.
Now, back to theory. As per Wai Chu “Speech Coding Algorithms”, the formant filter’s transparency criteria are as follows:
average spectral distortion of less than 1 dB
less than 2% outliers having a spectral distortion above 2 dB
no outliers with spectral distortion larger than 4 dB
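The three criteria above can be checked mechanically over a set of per-frame spectral distortion values. The sketch below is a minimal illustration; the function name `sd_is_transparent` and the input layout (one SD value in dB per frame) are assumptions made for this example, not code from any of the projects mentioned here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the per-frame spectral distortion values (in dB)
 * satisfy all three transparency criteria:
 *   1. average SD below 1 dB,
 *   2. fewer than 2% of frames with SD above 2 dB,
 *   3. no frame with SD above 4 dB.
 * Name and interface are hypothetical. */
bool sd_is_transparent(const float *sd_db, size_t n)
{
    assert(sd_db != NULL && n > 0);

    double sum = 0.0;
    size_t above_2db = 0;
    size_t above_4db = 0;

    for (size_t i = 0; i < n; i++) {
        sum += sd_db[i];
        if (sd_db[i] > 2.0f) above_2db++;
        if (sd_db[i] > 4.0f) above_4db++;
    }

    double mean = sum / (double)n;
    double outlier_ratio = (double)above_2db / (double)n;

    return mean < 1.0 && outlier_ratio < 0.02 && above_4db == 0;
}
```

Frames above 4 dB are also counted in the 2 dB bucket here, which does not change the verdict because any such frame fails the third criterion anyway.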
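Spectral distortion is defined as a “distance metric” between the frequency responses of two filters (we are using log magnitude spectra here):

$$SD = \sqrt{\frac{1}{n_1 - n_0 + 1} \sum_{n=n_0}^{n_1} \left[ 10\log_{10}\left|S\left(e^{j 2\pi n/N}\right)\right|^2 - 10\log_{10}\left|\hat{S}\left(e^{j 2\pi n/N}\right)\right|^2 \right]^2}$$

$S$ and $\hat{S}$ (frequency responses) are based on discrete Fourier transforms of the measured and reference LPC filters. The number of FFT bins $N$ is usually set to 256. Typically, at 8 kHz sample rate, $n_0 = 4$ and $n_1 = 100$, giving a 125..3,125 Hz range. Distortion outside this band is not considered perceptually critical.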
Now back to Codec2 and ETSI ACELP. We are only going to focus on the 3,200 bps rate of the former (20 ms frames, 64 bits each). ACELP runs at around 4,567 bps (30 ms frames, 137 bits each).
Codec2’s bit allocation for quantized LSPs is generous at 5 bits per LSP, yielding 50 bits total. ACELP isn’t as generous, spending only about half of that – 26 bits.
Codec2 uses delta-encoded line spectral frequencies in the frequency domain. ACELP uses a split vector quantizer with LSPs in the cosine domain.
Let’s look at the spectral distortion of both codecs:
Spectral distortion of Codec2 and ETSI EN 300 395-2 ACELP
Now, the cumulative distribution (probability of SD below given threshold) for both:
Cumulative Distribution Function of the Spectral Distortion for Codec2 vs. ETSI ACELP
Conclusion? Codec2 outperforms ACELP in terms of formant reconstruction, but ACELP offers a much, much more sophisticated excitation model. Excitation is Codec2’s biggest weakness. I really hope to be able to propose a better model in the coming months.
I have used N=200 random files from the LibriSpeech corpus to run a comparison between Codec2 and Codec2-mod. I used the ViSQOL perceptual quality estimator to calculate the Mean Opinion Score (MOS) for original-decompressed pairs for both implementations. Results are shown in the table at the bottom of this post (copied verbatim from terminal).
The few mismatches are most likely caused by slightly different floating point maths (comparisons and fast cosine and arc cosine functions). Those may diverge slightly given edge-case input signal vectors. Still, Codec2-mod is functionally bit-exact with reference Codec2 for almost all inputs.
The +0.006 difference in MOS is not significant (and is imperceptible anyway). This number does not signify any perceptual improvement. I’m pretty sure that the value would converge to 0.0, given enough samples. A value close to zero means that there is no regression (which is good!).
This is the histogram for both implementations:
Histograms of MOS scores. N=200.
Statistical analysis for 2,000 samples.
After a few days of optimizing Codec2’s code (3200 bps mode), it is time to share the results (and the code itself!).
The goals were:
provide the 3200 bps mode through a separate, clean repository (only C code, no Octave test benches, no modems etc.)
prepare an easy experimenting ground for further improvements (beyond the bit-exactness constraint)
code clean-up: remove all the unnecessary and obsolete constructs, apply optimizations
fully static memory allocation (including KISS FFT)
After all of that has been done, the resulting code is still fully compatible with the original Codec2, but executes faster and with a much smaller memory footprint. Sounds like an excellent drop-in replacement for OpenRTX and other embedded projects. See the readme file for more details.
The execution time was 7.021s with the standard cosf() / acosf() pair in the LPC-LSP path and 7.019s with their optimized fast_ counterparts (see util.c for details). There was no significant ViSQOL MOS change. With the -O2 flag, the gain is much larger: the execution time was 6.909s.
Quick follow-up #2:
After replacing the decimating FIR inside nlp() with a polyphase equivalent, the execution time dropped to 6.754s (-Os). Code is upstream.
M17 GNU Radio Out-of-Tree blocks (gr-m17) now support text messaging. The development is still ongoing, but single-frame text messages can already be successfully transferred between the encoder and decoder blocks. The latter emits a special message at its output, signalling successful packet text message decode.
libm17 has just been updated with polyphase square root raised cosine filter taps, for both 24 and 48 kHz sample rates. Polyphase filters offer great speed improvement over classic FIR filter implementations. This approach is also being implemented in OpenRTX.
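The speed-up comes from never multiplying by the zeros that upsampling inserts. The sketch below compares a plain "zero-stuff, then FIR" interpolator with a polyphase one that produces the same output while touching only every `up`-th tap per output sample. It is a generic illustration: the function names, argument order and tap layout are assumptions of this example and do not reflect libm17's actual API or tap tables.

```c
#include <assert.h>
#include <stddef.h>

/* Reference: conceptually zero-stuff x[] by `up`, then run a plain FIR.
 * Every tap is multiplied for every output sample, zeros included.
 * y[] must hold n_in * up samples. */
void interp_naive(const float *x, size_t n_in, const float *h,
                  size_t n_taps, size_t up, float *y)
{
    assert(up > 0);
    for (size_t n = 0; n < n_in * up; n++) {
        float acc = 0.0f;
        for (size_t j = 0; j < n_taps && j <= n; j++) {
            size_t i = n - j; /* index into the zero-stuffed signal */
            float u = (i % up == 0) ? x[i / up] : 0.0f;
            acc += h[j] * u;
        }
        y[n] = acc;
    }
}

/* Polyphase: output phase p of input sample m only needs taps
 * h[p], h[p + up], h[p + 2*up], ... applied to x[m], x[m-1], ... */
void interp_polyphase(const float *x, size_t n_in, const float *h,
                      size_t n_taps, size_t up, float *y)
{
    assert(up > 0);
    for (size_t m = 0; m < n_in; m++) {
        for (size_t p = 0; p < up; p++) {
            float acc = 0.0f;
            for (size_t k = 0; k * up + p < n_taps && k <= m; k++)
                acc += h[k * up + p] * x[m - k];
            y[m * up + p] = acc;
        }
    }
}
```

For an upsampling factor of `up`, the polyphase version performs roughly `1/up` of the multiply-accumulates of the naive one, and it needs no intermediate zero-stuffed buffer.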
Tested on an STM32F405 (modified Nokia 3310) – filtered 1,000 frames with 10x upsampling (-Os optimization):
Hell yeah! Vlastimil, OK5VAS, managed to prepare an example LinHT SoapySDR driver for the SX1255. This allows OpenWebRX to be run on the device, fetching IQ baseband samples directly from the ZMQ proxy (described a few posts back).
OpenWebRX decodes M17, along with other digi modes.
Successful APRS decode. Courtesy of Vlastimil, OK5VAS
Astonishing work! This is the innovation amateur radio world needs. Can your radio do this? 😉
Our T9 predictive text entry implementation has just been updated with a binary search. With just a 6 kB overhead for a 22 kB dictionary (about 3,000 entries), the search time drops considerably, from 6.3 to about 0.27 milliseconds. The previous version required over 11 milliseconds to perform the same task (using a linear search).
That extra space is used to store word locations within the sorted dictionary (array of uint16_t).
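The idea of a `uint16_t` location table on top of a packed, sorted dictionary can be sketched as follows. This is a minimal illustration with a made-up five-word dictionary: the blob layout (NUL-terminated words stored back to back), the names `dict_blob`, `dict_offsets` and `dict_find`, and the plain `strcmp` ordering are assumptions of this example. The real T9 lookup matches key sequences rather than whole words, which is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical dictionary: sorted words packed back to back,
 * each terminated by NUL. */
static const char dict_blob[] = "apple\0bread\0cat\0dog\0zebra";

/* Start offset of every word inside dict_blob (2 bytes per entry). */
static const uint16_t dict_offsets[] = {0, 6, 12, 16, 20};

#define DICT_ENTRIES (sizeof(dict_offsets) / sizeof(dict_offsets[0]))

/* Binary search over the offset table.
 * Returns the entry index, or -1 if the word is not present. */
int dict_find(const char *blob, const uint16_t *offsets, size_t n,
              const char *word)
{
    assert(blob != NULL && offsets != NULL && word != NULL);

    size_t lo = 0, hi = n; /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = strcmp(word, blob + offsets[mid]);
        if (c == 0)
            return (int)mid;
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}
```

A call such as `dict_find(dict_blob, dict_offsets, DICT_ENTRIES, "cat")` needs at most a handful of string comparisons instead of a scan through the whole blob. At 2 bytes per entry, about 3,000 entries cost roughly 6 kB, which matches the overhead quoted above.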