(See both quick follow-ups below.)
After a few days of optimizing Codec2’s code (3200 bps mode), it is time to share the results (and the code itself!).
The goals were:
- provide the 3200 bps mode through a separate, clean repository (only C code, no Octave test benches, no modems etc.)
- prepare an easy experimenting ground for further improvements (beyond the bit-exactness constraint)
- code clean-up: remove all the unnecessary and obsolete constructs, apply optimizations
- fully static memory allocation (including KISS FFT)
With all of that done, the resulting code is still fully compatible with the original Codec2, but executes faster and has a much smaller memory footprint. That makes it a good candidate for a drop-in replacement in OpenRTX and other embedded projects. See the readme file for more details.
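To illustrate what "fully static memory allocation" means in practice, here is a minimal sketch of the caller-owned state pattern. It is not taken from the repository: the names (demo_state_t, demo_init) and the buffer sizes are made up for this example. The idea is simply that every working buffer lives inside one struct, so the RAM cost is known at compile time and nothing is ever allocated on the heap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes, chosen only for this sketch. */
#define DEMO_FFT_SIZE   512
#define DEMO_FRAME_SIZE 160

/* All working buffers live inside the state struct, so its RAM cost
   is known at compile time and no malloc()/free() is ever needed. */
typedef struct {
    float fft_re[DEMO_FFT_SIZE];
    float fft_im[DEMO_FFT_SIZE];
    float history[DEMO_FRAME_SIZE];
    unsigned frames_done;
} demo_state_t;

/* Compile-time RAM budget check (8 KiB is an arbitrary example limit). */
_Static_assert(sizeof(demo_state_t) <= 8192, "demo_state_t exceeds RAM budget");

/* The caller owns the storage; init only resets it. */
void demo_init(demo_state_t *s)
{
    assert(s != NULL);
    memset(s, 0, sizeof *s);
}
```

The caller declares the state as a global, a static, or a stack variable and passes its address to the init function, which is the same shape as `codec2_t c2; codec2_init(&c2);`. On a microcontroller this also means the linker map shows the codec's full RAM usage.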
GitHub repository: https://github.com/M17-Project/Codec2-mod
As always: have fun testing the code. Feedback is welcome!
Quick follow-up:
I have tested the modified code on an STM32F405RGTx running at 168 MHz, compiled with the -Os flag:
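    codec2_t c2;
    codec2_init(&c2);

    uint8_t encoded[CODEC2_BYTES_PER_FRAME] = {0};
    int16_t speech[CODEC2_SAMPLES_PER_FRAME];

    /* Fill one frame with a test tone (period: 80 samples).
       Note: as written, the 0.5f amplitude truncates to 0 when stored
       as int16_t, so the encoder is fed an all-zero frame. */
    for (uint8_t i=0; i<CODEC2_SAMPLES_PER_FRAME; i++)
        speech[i] = 0.5f * sinf(i/80.0f * TWO_PI);

    /* Time 1000 consecutive encoder calls (HAL tick, in milliseconds). */
    uint32_t tick = HAL_GetTick();
    for (uint16_t i=0; i<1000; i++)
    {
        codec2_encode(&c2, encoded, speech);
    }
    uint32_t tock = HAL_GetTick();

    dbg_print("Time: %lums\n", tock-tick);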
For the 1000 encoded frames, the execution time was 7.021 s (about 7.0 ms per frame) with the standard cosf() / acosf() pair in the LPC-LSP path and 7.019 s with their optimized fast_ counterparts (see util.c for details), with no significant change in ViSQOL MOS. With the -O2 flag, the gain is much larger: the execution time was 6.909 s.
Quick follow-up #2:
After replacing the decimating FIR inside nlp() with a polyphase equivalent, the execution time dropped to 6.754 s (-Os). The code is upstream.
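The idea behind the polyphase form can be sketched as follows. This is not the nlp() code from the repository: the function names, the zero initial filter state and the array-based interface are assumptions made for the example. The first function filters at the full input rate and then throws away all but every m-th result; the second computes only the retained outputs, walking the taps phase by phase (h[p], h[p+m], h[p+2m], ...).

```c
#include <assert.h>
#include <stddef.h>

/* Reference: run the FIR at the full input rate, keep every m-th output.
   y must have room for (nx + m - 1) / m samples. */
void fir_then_decimate(const float *x, size_t nx, const float *h, size_t nh,
                       size_t m, float *y)
{
    assert(m > 0);
    for (size_t n = 0; n < nx; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < nh && k <= n; k++)
            acc += h[k] * x[n - k];
        if (n % m == 0)
            y[n / m] = acc;          /* all other results are discarded */
    }
}

/* Polyphase form: only the retained outputs are computed.
   The taps are split into m sub-filters, one per phase p. */
void polyphase_decimate(const float *x, size_t nx, const float *h, size_t nh,
                        size_t m, float *y)
{
    assert(m > 0);
    size_t ny = (nx + m - 1) / m;
    for (size_t j = 0; j < ny; j++) {
        float acc = 0.0f;
        for (size_t p = 0; p < m; p++) {
            /* Sub-filter p uses taps h[p], h[p + m], h[p + 2m], ... */
            for (size_t k = p; k < nh && k <= j * m; k += m)
                acc += h[k] * x[j * m - k];
        }
        y[j] = acc;
    }
}
```

Both functions produce the same samples (up to floating-point summation order), but the second one does roughly 1/m of the multiply-accumulate work, which is where the saving comes from.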