Use SIMD instructions in 4-to-8-bit frame reconstruction
Very good idea proposed by @debionne:
Reference solution:
29725 usec for 100000 frames
3.36417e+06 fps
Optimized solution:
2540 usec for 100000 frames
3.93701e+07 fps
That is about 11.7 times faster (29725 / 2540). However, it is not a drop-in replacement for your code; we can work together on that!
Similar optimizations may also be possible in processlib.
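To make the discussion concrete, here is a minimal sketch of what a SIMD 4-to-8-bit unpack can look like. It is not the code that produced the timings above. It assumes that each packed byte holds two 4-bit pixels with the low nibble first, and that the target is x86 with SSE2; on other targets it falls back to the scalar loop. The function names `unpack4to8_scalar` and `unpack4to8_simd` are hypothetical and do not come from the existing code base.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Scalar reference: expands n_bytes packed bytes into 2 * n_bytes pixels.
// Assumption: the low nibble of each byte is the first pixel.
void unpack4to8_scalar(const std::uint8_t* src, std::uint8_t* dst,
                       std::size_t n_bytes)
{
    for (std::size_t i = 0; i < n_bytes; ++i) {
        dst[2 * i]     = src[i] & 0x0F;
        dst[2 * i + 1] = src[i] >> 4;
    }
}

// SIMD variant: handles 16 packed bytes (32 pixels) per iteration when SSE2
// is available, then finishes the remainder with the scalar loop.
void unpack4to8_simd(const std::uint8_t* src, std::uint8_t* dst,
                     std::size_t n_bytes)
{
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128i mask = _mm_set1_epi8(0x0F);
    for (; i + 16 <= n_bytes; i += 16) {
        const __m128i in =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        // Low nibbles: a plain byte-wise mask.
        const __m128i lo = _mm_and_si128(in, mask);
        // High nibbles: SSE2 has no 8-bit shift, so shift the 16-bit lanes
        // and mask away the bits that leaked in from the neighbouring byte.
        const __m128i hi = _mm_and_si128(_mm_srli_epi16(in, 4), mask);
        // Interleave to lo0, hi0, lo1, hi1, ... and store 32 output bytes.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 2 * i),
                         _mm_unpacklo_epi8(lo, hi));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 2 * i + 16),
                         _mm_unpackhi_epi8(lo, hi));
    }
#endif
    // Tail (or the whole buffer when SSE2 is not available).
    unpack4to8_scalar(src + i, dst + 2 * i, n_bytes - i);
}
```

Keeping the scalar version next to the SIMD one gives a reference to compare against in a unit test. If the detector packs the high nibble first, swapping `lo` and `hi` in the two unpack calls should be enough.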