Use SIMD instructions in 4-to-8-bit frame reconstruction
Very good idea proposed by @debionne:
Reference solution: 29725 usec for 100000 frames, 3.36417e+06 fps

Optimized solution: 2540 usec for 100000 frames, 3.93701e+07 fps

It's about 11 times faster. However, this is not a drop-in replacement for your code; we can work together on that! There may be similar optimizations to be done in processlib.