extensions: add AVX2 linear-float -> gamma-u8 conversions
author Ell <ell_se@yahoo.com>
Wed, 24 Jul 2019 20:24:29 +0000 (23:24 +0300)
committer Ell <ell_se@yahoo.com>
Wed, 24 Jul 2019 20:41:03 +0000 (23:41 +0300)
commit 41a31deeb36a72bbe568d491a326a2cebd21cf95
tree d0c9825e3e806203c90cdb5ac88c8b2e3acc975b
parent 385f0b545727262f58d3cfcf5523f69ace0e0166

Add AVX2 conversions from Y float, YA float, RGB float, and RGBA
float to Y' u8, Y'A u8, R'G'B' u8, and R'G'B'A u8, respectively.
Like the two-table conversions, these conversions use a lookup
table, which is indexed using AVX2's 256-bit gather instruction,
allowing us to process 8 floats at once.  On my machine, this
conversion is ~5x faster than the SSE conversions.
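A minimal scalar sketch of the lookup-table approach, under assumptions that are not taken from the patch: the table has 2^16 entries sampled evenly over [0, 1] and is indexed by clamping and rounding the input, and a square-root curve stands in for the real transfer curve (e.g. sRGB) so that the code needs no libm.  The names lut_init(), lut_index(), and convert_y_float_to_y_u8() are hypothetical, and the actual table layout in avx2-int8-tables.h may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LUT_SIZE 65536   /* 2^16 entries (hypothetical layout: evenly spaced samples) */

static uint8_t lut[LUT_SIZE];

/* Stand-in transfer curve y = sqrt(x), via Newton's method so no libm is
   needed; a real conversion would apply the space's TRC (e.g. sRGB). */
static double curve (double x)
{
  if (!(x > 0.0))
    return 0.0;                 /* negatives and NaN */
  if (x >= 1.0)
    return 1.0;                 /* clamp to the u8 range */
  double y = 1.0;               /* start above sqrt(x) */
  for (;;)
    {
      double next = 0.5 * (y + x / y);
      if (next >= y)            /* iterates stopped decreasing: converged */
        return y;
      y = next;
    }
}

/* Reference conversion: evaluate the curve and round to nearest. */
static uint8_t exact_u8 (float f)
{
  return (uint8_t) (curve (f) * 255.0 + 0.5);
}

/* Entry i holds the exact result for x = i / (LUT_SIZE - 1). */
static void lut_init (void)
{
  for (int i = 0; i < LUT_SIZE; i++)
    lut[i] = exact_u8 ((float) i / (LUT_SIZE - 1));
}

/* Clamp the input to [0, 1], then round it to the nearest table entry. */
static int lut_index (float f)
{
  if (!(f > 0.0f))
    return 0;                   /* negatives and NaN */
  if (f > 1.0f)
    f = 1.0f;
  int i = (int) (f * (LUT_SIZE - 1) + 0.5f);
  assert (i >= 0 && i < LUT_SIZE);
  return i;
}

/* Y float -> Y' u8, 8 pixels per step.  In an AVX2 version, the eight
   lut_index() calls become vector min/max/multiply/convert operations and
   the eight table reads become a single gather. */
static void convert_y_float_to_y_u8 (const float *src, uint8_t *dst, size_t n)
{
  size_t i = 0;
  for (; i + 8 <= n; i += 8)
    for (int k = 0; k < 8; k++)
      dst[i + k] = lut[lut_index (src[i + k])];
  for (; i < n; i++)            /* leftover pixels */
    dst[i] = lut[lut_index (src[i])];
}
```

Call lut_init() once, then convert_y_float_to_y_u8() on each row.  The inner eight-iteration loop marks the work an AVX2 version would do per 256-bit vector: compute eight indices at once, then fetch the eight entries with one gather instruction such as `_mm256_i32gather_epi32`.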

Note, however, that unlike the two-table conversions, we don't use
a second "feedback" table to correct the result.  With a
2^16-element table, this leads to an off-by-one conversion error
with a probability of ~0.1%.  This error rate is low enough for babl
to use the conversion, but may still be somewhat too high; it can be
reduced further with a bigger table.
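The off-by-one rate can be estimated with a sketch like the following, again under hypothetical assumptions: a 2^bits-entry table sampled evenly over [0, 1] and indexed by rounding, a square-root stand-in for the real transfer curve, and evenly spaced test inputs.  Under these assumptions, a mismatch only happens when one of the 255 rounding boundaries of the u8 output falls between an input and its nearest table sample, so the expected rate is about 255 / (4 (2^bits - 1)): roughly 0.1% for 2^16 entries, and roughly halving with each extra bit of table index.  count_mismatches() is an illustrative name, not part of babl.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in curve y = sqrt(x) via Newton's method (no libm), clamped to [0, 1]. */
static double curve (double x)
{
  if (!(x > 0.0))
    return 0.0;
  if (x >= 1.0)
    return 1.0;
  double y = 1.0;
  for (;;)
    {
      double next = 0.5 * (y + x / y);
      if (next >= y)
        return y;
      y = next;
    }
}

/* Reference conversion: evaluate the curve and round to nearest. */
static int exact_u8 (float f)
{
  return (int) (curve (f) * 255.0 + 0.5);
}

/* Build a table whose entry i holds exact_u8 (i / (2^bits - 1)), look up
   `samples` evenly spaced inputs in [0, 1] by rounding, and count how often
   the table disagrees with exact_u8().  *max_err receives the largest
   absolute difference.  Returns -1 if the table cannot be allocated. */
static long count_mismatches (int bits, long samples, int *max_err)
{
  assert (bits >= 1 && bits <= 20 && samples >= 2);
  long n = 1L << bits;
  uint8_t *lut = malloc ((size_t) n);
  if (lut == NULL)
    return -1;
  for (long i = 0; i < n; i++)
    lut[i] = (uint8_t) exact_u8 ((float) i / (float) (n - 1));

  long mismatches = 0;
  *max_err = 0;
  for (long s = 0; s < samples; s++)
    {
      float f = (float) s / (float) (samples - 1);
      long index = (long) (f * (float) (n - 1) + 0.5f);
      int err = abs ((int) lut[index] - exact_u8 (f));
      if (err > 0)
        mismatches++;
      if (err > *max_err)
        *max_err = err;
    }
  free (lut);
  return mismatches;
}
```

For example, count_mismatches (16, 100001, &max_err) should report a maximum difference of 1 and a mismatch count well under 1% of the samples, while a 2^10-entry table does noticeably worse; these figures describe the sketch, not babl's measured error.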
extensions/Makefile.am
extensions/avx2-int8-tables.h [new file with mode: 0644]
extensions/avx2-int8.c [new file with mode: 0644]
extensions/meson.build