x86emul/test: encourage compiler to use more embedded broadcast
For one it was an oversight to leave dup_{hi,lo}() undefined for 512-bit
vector size. And then in FMA testing we can also arrange for the
compiler to (hopefully) recognize broadcasting potential. Plus we can
replace the broadcast(1) use in the addsub() surrogate with inline
assembly explicitly using embedded broadcast (even gcc12 still doesn't
support broadcast for any of the addsub/subadd builtins).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>