testsuite: Use separate setups for unstable tests instead of should_fail
There are two possible interpretations of "expected failure": either
the test *must* fail (exactly the inverse of an ordinary test, with
success becoming failure and failure becoming success), or the test
*may* fail (with success intended, but failure possible in some
environments). Autotools had the second interpretation, which seems
more useful in practice, but Meson has the first.
Instead of using should_fail, we can put the tests in one of two new
suites: "flaky" is intended for tests that succeed or fail unpredictably
according to the test environment or chance, while "failing" is for
tests that ought to succeed but currently never do as a result of a
bug or missing functionality. With a sufficiently new version of Meson,
the flaky and failing tests are not run by default, but can be requested
by running a setup that does not exclude them, with a command like:
meson test --setup=x11_unstable --suite=flaky --suite=failing
As a bonus, now that we're setting up setups and their excluded suites
programmatically, the gsk-compare-broadway tests are also excluded by
default when running the test setup for a non-broadway backend.
When running the tests in CI, --suite=gtk overrides the default
exclude_suites, so we have to specify --no-suite=flaky and
--no-suite=failing explicitly.
This arrangement is inspired by GNOME/glib!2987, which was contributed
by Marco Trevisan.
Signed-off-by: Simon McVittie <smcv@debian.org>