GIT update of https://sourceware.org/git/glibc.git/release/2.41/master from glibc-2.41
GIT update of https://sourceware.org/git/glibc.git/release/2.41/master from glibc-2.41
Gbp-Pq: Name git-updates.diff
--- /dev/null
+For the GNU C Library Security Advisories, see the git master branch:
+https://sourceware.org/git/?p=glibc.git;a=tree;f=advisories;hb=HEAD
$(link-libc-tests-after-rpath-link)
# This is how to find at build-time things that will be installed there.
-rpath-dirs = math elf dlfcn nss nis rt resolv mathvec support
+rpath-dirs = math elf dlfcn nss nis rt resolv mathvec support misc
rpath-link = \
$(common-objdir):$(subst $(empty) ,:,$(patsubst ../$(subdir),.,$(rpath-dirs:%=$(common-objpfx)%)))
else # build-static
Please send GNU C library bug reports via <https://sourceware.org/bugzilla/>
using `glibc' in the "product" field.
\f
+Version 2.41.1
+
+Deprecated and removed features, and other changes affecting compatibility:
+
+* The glibc.rtld.execstack now supports a compatibility mode to allow
+ programs that require an executable stack through dynamic loaded
+ shared libraries.
+
+The following bugs were resolved with this release:
+
+ [32269] RISC-V IFUNC resolver cannot access gp pointer
+ [32626] math: math: log10p1f is not correctly rounded
+ [32627] math: math: sinhf is not correctly rounded
+ [32630] math: math: tanf is not correctly rounded for all rounding
+ modes
+ [32653] dynamic-link: Review options for improving both security and
+ backwards compatibility of glibc 2.41 dlopen / execstack handling
+ [32781] Linux: Remove attribute access from sched_getattr
+ [32782] nptl: Race conditions in pthread cancellation causing crash
+ [32786] nptl: PTHREAD_COND_INITIALIZER compatibility with pre-2.41 versions
+ [32810] Crash on x86-64 if XSAVEC disable via tunable
+ [32882] tst-audit10 fails with SIGILL on CPUs without AVX
+ [32897] dynamic-link: pthread_getattr_np fails when executable stack
+ tunable is set
+\f
Version 2.41
Major new features:
+++ /dev/null
-printf: incorrect output for integers with thousands separator and width field
-
-When the printf family of functions is called with a format specifier
-that uses an <apostrophe> (enable grouping) and a minimum width
-specifier, the resulting output could be larger than reasonably expected
-by a caller that computed a tight bound on the buffer size. The
-resulting larger than expected output could result in a buffer overflow
-in the printf family of functions.
-
-CVE-Id: CVE-2023-25139
-Public-Date: 2023-02-02
-Vulnerable-Commit: e88b9f0e5cc50cab57a299dc7efe1a4eb385161d (2.37)
-Fix-Commit: c980549cc6a1c03c23cc2fe3e7b0fe626a0364b0 (2.38)
-Fix-Commit: 07b9521fc6369d000216b96562ff7c0ed32a16c4 (2.37-4)
+++ /dev/null
-getaddrinfo: Stack read overflow in no-aaaa mode
-
-If the system is configured in no-aaaa mode via /etc/resolv.conf,
-getaddrinfo is called for the AF_UNSPEC address family, and a DNS
-response is received over TCP that is larger than 2048 bytes,
-getaddrinfo may potentially disclose stack contents via the returned
-address data, or crash.
-
-CVE-Id: CVE-2023-4527
-Public-Date: 2023-09-12
-Vulnerable-Commit: f282cdbe7f436c75864e5640a409a10485e9abb2 (2.36)
-Fix-Commit: bd77dd7e73e3530203be1c52c8a29d08270cb25d (2.39)
-Fix-Commit: 4ea972b7edd7e36610e8cde18bf7a8149d7bac4f (2.36-113)
-Fix-Commit: b7529346025a130fee483d42178b5c118da971bb (2.37-38)
-Fix-Commit: b25508dd774b617f99419bdc3cf2ace4560cd2d6 (2.38-19)
+++ /dev/null
-getaddrinfo: Potential use-after-free
-
-When an NSS plugin only implements the _gethostbyname2_r and
-_getcanonname_r callbacks, getaddrinfo could use memory that was freed
-during buffer resizing, potentially causing a crash or read or write to
-arbitrary memory.
-
-CVE-Id: CVE-2023-4806
-Public-Date: 2023-09-12
-Fix-Commit: 973fe93a5675c42798b2161c6f29c01b0e243994 (2.39)
-Fix-Commit: e09ee267c03e3150c2c9ba28625ab130705a485e (2.34-420)
-Fix-Commit: e3ccb230a961b4797510e6a1f5f21fd9021853e7 (2.35-270)
-Fix-Commit: a9728f798ec7f05454c95637ee6581afaa9b487d (2.36-115)
-Fix-Commit: 6529a7466c935f36e9006b854d6f4e1d4876f942 (2.37-39)
-Fix-Commit: 00ae4f10b504bc4564e9f22f00907093f1ab9338 (2.38-20)
+++ /dev/null
-tunables: local privilege escalation through buffer overflow
-
-If a tunable of the form NAME=NAME=VAL is passed in the environment of a
-setuid program and NAME is valid, it may result in a buffer overflow,
-which could be exploited to achieve escalated privileges. This flaw was
-introduced in glibc 2.34.
-
-CVE-Id: CVE-2023-4911
-Public-Date: 2023-10-03
-Vulnerable-Commit: 2ed18c5b534d9e92fc006202a5af0df6b72e7aca (2.34)
-Fix-Commit: 1056e5b4c3f2d90ed2b4a55f96add28da2f4c8fa (2.39)
-Fix-Commit: dcc367f148bc92e7f3778a125f7a416b093964d9 (2.34-423)
-Fix-Commit: c84018a05aec80f5ee6f682db0da1130b0196aef (2.35-274)
-Fix-Commit: 22955ad85186ee05834e47e665056148ca07699c (2.36-118)
-Fix-Commit: b4e23c75aea756b4bddc4abcf27a1c6dca8b6bd3 (2.37-45)
-Fix-Commit: 750a45a783906a19591fb8ff6b7841470f1f5701 (2.38-27)
+++ /dev/null
-getaddrinfo: DoS due to memory leak
-
-The fix for CVE-2023-4806 introduced a memory leak when an application
-calls getaddrinfo for AF_INET6 with AI_CANONNAME, AI_ALL and AI_V4MAPPED
-flags set.
-
-CVE-Id: CVE-2023-5156
-Public-Date: 2023-09-25
-Vulnerable-Commit: e09ee267c03e3150c2c9ba28625ab130705a485e (2.34-420)
-Vulnerable-Commit: e3ccb230a961b4797510e6a1f5f21fd9021853e7 (2.35-270)
-Vulnerable-Commit: a9728f798ec7f05454c95637ee6581afaa9b487d (2.36-115)
-Vulnerable-Commit: 6529a7466c935f36e9006b854d6f4e1d4876f942 (2.37-39)
-Vulnerable-Commit: 00ae4f10b504bc4564e9f22f00907093f1ab9338 (2.38-20)
-Fix-Commit: 8006457ab7e1cd556b919f477348a96fe88f2e49 (2.34-421)
-Fix-Commit: 17092c0311f954e6f3c010f73ce3a78c24ac279a (2.35-272)
-Fix-Commit: 856bac55f98dc840e7c27cfa82262b933385de90 (2.36-116)
-Fix-Commit: 4473d1b87d04b25cdd0e0354814eeaa421328268 (2.37-42)
-Fix-Commit: 5ee59ca371b99984232d7584fe2b1a758b4421d3 (2.38-24)
+++ /dev/null
-syslog: Heap buffer overflow in __vsyslog_internal
-
-__vsyslog_internal did not handle a case where printing a SYSLOG_HEADER
-containing a long program name failed to update the required buffer
-size, leading to the allocation and overflow of a too-small buffer on
-the heap.
-
-CVE-Id: CVE-2023-6246
-Public-Date: 2024-01-30
-Vulnerable-Commit: 52a5be0df411ef3ff45c10c7c308cb92993d15b1 (2.37)
-Fix-Commit: 6bd0e4efcc78f3c0115e5ea9739a1642807450da (2.39)
-Fix-Commit: 23514c72b780f3da097ecf33a793b7ba9c2070d2 (2.38-42)
-Fix-Commit: 97a4292aa4a2642e251472b878d0ec4c46a0e59a (2.37-57)
-Vulnerable-Commit: b0e7888d1fa2dbd2d9e1645ec8c796abf78880b9 (2.36-16)
-Fix-Commit: d1a83b6767f68b3cb5b4b4ea2617254acd040c82 (2.36-126)
+++ /dev/null
-syslog: Heap buffer overflow in __vsyslog_internal
-
-__vsyslog_internal used the return value of snprintf/vsnprintf to
-calculate buffer sizes for memory allocation. If these functions (for
-any reason) failed and returned -1, the resulting buffer would be too
-small to hold output.
-
-CVE-Id: CVE-2023-6779
-Public-Date: 2024-01-30
-Vulnerable-Commit: 52a5be0df411ef3ff45c10c7c308cb92993d15b1 (2.37)
-Fix-Commit: 7e5a0c286da33159d47d0122007aac016f3e02cd (2.39)
-Fix-Commit: d0338312aace5bbfef85e03055e1212dd0e49578 (2.38-43)
-Fix-Commit: 67062eccd9a65d7fda9976a56aeaaf6c25a80214 (2.37-58)
-Vulnerable-Commit: b0e7888d1fa2dbd2d9e1645ec8c796abf78880b9 (2.36-16)
-Fix-Commit: 2bc9d7c002bdac38b5c2a3f11b78e309d7765b83 (2.36-127)
+++ /dev/null
-syslog: Integer overflow in __vsyslog_internal
-
-__vsyslog_internal calculated a buffer size by adding two integers, but
-did not first check if the addition would overflow.
-
-CVE-Id: CVE-2023-6780
-Public-Date: 2024-01-30
-Vulnerable-Commit: 52a5be0df411ef3ff45c10c7c308cb92993d15b1 (2.37)
-Fix-Commit: ddf542da94caf97ff43cc2875c88749880b7259b (2.39)
-Fix-Commit: d37c2b20a4787463d192b32041c3406c2bd91de0 (2.38-44)
-Fix-Commit: 2b58cba076e912961ceaa5fa58588e4b10f791c0 (2.37-59)
-Vulnerable-Commit: b0e7888d1fa2dbd2d9e1645ec8c796abf78880b9 (2.36-16)
-Fix-Commit: b9b7d6a27aa0632f334352fa400771115b3c69b7 (2.36-128)
+++ /dev/null
-ISO-2022-CN-EXT: fix out-of-bound writes when writing escape sequence
-
-The iconv() function in the GNU C Library versions 2.39 and older may
-overflow the output buffer passed to it by up to 4 bytes when converting
-strings to the ISO-2022-CN-EXT character set, which may be used to
-crash an application or overwrite a neighbouring variable.
-
-ISO-2022-CN-EXT uses escape sequences to indicate character set changes
-(as specified by RFC 1922). While the SOdesignation has the expected
-bounds checks, neither SS2designation nor SS3designation have its;
-allowing a write overflow of 1, 2, or 3 bytes with fixed values:
-'$+I', '$+J', '$+K', '$+L', '$+M', or '$*H'.
-
-CVE-Id: CVE-2024-2961
-Public-Date: 2024-04-17
-Vulnerable-Commit: 755104edc75c53f4a0e7440334e944ad3c6b32fc (2.1.93-169)
-Fix-Commit: f9dc609e06b1136bb0408be9605ce7973a767ada (2.40)
-Fix-Commit: 31da30f23cddd36db29d5b6a1c7619361b271fb4 (2.39-31)
-Fix-Commit: e1135387deded5d73924f6ca20c72a35dc8e1bda (2.38-66)
-Fix-Commit: 89ce64b269a897a7780e4c73a7412016381c6ecf (2.37-89)
-Fix-Commit: 4ed98540a7fd19f458287e783ae59c41e64df7b5 (2.36-164)
-Fix-Commit: 36280d1ce5e245aabefb877fe4d3c6cff95dabfa (2.35-315)
-Fix-Commit: a8b0561db4b9847ebfbfec20075697d5492a363c (2.34-459)
-Fix-Commit: ed4f16ff6bed3037266f1fa682ebd32a18fce29c (2.33-263)
-Fix-Commit: 682ad4c8623e611a971839990ceef00346289cc9 (2.32-140)
-Fix-Commit: 3703c32a8d304c1ee12126134ce69be965f38000 (2.31-154)
-
-Reported-By: Charles Fol
+++ /dev/null
-nscd: Stack-based buffer overflow in netgroup cache
-
-If the Name Service Cache Daemon's (nscd) fixed size cache is exhausted
-by client requests then a subsequent client request for netgroup data
-may result in a stack-based buffer overflow. This flaw was introduced
-in glibc 2.15 when the cache was added to nscd.
-
-This vulnerability is only present in the nscd binary.
-
-CVE-Id: CVE-2024-33599
-Public-Date: 2024-04-23
-Vulnerable-Commit: 684ae515993269277448150a1ca70db3b94aa5bd (2.15)
-Fix-Commit: 69c58d5ef9f584ea198bd00f7964d364d0e6b921 (2.31-155)
-Fix-Commit: a77064893bfe8a701770e2f53a4d33805bc47a5a (2.32-141)
-Fix-Commit: 5c75001a96abcd50cbdb74df24c3f013188d076e (2.33-264)
-Fix-Commit: 52f73e5c4e29b14e79167272297977f360ae1e97 (2.34-460)
-Fix-Commit: 7a95873543ce225376faf13bb71c43dea6d24f86 (2.35-316)
-Fix-Commit: caa3151ca460bdd9330adeedd68c3112d97bffe4 (2.36-165)
-Fix-Commit: f75c298e747b2b8b41b1c2f551c011a52c41bfd1 (2.37-91)
-Fix-Commit: 5968aebb86164034b8f8421b4abab2f837a5bdaf (2.38-72)
-Fix-Commit: 1263d583d2e28afb8be53f8d6922f0842036f35d (2.39-35)
-Fix-Commit: 87801a8fd06db1d654eea3e4f7626ff476a9bdaa (2.40)
+++ /dev/null
-nscd: Null pointer crash after notfound response
-
-If the Name Service Cache Daemon's (nscd) cache fails to add a not-found
-netgroup response to the cache, the client request can result in a null
-pointer dereference. This flaw was introduced in glibc 2.15 when the
-cache was added to nscd.
-
-This vulnerability is only present in the nscd binary.
-
-CVE-Id: CVE-2024-33600
-Public-Date: 2024-04-24
-Vulnerable-Commit: 684ae515993269277448150a1ca70db3b94aa5bd (2.15)
-Fix-Commit: b048a482f088e53144d26a61c390bed0210f49f2 (2.40)
-Fix-Commit: 7835b00dbce53c3c87bbbb1754a95fb5e58187aa (2.40)
-Fix-Commit: c99f886de54446cd4447db6b44be93dabbdc2f8b (2.39-37)
-Fix-Commit: 5a508e0b508c8ad53bd0d2fb48fd71b242626341 (2.39-36)
-Fix-Commit: 2ae9446c1b7a3064743b4a51c0bbae668ee43e4c (2.38-74)
-Fix-Commit: 541ea5172aa658c4bd5c6c6d6fd13903c3d5bb0a (2.38-73)
-Fix-Commit: a8070b31043c7585c36ba68a74298c4f7af075c3 (2.37-93)
-Fix-Commit: 5eea50c4402e39588de98aa1d4469a79774703d4 (2.37-92)
-Fix-Commit: f205b3af56740e3b014915b1bd3b162afe3407ef (2.36-167)
-Fix-Commit: c34f470a615b136170abd16142da5dd0c024f7d1 (2.36-166)
-Fix-Commit: bafadc589fbe21ae330e8c2af74db9da44a17660 (2.35-318)
-Fix-Commit: 4370bef52b0f3f3652c6aa13d7a9bb3ac079746d (2.35-317)
-Fix-Commit: 1f94122289a9bf7dba573f5d60327aaa2b85cf2e (2.34-462)
-Fix-Commit: 966d6ac9e40222b84bb21674cc4f83c8d72a5a26 (2.34-461)
-Fix-Commit: e3eef1b8fbdd3a7917af466ca9c4b7477251ca79 (2.33-266)
-Fix-Commit: f20a8d696b13c6261b52a6434899121f8b19d5a7 (2.33-265)
-Fix-Commit: be602180146de37582a3da3a0caa4b719645de9c (2.32-143)
-Fix-Commit: 394eae338199078b7961b051c191539870742d7b (2.32-142)
-Fix-Commit: 8d7949183760170c61e55def723c1d8050187874 (2.31-157)
-Fix-Commit: 304ce5fe466c4762b21b36c26926a4657b59b53e (2.31-156)
+++ /dev/null
-nscd: netgroup cache may terminate daemon on memory allocation failure
-
-The Name Service Cache Daemon's (nscd) netgroup cache uses xmalloc or
-xrealloc and these functions may terminate the process due to a memory
-allocation failure resulting in a denial of service to the clients. The
-flaw was introduced in glibc 2.15 when the cache was added to nscd.
-
-This vulnerability is only present in the nscd binary.
-
-Subsequent refactoring of the netgroup cache only added more uses of
-xmalloc and xrealloc. Uses of xmalloc and xrealloc in other parts of
-nscd only occur during startup of the daemon and so are not affected by
-client requests that could trigger an out of memory followed by
-termination.
-
-CVE-Id: CVE-2024-33601
-Public-Date: 2024-04-24
-Vulnerable-Commit: 684ae515993269277448150a1ca70db3b94aa5bd (2.15)
-Fix-Commit: c04a21e050d64a1193a6daab872bca2528bda44b (2.40)
-Fix-Commit: a9a8d3eebb145779a18d90e3966009a1daa63cd8 (2.39-38)
-Fix-Commit: 71af8ca864345d39b746d5cee84b94b430fad5db (2.38-75)
-Fix-Commit: 6e106dc214d6a033a4e945d1c6cf58061f1c5f1f (2.37-94)
-Fix-Commit: b6742463694b1dfdd5120b91ee21cf05d15ec2e2 (2.36-168)
-Fix-Commit: 7a5864cac60e06000394128a5a2817b03542f5a3 (2.35-319)
-Fix-Commit: 86f1d5f4129c373ac6fb6df5bcf38273838843cb (2.34-463)
-Fix-Commit: 4d27d4b9a188786fc6a56745506cec2acfc51f83 (2.33-267)
-Fix-Commit: 3ed195a8ec89da281e3c4bf887a13d281b72d8f4 (2.32-144)
-Fix-Commit: bbf5a58ccb55679217f94de706164d15372fbbc0 (2.31-158)
+++ /dev/null
-nscd: netgroup cache assumes NSS callback uses in-buffer strings
-
-The Name Service Cache Daemon's (nscd) netgroup cache can corrupt memory
-when the NSS callback does not store all strings in the provided buffer.
-The flaw was introduced in glibc 2.15 when the cache was added to nscd.
-
-This vulnerability is only present in the nscd binary.
-
-There is no guarantee from the NSS callback API that the returned
-strings are all within the buffer. However, the netgroup cache code
-assumes that the NSS callback uses in-buffer strings and if it doesn't
-the buffer resizing logic could lead to potential memory corruption.
-
-CVE-Id: CVE-2024-33602
-Public-Date: 2024-04-24
-Vulnerable-Commit: 684ae515993269277448150a1ca70db3b94aa5bd (2.15)
-Fix-Commit: c04a21e050d64a1193a6daab872bca2528bda44b (2.40)
-Fix-Commit: a9a8d3eebb145779a18d90e3966009a1daa63cd8 (2.39-38)
-Fix-Commit: 71af8ca864345d39b746d5cee84b94b430fad5db (2.38-75)
-Fix-Commit: 6e106dc214d6a033a4e945d1c6cf58061f1c5f1f (2.37-94)
-Fix-Commit: b6742463694b1dfdd5120b91ee21cf05d15ec2e2 (2.36-168)
-Fix-Commit: 7a5864cac60e06000394128a5a2817b03542f5a3 (2.35-319)
-Fix-Commit: 86f1d5f4129c373ac6fb6df5bcf38273838843cb (2.34-463)
-Fix-Commit: 4d27d4b9a188786fc6a56745506cec2acfc51f83 (2.33-267)
-Fix-Commit: 3ed195a8ec89da281e3c4bf887a13d281b72d8f4 (2.32-144)
-Fix-Commit: bbf5a58ccb55679217f94de706164d15372fbbc0 (2.31-158)
+++ /dev/null
-assert: Buffer overflow when printing assertion failure message
-
-When the assert() function fails, it does not allocate enough space for the
-assertion failure message string and size information, which may lead to a
-buffer overflow if the message string size aligns to page size.
-
-This bug can be triggered when an assertion in a program fails. The assertion
-failure message is allocated to allow developers to see this failure in core
-dumps and it typically includes, in addition to the invariant assertion
-string and function name, the name of the program. If the name of the failing
-program is user controlled, for example on a local system, this could allow an
-attacker to control the assertion failure to trigger this buffer overflow.
-
-The only viable vector for exploitation of this bug is local, if a setuid
-program exists that has an existing bug that results in an assertion failure.
-No such program has been discovered at the time of publishing this advisory,
-but the presence of custom setuid programs, although strongly discouraged as a
-security practice, cannot be discounted.
-
-CVE-Id: CVE-2025-0395
-Public-Date: 2025-01-22
-Vulnerable-Commit: f8a3b5bf8fa1d0c43d2458e03cc109a04fdef194 (2.13-175)
-Fix-Commit: 68ee0f704cb81e9ad0a78c644a83e1e9cd2ee578 (2.41)
-Fix-Commit: 7d4b6bcae91f29d7b4daf15bab06b66cf1d2217c (2.40-66)
-Reported-By: Qualys Security Advisory
+++ /dev/null
-GNU C Library Security Advisory Format
-======================================
-
-Security advisories in this directory follow a simple git commit log
-format, with a heading and free-format description augmented with tags
-to allow parsing key information. References to code changes are
-specific to the glibc repository and follow a specific format:
-
- Tag-name: <commit-ref> (release-version)
-
-The <commit-ref> indicates a specific commit in the repository. The
-release-version indicates the publicly consumable release in which this
-commit is known to exist. The release-version is derived from the
-git-describe format, (i.e. stripped out from glibc-2.34.NNN-gxxxx) and
-is of the form 2.34-NNN. If the -NNN suffix is absent, it means that
-the change is in that release tarball, otherwise the change is on the
-release/2.YY/master branch and not in any released tarball.
-
-The following tags are currently being used:
-
-CVE-Id:
-This is the CVE-Id assigned under the CVE Program
-(https://www.cve.org/).
-
-Public-Date:
-The date this issue became publicly known.
-
-Vulnerable-Commit:
-The commit that introduced this vulnerability. There could be multiple
-entries, one for each release branch in the glibc repository; the
-release-version portion of this tag should tell you which branch this is
-on.
-
-Fix-Commit:
-The commit that fixed this vulnerability. There could be multiple
-entries for each release branch in the glibc repository, indicating that
-all of those commits contributed to fixing that issue in each of those
-branches.
-
-Reported-By:
-The entity that reported this issue. There could be multiple entries, one for
-each reporter.
-
-Adding an Advisory
-------------------
-
-An advisory for a CVE needs to be added on the master branch in two steps:
-
-1. Add the text of the advisory without any Fix-Commit tags along with
- the fix for the CVE. Add the Vulnerable-Commit tag, if applicable.
- The advisories directory does not exist in release branches, so keep
- the advisory text commit distinct from the code changes, to ease
- backports. Ask for the GLIBC-SA advisory number from the security
- team.
-
-2. Finish all backports on release branches and then back on the msater
- branch, add all commit refs to the advisory using the Fix-Commit
- tags. Don't bother adding the release-version subscript since the
- next step will overwrite it.
-
-3. Run the process-advisories.sh script in the scripts directory on the
- advisory:
-
- scripts/process-advisories.sh update GLIBC-SA-YYYY-NNNN
-
- (replace YYYY-NNNN with the actual advisory number).
-
-4. Verify the updated advisory and push the result.
-
-Getting a NEWS snippet from advisories
---------------------------------------
-
-Run:
-
- scripts/process-advisories.sh news
-
-and copy the content into the NEWS file.
test-assert-perr \
tst-assert-c++ \
tst-assert-g++ \
+ tst-assert-sa-2025-0001 \
# tests
ifeq ($(have-cxx-thread_local),yes)
--- /dev/null
+/* Test for CVE-2025-0395.
+ Copyright The GNU Toolchain Authors.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+/* Test that a large enough __progname does not result in a buffer overflow
+ when printing an assertion failure. This was CVE-2025-0395. */
+#include <assert.h>
+#include <inttypes.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <support/check.h>
+#include <support/support.h>
+#include <support/xstdio.h>
+#include <support/xunistd.h>
+
+extern const char *__progname;
+
+int
+do_test (int argc, char **argv)
+{
+
+ support_need_proc ("Reads /proc/self/maps to add guards to writable maps.");
+ ignore_stderr ();
+
+ /* XXX assumes that the assert is on a 2 digit line number. */
+ const char *prompt = ": %s:99: do_test: Assertion `argc < 1' failed.\n";
+
+ int ret = fprintf (stderr, prompt, __FILE__);
+ if (ret < 0)
+ FAIL_EXIT1 ("fprintf failed: %m\n");
+
+ size_t pagesize = getpagesize ();
+ size_t namesize = pagesize - 1 - ret;
+
+ /* Alter the progname so that the assert message fills the entire page. */
+ char progname[namesize];
+ memset (progname, 'A', namesize - 1);
+ progname[namesize - 1] = '\0';
+ __progname = progname;
+
+ FILE *f = xfopen ("/proc/self/maps", "r");
+ char *line = NULL;
+ size_t len = 0;
+ uintptr_t prev_to = 0;
+
+ /* Pad the beginning of every writable mapping with a PROT_NONE map. This
+ ensures that the mmap in the assert_fail path never ends up below a
+ writable map and will terminate immediately in case of a buffer
+ overflow. */
+ while (xgetline (&line, &len, f))
+ {
+ uintptr_t from, to;
+ char perm[4];
+
+ sscanf (line, "%" SCNxPTR "-%" SCNxPTR " %c%c%c%c ",
+ &from, &to,
+ &perm[0], &perm[1], &perm[2], &perm[3]);
+
+ bool writable = (memchr (perm, 'w', 4) != NULL);
+
+ if (prev_to != 0 && from - prev_to > pagesize && writable)
+ xmmap ((void *) from - pagesize, pagesize, PROT_NONE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0);
+
+ prev_to = to;
+ }
+
+ xfclose (f);
+
+ assert (argc < 1);
+ return 0;
+}
+
+#define EXPECTED_SIGNAL SIGABRT
+#define TEST_FUNCTION_ARGV do_test
+#include <support/test-driver.c>
## args: double
## ret: double
## includes: math.h
+## name: workload-random
0x1.5a2730bacd94ap-1
-0x1.b57eb40fc048ep-21
-0x1.c0b185fb450e2p-17
## args: double
## ret: double
## includes: math.h
+## name: workload-random
0x1.bcb6129b5ff2bp8
-0x1.63057386325ebp9
0x1.62f1d7dc4e8bfp9
enable-werror = @enable_werror@
have-z-execstack = @libc_cv_z_execstack@
+have-no-error-execstack = @libc_cv_no_error_execstack@
have-protected-data = @libc_cv_protected_data@
have-insert = @libc_cv_insert@
have-glob-dat-reloc = @libc_cv_has_glob_dat@
libc_cv_fpie
libc_cv_test_static_pie
libc_cv_z_execstack
+libc_cv_no_error_execstack
ASFLAGS_config
libc_cv_cc_with_libunwind
libc_cv_insert
fi
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for linker that supports --no-error-execstack" >&5
+printf %s "checking for linker that supports --no-error-execstack... " >&6; }
+libc_linker_feature=no
+cat > conftest.c <<EOF
+int _start (void) { return 42; }
+EOF
+if { ac_try='${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS $no_ssp
+ -Wl,--no-error-execstack -nostdlib -nostartfiles
+ -fPIC -shared -o conftest.so conftest.c
+ 1>&5'
+ { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+ (eval $ac_try) 2>&5
+ ac_status=$?
+ printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+ test $ac_status = 0; }; }
+then
+ if ${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS $no_ssp -Wl,--no-error-execstack -nostdlib \
+ -nostartfiles -fPIC -shared -o conftest.so conftest.c 2>&1 \
+ | grep "warning: --no-error-execstack ignored" > /dev/null 2>&1; then
+ true
+ else
+ libc_linker_feature=yes
+ fi
+fi
+rm -f conftest*
+if test $libc_linker_feature = yes; then
+ libc_cv_no_error_execstack=yes
+else
+ libc_cv_no_error_execstack=no
+fi
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $libc_linker_feature" >&5
+printf "%s\n" "$libc_linker_feature" >&6; }
+
+
{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for linker that supports -z execstack" >&5
printf %s "checking for linker that supports -z execstack... " >&6; }
libc_linker_feature=no
fi
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the compiler supports __attribute__ ((aligned (65536)))" >&5
+printf %s "checking whether the compiler supports __attribute__ ((aligned (65536)))... " >&6; }
+if test ${libc_cv_aligned_65536+y}
+then :
+ printf %s "(cached) " >&6
+else case e in #(
+ e)
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h. */
+
+char bss0xb5dce8 __attribute__ ((aligned (65536)));
+
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"
+then :
+ libc_cv_aligned_65536=yes
+else case e in #(
+ e) libc_cv_aligned_65536=no ;;
+esac
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
+ ;;
+esac
+fi
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $libc_cv_aligned_65536" >&5
+printf "%s\n" "$libc_cv_aligned_65536" >&6; }
+config_vars="$config_vars
+aligned-65536 = $libc_cv_aligned_65536"
+
ac_ext=cpp
ac_cpp='$CXXCPP $CPPFLAGS'
ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5'
config_vars="$config_vars
load-address-ldflag = $libc_cv_load_address_ldflag"
+# Check if compilers support GCS in branch protection:
+
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking if compiler supports -mbranch-protection=gcs" >&5
+printf %s "checking if compiler supports -mbranch-protection=gcs... " >&6; }
+if test ${libc_cv_cc_gcs+y}
+then :
+ printf %s "(cached) " >&6
+else case e in #(
+ e) if { ac_try='${CC-cc} -Werror -mbranch-protection=gcs -xc /dev/null -S -o /dev/null'
+ { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+ (eval $ac_try) 2>&5
+ ac_status=$?
+ printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+ test $ac_status = 0; }; }
+then :
+ libc_cv_cc_gcs=yes
+else case e in #(
+ e) libc_cv_cc_gcs=no ;;
+esac
+fi ;;
+esac
+fi
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $libc_cv_cc_gcs" >&5
+printf "%s\n" "$libc_cv_cc_gcs" >&6; }
+if test "$TEST_CC" = "$CC"; then
+ libc_cv_test_cc_gcs=$libc_cv_cc_gcs
+else
+
+saved_CC="$CC"
+CC="$TEST_CC"
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking if compiler supports -mbranch-protection=gcs in testing" >&5
+printf %s "checking if compiler supports -mbranch-protection=gcs in testing... " >&6; }
+if test ${libc_cv_test_cc_gcs+y}
+then :
+ printf %s "(cached) " >&6
+else case e in #(
+ e) if { ac_try='${CC-cc} -Werror -mbranch-protection=gcs -xc /dev/null -S -o /dev/null'
+ { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+ (eval $ac_try) 2>&5
+ ac_status=$?
+ printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+ test $ac_status = 0; }; }
+then :
+ libc_cv_test_cc_gcs=yes
+else case e in #(
+ e) libc_cv_test_cc_gcs=no ;;
+esac
+fi ;;
+esac
+fi
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $libc_cv_test_cc_gcs" >&5
+printf "%s\n" "$libc_cv_test_cc_gcs" >&6; }
+
+CC="$saved_CC"
+
+fi
+
+config_vars="$config_vars
+have-cc-gcs = $libc_cv_cc_gcs"
+config_vars="$config_vars
+have-test-cc-gcs = $libc_cv_test_cc_gcs"
+
+# Check if linker supports GCS marking
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for linker that supports -z gcs=always" >&5
+printf %s "checking for linker that supports -z gcs=always... " >&6; }
+libc_linker_feature=no
+cat > conftest.c <<EOF
+int _start (void) { return 42; }
+EOF
+if { ac_try='${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS $no_ssp
+ -Wl,-z,gcs=always -nostdlib -nostartfiles
+ -fPIC -shared -o conftest.so conftest.c
+ 1>&5'
+ { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+ (eval $ac_try) 2>&5
+ ac_status=$?
+ printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+ test $ac_status = 0; }; }
+then
+ if ${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS $no_ssp -Wl,-z,gcs=always -nostdlib \
+ -nostartfiles -fPIC -shared -o conftest.so conftest.c 2>&1 \
+ | grep "warning: -z gcs=always ignored" > /dev/null 2>&1; then
+ true
+ else
+ libc_linker_feature=yes
+ fi
+fi
+rm -f conftest*
+if test $libc_linker_feature = yes; then
+ libc_cv_ld_gcs=yes
+else
+ libc_cv_ld_gcs=no
+fi
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $libc_linker_feature" >&5
+printf "%s\n" "$libc_linker_feature" >&6; }
+config_vars="$config_vars
+have-ld-gcs = $libc_cv_ld_gcs"
+
{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking if we can build programs as PIE" >&5
printf %s "checking if we can build programs as PIE... " >&6; }
cat confdefs.h - <<_ACEOF >conftest.$ac_ext
fi
AC_SUBST(ASFLAGS_config)
+LIBC_LINKER_FEATURE([--no-error-execstack], [-Wl,--no-error-execstack],
+ [libc_cv_no_error_execstack=yes], [libc_cv_no_error_execstack=no])
+AC_SUBST(libc_cv_no_error_execstack)
+
LIBC_LINKER_FEATURE([-z execstack], [-Wl,-z,execstack],
[libc_cv_z_execstack=yes], [libc_cv_z_execstack=no])
AC_SUBST(libc_cv_z_execstack)
AC_DEFINE([HAVE_BUILTIN_TRAP])
fi
+dnl Check if
+AC_CACHE_CHECK([whether the compiler supports __attribute__ ((aligned (65536)))],
+ libc_cv_aligned_65536, [
+AC_COMPILE_IFELSE([AC_LANG_SOURCE([
+char bss[0xb5dce8] __attribute__ ((aligned (65536)));
+])],
+ [libc_cv_aligned_65536=yes],
+ [libc_cv_aligned_65536=no])
+])
+LIBC_CONFIG_VAR([aligned-65536], [$libc_cv_aligned_65536])
+
dnl C++ feature tests.
AC_LANG_PUSH([C++])
[libc_cv_load_address_ldflag=])
LIBC_CONFIG_VAR([load-address-ldflag], [$libc_cv_load_address_ldflag])
+# Check if compilers support GCS in branch protection:
+LIBC_TRY_CC_AND_TEST_CC_OPTION([if compiler supports -mbranch-protection=gcs],
+ [-Werror -mbranch-protection=gcs],
+ libc_cv_cc_gcs,
+ [libc_cv_cc_gcs=yes],
+ [libc_cv_cc_gcs=no],
+ libc_cv_test_cc_gcs,
+ [libc_cv_test_cc_gcs=yes],
+ [libc_cv_test_cc_gcs=no])
+LIBC_CONFIG_VAR([have-cc-gcs], [$libc_cv_cc_gcs])
+LIBC_CONFIG_VAR([have-test-cc-gcs], [$libc_cv_test_cc_gcs])
+
+# Check if linker supports GCS marking
+LIBC_LINKER_FEATURE([-z gcs=always], [-Wl,-z,gcs=always],
+ [libc_cv_ld_gcs=yes], [libc_cv_ld_gcs=no])
+LIBC_CONFIG_VAR([have-ld-gcs], [$libc_cv_ld_gcs])
+
AC_MSG_CHECKING(if we can build programs as PIE)
AC_COMPILE_IFELSE([AC_LANG_SOURCE([[#ifdef PIE_UNSUPPORTED
# error PIE is not supported
dl-deps \
dl-exception \
dl-execstack \
+ dl-execstack-tunable \
dl-fini \
dl-init \
dl-load \
tst-execstack \
tst-execstack-needed \
tst-execstack-prog \
+ tst-execstack-tunable \
# tests-execstack-yes
tests-execstack-static-yes = \
- tst-execstack-prog-static
+ tst-execstack-prog-static \
+ tst-execstack-prog-static-tunable \
# tests-execstack-static-yes
ifeq (yes,$(run-built-tests))
tests-execstack-special-yes = \
tst-pie1 \
tst-pie2 \
# tests-pie
+ifeq (yes,$(aligned-65536))
+tests += tst-pie-bss
+tests-pie += tst-pie-bss
+endif
ifneq (,$(load-address-ldflag))
tests += \
tst-pie-address \
tests-static += \
tst-pie-address-static \
# tests-static
+ifeq (yes,$(aligned-65536))
+tests += tst-pie-bss-static
+tests-static += tst-pie-bss-static
+endif
LDFLAGS-tst-pie-address-static += \
$(load-address-ldflag)=$(pde-load-address)
endif
CPPFLAGS-tst-execstack.c += -DUSE_PTHREADS=0
LDFLAGS-tst-execstack = -Wl,-z,noexecstack
LDFLAGS-tst-execstack-mod.so = -Wl,-z,execstack
+ifeq ($(have-no-error-execstack),yes)
+LDFLAGS-tst-execstack-mod.so += -Wl,--no-error-execstack
+endif
$(objpfx)tst-execstack-needed: $(objpfx)tst-execstack-mod.so
LDFLAGS-tst-execstack-needed = -Wl,-z,noexecstack
CFLAGS-tst-execstack-prog.c += -Wno-trampolines
CFLAGS-tst-execstack-mod.c += -Wno-trampolines
+# It expects loading a module with executable stack to work.
+CFLAGS-tst-execstack-tunable.c += -DUSE_PTHREADS=0 -DDEFAULT_RWX_STACK=1
+$(objpfx)tst-execstack-tunable.out: $(objpfx)tst-execstack-mod.so
+tst-execstack-tunable-ENV = GLIBC_TUNABLES=glibc.rtld.execstack=2
+
+LDFLAGS-tst-execstack-prog-static-tunable = -Wl,-z,noexecstack
+tst-execstack-prog-static-tunable-ENV = GLIBC_TUNABLES=glibc.rtld.execstack=2
+
LDFLAGS-tst-execstack-prog-static = -Wl,-z,execstack
+ifeq ($(have-no-error-execstack),yes)
+LDFLAGS-tst-execstack-prog-static += -Wl,--no-error-execstack
+endif
CFLAGS-tst-execstack-prog-static.c += -Wno-trampolines
ifeq (yes,$(build-hardcoded-path-in-tests))
CFLAGS-tst-pie1.c += $(pie-ccflag)
CFLAGS-tst-pie2.c += $(pie-ccflag)
+CFLAGS-tst-pie-bss.c += $(pie-ccflag)
CFLAGS-tst-pie-address.c += $(pie-ccflag)
$(objpfx)tst-piemod1.so: $(libsupport)
--- /dev/null
+/* Stack executability handling for GNU dynamic linker.
+ Copyright (C) 2025 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <ldsodefs.h>
+#include <dl-tunables.h>
+
+void
+_dl_handle_execstack_tunable (void)
+{
+ switch (TUNABLE_GET (glibc, rtld, execstack, int32_t, NULL))
+ {
+ case stack_tunable_mode_disable:
+ if ((__glibc_unlikely (GL(dl_stack_flags)) & PF_X))
+ _dl_fatal_printf (
+"Fatal glibc error: executable stack is not allowed\n");
+ break;
+
+ case stack_tunable_mode_force:
+ if (_dl_make_stack_executable (__libc_stack_end) != 0)
+ _dl_fatal_printf (
+"Fatal glibc error: cannot enable executable stack as tunable requires");
+ break;
+ }
+}
so as to mprotect it. */
int
-_dl_make_stack_executable (void **stack_endp)
+_dl_make_stack_executable (const void *stack_endp)
{
return ENOSYS;
}
_dl_map_object_from_fd (const char *name, const char *origname, int fd,
struct filebuf *fbp, char *realname,
struct link_map *loader, int l_type, int mode,
- void **stack_endp, Lmid_t nsid)
+ const void *stack_endp, Lmid_t nsid)
{
struct link_map *l = NULL;
const ElfW(Ehdr) *header;
void *stack_end = __libc_stack_end;
return _dl_map_object_from_fd (name, origname, fd, &fb, realname, loader,
- type, mode, &stack_end, nsid);
+ type, mode, stack_end, nsid);
}
struct add_path_state
switch (ph->p_type)
{
case PT_LOAD:
- if (ph->p_offset == 0)
+ /* Skip the empty PT_LOAD segment at offset 0. */
+ if (ph->p_filesz != 0 && ph->p_offset == 0)
file_p_vaddr = ph->p_vaddr;
break;
case PT_DYNAMIC:
break;
}
- if ((__glibc_unlikely (GL(dl_stack_flags)) & PF_X)
- && TUNABLE_GET (glibc, rtld, execstack, int32_t, NULL) == 0)
- _dl_fatal_printf ("Fatal glibc error: executable stack is not allowed\n");
+ _dl_handle_execstack_tunable ();
call_function_static_weak (_dl_find_object_init);
execstack {
type: INT_32
minval: 0
- maxval: 1
+ maxval: 2
default: 1
}
}
bool has_interp = rtld_setup_main_map (main_map);
- if ((__glibc_unlikely (GL(dl_stack_flags)) & PF_X)
- && TUNABLE_GET (glibc, rtld, execstack, int32_t, NULL) == 0)
- _dl_fatal_printf ("Fatal glibc error: executable stack is not allowed\n");
+ /* Handle this after PT_GNU_STACK parse, because it updates dl_stack_flags
+ if required. */
+ _dl_handle_execstack_tunable ();
/* If the current libname is different from the SONAME, add the
latter as well. */
--- /dev/null
+#include <tst-execstack-prog-static.c>
--- /dev/null
+#include <tst-execstack.c>
--- /dev/null
+/* Test static PIE with an empty PT_LOAD segment at offset 0.
+ Copyright (C) 2025 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "tst-pie-bss.c"
--- /dev/null
+/* Test PIE with an empty PT_LOAD segment at offset 0.
+ Copyright (C) 2025 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <stdio.h>
+
+char bss[0xb5dce8] __attribute__ ((aligned (65536)));
+
+static int
+do_test (void)
+{
+ printf ("Hello\n");
+ return 0;
+}
+
+#include <support/test-driver.c>
glibc.malloc.trim_threshold: 0x0 (min: 0x0, max: 0x[f]+)
glibc.rtld.dynamic_sort: 2 (min: 1, max: 2)
glibc.rtld.enable_secure: 0 (min: 0, max: 1)
-glibc.rtld.execstack: 1 (min: 0, max: 1)
+glibc.rtld.execstack: 1 (min: 0, max: 2)
glibc.rtld.nns: 0x4 (min: 0x1, max: 0x10)
glibc.rtld.optional_static_tls: 0x200 (min: 0x0, max: 0x[f]+)
log10p1 -0x1p-1021
log10p1 -0x1p-16381
+log10p1 0x1.27f7dap-17
+
log10p1 0x7.2a4368p-4
log10p1 0x6.d3a118p-4
log10p1 0x5.03f228p+0
sinh -0x5.ee9218p-4
sinh -0x1.bcfc98p+0
sinh -0x6.9bbb6df7c5d08p-4
+sinh 0x1.250bfep-11
# the next value generates larger error bounds on x86_64 (ldbl-96)
sinh 0x2.c5d376167f4052f4p+12
sinh max
tan -0x1.0d55b8p+0
tan 1.57079697
tan -1.57079697
+tan 0x1.ada6aap+27
tan 0x1p-5
tan 0x1p-10
tan 0x1p-15
= log10p1 tonearest binary128 -0x8p-16384 : -0x3.796f62a4dca1c654d56eaabeb4dp-16384 : inexact-ok underflow errno-erange-ok
= log10p1 towardzero binary128 -0x8p-16384 : -0x3.796f62a4dca1c654d56eaabeb4ccp-16384 : inexact-ok underflow errno-erange-ok
= log10p1 upward binary128 -0x8p-16384 : -0x3.796f62a4dca1c654d56eaabeb4ccp-16384 : inexact-ok underflow errno-erange-ok
+log10p1 0x1.27f7dap-17
+= log10p1 downward binary32 0x9.3fbedp-20 : 0x4.044b5p-20 : inexact-ok
+= log10p1 tonearest binary32 0x9.3fbedp-20 : 0x4.044b5p-20 : inexact-ok
+= log10p1 towardzero binary32 0x9.3fbedp-20 : 0x4.044b5p-20 : inexact-ok
+= log10p1 upward binary32 0x9.3fbedp-20 : 0x4.044b58p-20 : inexact-ok
+= log10p1 downward binary64 0x9.3fbedp-20 : 0x4.044b5157872ep-20 : inexact-ok
+= log10p1 tonearest binary64 0x9.3fbedp-20 : 0x4.044b5157872e4p-20 : inexact-ok
+= log10p1 towardzero binary64 0x9.3fbedp-20 : 0x4.044b5157872ep-20 : inexact-ok
+= log10p1 upward binary64 0x9.3fbedp-20 : 0x4.044b5157872e4p-20 : inexact-ok
+= log10p1 downward intel96 0x9.3fbedp-20 : 0x4.044b5157872e2868p-20 : inexact-ok
+= log10p1 tonearest intel96 0x9.3fbedp-20 : 0x4.044b5157872e2868p-20 : inexact-ok
+= log10p1 towardzero intel96 0x9.3fbedp-20 : 0x4.044b5157872e2868p-20 : inexact-ok
+= log10p1 upward intel96 0x9.3fbedp-20 : 0x4.044b5157872e287p-20 : inexact-ok
+= log10p1 downward m68k96 0x9.3fbedp-20 : 0x4.044b5157872e2868p-20 : inexact-ok
+= log10p1 tonearest m68k96 0x9.3fbedp-20 : 0x4.044b5157872e2868p-20 : inexact-ok
+= log10p1 towardzero m68k96 0x9.3fbedp-20 : 0x4.044b5157872e2868p-20 : inexact-ok
+= log10p1 upward m68k96 0x9.3fbedp-20 : 0x4.044b5157872e287p-20 : inexact-ok
+= log10p1 downward binary128 0x9.3fbedp-20 : 0x4.044b5157872e2868f5c04287d808p-20 : inexact-ok
+= log10p1 tonearest binary128 0x9.3fbedp-20 : 0x4.044b5157872e2868f5c04287d80cp-20 : inexact-ok
+= log10p1 towardzero binary128 0x9.3fbedp-20 : 0x4.044b5157872e2868f5c04287d808p-20 : inexact-ok
+= log10p1 upward binary128 0x9.3fbedp-20 : 0x4.044b5157872e2868f5c04287d80cp-20 : inexact-ok
+= log10p1 downward ibm128 0x9.3fbedp-20 : 0x4.044b5157872e2868f5c04287d8p-20 : inexact-ok
+= log10p1 tonearest ibm128 0x9.3fbedp-20 : 0x4.044b5157872e2868f5c04287d8p-20 : inexact-ok
+= log10p1 towardzero ibm128 0x9.3fbedp-20 : 0x4.044b5157872e2868f5c04287d8p-20 : inexact-ok
+= log10p1 upward ibm128 0x9.3fbedp-20 : 0x4.044b5157872e2868f5c04287dap-20 : inexact-ok
log10p1 0x7.2a4368p-4
= log10p1 downward binary32 0x7.2a4368p-4 : 0x2.9248dcp-4 : inexact-ok
= log10p1 tonearest binary32 0x7.2a4368p-4 : 0x2.9248ep-4 : inexact-ok
= sinh tonearest ibm128 -0x6.9bbb6df7c5d08p-4 : -0x6.cc3ddf003dcda77f8f9e892e36p-4 : inexact-ok
= sinh towardzero ibm128 -0x6.9bbb6df7c5d08p-4 : -0x6.cc3ddf003dcda77f8f9e892e36p-4 : inexact-ok
= sinh upward ibm128 -0x6.9bbb6df7c5d08p-4 : -0x6.cc3ddf003dcda77f8f9e892e36p-4 : inexact-ok
+sinh 0x1.250bfep-11
+= sinh downward binary32 0x2.4a17fcp-12 : 0x2.4a17fcp-12 : inexact-ok
+= sinh tonearest binary32 0x2.4a17fcp-12 : 0x2.4a17fcp-12 : inexact-ok
+= sinh towardzero binary32 0x2.4a17fcp-12 : 0x2.4a17fcp-12 : inexact-ok
+= sinh upward binary32 0x2.4a17fcp-12 : 0x2.4a18p-12 : inexact-ok
+= sinh downward binary64 0x2.4a17fcp-12 : 0x2.4a17fdffffffep-12 : inexact-ok
+= sinh tonearest binary64 0x2.4a17fcp-12 : 0x2.4a17fep-12 : inexact-ok
+= sinh towardzero binary64 0x2.4a17fcp-12 : 0x2.4a17fdffffffep-12 : inexact-ok
+= sinh upward binary64 0x2.4a17fcp-12 : 0x2.4a17fep-12 : inexact-ok
+= sinh downward intel96 0x2.4a17fcp-12 : 0x2.4a17fdfffffff87cp-12 : inexact-ok
+= sinh tonearest intel96 0x2.4a17fcp-12 : 0x2.4a17fdfffffff88p-12 : inexact-ok
+= sinh towardzero intel96 0x2.4a17fcp-12 : 0x2.4a17fdfffffff87cp-12 : inexact-ok
+= sinh upward intel96 0x2.4a17fcp-12 : 0x2.4a17fdfffffff88p-12 : inexact-ok
+= sinh downward m68k96 0x2.4a17fcp-12 : 0x2.4a17fdfffffff87cp-12 : inexact-ok
+= sinh tonearest m68k96 0x2.4a17fcp-12 : 0x2.4a17fdfffffff88p-12 : inexact-ok
+= sinh towardzero m68k96 0x2.4a17fcp-12 : 0x2.4a17fdfffffff87cp-12 : inexact-ok
+= sinh upward m68k96 0x2.4a17fcp-12 : 0x2.4a17fdfffffff88p-12 : inexact-ok
+= sinh downward binary128 0x2.4a17fcp-12 : 0x2.4a17fdfffffff87e8d322786ec88p-12 : inexact-ok
+= sinh tonearest binary128 0x2.4a17fcp-12 : 0x2.4a17fdfffffff87e8d322786ec8ap-12 : inexact-ok
+= sinh towardzero binary128 0x2.4a17fcp-12 : 0x2.4a17fdfffffff87e8d322786ec88p-12 : inexact-ok
+= sinh upward binary128 0x2.4a17fcp-12 : 0x2.4a17fdfffffff87e8d322786ec8ap-12 : inexact-ok
+= sinh downward ibm128 0x2.4a17fcp-12 : 0x2.4a17fdfffffff87e8d322786ecp-12 : inexact-ok
+= sinh tonearest ibm128 0x2.4a17fcp-12 : 0x2.4a17fdfffffff87e8d322786edp-12 : inexact-ok
+= sinh towardzero ibm128 0x2.4a17fcp-12 : 0x2.4a17fdfffffff87e8d322786ecp-12 : inexact-ok
+= sinh upward ibm128 0x2.4a17fcp-12 : 0x2.4a17fdfffffff87e8d322786edp-12 : inexact-ok
sinh 0x2.c5d376167f4052f4p+12
= sinh downward binary32 0x2.c5d378p+12 : 0xf.fffffp+124 : inexact-ok overflow errno-erange-ok
= sinh tonearest binary32 0x2.c5d378p+12 : plus_infty : inexact-ok overflow errno-erange
= tan tonearest ibm128 -0x1.921fc00ece4f02f278ade6ad9fp+0 : 0x1.7b91a0851bbbafa14cf21c2b5c8p+20 : inexact-ok
= tan towardzero ibm128 -0x1.921fc00ece4f02f278ade6ad9fp+0 : 0x1.7b91a0851bbbafa14cf21c2b5cp+20 : inexact-ok
= tan upward ibm128 -0x1.921fc00ece4f02f278ade6ad9fp+0 : 0x1.7b91a0851bbbafa14cf21c2b5c8p+20 : inexact-ok
+tan 0x1.ada6aap+27
+= tan downward binary32 0xd.6d355p+24 : 0x3.d00608p-4 : inexact-ok
+= tan tonearest binary32 0xd.6d355p+24 : 0x3.d00608p-4 : inexact-ok
+= tan towardzero binary32 0xd.6d355p+24 : 0x3.d00608p-4 : inexact-ok
+= tan upward binary32 0xd.6d355p+24 : 0x3.d0060cp-4 : inexact-ok
+= tan downward binary64 0xd.6d355p+24 : 0x3.d00608p-4 : inexact-ok
+= tan tonearest binary64 0xd.6d355p+24 : 0x3.d00608p-4 : inexact-ok
+= tan towardzero binary64 0xd.6d355p+24 : 0x3.d00608p-4 : inexact-ok
+= tan upward binary64 0xd.6d355p+24 : 0x3.d006080000002p-4 : inexact-ok
+= tan downward intel96 0xd.6d355p+24 : 0x3.d006080000000504p-4 : inexact-ok
+= tan tonearest intel96 0xd.6d355p+24 : 0x3.d006080000000508p-4 : inexact-ok
+= tan towardzero intel96 0xd.6d355p+24 : 0x3.d006080000000504p-4 : inexact-ok
+= tan upward intel96 0xd.6d355p+24 : 0x3.d006080000000508p-4 : inexact-ok
+= tan downward m68k96 0xd.6d355p+24 : 0x3.d006080000000504p-4 : inexact-ok
+= tan tonearest m68k96 0xd.6d355p+24 : 0x3.d006080000000508p-4 : inexact-ok
+= tan towardzero m68k96 0xd.6d355p+24 : 0x3.d006080000000504p-4 : inexact-ok
+= tan upward m68k96 0xd.6d355p+24 : 0x3.d006080000000508p-4 : inexact-ok
+= tan downward binary128 0xd.6d355p+24 : 0x3.d0060800000005067d16c1c9c15ap-4 : inexact-ok
+= tan tonearest binary128 0xd.6d355p+24 : 0x3.d0060800000005067d16c1c9c15ap-4 : inexact-ok
+= tan towardzero binary128 0xd.6d355p+24 : 0x3.d0060800000005067d16c1c9c15ap-4 : inexact-ok
+= tan upward binary128 0xd.6d355p+24 : 0x3.d0060800000005067d16c1c9c15cp-4 : inexact-ok
+= tan downward ibm128 0xd.6d355p+24 : 0x3.d0060800000005067d16c1c9c1p-4 : inexact-ok
+= tan tonearest ibm128 0xd.6d355p+24 : 0x3.d0060800000005067d16c1c9c1p-4 : inexact-ok
+= tan towardzero ibm128 0xd.6d355p+24 : 0x3.d0060800000005067d16c1c9c1p-4 : inexact-ok
+= tan upward ibm128 0xd.6d355p+24 : 0x3.d0060800000005067d16c1c9c2p-4 : inexact-ok
tan 0x1p-5
= tan downward binary32 0x8p-8 : 0x8.00aabp-8 : inexact-ok
= tan tonearest binary32 0x8p-8 : 0x8.00aacp-8 : inexact-ok
#define __MATHCALLX(function,suffix, args, attrib) \
__MATHDECLX (_Mdouble_,function,suffix, args, attrib)
#define __MATHDECLX(type, function,suffix, args, attrib) \
- __MATHDECL_1(type, function,suffix, args) __attribute__ (attrib);
+ __MATHDECL_1(type, function,suffix, args) __attribute__ (attrib)
#define __MATHDECL_1_IMPL(type, function, suffix, args) \
extern type __MATH_PRECNAME(function,suffix) args __THROW
#define __MATHDECL_1(type, function, suffix, args) \
LDFLAGS-tst-execstack-threads = -Wl,-z,noexecstack
LDFLAGS-tst-execstack-threads-mod.so = -Wl,-z,execstack
CFLAGS-tst-execstack-threads-mod.c += -Wno-trampolines
+ifeq ($(have-no-error-execstack),yes)
+LDFLAGS-tst-execstack-threads-mod.so += -Wl,--no-error-execstack
+endif
tst-stackguard1-ARGS = --command "$(host-test-program-cmd) --child"
tst-stackguard1-static-ARGS = --command "$(objpfx)tst-stackguard1-static --child"
|| si->si_code != SI_TKILL)
return;
- /* Check if asynchronous cancellation mode is set or if interrupted
- instruction pointer falls within the cancellable syscall bridge. For
- interruptable syscalls with external side-effects (i.e. partial reads),
- the kernel will set the IP to after __syscall_cancel_arch_end, thus
- disabling the cancellation and allowing the process to handle such
+ /* Check if asynchronous cancellation mode is set and cancellation is not
+ already in progress, or if interrupted instruction pointer falls within
+ the cancellable syscall bridge.
+ For interruptable syscalls with external side-effects (i.e. partial
+ reads), the kernel will set the IP to after __syscall_cancel_arch_end,
+ thus disabling the cancellation and allowing the process to handle such
conditions. */
struct pthread *self = THREAD_SELF;
int oldval = atomic_load_relaxed (&self->cancelhandling);
- if (cancel_async_enabled (oldval) || cancellation_pc_check (ctx))
+ if (cancel_enabled_and_canceled_and_async (oldval)
+ || cancellation_pc_check (ctx))
__syscall_do_cancel ();
}
> (size_t) iattr->stackaddr - last_to)
iattr->stacksize = (size_t) iattr->stackaddr - last_to;
#else
- /* The limit might be too high. */
+ /* The limit might be too low. */
if ((size_t) iattr->stacksize
- > to - (size_t) iattr->stackaddr)
+ < to - (size_t) iattr->stackaddr)
iattr->stacksize = to - (size_t) iattr->stackaddr;
#endif
/* We succeed and no need to look further. */
#include <unistd.h>
#include <stddef.h>
+#include <stdlib/setenv.h>
/* This must be initialized; we cannot have a weak alias into bss. */
char **__environ = NULL;
/* The SVR4 ABI says `_environ' will be the name to use
in case the user overrides the weak alias `environ'. */
weak_alias (__environ, _environ)
+
+struct environ_array *__environ_array_list;
+environ_counter __environ_counter;
tst-environ-change-3 \
tst-environ-change-4 \
tst-getenv-signal \
+ tst-getenv-static \
tst-getenv-thread \
tst-getenv-unsetenv \
tst-getrandom \
# tests-internal
tests-static := \
+ tst-getenv-static \
tst-secure-getenv \
# tests-static
#include <string.h>
#include <unistd.h>
-struct environ_array *__environ_array_list;
-environ_counter __environ_counter;
-
char *
getenv (const char *name)
{
--- /dev/null
+/* Static interposition of getenv (bug 32541).
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <stdlib.h>
+#include <support/check.h>
+
+/* Some programs try to interpose getenv for their own use (not
+ glibc's internal use). Make sure that this is possible without
+ introducing linker failures due to duplicate symbols. */
+
+char *
+getenv (const char *ignored)
+{
+ return NULL;
+}
+
+static int
+do_test (void)
+{
+ TEST_COMPARE_STRING (getenv ("PATH"), NULL);
+ return 0;
+}
+
+#include <support/test-driver.c>
<https://www.gnu.org/licenses/>. */
#include "sv_math.h"
-#include "poly_sve_f64.h"
#define SignMask (0x8000000000000000)
#define One (0x3ff0000000000000)
#define Thres (0x5fe0000000000000) /* asuint64 (0x1p511). */
+#define IndexMask (((1 << V_LOG_TABLE_BITS) - 1) << 1)
static const struct data
{
- double poly[18];
- double ln2, p3, p1, p4, p0, p2;
- uint64_t n;
- uint64_t off;
+ double even_coeffs[9];
+ double ln2, p3, p1, p4, p0, p2, c1, c3, c5, c7, c9, c11, c13, c15, c17;
+ uint64_t off, mask;
} data = {
- /* Polynomial generated using Remez on [2^-26, 1]. */
- .poly
- = { -0x1.55555555554a7p-3, 0x1.3333333326c7p-4, -0x1.6db6db68332e6p-5,
- 0x1.f1c71b26fb40dp-6, -0x1.6e8b8b654a621p-6, 0x1.1c4daa9e67871p-6,
- -0x1.c9871d10885afp-7, 0x1.7a16e8d9d2ecfp-7, -0x1.3ddca533e9f54p-7,
- 0x1.0becef748dafcp-7, -0x1.b90c7099dd397p-8, 0x1.541f2bb1ffe51p-8,
- -0x1.d217026a669ecp-9, 0x1.0b5c7977aaf7p-9, -0x1.e0f37daef9127p-11,
- 0x1.388b5fe542a6p-12, -0x1.021a48685e287p-14, 0x1.93d4ba83d34dap-18 },
+ /* Polynomial generated using Remez on [2^-26, 1]. */
+ .even_coeffs ={
+ -0x1.55555555554a7p-3,
+ -0x1.6db6db68332e6p-5,
+ -0x1.6e8b8b654a621p-6,
+ -0x1.c9871d10885afp-7,
+ -0x1.3ddca533e9f54p-7,
+ -0x1.b90c7099dd397p-8,
+ -0x1.d217026a669ecp-9,
+ -0x1.e0f37daef9127p-11,
+ -0x1.021a48685e287p-14, },
+
+ .c1 = 0x1.3333333326c7p-4,
+ .c3 = 0x1.f1c71b26fb40dp-6,
+ .c5 = 0x1.1c4daa9e67871p-6,
+ .c7 = 0x1.7a16e8d9d2ecfp-7,
+ .c9 = 0x1.0becef748dafcp-7,
+ .c11 = 0x1.541f2bb1ffe51p-8,
+ .c13 = 0x1.0b5c7977aaf7p-9,
+ .c15 = 0x1.388b5fe542a6p-12,
+ .c17 = 0x1.93d4ba83d34dap-18,
+
.ln2 = 0x1.62e42fefa39efp-1,
.p0 = -0x1.ffffffffffff7p-2,
.p1 = 0x1.55555555170d4p-2,
.p2 = -0x1.0000000399c27p-2,
.p3 = 0x1.999b2e90e94cap-3,
.p4 = -0x1.554e550bd501ep-3,
- .n = 1 << V_LOG_TABLE_BITS,
- .off = 0x3fe6900900000000
+ .off = 0x3fe6900900000000,
+ .mask = 0xfffULL << 52,
};
static svfloat64_t NOINLINE
of the algorithm used. */
svuint64_t ix = svreinterpret_u64 (x);
- svuint64_t tmp = svsub_x (pg, ix, d->off);
- svuint64_t i = svand_x (pg, svlsr_x (pg, tmp, (51 - V_LOG_TABLE_BITS)),
- (d->n - 1) << 1);
- svint64_t k = svasr_x (pg, svreinterpret_s64 (tmp), 52);
- svuint64_t iz = svsub_x (pg, ix, svand_x (pg, tmp, 0xfffULL << 52));
+ svuint64_t i_off = svsub_x (pg, ix, d->off);
+ svuint64_t i
+ = svand_x (pg, svlsr_x (pg, i_off, (51 - V_LOG_TABLE_BITS)), IndexMask);
+ svuint64_t iz = svsub_x (pg, ix, svand_x (pg, i_off, d->mask));
svfloat64_t z = svreinterpret_f64 (iz);
svfloat64_t invc = svld1_gather_index (pg, &__v_log_data.table[0].invc, i);
svfloat64_t p1_p4 = svld1rq (svptrue_b64 (), &d->p1);
svfloat64_t r = svmla_x (pg, sv_f64 (-1.0), invc, z);
- svfloat64_t kd = svcvt_f64_x (pg, k);
+ svfloat64_t kd
+ = svcvt_f64_x (pg, svasr_x (pg, svreinterpret_s64 (i_off), 52));
svfloat64_t hi = svmla_lane (svadd_x (pg, logc, r), kd, ln2_p3, 0);
- svfloat64_t r2 = svmul_x (pg, r, r);
-
+ svfloat64_t r2 = svmul_x (svptrue_b64 (), r, r);
svfloat64_t y = svmla_lane (sv_f64 (d->p2), r, ln2_p3, 1);
-
svfloat64_t p = svmla_lane (sv_f64 (d->p0), r, p1_p4, 0);
+
y = svmla_lane (y, r2, p1_p4, 1);
y = svmla_x (pg, p, r2, y);
y = svmla_x (pg, hi, r2, y);
svuint64_t iax = svbic_x (pg, ix, SignMask);
svuint64_t sign = svand_x (pg, ix, SignMask);
svfloat64_t ax = svreinterpret_f64 (iax);
-
svbool_t ge1 = svcmpge (pg, iax, One);
svbool_t special = svcmpge (pg, iax, Thres);
svfloat64_t option_1 = sv_f64 (0);
if (__glibc_likely (svptest_any (pg, ge1)))
{
- svfloat64_t x2 = svmul_x (pg, ax, ax);
+ svfloat64_t x2 = svmul_x (svptrue_b64 (), ax, ax);
option_1 = __sv_log_inline (
svadd_x (pg, ax, svsqrt_x (pg, svadd_x (pg, x2, 1))), d, pg);
}
The largest observed error in this region is 1.51 ULPs:
_ZGVsMxv_asinh(0x1.fe12bf8c616a2p-1) got 0x1.c1e649ee2681bp-1
want 0x1.c1e649ee2681dp-1. */
+
svfloat64_t option_2 = sv_f64 (0);
if (__glibc_likely (svptest_any (pg, svnot_z (pg, ge1))))
{
- svfloat64_t x2 = svmul_x (pg, ax, ax);
- svfloat64_t x4 = svmul_x (pg, x2, x2);
- svfloat64_t p = sv_pw_horner_17_f64_x (pg, x2, x4, d->poly);
- option_2 = svmla_x (pg, ax, p, svmul_x (pg, x2, ax));
+ svfloat64_t x2 = svmul_x (svptrue_b64 (), ax, ax);
+ svfloat64_t x4 = svmul_x (svptrue_b64 (), x2, x2);
+ /* Order-17 Pairwise Horner scheme. */
+ svfloat64_t c13 = svld1rq (svptrue_b64 (), &d->c1);
+ svfloat64_t c57 = svld1rq (svptrue_b64 (), &d->c5);
+ svfloat64_t c911 = svld1rq (svptrue_b64 (), &d->c9);
+ svfloat64_t c1315 = svld1rq (svptrue_b64 (), &d->c13);
+
+ svfloat64_t p01 = svmla_lane (sv_f64 (d->even_coeffs[0]), x2, c13, 0);
+ svfloat64_t p23 = svmla_lane (sv_f64 (d->even_coeffs[1]), x2, c13, 1);
+ svfloat64_t p45 = svmla_lane (sv_f64 (d->even_coeffs[2]), x2, c57, 0);
+ svfloat64_t p67 = svmla_lane (sv_f64 (d->even_coeffs[3]), x2, c57, 1);
+ svfloat64_t p89 = svmla_lane (sv_f64 (d->even_coeffs[4]), x2, c911, 0);
+ svfloat64_t p1011 = svmla_lane (sv_f64 (d->even_coeffs[5]), x2, c911, 1);
+ svfloat64_t p1213
+ = svmla_lane (sv_f64 (d->even_coeffs[6]), x2, c1315, 0);
+ svfloat64_t p1415
+ = svmla_lane (sv_f64 (d->even_coeffs[7]), x2, c1315, 1);
+ svfloat64_t p1617 = svmla_x (pg, sv_f64 (d->even_coeffs[8]), x2, d->c17);
+
+ svfloat64_t p = svmla_x (pg, p1415, x4, p1617);
+ p = svmla_x (pg, p1213, x4, p);
+ p = svmla_x (pg, p1011, x4, p);
+ p = svmla_x (pg, p89, x4, p);
+
+ p = svmla_x (pg, p67, x4, p);
+ p = svmla_x (pg, p45, x4, p);
+
+ p = svmla_x (pg, p23, x4, p);
+
+ p = svmla_x (pg, p01, x4, p);
+
+ option_2 = svmla_x (pg, ax, p, svmul_x (svptrue_b64 (), x2, ax));
}
- /* Choose the right option for each lane. */
- svfloat64_t y = svsel (ge1, option_1, option_2);
-
if (__glibc_unlikely (svptest_any (pg, special)))
return special_case (
- x, svreinterpret_f64 (sveor_x (pg, svreinterpret_u64 (y), sign)),
+ x,
+ svreinterpret_f64 (sveor_x (
+ pg, svreinterpret_u64 (svsel (ge1, option_1, option_2)), sign)),
special);
+
+ /* Choose the right option for each lane. */
+ svfloat64_t y = svsel (ge1, option_1, option_2);
return svreinterpret_f64 (sveor_x (pg, svreinterpret_u64 (y), sign));
}
{
float64_t poly[3];
float64_t inv_ln2, ln2_hi, ln2_lo, shift, thres;
- uint64_t index_mask, special_bound;
+ uint64_t special_bound;
} data = {
.poly = { 0x1.fffffffffffd4p-2, 0x1.5555571d6b68cp-3,
0x1.5555576a59599p-5, },
.shift = 0x1.8p+52,
.thres = 704.0,
- .index_mask = 0xff,
/* 0x1.6p9, above which exp overflows. */
.special_bound = 0x4086000000000000,
};
static svfloat64_t NOINLINE
-special_case (svfloat64_t x, svfloat64_t y, svbool_t special)
+special_case (svfloat64_t x, svbool_t pg, svfloat64_t t, svbool_t special)
{
+ svfloat64_t half_t = svmul_x (svptrue_b64 (), t, 0.5);
+ svfloat64_t half_over_t = svdivr_x (pg, t, 0.5);
+ svfloat64_t y = svadd_x (pg, half_t, half_over_t);
return sv_call_f64 (cosh, x, y, special);
}
svuint64_t u = svreinterpret_u64 (z);
svuint64_t e = svlsl_x (pg, u, 52 - V_EXP_TAIL_TABLE_BITS);
- svuint64_t i = svand_x (pg, u, d->index_mask);
+ svuint64_t i = svand_x (svptrue_b64 (), u, 0xff);
svfloat64_t y = svmla_x (pg, sv_f64 (d->poly[1]), r, d->poly[2]);
y = svmla_x (pg, sv_f64 (d->poly[0]), r, y);
y = svmla_x (pg, sv_f64 (1.0), r, y);
- y = svmul_x (pg, r, y);
+ y = svmul_x (svptrue_b64 (), r, y);
/* s = 2^(n/N). */
u = svld1_gather_index (pg, __v_exp_tail_data, i);
/* Up to the point that exp overflows, we can use it to calculate cosh by
exp(|x|) / 2 + 1 / (2 * exp(|x|)). */
svfloat64_t t = exp_inline (ax, pg, d);
- svfloat64_t half_t = svmul_x (pg, t, 0.5);
- svfloat64_t half_over_t = svdivr_x (pg, t, 0.5);
/* Fall back to scalar for any special cases. */
if (__glibc_unlikely (svptest_any (pg, special)))
- return special_case (x, svadd_x (pg, half_t, half_over_t), special);
+ return special_case (x, pg, t, special);
+ svfloat64_t half_t = svmul_x (svptrue_b64 (), t, 0.5);
+ svfloat64_t half_over_t = svdivr_x (pg, t, 0.5);
return svadd_x (pg, half_t, half_over_t);
}
svuint32_t i = svqadd (svreinterpret_u32 (z), dat->off_idx);
/* Lookup erfc(r) and 2/sqrt(pi)*exp(-r^2) in tables. */
- i = svmul_x (pg, i, 2);
+ i = svlsl_x (svptrue_b32 (), i, 1);
const float32_t *p = &__v_erfcf_data.tab[0].erfc - 2 * dat->off_arr;
svfloat32_t erfcr = svld1_gather_index (pg, p, i);
svfloat32_t scale = svld1_gather_index (pg, p + 1, i);
/* erfc(x) ~ erfc(r) - scale * d * poly(r, d). */
svfloat32_t r = svsub_x (pg, z, shift);
svfloat32_t d = svsub_x (pg, a, r);
- svfloat32_t d2 = svmul_x (pg, d, d);
- svfloat32_t r2 = svmul_x (pg, r, r);
+ svfloat32_t d2 = svmul_x (svptrue_b32 (), d, d);
+ svfloat32_t r2 = svmul_x (svptrue_b32 (), r, r);
svfloat32_t coeffs = svld1rq (svptrue_b32 (), &dat->third);
- svfloat32_t third = svdup_lane (coeffs, 0);
svfloat32_t p1 = r;
- svfloat32_t p2 = svmls_lane (third, r2, coeffs, 1);
- svfloat32_t p3 = svmul_x (pg, r, svmla_lane (sv_f32 (-0.5), r2, coeffs, 0));
+ svfloat32_t p2 = svmls_lane (sv_f32 (dat->third), r2, coeffs, 1);
+ svfloat32_t p3
+ = svmul_x (svptrue_b32 (), r, svmla_lane (sv_f32 (-0.5), r2, coeffs, 0));
svfloat32_t p4 = svmla_lane (sv_f32 (dat->two_over_five), r2, coeffs, 2);
p4 = svmls_x (pg, sv_f32 (dat->tenth), r2, p4);
<https://www.gnu.org/licenses/>. */
#include "sv_math.h"
-#include "poly_sve_f64.h"
#define SpecialBound 307.0 /* floor (log10 (2^1023)). */
static const struct data
{
- double poly[5];
+ double c1, c3, c2, c4, c0;
double shift, log10_2, log2_10_hi, log2_10_lo, scale_thres, special_bound;
} data = {
/* Coefficients generated using Remez algorithm.
rel error: 0x1.9fcb9b3p-60
abs error: 0x1.a20d9598p-60 in [ -log10(2)/128, log10(2)/128 ]
max ulp err 0.52 +0.5. */
- .poly = { 0x1.26bb1bbb55516p1, 0x1.53524c73cd32ap1, 0x1.0470591daeafbp1,
- 0x1.2bd77b1361ef6p0, 0x1.142b5d54e9621p-1 },
+ .c0 = 0x1.26bb1bbb55516p1,
+ .c1 = 0x1.53524c73cd32ap1,
+ .c2 = 0x1.0470591daeafbp1,
+ .c3 = 0x1.2bd77b1361ef6p0,
+ .c4 = 0x1.142b5d54e9621p-1,
/* 1.5*2^46+1023. This value is further explained below. */
.shift = 0x1.800000000ffc0p+46,
.log10_2 = 0x1.a934f0979a371p1, /* 1/log2(10). */
/* |n| > 1280 => 2^(n) overflows. */
svbool_t p_cmp = svacgt (pg, n, d->scale_thres);
- svfloat64_t r1 = svmul_x (pg, s1, s1);
+ svfloat64_t r1 = svmul_x (svptrue_b64 (), s1, s1);
svfloat64_t r2 = svmla_x (pg, s2, s2, y);
- svfloat64_t r0 = svmul_x (pg, r2, s1);
+ svfloat64_t r0 = svmul_x (svptrue_b64 (), r2, s1);
return svsel (p_cmp, r1, r0);
}
comes at significant performance cost. */
svuint64_t u = svreinterpret_u64 (z);
svfloat64_t scale = svexpa (u);
-
+ svfloat64_t c24 = svld1rq (svptrue_b64 (), &d->c2);
/* Approximate exp10(r) using polynomial. */
- svfloat64_t r2 = svmul_x (pg, r, r);
- svfloat64_t y = svmla_x (pg, svmul_x (pg, r, d->poly[0]), r2,
- sv_pairwise_poly_3_f64_x (pg, r, r2, d->poly + 1));
+ svfloat64_t r2 = svmul_x (svptrue_b64 (), r, r);
+ svfloat64_t p12 = svmla_lane (sv_f64 (d->c1), r, c24, 0);
+ svfloat64_t p34 = svmla_lane (sv_f64 (d->c3), r, c24, 1);
+ svfloat64_t p14 = svmla_x (pg, p12, p34, r2);
+
+ svfloat64_t y = svmla_x (pg, svmul_x (svptrue_b64 (), r, d->c0), r2, p14);
/* Assemble result as exp10(x) = 2^n * exp10(r). If |x| > SpecialBound
multiplication may overflow, so use special case routine. */
<https://www.gnu.org/licenses/>. */
#include "sv_math.h"
-#include "poly_sve_f64.h"
#define N (1 << V_EXP_TABLE_BITS)
static const struct data
{
- double poly[4];
+ double c0, c2;
+ double c1, c3;
double shift, big_bound, uoflow_bound;
} data = {
/* Coefficients are computed using Remez algorithm with
minimisation of the absolute error. */
- .poly = { 0x1.62e42fefa3686p-1, 0x1.ebfbdff82c241p-3, 0x1.c6b09b16de99ap-5,
- 0x1.3b2abf5571ad8p-7 },
- .shift = 0x1.8p52 / N,
- .uoflow_bound = UOFlowBound,
+ .c0 = 0x1.62e42fefa3686p-1, .c1 = 0x1.ebfbdff82c241p-3,
+ .c2 = 0x1.c6b09b16de99ap-5, .c3 = 0x1.3b2abf5571ad8p-7,
+ .shift = 0x1.8p52 / N, .uoflow_bound = UOFlowBound,
.big_bound = BigBound,
};
/* |n| > 1280 => 2^(n) overflows. */
svbool_t p_cmp = svacgt (pg, n, d->uoflow_bound);
- svfloat64_t r1 = svmul_x (pg, s1, s1);
+ svfloat64_t r1 = svmul_x (svptrue_b64 (), s1, s1);
svfloat64_t r2 = svmla_x (pg, s2, s2, y);
- svfloat64_t r0 = svmul_x (pg, r2, s1);
+ svfloat64_t r0 = svmul_x (svptrue_b64 (), r2, s1);
return svsel (p_cmp, r1, r0);
}
svuint64_t top = svlsl_x (pg, ki, 52 - V_EXP_TABLE_BITS);
svfloat64_t scale = svreinterpret_f64 (svadd_x (pg, sbits, top));
+ svfloat64_t c13 = svld1rq (svptrue_b64 (), &d->c1);
/* Approximate exp2(r) using polynomial. */
- svfloat64_t r2 = svmul_x (pg, r, r);
- svfloat64_t p = sv_pairwise_poly_3_f64_x (pg, r, r2, d->poly);
- svfloat64_t y = svmul_x (pg, r, p);
-
+ /* y = exp2(r) - 1 ~= C0 r + C1 r^2 + C2 r^3 + C3 r^4. */
+ svfloat64_t r2 = svmul_x (svptrue_b64 (), r, r);
+ svfloat64_t p01 = svmla_lane (sv_f64 (d->c0), r, c13, 0);
+ svfloat64_t p23 = svmla_lane (sv_f64 (d->c2), r, c13, 1);
+ svfloat64_t p = svmla_x (pg, p01, p23, r2);
+ svfloat64_t y = svmul_x (svptrue_b64 (), r, p);
/* Assemble exp2(x) = exp2(r) * scale. */
if (__glibc_unlikely (svptest_any (pg, special)))
return special_case (pg, scale, y, kd, d);
static const struct data
{
- double poly[4];
+ double c0, c2;
+ double c1, c3;
double ln2_hi, ln2_lo, inv_ln2, shift, thres;
+
} data = {
- .poly = { /* ulp error: 0.53. */
- 0x1.fffffffffdbcdp-2, 0x1.555555555444cp-3, 0x1.555573c6a9f7dp-5,
- 0x1.1111266d28935p-7 },
+ .c0 = 0x1.fffffffffdbcdp-2,
+ .c1 = 0x1.555555555444cp-3,
+ .c2 = 0x1.555573c6a9f7dp-5,
+ .c3 = 0x1.1111266d28935p-7,
.ln2_hi = 0x1.62e42fefa3800p-1,
.ln2_lo = 0x1.ef35793c76730p-45,
/* 1/ln2. */
.thres = 704.0,
};
-#define C(i) sv_f64 (d->poly[i])
#define SpecialOffset 0x6000000000000000 /* 0x1p513. */
/* SpecialBias1 + SpecialBias1 = asuint(1.0). */
#define SpecialBias1 0x7000000000000000 /* 0x1p769. */
svuint64_t b
= svdup_u64_z (p_sign, SpecialOffset); /* Inactive lanes set to 0. */
- /* Set s1 to generate overflow depending on sign of exponent n. */
- svfloat64_t s1 = svreinterpret_f64 (
- svsubr_x (pg, b, SpecialBias1)); /* 0x70...0 - b. */
- /* Offset s to avoid overflow in final result if n is below threshold. */
+ /* Set s1 to generate overflow depending on sign of exponent n,
+ ie. s1 = 0x70...0 - b. */
+ svfloat64_t s1 = svreinterpret_f64 (svsubr_x (pg, b, SpecialBias1));
+ /* Offset s to avoid overflow in final result if n is below threshold.
+ ie. s2 = as_u64 (s) - 0x3010...0 + b. */
svfloat64_t s2 = svreinterpret_f64 (
- svadd_x (pg, svsub_x (pg, svreinterpret_u64 (s), SpecialBias2),
- b)); /* as_u64 (s) - 0x3010...0 + b. */
+ svadd_x (pg, svsub_x (pg, svreinterpret_u64 (s), SpecialBias2), b));
/* |n| > 1280 => 2^(n) overflows. */
svbool_t p_cmp = svacgt (pg, n, 1280.0);
- svfloat64_t r1 = svmul_x (pg, s1, s1);
+ svfloat64_t r1 = svmul_x (svptrue_b64 (), s1, s1);
svfloat64_t r2 = svmla_x (pg, s2, s2, y);
- svfloat64_t r0 = svmul_x (pg, r2, s1);
+ svfloat64_t r0 = svmul_x (svptrue_b64 (), r2, s1);
return svsel (p_cmp, r1, r0);
}
svfloat64_t z = svmla_x (pg, sv_f64 (d->shift), x, d->inv_ln2);
svuint64_t u = svreinterpret_u64 (z);
svfloat64_t n = svsub_x (pg, z, d->shift);
-
+ svfloat64_t c13 = svld1rq (svptrue_b64 (), &d->c1);
/* r = x - n * ln2, r is in [-ln2/(2N), ln2/(2N)]. */
svfloat64_t ln2 = svld1rq (svptrue_b64 (), &d->ln2_hi);
svfloat64_t r = svmls_lane (x, n, ln2, 0);
r = svmls_lane (r, n, ln2, 1);
/* y = exp(r) - 1 ~= r + C0 r^2 + C1 r^3 + C2 r^4 + C3 r^5. */
- svfloat64_t r2 = svmul_x (pg, r, r);
- svfloat64_t p01 = svmla_x (pg, C (0), C (1), r);
- svfloat64_t p23 = svmla_x (pg, C (2), C (3), r);
+ svfloat64_t r2 = svmul_x (svptrue_b64 (), r, r);
+ svfloat64_t p01 = svmla_lane (sv_f64 (d->c0), r, c13, 0);
+ svfloat64_t p23 = svmla_lane (sv_f64 (d->c2), r, c13, 1);
svfloat64_t p04 = svmla_x (pg, p01, p23, r2);
svfloat64_t y = svmla_x (pg, r, p04, r2);
/* Data is defined in v_pow_log_data.c. */
#define N_LOG (1 << V_POW_LOG_TABLE_BITS)
-#define A __v_pow_log_data.poly
#define Off 0x3fe6955500000000
/* Data is defined in v_pow_exp_data.c. */
#define N_EXP (1 << V_POW_EXP_TABLE_BITS)
#define SignBias (0x800 << V_POW_EXP_TABLE_BITS)
-#define C __v_pow_exp_data.poly
#define SmallExp 0x3c9 /* top12(0x1p-54). */
#define BigExp 0x408 /* top12(512.). */
#define ThresExp 0x03f /* BigExp - SmallExp. */
#define HugeExp 0x409 /* top12(1024.). */
/* Constants associated with pow. */
+#define SmallBoundX 0x1p-126
#define SmallPowX 0x001 /* top12(0x1p-126). */
#define BigPowX 0x7ff /* top12(INFINITY). */
#define ThresPowX 0x7fe /* BigPowX - SmallPowX. */
#define BigPowY 0x43e /* top12(0x1.749p62). */
#define ThresPowY 0x080 /* BigPowY - SmallPowY. */
+static const struct data
+{
+ double log_c0, log_c2, log_c4, log_c6, ln2_hi, ln2_lo;
+ double log_c1, log_c3, log_c5, off;
+ double n_over_ln2, exp_c2, ln2_over_n_hi, ln2_over_n_lo;
+ double exp_c0, exp_c1;
+} data = {
+ .log_c0 = -0x1p-1,
+ .log_c1 = -0x1.555555555556p-1,
+ .log_c2 = 0x1.0000000000006p-1,
+ .log_c3 = 0x1.999999959554ep-1,
+ .log_c4 = -0x1.555555529a47ap-1,
+ .log_c5 = -0x1.2495b9b4845e9p0,
+ .log_c6 = 0x1.0002b8b263fc3p0,
+ .off = Off,
+ .exp_c0 = 0x1.fffffffffffd4p-2,
+ .exp_c1 = 0x1.5555571d6ef9p-3,
+ .exp_c2 = 0x1.5555576a5adcep-5,
+ .ln2_hi = 0x1.62e42fefa3800p-1,
+ .ln2_lo = 0x1.ef35793c76730p-45,
+ .n_over_ln2 = 0x1.71547652b82fep0 * N_EXP,
+ .ln2_over_n_hi = 0x1.62e42fefc0000p-9,
+ .ln2_over_n_lo = -0x1.c610ca86c3899p-45,
+};
+
/* Check if x is an integer. */
static inline svbool_t
sv_isint (svbool_t pg, svfloat64_t x)
static inline svbool_t
sv_isodd (svbool_t pg, svfloat64_t x)
{
- svfloat64_t y = svmul_x (pg, x, 0.5);
+ svfloat64_t y = svmul_x (svptrue_b64 (), x, 0.5);
return sv_isnotint (pg, y);
}
static inline svbool_t
sv_zeroinfnan (svbool_t pg, svuint64_t i)
{
- return svcmpge (pg, svsub_x (pg, svmul_x (pg, i, 2), 1),
+ return svcmpge (pg, svsub_x (pg, svadd_x (pg, i, i), 1),
2 * asuint64 (INFINITY) - 1);
}
additional 15 bits precision. IX is the bit representation of x, but
normalized in the subnormal range using the sign bit for the exponent. */
static inline svfloat64_t
-sv_log_inline (svbool_t pg, svuint64_t ix, svfloat64_t *tail)
+sv_log_inline (svbool_t pg, svuint64_t ix, svfloat64_t *tail,
+ const struct data *d)
{
/* x = 2^k z; where z is in range [Off,2*Off) and exact.
The range is split into N subintervals.
The ith subinterval contains z and c is near its center. */
- svuint64_t tmp = svsub_x (pg, ix, Off);
+ svuint64_t tmp = svsub_x (pg, ix, d->off);
svuint64_t i = svand_x (pg, svlsr_x (pg, tmp, 52 - V_POW_LOG_TABLE_BITS),
sv_u64 (N_LOG - 1));
svint64_t k = svasr_x (pg, svreinterpret_s64 (tmp), 52);
- svuint64_t iz = svsub_x (pg, ix, svand_x (pg, tmp, sv_u64 (0xfffULL << 52)));
+ svuint64_t iz = svsub_x (pg, ix, svlsl_x (pg, svreinterpret_u64 (k), 52));
svfloat64_t z = svreinterpret_f64 (iz);
svfloat64_t kd = svcvt_f64_x (pg, k);
|z/c - 1| < 1/N, so r = z/c - 1 is exactly representible. */
svfloat64_t r = svmad_x (pg, z, invc, -1.0);
/* k*Ln2 + log(c) + r. */
- svfloat64_t t1 = svmla_x (pg, logc, kd, __v_pow_log_data.ln2_hi);
+
+ svfloat64_t ln2_hilo = svld1rq_f64 (svptrue_b64 (), &d->ln2_hi);
+ svfloat64_t t1 = svmla_lane_f64 (logc, kd, ln2_hilo, 0);
svfloat64_t t2 = svadd_x (pg, t1, r);
- svfloat64_t lo1 = svmla_x (pg, logctail, kd, __v_pow_log_data.ln2_lo);
+ svfloat64_t lo1 = svmla_lane_f64 (logctail, kd, ln2_hilo, 1);
svfloat64_t lo2 = svadd_x (pg, svsub_x (pg, t1, t2), r);
/* Evaluation is optimized assuming superscalar pipelined execution. */
- svfloat64_t ar = svmul_x (pg, r, -0.5); /* A[0] = -0.5. */
- svfloat64_t ar2 = svmul_x (pg, r, ar);
- svfloat64_t ar3 = svmul_x (pg, r, ar2);
+
+ svfloat64_t log_c02 = svld1rq_f64 (svptrue_b64 (), &d->log_c0);
+ svfloat64_t ar = svmul_lane_f64 (r, log_c02, 0);
+ svfloat64_t ar2 = svmul_x (svptrue_b64 (), r, ar);
+ svfloat64_t ar3 = svmul_x (svptrue_b64 (), r, ar2);
/* k*Ln2 + log(c) + r + A[0]*r*r. */
svfloat64_t hi = svadd_x (pg, t2, ar2);
- svfloat64_t lo3 = svmla_x (pg, svneg_x (pg, ar2), ar, r);
+ svfloat64_t lo3 = svmls_x (pg, ar2, ar, r);
svfloat64_t lo4 = svadd_x (pg, svsub_x (pg, t2, hi), ar2);
/* p = log1p(r) - r - A[0]*r*r. */
/* p = (ar3 * (A[1] + r * A[2] + ar2 * (A[3] + r * A[4] + ar2 * (A[5] + r *
A[6])))). */
- svfloat64_t a56 = svmla_x (pg, sv_f64 (A[5]), r, A[6]);
- svfloat64_t a34 = svmla_x (pg, sv_f64 (A[3]), r, A[4]);
- svfloat64_t a12 = svmla_x (pg, sv_f64 (A[1]), r, A[2]);
+
+ svfloat64_t log_c46 = svld1rq_f64 (svptrue_b64 (), &d->log_c4);
+ svfloat64_t a56 = svmla_lane_f64 (sv_f64 (d->log_c5), r, log_c46, 1);
+ svfloat64_t a34 = svmla_lane_f64 (sv_f64 (d->log_c3), r, log_c46, 0);
+ svfloat64_t a12 = svmla_lane_f64 (sv_f64 (d->log_c1), r, log_c02, 1);
svfloat64_t p = svmla_x (pg, a34, ar2, a56);
p = svmla_x (pg, a12, ar2, p);
- p = svmul_x (pg, ar3, p);
+ p = svmul_x (svptrue_b64 (), ar3, p);
svfloat64_t lo = svadd_x (
- pg, svadd_x (pg, svadd_x (pg, svadd_x (pg, lo1, lo2), lo3), lo4), p);
+ pg, svadd_x (pg, svsub_x (pg, svadd_x (pg, lo1, lo2), lo3), lo4), p);
svfloat64_t y = svadd_x (pg, hi, lo);
*tail = svadd_x (pg, svsub_x (pg, hi, y), lo);
return y;
}
+static inline svfloat64_t
+sv_exp_core (svbool_t pg, svfloat64_t x, svfloat64_t xtail,
+ svuint64_t sign_bias, svfloat64_t *tmp, svuint64_t *sbits,
+ svuint64_t *ki, const struct data *d)
+{
+ /* exp(x) = 2^(k/N) * exp(r), with exp(r) in [2^(-1/2N),2^(1/2N)]. */
+ /* x = ln2/N*k + r, with int k and r in [-ln2/2N, ln2/2N]. */
+ svfloat64_t n_over_ln2_and_c2 = svld1rq_f64 (svptrue_b64 (), &d->n_over_ln2);
+ svfloat64_t z = svmul_lane_f64 (x, n_over_ln2_and_c2, 0);
+ /* z - kd is in [-1, 1] in non-nearest rounding modes. */
+ svfloat64_t kd = svrinta_x (pg, z);
+ *ki = svreinterpret_u64 (svcvt_s64_x (pg, kd));
+
+ svfloat64_t ln2_over_n_hilo
+ = svld1rq_f64 (svptrue_b64 (), &d->ln2_over_n_hi);
+ svfloat64_t r = x;
+ r = svmls_lane_f64 (r, kd, ln2_over_n_hilo, 0);
+ r = svmls_lane_f64 (r, kd, ln2_over_n_hilo, 1);
+ /* The code assumes 2^-200 < |xtail| < 2^-8/N. */
+ r = svadd_x (pg, r, xtail);
+ /* 2^(k/N) ~= scale. */
+ svuint64_t idx = svand_x (pg, *ki, N_EXP - 1);
+ svuint64_t top
+ = svlsl_x (pg, svadd_x (pg, *ki, sign_bias), 52 - V_POW_EXP_TABLE_BITS);
+ /* This is only a valid scale when -1023*N < k < 1024*N. */
+ *sbits = svld1_gather_index (pg, __v_pow_exp_data.sbits, idx);
+ *sbits = svadd_x (pg, *sbits, top);
+ /* exp(x) = 2^(k/N) * exp(r) ~= scale + scale * (exp(r) - 1). */
+ svfloat64_t r2 = svmul_x (svptrue_b64 (), r, r);
+ *tmp = svmla_lane_f64 (sv_f64 (d->exp_c1), r, n_over_ln2_and_c2, 1);
+ *tmp = svmla_x (pg, sv_f64 (d->exp_c0), r, *tmp);
+ *tmp = svmla_x (pg, r, r2, *tmp);
+ svfloat64_t scale = svreinterpret_f64 (*sbits);
+ /* Note: tmp == 0 or |tmp| > 2^-200 and scale > 2^-739, so there
+ is no spurious underflow here even without fma. */
+ z = svmla_x (pg, scale, scale, *tmp);
+ return z;
+}
+
/* Computes sign*exp(x+xtail) where |xtail| < 2^-8/N and |xtail| <= |x|.
The sign_bias argument is SignBias or 0 and sets the sign to -1 or 1. */
static inline svfloat64_t
sv_exp_inline (svbool_t pg, svfloat64_t x, svfloat64_t xtail,
- svuint64_t sign_bias)
+ svuint64_t sign_bias, const struct data *d)
{
/* 3 types of special cases: tiny (uflow and spurious uflow), huge (oflow)
and other cases of large values of x (scale * (1 + TMP) oflow). */
/* |x| is large (|x| >= 512) or tiny (|x| <= 0x1p-54). */
svbool_t uoflow = svcmpge (pg, svsub_x (pg, abstop, SmallExp), ThresExp);
- /* Conditions special, uflow and oflow are all expressed as uoflow &&
- something, hence do not bother computing anything if no lane in uoflow is
- true. */
- svbool_t special = svpfalse_b ();
- svbool_t uflow = svpfalse_b ();
- svbool_t oflow = svpfalse_b ();
+ svfloat64_t tmp;
+ svuint64_t sbits, ki;
if (__glibc_unlikely (svptest_any (pg, uoflow)))
{
+ svfloat64_t z
+ = sv_exp_core (pg, x, xtail, sign_bias, &tmp, &sbits, &ki, d);
+
/* |x| is tiny (|x| <= 0x1p-54). */
- uflow = svcmpge (pg, svsub_x (pg, abstop, SmallExp), 0x80000000);
+ svbool_t uflow
+ = svcmpge (pg, svsub_x (pg, abstop, SmallExp), 0x80000000);
uflow = svand_z (pg, uoflow, uflow);
/* |x| is huge (|x| >= 1024). */
- oflow = svcmpge (pg, abstop, HugeExp);
+ svbool_t oflow = svcmpge (pg, abstop, HugeExp);
oflow = svand_z (pg, uoflow, svbic_z (pg, oflow, uflow));
+
/* For large |x| values (512 < |x| < 1024) scale * (1 + TMP) can overflow
- or underflow. */
- special = svbic_z (pg, uoflow, svorr_z (pg, uflow, oflow));
+ or underflow. */
+ svbool_t special = svbic_z (pg, uoflow, svorr_z (pg, uflow, oflow));
+
+ /* Update result with special and large cases. */
+ z = sv_call_specialcase (tmp, sbits, ki, z, special);
+
+ /* Handle underflow and overflow. */
+ svbool_t x_is_neg = svcmplt (pg, x, 0);
+ svuint64_t sign_mask
+ = svlsl_x (pg, sign_bias, 52 - V_POW_EXP_TABLE_BITS);
+ svfloat64_t res_uoflow
+ = svsel (x_is_neg, sv_f64 (0.0), sv_f64 (INFINITY));
+ res_uoflow = svreinterpret_f64 (
+ svorr_x (pg, svreinterpret_u64 (res_uoflow), sign_mask));
+ /* Avoid spurious underflow for tiny x. */
+ svfloat64_t res_spurious_uflow
+ = svreinterpret_f64 (svorr_x (pg, sign_mask, 0x3ff0000000000000));
+
+ z = svsel (oflow, res_uoflow, z);
+ z = svsel (uflow, res_spurious_uflow, z);
+ return z;
}
- /* exp(x) = 2^(k/N) * exp(r), with exp(r) in [2^(-1/2N),2^(1/2N)]. */
- /* x = ln2/N*k + r, with int k and r in [-ln2/2N, ln2/2N]. */
- svfloat64_t z = svmul_x (pg, x, __v_pow_exp_data.n_over_ln2);
- /* z - kd is in [-1, 1] in non-nearest rounding modes. */
- svfloat64_t shift = sv_f64 (__v_pow_exp_data.shift);
- svfloat64_t kd = svadd_x (pg, z, shift);
- svuint64_t ki = svreinterpret_u64 (kd);
- kd = svsub_x (pg, kd, shift);
- svfloat64_t r = x;
- r = svmls_x (pg, r, kd, __v_pow_exp_data.ln2_over_n_hi);
- r = svmls_x (pg, r, kd, __v_pow_exp_data.ln2_over_n_lo);
- /* The code assumes 2^-200 < |xtail| < 2^-8/N. */
- r = svadd_x (pg, r, xtail);
- /* 2^(k/N) ~= scale. */
- svuint64_t idx = svand_x (pg, ki, N_EXP - 1);
- svuint64_t top
- = svlsl_x (pg, svadd_x (pg, ki, sign_bias), 52 - V_POW_EXP_TABLE_BITS);
- /* This is only a valid scale when -1023*N < k < 1024*N. */
- svuint64_t sbits = svld1_gather_index (pg, __v_pow_exp_data.sbits, idx);
- sbits = svadd_x (pg, sbits, top);
- /* exp(x) = 2^(k/N) * exp(r) ~= scale + scale * (exp(r) - 1). */
- svfloat64_t r2 = svmul_x (pg, r, r);
- svfloat64_t tmp = svmla_x (pg, sv_f64 (C[1]), r, C[2]);
- tmp = svmla_x (pg, sv_f64 (C[0]), r, tmp);
- tmp = svmla_x (pg, r, r2, tmp);
- svfloat64_t scale = svreinterpret_f64 (sbits);
- /* Note: tmp == 0 or |tmp| > 2^-200 and scale > 2^-739, so there
- is no spurious underflow here even without fma. */
- z = svmla_x (pg, scale, scale, tmp);
-
- /* Update result with special and large cases. */
- if (__glibc_unlikely (svptest_any (pg, special)))
- z = sv_call_specialcase (tmp, sbits, ki, z, special);
-
- /* Handle underflow and overflow. */
- svuint64_t sign_bit = svlsr_x (pg, svreinterpret_u64 (x), 63);
- svbool_t x_is_neg = svcmpne (pg, sign_bit, 0);
- svuint64_t sign_mask = svlsl_x (pg, sign_bias, 52 - V_POW_EXP_TABLE_BITS);
- svfloat64_t res_uoflow = svsel (x_is_neg, sv_f64 (0.0), sv_f64 (INFINITY));
- res_uoflow = svreinterpret_f64 (
- svorr_x (pg, svreinterpret_u64 (res_uoflow), sign_mask));
- z = svsel (oflow, res_uoflow, z);
- /* Avoid spurious underflow for tiny x. */
- svfloat64_t res_spurious_uflow
- = svreinterpret_f64 (svorr_x (pg, sign_mask, 0x3ff0000000000000));
- z = svsel (uflow, res_spurious_uflow, z);
-
- return z;
+ return sv_exp_core (pg, x, xtail, sign_bias, &tmp, &sbits, &ki, d);
}
static inline double
svfloat64_t SV_NAME_D2 (pow) (svfloat64_t x, svfloat64_t y, const svbool_t pg)
{
+ const struct data *d = ptr_barrier (&data);
+
/* This preamble handles special case conditions used in the final scalar
fallbacks. It also updates ix and sign_bias, that are used in the core
computation too, i.e., exp( y * log (x) ). */
svuint64_t vix0 = svreinterpret_u64 (x);
svuint64_t viy0 = svreinterpret_u64 (y);
- svuint64_t vtopx0 = svlsr_x (svptrue_b64 (), vix0, 52);
/* Negative x cases. */
- svuint64_t sign_bit = svlsr_m (pg, vix0, 63);
- svbool_t xisneg = svcmpeq (pg, sign_bit, 1);
+ svbool_t xisneg = svcmplt (pg, x, 0);
/* Set sign_bias and ix depending on sign of x and nature of y. */
- svbool_t yisnotint_xisneg = svpfalse_b ();
+ svbool_t yint_or_xpos = pg;
svuint64_t sign_bias = sv_u64 (0);
svuint64_t vix = vix0;
- svuint64_t vtopx1 = vtopx0;
if (__glibc_unlikely (svptest_any (pg, xisneg)))
{
/* Determine nature of y. */
- yisnotint_xisneg = sv_isnotint (xisneg, y);
- svbool_t yisint_xisneg = sv_isint (xisneg, y);
+ yint_or_xpos = sv_isint (xisneg, y);
svbool_t yisodd_xisneg = sv_isodd (xisneg, y);
/* ix set to abs(ix) if y is integer. */
- vix = svand_m (yisint_xisneg, vix0, 0x7fffffffffffffff);
- vtopx1 = svand_m (yisint_xisneg, vtopx0, 0x7ff);
+ vix = svand_m (yint_or_xpos, vix0, 0x7fffffffffffffff);
/* Set to SignBias if x is negative and y is odd. */
sign_bias = svsel (yisodd_xisneg, sv_u64 (SignBias), sv_u64 (0));
}
- /* Special cases of x or y: zero, inf and nan. */
- svbool_t xspecial = sv_zeroinfnan (pg, vix0);
- svbool_t yspecial = sv_zeroinfnan (pg, viy0);
- svbool_t special = svorr_z (pg, xspecial, yspecial);
-
/* Small cases of x: |x| < 0x1p-126. */
- svuint64_t vabstopx0 = svand_x (pg, vtopx0, 0x7ff);
- svbool_t xsmall = svcmplt (pg, vabstopx0, SmallPowX);
- if (__glibc_unlikely (svptest_any (pg, xsmall)))
+ svbool_t xsmall = svaclt (yint_or_xpos, x, SmallBoundX);
+ if (__glibc_unlikely (svptest_any (yint_or_xpos, xsmall)))
{
/* Normalize subnormal x so exponent becomes negative. */
- svbool_t topx_is_null = svcmpeq (xsmall, vtopx1, 0);
+ svuint64_t vtopx = svlsr_x (svptrue_b64 (), vix, 52);
+ svbool_t topx_is_null = svcmpeq (xsmall, vtopx, 0);
svuint64_t vix_norm = svreinterpret_u64 (svmul_m (xsmall, x, 0x1p52));
vix_norm = svand_m (xsmall, vix_norm, 0x7fffffffffffffff);
/* y_hi = log(ix, &y_lo). */
svfloat64_t vlo;
- svfloat64_t vhi = sv_log_inline (pg, vix, &vlo);
+ svfloat64_t vhi = sv_log_inline (yint_or_xpos, vix, &vlo, d);
/* z = exp(y_hi, y_lo, sign_bias). */
- svfloat64_t vehi = svmul_x (pg, y, vhi);
- svfloat64_t velo = svmul_x (pg, y, vlo);
- svfloat64_t vemi = svmls_x (pg, vehi, y, vhi);
- velo = svsub_x (pg, velo, vemi);
- svfloat64_t vz = sv_exp_inline (pg, vehi, velo, sign_bias);
+ svfloat64_t vehi = svmul_x (svptrue_b64 (), y, vhi);
+ svfloat64_t vemi = svmls_x (yint_or_xpos, vehi, y, vhi);
+ svfloat64_t velo = svnmls_x (yint_or_xpos, vemi, y, vlo);
+ svfloat64_t vz = sv_exp_inline (yint_or_xpos, vehi, velo, sign_bias, d);
/* Cases of finite y and finite negative x. */
- vz = svsel (yisnotint_xisneg, sv_f64 (__builtin_nan ("")), vz);
+ vz = svsel (yint_or_xpos, vz, sv_f64 (__builtin_nan ("")));
+
+ /* Special cases of x or y: zero, inf and nan. */
+ svbool_t xspecial = sv_zeroinfnan (svptrue_b64 (), vix0);
+ svbool_t yspecial = sv_zeroinfnan (svptrue_b64 (), viy0);
+ svbool_t special = svorr_z (svptrue_b64 (), xspecial, yspecial);
/* Cases of zero/inf/nan x or y. */
- if (__glibc_unlikely (svptest_any (pg, special)))
+ if (__glibc_unlikely (svptest_any (svptrue_b64 (), special)))
vz = sv_call2_f64 (pow_sc, x, y, vz, special);
return vz;
#define Tlogc __v_powf_data.logc
#define Texp __v_powf_data.scale
#define SignBias (1 << (V_POWF_EXP2_TABLE_BITS + 11))
-#define Shift 0x1.8p52
#define Norm 0x1p23f /* 0x4b000000. */
/* Overall ULP error bound for pow is 2.6 ulp
double log_poly[4];
double exp_poly[3];
float uflow_bound, oflow_bound, small_bound;
- uint32_t sign_bias, sign_mask, subnormal_bias, off;
+ uint32_t sign_bias, subnormal_bias, off;
} data = {
/* rel err: 1.5 * 2^-30. Each coefficients is multiplied the value of
V_POWF_EXP2_N. */
.small_bound = 0x1p-126f,
.off = 0x3f35d000,
.sign_bias = SignBias,
- .sign_mask = 0x80000000,
.subnormal_bias = 0x0b800000, /* 23 << 23. */
};
static inline svbool_t
sv_zeroinfnan (svbool_t pg, svuint32_t i)
{
- return svcmpge (pg, svsub_x (pg, svmul_x (pg, i, 2u), 1),
+ return svcmpge (pg, svsub_x (pg, svadd_x (pg, i, i), 1),
2u * 0x7f800000 - 1);
}
}
/* Scalar fallback for special case routines with custom signature. */
-static inline svfloat32_t
-sv_call_powf_sc (svfloat32_t x1, svfloat32_t x2, svfloat32_t y, svbool_t cmp)
+static svfloat32_t NOINLINE
+sv_call_powf_sc (svfloat32_t x1, svfloat32_t x2, svfloat32_t y)
{
+ /* Special cases of x or y: zero, inf and nan. */
+ svbool_t xspecial = sv_zeroinfnan (svptrue_b32 (), svreinterpret_u32 (x1));
+ svbool_t yspecial = sv_zeroinfnan (svptrue_b32 (), svreinterpret_u32 (x2));
+ svbool_t cmp = svorr_z (svptrue_b32 (), xspecial, yspecial);
+
svbool_t p = svpfirst (cmp, svpfalse ());
while (svptest_any (cmp, p))
{
/* Polynomial to approximate log1p(r)/ln2. */
svfloat64_t logx = A (0);
- logx = svmla_x (pg, A (1), r, logx);
- logx = svmla_x (pg, A (2), r, logx);
- logx = svmla_x (pg, A (3), r, logx);
- logx = svmla_x (pg, y0, r, logx);
+ logx = svmad_x (pg, r, logx, A (1));
+ logx = svmad_x (pg, r, logx, A (2));
+ logx = svmad_x (pg, r, logx, A (3));
+ logx = svmad_x (pg, r, logx, y0);
*pylogx = svmul_x (pg, y, logx);
/* z - kd is in [-1, 1] in non-nearest rounding modes. */
- svfloat64_t kd = svadd_x (pg, *pylogx, Shift);
- svuint64_t ki = svreinterpret_u64 (kd);
- kd = svsub_x (pg, kd, Shift);
+ svfloat64_t kd = svrinta_x (svptrue_b64 (), *pylogx);
+ svuint64_t ki = svreinterpret_u64 (svcvt_s64_x (svptrue_b64 (), kd));
r = svsub_x (pg, *pylogx, kd);
/* exp2(x) = 2^(k/N) * 2^r ~= s * (C0*r^3 + C1*r^2 + C2*r + 1). */
- svuint64_t t
- = svld1_gather_index (pg, Texp, svand_x (pg, ki, V_POWF_EXP2_N - 1));
- svuint64_t ski = svadd_x (pg, ki, sign_bias);
- t = svadd_x (pg, t, svlsl_x (pg, ski, 52 - V_POWF_EXP2_TABLE_BITS));
+ svuint64_t t = svld1_gather_index (
+ svptrue_b64 (), Texp, svand_x (svptrue_b64 (), ki, V_POWF_EXP2_N - 1));
+ svuint64_t ski = svadd_x (svptrue_b64 (), ki, sign_bias);
+ t = svadd_x (svptrue_b64 (), t,
+ svlsl_x (svptrue_b64 (), ski, 52 - V_POWF_EXP2_TABLE_BITS));
svfloat64_t s = svreinterpret_f64 (t);
svfloat64_t p = C (0);
p = svmla_x (pg, C (1), p, r);
p = svmla_x (pg, C (2), p, r);
- p = svmla_x (pg, s, p, svmul_x (pg, s, r));
+ p = svmla_x (pg, s, p, svmul_x (svptrue_b64 (), s, r));
return p;
}
{
const svbool_t ptrue = svptrue_b64 ();
- /* Unpack and promote input vectors (pg, y, z, i, k and sign_bias) into two in
- order to perform core computation in double precision. */
+ /* Unpack and promote input vectors (pg, y, z, i, k and sign_bias) into two
+ * in order to perform core computation in double precision. */
const svbool_t pg_lo = svunpklo (pg);
const svbool_t pg_hi = svunpkhi (pg);
- svfloat64_t y_lo = svcvt_f64_x (
- ptrue, svreinterpret_f32 (svunpklo (svreinterpret_u32 (y))));
- svfloat64_t y_hi = svcvt_f64_x (
- ptrue, svreinterpret_f32 (svunpkhi (svreinterpret_u32 (y))));
- svfloat32_t z = svreinterpret_f32 (iz);
- svfloat64_t z_lo = svcvt_f64_x (
- ptrue, svreinterpret_f32 (svunpklo (svreinterpret_u32 (z))));
- svfloat64_t z_hi = svcvt_f64_x (
- ptrue, svreinterpret_f32 (svunpkhi (svreinterpret_u32 (z))));
+ svfloat64_t y_lo
+ = svcvt_f64_x (pg, svreinterpret_f32 (svunpklo (svreinterpret_u32 (y))));
+ svfloat64_t y_hi
+ = svcvt_f64_x (pg, svreinterpret_f32 (svunpkhi (svreinterpret_u32 (y))));
+ svfloat64_t z_lo = svcvt_f64_x (pg, svreinterpret_f32 (svunpklo (iz)));
+ svfloat64_t z_hi = svcvt_f64_x (pg, svreinterpret_f32 (svunpkhi (iz)));
svuint64_t i_lo = svunpklo (i);
svuint64_t i_hi = svunpkhi (i);
svint64_t k_lo = svunpklo (k);
/* Implementation of SVE powf.
Provides the same accuracy as AdvSIMD powf, since it relies on the same
algorithm. The theoretical maximum error is under 2.60 ULPs.
- Maximum measured error is 2.56 ULPs:
- SV_NAME_F2 (pow) (0x1.004118p+0, 0x1.5d14a4p+16) got 0x1.fd4bp+127
- want 0x1.fd4b06p+127. */
+ Maximum measured error is 2.57 ULPs:
+ SV_NAME_F2 (pow) (0x1.031706p+0, 0x1.ce2ec2p+12) got 0x1.fff868p+127
+ want 0x1.fff862p+127. */
svfloat32_t SV_NAME_F2 (pow) (svfloat32_t x, svfloat32_t y, const svbool_t pg)
{
const struct data *d = ptr_barrier (&data);
svuint32_t viy0 = svreinterpret_u32 (y);
/* Negative x cases. */
- svuint32_t sign_bit = svand_m (pg, vix0, d->sign_mask);
- svbool_t xisneg = svcmpeq (pg, sign_bit, d->sign_mask);
+ svbool_t xisneg = svcmplt (pg, x, sv_f32 (0));
/* Set sign_bias and ix depending on sign of x and nature of y. */
- svbool_t yisnotint_xisneg = svpfalse_b ();
+ svbool_t yint_or_xpos = pg;
svuint32_t sign_bias = sv_u32 (0);
svuint32_t vix = vix0;
if (__glibc_unlikely (svptest_any (pg, xisneg)))
{
/* Determine nature of y. */
- yisnotint_xisneg = svisnotint (xisneg, y);
- svbool_t yisint_xisneg = svisint (xisneg, y);
+ yint_or_xpos = svisint (xisneg, y);
svbool_t yisodd_xisneg = svisodd (xisneg, y);
/* ix set to abs(ix) if y is integer. */
- vix = svand_m (yisint_xisneg, vix0, 0x7fffffff);
+ vix = svand_m (yint_or_xpos, vix0, 0x7fffffff);
/* Set to SignBias if x is negative and y is odd. */
sign_bias = svsel (yisodd_xisneg, sv_u32 (d->sign_bias), sv_u32 (0));
}
svbool_t cmp = svorr_z (pg, xspecial, yspecial);
/* Small cases of x: |x| < 0x1p-126. */
- svbool_t xsmall = svaclt (pg, x, d->small_bound);
- if (__glibc_unlikely (svptest_any (pg, xsmall)))
+ svbool_t xsmall = svaclt (yint_or_xpos, x, d->small_bound);
+ if (__glibc_unlikely (svptest_any (yint_or_xpos, xsmall)))
{
/* Normalize subnormal x so exponent becomes negative. */
svuint32_t vix_norm = svreinterpret_u32 (svmul_x (xsmall, x, Norm));
vix = svsel (xsmall, vix_norm, vix);
}
/* Part of core computation carried in working precision. */
- svuint32_t tmp = svsub_x (pg, vix, d->off);
- svuint32_t i = svand_x (pg, svlsr_x (pg, tmp, (23 - V_POWF_LOG2_TABLE_BITS)),
- V_POWF_LOG2_N - 1);
- svuint32_t top = svand_x (pg, tmp, 0xff800000);
- svuint32_t iz = svsub_x (pg, vix, top);
- svint32_t k
- = svasr_x (pg, svreinterpret_s32 (top), (23 - V_POWF_EXP2_TABLE_BITS));
-
- /* Compute core in extended precision and return intermediate ylogx results to
- handle cases of underflow and underflow in exp. */
+ svuint32_t tmp = svsub_x (yint_or_xpos, vix, d->off);
+ svuint32_t i = svand_x (
+ yint_or_xpos, svlsr_x (yint_or_xpos, tmp, (23 - V_POWF_LOG2_TABLE_BITS)),
+ V_POWF_LOG2_N - 1);
+ svuint32_t top = svand_x (yint_or_xpos, tmp, 0xff800000);
+ svuint32_t iz = svsub_x (yint_or_xpos, vix, top);
+ svint32_t k = svasr_x (yint_or_xpos, svreinterpret_s32 (top),
+ (23 - V_POWF_EXP2_TABLE_BITS));
+
+ /* Compute core in extended precision and return intermediate ylogx results
+ * to handle cases of underflow and underflow in exp. */
svfloat32_t ylogx;
- svfloat32_t ret = sv_powf_core (pg, i, iz, k, y, sign_bias, &ylogx, d);
+ svfloat32_t ret
+ = sv_powf_core (yint_or_xpos, i, iz, k, y, sign_bias, &ylogx, d);
/* Handle exp special cases of underflow and overflow. */
- svuint32_t sign = svlsl_x (pg, sign_bias, 20 - V_POWF_EXP2_TABLE_BITS);
+ svuint32_t sign
+ = svlsl_x (yint_or_xpos, sign_bias, 20 - V_POWF_EXP2_TABLE_BITS);
svfloat32_t ret_oflow
- = svreinterpret_f32 (svorr_x (pg, sign, asuint (INFINITY)));
+ = svreinterpret_f32 (svorr_x (yint_or_xpos, sign, asuint (INFINITY)));
svfloat32_t ret_uflow = svreinterpret_f32 (sign);
- ret = svsel (svcmple (pg, ylogx, d->uflow_bound), ret_uflow, ret);
- ret = svsel (svcmpgt (pg, ylogx, d->oflow_bound), ret_oflow, ret);
+ ret = svsel (svcmple (yint_or_xpos, ylogx, d->uflow_bound), ret_uflow, ret);
+ ret = svsel (svcmpgt (yint_or_xpos, ylogx, d->oflow_bound), ret_oflow, ret);
/* Cases of finite y and finite negative x. */
- ret = svsel (yisnotint_xisneg, sv_f32 (__builtin_nanf ("")), ret);
+ ret = svsel (yint_or_xpos, ret, sv_f32 (__builtin_nanf ("")));
- if (__glibc_unlikely (svptest_any (pg, cmp)))
- return sv_call_powf_sc (x, y, ret, cmp);
+ if (__glibc_unlikely (svptest_any (cmp, cmp)))
+ return sv_call_powf_sc (x, y, ret);
return ret;
}
/* scale = 2^(n/N). */
svfloat32_t scale = svexpa (svreinterpret_u32 (z));
- /* y = exp(r) - 1 ~= r + C0 r^2 + C1 r^3 + C2 r^4 + C3 r^5 + C4 r^6. */
+ /* poly(r) = exp(r) - 1 ~= C0 r + C1 r^2 + C2 r^3 + C3 r^4 + C4 r^5. */
svfloat32_t p12 = svmla_lane (sv_f32 (d->c1), r, lane_consts, 2);
svfloat32_t p34 = svmla_lane (sv_f32 (d->c3), r, lane_consts, 3);
svfloat32_t r2 = svmul_x (svptrue_b32 (), r, r);
return svmla_x (pg, scale, scale, poly);
}
-
#endif
memset_kunpeng \
memset_mops \
memset_oryon1 \
+ memset_sve_zva64 \
memset_zva64 \
strlen_asimd \
strlen_generic \
IFUNC_IMPL_ADD (array, i, memset, 1, __memset_kunpeng)
#if HAVE_AARCH64_SVE_ASM
IFUNC_IMPL_ADD (array, i, memset, sve && !bti && zva_size == 256, __memset_a64fx)
+ IFUNC_IMPL_ADD (array, i, memset, sve && zva_size == 64, __memset_sve_zva64)
#endif
IFUNC_IMPL_ADD (array, i, memset, mops, __memset_mops)
IFUNC_IMPL_ADD (array, i, memset, 1, __memset_generic))
extern __typeof (__redirect_memset) __memset_generic attribute_hidden;
extern __typeof (__redirect_memset) __memset_mops attribute_hidden;
extern __typeof (__redirect_memset) __memset_oryon1 attribute_hidden;
+extern __typeof (__redirect_memset) __memset_sve_zva64 attribute_hidden;
static inline __typeof (__redirect_memset) *
select_memset_ifunc (void)
{
if (IS_A64FX (midr) && zva_size == 256)
return __memset_a64fx;
+
+ if (prefer_sve_ifuncs && zva_size == 64)
+ return __memset_sve_zva64;
}
if (IS_ORYON1 (midr) && zva_size == 64)
--- /dev/null
+/* Optimized memset for SVE.
+ Copyright (C) 2025 Free Software Foundation, Inc.
+
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+
+/* Assumptions:
+ *
+ * ARMv8-a, AArch64, Advanced SIMD, SVE, unaligned accesses.
+ * ZVA size is 64.
+ */
+
+#if HAVE_AARCH64_SVE_ASM
+
+.arch armv8.2-a+sve
+
+#define dstin x0
+#define val x1
+#define valw w1
+#define count x2
+#define dst x3
+#define dstend x4
+#define zva_val x5
+#define vlen x5
+#define off x3
+#define dstend2 x5
+
+ENTRY (__memset_sve_zva64)
+ dup v0.16B, valw
+ cmp count, 16
+ b.lo L(set_16)
+
+ add dstend, dstin, count
+ cmp count, 64
+ b.hs L(set_128)
+
+ /* Set 16..63 bytes. */
+ mov off, 16
+ and off, off, count, lsr 1
+ sub dstend2, dstend, off
+ str q0, [dstin]
+ str q0, [dstin, off]
+ str q0, [dstend2, -16]
+ str q0, [dstend, -16]
+ ret
+
+ .p2align 4
+L(set_16):
+ whilelo p0.b, xzr, count
+ st1b z0.b, p0, [dstin]
+ ret
+
+ .p2align 4
+L(set_128):
+ bic dst, dstin, 15
+ cmp count, 128
+ b.hi L(set_long)
+ stp q0, q0, [dstin]
+ stp q0, q0, [dstin, 32]
+ stp q0, q0, [dstend, -64]
+ stp q0, q0, [dstend, -32]
+ ret
+
+ .p2align 4
+L(set_long):
+ cmp count, 256
+ b.lo L(no_zva)
+ tst valw, 255
+ b.ne L(no_zva)
+
+ str q0, [dstin]
+ str q0, [dst, 16]
+ bic dst, dstin, 31
+ stp q0, q0, [dst, 32]
+ bic dst, dstin, 63
+ sub count, dstend, dst /* Count is now 64 too large. */
+ sub count, count, 128 /* Adjust count and bias for loop. */
+
+ sub x8, dstend, 1 /* Write last bytes before ZVA loop. */
+ bic x8, x8, 15
+ stp q0, q0, [x8, -48]
+ str q0, [x8, -16]
+ str q0, [dstend, -16]
+
+ .p2align 4
+L(zva64_loop):
+ add dst, dst, 64
+ dc zva, dst
+ subs count, count, 64
+ b.hi L(zva64_loop)
+ ret
+
+L(no_zva):
+ str q0, [dstin]
+ sub count, dstend, dst /* Count is 16 too large. */
+ sub count, count, 64 + 16 /* Adjust count and bias for loop. */
+L(no_zva_loop):
+ stp q0, q0, [dst, 16]
+ stp q0, q0, [dst, 48]
+ add dst, dst, 64
+ subs count, count, 64
+ b.hi L(no_zva_loop)
+ stp q0, q0, [dstend, -64]
+ stp q0, q0, [dstend, -32]
+ ret
+
+END (__memset_sve_zva64)
+#endif
extern size_t _dl_phnum;
#endif
+/* Possible values for the glibc.rtld.execstack tunable. */
+enum stack_tunable_mode
+ {
+ /* Do not allow executable stacks, even if program requires it. */
+ stack_tunable_mode_disable = 0,
+ /* Follows either ABI requirement, or the PT_GNU_STACK value. */
+ stack_tunable_mode_enable = 1,
+ /* Always enable an executable stack. */
+ stack_tunable_mode_force = 2
+ };
+
+void _dl_handle_execstack_tunable (void) attribute_hidden;
+
/* This function changes the permission of the memory region pointed
by STACK_ENDP to executable and update the internal memory protection
flags for future thread stack creation. */
-int _dl_make_stack_executable (void **stack_endp) attribute_hidden;
+int _dl_make_stack_executable (const void *stack_endp) attribute_hidden;
/* Variable pointing to the end of the stack (or close to it). This value
must be constant over the runtime of the application. Some programs
static const double huge = 1e300;
+#ifndef SECTION
+# define SECTION
+#endif
+
+SECTION
double
__ieee754_atanh (double x)
{
return copysign (t, x);
}
+
+#ifndef __ieee754_atanh
libm_alias_finite (__ieee754_atanh, __atanh)
+#endif
static const double one = 1.0, shuge = 1.0e307;
+#ifndef SECTION
+# define SECTION
+#endif
+
+SECTION
double
__ieee754_sinh (double x)
{
/* |x| > overflowthresold, sinh(x) overflow */
return math_narrow_eval (x * shuge);
}
+
+#ifndef __ieee754_sinh
libm_alias_finite (__ieee754_sinh, __sinh)
+#endif
extern const struct exp_data
{
double invln2N;
- double shift;
double negln2hiN;
double negln2loN;
double poly[4]; /* Last four coefficients. */
+ double shift;
+
double exp2_shift;
double exp2_poly[EXP2_POLY_ORDER];
- double invlog10_2N;
+
double neglog10_2hiN;
double neglog10_2loN;
double exp10_poly[5];
+ double invlog10_2N;
uint64_t tab[2*(1 << EXP_TABLE_BITS)];
} __exp_data attribute_hidden;
/* Reset rounding mode and test for inexact simultaneously. */
int j = libc_feupdateenv_test (&env, FE_INEXACT) != 0;
+ /* Ensure value of a1 + u.d is not reused. */
+ a1 = math_opt_barrier (a1);
+
if (__glibc_likely (adjust == 0))
{
if ((u.ieee.mantissa1 & 1) == 0 && u.ieee.exponent != 0x7ff)
static const double one = 1.0, two = 2.0, tiny = 1.0e-300;
+#ifndef SECTION
+# define SECTION
+#endif
+
+SECTION
double
__tanh (double x)
{
{ /* |x| <= 0x1.250bfep-11 */
if (__glibc_unlikely (ux < 0x66000000u)) /* |x| < 0x1p-24 */
return fmaf (x, fabsf (x), x);
- if (__glibc_unlikely (st.uarg == asuint (ux)))
+ if (__glibc_unlikely (st.uarg == ux))
{
float sgn = copysignf (1.0f, x);
return sgn * st.rh + sgn * st.rl;
};
static const double tl[] =
{
- 0x1.562ec497ef351p-43, 0x1.b9476892ea99cp-8, 0x1.b5e909c959eecp-7,
+ -0x1.562ec497ef351p-43, 0x1.b9476892ea99cp-8, 0x1.b5e909c959eecp-7,
0x1.45f4f59ec84fp-6, 0x1.af5f92cbcf2aap-6, 0x1.0ba01a6069052p-5,
0x1.3ed119b99dd41p-5, 0x1.714834298a088p-5, 0x1.a30a9d98309c1p-5,
0x1.d41d51266b9d9p-5, 0x1.02428c0f62dfcp-4, 0x1.1a23444eea521p-4,
uint32_t sgn = t >> 31;
for (int j = 0; j < array_length (st); j++)
{
- if (__glibc_unlikely (asfloat (st[j].arg) == ax))
+ if (__glibc_unlikely (asuint (st[j].arg) == ax))
{
if (sgn)
return -st[j].rh - st[j].rl;
so as to mprotect it. */
int
-_dl_make_stack_executable (void **stack_endp)
+_dl_make_stack_executable (const void *stack_endp)
{
/* Challenge the caller. */
- if (__builtin_expect (*stack_endp != __libc_stack_end, 0))
+ if (__glibc_unlikely (stack_endp != __libc_stack_end))
return EPERM;
- *stack_endp = NULL;
#if IS_IN (rtld)
if (__mprotect ((void *)_dl_hurd_data->stack_base, _dl_hurd_data->stack_size,
unsigned int __g1_orig_size;
unsigned int __wrefs;
unsigned int __g_signals[2];
+ unsigned int __unused_initialized_1;
+ unsigned int __unused_initialized_2;
};
typedef unsigned int __tss_t;
#include <tls.h>
#include <rseq-internal.h>
#include <thread_pointer.h>
+#include <dl-symbol-redir-ifunc.h>
#define TUNABLE_NAMESPACE pthread
#include <dl-tunables.h>
/* Conditional variable handling. */
-#define PTHREAD_COND_INITIALIZER { { {0}, {0}, {0, 0}, 0, 0, {0, 0} } }
+#define PTHREAD_COND_INITIALIZER { { {0}, {0}, {0, 0}, 0, 0, {0, 0}, 0, 0 } }
/* Cleanup buffers */
tst-cancel28 \
tst-cancel29 \
tst-cancel30 \
+ tst-cancel32 \
tst-cleanup0 \
tst-cleanup1 \
tst-cleanup2 \
tst-spin4 \
tst-spin5 \
tst-stack1 \
+ tst-stack2 \
tst-stdio1 \
tst-stdio2 \
tst-thrd-detach \
tst-atfork4mod \
tst-create1mod \
tst-fini1mod \
+ tst-stack2-mod \
tst-tls4moda \
tst-tls4modb \
# modules-names
$(objpfx)tst-create1: $(shared-thread-library)
$(objpfx)tst-create1.out: $(objpfx)tst-create1mod.so
+$(objpfx)tst-stack2.out: $(objpfx)tst-stack2-mod.so
+$(objpfx)tst-stack2-mod.so: $(shared-thread-library)
+LDFLAGS-tst-stack2-mod.so = -Wl,-z,execstack
+ifeq ($(have-no-error-execstack),yes)
+LDFLAGS-tst-stack2-mod.so += -Wl,--no-error-execstack
+endif
+tst-stack2-ENV = GLIBC_TUNABLES=glibc.rtld.execstack=2
+
endif
--- /dev/null
+/* Check if pthread_setcanceltype disables asynchronous cancellation
+ once cancellation happens (BZ 32782)
+
+ Copyright (C) 2025 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+/* The pthread_setcanceltype is a cancellation entrypoint, and if
+ asynchronous is enabled and the cancellation starts (on the second
+ pthread_setcanceltype call), the asynchronous should not restart
+ the process. */
+
+#include <support/xthread.h>
+
+#define NITER 1000
+#define NTHREADS 8
+
+static void
+tf_cleanup (void *arg)
+{
+}
+
+static void *
+tf (void *closure)
+{
+ pthread_cleanup_push (tf_cleanup, NULL);
+ for (;;)
+ {
+ /* The only possible failure for pthread_setcanceltype is an
+ invalid state type. */
+ pthread_setcanceltype (PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
+ pthread_setcanceltype (PTHREAD_CANCEL_DEFERRED, NULL);
+ }
+ pthread_cleanup_pop (1);
+
+ return NULL;
+}
+
+static void
+poll_threads (int nthreads)
+{
+ pthread_t thr[nthreads];
+ for (int i = 0; i < nthreads; i++)
+ thr[i] = xpthread_create (NULL, tf, NULL);
+ for (int i = 0; i < nthreads; i++)
+ xpthread_cancel (thr[i]);
+ for (int i = 0; i < nthreads; i++)
+ xpthread_join (thr[i]);
+}
+
+static int
+do_test (void)
+{
+ for (int k = 0; k < NITER; k++)
+ poll_threads (NTHREADS);
+
+ return 0;
+}
+
+#include <support/test-driver.c>
--- /dev/null
+/* Check if pthread_getattr_np works within modules with non-exectuble
+ stacks (BZ 32897).
+ Copyright (C) 2025 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <pthread.h>
+
+bool init_result;
+
+void
+__attribute__ ((constructor))
+init (void)
+{
+ pthread_t me = pthread_self ();
+ pthread_attr_t attr;
+ init_result = pthread_getattr_np (me, &attr) == 0;
+}
+
+int
+mod_func (void)
+{
+ pthread_t me = pthread_self ();
+ pthread_attr_t attr;
+ return pthread_getattr_np (me, &attr);
+}
--- /dev/null
+/* Check if pthread_getattr_np works within modules with non-exectuble
+ stacks (BZ 32897).
+ Copyright (C) 2025 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <pthread.h>
+#include <stdbool.h>
+#include <support/xdlfcn.h>
+#include <support/check.h>
+
+static int
+do_test (void)
+{
+ {
+ pthread_t me = pthread_self ();
+ pthread_attr_t attr;
+ TEST_COMPARE (pthread_getattr_np (me, &attr), 0);
+ }
+
+ void *h = xdlopen ("tst-stack2-mod.so", RTLD_NOW);
+
+ bool *init_result = xdlsym (h, "init_result");
+ TEST_COMPARE (*init_result, true);
+
+ int (*mod_func)(void) = xdlsym (h, "mod_func");
+ TEST_COMPARE (mod_func (), 0);
+
+ xdlclose (h);
+
+ return 0;
+}
+
+#include <support/test-driver.c>
gotplt[1] = (ElfW(Addr)) l;
}
- if (l->l_type == lt_executable && l->l_relocated)
+#ifdef SHARED
+ if (l->l_type == lt_executable)
{
/* The __global_pointer$ may not be defined by the linker if the
$gp register does not be used to access the global variable
_dl_lookup_symbol_x ("__global_pointer$", l, &ref,
l->l_scope, NULL, 0, 0, NULL);
if (ref)
- asm (
- "mv gp, %0\n"
- :
- : "r" (ref->st_value)
- );
+ asm (
+ "mv gp, %0\n"
+ :
+ : "r" (ref->st_value + l->l_addr)
+ /* Don't use SYMBOL_ADDRESS here since __global_pointer$
+ can be SHN_ABS type, but we need the address relative to
+ PC, not the absolute address. */
+ );
}
+#endif
#endif
return lazy;
}
tests += \
tst-aarch64-pkey \
# tests
-endif
+
+ifneq (no,$(findstring no,$(have-cc-gcs) $(have-test-cc-gcs) $(have-ld-gcs)))
+
+gcs-tests-dynamic = \
+ tst-gcs-disabled \
+ tst-gcs-dlopen-disabled \
+ tst-gcs-dlopen-enforced \
+ tst-gcs-dlopen-optional-off \
+ tst-gcs-dlopen-optional-on \
+ tst-gcs-dlopen-override \
+ tst-gcs-enforced \
+ tst-gcs-enforced-abort \
+ tst-gcs-noreturn \
+ tst-gcs-optional-off \
+ tst-gcs-optional-on \
+ tst-gcs-override \
+ tst-gcs-shared-disabled \
+ tst-gcs-shared-enforced-abort \
+ tst-gcs-shared-optional \
+ tst-gcs-shared-override \
+ # gcs-tests-dynamic
+
+gcs-tests-static = \
+ tst-gcs-disabled-static \
+ tst-gcs-enforced-static \
+ tst-gcs-enforced-static-abort \
+ tst-gcs-optional-static-off \
+ tst-gcs-optional-static-on \
+ tst-gcs-override-static \
+ # gcs-tests-static
+
+tests += \
+ $(gcs-tests-dynamic) \
+ $(gcs-tests-static) \
+ # tests
+
+tests-static += \
+ $(gcs-tests-static) \
+ # tests-static
+
+define run-gcs-abort-test
+ $(test-wrapper-env) $(run-program-env) \
+ $(tst-gcs-$*-abort-ENV) $(host-test-program-cmd)
+endef
+
+$(objpfx)tst-gcs-%-abort.out: $(..)sysdeps/unix/sysv/linux/aarch64/tst-gcs-abort.sh \
+ $(objpfx)tst-gcs-%-abort
+ $(SHELL) $< $(common-objpfx) $(test-name) '$(run-gcs-abort-test)'; \
+ $(evaluate-test)
+
+LDFLAGS-tst-gcs-disabled += -Wl,-z gcs=always
+LDFLAGS-tst-gcs-enforced += -Wl,-z gcs=always
+LDFLAGS-tst-gcs-enforced-abort += -Wl,-z gcs=never
+LDFLAGS-tst-gcs-optional-on += -Wl,-z gcs=always
+LDFLAGS-tst-gcs-optional-off += -Wl,-z gcs=never
+LDFLAGS-tst-gcs-override += -Wl,-z gcs=never
+
+LDFLAGS-tst-gcs-disabled-static += -Wl,-z gcs=always
+LDFLAGS-tst-gcs-enforced-static += -Wl,-z gcs=always
+LDFLAGS-tst-gcs-enforced-static-abort += -Wl,-z gcs=never
+LDFLAGS-tst-gcs-optional-static-on += -Wl,-z gcs=always
+LDFLAGS-tst-gcs-optional-static-off += -Wl,-z gcs=never
+LDFLAGS-tst-gcs-override-static += -Wl,-z gcs=never
+
+tst-gcs-disabled-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=0
+tst-gcs-enforced-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=1
+tst-gcs-enforced-abort-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=1
+tst-gcs-optional-on-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=2
+tst-gcs-optional-off-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=2
+tst-gcs-override-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=3
+
+tst-gcs-disabled-static-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=0
+tst-gcs-enforced-static-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=1
+tst-gcs-enforced-static-abort-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=1
+tst-gcs-optional-static-on-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=2
+tst-gcs-optional-static-off-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=2
+tst-gcs-override-static-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=3
+
+# force one of the dependencies to be unmarked
+LDFLAGS-tst-gcs-mod2.so += -Wl,-z gcs=never
+
+LDFLAGS-tst-gcs-shared-disabled = -Wl,-z gcs=always
+LDFLAGS-tst-gcs-shared-enforced-abort = -Wl,-z gcs=always
+LDFLAGS-tst-gcs-shared-optional = -Wl,-z gcs=always
+LDFLAGS-tst-gcs-shared-override = -Wl,-z gcs=always
+
+modules-names += \
+ tst-gcs-mod1 \
+ tst-gcs-mod2 \
+ tst-gcs-mod3 \
+ # modules-names
+
+$(objpfx)tst-gcs-shared-disabled: $(objpfx)tst-gcs-mod1.so $(objpfx)tst-gcs-mod3.so
+$(objpfx)tst-gcs-shared-enforced-abort: $(objpfx)tst-gcs-mod1.so $(objpfx)tst-gcs-mod3.so
+$(objpfx)tst-gcs-shared-optional: $(objpfx)tst-gcs-mod1.so $(objpfx)tst-gcs-mod3.so
+$(objpfx)tst-gcs-shared-override: $(objpfx)tst-gcs-mod1.so $(objpfx)tst-gcs-mod3.so
+$(objpfx)tst-gcs-mod1.so: $(objpfx)tst-gcs-mod2.so
+
+tst-gcs-shared-disabled-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=0
+tst-gcs-shared-enforced-abort-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=1
+tst-gcs-shared-optional-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=2
+tst-gcs-shared-override-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=3
+
+LDFLAGS-tst-gcs-dlopen-disabled = -Wl,-z gcs=always
+LDFLAGS-tst-gcs-dlopen-enforced = -Wl,-z gcs=always
+LDFLAGS-tst-gcs-dlopen-optional-on = -Wl,-z gcs=always
+LDFLAGS-tst-gcs-dlopen-optional-off = -Wl,-z gcs=never
+LDFLAGS-tst-gcs-dlopen-override = -Wl,-z gcs=always
+
+tst-gcs-dlopen-disabled-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=0
+tst-gcs-dlopen-enforced-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=1
+tst-gcs-dlopen-optional-on-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=2
+tst-gcs-dlopen-optional-off-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=2
+tst-gcs-dlopen-override-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=3
+
+$(objpfx)tst-gcs-dlopen-disabled.out: $(objpfx)tst-gcs-mod2.so
+$(objpfx)tst-gcs-dlopen-enforced.out: $(objpfx)tst-gcs-mod2.so
+$(objpfx)tst-gcs-dlopen-optional-on.out: $(objpfx)tst-gcs-mod2.so
+$(objpfx)tst-gcs-dlopen-optional-off.out: $(objpfx)tst-gcs-mod2.so
+$(objpfx)tst-gcs-dlopen-override.out: $(objpfx)tst-gcs-mod2.so
+
+LDFLAGS-tst-gcs-noreturn = -Wl,-z gcs=always
+
+tst-gcs-noreturn-ENV = GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=0
+
+endif # ifeq ($(have-test-cc-gcs),yes)
+
+endif # ifeq ($(subdir),misc)
ifeq ($(subdir),stdlib)
gen-as-const-headers += ucontext_i.sym
if (errno == ENOSYS || errno == EINVAL)
FAIL_UNSUPPORTED
("kernel or CPU does not support memory protection keys");
+ if (errno == ENOSPC)
+ FAIL_UNSUPPORTED
+ ("no keys available or kernel does not support memory"
+ " protection keys");
FAIL_EXIT1 ("pkey_alloc: %m");
}
--- /dev/null
+#!/bin/sh
+# Test wrapper for AArch64 tests for GCS that are expected to abort.
+# Copyright (C) 2025 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <https://www.gnu.org/licenses/>.
+
+objpfx=$1; shift
+tstname=$1; shift
+tstrun=$1; shift
+
+logfile=$objpfx/$tstname.out
+
+rm -vf $logfile
+touch $logfile
+
+${tstrun} 2>> $logfile >> $logfile; status=$?
+
+if test $status -eq 127 \
+ && grep -q -w "not GCS compatible" "$logfile" ; then
+ exit 0
+elif test $status -eq 77; then
+ exit 77
+else
+ echo "unexpected test output or exit status $status"
+ exit 1
+fi
--- /dev/null
+#include "tst-gcs-disabled.c"
--- /dev/null
+#define TEST_GCS_EXPECT_ENABLED 0
+#include "tst-gcs-skeleton.c"
--- /dev/null
+#define TEST_GCS_EXPECT_ENABLED 0
+#define TEST_GCS_EXPECT_DLOPEN 1
+#include "tst-gcs-dlopen.c"
--- /dev/null
+#define TEST_GCS_EXPECT_ENABLED 1
+#define TEST_GCS_EXPECT_DLOPEN 0
+#include "tst-gcs-dlopen.c"
--- /dev/null
+#define TEST_GCS_EXPECT_ENABLED 0
+#define TEST_GCS_EXPECT_DLOPEN 1
+#include "tst-gcs-dlopen.c"
--- /dev/null
+#define TEST_GCS_EXPECT_ENABLED 1
+#define TEST_GCS_EXPECT_DLOPEN 0
+#include "tst-gcs-dlopen.c"
--- /dev/null
+#define TEST_GCS_EXPECT_ENABLED 1
+#define TEST_GCS_EXPECT_DLOPEN 1
+#include "tst-gcs-dlopen.c"
--- /dev/null
+/* AArch64 tests for GCS for dlopen use case.
+ Copyright (C) 2025 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "tst-gcs-helper.h"
+
+#include <dlfcn.h>
+#include <string.h>
+
+static int
+do_test (void)
+{
+ /* Check if GCS could possible by enabled. */
+ if (!(getauxval (AT_HWCAP) & HWCAP_GCS))
+ {
+ puts ("kernel or CPU does not support GCS");
+ return EXIT_UNSUPPORTED;
+ }
+ /* The tst-gcs-mod2.so test library does not have GCS marking. */
+ void *h = dlopen ("tst-gcs-mod2.so", RTLD_NOW);
+ const char *err = dlerror ();
+
+#if TEST_GCS_EXPECT_DLOPEN
+ TEST_VERIFY (h != NULL);
+#else
+ TEST_VERIFY (h == NULL);
+ /* Only accept expected GCS-related errors. */
+ TEST_VERIFY (strstr (err, "not GCS compatible") != NULL);
+#endif
+
+#if TEST_GCS_EXPECT_ENABLED
+ TEST_VERIFY (__check_gcs_status ());
+#else
+ TEST_VERIFY (!__check_gcs_status ());
+#endif
+
+ if (h == NULL)
+ printf ("dlopen error: %s\n", err);
+ else
+ {
+ puts ("library loaded normally");
+ dlclose (h);
+ }
+
+ return 0;
+}
+
+#include <support/test-driver.c>
--- /dev/null
+#define TEST_GCS_EXPECT_ENABLED 1
+#include "tst-gcs-skeleton.c"
--- /dev/null
+#include "tst-gcs-enforced-abort.c"
--- /dev/null
+#include "tst-gcs-enforced.c"
--- /dev/null
+#define TEST_GCS_EXPECT_ENABLED 1
+#include "tst-gcs-skeleton.c"
--- /dev/null
+/* AArch64 tests for GCS.
+ Copyright (C) 2025 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#ifndef TST_GCS_HELPER_H
+#define TST_GCS_HELPER_H
+
+#include <support/check.h>
+#include <support/support.h>
+#include <support/test-driver.h>
+
+#include <stdio.h>
+#include <sys/auxv.h>
+
+static bool __check_gcs_status (void)
+{
+ register unsigned long x16 asm ("x16");
+ asm volatile (
+ "mov x16, #1 /* _CHKFEAT_GCS */\n"
+ "hint 40 /* CHKFEAT_X16 */\n"
+ : "=r" (x16));
+ return x16 ^ 1;
+}
+
+#endif // POINTER_GUARD_H
--- /dev/null
+/* DSO for testing GCS.
+ Copyright (C) 2025 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <stdio.h>
+
+int fun2 (void); // tst-gcs-mod2.c
+
+int fun1 (void)
+{
+ puts ("called function fun1");
+ return fun2 ();
+}
--- /dev/null
+/* DSO for testing GCS.
+ Copyright (C) 2025 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <stdio.h>
+
+int fun2 (void)
+{
+ puts ("called function fun2");
+ return 0;
+}
--- /dev/null
+/* DSO for testing GCS.
+ Copyright (C) 2025 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <stdio.h>
+
+int fun3 (void)
+{
+ puts ("called function fun3");
+ return 0;
+}
--- /dev/null
+/* AArch64 test for GCS abort when returning to non-GCS address.
+ Copyright (C) 2025 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "tst-gcs-helper.h"
+
+#include <sys/prctl.h>
+#include <stdlib.h>
+
+#include <support/xsignal.h>
+
+# ifndef PR_SET_SHADOW_STACK_STATUS
+# define PR_SET_SHADOW_STACK_STATUS 75
+# define PR_SHADOW_STACK_ENABLE (1UL << 0)
+# endif
+
+static void
+run_with_gcs (void)
+{
+ int r = prctl (PR_SET_SHADOW_STACK_STATUS, PR_SHADOW_STACK_ENABLE, 0, 0, 0);
+ /* Syscall should succeed. */
+ TEST_VERIFY (r == 0);
+ bool gcs_enabled = __check_gcs_status ();
+ /* Now GCS should be enabled. */
+ TEST_VERIFY (gcs_enabled);
+ printf ("GCS is %s\n", gcs_enabled ? "enabled" : "disabled");
+}
+
+static struct _aarch64_ctx *
+extension (void *p)
+{
+ return p;
+}
+
+#ifndef GCS_MAGIC
+#define GCS_MAGIC 0x47435300
+#endif
+
+static void
+handler (int sig, siginfo_t *si, void *ctx)
+{
+ TEST_VERIFY (sig == SIGSEGV);
+ ucontext_t *uc = ctx;
+ void *p = uc->uc_mcontext.__reserved;
+ if (extension (p)->magic == FPSIMD_MAGIC)
+ p = (char *)p + extension (p)->size;
+ if (extension (p)->magic == GCS_MAGIC)
+ {
+ struct { uint64_t x, gcspr, y, z; } *q = p;
+ printf ("GCS pointer: %016lx\n", q->gcspr);
+ exit (0);
+ }
+ else
+ exit (3);
+}
+
+static int
+do_test (void)
+{
+ /* Check if GCS could possible by enabled. */
+ if (!(getauxval (AT_HWCAP) & HWCAP_GCS))
+ {
+ puts ("kernel or CPU does not support GCS");
+ return EXIT_UNSUPPORTED;
+ }
+ bool gcs_enabled = __check_gcs_status ();
+ /* This test should be rung with GCS initially disabled. */
+ TEST_VERIFY (!gcs_enabled);
+
+ /* We can't use EXPECTED_SIGNAL because of cases when
+ this test runs on a system that does not support GCS
+ which is being detected at runtime. */
+ struct sigaction sigact;
+ sigemptyset (&sigact.sa_mask);
+ sigact.sa_flags = 0;
+ sigact.sa_flags = sigact.sa_flags | SA_SIGINFO;
+ sigact.sa_sigaction = handler;
+ xsigaction (SIGSEGV, &sigact, NULL);
+
+ run_with_gcs ();
+ /* If we reached this point, then something went wrong.
+ Returning from a function that enabled GCS should result in
+ SIGSEGV that we catch with the handler set up above. */
+ return 2;
+}
+
+#include <support/test-driver.c>
--- /dev/null
+#define TEST_GCS_EXPECT_ENABLED 0
+#include "tst-gcs-skeleton.c"
--- /dev/null
+#define TEST_GCS_EXPECT_ENABLED 1
+#include "tst-gcs-skeleton.c"
--- /dev/null
+#include "tst-gcs-optional-off.c"
--- /dev/null
+#include "tst-gcs-optional-on.c"
--- /dev/null
+#include "tst-gcs-override.c"
--- /dev/null
+#define TEST_GCS_EXPECT_ENABLED 1
+#include "tst-gcs-skeleton.c"
--- /dev/null
+#define TEST_GCS_EXPECT_ENABLED 0
+#include "tst-gcs-shared.c"
--- /dev/null
+#define TEST_GCS_EXPECT_ENABLED 1
+#include "tst-gcs-shared.c"
--- /dev/null
+#define TEST_GCS_EXPECT_ENABLED 0
+#include "tst-gcs-shared.c"
--- /dev/null
+#define TEST_GCS_EXPECT_ENABLED 1
+#include "tst-gcs-shared.c"
--- /dev/null
+/* AArch64 tests for GCS.
+ Copyright (C) 2025 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "tst-gcs-helper.h"
+
+int fun1 (void); // tst-gcs-mod1.c
+int fun3 (void); // tst-gcs-mod3.c
+
+static int
+do_test (void)
+{
+ /* Check if GCS could possible by enabled. */
+ if (!(getauxval (AT_HWCAP) & HWCAP_GCS))
+ {
+ puts ("kernel or CPU does not support GCS");
+ return EXIT_UNSUPPORTED;
+ }
+#if TEST_GCS_EXPECT_ENABLED
+ TEST_VERIFY (__check_gcs_status ());
+#else
+ TEST_VERIFY (!__check_gcs_status ());
+#endif
+ return fun1 () + fun3 ();
+}
+
+#include <support/test-driver.c>
--- /dev/null
+/* AArch64 tests for GCS.
+ Copyright (C) 2025 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "tst-gcs-helper.h"
+
+static int
+do_test (void)
+{
+ /* Check if GCS could possible by enabled. */
+ if (!(getauxval (AT_HWCAP) & HWCAP_GCS))
+ {
+ puts ("kernel or CPU does not support GCS");
+ return EXIT_UNSUPPORTED;
+ }
+ bool gcs_enabled = __check_gcs_status ();
+ if (gcs_enabled)
+ puts ("GCS enabled");
+ else
+ puts ("GCS not enabled");
+#if TEST_GCS_EXPECT_ENABLED
+ TEST_VERIFY (gcs_enabled);
+#else
+ TEST_VERIFY (!gcs_enabled);
+#endif
+ return 0;
+}
+
+#include <support/test-driver.c>
store it in *ATTR. */
int sched_getattr (pid_t tid, struct sched_attr *attr, unsigned int size,
unsigned int flags)
- __THROW __nonnull ((2)) __attr_access ((__write_only__, 2, 3));
+ __THROW __nonnull ((2));
#endif
#include <ldsodefs.h>
int
-_dl_make_stack_executable (void **stack_endp)
+_dl_make_stack_executable (const void *stack_endp)
{
/* This gives us the highest/lowest page that needs to be changed. */
- uintptr_t page = ((uintptr_t) *stack_endp
+ uintptr_t page = ((uintptr_t) stack_endp
& -(intptr_t) GLRO(dl_pagesize));
if (__mprotect ((void *) page, GLRO(dl_pagesize),
) != 0)
return errno;
- /* Clear the address. */
- *stack_endp = NULL;
-
/* Remember that we changed the permission. */
GL(dl_stack_flags) |= PF_X;
if (size < RSEQ_AREA_SIZE_INITIAL)
size = RSEQ_AREA_SIZE_INITIAL;
- /* Initialize the rseq fields that are read by the kernel on
- registration, there is no guarantee that struct pthread is
- cleared on all architectures. */
+ /* Initialize the whole rseq area to zero prior to registration. */
+ memset (RSEQ_SELF (), 0, size);
+
+ /* Set the cpu_id field to RSEQ_CPU_ID_UNINITIALIZED, this is checked by
+ the kernel at registration when CONFIG_DEBUG_RSEQ is enabled. */
RSEQ_SETMEM (cpu_id, RSEQ_CPU_ID_UNINITIALIZED);
- RSEQ_SETMEM (cpu_id_start, 0);
- RSEQ_SETMEM (rseq_cs, 0);
- RSEQ_SETMEM (flags, 0);
int ret = INTERNAL_SYSCALL_CALL (rseq, RSEQ_SELF (), size, 0, RSEQ_SIG);
if (!INTERNAL_SYSCALL_ERROR_P (ret))
tst-cpu-features-supports-static \
tst-get-cpu-features \
tst-get-cpu-features-static \
+ tst-gnu2-tls2-x86-noxsave \
+ tst-gnu2-tls2-x86-noxsavec \
+ tst-gnu2-tls2-x86-noxsavexsavec \
tst-hwcap-tunables \
# tests
tests-static += \
CFLAGS-tst-gnu2-tls2mod0.c += -msse2 -mtune=haswell
CFLAGS-tst-gnu2-tls2mod1.c += -msse2 -mtune=haswell
CFLAGS-tst-gnu2-tls2mod2.c += -msse2 -mtune=haswell
+
+LDFLAGS-tst-gnu2-tls2-x86-noxsave += -Wl,-z,lazy
+LDFLAGS-tst-gnu2-tls2-x86-noxsavec += -Wl,-z,lazy
+LDFLAGS-tst-gnu2-tls2-x86-noxsavexsavec += -Wl,-z,lazy
+
+# Test for bug 32810: incorrect XSAVE state size if XSAVEC is disabled
+# via tunable.
+tst-gnu2-tls2-x86-noxsave-ENV = GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVE
+tst-gnu2-tls2-x86-noxsavec-ENV = GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC
+tst-gnu2-tls2-x86-noxsavexsavec-ENV = GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVE,-XSAVEC
+$(objpfx)tst-gnu2-tls2-x86-noxsave: $(shared-thread-library)
+$(objpfx)tst-gnu2-tls2-x86-noxsavec: $(shared-thread-library)
+$(objpfx)tst-gnu2-tls2-x86-noxsavexsavec: $(shared-thread-library)
+$(objpfx)tst-gnu2-tls2-x86-noxsave.out \
+$(objpfx)tst-gnu2-tls2-x86-noxsavec.out \
+$(objpfx)tst-gnu2-tls2-x86-noxsavexsavec.out: \
+ $(objpfx)tst-gnu2-tls2mod0.so \
+ $(objpfx)tst-gnu2-tls2mod1.so \
+ $(objpfx)tst-gnu2-tls2mod2.so
endif
ifeq ($(subdir),math)
floating-point type with the IEEE 754 binary128 format, and this
glibc includes corresponding *f128 interfaces for it. The required
libgcc support was added some time after the basic compiler
- support, for x86_64 and x86. */
+ support, for x86_64 and x86. Intel SYCL compiler doesn't support
+ _Float128: https://github.com/intel/llvm/issues/16903
+ */
#if (defined __x86_64__ \
? __GNUC_PREREQ (4, 3) \
: (defined __GNU__ ? __GNUC_PREREQ (4, 5) : __GNUC_PREREQ (4, 4))) \
- || __glibc_clang_prereq (3, 4)
+ || (__glibc_clang_prereq (3, 9) \
+ && (!defined __INTEL_LLVM_COMPILER \
+ || !defined SYCL_LANGUAGE_VERSION))
# define __HAVE_FLOAT128 1
#else
# define __HAVE_FLOAT128 0
/* The type _Float128 exists only since GCC 7.0. */
# if !__GNUC_PREREQ (7, 0) \
|| (defined __cplusplus && !__GNUC_PREREQ (13, 0)) \
- || __glibc_clang_prereq (3, 4)
+ || __glibc_clang_prereq (3, 9)
typedef __float128 _Float128;
# endif
#include <dl-cacheinfo.h>
#include <dl-minsigstacksize.h>
#include <dl-hwcap2.h>
+#include <gcc-macros.h>
extern void TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *)
attribute_hidden;
# include <dl-cet.h>
#endif
+unsigned long int _dl_x86_features_tlsdesc_state_size;
+
static void
update_active (struct cpu_features *cpu_features)
{
= xsave_state_full_size;
cpu_features->xsave_state_full_size
= xsave_state_full_size;
+ _dl_x86_features_tlsdesc_state_size = xsave_state_full_size;
/* Check if XSAVEC is available. */
if (CPU_FEATURES_CPU_P (cpu_features, XSAVEC))
{
- unsigned int xstate_comp_offsets[32];
- unsigned int xstate_comp_sizes[32];
-#ifdef __x86_64__
- unsigned int xstate_amx_comp_offsets[32];
- unsigned int xstate_amx_comp_sizes[32];
- unsigned int amx_ecx;
-#endif
+ unsigned int xstate_comp_offsets[X86_XSTATE_MAX_ID + 1];
+ unsigned int xstate_comp_sizes[X86_XSTATE_MAX_ID + 1];
unsigned int i;
xstate_comp_offsets[0] = 0;
xstate_comp_offsets[2] = 576;
xstate_comp_sizes[0] = 160;
xstate_comp_sizes[1] = 256;
-#ifdef __x86_64__
- xstate_amx_comp_offsets[0] = 0;
- xstate_amx_comp_offsets[1] = 160;
- xstate_amx_comp_offsets[2] = 576;
- xstate_amx_comp_sizes[0] = 160;
- xstate_amx_comp_sizes[1] = 256;
-#endif
- for (i = 2; i < 32; i++)
+ for (i = 2; i <= X86_XSTATE_MAX_ID; i++)
{
if ((FULL_STATE_SAVE_MASK & (1 << i)) != 0)
{
__cpuid_count (0xd, i, eax, ebx, ecx, edx);
-#ifdef __x86_64__
- /* Include this in xsave_state_full_size. */
- amx_ecx = ecx;
- xstate_amx_comp_sizes[i] = eax;
- if ((AMX_STATE_SAVE_MASK & (1 << i)) != 0)
- {
- /* Exclude this from xsave_state_size. */
- ecx = 0;
- xstate_comp_sizes[i] = 0;
- }
- else
-#endif
- xstate_comp_sizes[i] = eax;
+ xstate_comp_sizes[i] = eax;
}
else
{
-#ifdef __x86_64__
- amx_ecx = 0;
- xstate_amx_comp_sizes[i] = 0;
-#endif
ecx = 0;
xstate_comp_sizes[i] = 0;
}
{
xstate_comp_offsets[i]
= (xstate_comp_offsets[i - 1]
- + xstate_comp_sizes[i -1]);
+ + xstate_comp_sizes[i - 1]);
if ((ecx & (1 << 1)) != 0)
xstate_comp_offsets[i]
= ALIGN_UP (xstate_comp_offsets[i], 64);
-#ifdef __x86_64__
- xstate_amx_comp_offsets[i]
- = (xstate_amx_comp_offsets[i - 1]
- + xstate_amx_comp_sizes[i - 1]);
- if ((amx_ecx & (1 << 1)) != 0)
- xstate_amx_comp_offsets[i]
- = ALIGN_UP (xstate_amx_comp_offsets[i],
- 64);
-#endif
}
}
/* Use XSAVEC. */
unsigned int size
- = xstate_comp_offsets[31] + xstate_comp_sizes[31];
+ = (xstate_comp_offsets[X86_XSTATE_MAX_ID]
+ + xstate_comp_sizes[X86_XSTATE_MAX_ID]);
if (size)
{
+ size = ALIGN_UP (size + TLSDESC_CALL_REGISTER_SAVE_AREA,
+ 64);
#ifdef __x86_64__
- unsigned int amx_size
- = (xstate_amx_comp_offsets[31]
- + xstate_amx_comp_sizes[31]);
- amx_size
- = ALIGN_UP ((amx_size
- + TLSDESC_CALL_REGISTER_SAVE_AREA),
- 64);
- /* Set xsave_state_full_size to the compact AMX
- state size for XSAVEC. NB: xsave_state_full_size
- is only used in _dl_tlsdesc_dynamic_xsave and
- _dl_tlsdesc_dynamic_xsavec. */
- cpu_features->xsave_state_full_size = amx_size;
+ _dl_x86_features_tlsdesc_state_size = size;
+ /* Exclude the AMX space from the start of TILECFG
+ space to the end of TILEDATA space. If CPU
+ doesn't support AMX, TILECFG offset is the same
+ as TILEDATA + 1 offset. Otherwise, they are
+ multiples of 64. */
+ size -= (xstate_comp_offsets[X86_XSTATE_TILEDATA_ID + 1]
+ - xstate_comp_offsets[X86_XSTATE_TILECFG_ID]);
#endif
- cpu_features->xsave_state_size
- = ALIGN_UP (size + TLSDESC_CALL_REGISTER_SAVE_AREA,
- 64);
+ cpu_features->xsave_state_size = size;
CPU_FEATURE_SET (cpu_features, XSAVEC);
}
}
"Incorrect index_arch_Fast_Unaligned_Load");
-/* Intel Family-6 microarch list. */
-enum
+/* Intel microarch list. */
+enum intel_microarch
{
/* Atom processors. */
INTEL_ATOM_BONNELL,
INTEL_ATOM_GOLDMONT,
INTEL_ATOM_GOLDMONT_PLUS,
INTEL_ATOM_SIERRAFOREST,
+ INTEL_ATOM_CLEARWATERFOREST,
INTEL_ATOM_GRANDRIDGE,
INTEL_ATOM_TREMONT,
INTEL_BIGCORE_METEORLAKE,
INTEL_BIGCORE_LUNARLAKE,
INTEL_BIGCORE_ARROWLAKE,
+ INTEL_BIGCORE_PANTHERLAKE,
INTEL_BIGCORE_GRANITERAPIDS,
+ INTEL_BIGCORE_DIAMONDRAPIDS,
/* Mixed (bigcore + atom SOC). */
INTEL_MIXED_LAKEFIELD,
INTEL_UNKNOWN,
};
-static unsigned int
+static enum intel_microarch
intel_get_fam6_microarch (unsigned int model,
__attribute__ ((unused)) unsigned int stepping)
{
return INTEL_ATOM_GOLDMONT_PLUS;
case 0xAF:
return INTEL_ATOM_SIERRAFOREST;
+ case 0xDD:
+ return INTEL_ATOM_CLEARWATERFOREST;
case 0xB6:
return INTEL_ATOM_GRANDRIDGE;
case 0x86:
return INTEL_BIGCORE_METEORLAKE;
case 0xbd:
return INTEL_BIGCORE_LUNARLAKE;
+ case 0xb5:
+ case 0xc5:
case 0xc6:
return INTEL_BIGCORE_ARROWLAKE;
+ case 0xCC:
+ return INTEL_BIGCORE_PANTHERLAKE;
case 0xAD:
case 0xAE:
return INTEL_BIGCORE_GRANITERAPIDS;
cpu_features->preferred[index_arch_Avoid_Non_Temporal_Memset]
&= ~bit_arch_Avoid_Non_Temporal_Memset;
+ enum intel_microarch microarch = INTEL_UNKNOWN;
if (family == 0x06)
{
model += extended_model;
- unsigned int microarch
- = intel_get_fam6_microarch (model, stepping);
+ microarch = intel_get_fam6_microarch (model, stepping);
+ /* Disable TSX on some processors to avoid TSX on kernels that
+ weren't updated with the latest microcode package (which
+ disables broken feature by default). */
switch (microarch)
{
- /* Atom / KNL tuning. */
- case INTEL_ATOM_BONNELL:
- /* BSF is slow on Bonnell. */
- cpu_features->preferred[index_arch_Slow_BSF]
- |= bit_arch_Slow_BSF;
- break;
-
- /* Unaligned load versions are faster than SSSE3
- on Airmont, Silvermont, Goldmont, and Goldmont Plus. */
- case INTEL_ATOM_AIRMONT:
- case INTEL_ATOM_SILVERMONT:
- case INTEL_ATOM_GOLDMONT:
- case INTEL_ATOM_GOLDMONT_PLUS:
-
- /* Knights Landing. Enable Silvermont optimizations. */
- case INTEL_KNIGHTS_LANDING:
-
- cpu_features->preferred[index_arch_Fast_Unaligned_Load]
- |= (bit_arch_Fast_Unaligned_Load
- | bit_arch_Fast_Unaligned_Copy
- | bit_arch_Prefer_PMINUB_for_stringop
- | bit_arch_Slow_SSE4_2);
- break;
-
- case INTEL_ATOM_TREMONT:
- /* Enable rep string instructions, unaligned load, unaligned
- copy, pminub and avoid SSE 4.2 on Tremont. */
- cpu_features->preferred[index_arch_Fast_Rep_String]
- |= (bit_arch_Fast_Rep_String
- | bit_arch_Fast_Unaligned_Load
- | bit_arch_Fast_Unaligned_Copy
- | bit_arch_Prefer_PMINUB_for_stringop
- | bit_arch_Slow_SSE4_2);
- break;
-
- /*
- Default tuned Knights microarch.
- case INTEL_KNIGHTS_MILL:
- */
-
- /*
- Default tuned atom microarch.
- case INTEL_ATOM_SIERRAFOREST:
- case INTEL_ATOM_GRANDRIDGE:
- */
-
- /* Bigcore/Default Tuning. */
default:
- default_tuning:
- /* Unknown family 0x06 processors. Assuming this is one
- of Core i3/i5/i7 processors if AVX is available. */
- if (!CPU_FEATURES_CPU_P (cpu_features, AVX))
- break;
-
- enable_modern_features:
- /* Rep string instructions, unaligned load, unaligned copy,
- and pminub are fast on Intel Core i3, i5 and i7. */
- cpu_features->preferred[index_arch_Fast_Rep_String]
- |= (bit_arch_Fast_Rep_String
- | bit_arch_Fast_Unaligned_Load
- | bit_arch_Fast_Unaligned_Copy
- | bit_arch_Prefer_PMINUB_for_stringop);
break;
- case INTEL_BIGCORE_NEHALEM:
- case INTEL_BIGCORE_WESTMERE:
- /* Older CPUs prefer non-temporal stores at lower threshold. */
- cpu_features->cachesize_non_temporal_divisor = 8;
- goto enable_modern_features;
-
- /* Older Bigcore microarch (smaller non-temporal store
- threshold). */
- case INTEL_BIGCORE_SANDYBRIDGE:
- case INTEL_BIGCORE_IVYBRIDGE:
- case INTEL_BIGCORE_HASWELL:
- case INTEL_BIGCORE_BROADWELL:
- cpu_features->cachesize_non_temporal_divisor = 8;
- goto default_tuning;
-
- /* Newer Bigcore microarch (larger non-temporal store
- threshold). */
- case INTEL_BIGCORE_SKYLAKE_AVX512:
- case INTEL_BIGCORE_CANNONLAKE:
- /* Benchmarks indicate non-temporal memset is not
- necessarily profitable on SKX (and in some cases much
- worse). This is likely unique to SKX due its it unique
- mesh interconnect (not present on ICX or BWD). Disable
- non-temporal on all Skylake servers. */
- cpu_features->preferred[index_arch_Avoid_Non_Temporal_Memset]
- |= bit_arch_Avoid_Non_Temporal_Memset;
- /* fallthrough */
- case INTEL_BIGCORE_COMETLAKE:
- case INTEL_BIGCORE_SKYLAKE:
- case INTEL_BIGCORE_KABYLAKE:
- case INTEL_BIGCORE_ICELAKE:
- case INTEL_BIGCORE_TIGERLAKE:
- case INTEL_BIGCORE_ROCKETLAKE:
- case INTEL_BIGCORE_RAPTORLAKE:
- case INTEL_BIGCORE_METEORLAKE:
- case INTEL_BIGCORE_LUNARLAKE:
- case INTEL_BIGCORE_ARROWLAKE:
- case INTEL_BIGCORE_SAPPHIRERAPIDS:
- case INTEL_BIGCORE_EMERALDRAPIDS:
- case INTEL_BIGCORE_GRANITERAPIDS:
- cpu_features->cachesize_non_temporal_divisor = 2;
- goto default_tuning;
-
- /* Default tuned Mixed (bigcore + atom SOC). */
- case INTEL_MIXED_LAKEFIELD:
- case INTEL_MIXED_ALDERLAKE:
- cpu_features->cachesize_non_temporal_divisor = 2;
- goto default_tuning;
- }
-
- /* Disable TSX on some processors to avoid TSX on kernels that
- weren't updated with the latest microcode package (which
- disables broken feature by default). */
- switch (microarch)
- {
case INTEL_BIGCORE_SKYLAKE_AVX512:
/* 0x55 (Skylake-avx512) && stepping <= 5 disable TSX. */
if (stepping <= 5)
case INTEL_BIGCORE_KABYLAKE:
/* NB: Although the errata documents that for model == 0x8e
- (kabylake skylake client), only 0xb stepping or lower are
- impacted, the intention of the errata was to disable TSX on
- all client processors on all steppings. Include 0xc
- stepping which is an Intel Core i7-8665U, a client mobile
- processor. */
+ (kabylake skylake client), only 0xb stepping or lower are
+ impacted, the intention of the errata was to disable TSX on
+ all client processors on all steppings. Include 0xc
+ stepping which is an Intel Core i7-8665U, a client mobile
+ processor. */
if (stepping > 0xc)
break;
/* Fall through. */
case INTEL_BIGCORE_SKYLAKE:
- /* Disable Intel TSX and enable RTM_ALWAYS_ABORT for
- processors listed in:
-
-https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html
- */
- disable_tsx:
- CPU_FEATURE_UNSET (cpu_features, HLE);
- CPU_FEATURE_UNSET (cpu_features, RTM);
- CPU_FEATURE_SET (cpu_features, RTM_ALWAYS_ABORT);
- break;
+ /* Disable Intel TSX and enable RTM_ALWAYS_ABORT for
+ processors listed in:
+
+ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html
+ */
+disable_tsx:
+ CPU_FEATURE_UNSET (cpu_features, HLE);
+ CPU_FEATURE_UNSET (cpu_features, RTM);
+ CPU_FEATURE_SET (cpu_features, RTM_ALWAYS_ABORT);
+ break;
case INTEL_BIGCORE_HASWELL:
- /* Xeon E7 v3 (model == 0x3f) with stepping >= 4 has working
- TSX. Haswell also include other model numbers that have
- working TSX. */
- if (model == 0x3f && stepping >= 4)
+ /* Xeon E7 v3 (model == 0x3f) with stepping >= 4 has working
+ TSX. Haswell also includes other model numbers that have
+ working TSX. */
+ if (model == 0x3f && stepping >= 4)
break;
- CPU_FEATURE_UNSET (cpu_features, RTM);
- break;
+ CPU_FEATURE_UNSET (cpu_features, RTM);
+ break;
}
}
+ else if (family == 19)
+ switch (model)
+ {
+ case 0x01:
+ microarch = INTEL_BIGCORE_DIAMONDRAPIDS;
+ break;
+ default:
+ break;
+ }
+
+ switch (microarch)
+ {
+ /* Atom / KNL tuning. */
+ case INTEL_ATOM_BONNELL:
+ /* BSF is slow on Bonnell. */
+ cpu_features->preferred[index_arch_Slow_BSF]
+ |= bit_arch_Slow_BSF;
+ break;
+
+ /* Unaligned load versions are faster than SSSE3
+ on Airmont, Silvermont, Goldmont, and Goldmont Plus. */
+ case INTEL_ATOM_AIRMONT:
+ case INTEL_ATOM_SILVERMONT:
+ case INTEL_ATOM_GOLDMONT:
+ case INTEL_ATOM_GOLDMONT_PLUS:
+
+ /* Knights Landing. Enable Silvermont optimizations. */
+ case INTEL_KNIGHTS_LANDING:
+
+ cpu_features->preferred[index_arch_Fast_Unaligned_Load]
+ |= (bit_arch_Fast_Unaligned_Load
+ | bit_arch_Fast_Unaligned_Copy
+ | bit_arch_Prefer_PMINUB_for_stringop
+ | bit_arch_Slow_SSE4_2);
+ break;
+
+ case INTEL_ATOM_TREMONT:
+ /* Enable rep string instructions, unaligned load, unaligned
+ copy, pminub and avoid SSE 4.2 on Tremont. */
+ cpu_features->preferred[index_arch_Fast_Rep_String]
+ |= (bit_arch_Fast_Rep_String
+ | bit_arch_Fast_Unaligned_Load
+ | bit_arch_Fast_Unaligned_Copy
+ | bit_arch_Prefer_PMINUB_for_stringop
+ | bit_arch_Slow_SSE4_2);
+ break;
+
+ /*
+ Default tuned Knights microarch.
+ case INTEL_KNIGHTS_MILL:
+ */
+
+ /*
+ Default tuned atom microarch.
+ case INTEL_ATOM_SIERRAFOREST:
+ case INTEL_ATOM_GRANDRIDGE:
+ case INTEL_ATOM_CLEARWATERFOREST:
+ */
+
+ /* Bigcore/Default Tuning. */
+ default:
+ default_tuning:
+ /* Unknown Intel processors. Assuming this is one of Core
+ i3/i5/i7 processors if AVX is available. */
+ if (!CPU_FEATURES_CPU_P (cpu_features, AVX))
+ break;
+
+ enable_modern_features:
+ /* Rep string instructions, unaligned load, unaligned copy,
+ and pminub are fast on Intel Core i3, i5 and i7. */
+ cpu_features->preferred[index_arch_Fast_Rep_String]
+ |= (bit_arch_Fast_Rep_String
+ | bit_arch_Fast_Unaligned_Load
+ | bit_arch_Fast_Unaligned_Copy
+ | bit_arch_Prefer_PMINUB_for_stringop);
+ break;
+
+ case INTEL_BIGCORE_NEHALEM:
+ case INTEL_BIGCORE_WESTMERE:
+ /* Older CPUs prefer non-temporal stores at lower threshold. */
+ cpu_features->cachesize_non_temporal_divisor = 8;
+ goto enable_modern_features;
+
+ /* Older Bigcore microarch (smaller non-temporal store
+ threshold). */
+ case INTEL_BIGCORE_SANDYBRIDGE:
+ case INTEL_BIGCORE_IVYBRIDGE:
+ case INTEL_BIGCORE_HASWELL:
+ case INTEL_BIGCORE_BROADWELL:
+ cpu_features->cachesize_non_temporal_divisor = 8;
+ goto default_tuning;
+
+ /* Newer Bigcore microarch (larger non-temporal store
+ threshold). */
+ case INTEL_BIGCORE_SKYLAKE_AVX512:
+ case INTEL_BIGCORE_CANNONLAKE:
+ /* Benchmarks indicate non-temporal memset is not
+ necessarily profitable on SKX (and in some cases much
+ worse). This is likely unique to SKX due to its unique
+ mesh interconnect (not present on ICX or BWD). Disable
+ non-temporal on all Skylake servers. */
+ cpu_features->preferred[index_arch_Avoid_Non_Temporal_Memset]
+ |= bit_arch_Avoid_Non_Temporal_Memset;
+ /* fallthrough */
+ case INTEL_BIGCORE_COMETLAKE:
+ case INTEL_BIGCORE_SKYLAKE:
+ case INTEL_BIGCORE_KABYLAKE:
+ case INTEL_BIGCORE_ICELAKE:
+ case INTEL_BIGCORE_TIGERLAKE:
+ case INTEL_BIGCORE_ROCKETLAKE:
+ case INTEL_BIGCORE_RAPTORLAKE:
+ case INTEL_BIGCORE_METEORLAKE:
+ case INTEL_BIGCORE_LUNARLAKE:
+ case INTEL_BIGCORE_ARROWLAKE:
+ case INTEL_BIGCORE_PANTHERLAKE:
+ case INTEL_BIGCORE_SAPPHIRERAPIDS:
+ case INTEL_BIGCORE_EMERALDRAPIDS:
+ case INTEL_BIGCORE_GRANITERAPIDS:
+ case INTEL_BIGCORE_DIAMONDRAPIDS:
+ /* Default tuned Mixed (bigcore + atom SOC). */
+ case INTEL_MIXED_LAKEFIELD:
+ case INTEL_MIXED_ALDERLAKE:
+ cpu_features->cachesize_non_temporal_divisor = 2;
+ goto default_tuning;
+ }
/* Since AVX512ER is unique to Xeon Phi, set Prefer_No_VZEROUPPER
if AVX512ER is available. Don't use AVX512 to avoid lower CPU
TUNABLE_CALLBACK (set_prefer_map_32bit_exec));
#endif
+ /* Do not add the logic to disable XSAVE/XSAVEC if this glibc build
+ requires AVX and therefore XSAVE or XSAVEC support. */
+#ifndef GCCMACRO__AVX__
bool disable_xsave_features = false;
if (!CPU_FEATURE_USABLE_P (cpu_features, OSXSAVE))
CPU_FEATURE_UNSET (cpu_features, FMA4);
}
+#endif
#ifdef __x86_64__
GLRO(dl_hwcap) = HWCAP_X86_64;
/* Update xsave_state_size to XSAVE state size. */
cpu_features->xsave_state_size
= cpu_features->xsave_state_full_size;
+ _dl_x86_features_tlsdesc_state_size
+ = cpu_features->xsave_state_full_size;
CPU_FEATURE_UNSET (cpu_features, XSAVEC);
}
}
cpu_features->xsave_state_size);
print_cpu_features_value ("xsave_state_full_size",
cpu_features->xsave_state_full_size);
+ print_cpu_features_value ("tlsdesc_state_full_size",
+ _dl_x86_features_tlsdesc_state_size);
print_cpu_features_value ("data_cache_size", cpu_features->data_cache_size);
print_cpu_features_value ("shared_cache_size",
cpu_features->shared_cache_size);
/* The full state size for XSAVE when XSAVEC is disabled by
GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC
-
- and the AMX state size when XSAVEC is available.
*/
unsigned int xsave_state_full_size;
/* Data cache size for use in memory and string routines, typically
#define __get_cpu_features() _dl_x86_get_cpu_features()
+#if IS_IN (rtld) || IS_IN (libc)
+/* XSAVE/XSAVEC state size used by TLS descriptors. Compared to
+ xsave_state_size from struct cpu_features, this includes additional
+ registers. */
+extern unsigned long int _dl_x86_features_tlsdesc_state_size attribute_hidden;
+#endif
+
#if defined (_LIBC) && !IS_IN (nonlib)
/* Unused for x86. */
# define INIT_ARCH()
| (1 << X86_XSTATE_ZMM_ID) \
| (1 << X86_XSTATE_APX_F_ID))
+/* The maximum supported xstate ID. */
+# define X86_XSTATE_MAX_ID X86_XSTATE_APX_F_ID
+
/* AMX state mask. */
# define AMX_STATE_SAVE_MASK \
((1 << X86_XSTATE_TILECFG_ID) | (1 << X86_XSTATE_TILEDATA_ID))
| (1 << X86_XSTATE_K_ID) \
| (1 << X86_XSTATE_ZMM_H_ID))
+/* The maximum supported xstate ID. */
+# define X86_XSTATE_MAX_ID X86_XSTATE_ZMM_H_ID
+
/* States to be included in xsave_state_size. */
# define FULL_STATE_SAVE_MASK STATE_SAVE_MASK
#endif
--- /dev/null
+#include <elf/tst-gnu2-tls2.c>
--- /dev/null
+#include <elf/tst-gnu2-tls2.c>
--- /dev/null
+#include <elf/tst-gnu2-tls2.c>
AVX512-CFLAGS = -mavx512f
CFLAGS-tst-audit10-aux.c += $(AVX512-CFLAGS)
CFLAGS-tst-auditmod10a.c += $(AVX512-CFLAGS)
-CFLAGS-tst-auditmod10b.c += $(AVX512-CFLAGS)
CFLAGS-tst-avx512-aux.c += $(AVX512-CFLAGS)
CFLAGS-tst-avx512mod.c += $(AVX512-CFLAGS)
# endif
#else
/* Allocate stack space of the required size to save the state. */
- sub _rtld_local_ro+RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_FULL_SIZE_OFFSET(%rip), %RSP_LP
+ sub _dl_x86_features_tlsdesc_state_size(%rip), %RSP_LP
#endif
/* Besides rdi and rsi, saved above, save rcx, rdx, r8, r9,
r10 and r11. */
ifeq ($(subdir),math)
CFLAGS-e_asin-fma.c = -mfma -mavx2
CFLAGS-e_atan2-fma.c = -mfma -mavx2
+CFLAGS-e_atanh-fma.c = -mfma -mavx2
CFLAGS-e_exp-fma.c = -mfma -mavx2
CFLAGS-e_log-fma.c = -mfma -mavx2
CFLAGS-e_log2-fma.c = -mfma -mavx2
CFLAGS-e_pow-fma.c = -mfma -mavx2
+CFLAGS-e_sinh-fma.c = -mfma -mavx2
CFLAGS-s_atan-fma.c = -mfma -mavx2
CFLAGS-s_expm1-fma.c = -mfma -mavx2
CFLAGS-s_log1p-fma.c = -mfma -mavx2
CFLAGS-s_sin-fma.c = -mfma -mavx2
CFLAGS-s_tan-fma.c = -mfma -mavx2
+CFLAGS-s_tanh-fma.c = -mfma -mavx2
CFLAGS-s_sincos-fma.c = -mfma -mavx2
CFLAGS-s_exp10m1f-fma.c = -mfma -mavx2
CFLAGS-s_exp2m1f-fma.c = -mfma -mavx2
e_asin-fma \
e_atan2-avx \
e_atan2-fma \
+ e_atanh-fma \
e_exp-avx \
e_exp-fma \
e_exp2f-fma \
e_logf-fma \
e_pow-fma \
e_powf-fma \
+ e_sinh-fma \
s_atan-avx \
s_atan-fma \
s_ceil-sse4_1 \
s_sinf-sse2 \
s_tan-avx \
s_tan-fma \
+ s_tanh-fma \
s_trunc-sse4_1 \
s_truncf-sse4_1 \
# libm-sysdep_routines
--- /dev/null
+#define __ieee754_atanh __ieee754_atanh_fma
+#define __log1p __log1p_fma
+
+#define SECTION __attribute__ ((section (".text.fma")))
+
+#include <sysdeps/ieee754/dbl-64/e_atanh.c>
--- /dev/null
+/* Multiple versions of atanh.
+ Copyright (C) 2025 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <sysdeps/x86/isa-level.h>
+#if MINIMUM_X86_ISA_LEVEL < AVX2_X86_ISA_LEVEL
+# include <libm-alias-finite.h>
+
+extern double __redirect_ieee754_atanh (double);
+
+# define SYMBOL_NAME ieee754_atanh
+# include "ifunc-fma.h"
+
+libc_ifunc_redirected (__redirect_ieee754_atanh, __ieee754_atanh, IFUNC_SELECTOR ());
+
+libm_alias_finite (__ieee754_atanh, __atanh)
+
+# define __ieee754_atanh __ieee754_atanh_sse2
+#endif
+#include <sysdeps/ieee754/dbl-64/e_atanh.c>
--- /dev/null
+#define __ieee754_sinh __ieee754_sinh_fma
+#define __ieee754_exp __ieee754_exp_fma
+#define __expm1 __expm1_fma
+
+/* NB: __expm1 may be expanded to __expm1_fma in the following
+ prototypes. */
+extern long double __expm1l (long double);
+extern long double __expm1f128 (long double);
+
+#define SECTION __attribute__ ((section (".text.fma")))
+
+#include <sysdeps/ieee754/dbl-64/e_sinh.c>
--- /dev/null
+/* Multiple versions of sinh.
+ Copyright (C) 2025 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <sysdeps/x86/isa-level.h>
+#if MINIMUM_X86_ISA_LEVEL < AVX2_X86_ISA_LEVEL
+# include <libm-alias-finite.h>
+
+extern double __redirect_ieee754_sinh (double);
+
+# define SYMBOL_NAME ieee754_sinh
+# include "ifunc-fma.h"
+
+libc_ifunc_redirected (__redirect_ieee754_sinh, __ieee754_sinh,
+ IFUNC_SELECTOR ());
+
+libm_alias_finite (__ieee754_sinh, __sinh)
+
+# define __ieee754_sinh __ieee754_sinh_sse2
+#endif
+#include <sysdeps/ieee754/dbl-64/e_sinh.c>
--- /dev/null
+#define __tanh __tanh_fma
+#define __expm1 __expm1_fma
+
+/* NB: __expm1 may be expanded to __expm1_fma in the following
+ prototypes. */
+extern long double __expm1l (long double);
+extern long double __expm1f128 (long double);
+
+#define SECTION __attribute__ ((section (".text.fma")))
+
+#include <sysdeps/ieee754/dbl-64/s_tanh.c>
--- /dev/null
+/* Multiple versions of tanh.
+ Copyright (C) 2025 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <sysdeps/x86/isa-level.h>
+#if MINIMUM_X86_ISA_LEVEL < AVX2_X86_ISA_LEVEL
+
+extern double __redirect_tanh (double);
+
+# define SYMBOL_NAME tanh
+# include "ifunc-fma.h"
+
+libc_ifunc_redirected (__redirect_tanh, __tanh, IFUNC_SELECTOR ());
+
+# define __tanh __tanh_sse2
+#endif
+#include <sysdeps/ieee754/dbl-64/s_tanh.c>
#include <tst-audit.h>
-#ifdef __AVX512F__
#include <immintrin.h>
#include <cpuid.h>
return (eax & 0xe6) == 0xe6;
}
-#else
-#include <emmintrin.h>
-#endif
+void
+__attribute__ ((target ("avx512f")))
+pltenter_avx512f (La_regs *regs, long int *framesizep)
+{
+ __m512i zero = _mm512_setzero_si512 ();
+ if (memcmp (®s->lr_vector[0], &zero, sizeof (zero))
+ || memcmp (®s->lr_vector[1], &zero, sizeof (zero))
+ || memcmp (®s->lr_vector[2], &zero, sizeof (zero))
+ || memcmp (®s->lr_vector[3], &zero, sizeof (zero))
+ || memcmp (®s->lr_vector[4], &zero, sizeof (zero))
+ || memcmp (®s->lr_vector[5], &zero, sizeof (zero))
+ || memcmp (®s->lr_vector[6], &zero, sizeof (zero))
+ || memcmp (®s->lr_vector[7], &zero, sizeof (zero)))
+ abort ();
+
+ for (int i = 0; i < 8; i++)
+ regs->lr_vector[i].zmm[0]
+ = (La_x86_64_zmm) _mm512_set1_epi64 (i + 1);
+
+ __m512i zmm = _mm512_set1_epi64 (-1);
+ asm volatile ("vmovdqa64 %0, %%zmm0" : : "x" (zmm) : "xmm0" );
+ asm volatile ("vmovdqa64 %0, %%zmm1" : : "x" (zmm) : "xmm1" );
+ asm volatile ("vmovdqa64 %0, %%zmm2" : : "x" (zmm) : "xmm2" );
+ asm volatile ("vmovdqa64 %0, %%zmm3" : : "x" (zmm) : "xmm3" );
+ asm volatile ("vmovdqa64 %0, %%zmm4" : : "x" (zmm) : "xmm4" );
+ asm volatile ("vmovdqa64 %0, %%zmm5" : : "x" (zmm) : "xmm5" );
+ asm volatile ("vmovdqa64 %0, %%zmm6" : : "x" (zmm) : "xmm6" );
+ asm volatile ("vmovdqa64 %0, %%zmm7" : : "x" (zmm) : "xmm7" );
+
+ *framesizep = 1024;
+}
ElfW(Addr)
pltenter (ElfW(Sym) *sym, unsigned int ndx, uintptr_t *refcook,
printf ("pltenter: symname=%s, st_value=%#lx, ndx=%u, flags=%u\n",
symname, (long int) sym->st_value, ndx, *flags);
-#ifdef __AVX512F__
if (check_avx512 () && strcmp (symname, "audit_test") == 0)
+ pltenter_avx512f (regs, framesizep);
+
+ return sym->st_value;
+}
+
+void
+__attribute__ ((target ("avx512f")))
+pltexit_avx512f (const La_regs *inregs, La_retval *outregs)
+{
+ __m512i zero = _mm512_setzero_si512 ();
+ if (memcmp (&outregs->lrv_vector0, &zero, sizeof (zero)))
+ abort ();
+
+ for (int i = 0; i < 8; i++)
{
- __m512i zero = _mm512_setzero_si512 ();
- if (memcmp (®s->lr_vector[0], &zero, sizeof (zero))
- || memcmp (®s->lr_vector[1], &zero, sizeof (zero))
- || memcmp (®s->lr_vector[2], &zero, sizeof (zero))
- || memcmp (®s->lr_vector[3], &zero, sizeof (zero))
- || memcmp (®s->lr_vector[4], &zero, sizeof (zero))
- || memcmp (®s->lr_vector[5], &zero, sizeof (zero))
- || memcmp (®s->lr_vector[6], &zero, sizeof (zero))
- || memcmp (®s->lr_vector[7], &zero, sizeof (zero)))
- abort ();
-
- for (int i = 0; i < 8; i++)
- regs->lr_vector[i].zmm[0]
- = (La_x86_64_zmm) _mm512_set1_epi64 (i + 1);
-
- __m512i zmm = _mm512_set1_epi64 (-1);
- asm volatile ("vmovdqa64 %0, %%zmm0" : : "x" (zmm) : "xmm0" );
- asm volatile ("vmovdqa64 %0, %%zmm1" : : "x" (zmm) : "xmm1" );
- asm volatile ("vmovdqa64 %0, %%zmm2" : : "x" (zmm) : "xmm2" );
- asm volatile ("vmovdqa64 %0, %%zmm3" : : "x" (zmm) : "xmm3" );
- asm volatile ("vmovdqa64 %0, %%zmm4" : : "x" (zmm) : "xmm4" );
- asm volatile ("vmovdqa64 %0, %%zmm5" : : "x" (zmm) : "xmm5" );
- asm volatile ("vmovdqa64 %0, %%zmm6" : : "x" (zmm) : "xmm6" );
- asm volatile ("vmovdqa64 %0, %%zmm7" : : "x" (zmm) : "xmm7" );
-
- *framesizep = 1024;
+ __m512i zmm = _mm512_set1_epi64 (i + 1);
+ if (memcmp (&inregs->lr_vector[i], &zmm, sizeof (zmm)) != 0)
+ abort ();
}
-#endif
- return sym->st_value;
+ outregs->lrv_vector0.zmm[0]
+ = (La_x86_64_zmm) _mm512_set1_epi64 (0x12349876);
+
+ __m512i zmm = _mm512_set1_epi64 (-1);
+ asm volatile ("vmovdqa64 %0, %%zmm0" : : "x" (zmm) : "xmm0" );
+ asm volatile ("vmovdqa64 %0, %%zmm1" : : "x" (zmm) : "xmm1" );
}
unsigned int
symname, (long int) sym->st_value, ndx,
(ptrdiff_t) outregs->int_retval);
-#ifdef __AVX512F__
if (check_avx512 () && strcmp (symname, "audit_test") == 0)
- {
- __m512i zero = _mm512_setzero_si512 ();
- if (memcmp (&outregs->lrv_vector0, &zero, sizeof (zero)))
- abort ();
-
- for (int i = 0; i < 8; i++)
- {
- __m512i zmm = _mm512_set1_epi64 (i + 1);
- if (memcmp (&inregs->lr_vector[i], &zmm, sizeof (zmm)) != 0)
- abort ();
- }
-
- outregs->lrv_vector0.zmm[0]
- = (La_x86_64_zmm) _mm512_set1_epi64 (0x12349876);
-
- __m512i zmm = _mm512_set1_epi64 (-1);
- asm volatile ("vmovdqa64 %0, %%zmm0" : : "x" (zmm) : "xmm0" );
- asm volatile ("vmovdqa64 %0, %%zmm1" : : "x" (zmm) : "xmm1" );
- }
-#endif
+ pltexit_avx512f (inregs, outregs);
return 0;
}