From: Raspbian forward porter
Date: Wed, 20 Sep 2023 14:03:25 +0000 (+0100)
Subject: volk (3.0.0-2+rpi1) trixie-staging; urgency=medium
X-Git-Tag: archive/raspbian/3.1.0-3+rpi1~12
X-Git-Url: https://dgit.raspbian.org/?a=commitdiff_plain;h=c1db40b8d1f1309bcbc1c85b399b37d2b257081d;p=volk.git

volk (3.0.0-2+rpi1) trixie-staging; urgency=medium

[changes brought forward from 1.1-1+rpi1 by Peter Michael Green at Sun, 20 Sep 2015 20:30:19 +0000]
* Disable neon.
[changes introduced in 2.1.0-2+rpi1 by Peter Michael Green]
* Add bulid-depends-indep on texlive-latex-extra.

[dgit import unpatched volk 3.0.0-2+rpi1]
---

c1db40b8d1f1309bcbc1c85b399b37d2b257081d
diff --cc debian/changelog
index 0000000,0000000..c905776
new file mode 100644
--- /dev/null
+++ b/debian/changelog
@@@ -1,0 -1,0 +1,597 @@@
++volk (3.0.0-2+rpi1) trixie-staging; urgency=medium
++
++ [changes brought forward from 1.1-1+rpi1 by Peter Michael Green at Sun, 20 Sep 2015 20:30:19 +0000]
++ * Disable neon.
++
++ [changes introduced in 2.1.0-2+rpi1 by Peter Michael Green]
++ * Add bulid-depends-indep on texlive-latex-extra.
++
++ -- Raspbian forward porter Wed, 20 Sep 2023 14:03:25 +0000
++
++volk (3.0.0-2) unstable; urgency=medium
++
++ * upload to unstable
++
++ -- A. Maitland Bottoms Sat, 15 Jul 2023 21:58:53 -0400
++
++volk (3.0.0-1) experimental; urgency=medium
++
++ * New upstream release
++ - License switch to LGPLv3+
++ - Fix build for 32 bit arm with neon
++ - Add experimental support for MIPS and RISC-V
++ * Upload to experimental for package renames and soversion bump
++
++ -- A. Maitland Bottoms Sat, 14 Jan 2023 14:01:06 -0500
++
++volk (2.5.2-3) unstable; urgency=medium
++
++ * orc 1:0.4.33-1 dropped building static library,
++ so now volk will drop building its static library too. (Closes: #1026593)
++
++ -- A. Maitland Bottoms Tue, 20 Dec 2022 20:03:23 -0500
++
++volk (2.5.2-2) unstable; urgency=medium
++
++ * revert changes to kernels/volk/volk_8u_x2_encodeframepolar_8u.h
++ made by make-acc-happy patch since version 1.3-1 (Closes: #1021856)
++
++ -- A. Maitland Bottoms Sat, 15 Oct 2022 23:41:48 -0400
++
++volk (2.5.2-1) unstable; urgency=medium
++
++ * New upstream release.
++ * volk_8u_x4_conv_k7_r2_8u
++ - Add NEON implementation `neonspiral` via `sse2neon.h`
++ * Fixes
++ - Fix out-of-bounds reads
++ - Fix broken neon kernels
++ - Fix float to int conversion
++ * CMake
++ - Suppress superfluous warning
++ - Fix Python install path calculation and documentation
++
++ -- A. Maitland Bottoms Sun, 04 Sep 2022 12:00:56 -0400
++
++volk (2.5.1-2) unstable; urgency=medium
++
++ * VolkPython use posix prefix scheme (Closes: #1009394)
++
++ -- A. Maitland Bottoms Tue, 12 Apr 2022 18:39:33 -0400
++
++volk (2.5.1-1) unstable; urgency=medium
++
++ * New upstream release.
++
++ -- A. Maitland Bottoms Sun, 13 Feb 2022 00:18:58 -0500
++
++volk (2.5.0-2) unstable; urgency=medium
++
++ * upload to unstable
++ * with some upstream bugfixes
++
++ -- A. Maitland Bottoms Thu, 21 Oct 2021 23:30:05 -0400
++
++volk (2.5.0-1) experimental; urgency=medium
++
++ * New upstream release
++ * Use libcpu-features-dev on powerpc and x32 (Closes: #978602)
++ * Mention volk-config-info and volk_modtool in description (Closes: #989263)
++ * Upload to experimental for soversion bump
++
++ -- A. Maitland Bottoms Thu, 10 Jun 2021 18:29:47 -0400
++
++volk (2.4.1-2) unstable; urgency=medium
++
++ [ Shengjing Zhu ]
++ * Use system cpu_features package
++
++ [ A. Maitland Bottoms ]
++ * Adopt Use system cpu_features package patch (Closes: #978096)
++
++ -- A. Maitland Bottoms Sun, 27 Dec 2020 15:16:07 -0500
++
++volk (2.4.1-1) unstable; urgency=medium
++
++ * New upstream release
++
++ -- A. Maitland Bottoms Thu, 17 Dec 2020 23:53:21 -0500
++
++volk (2.4.0-4) unstable; urgency=medium
++
++ * skip cpu_features on "Unsupported OS" kFreeBSD
++ * bump Standards-Version - no other changes.
++
++ -- A. Maitland Bottoms Tue, 15 Dec 2020 19:53:16 -0500
++
++volk (2.4.0-3) unstable; urgency=medium
++
++ * Fix binary-indep build (Closes: #976300)
++ * Upload to unstable
++
++ -- A. Maitland Bottoms Thu, 03 Dec 2020 20:43:29 -0500
++
++volk (2.4.0-2) experimental; urgency=medium
++
++ * Make use of cpu_features a CMake option with sensible defaults per arch
++
++ -- A. Maitland Bottoms Mon, 30 Nov 2020 16:19:19 -0500
++
++volk (2.4.0-1) experimental; urgency=medium
++
++ * New upstream release
++ * cpu_features git submodule packaged as cpu-features source component.
++ * Upload to experimental for soversion bump
++
++ -- A. Maitland Bottoms Sun, 22 Nov 2020 12:35:43 -0500
++
++volk (2.3.0-3) unstable; urgency=medium
++
++ * update to v2.3.0-14-g91e5d07
++ emit an emms instruction after using the mmx extension
++
++ -- A. Maitland Bottoms Tue, 30 Jun 2020 19:48:20 -0400
++
++volk (2.3.0-2) unstable; urgency=medium
++
++ * Upload to unstable
++
++ -- A. Maitland Bottoms Mon, 11 May 2020 07:26:03 -0400
++
++volk (2.3.0-1) experimental; urgency=medium
++
++ * New upstream release, to experimental for soversion bump
++ * Kernels
++ - volk: accurate exp kernel
++ - exp: Rename SSE4.1 to SSE2 kernel
++ - Add 32f_s32f_add_32f kernel
++ - This kernel adds in vector + scalar functionality
++ - Fix the broken index max kernels
++ - Treat the mod_range puppet as such
++ - Add puppet for power spectral density kernel
++ - Updated log10 calcs to use faster log2 approach
++ - fix: Use unaligned load
++ - divide: Optimize complexmultiplyconjugate
++
++ -- A. Maitland Bottoms Sat, 09 May 2020 15:42:23 -0400
++
++volk (2.2.1-3) unstable; urgency=medium
++
++ * update to v2.2.1-34-gd4756c5
++
++ -- A. Maitland Bottoms Sun, 05 Apr 2020 10:37:46 -0400
++
++volk (2.2.1-2) unstable; urgency=medium
++
++ * update to v2.2.1-11-gfaf230e
++ * cmake: Remove the ORC from the VOLK public link interface
++ * Fix the broken index max kernels
++
++ -- A. Maitland Bottoms Fri, 27 Mar 2020 21:48:10 -0400
++
++volk (2.2.1-1) unstable; urgency=high
++
++ * New upstream bugfix release
++ reason for high urgency:
++ - Fix loop bound in AVX rotator (only one fixed in 2.2.0-3)
++ - Fix out-of-bounds read in AVX2 square dist kernel
++ - Fix length checks in AVX2 index max kernels
++
++ -- A. Maitland Bottoms Mon, 24 Feb 2020 18:08:05 -0500
++
++volk (2.2.0-3) unstable; urgency=high
++
++ * Update to v2.2.0-6-g5701f8f
++ reason for high urgency:
++ - Fix loop bound in AVX rotator
++
++ -- A. Maitland Bottoms Sun, 23 Feb 2020 23:49:18 -0500
++
++volk (2.2.0-2) unstable; urgency=medium
++
++ * Upload to unstable
++
++ -- A. Maitland Bottoms Tue, 18 Feb 2020 17:56:58 -0500
++
++volk (2.2.0-1) experimental; urgency=medium
++
++ * New upstream release
++ - Remove build dependency on python six
++ - Fixup VolkConfigVersion
++ - add volk_version.h
++
++ -- A. Maitland Bottoms Sun, 16 Feb 2020 18:25:20 -0500
++
++volk (2.1.0-2) unstable; urgency=medium
++
++ * Upload to unstable
++
++ -- A. Maitland Bottoms Sun, 05 Jan 2020 23:17:57 -0500
++
++volk (2.1.0-1) experimental; urgency=medium
++
++ * New upstream release
++ - The AVX FMA rotator bug is fixed
++ - VOLK offers `volk::vector<>` for C++ to follow RAII
++ - Use C++17 `std::filesystem`
++ - This enables VOLK to be built without Boost if available!
++ - lots of bugfixes
++ - more optimized kernels, especially more NEON versions
++ * Upload to experimental for new ABI library package libvolk2.1
++
++ -- A. Maitland Bottoms Sun, 22 Dec 2019 10:27:36 -0500
++
++volk (2.0.0-3) unstable; urgency=medium
++
++ * update to v2.0.0-4-gf04a46f
++
++ -- A. Maitland Bottoms Thu, 14 Nov 2019 22:47:23 -0500
++
++volk (2.0.0-2) unstable; urgency=medium
++
++ * Upload to unstable
++
++ -- A. Maitland Bottoms Mon, 12 Aug 2019 22:49:11 -0400
++
++volk (2.0.0-1) experimental; urgency=medium
++
++ * New upstream release
++
++ -- A. Maitland Bottoms Wed, 07 Aug 2019 23:31:20 -0400
++
++volk (1.4-4) unstable; urgency=medium
++
++ * working volk_modtool with Python 3
++ * build and install libvolk.a
++
++ -- A. Maitland Bottoms Mon, 29 Oct 2018 01:32:05 -0400
++
++volk (1.4-3) unstable; urgency=medium
++
++ * update to v1.4-9-g297fefd
++ Added an AVX protokernel for volk_32fc_x2_32f_square_dist_scalar_mult_32f
++ fixed a buffer over-read and over-write in
++ volk_32fc_x2_s32f_square_dist_scalar_mult_32f_a_avx
++ Fix 32u_reverse_32u for ARM
++
++ -- A. Maitland Bottoms Sat, 12 May 2018 15:25:04 -0400
++
++volk (1.4-2) unstable; urgency=medium
++
++ * Upload to unstable, needed by gnuradio (>= 3.7.12.0)
++
++ -- A. Maitland Bottoms Tue, 03 Apr 2018 01:03:19 -0400
++
++volk (1.4-1) experimental; urgency=medium
++
++ * New upstream release
++ upstream changelog http://libvolk.org/release-v14.html
++
++ -- A. Maitland Bottoms Tue, 27 Mar 2018 22:57:42 -0400
++
++volk (1.3.1-1) unstable; urgency=medium
++
++ * New upstream bugfix release
++ * Refresh all debian patches for use with git am
++
++ -- A. Maitland Bottoms Tue, 27 Mar 2018 21:54:29 -0400
++
++volk (1.3-3) unstable; urgency=medium
++
++ * update to v1.3-23-g0109b2e
++ * update debian/libvolk1-dev.abi.tar.gz.amd64
++ * Add breaks/replaces gnuradio (<=3.7.2.1) (LP: #1614235)
++
++ -- A. Maitland Bottoms Sun, 04 Feb 2018 13:12:21 -0500
++
++volk (1.3-2) unstable; urgency=medium
++
++ * update to v1.3-16-g28b03a9
++ apps: fix profile update reading end of lines
++ qa: lower tolerance for 32fc_mag to fix issue #96
++ * include upstream master patch to sort input files
++
++ -- A. Maitland Bottoms Sun, 27 Aug 2017 13:44:55 -0400
++
++volk (1.3-1) unstable; urgency=medium
++
++ * New upstream release
++ * The index_max kernels were named with the wrong output datatype. To
++ fix this there are new kernels that return a 32u (int32_t) and the
++ existing kernels had their signatures changed to return 16u (int16_t).
++ * The output to stdout and stderr has been shuffled around. There is no
++ longer a message that prints what VOLK machine is being used and the
++ warning messages go to stderr rather than stdout.
++ * The 32fc_index_max kernels previously were only accurate to the SSE
++ register width (4 points). This was a pretty serious and long-lived
++ bug that's been fixed and the QA updated appropriately.
++
++ -- A. Maitland Bottoms Sat, 02 Jul 2016 16:30:47 -0400
++
++volk (1.2.2-2) unstable; urgency=medium
++
++ * update to v1.2.2-11-g78c8bc4 (to follow gnuradio maint branch)
++
++ -- A. Maitland Bottoms Sun, 19 Jun 2016 14:44:15 -0400
++
++volk (1.2.2-1) unstable; urgency=medium
++
++ * New upstream release
++
++ -- A. Maitland Bottoms Fri, 08 Apr 2016 00:12:10 -0400
++
++volk (1.2.1-2) unstable; urgency=medium
++
++ * Upstream patches:
++ Fix some CMake complaints
++ The fix for compilation with cmake 3.5
++
++ -- A. Maitland Bottoms Wed, 23 Mar 2016 17:47:54 -0400
++
++volk (1.2.1-1) unstable; urgency=medium
++
++ * New upstream release
++
++ -- A. Maitland Bottoms Sun, 07 Feb 2016 19:38:32 -0500
++
++volk (1.2-1) unstable; urgency=medium
++
++ * New upstream release
++
++ -- A. Maitland Bottoms Thu, 24 Dec 2015 20:28:13 -0500
++
++volk (1.1.1-5) experimental; urgency=medium
++
++ * update to v1.1.1-22-gef53547 to support gnuradio 3.7.9
++
++ -- A. Maitland Bottoms Fri, 11 Dec 2015 13:12:55 -0500
++
++volk (1.1.1-4) unstable; urgency=medium
++
++ * more lintian fixes
++
++ -- A. Maitland Bottoms Wed, 25 Nov 2015 21:49:58 -0500
++
++volk (1.1.1-3) unstable; urgency=medium
++
++ * Lintian fixes Pre-Depends
++
++ -- A. Maitland Bottoms Thu, 19 Nov 2015 21:24:27 -0500
++
++volk (1.1.1-2) unstable; urgency=medium
++
++ * Note that libvolk1-dev replaces files in gnuradio-dev versions <<3.7.8
++ (Closes: #802646) again. Thanks Andreas Beckmann.
++
++ -- A. Maitland Bottoms Fri, 13 Nov 2015 18:45:49 -0500
++
++volk (1.1.1-1) unstable; urgency=medium
++
++ * New upstream release
++ * New architectures exist for the AVX2 and FMA ISAs.
++ * The profiler now generates buffers that are vlen + a tiny amount and
++ generates random data to fill buffers. This is intended to catch bugs
++ in protokernels that write beyond num_points.
++ * Note that libvolk1-dev replaces files in earlier gnuradio-dev versions
++ (Closes: #802646)
++
++ -- A. Maitland Bottoms Sun, 01 Nov 2015 18:45:43 -0500
++
++volk (1.1-4) unstable; urgency=medium
++
++ * update to v1.1-12-g264addc
++
++ -- A. Maitland Bottoms Tue, 29 Sep 2015 23:41:50 -0400
++
++volk (1.1-3) unstable; urgency=low
++
++ * drop dh_acc to get reproducible builds
++
++ -- A. Maitland Bottoms Fri, 11 Sep 2015 22:57:06 -0400
++
++volk (1.1-2) unstable; urgency=low
++
++ * use dh-acc
++
++ -- A. Maitland Bottoms Mon, 07 Sep 2015 15:45:20 -0400
++
++volk (1.1-1) unstable; urgency=medium
++
++ * re-organize package naming convention
++ * New upstream release tag v1.1
++ New architectures exist for the AVX2 and FMA ISAs. Along
++ with the build-system support the following kernels have
++ no proto-kernels taking advantage of these architectures:
++
++ * 32f_x2_dot_prod_32f
++ * 32fc_x2_multiply_32fc
++ * 64_byteswap
++ * 32f_binary_slicer_8i
++ * 16u_byteswap
++ * 32u_byteswap
++
++ QA/profiler
++ -----------
++
++ The profiler now generates buffers that are vlen + a tiny
++ amount and generates random data to fill buffers. This is
++ intended to catch bugs in protokernels that write beyond
++ num_points.
++
++ -- A. Maitland Bottoms Wed, 26 Aug 2015 09:22:48 -0400
++
++volk (1.0.2-2) unstable; urgency=low
++
++ * Use SOURCE_DATE_EPOCH from the environment, if defined,
++ rather than current date and time to implement volk_build_date()
++ (embedding build date in a library does not help reproducible builds)
++ * add watch file
++
++ -- A. Maitland Bottoms Sat, 15 Aug 2015 17:43:15 -0400
++
++volk (1.0.2-1) unstable; urgency=medium
++
++ * Maintenance release 24 Jul 2015 by Nathan West
++ * The major change is the CMake logic to add ASM protokernels. Rather
++ than depending on CFLAGS and ASMFLAGS we use the results of VOLK's
++ built in has_ARCH tests. All configurations should work the same as
++ before, but manually specifying CFLAGS and ASMFLAGS on the cmake call
++ for ARM native builds should no longer be necessary.
++ * The 32fc_s32fc_x2_rotator_32fc generic protokernel now includes a
++ previously implied header.
++ * Finally, there is a fix to return the "best" protokernel to the
++ dispatcher when no volk_config exists. Thanks to Alexandre Raymond for
++ pointing this out.
++ * with maint branch patch:
++ kernels-add-missing-include-arm_neon.h
++ * removed unused build-dependency on liboil0.3-dev (closes: #793626)
++
++ -- A. Maitland Bottoms Wed, 05 Aug 2015 00:43:40 -0400
++
++volk (1.0.1-1) unstable; urgency=low
++
++ * Maintenance Release v1.0.1 08 Jul 2015 by Nathan West
++ This is a maintenance release with bug fixes since the initial release of
++ v1.0 in April.
++
++ * Contributors
++
++ The following authors have contributed code to this release:
++
++ Doug Geiger doug.geiger@bioradiation.net
++ Elliot Briggs elliot.briggs@gmail.com
++ Marcus Mueller marcus@hostalia.de
++ Nathan West nathan.west@okstate.edu
++ Tom Rondeau tom@trondeau.com
++
++ * Kernels
++
++ Several bug fixes in different kernels. The NEON implementations of the
++ following kernels have been fixed:
++
++ 32f_x2_add_32f
++ 32f_x2_dot_prod_32f
++ 32fc_s32fc_multiply_32fc
++ 32fc_x2_multiply_32fc
++
++ Additionally the NEON asm based 32f_x2_add_32f protokernels were not being
++ used and are now included and available for use via the dispatcher.
++
++ The 32f_s32f_x2_fm_detect_32f kernel now has a puppet. This solves QA seg
++ faults on 32-bit machines and provide a better test for this kernel.
++
++ The 32fc_s32fc_x2_rotator_32fc generic protokernel replaced cabsf with
++ hypotf for better Android support.
++
++ * Building
++
++ Static builds now trigger the applications (volk_profile and
++ volk-config-info) to be statically linked.
++
++ The file gcc_x86_cpuid.h has been removed since it was no longer being
++ used. Previously it provided cpuid functionality for ancient compilers
++ that we do not support.
++
++ All build types now use -Wall.
++
++ * QA and Testing
++
++ The documentation around the --update option to volk_profile now makes it
++ clear that the option will only profile kernels without entries in
++ volk_profile. The signature of run_volk_tests with expanded args changed
++ signed types to unsigned types to reflect the actual input.
++
++ The remaining changes are all non-functional changes to address issues
++ from Coverity.
++
++ -- A. Maitland Bottoms Fri, 10 Jul 2015 17:57:42 -0400
++
++volk (1.0-5) unstable; urgency=medium
++
++ * native-armv7-build-support skips neon on Debian armel (Closes: #789972)
++
++ -- A. Maitland Bottoms Sat, 04 Jul 2015 12:36:36 -0400
++
++volk (1.0-4) unstable; urgency=low
++
++ * update native-armv7-build-support patch from gnuradio volk package
++
++ -- A. Maitland Bottoms Thu, 25 Jun 2015 16:38:49 -0400
++
++volk (1.0-3) unstable; urgency=medium
++
++ * Add Breaks/Replaces (Closes: #789893, #789894)
++ * Allow failing tests
++
++ -- A. Maitland Bottoms Thu, 25 Jun 2015 12:46:06 -0400
++
++volk (1.0-2) unstable; urgency=medium
++
++ * kernels-add-missing-math.h-include-to-rotator
++
++ -- A. Maitland Bottoms Wed, 24 Jun 2015 21:09:32 -0400
++
++volk (1.0-1) unstable; urgency=low
++
++ * Initial package (Closes: #782417)
++ Initial Release 11 Apr 2015 by Nathan West
++
++ VOLK 1.0 is available. This is the first release of VOLK as an independently
++ tracked sub-project of GNU Radio.
++
++ * Contributors
++
++ VOLK has been tracked separately from GNU Radio since 2014 Dec 23.
++ Contributors between the split and the initial release are
++
++ Albert Holguin aholguin_77@yahoo.com
++ Doug Geiger doug.geiger@bioradiation.net
++ Elliot Briggs elliot.briggs@gmail.com
++ Julien Olivain julien.olivain@lsv.ens-cachan.fr
++ Michael Dickens michael.dickens@ettus.com
++ Nathan West nathan.west@okstate.edu
++ Tom Rondeau tom@trondeau.com
++
++ * QA
++
++ The test and profiler have significantly changed. The profiler supports
++ run-time changes to vlen and iters to help kernel development and provide
++ more flexibility on embedded systems. Additionally there is a new option
++ to update an existing volk_profile results file with only new kernels which
++ will save time when updating to newer versions of VOLK
++
++ The QA system creates a static list of kernels and test cases. The QA
++ testing and profiler iterate over this static list rather than each source
++ file keeping its own list. The QA also emits XML results to
++ lib/.unittest/kernels.xml which is formatted similarly to JUnit results.
++
++ * Modtool
++
++ Modtool was updated to support the QA and profiler changes.
++
++ * Kernels
++
++ New proto-kernels:
++
++ 16ic_deinterleave_real_8i_neon
++ 16ic_s32f_deinterleave_32f_neon
++ fix preprocessor errors for some compilers on byteswap and popcount puppets
++
++ ORC was moved to the asm kernels directory.
++ volk_malloc
++
++ The posix_memalign implementation of Volk_malloc now falls back to a standard
++ malloc if alignment is 1.
++
++ * Miscellaneous
++
++ Several build system and cmake changes have made it possible to build VOLK
++ both independently with proper soname versions and in-tree for projects
++ such as GNU Radio.
++
++ The static builds take advantage of cmake object libraries to speed up builds.
++
++ Finally, there are a number of changes to satisfy compiler warnings and make
++ QA work on multiple machines.
++
++ -- A. Maitland Bottoms Sun, 12 Apr 2015 23:20:41 -0400
diff --cc debian/control
index 0000000,0000000..b42446b
new file mode 100644
--- /dev/null
+++ b/debian/control
@@@ -1,0 -1,0 +1,83 @@@
++Source: volk
++Section: libdevel
++Priority: optional
++Maintainer: A. Maitland Bottoms
++Build-Depends: cmake,
++ debhelper-compat (= 13),
++ dh-python,
++ liborc-0.4-dev,
++ libcpu-features-dev [amd64 arm64 armel armhf i386 mips64el mipsel powerpc ppc64 ppc64el x32],
++ python3-dev,
++ python3-mako
++Build-Depends-Indep: doxygen, graphviz, texlive-latex-extra,
++Standards-Version: 4.6.2
++Rules-Requires-Root: no
++Homepage: https://libvolk.org
++Vcs-Browser: https://salsa.debian.org/bottoms/pkg-volk
++Vcs-Git: https://salsa.debian.org/bottoms/pkg-volk.git
++
++Package: libvolk3.0
++Section: libs
++Architecture: any
++Pre-Depends: ${misc:Pre-Depends}
++Depends: ${misc:Depends}, ${shlibs:Depends}
++Multi-Arch: same
++Recommends: libvolk-bin
++Suggests: libvolk-dev
++Description: vector optimized functions
++ Vector-Optimized Library of Kernels is designed to help applications
++ work with the processor's SIMD instruction sets. These are very
++ powerful vector operations that can give signal processing a huge
++ boost in performance.
++
++Package: libvolk-dev
++Architecture: any
++Pre-Depends: ${misc:Pre-Depends}
++Depends: libvolk3.0 (=${binary:Version}), ${misc:Depends}
++Breaks: libvolk2-dev, libvolk1.0-dev, libvolk1-dev
++Replaces: libvolk2-dev, libvolk1.0-dev, libvolk1-dev
++Suggests: libvolk-doc
++Multi-Arch: same
++Description: vector optimized function headers
++ Vector-Optimized Library of Kernels is designed to help applications
++ work with the processor's SIMD instruction sets. These are very
++ powerful vector operations that can give signal processing a huge
++ boost in performance.
++ .
++ This package contains the header files.
++ For documentation, see libvolk-doc.
++
++Package: libvolk-bin
++Section: libs
++Architecture: any
++Pre-Depends: ${misc:Pre-Depends}
++Depends: libvolk3.0 (=${binary:Version}),
++ ${misc:Depends},
++ ${python3:Depends},
++ ${shlibs:Depends}
++Breaks: libvolk2-bin, libvolk1-bin, libvolk1.0-bin
++Replaces: libvolk2-bin, libvolk1-bin, libvolk1.0-bin
++Description: vector optimized runtime tools
++ Vector-Optimized Library of Kernels is designed to help applications
++ work with the processor's SIMD instruction sets. These are very
++ powerful vector operations that can give signal processing a huge
++ boost in performance.
++ .
++ This package includes: the volk_profile tool to customize settings for
++ the system; volk_modtool to create new optimized modules; and
++ volk-config-info to show settings.
++
++Package: libvolk-doc
++Section: doc
++Architecture: all
++Multi-Arch: foreign
++Depends: ${misc:Depends}
++Recommends: www-browser
++Description: vector optimized library documentation
++ Vector-Optimized Library of Kernels is designed to help applications
++ work with the processor's SIMD instruction sets. These are very
++ powerful vector operations that can give signal processing a huge
++ boost in performance.
++ .
++ This package includes the Doxygen generated documentation in
++ /usr/share/doc/libvolk-dev/html/index.html
diff --cc debian/copyright
index 0000000,0000000..440c5dc
new file mode 100644
--- /dev/null
+++ b/debian/copyright
@@@ -1,0 -1,0 +1,152 @@@
++Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
++Upstream-Name: volk
++Upstream-Contact: http://libvolk.org/
++Source:
++ https://github.com/gnuradio/volk
++Comment:
++ Debian packages by A. Maitland Bottoms
++ .
++ Upstream Maintainers:
++ Johannes Demel
++ Michael Dickens
++Copyright: 2014-2023 Free Software Foundation, Inc.
++License: LGPL-3+
++
++Files: *
++Copyright: 2006, 2009-2023, Free Software Foundation, Inc.
++License: LGPL-3+
++
++Files: apps/volk_profile.h
++Copyright: 2014-2020 Free Software Foundation, Inc.
++License: LGPL-3+
++
++Files: appveyor.yml
++Copyright: 2016 Paul Cercueil
++License: LGPL-3+
++
++Files: cmake/*
++Copyright: 2014-2020 Free Software Foundation, Inc.
++License: LGPL-3+
++
++Files: cmake/Modules/*
++Copyright: 2006, 2009-2020, Free Software Foundation, Inc.
++License: LGPL-3+
++
++Files: cmake/Modules/CMakeParseArgumentsCopy.cmake
++Copyright: 2010 Alexander Neundorf
++License: Kitware-BSD
++ All rights reserved.
++ .
++ Redistribution and use in source and binary forms, with or without
++ modification, are permitted provided that the following conditions
++ are met:
++ .
++ * Redistributions of source code must retain the above copyright
++ notice, this list of conditions and the following disclaimer.
++ .
++ * Redistributions in binary form must reproduce the above copyright
++ notice, this list of conditions and the following disclaimer in the
++ documentation and/or other materials provided with the distribution.
++ .
++ * Neither the names of Kitware, Inc., the Insight Software Consortium,
++ nor the names of their contributors may be used to endorse or promote
++ products derived from this software without specific prior written
++ permission.
++ .
++ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
++ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
++ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
++ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
++ HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
++ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
++ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
++ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
++ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
++ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
++ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
++
++Files: cmake/Modules/FindORC.cmake
++ cmake/Modules/VolkConfig.cmake.in
++Copyright: 2014-2015 Free Software Foundation, Inc.
++License: LGPL-3+
++
++Files: cmake/msvc/*
++Copyright: 2006-2008, Alexander Chemeris
++License: BSD-2-clause
++ Redistribution and use in source and binary forms, with or without
++ modification, are permitted provided that the following conditions are met:
++ .
++ 1. Redistributions of source code must retain the above copyright notice,
++ this list of conditions and the following disclaimer.
++ .
++ 2. Redistributions in binary form must reproduce the above copyright
++ notice, this list of conditions and the following disclaimer in the
++ documentation and/or other materials provided with the distribution.
++ .
++ 3. The name of the author may be used to endorse or promote products
++ derived from this software without specific prior written permission.
++ .
++ THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
++ WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
++ MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
++ EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
++ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
++ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
++ OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
++ WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
++ OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
++ ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
++
++Files: debian/*
++Copyright: 2015-2020 Free Software Foundation, Inc
++License: LGPL-3+
++Comment: assigned by A. Maitland Bottoms
++
++Files: docs/*
++Copyright: 2014-2015 Free Software Foundation, Inc.
++License: LGPL-3+
++
++Files: gen/archs.xml
++ gen/machines.xml
++Copyright: 2014-2015 Free Software Foundation, Inc.
++License: LGPL-3+
++
++Files: include/volk/volk_common.h
++ include/volk/volk_complex.h
++ include/volk/volk_prefs.h
++Copyright: 2014-2015 Free Software Foundation, Inc.
++License: LGPL-3+
++
++Files: kernels/volk/asm/*
++Copyright: 2014-2015 Free Software Foundation, Inc.
++License: LGPL-3+
++
++Files: kernels/volk/volk_16u_byteswappuppet_16u.h
++ kernels/volk/volk_32u_byteswappuppet_32u.h
++ kernels/volk/volk_64u_byteswappuppet_64u.h
++Copyright: 2014-2015 Free Software Foundation, Inc.
++License: LGPL-3+
++
++Files: lib/kernel_tests.h
++ lib/qa_utils.cc
++ lib/qa_utils.h
++ lib/volk_prefs.c
++Copyright: 2014-2015 Free Software Foundation, Inc.
++License: LGPL-3+
++
++License: LGPL-3+
++ This program is free software: you can redistribute it and/or modify
++ it under the terms of the GNU Lesser General Public License as published by
++ the Free Software Foundation; either version 3 of the License, or
++ (at your option) any later version.
++ .
++ This program is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
++ GNU General Public License for more details.
++ .
++ You should have received a copy of the GNU Lesser General Public License
++ along with this program. If not, see .
++ .
++ On Debian systems, the complete text of the GNU Lesser General
++ Public License version 3 can be found in "/usr/share/common-licenses/LGPL-3".
diff --cc debian/libvolk-bin.install
index 0000000,0000000..2f34922
new file mode 100644
--- /dev/null
+++ b/debian/libvolk-bin.install
@@@ -1,0 -1,0 +1,2 @@@
++usr/bin/volk*
++usr/lib/python3*/site-packages/* usr/lib/python3/dist-packages/
diff --cc debian/libvolk-bin.manpages
index 0000000,0000000..95bae9e
new file mode 100644
--- /dev/null
+++ b/debian/libvolk-bin.manpages
@@@ -1,0 -1,0 +1,3 @@@
++debian/volk-config-info.1
++debian/volk_modtool.1
++debian/volk_profile.1
diff --cc debian/libvolk-dev.install
index 0000000,0000000..4b391be
new file mode 100644
--- /dev/null
+++ b/debian/libvolk-dev.install
@@@ -1,0 -1,0 +1,4 @@@
++usr/include/*
++usr/lib/*/*volk*so
++usr/lib/*/cmake/volk
++usr/lib/*/pkgconfig/*volk*
diff --cc debian/libvolk-doc.doc-base
index 0000000,0000000..1174c64
new file mode 100644
--- /dev/null
+++ b/debian/libvolk-doc.doc-base
@@@ -1,0 -1,0 +1,19 @@@
++Document: libvolk-doc
++Title: Vector-Optimized Library of Kernels Reference Manual
++Author: GNU Radio Developers
++Abstract: VOLK is the Vector-Optimized Library of Kernels.
++ It is a library that contains kernels of hand-written SIMD code for
++ different mathematical operations. Since each SIMD architecture can
++ be very different and no compiler has yet come along to handle
++ vectorization properly or highly efficiently, VOLK approaches the
++ problem differently. For each architecture or platform that a
++ developer wishes to vectorize for, a new proto-kernel is added to
++ VOLK. At runtime, VOLK will select the correct proto-kernel. In this
++ way, the users of VOLK call a kernel for performing the operation
++ that is platform/architecture agnostic. This allows us to write
++ portable SIMD code.
++Section: Programming/C++
++
++Format: HTML
++Index: /usr/share/doc/libvolk-dev/html/index.html
++Files: /usr/share/doc/libvolk-dev/html/*.html
diff --cc debian/libvolk-doc.docs
index 0000000,0000000..87dd314
new file mode 100644
--- /dev/null
+++ b/debian/libvolk-doc.docs
@@@ -1,0 -1,0 +1,1 @@@
++obj-*/html
diff --cc debian/libvolk3.0.install
index 0000000,0000000..e4252f4
new file mode 100644
--- /dev/null
+++ b/debian/libvolk3.0.install
@@@ -1,0 -1,0 +1,1 @@@
++usr/lib/*/libvolk.so.*
diff --cc debian/not-installed
index 0000000,0000000..6f354d0
new file mode 100644
--- /dev/null
+++ b/debian/not-installed
@@@ -1,0 -1,0 +1,6 @@@
++usr/bin/list_cpu_features
++usr/lib/*/cmake/CpuFeatures/CpuFeaturesConfig.cmake
++usr/lib/*/cmake/CpuFeatures/CpuFeaturesConfigVersion.cmake
++usr/lib/*/cmake/CpuFeatures/CpuFeaturesTargets-relwithdebinfo.cmake
++usr/lib/*/cmake/CpuFeatures/CpuFeaturesTargets.cmake
++usr/lib/*/libcpu_features.a
diff --cc debian/patches/0001-ci-Remove-license-check.patch
index 0000000,0000000..c4a0586
new file mode 100644
--- /dev/null
+++ b/debian/patches/0001-ci-Remove-license-check.patch
@@@ -1,0 -1,0 +1,44 @@@
++From fe2e4a73480bf2ac2e566052ea682817dddaf61f Mon Sep 17 00:00:00 2001
++From: Johannes Demel
++Date: Sat, 14 Jan 2023 18:35:59 +0100
++Subject: [PATCH 1/5] ci: Remove license check
++
++This check fails because new contributors don't need to re-license. All
++new contributions must be made under LGPL. Thus, this check is
++superfluous. Also, it would add an administrative burden that we should
++avoid.
++ ++Signed-off-by: Johannes Demel ++--- ++ CMakeLists.txt | 15 --------------- ++ 1 file changed, 15 deletions(-) ++ ++diff --git a/CMakeLists.txt b/CMakeLists.txt ++index 3816757..a6e4f87 100644 ++--- a/CMakeLists.txt +++++ b/CMakeLists.txt ++@@ -383,21 +383,6 @@ if(ENABLE_MODTOOL) ++ add_subdirectory(python/volk_modtool) ++ endif() ++ ++-######################################################################## ++-# And the LGPL license check ++-######################################################################## ++-# detect default for LGPL ++-if(VERSION_INFO_MAJOR_VERSION GREATER_EQUAL 3) ++- OPTION(ENABLE_LGPL "Enable testing for LGPL" ON) ++-else() ++- OPTION(ENABLE_LGPL "Enable testing for LGPL" OFF) ++-endif() ++- ++-if(ENABLE_TESTING AND ENABLE_LGPL) ++- message(STATUS "Checking for LGPL resubmission enabled") ++- add_test("check_lgpl" ${CMAKE_SOURCE_DIR}/scripts/licensing/count_contrib.sh ${CMAKE_SOURCE_DIR}/AUTHORS_RESUBMITTING_UNDER_LGPL_LICENSE.md) ++-endif() ++- ++ ######################################################################## ++ # Print summary ++ ######################################################################## ++-- ++2.39.2 ++ diff --cc debian/patches/0002-license-Fix-SPDX-identifiers.patch index 0000000,0000000..98310f6 new file mode 100644 --- /dev/null +++ b/debian/patches/0002-license-Fix-SPDX-identifiers.patch @@@ -1,0 -1,0 +1,160 @@@ ++From 43a5ecb7858395fc5f2eab48d06d617672081df1 Mon Sep 17 00:00:00 2001 ++From: Johannes Demel ++Date: Sun, 15 Jan 2023 11:57:12 +0100 ++Subject: [PATCH 2/5] license: Fix SPDX identifiers ++ ++Some files still carried GPL identifiers. These were identified and ++updated. Thus, all files should carry the LGPL identifier now. ++ ++Files with incorrect identifiers were found with: ++`grep -irnI --exclude-dir={build,.git,cpu_features} GPL . 
| grep -v LGPL` ++ ++Fixes #613 ++ ++Signed-off-by: Johannes Demel ++--- ++ CMakeLists.txt | 2 +- ++ apps/CMakeLists.txt | 2 +- ++ cmake/Modules/VolkConfig.cmake.in | 2 +- ++ cmake/Modules/VolkConfigVersion.cmake.in | 2 +- ++ cmake/cmake_uninstall.cmake.in | 2 +- ++ docs/CMakeLists.txt | 2 +- ++ include/volk/volk_version.h.in | 2 +- ++ lib/CMakeLists.txt | 2 +- ++ lib/constants.c.in | 2 +- ++ python/volk_modtool/CMakeLists.txt | 2 +- ++ 10 files changed, 10 insertions(+), 10 deletions(-) ++ ++diff --git a/CMakeLists.txt b/CMakeLists.txt ++index a6e4f87..2ee2c72 100644 ++--- a/CMakeLists.txt +++++ b/CMakeLists.txt ++@@ -3,7 +3,7 @@ ++ # ++ # This file is part of VOLK ++ # ++-# SPDX-License-Identifier: GPL-3.0-or-later +++# SPDX-License-Identifier: LGPL-3.0-or-later ++ # ++ ++ ######################################################################## ++diff --git a/apps/CMakeLists.txt b/apps/CMakeLists.txt ++index 8ac9187..db6ab50 100644 ++--- a/apps/CMakeLists.txt +++++ b/apps/CMakeLists.txt ++@@ -3,7 +3,7 @@ ++ # ++ # This file is part of VOLK ++ # ++-# SPDX-License-Identifier: GPL-3.0-or-later +++# SPDX-License-Identifier: LGPL-3.0-or-later ++ # ++ ++ ######################################################################## ++diff --git a/cmake/Modules/VolkConfig.cmake.in b/cmake/Modules/VolkConfig.cmake.in ++index 3ffdf36..f18e07f 100644 ++--- a/cmake/Modules/VolkConfig.cmake.in +++++ b/cmake/Modules/VolkConfig.cmake.in ++@@ -2,7 +2,7 @@ ++ # ++ # This file is part of VOLK. ++ # ++-# SPDX-License-Identifier: GPL-3.0-or-later +++# SPDX-License-Identifier: LGPL-3.0-or-later ++ # ++ ++ get_filename_component(VOLK_CMAKE_DIR "${CMAKE_CURRENT_LIST_FILE}" PATH) ++diff --git a/cmake/Modules/VolkConfigVersion.cmake.in b/cmake/Modules/VolkConfigVersion.cmake.in ++index b6e9b51..b97d25d 100644 ++--- a/cmake/Modules/VolkConfigVersion.cmake.in +++++ b/cmake/Modules/VolkConfigVersion.cmake.in ++@@ -2,7 +2,7 @@ ++ # ++ # This file is part of VOLK. 
++ # ++-# SPDX-License-Identifier: GPL-3.0-or-later +++# SPDX-License-Identifier: LGPL-3.0-or-later ++ # ++ ++ set(MAJOR_VERSION @VERSION_INFO_MAJOR_VERSION@) ++diff --git a/cmake/cmake_uninstall.cmake.in b/cmake/cmake_uninstall.cmake.in ++index d9d13ea..7ffbc90 100644 ++--- a/cmake/cmake_uninstall.cmake.in +++++ b/cmake/cmake_uninstall.cmake.in ++@@ -2,7 +2,7 @@ ++ # ++ # This file is part of VOLK. ++ # ++-# SPDX-License-Identifier: GPL-3.0-or-later +++# SPDX-License-Identifier: LGPL-3.0-or-later ++ # ++ ++ # http://www.vtk.org/Wiki/CMake_FAQ#Can_I_do_.22make_uninstall.22_with_CMake.3F ++diff --git a/docs/CMakeLists.txt b/docs/CMakeLists.txt ++index 90eb1d7..7c4c5e0 100644 ++--- a/docs/CMakeLists.txt +++++ b/docs/CMakeLists.txt ++@@ -2,7 +2,7 @@ ++ # ++ # Copyright 2022 Johannes Demel. ++ # ++-# SPDX-License-Identifier: GPL-3.0-or-later +++# SPDX-License-Identifier: LGPL-3.0-or-later ++ # ++ ++ find_package(Doxygen) ++diff --git a/include/volk/volk_version.h.in b/include/volk/volk_version.h.in ++index 46013da..e565048 100644 ++--- a/include/volk/volk_version.h.in +++++ b/include/volk/volk_version.h.in ++@@ -4,7 +4,7 @@ ++ * ++ * This file is part of VOLK ++ * ++- * SPDX-License-Identifier: GPL-3.0-or-later +++ * SPDX-License-Identifier: LGPL-3.0-or-later ++ */ ++ ++ #ifndef INCLUDED_VOLK_VERSION_H ++diff --git a/lib/CMakeLists.txt b/lib/CMakeLists.txt ++index 75055ee..3cfffe5 100644 ++--- a/lib/CMakeLists.txt +++++ b/lib/CMakeLists.txt ++@@ -3,7 +3,7 @@ ++ # ++ # This file is part of VOLK. 
++ # ++-# SPDX-License-Identifier: GPL-3.0-or-later +++# SPDX-License-Identifier: LGPL-3.0-or-later ++ # ++ ++ ######################################################################## ++diff --git a/lib/constants.c.in b/lib/constants.c.in ++index fba8c39..049bc04 100644 ++--- a/lib/constants.c.in +++++ b/lib/constants.c.in ++@@ -4,7 +4,7 @@ ++ * ++ * This file is part of VOLK ++ * ++- * SPDX-License-Identifier: GPL-3.0-or-later +++ * SPDX-License-Identifier: LGPL-3.0-or-later ++ */ ++ ++ #if HAVE_CONFIG_H ++diff --git a/python/volk_modtool/CMakeLists.txt b/python/volk_modtool/CMakeLists.txt ++index aff7a62..dcc79a5 100644 ++--- a/python/volk_modtool/CMakeLists.txt +++++ b/python/volk_modtool/CMakeLists.txt ++@@ -3,7 +3,7 @@ ++ # ++ # This file is part of VOLK ++ # ++-# SPDX-License-Identifier: GPL-3.0-or-later +++# SPDX-License-Identifier: LGPL-3.0-or-later ++ # ++ ++ ######################################################################## ++-- ++2.39.2 ++ diff --cc debian/patches/0003-Use-cpu_features-on-RISC-V-platforms.patch index 0000000,0000000..84e97e0 new file mode 100644 --- /dev/null +++ b/debian/patches/0003-Use-cpu_features-on-RISC-V-platforms.patch @@@ -1,0 -1,0 +1,73 @@@ ++From da2cbdcd70509093171f4807c51d92b7636add9f Mon Sep 17 00:00:00 2001 ++From: Michael Roe ++Date: Tue, 17 Jan 2023 20:15:38 +0000 ++Subject: [PATCH 3/5] Use cpu_features on RISC-V platforms ++ ++Signed-off-by: Michael Roe ++--- ++ .github/workflows/run-tests.yml | 2 +- ++ CMakeLists.txt | 2 +- ++ cpu_features | 2 +- ++ tmpl/volk_cpu.tmpl.c | 6 ++++++ ++ 4 files changed, 9 insertions(+), 3 deletions(-) ++ ++diff --git a/.github/workflows/run-tests.yml b/.github/workflows/run-tests.yml ++index 215ded6..1b7024d 100644 ++--- a/.github/workflows/run-tests.yml +++++ b/.github/workflows/run-tests.yml ++@@ -114,7 +114,7 @@ jobs: ++ submodules: 'recursive' ++ - uses: uraimo/run-on-arch-action@v2.5.0 ++ name: Build in non-x86 container ++- continue-on-error: ${{ contains(fromJson('["ppc64le", 
"s390x", "riscv64"]'), matrix.arch) || contains(fromJson('["clang-14"]'), matrix.compiler.name) }} +++ continue-on-error: ${{ contains(fromJson('["ppc64le", "s390x"]'), matrix.arch) || contains(fromJson('["clang-14"]'), matrix.compiler.name) }} ++ id: build ++ with: ++ arch: ${{ matrix.arch }} ++diff --git a/CMakeLists.txt b/CMakeLists.txt ++index 2ee2c72..92d5097 100644 ++--- a/CMakeLists.txt +++++ b/CMakeLists.txt ++@@ -121,7 +121,7 @@ endif(MSVC) ++ ++ # cpu_features - sensible defaults, user settable option ++ if(CMAKE_SYSTEM_PROCESSOR MATCHES ++- "(^mips)|(^arm)|(^aarch64)|(x86_64)|(AMD64|amd64)|(^i.86$)|(^powerpc)|(^ppc)") +++ "(^mips)|(^arm)|(^aarch64)|(x86_64)|(AMD64|amd64)|(^i.86$)|(^powerpc)|(^ppc)|(^riscv)") ++ option(VOLK_CPU_FEATURES "Volk uses cpu_features" ON) ++ else() ++ option(VOLK_CPU_FEATURES "Volk uses cpu_features" OFF) ++diff --git a/cpu_features b/cpu_features ++index 188d0d3..4590768 160000 ++--- a/cpu_features +++++ b/cpu_features ++@@ -1 +1 @@ ++-Subproject commit 188d0d3c383689cdb6bb70dc6da2469faec84f61 +++Subproject commit 4590768e530a470a8fb8917dafee69932831c854 ++diff --git a/tmpl/volk_cpu.tmpl.c b/tmpl/volk_cpu.tmpl.c ++index a11a626..a4a06b0 100644 ++--- a/tmpl/volk_cpu.tmpl.c +++++ b/tmpl/volk_cpu.tmpl.c ++@@ -25,6 +25,8 @@ ++ #include "cpuinfo_mips.h" ++ #elif defined(CPU_FEATURES_ARCH_PPC) ++ #include "cpuinfo_ppc.h" +++#elif defined(CPU_FEATURES_ARCH_RISCV) +++#include "cpuinfo_riscv.h" ++ #endif ++ ++ // This is required for MSVC ++@@ -46,6 +48,10 @@ static int i_can_has_${arch.name} (void) { ++ %elif "mips" in arch.name: ++ #if defined(CPU_FEATURES_ARCH_MIPS) ++ if (GetMipsInfo().features.${check} == 0){ return 0; } +++#endif +++ %elif "riscv" in arch.name: +++#if defined(CPU_FEATURES_ARCH_RISCV) +++ if (GetRiscvInfo().features.${check} == 0){ return 0; } ++ #endif ++ %else: ++ #if defined(CPU_FEATURES_ARCH_X86) ++-- ++2.39.2 ++ diff --cc debian/patches/0004-add-volk_32f_s32f_x2_convert_8u-kernel.patch index 
0000000,0000000..edc6107 new file mode 100644 --- /dev/null +++ b/debian/patches/0004-add-volk_32f_s32f_x2_convert_8u-kernel.patch @@@ -1,0 -1,0 +1,789 @@@ ++From d630d45e6f273cdfe1da7082afb381b3832b4c8f Mon Sep 17 00:00:00 2001 ++From: =?UTF-8?q?Daniel=20Est=C3=A9vez?= ++Date: Thu, 19 Jan 2023 18:01:25 +0100 ++Subject: [PATCH 4/5] add volk_32f_s32f_x2_convert_8u kernel ++MIME-Version: 1.0 ++Content-Type: text/plain; charset=UTF-8 ++Content-Transfer-Encoding: 8bit ++ ++This adds a kernel that performs conversion from float to uint8_t ++using a scale and a bias, according to out = in * scale + bias. The ++output is clamped to the interval [0, 255] and rounded to the nearest ++integer. ++ ++The kernels are implemented mirroring those for ++volk_32f_s32f_convert_8i, but there is an additional avx2_fma kernel, ++since it makes sense to perform the scale and bias conversion using FMA ++if it is available. ++ ++Signed-off-by: Daniel Estévez ++--- ++ docs/kernels.dox | 1 + ++ kernels/volk/volk_32f_s32f_convertpuppet_8u.h | 105 +++ ++ kernels/volk/volk_32f_s32f_x2_convert_8u.h | 616 ++++++++++++++++++ ++ lib/kernel_tests.h | 2 + ++ 4 files changed, 724 insertions(+) ++ create mode 100644 kernels/volk/volk_32f_s32f_convertpuppet_8u.h ++ create mode 100644 kernels/volk/volk_32f_s32f_x2_convert_8u.h ++ ++diff --git a/docs/kernels.dox b/docs/kernels.dox ++index 35fca1e..e6930ad 100644 ++--- a/docs/kernels.dox +++++ b/docs/kernels.dox ++@@ -84,6 +84,7 @@ ++ \li \subpage volk_32f_s32f_power_32f ++ \li \subpage volk_32f_s32f_s32f_mod_range_32f ++ \li \subpage volk_32f_s32f_stddev_32f +++\li \subpage volk_32f_s32f_x2_convert_8u ++ \li \subpage volk_32f_sin_32f ++ \li \subpage volk_32f_sqrt_32f ++ \li \subpage volk_32f_stddev_and_mean_32f_x2 ++diff --git a/kernels/volk/volk_32f_s32f_convertpuppet_8u.h b/kernels/volk/volk_32f_s32f_convertpuppet_8u.h ++new file mode 100644 ++index 0000000..7f530c4 ++--- /dev/null +++++ b/kernels/volk/volk_32f_s32f_convertpuppet_8u.h ++@@ -0,0 +1,105 
@@ +++/* -*- c++ -*- */ +++/* +++ * Copyright 2023 Daniel Estevez +++ * +++ * This file is part of VOLK +++ * +++ * SPDX-License-Identifier: LGPL-3.0-or-later +++ */ +++ +++#ifndef INCLUDED_VOLK_32F_S32F_MOD_CONVERTPUPPET_8U_H +++#define INCLUDED_VOLK_32F_S32F_MOD_CONVERTPUPPET_8U_H +++ +++#include +++#include +++ +++#ifdef LV_HAVE_GENERIC +++static inline void volk_32f_s32f_convertpuppet_8u_generic(uint8_t* output, +++ const float* input, +++ float scale, +++ unsigned int num_points) +++{ +++ volk_32f_s32f_x2_convert_8u_generic(output, input, scale, 128.0, num_points); +++} +++#endif +++ +++#if LV_HAVE_AVX2 && LV_HAVE_FMA +++static inline void volk_32f_s32f_convertpuppet_8u_u_avx2_fma(uint8_t* output, +++ const float* input, +++ float scale, +++ unsigned int num_points) +++{ +++ volk_32f_s32f_x2_convert_8u_u_avx2_fma(output, input, scale, 128.0, num_points); +++} +++#endif +++ +++#if LV_HAVE_AVX2 && LV_HAVE_FMA +++static inline void volk_32f_s32f_convertpuppet_8u_a_avx2_fma(uint8_t* output, +++ const float* input, +++ float scale, +++ unsigned int num_points) +++{ +++ volk_32f_s32f_x2_convert_8u_a_avx2_fma(output, input, scale, 128.0, num_points); +++} +++#endif +++ +++#ifdef LV_HAVE_AVX2 +++static inline void volk_32f_s32f_convertpuppet_8u_u_avx2(uint8_t* output, +++ const float* input, +++ float scale, +++ unsigned int num_points) +++{ +++ volk_32f_s32f_x2_convert_8u_u_avx2(output, input, scale, 128.0, num_points); +++} +++#endif +++ +++#ifdef LV_HAVE_AVX2 +++static inline void volk_32f_s32f_convertpuppet_8u_a_avx2(uint8_t* output, +++ const float* input, +++ float scale, +++ unsigned int num_points) +++{ +++ volk_32f_s32f_x2_convert_8u_a_avx2(output, input, scale, 128.0, num_points); +++} +++#endif +++ +++#ifdef LV_HAVE_SSE2 +++static inline void volk_32f_s32f_convertpuppet_8u_u_sse2(uint8_t* output, +++ const float* input, +++ float scale, +++ unsigned int num_points) +++{ +++ volk_32f_s32f_x2_convert_8u_u_sse2(output, input, scale, 128.0, num_points); +++} 
+++#endif +++ +++#ifdef LV_HAVE_SSE2 +++static inline void volk_32f_s32f_convertpuppet_8u_a_sse2(uint8_t* output, +++ const float* input, +++ float scale, +++ unsigned int num_points) +++{ +++ volk_32f_s32f_x2_convert_8u_a_sse2(output, input, scale, 128.0, num_points); +++} +++#endif +++ +++#ifdef LV_HAVE_SSE +++static inline void volk_32f_s32f_convertpuppet_8u_u_sse(uint8_t* output, +++ const float* input, +++ float scale, +++ unsigned int num_points) +++{ +++ volk_32f_s32f_x2_convert_8u_u_sse(output, input, scale, 128.0, num_points); +++} +++#endif +++ +++#ifdef LV_HAVE_SSE +++static inline void volk_32f_s32f_convertpuppet_8u_a_sse(uint8_t* output, +++ const float* input, +++ float scale, +++ unsigned int num_points) +++{ +++ volk_32f_s32f_x2_convert_8u_a_sse(output, input, scale, 128.0, num_points); +++} +++#endif +++#endif ++diff --git a/kernels/volk/volk_32f_s32f_x2_convert_8u.h b/kernels/volk/volk_32f_s32f_x2_convert_8u.h ++new file mode 100644 ++index 0000000..a52cdf2 ++--- /dev/null +++++ b/kernels/volk/volk_32f_s32f_x2_convert_8u.h ++@@ -0,0 +1,616 @@ +++/* -*- c++ -*- */ +++/* +++ * Copyright 2023 Daniel Estevez +++ * Copyright 2012, 2014 Free Software Foundation, Inc. +++ * +++ * This file is part of VOLK +++ * +++ * SPDX-License-Identifier: LGPL-3.0-or-later +++ */ +++ +++/*! +++ * \page volk_32f_s32f_x2_convert_8u +++ * +++ * \b Overview +++ * +++ * Converts a floating point number to an 8-bit unsigned int after applying a +++ * multiplicative scaling factor and an additive bias. +++ * +++ * Dispatcher Prototype +++ * \code +++ * void volk_32f_s32f_x2_convert_8u(uint8_t* outputVector, const float* inputVector, +++ const float scale, const float bias, unsigned int num_points) +++ * \endcode +++ * +++ * \b Inputs +++ * \li inputVector: the input vector of floats. +++ * \li scale: The value multiplied against each point in the input buffer. +++ * \li bias: The value added to each multiplication by the scale. 
+++ * \li num_points: The number of data points. +++ * +++ * \b Outputs +++ * \li outputVector: The output vector. +++ * +++ * \b Example +++ * Convert floats from [-1,1] to 8-bit unsigend integers with a scale of 128 and a bias of +++ 128 +++ * int N = 10; +++ * unsigned int alignment = volk_get_alignment(); +++ * float* increasing = (float*)volk_malloc(sizeof(float)*N, alignment); +++ * uint8_t* out = (uint8_t*)volk_malloc(sizeof(uint8_t)*N, alignment); +++ * +++ * for(unsigned int ii = 0; ii < N; ++ii){ +++ * increasing[ii] = 2.f * ((float)ii / (float)N) - 1.f; +++ * } +++ * +++ * float scale = 128.0f; +++ * float bias = 128.0f; +++ * +++ * volk_32f_s32f_x2_convert_8u(out, increasing, scale, bias, N); +++ * +++ * for(unsigned int ii = 0; ii < N; ++ii){ +++ * printf("out[%u] = %i\n", ii, out[ii]); +++ * } +++ * +++ * volk_free(increasing); +++ * volk_free(out); +++ * \endcode +++ */ +++ +++#ifndef INCLUDED_volk_32f_s32f_x2_convert_8u_u_H +++#define INCLUDED_volk_32f_s32f_x2_convert_8u_u_H +++ +++#include +++ +++static inline void volk_32f_s32f_x2_convert_8u_single(uint8_t* out, const float in) +++{ +++ const float min_val = 0.0f; +++ const float max_val = UINT8_MAX; +++ if (in > max_val) { +++ *out = (uint8_t)(max_val); +++ } else if (in < min_val) { +++ *out = (uint8_t)(min_val); +++ } else { +++ *out = (uint8_t)(rintf(in)); +++ } +++} +++ +++ +++#ifdef LV_HAVE_GENERIC +++ +++static inline void volk_32f_s32f_x2_convert_8u_generic(uint8_t* outputVector, +++ const float* inputVector, +++ const float scale, +++ const float bias, +++ unsigned int num_points) +++{ +++ const float* inputVectorPtr = inputVector; +++ +++ for (unsigned int number = 0; number < num_points; number++) { +++ const float r = *inputVectorPtr++ * scale + bias; +++ volk_32f_s32f_x2_convert_8u_single(&outputVector[number], r); +++ } +++} +++ +++#endif /* LV_HAVE_GENERIC */ +++ +++ +++#if LV_HAVE_AVX2 && LV_HAVE_FMA +++#include +++ +++static inline void 
volk_32f_s32f_x2_convert_8u_u_avx2_fma(uint8_t* outputVector, +++ const float* inputVector, +++ const float scale, +++ const float bias, +++ unsigned int num_points) +++{ +++ const unsigned int thirtysecondPoints = num_points / 32; +++ +++ const float* inputVectorPtr = (const float*)inputVector; +++ uint8_t* outputVectorPtr = outputVector; +++ +++ const float min_val = 0.0f; +++ const float max_val = UINT8_MAX; +++ const __m256 vmin_val = _mm256_set1_ps(min_val); +++ const __m256 vmax_val = _mm256_set1_ps(max_val); +++ +++ const __m256 vScale = _mm256_set1_ps(scale); +++ const __m256 vBias = _mm256_set1_ps(bias); +++ +++ for (unsigned int number = 0; number < thirtysecondPoints; number++) { +++ __m256 inputVal1 = _mm256_loadu_ps(inputVectorPtr); +++ inputVectorPtr += 8; +++ __m256 inputVal2 = _mm256_loadu_ps(inputVectorPtr); +++ inputVectorPtr += 8; +++ __m256 inputVal3 = _mm256_loadu_ps(inputVectorPtr); +++ inputVectorPtr += 8; +++ __m256 inputVal4 = _mm256_loadu_ps(inputVectorPtr); +++ inputVectorPtr += 8; +++ +++ inputVal1 = _mm256_max_ps( +++ _mm256_min_ps(_mm256_fmadd_ps(inputVal1, vScale, vBias), vmax_val), vmin_val); +++ inputVal2 = _mm256_max_ps( +++ _mm256_min_ps(_mm256_fmadd_ps(inputVal2, vScale, vBias), vmax_val), vmin_val); +++ inputVal3 = _mm256_max_ps( +++ _mm256_min_ps(_mm256_fmadd_ps(inputVal3, vScale, vBias), vmax_val), vmin_val); +++ inputVal4 = _mm256_max_ps( +++ _mm256_min_ps(_mm256_fmadd_ps(inputVal4, vScale, vBias), vmax_val), vmin_val); +++ +++ __m256i intInputVal1 = _mm256_cvtps_epi32(inputVal1); +++ __m256i intInputVal2 = _mm256_cvtps_epi32(inputVal2); +++ __m256i intInputVal3 = _mm256_cvtps_epi32(inputVal3); +++ __m256i intInputVal4 = _mm256_cvtps_epi32(inputVal4); +++ +++ intInputVal1 = _mm256_packs_epi32(intInputVal1, intInputVal2); +++ intInputVal1 = _mm256_permute4x64_epi64(intInputVal1, 0b11011000); +++ intInputVal3 = _mm256_packs_epi32(intInputVal3, intInputVal4); +++ intInputVal3 = _mm256_permute4x64_epi64(intInputVal3, 0b11011000); 
+++ +++ intInputVal1 = _mm256_packus_epi16(intInputVal1, intInputVal3); +++ const __m256i intInputVal = _mm256_permute4x64_epi64(intInputVal1, 0b11011000); +++ +++ _mm256_storeu_si256((__m256i*)outputVectorPtr, intInputVal); +++ outputVectorPtr += 32; +++ } +++ +++ for (unsigned int number = thirtysecondPoints * 32; number < num_points; number++) { +++ const float r = inputVector[number] * scale + bias; +++ volk_32f_s32f_x2_convert_8u_single(&outputVector[number], r); +++ } +++} +++ +++#endif /* LV_HAVE_AVX2 && LV_HAVE_FMA */ +++ +++ +++#ifdef LV_HAVE_AVX2 +++#include +++ +++static inline void volk_32f_s32f_x2_convert_8u_u_avx2(uint8_t* outputVector, +++ const float* inputVector, +++ const float scale, +++ const float bias, +++ unsigned int num_points) +++{ +++ const unsigned int thirtysecondPoints = num_points / 32; +++ +++ const float* inputVectorPtr = (const float*)inputVector; +++ uint8_t* outputVectorPtr = outputVector; +++ +++ const float min_val = 0.0f; +++ const float max_val = UINT8_MAX; +++ const __m256 vmin_val = _mm256_set1_ps(min_val); +++ const __m256 vmax_val = _mm256_set1_ps(max_val); +++ +++ const __m256 vScale = _mm256_set1_ps(scale); +++ const __m256 vBias = _mm256_set1_ps(bias); +++ +++ for (unsigned int number = 0; number < thirtysecondPoints; number++) { +++ __m256 inputVal1 = _mm256_loadu_ps(inputVectorPtr); +++ inputVectorPtr += 8; +++ __m256 inputVal2 = _mm256_loadu_ps(inputVectorPtr); +++ inputVectorPtr += 8; +++ __m256 inputVal3 = _mm256_loadu_ps(inputVectorPtr); +++ inputVectorPtr += 8; +++ __m256 inputVal4 = _mm256_loadu_ps(inputVectorPtr); +++ inputVectorPtr += 8; +++ +++ inputVal1 = _mm256_max_ps( +++ _mm256_min_ps(_mm256_add_ps(_mm256_mul_ps(inputVal1, vScale), vBias), +++ vmax_val), +++ vmin_val); +++ inputVal2 = _mm256_max_ps( +++ _mm256_min_ps(_mm256_add_ps(_mm256_mul_ps(inputVal2, vScale), vBias), +++ vmax_val), +++ vmin_val); +++ inputVal3 = _mm256_max_ps( +++ _mm256_min_ps(_mm256_add_ps(_mm256_mul_ps(inputVal3, vScale), vBias), 
+++ vmax_val), +++ vmin_val); +++ inputVal4 = _mm256_max_ps( +++ _mm256_min_ps(_mm256_add_ps(_mm256_mul_ps(inputVal4, vScale), vBias), +++ vmax_val), +++ vmin_val); +++ +++ __m256i intInputVal1 = _mm256_cvtps_epi32(inputVal1); +++ __m256i intInputVal2 = _mm256_cvtps_epi32(inputVal2); +++ __m256i intInputVal3 = _mm256_cvtps_epi32(inputVal3); +++ __m256i intInputVal4 = _mm256_cvtps_epi32(inputVal4); +++ +++ intInputVal1 = _mm256_packs_epi32(intInputVal1, intInputVal2); +++ intInputVal1 = _mm256_permute4x64_epi64(intInputVal1, 0b11011000); +++ intInputVal3 = _mm256_packs_epi32(intInputVal3, intInputVal4); +++ intInputVal3 = _mm256_permute4x64_epi64(intInputVal3, 0b11011000); +++ +++ intInputVal1 = _mm256_packus_epi16(intInputVal1, intInputVal3); +++ const __m256i intInputVal = _mm256_permute4x64_epi64(intInputVal1, 0b11011000); +++ +++ _mm256_storeu_si256((__m256i*)outputVectorPtr, intInputVal); +++ outputVectorPtr += 32; +++ } +++ +++ for (unsigned int number = thirtysecondPoints * 32; number < num_points; number++) { +++ float r = inputVector[number] * scale + bias; +++ volk_32f_s32f_x2_convert_8u_single(&outputVector[number], r); +++ } +++} +++ +++#endif /* LV_HAVE_AVX2 */ +++ +++ +++#ifdef LV_HAVE_SSE2 +++#include +++ +++static inline void volk_32f_s32f_x2_convert_8u_u_sse2(uint8_t* outputVector, +++ const float* inputVector, +++ const float scale, +++ const float bias, +++ unsigned int num_points) +++{ +++ const unsigned int sixteenthPoints = num_points / 16; +++ +++ const float* inputVectorPtr = (const float*)inputVector; +++ uint8_t* outputVectorPtr = outputVector; +++ +++ const float min_val = 0.0f; +++ const float max_val = UINT8_MAX; +++ const __m128 vmin_val = _mm_set_ps1(min_val); +++ const __m128 vmax_val = _mm_set_ps1(max_val); +++ +++ const __m128 vScale = _mm_set_ps1(scale); +++ const __m128 vBias = _mm_set_ps1(bias); +++ +++ for (unsigned int number = 0; number < sixteenthPoints; number++) { +++ __m128 inputVal1 = _mm_loadu_ps(inputVectorPtr); +++ 
inputVectorPtr += 4; +++ __m128 inputVal2 = _mm_loadu_ps(inputVectorPtr); +++ inputVectorPtr += 4; +++ __m128 inputVal3 = _mm_loadu_ps(inputVectorPtr); +++ inputVectorPtr += 4; +++ __m128 inputVal4 = _mm_loadu_ps(inputVectorPtr); +++ inputVectorPtr += 4; +++ +++ inputVal1 = _mm_max_ps( +++ _mm_min_ps(_mm_add_ps(_mm_mul_ps(inputVal1, vScale), vBias), vmax_val), +++ vmin_val); +++ inputVal2 = _mm_max_ps( +++ _mm_min_ps(_mm_add_ps(_mm_mul_ps(inputVal2, vScale), vBias), vmax_val), +++ vmin_val); +++ inputVal3 = _mm_max_ps( +++ _mm_min_ps(_mm_add_ps(_mm_mul_ps(inputVal3, vScale), vBias), vmax_val), +++ vmin_val); +++ inputVal4 = _mm_max_ps( +++ _mm_min_ps(_mm_add_ps(_mm_mul_ps(inputVal4, vScale), vBias), vmax_val), +++ vmin_val); +++ +++ __m128i intInputVal1 = _mm_cvtps_epi32(inputVal1); +++ __m128i intInputVal2 = _mm_cvtps_epi32(inputVal2); +++ __m128i intInputVal3 = _mm_cvtps_epi32(inputVal3); +++ __m128i intInputVal4 = _mm_cvtps_epi32(inputVal4); +++ +++ intInputVal1 = _mm_packs_epi32(intInputVal1, intInputVal2); +++ intInputVal3 = _mm_packs_epi32(intInputVal3, intInputVal4); +++ +++ intInputVal1 = _mm_packus_epi16(intInputVal1, intInputVal3); +++ +++ _mm_storeu_si128((__m128i*)outputVectorPtr, intInputVal1); +++ outputVectorPtr += 16; +++ } +++ +++ for (unsigned int number = sixteenthPoints * 16; number < num_points; number++) { +++ const float r = inputVector[number] * scale + bias; +++ volk_32f_s32f_x2_convert_8u_single(&outputVector[number], r); +++ } +++} +++ +++#endif /* LV_HAVE_SSE2 */ +++ +++ +++#ifdef LV_HAVE_SSE +++#include +++ +++static inline void volk_32f_s32f_x2_convert_8u_u_sse(uint8_t* outputVector, +++ const float* inputVector, +++ const float scale, +++ const float bias, +++ unsigned int num_points) +++{ +++ const unsigned int quarterPoints = num_points / 4; +++ +++ const float* inputVectorPtr = (const float*)inputVector; +++ uint8_t* outputVectorPtr = outputVector; +++ +++ const float min_val = 0.0f; +++ const float max_val = UINT8_MAX; +++ const 
__m128 vmin_val = _mm_set_ps1(min_val); +++ const __m128 vmax_val = _mm_set_ps1(max_val); +++ +++ const __m128 vScale = _mm_set_ps1(scale); +++ const __m128 vBias = _mm_set_ps1(bias); +++ +++ __VOLK_ATTR_ALIGNED(16) float outputFloatBuffer[4]; +++ +++ for (unsigned int number = 0; number < quarterPoints; number++) { +++ __m128 ret = _mm_loadu_ps(inputVectorPtr); +++ inputVectorPtr += 4; +++ +++ ret = _mm_max_ps(_mm_min_ps(_mm_add_ps(_mm_mul_ps(ret, vScale), vBias), vmax_val), +++ vmin_val); +++ +++ _mm_store_ps(outputFloatBuffer, ret); +++ for (size_t inner_loop = 0; inner_loop < 4; inner_loop++) { +++ *outputVectorPtr++ = (uint8_t)(rintf(outputFloatBuffer[inner_loop])); +++ } +++ } +++ +++ for (unsigned int number = quarterPoints * 4; number < num_points; number++) { +++ const float r = inputVector[number] * scale + bias; +++ volk_32f_s32f_x2_convert_8u_single(&outputVector[number], r); +++ } +++} +++ +++#endif /* LV_HAVE_SSE */ +++ +++ +++#endif /* INCLUDED_volk_32f_s32f_x2_convert_8u_u_H */ +++#ifndef INCLUDED_volk_32f_s32f_x2_convert_8u_a_H +++#define INCLUDED_volk_32f_s32f_x2_convert_8u_a_H +++ +++#include +++#include +++ +++#if LV_HAVE_AVX2 && LV_HAVE_FMA +++#include +++ +++static inline void volk_32f_s32f_x2_convert_8u_a_avx2_fma(uint8_t* outputVector, +++ const float* inputVector, +++ const float scale, +++ const float bias, +++ unsigned int num_points) +++{ +++ const unsigned int thirtysecondPoints = num_points / 32; +++ +++ const float* inputVectorPtr = (const float*)inputVector; +++ uint8_t* outputVectorPtr = outputVector; +++ +++ const float min_val = 0.0f; +++ const float max_val = UINT8_MAX; +++ const __m256 vmin_val = _mm256_set1_ps(min_val); +++ const __m256 vmax_val = _mm256_set1_ps(max_val); +++ +++ const __m256 vScale = _mm256_set1_ps(scale); +++ const __m256 vBias = _mm256_set1_ps(bias); +++ +++ for (unsigned int number = 0; number < thirtysecondPoints; number++) { +++ __m256 inputVal1 = _mm256_load_ps(inputVectorPtr); +++ inputVectorPtr += 8; 
+++ __m256 inputVal2 = _mm256_load_ps(inputVectorPtr); +++ inputVectorPtr += 8; +++ __m256 inputVal3 = _mm256_load_ps(inputVectorPtr); +++ inputVectorPtr += 8; +++ __m256 inputVal4 = _mm256_load_ps(inputVectorPtr); +++ inputVectorPtr += 8; +++ +++ inputVal1 = _mm256_max_ps( +++ _mm256_min_ps(_mm256_fmadd_ps(inputVal1, vScale, vBias), vmax_val), vmin_val); +++ inputVal2 = _mm256_max_ps( +++ _mm256_min_ps(_mm256_fmadd_ps(inputVal2, vScale, vBias), vmax_val), vmin_val); +++ inputVal3 = _mm256_max_ps( +++ _mm256_min_ps(_mm256_fmadd_ps(inputVal3, vScale, vBias), vmax_val), vmin_val); +++ inputVal4 = _mm256_max_ps( +++ _mm256_min_ps(_mm256_fmadd_ps(inputVal4, vScale, vBias), vmax_val), vmin_val); +++ +++ __m256i intInputVal1 = _mm256_cvtps_epi32(inputVal1); +++ __m256i intInputVal2 = _mm256_cvtps_epi32(inputVal2); +++ __m256i intInputVal3 = _mm256_cvtps_epi32(inputVal3); +++ __m256i intInputVal4 = _mm256_cvtps_epi32(inputVal4); +++ +++ intInputVal1 = _mm256_packs_epi32(intInputVal1, intInputVal2); +++ intInputVal1 = _mm256_permute4x64_epi64(intInputVal1, 0b11011000); +++ intInputVal3 = _mm256_packs_epi32(intInputVal3, intInputVal4); +++ intInputVal3 = _mm256_permute4x64_epi64(intInputVal3, 0b11011000); +++ +++ intInputVal1 = _mm256_packus_epi16(intInputVal1, intInputVal3); +++ const __m256i intInputVal = _mm256_permute4x64_epi64(intInputVal1, 0b11011000); +++ +++ _mm256_store_si256((__m256i*)outputVectorPtr, intInputVal); +++ outputVectorPtr += 32; +++ } +++ +++ for (unsigned int number = thirtysecondPoints * 32; number < num_points; number++) { +++ const float r = inputVector[number] * scale + bias; +++ volk_32f_s32f_x2_convert_8u_single(&outputVector[number], r); +++ } +++} +++ +++#endif /* LV_HAVE_AVX2 && LV_HAVE_FMA */ +++ +++ +++#ifdef LV_HAVE_AVX2 +++#include +++ +++static inline void volk_32f_s32f_x2_convert_8u_a_avx2(uint8_t* outputVector, +++ const float* inputVector, +++ const float scale, +++ const float bias, +++ unsigned int num_points) +++{ +++ const 
unsigned int thirtysecondPoints = num_points / 32; +++ +++ const float* inputVectorPtr = (const float*)inputVector; +++ uint8_t* outputVectorPtr = outputVector; +++ +++ const float min_val = 0.0f; +++ const float max_val = UINT8_MAX; +++ const __m256 vmin_val = _mm256_set1_ps(min_val); +++ const __m256 vmax_val = _mm256_set1_ps(max_val); +++ +++ const __m256 vScale = _mm256_set1_ps(scale); +++ const __m256 vBias = _mm256_set1_ps(bias); +++ +++ for (unsigned int number = 0; number < thirtysecondPoints; number++) { +++ __m256 inputVal1 = _mm256_load_ps(inputVectorPtr); +++ inputVectorPtr += 8; +++ __m256 inputVal2 = _mm256_load_ps(inputVectorPtr); +++ inputVectorPtr += 8; +++ __m256 inputVal3 = _mm256_load_ps(inputVectorPtr); +++ inputVectorPtr += 8; +++ __m256 inputVal4 = _mm256_load_ps(inputVectorPtr); +++ inputVectorPtr += 8; +++ +++ inputVal1 = _mm256_max_ps( +++ _mm256_min_ps(_mm256_add_ps(_mm256_mul_ps(inputVal1, vScale), vBias), +++ vmax_val), +++ vmin_val); +++ inputVal2 = _mm256_max_ps( +++ _mm256_min_ps(_mm256_add_ps(_mm256_mul_ps(inputVal2, vScale), vBias), +++ vmax_val), +++ vmin_val); +++ inputVal3 = _mm256_max_ps( +++ _mm256_min_ps(_mm256_add_ps(_mm256_mul_ps(inputVal3, vScale), vBias), +++ vmax_val), +++ vmin_val); +++ inputVal4 = _mm256_max_ps( +++ _mm256_min_ps(_mm256_add_ps(_mm256_mul_ps(inputVal4, vScale), vBias), +++ vmax_val), +++ vmin_val); +++ +++ __m256i intInputVal1 = _mm256_cvtps_epi32(inputVal1); +++ __m256i intInputVal2 = _mm256_cvtps_epi32(inputVal2); +++ __m256i intInputVal3 = _mm256_cvtps_epi32(inputVal3); +++ __m256i intInputVal4 = _mm256_cvtps_epi32(inputVal4); +++ +++ intInputVal1 = _mm256_packs_epi32(intInputVal1, intInputVal2); +++ intInputVal1 = _mm256_permute4x64_epi64(intInputVal1, 0b11011000); +++ intInputVal3 = _mm256_packs_epi32(intInputVal3, intInputVal4); +++ intInputVal3 = _mm256_permute4x64_epi64(intInputVal3, 0b11011000); +++ +++ intInputVal1 = _mm256_packus_epi16(intInputVal1, intInputVal3); +++ const __m256i 
intInputVal = _mm256_permute4x64_epi64(intInputVal1, 0b11011000); +++ +++ _mm256_store_si256((__m256i*)outputVectorPtr, intInputVal); +++ outputVectorPtr += 32; +++ } +++ +++ for (unsigned int number = thirtysecondPoints * 32; number < num_points; number++) { +++ const float r = inputVector[number] * scale + bias; +++ volk_32f_s32f_x2_convert_8u_single(&outputVector[number], r); +++ } +++} +++ +++#endif /* LV_HAVE_AVX2 */ +++ +++ +++#ifdef LV_HAVE_SSE2 +++#include +++ +++static inline void volk_32f_s32f_x2_convert_8u_a_sse2(uint8_t* outputVector, +++ const float* inputVector, +++ const float scale, +++ const float bias, +++ unsigned int num_points) +++{ +++ const unsigned int sixteenthPoints = num_points / 16; +++ +++ const float* inputVectorPtr = (const float*)inputVector; +++ uint8_t* outputVectorPtr = outputVector; +++ +++ const float min_val = 0.0f; +++ const float max_val = UINT8_MAX; +++ const __m128 vmin_val = _mm_set_ps1(min_val); +++ const __m128 vmax_val = _mm_set_ps1(max_val); +++ +++ const __m128 vScale = _mm_set_ps1(scale); +++ const __m128 vBias = _mm_set_ps1(bias); +++ +++ for (unsigned int number = 0; number < sixteenthPoints; number++) { +++ __m128 inputVal1 = _mm_load_ps(inputVectorPtr); +++ inputVectorPtr += 4; +++ __m128 inputVal2 = _mm_load_ps(inputVectorPtr); +++ inputVectorPtr += 4; +++ __m128 inputVal3 = _mm_load_ps(inputVectorPtr); +++ inputVectorPtr += 4; +++ __m128 inputVal4 = _mm_load_ps(inputVectorPtr); +++ inputVectorPtr += 4; +++ +++ inputVal1 = _mm_max_ps( +++ _mm_min_ps(_mm_add_ps(_mm_mul_ps(inputVal1, vScale), vBias), vmax_val), +++ vmin_val); +++ inputVal2 = _mm_max_ps( +++ _mm_min_ps(_mm_add_ps(_mm_mul_ps(inputVal2, vScale), vBias), vmax_val), +++ vmin_val); +++ inputVal3 = _mm_max_ps( +++ _mm_min_ps(_mm_add_ps(_mm_mul_ps(inputVal3, vScale), vBias), vmax_val), +++ vmin_val); +++ inputVal4 = _mm_max_ps( +++ _mm_min_ps(_mm_add_ps(_mm_mul_ps(inputVal4, vScale), vBias), vmax_val), +++ vmin_val); +++ +++ __m128i intInputVal1 = 
_mm_cvtps_epi32(inputVal1); +++ __m128i intInputVal2 = _mm_cvtps_epi32(inputVal2); +++ __m128i intInputVal3 = _mm_cvtps_epi32(inputVal3); +++ __m128i intInputVal4 = _mm_cvtps_epi32(inputVal4); +++ +++ intInputVal1 = _mm_packs_epi32(intInputVal1, intInputVal2); +++ intInputVal3 = _mm_packs_epi32(intInputVal3, intInputVal4); +++ +++ intInputVal1 = _mm_packus_epi16(intInputVal1, intInputVal3); +++ +++ _mm_store_si128((__m128i*)outputVectorPtr, intInputVal1); +++ outputVectorPtr += 16; +++ } +++ +++ for (unsigned int number = sixteenthPoints * 16; number < num_points; number++) { +++ const float r = inputVector[number] * scale + bias; +++ volk_32f_s32f_x2_convert_8u_single(&outputVector[number], r); +++ } +++} +++#endif /* LV_HAVE_SSE2 */ +++ +++ +++#ifdef LV_HAVE_SSE +++#include +++ +++static inline void volk_32f_s32f_x2_convert_8u_a_sse(uint8_t* outputVector, +++ const float* inputVector, +++ const float scale, +++ const float bias, +++ unsigned int num_points) +++{ +++ const unsigned int quarterPoints = num_points / 4; +++ +++ const float* inputVectorPtr = (const float*)inputVector; +++ uint8_t* outputVectorPtr = outputVector; +++ +++ const float min_val = 0.0f; +++ const float max_val = UINT8_MAX; +++ const __m128 vmin_val = _mm_set_ps1(min_val); +++ const __m128 vmax_val = _mm_set_ps1(max_val); +++ +++ const __m128 vScalar = _mm_set_ps1(scale); +++ const __m128 vBias = _mm_set_ps1(bias); +++ +++ __VOLK_ATTR_ALIGNED(16) float outputFloatBuffer[4]; +++ +++ for (unsigned int number = 0; number < quarterPoints; number++) { +++ __m128 ret = _mm_load_ps(inputVectorPtr); +++ inputVectorPtr += 4; +++ +++ ret = _mm_max_ps( +++ _mm_min_ps(_mm_add_ps(_mm_mul_ps(ret, vScalar), vBias), vmax_val), vmin_val); +++ +++ _mm_store_ps(outputFloatBuffer, ret); +++ for (size_t inner_loop = 0; inner_loop < 4; inner_loop++) { +++ *outputVectorPtr++ = (uint8_t)(rintf(outputFloatBuffer[inner_loop])); +++ } +++ } +++ +++ for (unsigned int number = quarterPoints * 4; number < num_points; 
number++) { +++ const float r = inputVector[number] * scale + bias; +++ volk_32f_s32f_x2_convert_8u_single(&outputVector[number], r); +++ } +++} +++ +++#endif /* LV_HAVE_SSE */ +++ +++ +++#endif /* INCLUDED_volk_32f_s32f_x2_convert_8u_a_H */ ++diff --git a/lib/kernel_tests.h b/lib/kernel_tests.h ++index 72330e4..d7e4ad1 100644 ++--- a/lib/kernel_tests.h +++++ b/lib/kernel_tests.h ++@@ -173,6 +173,8 @@ std::vector init_test_list(volk_test_params_t test_params) ++ QA(VOLK_INIT_PUPP(volk_32fc_s32f_power_spectral_densitypuppet_32f, ++ volk_32fc_s32f_x2_power_spectral_density_32f, ++ test_params)) +++ QA(VOLK_INIT_PUPP( +++ volk_32f_s32f_convertpuppet_8u, volk_32f_s32f_x2_convert_8u, test_params)) ++ // no one uses these, so don't test them ++ // VOLK_PROFILE(volk_16i_x5_add_quad_16i_x4, 1e-4, 2046, 10000, &results, ++ // benchmark_mode, kernel_regex); VOLK_PROFILE(volk_16i_branch_4_state_8, 1e-4, 2046, ++-- ++2.39.2 ++ diff --cc debian/patches/0005-volk_32f_s32f_convert_8i-code-style.patch index 0000000,0000000..9ec6119 new file mode 100644 --- /dev/null +++ b/debian/patches/0005-volk_32f_s32f_convert_8i-code-style.patch @@@ -1,0 -1,0 +1,540 @@@ ++From 59c9539bfd1dff6d3e7b6c9f6ddd9588d4fcb661 Mon Sep 17 00:00:00 2001 ++From: =?UTF-8?q?Daniel=20Est=C3=A9vez?= ++Date: Sun, 22 Jan 2023 16:10:27 +0100 ++Subject: [PATCH 5/5] volk_32f_s32f_convert_8i: code style ++MIME-Version: 1.0 ++Content-Type: text/plain; charset=UTF-8 ++Content-Transfer-Encoding: 8bit ++ ++Apply code style suggestions from #617. ++ ++Signed-off-by: Daniel Estévez ++--- ++ kernels/volk/volk_32f_s32f_convert_8i.h | 283 +++++++++--------------- ++ 1 file changed, 110 insertions(+), 173 deletions(-) ++ ++diff --git a/kernels/volk/volk_32f_s32f_convert_8i.h b/kernels/volk/volk_32f_s32f_convert_8i.h ++index 4d7c5ca..d47f95a 100644 ++--- a/kernels/volk/volk_32f_s32f_convert_8i.h +++++ b/kernels/volk/volk_32f_s32f_convert_8i.h ++@@ -30,12 +30,12 @@ ++ * \li outputVector: The output vector. 
++ * ++ * \b Example ++- * Convert floats from [-1,1] to 16-bit integers with a scale of 5 to maintain smallest +++ * Convert floats from [-1,1] to 8-bit integers with a scale of 5 to maintain smallest ++ delta ++ * int N = 10; ++ * unsigned int alignment = volk_get_alignment(); ++ * float* increasing = (float*)volk_malloc(sizeof(float)*N, alignment); ++- * int16_t* out = (int16_t*)volk_malloc(sizeof(int16_t)*N, alignment); +++ * int8_t* out = (int8_t*)volk_malloc(sizeof(int8_t)*N, alignment); ++ * ++ * for(unsigned int ii = 0; ii < N; ++ii){ ++ * increasing[ii] = 2.f * ((float)ii / (float)N) - 1.f; ++@@ -46,7 +46,7 @@ ++ ++ * float scale = 5.1f; ++ * ++- * volk_32f_s32f_convert_32i(out, increasing, scale, N); +++ * volk_32f_s32f_convert_8i(out, increasing, scale, N); ++ * ++ * for(unsigned int ii = 0; ii < N; ++ii){ ++ * printf("out[%u] = %i\n", ii, out[ii]); ++@@ -61,12 +61,11 @@ ++ #define INCLUDED_volk_32f_s32f_convert_8i_u_H ++ ++ #include ++-#include ++ ++ static inline void volk_32f_s32f_convert_8i_single(int8_t* out, const float in) ++ { ++- float min_val = INT8_MIN; ++- float max_val = INT8_MAX; +++ const float min_val = INT8_MIN; +++ const float max_val = INT8_MAX; ++ if (in > max_val) { ++ *out = (int8_t)(max_val); ++ } else if (in < min_val) { ++@@ -76,6 +75,24 @@ static inline void volk_32f_s32f_convert_8i_single(int8_t* out, const float in) ++ } ++ } ++ +++#ifdef LV_HAVE_GENERIC +++ +++static inline void volk_32f_s32f_convert_8i_generic(int8_t* outputVector, +++ const float* inputVector, +++ const float scalar, +++ unsigned int num_points) +++{ +++ const float* inputVectorPtr = inputVector; +++ +++ for (unsigned int number = 0; number < num_points; number++) { +++ const float r = *inputVectorPtr++ * scalar; +++ volk_32f_s32f_convert_8i_single(&outputVector[number], r); +++ } +++} +++ +++#endif /* LV_HAVE_GENERIC */ +++ +++ ++ #ifdef LV_HAVE_AVX2 ++ #include ++ ++@@ -84,32 +101,26 @@ static inline void volk_32f_s32f_convert_8i_u_avx2(int8_t* 
outputVector, ++ const float scalar, ++ unsigned int num_points) ++ { ++- unsigned int number = 0; ++- ++ const unsigned int thirtysecondPoints = num_points / 32; ++ ++ const float* inputVectorPtr = (const float*)inputVector; ++ int8_t* outputVectorPtr = outputVector; ++ ++- float min_val = INT8_MIN; ++- float max_val = INT8_MAX; ++- float r; +++ const float min_val = INT8_MIN; +++ const float max_val = INT8_MAX; +++ const __m256 vmin_val = _mm256_set1_ps(min_val); +++ const __m256 vmax_val = _mm256_set1_ps(max_val); ++ ++- __m256 vScalar = _mm256_set1_ps(scalar); ++- __m256 inputVal1, inputVal2, inputVal3, inputVal4; ++- __m256i intInputVal1, intInputVal2, intInputVal3, intInputVal4; ++- __m256 vmin_val = _mm256_set1_ps(min_val); ++- __m256 vmax_val = _mm256_set1_ps(max_val); ++- __m256i intInputVal; +++ const __m256 vScalar = _mm256_set1_ps(scalar); ++ ++- for (; number < thirtysecondPoints; number++) { ++- inputVal1 = _mm256_loadu_ps(inputVectorPtr); +++ for (unsigned int number = 0; number < thirtysecondPoints; number++) { +++ __m256 inputVal1 = _mm256_loadu_ps(inputVectorPtr); ++ inputVectorPtr += 8; ++- inputVal2 = _mm256_loadu_ps(inputVectorPtr); +++ __m256 inputVal2 = _mm256_loadu_ps(inputVectorPtr); ++ inputVectorPtr += 8; ++- inputVal3 = _mm256_loadu_ps(inputVectorPtr); +++ __m256 inputVal3 = _mm256_loadu_ps(inputVectorPtr); ++ inputVectorPtr += 8; ++- inputVal4 = _mm256_loadu_ps(inputVectorPtr); +++ __m256 inputVal4 = _mm256_loadu_ps(inputVectorPtr); ++ inputVectorPtr += 8; ++ ++ inputVal1 = _mm256_max_ps( ++@@ -121,10 +132,10 @@ static inline void volk_32f_s32f_convert_8i_u_avx2(int8_t* outputVector, ++ inputVal4 = _mm256_max_ps( ++ _mm256_min_ps(_mm256_mul_ps(inputVal4, vScalar), vmax_val), vmin_val); ++ ++- intInputVal1 = _mm256_cvtps_epi32(inputVal1); ++- intInputVal2 = _mm256_cvtps_epi32(inputVal2); ++- intInputVal3 = _mm256_cvtps_epi32(inputVal3); ++- intInputVal4 = _mm256_cvtps_epi32(inputVal4); +++ __m256i intInputVal1 = 
_mm256_cvtps_epi32(inputVal1); +++ __m256i intInputVal2 = _mm256_cvtps_epi32(inputVal2); +++ __m256i intInputVal3 = _mm256_cvtps_epi32(inputVal3); +++ __m256i intInputVal4 = _mm256_cvtps_epi32(inputVal4); ++ ++ intInputVal1 = _mm256_packs_epi32(intInputVal1, intInputVal2); ++ intInputVal1 = _mm256_permute4x64_epi64(intInputVal1, 0b11011000); ++@@ -132,15 +143,14 @@ static inline void volk_32f_s32f_convert_8i_u_avx2(int8_t* outputVector, ++ intInputVal3 = _mm256_permute4x64_epi64(intInputVal3, 0b11011000); ++ ++ intInputVal1 = _mm256_packs_epi16(intInputVal1, intInputVal3); ++- intInputVal = _mm256_permute4x64_epi64(intInputVal1, 0b11011000); +++ const __m256i intInputVal = _mm256_permute4x64_epi64(intInputVal1, 0b11011000); ++ ++ _mm256_storeu_si256((__m256i*)outputVectorPtr, intInputVal); ++ outputVectorPtr += 32; ++ } ++ ++- number = thirtysecondPoints * 32; ++- for (; number < num_points; number++) { ++- r = inputVector[number] * scalar; +++ for (unsigned int number = thirtysecondPoints * 32; number < num_points; number++) { +++ float r = inputVector[number] * scalar; ++ volk_32f_s32f_convert_8i_single(&outputVector[number], r); ++ } ++ } ++@@ -156,31 +166,26 @@ static inline void volk_32f_s32f_convert_8i_u_sse2(int8_t* outputVector, ++ const float scalar, ++ unsigned int num_points) ++ { ++- unsigned int number = 0; ++- ++ const unsigned int sixteenthPoints = num_points / 16; ++ ++ const float* inputVectorPtr = (const float*)inputVector; ++ int8_t* outputVectorPtr = outputVector; ++ ++- float min_val = INT8_MIN; ++- float max_val = INT8_MAX; ++- float r; +++ const float min_val = INT8_MIN; +++ const float max_val = INT8_MAX; +++ const __m128 vmin_val = _mm_set_ps1(min_val); +++ const __m128 vmax_val = _mm_set_ps1(max_val); ++ ++- __m128 vScalar = _mm_set_ps1(scalar); ++- __m128 inputVal1, inputVal2, inputVal3, inputVal4; ++- __m128i intInputVal1, intInputVal2, intInputVal3, intInputVal4; ++- __m128 vmin_val = _mm_set_ps1(min_val); ++- __m128 vmax_val = 
_mm_set_ps1(max_val); +++ const __m128 vScalar = _mm_set_ps1(scalar); ++ ++- for (; number < sixteenthPoints; number++) { ++- inputVal1 = _mm_loadu_ps(inputVectorPtr); +++ for (unsigned int number = 0; number < sixteenthPoints; number++) { +++ __m128 inputVal1 = _mm_loadu_ps(inputVectorPtr); ++ inputVectorPtr += 4; ++- inputVal2 = _mm_loadu_ps(inputVectorPtr); +++ __m128 inputVal2 = _mm_loadu_ps(inputVectorPtr); ++ inputVectorPtr += 4; ++- inputVal3 = _mm_loadu_ps(inputVectorPtr); +++ __m128 inputVal3 = _mm_loadu_ps(inputVectorPtr); ++ inputVectorPtr += 4; ++- inputVal4 = _mm_loadu_ps(inputVectorPtr); +++ __m128 inputVal4 = _mm_loadu_ps(inputVectorPtr); ++ inputVectorPtr += 4; ++ ++ inputVal1 = ++@@ -192,10 +197,10 @@ static inline void volk_32f_s32f_convert_8i_u_sse2(int8_t* outputVector, ++ inputVal4 = ++ _mm_max_ps(_mm_min_ps(_mm_mul_ps(inputVal4, vScalar), vmax_val), vmin_val); ++ ++- intInputVal1 = _mm_cvtps_epi32(inputVal1); ++- intInputVal2 = _mm_cvtps_epi32(inputVal2); ++- intInputVal3 = _mm_cvtps_epi32(inputVal3); ++- intInputVal4 = _mm_cvtps_epi32(inputVal4); +++ __m128i intInputVal1 = _mm_cvtps_epi32(inputVal1); +++ __m128i intInputVal2 = _mm_cvtps_epi32(inputVal2); +++ __m128i intInputVal3 = _mm_cvtps_epi32(inputVal3); +++ __m128i intInputVal4 = _mm_cvtps_epi32(inputVal4); ++ ++ intInputVal1 = _mm_packs_epi32(intInputVal1, intInputVal2); ++ intInputVal3 = _mm_packs_epi32(intInputVal3, intInputVal4); ++@@ -206,9 +211,8 @@ static inline void volk_32f_s32f_convert_8i_u_sse2(int8_t* outputVector, ++ outputVectorPtr += 16; ++ } ++ ++- number = sixteenthPoints * 16; ++- for (; number < num_points; number++) { ++- r = inputVector[number] * scalar; +++ for (unsigned int number = sixteenthPoints * 16; number < num_points; number++) { +++ const float r = inputVector[number] * scalar; ++ volk_32f_s32f_convert_8i_single(&outputVector[number], r); ++ } ++ } ++@@ -224,40 +228,34 @@ static inline void volk_32f_s32f_convert_8i_u_sse(int8_t* outputVector, ++ const float 
scalar, ++ unsigned int num_points) ++ { ++- unsigned int number = 0; ++- size_t inner_loop; ++- ++ const unsigned int quarterPoints = num_points / 4; ++ ++ const float* inputVectorPtr = (const float*)inputVector; ++ int8_t* outputVectorPtr = outputVector; ++ ++- float min_val = INT8_MIN; ++- float max_val = INT8_MAX; ++- float r; +++ const float min_val = INT8_MIN; +++ const float max_val = INT8_MAX; +++ const __m128 vmin_val = _mm_set_ps1(min_val); +++ const __m128 vmax_val = _mm_set_ps1(max_val); ++ ++- __m128 vScalar = _mm_set_ps1(scalar); ++- __m128 ret; ++- __m128 vmin_val = _mm_set_ps1(min_val); ++- __m128 vmax_val = _mm_set_ps1(max_val); +++ const __m128 vScalar = _mm_set_ps1(scalar); ++ ++ __VOLK_ATTR_ALIGNED(16) float outputFloatBuffer[4]; ++ ++- for (; number < quarterPoints; number++) { ++- ret = _mm_loadu_ps(inputVectorPtr); +++ for (unsigned int number = 0; number < quarterPoints; number++) { +++ __m128 ret = _mm_loadu_ps(inputVectorPtr); ++ inputVectorPtr += 4; ++ ++ ret = _mm_max_ps(_mm_min_ps(_mm_mul_ps(ret, vScalar), vmax_val), vmin_val); ++ ++ _mm_store_ps(outputFloatBuffer, ret); ++- for (inner_loop = 0; inner_loop < 4; inner_loop++) { +++ for (size_t inner_loop = 0; inner_loop < 4; inner_loop++) { ++ *outputVectorPtr++ = (int8_t)(rintf(outputFloatBuffer[inner_loop])); ++ } ++ } ++ ++- number = quarterPoints * 4; ++- for (; number < num_points; number++) { ++- r = inputVector[number] * scalar; +++ for (unsigned int number = quarterPoints * 4; number < num_points; number++) { +++ const float r = inputVector[number] * scalar; ++ volk_32f_s32f_convert_8i_single(&outputVector[number], r); ++ } ++ } ++@@ -265,33 +263,11 @@ static inline void volk_32f_s32f_convert_8i_u_sse(int8_t* outputVector, ++ #endif /* LV_HAVE_SSE */ ++ ++ ++-#ifdef LV_HAVE_GENERIC ++- ++-static inline void volk_32f_s32f_convert_8i_generic(int8_t* outputVector, ++- const float* inputVector, ++- const float scalar, ++- unsigned int num_points) ++-{ ++- const float* inputVectorPtr 
= inputVector; ++- unsigned int number = 0; ++- float r; ++- ++- for (number = 0; number < num_points; number++) { ++- r = *inputVectorPtr++ * scalar; ++- volk_32f_s32f_convert_8i_single(&outputVector[number], r); ++- } ++-} ++- ++-#endif /* LV_HAVE_GENERIC */ ++- ++- ++ #endif /* INCLUDED_volk_32f_s32f_convert_8i_u_H */ ++ #ifndef INCLUDED_volk_32f_s32f_convert_8i_a_H ++ #define INCLUDED_volk_32f_s32f_convert_8i_a_H ++ ++ #include ++-#include ++-#include ++ ++ #ifdef LV_HAVE_AVX2 ++ #include ++@@ -301,32 +277,26 @@ static inline void volk_32f_s32f_convert_8i_a_avx2(int8_t* outputVector, ++ const float scalar, ++ unsigned int num_points) ++ { ++- unsigned int number = 0; ++- ++ const unsigned int thirtysecondPoints = num_points / 32; ++ ++ const float* inputVectorPtr = (const float*)inputVector; ++ int8_t* outputVectorPtr = outputVector; ++ ++- float min_val = INT8_MIN; ++- float max_val = INT8_MAX; ++- float r; +++ const float min_val = INT8_MIN; +++ const float max_val = INT8_MAX; +++ const __m256 vmin_val = _mm256_set1_ps(min_val); +++ const __m256 vmax_val = _mm256_set1_ps(max_val); ++ ++- __m256 vScalar = _mm256_set1_ps(scalar); ++- __m256 inputVal1, inputVal2, inputVal3, inputVal4; ++- __m256i intInputVal1, intInputVal2, intInputVal3, intInputVal4; ++- __m256 vmin_val = _mm256_set1_ps(min_val); ++- __m256 vmax_val = _mm256_set1_ps(max_val); ++- __m256i intInputVal; +++ const __m256 vScalar = _mm256_set1_ps(scalar); ++ ++- for (; number < thirtysecondPoints; number++) { ++- inputVal1 = _mm256_load_ps(inputVectorPtr); +++ for (unsigned int number = 0; number < thirtysecondPoints; number++) { +++ __m256 inputVal1 = _mm256_load_ps(inputVectorPtr); ++ inputVectorPtr += 8; ++- inputVal2 = _mm256_load_ps(inputVectorPtr); +++ __m256 inputVal2 = _mm256_load_ps(inputVectorPtr); ++ inputVectorPtr += 8; ++- inputVal3 = _mm256_load_ps(inputVectorPtr); +++ __m256 inputVal3 = _mm256_load_ps(inputVectorPtr); ++ inputVectorPtr += 8; ++- inputVal4 = 
_mm256_load_ps(inputVectorPtr); +++ __m256 inputVal4 = _mm256_load_ps(inputVectorPtr); ++ inputVectorPtr += 8; ++ ++ inputVal1 = _mm256_max_ps( ++@@ -338,10 +308,10 @@ static inline void volk_32f_s32f_convert_8i_a_avx2(int8_t* outputVector, ++ inputVal4 = _mm256_max_ps( ++ _mm256_min_ps(_mm256_mul_ps(inputVal4, vScalar), vmax_val), vmin_val); ++ ++- intInputVal1 = _mm256_cvtps_epi32(inputVal1); ++- intInputVal2 = _mm256_cvtps_epi32(inputVal2); ++- intInputVal3 = _mm256_cvtps_epi32(inputVal3); ++- intInputVal4 = _mm256_cvtps_epi32(inputVal4); +++ __m256i intInputVal1 = _mm256_cvtps_epi32(inputVal1); +++ __m256i intInputVal2 = _mm256_cvtps_epi32(inputVal2); +++ __m256i intInputVal3 = _mm256_cvtps_epi32(inputVal3); +++ __m256i intInputVal4 = _mm256_cvtps_epi32(inputVal4); ++ ++ intInputVal1 = _mm256_packs_epi32(intInputVal1, intInputVal2); ++ intInputVal1 = _mm256_permute4x64_epi64(intInputVal1, 0b11011000); ++@@ -349,15 +319,14 @@ static inline void volk_32f_s32f_convert_8i_a_avx2(int8_t* outputVector, ++ intInputVal3 = _mm256_permute4x64_epi64(intInputVal3, 0b11011000); ++ ++ intInputVal1 = _mm256_packs_epi16(intInputVal1, intInputVal3); ++- intInputVal = _mm256_permute4x64_epi64(intInputVal1, 0b11011000); +++ __m256i intInputVal = _mm256_permute4x64_epi64(intInputVal1, 0b11011000); ++ ++ _mm256_store_si256((__m256i*)outputVectorPtr, intInputVal); ++ outputVectorPtr += 32; ++ } ++ ++- number = thirtysecondPoints * 32; ++- for (; number < num_points; number++) { ++- r = inputVector[number] * scalar; +++ for (unsigned int number = thirtysecondPoints * 32; number < num_points; number++) { +++ const float r = inputVector[number] * scalar; ++ volk_32f_s32f_convert_8i_single(&outputVector[number], r); ++ } ++ } ++@@ -373,31 +342,26 @@ static inline void volk_32f_s32f_convert_8i_a_sse2(int8_t* outputVector, ++ const float scalar, ++ unsigned int num_points) ++ { ++- unsigned int number = 0; ++- ++ const unsigned int sixteenthPoints = num_points / 16; ++ ++ const float* 
inputVectorPtr = (const float*)inputVector; ++ int8_t* outputVectorPtr = outputVector; ++ ++- float min_val = INT8_MIN; ++- float max_val = INT8_MAX; ++- float r; +++ const float min_val = INT8_MIN; +++ const float max_val = INT8_MAX; +++ const __m128 vmin_val = _mm_set_ps1(min_val); +++ const __m128 vmax_val = _mm_set_ps1(max_val); ++ ++- __m128 vScalar = _mm_set_ps1(scalar); ++- __m128 inputVal1, inputVal2, inputVal3, inputVal4; ++- __m128i intInputVal1, intInputVal2, intInputVal3, intInputVal4; ++- __m128 vmin_val = _mm_set_ps1(min_val); ++- __m128 vmax_val = _mm_set_ps1(max_val); +++ const __m128 vScalar = _mm_set_ps1(scalar); ++ ++- for (; number < sixteenthPoints; number++) { ++- inputVal1 = _mm_load_ps(inputVectorPtr); +++ for (unsigned int number = 0; number < sixteenthPoints; number++) { +++ __m128 inputVal1 = _mm_load_ps(inputVectorPtr); ++ inputVectorPtr += 4; ++- inputVal2 = _mm_load_ps(inputVectorPtr); +++ __m128 inputVal2 = _mm_load_ps(inputVectorPtr); ++ inputVectorPtr += 4; ++- inputVal3 = _mm_load_ps(inputVectorPtr); +++ __m128 inputVal3 = _mm_load_ps(inputVectorPtr); ++ inputVectorPtr += 4; ++- inputVal4 = _mm_load_ps(inputVectorPtr); +++ __m128 inputVal4 = _mm_load_ps(inputVectorPtr); ++ inputVectorPtr += 4; ++ ++ inputVal1 = ++@@ -409,10 +373,10 @@ static inline void volk_32f_s32f_convert_8i_a_sse2(int8_t* outputVector, ++ inputVal4 = ++ _mm_max_ps(_mm_min_ps(_mm_mul_ps(inputVal4, vScalar), vmax_val), vmin_val); ++ ++- intInputVal1 = _mm_cvtps_epi32(inputVal1); ++- intInputVal2 = _mm_cvtps_epi32(inputVal2); ++- intInputVal3 = _mm_cvtps_epi32(inputVal3); ++- intInputVal4 = _mm_cvtps_epi32(inputVal4); +++ __m128i intInputVal1 = _mm_cvtps_epi32(inputVal1); +++ __m128i intInputVal2 = _mm_cvtps_epi32(inputVal2); +++ __m128i intInputVal3 = _mm_cvtps_epi32(inputVal3); +++ __m128i intInputVal4 = _mm_cvtps_epi32(inputVal4); ++ ++ intInputVal1 = _mm_packs_epi32(intInputVal1, intInputVal2); ++ intInputVal3 = _mm_packs_epi32(intInputVal3, intInputVal4); 
++@@ -423,9 +387,8 @@ static inline void volk_32f_s32f_convert_8i_a_sse2(int8_t* outputVector, ++ outputVectorPtr += 16; ++ } ++ ++- number = sixteenthPoints * 16; ++- for (; number < num_points; number++) { ++- r = inputVector[number] * scalar; +++ for (unsigned int number = sixteenthPoints * 16; number < num_points; number++) { +++ const float r = inputVector[number] * scalar; ++ volk_32f_s32f_convert_8i_single(&outputVector[number], r); ++ } ++ } ++@@ -440,40 +403,34 @@ static inline void volk_32f_s32f_convert_8i_a_sse(int8_t* outputVector, ++ const float scalar, ++ unsigned int num_points) ++ { ++- unsigned int number = 0; ++- size_t inner_loop; ++- ++ const unsigned int quarterPoints = num_points / 4; ++ ++ const float* inputVectorPtr = (const float*)inputVector; +++ int8_t* outputVectorPtr = outputVector; ++ ++- float min_val = INT8_MIN; ++- float max_val = INT8_MAX; ++- float r; +++ const float min_val = INT8_MIN; +++ const float max_val = INT8_MAX; +++ const __m128 vmin_val = _mm_set_ps1(min_val); +++ const __m128 vmax_val = _mm_set_ps1(max_val); ++ ++- int8_t* outputVectorPtr = outputVector; ++- __m128 vScalar = _mm_set_ps1(scalar); ++- __m128 ret; ++- __m128 vmin_val = _mm_set_ps1(min_val); ++- __m128 vmax_val = _mm_set_ps1(max_val); +++ const __m128 vScalar = _mm_set_ps1(scalar); ++ ++ __VOLK_ATTR_ALIGNED(16) float outputFloatBuffer[4]; ++ ++- for (; number < quarterPoints; number++) { ++- ret = _mm_load_ps(inputVectorPtr); +++ for (unsigned int number = 0; number < quarterPoints; number++) { +++ __m128 ret = _mm_load_ps(inputVectorPtr); ++ inputVectorPtr += 4; ++ ++ ret = _mm_max_ps(_mm_min_ps(_mm_mul_ps(ret, vScalar), vmax_val), vmin_val); ++ ++ _mm_store_ps(outputFloatBuffer, ret); ++- for (inner_loop = 0; inner_loop < 4; inner_loop++) { +++ for (size_t inner_loop = 0; inner_loop < 4; inner_loop++) { ++ *outputVectorPtr++ = (int8_t)(rintf(outputFloatBuffer[inner_loop])); ++ } ++ } ++ ++- number = quarterPoints * 4; ++- for (; number < num_points; 
number++) { ++- r = inputVector[number] * scalar; +++ for (unsigned int number = quarterPoints * 4; number < num_points; number++) { +++ const float r = inputVector[number] * scalar; ++ volk_32f_s32f_convert_8i_single(&outputVector[number], r); ++ } ++ } ++@@ -481,24 +438,4 @@ static inline void volk_32f_s32f_convert_8i_a_sse(int8_t* outputVector, ++ #endif /* LV_HAVE_SSE */ ++ ++ ++-#ifdef LV_HAVE_GENERIC ++- ++-static inline void volk_32f_s32f_convert_8i_a_generic(int8_t* outputVector, ++- const float* inputVector, ++- const float scalar, ++- unsigned int num_points) ++-{ ++- const float* inputVectorPtr = inputVector; ++- unsigned int number = 0; ++- float r; ++- ++- for (number = 0; number < num_points; number++) { ++- r = *inputVectorPtr++ * scalar; ++- volk_32f_s32f_convert_8i_single(&outputVector[number], r); ++- } ++-} ++- ++-#endif /* LV_HAVE_GENERIC */ ++- ++- ++ #endif /* INCLUDED_volk_32f_s32f_convert_8i_a_H */ ++-- ++2.39.2 ++ diff --cc debian/patches/disable-neon index 0000000,0000000..73227b3 new file mode 100644 --- /dev/null +++ b/debian/patches/disable-neon @@@ -1,0 -1,0 +1,24 @@@ ++Description: Disable neon. 
++Author: Peter Michael Green ++ ++--- volk-2.1.0.orig/lib/CMakeLists.txt +++++ volk-2.1.0/lib/CMakeLists.txt ++@@ -273,7 +273,8 @@ else(neon_compile_result) ++ OVERRULE_ARCH(neonv8 "Compiler doesn't support NEON") ++ endif(neon_compile_result) ++ ++-######################################################################## +++OVERRULE_ARCH(neon "We don't want neon on raspbian") +++ ++ # implement overruling in the ORC case, ++ # since ORC always passes flag detection ++ ######################################################################## ++@@ -441,7 +442,7 @@ set(FULL_C_FLAGS "${CMAKE_C_FLAGS}" "${C ++ # set up the assembler flags and include the source files ++ foreach(ARCH ${ASM_ARCHS_AVAILABLE}) ++ string(REGEX MATCH "${ARCH}" ASM_ARCH "${available_archs}") ++-if( ASM_ARCH STREQUAL "neonv7" ) +++if( ASM_ARCH STREQUAL "neonv7xxxxxxxxxx" ) ++ message(STATUS "---- Adding ASM files") # we always use ATT syntax ++ message(STATUS "-- Detected neon architecture; enabling ASM") ++ # architecture specific assembler flags are now set in the cmake toolchain file diff --cc debian/patches/omit-build-paths index 0000000,0000000..c7efd9e new file mode 100644 --- /dev/null +++ b/debian/patches/omit-build-paths @@@ -1,0 -1,0 +1,28 @@@ ++From f3ee79c2c366f79e3e59b5deec438d4145e12cd3 Mon Sep 17 00:00:00 2001 ++From: "A. Maitland Bottoms" ++Date: Sun, 4 Sep 2022 21:20:13 -0400 ++Subject: [PATCH] omit build path ++ ++Have CMake filter out build path from COMPILER_INFO ++before using the string in constants.c ++ ++Signed-off-by: A. 
Maitland Bottoms ++--- ++ lib/CMakeLists.txt | 1 + ++ 1 file changed, 1 insertion(+) ++ ++diff --git a/lib/CMakeLists.txt b/lib/CMakeLists.txt ++index 75055ee..412cb11 100644 ++--- a/lib/CMakeLists.txt +++++ b/lib/CMakeLists.txt ++@@ -454,6 +454,7 @@ message(STATUS "Loading version ${VERSION} into constants...") ++ ++ #double escape for windows backslash path separators ++ string(REPLACE "\\" "\\\\" prefix "${prefix}") +++string(REPLACE "${CMAKE_SOURCE_DIR}" "$BUILD_DIR" COMPILER_INFO "${COMPILER_INFO}") ++ ++ configure_file( ++ ${CMAKE_CURRENT_SOURCE_DIR}/constants.c.in ++-- ++2.35.1 ++ diff --cc debian/patches/omit-doxygen-build-paths index 0000000,0000000..96441c9 new file mode 100644 --- /dev/null +++ b/debian/patches/omit-doxygen-build-paths @@@ -1,0 -1,0 +1,82 @@@ ++From 58cc2b105211f0e5beab4dc228b478ebe105be06 Mon Sep 17 00:00:00 2001 ++From: "A. Maitland Bottoms" ++Date: Sun, 4 Sep 2022 21:37:45 -0400 ++Subject: [PATCH] omit doxygen build paths ++ ++Use reproducible-builds friendly configuration settings. ++ ++Signed-off-by: A. Maitland Bottoms ++--- ++ docs/Doxyfile.in | 14 +++++++------- ++ 1 file changed, 7 insertions(+), 7 deletions(-) ++ ++diff --git a/docs/Doxyfile.in b/docs/Doxyfile.in ++index 70913f5..a0fb23f 100644 ++--- a/docs/Doxyfile.in +++++ b/docs/Doxyfile.in ++@@ -157,7 +157,7 @@ FULL_PATH_NAMES = YES ++ # will be relative from the directory where doxygen is started. ++ # This tag requires that the tag FULL_PATH_NAMES is set to YES. ++ ++-STRIP_FROM_PATH = +++STRIP_FROM_PATH = @CMAKE_BINARY_DIR@ @CMAKE_SOURCE_DIR@ ++ ++ # The STRIP_FROM_INC_PATH tag can be used to strip a user-defined part of the ++ # path mentioned in the documentation of a class, which tells the reader which ++@@ -166,7 +166,7 @@ STRIP_FROM_PATH = ++ # specify the list of include paths that are normally passed to the compiler ++ # using the -I flag. 
++ ++-STRIP_FROM_INC_PATH = +++STRIP_FROM_INC_PATH = @CMAKE_SOURCE_DIR@/include @CMAKE_BINARY_DIR@ @CMAKE_SOURCE_DIR@/lib ++ ++ # If the SHORT_NAMES tag is set to YES, doxygen will generate much shorter (but ++ # less readable) file names. This can be useful is your file systems doesn't ++@@ -637,7 +637,7 @@ MAX_INITIALIZER_LINES = 30 ++ # will mention the files that were used to generate the documentation. ++ # The default value is: YES. ++ ++-SHOW_USED_FILES = YES +++SHOW_USED_FILES = NO ++ ++ # Set the SHOW_FILES tag to NO to disable the generation of the Files page. This ++ # will remove the Files entry from the Quick Index and from the Folder Tree View ++@@ -832,7 +832,7 @@ RECURSIVE = YES ++ # Note that relative paths are relative to the directory from which doxygen is ++ # run. ++ ++-EXCLUDE = @CMAKE_BINARY_DIR@ @CMAKE_SOURCE_DIR@/cpu_features @CMAKE_SOURCE_DIR@/README.md @CMAKE_SOURCE_DIR@/docs/AUTHORS_RESUBMITTING_UNDER_LGPL_LICENSE.md +++EXCLUDE = @CMAKE_BINARY_DIR@ @CMAKE_SOURCE_DIR@/cpu_features @CMAKE_SOURCE_DIR@/README.md @CMAKE_SOURCE_DIR@/cmake @CMAKE_SOURCE_DIR@/docs/AUTHORS_RESUBMITTING_UNDER_LGPL_LICENSE.md @CMAKE_SOURCE_DIR@/apps @CMAKE_SOURCE_DIR@/lib/*qa* @CMAKE_SOURCE_DIR@/tmpl ++ ++ # The EXCLUDE_SYMLINKS tag can be used to select whether or not files or ++ # directories that are symbolic links (a Unix file system feature) are excluded ++@@ -979,7 +979,7 @@ REFERENCES_RELATION = NO ++ # link to the documentation. ++ # The default value is: YES. ++ ++-REFERENCES_LINK_SOURCE = YES +++REFERENCES_LINK_SOURCE = NO ++ ++ # If SOURCE_TOOLTIPS is enabled (the default) then hovering a hyperlink in the ++ # source code will show a tooltip with additional information such as prototype, ++@@ -989,7 +989,7 @@ REFERENCES_LINK_SOURCE = YES ++ # The default value is: YES. ++ # This tag requires that the tag SOURCE_BROWSER is set to YES. 
++ ++-SOURCE_TOOLTIPS = YES +++SOURCE_TOOLTIPS = NO ++ ++ # If the USE_HTAGS tag is set to YES then the references to source code will ++ # point to the HTML generated by the htags(1) tool instead of doxygen built-in ++@@ -1099,7 +1099,7 @@ HTML_HEADER = ++ # that doxygen normally uses. ++ # This tag requires that the tag GENERATE_HTML is set to YES. ++ ++-HTML_FOOTER = +++HTML_FOOTER = "" ++ ++ # The HTML_STYLESHEET tag can be used to specify a user-defined cascading style ++ # sheet that is used by each HTML page. It can be used to fine-tune the look of ++-- ++2.35.1 ++ diff --cc debian/patches/optional-static-apps index 0000000,0000000..5f52e57 new file mode 100644 --- /dev/null +++ b/debian/patches/optional-static-apps @@@ -1,0 -1,0 +1,24 @@@ ++Author: A. Maitland Bottoms ++Description: optional static apps ++ For Debian, build apps with static libs if ENABLE_STATIC_APPS. ++ ++--- a/apps/CMakeLists.txt +++++ b/apps/CMakeLists.txt ++@@ -44,7 +44,7 @@ ++ endif() ++ target_link_libraries(volk_profile PRIVATE std::filesystem) ++ ++-if(ENABLE_STATIC_LIBS) +++if(ENABLE_STATIC_LIBS AND ENABLE_STATIC_APPS) ++ target_link_libraries(volk_profile PRIVATE volk_static) ++ set_target_properties(volk_profile PROPERTIES LINK_FLAGS "-static") ++ else() ++@@ -61,7 +61,7 @@ ++ add_executable(volk-config-info volk-config-info.cc ${CMAKE_CURRENT_SOURCE_DIR}/volk_option_helpers.cc ++ ) ++ ++-if(ENABLE_STATIC_LIBS) +++if(ENABLE_STATIC_LIBS AND ENABLE_STATIC_APPS) ++ target_link_libraries(volk-config-info volk_static) ++ set_target_properties(volk-config-info PROPERTIES LINK_FLAGS "-static") ++ else() diff --cc debian/patches/remove-external-HTML-resources index 0000000,0000000..2917d6c new file mode 100644 --- /dev/null +++ b/debian/patches/remove-external-HTML-resources @@@ -1,0 -1,0 +1,32 @@@ ++Author: A. Maitland Bottoms ++Description: remove external HTML resources ++ Debian packages should not generate traffic to external web services. 
++ ++--- a/README.md +++++ b/README.md ++@@ -1,8 +1,3 @@ ++-[![Build Status](https://travis-ci.com/gnuradio/volk.svg?branch=master)](https://travis-ci.com/gnuradio/volk) [![Build status](https://ci.appveyor.com/api/projects/status/5o56mgw0do20jlh3/branch/master?svg=true)](https://ci.appveyor.com/project/gnuradio/volk/branch/master) ++-![Check PR Formatting](https://github.com/gnuradio/volk/workflows/Check%20PR%20Formatting/badge.svg) ++-![Run VOLK tests](https://github.com/gnuradio/volk/workflows/Run%20VOLK%20tests/badge.svg) ++-[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3360942.svg)](https://doi.org/10.5281/zenodo.3360942) ++- ++ ![VOLK Logo](/docs/volk_logo.png) ++ ++ # Welcome to VOLK! ++--- a/cpu_features/README.md +++++ b/cpu_features/README.md ++@@ -1,14 +1,4 @@ ++ # cpu_features ++-[![Linux Status][linux_svg]][linux_link] ++-[![Macos Status][macos_svg]][macos_link] ++-[![Windows Status][windows_svg]][windows_link] ++- ++-[linux_svg]: https://github.com/google/cpu_features/actions/workflows/amd64_linux.yml/badge.svg?branch=master ++-[linux_link]: https://github.com/google/cpu_features/actions/workflows/amd64_linux.yml ++-[macos_svg]: https://github.com/google/cpu_features/actions/workflows/amd64_macos.yml/badge.svg?branch=master ++-[macos_link]: https://github.com/google/cpu_features/actions/workflows/amd64_macos.yml ++-[windows_svg]: https://github.com/google/cpu_features/actions/workflows/amd64_windows.yml/badge.svg?branch=master ++-[windows_link]: https://github.com/google/cpu_features/actions/workflows/amd64_windows.yml ++ ++ A cross-platform C library to retrieve CPU features (such as available ++ instructions) at runtime. 
diff --cc debian/patches/series index 0000000,0000000..fb5c150 new file mode 100644 --- /dev/null +++ b/debian/patches/series @@@ -1,0 -1,0 +1,12 @@@ ++0001-ci-Remove-license-check.patch ++0002-license-Fix-SPDX-identifiers.patch ++# 0003-Use-cpu_features-on-RISC-V-platforms.patch ++0004-add-volk_32f_s32f_x2_convert_8u-kernel.patch ++0005-volk_32f_s32f_convert_8i-code-style.patch ++optional-static-apps ++skip-cpu_features-on-kfreebsd ++# remove-external-HTML-resources ++omit-build-paths ++omit-doxygen-build-paths ++update-doxygen ++disable-neon diff --cc debian/patches/skip-cpu_features-on-kfreebsd index 0000000,0000000..e708b15 new file mode 100644 --- /dev/null +++ b/debian/patches/skip-cpu_features-on-kfreebsd @@@ -1,0 -1,0 +1,20 @@@ ++Subject: skip cpu_features on kfreebsd ++Author: A. Maitland Bottoms ++ ++ Avoid #error "Unsupported OS" on kFreeBSD ++ ++--- a/CMakeLists.txt +++++ b/CMakeLists.txt ++@@ -120,8 +120,10 @@ ++ ######################################################################## ++ ++ # cpu_features - sensible defaults, user settable option ++-if(CMAKE_SYSTEM_PROCESSOR MATCHES ++- "(^mips)|(^arm)|(^aarch64)|(x86_64)|(AMD64|amd64)|(^i.86$)|(^powerpc)|(^ppc)") +++message(STATUS "Building Volk for ${CMAKE_SYSTEM_NAME} on ${CMAKE_SYSTEM_PROCESSOR}") +++if((CMAKE_SYSTEM_PROCESSOR MATCHES +++ "(^mips)|(^arm)|(^aarch64)|(x86_64)|(AMD64|amd64)|(^i.86$)|(^powerpc)|(^ppc)") +++ AND (NOT CMAKE_SYSTEM_NAME MATCHES "kFreeBSD")) ++ option(VOLK_CPU_FEATURES "Volk uses cpu_features" ON) ++ else() ++ option(VOLK_CPU_FEATURES "Volk uses cpu_features" OFF) diff --cc debian/patches/update-doxygen index 0000000,0000000..3764e8a new file mode 100644 --- /dev/null +++ b/debian/patches/update-doxygen @@@ -1,0 -1,0 +1,113 @@@ ++Author: A. Maitland Bottoms ++Description: update doxygen ++ Debian has recent version that complains of these settings. 
++
++--- a/docs/Doxyfile.in
+++++ b/docs/Doxyfile.in
++@@ -235,12 +235,6 @@
++
++ ALIASES =
++
++-# This tag can be used to specify a number of word-keyword mappings (TCL only).
++-# A mapping has the form "name=value". For example adding "class=itcl::class"
++-# will allow you to use the command class in the itcl::class meaning.
++-
++-TCL_SUBST =
++-
++ # Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C sources
++ # only. Doxygen will then generate output that is more tailored for C. For
++ # instance, some of the names that are used will be different. The list of all
++@@ -1032,13 +1026,6 @@
++
++ ALPHABETICAL_INDEX = YES
++
++-# The COLS_IN_ALPHA_INDEX tag can be used to specify the number of columns in
++-# which the alphabetical index list will be split.
++-# Minimum value: 1, maximum value: 20, default value: 5.
++-# This tag requires that the tag ALPHABETICAL_INDEX is set to YES.
++-
++-COLS_IN_ALPHA_INDEX = 5
++-
++ # In case all classes in a project start with a common prefix, all classes will
++ # be put under the same header in the alphabetical index. The IGNORE_PREFIX tag
++ # can be used to specify a prefix (or a list of prefixes) that should be ignored
++@@ -1714,16 +1701,6 @@
++
++ LATEX_HIDE_INDICES = NO
++
++-# If the LATEX_SOURCE_CODE tag is set to YES then doxygen will include source
++-# code with syntax highlighting in the LaTeX output.
++-#
++-# Note that which sources are shown also depends on other settings such as
++-# SOURCE_BROWSER.
++-# The default value is: NO.
++-# This tag requires that the tag GENERATE_LATEX is set to YES.
++-
++-LATEX_SOURCE_CODE = NO
++-
++ # The LATEX_BIB_STYLE tag can be used to specify the style to use for the
++ # bibliography, e.g. plainnat, or ieeetr. See
++ # http://en.wikipedia.org/wiki/BibTeX and \cite for more info.
++@@ -1843,18 +1820,6 @@
++
++ XML_OUTPUT = xml
++
++-# The XML_SCHEMA tag can be used to specify a XML schema, which can be used by a
++-# validating XML parser to check the syntax of the XML files.
++-# This tag requires that the tag GENERATE_XML is set to YES.
++-
++-XML_SCHEMA =
++-
++-# The XML_DTD tag can be used to specify a XML DTD, which can be used by a
++-# validating XML parser to check the syntax of the XML files.
++-# This tag requires that the tag GENERATE_XML is set to YES.
++-
++-XML_DTD =
++-
++ # If the XML_PROGRAMLISTING tag is set to YES doxygen will dump the program
++ # listings (including syntax highlighting and cross-referencing information) to
++ # the XML output. Note that enabling this will significantly increase the size
++@@ -2062,34 +2027,10 @@
++
++ EXTERNAL_PAGES = YES
++
++-# The PERL_PATH should be the absolute path and name of the perl script
++-# interpreter (i.e. the result of 'which perl').
++-# The default file (with absolute path) is: /usr/bin/perl.
++-
++-PERL_PATH = /usr/bin/perl
++-
++ #---------------------------------------------------------------------------
++ # Configuration options related to the dot tool
++ #---------------------------------------------------------------------------
++
++-# If the CLASS_DIAGRAMS tag is set to YES doxygen will generate a class diagram
++-# (in HTML and LaTeX) for classes with base or super classes. Setting the tag to
++-# NO turns the diagrams off. Note that this option also works with HAVE_DOT
++-# disabled, but it is recommended to install and use dot, since it yields more
++-# powerful graphs.
++-# The default value is: YES.
++-
++-CLASS_DIAGRAMS = NO
++-
++-# You can define message sequence charts within doxygen comments using the \msc
++-# command. Doxygen will then run the mscgen tool (see:
++-# http://www.mcternan.me.uk/mscgen/)) to produce the chart and insert it in the
++-# documentation. The MSCGEN_PATH tag allows you to specify the directory where
++-# the mscgen tool resides. If left empty the tool is assumed to be found in the
++-# default search path.
++-
++-MSCGEN_PATH =
++-
++ # You can include diagrams made with dia in doxygen documentation. Doxygen will
++ # then run dia to produce the diagram and insert it in the documentation. The
++ # DIA_PATH tag allows you to specify the directory where the dia binary resides.
++@@ -2226,7 +2167,7 @@
++ # The default value is: NO.
++ # This tag requires that the tag HAVE_DOT is set to YES.
++
++-CALL_GRAPH = NO
+++CALL_GRAPH = YES
++
++ # If the CALLER_GRAPH tag is set to YES then doxygen will generate a caller
++ # dependency graph for every global function or class method.
diff --cc debian/rules
index 0000000,0000000..85ada09
new file mode 100755
--- /dev/null
+++ b/debian/rules
@@@ -1,0 -1,0 +1,23 @@@
++#!/usr/bin/make -f
++DEB_HOST_MULTIARCH ?= $(shell dpkg-architecture -qDEB_HOST_MULTIARCH)
++export DEB_HOST_MULTIARCH
++#export DH_VERBOSE=1
++
++%:
++ dh $@ --with python3
++
++override_dh_auto_configure:
++ dh_auto_configure -- -DLIB_SUFFIX="/$(DEB_HOST_MULTIARCH)" \
++ -DPYTHON_EXECUTABLE=/usr/bin/python3 \
++ -DCMAKE_BUILD_TYPE=RelWithDebInfo
++
++override_dh_auto_build-indep:
++ cmake --build obj-* --target all
++ cmake --build obj-* --target volk_doc
++
++override_dh_auto_install:
++ dh_auto_install
++ find debian -type d -empty -delete
++
++override_dh_auto_test:
++ - dh_auto_test -- CTEST_TEST_TIMEOUT=60
diff --cc debian/source/format
index 0000000,0000000..163aaf8
new file mode 100644
--- /dev/null
+++ b/debian/source/format
@@@ -1,0 -1,0 +1,1 @@@
++3.0 (quilt)
diff --cc debian/volk-config-info.1
index 0000000,0000000..e8d6efd
new file mode 100644
--- /dev/null
+++ b/debian/volk-config-info.1
@@@ -1,0 -1,0 +1,45 @@@
++.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.40.10.
++.TH VOLK-CONFIG-INFO "1" "July 2014" "volk-config-info 0.1" "User Commands"
++.SH NAME
++volk-config-info \- pkgconfig-like tool for Vector Optimized Library of Kernels 0.1
++.SH DESCRIPTION
++.SS "Program options: volk-config-info [options]:"
++.TP
++\fB\-h\fR [ \fB\-\-help\fR ]
++print help message
++.TP
++\fB\-\-prefix\fR
++print VOLK installation prefix
++.TP
++\fB\-\-builddate\fR
++print VOLK build date (RFC2822 format)
++.TP
++\fB\-\-cc\fR
++print VOLK C compiler version
++.TP
++\fB\-\-cflags\fR
++print VOLK CFLAGS
++.TP
++\fB\-\-all\-machines\fR
++print VOLK machines built into library
++.TP
++\fB\-\-avail\-machines\fR
++print VOLK machines the current platform can use
++.TP
++\fB\-\-machine\fR
++print the VOLK machine that will be used
++.TP
++\fB\-v\fR [ \fB\-\-version\fR ]
++print VOLK version
++.SH "SEE ALSO"
++The full documentation for
++.B volk-config-info
++is maintained as a Texinfo manual. If the
++.B info
++and
++.B volk-config-info
++programs are properly installed at your site, the command
++.IP
++.B info volk-config-info
++.PP
++should give you access to the complete manual.
diff --cc debian/volk_modtool.1
index 0000000,0000000..752e7f5
new file mode 100644
--- /dev/null
+++ b/debian/volk_modtool.1
@@@ -1,0 -1,0 +1,112 @@@
++.TH GNURADIO "1" "August 2013" "volk_modtool 3.7" "User Commands"
++.SH NAME
++volk_modtool \- tailor VOLK modules
++.SH DESCRIPTION
++The volk_modtool tool is installed along with VOLK as a way of helping
++to construct, add to, and interogate the VOLK library or companion
++libraries.
++.P
++volk_modtool is installed into $prefix/bin.
++.P
++VOLK modtool enables creating standalone (out-of-tree) VOLK modules
++and provides a few tools for sharing VOLK kernels between VOLK
++modules. If you need to design or work with VOLK kernels away from
++the canonical VOLK library, this is the tool. If you need to tailor
++your own VOLK library for whatever reason, this is the tool.
++.P
++The canonical VOLK library installs a volk.h and a libvolk.so. Your
++own library will install volk_$name.h and libvolk_$name.so. Ya Gronk?
++Good.
++.P
++There isn't a substantial difference between the canonical VOLK
++module and any other VOLK module. They're all peers. Any module
++created via VOLK modtool will come complete with a default
++volk_modtool.cfg file associating the module with the base from which
++it came, its distinctive $name and its destination (or path). These
++values (created from user input if VOLK modtool runs without a
++user-supplied config file or a default config file) serve as default
++values for some VOLK modtool actions. It's more or less intended for
++the user to change directories to the top level of a created VOLK
++module and then run volk_modtool to take advantage of the values
++stored in the default volk_modtool.cfg file.
++.P
++Apart from creating new VOLK modules, VOLK modtool allows you to list
++the names of kernels in other modules, list the names of kernels in
++the current module, add kernels from another module into the current
++module, and remove kernels from the current module. When moving
++kernels between modules, VOLK modtool does its best to keep the qa
++and profiling code for those kernels intact. If the base has a test
++or a profiling call for some kernel, those calls will follow the
++kernel when VOLK modtool adds that kernel. If QA or profiling
++requires a puppet kernel, the puppet kernel will follow the original
++kernel when VOLK modtool adds that original kernel. VOLK modtool
++respects puppets.
++.P
++======================================================================
++.P
++.SH Installing a new VOLK Library:
++.P
++Run the command "volk_modtool -i". This will ask you three questions:
++.P
++ name: // the name to give your VOLK library: volk_
++ destination: // directory new source tree is built under -- must exists.
++ // It will create /volk_
++ base: // the directory containing the original VOLK source code
++.P
++This will build a new skeleton directory in the destination provided
++with the name volk_. It will contain the necessary structure to
++build:
++.P
++ mkdir build
++ cd build
++ cmake -DCMAKE_INSTALL_PREFIX=/opt/volk ../
++ make
++ sudo make install
++.P
++Right now, the library is empty and contains no kernels. Kernels can
++be added from another VOLK library using the '-a' option. If not
++specified, the kernel will be extracted from the base VOLK
++directory. Using the '-b' allows us to specify another VOLK library to
++use for this purpose.
++.P
++ volk_modtool -a -n 32fc_x2_conjugate_dot_prod_32fc
++.P
++This will put the code for the new kernel into
++/volk_/kernels/volk_/
++.P
++Other kernels must be added by hand. See the following webpages for
++more information about creating VOLK kernels:
++ http://gnuradio.org/doc/doxygen/volk_guide.html
++ http://gnuradio.org/redmine/projects/gnuradio/wiki/Volk
++.P
++======================================================================
++.P
++.SH OPTIONS
++.P
++Options for Adding and Removing Kernels:
++ -a, --add_kernel
++ Add kernel from existing VOLK module. Uses the base VOLK module
++ unless -b is used. Use -n to specify the kernel name.
++ Requires: -n.
++ Optional: -b
++.P
++ -A, --add_all_kernels
++ Add all kernels from existing VOLK module. Uses the base VOLK
++ module unless -b is used.
++ Optional: -b
++.P
++ -x, --remove_kernel
++ Remove kernel from module.
++ Required: -n.
++ Optional: -b
++.P
++Options for Listing Kernels:
++ -l, --list
++ Lists all kernels available in the base VOLK module.
++.P
++ -k, --kernels
++ Lists all kernels in this VOLK module.
++.P
++ -r, --remote-list
++ Lists all kernels in another VOLK module that is specified
++ using the -b option.
diff --cc debian/volk_profile.1
index 0000000,0000000..405facb
new file mode 100644
--- /dev/null
+++ b/debian/volk_profile.1
@@@ -1,0 -1,0 +1,5 @@@
++.TH UHD_FFT "1" "March 2012" "volk_profile 3.5" "User Commands"
++.SH NAME
++volk_profile \- Quality Assurance application for libvolk functions
++.SH DESCRIPTION
++Writes profile results to a file.
diff --cc debian/watch
index 0000000,0000000..0a64f71
new file mode 100644
--- /dev/null
+++ b/debian/watch
@@@ -1,0 -1,0 +1,4 @@@
++version=4
++ opts="filenamemangle=s%(?:.*?)?v?(\d[\d.]*@ARCHIVE_EXT@)%@PACKAGE@-$1%,uversionmangle=s/-rc/~rc/" \
++ https://github.com/gnuradio/volk/tags \
++ (?:.*?/)?v?@ANY_VERSION@@ARCHIVE_EXT@