--- /dev/null
+volk (2.5.0-2) unstable; urgency=medium
+
+ * upload to unstable
+ * with some upstream bugfixes
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Thu, 21 Oct 2021 23:30:05 -0400
+
+volk (2.5.0-1) experimental; urgency=medium
+
+ * New upstream release
+ * Use libcpu-features-dev on powerpc and x32 (Closes: #978602)
+ * Mention volk-config-info and volk_modtool in description (Closes: #989263)
+ * Upload to experimental for soversion bump
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Thu, 10 Jun 2021 18:29:47 -0400
+
+volk (2.4.1-2) unstable; urgency=medium
+
+ [ Shengjing Zhu ]
+ * Use system cpu_features package
+
+ [ A. Maitland Bottoms ]
+ * Adopt Use system cpu_features package patch (Closes: #978096)
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sun, 27 Dec 2020 15:16:07 -0500
+
+volk (2.4.1-1) unstable; urgency=medium
+
+ * New upstream release
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Thu, 17 Dec 2020 23:53:21 -0500
+
+volk (2.4.0-4) unstable; urgency=medium
+
+ * skip cpu_features on "Unsupported OS" kFreeBSD
+ * bump Standards-Version - no other changes.
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Tue, 15 Dec 2020 19:53:16 -0500
+
+volk (2.4.0-3) unstable; urgency=medium
+
+ * Fix binary-indep build (Closes: #976300)
+ * Upload to unstable
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Thu, 03 Dec 2020 20:43:29 -0500
+
+volk (2.4.0-2) experimental; urgency=medium
+
+ * Make use of cpu_features a CMake option with sensible defaults per arch
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Mon, 30 Nov 2020 16:19:19 -0500
+
+volk (2.4.0-1) experimental; urgency=medium
+
+ * New upstream release
+ * cpu_features git submodule packaged as cpu-features source component.
+ * Upload to experimental for soversion bump
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sun, 22 Nov 2020 12:35:43 -0500
+
+volk (2.3.0-3) unstable; urgency=medium
+
+ * update to v2.3.0-14-g91e5d07
+ emit an emms instruction after using the mmx extension
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Tue, 30 Jun 2020 19:48:20 -0400
+
+volk (2.3.0-2) unstable; urgency=medium
+
+ * Upload to unstable
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Mon, 11 May 2020 07:26:03 -0400
+
+volk (2.3.0-1) experimental; urgency=medium
+
+ * New upstream release, to experimental for soversion bump
+ * Kernels
+ - volk: accurate exp kernel
+ - exp: Rename SSE4.1 to SSE2 kernel
+ - Add 32f_s32f_add_32f kernel
+ - This kernel adds vector + scalar functionality
+ - Fix the broken index max kernels
+ - Treat the mod_range puppet as such
+ - Add puppet for power spectral density kernel
+ - Updated log10 calcs to use faster log2 approach
+ - fix: Use unaligned load
+ - divide: Optimize complexmultiplyconjugate
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sat, 09 May 2020 15:42:23 -0400
+
+volk (2.2.1-3) unstable; urgency=medium
+
+ * update to v2.2.1-34-gd4756c5
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sun, 05 Apr 2020 10:37:46 -0400
+
+volk (2.2.1-2) unstable; urgency=medium
+
+ * update to v2.2.1-11-gfaf230e
+ * cmake: Remove the ORC from the VOLK public link interface
+ * Fix the broken index max kernels
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Fri, 27 Mar 2020 21:48:10 -0400
+
+volk (2.2.1-1) unstable; urgency=high
+
+ * New upstream bugfix release
+ reason for high urgency:
+ - Fix loop bound in AVX rotator (only one fixed in 2.2.0-3)
+ - Fix out-of-bounds read in AVX2 square dist kernel
+ - Fix length checks in AVX2 index max kernels
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Mon, 24 Feb 2020 18:08:05 -0500
+
+volk (2.2.0-3) unstable; urgency=high
+
+ * Update to v2.2.0-6-g5701f8f
+ reason for high urgency:
+ - Fix loop bound in AVX rotator
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sun, 23 Feb 2020 23:49:18 -0500
+
+volk (2.2.0-2) unstable; urgency=medium
+
+ * Upload to unstable
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Tue, 18 Feb 2020 17:56:58 -0500
+
+volk (2.2.0-1) experimental; urgency=medium
+
+ * New upstream release
+ - Remove build dependency on python six
+ - Fixup VolkConfigVersion
+ - add volk_version.h
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sun, 16 Feb 2020 18:25:20 -0500
+
+volk (2.1.0-2) unstable; urgency=medium
+
+ * Upload to unstable
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sun, 05 Jan 2020 23:17:57 -0500
+
+volk (2.1.0-1) experimental; urgency=medium
+
+ * New upstream release
+ - The AVX FMA rotator bug is fixed
+ - VOLK offers `volk::vector<>` for C++ to follow RAII
+ - Use C++17 `std::filesystem`
+ - This enables VOLK to be built without Boost if std::filesystem is available!
+ - lots of bugfixes
+ - more optimized kernels, especially more NEON versions
+ * Upload to experimental for new ABI library package libvolk2.1
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sun, 22 Dec 2019 10:27:36 -0500
+
+volk (2.0.0-3) unstable; urgency=medium
+
+ * update to v2.0.0-4-gf04a46f
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Thu, 14 Nov 2019 22:47:23 -0500
+
+volk (2.0.0-2) unstable; urgency=medium
+
+ * Upload to unstable
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Mon, 12 Aug 2019 22:49:11 -0400
+
+volk (2.0.0-1) experimental; urgency=medium
+
+ * New upstream release
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Wed, 07 Aug 2019 23:31:20 -0400
+
+volk (1.4-4) unstable; urgency=medium
+
+ * working volk_modtool with Python 3
+ * build and install libvolk.a
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Mon, 29 Oct 2018 01:32:05 -0400
+
+volk (1.4-3) unstable; urgency=medium
+
+ * update to v1.4-9-g297fefd
+ Added an AVX protokernel for volk_32fc_x2_s32f_square_dist_scalar_mult_32f
+ fixed a buffer over-read and over-write in
+ volk_32fc_x2_s32f_square_dist_scalar_mult_32f_a_avx
+ Fix 32u_reverse_32u for ARM
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sat, 12 May 2018 15:25:04 -0400
+
+volk (1.4-2) unstable; urgency=medium
+
+ * Upload to unstable, needed by gnuradio (>= 3.7.12.0)
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Tue, 03 Apr 2018 01:03:19 -0400
+
+volk (1.4-1) experimental; urgency=medium
+
+ * New upstream release
+ upstream changelog http://libvolk.org/release-v14.html
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Tue, 27 Mar 2018 22:57:42 -0400
+
+volk (1.3.1-1) unstable; urgency=medium
+
+ * New upstream bugfix release
+ * Refresh all debian patches for use with git am
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Tue, 27 Mar 2018 21:54:29 -0400
+
+volk (1.3-3) unstable; urgency=medium
+
+ * update to v1.3-23-g0109b2e
+ * update debian/libvolk1-dev.abi.tar.gz.amd64
+ * Add breaks/replaces gnuradio (<=3.7.2.1) (LP: #1614235)
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sun, 04 Feb 2018 13:12:21 -0500
+
+volk (1.3-2) unstable; urgency=medium
+
+ * update to v1.3-16-g28b03a9
+ apps: fix profile update reading end of lines
+ qa: lower tolerance for 32fc_mag to fix issue #96
+ * include upstream master patch to sort input files
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sun, 27 Aug 2017 13:44:55 -0400
+
+volk (1.3-1) unstable; urgency=medium
+
+ * New upstream release
+ * The index_max kernels were named with the wrong output datatype. To
+ fix this there are new kernels that return a 32u (uint32_t) and the
+ existing kernels had their signatures changed to return 16u (uint16_t).
+ * The output to stdout and stderr has been shuffled around. There is no
+ longer a message that prints what VOLK machine is being used and the
+ warning messages go to stderr rather than stdout.
+ * The 32fc_index_max kernels previously were only accurate to the SSE
+ register width (4 points). This was a pretty serious and long-lived
+ bug that's been fixed and the QA updated appropriately.
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sat, 02 Jul 2016 16:30:47 -0400
+
+volk (1.2.2-2) unstable; urgency=medium
+
+ * update to v1.2.2-11-g78c8bc4 (to follow gnuradio maint branch)
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sun, 19 Jun 2016 14:44:15 -0400
+
+volk (1.2.2-1) unstable; urgency=medium
+
+ * New upstream release
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Fri, 08 Apr 2016 00:12:10 -0400
+
+volk (1.2.1-2) unstable; urgency=medium
+
+ * Upstream patches:
+ Fix some CMake complaints
+ The fix for compilation with cmake 3.5
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Wed, 23 Mar 2016 17:47:54 -0400
+
+volk (1.2.1-1) unstable; urgency=medium
+
+ * New upstream release
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sun, 07 Feb 2016 19:38:32 -0500
+
+volk (1.2-1) unstable; urgency=medium
+
+ * New upstream release
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Thu, 24 Dec 2015 20:28:13 -0500
+
+volk (1.1.1-5) experimental; urgency=medium
+
+ * update to v1.1.1-22-gef53547 to support gnuradio 3.7.9
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Fri, 11 Dec 2015 13:12:55 -0500
+
+volk (1.1.1-4) unstable; urgency=medium
+
+ * more lintian fixes
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Wed, 25 Nov 2015 21:49:58 -0500
+
+volk (1.1.1-3) unstable; urgency=medium
+
+ * Lintian fixes Pre-Depends
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Thu, 19 Nov 2015 21:24:27 -0500
+
+volk (1.1.1-2) unstable; urgency=medium
+
+ * Note that libvolk1-dev replaces files in gnuradio-dev versions <<3.7.8
+ (Closes: #802646) again. Thanks Andreas Beckmann.
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Fri, 13 Nov 2015 18:45:49 -0500
+
+volk (1.1.1-1) unstable; urgency=medium
+
+ * New upstream release
+ * New architectures exist for the AVX2 and FMA ISAs.
+ * The profiler now generates buffers that are vlen + a tiny amount and
+ generates random data to fill buffers. This is intended to catch bugs
+ in protokernels that write beyond num_points.
+ * Note that libvolk1-dev replaces files in earlier gnuradio-dev versions
+ (Closes: #802646)
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sun, 01 Nov 2015 18:45:43 -0500
+
+volk (1.1-4) unstable; urgency=medium
+
+ * update to v1.1-12-g264addc
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Tue, 29 Sep 2015 23:41:50 -0400
+
+volk (1.1-3) unstable; urgency=low
+
+ * drop dh_acc to get reproducible builds
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Fri, 11 Sep 2015 22:57:06 -0400
+
+volk (1.1-2) unstable; urgency=low
+
+ * use dh-acc
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Mon, 07 Sep 2015 15:45:20 -0400
+
+volk (1.1-1) unstable; urgency=medium
+
+ * re-organize package naming convention
+ * New upstream release tag v1.1
+ New architectures exist for the AVX2 and FMA ISAs. Along
+ with the build-system support the following kernels now have
+ proto-kernels taking advantage of these architectures:
+
+ * 32f_x2_dot_prod_32f
+ * 32fc_x2_multiply_32fc
+ * 64u_byteswap
+ * 32f_binary_slicer_8i
+ * 16u_byteswap
+ * 32u_byteswap
+
+ QA/profiler
+ -----------
+
+ The profiler now generates buffers that are vlen + a tiny
+ amount and generates random data to fill buffers. This is
+ intended to catch bugs in protokernels that write beyond
+ num_points.
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Wed, 26 Aug 2015 09:22:48 -0400
+
+volk (1.0.2-2) unstable; urgency=low
+
+ * Use SOURCE_DATE_EPOCH from the environment, if defined,
+ rather than current date and time to implement volk_build_date()
+ (embedding build date in a library does not help reproducible builds)
+ * add watch file
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sat, 15 Aug 2015 17:43:15 -0400
+
+volk (1.0.2-1) unstable; urgency=medium
+
+ * Maintenance release 24 Jul 2015 by Nathan West
+ * The major change is the CMake logic to add ASM protokernels. Rather
+ than depending on CFLAGS and ASMFLAGS, we use the results of VOLK's
+ built-in has_ARCH tests. All configurations should work the same as
+ before, but manually specifying CFLAGS and ASMFLAGS on the cmake call
+ for ARM native builds should no longer be necessary.
+ * The 32fc_s32fc_x2_rotator_32fc generic protokernel now includes a
+ previously implied header.
+ * Finally, there is a fix to return the "best" protokernel to the
+ dispatcher when no volk_config exists. Thanks to Alexandre Raymond for
+ pointing this out.
+ * with maint branch patch:
+ kernels-add-missing-include-arm_neon.h
+ * removed unused build-dependency on liboil0.3-dev (closes: #793626)
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Wed, 05 Aug 2015 00:43:40 -0400
+
+volk (1.0.1-1) unstable; urgency=low
+
+ * Maintenance Release v1.0.1 08 Jul 2015 by Nathan West
+ This is a maintenance release with bug fixes since the initial release of
+ v1.0 in April.
+
+ * Contributors
+
+ The following authors have contributed code to this release:
+
+ Doug Geiger doug.geiger@bioradiation.net
+ Elliot Briggs elliot.briggs@gmail.com
+ Marcus Mueller marcus@hostalia.de
+ Nathan West nathan.west@okstate.edu
+ Tom Rondeau tom@trondeau.com
+
+ * Kernels
+
+ Several bug fixes in different kernels. The NEON implementations of the
+ following kernels have been fixed:
+
+ 32f_x2_add_32f
+ 32f_x2_dot_prod_32f
+ 32fc_s32fc_multiply_32fc
+ 32fc_x2_multiply_32fc
+
+ Additionally the NEON asm based 32f_x2_add_32f protokernels were not being
+ used and are now included and available for use via the dispatcher.
+
+ The 32f_s32f_x2_fm_detect_32f kernel now has a puppet. This solves QA
+ segfaults on 32-bit machines and provides a better test for this kernel.
+
+ The 32fc_s32fc_x2_rotator_32fc generic protokernel replaced cabsf with
+ hypotf for better Android support.
+
+ * Building
+
+ Static builds now trigger the applications (volk_profile and
+ volk-config-info) to be statically linked.
+
+ The file gcc_x86_cpuid.h has been removed since it was no longer being
+ used. Previously it provided cpuid functionality for ancient compilers
+ that we do not support.
+
+ All build types now use -Wall.
+
+ * QA and Testing
+
+ The documentation around the --update option to volk_profile now makes it
+ clear that the option will only profile kernels without entries in
+ volk_profile. The signature of run_volk_tests with expanded args changed
+ signed types to unsigned types to reflect the actual input.
+
+ The remaining changes are all non-functional changes to address issues
+ from Coverity.
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Fri, 10 Jul 2015 17:57:42 -0400
+
+volk (1.0-5) unstable; urgency=medium
+
+ * native-armv7-build-support skips neon on Debian armel (Closes: #789972)
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sat, 04 Jul 2015 12:36:36 -0400
+
+volk (1.0-4) unstable; urgency=low
+
+ * update native-armv7-build-support patch from gnuradio volk package
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Thu, 25 Jun 2015 16:38:49 -0400
+
+volk (1.0-3) unstable; urgency=medium
+
+ * Add Breaks/Replaces (Closes: #789893, #789894)
+ * Allow failing tests
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Thu, 25 Jun 2015 12:46:06 -0400
+
+volk (1.0-2) unstable; urgency=medium
+
+ * kernels-add-missing-math.h-include-to-rotator
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Wed, 24 Jun 2015 21:09:32 -0400
+
+volk (1.0-1) unstable; urgency=low
+
+ * Initial package (Closes: #782417)
+ Initial Release 11 Apr 2015 by Nathan West
+
+ VOLK 1.0 is available. This is the first release of VOLK as an independently
+ tracked sub-project of GNU Radio.
+
+ * Contributors
+
+ VOLK has been tracked separately from GNU Radio since 2014 Dec 23.
+ Contributors between the split and the initial release are:
+
+ Albert Holguin aholguin_77@yahoo.com
+ Doug Geiger doug.geiger@bioradiation.net
+ Elliot Briggs elliot.briggs@gmail.com
+ Julien Olivain julien.olivain@lsv.ens-cachan.fr
+ Michael Dickens michael.dickens@ettus.com
+ Nathan West nathan.west@okstate.edu
+ Tom Rondeau tom@trondeau.com
+
+ * QA
+
+ The test and profiler have significantly changed. The profiler supports
+ run-time changes to vlen and iters to help kernel development and provide
+ more flexibility on embedded systems. Additionally, there is a new option
+ to update an existing volk_profile results file with only new kernels, which
+ saves time when updating to newer versions of VOLK.
+
+ The QA system creates a static list of kernels and test cases. The QA
+ testing and profiler iterate over this static list rather than each source
+ file keeping its own list. The QA also emits XML results to
+ lib/.unittest/kernels.xml which is formatted similarly to JUnit results.
+
+ * Modtool
+
+ Modtool was updated to support the QA and profiler changes.
+
+ * Kernels
+
+ New proto-kernels:
+
+ 16ic_deinterleave_real_8i_neon
+ 16ic_s32f_deinterleave_32f_neon
+ fix preprocessor errors for some compilers on byteswap and popcount puppets
+
+ ORC was moved to the asm kernels directory.
+
+ * volk_malloc
+
+ The posix_memalign implementation of volk_malloc now falls back to a standard
+ malloc if alignment is 1.
+
+ * Miscellaneous
+
+ Several build system and cmake changes have made it possible to build VOLK
+ both independently with proper soname versions and in-tree for projects
+ such as GNU Radio.
+
+ The static builds take advantage of cmake object libraries to speed up builds.
+
+ Finally, there are a number of changes to satisfy compiler warnings and make
+ QA work on multiple machines.
+
+ -- A. Maitland Bottoms <bottoms@debian.org> Sun, 12 Apr 2015 23:20:41 -0400
--- /dev/null
+Source: volk
+Section: libdevel
+Priority: optional
+Maintainer: A. Maitland Bottoms <bottoms@debian.org>
+Build-Depends: cmake,
+ debhelper-compat (= 13),
+ dh-python,
+ liborc-0.4-dev,
+ libcpu-features-dev [amd64 arm64 armel armhf i386 mips64el mipsel powerpc ppc64 ppc64el x32],
+ python3-dev,
+ python3-mako
+Build-Depends-Indep: doxygen, graphviz
+Standards-Version: 4.5.1
+Rules-Requires-Root: no
+Homepage: https://libvolk.org
+Vcs-Browser: https://salsa.debian.org/bottoms/pkg-volk
+Vcs-Git: https://salsa.debian.org/bottoms/pkg-volk.git
+
+Package: libvolk2.5
+Section: libs
+Architecture: any
+Pre-Depends: ${misc:Pre-Depends}
+Depends: ${misc:Depends}, ${shlibs:Depends}
+Multi-Arch: same
+Recommends: libvolk2-bin
+Suggests: libvolk2-dev
+Description: vector optimized functions
+ Vector-Optimized Library of Kernels is designed to help
+ applications work with the processor's SIMD instruction sets. These are
+ very powerful vector operations that can give signal processing a
+ huge boost in performance.
+
+Package: libvolk2-dev
+Architecture: any
+Pre-Depends: ${misc:Pre-Depends}
+Depends: libvolk2.5 (=${binary:Version}), ${misc:Depends}
+Breaks: gnuradio-dev (<<3.7.8), libvolk-dev, libvolk1.0-dev, libvolk1-dev
+Replaces: gnuradio-dev (<<3.7.8), libvolk-dev, libvolk1.0-dev, libvolk1-dev
+Suggests: libvolk2-doc
+Multi-Arch: same
+Description: vector optimized function headers
+ Vector-Optimized Library of Kernels is designed to help
+ applications work with the processor's SIMD instruction sets. These are
+ very powerful vector operations that can give signal processing a
+ huge boost in performance.
+ .
+ This package contains the header files.
+ For documentation, see libvolk2-doc.
+
+Package: libvolk2-bin
+Section: libs
+Architecture: any
+Pre-Depends: ${misc:Pre-Depends}
+Depends: libvolk2.5 (=${binary:Version}),
+ ${misc:Depends},
+ ${python3:Depends},
+ ${shlibs:Depends}
+Breaks: libvolk1-bin, libvolk-bin, libvolk1.0-bin, gnuradio (<=3.7.2.1)
+Replaces: libvolk1-bin, libvolk-bin, libvolk1.0-bin, gnuradio (<=3.7.2.1)
+Description: vector optimized runtime tools
+ Vector-Optimized Library of Kernels is designed to help
+ applications work with the processor's SIMD instruction sets. These are
+ very powerful vector operations that can give signal processing a
+ huge boost in performance.
+ .
+ This package includes: the volk_profile tool to customize settings for
+ the system; volk_modtool to create new optimized modules; and
+ volk-config-info to show settings.
+
+Package: libvolk2-doc
+Section: doc
+Architecture: all
+Multi-Arch: foreign
+Depends: ${misc:Depends}
+Recommends: www-browser
+Description: vector optimized library documentation
+ Vector-Optimized Library of Kernels is designed to help
+ applications work with the processor's SIMD instruction sets. These are
+ very powerful vector operations that can give signal processing a
+ huge boost in performance.
+ .
+ This package includes the Doxygen generated documentation in
+ /usr/share/doc/libvolk2-dev/html/index.html
--- /dev/null
+Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
+Upstream-Name: volk
+Upstream-Contact: http://libvolk.org/
+Source:
+ https://github.com/gnuradio/volk
+ https://github.com/google/cpu_features
+Comment:
+ Debian packages by A. Maitland Bottoms <bottoms@debian.org>
+ git archive --format=tar --prefix=volk-2.3.0/ v2.3.0 | xz > ../volk_2.3.0.orig.tar.xz
+ git archive --format=tar --prefix=cpu_features/ v0.6.0 | xz > ../volk_2.4.0.orig-cpu_features.tar.xz
+ .
+ Upstream Maintainers:
+ Johannes Demel <demel@uni-bremen.de>
+ Michael Dickens <michael.dickens@ettus.com>
+Copyright: 2014-2020 Free Software Foundation, Inc.
+License: GPL-3+
+
+Files: *
+Copyright: 2006, 2009-2020, Free Software Foundation, Inc.
+License: GPL-3+
+
+Files: Doxyfile.in
+ DoxygenLayout.xml
+ volk.pc.in
+Copyright: 2014-2020 Free Software Foundation, Inc.
+License: GPL-3+
+
+Files: apps/volk_profile.h
+Copyright: 2014-2020 Free Software Foundation, Inc.
+License: GPL-3+
+
+Files: appveyor.yml
+Copyright: 2016 Paul Cercueil <paul.cercueil@analog.com>
+License: GPL-3+
+
+Files: cmake/*
+Copyright: 2014-2020 Free Software Foundation, Inc.
+License: GPL-3+
+
+Files: cmake/Modules/*
+Copyright: 2006, 2009-2020, Free Software Foundation, Inc.
+License: GPL-3+
+
+Files: cpu_features/*
+Copyright: 2020 Google LLC
+License: Apache-2.0
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+ .
+ http://www.apache.org/licenses/LICENSE-2.0
+ .
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ .
+ On Debian systems, the complete text of the Apache-2.0 License
+ can be found in "/usr/share/common-licenses/Apache-2.0".
+
+Files: cmake/Modules/CMakeParseArgumentsCopy.cmake
+Copyright: 2010 Alexander Neundorf <neundorf@kde.org>
+License: Kitware-BSD
+ All rights reserved.
+ .
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+ .
+ * Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+ .
+ * Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+ .
+ * Neither the names of Kitware, Inc., the Insight Software Consortium,
+ nor the names of their contributors may be used to endorse or promote
+ products derived from this software without specific prior written
+ permission.
+ .
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Files: cmake/Modules/FindORC.cmake
+ cmake/Modules/VolkConfig.cmake.in
+Copyright: 2014-2015 Free Software Foundation, Inc.
+License: GPL-3+
+
+Files: cmake/msvc/*
+Copyright: 2006-2008, Alexander Chemeris
+License: BSD-2-clause
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions are met:
+ .
+ 1. Redistributions of source code must retain the above copyright notice,
+ this list of conditions and the following disclaimer.
+ .
+ 2. Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+ .
+ 3. The name of the author may be used to endorse or promote products
+ derived from this software without specific prior written permission.
+ .
+ THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
+ EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+ OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Files: debian/*
+Copyright: 2015-2020 Free Software Foundation, Inc.
+License: GPL-3+
+Comment: assigned by A. Maitland Bottoms <bottoms@debian.org>
+
+Files: docs/*
+Copyright: 2014-2015 Free Software Foundation, Inc.
+License: GPL-3+
+
+Files: gen/archs.xml
+ gen/machines.xml
+Copyright: 2014-2015 Free Software Foundation, Inc.
+License: GPL-3+
+
+Files: include/volk/volk_common.h
+ include/volk/volk_complex.h
+ include/volk/volk_prefs.h
+Copyright: 2014-2015 Free Software Foundation, Inc.
+License: GPL-3+
+
+Files: kernels/volk/asm/*
+Copyright: 2014-2015 Free Software Foundation, Inc.
+License: GPL-3+
+
+Files: kernels/volk/volk_16u_byteswappuppet_16u.h
+ kernels/volk/volk_32u_byteswappuppet_32u.h
+ kernels/volk/volk_64u_byteswappuppet_64u.h
+Copyright: 2014-2015 Free Software Foundation, Inc.
+License: GPL-3+
+
+Files: lib/kernel_tests.h
+ lib/qa_utils.cc
+ lib/qa_utils.h
+ lib/volk_prefs.c
+Copyright: 2014-2015 Free Software Foundation, Inc.
+License: GPL-3+
+
+License: LGPL-2+
+ This library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Library General Public
+ License as published by the Free Software Foundation; either
+ version 2 of the License, or (at your option) any later version.
+ .
+ This library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Library General Public License for more details.
+ .
+ You should have received a copy of the GNU Library General Public License
+ along with this library; see the file COPYING.LIB. If not, write to
+ the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
+ Boston, MA 02110-1301, USA.
+
+License: GPL-3+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+ .
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+ .
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+ .
+ On Debian systems, the complete text of the GNU General
+ Public License version 3 can be found in "/usr/share/common-licenses/GPL-3".
--- /dev/null
+usr/bin/volk*
+usr/lib/python3/dist-packages
--- /dev/null
+debian/volk-config-info.1
+debian/volk_modtool.1
+debian/volk_profile.1
--- /dev/null
+<?xml version="1.0" encoding="utf-8"?>
+<descriptor>
+
+ <gcc_options>
+ -DHAVE_CPUID_H
+ -DHAVE_DLFCN_H
+ -DHAVE_FENV_H
+ -DHAVE_POSIX_MEMALIGN
+ -DHAVE_XGETBV
+ -D_GLIBCXX_USE_CXX11_ABI=1
+ -I/usr/include/orc-0.4
+ -DNDEBUG
+ -std=gnu11
+ -m64
+ -mmmx
+ -msse
+ -msse2
+ -msse3
+ -mssse3
+ -msse4.1
+ -msse4.2
+ -mpopcnt
+ -mavx
+ -mfma
+ -mavx2
+ -mavx512f
+ -mavx512cd
+ -fPIC
+ -g
+ -O2
+ -fstack-protector-strong
+ -Wformat
+ -Werror=format-security
+ -Wdate-time
+ -D_FORTIFY_SOURCE=2
+ -fvisibility=hidden
+ -Wsign-compare
+ -Wall
+ -Wno-uninitialized
+</gcc_options>
+
+<headers>
+debian/libvolk2-dev/usr/include/volk/
+</headers>
+
+<libs>
+debian/libvolk2.5/usr/lib/
+</libs>
+
+</descriptor>
--- /dev/null
+usr/include/*
+usr/lib/*/*volk.a
+usr/lib/*/*volk*so
+usr/lib/*/cmake/volk
+usr/lib/*/pkgconfig/*volk*
--- /dev/null
+Document: libvolk2-doc
+Title: Vector-Optimized Library of Kernels Reference Manual
+Author: GNU Radio Developers
+Abstract: VOLK is the Vector-Optimized Library of Kernels.
+ It is a library that contains kernels of hand-written SIMD code for
+ different mathematical operations. Since each SIMD architecture can
+ be very different and no compiler has yet come along to handle
+ vectorization properly or highly efficiently, VOLK approaches the
+ problem differently. For each architecture or platform that a
+ developer wishes to vectorize for, a new proto-kernel is added to
+ VOLK. At runtime, VOLK will select the correct proto-kernel. In this
+ way, the users of VOLK call a kernel for performing the operation
+ that is platform/architecture agnostic. This allows us to write
+ portable SIMD code.
+Section: Programming/C++
+
+Format: HTML
+Index: /usr/share/doc/libvolk2-dev/html/index.html
+Files: /usr/share/doc/libvolk2-dev/html/*.html
--- /dev/null
+obj-*/html
--- /dev/null
+usr/lib/*/libvolk.so.*
--- /dev/null
+usr/bin/list_cpu_features
+usr/lib/*/cmake/CpuFeatures/CpuFeaturesConfig.cmake
+usr/lib/*/cmake/CpuFeatures/CpuFeaturesConfigVersion.cmake
+usr/lib/*/cmake/CpuFeatures/CpuFeaturesTargets-relwithdebinfo.cmake
+usr/lib/*/cmake/CpuFeatures/CpuFeaturesTargets.cmake
+usr/lib/*/libcpu_features.a
--- /dev/null
+From 7b5349217768244e646e12c8f53bbed3d66e0761 Mon Sep 17 00:00:00 2001
+From: Zlika <zlika_ese@hotmail.com>
+Date: Wed, 9 Jun 2021 22:47:04 +0200
+Subject: [PATCH 01/73] Add volk_32f(c)_index_min_16/32u
+
+Signed-off-by: Zlika <zlika_ese@hotmail.com>
+---
+ docs/kernels.dox | 4 +
+ include/volk/volk_avx2_intrinsics.h | 114 ++++-
+ kernels/volk/volk_32f_index_min_16u.h | 375 +++++++++++++++++
+ kernels/volk/volk_32f_index_min_32u.h | 558 +++++++++++++++++++++++++
+ kernels/volk/volk_32fc_index_min_16u.h | 482 +++++++++++++++++++++
+ kernels/volk/volk_32fc_index_min_32u.h | 524 +++++++++++++++++++++++
+ lib/kernel_tests.h | 4 +
+ 7 files changed, 2060 insertions(+), 1 deletion(-)
+ create mode 100644 kernels/volk/volk_32f_index_min_16u.h
+ create mode 100644 kernels/volk/volk_32f_index_min_32u.h
+ create mode 100644 kernels/volk/volk_32fc_index_min_16u.h
+ create mode 100644 kernels/volk/volk_32fc_index_min_32u.h
+
+diff --git a/docs/kernels.dox b/docs/kernels.dox
+index e9898f1..55e567b 100644
+--- a/docs/kernels.dox
++++ b/docs/kernels.dox
+@@ -48,6 +48,8 @@
+ \li \subpage volk_32fc_deinterleave_real_64f
+ \li \subpage volk_32fc_index_max_16u
+ \li \subpage volk_32fc_index_max_32u
++\li \subpage volk_32fc_index_min_16u
++\li \subpage volk_32fc_index_min_32u
+ \li \subpage volk_32fc_magnitude_32f
+ \li \subpage volk_32fc_magnitude_squared_32f
+ \li \subpage volk_32f_cos_32f
+@@ -63,6 +65,8 @@
+ \li \subpage volk_32f_expfast_32f
+ \li \subpage volk_32f_index_max_16u
+ \li \subpage volk_32f_index_max_32u
++\li \subpage volk_32f_index_min_16u
++\li \subpage volk_32f_index_min_32u
+ \li \subpage volk_32f_invsqrt_32f
+ \li \subpage volk_32f_log2_32f
+ \li \subpage volk_32f_s32f_calc_spectral_noise_floor_32f
+diff --git a/include/volk/volk_avx2_intrinsics.h b/include/volk/volk_avx2_intrinsics.h
+index 2c397d9..21060d6 100644
+--- a/include/volk/volk_avx2_intrinsics.h
++++ b/include/volk/volk_avx2_intrinsics.h
+@@ -130,7 +130,7 @@ static inline __m256 _mm256_scaled_norm_dist_ps_avx2(const __m256 symbols0,
+ * float abs_squared = real(src0) * real(src0) + imag(src0) * imag(src1)
+ * bool compare = abs_squared > max_values[j];
+ * max_values[j] = compare ? abs_squared : max_values[j];
+- * max_indices[j] = compare ? current_indices[j] > max_indices[j]
++ * max_indices[j] = compare ? current_indices[j] : max_indices[j]
+ * current_indices[j] += 8; // update for next outer loop iteration
+ * ++src0;
+ * }
+@@ -231,4 +231,116 @@ static inline void vector_32fc_index_max_variant1(__m256 in0,
+ *current_indices = _mm256_add_epi32(*current_indices, indices_increment);
+ }
+
++/*
++ * The function below vectorizes the inner loop of the following code:
++ *
++ * float min_values[8] = {FLT_MAX};
++ * unsigned min_indices[8] = {0};
++ * unsigned current_indices[8] = {0, 1, 2, 3, 4, 5, 6, 7};
++ * for (unsigned i = 0; i < num_points / 8; ++i) {
++ * for (unsigned j = 0; j < 8; ++j) {
++ * float abs_squared = real(src0) * real(src0) + imag(src0) * imag(src0);
++ * bool compare = abs_squared < min_values[j];
++ * min_values[j] = compare ? abs_squared : min_values[j];
++ * min_indices[j] = compare ? current_indices[j] : min_indices[j];
++ * current_indices[j] += 8; // update for next outer loop iteration
++ * ++src0;
++ * }
++ * }
++ */
++static inline void vector_32fc_index_min_variant0(__m256 in0,
++ __m256 in1,
++ __m256* min_values,
++ __m256i* min_indices,
++ __m256i* current_indices,
++ __m256i indices_increment)
++{
++ in0 = _mm256_mul_ps(in0, in0);
++ in1 = _mm256_mul_ps(in1, in1);
++
++ /*
++ * Given the vectors a = (a_7, a_6, …, a_1, a_0) and b = (b_7, b_6, …, b_1, b_0)
++ * hadd_ps(a, b) computes
++ * (b_7 + b_6,
++ * b_5 + b_4,
++ * ---------
++ * a_7 + a_6,
++ * a_5 + a_4,
++ * ---------
++ * b_3 + b_2,
++ * b_1 + b_0,
++ * ---------
++ * a_3 + a_2,
++ * a_1 + a_0).
++ * The result is the squared absolute value of complex numbers at index
++ * offsets (7, 6, 3, 2, 5, 4, 1, 0). This must be the initial value of
++ * current_indices!
++ */
++ __m256 abs_squared = _mm256_hadd_ps(in0, in1);
++
++ /*
++ * Compare the recently computed squared absolute values with the
++ * previously determined minimum values. cmp_ps(a, b) determines
++ * a < b ? 0xFFFFFFFF : 0 for each element in the vectors =>
++ * compare_mask = abs_squared < min_values ? 0xFFFFFFFF : 0
++ *
++ * If either operand is NaN, 0 is returned as an “ordered” comparison is
++ * used => the blend operation will select the value from *min_values.
++ */
++ __m256 compare_mask = _mm256_cmp_ps(abs_squared, *min_values, _CMP_LT_OS);
++
++ /* Select minimum by blending. This is the only line which differs from variant1 */
++ *min_values = _mm256_blendv_ps(*min_values, abs_squared, compare_mask);
++
++ /*
++ * Updates indices: blendv_ps(a, b, mask) determines mask ? b : a for
++ * each element in the vectors =>
++ * min_indices = compare_mask ? current_indices : min_indices
++ *
++ * Note: The casting of data types is required to make the compiler happy
++ * and does not change values.
++ */
++ *min_indices =
++ _mm256_castps_si256(_mm256_blendv_ps(_mm256_castsi256_ps(*min_indices),
++ _mm256_castsi256_ps(*current_indices),
++ compare_mask));
++
++ /* compute indices of complex numbers which will be loaded in the next iteration */
++ *current_indices = _mm256_add_epi32(*current_indices, indices_increment);
++}
++
++/* See _variant0 for details */
++static inline void vector_32fc_index_min_variant1(__m256 in0,
++ __m256 in1,
++ __m256* min_values,
++ __m256i* min_indices,
++ __m256i* current_indices,
++ __m256i indices_increment)
++{
++ in0 = _mm256_mul_ps(in0, in0);
++ in1 = _mm256_mul_ps(in1, in1);
++
++ __m256 abs_squared = _mm256_hadd_ps(in0, in1);
++ __m256 compare_mask = _mm256_cmp_ps(abs_squared, *min_values, _CMP_LT_OS);
++
++ /*
++ * This is the only line which differs from variant0. Using minps instead of
++ * blendvps is faster on the Intel CPUs it was tested on.
++ *
++ * Note: The order of arguments matters if a NaN is encountered in which
++ * case the value of the second argument is selected. This is consistent
++ * with the “ordered” comparison and the blend operation: The comparison
++ * returns false if a NaN is encountered and the blend operation
++ * consequently selects the value from min_indices.
++ */
++ *min_values = _mm256_min_ps(abs_squared, *min_values);
++
++ *min_indices =
++ _mm256_castps_si256(_mm256_blendv_ps(_mm256_castsi256_ps(*min_indices),
++ _mm256_castsi256_ps(*current_indices),
++ compare_mask));
++
++ *current_indices = _mm256_add_epi32(*current_indices, indices_increment);
++}
++
+ #endif /* INCLUDE_VOLK_VOLK_AVX2_INTRINSICS_H_ */
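The lane bookkeeping in vector_32fc_index_min_variant0 is easiest to check against a scalar model. The sketch below is illustrative only. The names index_min_step_model, index_min_model and hadd_lane_order are hypothetical and are not VOLK API.

The step follows the comments above: FLT_MAX start values, the initial indices (7, 6, 3, 2, 5, 4, 1, 0) that match the hadd_ps lane order, and an increment of 8. The final lane reduction and the scalar tail are assumptions about how a caller would finish the search so that the first minimum is returned.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <stdint.h>

/* Lane l of _mm256_hadd_ps(in0, in1) holds |x[k]|^2 for k = hadd_lane_order[l],
 * where in0 carries complex samples 0..3 and in1 carries samples 4..7. */
static const uint32_t hadd_lane_order[8] = { 0, 1, 4, 5, 2, 3, 6, 7 };

/* Scalar model of one vector_32fc_index_min_variant0 call (hypothetical,
 * not part of VOLK). samples points at 8 interleaved (re, im) pairs. */
static void index_min_step_model(const float* samples,
                                 float min_values[8],
                                 uint32_t min_indices[8],
                                 uint32_t current_indices[8])
{
    for (int lane = 0; lane < 8; ++lane) {
        uint32_t k = hadd_lane_order[lane];
        float re = samples[2 * k];
        float im = samples[2 * k + 1];
        float abs_squared = re * re + im * im;
        /* Strict '<' is false for NaN, like the ordered _CMP_LT_OS compare */
        if (abs_squared < min_values[lane]) {
            min_values[lane] = abs_squared;
            min_indices[lane] = current_indices[lane];
        }
        current_indices[lane] += 8; /* indices_increment is 8 in every lane */
    }
}

/* Hypothetical driver: full blocks of 8 go through the step model, the lanes
 * are reduced (ties go to the lower index), then the tail is done in scalar. */
static uint32_t index_min_model(const float* samples, uint32_t num_complex)
{
    assert(num_complex == 0 || samples != NULL);

    float min_values[8];
    uint32_t min_indices[8];
    uint32_t current_indices[8];
    for (int lane = 0; lane < 8; ++lane) {
        min_values[lane] = FLT_MAX;
        min_indices[lane] = 0;
        current_indices[lane] = hadd_lane_order[lane];
    }

    const uint32_t blocks = num_complex / 8;
    for (uint32_t b = 0; b < blocks; ++b)
        index_min_step_model(samples + 16 * b, min_values, min_indices, current_indices);

    float best = FLT_MAX;
    uint32_t best_index = 0;
    for (int lane = 0; lane < 8; ++lane) {
        if (min_values[lane] < best ||
            (min_values[lane] == best && min_indices[lane] < best_index)) {
            best = min_values[lane];
            best_index = min_indices[lane];
        }
    }

    for (uint32_t i = blocks * 8; i < num_complex; ++i) {
        float re = samples[2 * i];
        float im = samples[2 * i + 1];
        float abs_squared = re * re + im * im;
        if (abs_squared < best) {
            best = abs_squared;
            best_index = i;
        }
    }
    return best_index;
}
```

Because each lane only updates on a strict "less than", a lane keeps the earliest index it has seen for its minimum. The reduction then only has to break ties between lanes by index.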
+diff --git a/kernels/volk/volk_32f_index_min_16u.h b/kernels/volk/volk_32f_index_min_16u.h
+new file mode 100644
+index 0000000..848b75c
+--- /dev/null
++++ b/kernels/volk/volk_32f_index_min_16u.h
+@@ -0,0 +1,375 @@
++/* -*- c++ -*- */
++/*
++ * Copyright 2021 Free Software Foundation, Inc.
++ *
++ * This file is part of GNU Radio
++ *
++ * GNU Radio is free software; you can redistribute it and/or modify
++ * it under the terms of the GNU General Public License as published by
++ * the Free Software Foundation; either version 3, or (at your option)
++ * any later version.
++ *
++ * GNU Radio is distributed in the hope that it will be useful,
++ * but WITHOUT ANY WARRANTY; without even the implied warranty of
++ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
++ * GNU General Public License for more details.
++ *
++ * You should have received a copy of the GNU General Public License
++ * along with GNU Radio; see the file COPYING. If not, write to
++ * the Free Software Foundation, Inc., 51 Franklin Street,
++ * Boston, MA 02110-1301, USA.
++ */
++
++/*!
++ * \page volk_32f_index_min_16u
++ *
++ * \b Overview
++ *
++ * Returns Argmin_i x[i]. Finds and returns the index which contains
++ * the first minimum value in the given vector.
++ *
++ * Note that num_points is a uint32_t, but the return value is a
++ * uint16_t, whose maximum value is 65535. The kernel therefore caps
++ * num_points at USHRT_MAX (65535): only the first 65535 elements
++ * (indices 0 to 65534) are searched, and any minimum located beyond
++ * that boundary is missed.
++ *
++ * <b>Dispatcher Prototype</b>
++ * \code
++ * void volk_32f_index_min_16u(uint16_t* target, const float* src0, uint32_t num_points)
++ * \endcode
++ *
++ * \b Inputs
++ * \li src0: The input vector of floats.
++ * \li num_points: The number of data points.
++ *
++ * \b Outputs
++ * \li target: The index of the first minimum value in the input buffer.
++ *
++ * \b Example
++ * \code
++ * int N = 10;
++ * uint32_t alignment = volk_get_alignment();
++ * float* in = (float*)volk_malloc(sizeof(float)*N, alignment);
++ * uint16_t* out = (uint16_t*)volk_malloc(sizeof(uint16_t), alignment);
++ *
++ * for(uint32_t ii = 0; ii < N; ++ii){
++ * float x = (float)ii;
++ * // a parabola with a minimum at x=4
++ * in[ii] = (x-4) * (x-4) - 5;
++ * }
++ *
++ * volk_32f_index_min_16u(out, in, N);
++ *
++ * printf("minimum is %1.2f at index %u\n", in[*out], *out);
++ *
++ * volk_free(in);
++ * volk_free(out);
++ * \endcode
++ */
++
++#ifndef INCLUDED_volk_32f_index_min_16u_a_H
++#define INCLUDED_volk_32f_index_min_16u_a_H
++
++#include <inttypes.h>
++#include <limits.h>
++#include <stdio.h>
++#include <volk/volk_common.h>
++
++#ifdef LV_HAVE_AVX
++#include <immintrin.h>
++
++static inline void
++volk_32f_index_min_16u_a_avx(uint16_t* target, const float* src0, uint32_t num_points)
++{
++ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
++
++ uint32_t number = 0;
++ const uint32_t eighthPoints = num_points / 8;
++
++ float* inputPtr = (float*)src0;
++
++ __m256 indexIncrementValues = _mm256_set1_ps(8);
++ __m256 currentIndexes = _mm256_set_ps(-1, -2, -3, -4, -5, -6, -7, -8);
++
++ float min = src0[0];
++ float index = 0;
++ __m256 minValues = _mm256_set1_ps(min);
++ __m256 minValuesIndex = _mm256_setzero_ps();
++ __m256 compareResults;
++ __m256 currentValues;
++
++ __VOLK_ATTR_ALIGNED(32) float minValuesBuffer[8];
++ __VOLK_ATTR_ALIGNED(32) float minIndexesBuffer[8];
++
++ for (; number < eighthPoints; number++) {
++
++ currentValues = _mm256_load_ps(inputPtr);
++ inputPtr += 8;
++ currentIndexes = _mm256_add_ps(currentIndexes, indexIncrementValues);
++
++ compareResults = _mm256_cmp_ps(currentValues, minValues, _CMP_LT_OS);
++
++ minValuesIndex = _mm256_blendv_ps(minValuesIndex, currentIndexes, compareResults);
++ minValues = _mm256_blendv_ps(minValues, currentValues, compareResults);
++ }
++
++ // Reduce the 8 per-lane minima to one; ties go to the lower index
++ _mm256_store_ps(minValuesBuffer, minValues);
++ _mm256_store_ps(minIndexesBuffer, minValuesIndex);
++
++ for (number = 0; number < 8; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValuesBuffer[number] == min) {
++ if (index > minIndexesBuffer[number])
++ index = minIndexesBuffer[number];
++ }
++ }
++
++ number = eighthPoints * 8;
++ for (; number < num_points; number++) {
++ if (src0[number] < min) {
++ index = number;
++ min = src0[number];
++ }
++ }
++ target[0] = (uint16_t)index;
++}
++
++#endif /*LV_HAVE_AVX*/
++
++#ifdef LV_HAVE_SSE4_1
++#include <smmintrin.h>
++
++static inline void
++volk_32f_index_min_16u_a_sse4_1(uint16_t* target, const float* src0, uint32_t num_points)
++{
++ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
++
++ uint32_t number = 0;
++ const uint32_t quarterPoints = num_points / 4;
++
++ float* inputPtr = (float*)src0;
++
++ __m128 indexIncrementValues = _mm_set1_ps(4);
++ __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
++
++ float min = src0[0];
++ float index = 0;
++ __m128 minValues = _mm_set1_ps(min);
++ __m128 minValuesIndex = _mm_setzero_ps();
++ __m128 compareResults;
++ __m128 currentValues;
++
++ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
++ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
++
++ for (; number < quarterPoints; number++) {
++
++ currentValues = _mm_load_ps(inputPtr);
++ inputPtr += 4;
++ currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues);
++
++ compareResults = _mm_cmplt_ps(currentValues, minValues);
++
++ minValuesIndex = _mm_blendv_ps(minValuesIndex, currentIndexes, compareResults);
++ minValues = _mm_blendv_ps(minValues, currentValues, compareResults);
++ }
++
++ // Reduce the 4 per-lane minima to one; ties go to the lower index
++ _mm_store_ps(minValuesBuffer, minValues);
++ _mm_store_ps(minIndexesBuffer, minValuesIndex);
++
++ for (number = 0; number < 4; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValuesBuffer[number] == min) {
++ if (index > minIndexesBuffer[number])
++ index = minIndexesBuffer[number];
++ }
++ }
++
++ number = quarterPoints * 4;
++ for (; number < num_points; number++) {
++ if (src0[number] < min) {
++ index = number;
++ min = src0[number];
++ }
++ }
++ target[0] = (uint16_t)index;
++}
++
++#endif /*LV_HAVE_SSE4_1*/
++
++
++#ifdef LV_HAVE_SSE
++
++#include <xmmintrin.h>
++
++static inline void
++volk_32f_index_min_16u_a_sse(uint16_t* target, const float* src0, uint32_t num_points)
++{
++ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
++
++ uint32_t number = 0;
++ const uint32_t quarterPoints = num_points / 4;
++
++ float* inputPtr = (float*)src0;
++
++ __m128 indexIncrementValues = _mm_set1_ps(4);
++ __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
++
++ float min = src0[0];
++ float index = 0;
++ __m128 minValues = _mm_set1_ps(min);
++ __m128 minValuesIndex = _mm_setzero_ps();
++ __m128 compareResults;
++ __m128 currentValues;
++
++ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
++ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
++
++ for (; number < quarterPoints; number++) {
++
++ currentValues = _mm_load_ps(inputPtr);
++ inputPtr += 4;
++ currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues);
++
++ compareResults = _mm_cmplt_ps(currentValues, minValues);
++
++ minValuesIndex = _mm_or_ps(_mm_and_ps(compareResults, currentIndexes),
++ _mm_andnot_ps(compareResults, minValuesIndex));
++ minValues = _mm_or_ps(_mm_and_ps(compareResults, currentValues),
++ _mm_andnot_ps(compareResults, minValues));
++ }
++
++ // Reduce the 4 per-lane minima to one; ties go to the lower index
++ _mm_store_ps(minValuesBuffer, minValues);
++ _mm_store_ps(minIndexesBuffer, minValuesIndex);
++
++ for (number = 0; number < 4; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValuesBuffer[number] == min) {
++ if (index > minIndexesBuffer[number])
++ index = minIndexesBuffer[number];
++ }
++ }
++
++ number = quarterPoints * 4;
++ for (; number < num_points; number++) {
++ if (src0[number] < min) {
++ index = number;
++ min = src0[number];
++ }
++ }
++ target[0] = (uint16_t)index;
++}
++
++#endif /*LV_HAVE_SSE*/
++
++
++#ifdef LV_HAVE_GENERIC
++
++static inline void
++volk_32f_index_min_16u_generic(uint16_t* target, const float* src0, uint32_t num_points)
++{
++ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
++
++ float min = src0[0];
++ uint16_t index = 0;
++
++ uint32_t i = 1;
++
++ for (; i < num_points; ++i) {
++ if (src0[i] < min) {
++ index = i;
++ min = src0[i];
++ }
++ }
++ target[0] = index;
++}
++
++#endif /*LV_HAVE_GENERIC*/
++
++
++#endif /*INCLUDED_volk_32f_index_min_16u_a_H*/
++
++
++#ifndef INCLUDED_volk_32f_index_min_16u_u_H
++#define INCLUDED_volk_32f_index_min_16u_u_H
++
++#include <inttypes.h>
++#include <limits.h>
++#include <stdio.h>
++#include <volk/volk_common.h>
++
++#ifdef LV_HAVE_AVX
++#include <immintrin.h>
++
++static inline void
++volk_32f_index_min_16u_u_avx(uint16_t* target, const float* src0, uint32_t num_points)
++{
++ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
++
++ uint32_t number = 0;
++ const uint32_t eighthPoints = num_points / 8;
++
++ float* inputPtr = (float*)src0;
++
++ __m256 indexIncrementValues = _mm256_set1_ps(8);
++ __m256 currentIndexes = _mm256_set_ps(-1, -2, -3, -4, -5, -6, -7, -8);
++
++ float min = src0[0];
++ float index = 0;
++ __m256 minValues = _mm256_set1_ps(min);
++ __m256 minValuesIndex = _mm256_setzero_ps();
++ __m256 compareResults;
++ __m256 currentValues;
++
++ __VOLK_ATTR_ALIGNED(32) float minValuesBuffer[8];
++ __VOLK_ATTR_ALIGNED(32) float minIndexesBuffer[8];
++
++ for (; number < eighthPoints; number++) {
++
++ currentValues = _mm256_loadu_ps(inputPtr);
++ inputPtr += 8;
++ currentIndexes = _mm256_add_ps(currentIndexes, indexIncrementValues);
++
++ compareResults = _mm256_cmp_ps(currentValues, minValues, _CMP_LT_OS);
++
++ minValuesIndex = _mm256_blendv_ps(minValuesIndex, currentIndexes, compareResults);
++ minValues = _mm256_blendv_ps(minValues, currentValues, compareResults);
++ }
++
++ // Reduce the 8 per-lane minima to one; ties go to the lower index
++ _mm256_storeu_ps(minValuesBuffer, minValues);
++ _mm256_storeu_ps(minIndexesBuffer, minValuesIndex);
++
++ for (number = 0; number < 8; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValuesBuffer[number] == min) {
++ if (index > minIndexesBuffer[number])
++ index = minIndexesBuffer[number];
++ }
++ }
++
++ number = eighthPoints * 8;
++ for (; number < num_points; number++) {
++ if (src0[number] < min) {
++ index = number;
++ min = src0[number];
++ }
++ }
++ target[0] = (uint16_t)index;
++}
++
++#endif /*LV_HAVE_AVX*/
++
++#endif /*INCLUDED_volk_32f_index_min_16u_u_H*/
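A portable scalar model can make the SSE strategy in volk_32f_index_min_16u_a_sse easier to follow, including the USHRT_MAX cap described in the overview. It is a sketch under stated assumptions:

- index_min_16u_model is a hypothetical name.
- Plain float arrays stand in for the __m128 lanes.
- The num_points == 0 guard is an addition of this model, not something the kernels above do.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Illustrative scalar model of volk_32f_index_min_16u_a_sse (not part of VOLK):
 * each of the 4 "lanes" tracks its own minimum and a float index, as the SSE
 * registers do, and the lanes are reduced at the end. */
static uint16_t index_min_16u_model(const float* src0, uint32_t num_points)
{
    /* Same cap as the kernel: only indices 0..USHRT_MAX-1 are searched. */
    num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
    if (num_points == 0)
        return 0; /* model-only guard; the 16u kernels above read src0[0] unconditionally */
    assert(src0 != 0);

    float min = src0[0];
    float index = 0.0f;
    float lane_min[4], lane_index[4], current[4];
    for (int l = 0; l < 4; ++l) {
        lane_min[l] = min;           /* _mm_set1_ps(min) */
        lane_index[l] = 0.0f;        /* _mm_setzero_ps() */
        current[l] = (float)(l - 4); /* _mm_set_ps(-1, -2, -3, -4): lane 0 is -4 */
    }

    const uint32_t quarter = num_points / 4;
    for (uint32_t q = 0; q < quarter; ++q) {
        for (int l = 0; l < 4; ++l) {
            current[l] += 4.0f; /* incremented before use, so lane l sees 4q + l */
            float v = src0[4 * q + l];
            if (v < lane_min[l]) { /* _mm_cmplt_ps followed by a blend */
                lane_min[l] = v;
                lane_index[l] = current[l];
            }
        }
    }

    /* Reduce the lanes; on equal values keep the lower index. */
    for (int l = 0; l < 4; ++l) {
        if (lane_min[l] < min) {
            index = lane_index[l];
            min = lane_min[l];
        } else if (lane_min[l] == min && index > lane_index[l]) {
            index = lane_index[l];
        }
    }

    /* Scalar tail for the last num_points % 4 elements. */
    for (uint32_t i = quarter * 4; i < num_points; ++i) {
        if (src0[i] < min) {
            index = (float)i;
            min = src0[i];
        }
    }
    return (uint16_t)index;
}
```

Consider 70000 inputs with a smaller value at index 66000. That element is never examined because of the cap, so the model reports the best value among the first 65535 elements.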
+diff --git a/kernels/volk/volk_32f_index_min_32u.h b/kernels/volk/volk_32f_index_min_32u.h
+new file mode 100644
+index 0000000..67ee426
+--- /dev/null
++++ b/kernels/volk/volk_32f_index_min_32u.h
+@@ -0,0 +1,558 @@
++/* -*- c++ -*- */
++/*
++ * Copyright 2021 Free Software Foundation, Inc.
++ *
++ * This file is part of GNU Radio
++ *
++ * GNU Radio is free software; you can redistribute it and/or modify
++ * it under the terms of the GNU General Public License as published by
++ * the Free Software Foundation; either version 3, or (at your option)
++ * any later version.
++ *
++ * GNU Radio is distributed in the hope that it will be useful,
++ * but WITHOUT ANY WARRANTY; without even the implied warranty of
++ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
++ * GNU General Public License for more details.
++ *
++ * You should have received a copy of the GNU General Public License
++ * along with GNU Radio; see the file COPYING. If not, write to
++ * the Free Software Foundation, Inc., 51 Franklin Street,
++ * Boston, MA 02110-1301, USA.
++ */
++
++/*!
++ * \page volk_32f_index_min_32u
++ *
++ * \b Overview
++ *
++ * Returns Argmin_i x[i]. Finds and returns the index which contains the first minimum
++ * value in the given vector.
++ *
++ * <b>Dispatcher Prototype</b>
++ * \code
++ * void volk_32f_index_min_32u(uint32_t* target, const float* src0, uint32_t num_points)
++ * \endcode
++ *
++ * \b Inputs
++ * \li src0: The input vector of floats.
++ * \li num_points: The number of data points.
++ *
++ * \b Outputs
++ * \li target: The index of the first minimum value in the input buffer.
++ *
++ * \b Example
++ * \code
++ * int N = 10;
++ * uint32_t alignment = volk_get_alignment();
++ * float* in = (float*)volk_malloc(sizeof(float)*N, alignment);
++ * uint32_t* out = (uint32_t*)volk_malloc(sizeof(uint32_t), alignment);
++ *
++ * for(uint32_t ii = 0; ii < N; ++ii){
++ * float x = (float)ii;
++ * // a parabola with a minimum at x=4
++ * in[ii] = (x-4) * (x-4) - 5;
++ * }
++ *
++ * volk_32f_index_min_32u(out, in, N);
++ *
++ * printf("minimum is %1.2f at index %u\n", in[*out], *out);
++ *
++ * volk_free(in);
++ * volk_free(out);
++ * \endcode
++ */
++
++#ifndef INCLUDED_volk_32f_index_min_32u_a_H
++#define INCLUDED_volk_32f_index_min_32u_a_H
++
++#include <inttypes.h>
++#include <stdio.h>
++#include <volk/volk_common.h>
++
++#ifdef LV_HAVE_SSE4_1
++#include <smmintrin.h>
++
++static inline void
++volk_32f_index_min_32u_a_sse4_1(uint32_t* target, const float* src0, uint32_t num_points)
++{
++ if (num_points > 0) {
++ uint32_t number = 0;
++ const uint32_t quarterPoints = num_points / 4;
++
++ float* inputPtr = (float*)src0;
++
++ __m128 indexIncrementValues = _mm_set1_ps(4);
++ __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
++
++ float min = src0[0];
++ float index = 0;
++ __m128 minValues = _mm_set1_ps(min);
++ __m128 minValuesIndex = _mm_setzero_ps();
++ __m128 compareResults;
++ __m128 currentValues;
++
++ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
++ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
++
++ for (; number < quarterPoints; number++) {
++
++ currentValues = _mm_load_ps(inputPtr);
++ inputPtr += 4;
++ currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues);
++
++ compareResults = _mm_cmplt_ps(currentValues, minValues);
++
++ minValuesIndex =
++ _mm_blendv_ps(minValuesIndex, currentIndexes, compareResults);
++ minValues = _mm_blendv_ps(minValues, currentValues, compareResults);
++ }
++
++ // Reduce the 4 per-lane minima to one; ties go to the lower index
++ _mm_store_ps(minValuesBuffer, minValues);
++ _mm_store_ps(minIndexesBuffer, minValuesIndex);
++
++ for (number = 0; number < 4; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValuesBuffer[number] == min) {
++ if (index > minIndexesBuffer[number])
++ index = minIndexesBuffer[number];
++ }
++ }
++
++ number = quarterPoints * 4;
++ for (; number < num_points; number++) {
++ if (src0[number] < min) {
++ index = number;
++ min = src0[number];
++ }
++ }
++ target[0] = (uint32_t)index;
++ }
++}
++
++#endif /*LV_HAVE_SSE4_1*/
++
++
++#ifdef LV_HAVE_SSE
++
++#include <xmmintrin.h>
++
++static inline void
++volk_32f_index_min_32u_a_sse(uint32_t* target, const float* src0, uint32_t num_points)
++{
++ if (num_points > 0) {
++ uint32_t number = 0;
++ const uint32_t quarterPoints = num_points / 4;
++
++ float* inputPtr = (float*)src0;
++
++ __m128 indexIncrementValues = _mm_set1_ps(4);
++ __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
++
++ float min = src0[0];
++ float index = 0;
++ __m128 minValues = _mm_set1_ps(min);
++ __m128 minValuesIndex = _mm_setzero_ps();
++ __m128 compareResults;
++ __m128 currentValues;
++
++ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
++ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
++
++ for (; number < quarterPoints; number++) {
++
++ currentValues = _mm_load_ps(inputPtr);
++ inputPtr += 4;
++ currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues);
++
++ compareResults = _mm_cmplt_ps(currentValues, minValues);
++
++ minValuesIndex = _mm_or_ps(_mm_and_ps(compareResults, currentIndexes),
++ _mm_andnot_ps(compareResults, minValuesIndex));
++
++ minValues = _mm_or_ps(_mm_and_ps(compareResults, currentValues),
++ _mm_andnot_ps(compareResults, minValues));
++ }
++
++ // Reduce the 4 per-lane minima to one; ties go to the lower index
++ _mm_store_ps(minValuesBuffer, minValues);
++ _mm_store_ps(minIndexesBuffer, minValuesIndex);
++
++ for (number = 0; number < 4; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValuesBuffer[number] == min) {
++ if (index > minIndexesBuffer[number])
++ index = minIndexesBuffer[number];
++ }
++ }
++
++ number = quarterPoints * 4;
++ for (; number < num_points; number++) {
++ if (src0[number] < min) {
++ index = number;
++ min = src0[number];
++ }
++ }
++ target[0] = (uint32_t)index;
++ }
++}
++
++#endif /*LV_HAVE_SSE*/
++
++
++#ifdef LV_HAVE_AVX
++#include <immintrin.h>
++
++static inline void
++volk_32f_index_min_32u_a_avx(uint32_t* target, const float* src0, uint32_t num_points)
++{
++ if (num_points > 0) {
++ uint32_t number = 0;
++ const uint32_t quarterPoints = num_points / 8;
++
++ float* inputPtr = (float*)src0;
++
++ __m256 indexIncrementValues = _mm256_set1_ps(8);
++ __m256 currentIndexes = _mm256_set_ps(-1, -2, -3, -4, -5, -6, -7, -8);
++
++ float min = src0[0];
++ float index = 0;
++ __m256 minValues = _mm256_set1_ps(min);
++ __m256 minValuesIndex = _mm256_setzero_ps();
++ __m256 compareResults;
++ __m256 currentValues;
++
++ __VOLK_ATTR_ALIGNED(32) float minValuesBuffer[8];
++ __VOLK_ATTR_ALIGNED(32) float minIndexesBuffer[8];
++
++ for (; number < quarterPoints; number++) {
++ currentValues = _mm256_load_ps(inputPtr);
++ inputPtr += 8;
++ currentIndexes = _mm256_add_ps(currentIndexes, indexIncrementValues);
++ compareResults = _mm256_cmp_ps(currentValues, minValues, _CMP_LT_OS);
++ minValuesIndex =
++ _mm256_blendv_ps(minValuesIndex, currentIndexes, compareResults);
++ minValues = _mm256_blendv_ps(minValues, currentValues, compareResults);
++ }
++
++ // Reduce the 8 per-lane minima to one; ties go to the lower index
++ _mm256_store_ps(minValuesBuffer, minValues);
++ _mm256_store_ps(minIndexesBuffer, minValuesIndex);
++
++ for (number = 0; number < 8; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValuesBuffer[number] == min) {
++ if (index > minIndexesBuffer[number])
++ index = minIndexesBuffer[number];
++ }
++ }
++
++ number = quarterPoints * 8;
++ for (; number < num_points; number++) {
++ if (src0[number] < min) {
++ index = number;
++ min = src0[number];
++ }
++ }
++ target[0] = (uint32_t)index;
++ }
++}
++
++#endif /*LV_HAVE_AVX*/
++
++
++#ifdef LV_HAVE_NEON
++#include <arm_neon.h>
++
++static inline void
++volk_32f_index_min_32u_neon(uint32_t* target, const float* src0, uint32_t num_points)
++{
++ if (num_points > 0) {
++ uint32_t number = 0;
++ const uint32_t quarterPoints = num_points / 4;
++
++ float* inputPtr = (float*)src0;
++ float32x4_t indexIncrementValues = vdupq_n_f32(4);
++ __VOLK_ATTR_ALIGNED(16)
++ float currentIndexes_float[4] = { -4.0f, -3.0f, -2.0f, -1.0f };
++ float32x4_t currentIndexes = vld1q_f32(currentIndexes_float);
++
++ float min = src0[0];
++ float index = 0;
++ float32x4_t minValues = vdupq_n_f32(min);
++ uint32x4_t minValuesIndex = vmovq_n_u32(0);
++ uint32x4_t compareResults;
++ uint32x4_t currentIndexes_u;
++ float32x4_t currentValues;
++
++ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
++ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
++
++ for (; number < quarterPoints; number++) {
++ currentValues = vld1q_f32(inputPtr);
++ inputPtr += 4;
++ currentIndexes = vaddq_f32(currentIndexes, indexIncrementValues);
++ currentIndexes_u = vcvtq_u32_f32(currentIndexes);
++ compareResults = vcgeq_f32(currentValues, minValues);
++ minValuesIndex = vorrq_u32(vandq_u32(compareResults, minValuesIndex),
++ vbicq_u32(currentIndexes_u, compareResults));
++ minValues = vminq_f32(currentValues, minValues);
++ }
++
++ // Reduce the 4 per-lane minima to one; ties go to the lower index
++ vst1q_f32(minValuesBuffer, minValues);
++ vst1q_f32(minIndexesBuffer, vcvtq_f32_u32(minValuesIndex));
++ for (number = 0; number < 4; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValuesBuffer[number] == min) {
++ if (index > minIndexesBuffer[number])
++ index = minIndexesBuffer[number];
++ }
++ }
++
++ number = quarterPoints * 4;
++ for (; number < num_points; number++) {
++ if (src0[number] < min) {
++ index = number;
++ min = src0[number];
++ }
++ }
++ target[0] = (uint32_t)index;
++ }
++}
++
++#endif /*LV_HAVE_NEON*/
++
++
++#ifdef LV_HAVE_GENERIC
++
++static inline void
++volk_32f_index_min_32u_generic(uint32_t* target, const float* src0, uint32_t num_points)
++{
++ if (num_points > 0) {
++ float min = src0[0];
++ uint32_t index = 0;
++
++ uint32_t i = 1;
++
++ for (; i < num_points; ++i) {
++ if (src0[i] < min) {
++ index = i;
++ min = src0[i];
++ }
++ }
++ target[0] = index;
++ }
++}
++
++#endif /*LV_HAVE_GENERIC*/
++
++
++#endif /*INCLUDED_volk_32f_index_min_32u_a_H*/
++
++
++#ifndef INCLUDED_volk_32f_index_min_32u_u_H
++#define INCLUDED_volk_32f_index_min_32u_u_H
++
++#include <inttypes.h>
++#include <stdio.h>
++#include <volk/volk_common.h>
++
++
++#ifdef LV_HAVE_AVX
++#include <immintrin.h>
++
++static inline void
++volk_32f_index_min_32u_u_avx(uint32_t* target, const float* src0, uint32_t num_points)
++{
++ if (num_points > 0) {
++ uint32_t number = 0;
++ const uint32_t quarterPoints = num_points / 8;
++
++ float* inputPtr = (float*)src0;
++
++ __m256 indexIncrementValues = _mm256_set1_ps(8);
++ __m256 currentIndexes = _mm256_set_ps(-1, -2, -3, -4, -5, -6, -7, -8);
++
++ float min = src0[0];
++ float index = 0;
++ __m256 minValues = _mm256_set1_ps(min);
++ __m256 minValuesIndex = _mm256_setzero_ps();
++ __m256 compareResults;
++ __m256 currentValues;
++
++ __VOLK_ATTR_ALIGNED(32) float minValuesBuffer[8];
++ __VOLK_ATTR_ALIGNED(32) float minIndexesBuffer[8];
++
++ for (; number < quarterPoints; number++) {
++ currentValues = _mm256_loadu_ps(inputPtr);
++ inputPtr += 8;
++ currentIndexes = _mm256_add_ps(currentIndexes, indexIncrementValues);
++ compareResults = _mm256_cmp_ps(currentValues, minValues, _CMP_LT_OS);
++ minValuesIndex =
++ _mm256_blendv_ps(minValuesIndex, currentIndexes, compareResults);
++ minValues = _mm256_blendv_ps(minValues, currentValues, compareResults);
++ }
++
++ // Reduce the 8 per-lane minima to one; ties go to the lower index
++ _mm256_store_ps(minValuesBuffer, minValues);
++ _mm256_store_ps(minIndexesBuffer, minValuesIndex);
++
++ for (number = 0; number < 8; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValuesBuffer[number] == min) {
++ if (index > minIndexesBuffer[number])
++ index = minIndexesBuffer[number];
++ }
++ }
++
++ number = quarterPoints * 8;
++ for (; number < num_points; number++) {
++ if (src0[number] < min) {
++ index = number;
++ min = src0[number];
++ }
++ }
++ target[0] = (uint32_t)index;
++ }
++}
++
++#endif /*LV_HAVE_AVX*/
++
++
++#ifdef LV_HAVE_SSE4_1
++#include <smmintrin.h>
++
++static inline void
++volk_32f_index_min_32u_u_sse4_1(uint32_t* target, const float* src0, uint32_t num_points)
++{
++ if (num_points > 0) {
++ uint32_t number = 0;
++ const uint32_t quarterPoints = num_points / 4;
++
++ float* inputPtr = (float*)src0;
++
++ __m128 indexIncrementValues = _mm_set1_ps(4);
++ __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
++
++ float min = src0[0];
++ float index = 0;
++ __m128 minValues = _mm_set1_ps(min);
++ __m128 minValuesIndex = _mm_setzero_ps();
++ __m128 compareResults;
++ __m128 currentValues;
++
++ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
++ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
++
++ for (; number < quarterPoints; number++) {
++ currentValues = _mm_loadu_ps(inputPtr);
++ inputPtr += 4;
++ currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues);
++ compareResults = _mm_cmplt_ps(currentValues, minValues);
++ minValuesIndex =
++ _mm_blendv_ps(minValuesIndex, currentIndexes, compareResults);
++ minValues = _mm_blendv_ps(minValues, currentValues, compareResults);
++ }
++
++ // Reduce the 4 per-lane minima to one; ties go to the lower index
++ _mm_store_ps(minValuesBuffer, minValues);
++ _mm_store_ps(minIndexesBuffer, minValuesIndex);
++
++ for (number = 0; number < 4; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValuesBuffer[number] == min) {
++ if (index > minIndexesBuffer[number])
++ index = minIndexesBuffer[number];
++ }
++ }
++
++ number = quarterPoints * 4;
++ for (; number < num_points; number++) {
++ if (src0[number] < min) {
++ index = number;
++ min = src0[number];
++ }
++ }
++ target[0] = (uint32_t)index;
++ }
++}
++
++#endif /*LV_HAVE_SSE4_1*/
++
++#ifdef LV_HAVE_SSE
++#include <xmmintrin.h>
++
++static inline void
++volk_32f_index_min_32u_u_sse(uint32_t* target, const float* src0, uint32_t num_points)
++{
++ if (num_points > 0) {
++ uint32_t number = 0;
++ const uint32_t quarterPoints = num_points / 4;
++
++ float* inputPtr = (float*)src0;
++
++ __m128 indexIncrementValues = _mm_set1_ps(4);
++ __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
++
++ float min = src0[0];
++ float index = 0;
++ __m128 minValues = _mm_set1_ps(min);
++ __m128 minValuesIndex = _mm_setzero_ps();
++ __m128 compareResults;
++ __m128 currentValues;
++
++ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
++ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
++
++ for (; number < quarterPoints; number++) {
++ currentValues = _mm_loadu_ps(inputPtr);
++ inputPtr += 4;
++ currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues);
++ compareResults = _mm_cmplt_ps(currentValues, minValues);
++ minValuesIndex = _mm_or_ps(_mm_and_ps(compareResults, currentIndexes),
++ _mm_andnot_ps(compareResults, minValuesIndex));
++ minValues = _mm_or_ps(_mm_and_ps(compareResults, currentValues),
++ _mm_andnot_ps(compareResults, minValues));
++ }
++
++ // Reduce the 4 per-lane minima to one; ties go to the lower index
++ _mm_store_ps(minValuesBuffer, minValues);
++ _mm_store_ps(minIndexesBuffer, minValuesIndex);
++
++ for (number = 0; number < 4; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValuesBuffer[number] == min) {
++ if (index > minIndexesBuffer[number])
++ index = minIndexesBuffer[number];
++ }
++ }
++
++ number = quarterPoints * 4;
++ for (; number < num_points; number++) {
++ if (src0[number] < min) {
++ index = number;
++ min = src0[number];
++ }
++ }
++ target[0] = (uint32_t)index;
++ }
++}
++
++#endif /*LV_HAVE_SSE*/
++
++#endif /*INCLUDED_volk_32f_index_min_32u_u_H*/
+diff --git a/kernels/volk/volk_32fc_index_min_16u.h b/kernels/volk/volk_32fc_index_min_16u.h
+new file mode 100644
+index 0000000..5539ebf
+--- /dev/null
++++ b/kernels/volk/volk_32fc_index_min_16u.h
+@@ -0,0 +1,482 @@
++/* -*- c++ -*- */
++/*
++ * Copyright 2021 Free Software Foundation, Inc.
++ *
++ * This file is part of GNU Radio
++ *
++ * GNU Radio is free software; you can redistribute it and/or modify
++ * it under the terms of the GNU General Public License as published by
++ * the Free Software Foundation; either version 3, or (at your option)
++ * any later version.
++ *
++ * GNU Radio is distributed in the hope that it will be useful,
++ * but WITHOUT ANY WARRANTY; without even the implied warranty of
++ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
++ * GNU General Public License for more details.
++ *
++ * You should have received a copy of the GNU General Public License
++ * along with GNU Radio; see the file COPYING. If not, write to
++ * the Free Software Foundation, Inc., 51 Franklin Street,
++ * Boston, MA 02110-1301, USA.
++ */
++
++/*!
++ * \page volk_32fc_index_min_16u
++ *
++ * \b Overview
++ *
++ * Returns Argmin_i mag(x[i]). Finds and returns the index which contains the
++ * minimum magnitude for complex points in the given vector.
++ *
++ * Note that num_points is a uint32_t, but the return value is
++ * uint16_t. Providing a vector larger than the max of a uint16_t
++ * (65536) would miss anything outside of this boundary. The kernel
++ * will check the length of num_points and cap it to this max value,
++ * anyways.
++ *
++ * <b>Dispatcher Prototype</b>
++ * \code
++ * void volk_32fc_index_min_16u(uint16_t* target, lv_32fc_t* src0, uint32_t
++ * num_points) \endcode
++ *
++ * \b Inputs
++ * \li src0: The complex input vector.
++ * \li num_points: The number of samples.
++ *
++ * \b Outputs
++ * \li target: The index of the point with minimum magnitude.
++ *
++ * \b Example
++ * Calculate the index of the minimum value of \f$x^2 + x\f$ for points around
++ * the unit circle.
++ * \code
++ * int N = 10;
++ * uint32_t alignment = volk_get_alignment();
++ * lv_32fc_t* in = (lv_32fc_t*)volk_malloc(sizeof(lv_32fc_t)*N, alignment);
++ * uint16_t* min = (uint16_t*)volk_malloc(sizeof(uint16_t), alignment);
++ *
++ * for(uint32_t ii = 0; ii < N/2; ++ii){
++ * float real = 2.f * ((float)ii / (float)N) - 1.f;
++ * float imag = std::sqrt(1.f - real * real);
++ * in[ii] = lv_cmake(real, imag);
++ * in[ii] = in[ii] * in[ii] + in[ii];
++ * in[N-ii] = lv_cmake(real, imag);
++ * in[N-ii] = in[N-ii] * in[N-ii] + in[N-ii];
++ * }
++ *
++ * volk_32fc_index_min_16u(min, in, N);
++ *
++ * printf("index of min value = %u\n", *min);
++ *
++ * volk_free(in);
++ * volk_free(min);
++ * \endcode
++ */
++
++#ifndef INCLUDED_volk_32fc_index_min_16u_a_H
++#define INCLUDED_volk_32fc_index_min_16u_a_H
++
++#include <inttypes.h>
++#include <limits.h>
++#include <stdio.h>
++#include <volk/volk_common.h>
++#include <volk/volk_complex.h>
++
++#ifdef LV_HAVE_AVX2
++#include <immintrin.h>
++#include <volk/volk_avx2_intrinsics.h>
++
++static inline void volk_32fc_index_min_16u_a_avx2_variant_0(uint16_t* target,
++ lv_32fc_t* src0,
++ uint32_t num_points)
++{
++ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
++
++ const __m256i indices_increment = _mm256_set1_epi32(8);
++ /*
++ * At the start of each loop iteration current_indices holds the indices of
++ * the complex numbers loaded from memory. Explanation for odd order is given
++ * in implementation of vector_32fc_index_min_variant0().
++ */
++ __m256i current_indices = _mm256_set_epi32(7, 6, 3, 2, 5, 4, 1, 0);
++
++ __m256 min_values = _mm256_set1_ps(FLT_MAX);
++ __m256i min_indices = _mm256_setzero_si256();
++
++ for (unsigned i = 0; i < num_points / 8u; ++i) {
++ __m256 in0 = _mm256_load_ps((float*)src0);
++ __m256 in1 = _mm256_load_ps((float*)(src0 + 4));
++ vector_32fc_index_min_variant0(
++ in0, in1, &min_values, &min_indices, &current_indices, indices_increment);
++ src0 += 8;
++ }
++
++ // determine minimum value and index in the result of the vectorized loop
++ __VOLK_ATTR_ALIGNED(32) float min_values_buffer[8];
++ __VOLK_ATTR_ALIGNED(32) uint32_t min_indices_buffer[8];
++ _mm256_store_ps(min_values_buffer, min_values);
++ _mm256_store_si256((__m256i*)min_indices_buffer, min_indices);
++
++ float min = FLT_MAX;
++ uint32_t index = 0;
++ for (unsigned i = 0; i < 8; i++) {
++ if (min_values_buffer[i] < min) {
++ min = min_values_buffer[i];
++ index = min_indices_buffer[i];
++ }
++ }
++
++ // handle tail not processed by the vectorized loop
++ for (unsigned i = num_points & (~7u); i < num_points; ++i) {
++ const float abs_squared =
++ lv_creal(*src0) * lv_creal(*src0) + lv_cimag(*src0) * lv_cimag(*src0);
++ if (abs_squared < min) {
++ min = abs_squared;
++ index = i;
++ }
++ ++src0;
++ }
++
++ *target = index;
++}
++
++#endif /*LV_HAVE_AVX2*/
++
++#ifdef LV_HAVE_AVX2
++#include <immintrin.h>
++#include <volk/volk_avx2_intrinsics.h>
++
++static inline void volk_32fc_index_min_16u_a_avx2_variant_1(uint16_t* target,
++ lv_32fc_t* src0,
++ uint32_t num_points)
++{
++ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
++
++ const __m256i indices_increment = _mm256_set1_epi32(8);
++ /*
++ * At the start of each loop iteration current_indices holds the indices of
++ * the complex numbers loaded from memory. Explanation for odd order is given
++ * in implementation of vector_32fc_index_min_variant0().
++ */
++ __m256i current_indices = _mm256_set_epi32(7, 6, 3, 2, 5, 4, 1, 0);
++
++ __m256 min_values = _mm256_set1_ps(FLT_MAX);
++ __m256i min_indices = _mm256_setzero_si256();
++
++ for (unsigned i = 0; i < num_points / 8u; ++i) {
++ __m256 in0 = _mm256_load_ps((float*)src0);
++ __m256 in1 = _mm256_load_ps((float*)(src0 + 4));
++ vector_32fc_index_min_variant1(
++ in0, in1, &min_values, &min_indices, &current_indices, indices_increment);
++ src0 += 8;
++ }
++
++ // determine minimum value and index in the result of the vectorized loop
++ __VOLK_ATTR_ALIGNED(32) float min_values_buffer[8];
++ __VOLK_ATTR_ALIGNED(32) uint32_t min_indices_buffer[8];
++ _mm256_store_ps(min_values_buffer, min_values);
++ _mm256_store_si256((__m256i*)min_indices_buffer, min_indices);
++
++ float min = FLT_MAX;
++ uint32_t index = 0;
++ for (unsigned i = 0; i < 8; i++) {
++ if (min_values_buffer[i] < min) {
++ min = min_values_buffer[i];
++ index = min_indices_buffer[i];
++ }
++ }
++
++ // handle tail not processed by the vectorized loop
++ for (unsigned i = num_points & (~7u); i < num_points; ++i) {
++ const float abs_squared =
++ lv_creal(*src0) * lv_creal(*src0) + lv_cimag(*src0) * lv_cimag(*src0);
++ if (abs_squared < min) {
++ min = abs_squared;
++ index = i;
++ }
++ ++src0;
++ }
++
++ *target = index;
++}
++
++#endif /*LV_HAVE_AVX2*/
++
++#ifdef LV_HAVE_SSE3
++#include <pmmintrin.h>
++#include <xmmintrin.h>
++
++static inline void
++volk_32fc_index_min_16u_a_sse3(uint16_t* target, lv_32fc_t* src0, uint32_t num_points)
++{
++ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
++ const uint32_t num_bytes = num_points * 8;
++
++ union bit128 holderf;
++ union bit128 holderi;
++ float sq_dist = 0.0;
++
++ union bit128 xmm5, xmm4;
++ __m128 xmm1, xmm2, xmm3;
++ __m128i xmm8, xmm11, xmm12, xmm9, xmm10;
++
++ xmm5.int_vec = _mm_setzero_si128();
++ xmm4.int_vec = _mm_setzero_si128();
++ holderf.int_vec = _mm_setzero_si128();
++ holderi.int_vec = _mm_setzero_si128();
++
++ int bound = num_bytes >> 5;
++ int i = 0;
++
++ xmm8 = _mm_setr_epi32(0, 1, 2, 3);
++ xmm9 = _mm_setzero_si128();
++ xmm10 = _mm_setr_epi32(4, 4, 4, 4);
++ xmm3 = _mm_set_ps1(FLT_MAX);
++
++ for (; i < bound; ++i) {
++ xmm1 = _mm_load_ps((float*)src0);
++ xmm2 = _mm_load_ps((float*)&src0[2]);
++
++ src0 += 4;
++
++ xmm1 = _mm_mul_ps(xmm1, xmm1);
++ xmm2 = _mm_mul_ps(xmm2, xmm2);
++
++ xmm1 = _mm_hadd_ps(xmm1, xmm2);
++
++ xmm3 = _mm_min_ps(xmm1, xmm3);
++
++ xmm4.float_vec = _mm_cmpgt_ps(xmm1, xmm3);
++ xmm5.float_vec = _mm_cmpeq_ps(xmm1, xmm3);
++
++ xmm11 = _mm_and_si128(xmm8, xmm5.int_vec);
++ xmm12 = _mm_and_si128(xmm9, xmm4.int_vec);
++
++ xmm9 = _mm_add_epi32(xmm11, xmm12);
++
++ xmm8 = _mm_add_epi32(xmm8, xmm10);
++ }
++
++ if (num_bytes >> 4 & 1) {
++ xmm2 = _mm_load_ps((float*)src0);
++
++ xmm1 = _mm_movelh_ps(bit128_p(&xmm8)->float_vec, bit128_p(&xmm8)->float_vec);
++ xmm8 = bit128_p(&xmm1)->int_vec;
++
++ xmm2 = _mm_mul_ps(xmm2, xmm2);
++
++ src0 += 2;
++
++ xmm1 = _mm_hadd_ps(xmm2, xmm2);
++
++ xmm3 = _mm_min_ps(xmm1, xmm3);
++
++ xmm10 = _mm_setr_epi32(2, 2, 2, 2);
++
++ xmm4.float_vec = _mm_cmpgt_ps(xmm1, xmm3);
++ xmm5.float_vec = _mm_cmpeq_ps(xmm1, xmm3);
++
++ xmm11 = _mm_and_si128(xmm8, xmm5.int_vec);
++ xmm12 = _mm_and_si128(xmm9, xmm4.int_vec);
++
++ xmm9 = _mm_add_epi32(xmm11, xmm12);
++
++ xmm8 = _mm_add_epi32(xmm8, xmm10);
++ }
++
++ if (num_bytes >> 3 & 1) {
++ sq_dist =
++ lv_creal(src0[0]) * lv_creal(src0[0]) + lv_cimag(src0[0]) * lv_cimag(src0[0]);
++
++ xmm2 = _mm_load1_ps(&sq_dist);
++
++ xmm1 = xmm3;
++
++ xmm3 = _mm_min_ss(xmm3, xmm2);
++
++ xmm4.float_vec = _mm_cmpgt_ps(xmm1, xmm3);
++ xmm5.float_vec = _mm_cmpeq_ps(xmm1, xmm3);
++
++ xmm8 = _mm_shuffle_epi32(xmm8, 0x00);
++
++ xmm11 = _mm_and_si128(xmm8, xmm4.int_vec);
++ xmm12 = _mm_and_si128(xmm9, xmm5.int_vec);
++
++ xmm9 = _mm_add_epi32(xmm11, xmm12);
++ }
++
++ _mm_store_ps((float*)&(holderf.f), xmm3);
++ _mm_store_si128(&(holderi.int_vec), xmm9);
++
++ target[0] = holderi.i[0];
++ sq_dist = holderf.f[0];
++ target[0] = (holderf.f[1] < sq_dist) ? holderi.i[1] : target[0];
++ sq_dist = (holderf.f[1] < sq_dist) ? holderf.f[1] : sq_dist;
++ target[0] = (holderf.f[2] < sq_dist) ? holderi.i[2] : target[0];
++ sq_dist = (holderf.f[2] < sq_dist) ? holderf.f[2] : sq_dist;
++ target[0] = (holderf.f[3] < sq_dist) ? holderi.i[3] : target[0];
++ sq_dist = (holderf.f[3] < sq_dist) ? holderf.f[3] : sq_dist;
++}
++
++#endif /*LV_HAVE_SSE3*/
++
++#ifdef LV_HAVE_GENERIC
++static inline void
++volk_32fc_index_min_16u_generic(uint16_t* target, lv_32fc_t* src0, uint32_t num_points)
++{
++ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
++
++ const uint32_t num_bytes = num_points * 8;
++
++ float sq_dist = 0.0;
++ float min = FLT_MAX;
++ uint16_t index = 0;
++
++ uint32_t i = 0;
++
++ for (; i<num_bytes>> 3; ++i) {
++ sq_dist =
++ lv_creal(src0[i]) * lv_creal(src0[i]) + lv_cimag(src0[i]) * lv_cimag(src0[i]);
++
++ if (sq_dist < min) {
++ index = i;
++ min = sq_dist;
++ }
++ }
++ target[0] = index;
++}
++
++#endif /*LV_HAVE_GENERIC*/
++
++#endif /*INCLUDED_volk_32fc_index_min_16u_a_H*/
++
++#ifndef INCLUDED_volk_32fc_index_min_16u_u_H
++#define INCLUDED_volk_32fc_index_min_16u_u_H
++
++#include <inttypes.h>
++#include <limits.h>
++#include <stdio.h>
++#include <volk/volk_common.h>
++#include <volk/volk_complex.h>
++
++#ifdef LV_HAVE_AVX2
++#include <immintrin.h>
++#include <volk/volk_avx2_intrinsics.h>
++
++static inline void volk_32fc_index_min_16u_u_avx2_variant_0(uint16_t* target,
++ lv_32fc_t* src0,
++ uint32_t num_points)
++{
++ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
++
++ const __m256i indices_increment = _mm256_set1_epi32(8);
++ /*
++ * At the start of each loop iteration current_indices holds the indices of
++ * the complex numbers loaded from memory. Explanation for odd order is given
++ * in implementation of vector_32fc_index_min_variant0().
++ */
++ __m256i current_indices = _mm256_set_epi32(7, 6, 3, 2, 5, 4, 1, 0);
++
++ __m256 min_values = _mm256_set1_ps(FLT_MAX);
++ __m256i min_indices = _mm256_setzero_si256();
++
++ for (unsigned i = 0; i < num_points / 8u; ++i) {
++ __m256 in0 = _mm256_loadu_ps((float*)src0);
++ __m256 in1 = _mm256_loadu_ps((float*)(src0 + 4));
++ vector_32fc_index_min_variant0(
++ in0, in1, &min_values, &min_indices, &current_indices, indices_increment);
++ src0 += 8;
++ }
++
++ // determine minimum value and index in the result of the vectorized loop
++ __VOLK_ATTR_ALIGNED(32) float min_values_buffer[8];
++ __VOLK_ATTR_ALIGNED(32) uint32_t min_indices_buffer[8];
++ _mm256_store_ps(min_values_buffer, min_values);
++ _mm256_store_si256((__m256i*)min_indices_buffer, min_indices);
++
++ float min = FLT_MAX;
++ uint32_t index = 0;
++ for (unsigned i = 0; i < 8; i++) {
++ if (min_values_buffer[i] < min) {
++ min = min_values_buffer[i];
++ index = min_indices_buffer[i];
++ }
++ }
++
++ // handle tail not processed by the vectorized loop
++ for (unsigned i = num_points & (~7u); i < num_points; ++i) {
++ const float abs_squared =
++ lv_creal(*src0) * lv_creal(*src0) + lv_cimag(*src0) * lv_cimag(*src0);
++ if (abs_squared < min) {
++ min = abs_squared;
++ index = i;
++ }
++ ++src0;
++ }
++
++ *target = index;
++}
++
++#endif /*LV_HAVE_AVX2*/
++
++#ifdef LV_HAVE_AVX2
++#include <immintrin.h>
++#include <volk/volk_avx2_intrinsics.h>
++
++static inline void volk_32fc_index_min_16u_u_avx2_variant_1(uint16_t* target,
++ lv_32fc_t* src0,
++ uint32_t num_points)
++{
++ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
++
++ const __m256i indices_increment = _mm256_set1_epi32(8);
++ /*
++ * At the start of each loop iteration current_indices holds the indices of
++ * the complex numbers loaded from memory. Explanation for odd order is given
++ * in implementation of vector_32fc_index_min_variant0().
++ */
++ __m256i current_indices = _mm256_set_epi32(7, 6, 3, 2, 5, 4, 1, 0);
++
++ __m256 min_values = _mm256_set1_ps(FLT_MAX);
++ __m256i min_indices = _mm256_setzero_si256();
++
++ for (unsigned i = 0; i < num_points / 8u; ++i) {
++ __m256 in0 = _mm256_loadu_ps((float*)src0);
++ __m256 in1 = _mm256_loadu_ps((float*)(src0 + 4));
++ vector_32fc_index_min_variant1(
++ in0, in1, &min_values, &min_indices, &current_indices, indices_increment);
++ src0 += 8;
++ }
++
++ // determine minimum value and index in the result of the vectorized loop
++ __VOLK_ATTR_ALIGNED(32) float min_values_buffer[8];
++ __VOLK_ATTR_ALIGNED(32) uint32_t min_indices_buffer[8];
++ _mm256_store_ps(min_values_buffer, min_values);
++ _mm256_store_si256((__m256i*)min_indices_buffer, min_indices);
++
++ float min = FLT_MAX;
++ uint32_t index = 0;
++ for (unsigned i = 0; i < 8; i++) {
++ if (min_values_buffer[i] < min) {
++ min = min_values_buffer[i];
++ index = min_indices_buffer[i];
++ }
++ }
++
++ // handle tail not processed by the vectorized loop
++ for (unsigned i = num_points & (~7u); i < num_points; ++i) {
++ const float abs_squared =
++ lv_creal(*src0) * lv_creal(*src0) + lv_cimag(*src0) * lv_cimag(*src0);
++ if (abs_squared < min) {
++ min = abs_squared;
++ index = i;
++ }
++ ++src0;
++ }
++
++ *target = index;
++}
++
++#endif /*LV_HAVE_AVX2*/
++
++#endif /*INCLUDED_volk_32fc_index_min_16u_u_H*/
+diff --git a/kernels/volk/volk_32fc_index_min_32u.h b/kernels/volk/volk_32fc_index_min_32u.h
+new file mode 100644
+index 0000000..290b754
+--- /dev/null
++++ b/kernels/volk/volk_32fc_index_min_32u.h
+@@ -0,0 +1,524 @@
++/* -*- c++ -*- */
++/*
++ * Copyright 2021 Free Software Foundation, Inc.
++ *
++ * This file is part of GNU Radio
++ *
++ * GNU Radio is free software; you can redistribute it and/or modify
++ * it under the terms of the GNU General Public License as published by
++ * the Free Software Foundation; either version 3, or (at your option)
++ * any later version.
++ *
++ * GNU Radio is distributed in the hope that it will be useful,
++ * but WITHOUT ANY WARRANTY; without even the implied warranty of
++ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
++ * GNU General Public License for more details.
++ *
++ * You should have received a copy of the GNU General Public License
++ * along with GNU Radio; see the file COPYING. If not, write to
++ * the Free Software Foundation, Inc., 51 Franklin Street,
++ * Boston, MA 02110-1301, USA.
++ */
++
++/*!
++ * \page volk_32fc_index_min_32u
++ *
++ * \b Overview
++ *
++ * Returns Argmin_i mag(x[i]). Finds and returns the index which contains the
++ * minimum magnitude for complex points in the given vector.
++ *
++ * <b>Dispatcher Prototype</b>
++ * \code
++ * void volk_32fc_index_min_32u(uint32_t* target, lv_32fc_t* src0, uint32_t
++ * num_points) \endcode
++ *
++ * \b Inputs
++ * \li src0: The complex input vector.
++ * \li num_points: The number of samples.
++ *
++ * \b Outputs
++ * \li target: The index of the point with minimum magnitude.
++ *
++ * \b Example
++ * Calculate the index of the minimum value of \f$x^2 + x\f$ for points around
++ * the unit circle.
++ * \code
++ * int N = 10;
++ * uint32_t alignment = volk_get_alignment();
++ * lv_32fc_t* in = (lv_32fc_t*)volk_malloc(sizeof(lv_32fc_t)*N, alignment);
++ * uint32_t* min = (uint32_t*)volk_malloc(sizeof(uint32_t), alignment);
++ *
++ * for(uint32_t ii = 0; ii < N/2; ++ii){
++ * float real = 2.f * ((float)ii / (float)N) - 1.f;
++ * float imag = std::sqrt(1.f - real * real);
++ * in[ii] = lv_cmake(real, imag);
++ * in[ii] = in[ii] * in[ii] + in[ii];
++ * in[N-ii] = lv_cmake(real, imag);
++ * in[N-ii] = in[N-ii] * in[N-ii] + in[N-ii];
++ * }
++ *
++ * volk_32fc_index_min_32u(min, in, N);
++ *
++ * printf("index of min value = %u\n", *min);
++ *
++ * volk_free(in);
++ * volk_free(min);
++ * \endcode
++ */
++
++#ifndef INCLUDED_volk_32fc_index_min_32u_a_H
++#define INCLUDED_volk_32fc_index_min_32u_a_H
++
++#include <inttypes.h>
++#include <stdio.h>
++#include <volk/volk_common.h>
++#include <volk/volk_complex.h>
++
++#ifdef LV_HAVE_AVX2
++#include <immintrin.h>
++#include <volk/volk_avx2_intrinsics.h>
++
++static inline void volk_32fc_index_min_32u_a_avx2_variant_0(uint32_t* target,
++ lv_32fc_t* src0,
++ uint32_t num_points)
++{
++ const __m256i indices_increment = _mm256_set1_epi32(8);
++ /*
++ * At the start of each loop iteration current_indices holds the indices of
++ * the complex numbers loaded from memory. Explanation for odd order is given
++ * in implementation of vector_32fc_index_min_variant0().
++ */
++ __m256i current_indices = _mm256_set_epi32(7, 6, 3, 2, 5, 4, 1, 0);
++
++ __m256 min_values = _mm256_set1_ps(FLT_MAX);
++ __m256i min_indices = _mm256_setzero_si256();
++
++ for (unsigned i = 0; i < num_points / 8u; ++i) {
++ __m256 in0 = _mm256_load_ps((float*)src0);
++ __m256 in1 = _mm256_load_ps((float*)(src0 + 4));
++ vector_32fc_index_min_variant0(
++ in0, in1, &min_values, &min_indices, &current_indices, indices_increment);
++ src0 += 8;
++ }
++
++ // determine minimum value and index in the result of the vectorized loop
++ __VOLK_ATTR_ALIGNED(32) float min_values_buffer[8];
++ __VOLK_ATTR_ALIGNED(32) uint32_t min_indices_buffer[8];
++ _mm256_store_ps(min_values_buffer, min_values);
++ _mm256_store_si256((__m256i*)min_indices_buffer, min_indices);
++
++ float min = FLT_MAX;
++ uint32_t index = 0;
++ for (unsigned i = 0; i < 8; i++) {
++ if (min_values_buffer[i] < min) {
++ min = min_values_buffer[i];
++ index = min_indices_buffer[i];
++ }
++ }
++
++ // handle tail not processed by the vectorized loop
++ for (unsigned i = num_points & (~7u); i < num_points; ++i) {
++ const float abs_squared =
++ lv_creal(*src0) * lv_creal(*src0) + lv_cimag(*src0) * lv_cimag(*src0);
++ if (abs_squared < min) {
++ min = abs_squared;
++ index = i;
++ }
++ ++src0;
++ }
++
++ *target = index;
++}
++
++#endif /*LV_HAVE_AVX2*/
++
++#ifdef LV_HAVE_AVX2
++#include <immintrin.h>
++#include <volk/volk_avx2_intrinsics.h>
++
++static inline void volk_32fc_index_min_32u_a_avx2_variant_1(uint32_t* target,
++ lv_32fc_t* src0,
++ uint32_t num_points)
++{
++ const __m256i indices_increment = _mm256_set1_epi32(8);
++ /*
++ * At the start of each loop iteration current_indices holds the indices of
++ * the complex numbers loaded from memory. Explanation for odd order is given
++ * in implementation of vector_32fc_index_min_variant0().
++ */
++ __m256i current_indices = _mm256_set_epi32(7, 6, 3, 2, 5, 4, 1, 0);
++
++ __m256 min_values = _mm256_set1_ps(FLT_MAX);
++ __m256i min_indices = _mm256_setzero_si256();
++
++ for (unsigned i = 0; i < num_points / 8u; ++i) {
++ __m256 in0 = _mm256_load_ps((float*)src0);
++ __m256 in1 = _mm256_load_ps((float*)(src0 + 4));
++ vector_32fc_index_min_variant1(
++ in0, in1, &min_values, &min_indices, &current_indices, indices_increment);
++ src0 += 8;
++ }
++
++ // determine minimum value and index in the result of the vectorized loop
++ __VOLK_ATTR_ALIGNED(32) float min_values_buffer[8];
++ __VOLK_ATTR_ALIGNED(32) uint32_t min_indices_buffer[8];
++ _mm256_store_ps(min_values_buffer, min_values);
++ _mm256_store_si256((__m256i*)min_indices_buffer, min_indices);
++
++ float min = FLT_MAX;
++ uint32_t index = 0;
++ for (unsigned i = 0; i < 8; i++) {
++ if (min_values_buffer[i] < min) {
++ min = min_values_buffer[i];
++ index = min_indices_buffer[i];
++ }
++ }
++
++ // handle tail not processed by the vectorized loop
++ for (unsigned i = num_points & (~7u); i < num_points; ++i) {
++ const float abs_squared =
++ lv_creal(*src0) * lv_creal(*src0) + lv_cimag(*src0) * lv_cimag(*src0);
++ if (abs_squared < min) {
++ min = abs_squared;
++ index = i;
++ }
++ ++src0;
++ }
++
++ *target = index;
++}
++
++#endif /*LV_HAVE_AVX2*/
++
++#ifdef LV_HAVE_SSE3
++#include <pmmintrin.h>
++#include <xmmintrin.h>
++
++static inline void
++volk_32fc_index_min_32u_a_sse3(uint32_t* target, lv_32fc_t* src0, uint32_t num_points)
++{
++ const uint32_t num_bytes = num_points * 8;
++
++ union bit128 holderf;
++ union bit128 holderi;
++ float sq_dist = 0.0;
++
++ union bit128 xmm5, xmm4;
++ __m128 xmm1, xmm2, xmm3;
++ __m128i xmm8, xmm11, xmm12, xmm9, xmm10;
++
++ xmm5.int_vec = _mm_setzero_si128();
++ xmm4.int_vec = _mm_setzero_si128();
++ holderf.int_vec = _mm_setzero_si128();
++ holderi.int_vec = _mm_setzero_si128();
++
++ int bound = num_bytes >> 5;
++ int i = 0;
++
++ xmm8 = _mm_setr_epi32(0, 1, 2, 3);
++ xmm9 = _mm_setzero_si128();
++ xmm10 = _mm_setr_epi32(4, 4, 4, 4);
++ xmm3 = _mm_set_ps1(FLT_MAX);
++
++ for (; i < bound; ++i) {
++ xmm1 = _mm_load_ps((float*)src0);
++ xmm2 = _mm_load_ps((float*)&src0[2]);
++
++ src0 += 4;
++
++ xmm1 = _mm_mul_ps(xmm1, xmm1);
++ xmm2 = _mm_mul_ps(xmm2, xmm2);
++
++ xmm1 = _mm_hadd_ps(xmm1, xmm2);
++
++ xmm3 = _mm_min_ps(xmm1, xmm3);
++
++ xmm4.float_vec = _mm_cmpgt_ps(xmm1, xmm3);
++ xmm5.float_vec = _mm_cmpeq_ps(xmm1, xmm3);
++
++ xmm11 = _mm_and_si128(xmm8, xmm5.int_vec);
++ xmm12 = _mm_and_si128(xmm9, xmm4.int_vec);
++
++ xmm9 = _mm_add_epi32(xmm11, xmm12);
++
++ xmm8 = _mm_add_epi32(xmm8, xmm10);
++ }
++
++ if (num_bytes >> 4 & 1) {
++ xmm2 = _mm_load_ps((float*)src0);
++
++ xmm1 = _mm_movelh_ps(bit128_p(&xmm8)->float_vec, bit128_p(&xmm8)->float_vec);
++ xmm8 = bit128_p(&xmm1)->int_vec;
++
++ xmm2 = _mm_mul_ps(xmm2, xmm2);
++
++ src0 += 2;
++
++ xmm1 = _mm_hadd_ps(xmm2, xmm2);
++
++ xmm3 = _mm_min_ps(xmm1, xmm3);
++
++ xmm10 = _mm_setr_epi32(2, 2, 2, 2);
++
++ xmm4.float_vec = _mm_cmpgt_ps(xmm1, xmm3);
++ xmm5.float_vec = _mm_cmpeq_ps(xmm1, xmm3);
++
++ xmm11 = _mm_and_si128(xmm8, xmm5.int_vec);
++ xmm12 = _mm_and_si128(xmm9, xmm4.int_vec);
++
++ xmm9 = _mm_add_epi32(xmm11, xmm12);
++
++ xmm8 = _mm_add_epi32(xmm8, xmm10);
++ }
++
++ if (num_bytes >> 3 & 1) {
++ sq_dist =
++ lv_creal(src0[0]) * lv_creal(src0[0]) + lv_cimag(src0[0]) * lv_cimag(src0[0]);
++
++ xmm2 = _mm_load1_ps(&sq_dist);
++
++ xmm1 = xmm3;
++
++ xmm3 = _mm_min_ss(xmm3, xmm2);
++
++ xmm4.float_vec = _mm_cmpgt_ps(xmm1, xmm3);
++ xmm5.float_vec = _mm_cmpeq_ps(xmm1, xmm3);
++
++ xmm8 = _mm_shuffle_epi32(xmm8, 0x00);
++
++ xmm11 = _mm_and_si128(xmm8, xmm4.int_vec);
++ xmm12 = _mm_and_si128(xmm9, xmm5.int_vec);
++
++ xmm9 = _mm_add_epi32(xmm11, xmm12);
++ }
++
++ _mm_store_ps((float*)&(holderf.f), xmm3);
++ _mm_store_si128(&(holderi.int_vec), xmm9);
++
++ target[0] = holderi.i[0];
++ sq_dist = holderf.f[0];
++ target[0] = (holderf.f[1] < sq_dist) ? holderi.i[1] : target[0];
++ sq_dist = (holderf.f[1] < sq_dist) ? holderf.f[1] : sq_dist;
++ target[0] = (holderf.f[2] < sq_dist) ? holderi.i[2] : target[0];
++ sq_dist = (holderf.f[2] < sq_dist) ? holderf.f[2] : sq_dist;
++ target[0] = (holderf.f[3] < sq_dist) ? holderi.i[3] : target[0];
++ sq_dist = (holderf.f[3] < sq_dist) ? holderf.f[3] : sq_dist;
++}
++
++#endif /*LV_HAVE_SSE3*/
++
++#ifdef LV_HAVE_GENERIC
++static inline void
++volk_32fc_index_min_32u_generic(uint32_t* target, lv_32fc_t* src0, uint32_t num_points)
++{
++ const uint32_t num_bytes = num_points * 8;
++
++ float sq_dist = 0.0;
++ float min = FLT_MAX;
++ uint32_t index = 0;
++
++ uint32_t i = 0;
++
++ for (; i<num_bytes>> 3; ++i) {
++ sq_dist =
++ lv_creal(src0[i]) * lv_creal(src0[i]) + lv_cimag(src0[i]) * lv_cimag(src0[i]);
++
++ if (sq_dist < min) {
++ index = i;
++ min = sq_dist;
++ }
++ }
++ target[0] = index;
++}
++
++#endif /*LV_HAVE_GENERIC*/
++
++#endif /*INCLUDED_volk_32fc_index_min_32u_a_H*/
++
++#ifndef INCLUDED_volk_32fc_index_min_32u_u_H
++#define INCLUDED_volk_32fc_index_min_32u_u_H
++
++#include <inttypes.h>
++#include <stdio.h>
++#include <volk/volk_common.h>
++#include <volk/volk_complex.h>
++
++#ifdef LV_HAVE_AVX2
++#include <immintrin.h>
++#include <volk/volk_avx2_intrinsics.h>
++
++static inline void volk_32fc_index_min_32u_u_avx2_variant_0(uint32_t* target,
++ lv_32fc_t* src0,
++ uint32_t num_points)
++{
++ const __m256i indices_increment = _mm256_set1_epi32(8);
++ /*
++ * At the start of each loop iteration current_indices holds the indices of
++ * the complex numbers loaded from memory. Explanation for odd order is given
++ * in implementation of vector_32fc_index_min_variant0().
++ */
++ __m256i current_indices = _mm256_set_epi32(7, 6, 3, 2, 5, 4, 1, 0);
++
++ __m256 min_values = _mm256_set1_ps(FLT_MAX);
++ __m256i min_indices = _mm256_setzero_si256();
++
++ for (unsigned i = 0; i < num_points / 8u; ++i) {
++ __m256 in0 = _mm256_loadu_ps((float*)src0);
++ __m256 in1 = _mm256_loadu_ps((float*)(src0 + 4));
++ vector_32fc_index_min_variant0(
++ in0, in1, &min_values, &min_indices, &current_indices, indices_increment);
++ src0 += 8;
++ }
++
++ // determine minimum value and index in the result of the vectorized loop
++ __VOLK_ATTR_ALIGNED(32) float min_values_buffer[8];
++ __VOLK_ATTR_ALIGNED(32) uint32_t min_indices_buffer[8];
++ _mm256_store_ps(min_values_buffer, min_values);
++ _mm256_store_si256((__m256i*)min_indices_buffer, min_indices);
++
++ float min = FLT_MAX;
++ uint32_t index = 0;
++ for (unsigned i = 0; i < 8; i++) {
++ if (min_values_buffer[i] < min) {
++ min = min_values_buffer[i];
++ index = min_indices_buffer[i];
++ }
++ }
++
++ // handle tail not processed by the vectorized loop
++ for (unsigned i = num_points & (~7u); i < num_points; ++i) {
++ const float abs_squared =
++ lv_creal(*src0) * lv_creal(*src0) + lv_cimag(*src0) * lv_cimag(*src0);
++ if (abs_squared < min) {
++ min = abs_squared;
++ index = i;
++ }
++ ++src0;
++ }
++
++ *target = index;
++}
++
++#endif /*LV_HAVE_AVX2*/
++
++#ifdef LV_HAVE_AVX2
++#include <immintrin.h>
++#include <volk/volk_avx2_intrinsics.h>
++
++static inline void volk_32fc_index_min_32u_u_avx2_variant_1(uint32_t* target,
++ lv_32fc_t* src0,
++ uint32_t num_points)
++{
++ const __m256i indices_increment = _mm256_set1_epi32(8);
++ /*
++ * At the start of each loop iteration current_indices holds the indices of
++ * the complex numbers loaded from memory. Explanation for odd order is given
++ * in implementation of vector_32fc_index_min_variant0().
++ */
++ __m256i current_indices = _mm256_set_epi32(7, 6, 3, 2, 5, 4, 1, 0);
++
++ __m256 min_values = _mm256_set1_ps(FLT_MAX);
++ __m256i min_indices = _mm256_setzero_si256();
++
++ for (unsigned i = 0; i < num_points / 8u; ++i) {
++ __m256 in0 = _mm256_loadu_ps((float*)src0);
++ __m256 in1 = _mm256_loadu_ps((float*)(src0 + 4));
++ vector_32fc_index_min_variant1(
++ in0, in1, &min_values, &min_indices, &current_indices, indices_increment);
++ src0 += 8;
++ }
++
++ // determine minimum value and index in the result of the vectorized loop
++ __VOLK_ATTR_ALIGNED(32) float min_values_buffer[8];
++ __VOLK_ATTR_ALIGNED(32) uint32_t min_indices_buffer[8];
++ _mm256_store_ps(min_values_buffer, min_values);
++ _mm256_store_si256((__m256i*)min_indices_buffer, min_indices);
++
++ float min = FLT_MAX;
++ uint32_t index = 0;
++ for (unsigned i = 0; i < 8; i++) {
++ if (min_values_buffer[i] < min) {
++ min = min_values_buffer[i];
++ index = min_indices_buffer[i];
++ }
++ }
++
++ // handle tail not processed by the vectorized loop
++ for (unsigned i = num_points & (~7u); i < num_points; ++i) {
++ const float abs_squared =
++ lv_creal(*src0) * lv_creal(*src0) + lv_cimag(*src0) * lv_cimag(*src0);
++ if (abs_squared < min) {
++ min = abs_squared;
++ index = i;
++ }
++ ++src0;
++ }
++
++ *target = index;
++}
++
++#endif /*LV_HAVE_AVX2*/
++
++#ifdef LV_HAVE_NEON
++#include <arm_neon.h>
++#include <volk/volk_neon_intrinsics.h>
++
++static inline void
++volk_32fc_index_min_32u_neon(uint32_t* target, lv_32fc_t* src0, uint32_t num_points)
++{
++ unsigned int number = 0;
++ const uint32_t quarter_points = num_points / 4;
++ const lv_32fc_t* src0Ptr = src0;
++
++ uint32_t indices[4] = { 0, 1, 2, 3 };
++ const uint32x4_t vec_indices_incr = vdupq_n_u32(4);
++ uint32x4_t vec_indices = vld1q_u32(indices);
++ uint32x4_t vec_min_indices = vec_indices;
++
++ if (num_points) {
++ float min = *src0Ptr;
++ uint32_t index = 0;
++
++ float32x4_t vec_min = vdupq_n_f32(*src0Ptr);
++
++ for (; number < quarter_points; number++) {
++ // Load complex and compute magnitude squared
++ const float32x4_t vec_mag2 =
++ _vmagnitudesquaredq_f32(vld2q_f32((float*)src0Ptr));
++ __VOLK_PREFETCH(src0Ptr += 4);
++ // a < b?
++ const uint32x4_t lt_mask = vcltq_f32(vec_mag2, vec_min);
++ vec_min = vbslq_f32(lt_mask, vec_mag2, vec_min);
++ vec_min_indices = vbslq_u32(lt_mask, vec_indices, vec_min_indices);
++ vec_indices = vaddq_u32(vec_indices, vec_indices_incr);
++ }
++ uint32_t tmp_min_indices[4];
++ float tmp_min[4];
++ vst1q_u32(tmp_min_indices, vec_min_indices);
++ vst1q_f32(tmp_min, vec_min);
++
++ for (int i = 0; i < 4; i++) {
++ if (tmp_min[i] < min) {
++ min = tmp_min[i];
++ index = tmp_min_indices[i];
++ }
++ }
++
++ // Deal with the rest
++ for (number = quarter_points * 4; number < num_points; number++) {
++ const float re = lv_creal(*src0Ptr);
++ const float im = lv_cimag(*src0Ptr);
++ if ((re * re + im * im) < min) {
++ min = *src0Ptr;
++ index = number;
++ }
++ src0Ptr++;
++ }
++ *target = index;
++ }
++}
++
++#endif /*LV_HAVE_NEON*/
++
++#endif /*INCLUDED_volk_32fc_index_min_32u_u_H*/
+diff --git a/lib/kernel_tests.h b/lib/kernel_tests.h
+index 6df83ab..9f947cf 100644
+--- a/lib/kernel_tests.h
++++ b/lib/kernel_tests.h
+@@ -68,6 +68,8 @@ std::vector<volk_test_case_t> init_test_list(volk_test_params_t test_params)
+ QA(VOLK_INIT_TEST(volk_32f_x2_add_32f, test_params))
+ QA(VOLK_INIT_TEST(volk_32f_index_max_16u, test_params))
+ QA(VOLK_INIT_TEST(volk_32f_index_max_32u, test_params))
++ QA(VOLK_INIT_TEST(volk_32f_index_min_16u, test_params))
++ QA(VOLK_INIT_TEST(volk_32f_index_min_32u, test_params))
+ QA(VOLK_INIT_TEST(volk_32fc_32f_multiply_32fc, test_params))
+ QA(VOLK_INIT_TEST(volk_32fc_32f_add_32fc, test_params))
+ QA(VOLK_INIT_TEST(volk_32f_log2_32f, test_params.make_absolute(1e-5)))
+@@ -94,6 +96,8 @@ std::vector<volk_test_case_t> init_test_list(volk_test_params_t test_params)
+ QA(VOLK_INIT_TEST(volk_32fc_32f_dot_prod_32fc, test_params_inacc))
+ QA(VOLK_INIT_TEST(volk_32fc_index_max_16u, test_params))
+ QA(VOLK_INIT_TEST(volk_32fc_index_max_32u, test_params))
++ QA(VOLK_INIT_TEST(volk_32fc_index_min_16u, test_params))
++ QA(VOLK_INIT_TEST(volk_32fc_index_min_32u, test_params))
+ QA(VOLK_INIT_TEST(volk_32fc_s32f_magnitude_16i, test_params))
+ QA(VOLK_INIT_TEST(volk_32fc_magnitude_32f, test_params_inacc_tenth))
+ QA(VOLK_INIT_TEST(volk_32fc_magnitude_squared_32f, test_params))
+--
+2.30.2
+
--- /dev/null
+From 6d21053e58073f82f1ec9bd83707c95b77807fce Mon Sep 17 00:00:00 2001
+From: Zlika <zlika_ese@hotmail.com>
+Date: Fri, 11 Jun 2021 11:13:28 +0200
+Subject: [PATCH 02/73] Fix volk_32fc_index_min_32u_neon
+
+Signed-off-by: Zlika <zlika_ese@hotmail.com>
+---
+ kernels/volk/volk_32fc_index_min_32u.h | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/kernels/volk/volk_32fc_index_min_32u.h b/kernels/volk/volk_32fc_index_min_32u.h
+index 290b754..31eb094 100644
+--- a/kernels/volk/volk_32fc_index_min_32u.h
++++ b/kernels/volk/volk_32fc_index_min_32u.h
+@@ -477,7 +477,7 @@ volk_32fc_index_min_32u_neon(uint32_t* target, lv_32fc_t* src0, uint32_t num_poi
+ uint32x4_t vec_min_indices = vec_indices;
+
+ if (num_points) {
+- float min = *src0Ptr;
++ float min = FLT_MAX;
+ uint32_t index = 0;
+
+ float32x4_t vec_min = vdupq_n_f32(*src0Ptr);
+--
+2.30.2
+
--- /dev/null
+From ac395a54e62429ff043ba240986f27507a54df75 Mon Sep 17 00:00:00 2001
+From: Zlika <zlika_ese@hotmail.com>
+Date: Fri, 11 Jun 2021 16:46:51 +0200
+Subject: [PATCH 03/73] Fix volk_32fc_index_min_32u_neon
+
+Signed-off-by: Zlika <zlika_ese@hotmail.com>
+---
+ kernels/volk/volk_32fc_index_min_32u.h | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/kernels/volk/volk_32fc_index_min_32u.h b/kernels/volk/volk_32fc_index_min_32u.h
+index 31eb094..545f9bf 100644
+--- a/kernels/volk/volk_32fc_index_min_32u.h
++++ b/kernels/volk/volk_32fc_index_min_32u.h
+@@ -480,7 +480,7 @@ volk_32fc_index_min_32u_neon(uint32_t* target, lv_32fc_t* src0, uint32_t num_poi
+ float min = FLT_MAX;
+ uint32_t index = 0;
+
+- float32x4_t vec_min = vdupq_n_f32(*src0Ptr);
++ float32x4_t vec_min = vdupq_n_f32(FLT_MAX);
+
+ for (; number < quarter_points; number++) {
+ // Load complex and compute magnitude squared
+--
+2.30.2
+
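PATCH 02 and PATCH 03 both replace an initial minimum derived from `*src0Ptr` with `FLT_MAX`: PATCH 02 fixes the scalar `float min`, and PATCH 03 fixes the `vdupq_n_f32` argument.

`src0Ptr` points to complex samples, so converting `*src0Ptr` to `float` keeps only the real part of the first sample. That value is not comparable with the squared magnitudes the kernel computes.

The sketch below is a hypothetical scalar search (`index_min_from` is not a VOLK function) that takes its starting minimum as a parameter, so the two choices can be compared. It uses C99 `float complex` on the assumption that this matches `lv_32fc_t` in C builds.

```c
#include <complex.h>
#include <float.h>
#include <stdint.h>

/* Hypothetical index-of-minimum-|z|^2 search whose running minimum starts
 * at `init`. Passing FLT_MAX mirrors the fixed kernel; passing the first
 * sample converted to float (its real part, per C's complex-to-real
 * conversion) mirrors the pre-fix initialisation. Illustrative only. */
static uint32_t
index_min_from(const float complex* src, uint32_t num_points, float init)
{
    float min = init;
    uint32_t index = 0;
    for (uint32_t i = 0; i < num_points; i++) {
        const float re = crealf(src[i]);
        const float im = cimagf(src[i]);
        const float mag2 = re * re + im * im;
        if (mag2 < min) {
            min = mag2;
            index = i;
        }
    }
    return index;
}
```

Take samples `{-1 + 0.5i, 3, 0.5}`, whose squared magnitudes are 1.25, 9 and 0.25:

- Starting from the stale value -1 (the real part of the first sample), no squared magnitude is ever smaller, so the search never updates and reports index 0.
- Starting from `FLT_MAX`, the search reports index 2.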
--- /dev/null
+From 7739ff89e4908a2d30013cd89c529daac9c26049 Mon Sep 17 00:00:00 2001
+From: Zlika <zlika_ese@hotmail.com>
+Date: Wed, 16 Jun 2021 15:11:25 +0200
+Subject: [PATCH 04/73] Code cleanup
+
+Signed-off-by: Zlika <zlika_ese@hotmail.com>
+---
+ kernels/volk/volk_32f_index_min_16u.h | 92 +++++++---------
+ kernels/volk/volk_32f_index_min_32u.h | 142 +++++++++++--------------
+ kernels/volk/volk_32fc_index_min_16u.h | 82 +++++++-------
+ kernels/volk/volk_32fc_index_min_32u.h | 102 +++++++++---------
+ 4 files changed, 190 insertions(+), 228 deletions(-)
+
+diff --git a/kernels/volk/volk_32f_index_min_16u.h b/kernels/volk/volk_32f_index_min_16u.h
+index 848b75c..d8ffcc7 100644
+--- a/kernels/volk/volk_32f_index_min_16u.h
++++ b/kernels/volk/volk_32f_index_min_16u.h
+@@ -36,11 +36,11 @@
+ *
+ * <b>Dispatcher Prototype</b>
+ * \code
+- * void volk_32f_index_min_16u(uint16_t* target, const float* src0, uint32_t num_points)
++ * void volk_32f_index_min_16u(uint16_t* target, const float* source, uint32_t num_points)
+ * \endcode
+ *
+ * \b Inputs
+- * \li src0: The input vector of floats.
++ * \li source: The input vector of floats.
+ * \li num_points: The number of data points.
+ *
+ * \b Outputs
+@@ -80,19 +80,17 @@
+ #include <immintrin.h>
+
+ static inline void
+-volk_32f_index_min_16u_a_avx(uint16_t* target, const float* src0, uint32_t num_points)
++volk_32f_index_min_16u_a_avx(uint16_t* target, const float* source, uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+-
+- uint32_t number = 0;
+ const uint32_t eighthPoints = num_points / 8;
+
+- float* inputPtr = (float*)src0;
++ float* inputPtr = (float*)source;
+
+ __m256 indexIncrementValues = _mm256_set1_ps(8);
+ __m256 currentIndexes = _mm256_set_ps(-1, -2, -3, -4, -5, -6, -7, -8);
+
+- float min = src0[0];
++ float min = source[0];
+ float index = 0;
+ __m256 minValues = _mm256_set1_ps(min);
+ __m256 minValuesIndex = _mm256_setzero_ps();
+@@ -102,7 +100,7 @@ volk_32f_index_min_16u_a_avx(uint16_t* target, const float* src0, uint32_t num_p
+ __VOLK_ATTR_ALIGNED(32) float minValuesBuffer[8];
+ __VOLK_ATTR_ALIGNED(32) float minIndexesBuffer[8];
+
+- for (; number < eighthPoints; number++) {
++ for (uint32_t number = 0; number < eighthPoints; number++) {
+
+ currentValues = _mm256_load_ps(inputPtr);
+ inputPtr += 8;
+@@ -118,7 +116,7 @@ volk_32f_index_min_16u_a_avx(uint16_t* target, const float* src0, uint32_t num_p
+ _mm256_store_ps(minValuesBuffer, minValues);
+ _mm256_store_ps(minIndexesBuffer, minValuesIndex);
+
+- for (number = 0; number < 8; number++) {
++ for (uint32_t number = 0; number < 8; number++) {
+ if (minValuesBuffer[number] < min) {
+ index = minIndexesBuffer[number];
+ min = minValuesBuffer[number];
+@@ -128,11 +126,10 @@ volk_32f_index_min_16u_a_avx(uint16_t* target, const float* src0, uint32_t num_p
+ }
+ }
+
+- number = eighthPoints * 8;
+- for (; number < num_points; number++) {
+- if (src0[number] < min) {
++ for (uint32_t number = eighthPoints * 8; number < num_points; number++) {
++ if (source[number] < min) {
+ index = number;
+- min = src0[number];
++ min = source[number];
+ }
+ }
+ target[0] = (uint16_t)index;
+@@ -144,19 +141,17 @@ volk_32f_index_min_16u_a_avx(uint16_t* target, const float* src0, uint32_t num_p
+ #include <smmintrin.h>
+
+ static inline void
+-volk_32f_index_min_16u_a_sse4_1(uint16_t* target, const float* src0, uint32_t num_points)
++volk_32f_index_min_16u_a_sse4_1(uint16_t* target, const float* source, uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+-
+- uint32_t number = 0;
+ const uint32_t quarterPoints = num_points / 4;
+
+- float* inputPtr = (float*)src0;
++ float* inputPtr = (float*)source;
+
+ __m128 indexIncrementValues = _mm_set1_ps(4);
+ __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
+
+- float min = src0[0];
++ float min = source[0];
+ float index = 0;
+ __m128 minValues = _mm_set1_ps(min);
+ __m128 minValuesIndex = _mm_setzero_ps();
+@@ -166,7 +161,7 @@ volk_32f_index_min_16u_a_sse4_1(uint16_t* target, const float* src0, uint32_t nu
+ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
+ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
+
+- for (; number < quarterPoints; number++) {
++ for (uint32_t number = 0; number < quarterPoints; number++) {
+
+ currentValues = _mm_load_ps(inputPtr);
+ inputPtr += 4;
+@@ -182,7 +177,7 @@ volk_32f_index_min_16u_a_sse4_1(uint16_t* target, const float* src0, uint32_t nu
+ _mm_store_ps(minValuesBuffer, minValues);
+ _mm_store_ps(minIndexesBuffer, minValuesIndex);
+
+- for (number = 0; number < 4; number++) {
++ for (uint32_t number = 0; number < 4; number++) {
+ if (minValuesBuffer[number] < min) {
+ index = minIndexesBuffer[number];
+ min = minValuesBuffer[number];
+@@ -192,11 +187,10 @@ volk_32f_index_min_16u_a_sse4_1(uint16_t* target, const float* src0, uint32_t nu
+ }
+ }
+
+- number = quarterPoints * 4;
+- for (; number < num_points; number++) {
+- if (src0[number] < min) {
++ for (uint32_t number = quarterPoints * 4; number < num_points; number++) {
++ if (source[number] < min) {
+ index = number;
+- min = src0[number];
++ min = source[number];
+ }
+ }
+ target[0] = (uint16_t)index;
+@@ -210,19 +204,17 @@ volk_32f_index_min_16u_a_sse4_1(uint16_t* target, const float* src0, uint32_t nu
+ #include <xmmintrin.h>
+
+ static inline void
+-volk_32f_index_min_16u_a_sse(uint16_t* target, const float* src0, uint32_t num_points)
++volk_32f_index_min_16u_a_sse(uint16_t* target, const float* source, uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+-
+- uint32_t number = 0;
+ const uint32_t quarterPoints = num_points / 4;
+
+- float* inputPtr = (float*)src0;
++ float* inputPtr = (float*)source;
+
+ __m128 indexIncrementValues = _mm_set1_ps(4);
+ __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
+
+- float min = src0[0];
++ float min = source[0];
+ float index = 0;
+ __m128 minValues = _mm_set1_ps(min);
+ __m128 minValuesIndex = _mm_setzero_ps();
+@@ -232,7 +224,7 @@ volk_32f_index_min_16u_a_sse(uint16_t* target, const float* src0, uint32_t num_p
+ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
+ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
+
+- for (; number < quarterPoints; number++) {
++ for (uint32_t number = 0; number < quarterPoints; number++) {
+
+ currentValues = _mm_load_ps(inputPtr);
+ inputPtr += 4;
+@@ -250,7 +242,7 @@ volk_32f_index_min_16u_a_sse(uint16_t* target, const float* src0, uint32_t num_p
+ _mm_store_ps(minValuesBuffer, minValues);
+ _mm_store_ps(minIndexesBuffer, minValuesIndex);
+
+- for (number = 0; number < 4; number++) {
++ for (uint32_t number = 0; number < 4; number++) {
+ if (minValuesBuffer[number] < min) {
+ index = minIndexesBuffer[number];
+ min = minValuesBuffer[number];
+@@ -260,11 +252,10 @@ volk_32f_index_min_16u_a_sse(uint16_t* target, const float* src0, uint32_t num_p
+ }
+ }
+
+- number = quarterPoints * 4;
+- for (; number < num_points; number++) {
+- if (src0[number] < min) {
++ for (uint32_t number = quarterPoints * 4; number < num_points; number++) {
++ if (source[number] < min) {
+ index = number;
+- min = src0[number];
++ min = source[number];
+ }
+ }
+ target[0] = (uint16_t)index;
+@@ -276,19 +267,17 @@ volk_32f_index_min_16u_a_sse(uint16_t* target, const float* src0, uint32_t num_p
+ #ifdef LV_HAVE_GENERIC
+
+ static inline void
+-volk_32f_index_min_16u_generic(uint16_t* target, const float* src0, uint32_t num_points)
++volk_32f_index_min_16u_generic(uint16_t* target, const float* source, uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+
+- float min = src0[0];
++ float min = source[0];
+ uint16_t index = 0;
+
+- uint32_t i = 1;
+-
+- for (; i < num_points; ++i) {
+- if (src0[i] < min) {
++ for (uint32_t i = 1; i < num_points; ++i) {
++ if (source[i] < min) {
+ index = i;
+- min = src0[i];
++ min = source[i];
+ }
+ }
+ target[0] = index;
+@@ -312,19 +301,17 @@ volk_32f_index_min_16u_generic(uint16_t* target, const float* src0, uint32_t num
+ #include <immintrin.h>
+
+ static inline void
+-volk_32f_index_min_16u_u_avx(uint16_t* target, const float* src0, uint32_t num_points)
++volk_32f_index_min_16u_u_avx(uint16_t* target, const float* source, uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+-
+- uint32_t number = 0;
+ const uint32_t eighthPoints = num_points / 8;
+
+- float* inputPtr = (float*)src0;
++ float* inputPtr = (float*)source;
+
+ __m256 indexIncrementValues = _mm256_set1_ps(8);
+ __m256 currentIndexes = _mm256_set_ps(-1, -2, -3, -4, -5, -6, -7, -8);
+
+- float min = src0[0];
++ float min = source[0];
+ float index = 0;
+ __m256 minValues = _mm256_set1_ps(min);
+ __m256 minValuesIndex = _mm256_setzero_ps();
+@@ -334,7 +321,7 @@ volk_32f_index_min_16u_u_avx(uint16_t* target, const float* src0, uint32_t num_p
+ __VOLK_ATTR_ALIGNED(32) float minValuesBuffer[8];
+ __VOLK_ATTR_ALIGNED(32) float minIndexesBuffer[8];
+
+- for (; number < eighthPoints; number++) {
++ for (uint32_t number = 0; number < eighthPoints; number++) {
+
+ currentValues = _mm256_loadu_ps(inputPtr);
+ inputPtr += 8;
+@@ -350,7 +337,7 @@ volk_32f_index_min_16u_u_avx(uint16_t* target, const float* src0, uint32_t num_p
+ _mm256_storeu_ps(minValuesBuffer, minValues);
+ _mm256_storeu_ps(minIndexesBuffer, minValuesIndex);
+
+- for (number = 0; number < 8; number++) {
++ for (uint32_t number = 0; number < 8; number++) {
+ if (minValuesBuffer[number] < min) {
+ index = minIndexesBuffer[number];
+ min = minValuesBuffer[number];
+@@ -360,11 +347,10 @@ volk_32f_index_min_16u_u_avx(uint16_t* target, const float* src0, uint32_t num_p
+ }
+ }
+
+- number = eighthPoints * 8;
+- for (; number < num_points; number++) {
+- if (src0[number] < min) {
++ for (uint32_t number = eighthPoints * 8; number < num_points; number++) {
++ if (source[number] < min) {
+ index = number;
+- min = src0[number];
++ min = source[number];
+ }
+ }
+ target[0] = (uint16_t)index;
+diff --git a/kernels/volk/volk_32f_index_min_32u.h b/kernels/volk/volk_32f_index_min_32u.h
+index 67ee426..23c2d17 100644
+--- a/kernels/volk/volk_32f_index_min_32u.h
++++ b/kernels/volk/volk_32f_index_min_32u.h
+@@ -30,11 +30,11 @@
+ *
+ * <b>Dispatcher Prototype</b>
+ * \code
+- * void volk_32f_index_min_32u(uint32_t* target, const float* src0, uint32_t num_points)
++ * void volk_32f_index_min_32u(uint32_t* target, const float* source, uint32_t num_points)
+ * \endcode
+ *
+ * \b Inputs
+- * \li src0: The input vector of floats.
++ * \li source: The input vector of floats.
+ * \li num_points: The number of data points.
+ *
+ * \b Outputs
+@@ -73,18 +73,17 @@
+ #include <smmintrin.h>
+
+ static inline void
+-volk_32f_index_min_32u_a_sse4_1(uint32_t* target, const float* src0, uint32_t num_points)
++volk_32f_index_min_32u_a_sse4_1(uint32_t* target, const float* source, uint32_t num_points)
+ {
+ if (num_points > 0) {
+- uint32_t number = 0;
+ const uint32_t quarterPoints = num_points / 4;
+
+- float* inputPtr = (float*)src0;
++ float* inputPtr = (float*)source;
+
+ __m128 indexIncrementValues = _mm_set1_ps(4);
+ __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
+
+- float min = src0[0];
++ float min = source[0];
+ float index = 0;
+ __m128 minValues = _mm_set1_ps(min);
+ __m128 minValuesIndex = _mm_setzero_ps();
+@@ -94,7 +93,7 @@ volk_32f_index_min_32u_a_sse4_1(uint32_t* target, const float* src0, uint32_t nu
+ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
+ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
+
+- for (; number < quarterPoints; number++) {
++ for (uint32_t number = 0; number < quarterPoints; number++) {
+
+ currentValues = _mm_load_ps(inputPtr);
+ inputPtr += 4;
+@@ -111,7 +110,7 @@ volk_32f_index_min_32u_a_sse4_1(uint32_t* target, const float* src0, uint32_t nu
+ _mm_store_ps(minValuesBuffer, minValues);
+ _mm_store_ps(minIndexesBuffer, minValuesIndex);
+
+- for (number = 0; number < 4; number++) {
++ for (uint32_t number = 0; number < 4; number++) {
+ if (minValuesBuffer[number] < min) {
+ index = minIndexesBuffer[number];
+ min = minValuesBuffer[number];
+@@ -121,11 +120,10 @@ volk_32f_index_min_32u_a_sse4_1(uint32_t* target, const float* src0, uint32_t nu
+ }
+ }
+
+- number = quarterPoints * 4;
+- for (; number < num_points; number++) {
+- if (src0[number] < min) {
++ for (uint32_t number = quarterPoints * 4; number < num_points; number++) {
++ if (source[number] < min) {
+ index = number;
+- min = src0[number];
++ min = source[number];
+ }
+ }
+ target[0] = (uint32_t)index;
+@@ -140,18 +138,17 @@ volk_32f_index_min_32u_a_sse4_1(uint32_t* target, const float* src0, uint32_t nu
+ #include <xmmintrin.h>
+
+ static inline void
+-volk_32f_index_min_32u_a_sse(uint32_t* target, const float* src0, uint32_t num_points)
++volk_32f_index_min_32u_a_sse(uint32_t* target, const float* source, uint32_t num_points)
+ {
+ if (num_points > 0) {
+- uint32_t number = 0;
+ const uint32_t quarterPoints = num_points / 4;
+
+- float* inputPtr = (float*)src0;
++ float* inputPtr = (float*)source;
+
+ __m128 indexIncrementValues = _mm_set1_ps(4);
+ __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
+
+- float min = src0[0];
++ float min = source[0];
+ float index = 0;
+ __m128 minValues = _mm_set1_ps(min);
+ __m128 minValuesIndex = _mm_setzero_ps();
+@@ -161,7 +158,7 @@ volk_32f_index_min_32u_a_sse(uint32_t* target, const float* src0, uint32_t num_p
+ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
+ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
+
+- for (; number < quarterPoints; number++) {
++ for (uint32_t number = 0; number < quarterPoints; number++) {
+
+ currentValues = _mm_load_ps(inputPtr);
+ inputPtr += 4;
+@@ -180,7 +177,7 @@ volk_32f_index_min_32u_a_sse(uint32_t* target, const float* src0, uint32_t num_p
+ _mm_store_ps(minValuesBuffer, minValues);
+ _mm_store_ps(minIndexesBuffer, minValuesIndex);
+
+- for (number = 0; number < 4; number++) {
++ for (uint32_t number = 0; number < 4; number++) {
+ if (minValuesBuffer[number] < min) {
+ index = minIndexesBuffer[number];
+ min = minValuesBuffer[number];
+@@ -190,11 +187,10 @@ volk_32f_index_min_32u_a_sse(uint32_t* target, const float* src0, uint32_t num_p
+ }
+ }
+
+- number = quarterPoints * 4;
+- for (; number < num_points; number++) {
+- if (src0[number] < min) {
++ for (uint32_t number = quarterPoints * 4; number < num_points; number++) {
++ if (source[number] < min) {
+ index = number;
+- min = src0[number];
++ min = source[number];
+ }
+ }
+ target[0] = (uint32_t)index;
+@@ -208,18 +204,17 @@ volk_32f_index_min_32u_a_sse(uint32_t* target, const float* src0, uint32_t num_p
+ #include <immintrin.h>
+
+ static inline void
+-volk_32f_index_min_32u_a_avx(uint32_t* target, const float* src0, uint32_t num_points)
++volk_32f_index_min_32u_a_avx(uint32_t* target, const float* source, uint32_t num_points)
+ {
+ if (num_points > 0) {
+- uint32_t number = 0;
+ const uint32_t quarterPoints = num_points / 8;
+
+- float* inputPtr = (float*)src0;
++ float* inputPtr = (float*)source;
+
+ __m256 indexIncrementValues = _mm256_set1_ps(8);
+ __m256 currentIndexes = _mm256_set_ps(-1, -2, -3, -4, -5, -6, -7, -8);
+
+- float min = src0[0];
++ float min = source[0];
+ float index = 0;
+ __m256 minValues = _mm256_set1_ps(min);
+ __m256 minValuesIndex = _mm256_setzero_ps();
+@@ -229,7 +224,7 @@ volk_32f_index_min_32u_a_avx(uint32_t* target, const float* src0, uint32_t num_p
+ __VOLK_ATTR_ALIGNED(32) float minValuesBuffer[8];
+ __VOLK_ATTR_ALIGNED(32) float minIndexesBuffer[8];
+
+- for (; number < quarterPoints; number++) {
++ for (uint32_t number = 0; number < quarterPoints; number++) {
+ currentValues = _mm256_load_ps(inputPtr);
+ inputPtr += 8;
+ currentIndexes = _mm256_add_ps(currentIndexes, indexIncrementValues);
+@@ -243,7 +238,7 @@ volk_32f_index_min_32u_a_avx(uint32_t* target, const float* src0, uint32_t num_p
+ _mm256_store_ps(minValuesBuffer, minValues);
+ _mm256_store_ps(minIndexesBuffer, minValuesIndex);
+
+- for (number = 0; number < 8; number++) {
++ for (uint32_t number = 0; number < 8; number++) {
+ if (minValuesBuffer[number] < min) {
+ index = minIndexesBuffer[number];
+ min = minValuesBuffer[number];
+@@ -253,11 +248,10 @@ volk_32f_index_min_32u_a_avx(uint32_t* target, const float* src0, uint32_t num_p
+ }
+ }
+
+- number = quarterPoints * 8;
+- for (; number < num_points; number++) {
+- if (src0[number] < min) {
++ for (uint32_t number = quarterPoints * 8; number < num_points; number++) {
++ if (source[number] < min) {
+ index = number;
+- min = src0[number];
++ min = source[number];
+ }
+ }
+ target[0] = (uint32_t)index;
+@@ -271,19 +265,18 @@ volk_32f_index_min_32u_a_avx(uint32_t* target, const float* src0, uint32_t num_p
+ #include <arm_neon.h>
+
+ static inline void
+-volk_32f_index_min_32u_neon(uint32_t* target, const float* src0, uint32_t num_points)
++volk_32f_index_min_32u_neon(uint32_t* target, const float* source, uint32_t num_points)
+ {
+ if (num_points > 0) {
+- uint32_t number = 0;
+ const uint32_t quarterPoints = num_points / 4;
+
+- float* inputPtr = (float*)src0;
++ float* inputPtr = (float*)source;
+ float32x4_t indexIncrementValues = vdupq_n_f32(4);
+ __VOLK_ATTR_ALIGNED(16)
+ float currentIndexes_float[4] = { -4.0f, -3.0f, -2.0f, -1.0f };
+ float32x4_t currentIndexes = vld1q_f32(currentIndexes_float);
+
+- float min = src0[0];
++ float min = source[0];
+ float index = 0;
+ float32x4_t minValues = vdupq_n_f32(min);
+ uint32x4_t minValuesIndex = vmovq_n_u32(0);
+@@ -294,7 +287,7 @@ volk_32f_index_min_32u_neon(uint32_t* target, const float* src0, uint32_t num_po
+ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
+ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
+
+- for (; number < quarterPoints; number++) {
++ for (uint32_t number = 0; number < quarterPoints; number++) {
+ currentValues = vld1q_f32(inputPtr);
+ inputPtr += 4;
+ currentIndexes = vaddq_f32(currentIndexes, indexIncrementValues);
+@@ -308,7 +301,7 @@ volk_32f_index_min_32u_neon(uint32_t* target, const float* src0, uint32_t num_po
+ // Calculate the smallest value from the remaining 4 points
+ vst1q_f32(minValuesBuffer, minValues);
+ vst1q_f32(minIndexesBuffer, vcvtq_f32_u32(minValuesIndex));
+- for (number = 0; number < 4; number++) {
++ for (uint32_t number = 0; number < 4; number++) {
+ if (minValuesBuffer[number] < min) {
+ index = minIndexesBuffer[number];
+ min = minValuesBuffer[number];
+@@ -318,11 +311,10 @@ volk_32f_index_min_32u_neon(uint32_t* target, const float* src0, uint32_t num_po
+ }
+ }
+
+- number = quarterPoints * 4;
+- for (; number < num_points; number++) {
+- if (src0[number] < min) {
++ for (uint32_t number = quarterPoints * 4; number < num_points; number++) {
++ if (source[number] < min) {
+ index = number;
+- min = src0[number];
++ min = source[number];
+ }
+ }
+ target[0] = (uint32_t)index;
+@@ -335,18 +327,16 @@ volk_32f_index_min_32u_neon(uint32_t* target, const float* src0, uint32_t num_po
+ #ifdef LV_HAVE_GENERIC
+
+ static inline void
+-volk_32f_index_min_32u_generic(uint32_t* target, const float* src0, uint32_t num_points)
++volk_32f_index_min_32u_generic(uint32_t* target, const float* source, uint32_t num_points)
+ {
+ if (num_points > 0) {
+- float min = src0[0];
++ float min = source[0];
+ uint32_t index = 0;
+
+- uint32_t i = 1;
+-
+- for (; i < num_points; ++i) {
+- if (src0[i] < min) {
++ for (uint32_t i = 1; i < num_points; ++i) {
++ if (source[i] < min) {
+ index = i;
+- min = src0[i];
++ min = source[i];
+ }
+ }
+ target[0] = index;
+@@ -371,18 +361,17 @@ volk_32f_index_min_32u_generic(uint32_t* target, const float* src0, uint32_t num
+ #include <immintrin.h>
+
+ static inline void
+-volk_32f_index_min_32u_u_avx(uint32_t* target, const float* src0, uint32_t num_points)
++volk_32f_index_min_32u_u_avx(uint32_t* target, const float* source, uint32_t num_points)
+ {
+ if (num_points > 0) {
+- uint32_t number = 0;
+ const uint32_t quarterPoints = num_points / 8;
+
+- float* inputPtr = (float*)src0;
++ float* inputPtr = (float*)source;
+
+ __m256 indexIncrementValues = _mm256_set1_ps(8);
+ __m256 currentIndexes = _mm256_set_ps(-1, -2, -3, -4, -5, -6, -7, -8);
+
+- float min = src0[0];
++ float min = source[0];
+ float index = 0;
+ __m256 minValues = _mm256_set1_ps(min);
+ __m256 minValuesIndex = _mm256_setzero_ps();
+@@ -392,7 +381,7 @@ volk_32f_index_min_32u_u_avx(uint32_t* target, const float* src0, uint32_t num_p
+ __VOLK_ATTR_ALIGNED(32) float minValuesBuffer[8];
+ __VOLK_ATTR_ALIGNED(32) float minIndexesBuffer[8];
+
+- for (; number < quarterPoints; number++) {
++ for (uint32_t number = 0; number < quarterPoints; number++) {
+ currentValues = _mm256_loadu_ps(inputPtr);
+ inputPtr += 8;
+ currentIndexes = _mm256_add_ps(currentIndexes, indexIncrementValues);
+@@ -406,7 +395,7 @@ volk_32f_index_min_32u_u_avx(uint32_t* target, const float* src0, uint32_t num_p
+ _mm256_store_ps(minValuesBuffer, minValues);
+ _mm256_store_ps(minIndexesBuffer, minValuesIndex);
+
+- for (number = 0; number < 8; number++) {
++ for (uint32_t number = 0; number < 8; number++) {
+ if (minValuesBuffer[number] < min) {
+ index = minIndexesBuffer[number];
+ min = minValuesBuffer[number];
+@@ -416,11 +405,10 @@ volk_32f_index_min_32u_u_avx(uint32_t* target, const float* src0, uint32_t num_p
+ }
+ }
+
+- number = quarterPoints * 8;
+- for (; number < num_points; number++) {
+- if (src0[number] < min) {
++ for (uint32_t number = quarterPoints * 8; number < num_points; number++) {
++ if (source[number] < min) {
+ index = number;
+- min = src0[number];
++ min = source[number];
+ }
+ }
+ target[0] = (uint32_t)index;
+@@ -434,18 +422,17 @@ volk_32f_index_min_32u_u_avx(uint32_t* target, const float* src0, uint32_t num_p
+ #include <smmintrin.h>
+
+ static inline void
+-volk_32f_index_min_32u_u_sse4_1(uint32_t* target, const float* src0, uint32_t num_points)
++volk_32f_index_min_32u_u_sse4_1(uint32_t* target, const float* source, uint32_t num_points)
+ {
+ if (num_points > 0) {
+- uint32_t number = 0;
+ const uint32_t quarterPoints = num_points / 4;
+
+- float* inputPtr = (float*)src0;
++ float* inputPtr = (float*)source;
+
+ __m128 indexIncrementValues = _mm_set1_ps(4);
+ __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
+
+- float min = src0[0];
++ float min = source[0];
+ float index = 0;
+ __m128 minValues = _mm_set1_ps(min);
+ __m128 minValuesIndex = _mm_setzero_ps();
+@@ -455,7 +442,7 @@ volk_32f_index_min_32u_u_sse4_1(uint32_t* target, const float* src0, uint32_t nu
+ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
+ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
+
+- for (; number < quarterPoints; number++) {
++ for (uint32_t number = 0; number < quarterPoints; number++) {
+ currentValues = _mm_loadu_ps(inputPtr);
+ inputPtr += 4;
+ currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues);
+@@ -469,7 +456,7 @@ volk_32f_index_min_32u_u_sse4_1(uint32_t* target, const float* src0, uint32_t nu
+ _mm_store_ps(minValuesBuffer, minValues);
+ _mm_store_ps(minIndexesBuffer, minValuesIndex);
+
+- for (number = 0; number < 4; number++) {
++ for (uint32_t number = 0; number < 4; number++) {
+ if (minValuesBuffer[number] < min) {
+ index = minIndexesBuffer[number];
+ min = minValuesBuffer[number];
+@@ -479,11 +466,10 @@ volk_32f_index_min_32u_u_sse4_1(uint32_t* target, const float* src0, uint32_t nu
+ }
+ }
+
+- number = quarterPoints * 4;
+- for (; number < num_points; number++) {
+- if (src0[number] < min) {
++ for (uint32_t number = quarterPoints * 4; number < num_points; number++) {
++ if (source[number] < min) {
+ index = number;
+- min = src0[number];
++ min = source[number];
+ }
+ }
+ target[0] = (uint32_t)index;
+@@ -496,18 +482,17 @@ volk_32f_index_min_32u_u_sse4_1(uint32_t* target, const float* src0, uint32_t nu
+ #include <xmmintrin.h>
+
+ static inline void
+-volk_32f_index_min_32u_u_sse(uint32_t* target, const float* src0, uint32_t num_points)
++volk_32f_index_min_32u_u_sse(uint32_t* target, const float* source, uint32_t num_points)
+ {
+ if (num_points > 0) {
+- uint32_t number = 0;
+ const uint32_t quarterPoints = num_points / 4;
+
+- float* inputPtr = (float*)src0;
++ float* inputPtr = (float*)source;
+
+ __m128 indexIncrementValues = _mm_set1_ps(4);
+ __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
+
+- float min = src0[0];
++ float min = source[0];
+ float index = 0;
+ __m128 minValues = _mm_set1_ps(min);
+ __m128 minValuesIndex = _mm_setzero_ps();
+@@ -517,7 +502,7 @@ volk_32f_index_min_32u_u_sse(uint32_t* target, const float* src0, uint32_t num_p
+ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
+ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
+
+- for (; number < quarterPoints; number++) {
++ for (uint32_t number = 0; number < quarterPoints; number++) {
+ currentValues = _mm_loadu_ps(inputPtr);
+ inputPtr += 4;
+ currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues);
+@@ -532,7 +517,7 @@ volk_32f_index_min_32u_u_sse(uint32_t* target, const float* src0, uint32_t num_p
+ _mm_store_ps(minValuesBuffer, minValues);
+ _mm_store_ps(minIndexesBuffer, minValuesIndex);
+
+- for (number = 0; number < 4; number++) {
++ for (uint32_t number = 0; number < 4; number++) {
+ if (minValuesBuffer[number] < min) {
+ index = minIndexesBuffer[number];
+ min = minValuesBuffer[number];
+@@ -542,11 +527,10 @@ volk_32f_index_min_32u_u_sse(uint32_t* target, const float* src0, uint32_t num_p
+ }
+ }
+
+- number = quarterPoints * 4;
+- for (; number < num_points; number++) {
+- if (src0[number] < min) {
++ for (uint32_t number = quarterPoints * 4; number < num_points; number++) {
++ if (source[number] < min) {
+ index = number;
+- min = src0[number];
++ min = source[number];
+ }
+ }
+ target[0] = (uint32_t)index;
+diff --git a/kernels/volk/volk_32fc_index_min_16u.h b/kernels/volk/volk_32fc_index_min_16u.h
+index 5539ebf..bf7f6e3 100644
+--- a/kernels/volk/volk_32fc_index_min_16u.h
++++ b/kernels/volk/volk_32fc_index_min_16u.h
+@@ -36,11 +36,11 @@
+ *
+ * <b>Dispatcher Prototype</b>
+ * \code
+- * void volk_32fc_index_min_16u(uint16_t* target, lv_32fc_t* src0, uint32_t
++ * void volk_32fc_index_min_16u(uint16_t* target, lv_32fc_t* source, uint32_t
+ * num_points) \endcode
+ *
+ * \b Inputs
+- * \li src0: The complex input vector.
++ * \li source: The complex input vector.
+ * \li num_points: The number of samples.
+ *
+ * \b Outputs
+@@ -87,7 +87,7 @@
+ #include <volk/volk_avx2_intrinsics.h>
+
+ static inline void volk_32fc_index_min_16u_a_avx2_variant_0(uint16_t* target,
+- lv_32fc_t* src0,
++ lv_32fc_t* source,
+ uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+@@ -104,11 +104,11 @@ static inline void volk_32fc_index_min_16u_a_avx2_variant_0(uint16_t* target,
+ __m256i min_indices = _mm256_setzero_si256();
+
+ for (unsigned i = 0; i < num_points / 8u; ++i) {
+- __m256 in0 = _mm256_load_ps((float*)src0);
+- __m256 in1 = _mm256_load_ps((float*)(src0 + 4));
++ __m256 in0 = _mm256_load_ps((float*)source);
++ __m256 in1 = _mm256_load_ps((float*)(source + 4));
+ vector_32fc_index_min_variant0(
+ in0, in1, &min_values, &min_indices, &current_indices, indices_increment);
+- src0 += 8;
++ source += 8;
+ }
+
+ // determine minimum value and index in the result of the vectorized loop
+@@ -129,12 +129,12 @@ static inline void volk_32fc_index_min_16u_a_avx2_variant_0(uint16_t* target,
+ // handle tail not processed by the vectorized loop
+ for (unsigned i = num_points & (~7u); i < num_points; ++i) {
+ const float abs_squared =
+- lv_creal(*src0) * lv_creal(*src0) + lv_cimag(*src0) * lv_cimag(*src0);
++ lv_creal(*source) * lv_creal(*source) + lv_cimag(*source) * lv_cimag(*source);
+ if (abs_squared < min) {
+ min = abs_squared;
+ index = i;
+ }
+- ++src0;
++ ++source;
+ }
+
+ *target = index;
+@@ -147,7 +147,7 @@ static inline void volk_32fc_index_min_16u_a_avx2_variant_0(uint16_t* target,
+ #include <volk/volk_avx2_intrinsics.h>
+
+ static inline void volk_32fc_index_min_16u_a_avx2_variant_1(uint16_t* target,
+- lv_32fc_t* src0,
++ lv_32fc_t* source,
+ uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+@@ -164,11 +164,11 @@ static inline void volk_32fc_index_min_16u_a_avx2_variant_1(uint16_t* target,
+ __m256i min_indices = _mm256_setzero_si256();
+
+ for (unsigned i = 0; i < num_points / 8u; ++i) {
+- __m256 in0 = _mm256_load_ps((float*)src0);
+- __m256 in1 = _mm256_load_ps((float*)(src0 + 4));
++ __m256 in0 = _mm256_load_ps((float*)source);
++ __m256 in1 = _mm256_load_ps((float*)(source + 4));
+ vector_32fc_index_min_variant1(
+ in0, in1, &min_values, &min_indices, &current_indices, indices_increment);
+- src0 += 8;
++ source += 8;
+ }
+
+ // determine minimum value and index in the result of the vectorized loop
+@@ -189,12 +189,12 @@ static inline void volk_32fc_index_min_16u_a_avx2_variant_1(uint16_t* target,
+ // handle tail not processed by the vectorized loop
+ for (unsigned i = num_points & (~7u); i < num_points; ++i) {
+ const float abs_squared =
+- lv_creal(*src0) * lv_creal(*src0) + lv_cimag(*src0) * lv_cimag(*src0);
++ lv_creal(*source) * lv_creal(*source) + lv_cimag(*source) * lv_cimag(*source);
+ if (abs_squared < min) {
+ min = abs_squared;
+ index = i;
+ }
+- ++src0;
++ ++source;
+ }
+
+ *target = index;
+@@ -207,7 +207,7 @@ static inline void volk_32fc_index_min_16u_a_avx2_variant_1(uint16_t* target,
+ #include <xmmintrin.h>
+
+ static inline void
+-volk_32fc_index_min_16u_a_sse3(uint16_t* target, lv_32fc_t* src0, uint32_t num_points)
++volk_32fc_index_min_16u_a_sse3(uint16_t* target, lv_32fc_t* source, uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+ const uint32_t num_bytes = num_points * 8;
+@@ -225,19 +225,18 @@ volk_32fc_index_min_16u_a_sse3(uint16_t* target, lv_32fc_t* src0, uint32_t num_p
+ holderf.int_vec = _mm_setzero_si128();
+ holderi.int_vec = _mm_setzero_si128();
+
+- int bound = num_bytes >> 5;
+- int i = 0;
+-
+ xmm8 = _mm_setr_epi32(0, 1, 2, 3);
+ xmm9 = _mm_setzero_si128();
+ xmm10 = _mm_setr_epi32(4, 4, 4, 4);
+ xmm3 = _mm_set_ps1(FLT_MAX);
+
+- for (; i < bound; ++i) {
+- xmm1 = _mm_load_ps((float*)src0);
+- xmm2 = _mm_load_ps((float*)&src0[2]);
++ int bound = num_bytes >> 5;
++
++ for (int i = 0; i < bound; ++i) {
++ xmm1 = _mm_load_ps((float*)source);
++ xmm2 = _mm_load_ps((float*)&source[2]);
+
+- src0 += 4;
++ source += 4;
+
+ xmm1 = _mm_mul_ps(xmm1, xmm1);
+ xmm2 = _mm_mul_ps(xmm2, xmm2);
+@@ -258,14 +257,14 @@ volk_32fc_index_min_16u_a_sse3(uint16_t* target, lv_32fc_t* src0, uint32_t num_p
+ }
+
+ if (num_bytes >> 4 & 1) {
+- xmm2 = _mm_load_ps((float*)src0);
++ xmm2 = _mm_load_ps((float*)source);
+
+ xmm1 = _mm_movelh_ps(bit128_p(&xmm8)->float_vec, bit128_p(&xmm8)->float_vec);
+ xmm8 = bit128_p(&xmm1)->int_vec;
+
+ xmm2 = _mm_mul_ps(xmm2, xmm2);
+
+- src0 += 2;
++ source += 2;
+
+ xmm1 = _mm_hadd_ps(xmm2, xmm2);
+
+@@ -286,7 +285,7 @@ volk_32fc_index_min_16u_a_sse3(uint16_t* target, lv_32fc_t* src0, uint32_t num_p
+
+ if (num_bytes >> 3 & 1) {
+ sq_dist =
+- lv_creal(src0[0]) * lv_creal(src0[0]) + lv_cimag(src0[0]) * lv_cimag(src0[0]);
++ lv_creal(source[0]) * lv_creal(source[0]) + lv_cimag(source[0]) * lv_cimag(source[0]);
+
+ xmm2 = _mm_load1_ps(&sq_dist);
+
+@@ -322,21 +321,18 @@ volk_32fc_index_min_16u_a_sse3(uint16_t* target, lv_32fc_t* src0, uint32_t num_p
+
+ #ifdef LV_HAVE_GENERIC
+ static inline void
+-volk_32fc_index_min_16u_generic(uint16_t* target, lv_32fc_t* src0, uint32_t num_points)
++volk_32fc_index_min_16u_generic(uint16_t* target, lv_32fc_t* source, uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+-
+ const uint32_t num_bytes = num_points * 8;
+
+ float sq_dist = 0.0;
+ float min = FLT_MAX;
+ uint16_t index = 0;
+
+- uint32_t i = 0;
+-
+- for (; i<num_bytes>> 3; ++i) {
++ for (uint32_t i = 0; i<num_bytes>> 3; ++i) {
+ sq_dist =
+- lv_creal(src0[i]) * lv_creal(src0[i]) + lv_cimag(src0[i]) * lv_cimag(src0[i]);
++ lv_creal(source[i]) * lv_creal(source[i]) + lv_cimag(source[i]) * lv_cimag(source[i]);
+
+ if (sq_dist < min) {
+ index = i;
+@@ -364,7 +360,7 @@ volk_32fc_index_min_16u_generic(uint16_t* target, lv_32fc_t* src0, uint32_t num_
+ #include <volk/volk_avx2_intrinsics.h>
+
+ static inline void volk_32fc_index_min_16u_u_avx2_variant_0(uint16_t* target,
+- lv_32fc_t* src0,
++ lv_32fc_t* source,
+ uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+@@ -381,11 +377,11 @@ static inline void volk_32fc_index_min_16u_u_avx2_variant_0(uint16_t* target,
+ __m256i min_indices = _mm256_setzero_si256();
+
+ for (unsigned i = 0; i < num_points / 8u; ++i) {
+- __m256 in0 = _mm256_loadu_ps((float*)src0);
+- __m256 in1 = _mm256_loadu_ps((float*)(src0 + 4));
++ __m256 in0 = _mm256_loadu_ps((float*)source);
++ __m256 in1 = _mm256_loadu_ps((float*)(source + 4));
+ vector_32fc_index_min_variant0(
+ in0, in1, &min_values, &min_indices, &current_indices, indices_increment);
+- src0 += 8;
++ source += 8;
+ }
+
+ // determine minimum value and index in the result of the vectorized loop
+@@ -406,12 +402,12 @@ static inline void volk_32fc_index_min_16u_u_avx2_variant_0(uint16_t* target,
+ // handle tail not processed by the vectorized loop
+ for (unsigned i = num_points & (~7u); i < num_points; ++i) {
+ const float abs_squared =
+- lv_creal(*src0) * lv_creal(*src0) + lv_cimag(*src0) * lv_cimag(*src0);
++ lv_creal(*source) * lv_creal(*source) + lv_cimag(*source) * lv_cimag(*source);
+ if (abs_squared < min) {
+ min = abs_squared;
+ index = i;
+ }
+- ++src0;
++ ++source;
+ }
+
+ *target = index;
+@@ -424,7 +420,7 @@ static inline void volk_32fc_index_min_16u_u_avx2_variant_0(uint16_t* target,
+ #include <volk/volk_avx2_intrinsics.h>
+
+ static inline void volk_32fc_index_min_16u_u_avx2_variant_1(uint16_t* target,
+- lv_32fc_t* src0,
++ lv_32fc_t* source,
+ uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+@@ -441,11 +437,11 @@ static inline void volk_32fc_index_min_16u_u_avx2_variant_1(uint16_t* target,
+ __m256i min_indices = _mm256_setzero_si256();
+
+ for (unsigned i = 0; i < num_points / 8u; ++i) {
+- __m256 in0 = _mm256_loadu_ps((float*)src0);
+- __m256 in1 = _mm256_loadu_ps((float*)(src0 + 4));
++ __m256 in0 = _mm256_loadu_ps((float*)source);
++ __m256 in1 = _mm256_loadu_ps((float*)(source + 4));
+ vector_32fc_index_min_variant1(
+ in0, in1, &min_values, &min_indices, &current_indices, indices_increment);
+- src0 += 8;
++ source += 8;
+ }
+
+ // determine minimum value and index in the result of the vectorized loop
+@@ -466,12 +462,12 @@ static inline void volk_32fc_index_min_16u_u_avx2_variant_1(uint16_t* target,
+ // handle tail not processed by the vectorized loop
+ for (unsigned i = num_points & (~7u); i < num_points; ++i) {
+ const float abs_squared =
+- lv_creal(*src0) * lv_creal(*src0) + lv_cimag(*src0) * lv_cimag(*src0);
++ lv_creal(*source) * lv_creal(*source) + lv_cimag(*source) * lv_cimag(*source);
+ if (abs_squared < min) {
+ min = abs_squared;
+ index = i;
+ }
+- ++src0;
++ ++source;
+ }
+
+ *target = index;
+diff --git a/kernels/volk/volk_32fc_index_min_32u.h b/kernels/volk/volk_32fc_index_min_32u.h
+index 545f9bf..0539dd5 100644
+--- a/kernels/volk/volk_32fc_index_min_32u.h
++++ b/kernels/volk/volk_32fc_index_min_32u.h
+@@ -30,11 +30,11 @@
+ *
+ * <b>Dispatcher Prototype</b>
+ * \code
+- * void volk_32fc_index_min_32u(uint32_t* target, lv_32fc_t* src0, uint32_t
++ * void volk_32fc_index_min_32u(uint32_t* target, lv_32fc_t* source, uint32_t
+ * num_points) \endcode
+ *
+ * \b Inputs
+- * \li src0: The complex input vector.
++ * \li source: The complex input vector.
+ * \li num_points: The number of samples.
+ *
+ * \b Outputs
+@@ -80,7 +80,7 @@
+ #include <volk/volk_avx2_intrinsics.h>
+
+ static inline void volk_32fc_index_min_32u_a_avx2_variant_0(uint32_t* target,
+- lv_32fc_t* src0,
++ lv_32fc_t* source,
+ uint32_t num_points)
+ {
+ const __m256i indices_increment = _mm256_set1_epi32(8);
+@@ -95,11 +95,11 @@ static inline void volk_32fc_index_min_32u_a_avx2_variant_0(uint32_t* target,
+ __m256i min_indices = _mm256_setzero_si256();
+
+ for (unsigned i = 0; i < num_points / 8u; ++i) {
+- __m256 in0 = _mm256_load_ps((float*)src0);
+- __m256 in1 = _mm256_load_ps((float*)(src0 + 4));
++ __m256 in0 = _mm256_load_ps((float*)source);
++ __m256 in1 = _mm256_load_ps((float*)(source + 4));
+ vector_32fc_index_min_variant0(
+ in0, in1, &min_values, &min_indices, &current_indices, indices_increment);
+- src0 += 8;
++ source += 8;
+ }
+
+ // determine minimum value and index in the result of the vectorized loop
+@@ -120,12 +120,12 @@ static inline void volk_32fc_index_min_32u_a_avx2_variant_0(uint32_t* target,
+ // handle tail not processed by the vectorized loop
+ for (unsigned i = num_points & (~7u); i < num_points; ++i) {
+ const float abs_squared =
+- lv_creal(*src0) * lv_creal(*src0) + lv_cimag(*src0) * lv_cimag(*src0);
++ lv_creal(*source) * lv_creal(*source) + lv_cimag(*source) * lv_cimag(*source);
+ if (abs_squared < min) {
+ min = abs_squared;
+ index = i;
+ }
+- ++src0;
++ ++source;
+ }
+
+ *target = index;
+@@ -138,7 +138,7 @@ static inline void volk_32fc_index_min_32u_a_avx2_variant_0(uint32_t* target,
+ #include <volk/volk_avx2_intrinsics.h>
+
+ static inline void volk_32fc_index_min_32u_a_avx2_variant_1(uint32_t* target,
+- lv_32fc_t* src0,
++ lv_32fc_t* source,
+ uint32_t num_points)
+ {
+ const __m256i indices_increment = _mm256_set1_epi32(8);
+@@ -153,11 +153,11 @@ static inline void volk_32fc_index_min_32u_a_avx2_variant_1(uint32_t* target,
+ __m256i min_indices = _mm256_setzero_si256();
+
+ for (unsigned i = 0; i < num_points / 8u; ++i) {
+- __m256 in0 = _mm256_load_ps((float*)src0);
+- __m256 in1 = _mm256_load_ps((float*)(src0 + 4));
++ __m256 in0 = _mm256_load_ps((float*)source);
++ __m256 in1 = _mm256_load_ps((float*)(source + 4));
+ vector_32fc_index_min_variant1(
+ in0, in1, &min_values, &min_indices, &current_indices, indices_increment);
+- src0 += 8;
++ source += 8;
+ }
+
+ // determine minimum value and index in the result of the vectorized loop
+@@ -178,12 +178,12 @@ static inline void volk_32fc_index_min_32u_a_avx2_variant_1(uint32_t* target,
+ // handle tail not processed by the vectorized loop
+ for (unsigned i = num_points & (~7u); i < num_points; ++i) {
+ const float abs_squared =
+- lv_creal(*src0) * lv_creal(*src0) + lv_cimag(*src0) * lv_cimag(*src0);
++ lv_creal(*source) * lv_creal(*source) + lv_cimag(*source) * lv_cimag(*source);
+ if (abs_squared < min) {
+ min = abs_squared;
+ index = i;
+ }
+- ++src0;
++ ++source;
+ }
+
+ *target = index;
+@@ -196,7 +196,7 @@ static inline void volk_32fc_index_min_32u_a_avx2_variant_1(uint32_t* target,
+ #include <xmmintrin.h>
+
+ static inline void
+-volk_32fc_index_min_32u_a_sse3(uint32_t* target, lv_32fc_t* src0, uint32_t num_points)
++volk_32fc_index_min_32u_a_sse3(uint32_t* target, lv_32fc_t* source, uint32_t num_points)
+ {
+ const uint32_t num_bytes = num_points * 8;
+
+@@ -213,19 +213,18 @@ volk_32fc_index_min_32u_a_sse3(uint32_t* target, lv_32fc_t* src0, uint32_t num_p
+ holderf.int_vec = _mm_setzero_si128();
+ holderi.int_vec = _mm_setzero_si128();
+
+- int bound = num_bytes >> 5;
+- int i = 0;
+-
+ xmm8 = _mm_setr_epi32(0, 1, 2, 3);
+ xmm9 = _mm_setzero_si128();
+ xmm10 = _mm_setr_epi32(4, 4, 4, 4);
+ xmm3 = _mm_set_ps1(FLT_MAX);
+
+- for (; i < bound; ++i) {
+- xmm1 = _mm_load_ps((float*)src0);
+- xmm2 = _mm_load_ps((float*)&src0[2]);
++ int bound = num_bytes >> 5;
+
+- src0 += 4;
++ for (int i = 0; i < bound; ++i) {
++ xmm1 = _mm_load_ps((float*)source);
++ xmm2 = _mm_load_ps((float*)&source[2]);
++
++ source += 4;
+
+ xmm1 = _mm_mul_ps(xmm1, xmm1);
+ xmm2 = _mm_mul_ps(xmm2, xmm2);
+@@ -246,14 +245,14 @@ volk_32fc_index_min_32u_a_sse3(uint32_t* target, lv_32fc_t* src0, uint32_t num_p
+ }
+
+ if (num_bytes >> 4 & 1) {
+- xmm2 = _mm_load_ps((float*)src0);
++ xmm2 = _mm_load_ps((float*)source);
+
+ xmm1 = _mm_movelh_ps(bit128_p(&xmm8)->float_vec, bit128_p(&xmm8)->float_vec);
+ xmm8 = bit128_p(&xmm1)->int_vec;
+
+ xmm2 = _mm_mul_ps(xmm2, xmm2);
+
+- src0 += 2;
++ source += 2;
+
+ xmm1 = _mm_hadd_ps(xmm2, xmm2);
+
+@@ -274,7 +273,7 @@ volk_32fc_index_min_32u_a_sse3(uint32_t* target, lv_32fc_t* src0, uint32_t num_p
+
+ if (num_bytes >> 3 & 1) {
+ sq_dist =
+- lv_creal(src0[0]) * lv_creal(src0[0]) + lv_cimag(src0[0]) * lv_cimag(src0[0]);
++ lv_creal(source[0]) * lv_creal(source[0]) + lv_cimag(source[0]) * lv_cimag(source[0]);
+
+ xmm2 = _mm_load1_ps(&sq_dist);
+
+@@ -310,7 +309,7 @@ volk_32fc_index_min_32u_a_sse3(uint32_t* target, lv_32fc_t* src0, uint32_t num_p
+
+ #ifdef LV_HAVE_GENERIC
+ static inline void
+-volk_32fc_index_min_32u_generic(uint32_t* target, lv_32fc_t* src0, uint32_t num_points)
++volk_32fc_index_min_32u_generic(uint32_t* target, lv_32fc_t* source, uint32_t num_points)
+ {
+ const uint32_t num_bytes = num_points * 8;
+
+@@ -318,11 +317,9 @@ volk_32fc_index_min_32u_generic(uint32_t* target, lv_32fc_t* src0, uint32_t num_
+ float min = FLT_MAX;
+ uint32_t index = 0;
+
+- uint32_t i = 0;
+-
+- for (; i<num_bytes>> 3; ++i) {
++ for (uint32_t i = 0; i<num_bytes>> 3; ++i) {
+ sq_dist =
+- lv_creal(src0[i]) * lv_creal(src0[i]) + lv_cimag(src0[i]) * lv_cimag(src0[i]);
++ lv_creal(source[i]) * lv_creal(source[i]) + lv_cimag(source[i]) * lv_cimag(source[i]);
+
+ if (sq_dist < min) {
+ index = i;
+@@ -349,7 +346,7 @@ volk_32fc_index_min_32u_generic(uint32_t* target, lv_32fc_t* src0, uint32_t num_
+ #include <volk/volk_avx2_intrinsics.h>
+
+ static inline void volk_32fc_index_min_32u_u_avx2_variant_0(uint32_t* target,
+- lv_32fc_t* src0,
++ lv_32fc_t* source,
+ uint32_t num_points)
+ {
+ const __m256i indices_increment = _mm256_set1_epi32(8);
+@@ -364,11 +361,11 @@ static inline void volk_32fc_index_min_32u_u_avx2_variant_0(uint32_t* target,
+ __m256i min_indices = _mm256_setzero_si256();
+
+ for (unsigned i = 0; i < num_points / 8u; ++i) {
+- __m256 in0 = _mm256_loadu_ps((float*)src0);
+- __m256 in1 = _mm256_loadu_ps((float*)(src0 + 4));
++ __m256 in0 = _mm256_loadu_ps((float*)source);
++ __m256 in1 = _mm256_loadu_ps((float*)(source + 4));
+ vector_32fc_index_min_variant0(
+ in0, in1, &min_values, &min_indices, &current_indices, indices_increment);
+- src0 += 8;
++ source += 8;
+ }
+
+ // determine minimum value and index in the result of the vectorized loop
+@@ -389,12 +386,12 @@ static inline void volk_32fc_index_min_32u_u_avx2_variant_0(uint32_t* target,
+ // handle tail not processed by the vectorized loop
+ for (unsigned i = num_points & (~7u); i < num_points; ++i) {
+ const float abs_squared =
+- lv_creal(*src0) * lv_creal(*src0) + lv_cimag(*src0) * lv_cimag(*src0);
++ lv_creal(*source) * lv_creal(*source) + lv_cimag(*source) * lv_cimag(*source);
+ if (abs_squared < min) {
+ min = abs_squared;
+ index = i;
+ }
+- ++src0;
++ ++source;
+ }
+
+ *target = index;
+@@ -407,7 +404,7 @@ static inline void volk_32fc_index_min_32u_u_avx2_variant_0(uint32_t* target,
+ #include <volk/volk_avx2_intrinsics.h>
+
+ static inline void volk_32fc_index_min_32u_u_avx2_variant_1(uint32_t* target,
+- lv_32fc_t* src0,
++ lv_32fc_t* source,
+ uint32_t num_points)
+ {
+ const __m256i indices_increment = _mm256_set1_epi32(8);
+@@ -422,11 +419,11 @@ static inline void volk_32fc_index_min_32u_u_avx2_variant_1(uint32_t* target,
+ __m256i min_indices = _mm256_setzero_si256();
+
+ for (unsigned i = 0; i < num_points / 8u; ++i) {
+- __m256 in0 = _mm256_loadu_ps((float*)src0);
+- __m256 in1 = _mm256_loadu_ps((float*)(src0 + 4));
++ __m256 in0 = _mm256_loadu_ps((float*)source);
++ __m256 in1 = _mm256_loadu_ps((float*)(source + 4));
+ vector_32fc_index_min_variant1(
+ in0, in1, &min_values, &min_indices, &current_indices, indices_increment);
+- src0 += 8;
++ source += 8;
+ }
+
+ // determine minimum value and index in the result of the vectorized loop
+@@ -447,12 +444,12 @@ static inline void volk_32fc_index_min_32u_u_avx2_variant_1(uint32_t* target,
+ // handle tail not processed by the vectorized loop
+ for (unsigned i = num_points & (~7u); i < num_points; ++i) {
+ const float abs_squared =
+- lv_creal(*src0) * lv_creal(*src0) + lv_cimag(*src0) * lv_cimag(*src0);
++ lv_creal(*source) * lv_creal(*source) + lv_cimag(*source) * lv_cimag(*source);
+ if (abs_squared < min) {
+ min = abs_squared;
+ index = i;
+ }
+- ++src0;
++ ++source;
+ }
+
+ *target = index;
+@@ -465,11 +462,10 @@ static inline void volk_32fc_index_min_32u_u_avx2_variant_1(uint32_t* target,
+ #include <volk/volk_neon_intrinsics.h>
+
+ static inline void
+-volk_32fc_index_min_32u_neon(uint32_t* target, lv_32fc_t* src0, uint32_t num_points)
++volk_32fc_index_min_32u_neon(uint32_t* target, lv_32fc_t* source, uint32_t num_points)
+ {
+- unsigned int number = 0;
+ const uint32_t quarter_points = num_points / 4;
+- const lv_32fc_t* src0Ptr = src0;
++ const lv_32fc_t* sourcePtr = source;
+
+ uint32_t indices[4] = { 0, 1, 2, 3 };
+ const uint32x4_t vec_indices_incr = vdupq_n_u32(4);
+@@ -482,11 +478,11 @@ volk_32fc_index_min_32u_neon(uint32_t* target, lv_32fc_t* src0, uint32_t num_poi
+
+ float32x4_t vec_min = vdupq_n_f32(FLT_MAX);
+
+- for (; number < quarter_points; number++) {
++ for (uint32_t number = 0; number < quarter_points; number++) {
+ // Load complex and compute magnitude squared
+ const float32x4_t vec_mag2 =
+- _vmagnitudesquaredq_f32(vld2q_f32((float*)src0Ptr));
+- __VOLK_PREFETCH(src0Ptr += 4);
++ _vmagnitudesquaredq_f32(vld2q_f32((float*)sourcePtr));
++ __VOLK_PREFETCH(sourcePtr += 4);
+ // a < b?
+ const uint32x4_t lt_mask = vcltq_f32(vec_mag2, vec_min);
+ vec_min = vbslq_f32(lt_mask, vec_mag2, vec_min);
+@@ -506,14 +502,14 @@ volk_32fc_index_min_32u_neon(uint32_t* target, lv_32fc_t* src0, uint32_t num_poi
+ }
+
+ // Deal with the rest
+- for (number = quarter_points * 4; number < num_points; number++) {
+- const float re = lv_creal(*src0Ptr);
+- const float im = lv_cimag(*src0Ptr);
++ for (uint32_t number = quarter_points * 4; number < num_points; number++) {
++ const float re = lv_creal(*sourcePtr);
++ const float im = lv_cimag(*sourcePtr);
+ if ((re * re + im * im) < min) {
+- min = *src0Ptr;
++ min = *sourcePtr;
+ index = number;
+ }
+- src0Ptr++;
++ sourcePtr++;
+ }
+ *target = index;
+ }
+--
+2.30.2
+
--- /dev/null
+From 2fb097c2f25f215bdb6a906a12aa6468d5bfc5c9 Mon Sep 17 00:00:00 2001
+From: Zlika <zlika_ese@hotmail.com>
+Date: Wed, 16 Jun 2021 15:21:46 +0200
+Subject: [PATCH 05/73] Fix clang-format errors
+
+Signed-off-by: Zlika <zlika_ese@hotmail.com>
+---
+ kernels/volk/volk_32f_index_min_16u.h | 5 +++--
+ kernels/volk/volk_32f_index_min_32u.h | 10 ++++++----
+ kernels/volk/volk_32fc_index_min_16u.h | 8 ++++----
+ kernels/volk/volk_32fc_index_min_32u.h | 8 ++++----
+ 4 files changed, 17 insertions(+), 14 deletions(-)
+
+diff --git a/kernels/volk/volk_32f_index_min_16u.h b/kernels/volk/volk_32f_index_min_16u.h
+index d8ffcc7..115835e 100644
+--- a/kernels/volk/volk_32f_index_min_16u.h
++++ b/kernels/volk/volk_32f_index_min_16u.h
+@@ -140,8 +140,9 @@ volk_32f_index_min_16u_a_avx(uint16_t* target, const float* source, uint32_t num
+ #ifdef LV_HAVE_SSE4_1
+ #include <smmintrin.h>
+
+-static inline void
+-volk_32f_index_min_16u_a_sse4_1(uint16_t* target, const float* source, uint32_t num_points)
++static inline void volk_32f_index_min_16u_a_sse4_1(uint16_t* target,
++ const float* source,
++ uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+ const uint32_t quarterPoints = num_points / 4;
+diff --git a/kernels/volk/volk_32f_index_min_32u.h b/kernels/volk/volk_32f_index_min_32u.h
+index 23c2d17..a68ba9c 100644
+--- a/kernels/volk/volk_32f_index_min_32u.h
++++ b/kernels/volk/volk_32f_index_min_32u.h
+@@ -72,8 +72,9 @@
+ #ifdef LV_HAVE_SSE4_1
+ #include <smmintrin.h>
+
+-static inline void
+-volk_32f_index_min_32u_a_sse4_1(uint32_t* target, const float* source, uint32_t num_points)
++static inline void volk_32f_index_min_32u_a_sse4_1(uint32_t* target,
++ const float* source,
++ uint32_t num_points)
+ {
+ if (num_points > 0) {
+ const uint32_t quarterPoints = num_points / 4;
+@@ -421,8 +422,9 @@ volk_32f_index_min_32u_u_avx(uint32_t* target, const float* source, uint32_t num
+ #ifdef LV_HAVE_SSE4_1
+ #include <smmintrin.h>
+
+-static inline void
+-volk_32f_index_min_32u_u_sse4_1(uint32_t* target, const float* source, uint32_t num_points)
++static inline void volk_32f_index_min_32u_u_sse4_1(uint32_t* target,
++ const float* source,
++ uint32_t num_points)
+ {
+ if (num_points > 0) {
+ const uint32_t quarterPoints = num_points / 4;
+diff --git a/kernels/volk/volk_32fc_index_min_16u.h b/kernels/volk/volk_32fc_index_min_16u.h
+index bf7f6e3..8f40730 100644
+--- a/kernels/volk/volk_32fc_index_min_16u.h
++++ b/kernels/volk/volk_32fc_index_min_16u.h
+@@ -284,8 +284,8 @@ volk_32fc_index_min_16u_a_sse3(uint16_t* target, lv_32fc_t* source, uint32_t num
+ }
+
+ if (num_bytes >> 3 & 1) {
+- sq_dist =
+- lv_creal(source[0]) * lv_creal(source[0]) + lv_cimag(source[0]) * lv_cimag(source[0]);
++ sq_dist = lv_creal(source[0]) * lv_creal(source[0]) +
++ lv_cimag(source[0]) * lv_cimag(source[0]);
+
+ xmm2 = _mm_load1_ps(&sq_dist);
+
+@@ -331,8 +331,8 @@ volk_32fc_index_min_16u_generic(uint16_t* target, lv_32fc_t* source, uint32_t nu
+ uint16_t index = 0;
+
+ for (uint32_t i = 0; i<num_bytes>> 3; ++i) {
+- sq_dist =
+- lv_creal(source[i]) * lv_creal(source[i]) + lv_cimag(source[i]) * lv_cimag(source[i]);
++ sq_dist = lv_creal(source[i]) * lv_creal(source[i]) +
++ lv_cimag(source[i]) * lv_cimag(source[i]);
+
+ if (sq_dist < min) {
+ index = i;
+diff --git a/kernels/volk/volk_32fc_index_min_32u.h b/kernels/volk/volk_32fc_index_min_32u.h
+index 0539dd5..efa33ee 100644
+--- a/kernels/volk/volk_32fc_index_min_32u.h
++++ b/kernels/volk/volk_32fc_index_min_32u.h
+@@ -272,8 +272,8 @@ volk_32fc_index_min_32u_a_sse3(uint32_t* target, lv_32fc_t* source, uint32_t num
+ }
+
+ if (num_bytes >> 3 & 1) {
+- sq_dist =
+- lv_creal(source[0]) * lv_creal(source[0]) + lv_cimag(source[0]) * lv_cimag(source[0]);
++ sq_dist = lv_creal(source[0]) * lv_creal(source[0]) +
++ lv_cimag(source[0]) * lv_cimag(source[0]);
+
+ xmm2 = _mm_load1_ps(&sq_dist);
+
+@@ -318,8 +318,8 @@ volk_32fc_index_min_32u_generic(uint32_t* target, lv_32fc_t* source, uint32_t nu
+ uint32_t index = 0;
+
+ for (uint32_t i = 0; i<num_bytes>> 3; ++i) {
+- sq_dist =
+- lv_creal(source[i]) * lv_creal(source[i]) + lv_cimag(source[i]) * lv_cimag(source[i]);
++ sq_dist = lv_creal(source[i]) * lv_creal(source[i]) +
++ lv_cimag(source[i]) * lv_cimag(source[i]);
+
+ if (sq_dist < min) {
+ index = i;
+--
+2.30.2
+
--- /dev/null
+From 78a900ad5030ce13e38994a6d2c5c74e6c80b2d2 Mon Sep 17 00:00:00 2001
+From: Magnus Lundmark <magnus@skysense.io>
+Date: Fri, 18 Jun 2021 15:16:22 +0200
+Subject: [PATCH 06/73] New generic implementation, fixed typos
+
+Signed-off-by: Magnus Lundmark <magnus@skysense.io>
+---
+ .../volk/volk_32f_stddev_and_mean_32f_x2.h | 11 ++++--
+ .../volk_32fc_x2_conjugate_dot_prod_32fc.h | 37 +++++++++++++++++--
+ 2 files changed, 41 insertions(+), 7 deletions(-)
+
+diff --git a/kernels/volk/volk_32f_stddev_and_mean_32f_x2.h b/kernels/volk/volk_32f_stddev_and_mean_32f_x2.h
+index f62e630..accb441 100644
+--- a/kernels/volk/volk_32f_stddev_and_mean_32f_x2.h
++++ b/kernels/volk/volk_32f_stddev_and_mean_32f_x2.h
+@@ -43,16 +43,19 @@
+ *
+ * \b Example
+ * Generate random numbers with c++11's normal distribution and estimate the mean and
+- * standard deviation \code int N = 1000; unsigned int alignment = volk_get_alignment();
++ * standard deviation
++ * \code
++ * int N = 1000;
++ * unsigned int alignment = volk_get_alignment();
+ * float* rand_numbers = (float*) volk_malloc(sizeof(float)*N, alignment);
+ * float* mean = (float*) volk_malloc(sizeof(float), alignment);
+ * float* stddev = (float*) volk_malloc(sizeof(float), alignment);
+ *
+- * // Use a normal generator with 0 mean, stddev 1
++ * // Use a normal generator with 0 mean, stddev 1000
+ * std::default_random_engine generator;
+- * std::normal_distribution<float> distribution(0,1000);
++ * std::normal_distribution<float> distribution(0, 1000);
+ *
+- * for(unsigned int ii = 0; ii < N; ++ii){
++ * for(unsigned int ii = 0; ii < N; ++ii) {
+ * rand_numbers[ii] = distribution(generator);
+ * }
+ *
+diff --git a/kernels/volk/volk_32fc_x2_conjugate_dot_prod_32fc.h b/kernels/volk/volk_32fc_x2_conjugate_dot_prod_32fc.h
+index 0f69499..4aeb05a 100644
+--- a/kernels/volk/volk_32fc_x2_conjugate_dot_prod_32fc.h
++++ b/kernels/volk/volk_32fc_x2_conjugate_dot_prod_32fc.h
+@@ -47,12 +47,27 @@
+ *
+ * \b Example
+ * \code
+- * int N = 10000;
++ * unsigned int N = 1000;
++ * unsigned int alignment = volk_get_alignment();
+ *
+- * <FIXME>
++ * lv_32fc_t* a = (lv_32fc_t*) volk_malloc(sizeof(lv_32fc_t) * N, alignment);
++ * lv_32fc_t* b = (lv_32fc_t*) volk_malloc(sizeof(lv_32fc_t) * N, alignment);
+ *
+- * volk_32fc_x2_conjugate_dot_prod_32fc();
++ * for (int i = 0; i < N; ++i) {
++ * a[i] = lv_cmake(.50f, .50f);
++ * b[i] = lv_cmake(.50f, .75f);
++ * }
+ *
++ * lv_32fc_t e = (float) N * a[0] * lv_conj(b[0]); // When a and b constant
++ * lv_32fc_t res;
++ *
++ * volk_32fc_x2_conjugate_dot_prod_32fc(&res, a, b, N);
++ *
+ * printf("Expected: %8.2f%+8.2fi\n", lv_creal(e), lv_cimag(e));
+ * printf("Result: %8.2f%+8.2fi\n", lv_creal(res), lv_cimag(res));
++ *
++ * volk_free(a);
++ * volk_free(b);
+ * \endcode
+ */
+
+@@ -70,6 +85,22 @@ static inline void volk_32fc_x2_conjugate_dot_prod_32fc_generic(lv_32fc_t* resul
+ const lv_32fc_t* taps,
+ unsigned int num_points)
+ {
++ lv_32fc_t res = lv_cmake(0.f, 0.f);
++ for (unsigned int i = 0; i < num_points; ++i) {
++ res += (*input++) * lv_conj((*taps++));
++ }
++ *result = res;
++}
++
++#endif /*LV_HAVE_GENERIC*/
++
++#ifdef LV_HAVE_GENERIC
++
++static inline void volk_32fc_x2_conjugate_dot_prod_32fc_block(lv_32fc_t* result,
++ const lv_32fc_t* input,
++ const lv_32fc_t* taps,
++ unsigned int num_points)
++{
+
+ const unsigned int num_bytes = num_points * 8;
+
+--
+2.30.2
+
--- /dev/null
+From 63d110c49e69d145eb0fa71c3cd8f27562553f1d Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Marcus=20M=C3=BCller?= <marcus@hostalia.de>
+Date: Thu, 1 Jul 2021 19:41:27 +0200
+Subject: [PATCH 07/73] Add the list of contributors agreeing to LGPL licensing
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+* List of contributors, with short explanation
+* added LGPLv3 license text as COPYING-LGPL
+* moved GPLv3 license text to COPYING-GPL
+* COPYING still contains the GPLv3, but with an explanation that new
+ code is LGPL
+* updated the contribution guide
+* Removed the GPL license header from Readme, added explanation.
+
+Signed-off-by: Marcus Müller <marcus@hostalia.de>
+---
+ AUTHORS_GRANTING_LGPL_LICENSE.txt | 15 +++
+ CONTRIBUTING.md | 8 +-
+ COPYING | 10 ++
+ COPYING-GPL | 1 +
+ COPYING-LGPL | 175 ++++++++++++++++++++++++++++++
+ README.md | 26 +----
+ 6 files changed, 211 insertions(+), 24 deletions(-)
+ create mode 100644 AUTHORS_GRANTING_LGPL_LICENSE.txt
+ create mode 120000 COPYING-GPL
+ create mode 100644 COPYING-LGPL
+
+diff --git a/AUTHORS_GRANTING_LGPL_LICENSE.txt b/AUTHORS_GRANTING_LGPL_LICENSE.txt
+new file mode 100644
+index 0000000..7205328
+--- /dev/null
++++ b/AUTHORS_GRANTING_LGPL_LICENSE.txt
+@@ -0,0 +1,15 @@
++VOLK is migrating from GPLv3 (GNU General Public License version 3.0)
++to LGPLv3 (GNU Lesser General Public License version 3.0).
++
++This file is a list of the authors who agreed to grant an LGPL license to the
++code they contributed to this repository. In case the affected code is currently
++licensed differently (GPLv3), this gives the right to use the current
++contributions under both LGPLv3 and that other license. Future contributions by
++these authors are, however, licensed under LGPLv3, unless explicitly stated
++otherwise.
++
++Together with the date of agreement, these authors are:
++
++| Date | Author (as used in commits) |
++|------+-----------------------------|
++| | |
+diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
+index bfbbad7..87dffde 100644
+--- a/CONTRIBUTING.md
++++ b/CONTRIBUTING.md
+@@ -21,10 +21,10 @@ code.
+
+ ## DCO Signed?
+
+-Any code contributions going into VOLK will become part of a GPL-licensed,
+-open source repository. It is therefore imperative that code submissions belong
+-to the authors, and that submitters have the authority to merge that code into
+-the public VOLK codebase.
++Any code contributions going into VOLK will become part of an LGPL-licensed
++(former contributions are GPL-licensed), open source repository. It is therefore
++imperative that code submissions belong to the authors, and that submitters have
++the authority to merge that code into the public VOLK codebase.
+
+ For that purpose, we use the [Developer's Certificate of Origin](DCO.txt). It
+ is the same document used by other projects. Signing the DCO states that there
+diff --git a/COPYING b/COPYING
+index 94a9ed0..6c1874b 100644
+--- a/COPYING
++++ b/COPYING
+@@ -1,3 +1,13 @@
++Files in this code repository are to be licensed under the LGPLv3, which you'll
++find in the file COPYING-LGPL
++
++However, you'll find some files that carry a license header that assigns them as
++GPLv3, which was the default license for VOLK up to version 3. You can find the
++full license text of the GPLv3 below.
++
++================================================================================
++
++
+ GNU GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+diff --git a/COPYING-GPL b/COPYING-GPL
+new file mode 120000
+index 0000000..d24842f
+--- /dev/null
++++ b/COPYING-GPL
+@@ -0,0 +1 @@
++COPYING
+\ No newline at end of file
+diff --git a/COPYING-LGPL b/COPYING-LGPL
+new file mode 100644
+index 0000000..21ca013
+--- /dev/null
++++ b/COPYING-LGPL
+@@ -0,0 +1,175 @@
++Files in this code repository are to be licensed under the LGPLv3, which you'll
++find below.
++
++However, you'll find some files that carry a license header that assigns them as
++GPLv3, which was the default license for VOLK up to version 3. You can find the
++full license text of the GPLv3 in the file COPYING-GPL.
++
++================================================================================
++
++
++ GNU LESSER GENERAL PUBLIC LICENSE
++ Version 3, 29 June 2007
++
++ Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
++ Everyone is permitted to copy and distribute verbatim copies
++ of this license document, but changing it is not allowed.
++
++
++ This version of the GNU Lesser General Public License incorporates
++the terms and conditions of version 3 of the GNU General Public
++License, supplemented by the additional permissions listed below.
++
++ 0. Additional Definitions.
++
++ As used herein, "this License" refers to version 3 of the GNU Lesser
++General Public License, and the "GNU GPL" refers to version 3 of the GNU
++General Public License.
++
++ "The Library" refers to a covered work governed by this License,
++other than an Application or a Combined Work as defined below.
++
++ An "Application" is any work that makes use of an interface provided
++by the Library, but which is not otherwise based on the Library.
++Defining a subclass of a class defined by the Library is deemed a mode
++of using an interface provided by the Library.
++
++ A "Combined Work" is a work produced by combining or linking an
++Application with the Library. The particular version of the Library
++with which the Combined Work was made is also called the "Linked
++Version".
++
++ The "Minimal Corresponding Source" for a Combined Work means the
++Corresponding Source for the Combined Work, excluding any source code
++for portions of the Combined Work that, considered in isolation, are
++based on the Application, and not on the Linked Version.
++
++ The "Corresponding Application Code" for a Combined Work means the
++object code and/or source code for the Application, including any data
++and utility programs needed for reproducing the Combined Work from the
++Application, but excluding the System Libraries of the Combined Work.
++
++ 1. Exception to Section 3 of the GNU GPL.
++
++ You may convey a covered work under sections 3 and 4 of this License
++without being bound by section 3 of the GNU GPL.
++
++ 2. Conveying Modified Versions.
++
++ If you modify a copy of the Library, and, in your modifications, a
++facility refers to a function or data to be supplied by an Application
++that uses the facility (other than as an argument passed when the
++facility is invoked), then you may convey a copy of the modified
++version:
++
++ a) under this License, provided that you make a good faith effort to
++ ensure that, in the event an Application does not supply the
++ function or data, the facility still operates, and performs
++ whatever part of its purpose remains meaningful, or
++
++ b) under the GNU GPL, with none of the additional permissions of
++ this License applicable to that copy.
++
++ 3. Object Code Incorporating Material from Library Header Files.
++
++ The object code form of an Application may incorporate material from
++a header file that is part of the Library. You may convey such object
++code under terms of your choice, provided that, if the incorporated
++material is not limited to numerical parameters, data structure
++layouts and accessors, or small macros, inline functions and templates
++(ten or fewer lines in length), you do both of the following:
++
++ a) Give prominent notice with each copy of the object code that the
++ Library is used in it and that the Library and its use are
++ covered by this License.
++
++ b) Accompany the object code with a copy of the GNU GPL and this license
++ document.
++
++ 4. Combined Works.
++
++ You may convey a Combined Work under terms of your choice that,
++taken together, effectively do not restrict modification of the
++portions of the Library contained in the Combined Work and reverse
++engineering for debugging such modifications, if you also do each of
++the following:
++
++ a) Give prominent notice with each copy of the Combined Work that
++ the Library is used in it and that the Library and its use are
++ covered by this License.
++
++ b) Accompany the Combined Work with a copy of the GNU GPL and this license
++ document.
++
++ c) For a Combined Work that displays copyright notices during
++ execution, include the copyright notice for the Library among
++ these notices, as well as a reference directing the user to the
++ copies of the GNU GPL and this license document.
++
++ d) Do one of the following:
++
++ 0) Convey the Minimal Corresponding Source under the terms of this
++ License, and the Corresponding Application Code in a form
++ suitable for, and under terms that permit, the user to
++ recombine or relink the Application with a modified version of
++ the Linked Version to produce a modified Combined Work, in the
++ manner specified by section 6 of the GNU GPL for conveying
++ Corresponding Source.
++
++ 1) Use a suitable shared library mechanism for linking with the
++ Library. A suitable mechanism is one that (a) uses at run time
++ a copy of the Library already present on the user's computer
++ system, and (b) will operate properly with a modified version
++ of the Library that is interface-compatible with the Linked
++ Version.
++
++ e) Provide Installation Information, but only if you would otherwise
++ be required to provide such information under section 6 of the
++ GNU GPL, and only to the extent that such information is
++ necessary to install and execute a modified version of the
++ Combined Work produced by recombining or relinking the
++ Application with a modified version of the Linked Version. (If
++ you use option 4d0, the Installation Information must accompany
++ the Minimal Corresponding Source and Corresponding Application
++ Code. If you use option 4d1, you must provide the Installation
++ Information in the manner specified by section 6 of the GNU GPL
++ for conveying Corresponding Source.)
++
++ 5. Combined Libraries.
++
++ You may place library facilities that are a work based on the
++Library side by side in a single library together with other library
++facilities that are not Applications and are not covered by this
++License, and convey such a combined library under terms of your
++choice, if you do both of the following:
++
++ a) Accompany the combined library with a copy of the same work based
++ on the Library, uncombined with any other library facilities,
++ conveyed under the terms of this License.
++
++ b) Give prominent notice with the combined library that part of it
++ is a work based on the Library, and explaining where to find the
++ accompanying uncombined form of the same work.
++
++ 6. Revised Versions of the GNU Lesser General Public License.
++
++ The Free Software Foundation may publish revised and/or new versions
++of the GNU Lesser General Public License from time to time. Such new
++versions will be similar in spirit to the present version, but may
++differ in detail to address new problems or concerns.
++
++ Each version is given a distinguishing version number. If the
++Library as you received it specifies that a certain numbered version
++of the GNU Lesser General Public License "or any later version"
++applies to it, you have the option of following the terms and
++conditions either of that published version or of any later version
++published by the Free Software Foundation. If the Library as you
++received it does not specify a version number of the GNU Lesser
++General Public License, you may choose any version of the GNU Lesser
++General Public License ever published by the Free Software Foundation.
++
++ If the Library as you received it specifies that a proxy can decide
++whether future versions of the GNU Lesser General Public License shall
++apply, that proxy's public statement of acceptance of any version is
++permanent authorization for you to choose that version for the
++Library.
+diff --git a/README.md b/README.md
+index 8152b60..013a9ae 100644
+--- a/README.md
++++ b/README.md
+@@ -98,23 +98,9 @@ We want to make sure VOLK compiles on a wide variety of compilers. Thus, we targ
+
+ ## License
+
+->
+-> Copyright 2015 Free Software Foundation, Inc.
+->
+-> This file is part of VOLK
+->
+-> VOLK is free software; you can redistribute it and/or modify
+-> it under the terms of the GNU General Public License as published by
+-> the Free Software Foundation; either version 3, or (at your option)
+-> any later version.
+->
+-> VOLK is distributed in the hope that it will be useful,
+-> but WITHOUT ANY WARRANTY; without even the implied warranty of
+-> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+-> GNU General Public License for more details.
+->
+-> You should have received a copy of the GNU General Public License
+-> along with GNU Radio; see the file COPYING. If not, write to
+-> the Free Software Foundation, Inc., 51 Franklin Street,
+-> Boston, MA 02110-1301, USA.
+->
++VOLK is moving from the GNU General Public License version 3.0 (GPLv3) to the
++GNU Lesser General Public License version 3.0 (LGPLv3). At this point in time,
++much of the code in the repository is still GPL-licensed, but new contributors
++are asked to use the LGPLv3 for their code contributions. Existing contributors
++are very kindly requested to also allow LGPL-licensing by adding their name to
++the file `AUTHORS_GRANTING_LGPL_LICENSE.txt`.
+--
+2.30.2
+
--- /dev/null
+From f8714d89a3accaab78711c276c98199f1991af72 Mon Sep 17 00:00:00 2001
+From: Zlika <zlika_ese@hotmail.com>
+Date: Mon, 5 Jul 2021 13:05:18 +0200
+Subject: [PATCH 09/73] Code cleanup
+
+Signed-off-by: Zlika <zlika_ese@hotmail.com>
+---
+ kernels/volk/volk_32f_index_min_16u.h | 6 +-
+ kernels/volk/volk_32f_index_min_32u.h | 602 ++++++++++++-------------
+ kernels/volk/volk_32fc_index_min_16u.h | 16 +-
+ kernels/volk/volk_32fc_index_min_32u.h | 18 +-
+ 4 files changed, 310 insertions(+), 332 deletions(-)
+
+diff --git a/kernels/volk/volk_32f_index_min_16u.h b/kernels/volk/volk_32f_index_min_16u.h
+index 115835e..00acd85 100644
+--- a/kernels/volk/volk_32f_index_min_16u.h
++++ b/kernels/volk/volk_32f_index_min_16u.h
+@@ -2,14 +2,14 @@
+ /*
+ * Copyright 2021 Free Software Foundation, Inc.
+ *
+- * This file is part of GNU Radio
++ * This file is part of VOLK
+ *
+- * GNU Radio is free software; you can redistribute it and/or modify
++ * VOLK is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3, or (at your option)
+ * any later version.
+ *
+- * GNU Radio is distributed in the hope that it will be useful,
++ * VOLK is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+diff --git a/kernels/volk/volk_32f_index_min_32u.h b/kernels/volk/volk_32f_index_min_32u.h
+index a68ba9c..c71ee60 100644
+--- a/kernels/volk/volk_32f_index_min_32u.h
++++ b/kernels/volk/volk_32f_index_min_32u.h
+@@ -2,14 +2,14 @@
+ /*
+ * Copyright 2021 Free Software Foundation, Inc.
+ *
+- * This file is part of GNU Radio
++ * This file is part of VOLK
+ *
+- * GNU Radio is free software; you can redistribute it and/or modify
++ * VOLK is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3, or (at your option)
+ * any later version.
+ *
+- * GNU Radio is distributed in the hope that it will be useful,
++ * VOLK is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+@@ -76,59 +76,57 @@ static inline void volk_32f_index_min_32u_a_sse4_1(uint32_t* target,
+ const float* source,
+ uint32_t num_points)
+ {
+- if (num_points > 0) {
+- const uint32_t quarterPoints = num_points / 4;
++ const uint32_t quarterPoints = num_points / 4;
+
+- float* inputPtr = (float*)source;
++ float* inputPtr = (float*)source;
+
+- __m128 indexIncrementValues = _mm_set1_ps(4);
+- __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
++ __m128 indexIncrementValues = _mm_set1_ps(4);
++ __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
+
+- float min = source[0];
+- float index = 0;
+- __m128 minValues = _mm_set1_ps(min);
+- __m128 minValuesIndex = _mm_setzero_ps();
+- __m128 compareResults;
+- __m128 currentValues;
++ float min = source[0];
++ float index = 0;
++ __m128 minValues = _mm_set1_ps(min);
++ __m128 minValuesIndex = _mm_setzero_ps();
++ __m128 compareResults;
++ __m128 currentValues;
+
+- __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
+- __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
++ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
++ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
+
+- for (uint32_t number = 0; number < quarterPoints; number++) {
++ for (uint32_t number = 0; number < quarterPoints; number++) {
+
+- currentValues = _mm_load_ps(inputPtr);
+- inputPtr += 4;
+- currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues);
++ currentValues = _mm_load_ps(inputPtr);
++ inputPtr += 4;
++ currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues);
+
+- compareResults = _mm_cmplt_ps(currentValues, minValues);
++ compareResults = _mm_cmplt_ps(currentValues, minValues);
+
+- minValuesIndex =
+- _mm_blendv_ps(minValuesIndex, currentIndexes, compareResults);
+- minValues = _mm_blendv_ps(minValues, currentValues, compareResults);
+- }
++ minValuesIndex =
++ _mm_blendv_ps(minValuesIndex, currentIndexes, compareResults);
++ minValues = _mm_blendv_ps(minValues, currentValues, compareResults);
++ }
+
+- // Calculate the smallest value from the remaining 4 points
+- _mm_store_ps(minValuesBuffer, minValues);
+- _mm_store_ps(minIndexesBuffer, minValuesIndex);
++ // Calculate the smallest value from the remaining 4 points
++ _mm_store_ps(minValuesBuffer, minValues);
++ _mm_store_ps(minIndexesBuffer, minValuesIndex);
+
+- for (uint32_t number = 0; number < 4; number++) {
+- if (minValuesBuffer[number] < min) {
++ for (uint32_t number = 0; number < 4; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValuesBuffer[number] == min) {
++ if (index > minIndexesBuffer[number])
+ index = minIndexesBuffer[number];
+- min = minValuesBuffer[number];
+- } else if (minValuesBuffer[number] == min) {
+- if (index > minIndexesBuffer[number])
+- index = minIndexesBuffer[number];
+- }
+ }
++ }
+
+- for (uint32_t number = quarterPoints * 4; number < num_points; number++) {
+- if (source[number] < min) {
+- index = number;
+- min = source[number];
+- }
++ for (uint32_t number = quarterPoints * 4; number < num_points; number++) {
++ if (source[number] < min) {
++ index = number;
++ min = source[number];
+ }
+- target[0] = (uint32_t)index;
+ }
++ target[0] = (uint32_t)index;
+ }
+
+ #endif /*LV_HAVE_SSE4_1*/
+@@ -141,61 +139,59 @@ static inline void volk_32f_index_min_32u_a_sse4_1(uint32_t* target,
+ static inline void
+ volk_32f_index_min_32u_a_sse(uint32_t* target, const float* source, uint32_t num_points)
+ {
+- if (num_points > 0) {
+- const uint32_t quarterPoints = num_points / 4;
++ const uint32_t quarterPoints = num_points / 4;
+
+- float* inputPtr = (float*)source;
++ float* inputPtr = (float*)source;
+
+- __m128 indexIncrementValues = _mm_set1_ps(4);
+- __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
++ __m128 indexIncrementValues = _mm_set1_ps(4);
++ __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
+
+- float min = source[0];
+- float index = 0;
+- __m128 minValues = _mm_set1_ps(min);
+- __m128 minValuesIndex = _mm_setzero_ps();
+- __m128 compareResults;
+- __m128 currentValues;
++ float min = source[0];
++ float index = 0;
++ __m128 minValues = _mm_set1_ps(min);
++ __m128 minValuesIndex = _mm_setzero_ps();
++ __m128 compareResults;
++ __m128 currentValues;
+
+- __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
+- __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
++ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
++ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
+
+- for (uint32_t number = 0; number < quarterPoints; number++) {
++ for (uint32_t number = 0; number < quarterPoints; number++) {
+
+- currentValues = _mm_load_ps(inputPtr);
+- inputPtr += 4;
+- currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues);
++ currentValues = _mm_load_ps(inputPtr);
++ inputPtr += 4;
++ currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues);
+
+- compareResults = _mm_cmplt_ps(currentValues, minValues);
++ compareResults = _mm_cmplt_ps(currentValues, minValues);
+
+- minValuesIndex = _mm_or_ps(_mm_and_ps(compareResults, currentIndexes),
+- _mm_andnot_ps(compareResults, minValuesIndex));
++ minValuesIndex = _mm_or_ps(_mm_and_ps(compareResults, currentIndexes),
++ _mm_andnot_ps(compareResults, minValuesIndex));
+
+- minValues = _mm_or_ps(_mm_and_ps(compareResults, currentValues),
+- _mm_andnot_ps(compareResults, minValues));
+- }
++ minValues = _mm_or_ps(_mm_and_ps(compareResults, currentValues),
++ _mm_andnot_ps(compareResults, minValues));
++ }
+
+- // Calculate the smallest value from the remaining 4 points
+- _mm_store_ps(minValuesBuffer, minValues);
+- _mm_store_ps(minIndexesBuffer, minValuesIndex);
++ // Calculate the smallest value from the remaining 4 points
++ _mm_store_ps(minValuesBuffer, minValues);
++ _mm_store_ps(minIndexesBuffer, minValuesIndex);
+
+- for (uint32_t number = 0; number < 4; number++) {
+- if (minValuesBuffer[number] < min) {
++ for (uint32_t number = 0; number < 4; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValuesBuffer[number] == min) {
++ if (index > minIndexesBuffer[number])
+ index = minIndexesBuffer[number];
+- min = minValuesBuffer[number];
+- } else if (minValuesBuffer[number] == min) {
+- if (index > minIndexesBuffer[number])
+- index = minIndexesBuffer[number];
+- }
+ }
++ }
+
+- for (uint32_t number = quarterPoints * 4; number < num_points; number++) {
+- if (source[number] < min) {
+- index = number;
+- min = source[number];
+- }
++ for (uint32_t number = quarterPoints * 4; number < num_points; number++) {
++ if (source[number] < min) {
++ index = number;
++ min = source[number];
+ }
+- target[0] = (uint32_t)index;
+ }
++ target[0] = (uint32_t)index;
+ }
+
+ #endif /*LV_HAVE_SSE*/
+@@ -207,56 +203,54 @@ volk_32f_index_min_32u_a_sse(uint32_t* target, const float* source, uint32_t num
+ static inline void
+ volk_32f_index_min_32u_a_avx(uint32_t* target, const float* source, uint32_t num_points)
+ {
+- if (num_points > 0) {
+- const uint32_t quarterPoints = num_points / 8;
+-
+- float* inputPtr = (float*)source;
+-
+- __m256 indexIncrementValues = _mm256_set1_ps(8);
+- __m256 currentIndexes = _mm256_set_ps(-1, -2, -3, -4, -5, -6, -7, -8);
+-
+- float min = source[0];
+- float index = 0;
+- __m256 minValues = _mm256_set1_ps(min);
+- __m256 minValuesIndex = _mm256_setzero_ps();
+- __m256 compareResults;
+- __m256 currentValues;
+-
+- __VOLK_ATTR_ALIGNED(32) float minValuesBuffer[8];
+- __VOLK_ATTR_ALIGNED(32) float minIndexesBuffer[8];
+-
+- for (uint32_t number = 0; number < quarterPoints; number++) {
+- currentValues = _mm256_load_ps(inputPtr);
+- inputPtr += 8;
+- currentIndexes = _mm256_add_ps(currentIndexes, indexIncrementValues);
+- compareResults = _mm256_cmp_ps(currentValues, minValues, _CMP_LT_OS);
+- minValuesIndex =
+- _mm256_blendv_ps(minValuesIndex, currentIndexes, compareResults);
+- minValues = _mm256_blendv_ps(minValues, currentValues, compareResults);
+- }
++ const uint32_t quarterPoints = num_points / 8;
++
++ float* inputPtr = (float*)source;
++
++ __m256 indexIncrementValues = _mm256_set1_ps(8);
++ __m256 currentIndexes = _mm256_set_ps(-1, -2, -3, -4, -5, -6, -7, -8);
++
++ float min = source[0];
++ float index = 0;
++ __m256 minValues = _mm256_set1_ps(min);
++ __m256 minValuesIndex = _mm256_setzero_ps();
++ __m256 compareResults;
++ __m256 currentValues;
++
++ __VOLK_ATTR_ALIGNED(32) float minValuesBuffer[8];
++ __VOLK_ATTR_ALIGNED(32) float minIndexesBuffer[8];
++
++ for (uint32_t number = 0; number < quarterPoints; number++) {
++ currentValues = _mm256_load_ps(inputPtr);
++ inputPtr += 8;
++ currentIndexes = _mm256_add_ps(currentIndexes, indexIncrementValues);
++ compareResults = _mm256_cmp_ps(currentValues, minValues, _CMP_LT_OS);
++ minValuesIndex =
++ _mm256_blendv_ps(minValuesIndex, currentIndexes, compareResults);
++ minValues = _mm256_blendv_ps(minValues, currentValues, compareResults);
++ }
+
+- // Calculate the smallest value from the remaining 8 points
+- _mm256_store_ps(minValuesBuffer, minValues);
+- _mm256_store_ps(minIndexesBuffer, minValuesIndex);
++ // Calculate the smallest value from the remaining 8 points
++ _mm256_store_ps(minValuesBuffer, minValues);
++ _mm256_store_ps(minIndexesBuffer, minValuesIndex);
+
+- for (uint32_t number = 0; number < 8; number++) {
+- if (minValuesBuffer[number] < min) {
++ for (uint32_t number = 0; number < 8; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValuesBuffer[number] == min) {
++ if (index > minIndexesBuffer[number])
+ index = minIndexesBuffer[number];
+- min = minValuesBuffer[number];
+- } else if (minValuesBuffer[number] == min) {
+- if (index > minIndexesBuffer[number])
+- index = minIndexesBuffer[number];
+- }
+ }
++ }
+
+- for (uint32_t number = quarterPoints * 8; number < num_points; number++) {
+- if (source[number] < min) {
+- index = number;
+- min = source[number];
+- }
++ for (uint32_t number = quarterPoints * 8; number < num_points; number++) {
++ if (source[number] < min) {
++ index = number;
++ min = source[number];
+ }
+- target[0] = (uint32_t)index;
+ }
++ target[0] = (uint32_t)index;
+ }
+
+ #endif /*LV_HAVE_AVX*/
+@@ -268,58 +262,56 @@ volk_32f_index_min_32u_a_avx(uint32_t* target, const float* source, uint32_t num
+ static inline void
+ volk_32f_index_min_32u_neon(uint32_t* target, const float* source, uint32_t num_points)
+ {
+- if (num_points > 0) {
+- const uint32_t quarterPoints = num_points / 4;
+-
+- float* inputPtr = (float*)source;
+- float32x4_t indexIncrementValues = vdupq_n_f32(4);
+- __VOLK_ATTR_ALIGNED(16)
+- float currentIndexes_float[4] = { -4.0f, -3.0f, -2.0f, -1.0f };
+- float32x4_t currentIndexes = vld1q_f32(currentIndexes_float);
+-
+- float min = source[0];
+- float index = 0;
+- float32x4_t minValues = vdupq_n_f32(min);
+- uint32x4_t minValuesIndex = vmovq_n_u32(0);
+- uint32x4_t compareResults;
+- uint32x4_t currentIndexes_u;
+- float32x4_t currentValues;
+-
+- __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
+- __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
+-
+- for (uint32_t number = 0; number < quarterPoints; number++) {
+- currentValues = vld1q_f32(inputPtr);
+- inputPtr += 4;
+- currentIndexes = vaddq_f32(currentIndexes, indexIncrementValues);
+- currentIndexes_u = vcvtq_u32_f32(currentIndexes);
+- compareResults = vcgeq_f32(currentValues, minValues);
+- minValuesIndex = vorrq_u32(vandq_u32(compareResults, minValuesIndex),
+- vbicq_u32(currentIndexes_u, compareResults));
+- minValues = vminq_f32(currentValues, minValues);
+- }
++ const uint32_t quarterPoints = num_points / 4;
++
++ float* inputPtr = (float*)source;
++ float32x4_t indexIncrementValues = vdupq_n_f32(4);
++ __VOLK_ATTR_ALIGNED(16)
++ float currentIndexes_float[4] = { -4.0f, -3.0f, -2.0f, -1.0f };
++ float32x4_t currentIndexes = vld1q_f32(currentIndexes_float);
++
++ float min = source[0];
++ float index = 0;
++ float32x4_t minValues = vdupq_n_f32(min);
++ uint32x4_t minValuesIndex = vmovq_n_u32(0);
++ uint32x4_t compareResults;
++ uint32x4_t currentIndexes_u;
++ float32x4_t currentValues;
++
++ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
++ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
++
++ for (uint32_t number = 0; number < quarterPoints; number++) {
++ currentValues = vld1q_f32(inputPtr);
++ inputPtr += 4;
++ currentIndexes = vaddq_f32(currentIndexes, indexIncrementValues);
++ currentIndexes_u = vcvtq_u32_f32(currentIndexes);
++ compareResults = vcgeq_f32(currentValues, minValues);
++ minValuesIndex = vorrq_u32(vandq_u32(compareResults, minValuesIndex),
++ vbicq_u32(currentIndexes_u, compareResults));
++ minValues = vminq_f32(currentValues, minValues);
++ }
+
+- // Calculate the smallest value from the remaining 4 points
+- vst1q_f32(minValuesBuffer, minValues);
+- vst1q_f32(minIndexesBuffer, vcvtq_f32_u32(minValuesIndex));
+- for (uint32_t number = 0; number < 4; number++) {
+- if (minValuesBuffer[number] < min) {
++ // Calculate the smallest value from the remaining 4 points
++ vst1q_f32(minValuesBuffer, minValues);
++ vst1q_f32(minIndexesBuffer, vcvtq_f32_u32(minValuesIndex));
++ for (uint32_t number = 0; number < 4; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValues[number] == min) {
++ if (index > minIndexesBuffer[number])
+ index = minIndexesBuffer[number];
+- min = minValuesBuffer[number];
+- } else if (minValues[number] == min) {
+- if (index > minIndexesBuffer[number])
+- index = minIndexesBuffer[number];
+- }
+ }
++ }
+
+- for (uint32_t number = quarterPoints * 4; number < num_points; number++) {
+- if (source[number] < min) {
+- index = number;
+- min = source[number];
+- }
++ for (uint32_t number = quarterPoints * 4; number < num_points; number++) {
++ if (source[number] < min) {
++ index = number;
++ min = source[number];
+ }
+- target[0] = (uint32_t)index;
+ }
++ target[0] = (uint32_t)index;
+ }
+
+ #endif /*LV_HAVE_NEON*/
+@@ -330,18 +322,16 @@ volk_32f_index_min_32u_neon(uint32_t* target, const float* source, uint32_t num_
+ static inline void
+ volk_32f_index_min_32u_generic(uint32_t* target, const float* source, uint32_t num_points)
+ {
+- if (num_points > 0) {
+- float min = source[0];
+- uint32_t index = 0;
+-
+- for (uint32_t i = 1; i < num_points; ++i) {
+- if (source[i] < min) {
+- index = i;
+- min = source[i];
+- }
++ float min = source[0];
++ uint32_t index = 0;
++
++ for (uint32_t i = 1; i < num_points; ++i) {
++ if (source[i] < min) {
++ index = i;
++ min = source[i];
+ }
+- target[0] = index;
+ }
++ target[0] = index;
+ }
+
+ #endif /*LV_HAVE_GENERIC*/
+@@ -364,56 +354,54 @@ volk_32f_index_min_32u_generic(uint32_t* target, const float* source, uint32_t n
+ static inline void
+ volk_32f_index_min_32u_u_avx(uint32_t* target, const float* source, uint32_t num_points)
+ {
+- if (num_points > 0) {
+- const uint32_t quarterPoints = num_points / 8;
+-
+- float* inputPtr = (float*)source;
+-
+- __m256 indexIncrementValues = _mm256_set1_ps(8);
+- __m256 currentIndexes = _mm256_set_ps(-1, -2, -3, -4, -5, -6, -7, -8);
+-
+- float min = source[0];
+- float index = 0;
+- __m256 minValues = _mm256_set1_ps(min);
+- __m256 minValuesIndex = _mm256_setzero_ps();
+- __m256 compareResults;
+- __m256 currentValues;
+-
+- __VOLK_ATTR_ALIGNED(32) float minValuesBuffer[8];
+- __VOLK_ATTR_ALIGNED(32) float minIndexesBuffer[8];
+-
+- for (uint32_t number = 0; number < quarterPoints; number++) {
+- currentValues = _mm256_loadu_ps(inputPtr);
+- inputPtr += 8;
+- currentIndexes = _mm256_add_ps(currentIndexes, indexIncrementValues);
+- compareResults = _mm256_cmp_ps(currentValues, minValues, _CMP_LT_OS);
+- minValuesIndex =
+- _mm256_blendv_ps(minValuesIndex, currentIndexes, compareResults);
+- minValues = _mm256_blendv_ps(minValues, currentValues, compareResults);
+- }
++ const uint32_t quarterPoints = num_points / 8;
++
++ float* inputPtr = (float*)source;
++
++ __m256 indexIncrementValues = _mm256_set1_ps(8);
++ __m256 currentIndexes = _mm256_set_ps(-1, -2, -3, -4, -5, -6, -7, -8);
++
++ float min = source[0];
++ float index = 0;
++ __m256 minValues = _mm256_set1_ps(min);
++ __m256 minValuesIndex = _mm256_setzero_ps();
++ __m256 compareResults;
++ __m256 currentValues;
++
++ __VOLK_ATTR_ALIGNED(32) float minValuesBuffer[8];
++ __VOLK_ATTR_ALIGNED(32) float minIndexesBuffer[8];
++
++ for (uint32_t number = 0; number < quarterPoints; number++) {
++ currentValues = _mm256_loadu_ps(inputPtr);
++ inputPtr += 8;
++ currentIndexes = _mm256_add_ps(currentIndexes, indexIncrementValues);
++ compareResults = _mm256_cmp_ps(currentValues, minValues, _CMP_LT_OS);
++ minValuesIndex =
++ _mm256_blendv_ps(minValuesIndex, currentIndexes, compareResults);
++ minValues = _mm256_blendv_ps(minValues, currentValues, compareResults);
++ }
+
+- // Calculate the smalles value from the remaining 8 points
+- _mm256_store_ps(minValuesBuffer, minValues);
+- _mm256_store_ps(minIndexesBuffer, minValuesIndex);
++ // Calculate the smallest value from the remaining 8 points
++ _mm256_store_ps(minValuesBuffer, minValues);
++ _mm256_store_ps(minIndexesBuffer, minValuesIndex);
+
+- for (uint32_t number = 0; number < 8; number++) {
+- if (minValuesBuffer[number] < min) {
++ for (uint32_t number = 0; number < 8; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValuesBuffer[number] == min) {
++ if (index > minIndexesBuffer[number])
+ index = minIndexesBuffer[number];
+- min = minValuesBuffer[number];
+- } else if (minValuesBuffer[number] == min) {
+- if (index > minIndexesBuffer[number])
+- index = minIndexesBuffer[number];
+- }
+ }
++ }
+
+- for (uint32_t number = quarterPoints * 8; number < num_points; number++) {
+- if (source[number] < min) {
+- index = number;
+- min = source[number];
+- }
++ for (uint32_t number = quarterPoints * 8; number < num_points; number++) {
++ if (source[number] < min) {
++ index = number;
++ min = source[number];
+ }
+- target[0] = (uint32_t)index;
+ }
++ target[0] = (uint32_t)index;
+ }
+
+ #endif /*LV_HAVE_AVX*/
+@@ -426,56 +414,54 @@ static inline void volk_32f_index_min_32u_u_sse4_1(uint32_t* target,
+ const float* source,
+ uint32_t num_points)
+ {
+- if (num_points > 0) {
+- const uint32_t quarterPoints = num_points / 4;
+-
+- float* inputPtr = (float*)source;
+-
+- __m128 indexIncrementValues = _mm_set1_ps(4);
+- __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
+-
+- float min = source[0];
+- float index = 0;
+- __m128 minValues = _mm_set1_ps(min);
+- __m128 minValuesIndex = _mm_setzero_ps();
+- __m128 compareResults;
+- __m128 currentValues;
+-
+- __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
+- __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
+-
+- for (uint32_t number = 0; number < quarterPoints; number++) {
+- currentValues = _mm_loadu_ps(inputPtr);
+- inputPtr += 4;
+- currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues);
+- compareResults = _mm_cmplt_ps(currentValues, minValues);
+- minValuesIndex =
+- _mm_blendv_ps(minValuesIndex, currentIndexes, compareResults);
+- minValues = _mm_blendv_ps(minValues, currentValues, compareResults);
+- }
++ const uint32_t quarterPoints = num_points / 4;
++
++ float* inputPtr = (float*)source;
++
++ __m128 indexIncrementValues = _mm_set1_ps(4);
++ __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
++
++ float min = source[0];
++ float index = 0;
++ __m128 minValues = _mm_set1_ps(min);
++ __m128 minValuesIndex = _mm_setzero_ps();
++ __m128 compareResults;
++ __m128 currentValues;
++
++ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
++ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
++
++ for (uint32_t number = 0; number < quarterPoints; number++) {
++ currentValues = _mm_loadu_ps(inputPtr);
++ inputPtr += 4;
++ currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues);
++ compareResults = _mm_cmplt_ps(currentValues, minValues);
++ minValuesIndex =
++ _mm_blendv_ps(minValuesIndex, currentIndexes, compareResults);
++ minValues = _mm_blendv_ps(minValues, currentValues, compareResults);
++ }
+
+- // Calculate the smallest value from the remaining 4 points
+- _mm_store_ps(minValuesBuffer, minValues);
+- _mm_store_ps(minIndexesBuffer, minValuesIndex);
++ // Calculate the smallest value from the remaining 4 points
++ _mm_store_ps(minValuesBuffer, minValues);
++ _mm_store_ps(minIndexesBuffer, minValuesIndex);
+
+- for (uint32_t number = 0; number < 4; number++) {
+- if (minValuesBuffer[number] < min) {
++ for (uint32_t number = 0; number < 4; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValuesBuffer[number] == min) {
++ if (index > minIndexesBuffer[number])
+ index = minIndexesBuffer[number];
+- min = minValuesBuffer[number];
+- } else if (minValuesBuffer[number] == min) {
+- if (index > minIndexesBuffer[number])
+- index = minIndexesBuffer[number];
+- }
+ }
++ }
+
+- for (uint32_t number = quarterPoints * 4; number < num_points; number++) {
+- if (source[number] < min) {
+- index = number;
+- min = source[number];
+- }
++ for (uint32_t number = quarterPoints * 4; number < num_points; number++) {
++ if (source[number] < min) {
++ index = number;
++ min = source[number];
+ }
+- target[0] = (uint32_t)index;
+ }
++ target[0] = (uint32_t)index;
+ }
+
+ #endif /*LV_HAVE_SSE4_1*/
+@@ -486,57 +472,55 @@ static inline void volk_32f_index_min_32u_u_sse4_1(uint32_t* target,
+ static inline void
+ volk_32f_index_min_32u_u_sse(uint32_t* target, const float* source, uint32_t num_points)
+ {
+- if (num_points > 0) {
+- const uint32_t quarterPoints = num_points / 4;
+-
+- float* inputPtr = (float*)source;
+-
+- __m128 indexIncrementValues = _mm_set1_ps(4);
+- __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
+-
+- float min = source[0];
+- float index = 0;
+- __m128 minValues = _mm_set1_ps(min);
+- __m128 minValuesIndex = _mm_setzero_ps();
+- __m128 compareResults;
+- __m128 currentValues;
+-
+- __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
+- __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
+-
+- for (uint32_t number = 0; number < quarterPoints; number++) {
+- currentValues = _mm_loadu_ps(inputPtr);
+- inputPtr += 4;
+- currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues);
+- compareResults = _mm_cmplt_ps(currentValues, minValues);
+- minValuesIndex = _mm_or_ps(_mm_and_ps(compareResults, currentIndexes),
+- _mm_andnot_ps(compareResults, minValuesIndex));
+- minValues = _mm_or_ps(_mm_and_ps(compareResults, currentValues),
+- _mm_andnot_ps(compareResults, minValues));
+- }
++ const uint32_t quarterPoints = num_points / 4;
++
++ float* inputPtr = (float*)source;
++
++ __m128 indexIncrementValues = _mm_set1_ps(4);
++ __m128 currentIndexes = _mm_set_ps(-1, -2, -3, -4);
++
++ float min = source[0];
++ float index = 0;
++ __m128 minValues = _mm_set1_ps(min);
++ __m128 minValuesIndex = _mm_setzero_ps();
++ __m128 compareResults;
++ __m128 currentValues;
++
++ __VOLK_ATTR_ALIGNED(16) float minValuesBuffer[4];
++ __VOLK_ATTR_ALIGNED(16) float minIndexesBuffer[4];
++
++ for (uint32_t number = 0; number < quarterPoints; number++) {
++ currentValues = _mm_loadu_ps(inputPtr);
++ inputPtr += 4;
++ currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues);
++ compareResults = _mm_cmplt_ps(currentValues, minValues);
++ minValuesIndex = _mm_or_ps(_mm_and_ps(compareResults, currentIndexes),
++ _mm_andnot_ps(compareResults, minValuesIndex));
++ minValues = _mm_or_ps(_mm_and_ps(compareResults, currentValues),
++ _mm_andnot_ps(compareResults, minValues));
++ }
+
+- // Calculate the smallest value from the remaining 4 points
+- _mm_store_ps(minValuesBuffer, minValues);
+- _mm_store_ps(minIndexesBuffer, minValuesIndex);
++ // Calculate the smallest value from the remaining 4 points
++ _mm_store_ps(minValuesBuffer, minValues);
++ _mm_store_ps(minIndexesBuffer, minValuesIndex);
+
+- for (uint32_t number = 0; number < 4; number++) {
+- if (minValuesBuffer[number] < min) {
++ for (uint32_t number = 0; number < 4; number++) {
++ if (minValuesBuffer[number] < min) {
++ index = minIndexesBuffer[number];
++ min = minValuesBuffer[number];
++ } else if (minValuesBuffer[number] == min) {
++ if (index > minIndexesBuffer[number])
+ index = minIndexesBuffer[number];
+- min = minValuesBuffer[number];
+- } else if (minValuesBuffer[number] == min) {
+- if (index > minIndexesBuffer[number])
+- index = minIndexesBuffer[number];
+- }
+ }
++ }
+
+- for (uint32_t number = quarterPoints * 4; number < num_points; number++) {
+- if (source[number] < min) {
+- index = number;
+- min = source[number];
+- }
++ for (uint32_t number = quarterPoints * 4; number < num_points; number++) {
++ if (source[number] < min) {
++ index = number;
++ min = source[number];
+ }
+- target[0] = (uint32_t)index;
+ }
++ target[0] = (uint32_t)index;
+ }
+
+ #endif /*LV_HAVE_SSE*/
+diff --git a/kernels/volk/volk_32fc_index_min_16u.h b/kernels/volk/volk_32fc_index_min_16u.h
+index 8f40730..6ddd8a3 100644
+--- a/kernels/volk/volk_32fc_index_min_16u.h
++++ b/kernels/volk/volk_32fc_index_min_16u.h
+@@ -2,14 +2,14 @@
+ /*
+ * Copyright 2021 Free Software Foundation, Inc.
+ *
+- * This file is part of GNU Radio
++ * This file is part of VOLK
+ *
+- * GNU Radio is free software; you can redistribute it and/or modify
++ * VOLK is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3, or (at your option)
+ * any later version.
+ *
+- * GNU Radio is distributed in the hope that it will be useful,
++ * VOLK is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+@@ -210,7 +210,6 @@ static inline void
+ volk_32fc_index_min_16u_a_sse3(uint16_t* target, lv_32fc_t* source, uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+- const uint32_t num_bytes = num_points * 8;
+
+ union bit128 holderf;
+ union bit128 holderi;
+@@ -230,7 +229,7 @@ volk_32fc_index_min_16u_a_sse3(uint16_t* target, lv_32fc_t* source, uint32_t num
+ xmm10 = _mm_setr_epi32(4, 4, 4, 4);
+ xmm3 = _mm_set_ps1(FLT_MAX);
+
+- int bound = num_bytes >> 5;
++ int bound = num_points >> 2;
+
+ for (int i = 0; i < bound; ++i) {
+ xmm1 = _mm_load_ps((float*)source);
+@@ -256,7 +255,7 @@ volk_32fc_index_min_16u_a_sse3(uint16_t* target, lv_32fc_t* source, uint32_t num
+ xmm8 = _mm_add_epi32(xmm8, xmm10);
+ }
+
+- if (num_bytes >> 4 & 1) {
++ if (num_points >> 1 & 1) {
+ xmm2 = _mm_load_ps((float*)source);
+
+ xmm1 = _mm_movelh_ps(bit128_p(&xmm8)->float_vec, bit128_p(&xmm8)->float_vec);
+@@ -283,7 +282,7 @@ volk_32fc_index_min_16u_a_sse3(uint16_t* target, lv_32fc_t* source, uint32_t num
+ xmm8 = _mm_add_epi32(xmm8, xmm10);
+ }
+
+- if (num_bytes >> 3 & 1) {
++ if (num_points & 1) {
+ sq_dist = lv_creal(source[0]) * lv_creal(source[0]) +
+ lv_cimag(source[0]) * lv_cimag(source[0]);
+
+@@ -324,13 +323,12 @@ static inline void
+ volk_32fc_index_min_16u_generic(uint16_t* target, lv_32fc_t* source, uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+- const uint32_t num_bytes = num_points * 8;
+
+ float sq_dist = 0.0;
+ float min = FLT_MAX;
+ uint16_t index = 0;
+
+- for (uint32_t i = 0; i<num_bytes>> 3; ++i) {
++ for (uint32_t i = 0; i < num_points; ++i) {
+ sq_dist = lv_creal(source[i]) * lv_creal(source[i]) +
+ lv_cimag(source[i]) * lv_cimag(source[i]);
+
+diff --git a/kernels/volk/volk_32fc_index_min_32u.h b/kernels/volk/volk_32fc_index_min_32u.h
+index efa33ee..d5e2a00 100644
+--- a/kernels/volk/volk_32fc_index_min_32u.h
++++ b/kernels/volk/volk_32fc_index_min_32u.h
+@@ -2,14 +2,14 @@
+ /*
+ * Copyright 2021 Free Software Foundation, Inc.
+ *
+- * This file is part of GNU Radio
++ * This file is part of VOLK
+ *
+- * GNU Radio is free software; you can redistribute it and/or modify
++ * VOLK is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3, or (at your option)
+ * any later version.
+ *
+- * GNU Radio is distributed in the hope that it will be useful,
++ * VOLK is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+@@ -198,8 +198,6 @@ static inline void volk_32fc_index_min_32u_a_avx2_variant_1(uint32_t* target,
+ static inline void
+ volk_32fc_index_min_32u_a_sse3(uint32_t* target, lv_32fc_t* source, uint32_t num_points)
+ {
+- const uint32_t num_bytes = num_points * 8;
+-
+ union bit128 holderf;
+ union bit128 holderi;
+ float sq_dist = 0.0;
+@@ -218,7 +216,7 @@ volk_32fc_index_min_32u_a_sse3(uint32_t* target, lv_32fc_t* source, uint32_t num
+ xmm10 = _mm_setr_epi32(4, 4, 4, 4);
+ xmm3 = _mm_set_ps1(FLT_MAX);
+
+- int bound = num_bytes >> 5;
++ int bound = num_points >> 2;
+
+ for (int i = 0; i < bound; ++i) {
+ xmm1 = _mm_load_ps((float*)source);
+@@ -244,7 +242,7 @@ volk_32fc_index_min_32u_a_sse3(uint32_t* target, lv_32fc_t* source, uint32_t num
+ xmm8 = _mm_add_epi32(xmm8, xmm10);
+ }
+
+- if (num_bytes >> 4 & 1) {
++ if (num_points >> 1 & 1) {
+ xmm2 = _mm_load_ps((float*)source);
+
+ xmm1 = _mm_movelh_ps(bit128_p(&xmm8)->float_vec, bit128_p(&xmm8)->float_vec);
+@@ -271,7 +269,7 @@ volk_32fc_index_min_32u_a_sse3(uint32_t* target, lv_32fc_t* source, uint32_t num
+ xmm8 = _mm_add_epi32(xmm8, xmm10);
+ }
+
+- if (num_bytes >> 3 & 1) {
++ if (num_points & 1) {
+ sq_dist = lv_creal(source[0]) * lv_creal(source[0]) +
+ lv_cimag(source[0]) * lv_cimag(source[0]);
+
+@@ -311,13 +309,11 @@ volk_32fc_index_min_32u_a_sse3(uint32_t* target, lv_32fc_t* source, uint32_t num
+ static inline void
+ volk_32fc_index_min_32u_generic(uint32_t* target, lv_32fc_t* source, uint32_t num_points)
+ {
+- const uint32_t num_bytes = num_points * 8;
+-
+ float sq_dist = 0.0;
+ float min = FLT_MAX;
+ uint32_t index = 0;
+
+- for (uint32_t i = 0; i<num_bytes>> 3; ++i) {
++ for (uint32_t i = 0; i < num_points; ++i) {
+ sq_dist = lv_creal(source[i]) * lv_creal(source[i]) +
+ lv_cimag(source[i]) * lv_cimag(source[i]);
+
+--
+2.30.2
+
--- /dev/null
+From c68e666420a840cbdeb9529f23af19b6c8e37391 Mon Sep 17 00:00:00 2001
+From: Zlika <zlika_ese@hotmail.com>
+Date: Mon, 5 Jul 2021 13:08:29 +0200
+Subject: [PATCH 10/73] Fix clang-format errors
+
+Signed-off-by: Zlika <zlika_ese@hotmail.com>
+---
+ kernels/volk/volk_32f_index_min_32u.h | 12 ++++--------
+ 1 file changed, 4 insertions(+), 8 deletions(-)
+
+diff --git a/kernels/volk/volk_32f_index_min_32u.h b/kernels/volk/volk_32f_index_min_32u.h
+index c71ee60..92bafbf 100644
+--- a/kernels/volk/volk_32f_index_min_32u.h
++++ b/kernels/volk/volk_32f_index_min_32u.h
+@@ -101,8 +101,7 @@ static inline void volk_32f_index_min_32u_a_sse4_1(uint32_t* target,
+
+ compareResults = _mm_cmplt_ps(currentValues, minValues);
+
+- minValuesIndex =
+- _mm_blendv_ps(minValuesIndex, currentIndexes, compareResults);
++ minValuesIndex = _mm_blendv_ps(minValuesIndex, currentIndexes, compareResults);
+ minValues = _mm_blendv_ps(minValues, currentValues, compareResults);
+ }
+
+@@ -225,8 +224,7 @@ volk_32f_index_min_32u_a_avx(uint32_t* target, const float* source, uint32_t num
+ inputPtr += 8;
+ currentIndexes = _mm256_add_ps(currentIndexes, indexIncrementValues);
+ compareResults = _mm256_cmp_ps(currentValues, minValues, _CMP_LT_OS);
+- minValuesIndex =
+- _mm256_blendv_ps(minValuesIndex, currentIndexes, compareResults);
++ minValuesIndex = _mm256_blendv_ps(minValuesIndex, currentIndexes, compareResults);
+ minValues = _mm256_blendv_ps(minValues, currentValues, compareResults);
+ }
+
+@@ -376,8 +374,7 @@ volk_32f_index_min_32u_u_avx(uint32_t* target, const float* source, uint32_t num
+ inputPtr += 8;
+ currentIndexes = _mm256_add_ps(currentIndexes, indexIncrementValues);
+ compareResults = _mm256_cmp_ps(currentValues, minValues, _CMP_LT_OS);
+- minValuesIndex =
+- _mm256_blendv_ps(minValuesIndex, currentIndexes, compareResults);
++ minValuesIndex = _mm256_blendv_ps(minValuesIndex, currentIndexes, compareResults);
+ minValues = _mm256_blendv_ps(minValues, currentValues, compareResults);
+ }
+
+@@ -436,8 +433,7 @@ static inline void volk_32f_index_min_32u_u_sse4_1(uint32_t* target,
+ inputPtr += 4;
+ currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues);
+ compareResults = _mm_cmplt_ps(currentValues, minValues);
+- minValuesIndex =
+- _mm_blendv_ps(minValuesIndex, currentIndexes, compareResults);
++ minValuesIndex = _mm_blendv_ps(minValuesIndex, currentIndexes, compareResults);
+ minValues = _mm_blendv_ps(minValues, currentValues, compareResults);
+ }
+
+--
+2.30.2
+
--- /dev/null
+From 924b3fffbb9fa6218499a1fa0378262890c3c76d Mon Sep 17 00:00:00 2001
+From: Zlika <zlika_ese@hotmail.com>
+Date: Mon, 5 Jul 2021 14:55:30 +0200
+Subject: [PATCH 11/73] Code cleanup
+
+Signed-off-by: Zlika <zlika_ese@hotmail.com>
+---
+ kernels/volk/volk_32fc_index_min_16u.h | 12 ++++++------
+ kernels/volk/volk_32fc_index_min_32u.h | 14 +++++++-------
+ 2 files changed, 13 insertions(+), 13 deletions(-)
+
+diff --git a/kernels/volk/volk_32fc_index_min_16u.h b/kernels/volk/volk_32fc_index_min_16u.h
+index 6ddd8a3..e355626 100644
+--- a/kernels/volk/volk_32fc_index_min_16u.h
++++ b/kernels/volk/volk_32fc_index_min_16u.h
+@@ -87,7 +87,7 @@
+ #include <volk/volk_avx2_intrinsics.h>
+
+ static inline void volk_32fc_index_min_16u_a_avx2_variant_0(uint16_t* target,
+- lv_32fc_t* source,
++ const lv_32fc_t* source,
+ uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+@@ -147,7 +147,7 @@ static inline void volk_32fc_index_min_16u_a_avx2_variant_0(uint16_t* target,
+ #include <volk/volk_avx2_intrinsics.h>
+
+ static inline void volk_32fc_index_min_16u_a_avx2_variant_1(uint16_t* target,
+- lv_32fc_t* source,
++ const lv_32fc_t* source,
+ uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+@@ -207,7 +207,7 @@ static inline void volk_32fc_index_min_16u_a_avx2_variant_1(uint16_t* target,
+ #include <xmmintrin.h>
+
+ static inline void
+-volk_32fc_index_min_16u_a_sse3(uint16_t* target, lv_32fc_t* source, uint32_t num_points)
++volk_32fc_index_min_16u_a_sse3(uint16_t* target, const lv_32fc_t* source, uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+
+@@ -320,7 +320,7 @@ volk_32fc_index_min_16u_a_sse3(uint16_t* target, lv_32fc_t* source, uint32_t num
+
+ #ifdef LV_HAVE_GENERIC
+ static inline void
+-volk_32fc_index_min_16u_generic(uint16_t* target, lv_32fc_t* source, uint32_t num_points)
++volk_32fc_index_min_16u_generic(uint16_t* target, const lv_32fc_t* source, uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+
+@@ -358,7 +358,7 @@ volk_32fc_index_min_16u_generic(uint16_t* target, lv_32fc_t* source, uint32_t nu
+ #include <volk/volk_avx2_intrinsics.h>
+
+ static inline void volk_32fc_index_min_16u_u_avx2_variant_0(uint16_t* target,
+- lv_32fc_t* source,
++ const lv_32fc_t* source,
+ uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+@@ -418,7 +418,7 @@ static inline void volk_32fc_index_min_16u_u_avx2_variant_0(uint16_t* target,
+ #include <volk/volk_avx2_intrinsics.h>
+
+ static inline void volk_32fc_index_min_16u_u_avx2_variant_1(uint16_t* target,
+- lv_32fc_t* source,
++ const lv_32fc_t* source,
+ uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+diff --git a/kernels/volk/volk_32fc_index_min_32u.h b/kernels/volk/volk_32fc_index_min_32u.h
+index d5e2a00..72fb040 100644
+--- a/kernels/volk/volk_32fc_index_min_32u.h
++++ b/kernels/volk/volk_32fc_index_min_32u.h
+@@ -80,7 +80,7 @@
+ #include <volk/volk_avx2_intrinsics.h>
+
+ static inline void volk_32fc_index_min_32u_a_avx2_variant_0(uint32_t* target,
+- lv_32fc_t* source,
++ const lv_32fc_t* source,
+ uint32_t num_points)
+ {
+ const __m256i indices_increment = _mm256_set1_epi32(8);
+@@ -138,7 +138,7 @@ static inline void volk_32fc_index_min_32u_a_avx2_variant_0(uint32_t* target,
+ #include <volk/volk_avx2_intrinsics.h>
+
+ static inline void volk_32fc_index_min_32u_a_avx2_variant_1(uint32_t* target,
+- lv_32fc_t* source,
++ const lv_32fc_t* source,
+ uint32_t num_points)
+ {
+ const __m256i indices_increment = _mm256_set1_epi32(8);
+@@ -196,7 +196,7 @@ static inline void volk_32fc_index_min_32u_a_avx2_variant_1(uint32_t* target,
+ #include <xmmintrin.h>
+
+ static inline void
+-volk_32fc_index_min_32u_a_sse3(uint32_t* target, lv_32fc_t* source, uint32_t num_points)
++volk_32fc_index_min_32u_a_sse3(uint32_t* target, const lv_32fc_t* source, uint32_t num_points)
+ {
+ union bit128 holderf;
+ union bit128 holderi;
+@@ -307,7 +307,7 @@ volk_32fc_index_min_32u_a_sse3(uint32_t* target, lv_32fc_t* source, uint32_t num
+
+ #ifdef LV_HAVE_GENERIC
+ static inline void
+-volk_32fc_index_min_32u_generic(uint32_t* target, lv_32fc_t* source, uint32_t num_points)
++volk_32fc_index_min_32u_generic(uint32_t* target, const lv_32fc_t* source, uint32_t num_points)
+ {
+ float sq_dist = 0.0;
+ float min = FLT_MAX;
+@@ -342,7 +342,7 @@ volk_32fc_index_min_32u_generic(uint32_t* target, lv_32fc_t* source, uint32_t nu
+ #include <volk/volk_avx2_intrinsics.h>
+
+ static inline void volk_32fc_index_min_32u_u_avx2_variant_0(uint32_t* target,
+- lv_32fc_t* source,
++ const lv_32fc_t* source,
+ uint32_t num_points)
+ {
+ const __m256i indices_increment = _mm256_set1_epi32(8);
+@@ -400,7 +400,7 @@ static inline void volk_32fc_index_min_32u_u_avx2_variant_0(uint32_t* target,
+ #include <volk/volk_avx2_intrinsics.h>
+
+ static inline void volk_32fc_index_min_32u_u_avx2_variant_1(uint32_t* target,
+- lv_32fc_t* source,
++ const lv_32fc_t* source,
+ uint32_t num_points)
+ {
+ const __m256i indices_increment = _mm256_set1_epi32(8);
+@@ -458,7 +458,7 @@ static inline void volk_32fc_index_min_32u_u_avx2_variant_1(uint32_t* target,
+ #include <volk/volk_neon_intrinsics.h>
+
+ static inline void
+-volk_32fc_index_min_32u_neon(uint32_t* target, lv_32fc_t* source, uint32_t num_points)
++volk_32fc_index_min_32u_neon(uint32_t* target, const lv_32fc_t* source, uint32_t num_points)
+ {
+ const uint32_t quarter_points = num_points / 4;
+ const lv_32fc_t* sourcePtr = source;
+--
+2.30.2
+
--- /dev/null
+From e06454128245cdf206808cf2532b41c5fee54453 Mon Sep 17 00:00:00 2001
+From: Zlika <zlika_ese@hotmail.com>
+Date: Mon, 5 Jul 2021 15:04:17 +0200
+Subject: [PATCH 12/73] Fix clang-format errors
+
+Signed-off-by: Zlika <zlika_ese@hotmail.com>
+---
+ kernels/volk/volk_32fc_index_min_16u.h | 10 ++++++----
+ kernels/volk/volk_32fc_index_min_32u.h | 15 +++++++++------
+ 2 files changed, 15 insertions(+), 10 deletions(-)
+
+diff --git a/kernels/volk/volk_32fc_index_min_16u.h b/kernels/volk/volk_32fc_index_min_16u.h
+index e355626..64fcf7b 100644
+--- a/kernels/volk/volk_32fc_index_min_16u.h
++++ b/kernels/volk/volk_32fc_index_min_16u.h
+@@ -206,8 +206,9 @@ static inline void volk_32fc_index_min_16u_a_avx2_variant_1(uint16_t* target,
+ #include <pmmintrin.h>
+ #include <xmmintrin.h>
+
+-static inline void
+-volk_32fc_index_min_16u_a_sse3(uint16_t* target, const lv_32fc_t* source, uint32_t num_points)
++static inline void volk_32fc_index_min_16u_a_sse3(uint16_t* target,
++ const lv_32fc_t* source,
++ uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+
+@@ -319,8 +320,9 @@ volk_32fc_index_min_16u_a_sse3(uint16_t* target, const lv_32fc_t* source, uint32
+ #endif /*LV_HAVE_SSE3*/
+
+ #ifdef LV_HAVE_GENERIC
+-static inline void
+-volk_32fc_index_min_16u_generic(uint16_t* target, const lv_32fc_t* source, uint32_t num_points)
++static inline void volk_32fc_index_min_16u_generic(uint16_t* target,
++ const lv_32fc_t* source,
++ uint32_t num_points)
+ {
+ num_points = (num_points > USHRT_MAX) ? USHRT_MAX : num_points;
+
+diff --git a/kernels/volk/volk_32fc_index_min_32u.h b/kernels/volk/volk_32fc_index_min_32u.h
+index 72fb040..2fb0d7e 100644
+--- a/kernels/volk/volk_32fc_index_min_32u.h
++++ b/kernels/volk/volk_32fc_index_min_32u.h
+@@ -195,8 +195,9 @@ static inline void volk_32fc_index_min_32u_a_avx2_variant_1(uint32_t* target,
+ #include <pmmintrin.h>
+ #include <xmmintrin.h>
+
+-static inline void
+-volk_32fc_index_min_32u_a_sse3(uint32_t* target, const lv_32fc_t* source, uint32_t num_points)
++static inline void volk_32fc_index_min_32u_a_sse3(uint32_t* target,
++ const lv_32fc_t* source,
++ uint32_t num_points)
+ {
+ union bit128 holderf;
+ union bit128 holderi;
+@@ -306,8 +307,9 @@ volk_32fc_index_min_32u_a_sse3(uint32_t* target, const lv_32fc_t* source, uint32
+ #endif /*LV_HAVE_SSE3*/
+
+ #ifdef LV_HAVE_GENERIC
+-static inline void
+-volk_32fc_index_min_32u_generic(uint32_t* target, const lv_32fc_t* source, uint32_t num_points)
++static inline void volk_32fc_index_min_32u_generic(uint32_t* target,
++ const lv_32fc_t* source,
++ uint32_t num_points)
+ {
+ float sq_dist = 0.0;
+ float min = FLT_MAX;
+@@ -457,8 +459,9 @@ static inline void volk_32fc_index_min_32u_u_avx2_variant_1(uint32_t* target,
+ #include <arm_neon.h>
+ #include <volk/volk_neon_intrinsics.h>
+
+-static inline void
+-volk_32fc_index_min_32u_neon(uint32_t* target, const lv_32fc_t* source, uint32_t num_points)
++static inline void volk_32fc_index_min_32u_neon(uint32_t* target,
++ const lv_32fc_t* source,
++ uint32_t num_points)
+ {
+ const uint32_t quarter_points = num_points / 4;
+ const lv_32fc_t* sourcePtr = source;
+--
+2.30.2
+
--- /dev/null
+From a0837c094fa4725e3362e05da82e78233c104975 Mon Sep 17 00:00:00 2001
+From: AlexandreRouma <alexandre.rouma@gmail.com>
+Date: Thu, 30 Sep 2021 13:52:32 +0200
+Subject: [PATCH 55/73] asan: Fix volk_malloc alignment bug
+
+ASAN (AddressSanitizer, as used by GCC) requires the size of an aligned
+allocation to be a multiple of the alignment. To reproduce the bug, use a
+version of libvolk without this fix, call volk_malloc() with a size in
+bytes that is not a multiple of the alignment, and compile with
+AddressSanitizer enabled (-fsanitize=address). The program will error out
+and complain about the alignment. This patch fixes it by rounding the size
+up to the next multiple of the alignment (for example, a 100-byte request
+with a 32-byte alignment becomes a 128-byte allocation).
+
+Signed-off-by: AlexandreRouma <alexandre.rouma@gmail.com>
+---
+ lib/volk_malloc.c | 11 +++++++++++
+ 1 file changed, 11 insertions(+)
+
+diff --git a/lib/volk_malloc.c b/lib/volk_malloc.c
+index 8e84c14..f489ef8 100644
+--- a/lib/volk_malloc.c
++++ b/lib/volk_malloc.c
+@@ -50,6 +50,17 @@
+
+ void* volk_malloc(size_t size, size_t alignment)
+ {
++ if ((size == 0) || (alignment == 0)) {
++ fprintf(stderr, "VOLK: Error allocating memory: either size or alignment is 0");
++ return NULL;
++ }
++ // Tweak size to satisfy ASAN (the GCC address sanitizer).
++ // Calling 'volk_malloc' might therefor result in the allocation of more memory than
++ // requested for correct alignment. Any allocation size change here will in general not
++ // impact the end result since initial size alignment is required either way.
++ if (size % alignment) {
++ size += alignment - (size % alignment);
++ }
+ #if HAVE_POSIX_MEMALIGN
+ // quoting posix_memalign() man page:
+ // "alignment must be a power of two and a multiple of sizeof(void *)"
+--
+2.30.2
+
--- /dev/null
+From a307c9727be4b4b608b5e6b1ae3f46218df479c2 Mon Sep 17 00:00:00 2001
+From: Johannes Demel <demel@uni-bremen.de>
+Date: Sat, 2 Oct 2021 11:38:46 +0200
+Subject: [PATCH 56/73] format: Fix code format
+
+I merged a PR too quickly and missed a formatting issue. This commit
+fixes it.
+
+Signed-off-by: Johannes Demel <demel@uni-bremen.de>
+---
+ lib/volk_malloc.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/lib/volk_malloc.c b/lib/volk_malloc.c
+index f489ef8..0410d29 100644
+--- a/lib/volk_malloc.c
++++ b/lib/volk_malloc.c
+@@ -56,8 +56,8 @@ void* volk_malloc(size_t size, size_t alignment)
+ }
+ // Tweak size to satisfy ASAN (the GCC address sanitizer).
+ // Calling 'volk_malloc' might therefor result in the allocation of more memory than
+- // requested for correct alignment. Any allocation size change here will in general not
+- // impact the end result since initial size alignment is required either way.
++ // requested for correct alignment. Any allocation size change here will in general
++ // not impact the end result since initial size alignment is required either way.
+ if (size % alignment) {
+ size += alignment - (size % alignment);
+ }
+--
+2.30.2
+
--- /dev/null
+From 799245ea6e9e05cc0ed0fabe783fbbe1a5054fd4 Mon Sep 17 00:00:00 2001
+From: "A. Maitland Bottoms" <bottoms@debian.org>
+Date: Tue, 27 Mar 2018 22:02:59 -0400
+Subject: [PATCH 2/6] make acc happy
+
+The abi-compliance-checker grabs every .h file it finds and tries
+to compile them all, even though some are not appropriate for the
+architecture being built on. Careful use of preprocessor guards
+avoids these problems.
+---
+ include/volk/volk_neon_intrinsics.h | 2 ++
+ kernels/volk/volk_32f_8u_polarbutterflypuppet_32f.h | 1 +
+ kernels/volk/volk_8u_x2_encodeframepolar_8u.h | 3 ---
+ 3 files changed, 3 insertions(+), 3 deletions(-)
+
+--- a/include/volk/volk_neon_intrinsics.h
++++ b/include/volk/volk_neon_intrinsics.h
+@@ -79,6 +79,7 @@
+
+ #ifndef INCLUDE_VOLK_VOLK_NEON_INTRINSICS_H_
+ #define INCLUDE_VOLK_VOLK_NEON_INTRINSICS_H_
++#ifdef LV_HAVE_NEON
+ #include <arm_neon.h>
+
+
+@@ -294,4 +295,5 @@
+ #endif
+ }
+
++#endif /*LV_HAVE_NEON*/
+ #endif /* INCLUDE_VOLK_VOLK_NEON_INTRINSICS_H_ */
+--- a/kernels/volk/volk_32f_8u_polarbutterflypuppet_32f.h
++++ b/kernels/volk/volk_32f_8u_polarbutterflypuppet_32f.h
+@@ -31,6 +31,7 @@
+ #include <volk/volk_32f_8u_polarbutterfly_32f.h>
+ #include <volk/volk_8u_x3_encodepolar_8u_x2.h>
+ #include <volk/volk_8u_x3_encodepolarpuppet_8u.h>
++#include <volk/volk_8u_x2_encodeframepolar_8u.h>
+
+
+ static inline void sanitize_bytes(unsigned char* u, const int elements)
+--- a/kernels/volk/volk_8u_x2_encodeframepolar_8u.h
++++ b/kernels/volk/volk_8u_x2_encodeframepolar_8u.h
+@@ -60,8 +60,6 @@
+ }
+ }
+
+-#ifdef LV_HAVE_GENERIC
+-
+ static inline void volk_8u_x2_encodeframepolar_8u_generic(unsigned char* frame,
+ unsigned char* temp,
+ unsigned int frame_size)
+@@ -81,7 +79,6 @@
+ --stage;
+ }
+ }
+-#endif /* LV_HAVE_GENERIC */
+
+ #ifdef LV_HAVE_SSSE3
+ #include <tmmintrin.h>
--- /dev/null
+--- a/apps/CMakeLists.txt
++++ b/apps/CMakeLists.txt
+@@ -62,7 +62,7 @@
+ target_link_libraries(volk_profile PRIVATE std::filesystem)
+ endif()
+
+-if(ENABLE_STATIC_LIBS)
++if(ENABLE_STATIC_LIBS AND ENABLE_STATIC_APPS)
+ target_link_libraries(volk_profile PRIVATE volk_static)
+ set_target_properties(volk_profile PROPERTIES LINK_FLAGS "-static")
+ else()
+@@ -79,7 +79,7 @@
+ add_executable(volk-config-info volk-config-info.cc ${CMAKE_CURRENT_SOURCE_DIR}/volk_option_helpers.cc
+ )
+
+-if(ENABLE_STATIC_LIBS)
++if(ENABLE_STATIC_LIBS AND ENABLE_STATIC_APPS)
+ target_link_libraries(volk-config-info volk_static)
+ set_target_properties(volk-config-info PROPERTIES LINK_FLAGS "-static")
+ else()
--- /dev/null
+--- a/cpu_features/README.md
++++ b/cpu_features/README.md
+@@ -1,4 +1,4 @@
+-# cpu_features [](https://travis-ci.org/google/cpu_features) [](https://ci.appveyor.com/project/gchatelet/cpu-features/branch/master)
++# cpu_features
+
+ A cross-platform C library to retrieve CPU features (such as available
+ instructions) at runtime.
+--- a/README.md
++++ b/README.md
+@@ -1,9 +1,3 @@
+-[](https://travis-ci.com/gnuradio/volk) [](https://ci.appveyor.com/project/gnuradio/volk/branch/master)
+-
+-
+-
+-
+-
+ # Welcome to VOLK!
+
+ VOLK is a sub-project of GNU Radio. Please see http://libvolk.org for bug
--- /dev/null
+0001-Add-volk_32f-c-_index_min_16-32u.patch
+0002-Fix-volk_32fc_index_min_32u_neon.patch
+0003-Fix-volk_32fc_index_min_32u_neon.patch
+0004-Code-cleanup.patch
+0005-Fix-clang-format-errors.patch
+0006-New-generic-implementation-fixed-typos.patch
+0007-Add-the-list-of-contributors-agreeing-to-LGPL-licens.patch
+0009-Code-cleanup.patch
+0010-Fix-clang-format-errors.patch
+0011-Code-cleanup.patch
+0012-Fix-clang-format-errors.patch
+0055-asan-Fix-volk_malloc-alignment-bug.patch
+0056-format-Fix-code-format.patch
+make-acc-happy
+optional-static-apps
+remove-external-HTML-resources
+skip-cpu_features-on-kfreebsd
+use-system-cpu-features-package.patch
--- /dev/null
+Subject: skip cpu_features on kfreebsd
+Author: A. Maitland Bottoms <bottoms@debian.org>
+
+ Avoid #error "Unsupported OS" on kFreeBSD
+
+--- a/CMakeLists.txt
++++ b/CMakeLists.txt
+@@ -133,8 +133,10 @@
+ ########################################################################
+
+ # cpu_features - sensible defaults, user settable option
+-if(CMAKE_SYSTEM_PROCESSOR MATCHES
+- "(^mips)|(^arm)|(^aarch64)|(x86_64)|(AMD64|amd64)|(^i.86$)|(^powerpc)|(^ppc)")
++message(STATUS "Building Volk for ${CMAKE_SYSTEM_NAME} on ${CMAKE_SYSTEM_PROCESSOR}")
++if((CMAKE_SYSTEM_PROCESSOR MATCHES
++ "(^mips)|(^arm)|(^aarch64)|(x86_64)|(AMD64|amd64)|(^i.86$)|(^powerpc)|(^ppc)")
++ AND (NOT CMAKE_SYSTEM_NAME MATCHES "kFreeBSD"))
+ option(VOLK_CPU_FEATURES "Volk uses cpu_features" ON)
+ else()
+ option(VOLK_CPU_FEATURES "Volk uses cpu_features" OFF)
--- /dev/null
+Description: use system cpu_features package
+
+Author: Shengjing Zhu <zhsj@debian.org>
+Last-Update: 2020-12-26
+
+--- a/CMakeLists.txt
++++ b/CMakeLists.txt
+@@ -142,17 +142,7 @@
+ option(VOLK_CPU_FEATURES "Volk uses cpu_features" OFF)
+ endif()
+ if (VOLK_CPU_FEATURES)
+- if(NOT EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/cpu_features/CMakeLists.txt" )
+- message(FATAL_ERROR "cpu_features/CMakeLists.txt not found. Did you forget to git clone recursively?\nFix with: git submodule update --init")
+- endif()
+- message(STATUS "Building Volk with cpu_features")
+- set(BUILD_PIC ON CACHE BOOL
+- "Build cpu_features with Position Independent Code (PIC)."
+- FORCE)
+- set(BUILD_SHARED_LIBS_SAVED "${BUILD_SHARED_LIBS}")
+- set(BUILD_SHARED_LIBS OFF)
+- add_subdirectory(cpu_features)
+- set(BUILD_SHARED_LIBS "${BUILD_SHARED_LIBS_SAVED}")
++ find_package(CpuFeatures)
+ else()
+ message(STATUS "Building Volk without cpu_features")
+ endif()
+--- a/lib/CMakeLists.txt
++++ b/lib/CMakeLists.txt
+@@ -517,7 +517,7 @@
+ if(VOLK_CPU_FEATURES)
+ set_source_files_properties(volk_cpu.c PROPERTIES COMPILE_DEFINITIONS "VOLK_CPU_FEATURES=1")
+ target_include_directories(volk_obj
+- PRIVATE $<TARGET_PROPERTY:cpu_features,INTERFACE_INCLUDE_DIRECTORIES>
++ PRIVATE $<TARGET_PROPERTY:CpuFeatures::cpu_features,INTERFACE_INCLUDE_DIRECTORIES>
+ )
+ endif()
+
--- /dev/null
+#!/usr/bin/make -f
+DEB_HOST_MULTIARCH ?= $(shell dpkg-architecture -qDEB_HOST_MULTIARCH)
+export DEB_HOST_MULTIARCH
+#export DH_VERBOSE=1
+
+%:
+ dh $@ --with python3
+
+override_dh_auto_configure:
+ dh_auto_configure -- -DLIB_SUFFIX="/$(DEB_HOST_MULTIARCH)" \
+ -DENABLE_STATIC_LIBS=On -DPYTHON_EXECUTABLE=/usr/bin/python3 \
+ -DCMAKE_BUILD_TYPE=RelWithDebInfo
+
+override_dh_auto_build-indep:
+ cmake --build obj-* --target all
+ cmake --build obj-* --target volk_doc
+
+override_dh_auto_test:
+ - dh_auto_test -- CTEST_TEST_TIMEOUT=60
--- /dev/null
+3.0 (quilt)
--- /dev/null
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.40.10.
+.TH VOLK-CONFIG-INFO "1" "July 2014" "volk-config-info 0.1" "User Commands"
+.SH NAME
+volk-config-info \- pkgconfig-like tool for Vector Optimized Library of Kernels 0.1
+.SH DESCRIPTION
+.SS "Program options: volk-config-info [options]:"
+.TP
+\fB\-h\fR [ \fB\-\-help\fR ]
+print help message
+.TP
+\fB\-\-prefix\fR
+print VOLK installation prefix
+.TP
+\fB\-\-builddate\fR
+print VOLK build date (RFC2822 format)
+.TP
+\fB\-\-cc\fR
+print VOLK C compiler version
+.TP
+\fB\-\-cflags\fR
+print VOLK CFLAGS
+.TP
+\fB\-\-all\-machines\fR
+print VOLK machines built into library
+.TP
+\fB\-\-avail\-machines\fR
+print VOLK machines the current platform can use
+.TP
+\fB\-\-machine\fR
+print the VOLK machine that will be used
+.TP
+\fB\-v\fR [ \fB\-\-version\fR ]
+print VOLK version
+.SH "SEE ALSO"
+The full documentation for
+.B volk-config-info
+is maintained as a Texinfo manual. If the
+.B info
+and
+.B volk-config-info
+programs are properly installed at your site, the command
+.IP
+.B info volk-config-info
+.PP
+should give you access to the complete manual.
--- /dev/null
+.TH VOLK_MODTOOL "1" "August 2013" "volk_modtool 3.7" "User Commands"
+.SH NAME
+volk_modtool \- tailor VOLK modules
+.SH DESCRIPTION
+The volk_modtool tool is installed along with VOLK as a way of helping
+to construct, add to, and interrogate the VOLK library or companion
+libraries.
+.P
+volk_modtool is installed into $prefix/bin.
+.P
+VOLK modtool enables creating standalone (out-of-tree) VOLK modules
+and provides a few tools for sharing VOLK kernels between VOLK
+modules. If you need to design or work with VOLK kernels away from
+the canonical VOLK library, this is the tool. If you need to tailor
+your own VOLK library for whatever reason, this is the tool.
+.P
+The canonical VOLK library installs a volk.h and a libvolk.so. Your
+own library will install volk_$name.h and libvolk_$name.so. Ya Gronk?
+Good.
+.P
+There isn't a substantial difference between the canonical VOLK
+module and any other VOLK module. They're all peers. Any module
+created via VOLK modtool will come complete with a default
+volk_modtool.cfg file associating the module with the base from which
+it came, its distinctive $name and its destination (or path). These
+values (created from user input if VOLK modtool runs without a
+user-supplied config file or a default config file) serve as default
+values for some VOLK modtool actions. It's more or less intended for
+the user to change directories to the top level of a created VOLK
+module and then run volk_modtool to take advantage of the values
+stored in the default volk_modtool.cfg file.
+.P
+Apart from creating new VOLK modules, VOLK modtool allows you to list
+the names of kernels in other modules, list the names of kernels in
+the current module, add kernels from another module into the current
+module, and remove kernels from the current module. When moving
+kernels between modules, VOLK modtool does its best to keep the QA
+and profiling code for those kernels intact. If the base has a test
+or a profiling call for some kernel, those calls will follow the
+kernel when VOLK modtool adds that kernel. If QA or profiling
+requires a puppet kernel, the puppet kernel will follow the original
+kernel when VOLK modtool adds that original kernel. VOLK modtool
+respects puppets.
+.P
+======================================================================
+.P
+.SH Installing a new VOLK Library:
+.P
+Run the command "volk_modtool -i". This will ask you three questions:
+.P
+ name: // the name to give your VOLK library: volk_<name>
+ destination: // directory new source tree is built under -- must exist.
+ // It will create <directory>/volk_<name>
+ base: // the directory containing the original VOLK source code
+.P
+This will build a new skeleton directory in the destination provided
+with the name volk_<name>. It will contain the necessary structure to
+build:
+.P
+ mkdir build
+ cd build
+ cmake -DCMAKE_INSTALL_PREFIX=/opt/volk ../
+ make
+ sudo make install
+.P
+Right now, the library is empty and contains no kernels. Kernels can
+be added from another VOLK library using the '-a' option. If not
+specified, the kernel will be extracted from the base VOLK
+directory. The '-b' option lets you specify another VOLK library to
+use for this purpose.
+.P
+ volk_modtool -a -n 32fc_x2_conjugate_dot_prod_32fc
+.P
+This will put the code for the new kernel into
+<destination>/volk_<name>/kernels/volk_<name>/
+.P
+Other kernels must be added by hand. See the following webpages for
+more information about creating VOLK kernels:
+ http://gnuradio.org/doc/doxygen/volk_guide.html
+ http://gnuradio.org/redmine/projects/gnuradio/wiki/Volk
+.P
+======================================================================
+.P
+.SH OPTIONS
+.P
+Options for Adding and Removing Kernels:
+ -a, --add_kernel
+ Add kernel from existing VOLK module. Uses the base VOLK module
+ unless -b is used. Use -n to specify the kernel name.
+ Requires: -n.
+ Optional: -b
+.P
+ -A, --add_all_kernels
+ Add all kernels from existing VOLK module. Uses the base VOLK
+ module unless -b is used.
+ Optional: -b
+.P
+ -x, --remove_kernel
+ Remove kernel from module.
+ Requires: -n.
+ Optional: -b
+.P
+Options for Listing Kernels:
+ -l, --list
+ Lists all kernels available in the base VOLK module.
+.P
+ -k, --kernels
+ Lists all kernels in this VOLK module.
+.P
+ -r, --remote-list
+ Lists all kernels in another VOLK module that is specified
+ using the -b option.
--- /dev/null
+.TH VOLK_PROFILE "1" "March 2012" "volk_profile 3.5" "User Commands"
+.SH NAME
+volk_profile \- Quality Assurance application for libvolk functions
+.SH DESCRIPTION
+Writes profile results to a file.
--- /dev/null
+version=4
+ opts="filenamemangle=s%(?:.*?)?volk-?(\d[\d.]*)\.tar\.xz%volk_$1.orig.tar.xz%" \
+ https://github.com/gnuradio/volk/releases \
+ (?:.*?/)?volk-?(\d[\d.]*)\.tar\.xz debian uupdate