From: A. Maitland Bottoms Date: Sun, 4 Feb 2018 18:12:21 +0000 (+0000) Subject: volk (1.3-3) unstable; urgency=medium X-Git-Tag: archive/raspbian/1.3-3+rpi1^2~23 X-Git-Url: https://dgit.raspbian.org/?a=commitdiff_plain;h=dcb71b1bb990cfe4564ef50e3ec969086af6bc58;p=volk.git volk (1.3-3) unstable; urgency=medium * update to v1.3-23-g0109b2e * update debian/libvolk1-dev.abi.tar.gz.amd64 * Add breaks/replaces gnuradio (<=3.7.2.1) (LP: #1614235) [dgit import unpatched volk 1.3-3] --- dcb71b1bb990cfe4564ef50e3ec969086af6bc58 diff --cc debian/changelog index 0000000,0000000..7bd8a0a new file mode 100644 --- /dev/null +++ b/debian/changelog @@@ -1,0 -1,0 +1,319 @@@ ++volk (1.3-3) unstable; urgency=medium ++ ++ * update to v1.3-23-g0109b2e ++ * update debian/libvolk1-dev.abi.tar.gz.amd64 ++ * Add breaks/replaces gnuradio (<=3.7.2.1) (LP: #1614235) ++ ++ -- A. Maitland Bottoms Sun, 04 Feb 2018 13:12:21 -0500 ++ ++volk (1.3-2) unstable; urgency=medium ++ ++ * update to v1.3-16-g28b03a9 ++ apps: fix profile update reading end of lines ++ qa: lower tolerance for 32fc_mag to fix issue #96 ++ * include upstream master patch to sort input files ++ ++ -- A. Maitland Bottoms Sun, 27 Aug 2017 13:44:55 -0400 ++ ++volk (1.3-1) unstable; urgency=medium ++ ++ * New upstream release ++ * The index_max kernels were named with the wrong output datatype. To ++ fix this there are new kernels that return a 32u (int32_t) and the ++ existing kernels had their signatures changed to return 16u (int16_t). ++ * The output to stdout and stderr has been shuffled around. There is no ++ longer a message that prints what VOLK machine is being used and the ++ warning messages go to stderr rather than stdout. ++ * The 32fc_index_max kernels previously were only accurate to the SSE ++ register width (4 points). This was a pretty serious and long-lived ++ bug that's been fixed and the QA updated appropriately. ++ ++ -- A. 
Maitland Bottoms Sat, 02 Jul 2016 16:30:47 -0400 ++ ++volk (1.2.2-2) unstable; urgency=medium ++ ++ * update to v1.2.2-11-g78c8bc4 (to follow gnuradio maint branch) ++ ++ -- A. Maitland Bottoms Sun, 19 Jun 2016 14:44:15 -0400 ++ ++volk (1.2.2-1) unstable; urgency=medium ++ ++ * New upstream release ++ ++ -- A. Maitland Bottoms Fri, 08 Apr 2016 00:12:10 -0400 ++ ++volk (1.2.1-2) unstable; urgency=medium ++ ++ * Upstream patches: ++ Fix some CMake complaints ++ The fix for compilation with cmake 3.5 ++ ++ -- A. Maitland Bottoms Wed, 23 Mar 2016 17:47:54 -0400 ++ ++volk (1.2.1-1) unstable; urgency=medium ++ ++ * New upstream release ++ ++ -- A. Maitland Bottoms Sun, 07 Feb 2016 19:38:32 -0500 ++ ++volk (1.2-1) unstable; urgency=medium ++ ++ * New upstream release ++ ++ -- A. Maitland Bottoms Thu, 24 Dec 2015 20:28:13 -0500 ++ ++volk (1.1.1-5) experimental; urgency=medium ++ ++ * update to v1.1.1-22-gef53547 to support gnuradio 3.7.9 ++ ++ -- A. Maitland Bottoms Fri, 11 Dec 2015 13:12:55 -0500 ++ ++volk (1.1.1-4) unstable; urgency=medium ++ ++ * more lintian fixes ++ ++ -- A. Maitland Bottoms Wed, 25 Nov 2015 21:49:58 -0500 ++ ++volk (1.1.1-3) unstable; urgency=medium ++ ++ * Lintian fixes Pre-Depends ++ ++ -- A. Maitland Bottoms Thu, 19 Nov 2015 21:24:27 -0500 ++ ++volk (1.1.1-2) unstable; urgency=medium ++ ++ * Note that libvolk1-dev replaces files in gnuradio-dev versions <<3.7.8 ++ (Closes: #802646) again. Thanks Andreas Beckmann. ++ ++ -- A. Maitland Bottoms Fri, 13 Nov 2015 18:45:49 -0500 ++ ++volk (1.1.1-1) unstable; urgency=medium ++ ++ * New upstream release ++ * New architectures exist for the AVX2 and FMA ISAs. ++ * The profiler now generates buffers that are vlen + a tiny amount and ++ generates random data to fill buffers. This is intended to catch bugs ++ in protokernels that write beyond num_points. ++ * Note that libvolk1-dev replaces files in earlier gnuradio-dev versions ++ (Closes: #802646) ++ ++ -- A. 
Maitland Bottoms Sun, 01 Nov 2015 18:45:43 -0500 ++ ++volk (1.1-4) unstable; urgency=medium ++ ++ * update to v1.1-12-g264addc ++ ++ -- A. Maitland Bottoms Tue, 29 Sep 2015 23:41:50 -0400 ++ ++volk (1.1-3) unstable; urgency=low ++ ++ * drop dh_acc to get reproducible builds ++ ++ -- A. Maitland Bottoms Fri, 11 Sep 2015 22:57:06 -0400 ++ ++volk (1.1-2) unstable; urgency=low ++ ++ * use dh-acc ++ ++ -- A. Maitland Bottoms Mon, 07 Sep 2015 15:45:20 -0400 ++ ++volk (1.1-1) unstable; urgency=medium ++ ++ * re-organize package naming convention ++ * New upstream release tag v1.1 ++ New architectures exist for the AVX2 and FMA ISAs. Along ++ with the build-system support the following kernels have ++ no proto-kernels taking advantage of these architectures: ++ ++ * 32f_x2_dot_prod_32f ++ * 32fc_x2_multiply_32fc ++ * 64_byteswap ++ * 32f_binary_slicer_8i ++ * 16u_byteswap ++ * 32u_byteswap ++ ++ QA/profiler ++ ----------- ++ ++ The profiler now generates buffers that are vlen + a tiny ++ amount and generates random data to fill buffers. This is ++ intended to catch bugs in protokernels that write beyond ++ num_points. ++ ++ -- A. Maitland Bottoms Wed, 26 Aug 2015 09:22:48 -0400 ++ ++volk (1.0.2-2) unstable; urgency=low ++ ++ * Use SOURCE_DATE_EPOCH from the environment, if defined, ++ rather than current date and time to implement volk_build_date() ++ (embedding build date in a library does not help reproducible builds) ++ * add watch file ++ ++ -- A. Maitland Bottoms Sat, 15 Aug 2015 17:43:15 -0400 ++ ++volk (1.0.2-1) unstable; urgency=medium ++ ++ * Maintenance release 24 Jul 2015 by Nathan West ++ * The major change is the CMake logic to add ASM protokernels. Rather ++ than depending on CFLAGS and ASMFLAGS we use the results of VOLK's ++ built in has_ARCH tests. All configurations should work the same as ++ before, but manually specifying CFLAGS and ASMFLAGS on the cmake call ++ for ARM native builds should no longer be necessary. 
++ * The 32fc_s32fc_x2_rotator_32fc generic protokernel now includes a ++ previously implied header. ++ * Finally, there is a fix to return the "best" protokernel to the ++ dispatcher when no volk_config exists. Thanks to Alexandre Raymond for ++ pointing this out. ++ * with maint branch patch: ++ kernels-add-missing-include-arm_neon.h ++ * removed unused build-dependency on liboil0.3-dev (closes: #793626) ++ ++ -- A. Maitland Bottoms Wed, 05 Aug 2015 00:43:40 -0400 ++ ++volk (1.0.1-1) unstable; urgency=low ++ ++ * Maintenance Release v1.0.1 08 Jul 2015 by Nathan West ++ This is a maintenance release with bug fixes since the initial release of ++ v1.0 in April. ++ ++ * Contributors ++ ++ The following authors have contributed code to this release: ++ ++ Doug Geiger doug.geiger@bioradiation.net ++ Elliot Briggs elliot.briggs@gmail.com ++ Marcus Mueller marcus@hostalia.de ++ Nathan West nathan.west@okstate.edu ++ Tom Rondeau tom@trondeau.com ++ ++ * Kernels ++ ++ Several bug fixes in different kernels. The NEON implementations of the ++ following kernels have been fixed: ++ ++ 32f_x2_add_32f ++ 32f_x2_dot_prod_32f ++ 32fc_s32fc_multiply_32fc ++ 32fc_x2_multiply_32fc ++ ++ Additionally the NEON asm based 32f_x2_add_32f protokernels were not being ++ used and are now included and available for use via the dispatcher. ++ ++ The 32f_s32f_x2_fm_detect_32f kernel now has a puppet. This solves QA seg ++ faults on 32-bit machines and provide a better test for this kernel. ++ ++ The 32fc_s32fc_x2_rotator_32fc generic protokernel replaced cabsf with ++ hypotf for better Android support. ++ ++ * Building ++ ++ Static builds now trigger the applications (volk_profile and ++ volk-config-info) to be statically linked. ++ ++ The file gcc_x86_cpuid.h has been removed since it was no longer being ++ used. Previously it provided cpuid functionality for ancient compilers ++ that we do not support. ++ ++ All build types now use -Wall. 
++ ++ * QA and Testing ++ ++ The documentation around the --update option to volk_profile now makes it ++ clear that the option will only profile kernels without entries in ++ volk_profile. The signature of run_volk_tests with expanded args changed ++ signed types to unsigned types to reflect the actual input. ++ ++ The remaining changes are all non-functional changes to address issues ++ from Coverity. ++ ++ -- A. Maitland Bottoms Fri, 10 Jul 2015 17:57:42 -0400 ++ ++volk (1.0-5) unstable; urgency=medium ++ ++ * native-armv7-build-support skips neon on Debian armel (Closes: #789972) ++ ++ -- A. Maitland Bottoms Sat, 04 Jul 2015 12:36:36 -0400 ++ ++volk (1.0-4) unstable; urgency=low ++ ++ * update native-armv7-build-support patch from gnuradio volk package ++ ++ -- A. Maitland Bottoms Thu, 25 Jun 2015 16:38:49 -0400 ++ ++volk (1.0-3) unstable; urgency=medium ++ ++ * Add Breaks/Replaces (Closes: #789893, #789894) ++ * Allow failing tests ++ ++ -- A. Maitland Bottoms Thu, 25 Jun 2015 12:46:06 -0400 ++ ++volk (1.0-2) unstable; urgency=medium ++ ++ * kernels-add-missing-math.h-include-to-rotator ++ ++ -- A. Maitland Bottoms Wed, 24 Jun 2015 21:09:32 -0400 ++ ++volk (1.0-1) unstable; urgency=low ++ ++ * Initial package (Closes: #782417) ++ Initial Release 11 Apr 2015 by Nathan West ++ ++ VOLK 1.0 is available. This is the first release of VOLK as an independently ++ tracked sub-project of GNU Radio. ++ ++ * Contributors ++ ++ VOLK has been tracked separately from GNU Radio since 2014 Dec 23. ++ Contributors between the split and the initial release are ++ ++ Albert Holguin aholguin_77@yahoo.com ++ Doug Geiger doug.geiger@bioradiation.net ++ Elliot Briggs elliot.briggs@gmail.com ++ Julien Olivain julien.olivain@lsv.ens-cachan.fr ++ Michael Dickens michael.dickens@ettus.com ++ Nathan West nathan.west@okstate.edu ++ Tom Rondeau tom@trondeau.com ++ ++ * QA ++ ++ The test and profiler have significantly changed. 
The profiler supports ++ run-time changes to vlen and iters to help kernel development and provide ++ more flexibility on embedded systems. Additionally there is a new option ++ to update an existing volk_profile results file with only new kernels which ++ will save time when updating to newer versions of VOLK ++ ++ The QA system creates a static list of kernels and test cases. The QA ++ testing and profiler iterate over this static list rather than each source ++ file keeping its own list. The QA also emits XML results to ++ lib/.unittest/kernels.xml which is formatted similarly to JUnit results. ++ ++ * Modtool ++ ++ Modtool was updated to support the QA and profiler changes. ++ ++ * Kernels ++ ++ New proto-kernels: ++ ++ 16ic_deinterleave_real_8i_neon ++ 16ic_s32f_deinterleave_32f_neon ++ fix preprocessor errors for some compilers on byteswap and popcount puppets ++ ++ ORC was moved to the asm kernels directory. ++ volk_malloc ++ ++ The posix_memalign implementation of Volk_malloc now falls back to a standard ++ malloc if alignment is 1. ++ ++ * Miscellaneous ++ ++ Several build system and cmake changes have made it possible to build VOLK ++ both independently with proper soname versions and in-tree for projects ++ such as GNU Radio. ++ ++ The static builds take advantage of cmake object libraries to speed up builds. ++ ++ Finally, there are a number of changes to satisfy compiler warnings and make ++ QA work on multiple machines. ++ ++ -- A. Maitland Bottoms Sun, 12 Apr 2015 23:20:41 -0400 diff --cc debian/compat index 0000000,0000000..ec63514 new file mode 100644 --- /dev/null +++ b/debian/compat @@@ -1,0 -1,0 +1,1 @@@ ++9 diff --cc debian/control index 0000000,0000000..245b061 new file mode 100644 --- /dev/null +++ b/debian/control @@@ -1,0 -1,0 +1,68 @@@ ++Source: volk ++Section: libdevel ++Priority: extra ++Maintainer: A. 
Maitland Bottoms ++Build-Depends: cmake, ++ debhelper (>= 9.0.0~), ++ dh-python, ++ doxygen, ++ libboost-filesystem-dev, ++ libboost-program-options-dev, ++ libboost-system-dev, ++ libboost-test-dev, ++ liborc-0.4-dev, ++ pkg-config, ++ python, ++ python-cheetah ++Standards-Version: 4.1.3 ++Homepage: http://libvolk.org ++Vcs-Browser: https://salsa.debian.org/bottoms/pkg-volk ++Vcs-Git: https://salsa.debian.org/bottoms/pkg-volk.git ++ ++Package: libvolk1.3 ++Section: libs ++Architecture: any ++Pre-Depends: ${misc:Pre-Depends} ++Depends: ${misc:Depends}, ${shlibs:Depends} ++Multi-Arch: same ++Recommends: libvolk1-bin ++Suggests: libvolk1-dev ++Description: vector optimized functions ++ Vector-Optimized Library of Kernels is designed to help ++ applications work with the processor's SIMD instruction sets. These are ++ very powerful vector operations that can give signal processing a ++ huge boost in performance. ++ ++Package: libvolk1-dev ++Architecture: any ++Pre-Depends: ${misc:Pre-Depends} ++Depends: libvolk1.3 (=${binary:Version}), ${misc:Depends} ++Breaks: gnuradio-dev (<<3.7.8), libvolk-dev, libvolk1.0-dev ++Replaces: gnuradio-dev (<<3.7.8), libvolk-dev, libvolk1.0-dev ++Multi-Arch: same ++Description: vector optimized function headers ++ Vector-Optimized Library of Kernels is designed to help ++ applications work with the processor's SIMD instruction sets. These are ++ very powerful vector operations that can give signal processing a ++ huge boost in performance. ++ . ++ This package contains the header files. ++ For documentation, see libvolk-doc. 
++ ++Package: libvolk1-bin ++Section: libs ++Architecture: any ++Pre-Depends: ${misc:Pre-Depends} ++Depends: libvolk1.3 (=${binary:Version}), ++ ${misc:Depends}, ++ ${python:Depends}, ++ ${shlibs:Depends} ++Breaks: libvolk-bin, libvolk1.0-bin, gnuradio (<=3.7.2.1) ++Replaces: libvolk-bin, libvolk1.0-bin, gnuradio (<=3.7.2.1) ++Description: vector optimized runtime tools ++ Vector-Optimized Library of Kernels is designed to help ++ applications work with the processor's SIMD instruction sets. These are ++ very powerful vector operations that can give signal processing a ++ huge boost in performance. ++ . ++ This package includes the volk_profile tool. diff --cc debian/copyright index 0000000,0000000..34e4faf new file mode 100644 --- /dev/null +++ b/debian/copyright @@@ -1,0 -1,0 +1,191 @@@ ++Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ ++Upstream-Name: volk ++Upstream-Contact: http://libvolk.org/ ++Source: ++ https://github.com/gnuradio/volk ++Comment: ++ Debian packages by A. Maitland Bottoms ++ . ++ Upstream Authors: ++ Albert Holguin ++ Doug Geiger ++ Elliot Briggs ++ Julien Olivain ++ Michael Dickens ++ Nathan West ++ Tom Rondeau ++Copyright: 2014-2015 Free Software Foundation, Inc. ++License: GPL-3+ ++ ++Files: * ++Copyright: 2006, 2009-2016, Free Software Foundation, Inc. ++License: GPL-3+ ++ ++Files: Doxyfile.in ++ DoxygenLayout.xml ++ volk.pc.in ++Copyright: 2014-2015 Free Software Foundation, Inc. ++License: GPL-3+ ++ ++Files: apps/volk_profile.h ++Copyright: 2014-2015 Free Software Foundation, Inc. ++License: GPL-3+ ++ ++Files: appveyor.yml ++Copyright: 2016 Paul Cercueil ++License: GPL-3+ ++ ++Files: cmake/* ++Copyright: 2014-2015 Free Software Foundation, Inc. ++License: GPL-3+ ++ ++Files: cmake/Modules/* ++Copyright: 2006, 2009-2016, Free Software Foundation, Inc. ++License: GPL-3+ ++ ++Files: cmake/Modules/CMakeParseArgumentsCopy.cmake ++Copyright: 2010 Alexander Neundorf ++License: Kitware-BSD ++ All rights reserved. ++ . 
++ Redistribution and use in source and binary forms, with or without ++ modification, are permitted provided that the following conditions ++ are met: ++ . ++ * Redistributions of source code must retain the above copyright ++ notice, this list of conditions and the following disclaimer. ++ . ++ * Redistributions in binary form must reproduce the above copyright ++ notice, this list of conditions and the following disclaimer in the ++ documentation and/or other materials provided with the distribution. ++ . ++ * Neither the names of Kitware, Inc., the Insight Software Consortium, ++ nor the names of their contributors may be used to endorse or promote ++ products derived from this software without specific prior written ++ permission. ++ . ++ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ++ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT ++ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ++ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT ++ HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, ++ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT ++ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, ++ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY ++ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ++ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE ++ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ++ ++Files: cmake/Modules/FindORC.cmake ++ cmake/Modules/VolkConfig.cmake.in ++Copyright: 2014-2015 Free Software Foundation, Inc. ++License: GPL-3+ ++ ++Files: cmake/msvc/* ++Copyright: 2006-2008, Alexander Chemeris ++License: BSD-2-clause ++ Redistribution and use in source and binary forms, with or without ++ modification, are permitted provided that the following conditions are met: ++ . ++ 1. 
Redistributions of source code must retain the above copyright notice, ++ this list of conditions and the following disclaimer. ++ . ++ 2. Redistributions in binary form must reproduce the above copyright ++ notice, this list of conditions and the following disclaimer in the ++ documentation and/or other materials provided with the distribution. ++ . ++ 3. The name of the author may be used to endorse or promote products ++ derived from this software without specific prior written permission. ++ . ++ THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED ++ WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF ++ MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO ++ EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, ++ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, ++ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; ++ OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, ++ WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR ++ OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ++ ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ++ ++Files: cmake/msvc/config.h ++Copyright: 2005, 2006 Apple Computer, Inc. ++License: LGPL-2+ ++ ++Files: cmake/msvc/stdbool.h ++Copyright: 2005, 2006, Apple Computer, Inc. ++License: LGPL-2+ ++ ++Files: debian/* ++Copyright: 2015 Free Software Foundation, Inc ++License: GPL-3+ ++Comment: assigned by A. Maitland Bottoms ++ ++Files: debian/libvolk1-dev.abi.tar.gz.amd64 ++Copyright: 2016 Free Software Foundation, Inc ++License: GPL-3+ ++ ++Files: docs/* ++Copyright: 2014-2015 Free Software Foundation, Inc. ++License: GPL-3+ ++ ++Files: gen/archs.xml ++ gen/machines.xml ++Copyright: 2014-2015 Free Software Foundation, Inc. 
++License: GPL-3+ ++ ++Files: include/volk/volk_common.h ++ include/volk/volk_complex.h ++ include/volk/volk_prefs.h ++Copyright: 2014-2015 Free Software Foundation, Inc. ++License: GPL-3+ ++ ++Files: kernels/volk/asm/* ++Copyright: 2014-2015 Free Software Foundation, Inc. ++License: GPL-3+ ++ ++Files: kernels/volk/volk_16u_byteswappuppet_16u.h ++ kernels/volk/volk_32u_byteswappuppet_32u.h ++ kernels/volk/volk_64u_byteswappuppet_64u.h ++Copyright: 2014-2015 Free Software Foundation, Inc. ++License: GPL-3+ ++ ++Files: lib/kernel_tests.h ++ lib/qa_utils.cc ++ lib/qa_utils.h ++ lib/volk_prefs.c ++Copyright: 2014-2015 Free Software Foundation, Inc. ++License: GPL-3+ ++ ++License: LGPL-2+ ++ This library is free software; you can redistribute it and/or ++ modify it under the terms of the GNU Library General Public ++ License as published by the Free Software Foundation; either ++ version 2 of the License, or (at your option) any later version. ++ . ++ This library is distributed in the hope that it will be useful, ++ but WITHOUT ANY WARRANTY; without even the implied warranty of ++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ++ Library General Public License for more details. ++ . ++ You should have received a copy of the GNU Library General Public License ++ along with this library; see the file COPYING.LIB. If not, write to ++ the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, ++ Boston, MA 02110-1301, USA. ++ ++License: GPL-3+ ++ This program is free software: you can redistribute it and/or modify ++ it under the terms of the GNU General Public License as published by ++ the Free Software Foundation; either version 3 of the License, or ++ (at your option) any later version. ++ . ++ This program is distributed in the hope that it will be useful, ++ but WITHOUT ANY WARRANTY; without even the implied warranty of ++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ++ GNU General Public License for more details. ++ . 
++ You should have received a copy of the GNU General Public License ++ along with this program. If not, see <http://www.gnu.org/licenses/>. ++ . ++ On Debian systems, the complete text of the GNU General ++ Public License version 3 can be found in "/usr/share/common-licenses/GPL-3". diff --cc debian/libvolk1-bin.install index 0000000,0000000..b8fc484 new file mode 100644 --- /dev/null +++ b/debian/libvolk1-bin.install @@@ -1,0 -1,0 +1,1 @@@ ++usr/bin/volk* diff --cc debian/libvolk1-bin.manpages index 0000000,0000000..95bae9e new file mode 100644 --- /dev/null +++ b/debian/libvolk1-bin.manpages @@@ -1,0 -1,0 +1,3 @@@ ++debian/volk-config-info.1 ++debian/volk_modtool.1 ++debian/volk_profile.1 diff --cc debian/libvolk1-dev.abi.tar.gz.amd64 index 0000000,0000000..c67d783 new file mode 100644 Binary files differ diff --cc debian/libvolk1-dev.acc index 0000000,0000000..465e6a5 new file mode 100644 --- /dev/null +++ b/debian/libvolk1-dev.acc @@@ -1,0 -1,0 +1,12 @@@ ++ ++ ++ ++ ++debian/libvolk1-dev/usr/include/ ++ ++ ++ ++debian/libvolk1.3/usr/lib/ ++ ++ ++ diff --cc debian/libvolk1-dev.install index 0000000,0000000..4b391be new file mode 100644 --- /dev/null +++ b/debian/libvolk1-dev.install @@@ -1,0 -1,0 +1,4 @@@ ++usr/include/* ++usr/lib/*/*volk*so ++usr/lib/*/cmake/volk ++usr/lib/*/pkgconfig/*volk* diff --cc debian/libvolk1.3.install index 0000000,0000000..2c3cb05 new file mode 100644 --- /dev/null +++ b/debian/libvolk1.3.install @@@ -1,0 -1,0 +1,1 @@@ ++usr/lib/*/libvolk.so.1.3 diff --cc debian/patches/0001-Add-a-AppVeyor-compatible-YAML-file-for-building-on-.patch index 0000000,0000000..09e2420 new file mode 100644 --- /dev/null +++ b/debian/patches/0001-Add-a-AppVeyor-compatible-YAML-file-for-building-on-.patch @@@ -1,0 -1,0 +1,76 @@@ ++From 4461f27f6533cf29baaac0ff9cdd9b6241c0840b Mon Sep 17 00:00:00 2001 ++From: Paul Cercueil ++Date: Wed, 17 Feb 2016 14:51:00 +0100 ++Subject: [PATCH 01/18] Add a AppVeyor compatible YAML file for building on the ++ AppVeyor CI ++ ++Signed-off-by: Paul 
Cercueil ++--- ++ appveyor.yml | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++ 1 file changed, 55 insertions(+) ++ create mode 100644 appveyor.yml ++ ++diff --git a/appveyor.yml b/appveyor.yml ++new file mode 100644 ++index 0000000..052ea51 ++--- /dev/null +++++ b/appveyor.yml ++@@ -0,0 +1,55 @@ +++clone_depth: 1 +++ +++os: Visual Studio 2013 +++ +++install: +++ - echo "Installing Boost libraries..." +++ - nuget install boost_system-vc120 +++ - nuget install boost_filesystem-vc120 +++ - nuget install boost_chrono-vc120 +++ - nuget install boost_program_options-vc120 +++ - nuget install boost_unit_test_framework-vc120 +++ +++ - echo "Installing Cheetah templates..." +++ - appveyor DownloadFile https://pypi.python.org/packages/source/C/Cheetah/Cheetah-2.4.4.tar.gz +++ - 7z x Cheetah-2.4.4.tar.gz +++ - 7z x -y Cheetah-2.4.4.tar +++ - cd Cheetah-2.4.4 +++ - c:\Python27\python.exe setup.py build +++ - c:\Python27\python.exe setup.py install +++ +++build_script: +++ - cd c:\projects\volk +++ +++ # Without this directory in the %PATH%, compiler tests fail because of missing DLLs +++ - set PATH=%PATH%;C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin +++ +++ - cmake -G "Visual Studio 12 Win64" \ +++ -DBoost_CHRONO_LIBRARY_RELEASE:FILEPATH=c:/projects/volk/boost_chrono-vc120.1.59.0.0/lib/native/address-model-64/lib/boost_chrono-vc120-mt-1_59.lib \ +++ -DBoost_FILESYSTEM_LIBRARY_RELEASE:FILEPATH=c:/projects/volk/boost_filesystem-vc120.1.59.0.0/lib/native/address-model-64/lib/boost_filesystem-vc120-mt-1_59.lib \ +++ -DBoost_PROGRAM_OPTIONS_LIBRARY_RELEASE:FILEPATH=c:/projects/volk/boost_program_options-vc120.1.59.0.0/lib/native/address-model-64/lib/boost_program_options-vc120-mt-1_59.lib \ +++ -DBoost_SYSTEM_LIBRARY_RELEASE:FILEPATH=c:/projects/volk/boost_system-vc120.1.59.0.0/lib/native/address-model-64/lib/boost_system-vc120-mt-1_59.lib \ +++ 
-DBoost_UNIT_TEST_FRAMEWORK_LIBRARY_RELEASE:FILEPATH=c:/projects/volk/boost_unit_test_framework-vc120.1.59.0.0/lib/native/address-model-64/lib/boost_unit_test_framework-vc120-mt-1_59.lib \ +++ -DBoost_INCLUDE_DIR:PATH=c:/projects/volk/boost.1.59.0.0/lib/native/include \ +++ -DCMAKE_BUILD_TYPE:STRING=Release -DENABLE_ORC:BOOL=OFF -DENABLE_TESTING:BOOL=OFF \ +++ . +++ +++ - cmake --build . --config Release --target INSTALL +++ +++ # Create an archive +++ - cd "c:\Program Files" +++ - 7z a "c:\libvolk-x64.zip" volk +++ +++ # Create the deps archive +++ - mkdir dlls +++ - copy c:\projects\volk\boost_chrono-vc120.1.59.0.0\lib\native\address-model-64\lib\boost_chrono-vc120-mt-1_59.dll dlls\boost_chrono-vc120-mt-1_59.dll +++ - copy c:\projects\volk\boost_filesystem-vc120.1.59.0.0\lib\native\address-model-64\lib\boost_filesystem-vc120-mt-1_59.dll dlls\boost_filesystem-vc120-mt-1_59.dll +++ - copy c:\projects\volk\boost_program_options-vc120.1.59.0.0\lib\native\address-model-64\lib\boost_program_options-vc120-mt-1_59.dll dlls\boost_program_options-vc120-mt-1_59.dll +++ - copy c:\projects\volk\boost_system-vc120.1.59.0.0\lib\native\address-model-64\lib\boost_system-vc120-mt-1_59.dll dlls\boost_system-vc120-mt-1_59.dll +++ - copy c:\projects\volk\boost_unit_test_framework-vc120.1.59.0.0\lib\native\address-model-64\lib\boost_unit_test_framework-vc120-mt-1_59.dll dlls\boost_unit_test_framework-vc120-mt-1_59.dll +++ - cd dlls +++ - 7z a "c:\libvolk-x64-deps.zip" * +++ +++ # Push it! 
+++ - appveyor PushArtifact c:\libvolk-x64.zip +++ - appveyor PushArtifact c:\libvolk-x64-deps.zip ++-- ++2.11.0 ++ diff --cc debian/patches/0002-Update-CMakeLists-for-1.3-development.patch index 0000000,0000000..48070da new file mode 100644 --- /dev/null +++ b/debian/patches/0002-Update-CMakeLists-for-1.3-development.patch @@@ -1,0 -1,0 +1,25 @@@ ++From 18428fb9f718f5f7fa34707dd47ab6db07d88683 Mon Sep 17 00:00:00 2001 ++From: Nathan West ++Date: Sat, 2 Jul 2016 12:01:28 -0400 ++Subject: [PATCH 02/18] Update CMakeLists for 1.3 development ++ ++--- ++ CMakeLists.txt | 2 +- ++ 1 file changed, 1 insertion(+), 1 deletion(-) ++ ++diff --git a/CMakeLists.txt b/CMakeLists.txt ++index 5ecc9c2..0d0b647 100644 ++--- a/CMakeLists.txt +++++ b/CMakeLists.txt ++@@ -45,7 +45,7 @@ message(STATUS "Build type set to ${CMAKE_BUILD_TYPE}.") ++ ++ set(VERSION_INFO_MAJOR_VERSION 1) ++ set(VERSION_INFO_MINOR_VERSION 3) ++-set(VERSION_INFO_MAINT_VERSION 0) +++set(VERSION_INFO_MAINT_VERSION 0git) ++ include(VolkVersion) #setup version info ++ ++ ++-- ++2.11.0 ++ diff --cc debian/patches/0003-apps-fix-profile-update-reading-end-of-lines.patch index 0000000,0000000..db94b06 new file mode 100644 --- /dev/null +++ b/debian/patches/0003-apps-fix-profile-update-reading-end-of-lines.patch @@@ -1,0 -1,0 +1,25 @@@ ++From e296749b8fe936f72e85cbaca57215cb528ed2e5 Mon Sep 17 00:00:00 2001 ++From: Nathan West ++Date: Mon, 1 Aug 2016 17:12:24 -0400 ++Subject: [PATCH 03/18] apps: fix profile update reading end of lines ++ ++--- ++ apps/volk_profile.cc | 2 +- ++ 1 file changed, 1 insertion(+), 1 deletion(-) ++ ++diff --git a/apps/volk_profile.cc b/apps/volk_profile.cc ++index 2086e3f..51591cc 100644 ++--- a/apps/volk_profile.cc +++++ b/apps/volk_profile.cc ++@@ -261,7 +261,7 @@ void read_results(std::vector *results, std::string path) ++ found = 127; ++ } ++ str_size = config_str.size(); ++- char buffer[128]; +++ char buffer[128] = {'\0'}; ++ config_str.copy(buffer, found + 1, 0); ++ buffer[found] = '\0'; 
++ single_kernel_result.push_back(std::string(buffer)); ++-- ++2.11.0 ++ diff --cc debian/patches/0004-apps-fix-profile-update-reading-end-of-lines.patch index 0000000,0000000..9e2f048 new file mode 100644 --- /dev/null +++ b/debian/patches/0004-apps-fix-profile-update-reading-end-of-lines.patch @@@ -1,0 -1,0 +1,25 @@@ ++From 0d672945dfca506d4e49e857ce886d2b3dc80e96 Mon Sep 17 00:00:00 2001 ++From: Nathan West ++Date: Mon, 1 Aug 2016 17:12:24 -0400 ++Subject: [PATCH 04/18] apps: fix profile update reading end of lines ++ ++--- ++ apps/volk_profile.cc | 2 +- ++ 1 file changed, 1 insertion(+), 1 deletion(-) ++ ++diff --git a/apps/volk_profile.cc b/apps/volk_profile.cc ++index 2086e3f..51591cc 100644 ++--- a/apps/volk_profile.cc +++++ b/apps/volk_profile.cc ++@@ -261,7 +261,7 @@ void read_results(std::vector *results, std::string path) ++ found = 127; ++ } ++ str_size = config_str.size(); ++- char buffer[128]; +++ char buffer[128] = {'\0'}; ++ config_str.copy(buffer, found + 1, 0); ++ buffer[found] = '\0'; ++ single_kernel_result.push_back(std::string(buffer)); ++-- ++2.11.0 ++ diff --cc debian/patches/0005-qa-lower-tolerance-for-32fc_mag-to-fix-issue-96.patch index 0000000,0000000..e753e95 new file mode 100644 --- /dev/null +++ b/debian/patches/0005-qa-lower-tolerance-for-32fc_mag-to-fix-issue-96.patch @@@ -1,0 -1,0 +1,34 @@@ ++From 0f6d889b891bc2ac78c56ad18e43c4ec5a372574 Mon Sep 17 00:00:00 2001 ++From: Nathan West ++Date: Thu, 4 Aug 2016 11:30:55 -0400 ++Subject: [PATCH 05/18] qa: lower tolerance for 32fc_mag to fix issue #96 ++ ++--- ++ lib/kernel_tests.h | 4 +++- ++ 1 file changed, 3 insertions(+), 1 deletion(-) ++ ++diff --git a/lib/kernel_tests.h b/lib/kernel_tests.h ++index 2bf1f0c..7c82733 100644 ++--- a/lib/kernel_tests.h +++++ b/lib/kernel_tests.h ++@@ -24,6 +24,8 @@ std::vector init_test_list(volk_test_params_t test_params) ++ // Some kernels need a lower tolerance ++ volk_test_params_t test_params_inacc = volk_test_params_t(1e-2, test_params.scalar(), ++ 
test_params.vlen(), test_params.iter(), test_params.benchmark_mode(), test_params.kernel_regex()); +++ volk_test_params_t test_params_inacc_tenth = volk_test_params_t(1e-1, test_params.scalar(), +++ test_params.vlen(), test_params.iter(), test_params.benchmark_mode(), test_params.kernel_regex()); ++ volk_test_params_t test_params_int1 = volk_test_params_t(1, test_params.scalar(), ++ test_params.vlen(), test_params.iter(), test_params.benchmark_mode(), test_params.kernel_regex()); ++ ++@@ -79,7 +81,7 @@ std::vector init_test_list(volk_test_params_t test_params) ++ (VOLK_INIT_TEST(volk_32fc_index_max_16u, volk_test_params_t(3, test_params.scalar(), test_params.vlen(), test_params.iter(), test_params.benchmark_mode(), test_params.kernel_regex()))) ++ (VOLK_INIT_TEST(volk_32fc_index_max_32u, volk_test_params_t(3, test_params.scalar(), test_params.vlen(), test_params.iter(), test_params.benchmark_mode(), test_params.kernel_regex()))) ++ (VOLK_INIT_TEST(volk_32fc_s32f_magnitude_16i, test_params_int1)) ++- (VOLK_INIT_TEST(volk_32fc_magnitude_32f, test_params_inacc)) +++ (VOLK_INIT_TEST(volk_32fc_magnitude_32f, test_params_inacc_tenth)) ++ (VOLK_INIT_TEST(volk_32fc_magnitude_squared_32f, test_params)) ++ (VOLK_INIT_TEST(volk_32fc_x2_multiply_32fc, test_params)) ++ (VOLK_INIT_TEST(volk_32fc_x2_multiply_conjugate_32fc, test_params)) ++-- ++2.11.0 ++ diff --cc debian/patches/0006-Add-NEON-AVX-and-unaligned-versions-of-SSE4.1-and-SS.patch index 0000000,0000000..43b0c0b new file mode 100644 --- /dev/null +++ b/debian/patches/0006-Add-NEON-AVX-and-unaligned-versions-of-SSE4.1-and-SS.patch @@@ -1,0 -1,0 +1,346 @@@ ++From aeaf56828ba0a08728a0cf2d2370b5d0153332b1 Mon Sep 17 00:00:00 2001 ++From: Carles Fernandez ++Date: Fri, 23 Sep 2016 19:16:27 +0200 ++Subject: [PATCH 06/18] Add NEON, AVX and unaligned versions of SSE4.1 and SSE ++ ++--- ++ kernels/volk/volk_32f_index_max_32u.h | 316 ++++++++++++++++++++++++++++++++++ ++ 1 file changed, 316 insertions(+) ++ ++diff --git 
a/kernels/volk/volk_32f_index_max_32u.h b/kernels/volk/volk_32f_index_max_32u.h ++index 17b8f70..1888405 100644 ++--- a/kernels/volk/volk_32f_index_max_32u.h +++++ b/kernels/volk/volk_32f_index_max_32u.h ++@@ -130,6 +130,69 @@ volk_32f_index_max_32u_a_sse4_1(uint32_t* target, const float* src0, uint32_t nu ++ #endif /*LV_HAVE_SSE4_1*/ ++ ++ +++#ifdef LV_HAVE_SSE4_1 +++#include +++ +++static inline void volk_32f_index_max_32u_u_sse4_1(uint32_t* target, const float* src0, uint32_t num_points) +++{ +++ if(num_points > 0) +++ { +++ uint32_t number = 0; +++ const uint32_t quarterPoints = num_points / 4; +++ +++ float* inputPtr = (float*)src0; +++ +++ __m128 indexIncrementValues = _mm_set1_ps(4); +++ __m128 currentIndexes = _mm_set_ps(-1,-2,-3,-4); +++ +++ float max = src0[0]; +++ float index = 0; +++ __m128 maxValues = _mm_set1_ps(max); +++ __m128 maxValuesIndex = _mm_setzero_ps(); +++ __m128 compareResults; +++ __m128 currentValues; +++ +++ __VOLK_ATTR_ALIGNED(16) float maxValuesBuffer[4]; +++ __VOLK_ATTR_ALIGNED(16) float maxIndexesBuffer[4]; +++ +++ for(;number < quarterPoints; number++) +++ { +++ currentValues = _mm_loadu_ps(inputPtr); inputPtr += 4; +++ currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues); +++ compareResults = _mm_cmpgt_ps(maxValues, currentValues); +++ maxValuesIndex = _mm_blendv_ps(currentIndexes, maxValuesIndex, compareResults); +++ maxValues = _mm_blendv_ps(currentValues, maxValues, compareResults); +++ } +++ +++ // Calculate the largest value from the remaining 4 points +++ _mm_store_ps(maxValuesBuffer, maxValues); +++ _mm_store_ps(maxIndexesBuffer, maxValuesIndex); +++ +++ for(number = 0; number < 4; number++) +++ { +++ if(maxValuesBuffer[number] > max) +++ { +++ index = maxIndexesBuffer[number]; +++ max = maxValuesBuffer[number]; +++ } +++ } +++ +++ number = quarterPoints * 4; +++ for(;number < num_points; number++) +++ { +++ if(src0[number] > max) +++ { +++ index = number; +++ max = src0[number]; +++ } +++ } +++ target[0] = 
(uint32_t)index; +++ } +++} +++ +++#endif /*LV_HAVE_SSE4_1*/ +++ +++ ++ #ifdef LV_HAVE_SSE ++ ++ #include ++@@ -193,6 +256,259 @@ volk_32f_index_max_32u_a_sse(uint32_t* target, const float* src0, uint32_t num_p ++ #endif /*LV_HAVE_SSE*/ ++ ++ +++#ifdef LV_HAVE_SSE +++#include +++ +++static inline void volk_32f_index_max_32u_u_sse(uint32_t* target, const float* src0, uint32_t num_points) +++{ +++ if(num_points > 0) +++ { +++ uint32_t number = 0; +++ const uint32_t quarterPoints = num_points / 4; +++ +++ float* inputPtr = (float*)src0; +++ +++ __m128 indexIncrementValues = _mm_set1_ps(4); +++ __m128 currentIndexes = _mm_set_ps(-1,-2,-3,-4); +++ +++ float max = src0[0]; +++ float index = 0; +++ __m128 maxValues = _mm_set1_ps(max); +++ __m128 maxValuesIndex = _mm_setzero_ps(); +++ __m128 compareResults; +++ __m128 currentValues; +++ +++ __VOLK_ATTR_ALIGNED(16) float maxValuesBuffer[4]; +++ __VOLK_ATTR_ALIGNED(16) float maxIndexesBuffer[4]; +++ +++ for(;number < quarterPoints; number++) +++ { +++ currentValues = _mm_loadu_ps(inputPtr); inputPtr += 4; +++ currentIndexes = _mm_add_ps(currentIndexes, indexIncrementValues); +++ compareResults = _mm_cmpgt_ps(maxValues, currentValues); +++ maxValuesIndex = _mm_or_ps(_mm_and_ps(compareResults, maxValuesIndex) , _mm_andnot_ps(compareResults, currentIndexes)); +++ maxValues = _mm_or_ps(_mm_and_ps(compareResults, maxValues) , _mm_andnot_ps(compareResults, currentValues)); +++ } +++ +++ // Calculate the largest value from the remaining 4 points +++ _mm_store_ps(maxValuesBuffer, maxValues); +++ _mm_store_ps(maxIndexesBuffer, maxValuesIndex); +++ +++ for(number = 0; number < 4; number++) +++ { +++ if(maxValuesBuffer[number] > max) +++ { +++ index = maxIndexesBuffer[number]; +++ max = maxValuesBuffer[number]; +++ } +++ } +++ +++ number = quarterPoints * 4; +++ for(;number < num_points; number++) +++ { +++ if(src0[number] > max) +++ { +++ index = number; +++ max = src0[number]; +++ } +++ } +++ target[0] = (uint32_t)index; +++ } +++} 
+++ +++#endif /*LV_HAVE_SSE*/ +++ +++ +++#ifdef LV_HAVE_AVX +++#include +++ +++static inline void volk_32f_index_max_32u_a_avx(uint32_t* target, const float* src0, uint32_t num_points) +++{ +++ if(num_points > 0) +++ { +++ uint32_t number = 0; +++ const uint32_t quarterPoints = num_points / 8; +++ +++ float* inputPtr = (float*)src0; +++ +++ __m256 indexIncrementValues = _mm256_set1_ps(8); +++ __m256 currentIndexes = _mm256_set_ps(-1,-2,-3,-4,-5,-6,-7,-8); +++ +++ float max = src0[0]; +++ float index = 0; +++ __m256 maxValues = _mm256_set1_ps(max); +++ __m256 maxValuesIndex = _mm256_setzero_ps(); +++ __m256 compareResults; +++ __m256 currentValues; +++ +++ __VOLK_ATTR_ALIGNED(32) float maxValuesBuffer[8]; +++ __VOLK_ATTR_ALIGNED(32) float maxIndexesBuffer[8]; +++ +++ for(;number < quarterPoints; number++) +++ { +++ currentValues = _mm256_load_ps(inputPtr); inputPtr += 8; +++ currentIndexes = _mm256_add_ps(currentIndexes, indexIncrementValues); +++ compareResults = _mm256_cmp_ps(maxValues, currentValues, 0x1e); +++ maxValuesIndex = _mm256_blendv_ps(currentIndexes, maxValuesIndex, compareResults); +++ maxValues = _mm256_blendv_ps(currentValues, maxValues, compareResults); +++ } +++ +++ // Calculate the largest value from the remaining 8 points +++ _mm256_store_ps(maxValuesBuffer, maxValues); +++ _mm256_store_ps(maxIndexesBuffer, maxValuesIndex); +++ +++ for(number = 0; number < 8; number++) +++ { +++ if(maxValuesBuffer[number] > max) +++ { +++ index = maxIndexesBuffer[number]; +++ max = maxValuesBuffer[number]; +++ } +++ } +++ +++ number = quarterPoints * 8; +++ for(;number < num_points; number++) +++ { +++ if(src0[number] > max) +++ { +++ index = number; +++ max = src0[number]; +++ } +++ } +++ target[0] = (uint32_t)index; +++ } +++} +++ +++#endif /*LV_HAVE_AVX*/ +++ +++ +++#ifdef LV_HAVE_AVX +++#include +++ +++static inline void volk_32f_index_max_32u_u_avx(uint32_t* target, const float* src0, uint32_t num_points) +++{ +++ if(num_points > 0) +++ { +++ uint32_t number 
= 0; +++ const uint32_t quarterPoints = num_points / 8; +++ +++ float* inputPtr = (float*)src0; +++ +++ __m256 indexIncrementValues = _mm256_set1_ps(8); +++ __m256 currentIndexes = _mm256_set_ps(-1,-2,-3,-4,-5,-6,-7,-8); +++ +++ float max = src0[0]; +++ float index = 0; +++ __m256 maxValues = _mm256_set1_ps(max); +++ __m256 maxValuesIndex = _mm256_setzero_ps(); +++ __m256 compareResults; +++ __m256 currentValues; +++ +++ __VOLK_ATTR_ALIGNED(32) float maxValuesBuffer[8]; +++ __VOLK_ATTR_ALIGNED(32) float maxIndexesBuffer[8]; +++ +++ for(;number < quarterPoints; number++) +++ { +++ currentValues = _mm256_loadu_ps(inputPtr); inputPtr += 8; +++ currentIndexes = _mm256_add_ps(currentIndexes, indexIncrementValues); +++ compareResults = _mm256_cmp_ps(maxValues, currentValues, 0x1e); +++ maxValuesIndex = _mm256_blendv_ps(currentIndexes, maxValuesIndex, compareResults); +++ maxValues = _mm256_blendv_ps(currentValues, maxValues, compareResults); +++ } +++ +++ // Calculate the largest value from the remaining 8 points +++ _mm256_store_ps(maxValuesBuffer, maxValues); +++ _mm256_store_ps(maxIndexesBuffer, maxValuesIndex); +++ +++ for(number = 0; number < 8; number++) +++ { +++ if(maxValuesBuffer[number] > max) +++ { +++ index = maxIndexesBuffer[number]; +++ max = maxValuesBuffer[number]; +++ } +++ } +++ +++ number = quarterPoints * 8; +++ for(;number < num_points; number++) +++ { +++ if(src0[number] > max) +++ { +++ index = number; +++ max = src0[number]; +++ } +++ } +++ target[0] = (uint32_t)index; +++ } +++} +++ +++#endif /*LV_HAVE_AVX*/ +++ +++ +++#ifdef LV_HAVE_NEON +++#include +++ +++static inline void volk_32f_index_max_32u_neon(uint32_t* target, const float* src0, uint32_t num_points) +++{ +++ if(num_points > 0) +++ { +++ uint32_t number = 0; +++ const uint32_t quarterPoints = num_points / 4; +++ +++ float* inputPtr = (float*)src0; +++ float32x4_t indexIncrementValues = vdupq_n_f32(4); +++ __VOLK_ATTR_ALIGNED(16) float currentIndexes_float[4] = { -4.0f, -3.0f, -2.0f, 
-1.0f }; +++ float32x4_t currentIndexes = vld1q_f32(currentIndexes_float); +++ +++ float max = src0[0]; +++ float index = 0; +++ float32x4_t maxValues = vdupq_n_f32(max); +++ uint32x4_t maxValuesIndex = vmovq_n_u32(0); +++ uint32x4_t compareResults; +++ uint32x4_t currentIndexes_u; +++ float32x4_t currentValues; +++ +++ __VOLK_ATTR_ALIGNED(16) float maxValuesBuffer[4]; +++ __VOLK_ATTR_ALIGNED(16) float maxIndexesBuffer[4]; +++ +++ for(;number < quarterPoints; number++) +++ { +++ currentValues = vld1q_f32(inputPtr); inputPtr += 4; +++ currentIndexes = vaddq_f32(currentIndexes, indexIncrementValues); +++ currentIndexes_u = vcvtq_u32_f32(currentIndexes); +++ compareResults = vcgtq_f32( maxValues, currentValues); +++ maxValuesIndex = vorrq_u32( vandq_u32( compareResults, maxValuesIndex ), vbicq_u32(currentIndexes_u, compareResults) ); +++ maxValues = vmaxq_f32(currentValues, maxValues); +++ } +++ +++ // Calculate the largest value from the remaining 4 points +++ vst1q_f32(maxValuesBuffer, maxValues); +++ vst1q_f32(maxIndexesBuffer, vcvtq_f32_u32(maxValuesIndex)); +++ for(number = 0; number < 4; number++) +++ { +++ if(maxValuesBuffer[number] > max) +++ { +++ index = maxIndexesBuffer[number]; +++ max = maxValuesBuffer[number]; +++ } +++ } +++ +++ number = quarterPoints * 4; +++ for(;number < num_points; number++) +++ { +++ if(src0[number] > max) +++ { +++ index = number; +++ max = src0[number]; +++ } +++ } +++ target[0] = (uint32_t)index; +++ } +++} +++ +++#endif /*LV_HAVE_NEON*/ +++ +++ ++ #ifdef LV_HAVE_GENERIC ++ ++ static inline void ++-- ++2.11.0 ++ diff --cc debian/patches/0007-added-__VOLK_PREFETCH-compatibility-macro.patch index 0000000,0000000..564afa2 new file mode 100644 --- /dev/null +++ b/debian/patches/0007-added-__VOLK_PREFETCH-compatibility-macro.patch @@@ -1,0 -1,0 +1,357 @@@ ++From d065c1cdd34c0f5c78911331381e10687faa14a0 Mon Sep 17 00:00:00 2001 ++From: Josh Blum ++Date: Fri, 20 Jan 2017 10:03:49 -0800 ++Subject: [PATCH 07/18] added __VOLK_PREFETCH() 
compatibility macro ++ ++__VOLK_PREFETCH() performs __builtin_prefetch() on GCC compilers ++and is otherwise a NOP for other systems. The use of __builtin_prefetch ++was replaced with __VOLK_PREFETCH() to make the kernels portable. ++--- ++ include/volk/volk_common.h | 3 +++ ++ kernels/volk/volk_16i_max_star_16i.h | 2 +- ++ kernels/volk/volk_16i_max_star_horizontal_16i.h | 2 +- ++ kernels/volk/volk_16ic_convert_32fc.h | 2 +- ++ kernels/volk/volk_16ic_x2_dot_prod_16ic.h | 28 +++++++++++----------- ++ kernels/volk/volk_16ic_x2_multiply_16ic.h | 4 ++-- ++ kernels/volk/volk_32f_x2_add_32f.h | 4 ++-- ++ kernels/volk/volk_32fc_conjugate_32fc.h | 2 +- ++ kernels/volk/volk_32fc_convert_16ic.h | 6 ++--- ++ .../volk/volk_32fc_x2_conjugate_dot_prod_32fc.h | 4 ++-- ++ kernels/volk/volk_32fc_x2_dot_prod_32fc.h | 16 ++++++------- ++ kernels/volk/volk_32fc_x2_multiply_32fc.h | 8 +++---- ++ .../volk/volk_32fc_x2_multiply_conjugate_32fc.h | 4 ++-- ++ 13 files changed, 44 insertions(+), 41 deletions(-) ++ ++diff --git a/include/volk/volk_common.h b/include/volk/volk_common.h ++index 4d35f5c..a53b139 100644 ++--- a/include/volk/volk_common.h +++++ b/include/volk/volk_common.h ++@@ -16,6 +16,7 @@ ++ # define __VOLK_ATTR_EXPORT ++ # define __VOLK_ATTR_IMPORT ++ # endif +++# define __VOLK_PREFETCH(addr) __builtin_prefetch(addr) ++ #elif _MSC_VER ++ # define __VOLK_ATTR_ALIGNED(x) __declspec(align(x)) ++ # define __VOLK_ATTR_UNUSED ++@@ -23,6 +24,7 @@ ++ # define __VOLK_ATTR_DEPRECATED __declspec(deprecated) ++ # define __VOLK_ATTR_EXPORT __declspec(dllexport) ++ # define __VOLK_ATTR_IMPORT __declspec(dllimport) +++# define __VOLK_PREFETCH(addr) ++ #else ++ # define __VOLK_ATTR_ALIGNED(x) ++ # define __VOLK_ATTR_UNUSED ++@@ -30,6 +32,7 @@ ++ # define __VOLK_ATTR_DEPRECATED ++ # define __VOLK_ATTR_EXPORT ++ # define __VOLK_ATTR_IMPORT +++# define __VOLK_PREFETCH(addr) ++ #endif ++ ++ //////////////////////////////////////////////////////////////////////// ++diff --git 
a/kernels/volk/volk_16i_max_star_16i.h b/kernels/volk/volk_16i_max_star_16i.h ++index e470642..531a8b5 100644 ++--- a/kernels/volk/volk_16i_max_star_16i.h +++++ b/kernels/volk/volk_16i_max_star_16i.h ++@@ -139,7 +139,7 @@ volk_16i_max_star_16i_neon(short* target, short* src0, unsigned int num_points) ++ ++ for(number=0; number < eighth_points; ++number) { ++ input_vec = vld1q_s16(src0); ++- __builtin_prefetch(src0+16); +++ __VOLK_PREFETCH(src0+16); ++ diff = vsubq_s16(candidate_vec, input_vec); ++ comp1 = vcgeq_s16(diff, zeros); ++ comp2 = vcltq_s16(diff, zeros); ++diff --git a/kernels/volk/volk_16i_max_star_horizontal_16i.h b/kernels/volk/volk_16i_max_star_horizontal_16i.h ++index 1da8356..964587c 100644 ++--- a/kernels/volk/volk_16i_max_star_horizontal_16i.h +++++ b/kernels/volk/volk_16i_max_star_horizontal_16i.h ++@@ -169,7 +169,7 @@ volk_16i_max_star_horizontal_16i_neon(int16_t* target, int16_t* src0, unsigned i ++ zeros = veorq_s16(zeros, zeros); ++ for(number=0; number < eighth_points; ++number) { ++ input_vec = vld2q_s16(src0); ++- //__builtin_prefetch(src0+16); +++ //__VOLK_PREFETCH(src0+16); ++ diff = vsubq_s16(input_vec.val[0], input_vec.val[1]); ++ comp1 = vcgeq_s16(diff, zeros); ++ comp2 = vcltq_s16(diff, zeros); ++diff --git a/kernels/volk/volk_16ic_convert_32fc.h b/kernels/volk/volk_16ic_convert_32fc.h ++index 88e079d..9779b0f 100644 ++--- a/kernels/volk/volk_16ic_convert_32fc.h +++++ b/kernels/volk/volk_16ic_convert_32fc.h ++@@ -198,7 +198,7 @@ static inline void volk_16ic_convert_32fc_neon(lv_32fc_t* outputVector, const lv ++ for(number = 0; number < sse_iters; number++) ++ { ++ a16x4 = vld1_s16((const int16_t*)_in); ++- __builtin_prefetch(_in + 4); +++ __VOLK_PREFETCH(_in + 4); ++ a32x4 = vmovl_s16(a16x4); ++ f32x4 = vcvtq_f32_s32(a32x4); ++ vst1q_f32((float32_t*)_out, f32x4); ++diff --git a/kernels/volk/volk_16ic_x2_dot_prod_16ic.h b/kernels/volk/volk_16ic_x2_dot_prod_16ic.h ++index 9d4c882..8e6de4c 100644 ++--- 
a/kernels/volk/volk_16ic_x2_dot_prod_16ic.h +++++ b/kernels/volk/volk_16ic_x2_dot_prod_16ic.h ++@@ -96,9 +96,9 @@ static inline void volk_16ic_x2_dot_prod_16ic_a_sse2(lv_16sc_t* out, const lv_16 ++ { ++ // a[127:0]=[a3.i,a3.r,a2.i,a2.r,a1.i,a1.r,a0.i,a0.r] ++ a = _mm_load_si128((__m128i*)_in_a); //load (2 byte imag, 2 byte real) x 4 into 128 bits reg ++- __builtin_prefetch(_in_a + 8); +++ __VOLK_PREFETCH(_in_a + 8); ++ b = _mm_load_si128((__m128i*)_in_b); ++- __builtin_prefetch(_in_b + 8); +++ __VOLK_PREFETCH(_in_b + 8); ++ c = _mm_mullo_epi16(a, b); // a3.i*b3.i, a3.r*b3.r, .... ++ ++ c_sr = _mm_srli_si128(c, 2); // Shift a right by imm8 bytes while shifting in zeros, and store the results in dst. ++@@ -173,9 +173,9 @@ static inline void volk_16ic_x2_dot_prod_16ic_u_sse2(lv_16sc_t* out, const lv_16 ++ { ++ // a[127:0]=[a3.i,a3.r,a2.i,a2.r,a1.i,a1.r,a0.i,a0.r] ++ a = _mm_loadu_si128((__m128i*)_in_a); //load (2 byte imag, 2 byte real) x 4 into 128 bits reg ++- __builtin_prefetch(_in_a + 8); +++ __VOLK_PREFETCH(_in_a + 8); ++ b = _mm_loadu_si128((__m128i*)_in_b); ++- __builtin_prefetch(_in_b + 8); +++ __VOLK_PREFETCH(_in_b + 8); ++ c = _mm_mullo_epi16(a, b); // a3.i*b3.i, a3.r*b3.r, .... ++ ++ c_sr = _mm_srli_si128(c, 2); // Shift a right by imm8 bytes while shifting in zeros, and store the results in dst. ++@@ -248,9 +248,9 @@ static inline void volk_16ic_x2_dot_prod_16ic_u_axv2(lv_16sc_t* out, const lv_16 ++ for(number = 0; number < avx_iters; number++) ++ { ++ a = _mm256_loadu_si256((__m256i*)_in_a); ++- __builtin_prefetch(_in_a + 16); +++ __VOLK_PREFETCH(_in_a + 16); ++ b = _mm256_loadu_si256((__m256i*)_in_b); ++- __builtin_prefetch(_in_b + 16); +++ __VOLK_PREFETCH(_in_b + 16); ++ c = _mm256_mullo_epi16(a, b); ++ ++ c_sr = _mm256_srli_si256(c, 2); // Shift a right by imm8 bytes while shifting in zeros, and store the results in dst. 
++@@ -324,9 +324,9 @@ static inline void volk_16ic_x2_dot_prod_16ic_a_axv2(lv_16sc_t* out, const lv_16 ++ for(number = 0; number < avx_iters; number++) ++ { ++ a = _mm256_load_si256((__m256i*)_in_a); ++- __builtin_prefetch(_in_a + 16); +++ __VOLK_PREFETCH(_in_a + 16); ++ b = _mm256_load_si256((__m256i*)_in_b); ++- __builtin_prefetch(_in_b + 16); +++ __VOLK_PREFETCH(_in_b + 16); ++ c = _mm256_mullo_epi16(a, b); ++ ++ c_sr = _mm256_srli_si256(c, 2); // Shift a right by imm8 bytes while shifting in zeros, and store the results in dst. ++@@ -399,8 +399,8 @@ static inline void volk_16ic_x2_dot_prod_16ic_neon(lv_16sc_t* out, const lv_16sc ++ { ++ a_val = vld2_s16((int16_t*)a_ptr); // a0r|a1r|a2r|a3r || a0i|a1i|a2i|a3i ++ b_val = vld2_s16((int16_t*)b_ptr); // b0r|b1r|b2r|b3r || b0i|b1i|b2i|b3i ++- __builtin_prefetch(a_ptr + 8); ++- __builtin_prefetch(b_ptr + 8); +++ __VOLK_PREFETCH(a_ptr + 8); +++ __VOLK_PREFETCH(b_ptr + 8); ++ ++ // multiply the real*real and imag*imag to get real result ++ // a0r*b0r|a1r*b1r|a2r*b2r|a3r*b3r ++@@ -465,8 +465,8 @@ static inline void volk_16ic_x2_dot_prod_16ic_neon_vma(lv_16sc_t* out, const lv_ ++ { ++ a_val = vld2_s16((int16_t*)a_ptr); // a0r|a1r|a2r|a3r || a0i|a1i|a2i|a3i ++ b_val = vld2_s16((int16_t*)b_ptr); // b0r|b1r|b2r|b3r || b0i|b1i|b2i|b3i ++- __builtin_prefetch(a_ptr + 8); ++- __builtin_prefetch(b_ptr + 8); +++ __VOLK_PREFETCH(a_ptr + 8); +++ __VOLK_PREFETCH(b_ptr + 8); ++ ++ tmp.val[0] = vmul_s16(a_val.val[0], b_val.val[0]); ++ tmp.val[1] = vmul_s16(a_val.val[1], b_val.val[0]); ++@@ -519,8 +519,8 @@ static inline void volk_16ic_x2_dot_prod_16ic_neon_optvma(lv_16sc_t* out, const ++ { ++ a_val = vld2_s16((int16_t*)a_ptr); // a0r|a1r|a2r|a3r || a0i|a1i|a2i|a3i ++ b_val = vld2_s16((int16_t*)b_ptr); // b0r|b1r|b2r|b3r || b0i|b1i|b2i|b3i ++- __builtin_prefetch(a_ptr + 8); ++- __builtin_prefetch(b_ptr + 8); +++ __VOLK_PREFETCH(a_ptr + 8); +++ __VOLK_PREFETCH(b_ptr + 8); ++ ++ // use 2 accumulators to remove inter-instruction data 
dependencies ++ accumulator1.val[0] = vmla_s16(accumulator1.val[0], a_val.val[0], b_val.val[0]); ++diff --git a/kernels/volk/volk_16ic_x2_multiply_16ic.h b/kernels/volk/volk_16ic_x2_multiply_16ic.h ++index 17033ae..9dcf06f 100644 ++--- a/kernels/volk/volk_16ic_x2_multiply_16ic.h +++++ b/kernels/volk/volk_16ic_x2_multiply_16ic.h ++@@ -291,8 +291,8 @@ static inline void volk_16ic_x2_multiply_16ic_neon(lv_16sc_t* out, const lv_16sc ++ { ++ a_val = vld2_s16((int16_t*)a_ptr); // a0r|a1r|a2r|a3r || a0i|a1i|a2i|a3i ++ b_val = vld2_s16((int16_t*)b_ptr); // b0r|b1r|b2r|b3r || b0i|b1i|b2i|b3i ++- __builtin_prefetch(a_ptr + 4); ++- __builtin_prefetch(b_ptr + 4); +++ __VOLK_PREFETCH(a_ptr + 4); +++ __VOLK_PREFETCH(b_ptr + 4); ++ ++ // multiply the real*real and imag*imag to get real result ++ // a0r*b0r|a1r*b1r|a2r*b2r|a3r*b3r ++diff --git a/kernels/volk/volk_32f_x2_add_32f.h b/kernels/volk/volk_32f_x2_add_32f.h ++index fc9cf5b..28cf73d 100644 ++--- a/kernels/volk/volk_32f_x2_add_32f.h +++++ b/kernels/volk/volk_32f_x2_add_32f.h ++@@ -191,8 +191,8 @@ volk_32f_x2_add_32f_u_neon(float* cVector, const float* aVector, ++ // Load in to NEON registers ++ aVal = vld1q_f32(aPtr); ++ bVal = vld1q_f32(bPtr); ++- __builtin_prefetch(aPtr+4); ++- __builtin_prefetch(bPtr+4); +++ __VOLK_PREFETCH(aPtr+4); +++ __VOLK_PREFETCH(bPtr+4); ++ ++ // vector add ++ cVal = vaddq_f32(aVal, bVal); ++diff --git a/kernels/volk/volk_32fc_conjugate_32fc.h b/kernels/volk/volk_32fc_conjugate_32fc.h ++index 1fdb6c2..6994d0e 100644 ++--- a/kernels/volk/volk_32fc_conjugate_32fc.h +++++ b/kernels/volk/volk_32fc_conjugate_32fc.h ++@@ -248,7 +248,7 @@ volk_32fc_conjugate_32fc_a_neon(lv_32fc_t* cVector, const lv_32fc_t* aVector, un ++ const lv_32fc_t* a = aVector; ++ ++ for(number=0; number < quarterPoints; number++){ ++- __builtin_prefetch(a+4); +++ __VOLK_PREFETCH(a+4); ++ x = vld2q_f32((float*)a); // Load the complex data as ar,br,cr,dr; ai,bi,ci,di ++ ++ // xor the imaginary lane ++diff --git 
a/kernels/volk/volk_32fc_convert_16ic.h b/kernels/volk/volk_32fc_convert_16ic.h ++index 4f6e6a5..307ab36 100644 ++--- a/kernels/volk/volk_32fc_convert_16ic.h +++++ b/kernels/volk/volk_32fc_convert_16ic.h ++@@ -75,7 +75,7 @@ static inline void volk_32fc_convert_16ic_u_sse2(lv_16sc_t* outputVector, const ++ { ++ inputVal1 = _mm_loadu_ps((float*)inputVectorPtr); inputVectorPtr += 4; ++ inputVal2 = _mm_loadu_ps((float*)inputVectorPtr); inputVectorPtr += 4; ++- __builtin_prefetch(inputVectorPtr + 8); +++ __VOLK_PREFETCH(inputVectorPtr + 8); ++ ++ // Clip ++ ret1 = _mm_max_ps(_mm_min_ps(inputVal1, vmax_val), vmin_val); ++@@ -128,7 +128,7 @@ static inline void volk_32fc_convert_16ic_a_sse2(lv_16sc_t* outputVector, const ++ { ++ inputVal1 = _mm_load_ps((float*)inputVectorPtr); inputVectorPtr += 4; ++ inputVal2 = _mm_load_ps((float*)inputVectorPtr); inputVectorPtr += 4; ++- __builtin_prefetch(inputVectorPtr + 8); +++ __VOLK_PREFETCH(inputVectorPtr + 8); ++ ++ // Clip ++ ret1 = _mm_max_ps(_mm_min_ps(inputVal1, vmax_val), vmin_val); ++@@ -184,7 +184,7 @@ static inline void volk_32fc_convert_16ic_neon(lv_16sc_t* outputVector, const lv ++ { ++ a = vld1q_f32((const float32_t*)(inputVectorPtr)); inputVectorPtr += 4; ++ b = vld1q_f32((const float32_t*)(inputVectorPtr)); inputVectorPtr += 4; ++- __builtin_prefetch(inputVectorPtr + 8); +++ __VOLK_PREFETCH(inputVectorPtr + 8); ++ ++ ret1 = vmaxq_f32(vminq_f32(a, max_val), min_val); ++ ret2 = vmaxq_f32(vminq_f32(b, max_val), min_val); ++diff --git a/kernels/volk/volk_32fc_x2_conjugate_dot_prod_32fc.h b/kernels/volk/volk_32fc_x2_conjugate_dot_prod_32fc.h ++index 981899c..4addf80 100644 ++--- a/kernels/volk/volk_32fc_x2_conjugate_dot_prod_32fc.h +++++ b/kernels/volk/volk_32fc_x2_conjugate_dot_prod_32fc.h ++@@ -219,8 +219,8 @@ static inline void volk_32fc_x2_conjugate_dot_prod_32fc_neon(lv_32fc_t* result, ++ for(number = 0; number < quarter_points; ++number) { ++ a_val = vld2q_f32((float*)a_ptr); // a0r|a1r|a2r|a3r || a0i|a1i|a2i|a3i ++ 
b_val = vld2q_f32((float*)b_ptr); // b0r|b1r|b2r|b3r || b0i|b1i|b2i|b3i ++- __builtin_prefetch(a_ptr+8); ++- __builtin_prefetch(b_ptr+8); +++ __VOLK_PREFETCH(a_ptr+8); +++ __VOLK_PREFETCH(b_ptr+8); ++ ++ // do the first multiply ++ tmp_imag.val[1] = vmulq_f32(a_val.val[1], b_val.val[0]); ++diff --git a/kernels/volk/volk_32fc_x2_dot_prod_32fc.h b/kernels/volk/volk_32fc_x2_dot_prod_32fc.h ++index 39d0c78..0c3271c 100644 ++--- a/kernels/volk/volk_32fc_x2_dot_prod_32fc.h +++++ b/kernels/volk/volk_32fc_x2_dot_prod_32fc.h ++@@ -894,8 +894,8 @@ static inline void volk_32fc_x2_dot_prod_32fc_neon(lv_32fc_t* result, const lv_3 ++ for(number = 0; number < quarter_points; ++number) { ++ a_val = vld2q_f32((float*)a_ptr); // a0r|a1r|a2r|a3r || a0i|a1i|a2i|a3i ++ b_val = vld2q_f32((float*)b_ptr); // b0r|b1r|b2r|b3r || b0i|b1i|b2i|b3i ++- __builtin_prefetch(a_ptr+8); ++- __builtin_prefetch(b_ptr+8); +++ __VOLK_PREFETCH(a_ptr+8); +++ __VOLK_PREFETCH(b_ptr+8); ++ ++ // multiply the real*real and imag*imag to get real result ++ // a0r*b0r|a1r*b1r|a2r*b2r|a3r*b3r ++@@ -949,8 +949,8 @@ static inline void volk_32fc_x2_dot_prod_32fc_neon_opttests(lv_32fc_t* result, c ++ for(number = 0; number < quarter_points; ++number) { ++ a_val = vld2q_f32((float*)a_ptr); // a0r|a1r|a2r|a3r || a0i|a1i|a2i|a3i ++ b_val = vld2q_f32((float*)b_ptr); // b0r|b1r|b2r|b3r || b0i|b1i|b2i|b3i ++- __builtin_prefetch(a_ptr+8); ++- __builtin_prefetch(b_ptr+8); +++ __VOLK_PREFETCH(a_ptr+8); +++ __VOLK_PREFETCH(b_ptr+8); ++ ++ // do the first multiply ++ tmp_imag.val[1] = vmulq_f32(a_val.val[1], b_val.val[0]); ++@@ -998,8 +998,8 @@ static inline void volk_32fc_x2_dot_prod_32fc_neon_optfma(lv_32fc_t* result, con ++ for(number = 0; number < quarter_points; ++number) { ++ a_val = vld2q_f32((float*)a_ptr); // a0r|a1r|a2r|a3r || a0i|a1i|a2i|a3i ++ b_val = vld2q_f32((float*)b_ptr); // b0r|b1r|b2r|b3r || b0i|b1i|b2i|b3i ++- __builtin_prefetch(a_ptr+8); ++- __builtin_prefetch(b_ptr+8); +++ __VOLK_PREFETCH(a_ptr+8); +++ 
__VOLK_PREFETCH(b_ptr+8); ++ ++ // use 2 accumulators to remove inter-instruction data dependencies ++ accumulator1.val[0] = vmlaq_f32(accumulator1.val[0], a_val.val[0], b_val.val[0]); ++@@ -1050,8 +1050,8 @@ static inline void volk_32fc_x2_dot_prod_32fc_neon_optfmaunroll(lv_32fc_t* resul ++ for(number = 0; number < quarter_points; ++number) { ++ a_val = vld4q_f32((float*)a_ptr); // a0r|a1r|a2r|a3r || a0i|a1i|a2i|a3i ++ b_val = vld4q_f32((float*)b_ptr); // b0r|b1r|b2r|b3r || b0i|b1i|b2i|b3i ++- __builtin_prefetch(a_ptr+8); ++- __builtin_prefetch(b_ptr+8); +++ __VOLK_PREFETCH(a_ptr+8); +++ __VOLK_PREFETCH(b_ptr+8); ++ ++ // use 2 accumulators to remove inter-instruction data dependencies ++ accumulator1.val[0] = vmlaq_f32(accumulator1.val[0], a_val.val[0], b_val.val[0]); ++diff --git a/kernels/volk/volk_32fc_x2_multiply_32fc.h b/kernels/volk/volk_32fc_x2_multiply_32fc.h ++index 1709140..0b9d3fe 100644 ++--- a/kernels/volk/volk_32fc_x2_multiply_32fc.h +++++ b/kernels/volk/volk_32fc_x2_multiply_32fc.h ++@@ -372,8 +372,8 @@ volk_32fc_x2_multiply_32fc_neon(lv_32fc_t* cVector, const lv_32fc_t* aVector, ++ for(number = 0; number < quarter_points; ++number) { ++ a_val = vld2q_f32((float*)a_ptr); // a0r|a1r|a2r|a3r || a0i|a1i|a2i|a3i ++ b_val = vld2q_f32((float*)b_ptr); // b0r|b1r|b2r|b3r || b0i|b1i|b2i|b3i ++- __builtin_prefetch(a_ptr+4); ++- __builtin_prefetch(b_ptr+4); +++ __VOLK_PREFETCH(a_ptr+4); +++ __VOLK_PREFETCH(b_ptr+4); ++ ++ // multiply the real*real and imag*imag to get real result ++ // a0r*b0r|a1r*b1r|a2r*b2r|a3r*b3r ++@@ -420,8 +420,8 @@ volk_32fc_x2_multiply_32fc_neon_opttests(lv_32fc_t* cVector, const lv_32fc_t* aV ++ for(number = 0; number < quarter_points; ++number) { ++ a_val = vld2q_f32((float*)a_ptr); // a0r|a1r|a2r|a3r || a0i|a1i|a2i|a3i ++ b_val = vld2q_f32((float*)b_ptr); // b0r|b1r|b2r|b3r || b0i|b1i|b2i|b3i ++- __builtin_prefetch(a_ptr+4); ++- __builtin_prefetch(b_ptr+4); +++ __VOLK_PREFETCH(a_ptr+4); +++ __VOLK_PREFETCH(b_ptr+4); ++ ++ // do the 
first multiply ++ tmp_imag.val[1] = vmulq_f32(a_val.val[1], b_val.val[0]); ++diff --git a/kernels/volk/volk_32fc_x2_multiply_conjugate_32fc.h b/kernels/volk/volk_32fc_x2_multiply_conjugate_32fc.h ++index 703c78d..c13a32e 100644 ++--- a/kernels/volk/volk_32fc_x2_multiply_conjugate_32fc.h +++++ b/kernels/volk/volk_32fc_x2_multiply_conjugate_32fc.h ++@@ -262,8 +262,8 @@ volk_32fc_x2_multiply_conjugate_32fc_neon(lv_32fc_t* cVector, const lv_32fc_t* a ++ a_val = vld2q_f32((float*)a_ptr); // a0r|a1r|a2r|a3r || a0i|a1i|a2i|a3i ++ b_val = vld2q_f32((float*)b_ptr); // b0r|b1r|b2r|b3r || b0i|b1i|b2i|b3i ++ b_val.val[1] = vnegq_f32(b_val.val[1]); ++- __builtin_prefetch(a_ptr+4); ++- __builtin_prefetch(b_ptr+4); +++ __VOLK_PREFETCH(a_ptr+4); +++ __VOLK_PREFETCH(b_ptr+4); ++ ++ // multiply the real*real and imag*imag to get real result ++ // a0r*b0r|a1r*b1r|a2r*b2r|a3r*b3r ++-- ++2.11.0 ++ diff --cc debian/patches/0008-Fix-bug-106-volk_64u_popcnt-bug-in-generic-implement.patch index 0000000,0000000..01b77bb new file mode 100644 --- /dev/null +++ b/debian/patches/0008-Fix-bug-106-volk_64u_popcnt-bug-in-generic-implement.patch @@@ -1,0 -1,0 +1,26 @@@ ++From b0b9615e4e5d38c0d8d6bcc06ccefe08682ec352 Mon Sep 17 00:00:00 2001 ++From: Nick Foster ++Date: Fri, 20 Jan 2017 16:36:01 -0800 ++Subject: [PATCH 08/18] Fix bug 106 (volk_64u_popcnt bug in generic ++ implementation) ++ ++--- ++ kernels/volk/volk_64u_popcnt.h | 2 +- ++ 1 file changed, 1 insertion(+), 1 deletion(-) ++ ++diff --git a/kernels/volk/volk_64u_popcnt.h b/kernels/volk/volk_64u_popcnt.h ++index 653bfb9..cbce2ec 100644 ++--- a/kernels/volk/volk_64u_popcnt.h +++++ b/kernels/volk/volk_64u_popcnt.h ++@@ -84,7 +84,7 @@ volk_64u_popcnt_generic(uint64_t* ret, const uint64_t value) ++ uint64_t retVal64 = retVal; ++ ++ //retVal = valueVector[1]; ++- retVal = (uint32_t)((value & 0xFFFFFFFF00000000ull) >> 31); +++ retVal = (uint32_t)((value & 0xFFFFFFFF00000000ull) >> 32); ++ retVal = (retVal & 0x55555555) + (retVal >> 1 & 
0x55555555); ++ retVal = (retVal & 0x33333333) + (retVal >> 2 & 0x33333333); ++ retVal = (retVal + (retVal >> 4)) & 0x0F0F0F0F; ++-- ++2.11.0 ++ diff --cc debian/patches/0009-modtool-deconflict-module-include-guards-from-main-v.patch index 0000000,0000000..b0beaca new file mode 100644 --- /dev/null +++ b/debian/patches/0009-modtool-deconflict-module-include-guards-from-main-v.patch @@@ -1,0 -1,0 +1,39 @@@ ++From 5af8aa45fa23f72aff8593f54e7b67e449927681 Mon Sep 17 00:00:00 2001 ++From: Nathan West ++Date: Mon, 13 Mar 2017 12:25:35 -0400 ++Subject: [PATCH 09/18] modtool: deconflict module include guards from main ++ volk ++ ++--- ++ python/volk_modtool/volk_modtool_generate.py | 8 ++++---- ++ 1 file changed, 4 insertions(+), 4 deletions(-) ++ ++diff --git a/python/volk_modtool/volk_modtool_generate.py b/python/volk_modtool/volk_modtool_generate.py ++index 83e0d26..6040a7d 100644 ++--- a/python/volk_modtool/volk_modtool_generate.py +++++ b/python/volk_modtool/volk_modtool_generate.py ++@@ -98,6 +98,9 @@ class volk_modtool: ++ os.makedirs(os.path.join(self.my_dict['destination'], 'volk_' + self.my_dict['name'], 'kernels/volk_' + self.my_dict['name'])) ++ ++ current_kernel_names = self.get_current_kernels() +++ need_ifdef_updates = ["constant.h", "volk_complex.h", "volk_malloc.h", "volk_prefs.h", +++ "volk_common.h", "volk_cpu.tmpl.h", "volk_config_fixed.tmpl.h", +++ "volk_typedefs.h", "volk.tmpl.h"] ++ for root, dirnames, filenames in os.walk(self.my_dict['base']): ++ for name in filenames: ++ t_table = map(lambda a: re.search(a, name), current_kernel_names) ++@@ -107,10 +110,7 @@ class volk_modtool: ++ instring = open(infile, 'r').read() ++ outstring = re.sub(self.volk, 'volk_' + self.my_dict['name'], instring) ++ # Update the header ifdef guards only where needed ++- if((name == "constants.h") or ++- (name == "volk_complex.h") or ++- (name == "volk_malloc.h") or ++- (name == "volk_prefs.h")): +++ if name in need_ifdef_updates: ++ outstring = 
re.sub(self.volk_included, 'INCLUDED_VOLK_' + self.my_dict['name'].upper(), outstring) ++ newname = re.sub(self.volk, 'volk_' + self.my_dict['name'], name) ++ relpath = os.path.relpath(infile, self.my_dict['base']) ++-- ++2.11.0 ++ diff --cc debian/patches/0010-modtool-update-the-cmake-find-module-for-volk-mods.patch index 0000000,0000000..7553987 new file mode 100644 --- /dev/null +++ b/debian/patches/0010-modtool-update-the-cmake-find-module-for-volk-mods.patch @@@ -1,0 -1,0 +1,27 @@@ ++From 663dbd00b3e4bd3ddb0b7f8a9360df132d7f0d56 Mon Sep 17 00:00:00 2001 ++From: Nathan West ++Date: Mon, 13 Mar 2017 12:37:18 -0400 ++Subject: [PATCH 10/18] modtool: update the cmake find module for volk mods ++ ++--- ++ python/volk_modtool/volk_modtool_generate.py | 4 ++++ ++ 1 file changed, 4 insertions(+) ++ ++diff --git a/python/volk_modtool/volk_modtool_generate.py b/python/volk_modtool/volk_modtool_generate.py ++index 6040a7d..75232ed 100644 ++--- a/python/volk_modtool/volk_modtool_generate.py +++++ b/python/volk_modtool/volk_modtool_generate.py ++@@ -113,6 +113,10 @@ class volk_modtool: ++ if name in need_ifdef_updates: ++ outstring = re.sub(self.volk_included, 'INCLUDED_VOLK_' + self.my_dict['name'].upper(), outstring) ++ newname = re.sub(self.volk, 'volk_' + self.my_dict['name'], name) +++ if name == 'VolkConfig.cmake.in': +++ outstring = re.sub("VOLK", 'VOLK_' + self.my_dict['name'].upper(), outstring) +++ newname = "Volk%sConfig.cmake.in" % self.my_dict['name'] +++ ++ relpath = os.path.relpath(infile, self.my_dict['base']) ++ newrelpath = re.sub(self.volk, 'volk_' + self.my_dict['name'], relpath) ++ dest = os.path.join(self.my_dict['destination'], 'volk_' + self.my_dict['name'], os.path.dirname(newrelpath), newname) ++-- ++2.11.0 ++ diff --cc debian/patches/0011-Use-powf-to-match-variables-and-avoid-implicit-type-.patch index 0000000,0000000..4f3b849 new file mode 100644 --- /dev/null +++ b/debian/patches/0011-Use-powf-to-match-variables-and-avoid-implicit-type-.patch 
@@@ -1,0 -1,0 +1,44 @@@ ++From 28b03a9a338dc24b002413e880222fe1d49f77f5 Mon Sep 17 00:00:00 2001 ++From: Michael Dickens ++Date: Sat, 1 Apr 2017 15:24:46 -0400 ++Subject: [PATCH 11/18] Use 'powf' to match variables and avoid implicit type ++ converstion. Makes some older compilers happy, allowing 'make test' to pass. ++ ++--- ++ kernels/volk/volk_32f_x2_pow_32f.h | 6 +++--- ++ 1 file changed, 3 insertions(+), 3 deletions(-) ++ ++diff --git a/kernels/volk/volk_32f_x2_pow_32f.h b/kernels/volk/volk_32f_x2_pow_32f.h ++index 58fecb6..a8cb2e1 100644 ++--- a/kernels/volk/volk_32f_x2_pow_32f.h +++++ b/kernels/volk/volk_32f_x2_pow_32f.h ++@@ -190,7 +190,7 @@ volk_32f_x2_pow_32f_a_sse4_1(float* cVector, const float* bVector, ++ ++ number = quarterPoints * 4; ++ for(;number < num_points; number++){ ++- *cPtr++ = pow(*aPtr++, *bPtr++); +++ *cPtr++ = powf(*aPtr++, *bPtr++); ++ } ++ } ++ ++@@ -215,7 +215,7 @@ volk_32f_x2_pow_32f_generic(float* cVector, const float* bVector, ++ unsigned int number = 0; ++ ++ for(number = 0; number < num_points; number++){ ++- *cPtr++ = pow(*aPtr++, *bPtr++); +++ *cPtr++ = powf(*aPtr++, *bPtr++); ++ } ++ } ++ #endif /* LV_HAVE_GENERIC */ ++@@ -326,7 +326,7 @@ volk_32f_x2_pow_32f_u_sse4_1(float* cVector, const float* bVector, ++ ++ number = quarterPoints * 4; ++ for(;number < num_points; number++){ ++- *cPtr++ = pow(*aPtr++, *bPtr++); +++ *cPtr++ = powf(*aPtr++, *bPtr++); ++ } ++ } ++ ++-- ++2.11.0 ++ diff --cc debian/patches/0012-cmake-support-empty-CMAKE_INSTALL_PREFIX.patch index 0000000,0000000..4680ed4 new file mode 100644 --- /dev/null +++ b/debian/patches/0012-cmake-support-empty-CMAKE_INSTALL_PREFIX.patch @@@ -1,0 -1,0 +1,26 @@@ ++From 67202d7b46f9ce55625d0ce5c3a2d98dff56b09a Mon Sep 17 00:00:00 2001 ++From: Josh Blum ++Date: Wed, 5 Oct 2016 14:09:05 -0700 ++Subject: [PATCH 12/18] cmake: support empty CMAKE_INSTALL_PREFIX ++ ++Needed quotes for the string escape command ++--- ++ lib/CMakeLists.txt | 2 +- ++ 1 file changed, 1 insertion(+), 1 
deletion(-) ++ ++diff --git a/lib/CMakeLists.txt b/lib/CMakeLists.txt ++index ad5653c..45b6c51 100644 ++--- a/lib/CMakeLists.txt +++++ b/lib/CMakeLists.txt ++@@ -489,7 +489,7 @@ endif() ++ message(STATUS "Loading version ${VERSION} into constants...") ++ ++ #double escape for windows backslash path separators ++-string(REPLACE "\\" "\\\\" prefix ${prefix}) +++string(REPLACE "\\" "\\\\" prefix "${prefix}") ++ ++ configure_file( ++ ${CMAKE_CURRENT_SOURCE_DIR}/constants.c.in ++-- ++2.11.0 ++ diff --cc debian/patches/0013-Support-relocated-install-with-VOLK_PREFIX-env-var.patch index 0000000,0000000..d65686d new file mode 100644 --- /dev/null +++ b/debian/patches/0013-Support-relocated-install-with-VOLK_PREFIX-env-var.patch @@@ -1,0 -1,0 +1,35 @@@ ++From 0d1065854848494f211c990ed26267565cc44647 Mon Sep 17 00:00:00 2001 ++From: Josh Blum ++Date: Thu, 6 Oct 2016 15:06:09 -0700 ++Subject: [PATCH 13/18] Support relocated install with VOLK_PREFIX env var ++ ++Some packaging systems such as snaps will install ++the volk library to a dynamically chosen location. ++The install script can set an evironment variable ++so that the library reports the correct prefix. 
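As a rough illustration of the lookup this commit message describes, the sketch below checks the `VOLK_PREFIX` environment variable first and falls back to a compiled-in default. The names `sketch_prefix` and `SKETCH_COMPILED_PREFIX` are hypothetical and the `/usr` default is an assumption standing in for the configured `@prefix@`. It is not the library's own code.

```c
#include <stdlib.h>

/* Assumed stand-in for the prefix substituted at configure time (@prefix@). */
#define SKETCH_COMPILED_PREFIX "/usr"

/* Hypothetical helper: prefer the environment override, else the built-in value. */
static const char *sketch_prefix(void)
{
    const char *prefix = getenv("VOLK_PREFIX");
    if (prefix != NULL) {
        return prefix;              /* relocated install, e.g. set by an install script */
    }
    return SKETCH_COMPILED_PREFIX;  /* normal install */
}
```

A packaging script for a relocated install would export `VOLK_PREFIX` before starting the application. With the variable unset, the behaviour is the same as before the change.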
++--- ++ lib/constants.c.in | 3 +++ ++ 1 file changed, 3 insertions(+) ++ ++diff --git a/lib/constants.c.in b/lib/constants.c.in ++index 3839f53..a81c7cb 100644 ++--- a/lib/constants.c.in +++++ b/lib/constants.c.in ++@@ -24,11 +24,14 @@ ++ #include ++ #endif ++ +++#include ++ #include ++ ++ char* ++ volk_prefix() ++ { +++ const char *prefix = getenv("VOLK_PREFIX"); +++ if (prefix != NULL) return (char *)prefix; ++ return "@prefix@"; ++ } ++ ++-- ++2.11.0 ++ diff --cc debian/patches/0014-Fixing-a-minimal-bug-in-the-log2-docstring.patch index 0000000,0000000..b14cebb new file mode 100644 --- /dev/null +++ b/debian/patches/0014-Fixing-a-minimal-bug-in-the-log2-docstring.patch @@@ -1,0 -1,0 +1,25 @@@ ++From ee70be38a66beb5eb236a3ffb3fc147a5d053979 Mon Sep 17 00:00:00 2001 ++From: =?UTF-8?q?Marcus=20M=C3=BCller?= ++Date: Mon, 20 Nov 2017 15:12:06 +0100 ++Subject: [PATCH 14/18] Fixing a minimal bug in the log2 docstring ++ ++--- ++ kernels/volk/volk_32f_log2_32f.h | 2 +- ++ 1 file changed, 1 insertion(+), 1 deletion(-) ++ ++diff --git a/kernels/volk/volk_32f_log2_32f.h b/kernels/volk/volk_32f_log2_32f.h ++index 6704129..c3bfeaa 100644 ++--- a/kernels/volk/volk_32f_log2_32f.h +++++ b/kernels/volk/volk_32f_log2_32f.h ++@@ -62,7 +62,7 @@ ++ * \li num_points: The number of data points. ++ * ++ * \b Outputs ++- * \li cVector: The output vector. +++ * \li bVector: The output vector. 
++ * ++ * \b Example ++ * \code ++-- ++2.11.0 ++ diff --cc debian/patches/0015-kernel-Adds-unaligned-protokernles-to-32f_x2_s32f_in.patch index 0000000,0000000..1fc53d4 new file mode 100644 --- /dev/null +++ b/debian/patches/0015-kernel-Adds-unaligned-protokernles-to-32f_x2_s32f_in.patch @@@ -1,0 -1,0 +1,140 @@@ ++From 82a88672d80ef7652548182e819e726874e0adc0 Mon Sep 17 00:00:00 2001 ++From: Damian Miralles ++Date: Wed, 13 Dec 2017 13:27:17 -0700 ++Subject: [PATCH 15/18] kernel: Adds unaligned protokernles to ++ `32f_x2_s32f_interleave_16ic` and `32f_x2_subtract_32f` ++ ++Adds unaligned versions to the afore mentioned kernels, relative speeds ++improvements shown in both cases. ++--- ++ kernels/volk/volk_32f_x2_s32f_interleave_16ic.h | 63 +++++++++++++++++++++++++ ++ kernels/volk/volk_32f_x2_subtract_32f.h | 45 ++++++++++++++++++ ++ 2 files changed, 108 insertions(+) ++ ++diff --git a/kernels/volk/volk_32f_x2_s32f_interleave_16ic.h b/kernels/volk/volk_32f_x2_s32f_interleave_16ic.h ++index 99f1b5e..20f66ff 100644 ++--- a/kernels/volk/volk_32f_x2_s32f_interleave_16ic.h +++++ b/kernels/volk/volk_32f_x2_s32f_interleave_16ic.h ++@@ -214,3 +214,66 @@ volk_32f_x2_s32f_interleave_16ic_generic(lv_16sc_t* complexVector, const float* ++ ++ ++ #endif /* INCLUDED_volk_32f_x2_s32f_interleave_16ic_a_H */ +++ +++#ifndef INCLUDED_volk_32f_x2_s32f_interleave_16ic_u_H +++#define INCLUDED_volk_32f_x2_s32f_interleave_16ic_u_H +++ +++#include +++#include +++#include +++ +++#ifdef LV_HAVE_AVX2 +++#include +++ +++static inline void +++volk_32f_x2_s32f_interleave_16ic_u_avx2(lv_16sc_t* complexVector, const float* iBuffer, +++ const float* qBuffer, const float scalar, unsigned int num_points) +++{ +++ unsigned int number = 0; +++ const float* iBufferPtr = iBuffer; +++ const float* qBufferPtr = qBuffer; +++ +++ __m256 vScalar = _mm256_set1_ps(scalar); +++ +++ const unsigned int eighthPoints = num_points / 8; +++ +++ __m256 iValue, qValue, cplxValue1, cplxValue2; +++ __m256i intValue1, 
intValue2; +++ +++ int16_t* complexVectorPtr = (int16_t*)complexVector; +++ +++ for(;number < eighthPoints; number++){ +++ iValue = _mm256_loadu_ps(iBufferPtr); +++ qValue = _mm256_loadu_ps(qBufferPtr); +++ +++ // Interleaves the lower two values in the i and q variables into one buffer +++ cplxValue1 = _mm256_unpacklo_ps(iValue, qValue); +++ cplxValue1 = _mm256_mul_ps(cplxValue1, vScalar); +++ +++ // Interleaves the upper two values in the i and q variables into one buffer +++ cplxValue2 = _mm256_unpackhi_ps(iValue, qValue); +++ cplxValue2 = _mm256_mul_ps(cplxValue2, vScalar); +++ +++ intValue1 = _mm256_cvtps_epi32(cplxValue1); +++ intValue2 = _mm256_cvtps_epi32(cplxValue2); +++ +++ intValue1 = _mm256_packs_epi32(intValue1, intValue2); +++ +++ _mm256_storeu_si256((__m256i*)complexVectorPtr, intValue1); +++ complexVectorPtr += 16; +++ +++ iBufferPtr += 8; +++ qBufferPtr += 8; +++ } +++ +++ number = eighthPoints * 8; +++ complexVectorPtr = (int16_t*)(&complexVector[number]); +++ for(; number < num_points; number++){ +++ *complexVectorPtr++ = (int16_t)(*iBufferPtr++ * scalar); +++ *complexVectorPtr++ = (int16_t)(*qBufferPtr++ * scalar); +++ } +++} +++#endif /* LV_HAVE_AVX2 */ +++ +++ +++#endif /* INCLUDED_volk_32f_x2_s32f_interleave_16ic_u_H */ ++diff --git a/kernels/volk/volk_32f_x2_subtract_32f.h b/kernels/volk/volk_32f_x2_subtract_32f.h ++index 4a452fd..b7f36cf 100644 ++--- a/kernels/volk/volk_32f_x2_subtract_32f.h +++++ b/kernels/volk/volk_32f_x2_subtract_32f.h ++@@ -176,3 +176,48 @@ volk_32f_x2_subtract_32f_u_orc(float* cVector, const float* aVector, ++ ++ ++ #endif /* INCLUDED_volk_32f_x2_subtract_32f_a_H */ +++ +++ +++#ifndef INCLUDED_volk_32f_x2_subtract_32f_u_H +++#define INCLUDED_volk_32f_x2_subtract_32f_u_H +++ +++#include +++#include +++ +++#ifdef LV_HAVE_AVX +++#include +++ +++static inline void +++volk_32f_x2_subtract_32f_u_avx(float* cVector, const float* aVector, +++ const float* bVector, unsigned int num_points) +++{ +++ unsigned int number = 0; +++ 
const unsigned int eighthPoints = num_points / 8; +++ +++ float* cPtr = cVector; +++ const float* aPtr = aVector; +++ const float* bPtr = bVector; +++ +++ __m256 aVal, bVal, cVal; +++ for(;number < eighthPoints; number++){ +++ +++ aVal = _mm256_loadu_ps(aPtr); +++ bVal = _mm256_loadu_ps(bPtr); +++ +++ cVal = _mm256_sub_ps(aVal, bVal); +++ +++ _mm256_storeu_ps(cPtr,cVal); // Store the results back into the C container +++ +++ aPtr += 8; +++ bPtr += 8; +++ cPtr += 8; +++ } +++ +++ number = eighthPoints * 8; +++ for(;number < num_points; number++){ +++ *cPtr++ = (*aPtr++) - (*bPtr++); +++ } +++} +++#endif /* LV_HAVE_AVX */ +++ +++#endif /* INCLUDED_volk_32f_x2_subtract_32f_u_H */ ++-- ++2.11.0 ++ diff --cc debian/patches/0016-kernels-Adds-AVX-support-to-volk_32f_-kernels.patch index 0000000,0000000..5324a5c new file mode 100644 --- /dev/null +++ b/debian/patches/0016-kernels-Adds-AVX-support-to-volk_32f_-kernels.patch @@@ -1,0 -1,0 +1,422 @@@ ++From 940489f72b2c80f6b5dc514401773bf67a992f23 Mon Sep 17 00:00:00 2001 ++From: Damian Miralles ++Date: Fri, 15 Dec 2017 23:05:58 -0700 ++Subject: [PATCH 16/18] kernels: Adds AVX support to `volk_32f_*` kernels ++ ++Adds AVX support to `volk_32f_s32f_normalize`,`volk_32f_s32f_stddev_32f`, ++`volk_32f_sqrt_32f`, `volk_32f_x2_max_32f` and `volk_32f_x2_min_32f`. ++Some speed improvements can be seen with the new protokernel addition. 
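The AVX protokernels in this patch share one shape: a main loop that handles eight floats per iteration with 256-bit loads and stores, followed by a scalar loop for the remaining `num_points % 8` elements. A minimal sketch of that shape for the unaligned maximum case follows. It assumes `sketch_max_32f` as a hypothetical name, and it guards the intrinsics with the compiler's `__AVX__` macro rather than VOLK's `LV_HAVE_AVX`, so it still compiles (scalar only) without AVX enabled.

```c
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical sketch, not part of VOLK: c[i] = max(a[i], b[i]). */
static void sketch_max_32f(float *c, const float *a, const float *b,
                           unsigned int num_points)
{
    unsigned int number = 0;

#if defined(__AVX__)
    /* Vector part: eight floats per iteration, unaligned loads and stores. */
    const unsigned int eighth_points = num_points / 8;
    for (; number < eighth_points; number++) {
        __m256 a_val = _mm256_loadu_ps(a);
        __m256 b_val = _mm256_loadu_ps(b);
        _mm256_storeu_ps(c, _mm256_max_ps(a_val, b_val));
        a += 8;
        b += 8;
        c += 8;
    }
    /* Convert the iteration count back into an element index for the tail. */
    number = eighth_points * 8;
#endif

    /* Scalar tail (or the whole buffer when AVX is not enabled). */
    for (; number < num_points; number++) {
        const float x = *a++;
        const float y = *b++;
        *c++ = (x > y ? x : y);
    }
}
```

The aligned (`_a_`) variants in the patch differ only in using `_mm256_load_ps` and `_mm256_store_ps`, which require 32-byte-aligned buffers.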
++--- ++ kernels/volk/volk_32f_s32f_normalize.h | 74 ++++++++++++++++++++++++++++- ++ kernels/volk/volk_32f_s32f_stddev_32f.h | 59 +++++++++++++++++++++++ ++ kernels/volk/volk_32f_sqrt_32f.h | 33 +++++++++++++ ++ kernels/volk/volk_32f_x2_max_32f.h | 84 +++++++++++++++++++++++++++++++++ ++ kernels/volk/volk_32f_x2_min_32f.h | 84 +++++++++++++++++++++++++++++++++ ++ 5 files changed, 333 insertions(+), 1 deletion(-) ++ ++diff --git a/kernels/volk/volk_32f_s32f_normalize.h b/kernels/volk/volk_32f_s32f_normalize.h ++index 52bf006..17d9da9 100644 ++--- a/kernels/volk/volk_32f_s32f_normalize.h +++++ b/kernels/volk/volk_32f_s32f_normalize.h ++@@ -105,6 +105,39 @@ static inline void volk_32f_s32f_normalize_a_sse(float* vecBuffer, const float s ++ } ++ #endif /* LV_HAVE_SSE */ ++ +++ +++#ifdef LV_HAVE_AVX +++#include +++ +++static inline void volk_32f_s32f_normalize_a_avx(float* vecBuffer, const float scalar, unsigned int num_points){ +++ unsigned int number = 0; +++ float* inputPtr = vecBuffer; +++ +++ const float invScalar = 1.0 / scalar; +++ __m256 vecScalar = _mm256_set1_ps(invScalar); +++ __m256 input1; +++ +++ const uint64_t eigthPoints = num_points / 8; +++ for(;number < eigthPoints; number++){ +++ +++ input1 = _mm256_load_ps(inputPtr); +++ +++ input1 = _mm256_mul_ps(input1, vecScalar); +++ +++ _mm256_store_ps(inputPtr, input1); +++ +++ inputPtr += 8; +++ } +++ +++ number = eigthPoints*8; +++ for(; number < num_points; number++){ +++ *inputPtr *= invScalar; +++ inputPtr++; +++ } +++} +++#endif /* LV_HAVE_AVX */ +++ +++ ++ #ifdef LV_HAVE_GENERIC ++ ++ static inline void volk_32f_s32f_normalize_generic(float* vecBuffer, const float scalar, unsigned int num_points){ ++@@ -128,6 +161,45 @@ static inline void volk_32f_s32f_normalize_u_orc(float* vecBuffer, const float s ++ #endif /* LV_HAVE_GENERIC */ ++ ++ +++#endif /* INCLUDED_volk_32f_s32f_normalize_a_H */ ++ ++ ++-#endif /* INCLUDED_volk_32f_s32f_normalize_a_H */ +++#ifndef INCLUDED_volk_32f_s32f_normalize_u_H 
+++#define INCLUDED_volk_32f_s32f_normalize_u_H +++ +++#include +++#include +++ +++#ifdef LV_HAVE_AVX +++#include +++ +++static inline void volk_32f_s32f_normalize_u_avx(float* vecBuffer, const float scalar, unsigned int num_points){ +++ unsigned int number = 0; +++ float* inputPtr = vecBuffer; +++ +++ const float invScalar = 1.0 / scalar; +++ __m256 vecScalar = _mm256_set1_ps(invScalar); +++ __m256 input1; +++ +++ const uint64_t eigthPoints = num_points / 8; +++ for(;number < eigthPoints; number++){ +++ +++ input1 = _mm256_loadu_ps(inputPtr); +++ +++ input1 = _mm256_mul_ps(input1, vecScalar); +++ +++ _mm256_storeu_ps(inputPtr, input1); +++ +++ inputPtr += 8; +++ } +++ +++ number = eigthPoints*8; +++ for(; number < num_points; number++){ +++ *inputPtr *= invScalar; +++ inputPtr++; +++ } +++} +++#endif /* LV_HAVE_AVX */ +++ +++ +++#endif /* INCLUDED_volk_32f_s32f_normalize_u_H */ ++diff --git a/kernels/volk/volk_32f_s32f_stddev_32f.h b/kernels/volk/volk_32f_s32f_stddev_32f.h ++index 30f0ed6..f97a783 100644 ++--- a/kernels/volk/volk_32f_s32f_stddev_32f.h +++++ b/kernels/volk/volk_32f_s32f_stddev_32f.h ++@@ -132,6 +132,65 @@ volk_32f_s32f_stddev_32f_a_sse4_1(float* stddev, const float* inputBuffer, ++ #endif /* LV_HAVE_SSE4_1 */ ++ ++ +++#ifdef LV_HAVE_AVX +++#include +++ +++static inline void +++volk_32f_s32f_stddev_32f_a_avx(float* stddev, const float* inputBuffer, +++ const float mean, unsigned int num_points) +++{ +++ float returnValue = 0; +++ if(num_points > 0){ +++ unsigned int number = 0; +++ const unsigned int thirtySecondPoints = num_points / 32; +++ +++ const float* aPtr = inputBuffer; +++ +++ __VOLK_ATTR_ALIGNED(32) float squareBuffer[8]; +++ +++ __m256 squareAccumulator = _mm256_setzero_ps(); +++ __m256 aVal1, aVal2, aVal3, aVal4; +++ __m256 cVal1, cVal2, cVal3, cVal4; +++ for(;number < thirtySecondPoints; number++) { +++ aVal1 = _mm256_load_ps(aPtr); aPtr += 8; +++ cVal1 = _mm256_dp_ps(aVal1, aVal1, 0xF1); +++ +++ aVal2 = _mm256_load_ps(aPtr); aPtr += 8; 
+++ cVal2 = _mm256_dp_ps(aVal2, aVal2, 0xF2); +++ +++ aVal3 = _mm256_load_ps(aPtr); aPtr += 8; +++ cVal3 = _mm256_dp_ps(aVal3, aVal3, 0xF4); +++ +++ aVal4 = _mm256_load_ps(aPtr); aPtr += 8; +++ cVal4 = _mm256_dp_ps(aVal4, aVal4, 0xF8); +++ +++ cVal1 = _mm256_or_ps(cVal1, cVal2); +++ cVal3 = _mm256_or_ps(cVal3, cVal4); +++ cVal1 = _mm256_or_ps(cVal1, cVal3); +++ +++ squareAccumulator = _mm256_add_ps(squareAccumulator, cVal1); // squareAccumulator += x^2 +++ } +++ _mm256_store_ps(squareBuffer,squareAccumulator); // Store the results back into the C container +++ returnValue = squareBuffer[0]; returnValue += squareBuffer[1]; +++ returnValue += squareBuffer[2]; returnValue += squareBuffer[3]; +++ returnValue += squareBuffer[4]; returnValue += squareBuffer[5]; +++ returnValue += squareBuffer[6]; returnValue += squareBuffer[7]; +++ +++ number = thirtySecondPoints * 32; +++ for(;number < num_points; number++){ +++ returnValue += (*aPtr) * (*aPtr); +++ aPtr++; +++ } +++ returnValue /= num_points; +++ returnValue -= (mean * mean); +++ returnValue = sqrtf(returnValue); +++ } +++ *stddev = returnValue; +++} +++ +++#endif /* LV_HAVE_AVX */ +++ +++ ++ #ifdef LV_HAVE_SSE ++ #include ++ ++diff --git a/kernels/volk/volk_32f_sqrt_32f.h b/kernels/volk/volk_32f_sqrt_32f.h ++index a5851a0..174f8e3 100644 ++--- a/kernels/volk/volk_32f_sqrt_32f.h +++++ b/kernels/volk/volk_32f_sqrt_32f.h ++@@ -102,6 +102,39 @@ volk_32f_sqrt_32f_a_sse(float* cVector, const float* aVector, unsigned int num_p ++ #endif /* LV_HAVE_SSE */ ++ ++ +++#ifdef LV_HAVE_AVX +++#include +++ +++static inline void +++volk_32f_sqrt_32f_a_avx(float* cVector, const float* aVector, unsigned int num_points) +++{ +++ unsigned int number = 0; +++ const unsigned int eigthPoints = num_points / 8; +++ +++ float* cPtr = cVector; +++ const float* aPtr = aVector; +++ +++ __m256 aVal, cVal; +++ for(;number < eigthPoints; number++) { +++ aVal = _mm256_load_ps(aPtr); +++ +++ cVal = _mm256_sqrt_ps(aVal); +++ +++ 
_mm256_store_ps(cPtr,cVal); // Store the results back into the C container +++ +++ aPtr += 8; +++ cPtr += 8; +++ } +++ +++ number = eigthPoints * 8; +++ for(;number < num_points; number++) { +++ *cPtr++ = sqrtf(*aPtr++); +++ } +++} +++ +++#endif /* LV_HAVE_AVX */ +++ +++ ++ #ifdef LV_HAVE_NEON ++ #include ++ ++diff --git a/kernels/volk/volk_32f_x2_max_32f.h b/kernels/volk/volk_32f_x2_max_32f.h ++index 14747c2..1dc0f7d 100644 ++--- a/kernels/volk/volk_32f_x2_max_32f.h +++++ b/kernels/volk/volk_32f_x2_max_32f.h ++@@ -112,6 +112,44 @@ volk_32f_x2_max_32f_a_sse(float* cVector, const float* aVector, ++ #endif /* LV_HAVE_SSE */ ++ ++ +++#ifdef LV_HAVE_AVX +++#include +++ +++static inline void +++volk_32f_x2_max_32f_a_avx(float* cVector, const float* aVector, +++ const float* bVector, unsigned int num_points) +++{ +++ unsigned int number = 0; +++ const unsigned int eigthPoints = num_points / 8; +++ +++ float* cPtr = cVector; +++ const float* aPtr = aVector; +++ const float* bPtr= bVector; +++ +++ __m256 aVal, bVal, cVal; +++ for(;number < eigthPoints; number++){ +++ aVal = _mm256_load_ps(aPtr); +++ bVal = _mm256_load_ps(bPtr); +++ +++ cVal = _mm256_max_ps(aVal, bVal); +++ +++ _mm256_store_ps(cPtr,cVal); // Store the results back into the C container +++ +++ aPtr += 8; +++ bPtr += 8; +++ cPtr += 8; +++ } +++ +++ number = eigthPoints * 8; +++ for(;number < num_points; number++){ +++ const float a = *aPtr++; +++ const float b = *bPtr++; +++ *cPtr++ = ( a > b ? 
a : b); +++ } +++} +++#endif /* LV_HAVE_AVX */ +++ +++ ++ #ifdef LV_HAVE_NEON ++ #include ++ ++@@ -180,3 +218,49 @@ volk_32f_x2_max_32f_u_orc(float* cVector, const float* aVector, ++ ++ ++ #endif /* INCLUDED_volk_32f_x2_max_32f_a_H */ +++ +++ +++#ifndef INCLUDED_volk_32f_x2_max_32f_u_H +++#define INCLUDED_volk_32f_x2_max_32f_u_H +++ +++#include +++#include +++ +++#ifdef LV_HAVE_AVX +++#include +++ +++static inline void +++volk_32f_x2_max_32f_u_avx(float* cVector, const float* aVector, +++ const float* bVector, unsigned int num_points) +++{ +++ unsigned int number = 0; +++ const unsigned int eigthPoints = num_points / 8; +++ +++ float* cPtr = cVector; +++ const float* aPtr = aVector; +++ const float* bPtr= bVector; +++ +++ __m256 aVal, bVal, cVal; +++ for(;number < eigthPoints; number++){ +++ aVal = _mm256_loadu_ps(aPtr); +++ bVal = _mm256_loadu_ps(bPtr); +++ +++ cVal = _mm256_max_ps(aVal, bVal); +++ +++ _mm256_storeu_ps(cPtr,cVal); // Store the results back into the C container +++ +++ aPtr += 8; +++ bPtr += 8; +++ cPtr += 8; +++ } +++ +++ number = eigthPoints * 8; +++ for(;number < num_points; number++){ +++ const float a = *aPtr++; +++ const float b = *bPtr++; +++ *cPtr++ = ( a > b ? 
a : b); +++ } +++} +++#endif /* LV_HAVE_AVX */ +++ +++#endif /* INCLUDED_volk_32f_x2_max_32f_u_H */ ++diff --git a/kernels/volk/volk_32f_x2_min_32f.h b/kernels/volk/volk_32f_x2_min_32f.h ++index f3cbae1..3beb5fa 100644 ++--- a/kernels/volk/volk_32f_x2_min_32f.h +++++ b/kernels/volk/volk_32f_x2_min_32f.h ++@@ -112,6 +112,44 @@ volk_32f_x2_min_32f_a_sse(float* cVector, const float* aVector, ++ #endif /* LV_HAVE_SSE */ ++ ++ +++#ifdef LV_HAVE_AVX +++#include +++ +++static inline void +++volk_32f_x2_min_32f_a_avx(float* cVector, const float* aVector, +++ const float* bVector, unsigned int num_points) +++{ +++ unsigned int number = 0; +++ const unsigned int eigthPoints = num_points / 8; +++ +++ float* cPtr = cVector; +++ const float* aPtr = aVector; +++ const float* bPtr= bVector; +++ +++ __m256 aVal, bVal, cVal; +++ for(;number < eigthPoints; number++){ +++ aVal = _mm256_load_ps(aPtr); +++ bVal = _mm256_load_ps(bPtr); +++ +++ cVal = _mm256_min_ps(aVal, bVal); +++ +++ _mm256_store_ps(cPtr,cVal); // Store the results back into the C container +++ +++ aPtr += 8; +++ bPtr += 8; +++ cPtr += 8; +++ } +++ +++ number = eigthPoints * 8; +++ for(;number < num_points; number++){ +++ const float a = *aPtr++; +++ const float b = *bPtr++; +++ *cPtr++ = ( a < b ? 
a : b); +++ } +++} +++#endif /* LV_HAVE_AVX */ +++ +++ ++ #ifdef LV_HAVE_NEON ++ #include ++ ++@@ -183,3 +221,49 @@ volk_32f_x2_min_32f_u_orc(float* cVector, const float* aVector, ++ ++ ++ #endif /* INCLUDED_volk_32f_x2_min_32f_a_H */ +++ +++ +++#ifndef INCLUDED_volk_32f_x2_min_32f_u_H +++#define INCLUDED_volk_32f_x2_min_32f_u_H +++ +++#include +++#include +++ +++#ifdef LV_HAVE_AVX +++#include +++ +++static inline void +++volk_32f_x2_min_32f_u_avx(float* cVector, const float* aVector, +++ const float* bVector, unsigned int num_points) +++{ +++ unsigned int number = 0; +++ const unsigned int eigthPoints = num_points / 8; +++ +++ float* cPtr = cVector; +++ const float* aPtr = aVector; +++ const float* bPtr= bVector; +++ +++ __m256 aVal, bVal, cVal; +++ for(;number < eigthPoints; number++){ +++ aVal = _mm256_loadu_ps(aPtr); +++ bVal = _mm256_loadu_ps(bPtr); +++ +++ cVal = _mm256_min_ps(aVal, bVal); +++ +++ _mm256_storeu_ps(cPtr,cVal); // Store the results back into the C container +++ +++ aPtr += 8; +++ bPtr += 8; +++ cPtr += 8; +++ } +++ +++ number = eigthPoints * 8; +++ for(;number < num_points; number++){ +++ const float a = *aPtr++; +++ const float b = *bPtr++; +++ *cPtr++ = ( a < b ? a : b); +++ } +++} +++#endif /* LV_HAVE_AVX */ +++ +++#endif /* INCLUDED_volk_32f_x2_min_32f_u_H */ ++-- ++2.11.0 ++ diff --cc debian/patches/0017-kernels-Add-AVX-support-to-32f_x2_divide_32f-32f_x2_.patch index 0000000,0000000..2f49f7c new file mode 100644 --- /dev/null +++ b/debian/patches/0017-kernels-Add-AVX-support-to-32f_x2_divide_32f-32f_x2_.patch @@@ -1,0 -1,0 +1,271 @@@ ++From 0dd53d3ad8e24a833342b369743f274a15a66274 Mon Sep 17 00:00:00 2001 ++From: Damian Miralles ++Date: Wed, 20 Dec 2017 21:01:52 -0700 ++Subject: [PATCH 17/18] kernels: Add AVX support to ++ `32f_x2_divide_32f`,`32f_x2_dot_prod_16i` ++ ++Adds protokernels for AVX support. 
Modest speed improvements in some of ++the kernels, however, it seems to be related to the host architecture ++being used ++--- ++ kernels/volk/volk_32f_x2_divide_32f.h | 80 +++++++++++++++++ ++ kernels/volk/volk_32f_x2_dot_prod_16i.h | 148 ++++++++++++++++++++++++++++++++ ++ 2 files changed, 228 insertions(+) ++ ++diff --git a/kernels/volk/volk_32f_x2_divide_32f.h b/kernels/volk/volk_32f_x2_divide_32f.h ++index d724173..7cc34ca 100644 ++--- a/kernels/volk/volk_32f_x2_divide_32f.h +++++ b/kernels/volk/volk_32f_x2_divide_32f.h ++@@ -110,6 +110,42 @@ volk_32f_x2_divide_32f_a_sse(float* cVector, const float* aVector, ++ #endif /* LV_HAVE_SSE */ ++ ++ +++#ifdef LV_HAVE_AVX +++#include +++ +++static inline void +++volk_32f_x2_divide_32f_a_avx(float* cVector, const float* aVector, +++ const float* bVector, unsigned int num_points) +++{ +++ unsigned int number = 0; +++ const unsigned int eigthPoints = num_points / 8; +++ +++ float* cPtr = cVector; +++ const float* aPtr = aVector; +++ const float* bPtr= bVector; +++ +++ __m256 aVal, bVal, cVal; +++ for(;number < eigthPoints; number++){ +++ aVal = _mm256_load_ps(aPtr); +++ bVal = _mm256_load_ps(bPtr); +++ +++ cVal = _mm256_div_ps(aVal, bVal); +++ +++ _mm256_store_ps(cPtr,cVal); // Store the results back into the C container +++ +++ aPtr += 8; +++ bPtr += 8; +++ cPtr += 8; +++ } +++ +++ number = eigthPoints * 8; +++ for(;number < num_points; number++){ +++ *cPtr++ = (*aPtr++) / (*bPtr++); +++ } +++} +++#endif /* LV_HAVE_AVX */ +++ +++ ++ #ifdef LV_HAVE_GENERIC ++ ++ static inline void ++@@ -145,3 +181,47 @@ volk_32f_x2_divide_32f_u_orc(float* cVector, const float* aVector, ++ ++ ++ #endif /* INCLUDED_volk_32f_x2_divide_32f_a_H */ +++ +++ +++#ifndef INCLUDED_volk_32f_x2_divide_32f_u_H +++#define INCLUDED_volk_32f_x2_divide_32f_u_H +++ +++#include +++#include +++ +++#ifdef LV_HAVE_AVX +++#include +++ +++static inline void +++volk_32f_x2_divide_32f_u_avx(float* cVector, const float* aVector, +++ const float* bVector, unsigned 
int num_points) +++{ +++ unsigned int number = 0; +++ const unsigned int eigthPoints = num_points / 8; +++ +++ float* cPtr = cVector; +++ const float* aPtr = aVector; +++ const float* bPtr= bVector; +++ +++ __m256 aVal, bVal, cVal; +++ for(;number < eigthPoints; number++){ +++ aVal = _mm256_loadu_ps(aPtr); +++ bVal = _mm256_loadu_ps(bPtr); +++ +++ cVal = _mm256_div_ps(aVal, bVal); +++ +++ _mm256_storeu_ps(cPtr,cVal); // Store the results back into the C container +++ +++ aPtr += 8; +++ bPtr += 8; +++ cPtr += 8; +++ } +++ +++ number = eigthPoints * 8; +++ for(;number < num_points; number++){ +++ *cPtr++ = (*aPtr++) / (*bPtr++); +++ } +++} +++#endif /* LV_HAVE_AVX */ +++ +++#endif /* INCLUDED_volk_32f_x2_divide_32f_u_H */ ++diff --git a/kernels/volk/volk_32f_x2_dot_prod_16i.h b/kernels/volk/volk_32f_x2_dot_prod_16i.h ++index 15f01b7..a1279cf 100644 ++--- a/kernels/volk/volk_32f_x2_dot_prod_16i.h +++++ b/kernels/volk/volk_32f_x2_dot_prod_16i.h ++@@ -82,6 +82,154 @@ static inline void volk_32f_x2_dot_prod_16i_generic(int16_t* result, const float ++ #endif /*LV_HAVE_GENERIC*/ ++ ++ +++#ifdef LV_HAVE_AVX +++ +++static inline void volk_32f_x2_dot_prod_16i_a_avx(int16_t* result, const float* input, const float* taps, unsigned int num_points) { +++ +++ unsigned int number = 0; +++ const unsigned int thirtySecondPoints = num_points / 32; +++ +++ float dotProduct = 0; +++ const float* aPtr = input; +++ const float* bPtr = taps; +++ +++ __m256 a0Val, a1Val, a2Val, a3Val; +++ __m256 b0Val, b1Val, b2Val, b3Val; +++ __m256 c0Val, c1Val, c2Val, c3Val; +++ +++ __m256 dotProdVal0 = _mm256_setzero_ps(); +++ __m256 dotProdVal1 = _mm256_setzero_ps(); +++ __m256 dotProdVal2 = _mm256_setzero_ps(); +++ __m256 dotProdVal3 = _mm256_setzero_ps(); +++ +++ for(;number < thirtySecondPoints; number++){ +++ +++ a0Val = _mm256_load_ps(aPtr); +++ a1Val = _mm256_load_ps(aPtr+8); +++ a2Val = _mm256_load_ps(aPtr+16); +++ a3Val = _mm256_load_ps(aPtr+24); +++ +++ b0Val = _mm256_load_ps(bPtr); +++ b1Val 
= _mm256_load_ps(bPtr+8); +++ b2Val = _mm256_load_ps(bPtr+16); +++ b3Val = _mm256_load_ps(bPtr+24); +++ +++ c0Val = _mm256_mul_ps(a0Val, b0Val); +++ c1Val = _mm256_mul_ps(a1Val, b1Val); +++ c2Val = _mm256_mul_ps(a2Val, b2Val); +++ c3Val = _mm256_mul_ps(a3Val, b3Val); +++ +++ dotProdVal0 = _mm256_add_ps(c0Val, dotProdVal0); +++ dotProdVal1 = _mm256_add_ps(c1Val, dotProdVal1); +++ dotProdVal2 = _mm256_add_ps(c2Val, dotProdVal2); +++ dotProdVal3 = _mm256_add_ps(c3Val, dotProdVal3); +++ +++ aPtr += 32; +++ bPtr += 32; +++ } +++ +++ dotProdVal0 = _mm256_add_ps(dotProdVal0, dotProdVal1); +++ dotProdVal0 = _mm256_add_ps(dotProdVal0, dotProdVal2); +++ dotProdVal0 = _mm256_add_ps(dotProdVal0, dotProdVal3); +++ +++ __VOLK_ATTR_ALIGNED(32) float dotProductVector[8]; +++ +++ _mm256_store_ps(dotProductVector,dotProdVal0); // Store the results back into the dot product vector +++ +++ dotProduct = dotProductVector[0]; +++ dotProduct += dotProductVector[1]; +++ dotProduct += dotProductVector[2]; +++ dotProduct += dotProductVector[3]; +++ dotProduct += dotProductVector[4]; +++ dotProduct += dotProductVector[5]; +++ dotProduct += dotProductVector[6]; +++ dotProduct += dotProductVector[7]; +++ +++ number = thirtySecondPoints*32; +++ for(;number < num_points; number++){ +++ dotProduct += ((*aPtr++) * (*bPtr++)); +++ } +++ +++ *result = (short)dotProduct; +++} +++ +++#endif /*LV_HAVE_AVX*/ +++ +++ +++#ifdef LV_HAVE_AVX +++ +++static inline void volk_32f_x2_dot_prod_16i_u_avx(int16_t* result, const float* input, const float* taps, unsigned int num_points) { +++ +++ unsigned int number = 0; +++ const unsigned int thirtySecondPoints = num_points / 32; +++ +++ float dotProduct = 0; +++ const float* aPtr = input; +++ const float* bPtr = taps; +++ +++ __m256 a0Val, a1Val, a2Val, a3Val; +++ __m256 b0Val, b1Val, b2Val, b3Val; +++ __m256 c0Val, c1Val, c2Val, c3Val; +++ +++ __m256 dotProdVal0 = _mm256_setzero_ps(); +++ __m256 dotProdVal1 = _mm256_setzero_ps(); +++ __m256 dotProdVal2 = 
_mm256_setzero_ps(); +++ __m256 dotProdVal3 = _mm256_setzero_ps(); +++ +++ for(;number < thirtySecondPoints; number++){ +++ +++ a0Val = _mm256_loadu_ps(aPtr); +++ a1Val = _mm256_loadu_ps(aPtr+8); +++ a2Val = _mm256_loadu_ps(aPtr+16); +++ a3Val = _mm256_loadu_ps(aPtr+24); +++ +++ b0Val = _mm256_loadu_ps(bPtr); +++ b1Val = _mm256_loadu_ps(bPtr+8); +++ b2Val = _mm256_loadu_ps(bPtr+16); +++ b3Val = _mm256_loadu_ps(bPtr+24); +++ +++ c0Val = _mm256_mul_ps(a0Val, b0Val); +++ c1Val = _mm256_mul_ps(a1Val, b1Val); +++ c2Val = _mm256_mul_ps(a2Val, b2Val); +++ c3Val = _mm256_mul_ps(a3Val, b3Val); +++ +++ dotProdVal0 = _mm256_add_ps(c0Val, dotProdVal0); +++ dotProdVal1 = _mm256_add_ps(c1Val, dotProdVal1); +++ dotProdVal2 = _mm256_add_ps(c2Val, dotProdVal2); +++ dotProdVal3 = _mm256_add_ps(c3Val, dotProdVal3); +++ +++ aPtr += 32; +++ bPtr += 32; +++ } +++ +++ dotProdVal0 = _mm256_add_ps(dotProdVal0, dotProdVal1); +++ dotProdVal0 = _mm256_add_ps(dotProdVal0, dotProdVal2); +++ dotProdVal0 = _mm256_add_ps(dotProdVal0, dotProdVal3); +++ +++ __VOLK_ATTR_ALIGNED(32) float dotProductVector[8]; +++ +++ _mm256_storeu_ps(dotProductVector,dotProdVal0); // Store the results back into the dot product vector +++ +++ dotProduct = dotProductVector[0]; +++ dotProduct += dotProductVector[1]; +++ dotProduct += dotProductVector[2]; +++ dotProduct += dotProductVector[3]; +++ dotProduct += dotProductVector[4]; +++ dotProduct += dotProductVector[5]; +++ dotProduct += dotProductVector[6]; +++ dotProduct += dotProductVector[7]; +++ +++ number = thirtySecondPoints*32; +++ for(;number < num_points; number++){ +++ dotProduct += ((*aPtr++) * (*bPtr++)); +++ } +++ +++ *result = (short)dotProduct; +++} +++ +++#endif /*LV_HAVE_AVX*/ +++ +++ ++ #ifdef LV_HAVE_SSE ++ ++ static inline void volk_32f_x2_dot_prod_16i_a_sse(int16_t* result, const float* input, const float* taps, unsigned int num_points) { ++-- ++2.11.0 ++ diff --cc debian/patches/0018-fix-GH-issue-139-for-32fc_index_max_-kernels.patch index 
0000000,0000000..e6fcad1 new file mode 100644 --- /dev/null +++ b/debian/patches/0018-fix-GH-issue-139-for-32fc_index_max_-kernels.patch @@@ -1,0 -1,0 +1,42 @@@ ++From 0109b2ed06f907363d3ea5a05d24db4992e2d1a5 Mon Sep 17 00:00:00 2001 ++From: Nathan West ++Date: Tue, 23 Jan 2018 12:02:03 -0500 ++Subject: [PATCH 18/18] fix GH issue #139 for 32fc_index_max_* kernels ++ ++--- ++ kernels/volk/volk_32fc_index_max_16u.h | 3 +-- ++ kernels/volk/volk_32fc_index_max_32u.h | 2 +- ++ 2 files changed, 2 insertions(+), 3 deletions(-) ++ ++diff --git a/kernels/volk/volk_32fc_index_max_16u.h b/kernels/volk/volk_32fc_index_max_16u.h ++index c13196a..14b0d22 100644 ++--- a/kernels/volk/volk_32fc_index_max_16u.h +++++ b/kernels/volk/volk_32fc_index_max_16u.h ++@@ -115,10 +115,9 @@ volk_32fc_index_max_16u_a_sse3(uint16_t* target, lv_32fc_t* src0, ++ int i = 0; ++ ++ xmm8 = _mm_set_epi32(3, 2, 1, 0);//remember the crazy reverse order! ++- xmm9 = xmm8 = _mm_setzero_si128(); +++ xmm9 = _mm_setzero_si128(); ++ xmm10 = _mm_set_epi32(4, 4, 4, 4); ++ xmm3 = _mm_setzero_ps(); ++- ++ //printf("%f, %f, %f, %f\n", ((float*)&xmm10)[0], ((float*)&xmm10)[1], ((float*)&xmm10)[2], ((float*)&xmm10)[3]); ++ ++ for(; i < bound; ++i) { ++diff --git a/kernels/volk/volk_32fc_index_max_32u.h b/kernels/volk/volk_32fc_index_max_32u.h ++index ad794fb..5665582 100644 ++--- a/kernels/volk/volk_32fc_index_max_32u.h +++++ b/kernels/volk/volk_32fc_index_max_32u.h ++@@ -104,7 +104,7 @@ volk_32fc_index_max_32u_a_sse3(uint32_t* target, lv_32fc_t* src0, ++ int i = 0; ++ ++ xmm8 = _mm_set_epi32(3, 2, 1, 0);//remember the crazy reverse order! ++- xmm9 = xmm8 = _mm_setzero_si128(); +++ xmm9 = _mm_setzero_si128(); ++ xmm10 = _mm_set_epi32(4, 4, 4, 4); ++ xmm3 = _mm_setzero_ps(); ++ ++-- ++2.11.0 ++ diff --cc debian/patches/install-all-headers index 0000000,0000000..dfaed77 new file mode 100644 --- /dev/null +++ b/debian/patches/install-all-headers @@@ -1,0 -1,0 +1,33 @@@ ++From: A. 
Maitland Bottoms ++Subject: install all headers ++ ++(Along with some sorting) ++ ++--- a/CMakeLists.txt +++++ b/CMakeLists.txt ++@@ -158,17 +158,20 @@ ++ ) ++ ++ install(FILES ++- ${CMAKE_SOURCE_DIR}/include/volk/volk_prefs.h ++- ${CMAKE_SOURCE_DIR}/include/volk/volk_complex.h ++- ${CMAKE_SOURCE_DIR}/include/volk/volk_common.h +++ ${CMAKE_SOURCE_DIR}/include/volk/constants.h +++ ${CMAKE_SOURCE_DIR}/include/volk/saturation_arithmetic.h ++ ${CMAKE_SOURCE_DIR}/include/volk/volk_avx_intrinsics.h ++- ${CMAKE_SOURCE_DIR}/include/volk/volk_sse3_intrinsics.h +++ ${CMAKE_SOURCE_DIR}/include/volk/volk_common.h +++ ${CMAKE_SOURCE_DIR}/include/volk/volk_complex.h +++ ${CMAKE_SOURCE_DIR}/include/volk/volk_malloc.h ++ ${CMAKE_SOURCE_DIR}/include/volk/volk_neon_intrinsics.h +++ ${CMAKE_SOURCE_DIR}/include/volk/volk_prefs.h +++ ${CMAKE_SOURCE_DIR}/include/volk/volk_sse3_intrinsics.h +++ ${CMAKE_SOURCE_DIR}/include/volk/volk_sse_intrinsics.h ++ ${CMAKE_BINARY_DIR}/include/volk/volk.h ++ ${CMAKE_BINARY_DIR}/include/volk/volk_cpu.h ++ ${CMAKE_BINARY_DIR}/include/volk/volk_config_fixed.h ++ ${CMAKE_BINARY_DIR}/include/volk/volk_typedefs.h ++- ${CMAKE_SOURCE_DIR}/include/volk/volk_malloc.h ++ DESTINATION include/volk ++ COMPONENT "volk_devel" ++ ) diff --cc debian/patches/libm-link index 0000000,0000000..3387f36 new file mode 100644 --- /dev/null +++ b/debian/patches/libm-link @@@ -1,0 -1,0 +1,11 @@@ ++--- a/lib/CMakeLists.txt +++++ b/lib/CMakeLists.txt ++@@ -544,7 +544,7 @@ ++ ++ #Add dynamic library ++ add_library(volk SHARED $) ++- target_link_libraries(volk ${volk_libraries}) +++ target_link_libraries(volk ${volk_libraries} m) ++ ++ #Configure target properties ++ set_target_properties(volk_obj PROPERTIES COMPILE_FLAGS "-fPIC") diff --cc debian/patches/make-acc-happy index 0000000,0000000..9ca7cb3 new file mode 100644 --- /dev/null +++ b/debian/patches/make-acc-happy @@@ -1,0 -1,0 +1,53 @@@ ++From: A. 
Maitland Bottoms ++Subject: make acc happy ++ ++The abi-compliance-checker grabs all the .h files it finds ++and tries to compile them all. Even though some are not ++appropriate for the architecture being run on. Being careful ++with preprocessor protections avoids probplems. ++ ++--- a/kernels/volk/volk_32f_8u_polarbutterflypuppet_32f.h +++++ b/kernels/volk/volk_32f_8u_polarbutterflypuppet_32f.h ++@@ -31,6 +31,7 @@ ++ #include ++ #include ++ #include +++#include ++ ++ ++ static inline void ++--- a/include/volk/volk_neon_intrinsics.h +++++ b/include/volk/volk_neon_intrinsics.h ++@@ -27,6 +27,7 @@ ++ ++ #ifndef INCLUDE_VOLK_VOLK_NEON_INTRINSICS_H_ ++ #define INCLUDE_VOLK_VOLK_NEON_INTRINSICS_H_ +++#ifdef LV_HAVE_NEON ++ #include ++ ++ static inline float32x4_t ++@@ -119,4 +120,5 @@ ++ return log2_approx; ++ } ++ +++#endif /*LV_HAVE_NEON*/ ++ #endif /* INCLUDE_VOLK_VOLK_NEON_INTRINSICS_H_ */ ++--- a/kernels/volk/volk_8u_x2_encodeframepolar_8u.h +++++ b/kernels/volk/volk_8u_x2_encodeframepolar_8u.h ++@@ -58,8 +58,6 @@ ++ } ++ } ++ ++-#ifdef LV_HAVE_GENERIC ++- ++ static inline void ++ volk_8u_x2_encodeframepolar_8u_generic(unsigned char* frame, unsigned char* temp, ++ unsigned int frame_size) ++@@ -79,7 +77,6 @@ ++ --stage; ++ } ++ } ++-#endif /* LV_HAVE_GENERIC */ ++ ++ #ifdef LV_HAVE_SSSE3 ++ #include diff --cc debian/patches/native-armv7-build-support index 0000000,0000000..a0ce781 new file mode 100644 --- /dev/null +++ b/debian/patches/native-armv7-build-support @@@ -1,0 -1,0 +1,59 @@@ ++From: A. Maitland Bottoms ++Subject: native armv7 build support ++ ++Debian, unlike other GNU Radio deployments, does not cross-compile ++packages, but builds natively on a set of build machines, including ++both arm and armhf. 
++ ++--- a/lib/CMakeLists.txt +++++ b/lib/CMakeLists.txt ++@@ -250,6 +250,13 @@ ++ endif(NOT CPU_IS_x86) ++ ++ ######################################################################## +++# if building Debian armel, eliminate neon +++######################################################################## +++if(${CMAKE_LIBRARY_ARCHITECTURE} STREQUAL "arm-linux-gnueabi") +++ OVERRULE_ARCH(neon "Architecture is not armhf") +++endif(${CMAKE_LIBRARY_ARCHITECTURE} STREQUAL "arm-linux-gnueabi") +++ +++######################################################################## ++ # implement overruling in the ORC case, ++ # since ORC always passes flag detection ++ ######################################################################## ++@@ -414,7 +421,7 @@ ++ # Handle ASM support ++ # on by default, but let users turn it off ++ ######################################################################## ++-if(${CMAKE_VERSION} VERSION_GREATER "2.8.9") +++if((${CMAKE_VERSION} VERSION_GREATER "2.8.9") AND NOT (${CMAKE_LIBRARY_ARCHITECTURE} STREQUAL "arm-linux-gnueabi")) ++ set(ASM_ARCHS_AVAILABLE "neon") ++ ++ set(FULL_C_FLAGS "${CMAKE_C_FLAGS}" "${CMAKE_CXX_COMPILER_ARG1}") ++@@ -424,7 +431,7 @@ ++ # set up the assembler flags and include the source files ++ foreach(ARCH ${ASM_ARCHS_AVAILABLE}) ++ string(REGEX MATCH "${ARCH}" ASM_ARCH "${available_archs}") ++- if( ASM_ARCH STREQUAL "neon" ) +++ if(( ASM_ARCH STREQUAL "neon" ) OR ( ${CMAKE_SYSTEM_PROCESSOR} MATCHES "armv7")) ++ message(STATUS "---- Adding ASM files") # we always use ATT syntax ++ message(STATUS "-- Detected neon architecture; enabling ASM") ++ # setup architecture specific assembler flags ++@@ -443,7 +450,7 @@ ++ message(STATUS "asm flags: ${CMAKE_ASM_FLAGS}") ++ endforeach(ARCH) ++ ++-else(${CMAKE_VERSION} VERSION_GREATER "2.8.9") +++else((${CMAKE_VERSION} VERSION_GREATER "2.8.9") AND NOT (${CMAKE_LIBRARY_ARCHITECTURE} STREQUAL "arm-linux-gnueabi")) ++ message(STATUS "Not enabling ASM support. 
CMake >= 2.8.10 required.") ++ foreach(machine_name ${available_machines}) ++ string(REGEX MATCH "neon" NEON_MACHINE ${machine_name}) ++@@ -451,7 +458,7 @@ ++ message(FATAL_ERROR "CMake >= 2.8.10 is required for ARM NEON support") ++ endif() ++ endforeach() ++-endif(${CMAKE_VERSION} VERSION_GREATER "2.8.9") +++endif((${CMAKE_VERSION} VERSION_GREATER "2.8.9") AND NOT (${CMAKE_LIBRARY_ARCHITECTURE} STREQUAL "arm-linux-gnueabi")) ++ ++ ######################################################################## ++ # Handle orc support diff --cc debian/patches/series index 0000000,0000000..90dacbd new file mode 100644 --- /dev/null +++ b/debian/patches/series @@@ -1,0 -1,0 +1,22 @@@ ++0001-Add-a-AppVeyor-compatible-YAML-file-for-building-on-.patch ++0003-apps-fix-profile-update-reading-end-of-lines.patch ++0005-qa-lower-tolerance-for-32fc_mag-to-fix-issue-96.patch ++0006-Add-NEON-AVX-and-unaligned-versions-of-SSE4.1-and-SS.patch ++0007-added-__VOLK_PREFETCH-compatibility-macro.patch ++0008-Fix-bug-106-volk_64u_popcnt-bug-in-generic-implement.patch ++0009-modtool-deconflict-module-include-guards-from-main-v.patch ++0010-modtool-update-the-cmake-find-module-for-volk-mods.patch ++0011-Use-powf-to-match-variables-and-avoid-implicit-type-.patch ++0012-cmake-support-empty-CMAKE_INSTALL_PREFIX.patch ++0013-Support-relocated-install-with-VOLK_PREFIX-env-var.patch ++0014-Fixing-a-minimal-bug-in-the-log2-docstring.patch ++0015-kernel-Adds-unaligned-protokernles-to-32f_x2_s32f_in.patch ++0016-kernels-Adds-AVX-support-to-volk_32f_-kernels.patch ++0017-kernels-Add-AVX-support-to-32f_x2_divide_32f-32f_x2_.patch ++0018-fix-GH-issue-139-for-32fc_index_max_-kernels.patch ++native-armv7-build-support ++make-acc-happy ++sort-cmake-glob-lists ++install-all-headers ++sort-input-files.patch ++libm-link diff --cc debian/patches/sort-cmake-glob-lists index 0000000,0000000..dfb8699 new file mode 100644 --- /dev/null +++ b/debian/patches/sort-cmake-glob-lists @@@ -1,0 -1,0 +1,20 @@@ ++From: A. 
Maitland Bottoms ++Subject: sort cmake glob lists ++ ++File lists are generated in a CMakeLists.txt file with file(GLOB ...), which varies ++with the readdir() order. Sorting the lists should help make builds reproducible. ++ ++--- a/lib/CMakeLists.txt +++++ b/lib/CMakeLists.txt ++@@ -328,8 +328,11 @@ ++ ++ #dependencies are all python, xml, and header implementation files ++ file(GLOB xml_files ${PROJECT_SOURCE_DIR}/gen/*.xml) +++list(SORT xml_files) ++ file(GLOB py_files ${PROJECT_SOURCE_DIR}/gen/*.py) +++list(SORT py_files) ++ file(GLOB h_files ${PROJECT_SOURCE_DIR}/kernels/volk/*.h) +++list(SORT h_files) ++ ++ macro(gen_template tmpl output) ++ list(APPEND volk_gen_sources ${output}) diff --cc debian/patches/sort-input-files.patch index 0000000,0000000..42e53f0 new file mode 100644 --- /dev/null +++ b/debian/patches/sort-input-files.patch @@@ -1,0 -1,0 +1,51 @@@ ++From f6dbb5f8ba840075dde9f0aa1cc48b805ea4d1c5 Mon Sep 17 00:00:00 2001 ++From: "Bernhard M. Wiedemann" ++Date: Mon, 5 Jun 2017 21:37:38 +0200 ++Subject: [PATCH 2/5] sort input files ++ ++when building packages (e.g. for openSUSE Linux) ++(random) filesystem order of input files ++influences ordering of entries in the output, ++thus without the patch, builds (in disposable VMs) would usually differ. ++ ++See https://reproducible-builds.org/ for why this matters.
++--- ++ gen/volk_kernel_defs.py | 2 +- ++ python/volk_modtool/volk_modtool_generate.py | 6 +++--- ++ 2 files changed, 4 insertions(+), 4 deletions(-) ++ ++--- a/gen/volk_kernel_defs.py +++++ b/gen/volk_kernel_defs.py ++@@ -202,7 +202,7 @@ ++ ######################################################################## ++ __file__ = os.path.abspath(__file__) ++ srcdir = os.path.dirname(os.path.dirname(__file__)) ++-kernel_files = glob.glob(os.path.join(srcdir, "kernels", "volk", "*.h")) +++kernel_files = sorted(glob.glob(os.path.join(srcdir, "kernels", "volk", "*.h"))) ++ kernels = map(kernel_class, kernel_files) ++ ++ if __name__ == '__main__': ++--- a/python/volk_modtool/volk_modtool_generate.py +++++ b/python/volk_modtool/volk_modtool_generate.py ++@@ -58,10 +58,10 @@ ++ else: ++ name = self.get_basename(base) ++ if name == '': ++- hdr_files = glob.glob(os.path.join(base, "kernels/volk/*.h")) +++ hdr_files = sorted(glob.glob(os.path.join(base, "kernels/volk/*.h"))) ++ begins = re.compile("(?<=volk_).*") ++ else: ++- hdr_files = glob.glob(os.path.join(base, "kernels/volk_" + name + "/*.h")) +++ hdr_files = sorted(glob.glob(os.path.join(base, "kernels/volk_" + name + "/*.h"))) ++ begins = re.compile("(?<=volk_" + name + "_).*") ++ ++ datatypes = [] ++@@ -156,7 +156,7 @@ ++ open(dest, 'w+').write(outstring) ++ ++ # copy orc proto-kernels if they exist ++- for orcfile in glob.glob(inpath + '/kernels/volk/asm/orc/' + top + name + '*.orc'): +++ for orcfile in sorted(glob.glob(inpath + '/kernels/volk/asm/orc/' + top + name + '*.orc')): ++ if os.path.isfile(orcfile): ++ instring = open(orcfile, 'r').read() ++ outstring = re.sub(oldvolk, 'volk_' + self.my_dict['name'], instring) diff --cc debian/rules index 0000000,0000000..d58ca4b new file mode 100755 --- /dev/null +++ b/debian/rules @@@ -1,0 -1,0 +1,17 @@@ ++#!/usr/bin/make -f ++DEB_HOST_MULTIARCH ?= $(shell dpkg-architecture -qDEB_HOST_MULTIARCH) ++export DEB_HOST_MULTIARCH ++DH_VERBOSE=1 ++export DH_VERBOSE ++ ++%: ++ dh 
$@ --with python2 ++ ++override_dh_auto_configure: ++ dh_auto_configure -- -DLIB_SUFFIX="/$(DEB_HOST_MULTIARCH)" -DPYTHON_EXECUTABLE=/usr/bin/python ++ ++override_dh_auto_test: ++ - dh_auto_test -- CTEST_TEST_TIMEOUT=60 ++ ++override_dh_acc: ++ - dh_acc diff --cc debian/source/format index 0000000,0000000..163aaf8 new file mode 100644 --- /dev/null +++ b/debian/source/format @@@ -1,0 -1,0 +1,1 @@@ ++3.0 (quilt) diff --cc debian/source/include-binaries index 0000000,0000000..bc68a00 new file mode 100644 --- /dev/null +++ b/debian/source/include-binaries @@@ -1,0 -1,0 +1,1 @@@ ++debian/libvolk1-dev.abi.tar.gz.amd64 diff --cc debian/upstream/signing-key.asc index 0000000,0000000..f6d7f93 new file mode 100644 --- /dev/null +++ b/debian/upstream/signing-key.asc @@@ -1,0 -1,0 +1,52 @@@ ++-----BEGIN PGP PUBLIC KEY BLOCK----- ++Version: GnuPG v1 ++ ++mQINBFcTzE0BEACWkwa+pAwjBPwUvL8E9adB6sFlH/bw/3Dj2Vr/bXDkNrZDEQzc ++C3wmoX3AZo0GSWpjlmlOGOPy6u4wZxEPfilKs+eDNnuIZN3gmLoRTThgbbrnH9bw ++kIaUMiUn8VJ0pk5ULaygG6APxl4EOVrMfzgRnxmIbUfggiBLaW/xq2a/BaVrUAuA ++oHv1GTGJkwcK0RfYigJMfZl9iHVJVopffexBt1hOeGYxiyLXSDWjOhLLVzhlfgTE ++T9YdLGyjoXFmImsCvkAA2MA52e5YGUQIBrqmiXdHFit7sve0e5Dw0aLyuTnMR0MO ++a2eIHWU6TYYv5GTJPzjBbWM1pRCgtupNilg2+RfN0tOTp27RQnUtgcCo26uBU+jV ++pyvnidpDGnuUBL3WNLZlUiqmiZs8Hc9BGNw3rKB37sUOMXz6XessnhRspXC1Mot4 ++V3I1NoKwb0wjgqlkAYIGCCSuySosC5HH2OssopBUH6U5QXjFp11QbP2e+QkvKPKA ++S9V4ouSMrIDZ4krtu6QFDYsHa0zZ54yRl3O4UpfISlz3yngO2eKM019C5n51kd62 ++Ia00rtx8ypvUxMy67PTEFdCKLJ6Ua/hEGcpxGygFMRa0pjHSrC6e9LvPudK92jsq ++qO0TjhUytig5k9YPoEa2JGn/kqP+K1HGAdJPay/HmcNTZWh0hoamhuJ6NwARAQAB ++tCZOYXRoYW4gV2VzdCA8bmF0aGFuLndlc3RAZ251cmFkaW8ub3JnPokCPgQTAQIA ++KAUCVxPMTQIbAwUJA8JnAAYLCQgHAwIGFQgCCQoLBBYCAwECHgECF4AACgkQOFMj ++7mQCCR20CA//VJfDu8W8BI/44JkucC+XBVqwOcfg/rcSHflgi0mNNz7hyJ+idwcB ++JVFSbhSpXucl6baJ0nDe8gcMuGFLyF4uLwCByX3ExDAnFL3Mu/jIyOUX8TGudZU7 ++wTEhzOLPxmXfbo8lw3TETC1Xsl8g1gU/KBJnTl3WbdGZUlKW6fP0TR5BMdYskNHm ++CCqAvXWniZwjSX/jlpWremfTU9i9DUad8ufcdJue7uiZRNq4JLaWmSbtGNzDzJIq 
++6csHc3GFcd0Q/LDEDcm1AG081yLEmRnbTstZo+xW27yaRyoe1Dpm9ehsl19dVaO7 ++9ek2CEarqHjtRfO1MJMSBGiaS1lvujukYKZQRGNDKemDJwuQCVkxBMEef7SNX8XG ++2OPTARVp0hlrhMVFUk3hScekrKobq81YyCfWxBxxjRWySdInFhuT29cxxRLUxb69 ++3MKLzFJRlq+oEbWJN8QGqILQ785TZA8MdnMsGywPk43x9spgYbwPhtJYb/Aes9B9 ++NFkZ6EzVtzV7ztITuGhefRxt3eEmdFYNDHooWNFQdifcUgLoBgKOkP+oHOc+9mx7 ++6CDN9ZJTHb87W3ISw7SLI4YcMPYipEN5g51ceInDc3kXFYQ+EqU691kOuGNtx3ov ++qqvPm9PBR00GSwhLQt7s127MFpYx9+in87+UMBFXyo/VstVBPQW2GLq5Ag0EVxPM ++TQEQAK+fh+ckP728ZVRn5mr8PtsG3gktyS6LlH7EjMsHnvQR16EVAjn5G915OQUY ++Bk6yk9l0VRX0NLautc41NwVlHI4FYBBjz6mEnDocvo+BT0g5KYTyjJPOxmEzgVZW ++3Zp/jPjK5Z9YZTCIalrk2iHVQCe8fFCnaXNGNQoku1jBPRUOOTI979LWPx4d7MI0 ++7Yy+8xp5ogCrcTxea9VrMeXqnXzvy2peiceZDlvNmcEUCz222i6t2k9rUwY0+ozg ++TbsorE42h4B+a49ylY4zOX9fTPfsUj59/z/ilrxZy2qP2lBIFC+wFphKF3Qkilxd ++dnVGTsb9oKCQjuMcvh7MR27RVGLjW1pVMWGMmXBkIDu0U88Hn91XKfm1ZmWgksoU ++MC7BZocvUxIKnV+WiKy9ooP/HSzgP7ggdG+16B3yDdicB0DiBFEKZEmIWCBt5NXR ++q853WwFSH7xcrEOTXnqtkRUX4+obdwQhtqTueSC4xqX0+YVixZUC6ewqueFmPn+l ++WItCV7XU67NXTJNRC3i4kIF+hpT5YWtx56NuNcvhN25bZr1frTChOuXcCBNrOU+b ++yo2wpXAcfq+YmnaP0ZFFh7wKRi4leEPL/+JyitQbvSQU4Lejwanzvv7Ug1j4qZo1 ++A6WSxXYUWJY5rhh8nWYtJJOn5Wj4Y3gWa1taUpYw1g2lf0o5ABEBAAGJAiUEGAEC ++AA8FAlcTzE0CGwwFCQPCZwAACgkQOFMj7mQCCR2uXRAAiBsOfqp+QuQqO3OPW8OZ ++I2+JNbaaFEC1TorUhGs5XiT4wKyn1wDni4mavO4kJ8nK4Zc1qBYWeMOClj6JySJL ++yf0aVTjLyn+4Q4jt/9Dmn15wbOWZvdSICipfcLWmPLYniizsJWA4Mqoefcztmyxk ++FrJZ+Vri6MH5PxVuZjHhOUVfXIsqRhqqrpRjVnjzGvNxLgP3aLHfQPim/jbxaeRK ++oVtDNDLA+1nwdpZ8Hehe5OVfUKWuz1DXrdM0eY7pTRcms8+7y//AXpRqygH7TLx5 ++mXavdmAzgYcamQGfu/K4Mq9Bkgr1BNasgkxnPu+J0Z4jO9HsRBCJWf2BLKXmYedD ++5t0LR8bJHUTV7lsIifo0Ev47qsk1QX41KSKPAMwSzmtTLA0wzPJrkUEeVgm075N7 ++btLneqw5EyDcz3pJ7aD3HceWh+HZOREnfYXyMLxWTND7SKx0k6nmM8xasYHP0/6y ++mR8picMjbPlyoETe6B6yKi5rDjOrDwrKqBjulcUHsRhjAAUUI6IHgj4v5gCfTPS7 ++WrV98icGSHYnuxV40NT8Nt0lWNrPJhIUm1nu3UkEInznxMii1h6ga6REE/TJsStD ++C46x7fsiH4HkK1FJ+owoLhsVQo0OE4nWh8lWIBhTpR4wxThwfVHKt/H12st3tHuI ++CLIM6szb01rYgHTn9/vDgJE= ++=MlbD ++-----END PGP PUBLIC 
KEY BLOCK----- diff --cc debian/volk-config-info.1 index 0000000,0000000..e8d6efd new file mode 100644 --- /dev/null +++ b/debian/volk-config-info.1 @@@ -1,0 -1,0 +1,45 @@@ ++.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.40.10. ++.TH VOLK-CONFIG-INFO "1" "July 2014" "volk-config-info 0.1" "User Commands" ++.SH NAME ++volk-config-info \- pkgconfig-like tool for Vector Optimized Library of Kernels 0.1 ++.SH DESCRIPTION ++.SS "Program options: volk-config-info [options]:" ++.TP ++\fB\-h\fR [ \fB\-\-help\fR ] ++print help message ++.TP ++\fB\-\-prefix\fR ++print VOLK installation prefix ++.TP ++\fB\-\-builddate\fR ++print VOLK build date (RFC2822 format) ++.TP ++\fB\-\-cc\fR ++print VOLK C compiler version ++.TP ++\fB\-\-cflags\fR ++print VOLK CFLAGS ++.TP ++\fB\-\-all\-machines\fR ++print VOLK machines built into library ++.TP ++\fB\-\-avail\-machines\fR ++print VOLK machines the current platform can use ++.TP ++\fB\-\-machine\fR ++print the VOLK machine that will be used ++.TP ++\fB\-v\fR [ \fB\-\-version\fR ] ++print VOLK version ++.SH "SEE ALSO" ++The full documentation for ++.B volk-config-info ++is maintained as a Texinfo manual. If the ++.B info ++and ++.B volk-config-info ++programs are properly installed at your site, the command ++.IP ++.B info volk-config-info ++.PP ++should give you access to the complete manual. diff --cc debian/volk_modtool.1 index 0000000,0000000..752e7f5 new file mode 100644 --- /dev/null +++ b/debian/volk_modtool.1 @@@ -1,0 -1,0 +1,112 @@@ ++.TH GNURADIO "1" "August 2013" "volk_modtool 3.7" "User Commands" ++.SH NAME ++volk_modtool \- tailor VOLK modules ++.SH DESCRIPTION ++The volk_modtool tool is installed along with VOLK as a way of helping ++to construct, add to, and interrogate the VOLK library or companion ++libraries. ++.P ++volk_modtool is installed into $prefix/bin. ++.P ++VOLK modtool enables creating standalone (out-of-tree) VOLK modules ++and provides a few tools for sharing VOLK kernels between VOLK ++modules.
If you need to design or work with VOLK kernels away from ++the canonical VOLK library, this is the tool. If you need to tailor ++your own VOLK library for whatever reason, this is the tool. ++.P ++The canonical VOLK library installs a volk.h and a libvolk.so. Your ++own library will install volk_$name.h and libvolk_$name.so. Ya Gronk? ++Good. ++.P ++There isn't a substantial difference between the canonical VOLK ++module and any other VOLK module. They're all peers. Any module ++created via VOLK modtool will come complete with a default ++volk_modtool.cfg file associating the module with the base from which ++it came, its distinctive $name and its destination (or path). These ++values (created from user input if VOLK modtool runs without a ++user-supplied config file or a default config file) serve as default ++values for some VOLK modtool actions. It's more or less intended for ++the user to change directories to the top level of a created VOLK ++module and then run volk_modtool to take advantage of the values ++stored in the default volk_modtool.cfg file. ++.P ++Apart from creating new VOLK modules, VOLK modtool allows you to list ++the names of kernels in other modules, list the names of kernels in ++the current module, add kernels from another module into the current ++module, and remove kernels from the current module. When moving ++kernels between modules, VOLK modtool does its best to keep the qa ++and profiling code for those kernels intact. If the base has a test ++or a profiling call for some kernel, those calls will follow the ++kernel when VOLK modtool adds that kernel. If QA or profiling ++requires a puppet kernel, the puppet kernel will follow the original ++kernel when VOLK modtool adds that original kernel. VOLK modtool ++respects puppets. ++.P ++====================================================================== ++.P ++.SH Installing a new VOLK Library: ++.P ++Run the command "volk_modtool -i". 
This will ask you three questions: ++.P ++ name: // the name to give your VOLK library: volk_ ++ destination: // directory new source tree is built under -- must exist. ++ // It will create /volk_ ++ base: // the directory containing the original VOLK source code ++.P ++This will build a new skeleton directory in the destination provided ++with the name volk_. It will contain the necessary structure to ++build: ++.P ++ mkdir build ++ cd build ++ cmake -DCMAKE_INSTALL_PREFIX=/opt/volk ../ ++ make ++ sudo make install ++.P ++Right now, the library is empty and contains no kernels. Kernels can ++be added from another VOLK library using the '-a' option. If not ++specified, the kernel will be extracted from the base VOLK ++directory. Using the '-b' option allows us to specify another VOLK library to ++use for this purpose. ++.P ++ volk_modtool -a -n 32fc_x2_conjugate_dot_prod_32fc ++.P ++This will put the code for the new kernel into ++/volk_/kernels/volk_/ ++.P ++Other kernels must be added by hand. See the following webpages for ++more information about creating VOLK kernels: ++ http://gnuradio.org/doc/doxygen/volk_guide.html ++ http://gnuradio.org/redmine/projects/gnuradio/wiki/Volk ++.P ++====================================================================== ++.P ++.SH OPTIONS ++.P ++Options for Adding and Removing Kernels: ++ -a, --add_kernel ++ Add kernel from existing VOLK module. Uses the base VOLK module ++ unless -b is used. Use -n to specify the kernel name. ++ Required: -n. ++ Optional: -b ++.P ++ -A, --add_all_kernels ++ Add all kernels from existing VOLK module. Uses the base VOLK ++ module unless -b is used. ++ Optional: -b ++.P ++ -x, --remove_kernel ++ Remove kernel from module. ++ Required: -n. ++ Optional: -b ++.P ++Options for Listing Kernels: ++ -l, --list ++ Lists all kernels available in the base VOLK module. ++.P ++ -k, --kernels ++ Lists all kernels in this VOLK module.
++.P ++ -r, --remote-list ++ Lists all kernels in another VOLK module that is specified ++ using the -b option. diff --cc debian/volk_profile.1 index 0000000,0000000..405facb new file mode 100644 --- /dev/null +++ b/debian/volk_profile.1 @@@ -1,0 -1,0 +1,5 @@@ ++.TH UHD_FFT "1" "March 2012" "volk_profile 3.5" "User Commands" ++.SH NAME ++volk_profile \- Quality Assurance application for libvolk functions ++.SH DESCRIPTION ++Writes profile results to a file. diff --cc debian/watch index 0000000,0000000..d755268 new file mode 100644 --- /dev/null +++ b/debian/watch @@@ -1,0 -1,0 +1,3 @@@ ++version=3 ++opts="pgpsigurlmangle=s/$/.asc/" \ ++ http://libvolk.org/releases/volk-(\d\S*)\.tar\.gz