diff options
107 files changed, 5075 insertions, 261 deletions
diff --git a/docs/doxygen/other/main_page.dox b/docs/doxygen/other/main_page.dox index 0caa0b20f..68b098943 100644 --- a/docs/doxygen/other/main_page.dox +++ b/docs/doxygen/other/main_page.dox @@ -38,4 +38,14 @@ More details on packages in GNU Radio: \li \ref page_uhd \li \ref page_vocoder \li \ref page_pfb + +\section volk_main Using Volk in GNU Radio + +The \ref volk_guide page provides an overview of how to incorporate +and use Volk in GNU Radio blocks. + +Many blocks have already been converted to use Volk in their calls, so +they can also serve as examples. See the gr_complex_to_xxx.h file for +examples of various blocks that make use of Volk. + */ diff --git a/docs/doxygen/other/volk_guide.dox b/docs/doxygen/other/volk_guide.dox new file mode 100644 index 000000000..d898f3864 --- /dev/null +++ b/docs/doxygen/other/volk_guide.dox @@ -0,0 +1,161 @@ +/*! \page volk_guide Instructions for using Volk in GNU Radio + +\section volk_intro Introduction + +Volk is the Vector-Optimized Library of Kernels. It is a library that +contains kernels of hand-written SIMD code for different mathematical +operations. Since each SIMD architecture can be greatly different and +no compiler has yet come along to handle vectorization properly or +highly efficiently, Volk approaches the problem differently. For each +architecture or platform that a developer wishes to vectorize for, a +new proto-kernel is added to Volk. At runtime, Volk will select the +correct proto-kernel. In this way, the users of Volk call a kernel for +performing the operation that is platform/architecture agnostic. This +allows us to write portable SIMD code. + +Volk kernels are always defined with a 'generic' proto-kernel, which +is written in plain C. With the generic kernel, the kernel becomes +portable to any platform. Kernels are then extended by adding +proto-kernels for new platforms in which they are desired. + +A good example of a Volk kernel with multiple proto-kernels defined is +the volk_32f_s32f_multiply_32f_a. This kernel implements a scalar +multiplication of a vector of floating point numbers (each item in the +vector is multiplied by the same value). This kernel has the following +proto-kernels that are defined for 'generic,' 'avx,' 'sse,' and 'orc.' + +\code + void volk_32f_s32f_multiply_32f_a_generic + void volk_32f_s32f_multiply_32f_a_sse + void volk_32f_s32f_multiply_32f_a_avx + void volk_32f_s32f_multiply_32f_a_orc +\endcode + +These proto-kernels means that on platforms with AVX support, Volk can +select this option or the SSE option, depending on which is faster. On +other platforms, the ORC SIMD compiler might provide a solution. If +all else fails, Volk can fall back on the generic proto-kernel, which +will always work. + +Just a note on ORC. ORC is a SIMD compiler library that uses a generic +assembly-like language for SIMD commands. Based on the available SIMD +architecture of a system, it will try and compile a good +solution. Tests show that the results of ORC proto-kernels are +generally better than the generic versions but often not as good as +the hand-tuned proto-kernels for a specific SIMD architecture. This +is, of course, to be expected, and ORC provides a nice intermediary +step to performance improvements until a specific hand-tuned +proto-kernel can be made for a given platform. + +See <a +href="http://gnuradio.org/redmine/projects/gnuradio/wiki/Volk">Volk on +gnuradio.org</a> for details on the Volk naming scheme. + + +\section volk_alignment Setting and Using Memory Alignment Information + +For Volk to work as best as possible, we want to use memory-aligned +SIMD calls, which means we have to have some way of knowing and +controlling the alignment of the buffers passed to gr_block's work +function. We set the alignment requirement for SIMD aligned memory +calls with: + +\code + const int alignment_multiple = + volk_get_alignment() / output_item_size; + set_alignment(alignment_multiple); +\endcode + +The Volk function 'volk_get_alignment' provides the alignment of the +the machine architecture. We then base the alignment on the number of +output items required to maintain the alignment, so we divide the +number of alignment bytes by the number of bytes in an output items +(sizeof(float), sizeof(gr_complex), etc.). This value is then set per +block with the 'set_alignment' function. + +Because the scheduler tries to optimize throughput, the number of +items available per call to work will change and depends on the +availability of the read and write buffers. This means that it +sometimes cannot produce a buffer that is properly memory +aligned. This is an inevitable consequence of the scheduler +system. Instead of requiring alignment, the scheduler enforces the +alignment as much as possible, and when a buffer becomes unaligned, +the scheduler will work to correct it as much as possible. If a +block's buffers are unaligned, then, the scheduler sets a flag to +indicate as much so that the block can then decide what best to +do. The next section discusses the use of the aligned/unaligned +information in a gr_block's work function. + + +\section volk_work Using Alignment Properties in Work() + +The buffers passed to work/general_work in a gr_block are not +guaranteed to be aligned, but they will mostly be aligned whenever +possible. When not aligned, the 'is_unaligned()' flag will be set. So +a block can know if its buffers are aligned and make the right +decisions. This looks like: + +\code +int +gr_some_block::work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items) +{ + const float *in = (const float *) input_items[0]; + float *out = (float *) output_items[0]; + + if(is_unaligned()) { + // do something with unaligned data. This can either be a manual + // handling of the items or a call to an unaligned Volk function. + volk_32f_something_32f_u(out, in, noutput_items); + } + else { + // Buffers are aligned; can call the aligned Volk function. + volk_32f_something_32f_a(out, in, noutput_items); + } + + return noutput_items; +} +\endcode + + + +\section volk_tuning Tuning Volk Performance + +VOLK comes with a profiler that will build a config file for the best +SIMD architecture for your processor. Run volk_profile that is +installed into $PREFIX/bin. This program tests all known VOLK kernels +for each architecture supported by the processor. When finished, it +will write to $HOME/.volk/volk_config the best architecture for the +VOLK function. This file is read when using a function to know the +best version of the function to execute. + +\subsection volk_hand_tuning Hand-Tuning Performance + +If you know a particular architecture works best for your processor, +you can specify the particular architecture to use in the VOLK +preferences file: $HOME/.volk/volk_config + +The file looks like: + +\code + volk_<FUNCTION_NAME> <ARCHITECTURE> +\endcode + +Where the "FUNCTION_NAME" is the particular function that you want to +over-ride the default value and "ARCHITECTURE" is the VOLK SIMD +architecture to use (generic, sse, sse2, sse3, avx, etc.). For +example, the following config file tells VOLK to use SSE3 for the +aligned and unaligned versions of a function that multiplies two +complex streams together. + +\code + volk_32fc_x2_multiply_32fc_a sse3 + volk_32fc_x2_multiply_32fc_u sse3 +\endcode + +\b Tip: if benchmarking GNU Radio blocks, it can be useful to have a +volk_config file that sets all architectures to 'generic' as a way to +test the vectorized versus non-vectorized implementations. + +*/ diff --git a/gnuradio-core/src/lib/CMakeLists.txt b/gnuradio-core/src/lib/CMakeLists.txt index 3fbe8f807..86f88242c 100644 --- a/gnuradio-core/src/lib/CMakeLists.txt +++ b/gnuradio-core/src/lib/CMakeLists.txt @@ -42,6 +42,7 @@ list(APPEND test_gnuradio_core_sources bug_work_around_6.cc) # Setup the include and linker paths ######################################################################## include_directories(${GNURADIO_CORE_INCLUDE_DIRS}) +include_directories(${VOLK_INCLUDE_DIRS}) include_directories(${Boost_INCLUDE_DIRS}) link_directories(${Boost_LIBRARY_DIRS}) @@ -73,6 +74,9 @@ if(LINUX) list(APPEND gnuradio_core_libs rt) endif() +# Link against libvolk +list(APPEND gnuradio_core_libs volk) + add_library(gnuradio-core SHARED ${gnuradio_core_sources}) target_link_libraries(gnuradio-core ${gnuradio_core_libs}) GR_LIBRARY_FOO(gnuradio-core RUNTIME_COMPONENT "core_runtime" DEVEL_COMPONENT "core_devel") diff --git a/gnuradio-core/src/lib/filter/gr_fft_filter_ccc.cc b/gnuradio-core/src/lib/filter/gr_fft_filter_ccc.cc index 63b52411e..5968e487e 100644 --- a/gnuradio-core/src/lib/filter/gr_fft_filter_ccc.cc +++ b/gnuradio-core/src/lib/filter/gr_fft_filter_ccc.cc @@ -30,7 +30,6 @@ #endif #include <gr_fft_filter_ccc.h> -//#include <gri_fft_filter_ccc_sse.h> #include <gri_fft_filter_ccc_generic.h> #include <gr_io_signature.h> #include <gri_fft.h> @@ -61,11 +60,13 @@ gr_fft_filter_ccc::gr_fft_filter_ccc (int decimation, d_updated(false) { set_history(1); + #if 1 // don't enable the sse version until handling it is worked out d_filter = new gri_fft_filter_ccc_generic(decimation, taps, nthreads); #else d_filter = new gri_fft_filter_ccc_sse(decimation, taps); #endif + d_new_taps = taps; d_nsamples = d_filter->set_taps(taps); set_output_multiple(d_nsamples); diff --git a/gnuradio-core/src/lib/filter/gr_fft_filter_fff.cc b/gnuradio-core/src/lib/filter/gr_fft_filter_fff.cc index 0e4072f63..e4a669150 100644 --- a/gnuradio-core/src/lib/filter/gr_fft_filter_fff.cc +++ b/gnuradio-core/src/lib/filter/gr_fft_filter_fff.cc @@ -26,7 +26,6 @@ #include <gr_fft_filter_fff.h> #include <gri_fft_filter_fff_generic.h> -//#include <gri_fft_filter_fff_sse.h> #include <gr_io_signature.h> #include <assert.h> #include <stdexcept> @@ -59,6 +58,7 @@ gr_fft_filter_fff::gr_fft_filter_fff (int decimation, #else d_filter = new gri_fft_filter_fff_sse(decimation, taps); #endif + d_new_taps = taps; d_nsamples = d_filter->set_taps(taps); set_output_multiple(d_nsamples); diff --git a/gnuradio-core/src/lib/filter/gri_fft_filter_ccc_generic.cc b/gnuradio-core/src/lib/filter/gri_fft_filter_ccc_generic.cc index 47e18b3e3..c894d62aa 100644 --- a/gnuradio-core/src/lib/filter/gri_fft_filter_ccc_generic.cc +++ b/gnuradio-core/src/lib/filter/gri_fft_filter_ccc_generic.cc @@ -26,6 +26,7 @@ #include <gri_fft_filter_ccc_generic.h> #include <gri_fft.h> +#include <volk/volk.h> #include <assert.h> #include <stdexcept> #include <cstdio> @@ -154,9 +155,7 @@ gri_fft_filter_ccc_generic::filter (int nitems, const gr_complex *input, gr_comp gr_complex *b = &d_xformed_taps[0]; gr_complex *c = d_invfft->get_inbuf(); - for (j = 0; j < d_fftsize; j+=1) { // filter in the freq domain - c[j] = a[j] * b[j]; - } + volk_32fc_x2_multiply_32fc_a(c, a, b, d_fftsize); d_invfft->execute(); // compute inv xform diff --git a/gnuradio-core/src/lib/filter/gri_fft_filter_fff_generic.cc b/gnuradio-core/src/lib/filter/gri_fft_filter_fff_generic.cc index 598bf268c..e7f66b714 100644 --- a/gnuradio-core/src/lib/filter/gri_fft_filter_fff_generic.cc +++ b/gnuradio-core/src/lib/filter/gri_fft_filter_fff_generic.cc @@ -26,6 +26,7 @@ #include <gri_fft_filter_fff_generic.h> #include <gri_fft.h> +#include <volk/volk.h> #include <assert.h> #include <stdexcept> #include <cstdio> @@ -141,9 +142,7 @@ gri_fft_filter_fff_generic::filter (int nitems, const float *input, float *outpu gr_complex *b = &d_xformed_taps[0]; gr_complex *c = d_invfft->get_inbuf(); - for (j = 0; j < d_fftsize/2+1; j++) { // filter in the freq domain - c[j] = a[j] * b[j]; - } + volk_32fc_x2_multiply_32fc_a(c, a, b, d_fftsize/2+1); d_invfft->execute(); // compute inv xform diff --git a/gnuradio-core/src/lib/general/CMakeLists.txt b/gnuradio-core/src/lib/general/CMakeLists.txt index 6ecaa930a..2bc639cd3 100644 --- a/gnuradio-core/src/lib/general/CMakeLists.txt +++ b/gnuradio-core/src/lib/general/CMakeLists.txt @@ -1,4 +1,4 @@ -# Copyright 2010-2011 Free Software Foundation, Inc. +# Copyright 2010-2012 Free Software Foundation, Inc. # # This file is part of GNU Radio # @@ -178,6 +178,7 @@ endif(ENABLE_PYTHON) ######################################################################## set(gr_core_general_triple_threats complex_vec_test + gr_add_ff gr_additive_scrambler_bb gr_agc_cc gr_agc_ff @@ -187,6 +188,7 @@ set(gr_core_general_triple_threats gr_bin_statistics_f gr_bytes_to_syms gr_char_to_float + gr_char_to_short gr_check_counting_s gr_check_lfsr_32k_s gr_complex_to_interleaved_short @@ -229,6 +231,11 @@ set(gr_core_general_triple_threats gr_kludge_copy gr_lfsr_32k_source_s gr_map_bb + gr_multiply_cc + gr_multiply_ff + gr_multiply_const_cc + gr_multiply_const_ff + gr_multiply_conjugate_cc gr_nlog10_ff gr_nop gr_null_sink @@ -256,6 +263,7 @@ set(gr_core_general_triple_threats gr_rms_ff gr_repeat gr_short_to_float + gr_short_to_char gr_simple_correlator gr_simple_framer gr_simple_squelch_cc diff --git a/gnuradio-core/src/lib/general/general.i b/gnuradio-core/src/lib/general/general.i index 5a701bf80..89738b01a 100644 --- a/gnuradio-core/src/lib/general/general.i +++ b/gnuradio-core/src/lib/general/general.i @@ -44,8 +44,10 @@ #include <gr_float_to_char.h> #include <gr_float_to_uchar.h> #include <gr_short_to_float.h> +#include <gr_short_to_char.h> #include <gr_int_to_float.h> #include <gr_char_to_float.h> +#include <gr_char_to_short.h> #include <gr_uchar_to_float.h> #include <gr_frequency_modulator_fc.h> #include <gr_phase_modulator_fc.h> @@ -104,6 +106,11 @@ #include <gr_diff_decoder_bb.h> #include <gr_framer_sink_1.h> #include <gr_map_bb.h> +#include <gr_multiply_cc.h> +#include <gr_multiply_ff.h> +#include <gr_multiply_const_cc.h> +#include <gr_multiply_const_ff.h> +#include <gr_multiply_conjugate_cc.h> #include <gr_feval.h> #include <gr_pwr_squelch_cc.h> #include <gr_pwr_squelch_ff.h> @@ -134,6 +141,7 @@ #include <gr_burst_tagger.h> #include <gr_cpm.h> #include <gr_correlate_access_code_tag_bb.h> +#include <gr_add_ff.h> %} %include "gri_control_loop.i" @@ -158,8 +166,10 @@ %include "gr_float_to_char.i" %include "gr_float_to_uchar.i" %include "gr_short_to_float.i" +%include "gr_short_to_char.i" %include "gr_int_to_float.i" %include "gr_char_to_float.i" +%include "gr_char_to_short.i" %include "gr_uchar_to_float.i" %include "gr_frequency_modulator_fc.i" %include "gr_phase_modulator_fc.i" @@ -218,6 +228,11 @@ %include "gr_diff_decoder_bb.i" %include "gr_framer_sink_1.i" %include "gr_map_bb.i" +%include "gr_multiply_cc.i" +%include "gr_multiply_ff.i" +%include "gr_multiply_const_cc.i" +%include "gr_multiply_const_ff.i" +%include "gr_multiply_conjugate_cc.i" %include "gr_feval.i" %include "gr_pwr_squelch_cc.i" %include "gr_pwr_squelch_ff.i" @@ -248,3 +263,4 @@ %include "gr_burst_tagger.i" %include "gr_cpm.i" %include "gr_correlate_access_code_tag_bb.i" +%include "gr_add_ff.i" diff --git a/gnuradio-core/src/lib/general/general_generated.i b/gnuradio-core/src/lib/general/general_generated.i index a41f30a3d..e12f2b0ec 100644 --- a/gnuradio-core/src/lib/general/general_generated.i +++ b/gnuradio-core/src/lib/general/general_generated.i @@ -12,7 +12,6 @@ #include <gr_add_const_vff.h> #include <gr_add_const_vii.h> #include <gr_add_const_vss.h> -#include <gr_add_ff.h> #include <gr_add_ii.h> #include <gr_add_ss.h> #include <gr_add_vcc.h> @@ -29,16 +28,12 @@ #include <gr_divide_ff.h> #include <gr_divide_ii.h> #include <gr_divide_ss.h> -#include <gr_multiply_cc.h> -#include <gr_multiply_const_cc.h> -#include <gr_multiply_const_ff.h> #include <gr_multiply_const_ii.h> #include <gr_multiply_const_ss.h> #include <gr_multiply_const_vcc.h> #include <gr_multiply_const_vff.h> #include <gr_multiply_const_vii.h> #include <gr_multiply_const_vss.h> -#include <gr_multiply_ff.h> #include <gr_multiply_ii.h> #include <gr_multiply_ss.h> #include <gr_multiply_vcc.h> @@ -89,7 +84,6 @@ %include <gr_add_const_vff.i> %include <gr_add_const_vii.i> %include <gr_add_const_vss.i> -%include <gr_add_ff.i> %include <gr_add_ii.i> %include <gr_add_ss.i> %include <gr_add_vcc.i> @@ -106,16 +100,12 @@ %include <gr_divide_ff.i> %include <gr_divide_ii.i> %include <gr_divide_ss.i> -%include <gr_multiply_cc.i> -%include <gr_multiply_const_cc.i> -%include <gr_multiply_const_ff.i> %include <gr_multiply_const_ii.i> %include <gr_multiply_const_ss.i> %include <gr_multiply_const_vcc.i> %include <gr_multiply_const_vff.i> %include <gr_multiply_const_vii.i> %include <gr_multiply_const_vss.i> -%include <gr_multiply_ff.i> %include <gr_multiply_ii.i> %include <gr_multiply_ss.i> %include <gr_multiply_vcc.i> diff --git a/gnuradio-core/src/lib/general/gr_add_ff.cc b/gnuradio-core/src/lib/general/gr_add_ff.cc new file mode 100644 index 000000000..fc5455c98 --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_add_ff.cc @@ -0,0 +1,66 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include <gr_add_ff.h> +#include <gr_io_signature.h> +#include <volk/volk.h> + +gr_add_ff_sptr +gr_make_add_ff(size_t vlen) +{ + return gnuradio::get_initial_sptr(new gr_add_ff(vlen)); +} + +gr_add_ff::gr_add_ff (size_t vlen) + : gr_sync_block("add_ff", + gr_make_io_signature (1, -1, sizeof(float)*vlen), + gr_make_io_signature (1, 1, sizeof(float)*vlen)), + d_vlen (vlen) +{ + const int alignment_multiple = + volk_get_alignment() / sizeof(float); + set_alignment(alignment_multiple); +} + +int +gr_add_ff::work(int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items) +{ + float *out = (float *) output_items[0]; + int noi = d_vlen*noutput_items; + + memcpy(out, input_items[0], noi*sizeof(float)); + if(is_unaligned()) { + for(size_t i = 1; i < input_items.size(); i++) + volk_32f_x2_add_32f_u(out, out, (const float*)input_items[i], noi); + } + else { + for(size_t i = 1; i < input_items.size(); i++) + volk_32f_x2_add_32f_a(out, out, (const float*)input_items[i], noi); + } + return noutput_items; +} diff --git a/gnuradio-core/src/lib/general/gr_add_ff.h b/gnuradio-core/src/lib/general/gr_add_ff.h new file mode 100644 index 000000000..6421f8da2 --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_add_ff.h @@ -0,0 +1,56 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +#ifndef INCLUDED_GR_ADD_FF_H +#define INCLUDED_GR_ADD_FF_H + +#include <gr_core_api.h> +#include <gr_sync_block.h> + +class gr_add_ff; +typedef boost::shared_ptr<gr_add_ff> gr_add_ff_sptr; + +GR_CORE_API gr_add_ff_sptr +gr_make_add_ff (size_t vlen=1); + +/*! + * \brief Add streams of complex values + * \ingroup math_blk + */ + +class GR_CORE_API gr_add_ff : public gr_sync_block +{ + private: + friend GR_CORE_API gr_add_ff_sptr + gr_make_add_ff (size_t vlen); + gr_add_ff (size_t vlen); + + size_t d_vlen; + + public: + virtual int work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items); +}; + + +#endif /* INCLUDED_GR_ADD_FF_H */ diff --git a/gnuradio-core/src/lib/general/gr_add_ff.i b/gnuradio-core/src/lib/general/gr_add_ff.i new file mode 100644 index 000000000..3c30640b1 --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_add_ff.i @@ -0,0 +1,32 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +GR_SWIG_BLOCK_MAGIC(gr,add_ff) + +gr_add_ff_sptr +gr_make_add_ff (size_t vlen=1); + +class gr_add_ff : public gr_sync_block +{ +public: + +}; diff --git a/gnuradio-core/src/lib/general/gr_char_to_float.cc b/gnuradio-core/src/lib/general/gr_char_to_float.cc index e68f8d208..ffe8ee4a1 100644 --- a/gnuradio-core/src/lib/general/gr_char_to_float.cc +++ b/gnuradio-core/src/lib/general/gr_char_to_float.cc @@ -26,30 +26,47 @@ #include <gr_char_to_float.h> #include <gr_io_signature.h> -#include <gri_char_to_float.h> +#include <volk/volk.h> gr_char_to_float_sptr -gr_make_char_to_float () +gr_make_char_to_float (size_t vlen, float scale) { - return gnuradio::get_initial_sptr(new gr_char_to_float ()); + return gnuradio::get_initial_sptr(new gr_char_to_float (vlen, scale)); } -gr_char_to_float::gr_char_to_float () +gr_char_to_float::gr_char_to_float (size_t vlen, float scale) : gr_sync_block ("gr_char_to_float", - gr_make_io_signature (1, 1, sizeof (char)), - gr_make_io_signature (1, 1, sizeof (float))) + gr_make_io_signature (1, 1, sizeof (char)*vlen), + gr_make_io_signature (1, 1, sizeof (float)*vlen)), + d_vlen(vlen), d_scale(scale) { + const int alignment_multiple = + volk_get_alignment() / sizeof(float); + set_alignment(alignment_multiple); +} + +float +gr_char_to_float::scale() const +{ + return d_scale; +} + +void +gr_char_to_float::set_scale(float scale) +{ + d_scale = scale; } int gr_char_to_float::work (int noutput_items, - gr_vector_const_void_star &input_items, - gr_vector_void_star &output_items) + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items) { - const char *in = (const char *) input_items[0]; + const int8_t *in = (const int8_t *) input_items[0]; float *out = (float *) output_items[0]; - gri_char_to_float (in, out, noutput_items); - + // Note: the unaligned benchmarked much faster than the aligned + volk_8i_s32f_convert_32f_u(out, in, d_scale, d_vlen*noutput_items); + return noutput_items; } diff --git a/gnuradio-core/src/lib/general/gr_char_to_float.h b/gnuradio-core/src/lib/general/gr_char_to_float.h index b20d2066f..4ad8e59a8 100644 --- a/gnuradio-core/src/lib/general/gr_char_to_float.h +++ b/gnuradio-core/src/lib/general/gr_char_to_float.h @@ -30,7 +30,7 @@ class gr_char_to_float; typedef boost::shared_ptr<gr_char_to_float> gr_char_to_float_sptr; GR_CORE_API gr_char_to_float_sptr -gr_make_char_to_float (); +gr_make_char_to_float (size_t vlen=1, float scale=1); /*! * \brief Convert stream of chars to a stream of float @@ -39,10 +39,18 @@ gr_make_char_to_float (); class GR_CORE_API gr_char_to_float : public gr_sync_block { - friend GR_CORE_API gr_char_to_float_sptr gr_make_char_to_float (); - gr_char_to_float (); + private: + friend GR_CORE_API gr_char_to_float_sptr + gr_make_char_to_float (size_t vlen, float scale); + gr_char_to_float (size_t vlen, float scale); + + size_t d_vlen; + float d_scale; public: + float scale() const; + void set_scale(float scale); + virtual int work (int noutput_items, gr_vector_const_void_star &input_items, gr_vector_void_star &output_items); diff --git a/gnuradio-core/src/lib/general/gr_char_to_float.i b/gnuradio-core/src/lib/general/gr_char_to_float.i index 0403b621d..65ad861f2 100644 --- a/gnuradio-core/src/lib/general/gr_char_to_float.i +++ b/gnuradio-core/src/lib/general/gr_char_to_float.i @@ -22,9 +22,12 @@ GR_SWIG_BLOCK_MAGIC(gr,char_to_float) -gr_char_to_float_sptr gr_make_char_to_float (); +gr_char_to_float_sptr +gr_make_char_to_float (size_t vlen=1, float scale=1); class gr_char_to_float : public gr_sync_block { - gr_char_to_float (); +public: + float scale() const; + void set_scale(float scale); }; diff --git a/gnuradio-core/src/lib/general/gr_char_to_short.cc b/gnuradio-core/src/lib/general/gr_char_to_short.cc new file mode 100644 index 000000000..8b6cd0be1 --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_char_to_short.cc @@ -0,0 +1,64 @@ +/* -*- c++ -*- */ +/* + * Copyright 2011,2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include <gr_char_to_short.h> +#include <gr_io_signature.h> +#include <volk/volk.h> + +gr_char_to_short_sptr +gr_make_char_to_short (size_t vlen) +{ + return gnuradio::get_initial_sptr(new gr_char_to_short (vlen)); +} + +gr_char_to_short::gr_char_to_short (size_t vlen) + : gr_sync_block ("gr_char_to_short", + gr_make_io_signature (1, 1, sizeof (char)*vlen), + gr_make_io_signature (1, 1, sizeof (short)*vlen)), + d_vlen(vlen) +{ + const int alignment_multiple = + volk_get_alignment() / sizeof(char); + set_alignment(alignment_multiple); +} + +int +gr_char_to_short::work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items) +{ + const int8_t *in = (const int8_t *) input_items[0]; + int16_t *out = (int16_t *) output_items[0]; + + if(is_unaligned()) { + volk_8i_convert_16i_u(out, in, d_vlen*noutput_items); + } + else { + volk_8i_convert_16i_a(out, in, d_vlen*noutput_items); + } + + return noutput_items; +} diff --git a/gnuradio-core/src/lib/general/gr_char_to_short.h b/gnuradio-core/src/lib/general/gr_char_to_short.h new file mode 100644 index 000000000..58f9a62b0 --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_char_to_short.h @@ -0,0 +1,56 @@ +/* -*- c++ -*- */ +/* + * Copyright 2011,2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +#ifndef INCLUDED_GR_CHAR_TO_SHORT_H +#define INCLUDED_GR_CHAR_TO_SHORT_H + +#include <gr_core_api.h> +#include <gr_sync_block.h> + +class gr_char_to_short; +typedef boost::shared_ptr<gr_char_to_short> gr_char_to_short_sptr; + +GR_CORE_API gr_char_to_short_sptr +gr_make_char_to_short (size_t vlen=1); + +/*! + * \brief Convert stream of chars to a stream of float + * \ingroup converter_blk + */ + +class GR_CORE_API gr_char_to_short : public gr_sync_block +{ + private: + friend GR_CORE_API gr_char_to_short_sptr + gr_make_char_to_short (size_t vlen); + gr_char_to_short (size_t vlen); + + size_t d_vlen; + + public: + virtual int work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items); +}; + + +#endif /* INCLUDED_GR_CHAR_TO_SHORT_H */ diff --git a/gnuradio-core/src/lib/general/gr_char_to_short.i b/gnuradio-core/src/lib/general/gr_char_to_short.i new file mode 100644 index 000000000..48ddbf26b --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_char_to_short.i @@ -0,0 +1,30 @@ +/* -*- c++ -*- */ +/* + * Copyright 2011 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +GR_SWIG_BLOCK_MAGIC(gr,char_to_short) + +gr_char_to_short_sptr gr_make_char_to_short (size_t vlen=1); + +class gr_char_to_short : public gr_sync_block +{ + +}; diff --git a/gnuradio-core/src/lib/general/gr_complex_to_xxx.cc b/gnuradio-core/src/lib/general/gr_complex_to_xxx.cc index a59c127f3..108a92835 100644 --- a/gnuradio-core/src/lib/general/gr_complex_to_xxx.cc +++ b/gnuradio-core/src/lib/general/gr_complex_to_xxx.cc @@ -1,6 +1,6 @@ /* -*- c++ -*- */ /* - * Copyright 2004,2008,2010 Free Software Foundation, Inc. + * Copyright 2004,2008,2010,2012 Free Software Foundation, Inc. * * This file is part of GNU Radio * @@ -27,6 +27,7 @@ #include <gr_complex_to_xxx.h> #include <gr_io_signature.h> #include <gr_math.h> +#include <volk/volk.h> // ---------------------------------------------------------------- @@ -42,6 +43,9 @@ gr_complex_to_float::gr_complex_to_float (unsigned int vlen) gr_make_io_signature (1, 2, sizeof (float) * vlen)), d_vlen(vlen) { + const int alignment_multiple = + volk_get_alignment() / sizeof(float); + set_alignment(alignment_multiple); } int @@ -56,16 +60,26 @@ gr_complex_to_float::work (int noutput_items, switch (output_items.size ()){ case 1: - for (int i = 0; i < noi; i++){ - out0[i] = in[i].real (); + if(is_unaligned()) { + for (int i = 0; i < noi; i++){ + out0[i] = in[i].real (); + } + } + else { + volk_32fc_deinterleave_real_32f_a(out0, in, noi); } break; case 2: out1 = (float *) output_items[1]; - for (int i = 0; i < noi; i++){ - out0[i] = in[i].real (); - out1[i] = in[i].imag (); + if(is_unaligned()) { + for (int i = 0; i < noi; i++){ + out0[i] = in[i].real (); + out1[i] = in[i].imag (); + } + } + else { + volk_32fc_deinterleave_32f_x2_a(out0, out1, in, noi); } break; @@ -90,6 +104,9 @@ gr_complex_to_real::gr_complex_to_real (unsigned int vlen) gr_make_io_signature (1, 1, sizeof (float) * vlen)), d_vlen(vlen) { + const int alignment_multiple = + volk_get_alignment() / sizeof(float); + set_alignment(alignment_multiple); } int @@ -101,9 +118,15 @@ gr_complex_to_real::work (int noutput_items, float *out = (float *) output_items[0]; int noi = noutput_items * d_vlen; - for (int i = 0; i < noi; i++){ - out[i] = in[i].real (); + if(is_unaligned()) { + for (int i = 0; i < noi; i++){ + out[i] = in[i].real (); + } + } + else { + volk_32fc_deinterleave_real_32f_a(out, in, noi); } + return noutput_items; } @@ -121,6 +144,9 @@ gr_complex_to_imag::gr_complex_to_imag (unsigned int vlen) gr_make_io_signature (1, 1, sizeof (float) * vlen)), d_vlen(vlen) { + const int alignment_multiple = + volk_get_alignment() / sizeof(float); + set_alignment(alignment_multiple); } int @@ -132,9 +158,15 @@ gr_complex_to_imag::work (int noutput_items, float *out = (float *) output_items[0]; int noi = noutput_items * d_vlen; - for (int i = 0; i < noi; i++){ - out[i] = in[i].imag (); + if(is_unaligned()) { + for (int i = 0; i < noi; i++){ + out[i] = in[i].imag (); + } + } + else { + volk_32fc_deinterleave_imag_32f_a(out, in, noi); } + return noutput_items; } @@ -152,6 +184,9 @@ gr_complex_to_mag::gr_complex_to_mag (unsigned int vlen) gr_make_io_signature (1, 1, sizeof (float) * vlen)), d_vlen(vlen) { + const int alignment_multiple = + volk_get_alignment() / sizeof(float); + set_alignment(alignment_multiple); } int @@ -163,9 +198,9 @@ gr_complex_to_mag::work (int noutput_items, float *out = (float *) output_items[0]; int noi = noutput_items * d_vlen; - for (int i = 0; i < noi; i++){ - out[i] = std::abs (in[i]); - } + // turned out to be faster than aligned/unaligned switching + volk_32fc_magnitude_32f_u(out, in, noi); + return noutput_items; } @@ -183,6 +218,9 @@ gr_complex_to_mag_squared::gr_complex_to_mag_squared (unsigned int vlen) gr_make_io_signature (1, 1, sizeof (float) * vlen)), d_vlen(vlen) { + const int alignment_multiple = + volk_get_alignment() / sizeof(float); + set_alignment(alignment_multiple); } int @@ -194,11 +232,13 @@ gr_complex_to_mag_squared::work (int noutput_items, float *out = (float *) output_items[0]; int noi = noutput_items * d_vlen; - for (int i = 0; i < noi; i++){ - const float __x = in[i].real(); - const float __y = in[i].imag(); - out[i] = __x * __x + __y * __y; + if(is_unaligned()) { + volk_32fc_magnitude_squared_32f_u(out, in, noi); + } + else { + volk_32fc_magnitude_squared_32f_a(out, in, noi); } + return noutput_items; } @@ -216,6 +256,9 @@ gr_complex_to_arg::gr_complex_to_arg (unsigned int vlen) gr_make_io_signature (1, 1, sizeof (float) * vlen)), d_vlen(vlen) { + const int alignment_multiple = + volk_get_alignment() / sizeof(float); + set_alignment(alignment_multiple); } int @@ -227,9 +270,11 @@ gr_complex_to_arg::work (int noutput_items, float *out = (float *) output_items[0]; int noi = noutput_items * d_vlen; + // The fast_atan2f is faster than Volk for (int i = 0; i < noi; i++){ // out[i] = std::arg (in[i]); out[i] = gr_fast_atan2f(in[i]); } + return noutput_items; } diff --git a/gnuradio-core/src/lib/general/gr_complex_to_xxx.h b/gnuradio-core/src/lib/general/gr_complex_to_xxx.h index 166403259..232071323 100644 --- a/gnuradio-core/src/lib/general/gr_complex_to_xxx.h +++ b/gnuradio-core/src/lib/general/gr_complex_to_xxx.h @@ -109,10 +109,11 @@ class GR_CORE_API gr_complex_to_imag : public gr_sync_block */ class GR_CORE_API gr_complex_to_mag : public gr_sync_block { - friend GR_CORE_API gr_complex_to_mag_sptr gr_make_complex_to_mag (unsigned int vlen); + friend GR_CORE_API gr_complex_to_mag_sptr + gr_make_complex_to_mag (unsigned int vlen); gr_complex_to_mag (unsigned int vlen); - unsigned int d_vlen; + unsigned int d_vlen; public: virtual int work (int noutput_items, diff --git a/gnuradio-core/src/lib/general/gr_conjugate_cc.cc b/gnuradio-core/src/lib/general/gr_conjugate_cc.cc index 59c3bae89..d2b20ffe6 100644 --- a/gnuradio-core/src/lib/general/gr_conjugate_cc.cc +++ b/gnuradio-core/src/lib/general/gr_conjugate_cc.cc @@ -28,6 +28,7 @@ #include <gr_conjugate_cc.h> #include <gr_io_signature.h> +#include <volk/volk.h> gr_conjugate_cc_sptr gr_make_conjugate_cc () @@ -40,6 +41,9 @@ gr_conjugate_cc::gr_conjugate_cc () gr_make_io_signature (1, 1, sizeof (gr_complex)), gr_make_io_signature (1, 1, sizeof (gr_complex))) { + const int alignment_multiple = + volk_get_alignment() / sizeof(gr_complex); + set_alignment(alignment_multiple); } int @@ -50,26 +54,11 @@ gr_conjugate_cc::work (int noutput_items, gr_complex *iptr = (gr_complex *) input_items[0]; gr_complex *optr = (gr_complex *) output_items[0]; - int size = noutput_items; - - while (size >= 8){ - optr[0] = conj(iptr[0]); - optr[1] = conj(iptr[1]); - optr[2] = conj(iptr[2]); - optr[3] = conj(iptr[3]); - optr[4] = conj(iptr[4]); - optr[5] = conj(iptr[5]); - optr[6] = conj(iptr[6]); - optr[7] = conj(iptr[7]); - size -= 8; - optr += 8; - iptr += 8; + if(is_unaligned()) { + volk_32fc_conjugate_32fc_u(optr, iptr, noutput_items); } - - while (size-- > 0) { - *optr = conj(*iptr); - iptr++; - optr++; + else { + volk_32fc_conjugate_32fc_a(optr, iptr, noutput_items); } return noutput_items; diff --git a/gnuradio-core/src/lib/general/gr_float_to_char.cc b/gnuradio-core/src/lib/general/gr_float_to_char.cc index 88b9d276e..14635ff71 100644 --- a/gnuradio-core/src/lib/general/gr_float_to_char.cc +++ b/gnuradio-core/src/lib/general/gr_float_to_char.cc @@ -1,6 +1,6 @@ /* -*- c++ -*- */ /* - * Copyright 2004,2010 Free Software Foundation, Inc. + * Copyright 2004,2010,2012 Free Software Foundation, Inc. * * This file is part of GNU Radio * @@ -26,19 +26,35 @@ #include <gr_float_to_char.h> #include <gr_io_signature.h> -#include <gri_float_to_char.h> +#include <volk/volk.h> gr_float_to_char_sptr -gr_make_float_to_char () +gr_make_float_to_char (size_t vlen, float scale) { - return gnuradio::get_initial_sptr(new gr_float_to_char ()); + return gnuradio::get_initial_sptr(new gr_float_to_char (vlen, scale)); } -gr_float_to_char::gr_float_to_char () +gr_float_to_char::gr_float_to_char (size_t vlen, float scale) : gr_sync_block ("gr_float_to_char", - gr_make_io_signature (1, 1, sizeof (float)), - gr_make_io_signature (1, 1, sizeof (char))) + gr_make_io_signature (1, 1, sizeof (float)*vlen), + gr_make_io_signature (1, 1, sizeof (char)*vlen)), + d_vlen(vlen), d_scale(scale) { + const int alignment_multiple = + volk_get_alignment() / sizeof(char); + set_alignment(alignment_multiple); +} + +float +gr_float_to_char::scale() const +{ + return d_scale; +} + +void +gr_float_to_char::set_scale(float scale) +{ + d_scale = scale; } int @@ -47,9 +63,14 @@ gr_float_to_char::work (int noutput_items, gr_vector_void_star &output_items) { const float *in = (const float *) input_items[0]; - char *out = (char *) output_items[0]; + int8_t *out = (int8_t *) output_items[0]; - gri_float_to_char (in, out, noutput_items); + if(is_unaligned()) { + volk_32f_s32f_convert_8i_u(out, in, d_scale, d_vlen*noutput_items); + } + else { + volk_32f_s32f_convert_8i_a(out, in, d_scale, d_vlen*noutput_items); + } return noutput_items; } diff --git a/gnuradio-core/src/lib/general/gr_float_to_char.h b/gnuradio-core/src/lib/general/gr_float_to_char.h index 434e2e9d0..c88645a18 100644 --- a/gnuradio-core/src/lib/general/gr_float_to_char.h +++ b/gnuradio-core/src/lib/general/gr_float_to_char.h @@ -1,6 +1,6 @@ /* -*- c++ -*- */ /* - * Copyright 2004 Free Software Foundation, Inc. + * Copyright 2004,2012 Free Software Foundation, Inc. * * This file is part of GNU Radio * @@ -30,7 +30,7 @@ class gr_float_to_char; typedef boost::shared_ptr<gr_float_to_char> gr_float_to_char_sptr; GR_CORE_API gr_float_to_char_sptr -gr_make_float_to_char (); +gr_make_float_to_char (size_t vlen=1, float scale=1); /*! * \brief Convert stream of float to a stream of char @@ -39,10 +39,18 @@ gr_make_float_to_char (); class GR_CORE_API gr_float_to_char : public gr_sync_block { - friend GR_CORE_API gr_float_to_char_sptr gr_make_float_to_char (); - gr_float_to_char (); + private: + friend GR_CORE_API gr_float_to_char_sptr gr_make_float_to_char + (size_t vlen, float scale); + gr_float_to_char (size_t vlen, float scale); + + size_t d_vlen; + float d_scale; public: + float scale() const; + void set_scale(float scale); + virtual int work (int noutput_items, gr_vector_const_void_star &input_items, gr_vector_void_star &output_items); diff --git a/gnuradio-core/src/lib/general/gr_float_to_char.i b/gnuradio-core/src/lib/general/gr_float_to_char.i index 05b206554..a1c88750f 100644 --- a/gnuradio-core/src/lib/general/gr_float_to_char.i +++ b/gnuradio-core/src/lib/general/gr_float_to_char.i @@ -1,6 +1,6 @@ /* -*- c++ -*- */ /* - * Copyright 2004 Free Software Foundation, Inc. + * Copyright 2004,2012 Free Software Foundation, Inc. * * This file is part of GNU Radio * @@ -22,9 +22,12 @@ GR_SWIG_BLOCK_MAGIC(gr,float_to_char) -gr_float_to_char_sptr gr_make_float_to_char (); +gr_float_to_char_sptr +gr_make_float_to_char (size_t vlen=1, float scale=1); class gr_float_to_char : public gr_sync_block { - gr_float_to_char (); +public: + float scale() const; + void set_scale(float scale); }; diff --git a/gnuradio-core/src/lib/general/gr_float_to_int.cc b/gnuradio-core/src/lib/general/gr_float_to_int.cc index 2349de8cb..b69591043 100644 --- a/gnuradio-core/src/lib/general/gr_float_to_int.cc +++ b/gnuradio-core/src/lib/general/gr_float_to_int.cc @@ -1,6 +1,6 @@ /* -*- c++ -*- */ /* - * Copyright 2011 Free Software Foundation, Inc. + * Copyright 2011,2012 Free Software Foundation, Inc. * * This file is part of GNU Radio * @@ -27,32 +27,63 @@ #include <gr_float_to_int.h> #include <gr_io_signature.h> #include <gri_float_to_int.h> +#include <volk/volk.h> gr_float_to_int_sptr -gr_make_float_to_int () +gr_make_float_to_int (size_t vlen, float scale) { - return gnuradio::get_initial_sptr(new gr_float_to_int ()); + return gnuradio::get_initial_sptr(new gr_float_to_int (vlen, scale)); } -gr_float_to_int::gr_float_to_int () +gr_float_to_int::gr_float_to_int (size_t vlen, float scale) : gr_sync_block ("gr_float_to_int", - gr_make_io_signature (1, 1, sizeof (float)), - gr_make_io_signature (1, 1, sizeof (int))) + gr_make_io_signature (1, 1, sizeof (float)*vlen), + gr_make_io_signature (1, 1, sizeof (int)*vlen)), + d_vlen(vlen), d_scale(scale) { + const int alignment_multiple = + volk_get_alignment() / sizeof(int); + set_alignment(alignment_multiple); } +float +gr_float_to_int::scale() const +{ + return d_scale; +} + +void +gr_float_to_int::set_scale(float scale) +{ + d_scale = scale; +} int gr_float_to_int::work (int noutput_items, - gr_vector_const_void_star &input_items, - gr_vector_void_star &output_items) + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items) { + // Disable the Volk for now. There is a problem for large 32-bit ints that + // are not properly represented by the precisions of a single float, which + // can cause wrapping from large, positive numbers to negative. + // In gri_float_to_int, the value is first promoted to a 64-bit + // value, clipped, then converted to a float. +#if 0 + const float *in = (const float *) input_items[0]; + int32_t *out = (int32_t *) output_items[0]; + + if(is_unaligned()) { + volk_32f_s32f_convert_32i_u(out, in, d_scale, d_vlen*noutput_items); + } + else { + volk_32f_s32f_convert_32i_a(out, in, d_scale, d_vlen*noutput_items); + } +#else const float *in = (const float *) input_items[0]; int *out = (int *) output_items[0]; - gri_float_to_int (in, out, noutput_items); + gri_float_to_int (in, out, d_scale, d_vlen*noutput_items); + +#endif return noutput_items; } - - - diff --git a/gnuradio-core/src/lib/general/gr_float_to_int.h b/gnuradio-core/src/lib/general/gr_float_to_int.h index 3324ed110..0b42c0aab 100644 --- a/gnuradio-core/src/lib/general/gr_float_to_int.h +++ b/gnuradio-core/src/lib/general/gr_float_to_int.h @@ -1,6 +1,6 @@ /* -*- c++ -*- */ /* - * Copyright 2011 Free Software Foundation, Inc. + * Copyright 2011,2012 Free Software Foundation, Inc. * * This file is part of GNU Radio * @@ -30,7 +30,7 @@ class gr_float_to_int; typedef boost::shared_ptr<gr_float_to_int> gr_float_to_int_sptr; GR_CORE_API gr_float_to_int_sptr -gr_make_float_to_int (); +gr_make_float_to_int (size_t vlen=1, float scale=1); /*! * \brief Convert stream of float to a stream of short @@ -39,10 +39,18 @@ gr_make_float_to_int (); class GR_CORE_API gr_float_to_int : public gr_sync_block { - friend GR_CORE_API gr_float_to_int_sptr gr_make_float_to_int (); - gr_float_to_int (); + private: + friend GR_CORE_API + gr_float_to_int_sptr gr_make_float_to_int (size_t vlen, float scale); + gr_float_to_int (size_t vlen, float scale); + + size_t d_vlen; + float d_scale; public: + float scale() const; + void set_scale(float scale); + virtual int work (int noutput_items, gr_vector_const_void_star &input_items, gr_vector_void_star &output_items); diff --git a/gnuradio-core/src/lib/general/gr_float_to_int.i b/gnuradio-core/src/lib/general/gr_float_to_int.i index 4ab04cbf2..6e71f54a9 100644 --- a/gnuradio-core/src/lib/general/gr_float_to_int.i +++ b/gnuradio-core/src/lib/general/gr_float_to_int.i @@ -1,6 +1,6 @@ /* -*- c++ -*- */ /* - * Copyright 2011 Free Software Foundation, Inc. + * Copyright 2011,2012 Free Software Foundation, Inc. * * This file is part of GNU Radio * @@ -22,9 +22,12 @@ GR_SWIG_BLOCK_MAGIC(gr,float_to_int) -gr_float_to_int_sptr gr_make_float_to_int (); +gr_float_to_int_sptr +gr_make_float_to_int (size_t vlen=1, float scale=1); class gr_float_to_int : public gr_sync_block { - gr_float_to_int (); +public: + float scale() const; + void set_scale(float scale); }; diff --git a/gnuradio-core/src/lib/general/gr_float_to_short.cc b/gnuradio-core/src/lib/general/gr_float_to_short.cc index 084f76f9c..188bfdae3 100644 --- a/gnuradio-core/src/lib/general/gr_float_to_short.cc +++ b/gnuradio-core/src/lib/general/gr_float_to_short.cc @@ -1,6 +1,6 @@ /* -*- c++ -*- */ /* - * Copyright 2004,2010 Free Software Foundation, Inc. + * Copyright 2004,2010,2012 Free Software Foundation, Inc. * * This file is part of GNU Radio * @@ -26,19 +26,35 @@ #include <gr_float_to_short.h> #include <gr_io_signature.h> -#include <gri_float_to_short.h> +#include <volk/volk.h> gr_float_to_short_sptr -gr_make_float_to_short () +gr_make_float_to_short (size_t vlen, float scale) { - return gnuradio::get_initial_sptr(new gr_float_to_short ()); + return gnuradio::get_initial_sptr(new gr_float_to_short (vlen, scale)); } -gr_float_to_short::gr_float_to_short () +gr_float_to_short::gr_float_to_short (size_t vlen, float scale) : gr_sync_block ("gr_float_to_short", - gr_make_io_signature (1, 1, sizeof (float)), - gr_make_io_signature (1, 1, sizeof (short))) + gr_make_io_signature (1, 1, sizeof (float)*vlen), + gr_make_io_signature (1, 1, sizeof (short)*vlen)), + d_vlen(vlen), d_scale(scale) { + const int alignment_multiple = + volk_get_alignment() / sizeof(short); + set_alignment(alignment_multiple); +} + +float +gr_float_to_short::scale() const +{ + return d_scale; +} + +void +gr_float_to_short::set_scale(float scale) +{ + d_scale = scale; } int @@ -49,8 +65,13 @@ gr_float_to_short::work (int noutput_items, const float *in = (const float *) input_items[0]; short *out = (short *) output_items[0]; - gri_float_to_short (in, out, noutput_items); - + if(is_unaligned()) { + volk_32f_s32f_convert_16i_u(out, in, d_scale, d_vlen*noutput_items); + } + else { + volk_32f_s32f_convert_16i_a(out, in, d_scale, d_vlen*noutput_items); + } + return noutput_items; } diff --git a/gnuradio-core/src/lib/general/gr_float_to_short.h b/gnuradio-core/src/lib/general/gr_float_to_short.h index 010d61141..93e441f41 100644 --- a/gnuradio-core/src/lib/general/gr_float_to_short.h +++ b/gnuradio-core/src/lib/general/gr_float_to_short.h @@ -1,6 +1,6 @@ /* -*- c++ -*- */ /* - * Copyright 2004 Free Software Foundation, Inc. + * Copyright 2004,2012 Free Software Foundation, Inc. * * This file is part of GNU Radio * @@ -30,7 +30,7 @@ class gr_float_to_short; typedef boost::shared_ptr<gr_float_to_short> gr_float_to_short_sptr; GR_CORE_API gr_float_to_short_sptr -gr_make_float_to_short (); +gr_make_float_to_short (size_t vlen=1, float scale=1); /*! * \brief Convert stream of float to a stream of short @@ -39,10 +39,17 @@ gr_make_float_to_short (); class GR_CORE_API gr_float_to_short : public gr_sync_block { - friend GR_CORE_API gr_float_to_short_sptr gr_make_float_to_short (); - gr_float_to_short (); + friend GR_CORE_API + gr_float_to_short_sptr gr_make_float_to_short (size_t vlen, float scale); + gr_float_to_short (size_t vlen, float scale); + + size_t d_vlen; + float d_scale; public: + float scale() const; + void set_scale(float scale); + virtual int work (int noutput_items, gr_vector_const_void_star &input_items, gr_vector_void_star &output_items); diff --git a/gnuradio-core/src/lib/general/gr_float_to_short.i b/gnuradio-core/src/lib/general/gr_float_to_short.i index ad059c453..072da5213 100644 --- a/gnuradio-core/src/lib/general/gr_float_to_short.i +++ b/gnuradio-core/src/lib/general/gr_float_to_short.i @@ -1,6 +1,6 @@ /* -*- c++ -*- */ /* - * Copyright 2004 Free Software Foundation, Inc. + * Copyright 2004,2012 Free Software Foundation, Inc. * * This file is part of GNU Radio * @@ -22,9 +22,12 @@ GR_SWIG_BLOCK_MAGIC(gr,float_to_short) -gr_float_to_short_sptr gr_make_float_to_short (); +gr_float_to_short_sptr +gr_make_float_to_short (size_t vlen=1, float scale=1); class gr_float_to_short : public gr_sync_block { - gr_float_to_short (); +public: + float scale() const; + void set_scale(float scale); }; diff --git a/gnuradio-core/src/lib/general/gr_int_to_float.cc b/gnuradio-core/src/lib/general/gr_int_to_float.cc index 29ca22add..7ec15b1a8 100644 --- a/gnuradio-core/src/lib/general/gr_int_to_float.cc +++ b/gnuradio-core/src/lib/general/gr_int_to_float.cc @@ -1,6 +1,6 @@ /* -*- c++ -*- */ /* - * Copyright 2011 Free Software Foundation, Inc. + * Copyright 2011,2012 Free Software Foundation, Inc. * * This file is part of GNU Radio * @@ -26,19 +26,23 @@ #include <gr_int_to_float.h> #include <gr_io_signature.h> -#include <gri_int_to_float.h> +#include <volk/volk.h> gr_int_to_float_sptr -gr_make_int_to_float () +gr_make_int_to_float (size_t vlen, float scale) { - return gnuradio::get_initial_sptr(new gr_int_to_float ()); + return gnuradio::get_initial_sptr(new gr_int_to_float (vlen, scale)); } -gr_int_to_float::gr_int_to_float () +gr_int_to_float::gr_int_to_float (size_t vlen, float scale) : gr_sync_block ("gr_int_to_float", - gr_make_io_signature (1, 1, sizeof (int32_t)), - gr_make_io_signature (1, 1, sizeof (float))) + gr_make_io_signature (1, 1, sizeof (int32_t)*vlen), + gr_make_io_signature (1, 1, sizeof (float)*vlen)), + d_vlen(vlen), d_scale(scale) { + const int alignment_multiple = + volk_get_alignment() / sizeof(float); + set_alignment(alignment_multiple); } int @@ -48,9 +52,14 @@ gr_int_to_float::work (int noutput_items, { const int32_t *in = (const int32_t *) input_items[0]; float *out = (float *) output_items[0]; - - gri_int_to_float(in, out, noutput_items); + if(is_unaligned()) { + volk_32i_s32f_convert_32f_u(out, in, d_scale, d_vlen*noutput_items); + } + else { + volk_32i_s32f_convert_32f_a(out, in, d_scale, d_vlen*noutput_items); + } + return noutput_items; } diff --git a/gnuradio-core/src/lib/general/gr_int_to_float.h b/gnuradio-core/src/lib/general/gr_int_to_float.h index 9af381ba9..af6488a50 100644 --- a/gnuradio-core/src/lib/general/gr_int_to_float.h +++ b/gnuradio-core/src/lib/general/gr_int_to_float.h @@ -1,6 +1,6 @@ /* -*- c++ -*- */ /* - * Copyright 2011 Free Software Foundation, Inc. + * Copyright 2011,2012 Free Software Foundation, Inc. * * This file is part of GNU Radio * @@ -30,7 +30,7 @@ class gr_int_to_float; typedef boost::shared_ptr<gr_int_to_float> gr_int_to_float_sptr; GR_CORE_API gr_int_to_float_sptr -gr_make_int_to_float (); +gr_make_int_to_float (size_t vlen=1, float scale=1); /*! * \brief Convert stream of int to a stream of float @@ -39,10 +39,18 @@ gr_make_int_to_float (); class GR_CORE_API gr_int_to_float : public gr_sync_block { - friend GR_CORE_API gr_int_to_float_sptr gr_make_int_to_float (); - gr_int_to_float (); + private: + friend GR_CORE_API gr_int_to_float_sptr + gr_make_int_to_float (size_t vlen, float scale); + gr_int_to_float (size_t vlen, float scale); + + size_t d_vlen; + float d_scale; public: + float scale() const; + void set_scale(float scale); + virtual int work (int noutput_items, gr_vector_const_void_star &input_items, gr_vector_void_star &output_items); diff --git a/gnuradio-core/src/lib/general/gr_int_to_float.i b/gnuradio-core/src/lib/general/gr_int_to_float.i index 8cb9e35b5..c1f25e37b 100644 --- a/gnuradio-core/src/lib/general/gr_int_to_float.i +++ b/gnuradio-core/src/lib/general/gr_int_to_float.i @@ -1,6 +1,6 @@ /* -*- c++ -*- */ /* - * Copyright 2011 Free Software Foundation, Inc. + * Copyright 2011,2012 Free Software Foundation, Inc. * * This file is part of GNU Radio * @@ -22,9 +22,11 @@ GR_SWIG_BLOCK_MAGIC(gr,int_to_float) -gr_int_to_float_sptr gr_make_int_to_float (); +gr_int_to_float_sptr +gr_make_int_to_float (size_t vlen=1, float scale=1); class gr_int_to_float : public gr_sync_block { - gr_int_to_float (); + float scale() const; + void set_scale(float scale); }; diff --git a/gnuradio-core/src/lib/general/gr_multiply_cc.cc b/gnuradio-core/src/lib/general/gr_multiply_cc.cc new file mode 100644 index 000000000..0d20e6257 --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_multiply_cc.cc @@ -0,0 +1,69 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include <gr_multiply_cc.h> +#include <gr_io_signature.h> +#include <volk/volk.h> + +gr_multiply_cc_sptr +gr_make_multiply_cc (size_t vlen) +{ + return gnuradio::get_initial_sptr(new gr_multiply_cc (vlen)); +} + +gr_multiply_cc::gr_multiply_cc (size_t vlen) + : gr_sync_block ("gr_multiply_cc", + gr_make_io_signature (1, -1, sizeof (gr_complex)*vlen), + gr_make_io_signature (1, 1, sizeof (gr_complex)*vlen)), + d_vlen(vlen) +{ + const int alignment_multiple = + volk_get_alignment() / sizeof(gr_complex); + set_alignment(alignment_multiple); +} + +int +gr_multiply_cc::work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items) +{ + gr_complex *out = (gr_complex *) output_items[0]; + int noi = d_vlen*noutput_items; + + memcpy(out, input_items[0], noi*sizeof(gr_complex)); + if(is_unaligned()) { + for(size_t i = 1; i < input_items.size(); i++) + volk_32fc_x2_multiply_32fc_u(out, out, (gr_complex*)input_items[i], noi); + } + else { + for(size_t i = 1; i < input_items.size(); i++) + volk_32fc_x2_multiply_32fc_a(out, out, (gr_complex*)input_items[i], noi); + } + return noutput_items; +} + + + diff --git a/gnuradio-core/src/lib/general/gr_multiply_cc.h b/gnuradio-core/src/lib/general/gr_multiply_cc.h new file mode 100644 index 000000000..f80ec8b25 --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_multiply_cc.h @@ -0,0 +1,56 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +#ifndef INCLUDED_GR_MULTIPLY_CC_H +#define INCLUDED_GR_MULTIPLY_CC_H + +#include <gr_core_api.h> +#include <gr_sync_block.h> + +class gr_multiply_cc; +typedef boost::shared_ptr<gr_multiply_cc> gr_multiply_cc_sptr; + +GR_CORE_API gr_multiply_cc_sptr +gr_make_multiply_cc (size_t vlen=1); + +/*! + * \brief Multiply streams of complex values + * \ingroup math_blk + */ + +class GR_CORE_API gr_multiply_cc : public gr_sync_block +{ + private: + friend GR_CORE_API gr_multiply_cc_sptr + gr_make_multiply_cc (size_t vlen); + gr_multiply_cc (size_t vlen); + + size_t d_vlen; + + public: + virtual int work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items); +}; + + +#endif /* INCLUDED_GR_MULTIPLY_CC_H */ diff --git a/gnuradio-core/src/lib/general/gr_multiply_cc.i b/gnuradio-core/src/lib/general/gr_multiply_cc.i new file mode 100644 index 000000000..61768c390 --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_multiply_cc.i @@ -0,0 +1,32 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +GR_SWIG_BLOCK_MAGIC(gr,multiply_cc) + +gr_multiply_cc_sptr +gr_make_multiply_cc (size_t vlen=1); + +class gr_multiply_cc : public gr_sync_block +{ +public: + +}; diff --git a/gnuradio-core/src/lib/general/gr_multiply_conjugate_cc.cc b/gnuradio-core/src/lib/general/gr_multiply_conjugate_cc.cc new file mode 100644 index 000000000..103d87b8b --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_multiply_conjugate_cc.cc @@ -0,0 +1,69 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include <gr_multiply_conjugate_cc.h> +#include <gr_io_signature.h> +#include <volk/volk.h> + +gr_multiply_conjugate_cc_sptr +gr_make_multiply_conjugate_cc (size_t vlen) +{ + return gnuradio::get_initial_sptr(new gr_multiply_conjugate_cc (vlen)); +} + +gr_multiply_conjugate_cc::gr_multiply_conjugate_cc (size_t vlen) + : gr_sync_block ("gr_multiply_conjugate_cc", + gr_make_io_signature (2, 2, sizeof (gr_complex)*vlen), + gr_make_io_signature (1, 1, sizeof (gr_complex)*vlen)), + d_vlen(vlen) +{ + const int alignment_multiple = + volk_get_alignment() / sizeof(gr_complex); + set_alignment(alignment_multiple); +} + +int +gr_multiply_conjugate_cc::work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items) +{ + gr_complex *in0 = (gr_complex *) input_items[0]; + gr_complex *in1 = (gr_complex *) input_items[1]; + gr_complex *out = (gr_complex *) output_items[0]; + int noi = d_vlen*noutput_items; + + if(is_unaligned()) { + volk_32fc_x2_multiply_conjugate_32fc_u(out, in0, in1, noi); + } + else { + volk_32fc_x2_multiply_conjugate_32fc_a(out, in0, in1, noi); + } + + return noutput_items; +} + + + diff --git a/gnuradio-core/src/lib/general/gr_multiply_conjugate_cc.h b/gnuradio-core/src/lib/general/gr_multiply_conjugate_cc.h new file mode 100644 index 000000000..eb032f31b --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_multiply_conjugate_cc.h @@ -0,0 +1,57 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +#ifndef INCLUDED_GR_MULTIPLY_CONJUGATE_CC_H +#define INCLUDED_GR_MULTIPLY_CONJUGATE_CC_H + +#include <gr_core_api.h> +#include <gr_sync_block.h> + +class gr_multiply_conjugate_cc; +typedef boost::shared_ptr<gr_multiply_conjugate_cc> +gr_multiply_conjugate_cc_sptr; + +GR_CORE_API gr_multiply_conjugate_cc_sptr +gr_make_multiply_conjugate_cc (size_t vlen=1); + +/*! + * \brief Multiplies a stream by the conjugate of the second stream + * \ingroup math_blk + */ + +class GR_CORE_API gr_multiply_conjugate_cc : public gr_sync_block +{ + private: + friend GR_CORE_API gr_multiply_conjugate_cc_sptr + gr_make_multiply_conjugate_cc (size_t vlen); + gr_multiply_conjugate_cc (size_t vlen); + + size_t d_vlen; + + public: + virtual int work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items); +}; + + +#endif /* INCLUDED_GR_MULTIPLY_CONJUGATE_CC_H */ diff --git a/gnuradio-core/src/lib/general/gr_multiply_conjugate_cc.i b/gnuradio-core/src/lib/general/gr_multiply_conjugate_cc.i new file mode 100644 index 000000000..023410505 --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_multiply_conjugate_cc.i @@ -0,0 +1,32 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +GR_SWIG_BLOCK_MAGIC(gr,multiply_conjugate_cc) + +gr_multiply_conjugate_cc_sptr +gr_make_multiply_conjugate_cc (size_t vlen=1); + +class gr_multiply_conjugate_cc : public gr_sync_block +{ +public: + +}; diff --git a/gnuradio-core/src/lib/general/gr_multiply_const_cc.cc b/gnuradio-core/src/lib/general/gr_multiply_const_cc.cc new file mode 100644 index 000000000..59521f54a --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_multiply_const_cc.cc @@ -0,0 +1,80 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include <gr_multiply_const_cc.h> +#include <gr_io_signature.h> +#include <volk/volk.h> + +gr_multiply_const_cc_sptr +gr_make_multiply_const_cc (gr_complex k, size_t vlen) +{ + return gnuradio::get_initial_sptr(new gr_multiply_const_cc (k, vlen)); +} + +gr_multiply_const_cc::gr_multiply_const_cc (gr_complex k, size_t vlen) + : gr_sync_block ("gr_multiply_const_cc", + gr_make_io_signature (1, 1, sizeof (gr_complex)*vlen), + gr_make_io_signature (1, 1, sizeof (gr_complex)*vlen)), + d_k(k), d_vlen(vlen) +{ + const int alignment_multiple = + volk_get_alignment() / sizeof(gr_complex); + set_alignment(alignment_multiple); +} + +gr_complex +gr_multiply_const_cc::k() const +{ + return d_k; +} + +void +gr_multiply_const_cc::set_k(gr_complex k) +{ + d_k = k; +} + +int +gr_multiply_const_cc::work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items) +{ + const gr_complex *in = (const gr_complex *) input_items[0]; + gr_complex *out = (gr_complex *) output_items[0]; + int noi = d_vlen*noutput_items; + + if(is_unaligned()) { + volk_32fc_s32fc_multiply_32fc_u(out, in, d_k, noi); + } + else { + volk_32fc_s32fc_multiply_32fc_a(out, in, d_k, noi); + } + + return noutput_items; +} + + + diff --git a/gnuradio-core/src/lib/general/gr_multiply_const_cc.h b/gnuradio-core/src/lib/general/gr_multiply_const_cc.h new file mode 100644 index 000000000..1791d9160 --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_multiply_const_cc.h @@ -0,0 +1,60 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +#ifndef INCLUDED_GR_MULTIPLY_CONST_CC_H +#define INCLUDED_GR_MULTIPLY_CONST_CC_H + +#include <gr_core_api.h> +#include <gr_sync_block.h> + +class gr_multiply_const_cc; +typedef boost::shared_ptr<gr_multiply_const_cc> gr_multiply_const_cc_sptr; + +GR_CORE_API gr_multiply_const_cc_sptr +gr_make_multiply_const_cc (gr_complex k, size_t vlen=1); + +/*! + * \brief Multiply stream of complex values with a constant \p k + * \ingroup math_blk + */ + +class GR_CORE_API gr_multiply_const_cc : public gr_sync_block +{ + private: + friend GR_CORE_API gr_multiply_const_cc_sptr + gr_make_multiply_const_cc (gr_complex k, size_t vlen); + gr_multiply_const_cc (gr_complex k, size_t vlen); + + gr_complex d_k; + size_t d_vlen; + + public: + gr_complex k() const; + void set_k(gr_complex k); + + virtual int work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items); +}; + + +#endif /* INCLUDED_GR_MULTIPLY_CONST_CC_H */ diff --git a/gnuradio-core/src/lib/general/gr_multiply_const_cc.i b/gnuradio-core/src/lib/general/gr_multiply_const_cc.i new file mode 100644 index 000000000..be8d32b31 --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_multiply_const_cc.i @@ -0,0 +1,33 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +GR_SWIG_BLOCK_MAGIC(gr,multiply_const_cc) + +gr_multiply_const_cc_sptr +gr_make_multiply_const_cc (gr_complex k, size_t vlen=1); + +class gr_multiply_const_cc : public gr_sync_block +{ +public: + gr_complex k() const; + void set_k(gr_complex k); +}; diff --git a/gnuradio-core/src/lib/general/gr_multiply_const_ff.cc b/gnuradio-core/src/lib/general/gr_multiply_const_ff.cc new file mode 100644 index 000000000..8354cb27b --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_multiply_const_ff.cc @@ -0,0 +1,80 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include <gr_multiply_const_ff.h> +#include <gr_io_signature.h> +#include <volk/volk.h> + +gr_multiply_const_ff_sptr +gr_make_multiply_const_ff (float k, size_t vlen) +{ + return gnuradio::get_initial_sptr(new gr_multiply_const_ff (k, vlen)); +} + +gr_multiply_const_ff::gr_multiply_const_ff (float k, size_t vlen) + : gr_sync_block ("gr_multiply_const_ff", + gr_make_io_signature (1, 1, sizeof (float)*vlen), + gr_make_io_signature (1, 1, sizeof (float)*vlen)), + d_k(k), d_vlen(vlen) +{ + const int alignment_multiple = + volk_get_alignment() / sizeof(float); + set_alignment(alignment_multiple); +} + +float +gr_multiply_const_ff::k() const +{ + return d_k; +} + +void +gr_multiply_const_ff::set_k(float k) +{ + d_k = k; +} + +int +gr_multiply_const_ff::work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items) +{ + const float *in = (const float *) input_items[0]; + float *out = (float *) output_items[0]; + int noi = d_vlen*noutput_items; + + if(is_unaligned()) { + volk_32f_s32f_multiply_32f_u(out, in, d_k, noi); + } + else { + volk_32f_s32f_multiply_32f_a(out, in, d_k, noi); + } + + return noutput_items; +} + + + diff --git a/gnuradio-core/src/lib/general/gr_multiply_const_ff.h b/gnuradio-core/src/lib/general/gr_multiply_const_ff.h new file mode 100644 index 000000000..ef42a92f4 --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_multiply_const_ff.h @@ -0,0 +1,60 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +#ifndef INCLUDED_GR_MULTIPLY_CONST_FF_H +#define INCLUDED_GR_MULTIPLY_CONST_FF_H + +#include <gr_core_api.h> +#include <gr_sync_block.h> + +class gr_multiply_const_ff; +typedef boost::shared_ptr<gr_multiply_const_ff> gr_multiply_const_ff_sptr; + +GR_CORE_API gr_multiply_const_ff_sptr +gr_make_multiply_const_ff (float k, size_t vlen=1); + +/*! + * \brief Multiply stream of float values with a constant \p k + * \ingroup math_blk + */ + +class GR_CORE_API gr_multiply_const_ff : public gr_sync_block +{ + private: + friend GR_CORE_API gr_multiply_const_ff_sptr + gr_make_multiply_const_ff (float k, size_t vlen); + gr_multiply_const_ff (float k, size_t vlen); + + float d_k; + size_t d_vlen; + + public: + float k() const; + void set_k(float k); + + virtual int work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items); +}; + + +#endif /* INCLUDED_GR_MULTIPLY_CONST_FF_H */ diff --git a/gnuradio-core/src/lib/general/gr_multiply_const_ff.i b/gnuradio-core/src/lib/general/gr_multiply_const_ff.i new file mode 100644 index 000000000..0fd3b1225 --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_multiply_const_ff.i @@ -0,0 +1,33 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +GR_SWIG_BLOCK_MAGIC(gr,multiply_const_ff) + +gr_multiply_const_ff_sptr +gr_make_multiply_const_ff (float k, size_t vlen=1); + +class gr_multiply_const_ff : public gr_sync_block +{ +public: + float k() const; + void set_k(float k); +}; diff --git a/gnuradio-core/src/lib/general/gr_multiply_ff.cc b/gnuradio-core/src/lib/general/gr_multiply_ff.cc new file mode 100644 index 000000000..a7d34ce51 --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_multiply_ff.cc @@ -0,0 +1,69 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include <gr_multiply_ff.h> +#include <gr_io_signature.h> +#include <volk/volk.h> + +gr_multiply_ff_sptr +gr_make_multiply_ff (size_t vlen) +{ + return gnuradio::get_initial_sptr(new gr_multiply_ff (vlen)); +} + +gr_multiply_ff::gr_multiply_ff (size_t vlen) + : gr_sync_block ("gr_multiply_ff", + gr_make_io_signature (1, -1, sizeof (float)*vlen), + gr_make_io_signature (1, 1, sizeof (float)*vlen)), + d_vlen(vlen) +{ + const int alignment_multiple = + volk_get_alignment() / sizeof(float); + set_alignment(alignment_multiple); +} + +int +gr_multiply_ff::work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items) +{ + float *out = (float *) output_items[0]; + int noi = d_vlen*noutput_items; + + memcpy(out, input_items[0], noi*sizeof(float)); + if(is_unaligned()) { + for(size_t i = 1; i < input_items.size(); i++) + volk_32f_x2_multiply_32f_u(out, out, (const float*)input_items[i], noi); + } + else { + for(size_t i = 1; i < input_items.size(); i++) + volk_32f_x2_multiply_32f_a(out, out, (const float*)input_items[i], noi); + } + return noutput_items; +} + + + diff --git a/gnuradio-core/src/lib/general/gr_multiply_ff.h b/gnuradio-core/src/lib/general/gr_multiply_ff.h new file mode 100644 index 000000000..ae36cb1e0 --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_multiply_ff.h @@ -0,0 +1,56 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +#ifndef INCLUDED_GR_MULTIPLY_FF_H +#define INCLUDED_GR_MULTIPLY_FF_H + +#include <gr_core_api.h> +#include <gr_sync_block.h> + +class gr_multiply_ff; +typedef boost::shared_ptr<gr_multiply_ff> gr_multiply_ff_sptr; + +GR_CORE_API gr_multiply_ff_sptr +gr_make_multiply_ff (size_t vlen=1); + +/*! + * \brief Multiply streams of complex values + * \ingroup math_blk + */ + +class GR_CORE_API gr_multiply_ff : public gr_sync_block +{ + private: + friend GR_CORE_API gr_multiply_ff_sptr + gr_make_multiply_ff (size_t vlen); + gr_multiply_ff (size_t vlen); + + size_t d_vlen; + + public: + virtual int work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items); +}; + + +#endif /* INCLUDED_GR_MULTIPLY_FF_H */ diff --git a/gnuradio-core/src/lib/general/gr_multiply_ff.i b/gnuradio-core/src/lib/general/gr_multiply_ff.i new file mode 100644 index 000000000..0f06301f2 --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_multiply_ff.i @@ -0,0 +1,32 @@ +/* -*- c++ -*- */ +/* + * Copyright 2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +GR_SWIG_BLOCK_MAGIC(gr,multiply_ff) + +gr_multiply_ff_sptr +gr_make_multiply_ff (size_t vlen=1); + +class gr_multiply_ff : public gr_sync_block +{ +public: + +}; diff --git a/gnuradio-core/src/lib/general/gr_short_to_char.cc b/gnuradio-core/src/lib/general/gr_short_to_char.cc new file mode 100644 index 000000000..a3c096e6d --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_short_to_char.cc @@ -0,0 +1,67 @@ +/* -*- c++ -*- */ +/* + * Copyright 2011,2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include <gr_short_to_char.h> +#include <gr_io_signature.h> +#include <volk/volk.h> + +gr_short_to_char_sptr +gr_make_short_to_char (size_t vlen) +{ + return gnuradio::get_initial_sptr(new gr_short_to_char (vlen)); +} + +gr_short_to_char::gr_short_to_char (size_t vlen) + : gr_sync_block ("gr_short_to_char", + gr_make_io_signature (1, 1, sizeof (short)*vlen), + gr_make_io_signature (1, 1, sizeof (char)*vlen)), + d_vlen(vlen) +{ + const int alignment_multiple = + volk_get_alignment() / sizeof(char); + set_alignment(alignment_multiple); +} + +int +gr_short_to_char::work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items) +{ + const int16_t *in = (const int16_t *) input_items[0]; + int8_t *out = (int8_t *) output_items[0]; + + if(is_unaligned()) { + volk_16i_convert_8i_u(out, in, d_vlen*noutput_items); + } + else { + volk_16i_convert_8i_a(out, in, d_vlen*noutput_items); + } + + return noutput_items; +} + + + diff --git a/gnuradio-core/src/lib/general/gr_short_to_char.h b/gnuradio-core/src/lib/general/gr_short_to_char.h new file mode 100644 index 000000000..9682d86ec --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_short_to_char.h @@ -0,0 +1,56 @@ +/* -*- c++ -*- */ +/* + * Copyright 2011,2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +#ifndef INCLUDED_GR_SHORT_TO_CHAR_H +#define INCLUDED_GR_SHORT_TO_CHAR_H + +#include <gr_core_api.h> +#include <gr_sync_block.h> + +class gr_short_to_char; +typedef boost::shared_ptr<gr_short_to_char> gr_short_to_char_sptr; + +GR_CORE_API gr_short_to_char_sptr +gr_make_short_to_char (size_t vlen=1); + +/*! + * \brief Convert stream of short to a stream of float + * \ingroup converter_blk + */ + +class GR_CORE_API gr_short_to_char : public gr_sync_block +{ + private: + friend GR_CORE_API gr_short_to_char_sptr + gr_make_short_to_char (size_t vlen); + gr_short_to_char (size_t vlen); + + size_t d_vlen; + + public: + virtual int work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items); +}; + + +#endif /* INCLUDED_GR_SHORT_TO_CHAR_H */ diff --git a/gnuradio-core/src/lib/general/gr_short_to_char.i b/gnuradio-core/src/lib/general/gr_short_to_char.i new file mode 100644 index 000000000..330a4fdda --- /dev/null +++ b/gnuradio-core/src/lib/general/gr_short_to_char.i @@ -0,0 +1,31 @@ +/* -*- c++ -*- */ +/* + * Copyright 2011,2012 Free Software Foundation, Inc. + * + * This file is part of GNU Radio + * + * GNU Radio is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3, or (at your option) + * any later version. + * + * GNU Radio is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GNU Radio; see the file COPYING. If not, write to + * the Free Software Foundation, Inc., 51 Franklin Street, + * Boston, MA 02110-1301, USA. + */ + +GR_SWIG_BLOCK_MAGIC(gr,short_to_char) + +gr_short_to_char_sptr +gr_make_short_to_char (size_t vlen=1); + +class gr_short_to_char : public gr_sync_block +{ + +}; diff --git a/gnuradio-core/src/lib/general/gr_short_to_float.cc b/gnuradio-core/src/lib/general/gr_short_to_float.cc index 7b80953ac..94d376a27 100644 --- a/gnuradio-core/src/lib/general/gr_short_to_float.cc +++ b/gnuradio-core/src/lib/general/gr_short_to_float.cc @@ -1,6 +1,6 @@ /* -*- c++ -*- */ /* - * Copyright 2004,2010 Free Software Foundation, Inc. + * Copyright 2004,2010,2012 Free Software Foundation, Inc. * * This file is part of GNU Radio * @@ -26,19 +26,35 @@ #include <gr_short_to_float.h> #include <gr_io_signature.h> -#include <gri_short_to_float.h> +#include <volk/volk.h> gr_short_to_float_sptr -gr_make_short_to_float () +gr_make_short_to_float (size_t vlen, float scale) { - return gnuradio::get_initial_sptr(new gr_short_to_float ()); + return gnuradio::get_initial_sptr(new gr_short_to_float (vlen, scale)); } -gr_short_to_float::gr_short_to_float () +gr_short_to_float::gr_short_to_float (size_t vlen, float scale) : gr_sync_block ("gr_short_to_float", - gr_make_io_signature (1, 1, sizeof (short)), - gr_make_io_signature (1, 1, sizeof (float))) + gr_make_io_signature (1, 1, sizeof (short)*vlen), + gr_make_io_signature (1, 1, sizeof (float)*vlen)), + d_vlen(vlen), d_scale(scale) { + const int alignment_multiple = + volk_get_alignment() / sizeof(float); + set_alignment(alignment_multiple); +} + +float +gr_short_to_float::scale() const +{ + return d_scale; +} + +void +gr_short_to_float::set_scale(float scale) +{ + d_scale = scale; } int @@ -49,8 +65,12 @@ gr_short_to_float::work (int noutput_items, const short *in = (const short *) input_items[0]; float *out = (float *) output_items[0]; - gri_short_to_float (in, out, noutput_items); - + if(is_unaligned()) { + volk_16i_s32f_convert_32f_u(out, in, d_scale, d_vlen*noutput_items); + } + else { + volk_16i_s32f_convert_32f_a(out, in, d_scale, d_vlen*noutput_items); + } return noutput_items; } diff --git a/gnuradio-core/src/lib/general/gr_short_to_float.h b/gnuradio-core/src/lib/general/gr_short_to_float.h index b40c966ea..efdc81ecd 100644 --- a/gnuradio-core/src/lib/general/gr_short_to_float.h +++ b/gnuradio-core/src/lib/general/gr_short_to_float.h @@ -1,6 +1,6 @@ /* -*- c++ -*- */ /* - * Copyright 2004 Free Software Foundation, Inc. + * Copyright 2004,2012 Free Software Foundation, Inc. * * This file is part of GNU Radio * @@ -30,7 +30,7 @@ class gr_short_to_float; typedef boost::shared_ptr<gr_short_to_float> gr_short_to_float_sptr; GR_CORE_API gr_short_to_float_sptr -gr_make_short_to_float (); +gr_make_short_to_float (size_t vlen=1, float scale=1); /*! * \brief Convert stream of short to a stream of float @@ -39,10 +39,18 @@ gr_make_short_to_float (); class GR_CORE_API gr_short_to_float : public gr_sync_block { - friend GR_CORE_API gr_short_to_float_sptr gr_make_short_to_float (); - gr_short_to_float (); - + private: + friend GR_CORE_API gr_short_to_float_sptr + gr_make_short_to_float (size_t vlen, float scale); + gr_short_to_float (size_t vlen, float scale); + + size_t d_vlen; + float d_scale; + public: + float scale() const; + void set_scale(float scale); + virtual int work (int noutput_items, gr_vector_const_void_star &input_items, gr_vector_void_star &output_items); diff --git a/gnuradio-core/src/lib/general/gr_short_to_float.i b/gnuradio-core/src/lib/general/gr_short_to_float.i index 56759df29..229618890 100644 --- a/gnuradio-core/src/lib/general/gr_short_to_float.i +++ b/gnuradio-core/src/lib/general/gr_short_to_float.i @@ -1,6 +1,6 @@ /* -*- c++ -*- */ /* - * Copyright 2004 Free Software Foundation, Inc. + * Copyright 2004,2012 Free Software Foundation, Inc. * * This file is part of GNU Radio * @@ -22,9 +22,12 @@ GR_SWIG_BLOCK_MAGIC(gr,short_to_float) -gr_short_to_float_sptr gr_make_short_to_float (); +gr_short_to_float_sptr +gr_make_short_to_float (size_t vlen=1, float scale=1); class gr_short_to_float : public gr_sync_block { - gr_short_to_float (); +public: + float scale() const; + void set_scale(float scale); }; diff --git a/gnuradio-core/src/lib/general/gri_float_to_int.cc b/gnuradio-core/src/lib/general/gri_float_to_int.cc index 5271e60e2..0b0b10dfe 100644 --- a/gnuradio-core/src/lib/general/gri_float_to_int.cc +++ b/gnuradio-core/src/lib/general/gri_float_to_int.cc @@ -34,10 +34,10 @@ static const int64_t MIN_INT = -2147483647; // -(2^31)-1 void -gri_float_to_int (const float *in, int *out, int nsamples) +gri_float_to_int (const float *in, int *out, float scale, int nsamples) { for (int i = 0; i < nsamples; i++){ - int64_t r = llrintf(in[i]); + int64_t r = llrintf(scale * in[i]); if (r < MIN_INT) r = MIN_INT; else if (r > MAX_INT) diff --git a/gnuradio-core/src/lib/general/gri_float_to_int.h b/gnuradio-core/src/lib/general/gri_float_to_int.h index a2f6ea877..d8b98efc1 100644 --- a/gnuradio-core/src/lib/general/gri_float_to_int.h +++ b/gnuradio-core/src/lib/general/gri_float_to_int.h @@ -28,6 +28,6 @@ /*! * convert array of floats to int with rounding and saturation. */ -GR_CORE_API void gri_float_to_int (const float *in, int *out, int nsamples); +GR_CORE_API void gri_float_to_int (const float *in, int *out, float scale, int nsamples); #endif /* INCLUDED_GRI_FLOAT_TO_INT_H */ diff --git a/gnuradio-core/src/lib/gengen/CMakeLists.txt b/gnuradio-core/src/lib/gengen/CMakeLists.txt index a7292f131..98b149e97 100644 --- a/gnuradio-core/src/lib/gengen/CMakeLists.txt +++ b/gnuradio-core/src/lib/gengen/CMakeLists.txt @@ -86,10 +86,10 @@ expand_h_cc_i(gr_noise_source_X s i f c) expand_h_cc_i(gr_sig_source_X s i f c) expand_h_cc_i(gr_add_const_XX ss ii ff cc sf) -expand_h_cc_i(gr_multiply_const_XX ss ii ff cc) -expand_h_cc_i(gr_add_XX ss ii ff cc) +expand_h_cc_i(gr_multiply_const_XX ss ii) +expand_h_cc_i(gr_add_XX ss ii cc) expand_h_cc_i(gr_sub_XX ss ii ff cc) -expand_h_cc_i(gr_multiply_XX ss ii ff cc) +expand_h_cc_i(gr_multiply_XX ss ii) expand_h_cc_i(gr_divide_XX ss ii ff cc) expand_h_cc_i(gr_mute_XX ss ii ff cc) expand_h_cc_i(gr_add_const_vXX ss ii ff cc) diff --git a/gnuradio-core/src/lib/gengen/Makefile.gen b/gnuradio-core/src/lib/gengen/Makefile.gen index 1c529803c..db260585f 100644 --- a/gnuradio-core/src/lib/gengen/Makefile.gen +++ b/gnuradio-core/src/lib/gengen/Makefile.gen @@ -12,7 +12,6 @@ GENERATED_H = \ gr_add_const_vff.h \ gr_add_const_vii.h \ gr_add_const_vss.h \ - gr_add_ff.h \ gr_add_ii.h \ gr_add_ss.h \ gr_and_bb.h \ @@ -45,16 +44,12 @@ GENERATED_H = \ gr_moving_average_ff.h \ gr_moving_average_ii.h \ gr_moving_average_ss.h \ - gr_multiply_cc.h \ - gr_multiply_const_cc.h \ - gr_multiply_const_ff.h \ gr_multiply_const_ii.h \ gr_multiply_const_ss.h \ gr_multiply_const_vcc.h \ gr_multiply_const_vff.h \ gr_multiply_const_vii.h \ gr_multiply_const_vss.h \ - gr_multiply_ff.h \ gr_multiply_ii.h \ gr_multiply_ss.h \ gr_mute_cc.h \ @@ -117,7 +112,6 @@ GENERATED_I = \ gr_add_const_vff.i \ gr_add_const_vii.i \ gr_add_const_vss.i \ - gr_add_ff.i \ gr_add_ii.i \ gr_add_ss.i \ gr_and_bb.i \ @@ -150,16 +144,12 @@ GENERATED_I = \ gr_moving_average_ff.i \ gr_moving_average_ii.i \ gr_moving_average_ss.i \ - gr_multiply_cc.i \ - gr_multiply_const_cc.i \ - gr_multiply_const_ff.i \ gr_multiply_const_ii.i \ gr_multiply_const_ss.i \ gr_multiply_const_vcc.i \ gr_multiply_const_vff.i \ gr_multiply_const_vii.i \ gr_multiply_const_vss.i \ - gr_multiply_ff.i \ gr_multiply_ii.i \ gr_multiply_ss.i \ gr_mute_cc.i \ @@ -222,7 +212,6 @@ GENERATED_CC = \ gr_add_const_vff.cc \ gr_add_const_vii.cc \ gr_add_const_vss.cc \ - gr_add_ff.cc \ gr_add_ii.cc \ gr_add_ss.cc \ gr_and_bb.cc \ @@ -255,16 +244,12 @@ GENERATED_CC = \ gr_moving_average_ff.cc \ gr_moving_average_ii.cc \ gr_moving_average_ss.cc \ - gr_multiply_cc.cc \ - gr_multiply_const_cc.cc \ - gr_multiply_const_ff.cc \ gr_multiply_const_ii.cc \ gr_multiply_const_ss.cc \ gr_multiply_const_vcc.cc \ gr_multiply_const_vff.cc \ gr_multiply_const_vii.cc \ gr_multiply_const_vss.cc \ - gr_multiply_ff.cc \ gr_multiply_ii.cc \ gr_multiply_ss.cc \ gr_mute_cc.cc \ diff --git a/gnuradio-core/src/lib/gengen/generate_common.py b/gnuradio-core/src/lib/gengen/generate_common.py index 9bd6bcc9c..6da2044e0 100755 --- a/gnuradio-core/src/lib/gengen/generate_common.py +++ b/gnuradio-core/src/lib/gengen/generate_common.py @@ -41,10 +41,7 @@ reg_signatures = ['ss', 'ii', 'ff', 'cc'] reg_roots = [ 'gr_add_const_XX', - 'gr_multiply_const_XX', - 'gr_add_XX', 'gr_sub_XX', - 'gr_multiply_XX', 'gr_divide_XX', 'gr_mute_XX', 'gr_add_const_vXX', @@ -66,7 +63,10 @@ others = ( ('gr_sample_and_hold_XX', ('bb','ss','ii','ff')), ('gr_argmax_XX', ('fs','is','ss')), ('gr_max_XX', ('ff','ii','ss')), - ('gr_peak_detector_XX', ('fb','ib','sb')) + ('gr_peak_detector_XX', ('fb','ib','sb')), + ('gr_multiply_XX', ('ss','ii')), + ('gr_multiply_const_XX', ('ss','ii')), + ('gr_add_XX', ('ss','cc','ii')) ) diff --git a/gnuradio-core/src/lib/runtime/gr_block.cc b/gnuradio-core/src/lib/runtime/gr_block.cc index 9463869f5..78f77486b 100644 --- a/gnuradio-core/src/lib/runtime/gr_block.cc +++ b/gnuradio-core/src/lib/runtime/gr_block.cc @@ -34,6 +34,9 @@ gr_block::gr_block (const std::string &name, gr_io_signature_sptr output_signature) : gr_basic_block(name, input_signature, output_signature), d_output_multiple (1), + d_output_multiple_set(false), + d_unaligned(0), + d_is_unaligned(false), d_relative_rate (1.0), d_history(1), d_fixed_rate(false), @@ -75,10 +78,37 @@ gr_block::set_output_multiple (int multiple) if (multiple < 1) throw std::invalid_argument ("gr_block::set_output_multiple"); + d_output_multiple_set = true; d_output_multiple = multiple; } void +gr_block::set_alignment (int multiple) +{ + if (multiple < 1) + throw std::invalid_argument ("gr_block::set_alignment_multiple"); + + d_output_multiple = multiple; +} + +void +gr_block::set_unaligned (int na) +{ + // unaligned value must be less than 0 and it doesn't make sense + // that it's larger than the alignment value. + if ((na < 0) || (na > d_output_multiple)) + throw std::invalid_argument ("gr_block::set_unaligned"); + + d_unaligned = na; +} + +void +gr_block::set_is_unaligned (bool u) +{ + d_is_unaligned = u; +} + +void gr_block::set_relative_rate (double relative_rate) { if (relative_rate < 0.0) diff --git a/gnuradio-core/src/lib/runtime/gr_block.h b/gnuradio-core/src/lib/runtime/gr_block.h index 86e0583e9..9171942e0 100644 --- a/gnuradio-core/src/lib/runtime/gr_block.h +++ b/gnuradio-core/src/lib/runtime/gr_block.h @@ -152,8 +152,33 @@ class GR_CORE_API gr_block : public gr_basic_block { */ void set_output_multiple (int multiple); int output_multiple () const { return d_output_multiple; } + bool output_multiple_set () const { return d_output_multiple_set; } /*! + * \brief Constrains buffers to work on a set item alignment (for SIMD) + * + * set_alignment_multiple causes the scheduler to ensure that the noutput_items + * argument passed to forecast and general_work will be an integer multiple + * of \param multiple The default value is 1. + * + * This control is similar to the output_multiple setting, except + * that if the number of items passed to the block is less than the + * output_multiple, this value is ignored and the block can produce + * like normal. The d_unaligned value is set to the number of items + * the block is off by. In the next call to general_work, the + * noutput_items is set to d_unaligned or less until + * d_unaligned==0. The buffers are now aligned again and the aligned + * calls can be performed again. + */ + void set_alignment (int multiple); + int alignment () const { return d_output_multiple; } + + void set_unaligned (int na); + int unaligned () const { return d_unaligned; } + void set_is_unaligned (bool u); + bool is_unaligned () const { return d_is_unaligned; } + + /*! * \brief Tell the scheduler \p how_many_items of input stream \p which_input were consumed. */ void consume (int which_input, int how_many_items); @@ -231,6 +256,9 @@ class GR_CORE_API gr_block : public gr_basic_block { private: int d_output_multiple; + bool d_output_multiple_set; + int d_unaligned; + bool d_is_unaligned; double d_relative_rate; // approx output_rate / input_rate gr_block_detail_sptr d_detail; // implementation details unsigned d_history; diff --git a/gnuradio-core/src/lib/runtime/gr_block_executor.cc b/gnuradio-core/src/lib/runtime/gr_block_executor.cc index ef53baf78..86289695a 100644 --- a/gnuradio-core/src/lib/runtime/gr_block_executor.cc +++ b/gnuradio-core/src/lib/runtime/gr_block_executor.cc @@ -183,6 +183,8 @@ gr_block_executor::run_one_iteration() int noutput_items; int max_items_avail; int max_noutput_items = d_max_noutput_items; + int new_alignment=0; + int alignment_state=-1; gr_block *m = d_block.get(); gr_block_detail *d = m->detail().get(); @@ -307,7 +309,11 @@ gr_block_executor::run_one_iteration() // try to work it forward starting with max_items_avail. // We want to try to consume all the input we've got. int reqd_noutput_items = m->fixed_rate_ninput_to_noutput(max_items_avail); - reqd_noutput_items = round_up(reqd_noutput_items, m->output_multiple()); + + // only test this if we specifically set the output_multiple + if(m->output_multiple_set()) + reqd_noutput_items = round_down(reqd_noutput_items, m->output_multiple()); + if (reqd_noutput_items > 0 && reqd_noutput_items <= noutput_items) noutput_items = reqd_noutput_items; @@ -316,6 +322,41 @@ gr_block_executor::run_one_iteration() } noutput_items = std::min(noutput_items, max_noutput_items); + // Check if we're still unaligned; use up items until we're + // aligned again. Otherwise, make sure we set the alignment + // requirement. + if(!m->output_multiple_set()) { + if(m->is_unaligned()) { + // When unaligned, don't just set noutput_items to the remaining + // samples to meet alignment; this causes too much overhead in + // requiring a premature call back here. Set the maximum amount + // of samples to handle unalignment and get us back aligned. + if(noutput_items >= m->unaligned()) { + noutput_items = round_up(noutput_items, m->alignment()) \ + - (m->alignment() - m->unaligned()); + new_alignment = 0; + } + else { + new_alignment = m->unaligned() - noutput_items; + } + alignment_state = 0; + } + else if(noutput_items < m->alignment()) { + // if we don't have enough for an aligned call, keep track of + // misalignment, set unaligned flag, and proceed. + new_alignment = m->alignment() - noutput_items; + m->set_unaligned(new_alignment); + m->set_is_unaligned(true); + alignment_state = 1; + } + else { + // enough to round down to the nearest alignment and process. + noutput_items = round_down(noutput_items, m->alignment()); + m->set_is_unaligned(false); + alignment_state = 2; + } + } + // ask the block how much input they need to produce noutput_items m->forecast (noutput_items, d_ninput_items_required); @@ -354,6 +395,12 @@ gr_block_executor::run_one_iteration() goto were_done; } + // If we were made unaligned in this round but return here without + // processing; reset the unalignment claim before next entry. + if(alignment_state == 1) { + m->set_unaligned(0); + m->set_is_unaligned(false); + } return BLKD_IN; } @@ -379,6 +426,12 @@ gr_block_executor::run_one_iteration() LOG(*d_log << " general_work: noutput_items = " << noutput_items << " result = " << n << std::endl); + // Adjust number of unaligned items left to process + if(m->is_unaligned()) { + m->set_unaligned(new_alignment); + m->set_is_unaligned(m->unaligned() != 0); + } + if(!propagate_tags(m->tag_propagation_policy(), d, d_start_nitems_read, m->relative_rate(), d_returned_tags)) diff --git a/gnuradio-core/src/python/gnuradio/gr/qa_add_and_friends.py b/gnuradio-core/src/python/gnuradio/gr/qa_add_and_friends.py index 8fb70fb3f..e3b20c3c3 100755 --- a/gnuradio-core/src/python/gnuradio/gr/qa_add_and_friends.py +++ b/gnuradio-core/src/python/gnuradio/gr/qa_add_and_friends.py @@ -78,6 +78,24 @@ class test_add_and_friends (gr_unittest.TestCase): op = gr.multiply_const_ii (5) self.help_ii ((src_data,), expected_result, op) + def test_mult_const_ff (self): + src_data = (-1, 0, 1, 2, 3) + expected_result = (-5, 0, 5, 10, 15) + op = gr.multiply_const_cc (5) + self.help_cc ((src_data,), expected_result, op) + + def test_mult_const_cc (self): + src_data = (-1-1j, 0+0j, 1+1j, 2+2j, 3+3j) + expected_result = (-5-5j, 0+0j, 5+5j, 10+10j, 15+15j) + op = gr.multiply_const_cc (5) + self.help_cc ((src_data,), expected_result, op) + + def test_mult_const_cc2 (self): + src_data = (-1-1j, 0+0j, 1+1j, 2+2j, 3+3j) + expected_result = (-3-7j, 0+0j, 3+7j, 6+14j, 9+21j) + op = gr.multiply_const_cc (5+2j) + self.help_cc ((src_data,), expected_result, op) + def test_add_ii (self): src1_data = (1, 2, 3, 4, 5) src2_data = (8, -3, 4, 8, 2) @@ -94,6 +112,22 @@ class test_add_and_friends (gr_unittest.TestCase): self.help_ii ((src1_data, src2_data), expected_result, op) + def test_mult_ff (self): + src1_data = (1, 2, 3, 4, 5) + src2_data = (8, -3, 4, 8, 2) + expected_result = (8, -6, 12, 32, 10) + op = gr.multiply_ff () + self.help_ff ((src1_data, src2_data), + expected_result, op) + + def test_mult_cc (self): + src1_data = (1+1j, 2+2j, 3+3j, 4+4j, 5+5j) + src2_data = (8, -3, 4, 8, 2) + expected_result = (8+8j, -6-6j, 12+12j, 32+32j, 10+10j) + op = gr.multiply_cc () + self.help_cc ((src1_data, src2_data), + expected_result, op) + def test_sub_ii_1 (self): src1_data = (1, 2, 3, 4, 5) expected_result = (-1, -2, -3, -4, -5) diff --git a/gnuradio-core/src/python/gnuradio/gr/qa_complex_to_xxx.py b/gnuradio-core/src/python/gnuradio/gr/qa_complex_to_xxx.py index 76627247b..01679dc05 100755 --- a/gnuradio-core/src/python/gnuradio/gr/qa_complex_to_xxx.py +++ b/gnuradio-core/src/python/gnuradio/gr/qa_complex_to_xxx.py @@ -134,7 +134,7 @@ class test_complex_ops (gr_unittest.TestCase): self.tb.run () actual_result = dst.data () - self.assertFloatTuplesAlmostEqual (expected_result, actual_result, 5) + self.assertFloatTuplesAlmostEqual (expected_result, actual_result, 3) if __name__ == '__main__': diff --git a/gnuradio-core/src/python/gnuradio/gr/qa_conjugate.py b/gnuradio-core/src/python/gnuradio/gr/qa_conjugate.py new file mode 100644 index 000000000..c07902a5a --- /dev/null +++ b/gnuradio-core/src/python/gnuradio/gr/qa_conjugate.py @@ -0,0 +1,53 @@ +#!/usr/bin/env python +# +# Copyright 2012 Free Software Foundation, Inc. +# +# This file is part of GNU Radio +# +# GNU Radio is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GNU Radio is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GNU Radio; see the file COPYING. If not, write to +# the Free Software Foundation, Inc., 51 Franklin Street, +# Boston, MA 02110-1301, USA. +# + +from gnuradio import gr, gr_unittest + +class test_conjugate (gr_unittest.TestCase): + + def setUp (self): + self.tb = gr.top_block () + + def tearDown (self): + self.tb = None + + def test_000 (self): + src_data = (-2-2j, -1-1j, -2+2j, -1+1j, + 2-2j, 1-1j, 2+2j, 1+1j, + 0+0j) + + exp_data = (-2+2j, -1+1j, -2-2j, -1-1j, + 2+2j, 1+1j, 2-2j, 1-1j, + 0-0j) + + src = gr.vector_source_c(src_data) + op = gr.conjugate_cc () + dst = gr.vector_sink_c () + + self.tb.connect(src, op) + self.tb.connect(op, dst) + self.tb.run() + result_data = dst.data () + self.assertEqual (exp_data, result_data) + +if __name__ == '__main__': + gr_unittest.run(test_conjugate, "test_conjugate.xml") diff --git a/gnuradio-core/src/python/gnuradio/gr/qa_float_to_char.py b/gnuradio-core/src/python/gnuradio/gr/qa_float_to_char.py new file mode 100755 index 000000000..ecdd36228 --- /dev/null +++ b/gnuradio-core/src/python/gnuradio/gr/qa_float_to_char.py @@ -0,0 +1,82 @@ +#!/usr/bin/env python +# +# Copyright 2011,2012 Free Software Foundation, Inc. +# +# This file is part of GNU Radio +# +# GNU Radio is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GNU Radio is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GNU Radio; see the file COPYING. If not, write to +# the Free Software Foundation, Inc., 51 Franklin Street, +# Boston, MA 02110-1301, USA. +# + +from gnuradio import gr, gr_unittest +class test_float_to_char (gr_unittest.TestCase): + + def setUp (self): + self.tb = gr.top_block () + + def tearDown (self): + self.tb = None + + def test_001(self): + + src_data = (0.0, 1.1, 2.2, 3.3, 4.4, 5.5, -1.1, -2.2, -3.3) + expected_result = [0, 1, 2, 3, 4, 5, 255, 254, 253] + src = gr.vector_source_f(src_data) + op = gr.float_to_char() + dst = gr.vector_sink_b() + + self.tb.connect(src, op, dst) + self.tb.run() + result_data = list(dst.data()) + + self.assertEqual(expected_result, result_data) + + def test_002(self): + + src_data = ( 126.0, 127.0, 128.0) + expected_result = [ 126, 127, 127 ] + + src = gr.vector_source_f(src_data) + op = gr.float_to_char() + # Note: vector_sink_b returns uchar + dst = gr.vector_sink_b() + + self.tb.connect(src, op, dst) + self.tb.run() + result_data = list(dst.data()) + + self.assertEqual(expected_result, result_data) + + def test_003(self): + + scale = 2 + vlen = 3 + src_data = (0.0, 1.1, 2.2, 3.3, 4.4, 5.5, -1.1, -2.2, -3.3) + expected_result = [0, 2, 4, 6, 8, 11, 254, 252, 250] + src = gr.vector_source_f(src_data) + s2v = gr.stream_to_vector(gr.sizeof_float, vlen) + op = gr.float_to_char(vlen, scale) + v2s = gr.vector_to_stream(gr.sizeof_char, vlen) + dst = gr.vector_sink_b() + + self.tb.connect(src, s2v, op, v2s, dst) + self.tb.run() + result_data = list(dst.data()) + + self.assertEqual(expected_result, result_data) + +if __name__ == '__main__': + gr_unittest.run(test_float_to_char, "test_float_to_char.xml") + diff --git a/gnuradio-core/src/python/gnuradio/gr/qa_float_to_int.py b/gnuradio-core/src/python/gnuradio/gr/qa_float_to_int.py index 3e0b847a2..977a8518d 100644..100755 --- a/gnuradio-core/src/python/gnuradio/gr/qa_float_to_int.py +++ b/gnuradio-core/src/python/gnuradio/gr/qa_float_to_int.py @@ -33,7 +33,8 @@ class test_float_to_int (gr_unittest.TestCase): def test_001(self): src_data = (0.0, 1.1, 2.2, 3.3, 4.4, 5.5, -1.1, -2.2, -3.3, -4.4, -5.5) - expected_result = [int(round(s)) for s in src_data] + expected_result = [0, 1, 2, 3, 4, 6, -1, -2, -3, -4, -6] + src = gr.vector_source_f(src_data) op = gr.float_to_int() dst = gr.vector_sink_i() @@ -60,6 +61,25 @@ class test_float_to_int (gr_unittest.TestCase): self.assertEqual(expected_result, result_data) + + def test_003(self): + + scale = 2 + vlen = 3 + src_data = (0.0, 1.1, 2.2, 3.3, 4.4, 5.5, -1.1, -2.2, -3.3) + expected_result = [0, 2, 4, 7, 9, 11, -2, -4, -7,] + src = gr.vector_source_f(src_data) + s2v = gr.stream_to_vector(gr.sizeof_float, vlen) + op = gr.float_to_int(vlen, scale) + v2s = gr.vector_to_stream(gr.sizeof_int, vlen) + dst = gr.vector_sink_i() + + self.tb.connect(src, s2v, op, v2s, dst) + self.tb.run() + result_data = list(dst.data()) + + self.assertEqual(expected_result, result_data) + if __name__ == '__main__': gr_unittest.run(test_float_to_int, "test_float_to_int.xml") diff --git a/gnuradio-core/src/python/gnuradio/gr/qa_float_to_short.py b/gnuradio-core/src/python/gnuradio/gr/qa_float_to_short.py new file mode 100755 index 000000000..0d89a149c --- /dev/null +++ b/gnuradio-core/src/python/gnuradio/gr/qa_float_to_short.py @@ -0,0 +1,86 @@ +#!/usr/bin/env python +# +# Copyright 2011,2012 Free Software Foundation, Inc. +# +# This file is part of GNU Radio +# +# GNU Radio is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GNU Radio is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GNU Radio; see the file COPYING. If not, write to +# the Free Software Foundation, Inc., 51 Franklin Street, +# Boston, MA 02110-1301, USA. +# + +from gnuradio import gr, gr_unittest +import ctypes + +class test_float_to_short (gr_unittest.TestCase): + + def setUp (self): + self.tb = gr.top_block () + + def tearDown (self): + self.tb = None + + def test_001(self): + + src_data = (0.0, 1.1, 2.2, 3.3, 4.4, 5.5, -1.1, -2.2, -3.3, -4.4, -5.5) + expected_result = [0, 1, 2, 3, 4, 6, -1, -2, -3, -4, -6] + + src = gr.vector_source_f(src_data) + op = gr.float_to_short() + dst = gr.vector_sink_s() + + self.tb.connect(src, op, dst) + self.tb.run() + result_data = list(dst.data()) + + self.assertEqual(expected_result, result_data) + + def test_002(self): + + src_data = ( 32766, 32767, 32768, + -32767, -32768, -32769) + expected_result = [ 32766, 32767, 32767, + -32767, -32768, -32768 ] + + src = gr.vector_source_f(src_data) + op = gr.float_to_short() + dst = gr.vector_sink_s() + + self.tb.connect(src, op, dst) + self.tb.run() + result_data = list(dst.data()) + + self.assertEqual(expected_result, result_data) + + def test_003(self): + + scale = 2 + vlen = 3 + src_data = (0.0, 1.1, 2.2, 3.3, 4.4, 5.5, -1.1, -2.2, -3.3) + expected_result = [0, 2, 4, 7, 9, 11, -2, -4, -7] + src = gr.vector_source_f(src_data) + s2v = gr.stream_to_vector(gr.sizeof_float, vlen) + op = gr.float_to_short(vlen, scale) + v2s = gr.vector_to_stream(gr.sizeof_short, vlen) + dst = gr.vector_sink_s() + + self.tb.connect(src, s2v, op, v2s, dst) + self.tb.run() + result_data = list(dst.data()) + + self.assertEqual(expected_result, result_data) + +if __name__ == '__main__': + gr_unittest.run(test_float_to_short, "test_float_to_short.xml") + diff --git a/gnuradio-core/src/python/gnuradio/gr/qa_float_to_uchar.py b/gnuradio-core/src/python/gnuradio/gr/qa_float_to_uchar.py new file mode 100755 index 000000000..0d54f45f3 --- /dev/null +++ b/gnuradio-core/src/python/gnuradio/gr/qa_float_to_uchar.py @@ -0,0 +1,64 @@ +#!/usr/bin/env python +# +# Copyright 2011 Free Software Foundation, Inc. +# +# This file is part of GNU Radio +# +# GNU Radio is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GNU Radio is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GNU Radio; see the file COPYING. If not, write to +# the Free Software Foundation, Inc., 51 Franklin Street, +# Boston, MA 02110-1301, USA. +# + +from gnuradio import gr, gr_unittest +import ctypes + +class test_float_to_uchar (gr_unittest.TestCase): + + def setUp (self): + self.tb = gr.top_block () + + def tearDown (self): + self.tb = None + + def test_001(self): + + src_data = (0.0, 1.1, 2.2, 3.3, 4.4, 5.5, -1.1, -2.2, -3.3, -4.4, -5.5) + expected_result = [0, 1, 2, 3, 4, 6, 0, 0, 0, 0, 0] + src = gr.vector_source_f(src_data) + op = gr.float_to_uchar() + dst = gr.vector_sink_b() + + self.tb.connect(src, op, dst) + self.tb.run() + result_data = list(dst.data()) + + self.assertEqual(expected_result, result_data) + + def test_002(self): + + src_data = ( 254.0, 255.0, 256.0) + expected_result = [ 254, 255, 255 ] + src = gr.vector_source_f(src_data) + op = gr.float_to_uchar() + dst = gr.vector_sink_b() + + self.tb.connect(src, op, dst) + self.tb.run() + result_data = list(dst.data()) + + self.assertEqual(expected_result, result_data) + +if __name__ == '__main__': + gr_unittest.run(test_float_to_uchar, "test_float_to_uchar.xml") + diff --git a/gnuradio-core/src/python/gnuradio/gr/qa_int_to_float.py b/gnuradio-core/src/python/gnuradio/gr/qa_int_to_float.py index edfc26409..530b2a5cc 100755 --- a/gnuradio-core/src/python/gnuradio/gr/qa_int_to_float.py +++ b/gnuradio-core/src/python/gnuradio/gr/qa_int_to_float.py @@ -44,6 +44,26 @@ class test_int_to_float (gr_unittest.TestCase): self.assertFloatTuplesAlmostEqual(expected_result, result_data) + def test_002(self): + + vlen = 3 + src_data = ( 65000, 65001, 65002, 65003, 65004, 65005, + -65001, -65002, -65003) + expected_result = [ 65000.0, 65001.0, 65002.0, + 65003.0, 65004.0, 65005.0, + -65001.0, -65002.0, -65003.0] + src = gr.vector_source_i(src_data) + s2v = gr.stream_to_vector(gr.sizeof_int, vlen) + op = gr.int_to_float(vlen) + v2s = gr.vector_to_stream(gr.sizeof_float, vlen) + dst = gr.vector_sink_f() + + self.tb.connect(src, s2v, op, v2s, dst) + self.tb.run() + result_data = list(dst.data()) + + self.assertEqual(expected_result, result_data) + if __name__ == '__main__': gr_unittest.run(test_int_to_float, "test_int_to_float.xml") diff --git a/gnuradio-core/src/python/gnuradio/gr/qa_multiply_conjugate.py b/gnuradio-core/src/python/gnuradio/gr/qa_multiply_conjugate.py new file mode 100644 index 000000000..aaf3cc125 --- /dev/null +++ b/gnuradio-core/src/python/gnuradio/gr/qa_multiply_conjugate.py @@ -0,0 +1,57 @@ +#!/usr/bin/env python +# +# Copyright 2012 Free Software Foundation, Inc. +# +# This file is part of GNU Radio +# +# GNU Radio is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GNU Radio is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GNU Radio; see the file COPYING. If not, write to +# the Free Software Foundation, Inc., 51 Franklin Street, +# Boston, MA 02110-1301, USA. +# + +from gnuradio import gr, gr_unittest + +class test_multiply_conjugate (gr_unittest.TestCase): + + def setUp (self): + self.tb = gr.top_block () + + def tearDown (self): + self.tb = None + + def test_000 (self): + src_data0 = (-2-2j, -1-1j, -2+2j, -1+1j, + 2-2j, 1-1j, 2+2j, 1+1j, + 0+0j) + src_data1 = (-3-3j, -4-4j, -3+3j, -4+4j, + 3-3j, 4-4j, 3+3j, 4+4j, + 0+0j) + + exp_data = (12+0j, 8+0j, 12+0j, 8+0j, + 12+0j, 8+0j, 12+0j, 8+0j, + 0+0j) + src0 = gr.vector_source_c(src_data0) + src1 = gr.vector_source_c(src_data1) + op = gr.multiply_conjugate_cc () + dst = gr.vector_sink_c () + + self.tb.connect(src0, (op,0)) + self.tb.connect(src1, (op,1)) + self.tb.connect(op, dst) + self.tb.run() + result_data = dst.data () + self.assertEqual (exp_data, result_data) + +if __name__ == '__main__': + gr_unittest.run(test_multiply_conjugate, "test_multiply_conjugate.xml") diff --git a/gnuradio-core/src/python/gnuradio/gr/qa_short_to_char.py b/gnuradio-core/src/python/gnuradio/gr/qa_short_to_char.py new file mode 100755 index 000000000..6a95fa01d --- /dev/null +++ b/gnuradio-core/src/python/gnuradio/gr/qa_short_to_char.py @@ -0,0 +1,69 @@ +#!/usr/bin/env python +# +# Copyright 2011,2012 Free Software Foundation, Inc. +# +# This file is part of GNU Radio +# +# GNU Radio is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GNU Radio is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GNU Radio; see the file COPYING. If not, write to +# the Free Software Foundation, Inc., 51 Franklin Street, +# Boston, MA 02110-1301, USA. +# + +from gnuradio import gr, gr_unittest +import ctypes + +class test_short_to_char (gr_unittest.TestCase): + + def setUp (self): + self.tb = gr.top_block () + + def tearDown (self): + self.tb = None + + def test_001(self): + + src_data = range(0, 32767, 32767/127) + src_data = [int(s) for s in src_data] + expected_result = range(0, 128) + src = gr.vector_source_s(src_data) + op = gr.short_to_char() + dst = gr.vector_sink_b() + + self.tb.connect(src, op, dst) + self.tb.run() + result_data = list(dst.data()) + + self.assertEqual(expected_result, result_data) + + def test_002(self): + + vlen = 3 + src_data = range(0, 32400, 32767/127) + src_data = [int(s) for s in src_data] + expected_result = range(0, 126) + src = gr.vector_source_s(src_data) + s2v = gr.stream_to_vector(gr.sizeof_short, vlen) + op = gr.short_to_char(vlen) + v2s = gr.vector_to_stream(gr.sizeof_char, vlen) + dst = gr.vector_sink_b() + + self.tb.connect(src, s2v, op, v2s, dst) + self.tb.run() + result_data = list(dst.data()) + + self.assertEqual(expected_result, result_data) + +if __name__ == '__main__': + gr_unittest.run(test_short_to_char, "test_short_to_char.xml") + diff --git a/gnuradio-core/src/python/gnuradio/gr/qa_short_to_float.py b/gnuradio-core/src/python/gnuradio/gr/qa_short_to_float.py new file mode 100755 index 000000000..8f331b495 --- /dev/null +++ b/gnuradio-core/src/python/gnuradio/gr/qa_short_to_float.py @@ -0,0 +1,70 @@ +#!/usr/bin/env python +# +# Copyright 2011,2012 Free Software Foundation, Inc. +# +# This file is part of GNU Radio +# +# GNU Radio is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GNU Radio is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GNU Radio; see the file COPYING. If not, write to +# the Free Software Foundation, Inc., 51 Franklin Street, +# Boston, MA 02110-1301, USA. +# + +from gnuradio import gr, gr_unittest +import ctypes + +class test_short_to_float (gr_unittest.TestCase): + + def setUp (self): + self.tb = gr.top_block () + + def tearDown (self): + self.tb = None + + def test_001(self): + + src_data = (0, 1, 2, 3, 4, 5, -1, -2, -3, -4, -5) + expected_result = [ 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, + -1.0, -2.0, -3.0, -4.0, -5.0] + + src = gr.vector_source_s(src_data) + op = gr.short_to_float() + dst = gr.vector_sink_f() + + self.tb.connect(src, op, dst) + self.tb.run() + result_data = list(dst.data()) + + self.assertEqual(expected_result, result_data) + + def test_002(self): + + vlen = 3 + src_data = (0, 1, 2, 3, 4, 5, -1, -2, -3) + expected_result = [0.0, 1.0, 2.0, 3.0, 4.0, + 5.0, -1.0, -2.0, -3.0] + src = gr.vector_source_s(src_data) + s2v = gr.stream_to_vector(gr.sizeof_short, vlen) + op = gr.short_to_float(vlen) + v2s = gr.vector_to_stream(gr.sizeof_float, vlen) + dst = gr.vector_sink_f() + + self.tb.connect(src, s2v, op, v2s, dst) + self.tb.run() + result_data = list(dst.data()) + + self.assertEqual(expected_result, result_data) + +if __name__ == '__main__': + gr_unittest.run(test_short_to_float, "test_short_to_float.xml") + diff --git a/gnuradio-examples/python/volk_benchmark/README b/gnuradio-examples/python/volk_benchmark/README new file mode 100644 index 000000000..516fc15bd --- /dev/null +++ b/gnuradio-examples/python/volk_benchmark/README @@ -0,0 +1,252 @@ +VOLK Benchmarking Scripts + +The Python programs in this directory are designed to help benchmark +and compare Volk enhancements to GNU Radio. There are two kinds of +scripts here: collecting data and displaying the data. + +Data collection is done by running a Volk testing script that will +populate a SQLite database file (volk_results.db by default). The +plotting utility provided here reads from the database files and plots +bar graphs to compare the different installations. + +These benchmarks can be used to compare previous versions of GNU +Radio to using Volk; they can be used to compare different Volk +proto-kernels, as well, by editing the volk_config file; or they could +be used to compare performance between different machines and/or +processors. + + +====================================================================== +Volk Profiling + +Before doing any kind of Volk benchmarking, it is important to run the +volk_profile program. The profiler will build a config file for the +best SIMD architecture for your processor. Run volk_profile that is +installed into $PREFIX/bin. This program tests all known Volk kernels +for each proto-kernel supported by the processor. When finished, it +will write to $HOME/.volk/volk_config the best architecture for the +VOLK function. This file is read when using a function to know the +best version of the function to execute. + +The volk_config file contains a line for each kernel, where each line +looks like: + + volk_<KERNEL_NAME> <ARCHITECTURE> + +The architecture will be something like (sse, sse2, sse3, avx, neon, +etc.), depending on your processor. + + +====================================================================== +Benchmark Tests + +There are currently two benchmark scripts defined for collecting +data. There is one that runs through the type conversions that have +been converted to Volk (volk_types.py) and the other runs through the +math operators converted to using Volk (volk_math.py). + +Script prototypes +Both have the same structure for use: + +---------------------------------------------------------------------- +./volk_<test>.py [-h] -L LABEL [-D DATABASE] [-N NITEMS] [-I ITERATIONS] + [--tests [{0,1,2,3} [{0,1,2,3} ...]]] [--list] + [--all] + +optional arguments: + -h, --help show this help message and exit + -L LABEL, --label LABEL + Label of database table [default: None] + -D DATABASE, --database DATABASE + Database file to store data in [default: + volk_results.db] + -N NITEMS, --nitems NITEMS + Number of items per iterations [default: 1000000000.0] + -I ITERATIONS, --iterations ITERATIONS + Number of iterations [default: 20] + --tests [{0,1,2,3} [{0,1,2,3} ...]] + A list of tests to run; can be a single test or a + space-separated list. + --list List the available tests + --all Run all tests +---------------------------------------------------------------------- + +To run, you specify the tests to run and a label to store along with +the results. To find out what the available tests are, use the +'--list' option. + +To specify a subset of tests, use the '--tests' with space-separated +list of tests numbers (e.g., --tests 0 2 4 9). + +Use the '--all' to run all tests. + +The label specified is used as an identifier for the benchmarking +currently being done. This is required as it is important in +organizing the data in the database (each label is its own +table). Usually, the label will specify the type of run being done, +such as "volk_aligned" or "v3_5_1". In these cases, the "volk_aligned" +label says that this is for a benchmarking using the GNU Radio version +that uses the aligned scheduler and Volk calls in the work +functions. The "v3_5_1" label is if you were benchmarking an installed +version 3.5.1 of GNU Radio, which is pre-Volk. These will then be +plotted against each other to see the timing differences. + +The 'database' option will output the results to a new database +file. This can be useful for separating the output of different runs +or of different benchmarks, such as the types versus the math scripts, +say, or to distinguish results from different computers. + +If rerun using the same database and label, the entries in the table +will simply be replaced by the new results. + +It is often useful to use the 'sqlitebrowser' program to interrogate +the database file farther, if you are interested in the structure or +the raw data. + +Other parameters of this script set the number of items to process and +number of iterations to use when computing the benchmarking +data. These default to 1 billion samples per iteration over 20 +iterations. Expect a default run to take a long time. Using the '-N' +and '-I' options can be used to change the runtime of the benchmarks +but are set high to remove problems of variance between iterations. + +====================================================================== +Plotting Results + +The volk_plot.py script reads a given database file and plots the +results. The default behavior is to read all of the labels stored in +the database and plot them as data sets on a bar graph. This shows the +average time taken to process the number of items given. + +The options for the plotting script are: + +usage: volk_plot.py [-h] [-D DATABASE] [-E] [-P {mean,min,max}] [-% table] + +Plot Volk performance results from a SQLite database. Run one of the volk +tests first (e.g, volk_math.py) + +---------------------------------------------------------------------- +optional arguments: + -h, --help show this help message and exit + -D DATABASE, --database DATABASE + Database file to read data from [default: + volk_results.db] + -E, --errorbars Show error bars (1 standard dev.) + -P {mean,min,max}, --plot {mean,min,max} + Set the type of plot to produce [default: mean] + -% table, --percent table + Show percent difference to the given type [default: + None] +---------------------------------------------------------------------- + +This script allows you to specify the database used (-D), but will +always read all rows from all tables from it and display them. You can +also turn on plotting error bars (1 standard deviation the mean). Be +careful, though, as some older versions of Matplotlib might have an +issue with this option. + +The mean time is only one possible statistic that we might be +interested in when looking at the data. It represents the average user +experience when running a given block. On the other hand, the minimum +runtime best represents the actual performance of a block given +minimal OS interruptions while running. Right now, the data collected +includes the mean, variance, min, and max over the number of +iterations given. Using the '-P' option, you can specify the type of +data to plot (mean, min, or max). + +Another useful way of looking at the data is to compare the percent +improvement of a benchmark compared to another. This is done using the +'-%' option with the provided table (or label) as the baseline. So if +we were interested in comparing how much the 'volk_aligned' was over +'v3_5_1', we would specify '-% v3_5_1' to see this. The plot would +then only show the percent speedup observed using Volk for each of the +blocks. + + +====================================================================== +Benchmarking Walkthrough + +This will walk through an example of benchmarking the new Volk +implementation versus the pre-Volk GNU Radio. It also shows how to +look at the SIMD optimized versions versus the generic +implementations. + +Since we introduced Volk in GNU Radio 3.5.2, we will use the following +labels for our data: + + 1.) volk_aligned: v3.5.2 with volk_profile results in .volk/volk_config + 2.) v3_5_2: v3.5.2 with the generic (non-SIMD) calls to Volk + 3.) v3_5_1: an installation of GNU Radio from version v3.5.1 + +We assume that we have installed two versions of GNU Radio. + + v3.5.2 installed into /opt/gr-3_5_2 + v3.5.1 installed into /opt/gr-3_5_1 + +To test cases 1 and 2 above, we have to run GNU Radio from the v3.5.2 +installation, so we set the following environmental variables. Note +that this is written for Ubuntu 11.10. These commands and directories +may have to be changed depending on your OS and versions. + + export LD_LIBRARY_PATH=/opt/gr-3_5_2/lib + export LD_LIBRARY_PATH=/opt/gr-3_5_2/lib/python2.7/dist-packages + +Now we can run the benchmark tests, so we will focus on the math +operators: + + ./volk_math.py -D volk_results_math.db --all -L volk_aligned + +When this finishes, the 'volk_results_math.db' will contain our +results for this run. + +We next want to run the generic, non-SIMD, calls. This can be done by +changing the Volk kernel settings in $HOME/.volk/volk_config. First, +make a backup of this file. Then edit it and change all architecture +calls (sse, sse2, etc.) to 'generic.' Now, Volk will only call the +generic versions of these functions. So we rerun the benchmark with: + + ./volk_math.py -D volk_results_math.db --all -L v3_5_2 + +Notice that the only thing changed here was the label to 'v3_5_2'. + +Next, we want to collect data for the non-Volk version of GNU +Radio. This is important because some internals to GNU Radio were made +when adding support for Volk, so it is nice to know what the +differences do to our performance. First, we set the environmental +variables to point to the v3.5.1 installation: + + export LD_LIBRARY_PATH=/opt/gr-3_5_1/lib + export LD_LIBRARY_PATH=/opt/gr-3_5_1/lib/python2.7/dist-packages + +And when we run the test, we use the same command line, but the GNU +Radio libraries and Python files used come from v3.5.1. We also change +the label to indicate the different version to store. + + ./volk_math.py -D volk_results_math.db --all -L v3_5_1 + +We now have a database populated with three tables for the three +different labels. We can plot them all together by simply running: + + ./volk_plot.py -D volk_results_math.db + +This will show the average run times for each of the three +configurations for all math functions tested. We might also be +interested to see the difference in performance from the v3.5.1 +version, so we can run: + + ./volk_plot.py -D volk_results_math.db -% v3_5_1 + +That will plot both the 'volk_aligned' and 'v3_5_2' as a percentage +improvement over v3_5_1. A positive value indicates that this version +runs faster than the v3.5.1 version. + + +---------------------------------------------------------------------- + +Another interesting test case could be to compare results on different +processors. So if you have different generation Intels, AMD, or +whatever, you can simply pass the .db file around and run the Volk +benchmark script to populate the database with different results. For +this, you would specify a label like '-L i7_2620M' that indicates the +processor type to uniquely ID the data. + diff --git a/gnuradio-examples/python/volk_benchmark/volk_math.py b/gnuradio-examples/python/volk_benchmark/volk_math.py new file mode 100755 index 000000000..8b0081387 --- /dev/null +++ b/gnuradio-examples/python/volk_benchmark/volk_math.py @@ -0,0 +1,152 @@ +#!/usr/bin/env python + +from gnuradio import gr +import argparse +from volk_test_funcs import * + +def multiply_const_cc(N): + k = 3.3 + op = gr.multiply_const_cc(k) + tb = helper(N, op, gr.sizeof_gr_complex, gr.sizeof_gr_complex, 1, 1) + return tb + +###################################################################### + +def multiply_const_ff(N): + k = 3.3 + op = gr.multiply_const_ff(k) + tb = helper(N, op, gr.sizeof_float, gr.sizeof_float, 1, 1) + return tb + +###################################################################### + +def multiply_cc(N): + op = gr.multiply_cc() + tb = helper(N, op, gr.sizeof_gr_complex, gr.sizeof_gr_complex, 2, 1) + return tb + +###################################################################### + +def multiply_ff(N): + op = gr.multiply_ff() + tb = helper(N, op, gr.sizeof_float, gr.sizeof_float, 2, 1) + return tb + +###################################################################### + +def add_ff(N): + op = gr.add_ff() + tb = helper(N, op, gr.sizeof_float, gr.sizeof_float, 2, 1) + return tb + +###################################################################### + +def conjugate_cc(N): + op = gr.conjugate_cc() + tb = helper(N, op, gr.sizeof_gr_complex, gr.sizeof_gr_complex, 1, 1) + return tb + +###################################################################### + +def multiply_conjugate_cc(N): + try: + op = gr.multiply_conjugate_cc() + tb = helper(N, op, gr.sizeof_gr_complex, gr.sizeof_gr_complex, 2, 1) + return tb + + except AttributeError: + class s(gr.hier_block2): + def __init__(self): + gr.hier_block2.__init__(self, "s", + gr.io_signature(2, 2, gr.sizeof_gr_complex), + gr.io_signature(1, 1, gr.sizeof_gr_complex)) + conj = gr.conjugate_cc() + mult = gr.multiply_cc() + self.connect((self,0), (mult,0)) + self.connect((self,1), conj, (mult,1)) + self.connect(mult, self) + + op = s() + tb = helper(N, op, gr.sizeof_gr_complex, gr.sizeof_gr_complex, 2, 1) + return tb + + +###################################################################### + +def run_tests(func, N, iters): + print("Running Test: {0}".format(func.__name__)) + try: + tb = func(N) + t = timeit(tb, iters) + res = format_results(func.__name__, t) + return res + except AttributeError: + print "\tCould not run test. Skipping." + return None + +def main(): + avail_tests = [multiply_const_cc, + multiply_const_ff, + multiply_cc, + multiply_ff, + add_ff, + conjugate_cc, + multiply_conjugate_cc] + + desc='Time an operation to compare with other implementations. \ + This program runs a simple GNU Radio flowgraph to test a \ + particular math function, mostly to compare the \ + Volk-optimized implementation versus a regular \ + implementation. The results are stored to an SQLite database \ + that can then be read by volk_plot.py to plot the differences.' + parser = argparse.ArgumentParser(description=desc) + parser.add_argument('-L', '--label', type=str, + required=True, default=None, + help='Label of database table [default: %(default)s]') + parser.add_argument('-D', '--database', type=str, + default="volk_results.db", + help='Database file to store data in [default: %(default)s]') + parser.add_argument('-N', '--nitems', type=float, + default=1e9, + help='Number of items per iterations [default: %(default)s]') + parser.add_argument('-I', '--iterations', type=int, + default=20, + help='Number of iterations [default: %(default)s]') + parser.add_argument('--tests', type=int, nargs='*', + choices=xrange(len(avail_tests)), + help='A list of tests to run; can be a single test or a \ + space-separated list.') + parser.add_argument('--list', action='store_true', + help='List the available tests') + parser.add_argument('--all', action='store_true', + help='Run all tests') + args = parser.parse_args() + + if(args.list): + print "Available Tests to Run:" + print "\n".join(["\t{0}: {1}".format(i,f.__name__) for i,f in enumerate(avail_tests)]) + sys.exit(0) + + N = int(args.nitems) + iters = args.iterations + label = args.label + + conn = create_connection(args.database) + new_table(conn, label) + + if not args.all: + func = avail_tests[args.test] + res = run_tests(func, N, iters) + if res is not None: + replace_results(conn, label, N, iters, res) + else: + for f in avail_tests: + res = run_tests(f, N, iters) + if res is not None: + replace_results(conn, label, N, iters, res) + +if __name__ == "__main__": + try: + main() + except KeyboardInterrupt: + pass diff --git a/gnuradio-examples/python/volk_benchmark/volk_plot.py b/gnuradio-examples/python/volk_benchmark/volk_plot.py new file mode 100755 index 000000000..823dfbf64 --- /dev/null +++ b/gnuradio-examples/python/volk_benchmark/volk_plot.py @@ -0,0 +1,169 @@ +#!/usr/bin/env python + +import sys, math +import argparse +from volk_test_funcs import * + +try: + import matplotlib + import matplotlib.pyplot as plt +except ImportError: + sys.stderr.write("Could not import Matplotlib (http://matplotlib.sourceforge.net/)\n") + sys.exit(1) + +def main(): + desc='Plot Volk performance results from a SQLite database. ' + \ + 'Run one of the volk tests first (e.g, volk_math.py)' + parser = argparse.ArgumentParser(description=desc) + parser.add_argument('-D', '--database', type=str, + default='volk_results.db', + help='Database file to read data from [default: %(default)s]') + parser.add_argument('-E', '--errorbars', + action='store_true', default=False, + help='Show error bars (1 standard dev.)') + parser.add_argument('-P', '--plot', type=str, + choices=['mean', 'min', 'max'], + default='mean', + help='Set the type of plot to produce [default: %(default)s]') + parser.add_argument('-%', '--percent', type=str, + default=None, metavar="table", + help='Show percent difference to the given type [default: %(default)s]') + args = parser.parse_args() + + # Set up global plotting properties + matplotlib.rcParams['figure.subplot.bottom'] = 0.2 + matplotlib.rcParams['figure.subplot.top'] = 0.95 + matplotlib.rcParams['figure.subplot.right'] = 0.98 + matplotlib.rcParams['ytick.labelsize'] = 16 + matplotlib.rcParams['xtick.labelsize'] = 16 + matplotlib.rcParams['legend.fontsize'] = 18 + + # Get list of tables to compare + conn = create_connection(args.database) + tables = list_tables(conn) + M = len(tables) + + # Colors to distinguish each table in the bar graph + # More than 5 tables will wrap around to the start. + colors = ['b', 'r', 'g', 'm', 'k'] + + # Set up figure for plotting + f0 = plt.figure(0, facecolor='w', figsize=(14,10)) + s0 = f0.add_subplot(1,1,1) + + # Create a register of names that exist in all tables + tmp_regs = [] + for table in tables: + # Get results from the next table + res = get_results(conn, table[0]) + + tmp_regs.append(list()) + for r in res: + try: + tmp_regs[-1].index(r['kernel']) + except ValueError: + tmp_regs[-1].append(r['kernel']) + + # Get only those names that are common in all tables + name_reg = tmp_regs[0] + for t in tmp_regs[1:]: + name_reg = list(set(name_reg) & set(t)) + name_reg.sort() + + # Pull the data out for each table into a dictionary + # we can ref the table by it's name and the data associated + # with a given kernel in name_reg by it's name. + # This ensures there is no sorting issue with the data in the + # dictionary, so the kernels are plotted against each other. + table_data = dict() + for i,table in enumerate(tables): + # Get results from the next table + res = get_results(conn, table[0]) + + data = dict() + for r in res: + data[r['kernel']] = r + + table_data[table[0]] = data + + if args.percent is not None: + for i,t in enumerate(table_data): + if args.percent == t: + norm_data = [] + for name in name_reg: + if(args.plot == 'max'): + norm_data.append(table_data[t][name]['max']) + elif(args.plot == 'min'): + norm_data.append(table_data[t][name]['min']) + elif(args.plot == 'mean'): + norm_data.append(table_data[t][name]['avg']) + + + # Plot the results + x0 = xrange(len(name_reg)) + i = 0 + for t in (table_data): + ydata = [] + stds = [] + for name in name_reg: + stds.append(math.sqrt(table_data[t][name]['var'])) + if(args.plot == 'max'): + ydata.append(table_data[t][name]['max']) + elif(args.plot == 'min'): + ydata.append(table_data[t][name]['min']) + elif(args.plot == 'mean'): + ydata.append(table_data[t][name]['avg']) + + if args.percent is not None: + ydata = [-100*(y-n)/y for y,n in zip(ydata,norm_data)] + if(args.percent != t): + # makes x values for this data set placement + # width of bars depends on number of comparisons + wdth = 0.80/(M-1) + x1 = [x + i*wdth for x in x0] + i += 1 + + s0.bar(x1, ydata, width=wdth, + color=colors[(i-1)%M], label=t, + edgecolor='k', linewidth=2) + + else: + # makes x values for this data set placement + # width of bars depends on number of comparisons + wdth = 0.80/M + x1 = [x + i*wdth for x in x0] + i += 1 + + if(args.errorbars is False): + s0.bar(x1, ydata, width=wdth, + color=colors[(i-1)%M], label=t, + edgecolor='k', linewidth=2) + else: + s0.bar(x1, ydata, width=wdth, + yerr=stds, + color=colors[i%M], label=t, + edgecolor='k', linewidth=2, + error_kw={"ecolor": 'k', "capsize":5, + "linewidth":2}) + + nitems = res[0]['nitems'] + if args.percent is None: + s0.set_ylabel("Processing time (sec) [{0:G} items]".format(nitems), + fontsize=22, fontweight='bold', + horizontalalignment='center') + else: + s0.set_ylabel("% Improvement over {0} [{1:G} items]".format( + args.percent, nitems), + fontsize=22, fontweight='bold') + + s0.legend() + s0.set_xticks(x0) + s0.set_xticklabels(name_reg) + for label in s0.xaxis.get_ticklabels(): + label.set_rotation(45) + label.set_fontsize(16) + + plt.show() + +if __name__ == "__main__": + main() diff --git a/gnuradio-examples/python/volk_benchmark/volk_test_funcs.py b/gnuradio-examples/python/volk_benchmark/volk_test_funcs.py new file mode 100644 index 000000000..4f4e4afd3 --- /dev/null +++ b/gnuradio-examples/python/volk_benchmark/volk_test_funcs.py @@ -0,0 +1,171 @@ +#!/usr/bin/env python + +from gnuradio import gr +import math, sys, os, time + +try: + import scipy +except ImportError: + sys.stderr.write("Unable to import Scipy (www.scipy.org)\n") + sys.exit(1) + +try: + import sqlite3 +except ImportError: + sys.stderr.write("Unable to import sqlite3: requires Python 2.5\n") + sys.exit(1) + +def execute(conn, cmd): + ''' + Executes the command cmd to the database opened in connection conn. + ''' + c = conn.cursor() + c.execute(cmd) + conn.commit() + c.close() + +def create_connection(database): + ''' + Returns a connection object to the SQLite database. + ''' + return sqlite3.connect(database) + +def new_table(conn, tablename): + ''' + Create a new table for results. + All results are in the form: [kernel | nitems | iters | avg. time | variance | max time | min time ] + Each table is meant as a different setting (e.g., volk_aligned, volk_unaligned, etc.) + ''' + cols = "kernel text, nitems int, iters int, avg real, var real, max real, min real" + cmd = "create table if not exists {0} ({1})".format( + tablename, cols) + execute(conn, cmd) + +def replace_results(conn, tablename, nitems, iters, res): + ''' + Inserts or replaces the results 'res' dictionary values into the table. + This deletes all old entries of the kernel in this table. + ''' + cmd = "DELETE FROM {0} where kernel='{1}'".format(tablename, res["kernel"]) + execute(conn, cmd) + insert_results(conn, tablename, nitems, iters, res) + +def insert_results(conn, tablename, nitems, iters, res): + ''' + Inserts the results dictionary values into the table. + ''' + cols = "kernel, nitems, iters, avg, var, max, min" + cmd = "INSERT INTO {0} ({1}) VALUES ('{2}', {3}, {4}, {5}, {6}, {7}, {8})".format( + tablename, cols, res["kernel"], nitems, iters, + res["avg"], res["var"], res["max"], res["min"]) + execute(conn, cmd) + +def list_tables(conn): + ''' + Returns a list of all tables in the database. + ''' + cmd = "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name" + c = conn.cursor() + c.execute(cmd) + t = c.fetchall() + c.close() + + return t + +def get_results(conn, tablename): + ''' + Gets all results in tablename. + ''' + cmd = "SELECT * FROM {0}".format(tablename) + c = conn.cursor() + c.execute(cmd) + fetched = c.fetchall() + c.close() + + res = list() + for f in fetched: + r = dict() + r['kernel'] = f[0] + r['nitems'] = f[1] + r['iters'] = f[2] + r['avg'] = f[3] + r['var'] = f[4] + r['min'] = f[5] + r['max'] = f[6] + res.append(r) + + return res + + +class helper(gr.top_block): + ''' + Helper function to run the tests. The parameters are: + N: number of items to process (int) + op: The GR block/hier_block to test + isizeof: the sizeof the input type + osizeof: the sizeof the output type + nsrcs: number of inputs to the op + nsnks: number of outputs of the op + + This function can only handle blocks where all inputs are the same + datatype and all outputs are the same data type + ''' + def __init__(self, N, op, + isizeof=gr.sizeof_gr_complex, + osizeof=gr.sizeof_gr_complex, + nsrcs=1, nsnks=1): + gr.top_block.__init__(self, "helper") + + self.op = op + self.srcs = [] + self.snks = [] + self.head = gr.head(isizeof, N) + + for n in xrange(nsrcs): + self.srcs.append(gr.null_source(isizeof)) + + for n in xrange(nsnks): + self.snks.append(gr.null_sink(osizeof)) + + self.connect(self.srcs[0], self.head, (self.op,0)) + + for n in xrange(1, nsrcs): + self.connect(self.srcs[n], (self.op,n)) + + for n in xrange(nsnks): + self.connect((self.op,n), self.snks[n]) + +def timeit(tb, iterations): + ''' + Given a top block, this function times it for a number of + iterations and stores the time in a list that is returned. + ''' + r = gr.enable_realtime_scheduling() + if r != gr.RT_OK: + print "Warning: failed to enable realtime scheduling" + + times = [] + for i in xrange(iterations): + start_time = time.time() + tb.run() + end_time = time.time() + tb.head.reset() + + times.append(end_time - start_time) + + return times + +def format_results(kernel, times): + ''' + Convinience function to convert the results of the timeit function + into a dictionary. + ''' + res = dict() + res["kernel"] = kernel + res["avg"] = scipy.mean(times) + res["var"] = scipy.var(times) + res["max"] = max(times) + res["min"] = min(times) + return res + + diff --git a/gnuradio-examples/python/volk_benchmark/volk_types.py b/gnuradio-examples/python/volk_benchmark/volk_types.py new file mode 100755 index 000000000..3bc5a22ae --- /dev/null +++ b/gnuradio-examples/python/volk_benchmark/volk_types.py @@ -0,0 +1,182 @@ +#!/usr/bin/env python + +from gnuradio import gr +import argparse +from volk_test_funcs import * + +###################################################################### + +def float_to_char(N): + op = gr.float_to_char() + tb = helper(N, op, gr.sizeof_float, gr.sizeof_char, 1, 1) + return tb + +###################################################################### + +def float_to_int(N): + op = gr.float_to_int() + tb = helper(N, op, gr.sizeof_float, gr.sizeof_int, 1, 1) + return tb + +###################################################################### + +def float_to_short(N): + op = gr.float_to_short() + tb = helper(N, op, gr.sizeof_float, gr.sizeof_short, 1, 1) + return tb + +###################################################################### + +def short_to_float(N): + op = gr.short_to_float() + tb = helper(N, op, gr.sizeof_short, gr.sizeof_float, 1, 1) + return tb + +###################################################################### + +def short_to_char(N): + op = gr.short_to_char() + tb = helper(N, op, gr.sizeof_short, gr.sizeof_char, 1, 1) + return tb + +###################################################################### + +def char_to_short(N): + op = gr.char_to_short() + tb = helper(N, op, gr.sizeof_char, gr.sizeof_short, 1, 1) + return tb + +###################################################################### + +def char_to_float(N): + op = gr.char_to_float() + tb = helper(N, op, gr.sizeof_char, gr.sizeof_float, 1, 1) + return tb + +###################################################################### + +def int_to_float(N): + op = gr.int_to_float() + tb = helper(N, op, gr.sizeof_int, gr.sizeof_float, 1, 1) + return tb + +###################################################################### + +def complex_to_float(N): + op = gr.complex_to_float() + tb = helper(N, op, gr.sizeof_gr_complex, gr.sizeof_float, 1, 2) + return tb + +###################################################################### + +def complex_to_real(N): + op = gr.complex_to_real() + tb = helper(N, op, gr.sizeof_gr_complex, gr.sizeof_float, 1, 1) + return tb + +###################################################################### + +def complex_to_imag(N): + op = gr.complex_to_imag() + tb = helper(N, op, gr.sizeof_gr_complex, gr.sizeof_float, 1, 1) + return tb + +###################################################################### + +def complex_to_mag(N): + op = gr.complex_to_mag() + tb = helper(N, op, gr.sizeof_gr_complex, gr.sizeof_float, 1, 1) + return tb + +###################################################################### + +def complex_to_mag_squared(N): + op = gr.complex_to_mag_squared() + tb = helper(N, op, gr.sizeof_gr_complex, gr.sizeof_float, 1, 1) + return tb + +###################################################################### + + +def run_tests(func, N, iters): + print("Running Test: {0}".format(func.__name__)) + try: + tb = func(N) + t = timeit(tb, iters) + res = format_results(func.__name__, t) + return res + except AttributeError: + print "\tCould not run test. Skipping." + return None + +def main(): + avail_tests = [float_to_char, + float_to_int, + float_to_short, + short_to_float, + short_to_char, + char_to_short, + char_to_float, + int_to_float, + complex_to_float, + complex_to_real, + complex_to_imag, + complex_to_mag, + complex_to_mag_squared] + + desc='Time an operation to compare with other implementations. \ + This program runs a simple GNU Radio flowgraph to test a \ + particular math function, mostly to compare the \ + Volk-optimized implementation versus a regular \ + implementation. The results are stored to an SQLite database \ + that can then be read by volk_plot.py to plot the differences.' + parser = argparse.ArgumentParser(description=desc) + parser.add_argument('-L', '--label', type=str, + required=True, default=None, + help='Label of database table [default: %(default)s]') + parser.add_argument('-D', '--database', type=str, + default="volk_results.db", + help='Database file to store data in [default: %(default)s]') + parser.add_argument('-N', '--nitems', type=float, + default=1e9, + help='Number of items per iterations [default: %(default)s]') + parser.add_argument('-I', '--iterations', type=int, + default=20, + help='Number of iterations [default: %(default)s]') + parser.add_argument('--tests', type=int, nargs='*', + choices=xrange(len(avail_tests)), + help='A list of tests to run; can be a single test or a \ + space-separated list.') + parser.add_argument('--list', action='store_true', + help='List the available tests') + parser.add_argument('--all', action='store_true', + help='Run all tests') + args = parser.parse_args() + + if(args.list): + print "Available Tests to Run:" + print "\n".join(["\t{0}: {1}".format(i,f.__name__) for i,f in enumerate(avail_tests)]) + sys.exit(0) + + N = int(args.nitems) + iters = args.iterations + label = args.label + + conn = create_connection(args.database) + new_table(conn, label) + + if args.all: + tests = xrange(len(avail_tests)) + else: + tests = args.tests + + for test in tests: + res = run_tests(avail_tests[test], N, iters) + if res is not None: + replace_results(conn, label, N, iters, res) + +if __name__ == "__main__": + try: + main() + except KeyboardInterrupt: + pass diff --git a/volk/apps/CMakeLists.txt b/volk/apps/CMakeLists.txt index f27bdc126..14291e5e3 100644 --- a/volk/apps/CMakeLists.txt +++ b/volk/apps/CMakeLists.txt @@ -42,4 +42,11 @@ add_executable(volk_profile target_link_libraries(volk_profile volk ${Boost_LIBRARIES}) +install( + PROGRAMS + ${CMAKE_BINARY_DIR}/apps/volk_profile + DESTINATION ${GR_RUNTIME_DIR} + COMPONENT "volk" +) + endif(Boost_FOUND AND UNIX) diff --git a/volk/apps/volk_profile.cc b/volk/apps/volk_profile.cc index 10a699872..bd36d6dc7 100644 --- a/volk/apps/volk_profile.cc +++ b/volk/apps/volk_profile.cc @@ -34,6 +34,7 @@ int main(int argc, char *argv[]) { VOLK_PROFILE(volk_16u_byteswap_a, 0, 0, 204600, 10000, &results); VOLK_PROFILE(volk_32f_accumulator_s32f_a, 1e-4, 0, 204600, 10000, &results); VOLK_PROFILE(volk_32f_x2_add_32f_a, 1e-4, 0, 204600, 10000, &results); + VOLK_PROFILE(volk_32f_x2_add_32f_u, 1e-4, 0, 204600, 10000, &results); VOLK_PROFILE(volk_32fc_32f_multiply_32fc_a, 1e-4, 0, 204600, 1000, &results); VOLK_PROFILE(volk_32fc_s32f_power_32fc_a, 1e-4, 0, 204600, 50, &results); VOLK_PROFILE(volk_32f_s32f_calc_spectral_noise_floor_32f_a, 1e-4, 20.0, 204600, 1000, &results); @@ -43,13 +44,22 @@ int main(int argc, char *argv[]) { VOLK_PROFILE(volk_32fc_deinterleave_32f_x2_a, 1e-4, 0, 204600, 1000, &results); VOLK_PROFILE(volk_32fc_deinterleave_64f_x2_a, 1e-4, 0, 204600, 1000, &results); VOLK_PROFILE(volk_32fc_s32f_deinterleave_real_16i_a, 0, 32768, 204600, 10000, &results); + VOLK_PROFILE(volk_32fc_deinterleave_imag_32f_a, 1e-4, 0, 204600, 5000, &results); VOLK_PROFILE(volk_32fc_deinterleave_real_32f_a, 1e-4, 0, 204600, 5000, &results); VOLK_PROFILE(volk_32fc_deinterleave_real_64f_a, 1e-4, 0, 204600, 1000, &results); VOLK_PROFILE(volk_32fc_x2_dot_prod_32fc_a, 1e-4, 0, 204600, 10000, &results); VOLK_PROFILE(volk_32fc_index_max_16u_a, 3, 0, 204600, 10000, &results); VOLK_PROFILE(volk_32fc_s32f_magnitude_16i_a, 1, 32768, 204600, 100, &results); VOLK_PROFILE(volk_32fc_magnitude_32f_a, 1e-4, 0, 204600, 1000, &results); + VOLK_PROFILE(volk_32fc_magnitude_32f_u, 1e-4, 0, 204600, 1000, &results); + VOLK_PROFILE(volk_32fc_magnitude_squared_32f_a, 1e-4, 0, 204600, 1000, &results); + VOLK_PROFILE(volk_32fc_magnitude_squared_32f_u, 1e-4, 0, 204600, 1000, &results); VOLK_PROFILE(volk_32fc_x2_multiply_32fc_a, 1e-4, 0, 204600, 1000, &results); + VOLK_PROFILE(volk_32fc_x2_multiply_32fc_u, 1e-4, 0, 204600, 1000, &results); + VOLK_PROFILE(volk_32fc_x2_multiply_conjugate_32fc_a, 1e-4, 0, 204600, 1000, &results); + VOLK_PROFILE(volk_32fc_x2_multiply_conjugate_32fc_u, 1e-4, 0, 204600, 1000, &results); + VOLK_PROFILE(volk_32fc_conjugate_32fc_a, 1e-4, 0, 204600, 1000, &results); + VOLK_PROFILE(volk_32fc_conjugate_32fc_u, 1e-4, 0, 204600, 1000, &results); VOLK_PROFILE(volk_32f_s32f_convert_16i_a, 1, 32768, 204600, 10000, &results); VOLK_PROFILE(volk_32f_s32f_convert_16i_u, 1, 32768, 204600, 10000, &results); VOLK_PROFILE(volk_32f_s32f_convert_32i_a, 1, 2<<31, 204600, 10000, &results); @@ -72,6 +82,7 @@ int main(int argc, char *argv[]) { VOLK_PROFILE(volk_32f_x2_max_32f_a, 1e-4, 0, 204600, 2000, &results); VOLK_PROFILE(volk_32f_x2_min_32f_a, 1e-4, 0, 204600, 2000, &results); VOLK_PROFILE(volk_32f_x2_multiply_32f_a, 1e-4, 0, 204600, 10000, &results); + VOLK_PROFILE(volk_32f_x2_multiply_32f_u, 1e-4, 0, 204600, 10000, &results); VOLK_PROFILE(volk_32f_s32f_normalize_a, 1e-4, 100, 204600, 10000, &results); VOLK_PROFILE(volk_32f_s32f_power_32f_a, 1e-4, 4, 204600, 100, &results); VOLK_PROFILE(volk_32f_sqrt_32f_a, 1e-4, 0, 204600, 100, &results); @@ -102,8 +113,11 @@ int main(int argc, char *argv[]) { VOLK_PROFILE(volk_8i_convert_16i_u, 0, 0, 204600, 2000, &results); VOLK_PROFILE(volk_8i_s32f_convert_32f_a, 1e-4, 100, 204600, 2000, &results); VOLK_PROFILE(volk_8i_s32f_convert_32f_u, 1e-4, 100, 204600, 2000, &results); - VOLK_PROFILE(volk_32fc_s32fc_multiply_32fc_a, 1e-4, 0, 204600, 1000, &results); - VOLK_PROFILE(volk_32f_s32f_multiply_32f_a, 1e-4, 0, 204600, 1000, &results); + //VOLK_PROFILE(volk_32fc_s32fc_multiply_32fc_a, 1e-4, lv_32fc_t(1.0, 0.5), 204600, 1000, &results); + VOLK_PROFILE(volk_32fc_s32fc_multiply_32fc_u, 1e-4, 0, 204600, 1000, &results); + VOLK_PROFILE(volk_32f_s32f_multiply_32f_a, 1e-4, 1.0, 204600, 10000, &results); + VOLK_PROFILE(volk_32f_s32f_multiply_32f_u, 1e-4, 0, 204600, 1000, &results); + char path[256]; get_config_path(path); diff --git a/volk/include/volk/volk_32f_s32f_convert_16i_a.h b/volk/include/volk/volk_32f_s32f_convert_16i_a.h index 0a2b4f0f2..a24959678 100644 --- a/volk/include/volk/volk_32f_s32f_convert_16i_a.h +++ b/volk/include/volk/volk_32f_s32f_convert_16i_a.h @@ -4,6 +4,7 @@ #include <volk/volk_common.h> #include <inttypes.h> #include <stdio.h> +#include <math.h> #ifdef LV_HAVE_SSE2 #include <emmintrin.h> @@ -21,17 +22,29 @@ static inline void volk_32f_s32f_convert_16i_a_sse2(int16_t* outputVector, const const float* inputVectorPtr = (const float*)inputVector; int16_t* outputVectorPtr = outputVector; + + float min_val = -32768; + float max_val = 32767; + float r; + __m128 vScalar = _mm_set_ps1(scalar); __m128 inputVal1, inputVal2; __m128i intInputVal1, intInputVal2; + __m128 ret1, ret2; + __m128 vmin_val = _mm_set_ps1(min_val); + __m128 vmax_val = _mm_set_ps1(max_val); for(;number < eighthPoints; number++){ inputVal1 = _mm_load_ps(inputVectorPtr); inputVectorPtr += 4; inputVal2 = _mm_load_ps(inputVectorPtr); inputVectorPtr += 4; - intInputVal1 = _mm_cvtps_epi32(_mm_mul_ps(inputVal1, vScalar)); - intInputVal2 = _mm_cvtps_epi32(_mm_mul_ps(inputVal2, vScalar)); - + // Scale and clip + ret1 = _mm_max_ps(_mm_min_ps(_mm_mul_ps(inputVal1, vScalar), vmax_val), vmin_val); + ret2 = _mm_max_ps(_mm_min_ps(_mm_mul_ps(inputVal2, vScalar), vmax_val), vmin_val); + + intInputVal1 = _mm_cvtps_epi32(ret1); + intInputVal2 = _mm_cvtps_epi32(ret2); + intInputVal1 = _mm_packs_epi32(intInputVal1, intInputVal2); _mm_store_si128((__m128i*)outputVectorPtr, intInputVal1); @@ -40,7 +53,12 @@ static inline void volk_32f_s32f_convert_16i_a_sse2(int16_t* outputVector, const number = eighthPoints * 8; for(; number < num_points; number++){ - *outputVectorPtr++ = (int16_t)(*inputVectorPtr++ * scalar); + r = inputVector[number] * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + outputVector[number] = (int16_t)rintf(r); } } #endif /* LV_HAVE_SSE2 */ @@ -61,8 +79,15 @@ static inline void volk_32f_s32f_convert_16i_a_sse(int16_t* outputVector, const const float* inputVectorPtr = (const float*)inputVector; int16_t* outputVectorPtr = outputVector; + + float min_val = -32768; + float max_val = 32767; + float r; + __m128 vScalar = _mm_set_ps1(scalar); __m128 ret; + __m128 vmin_val = _mm_set_ps1(min_val); + __m128 vmax_val = _mm_set_ps1(max_val); __VOLK_ATTR_ALIGNED(16) float outputFloatBuffer[4]; @@ -70,18 +95,24 @@ static inline void volk_32f_s32f_convert_16i_a_sse(int16_t* outputVector, const ret = _mm_load_ps(inputVectorPtr); inputVectorPtr += 4; - ret = _mm_mul_ps(ret, vScalar); + // Scale and clip + ret = _mm_max_ps(_mm_min_ps(_mm_mul_ps(ret, vScalar), vmax_val), vmin_val); _mm_store_ps(outputFloatBuffer, ret); - *outputVectorPtr++ = (int16_t)(outputFloatBuffer[0]); - *outputVectorPtr++ = (int16_t)(outputFloatBuffer[1]); - *outputVectorPtr++ = (int16_t)(outputFloatBuffer[2]); - *outputVectorPtr++ = (int16_t)(outputFloatBuffer[3]); + *outputVectorPtr++ = (int16_t)rintf(outputFloatBuffer[0]); + *outputVectorPtr++ = (int16_t)rintf(outputFloatBuffer[1]); + *outputVectorPtr++ = (int16_t)rintf(outputFloatBuffer[2]); + *outputVectorPtr++ = (int16_t)rintf(outputFloatBuffer[3]); } number = quarterPoints * 4; for(; number < num_points; number++){ - *outputVectorPtr++ = (int16_t)(*inputVectorPtr++ * scalar); + r = inputVector[number] * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + outputVector[number] = (int16_t)rintf(r); } } #endif /* LV_HAVE_SSE */ @@ -98,9 +129,17 @@ static inline void volk_32f_s32f_convert_16i_a_generic(int16_t* outputVector, co int16_t* outputVectorPtr = outputVector; const float* inputVectorPtr = inputVector; unsigned int number = 0; + float min_val = -32768; + float max_val = 32767; + float r; for(number = 0; number < num_points; number++){ - *outputVectorPtr++ = ((int16_t)(*inputVectorPtr++ * scalar)); + r = *inputVectorPtr++ * scalar; + if(r < min_val) + r = min_val; + else if(r > max_val) + r = max_val; + *outputVectorPtr++ = (int16_t)rintf(r); } } #endif /* LV_HAVE_GENERIC */ diff --git a/volk/include/volk/volk_32f_s32f_convert_16i_u.h b/volk/include/volk/volk_32f_s32f_convert_16i_u.h index dec3f1611..f58158041 100644 --- a/volk/include/volk/volk_32f_s32f_convert_16i_u.h +++ b/volk/include/volk/volk_32f_s32f_convert_16i_u.h @@ -3,6 +3,7 @@ #include <inttypes.h> #include <stdio.h> +#include <math.h> #ifdef LV_HAVE_SSE2 #include <emmintrin.h> @@ -21,17 +22,29 @@ static inline void volk_32f_s32f_convert_16i_u_sse2(int16_t* outputVector, const const float* inputVectorPtr = (const float*)inputVector; int16_t* outputVectorPtr = outputVector; + + float min_val = -32768; + float max_val = 32767; + float r; + __m128 vScalar = _mm_set_ps1(scalar); __m128 inputVal1, inputVal2; __m128i intInputVal1, intInputVal2; + __m128 ret1, ret2; + __m128 vmin_val = _mm_set_ps1(min_val); + __m128 vmax_val = _mm_set_ps1(max_val); for(;number < eighthPoints; number++){ inputVal1 = _mm_loadu_ps(inputVectorPtr); inputVectorPtr += 4; inputVal2 = _mm_loadu_ps(inputVectorPtr); inputVectorPtr += 4; - intInputVal1 = _mm_cvtps_epi32(_mm_mul_ps(inputVal1, vScalar)); - intInputVal2 = _mm_cvtps_epi32(_mm_mul_ps(inputVal2, vScalar)); - + // Scale and clip + ret1 = _mm_max_ps(_mm_min_ps(_mm_mul_ps(inputVal1, vScalar), vmax_val), vmin_val); + ret2 = _mm_max_ps(_mm_min_ps(_mm_mul_ps(inputVal2, vScalar), vmax_val), vmin_val); + + intInputVal1 = _mm_cvtps_epi32(ret1); + intInputVal2 = _mm_cvtps_epi32(ret2); + intInputVal1 = _mm_packs_epi32(intInputVal1, intInputVal2); _mm_storeu_si128((__m128i*)outputVectorPtr, intInputVal1); @@ -40,7 +53,12 @@ static inline void volk_32f_s32f_convert_16i_u_sse2(int16_t* outputVector, const number = eighthPoints * 8; for(; number < num_points; number++){ - outputVector[number] = (int16_t)(inputVector[number] * scalar); + r = inputVector[number] * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + outputVector[number] = (int16_t)rintf(r); } } #endif /* LV_HAVE_SSE2 */ @@ -62,8 +80,15 @@ static inline void volk_32f_s32f_convert_16i_u_sse(int16_t* outputVector, const const float* inputVectorPtr = (const float*)inputVector; int16_t* outputVectorPtr = outputVector; + + float min_val = -32768; + float max_val = 32767; + float r; + __m128 vScalar = _mm_set_ps1(scalar); __m128 ret; + __m128 vmin_val = _mm_set_ps1(min_val); + __m128 vmax_val = _mm_set_ps1(max_val); __VOLK_ATTR_ALIGNED(16) float outputFloatBuffer[4]; @@ -71,18 +96,24 @@ static inline void volk_32f_s32f_convert_16i_u_sse(int16_t* outputVector, const ret = _mm_loadu_ps(inputVectorPtr); inputVectorPtr += 4; - ret = _mm_mul_ps(ret, vScalar); + // Scale and clip + ret = _mm_max_ps(_mm_min_ps(_mm_mul_ps(ret, vScalar), vmax_val), vmin_val); _mm_store_ps(outputFloatBuffer, ret); - *outputVectorPtr++ = (int16_t)(outputFloatBuffer[0]); - *outputVectorPtr++ = (int16_t)(outputFloatBuffer[1]); - *outputVectorPtr++ = (int16_t)(outputFloatBuffer[2]); - *outputVectorPtr++ = (int16_t)(outputFloatBuffer[3]); + *outputVectorPtr++ = (int16_t)rintf(outputFloatBuffer[0]); + *outputVectorPtr++ = (int16_t)rintf(outputFloatBuffer[1]); + *outputVectorPtr++ = (int16_t)rintf(outputFloatBuffer[2]); + *outputVectorPtr++ = (int16_t)rintf(outputFloatBuffer[3]); } number = quarterPoints * 4; for(; number < num_points; number++){ - outputVector[number] = (int16_t)(inputVector[number] * scalar); + r = inputVector[number] * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + outputVector[number] = (int16_t)rintf(r); } } #endif /* LV_HAVE_SSE */ @@ -100,9 +131,17 @@ static inline void volk_32f_s32f_convert_16i_u_generic(int16_t* outputVector, co int16_t* outputVectorPtr = outputVector; const float* inputVectorPtr = inputVector; unsigned int number = 0; + float min_val = -32768; + float max_val = 32767; + float r; for(number = 0; number < num_points; number++){ - *outputVectorPtr++ = ((int16_t)(*inputVectorPtr++ * scalar)); + r = *inputVectorPtr++ * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + *outputVectorPtr++ = (int16_t)rintf(r); } } #endif /* LV_HAVE_GENERIC */ diff --git a/volk/include/volk/volk_32f_s32f_convert_32i_a.h b/volk/include/volk/volk_32f_s32f_convert_32i_a.h index aa370e614..8f2fc791e 100644 --- a/volk/include/volk/volk_32f_s32f_convert_32i_a.h +++ b/volk/include/volk/volk_32f_s32f_convert_32i_a.h @@ -21,14 +21,22 @@ static inline void volk_32f_s32f_convert_32i_a_avx(int32_t* outputVector, const const float* inputVectorPtr = (const float*)inputVector; int32_t* outputVectorPtr = outputVector; + + float min_val = -2147483648; + float max_val = 2147483647; + float r; + __m256 vScalar = _mm256_set1_ps(scalar); __m256 inputVal1; __m256i intInputVal1; + __m256 vmin_val = _mm256_set1_ps(min_val); + __m256 vmax_val = _mm256_set1_ps(max_val); for(;number < eighthPoints; number++){ inputVal1 = _mm256_load_ps(inputVectorPtr); inputVectorPtr += 8; - intInputVal1 = _mm256_cvtps_epi32(_mm256_mul_ps(inputVal1, vScalar)); + inputVal1 = _mm256_max_ps(_mm256_min_ps(_mm256_mul_ps(inputVal1, vScalar), vmax_val), vmin_val); + intInputVal1 = _mm256_cvtps_epi32(inputVal1); _mm256_store_si256((__m256i*)outputVectorPtr, intInputVal1); outputVectorPtr += 8; @@ -36,7 +44,12 @@ static inline void volk_32f_s32f_convert_32i_a_avx(int32_t* outputVector, const number = eighthPoints * 8; for(; number < num_points; number++){ - outputVector[number] = (int32_t)(inputVector[number] * scalar); + r = inputVector[number] * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + outputVector[number] = (int32_t)(r); } } #endif /* LV_HAVE_AVX */ @@ -57,14 +70,22 @@ static inline void volk_32f_s32f_convert_32i_a_sse2(int32_t* outputVector, const const float* inputVectorPtr = (const float*)inputVector; int32_t* outputVectorPtr = outputVector; + + float min_val = -2147483648; + float max_val = 2147483647; + float r; + __m128 vScalar = _mm_set_ps1(scalar); __m128 inputVal1; __m128i intInputVal1; + __m128 vmin_val = _mm_set_ps1(min_val); + __m128 vmax_val = _mm_set_ps1(max_val); for(;number < quarterPoints; number++){ inputVal1 = _mm_load_ps(inputVectorPtr); inputVectorPtr += 4; - intInputVal1 = _mm_cvtps_epi32(_mm_mul_ps(inputVal1, vScalar)); + inputVal1 = _mm_max_ps(_mm_min_ps(_mm_mul_ps(inputVal1, vScalar), vmax_val), vmin_val); + intInputVal1 = _mm_cvtps_epi32(inputVal1); _mm_store_si128((__m128i*)outputVectorPtr, intInputVal1); outputVectorPtr += 4; @@ -72,7 +93,12 @@ static inline void volk_32f_s32f_convert_32i_a_sse2(int32_t* outputVector, const number = quarterPoints * 4; for(; number < num_points; number++){ - outputVector[number] = (int32_t)(inputVector[number] * scalar); + r = inputVector[number] * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + outputVector[number] = (int32_t)(r); } } #endif /* LV_HAVE_SSE2 */ @@ -93,8 +119,15 @@ static inline void volk_32f_s32f_convert_32i_a_sse(int32_t* outputVector, const const float* inputVectorPtr = (const float*)inputVector; int32_t* outputVectorPtr = outputVector; + + float min_val = -2147483647; + float max_val = 2147483647; + float r; + __m128 vScalar = _mm_set_ps1(scalar); __m128 ret; + __m128 vmin_val = _mm_set_ps1(min_val); + __m128 vmax_val = _mm_set_ps1(max_val); __VOLK_ATTR_ALIGNED(16) float outputFloatBuffer[4]; @@ -102,7 +135,7 @@ static inline void volk_32f_s32f_convert_32i_a_sse(int32_t* outputVector, const ret = _mm_load_ps(inputVectorPtr); inputVectorPtr += 4; - ret = _mm_mul_ps(ret, vScalar); + ret = _mm_max_ps(_mm_min_ps(_mm_mul_ps(ret, vScalar), vmax_val), vmin_val); _mm_store_ps(outputFloatBuffer, ret); *outputVectorPtr++ = (int32_t)(outputFloatBuffer[0]); @@ -113,7 +146,12 @@ static inline void volk_32f_s32f_convert_32i_a_sse(int32_t* outputVector, const number = quarterPoints * 4; for(; number < num_points; number++){ - outputVector[number] = (int32_t)(inputVector[number] * scalar); + r = inputVector[number] * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + outputVector[number] = (int32_t)(r); } } #endif /* LV_HAVE_SSE */ @@ -130,9 +168,17 @@ static inline void volk_32f_s32f_convert_32i_a_generic(int32_t* outputVector, co int32_t* outputVectorPtr = outputVector; const float* inputVectorPtr = inputVector; unsigned int number = 0; + float min_val = -2147483647; + float max_val = 2147483647; + float r; for(number = 0; number < num_points; number++){ - *outputVectorPtr++ = ((int32_t)(*inputVectorPtr++ * scalar)); + r = *inputVectorPtr++ * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + *outputVectorPtr++ = (int32_t)(r); } } #endif /* LV_HAVE_GENERIC */ diff --git a/volk/include/volk/volk_32f_s32f_convert_32i_u.h b/volk/include/volk/volk_32f_s32f_convert_32i_u.h index b4e954dc4..d8493454b 100644 --- a/volk/include/volk/volk_32f_s32f_convert_32i_u.h +++ b/volk/include/volk/volk_32f_s32f_convert_32i_u.h @@ -21,14 +21,24 @@ static inline void volk_32f_s32f_convert_32i_u_sse2(int32_t* outputVector, const const float* inputVectorPtr = (const float*)inputVector; int32_t* outputVectorPtr = outputVector; + + //float min_val = -2147483647; + //float max_val = 2147483647; + float min_val = -2146400000; + float max_val = 2146400000; + float r; + __m128 vScalar = _mm_set_ps1(scalar); __m128 inputVal1; __m128i intInputVal1; + __m128 vmin_val = _mm_set_ps1(min_val); + __m128 vmax_val = _mm_set_ps1(max_val); for(;number < quarterPoints; number++){ inputVal1 = _mm_loadu_ps(inputVectorPtr); inputVectorPtr += 4; - intInputVal1 = _mm_cvtps_epi32(_mm_mul_ps(inputVal1, vScalar)); + inputVal1 = _mm_max_ps(_mm_min_ps(_mm_mul_ps(inputVal1, vScalar), vmax_val), vmin_val); + intInputVal1 = _mm_cvtps_epi32(inputVal1); _mm_storeu_si128((__m128i*)outputVectorPtr, intInputVal1); outputVectorPtr += 4; @@ -36,7 +46,12 @@ static inline void volk_32f_s32f_convert_32i_u_sse2(int32_t* outputVector, const number = quarterPoints * 4; for(; number < num_points; number++){ - outputVector[number] = (int32_t)(inputVector[number] * scalar); + r = inputVector[number] * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + outputVector[number] = (int32_t)(r); } } #endif /* LV_HAVE_SSE2 */ @@ -58,8 +73,15 @@ static inline void volk_32f_s32f_convert_32i_u_sse(int32_t* outputVector, const const float* inputVectorPtr = (const float*)inputVector; int32_t* outputVectorPtr = outputVector; + + float min_val = -2147483647; + float max_val = 2147483647; + float r; + __m128 vScalar = _mm_set_ps1(scalar); __m128 ret; + __m128 vmin_val = _mm_set_ps1(min_val); + __m128 vmax_val = _mm_set_ps1(max_val); __VOLK_ATTR_ALIGNED(16) float outputFloatBuffer[4]; @@ -67,7 +89,7 @@ static inline void volk_32f_s32f_convert_32i_u_sse(int32_t* outputVector, const ret = _mm_loadu_ps(inputVectorPtr); inputVectorPtr += 4; - ret = _mm_mul_ps(ret, vScalar); + ret = _mm_max_ps(_mm_min_ps(_mm_mul_ps(ret, vScalar), vmax_val), vmin_val); _mm_store_ps(outputFloatBuffer, ret); *outputVectorPtr++ = (int32_t)(outputFloatBuffer[0]); @@ -78,7 +100,12 @@ static inline void volk_32f_s32f_convert_32i_u_sse(int32_t* outputVector, const number = quarterPoints * 4; for(; number < num_points; number++){ - outputVector[number] = (int32_t)(inputVector[number] * scalar); + r = inputVector[number] * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + outputVector[number] = (int32_t)(r); } } #endif /* LV_HAVE_SSE */ @@ -96,9 +123,17 @@ static inline void volk_32f_s32f_convert_32i_u_generic(int32_t* outputVector, co int32_t* outputVectorPtr = outputVector; const float* inputVectorPtr = inputVector; unsigned int number = 0; + float min_val = -2147483647; + float max_val = 2147483647; + float r; for(number = 0; number < num_points; number++){ - *outputVectorPtr++ = ((int32_t)(*inputVectorPtr++ * scalar)); + r = *inputVectorPtr++ * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + *outputVectorPtr++ = (int32_t)(r); } } #endif /* LV_HAVE_GENERIC */ diff --git a/volk/include/volk/volk_32f_s32f_convert_8i_a.h b/volk/include/volk/volk_32f_s32f_convert_8i_a.h index 8d87a07d7..05172171c 100644 --- a/volk/include/volk/volk_32f_s32f_convert_8i_a.h +++ b/volk/include/volk/volk_32f_s32f_convert_8i_a.h @@ -21,9 +21,16 @@ static inline void volk_32f_s32f_convert_8i_a_sse2(int8_t* outputVector, const f const float* inputVectorPtr = (const float*)inputVector; int8_t* outputVectorPtr = outputVector; + + float min_val = -128; + float max_val = 127; + float r; + __m128 vScalar = _mm_set_ps1(scalar); __m128 inputVal1, inputVal2, inputVal3, inputVal4; __m128i intInputVal1, intInputVal2, intInputVal3, intInputVal4; + __m128 vmin_val = _mm_set_ps1(min_val); + __m128 vmax_val = _mm_set_ps1(max_val); for(;number < sixteenthPoints; number++){ inputVal1 = _mm_load_ps(inputVectorPtr); inputVectorPtr += 4; @@ -31,10 +38,15 @@ static inline void volk_32f_s32f_convert_8i_a_sse2(int8_t* outputVector, const f inputVal3 = _mm_load_ps(inputVectorPtr); inputVectorPtr += 4; inputVal4 = _mm_load_ps(inputVectorPtr); inputVectorPtr += 4; - intInputVal1 = _mm_cvtps_epi32(_mm_mul_ps(inputVal1, vScalar)); - intInputVal2 = _mm_cvtps_epi32(_mm_mul_ps(inputVal2, vScalar)); - intInputVal3 = _mm_cvtps_epi32(_mm_mul_ps(inputVal3, vScalar)); - intInputVal4 = _mm_cvtps_epi32(_mm_mul_ps(inputVal4, vScalar)); + inputVal1 = _mm_max_ps(_mm_min_ps(_mm_mul_ps(inputVal1, vScalar), vmax_val), vmin_val); + inputVal2 = _mm_max_ps(_mm_min_ps(_mm_mul_ps(inputVal2, vScalar), vmax_val), vmin_val); + inputVal3 = _mm_max_ps(_mm_min_ps(_mm_mul_ps(inputVal3, vScalar), vmax_val), vmin_val); + inputVal4 = _mm_max_ps(_mm_min_ps(_mm_mul_ps(inputVal4, vScalar), vmax_val), vmin_val); + + intInputVal1 = _mm_cvtps_epi32(inputVal1); + intInputVal2 = _mm_cvtps_epi32(inputVal2); + intInputVal3 = _mm_cvtps_epi32(inputVal3); + intInputVal4 = _mm_cvtps_epi32(inputVal4); intInputVal1 = _mm_packs_epi32(intInputVal1, intInputVal2); intInputVal3 = _mm_packs_epi32(intInputVal3, intInputVal4); @@ -47,7 +59,12 @@ static inline void volk_32f_s32f_convert_8i_a_sse2(int8_t* outputVector, const f number = sixteenthPoints * 16; for(; number < num_points; number++){ - outputVector[number] = (int8_t)(inputVector[number] * scalar); + r = inputVector[number] * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + outputVector[number] = (int8_t)(r); } } #endif /* LV_HAVE_SSE2 */ @@ -67,9 +84,16 @@ static inline void volk_32f_s32f_convert_8i_a_sse(int8_t* outputVector, const fl const unsigned int quarterPoints = num_points / 4; const float* inputVectorPtr = (const float*)inputVector; + + float min_val = -128; + float max_val = 127; + float r; + int8_t* outputVectorPtr = outputVector; __m128 vScalar = _mm_set_ps1(scalar); __m128 ret; + __m128 vmin_val = _mm_set_ps1(min_val); + __m128 vmax_val = _mm_set_ps1(max_val); __VOLK_ATTR_ALIGNED(16) float outputFloatBuffer[4]; @@ -77,7 +101,7 @@ static inline void volk_32f_s32f_convert_8i_a_sse(int8_t* outputVector, const fl ret = _mm_load_ps(inputVectorPtr); inputVectorPtr += 4; - ret = _mm_mul_ps(ret, vScalar); + ret = _mm_max_ps(_mm_min_ps(_mm_mul_ps(ret, vScalar), vmax_val), vmin_val); _mm_store_ps(outputFloatBuffer, ret); *outputVectorPtr++ = (int8_t)(outputFloatBuffer[0]); @@ -88,7 +112,12 @@ static inline void volk_32f_s32f_convert_8i_a_sse(int8_t* outputVector, const fl number = quarterPoints * 4; for(; number < num_points; number++){ - outputVector[number] = (int8_t)(inputVector[number] * scalar); + r = inputVector[number] * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + outputVector[number] = (int8_t)(r); } } #endif /* LV_HAVE_SSE */ @@ -105,9 +134,17 @@ static inline void volk_32f_s32f_convert_8i_a_generic(int8_t* outputVector, cons int8_t* outputVectorPtr = outputVector; const float* inputVectorPtr = inputVector; unsigned int number = 0; + float min_val = -128; + float max_val = 127; + float r; for(number = 0; number < num_points; number++){ - *outputVectorPtr++ = (int8_t)(*inputVectorPtr++ * scalar); + r = *inputVectorPtr++ * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + *outputVectorPtr++ = (int8_t)(r); } } #endif /* LV_HAVE_GENERIC */ diff --git a/volk/include/volk/volk_32f_s32f_convert_8i_u.h b/volk/include/volk/volk_32f_s32f_convert_8i_u.h index 1c6bf87c9..12991e9c1 100644 --- a/volk/include/volk/volk_32f_s32f_convert_8i_u.h +++ b/volk/include/volk/volk_32f_s32f_convert_8i_u.h @@ -21,9 +21,16 @@ static inline void volk_32f_s32f_convert_8i_u_sse2(int8_t* outputVector, const f const float* inputVectorPtr = (const float*)inputVector; int8_t* outputVectorPtr = outputVector; + + float min_val = -128; + float max_val = 127; + float r; + __m128 vScalar = _mm_set_ps1(scalar); __m128 inputVal1, inputVal2, inputVal3, inputVal4; __m128i intInputVal1, intInputVal2, intInputVal3, intInputVal4; + __m128 vmin_val = _mm_set_ps1(min_val); + __m128 vmax_val = _mm_set_ps1(max_val); for(;number < sixteenthPoints; number++){ inputVal1 = _mm_loadu_ps(inputVectorPtr); inputVectorPtr += 4; @@ -31,10 +38,15 @@ static inline void volk_32f_s32f_convert_8i_u_sse2(int8_t* outputVector, const f inputVal3 = _mm_loadu_ps(inputVectorPtr); inputVectorPtr += 4; inputVal4 = _mm_loadu_ps(inputVectorPtr); inputVectorPtr += 4; - intInputVal1 = _mm_cvtps_epi32(_mm_mul_ps(inputVal1, vScalar)); - intInputVal2 = _mm_cvtps_epi32(_mm_mul_ps(inputVal2, vScalar)); - intInputVal3 = _mm_cvtps_epi32(_mm_mul_ps(inputVal3, vScalar)); - intInputVal4 = _mm_cvtps_epi32(_mm_mul_ps(inputVal4, vScalar)); + inputVal1 = _mm_max_ps(_mm_min_ps(_mm_mul_ps(inputVal1, vScalar), vmax_val), vmin_val); + inputVal2 = _mm_max_ps(_mm_min_ps(_mm_mul_ps(inputVal2, vScalar), vmax_val), vmin_val); + inputVal3 = _mm_max_ps(_mm_min_ps(_mm_mul_ps(inputVal3, vScalar), vmax_val), vmin_val); + inputVal4 = _mm_max_ps(_mm_min_ps(_mm_mul_ps(inputVal4, vScalar), vmax_val), vmin_val); + + intInputVal1 = _mm_cvtps_epi32(inputVal1); + intInputVal2 = _mm_cvtps_epi32(inputVal2); + intInputVal3 = _mm_cvtps_epi32(inputVal3); + intInputVal4 = _mm_cvtps_epi32(inputVal4); intInputVal1 = _mm_packs_epi32(intInputVal1, intInputVal2); intInputVal3 = _mm_packs_epi32(intInputVal3, intInputVal4); @@ -47,7 +59,12 @@ static inline void volk_32f_s32f_convert_8i_u_sse2(int8_t* outputVector, const f number = sixteenthPoints * 16; for(; number < num_points; number++){ - outputVector[number] = (int8_t)(inputVector[number] * scalar); + r = inputVector[number] * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + outputVector[number] = (int16_t)(r); } } #endif /* LV_HAVE_SSE2 */ @@ -69,8 +86,15 @@ static inline void volk_32f_s32f_convert_8i_u_sse(int8_t* outputVector, const fl const float* inputVectorPtr = (const float*)inputVector; int8_t* outputVectorPtr = outputVector; + + float min_val = -128; + float max_val = 127; + float r; + __m128 vScalar = _mm_set_ps1(scalar); __m128 ret; + __m128 vmin_val = _mm_set_ps1(min_val); + __m128 vmax_val = _mm_set_ps1(max_val); __VOLK_ATTR_ALIGNED(16) float outputFloatBuffer[4]; @@ -78,7 +102,7 @@ static inline void volk_32f_s32f_convert_8i_u_sse(int8_t* outputVector, const fl ret = _mm_loadu_ps(inputVectorPtr); inputVectorPtr += 4; - ret = _mm_mul_ps(ret, vScalar); + ret = _mm_max_ps(_mm_min_ps(_mm_mul_ps(ret, vScalar), vmax_val), vmin_val); _mm_store_ps(outputFloatBuffer, ret); *outputVectorPtr++ = (int8_t)(outputFloatBuffer[0]); @@ -89,7 +113,12 @@ static inline void volk_32f_s32f_convert_8i_u_sse(int8_t* outputVector, const fl number = quarterPoints * 4; for(; number < num_points; number++){ - outputVector[number] = (int8_t)(inputVector[number] * scalar); + r = inputVector[number] * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + outputVector[number] = (int16_t)(r); } } #endif /* LV_HAVE_SSE */ @@ -107,9 +136,17 @@ static inline void volk_32f_s32f_convert_8i_u_generic(int8_t* outputVector, cons int8_t* outputVectorPtr = outputVector; const float* inputVectorPtr = inputVector; unsigned int number = 0; + float min_val = -128; + float max_val = 127; + float r; for(number = 0; number < num_points; number++){ - *outputVectorPtr++ = ((int8_t)(*inputVectorPtr++ * scalar)); + r = *inputVectorPtr++ * scalar; + if(r > max_val) + r = max_val; + else if(r < min_val) + r = min_val; + *outputVectorPtr++ = (int16_t)(r); } } #endif /* LV_HAVE_GENERIC */ diff --git a/volk/include/volk/volk_32f_s32f_multiply_32f_a.h b/volk/include/volk/volk_32f_s32f_multiply_32f_a.h index 37223dc81..d1c6f3f65 100644 --- a/volk/include/volk/volk_32f_s32f_multiply_32f_a.h +++ b/volk/include/volk/volk_32f_s32f_multiply_32f_a.h @@ -4,6 +4,81 @@ #include <inttypes.h> #include <stdio.h> +#ifdef LV_HAVE_SSE +#include <xmmintrin.h> +/*! + \brief Scalar float multiply + \param cVector The vector where the results will be stored + \param aVector One of the vectors to be multiplied + \param scalar the scalar value + \param num_points The number of values in aVector and bVector to be multiplied together and stored into cVector +*/ +static inline void volk_32f_s32f_multiply_32f_a_sse(float* cVector, const float* aVector, const float scalar, unsigned int num_points){ + unsigned int number = 0; + const unsigned int quarterPoints = num_points / 4; + + float* cPtr = cVector; + const float* aPtr = aVector; + + __m128 aVal, bVal, cVal; + bVal = _mm_set_ps1(scalar); + for(;number < quarterPoints; number++){ + + aVal = _mm_load_ps(aPtr); + + cVal = _mm_mul_ps(aVal, bVal); + + _mm_store_ps(cPtr,cVal); // Store the results back into the C container + + aPtr += 4; + cPtr += 4; + } + + number = quarterPoints * 4; + for(;number < num_points; number++){ + *cPtr++ = (*aPtr++) * scalar; + } +} +#endif /* LV_HAVE_SSE */ + +#ifdef LV_HAVE_AVX +#include <immintrin.h> +/*! + \brief Scalar float multiply + \param cVector The vector where the results will be stored + \param aVector One of the vectors to be multiplied + \param scalar the scalar value + \param num_points The number of values in aVector and bVector to be multiplied together and stored into cVector +*/ +static inline void volk_32f_s32f_multiply_32f_a_avx(float* cVector, const float* aVector, const float scalar, unsigned int num_points){ + unsigned int number = 0; + const unsigned int eighthPoints = num_points / 8; + + float* cPtr = cVector; + const float* aPtr = aVector; + + __m256 aVal, bVal, cVal; + bVal = _mm256_set1_ps(scalar); + for(;number < eighthPoints; number++){ + + aVal = _mm256_load_ps(aPtr); + + cVal = _mm256_mul_ps(aVal, bVal); + + _mm256_store_ps(cPtr,cVal); // Store the results back into the C container + + aPtr += 8; + cPtr += 8; + } + + number = eighthPoints * 8; + for(;number < num_points; number++){ + *cPtr++ = (*aPtr++) * scalar; + } +} +#endif /* LV_HAVE_AVX */ + + #ifdef LV_HAVE_GENERIC /*! \brief Scalar float multiply diff --git a/volk/include/volk/volk_32f_s32f_multiply_32f_u.h b/volk/include/volk/volk_32f_s32f_multiply_32f_u.h new file mode 100644 index 000000000..0e700060f --- /dev/null +++ b/volk/include/volk/volk_32f_s32f_multiply_32f_u.h @@ -0,0 +1,102 @@ +#ifndef INCLUDED_volk_32f_s32f_multiply_32f_u_H +#define INCLUDED_volk_32f_s32f_multiply_32f_u_H + +#include <inttypes.h> +#include <stdio.h> + +#ifdef LV_HAVE_SSE +#include <xmmintrin.h> +/*! + \brief Scalar float multiply + \param cVector The vector where the results will be stored + \param aVector One of the vectors to be multiplied + \param scalar the scalar value + \param num_points The number of values in aVector and bVector to be multiplied together and stored into cVector +*/ +static inline void volk_32f_s32f_multiply_32f_u_sse(float* cVector, const float* aVector, const float scalar, unsigned int num_points){ + unsigned int number = 0; + const unsigned int quarterPoints = num_points / 4; + + float* cPtr = cVector; + const float* aPtr = aVector; + + __m128 aVal, bVal, cVal; + bVal = _mm_set_ps1(scalar); + for(;number < quarterPoints; number++){ + + aVal = _mm_loadu_ps(aPtr); + + cVal = _mm_mul_ps(aVal, bVal); + + _mm_storeu_ps(cPtr,cVal); // Store the results back into the C container + + aPtr += 4; + cPtr += 4; + } + + number = quarterPoints * 4; + for(;number < num_points; number++){ + *cPtr++ = (*aPtr++) * scalar; + } +} +#endif /* LV_HAVE_SSE */ + +#ifdef LV_HAVE_AVX +#include <immintrin.h> +/*! + \brief Scalar float multiply + \param cVector The vector where the results will be stored + \param aVector One of the vectors to be multiplied + \param scalar the scalar value + \param num_points The number of values in aVector and bVector to be multiplied together and stored into cVector +*/ +static inline void volk_32f_s32f_multiply_32f_u_avx(float* cVector, const float* aVector, const float scalar, unsigned int num_points){ + unsigned int number = 0; + const unsigned int eighthPoints = num_points / 8; + + float* cPtr = cVector; + const float* aPtr = aVector; + + __m256 aVal, bVal, cVal; + bVal = _mm256_set1_ps(scalar); + for(;number < eighthPoints; number++){ + + aVal = _mm256_loadu_ps(aPtr); + + cVal = _mm256_mul_ps(aVal, bVal); + + _mm256_storeu_ps(cPtr,cVal); // Store the results back into the C container + + aPtr += 8; + cPtr += 8; + } + + number = eighthPoints * 8; + for(;number < num_points; number++){ + *cPtr++ = (*aPtr++) * scalar; + } +} +#endif /* LV_HAVE_AVX */ + +#ifdef LV_HAVE_GENERIC +/*! + \brief Scalar float multiply + \param cVector The vector where the results will be stored + \param aVector One of the vectors to be multiplied + \param scalar the scalar value + \param num_points The number of values in aVector and bVector to be multiplied together and stored into cVector +*/ +static inline void volk_32f_s32f_multiply_32f_u_generic(float* cVector, const float* aVector, const float scalar, unsigned int num_points){ + unsigned int number = 0; + const float* inputPtr = aVector; + float* outputPtr = cVector; + for(number = 0; number < num_points; number++){ + *outputPtr = (*inputPtr) * scalar; + inputPtr++; + outputPtr++; + } +} +#endif /* LV_HAVE_GENERIC */ + + +#endif /* INCLUDED_volk_32f_s32f_multiply_32f_u_H */ diff --git a/volk/include/volk/volk_32f_x2_add_32f_u.h b/volk/include/volk/volk_32f_x2_add_32f_u.h new file mode 100644 index 000000000..e360a7958 --- /dev/null +++ b/volk/include/volk/volk_32f_x2_add_32f_u.h @@ -0,0 +1,66 @@ +#ifndef INCLUDED_volk_32f_x2_add_32f_u_H +#define INCLUDED_volk_32f_x2_add_32f_u_H + +#include <inttypes.h> +#include <stdio.h> + +#ifdef LV_HAVE_SSE +#include <xmmintrin.h> +/*! + \brief Adds the two input vectors and store their results in the third vector + \param cVector The vector where the results will be stored + \param aVector One of the vectors to be added + \param bVector One of the vectors to be added + \param num_points The number of values in aVector and bVector to be added together and stored into cVector +*/ +static inline void volk_32f_x2_add_32f_u_sse(float* cVector, const float* aVector, const float* bVector, unsigned int num_points){ + unsigned int number = 0; + const unsigned int quarterPoints = num_points / 4; + + float* cPtr = cVector; + const float* aPtr = aVector; + const float* bPtr= bVector; + + __m128 aVal, bVal, cVal; + for(;number < quarterPoints; number++){ + + aVal = _mm_loadu_ps(aPtr); + bVal = _mm_loadu_ps(bPtr); + + cVal = _mm_add_ps(aVal, bVal); + + _mm_storeu_ps(cPtr,cVal); // Store the results back into the C container + + aPtr += 4; + bPtr += 4; + cPtr += 4; + } + + number = quarterPoints * 4; + for(;number < num_points; number++){ + *cPtr++ = (*aPtr++) + (*bPtr++); + } +} +#endif /* LV_HAVE_SSE */ + +#ifdef LV_HAVE_GENERIC +/*! + \brief Adds the two input vectors and store their results in the third vector + \param cVector The vector where the results will be stored + \param aVector One of the vectors to be added + \param bVector One of the vectors to be added + \param num_points The number of values in aVector and bVector to be added together and stored into cVector +*/ +static inline void volk_32f_x2_add_32f_u_generic(float* cVector, const float* aVector, const float* bVector, unsigned int num_points){ + float* cPtr = cVector; + const float* aPtr = aVector; + const float* bPtr= bVector; + unsigned int number = 0; + + for(number = 0; number < num_points; number++){ + *cPtr++ = (*aPtr++) + (*bPtr++); + } +} +#endif /* LV_HAVE_GENERIC */ + +#endif /* INCLUDED_volk_32f_x2_add_32f_u_H */ diff --git a/volk/include/volk/volk_32f_x2_multiply_32f_u.h b/volk/include/volk/volk_32f_x2_multiply_32f_u.h new file mode 100644 index 000000000..6c3ce5d83 --- /dev/null +++ b/volk/include/volk/volk_32f_x2_multiply_32f_u.h @@ -0,0 +1,106 @@ +#ifndef INCLUDED_volk_32f_x2_multiply_32f_u_H +#define INCLUDED_volk_32f_x2_multiply_32f_u_H + +#include <inttypes.h> +#include <stdio.h> + +#ifdef LV_HAVE_SSE +#include <xmmintrin.h> +/*! + \brief Multiplys the two input vectors and store their results in the third vector + \param cVector The vector where the results will be stored + \param aVector One of the vectors to be multiplied + \param bVector One of the vectors to be multiplied + \param num_points The number of values in aVector and bVector to be multiplied together and stored into cVector +*/ +static inline void volk_32f_x2_multiply_32f_u_sse(float* cVector, const float* aVector, const float* bVector, unsigned int num_points){ + unsigned int number = 0; + const unsigned int quarterPoints = num_points / 4; + + float* cPtr = cVector; + const float* aPtr = aVector; + const float* bPtr= bVector; + + __m128 aVal, bVal, cVal; + for(;number < quarterPoints; number++){ + + aVal = _mm_loadu_ps(aPtr); + bVal = _mm_loadu_ps(bPtr); + + cVal = _mm_mul_ps(aVal, bVal); + + _mm_storeu_ps(cPtr,cVal); // Store the results back into the C container + + aPtr += 4; + bPtr += 4; + cPtr += 4; + } + + number = quarterPoints * 4; + for(;number < num_points; number++){ + *cPtr++ = (*aPtr++) * (*bPtr++); + } +} +#endif /* LV_HAVE_SSE */ + +#ifdef LV_HAVE_AVX +#include <immintrin.h> +/*! + \brief Multiplies the two input vectors and store their results in the third vector + \param cVector The vector where the results will be stored + \param aVector One of the vectors to be multiplied + \param bVector One of the vectors to be multiplied + \param num_points The number of values in aVector and bVector to be multiplied together and stored into cVector +*/ +static inline void volk_32f_x2_multiply_32f_u_avx(float* cVector, const float* aVector, const float* bVector, unsigned int num_points){ + unsigned int number = 0; + const unsigned int eighthPoints = num_points / 8; + + float* cPtr = cVector; + const float* aPtr = aVector; + const float* bPtr= bVector; + + __m256 aVal, bVal, cVal; + for(;number < eighthPoints; number++){ + + aVal = _mm256_loadu_ps(aPtr); + bVal = _mm256_loadu_ps(bPtr); + + cVal = _mm256_mul_ps(aVal, bVal); + + _mm256_storeu_ps(cPtr,cVal); // Store the results back into the C container + + aPtr += 8; + bPtr += 8; + cPtr += 8; + } + + number = eighthPoints * 8; + for(;number < num_points; number++){ + *cPtr++ = (*aPtr++) * (*bPtr++); + } +} +#endif /* LV_HAVE_AVX */ + +#ifdef LV_HAVE_GENERIC +/*! + \brief Multiplys the two input vectors and store their results in the third vector + \param cVector The vector where the results will be stored + \param aVector One of the vectors to be multiplied + \param bVector One of the vectors to be multiplied + \param num_points The number of values in aVector and bVector to be multiplied together and stored into cVector +*/ +static inline void volk_32f_x2_multiply_32f_u_generic(float* cVector, const float* aVector, const float* bVector, unsigned int num_points){ + float* cPtr = cVector; + const float* aPtr = aVector; + const float* bPtr= bVector; + unsigned int number = 0; + + for(number = 0; number < num_points; number++){ + *cPtr++ = (*aPtr++) * (*bPtr++); + } +} +#endif /* LV_HAVE_GENERIC */ + + +#endif /* INCLUDED_volk_32f_x2_multiply_32f_u_H */ diff --git a/volk/include/volk/volk_32fc_conjugate_32fc_a.h b/volk/include/volk/volk_32fc_conjugate_32fc_a.h new file mode 100644 index 000000000..1518af9be --- /dev/null +++ b/volk/include/volk/volk_32fc_conjugate_32fc_a.h @@ -0,0 +1,64 @@ +#ifndef INCLUDED_volk_32fc_conjugate_32fc_a_H +#define INCLUDED_volk_32fc_conjugate_32fc_a_H + +#include <inttypes.h> +#include <stdio.h> +#include <volk/volk_complex.h> +#include <float.h> + +#ifdef LV_HAVE_SSE3 +#include <pmmintrin.h> + /*! + \brief Takes the conjugate of a complex vector. + \param cVector The vector where the results will be stored + \param aVector Vector to be conjugated + \param num_points The number of complex values in aVector to be conjugated and stored into cVector + */ +static inline void volk_32fc_conjugate_32fc_a_sse3(lv_32fc_t* cVector, const lv_32fc_t* aVector, unsigned int num_points){ + unsigned int number = 0; + const unsigned int halfPoints = num_points / 2; + + __m128 x; + lv_32fc_t* c = cVector; + const lv_32fc_t* a = aVector; + + __m128 conjugator = _mm_setr_ps(0, -0.f, 0, -0.f); + + for(;number < halfPoints; number++){ + + x = _mm_load_ps((float*)a); // Load the complex data as ar,ai,br,bi + + x = _mm_xor_ps(x, conjugator); // conjugate register + + _mm_store_ps((float*)c,x); // Store the results back into the C container + + a += 2; + c += 2; + } + + if((num_points % 2) != 0) { + *c = lv_conj(*a); + } +} +#endif /* LV_HAVE_SSE3 */ + +#ifdef LV_HAVE_GENERIC + /*! + \brief Takes the conjugate of a complex vector. + \param cVector The vector where the results will be stored + \param aVector Vector to be conjugated + \param num_points The number of complex values in aVector to be conjugated and stored into cVector + */ +static inline void volk_32fc_conjugate_32fc_a_generic(lv_32fc_t* cVector, const lv_32fc_t* aVector, unsigned int num_points){ + lv_32fc_t* cPtr = cVector; + const lv_32fc_t* aPtr = aVector; + unsigned int number = 0; + + for(number = 0; number < num_points; number++){ + *cPtr++ = lv_conj(*aPtr++); + } +} +#endif /* LV_HAVE_GENERIC */ + + +#endif /* INCLUDED_volk_32fc_conjugate_32fc_a_H */ diff --git a/volk/include/volk/volk_32fc_conjugate_32fc_u.h b/volk/include/volk/volk_32fc_conjugate_32fc_u.h new file mode 100644 index 000000000..b26fe0789 --- /dev/null +++ b/volk/include/volk/volk_32fc_conjugate_32fc_u.h @@ -0,0 +1,64 @@ +#ifndef INCLUDED_volk_32fc_conjugate_32fc_u_H +#define INCLUDED_volk_32fc_conjugate_32fc_u_H + +#include <inttypes.h> +#include <stdio.h> +#include <volk/volk_complex.h> +#include <float.h> + +#ifdef LV_HAVE_SSE3 +#include <pmmintrin.h> + /*! + \brief Takes the conjugate of a complex vector. + \param cVector The vector where the results will be stored + \param aVector Vector to be conjugated + \param num_points The number of complex values in aVector to be conjugated and stored into cVector + */ +static inline void volk_32fc_conjugate_32fc_u_sse3(lv_32fc_t* cVector, const lv_32fc_t* aVector, unsigned int num_points){ + unsigned int number = 0; + const unsigned int halfPoints = num_points / 2; + + __m128 x; + lv_32fc_t* c = cVector; + const lv_32fc_t* a = aVector; + + __m128 conjugator = _mm_setr_ps(0, -0.f, 0, -0.f); + + for(;number < halfPoints; number++){ + + x = _mm_loadu_ps((float*)a); // Load the complex data as ar,ai,br,bi + + x = _mm_xor_ps(x, conjugator); // conjugate register + + _mm_storeu_ps((float*)c,x); // Store the results back into the C container + + a += 2; + c += 2; + } + + if((num_points % 2) != 0) { + *c = lv_conj(*a); + } +} +#endif /* LV_HAVE_SSE3 */ + +#ifdef LV_HAVE_GENERIC + /*! + \brief Takes the conjugate of a complex vector. + \param cVector The vector where the results will be stored + \param aVector Vector to be conjugated + \param num_points The number of complex values in aVector to be conjugated and stored into cVector + */ +static inline void volk_32fc_conjugate_32fc_u_generic(lv_32fc_t* cVector, const lv_32fc_t* aVector, unsigned int num_points){ + lv_32fc_t* cPtr = cVector; + const lv_32fc_t* aPtr = aVector; + unsigned int number = 0; + + for(number = 0; number < num_points; number++){ + *cPtr++ = lv_conj(*aPtr++); + } +} +#endif /* LV_HAVE_GENERIC */ + + +#endif /* INCLUDED_volk_32fc_conjugate_32fc_u_H */ diff --git a/volk/include/volk/volk_32fc_deinterleave_imag_32f_a.h b/volk/include/volk/volk_32fc_deinterleave_imag_32f_a.h new file mode 100644 index 000000000..adc4112b9 --- /dev/null +++ b/volk/include/volk/volk_32fc_deinterleave_imag_32f_a.h @@ -0,0 +1,68 @@ +#ifndef INCLUDED_volk_32fc_deinterleave_imag_32f_a_H +#define INCLUDED_volk_32fc_deinterleave_imag_32f_a_H + +#include <inttypes.h> +#include <stdio.h> + +#ifdef LV_HAVE_SSE +#include <xmmintrin.h> +/*! + \brief Deinterleaves the complex vector into Q vector data + \param complexVector The complex input vector + \param qBuffer The Q buffer output data + \param num_points The number of complex data values to be deinterleaved +*/ +static inline void volk_32fc_deinterleave_imag_32f_a_sse(float* qBuffer, const lv_32fc_t* complexVector, unsigned int num_points){ + unsigned int number = 0; + const unsigned int quarterPoints = num_points / 4; + + const float* complexVectorPtr = (const float*)complexVector; + float* qBufferPtr = qBuffer; + + __m128 cplxValue1, cplxValue2, iValue; + for(;number < quarterPoints; number++){ + + cplxValue1 = _mm_load_ps(complexVectorPtr); + complexVectorPtr += 4; + + cplxValue2 = _mm_load_ps(complexVectorPtr); + complexVectorPtr += 4; + + // Arrange in q1q2q3q4 format + iValue = _mm_shuffle_ps(cplxValue1, cplxValue2, _MM_SHUFFLE(3,1,3,1)); + + _mm_store_ps(qBufferPtr, iValue); + + qBufferPtr += 4; + } + + number = quarterPoints * 4; + for(; number < num_points; number++){ + complexVectorPtr++; + *qBufferPtr++ = *complexVectorPtr++; + } +} +#endif /* LV_HAVE_SSE */ + +#ifdef LV_HAVE_GENERIC +/*! + \brief Deinterleaves the complex vector into Q vector data + \param complexVector The complex input vector + \param qBuffer The I buffer output data + \param num_points The number of complex data values to be deinterleaved +*/ +static inline void volk_32fc_deinterleave_imag_32f_a_generic(float* qBuffer, const lv_32fc_t* complexVector, unsigned int num_points){ + unsigned int number = 0; + const float* complexVectorPtr = (float*)complexVector; + float* qBufferPtr = qBuffer; + for(number = 0; number < num_points; number++){ + complexVectorPtr++; + *qBufferPtr++ = *complexVectorPtr++; + } +} +#endif /* LV_HAVE_GENERIC */ + + + + +#endif /* INCLUDED_volk_32fc_deinterleave_imag_32f_a_H */ diff --git a/volk/include/volk/volk_32fc_magnitude_32f_u.h b/volk/include/volk/volk_32fc_magnitude_32f_u.h new file mode 100644 index 000000000..ed1cedef9 --- /dev/null +++ b/volk/include/volk/volk_32fc_magnitude_32f_u.h @@ -0,0 +1,118 @@ +#ifndef INCLUDED_volk_32fc_magnitude_32f_u_H +#define INCLUDED_volk_32fc_magnitude_32f_u_H + +#include <inttypes.h> +#include <stdio.h> +#include <math.h> + +#ifdef LV_HAVE_SSE3 +#include <pmmintrin.h> + /*! + \brief Calculates the magnitude of the complexVector and stores the results in the magnitudeVector + \param complexVector The vector containing the complex input values + \param magnitudeVector The vector containing the real output values + \param num_points The number of complex values in complexVector to be calculated and stored into cVector + */ +static inline void volk_32fc_magnitude_32f_u_sse3(float* magnitudeVector, const lv_32fc_t* complexVector, unsigned int num_points){ + unsigned int number = 0; + const unsigned int quarterPoints = num_points / 4; + + const float* complexVectorPtr = (float*)complexVector; + float* magnitudeVectorPtr = magnitudeVector; + + __m128 cplxValue1, cplxValue2, result; + for(;number < quarterPoints; number++){ + cplxValue1 = _mm_loadu_ps(complexVectorPtr); + complexVectorPtr += 4; + + cplxValue2 = _mm_loadu_ps(complexVectorPtr); + complexVectorPtr += 4; + + cplxValue1 = _mm_mul_ps(cplxValue1, cplxValue1); // Square the values + cplxValue2 = _mm_mul_ps(cplxValue2, cplxValue2); // Square the Values + + result = _mm_hadd_ps(cplxValue1, cplxValue2); // Add the I2 and Q2 values + + result = _mm_sqrt_ps(result); + + _mm_storeu_ps(magnitudeVectorPtr, result); + magnitudeVectorPtr += 4; + } + + number = quarterPoints * 4; + for(; number < num_points; number++){ + float val1Real = *complexVectorPtr++; + float val1Imag = *complexVectorPtr++; + *magnitudeVectorPtr++ = sqrtf((val1Real * val1Real) + (val1Imag * val1Imag)); + } +} +#endif /* LV_HAVE_SSE3 */ + +#ifdef LV_HAVE_SSE +#include <xmmintrin.h> + /*! + \brief Calculates the magnitude of the complexVector and stores the results in the magnitudeVector + \param complexVector The vector containing the complex input values + \param magnitudeVector The vector containing the real output values + \param num_points The number of complex values in complexVector to be calculated and stored into cVector + */ +static inline void volk_32fc_magnitude_32f_u_sse(float* magnitudeVector, const lv_32fc_t* complexVector, unsigned int num_points){ + unsigned int number = 0; + const unsigned int quarterPoints = num_points / 4; + + const float* complexVectorPtr = (float*)complexVector; + float* magnitudeVectorPtr = magnitudeVector; + + __m128 cplxValue1, cplxValue2, iValue, qValue, result; + for(;number < quarterPoints; number++){ + cplxValue1 = _mm_loadu_ps(complexVectorPtr); + complexVectorPtr += 4; + + cplxValue2 = _mm_loadu_ps(complexVectorPtr); + complexVectorPtr += 4; + + // Arrange in i1i2i3i4 format + iValue = _mm_shuffle_ps(cplxValue1, cplxValue2, _MM_SHUFFLE(2,0,2,0)); + // Arrange in q1q2q3q4 format + qValue = _mm_shuffle_ps(cplxValue1, cplxValue2, _MM_SHUFFLE(3,1,3,1)); + + iValue = _mm_mul_ps(iValue, iValue); // Square the I values + qValue = _mm_mul_ps(qValue, qValue); // Square the Q Values + + result = _mm_add_ps(iValue, qValue); // Add the I2 and Q2 values + + result = _mm_sqrt_ps(result); + + _mm_storeu_ps(magnitudeVectorPtr, result); + magnitudeVectorPtr += 4; + } + + number = quarterPoints * 4; + for(; number < num_points; number++){ + float val1Real = *complexVectorPtr++; + float val1Imag = *complexVectorPtr++; + *magnitudeVectorPtr++ = sqrtf((val1Real * val1Real) + (val1Imag * val1Imag)); + } +} +#endif /* LV_HAVE_SSE */ + +#ifdef LV_HAVE_GENERIC + /*! + \brief Calculates the magnitude of the complexVector and stores the results in the magnitudeVector + \param complexVector The vector containing the complex input values + \param magnitudeVector The vector containing the real output values + \param num_points The number of complex values in complexVector to be calculated and stored into cVector + */ +static inline void volk_32fc_magnitude_32f_u_generic(float* magnitudeVector, const lv_32fc_t* complexVector, unsigned int num_points){ + const float* complexVectorPtr = (float*)complexVector; + float* magnitudeVectorPtr = magnitudeVector; + unsigned int number = 0; + for(number = 0; number < num_points; number++){ + const float real = *complexVectorPtr++; + const float imag = *complexVectorPtr++; + *magnitudeVectorPtr++ = sqrtf((real*real) + (imag*imag)); + } +} +#endif /* LV_HAVE_GENERIC */ + +#endif /* INCLUDED_volk_32fc_magnitude_32f_u_H */ diff --git a/volk/include/volk/volk_32fc_magnitude_squared_32f_a.h b/volk/include/volk/volk_32fc_magnitude_squared_32f_a.h new file mode 100644 index 000000000..00bdefbb5 --- /dev/null +++ b/volk/include/volk/volk_32fc_magnitude_squared_32f_a.h @@ -0,0 +1,114 @@ +#ifndef INCLUDED_volk_32fc_magnitude_squared_32f_a_H +#define INCLUDED_volk_32fc_magnitude_squared_32f_a_H + +#include <inttypes.h> +#include <stdio.h> +#include <math.h> + +#ifdef LV_HAVE_SSE3 +#include <pmmintrin.h> + /*! + \brief Calculates the magnitude squared of the complexVector and stores the results in the magnitudeVector + \param complexVector The vector containing the complex input values + \param magnitudeVector The vector containing the real output values + \param num_points The number of complex values in complexVector to be calculated and stored into cVector + */ +static inline void volk_32fc_magnitude_squared_32f_a_sse3(float* magnitudeVector, const lv_32fc_t* complexVector, unsigned int num_points){ + unsigned int number = 0; + const unsigned int quarterPoints = num_points / 4; + + const float* complexVectorPtr = (float*)complexVector; + float* magnitudeVectorPtr = magnitudeVector; + + __m128 cplxValue1, cplxValue2, result; + for(;number < quarterPoints; number++){ + cplxValue1 = _mm_load_ps(complexVectorPtr); + complexVectorPtr += 4; + + cplxValue2 = _mm_load_ps(complexVectorPtr); + complexVectorPtr += 4; + + cplxValue1 = _mm_mul_ps(cplxValue1, cplxValue1); // Square the values + cplxValue2 = _mm_mul_ps(cplxValue2, cplxValue2); // Square the Values + + result = _mm_hadd_ps(cplxValue1, cplxValue2); // Add the I2 and Q2 values + + _mm_store_ps(magnitudeVectorPtr, result); + magnitudeVectorPtr += 4; + } + + number = quarterPoints * 4; + for(; number < num_points; number++){ + float val1Real = *complexVectorPtr++; + float val1Imag = *complexVectorPtr++; + *magnitudeVectorPtr++ = (val1Real * val1Real) + (val1Imag * val1Imag); + } +} +#endif /* LV_HAVE_SSE3 */ + +#ifdef LV_HAVE_SSE +#include <xmmintrin.h> + /*! + \brief Calculates the magnitude squared of the complexVector and stores the results in the magnitudeVector + \param complexVector The vector containing the complex input values + \param magnitudeVector The vector containing the real output values + \param num_points The number of complex values in complexVector to be calculated and stored into cVector + */ +static inline void volk_32fc_magnitude_squared_32f_a_sse(float* magnitudeVector, const lv_32fc_t* complexVector, unsigned int num_points){ + unsigned int number = 0; + const unsigned int quarterPoints = num_points / 4; + + const float* complexVectorPtr = (float*)complexVector; + float* magnitudeVectorPtr = magnitudeVector; + + __m128 cplxValue1, cplxValue2, iValue, qValue, result; + for(;number < quarterPoints; number++){ + cplxValue1 = _mm_load_ps(complexVectorPtr); + complexVectorPtr += 4; + + cplxValue2 = _mm_load_ps(complexVectorPtr); + complexVectorPtr += 4; + + // Arrange in i1i2i3i4 format + iValue = _mm_shuffle_ps(cplxValue1, cplxValue2, _MM_SHUFFLE(2,0,2,0)); + // Arrange in q1q2q3q4 format + qValue = _mm_shuffle_ps(cplxValue1, cplxValue2, _MM_SHUFFLE(3,1,3,1)); + + iValue = _mm_mul_ps(iValue, iValue); // Square the I values + qValue = _mm_mul_ps(qValue, qValue); // Square the Q Values + + result = _mm_add_ps(iValue, qValue); // Add the I2 and Q2 values + + _mm_store_ps(magnitudeVectorPtr, result); + magnitudeVectorPtr += 4; + } + + number = quarterPoints * 4; + for(; number < num_points; number++){ + float val1Real = *complexVectorPtr++; + float val1Imag = *complexVectorPtr++; + *magnitudeVectorPtr++ = (val1Real * val1Real) + (val1Imag * val1Imag); + } +} +#endif /* LV_HAVE_SSE */ + +#ifdef LV_HAVE_GENERIC + /*! + \brief Calculates the magnitude squared of the complexVector and stores the results in the magnitudeVector + \param complexVector The vector containing the complex input values + \param magnitudeVector The vector containing the real output values + \param num_points The number of complex values in complexVector to be calculated and stored into cVector + */ +static inline void volk_32fc_magnitude_squared_32f_a_generic(float* magnitudeVector, const lv_32fc_t* complexVector, unsigned int num_points){ + const float* complexVectorPtr = (float*)complexVector; + float* magnitudeVectorPtr = magnitudeVector; + unsigned int number = 0; + for(number = 0; number < num_points; number++){ + const float real = *complexVectorPtr++; + const float imag = *complexVectorPtr++; + *magnitudeVectorPtr++ = (real*real) + (imag*imag); + } +} +#endif /* LV_HAVE_GENERIC */ + +#endif /* INCLUDED_volk_32fc_magnitude_32f_a_H */ diff --git a/volk/include/volk/volk_32fc_magnitude_squared_32f_u.h b/volk/include/volk/volk_32fc_magnitude_squared_32f_u.h new file mode 100644 index 000000000..6eb4a523a --- /dev/null +++ b/volk/include/volk/volk_32fc_magnitude_squared_32f_u.h @@ -0,0 +1,114 @@ +#ifndef INCLUDED_volk_32fc_magnitude_squared_32f_u_H +#define INCLUDED_volk_32fc_magnitude_squared_32f_u_H + +#include <inttypes.h> +#include <stdio.h> +#include <math.h> + +#ifdef LV_HAVE_SSE3 +#include <pmmintrin.h> + /*! + \brief Calculates the magnitude squared of the complexVector and stores the results in the magnitudeVector + \param complexVector The vector containing the complex input values + \param magnitudeVector The vector containing the real output values + \param num_points The number of complex values in complexVector to be calculated and stored into cVector + */ +static inline void volk_32fc_magnitude_squared_32f_u_sse3(float* magnitudeVector, const lv_32fc_t* complexVector, unsigned int num_points){ + unsigned int number = 0; + const unsigned int quarterPoints = num_points / 4; + + const float* complexVectorPtr = (float*)complexVector; + float* magnitudeVectorPtr = magnitudeVector; + + __m128 cplxValue1, cplxValue2, result; + for(;number < quarterPoints; number++){ + cplxValue1 = _mm_loadu_ps(complexVectorPtr); + complexVectorPtr += 4; + + cplxValue2 = _mm_loadu_ps(complexVectorPtr); + complexVectorPtr += 4; + + cplxValue1 = _mm_mul_ps(cplxValue1, cplxValue1); // Square the values + cplxValue2 = _mm_mul_ps(cplxValue2, cplxValue2); // Square the Values + + result = _mm_hadd_ps(cplxValue1, cplxValue2); // Add the I2 and Q2 values + + _mm_storeu_ps(magnitudeVectorPtr, result); + magnitudeVectorPtr += 4; + } + + number = quarterPoints * 4; + for(; number < num_points; number++){ + float val1Real = *complexVectorPtr++; + float val1Imag = *complexVectorPtr++; + *magnitudeVectorPtr++ = (val1Real * val1Real) + (val1Imag * val1Imag); + } +} +#endif /* LV_HAVE_SSE3 */ + +#ifdef LV_HAVE_SSE +#include <xmmintrin.h> + /*! + \brief Calculates the magnitude squared of the complexVector and stores the results in the magnitudeVector + \param complexVector The vector containing the complex input values + \param magnitudeVector The vector containing the real output values + \param num_points The number of complex values in complexVector to be calculated and stored into cVector + */ +static inline void volk_32fc_magnitude_squared_32f_u_sse(float* magnitudeVector, const lv_32fc_t* complexVector, unsigned int num_points){ + unsigned int number = 0; + const unsigned int quarterPoints = num_points / 4; + + const float* complexVectorPtr = (float*)complexVector; + float* magnitudeVectorPtr = magnitudeVector; + + __m128 cplxValue1, cplxValue2, iValue, qValue, result; + for(;number < quarterPoints; number++){ + cplxValue1 = _mm_loadu_ps(complexVectorPtr); + complexVectorPtr += 4; + + cplxValue2 = _mm_loadu_ps(complexVectorPtr); + complexVectorPtr += 4; + + // Arrange in i1i2i3i4 format + iValue = _mm_shuffle_ps(cplxValue1, cplxValue2, _MM_SHUFFLE(2,0,2,0)); + // Arrange in q1q2q3q4 format + qValue = _mm_shuffle_ps(cplxValue1, cplxValue2, _MM_SHUFFLE(3,1,3,1)); + + iValue = _mm_mul_ps(iValue, iValue); // Square the I values + qValue = _mm_mul_ps(qValue, qValue); // Square the Q Values + + result = _mm_add_ps(iValue, qValue); // Add the I2 and Q2 values + + _mm_storeu_ps(magnitudeVectorPtr, result); + magnitudeVectorPtr += 4; + } + + number = quarterPoints * 4; + for(; number < num_points; number++){ + float val1Real = *complexVectorPtr++; + float val1Imag = *complexVectorPtr++; + *magnitudeVectorPtr++ = (val1Real * val1Real) + (val1Imag * val1Imag); + } +} +#endif /* LV_HAVE_SSE */ + +#ifdef LV_HAVE_GENERIC + /*! + \brief Calculates the magnitude squared of the complexVector and stores the results in the magnitudeVector + \param complexVector The vector containing the complex input values + \param magnitudeVector The vector containing the real output values + \param num_points The number of complex values in complexVector to be calculated and stored into cVector + */ +static inline void volk_32fc_magnitude_squared_32f_u_generic(float* magnitudeVector, const lv_32fc_t* complexVector, unsigned int num_points){ + const float* complexVectorPtr = (float*)complexVector; + float* magnitudeVectorPtr = magnitudeVector; + unsigned int number = 0; + for(number = 0; number < num_points; number++){ + const float real = *complexVectorPtr++; + const float imag = *complexVectorPtr++; + *magnitudeVectorPtr++ = (real*real) + (imag*imag); + } +} +#endif /* LV_HAVE_GENERIC */ + +#endif /* INCLUDED_volk_32fc_magnitude_32f_u_H */ diff --git a/volk/include/volk/volk_32fc_s32fc_multiply_32fc_a.h b/volk/include/volk/volk_32fc_s32fc_multiply_32fc_a.h index b27a7259f..534dc2a25 100644 --- a/volk/include/volk/volk_32fc_s32fc_multiply_32fc_a.h +++ b/volk/include/volk/volk_32fc_s32fc_multiply_32fc_a.h @@ -6,7 +6,8 @@ #include <volk/volk_complex.h> #include <float.h> -#ifdef LV_HAVE_GENERIC +#ifdef LV_HAVE_SSE3 +#include <pmmintrin.h> /*! \brief Multiplies the two input complex vectors and stores their results in the third vector \param cVector The vector where the results will be stored @@ -14,18 +15,44 @@ \param bVector One of the vectors to be multiplied \param num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector */ -static inline void volk_32fc_s32fc_multiply_32fc_a_generic(lv_32fc_t* cVector, const lv_32fc_t* aVector, const lv_32fc_t scalar, unsigned int num_points){ - lv_32fc_t* cPtr = cVector; - const lv_32fc_t* aPtr = aVector; - unsigned int number = 0; +static inline void volk_32fc_s32fc_multiply_32fc_a_sse3(lv_32fc_t* cVector, const lv_32fc_t* aVector, const lv_32fc_t scalar, unsigned int num_points){ + unsigned int number = 0; + const unsigned int halfPoints = num_points / 2; - for(number = 0; number < num_points; number++){ - *cPtr++ = (*aPtr++) * scalar; + __m128 x, yl, yh, z, tmp1, tmp2; + lv_32fc_t* c = cVector; + const lv_32fc_t* a = aVector; + + // Set up constant scalar vector + yl = _mm_set_ps1(lv_creal(scalar)); + yh = _mm_set_ps1(lv_cimag(scalar)); + + for(;number < halfPoints; number++){ + + x = _mm_load_ps((float*)a); // Load the ar + ai, br + bi as ar,ai,br,bi + + tmp1 = _mm_mul_ps(x,yl); // tmp1 = ar*cr,ai*cr,br*dr,bi*dr + + x = _mm_shuffle_ps(x,x,0xB1); // Re-arrange x to be ai,ar,bi,br + + tmp2 = _mm_mul_ps(x,yh); // tmp2 = ai*ci,ar*ci,bi*di,br*di + + z = _mm_addsub_ps(tmp1,tmp2); // ar*cr-ai*ci, ai*cr+ar*ci, br*dr-bi*di, bi*dr+br*di + + _mm_store_ps((float*)c,z); // Store the results back into the C container + + a += 2; + c += 2; + } + + if((num_points % 2) != 0) { + *c = (*a) * scalar; } } -#endif /* LV_HAVE_GENERIC */ +#endif /* LV_HAVE_SSE */ -#ifdef LV_HAVE_ORC + +#ifdef LV_HAVE_GENERIC /*! \brief Multiplies the two input complex vectors and stores their results in the third vector \param cVector The vector where the results will be stored @@ -33,11 +60,29 @@ static inline void volk_32fc_s32fc_multiply_32fc_a_generic(lv_32fc_t* cVector, c \param bVector One of the vectors to be multiplied \param num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector */ -extern void volk_32fc_s32fc_multiply_32fc_a_orc_impl(lv_32fc_t* cVector, const lv_32fc_t* aVector, const lv_32fc_t scalar, unsigned int num_points); -static inline void volk_32fc_s32fc_multiply_32fc_a_orc(lv_32fc_t* cVector, const lv_32fc_t* aVector, const lv_32fc_t scalar, unsigned int num_points){ - volk_32fc_s32fc_multiply_32fc_a_orc_impl(cVector, aVector, scalar, num_points); +static inline void volk_32fc_s32fc_multiply_32fc_a_generic(lv_32fc_t* cVector, const lv_32fc_t* aVector, const lv_32fc_t scalar, unsigned int num_points){ + lv_32fc_t* cPtr = cVector; + const lv_32fc_t* aPtr = aVector; + unsigned int number = num_points; + + // unwrap loop + while (number >= 8){ + *cPtr++ = (*aPtr++) * scalar; + *cPtr++ = (*aPtr++) * scalar; + *cPtr++ = (*aPtr++) * scalar; + *cPtr++ = (*aPtr++) * scalar; + *cPtr++ = (*aPtr++) * scalar; + *cPtr++ = (*aPtr++) * scalar; + *cPtr++ = (*aPtr++) * scalar; + *cPtr++ = (*aPtr++) * scalar; + number -= 8; + } + + // clean up any remaining + while (number-- > 0) + *cPtr++ = *aPtr++ * scalar; } -#endif /* LV_HAVE_ORC */ +#endif /* LV_HAVE_GENERIC */ diff --git a/volk/include/volk/volk_32fc_s32fc_multiply_32fc_u.h b/volk/include/volk/volk_32fc_s32fc_multiply_32fc_u.h new file mode 100644 index 000000000..218c450f8 --- /dev/null +++ b/volk/include/volk/volk_32fc_s32fc_multiply_32fc_u.h @@ -0,0 +1,87 @@ +#ifndef INCLUDED_volk_32fc_s32fc_multiply_32fc_u_H +#define INCLUDED_volk_32fc_s32fc_multiply_32fc_u_H + +#include <inttypes.h> +#include <stdio.h> +#include <volk/volk_complex.h> +#include <float.h> + +#ifdef LV_HAVE_SSE3 +#include <pmmintrin.h> +/*! + \brief Multiplies the input vector by a scalar and stores the results in the third vector + \param cVector The vector where the results will be stored + \param aVector The vector to be multiplied + \param scalar The complex scalar to multiply aVector + \param num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector +*/ +static inline void volk_32fc_s32fc_multiply_32fc_u_sse3(lv_32fc_t* cVector, const lv_32fc_t* aVector, const lv_32fc_t scalar, unsigned int num_points){ + unsigned int number = 0; + const unsigned int halfPoints = num_points / 2; + + __m128 x, yl, yh, z, tmp1, tmp2; + lv_32fc_t* c = cVector; + const lv_32fc_t* a = aVector; + + // Set up constant scalar vector + yl = _mm_set_ps1(lv_creal(scalar)); + yh = _mm_set_ps1(lv_cimag(scalar)); + + for(;number < halfPoints; number++){ + + x = _mm_loadu_ps((float*)a); // Load the ar + ai, br + bi as ar,ai,br,bi + + tmp1 = _mm_mul_ps(x,yl); // tmp1 = ar*cr,ai*cr,br*dr,bi*dr + + x = _mm_shuffle_ps(x,x,0xB1); // Re-arrange x to be ai,ar,bi,br + + tmp2 = _mm_mul_ps(x,yh); // tmp2 = ai*ci,ar*ci,bi*di,br*di + + z = _mm_addsub_ps(tmp1,tmp2); // ar*cr-ai*ci, ai*cr+ar*ci, br*dr-bi*di, bi*dr+br*di + + _mm_storeu_ps((float*)c,z); // Store the results back into the C container + + a += 2; + c += 2; + } + + if((num_points % 2) != 0) { + *c = (*a) * scalar; + } +} +#endif /* LV_HAVE_SSE */ + +#ifdef LV_HAVE_GENERIC +/*! + \brief Multiplies the input vector by a scalar and stores the results in the third vector + \param cVector The vector where the results will be stored + \param aVector The vector to be multiplied + \param scalar The complex scalar to multiply aVector + \param num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector +*/ +static inline void volk_32fc_s32fc_multiply_32fc_u_generic(lv_32fc_t* cVector, const lv_32fc_t* aVector, const lv_32fc_t scalar, unsigned int num_points){ + lv_32fc_t* cPtr = cVector; + const lv_32fc_t* aPtr = aVector; + unsigned int number = num_points; + + // unwrap loop + while (number >= 8){ + *cPtr++ = (*aPtr++) * scalar; + *cPtr++ = (*aPtr++) * scalar; + *cPtr++ = (*aPtr++) * scalar; + *cPtr++ = (*aPtr++) * scalar; + *cPtr++ = (*aPtr++) * scalar; + *cPtr++ = (*aPtr++) * scalar; + *cPtr++ = (*aPtr++) * scalar; + *cPtr++ = (*aPtr++) * scalar; + number -= 8; + } + + // clean up any remaining + while (number-- > 0) + *cPtr++ = *aPtr++ * scalar; +} +#endif /* LV_HAVE_GENERIC */ + + +#endif /* INCLUDED_volk_32fc_x2_multiply_32fc_u_H */ diff --git a/volk/include/volk/volk_32fc_x2_multiply_32fc_a.h b/volk/include/volk/volk_32fc_x2_multiply_32fc_a.h index 18dd092e8..aec8bd716 100644 --- a/volk/include/volk/volk_32fc_x2_multiply_32fc_a.h +++ b/volk/include/volk/volk_32fc_x2_multiply_32fc_a.h @@ -23,7 +23,6 @@ static inline void volk_32fc_x2_multiply_32fc_a_sse3(lv_32fc_t* cVector, const l lv_32fc_t* c = cVector; const lv_32fc_t* a = aVector; const lv_32fc_t* b = bVector; - for(;number < halfPoints; number++){ x = _mm_load_ps((float*)a); // Load the ar + ai, br + bi as ar,ai,br,bi diff --git a/volk/include/volk/volk_32fc_x2_multiply_32fc_u.h b/volk/include/volk/volk_32fc_x2_multiply_32fc_u.h new file mode 100644 index 000000000..729c1a4ad --- /dev/null +++ b/volk/include/volk/volk_32fc_x2_multiply_32fc_u.h @@ -0,0 +1,77 @@ +#ifndef INCLUDED_volk_32fc_x2_multiply_32fc_u_H +#define INCLUDED_volk_32fc_x2_multiply_32fc_u_H + +#include <inttypes.h> +#include <stdio.h> +#include <volk/volk_complex.h> +#include <float.h> + +#ifdef LV_HAVE_SSE3 +#include <pmmintrin.h> + /*! + \brief Multiplies the two input complex vectors and stores their results in the third vector + \param cVector The vector where the results will be stored + \param aVector One of the vectors to be multiplied + \param bVector One of the vectors to be multiplied + \param num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector + */ +static inline void volk_32fc_x2_multiply_32fc_u_sse3(lv_32fc_t* cVector, const lv_32fc_t* aVector, const lv_32fc_t* bVector, unsigned int num_points){ + unsigned int number = 0; + const unsigned int halfPoints = num_points / 2; + + __m128 x, y, yl, yh, z, tmp1, tmp2; + lv_32fc_t* c = cVector; + const lv_32fc_t* a = aVector; + const lv_32fc_t* b = bVector; + + for(;number < halfPoints; number++){ + + x = _mm_loadu_ps((float*)a); // Load the ar + ai, br + bi as ar,ai,br,bi + y = _mm_loadu_ps((float*)b); // Load the cr + ci, dr + di as cr,ci,dr,di + + yl = _mm_moveldup_ps(y); // Load yl with cr,cr,dr,dr + yh = _mm_movehdup_ps(y); // Load yh with ci,ci,di,di + + tmp1 = _mm_mul_ps(x,yl); // tmp1 = ar*cr,ai*cr,br*dr,bi*dr + + x = _mm_shuffle_ps(x,x,0xB1); // Re-arrange x to be ai,ar,bi,br + + tmp2 = _mm_mul_ps(x,yh); // tmp2 = ai*ci,ar*ci,bi*di,br*di + + z = _mm_addsub_ps(tmp1,tmp2); // ar*cr-ai*ci, ai*cr+ar*ci, br*dr-bi*di, bi*dr+br*di + + _mm_storeu_ps((float*)c,z); // Store the results back into the C container + + a += 2; + b += 2; + c += 2; + } + + if((num_points % 2) != 0) { + *c = (*a) * (*b); + } +} +#endif /* LV_HAVE_SSE */ + +#ifdef LV_HAVE_GENERIC + /*! + \brief Multiplies the two input complex vectors and stores their results in the third vector + \param cVector The vector where the results will be stored + \param aVector One of the vectors to be multiplied + \param bVector One of the vectors to be multiplied + \param num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector + */ +static inline void volk_32fc_x2_multiply_32fc_u_generic(lv_32fc_t* cVector, const lv_32fc_t* aVector, const lv_32fc_t* bVector, unsigned int num_points){ + lv_32fc_t* cPtr = cVector; + const lv_32fc_t* aPtr = aVector; + const lv_32fc_t* bPtr= bVector; + unsigned int number = 0; + + for(number = 0; number < num_points; number++){ + *cPtr++ = (*aPtr++) * (*bPtr++); + } +} +#endif /* LV_HAVE_GENERIC */ + + +#endif /* INCLUDED_volk_32fc_x2_multiply_32fc_u_H */ diff --git a/volk/include/volk/volk_32fc_x2_multiply_conjugate_32fc_a.h b/volk/include/volk/volk_32fc_x2_multiply_conjugate_32fc_a.h new file mode 100644 index 000000000..2a1bcbce0 --- /dev/null +++ b/volk/include/volk/volk_32fc_x2_multiply_conjugate_32fc_a.h @@ -0,0 +1,81 @@ +#ifndef INCLUDED_volk_32fc_x2_multiply_conjugate_32fc_a_H +#define INCLUDED_volk_32fc_x2_multiply_conjugate_32fc_a_H + +#include <inttypes.h> +#include <stdio.h> +#include <volk/volk_complex.h> +#include <float.h> + +#ifdef LV_HAVE_SSE3 +#include <pmmintrin.h> + /*! + \brief Multiplies vector a by the conjugate of vector b and stores the results in the third vector + \param cVector The vector where the results will be stored + \param aVector First vector to be multiplied + \param bVector Second vector that is conjugated before being multiplied + \param num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector + */ +static inline void volk_32fc_x2_multiply_conjugate_32fc_a_sse3(lv_32fc_t* cVector, const lv_32fc_t* aVector, const lv_32fc_t* bVector, unsigned int num_points){ + unsigned int number = 0; + const unsigned int halfPoints = num_points / 2; + + __m128 x, y, yl, yh, z, tmp1, tmp2; + lv_32fc_t* c = cVector; + const lv_32fc_t* a = aVector; + const lv_32fc_t* b = bVector; + + __m128 conjugator = _mm_setr_ps(0, -0.f, 0, -0.f); + + for(;number < halfPoints; number++){ + + x = _mm_load_ps((float*)a); // Load the ar + ai, br + bi as ar,ai,br,bi + y = _mm_load_ps((float*)b); // Load the cr + ci, dr + di as cr,ci,dr,di + + y = _mm_xor_ps(y, conjugator); // conjugate y + + yl = _mm_moveldup_ps(y); // Load yl with cr,cr,dr,dr + yh = _mm_movehdup_ps(y); // Load yh with ci,ci,di,di + + tmp1 = _mm_mul_ps(x,yl); // tmp1 = ar*cr,ai*cr,br*dr,bi*dr + + x = _mm_shuffle_ps(x,x,0xB1); // Re-arrange x to be ai,ar,bi,br + + tmp2 = _mm_mul_ps(x,yh); // tmp2 = ai*ci,ar*ci,bi*di,br*di + + z = _mm_addsub_ps(tmp1,tmp2); // ar*cr-ai*ci, ai*cr+ar*ci, br*dr-bi*di, bi*dr+br*di + + _mm_store_ps((float*)c,z); // Store the results back into the C container + + a += 2; + b += 2; + c += 2; + } + + if((num_points % 2) != 0) { + *c = (*a) * lv_conj(*b); + } +} +#endif /* LV_HAVE_SSE */ + +#ifdef LV_HAVE_GENERIC + /*! + \brief Multiplies vector a by the conjugate of vector b and stores the results in the third vector + \param cVector The vector where the results will be stored + \param aVector First vector to be multiplied + \param bVector Second vector that is conjugated before being multiplied + \param num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector + */ +static inline void volk_32fc_x2_multiply_conjugate_32fc_a_generic(lv_32fc_t* cVector, const lv_32fc_t* aVector, const lv_32fc_t* bVector, unsigned int num_points){ + lv_32fc_t* cPtr = cVector; + const lv_32fc_t* aPtr = aVector; + const lv_32fc_t* bPtr= bVector; + unsigned int number = 0; + + for(number = 0; number < num_points; number++){ + *cPtr++ = (*aPtr++) * lv_conj(*bPtr++); + } +} +#endif /* LV_HAVE_GENERIC */ + + +#endif /* INCLUDED_volk_32fc_x2_multiply_conjugate_32fc_a_H */ diff --git a/volk/include/volk/volk_32fc_x2_multiply_conjugate_32fc_u.h b/volk/include/volk/volk_32fc_x2_multiply_conjugate_32fc_u.h new file mode 100644 index 000000000..92f6a051e --- /dev/null +++ b/volk/include/volk/volk_32fc_x2_multiply_conjugate_32fc_u.h @@ -0,0 +1,81 @@ +#ifndef INCLUDED_volk_32fc_x2_multiply_conjugate_32fc_u_H +#define INCLUDED_volk_32fc_x2_multiply_conjugate_32fc_u_H + +#include <inttypes.h> +#include <stdio.h> +#include <volk/volk_complex.h> +#include <float.h> + +#ifdef LV_HAVE_SSE3 +#include <pmmintrin.h> + /*! + \brief Multiplies vector a by the conjugate of vector b and stores the results in the third vector + \param cVector The vector where the results will be stored + \param aVector First vector to be multiplied + \param bVector Second vector that is conjugated before being multiplied + \param num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector + */ +static inline void volk_32fc_x2_multiply_conjugate_32fc_u_sse3(lv_32fc_t* cVector, const lv_32fc_t* aVector, const lv_32fc_t* bVector, unsigned int num_points){ + unsigned int number = 0; + const unsigned int halfPoints = num_points / 2; + + __m128 x, y, yl, yh, z, tmp1, tmp2; + lv_32fc_t* c = cVector; + const lv_32fc_t* a = aVector; + const lv_32fc_t* b = bVector; + + __m128 conjugator = _mm_setr_ps(0, -0.f, 0, -0.f); + + for(;number < halfPoints; number++){ + + x = _mm_loadu_ps((float*)a); // Load the ar + ai, br + bi as ar,ai,br,bi + y = _mm_loadu_ps((float*)b); // Load the cr + ci, dr + di as cr,ci,dr,di + + y = _mm_xor_ps(y, conjugator); // conjugate y + + yl = _mm_moveldup_ps(y); // Load yl with cr,cr,dr,dr + yh = _mm_movehdup_ps(y); // Load yh with ci,ci,di,di + + tmp1 = _mm_mul_ps(x,yl); // tmp1 = ar*cr,ai*cr,br*dr,bi*dr + + x = _mm_shuffle_ps(x,x,0xB1); // Re-arrange x to be ai,ar,bi,br + + tmp2 = _mm_mul_ps(x,yh); // tmp2 = ai*ci,ar*ci,bi*di,br*di + + z = _mm_addsub_ps(tmp1,tmp2); // ar*cr-ai*ci, ai*cr+ar*ci, br*dr-bi*di, bi*dr+br*di + + _mm_storeu_ps((float*)c,z); // Store the results back into the C container + + a += 2; + b += 2; + c += 2; + } + + if((num_points % 2) != 0) { + *c = (*a) * lv_conj(*b); + } +} +#endif /* LV_HAVE_SSE */ + +#ifdef LV_HAVE_GENERIC + /*! + \brief Multiplies vector a by the conjugate of vector b and stores the results in the third vector + \param cVector The vector where the results will be stored + \param aVector First vector to be multiplied + \param bVector Second vector that is conjugated before being multiplied + \param num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector + */ +static inline void volk_32fc_x2_multiply_conjugate_32fc_u_generic(lv_32fc_t* cVector, const lv_32fc_t* aVector, const lv_32fc_t* bVector, unsigned int num_points){ + lv_32fc_t* cPtr = cVector; + const lv_32fc_t* aPtr = aVector; + const lv_32fc_t* bPtr= bVector; + unsigned int number = 0; + + for(number = 0; number < num_points; number++){ + *cPtr++ = (*aPtr++) * lv_conj(*bPtr++); + } +} +#endif /* LV_HAVE_GENERIC */ + + +#endif /* INCLUDED_volk_32fc_x2_multiply_conjugate_32fc_u_H */ diff --git a/volk/include/volk/volk_64u_popcnt_a.h b/volk/include/volk/volk_64u_popcnt_a.h index bdaa98643..4683f1e38 100644 --- a/volk/include/volk/volk_64u_popcnt_a.h +++ b/volk/include/volk/volk_64u_popcnt_a.h @@ -10,10 +10,11 @@ static inline void volk_64u_popcnt_a_generic(uint64_t* ret, const uint64_t value) { - const uint32_t* valueVector = (const uint32_t*)&value; + //const uint32_t* valueVector = (const uint32_t*)&value; // This is faster than a lookup table - uint32_t retVal = valueVector[0]; + //uint32_t retVal = valueVector[0]; + uint32_t retVal = (uint32_t)(value && 0x00000000FFFFFFFF); retVal = (retVal & 0x55555555) + (retVal >> 1 & 0x55555555); retVal = (retVal & 0x33333333) + (retVal >> 2 & 0x33333333); @@ -22,7 +23,8 @@ static inline void volk_64u_popcnt_a_generic(uint64_t* ret, const uint64_t value retVal = (retVal + (retVal >> 16)) & 0x0000003F; uint64_t retVal64 = retVal; - retVal = valueVector[1]; + //retVal = valueVector[1]; + retVal = (uint32_t)((value && 0xFFFFFFFF00000000) >> 31); retVal = (retVal & 0x55555555) + (retVal >> 1 & 0x55555555); retVal = (retVal & 0x33333333) + (retVal >> 2 & 0x33333333); retVal = (retVal + (retVal >> 4)) & 0x0F0F0F0F; diff --git a/volk/lib/qa_utils.cc b/volk/lib/qa_utils.cc index 9bb515e9f..bb37801c9 100644 --- a/volk/lib/qa_utils.cc +++ b/volk/lib/qa_utils.cc @@ -198,6 +198,18 @@ inline void run_cast_test3_s32f(volk_fn_3arg_s32f func, std::vector<void *> &buf while(iter--) func(buffs[0], buffs[1], buffs[2], scalar, vlen, arch.c_str()); } +inline void run_cast_test1_s32fc(volk_fn_1arg_s32fc func, std::vector<void *> &buffs, lv_32fc_t scalar, unsigned int vlen, unsigned int iter, std::string arch) { + while(iter--) func(buffs[0], scalar, vlen, arch.c_str()); +} + +inline void run_cast_test2_s32fc(volk_fn_2arg_s32fc func, std::vector<void *> &buffs, lv_32fc_t scalar, unsigned int vlen, unsigned int iter, std::string arch) { + while(iter--) func(buffs[0], buffs[1], scalar, vlen, arch.c_str()); +} + +inline void run_cast_test3_s32fc(volk_fn_3arg_s32fc func, std::vector<void *> &buffs, lv_32fc_t scalar, unsigned int vlen, unsigned int iter, std::string arch) { + while(iter--) func(buffs[0], buffs[1], buffs[2], scalar, vlen, arch.c_str()); +} + template <class t> bool fcompare(t *in1, t *in2, unsigned int vlen, float tol) { bool fail = false; @@ -246,7 +258,7 @@ bool run_volk_tests(struct volk_func_desc desc, void (*manual_func)(), std::string name, float tol, - float scalar, + lv_32fc_t scalar, int vlen, int iter, std::vector<std::string> *best_arch_vector = 0 @@ -316,21 +328,33 @@ bool run_volk_tests(struct volk_func_desc desc, if(inputsc.size() == 0) { run_cast_test1((volk_fn_1arg)(manual_func), test_data[i], vlen, iter, arch_list[i]); } else if(inputsc.size() == 1 && inputsc[0].is_float) { - run_cast_test1_s32f((volk_fn_1arg_s32f)(manual_func), test_data[i], scalar, vlen, iter, arch_list[i]); + if(inputsc[0].is_complex) { + run_cast_test1_s32fc((volk_fn_1arg_s32fc)(manual_func), test_data[i], scalar, vlen, iter, arch_list[i]); + } else { + run_cast_test1_s32f((volk_fn_1arg_s32f)(manual_func), test_data[i], scalar.real(), vlen, iter, arch_list[i]); + } } else throw "unsupported 1 arg function >1 scalars"; break; case 2: if(inputsc.size() == 0) { run_cast_test2((volk_fn_2arg)(manual_func), test_data[i], vlen, iter, arch_list[i]); } else if(inputsc.size() == 1 && inputsc[0].is_float) { - run_cast_test2_s32f((volk_fn_2arg_s32f)(manual_func), test_data[i], scalar, vlen, iter, arch_list[i]); + if(inputsc[0].is_complex) { + run_cast_test2_s32fc((volk_fn_2arg_s32fc)(manual_func), test_data[i], scalar, vlen, iter, arch_list[i]); + } else { + run_cast_test2_s32f((volk_fn_2arg_s32f)(manual_func), test_data[i], scalar.real(), vlen, iter, arch_list[i]); + } } else throw "unsupported 2 arg function >1 scalars"; break; case 3: if(inputsc.size() == 0) { run_cast_test3((volk_fn_3arg)(manual_func), test_data[i], vlen, iter, arch_list[i]); } else if(inputsc.size() == 1 && inputsc[0].is_float) { - run_cast_test3_s32f((volk_fn_3arg_s32f)(manual_func), test_data[i], scalar, vlen, iter, arch_list[i]); + if(inputsc[0].is_complex) { + run_cast_test3_s32fc((volk_fn_3arg_s32fc)(manual_func), test_data[i], scalar, vlen, iter, arch_list[i]); + } else { + run_cast_test3_s32f((volk_fn_3arg_s32f)(manual_func), test_data[i], scalar.real(), vlen, iter, arch_list[i]); + } } else throw "unsupported 3 arg function >1 scalars"; break; case 4: diff --git a/volk/lib/qa_utils.h b/volk/lib/qa_utils.h index a1bc1f20c..b998df852 100644 --- a/volk/lib/qa_utils.h +++ b/volk/lib/qa_utils.h @@ -21,7 +21,7 @@ volk_type_t volk_type_from_string(std::string); float uniform(void); void random_floats(float *buf, unsigned n); -bool run_volk_tests(struct volk_func_desc, void(*)(), std::string, float, float, int, int, std::vector<std::string> *); +bool run_volk_tests(struct volk_func_desc, void(*)(), std::string, float, lv_32fc_t, int, int, std::vector<std::string> *); #define VOLK_RUN_TESTS(func, tol, scalar, len, iter) BOOST_AUTO_TEST_CASE(func##_test) { BOOST_CHECK_EQUAL(run_volk_tests(func##_get_func_desc(), (void (*)())func##_manual, std::string(#func), tol, scalar, len, iter, 0), 0); } #define VOLK_PROFILE(func, tol, scalar, len, iter, results) run_volk_tests(func##_get_func_desc(), (void (*)())func##_manual, std::string(#func), tol, scalar, len, iter, results) @@ -32,5 +32,8 @@ typedef void (*volk_fn_4arg)(void *, void *, void *, void *, unsigned int, const typedef void (*volk_fn_1arg_s32f)(void *, float, unsigned int, const char*); //one input vector, one scalar float input typedef void (*volk_fn_2arg_s32f)(void *, void *, float, unsigned int, const char*); typedef void (*volk_fn_3arg_s32f)(void *, void *, void *, float, unsigned int, const char*); +typedef void (*volk_fn_1arg_s32fc)(void *, lv_32fc_t, unsigned int, const char*); //one input vector, one scalar float input +typedef void (*volk_fn_2arg_s32fc)(void *, void *, lv_32fc_t, unsigned int, const char*); +typedef void (*volk_fn_3arg_s32fc)(void *, void *, void *, lv_32fc_t, unsigned int, const char*); #endif //VOLK_QA_UTILS_H diff --git a/volk/lib/testqa.cc b/volk/lib/testqa.cc index fbd4bdea5..593087f85 100644 --- a/volk/lib/testqa.cc +++ b/volk/lib/testqa.cc @@ -22,6 +22,7 @@ VOLK_RUN_TESTS(volk_16i_convert_8i_u, 0, 0, 20460, 1); VOLK_RUN_TESTS(volk_16u_byteswap_a, 0, 0, 20460, 1); VOLK_RUN_TESTS(volk_32f_accumulator_s32f_a, 1e-4, 0, 20460, 1); VOLK_RUN_TESTS(volk_32f_x2_add_32f_a, 1e-4, 0, 20460, 1); +VOLK_RUN_TESTS(volk_32f_x2_add_32f_u, 1e-4, 0, 20460, 1); VOLK_RUN_TESTS(volk_32fc_32f_multiply_32fc_a, 1e-4, 0, 20460, 1); VOLK_RUN_TESTS(volk_32fc_s32f_power_32fc_a, 1e-4, 0, 20460, 1); VOLK_RUN_TESTS(volk_32f_s32f_calc_spectral_noise_floor_32f_a, 1e-4, 20.0, 20460, 1); @@ -37,7 +38,6 @@ VOLK_RUN_TESTS(volk_32fc_x2_dot_prod_32fc_a, 1e-4, 0, 204600, 1); VOLK_RUN_TESTS(volk_32fc_index_max_16u_a, 3, 0, 20460, 1); VOLK_RUN_TESTS(volk_32fc_s32f_magnitude_16i_a, 1, 32768, 20460, 1); VOLK_RUN_TESTS(volk_32fc_magnitude_32f_a, 1e-4, 0, 20460, 1); -VOLK_RUN_TESTS(volk_32fc_x2_multiply_32fc_a, 1e-4, 0, 20460, 1); VOLK_RUN_TESTS(volk_32f_s32f_convert_16i_a, 1, 32768, 20460, 1); VOLK_RUN_TESTS(volk_32f_s32f_convert_16i_u, 1, 32768, 20460, 1); VOLK_RUN_TESTS(volk_32f_s32f_convert_32i_a, 1, 2<<31, 20460, 1); @@ -59,7 +59,6 @@ VOLK_RUN_TESTS(volk_32f_x2_s32f_interleave_16ic_a, 1, 32768, 20460, 1); VOLK_RUN_TESTS(volk_32f_x2_interleave_32fc_a, 0, 0, 20460, 1); VOLK_RUN_TESTS(volk_32f_x2_max_32f_a, 1e-4, 0, 20460, 1); VOLK_RUN_TESTS(volk_32f_x2_min_32f_a, 1e-4, 0, 20460, 1); -VOLK_RUN_TESTS(volk_32f_x2_multiply_32f_a, 1e-4, 0, 20460, 1); VOLK_RUN_TESTS(volk_32f_s32f_normalize_a, 1e-4, 100, 20460, 1); VOLK_RUN_TESTS(volk_32f_s32f_power_32f_a, 1e-4, 4, 20460, 1); VOLK_RUN_TESTS(volk_32f_sqrt_32f_a, 1e-4, 0, 20460, 1); @@ -90,3 +89,15 @@ VOLK_RUN_TESTS(volk_8i_convert_16i_a, 0, 0, 20460, 1); VOLK_RUN_TESTS(volk_8i_convert_16i_u, 0, 0, 20460, 1); VOLK_RUN_TESTS(volk_8i_s32f_convert_32f_a, 1e-4, 100, 20460, 1); VOLK_RUN_TESTS(volk_8i_s32f_convert_32f_u, 1e-4, 100, 20460, 1); +VOLK_RUN_TESTS(volk_32fc_x2_multiply_32fc_a, 1e-4, 0, 20460, 1); +VOLK_RUN_TESTS(volk_32fc_x2_multiply_32fc_u, 1e-4, 0, 20460, 1); +VOLK_RUN_TESTS(volk_32fc_x2_multiply_conjugate_32fc_a, 1e-4, 0, 20460, 1); +VOLK_RUN_TESTS(volk_32fc_x2_multiply_conjugate_32fc_u, 1e-4, 0, 20460, 1); +VOLK_RUN_TESTS(volk_32fc_conjugate_32fc_a, 1e-4, 0, 20460, 1); +VOLK_RUN_TESTS(volk_32fc_conjugate_32fc_u, 1e-4, 0, 20460, 1); +VOLK_RUN_TESTS(volk_32f_x2_multiply_32f_a, 1e-4, 0, 20460, 1); +VOLK_RUN_TESTS(volk_32f_x2_multiply_32f_u, 1e-4, 0, 20460, 1); +VOLK_RUN_TESTS(volk_32fc_s32fc_multiply_32fc_a, 1e-4, 0, 20460, 1); +VOLK_RUN_TESTS(volk_32fc_s32fc_multiply_32fc_u, 1e-4, 0, 20460, 1); +VOLK_RUN_TESTS(volk_32f_s32f_multiply_32f_a, 1e-4, 0, 20460, 1); +VOLK_RUN_TESTS(volk_32f_s32f_multiply_32f_u, 1e-4, 0, 20460, 1); |