diff options
author | Tom Rondeau | 2012-02-11 12:27:45 -0500 |
---|---|---|
committer | Tom Rondeau | 2012-02-13 14:57:28 -0500 |
commit | 4589b6d6f062e92fd84965eaf47d3fc30bdf516e (patch) | |
tree | ee9121eb1f7e1ac6b5c639db1402c573b3c6230e /docs/doxygen/other | |
parent | f671319ca9ccef8fb1590e676ff6bcb85d7ca5a1 (diff) | |
download | gnuradio-4589b6d6f062e92fd84965eaf47d3fc30bdf516e.tar.gz gnuradio-4589b6d6f062e92fd84965eaf47d3fc30bdf516e.tar.bz2 gnuradio-4589b6d6f062e92fd84965eaf47d3fc30bdf516e.zip |
volk: added some documentation to the Doxygen manual explaining Volk and how to use it.
Diffstat (limited to 'docs/doxygen/other')
-rw-r--r-- | docs/doxygen/other/main_page.dox | 10 | ||||
-rw-r--r-- | docs/doxygen/other/volk_guide.dox | 161 |
2 files changed, 171 insertions, 0 deletions
diff --git a/docs/doxygen/other/main_page.dox b/docs/doxygen/other/main_page.dox index 0caa0b20f..68b098943 100644 --- a/docs/doxygen/other/main_page.dox +++ b/docs/doxygen/other/main_page.dox @@ -38,4 +38,14 @@ More details on packages in GNU Radio: \li \ref page_uhd \li \ref page_vocoder \li \ref page_pfb + +\section volk_main Using Volk in GNU Radio + +The \ref volk_guide page provides an overview of how to incorporate +and use Volk in GNU Radio blocks. + +Many blocks have already been converted to use Volk in their calls, so +they can also serve as examples. See the gr_complex_to_xxx.h file for +examples of various blocks that make use of Volk. + */ diff --git a/docs/doxygen/other/volk_guide.dox b/docs/doxygen/other/volk_guide.dox new file mode 100644 index 000000000..d898f3864 --- /dev/null +++ b/docs/doxygen/other/volk_guide.dox @@ -0,0 +1,161 @@ +/*! \page volk_guide Instructions for using Volk in GNU Radio + +\section volk_intro Introduction + +Volk is the Vector-Optimized Library of Kernels. It is a library that +contains kernels of hand-written SIMD code for different mathematical +operations. Since each SIMD architecture can be greatly different and +no compiler has yet come along to handle vectorization properly or +highly efficiently, Volk approaches the problem differently. For each +architecture or platform that a developer wishes to vectorize for, a +new proto-kernel is added to Volk. At runtime, Volk will select the +correct proto-kernel. In this way, the users of Volk call a kernel for +performing the operation that is platform/architecture agnostic. This +allows us to write portable SIMD code. + +Volk kernels are always defined with a 'generic' proto-kernel, which +is written in plain C. With the generic kernel, the kernel becomes +portable to any platform. Kernels are then extended by adding +proto-kernels for new platforms in which they are desired. + +A good example of a Volk kernel with multiple proto-kernels defined is +the volk_32f_s32f_multiply_32f_a. This kernel implements a scalar +multiplication of a vector of floating point numbers (each item in the +vector is multiplied by the same value). This kernel has the following +proto-kernels that are defined for 'generic,' 'avx,' 'sse,' and 'orc.' + +\code + void volk_32f_s32f_multiply_32f_a_generic + void volk_32f_s32f_multiply_32f_a_sse + void volk_32f_s32f_multiply_32f_a_avx + void volk_32f_s32f_multiply_32f_a_orc +\endcode + +These proto-kernels means that on platforms with AVX support, Volk can +select this option or the SSE option, depending on which is faster. On +other platforms, the ORC SIMD compiler might provide a solution. If +all else fails, Volk can fall back on the generic proto-kernel, which +will always work. + +Just a note on ORC. ORC is a SIMD compiler library that uses a generic +assembly-like language for SIMD commands. Based on the available SIMD +architecture of a system, it will try and compile a good +solution. Tests show that the results of ORC proto-kernels are +generally better than the generic versions but often not as good as +the hand-tuned proto-kernels for a specific SIMD architecture. This +is, of course, to be expected, and ORC provides a nice intermediary +step to performance improvements until a specific hand-tuned +proto-kernel can be made for a given platform. + +See <a +href="http://gnuradio.org/redmine/projects/gnuradio/wiki/Volk">Volk on +gnuradio.org</a> for details on the Volk naming scheme. + + +\section volk_alignment Setting and Using Memory Alignment Information + +For Volk to work as best as possible, we want to use memory-aligned +SIMD calls, which means we have to have some way of knowing and +controlling the alignment of the buffers passed to gr_block's work +function. We set the alignment requirement for SIMD aligned memory +calls with: + +\code + const int alignment_multiple = + volk_get_alignment() / output_item_size; + set_alignment(alignment_multiple); +\endcode + +The Volk function 'volk_get_alignment' provides the alignment of the +the machine architecture. We then base the alignment on the number of +output items required to maintain the alignment, so we divide the +number of alignment bytes by the number of bytes in an output items +(sizeof(float), sizeof(gr_complex), etc.). This value is then set per +block with the 'set_alignment' function. + +Because the scheduler tries to optimize throughput, the number of +items available per call to work will change and depends on the +availability of the read and write buffers. This means that it +sometimes cannot produce a buffer that is properly memory +aligned. This is an inevitable consequence of the scheduler +system. Instead of requiring alignment, the scheduler enforces the +alignment as much as possible, and when a buffer becomes unaligned, +the scheduler will work to correct it as much as possible. If a +block's buffers are unaligned, then, the scheduler sets a flag to +indicate as much so that the block can then decide what best to +do. The next section discusses the use of the aligned/unaligned +information in a gr_block's work function. + + +\section volk_work Using Alignment Properties in Work() + +The buffers passed to work/general_work in a gr_block are not +guaranteed to be aligned, but they will mostly be aligned whenever +possible. When not aligned, the 'is_unaligned()' flag will be set. So +a block can know if its buffers are aligned and make the right +decisions. This looks like: + +\code +int +gr_some_block::work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items) +{ + const float *in = (const float *) input_items[0]; + float *out = (float *) output_items[0]; + + if(is_unaligned()) { + // do something with unaligned data. This can either be a manual + // handling of the items or a call to an unaligned Volk function. + volk_32f_something_32f_u(out, in, noutput_items); + } + else { + // Buffers are aligned; can call the aligned Volk function. + volk_32f_something_32f_a(out, in, noutput_items); + } + + return noutput_items; +} +\endcode + + + +\section volk_tuning Tuning Volk Performance + +VOLK comes with a profiler that will build a config file for the best +SIMD architecture for your processor. Run volk_profile that is +installed into $PREFIX/bin. This program tests all known VOLK kernels +for each architecture supported by the processor. When finished, it +will write to $HOME/.volk/volk_config the best architecture for the +VOLK function. This file is read when using a function to know the +best version of the function to execute. + +\subsection volk_hand_tuning Hand-Tuning Performance + +If you know a particular architecture works best for your processor, +you can specify the particular architecture to use in the VOLK +preferences file: $HOME/.volk/volk_config + +The file looks like: + +\code + volk_<FUNCTION_NAME> <ARCHITECTURE> +\endcode + +Where the "FUNCTION_NAME" is the particular function that you want to +over-ride the default value and "ARCHITECTURE" is the VOLK SIMD +architecture to use (generic, sse, sse2, sse3, avx, etc.). For +example, the following config file tells VOLK to use SSE3 for the +aligned and unaligned versions of a function that multiplies two +complex streams together. + +\code + volk_32fc_x2_multiply_32fc_a sse3 + volk_32fc_x2_multiply_32fc_u sse3 +\endcode + +\b Tip: if benchmarking GNU Radio blocks, it can be useful to have a +volk_config file that sets all architectures to 'generic' as a way to +test the vectorized versus non-vectorized implementations. + +*/ |