author Tom Rondeau 2012-02-11 12:27:45 -0500
committer Tom Rondeau 2012-02-13 14:57:28 -0500
commit 4589b6d6f062e92fd84965eaf47d3fc30bdf516e
tree ee9121eb1f7e1ac6b5c639db1402c573b3c6230e
parent f671319ca9ccef8fb1590e676ff6bcb85d7ca5a1
volk: added some documentation to the Doxygen manual explaining Volk and how to use it.
-rw-r--r-- docs/doxygen/other/main_page.dox 10
-rw-r--r-- docs/doxygen/other/volk_guide.dox 161
2 files changed, 171 insertions, 0 deletions
diff --git a/docs/doxygen/other/main_page.dox b/docs/doxygen/other/main_page.dox
index 0caa0b20f..68b098943 100644
--- a/docs/doxygen/other/main_page.dox
+++ b/docs/doxygen/other/main_page.dox
@@ -38,4 +38,14 @@ More details on packages in GNU Radio:
\li \ref page_uhd
\li \ref page_vocoder
\li \ref page_pfb
+
+\section volk_main Using Volk in GNU Radio
+
+The \ref volk_guide page provides an overview of how to incorporate
+and use Volk in GNU Radio blocks.
+
+Many blocks have already been converted to use Volk in their calls, so
+they can also serve as examples. See the gr_complex_to_xxx.h file for
+examples of various blocks that make use of Volk.
+
*/
diff --git a/docs/doxygen/other/volk_guide.dox b/docs/doxygen/other/volk_guide.dox
new file mode 100644
index 000000000..d898f3864
--- /dev/null
+++ b/docs/doxygen/other/volk_guide.dox
@@ -0,0 +1,161 @@
+/*! \page volk_guide Instructions for using Volk in GNU Radio
+
+\section volk_intro Introduction
+
+Volk is the Vector-Optimized Library of Kernels. It is a library that
+contains kernels of hand-written SIMD code for different mathematical
+operations. Since each SIMD architecture can be greatly different and
+no compiler has yet come along to handle vectorization properly or
+highly efficiently, Volk approaches the problem differently. For each
+architecture or platform that a developer wishes to vectorize for, a
+new proto-kernel is added to Volk. At runtime, Volk will select the
+correct proto-kernel. In this way, the users of Volk call a kernel for
+performing the operation that is platform/architecture agnostic. This
+allows us to write portable SIMD code.
+
+Volk kernels are always defined with a 'generic' proto-kernel, which
+is written in plain C. With the generic kernel, the kernel becomes
+portable to any platform. Kernels are then extended by adding
+proto-kernels for new platforms in which they are desired.
+
+A good example of a Volk kernel with multiple proto-kernels defined is
+volk_32f_s32f_multiply_32f_a. This kernel implements a scalar
+multiplication of a vector of floating point numbers (each item in the
+vector is multiplied by the same value). It has proto-kernels
+defined for 'generic,' 'avx,' 'sse,' and 'orc':
+
+\code
+ void volk_32f_s32f_multiply_32f_a_generic
+ void volk_32f_s32f_multiply_32f_a_sse
+ void volk_32f_s32f_multiply_32f_a_avx
+ void volk_32f_s32f_multiply_32f_a_orc
+\endcode
+
+These proto-kernels mean that on platforms with AVX support, Volk can
+select the AVX or the SSE proto-kernel, depending on which is faster. On
+other platforms, the ORC SIMD compiler might provide a solution. If
+all else fails, Volk can fall back on the generic proto-kernel, which
+will always work.
+
+Just a note on ORC. ORC is a SIMD compiler library that uses a generic
+assembly-like language for SIMD commands. Based on the available SIMD
+architecture of a system, it will try to compile a good
+solution. Tests show that the results of ORC proto-kernels are
+generally better than the generic versions but often not as good as
+the hand-tuned proto-kernels for a specific SIMD architecture. This
+is, of course, to be expected, and ORC provides a nice intermediary
+step to performance improvements until a specific hand-tuned
+proto-kernel can be made for a given platform.
+
+See <a
+href="http://gnuradio.org/redmine/projects/gnuradio/wiki/Volk">Volk on
+gnuradio.org</a> for details on the Volk naming scheme.
+
+
+\section volk_alignment Setting and Using Memory Alignment Information
+
+For Volk to work as well as possible, we want to use memory-aligned
+SIMD calls, which means we have to have some way of knowing and
+controlling the alignment of the buffers passed to gr_block's work
+function. We set the alignment requirement for SIMD aligned memory
+calls with:
+
+\code
+ const int alignment_multiple =
+ volk_get_alignment() / output_item_size;
+ set_alignment(alignment_multiple);
+\endcode
+
+The Volk function 'volk_get_alignment' provides the alignment of the
+machine architecture. We then base the alignment on the number of
+output items required to maintain the alignment, so we divide the
+number of alignment bytes by the number of bytes in an output item
+(sizeof(float), sizeof(gr_complex), etc.). This value is then set per
+block with the 'set_alignment' function.
+
+Because the scheduler tries to optimize throughput, the number of
+items available per call to work will change and depends on the
+availability of the read and write buffers. This means that it
+sometimes cannot produce a buffer that is properly memory
+aligned. This is an inevitable consequence of the scheduler
+system. Instead of requiring alignment, the scheduler enforces the
+alignment as much as possible, and when a buffer becomes unaligned,
+the scheduler will work to correct it as much as possible. If a
+block's buffers are unaligned, the scheduler sets a flag to indicate
+as much, so that the block can decide what best to do. The next
+section discusses the use of the aligned/unaligned
+information in a gr_block's work function.
+
+
+\section volk_work Using Alignment Properties in Work()
+
+The buffers passed to work/general_work in a gr_block are not
+guaranteed to be aligned, but they will mostly be aligned whenever
+possible. When not aligned, the 'is_unaligned()' flag will be set, so
+a block can tell whether its buffers are aligned and choose the
+appropriate code path. This looks like:
+
+\code
+int
+gr_some_block::work (int noutput_items,
+ gr_vector_const_void_star &input_items,
+ gr_vector_void_star &output_items)
+{
+ const float *in = (const float *) input_items[0];
+ float *out = (float *) output_items[0];
+
+ if(is_unaligned()) {
+ // do something with unaligned data. This can either be a manual
+ // handling of the items or a call to an unaligned Volk function.
+ volk_32f_something_32f_u(out, in, noutput_items);
+ }
+ else {
+ // Buffers are aligned; can call the aligned Volk function.
+ volk_32f_something_32f_a(out, in, noutput_items);
+ }
+
+ return noutput_items;
+}
+\endcode
+
+
+
+\section volk_tuning Tuning Volk Performance
+
+VOLK comes with a profiler that will build a config file for the best
+SIMD architecture for your processor. Run volk_profile, which is
+installed into $PREFIX/bin. This program tests all known VOLK kernels
+for each architecture supported by the processor. When finished, it
+writes the best architecture for each VOLK function to
+$HOME/.volk/volk_config. This file is read when a function is used to
+determine the best version of the function to execute.
+
+\subsection volk_hand_tuning Hand-Tuning Performance
+
+If you know a particular architecture works best for your processor,
+you can specify the particular architecture to use in the VOLK
+preferences file: $HOME/.volk/volk_config
+
+Each line of the file has the form:
+
+\code
+ volk_<FUNCTION_NAME> <ARCHITECTURE>
+\endcode
+
+Here "FUNCTION_NAME" is the function whose default you want to
+override, and "ARCHITECTURE" is the VOLK SIMD
+architecture to use (generic, sse, sse2, sse3, avx, etc.). For
+example, the following config file tells VOLK to use SSE3 for the
+aligned and unaligned versions of a function that multiplies two
+complex streams together.
+
+\code
+ volk_32fc_x2_multiply_32fc_a sse3
+ volk_32fc_x2_multiply_32fc_u sse3
+\endcode
+
+\b Tip: if benchmarking GNU Radio blocks, it can be useful to have a
+volk_config file that sets all architectures to 'generic' as a way to
+test the vectorized versus non-vectorized implementations.
+
+*/
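
Supplementary sketch (not part of the commit above). The guide's alignment arithmetic and its aligned/unaligned decision can be tried without GNU Radio or Volk installed. The following is a minimal C++ sketch under stated assumptions: the 16-byte alignment used in the usage note is an assumed stand-in for whatever volk_get_alignment() returns on a given machine, and `alignment_multiple`, `is_aligned`, `choose_path`, and `Path` are hypothetical helpers written for illustration, not Volk or gr_block APIs. The clamp to a minimum of 1 is an addition of this sketch; the guide's snippet performs the plain division only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of output items per alignment unit, mirroring the guide's
// volk_get_alignment() / output_item_size. The clamp to 1 is an addition
// of this sketch so that items larger than the alignment do not yield 0.
static int alignment_multiple(std::size_t alignment_bytes, std::size_t item_size)
{
    assert(item_size > 0);
    const int multiple = static_cast<int>(alignment_bytes / item_size);
    return multiple > 0 ? multiple : 1;
}

// Hand-rolled address check: true when ptr is a multiple of alignment_bytes.
// In a real block the scheduler's is_unaligned() flag plays this role.
static bool is_aligned(const void* ptr, std::size_t alignment_bytes)
{
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment_bytes == 0;
}

// Which kind of kernel a work function would call (hypothetical enum).
enum class Path { Aligned, Unaligned };

// Pick the aligned path only when both buffers satisfy the alignment,
// matching the if/else structure of the work() example in the guide.
static Path choose_path(const void* in, const void* out, std::size_t alignment_bytes)
{
    return (is_aligned(in, alignment_bytes) && is_aligned(out, alignment_bytes))
               ? Path::Aligned
               : Path::Unaligned;
}
```

With an assumed 16-byte alignment and 4-byte floats, `alignment_multiple(16, sizeof(float))` gives 4 items. A float buffer declared with `alignas(16)` is aligned at element 0 but not at element 1, so a call starting at an odd offset would take the unaligned path.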