diff options
Diffstat (limited to 'docs/doxygen/other/volk_guide.dox')
-rw-r--r-- | docs/doxygen/other/volk_guide.dox | 152 |
1 files changed, 152 insertions, 0 deletions
diff --git a/docs/doxygen/other/volk_guide.dox b/docs/doxygen/other/volk_guide.dox new file mode 100644 index 0000000000..a6a24457a6 --- /dev/null +++ b/docs/doxygen/other/volk_guide.dox @@ -0,0 +1,152 @@ +/*! \page volk_guide Instructions for using VOLK in GNU Radio + +Note: Many blocks have already been converted to use VOLK in their calls, so +they can also serve as examples. See the +gr::blocks::complex_to_<type>.h files for examples of various blocks +that make use of VOLK. + +\section volk_intro Introduction + +VOLK is the Vector-Optimized Library of Kernels. It is a library that +contains kernels of hand-written SIMD code for different mathematical +operations. Since each SIMD architecture can be greatly different and +no compiler has yet come along to handle vectorization properly or +highly efficiently, VOLK approaches the problem differently. For each +architecture or platform that a developer wishes to vectorize for, a +new proto-kernel is added to VOLK. At runtime, VOLK will select the +correct proto-kernel. In this way, the users of VOLK call a kernel for +performing the operation that is platform/architecture agnostic. This +allows us to write portable SIMD code. + +VOLK kernels are always defined with a 'generic' proto-kernel, which +is written in plain C. With the generic kernel, the kernel becomes +portable to any platform. Kernels are then extended by adding +proto-kernels for new platforms in which they are desired. + +A good example of a VOLK kernel with multiple proto-kernels defined is +the volk_32f_s32f_multiply_32f_a. This kernel implements a scalar +multiplication of a vector of floating point numbers (each item in the +vector is multiplied by the same value). This kernel has the following +proto-kernels that are defined for 'generic,' 'avx,' 'sse,' and 'neon' + +\code + void volk_32f_s32f_multiply_32f_a_generic + void volk_32f_s32f_multiply_32f_a_sse + void volk_32f_s32f_multiply_32f_a_avx + void volk_32f_s32f_multiply_32f_a_neon +\endcode + +These proto-kernels means that on platforms with AVX support, VOLK can +select this option or the SSE option, depending on which is faster. If +all else fails, VOLK can fall back on the generic proto-kernel, which +will always work. + +See <a +href="http://gnuradio.org/redmine/projects/gnuradio/wiki/Volk">VOLK on +gnuradio.org</a> for details on the VOLK naming scheme. + + +\section volk_alignment Setting and Using Memory Alignment Information + +For VOLK to work as best as possible, we want to use memory-aligned +SIMD calls, which means we have to have some way of knowing and +controlling the alignment of the buffers passed to gr_block's work +function. We set the alignment requirement for SIMD aligned memory +calls with: + +\code + const int alignment_multiple = + volk_get_alignment() / output_item_size; + set_alignment(std::max(1,alignment_multiple)); +\endcode + +The VOLK function 'volk_get_alignment' provides the alignment of the +the machine architecture. We then base the alignment on the number of +output items required to maintain the alignment, so we divide the +number of alignment bytes by the number of bytes in an output items +(sizeof(float), sizeof(gr_complex), etc.). This value is then set per +block with the 'set_alignment' function. + +Because the scheduler tries to optimize throughput, the number of +items available per call to work will change and depends on the +availability of the read and write buffers. This means that it +sometimes cannot produce a buffer that is properly memory +aligned. This is an inevitable consequence of the scheduler +system. Instead of requiring alignment, the scheduler enforces the +alignment as much as possible, and when a buffer becomes unaligned, +the scheduler will work to correct it as much as possible. If a +block's buffers are unaligned, then, the scheduler sets a flag to +indicate as much so that the block can then decide what best to +do. The next section discusses the use of the aligned/unaligned +information in a gr_block's work function. + + +\section volk_work Calling VOLK kernels in Work() + +The buffers passed to work/general_work in a gr_block are not +guaranteed to be aligned, but they will mostly be aligned whenever +possible. When not aligned, the 'is_unaligned()' flag will be set so +the scheduler knows to try to realign the buffers. We actually make +calls to the VOLK dispatcher, which is mainly designed to check the +buffer alignments and call the correct version of the kernel for +us. From the user-level view of VOLK, calling the dispatcher allows us +to ignore the concept of aligned versus unaligned. This looks like: + +\code +int +gr_some_block::work (int noutput_items, + gr_vector_const_void_star &input_items, + gr_vector_void_star &output_items) +{ + const float *in = (const float *) input_items[0]; + float *out = (float *) output_items[0]; + + // Call the dispatcher to check alignment and call the _a or _u + // version of the kernel. + volk_32f_something_32f(out, in, noutput_items); + + return noutput_items; +} +\endcode + + + +\section volk_tuning Tuning VOLK Performance + +VOLK comes with a profiler that will build a config file for the best +SIMD architecture for your processor. Run volk_profile that is +installed into $PREFIX/bin. This program tests all known VOLK kernels +for each architecture supported by the processor. When finished, it +will write to $HOME/.volk/volk_config the best architecture for the +VOLK function. This file is read when using a function to know the +best version of the function to execute. + +\subsection volk_hand_tuning Hand-Tuning Performance + +If you know a particular architecture works best for your processor, +you can specify the particular architecture to use in the VOLK +preferences file: $HOME/.volk/volk_config + +The file looks like: + +\code + volk_<FUNCTION_NAME> <ARCHITECTURE> +\endcode + +Where the "FUNCTION_NAME" is the particular function that you want to +over-ride the default value and "ARCHITECTURE" is the VOLK SIMD +architecture to use (generic, sse, sse2, sse3, avx, etc.). For +example, the following config file tells VOLK to use SSE3 for the +aligned and unaligned versions of a function that multiplies two +complex streams together. + +\code + volk_32fc_x2_multiply_32fc_a sse3 + volk_32fc_x2_multiply_32fc_u sse3 +\endcode + +\b Tip: if benchmarking GNU Radio blocks, it can be useful to have a +volk_config file that sets all architectures to 'generic' as a way to +test the vectorized versus non-vectorized implementations. + +*/ |