author | Marc Lichtman <marcll@vt.edu> | 2018-11-02 00:53:52 -0400 |
---|---|---|
committer | Marc L <marcll@vt.edu> | 2019-06-07 12:45:14 -0400 |
commit | 73b074f121b0ab2ac38336916a60891c4d88d2cb (patch) | |
tree | de8f1782c897af3c278c224dcfbf96345c5c6f1d /docs/doxygen/other/volk_guide.dox | |
parent | b6f15c59e96aa83142c47aeacd64da793dd8ba31 (diff) |
docs: moved usage manual to wiki
docs: first snapshot of wiki's usage manual
Diffstat (limited to 'docs/doxygen/other/volk_guide.dox')
-rw-r--r-- | docs/doxygen/other/volk_guide.dox | 160 |
1 files changed, 0 insertions, 160 deletions
diff --git a/docs/doxygen/other/volk_guide.dox b/docs/doxygen/other/volk_guide.dox
deleted file mode 100644
index 1f8f128e4b..0000000000
--- a/docs/doxygen/other/volk_guide.dox
+++ /dev/null
@@ -1,160 +0,0 @@
-# Copyright (C) 2017 Free Software Foundation, Inc.
-#
-# Permission is granted to copy, distribute and/or modify this document
-# under the terms of the GNU Free Documentation License, Version 1.3
-# or any later version published by the Free Software Foundation;
-# with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
-# A copy of the license is included in the section entitled "GNU
-# Free Documentation License".
-
-/*! \page volk_guide Instructions for using VOLK in GNU Radio
-
-Note: Many blocks have already been converted to use VOLK in their calls, so
-they can also serve as examples. See the
-gr::blocks::complex_to_<type>.h files for examples of various blocks
-that make use of VOLK.
-
-\section volk_intro Introduction
-
-VOLK is the Vector-Optimized Library of Kernels. It is a library that
-contains kernels of hand-written SIMD code for different mathematical
-operations. Since each SIMD architecture can be greatly different and
-no compiler has yet come along to handle vectorization properly or
-highly efficiently, VOLK approaches the problem differently. For each
-architecture or platform that a developer wishes to vectorize for, a
-new proto-kernel is added to VOLK. At runtime, VOLK will select the
-correct proto-kernel. In this way, the users of VOLK call a kernel for
-performing the operation that is platform/architecture agnostic. This
-allows us to write portable SIMD code.
-
-VOLK kernels are always defined with a 'generic' proto-kernel, which
-is written in plain C. With the generic kernel, the kernel becomes
-portable to any platform. Kernels are then extended by adding
-proto-kernels for new platforms in which they are desired.
-
-A good example of a VOLK kernel with multiple proto-kernels defined is
-the volk_32f_s32f_multiply_32f_a. This kernel implements a scalar
-multiplication of a vector of floating point numbers (each item in the
-vector is multiplied by the same value). This kernel has the following
-proto-kernels that are defined for 'generic,' 'avx,' 'sse,' and 'neon'
-
-\code
- void volk_32f_s32f_multiply_32f_a_generic
- void volk_32f_s32f_multiply_32f_a_sse
- void volk_32f_s32f_multiply_32f_a_avx
- void volk_32f_s32f_multiply_32f_a_neon
-\endcode
-
-These proto-kernels means that on platforms with AVX support, VOLK can
-select this option or the SSE option, depending on which is faster. If
-all else fails, VOLK can fall back on the generic proto-kernel, which
-will always work.
-
-See <a
-href="http://libvolk.org</a> for details on the VOLK naming scheme.
-
-
-\section volk_alignment Setting and Using Memory Alignment Information
-
-For VOLK to work as best as possible, we want to use memory-aligned
-SIMD calls, which means we have to have some way of knowing and
-controlling the alignment of the buffers passed to gr_block's work
-function. We set the alignment requirement for SIMD aligned memory
-calls with:
-
-\code
- const int alignment_multiple =
- volk_get_alignment() / output_item_size;
- set_alignment(std::max(1,alignment_multiple));
-\endcode
-
-The VOLK function 'volk_get_alignment' provides the alignment of the
-the machine architecture. We then base the alignment on the number of
-output items required to maintain the alignment, so we divide the
-number of alignment bytes by the number of bytes in an output items
-(sizeof(float), sizeof(gr_complex), etc.). This value is then set per
-block with the 'set_alignment' function.
-
-Because the scheduler tries to optimize throughput, the number of
-items available per call to work will change and depends on the
-availability of the read and write buffers. This means that it
-sometimes cannot produce a buffer that is properly memory
-aligned. This is an inevitable consequence of the scheduler
-system. Instead of requiring alignment, the scheduler enforces the
-alignment as much as possible, and when a buffer becomes unaligned,
-the scheduler will work to correct it as much as possible. If a
-block's buffers are unaligned, then, the scheduler sets a flag to
-indicate as much so that the block can then decide what best to
-do. The next section discusses the use of the aligned/unaligned
-information in a gr_block's work function.
-
-
-\section volk_work Calling VOLK kernels in Work()
-
-The buffers passed to work/general_work in a gr_block are not
-guaranteed to be aligned, but they will mostly be aligned whenever
-possible. When not aligned, the 'is_unaligned()' flag will be set so
-the scheduler knows to try to realign the buffers. We actually make
-calls to the VOLK dispatcher, which is mainly designed to check the
-buffer alignments and call the correct version of the kernel for
-us. From the user-level view of VOLK, calling the dispatcher allows us
-to ignore the concept of aligned versus unaligned. This looks like:
-
-\code
-int
-gr_some_block::work (int noutput_items,
- gr_vector_const_void_star &input_items,
- gr_vector_void_star &output_items)
-{
- const float *in = (const float *) input_items[0];
- float *out = (float *) output_items[0];
-
- // Call the dispatcher to check alignment and call the _a or _u
- // version of the kernel.
- volk_32f_something_32f(out, in, noutput_items);
-
- return noutput_items;
-}
-\endcode
-
-
-
-\section volk_tuning Tuning VOLK Performance
-
-VOLK comes with a profiler that will build a config file for the best
-SIMD architecture for your processor. Run volk_profile that is
-installed into $PREFIX/bin. This program tests all known VOLK kernels
-for each architecture supported by the processor. When finished, it
-will write to $HOME/.volk/volk_config the best architecture for the
-VOLK function. This file is read when using a function to know the
-best version of the function to execute.
-
-\subsection volk_hand_tuning Hand-Tuning Performance
-
-If you know a particular architecture works best for your processor,
-you can specify the particular architecture to use in the VOLK
-preferences file: $HOME/.volk/volk_config
-
-The file looks like:
-
-\code
- volk_<FUNCTION_NAME> <ARCHITECTURE>
-\endcode
-
-Where the "FUNCTION_NAME" is the particular function that you want to
-over-ride the default value and "ARCHITECTURE" is the VOLK SIMD
-architecture to use (generic, sse, sse2, sse3, avx, etc.). For
-example, the following config file tells VOLK to use SSE3 for the
-aligned and unaligned versions of a function that multiplies two
-complex streams together.
-
-\code
- volk_32fc_x2_multiply_32fc_a sse3
- volk_32fc_x2_multiply_32fc_u sse3
-\endcode
-
-\b Tip: if benchmarking GNU Radio blocks, it can be useful to have a
-volk_config file that sets all architectures to 'generic' as a way to
-test the vectorized versus non-vectorized implementations.
-
-*/