author | Tom Rondeau <trondeau@vt.edu> | 2012-09-03 14:09:06 -0400 |
---|---|---|
committer | Tom Rondeau <trondeau@vt.edu> | 2012-09-03 14:09:06 -0400 |
commit | 743745f0fbb8c31b919bada25dab91dfeab5e129 | |
tree | 7f3e2a208cfdcbb04739c682536c990fe259d24f /docs/doxygen/other/volk_guide.dox | |
parent | c93ed1c9eb01027aee04f567b7541738b2aa8bda | |
docs: reworking doc into one extra dox file for easier linking.
build_guide was not being found properly. Putting these together fixes that.
Diffstat (limited to 'docs/doxygen/other/volk_guide.dox')
-rw-r--r-- | docs/doxygen/other/volk_guide.dox | 161 |
1 files changed, 0 insertions, 161 deletions
diff --git a/docs/doxygen/other/volk_guide.dox b/docs/doxygen/other/volk_guide.dox
deleted file mode 100644
index 0e444ebbaf..0000000000
--- a/docs/doxygen/other/volk_guide.dox
+++ /dev/null
@@ -1,161 +0,0 @@
-/*! \page volk_guide Instructions for using Volk in GNU Radio
-
-\section volk_intro Introduction
-
-Volk is the Vector-Optimized Library of Kernels. It is a library that
-contains kernels of hand-written SIMD code for different mathematical
-operations. Since each SIMD architecture can be greatly different and
-no compiler has yet come along to handle vectorization properly or
-highly efficiently, Volk approaches the problem differently. For each
-architecture or platform that a developer wishes to vectorize for, a
-new proto-kernel is added to Volk. At runtime, Volk will select the
-correct proto-kernel. In this way, the users of Volk call a kernel for
-performing the operation that is platform/architecture agnostic. This
-allows us to write portable SIMD code.
-
-Volk kernels are always defined with a 'generic' proto-kernel, which
-is written in plain C. With the generic kernel, the kernel becomes
-portable to any platform. Kernels are then extended by adding
-proto-kernels for new platforms in which they are desired.
-
-A good example of a Volk kernel with multiple proto-kernels defined is
-the volk_32f_s32f_multiply_32f_a. This kernel implements a scalar
-multiplication of a vector of floating point numbers (each item in the
-vector is multiplied by the same value). This kernel has the following
-proto-kernels that are defined for 'generic,' 'avx,' 'sse,' and 'orc.'
-
-\code
-  void volk_32f_s32f_multiply_32f_a_generic
-  void volk_32f_s32f_multiply_32f_a_sse
-  void volk_32f_s32f_multiply_32f_a_avx
-  void volk_32f_s32f_multiply_32f_a_orc
-\endcode
-
-These proto-kernels means that on platforms with AVX support, Volk can
-select this option or the SSE option, depending on which is faster. On
-other platforms, the ORC SIMD compiler might provide a solution. If
-all else fails, Volk can fall back on the generic proto-kernel, which
-will always work.
-
-Just a note on ORC. ORC is a SIMD compiler library that uses a generic
-assembly-like language for SIMD commands. Based on the available SIMD
-architecture of a system, it will try and compile a good
-solution. Tests show that the results of ORC proto-kernels are
-generally better than the generic versions but often not as good as
-the hand-tuned proto-kernels for a specific SIMD architecture. This
-is, of course, to be expected, and ORC provides a nice intermediary
-step to performance improvements until a specific hand-tuned
-proto-kernel can be made for a given platform.
-
-See <a
-href="http://gnuradio.org/redmine/projects/gnuradio/wiki/Volk">Volk on
-gnuradio.org</a> for details on the Volk naming scheme.
-
-
-\section volk_alignment Setting and Using Memory Alignment Information
-
-For Volk to work as best as possible, we want to use memory-aligned
-SIMD calls, which means we have to have some way of knowing and
-controlling the alignment of the buffers passed to gr_block's work
-function. We set the alignment requirement for SIMD aligned memory
-calls with:
-
-\code
-  const int alignment_multiple =
-    volk_get_alignment() / output_item_size;
-  set_alignment(std::max(1,alignment_multiple));
-\endcode
-
-The Volk function 'volk_get_alignment' provides the alignment of the
-the machine architecture. We then base the alignment on the number of
-output items required to maintain the alignment, so we divide the
-number of alignment bytes by the number of bytes in an output items
-(sizeof(float), sizeof(gr_complex), etc.). This value is then set per
-block with the 'set_alignment' function.
-
-Because the scheduler tries to optimize throughput, the number of
-items available per call to work will change and depends on the
-availability of the read and write buffers. This means that it
-sometimes cannot produce a buffer that is properly memory
-aligned. This is an inevitable consequence of the scheduler
-system. Instead of requiring alignment, the scheduler enforces the
-alignment as much as possible, and when a buffer becomes unaligned,
-the scheduler will work to correct it as much as possible. If a
-block's buffers are unaligned, then, the scheduler sets a flag to
-indicate as much so that the block can then decide what best to
-do. The next section discusses the use of the aligned/unaligned
-information in a gr_block's work function.
-
-
-\section volk_work Using Alignment Properties in Work()
-
-The buffers passed to work/general_work in a gr_block are not
-guaranteed to be aligned, but they will mostly be aligned whenever
-possible. When not aligned, the 'is_unaligned()' flag will be set. So
-a block can know if its buffers are aligned and make the right
-decisions. This looks like:
-
-\code
-int
-gr_some_block::work (int noutput_items,
-                     gr_vector_const_void_star &input_items,
-                     gr_vector_void_star &output_items)
-{
-  const float *in = (const float *) input_items[0];
-  float *out = (float *) output_items[0];
-
-  if(is_unaligned()) {
-    // do something with unaligned data. This can either be a manual
-    // handling of the items or a call to an unaligned Volk function.
-    volk_32f_something_32f_u(out, in, noutput_items);
-  }
-  else {
-    // Buffers are aligned; can call the aligned Volk function.
-    volk_32f_something_32f_a(out, in, noutput_items);
-  }
-
-  return noutput_items;
-}
-\endcode
-
-
-
-\section volk_tuning Tuning Volk Performance
-
-VOLK comes with a profiler that will build a config file for the best
-SIMD architecture for your processor. Run volk_profile that is
-installed into $PREFIX/bin. This program tests all known VOLK kernels
-for each architecture supported by the processor. When finished, it
-will write to $HOME/.volk/volk_config the best architecture for the
-VOLK function. This file is read when using a function to know the
-best version of the function to execute.
-
-\subsection volk_hand_tuning Hand-Tuning Performance
-
-If you know a particular architecture works best for your processor,
-you can specify the particular architecture to use in the VOLK
-preferences file: $HOME/.volk/volk_config
-
-The file looks like:
-
-\code
-  volk_<FUNCTION_NAME> <ARCHITECTURE>
-\endcode
-
-Where the "FUNCTION_NAME" is the particular function that you want to
-over-ride the default value and "ARCHITECTURE" is the VOLK SIMD
-architecture to use (generic, sse, sse2, sse3, avx, etc.). For
-example, the following config file tells VOLK to use SSE3 for the
-aligned and unaligned versions of a function that multiplies two
-complex streams together.
-
-\code
-  volk_32fc_x2_multiply_32fc_a sse3
-  volk_32fc_x2_multiply_32fc_u sse3
-\endcode
-
-\b Tip: if benchmarking GNU Radio blocks, it can be useful to have a
-volk_config file that sets all architectures to 'generic' as a way to
-test the vectorized versus non-vectorized implementations.
-
-*/
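Note on the removed volk_alignment section: the alignment arithmetic it describes (alignment bytes divided by the item size, clamped to at least one item) can be tried outside GNU Radio. The following is a minimal sketch only. The helper name alignment_multiple and its parameters are hypothetical and are not part of Volk or GNU Radio; in a real block the byte count would come from volk_get_alignment() and the result would be passed to set_alignment(), as in the removed guide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not a Volk or GNU Radio function: returns how many
// items of item_size bytes make up one alignment boundary of alignment_bytes.
// The result is clamped to 1 so that items larger than the alignment still
// produce a usable value, mirroring std::max(1, alignment_multiple) in the
// removed guide.
inline int alignment_multiple(std::size_t alignment_bytes, std::size_t item_size)
{
  assert(item_size > 0); // an item must occupy at least one byte
  const int multiple = static_cast<int>(alignment_bytes / item_size);
  return std::max(1, multiple);
}
```

Assuming a 16-byte alignment, alignment_multiple(16, sizeof(float)) gives 4 on platforms where float is 4 bytes, and an item of 32 bytes gives 1 because the integer division yields 0 and the clamp applies.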