author: Tom Rondeau <trondeau@vt.edu> 2012-09-03 14:09:06 -0400
committer: Tom Rondeau <trondeau@vt.edu> 2012-09-03 14:09:06 -0400
commit: 743745f0fbb8c31b919bada25dab91dfeab5e129
tree: 7f3e2a208cfdcbb04739c682536c990fe259d24f /docs/doxygen/other/volk_guide.dox
parent: c93ed1c9eb01027aee04f567b7541738b2aa8bda
docs: reworking doc into one extra dox file for easier linking.
build_guide was not being found properly. Putting these together fixes that.
Diffstat (limited to 'docs/doxygen/other/volk_guide.dox')
-rw-r--r-- docs/doxygen/other/volk_guide.dox | 161
1 files changed, 0 insertions, 161 deletions
diff --git a/docs/doxygen/other/volk_guide.dox b/docs/doxygen/other/volk_guide.dox
deleted file mode 100644
index 0e444ebbaf..0000000000
--- a/docs/doxygen/other/volk_guide.dox
+++ /dev/null
@@ -1,161 +0,0 @@
-/*! \page volk_guide Instructions for using Volk in GNU Radio
-
-\section volk_intro Introduction
-
-Volk is the Vector-Optimized Library of Kernels. It is a library that
-contains kernels of hand-written SIMD code for different mathematical
-operations. Since each SIMD architecture can be greatly different and
-no compiler has yet come along to handle vectorization properly or
-highly efficiently, Volk approaches the problem differently. For each
-architecture or platform that a developer wishes to vectorize for, a
-new proto-kernel is added to Volk. At runtime, Volk will select the
-correct proto-kernel. In this way, users of Volk call a single kernel
-that is platform/architecture agnostic to perform the operation. This
-allows us to write portable SIMD code.
-
-Volk kernels are always defined with a 'generic' proto-kernel, which
-is written in plain C. With the generic kernel, the kernel becomes
-portable to any platform. Kernels are then extended by adding
-proto-kernels for new platforms in which they are desired.
-
-A good example of a Volk kernel with multiple proto-kernels defined is
-the volk_32f_s32f_multiply_32f_a. This kernel implements a scalar
-multiplication of a vector of floating point numbers (each item in the
-vector is multiplied by the same value). This kernel has proto-kernels
-defined for 'generic', 'avx', 'sse', and 'orc':
-
-\code
- void volk_32f_s32f_multiply_32f_a_generic
- void volk_32f_s32f_multiply_32f_a_sse
- void volk_32f_s32f_multiply_32f_a_avx
- void volk_32f_s32f_multiply_32f_a_orc
-\endcode
-
-These proto-kernels mean that on platforms with AVX support, Volk can
-select the AVX option or the SSE option, depending on which is faster. On
-other platforms, the ORC SIMD compiler might provide a solution. If
-all else fails, Volk can fall back on the generic proto-kernel, which
-will always work.
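The fallback order described above can be sketched as a small dispatch table. This is only an illustration, not Volk's actual dispatcher: the `proto_kernel` struct, the `multiply_fn` signature, and `select_proto_kernel` are hypothetical names, and the selection uses a fixed preference order rather than the "whichever is faster" choice the text describes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical signature shared by every proto-kernel of one kernel.
using multiply_fn = void (*)(float* out, const float* in, float scalar, std::size_t n);

// Plain C++ loop; stands in for the 'generic' proto-kernel that always works.
static void multiply_generic(float* out, const float* in, float scalar, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * scalar;
}

// Hypothetical table entry: an architecture name, whether the current
// machine supports it, and the function implementing it.
struct proto_kernel {
    std::string arch;
    bool supported;
    multiply_fn fn;
};

// Return the first supported entry. The table is assumed to be ordered from
// most to least preferred, with the always-available generic entry last.
static const proto_kernel& select_proto_kernel(const std::vector<proto_kernel>& table)
{
    for (const auto& pk : table)
        if (pk.supported)
            return pk;
    return table.back();
}
```

With a table ordered as avx, sse, generic on a machine that lacks AVX, the sketch selects the sse entry. If nothing else is supported, it returns the generic entry.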
-
-A note on ORC: ORC is a SIMD compiler library that uses a generic
-assembly-like language for SIMD commands. Based on the available SIMD
-architecture of a system, it will try to compile a good
-solution. Tests show that the results of ORC proto-kernels are
-generally better than the generic versions but often not as good as
-the hand-tuned proto-kernels for a specific SIMD architecture. This
-is, of course, to be expected, and ORC provides a nice intermediary
-step to performance improvements until a specific hand-tuned
-proto-kernel can be made for a given platform.
-
-See <a
-href="http://gnuradio.org/redmine/projects/gnuradio/wiki/Volk">Volk on
-gnuradio.org</a> for details on the Volk naming scheme.
-
-
-\section volk_alignment Setting and Using Memory Alignment Information
-
-For Volk to work as well as possible, we want to use memory-aligned
-SIMD calls, which means we have to have some way of knowing and
-controlling the alignment of the buffers passed to gr_block's work
-function. We set the alignment requirement for SIMD aligned memory
-calls with:
-
-\code
- const int alignment_multiple =
- volk_get_alignment() / output_item_size;
- set_alignment(std::max(1,alignment_multiple));
-\endcode
-
-The Volk function 'volk_get_alignment' provides the alignment of the
-machine architecture in bytes. We then base the alignment on the number of
-output items required to maintain the alignment, so we divide the
-number of alignment bytes by the number of bytes in an output item
-(sizeof(float), sizeof(gr_complex), etc.). This value is then set per
-block with the 'set_alignment' function.
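The division described above can be checked in isolation with a minimal sketch. The helper name `alignment_multiple` is hypothetical, and its `alignment_bytes` argument stands in for the value that `volk_get_alignment()` would return on a given machine.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: how many output items make up one alignment boundary.
// alignment_bytes is a stand-in for the result of volk_get_alignment().
static int alignment_multiple(std::size_t alignment_bytes, std::size_t item_size)
{
    // Integer division; yields 0 when one item is larger than the alignment.
    const int multiple = static_cast<int>(alignment_bytes / item_size);
    // Clamp to 1 so that the block never requests a multiple of zero items.
    return std::max(1, multiple);
}
```

For example, assuming a 16-byte alignment and 4-byte float items, the multiple is 4. Items larger than the alignment give 0 from the division, which is why the original snippet wraps the result in `std::max(1, ...)`.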
-
-Because the scheduler tries to optimize throughput, the number of
-items available per call to work will change and depends on the
-availability of the read and write buffers. This means that it
-sometimes cannot produce a buffer that is properly memory
-aligned. This is an inevitable consequence of the scheduler
-system. Instead of requiring alignment, the scheduler enforces
-alignment whenever it can, and when a buffer becomes unaligned,
-the scheduler works to realign it. If a
-block's buffers are unaligned, the scheduler sets a flag to
-indicate this so that the block can decide how best to
-proceed. The next section discusses the use of the aligned/unaligned
-information in a gr_block's work function.
-
-
-\section volk_work Using Alignment Properties in Work()
-
-The buffers passed to work/general_work in a gr_block are not
-guaranteed to be aligned, but they will be aligned whenever
-possible. When they are not aligned, the 'is_unaligned()' flag will be set,
-so a block can tell whether its buffers are aligned and choose the
-appropriate code path. This looks like:
-
-\code
-int
-gr_some_block::work (int noutput_items,
- gr_vector_const_void_star &input_items,
- gr_vector_void_star &output_items)
-{
- const float *in = (const float *) input_items[0];
- float *out = (float *) output_items[0];
-
- if(is_unaligned()) {
- // do something with unaligned data. This can either be a manual
- // handling of the items or a call to an unaligned Volk function.
- volk_32f_something_32f_u(out, in, noutput_items);
- }
- else {
- // Buffers are aligned; can call the aligned Volk function.
- volk_32f_something_32f_a(out, in, noutput_items);
- }
-
- return noutput_items;
-}
-\endcode
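To make concrete what "aligned" means for the buffers in the example above, here is a minimal sketch of an address check. The helper `is_aligned_to` is hypothetical and is not how the scheduler sets the `is_unaligned()` flag. It only shows why a buffer that starts a few items past an aligned address no longer qualifies.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when the address of ptr is a multiple of
// alignment_bytes (for example, the value reported by volk_get_alignment()).
static bool is_aligned_to(const void* ptr, std::size_t alignment_bytes)
{
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment_bytes == 0;
}
```

With a float buffer that starts on a 16-byte boundary, the pointer to item 0 passes the check and the pointer to item 1 (4 bytes further on) does not. The pointer to item 4 passes again.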
-
-
-
-\section volk_tuning Tuning Volk Performance
-
-VOLK comes with a profiler that will build a config file for the best
-SIMD architecture for your processor. Run volk_profile, which is
-installed into $PREFIX/bin. This program tests all known VOLK kernels
-for each architecture supported by the processor. When finished, it
-writes the best architecture for each VOLK function to
-$HOME/.volk/volk_config. This file is read when a function is used to
-determine the best version of that function to execute.
-
-\subsection volk_hand_tuning Hand-Tuning Performance
-
-If you know a particular architecture works best for your processor,
-you can specify the particular architecture to use in the VOLK
-preferences file: $HOME/.volk/volk_config
-
-The file looks like:
-
-\code
- volk_<FUNCTION_NAME> <ARCHITECTURE>
-\endcode
-
-where "FUNCTION_NAME" is the function whose default you want to
-override and "ARCHITECTURE" is the VOLK SIMD
-architecture to use (generic, sse, sse2, sse3, avx, etc.). For
-example, the following config file tells VOLK to use SSE3 for the
-aligned and unaligned versions of a function that multiplies two
-complex streams together.
-
-\code
- volk_32fc_x2_multiply_32fc_a sse3
- volk_32fc_x2_multiply_32fc_u sse3
-\endcode
-
-\b Tip: if benchmarking GNU Radio blocks, it can be useful to have a
-volk_config file that sets all architectures to 'generic' as a way to
-test the vectorized versus non-vectorized implementations.
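The all-'generic' config file mentioned in the tip could be generated with a few lines of C++. This is a sketch under the assumption that the file contains one "function architecture" pair per line, as in the examples above. The helper `make_volk_config` is hypothetical and not part of VOLK, and the caller is assumed to supply the list of function names.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: build the text of a volk_config file that maps every
// listed function name to the same architecture, one pair per line.
static std::string make_volk_config(const std::vector<std::string>& functions,
                                    const std::string& arch)
{
    std::ostringstream out;
    for (const auto& name : functions)
        out << name << ' ' << arch << '\n';
    return out.str();
}
```

Passing the two multiply functions from the example above together with "generic" yields the same two lines with sse3 replaced by generic. The resulting text would then be written to $HOME/.volk/volk_config.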
-
-*/