GNU Radio 3.7.1 C++ API
Instructions for using Volk in GNU Radio

Introduction

Volk is the Vector-Optimized Library of Kernels. It is a library of kernels of hand-written SIMD code for different mathematical operations. Because SIMD architectures can differ greatly from one another and no compiler yet handles vectorization properly or efficiently, Volk approaches the problem differently. For each architecture or platform that a developer wishes to vectorize for, a new proto-kernel is added to Volk. At runtime, Volk selects the correct proto-kernel. In this way, users of Volk call a single kernel that is platform/architecture agnostic to perform the operation. This allows us to write portable SIMD code.

Volk kernels are always defined with a 'generic' proto-kernel, which is written in plain C. The generic proto-kernel makes the kernel portable to any platform. Kernels are then extended by adding proto-kernels for each new platform on which an optimized version is desired.

A good example of a Volk kernel with multiple proto-kernels is volk_32f_s32f_multiply_32f_a. This kernel implements a scalar multiplication of a vector of floating point numbers (each item in the vector is multiplied by the same value). It has proto-kernels defined for 'generic,' 'avx,' 'sse,' and 'orc':

    void volk_32f_s32f_multiply_32f_a_generic
    void volk_32f_s32f_multiply_32f_a_sse
    void volk_32f_s32f_multiply_32f_a_avx
    void volk_32f_s32f_multiply_32f_a_orc

These proto-kernels mean that on platforms with AVX support, Volk can select either the AVX option or the SSE option, depending on which is faster. On other platforms, the ORC SIMD compiler might provide a solution. If all else fails, Volk can fall back on the generic proto-kernel, which will always work.
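The selection idea can be illustrated with a small, self-contained sketch. It is not Volk's actual dispatcher: the names `multiply_generic`, `multiply_unrolled4`, and `select_multiply` are hypothetical, the "optimized" variant merely unrolls the loop instead of using SIMD intrinsics, and the choice is made from a caller-supplied list of supported architecture names rather than from CPU detection or profiling.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Signature shared by every proto-kernel of this hypothetical kernel.
using multiply_fn = void (*)(float* out, const float* in, float scalar, std::size_t n);

// 'generic' proto-kernel: a plain loop that works on any platform.
void multiply_generic(float* out, const float* in, float scalar, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * scalar;
}

// Stand-in for a platform-specific proto-kernel. A real one would use
// SIMD intrinsics; this one only unrolls the loop by four.
void multiply_unrolled4(float* out, const float* in, float scalar, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i]     = in[i]     * scalar;
        out[i + 1] = in[i + 1] * scalar;
        out[i + 2] = in[i + 2] * scalar;
        out[i + 3] = in[i + 3] * scalar;
    }
    for (; i < n; ++i) // leftover items
        out[i] = in[i] * scalar;
}

struct proto_kernel {
    std::string arch;
    multiply_fn fn;
};

// Return the first specialized proto-kernel whose architecture name appears
// in 'supported'; fall back to the generic proto-kernel otherwise.
multiply_fn select_multiply(const std::vector<std::string>& supported)
{
    static const proto_kernel candidates[] = { { "unrolled4", multiply_unrolled4 } };
    for (const auto& candidate : candidates)
        for (const auto& arch : supported)
            if (candidate.arch == arch)
                return candidate.fn;
    return multiply_generic;
}
```

The caller only ever sees a `multiply_fn`, which mirrors how a Volk user calls one kernel name regardless of which proto-kernel ends up running.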

A note on ORC: ORC is a SIMD compiler library that uses a generic assembly-like language for SIMD commands. Based on the available SIMD architecture of a system, it will try to compile a good solution. Tests show that ORC proto-kernels generally perform better than the generic versions but often not as well as proto-kernels hand-tuned for a specific SIMD architecture. This is to be expected, and ORC provides a useful intermediate step toward better performance until a hand-tuned proto-kernel can be written for a given platform.

See Volk on gnuradio.org for details on the Volk naming scheme.

Setting and Using Memory Alignment Information

For Volk to perform as well as possible, we want to use memory-aligned SIMD calls, which means we need a way to know and control the alignment of the buffers passed to gr_block's work function. We set the alignment requirement for SIMD-aligned memory calls with:

  const int alignment_multiple =
    volk_get_alignment() / output_item_size;
  set_alignment(std::max(1,alignment_multiple));

The Volk function 'volk_get_alignment' provides the alignment of the machine architecture. We then express the alignment as the number of output items required to maintain it, so we divide the number of alignment bytes by the number of bytes in an output item (sizeof(float), sizeof(gr_complex), etc.). The std::max call keeps the result at 1 or more when an item is larger than the alignment. This value is then set per block with the 'set_alignment' function.
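As a worked example of that division, the sketch below isolates the arithmetic in a standalone helper. The function name `alignment_in_items` is hypothetical, and the byte counts in the comments (16 and 32) are only illustrative values; in a real block the first argument would come from volk_get_alignment().

```cpp
#include <algorithm>
#include <cstddef>

// Number of items that span one alignment boundary, never less than 1.
// alignment_bytes: illustrative stand-in for the value volk_get_alignment() returns.
// item_size: size of one output item in bytes, e.g. sizeof(float).
int alignment_in_items(std::size_t alignment_bytes, std::size_t item_size)
{
    const int multiple = static_cast<int>(alignment_bytes / item_size);
    return std::max(1, multiple);
}

// alignment_in_items(16, 4)  gives 4: four 4-byte floats per 16-byte boundary.
// alignment_in_items(32, 8)  gives 4: four 8-byte complex items per 32-byte boundary.
// alignment_in_items(16, 64) gives 1: the item is larger than the alignment.
```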

Because the scheduler tries to optimize throughput, the number of items available per call to work changes and depends on the availability of the read and write buffers. This means that the scheduler sometimes cannot produce a buffer that is properly memory aligned. This is an inevitable consequence of the scheduler system. Instead of requiring alignment, the scheduler maintains alignment as much as possible, and when a buffer becomes unaligned, the scheduler works to restore it. If a block's buffers are unaligned, the scheduler sets a flag to indicate this so that the block can decide how best to proceed. The next section discusses the use of the aligned/unaligned information in a gr_block's work function.

Using Alignment Properties in Work()

The buffers passed to work/general_work in a gr_block are not guaranteed to be aligned, but they will be aligned whenever possible. When they are not aligned, the 'is_unaligned()' flag will be set, so a block can tell whether its buffers are aligned and act accordingly. This looks like:

int
gr_some_block::work (int noutput_items,
                     gr_vector_const_void_star &input_items,
                     gr_vector_void_star &output_items)
{
  const float *in = (const float *) input_items[0];
  float *out = (float *) output_items[0];

  if(is_unaligned()) {
    // do something with unaligned data. This can either be a manual
    // handling of the items or a call to an unaligned Volk function.
    volk_32f_something_32f_u(out, in, noutput_items);
  }
  else {
    // Buffers are aligned; can call the aligned Volk function.
    volk_32f_something_32f_a(out, in, noutput_items);
  }

  return noutput_items;
}
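What "aligned" means for a buffer can be shown outside GNU Radio with a pointer check. The sketch below is an analogy only: in a real block the scheduler sets the flag and is_unaligned() reports it, whereas here the hypothetical helpers `is_aligned` and `scale` inspect the pointers directly, and both branches compute the same result with a plain loop.

```cpp
#include <cstddef>
#include <cstdint>

// True if p sits on an 'alignment'-byte boundary.
bool is_aligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

enum class path { aligned, unaligned };

// Multiply n floats by k and report which code path the buffers qualify for.
// A real block would call the _a or _u Volk kernel depending on the path;
// this sketch uses one plain loop for both.
path scale(float* out, const float* in, float k, std::size_t n, std::size_t alignment)
{
    const bool aligned = is_aligned(out, alignment) && is_aligned(in, alignment);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * k;
    return aligned ? path::aligned : path::unaligned;
}
```

Offsetting a 32-byte-aligned float buffer by one item moves it 4 bytes past the boundary, which is the kind of situation the unaligned branch exists to handle.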

Tuning Volk Performance

VOLK comes with a profiler that builds a config file recording the best SIMD architecture for your processor. Run volk_profile, which is installed into $PREFIX/bin. This program tests all known VOLK kernels for each architecture supported by the processor. When finished, it writes the best architecture for each VOLK function to $HOME/.volk/volk_config. This file is read when a function is used, so that VOLK knows the best version of the function to execute.

Hand-Tuning Performance

If you know a particular architecture works best for your processor, you can specify the particular architecture to use in the VOLK preferences file: $HOME/.volk/volk_config

The file looks like:

    volk_<FUNCTION_NAME> <ARCHITECTURE>

Here, FUNCTION_NAME is the particular function whose default value you want to override, and ARCHITECTURE is the VOLK SIMD architecture to use (generic, sse, sse2, sse3, avx, etc.). For example, the following config file tells VOLK to use SSE3 for the aligned and unaligned versions of a function that multiplies two complex streams together.
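A config file matching that description would look like the following, assuming the function meant is Volk's complex multiply kernel volk_32fc_x2_multiply_32fc with its _a (aligned) and _u (unaligned) variants:

```text
volk_32fc_x2_multiply_32fc_a sse3
volk_32fc_x2_multiply_32fc_u sse3
```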

Tip: if benchmarking GNU Radio blocks, it can be useful to have a volk_config file that sets all architectures to 'generic' as a way to test the vectorized versus non-vectorized implementations.
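The two-column format described above is simple enough to read and write by hand or with a few lines of code. The sketch below is not part of VOLK: `parse_volk_config` and `make_generic_config` are hypothetical helpers that assume exactly the "name architecture" line layout shown in this section, and the caller is responsible for supplying real kernel names.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Parse "volk_<FUNCTION_NAME> <ARCHITECTURE>" lines into a name -> architecture map.
// Lines that do not contain two whitespace-separated fields are skipped.
std::map<std::string, std::string> parse_volk_config(std::istream& in)
{
    std::map<std::string, std::string> prefs;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name, arch;
        if (fields >> name >> arch)
            prefs[name] = arch;
    }
    return prefs;
}

// Build config text that pins every listed kernel to the 'generic' architecture,
// as suggested for benchmarking vectorized versus non-vectorized code.
std::string make_generic_config(const std::vector<std::string>& kernels)
{
    std::string text;
    for (const auto& kernel : kernels)
        text += kernel + " generic\n";
    return text;
}
```

Writing the string returned by `make_generic_config` to $HOME/.volk/volk_config would give the all-generic baseline for a benchmark run; restoring the file produced by volk_profile brings the tuned choices back.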