VOLK¶
VOLK stands for Vector-Optimized Library of Kernels. It's a library that was introduced into GNU Radio in December 2010. You can read more about it here: http://www.trondeau.com/blog/2010/12/11/volk-vector-optimized-library-of-kernels.html
More details on implementing VOLK in GNU Radio can be found here:
http://www.trondeau.com/blog/2012/2/13/volk-integration-to-gnu-radio.html
Benchmarking of VOLK in GNU Radio:
http://www.trondeau.com/blog/2012/2/17/volk-benchmarking.html
Paper on VOLK at the WinnForum's SDR conference in January, 2013:
volk.pdf
Using VOLK¶
VOLK comes with a profiler that builds a config file recording the best SIMD architecture for your processor. Run volk_profile, which is installed into $PREFIX/bin. This program tests all known VOLK kernels for each architecture supported by the processor. When finished, it writes the best architecture for each VOLK function to $HOME/.volk/volk_config. This file is read when a function is used, so that VOLK knows which version of the function to execute.
Hand-Tuning Performance¶
If you know a particular architecture works best for your processor, you can specify the particular architecture to use in the VOLK preferences file: $HOME/.volk/volk_config
The file looks like:
volk_<FUNCTION_NAME> <ARCHITECTURE>
Here, FUNCTION_NAME is the function whose default you want to override and ARCHITECTURE is the VOLK SIMD architecture to use (sse, sse2, sse3, avx, etc.). For example, the following config file tells VOLK to use SSE3 for the aligned and unaligned versions of a function that multiplies two complex streams together.
volk_32fc_x2_multiply_32fc_a sse3
volk_32fc_x2_multiply_32fc_u sse3
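To make the file format concrete, here is a minimal C++ sketch of how such a file could be read into a lookup table. It is not VOLK's actual parser: the function name parse_volk_config and the use of std::map are assumptions made for illustration, and the sketch assumes one whitespace-separated function/architecture pair per line.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of VOLK): parse the text of a volk_config
// file into a map from kernel name to preferred architecture.
std::map<std::string, std::string> parse_volk_config(const std::string& text)
{
    std::map<std::string, std::string> prefs;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string function, arch;
        // Keep the line only if it has both a function name and an architecture;
        // blank or incomplete lines are skipped.
        if (fields >> function >> arch) {
            prefs[function] = arch;
        }
    }
    return prefs;
}
```

Calling parse_volk_config on the two-line example above would yield a table in which both the _a and _u kernel names map to "sse3".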
Writing Volk kernels¶
Developing with Volk in GNU Radio¶
To use VOLK kernels in GNU Radio, you have to be aware of buffer alignment. You can check the buffer alignment by calling the is_unaligned() function of gr_block. If it returns true, there is an alignment issue and the aligned kernels cannot be called. If it returns false, the buffers are aligned and the aligned VOLK kernel may be used.
The following is an example using the gr_multiply_cc block, which uses the volk_32fc_x2_multiply_32fc kernel to multiply two streams together. When is_unaligned() returns true, the _u (unaligned) version is called; otherwise, the _a (aligned) version is used.
If a kernel does not have an unaligned version, you can either write one or program a generic C++ implementation of the math here. Generally speaking, making an unaligned kernel is as simple as copying the aligned kernel and changing any load calls to loadu and any store calls to storeu (true for Intel SSE instructions, at least).
int
gr_multiply_cc::work (int noutput_items,
                      gr_vector_const_void_star &input_items,
                      gr_vector_void_star &output_items)
{
  gr_complex *out = (gr_complex *) output_items[0];
  int noi = d_vlen*noutput_items;

  // Start from a copy of the first input; the remaining inputs are multiplied into it.
  memcpy(out, input_items[0], noi*sizeof(gr_complex));
  if(is_unaligned()) {
    // Buffers are not aligned: use the unaligned (_u) kernel.
    for(size_t i = 1; i < input_items.size(); i++)
      volk_32fc_x2_multiply_32fc_u(out, out, (gr_complex*)input_items[i], noi);
  }
  else {
    // Buffers are aligned: the aligned (_a) kernel may be called.
    for(size_t i = 1; i < input_items.size(); i++)
      volk_32fc_x2_multiply_32fc_a(out, out, (gr_complex*)input_items[i], noi);
  }
  return noutput_items;
}
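The same aligned/unaligned decision can be sketched outside GNU Radio. The following is only an illustration under stated assumptions: is_aligned_to, multiply_generic, multiply_dispatch, and the 16-byte boundary are hypothetical stand-ins, not GNU Radio or VOLK APIs, and the math is done in plain C++ rather than SIMD.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

typedef std::complex<float> cfloat;

// Hypothetical check: true if the pointer sits on an `alignment`-byte boundary.
bool is_aligned_to(const void* p, std::size_t alignment)
{
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Plain C++ stand-in for the math that the _a and _u kernels would perform.
void multiply_generic(cfloat* out, const cfloat* a, const cfloat* b, std::size_t n)
{
    for (std::size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

// Multiplies a and b into out and returns 'a' or 'u' to report which kernel
// variant a real implementation would have been allowed to call.
char multiply_dispatch(cfloat* out, const cfloat* a, const cfloat* b, std::size_t n)
{
    const std::size_t alignment = 16; // assumed SSE-style requirement
    const bool aligned = is_aligned_to(out, alignment) &&
                         is_aligned_to(a, alignment) &&
                         is_aligned_to(b, alignment);
    // A real block would call the _a kernel when `aligned` is true and the
    // _u kernel otherwise; the stand-in does the same math either way.
    multiply_generic(out, a, b, n);
    return aligned ? 'a' : 'u';
}
```

The aligned path is chosen only when every buffer meets the requirement, which mirrors the rule that a single misaligned buffer rules out the aligned kernel.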
Using ORC with VOLK¶
VOLK can take advantage of the Oil Runtime Compiler (ORC) to create cross-platform kernels relatively quickly. ORC provides a higher-level language for writing SIMD code that targets different SIMD architectures. The ease of writing an ORC function can be offset by a result that is less well tuned than an architecture-specific kernel (generality versus speed). ORC is often a good place to start writing VOLK kernels, which can then be optimized as necessary.
To download ORC, go to:
http://code.entropywave.com/download/orc/
Or use their git repo:
git://code.entropywave.com/git/orc.git
As of GNU Radio 3.5.2, VOLK depends on ORC version 0.4.12 or higher.
VOLK Naming Scheme¶
There is discussion about standardizing the naming scheme for VOLK. We want standard naming to make sure that all functions are explicitly clear as to what they do, what their inputs and output types are, and that new functions do not have naming conflicts.
The basic naming scheme will look something like this:
volk_(inputs params)_[name]_(output params)_[alignment]
These are a few questions that must be addressed when creating the names:
1. Different and multiple inputs and/or outputs
2. Different types, also with different/multiple inputs/outputs
3. Constants (scalars) versus vectors
4. Mappings or other control information (I'm thinking of things like masks for permutation operators)
5. Memorable (as in, users should be able to "guess" the names from their purpose)
6. Unique (prevent duplication)
The current scheme follows this formula:
volk_(input_type_0)_x(input_num_0)_(input_type_1)_x(input_num_1)_...
_[name]_(output_type_0)_x(output_num_0)_(output_type_1)_x(output_num_1)_..._[alignment]
Any function may have M inputs and N outputs. Each input/output has a type that is explicitly named. We specify the types in blocks if there are multiple types in a row. For each block, the type of that block of inputs/outputs is followed by the number of items in that block. The types of data can be:
8i, 8u, 16i, 16u, 32i, 32u, 32f, 64i, 64u, 64f
The number of parameters with that type is specified following the type and prefixed with an "x." If there is only a single argument of the type, the multiplier may be omitted.
Any input/output type can be made complex by appending a "c" to the type (for example, 32-bit floating-point complex is 32fc). By default, all inputs and outputs are vectors, but some of the VOLK kernels may take a scalar, such as when multiplying by a constant. These types are specified by prefixing an "s" to the type (e.g., s32fc).
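As an illustration of how the type tokens are put together, the sketch below decodes a token such as s32fc into its parts. The TypeInfo struct and parse_type function are hypothetical and not part of VOLK; the sketch accepts only the widths and kinds listed above, plus the optional "s" prefix and "c" suffix.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical description of one type token from a kernel name.
struct TypeInfo {
    bool scalar;    // leading "s": a scalar rather than a vector
    int bits;       // 8, 16, 32, or 64
    char kind;      // 'i' signed integer, 'u' unsigned integer, 'f' float
    bool complex_;  // trailing "c": complex
};

// Returns true and fills `info` if `token` follows the scheme; false otherwise.
bool parse_type(const std::string& token, TypeInfo& info)
{
    std::size_t pos = 0;
    info.scalar = !token.empty() && token[0] == 's';
    if (info.scalar) pos++;

    // Read the bit width (at most three digits, to keep the value small).
    int bits = 0;
    std::size_t digits = 0;
    while (pos < token.size() && digits < 3 &&
           std::isdigit(static_cast<unsigned char>(token[pos]))) {
        bits = bits * 10 + (token[pos] - '0');
        pos++;
        digits++;
    }
    if (bits != 8 && bits != 16 && bits != 32 && bits != 64) return false;
    info.bits = bits;

    // Read the kind letter; floats are only listed as 32f and 64f.
    if (pos >= token.size()) return false;
    info.kind = token[pos++];
    if (info.kind != 'i' && info.kind != 'u' && info.kind != 'f') return false;
    if (info.kind == 'f' && bits < 32) return false;

    // An optional trailing "c" marks a complex type; nothing may follow it.
    info.complex_ = pos < token.size() && token[pos] == 'c';
    if (info.complex_) pos++;
    return pos == token.size();
}
```

With this sketch, 32fc decodes as a complex 32-bit float vector and s16i as a 16-bit signed integer scalar, while a token such as 8f is rejected because it is not in the list of types.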
The alignment property in the name specifies the memory alignment required by the inputs and outputs. Many SIMD architectures require a specific byte alignment. Mostly, this is a 16-byte alignment. The underlying VOLK machinery knows this, so the kernel only needs to be marked as aligned by giving it an "a" suffix. A kernel with no alignment requirement is marked with a "u" suffix instead.
Note that only one alignment is specified for the function. Mostly, any imposed alignment on the input will be the same restriction on the output alignment, and vice-versa. However, some functions may not have the same requirements on all inputs or outputs, and scalars usually do not require a specific alignment. In these cases, the alignment should be the strictest alignment required by any of the inputs or outputs. Differences should be made clear in the function documentation.
Some examples include:
Multiply two complex float vectors together (aligned and unaligned versions):
volk_32fc_x2_multiply_32fc_a
volk_32fc_x2_multiply_32fc_u
Add four unsigned short vectors together:
volk_16u_x4_add_16u_a
Multiply a complex float vector by a short integer:
volk_32fc_s16i_multiply_32fc_a
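The formula can also be read as a small name-building routine. The following C++ sketch assembles a kernel name from blocks of input and output types; ParamBlock, append_blocks, and make_kernel_name are hypothetical helpers written for this page, not functions provided by VOLK.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One block of parameters sharing a type, e.g. {"32fc", 2} becomes "32fc_x2".
struct ParamBlock {
    std::string type;
    int count;
};

// Append each block as "_<type>", adding "_x<count>" only when count > 1.
static void append_blocks(std::string& name, const std::vector<ParamBlock>& blocks)
{
    for (const ParamBlock& b : blocks) {
        assert(b.count >= 1);
        name += "_" + b.type;
        if (b.count > 1)
            name += "_x" + std::to_string(b.count);
    }
}

// Build volk_(inputs)_[name]_(outputs)_[alignment].
std::string make_kernel_name(const std::vector<ParamBlock>& inputs,
                             const std::string& op,
                             const std::vector<ParamBlock>& outputs,
                             char alignment)
{
    assert(alignment == 'a' || alignment == 'u');
    std::string name = "volk";
    append_blocks(name, inputs);
    name += "_" + op;
    append_blocks(name, outputs);
    name += "_";
    name += alignment;
    return name;
}
```

For instance, make_kernel_name({{"16u", 4}}, "add", {{"16u", 1}}, 'a') reproduces the name of the four-vector add above. The multiplier is omitted for single arguments, as the scheme allows.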