volk: adding new kernels to test and profile.
filter: adding scc and fsf versions of filter with associated new Volk kernels.
These routines work and pass QA, but they could use some performance work. The FSF version is just slightly slower than before; the SCC version is more noticeably slower.
Both could probably benefit from using SSE2 intrinsics to handle the shorts.
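A minimal sketch of what handling the shorts with SSE2 could look like: eight signed 16-bit samples are widened to floats per iteration, with a scalar loop for the remainder. The function name `convert_16i_to_32f` is hypothetical and this is not one of the Volk kernels added in this commit. It assumes the shorts are signed samples that must be converted to float before the filter arithmetic.

```c
#include <stdint.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper: convert num_points int16 samples to float.
   The SSE2 path handles 8 shorts per iteration; a scalar loop
   finishes any remainder (and does all the work on non-SSE2 targets). */
static void convert_16i_to_32f(float *out, const int16_t *in, size_t num_points)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 8 <= num_points; i += 8) {
        /* Unaligned load of 8 packed shorts. */
        __m128i s = _mm_loadu_si128((const __m128i *)(in + i));
        /* Interleave each short with itself, then arithmetic-shift right
           by 16 so each 32-bit lane holds the sign-extended sample. */
        __m128i lo = _mm_srai_epi32(_mm_unpacklo_epi16(s, s), 16);
        __m128i hi = _mm_srai_epi32(_mm_unpackhi_epi16(s, s), 16);
        _mm_storeu_ps(out + i, _mm_cvtepi32_ps(lo));
        _mm_storeu_ps(out + i + 4, _mm_cvtepi32_ps(hi));
    }
#endif
    for (; i < num_points; i++)
        out[i] = (float)in[i];
}
```

The unpack-then-shift idiom is used because SSE2 has no direct 16-to-32-bit sign-extend instruction; `_mm_cvtepi16_epi32` only arrives with SSE4.1.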
filter: added a ccf Volk dot product to use with ccf filters and used it in fir_filter_ccf.
Produces improved results compared to the previous version.
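For reference, a ccf dot product multiplies complex samples by real float taps and accumulates a complex result. The scalar sketch below assumes interleaved (re, im) float storage, which has the same layout as a C99 `float complex` array; the name `dot_prod_ccf_ref` is hypothetical, and this is not the Volk kernel itself.

```c
#include <stddef.h>

/* Hypothetical scalar reference for a "ccf" dot product:
   complex input (interleaved re, im floats) times real float taps,
   accumulated into a complex result stored as result[0] + j*result[1]. */
static void dot_prod_ccf_ref(float result[2], const float *input,
                             const float *taps, size_t num_points)
{
    float re = 0.0f, im = 0.0f;
    for (size_t i = 0; i < num_points; i++) {
        /* A real tap scales both components of the complex sample. */
        re += input[2 * i] * taps[i];
        im += input[2 * i + 1] * taps[i];
    }
    result[0] = re;
    result[1] = im;
}
```

Because each tap is real, a point costs two multiplies instead of the four needed for a full complex-by-complex product, which is one reason a dedicated ccf kernel can be cheaper than reusing a complex-complex one.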
volk: fixes for 32f dot_prod
Accepts num_points like every other kernel and handles splitting up the number of points itself, instead of expecting the caller to do it.
Adds AVX version, both aligned and unaligned.
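A minimal sketch of a dot product that takes `num_points` and does its own splitting: four-float SSE chunks followed by a scalar tail. The name `dot_prod_32f_u` is hypothetical. The comment about aligned loads describes what an aligned variant would change, not the actual Volk code; an AVX version would follow the same pattern with 8-float `__m256` registers.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical 32f dot product that takes num_points directly and
   splits the work itself: full 4-float SIMD chunks first, then a
   scalar tail for the remaining 0-3 points. */
static float dot_prod_32f_u(const float *a, const float *b, size_t num_points)
{
    size_t i = 0;
    float sum = 0.0f;
#if defined(__SSE__)
    const size_t quarter_points = num_points / 4;
    __m128 acc = _mm_setzero_ps();
    for (size_t n = 0; n < quarter_points; n++, i += 4) {
        /* Unaligned loads; an aligned variant would use _mm_load_ps and
           require 16-byte-aligned buffers. */
        __m128 x = _mm_loadu_ps(a + i);
        __m128 y = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(x, y));
    }
    /* Horizontal sum of the four lanes. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar tail: the caller never has to pad num_points. */
    for (; i < num_points; i++)
        sum += a[i] * b[i];
    return sum;
}
```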
volk: dot_prod for floats does 16 at a time.
This was done to match the performance of the previous float_dotprod. All flavors of the 32f dot_prod now work the same way.
Because it expects the input to have 4x more samples than specified, the QA for these kernels fails.
filter: process 4 vectors each time in volk dot_prod to speed up fir filters.
This makes the volk version of the SSE FIR filter the same speed as using the hand-crafted float_dotprod from before.
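Processing 4 vectors per iteration can be sketched as four independent SSE accumulators covering 16 floats per loop. This is a hypothetical illustration (`dot_prod_32f_x4` is not a Volk name), and the commit does not say how the real kernel is structured. The scalar tail shown here is one way to keep `num_points` meaning "number of floats" rather than assuming a multiple of 16, which is the assumption behind the QA failure noted above.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical sketch: 32f dot product processing 4 SSE vectors
   (16 floats) per iteration with independent accumulators, so
   consecutive adds do not wait on each other's results. Leftover
   points are handled by a scalar loop. */
static float dot_prod_32f_x4(const float *a, const float *b, size_t num_points)
{
    size_t i = 0;
    float sum = 0.0f;
#if defined(__SSE__)
    __m128 acc0 = _mm_setzero_ps(), acc1 = _mm_setzero_ps();
    __m128 acc2 = _mm_setzero_ps(), acc3 = _mm_setzero_ps();
    for (; i + 16 <= num_points; i += 16) {
        acc0 = _mm_add_ps(acc0, _mm_mul_ps(_mm_loadu_ps(a + i),
                                           _mm_loadu_ps(b + i)));
        acc1 = _mm_add_ps(acc1, _mm_mul_ps(_mm_loadu_ps(a + i + 4),
                                           _mm_loadu_ps(b + i + 4)));
        acc2 = _mm_add_ps(acc2, _mm_mul_ps(_mm_loadu_ps(a + i + 8),
                                           _mm_loadu_ps(b + i + 8)));
        acc3 = _mm_add_ps(acc3, _mm_mul_ps(_mm_loadu_ps(a + i + 12),
                                           _mm_loadu_ps(b + i + 12)));
    }
    /* Combine the four partial sums, then reduce the lanes. */
    __m128 acc = _mm_add_ps(_mm_add_ps(acc0, acc1), _mm_add_ps(acc2, acc3));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar tail for num_points not divisible by 16. */
    for (; i < num_points; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Note that splitting the sum across accumulators changes the order of floating-point additions, so results can differ from a straight scalar loop in the last bits for general inputs.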
volk: force kwargs keys to be of type str, not unicode, for py25
volk: code simplification, overrule macro and python opts
volk: avoid sse2 saturation issue 32768->32767
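One way to read this fix, assuming the issue is a full-scale sample of 1.0f scaled by 32768: SSE2's `_mm_packs_epi32` saturates 32768 to 32767, while a plain C narrowing to `int16_t` is not guaranteed to saturate (it typically wraps to -32768), so SIMD and generic kernels can disagree. The hypothetical scalar conversion below (`convert_32f_to_16i_sat` is not a Volk name) clamps explicitly so it matches the saturating behavior; the actual change in this commit may differ.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical scalar float -> int16 conversion that saturates the way
   SSE2's _mm_packs_epi32 does, so 1.0f * 32768.0f becomes 32767
   instead of wrapping to -32768. The final cast truncates toward zero
   (a simplification; an SSE2 path using _mm_cvtps_epi32 rounds
   according to the MXCSR rounding mode instead). */
static void convert_32f_to_16i_sat(int16_t *out, const float *in,
                                   float scalar, size_t num_points)
{
    for (size_t i = 0; i < num_points; i++) {
        float r = in[i] * scalar;
        /* Clamp in the float domain first, so the conversion to int16
           is always in range (out-of-range float->int is undefined). */
        if (r > 32767.0f)
            r = 32767.0f;
        else if (r < -32768.0f)
            r = -32768.0f;
        out[i] = (int16_t)r;
    }
}
```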
volk: added set_float_rounding to volk_cpu_init
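The commit message does not say what `set_float_rounding` sets or how. A plausible reading is that it pins the floating-point rounding mode so SIMD float-to-int conversions are predictable. The sketch below assumes the x86 SSE control register (MXCSR) is the target; `set_float_rounding_sketch` and `round_to_int` are hypothetical names, and the non-SSE branch is a plain-C emulation of round-to-nearest-even so the example still builds elsewhere.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical stand-in for set_float_rounding(): pin the SSE rounding
   mode (the RC field of MXCSR) to round-to-nearest-even so that SIMD
   float->int conversions behave predictably. */
static void set_float_rounding_sketch(void)
{
#if defined(__SSE__)
    _MM_SET_ROUNDING_MODE(_MM_ROUND_NEAREST);
#endif
}

/* Round one float to int. On SSE targets this uses the current MXCSR
   rounding mode; elsewhere it emulates round-to-nearest-even in plain C
   (valid for values well inside the int range). */
static int round_to_int(float x)
{
#if defined(__SSE__)
    return _mm_cvtss_si32(_mm_set_ss(x));
#else
    int t = (int)x;               /* truncate toward zero */
    float frac = x - (float)t;    /* fractional part, computed exactly */
    if (frac > 0.5f || (frac == 0.5f && t % 2 != 0))
        t++;
    else if (frac < -0.5f || (frac == -0.5f && t % 2 != 0))
        t--;
    return t;
#endif
}
```

Under this mode ties go to the even integer, so 2.5f rounds to 2 and 3.5f to 4. A generic kernel that uses `(int)(x + 0.5f)` would give 3 and 4 instead, which is the kind of mismatch that pinning the mode may be meant to avoid.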