|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This improves performance of moving average for me by about 4-5x.
My test is
[here](https://github.com/ThomasHabets/radiostuff/blob/master/amateur/listen_70cm.grc),
which processes 10MHz to find the strongest signal.
Without this PR I see `moving_average` in `top` taking about 89.4-94%
CPU. With this patch it's ~20.4-23.7%. Since without this patch it's
that high, I don't know that it's not even better.
Test measured on a Libremv2 with Intel Core i7-6500U @ 2.5GHz.
The memory access pattern is probably worse with this patch, but at
least on my hardware on my workload this seems to be dwarfed by the
gain of using volk. It can be hard to reason about memory access
patterns, so benchmarks overrule theory.
The test only actually uses float averaging.
More benchmarking may be required.
Possible improvements:
* Add template specialization for `uint16` and `uint32`?
* Refactor for non-volk fallback to use only one loop, with
(presumably) better memory access pattern)
|