Why I care about NEON...

Results from a gnuradio test of the inner product code:

   generic: taps: 256 input: 4e+07 cpu: 1269.242 taps/sec: 8.068e+06
cortex_a8: taps: 256 input: 4e+07 cpu: 49.156 taps/sec: 2.083e+08

The generic routine is plain C; the cortex_a8 routine, which Mans helped me
with, is unrolled by 8 (if I remember correctly). I'm sure gcc has room for