fftw: weekly report

Hi everyone, I guess the RSS feed didn't pick up my weekly report, so
here it is again:

Over the last week, I did some more benchmarking and began working
ffmpeg_fft into my benchmarks, since it makes a good target to aim
for. I'm currently tracking down a segfault in the ffmpeg_fft library
(specifically in ff_fft_permute_neon.S) and hope to have a few more
pretty graphs to show next time.

I added fftwni to the benchmarks in order to compare codelets coded in
inline asm against codelets coded in NEON intrinsics (I also started
using -O3 to get -finline-functions). The results were very close for
fftwn (inline NEON asm) and fftwni (inline NEON intrinsics). The one
exception I made in the intrinsics version was that vtrn is still
coded in inline asm, to avoid compiler errors about being unable to
spill registers. Also, I'm changing the graphs to show cycles (or
time) directly instead of mflops.

The next major step will be working power-of-two fft 'algorithms' into
the fftw planner. After that I'll be tackling the much more
interesting non-power-of-two algorithms. By coding specific algorithms
directly in asm, we're hoping to achieve a greater speedup than is
possible with codelets. The asm routines do not necessarily need to
completely displace the codelets from the planner's point of view (at
least not without hard proof that they are always faster), so they may
simply augment the pool of algorithms available to the planner.

I will need to have some serious crunch time for my thesis as well,
since the submission deadline is in 19 days (!), so you might not see
as many commits in my repositories this week. Please feel free to look
at the latest in main[1] and misc[2].

[1] http://gitorious.org/gsoc2010-fftw-neon
[2] http://gitorious.org/gsoc2010-fftw-neon-misc