Hi all,
I'm moving this conversation from private to public. I think Sylvain
and Andreas might be interested in participating.
---------- Forwarded message ----------
From: Thomas Tsou <tom(a)tsou.cc>
Date: Tue, Jul 9, 2013 at 2:51 AM
Subject: Re: DSP optimization
To: Alexander Chemeris <alexander.chemeris(a)gmail.com>
On Mon, Jul 8, 2013 at 6:52 AM, Alexander Chemeris
<alexander.chemeris(a)gmail.com> wrote:
> Wow, an optimization of 5-16x for Viterbi is huge indeed. I wonder what
> the results would be for our Atoms.
Without SSE, with just a C-only butterfly, the improvement is around 4x.
SSE3 (Atom) forces a small change in the normalization (not separated out
yet), but the results weren't very far off from SSE 4.1 when I tested
on a Core 2 Duo.
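For readers unfamiliar with the term, a C-only add-compare-select butterfly might look roughly like the sketch below. It is not the code from the attachment: the function name, the trellis layout, and the +bm / -bm branch-metric symmetry are all assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the attached code.
 * One add-compare-select (ACS) butterfly using correlation metrics
 * (larger is better). Assumed trellis layout: predecessor states i and
 * i + num_states/2 both feed successor states 2i and 2i + 1, and the four
 * branch metrics are +bm or -bm (which holds when every generator
 * polynomial has both its first and its last tap set).
 * No saturation is done here; the caller is assumed to normalize the
 * path metrics often enough to avoid int16_t overflow.
 */
static void acs_butterfly(const int16_t *old_sums, int16_t *new_sums,
                          uint8_t *decisions, int i, int num_states,
                          int16_t bm)
{
    int half = num_states / 2;

    int m0 = old_sums[i] + bm;          /* i        -> 2i     */
    int m1 = old_sums[i + half] - bm;   /* i + half -> 2i     */
    int m2 = old_sums[i] - bm;          /* i        -> 2i + 1 */
    int m3 = old_sums[i + half] + bm;   /* i + half -> 2i + 1 */

    /* A decision bit of 1 means the upper predecessor (i + half) won. */
    decisions[2 * i] = (uint8_t)(m1 > m0);
    new_sums[2 * i] = (int16_t)(m1 > m0 ? m1 : m0);

    decisions[2 * i + 1] = (uint8_t)(m3 > m2);
    new_sums[2 * i + 1] = (int16_t)(m3 > m2 ? m3 : m2);
}
```

One trellis step would call this for i = 0 .. num_states/2 - 1 and then swap the old and new metric arrays. The loop body has no data-dependent branches beyond the two selects, which is what makes it a natural candidate for SSE.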
I might try to manipulate the interface to read in the state tables
instead of the generator polynomials. That would really help with
testing and integration, but I'm not sure yet. There are many ways to
go here.
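To make the trade-off concrete, here is a minimal sketch of how state tables could be precomputed from generator polynomials, so that a decoder interface would only ever see the tables. The names, the table layout, and the bit convention (bit k of a polynomial is the coefficient of D^k) are assumptions for illustration, not the libosmocore or transceiver API.

```c
#include <stdint.h>

#define CONV_K       5                     /* constraint length (assumed) */
#define CONV_STATES  (1 << (CONV_K - 1))   /* 16 states for K = 5 */

/* Parity (XOR of all bits) of v. */
static unsigned parity(unsigned v)
{
    unsigned p = 0;
    while (v) {
        p ^= v & 1u;
        v >>= 1;
    }
    return p;
}

/*
 * Hypothetical table builder. For each (state, input bit) pair it fills:
 *   out[state][bit]  - the n coded bits, first generator in the MSB
 *   next[state][bit] - the successor state
 * Convention: the shift register is (state << 1) | bit, so bit 0 holds
 * the newest input and bit k of a polynomial is the coefficient of D^k.
 */
static void build_state_tables(const unsigned *polys, int n,
                               uint8_t out[CONV_STATES][2],
                               uint8_t next[CONV_STATES][2])
{
    for (unsigned state = 0; state < CONV_STATES; state++) {
        for (unsigned bit = 0; bit < 2; bit++) {
            unsigned reg = (state << 1) | bit;
            unsigned sym = 0;

            for (int g = 0; g < n; g++)
                sym = (sym << 1) | parity(reg & polys[g]);

            out[state][bit] = (uint8_t)sym;
            next[state][bit] = (uint8_t)(reg & (CONV_STATES - 1));
        }
    }
}
```

With the illustrative polynomials 0x19 and 0x1b (1 + D^3 + D^4 and 1 + D + D^3 + D^4 under the convention above), state 0 with input bit 1 emits both coded bits set and moves to state 1. Tables like these can be generated once and compared across implementations, which is where the testing benefit would come from.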
> What is problematic with the runtime detection? CPU autodetection on
> Linux should be as easy as reading /proc/cpuinfo. The issue I see is
> correctly setting up the build system to generate all versions in the
> same run. I think we could leave CPU autodetection for the "everything
> else" milestone and use compile-time selection for now.
I think compile-time detection is more appropriate. For GSM / LTE
we're almost always dealing with fixed-size vectors and not odd
calculations (e.g. a 1023-point FFT), so it's unlikely that the results
will change on repeated runs.
/proc/cpuinfo parsing scripts I've seen have been prone to breakage.
If you have a really good one, let me know. I usually prefer to run
configure checks against the actual instruction, but that can get
messy with a lot of checks. Anyhow, I'm not worrying about this now.
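As a rough illustration of compile-time selection, the sketch below picks an implementation using the compiler's predefined macros (`__SSE4_1__` and `__SSE3__` are set by GCC and Clang when the matching -m flags or -march are given). The `DSP_IMPL_NAME` macro and the `add8` helper are hypothetical names, and the saturating add merely stands in for a real kernel.

```c
#include <stdint.h>

/* Report which code path was selected at compile time. */
#if defined(__SSE4_1__)
#  define DSP_IMPL_NAME "sse4.1"
#elif defined(__SSE3__)
#  define DSP_IMPL_NAME "sse3"
#else
#  define DSP_IMPL_NAME "generic"
#endif

#if defined(__SSE3__)
#include <pmmintrin.h>   /* SSE3 header; also provides the SSE2 intrinsics */

/* Saturating add of two vectors of 8 signed 16-bit values (SIMD path). */
static void add8(const int16_t *a, const int16_t *b, int16_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    _mm_storeu_si128((__m128i *)out, _mm_adds_epi16(va, vb));
}
#else
/* Portable fallback with the same saturating behaviour. */
static void add8(const int16_t *a, const int16_t *b, int16_t *out)
{
    for (int i = 0; i < 8; i++) {
        int32_t s = (int32_t)a[i] + (int32_t)b[i];

        if (s > INT16_MAX)
            s = INT16_MAX;
        if (s < INT16_MIN)
            s = INT16_MIN;
        out[i] = (int16_t)s;
    }
}
#endif
```

Both paths return identical results, so a unit test can compare them without knowing which one was built. The drawback is that one binary carries exactly one path, so a build made with -msse4.1 will not run on an Atom.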
> What repository will you push to? We need to have at least the master
> branch and the dual-channel branch working with the optimizations. And I
> believe everyone would be happy to see the optimizations in libosmocore
> for the benefit of other projects as well. I don't foresee any issues
> with a slight change in the libosmocore API if it is justified - just
> send an RFC/patch to the OpenBSC mailing list and it will be reviewed.
Non-Viterbi changes are sigProc.cpp changes only, so they are not
branch-specific - they will probably merge into the oldest available
OpenBTS releases. The Viterbi changes merge into Andreas's branch,
which is a very large change. For now, somebody needs to write it,
which is why I'm considering making the interfaces match.
Attached are the standalone unit test cases for SSE 4.2. As previously
mentioned, Atom needs SSE3 only. I'll add the ifdefs for those
shortly. I don't know if there's an appropriate repository for these
right now - linking libosmocore from the transceiver for comparison
purposes only seems silly. I just generated a temporary tarball for
the time being.
Thomas
--
Regards,
Alexander Chemeris.
CEO, Fairwaves LLC / ООО УмРадио
http://fairwaves.ru