A recent patch to libosmocore refuses to allocate a rate counter group with an index that already exists. Per se that's a good thing, but it needs fixes in a whole range of callers.
Numerous callers wrongly just pass 0 as rate counter group, so as soon as a second <thing> shows up, the application may deny service; that's unacceptable. For example, OsmoSGSN now crashes with the second subscriber showing up!
causing patch: https://gerrit.osmocom.org/5418 my proposed change: https://gerrit.osmocom.org/5516
I'd like to request everyone to refrain from submitting patches that are insufficiently tested, in general of course, but in particular until after congress, please ;)
~N
Hi Neels,
On Wed, Dec 20, 2017 at 01:50:00AM +0100, Neels Hofmeyr wrote:
A recent patch to libosmocore refuses to allocate a rate counter group with an index that already exists. Per se that's a good thing, but it needs fixes in a whole range of callers.
Numerous callers wrongly just pass 0 as rate counter group, so as soon as a second <thing> shows up, the application may deny service; that's unacceptable. For example, OsmoSGSN now crashes with the second subscriber showing up!
causing patch: https://gerrit.osmocom.org/5418 my proposed change: https://gerrit.osmocom.org/5516
Thanks for your work-around which I have fixed.
But what does this story tell us: Our testing still sucks, despite all the existing efforts we put into it.
* how can a patch that breaks 'make check' of osmo-bts/openbsc/osmo-pcu, etc. be merged to libosmcoore? Isn't that what we have inverse dependencies for in the gerrit verification jobs?
* how can it be that even after the patch is merged, we find the osmo-sgsn issue only by manual testing and not by automatic testing?
We all have to take a step back and seriously reflect on that.
I'd like to request everyone to refrain from submitting patches that are insufficiently tested, in general of course, but in particular until after congress, please ;)
While I agree patches should be tested ahead of the merge, I also think that this must not rely on manual testing but we should have sufficient automatic testing in place. If the 'gerrit build testing' verification is not executing all existing tests or taking shortcts, then we have to a) think of speeding this up significantly (by throwing more hardware at it, or making tests execute faster), and/or b) introduce an intermediate stage, something like a "staging" branch. So the existing code review and gerrit-jenkins-build-testing would make patches end up in 'staging' first, where then the slower (hourly/daily/...) tests like osmo-gsm-tester are executed. And only once that's green, we go towards master.
So as much as I love CCC events and respect the people volunteering to bring GSM and UMTS newtworks alive there (like yourself), I don't think that any given single "event network" should dictate the patch merging. If that's the case, than we're doing things utterly wrong
Regards, Harald (who goes back to writing testing code)
On Wed, Dec 20, 2017 at 03:13:38PM +0100, Harald Welte wrote:
- how can a patch that breaks 'make check' of osmo-bts/openbsc/osmo-pcu, etc. be merged to libosmcoore? Isn't that what we have inverse dependencies for in the gerrit verification jobs?
In libosmocore gerrit, we don't test depending projects. We'd find the master jobs failing if we merge a breaking change.
- how can it be that even after the patch is merged, we find the osmo-sgsn issue only by manual testing and not by automatic testing?
I also wondered about this, and AFAICT the reason is that osmo-gsm-tester doesn't do more than one GPRS subscriber. SGSN worked fine with one subscriber, so we need a test that adds a second GPRS user to osmo-gsm-tester...
https://osmocom.org/issues/2820 Maybe a titan test would be more fitting to add a number of subscribers?
b) introduce an intermediate stage, something like a "staging" branch. So the existing code review and gerrit-jenkins-build-testing would make patches end up in 'staging' first, where then the slower (hourly/daily/...) tests like osmo-gsm-tester are executed. And only once that's green, we go towards master.
We could cause each and every gerrit patch to trigger osmo-gsm-tester and/or titan test runs; i.e. perfect coverage means "infinite" jenkins job effort and time to wait for V+1.
Aiming for perfect coverage before merging patches seems to me to be fighting windmills; we can get better with every failure, but I guess we'll always need reassurance from real-world tests.
So far the tradeoff was to rely on master job failures and osmo-gsm-tester failures to be reported. Which works only as long as we watch those and as long as they are able to catch the particular issues.
Maybe an IRC bot to report build failures to #osmocom-dev could be nice besides mail notifications... just dreaming.
I don't think that any given single "event network" should dictate the patch merging.
Was of course a selfish, not entirely serious request, with a wink.
~N