Attention is currently required from: neels, laforge.
2 comments:
Patchset:
if this patch helps you in production setups, and it has been tested/verified there, please rebase a […]
Of the many things I may have miscommunicated, I can't see how I suggested that it "hardly matters". I think that delivering a sent SMS as soon as possible is rather important, and I think I explained why this isn't happening. It has also been observed at the congress event.
if this patch helps you in production setups, and it has been tested/verified there, please rebase a […]
This patch is actually part of a larger patch set that I am currently running in a production setup. Some of the other patches have been submitted here, including
https://gerrit.osmocom.org/c/osmo-msc/+/28338/1 which was abandoned due to your valid comment there: "... It creates a merge nightmare for all of my work on a SQL-less SMS storage backend... "
The patch set that I am running does not experience the bizarre stalls, of up to several minutes, that we observed last year - related to the sqlite driver? disk access/locking? who knows what? I remember well that after valiant attempts to figure out what was going on, the conclusion was something like: I don't care (anymore) what sqlite is doing, let's simply remove it - hence your sprint to implement file based storage for the SMS queue.
I do not know which patch(es) avoid the lockups, but I have a feeling it has to do with all or some of:
1) not rapidly doing UPDATE followed by DELETE operations.
2) implementing a timer to prevent successive queue runs (possibly giving the sqlite/disk IO, or whatever it is, a chance to "catch up"); see the sketch after this list.
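To make point 2 concrete, here is a minimal sketch of such a cooldown between queue runs, built on the libosmocore osmo_timer API. This is not the actual patch: the names smsq_trigger(), smsq_run() and SMSQ_COOLDOWN_SECS are made up for illustration.

/* Rate-limit SMS queue runs: run at most once per SMSQ_COOLDOWN_SECS,
 * remembering any trigger that arrives during the cooldown. */
#include <stdbool.h>
#include <osmocom/core/timer.h>

#define SMSQ_COOLDOWN_SECS 10

static struct osmo_timer_list smsq_cooldown_timer;
static bool smsq_run_requested;

static void smsq_run(void *data)
{
	/* ... walk the queue, hand pending SMS to their transactions ... */
}

static void smsq_cooldown_cb(void *data)
{
	if (!smsq_run_requested)
		return;
	smsq_run_requested = false;
	smsq_run(data);
	/* re-arm, so the next trigger is again delayed */
	osmo_timer_schedule(&smsq_cooldown_timer, SMSQ_COOLDOWN_SECS, 0);
}

/* Called whenever anything wants a queue run (new SMS, subscriber
 * attach, ...). */
void smsq_trigger(void)
{
	if (osmo_timer_pending(&smsq_cooldown_timer)) {
		/* a run just happened; remember the request for later */
		smsq_run_requested = true;
		return;
	}
	smsq_run(NULL);
	osmo_timer_setup(&smsq_cooldown_timer, smsq_cooldown_cb, NULL);
	osmo_timer_schedule(&smsq_cooldown_timer, SMSQ_COOLDOWN_SECS, 0);
}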
My hypothesis is that current master osmo-msc continues to suffer from this stalling problem, and I think it will be observed wherever there is significant SMS traffic, at a CCC event for example. Therefore, I don't see much point in submitting this patch alone. This patch simply addresses an issue that I noticed while doing a pretty intense analysis of the SMS queue; the issue is relevant regardless of the actual storage backend, provided the queue is large enough.
A possibility for me at this point is to rebase and resubmit all the patches relevant to the sqlite based SMS queue, and to rebase laforge/nosql on top of that work. I don't know how complicated that might be, but it could be a way to avoid creating that work for somebody else by merging the sqlite work to master.
At the same time, I'm not entirely happy with the sqlite work, as I don't know EXACTLY why it alleviates the problem, and I have -ENOTEXIST desire to push patches that I do not understand through code review, or to create more pending patches that sit around for six months like this one.
Maybe it is also worth saying that dropping sqlite and switching to file based SMS storage creates a workload for me as well, in that I have to adapt code that I run to deal with it - mostly the routine that checks the local queue for SMS whose destination is currently attached at another VLR (a distributed-GSM SMS queue, if you like). But it's probably as simple as changing the db access routines to file access routines; see the sketch below.
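For illustration, a hedged sketch of that external routine (all names are hypothetical, none of this is osmo-msc code): the storage backend is isolated behind a single accessor, so only next_unsent() changes when sqlite is replaced by file based storage.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct sms {
	char dest_msisdn[16];
	/* ... remaining SMS fields ... */
};

/* Storage accessor: today a wrapper around the sqlite db_sms_*()
 * routines, later a walk over the file based queue. Stubbed here. */
static struct sms *next_unsent(void)
{
	return NULL; /* stub */
}

/* Stub for the attachment lookup (e.g. an mslookup query in a real
 * distributed-GSM setup). */
static bool attached_at_other_vlr(const char *msisdn, char *vlr, size_t len)
{
	return false; /* stub */
}

static void forward_to_vlr(const struct sms *sms, const char *vlr)
{
	printf("forwarding SMS for %s to %s\n", sms->dest_msisdn, vlr); /* stub */
}

/* Walk the local queue; forward any SMS whose destination is
 * currently attached at another VLR. */
void dgsm_smsq_scan(void)
{
	struct sms *sms;
	char vlr[64];

	while ((sms = next_unsent())) {
		if (attached_at_other_vlr(sms->dest_msisdn, vlr, sizeof(vlr)))
			forward_to_vlr(sms, vlr);
	}
}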