SQLite uses DELETE journaling mode by default. It became a problem on a real-world, actively used network (several hundred subscribers, several thousands SMS per day). Some location updates and SMS delivery marks were not getting written because database file was locked:
2013-12-13_07:26:10.10665 <0002> gsm_subscriber.c:362 Subscriber 334020422525672 ATTACHED LAC=3 2013-12-13_07:26:10.10668 <000d> db.c:170 DBI: 5: database is locked 2013-12-13_07:26:10.10865 <000d> db.c:969 Failed to update Subscriber (by IMSI).
Switching to WAL mode fixed the problem.
WAL journaling mode is persistent; it stays in effect after closing and reopening the database. WAL mode database requires SQLite >= 3.7.0.
All credit belongs Ivan Kluchnikov who diagnosed and fixed this issue. --- openbsc/src/libmsc/db.c | 9 ++++++++- 1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/openbsc/src/libmsc/db.c b/openbsc/src/libmsc/db.c index e720c78..3a989d4 100644 --- a/openbsc/src/libmsc/db.c +++ b/openbsc/src/libmsc/db.c @@ -232,11 +232,18 @@ static int db_configure(void) dbi_result result;
result = dbi_conn_query(conn, + "PRAGMA journal_mode = WAL"); + if (!result) + LOGP(DDB, LOGL_NOTICE, "Warning: failed to set journal_mode = WAL.\n"); + else + dbi_result_free(result); + + result = dbi_conn_query(conn, "PRAGMA synchronous = FULL"); if (!result) return -EINVAL; - dbi_result_free(result); + return 0; }
Alexey Shamrin wrote:
WAL journaling mode is persistent; it stays in effect after closing and reopening the database. WAL mode database requires SQLite >= 3.7.0.
Think about this some more. It's not enough to state dependencies in a commit message, you need to put them in the build system. A patch to do this will need to touch configure.ac, and I think it would be wise to make the requirement optional.
All credit belongs Ivan Kluchnikov who diagnosed and fixed this issue.
Then the commit should list him as author, right?
//Peter
Peter, thank you for your comments!
On Fri, Feb 21, 2014 at 2:05 AM, Peter Stuge peter@stuge.se wrote:
WAL journaling mode is persistent; it stays in effect after closing and reopening the database. WAL mode database requires SQLite >= 3.7.0.
Think about this some more. It's not enough to state dependencies in a commit message, you need to put them in the build system. A patch to do this will need to touch configure.ac, and I think it would be wise to make the requirement optional.
Is it really needed in this particular case? OpenBSC would continue working with SQLite < 3.7.0. It would just ignore `journal_mode = WAL` and keep using `DELETE` mode. The only thing that won't work is downgrading from newer SQLite to older while keeping the same database file. Older SQLite would refuse to open the database that was once opened in WAL mode. The file could be downgraded by changing journal_mode on it. But only newer SQLite could do it. Do you think something must be done about it?
All credit belongs Ivan Kluchnikov who diagnosed and fixed this issue.
Then the commit should list him as author, right?
As far as I understand, Ivan noticed the problem, opened the database with sqlite3 command, entered `PRAGMA journal_mode = WAL` and noticed the problem was now gone. Then he told me about it and suggested to write a patch. I found where to put this command in OpenBSC source, tested that it indeed persists WAL mode in the file, wrote the commit message and sent it to mailing list.
I don't know what this means about the commit authorship. I can do it either way.
Alexey
Alexey Shamrin wrote:
WAL journaling mode is persistent; it stays in effect after closing and reopening the database. WAL mode database requires SQLite >= 3.7.0.
Think about this some more. It's not enough to state dependencies in a commit message, you need to put them in the build system. A patch to do this will need to touch configure.ac, and I think it would be wise to make the requirement optional.
Is it really needed in this particular case? OpenBSC would continue working with SQLite < 3.7.0. It would just ignore `journal_mode = WAL` and keep using `DELETE` mode. The only thing that won't work is downgrading from newer SQLite to older while keeping the same database file. Older SQLite would refuse to open the database that was once opened in WAL mode. The file could be downgraded by changing journal_mode on it. But only newer SQLite could do it. Do you think something must be done about it?
Thanks for explaining in more detail! I do think something more needs to be done but I'm no longer so sure what I'd prefer..
I think it would be good to have a build-time (configure) option to explicitly enable WAL journalling and thus also require the newer SQLite version.
I also think that a version of OpenBSC built without WAL enabled has to be able to recognize databases created by OpenBSC built *with* WAL enabled, and tell the user what needs to be done to make the database work.
Finally, maybe the database version number should also be bumped, to allow OpenBSC using >=3.7 to know whether a database needs WAL or not.
All credit belongs Ivan Kluchnikov who diagnosed and fixed this issue.
Then the commit should list him as author, right?
As far as I understand, Ivan noticed the problem, opened the database with sqlite3 command, entered `PRAGMA journal_mode = WAL` and noticed the problem was now gone. Then he told me about it and suggested to write a patch. I found where to put this command in OpenBSC source, tested that it indeed persists WAL mode in the file, wrote the commit message and sent it to mailing list.
Oh ok, yes, then I think you should be the author and actually some more of the above explanation could easily be in the commit message. :)
//Peter
On Fri, Feb 21, 2014 at 3:52 AM, Peter Stuge peter@stuge.se wrote:
Alexey Shamrin wrote:
WAL journaling mode is persistent; it stays in effect after closing and reopening the database. WAL mode database requires SQLite >= 3.7.0.
Think about this some more. It's not enough to state dependencies in a commit message, you need to put them in the build system. A patch to do this will need to touch configure.ac, and I think it would be wise to make the requirement optional.
Is it really needed in this particular case? OpenBSC would continue working with SQLite < 3.7.0. It would just ignore `journal_mode = WAL` and keep using `DELETE` mode. The only thing that won't work is downgrading from newer SQLite to older while keeping the same database file. Older SQLite would refuse to open the database that was once opened in WAL mode. The file could be downgraded by changing journal_mode on it. But only newer SQLite could do it. Do you think something must be done about it?
Thanks for explaining in more detail! I do think something more needs to be done but I'm no longer so sure what I'd prefer..
I think it would be good to have a build-time (configure) option to explicitly enable WAL journalling and thus also require the newer SQLite version.
I don't think we need configure check - see my explanation below.
I also think that a version of OpenBSC built without WAL enabled has to be able to recognize databases created by OpenBSC built *with* WAL enabled, and tell the user what needs to be done to make the database work.
Finally, maybe the database version number should also be bumped, to allow OpenBSC using >=3.7 to know whether a database needs WAL or not.
I'm not sure about the DB version bump, as it doesn't touch DB structure.
OTOH I think we should have a code which enables WAL in old DBs, and then it's reasonable to do in the section where we upgrade DBs.
OTOH, an OpenBSC with SQLite < 3.7 should be able to jump onto a potential next DB revision without jumping over this "pseudo-revison". Otherwise it would be meaningless - DB version won't tell us whether we have WAL or not.
After all, I think the proper way is as follows:
1. OpenBSC should try to set WAL pragma on every start, unless a --no-wal command line option is specified:
2. OpenBSC should check SQLite version on startup and if it's <3.7 - print a warning to log that WAL can't be enabled and we recommend upgrading.
On Fri, Feb 21, 2014 at 02:00:01AM +0400, Alexey Shamrin wrote:
Hi,
SQLite uses DELETE journaling mode by default. It became a problem on a real-world, actively used network (several hundred subscribers, several thousands SMS per day). Some location updates and SMS delivery marks were not getting written because database file was locked:
2013-12-13_07:26:10.10665 <0002> gsm_subscriber.c:362 Subscriber 334020422525672 ATTACHED LAC=3 2013-12-13_07:26:10.10668 <000d> db.c:170 DBI: 5: database is locked 2013-12-13_07:26:10.10865 <000d> db.c:969 Failed to update Subscriber (by IMSI).
the answer right now is really not to access the database/tables from outside. So the "fix" is really just a workaround and WAL does not come for free either. In case you want to improve the database code please have a look here[1]. In the long run our interface should be async, we should use the DB APIs directly, use prepared statements, etc.
holger
[1] http://openbsc.osmocom.org/trac/wiki/Tasks/NITBAsyncDatabase
Hi Holger,
On Fri, Feb 21, 2014 at 4:53 PM, Holger Hans Peter Freyther holger@freyther.de wrote:
the answer right now is really not to access the database/tables from outside.
Thank you! I was under impression that OpenBSC was locking *itself* from database file. But this is obviously wrong, OpenBSC is single-threaded. I now know there was another script periodically querying SQLite file.
That said, it's not practical to never touch DB from outside and use only OpenBSC VTY interface. The latter doesn't allow custom SQL queries. Fortunately, the workaround is simple: issue `PRAGMA journal_mode = WAL` from `sqlite3` command once, when deploying OpenBSC.
So the "fix" is really just a workaround and WAL does not come for free either.
As far as I can tell, most disadvantages of WAL [1] don't concern OpenBSC. Possibly except the need to have write privileges to DB directory in order to read from WAL database.
In case you want to improve the database code please have a look here[1]. In the long run our interface should be async, we should use the DB APIs directly, use prepared statements, etc. [1] http://openbsc.osmocom.org/trac/wiki/Tasks/NITBAsyncDatabase
Yes, but switching the async interface alone won't fix the 'database is locked' problem. Only if we also switch to MySQL/Postgres. What do you mean by "use the DB APIs directly"?
[1]: https://www.sqlite.org/draft/wal.htm
Alexey
On Sat, Feb 22, 2014 at 12:19:09PM +0400, Alexey Shamrin wrote:
Hi,
That said, it's not practical to never touch DB from outside and use only OpenBSC VTY interface. The latter doesn't allow custom SQL queries. Fortunately, the workaround is simple: issue `PRAGMA journal_mode = WAL` from `sqlite3` command once, when deploying OpenBSC.
To ask the other way around. Which queries do yo run? Do you UPDATE subscriber structures too?
Yes, but switching the async interface alone won't fix the 'database is locked' problem. Only if we also switch to MySQL/Postgres. What do you mean by "use the DB APIs directly"?
My understanding is that real and bigger deployments should use a database server like Postgres. Currently we are using the libdbi and it doesn't really work well with SQLite[1]. For every column we SELECT libdbi will issue a PRAGMA to determine the type every time we query.
What I say is we need fixes and not work-arounds. The first step is making the code aware that a database lookup can take time.
holger
[1] http://comments.gmane.org/gmane.comp.db.libdbi.drivers/498
On Sat, Feb 22, 2014 at 12:48 PM, Holger Hans Peter Freyther holger@freyther.de wrote:
On Sat, Feb 22, 2014 at 12:19:09PM +0400, Alexey Shamrin wrote:
That said, it's not practical to never touch DB from outside and use only OpenBSC VTY interface. The latter doesn't allow custom SQL queries. Fortunately, the workaround is simple: issue `PRAGMA journal_mode = WAL` from `sqlite3` command once, when deploying OpenBSC.
To ask the other way around. Which queries do yo run? Do you UPDATE subscriber structures too?
SELECT only so far. We do not use UPDATE and INSERT or do this very rarely.
What I say is we need fixes and not work-arounds. The first step is making the code aware that a database lookup can take time.
I tend to agree that making WAL into the OpenBSC code doesn't worth the effort, which would be better spent on actually fixing the issue. It's easier to enable WAL from a script when we need to do so.