Dear List,
I am currently building a GSM network with OpenBSC and three nanoBTS units at the University of Freiburg for teaching and research. The network has an ISDN BRI interface, which allows every GSM phone to be called via a landline telephone number.
We had serious problems with LCR stalling in our setup, which led to an increasing delay during calls (up to 6-10 seconds). We traced the problem to lookups in the database hlr.sqlite3: after some time the database had grown to about 18 MB, and operations on it took a noticeable amount of time.
Our quick fix was to copy the database to a RAM disk, which can be created like this: sudo mount -t tmpfs -o size=300M tmpfs ramdisk
Since then, we have not seen any stalling.
However, this can only be considered a quick hack, because in case of a system failure all changes to the database are lost.
Is anyone currently working on an interface to a better database system? Maybe I can find a student who can do this.
Regards Konrad Meier
On 08/10/2010 09:56 PM, Konrad Meier wrote:
Hi Konrad,
> However, this can only be considered a quick hack, because in case of a system failure all changes to the database are lost.
> Is anyone currently working on an interface to a better database system? Maybe I can find a student who can do this.
Do you have an idea why it is stalling? Is the executed query that complex, or is someone writing to the database while we try to find a subscriber? At 26C3 (IIRC) we had issues with the database blocking, because a separate process locked the database for its queries.
It would be very nice if you could find students, as we have plenty of DB-related tasks where we could use a hand. Some of these could include:
-) Figure out why we are stalling (rw locks inside SQLite?)
-) Create an index for the SQLite databases (and prove that it makes things faster with a standalone benchmark that has the same access pattern as bsc_hack in your network)
-) Make the DB interface asynchronous (send and forget, send and async reply)
-) Maybe even go so far as to use TCAP/MAP and implement a proper VLR module
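As a starting point for the index item, here is a minimal standalone benchmark sketch using Python's built-in sqlite3 module (not OpenBSC's C code). The Subscriber table layout, the index name idx_subscriber_imsi and the synthetic IMSIs are assumptions for illustration; the real HLR schema may differ. It times repeated lookups by IMSI and returns SQLite's EXPLAIN QUERY PLAN text, so one can check whether the index is actually used:

```python
import sqlite3
import time


def lookup_benchmark(num_subscribers=50000, lookups=500, with_index=True):
    """Time IMSI lookups on a synthetic subscriber table (sketch).

    Returns (elapsed_seconds, query_plan_text).
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical, simplified subscriber table for illustration only.
        conn.execute(
            "CREATE TABLE Subscriber ("
            "id INTEGER PRIMARY KEY, imsi TEXT NOT NULL, extension TEXT)"
        )
        # Synthetic 15-digit IMSIs using the 001/01 test MCC/MNC.
        conn.executemany(
            "INSERT INTO Subscriber (imsi, extension) VALUES (?, ?)",
            ((f"00101{i:010d}", str(i)) for i in range(num_subscribers)),
        )
        if with_index:
            conn.execute("CREATE INDEX idx_subscriber_imsi ON Subscriber (imsi)")
        conn.commit()

        query = "SELECT id FROM Subscriber WHERE imsi = ?"
        # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
        plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("0",)).fetchall()
        plan_text = " | ".join(str(row[-1]) for row in plan)

        start = time.perf_counter()
        for i in range(lookups):
            # Spread the lookups over the table instead of hitting one row.
            imsi = f"00101{(i * 7919) % num_subscribers:010d}"
            conn.execute(query, (imsi,)).fetchone()
        elapsed = time.perf_counter() - start
    finally:
        conn.close()
    return elapsed, plan_text


if __name__ == "__main__":
    for indexed in (False, True):
        secs, plan = lookup_benchmark(with_index=indexed)
        print(f"index={indexed}: {secs:.3f} s, plan: {plan}")
```

Comparing the two runs gives a rough idea of the gain; a convincing benchmark would replay the query mix actually observed from bsc_hack rather than uniform lookups.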
On 10.08.2010 16:44, Holger Hans Peter Freyther wrote:
> Do you have an idea why it is stalling? Is the executed query that complex, or is someone writing to the database while we try to find a subscriber?
Yes, I think it is triggered by the size of the database. My hlr.sqlite is about 20 MB and grows by about 2 MB every day. Looking at the tables, I can see that the only big one is "Counters". Is there any reason why the "Counters" table is growing this fast?
The size of the database can be reduced by clearing the Counters table and running the VACUUM command in sqlite3. After cleaning, my database was only 180 kB.
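The cleanup described above can be reproduced on a throwaway file with Python's built-in sqlite3 module. This is a sketch only: the Counters layout below is a simplified assumption rather than the OpenBSC schema, and nothing like it should be run against a live hlr.sqlite3 without a backup. It fills the table, deletes the rows, and compares the file size before and after VACUUM:

```python
import os
import sqlite3
import tempfile


def counters_cleanup_demo(num_rows=20000):
    """Fill a throwaway Counters-like table, then clear it and VACUUM.

    Returns (size_before_cleanup, size_after_vacuum) in bytes.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "hlr_demo.sqlite3")
        # isolation_level=None: autocommit mode, transactions are explicit.
        conn = sqlite3.connect(path, isolation_level=None)
        try:
            # Hypothetical, simplified stand-in for the Counters table.
            conn.execute(
                "CREATE TABLE Counters (id INTEGER PRIMARY KEY, "
                "timestamp TEXT NOT NULL, name TEXT NOT NULL, value INTEGER NOT NULL)"
            )
            conn.execute("BEGIN")
            conn.executemany(
                "INSERT INTO Counters (timestamp, name, value) "
                "VALUES (datetime('now'), ?, ?)",
                ((f"demo.counter.{i % 50}", i) for i in range(num_rows)),
            )
            conn.execute("COMMIT")
            before = os.path.getsize(path)

            # With the default auto_vacuum=NONE, DELETE only moves the freed
            # pages to SQLite's freelist; the file keeps its size.
            conn.execute("DELETE FROM Counters")
            # VACUUM rebuilds the database file without the free pages.
            conn.execute("VACUUM")
            after = os.path.getsize(path)
        finally:
            conn.close()
    return before, after


before, after = counters_cleanup_demo()
print(f"before: {before} bytes, after VACUUM: {after} bytes")
```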
I think a query should never block the program flow. In my setup I can see periodic read and write accesses to the database that block the whole system, and therefore LCR stalling messages are generated.
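One way to keep queries out of the main program flow, in the spirit of the "send and forget, send and async reply" idea, is a single worker thread that owns the SQLite connection and drains a queue of statements. The sketch below is in Python purely for illustration; the class AsyncDB and its method names are hypothetical, and OpenBSC itself is C code built around a select() loop, so a real implementation would look different:

```python
import queue
import sqlite3
import threading


class AsyncDB:
    """Run SQLite statements on one worker thread (hypothetical sketch).

    Callers only enqueue work, so they never wait for disk I/O themselves.
    """

    def __init__(self, path):
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, args=(path,), daemon=True)
        self._worker.start()

    def _run(self, path):
        # The connection is created and used only on this thread.
        conn = sqlite3.connect(path, isolation_level=None)
        try:
            while True:
                item = self._queue.get()
                if item is None:  # shutdown sentinel from close()
                    break
                sql, params, callback = item
                try:
                    rows, error = conn.execute(sql, params).fetchall(), None
                except sqlite3.Error as exc:
                    rows, error = None, exc
                if callback is not None:
                    callback(rows, error)
        finally:
            conn.close()

    def send_and_forget(self, sql, params=()):
        """Queue a statement; any result or error is discarded."""
        self._queue.put((sql, params, None))

    def send_with_reply(self, sql, params, callback):
        """Queue a statement; callback(rows, error) runs on the worker thread."""
        self._queue.put((sql, params, callback))

    def close(self):
        """Process everything already queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

The reply callback runs on the worker thread here; in a single-threaded event loop the result would instead have to be handed back to the main loop, for example through a pipe that select() watches.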
> At 26C3 (IIRC) we had issues with the database blocking, because a separate process locked the database for its queries.
> It would be very nice if you could find students, as we have plenty of DB-related tasks where we could use a hand. Some of these could include:
> -) Figure out why we are stalling (rw locks inside SQLite?)
> -) Create an index for the SQLite databases (and prove that it makes things faster with a standalone benchmark that has the same access pattern as bsc_hack in your network)
> -) Make the DB interface asynchronous (send and forget, send and async reply)
> -) Maybe even go so far as to use TCAP/MAP and implement a proper VLR module
I will see what I can do.
Regards Konrad
On 08/12/2010 12:33 AM, Konrad Meier wrote:
> Yes, I think it is triggered by the size of the database. My hlr.sqlite is about 20 MB and grows by about 2 MB every day. Looking at the tables, I can see that the only big one is "Counters". Is there any reason why the "Counters" table is growing this fast?
In src/bsc_hack.c we schedule a sync of the "Counters" every 60 seconds; as a hacky workaround, you might want to increase that define.
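A back-of-the-envelope check of why a longer interval helps: assuming every counter is written as one row per sync, the number of new rows per day is the number of counters times the seconds per day, divided by the sync interval. The count of 40 counters below is purely hypothetical:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def counter_rows_per_day(num_counters, interval_s):
    """New Counters rows per day, assuming one row per counter per sync."""
    return num_counters * SECONDS_PER_DAY // interval_s


# Hypothetical 40 counters: syncing every 60 s vs. every 600 s.
print(counter_rows_per_day(40, 60))   # 57600
print(counter_rows_per_day(40, 600))  # 5760
```

Under that assumption, raising the interval tenfold cuts the daily growth of the table tenfold.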
> -) Figure out why we are stalling (rw locks inside SQLite?)
> -) Create an index for the SQLite databases (and prove that it makes things faster with a standalone benchmark that has the same access pattern as bsc_hack in your network)
> -) Make the DB interface asynchronous (send and forget, send and async reply)
> -) Maybe even go so far as to use TCAP/MAP and implement a proper VLR module
> I will see what I can do.
It would be very nice to have a histogram of which functions block: is it everything, just one query, or really the write?
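Such a histogram can be collected by wrapping each database function with a timer and counting calls per duration bucket. A minimal Python sketch; the bucket boundaries and the decorator name timed are hypothetical choices for illustration, not anything that exists in db.c:

```python
import functools
import time
from collections import defaultdict

# Upper bucket bounds in milliseconds (hypothetical choice) and their labels.
BUCKETS_MS = (1.0, 10.0, 100.0, 1000.0, float("inf"))
LABELS = ("<1ms", "<10ms", "<100ms", "<1s", ">=1s")

# Maps function name -> list of call counts, one per bucket.
histogram = defaultdict(lambda: [0] * len(BUCKETS_MS))


def timed(func):
    """Record how long each call of func takes in the histogram."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            counts = histogram[func.__name__]
            for i, bound in enumerate(BUCKETS_MS):
                if elapsed_ms < bound:
                    counts[i] += 1
                    break
    return wrapper


def print_histogram():
    """Print one line per function with its call count in each bucket."""
    for name, counts in sorted(histogram.items()):
        row = "  ".join(f"{label}:{n}" for label, n in zip(LABELS, counts))
        print(f"{name:24} {row}")
```

Decorating the lookup and the counter-insert functions separately would show directly whether only the writes end up in the slow buckets.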
On 11.08.2010 19:49, Holger Hans Peter Freyther wrote:
> In src/bsc_hack.c we schedule a sync of the "Counters" every 60 seconds; as a hacky workaround, you might want to increase that define.
OK, now I understand what the "Counters" table is for.
> It would be very nice to have a histogram of which functions block: is it everything, just one query, or really the write?
Today I did some timing measurements in db.c (debug log attached).
I think the problem is that the INSERT queries for the counter values are blocking the program.
Regards Konrad
On 12.08.2010 17:34, Konrad Meier wrote:
> Today I did some timing measurements in db.c (debug log attached).
> I think the problem is that the INSERT queries for the counter values are blocking the program.
This is a "feature" of SQLite [1]: after each INSERT that is not part of a transaction, the database is written to disk using fsync(), which leads to a huge performance penalty. IMHO, disabling this behaviour with PRAGMA synchronous = OFF at startup is a fairly easy way to solve this problem.
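With Python's sqlite3 module (used here only to illustrate; OpenBSC opens its database from C), setting the pragma once right after opening the connection could look like this. The function name open_hlr is hypothetical:

```python
import sqlite3


def open_hlr(path):
    """Open the database with syncing to disk disabled (sketch).

    PRAGMA synchronous = OFF (equivalently 0) lets SQLite continue as soon
    as it has handed data to the OS, without waiting for it to reach the
    disk. This is much faster, but an OS crash or power failure can
    corrupt the database.
    """
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA synchronous = OFF")
    return conn
```

The setting is per connection, so it has to be issued every time the database is opened.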
Greetings Felix
[1] http://www.sqlite.org/faq.html#q19
On 12.08.2010 19:56, Felix Rublack wrote:
> On 12.08.2010 17:34, Konrad Meier wrote:
>> Today I did some timing measurements in db.c (debug log attached).
>> I think the problem is that the INSERT queries for the counter values are blocking the program.
> This is a "feature" of SQLite [1]: after each INSERT that is not part of a transaction, the database is written to disk using fsync(), which leads to a huge performance penalty. IMHO, disabling this behaviour with PRAGMA synchronous = OFF at startup is a fairly easy way to solve this problem.
Thanks.
Setting "PRAGMA synchronous = 0" solved the problem. However, in case of a power failure the database may become corrupted. At the moment this is an acceptable risk for me, since I am only running a test network at the university with no guaranteed service.
A different solution would be to use BEGIN and COMMIT to wrap operations such as inserting the counter values in a transaction.
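The transaction variant keeps the default synchronous setting and pays the commit cost (including the fsync() calls) once per batch instead of once per INSERT. A sketch with Python's sqlite3 module, assuming a connection opened with isolation_level=None (so that BEGIN/COMMIT are issued explicitly) and a simplified, hypothetical Counters layout; store_counters is a made-up name:

```python
import sqlite3


def store_counters(conn, samples):
    """Insert all (name, value) samples in a single transaction (sketch).

    conn must be opened with isolation_level=None so that the explicit
    BEGIN/COMMIT below are the only transaction control in play.
    """
    conn.execute("BEGIN")
    try:
        conn.executemany(
            "INSERT INTO Counters (timestamp, name, value) "
            "VALUES (datetime('now'), ?, ?)",
            samples,
        )
    except BaseException:
        # Undo the partial batch so no half-written sync remains.
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # Hypothetical, simplified Counters layout.
    conn.execute(
        "CREATE TABLE Counters (id INTEGER PRIMARY KEY, "
        "timestamp TEXT NOT NULL, name TEXT NOT NULL, value INTEGER NOT NULL)"
    )
    store_counters(conn, [("demo.a", 1), ("demo.b", 2)])
    print(conn.execute("SELECT COUNT(*) FROM Counters").fetchone()[0])  # 2
```

A side benefit is atomicity: either all counter values of one sync are stored or none are.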
Regards Konrad