Hi all,
We're moving this discussion to the mailing list, as it seems it is more generic and complex than we've thought initially.
The issue arose when I started load testing the OsmoTRX transceiver and disabled all gating in it. As a result, all incoming noise was processed as valid Normal Bursts and Access Bursts and sent up to OsmoBTS. This leads to a situation similar to a RACH flood, in which more RACH requests arrive than a BTS can reasonably process. That in turn leads to unbounded growth of the AGCH queue in the BTS: it consumes a few Mb per minute.
I think that this is the root cause of the issue we saw at a festival installation in the Netherlands, when 20K phones suddenly started connecting to our station after the official networks went down. When the number of RACH requests exceeded the available CCCH capacity (which took <5 seconds), mobile phones stopped answering our IMM.ASS messages. The hypothesis is that the AGCH queue became so long that the messages were sent too late for the phones to receive them, and thus no phones answered our IMM.ASS messages. Unfortunately, I wasn't able to collect enough data to check this hypothesis at the time, and we don't have another big festival on hand atm.
Attached is a quick fix for the unbounded queue growth. It uses a hardcoded value for the maximum queue length, which is fine for our load testing, but not flexible enough for real-life usage. We should make the AGCH queue long enough to keep performance high. At the same time, it must not exceed the MS timeout, or _all_ IMM.ASS messages will miss their target MSs.
We could make this parameter user-configurable on a BTS side, but it seems more reasonable to automatically calculate it, depending on the channel combination and timeout values. But this should be done on the BSC side. So the questions are: 1) what is a good way to calculate it? 2) should we configure this queue length over OML, or move the queue from BTS to BSC?
Alexander Chemeris wrote:
- what is a good way to calculate it?
- should we configure this queue length over OML, or move the queue
from BTS to BSC?
i suggest to use two queues:
* IMM.ASS messages which grant access
* IMM.ASS messages which reject access
the first queue should have higher priority, because giving access to mobile stations will sooner or later resolve the rach flood.
another suggestion is to use a stack, rather than a fifo. this will ensure that latest assigned/rejected channels are handled first. we don't need to consider how long the mobile will wait for the assign/reject message, so we don't need to define a specific stack size. the latest messages will always be sent out early enough. the stack should have a limit, so oldest messages are dropped, if new messages are added. i still have no good solution for the maximum stack size.
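The bounded-stack idea can be illustrated with a small ring buffer. This is only a sketch under assumptions: `ia_stack`, `IA_STACK_MAX` and the `uint32_t` message handles are hypothetical stand-ins, not OsmoBTS names, and the limit of 4 is arbitrary, since the right maximum size is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IA_STACK_MAX 4 /* hypothetical limit, chosen only for illustration */

struct ia_stack {
	uint32_t items[IA_STACK_MAX]; /* stand-in for queued message handles */
	size_t top;                   /* slot where the next push is stored */
	size_t count;                 /* number of valid entries */
};

void ia_stack_init(struct ia_stack *s)
{
	assert(s != NULL);
	s->top = 0;
	s->count = 0;
}

/* Push the newest message. When the stack is full, the oldest entry is
 * overwritten. Returns 1 if an old entry was dropped, 0 otherwise. */
int ia_stack_push(struct ia_stack *s, uint32_t msg)
{
	int dropped = (s->count == IA_STACK_MAX);

	s->items[s->top] = msg;
	s->top = (s->top + 1) % IA_STACK_MAX;
	if (!dropped)
		s->count++;
	return dropped;
}

/* Pop the newest message. Returns 1 on success, 0 if the stack is empty. */
int ia_stack_pop(struct ia_stack *s, uint32_t *msg)
{
	assert(s != NULL && msg != NULL);
	if (s->count == 0)
		return 0;
	s->top = (s->top + IA_STACK_MAX - 1) % IA_STACK_MAX;
	*msg = s->items[s->top];
	s->count--;
	return 1;
}
```

With this layout the newest message is always sent first, and under sustained overload the oldest entries fall off the bottom without any explicit aging logic.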
On Tue, Sep 10, 2013 at 11:41:33AM +0400, Alexander Chemeris wrote:
Hi all,
Attached is a quick fix for the unbounded queue growth. It uses a hardcoded value for the maximum queue length, which is fine for our load testing, but not flexible enough for real-life usage. We should make the AGCH queue long enough to keep performance high. At the same time, it must not exceed the MS timeout, or _all_ IMM.ASS messages will miss their target MSs.
We could make this parameter user-configurable on a BTS side, but it seems more reasonable to automatically calculate it, depending on the channel combination and timeout values. But this should be done on the BSC side. So the questions are:
- what is a good way to calculate it?
- should we configure this queue length over OML, or move the queue
from BTS to BSC?
I had a quick look at 12.21 and there doesn't appear to be anything for the AGCH. So we will need to calculate the size from the channel configuration, but this should be fairly easy. The other topic is the question of fairness: the first requests will fill the queue, and then we drop most of the requests.
Can you implement the size calculation?
On Tue, Sep 10, 2013 at 1:21 PM, Holger Hans Peter Freyther holger@freyther.de wrote:
On Tue, Sep 10, 2013 at 11:41:33AM +0400, Alexander Chemeris wrote:
Hi all,
Attached is a quick fix for the unbounded queue growth. It uses a hardcoded value for the maximum queue length, which is fine for our load testing, but not flexible enough for real-life usage. We should make the AGCH queue long enough to keep performance high. At the same time, it must not exceed the MS timeout, or _all_ IMM.ASS messages will miss their target MSs.
We could make this parameter user-configurable on a BTS side, but it seems more reasonable to automatically calculate it, depending on the channel combination and timeout values. But this should be done on the BSC side. So the questions are:
- what is a good way to calculate it?
- should we configure this queue length over OML, or move the queue
from BTS to BSC?
I had a quick look at 12.21 and there doesn't appear to be anything for the AGCH. So we will need to calculate the size from the channel configuration, but this should be fairly easy. The other topic is the question of fairness: the first requests will fill the queue, and then we drop most of the requests.
In a flood situation fairness doesn't make much sense, since most of the requests will be dropped and we will process random requests anyway. So the best option for the flood case is actually to have a very short (zero?) queue to reduce queueing latency. From this point of view, Andreas' idea to use a stack instead of a queue makes perfect sense.
Can you implement the size calculation?
I could, but I think you or Andreas will do this cleaner and with much less effort. I would rather spend my time on looking for other issues and implementing new features I have more experience with.
PS While you're here - I remind you that I'm still waiting for the review of the SMS DB schema update code.
On Tue, Sep 10, 2013 at 01:55:47PM +0400, Alexander Chemeris wrote:
Can you implement the size calculation?
I could, but I think you or Andreas will do this cleaner and with much less effort. I would rather spend my time on looking for other issues and implementing new features I have more experience with.
I will not have time to work on this anytime soon.
On Tue, Sep 10, 2013 at 2:05 PM, Holger Hans Peter Freyther holger@freyther.de wrote:
On Tue, Sep 10, 2013 at 01:55:47PM +0400, Alexander Chemeris wrote:
Can you implement the size calculation?
I could, but I think you or Andreas will do this cleaner and with much less effort. I would rather spend my time on looking for other issues and implementing new features I have more experience with.
I will not have time to work on this anytime soon.
If Andreas doesn't implement the calculation, we'll go with a manually configured limit. Should be fine for most use cases.
Hi all,
The attached patches for openbsc and osmo-bts finally fix issues which were discussed above.
There are two parts:
1. Implementation of sending the RSL Delete_Ind message (on the BTS side) and handling it (on the BSC side). The BTS should send an RSL Delete Ind message to the BSC if there is no space in the AGCH queue and the BTS cannot send the current imm assign message. When the BSC receives a Delete_Ind message from the BTS, it should release the allocated channel that was specified in the dropped imm_assign message.
2. Implemented calculation of the AGCH queue length. The BTS should calculate the allowed length of the AGCH queue, because it has to send the imm assign message before the immediate assignment procedure is aborted by the MS. An imm assign message can be queued no longer than T3126, so the AGCH queue length should be equal to (T3126 / 51) * bs_ag_blks_res.
These patches fix critical issues and prevent network failures under heavy load, so it makes sense to merge them asap.
2013-09-10 16:50 GMT+04:00 Holger Hans Peter Freyther holger@freyther.de:
On Tue, Sep 10, 2013 at 02:24:38PM +0400, Alexander Chemeris wrote:
If Andreas doesn't implement the calculation, we'll go with a manually configured limit. Should be fine for most use cases.
come on, please don't be so lazy.
On Tue, Feb 11, 2014 at 11:13:35AM +0400, Ivan Kluchnikov wrote:
Good Morning,
The attached patches for openbsc and osmo-bts finally fix issues which were discussed above.
what a surprise. Jacob has started implementing AGCH queue handling as well. So I will let him comment on the BTS part.
OpenBSC:
- /* bts didn't send IMM_ASSIGN, so we should release allocated channel */
- ia = (struct gsm48_imm_ass *) (rqd_hdr->data + 2);
Please add a size check that the mandatory element actually fits and use early returns.
- if (ia->msg_type == GSM48_MT_RR_IMM_ASS) {
      chan_nr = ia->chan_desc.chan_nr;
      lchan = lchan_lookup(trx, chan_nr);
same thing for the lchan. Verify it was found and that the state is actually the right one.
rsl_rf_chan_release(lchan, 1, SACCH_DEACTIVATE);
Maybe use rsl_direct_rf_release. By definition there is no one listening on the SACCH, so there is no point in going through the normal release procedure.
+/* 8.5.4 DELETE INDICATION */
+int rsl_tx_delete_ind(struct gsm_bts *bts, uint8_t len, uint8_t *val)
+{
+ struct msgb *msg;
+ msg = rsl_msgb_alloc(sizeof(struct abis_rsl_cchan_hdr));
+ if (!msg)
+     return -ENOMEM;
+ rsl_cch_push_hdr(msg, RSL_MT_DELETE_IND, RSL_CHAN_PCH_AGCH);
+ msgb_tlv_put(msg, RSL_IE_FULL_IMM_ASS_INFO, len, val);
+ msg->trx = bts->c0;
Have you manually tested this with multi-trx support and the channel being on the second trx? The lchan_lookup in OpenBSC will be done using the "bts->c0"?
msgb_free(msg);
rsl_tx_delete_ind(trx->bts, msg->len, msg->data);
return -ENOMEM;
Does it leak now? or was it a double free before?
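The request for a size check with early returns could be met along these lines. This is a generic sketch, not the OpenBSC code: `tlv_value_checked` is a hypothetical helper, and the layout (one tag byte, one length byte, then the value) is an assumption inferred from the `rqd_hdr->data + 2` offset in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return a pointer to the value part of a tag/length/value element, or NULL
 * if the buffer is too short, the tag does not match, the declared length
 * overruns the buffer, or the value is shorter than the mandatory part. */
const uint8_t *tlv_value_checked(const uint8_t *buf, size_t buf_len,
				 uint8_t expected_tag, size_t min_len)
{
	size_t val_len;

	assert(buf != NULL);
	if (buf_len < 2)
		return NULL; /* no room for tag and length */
	if (buf[0] != expected_tag)
		return NULL; /* not the element we expect */
	val_len = buf[1];
	if (val_len > buf_len - 2)
		return NULL; /* declared length exceeds what was received */
	if (val_len < min_len)
		return NULL; /* mandatory part does not fit */
	return buf + 2;
}
```

A caller would pass the size of the mandatory part of the structure as `min_len` and only cast the returned pointer once it is known to be non-NULL. The same early-return pattern applies to the lchan lookup and its state check.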
Hi Holger,
Thank you for review, I will fix my patches today or tomorrow and send you new version asap.
2014-02-11 14:14 GMT+04:00 Holger Hans Peter Freyther holger@freyther.de:
On Tue, Feb 11, 2014 at 11:13:35AM +0400, Ivan Kluchnikov wrote:
Good Morning,
The attached patches for openbsc and osmo-bts finally fix issues which were discussed above.
what a surprise. Jacob has started implementing AGCH queue handling as well. So I will let him comment on the BTS part.
OpenBSC:
/* bts didn't send IMM_ASSIGN, so we should release allocated channel */
ia = (struct gsm48_imm_ass *) (rqd_hdr->data + 2);
Please add a size check that the mandatory element actually fits and use early returns.
if (ia->msg_type == GSM48_MT_RR_IMM_ASS) {
    chan_nr = ia->chan_desc.chan_nr;
    lchan = lchan_lookup(trx, chan_nr);
same thing for the lchan. Verify it was found and that the state is actually the right one.
rsl_rf_chan_release(lchan, 1, SACCH_DEACTIVATE);
Maybe use rsl_direct_rf_release. By definition there is no one listening on the SACCH, so there is no point in going through the normal release procedure.
+/* 8.5.4 DELETE INDICATION */
+int rsl_tx_delete_ind(struct gsm_bts *bts, uint8_t len, uint8_t *val)
+{
struct msgb *msg;
msg = rsl_msgb_alloc(sizeof(struct abis_rsl_cchan_hdr));
if (!msg)
    return -ENOMEM;
rsl_cch_push_hdr(msg, RSL_MT_DELETE_IND, RSL_CHAN_PCH_AGCH);
msgb_tlv_put(msg, RSL_IE_FULL_IMM_ASS_INFO, len, val);
msg->trx = bts->c0;
Have you manually tested this with multi-trx support and the channel being on the second trx? The lchan_lookup in OpenBSC will be done using the "bts->c0"?
msgb_free(msg);
rsl_tx_delete_ind(trx->bts, msg->len, msg->data);
return -ENOMEM;
Does it leak now? or was it a double free before?
On Thu, Feb 13, 2014 at 04:53:27PM +0400, Ivan Kluchnikov wrote:
Dear Ivan,
Thank you for review, I will fix my patches today or tomorrow and send you new version asap.
can you please confirm/reject my statements from below. Does the code have the defect for multi-trx that I outlined?
holger
Hi Holger,
Yes, you are right, the code had the multi-trx defect. We should use the arfcn value from the immediate assignment message to determine the trx. I sent a new version of the patch to the openbsc mailing list, please check it. I also added checks for the size and the lchan.
2014-02-13 17:25 GMT+04:00 Holger Hans Peter Freyther holger@freyther.de:
On Thu, Feb 13, 2014 at 04:53:27PM +0400, Ivan Kluchnikov wrote:
Dear Ivan,
Thank you for review, I will fix my patches today or tomorrow and send you new version asap.
can you please confirm/reject my statements from below. Does the code have the defect for multi-trx that I outlined?
holger
Dear Ivan,
since I've been investigating this topic for some days now, I would like to share my thoughts about your approach, so that we can resolve this issue quickly together.
On 11.02.2014 08:13, Ivan Kluchnikov wrote:
- Implementation of sending rsl Delete_Ind message (on bts side) and
handling it (on bsc side). Bts should send rsl Delete Ind message to bsc, if there is no space in agch queue and bts can not send current imm assign message. When bsc receives Delete_Ind message from bts, bsc should release allocated channel, which was specified in dropped imm_assign message.
As far as I understand, it is not important to send such messages for IMMEDIATE ASSIGNMENT REJECT messages. But I like this idea to shorten the timeout.
- Implemented calculation of agch queue length.
Bts should calculate allowed length of agch queue, because bts should send imm assign message before immediate assignment procedure will be aborted by MS. Imm assign message can be queued no longer than T3126, so agch queue length should be equal (T3126 / 51 ) * bs_ag_blks_res.
I'm not sure about the correctness of the max length calculation because of
a) There are other timers/conditions that are influenced by the queue delay:
   - T3101 limits channel reservation on the network side.
   - Only the last 3 CHANNEL REQUEST messages are matched by the MS against incoming IMMEDIATE ASSIGNMENT messages.
b) If the queue has the length limited by a) we know that at least the last (if not every) element of the queue will be delivered too late.
Since your patch isn't using T3126 but min(T3126), which is based on T + 2*S, and since this reflects the 3 CHAN-REQ restriction, this is already solved (T3101 is probably larger than min(T3126)). I'll use maxL = (min(T3126) / 51) * bs_ag_blks_res below.
The b) issue is another thing. Because of additional round-trip delays (DSP queues, BSC<->BTS delay) it might be sensible to reduce the max queue length by another factor. But AFAICS the main problem with dropping packets at the queue's input is, that you maintain a queue with stale messages while throwing away the fresh ones. So under heavy load, latency increases which is especially bad with IMMEDIATE ASSIGNMENT messages.
I've two slightly different proposals to fix this:
(1) If the queue is full (e.g. at 0.8 * maxL), flush all IMMEDIATE ASSIGNMENT REJECT messages. Don't notify the BSC about those. If there are still too many messages left afterwards one might also drop an IMMEDIATE ASSIGNMENT (perhaps from the queues input) and notify the BSC like in your patch.
(2) When taking messages from the queue, throw them away if they are IMMEDIATE ASSIGNMENT REJECTs and the queue is too long (e.g. > maxL/2) and try the next message until that condition fails (or some low water mark is reached). A variant would be to drop IA REJECTs based on a queue usage dependent probability (e.g. p(L) = L / maxL).
I'm not sure, which approach would perform better considering different scenarios like burst-like accesses and high packet loss. I've done hand simulations for a single burst case (10 MS compete simultaneously for 4 channels) and both approaches worked well (much better than the current master). (2) is probably easier to implement.
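Proposal (2) could be sketched roughly as follows. All names here (`agch_sketch_queue`, `ia_msg`, `agch_sketch_dequeue`) are hypothetical and the fixed-size array FIFO is a simplification; the actual BTS code keeps `msgb`s on an `llist`. The threshold `max_len / 2` is the example value from the proposal, not a tuned one.

```c
#include <assert.h>
#include <stddef.h>

enum ia_type { IA_ASSIGN, IA_REJECT };

struct ia_msg {
	enum ia_type type;
	int id; /* stand-in for the real message payload */
};

#define AGCH_SKETCH_CAP 64

struct agch_sketch_queue {
	struct ia_msg buf[AGCH_SKETCH_CAP];
	size_t head; /* index of the oldest message */
	size_t len;  /* current queue length L */
};

/* Append a message at the tail. Returns 0 on success, -1 if storage is full. */
int agch_sketch_enqueue(struct agch_sketch_queue *q, struct ia_msg m)
{
	assert(q != NULL);
	if (q->len == AGCH_SKETCH_CAP)
		return -1;
	q->buf[(q->head + q->len) % AGCH_SKETCH_CAP] = m;
	q->len++;
	return 0;
}

/* Take the next message to transmit. While the queue is longer than
 * max_len / 2, IMMEDIATE ASSIGNMENT REJECTs at the head are dropped silently
 * and the next message is tried. Returns 1 if *out was filled, 0 if empty. */
int agch_sketch_dequeue(struct agch_sketch_queue *q, size_t max_len,
			struct ia_msg *out)
{
	assert(q != NULL && out != NULL);
	while (q->len > 0) {
		struct ia_msg m = q->buf[q->head];
		size_t len_before = q->len;

		q->head = (q->head + 1) % AGCH_SKETCH_CAP;
		q->len--;
		if (m.type == IA_REJECT && len_before > max_len / 2)
			continue; /* dropped, no notification to the BSC */
		*out = m;
		return 1;
	}
	return 0;
}
```

The probabilistic variant would replace the fixed threshold by a random draw against L / maxL; it is left out here to keep the sketch deterministic.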
What do you think about that?
These patches fix critical issues and prevent network failures under heavy load,
I'm not convinced that this approach will work reliably under overload (see above). It just guarantees a maximum queue length.
so it makes sense to merge them asap.
What about rebasing on origin/master? I think that would speed this up a lot. Is there something missing from jolly's branch besides queue length accounting?
@@ -50,6 +50,7 @@ struct gsm_bts_role_bts {
 uint8_t max_ta;
 struct llist_head agch_queue;
 int agch_queue_count;
I'd rather not use this name, but agch_queue_length instead. I think 'length' is more precise than 'size' or 'count' when asking for the number of elements of a list or a queue. 'count' makes me think about the number of queues. Quite a few libraries have chosen 'size' instead, but it's ambiguous whether this refers to memory size or to the number of elements.
- uint8_t agch_queue_len;
What about agch_max_queue_length or agch_queue_length_limit?
int bts_agch_enqueue(struct gsm_bts *bts, struct msgb *msg)
{
 struct gsm_bts_role_bts *btsb = bts_role_bts(bts);
- struct gsm48_system_information_type_3 *si3;
- uint8_t T, S, agch_num, i;
- uint8_t T_group = 0;
- uint8_t ccch_comb = 0;
- /* calculate length of agch queue
- agch_queue_len = ( min( T3126 ) / 51 ) * bs_ag_blks_res
- min(T3126) = T + 2*S defined in 04.08 11.1.1
Are you sure? Isn't it min(T3126) = (T + 2*S) / (RACH slots per second) ?
- S and T are defined in 04.08 3.3.1.1.2 */
- si3 = GSM_BTS_SI(bts, SYSINFO_TYPE_3);
- T = si3->rach_control.tx_integer;
- for (i = 0; i < 15; i++) {
      if (tx_integer[i] == T) {
          T_group = i % 5;
          break;
      }
- }
- if (si3->control_channel_desc.ccch_conf == 1) {
      ccch_comb = 1;
- }
- S = s_values[T_group][ccch_comb];
- agch_num = si3->control_channel_desc.bs_ag_blks_res;
- if (btsb->agch_queue_count >= 30)
- btsb->agch_queue_len = ((T + 2 * S) / 51) * agch_num;
This might lead to smaller values than necessary due to rounding errors. Why not use agch_queue_len = (??? * agch_num) / 51? But I think the max length computation does not yield correct results anyway, because the number of RACH slots is not taken into account.
I'd rather put the above into an own function (bts_agch_update_max_queue_length?) that is only called after config/bs_ag_blks_res is changed/set.
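The rounding concern can be shown with a small helper of the kind suggested here. This is a sketch under assumptions: the function names are made up, and the first argument is assumed to be min(T3126) already converted to TDMA frames, which is precisely the conversion still under discussion.

```c
#include <assert.h>

/* Maximum AGCH queue length, multiplying before dividing so that integer
 * truncation happens only once.
 * t3126_min_frames: assumed to be min(T3126) expressed in TDMA frames.
 * bs_ag_blks_res:   AGCH blocks reserved per 51-multiframe (3-bit field). */
unsigned agch_max_queue_len(unsigned t3126_min_frames, unsigned bs_ag_blks_res)
{
	assert(bs_ag_blks_res <= 7);
	return (t3126_min_frames * bs_ag_blks_res) / 51;
}

/* The divide-first form from the patch, kept only for comparison. */
unsigned agch_max_queue_len_divide_first(unsigned t3126_min_frames,
					 unsigned bs_ag_blks_res)
{
	assert(bs_ag_blks_res <= 7);
	return (t3126_min_frames / 51) * bs_ag_blks_res;
}
```

For example, with 100 frames and 2 reserved blocks the multiply-first form gives 3 while the divide-first form gives 2. Calling such a helper only when the configuration changes keeps the enqueue path free of the SI3 lookup.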
Best wishes
Jacob
Hi,
On 11.02.2014 15:07, Jacob Erlbeck wrote:
- Implemented calculation of agch queue length.
Bts should calculate allowed length of agch queue, because bts should send imm assign message before immediate assignment procedure will be aborted by MS. Imm assign message can be queued no longer than T3126, so agch queue length should be equal (T3126 / 51 ) * bs_ag_blks_res.
My understanding of GSM 05.02 3.3.2.3, 6.5.1 v), and clause 7 table 5 was that bs_ag_blks_res defines the number of blocks _reserved_ for AGCH and SI messages per 51-multiframe, but does not _limit_ it. Is this correct? If yes, is there a reason why the implementations (master and jolly/trx) do not use PCH blocks when needed?
I'm not sure about the correctness of the max length calculation because of a) There are other timers/conditions that are influenced by the queue delay:
- T3101 limits channel reservation on the network side.
- Only the last 3 CHANNEL REQUEST messages are matched by the MS against incoming IMMEDIATE ASSIGNMENT messages.
More precisely, this limits the allowable delay to 3*(S+T/2) while CHANNEL REQUESTs are being sent. After the last one, T3126 is relevant. At least, this is my understanding of GSM 04.08, 3.3.1.1.2. So I'd rather drop that requirement, since 3S+1.5T > 2S+T.
Jacob
Hi Jacob,
Thank you for your comments and ideas!
First I want to summarize your previous ideas, which I think we should implement:
1. We shouldn't send a Delete Ind message for IMMEDIATE ASSIGNMENT REJECT messages.
2. Just drop IA REJECTs if L > maxL/2 (am I right?).
3. Use the names agch_queue_length and agch_max_queue_length.
4. Implement a function bts_agch_update_max_queue_length that is only called after config/bs_ag_blks_res is changed/set (do you know the places where bs_ag_blks_res is changed/set?).
5. Yes, you are right, we can use PCH blocks when needed. It is a good question why master and jolly/trx do not use PCH blocks when needed. Actually, we already have a function for that: int paging_add_imm_ass(struct paging_state *ps, const uint8_t *data, uint8_t len); Moreover, we should also determine when to start using PCH for imm assign messages and what to do with agch_max_queue_length in this case. What do you think about it?
6. What do you finally think about calculating agch_max_queue_length? What is the right way to calculate it from your point of view?
2014-02-13 15:07 GMT+04:00 Jacob Erlbeck jerlbeck@sysmocom.de:
Hi,
On 11.02.2014 15:07, Jacob Erlbeck wrote:
- Implemented calculation of agch queue length.
Bts should calculate allowed length of agch queue, because bts should send imm assign message before immediate assignment procedure will be aborted by MS. Imm assign message can be queued no longer than T3126, so agch queue length should be equal (T3126 / 51 ) * bs_ag_blks_res.
My understanding of GSM 05.02 3.3.2.3, 6.5.1 v), and clause 7 table 5 was that bs_ag_blks_res defines the number of blocks _reserved_ for AGCH and SI messages per 51-multiframe, but does not _limit_ it. Is this correct? If yes, is there a reason why the implementations (master and jolly/trx) do not use PCH blocks when needed?
I'm not sure about the correctness of the max length calculation because of a) There are other timers/conditions that are influenced by the queue delay:
- T3101 limits channel reservation on the network side.
- Only the last 3 CHANNEL REQUEST messages are matched by the MS against incoming IMMEDIATE ASSIGNMENT messages.
More precisely, this limits the allowable delay to 3*(S+T/2) while CHANNEL REQUESTs are being sent. After the last one, T3126 is relevant. At least, this is my understanding of GSM 04.08, 3.3.1.1.2. So I'd rather drop that requirement, since 3S+1.5T > 2S+T.
Jacob
On Thu, Feb 13, 2014 at 2:48 PM, Ivan Kluchnikov Ivan.Kluchnikov@fairwaves.ru wrote:
- Yes, you are right, we can use PCH blocks when needed, it is good
question, why (master and jolly/trx) do not use PCH blocks when needed. Actually we already have function for that: int paging_add_imm_ass(struct paging_state *ps, const uint8_t *data, uint8_t len); Moreover we should also determine when we should start to use PCH for imm assign messages and what we should do with agch_max_queue_length in this case, what do you think about it?
I believe that we should start using PCH for IMM.ASS as soon as we exhaust capacity of the AGCH.
Imho, IMM.ASS has higher priority than PCH, as paging will lead to IMM.ASS anyway, and if AGCH is congested, there is no point in sending any more paging.
- What do you finally think about calculating agch_max_queue_length?
What is the right way to calculate it from your point of view?
You could also consider another approach. Instead of limiting the queue length - limit the age of IMM.ASS in the queue.
I'm not sure about the current implementation, but in general you should be able to predict when an IMM.ASS is sent out and thus you could predict whether the phone will receive it or not.
On 13.02.2014 14:01, Alexander Chemeris wrote:
I believe that we should start using PCH for IMM.ASS as soon as we exhaust capacity of the AGCH.
Imho, IMM.ASS has higher priority than PCH, as paging will lead to IMM.ASS anyway, and if AGCH is congested, there is no point in sending any more paging.
I'm not so comfortable with that. We would then have a decreasing probability within a 51-multiframe of having a PCH message being cannibalized by an AGCH message. That in turn would IMO lead to unfair treatment of MSs because of the paging group they belong to (AFAI understand GSM 05.02, 6.5.1 vi and 6.5.2). In addition, I didn't understand 'extended paging' (see GSM 04.08, 3.3.2.1.1 b) well enough to estimate the implications here.
I'd rather prioritize paging messages over IMM.ASS.* to stay on the safe side until we have measurements or simulations that suggest otherwise.
- What do you finally think about calculating agch_max_queue_length?
What is the right way to calculate it from your point of view?
You could also consider another approach. Instead of limiting the queue length - limit the age of IMM.ASS in the queue.
Yes, this was my first approach, too. But it is more complex and we still need to determine max-age, which suffers from the same difficulties as the max-queue-length computation.
I also thought about having two queues:
- One for non-reject messages (IMM.ASS.CMD and IMM.ASS.EXT) that is unlimited and has a high prio
- Another for the reject messages only with a lower prio, where messages older than 3(S+T/2) RACH slots are silently dropped
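The two-queue variant could look roughly like this. It is a sketch with hypothetical names (`ia_fifo`, `ia_entry`, `ia_next`); the fixed capacity stands in for the "unlimited" high-priority queue purely to keep the example self-contained, and the slot counter is assumed to increase monotonically without wrapping.

```c
#include <assert.h>
#include <stddef.h>

#define IA_FIFO_CAP 32

struct ia_entry {
	unsigned enq_slot; /* RACH slot counter value at enqueue time */
	int id;            /* stand-in for the message itself */
};

struct ia_fifo {
	struct ia_entry buf[IA_FIFO_CAP];
	size_t head;
	size_t len;
};

int ia_fifo_push(struct ia_fifo *f, struct ia_entry e)
{
	assert(f != NULL);
	if (f->len == IA_FIFO_CAP)
		return -1;
	f->buf[(f->head + f->len) % IA_FIFO_CAP] = e;
	f->len++;
	return 0;
}

int ia_fifo_pop(struct ia_fifo *f, struct ia_entry *out)
{
	assert(f != NULL && out != NULL);
	if (f->len == 0)
		return 0;
	*out = f->buf[f->head];
	f->head = (f->head + 1) % IA_FIFO_CAP;
	f->len--;
	return 1;
}

/* Pick the next message: grants always win; rejects are only sent when no
 * grant is waiting, and rejects older than max_reject_age slots are dropped
 * silently. Returns 1 if *out was filled, 0 if there is nothing to send. */
int ia_next(struct ia_fifo *grants, struct ia_fifo *rejects,
	    unsigned now_slot, unsigned max_reject_age, struct ia_entry *out)
{
	if (ia_fifo_pop(grants, out))
		return 1;
	while (ia_fifo_pop(rejects, out)) {
		if (now_slot - out->enq_slot <= max_reject_age)
			return 1;
	}
	return 0;
}
```

Here `max_reject_age` would be set to something like 3(S+T/2) slots; how to derive that value is the same open question as for the queue length.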
But I have the impression, that a simpler solution (like those I stated in the other mail) will take us far enough with much less efforts.
Jacob
Jacob,
On Fri, Feb 14, 2014 at 12:22 PM, Jacob Erlbeck jerlbeck@sysmocom.de wrote:
On 13.02.2014 14:01, Alexander Chemeris wrote:
I believe that we should start using PCH for IMM.ASS as soon as we exhaust capacity of the AGCH.
Imho, IMM.ASS has higher priority than PCH, as paging will lead to IMM.ASS anyway, and if AGCH is congested, there is no point in sending any more paging.
I'm not so comfortable with that. We would then have a decreasing probability within a 51-multiframe of having a PCH message being cannibalized by a AGCH message. That in turn would IMO lead to unfair treatment of MS because of the paging group they belong to (AFAI understand GSM 05.02, 6.5.1 vi and 6.5.2). In addition, I didn't understand the 'extended paging' (see GSM 04.08, 3.3.2.1.1 b) well enough, to estimate the implications here.
I'd rather prioritize paging messages over IMM.ASS.* to stay on the safe side until we have measurements or simulations that suggest otherwise.
Sending more paging messages only increases a number of IMM.ASS messages being sent out. Thus it doesn't make sense to prioritize PCH over AGCH.
Another question is how do we schedule IMM.ASS in case we cannibalize PCH. IIRC, an MS could be sleeping during paging groups it doesn't belong to and thus might miss the IMM.ASS we're sending. We have to check this against the standard.
- What do you finally think about calculating agch_max_queue_length?
What is the right way to calculate it from your point of view?
You could also consider another approach. Instead of limiting the queue length - limit the age of IMM.ASS in the queue.
Yes, this was my first approach, too. But it is more complex and we still need to determine max-age which suffers from the same difficulties like the max-queue-length computation.
I was thinking about scheduling each IMM.ASS at the time of its arrival, maybe even replacing the queue with a fixed-size round-robin "map". In this case there should be no difficulty in telling whether an IMM.ASS is too late, as we'll know exactly when it is to be sent.
This changes the current code structure, though, and I haven't estimated the effort required to do that.
But I have the impression, that a simpler solution (like those I stated in the other mail) will take us far enough with much less efforts.
Which solution are you referring to?
Hello
On 16.02.2014 12:20, Alexander Chemeris wrote:
On Fri, Feb 14, 2014 at 12:22 PM, Jacob Erlbeck jerlbeck@sysmocom.de wrote:
On 13.02.2014 14:01, Alexander Chemeris wrote:
I'd rather prioritize paging messages over IMM.ASS.* to stay on the safe side until we have measurements or simulations that suggest otherwise.
Sending more paging messages only increases a number of IMM.ASS messages being sent out. Thus it doesn't make sense to prioritize PCH over AGCH.
That might be true for many cases, but there might be other cases like issuing a paging request at the end of a short term AGCH overload where it would make sense, to not drop it but some IMM.ASS.REJ instead. Another case: all IMM.ASS.* messages are about TCH's and the paging request is for SDCCH (or vice versa). So until there are simulations covering all sensible combinations and all of them suggest that always prioritizing AGCH over PCH is really worth the drawbacks, I won't do that. It is not required by the spec either, which provides AG block reservation to cope with frequent PCH overloads instead. There might be reasons, why they didn't suggest this kind of prioritization and I'd rather be careful until I know that these are not technical ones.
So I've implemented the usage of _free_ PCH blocks for IMM.ASS which is IMO a big improvement over the current situation and without any drawback I know of. In addition, I've added dropping of IMM.ASS.REJ based on AGCH queue length, to further increase the AGCH queue's drain rate. This is beyond the spec too, but doesn't touch another procedure at least.
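The "free PCH blocks only" rule amounts to a small decision per CCCH block. The following is a sketch of that decision, not the code in the branch: the enum and function names are hypothetical, and `SEND_NOTHING` stands for whatever the scheduler does with an idle block.

```c
#include <assert.h>

enum ccch_block_kind { BLOCK_AGCH_RESERVED, BLOCK_PCH };
enum ccch_choice { SEND_NOTHING, SEND_PAGING, SEND_IMM_ASS };

/* Decide what to transmit in one CCCH block.
 * Reserved AGCH blocks carry IMM.ASS only. PCH blocks carry paging first,
 * and are lent to the AGCH queue only when no paging is pending. */
enum ccch_choice ccch_pick(enum ccch_block_kind kind,
			   unsigned paging_pending, unsigned agch_pending)
{
	assert(kind == BLOCK_AGCH_RESERVED || kind == BLOCK_PCH);
	if (kind == BLOCK_AGCH_RESERVED)
		return agch_pending ? SEND_IMM_ASS : SEND_NOTHING;
	if (paging_pending)
		return SEND_PAGING;
	if (agch_pending)
		return SEND_IMM_ASS;
	return SEND_NOTHING;
}
```

Swapping the two PCH checks would give the "IMM.ASS over paging" policy argued for earlier in the thread, which makes the disagreement easy to test side by side.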
I suggest gathering real-life experience with these changes and implementing more sophisticated solutions when the need arises.
Another question is how do we schedule IMM.ASS in case we cannibalize PCH. IIRC, an MS could be sleeping during paging groups it doesn't belong to and thus might miss the IMM.ASS we're sending. We have to check this against the standard.
Paging groups are not relevant for the RR connection establishment procedure (AFAICS). According to GSM 04.08 3.3.1.1.3.1 and .2 IMM.ASS.* is only restricted to the "same CCCH timeslot" where the CHANNEL REQUEST has been sent, "there is no further restriction on what part of the downlink CCCH" the IMM.ASS.* is sent.
But I have the impression, that a simpler solution (like those I stated in the other mail) will take us far enough with much less efforts.
Which solution are you referring to?
See 52FA2E91.1090605@sysmocom.de (1) and (2) and above.
Jacob
Jacob,
Your implementation (jerbeck/agch-queue branch) looks reasonable. But from my point of view we should add sending of the RSL Delete Ind message to the BSC and implement a final check (in the bts_agch_dequeue function) that the immediate assignment message is still valid for the MS. My plan is to test your code and add sending of the RSL Delete Ind message to the BSC. After that we can try to implement a lifetime parameter for immediate assignment messages.
2014-02-17 13:59 GMT+04:00 Jacob Erlbeck jerlbeck@sysmocom.de:
Hello
On 16.02.2014 12:20, Alexander Chemeris wrote:
On Fri, Feb 14, 2014 at 12:22 PM, Jacob Erlbeck jerlbeck@sysmocom.de wrote:
On 13.02.2014 14:01, Alexander Chemeris wrote:
I'd rather prioritize paging messages over IMM.ASS.* to stay on the safe side until we have measurements or simulations that suggest otherwise.
Sending more paging messages only increases a number of IMM.ASS messages being sent out. Thus it doesn't make sense to prioritize PCH over AGCH.
That might be true for many cases, but there might be other cases like issuing a paging request at the end of a short term AGCH overload where it would make sense, to not drop it but some IMM.ASS.REJ instead. Another case: all IMM.ASS.* messages are about TCH's and the paging request is for SDCCH (or vice versa). So until there are simulations covering all sensible combinations and all of them suggest that always prioritizing AGCH over PCH is really worth the drawbacks, I won't do that. It is not required by the spec either, which provides AG block reservation to cope with frequent PCH overloads instead. There might be reasons, why they didn't suggest this kind of prioritization and I'd rather be careful until I know that these are not technical ones.
So I've implemented the usage of _free_ PCH blocks for IMM.ASS which is IMO a big improvement over the current situation and without any drawback I know of. In addition, I've added dropping of IMM.ASS.REJ based on AGCH queue length, to further increase the AGCH queue's drain rate. This is beyond the spec too, but doesn't touch another procedure at least.
I suggest to gather real live experiences with these changes and implement more sophisticated solutions when need arises.
Another question is how do we schedule IMM.ASS in case we cannibalize PCH. IIRC, an MS could be sleeping during paging groups it doesn't belong to and thus might miss the IMM.ASS we're sending. We have to check this against the standard.
Paging groups are not relevant for the RR connection establishment procedure (AFAICS). According to GSM 04.08 3.3.1.1.3.1 and .2 IMM.ASS.* is only restricted to the "same CCCH timeslot" where the CHANNEL REQUEST has been sent, "there is no further restriction on what part of the downlink CCCH" the IMM.ASS.* is sent.
But I have the impression that a simpler solution (like those I stated in the other mail) will take us far enough with much less effort.
Which solution are you referring to?
See 52FA2E91.1090605@sysmocom.de (1) and (2) and above.
Jacob
Dear Ivan,
On 18.02.2014 11:58, Ivan Kluchnikov wrote:
Your implementation (jerbeck/agch-queue branch) looks reasonable.
But it isn't ;-) It's just a snapshot and not yet finished.
But from my point of view we should add sending of the RSL Delete Ind message to the BSC and implement a final check (in the bts_agch_dequeue function) that the immediate assignment message is still valid for the MS.
I thought that your Delete Ind patches could be added as next step, since they are orthogonal to the queue handling itself.
My plan is to test your code and add sending of the RSL Delete Ind message to the BSC.
Please wait with testing, the current branch just segfaults and I'll fix (and rebase) it soon.
After that we can try to implement a lifetime parameter for immediate assignment messages.
What do you mean exactly?
Cheers
Jacob
Your implementation (jerbeck/agch-queue branch) looks reasonable.
But it isn't ;-) It's just a snapshot and not yet finished.
OK, I understand. Actually, my main goal is to test the bts_update_agch_max_queue_length and compact_agch_queue functions.
But from my point of view we should add sending of the RSL Delete Ind message to the BSC and implement a final check (in the bts_agch_dequeue function) that the immediate assignment message is still valid for the MS.
I thought that your Delete Ind patches could be added as next step, since they are orthogonal to the queue handling itself.
Yes.
My plan is to test your code and add sending of the RSL Delete Ind message to the BSC.
Please wait with testing, the current branch just segfaults and I'll fix (and rebase) it soon.
Ok.
After that we can try to implement a lifetime parameter for immediate assignment messages.
What do you mean exactly?
The idea is to save gsm_time for each IMM.ASS message when we add it to the AGCH queue. Later, when we dequeue the message and are ready to send it, we can calculate how long it has been in the AGCH queue and decide whether to send it or drop it.
On Tue, Feb 18, 2014 at 10:36 PM, Ivan Kluchnikov Ivan.Kluchnikov@fairwaves.ru wrote:
After that we can try to implement a lifetime parameter for immediate assignment messages.
What do you mean exactly?
The idea is to save gsm_time for each IMM.ASS message when we add it to the AGCH queue. Later, when we dequeue the message and are ready to send it, we can calculate how long it has been in the AGCH queue and decide whether to send it or drop it.
Are you sure that we can rely on the IMM.ASS timestamp for the final judgement? From our discussions I was under the impression that we should measure the time difference from the original RACH burst, as that's what the MS is measuring. Do I understand the procedure incorrectly?
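A minimal sketch of the lifetime check discussed above, in C. It does not settle the open question of which reference point to use: `ref_fn` may hold either the enqueue frame number or the frame number of the original RACH burst. The names `agch_entry`, `fn_elapsed`, `agch_entry_alive` and the `max_age_frames` parameter are hypothetical and are not taken from the OsmoBTS source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GSM frame numbers wrap at the hyperframe: 2048 * 26 * 51 frames. */
#define GSM_HYPERFRAME 2715648u

/* Hypothetical queue entry, not the OsmoBTS data structure. */
struct agch_entry {
	uint32_t ref_fn;  /* frame number the lifetime is measured from */
	uint8_t msg[23];  /* one CCCH block, 23 octets */
};

/* Frames elapsed from ref_fn to now_fn, taking wrap-around into account. */
static uint32_t fn_elapsed(uint32_t ref_fn, uint32_t now_fn)
{
	assert(ref_fn < GSM_HYPERFRAME && now_fn < GSM_HYPERFRAME);
	return (now_fn + GSM_HYPERFRAME - ref_fn) % GSM_HYPERFRAME;
}

/* True if the entry is still young enough to be worth sending at now_fn. */
static bool agch_entry_alive(const struct agch_entry *e, uint32_t now_fn,
			     uint32_t max_age_frames)
{
	return fn_elapsed(e->ref_fn, now_fn) <= max_age_frames;
}
```

A dequeue function could call `agch_entry_alive()` on the head of the queue and skip entries that have expired. The value of `max_age_frames` is left open here, since it depends on the timeout discussion in this thread.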
Hi Ivan,
On 13.02.2014 13:48, Ivan Kluchnikov wrote:
First I want to summarize your previous ideas, which I think we should implement:
- we shouldn't send Delete Ind message for IMMEDIATE ASSIGNMENT REJECT messages
- Just drop IA REJECTs, if L > maxL/2 (am I right?)
I'm not sure about which factor to use, especially if maxL is very low. But I think it's a good start. Maybe we need some simulations to fine-tune. In addition, we should add stat counters for that.
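The drop rule above could be sketched as follows. This is only an illustration under assumptions: the decision is taken at enqueue time (the thread does not fix where it happens), the factor 1/2 is the starting value proposed above, and `agch_state`, `agch_admit` and both counters are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state; field names follow the thread, not the OsmoBTS source. */
struct agch_state {
	int agch_queue_length;
	int agch_max_queue_length;
	unsigned long rejects_dropped;   /* stat counter: IA REJECTs dropped early */
	unsigned long overflow_dropped;  /* stat counter: messages dropped, queue full */
};

enum agch_verdict { AGCH_ENQUEUE, AGCH_DROP };

/* Decide whether a new message may enter the queue and update the counters. */
static enum agch_verdict agch_admit(struct agch_state *st, bool is_reject)
{
	assert(st->agch_max_queue_length >= 0);

	/* Hard limit: applies to every message type. */
	if (st->agch_queue_length >= st->agch_max_queue_length) {
		st->overflow_dropped++;
		return AGCH_DROP;
	}
	/* Soft limit: IA REJECTs are dropped once L > maxL/2. */
	if (is_reject && st->agch_queue_length > st->agch_max_queue_length / 2) {
		st->rejects_dropped++;
		return AGCH_DROP;
	}
	st->agch_queue_length++;
	return AGCH_ENQUEUE;
}
```

Note that with integer division a very low maxL makes the soft limit harsh: for maxL = 1 every IA REJECT is dropped as soon as one message is queued, which matches the concern about low maxL values.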
- Use agch_queue_length and agch_max_queue_length names
- Implement the function bts_agch_update_max_queue_length, which is only called after config/bs_ag_blks_res is changed/set (do you know the places where bs_ag_blks_res is changed/set?)
We have to check. If any of the input values changes, we have to update. But let's put that call into agch_enqueue initially. Moving it elsewhere is just an optimization, and I would like to defer it until the rest has proved to work.
- Yes, you are right, we can use PCH blocks when needed. It is a good question why master and jolly/trx do not use PCH blocks when needed.
I've just set up a test case with NITB and two phones, and modified code where only the PCH blocks are used for AGCH and PCH and it worked pretty well.
Actually, we already have a function for that: int paging_add_imm_ass(struct paging_state *ps, const uint8_t *data, uint8_t len); Moreover, we should also determine when to start using PCH for IMM.ASS messages and what to do with agch_max_queue_length in this case. What do you think about it?
I would use PCH for AGCH messages when there is no pending paging message. And I wouldn't use paging_add_imm_ass(), but just call bts_agch_dequeue() when there is no paging message. I wouldn't do this in paging_gen_message() but outside of it, since it is not a paging message. agch_max_queue_length should then be based on ag_blks_res + gsm0502_get_n_pag_blocks() (or something delivering the same result).
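A rough sketch of the per-block decision described above: paging keeps priority on PCH blocks, and a PCH block is handed to the AGCH queue only when no paging message is pending. The enum, the function name `ccch_block_fill` and its parameters are hypothetical; in the real code the caller would invoke bts_agch_dequeue() or the paging code depending on the result.

```c
#include <assert.h>
#include <stdbool.h>

enum ccch_fill { CCCH_FILL_PAGING, CCCH_FILL_IMM_ASS, CCCH_FILL_EMPTY };

/*
 * Decide what to transmit in one downlink CCCH block.
 * is_ag_reserved: block is one of the bs_ag_blks_res reserved AGCH blocks
 * paging_pending: number of paging messages waiting for this block
 * agch_queued:    current length of the AGCH queue
 */
static enum ccch_fill ccch_block_fill(bool is_ag_reserved, int paging_pending,
				      int agch_queued)
{
	assert(paging_pending >= 0 && agch_queued >= 0);

	/* Reserved AG blocks never carry paging. */
	if (is_ag_reserved)
		return agch_queued > 0 ? CCCH_FILL_IMM_ASS : CCCH_FILL_EMPTY;

	/* On PCH blocks, paging keeps its priority ... */
	if (paging_pending > 0)
		return CCCH_FILL_PAGING;

	/* ... and only free PCH blocks are used for IMM.ASS. */
	if (agch_queued > 0)
		return CCCH_FILL_IMM_ASS;

	return CCCH_FILL_EMPTY;
}
```

Keeping this decision outside the paging code means the paging queue does not have to know about AGCH messages at all.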
- What do you finally think about calculating agch_max_queue_length?
What is the right way to calculate it from your point of view?
I'm not sure yet. Since there is a direct relationship between the number of RACH bursts and CCCH blocks per multiframe, there could be a way to simplify the calculation. One has to adjust it if there are optional CCCH blocks that are not AGCH/PCH (like CBCH), but even if that is not taken into account, the error might be tolerable.
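One possible shape of such a calculation, as a sketch only: it assumes one queued message per CCCH block and a timeout expressed in TDMA frames, and it ignores the CBCH adjustment mentioned above. `ccch_blocks_per_mf`, `agch_max_len` and `timeout_frames` are hypothetical names, and how the timeout is derived is left open.

```c
#include <assert.h>
#include <stdbool.h>

/* CCCH blocks per 51-multiframe: 3 when combined with SDCCH/4, otherwise 9. */
static int ccch_blocks_per_mf(bool combined)
{
	return combined ? 3 : 9;
}

/*
 * Longest queue for which the last message still goes out within
 * timeout_frames, if usable_blocks_per_mf blocks per 51-multiframe
 * can carry AGCH messages and each message takes one block.
 */
static int agch_max_len(int usable_blocks_per_mf, int timeout_frames)
{
	assert(usable_blocks_per_mf >= 0 && timeout_frames >= 0);
	return (usable_blocks_per_mf * timeout_frames) / 51;
}
```

If free PCH blocks are also used for IMM.ASS, `usable_blocks_per_mf` would cover the reserved AG blocks plus the paging blocks; if only the reserved blocks are used, it would be bs_ag_blks_res alone.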
Jacob