Hi,
I have added rate counters to the PCU and noticed that a lot of frames in the DL are re-sent without there actually being any NACK or transmission errors. I have also re-produced a "stall" by just using ping with a big enough data size and a low enough interval (the PCU appears to transmit the ICMP PING but the MS never gets to send a PONG packet).
In both cases it could be a scheduling issue (e.g. we don't schedule the UL Packet ACK/NACK). Is any of you aware of it? I had already highlighted that the scheduling is not fair. I wonder if we are seeing starvation (and will re-factor to be able to unit test this).
In terms of scheduling: for the SBA we already have a reservation that we try to honor in multiple places. Maybe it is time to extend it and have more reservations?
holger
Holger Hans Peter Freyther wrote:
Hi,
I have added rate counters to the PCU and noticed that a lot of frames in the DL are re-sent without there actually being any NACK or transmission errors.
dear holger,
i can only guess what causes the issues you describe, so i suggest that we do a debug session at the congress, if you attend and find the time.
generally there is always a re-send due to the delay between the pcu and the radio interface. once all rlc/mac data blocks of an llc frame have been sent to the phone, a packet downlink ack/nack control block is requested and scheduled. the pcu repeats all unacknowledged data blocks in a loop until it receives that requested control block. this way the pcu starts re-sending data blocks that might have been lost before it actually knows if and which blocks still need to be re-sent. (this procedure is described in TS 04.60.)
I have also re-produced a "stall" by just using ping with a big enough data size and a low enough interval (the PCU appears to transmit the ICMP PING but the MS never gets to send a PONG packet).
could this be an mtu issue at the phone? does the phone request an uplink tbf to send a pong?
In both cases it could be a scheduling issue (e.g. we don't schedule the UL Packet ACK/NACK).
in case of an ongoing uplink tbf, we ack/nack received or timed out uplink rlc/mac blocks by using a control block. even if there is an ongoing downlink tbf, all control blocks have priority over data blocks.
Is any of you aware of it? I had already highlighted that the scheduling is not fair. I wonder if we are seeing starvation (and will re-factor to be able to unit test this).
In terms of scheduling: for the SBA we already have a reservation that we try to honor in multiple places. Maybe it is time to extend it and have more reservations?
uplink control messages are reserved in the tbf object (poll_fn). the sba control messages are reserved in a separate structure, since sba are not (or not yet) related to a tbf.
regards,
andreas
On Mon, Nov 25, 2013 at 08:24:27AM +0100, Andreas Eversberg wrote:
could this be an mtu issue at the phone? does the phone request an uplink tbf to send a pong?
With the first 10 pings going through and then nothing coming back? I don't think it is an MTU issue. I don't know what the root cause is, but the structure of the code doesn't make it easy to find.
In both cases it could be a scheduling issue (e.g. we don't schedule the UL Packet ACK/NACK).
in case of an ongoing uplink tbf, we ack/nack received or timed out uplink rlc/mac blocks by using a control block. even if there is an ongoing downlink tbf, all control blocks have priority over data blocks.
As I have already pointed out, some control blocks can starve. I don't know if that is happening here, but it is where I will continue to look.
Is any of you aware of it? I had already highlighted that the scheduling is not fair. I wonder if we are seeing starvation (and will re-factor to be able to unit test this).
In terms of scheduling: for the SBA we already have a reservation that we try to honor in multiple places. Maybe it is time to extend it and have more reservations?
uplink control messages are reserved in the tbf object (poll_fn). the sba control messages are reserved in a separate structure, since sba are not (or not yet) related to a tbf.
well. In the tbf you have:
    else if (bts->sba()->find(trx->trx_no, ts, (fn + 13) % 2715648))
            LOGP(DRLCMAC, LOGL_DEBUG, "Polling cannot be "
                    "sheduled, because single block alllocation "
                    "already exists\n");
But nothing checks whether a TBF is setting poll_fn to a frame that another TBF is already polling on.
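A minimal sketch of what such a cross-TBF check could look like, assuming a simplified stand-in for the TBF object. `SketchTbf`, `poll_fn_taken` and `try_reserve_poll` are hypothetical names for illustration and are not part of the PCU code; only `poll_fn`, the `fn + 13` offset and the modulus 2715648 are taken from the snippet above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a TBF; not the PCU's real class.
struct SketchTbf {
    uint8_t trx_no;
    uint8_t ts;
    bool poll_pending;  // true while a poll is reserved
    uint32_t poll_fn;   // frame number of the reserved poll
};

// Frame number modulus, as used in the snippet above.
constexpr uint32_t kFnModulus = 2715648;

// Returns true if any other TBF on the same TRX/timeslot already polls at fn.
// `self` must be an element of `tbfs`, since it is skipped by address.
bool poll_fn_taken(const std::vector<SketchTbf> &tbfs, const SketchTbf &self,
                   uint32_t fn)
{
    for (const SketchTbf &other : tbfs) {
        if (&other == &self)
            continue;
        if (other.poll_pending && other.trx_no == self.trx_no &&
            other.ts == self.ts && other.poll_fn == fn)
            return true;
    }
    return false;
}

// Reserves a poll 13 frames ahead unless another TBF already owns that frame.
bool try_reserve_poll(std::vector<SketchTbf> &tbfs, SketchTbf &self, uint32_t fn)
{
    assert(fn < kFnModulus);
    const uint32_t new_poll_fn = (fn + 13) % kFnModulus;
    if (poll_fn_taken(tbfs, self, new_poll_fn))
        return false;
    self.poll_pending = true;
    self.poll_fn = new_poll_fn;
    return true;
}
```

With two TBFs on the same timeslot, the second call to `try_reserve_poll` for the same frame returns false instead of silently double-booking the poll.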
On Mon, Nov 25, 2013 at 09:21:08AM +0100, Holger Hans Peter Freyther wrote:
As I have already pointed out, some control blocks can starve. I don't know if that is happening here, but it is where I will continue to look.
Here is the starvation theory for the ping:
We have an LLC frame (or many queued up)... we schedule the polls and at the final_ack indication we decide to re-use the TBF. This means that we will schedule another PACKET DOWNLINK ASSIGNMENT. But at the same time we either want to honor the "rh->si" or want to schedule the ACK due to SEND_ACK_AFTER_FRAMES.
So we more or less want to send "PACKET DOWNLINK ASSIGNMENT" and the "PACKET UPLINK ACK" at the same time (with more DL tbfs/traffic this is getting more likely) but currently we will always prefer the PACKET DOWNLINK ASSIGNMENT. This means that the uplink will starve (e.g. the window will stall, rh->si being set, etc).
Fairness improvements:
* put ul_ass_tbf, dl_ass_tbf, ul_ack_tbf in an array and store the last index in the PDCH and iterate over it.
* move the TBF to the front of the ul_tbfs and dl_tbfs list so that it is not selected again.
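The first bullet could be sketched roughly as follows. This is only an illustration under assumptions: `SketchPdch`, `CtrlSource` and `pick_ctrl_source` are hypothetical names, and the real selection of ul_ass_tbf, dl_ass_tbf and ul_ack_tbf involves more state than a pending flag.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical labels for the three control message sources named above.
enum CtrlSource { UL_ASS = 0, DL_ASS = 1, UL_ACK = 2 };
constexpr std::size_t kNumCtrlSources = 3;

// Hypothetical per-PDCH state: remembers which source was served last.
// Starting at the last index means the first search begins at UL_ASS.
struct SketchPdch {
    std::size_t last_served = kNumCtrlSources - 1;
};

// Picks the next source with a pending message, starting just after the one
// served last, so no pending source is skipped forever.
// Returns -1 if nothing is pending.
int pick_ctrl_source(SketchPdch &pdch,
                     const std::array<bool, kNumCtrlSources> &pending)
{
    for (std::size_t i = 1; i <= kNumCtrlSources; i++) {
        const std::size_t idx = (pdch.last_served + i) % kNumCtrlSources;
        if (pending[idx]) {
            pdch.last_served = idx;
            return static_cast<int>(idx);
        }
    }
    return -1;
}
```

If a downlink assignment and an uplink ack are both pending all the time, repeated calls alternate between DL_ASS and UL_ACK rather than always returning DL_ASS.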
Holger Hans Peter Freyther wrote:
Here is the starvation theory for the ping:
We have an LLC frame (or many queued up)... we schedule the polls and at the final_ack indication we decide to re-use the TBF. This means that we will schedule another PACKET DOWNLINK ASSIGNMENT. But at the same time we either want to honor the "rh->si" or want to schedule the ACK due to SEND_ACK_AFTER_FRAMES.
the idea behind the priority of PACKET DOWNLINK ASSIGNMENT is that i do not want the phone to switch back to idle mode. if i ACKed all uplink blocks first, the MS might switch back to idle mode immediately and never receive the PACKET DOWNLINK ASSIGNMENT.
when there was an ongoing downlink tbf, the phone stays in transfer mode until T3193 fires, so in case of a tbf re-use we can safely schedule a PACKET UPLINK ACK/NACK prior to the PACKET DOWNLINK ASSIGNMENT.
So we more or less want to send "PACKET DOWNLINK ASSIGNMENT" and the "PACKET UPLINK ACK" at the same time (with more DL tbfs/traffic this is getting more likely) but currently we will always prefer the PACKET DOWNLINK ASSIGNMENT. This means that the uplink will starve (e.g. the window will stall, rh->si being set, etc).
once we have sent the PACKET DOWNLINK ASSIGNMENT we can ACK the uplink blocks right afterwards. this should resolve the stalled window just a bit later. this is my theory, but your tests showed that this does not work as i would expect.
On Mon, Nov 25, 2013 at 08:24:27AM +0100, Andreas Eversberg wrote:
generally there is always a re-send due to the delay between the pcu and the radio interface. once all rlc/mac data blocks of an llc frame have been sent to the phone, a packet downlink ack/nack control block is requested and scheduled. the pcu repeats all unacknowledged data blocks in a loop until it receives that requested control block. this way the pcu starts re-sending data blocks that might have been lost before it actually knows if and which blocks still need to be re-sent. (this procedure is described in TS 04.60.)
Are you referring to 9.1.3.1 "Acknowledge state array V(B) for GPRS TBF Mode"?
" If there are no further RLC data blocks available for transmission (i.e. the RLC data block with BSN= V(S) does not exist), the sending side shall transmit the oldest RLC data block whose corresponding element in V(B) has the value PENDING_ACK, then the next oldest block whose corresponding element in V(B) has the value PENDING_ACK, etc. "
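The quoted rule can be sketched as a small selection function. This is an illustration only: `SketchWindow` and `next_pending_ack` are hypothetical names, the window is modelled as a plain vector ordered oldest first, the wrap back to the oldest block follows the "in a loop" description earlier in the thread, and other parts of the send procedure (for example sending new blocks first) are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element states of the acknowledge state array V(B).
enum class BlockState { INVALID, PENDING_ACK, ACKED, NACKED };

// Hypothetical sender-side view: states of the outstanding blocks, oldest first.
struct SketchWindow {
    std::vector<BlockState> v_b;
    std::size_t resend_cursor = 0;  // where the next search for a re-send starts
};

// Returns the index (relative to the oldest block) of the next PENDING_ACK
// block to re-send, or -1 if no block is pending. After the newest block the
// search wraps around to the oldest one again.
int next_pending_ack(SketchWindow &win)
{
    const std::size_t n = win.v_b.size();
    for (std::size_t i = 0; i < n; i++) {
        const std::size_t idx = (win.resend_cursor + i) % n;
        if (win.v_b[idx] == BlockState::PENDING_ACK) {
            win.resend_cursor = (idx + 1) % n;
            return static_cast<int>(idx);
        }
    }
    return -1;
}
```

Called once per free downlink block period, this keeps cycling over the unacknowledged blocks until an ack marks them ACKED, which is where the re-sends in the counters would come from.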
With statistics in place it appears that for CS4 close to 50% of the sent RLC blocks are re-sends. Is that what you expected when implementing it?
holger
Holger Hans Peter Freyther wrote:
Are you referring to 9.1.3.1 "Acknowledge state array V(B) for GPRS TBF Mode"?
yes, the protocol sends all unacknowledged blocks in a loop until all blocks are acknowledged and the tbf is complete.
With statistics in place it appears that for CS4 close to 50% of the sent RLC blocks are re-sends. Is that what you expected when implementing it?
if there is a continuous download (opening a web page), there should be almost no re-sends unless the window stalls or the download is complete. in case of a ping with several small tbfs, there are re-sends due to the delay between PCU and the radio interface: the PCU sends the final block of a tbf (polling bit set) and then waits for PACKET DOWNLINK ACK/NACK. a higher delay will cause more re-sends at the end of each tbf, so smaller tbfs will show a higher percentage of re-sends.
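A back-of-the-envelope model of that last point, assuming that every block period spent waiting for the PACKET DOWNLINK ACK/NACK carries one re-send and that no block is actually lost. `resend_share` and its parameters are hypothetical, and the numbers in the note below are illustrative, not measured.

```cpp
#include <cassert>

// data_blocks: RLC data blocks in the TBF, each sent once.
// wait_blocks: block periods spent waiting for PACKET DOWNLINK ACK/NACK,
//              each assumed to carry exactly one re-send.
// Returns the fraction of all transmitted blocks that are re-sends.
double resend_share(unsigned data_blocks, unsigned wait_blocks)
{
    const unsigned total = data_blocks + wait_blocks;
    if (total == 0)
        return 0.0;
    return static_cast<double>(wait_blocks) / total;
}
```

Under these assumptions, with a wait of 4 block periods a 4-block TBF has a re-send share of 0.5, while a 100-block TBF has a share below 0.04.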
osmocom-net-gprs@lists.osmocom.org