From andreas at eversberg.eu Tue Nov 5 06:55:29 2013
From: andreas at eversberg.eu (Andreas Eversberg)
Date: Tue, 05 Nov 2013 07:55:29 +0100
Subject: Dropping LLC frames due the TBF destruction
In-Reply-To: <20131027111242.GN6295@xiaoyu.lan>
References: <20131027111242.GN6295@xiaoyu.lan>
Message-ID: <52789661.1040502@eversberg.eu>

Holger Hans Peter Freyther wrote:
> * there is no indication to the SGSN for dropped frames/octets?
> * can't the DL-TBF be re-associated/updated after the assignment
>   is done again?
>

hi holger,

there is one issue about failed downlink TBF: the PCU does not do any
retry. when the assignment is sent to the phone and it fails, there is
no retry according to the specs. i assume that this is a task for the
SGSN. osmo-sgsn/openggsn does not resend any LLC frame in case of
LLC-DISCARDED message, if i look at the source code of libosmogb.
(LLC-DISCARDED is forwarded at bssgp_rx_llc_disc() of gprs_bssgp.c, but
i don't see where it is handled.)

what actually happens is that TCP layer will do the resend, but this is
a problem, especially at the border of coverage where many downlink
assignments get lost.

i have once enabled the debugging at ipaccess BTS and saw that the PCU
itself does the retry of lost downlink assignments (4 or 5 times).

regards,

andreas

From andreas at eversberg.eu Tue Nov 5 07:14:05 2013
From: andreas at eversberg.eu (Andreas Eversberg)
Date: Tue, 05 Nov 2013 08:14:05 +0100
Subject: Losing ACKs due TLLI changes?
In-Reply-To: <20131027123313.GO6295@xiaoyu.lan>
References: <20131027123313.GO6295@xiaoyu.lan>
Message-ID: <52789ABD.5030509@eversberg.eu>

Holger Hans Peter Freyther wrote:
> <0007> gprs_rlcmac_meas.cpp:103 UL RSSI of TLLI=0x88661bc6: -67 dBm
> <0002> bts.cpp:945 Got ACK, but UL TBF is gone TLLI=0xe512eba3
> <0007> gprs_rlcmac_meas.cpp:158 DL packet loss of IMSI=274080000004765 / TLLI=0xe512eba3: 0%
> <0002> tbf.cpp:668 TBF TFI=0 TLLI=0x88661bc6 T3169 timeout during transsmission
> <0002> tbf.cpp:690 - Assignment was on PACCH
> <0002> tbf.cpp:694 - No uplink data received yet
>
> So there is an ACK for TLLI=0xe512eba3 but at the same time the tlli
> 0x88661bc6 is timing out.
>
> PCU->SGSN TLLI=0x88661bc6 Attach Request
> SGSN->PCU TLLI=0x88661bc6 Attach Accept (new P-TMSI)
> PCU->SGSN TLLI=0xe512eba3 Attach Complete (new TLLI) (6s later)
>

hi holger,

can you provide the complete log, starting with the attach request? i
looked at the gprs_rlcmac_rcv_control_block() function which already
handles some TLLI change, but i am not sure if there is a bug or if
there must be some other way to handle TLLI change.

regards,

andreas

From andreas at eversberg.eu Tue Nov 5 07:39:49 2013
From: andreas at eversberg.eu (Andreas Eversberg)
Date: Tue, 05 Nov 2013 08:39:49 +0100
Subject: Losing ACKs due TLLI changes?
In-Reply-To: <20131030111731.GA23718@xiaoyu.lan>
References: <20131027123313.GO6295@xiaoyu.lan> <20131030111731.GA23718@xiaoyu.lan>
Message-ID: <5278A0C5.2000000@eversberg.eu>

Holger Hans Peter Freyther wrote:
> +       printf("%s TLLLI changed...... 0x%08x->0x%08x\n",
> +               tbf_name(this), m_tlli, tlli);
> +
> +       if (direction == GPRS_RLCMAC_DL_TBF) {
> +               gprs_rlcmac_tbf *ul_tbf;
> +               ul_tbf = bts->tbf_by_tlli(m_tlli, GPRS_RLCMAC_UL_TBF);
> +
> +               if (ul_tbf)
> +                       ul_tbf->m_tlli = tlli;
> +       }
> +
>

oops, i am sorry. i did not read all the follow-ups before answering
your message... the update of both DL and UL TBFs seems to be a good
solution.
in the function above you wrote this comment:

 * TODO: There could be multiple DL and UL TBFs and we should
 * have a proper way to link all the related TBFs so we can do
 * a group update.

how can we have multiple TBFs for a single direction? i don't see
anything like this in the specs.

From hfreyther at sysmocom.de Tue Nov 5 08:01:27 2013
From: hfreyther at sysmocom.de (Holger Hans Peter Freyther)
Date: Tue, 5 Nov 2013 09:01:27 +0100
Subject: Dropping LLC frames due the TBF destruction
In-Reply-To: <52789661.1040502@eversberg.eu>
References: <20131027111242.GN6295@xiaoyu.lan> <52789661.1040502@eversberg.eu>
Message-ID: <20131105080127.GB9642@xiaoyu.lan>

On Tue, Nov 05, 2013 at 07:55:29AM +0100, Andreas Eversberg wrote:

dear andreas,

as usual there is more than one dimension to this problem.

1.) Dropping LLC data with no indication.

When trying to understand a problem (e.g. Samsung3 traffic stalls) one
needs to analyze what is going wrong. The "contract" for the SGSN/PCU
is that a LLC frame will be either forwarded to the MS or a discarded
message will be sent (frames, octets, etc..). BUT with the current code
this contract is broken. One simply can not know if the LLC frame has
actually reached the phone...

2.) Retry or not to retry.

> there is one issue about failed downlink TBF: the PCU does not do any
> retry. when the assignment is sent to the phone and it fails, there is
> no retry according to the specs. i assume that this is a task for the
> SGSN. osmo-sgsn/openggsn does not resend any LLC frame in case of
> LLC-DISCARDED message, if i look at the source code of libosmogb.
> (LLC-DISCARDED is forwarded at bssgp_rx_llc_disc() of gprs_bssgp.c, but
> i don't see where it is handled.)

Well, retry or not to retry is an architectural decision. The question
is what to do with the frames we are discarding?
--
- Holger Freyther http://www.sysmocom.de/
=======================================================================
* sysmocom - systems for mobile communications GmbH
* Schivelbeiner Str. 5
* 10439 Berlin, Germany
* Sitz / Registered office: Berlin, HRB 134158 B
* Geschaeftsfuehrer / Managing Directors: Holger Freyther, Harald Welte

From hfreyther at sysmocom.de Tue Nov 5 08:04:28 2013
From: hfreyther at sysmocom.de (Holger Hans Peter Freyther)
Date: Tue, 5 Nov 2013 09:04:28 +0100
Subject: Losing ACKs due TLLI changes?
In-Reply-To: <52789ABD.5030509@eversberg.eu>
References: <20131027123313.GO6295@xiaoyu.lan> <52789ABD.5030509@eversberg.eu>
Message-ID: <20131105080428.GC9642@xiaoyu.lan>

On Tue, Nov 05, 2013 at 08:14:05AM +0100, Andreas Eversberg wrote:

Dear Andreas,

> can you provide the complete log, starting with the attach request? i
> looked at the gprs_rlcmac_rcv_control_block() function which already
> handles some TLLI change, but i am not sure if there is a bug or if
> there must be some other way to handle TLLI change.

everything that is needed to understand/reproduce the issue has been
posted. There is now even a unit test for this behavior.

holger

From laforge at gnumonks.org Tue Nov 5 16:41:58 2013
From: laforge at gnumonks.org (Harald Welte)
Date: Tue, 5 Nov 2013 17:41:58 +0100
Subject: Dropping LLC frames due the TBF destruction
In-Reply-To: <52789661.1040502@eversberg.eu>
References: <20131027111242.GN6295@xiaoyu.lan> <52789661.1040502@eversberg.eu>
Message-ID: <20131105164158.GM12353@nataraja.gnumonks.org>

Hi Andreas,

On Tue, Nov 05, 2013 at 07:55:29AM +0100, Andreas Eversberg wrote:
> there is one issue about failed downlink TBF: the PCU does not do any
> retry. when the assignment is sent to the phone and it fails, there is
> no retry according to the specs. i assume that this is a task for the
> SGSN. osmo-sgsn/openggsn does not resend any LLC frame in case of
> LLC-DISCARDED message, if i look at the source code of libosmogb.
> (LLC-DISCARDED is forwarded at bssgp_rx_llc_disc() of gprs_bssgp.c, but
> i don't see where it is handled.)
>
> what actually happens is that TCP layer will do the resend, but this is
> a problem, especially at the border of coverage where many downlink
> assignments get lost.

I'm not sure if I'm mixing up things, but we have three potential
protocol layers that can take care of reliable delivery:

1) the RLC/MAC layer between PCU and MS
2) the LLC protocol
3) the TCP layer inside user IP

From andreas at eversberg.eu Thu Nov 7 06:44:40 2013
From: andreas at eversberg.eu (Andreas Eversberg)
Date: Thu, 07 Nov 2013 07:44:40 +0100
Subject: Dropping LLC frames due the TBF destruction
In-Reply-To: <52789661.1040502@eversberg.eu>
References: <20131027111242.GN6295@xiaoyu.lan> <52789661.1040502@eversberg.eu>
Message-ID: <527B36D8.7010001@eversberg.eu>

On Tue, Nov 05, 2013 at 07:55:29AM +0100, Andreas Eversberg wrote:

dear andreas,

as usual there is more than one dimension to this problem.

1.) Dropping LLC data with no indication.
When trying to understand a problem (e.g. Samsung3 traffic stalls) one
needs to analyze what is going wrong. The "contract" for the SGSN/PCU
is that a LLC frame will be either forwarded to the MS or a discarded
message will be sent (frames, octets, etc..). BUT with the current code
this contract is broken. One simply can not know if the LLC frame has
actually reached the phone...

2.) Retry or not to retry.

> there is one issue about failed downlink TBF: the PCU does not do any
> retry. when the assignment is sent to the phone and it fails, there is
> no retry according to the specs. i assume that this is a task for the
> SGSN. osmo-sgsn/openggsn does not resend any LLC frame in case of
> LLC-DISCARDED message, if i look at the source code of libosmogb.
> (LLC-DISCARDED is forwarded at bssgp_rx_llc_disc() of gprs_bssgp.c, but
> i don't see where it is handled.)

Well, retry or not to retry is an architectural decision. The question
is what to do with the frames we are discarding?

From andreas at eversberg.eu Thu Nov 7 07:35:49 2013
From: andreas at eversberg.eu (Andreas Eversberg)
Date: Thu, 07 Nov 2013 08:35:49 +0100
Subject: Dropping LLC frames due the TBF destruction
In-Reply-To: <20131105080127.GB9642@xiaoyu.lan>
References: <20131027111242.GN6295@xiaoyu.lan> <52789661.1040502@eversberg.eu> <20131105080127.GB9642@xiaoyu.lan>
Message-ID: <527B42D5.7080500@eversberg.eu>

(just ignore my last mail. somehow i hit the wrong key...)

hi holger, hi harald,

i looked at the TS 08.18. the LLC-DISCARDED.ind message does not contain
any reference to the frame(s) that has been discarded, so SGSN cannot
resend specific frames that are dropped.
also a resend does not seem to be specified. it is the task of the PCU
to make sure the LLC frame is delivered. during a downlink TBF, lost
downlink blocks are resent by the RLC/MAC protocol, but the assignment
is not (at least not correctly).

here is my suggestion:

if the assignment fails, the PCU should resend the downlink TBF
assignment several times. the number of retries should be a config
option.

it may happen that the assignment on PACCH fails while the MS is in
packet transfer mode. if the MS switches back to packet idle mode in the
meantime, the retry must be performed on AGCH.

it may also happen that the assignment on AGCH fails because the MS is
establishing an uplink TBF. if this happens, the retry must be performed
on PACCH.

if the maximum number of retries is reached, the LLC frame (or list of
frames) should be discarded and an LLC-DISCARDED.ind message should be
sent to the SGSN.

regards,

andreas

From laforge at gnumonks.org Thu Nov 7 07:50:58 2013
From: laforge at gnumonks.org (Harald Welte)
Date: Thu, 7 Nov 2013 08:50:58 +0100
Subject: Dropping LLC frames due the TBF destruction
In-Reply-To: <527B42D5.7080500@eversberg.eu>
References: <20131027111242.GN6295@xiaoyu.lan> <52789661.1040502@eversberg.eu> <20131105080127.GB9642@xiaoyu.lan> <527B42D5.7080500@eversberg.eu>
Message-ID: <20131107075058.GF639@nataraja.gnumonks.org>

On Thu, Nov 07, 2013 at 08:35:49AM +0100, Andreas Eversberg wrote:

> i looked at the TS 08.18. the LLC-DISCARDED.ind message does not contain
> any reference to the frame(s) that has been discarded, so SGSN cannot
> resend specific frames that are dropped. also a resend does not seem to
> be specified. it is the task of the PCU to make sure the LLC frame is
> delivered.

Thanks for your research and analysis.

--
- Harald Welte http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
(ETSI EN 300 175-7 Ch.
A6)

From hfreyther at sysmocom.de Mon Nov 11 19:24:08 2013
From: hfreyther at sysmocom.de (Holger Hans Peter Freyther)
Date: Mon, 11 Nov 2013 20:24:08 +0100
Subject: Coverity issues in gsm_rlcmac.cpp
Message-ID: <20131111192408.GD30839@xiaoyu.lan>

Dear Ivan,

could you please have a look at the coverity issues in the
gsm_rlcmac.cpp routines?

Uninitialized scalar variable:
gsm_rlcmac.cpp:5321 ar.direction not initialized
gsm_rlcmac.cpp:5039 ar.direction not initialized
gsm_rlcmac.cpp:5155 ar.direction not initialized
gsm_rlcmac.cpp:4872 ar.direction not initialized

Just initialize it in csnStreamInit?

Out-of-bounds read:
gsm_rlcmac.cpp:5502 " Overrunning array "data->RLC_DATA" of 20 bytes
at byte offset 22 using index "i" (which evaluates to 22)."

gsm_rlcmac.cpp:5440 " Overrunning array "data->RLC_DATA" of 20 bytes
at byte offset 22 using index "i" (which evaluates to 22)."

Maybe just add an assert that dataNumOctets <= 20?

From Ivan.Kluchnikov at fairwaves.ru Tue Nov 12 13:29:57 2013
From: Ivan.Kluchnikov at fairwaves.ru (Ivan Kluchnikov)
Date: Tue, 12 Nov 2013 17:29:57 +0400
Subject: Coverity issues in gsm_rlcmac.cpp
In-Reply-To: <20131111192408.GD30839@xiaoyu.lan>
References: <20131111192408.GD30839@xiaoyu.lan>
Message-ID:

Hi Holger,

2013/11/11 Holger Hans Peter Freyther :
>
> Uninitialized scalar variable:
> gsm_rlcmac.cpp:5321 ar.direction not initialized
> gsm_rlcmac.cpp:5039 ar.direction not initialized
> gsm_rlcmac.cpp:5155 ar.direction not initialized
> gsm_rlcmac.cpp:4872 ar.direction not initialized
>
> Just initialize it in csnStreamInit?

Yes.
>
> Out-of-bounds read:
> gsm_rlcmac.cpp:5502 " Overrunning array "data->RLC_DATA" of 20 bytes
> at byte offset 22 using index "i" (which evaluates to 22)."
>
> gsm_rlcmac.cpp:5440 " Overrunning array "data->RLC_DATA" of 20 bytes
> at byte offset 22 using index "i" (which evaluates to 22)."
>
> Maybe just add an assert that dataNumOctets <= 20?

Yes, it makes sense.

--
Regards,
Ivan Kluchnikov.
http://fairwaves.ru

From Ivan.Kluchnikov at fairwaves.ru Tue Nov 12 13:44:43 2013
From: Ivan.Kluchnikov at fairwaves.ru (Ivan Kluchnikov)
Date: Tue, 12 Nov 2013 17:44:43 +0400
Subject: Coverity issues in gsm_rlcmac.cpp
In-Reply-To:
References: <20131111192408.GD30839@xiaoyu.lan>
Message-ID:

I will prepare a patch for these issues soon.

--
Regards,
Ivan Kluchnikov.
http://fairwaves.ru

From dwillmann at sysmocom.de Mon Nov 18 15:56:23 2013
From: dwillmann at sysmocom.de (Daniel Willmann)
Date: Mon, 18 Nov 2013 16:56:23 +0100
Subject: gprs_rlcmac_received_lost() in gprs_rlcmac_meas.cpp
Message-ID: <20131118155623.GA7003@adrastea.totalueberwachung.de>

Hello Andreas,

while looking through the osmo-pcu code to figure out why some
connections stall we had some problems making sense of the elapsed time
calculations in gprs_rlcmac_meas.cpp.
Could you confirm/deny my assumptions about it or explain the idea
behind it?

> gettimeofday(&now_tv, NULL);
> elapsed = ((now_tv.tv_sec - loss_tv->tv_sec) << 7)
>         + ((now_tv.tv_usec - loss_tv->tv_usec) << 7) / 1000000;

I assume here you're calculating the duration of the measurement period
so far and since you want to have sub-second accuracy you multiply
everything with 128. Why 128?
Is it because it simplifies the throughput calculation?
(tbf->meas.dl_bw_octets/elapsed in gprs_rlcmac_dl_bw())

> if (elapsed < 128)
>         return 0;

Is the intention here that the duration of the measurements is supposed
to be one second? So every second these measurements are printed out and
reset?

Regards
Daniel Willmann

From andreas at eversberg.eu Tue Nov 19 05:27:07 2013
From: andreas at eversberg.eu (Andreas Eversberg)
Date: Tue, 19 Nov 2013 06:27:07 +0100
Subject: gprs_rlcmac_received_lost() in gprs_rlcmac_meas.cpp
In-Reply-To: <20131118155623.GA7003@adrastea.totalueberwachung.de>
References: <20131118155623.GA7003@adrastea.totalueberwachung.de>
Message-ID: <528AF6AB.9070102@eversberg.eu>

Daniel Willmann wrote:
>> gettimeofday(&now_tv, NULL);
>> elapsed = ((now_tv.tv_sec - loss_tv->tv_sec) << 7)
>>         + ((now_tv.tv_usec - loss_tv->tv_usec) << 7) / 1000000;
>>
> I assume here you're calculating the duration of the measurement period
> so far and since you want to have sub-second accuracy you multiply
> everything with 128. Why 128?
> Is it because it simplifies the throughput calculation?
> (tbf->meas.dl_bw_octets/elapsed in gprs_rlcmac_dl_bw())
>

hi daniel,

yes. first i wanted something more accurate than one second, so i
multiplied the elapsed time with 128. 128 bytes are 1024 bits, so it
simplifies the calculation for kbits/s.

>
>> if (elapsed < 128)
>>         return 0;
>>
> Is the intention here that the duration of the measurements is supposed
> to be one second? So every second these measurements are printed out and
> reset?
>

after every transmitted downlink frame the gprs_rlcmac_dl_bw() function
is called. if at least one second has elapsed, the throughput over the
elapsed time is printed out. it is just a simple solution without the
requirement for a timer for each TBF.
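
the arithmetic described above can be written out as a minimal,
self-contained sketch. the names TimePoint, elapsed_128() and
throughput_kbit() are hypothetical and not taken from osmo-pcu;
TimePoint only stands in for struct timeval, and 1 kbit is assumed to
mean 1024 bit, as in the explanation above.

```cpp
#include <cassert>
#include <cstdint>

// hypothetical stand-in for struct timeval (seconds plus microseconds)
struct TimePoint {
    int64_t sec;
    int64_t usec;
};

// elapsed time from start to now in units of 1/128 second.
// same arithmetic as the quoted expression; multiplication replaces
// "<< 7" so that a negative microsecond difference is never left-shifted.
static int64_t elapsed_128(const TimePoint &start, const TimePoint &now)
{
    return (now.sec - start.sec) * 128
         + ((now.usec - start.usec) * 128) / 1000000;
}

// throughput in kbit/s, assuming 1 kbit = 1024 bit:
// octets * 8 bit over (elapsed / 128) s is octets * 1024 / elapsed bit/s,
// so the plain quotient octets / elapsed is already in kbit/s.
// returns 0 until one full second (128 units) has elapsed, mirroring the
// "elapsed < 128" check in the quoted code.
static int64_t throughput_kbit(int64_t octets, int64_t elapsed)
{
    assert(octets >= 0);
    if (elapsed < 128)
        return 0;
    return octets / elapsed;
}
```

for example, 12800 octets counted over exactly one second (elapsed = 128)
come out as 100 kbit/s, without any floating point and without a
per-TBF timer.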
regards,

andreas

From hfreyther at sysmocom.de Sun Nov 24 18:05:35 2013
From: hfreyther at sysmocom.de (Holger Hans Peter Freyther)
Date: Sun, 24 Nov 2013 19:05:35 +0100
Subject: Summary of osmo-pcu failures/defects
Message-ID: <20131124180535.GA8222@xiaoyu.lan>

Hi,

I have added rate counters to the PCU and noticed that a lot of frames
in the DL are re-sent without there actually being any NACK or
transmission errors. I have also re-produced a "stall" by just using
ping with a big enough data size and a low enough interval (the PCU
appears to transmit the ICMP PING but the MS never gets to send a PONG
packet).

In both cases it could be a scheduling issue (e.g. we don't schedule
the UL Packet ACK/NACK). Is any of you aware of it? I had already
highlighted that the scheduling is not fair. I wonder if we are seeing
starvation (and will re-factor to be able to unit test this).

In terms of scheduling. For the SBA we are already having a reservation
that we try to honor in multiple places. Maybe it is time to extend it
and have more reservations?

holger

From andreas at eversberg.eu Mon Nov 25 07:24:27 2013
From: andreas at eversberg.eu (Andreas Eversberg)
Date: Mon, 25 Nov 2013 08:24:27 +0100
Subject: Summary of osmo-pcu failures/defects
In-Reply-To: <20131124180535.GA8222@xiaoyu.lan>
References: <20131124180535.GA8222@xiaoyu.lan>
Message-ID: <5292FB2B.6060308@eversberg.eu>

Holger Hans Peter Freyther wrote:
> Hi,
>
> I have added rate counters to the PCU and noticed that a lot of frames
> in the DL are re-sent without there actually being any NACK or
> transmission errors.

dear holger,

i can only guess what causes the issues you describe.
so i suggest that we should do a debug session at the congress, if you
attend it and find the time.

generally there is always a re-send due to delay between pcu and the
radio interface. if all rlc/mac data blocks of an llc frame have been
sent to the phone, a packet downlink ack control block is requested and
scheduled. the pcu repeats all unacknowledged data blocks in a loop
until it receives that requested control block. this way the pcu starts
re-sending data blocks that might have been lost, before it actually
knows if and which blocks still need to be re-sent. (this procedure is
described in TS 04.60.)

> I have also re-produced a "stall" by just using
> ping with a big enough data size and a low enough interval (the PCU
> appears to transmit the ICMP PING but the MS never gets to send a PONG
> packet).
>

could this be an mtu issue at the phone? does the phone request an
uplink tbf to send a pong?

> In both cases it could be a scheduling issue (e.g. we don't schedule
> the UL Packet ACK/NACK).

in case of an ongoing uplink tbf, we ack/nack received or timed out
uplink rlc/mac blocks by using a control block. even if there is an
ongoing downlink tbf, all control blocks have priority over data blocks.

> Is any of you aware of it? I had already
> highlighted that the scheduling is not fair. I wonder if we are seeing
> starvation (and will re-factor to be able to unit test this).
>
> In terms of scheduling. For the SBA we are already having a reservation
> that we try to honor in multiple places. Maybe it is time to extend it
> and have more reservations?
>

uplink control messages are reserved in the tbf object (poll_fn). the
sba control messages are reserved in a separate structure, since sba are
not (or not yet) related to a tbf.
regards,

andreas

From hfreyther at sysmocom.de Mon Nov 25 08:21:08 2013
From: hfreyther at sysmocom.de (Holger Hans Peter Freyther)
Date: Mon, 25 Nov 2013 09:21:08 +0100
Subject: Summary of osmo-pcu failures/defects
In-Reply-To: <5292FB2B.6060308@eversberg.eu>
References: <20131124180535.GA8222@xiaoyu.lan> <5292FB2B.6060308@eversberg.eu>
Message-ID: <20131125082108.GK8222@xiaoyu.lan>

On Mon, Nov 25, 2013 at 08:24:27AM +0100, Andreas Eversberg wrote:

> could this be an mtu issue at the phone? does the phone request an
> uplink tbf to send a pong?

the first 10 pings going through and then nothing coming back? I don't
think it is a MTU issue. I don't know what the root cause is but the
structure of the code doesn't make it easy to find it.

> > In both cases it could be a scheduling issue (e.g. we don't schedule
> > the UL Packet ACK/NACK).
> in case of an ongoing uplink tbf, we ack/nack received or timed out
> uplink rlc/mac blocks by using a control block. even if there is an
> ongoing downlink tbf, all control blocks have priority over data blocks.

As I have already pointed out. Some control blocks can starve. I don't
know if it does but it is where I will continue to have a look.

> > Is any of you aware of it? I had already
> > highlighted that the scheduling is not fair. I wonder if we are seeing
> > starvation (and will re-factor to be able to unit test this).
> >
> > In terms of scheduling. For the SBA we are already having a reservation
> > that we try to honor in multiple places. Maybe it is time to extend it
> > and have more reservations?
> >
> uplink control messages are reserved in the tbf object (poll_fn). the
> sba control messages are reserved in a separate structure, since sba are
> not (or not yet) related to a tbf.

well.
In the tbf you have:

        else if (bts->sba()->find(trx->trx_no, ts, (fn + 13) % 2715648))
                LOGP(DRLCMAC, LOGL_DEBUG, "Polling cannot be "
                        "sheduled, because single block alllocation "
                        "already exists\n");

But there is nothing that checks if another TBF is setting poll_fn to a
frame that another tbf is already polling.

From hfreyther at sysmocom.de Mon Nov 25 23:18:03 2013
From: hfreyther at sysmocom.de (Holger Hans Peter Freyther)
Date: Tue, 26 Nov 2013 00:18:03 +0100
Subject: Summary of osmo-pcu failures/defects
In-Reply-To: <20131125082108.GK8222@xiaoyu.lan>
References: <20131124180535.GA8222@xiaoyu.lan> <5292FB2B.6060308@eversberg.eu> <20131125082108.GK8222@xiaoyu.lan>
Message-ID: <20131125231803.GB18797@xiaoyu.lan>

On Mon, Nov 25, 2013 at 09:21:08AM +0100, Holger Hans Peter Freyther wrote:

> As I have already pointed out. Some control blocks can starve. I don't
> know if it does but it is where I will continue to have a look.

Here is the starvation theory for the ping:

We have a LLC frame (or many queued up)... we schedule the polls and
at the final_ack indication we decide to re-use the TBF. This means that
we will schedule another PACKET DOWNLINK ASSIGNMENT. But at the same
time we either want to honor the "rh->si" or want to schedule the ACK
due to SEND_ACK_AFTER_FRAMES.

So we more or less want to send "PACKET DOWNLINK ASSIGNMENT" and the
"PACKET UPLINK ACK" at the same time (with more DL tbfs/traffic this
is getting more likely) but currently we will always prefer the PACKET
DOWNLINK ASSIGNMENT. This means that the uplink will starve (e.g. the
window will stall, rh->si being set, etc).

Fairness improvements:

* put ul_ass_tbf, dl_ass_tbf, ul_ack_tbf in an array and store the
  last index in the PDCH and iterate over it.
* move the TBF to the front of the ul_tbfs and dl_tbfs list so that it
  is not selected again.

From hfreyther at sysmocom.de Tue Nov 26 08:22:56 2013
From: hfreyther at sysmocom.de (Holger Hans Peter Freyther)
Date: Tue, 26 Nov 2013 09:22:56 +0100
Subject: Summary of osmo-pcu failures/defects
In-Reply-To: <5292FB2B.6060308@eversberg.eu>
References: <20131124180535.GA8222@xiaoyu.lan> <5292FB2B.6060308@eversberg.eu>
Message-ID: <20131126082256.GH32012@xiaoyu.lan>

On Mon, Nov 25, 2013 at 08:24:27AM +0100, Andreas Eversberg wrote:

> generally there is always a re-send due to delay between pcu and the
> radio interface. if all rlc/mac data blocks of an llc frame have been
> sent to the phone, a packet downlink ack control block is requested and
> scheduled. the pcu repeats all unacknowledged data blocks in a loop
> until it receives that requested control block. this way the pcu starts
> re-sending data blocks that might have been lost, before it actually
> knows if and which blocks still need to be re-sent. (this procedure is
> described in TS 04.60.)

Are you referring to 9.1.3.1 "Acknowledge state array V(B) for GPRS
TBF Mode"?

" If there are no further RLC data blocks available for transmission (i.e.
the RLC data block with BSN = V(S) does not exist), the sending side shall
transmit the oldest RLC data block whose corresponding element in V(B)
has the value PENDING_ACK, then the next oldest block whose corresponding
element in V(B) has the value PENDING_ACK, etc. "

With statistics in place it appears that for CS4 close to 50% of the
sent RLC blocks are re-sends. Is that what you expected when implementing
it?

holger

From andreas at eversberg.eu Wed Nov 27 07:49:17 2013
From: andreas at eversberg.eu (Andreas Eversberg)
Date: Wed, 27 Nov 2013 08:49:17 +0100
Subject: Summary of osmo-pcu failures/defects
In-Reply-To: <20131125231803.GB18797@xiaoyu.lan>
References: <20131124180535.GA8222@xiaoyu.lan> <5292FB2B.6060308@eversberg.eu> <20131125082108.GK8222@xiaoyu.lan> <20131125231803.GB18797@xiaoyu.lan>
Message-ID: <5295A3FD.10708@eversberg.eu>

Holger Hans Peter Freyther wrote:
> Here is the starvation theory for the ping:
>
> We have a LLC frame (or many queued up)... we schedule the polls and
> at the final_ack indication we decide to re-use the TBF. This means that
> we will schedule another PACKET DOWNLINK ASSIGNMENT. But at the same
> time we either want to honor the "rh->si" or want to schedule the ACK
> due to SEND_ACK_AFTER_FRAMES.
>

the idea behind the priority of PACKET DOWNLINK ASSIGNMENT is that i do
not want the phone to switch back to idle mode. if i ACKed all uplink
blocks, the MS might switch back to idle mode immediately and would
never receive the PACKET DOWNLINK ASSIGNMENT.
when there was an ongoing downlink tbf, the phone stays in transfer mode
until T3193 fires, so in case of a tbf re-use we can safely schedule a
PACKET UPLINK ACK/NACK prior to the PACKET DOWNLINK ASSIGNMENT.

> So we more or less want to send "PACKET DOWNLINK ASSIGNMENT" and the
> "PACKET UPLINK ACK" at the same time (with more DL tbfs/traffic this
> is getting more likely) but currently we will always prefer the PACKET
> DOWNLINK ASSIGNMENT. This means that the uplink will starve (e.g. the
> window will stall, rh->si being set, etc).
>

once we have sent the PACKET DOWNLINK ASSIGNMENT we can ACK the uplink
blocks right afterwards. this should resolve the stalled window just a
bit later. this is my theory, but your tests showed that this does not
work as i would expect.

From andreas at eversberg.eu Wed Nov 27 08:06:00 2013
From: andreas at eversberg.eu (Andreas Eversberg)
Date: Wed, 27 Nov 2013 09:06:00 +0100
Subject: Summary of osmo-pcu failures/defects
In-Reply-To: <20131126082256.GH32012@xiaoyu.lan>
References: <20131124180535.GA8222@xiaoyu.lan> <5292FB2B.6060308@eversberg.eu> <20131126082256.GH32012@xiaoyu.lan>
Message-ID: <5295A7E8.4010406@eversberg.eu>

Holger Hans Peter Freyther wrote:
> Are you referring to 9.1.3.1 "Acknowledge state array V(B) for GPRS
> TBF Mode"?
>

yes, the protocol sends all unacknowledged blocks in a loop until all
blocks are acknowledged and the tbf is complete.

> With statistics in place it appears that for CS4 close to 50% of the
> sent RLC blocks are re-sends. Is that what you expected when implementing
> it?
>

if there is a continuous download (opening a web page), there should be
almost no re-sends unless the window stalls or the download is complete.

in case of a ping with several small tbfs, there are re-sends due to the
delay between PCU and the radio interface: the PCU sends the final block
of a tbf (polling bit set) and then waits for PACKET DOWNLINK ACK/NACK.
a higher delay will cause more re-sends at the end of each tbf, so
smaller tbfs will show a higher percentage of re-sends.
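
to put rough numbers on this, here is a toy model, not taken from
osmo-pcu and not measured: assume a tbf carries data_blocks new rlc
blocks, and that afterwards the pcu re-sends one unacknowledged block
per block period for delay_blocks periods until the PACKET DOWNLINK
ACK/NACK arrives. the function name resend_share() and both parameters
are hypothetical, and the model ignores other tbfs sharing the timeslot.

```cpp
#include <cassert>

// toy model (assumption, see text): share of all transmitted blocks
// that are re-sends, when delay_blocks re-sends follow data_blocks
// first transmissions in the same tbf.
static double resend_share(unsigned data_blocks, unsigned delay_blocks)
{
    assert(data_blocks + delay_blocks > 0);
    return static_cast<double>(delay_blocks)
         / static_cast<double>(data_blocks + delay_blocks);
}
```

under these assumptions resend_share(10, 10) is 0.5, while
resend_share(100, 10) is about 0.09: with a fixed delay, the share of
re-sends only gets close to one half when a tbf is about as short as the
delay itself.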