Hi all,
T200 is the LAPDm re-transmission timer: once it expires, the sender
assumes a transmitted L2 frame was lost and starts recovery.
Until recently, OsmoBTS ignored the T200 values that the BSC specified
via OML, always falling back to the (relatively long) T200 values that
exist in libosmocore (1s for the main channel, 2s for the associated
channel).
As you know, OsmoBTS is used in production in this configuration and we
never received any associated bug reports.
Recently, in commit e9f12acbeb5a369282719f8e0deecc88034a5488 I started
to use the T200 values as communicated by the BSC via OML. This is what
proprietary BTSs like the BS-11, nanoBTS etc. have (supposedly) been
using all along anyway, so I considered it a bug fix.
However, as it turns out, it breaks our LAPDm implementation in many
ways. LAPDm performance degrades so badly that you cannot even transmit
a single SMS anymore, and even Location Updates only occasionally
succeed.
You can see the erroneous behavior in the attached PCAP file showing
OsmoBTS-generated GSMTAP and RSL. Also attached are log file outputs of
LAPDm logging for mo-sms and mt-sms. In them you can find troubling
lines like 'S frame response with F=1 error' which should never
happen...
I suspect two issues related to this:
1) Our lapdm.c code uses regular osmo_timer_* functions to determine
   when T200 expires, rather than a GSM frame number time-base.
   This wouldn't be a problem in a synchronous real-time environment.
   However, in OsmoBTS (as in OsmocomBB), there is a relatively long
   queue/delay between the point where a frame is pulled out of the
   bottom end of LAPDm and its actual transmission on the radio
   interface. Yet LAPDm's T200 starts ticking from the moment the
   frame was pulled out, rather than from the moment transmission
   actually started.
   In order to change this, I suggest that we either change the LAPDm
   timers to work on frame numbers passed up from L1 via every L1SAP
   primitive (comparing RTS.ind for downlink vs. DATA.ind for uplink),
   or simply keep a per-PHY/per-TRX measurement of the 'round trip
   time between the actual radio and the L2'. We can then compensate
   for this delay by adding it to T200.
   I briefly tried the latter approach here; one particular hardware
   indicated a round-trip time of about 56ms (13 frames difference
   between RTS.ind and DATA.ind). However, it didn't help: even
   compensating by 120ms was not sufficient. This needs to be
   revisited.
2) I think the libosmogsm LAPDm implementation is actually buggy,
   specifically in situations where T200 expires. We don't see that
   often, as the 1s/2s defaults are so long that in reality T200
   rarely expires.
Once the T200 value is reduced, the probability of running into T200
expiration increases, and so does the probability of seeing related
problems.
Our existing lapdm unit tests don't seem to cover timing-related
behavior, so that is definitely something that needs improvement.
Any help on those issues is appreciated.
Meanwhile, I decided to revert to the libosmogsm T200 defaults by
means of commit 3ca59512d2f4eb1f87699e8fada67f33674918b4.
Regards,
Harald
--
- Harald Welte <laforge(a)gnumonks.org>
http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
(ETSI EN 300 175-7 Ch. A6)