Dear Jolly,
from a quick look it appears to be that a very long running connection could overflow lchan->s to a negative number. Could you either make this code robust or explain why it is not needed? E.g. if we assume that this happens once per multiframe the counter will overflow within 1.3 hours?
holger
Holger Hans Peter Freyther wrote:
Dear Jolly,
from a quick look it appears to be that a very long running connection could overflow lchan->s to a negative number. Could you either make this code robust or explain why it is not needed? E.g. if we assume that this happens once per multiframe the counter will overflow within 1.3 hours?
holger
hi holger,
lchan->s must never raise above btsb->radio_link_timeout. look at the commit:
+ /* count up radio link counter S */ + lchan->s += 2; + if (lchan->s > btsb->radio_link_timeout) + lchan->s = btsb->radio_link_timeout;
if there would be no limit, a loss of link might also take hours until detected.
have i overseen something?
regards,
andreas
On Tue, Mar 12, 2013 at 10:13:01AM +0100, jolly wrote:
dear jolly,
- /* count up radio link counter S */
- lchan->s += 2;
- if (lchan->s > btsb->radio_link_timeout)
- lchan->s = btsb->radio_link_timeout;
that is what I was missing when browsing the code in gtik. I think there is a small glitch the other way around.
a.) We get criteria value == 0 through OML (for whatever reason)
+ btsb->radio_link_timeout = val[1]
b.) lchan_activate sets lchan->s == 0
+ lchan->s = btsb->radio_link_timeout;
c.) lchan->s -= 1
+ /* count down radio link counter S */ + lchan->s--;
d.) if (lchan->s == 0)
Will not be true as the lchan->s is -1. Is it intended that setting a '0' will disable the link failure checking?
kind regards holger
On Tue, Mar 12, 2013 at 01:24:44PM +0100, Holger Hans Peter Freyther wrote:
Good Morning,
that is what I was missing when browsing the code in gtik. I think there is a small glitch the other way around.
a.) We get criteria value == 0 through OML (for whatever reason)
btsb->radio_link_timeout = val[1]
b.) lchan_activate sets lchan->s == 0
lchan->s = btsb->radio_link_timeout;
c.) lchan->s -= 1
/* count down radio link counter S */
lchan->s--;
d.) if (lchan->s == 0)
Will not be true as the lchan->s is -1. Is it intended that setting a '0' will disable the link failure checking?
the easiest way to avoid the problem of the number to leave the range of 0 to radio_link_timeout is to use the OSMO_MAX and OSMO_MIN (now that these macros are fixed).
Underflow: lchan->s = OSMO_MAX(lchan->s - 1, 0);
Overflow: lchan->s = OSMO_MIN(lchan->s + 2, btsb->radio_link_timeout);
E.g. the following code (I only compiled it):
diff --git a/src/osmo-bts-sysmo/l1_if.c b/src/osmo-bts-sysmo/l1_if.c index df660c5..16375ef 100644 --- a/src/osmo-bts-sysmo/l1_if.c +++ b/src/osmo-bts-sysmo/l1_if.c @@ -684,7 +684,7 @@ static int handle_ph_data_ind(struct femtol1_hdl *fl1, GsmL1_PhDataInd_t *d /* process radio link timeout coniter S */ if (data_ind->msgUnitParam.u8Size == 0) { /* count down radio link counter S */ - lchan->s--; + lchan->s = OSMO_MAX(lchan->s - 1, 0); DEBUGP(DMEAS, "counting down radio link counter S=%d\n", lchan->s); if (lchan->s == 0) @@ -694,9 +694,7 @@ static int handle_ph_data_ind(struct femtol1_hdl *fl1, GsmL1_PhDataInd_t *d } if (lchan->s < btsb->radio_link_timeout) { /* count up radio link counter S */ - lchan->s += 2; - if (lchan->s > btsb->radio_link_timeout) - lchan->s = btsb->radio_link_timeout; + lchan->s = OSMO_MIN(lchan->s + 2, btsb->radio_link_timeout); DEBUGP(DMEAS, "counting up radio link counter S=%d\n", lchan->s); }
Holger Hans Peter Freyther wrote:
the easiest way to avoid the problem of the number to leave the range of 0 to radio_link_timeout is to use the OSMO_MAX and OSMO_MIN (now that these macros are fixed).
Underflow: lchan->s = OSMO_MAX(lchan->s - 1, 0);
It avoids the problem, but why is it safe?
The mere possibility that the counter can become negative suggests an imbalance somewhere in the code - but I don't know if that's the case, which is why I ask.
//Peter
On Wed, Mar 13, 2013 at 05:02:02PM +0100, Peter Stuge wrote:
It avoids the problem, but why is it safe?
The mere possibility that the counter can become negative suggests an imbalance somewhere in the code - but I don't know if that's the case, which is why I ask.
Exactly, the point is that the current code is broken. And in general using OSMO_MIN/OSMO_MAX is better than hand rolling code. And if a range is from a to b it is better to clamp it on both sides that just one.
For this specific issue. I wouldn't mind making the value '0' illegal, GSM 12.21 doesn't specify the legal range and as '0' and '1' will behave the same with the fixed code it does make sense to forbid '0'.
Anyway, the current code is broken so Andreas please fix it ASAP.
holger
On Tue, Mar 12, 2013 at 09:08:21AM +0100, Holger Hans Peter Freyther wrote:
from a quick look it appears to be that a very long running connection could overflow lchan->s to a negative number. Could you either make this code robust or explain why it is not needed? E.g. if we assume that this happens once per multiframe the counter will overflow within 1.3 hours?
clang lately got a nice feature for integer over/under-flow detection[1] during runtime, but of course it makes the resulting code extremely slow. I am not sure if it is usable in this case.
Kind regards, -Alexander Huemer
On Tue, Mar 12, 2013 at 10:25:09AM +0100, Alexander Huemer wrote:
clang lately got a nice feature for integer over/under-flow detection[1] during runtime, but of course it makes the resulting code extremely slow. I am not sure if it is usable in this case.
Ah nice. I started to use clang3.3 over the weekend for -fsanitize=address, didn't know about the integer instrumentation.
thank you for the pointer holger