Attention is currently required from: osmith. neels has posted comments on this change. ( https://gerrit.osmocom.org/c/osmo-bsc/+/30307 )
Change subject: fsms: use configurable timers instead of T23042 ......................................................................
Patch Set 1:
(7 comments)
Commit Message:
https://gerrit.osmocom.org/c/osmo-bsc/+/30307/comment/59b6a3be_4d7e915c PS1, Line 9: It looks like T23042 was a placeholder for timers to be filled in later, i can confirm that it actually 100% was nothing but a placeholder, "obvious" when familiar with use of the numbers 23 and 42 as placeholders (like "foo" and "bar"). I mean to say, you can just write that it was a placeholder period.
https://gerrit.osmocom.org/c/osmo-bsc/+/30307/comment/6612b98d_409e6c9d PS1, Line 15: Previous timeout for the states was 5s. ", from using the default of 5 for undefined timers."
https://gerrit.osmocom.org/c/osmo-bsc/+/30307/comment/5983b7af_1f86ab81 PS1, Line 20: ll "immediately"
https://gerrit.osmocom.org/c/osmo-bsc/+/30307/comment/81aef53b_86ea2464 PS1, Line 28: * Use new X29, 10s (5s from X5 + 5s from X6) I think the right thing here is to remove this timeout; this needs no timeout at all because we can rely on the lchan_fsm to either return HO_EV_LCHAN_ACTIVE or HO_EV_LCHAN_ERROR after the usual timeouts set for lchan activation. IOW since it is internal to osmo-bsc one of the two events is guaranteed to occur.
If we superimpose a timer on top of the lchan timeouts, configuring larger lchan activation timeouts becomes complex, because the user has to take care to also allow a larger timeout for the same procedure during HO.
https://gerrit.osmocom.org/c/osmo-bsc/+/30307/comment/98dc5864_97cc46af PS1, Line 46: * Use new X30, 5s i thiiink that this timeout should also be dropped, because again we have existing timeouts for the indivdiual steps this is waiting for, and we would introduce a secondary timeout on top of the existing ones.
but it's not so clear to see through the guaranteed handling of all failure cases of those individual steps.
WAIT_LCHAN_ESTABLISHED waits for both RSL/RR and RTP/MGW to be complete. If RSL/RR is done first, this state waits for RTP/MGW to also complete, before we go on to mark the new lchan as the primary lchan for the subscriber connection. (it helps a bit to look at the handover.png chart, `make -C osmo-bsc/doc/ msc`)
- RSL/RR should be guarded by T3103 above, it is completed on RR HO Complete.
- RTP/MGW is guarded by the lchan_rtp_fsm. If that fails/times out, it dispatches LCHAN_EV_RTP_ERROR to the lchan_fsm, then - lchan_fail()/_lchan_on_activation_error() dispatches HO_EV_LCHAN_ERROR to the handover_fsm. - And, if I see correctly, there is a bug: handover_fsm does not permit HO_EV_LCHAN_ERROR in this state HO_ST_WAIT_LCHAN_ESTABLISHED.
I'm afraid we need to check whether we have a ttcn3 test for an RTP/MGW aka MGCP timeout during handover, or need to create one if it is missing. Then if I'm correct we should see during that test a log error like "HO_EV_LCHAN_ERROR not permitted" in state HO_ST_WAIT_LCHAN_ESTABLISHED.
So if I'm correct, we add HO_EV_LCHAN_ERROR to the in_event_mask, make that event trigger a ho_fail(), and then don't need X30.
Could you check on that? Feel free to nudge if you'd like some support with this.
Patchset:
PS1: On a closer look I notice that two of the timers should probably not be there at all, which i apparently didn't figure out properly when i wrote the ho fsm...
File src/osmo-bsc/net_init.c:
https://gerrit.osmocom.org/c/osmo-bsc/+/30307/comment/f6df55f5_a64e175e PS1, Line 77: { .T = -30, .default_val = 5, .desc = "Timeout for establishing new lchan at the end of handover" }, (if X30 needs to stay, i'd put a desc = "Timeout for establishing MGW endpoint for handover target lchan")