OS#3764: backward compatibility: It's not just API

This is merely a historical archive of years 2008-2021, before the migration to mailman3.

A maintained and still updated list archive can be found at https://lists.osmocom.org/hyperkitty/list/OpenBSC@lists.osmocom.org/.

Harald Welte laforge at gnumonks.org
Tue Jan 22 21:12:36 UTC 2019


I've been investigating today what is the cause of build failures like this:
https://build.opensuse.org/package/live_build_log/network:osmocom:latest/osmo-sgsn/Debian_9.0/x86_64

where "make check" suddenly fails with things like

[   80s] -==> got signal NS_UNBLOCK, NS-VC 0x2001/1.2.3.4:1111
[   80s] -
[   80s]  MESSAGE to BSS at 0x01020304:1111, msg length 1
[   80s]  07
[   80s]
[   80s] +==> got signal NS_UNBLOCK, NS-VC 0x2001/1.2.3.4:1111
[   80s] +
[   80s]  result (UNBLOCK) = 1

as the diff.  So clearly some re-ordering happening here.

This appears when building the "latest" osmo-sgsn version (1.3.0) against a
current libosmocore.  As libosmocore remains API compatible, this is building
fine, as expected.

However, ther is commit 

commit 797558ea1768e464f9559c5f7a4f3f4285c5de25
Author: Stefan Sperling <ssperling at sysmocom.de>
Date:   Mon Nov 19 17:18:41 2018 +0100

    send NS_POUT_UNBLOCK_ACK before signalling S_NS_UNBLOCK

    In gprs_ns_process_msg(), we were dispatching the S_NS_UNBLOCK
    signal before sending out the NS_POUT_UNBLOCK_ACK message.

    Signal handlers might send messages to the other side, assuming
    that NS is now unblocked. However, since such messages will arrive
    before the UNBLOCK_ACK message the receiver might discard them.
    This problem has been observed with our TTCN3 BSSGP_Emulation
    as a peer to osmo-pcu.

    This patch makes TTCN3 PCU TC_paging() test pass regardless of
    whether the test or osmo-pcu is started first. Before this patch,
    this test would only pass if the test was started before osmo-pcu.
    A remaining problem is that the test does not yet keep passing
    reliably unless osmo-pcu is restarted between test runs.

    Change-Id: I3af54a14bb6bcfa167c9a9d9f67835e7f5b9f1bb
    Related: OS#2890
    Related: OS#2388

which is a fix, and is appreciated.  However, it causes breakage, at
least in unit tests that rely on testing the old behavior, such as the gbproxy
test.

This immediately showed up, and one day later we have the following commit

commit eefb70df2c6aa7b1362d5e8decf52af3ff95ecb9
Author: Stefan Sperling <ssperling at sysmocom.de>
Date:   Tue Nov 20 11:33:17 2018 +0100

    update gbproxy test expected output

    libosmocore commit 797558ea1768e464f9559c5f7a4f3f4285c5de25
    changed the order of NS_UNBLOCK_ACK transmission dispatching
    of the NS_UNBLOCK signal. Update expected output of gbproxy
    tests accordingly to make these tests pass again.

    Change-Id: Ia3df811755b1c88cf7a27a466677b24a6c32fd8e
    Related: OS#2388

which is making sure the test passes from now onwards, but which of course
doesn't do anything about older, already-released versions of osmo-sgsn, which
will all fail to pass "make check" - forever.  There may be people who want
to rely on older versions of osmo-sgsn in the future, and they will think
they have non-working software if "make check" doesn't pass :/

So what do we take from this?  Be careful with changing existing library code
semantics.  Even if the function signature is not modified, there are existing
applications and/or test cases that make assumptions.  And we should not break
those assumptions.

If something like this shows up ever again, I think the correct approach is to

1) keep the old semantics as it is, making sure that old applications/tests
   will continue to work/pass, even on modern library versions or master of the day

2) introduce a new function/symbol implementing the new semantics.  Or, in this
   case (as it's some library-internal function dispatching events/signals), probably
   some kind of flag is needed.  New code, depending on new library
   versions, may then set that flag and elicit the new behavior, while old code
   gets the old behavior.

Please let's make sure we don't fall into the same trap in the future again.

The process of avoiding this can be helped by having automatic build
testing of old/previous applications against "master of the day" of the
libraries.  Let's track the latter in #3765.

Happy hacking,
	Harald

-- 
- Harald Welte <laforge at gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)



More information about the OpenBSC mailing list