Hello GSM community,
I realize that most of you over in Osmocom land would much rather see me submit Gerrit patches than write lengthy ML posts, but right now I really need some help with the algorithmic logic of a feature before I can develop patches implementing said feature - so please bear with me.
The fundamental question is: what is the most correct way for a GSM network (let's ignore divisions between network elements for the moment) to construct the DL speech frame stream for call leg B if it is coming from the UL of call leg A? I am talking about call scenarios where call leg A and call leg B use the same codec, thus no transcoding is done (TrFO), and let me also further restrict this question to old-style FR/HR/EFR codecs, as opposed to AMR.
At first the answer may seem so obvious that many people will probably wonder why I am asking such a silly question: just take the speech frame stream from call leg A UL, feed it to call leg B DL and be done with it, right? But the question is not so simple. What should the UL-to-DL mapper do when the UL stream hits a BFI instead of a valid speech frame? What should this mapper do if call leg A does DTXu but there is no DTXd on call leg B?
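To make the decision space concrete, here is how I picture the per-frame choice such a mapper faces - a sketch only, with all names made up by me, not any existing Osmocom API:

  #include <stdbool.h>

  enum ul_frame_class {
          UL_SPEECH,      /* good traffic frame carrying speech */
          UL_SID,         /* valid SID frame (comfort noise descriptor) */
          UL_BFI,         /* bad frame indication, nothing usable */
  };

  enum dl_action {
          DL_FORWARD,     /* pass the UL frame through unchanged */
          DL_SUBSTITUTE,  /* ECU-style substitution/muting */
          DL_GENERATE_CN, /* emit locally generated CN speech frames */
  };

  /* call leg B without DTXd: every 20 ms window needs a valid frame */
  static enum dl_action map_ul_to_dl(enum ul_frame_class ul, bool in_dtx_pause)
  {
          switch (ul) {
          case UL_SPEECH:
                  return DL_FORWARD;
          case UL_SID:
                  return DL_GENERATE_CN;
          case UL_BFI:
                  /* BFI inside a DTXu pause is expected silence, but
                   * BFI during active speech is a genuine loss */
                  return in_dtx_pause ? DL_GENERATE_CN : DL_SUBSTITUTE;
          }
          return DL_SUBSTITUTE;
  }

The hard part, of course, is what DL_GENERATE_CN has to produce for each codec - and that is the subject of the rest of this post.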
The only place in the 3GPP specs where I could find an answer to this question is TS 28.062 section C.3.2.1.1. Yes, I know that it's the spec for in-band TFO within G.711, a feature which probably no one other than me cares about, but that particular section - section C.3.2.1.1 specifically, you can ignore the rest of TFO for the purpose of this question - seems to me like it should apply to _any_ scenario where an FR/HR/EFR frame stream is passed directly from call leg A to call leg B without transcoding, including a self-contained Osmocom network where OsmoMSC switches from one MS to another without any external MNCC.
Let us first consider the case of the FR1 codec, which is the simplest. Suppose call leg A has DTXu but call leg B has no DTXd - one can't do DTXd on C0, so anyone whose 200 kHz of spectrum limits them to a single-carrier BTS (C0 only) cannot do DTXd at all. When Alice on call leg A is silent, her MS will send a SID every 480 ms and keep its Tx off the rest of the time, and the frame stream from the BTS serving her call leg will exhibit a SID frame in every 24th position and BFI placemarkers in all other positions.
So what should the DL frame stream going to Bob look like in this scenario? My reading of section C.3.2.1.1 (the second paragraph from the top is the one that covers this scenario) tells me that the *network* (set aside the question of which element) is supposed to turn that stream of BFIs with occasional interspersed SIDs into a stream of valid *speech* frames going to Bob, frames representing comfort noise as produced by a network-located CN generator. The spec says in that paragraph: "The Downlink TRAU Frames shall not contain the SID codeword, but parameters that allow a direct decoding."
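To illustrate the required transformation with one symbol per 20 ms frame (S = speech, I = SID, - = BFI, C = valid speech frame carrying network-generated comfort noise; the second I stands in for the SID repeating every 24th frame):

  UL from leg A:  S S S I - - - - - - - I - - ...
  DL to leg B:    S S S C C C C C C C C C C C ...

Every C frame must consist of ordinary speech parameters, with no SID codeword, that the MS decoder can decode directly.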
Needless to say, there is no code anywhere in Osmocom currently that does the above, thus current Osmocom is not able to produce the fancy TrFO behavior which the spec(s) seem to call for. (I said "spec(s)" vaguely because I only found a spec for TFO, not for TrFO, but I don't see any reason why this aspect of TFO spec shouldn't also apply to TrFO when the actual problem at hand is exactly the same.)
But no no no guys, I am *not* bashing Osmocom here, I am seeking to improve it! As it happens, fully implementing the complete set of TS 28.062 section C.3.2.1.1 rules (I shall hereafter call them C3211 rules for short) for the original FR1 codec would be quite easy, and I already have a code implementation that could be integrated into Osmocom. Themyscira libgsmfrp is a FLOSS library that implements a complete, spec-compliant Rx DTX handler for FR1, and it is 100% my own original work, not based on ETSI or TI or any other sources, thus no silly license issues. My idea is to integrate the same functions, appropriately renamed, repackaged and re-API-ed, into libosmocodec, and then invoke that functionality in OsmoBTS, in the code path that goes from RTP Rx to feeding TCH DL to the PHY layers.
But while FR1 is easy, doing the same for EFR is where the real difficulty lies, and this is the part where I come to the community for help. The key difference between FR1 and EFR that matters here is how their respective Rx DTX handlers are defined in the specs. For FR1 the Rx DTX handler is a separate piece: its interface to the main body of the decoder is another 260-bit FR1 frame (this time with no possibility of SID or BFI), and the DTX specs (06.31 plus 06.11 and 06.12) define and describe the needed Rx DTX handler in terms of emitting that secondary 260-bit FR1 frame. Thus implementing this functionality in Themyscira libgsmfrp was a simple matter of taking the logic described in the specs and turning it into code.
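In code terms, that Rx DTX handler reduces to a self-contained frame-in, frame-out function, something of this shape (an illustrative prototype I made up for this post, not the actual libgsmfrp API and not a proposed libosmocodec API):

  #include <stdint.h>

  /* SID parameter memory, hangover counter, CN PRNG state etc. */
  struct fr1_rx_dtx_state;

  /* In: one 260-bit FR1 frame (speech or SID), or NULL on BFI.
   * Out: always one valid 260-bit FR1 *speech* frame - never SID,
   * never a gap - ready for any standard FR1 decoder or for TCH DL. */
  void fr1_rx_dtx_process(struct fr1_rx_dtx_state *st,
                          const uint8_t *frame_in,
                          uint8_t *speech_frame_out);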
But for EFR the specs do not define the Rx DTX handler as a separate piece; instead it is integrated into the guts of the full decoder. There is a decoder, presented as published C source from ETSI, that takes as input a 244-bit EFR frame, which can be either speech or SID, *plus* a BFI flag, and emits a block of 160 PCM samples as output - all Rx DTX logic is buried inside, intertwined with the actual speech decoder operation, which is naturally quite complex.
I've already spent a lot of time looking at the reference C implementation of EFR from ETSI - I kinda had to, as I did the rather substantial work of turning it into a usable function library, with state structures and a well-defined interface instead of global vars and namespace pollution (the result is Themyscira libgsmefr) - but I am still no closer to being able to implement C3211 functionality for this codec.
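To show the contrast in interface terms (both prototypes are sketches I made up, not the actual libgsmefr API):

  #include <stdint.h>

  struct efr_dec_state;

  /* What ETSI gives us: all Rx DTX and ECU logic is buried behind
   * this one call, and the only output is linear PCM. */
  void efr_decode_frame(struct efr_dec_state *st,
                        const uint8_t frame[31], /* 244-bit EFR frame */
                        int bfi, int16_t pcm_out[160]);

  /* What C3211 effectively requires, and what exists nowhere that
   * I know of: SID plus hangover history in, valid 244-bit EFR
   * *speech* frames out, decoding to equivalent comfort noise. */
  struct efr_cn_state;
  void efr_cn_speech_frame(struct efr_cn_state *st,
                           uint8_t speech_frame_out[31]);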
The problem is this: starting with an EFR SID frame and the previous history of a few speech frames (the hangover period), how would one produce output EFR speech frames (not SID) that represent comfort noise, as C3211 says is required? We can all easily look at ETSI's original code that generates CN as part of the standard decoder - but that code generates linear PCM output, not secondary EFR speech frames that represent CN. There is the main body of the speech decoder, and there are conditions throughout that slightly modify this decoder logic in subtle ways for CN generation and/or for ECU-style substitution/muting - but no guidance on how one could construct "valid speech" EFR frames that would produce a similar result when fed to the standard decoder in the MS after crossing radio leg B.
This is where I could really use some input from more senior and more knowledgeable GSM-ers: does anyone know how mainstream commercial GSM infra vendors (particularly "ancient" ones of pure T1/E1 TDM kind) have solved this problem? What do _they_ do in the scenario of call leg A with DTXu turning into call leg B without DTXd?
Given that those specs were written in the happy and glorious days when everyone used 2G, when GSM operators had lots of spectrum, and when most networks operated large multi-ARFCN BTSes with frequency hopping, I figure that almost everyone probably ran with DTXd enabled when that spec section was written - hence I wonder if the authors of the TFO spec failed to appreciate the magnitude of what they were asking implementors to do when they stipulated that a UL-to-DL mapping from DTXu-on to DTXd-off "shall" emit no-SID speech frames that represent TFO-TRAU-generated CN. And I wonder if the actual implementors ignored that stipulation even Back In The Day...
Here is one way we might be able to "cheat": what if we implement a sort of fake DTXd in OsmoBTS for times when real DTXd is not possible because we only have C0? Here is what I mean: suppose the stream of TCH frames about to be sent to the PHY layer (perhaps the output of my proposed, to-be-implemented UL-to-DL mapper) is the kind that would be intended for a DTXd-enabled DL in the original GSM architecture, with all speech pauses filled with repeated SIDs, every 20 ms without fail. A traditional DTXd BTS is supposed to transmit only those SIDs that either immediately follow a speech frame or fall in the SACCH-aligned always-Tx position, and to turn the Tx off at all other times. We can't actually turn off Tx at those "other" times when we are C0 - but what if we create a "fake DTXd" effect by transmitting a dummy FACCH containing an L2 fill frame at exactly the times when we would do real DTXd if we could? The end effect is that the spec-based Rx DTX handler in the MS will "see" the same thing as with real DTXd: receiving FACCH in all those "empty" 20 ms frame windows will give it BFI=1, exactly as if radio Tx were truly off and the MS were listening to radio noise.
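In pseudo-C, the per-window selection I have in mind would look something like this (a sketch only, all names made up):

  #include <stdbool.h>

  enum dl_tx_kind { TX_SPEECH, TX_SID, TX_DUMMY_FACCH };

  /* The DL frame stream already has every speech pause filled with
   * repeated SIDs, as if destined for a DTXd-capable downlink. */
  static enum dl_tx_kind fake_dtxd_select(bool frame_is_sid,
                                          bool prev_frame_was_speech,
                                          bool sacch_aligned_window)
  {
          if (!frame_is_sid)
                  return TX_SPEECH;
          /* real DTXd would transmit a SID only in these two cases... */
          if (prev_frame_was_speech || sacch_aligned_window)
                  return TX_SID;
          /* ...and turn Tx off otherwise; on C0 we cannot turn Tx off,
           * so send a dummy FACCH (L2 fill frame) instead, which the
           * MS Rx DTX handler sees as BFI=1, same as real Tx-off. */
          return TX_DUMMY_FACCH;
  }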
Anyway, I would love to hear other people's thoughts on these ideas, especially if someone happens to know how traditional GSM infra vendors handled those pesky requirements of TS 28.062 section C.3.2.1.1 for UL-to-DL mapping.
Sincerely, Your GSM-obsessed Mother Mychaela
*DTX*
I am not very familiar with DTX, so I may have some of these wrong:
In practice, I guess one way is to either have DTX support everywhere, or disable DTX support everywhere?
At the CCC congress, we had a transcoding step between GSM and the phone operating center. Makes me wonder, the DL voice probably never saw DTX, because I doubt the conversion from "plain audio" to our GSM codec would feature DTX?
I'm also thinking, the ability to do DTX maybe should also be a part of SDP codec negotiation? i.e. enable DTX only when both call legs support it? I wonder if there is a fmtp parameter for DTX, or maybe we should invent one.
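For example, something like this - where the dtx parameter is invented, just to sketch the idea:

  m=audio 16398 RTP/AVP 3
  a=rtpmap:3 GSM/8000
  a=fmtp:3 dtx=1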
I haven't thought about this before, so details are welcome.
*TLDR mode: enable*
On another note: Mychaela, if I may ask you, please, to help us communicate, try to write less text to say the same things. It is in your interest and an important part of community netiquette to make it easy for others to respond to your messages. Aim for writing a technical "TL;DR" section instead of a text flow. It may take more time and it may hurt a bit to edit things out, but it is worth it N times over, N being the number of readers. Thanks!
"If I had more time, I would have written a shorter letter." -- Blaise Pascal, but in French, 1657
~N
Hi Neels,
> In practice, I guess one way is to either have DTX support everywhere, or disable DTX support everywhere?
There is DTXd for DL and DTXu for UL, and they are configured separately in OsmoBSC - but each of the two is global per BTS. Plus there is this added complication: DTXu is always allowed and almost always desired, but DTXd is only possible if your BTS has more carriers than just C0.
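For illustration, the per-BTS configuration in OsmoBSC looks roughly like this (check the vty reference for the exact current syntax):

  network
   bts 0
    dtx uplink force
    dtx downlink

with 'dtx downlink' only being meaningful on a multi-TRX BTS.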
> At the CCC congress, we had a transcoding step between GSM and the phone operating center. Makes me wonder, the DL voice probably never saw DTX, because I doubt the conversion from "plain audio" to our GSM codec would feature DTX?
That conversion is called the encoder, and it can operate with DTX enabled or disabled. I don't know how you had yours configured.
> I'm also thinking, the ability to do DTX maybe should also be a part of SDP codec negotiation? i.e. enable DTX only when both call legs support it? I wonder if there is a fmtp parameter for DTX, or maybe we should invent one.
Can you please keep Osmocom CN compatible with traditional non-SDP interfaces? I currently connect my PSTN gateway to OsmoMSC via MNCC, I don't use SDP, and I don't want to. Please keep this option available.
SDP aside, if you do what you propose (enable DTX only when both sides support it), the result will be that DTXu will be disabled whenever the far end does not have DTXd, which will be 100% of the time with hobby networks operating in tiny slivers of spectrum. The effect of disabling DTXu will be that the MS will needlessly burn its battery. TL;DR: bad idea.
M~
Just sharing my general impression that I'm a bit out of my depth discussing DTX, and I currently have no hands left to juggle another ball. But I'm happy to help improve specific implementations, when it is clear what we want.
~N
Hi Neels,
> Just sharing my general impression that I'm a bit out of my depth discussing DTX, and I currently have no hands left to juggle another ball. But I'm happy to help improve specific implementations, when it is clear what we want.
This patch is a small step toward proper handling of leg A UL to leg B DL mapping:
https://gerrit.osmocom.org/c/libosmocore/+/32184
but it has this other patch as prerequisite:
https://gerrit.osmocom.org/c/libosmocore/+/32183
but that prerequisite got side-tracked by prefix naming issues; I added this third patch to my submission in the hope of a solution:
https://gerrit.osmocom.org/c/libosmocore/+/32197
but based on the feedback I am getting, I may have gone the wrong way there. I added one more comment under that last 32197 patch, and right now it would be most helpful if someone senior (ideally Harald) could read that last comment and decide which way we should go. On this pesky naming issue I'll happily go with whatever the community decides (I am after functionality, not naming), but I am just trying to get everyone on the same page, understanding the implications: the new symbols will still get pulled into the namespace whether they live in a separate header or not.
M~
On Wed, Apr 05, 2023 at 07:33:16AM -0800, Mychaela Falconia wrote:
> pesky naming issue I'll happily go with whatever the community decides (I am after functionality, not naming)
Naming is hard, but naming is also fundamentally extremely important. It makes a big difference in code maintenance and in humans understanding APIs and implementations.
Let's discuss all else in the gerrit patch conversations...
~N