Dear all,
this is a post about https://osmocom.org/issues/2362 in which I try to resolve the "naming fuck-up" regarding the automatic export of rate_ctr.
In short: * CTRL uses '.' to separate individual elements/nodes of the command * a lot of rate_ctr groups we use so far use '.' in their group names (e.g. 'bssgp.bss_ctx') and counter names (e.g. 'tx.bytes') * This makes the counters un-exportable via CTRL
I have a libosmocore patch that verifies the validity of counter names (to not contain spaces or dots) at registration time. This way we can catch any application with erroneous behaviour.
However, this causes the following problem: New libosmocore versions will make old apps crash or at least fail to have propre counters, as their names are wrong :(
Also, I don't like to replace all '.' in the counter with '_', as in seen in the example of 'bssgp.bss_ctx' there is a semantic difference. 'bssgp.' is denoted to indicate the code module from which the counter is, and 'bss_ctx' denotes the specific object whose counters we refer to. Manually replacing this with 'bssgp_bss_ctx' is ugly.
One alternative would be to replace '.' with ':' or '/', which are not characters with a special functoin in the CTRL syntax. This could be even done automatically at counter registration time, simply replace all occurrences of '.' with ':' (and log a warning as a reminder to fix the code?).
I prefer that more, but the question is which characters should be reserved for CTRL syntactic purpoess, and which are free to use by the applications to name elements of the CTRL string/node.
Any input to this? I personally would go for ':'. As CTRL is very loosely modelled after sysfs, ':' also occurs frequently in sysfs names such as '/sys/bus/usb/devices/1-3:2.0'
So we'd keep the '.' as path/node separator, and we use ':' as separator inside counter (and possibly other) names, which is opaque to CTRL.
Opinions?
Should we further restrict the CTRL interface strings (and those of systems exporting to CTRL) to standard US 7-bit ASCII with a limited set of special characters such as ":-_@" but prevent any non-printable chars or special chars like "{|}()~[]^`'?<>=;/+*&%$#!"? I would support such a motion. We could even make it more general and use that for all our identifier strings and have a general validation function that all code modules call whenever validating a string identifier used, such as e.g. osmo_fsm names, etc.
On Tue, Oct 03, 2017 at 08:49:04AM +0800, Harald Welte wrote:
- CTRL uses '.' to separate individual elements/nodes of the command
- a lot of rate_ctr groups we use so far use '.' in their group names (e.g. 'bssgp.bss_ctx') and counter names (e.g. 'tx.bytes')
- This makes the counters un-exportable via CTRL
I'm thinking, we could make the current rate_ctr_group_alloc() replace '.' with ':' automatically to fix all older and current builds' CTRL for rate_ctr. At the same time we can deprecate rarate_ctr_group_alloc() and provide rate_ctr_group_alloc2(), which then aborts the program when rate_ctr names don't adhere to strict identifier rules. Thus we get a compile-time warning for applications that haven't reviewed correctness of rate_ctr naming ("rate_ctr_group_alloc() is deprecated...") and once an application has moved on we make sure no new non-compliant names are introduced. However, newly introduced names will get checked only during run time.
Another addition could be a script like verify_value_string_arrays_are_terminated.py but for struct rate_ctr_group_desc arrays' identifier adherence, to verify even before compilation.
Should we further restrict the CTRL interface strings (and those of systems exporting to CTRL) to standard US 7-bit ASCII with a limited set of special characters such as ":-_@" but prevent any non-printable chars
For identifier validation, we should probably include that identifiers must not start with a number.
I think we could also include ",~+^"
as pseudocode:
valid = isalpha(str[0]) && each(str[1..len] as c){ isalnum(c) || c in ":-_@,~+^" }
Losely related is the OsmoHLR idea for CTRL interface to use names like
SET subscriber.by-imsi.1234567898765.ps-enabled 1
where a CTRL identifier name is used for inputting an IMSI string. If we do the same identifier checking during incoming CTRL interface commands, we would forbid the IMSI name because it starts with a number.
Thinking now that we could go for something like
subscriber.by-imsi=1234567898765.ps-enabled 1 or maybe rather subscriber.by-imsi-1234567898765.ps-enabled 1
I would have liked to use existing CTRL parsing to separate out the IMSI number without added char parsing... but it's not too hard to add a ctrl_cmd_lookup() that checks starts_with('by-imsi-') and then takes &str[8].
~N
Hi Neels,
On Wed, Oct 04, 2017 at 12:48:48AM +0200, Neels Hofmeyr wrote:
On Tue, Oct 03, 2017 at 08:49:04AM +0800, Harald Welte wrote:
- CTRL uses '.' to separate individual elements/nodes of the command
- a lot of rate_ctr groups we use so far use '.' in their group names (e.g. 'bssgp.bss_ctx') and counter names (e.g. 'tx.bytes')
- This makes the counters un-exportable via CTRL
I'm thinking, we could make the current rate_ctr_group_alloc() replace '.' with ':' automatically to fix all older and current builds' CTRL for rate_ctr.
Has been implemented in https://gerrit.osmocom.org/#/c/4131
At the same time we can deprecate rarate_ctr_group_alloc() and provide rate_ctr_group_alloc2(), which then aborts the program when rate_ctr names don't adhere to strict identifier rules. Thus we get a compile-time warning for applications that haven't reviewed correctness of rate_ctr naming ("rate_ctr_group_alloc() is deprecated...") and once an application has moved on we make sure no new non-compliant names are introduced. However, newly introduced names will get checked only during run time.
I think this is too much effort. We can easily grep through all our applications and replace the names without missing something. External (non-Osmocom) users have never approached us with any feedback, so I doubt they even exist.
Another addition could be a script like verify_value_string_arrays_are_terminated.py but for struct rate_ctr_group_desc arrays' identifier adherence, to verify even before compilation.
Also probably overkill? I mean, an unterminated value_string_array will end up in a bug. But a counter name with '.' will only result in slight increased memory use (once, at startup, when doing the . -> : conversion).
Should we further restrict the CTRL interface strings (and those of systems exporting to CTRL) to standard US 7-bit ASCII with a limited set of special characters such as ":-_@" but prevent any non-printable chars
For identifier validation, we should probably include that identifiers must not start with a number.
Why is that?
I think we could also include ",~+^"
In the permitted characters? I'm not sure, I think permitting too many special characters just constrains what we can do in terms of core syntax of interfaces like CTRL later on...
valid = isalpha(str[0]) && each(str[1..len] as c){ isalnum(c) || c in ":-_@,~+^" }
My current proposal is in https://gerrit.osmocom.org/#/c/4128
Losely related is the OsmoHLR idea for CTRL interface to use names like
SET subscriber.by-imsi.1234567898765.ps-enabled 1
where a CTRL identifier name is used for inputting an IMSI string. If we do the same identifier checking during incoming CTRL interface commands, we would forbid the IMSI name because it starts with a number.
I don't think there's something wrong with that. Why no number at first digit?
On Thu, Oct 05, 2017 at 08:58:19AM +0800, Harald Welte wrote:
Why no number at first digit?
Hmm, no real reason now that I think of it. In programming languages the requirement exists to easily distinguish identifiers from numeric constants. I was going with what I know from elsewhere...
But indeed in the CTRL there are no numeric constants to distinguish from.
~N
It seems like there's bunch of corner cases in CTRL protocol due underspecification. Would it make sense to formally specify it in smth like GNU bison and generate parsers automatically instead of relying on manual parsing?
This would probably allow us to automatically generate docs for it as well.
On 03.10.2017 02:49, Harald Welte wrote:
Opinions?
Should we further restrict the CTRL interface strings (and those of systems exporting to CTRL) to standard US 7-bit ASCII with a limited set of special characters such as ":-_@" but prevent any non-printable chars or special chars like "{|}()~[]^`'?<>=;/+*&%$#!"? I would support such a motion. We could even make it more general and use that for all our identifier strings and have a general validation function that all code modules call whenever validating a string identifier used, such as e.g. osmo_fsm names, etc.
On Wed, Oct 04, 2017 at 05:29:17PM +0200, Max wrote:
It seems like there's bunch of corner cases in CTRL protocol due underspecification. Would it make sense to formally specify it in smth like GNU bison and generate parsers automatically instead of relying on manual parsing?
I don't think this is a useful investment of our time now. If somebody finds time/funding/resources for any major reimplementations in this area, it should be spent on a unified CTRL+VTY configuration system - which for sure will [have to have] a proper parser.
This would probably allow us to automatically generate docs for it as well.
I don't really see how we can automatically generate documentation? You would still register individual control commands/nodes from application code, and not from the generated parser in libosmocore.