Dear Mauro,
my apologies for not following up sooner. It seems my workload at the moment doesn't permit me to participate effectively in detail :/
On Tue, Feb 27, 2024 at 02:38:11PM -0000, Mauro Levra wrote:
As you know, the current definition of the GSMTAP header has a version field that could enable new versions to be safely introduced. Unfortunately, Wireshark decoder does not check the value of the version field, it just presents the value in the decoded output. It seems that the only way to define a new GSMTAP version that does not break Wireshark decoder is to preserve the initial 16 header octets.
First of all: My apologies for not making things more future-proof in the beginning. GSMTAP initially was just a local hack for use within a very small group of people around open source GSM; it was not clear at the time that it would become something more widely used. So a lot of it happened very ad-hoc.
As the patch for just decoding v2 has now been merged to wireshark, I think we could deviate from that.
However, there will be people running older versions of wireshark for years to come, so we should have some kind of user experience that avoids giving them completely garbled output.
I guess we can achieve that two ways:
1) by keeping the full header as you suggest here, or
2) by making sure that the version and type fields retain their position, where version would be '3' and the type would use a value that is not defined for GSMTAPv2, resulting in none of the existing decoders being called for a GSMTAPv3 packet arriving at an old wireshark version
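A minimal sketch of how option 2 would look from the point of view of an old decoder that ignores the version field and dispatches on the type field alone. The type table is only an illustrative subset, and V3_MARKER_TYPE is a hypothetical value chosen for this example, not an assigned one.

```python
import struct

# Illustrative subset of GSMTAPv2 type values; the real list is longer.
KNOWN_V2_TYPES = {0x01: "GSM Um", 0x02: "GSM Abis"}

# Hypothetical type value that no v2 decoder knows about.
# Not an assigned value; chosen only for this sketch.
V3_MARKER_TYPE = 0xFF


def old_decoder(packet: bytes) -> str:
    """Model a decoder that shows the version but dispatches on type only."""
    version, hdr_len, ptype = struct.unpack_from("!BBB", packet, 0)
    name = KNOWN_V2_TYPES.get(ptype)
    if name is None:
        # No sub-dissector is called, so nothing gets mis-decoded.
        return f"GSMTAP version={version} type={ptype}: payload not decoded"
    return f"GSMTAP version={version}: decoded as {name}"


# 16-octet headers: version, hdr_len (in 32-bit words), type, then 13 zero octets.
v2_packet = bytes([2, 4, 0x01]) + bytes(13)
v3_packet = bytes([3, 4, V3_MARKER_TYPE]) + bytes(13)

print(old_decoder(v2_packet))
print(old_decoder(v3_packet))
```

The point of the unassigned type value is that the old decoder falls into the "unknown type" branch instead of handing v3 data to a v2 sub-dissector.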
When hdr_len is greater than 4 (the length is 4 * 4 = 16 octets), it is possible to store additional information in the header, without breaking any compliant decoder implementation.
The maximum size would be 255 * 4 = 1020 bytes. This would only work if the combination of all hypothetical future additional header fields always stays below that limit. It would also mean that the actual payload data cannot be encoded in the TLV structure itself, but would come after the TLV section.
So while that doesn't sound like much of a constraint now, I'd like to avoid creating yet another limitation that we have to work around later.
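To make the constraint concrete, here is a rough sketch of a parser for such a layout: the classic 16 octets, then TLVs up to hdr_len * 4, then the payload. The TLV encoding used here (one octet tag, one octet length, tag 0 as padding up to the next 32-bit boundary) is purely an assumption for illustration; nothing like it has been agreed on.

```python
import struct

BASE_HDR_LEN = 16      # the classic 16-octet header
MAX_HDR_LEN = 255 * 4  # hdr_len is a single octet counting 32-bit words


def split_packet(packet: bytes):
    """Return (tlvs, payload) for a header extended with TLVs.

    Assumed TLV layout (hypothetical): 1 octet tag, 1 octet length, value.
    Tag 0 is treated as padding that ends the TLV section.
    """
    hdr_bytes = packet[1] * 4
    if hdr_bytes < BASE_HDR_LEN or hdr_bytes > len(packet):
        raise ValueError("invalid hdr_len")
    tlvs = {}
    pos = BASE_HDR_LEN
    while pos + 2 <= hdr_bytes:
        tag, length = struct.unpack_from("!BB", packet, pos)
        if tag == 0:
            break  # padding up to the 32-bit boundary
        pos += 2
        if pos + length > hdr_bytes:
            raise ValueError("TLV runs past end of header")
        # Unknown tags need no special handling: the length says how far to skip.
        tlvs[tag] = packet[pos:pos + length]
        pos += length
    # The payload always starts right after the header, outside the TLV section.
    return tlvs, packet[hdr_bytes:]


base = bytes([3, 6, 0xFF]) + bytes(13)                   # hdr_len = 6 words = 24 octets
ext = bytes([0x10, 4]) + b"\xde\xad\xbe\xef" + bytes(2)  # one TLV plus 2 padding octets
tlvs, payload = split_packet(base + ext + b"payload")
```

With this layout, everything that is ever added to the header has to fit into MAX_HDR_LEN - BASE_HDR_LEN = 1004 octets, which is the limitation referred to above.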
The length field allows decoders to skip unknown tags. To improve storage performance, a subset of tags for values with a length less or equal to 2 octets can be defined (something like this was proposed in 2012 during the GSMTAPv3 session). Like this:
Sure, it can be done - but I'm not sure this is worth the effort.
Finally, an additional bit can be used to distinguish between official Osmocom tag definitions, and application specific tags, that are valid only within the boundaries of a specific implementation and could be used to embed metadata that has not yet been assigned an official tag.
This is of course a bit of a difficult topic. In general we'd want to avoid vendor-specific stuff as it adds incompatibilities with other implementations, which will fragment the tools that can work with a given capture file, etc.
In general, there's multiple ways of addressing that:
1) splitting the entire range by one bit. I think that's too many vendor-specific fields.
2) having a few 'reserved for local use' values, like DLT_USER in pcap
3) having a full-blown mechanism for any vendor to add their own tag definitions, like for example in the IPFIX/NETFLOW format.
4) having some kind of registry or fast enough process where vendors can register/allocate tags for their use cases. Similar to how IANA operates for registering port numbers or other fields in other protocols.
I'd personally prefer '3' or '4', in that those approaches at least prevent multiple different vendors from allocating the same tag for different purposes.
It's been decades since I looked at it, but AFAIR IPFIX works in such a way that there are universal tags and enterprise-specific tags. The latter are prefixed with the enterprise number allocated at https://www.iana.org/assignments/enterprise-numbers/
The universal tags can be short, and the enterprise-specific tags are longer. This means that standard messages are more efficient, which is good.
The alternative to such a scheme would be '4': to operate a git repository, like we do for the Openmoko OUI/MAC and USB Product ID ranges [1], with a set of guidelines. Everyone could then file a pull request to allocate some new field if they fulfill the conditions (which should include some kind of specification as to the name, purpose and encoding of the field).
Regards, Harald
[1] https://github.com/openmoko/openmoko-usb-oui