channel statistics - OpenBSC

31 Jan 2019


After 35c3 and talking about statistics, it has become apparent to me that we
lack a good way of monitoring channel usage / availability. TL;DR: I think we
need min/max aggregators that sync with the stats push/poll time period.
This is just an idea I'm getting; I'm not planning on implementing anything
now, nor have I really done my homework and tried to achieve useful stats. Has
anyone discussed this before / created an issue / solved it in a different way?
Here goes...
IIUC we have counters that we can poll/push in a given time period, so that the
data we get out makes sense and has no "holes" or "overlaps" in it.
However, we also have highly volatile numbers that are extremely interesting
for an operator to see: how many channels of which kind are currently still
available?
I asked around and one solution I heard is to poll the CTRL interface once per
second and aggregate min/max values before sending on to statistics once per
minute. That could be considered close enough, but we can do better.
Polling momentary values has holes in it, and doesn't scale well. If, e.g., one
new channel request comes in while at the same time another channel is
released, we might for a short time have no channels available at all, and the
polling might just miss that and show more available channels than we actually
had. Scaling such a situation up hypothetically: we might have turned down 5
channel requests while the polled number still shows available lchans at all
times. So, we should allow:
- seeing peak usage
- in a pushing-stats fashion
- that is still useful when sampled only, say, once per minute.
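To make the sampling gap concrete, here is a minimal sketch comparing a
once-per-second poller with an aggregator that sees every state change. The
event trace, the function names and the numbers are all made up for
illustration; none of this is osmo-bsc code.

```python
# Hypothetical event trace: (time in seconds, change in free lchans).
# Values are illustrative only, not measured from a real BSC.
EVENTS = [
    (0.2, -1),  # channel assigned
    (0.4, -1),  # channel assigned, now 0 free
    (0.6, +1),  # channel released
    (1.3, -1),  # channel assigned, 0 free again
    (1.5, +1),  # channel released
]


def polled_minimum(events, initial_free, poll_times):
    """Minimum seen by a poller that samples only at poll_times."""
    samples = []
    for t in poll_times:
        free = initial_free + sum(delta for (when, delta) in events if when <= t)
        samples.append(free)
    return min(samples)


def true_minimum(events, initial_free):
    """Minimum over every state change, as an in-process aggregator sees it."""
    free = initial_free
    lowest = free
    for _, delta in sorted(events):
        free += delta
        lowest = min(lowest, free)
    return lowest


polled = polled_minimum(EVENTS, initial_free=2, poll_times=[0.0, 1.0, 2.0])
actual = true_minimum(EVENTS, initial_free=2)
print("polled min:", polled, "actual min:", actual)
```

With this trace the poller never sees fewer than one free lchan, although the
cell was briefly out of channels twice.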
One idea would be to push out a new number as soon as channel availability
changes, but that again doesn't scale well (might generate too many events when
monitoring a large number of cells).
So I'm thinking that we should aggregate the minimum-available lchan counts
within osmo-bsc per stats timeframe.
There should be separate minimum-available numbers for each lchan kind.
I guess minimum-available is more useful than maximum-used lchans, but we could
also provide both.
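As a rough sketch of what I mean, assuming a hypothetical aggregator object
that is notified on every lchan allocation and release and is flushed once per
stats period (class name, method names and lchan kinds are placeholders, not
existing osmo-bsc or libosmocore API):

```python
class MinAvailableAggregator:
    """Tracks the lowest free-lchan count per lchan kind within one stats period."""

    def __init__(self, initial_free):
        # Current number of free lchans per kind, e.g. {"TCH/F": 4, "SDCCH": 8}.
        self.free = dict(initial_free)
        # Lowest value seen since the last flush.
        self.min_free = dict(initial_free)

    def change(self, kind, delta):
        """Call on every allocation (delta < 0) or release (delta > 0)."""
        self.free[kind] += delta
        if self.free[kind] < self.min_free[kind]:
            self.min_free[kind] = self.free[kind]

    def flush(self):
        """Return the per-kind minima for this period and start a new period."""
        report = dict(self.min_free)
        # The new period starts from the current level, not from the old minimum.
        self.min_free = dict(self.free)
        return report


agg = MinAvailableAggregator({"TCH/F": 4, "SDCCH": 8})
agg.change("TCH/F", -3)   # 1 free
agg.change("TCH/F", +2)   # 3 free
agg.change("SDCCH", -1)   # 7 free
first = agg.flush()       # minima of the first period
second = agg.flush()      # nothing happened: minima equal current levels
print(first, second)
```

A maximum-used value could be kept the same way, by tracking the highest value
of (total - free) between flushes.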
Also, if I want to find out how many lchans I need to add to provide adequate
service, it would be good to somehow determine the maximum number of
"concurrent" turned-down channel requests. We probably already have that as a
per-second moving average? But here again, if I sample a per-second moving
average only once per minute, I will miss the maximum value that this
per-second value has reached in that minute. This probably also needs a
think-over from a practical "I want useful stats" POV.
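One possible way to keep the peak, sketched here with fixed one-second buckets
instead of a real moving average (a simplification on my part), and with a
hypothetical list of rejection timestamps as input:

```python
from collections import Counter


def peak_rejections_per_second(reject_times, period_start, period_len=60.0):
    """Highest number of rejected channel requests in any one-second bucket
    of the reporting period [period_start, period_start + period_len)."""
    buckets = Counter()
    for t in reject_times:
        if period_start <= t < period_start + period_len:
            # Bucket index = whole seconds since the period started.
            buckets[int(t - period_start)] += 1
    return max(buckets.values(), default=0)


# Hypothetical timestamps (seconds) of turned-down channel requests.
times = [3.1, 3.4, 3.9, 17.0, 42.2, 42.8]
peak = peak_rejections_per_second(times, period_start=0.0)
print("peak rejections in one second:", peak)
```

A once-per-minute sample of the per-second value would read zero here if it
happened to land on a quiet second, while the peak within the minute was three.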
Or am I missing something?
Thanks,
~N
-- 
- Neels Hofmeyr nhofmeyr@sysmocom.de          http://www.sysmocom.de/
=======================================================================
* sysmocom - systems for mobile communications GmbH
* Alt-Moabit 93
* 10559 Berlin, Germany
* Sitz / Registered office: Berlin, HRB 134158 B
* Geschäftsführer / Managing Directors: Harald Welte