On 30/01/2019 14:43, Neels Hofmeyr wrote:
I asked around and one solution I heard is to poll the CTRL interface once per
second and aggregate min/max values before sending on to statistics once per
minute. That could be considered close enough, but we can do better.
So I do that, except I actually poll the vty (yes, I know...), and much
less often. I only poll channels in use every 60 seconds, which gives a
kind of idea of general usage patterns during the day, but is rather
useless for detecting how often we experience saturation.
Polling momentary values has holes in it, and doesn't scale well. If, e.g., one
new channel request comes in while at the same time another channel is
released, we might for a short time hit a situation of no more channels being
available, and the polling might just miss that and would show more available
channels than we factually had. If hypothetically scaling up such a situation:
we might actually have turned down 5 channel requests while the polled number
still shows available lchans at all times. So, we should allow:
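
To illustrate the polling gap described above, here is a minimal Python
sketch. It replays a made-up event timeline for a cell with a single lchan
and compares what a once-per-second poll sees with what actually happened.
The timeline, the function name replay() and its parameters are all
hypothetical; nothing here is osmo-bsc code or real measurement data.

```python
# Hypothetical timeline for one cell with a single lchan (not real data).
# Each event is (time_in_seconds, kind), kind being "request" or "release".
EVENTS = [
    (0.2, "request"),  # takes the only lchan, 0 available afterwards
    (0.3, "request"),  # nothing free, request is turned down
    (0.4, "request"),  # nothing free, request is turned down
    (0.5, "release"),  # lchan is free again, 1 available
]


def replay(events, total, poll_times):
    """Replay events and return (polled values, true minimum, rejections)."""
    available = total
    min_available = total
    rejected = 0
    polled = []
    pending_polls = sorted(poll_times)

    for t, kind in sorted(events):
        # Take every poll that is due before this event happens.
        while pending_polls and pending_polls[0] < t:
            polled.append(available)
            pending_polls.pop(0)

        if kind == "request":
            if available > 0:
                available -= 1
            else:
                rejected += 1
        else:  # "release"
            available = min(total, available + 1)

        min_available = min(min_available, available)

    # Polls scheduled after the last event see the final value.
    polled.extend(available for _ in pending_polls)
    return polled, min_available, rejected


polled, min_available, rejected = replay(EVENTS, total=1, poll_times=[0.0, 1.0])
print("polled:", polled)                # [1, 1]
print("true minimum:", min_available)   # 0
print("rejected requests:", rejected)   # 2
```

Both polls report one available lchan, although the cell was saturated for
0.3 seconds and turned down two requests in between.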
So, in order to compensate somewhat for what I just described, I poll
the "no channel" counter and that gives me an idea of how many chan
requests were rejected in the period. I only do this every 5 mins though.
OpenBSC# show statistics
Channel Requests : 1 total, 0 no channel
                            ^^^^^^^^^^^^ this one.
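
As a rough sketch of that approach in Python: parse the "no channel"
counter from the line shown above and report the increase since the
previous poll. How the text is fetched from the vty is left out, and the
function names and the reset handling are assumptions of this sketch, not
part of any Osmocom tool.

```python
import re

# Matches the statistics line shown above; whitespace around ':' may vary.
CHAN_REQ_RE = re.compile(
    r"Channel Requests\s*:\s*(\d+) total,\s*(\d+) no channel"
)


def parse_channel_requests(vty_output):
    """Return (total, no_channel) parsed from 'show statistics' output."""
    match = CHAN_REQ_RE.search(vty_output)
    if match is None:
        raise ValueError("no 'Channel Requests' line found")
    return int(match.group(1)), int(match.group(2))


def rejected_since(previous, current):
    """Rejections between two polls of the cumulative counter.

    Assumption: if the counter went backwards, the process was restarted
    and the counter started again from zero.
    """
    if current < previous:
        return current
    return current - previous


sample = "Channel Requests : 1 total, 0 no channel"
total, no_channel = parse_channel_requests(sample)
print(total, no_channel)        # 1 0
print(rejected_since(3, 8))     # 5
print(rejected_since(8, 2))     # 2 (counter reset assumed)
```

This tells you how many requests were rejected per polling period, but not
when inside the period or for how long the cell was full.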
One idea would be to push out a new number as soon as channel availability
changes, but that again doesn't scale well (might generate too many events when
monitoring a large number of cells).
Yes, I think so.
So I'm thinking that we should aggregate the minimum-available lchan counts
within osmo-bsc per stats timeframe.
There should be separate minimum-available numbers for each lchan kind.
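
A minimal Python sketch of that aggregation idea, assuming the tracker is
told about every change in availability and is flushed once per stats
timeframe. The class name, its methods and the lchan kinds used in the
example are hypothetical; this is not how osmo-bsc implements it.

```python
class MinAvailableTracker:
    """Track the minimum available lchan count per kind and timeframe."""

    def __init__(self, initial):
        # initial: dict mapping lchan kind -> currently available count
        self.current = dict(initial)
        self.minimum = dict(initial)

    def update(self, kind, available):
        """Call whenever the available count of one lchan kind changes."""
        self.current[kind] = available
        if available < self.minimum[kind]:
            self.minimum[kind] = available

    def flush(self):
        """Return the minima of the finished timeframe and start a new one.

        The new timeframe starts from the current values, so a cell that
        stays saturated across the boundary still reports 0.
        """
        report = dict(self.minimum)
        self.minimum = dict(self.current)
        return report


tracker = MinAvailableTracker({"TCH/F": 4, "SDCCH": 8})
tracker.update("TCH/F", 3)
tracker.update("TCH/F", 0)   # brief saturation
tracker.update("TCH/F", 2)
tracker.update("SDCCH", 7)

first = tracker.flush()
second = tracker.flush()
print(first)    # {'TCH/F': 0, 'SDCCH': 7}
print(second)   # {'TCH/F': 2, 'SDCCH': 7}
```

Only one number per lchan kind leaves the process per timeframe, however
many allocations and releases happened in between.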
yep. that would be great.
In general I have very basic to zero knowledge of the "science" of
stats collection, KPIs etc.
But I imagine the industry has a standard? Maybe we can follow it?
I will take a look at the KPI talk from OsmoDevCon again
https://media.ccc.de/v/SE8HRK
I guess minimum-available is more useful than maximum-used lchans, but we
could also provide both.
I do think there's something to be said for counters that count "error"
situations, like no chan available: then you know that this happened,
without trying to constantly count channels in use and then having to be
concerned about that microsecond between chan release and chan request
that may or may not overlap. Actually, I do not know how big that
window is, maybe more than a microsecond :)
Also, if I want to find out how many lchans I need to add to provide adequate
service, it would be good to somehow determine the maximum number of
"concurrent" turned down channel requests.
Maybe "what was the duration of complete saturation" might be a good
question.
I'll try to come up with a list of "questions" like that.
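
For the "duration of complete saturation" question, a possible sketch in
Python, assuming we had a time-ordered log of (timestamp, available)
changes for one lchan kind. The function name and the sample data are
made up for illustration.

```python
def saturation_seconds(samples, window_end):
    """Total time with zero available lchans.

    samples: time-ordered list of (timestamp, available) change records.
    window_end: timestamp at which the observed timeframe ends.
    """
    total = 0.0
    # Pair each record with the start time of the next one; the last
    # record lasts until the end of the window.
    following = samples[1:] + [(window_end, None)]
    for (start, available), (end, _) in zip(samples, following):
        if available == 0:
            total += end - start
    return total


# Hypothetical one-minute timeframe with two short saturation periods.
log = [(0.0, 1), (10.0, 0), (12.5, 1), (40.0, 0), (41.0, 2)]
print(saturation_seconds(log, window_end=60.0))   # 3.5
```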