Change in osmo-bsc[master]: stats: add BTS uptime counter


iedemam gerrit-no-reply at lists.osmocom.org
Fri Apr 30 14:43:17 UTC 2021


iedemam has posted comments on this change. ( https://gerrit.osmocom.org/c/osmo-bsc/+/23234 )

Change subject: stats: add BTS uptime counter
......................................................................


Patch Set 5:

(1 comment)

Hi,

Thanks again for taking a look. My reply to your comment may be long, but hopefully it is clear.

-Michael

https://gerrit.osmocom.org/c/osmo-bsc/+/23234/4/src/osmo-bsc/bts.c 
File src/osmo-bsc/bts.c:

https://gerrit.osmocom.org/c/osmo-bsc/+/23234/4/src/osmo-bsc/bts.c@586 
PS4, Line 586: 	int downtime_seconds = BTS_DOWNTIME_SAMPLE_INTERVAL - uptime_seconds;
> I'm still not getting it, I'm sorry. Or maybe I'm getting it but I still find it really strange. […]
Let's back up a bit, maybe. Currently there is no way to determine a BTS's uptime other than by polling it via the VTY. If the BSC restarts for some reason, all uptime tracking is lost. I wanted to have the uptime available via the statsd interface so that each BTS's uptime during any given period can be known without risking the loss of state in a restart.

So, originally I wrote this to run every second and count uptime. Are we up? Good, increment the uptime counter. This value is summed and exported every X seconds when statsd runs. Each statsd interval would contain between 0 and X seconds of uptime. I can sum these intervals over, for example, an hour, and the difference between that sum and 3600 will be my downtime. Straightforward, I thought.
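
A minimal sketch of the arithmetic described above, not code from the patch. The helper name `downtime_from_uptime_samples` and its parameters are hypothetical; it only assumes that the per-interval uptime values exported via statsd have been collected into an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: derive the downtime for a
 * reporting period from the per-interval uptime values (in seconds) that
 * were exported during that period. */
static int downtime_from_uptime_samples(const int *uptime_per_interval,
					size_t num_intervals,
					int period_seconds)
{
	int total_uptime = 0;
	size_t i;

	/* A NULL array is only acceptable when there are no samples. */
	assert(uptime_per_interval != NULL || num_intervals == 0);

	/* Sum the uptime seconds reported in each exported interval. */
	for (i = 0; i < num_intervals; i++)
		total_uptime += uptime_per_interval[i];

	/* Whatever part of the period was not uptime is downtime. */
	return period_seconds - total_uptime;
}
```

For an hour of samples, `period_seconds` would be 3600; the result is the number of seconds in that hour for which no uptime was reported.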

This approach was rejected. It ran too often, abused the counter interface, and counted uptime instead of downtime. OK, so now I've changed to using a stat_item, only running every INTERVAL seconds and counting downtime.

The BTS_DOWNTIME_SAMPLE_INTERVAL value now represents the maximum amount of downtime that we would be willing to lose if a restart occurred before that downtime had been pushed into the statsd system. When the periodic timer fires to calculate downtime, we see how many seconds of uptime have elapsed and subtract that from the interval to determine downtime.

Downtime is added to the stat_item. When the statsd system exports these values every X seconds, we have between 0 and X seconds of downtime in that period. Sum up all these periods and you can see total downtime for each BTS during any given timeframe.
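
The per-interval calculation can be sketched as follows. This is an illustration under stated assumptions, not the code under review: `struct bts_sample_state`, `uptime_in_window` and `downtime_in_window` are hypothetical names, the interval value of 10 is made up, and the model assumes the BTS came up at most once and is still up when the timer fires. Only the final subtraction mirrors the line quoted from PS4.

```c
#include <assert.h>
#include <time.h>

/* Assumed value for illustration only; the patch defines its own constant. */
#define BTS_DOWNTIME_SAMPLE_INTERVAL 10

/* Hypothetical per-BTS state, not the real struct gsm_bts. */
struct bts_sample_state {
	time_t up_since;	/* time the BTS came up, 0 while it is down */
};

/* Seconds of uptime inside the sample window that ends at window_end. */
static int uptime_in_window(const struct bts_sample_state *st, time_t window_end)
{
	time_t window_start = window_end - BTS_DOWNTIME_SAMPLE_INTERVAL;

	assert(st != NULL);

	/* Down, or came up only at or after the end of this window. */
	if (st->up_since == 0 || st->up_since >= window_end)
		return 0;
	/* Up for the whole window. */
	if (st->up_since <= window_start)
		return BTS_DOWNTIME_SAMPLE_INTERVAL;
	/* Came up part-way through the window. */
	return (int)(window_end - st->up_since);
}

/* Downtime for the window: the interval minus the uptime seen in it. */
static int downtime_in_window(const struct bts_sample_state *st, time_t window_end)
{
	int uptime_seconds = uptime_in_window(st, window_end);
	int downtime_seconds = BTS_DOWNTIME_SAMPLE_INTERVAL - uptime_seconds;

	return downtime_seconds;
}
```

Each call yields between 0 and BTS_DOWNTIME_SAMPLE_INTERVAL seconds, which is the value that would be added to the stat_item in this model. A BTS that drops part-way through a window is counted as down for the whole window here, which is one of the simplifications of the sketch.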

Suggestions welcome. I've tried my best to address concerns but am running out of ideas.



-- 
To view, visit https://gerrit.osmocom.org/c/osmo-bsc/+/23234

Gerrit-Project: osmo-bsc
Gerrit-Branch: master
Gerrit-Change-Id: Ib17674bbe95e828cebff12de9e0b30f06447ef6c
Gerrit-Change-Number: 23234
Gerrit-PatchSet: 5
Gerrit-Owner: iedemam <michael at kapsulate.com>
Gerrit-Assignee: daniel <dwillmann at sysmocom.de>
Gerrit-Reviewer: Jenkins Builder
Gerrit-Reviewer: daniel <dwillmann at sysmocom.de>
Gerrit-Reviewer: laforge <laforge at osmocom.org>
Gerrit-Reviewer: pespin <pespin at sysmocom.de>
Gerrit-CC: dexter <pmaier at sysmocom.de>
Gerrit-Comment-Date: Fri, 30 Apr 2021 14:43:17 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Comment-In-Reply-To: iedemam <michael at kapsulate.com>
Comment-In-Reply-To: pespin <pespin at sysmocom.de>
Gerrit-MessageType: comment