fixeria submitted this change.

Approvals:
  Jenkins Builder: Verified
  pespin: Looks good to me, approved
  osmith: Looks good to me, but someone else must approve

doc/manuals: document the metrics

Change-Id: Iacfefd387d0cd26eebbbeba0cd37efa78f90bb46
Related: OS#6671, SYS#7065
---
M doc/manuals/chapters/configuration.adoc
A doc/manuals/chapters/metrics.adoc
M doc/manuals/osmo-s1gw-usermanual.adoc
3 files changed, 189 insertions(+), 4 deletions(-)

diff --git a/doc/manuals/chapters/configuration.adoc b/doc/manuals/chapters/configuration.adoc
index 7354d56..1f623b9 100644
--- a/doc/manuals/chapters/configuration.adoc
+++ b/doc/manuals/chapters/configuration.adoc
@@ -255,10 +255,10 @@
[[config_exometer]]
=== `exometer_core` — Metrics and StatsD Reporting

-OsmoS1GW uses the https://github.com/Feuerlabs/exometer_core[exometer_core]
-library for internal metrics (counters and gauges). The `exometer_core`
-section configures reporters — processes that periodically push metric
-values to an external destination.
+See <<metrics>> for an introduction to OsmoS1GW metrics and the full
+list of available counters and gauges. The `exometer_core` section
+configures reporters — processes that periodically push metric values to
+an external destination.

The default configuration reports all counters and gauges to a StatsD
server:
diff --git a/doc/manuals/chapters/metrics.adoc b/doc/manuals/chapters/metrics.adoc
new file mode 100644
index 0000000..a6d495e
--- /dev/null
+++ b/doc/manuals/chapters/metrics.adoc
@@ -0,0 +1,183 @@
+[[metrics]]
+== Metrics
+
+OsmoS1GW exposes internal metrics using the
+https://github.com/Feuerlabs/exometer_core[exometer_core] library, and
+ships with the
+https://github.com/osmocom/exometer_report_statsd[exometer_report_statsd]
+plugin for StatsD reporting. See <<config_exometer>> for configuration
+details.
+
+Two metric types are used:
+
+Counter:: A monotonically increasing integer, incremented each time a
+ specific event occurs. Counters never decrease.
+
+Gauge:: An integer that reflects a current quantity (e.g. the number of
+ active connections). Gauges can go up and down.
+
+[[metrics_naming]]
+=== Metric Names
+
+Metric names follow a hierarchical, dot-separated structure whose first
+component identifies the subsystem (or entity) they belong to. For
+example, S1AP processing metrics live under `s1ap`, while PFCP metrics
+live under `pfcp`. When StatsD reporting is enabled, all metric names
+are additionally prefixed with the configured `prefix` string (default:
+`s1gw`), giving e.g. `s1gw.s1ap.proxy.in_pkt.all`.
+
+[[metrics_global_counters]]
+=== Global Counters
+
+The following counters are registered at startup and count events across
+all connections.
+
+[[metrics_pfcp_counters]]
+==== PFCP Counters
+
+[options="header",cols="45,55"]
+|===
+| Metric name | Description
+| `pfcp.heartbeat_req.tx` | PFCP Heartbeat Requests sent to the UPF
+| `pfcp.heartbeat_req.rx` | PFCP Heartbeat Requests received from the UPF
+| `pfcp.heartbeat_req.timeout` | PFCP Heartbeat Requests that timed out
+| `pfcp.heartbeat_resp.tx` | PFCP Heartbeat Responses sent to the UPF
+| `pfcp.heartbeat_resp.rx` | PFCP Heartbeat Responses received from the UPF
+| `pfcp.assoc_setup_req.tx` | PFCP Association Setup Requests sent
+| `pfcp.assoc_setup_req.timeout` | PFCP Association Setup Requests that timed out
+| `pfcp.assoc_setup_resp.rx` | PFCP Association Setup Responses received
+| `pfcp.assoc_setup_resp.rx_ack` | PFCP Association Setup Responses with success cause
+| `pfcp.assoc_setup_resp.rx_nack` | PFCP Association Setup Responses with failure cause
+| `pfcp.unexpected_pdu` | Unexpected or unrecognised PFCP PDUs received
+|===
+
+[[metrics_s1ap_counters]]
+==== S1AP Counters
+
+[options="header",cols="55,45"]
+|===
+| Metric name | Description
+| `s1ap.enb.all.rx` | S1AP PDUs received from any eNB
+| `s1ap.enb.all.rx_unknown_enb` | S1AP PDUs received from an unregistered eNB
+| `s1ap.proxy.exception` | Exceptions raised during S1AP PDU processing
+| `s1ap.proxy.in_pkt.all` | S1AP PDUs received by the proxy (all directions)
+| `s1ap.proxy.in_pkt.drop.all` | Received S1AP PDUs dropped by the proxy
+| `s1ap.proxy.in_pkt.decode_error` | Received S1AP PDUs that failed to decode
+| `s1ap.proxy.in_pkt.proc_error` | Received S1AP PDUs that could not be processed
+| `s1ap.proxy.in_pkt.erab_setup_req` | E-RAB SETUP REQUEST PDUs received
+| `s1ap.proxy.in_pkt.erab_setup_rsp` | E-RAB SETUP RESPONSE PDUs received
+| `s1ap.proxy.in_pkt.erab_modify_req` | E-RAB MODIFY REQUEST PDUs received
+| `s1ap.proxy.in_pkt.erab_modify_rsp` | E-RAB MODIFY RESPONSE PDUs received
+| `s1ap.proxy.in_pkt.erab_release_cmd` | E-RAB RELEASE COMMAND PDUs received
+| `s1ap.proxy.in_pkt.erab_release_rsp` | E-RAB RELEASE RESPONSE PDUs received
+| `s1ap.proxy.in_pkt.erab_release_ind` | E-RAB RELEASE INDICATION PDUs received
+| `s1ap.proxy.in_pkt.erab_mod_ind` | E-RAB MODIFICATION INDICATION PDUs received
+| `s1ap.proxy.in_pkt.erab_mod_cnf` | E-RAB MODIFICATION CONFIRM PDUs received
+| `s1ap.proxy.in_pkt.init_ctx_req` | INITIAL CONTEXT SETUP REQUEST PDUs received
+| `s1ap.proxy.in_pkt.init_ctx_rsp` | INITIAL CONTEXT SETUP RESPONSE PDUs received
+| `s1ap.proxy.in_pkt.release_ctx_req` | UE CONTEXT RELEASE REQUEST PDUs received
+| `s1ap.proxy.in_pkt.release_ctx_cmd` | UE CONTEXT RELEASE COMMAND PDUs received
+| `s1ap.proxy.in_pkt.release_ctx_compl` | UE CONTEXT RELEASE COMPLETE PDUs received
+| `s1ap.proxy.in_pkt.handover_cmd` | HANDOVER COMMAND PDUs received
+| `s1ap.proxy.in_pkt.handover_req` | HANDOVER REQUEST PDUs received
+| `s1ap.proxy.in_pkt.handover_req_ack` | HANDOVER REQUEST ACKNOWLEDGE PDUs received
+| `s1ap.proxy.out_pkt.forward.all` | S1AP PDUs forwarded (total)
+| `s1ap.proxy.out_pkt.forward.proc` | S1AP PDUs forwarded after processing (with IE rewriting)
+| `s1ap.proxy.out_pkt.forward.unmodified` | S1AP PDUs forwarded without modification
+| `s1ap.proxy.out_pkt.reply.all` | S1AP PDUs generated locally by the proxy (total)
+| `s1ap.proxy.out_pkt.reply.erab_setup_rsp` | E-RAB SETUP RESPONSE PDUs generated locally
+|===
+
+[[metrics_enb_proxy_counters]]
+==== eNB Proxy Counters
+
+[options="header",cols="45,55"]
+|===
+| Metric name | Description
+| `enb_proxy.s1setup.req` | S1 SETUP REQUEST PDUs received from eNBs
+| `enb_proxy.s1setup.rsp` | S1 SETUP RESPONSE PDUs received from an MME and forwarded to the eNB
+| `enb_proxy.s1setup.failure` | S1 SETUP FAILURE PDUs received from an MME (triggers retry)
+| `enb_proxy.s1setup.req.timeout` | Timeouts waiting for S1 SETUP REQUEST from an eNB
+| `enb_proxy.s1setup.rsp.timeout` | Timeouts waiting for S1 SETUP RESPONSE from an MME
+| `enb_proxy.conn_est.timeout` | MME SCTP connection establishment timeouts
+| `enb_proxy.conn_est.failure` | MME SCTP connection establishment failures
+| `enb_proxy.unexpected_pdu` | Unexpected PDUs received from an eNB or MME
+| `enb_proxy.malformed_pdu` | Malformed PDUs received from an eNB or MME
+| `enb_proxy.mme_select.ok` | Successful MME selections from the pool
+| `enb_proxy.mme_select.error` | Failed MME selections (pool exhausted)
+|===
+
+[[metrics_sctp_counters]]
+==== SCTP Error Counters
+
+[options="header",cols="40,60"]
+|===
+| Metric name | Description
+| `sctp.error.all` | Total number of SCTP errors
+| `sctp.error.send_failed` | SCTP send operation failures
+| `sctp.error.pdapi_event` | SCTP partial delivery API failures
+| `sctp.error.remote_error` | SCTP remote error notifications
+|===
+
+[[metrics_per_enb_counters]]
+=== Per-eNB Counters
+
+When an eNB connects and its Global-eNB-ID becomes known (after the S1
+Setup procedure), OsmoS1GW dynamically creates a set of per-eNB counters
+scoped to that eNB. These counters mirror the global eNB proxy counters
+but are broken down per connected base station.
+
+The naming scheme for per-eNB counters is `enb.{id}.{suffix}`, where
+`{id}` is the eNB's Global-eNB-ID rendered as an MCC-MNC-eNBId string
+(e.g. `001-01-1337`).
+
+In addition to the mirrored proxy counters, the following per-eNB
+counters are registered:
+
+[options="header",cols="50,50"]
+|===
+| Metric name | Description
+| `enb.{id}.uptime` | Time (in seconds) since the eNB connected
+| `enb.{id}.gtpu.packets.ul` | GTP-U uplink packets (requires GTP-U KPI)
+| `enb.{id}.gtpu.packets.dl` | GTP-U downlink packets (requires GTP-U KPI)
+| `enb.{id}.gtpu.bytes.ue.ul` | GTP-U uplink bytes (UE side, requires GTP-U KPI)
+| `enb.{id}.gtpu.bytes.ue.dl` | GTP-U downlink bytes (UE side, requires GTP-U KPI)
+| `enb.{id}.gtpu.bytes.total.ul` | GTP-U uplink bytes (total, requires GTP-U KPI)
+| `enb.{id}.gtpu.bytes.total.dl` | GTP-U downlink bytes (total, requires GTP-U KPI)
+|===
+
+GTP-U counters are only populated when the GTP-U KPI module is enabled
+(see <<config_gtpu_kpi>>).
+
+[[metrics_per_mme_counters]]
+=== Per-MME Counters
+
+When an MME is registered in the pool — either at startup from the
+configuration file (see <<config_mme_pool>>) or dynamically via the REST
+API — OsmoS1GW creates a set of per-MME counters scoped to that MME entry.
+
+The naming scheme is `mme.{name}.{suffix}`, where `{name}` is the MME's
+configured name (e.g. `mme0`).
+
+[options="header",cols="45,55"]
+|===
+| Metric name | Description
+| `mme.{name}.selected` | Number of times this MME was selected for a connection attempt
+| `mme.{name}.conn_est.timeout` | Connection establishment timeouts to this MME
+| `mme.{name}.conn_est.failure` | Connection establishment failures to this MME
+| `mme.{name}.s1setup.rsp` | Successful S1 Setup procedures completed via this MME
+| `mme.{name}.s1setup.failure` | S1 SETUP FAILURE responses received from this MME
+| `mme.{name}.s1setup.rsp.timeout` | Timeouts waiting for S1 SETUP RESPONSE from this MME
+|===
+
+[[metrics_gauges]]
+=== Gauges
+
+[options="header",cols="45,55"]
+|===
+| Metric name | Description
+| `pfcp.associated` | `1` if the PFCP association with the UPF is currently established, `0` otherwise
+| `s1ap.enb.num_sctp_connections` | Current number of active eNB SCTP connections
+|===
+
+// vim:set ts=4 sw=4 et:
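The naming rules in the Metrics chapter above (the StatsD `prefix`, `enb.{id}.{suffix}` and `mme.{name}.{suffix}`) can be sketched as a few string helpers. This is a minimal illustration, not OsmoS1GW code: all helper names are hypothetical, and rendering the eNB ID in decimal is an assumption based only on the `001-01-1337` example.

```python
"""Sketch: composing full StatsD metric names as described for OsmoS1GW.

Assumptions: the helper names are hypothetical, and the eNB ID is rendered
in decimal only because the manual's example is `001-01-1337`.
"""

DEFAULT_PREFIX = "s1gw"  # default StatsD `prefix` documented in the manual


def global_enb_id(mcc: str, mnc: str, enb_id: int) -> str:
    """Render a Global-eNB-ID as MCC-MNC-eNBId (hypothetical helper)."""
    return f"{mcc}-{mnc}-{enb_id}"


def statsd_name(name: str, prefix: str = DEFAULT_PREFIX) -> str:
    """Prepend the configured StatsD prefix; an empty prefix adds nothing."""
    return f"{prefix}.{name}" if prefix else name


def per_enb(enb: str, suffix: str) -> str:
    """Per-eNB counter name: enb.{id}.{suffix}."""
    return f"enb.{enb}.{suffix}"


def per_mme(mme_name: str, suffix: str) -> str:
    """Per-MME counter name: mme.{name}.{suffix}."""
    return f"mme.{mme_name}.{suffix}"


if __name__ == "__main__":
    enb = global_enb_id("001", "01", 1337)
    print(statsd_name("s1ap.proxy.in_pkt.all"))     # s1gw.s1ap.proxy.in_pkt.all
    print(statsd_name(per_enb(enb, "uptime")))      # s1gw.enb.001-01-1337.uptime
    print(statsd_name(per_mme("mme0", "selected"))) # s1gw.mme.mme0.selected
```

Keeping the prefix step separate from the per-entity helpers mirrors the manual's split between internal metric names and the names seen on the StatsD side.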
diff --git a/doc/manuals/osmo-s1gw-usermanual.adoc b/doc/manuals/osmo-s1gw-usermanual.adoc
index 13dcd4b..7c97192 100644
--- a/doc/manuals/osmo-s1gw-usermanual.adoc
+++ b/doc/manuals/osmo-s1gw-usermanual.adoc
@@ -15,6 +15,8 @@

include::{srcdir}/chapters/configuration.adoc[]

+include::{srcdir}/chapters/metrics.adoc[]
+
include::{commondir}/chapters/glossary.adoc[]

include::{commondir}/chapters/bibliography.adoc[]
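The Counter and Gauge definitions in the new Metrics chapter determine how a consumer should read the values: a gauge such as `s1ap.enb.num_sctp_connections` is used as-is, while a counter is usually turned into a rate. The sketch below assumes a consumer that sees absolute counter values at known timestamps; how `exometer_report_statsd` actually encodes counters on the wire is not covered by this change, and treating a drop in value as a reset (e.g. after a restart) is a common monitoring convention, not documented OsmoS1GW behaviour.

```python
"""Sketch: turning periodic counter samples into per-second rates.

Assumption: absolute counter values are sampled at known timestamps.
A decrease is treated as a counter reset (monitoring convention).
"""

from typing import List, Tuple

Sample = Tuple[float, int]  # (timestamp in seconds, counter value)


def counter_rates(samples: List[Sample]) -> List[float]:
    """Return the per-second rate between consecutive counter samples."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        # Counters never decrease, so a lower value means the counter was
        # reset; the new value is then the increase since the reset.
        delta = v1 - v0 if v1 >= v0 else v1
        rates.append(delta / (t1 - t0))
    return rates


if __name__ == "__main__":
    # e.g. samples of s1ap.proxy.in_pkt.all taken every 10 s
    samples = [(0.0, 100), (10.0, 150), (20.0, 150), (30.0, 20)]
    print(counter_rates(samples))  # [5.0, 0.0, 2.0]
```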

To view, visit change 42365.

Gerrit-MessageType: merged
Gerrit-Project: erlang/osmo-s1gw
Gerrit-Branch: master
Gerrit-Change-Id: Iacfefd387d0cd26eebbbeba0cd37efa78f90bb46
Gerrit-Change-Number: 42365
Gerrit-PatchSet: 5
Gerrit-Owner: fixeria <vyanitskiy@sysmocom.de>
Gerrit-Reviewer: Jenkins Builder
Gerrit-Reviewer: fixeria <vyanitskiy@sysmocom.de>
Gerrit-Reviewer: osmith <osmith@sysmocom.de>
Gerrit-Reviewer: pespin <pespin@sysmocom.de>