fixeria has uploaded this change for review.

doc/manuals: document the metrics

Change-Id: Iacfefd387d0cd26eebbbeba0cd37efa78f90bb46
Related: OS#6671
---
M doc/manuals/chapters/configuration.adoc
A doc/manuals/chapters/metrics.adoc
M doc/manuals/osmo-s1gw-usermanual.adoc
3 files changed, 190 insertions(+), 3 deletions(-)

git pull ssh://gerrit.osmocom.org:29418/erlang/osmo-s1gw refs/changes/65/42365/1
diff --git a/doc/manuals/chapters/configuration.adoc b/doc/manuals/chapters/configuration.adoc
index 53e23e5..8bd7eea 100644
--- a/doc/manuals/chapters/configuration.adoc
+++ b/doc/manuals/chapters/configuration.adoc
@@ -256,9 +256,10 @@
=== `exometer_core` — Metrics and StatsD Reporting

OsmoS1GW uses the https://github.com/Feuerlabs/exometer_core[exometer_core]
-library for internal metrics (counters and gauges). The `exometer_core`
-section configures reporters — processes that periodically push metric
-values to an external destination.
+library for internal metrics (counters and gauges); see <<metrics>> for the
+full list of available metrics. The `exometer_core` section configures
+reporters — processes that periodically push metric values to an external
+destination.

The default configuration reports all counters and gauges to a StatsD
server:
diff --git a/doc/manuals/chapters/metrics.adoc b/doc/manuals/chapters/metrics.adoc
new file mode 100644
index 0000000..5b81227
--- /dev/null
+++ b/doc/manuals/chapters/metrics.adoc
@@ -0,0 +1,184 @@
+[[metrics]]
+== Metrics
+
+OsmoS1GW exposes internal metrics using the
+https://github.com/Feuerlabs/exometer_core[exometer_core] library. Two
+metric types are used:
+
+Counter:: A monotonically increasing integer, incremented each time a
+ specific event occurs. Counters never decrease.
+
+Gauge:: An integer that reflects a current quantity (e.g. the number of
+ active connections). Gauges can go up and down.
+
+[[metrics_naming]]
+=== Metric Names
+
+Internally, each metric is identified by an Erlang list such as
+`[ctr, pfcp, heartbeat_req, tx]`. When reported externally — via StatsD
+(see <<config_exometer>>) or the REST API (see <<rest_metrics>>) — the
+list elements are joined with dots and the leading type element (`ctr` or
+`gauge`) is dropped. For example:
+
+* `[ctr, pfcp, heartbeat_req, tx]` → `pfcp.heartbeat_req.tx`
+* `[gauge, pfcp, associated]` → `pfcp.associated`
+
+When StatsD reporting is enabled, all metric names are further prefixed
+with the configured `prefix` string (default: `s1gw`), giving e.g.
+`s1gw.pfcp.heartbeat_req.tx`.
+
+[[metrics_global_counters]]
+=== Global Counters
+
+The following counters are registered at startup and count events across
+all connections.
+
+[[metrics_pfcp_counters]]
+==== PFCP Counters
+
+[options="header",cols="45,55"]
+|===
+| Metric name | Description
+| `pfcp.heartbeat_req.tx` | PFCP Heartbeat Requests sent to the UPF
+| `pfcp.heartbeat_req.rx` | PFCP Heartbeat Requests received from the UPF
+| `pfcp.heartbeat_req.timeout` | PFCP Heartbeat Requests that timed out
+| `pfcp.heartbeat_resp.tx` | PFCP Heartbeat Responses sent to the UPF
+| `pfcp.heartbeat_resp.rx` | PFCP Heartbeat Responses received from the UPF
+| `pfcp.assoc_setup_req.tx` | PFCP Association Setup Requests sent
+| `pfcp.assoc_setup_req.timeout` | PFCP Association Setup Requests that timed out
+| `pfcp.assoc_setup_resp.rx` | PFCP Association Setup Responses received
+| `pfcp.assoc_setup_resp.rx_ack` | PFCP Association Setup Responses with success cause
+| `pfcp.assoc_setup_resp.rx_nack` | PFCP Association Setup Responses with failure cause
+| `pfcp.unexpected_pdu` | Unexpected or unrecognised PFCP PDUs received
+|===
+
+[[metrics_s1ap_counters]]
+==== S1AP Counters
+
+[options="header",cols="55,45"]
+|===
+| Metric name | Description
+| `s1ap.enb.all.rx` | S1AP PDUs received from any eNB
+| `s1ap.enb.all.rx_unknown_enb` | S1AP PDUs received from an unregistered eNB
+| `s1ap.proxy.exception` | Exceptions raised during S1AP PDU processing
+| `s1ap.proxy.in_pkt.all` | S1AP PDUs entering the proxy (all directions)
+| `s1ap.proxy.in_pkt.drop.all` | S1AP PDUs dropped by the proxy
+| `s1ap.proxy.in_pkt.decode_error` | S1AP PDUs that failed to decode
+| `s1ap.proxy.in_pkt.proc_error` | S1AP PDUs that failed to process
+| `s1ap.proxy.in_pkt.erab_setup_req` | E-RAB SETUP REQUEST PDUs
+| `s1ap.proxy.in_pkt.erab_setup_rsp` | E-RAB SETUP RESPONSE PDUs
+| `s1ap.proxy.in_pkt.erab_modify_req` | E-RAB MODIFY REQUEST PDUs
+| `s1ap.proxy.in_pkt.erab_modify_rsp` | E-RAB MODIFY RESPONSE PDUs
+| `s1ap.proxy.in_pkt.erab_release_cmd` | E-RAB RELEASE COMMAND PDUs
+| `s1ap.proxy.in_pkt.erab_release_rsp` | E-RAB RELEASE RESPONSE PDUs
+| `s1ap.proxy.in_pkt.erab_release_ind` | E-RAB RELEASE INDICATION PDUs
+| `s1ap.proxy.in_pkt.erab_mod_ind` | E-RAB MODIFICATION INDICATION PDUs
+| `s1ap.proxy.in_pkt.erab_mod_cnf` | E-RAB MODIFICATION CONFIRM PDUs
+| `s1ap.proxy.in_pkt.init_ctx_req` | INITIAL CONTEXT SETUP REQUEST PDUs
+| `s1ap.proxy.in_pkt.init_ctx_rsp` | INITIAL CONTEXT SETUP RESPONSE PDUs
+| `s1ap.proxy.in_pkt.release_ctx_req` | UE CONTEXT RELEASE REQUEST PDUs
+| `s1ap.proxy.in_pkt.release_ctx_cmd` | UE CONTEXT RELEASE COMMAND PDUs
+| `s1ap.proxy.in_pkt.release_ctx_compl` | UE CONTEXT RELEASE COMPLETE PDUs
+| `s1ap.proxy.in_pkt.handover_cmd` | HANDOVER COMMAND PDUs
+| `s1ap.proxy.in_pkt.handover_req` | HANDOVER REQUEST PDUs
+| `s1ap.proxy.in_pkt.handover_req_ack` | HANDOVER REQUEST ACKNOWLEDGE PDUs
+| `s1ap.proxy.out_pkt.forward.all` | S1AP PDUs forwarded (total)
+| `s1ap.proxy.out_pkt.forward.proc` | S1AP PDUs forwarded after processing (with IE rewriting)
+| `s1ap.proxy.out_pkt.forward.unmodified` | S1AP PDUs forwarded without modification
+| `s1ap.proxy.out_pkt.reply.all` | S1AP PDUs generated locally by the proxy (total)
+| `s1ap.proxy.out_pkt.reply.erab_setup_rsp` | E-RAB SETUP RESPONSE PDUs generated locally
+|===
+
+[[metrics_enb_proxy_counters]]
+==== eNB Proxy Counters
+
+[options="header",cols="45,55"]
+|===
+| Metric name | Description
+| `enb_proxy.s1setup.req` | S1 SETUP REQUEST PDUs received from eNBs
+| `enb_proxy.s1setup.rsp` | S1 SETUP RESPONSE PDUs received from an MME and forwarded
+| `enb_proxy.s1setup.failure` | S1 SETUP FAILURE PDUs received from an MME (triggers retry)
+| `enb_proxy.s1setup.req.timeout` | Timeouts waiting for S1 SETUP REQUEST from an eNB
+| `enb_proxy.s1setup.rsp.timeout` | Timeouts waiting for S1 SETUP RESPONSE from an MME
+| `enb_proxy.conn_est.timeout` | MME SCTP connection establishment timeouts
+| `enb_proxy.conn_est.failure` | MME SCTP connection establishment failures
+| `enb_proxy.unexpected_pdu` | Unexpected PDUs received from an eNB or MME
+| `enb_proxy.malformed_pdu` | Malformed PDUs received from an eNB or MME
+| `enb_proxy.mme_select.ok` | Successful MME selections from the pool
+| `enb_proxy.mme_select.error` | Failed MME selections (pool exhausted)
+|===
+
+[[metrics_sctp_counters]]
+==== SCTP Error Counters
+
+[options="header",cols="40,60"]
+|===
+| Metric name | Description
+| `sctp.error.all` | Total number of SCTP errors
+| `sctp.error.send_failed` | SCTP send operation failures
+| `sctp.error.pdapi_event` | SCTP partial delivery API failures
+| `sctp.error.remote_error` | SCTP remote error notifications
+|===
+
+[[metrics_per_enb_counters]]
+=== Per-eNB Counters
+
+When an eNB connects and its Global-eNB-ID becomes known (after the S1
+Setup procedure), OsmoS1GW dynamically creates a set of per-eNB counters
+scoped to that eNB. These counters mirror the global eNB proxy counters
+but are broken down per connected base station.
+
+The naming scheme for per-eNB counters is
+`enb.{Global-eNB-ID}.{suffix}`, where `{Global-eNB-ID}` is the
+MCC-MNC-eNBId string (e.g. `001-01-1337`).
+
+In addition to the mirrored proxy counters, the following per-eNB
+counters are registered:
+
+[options="header",cols="50,50"]
+|===
+| Metric name | Description
+| `enb.{id}.uptime` | Time (in seconds) since the eNB connected
+| `enb.{id}.gtpu.packets.ul` | GTP-U uplink packets (requires GTP-U KPI)
+| `enb.{id}.gtpu.packets.dl` | GTP-U downlink packets (requires GTP-U KPI)
+| `enb.{id}.gtpu.bytes.ue.ul` | GTP-U uplink bytes (UE side, requires GTP-U KPI)
+| `enb.{id}.gtpu.bytes.ue.dl` | GTP-U downlink bytes (UE side, requires GTP-U KPI)
+| `enb.{id}.gtpu.bytes.total.ul` | GTP-U uplink bytes (total, requires GTP-U KPI)
+| `enb.{id}.gtpu.bytes.total.dl` | GTP-U downlink bytes (total, requires GTP-U KPI)
+|===
+
+GTP-U counters are only populated when the GTP-U KPI module is enabled
+(see <<config_gtpu_kpi>>).
+
+[[metrics_per_mme_counters]]
+=== Per-MME Counters
+
+When an MME is registered in the pool — either at startup from the
+configuration file (see <<config_mme_pool>>) or dynamically via the REST
+API — OsmoS1GW creates a set of per-MME counters scoped to that MME entry.
+
+The naming scheme is `mme.{name}.{suffix}`, where `{name}` is the MME's
+configured name (e.g. `mme0`).
+
+[options="header",cols="45,55"]
+|===
+| Metric name | Description
+| `mme.{name}.selected` | Number of times this MME was selected for a connection attempt
+| `mme.{name}.conn_est.timeout` | Connection establishment timeouts to this MME
+| `mme.{name}.conn_est.failure` | Connection establishment failures to this MME
+| `mme.{name}.s1setup.rsp` | Successful S1 Setup procedures completed via this MME
+| `mme.{name}.s1setup.failure` | S1 SETUP FAILURE responses received from this MME
+| `mme.{name}.s1setup.rsp.timeout` | Timeouts waiting for S1 SETUP RESPONSE from this MME
+|===
+
+[[metrics_gauges]]
+=== Gauges
+
+[options="header",cols="45,55"]
+|===
+| Metric name | Description
+| `pfcp.associated` | `1` if the PFCP association with the UPF is currently established, `0` otherwise
+| `s1ap.enb.num_sctp_connections` | Current number of active eNB SCTP connections
+|===
+
+// vim:set ts=4 sw=4 et:
diff --git a/doc/manuals/osmo-s1gw-usermanual.adoc b/doc/manuals/osmo-s1gw-usermanual.adoc
index 13dcd4b..7c97192 100644
--- a/doc/manuals/osmo-s1gw-usermanual.adoc
+++ b/doc/manuals/osmo-s1gw-usermanual.adoc
@@ -15,6 +15,8 @@

include::{srcdir}/chapters/configuration.adoc[]

+include::{srcdir}/chapters/metrics.adoc[]
+
include::{commondir}/chapters/glossary.adoc[]

include::{commondir}/chapters/bibliography.adoc[]
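
Note on the naming rules in metrics.adoc: the "Metric Names" and "Per-eNB Counters" sections describe how an internal metric ID becomes an external name (drop the leading `ctr`/`gauge` element, join the rest with dots, optionally add the StatsD prefix). The Python sketch below only illustrates that description. It is not taken from the OsmoS1GW code base, which is written in Erlang. The function names `external_name` and `per_enb_name` and the `statsd_prefix` parameter are hypothetical, and the sketch assumes the transformation works exactly as the chapter states.

```python
METRIC_TYPES = ("ctr", "gauge")


def external_name(metric, statsd_prefix=None):
    """Map an internal metric ID (a list) to its dotted external name.

    Hypothetical helper: mirrors the rule described in metrics.adoc,
    not the actual OsmoS1GW implementation.
    """
    kind, *rest = metric
    if kind not in METRIC_TYPES:
        raise ValueError(f"unknown metric type: {kind!r}")
    # The leading type element is dropped; the rest is joined with dots.
    name = ".".join(rest)
    # With StatsD reporting, the configured prefix is prepended.
    return f"{statsd_prefix}.{name}" if statsd_prefix else name


def per_enb_name(global_enb_id, suffix):
    """Build a per-eNB name following the enb.{Global-eNB-ID}.{suffix} scheme."""
    return f"enb.{global_enb_id}.{suffix}"


print(external_name(["ctr", "pfcp", "heartbeat_req", "tx"]))
# pfcp.heartbeat_req.tx
print(external_name(["gauge", "pfcp", "associated"]))
# pfcp.associated
print(external_name(["ctr", "pfcp", "heartbeat_req", "tx"], statsd_prefix="s1gw"))
# s1gw.pfcp.heartbeat_req.tx
print(per_enb_name("001-01-1337", "uptime"))
# enb.001-01-1337.uptime
```

The inputs and outputs are the examples given in the chapter itself, so the sketch can serve as a quick check when reviewing the metric tables.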

Gerrit-MessageType: newchange
Gerrit-Project: erlang/osmo-s1gw
Gerrit-Branch: master
Gerrit-Change-Id: Iacfefd387d0cd26eebbbeba0cd37efa78f90bb46
Gerrit-Change-Number: 42365
Gerrit-PatchSet: 1
Gerrit-Owner: fixeria <vyanitskiy@sysmocom.de>