fixeria has uploaded this change for review. ( https://gerrit.osmocom.org/c/erlang/osmo-s1gw/+/42365?usp=email )
Change subject: doc/manuals: document the metrics
......................................................................
doc/manuals: document the metrics
Change-Id: Iacfefd387d0cd26eebbbeba0cd37efa78f90bb46
Related: OS#6671
---
M doc/manuals/chapters/configuration.adoc
A doc/manuals/chapters/metrics.adoc
M doc/manuals/osmo-s1gw-usermanual.adoc
3 files changed, 190 insertions(+), 3 deletions(-)
git pull ssh://gerrit.osmocom.org:29418/erlang/osmo-s1gw refs/changes/65/42365/1
diff --git a/doc/manuals/chapters/configuration.adoc b/doc/manuals/chapters/configuration.adoc
index 53e23e5..8bd7eea 100644
--- a/doc/manuals/chapters/configuration.adoc
+++ b/doc/manuals/chapters/configuration.adoc
@@ -256,9 +256,10 @@
 === `exometer_core` — Metrics and StatsD Reporting
 OsmoS1GW uses the https://github.com/Feuerlabs/exometer_core[exometer_core]
-library for internal metrics (counters and gauges). The `exometer_core`
-section configures reporters — processes that periodically push metric
-values to an external destination.
+library for internal metrics (counters and gauges); see <<metrics>> for the
+full list of available metrics. The `exometer_core` section configures
+reporters — processes that periodically push metric values to an external
+destination.
 The default configuration reports all counters and gauges to a StatsD server:

diff --git a/doc/manuals/chapters/metrics.adoc b/doc/manuals/chapters/metrics.adoc
new file mode 100644
index 0000000..5b81227
--- /dev/null
+++ b/doc/manuals/chapters/metrics.adoc
@@ -0,0 +1,184 @@
+[[metrics]]
+== Metrics
+
+OsmoS1GW exposes internal metrics using the
+https://github.com/Feuerlabs/exometer_core[exometer_core] library. Two
+metric types are used:
+
+Counter:: A monotonically increasing integer, incremented each time a
+  specific event occurs. Counters never decrease.
+
+Gauge:: An integer that reflects a current quantity (e.g. the number of
+  active connections). Gauges can go up and down.
+
+[[metrics_naming]]
+=== Metric Names
+
+Internally, each metric is identified by an Erlang list such as
+`[ctr, pfcp, heartbeat_req, tx]`. When reported externally — via StatsD
+(see <<config_exometer>>) or the REST API (see <<rest_metrics>>) — the
+list elements are joined with dots and the leading type element (`ctr` or
+`gauge`) is dropped. For example:
+
+* `[ctr, pfcp, heartbeat_req, tx]` → `pfcp.heartbeat_req.tx`
+* `[gauge, pfcp, associated]` → `pfcp.associated`
+
+When StatsD reporting is enabled, all metric names are further prefixed
+with the configured `prefix` string (default: `s1gw`), giving e.g.
+`s1gw.pfcp.heartbeat_req.tx`.
+
+[[metrics_global_counters]]
+=== Global Counters
+
+The following counters are registered at startup and count events across
+all connections.
+
+[[metrics_pfcp_counters]]
+==== PFCP Counters
+
+[options="header",cols="45,55"]
+|===
+| Metric name | Description
+| `pfcp.heartbeat_req.tx` | PFCP Heartbeat Requests sent to the UPF
+| `pfcp.heartbeat_req.rx` | PFCP Heartbeat Requests received from the UPF
+| `pfcp.heartbeat_req.timeout` | PFCP Heartbeat Requests that timed out
+| `pfcp.heartbeat_resp.tx` | PFCP Heartbeat Responses sent to the UPF
+| `pfcp.heartbeat_resp.rx` | PFCP Heartbeat Responses received from the UPF
+| `pfcp.assoc_setup_req.tx` | PFCP Association Setup Requests sent
+| `pfcp.assoc_setup_req.timeout` | PFCP Association Setup Requests that timed out
+| `pfcp.assoc_setup_resp.rx` | PFCP Association Setup Responses received
+| `pfcp.assoc_setup_resp.rx_ack` | PFCP Association Setup Responses with success cause
+| `pfcp.assoc_setup_resp.rx_nack` | PFCP Association Setup Responses with failure cause
+| `pfcp.unexpected_pdu` | Unexpected or unrecognised PFCP PDUs received
+|===
+
+[[metrics_s1ap_counters]]
+==== S1AP Counters
+
+[options="header",cols="55,45"]
+|===
+| Metric name | Description
+| `s1ap.enb.all.rx` | S1AP PDUs received from any eNB
+| `s1ap.enb.all.rx_unknown_enb` | S1AP PDUs received from an unregistered eNB
+| `s1ap.proxy.exception` | Exceptions raised during S1AP PDU processing
+| `s1ap.proxy.in_pkt.all` | S1AP PDUs entering the proxy (all directions)
+| `s1ap.proxy.in_pkt.drop.all` | S1AP PDUs dropped by the proxy
+| `s1ap.proxy.in_pkt.decode_error` | S1AP PDUs that failed to decode
+| `s1ap.proxy.in_pkt.proc_error` | S1AP PDUs that failed to process
+| `s1ap.proxy.in_pkt.erab_setup_req` | E-RAB SETUP REQUEST PDUs
+| `s1ap.proxy.in_pkt.erab_setup_rsp` | E-RAB SETUP RESPONSE PDUs
+| `s1ap.proxy.in_pkt.erab_modify_req` | E-RAB MODIFY REQUEST PDUs
+| `s1ap.proxy.in_pkt.erab_modify_rsp` | E-RAB MODIFY RESPONSE PDUs
+| `s1ap.proxy.in_pkt.erab_release_cmd` | E-RAB RELEASE COMMAND PDUs
+| `s1ap.proxy.in_pkt.erab_release_rsp` | E-RAB RELEASE RESPONSE PDUs
+| `s1ap.proxy.in_pkt.erab_release_ind` | E-RAB RELEASE INDICATION PDUs
+| `s1ap.proxy.in_pkt.erab_mod_ind` | E-RAB MODIFICATION INDICATION PDUs
+| `s1ap.proxy.in_pkt.erab_mod_cnf` | E-RAB MODIFICATION CONFIRM PDUs
+| `s1ap.proxy.in_pkt.init_ctx_req` | INITIAL CONTEXT SETUP REQUEST PDUs
+| `s1ap.proxy.in_pkt.init_ctx_rsp` | INITIAL CONTEXT SETUP RESPONSE PDUs
+| `s1ap.proxy.in_pkt.release_ctx_req` | UE CONTEXT RELEASE REQUEST PDUs
+| `s1ap.proxy.in_pkt.release_ctx_cmd` | UE CONTEXT RELEASE COMMAND PDUs
+| `s1ap.proxy.in_pkt.release_ctx_compl` | UE CONTEXT RELEASE COMPLETE PDUs
+| `s1ap.proxy.in_pkt.handover_cmd` | HANDOVER COMMAND PDUs
+| `s1ap.proxy.in_pkt.handover_req` | HANDOVER REQUEST PDUs
+| `s1ap.proxy.in_pkt.handover_req_ack` | HANDOVER REQUEST ACKNOWLEDGE PDUs
+| `s1ap.proxy.out_pkt.forward.all` | S1AP PDUs forwarded (total)
+| `s1ap.proxy.out_pkt.forward.proc` | S1AP PDUs forwarded after processing (with IE rewriting)
+| `s1ap.proxy.out_pkt.forward.unmodified` | S1AP PDUs forwarded without modification
+| `s1ap.proxy.out_pkt.reply.all` | S1AP PDUs generated locally by the proxy (total)
+| `s1ap.proxy.out_pkt.reply.erab_setup_rsp` | E-RAB SETUP RESPONSE PDUs generated locally
+|===
+
+[[metrics_enb_proxy_counters]]
+==== eNB Proxy Counters
+
+[options="header",cols="45,55"]
+|===
+| Metric name | Description
+| `enb_proxy.s1setup.req` | S1 SETUP REQUEST PDUs received from eNBs
+| `enb_proxy.s1setup.rsp` | S1 SETUP RESPONSE PDUs received from the MME and forwarded
+| `enb_proxy.s1setup.failure` | S1 SETUP FAILURE PDUs received from an MME (triggers retry)
+| `enb_proxy.s1setup.req.timeout` | Timeouts waiting for S1 SETUP REQUEST from an eNB
+| `enb_proxy.s1setup.rsp.timeout` | Timeouts waiting for S1 SETUP RESPONSE from an MME
+| `enb_proxy.conn_est.timeout` | MME SCTP connection establishment timeouts
+| `enb_proxy.conn_est.failure` | MME SCTP connection establishment failures
+| `enb_proxy.unexpected_pdu` | Unexpected PDUs received from an eNB or MME
+| `enb_proxy.malformed_pdu` | Malformed PDUs received from an eNB or MME
+| `enb_proxy.mme_select.ok` | Successful MME selections from the pool
+| `enb_proxy.mme_select.error` | Failed MME selections (pool exhausted)
+|===
+
+[[metrics_sctp_counters]]
+==== SCTP Error Counters
+
+[options="header",cols="40,60"]
+|===
+| Metric name | Description
+| `sctp.error.all` | Total number of SCTP errors
+| `sctp.error.send_failed` | SCTP send operation failures
+| `sctp.error.pdapi_event` | SCTP partial delivery API failures
+| `sctp.error.remote_error` | SCTP remote error notifications
+|===
+
+[[metrics_per_enb_counters]]
+=== Per-eNB Counters
+
+When an eNB connects and its Global-eNB-ID becomes known (after the S1
+Setup procedure), OsmoS1GW dynamically creates a set of per-eNB counters
+scoped to that eNB. These counters mirror the global eNB proxy counters
+but are broken down per connected base station.
+
+The naming scheme for per-eNB counters is
+`enb.{Global-eNB-ID}.{suffix}`, where `{Global-eNB-ID}` is the
+MCC-MNC-eNBId string (e.g. `001-01-1337`).
+
+In addition to the mirrored proxy counters, the following per-eNB
+counters are also registered:
+
+[options="header",cols="50,50"]
+|===
+| Metric name | Description
+| `enb.{id}.uptime` | Time (in seconds) since the eNB connected
+| `enb.{id}.gtpu.packets.ul` | GTP-U uplink packets (requires GTP-U KPI)
+| `enb.{id}.gtpu.packets.dl` | GTP-U downlink packets (requires GTP-U KPI)
+| `enb.{id}.gtpu.bytes.ue.ul` | GTP-U uplink bytes (UE side, requires GTP-U KPI)
+| `enb.{id}.gtpu.bytes.ue.dl` | GTP-U downlink bytes (UE side, requires GTP-U KPI)
+| `enb.{id}.gtpu.bytes.total.ul` | GTP-U uplink bytes (total, requires GTP-U KPI)
+| `enb.{id}.gtpu.bytes.total.dl` | GTP-U downlink bytes (total, requires GTP-U KPI)
+|===
+
+GTP-U counters are only populated when the GTP-U KPI module is enabled
+(see <<config_gtpu_kpi>>).
+
+[[metrics_per_mme_counters]]
+=== Per-MME Counters
+
+When an MME is registered in the pool — either at startup from the
+configuration file (see <<config_mme_pool>>) or dynamically via the REST
+API — OsmoS1GW creates a set of per-MME counters scoped to that MME entry.
+
+The naming scheme is `mme.{name}.{suffix}`, where `{name}` is the MME's
+configured name (e.g. `mme0`).
+
+[options="header",cols="45,55"]
+|===
+| Metric name | Description
+| `mme.{name}.selected` | Number of times this MME was selected for a connection attempt
+| `mme.{name}.conn_est.timeout` | Connection establishment timeouts to this MME
+| `mme.{name}.conn_est.failure` | Connection establishment failures to this MME
+| `mme.{name}.s1setup.rsp` | Successful S1 Setup procedures completed via this MME
+| `mme.{name}.s1setup.failure` | S1 SETUP FAILURE responses received from this MME
+| `mme.{name}.s1setup.rsp.timeout` | Timeouts waiting for S1 SETUP RESPONSE from this MME
+|===
+
+[[metrics_gauges]]
+=== Gauges
+
+[options="header",cols="45,55"]
+|===
+| Metric name | Description
+| `pfcp.associated` | `1` if the PFCP association with the UPF is currently established, `0` otherwise
+| `s1ap.enb.num_sctp_connections` | Current number of active eNB SCTP connections
+|===
+
+// vim:set ts=4 sw=4 et:
diff --git a/doc/manuals/osmo-s1gw-usermanual.adoc b/doc/manuals/osmo-s1gw-usermanual.adoc
index 13dcd4b..7c97192 100644
--- a/doc/manuals/osmo-s1gw-usermanual.adoc
+++ b/doc/manuals/osmo-s1gw-usermanual.adoc
@@ -15,6 +15,8 @@
include::{srcdir}/chapters/configuration.adoc[]
+include::{srcdir}/chapters/metrics.adoc[]
+
 include::{commondir}/chapters/glossary.adoc[]
include::{commondir}/chapters/bibliography.adoc[]
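
Note for reviewers: the naming rule described under "Metric Names" in the new chapter (drop the leading `ctr`/`gauge` element, join the rest with dots, optionally prepend the StatsD prefix) can be sketched as below. This is only an illustration in Python, not code from OsmoS1GW or exometer_core; the function name `external_name` and its `prefix` parameter are hypothetical, and the Erlang atoms are modelled as plain strings.

```python
from typing import Optional, Sequence

# Type elements that the chapter says are dropped from the external name.
METRIC_TYPES = ("ctr", "gauge")


def external_name(metric: Sequence[str], prefix: Optional[str] = None) -> str:
    """Map an internal metric name (a list of elements) to its dotted form.

    Hypothetical helper: mirrors the rule stated in the chapter, nothing more.
    """
    if len(metric) < 2 or metric[0] not in METRIC_TYPES:
        raise ValueError(f"not a typed metric name: {metric!r}")
    # Drop the leading type element and join the remainder with dots.
    name = ".".join(metric[1:])
    # With StatsD reporting, the configured prefix (e.g. "s1gw") is prepended.
    return f"{prefix}.{name}" if prefix else name


if __name__ == "__main__":
    print(external_name(["ctr", "pfcp", "heartbeat_req", "tx"]))
    print(external_name(["gauge", "pfcp", "associated"]))
    print(external_name(["ctr", "pfcp", "heartbeat_req", "tx"], prefix="s1gw"))
```

The three calls reproduce the examples given in the chapter text: the two unprefixed names and the `s1gw`-prefixed StatsD name.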