Attention is currently required from: fixeria, osmith.
Hello Jenkins Builder, fixeria, pespin,
I'd like you to reexamine a change. Please visit
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/40153?usp=email
to look at the new patch set (#2).
The following approvals got outdated and were removed:
Code-Review+1 by fixeria
Change subject: bts: set osmo-bts sched priority to 30
......................................................................
bts: set osmo-bts sched priority to 30
Change the scheduling priority from 10 to 30, as we currently see
osmo-bts suffering from scheduling latency in jenkins even though we
don't run other jobs at that time:
20250425034138405 DL1C ERROR PC clock skew: elapsed_us=387574, error_us=382959 (scheduler_trx.c:449)
This should stop the kernel from preferring other (userspace or kernel)
processes on the same machine that run at a higher priority. We
have seen such an improvement after increasing scheduler priority for
osmo-bts-sysmo too (see I2394e6bbc00a1d47987dbe7b70f4b5cbedf69b10).
Priority 30 is higher than 10. From sched(7):
> Processes scheduled under one of the real-time policies (SCHED_FIFO,
> SCHED_RR) have a sched_priority value in the range 1 (low) to 99 (high).
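The range quoted from sched(7) can be checked on a Linux host with a few lines of Python. This is only an illustrative sketch: the commit message does not say which real-time policy osmo-bts is configured with, so `SCHED_RR` is an assumption here, and the helper names `rt_priority_range()` and `is_valid_rt_priority()` are made up for this example.

```python
import os

# Priority values discussed in the commit message.
OLD_PRIO = 10
NEW_PRIO = 30


def rt_priority_range(policy=os.SCHED_RR):
    """Return (min, max) sched_priority the kernel accepts for a policy."""
    return os.sched_get_priority_min(policy), os.sched_get_priority_max(policy)


def is_valid_rt_priority(prio, policy=os.SCHED_RR):
    """Check that prio lies inside the range reported by the kernel."""
    lowest, highest = rt_priority_range(policy)
    return lowest <= prio <= highest


if __name__ == "__main__":
    lowest, highest = rt_priority_range()
    print(f"SCHED_RR priority range: {lowest}..{highest}")
    print(f"{NEW_PRIO} valid: {is_valid_rt_priority(NEW_PRIO)}")
    # Applying it to the current process needs CAP_SYS_NICE (usually root):
    # os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(NEW_PRIO))
```

Within one real-time policy a larger number wins, so a process at 30 preempts one at 10, but it can still be preempted by anything running at 31 or above.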
This testsuite currently gets executed through docker-playground and it
fetches this config from osmo-ttcn3-hacks (see
If15461240f3037c142c176fc7da745a1701ae3f8).
Related: osmo-ci I0162f7299c8e37f893ffa10ddc4c8edece29ed7f
Change-Id: I828422e2363a58ca8c19d0f1b8a1b7d4e4bc031e
---
M bts/osmo-bts.cfg
1 file changed, 1 insertion(+), 1 deletion(-)
git pull ssh://gerrit.osmocom.org:29418/osmo-ttcn3-hacks refs/changes/53/40153/2
--
To view, visit https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/40153?usp=email
To unsubscribe, or for help writing mail filters, visit https://gerrit.osmocom.org/settings?usp=email
Gerrit-MessageType: newpatchset
Gerrit-Project: osmo-ttcn3-hacks
Gerrit-Branch: master
Gerrit-Change-Id: I828422e2363a58ca8c19d0f1b8a1b7d4e4bc031e
Gerrit-Change-Number: 40153
Gerrit-PatchSet: 2
Gerrit-Owner: osmith <osmith(a)sysmocom.de>
Gerrit-Reviewer: Jenkins Builder
Gerrit-Reviewer: fixeria <vyanitskiy(a)sysmocom.de>
Gerrit-Reviewer: pespin <pespin(a)sysmocom.de>
Gerrit-Attention: osmith <osmith(a)sysmocom.de>
Gerrit-Attention: fixeria <vyanitskiy(a)sysmocom.de>
osmith has submitted this change. ( https://gerrit.osmocom.org/c/osmo-ci/+/40138?usp=email )
Change subject: ansible: build-hosts: add testenv-coredump-helper
......................................................................
ansible: build-hosts: add testenv-coredump-helper
The Osmocom jenkins nodes run inside LXCs. When we get a coredump it
appears on the host. Add a helper script to the hosts so the jenkins
jobs can fetch the coredumps in case an Osmocom program crashes while
running a ttcn3 testsuite.
The helper script has the following safety features to ensure jenkins
can't just fetch any coredump:
* Only fetch coredumps within the last 3 seconds and only if the
  executable matches osmo-* or open5gs-*
* Only listen on the lxc IP
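For illustration, a client talking to this helper could look roughly like the sketch below. It follows the protocol described in the README added by this change (GET `/core` on port `8042`, `404` when nothing matches, `200` with an `X-Executable-Path` header and the coredump as body). The function name `fetch_coredump()` is hypothetical; the real client lives in osmo-ttcn3-hacks.git and may work differently.

```python
import urllib.error
import urllib.request


def fetch_coredump(host, port=8042, timeout=10):
    """Ask the helper for a recent coredump.

    Returns (executable_path, core_bytes), or None if the helper
    answers 404 (no matching coredump).
    """
    url = f"http://{host}:{port}/core"
    # The helper is reached over the local LXC bridge, so skip any proxy
    # configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            exe = resp.headers.get("X-Executable-Path")
            return exe, resp.read()
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return None
        raise
```

A caller inside the LXC would pass the bridge IP of the build host, e.g. `fetch_coredump("10.0.3.1")` with the example address from the README, and write the returned bytes to a file for later inspection.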
Related: OS#6769
Change-Id: I7e66c98106b7028a393e3b873e96ae2dcb412c48
---
A ansible/roles/testenv-coredump-helper/README.md
A ansible/roles/testenv-coredump-helper/files/testenv-coredump-helper.py
A ansible/roles/testenv-coredump-helper/files/testenv-coredump-helper.service
A ansible/roles/testenv-coredump-helper/handlers/main.yml
A ansible/roles/testenv-coredump-helper/tasks/main.yml
M ansible/setup-build-host.yml
6 files changed, 210 insertions(+), 0 deletions(-)
Approvals:
pespin: Looks good to me, but someone else must approve
Jenkins Builder: Verified
fixeria: Looks good to me, approved
diff --git a/ansible/roles/testenv-coredump-helper/README.md b/ansible/roles/testenv-coredump-helper/README.md
new file mode 100644
index 0000000..d27f457
--- /dev/null
+++ b/ansible/roles/testenv-coredump-helper/README.md
@@ -0,0 +1,49 @@
+# testenv-coredump-helper
+
+A simple webserver to make Osmocom related coredumps available in LXCs.
+
+## Architecture
+
+```
+.-----------------------------------------------------------------------------.
+| build host (build4)                             .--------------------------.|
+|                                                 | LXC (deb12build-ansible) ||
+|                                                 |                          ||
+|               shell                             | HTTP                     ||
+| coredumpctl --------- testenv-coredump-helper ------------- testenv        ||
+|                                                 |__________________________||
+|_____________________________________________________________________________|
+```
+
+## What this script does
+
+This role installs a systemd service running the script in
+`files/testenv-coredump-helper.py`, which runs a HTTP server on port `8042` of
+the `lxcbr0`'s IP (e.g. `10.0.3.1`) on the build host. The IP is detected
+dynamically as it is random on each build host.
+
+The HTTP server provides one GET endpoint `/core`. When it is requested (by
+testenv running inside the LXC), the script runs `coredumpctl` with parameters
+to check for any coredump within the last three seconds that was created for
+any Osmocom specific program (starting with `osmo-*` or `open5gs-*`).
+
+* If no matching coredump was found, it returns HTTP status code `404`.
+
+* If a matching coredump was found, it returns HTTP status code `200`, sends
+  the path to the executable in an `X-Executable-Path` header and sends the
+  coredump itself as body.
+
+The coredump and path to the executable are retrieved from `coredumpctl`. The
+coredump is stored in a temporary file for the duration of the transfer.
+
+## Client implementation
+
+The clientside implementation is in `osmo-ttcn3-hacks.git`,
+`_testenv/testenv/coredump.py` in the `get_from_coredumpctl_lxc_host()`
+function.
+
+## Maximum coredump size
+
+The `testenv-coredump-helper` script does not limit the size of the coredump,
+however a maximum size that `systemd-coredump` accepts can be configured in
+`/etc/systemd/coredump.conf`.
diff --git a/ansible/roles/testenv-coredump-helper/files/testenv-coredump-helper.py b/ansible/roles/testenv-coredump-helper/files/testenv-coredump-helper.py
new file mode 100644
index 0000000..a56bd0f
--- /dev/null
+++ b/ansible/roles/testenv-coredump-helper/files/testenv-coredump-helper.py
@@ -0,0 +1,112 @@
+#!/usr/bin/env python3
+# Copyright 2025 sysmocom - s.f.m.c. GmbH
+# SPDX-License-Identifier: GPL-3.0-or-later
+# Simple webserver to make Osmocom related coredumps available in LXCs. See
+# ../README.md and OS#6769 for details.
+import datetime
+import fnmatch
+import http.server
+import json
+import os
+import shutil
+import signal
+import socket
+import socketserver
+import subprocess
+import sys
+import tempfile
+
+
+NETDEV = "lxcbr0"
+IP_PATTERN = "10.0.*"
+PORT = 8042
+
+
+def find_lxc_ip():
+    cmd = ["ip", "-j", "-o", "-4", "addr", "show", "dev", NETDEV]
+    p = subprocess.run(cmd, capture_output=True, text=True, check=True)
+    ret = json.loads(p.stdout)[0]["addr_info"][0]["local"]
+    if not fnmatch.fnmatch(ret, IP_PATTERN):
+        print(f"ERROR: IP doesn't match pattern {IP_PATTERN}: {ret}")
+        sys.exit(1)
+    return ret
+
+
+def executable_is_relevant(exe):
+    basename = os.path.basename(exe)
+    patterns = [
+        "open5gs-*",
+        "osmo-*",
+    ]
+
+    for pattern in patterns:
+        if fnmatch.fnmatch(basename, pattern):
+            return True
+
+    return False
+
+
+class CustomRequestHandler(http.server.SimpleHTTPRequestHandler):
+    def do_GET(self):
+        if self.path == "/core":
+            # Check for any coredump within last 3 seconds
+            since = (datetime.datetime.now() - datetime.timedelta(seconds=3)).strftime("%Y-%m-%d %H:%M:%S")
+            cmd = ["coredumpctl", "-q", "-S", since, "--json=short", "-n1"]
+
+            p = subprocess.run(cmd, capture_output=True, text=True)
+            if p.returncode != 0:
+                self.send_error(404, "No coredump found")
+                return None
+
+            # Check if the coredump executable is from osmo-*, open5gs-*, etc.
+            coredump = json.loads(p.stdout)[0]
+            if not executable_is_relevant(coredump["exe"]):
+                self.send_error(404, "No coredump found")
+                return None
+
+            # Put coredump into a temporary file and return it
+            with tempfile.TemporaryDirectory() as tmpdirname:
+                core_path = os.path.join(tmpdirname, "core")
+                cmd = [
+                    "coredumpctl",
+                    "dump",
+                    "-q",
+                    "-S",
+                    since,
+                    "-o",
+                    core_path,
+                    str(coredump["pid"]),
+                    coredump["exe"],
+                ]
+                subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
+
+                with open(core_path, "rb") as f:
+                    self.send_response(200)
+                    self.send_header("X-Executable-Path", coredump["exe"])
+                    self.end_headers()
+                    self.wfile.write(f.read())
+        else:
+            self.send_error(404, "File Not Found")
+
+
+def signal_handler(sig, frame):
+    sys.exit(0)
+
+
+def main():
+    if not shutil.which("coredumpctl"):
+        print("ERROR: coredumpctl not found!")
+        sys.exit(1)
+
+    ip = os.environ.get("LXC_HOST_IP") or find_lxc_ip()
+    print(f"Listening on {ip}:{PORT}")
+    signal.signal(signal.SIGINT, signal_handler)
+    with socketserver.TCPServer((ip, PORT), CustomRequestHandler, False) as httpd:
+        httpd.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
+        httpd.server_bind()
+        httpd.server_activate()
+        httpd.serve_forever()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/ansible/roles/testenv-coredump-helper/files/testenv-coredump-helper.service b/ansible/roles/testenv-coredump-helper/files/testenv-coredump-helper.service
new file mode 100644
index 0000000..ef5a851
--- /dev/null
+++ b/ansible/roles/testenv-coredump-helper/files/testenv-coredump-helper.service
@@ -0,0 +1,12 @@
+[Unit]
+Description=testenv coredump helper
+After=lxc.service
+
+[Service]
+Environment="PYTHONUNBUFFERED=1"
+Type=simple
+Restart=always
+ExecStart=/opt/testenv-coredump-helper/testenv-coredump-helper
+
+[Install]
+WantedBy=multi-user.target
diff --git a/ansible/roles/testenv-coredump-helper/handlers/main.yml b/ansible/roles/testenv-coredump-helper/handlers/main.yml
new file mode 100644
index 0000000..0f7ef57
--- /dev/null
+++ b/ansible/roles/testenv-coredump-helper/handlers/main.yml
@@ -0,0 +1,5 @@
+---
+- name: restart testenv-coredump-helper
+  service:
+    name: testenv-coredump-helper
+    state: restarted
diff --git a/ansible/roles/testenv-coredump-helper/tasks/main.yml b/ansible/roles/testenv-coredump-helper/tasks/main.yml
new file mode 100644
index 0000000..9ff2769
--- /dev/null
+++ b/ansible/roles/testenv-coredump-helper/tasks/main.yml
@@ -0,0 +1,31 @@
+---
+- name: install coredumpctl
+  apt:
+    name:
+      - systemd-coredump
+    cache_valid_time: 3600
+    update_cache: yes
+
+- name: mkdir /opt/testenv-coredump-helper
+  ansible.builtin.file:
+    path: /opt/testenv-coredump-helper
+    state: directory
+
+- name: install testenv-coredump-helper
+  ansible.builtin.copy:
+    src: testenv-coredump-helper.py
+    dest: /opt/testenv-coredump-helper/testenv-coredump-helper
+    mode: '0755'
+  notify: restart testenv-coredump-helper
+
+- name: install testenv-coredump-helper service
+  ansible.builtin.copy:
+    src: testenv-coredump-helper.service
+    dest: /etc/systemd/system/testenv-coredump-helper.service
+    mode: '0644'
+  notify: restart testenv-coredump-helper
+
+- name: enable testenv-coredump-helper service
+  ansible.builtin.systemd_service:
+    name: testenv-coredump-helper
+    enabled: true
diff --git a/ansible/setup-build-host.yml b/ansible/setup-build-host.yml
index ed8def5..d1d9874 100644
--- a/ansible/setup-build-host.yml
+++ b/ansible/setup-build-host.yml
@@ -18,3 +18,4 @@
update_cache: yes
roles:
- name: apt-allow-relinfo-change
+ - name: testenv-coredump-helper
--
To view, visit https://gerrit.osmocom.org/c/osmo-ci/+/40138?usp=email
Gerrit-MessageType: merged
Gerrit-Project: osmo-ci
Gerrit-Branch: master
Gerrit-Change-Id: I7e66c98106b7028a393e3b873e96ae2dcb412c48
Gerrit-Change-Number: 40138
Gerrit-PatchSet: 2
Gerrit-Owner: osmith <osmith(a)sysmocom.de>
Gerrit-Reviewer: Jenkins Builder
Gerrit-Reviewer: fixeria <vyanitskiy(a)sysmocom.de>
Gerrit-Reviewer: osmith <osmith(a)sysmocom.de>
Gerrit-Reviewer: pespin <pespin(a)sysmocom.de>
osmith has submitted this change. ( https://gerrit.osmocom.org/c/osmo-ci/+/40139?usp=email )
Change subject: jobs/ttcn3-testsuites-testenv: set core env var
......................................................................
jobs/ttcn3-testsuites-testenv: set core env var
Configure testenv jobs to get coredumps from lxc hosts.
Related: OS#6769
Change-Id: I8359b0faa1fed76b430749589916cd072a8a7753
---
M jobs/ttcn3-testsuites-testenv.yml
1 file changed, 1 insertion(+), 0 deletions(-)
Approvals:
fixeria: Looks good to me, approved
pespin: Looks good to me, but someone else must approve
Jenkins Builder: Verified
laforge: Looks good to me, but someone else must approve
diff --git a/jobs/ttcn3-testsuites-testenv.yml b/jobs/ttcn3-testsuites-testenv.yml
index a5e90af..f6a7798 100644
--- a/jobs/ttcn3-testsuites-testenv.yml
+++ b/jobs/ttcn3-testsuites-testenv.yml
@@ -530,6 +530,7 @@
export TESTENV_SOURCE_HIGHLIGHT_COLORS="esc"
export TESTENV_NO_IMAGE_UP_TO_DATE_CHECK=1
export TESTENV_NO_KVM=1
+ export TESTENV_COREDUMP_FROM_LXC_HOST=1
set -x
./testenv.py run \
--
To view, visit https://gerrit.osmocom.org/c/osmo-ci/+/40139?usp=email
Gerrit-MessageType: merged
Gerrit-Project: osmo-ci
Gerrit-Branch: master
Gerrit-Change-Id: I8359b0faa1fed76b430749589916cd072a8a7753
Gerrit-Change-Number: 40139
Gerrit-PatchSet: 2
Gerrit-Owner: osmith <osmith(a)sysmocom.de>
Gerrit-Reviewer: Jenkins Builder
Gerrit-Reviewer: fixeria <vyanitskiy(a)sysmocom.de>
Gerrit-Reviewer: laforge <laforge(a)osmocom.org>
Gerrit-Reviewer: osmith <osmith(a)sysmocom.de>
Gerrit-Reviewer: pespin <pespin(a)sysmocom.de>
fixeria has submitted this change. ( https://gerrit.osmocom.org/c/osmo-bts/+/40156?usp=email )
Change subject: osmo-bts-trx: trx_fn_timer_cb(): fix misleading shutdown reason
......................................................................
osmo-bts-trx: trx_fn_timer_cb(): fix misleading shutdown reason
If osmo-bts-trx exit()s due to PC clock issues, e.g. if the
process stalls, it produces rather confusing logging messages:
DL1C ERROR PC clock skew: elapsed_us=387574, error_us=382959
DOML NOTICE ... Shutting down BTS, exit 1, reason: No clock from osmo-trx
The second message suggests that the transceiver (osmo-trx) is the
culprit, but the first one reflects the actual reason (PC clock skew).
Let's pass the proper shutdown reason to avoid confusion.
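The numbers in the log line can be reproduced with a small Python model of the skew check in `trx_fn_timer_cb()`. This is a sketch, not the osmo-bts code: the frame duration of 4615 us is taken from the log itself (387574 - 382959), while `MAX_FN_SKEW = 50` is an assumed value that is not shown in the diff and should be checked against `scheduler_trx.c`. The function name `skew_shutdown_reason()` is hypothetical.

```python
# One TDMA frame in microseconds; consistent with the log line above,
# where elapsed_us - error_us = 387574 - 382959 = 4615.
GSM_TDMA_FN_DURATION_US = 4615

# Assumption: the actual constant is defined in scheduler_trx.c and is
# not part of the diff in this change.
MAX_FN_SKEW = 50


def skew_shutdown_reason(elapsed_us):
    """Return (error_us, reason); reason is None if the timer may go on."""
    error_us = elapsed_us - GSM_TDMA_FN_DURATION_US
    # Same condition as in the diff: clock jumped backwards, or the
    # process stalled for more than MAX_FN_SKEW frames.
    if elapsed_us > GSM_TDMA_FN_DURATION_US * MAX_FN_SKEW or elapsed_us < 0:
        return error_us, "PC clock skew too high"
    return error_us, None


if __name__ == "__main__":
    print(skew_shutdown_reason(387574))
```

Under the assumed threshold of 50 frames (230750 us), the stall of 387574 us seen in jenkins is well past the limit, which is why the BTS shut down even though the transceiver was not at fault.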
Change-Id: Ibbbbc4e919e6eb812882fc60de4be13fa77934b7
---
M src/osmo-bts-trx/scheduler_trx.c
1 file changed, 11 insertions(+), 7 deletions(-)
Approvals:
laforge: Looks good to me, but someone else must approve
pespin: Looks good to me, but someone else must approve
fixeria: Looks good to me, approved
Jenkins Builder: Verified
diff --git a/src/osmo-bts-trx/scheduler_trx.c b/src/osmo-bts-trx/scheduler_trx.c
index 00143ab..a9a53f6 100644
--- a/src/osmo-bts-trx/scheduler_trx.c
+++ b/src/osmo-bts-trx/scheduler_trx.c
@@ -411,6 +411,7 @@
struct timespec tv_now;
uint64_t expire_count;
int64_t elapsed_us, error_us;
+ const char *reason = NULL;
int rc, i;
if (!(what & OSMO_FD_READ))
@@ -430,8 +431,9 @@
/* check if transceiver is still alive */
if (tcs->fn_without_clock_ind++ == TRX_LOSS_FRAMES) {
- LOGP(DL1C, LOGL_NOTICE, "No more clock from transceiver\n");
- goto no_clock;
+ reason = "No more clock from transceiver";
+ LOGP(DL1C, LOGL_ERROR, "%s\n", reason);
+ goto shutdown;
}
/* compute actual elapsed time and resulting OS scheduling error */
@@ -446,9 +448,11 @@
/* if someone played with clock, or if the process stalled */
if (elapsed_us > GSM_TDMA_FN_DURATION_uS * MAX_FN_SKEW || elapsed_us < 0) {
- LOGP(DL1C, LOGL_ERROR, "PC clock skew: elapsed_us=%" PRId64 ", error_us=%" PRId64 "\n",
- elapsed_us, error_us);
- goto no_clock;
+ LOGP(DL1C, LOGL_ERROR,
+ "PC clock skew: elapsed_us=%" PRId64 ", error_us=%" PRId64 "\n",
+ elapsed_us, error_us);
+ reason = "PC clock skew too high";
+ goto shutdown;
}
/* call bts_sched_fn() for all expired FN */
@@ -457,9 +461,9 @@
return 0;
-no_clock:
+shutdown:
osmo_timerfd_disable(&tcs->fn_timer_ofd);
- bts_shutdown(bts, "No clock from osmo-trx");
+ bts_shutdown(bts, reason);
return -1;
}
--
To view, visit https://gerrit.osmocom.org/c/osmo-bts/+/40156?usp=email
Gerrit-MessageType: merged
Gerrit-Project: osmo-bts
Gerrit-Branch: master
Gerrit-Change-Id: Ibbbbc4e919e6eb812882fc60de4be13fa77934b7
Gerrit-Change-Number: 40156
Gerrit-PatchSet: 3
Gerrit-Owner: fixeria <vyanitskiy(a)sysmocom.de>
Gerrit-Reviewer: Jenkins Builder
Gerrit-Reviewer: fixeria <vyanitskiy(a)sysmocom.de>
Gerrit-Reviewer: laforge <laforge(a)osmocom.org>
Gerrit-Reviewer: osmith <osmith(a)sysmocom.de>
Gerrit-Reviewer: pespin <pespin(a)sysmocom.de>
laforge has submitted this change. ( https://gerrit.osmocom.org/c/libosmo-sigtran/+/40157?usp=email )
Change subject: cosmetic: osmo_sccp_user_bind() clarify prim_cb msgb ownership in documentation
......................................................................
cosmetic: osmo_sccp_user_bind() clarify prim_cb msgb ownership in documentation
Change-Id: I6dcda9221aa77809fe0f10e0e159558aad07885c
---
M src/sccp_user.c
1 file changed, 22 insertions(+), 4 deletions(-)
Approvals:
fixeria: Looks good to me, but someone else must approve
laforge: Looks good to me, approved
Jenkins Builder: Verified
diff --git a/src/sccp_user.c b/src/sccp_user.c
index b869dab..62ae9ed 100644
--- a/src/sccp_user.c
+++ b/src/sccp_user.c
@@ -92,9 +92,14 @@
/*! \brief Bind a SCCP User to a given Point Code
* \param[in] inst SCCP Instance
* \param[in] name human-readable name
+ * \param[in] prim_cb User provided callback to pass a primitive/msg up the stack
* \param[in] ssn Sub-System Number to bind to
* \param[in] pc Point Code to bind to, or OSMO_SS7_PC_INVALID if none.
- * \returns Callee-allocated SCCP User on success; negative otherwise */
+ * \returns Callee-allocated SCCP User on success; negative otherwise
+ *
+ * Ownership of oph->msg in prim_cb is transferred to the user of the
+ * registered callback when called.
+ */
static struct osmo_sccp_user *
sccp_user_bind_pc(struct osmo_sccp_instance *inst, const char *name,
osmo_prim_cb prim_cb, uint16_t ssn, uint32_t pc)
@@ -127,9 +132,14 @@
/*! \brief Bind a given SCCP User to a given SSN+PC
* \param[in] inst SCCP Instance
* \param[in] name human-readable name
+ * \param[in] prim_cb User provided callback to pass a primitive/msg up the stack
* \param[in] ssn Sub-System Number to bind to
* \param[in] pc Point Code to bind to
- * \returns Callee-allocated SCCP User on success; negative otherwise */
+ * \returns Callee-allocated SCCP User on success; negative otherwise
+ *
+ * Ownership of oph->msg in prim_cb is transferred to the user of the
+ * registered callback when called.
+ */
struct osmo_sccp_user *
osmo_sccp_user_bind_pc(struct osmo_sccp_instance *inst, const char *name,
osmo_prim_cb prim_cb, uint16_t ssn, uint32_t pc)
@@ -140,8 +150,13 @@
/*! \brief Bind a given SCCP User to a given SSN (at any PC)
* \param[in] inst SCCP Instance
* \param[in] name human-readable name
+ * \param[in] prim_cb User provided callback to pass a primitive/msg up the stack
* \param[in] ssn Sub-System Number to bind to
- * \returns Callee-allocated SCCP User on success; negative otherwise */
+ * \returns Callee-allocated SCCP User on success; negative otherwise
+ *
+ * Ownership of oph->msg in prim_cb is transferred to the user of the
+ * registered callback when called.
+ */
struct osmo_sccp_user *
osmo_sccp_user_bind(struct osmo_sccp_instance *inst, const char *name,
osmo_prim_cb prim_cb, uint16_t ssn)
@@ -175,7 +190,10 @@
/*! \brief Send a SCCP User SAP Primitive up to the User
* \param[in] scu SCCP User to whom to send the primitive
* \param[in] prim Primitive to send to the user
- * \returns return value of the SCCP User's prim_cb() function */
+ * \returns return value of the SCCP User's prim_cb() function
+ *
+ * Ownership of prim->oph->msg is passed to the user of the registered callback
+ */
int sccp_user_prim_up(struct osmo_sccp_user *scu, struct osmo_scu_prim *prim)
{
LOGP(DLSCCP, LOGL_DEBUG, "Delivering %s to SCCP User '%s'\n",
--
To view, visit https://gerrit.osmocom.org/c/libosmo-sigtran/+/40157?usp=email
Gerrit-MessageType: merged
Gerrit-Project: libosmo-sigtran
Gerrit-Branch: master
Gerrit-Change-Id: I6dcda9221aa77809fe0f10e0e159558aad07885c
Gerrit-Change-Number: 40157
Gerrit-PatchSet: 1
Gerrit-Owner: pespin <pespin(a)sysmocom.de>
Gerrit-Reviewer: Jenkins Builder
Gerrit-Reviewer: fixeria <vyanitskiy(a)sysmocom.de>
Gerrit-Reviewer: laforge <laforge(a)osmocom.org>