Change in osmo-gsm-tester[master]: process: Prevent NetNSProcess alive forever after SIGKILL

Pau Espin Pedrol gerrit-no-reply at lists.osmocom.org
Wed Apr 3 15:59:22 UTC 2019


Pau Espin Pedrol has uploaded this change for review. ( https://gerrit.osmocom.org/13511 )


Change subject: process: Prevent NetNSProcess alive forever after SIGKILL
......................................................................

process: Prevent NetNSProcess alive forever after SIGKILL

NetNSProcess instances run in the following process tree:
osmo-gsm-tester -> sudo -> bash (osmo-gsm-tester_netns_exec.sh) ->
tcpdump.

Many osmo-gsm-tester_netns_exec.sh scripts with a tcpdump child process
were spotted in the prod setup of osmo-gsm-tester. Apparently this
happens because tcpdump sometimes doesn't terminate in time after
SIGTERM and SIGINT, so osmo-gsm-tester sends SIGKILL as part of its
usual termination procedure. When SIGKILL is sent, the parent sudo
process is killed instantly, with no chance to forward the signal to
its children, leaving the bash script and tcpdump alive.

To fix it, intercept SIGKILL in send_signal() for this process class and
send SIGUSR1 instead (SIGKILL itself cannot be caught by the receiver).
Then, modify the script under sudo to handle SIGUSR1 by sending SIGKILL
to its child, to make sure the child process in the netns terminates.

Change-Id: I2bf389c47bbbd75f46af413e7ba897be5be386e1
---
M src/osmo_gsm_tester/process.py
M utils/osmo-gsm-tester_netns_exec.sh
2 files changed, 42 insertions(+), 1 deletion(-)
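
The sending side of this change can be sketched in isolation as follows.
This is only an illustration, not the code in process.py: the helper
names effective_signal() and kill_command() are hypothetical, and the
real change builds the same 'kill' argv inside
NetNSProcess.send_signal() before running it inside the netns.

```python
import signal


def effective_signal(sig):
    """Return the signal to actually deliver to the sudo-wrapped process.

    Hypothetical helper: SIGKILL would kill sudo before it can forward
    anything, so the wrapper script is sent SIGUSR1 instead and is
    expected to SIGKILL its own child.
    """
    if sig == signal.SIGKILL:
        return signal.SIGUSR1
    return sig


def kill_command(sig, pid):
    """Build the argv for 'kill', using the same format string as the patch."""
    return ('kill', '-%d' % int(effective_signal(sig)), str(pid))
```

All other signals pass through unchanged, so SIGINT and SIGTERM keep
their usual meaning for the wrapper script.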



  git pull ssh://gerrit.osmocom.org:29418/osmo-gsm-tester refs/changes/11/13511/1
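
For readers more at home in Python than in bash traps, the receiving
side (a wrapper that treats SIGUSR1 as "SIGKILL your child") could look
roughly like the sketch below. It is an assumption-laden illustration,
not part of this change: run_with_usr1_kill() is a hypothetical name,
and the actual wrapper is the bash script
utils/osmo-gsm-tester_netns_exec.sh.

```python
import signal
import subprocess


def run_with_usr1_kill(argv):
    """Run argv as a child and treat SIGUSR1 as a request to SIGKILL it.

    Must be called from the main thread, since signal.signal() only
    works there.
    """
    child = None  # like child_ps=0 in the script: nothing to kill yet

    def on_usr1(signum, frame):
        # The wrapper itself survives SIGUSR1 and kills the child on
        # behalf of the sender.
        if child is not None:
            child.send_signal(signal.SIGKILL)

    # Install the handler before starting the child, as the script
    # installs its traps before launching "ip netns exec".
    signal.signal(signal.SIGUSR1, on_usr1)
    child = subprocess.Popen(argv)
    # Popen.wait() is retried after the handler returns (PEP 475), so
    # the wrapper only returns once the child has really exited.
    return child.wait()
```

On POSIX, Popen.wait() reports a child killed by a signal as a negative
return code, so a child terminated this way yields -signal.SIGKILL.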

diff --git a/src/osmo_gsm_tester/process.py b/src/osmo_gsm_tester/process.py
index 7ecb67e..441d4ea 100644
--- a/src/osmo_gsm_tester/process.py
+++ b/src/osmo_gsm_tester/process.py
@@ -363,6 +363,11 @@
     # HACK: Since we run under sudo, only way to kill root-owned process is to kill as root...
     # This function is overwritten from Process.
     def send_signal(self, sig):
+        if sig == signal.SIGKILL:
+            # if we kill sudo, its children (bash running NETNS_EXEC_BIN +
+            # tcpdump under it) are kept alive. Let's instead tell the script to
+            # kill tcpdump:
+            sig = signal.SIGUSR1
         kill_cmd = ('kill', '-%d' % int(sig), str(self.process_obj.pid))
         run_local_netns_sync(self.run_dir, self.name()+"-kill"+str(sig), self.netns, kill_cmd)
 
diff --git a/utils/osmo-gsm-tester_netns_exec.sh b/utils/osmo-gsm-tester_netns_exec.sh
index 336b746..182ebff 100755
--- a/utils/osmo-gsm-tester_netns_exec.sh
+++ b/utils/osmo-gsm-tester_netns_exec.sh
@@ -1,5 +1,41 @@
 #!/bin/bash
 netns="$1"
 shift
+
+child_ps=0
+forward_kill() {
+	sig="$1"
+	echo "Caught signal SIG$sig!"
+	if [ "$child_ps" != "0" ]; then
+		echo "Killing $child_ps with SIG$sig!"
+		kill -SIG${sig} $child_ps
+	else
+		exit 0
+	fi
+}
+forward_kill_int() {
+	forward_kill "INT"
+}
+forward_kill_term() {
+	forward_kill "TERM"
+}
+forward_kill_usr1() {
+	# Special signal received from osmo-gsm-tester to tell child to SIGKILL
+	echo "Converting SIGUSR1->SIGKILL"
+	forward_kill "KILL"
+}
+# Don't use 'set -e', otherwise traps are not triggered!
+trap forward_kill_int INT
+trap forward_kill_term TERM
+trap forward_kill_usr1 USR1
+
 #TODO: Later on I may want to call myself with specific ENV and calling sudo in order to run inside the netns but with dropped privileges
-ip netns exec $netns "$@"
+ip netns exec $netns "$@" &
+child_ps=$!
+
+echo "$$: waiting for $child_ps"
+wait "$child_ps"
+child_exit_code="$?"
+echo "child exited with $child_exit_code"
+
+exit $child_exit_code

-- 
To view, visit https://gerrit.osmocom.org/13511
To unsubscribe, or for help writing mail filters, visit https://gerrit.osmocom.org/settings

Gerrit-Project: osmo-gsm-tester
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2bf389c47bbbd75f46af413e7ba897be5be386e1
Gerrit-Change-Number: 13511
Gerrit-PatchSet: 1
Gerrit-Owner: Pau Espin Pedrol <pespin at sysmocom.de>