It seems that http://jenkins.osmocom.org/jenkins/job/OpenBSC-gerrit/186/ is stuck:
Configuration OpenBSC-gerrit » --disable-mgcp-transcoding,--enable-smpp,linux_amd64_ubuntu_1504 is still in the queue: Ubuntu-1504-64 is offline
and http://jenkins.osmocom.org/jenkins/computer/Ubuntu-1504-64/log :
ERROR: [06/06/16 15:46:34] [SSH] Error deleting file.
java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:1279)
    at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:618)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[06/06/16 15:46:34] [SSH] Connection closed.
ERROR: [06/06/16 15:46:34] [SSH] Error deleting file.
java.lang.IllegalStateException: Cannot open session, you need to establish a connection first.
    at com.trilead.ssh2.Connection.openSession(Connection.java:1124)
    at com.trilead.ssh2.Connection.exec(Connection.java:1551)
    at hudson.plugins.sshslaves.SSHLauncher$3.run(SSHLauncher.java:1259)
[06/06/16 15:46:34] [SSH] Opening SSH connection to 127.0.6.1:2222.
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
One of the builds for this patch failed already.
It seems ssh is not retrying the connection: it failed ~12 hours ago and will sit there forever, so I removed the scheduled build.
Alas, the next build (for the same patch) gets stuck the same way, so the build slave seems to be offline for reals. How to fix?
~Neels
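For anyone wanting to check a node's state without reading the launcher log, here is a minimal sketch that reads the node's JSON description from the Jenkins remote API. It assumes the standard `/computer/<name>/api/json` endpoint is reachable on this server and that the response carries the usual `displayName`, `offline` and `offlineCauseReason` fields; the sample payload is hand-written for illustration and is not real output from jenkins.osmocom.org.

```python
import json
import urllib.parse
import urllib.request


def fetch_node_json(base_url: str, node: str) -> str:
    """Fetch the raw JSON description of a build node.

    base_url is assumed to be the Jenkins root, e.g.
    "http://jenkins.osmocom.org/jenkins" (taken from the URLs above).
    """
    url = f"{base_url.rstrip('/')}/computer/{urllib.parse.quote(node)}/api/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def describe_node(payload: str) -> str:
    """Turn a node's JSON description into a one-line status summary."""
    node = json.loads(payload)
    name = node.get("displayName", "<unknown>")
    if not node.get("offline", False):
        return f"{name}: online"
    # The reason may be missing or empty when the launcher just gave up.
    reason = node.get("offlineCauseReason") or "no reason given"
    return f"{name}: OFFLINE ({reason})"


# Hand-written sample payload (an assumption, not captured from the server).
sample = '{"displayName": "Ubuntu-1504-64", "offline": true, "offlineCauseReason": ""}'
print(describe_node(sample))
```

Against the live server one would call `describe_node(fetch_node_json("http://jenkins.osmocom.org/jenkins", "Ubuntu-1504-64"))`. This only reports the state; bringing the node back still needs someone with access to the build host.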
On Tue, Jun 07, 2016 at 02:19:27AM +0200, Neels Hofmeyr wrote:
> It seems that http://jenkins.osmocom.org/jenkins/job/OpenBSC-gerrit/186/ is stuck:
> [...]
On 07 Jun 2016, at 02:26, Neels Hofmeyr nhofmeyr@sysmocom.de wrote:
Hi!
> One of the builds for this patch failed already.
> It seems the ssh is not retrying to connect but just failed ~12 hours ago and will sit there forever; so I removed the scheduled build.
> Alas, the next build (for the same patch) gets stuck the same way, so the build slave seems to be offline for reals. How to fix?
Sorry about that. I fixed it and it builds again, but it will break again as well.
Long story:
At OsmoDevCon I upgraded Jenkins to a less vulnerable version. This required a JDK/JRE upgrade on our Debian 6.0/i386 (Linux syscall compat by FreeBSD) build system, and somehow this still failed. So in a rush I moved the builds to the Ubuntu-based AMD64 build system that had been used for the asciidoc generation.
Now to the bad stuff. The VM/jail is not reboot-safe: on boot, /usr/local and other directories are not in the path, _and_ the VirtualBox disk image is a plain file in a filesystem with a quota. It runs out of quota because once a day the zfs-snap tool runs and makes snapshots of all volumes (it can't exclude a specific one), which means that even removing files does not free up space, since the snapshots still reference the old blocks.
The Plan:
Sysmocom has agreed to move the builder from my own machine to a newly rented one. There I will use bhyve with a ZFS disk volume (a block device backed by ZFS), and the problem will be gone. The only issue is that I didn't have time for that over the last two weekends.
holger
PS: I will probably write a small script to undo some of the work zfs-snap does every day.
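A rough idea of what such a cleanup script could look like, as a sketch only. It assumes the disk image lives in its own dataset (the name `tank/vbox` below is made up), that every snapshot of that dataset is disposable, and it makes no assumption about how zfs-snap names its snapshots. It uses only `zfs list -H -t snapshot -o name` and `zfs destroy`, and defaults to a dry run.

```python
import subprocess
from typing import List


def snapshots_of(dataset: str, zfs_list_output: str) -> List[str]:
    """Pick the snapshots that belong to exactly this dataset.

    zfs_list_output is the text printed by
    `zfs list -H -t snapshot -o name` (one snapshot name per line).
    """
    prefix = dataset + "@"  # the "@" keeps tank/vbox from matching tank/vboxen
    names = (line.strip() for line in zfs_list_output.splitlines())
    return [name for name in names if name.startswith(prefix)]


def prune(dataset: str, dry_run: bool = True) -> List[List[str]]:
    """Destroy all snapshots of one dataset; only print them when dry_run is set."""
    listing = subprocess.run(
        ["zfs", "list", "-H", "-t", "snapshot", "-o", "name"],
        check=True, capture_output=True, text=True,
    ).stdout
    commands = [["zfs", "destroy", snap] for snap in snapshots_of(dataset, listing)]
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


# Example with a hand-written listing (hypothetical dataset and snapshot names).
listing = "tank/vbox@daily-1\ntank/vboxen@daily-1\ntank/home@daily-1\ntank/vbox@daily-2\n"
print(snapshots_of("tank/vbox", listing))
```

Run from cron shortly after the daily snapshot job, `prune("tank/vbox", dry_run=False)` would release the blocks that only those snapshots were holding on to. Blocks still referenced by the live filesystem stay allocated, of course.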