I looked into why the 420G root fs of Osmocombuild1 is full.
1)
Slave "build1-debian9-lxc" is running as lxc called 'docker'. /var/lib/lxc/docker/rootfs/var/lib/docker/vfs/dir contains 221G in 333 docker snapshots. It seems to me that these are growing indefinitely, that nothing is removing docker hashes that have been built. Does anyone have a discarding strategy ready that we can deploy there? We need to add one to avoid the build slave filling up again. https://osmocom.org/issues/2539
For now I ran
  docker rmi $(docker images -q -f dangling=true)
and removed a few stopped containers that had started an openbsc test:
  docker rm $(docker ps -aq)
which removed hundreds of hashes and freed ~110G. But about half of the 221G is still there, and I'm not sure whether we need those.
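One possible discarding strategy, as a minimal sketch only: run the manual cleanup above periodically (for example from cron inside the lxc), restricted to containers and images older than some retention time. The `DRY_RUN` and `MAX_AGE` variables and the one-week default are assumptions made up for this example, and the `prune` subcommands need Docker 1.13 or newer.

```shell
#!/bin/sh
# Sketch of a periodic docker cleanup; not an agreed-upon policy.
# DRY_RUN and MAX_AGE are hypothetical knobs invented for this example.
DRY_RUN="${DRY_RUN:-1}"      # default: only print what would be run
MAX_AGE="${MAX_AGE:-168h}"   # assumed retention time: one week

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

docker_cleanup() {
    # remove stopped containers created more than MAX_AGE ago
    run docker container prune -f --filter "until=$MAX_AGE"
    # remove images not used by any container and older than MAX_AGE
    run docker image prune -a -f --filter "until=$MAX_AGE"
}

docker_cleanup
```

With the default `DRY_RUN=1` the script only prints the two commands; setting `DRY_RUN=0` would actually delete data, so the retention time should be checked against what the jobs really need first.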
I also note that this lxc is not starting automatically after I rebooted Osmocombuild1 (to get rid of some stuck processes). I added lxc.start.auto = 1 to /var/lib/lxc/docker/config
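If the autostart setting should be applied by a script rather than by hand, a small idempotent helper could look like the sketch below. The function name `ensure_autostart` is hypothetical, and it deliberately does not handle a config that already sets `lxc.start.auto` to a different value.

```shell
# ensure_autostart FILE: append "lxc.start.auto = 1" to FILE unless a
# line with exactly that setting is already present.
# Hypothetical helper, named for this example only.
ensure_autostart() {
    cfg="$1"
    if grep -Eq '^[[:space:]]*lxc\.start\.auto[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$cfg"; then
        return 0    # already configured, nothing to do
    fi
    echo 'lxc.start.auto = 1' >> "$cfg"
}

# Usage (as root on the host):
#   ensure_autostart /var/lib/lxc/docker/config
```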
Now the server is back up and running, the lxc is started, but the jenkins slave refuses to connect. It tries port 45 but nothing is listening there. I think I've hit this before but can't remember how to solve it :/ https://osmocom.org/issues/2540
2)
Slave "Osmocombuild1" is running as user osmocom-build on the host's OS itself. The workspaces make up one large chunk of the disk usage: 94G.
Looking at OpenBSC@3, it builds up to 3.8G from the various build matrix workspaces, where each has a compiled libosmocore at >110MB and osmo-iuh at >130MB. On top of that, each of the parallel builds creates its own workspace, multiplying the total. The good news is that it should cap *somewhere* and not grow indefinitely.
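To see which workspaces contribute most, something along these lines can help. It is a sketch; the function name `largest_dirs` is made up for this example, and the workspace root has to be filled in for the actual jenkins home.

```shell
# largest_dirs ROOT [N]: print the N largest direct subdirectories of
# ROOT, biggest first, with sizes in KiB. Hypothetical helper.
largest_dirs() {
    root="$1"
    n="${2:-10}"
    # -s: one total per argument, -k: report sizes in KiB
    du -sk -- "$root"/*/ 2>/dev/null | sort -rn | head -n "$n"
}

# Usage, with a placeholder path:
#   largest_dirs /path/to/jenkins/workspace 20
```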
We should probably adjust the OpenBSC and various gerrit jobs to clean up the workspace after they are done. So far we have never needed the binaries. https://osmocom.org/issues/2538
~N
Hi Neels,
On Thu, Oct 05, 2017 at 07:25:48PM +0200, Neels Hofmeyr wrote:
> Slave "build1-debian9-lxc" is running as lxc called 'docker'. /var/lib/lxc/docker/rootfs/var/lib/docker/vfs/dir contains 221G in 333 docker snapshots. It seems to me that these are growing indefinitely, that nothing is removing docker hashes that have been built. Does anyone have a discarding strategy ready that we can deploy there? We need to add one to avoid the build slave filling up again. https://osmocom.org/issues/2539
I updated the ticket. Real fix is still pending.
> I also note that this lxc is not starting automatically after I rebooted Osmocombuild1 (to get rid of some stuck processes). I added lxc.start.auto = 1 to /var/lib/lxc/docker/config
> Now the server is back up and running, the lxc is started, but the jenkins slave refuses to connect. It tries port 45 but nothing is listening there. I think I've hit this before but can't remember how to solve it :/ https://osmocom.org/issues/2540
I updated that issue, and hopefully fixed it.