We recently hit a bunch of jenkins failures, due to a full disk.
Just now I removed 172G worth of docker images from build2-deb9build-ansible;
I thought we had the docker cleanup automated by now?
Even after that, build-2 still uses 244G of its root file system, which doesn't
seem right. Most of it is also in the deb9build-ansible lxc:
root@build-2 /var/lib/lxc/deb9build-ansible/rootfs # du -hs * | sort -h
[...]
2.2G opt
5.8G usr
8.1G tmp (what!?)
33G home
153G var
The tmp/ dir has many folders like
196M tmp.u3y02wgBNI
which are all from March to May this year. I will delete them now.
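Something like this would catch them automatically in the future (just a sketch; the 30-day cutoff is my arbitrary pick, and the path assumes the lxc rootfs as above):

cd /var/lib/lxc/deb9build-ansible/rootfs/tmp
# drop tmp.* dirs that haven't been touched in over 30 days
find . -maxdepth 1 -type d -name 'tmp.*' -mtime +30 -exec rm -rf {} +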
home:
root@build-2 /var/lib/lxc/deb9build-ansible/rootfs/home/osmocom-build # du -hs *
0 bin
19G jenkins
14G jenkins_build_artifact_store
1.2G osmo-ci
Interesting, I wasn't aware of us using the artifact store.
Seems to come from some manual builds between April and October.
Removing.
19G of jenkins workspaces seems ok.
But 1.2G for osmo-ci!?
That seems to be a manual build of the coverity job -- though the date is
pretty recent, so is our coverity job actually building in
~osmocom-build/osmo-ci instead of in a workspace?
Even after running the docker cleanup commands from the
osmocom.org servers wiki page:
docker rm $(docker ps -a -q)
docker rmi $(docker images -q -f dangling=true)
there are still 321 docker images around, most of which are months old.
Not sure why the above cleanups don't catch those.
I'm just going to indiscriminately blow all of them away now.
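(My guess is that the dangling=true filter only matches untagged images, so tagged-but-unused ones survive.) For the record, a more aggressive variant would be something like the following, assuming a docker recent enough to have the prune subcommands; the 30-day cutoff is just a suggestion:

# remove all stopped containers, then all images not used by any container
docker container prune -f
docker image prune -a -f
# or keep recent images and only drop ones older than ~30 days:
docker image prune -a -f --filter "until=720h"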
Maybe a good cleanup strategy would be to automatically wipe out the entire
build slave lxc every week or so and re-create it from scratch?
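Roughly something like this from a weekly cron job on the host (untested sketch; it assumes the container can be fully re-provisioned by our ansible setup, and the playbook name is a placeholder):

lxc-stop -n deb9build-ansible
lxc-destroy -n deb9build-ansible
lxc-create -n deb9build-ansible -t debian -- -r stretch
# re-provision the build slave (hypothetical playbook name)
ansible-playbook -i hosts jenkins-build-slave.yml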
After this, we have on build-2:
Filesystem Size Used Avail Use% Mounted on
/dev/md2 438G 83G 333G 20% /
------ host-2
Similar story in host-2's deb9build-ansible lxc: tons of docker images; I just
removed all of them.
But after that we still have
root@host2 ~ # df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md2 438G 311G 105G 75% /
On host-2, though, there are a lot of services running.
root@host2 / # du -hs * | sort -h
[...]
1.2G usr
59G var
75G home
176G external
[...]
2.7G gerrit
3.1G redmine-20170530-before-upgrade-to-3.4.tar
4.3G mailman
5.7G openmoko-wiki
7.8G gitolite
9.9G openmoko-people
29G redmine
112G jenkins
root@host2 /external/jenkins/home/jobs # du -hs * | sort -h
171M nplab-m3ua-test
198M master-osmo-pcu
241M ttcn3-sip-test
251M osmo-gsm-tester_build-osmo-bsc
262M ttcn3-ggsn-test
287M gerrit-osmo-ttcn3-hacks
297M master-osmo-bsc
322M master-libosmo-sccp
328M osmo-gsm-tester_build-osmo-sgsn
355M master-osmo-mgw
359M master-libosmo-netif
365M osmo-gsm-tester_build-osmo-iuh
390M gerrit-asn1c
392M gerrit-osmo-bsc
419M ttcn3-nitb-sysinfo
445M osmo-gsm-tester_build-osmo-msc
456M osmo-gsm-tester_manual-build-all
461M master-libosmocore
461M TEST_osmocomBB_with_libosmocore_dep
482M master-osmo-iuh
611M master-osmo-sgsn
704M gerrit-osmo-bts
748M master-osmo-msc
756M gerrit-osmo-msc
929M master-openbsc
1.1G master-osmo-bts
1.1G ttcn3-hlr-test
1.2G gerrit-libosmocore
1.2G ttcn3-mgw-test
1.9G osmo-gsm-tester-rnd_run
2.0G ttcn3-sgsn-test
3.0G ttcn3-msc-test
3.2G osmo-gsm-tester_run
3.5G master-asn1c
4.2G ttcn3-bsc-test-sccplite
4.7G osmo-gsm-tester_run-rnd
6.2G osmo-gsm-tester_gerrit
6.3G osmo-gsm-tester_run-prod
7.5G osmo-gsm-tester_ttcn3
8.5G ttcn3-bsc-test
43G ttcn3-bts-test
It seems we are caching 211 ttcn3-bts-test builds. That seems a tad much.
Indeed,
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bts-test/configure
has "[ ] Discard old builds" (unchecked).
Looking in osmo-ci, jobs/ttcn3-testsuites.yml has no 'build-discarder' set.
I guess we should add one? Any preferences for the discard policy? A month? A year?
(compare master-builds.yml)
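Something along these lines in the job definition should do it (a sketch; the numbers are a straw man, and the surrounding job/defaults structure is whatever ttcn3-testsuites.yml already uses):

    properties:
      - build-discarder:
          days-to-keep: 30
          num-to-keep: 120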
----- admin-2
It seems I cannot log in there, or at least I don't know the IP address...
ssh: Could not resolve hostname admin2.osmocom.org: Name or service not known
So I guess I can't check there.
~N