Just now I only got long waits and then "502 Bad Gateway" on git.osmocom.org ... the cgit ezjail hit its root partition limit. I had to re-figure out what I did last time, tried to find ezjail-admin options until finally I ended up back in zfs and started remembering... I increased the zfs quota for the cgit jail from 10G to 15G. Now it's working again.
For the record, resizing an ezjail goes like

  # zfs list
  # zfs get quota tank/jails/cgit
  # zfs set quota=15G tank/jails/cgit

(so next time I can find it in my mail archive, and I guess it should go in a wiki somewhere. On osmocom.org?)
Last time it was the jenkins jail hitting disc limits. Then I said "We should probably set a refquota so that the live file system's quota is somewhat independent from the quota used for snapshots." I see the jenkins jail now has a 40G refquota, but the others seem not to. How does it work: set a huge quota to allow space for snapshots, and limit the fs itself with a refquota?
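For the record, a sketch of the mechanism in question, using the dataset name from this thread (the sizes here are illustrative, not what is actually configured):

```shell
# 'quota' caps the dataset including all its snapshots; 'refquota' caps
# only the space the live filesystem itself references. So the pair:
zfs set refquota=15G tank/jails/cgit  # live fs may grow to 15G
zfs set quota=40G tank/jails/cgit     # fs + snapshots together capped at 40G

# Inspect both limits:
zfs get quota,refquota tank/jails/cgit
```

These are admin commands against a live pool, so treat them as a fragment to adapt, not a script to run as-is.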
~N
On 2. Sep 2017, at 03:14, Neels Hofmeyr nhofmeyr@sysmocom.de wrote:
Hi!
Just now I only got long waits and then "502 Bad Gateway" on git.osmocom.org ... the cgit ezjail hit its root partition limit. I had to re-figure out what I did last time, tried to find ezjail-admin options until finally I ended up back in zfs and started remembering... I increased the zfs quota for the cgit jail from 10G to 15G. Now it's working again.
What takes 10G? The actual git repos are mounted RO and should not consume any space. The only thing that should take space is the log and the cgit cache.
Last time it was the jenkins jail hitting disc limits. Then I said "We should probably set a refquota so that the live file system's quota is somewhat independent from the quota used for snapshots." I see the jenkins jail now has a 40G refquota, but the others seem not to. How does it work: set a huge quota to allow space for snapshots, and limit the fs itself with a refquota?
quota -> refquota is a good thing. Let's do it. I think we ran into this not because the actual jail size was > 10GB but the cache was re-written over the last weeks.
holger
Hi Neels,
thanks for looking into this.
On Sat, Sep 02, 2017 at 03:14:33AM +0200, Neels Hofmeyr wrote:
Just now I only got long waits and then "502 Bad Gateway" on git.osmocom.org ... the cgit ezjail hit its root partition limit. I had to re-figure out what I did last time, tried to find ezjail-admin options until finally I ended up back in zfs and started remembering... I increased the zfs quota for the cgit jail from 10G to 15G. Now it's working again.
the question is why does it need that much quota? Right now I can see that it uses only 2.1GB on disk:
  root@cgit:/ # df -h
  Filesystem        Size    Used   Avail Capacity  Mounted on
  tank/jails/cgit   8.4G    2.1G    6.3G    25%    /
Did you (or somebody else) clear the cgit cache (it's currently 1.8GB)? The cache is limited in the number of files (10k, via the cgitrc 'cache-size' attribute), not in the total size of those files.
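For reference, the relevant cgitrc knobs look roughly like this (a sketch; the path and values are assumptions, and note that 'cache-size' counts entries, not bytes):

```ini
# cgitrc cache settings (illustrative values)
cache-root=/var/cache/cgit
cache-size=10000        # max number of cache entries; eviction is by count
cache-repo-ttl=5        # minutes a repo summary page stays cached
cache-root-ttl=5        # minutes the index page stays cached
```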
What has grown beyond that existing 10GB limit?
For the record, resizing an ezjail goes like

  # zfs list
  # zfs get quota tank/jails/cgit
  # zfs set quota=15G tank/jails/cgit

(so next time I can find it in my mail archive, and I guess it should go in a wiki somewhere. On osmocom.org?)
sure, that's what we have the http://osmocom.org/projects/osmocom-servers redmine for, I guess.
Last time it was the jenkins jail hitting disc limits. Then I said "We should probably set a refquota so that the live file system's quota is somewhat independent from the quota used for snapshots."
when are snapshots used?
On Sat, Sep 02, 2017 at 10:03:24AM +0200, Harald Welte wrote:
the question is why does it need that much quota? Right now I can see that it only uses only 2.1GB on disk:
  root@cgit:/ # df -h
  Filesystem        Size    Used   Avail Capacity  Mounted on
  tank/jails/cgit   8.4G    2.1G    6.3G    25%    /
What has grown beyond that existing 10GB limit?
When this happened, the root fs was on 4.1G, with a 'quota' of 10G, i.e. probably ~6G taken up by snapshots. Something must have shrunk since then.
I've now changed to refquota=5G (so the root fs can be up to 5G, chose 5 since I saw it hit 4G) and removed the quota (i.e. the snapshots will not cause us to limit disk space on the jail).
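In zfs terms, that change presumably amounts to something like (dataset name from this thread):

```shell
zfs set refquota=5G tank/jails/cgit   # cap the live filesystem at 5G
zfs set quota=none tank/jails/cgit    # lift the overall (fs + snapshots) cap
```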
Did you (or somebody else) clear the cgit cache (it's currently 1.8GB)?
Maybe someone cleared it and that's why we have more space now? Wasn't me. Looking at /var/cache/cgit, it doesn't look like it was cleared.
I now see
  NAME             USED   AVAIL  REFER  MOUNTPOINT
  tank/jails/cgit  8.87G  3.25G  1.75G  /usr/jails/cgit
i.e. 1.75G taken up by root fs, ~7 more G taken up by snapshots.
It seems a max of 5G is way beyond the current 1.75G, but I can't tell what grew the root fs to 4G ... I'm happy to accept any other value.
Maybe it was the cache that grew this large, and maybe the recent cgit rendering failures were due to hitting disc space limits? ... no, I just cleared the cgit cache and "my" file still renders empty.
(cleared by

  cd /var/cache
  mkdir not_cgit
  mv cgit/* not_cgit/

and then testing whether it still works, and it seems to work. So I'm now doing 'rm -rf not_cgit'. For the record:

  root@cgit:/var/cache # du -hs not_cgit/
  1.5G    not_cgit/
  root@cgit:/var/cache # df -h
  Filesystem        Size    Used   Avail Capacity  Mounted on
  tank/jails/cgit   5.0G    1.9G    3.1G    38%    /
  root@cgit:/var/cache # rm -rf not_cgit/
  root@cgit:/var/cache # df -h
  Filesystem        Size    Used   Avail Capacity  Mounted on
  tank/jails/cgit   5.0G    368M    4.6G     7%    /

Seems like we can shrink refquota considerably, and what grew to 4G remains a mystery.)
when are snapshots used?
I'm not entirely sure. Holger?
For the record, resizing an ezjail goes like
[...]
sure, that's what we have the http://osmocom.org/projects/osmocom-servers redmine for, I guess.
Added an initial https://osmocom.org/projects/osmocom-servers/wiki/Osmocomorg_Web_Servers
~N
On 3. Sep 2017, at 02:08, Neels Hofmeyr nhofmeyr@sysmocom.de wrote:
when are snapshots used?
I'm not entirely sure. Holger?
They are automatically created by "zfsnap" and are a safety net for us to allow a quick rollback or find older files.
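For the record, a hedged sketch of working with those snapshots (the snapshot name below follows zfsnap's timestamp--TTL naming convention but is illustrative):

```shell
# List the automatic snapshots for the jail's dataset:
zfs list -t snapshot -r tank/jails/cgit

# Retrieve an older version of a single file via the hidden .zfs dir:
cp /usr/jails/cgit/.zfs/snapshot/2017-09-01_00.00.00--1m/etc/rc.conf /tmp/

# Or roll the whole live filesystem back to that snapshot:
zfs rollback tank/jails/cgit@2017-09-01_00.00.00--1m
```

Again, admin commands against a live pool; adapt the names before using.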
holger
Today the cgit jail quota was exceeded again and we saw 504 HTTP errors when accessing http://git.osmocom.org/
The reason was indeed that the cache grew to fill the entire 5GB quota.
I checked what kind of files were occupying that much cache: unfortunately, more than 150 files, each larger than 10 MBytes, resulting from cgit caching the .tar.gz snapshots it creates.
Unfortunately the cgit cache can only be restricted in number of files, not in terms of total size or "don't cache files larger than X".
As a workaround I disabled snapshot generation for now. I presume there was some crawler that generated snapshots for lots of commits.
Regards, Harald
On Sun, Sep 03, 2017 at 08:41:39PM +0200, Harald Welte wrote:
I checked what kind of files were occupying that much cache: unfortunately, more than 150 files, each larger than 10 MBytes, resulting from cgit caching the .tar.gz snapshots it creates.
As a workaround I disabled snapshot generation for now. I presume there was some crawler that generated snapshots for lots of commits.
Just to clarify ... those tar.gz snapshots are obviously not related to the zfs snapshots we were talking about before, but some cgit specific mechanism.
~N
On 3. Sep 2017, at 20:41, Harald Welte laforge@gnumonks.org wrote:
Hi,
Today the cgit jail quota was exceeded again and we saw 504 HTTP errors when accessing http://git.osmocom.org/
The reason was indeed that the cache grew to fill the entire 5GB quota.
I checked what kind of files were occupying that much cache: unfortunately, more than 150 files, each larger than 10 MBytes, resulting from cgit caching the .tar.gz snapshots it creates.
Unfortunately the cgit cache can only be restricted in number of files, not in terms of total size or "don't cache files larger than X".
As a workaround I disabled snapshot generation for now. I presume there was some crawler that generated snapshots for lots of commits.
we had enabled snapshots as some of "our" Osmocom developers wanted the feature. And some people continue to clone the website (as if cloning a git repository couldn't be done more easily).
On my flight today I came up with a solution, but will only implement it in the next days (unless someone else is doing it).
* disable caching in cgit
* enable caching inside the cgit nginx for the majority of URLs. Luckily with git the rendering of a specific commit will not change...
* have crawler-specific robots.txt to disable the SEO ones..
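A sketch of what the nginx side of that could look like (listen port, backend port, cache path and sizes are assumptions, not the deployed config):

```nginx
# Cache rendered cgit pages in nginx instead of in cgit itself:
proxy_cache_path /var/cache/nginx/cgit levels=1:2 keys_zone=cgit_cache:10m
                 max_size=2g inactive=7d;

server {
    listen 80;
    server_name git.osmocom.org;

    location / {
        proxy_pass http://127.0.0.1:8080;   # assumed cgit backend
        proxy_cache cgit_cache;
        proxy_cache_valid 200 60m;          # rendered commit pages rarely change
    }
}
```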
what do you think?
holger
Hi Holger,
On Mon, Sep 04, 2017 at 10:02:43PM +0200, Holger Freyther wrote:
we had enabled snapshots as some of "our" Osmocom developers wanted the feature. And some people continue to clone the website (as if cloning a git repository couldn't be done easier).
thanks for pointing this out. I'll make a comment in the config file about this.
On my todays flight I came up with a solution but will only implement it the next days (unless someone else is doing it).
- disable caching in cgit
- enable caching inside the cgit nginx for the majority of URLs. Luckily
with git the rendering of a specific commit will not change...
it changes: there's a small grey timestamp at the bottom of each html page (unless the page is raw). I had to figure this out when finding URLs that I could use to reliably invalidate the Docker cache once a given branch of a repo changes. See e.g. line 20 of http://git.osmocom.org/docker-playground/tree/osmo-ggsn-master/Dockerfile#n2...
So yes, most users probably won't care if the timestamp at the bottom of the html pages is wrong.
Please make sure, though, that URLs referring to specific branch HEADs are not cached, such as http://git.osmocom.org/openggsn/patch/?h=laforge/osmo-ggsn as those are used from the Dockerfiles to detect whether the HEAD of the given branch has changed or not.
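If nginx caching is used, those branch-HEAD URLs could be excluded with something like this (a sketch, not the deployed config; the backend port is an assumption):

```nginx
# /patch/?h=<branch> follows the moving branch HEAD, so never cache it:
location ~ ^/[^/]+/patch/ {
    proxy_pass http://127.0.0.1:8080;   # assumed cgit backend
    proxy_cache off;
}
```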
what do you think?
First: Thanks for looking into this! I think I would still prefer a patch to cgit. Limiting either the size of individual cached objects, or having a soft limit on the total size of the cache, should be generally useful features for upstream, not just for us. But yes, there doesn't seem to be a trivial way to do it, given how they implement the cache. Maybe the maintainers have an idea about this?
But then, up to you!