On 3. Sep 2017, at 20:41, Harald Welte <laforge(a)gnumonks.org> wrote:
Hi,
Today the cgit jail quota was exceeded again and we saw 504 HTTP errors
when accessing http://git.osmocom.org/
The reason was indeed that the cache grew to fill the entire 5GB quota.
I checked what kind of files were occupying that much cache: unfortunately
there were more than 150 files, each larger than 10 MBytes, as a result of
cgit caching the .tar.gz snapshots it creates.
Unfortunately the cgit cache can only be restricted in number of files, not
in terms of total size or "don't cache files larger than X".
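A check like the one described above could be scripted roughly as follows. This is only a minimal sketch: the function name large_cache_files is made up for illustration, and the cache directory has to be passed in (it is whatever cache-root points to in the local cgitrc; the path in the usage comment is an assumption, not taken from the osmocom setup).

```python
import os

TEN_MB = 10 * 1024 * 1024  # threshold mentioned in the mail


def large_cache_files(cache_dir, min_bytes=TEN_MB):
    """Walk cache_dir and return (total_bytes, large_files).

    large_files is a list of (size, path) tuples for every file
    strictly larger than min_bytes, biggest first.
    """
    total = 0
    large = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The cache is live; a file may vanish while we scan.
                continue
            total += size
            if size > min_bytes:
                large.append((size, path))
    large.sort(reverse=True)
    return total, large


# Usage (hypothetical path, adjust to the local cache-root):
#   total, large = large_cache_files("/var/cache/cgit")
#   print(total, len(large))
```

Since cgit itself cannot enforce a size limit, something like this could at most be run periodically to report (or prune) oversized entries.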
As a workaround I disabled snapshot generation for now. I presume there
was some crawler that generated snapshots for lots of commits.
We had enabled snapshots because some of "our" Osmocom developers wanted
the feature. And some people continue to clone the website (as if cloning
a git repository weren't easier).
On today's flight I came up with a solution, but I will only implement it
in the next few days (unless someone else does it first).
* disable caching in cgit
* enable caching inside the cgit nginx for the majority of URLs. Luckily
with git the rendering of a specific commit will not change...
* have a crawler-specific robots.txt to disable the SEO ones.
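To illustrate the second point, here is a rough sketch of how one might decide which URLs are safe to cache for a long time. It assumes that cgit pages pinned to a commit carry the full 40-character hash in an id= query parameter and that snapshot downloads have /snapshot/ in the path; the function name is_immutable is made up, and the real rule would live in the nginx configuration rather than in Python.

```python
import re
from urllib.parse import urlsplit, parse_qs

# A full SHA-1 commit id as it would appear in "?id=<hash>" (assumption).
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def is_immutable(url):
    """Return True if the URL is pinned to one full commit id.

    Such a page should render the same way every time, so it is a
    candidate for long-lived caching. Snapshots are always excluded,
    since they are what filled the cache in the first place.
    """
    parts = urlsplit(url)
    if "/snapshot/" in parts.path:
        return False
    ids = parse_qs(parts.query).get("id", [])
    return len(ids) == 1 and FULL_SHA1.fullmatch(ids[0]) is not None
```

Anything this check rejects (branch heads, logs, summaries, abbreviated ids) would get a short cache lifetime or none at all.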
what do you think?
holger
--
Holger Freyther <hfreyther(a)sysmocom.de>
http://www.sysmocom.de/
=======================================================================
* sysmocom - systems for mobile communications GmbH
* Alt-Moabit 93
* 10559 Berlin, Germany
* Sitz / Registered office: Berlin, HRB 134158 B
* Geschaeftsfuehrer / Managing Directors: Harald Welte