From mboxrd@z Thu Jan 1 00:00:00 1970
From: john at keeping.me.uk (John Keeping)
Date: Mon, 19 Jan 2015 19:50:12 +0000
Subject: Problem with cgit cache
In-Reply-To: <54BD582C.4010609@eclipse.org>
References: <54BD582C.4010609@eclipse.org>
Message-ID: <20150119195012.GA8026@serenity.lan>

On Mon, Jan 19, 2015 at 02:17:00PM -0500, Eclipse Webmaster (Denis Roy) wrote:
> We use cgit for about 800 Git repos. Lately we've noticed that the links
> in the cache become polluted. We've noticed hits like this in the logs,
> which come from Search Bots, which seem to match the garbage in the
> cache links:
>
> GET /c/set%7Cset%26set/org....
>
> GET /c/%0aset%7cset%26set%0a/org....
>
> (we serve cgit from /c/)
>
> If I clear the cache entries, all is well until these bots come along
> and pollute it again. If I set cache-size=0 everything works well,
> albeit much slower.
>
> Is this a known bug in cgit? For now I've added some Apache
> RewriteRules so that these hits don't reach cgit, but it would be nice
> if cgit could deal with these.
>
> You can read more on our bug tracker, here:
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=453438

Although you seem to have ruled it out, I think storing the cache on NFS
is likely to be problematic. A quick search found some documentation
[1], [2] on problems with sendfile(2) and NFS.

You could try editing cgit.mk to comment out the HAVE_LINUX_SENDFILE
define, but I would recommend avoiding NFS for the cache if possible.

I have tried a quick test and wasn't able to reproduce your error, but I
will try to find some time to investigate further and see if there is a
problem with certain requests.

[1] http://www.proftpd.org/docs/howto/Sendfile.html
[2] http://httpd.apache.org/docs/2.2/misc/perf-tuning.html
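
To see what the bot requests quoted above actually contain, the percent-encoded path prefixes can be decoded. This is only a minimal sketch using Python's standard urllib.parse; the decode_path helper is a hypothetical name for illustration and is not part of cgit, and the truncated repository part of each path is left out.

```python
from urllib.parse import unquote


def decode_path(path: str) -> str:
    """Percent-decode a request path (hypothetical helper, not part of cgit)."""
    return unquote(path)


# Path prefixes copied from the quoted log lines; the truncated
# repository part ("org....") is deliberately omitted.
samples = [
    "/c/set%7Cset%26set/",
    "/c/%0aset%7cset%26set%0a/",
]

for raw in samples:
    # repr() keeps embedded newlines visible instead of printing them
    print(raw, "->", repr(decode_path(raw)))
```

Decoded, %7C is "|", %26 is "&" and %0a is a newline, so these paths contain shell metacharacters rather than anything resembling a repository name.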