From mboxrd@z Thu Jan 1 00:00:00 1970
From: esajine at interactivebrokers.com (Eugene Sajine)
Date: Wed, 19 Mar 2014 10:30:57 -0400
Subject: Cgit cache and slownesses
In-Reply-To:
References: <520D356E.1060106@interactivebrokers.com>
 <5213DEAE.2030605@interactivebrokers.com>
Message-ID: <5329AA21.9040503@interactivebrokers.com>

On 8/23/2013 8:17 AM, Jason A. Donenfeld wrote:
> The list is now opened up. Would you pass this message along to it?
> Sorry for the delay in this. Back from vacation now, getting things
> organized finally.
>

Hi!

Unfortunately, my message didn't seem to get any traction, so I have to
repeat it:

http://lists.zx2c4.com/pipermail/cgit/2013-August/001526.html

I have 1600+ git repos that cgit works with, and this number is growing
steadily. Cgit is slow at times, and I'm trying to find ways to improve
its performance.

1. I have set the cache size to 2000 repos. Sometimes it takes a minute
or more to load a page while drilling down into a tree or to snapshots.
I'm wondering whether I should increase the ttl for the repos (I'm
currently using the default values). Is there anything else that could
make cgit faster? Increasing the ttl is probably one way, but then it
would show outdated info...

With that in mind, I thought about the following solution (by analogy
with what we do for Jenkins):

We serve the repositories via the git protocol. Since some 1.8.*
version, git daemon supports --access-hook. Our access-hook script gets
the name of the git service being used, and if it is receive-pack it
calls a Jenkins URL with the URL of the affected repository (using
curl) to schedule a Jenkins poll immediately. That poll picks up the
most recent changes and builds the code.

Now, it would probably improve performance a lot if, instead of
expiring the cgit cache every 5 minutes, we could expire it upon a
push. But when I checked how many pushes per day we have, the average
was about 200, which means that with this approach we would sometimes
rescan even more often than with the hardcoded value.

So maybe the solution would be either:

a) rescan only the repo that was touched (that's preferred, of course)
and combine that with a full rescan every ttl interval, which in this
case could be something like daily, or

b) introduce logic that rescans only upon a push, but not more often
than once per ttl period, i.e. the ttl would be the minimum interval
between cache expirations (if last_push_time - last_scan_time > ttl,
then scan; otherwise wait for the ttl to expire).

So eventually I'm looking for cgit to have a URL like this:

http://server/cgi-bin/cgit.cgi/myrepo?do_scan

that would update this particular repo's info in the cgit cache.

2. It seems that the caching mechanism doesn't work properly for me: I
have changed cache-root, and I don't see any files being saved in the
new location - the folder is always empty.

Thanks,
Eugene
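
P.S. In case it helps to see the hook part concretely, here is a rough
sketch of the kind of access-hook wrapper I described above. The
Jenkins notifyCommit endpoint, host names and repository layout are
placeholders rather than our actual setup, and I'm using Python's
urllib here instead of curl just so the example is self-contained:

    #!/usr/bin/env python3
    # Rough sketch only -- host names, paths and the Jenkins endpoint
    # below are placeholders, not our real configuration.
    import sys
    import urllib.parse
    import urllib.request

    JENKINS_NOTIFY = 'http://jenkins.example.com/git/notifyCommit?url='
    GIT_URL_PREFIX = 'git://git.example.com/'

    def main():
        # git daemon --access-hook calls the script with the service
        # name as the first argument and the repo path as the second.
        service, repo_path = sys.argv[1], sys.argv[2]
        if service == 'receive-pack':
            repo_url = GIT_URL_PREFIX + repo_path.strip('/').split('/')[-1]
            try:
                urllib.request.urlopen(
                    JENKINS_NOTIFY + urllib.parse.quote(repo_url, safe=''),
                    timeout=5).read()
            except OSError:
                pass  # never block a push just because Jenkins is down
        return 0  # non-zero would make git daemon refuse the connection

    if __name__ == '__main__':
        sys.exit(main())

git daemon is started with --access-hook pointing at this script; the
important part is that it always exits 0 so pushes are never blocked.
The same kind of hook could just as easily hit the do_scan URL I'm
asking about instead of (or in addition to) Jenkins.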
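
And the check from option b), spelled out (last_push_time and
last_scan_time are just whatever timestamps cgit would keep per repo;
the names are only for illustration):

    def should_rescan(last_push_time, last_scan_time, ttl):
        # Rescan on a push only if the last scan is already more than
        # one ttl old; otherwise let normal ttl expiration handle it,
        # so ttl becomes the minimum interval between rescans.
        return last_push_time - last_scan_time > ttl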