From mboxrd@z Thu Jan 1 00:00:00 1970
From: esajine at interactivebrokers.com (Eugene Sajine)
Date: Wed, 19 Mar 2014 10:30:57 -0400
Subject: Cgit cache and slownesses
In-Reply-To:
References: <520D356E.1060106@interactivebrokers.com>
 <5213DEAE.2030605@interactivebrokers.com>
Message-ID: <5329AA21.9040503@interactivebrokers.com>

On 8/23/2013 8:17 AM, Jason A. Donenfeld wrote:
> The list is now opened up. Would you pass this message along to it?
> Sorry for the delay in this. Back from vacation now, getting things
> organized finally.
>

Hi!

Unfortunately, my message didn't seem to get any traction, so I have to
repeat it:

http://lists.zx2c4.com/pipermail/cgit/2013-August/001526.html

I have 1600+ git repos that cgit works with, and this number is growing
steadily. Cgit is slow at times, and I'm trying to find ways to improve
its performance.

1. I have set the cache size to 2000 repos. Sometimes it takes a minute
or more to load a page while drilling down into a tree or to snapshots.
I'm wondering whether I should increase the ttl for the repos (I'm
currently using the default values). Is there anything else that could
make cgit faster? Increasing the ttl is probably one way, but then it
would show outdated info...

With that in mind, I thought about the following solution (by analogy
with what we do for Jenkins):

We serve the repositories via the git protocol. Since some 1.8.*
version, git daemon supports --access-hook. Our access-hook script gets
the name of the git service being used, and if it is receive-pack it
calls a Jenkins URL with the URL of the affected repository (using
curl) to schedule a Jenkins poll immediately. That poll picks up the
most recent changes and builds the code.

Now, it would probably improve performance a lot if, instead of
expiring the cgit cache every 5 minutes, we could expire it upon a
push. But when I checked how many pushes per day we have, the average
was about 200, which means that with this approach we would sometimes
rescan even more often than with the hardcoded value.

So maybe the solution would be either:

a) rescan only the repo that was touched (that's preferred, of course)
and combine that with a full rescan every ttl interval, which in this
case could be something like daily, or

b) introduce logic that rescans only upon a push, but not more often
than once per ttl period, i.e. the ttl would be the minimum interval
between cache expirations (if last_push_time - last_scan_time > ttl,
then scan; otherwise wait for the ttl to expire).

So eventually I'm looking for cgit to have a URL like this:

http://server/cgi-bin/cgit.cgi/myrepo?do_scan

that would update this particular repo's info in the cgit cache.

2. It seems that the caching mechanism doesn't work properly for me: I
have changed cache-root, and I don't see any files being saved in the
new location - the folder is always empty.

Thanks,
Eugene
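
P.S. In case it helps to see the hook part concretely, here is a rough
sketch of the kind of access-hook wrapper I described above. The
Jenkins notifyCommit endpoint, host names and repository layout are
placeholders rather than our actual setup, and I'm using Python's
urllib here instead of curl just so the example is self-contained:

    #!/usr/bin/env python3
    # Rough sketch only -- host names, paths and the Jenkins endpoint
    # below are placeholders, not our real configuration.
    import sys
    import urllib.parse
    import urllib.request

    JENKINS_NOTIFY = 'http://jenkins.example.com/git/notifyCommit?url='
    GIT_URL_PREFIX = 'git://git.example.com/'

    def main():
        # git daemon --access-hook calls the script with the service
        # name as the first argument and the repo path as the second.
        service, repo_path = sys.argv[1], sys.argv[2]
        if service == 'receive-pack':
            repo_url = GIT_URL_PREFIX + repo_path.strip('/').split('/')[-1]
            try:
                urllib.request.urlopen(
                    JENKINS_NOTIFY + urllib.parse.quote(repo_url, safe=''),
                    timeout=5).read()
            except OSError:
                pass  # never block a push just because Jenkins is down
        return 0  # non-zero would make git daemon refuse the connection

    if __name__ == '__main__':
        sys.exit(main())

git daemon is started with --access-hook pointing at this script; the
important part is that it always exits 0 so pushes are never blocked.
The same kind of hook could just as easily hit the do_scan URL I'm
asking about instead of (or in addition to) Jenkins.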
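
And the check from option b), spelled out (last_push_time and
last_scan_time are just whatever timestamps cgit would keep per repo;
the names are only for illustration):

    def should_rescan(last_push_time, last_scan_time, ttl):
        # Rescan on a push only if the last scan is already more than
        # one ttl old; otherwise let normal ttl expiration handle it,
        # so ttl becomes the minimum interval between rescans.
        return last_push_time - last_scan_time > ttl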