List for cgit developers and users
* Performance issue
From:  @ 2016-07-08 12:45 UTC (permalink / raw)


Hi,
I use cgit as the web UI for the dist-git of Copr [1].  There are 136000 git repositories (and growing).
My problem is that no matter how aggressive the caching configured in /etc/cgitrc, it takes an enormous amount of time
to generate the initial /var/cache/cgit/rc-* file that holds the "repo.*" configuration entries. And by enormous I
mean 30 minutes.

I came up with one solution: set the TTL to 2 hours and regenerate the cgitrc from cron every hour. This way the cgitrc
will never be generated by a user's httpd request.

I can generate that cgitrc from a cron job by running:

QUERY_STRING="url=frostyx/new7/rubygem-active_null.git/commit/&id=b3ceddf17119bc4c9b249fe1b63659039e282c99" \
CGIT_CONFIG="/etc/cgitrc" /var/www/cgi-bin/cgit >/tmp/x.html

The problem is that even with --nocache it does not refresh an existing /var/cache/cgit/rc-* file. The only way to
refresh the cgitrc file is to wait until it becomes older than the TTL, or to delete it. But until it is regenerated,
users who access my server will take it down by filling all Apache slots with running cgit processes (each of which
traverses all the git repositories).

I am thinking about implementing a new option, e.g. --update-scan-path, which would force cgit to scan 'scan-path',
write the included cgitrc to a temporary file, and at the end remove the original /var/cache/cgit/rc-* and rename the
newly created cgitrc to that rc-* file. So it would be a nearly atomic operation.
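
To illustrate the write-to-tempfile-then-rename idea outside of cgit itself, here is a minimal shell sketch. The helper
name atomic_replace is hypothetical (it is not a cgit option or tool), and the chmod assumes the web server user only
needs read access to the result.

```shell
#!/usr/bin/env bash
# Sketch only: atomic_replace is a hypothetical helper, not part of cgit.
# Usage: atomic_replace TARGET COMMAND [ARGS...]
# Runs COMMAND, captures its stdout in a temp file next to TARGET, and
# renames the temp file over TARGET only if COMMAND succeeded.
atomic_replace() {
    local target=$1; shift
    local tmp
    # Create the temp file in the same directory as the target so that
    # mv is a rename on one filesystem rather than a copy.
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    # mktemp creates the file with mode 0600; loosen it so another user
    # (assumed here: the web server) can read the final file.
    chmod 644 "$tmp"
    if "$@" >"$tmp"; then
        mv -f "$tmp" "$target"
    else
        # Generator failed: discard the partial output, keep the old file.
        rm -f "$tmp"
        return 1
    fi
}
```

For example, `atomic_replace /tmp/demo.rc printf 'repo.url=demo.git\n'` leaves /tmp/demo.rc either untouched or fully
written, never half-generated.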

If you agree, I can prepare a patch next week.


[1] https://copr.fedorainfracloud.org/

-- 
Miroslav Suchy, RHCA
Red Hat, Senior Software Engineer, #brno, #devexp, #fedora-buildsys



* Performance issue
From: john @ 2016-07-08 13:23 UTC (permalink / raw)


On Fri, Jul 08, 2016 at 02:45:49PM +0200, Miroslav Suchý wrote:
> I use cgit as WebUI for dist-git of Copr [1].  There are 136000 git
> repositories (and growing).  My problem is that no matter how
> aggressive caching in /etc/cgitrc is used, it takes enormous time to
> generate initial /var/cache/cgit/rc-* file where are those "repo.*"
> configurations. And by enormous I mean 30 minutes.
> 
> I came up  with one solution. Set TTL to 2 hours and regenerate the
> cgitrc from cron every hour. This way the cgitrc will never be
> generated by user coming from httpd request.
> 
> I can generate that cgitrc in cron job manually by running:
> 
> QUERY_STRING="url=frostyx/new7/rubygem-active_null.git/commit/&id=b3ceddf17119bc4c9b249fe1b63659039e282c99"
> CGIT_CONFIG="/etc/cgitrc" /var/www/cgi-bin/cgit >/tmp/x.html
> 
> The problem is that even with --nocache it does not refresh existing
> /var/cache/cgit/rc-* file. The only way to refresh the cgitrc file is
> to wait till it become older than TTL or delete it. But until it is
> regenerated the users who access my server, will take it down by
> filling all apache slots with running cgit (which will traverse all
> git repositories).
> 
> I am thinking about implementing new option. E.g. --update-scan-path,
> which will force cgit to scan 'scan-path', create the include cgitrc
> file in tempfile and at the and it will remove original
> /var/cache/cgit/rc-* and rename the newly created cgitrc to that rc-*
> file. So it will be nearly atomic operation.

Can't you already do this by removing scan-path from your config and
instead adding something like:

	include /path/to/my/repo-list

and you can generate the repo-list file with:

	cgit --scan-path=/path/to/repositories >repo-list

It's not quite the same because the rest of the configuration won't have
been loaded, but I think we'd rather improve this mechanism than add
manual cache management.



* Performance issue
From:  @ 2016-07-11 13:45 UTC (permalink / raw)


On 8.7.2016 at 15:23, John Keeping wrote:
> Can't you already do this by removing scan-path from your config and
> instead adding something like:
> 
> 	include /path/to/my/repo-list

For the record it is
  include=/path/to/my/repo-list
(it took me some time to figure it out)

> and you can generate the repo-list file with:
> 
> 	cgit --scan-path=/path/to/repositories >repo-list

Yes!! This is doing exactly what I wanted.

> It's not quite the same because the rest of the configuration won't have
> been loaded but I think we'd rather improve this mechanism that add
> manual cache mangement.

For others:

I patched /etc/cgitrc with

--- /etc/cgitrc 2016-07-11 12:50:16.761192000 +0000
+++ /etc/cgitrc.slow    2016-07-11 12:50:18.088191000 +0000
@@ -6,7 +6,7 @@
 cache-dynamic-ttl=120
 cache-repo-ttl=120
 cache-root-ttl=120
-cache-scanrc-ttl=120
+cache-scanrc-ttl=10
 cache-about-ttl=120
 cache-snapshot-ttl=120
 cache-size=100000
@@ -81,4 +81,4 @@
 #repo.readme=info/web/about.html
 project-list=/var/lib/copr-dist-git/cgit_pkg_list
 #scan-path=/var/lib/dist-git/git/rpms
-include=/var/cache/cgit/repo-list.rc
+scan-path=/var/lib/dist-git/git/rpms

And created this hourly cronjob:

#!/usr/bin/bash
(
# Take the lock without blocking; exit if a previous run still holds it.
flock -n 9 || exit 1
# Scan into a temporary file (stderr discarded), then rename it over the live list.
CGIT_CONFIG="/etc/cgitrc.slow" /var/www/cgi-bin/cgit --scan-path=/var/lib/dist-git/git/rpms \
  >/var/cache/cgit/repo-list.rc.new 2>/dev/null \
  && mv -f /var/cache/cgit/repo-list.rc.new /var/cache/cgit/repo-list.rc
) 9>/var/lock/mylockfile


And - yes - it seems to do exactly what I wanted.
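
One possible refinement of the cron job above, as a hedged sketch: refuse to install a regenerated list that looks
empty or truncated. The helper names count_repos and swap_if_valid are hypothetical, and the check assumes the
generated file contains one "repo.url=" line per repository.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical names, not part of cgit.

# Print the number of "repo.url=" entries in the file given as $1.
# Assumption: the generated list has one such line per repository.
count_repos() {
    grep -c '^repo\.url=' "$1"
}

# Move NEW over TARGET only if NEW lists at least MIN repositories
# (MIN defaults to 1); otherwise leave TARGET untouched and fail.
swap_if_valid() {
    local new=$1 target=$2 min=${3:-1}
    local n
    # grep -c exits non-zero on zero matches or a missing file;
    # treat both cases as "no repositories found".
    n=$(count_repos "$new" 2>/dev/null) || n=0
    if [ "$n" -ge "$min" ]; then
        mv -f "$new" "$target"
    else
        echo "refusing to install $new: only $n repositories listed" >&2
        return 1
    fi
}
```

In the cron job this could stand in for the plain mv, e.g.
`swap_if_valid /var/cache/cgit/repo-list.rc.new /var/cache/cgit/repo-list.rc 1000`, where the threshold of 1000 is an
arbitrary illustrative value.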


-- 
Miroslav Suchy, RHCA
Red Hat, Senior Software Engineer, #brno, #devexp, #fedora-buildsys

