List for cgit developers and users
* RFC: don't cache objects larger than X
@ 2016-10-10 14:03 mricon
  2016-10-12 11:22 ` Jason
  0 siblings, 1 reply; 5+ messages in thread
From: mricon @ 2016-10-10 14:03 UTC


Hi, all:

I have the unfortunate problem of maintaining several git trees where
a single patch can be over 1GB in size (I know this is crazy, but it
actually happens). When spam crawlers access such a patch via its
/commit link, cgit generates a colorized version that is easily 10GB
in size in the cache dir. A couple of such hits and my cgit cache
partition runs out of space.

There is currently no way to tweak the cgit cache based on object
size, only on the number of entries, so the only fix available is a
kludge: a cron job that runs every 5 minutes and deletes any object
larger than 1GB from the cgit-cache dir.

I'd be happy to see a config option that either limits the total size
of the cgit cache (preferred) or at least tells cgit not to cache
objects larger than a certain size.
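
Something along these lines in cgitrc would cover both (the option
names are made up; nothing like this exists today, and the values
here are plain bytes):

  # cap the total size of the cache directory (preferred)
  cache-total-size=10737418240

  # or: never cache a rendered page larger than this
  cache-max-entry-size=1073741824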

Best,
-- 
Konstantin Ryabitsev
Linux Foundation Collab Projects
Montréal, Québec



* RFC: don't cache objects larger than X
  2016-10-10 14:03 RFC: don't cache objects larger than X mricon
@ 2016-10-12 11:22 ` Jason
  2016-10-12 13:05   ` mricon
  2016-10-17 17:56   ` lfleischer
  0 siblings, 2 replies; 5+ messages in thread
From: Jason @ 2016-10-12 11:22 UTC


I face this same problem, in fact. Unless somebody beats me to it, I'd
be interested in giving this a stab.

One issue is that cache entries are currently "streamed" into the
cache files as they are produced, so it is not trivially possible to
know how big an entry will be beforehand. This means the best we
could do is unlink the cache file immediately after it has been
created and printed.
Would this be acceptable?
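
Concretely, something like this at the end of the cache fill path (a
sketch with made-up names, not actual cgit code):

#include <sys/stat.h>
#include <unistd.h>

/* Sketch only: once the page has been streamed into the cache file
 * and served to the client, drop the entry again if it turned out
 * larger than the configured maximum. */
static void drop_if_too_big(const char *path, int fd, off_t max_size)
{
	struct stat st;

	if (max_size > 0 && fstat(fd, &st) == 0 && st.st_size > max_size)
		unlink(path);
}

The file would still be written (and disk space briefly used), but it
would never linger in the cache.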

Regards,
Jason



* RFC: don't cache objects larger than X
  2016-10-12 11:22 ` Jason
@ 2016-10-12 13:05   ` mricon
  2016-10-17 17:56   ` lfleischer
  1 sibling, 0 replies; 5+ messages in thread
From: mricon @ 2016-10-12 13:05 UTC


On Wed, Oct 12, 2016 at 01:22:34PM +0200, Jason A. Donenfeld wrote:
> I face this same problem, in fact. Unless somebody beats me to it, I'd
> be interested in giving this a stab.
> 
> One issue is that cache entries are currently "streamed" into the
> cache files as they are produced, so it is not trivially possible to
> know how big an entry will be beforehand. This means the best we
> could do is unlink the cache file immediately after it has been
> created and printed.
> Would this be acceptable?

Is there any way of keeping track of how much has been written? It
seems like it should be pretty simple to count the bytes as they are
written to the cache file and abort caching once the file grows
larger than the configured maximum.
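
Something like a counting wrapper around the cache write path, say
(made-up names, just to illustrate the idea):

#include <sys/types.h>
#include <unistd.h>

/* Sketch only: count bytes as they go into the cache file and let
 * the caller give up on caching once the limit is exceeded; the
 * page itself would still be streamed to the client. */
static ssize_t cache_write(int fd, const void *buf, size_t n,
			   off_t *written, off_t max_size)
{
	if (max_size > 0 && *written + (off_t)n > max_size)
		return -1;	/* over the limit: stop caching */
	*written += n;
	return write(fd, buf, n);
}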

Best,
-- 
Konstantin Ryabitsev
Linux Foundation Collab Projects
Montréal, Québec



* RFC: don't cache objects larger than X
  2016-10-12 11:22 ` Jason
  2016-10-12 13:05   ` mricon
@ 2016-10-17 17:56   ` lfleischer
  2016-10-17 18:39     ` Jason
  1 sibling, 1 reply; 5+ messages in thread
From: lfleischer @ 2016-10-17 17:56 UTC


On Wed, 12 Oct 2016 at 13:22:34, Jason A. Donenfeld wrote:
> I face this same problem, in fact. Unless somebody beats me to it, I'd
> be interested in giving this a stab.
> 
> One issue is that cache entries are currently "streamed" into the
> cache files as they are produced, so it is not trivially possible to
> know how big an entry will be beforehand. This means the best we
> could do is unlink the cache file immediately after it has been
> created and printed.
> Would this be acceptable?

It is not easy to compute the exact size of the generated page, but
we are able to detect huge objects before streaming begins -- the
size of the object is already returned by read_sha1_file().
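
So something like this before streaming starts would work (sketch
only; the flag and limit variables are made up):

/* read_sha1_file() is git's internal API from cache.h; the object
 * size comes back before anything is streamed, so oversized objects
 * can be flagged up front. */
enum object_type type;
unsigned long size;
void *buf = read_sha1_file(sha1, &type, &size);

if (buf && max_cache_object_size && size > max_cache_object_size)
	skip_cache = 1;	/* render the page, but do not cache it */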

I wonder whether the max-blob-size setting already does what you
want, though. It affects more than just the cached version, but
preventing such huge pages from being generated in the first place
seems even better. If you really want to offer such files to your
users, the max_blob_size check in print_object() might be a good
place to add the "print but do not cache large files" functionality.
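
For reference, the existing knob in cgitrc takes a value in kilobytes
(0 disables the limit), so refusing to render blobs over 1GB would
be:

max-blob-size=1048576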

Regards,
Lukas



* RFC: don't cache objects larger than X
  2016-10-17 17:56   ` lfleischer
@ 2016-10-17 18:39     ` Jason
  0 siblings, 0 replies; 5+ messages in thread
From: Jason @ 2016-10-17 18:39 UTC


I think there actually might be a trick using rlimit to implement this
type of streaming size check...
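
Something like this before the cache file is written (untested
sketch):

#include <signal.h>
#include <sys/resource.h>

/* Untested sketch: cap the size of any file this process may write.
 * With SIGXFSZ ignored, a write() past the limit fails with EFBIG
 * instead of killing the process, so the oversized, half-written
 * cache entry can simply be unlinked. */
static void limit_cache_file_size(rlim_t max_size)
{
	struct rlimit rl = { .rlim_cur = max_size, .rlim_max = max_size };

	signal(SIGXFSZ, SIG_IGN);
	setrlimit(RLIMIT_FSIZE, &rl);
}

One caveat: RLIMIT_FSIZE applies to every file the process writes,
not just the cache entry.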


