List for cgit developers and users
* Supporting Namespaces in cgit
@ 2016-05-09 20:34 dsilvers
  2016-05-09 21:31 ` john
  0 siblings, 1 reply; 6+ messages in thread
From: dsilvers @ 2016-05-09 20:34 UTC


Hello,

One of the projects I am involved with is called Gitano[1] and is a Git server
along the lines of Gitolite or Gitosis, but not along the lines of Gitlab or
Gitorious.  Among various technologies developed by other projects, we
recommend the use of CGit for visualising the Git repositories hosted in a
Gitano instance; and we have been very pleased with CGit in the three or
four years we've been using it.

We have recently been looking at ways to support server-side repository forks
in Gitano and how we might use namespaces to support that.  In addition that
led to us thinking about how we could segregate Gitano's administration refs
into a namespace to keep things cleaner.  However if we were to use server-side
namespaces for that, then we would need to ensure our chosen web visualisation
tool (CGit) was able to deal with it.

An examination of the code (thorough, but admittedly a while ago) failed to
find any support for Git namespaces in CGit, but I wondered firstly if
namespaces had ever been considered for support in CGit?  If not yet
considered, would support for namespaces be something that the CGit project
might entertain?  We would be very prepared to do a first pass set of patches,
or indeed a design document first if that were desirable; but given our
relative inexperience in the CGit codebase we would very much appreciate
pointers and assistance from the CGit community.

Obviously, if namespace support is not something that the CGit community wants
to see done then we will have to rethink some of our backend design decisions.
Similarly, if it's interesting but there is no intention to make a CGit release
within the timeframe we'd be working to (approx. November time at the latest)
then we'll need to think again.

I appreciate this is a fairly content-free mail that seems to be asking so much
and offering very little, so in brief summary:

1. We'd like CGit to support namespaces on a per-repo basis
2. We're prepared to head up the design and/or implementation of this
3. But only if the CGit community thinks this is at least possible to get
   merged into a release in the near future (next 6 months or so).

I look forward to your opinions, even if they're in the negative.

Regards,

Daniel.

[1] https://git.gitano.org.uk/gitano.git/

-- 
Daniel Silverstone                         http://www.digital-scurf.org/
PGP mail accepted and encouraged.            Key Id: 3CCE BABE 206C 3B69




* Supporting Namespaces in cgit
  2016-05-09 20:34 Supporting Namespaces in cgit dsilvers
@ 2016-05-09 21:31 ` john
  2016-05-09 21:54   ` dsilvers
  0 siblings, 1 reply; 6+ messages in thread
From: john @ 2016-05-09 21:31 UTC


On Mon, May 09, 2016 at 09:34:24PM +0100, Daniel Silverstone wrote:
> One of the projects I am involved with is called Gitano[1] and is a Git server
> along the lines of Gitolite or Gitosis, but not along the lines of Gitlab or
> Gitorious.  Among various technologies developed by other projects, we
> recommend the use of CGit for visualising the Git repositories hosted in a
> Gitano instance; and we have been very pleased with CGit in the three or
> four years we've been using it.
> 
> We have recently been looking at ways to support server-side repository forks
> in Gitano and how we might use namespaces to support that.  In addition that
> led to us thinking about how we could segregate Gitano's administration refs
> into a namespace to keep things cleaner.  However if we were to use server-side
> namespaces for that, then we would need to ensure our chosen web visualisation
> tool (CGit) was able to deal with it.
> 
> An examination of the code (thorough, but admittedly a while ago) failed to
> find any support for Git namespaces in CGit, but I wondered firstly if
> namespaces had ever been considered for support in CGit?  If not yet
> considered, would support for namespaces be something that the CGit project
> might entertain?  We would be very prepared to do a first pass set of patches,
> or indeed a design document first if that were desirable; but given our
> relative inexperience in the CGit codebase we would very much appreciate
> pointers and assistance from the CGit community.
> 
> Obviously, if namespace support is not something that the CGit community wants
> to see done then we will have to rethink some of our backend design decisions.
> Similarly, if it's interesting but there is no intention to make a CGit release
> within the timeframe we'd be working to (approx. November time at the latest)
> then we'll need to think again.
> 
> I appreciate this is a fairly content-free mail that seems to be asking so much
> and offering very little, so in brief summary:
> 
> 1. We'd like CGit to support namespaces on a per-repo basis
> 2. We're prepared to head up the design and/or implementation of this
> 3. But only if the CGit community thinks this is at least possible to get
>    merged into a release in the near future (next 6 months or so).
> 
> I look forward to your opinions, even if they're in the negative.

I'm curious what you expect the UI for this to look like.  Would
namespaces appear under the repository in the URL?  If so, what does the
base repository look like?  (Although after I finished thinking this
through and writing the rest of the email maybe this doesn't actually
matter - see the last paragraph.)

Implementation-wise, it looks like using a namespace should just be a
matter of setting GIT_NAMESPACE in the environment near the top of
cgit.c::prepare_repo_cmd().

Discovering namespaces is more interesting, since we can't know what
exactly is a namespace.  For example, if we have:

	refs/namespaces/foo/bar/baz

is the namespace "foo" or "foo/bar"?  Maybe checking for "heads" and
"tags" subdirectories is enough, but I'm not familiar enough with
namespaces to know if those will definitely exist, and obviously users
can create or delete any directories anywhere in the hierarchy.

Also, any attempt to discover namespaces during automated repository
discovery (i.e. cgitrc's "scan-tree") is likely to be quite expensive
with reading packed-refs and the whole loose refs tree.  However, it
sounds like Gitano probably generates an explicit repository list, in
which case a "repo.namespace" config key should be usable.

If we can indeed ignore any attempt to discover namespaces and just use
"repo.namespace", is it enough to add that config value to
"struct cgit_repo" and then pass it to setenv() in prepare_repo_cmd()?

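A minimal sketch of that idea follows. It assumes a hypothetical
`git_namespace` field holding the value of the proposed "repo.namespace"
key; `struct repo_sketch` and `apply_repo_namespace()` are illustrative
names, not cgit's actual code. Only the GIT_NAMESPACE variable itself is
real (see gitnamespaces(7)).

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Illustrative stand-in for the relevant part of cgit's struct cgit_repo. */
struct repo_sketch {
	const char *path;
	const char *git_namespace;	/* hypothetical "repo.namespace" value, or NULL */
};

/*
 * Export the configured namespace before any git code runs for this
 * repository.  Returns 0 on success, -1 if setenv()/unsetenv() fails.
 */
static int apply_repo_namespace(const struct repo_sketch *repo)
{
	if (repo->git_namespace && *repo->git_namespace)
		return setenv("GIT_NAMESPACE", repo->git_namespace, 1);
	/* No namespace configured: do not inherit a stale value. */
	return unsetenv("GIT_NAMESPACE");
}
```

Something like this would be called near the top of prepare_repo_cmd(),
before the repository is opened.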


* Supporting Namespaces in cgit
  2016-05-09 21:31 ` john
@ 2016-05-09 21:54   ` dsilvers
  2016-05-10 13:21     ` john
  0 siblings, 1 reply; 6+ messages in thread
From: dsilvers @ 2016-05-09 21:54 UTC


On Mon, May 09, 2016 at 22:31:37 +0100, John Keeping wrote:
> Implementation-wise, it looks like using a namespace should just be a
> matter of setting GIT_NAMESPACE in the environment near the top of
> cgit.c::prepare_repo_cmd().

This is certainly the basic starting point.

> Discovering namespaces is more interesting, since we can't know what
> exactly is a namespace.  For example, if we have:
> 
> 	refs/namespaces/foo/bar/baz
> 
> is the namespace "foo" or "foo/bar"?  Maybe checking for "heads" and
> "tags" subdirectories is enough, but I'm not familiar enough with
> namespaces to know if those will definitely exist, and obviously users
> can create or delete any directories anywhere in the hierarchy.

I'd not attempt to discover namespaces.  I think if you're given a namespace to
use in the repo stanza you use it, otherwise current behaviour prevails.
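
As an illustration, a generated cgitrc snippet might look roughly like the
following. "repo.url" and "repo.path" are existing cgitrc keys;
"repo.namespace" is only the key proposed in this thread, and the paths and
namespace names are made up.

```
# Plain repository: no namespace given, current behaviour prevails.
repo.url=project
repo.path=/srv/git/project.git

# Hypothetical: the same on-disk repository, viewed through a namespace.
repo.url=project-fork
repo.path=/srv/git/project.git
repo.namespace=forks/alice
```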

> Also, any attempt to discover namespaces during automated repository
> discovery (i.e. cgitrc's "scan-tree") is likely to be quite expensive
> with reading packed-refs and the whole loose refs tree.  However, it
> sounds like Gitano probably generates an explicit repository list, in
> which case a "repo.namespace" config key should be usable.

Yes, that's the intended behaviour.  I wouldn't expect cgit to be able to
invent namespace understanding out of nothing.

> If we can indeed ignore any attempt to discover namespaces and just use
> "repo.namespace", is it enough to add that config value to
> "struct cgit_repo" and then pass it to setenv() in prepare_repo_cmd()?

This is a necessary start, but it is not sufficient.  Elsewhere in the codebase
changes will need to be made to use namespace aware ref iteration among other
things.  In addition, if we wish to support agefile per-namespace then we need
a repo.agefile option which can override the global option.  There may be more
but right now I don't have them to mind because I've not fully scoured the
codebase.

If you think it's worth our while implementing a proof-of-concept patch series
then we'll give it a go.  I'm quite excited about being able to do this because
it'll open up so many interesting options for me when Gitano can have ACLs which are
namespace aware :-)

D.

-- 
Daniel Silverstone                         http://www.digital-scurf.org/
PGP mail accepted and encouraged.            Key Id: 3CCE BABE 206C 3B69



* Supporting Namespaces in cgit
  2016-05-09 21:54   ` dsilvers
@ 2016-05-10 13:21     ` john
  2016-06-25 15:46       ` richard.maw
  0 siblings, 1 reply; 6+ messages in thread
From: john @ 2016-05-10 13:21 UTC


On Mon, May 09, 2016 at 10:54:44PM +0100, Daniel Silverstone wrote:
> On Mon, May 09, 2016 at 22:31:37 +0100, John Keeping wrote:
> > Implementation-wise, it looks like using a namespace should just be a
> > matter of setting GIT_NAMESPACE in the environment near the top of
> > cgit.c::prepare_repo_cmd().
> 
> This is certainly the basic starting point.
> 
> > Discovering namespaces is more interesting, since we can't know what
> > exactly is a namespace.  For example, if we have:
> > 
> > 	refs/namespaces/foo/bar/baz
> > 
> > is the namespace "foo" or "foo/bar"?  Maybe checking for "heads" and
> > "tags" subdirectories is enough, but I'm not familiar enough with
> > namespaces to know if those will definitely exist, and obviously users
> > can create or delete any directories anywhere in the hierarchy.
> 
> I'd not attempt to discover namespaces.  I think if you're given a namespace to
> use in the repo stanza you use it, otherwise current behaviour prevails.
> 
> > Also, any attempt to discover namespaces during automated repository
> > discovery (i.e. cgitrc's "scan-tree") is likely to be quite expensive
> > with reading packed-refs and the whole loose refs tree.  However, it
> > sounds like Gitano probably generates an explicit repository list, in
> > which case a "repo.namespace" config key should be usable.
> 
> Yes, that's the intended behaviour.  I wouldn't expect cgit to be able to
> invent namespace understanding out of nothing.
> 
> > If we can indeed ignore any attempt to discover namespaces and just use
> > "repo.namespace", is it enough to add that config value to
> > "struct cgit_repo" and then pass it to setenv() in prepare_repo_cmd()?
> 
> This is a necessary start, but it is not sufficient.  Elsewhere in the codebase
> changes will need to be made to use namespace aware ref iteration among other
> things.  In addition, if we wish to support agefile per-namespace then we need
> a repo.agefile option which can override the global option.  There may be more
> but right now I don't have them to mind because I've not fully scoured the
> codebase.

Ah, right.  I thought git.git's infrastructure might take care of
namespaces automatically, but only git-upload-pack and git-receive-pack
actually make use of namespaces so we'll have to do it ourselves.

Apart from enumeration, which should be fairly mechanical with
strip_namespace(), we'll need to prefix user-provided values with the
namespace.  I think the three relevant parameters (in
cgit.c::querystring_cb()) are "h", "id" and "id2"; currently we allow
each of those to contain either a named ref or a raw SHA-1, although we
generate only named refs for "h" and only SHA-1s for "id" and "id2".
And in fact ui-blob.c enforces that "id" contains a valid SHA-1.
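
To make the enumeration side concrete, here is a standalone sketch of the
two string operations involved. It mirrors the layout described in
gitnamespaces(7), where namespace "foo/bar" lives under
refs/namespaces/foo/refs/namespaces/bar/. The function names are
illustrative stand-ins and not git's own expand/strip helpers.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Expand a namespace name into its ref prefix: every path component
 * gets its own "refs/namespaces/<component>/" level.
 * Caller frees the result; returns NULL on allocation failure.
 */
static char *expand_namespace_sketch(const char *ns)
{
	static const char level[] = "refs/namespaces/";
	size_t level_len = sizeof(level) - 1;
	/* Worst case: every character is a component of its own. */
	char *out = malloc(strlen(ns) * (level_len + 2) + 1);
	char *p = out;

	if (!out)
		return NULL;
	while (*ns) {
		size_t n = strcspn(ns, "/");
		if (n) {	/* skip empty components such as "a//b" */
			memcpy(p, level, level_len);
			p += level_len;
			memcpy(p, ns, n);
			p += n;
			*p++ = '/';
		}
		ns += n;
		if (*ns == '/')
			ns++;
	}
	*p = '\0';
	return out;
}

/*
 * Return the part of refname after the namespace prefix, or NULL if
 * the ref lies outside the namespace and should not be listed.
 */
static const char *strip_prefix_sketch(const char *refname, const char *prefix)
{
	size_t n = strlen(prefix);
	return strncmp(refname, prefix, n) ? NULL : refname + n;
}
```

A ref iteration callback would skip any ref for which the strip step
returns NULL and display the stripped name otherwise.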

So a simple implementation would just prefix "h" with
get_git_namespace() and call it done, but that risks information leakage
via "id" which is treated equivalently in most places (although as
gitnamespaces(7) points out anyone with write access to the repository
can already read whatever they want and in fact CGit imposes no access
checks if you give it a SHA-1, but at least that's slightly more obscure
than a ref name).

One approach to that would be to switch all the sites using "id" or
"id2" to get_sha1_hex() but I'm sure we have people generating URLs
using those parameters and relying on at least "id2" taking a ref rather
than a raw SHA-1.  I suspect it is simpler to replace calls to
get_sha1() with cgit_get_sha1() and apply the namespace prefix there if
the value is not a raw SHA-1.
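
A rough sketch of what such a wrapper could do before handing the name to
get_sha1(). The names `is_raw_sha1()` and `namespaced_rev()` are
hypothetical, and treating every short name as a branch is a deliberate
simplification; a real version would also need to consider tags and the
other lookup rules.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* True if s is exactly 40 hexadecimal digits, i.e. a full raw SHA-1. */
static int is_raw_sha1(const char *s)
{
	size_t i;

	for (i = 0; i < 40; i++)
		if (!isxdigit((unsigned char)s[i]))
			return 0;	/* also stops at an early NUL */
	return s[40] == '\0';
}

/*
 * Write the name to resolve into out.  Raw SHA-1s pass through
 * untouched; anything else is qualified with the namespace prefix.
 * Returns 0 on success, -1 if out is too small.
 */
static int namespaced_rev(const char *ns_prefix, const char *value,
			  char *out, size_t outlen)
{
	int n;

	if (is_raw_sha1(value))
		n = snprintf(out, outlen, "%s", value);
	else if (!strncmp(value, "refs/", 5))
		n = snprintf(out, outlen, "%s%s", ns_prefix, value);
	else
		/* Simplification: treat short names as branches only. */
		n = snprintf(out, outlen, "%srefs/heads/%s", ns_prefix, value);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```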

> If you think it's worth our while implementing a proof-of-concept patch series
> then we'll give it a go.  I'm quite excited about being able to do this because
> it'll open up so many interesting options for me when Gitano can have ACLs which are
> namespace aware :-)



* Supporting Namespaces in cgit
  2016-05-10 13:21     ` john
@ 2016-06-25 15:46       ` richard.maw
  2016-06-26  0:44         ` richard.maw
  0 siblings, 1 reply; 6+ messages in thread
From: richard.maw @ 2016-06-25 15:46 UTC


Hi all.

I thought I'd give an update,
since I managed to find time to make an attempt at this.

On Tue, May 10, 2016 at 02:21:36PM +0100, John Keeping wrote:
> On Mon, May 09, 2016 at 10:54:44PM +0100, Daniel Silverstone wrote:
> > On Mon, May 09, 2016 at 22:31:37 +0100, John Keeping wrote:
> > > Implementation-wise, it looks like using a namespace should just be a
> > > matter of setting GIT_NAMESPACE in the environment near the top of
> > > cgit.c::prepare_repo_cmd().
> > 
> > This is certainly the basic starting point.
> > 
> > > Discovering namespaces is more interesting, since we can't know what
> > > exactly is a namespace.  For example, if we have:
> > > 
> > > 	refs/namespaces/foo/bar/baz
> > > 
> > > is the namespace "foo" or "foo/bar"?  Maybe checking for "heads" and
> > > "tags" subdirectories is enough, but I'm not familiar enough with
> > > namespaces to know if those will definitely exist, and obviously users
> > > can create or delete any directories anywhere in the hierarchy.

Having looked at it in more detail,
I'd check for the presence of a HEAD symbolic ref,
if we were going to try it.

> > I'd not attempt to discover namespaces.  I think if you're given a namespace to
> > use in the repo stanza you use it, otherwise current behaviour prevails.
> > 
> > > Also, any attempt to discover namespaces during automated repository
> > > discovery (i.e. cgitrc's "scan-tree") is likely to be quite expensive
> > > with reading packed-refs and the whole loose refs tree.  However, it
> > > sounds like Gitano probably generates an explicit repository list, in
> > > which case a "repo.namespace" config key should be usable.
> > 
> > Yes, that's the intended behaviour.  I wouldn't expect cgit to be able to
> > invent namespace understanding out of nothing.

We have since discussed the idea of potentially having
a global namespace option,
so you could have a CGit instance that always displays the "docs" namespace
if you wanted to use it as a web-viewer for documentation served from git
without exposing the code.

> > > If we can indeed ignore any attempt to discover namespaces and just use
> > > "repo.namespace", is it enough to add that config value to
> > > "struct cgit_repo" and then pass it to setenv() in prepare_repo_cmd()?
> > 
> > This is a necessary start, but it is not sufficient.  Elsewhere in the codebase
> > changes will need to be made to use namespace aware ref iteration among other
> > things.  In addition, if we wish to support agefile per-namespace then we need
> > a repo.agefile option which can override the global option.  There may be more
> > but right now I don't have them to mind because I've not fully scoured the
> > codebase.

I think we can get away without an option for this
if we instead deterministically mangle the agefile path with the namespace,
which has the advantage that it can't be broken by misconfiguration.
That said, since a global agefile option already exists,
it might still be appropriate to add this as a configuration option.
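
One possible mangling, purely as an illustration: append the namespace to
the agefile path with "/" and "%" percent-encoded, so that distinct
namespaces always map to distinct single file names. The scheme and the
name `namespaced_agefile()` are assumptions made for this sketch, not
something that has been settled.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Derive a per-namespace agefile path, e.g.
 *   "info/web/last-modified" + "foo/bar"
 *     -> "info/web/last-modified.foo%2Fbar"
 * Caller frees the result; returns NULL on allocation failure.
 */
static char *namespaced_agefile(const char *agefile, const char *ns)
{
	/* Each namespace byte expands to at most three bytes ("%2F"). */
	char *out = malloc(strlen(agefile) + 1 + 3 * strlen(ns) + 1);
	char *p;

	if (!out)
		return NULL;
	p = out + sprintf(out, "%s.", agefile);
	for (; *ns; ns++) {
		if (*ns == '/' || *ns == '%')
			p += sprintf(p, "%%%02X", (unsigned char)*ns);
		else
			*p++ = *ns;
	}
	*p = '\0';
	return out;
}
```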

> Ah, right.  I thought git.git's infrastructure might take care of
> namespaces automatically, but only git-upload-pack and git-receive-pack
> actually make use of namespaces so we'll have to do it ourselves.
> 
> Apart from enumeration, which should be fairly mechanical with
> strip_namespace(), we'll need to prefix user-provided values with the
> namespace.  I think the three relevant parameters (in
> cgit.c::querystring_cb()) are "h", "id" and "id2"; currently we allow
> each of those to contain either a named ref or a raw SHA-1, although we
> generate only named refs for "h" and only SHA-1s for "id" and "id2".
> And in fact ui-blob.c enforces that "id" contains a valid SHA-1.
> 
> So a simple implementation would just prefix "h" with
> get_git_namespace() and call it done, but that risks information leakage
> via "id" which is treated equivalently in most places (although as
> gitnamespaces(7) points out anyone with write access to the repository
> can already read whatever they want and in fact CGit imposes no access
> checks if you give it a SHA-1, but at least that's slightly more obscure
> than a ref name).

Yeah, namespaces aren't useful for security,
but we think they would be useful for preventing accidental leakage
or confusion.

Anyone can read the contents of the admin ref of a Gitano repository
that they have read access to,
but they may be prevented from pushing to it,
and its presence may just cause confusion for users who don't need to use it.

It's more about keeping the refs tidy than keeping their contents secure.

> One approach to that would be to switch all the sites using "id" or
> "id2" to get_sha1_hex() but I'm sure we have people generating URLs
> using those parameters and relying on at least "id2" taking a ref rather
> than a raw SHA-1.  I suspect it is simpler to replace calls to
> get_sha1() with cgit_get_sha1() and apply the namespace prefix there if
> the value is not a raw SHA-1.

get_sha1() can also handle some of the weirder "refs",
which aren't a sha1 and don't start with a ref or branch name.

Examples include @{n} for reflog entries, :/ for searching for a commit in any ref,
and even the weird output of `git describe`.

Partially parsing the ref to find out which bit is the ref name
sounds like it is likely to be fragile when new forms are added,
so I'm tempted to have `cgit_get_sha1()` default to calling `get_sha1()`,
but if we have a namespace it strictly only supports sha1s and simple refs.

Until and unless git starts being able to handle namespaces in `get_sha1()`,
I think this is the best we can hope for,
and I'll be adding a big fat comment to revisit it when git updates.
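
As a sketch of the "strict" half of that plan, a conservative filter along
these lines could decide whether a value is a simple ref name. The name
`is_simple_refname()` and the exact set of rejected characters are
assumptions for illustration; it is not the check git itself applies to
ref names.

```c
#include <string.h>

/*
 * Conservative test for a "simple" ref name: no revision operators
 * (^ ~ : @{ ..), no glob characters, no whitespace.  With a namespace
 * in use, anything that fails is rejected rather than partially parsed.
 */
static int is_simple_refname(const char *s)
{
	if (!*s || *s == '-' || *s == '/')
		return 0;
	if (strstr(s, "..") || strstr(s, "@{") || strstr(s, "//"))
		return 0;
	/* strcspn() reaches the terminating NUL only if no forbidden byte occurs. */
	return s[strcspn(s, "^~:?*[\\ \t\n")] == '\0';
}
```

Note that `git describe` output is syntactically indistinguishable from a
ref name, so a filter like this lets it through and it would then simply
fail to resolve inside the namespace.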

> > If you think it's worth our while implementing a proof-of-concept patch series
> > then we'll give it a go.  I'm quite excited about being able to do this because
> > it'll open up so many interesting options for me when Gitano can have ACLs which are
> > namespace aware :-)

A lot of the changes are straightforward,
just changing something to call a namespace-aware function,
or prepending the path.

The dumb-http endpoint looks rather easy to convert,
but it may be convenient to resurrect the patch for adding smart-transport
(https://lists.zx2c4.com/pipermail/cgit/2014-December/002311.html)
since it would be easier to use smart-http
by having CGit set the namespace before calling http-backend
than configuring your web server to set the variable,
and there would be fewer places for it to get out of sync.

We intend that Gitano will generate a cgitrc snippet for the repositories,
so Gitano would be able to cope without CGit providing smart-http,
but it may be a big hurdle to users if there isn't another implementation.


We should also be able to support displaying git notes in namespaces,
though it's a bit of a pig,
since we need to disable the default note search paths
and add one just for the default location in the namespace,
and since those paths are interpreted as globs,
for reliability I think I need to escape the path.
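
The escaping step could look roughly like this, assuming the patterns
follow the usual glob rules in which a backslash quotes the next
character. `escape_glob()` is an illustrative name.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Backslash-escape glob metacharacters so that a namespace-derived
 * notes ref is matched literally rather than as a pattern.
 * Caller frees the result; returns NULL on allocation failure.
 */
static char *escape_glob(const char *s)
{
	/* Worst case: every byte needs a preceding backslash. */
	char *out = malloc(2 * strlen(s) + 1);
	char *p = out;

	if (!out)
		return NULL;
	for (; *s; s++) {
		if (strchr("*?[\\", *s))
			*p++ = '\\';
		*p++ = *s;
	}
	*p = '\0';
	return out;
}
```

The escaped string would be built from the namespace prefix followed by
the default notes location, refs/notes/commits.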

This does raise the interesting question of whether
we should include the notes of any sub-namespaces of the repository
in the search path for commits in the current namespace,
since all those commits in the sub-namespace are reachable,
but if they also exist in your current namespace you might get unexpected notes.


I will be pushing my progress to https://git.gitano.org.uk/cgit.git/log/?h=richardmaw/namespaces
and will try to keep the list updated as I go.
Comments are welcome.



* Supporting Namespaces in cgit
  2016-06-25 15:46       ` richard.maw
@ 2016-06-26  0:44         ` richard.maw
  0 siblings, 0 replies; 6+ messages in thread
From: richard.maw @ 2016-06-26  0:44 UTC


On Sat, Jun 25, 2016 at 04:46:26PM +0100, Richard Maw wrote:
> On Tue, May 10, 2016 at 02:21:36PM +0100, John Keeping wrote:
> > On Mon, May 09, 2016 at 10:54:44PM +0100, Daniel Silverstone wrote:
> > > On Mon, May 09, 2016 at 22:31:37 +0100, John Keeping wrote:
> > > > Implementation-wise, it looks like using a namespace should just be a
> > > > matter of setting GIT_NAMESPACE in the environment near the top of
> > > > cgit.c::prepare_repo_cmd().
> > > 
> > > This is certainly the basic starting point.
> > > 
> > > > Discovering namespaces is more interesting, since we can't know what
> > > > exactly is a namespace.  For example, if we have:
> > > > 
> > > > 	refs/namespaces/foo/bar/baz
> > > > 
> > > > is the namespace "foo" or "foo/bar"?  Maybe checking for "heads" and
> > > > "tags" subdirectories is enough, but I'm not familiar enough with
> > > > namespaces to know if those will definitely exist, and obviously users
> > > > can create or delete any directories anywhere in the hierarchy.
> 
> Having looked at it in more detail,
> I'd check for the presence of a HEAD symbolic ref,
> if we were going to try it.

It turns out that symbolic refs don't get packed,
so it's always file system traversal rather than packed ref reading.

However, HEAD symbolic refs in namespaces don't get automatically created,
but they are needed to clone with the dumb-http transport.

This is mostly just an annoying extra step to using namespaces,
though it might be worth trying repo.defbranch in the absence of a HEAD file.


