zsh-workers
 help / color / mirror / code / Atom feed
From: Daniel Shahaf <d.s@daniel.shahaf.name>
To: Sam Bostock <sam.bostock@shopify.com>
Cc: "Lawrence Velázquez" <larryv@zsh.org>, zsh-workers@zsh.org
Subject: Re: [PATCH] Abbreviate commit hash in prompt during git rebase
Date: Sun, 17 Apr 2022 17:52:28 +0000	[thread overview]
Message-ID: <20220417175228.GP27526@tarpaulin.shahaf.local2> (raw)
In-Reply-To: <CAHwnEohmcE7Wg6ZBdKmUgU-e7zryRgUXXWPJFRdxU9_nbZ=DpA@mail.gmail.com>

Sam Bostock wrote on Wed, Apr 13, 2022 at 13:53:40 -0600:
> Hi Lawrence,
> 
> Thanks for the reply!
> 

Have you seen my previous reply to you, <https://zsh.org/workers/50057>?
(For obvious reasons I'll re-send this question to you in a minute from
another address of mine.  Apologies in advance for the duplicate.)

> Git's abbreviation mechanism ensures uniqueness, rather than strictly
> truncating. If I'm not mistaken, the core.abbrev setting is effectively
> just a minimum for how many characters to include, with git including as
> many additional characters as required to avoid colliding with another
> object in the repository. I believe the default is determined using a
> heuristic based on the number objects in the repository.
> 
> So we could make it a setting and do truncation ourselves, but then we'd be
> displaying a value without uniqueness guarantees.
> 

The operative word here is "guarantee".  If we truncate to, say, 12 hex
digits, that's 48 bits (= 12 nibbles), so a repository would need to
have 2**24 commits (≈ 16M) in order to have a ≈50% chance of having two
commits with the same first 12 nibbles.  (That's _commits_, not
_objects_, since IIRC «d41d8cd^{commit}» v. «d41d8cd^{blob}» is a valid
way to disambiguate.) Only the largest/oldest projects would have
history DAGs this large.¹

And if we truncate "too soon" and the resulting hash is ambiguous, then
what?  Would that be a problem?

- It wouldn't break the use-case of looking at the prompt after «git
  commit» or «git rebase --continue» to confirm the command worked.

- It wouldn't break the use-case of reviewing the scrollback and knowing
  what the state was before each command.
  
- It _would_ break the use-case of copying the hash into the command
  line, because then git would error out about the ambiguous prefix.
  The workaround would be to extract the full hash from the reflogs or
  from rebase's todo list.

  This use-case is so common for me that I had my zshrc store the commit
  hash in a global variable so I can easily use it in interactive
  commands.  (I do «typeset -g y=…» in zshrc and then «git show $y»
  interactively.)

So, do we actually need to *guarantee* uniqueness?

¹ For instance, FreeBSD's ports repository, which started in 1994, has
0.6M commit objects (per «git rev-list --all --count») and autogenerates
12-nibble short hashes.  (I didn't choose this repository because of
this property; it's just a repository I happened to have handy.)

> Given we're already in somewhat of an edge case (rebase in progress), is
> the penalty still a problem?
> 

I don't consider rebases an edge case, so, "Yes.".

> Also, while I'm not familiar with the inner workings of vcs_info, I'm
> wondering if we recompute this every time the prompt is rendered, or if we
> lazily compute the values only when the rebase progresses?
> 

Every time.

Cheers,

Daniel

> Sam
> 
> 
> On Tue., Apr. 12, 2022, 11:38 p.m. Lawrence Velázquez, <larryv@zsh.org>
> wrote:
> 
> > Hi Sam,
> >
> > On Wed, Apr 13, 2022, at 12:41 AM, Sam Bostock wrote:
> > > First time contributor here
> >
> > Welcome!
> >
> > > diff --git a/Functions/VCS_Info/Backends/VCS_INFO_get_data_git
> > > b/Functions/VCS_Info/Backends/VCS_INFO_get_data_git
> > > index e45eebc8e..dd9c40ab4 100644
> > > --- a/Functions/VCS_Info/Backends/VCS_INFO_get_data_git
> > > +++ b/Functions/VCS_Info/Backends/VCS_INFO_get_data_git
> > > @@ -209,27 +209,15 @@ elif [[ -d "${gitdir}/rebase-merge" ]]; then
> > >                  return 0
> > >                  ;;
> > >              (''(p|pick|e|edit|r|reword|f|fixup|s|squash)' '*)
> > > -                # The line is of the form "pick $hash $subject".
> > > -                # Just strip the verb and we're good to go.
> > > -                p=${p#* }
> > > +                # Typically, the line is of the form "pick $longhash
> > > $subject\n".
> > >                  # Special case: in an interactive rebase, if the user
> > > wrote "p $shorthash\n"
> > >                  # in the editor (without a description after the
> > > hash), then the .../done
> > >                  # file will contain "p $longhash $shorthash\n" (git
> > > 2.11.0) or "pick $longhash\n"
> > >                  # (git 2.19.0).
> > > -                if [[ $p != *\ * ]]; then
> > > -                        # The line is of the form "pick $longhash\n"
> > > -                        #
> > > -                        # Mark the log message subject as unknown.
> > > -                        # TODO: Can we performantly obtain the subject?
> > > -                        p+=" ?"
> > > -                elif (( ${#${p//[^ ]}} == 1 )) && [[ ${p%% *} == ${p#*
> > > }* ]]; then
> > > -                        # The line is of the form "p $longhash
> > > $shorthash\n"
> > > -                        #
> > > -                        # The shorthash is superfluous, so discard it,
> > > and mark
> > > -                        # the log message subject as unknown.
> > > -                        # TODO: Can we performantly obtain the subject?
> > > -                        p="${p%% *} ?"
> > > -                fi
> > > +                # Given we don't reliably have the subject, and the
> > > full hash is needlessly long
> > > +                # for a prompt, we simply extract the hash (second
> > > word) and delegate to git to
> > > +                # format it as "$shorthash $subject" according to
> > > git's hash abbreviation config.
> > > +                p=$(${vcs_comm[cmd]} log -1 --pretty=format:'%h %s'
> > > $p[(w)2])
> > >                  ;;
> > >              (''(x|exec) *)
> > >                  # The line is of the form 'exec foo bar baz' where
> > > 'foo bar
> >
> > It strikes me as a regression to run git if we already have the
> > subject; vcs_info is slow enough as it is.  Could this be reserved
> > for the edge cases where we don't have the subject?
> >
> > (This would preclude what is arguably your primary goal, which is
> > using core.abbrev to choose between short and long hashes.  But
> > that amounts to using git as a glorified string slicer.  Perhaps
> > the choice could be determined by a style instead, and we could do
> > the truncation ourselves?)
> >
> > --
> > vq
> >


  reply	other threads:[~2022-04-17 17:53 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-13  4:41 Sam Bostock
2022-04-13  5:38 ` Lawrence Velázquez
2022-04-13 19:53   ` Sam Bostock
2022-04-17 17:52     ` Daniel Shahaf [this message]
2022-04-17 20:08       ` Lawrence Velázquez
2022-04-17 20:17         ` Daniel Shahaf
2022-04-13 22:03 ` Daniel Shahaf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220417175228.GP27526@tarpaulin.shahaf.local2 \
    --to=d.s@daniel.shahaf.name \
    --cc=larryv@zsh.org \
    --cc=sam.bostock@shopify.com \
    --cc=zsh-workers@zsh.org \
    --subject='Re: [PATCH] Abbreviate commit hash in prompt during git rebase' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Code repositories for project(s) associated with this inbox:

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).