Git Commands You Probably Do Not Need
Ah, git! Love it, hate it. Few things are as central to the modern software
development workflow as source-control management (SCM) tools. Although there
have been, and still are, plenty of alternatives to git in the world of SCMs,
none other seems quite as prevalent in both open source and the enterprise.
Regardless of how central git has become to many (most?) software developers,
I often get the impression that people tend to shy away from anything beyond
the relatively basic functionality it offers.
Since its very inception git has been notorious for its sometimes unfriendly,
inconsistent and occasionally hostile command line interface.
In this post I'll present a few git commands and operations I run or have run
occasionally, in no particular order, that most git users out there might
never need.
The empty commit ∅
git is designed to track content and changes to that content over time, so
creating empty commits doesn't sound like a very productive or sensible thing
to do. Not adding any content may seem like a waste.
Yet, off the top of my head I know of at least two occasions where creating an
empty commit can make quite a bit of sense:
- Initializing new repositories.
- Triggering continuous deployment pipelines.
1. Initializing a new repository
My main reason for using the --allow-empty flag to git commit is to create an
initial commit in a repository before I have any content for it.
Why?
The main reason for me is that git rebase is somewhat awkward when you want to
rewrite the first commit of a repository. The rebase command centers around
working against an <upstream> commit, and by definition the first commit of a
branch has no upstream. Later versions of git added the --root option to remedy
this, but I still like to take the opportunity to start any new line of history
with an empty commit:
❯ git commit --allow-empty -m "Initial commit"
❯ git show --stat 592f9dd06d1013e4b5300311e6c1b7033c17ab9b
commit 592f9dd06d1013e4b5300311e6c1b7033c17ab9b
Author: Martin Øinæs Myrseth <myrseth@gmail.com>
Date: Thu Dec 21 21:25:24 2017 +0100
Initial commit
Note that the usual diff in the output of git show is missing. There was no
content in this commit.
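Here is a minimal sketch of starting a repository with an empty root commit. It assumes a throwaway temp directory, a made-up identity and a SHA-1 repository (the default). It checks that the root commit records git's well-known empty tree and has no parents:

```shell
#!/usr/bin/env bash
# Sketch: start a new repository with an empty root commit.
# Runs in a throwaway temp directory; the identity below is made up.
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "Initial commit"

# The root commit records git's well-known empty tree (SHA-1 repositories)...
git rev-parse 'HEAD^{tree}'   # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
# ...and has no parents, so it is the only commit reachable from HEAD.
git rev-list --count HEAD     # 1
```

With that empty root in place, you can later rewrite the first commit that carries real content using a plain git rebase --interactive against the root commit, with no --root needed.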
2. Triggering a build
Depending on your continuous integration/deployment setup, chances are that
automated work is scheduled based on updates to some git repository. Alas,
flaky network and compute resources, tests and dependency management sometimes
get in the way of a flawless pipeline execution.
Creating an empty git commit and pushing it to a PR might be all you need to
scratch that "let's just try it once more, without changing anything" itch.
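A sketch of that re-trigger follows. It assumes a local bare repository (remote.git) stands in for the remote your CI system watches, and the identity is made up. The second push carries a commit whose tree is identical to its parent's:

```shell
#!/usr/bin/env bash
# Sketch: re-trigger CI by pushing a commit that changes nothing.
# Assumption: a local bare repository stands in for the remote your
# CI system watches; the identity is made up.
set -euo pipefail

tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"
git init -q "$tmp/work"
cd "$tmp/work"
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$tmp/remote.git"

git commit -q --allow-empty -m "Initial commit"
git push -q origin HEAD

# The actual trick: an empty commit on the branch, then a normal push.
git commit -q --allow-empty -m "Re-trigger CI"
git push -q origin HEAD
```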
Pushing locally
It's hard to use git for long without encountering git push, the git
sub-command for transferring (copying) stuff from your local repository to some
other, remote git repository. The form of the command I'm sure most people are
familiar with is:
git push <remote>
Or for the more adventurous:
git push <remote> <refspec>
Now, what if I told you it's possible to push a remote reference to your local
repository, effectively reversing the direction of the push operation?
git push . origin/main:main
The dot (.) in the command above is what tells git that the destination
repository is the current directory. It's equivalent to ./ or the absolute path
of the repository (i.e. $(pwd)).
Why on earth would you ever want to do this? Haven't you heard of git pull?
Well, first off, git pull is a combination of two git operations: git fetch
followed by either a git merge or a git rebase, depending on user
configuration.
In fact, a git fetch does the opposite of a typical git push, which is to
update the remote-tracking branches locally, including retrieving all objects
required to complete the history of the newfound remote commits. Then, in the
git pull case, git merge or git rebase is invoked to update the currently
checked out ref to whichever commit is pointed to by the remote-tracking branch
defined as the "upstream" of the current local branch (pheeew).
The main issue with git pull in this scenario is the second step, as both
git merge and git rebase require the ref about to be updated to also be checked
out. This is because any potential conflicts must be resolved "on the branch".
In fact, in order to properly "push to the local repository" it's necessary to
invoke git fetch first to ensure that the remote-tracking branches are up to
date.
Remote-tracking branches are git references (refs), usually stored in
.git/refs/remotes/<remote-name>/<branch-name>, which keep track of the branch
state of a remote repository. For the most part the remote-name and branch
combinations are unambiguous, allowing references to remote-tracking branches
to be abbreviated like origin/main. These references are primarily managed by
the git fetch command. It's important to keep in mind that git does not hit the
network to check for updates when merely referring to refs like origin/main.
Let's say that you have been working for a long-ish period of time on a feature
branch, leaving your local copy of the project's main branch somewhat out of
date. For whatever reason (diffing, cherry-picking, etc.) you want to update
main with the latest upstream changes, but you don't want to navigate away from
your feature branch. One reason might be to avoid invalidating much of a
slow-running incremental build.
Instead of using e.g. the worktree functionality to create and check out a
whole new working tree, we can use "push to local" to update whichever passive
head we need.
git applies the same branch protection rules as when pushing to remote
repositories. That is to say, it denies non-fast-forward pushes by default, but
allows overrides through +<commit>:<ref> refspecs, or the --force and
--force-with-lease command line options.
Keep in mind that using these overrides is destructive and may lead to data
loss. Always create a backup branch with something like
git branch <some-temp-name> <branch-to-update>
if you're not 100% sure you're in control of what's going to get pushed.
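Here is a self-contained sketch of the "push to local" trick. Everything in it is made up for illustration: the repositories live in a temp directory, upstream.git stands in for the real remote, and the branch names main and feature and the identity are invented. It creates a clone sitting on a feature branch and advances upstream behind its back. It then fast-forwards the local main without ever checking it out:

```shell
#!/usr/bin/env bash
# Sketch: fast-forward a non-checked-out local `main` from origin/main
# without leaving the feature branch. All repos live in a temp dir; the
# names (upstream.git, seed, mine, other, feature) are made up.
set -euo pipefail

tmp=$(mktemp -d)
ident=(-c user.name="Example User" -c user.email="user@example.com")

# A bare "upstream" whose default branch is main, seeded with one commit.
git init -q --bare "$tmp/upstream.git"
git --git-dir="$tmp/upstream.git" symbolic-ref HEAD refs/heads/main
git init -q "$tmp/seed"
git -C "$tmp/seed" symbolic-ref HEAD refs/heads/main
git -C "$tmp/seed" "${ident[@]}" commit -q --allow-empty -m "Initial commit"
git -C "$tmp/seed" push -q "$tmp/upstream.git" main

# Our clone: switch to a feature branch, leaving local main behind.
git clone -q "$tmp/upstream.git" "$tmp/mine"
git -C "$tmp/mine" checkout -q -b feature

# Meanwhile, someone else advances upstream main.
git clone -q "$tmp/upstream.git" "$tmp/other"
git -C "$tmp/other" "${ident[@]}" commit -q --allow-empty -m "Upstream change"
git -C "$tmp/other" push -q origin main

cd "$tmp/mine"
git fetch -q origin               # refresh origin/main (refs alone never hit the network)
git push -q . origin/main:main    # fast-forward local main; HEAD stays on feature
```

A non-fast-forward update here would be rejected just like a push to a real remote. That is the protection mentioned above.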
Commit ranking
Perhaps in need of something to serve as the tie-breaker in the year-end bonus
rounds? What better way to settle the implicit conflict between your peers of
"who's providing the most value" than a "git commit count" showdown?
No, that's a terrible idea…
Yes, yes it is. Any sensible developer knows that nothing good ever comes out of
placing merit in lines of code changed or number of commits made.
But should you, God forbid, ever need to know (for the sake of curiosity) who's
been committing the most to a repository, here's git rank:
git shortlog -s -n --no-merges
Configure it as an alias in ~/.config/git/config
with:
[alias]
rank = "shortlog -s -n --no-merges"
and simply run:
git rank
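Here is a quick way to try the alias without touching your global configuration. The sketch below builds a throwaway repository with two made-up authors and defines rank as a repository-local alias. Running git config --global alias.rank "shortlog -s -n --no-merges" would put it in your user config instead. Note the explicit HEAD: when no revision is given and standard input is not a terminal, git shortlog summarizes a log read from standard input instead, which bites in scripts and CI:

```shell
#!/usr/bin/env bash
# Sketch: the `rank` alias on a throwaway repo with two made-up authors.
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git config user.name "Alice"; git config user.email "alice@example.com"
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"
git config user.name "Bob"; git config user.email "bob@example.com"
git commit -q --allow-empty -m "three"

# Repo-local alias here; the post puts it in ~/.config/git/config instead.
git config alias.rank "shortlog -s -n --no-merges"

# Pass HEAD explicitly: without a revision, shortlog reads a log from
# stdin whenever stdin is not a terminal (e.g. in scripts or CI).
git rank HEAD
```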
As a quick example, behold the horrendous output from my own dotfiles
repository, where I've managed to make commits under different names and
identities:
❯ git shortlog -nse
567 Martin Øinæs Myrseth <myrseth@gmail.com>
322 Martin Øinæs Myrseth <mmyrseth@cisco.com>
142 Martin Myrseth <mm@myme.no>
14 Martin Myrseth <myrseth@gmail.com>
4 Martin Øinæs Myrseth <mm@myme.no>
3 Martin Myrseth <myme@map>
2 Martin Øinæs Myrseth <myme@Tuple.localdomain>
2 Martin Øinæs Myrseth <myme@map.localdomain>
It's painful to read, I know. Try to imagine the pain and embarrassment of
sharing it. And should you have been unfortunate or careless enough to get
yourself into a similar situation, please read on. I'll revisit this problem in
the filter branch section below.
Cat file
This is more of a git party trick than something you'll actually get much use
out of. I should say, though, that I've used git cat-file -p on a couple of
occasions to help people visualize and really grok git's data model.
As the name hints, the cat-file command outputs information about git objects.
I've personally only used cat-file with the -p (pretty-print) flag, which first
determines the type of the object before printing it out. Let's start off by
inspecting the HEAD commit:
$ git cat-file -p HEAD
tree 9491ada70010d722646b674d2e2a26521628df94
parent 9d7e5a6490c9f560f54fee9e1af5d72429bb26c7
author Martin Myrseth <mm@myme.no> 1665439490 +0200
committer Martin Myrseth <mm@myme.no> 1665439490 +0200
Delete Docker deploy action
We see the main metadata that git associates with a commit: the repository file
structure (tree), a parent commit SHA1 reference, author and committer
information, and finally the commit message after a blank line. Let's dig
further by passing the SHA1 (9491ad..94) of the tree associated with the HEAD
commit:
$ git cat-file -p 9491ada70010d722646b674d2e2a26521628df94
040000 tree 6d71faa5d70085c5d07228d8fa522fb712253b6d .github
100644 blob e09fe0dc282fdcaff06bcc6a9bbf57cbfc845eb4 .gitignore
100644 blob da7e7945524871726071f919690c9c9f6c1e173d README.md
100644 blob e6be557357c3fe2e3cce6f1b7b9b3c9c55981a16 default.nix
040000 tree 4f69a79c432cde80b4a1c486974b03cab84b45b9 docker
100644 blob 2f8aacd9efa3cfdf9e5f2860fa7226b510ed83bc feed-cors.conf
100644 blob 14b9e2dd0a41aa932c1f4bb5938519547f37f82c flake.lock
100644 blob eeae336837db94ca62255a7e5fa7f32ae3363716 flake.nix
100644 blob f1f8ef836b3b9b9ea011a43972a28ffaa713c868 image.nix
040000 tree 5cad033d973f19ece938c33c3bb912eb63dc3305 website
040000 tree 49dc35d8e519f02f6f1a647f437226af198d225a ssg
100644 blob 60dede4bba8cd9479b0bec49048da1397e14f352 todo.org
The result of printing a tree is what looks like a directory listing of the
contents of that "tree" directory. Each entry is represented as some mode bits,
an object type, the SHA1 of the object and the name of the entry.
Trees may contain other tree objects to create a directory structure, or blob
objects which contain file contents.
Finally, let's inspect one of the blobs in the output, like .gitignore
(e09fe0..b4):
$ git cat-file -p e09fe0dc282fdcaff06bcc6a9bbf57cbfc845eb4
.stack-work
_cache
public
dist-newstyle
.ghc.environment.*
# nix-build
result
This prints out the actual content of .gitignore the way it was committed into
the current HEAD commit.
Wait? What? So everything is just text files?
Conceptually, yes. However, modern git does a lot more to optimize storage
(re)use, such as compressing objects and delta-packing them into packfiles, to
ensure that a repository stays as small as possible. There are other, scarier
objects lurking under .git/objects in a git repository.
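To reproduce the walk from commit to tree to blob without my repository at hand, here's a small sketch in a throwaway repository (file name and identity are made up). It also recomputes a blob id by hand to show the content addressing behind "conceptually, yes". The id is the SHA-1 of a "blob <size>" header, a NUL byte, and the raw content. This assumes a SHA-1 repository (the default) and coreutils' sha1sum:

```shell
#!/usr/bin/env bash
# Sketch: walk commit -> tree -> blob with cat-file in a throwaway repo,
# then recompute the blob id by hand. Assumes a SHA-1 repository (the
# default) and coreutils' sha1sum.
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
printf 'hello\n' > hello.txt
git add hello.txt
git commit -q -m "Add hello"

git cat-file -t HEAD            # commit
git cat-file -t 'HEAD^{tree}'   # tree
git cat-file -t HEAD:hello.txt  # blob
git cat-file -p HEAD:hello.txt  # hello

# A blob id is the SHA-1 of "blob <size>\0" followed by the raw content.
blob_id=$(git rev-parse HEAD:hello.txt)
manual_id=$({ printf 'blob %d\0' "$(wc -c < hello.txt)"; cat hello.txt; } | sha1sum | cut -d' ' -f1)
echo "$blob_id $manual_id"
```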
The git parable
As I said at the beginning of this section, I've used cat-file to help myself
and others understand the git object model. Learning all the details of that
model isn't the purpose of this section (or post), though. However, if reading
this sparked some curiosity on your part, I can warmly recommend the talk "The
Git Parable", which dives deeper into the git object model, as presented by my
good friend Johan Herland.
Use case
Now, why would you want to use cat-file? (Except you wouldn't, but let's just
play along here.)
I was deep into some refactoring and clean-up of a set of template files used
for various messages sent out from a system. Each template directory contained
the set of files for one message template. I had been working with the files
for a while when a feeling grew on me that several of these templates looked
quite similar, identical in fact.
At this point I had already made some work-in-progress commits, which would
definitely get in the way of any attempt at checking for identical template
directories in my working copy. I wanted to compare the contents of the
template directories as they were before I started making my changes.
The primary tool for checking differences between files in git is obviously the
git diff command. It can easily show the differences between files stored in
the git history. Typical usage of diff is to compare a single path across
different versions. However, looking closer at its synopsis we can see that
there are a couple of call signatures which might do roughly what we need:
NAME
git-diff - Show changes between commits, commit and working tree, etc
SYNOPSIS
git diff [<options>] [<commit>] [--] [<path>...]
git diff [<options>] --cached [--merge-base] [<commit>] [--] [<path>...]
git diff [<options>] [--merge-base] <commit> [<commit>...] <commit> [--] [<path>...]
git diff [<options>] <commit>...<commit> [--] [<path>...]
git diff [<options>] <blob> <blob>
git diff [<options>] --no-index [--] <path> <path>
Primarily git diff <blob> <blob>, which would let us compare any two git blob
objects. There's also a note under "DESCRIPTION" which states:
Just in case you are doing something exotic, it should be noted that all of the
<commit> in the above description, except in the --merge-base case and in the
last two forms that use .. notations, can be any <tree>.
This means that the git diff <commit> <commit> form should also accept two
trees, which is along the lines of what we want. And indeed, doing something
similar to the following yielded an empty diff (where HEAD~3 is the commit I
based my work on):
❯ git diff HEAD~3:some/templates/path/ HEAD~3:some/templates/other-path/
The manual page for git-diff states that it takes two blobs, but it's just as
valid with any tree-like object, often referred to as tree-ish in the git
documentation.
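Here is a self-contained sketch of the same idea, using made-up template directories. The <rev>:<path> syntax names a tree, and git diff --quiet turns "any differences?" into an exit status. It also shows that identical trees share the same object id, which is what makes grouping by hash possible:

```shell
#!/usr/bin/env bash
# Sketch: diff two directories (trees) at one revision. Paths are made up.
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
mkdir -p templates/welcome templates/greeting templates/goodbye
for t in welcome greeting; do echo "Hello, {{name}}!" > "templates/$t/body.txt"; done
echo "Bye, {{name}}!" > templates/goodbye/body.txt
git add templates
git commit -q -m "Add templates"

# <rev>:<path> names a tree; --quiet reports "no differences" as exit status 0.
if git diff --quiet HEAD:templates/welcome HEAD:templates/greeting; then
  echo "welcome and greeting are identical"
fi

# Identical trees have identical object ids.
git rev-parse HEAD:templates/welcome HEAD:templates/greeting
```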
So I had found one pair of templates that were identical and could be coalesced
into one. But what if there were more? Using git diff alone I would have had to
compare all permutations of template directories to see if the results were
empty.
No time for that…
Instead we can use cat-file to simply dump the hashes of every sub-<tree>. Then
we can use a familiar shell pipeline to group the hashes and count them:
❯ git cat-file -p HEAD~3:some/templates/ \
    | awk '{ print $3; }' \
    | sort \
    | uniq -c \
    | sort -rn
2 af83bb357f2b8dc42f6c9f07620140590dc7fd44
2 228182da5a0ffcf4c0d263bfa54852176f0250d2
1 ef1a471185c2092e6708349fa710702dd416f892
1 e453cb9d3dddbdd46a65c811068352ac40941fcd
1 e3df1181dae478172a7ae6bbc1618a3af2151db4
1 de0f6cd53ea97cb100a74c812f75c0d4844c0efa
1 d7f239da6283c927dad650599d49639ddc761465
1 d7d8f5aa3571ea2392028e353ad958d778d2bee0
1 cc03005d684b5735da337a6e5ca9765751943d7d
... # A bunch extra
Et voilà! We clearly see that there is not just one pair of duplicate
templates, but two!
I should note that this approach is brittle: should there be any difference in
the blobs at all, this method falls apart. In my case it worked perfectly, but
your mileage may vary. In my experience there are often several ways to do the
"same thing" using git, so it would be nice to hear of other approaches.
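One alternative uses the same kind of made-up layout. git ls-tree -d lists only the subtrees of a tree (no loose files), and a bit of awk prints the directory names that share an object id. This shows which templates are duplicates rather than just how many:

```shell
#!/usr/bin/env bash
# Sketch: group subtrees of templates/ by object id and print duplicates
# with their names. The directory layout is made up.
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
mkdir -p templates/{a,b,c,d,e}
for t in a b; do echo "same one" > "templates/$t/body.txt"; done
for t in c d; do echo "same two" > "templates/$t/body.txt"; done
echo "unique" > templates/e/body.txt
git add templates
git commit -q -m "Add templates"

# ls-tree prints "<mode> <type> <oid>\t<name>"; -d keeps only tree entries.
dupes=$(git ls-tree -d HEAD:templates/ \
  | awk -F'\t' '
      { split($1, meta, " "); oid = meta[3]; names[oid] = names[oid] " " $2; count[oid]++ }
      END { for (oid in count) if (count[oid] > 1) print oid ":" names[oid] }' \
  | sort)
printf '%s\n' "$dupes"
```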
Orphan commits
Every commit in a git repository has a reference to its parent, which is the
commit that came immediately before it chronologically. Merge commits have more
than one parent.
Well, that's not 100% accurate. As discussed in the empty commit section, the
initial commit of a branch is somewhat special: it has no parents. A commit
without any reference to a parent is called an "orphan" commit (git's own
documentation also calls it a root commit). In most repos there would only be
one such commit, the initial commit.
However, git is by no means limited to a single orphan commit. The default
behavior when creating a new branch is that the new branch is based on some
start-point. Using git checkout --orphan (or git switch --orphan, experimental
at the time of writing) it's possible to start a completely new and independent
line of history, entirely disconnected from the rest of the repository.
The main use case I've had for git's support of this functionality is not to
start "orphaned" histories, but rather to absorb the history of a branch from
another, unrelated repository. It's very useful when coalescing many smaller
repositories into a monorepo or when vendoring some library.
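Here is a minimal sketch of starting such an independent line of history in a throwaway repository. The branch name experiment and the identity are made up. git checkout --orphan leaves the previous branch's files staged, so the sketch clears the index before the first commit. git rev-list --all --max-parents=0 then lists every root commit:

```shell
#!/usr/bin/env bash
# Sketch: start an independent line of history with `git checkout --orphan`
# and count root commits. Branch name "experiment" is hypothetical.
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "main content" > main.txt
git add main.txt
git commit -q -m "First root"

# The orphan branch starts unborn, but the index still holds main.txt,
# so clear it to begin from an empty tree.
git checkout -q --orphan experiment
git rm -q -rf .
echo "experiment" > experiment.txt
git add experiment.txt
git commit -q -m "Second root"

# Every commit without parents, across all refs.
git rev-list --all --max-parents=0 | wc -l    # 2
```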
Merging histories
As a synthetic case study, let's import the doomemacs history into my dotfiles
repo!
First let's create a new worktree so that we don't mess up my actual files:
❯ git worktree add ~/code/doomfiles doomfiles
Preparing worktree (checking out 'doomfiles')
HEAD is now at a0b32f8 machine: deque: Setup nginx with rtcp.myme.no
❯ cd ~/code/doomfiles
Doing a git log of the most recent commits, we can see that they're all mine:
❯ git log --oneline --graph -5
* 0f1f6cd machine: map: Allow podman
* 46099b9 emacs: Add React fn-component snippet
* 2e75458 ssh: Replace hosts
* 445ade4 machine: deque: Set SSH port
* bf0a552 flake: Add utils as "apps"
Another "little known" feature of git is that it's trivial to fetch "a random"
upstream repository without adding an explicit git remote. This can be quite
useful when e.g. checking out some incoming one-off contribution. Just pass the
remote URL to fetch directly:
❯ git fetch git@github.com:doomemacs/doomemacs
remote: Enumerating objects: 118606, done.
remote: Counting objects: 100% (20/20), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 118606 (delta 4), reused 15 (delta 3), pack-reused 118586
Receiving objects: 100% (118606/118606), 26.98 MiB | 6.80 MiB/s, done.
Resolving deltas: 100% (82950/82950), done.
From github.com:doomemacs/doomemacs
* branch HEAD -> FETCH_HEAD
As the output states, the result of the fetch is placed in the special git ref
FETCH_HEAD. We can use this ref to refer to the fetched doomemacs commit when we
want to merge the histories.
Now, git won't let us merge without warning us first:
❯ git merge FETCH_HEAD
fatal: refusing to merge unrelated histories
Fair enough. We can add --allow-unrelated-histories to tell git we're being
quite serious right here:
❯ git merge --allow-unrelated-histories FETCH_HEAD
Auto-merging .gitignore
CONFLICT (add/add): Merge conflict in .gitignore
Auto-merging README.md
CONFLICT (add/add): Merge conflict in README.md
Recorded preimage for '.gitignore'
Recorded preimage for 'README.md'
Automatic merge failed; fix conflicts and then commit the result.
Pfffft, conflicts… Let's get on with our lives by simply resetting the
conflicting files to the imported versions #yolo.
❯ git checkout --theirs -- .gitignore README.md
❯ git add .gitignore README.md
❯ git commit -m 'Pulling in Doom Emacs!'
Recorded resolution for '.gitignore'.
Recorded resolution for 'README.md'.
[doomfiles 11826ae12] Pulling in Doom Emacs!
And that's about it! Let's inspect the result:
❯ git show
commit 11826ae125834cc4e2263172275d8c51bca11d63 (HEAD -> doomfiles)
Merge: a0b32f85f e96624926
Author: Martin Myrseth <mm@myme.no>
Date: Thu Jan 19 01:13:17 2023 +0100
Pulling in Doom Emacs!
We can see that the commit is a merge commit, where one parent is a0b32f85f from
my dotfiles while the other parent, e96624926, is the current HEAD of the
doomemacs repo.
We have successfully merged the histories of my dotfiles repository and Doom
Emacs!
As stated previously, this can be quite useful when pulling in e.g. an
experimental repository, vendoring some dependency, or similarly constructing a
monorepo from separate smaller repositories.
The next section is about one (of several) times I've found this handy myself.
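The same absorption can be tried locally without GitHub. The sketch below creates two unrelated throwaway repositories (dotfiles and vendored are made-up stand-ins). It fetches one into the other by path so the tip lands in FETCH_HEAD, then merges with --allow-unrelated-histories. The files don't overlap, so unlike the Doom Emacs case there are no add/add conflicts:

```shell
#!/usr/bin/env bash
# Sketch: absorb an unrelated repository's history via a path fetch and
# FETCH_HEAD. "dotfiles" and "vendored" are made-up stand-ins.
set -euo pipefail

tmp=$(mktemp -d)
for name in dotfiles vendored; do
  git init -q "$tmp/$name"
  git -C "$tmp/$name" config user.name "Example User"
  git -C "$tmp/$name" config user.email "user@example.com"
  echo "$name" > "$tmp/$name/$name.txt"
  git -C "$tmp/$name" add "$name.txt"
  git -C "$tmp/$name" commit -q -m "Initial $name commit"
done

cd "$tmp/dotfiles"
# Fetch by path (or URL) without adding a remote; the tip lands in FETCH_HEAD.
git fetch -q "$tmp/vendored"

# A plain merge is refused; the flag opts in to joining unrelated roots.
git merge -m "Absorb vendored" FETCH_HEAD 2>/dev/null || echo "refused as expected"
git merge -q --allow-unrelated-histories -m "Absorb vendored" FETCH_HEAD
```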
Dotfiles bankruptcy
I agree that the previous example of absorbing Doom Emacs into my dotfiles is
kind of silly, but it illustrates the possibilities.
Stepping away from synthetic examples, I'd also like to present one of a few
occasions where I've used it to solve a real-world problem.
Let's step back into my dotfiles.
With our new knowledge about orphan commits we might wonder if there's a way to
easily query for them in a git repository. And there sure is:
❯ git log --all --max-parents=0
commit 6fa853118711f557a911b98f00d5c4a2eb3ded71
Author: Martin Myrseth <mm@myme.no>
Date: Mon Jan 17 21:44:43 2022 +0000
nixos: Initial commit
commit 61a3f80babec8c1339391462590dafe7ff30fe7f
Author: Martin Myrseth <mm@myme.no>
Date: Wed Feb 10 11:59:23 2016 +0100
Inital import of tuple
There is not one, but two commits in the dotfiles repository which don't have
any parents:
- The actual "Initial import", created at the beginning of time.
- The much more recent "nixos: Initial commit".
The second commit was the beginning of my attempt to move my machine
configurations towards a fully NixOS-managed declarative setup built on top of
flakes. I've already covered this journey in another post, which also links to
the state of my configuration management before that migration.
In any case, when starting my configuration rewrite I wasn't yet sure whether I
wanted a clean slate or would eventually port it into my dotfiles. In the end I
figured I could have both, by simply pulling the experiment into my already
existing history.
Eventually my experiment had matured to the point where I was convinced I had
what I wanted. It was time to import it into the dotfiles repository.
Following pretty much the same steps as in the previous section, I ended up with
the following merge commit:
❯ git show 79977b007099390a53e11f540e178f6285137206
commit 79977b007099390a53e11f540e178f6285137206
Merge: ad28da4 841eec3
Author: Martin Myrseth <mm@myme.no>
Date: Wed Feb 2 00:19:24 2022 +0100
nixos: Declare dotfile bankruptcy
I remember reading an email thread on the git mailing list in the early days of
git where Linus Torvalds boasted about performing this "absorption" operation
in order to pull in some unrelated history.
Equally interesting, I remember reading an analysis which touched on how many
orphan commits there are in the main Linux tree. I remember there being four,
one of which looked like a "careless" accidental mistake.
Edit 2023-01-23:
Initially I didn't spend enough effort searching for these two sources, but
thanks to help from readers and some additional search-fu vigilance I was able
to find what I was referring to:
- From lore.kernel.org: Behold, The coolest merge EVER!
- Thanks to a reader I was directed towards exactly the post I was after, which
researched "weird" Linux commits (from Destroy All Software): "The Biggest and
Weirdest Commits in Linux Kernel Git History" goes through both octopus merges
and orphaned commits in the Linux history.
Filter branch
The git filter-branch command has WARNING written all over it. Please proceed
with caution. This section illustrates the use of filter-branch to fix a
particular problem. As the section goes on to explain, there are better, less
destructive ways to solve these problems.
I tend to work in a number of git repositories across various machines. I also
split work between my personal projects and anything related to $DAYJOB. I do
not want to taint the git history in work repositories with personal email
addresses and other "unprofessionalism".
Turns out I do, though. Remember the painful output from commit ranking?
Oh, the embarrassment! It's unbearable!
This example is from my dotfiles history before I cleaned it up. I occasionally
set up new machines, and my dotfiles repo is the very first thing I pull in
after the machine boots. Every host usually needs some sort of tweaking, and
without realizing I haven't set up my git configuration correctly, I start
patching and committing configurations for the new host.
Next thing I know, I've completely missed the fact that I've been committing
with all kinds of ad-hoc user settings inferred by git, without it letting me
know.
I've been aware of this potential issue for a long time, and have proactively
tried to mitigate it using various strategies on several occasions in the past.
Sometimes bad commits manage to slip through, though. With a stricter focus on
a holistic nix flakes host setup, I hope I'm rid of this issue of partial
(mis)configuration once and for all.
The filter-branch cleanup
Most people familiar with rewriting git history know about git rebase and
git rebase --interactive, which allow operations like "moving" (or replaying)
commits onto new parents, rewriting commit messages, reassigning author
information, as well as making changes to the source tree.
Perhaps less familiar to people is the git filter-branch command, which is more
or less the hydrogen bomb of history rewriting. I urge you to heed the obvious
warning that meets you in man git-filter-branch(1) and perhaps consider
alternative solutions like git-filter-repo:
WARNING
git filter-branch has a plethora of pitfalls that can produce non-obvious
manglings of the intended history rewrite (and can leave you with little
time to investigate such problems since it has such abysmal performance).
These safety and performance issues cannot be backward compatibly fixed
and as such, its use is not recommended. Please use an alternative
history filtering tool such as git filter-repo. If you still need to use
git filter-branch, please carefully read the section called "SAFETY" (and
the section called "PERFORMANCE") to learn about the land mines of
filter-branch, and then vigilantly avoid as many of the hazards listed
there as reasonably possible.
Warning aside, a few factors led me to believe this was what I wanted in this
particular situation:
- All the faulty commits were fairly recent, so I wouldn't touch very old
history.
- I had experience running filter-branch from way back and felt confident
choosing it again.
- The manpage has the exact use case as an example.
With a few modifications from this Stack Overflow answer:
❯ git filter-branch --env-filter '
WRONG_EMAIL="martin@machine.localdomain"
NEW_NAME="Martin Myrseth"
NEW_EMAIL="martin@example.com"
if [ "$GIT_COMMITTER_EMAIL" = "$WRONG_EMAIL" ]
then
export GIT_COMMITTER_NAME="$NEW_NAME"
export GIT_COMMITTER_EMAIL="$NEW_EMAIL"
fi
if [ "$GIT_AUTHOR_EMAIL" = "$WRONG_EMAIL" ]
then
export GIT_AUTHOR_NAME="$NEW_NAME"
export GIT_AUTHOR_EMAIL="$NEW_EMAIL"
fi
' --tag-name-filter cat -- --branches --tags
I don't have the output of this command at hand. It's been a while since I ran
it, and I don't intend to do it again any time soon. All I can say is that it
worked out well for me.
I don't think there's much reason to invest a whole lot of effort into
understanding all the ins and outs of filter-branch. There are most likely
always better options for the problems it can solve, so try your best to avoid
it.
Below are some other workarounds for avoiding commits made with a broken user
configuration, or at least for hiding such faults in command output.
Git templates and pre-commit hooks
Before all the other mitigations outlined in the sections below, I used to have
a .gittemplates folder containing a few git hooks that would be added to every
newly created repository. One of these was a pre-commit hook which checked that
I had a properly configured user.name and user.email.
#!/usr/bin/env bash
# Refuse to commit unless both user.name and user.email are configured.
if ! (git config user.name &> /dev/null && git config user.email &> /dev/null); then
  echo "Please setup your repository with a user.name and user.email" >&2
  exit 1
fi
If I ever forgot to properly set up user.email in particular for a given
repository, git wouldn't let me commit, and would annoy me with a warning
instead.
Since I rarely change my name (I haven’t yet), I would hardcode user.name
into
my user-global git configuration.
Due to the chicken-and-egg problem, these hooks weren't created for my dotfiles
repo on new hosts, because the hooks themselves live in the dotfiles repo. It's
been a while since I abandoned this approach altogether, as the solution in the
next section makes it obsolete.
Keep in mind this was added a while ago, and before I’d learned about the
superior means of working around this problem which I’ll get to below. This
solution is most likely not what you want.
No second guessing please!
One of the git defaults I'm not very fond of is the user.useConfigOnly
configuration, which is false by default. Here's its excerpt from
man git-config(1):
user.useConfigOnly
Instruct Git to avoid trying to guess defaults for user.email and user.name,
and instead retrieve the values only from the configuration. For example, if
you have multiple email addresses and would like to use a different one for
each repository, then with this configuration option set to true in the
global config along with a name, Git will prompt you to set up an email
before making new commits in a newly cloned repository. Defaults to false.
I guess the documentation outlines my "default" use case, which is to use
different email addresses for the repositories I work in. With the following
configuration git will refuse to commit when the user configuration is
missing, making my pre-commit hook obsolete:
[user]
name = "Martin Myrseth"
useConfigOnly = true
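Here is a sketch of that behavior in isolation. It points HOME at a temp directory, disables the system config and unsets identity environment variables, so the result doesn't depend on the machine. The name mirrors the config above and the email is made up:

```shell
#!/usr/bin/env bash
# Sketch: user.useConfigOnly refusing to guess an email.
# Global/system config and identity env vars are isolated so the result
# does not depend on the machine running it.
set -euo pipefail

sandbox=$(mktemp -d)
export HOME="$sandbox" XDG_CONFIG_HOME="$sandbox/.config" GIT_CONFIG_NOSYSTEM=1
unset GIT_CONFIG_GLOBAL EMAIL GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

# Global config as above: a name, but deliberately no email.
git config --global user.name "Martin Myrseth"
git config --global user.useConfigOnly true

git init -q "$sandbox/repo"
cd "$sandbox/repo"

if git commit -q --allow-empty -m "Should fail" 2>/dev/null; then
  first_attempt=succeeded
else
  first_attempt=refused
fi
echo "first attempt: $first_attempt"   # refused

# Configure an email for this repository only, and try again.
git config user.email "user@example.com"
git commit -q --allow-empty -m "Now it works"
```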
Git conditional configuration
It's hard to argue against the fact that the best way to solve any problem is
to not have the problem in the first place. Using some "clever" conditional
configuration sections, it's possible to include additional configuration for
e.g. repositories within specific sub-directories on the filesystem, ensuring
that there is never a partial user configuration.
Once I became aware of this configuration trick I took more care as to where I
placed repositories on disk, making sure to have separate directories for
personal and work-related repos. With this directory layout, it's possible to
have a conditional section in gitconfig which applies additional configuration
to any repository matching the predicate (i.e. placement on disk):
[includeIf "gitdir:~/code/work/"]
path = "./work_config"
Any repository under ~/code/work
will include the configuration from
./work_config
, which may contain something like the following:
[commit]
gpgSign = true
[tag]
forceAnnotated = true
gpgSign = true
[user]
email = "martin@day.job"
signingKey = "martin@day.job"
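To check such a setup without touching your real configuration, here's a sketch that points HOME at a temp directory and writes the two files into ~/.config/git. The directory layout and the work address are illustrative. Relative include paths are resolved against the directory of the file containing the include:

```shell
#!/usr/bin/env bash
# Sketch: includeIf "gitdir:" giving work repositories a work email.
# HOME is a temp dir so your real config is untouched; the layout and
# addresses are illustrative.
set -euo pipefail

HOME=$(mktemp -d); export HOME
export XDG_CONFIG_HOME="$HOME/.config" GIT_CONFIG_NOSYSTEM=1
unset GIT_CONFIG_GLOBAL
mkdir -p "$XDG_CONFIG_HOME/git"

cat > "$XDG_CONFIG_HOME/git/config" <<'EOF'
[user]
    name = "Martin Myrseth"
    useConfigOnly = true
[includeIf "gitdir:~/code/work/"]
    path = "./work_config"
EOF

# "./work_config" resolves relative to the including file's directory.
cat > "$XDG_CONFIG_HOME/git/work_config" <<'EOF'
[user]
    email = "martin@day.job"
EOF

git init -q "$HOME/code/work/project"
git init -q "$HOME/code/personal/project"

git -C "$HOME/code/work/project" config user.email        # martin@day.job
git -C "$HOME/code/personal/project" config user.email || echo "(no email)"
```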
.mailmap
Although the filter-branch command allows a full cleanup of the history of a
git repository, the potential damage and inconvenience such an operation
inflicts on the repository's integrity shouldn't be understated. Rewriting
history has the viral effect of changing the SHA1 sums of all subsequent
commits, leading to parallel histories (old vs. new). This is most likely not
what you want for public histories.
On the other end of the spectrum, git provides a rather convenient and non-destructive feature to solve this particular issue through its mailmap support. Quoting man gitmailmap:
If the file .mailmap exists at the toplevel of the repository … it is used to map author and committer names and email addresses to canonical real names and email addresses.
The gitmailmap man page contains syntax examples of mailmap entries.
To correct a simple incorrect email, one can add an entry in the format:
<proper@email.xx> <commit@email.xx>
The .mailmap can also correct user.name issues, and an entry can match on a specific name and email combination so that only commits carrying that exact pair are remapped. Here’s the .mailmap file from my dotfiles which fixes up a few of my previous errors:
Martin Myrseth <mm@myme.no> <mm@myme.no>
Martin Myrseth <mm@myme.no> <myrseth@gmail.com>
Martin Myrseth <mm@myme.no> <mmyrseth@cisco.com>
Martin Myrseth <mm@myme.no> <myme@Tuple.localdomain>
Martin Myrseth <mm@myme.no> <myme@map.localdomain>
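To show that nothing is rewritten, here is a minimal sketch in a throwaway sandbox repository. It reuses one of the stray addresses from the .mailmap above. The commit message is made up. It relies on the %aE pretty-format placeholder (author email, respecting .mailmap) and on git check-mailmap.

```shell
#!/bin/sh
# Sketch: remapping a stray commit email with .mailmap, no history rewrite.
# Assumption: a throwaway sandbox repo; addresses taken from the .mailmap above.
set -eu
sandbox=$(cd "$(mktemp -d)" && pwd -P)
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1
unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

git init -q "$sandbox/repo"
cd "$sandbox/repo"

# An "old" commit recorded with the wrong email.
git -c user.name='Martin Myrseth' -c user.email='myme@map.localdomain' \
  commit -q --allow-empty -m 'Commit with a stray email'
commit_before=$(git rev-parse HEAD)

# Map the stray address onto the canonical one.
printf '%s\n' 'Martin Myrseth <mm@myme.no> <myme@map.localdomain>' > .mailmap

raw_email=$(git log -1 --format='%ae')     # as stored in the commit object
mapped_email=$(git log -1 --format='%aE')  # %aE respects .mailmap
checked=$(git check-mailmap 'Martin Myrseth <myme@map.localdomain>')
commit_after=$(git rev-parse HEAD)         # unchanged: nothing was rewritten
echo "$raw_email -> $mapped_email"
```

The commit object keeps the old address and its SHA1 stays the same. Only the mailmap-aware views present the canonical identity.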
Octopus merge
I have to admit, I never use this, but I remember being amazed the first time I learned about the many-parent merge capability of git a long time ago.
I would assume most people live their lives thinking a merge commit is simply the combined result of two somewhat related histories. Ideally two histories that forked off from each other in the (hopefully) not too distant past.
Yet, we’ve already seen and debunked the notion that histories have to be “somewhat related” in order to be merged. That’s what the “take in another repository” functionality covered in the orphan commits section was all about.
I guess it then comes as no surprise that the notion of merges only ever having two parents isn’t a hard limitation either.
Tentacles
Let’s see how we can create a many-parent merge commit, known as an “octopus merge”, by starting off a new repository and adding a bunch of branches to it:
❯ mkdir octopus
❯ cd octopus/
❯ git init
Initialized empty Git repository in /home/myme/tmp/octopus/.git/
❯ git commit --allow-empty -m 'Initial commit'
Author identity unknown
*** Please tell me who you are.
Run
git config --global user.email "you@example.com"
...
Ah … right. Forgot about that.
❯ git config user.email 'dave@tentacle.org'
❯ git commit --allow-empty -m 'Initial commit'
[main (root-commit) 9ff0a71] Initial commit
At this point we have a new git repository with a single main branch containing a single empty commit:
❯ git log --all --oneline --graph
* 9ff0a71 (HEAD -> main) Initial commit
Let’s create some branches with content:
❯ git checkout -b tentacle
Switched to a new branch 'tentacle'
❯ date > tentacle.txt
❯ git add tentacle.txt
❯ git commit -m 'Add day of tentacle.txt'
[tentacle 4dadc16] Add day of tentacle.txt
1 file changed, 1 insertion(+)
create mode 100644 tentacle.txt
Yay, one limb (aka branch) in place!
❯ git log --all --oneline --graph
* 4dadc16 (HEAD -> tentacle) Add day of tentacle.txt
* 9ff0a71 (main) Initial commit
But creating limbs is tedious. Let’s push the fast-forward button:
for count in nine eight seven six five four three two one; do
limb="${count}tacle"
git checkout -b "$limb" main
date > "${limb}.txt"
git add "${limb}.txt"
git commit -m "Add ${limb}"
done
Switched to a new branch 'ninetacle'
[ninetacle 3f7a95e] Add ninetacle
1 file changed, 1 insertion(+)
create mode 100644 ninetacle.txt
Switched to a new branch 'eighttacle'
[eighttacle e9cd39a] Add eighttacle
1 file changed, 1 insertion(+)
create mode 100644 eighttacle.txt
Switched to a new branch 'seventacle'
...
Switched to a new branch 'sixtacle'
...
Switched to a new branch 'fivetacle'
...
Switched to a new branch 'fourtacle'
...
Switched to a new branch 'threetacle'
...
Switched to a new branch 'twotacle'
...
Switched to a new branch 'onetacle'
[onetacle c78c58a] Add onetacle
1 file changed, 1 insertion(+)
create mode 100644 onetacle.txt
And we got ourselves a bunch of limbs!
❯ git log --all --oneline --graph
* e9cd39a (eighttacle) Add eighttacle
| * e310cbc (fivetacle) Add fivetacle
|/
| * 44ad755 (fourtacle) Add fourtacle
|/
| * 3f7a95e (ninetacle) Add ninetacle
|/
| * c78c58a (HEAD -> onetacle) Add onetacle
|/
| * 6be7cf4 (seventacle) Add seventacle
|/
| * a54e5c1 (sixtacle) Add sixtacle
|/
| * 3b1a5da (threetacle) Add threetacle
|/
| * bb79112 (twotacle) Add twotacle
|/
| * 4dadc16 (tentacle) Add day of tentacle.txt
|/
* 9ff0a71 (main) Initial commit
Time to assemble our squid:
❯ git merge tentacle ninetacle eighttacle seventacle sixtacle fivetacle fourtacle threetacle twotacle onetacle -m 'Assemble squid'
Fast-forwarding to: tentacle
Trying simple merge with ninetacle
Trying simple merge with eighttacle
Trying simple merge with seventacle
Trying simple merge with sixtacle
Trying simple merge with fivetacle
Trying simple merge with fourtacle
Trying simple merge with threetacle
Trying simple merge with twotacle
Trying simple merge with onetacle
Merge made by the 'octopus' strategy.
eighttacle.txt | 1 +
fivetacle.txt | 1 +
fourtacle.txt | 1 +
ninetacle.txt | 1 +
onetacle.txt | 1 +
seventacle.txt | 1 +
sixtacle.txt | 1 +
tentacle.txt | 1 +
threetacle.txt | 1 +
twotacle.txt | 1 +
10 files changed, 10 insertions(+)
create mode 100644 eighttacle.txt
create mode 100644 fivetacle.txt
create mode 100644 fourtacle.txt
create mode 100644 ninetacle.txt
create mode 100644 onetacle.txt
create mode 100644 seventacle.txt
create mode 100644 sixtacle.txt
create mode 100644 tentacle.txt
create mode 100644 threetacle.txt
create mode 100644 twotacle.txt
The end result is the most wonderful git graph ever!
We’ve managed to create a new commit in our repository with no less than ten parents. We can also confirm this using git show:
❯ git show
commit 442b9a2852fc2707517690f1a994c1c5a38ac20b (HEAD -> main)
Merge: 4dadc16 3f7a95e e9cd39a 6be7cf4 a54e5c1 e310cbc 44ad755 3b1a5da bb79112 c78c58a
Author: Martin Myrseth <dave@tentacle.org>
Date: Fri Jan 20 01:09:57 2023 +0100
Assemble squid
Note the Merge: line with all the parent SHA1 sums. Also notice how git show deviates from the more “vanilla” git cat-file -p output by relabeling the metadata, e.g. a single Merge: line instead of one parent line per parent.
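For comparison with the raw object format, here is a minimal sketch that rebuilds a smaller octopus in a throwaway sandbox and counts the parents two ways. It assumes three hypothetical branches instead of ten and a fixed identity supplied through environment variables. git symbolic-ref is used so the branch is called main regardless of init.defaultBranch.

```shell
#!/bin/sh
# Sketch: a small octopus merge in a sandbox, then counting its parents.
# Assumptions: three hypothetical branches instead of ten, and a fixed
# identity supplied through environment variables.
set -eu
sandbox=$(cd "$(mktemp -d)" && pwd -P)
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=Dave GIT_AUTHOR_EMAIL=dave@tentacle.org
export GIT_COMMITTER_NAME=Dave GIT_COMMITTER_EMAIL=dave@tentacle.org

git init -q "$sandbox/octopus"
cd "$sandbox/octopus"
git symbolic-ref HEAD refs/heads/main   # independent of init.defaultBranch
git commit -q --allow-empty -m 'Initial commit'

for limb in onetacle twotacle threetacle; do
  git checkout -q -b "$limb" main
  echo "$limb" > "$limb.txt"
  git add "$limb.txt"
  git commit -q -m "Add $limb"
done

git checkout -q main
git merge -q --no-edit -m 'Assemble squid' onetacle twotacle threetacle

# Raw object: tree, one "parent" line per parent, author, committer, message.
git cat-file -p HEAD
parents_raw=$(git cat-file -p HEAD | grep -c '^parent ')
# rev-list --parents prints the commit followed by all of its parents.
parents_revlist=$(git rev-list --parents -n 1 HEAD | awk '{ print NF - 1 }')
echo "parents: $parents_raw / $parents_revlist"
```

As in the session above, main is an ancestor of every limb, so it is fast-forwarded away and does not appear as a parent. Only the three limbs do.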
Use-cases
Honestly, in practice I haven’t found a single valid use-case for octopus merges which isn’t already covered by sequencing a series of merges, one after the other. Perhaps there are some integration use-cases out there which really let the octopus merge strategy shine. Let me know!
I should also note that the octopus merge strategy is quite conservative and bluntly refuses to merge anything which doesn’t trivially apply without conflicts. I imagine that juggling changes and their origins during a many-parent merge resolution would be quite the mess.
One thing I do like about the octopus merge, though, is that it shows quite visually how simple the git graph model really is. It has helped me build intuition about what goes on during a merge operation in git.
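To illustrate the "sequencing a series of merges" alternative, here is a minimal sketch that merges the same heads both ways in a throwaway sandbox. It assumes three hypothetical, conflict-free branches that touch disjoint files. Under that assumption the resulting trees should be identical, and only the shape of the graph differs.

```shell
#!/bin/sh
# Sketch: one octopus merge versus the same heads merged one at a time.
# Assumptions: three hypothetical, conflict-free branches touching
# disjoint files, in a throwaway sandbox repository.
set -eu
sandbox=$(cd "$(mktemp -d)" && pwd -P)
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=Dave GIT_AUTHOR_EMAIL=dave@tentacle.org
export GIT_COMMITTER_NAME=Dave GIT_COMMITTER_EMAIL=dave@tentacle.org

git init -q "$sandbox/squid"
cd "$sandbox/squid"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m 'Initial commit'
for limb in onetacle twotacle threetacle; do
  git checkout -q -b "$limb" main
  echo "$limb" > "$limb.txt"
  git add "$limb.txt"
  git commit -q -m "Add $limb"
done

# A single commit with three parents...
git checkout -q -b octopus main
git merge -q --no-edit -m 'Assemble squid' onetacle twotacle threetacle

# ...versus a fast-forward followed by two ordinary two-parent merges.
git checkout -q -b sequenced main
for limb in onetacle twotacle threetacle; do
  git merge -q --no-edit "$limb"
done

octopus_tree=$(git rev-parse 'octopus^{tree}')
sequenced_tree=$(git rev-parse 'sequenced^{tree}')
octopus_parents=$(git rev-list --parents -n 1 octopus | awk '{ print NF - 1 }')
sequenced_merges=$(git rev-list --merges --count main..sequenced)
echo "octopus tree: $octopus_tree, sequenced tree: $sequenced_tree"
```

With conflicting branches the picture changes, since the octopus strategy is the conservative one. This sketch deliberately avoids that case.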
The dishonest merge
While on the topic of merges, I’d like to quickly address what may be a common misconception: that merge commits are something special in git.
There may well be some special sauce involving merge bases and heuristics to determine the result of joining multiple histories. But once a commit with multiple parents has been made, there’s no requirement that the tree associated with the merge commit makes any kind of sense with regard to the merge operation its parent relationships reflect.
Let’s continue from where the octopus merge left off and see that we’ve got all
ten *tacles in place:
❯ ls -la
total 52
drwxr-xr-x 3 myme users 4096 Jan 20 01:09 .
drwxr-xr-x 7 myme users 4096 Jan 20 00:42 ..
-rw-r--r-- 1 myme users 32 Jan 20 01:09 eighttacle.txt
-rw-r--r-- 1 myme users 32 Jan 20 01:09 fivetacle.txt
-rw-r--r-- 1 myme users 32 Jan 20 01:09 fourtacle.txt
drwxr-xr-x 9 myme users 4096 Jan 20 01:09 .git
-rw-r--r-- 1 myme users 32 Jan 20 01:09 ninetacle.txt
-rw-r--r-- 1 myme users 32 Jan 20 01:09 onetacle.txt
-rw-r--r-- 1 myme users 32 Jan 20 01:09 seventacle.txt
-rw-r--r-- 1 myme users 32 Jan 20 01:09 sixtacle.txt
-rw-r--r-- 1 myme users 32 Jan 20 01:09 tentacle.txt
-rw-r--r-- 1 myme users 32 Jan 20 01:09 threetacle.txt
-rw-r--r-- 1 myme users 32 Jan 20 01:09 twotacle.txt
There’s nothing stopping us at this point from deleting everything introduced by merging all the tentacles and amending the HEAD commit:
❯ git rm *.txt
rm 'eighttacle.txt'
rm 'fivetacle.txt'
rm 'fourtacle.txt'
rm 'ninetacle.txt'
rm 'onetacle.txt'
rm 'seventacle.txt'
rm 'sixtacle.txt'
rm 'tentacle.txt'
rm 'threetacle.txt'
rm 'twotacle.txt'
❯ git commit --amend -C HEAD
[main 8494ef5] Assemble squid
Date: Fri Jan 20 01:09:57 2023 +0100
All files are gone:
❯ ls -l
total 0
Yet the default git show view of the merge doesn’t hint at anything suspicious:
commit de3e016de71484e62e6ac7e6dda08fe7f9d85af4 (HEAD -> main)
Merge: 4dadc16 3f7a95e e9cd39a 6be7cf4 a54e5c1 e310cbc 44ad755 3b1a5da bb79112 c78c58a
Author: Martin Myrseth <dave@tentacle.org>
Date: Fri Jan 20 01:09:57 2023 +0100
Assemble squid
But when asking it to also show the diff against each parent (-m), it’s fairly obvious that somebody has been messing around with the merge resolution:
❯ git show --pretty=oneline -m --stat
de3e016de71484e62e6ac7e6dda08fe7f9d85af4 (from 4dadc16d89758ed1625223286e1218b63c988313) (HEAD -> main) Assemble squid
tentacle.txt | 1 -
1 file changed, 1 deletion(-)
de3e016de71484e62e6ac7e6dda08fe7f9d85af4 (from 3f7a95ecac18a92451f7e205c8ea0bb2366c2e97) (HEAD -> main) Assemble squid
ninetacle.txt | 1 -
1 file changed, 1 deletion(-)
de3e016de71484e62e6ac7e6dda08fe7f9d85af4 (from e9cd39ad4664b04f29263250396ec1b270e4eeb8) (HEAD -> main) Assemble squid
eighttacle.txt | 1 -
1 file changed, 1 deletion(-)
de3e016de71484e62e6ac7e6dda08fe7f9d85af4 (from 6be7cf4b00f640a32d61a9e205e0b4a1e18b3bb8) (HEAD -> main) Assemble squid
seventacle.txt | 1 -
1 file changed, 1 deletion(-)
de3e016de71484e62e6ac7e6dda08fe7f9d85af4 (from a54e5c16f807a3f9aad8dd0c5187abcc9e6b6c7d) (HEAD -> main) Assemble squid
sixtacle.txt | 1 -
1 file changed, 1 deletion(-)
de3e016de71484e62e6ac7e6dda08fe7f9d85af4 (from e310cbcfecaa3cb6f08084a64c18318f7552a8a7) (HEAD -> main) Assemble squid
fivetacle.txt | 1 -
1 file changed, 1 deletion(-)
de3e016de71484e62e6ac7e6dda08fe7f9d85af4 (from 44ad755cc07047ee3dd25c5170aa9d4dde60475c) (HEAD -> main) Assemble squid
fourtacle.txt | 1 -
1 file changed, 1 deletion(-)
de3e016de71484e62e6ac7e6dda08fe7f9d85af4 (from 3b1a5da6c6e5b2d0b93517dda20c3295ed893374) (HEAD -> main) Assemble squid
threetacle.txt | 1 -
1 file changed, 1 deletion(-)
de3e016de71484e62e6ac7e6dda08fe7f9d85af4 (from bb791123be4bd03a0c6427d1990cd57898dd9793) (HEAD -> main) Assemble squid
twotacle.txt | 1 -
1 file changed, 1 deletion(-)
de3e016de71484e62e6ac7e6dda08fe7f9d85af4 (from c78c58a2debbab2d88ed0e747a54f4d750f8378f) (HEAD -> main) Assemble squid
onetacle.txt | 1 -
1 file changed, 1 deletion(-)
In the end, a merge commit in git tracks a tree, like any other commit, and only extends the commit metadata by including one parent field for each commit that serves as input to the merge operation. Furthermore, git places no constraints on the changes made to the tree associated with that commit. This basically gives a committer full “artistic freedom” as to what the result of a merge should be, ranging from the trivial “sum of all differences” or minor conflict resolutions to wild rewrites that have nothing to do with the differences that went into the merge to begin with.
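The point that a merge is "just a commit with more parent lines" can be made even more bluntly with plumbing. The sketch below uses git commit-tree to build a two-parent commit whose tree is the empty tree, without running any merge machinery at all. It assumes a throwaway sandbox and two hypothetical branches. git hash-object -w -t tree /dev/null is used to write the empty tree object.

```shell
#!/bin/sh
# Sketch: a "dishonest" merge built directly with plumbing.
# Assumptions: a throwaway sandbox and two hypothetical branches.
set -eu
sandbox=$(cd "$(mktemp -d)" && pwd -P)
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=Dave GIT_AUTHOR_EMAIL=dave@tentacle.org
export GIT_COMMITTER_NAME=Dave GIT_COMMITTER_EMAIL=dave@tentacle.org

git init -q "$sandbox/squid"
cd "$sandbox/squid"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m 'Initial commit'
for limb in onetacle twotacle; do
  git checkout -q -b "$limb" main
  echo "$limb" > "$limb.txt"
  git add "$limb.txt"
  git commit -q -m "Add $limb"
done

# Write the empty tree object: it has nothing to do with either parent.
empty_tree=$(git hash-object -w -t tree /dev/null)

# A commit is a tree plus metadata; -p may be repeated, once per parent.
merge=$(git commit-tree "$empty_tree" -p onetacle -p twotacle -m 'Assemble squid')
git update-ref refs/heads/main "$merge"   # main is not checked out here

parent_count=$(git rev-list --parents -n 1 main | awk '{ print NF - 1 }')
file_count=$(git ls-tree -r main | wc -l | tr -d ' ')
git --no-pager log --oneline --graph main
```

The graph should show an ordinary-looking merge of the two limbs, while its tree contains no files at all.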
Rounding off
I’m sure that many of these features of git are by no means news to the readers of this post, and I’m not exactly sure what pushed me towards writing it in the first place. If anything, it’s a recollection of (silly) things I’ve done in the past. Hopefully it can also inspire people to learn the tools that serve as their daily drivers beyond just the basic or core functionality.
I’m a believer that not everything we learn or do has to necessarily have some
obvious usefulness in and of itself. Often when learning tools, techniques,
programming languages, and everything else in the field of software, I find that
going off on tangents can help build intuition about core concepts, ultimately
leading to a deeper understanding. Of course, the few times this peripheral
knowledge is of actual use in real-life situations it’s even better.
I do place great value in utility, but I also like to remind people to have fun, experiment, and build simply for the sake of building. Typing out this summary reminded me of a recent post, “Take your pragmatism for a unicycle ride”, which appeared on my favorite tech aggregator site the other day. The post also touched on the importance of developer vitality. That’s something I consider very central to my own motivation and mental well-being. If there’s fun to be had in learning (or building), we’re much less likely to burn out from it.