# Cheat Sheet: Converting from filter-branch

This document is aimed at folks who are familiar with filter-branch and want
to learn how to convert over to using filter-repo.

## Table of Contents

* [Half-hearted conversions](#half-hearted-conversions)
* [Intention of "equivalent" commands](#intention-of-equivalent-commands)
* [Basic Differences](#basic-differences)
* [Cheat Sheet: Conversion of Examples from the filter-branch manpage](#cheat-sheet-conversion-of-examples-from-the-filter-branch-manpage)
* [Cheat Sheet: Additional conversion examples](#cheat-sheet-additional-conversion-examples)

## Half-hearted conversions

You can switch nearly any `git filter-branch` command to use
filter-repo under the covers by just replacing the `git filter-branch`
part of the command with
[`filter-lamely`](../contrib/filter-repo-demos/filter-lamely). The
git.git regression testsuite passes when I swap out the filter-branch
script with filter-lamely, for example. (However, the filter-branch
tests are not very comprehensive, so don't rely on that too much.)

Doing a half-hearted conversion has nearly all of the drawbacks of
filter-branch and nearly none of the benefits of filter-repo, but it
will make your command run a few times faster and make for a very
simple conversion.

You'll get a lot more performance, safety, and features by just
switching to direct filter-repo commands.

## Intention of "equivalent" commands

filter-branch and filter-repo have different defaults, as highlighted
in the Basic Differences section below. As such, getting a command
which behaves identically is not possible. Also, sometimes the
filter-branch manpage lies, e.g. it says "suppose you want to...from
all commits" and then uses a command line like `git filter-branch
... HEAD`, which only operates on commits in the current branch rather
than on all commits.

Rather than focusing on matching filter-branch output as exactly as
possible, I treat the filter-branch examples as idiomatic ways to
solve a certain type of problem with filter-branch, and express how
one would idiomatically solve the same problem in filter-repo.
Sometimes that means the results are not identical, but they are
largely the same in each case.

## Basic Differences

With `git filter-branch`, you have a git repository where every single
commit (within the branches or revisions you specify) is checked out
and then you run one or more shell commands to transform the working
copy into your desired end state.

With `git filter-repo`, you are essentially given an editing tool to
operate on the [fast-export](https://git-scm.com/docs/git-fast-export)
serialization of a repo. That means there is an input stream of all
the contents of the repository, and rather than specifying filters in
the form of commands to run, you usually employ a number of common
pre-defined filters that provide various ways to slice, dice, or
modify the repo based on its components (such as pathnames, file
content, user names or emails, etc.). That makes common operations
easier, even if it's not as versatile as shell callbacks. For cases
where more complexity or special casing is needed, filter-repo
provides Python callbacks that can operate on the data structures
populated from the fast-export stream to do just about anything you
want.

filter-branch defaults to working on a subset of the repository, and
requires you to specify a branch or branches, meaning you need to
specify `-- --all` to modify all commits. filter-repo by contrast
defaults to rewriting everything, and you need to specify
`--refs <rev-list-args>` if you want to limit to just a certain set of
branches or range of commits. (However, filter-repo does not accept
any `<rev-list-args>` that begin with a hyphen, since they look like
the start of other options.)

filter-repo also takes care of additional concerns automatically, like
rewriting commit messages that reference old commit IDs to instead
reference the rewritten commit IDs, pruning commits which do not start
empty but become empty due to the specified filters, and automatically
shrinking and gc'ing the repo at the end of the filtering operation.

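To make the Python callbacks described above more concrete, here is a minimal sketch of how a callback body given on the command line can be turned into a function. It assumes the body is indented and compiled as the body of a one-argument function named after the data it receives (`email`, `message`, `commit`, ...). filter-repo does something along these lines internally, but the `make_callback` helper below is hypothetical and is not filter-repo's actual code. The sketch also shows why the callback bodies in this document use `return`, and why their relative indentation has to be valid Python.

```python
import re
import textwrap

def make_callback(argname, body):
    """Compile a callback *body* string into a one-argument function.

    Hypothetical helper: every non-blank line of *body* is indented so it
    becomes the body of a generated ``def callback(<argname>):``.
    """
    source = "def callback({}):\n{}".format(argname, textwrap.indent(body, "  "))
    namespace = {"re": re}  # modules the callback body may refer to
    exec(source, namespace)
    return namespace["callback"]

# The same body as the --email-callback example in this document.
email_callback = make_callback("email", '''
    return email if email != b"root@localhost" else b"john@example.com"
''')

print(email_callback(b"root@localhost"))    # b'john@example.com'
print(email_callback(b"jane@example.com"))  # b'jane@example.com'
```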
## Cheat Sheet: Conversion of Examples from the filter-branch manpage

### Removing a file

The filter-branch manual provided three different examples of removing
a single file, based on different levels of ease vs. carefulness and
performance:

```shell
git filter-branch --tree-filter 'rm filename' HEAD
```

```shell
git filter-branch --tree-filter 'rm -f filename' HEAD
```

```shell
git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD
```

All of these just become

```shell
git filter-repo --invert-paths --path filename
```

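As a rough model of what `--path` and `--invert-paths` decide for each file, the sketch below keeps or drops a path depending on whether it matches one of the given `--path` values. It assumes that a `--path` value names either an exact file or a directory (with or without a trailing `/`), always relative to the top of the repository. `path_matches` and `keep_path` are hypothetical helpers, not filter-repo functions, and they ignore filter-repo's glob and regex variants.

```python
from typing import Sequence

def path_matches(path: bytes, pathname: bytes) -> bool:
    """True if *path* is *pathname* itself or lies under it as a directory."""
    if pathname.endswith(b"/"):
        return path.startswith(pathname)
    return path == pathname or path.startswith(pathname + b"/")

def keep_path(path: bytes, paths: Sequence[bytes], invert: bool) -> bool:
    """Hypothetical model of the keep/drop decision for one file.

    Without --invert-paths only matching files are kept; with it,
    matching files are dropped and everything else is kept.
    """
    matched = any(path_matches(path, p) for p in paths)
    return not matched if invert else matched

files = [b"filename", b"filename.txt", b"src/main.c", b"docs/filename"]
kept = [f for f in files if keep_path(f, [b"filename"], invert=True)]
print(kept)  # [b'filename.txt', b'src/main.c', b'docs/filename']
```

In this model `docs/filename` survives. That matches `rm filename` in the tree-filter, which runs at the top of the checked-out tree and so only touches the file at the repository root.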
### Extracting a subdirectory

Extracting a subdirectory via

```shell
git filter-branch --subdirectory-filter foodir -- --all
```

is one of the easiest commands to convert; it just becomes

```shell
git filter-repo --subdirectory-filter foodir
```

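filter-repo's manual describes `--subdirectory-filter foodir` as equivalent to `--path foodir/ --path-rename foodir/:`. In other words, it keeps only the files under `foodir/` and then strips that prefix. Below is a minimal sketch of the effect on individual paths. The `subdirectory_filter` function is a hypothetical model, not filter-repo code.

```python
from typing import Optional

def subdirectory_filter(path: bytes, directory: bytes) -> Optional[bytes]:
    """Model of --subdirectory-filter: keep only files under *directory*
    and strip that prefix so the directory becomes the new root.

    Returns None for files that would be filtered out.
    """
    prefix = directory.rstrip(b"/") + b"/"
    if not path.startswith(prefix):
        return None
    return path[len(prefix):]

for p in [b"foodir/a.c", b"foodir/sub/b.c", b"foodirx/c.c", b"README"]:
    print(p, "->", subdirectory_filter(p, b"foodir"))
```

Matching on the prefix including the trailing slash is what keeps `foodirx/c.c` from being mistaken for part of `foodir`.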
### Moving the whole tree into a subdirectory

Keeping all files but placing them in a new subdirectory via

```shell
git filter-branch --index-filter \
    'git ls-files -s | sed "s-\t\"*-&newsubdir/-" |
        GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
            git update-index --index-info &&
     mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD
```

(which happens to be GNU-specific and will fail with BSD userland in
very subtle ways) becomes

```shell
git filter-repo --to-subdirectory-filter newsubdir
```

(which works fine regardless of GNU vs BSD userland differences.)

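The manual likewise describes `--to-subdirectory-filter newsubdir` as equivalent to `--path-rename :newsubdir/`. An empty old prefix matches every path, so every path gets the new prefix. The sketch below models that rename on raw byte paths. Because this model handles paths as data rather than as text piped through `git ls-files -s | sed`, names containing quotes or tabs need no quoting tricks. `path_rename` and `to_subdirectory_filter` are hypothetical helpers, not filter-repo functions.

```python
def path_rename(path: bytes, old_prefix: bytes, new_prefix: bytes) -> bytes:
    """Model of --path-rename OLD:NEW: replace a leading OLD with NEW."""
    if path.startswith(old_prefix):
        return new_prefix + path[len(old_prefix):]
    return path

def to_subdirectory_filter(path: bytes, directory: bytes) -> bytes:
    # An empty OLD prefix matches every path, so everything moves under
    # the new directory.
    return path_rename(path, b"", directory.rstrip(b"/") + b"/")

for p in [b"Makefile", b"src/main.c", b'odd "quoted"\tname.txt']:
    print(to_subdirectory_filter(p, b"newsubdir"))
```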
### Re-grafting history

The filter-branch manual provided one example with three different
commands that could be used to achieve it, though the first of them
had limited applicability (only when the repo had a single initial
commit). These three examples were:

```shell
git filter-branch --parent-filter 'sed "s/^\$/-p <graft-id>/"' HEAD
```

```shell
git filter-branch --parent-filter \
    'test $GIT_COMMIT = <commit-id> && echo "-p <graft-id>" || cat' HEAD
```

```shell
git replace --graft <commit-id> <graft-id>
git filter-branch <graft-id>..HEAD
```

git-replace did not exist when the first two examples were written,
but it is clear that the last example is far easier to understand. As
such, filter-repo just uses the same mechanism:

```shell
git replace --graft <commit-id> <graft-id>
git filter-repo --force
```

NOTE: `--force` should usually be avoided unless you have taken care to
make sure you have a backup of your repo (or are running on a fresh
clone of it). It is needed in this case because filter-repo errors out
when no arguments are specified, and because it usually first checks
whether you are in a fresh clone before irrecoverably rewriting your
repository (git-replace created a new graft and thus added something
to your previously fresh clone).

### Removing commits by a certain author

WARNING: This is a BAD example for BOTH filter-branch and filter-repo.
It does not remove the changes the user made from the repo; it just
removes the commit in question while smashing the changes from it into
any subsequent commits as though the subsequent authors had been
responsible for those changes as well. `git rebase` is likely to be a
better fit for what you really want if you are looking at this
example. (See also [this explanation of the differences between rebase
and filter-repo](https://github.com/newren/git-filter-repo/issues/62#issuecomment-597725502).)

This filter-branch example

```shell
git filter-branch --commit-filter '
    if [ "$GIT_AUTHOR_NAME" = "Darl McBribe" ];
    then
        skip_commit "$@";
    else
        git commit-tree "$@";
    fi' HEAD
```

becomes

```shell
git filter-repo --commit-callback '
    if commit.author_name == b"Darl McBribe":
        commit.skip()
    '
```

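To see what the callback above does in isolation, the sketch below runs the same logic over a few stand-in commit objects. `FakeCommit` is a hypothetical class with only the attributes the callback touches. It only records whether `skip()` was called. It does not model what happens to the skipped commit's changes, which, as the warning above explains, end up smashed into later commits.

```python
from dataclasses import dataclass

@dataclass
class FakeCommit:
    """Hypothetical stand-in for a filter-repo commit object."""
    author_name: bytes
    message: bytes
    skipped: bool = False

    def skip(self):
        self.skipped = True

def commit_callback(commit):
    # Same logic as the --commit-callback body above.
    if commit.author_name == b"Darl McBribe":
        commit.skip()

history = [
    FakeCommit(b"Alice", b"Initial import\n"),
    FakeCommit(b"Darl McBribe", b"Add questionable code\n"),
    FakeCommit(b"Bob", b"Fix build\n"),
]
for c in history:
    commit_callback(c)

print([c.message for c in history if not c.skipped])
```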
### Rewriting commit messages -- removing text

Removing `git-svn-id:` lines from commit messages via

```shell
git filter-branch --msg-filter '
    sed -e "/^git-svn-id:/d"
    '
```

becomes

```shell
git filter-repo --message-callback '
    return re.sub(b"^git-svn-id:.*\n", b"", message, flags=re.MULTILINE)
    '
```

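Since the callback body is plain Python, you can try it on a sample message before rewriting anything. The small check below uses a made-up commit message and URL in the usual git-svn style. It also shows one edge case: the pattern requires a trailing `\n`, so a `git-svn-id:` line that ends a message with no final newline is left alone. `message_callback_eof_safe` is a hypothetical variant for that case, not part of filter-repo.

```python
import re

def message_callback(message):
    # Same expression as the --message-callback body above.
    return re.sub(b"^git-svn-id:.*\n", b"", message, flags=re.MULTILINE)

def message_callback_eof_safe(message):
    # Hypothetical variant: also drops a git-svn-id line that ends the
    # message without a trailing newline.
    return re.sub(rb"^git-svn-id:.*(?:\n|\Z)", b"", message, flags=re.MULTILINE)

msg = (b"Fix parser crash\n"
       b"\n"
       b"git-svn-id: https://svn.example.com/repo/trunk@1234 abcd-ef\n")
no_newline = msg.rstrip(b"\n")

print(message_callback(msg))                   # b'Fix parser crash\n\n'
print(message_callback(no_newline) == no_newline)  # True: not removed
print(message_callback_eof_safe(no_newline))   # b'Fix parser crash\n\n'
```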
### Rewriting commit messages -- adding text

Adding Acked-by lines to the last ten commits via

```shell
git filter-branch --msg-filter '
    cat &&
    echo "Acked-by: Bugs Bunny <bunny@bugzilla.org>"
    ' master~10..master
```

becomes

```shell
git filter-repo --message-callback '
    return message + b"Acked-by: Bugs Bunny <bunny@bugzilla.org>\n"
    ' --refs master~10..master
```

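The callback above assumes that `message` already ends with a newline, which is normally the case for commit messages. If that might not hold, the slightly more defensive sketch below keeps the new line from being glued onto the previous one. It is a hypothetical variant, not something filter-repo requires.

```python
ACKED_BY = b"Acked-by: Bugs Bunny <bunny@bugzilla.org>\n"

def message_callback(message):
    # Start the trailer on its own line even if the message lacks a final
    # newline; otherwise this gives the same result as the callback above.
    if message and not message.endswith(b"\n"):
        message += b"\n"
    return message + ACKED_BY

print(message_callback(b"Fix typo\n"))
print(message_callback(b"Fix typo"))
```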
### Changing author/committer(/tagger?) information

```shell
git filter-branch --env-filter '
    if test "$GIT_AUTHOR_EMAIL" = "root@localhost"
    then
        GIT_AUTHOR_EMAIL=john@example.com
    fi
    if test "$GIT_COMMITTER_EMAIL" = "root@localhost"
    then
        GIT_COMMITTER_EMAIL=john@example.com
    fi
    ' -- --all
```

becomes either

```shell
# Ensure '<john@example.com> <root@localhost>' is a line in .mailmap, then:
git filter-repo --use-mailmap
```

or

```shell
git filter-repo --email-callback '
    return email if email != b"root@localhost" else b"john@example.com"
    '
```

(And as a bonus, both filter-repo alternatives will fix tagger emails
too, unlike the filter-branch example.)

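When several addresses need fixing, the single comparison in the `--email-callback` example generalizes to a dictionary lookup. The sketch below assumes a small mapping. Its second entry is a made-up example, and for a long list the `.mailmap` route may be easier to maintain.

```python
# Old address -> new address. The second entry is a made-up example.
EMAIL_MAP = {
    b"root@localhost": b"john@example.com",
    b"john@old-laptop.example": b"john@example.com",
}

def email_callback(email):
    # Addresses not in the map are returned unchanged.
    return EMAIL_MAP.get(email, email)

print(email_callback(b"root@localhost"))
print(email_callback(b"jane@example.com"))
```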
### Restricting to a range

The partial examples

```shell
git filter-branch ... C..H
```

```shell
git filter-branch ... C..H ^D
```

```shell
git filter-branch ... D..H ^C
```

become

```shell
git filter-repo ... --refs C..H
```

```shell
git filter-repo ... --refs C..H ^D
```

```shell
git filter-repo ... --refs D..H ^C
```

Note that filter-branch accepts `--not` among the revision specifiers,
but filter-repo's Python argument parser treats `--not` as an option
name, which breaks parsing. So, instead of e.g. `--not C` as we might
use with filter-branch, we can specify `^C` to filter-repo.

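The `--not` problem can be reproduced with Python's standard `argparse` module. The parser below is a hypothetical stand-in that, like filter-repo, collects revision arguments after `--refs`. It is not filter-repo's actual parser, but it shows why `^C` works and `--not C` does not.

```python
import argparse

# Hypothetical stand-in parser; not filter-repo's real option set.
parser = argparse.ArgumentParser(prog="filter-repo-sketch")
parser.add_argument("--refs", nargs="+")

# '^D' does not start with a hyphen, so it is collected as a revision.
print(parser.parse_args(["--refs", "C..H", "^D"]).refs)

# '--not' looks like an option name: argparse stops collecting --refs
# values there, then rejects it as unrecognized and exits (status 2).
try:
    parser.parse_args(["--refs", "D..H", "--not", "C"])
    accepted = True
except SystemExit:
    accepted = False
print("--not accepted?", accepted)
```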
## Cheat Sheet: Additional conversion examples

### Running a code formatter or linter on each file with some extension

Running some program on a subset of files is relatively natural in
filter-branch:

```shell
git filter-branch --tree-filter '
    git ls-files -z "*.c" \
        | xargs -0 -n 1 clang-format -style=file -i
    '
```

filter-repo decided not to provide a way to run an external program to
do filtering, because most filter-branch uses of this ability are
riddled with [safety
problems](https://git-scm.com/docs/git-filter-branch#SAFETY) and
[performance
issues](https://git-scm.com/docs/git-filter-branch#PERFORMANCE).
However, in special cases like this it's fairly safe. One can write a
script that uses filter-repo as a library to achieve this, while also
gaining filter-repo's automatic handling of other concerns like
rewriting commit IDs in commit messages or pruning commits that become
empty. In fact, one of the [contrib
demos](../contrib/filter-repo-demos),
[lint-history](../contrib/filter-repo-demos/lint-history), handles
this exact type of situation already:

```shell
lint-history --relevant 'return filename.endswith(b".c")' \
    clang-format -style=file -i
```
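The core of this approach can be sketched in plain Python. The sketch decides whether a file is relevant, writes its contents to a temporary file, runs the command on it, and reads the result back. It is a hypothetical illustration of the general idea, not lint-history's actual code. The helper names `is_relevant` and `lint_blob` are made up, and a real script would also need filter-repo's library interface to apply this to each blob in history.

```python
import os
import subprocess
import sys
import tempfile

def is_relevant(filename: bytes) -> bool:
    # Same predicate as the --relevant body above.
    return filename.endswith(b".c")

def lint_blob(contents: bytes, filename: bytes, command: list) -> bytes:
    """Run *command* on a temporary copy of one file's contents and return
    the rewritten bytes. Irrelevant files come back unchanged."""
    if not is_relevant(filename):
        return contents
    # Keep the extension so tools that look at it still behave the same.
    suffix = os.path.splitext(filename.decode("utf-8", "surrogateescape"))[1]
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(contents)
        subprocess.run(command + [tmp_path], check=True)
        with open(tmp_path, "rb") as f:
            return f.read()
    finally:
        os.remove(tmp_path)

# Stand-in "formatter" that strips trailing whitespace in place, so the
# demo runs without clang-format installed.
STRIP_TRAILING_WS = r"""
import sys
path = sys.argv[1]
with open(path, "rb") as f:
    lines = f.read().split(b"\n")
with open(path, "wb") as f:
    f.write(b"\n".join(line.rstrip() for line in lines))
"""
strip_ws = [sys.executable, "-c", STRIP_TRAILING_WS]

print(lint_blob(b"int x;   \nint y;\t\n", b"src/main.c", strip_ws))
print(lint_blob(b"keep   \n", b"README", strip_ws))
```

With clang-format installed, passing `["clang-format", "-style=file", "-i"]` as `command` would format one blob at a time. Note that `-style=file` searches for a `.clang-format` file starting from the temporary file's directory rather than the repository's, which is one detail a real script would have to handle.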