Commit Graph

23 Commits

Author SHA1 Message Date
Elijah Newren
e333be7b17 filter-repo: consistently use bytestrings for directory names
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-10-21 09:09:23 -07:00
Elijah Newren
71bb8d26a9 filter-repo: add a --state-branch option for incremental exporting
Allow folks to periodically update the export of a live repo without
re-exporting from the beginning.  This is a performance improvement, but
can also be important for collaboration.  For example, for sensitivity
reasons, folks might want to export a subset of a repo and update the
export periodically.  While this could be done by just re-exporting the
repository anew each time, there is a risk that the paths used to
specify the wanted subset might need to change in the future; making the
user verify that their paths (including globs or regexes) don't also
pick up anything from history that was previously excluded so that they
don't get a divergent history is not very user friendly.  Allowing them
to just export stuff that is new since the last export works much better
for them.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-10-17 18:55:09 -07:00
Elijah Newren
e9678a367f filter-repo: support deleteall directive
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-06-22 22:32:10 -06:00
Elijah Newren
1c25be5be7 filter-repo: add public method for adding objects to stream
External rewrite tools using filter-repo as a library may want to add
additional objects into the stream.  Some examples in t/t9391 did this
using an internal _output field and using syntax that did not seem so
clear.  Provide an insert() method for doing this, and convert existing
cases over to it.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-06-22 22:32:04 -06:00
Elijah Newren
88c1269d5a filter-repo: ensure branches are updated as we go
When we prune a commit for being empty, there is no update to the branch
associated with the commit in the fast-import stream.  If the parent
commit had been associated with a different branch, then the branch
associated with the pruned commit would not be updated without
additional measures.  In the past, we resolved this by recording that
the branch needed an update in _seen_refs.  While this works, it is a
bit more complicated than just issuing an immediate Reset.  Also, note
that we need to avoid calling callbacks on that Reset because those
could rename branches (again, if the commit-callback already renamed
once) causing us to not update the intended branch.

There was actually one testcase where the old method didn't work: when a
branch was pruned away to nothing.  A testcase accidentally encoded the
wrong behavior, hiding this problem.  Fix the testcase to check for
correct behavior.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-06-22 22:32:04 -06:00
Elijah Newren
7d42c2093c filter-repo: limit splicing repos warning to test that splices repos
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-06-08 09:05:31 -07:00
Elijah Newren
0b70b72150 filter-repo: provide extra metadata to some callbacks
For other programs importing git-filter-repo as a library and passing a
blob, commit, tag, or reset callback to RepoFilter, pass a second
parameter to these functions with extra metadata they might find useful.
For simplicity of implementation, this technically changes the calling
signature of the --*-callback functions passed on the command line, but
we hide that behind a _do_not_use_this_variable parameter for now, leave
it undocumented, and encourage folks who want to use it to write an
actual python program that imports git-filter-repo.  In the future, we
may modify the --*-callback functions to not pass this extra parameter,
or if it is deemed sufficiently useful, then we'll rename the second
parameter and document it.

As already noted in our API compatibility caveat near the top of
git-filter-repo, I am not guaranteeing API backwards compatibility.
That especially applies to this metadata argument, other than the fact
that it'll be a dict mapping strings to some kind of value.  I might add
more keys, rename them, change the corresponding value, or even remove
keys that used to be part of metadata.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-05-30 22:07:48 -07:00
Elijah Newren
27f08be754 filter-repo: consolidate filtering functions into RepoFilter
Location of filtering logic was previously split in a confusing fashion
between FastExportFilter and RepoFilter.  Move all filtering logic from
FastExportFilter into RepoFilter, and rename the former to
FastExportParser to reflect this change.

One downside of this change is that FastExportParser's _parse_commit
holds two pieces of information (orig_parents and had_file_changes)
which are not part of the commit object but which are now needed by
RepoFilter.  Adding those bits of info to the commit object does not
make sense, so for now we pass an auxiliary dict with the
commit_callback that has these two fields.  This information is not
passed along to external commit_callbacks passed to RepoFilter, though,
which seems suboptimal.  To be fair, though, commit_callbacks to
RepoFilter never had access to this information, so this is not a new
shortcoming; it just seems more apparent now.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-05-30 22:07:48 -07:00
Elijah Newren
2bd86a64bb filter-repo: remove superfluous everything_callback
I introduced this over a decade ago thinking it would come in handy in
some special case, and the only place I used it was in a testcase that
existed almost solely to increase code coverage.  Modify the testcase to
instead demonstrate how it is trivial to get the effects of the
everything_callback without it being present.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-05-30 22:07:48 -07:00
Elijah Newren
5ed97e999c filter-repo: rename FileChanges to FileChange
This class only represents one FileChange, so fix the misnomer and make
it clearer to others the purpose of this object.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-05-16 09:02:40 -07:00
Elijah Newren
35052f673d filter-repo (python3): replace strings with bytestrings
This is by far the largest python3 change; it consists basically of
  * using b'<str>' instead of '<str>' in lots of places
  * adding a .encode() if we really do work with a string but need to
    get it converted to a bytestring
  * replacing uses of .format() with interpolation via the '%' operator,
    since bytestrings don't have a .format() method.
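
The three kinds of change listed above can be illustrated with a minimal
sketch.  The variable names and values here are made up for illustration and
are not taken from git-filter-repo itself:

```python
# Illustrative only: names and values are hypothetical, not from filter-repo.

# 1. Use b'<str>' literals where the data is really bytes.
ref = b'refs/heads/master'

# 2. Add .encode() when a real text string must become a bytestring.
name = 'A U Thor'
name_bytes = name.encode('utf-8')

# 3. bytes objects have no .format() method, so interpolate with '%'
#    (supported for bytes since Python 3.5).
mark = 7
header = b'commit %s\nmark :%d\n' % (ref, mark)
author = b'author %s <%s>' % (name_bytes, b'author@example.com')

print(header)
print(author)
```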

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-05-08 08:57:51 -07:00
Elijah Newren
4c05cbe072 filter-repo (python3): bytes() instead of chr() or string join
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-05-08 08:57:51 -07:00
Elijah Newren
c3072c7f01 filter-repo (python3): convert StringIO->BytesIO and __str__->__bytes__
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-05-08 08:57:51 -07:00
Elijah Newren
9b3134b68c filter-repo (python3): ensure file reads and writes are done in bytes
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-05-08 08:57:51 -07:00
Elijah Newren
8b8d6b4b43 filter-repo (python3): ensure stdin and args are bytes instead of strings
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-05-08 08:57:51 -07:00
Elijah Newren
e5955f397f filter-repo (python3): shebang and imports
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-05-08 08:57:51 -07:00
Elijah Newren
805ce360fa filter-repo: simplify API for parent handling in Commit object
While the underlying fast-export and fast-import streams explicitly
separate 'from' commit (first parent) and 'merge' commits (all other
parents), foisting that separation into the Commit object for
filter-repo forces additional places in the code to deal with that
distinction.  It results in less clear code, and especially does not
make sense to push upon folks who may want to use filter-repo as a
library.
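
As a rough sketch of the idea, a commit object can keep every parent in a
single list and leave the 'from'/'merge' distinction to the code that writes
the fast-import stream.  The Commit class and parent_lines() helper below are
hypothetical stand-ins, not filter-repo's actual API:

```python
from typing import List


class Commit:
    """Toy stand-in for a commit; not filter-repo's real Commit class."""

    def __init__(self, branch: bytes, parents: List[int]):
        self.branch = branch
        # All parents live in one list of marks, first parent first.
        self.parents = parents


def parent_lines(commit: Commit) -> bytes:
    """Render parents in fast-import syntax: 'from' first, then 'merge'."""
    lines = []
    for index, mark in enumerate(commit.parents):
        keyword = b'from' if index == 0 else b'merge'
        lines.append(b'%s :%d\n' % (keyword, mark))
    return b''.join(lines)


merge_commit = Commit(b'refs/heads/master', [3, 5, 8])
print(parent_lines(merge_commit))
```

Callers only ever read or modify commit.parents; the split into 'from' and
'merge' lines happens in one place, at output time.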

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-04-29 09:56:38 -07:00
Elijah Newren
30228bdde2 filter-repo: add tests triggering callback sanity checks
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-04-29 09:56:38 -07:00
Elijah Newren
e913ccbe8d filter-repo: add coverage for some corner cases and unusual constructs
There are a number of things not present in "normal" imports that we
nevertheless support and need to be tested:
  * broken timezone adjustment (+051800->+0261; observed in the wild
    in real repos, and adjustment prevents fast-import from dying)
  * commits missing an author (observed in the wild in a real repo;
    just sets author to committer)
  * optional additional linefeeds in the input allowed by
    git-fast-import but usually not written by git-fast-export
  * progress and checkpoint objects
  * progress, checkpoint, and 'everything' callbacks

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-04-29 09:56:38 -07:00
Elijah Newren
ef4b96e7be filter-repo: add API backward compatibility warning
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-04-29 09:56:38 -07:00
Elijah Newren
cbacb6cd82 filter-repo: simplify import in lib-usage examples
Python wants filenames with underscores instead of hyphens and with a
.py extension.  We really want the main file named git-filter-repo, but
we can add a git_filter_repo.py symlink.  Doing so dramatically
simplifies the steps needed to import it as a library in external python
scripts.
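
The effect of such a symlink can be demonstrated in isolation.  The sketch
below uses a throwaway script named my-demo-tool (a made-up name, standing in
for git-filter-repo) in a temporary directory, and assumes a platform where
symlinks are available:

```python
import importlib
import os
import sys
import tempfile

# Create a script whose name contains hyphens and has no .py extension,
# so it cannot be imported directly.  (Hypothetical file, for illustration.)
workdir = tempfile.mkdtemp()
script = os.path.join(workdir, 'my-demo-tool')
with open(script, 'w') as f:
    f.write("def greet():\n    return b'hello'\n")

# Add an importable alias: underscores instead of hyphens, plus '.py'.
link = os.path.join(workdir, 'my_demo_tool.py')
os.symlink('my-demo-tool', link)

# With the alias on sys.path, a plain import is all that is needed.
sys.path.insert(0, workdir)
importlib.invalidate_caches()
my_demo_tool = importlib.import_module('my_demo_tool')
print(my_demo_tool.greet())
```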

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-04-26 07:56:03 -07:00
Elijah Newren
6dba1f200c filter-repo: avoid string->datetime->string round trips
Most filtering operations are not interested in the time that commits
were authored or committed, or when tags were tagged.  As such,
translating the string representation of the date into a datetime object
is wasted effort, and causes us to waste more time later as we have to
translate it back into a string.

Instead, provide string_to_date() and date_to_string() functions so that
callers can perform the translation if wanted, and let the normal case
be fast.
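
For illustration, conversion helpers of this kind might look roughly like the
following.  This is a sketch assuming the raw "<unix timestamp> <+/-HHMM>"
date format used in fast-export streams; it borrows the function names
mentioned above but is not necessarily how filter-repo implements them:

```python
from datetime import datetime, timedelta, timezone


def string_to_date(datestring: bytes) -> datetime:
    """Parse b'<unix timestamp> <+/-HHMM>' into a timezone-aware datetime."""
    unix_timestamp, tz = datestring.split()
    sign = -1 if tz[0:1] == b'-' else 1
    offset = sign * timedelta(hours=int(tz[1:3]), minutes=int(tz[3:5]))
    return datetime.fromtimestamp(int(unix_timestamp), timezone(offset))


def date_to_string(dateobj: datetime) -> bytes:
    """Format a timezone-aware datetime back into the raw bytes form."""
    offset_minutes = int(dateobj.utcoffset().total_seconds()) // 60
    sign = b'-' if offset_minutes < 0 else b'+'
    hours, minutes = divmod(abs(offset_minutes), 60)
    return b'%d %s%02d%02d' % (int(dateobj.timestamp()), sign, hours, minutes)


# Only callers that care about dates pay for the conversion.
raw = b'1571674163 -0700'
when = string_to_date(raw)
print(when.isoformat(), date_to_string(when))
```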

Provides a small but noticeable speedup when just filtering based on
paths; about a 3.5% improvement in execution time for writing the new
history.

Signed-off-by: Elijah Newren <newren@gmail.com>
2019-04-26 07:56:03 -07:00
Elijah Newren
a5d4d70876 filter-repo: add some testcases making use of filter-repo as a library
Signed-off-by: Elijah Newren <newren@gmail.com>
2019-04-26 07:56:03 -07:00