mirror of
https://github.com/newren/git-filter-repo.git
synced 2024-07-06 18:32:14 +02:00
filter-repo: fix handling of topological changes
Due to pruning of empty commits, merge commits can become degenerate (same commit serving as both parents, or one parent is an ancestor of one of the others). While we usually want to allow such degenerate merge commits to themselves be pruned (assuming they add no additional file changes), we do not want to prune them if the merge commit in the original repository had the same degenerate topology. So, we need to keep track of the ancestry graph of the original repository as well and include it in the logic about whether to allow merge commits to be pruned. Signed-off-by: Elijah Newren <newren@gmail.com>
This commit is contained in:
parent
49732e8b5f
commit
697b9256f5
32
README.md
32
README.md
@ -202,22 +202,22 @@ provide at least one of the last three traits as well:
|
|||||||
the pruning of parents and other ancestors can ultimately result
|
the pruning of parents and other ancestors can ultimately result
|
||||||
in the loss of one or more parents. If a merge commit loses
|
in the loss of one or more parents. If a merge commit loses
|
||||||
enough parents to become a non-merge commit and it has no file
|
enough parents to become a non-merge commit and it has no file
|
||||||
changes, then it too can be pruned. Topology changes are also
|
changes, then it too can be pruned. Merge commits can also have
|
||||||
possible if the entire non-first-parent history is pruned away;
|
a topology that becomes degenerate: it could end up with the
|
||||||
rather than having that parent of the merge be rewritten to the
|
merge_base serving as both parents (if all intervening commits
|
||||||
merge base, it may (depending on whether the merge also had file
|
from the original repo were pruned), or it could end up with one
|
||||||
changes of its own) instead make sense to just prune that parent.
|
parent which is an ancestor of its other parent. In such cases,
|
||||||
(We do not want to prune away a first parent being rewritten to
|
if the merge has no file changes of its own, then the merge
|
||||||
the merge base since some projects prefer --no-ff merges, though
|
commit can also be pruned. However, if the merge commit was
|
||||||
this could be made an option.) Finally, note that we originally
|
already degenerate in the original history, then it was probably
|
||||||
talked not about pruning empty commits, but about pruning commits
|
intentional and the merge commit will not be pruned. Finally,
|
||||||
which become empty. Some projects intentionally create empty
|
note that we originally talked about pruning commits which become
|
||||||
commits for versioning or publishing reasons, and these should
|
empty, NOT about pruning empty commits. Some projects
|
||||||
not be removed. Instead, only commits which become empty should
|
intentionally create empty commits for versioning or publishing
|
||||||
be pruned. (As a special case, commits which started empty but
|
reasons, and these should not be removed. Instead, only commits
|
||||||
originally had a parent and which become a root commit due to the
|
which become empty should be pruned. (As a special case, commits
|
||||||
pruning of other commits will also be considered to have "become
|
which started empty but whose parent was pruned away will also be
|
||||||
empty".)
|
considered to have "become empty".)
|
||||||
|
|
||||||
1. [Speed] Filtering should be reasonably fast
|
1. [Speed] Filtering should be reasonably fast
|
||||||
|
|
||||||
|
@ -806,6 +806,8 @@ class FastExportFilter(object):
|
|||||||
# speak). The depth of a commit is one more than the max depth of any
|
# speak). The depth of a commit is one more than the max depth of any
|
||||||
# of its ancestors.
|
# of its ancestors.
|
||||||
self._graph = AncestryGraph()
|
self._graph = AncestryGraph()
|
||||||
|
# Another one, for ancestry of commits in the original repo
|
||||||
|
self._orig_graph = AncestryGraph()
|
||||||
|
|
||||||
# A set of commit hash pairs (oldhash, newhash) which used to be merge
|
# A set of commit hash pairs (oldhash, newhash) which used to be merge
|
||||||
# commits but due to filtering were turned into non-merge commits.
|
# commits but due to filtering were turned into non-merge commits.
|
||||||
@ -1174,15 +1176,17 @@ class FastExportFilter(object):
|
|||||||
# 'None')
|
# 'None')
|
||||||
# Remove all parents rewritten to None, and keep track of which parents
|
# Remove all parents rewritten to None, and keep track of which parents
|
||||||
# were rewritten to an ancestor.
|
# were rewritten to an ancestor.
|
||||||
tmp = zip(parents, [x in _SKIPPED_COMMITS for x in orig_parents])
|
tmp = zip(parents,
|
||||||
|
orig_parents,
|
||||||
|
[x in _SKIPPED_COMMITS for x in orig_parents])
|
||||||
tmp2 = [x for x in tmp if x[0] is not None]
|
tmp2 = [x for x in tmp if x[0] is not None]
|
||||||
parents, is_rewritten = [list(x) for x in zip(*tmp2)] if tmp2 else ([], [])
|
if not tmp2:
|
||||||
|
# All ancestors have been pruned; we have no parents. Note that the
|
||||||
# However, the way fast-export/fast-import split parents into from_commit
|
# way fast-export/fast-import split parents into from_commit and
|
||||||
# and merge_commits means we'd rather a parentless commit be represented
|
# merge_commits means we'd rather a parentless commit be represented
|
||||||
# as a list containing a single None entry.
|
# as a list containing a single None entry.
|
||||||
if not parents:
|
return [None], None
|
||||||
parents.append(None)
|
parents, orig_parents, is_rewritten = [list(x) for x in zip(*tmp2)]
|
||||||
|
|
||||||
# We can't have redundant parents if we don't have at least 2 parents
|
# We can't have redundant parents if we don't have at least 2 parents
|
||||||
if len(parents) < 2:
|
if len(parents) < 2:
|
||||||
@ -1197,9 +1201,9 @@ class FastExportFilter(object):
|
|||||||
# Deleting duplicate rewritten parents means keeping parents if either
|
# Deleting duplicate rewritten parents means keeping parents if either
|
||||||
# they have not been seen or they are ones that have not been rewritten.
|
# they have not been seen or they are ones that have not been rewritten.
|
||||||
parents_copy = parents
|
parents_copy = parents
|
||||||
pairs = [[p, is_rewritten[i]] for i, p in enumerate(parents)
|
uniq = [[p, orig_parents[i], is_rewritten[i]] for i, p in enumerate(parents)
|
||||||
if not (p in seen or seen_add(p)) or not is_rewritten[i]]
|
if not (p in seen or seen_add(p)) or not is_rewritten[i]]
|
||||||
parents, is_rewritten = [list(x) for x in zip(*pairs)]
|
parents, orig_parents, is_rewritten = [list(x) for x in zip(*uniq)]
|
||||||
if len(parents) < 2:
|
if len(parents) < 2:
|
||||||
return parents_copy, parents[0]
|
return parents_copy, parents[0]
|
||||||
|
|
||||||
@ -1213,10 +1217,21 @@ class FastExportFilter(object):
|
|||||||
if not is_rewritten[cur]:
|
if not is_rewritten[cur]:
|
||||||
continue
|
continue
|
||||||
for other in xrange(num_parents):
|
for other in xrange(num_parents):
|
||||||
if cur != other and self._graph.is_ancestor(parents[cur],
|
if cur == other:
|
||||||
parents[other]):
|
continue
|
||||||
to_remove.append(cur)
|
if not self._graph.is_ancestor(parents[cur], parents[other]):
|
||||||
break # cur removed, so skip rest of others -- i.e. check cur+=1
|
continue
|
||||||
|
# parents[cur] is an ancestor of parents[other], so parents[cur]
|
||||||
|
# seems redundant. However, if it was intentionally redundant
|
||||||
|
# (e.g. a no-ff merge) in the original, then we want to keep it.
|
||||||
|
if self._orig_graph.is_ancestor(orig_parents[cur],
|
||||||
|
orig_parents[other]):
|
||||||
|
continue
|
||||||
|
# Okay so the cur-th parent is an ancestor of the other-th parent,
|
||||||
|
# and it wasn't that way in the original repository; mark the
|
||||||
|
# cur-th parent as removable.
|
||||||
|
to_remove.append(cur)
|
||||||
|
break # cur removed, so skip rest of others -- i.e. check cur+=1
|
||||||
for x in reversed(to_remove):
|
for x in reversed(to_remove):
|
||||||
parents.pop(x)
|
parents.pop(x)
|
||||||
if len(parents) < 2:
|
if len(parents) < 2:
|
||||||
@ -1237,8 +1252,6 @@ class FastExportFilter(object):
|
|||||||
if len(orig_parents) < 2:
|
if len(orig_parents) < 2:
|
||||||
# Special logic for commits that started empty...
|
# Special logic for commits that started empty...
|
||||||
if not had_file_changes:
|
if not had_file_changes:
|
||||||
if orig_parents == [None]:
|
|
||||||
orig_parents = []
|
|
||||||
had_parents_pruned = (len(parents) < len(orig_parents) or
|
had_parents_pruned = (len(parents) < len(orig_parents) or
|
||||||
(len(orig_parents) == 1 and
|
(len(orig_parents) == 1 and
|
||||||
orig_parents[0] in _SKIPPED_COMMITS))
|
orig_parents[0] in _SKIPPED_COMMITS))
|
||||||
@ -1376,6 +1389,8 @@ class FastExportFilter(object):
|
|||||||
parents, new_1st_parent = self.trim_extra_parents(orig_parents, parents)
|
parents, new_1st_parent = self.trim_extra_parents(orig_parents, parents)
|
||||||
from_commit = parents[0]
|
from_commit = parents[0]
|
||||||
merge_commits = parents[1:]
|
merge_commits = parents[1:]
|
||||||
|
if orig_parents == [None]:
|
||||||
|
orig_parents = []
|
||||||
|
|
||||||
# Get the list of file changes
|
# Get the list of file changes
|
||||||
file_changes = []
|
file_changes = []
|
||||||
@ -1407,6 +1422,7 @@ class FastExportFilter(object):
|
|||||||
|
|
||||||
# Record ancestry graph
|
# Record ancestry graph
|
||||||
self._graph.add_commit_and_parents(commit.id, commit.get_parents())
|
self._graph.add_commit_and_parents(commit.id, commit.get_parents())
|
||||||
|
self._orig_graph.add_commit_and_parents(id_, orig_parents)
|
||||||
|
|
||||||
# Record the original list of file changes relative to first parent
|
# Record the original list of file changes relative to first parent
|
||||||
orig_file_changes = set(commit.file_changes)
|
orig_file_changes = set(commit.file_changes)
|
||||||
|
Loading…
Reference in New Issue
Block a user