Internals/PendingPatch

Overview
Commands that only edit pending
Commands that edit both pending and recorded
Properties
Mechanisms
Pitfalls
- Issues
Questions (and answers)

pending patch is wtf-y – Owen Stephens

Overview

The pending patch seems to play at least four roles:

First, it tracks changes that were made explicitly via commands like darcs add, darcs remove, darcs replace etc. Darcs can try to infer such changes from the state of the working tree (and does so if you give it appropriate options), but it would be bad if couldn’t do this explicitly and so that darcs remembers it:

looking for changes, especially file additions is potentially expensive,
it distracts users [prompting about irrelevant things], and
users would presumably want to express ideas like “stop tracking this file, but please don’t delete it!” (darcs remove)

Second, even if Darcs were to have only implicit adds/removes all the time, it would still need a mechanism to help interpret some low-level changes as their high-level equivalents. For example, some hunk changes in working could actually be interpreted either as token replace operations or straightforward hunk changes; likewise, moving a file could be seen as removing a file with the old name and adding a new or a darcs mv operation. The pending patch mechanism is used to avoid these ambiguities by explictly stating where high level operations like darcs token replace have taken place.

Third, as an upshot of tracking the removes and high-level primitive patches, Darcs also need to know the subset of hunk patches that they depend on. In the case of darcs remove, for example, we would need to have a hunk patch that deletes the file contents before appending the rmfile patch to it (this is only necessary if we want pending to be a plain old darcs patch, which sounds like a sensible thing to want). As for darcs replace it can sometimes happen that the new token already exists in your working directory, but you want to do the replace anyway. In these cases, Darcs offers the ability to “force” the replace by first creating hunk patches that revert the new token back to the old one (so the replace can go on top of those).

Fourth, (actually a generalisation of the second), it allows Darcs to track things that simply aren’t represented elsewhere in working, ie. changepref patches.

Commands that only edit pending

Most likely any command that uses patches and touches the working directory will have run into the pending patch one way or the other.

whatsnew

The whatsnew command does a standard diff from the unrecorded to recorded state.

Also, the variant whatsnew --look-for-adds shows users a distinction between files that were added explictly (upper case A) and those that were found via --look-for-adds (lower case A). To support this distinction, whatsnew does two diffs (both with/without the looked-for adds) and a further diff between the diffs.

add

When you use the darcs add command, Darcs needs to update the pending patch to remember that you have explicitly expressed an interest in the affected files or directories (it doesn’t need –look-for-adds to find them). After the usual sanity checks from the UI, it converts the adds into primitive patches and applies Darcs.Repository.addToPending.

What addToPending does is to take some changes (prim patches) that are supposed start at (apply to) the working state, and commute them backwards through (only) those unrecorded changes that are not in pending. The result is then appended to the pending patch, and afterwards (as always when we modify pending) sifted, see Sifting for pending.

remove

TODO

seems a bit magical, don’t really understand this business of gaps, eg. freeGap (addfile f_fp :>: rmfile f_fp :>: NilFL). What’s the significance of the addfile patche followed by the rmfile?
addToPending is involved

revert and unrevert

Reverting patches should be a fairly straightforward matter of removing things from pending. One challenge in getting revert right is that we want to offer the user the ability to pass darcs revert a set of files (ie. “only revert patches affecting these files”); but we also want to make sure we don’t overshoot by leaving behind a pending that only talks about those files. In the current implementation, we take a diff from recorded to unrecorded, which we split by cherry picking into an “leave alone” sequence followed by the patches we want to revert. The new pending patch is built by appending the inverted patches onto the recorded/unrecorded diff and hoping that sifting will do the right thing. (EYK: why not just make a new pending out of the leave-alone sequence?)

TODO unrevert

Commands that edit both pending and recorded

record, unrecord, and amend-record

Recording a patch involves adding a named patch to the repository. Adding a patch to the repository automatically triggers removing its prims from pending. In the case of Darcs record, this is a sensible thing to do because we can consider these prims as already having been accounted for by the new patch and therefore redundant/incorrect. Think of it as transferring these patches from pending to the newly created named patch. As with any other command that modifies the pending patch, this removal is followed by (sifting)[#mechanisms]. It’s not clear to me if in practice this sifted patch is any smaller than our reduced pending.

We can reason about unrecording patches the same way as recording. If recording means that we transfer prims from pending to the new patch; unrecording means that we have to transfer prims out of the soon-to-be-dead patch back into pending. So just as you remove from pending to add patches to the repo, you also add (prepend!) patches to pending when you remove them from the repo.

As currently implemented, amend-record involves removing the original patch (adding its contents to pending), and adding the updated patch (removing its contents from pending). This is almost like doing unrecord followed by record, except that we only have a single sifting phase (add to pending, remove from pending, sift). By rights, one would imagine that amend-recording a patch gives you the same pending patch as doing a single record would have done.

apply, pull, rebase unsuspend

Applying patches involves the mess that is tentativelyMergePatches_ (though not quite as bad nowadays as it was in the past). Basically when applying patches we need to worry about two things: conflicts between the patches themselves, and further conflicts with the unrecorded stuff.

Read this diagram from the bottom up:

          .
          |  resolution <-- conflict markers
          |
          .
 unrec2  / \  new2  <-- applied to the working we have
        /   \
       R2    U
        \   /
    new  \ /  unrec
          R

The context R refers to our current recorded state, U to our unrecorded state (pending + working). The new patches are the ones that would get applied on pristine, but we also need to account for the unrecorded state, so we have new2 which is the result of merging new with unrec. The resolution patches are the conflict markers; they get applied last.

For purposes of applying the patches to the working directory, we want new2 + resolution because it’s simpler to just lay them on top of the unrecorded state we have (U). But for purposes of computing a new pending patch, we need something that is going to sit on top of the new recorded state R2, so we take unrec2 + resolution. And where darcs patch theory is nice is that the resolution patch is the same for both purposes because we know it starts from the same diamond merged place. In any case, the new pending patch is just unrec2 followed by resolution, (and as with all other commands that modify pending) minimised with sifting.

An even more in-depth explanation (with pictures!) is now part of the haddocks for tentativelyMergePatches_.

obliterate, rebase suspend (presumably get –tag too)

Recall that obliterating is effectively just unrecord and revert. The implementation used to follow that idea in principle, but nowadays what happens is that we commute the effect of the patch(es) past pending (storing the commuted version as the new pending patch), then past the detected unrecorded changes, and finally apply their inverse to the working tree.

Properties

Some basic properties of the pending patch

Deleting the pending patch does not lead to an inconsistent repository. You may lose some unrecorded changes, but it won’t break Darcs.
Pending applies cleanly on top of the current recorded state. We can say that a repository has the successive states
1. recorded (pristine)
2. pending (pending patch only)
3. unrecorded (pending + working)
Implementation detail: We currently do not explicitly track the pending state in the Repository type.
Pending contains siftable patches (see primIsSiftable, Hunks and Binary patches for Prim.V1) only if depended on by non-siftable patches in pending.
The pending patch is always sorted and coalesced (see sortCoalesceFL).

Mechanisms

Append vs prepend

It could be useful to recognise the distinction between appending new patches to pending and prepending. Appending makes sense when you’re adding new information (eg. in the case of the darcs add command). Prepending is used internally when removing patches because you are transferring information from the recorded patch to pending. Think of it as redrawing borders: prepend expands pending leftwards into recorded territory; append expands it rightwards into unrecorded territory.

Diff from recorded to unrecord

For repository-local operations like whatsnew or add, darcs obtains a sequence of patches that take it from its recorded state to its unrecorded state. This sequence combines information from both the pending patch and a diff against the working repository. To avoid overlaps between pending/working, Darcs applies the pending patch and diffs working against the resulting tree. It could in principle just return these two sequences in order (pending then diff; there are nowadays additional procedures that do so); in practice, it also massages the combined sequence, which may involve patches being reordered through commutation, canceled out, coalesced etc. This is used for commands that don’t (have to) care about pending vs. detected.

For efficiency, it’s possible to control what directories/files Darcs looks into when computing its diff; however, the patches we return may contain results that are not related to the files in question (see issue1825). Commands that use this mechanism may still need to filter the output.

Sifting for pending

Before writing out the pending patch, we always simplify it by “sifting for pending”. We can think of sifting as a clean-up phase. It’s simpler for operations that use the pending patch to simply dump things in there to make sure pending has the right information; and for other considerations to come in as a second pass. Sifting means simplifying the pending patch by

coalescing it
cancelling out adjacent patch/inverse pairs
getting rid of extraneous hunk/binary patches

Basically anything that can be inferred from a diff and which isn’t needed as a dependency (consider darcs replace –force) has to go.

Consistency check

When we read pending we check that it applies to pristine. If it doesn’t we suggest darcs repair which nowadays also repairs the pending patch.

Pitfalls

The pending patch has historically been a source for grief in Darcs. Here are some of things that have gone wrong in the past, with some attempts at broadly breaking them into categories

Too much pending: pending patch seems to have things it in it should not have
Not enough pending: pending patch seems to be missing prims
Confused by rename: some rename somewhere caused a problem
Kaboom: pending patch is inconsistent/buggy
Performance: some kind of horrible performance thing]
Working: things related to the working directory itself (implications of pending patch for what happens to the working dir; whether we’re diffing right)
Other…

AFAIK, all of the issues in the first 4 categories (“too much”, “not enough”, “kaboom”, and “confused by rename”) are fixed in the head. There remain some from the last 3 groups.

Issues

(deleted, you can search the tracker for an up-to-date list)

Questions (and answers)

how does Darcs behave when you add things to pending and then change your mind?

Answer: Not sure I understand this question. I am assuming this is about hitting Ctrl-C or otherwise terminating darcs. Nowadays pending follows the standard transaction protocol: initially it gets copied to pending.tentative, only that is being modified, and at the end we finalize by renaming it (atomically) back to pending. If the transaction gets aborted, no matter how, the old pending remains untouched.
[Owen] why do we have to commute with pending on record?

Answer: Recording selects unrecorded changes (consisting of pending, followed by changes detected with a diff) in a way that respects dependencies, so all the changes you selected have been commuted backwards (to the left, so they apply to the pristine state), and whatever remains could become the new pending (after sifting) … if it weren’t for the fact that the various –look-for-* options may actually (temporarily) add “pending” changes (to the sequence we select from) that haven’t been in pending to start with. Since the user is free to deselect them, it would be wrong to add them to the pending patch. Instead we have to try and cancel/coalesce the inverse of the recorded changes one by one with every single change in pending, which also involves commutation.
does –look-for-adds also mean “look for removes?” or does darcs systematically look for removes?*

Answer: The latter. Darcs always detects removes of (tracked) files.
how do conflicts with pending work?

Answer: Generally, unrecorded changes (pending as well as any detected changes) are subject to possible conflicts whenever we add patches to the repository “from the outside”, which includes apply, pull, and rebase unsuspend (but not record and amend, which actually turn unrecorded changes into named patches). What we do is create a temporary “anonymous” named patch consisting of all (relevant) unrecorded changes and then merge, discarding the (possibly conflicted) merged version of the anonymous “unrecorded” patch, and remember only the “resolution” (the conflict markup) and apply that to working (after an extra prompt).

When we remove patches from the repository (obliterate, rebase suspend), there can be no conflicts with pending; instead, pending (generally: unrecorded changes) may depend on the removed patch. This must not be allowed, so you have to revert any such changes first. In principle it would be possible to treat this as a sort of conflict, too, by merging the inverse of the effect of the removed patch with unrecorded as above, but this is currently not done.
when do we want to prepend (see tentativelyRemove; amend, get, unrecord) and when do we want to append to pending (get, rollback, unrecord, unrevert)?

Answer: See Append vs prepend.
why do we have different behaviours (removing from pending vs adding inverses?)

Answer: Removing a patch from a sequence is only a well-defined operation if that patch is either at the start or the end of the sequence, so in general you first have to commute it to either place. And the same goes for adding a patch to a sequence. This means that you can’t simply “add P to sequence Ps”, you have to say at which end and make sure the contexts match up. By itself, adding an inverse (prim) patch does not do any “removing”, you have to explicitly do that by e.g. applying sortCoalesceFL to the result, which among other things cancels inverses.