Internals/PatchIndex

Patch Index internal representation

Patch index defines two data-types.

PatchId

  1. Uniquely identifies a patch
  2. Stores patch hash of the patch

example:

20021020200105-e9342-29561b8cad0e1c33e44e7c135cb7753667a67856

FileId

  1. Uniquely identifies a file
  2. Persistent to file move (ie, FileId does not change if the file is moved)
  3. Stores the name with which the file is created, along with the count on the number of times files had this name

example:

2#./tools/upload.cgi
(identifies the second file created with the FileName ./tools/upload.cgi)

Files and corresponding Data Structures

Path Index is stored in files repo_state, fid_map, fp_map, touch_map

repo_state

  • Type: (String, [PatchId])
  • Stores:
    • the inventory hash
    • the patch hashes of the repository patches(order is preserved)
  • Use:
    • the inventory hash is used to test if patch index is in sync with the repository
    • the patches hashes explicitly specify the ordering used in fp_map, fid_map
  • Note:
    • two repositories with same patches but different ordering, have different patch indexes.

example:

Inventory hash: a6e4db850b1174d8dd782c52068c6917af9b62b6e8e4d51f5955e9959359ea9a
Patches: 20021020200105-e9342-29561b8cad0e1c33e44e7c135cb7753667a67856, 20021020201112-e9342-adf7a83591a9d9217ae6a982ee76065798cb5105, 20021021080948-e9342-f230974d0257048c330c29bd91307d2fe90dbc6e

touch_map (also known as info_map)

  • Type: Map FileId (Bool, Set PatchId)
  • Stores:
    • the set of patches that modified a file
    • identifies the path to be a directory or a file
  • Use:
    • Serves the main purpose of patch index. It stores the correlation between a file and the set of patches that modified it.
    • Stores weather a FileId is a file or directory because changes and annotate need different selection of patches based on weather a path is a directory or file.

example:

1#./src -> False
1#./src -> 20110306104156-5ef8f-819275be3ae00439924ecda92547a0efcdf7711d
1#./src -> 20120705205712-5ef8f-d20e6633853cfa5da354b643587922335c412d3f

1#./Map.hs -> True
1#./Map.hs -> 20050406002216-50d88-3c861fbb7b02375b2a9dce828f13f458f7d180c3
1#./Map.hs -> 20050406210939-50d88-770ef6f38a581355d9de5312160a365925e6646a

fid_map

  • Type: Map FileName [(FileId, PatchId, Maybe PatchId)]
  • file name fn corresponds to file id fid from patch id pid1(inclusive) to patch id pid2(exclusive)
  • If a file has a span ending in Nothing (ie. pid2 is “-”), it means that the fn-fid mapping is still correct even after applying the last patch, in other words that it exists in pristine
  • the patches are ordered according to the order stored in repo_state
  • fid_map is used to identify the FileId with the given FileName, at any point in the repository state.

example:

./A -> 2#./A from pid1 to pid2
./A -> 1#./A from pid3 to pid1

(./A no longer exists in pristine) (recall each pidN is a patch hash like 20021020200105-e9342-29561b8cad0e1c33e44e7c135cb7753667a67856)

fp_map

  • Type: Map FileId [(FileName, PatchId, Maybe PatchId)]
  • file id fid has file name fn from patch id pid1(inclusive) to patch id pid2(exclusive)
  • If a file has a span ending in Nothing (ie. pid2 is “-”), it means that the fn-fid mapping is still correct even after applying the last patch, in other words that it exists in pristine
  • the patches are ordered by the ordering in repo_state
  • fp_map is used to identify the FileName with the given FileId, at any point in the repository state.

example:

1#./A -> ./D/B from pid1 to -
1#./A -> ./B from pid2 to pid1
1#./A -> ./A from pid3 to pid2

(./A exists in pristine)

A complete example

Creating a repository

    mkdir R
    darcs init --repo R
    cd R

Patch index:

   repo_state : (inventory_hash1, [])

   fid_spans : Map [ ]

   fp_spans: Map [ ]

   info_map: Map [ ]

Adding directories and a file

    mkdir d e
    echo "f1" > d/f
    darcs record -lam 'p1'

Patch index:

   repo_state : (inventory_hash2, [p1])

   fid_spans : Map [ ./d   -> [(1#./d,   pid1, Nothing)],
                     ./d/f -> [(1#./d/f, pid1, Nothing)],
                     ./e   -> [(1#./e,   pid1, Nothing)] ]

   fp_spans: Map [ 1#./d   -> [(./d,   pid1, Nothing)],
                   1#./d/f -> [(./d/f, pid1, Nothing)],
                   1#./e   -> [(./e,   pid1, Nothing)] ]

   info_map: Map [ 1#./d  -> (False, {pid1}),
                   1#./d/f -> (True,  {pid1}),
                   1#./e  -> (False, {pid1}) ]

Moving a directory

    darcs move d d2
    mkdir d
    echo "f2" > d/f
    darcs record -lam 'p2'

Patch index:

   repo_state : (inventory_hash3, [p1, p2])

   fid_spans : Map [ ./d   -> [(2#./d,   pid2, Nothing), (1#./d,   pid1, Just pid2)],
                     ./d/f -> [(2#./d/f, pid2, Nothing), (1#./d/f, pid1, Just pid2)],
                     ./e   -> [(1#./e,   pid1, Nothing)],
                     ./d2  -> [(1#./d,   pid2, Nothing)],
                     ./d2/f -> [(1#./d/f, pid2, Nothing)] ]

   fp_spans: Map [ 1#./d   -> [(./d2, pid2, Nothing), (./d,   pid1, Just pid2)],
                   2#./d   -> [(./d,  pid2, Nothing)],
                   1#./d/f -> [(./d2/f, pid2, Nothing), (./d/f, pid1, Just pid2)],
                   2#./d/f -> [(./d/f, pid2, Nothing)],
                   1#./e   -> [(./e,   pid1, Nothing)] ]

   info_map: Map [ 1#./d  -> (False, {pid1}),
                   2#./d  -> (False, {pid2}),
                   1#./d/f -> (True,  {pid1, pid2}),
                   2#./d/f -> (True,  {pid2}),
                   1#./e  -> (False, {pid1}) ]

How patch index identifies patches that modified a file

  • the user enters a command, say “darcs changes FILE”
  • the file id of FILE, fid1 is identified using fid_map.
  • the patches corresponding to fid1 are identified using info_map

Automatic update

Patch Index is kept in sync with the patches in the repository.

The synchronization of patch index is tested by computing the hash of inventory, and comparing it with the hash of inventory stored in repo_state. This test is run whenever patch index is used(by changes or annotate). If the patch index is not in sync, patch index is updated.

Patch index is automatically updated whenever the repository state changes. Automatic update was tested to be working upon apply, record, push, pull, amend-record, obliterate, unrecord, tag.

Automatic Creation

Patch index could be created at:

  • get
  • init
  • patch index darcs modifies the patches of an existing repository

A user interrupt using Ctrl-C at get or on existing repositories is caught, and patch index gets disabled.

Algorithm for creation of patch index

The algorithm first looks through each patch in the repository, and associates every patch to “changes” made to files. The change could be Touch, CreateFile, CreateDir, Remove, Rename, Invalid, DuplicateTouch.

For example, a patch p with primitives “addfile f”, “hunk f” and “move g h” will have changes CreateFile f, Touch f, Rename g h.

The conversion from a primitive-patch to a change is mostly a straightforward one to one mapping. However a directory move primitive-patch can have multiple Rename changes. A directory move will have renames for the directory and all files in that directory.

The end result of this pass will be a list of patch id’s, each patch id being associated with some changes.

The patch index is recursively build using this result. Let there be a patch index pi { repo_state, fid_map, fp_map, touch_map } build with all patches until a patch with id pid. pid has some changes associated with it.

  • Touch f: Let the file id of file f, be fid. touch_map will be updated, such that fid will have pid associated with it.
  • CreateFile f: New spans will be started on fid_map, and fp_map. pid will be associated with fid in touch_map. touch map will indicate that f is a file.
  • CreateDir f: New spans will be started on fid_map, and fp_map. pid will be associated with fid in touch_map. touch map will indicate that f is a directory.
  • Remove f: The span associated with fp in fid_map, and fid in fp_map will end. pid will be associated with fid in touch map.
  • Rename f1 f2: The old spans in fid_map and fp_map will end, and new spans will be started. pid will be associated with fid in touch map.
  • Invalid : is ignored
  • DuplicateTouch f: Is treated the same way as Touch

Algorithm for patch index update

The algorithm first finds the common prefix and the dissimilar suffixes between the patches in the repository and the patches tracked by patch index. The patch index of the common prefix is computed by undoing the dissimilar suffix of the patch index. The suffix of the repository is applied on the intermediate patch index, giving the updated patch index.

The undo works as follows: The spans in fp_map and fid_map with the start patch and the end patch both in the prefix will not be changed. The spans which have the start patch and the end patch in the suffix will be removed. The span with start patch in prefix and end patch in suffix will be modified to not have an end patch. In info_map, the patches in the suffix will be removed. If the patches corresponding to a fid become empty, the fid entry is removed.