Named patch

A collection of primitive patches, recorded together with the following attributes:

name

a single line of text

date

the timestamp taken at record-time

author

arbitrary text, usually in the format “John De Bar <john@debar.fam>”

log

a longer description of the patch, possibly multiline. When parsing a patch, the first space character is removed from each line of the log. When computing a patch’s hash, trailing newlines are removed from each log line before concatenating.
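The two log transformations described above can be sketched as follows. This is only an illustration, not the darcs source: the function names `unindentLogLine` and `logForHash` are hypothetical, and the sketch assumes that "the first space character" means a single leading space on the line.

```haskell
import Data.List (dropWhileEnd)

-- Parsing step (assumption: the removed space is the leading one).
-- Lines that do not start with a space are returned unchanged.
unindentLogLine :: String -> String
unindentLogLine (' ' : rest) = rest
unindentLogLine line         = line

-- Hashing step: strip trailing newlines from every log line,
-- then concatenate the lines with no separator.
logForHash :: [String] -> String
logForHash = concatMap (dropWhileEnd (== '\n'))
```

Note that the concatenation adds no separator, so under this reading the log lines `["ab", "c"]` and `["a", "bc"]` contribute the same bytes to the hash.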

All these attributes are specified or determined when the patch is recorded and are meant to be read-only from then on.

There are two other attributes on which the user has no impact:

inverted

a boolean flag that indicates whether the patch is the result of a rollback (NB: obsolete, since DarcsTwo uses a different way of doing rollbacks, but the flag is still there for backward compatibility). The inverted state is indicated by a ‘*’ or ‘-’ character immediately preceding the patch’s date.

hash

a compact representation of the attribute tuple, which is (supposed to be) globally unique

Hash computation

The unique hash id of any patch is computed by the following Haskell code in src/Darcs/Patch/Info.lhs:

make_filename :: PatchInfo -> String
make_filename pi =
    showIsoDateTime d++"-"++sha1_a++"-"++sha1PS sha1_me++".gz"
        where b2ps True = packString "t"
              b2ps False = packString "f"
              sha1_me = concatPS [_pi_name pi,
                                  _pi_author pi,
                                  _pi_date pi,
                                  concatPS $ _pi_log pi,
                                  b2ps $ is_inverted pi]
              d = readPatchDate $ _pi_date pi
              sha1_a = take 5 $ sha1PS $ _pi_author pi

In other words, the hash is composed of the following parts. The first three are separated by dashes (“-”), and the last is appended directly:

  1. the UTC time stamp, in ISO format, like “20080519235311”
  2. the first five hex digits of the SHA1 of the author
  3. the SHA1 of the tuple (name, author, date, log, inverted)
  4. a final ".gz" (yes, this is there even when the patch is NOT compressed)
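To make the layout concrete, here is a minimal sketch that splits such a name back into its parts. It is not part of darcs: the type `PatchFileName` and the functions `splitOn` and `parsePatchFileName` are hypothetical names. The sketch assumes a 14-digit time stamp, as in the example above, and relies on a SHA1 digest being 40 hex digits long.

```haskell
import Data.List (stripPrefix)

-- Hypothetical record for the three dash-separated parts.
data PatchFileName = PatchFileName
  { pfTimestamp  :: String  -- e.g. "20080519235311"
  , pfAuthorHash :: String  -- first five hex digits of SHA1(author)
  , pfInfoHash   :: String  -- full SHA1 of the attribute tuple
  } deriving (Eq, Show)

-- Split a string on every occurrence of a separator character.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- Parse "<timestamp>-<author5>-<sha1>.gz"; Nothing if the shape is wrong.
parsePatchFileName :: String -> Maybe PatchFileName
parsePatchFileName name = do
  base <- stripSuffix ".gz" name
  case splitOn '-' base of
    [ts, a, h]
      | length ts == 14, length a == 5, length h == 40
      -> Just (PatchFileName ts a h)
    _ -> Nothing
  where
    stripSuffix suf s = reverse <$> stripPrefix (reverse suf) (reverse s)
```

The parser checks only the shape of the name; it does not verify that the digits are valid hexadecimal or that the hashes match any patch.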

Note that the hunks (that is, the PrimitivePatch collection, the actual changes the NamedPatch carries) are not included in the formula. This reflects a strong assumption in darcs: there cannot be two different patches with exactly the same set of attributes, and therefore with the same hash.

On-disk patch name

Named patches are stored in the meta directory _darcs/patches.

Historically the actual filename coincided with the hash of the patch, but DarcsTwo introduced the HashedFormat, which uses a completely different scheme.

A HashedFormat repository stores its patches in files whose names are composed of two parts:

  1. a ten-digit zero-padded number, the uncompressed size of the whole patch
  2. the 64-hex-digit SHA256 hash of the whole patch
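The two-part name can be sketched as below. This is an illustration, not the darcs implementation: `hashedFileName` is a hypothetical name, the SHA256 digest is taken as an already-computed hex string, and the dash between the two parts is an assumption, since the text above does not say how they are joined.

```haskell
import Text.Printf (printf)

-- Build a HashedFormat-style file name from the uncompressed patch size
-- and the hex-encoded SHA256 of the patch contents.
-- Assumption: the two parts are joined with a dash.
hashedFileName :: Int -> String -> String
hashedFileName size sha256Hex = printf "%010d-%s" size sha256Hex
```

For a 1234-byte patch the size part would be `0000001234`. The hash is passed in rather than computed here because SHA256 is not available in the Haskell base library.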

This has several benefits:

  1. darcs is able to verify that the content of a patch wasn’t mangled in any way, because the filename is basically a checksum of the whole patch, hunks included
  2. it allows a better way of sharing patches between repositories, see CacheSystem

Because darcs commutes a patch with the others in the repository, the patch's contents (its hunks) may change at any time. As a consequence:

  1. there’s no one-to-one relation between a particular hash name and the name of the file containing the patch
  2. there can be multiple instances of the same patch collected under the _darcs/patches metadir; of course, at any given time, at most one of them is actually listed in the RepositoryInventory

Note also that even if the patch is compressed on disk, its filename does not end with ".gz".

See also