Internals/Repository
This page describes the structure of a darcs repository. Yes, that _darcs thingy that appears after you run darcs initialize.
In this page, we describe repositories without referring to Darcs code. You may want to start by reading the Model page for a more global view of Darcs repositories.
This is a work in progress, so I will leave a lot of todos everywhere.
(todo: mention patch_index)
You can look into gzipped files with zless. Almost everything in _darcs is gzipped.
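When scripting around _darcs, a reader that checks for the gzip magic bytes (0x1f 0x8b) before decompressing avoids having to know which files are compressed, since a few (such as hashed_inventory) are not. A minimal Python sketch; maybe_gunzip and read_darcs_file are hypothetical helpers, not part of darcs.

```python
import gzip
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def maybe_gunzip(data: bytes) -> bytes:
    """Decompress data if it is a gzip stream, otherwise return it unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


def read_darcs_file(path) -> bytes:
    """Read any file under _darcs, whether or not it is gzipped."""
    return maybe_gunzip(Path(path).read_bytes())
```

For instance, read_darcs_file("_darcs/hashed_inventory") returns the file as-is, while a file from _darcs/patches is decompressed first.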
_darcs after an initialization
This is what we have after darcs init:
_darcs/
|-- format
|-- hashed_inventory
|-- patches
|-- prefs
| |-- binaries
| |-- boring
| `-- motd
`-- pristine.hashed
  `-- e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
format: describes the format of the repository. With the current version, it contains the lines hashed and darcs-2. hashed refers to the repository format (currently the only supported one), and darcs-2 refers to the patch semantics format (darcs-1 is still available). See the Darcs 2 description page and http://article.gmane.org/gmane.comp.version-control.darcs.devel/5393.
hashed_inventory: a file that contains both a hash and an inventory. It is not gzipped.
- the first line, starting with pristine:, gives the hash of the root of the pristine tree;
- the following lines list the latest patches of the history, from older to newer (if there are any patches).
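The two parts of hashed_inventory can be separated with a few lines of Python. This is a sketch based only on the description above: it assumes the first line has the form pristine:<hash> and treats everything after it as the inventory. split_hashed_inventory is a hypothetical name.

```python
def split_hashed_inventory(text: str) -> tuple[str, str]:
    """Split hashed_inventory into (pristine root hash, inventory text)."""
    first, _, rest = text.partition("\n")
    prefix = "pristine:"
    if not first.startswith(prefix):
        raise ValueError("hashed_inventory should start with 'pristine:'")
    # Tolerate optional spaces around the hash.
    return first[len(prefix):].strip(), rest
```

Right after darcs init, the inventory part is empty and only the root hash is present.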
patches: a directory containing gzipped files, each one holding a named patch. This directory is initially empty.
prefs: plain text files that contain various options.
pristine.hashed: a directory of gzipped files, each one holding either a directory listing or a file's content. Together, these files contain the last recorded state of the working copy. Here, the file e3b0... describes the (still empty) root directory of the pristine tree.
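A quick check supports the reading that pristine files are named after the SHA-256 digest of their uncompressed content. Assuming an empty directory is serialized as zero bytes, the digest of the empty string is exactly the e3b0... name in the tree above. A Python sketch using only hashlib; pristine_name is a hypothetical helper.

```python
import hashlib

# Name of the only file in pristine.hashed right after darcs init.
EMPTY_ROOT = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"


def pristine_name(content: bytes) -> str:
    """Hex SHA-256 of uncompressed content (assumed pristine naming scheme)."""
    return hashlib.sha256(content).hexdigest()


# An empty directory listing (assumed to be zero bytes) hashes to EMPTY_ROOT.
assert pristine_name(b"") == EMPTY_ROOT
```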
After preparing a patch (before recording)
Let’s start preparing a patch:
$ echo "file content" > somefile
$ darcs add somefile
We now have extra files in _darcs:
_darcs/
|-- format
|-- hashed_inventory
|-- index
|-- index_invalid
|-- patches
| |-- pending
| `-- pending.tentative
|-- prefs
| |-- binaries
| |-- boring
| `-- motd
|-- pristine.hashed
| `-- e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
|-- tentative_hashed_inventory
`-- tentative_pristine
index: an optimization file introduced in darcs 2.3.1; see Index.
index_invalid: same; see Index.
patches/pending: information about the patch being prepared. It now contains addfile ./somefile. See PendingPatch.
patches/pending.tentative: todo
tentative_hashed_inventory: todo
tentative_pristine: todo
Now record:
$ darcs record -a -m "My first patch"
After recording a patch
What we have now in _darcs:
_darcs/
|-- format
|-- hashed_inventory
|-- index
|-- index_invalid
|-- inventories
| `-- 0000000205-0332fe4dd444b6b9f94ba71ea1ce3b6fa7cb564e5d4b9f6c0fc7044073ee08db
|-- patches
| |-- 0000000172-de1342a0b690a33830231c0929ce6b63fa23315c47f6a1d6552a34f744aeaa9b
| |-- pending
| `-- pending.tentative
|-- prefs
| |-- binaries
| |-- boring
| `-- motd
|-- pristine.hashed
| |-- 694b27f021c4861b3373cd5ddbc42695c056d0a4297d2d85e2dae040a84e61df
| |-- 83bf551b64dc5f0e5684e1e42268c4ec56df209a4604cd7e936c169c3fa47603
| `-- e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
`-- tentative_pristine
New or changed files:
patches/0000000172-de13...: the (gzipped) patch we just recorded.
inventories directory: contains the inventory corresponding to the current history.
hashed_inventory: contains the updated hash of the pristine tree of the repository, and the same contents as the inventory file mentioned above.
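Unlike pristine files, the names in patches/ and inventories/ carry a ten-digit prefix. A plausible reading, stated here as an assumption rather than documented fact, is that the prefix is the size in bytes of the uncompressed content, zero-padded to ten digits, followed by a dash and the SHA-256 of that content. The Python sketch below builds and checks names under that assumption; cache_name and matches_name are hypothetical helpers, not darcs functions.

```python
import gzip
import hashlib


def cache_name(content: bytes) -> str:
    """Build a '<size>-<sha256>' name (assumed scheme for patches/ and inventories/)."""
    return f"{len(content):010d}-{hashlib.sha256(content).hexdigest()}"


def matches_name(name: str, stored: bytes) -> bool:
    """Check a stored file (gzipped or not) against its assumed name."""
    content = gzip.decompress(stored) if stored[:2] == b"\x1f\x8b" else stored
    return cache_name(content) == name
```

If the assumption holds, the patch 0000000172-de13... is 172 bytes long once decompressed.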
The contents of the patch we recorded are:
[my first patch
Guillaume <me@mail.com>**20101016142609
Ignore-this: 9af21412b424aef171164f2b98bc9d10
] addfile ./somefile
hunk ./somefile 1
+file content
So it is really a darcs patch, containing:
- metadata: the patch name, the author name, a timestamp, and a random salt (the Ignore-this line) to ensure the patch is unique;
- data: a list of primitive patches that make up the current patch. Here we have two primitive patches of different kinds: an addfile and a hunk.
Let us look at the first line of the file hashed_inventory. It has a hash that starts with 83bf.... Now let us look into pristine.hashed: it has two more files. Inside the file 83bf... we find:
file:
somefile
694b27f021c4861b3373cd5ddbc42695c056d0a4297d2d85e2dae040a84e61df
This means that the current recorded state of the working copy contains a file called somefile, whose content is given by the hashed file 694b.... Let us look into that last file:
file content
So, hashed_inventory describes the current recorded state of the repository, and its first line indicates how to build the working copy. This is the information darcs needs when cloning a repository with darcs clone.
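The walk just described (first line of hashed_inventory, then the root listing, then file contents) can be sketched as a recursive function. This is an illustration, not darcs code. It assumes the store is a plain dict from pristine names to decompressed contents, that a listing is a sequence of three-line entries (kind, name, hash) as in the 83bf... file, and that subdirectories use the kind directory:, which is an assumption since the example only shows file:.

```python
def parse_listing(listing: bytes) -> list[tuple[str, str, str]]:
    """Split a directory listing into (kind, name, hash) entries."""
    lines = listing.decode().splitlines()
    if len(lines) % 3:
        raise ValueError("listing should consist of three-line entries")
    return [tuple(lines[i:i + 3]) for i in range(0, len(lines), 3)]


def rebuild(root: str, store: dict[str, bytes], prefix: str = "") -> dict[str, bytes]:
    """Map each file path of the recorded tree to its content."""
    tree: dict[str, bytes] = {}
    for kind, name, digest in parse_listing(store[root]):
        path = prefix + name
        if kind == "file:":
            tree[path] = store[digest]
        elif kind == "directory:":  # assumed marker for subdirectories
            tree.update(rebuild(digest, store, path + "/"))
        else:
            raise ValueError(f"unknown entry kind {kind!r}")
    return tree
```

With the repository of this page, rebuild would be called with the hash from the pristine: line (83bf...) and, if the assumptions hold, return a single entry mapping somefile to its content.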
Now one remark. Why do we keep the file pristine.hashed/e3b0... if we no longer need it to build the current working copy? There are two reasons for this. First, it is a little faster not to garbage-collect unused pristine files after each change to the repository. But most importantly, if the repository is public and someone is in the process of cloning it, you don’t want pristine files of the version being cloned to disappear while the clone is underway.
However, one can clean up this directory manually by running darcs optimize clean. After this command, _darcs contains:
_darcs/
|-- format
|-- hashed_inventory
|-- index
|-- index_invalid
|-- inventories
| `-- 0000000205-0332fe4dd444b6b9f94ba71ea1ce3b6fa7cb564e5d4b9f6c0fc7044073ee08db
|-- patches
| |-- 0000000172-de1342a0b690a33830231c0929ce6b63fa23315c47f6a1d6552a34f744aeaa9b
| |-- pending
| `-- pending.tentative
|-- prefs
| |-- binaries
| |-- boring
| `-- motd
|-- pristine.hashed
| |-- 694b27f021c4861b3373cd5ddbc42695c056d0a4297d2d85e2dae040a84e61df
| `-- 83bf551b64dc5f0e5684e1e42268c4ec56df209a4604cd7e936c169c3fa47603
`-- tentative_pristine
So we got rid of the e3b0... file, which is no longer useful. Over time, your darcs repositories may grow in size because the pristine.hashed directory accumulates files. Run darcs optimize clean if you need disk space.
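Conceptually, cleaning pristine.hashed is a mark-and-sweep: mark every file reachable from the current root hash, then remove the rest. The Python sketch below shows that idea on an abstract graph, a dict from each pristine name to the names it references; it is an illustration of the technique, not darcs's implementation, and unreachable is a hypothetical name.

```python
def unreachable(root: str, refs: dict[str, list[str]], files: set[str]) -> set[str]:
    """Return the pristine files not reachable from root (candidates for removal)."""
    marked: set[str] = set()
    stack = [root]
    while stack:  # mark phase: iterative depth-first search from the root
        name = stack.pop()
        if name in marked:
            continue
        marked.add(name)
        stack.extend(refs.get(name, []))  # file contents reference nothing
    return files - marked  # sweep phase: whatever was never marked
```

Using the abbreviated names of this page, unreachable("83bf", {"83bf": ["694b"]}, {"83bf", "694b", "e3b0"}) returns {"e3b0"}, the file that darcs optimize clean removed above.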
hashed_inventory, inventory
An inventory is a file that describes the history of a repository by listing patches from older to newer. An inventory may start with the lines:
Starting with inventory:
0000043508-7a6b...
This means that the file does not contain the complete history of the repository, and that to look into older patches you have to open the inventory file named by the hash 0000043508-7a6b....
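Reading the full history therefore means following these back-pointers until an inventory without a Starting with inventory: header is reached. A Python sketch, assuming the header is exactly the two lines shown above and that inventories are available as decompressed text in a dict keyed by file name; newest is the inventory part of hashed_inventory (without its pristine: line). split_header and load_history are hypothetical names.

```python
from typing import Optional

HEADER = "Starting with inventory:"


def split_header(text: str) -> tuple[Optional[str], str]:
    """Return (name of the older inventory or None, remaining entries)."""
    lines = text.split("\n")
    if len(lines) >= 2 and lines[0] == HEADER:
        return lines[1].strip(), "\n".join(lines[2:])
    return None, text


def load_history(newest: str, inventories: dict[str, str]) -> list[str]:
    """Follow the back-pointers and return inventory bodies, oldest first."""
    bodies: list[str] = []
    current: Optional[str] = newest
    while current is not None:
        older, body = split_header(current)
        bodies.append(body)
        current = inventories[older] if older else None
    bodies.reverse()
    return bodies
```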
hashed_inventory is the inventory of the current state of the repository. The inventories subdirectory stores the other inventories that make up the history of the repository (and also “older” inventories that are no longer relevant, such as inventories that contain deleted or modified patches).
Let us take a repository that already has many patches, and look at one inventory file from _darcs/inventories/:
Starting with inventory:
0000009036-9cbf750ff34fa7b3940af47b7c95ec812d2e536f5feada8d0e89ed530cecddcc
[TAG 1.5.3
Guillaume <me@mail.com>**20100513150110
Ignore-this: 4d602c25b18ca30228400f8800e27253
]
hash: 0000005948-e154869978642799facaca2180634f353d45df6e7478244f4fb16ea831ec612c
[switch to GHC 6.12 Prelude, fix warnings and take sme advice from hlint
Guillaume <me@mail.com>**20100604121359
Ignore-this: 7286831df91ffb8974deeb6a67527fa0
]
...
If we look at the file _darcs/inventories/0000009036-9cbf..., we find:
Starting with inventory:
0000005042-37894faa0a3f90fcba049147fdb28490d53b1a27b5763feff3a940906a8e0823
[TAG 1.5.2
Guillaume <me@mail.com>**20091110191538
Ignore-this: 7af98721b507b5b53d95688aeee45eff
]
hash: 0000003430-515b0a6e2c0fd55f0fb7fdf85b59387ee78a7c97306b56cd5767e0afedc62303
[comment no longer relevant
Guillaume <me@mail.com>**20100217132511
Ignore-this: e854183117a8d980ccab7efdf5a66a3d
]
hash: 0000000232-c7d79d1acf8a1847869c73e7852937b91d65a179f91e3d5b0581a354f6596cfe
[defer more to getMods
Guillaume <me@mail.com>**20100217173918
Ignore-this: f6e2633492d31565723729e787a62dd2
]
The idea of splitting inventory files is to avoid making darcs open and modify files that are too big when running commands like record or pull. But splitting too much (say, at every patch) would make darcs open too many files for a lot of operations. So the heuristic used is to split inventory files on tags. This follows the assumption that one cannot modify the history before the last tag of the repository (to do so, you have to unrecord the tag).
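The layout of the two excerpts above, where each inventory file begins with a tag and points back to the previous one, can be mimicked by starting a new segment at every tag. This Python sketch only illustrates that grouping: it assumes tags are recognizable by a name starting with TAG (as in the listings above) and makes no claim about exactly when darcs decides to split. split_at_tags is a hypothetical name.

```python
def split_at_tags(names: list[str]) -> list[list[str]]:
    """Group patch names (oldest first) into segments that each start at a tag."""
    segments: list[list[str]] = [[]]
    for name in names:
        if name.startswith("TAG ") and segments[-1]:
            segments.append([])  # a tag opens a new inventory segment
        segments[-1].append(name)
    return [seg for seg in segments if seg]
```

Under this reading, the last segment is what hashed_inventory holds and the earlier ones are the files in inventories/.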
Note that inventory files contain the metadata of patches but not their contents. For the contents there is a hash, which is exactly the name of the file inside _darcs/patches/ that holds the patch.
Why do inventory files store patch metadata, and not only the hash? This is for lazy repositories. In lazy repositories you don’t download patch files, but you do have the inventory files. So at least you can run darcs log without having to download extra files. However, running darcs log -v (which shows the contents of every patch) will automatically download all the patches.