This page describes the structure of a darcs repository. Yes, this _darcs thingy that appears after you do a “darcs initialize”! In this page, we describe repositories without referring to Darcs code. You may want to start by reading the Model page, to have a more global vision of Darcs repositories.
This is work in progress, so I will put a lot of TODOs everywhere.
You can look into gzipped files with zless. Almost everything in _darcs is gzipped.
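If zless is not at hand, the same inspection can be sketched in Python. This is a minimal helper, not part of darcs; the name read_repo_file is made up for this page. It checks for the gzip magic bytes so that it also works on the few files that are plain text.

```python
import gzip
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def read_repo_file(path):
    """Return the bytes of a file under _darcs, gunzipping it when needed.

    Hypothetical helper for exploring a repository by hand.
    """
    raw = Path(path).read_bytes()
    if raw.startswith(GZIP_MAGIC):
        return gzip.decompress(raw)
    return raw  # already plain text, nothing to decompress
```

For example, read_repo_file("_darcs/format").decode() would return the text of the format file, whether or not it is compressed.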
This is what we have after darcs initialize:
    _darcs/
    |-- format
    |-- hashed_inventory
    |-- patches
    |-- prefs
    |   |-- binaries
    |   |-- boring
    |   `-- motd
    `-- pristine.hashed
        `-- e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
The format file contains the two lines hashed and darcs-2. This file is read by darcs before attempting to read from or write to the repository. hashed refers to the repository format, and darcs-2 refers to the patch format. For more information, see the Darcs 2 description page and http://article.gmane.org/gmane.comp.version-control.darcs.devel/5393
hashed_inventory is a plain text file describing the last recorded state of the repository.
patches is a directory containing gzipped files, each one containing a named patch. This directory is initially empty.
prefs is a directory of plain text files that contain various options.
pristine.hashed contains gzipped files, each one holding the content of either a directory or a file. The contents of all directories and files of the last recorded version of the repository are present. Here, the file e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 is present to describe the empty root directory of the repository.
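The name e3b0c442... is not arbitrary: it is the SHA-256 digest of empty input, which matches an empty root directory having an empty description. The sketch below checks this. The assumption (not stated on this page) is that files in pristine.hashed are named after the SHA-256 of their uncompressed content; the function name content_name is made up.

```python
import hashlib

# File name found in pristine.hashed right after "darcs initialize".
EMPTY_ROOT = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"


def content_name(data: bytes) -> str:
    """Hex SHA-256 digest of some uncompressed content (assumed naming scheme)."""
    return hashlib.sha256(data).hexdigest()


# An empty directory has an empty description, so hash zero bytes.
print(content_name(b"") == EMPTY_ROOT)
```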
Let’s start preparing a patch:
    $ echo "file content" > somefile
    $ darcs add somefile
We now have extra files in _darcs:
    _darcs/
    |-- format
    |-- hashed_inventory
    |-- index
    |-- index_invalid
    |-- patches
    |   |-- pending
    |   `-- pending.tentative
    |-- prefs
    |   |-- binaries
    |   |-- boring
    |   `-- motd
    |-- pristine.hashed
    |   `-- e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    |-- tentative_hashed_inventory
    `-- tentative_pristine
index: extra optimization file added by darcs since 2.3.1
patches/pending: the patch being built. Contains now
$ darcs record -a -m"my first patch"
What we have now in _darcs:
    _darcs/
    |-- format
    |-- hashed_inventory
    |-- index
    |-- index_invalid
    |-- inventories
    |   `-- 0000000205-0332fe4dd444b6b9f94ba71ea1ce3b6fa7cb564e5d4b9f6c0fc7044073ee08db
    |-- patches
    |   |-- 0000000172-de1342a0b690a33830231c0929ce6b63fa23315c47f6a1d6552a34f744aeaa9b
    |   |-- pending
    |   `-- pending.tentative
    |-- prefs
    |   |-- binaries
    |   |-- boring
    |   `-- motd
    |-- pristine.hashed
    |   |-- 694b27f021c4861b3373cd5ddbc42695c056d0a4297d2d85e2dae040a84e61df
    |   |-- 83bf551b64dc5f0e5684e1e42268c4ec56df209a4604cd7e936c169c3fa47603
    |   `-- e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    `-- tentative_pristine
hashed_inventory: its content has changed, more on that later.
inventories directory: contains files of the same kind as hashed_inventory, more on that later.
patches/0000000172-...: gzipped patch.
The contents of this last patch are:
    [my first patch
    Guillaume <firstname.lastname@example.org>**20101016142609
     Ignore-this: 9af21412b424aef171164f2b98bc9d10
    ] addfile ./somefile
    hunk ./somefile 1
    +file content
So it really is a darcs patch, with its metadata (name, author name, timestamp, and an extra hash to be sure no confusion can be made) and its data: an addfile, then one hunk consisting of a one-line addition to the file somefile.
pristine.hashed has two more files. One of them, 83bf551b64dc5f0e5684e1e42268c4ec56df209a4604cd7e936c169c3fa47603, contains:
    file:
    somefile
    694b27f021c4861b3373cd5ddbc42695c056d0a4297d2d85e2dae040a84e61df
This last file is in fact the description of the last recorded state of the repository: an initial directory with the file somefile, whose contents are given in the file 694b.... This is how darcs gets the contents of files when doing a darcs get. But wait, how did I know that the file 83bf... was the description of the base directory of the last recorded state? Well, I know it because hashed_inventory now contains:
    pristine:83bf551b64dc5f0e5684e1e42268c4ec56df209a4604cd7e936c169c3fa47603
    [my first patch
    Guillaume <email@example.com>**20101016142609
     Ignore-this: 9af21412b424aef171164f2b98bc9d10
    ]
    hash: 0000000172-de1342a0b690a33830231c0929ce6b63fa23315c47f6a1d6552a34f744aeaa9b
The hashed_inventory file describes the current recorded state of the repository, and its first line gives the file name of the current root. That means darcs get has all the information it needs to retrieve files by looking at this hashed_inventory file first.
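To make the lookup concrete, here is a rough Python sketch that pulls the pristine root and the patch file names out of the text of hashed_inventory. It is not darcs code: the function name read_inventory is invented, and the regular expressions only assume the layout shown above (a pristine: line, then hash: lines after each patch header).

```python
import re


def read_inventory(text):
    """Return (pristine_root, patch_files) from uncompressed inventory text.

    Assumes the layout shown on this page; patch names that themselves
    contain lines starting with "hash:" are not handled.
    """
    root_match = re.search(r"^pristine:(\S+)", text, re.MULTILINE)
    pristine_root = root_match.group(1) if root_match else None
    # A hash line may follow the closing "]" on the same line or on the next.
    patch_files = re.findall(r"^\]?\s*hash:\s*(\S+)", text, re.MULTILINE)
    return pristine_root, patch_files
```

With the inventory above, the first element is the 83bf... name to look up in pristine.hashed, and the second is a one-element list naming the file in _darcs/patches/.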
Now one remark. Why do we keep the file pristine.hashed/e3b0... if we no longer need it? Well, that’s because darcs wants to be fast and does not delete the pristine files over time. This is something we could think of implementing, to see whether a “tidying record” can be as fast as the current record. If you run darcs optimize in that directory, _darcs now contains:
    _darcs/
    |-- format
    |-- hashed_inventory
    |-- index
    |-- index_invalid
    |-- inventories
    |   `-- 0000000205-0332fe4dd444b6b9f94ba71ea1ce3b6fa7cb564e5d4b9f6c0fc7044073ee08db
    |-- patches
    |   |-- 0000000172-de1342a0b690a33830231c0929ce6b63fa23315c47f6a1d6552a34f744aeaa9b
    |   |-- pending
    |   `-- pending.tentative
    |-- prefs
    |   |-- binaries
    |   |-- boring
    |   `-- motd
    |-- pristine.hashed
    |   |-- 694b27f021c4861b3373cd5ddbc42695c056d0a4297d2d85e2dae040a84e61df
    |   `-- 83bf551b64dc5f0e5684e1e42268c4ec56df209a4604cd7e936c169c3fa47603
    `-- tentative_pristine
So we got rid of the e3b0... file that is no longer useful. Over time your darcs repositories may grow in size because the pristine.hashed directory accumulates files. Run “darcs optimize” if you are in desperate need of disk space (the effect is dramatic if you have big files, like binary files, in your repository). See also the GrowingPristineProblem.
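One way to picture which pristine files are stale is a reachability walk from the current root. The sketch below is only an illustration and says nothing about how darcs optimize is implemented. It assumes a directory description is a sequence of (kind, name, hash) triples separated by whitespace, as in the somefile example; the directory: kind is an assumption, since this page only shows file: entries. The names parse_listing, reachable, stale and the load callback are all made up.

```python
def parse_listing(text):
    """Split a directory description into (kind, name, hash) triples."""
    tokens = text.split()
    entries = []
    for i in range(0, len(tokens) - 2, 3):
        kind, name, digest = tokens[i:i + 3]
        entries.append((kind.rstrip(":"), name, digest))
    return entries


def reachable(root, load):
    """Collect every pristine name reachable from the root description.

    load is a hypothetical callable: pristine name -> uncompressed text.
    """
    seen = {root}
    pending = [root]
    while pending:
        for kind, _name, child in parse_listing(load(pending.pop())):
            if child in seen:
                continue
            seen.add(child)
            if kind == "directory":  # assumed marker for subdirectories
                pending.append(child)
    return seen


def stale(all_names, root, load):
    """Names present in pristine.hashed but not reachable from root."""
    return sorted(set(all_names) - reachable(root, load))


ROOT = "83bf551b64dc5f0e5684e1e42268c4ec56df209a4604cd7e936c169c3fa47603"
FILE = "694b27f021c4861b3373cd5ddbc42695c056d0a4297d2d85e2dae040a84e61df"
OLD = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
# In-memory stand-in for pristine.hashed as it looks after the first record.
store = {ROOT: "file:\nsomefile\n" + FILE + "\n", FILE: "file content\n", OLD: ""}
print(stale(store, ROOT, store.__getitem__))
```

On this stand-in, the only name reported is the e3b0... file, the one that disappears in the listing above.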
An inventory is a file that describes the state of a repository by listing patches. It may start with the hash of another inventory, so that inventory files never get too big.
hashed_inventory is the inventory of the current state of the repository. The subdirectory inventories stores other inventories useful for the history of the repository.
_darcs/inventories/ contains gzipped context files. Each inventory starts with the hash of the other inventory file it relies upon. Let us take a repository that already has many patches, and look at one inventory file:
    Starting with inventory:
    0000009036-9cbf750ff34fa7b3940af47b7c95ec812d2e536f5feada8d0e89ed530cecddcc
    [TAG 1.5.3
    Guillaume <firstname.lastname@example.org>**20100513150110
     Ignore-this: 4d602c25b18ca30228400f8800e27253
    ]
    hash: 0000005948-e154869978642799facaca2180634f353d45df6e7478244f4fb16ea831ec612c
    [switch to GHC 6.12 Prelude, fix warnings and take sme advice from hlint
    Guillaume <email@example.com>**20100604121359
     Ignore-this: 7286831df91ffb8974deeb6a67527fa0
    ]
    ...
If we look at the file inventories/0000009036-9cbf750ff34fa7b3940af47b7c95ec812d2e536f5feada8d0e89ed530cecddcc, we find:
    Starting with inventory:
    0000005042-37894faa0a3f90fcba049147fdb28490d53b1a27b5763feff3a940906a8e0823
    [TAG 1.5.2
    Guillaume <firstname.lastname@example.org>**20091110191538
     Ignore-this: 7af98721b507b5b53d95688aeee45eff
    ]
    hash: 0000003430-515b0a6e2c0fd55f0fb7fdf85b59387ee78a7c97306b56cd5767e0afedc62303
    [comment no longer relevant
    Guillaume <email@example.com>**20100217132511
     Ignore-this: e854183117a8d980ccab7efdf5a66a3d
    ]
    hash: 0000000232-c7d79d1acf8a1847869c73e7852937b91d65a179f91e3d5b0581a354f6596cfe
    [defer more to getMods
    Guillaume <firstname.lastname@example.org>**20100217173918
     Ignore-this: f6e2633492d31565723729e787a62dd2
    ]
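The chain of inventories can be followed mechanically. The Python sketch below walks from one inventory back through its "Starting with inventory:" links and returns the patch file names, oldest inventory first. It is an illustration under the layout shown here, not darcs code; parse_inventory, all_patch_files and the load callback (inventory name to uncompressed text) are hypothetical names.

```python
import re


def parse_inventory(text):
    """Return (parent_inventory_or_None, patch_files) for one inventory."""
    parent_match = re.search(
        r"^Starting with inventory:\s*(\S+)", text, re.MULTILINE
    )
    parent = parent_match.group(1) if parent_match else None
    patch_files = re.findall(r"^\]?\s*hash:\s*(\S+)", text, re.MULTILINE)
    return parent, patch_files


def all_patch_files(head_text, load):
    """List patch files across the whole chain, oldest inventory first.

    load is a hypothetical callable: inventory name -> uncompressed text.
    """
    groups = []
    visited = set()
    text = head_text
    while True:
        parent, patch_files = parse_inventory(text)
        groups.append(patch_files)
        if parent is None or parent in visited:
            break  # reached the oldest inventory (or a loop, defensively)
        visited.add(parent)
        text = load(parent)
    ordered = []
    for patch_files in reversed(groups):
        ordered.extend(patch_files)
    return ordered
```

In a real repository, load could read and gunzip the matching file under _darcs/inventories/.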
TODO: what is the logic behind inventory file segmenting?
Note that inventory files contain the metadata of patches but not their contents. Each entry carries a hash instead, and that hash is used as a file name in _darcs/patches/, where the metadata is stored again together with the patch content.
Why is there patch metadata in inventory files when it is also in the _darcs/patches/ files? This is for lazy repositories. In a lazy repository you do not download the patch files, but you do have the inventory files. So at least you can run darcs changes without having to download extra files. However, darcs changes -v downloads all patches. By the way, this is a way to “complete” your repository into a full one.
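To see how lazy a repository currently is, one can compare the patch file names listed in the inventories with what is actually present in _darcs/patches/. A minimal Python sketch, with the invented name missing_patches and the list of names supplied by the caller:

```python
from pathlib import Path


def missing_patches(patch_files, patches_dir):
    """Return the listed patch files that are not present on disk.

    patch_files: names taken from the "hash:" lines of the inventories.
    patches_dir: path to _darcs/patches (hypothetical argument).
    """
    directory = Path(patches_dir)
    present = {p.name for p in directory.iterdir()} if directory.is_dir() else set()
    return [name for name in patch_files if name not in present]
```

An empty result would mean every patch named in the inventories has been downloaded.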