Ideas/CVSKeywords

Why darcs doesn’t do CVS keywords

From DifferencesFromSubversion:

CVS keywords are single-line informational strings that can optionally be added to files managed by CVS. They look like this:

$Id: File2.java,v 1.3 2004/06/02 17:30:42 JHunter Exp $

darcs intentionally doesn’t support this concept. These keywords modify the source files directly, whereas the versioning system should leave the source untouched. They also make it much more complicated to compute diffs, to apply patch sets in different orders, and so on. And which date should appear: the date you applied the patches, the date of the newest patch among those applied, or the date of the patch you applied last?

Why it maybe could…

That being said, I wonder if this isn’t a smaller can of worms than we’re making it out to be. For starters:

  • The keywords could leave the pristine tree untouched. That is, they would remain “collapsed” in the pristine tree. Only the working directory need contain their expansion. (This is what CVS and Subversion do.)
  • Darcs operations could pre-process the files to collapse any keyword information, to make it look like the pristine tree. This pre-processing step could also be used to implement pre-hookish features.
  • We could decide on a sane convention for what we put in the keywords.
    • Version is pretty meaningless, but maybe the very latest date of a patch which affects the file; plus the author of that patch?

Ours could look like this then:

$Id: File2.hs 2006-03-17 17:30:42 me@foo.bar.org $
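The collapse/expand idea above can be sketched in a few lines. This is only an illustration, not anything darcs does: the function names `collapse_keywords` and `expand_keywords` are made up for this page, and the field layout (filename, date, author) simply follows the example line above.

```python
import re

# Matches the collapsed form "$Id$" as well as any expanded form "$Id: ... $".
KEYWORD_RE = re.compile(r"\$Id(?::[^$\n]*)?\$")


def collapse_keywords(text: str) -> str:
    """Return text as it would be stored in the pristine tree (hypothetical helper)."""
    return KEYWORD_RE.sub("$Id$", text)


def expand_keywords(text: str, filename: str, date: str, author: str) -> str:
    """Return text as it would appear in the working directory (hypothetical helper)."""
    expanded = f"$Id: {filename} {date} {author} $"
    # A callable replacement avoids any backslash interpretation in the fields.
    return KEYWORD_RE.sub(lambda _match: expanded, text)


if __name__ == "__main__":
    pristine = "-- $Id$\nmain = return ()\n"
    working = expand_keywords(pristine, "File2.hs", "2006-03-17 17:30:42", "me@foo.bar.org")
    print(working, end="")
    # Collapsing the working copy gives back exactly the pristine text.
    print(collapse_keywords(working) == pristine)
```

Because collapsing is applied before any comparison, a diff between the working copy and the pristine tree would never see the expansion, whatever date or author ends up in it.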

Things that need to be worked out…

  • What happens when you pull patches?
  • Record them?
  • Obliterate them?

Anyway, this is all just a bunch of random speculation. I’m happy with the way things are, but I think a crappy implementation of this (crappy in the sense that darcs remains intact, but the keywords don’t necessarily reflect what you think they might) would be better than nothing.


Discussion:

MarkStosberg (presumably) suggested darcs changes as an alternative:

darcs changes --last=1 --summary file.txt

It produces friendly output like this:

Mon Oct 25 07:01:55 EST 2004  Mark Stosberg <mark@summersault.com>
  * typo fix: scopre -> scope

    M ./Changes.lhs -1 +1

It should be noted that this feature is really neither similar to nor more useful than the keywords that CVS provides; indeed, it’s far less useful for the very fact that it *doesn’t* modify the source files. CVS’ keywords are immensely useful for making the version numbers/revisions of a particular file available to the program itself at runtime; such information is exceedingly useful in bug reports, stack traces, etc. I certainly hope that this feature makes it into Darcs somehow. –JeremyFincher

Darcs does actually include its version in the binary – look at darcs --exact-version.

This is done by dumping the output of darcs changes --context into a file at compile time, and including this file in the compiled binary. The context file can be generated by the Makefile itself (if you build in the Darcs tree), or in the predist hook if you distribute darcs dist-generated sources.
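The same trick works for any project. Here is a rough sketch of the build step, assuming darcs is on the PATH; the helper names and the output file name `context.txt` are made up for illustration, and this is not how the Darcs Makefile itself is written.

```python
import subprocess
from pathlib import Path


def darcs_context(repo_dir=".", run=subprocess.run):
    """Return the output of `darcs changes --context` for the repository in repo_dir.

    `run` can be swapped out in tests; by default the real darcs binary is invoked.
    """
    result = run(
        ["darcs", "changes", "--context"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def write_context_file(path, context):
    """Write the context to a file that the build can then embed in the program."""
    Path(path).write_text(context, encoding="utf-8")


if __name__ == "__main__":
    # "context.txt" is a hypothetical name; keep the file out of darcs' control.
    write_context_file("context.txt", darcs_context())
```

The generated file is a build product rather than a tracked source file, so the repository contents stay untouched.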

Most systems that I am familiar with allow you to turn keyword substitution on/off. You can have it your way. -ehlarson

No, you cannot. Keyword substitution breaks a certain number of properties, such as the fact that the repository contains pristine sources. If keyword substitution is available, there are no longer any guarantees about what you’ve got in the repository, as anyone who ever tried to track a vendor branch in CVS certainly knows.

It’s easy enough to conceive a Makefile or predist command which dumps some version info into a file which is not managed by darcs, and which can then be built into the program or installed somewhere it can read. The main problem I face is that darcs doesn’t seem to have any concept of concise and unique version numbers that could be used for quickly identifying which version of a program is running. The closest thing seems to be the name+emailaddr+datetime associated with each patch, but just having the most recent one of those isn’t right either, because a particular repo might lack some earlier patches. I really miss automatic version numbers. I feel sure there must be a simple solution to this, I just don’t know what it is! –BenWheeler

Instead of fighting the patch algebra, how about using it to generate a unique version tag? This could be done by first generating a unique hash for each patch, and then combining all those hashes together in an order-insensitive fashion to get a set-of-patches hash. (Their product modulo some fixed number would work, for instance.) This could be included in the Id:... information. Maybe the filename (relative to the repository root) should be part of the hash as well. The Id:... information would also include some human-readable stuff (e.g. filename and timestamp of last patch) but the hash would be the definitive identifier. This would solve the common-case problem: a version string to identify executables, and a version string to identify the LaTeX file you give to someone who edits it and hands it back. (For bonus points, maintaining a table of hash->set-of-patches somewhere in _darcs/, and some command to preserve an entry there like darcs idkey [file...], would make life even easier. The table could hold a transient cache of Id keys, and the idkey command would tell it to never flush the given entry. There would also be a symmetric command to try to figure out, from a hash, which file and set of patches it came from.) –BarakPearlmutter
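A minimal sketch of the order-insensitive combination suggested above. Everything specific here is an assumption made for illustration: SHA-256 as the per-patch hash, the prime 2^255 - 19 as the modulus, and plain strings standing in for whatever uniquely identifies a patch.

```python
import hashlib

# Assumed modulus: a well-known prime. Any large prime would serve the same purpose.
MODULUS = 2**255 - 19


def patch_hash(patch_id: str) -> int:
    """Map a patch identifier to a nonzero residue modulo MODULUS."""
    digest = hashlib.sha256(patch_id.encode("utf-8")).digest()
    # Shift into 1..MODULUS-1 so that no single patch can zero out the product.
    return int.from_bytes(digest, "big") % (MODULUS - 1) + 1


def patch_set_hash(patch_ids) -> int:
    """Combine per-patch hashes by multiplication modulo MODULUS.

    Multiplication is commutative, so the result does not depend on the
    order in which the patches are listed or were applied.
    """
    combined = 1
    for patch_id in set(patch_ids):  # a set: duplicates are counted once
        combined = (combined * patch_hash(patch_id)) % MODULUS
    return combined


if __name__ == "__main__":
    a = ["typo fix", "add feature", "refactor"]
    b = ["refactor", "typo fix", "add feature"]
    print(patch_set_hash(a) == patch_set_hash(b))
    print(format(patch_set_hash(a), "x"))
```

Two repositories holding the same set of patches get the same value regardless of order, while a repository missing an earlier patch gets a different one, which addresses the objection to using only the most recent patch as the version.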