GSoC
Timeline
- March 18 to March 29: Mentoring organization application
- April 8: List of accepted mentoring organizations published
- April 22 to May 3: Student application
- May 8: Slot allocations published to mentoring organizations
- May 9 to 22: Slot allocation trading
- May 27: Accepted student proposals announced on the Google Summer of Code 2013 site
- June 17: official SoC start
- July 29 to August 2: midterm evaluation submission
- September 16: suggested ‘pencils down’ date
- September 23: firm ‘pencils down’ date
- September 23 to 27: evaluation submission
- October 1: final results announced by Google
- October 19-20: Summer of Code mentor summit
Project ideas
Here are some ideas for 2013 Google Summer of Code student projects. They probably need to be cut down to plans that could realistically fit into the Summer of Code timeframe. Note that these themes are just to get you started. We welcome submissions beyond these initial ideas. Get in touch with us at darcs-users@darcs.net or #darcs on freenode.
All of these projects require good Haskell skills. Bonus points if you have already contributed code to darcs.
1. Better record
Proposed by: Florent, Ganesh, Guillaume, Owen
The most complicated part of Darcs is, arguably, the patch handling core. The task of creating patches, however, has received less attention. The record command creates patches from the changes present in the working copy: it diffs the pristine copy against the working copy and offers the resulting changes to the user for recording.
Diffing two given files can produce various correct outputs, depending on the algorithm used. The standard diff algorithm (used in darcs and many other places) has been criticized for sometimes producing counterintuitive diffs. We would like to try out the “patience diff” algorithm, which seems to produce more intuitive chunks when used on source code. One downside of patience diff is that it may be slower than classic diff, so its performance will have to be evaluated.
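To make the idea concrete, here is a minimal sketch of patience diff in Python (not darcs code, and not a proposed implementation). It anchors on lines that occur exactly once on both sides, keeps the longest run of anchors that appear in the same order, and recurses between anchors. The function names are made up for this illustration, and the fallback to Python's difflib when no anchors exist is an assumption standing in for the classic diff.

```python
from bisect import bisect_left
from collections import Counter
from difflib import SequenceMatcher


def unique_common_pairs(a, b, alo, ahi, blo, bhi):
    """Pairs (i, j) with a[i] == b[j] where the line is unique in both slices."""
    count_a = Counter(a[alo:ahi])
    count_b = Counter(b[blo:bhi])
    pos_b = {line: j for j, line in enumerate(b[blo:bhi], blo)
             if count_b[line] == 1}
    return [(i, pos_b[line])
            for i, line in enumerate(a[alo:ahi], alo)
            if count_a[line] == 1 and line in pos_b]


def longest_increasing(pairs):
    """Patience sorting: longest subsequence of pairs with increasing j."""
    tails = []                  # tails[k]: smallest j ending a run of length k+1
    tail_idx = []               # index in pairs of that run's last element
    prev = [None] * len(pairs)  # back-pointers used to rebuild the run
    for n, (_, j) in enumerate(pairs):
        k = bisect_left(tails, j)
        if k == len(tails):
            tails.append(j)
            tail_idx.append(n)
        else:
            tails[k] = j
            tail_idx[k] = n
        prev[n] = tail_idx[k - 1] if k > 0 else None
    run = []
    n = tail_idx[-1] if tail_idx else None
    while n is not None:
        run.append(pairs[n])
        n = prev[n]
    return run[::-1]


def patience_matches(a, b, alo=0, ahi=None, blo=0, bhi=None):
    """Return matched index pairs (i, j), increasing in both i and j."""
    ahi = len(a) if ahi is None else ahi
    bhi = len(b) if bhi is None else bhi
    anchors = longest_increasing(unique_common_pairs(a, b, alo, ahi, blo, bhi))
    if not anchors:
        # No unique common lines: fall back to a classic matcher (assumption).
        sm = SequenceMatcher(None, a[alo:ahi], b[blo:bhi], autojunk=False)
        return [(alo + m.a + k, blo + m.b + k)
                for m in sm.get_matching_blocks() for k in range(m.size)]
    result = []
    pi, pj = alo, blo
    for i, j in anchors + [(ahi, bhi)]:     # the last pair is a sentinel
        result.extend(patience_matches(a, b, pi, i, pj, j))
        if i < ahi:                         # skip the sentinel itself
            result.append((i, j))
        pi, pj = i + 1, j + 1
    return result
```

Lines that are not matched on either side are the ones a diff would report as removed or added. For instance, `patience_matches(["a", "b", "c"], ["a", "x", "c"])` matches the first and last lines and leaves the middle one as a change.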
Moreover, just as the existing flag --look-for-adds proposes adding unversioned files to the new patch, we could use a --look-for-moves flag, which would be handy when one wants to record a file move after having done the move, i.e., without using darcs move. Another cool flag would be --look-for-replaces, which would detect token renaming when one forgets to use darcs replace.
If time allows, a --provide-primitive-patches flag would let another program call darcs and supply the changes to record itself. One use case is a web interface offering a simple online code editing feature à la GitHub.
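One possible heuristic for spotting moves after the fact is to pair files that vanished from the working copy with new unversioned files that have identical contents. The Python sketch below illustrates only that idea. The function names, the dict-based inputs and the choice of SHA-256 are assumptions made for the example, and the approach discussed in issue642 may well differ.

```python
import hashlib


def content_hash(data):
    """Hash of a file's contents (SHA-256 chosen only for the sketch)."""
    return hashlib.sha256(data).hexdigest()


def guess_moves(pristine, working):
    """Guess (old_path, new_path) moves between two path -> bytes mappings.

    A move is only reported when exactly one vanished file and exactly one
    new file share the same contents, so ambiguous cases are left alone.
    """
    removed = {}   # content hash -> paths present only in pristine
    added = {}     # content hash -> paths present only in working
    for path in pristine.keys() - working.keys():
        removed.setdefault(content_hash(pristine[path]), []).append(path)
    for path in working.keys() - pristine.keys():
        added.setdefault(content_hash(working[path]), []).append(path)
    return sorted((olds[0], added[h][0])
                  for h, olds in removed.items()
                  if len(olds) == 1 and len(added.get(h, [])) == 1)
```

A file that was both moved and edited would not be found this way, so a real implementation would probably need a fuzzier notion of similarity or another signal altogether.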
Tasks
- implement patience diff:
  - write tests for diff correctness
  - evaluate patience diff performance versus original diff performance
- implement record --look-for-moves: http://bugs.darcs.net/issue642
- implement record --look-for-replaces: http://bugs.darcs.net/issue2209
- implement record --provide-primitive-patches
Comments
This may be the project that involves the least modification of the “core” of darcs, and thus may be the most accessible to newcomers.
2. Hashed files and cache
Proposed by: Guillaume
Hashed files in darcs follow the same idea as in git repositories or the BitTorrent protocol: a file is saved on disk using the hash of its contents as its name. Darcs uses them in many places. However, some aspects could be improved.
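As a rough illustration of the principle (not of darcs's actual on-disk format, for which see Internals/Hashes), here is a tiny content-addressed store in Python. The function names and the use of plain SHA-256 hex digests as file names are assumptions made for the sketch.

```python
import hashlib
import os


def store(directory, data):
    """Write data to a file named after the hash of its contents."""
    name = hashlib.sha256(data).hexdigest()
    path = os.path.join(directory, name)
    if not os.path.exists(path):        # identical contents are stored once
        with open(path, "wb") as f:
            f.write(data)
    return name


def load(directory, name):
    """Read a hashed file back and check that it still matches its name."""
    with open(os.path.join(directory, name), "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() != name:
        raise ValueError("contents do not match hash: " + name)
    return data
```

Two properties follow directly: storing the same contents twice costs nothing, and corruption can be detected on read because the name doubles as a checksum.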
Garbage collection: darcs only knows how to clean up the _darcs/pristine.hashed directory. This directory contains the recorded state of the working copy. We should extend garbage collection to the patches and inventories of repositories. As of now, the only way to clean them is to make a fresh clone of the repository. See Using/GrowingInventoriesProblem and http://bugs.darcs.net/issue1987.
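Garbage collection of hashed files can be thought of as mark and sweep: start from the files the repository currently points at, follow references to find everything still reachable, and delete the rest. The Python sketch below shows that shape only. The `references` callback, which says which hashed files a given file refers to, is a hypothetical stand-in for parsing real inventories, and the names are invented for the example.

```python
import os


def reachable(roots, references):
    """Mark phase: every name reachable from roots by following references."""
    seen = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(references(name))
    return seen


def sweep(directory, live):
    """Sweep phase: delete files in directory whose names are not live."""
    removed = []
    for name in sorted(os.listdir(directory)):
        if name not in live:
            os.remove(os.path.join(directory, name))
            removed.append(name)
    return removed
```

A real implementation would also have to decide what counts as a root and how to stay safe when another darcs process is writing to the same directory.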
Cache system: darcs maintains a global cache in ~/.darcs/cache/ that is shared between all repositories of a user. This makes many operations faster and saves disk space by using hard links. However, when the cache gets too big it becomes a problem of its own, since filesystems do not cope well with directories containing zillions of files. The idea would be to implement a bucketed cache, i.e., to use prefix directories.
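A bucketed layout simply files each hashed file under a subdirectory named after the first few characters of its hash. The sketch below, in Python, assumes a two-character prefix (256 buckets for hexadecimal names). That width, like the function names, is an assumption for illustration and not necessarily what patch72 did.

```python
import os


def bucketed_path(cache_root, file_hash, width=2):
    """Path of a hashed file inside its prefix directory."""
    if len(file_hash) <= width:
        raise ValueError("hash too short to bucket: " + file_hash)
    return os.path.join(cache_root, file_hash[:width], file_hash)


def bucketize(cache_dir, width=2):
    """Move every file of a flat cache directory into its bucket."""
    for name in os.listdir(cache_dir):
        src = os.path.join(cache_dir, name)
        if os.path.isfile(src) and len(name) > width:
            dst = bucketed_path(cache_dir, name, width)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            os.rename(src, dst)     # a rename keeps existing hard links intact
```

Since the files are only renamed within the same filesystem, repositories that hard-link into the cache keep sharing the same data.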
See also: Internals/CacheSystem, Internals/HashedPristine, Internals/Hashes.
Tasks
- implement garbage collection for _darcs/patches
- implement garbage collection for _darcs/inventories
- (re)implement bucketed cache: a long lost piece of code from 2010 that never made it into darcs but should have (http://bugs.darcs.net/patch72)
- put cache in $XDG_CACHE_HOME
- investigate and implement global cache garbage collection (http://bugs.darcs.net/issue305)
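For the $XDG_CACHE_HOME task, the XDG Base Directory Specification says to fall back to $HOME/.cache when the variable is unset or empty, and to ignore relative paths. A minimal Python sketch of that lookup follows. The function name and the "darcs" subdirectory are assumptions for the example, not a decided layout.

```python
import os


def darcs_cache_dir(environ=None):
    """Resolve a per-user cache directory following the XDG rules."""
    env = os.environ if environ is None else environ
    base = env.get("XDG_CACHE_HOME", "")
    if not base or not os.path.isabs(base):
        # Unset, empty or relative: use the default from the specification.
        home = env.get("HOME") or os.path.expanduser("~")
        base = os.path.join(home, ".cache")
    return os.path.join(base, "darcs")   # subdirectory name is hypothetical
```

Existing users would still have a cache in ~/.darcs/cache/, so some migration or fallback strategy would need to be part of the task.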
Comments
Dive into the fantastic world of hashed files, filesystems and hard links! Discover that having 100,000 files in the same directory may not be the greatest idea ever! Linus Torvalds knew it from the beginning and never told us!
3. Optimize optimize --reorder and other patch reordering issues
Proposed by: Guillaume
In their on-disk representation, the patches of a repository are linearly ordered. The command optimize --reorder reorders patches so that untagged patches are moved to the “front” of this order. How do untagged patches end up anywhere else? This happens when you pull tags from a remote repository.
Such reordering reduces the amount of data that a typical remote command needs to download. It also reduces the CPU time needed for some operations (which ones?). But reordering has a computational cost of its own, which is why we don’t do it all the time.
The current behaviour of the optimize --reorder algorithm is not yet completely understood. For instance, the command is not idempotent in certain cases, and on some repositories it is abnormally slow.
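As a very rough mental model only, the reordering can be pictured as a stable partition around a tag. The Python toy below treats the inventory as a list of patch names and the tag as an explicit set of names it covers, and it reads “front” as the most recent end of the list. All of that is an assumption for illustration: real darcs has to commute patches to change their order, which is where the cost and the surprises come from, and none of that is modelled here.

```python
def reorder(inventory, tag, covered):
    """Toy reorder: patches covered by the tag, then the tag, then the rest.

    inventory: list of patch names, oldest first (includes the tag itself)
    tag:       name of the tag entry
    covered:   set of patch names the tag depends on
    """
    tagged = [p for p in inventory if p in covered]
    untagged = [p for p in inventory if p not in covered and p != tag]
    return tagged + [tag] + untagged
```

In this toy, a local patch recorded before a pulled tag ends up after it, and running the function twice gives the same result as running it once. Whether the real algorithm should have that second property is one of the open questions of this project.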
Tasks
- understand the optimize --reorder algorithm
- study whether patch reordering should be idempotent (normal form for repository inventories?)
- collect hand-designed and real-life test repositories to measure patch reordering performance
- improve patch reordering performance
- implement darcs send --minimize-context, i.e., the ability to create patch bundles with as few patches as possible in the context. The implementation will involve some heuristic vs exact thinking, since this task may be computationally costly.
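For the context-minimizing task, the target can be stated simply if one pretends that dependencies between patches are already known: the smallest usable context is everything the sent patches depend on, directly or indirectly, that is not itself being sent. The Python sketch below computes exactly that. The explicit `depends_on` map and the function name are assumptions for illustration, and obtaining such dependency information from a real repository is presumably where the computational cost lies.

```python
def minimal_context(to_send, depends_on):
    """Patches outside to_send that the sent patches transitively depend on.

    to_send:    iterable of patch names going into the bundle
    depends_on: dict mapping a patch name to the names it depends on
    """
    sending = set(to_send)
    needed = set()
    stack = list(sending)
    while stack:
        for dep in depends_on.get(stack.pop(), ()):
            if dep not in sending and dep not in needed:
                needed.add(dep)
                stack.append(dep)
    return needed
```

A heuristic version might stop early or over-approximate this set in exchange for speed, which is the trade-off the task description alludes to.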
Comments
This is probably the most “hardcore” project since it involves diving into the patch code of darcs. Sources say it may not be the friendliest piece of code ever. But this is also the most interesting and specific project, since first-class patches are what makes darcs unique among revision control systems. Along with the code, we will insist on having good documentation of what is going on.
4. Enhance darcsden
Proposed by: Florent, Ganesh, Owen
Darcsden is the software behind the website http://hub.darcs.net. It hosts repositories and supports forks and bug tracking. It is written in Haskell and uses the code of Darcs as a library.
- local hub: enable darcsden to be run locally and easily
- local and awareness of branches: track repository relationships
- enable darcs send to upload a patch bundle via http to darcsden
5. Better patch dependencies
Proposed by: Florent, Ganesh, Guillaume, Owen
- show, in whatsnew and record, which changes the unrecorded changes sit on
- automatically discover patch dependencies (amend --ask-deps) when a given test fails without them
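One naive way to discover test-based dependencies is to start with every candidate patch present and try dropping them one at a time, keeping a patch only if the test fails without it. The Python sketch below shows that greedy search. The `test_passes` callback, which would build and test a tree containing only the given patches, is hypothetical, as are the names, and nothing here reflects how darcs would actually do it.

```python
def discover_deps(candidates, test_passes):
    """Greedily shrink candidates to a set the test cannot do without.

    candidates:  list of patch names that might be dependencies
    test_passes: function taking a frozenset of patch names and
                 returning True when the test succeeds with them
    """
    kept = list(candidates)
    if not test_passes(frozenset(kept)):
        raise ValueError("test fails even with every candidate present")
    for patch in list(candidates):
        trial = [p for p in kept if p != patch]
        if test_passes(frozenset(trial)):
            kept = trial        # the test does not need this patch
    return kept
```

The result is minimal in the sense that removing any single remaining patch breaks the test, but it costs one test run per candidate, so a real implementation would likely want something smarter.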
6. Other projects
Keep in mind that you could always propose a project with an entirely different set of ideas. Be creative! :-)
Other project ideas:
- Add darcs support to an existing GUI, for instance http://rabbitvcs.org/ or http://qct.sourceforge.net/ (proposed in http://darcs.net/Ideas/GraphicalInterface).
Application process
Sketch out an idea. Can you make Darcs faster? Can you make it more useful? It would make sense to get in touch with darcs-users@darcs.net for some help.
Check out the student guide to know what you’re getting into.
Get in touch with the Darcs team if you have not done so already.
Write up your proposal (this should take a day or two). See the previous applications if you’re having trouble getting started.
Submit your application to the GSoC website: register as a student first, then submit your application.
Older projects
- 2012
  - Patch Index (BSRK Aditya)
- 2011
  - Darcs Bridge (Owen Stephens)
  - V3 Primitive Patches (Petr Rockai)
- 2010
  - Darcs network performance (Alexey Levan)
  - Improvements to darcs caching mechanism (Adolfo Builes)
- 2009
  - Hashed storage (Petr Rockai)
  - Petr’s application (Slightly post-edited after acceptance)
- 2007
  - Darcs 2 research (Jason Dagit)

