GSoC/2011-Bridge

What?

A bridge between Darcs (current version) and other VCSs (including Git and other snapshot-based VCSs, and Darcs1). It will allow interoperability between Darcs2 and other VCSs (transferring patches/commits and applying/sending patches, for example).

How do I use it?

For now, the bridge does not work, but when it does, see the darcs bridge usage wiki page (DarcsBridgeUsage) for information on getting set up and using the bridge.

Design Notes

VCSInterop

Why?

The VCS world contains many alternatives; choosing a tool should not hinder your ability to contribute to projects that haven’t made the same choice as you. All contributions are valuable and should be easy to accept!

When?

Initially, I (owst) propose this as my GSoC project in 2011.

What needs doing?

  • Allow automatic incremental conversion.
  • Create a consistent mapping between Darcs2 and Darcs1 format repositories.
  • Import and export foreign patch formats generated by VCS “send” commands.
  • Create a mapping/encoding of multi-head repositories.
  • Solve the problem of efficiently translating to/from Darcs patches.
  • The cycle problem, in the presence of multiple bridges.
  • “Roundtripping”, whereby information may be lost when converting to and back from another repository format.
  • Translation to and from Darcs specific patch-types e.g. replace patches.

Application

Motivation and Synopsis

The current DVCS world is formed of several popular alternative implementations, with many users and tools centered around their respective chosen VCS. Darcs has a different underlying methodology from the other popular VCSs, being patch-based rather than snapshot-based. I believe it is important to continue to develop Darcs, despite the apparent current trend away from it; Darcs’ fundamentally different view of the world leads to greater usability and simpler operations. Unfortunately, the current Darcs implementation does not completely fulfil the goal of this increased usability. Due to corner cases in conflict handling, poor support for long-term branches, and scalability issues, Darcs is currently recommended only for use in small to medium-sized projects.

My proposed project is to create a generic bridge that will enable easy interoperability and synchronisation between Darcs and other VCSs. The bridge will be designed to be generic, but the focus of this project will be Darcs2 <-> Git and Darcs2 <-> Darcs1. The bridge should allow lossless, correct conversion to and from Darcs repositories, allowing users to use the tool that suits them and their project best, be that Darcs as it currently exists, or another tool. Users of other VCSs will be able to easily convert their repository into Darcs’ format and “try out” Darcs, as is currently possible with other VCSs, in order to assess its relative merits for their specific project.

Project Goals

darcs-fastconvert [1] (d-fc) is a tool that can be used to parse the fast-export data format, as supported by all major current VCSs for import/export. It is, however, primarily a “single-shot” tool that does not easily support incremental conversion. The overall goal of this project is to build on it, to create a generic bridge that has specialisations for individual VCSs.
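To give a feel for the kind of stream d-fc consumes, here is a minimal sketch that pulls the commit records out of a fast-export stream. It is illustrative only and is not how d-fc itself is implemented (d-fc is written in Haskell). The function name `read_commits` and the `SAMPLE` stream are assumptions made for this example, and only the byte-counted `data <n>` form is handled, not the delimited form.

```python
import io

# A tiny hand-written fast-export stream (an assumption for this example):
# one blob followed by one commit that references it.
SAMPLE = (
    b"blob\n"
    b"mark :1\n"
    b"data 6\n"
    b"hello\n"
    b"\n"
    b"commit refs/heads/master\n"
    b"mark :2\n"
    b"committer A U Thor <a@example.com> 1300000000 +0000\n"
    b"data 8\n"
    b"initial\n"
    b"M 100644 :1 hello.txt\n"
    b"\n"
)

def read_commits(stream):
    """Return one dict (ref, mark, message) per commit command in the stream."""
    commits = []
    current = None  # the commit being read, or None while inside another command
    while True:
        line = stream.readline()
        if not line:
            break
        line = line.rstrip(b"\n")
        if line.startswith(b"commit "):
            current = {"ref": line[len(b"commit "):].decode(),
                       "mark": None, "message": None}
            commits.append(current)
        elif line in (b"blob", b"reset") or line.startswith((b"reset ", b"tag ")):
            current = None  # marks and data that follow belong to a non-commit command
        elif line.startswith(b"mark :") and current is not None:
            current["mark"] = int(line[len(b"mark :"):])
        elif line.startswith(b"data "):
            # Always consume the payload so its bytes are never parsed as commands.
            payload = stream.read(int(line[len(b"data "):]))
            if current is not None and current["message"] is None:
                current["message"] = payload.decode("utf-8", "replace")
    return commits

commits = read_commits(io.BytesIO(SAMPLE))
```

The marks (`:1`, `:2`) are the stream's own way of naming objects, which is what makes them a natural hook for remembering how far a previous conversion got.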

This project will be decomposed into several small incremental targets, listed below in expected order of difficulty. This proposal is not seeking to “solve the world”, and I am aware that it contains some potentially difficult-to-solve problems. To mitigate this, two concurrent streams of work can be undertaken: 1) Items 1-3 below are disjoint, smaller tasks that I expect to be simpler to complete. These items alone would provide valuable additions to Darcs and they will definitely be completed. 2) Items 4 and 5 (and several of the expected challenges) are somewhat deeper, potentially requiring careful design and implementation to realise the performance potential of the bridge; these would be “great-to-have” features.

  1. Allow automatic incremental conversion [< 1 week]. Extend d-fc to allow automatic detection and continuation of conversions, providing a “synchronise” command that will detect the last conversion point or start a new conversion, without requiring the use of fragile manual state files. This will greatly improve the usability and transparency of the tool when attempting incremental conversions.

  2. Create a mapping/encoding of multi-head repositories [1+ weeks]. Darcs does not currently provide an implementation of multi-head repositories (in-repo branching); therefore, repositories from VCSs that do support this will need to be mapped to multiple Darcs repositories.

  3. Import and export foreign patch formats generated by VCS “send” commands [2 weeks]. Import and export of externally formatted patches, e.g. Git’s format-patch [2], [3], would allow users of an external VCS to send native-formatted patches and apply them to a Darcs-managed project, and vice versa.

  4. Solve the problem of efficiently translating to/from Darcs patches [2 weeks]. Without careful attention, translation between Git snapshots and Darcs patches can lead to severe performance penalties, as was found in the darcs-git tool [4], particularly when translating branches. The use of meta-data mappings or other methods of caching could help alleviate the problem of (expensively) recomputing the data each time the triggering commands are used.

  5. Create a consistent mapping between Darcs2 and Darcs1 format repositories [2 weeks]. Currently, Darcs includes a “convert” command that allows conversion from pre-Darcs2 format repos to the Darcs2 format. This tool is somewhat fragile: patches created in separate conversions of the same repository should not be exchanged, due to potential patch-ordering differences. This project could provide a more robust conversion, which is not as constraining.
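As a rough illustration of goal 1, the sketch below shows one way a “synchronise” step could find its own continuation point. It assumes the bridge keeps a mapping from source commit identifiers to converted patch identifiers in a file it manages itself, so the user never has to pass state by hand. The names `synchronise`, `load_marks`, `save_marks` and the JSON file layout are hypothetical and are not part of d-fc.

```python
import json
from pathlib import Path

def load_marks(path):
    """Load the commit-id -> patch-id mapping, or start fresh if none exists."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}

def save_marks(path, marks):
    """Persist the mapping so that the next run can continue from it."""
    Path(path).write_text(json.dumps(marks, indent=2, sort_keys=True))

def synchronise(source_history, marks_path, convert):
    """Convert only the commits not seen before; return the ids converted now.

    source_history: commit ids in the order they should be applied.
    convert: callable that turns one commit id into a patch id (hypothetical).
    """
    marks = load_marks(marks_path)
    converted = []
    for commit_id in source_history:
        if commit_id in marks:
            continue  # already converted by an earlier run
        marks[commit_id] = convert(commit_id)
        converted.append(commit_id)
    save_marks(marks_path, marks)
    return converted
```

Calling `synchronise` twice with a history that has grown by one commit converts only the new commit on the second call; an empty or missing marks file simply means a new conversion is started. The same mapping could double as the cache mentioned in goal 4, although that is a design guess rather than something decided here.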

Potential challenges (Goals 4 and 5)

Creating a robust Darcs bridge is potentially very challenging. Particularly, to meet the goals of correctly translating between Darcs patches and traditional DVCS commits, or between patches in two different versions of Darcs, I will need to address the problems below. I hope to provide a solution to as many of these problems as possible, or failing that, a first (potentially inefficient) attempt, which could be used as the basis for further work.

  1. “Roundtripping”, whereby information may be lost when converting to and back from another repository format. This problem will be difficult to address. A possible solution would be the use of patch meta-data to record otherwise non-representable details, such as when encoding Git branches in Darcs, or Darcs replace patches in Git.

  2. Translation to and from Darcs-specific patch types, e.g. replace patches. A naïve translation from Darcs-specific patches is to “flatten” them into simple hunk-patches, losing their semantic information. It may be possible to create an information-preserving mapping from and (if possible) to Darcs-specific patch types. For example, converting from a “replace” patch is easy, but such a patch would be hard to detect and infer accurately in the reverse direction.

  3. The cycle problem, in the presence of multiple bridges. If multiple bridges are present between a pair of repositories, care must be taken to uniquely identify patches, so that they cannot endlessly cycle back and forth. This may not happen frequently in practice, but it is a possibility, and should be accounted for.
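One possible shape for the meta-data idea, covering both the round-trip and cycle problems, is to stamp each converted commit with a record of where it originally came from, and to refuse to send it back there. The sketch below is only an assumption about how this might look: the `Bridge-Origin` trailer name and the functions `tag_message`, `origin_of` and `needs_transfer` are invented for illustration and do not exist in Darcs or Git.

```python
TRAILER = "Bridge-Origin"  # hypothetical trailer name, not an existing convention

def tag_message(message, vcs, ident):
    """Append a trailer recording the VCS and identifier a change came from."""
    return f"{message.rstrip()}\n\n{TRAILER}: {vcs}:{ident}\n"

def origin_of(message):
    """Return (vcs, ident) from the last origin trailer, or None if untagged."""
    prefix = TRAILER + ": "
    for line in reversed(message.splitlines()):
        if line.startswith(prefix):
            vcs, _, ident = line[len(prefix):].partition(":")
            return vcs, ident
    return None

def needs_transfer(message, target_vcs):
    """A change is sent to target_vcs unless it originated there."""
    origin = origin_of(message)
    return origin is None or origin[0] != target_vcs

tagged = tag_message("Fix typo in README", "darcs", "abc123")
```

With this scheme, `needs_transfer(tagged, "darcs")` is false, so a second bridge would not push the change back into the Darcs repository it started in, while an untagged, natively created commit is always eligible. A real bridge would also need the identifier itself to be stable across conversions, which this sketch takes for granted.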

Project Timeline

Week 1:
  • University exams during this week; will use the community bonding period to ensure that no project time is lost. Build on darcs-fastconvert to allow automatic conversion of simple repositories. Some experimentation and code exploration has already taken place for this step, which will hopefully hasten the results further.
Week 2:
  • Investigate and code a mapping from multi-head repositories to Darcs repositories.
Weeks 3 - 4:
  • Implement code to interpret and apply foreign-format patch files to Darcs repositories, and export patches in foreign formats.
Weeks 5 - 9:
  • Investigate and implement solutions to efficient translation between Darcs and Git repositories.
Concurrently, Week 1 - 11:
  • Investigate deeper problems with cycles, roundtripping, Darcs-specific patches and Darcs2<->Darcs1 mapping.
Week 12:
  • Consolidate work: tidy code, improve documentation and “tidy rough edges”. Potentially write a Monad.Reader article detailing my project.
Weekly:
  • Create blog updates (blog.owenstephens.co.uk) detailing my progress and successes/challenges.

Benefits for Darcs and Haskell Community

Within the Haskell community there are two main VCSs in use: Git and Darcs. Creating this bridge would lead to improvements in communication and collaboration between projects. Personal VCS preference should not be an obstacle for developers wanting to send patches to a project using an alternative system; all contributions are valuable!

This project will help to resolve the tension between Darcs and Git within the Haskell (and wider) community. Darcs is still important in the VCS world, and this project is not just about making it easy for people to move away from Darcs. Currently, Darcs is recommended for small to medium-sized projects; clearly, this should not, and hopefully will not, be the case in later versions of Darcs. Providing a simple mechanism for users to collaborate in the meantime, regardless of their chosen VCS, is therefore important. This project also aims to support the conversion of Darcs2 to Darcs1 or future Darcs repository formats, providing a simple migration tool and ensuring existing users will have access to the latest features and stability of Darcs.

Outcomes

A working, tested and documented tool that will allow incremental synchronisation between a current-version Darcs repository and a Git or older-version Darcs repository.

Potential improvement of Darcs API; as a client of the API, it will be a good opportunity to discover any flaws in the design of the Darcs API, and push it towards a clean, modular implementation.

About Me and Why I Want to Work with Darcs

I am a 22-year-old student in my 4th year of an MEng Computer Science degree at Southampton University. I have applied for a PhD at Southampton, starting next year, in the area of type-systems and concurrency. My first exposure to Haskell was 2 years ago; I was impressed by the totally different way of thinking required when using it; it’s safe to say Haskell changed my view of programming. For an individual research project undertaken this year, I researched approaches to I/O in functional languages, particularly focusing on Haskell. Over the last two years, I have written Haskell programs for small tools and several pieces of university coursework, and have read a wide variety of both Haskell research papers and tutorials/tools.

Darcs was the first VCS I used, back in 2006; later, I turned to Git, as it was the vogue at the time. I found Git to be complicated, with a complex UI - to effectively use Git’s features, I found it necessary to understand Git’s internals, taking time to read the source-code and documentation. I have since used Git at a work placement and for several university projects, but have now returned to Darcs. I want to get stuck in to a real-world Haskell project, and see Darcs as the perfect project to do so - I want to improve it and be able to recommend it as a genuinely better, simpler alternative. I attended the Darcs sprint in Paris in April 2011, and learnt a great deal about Darcs and integrated well with the core Darcs team.

[1] http://hackage.haskell.org/package/darcs-fastconvert

[2] http://bugs.darcs.net/issue1683

[3] http://bugs.darcs.net/issue1682

[4] http://lists.osuosl.org/pipermail/darcs-devel/2005-May/002340.html