js' blog

Incrementally exporting a Fossil repository to an existing Git repository
Created: 28.05.2020 18:48 UTC

As mentioned in my last blog post, I recently migrated ObjFW from Git to Fossil, while still committing to the Git repository and converting automatically to Fossil.

This incremental, automatic export to Fossil worked well, but it was slow: the fossil import workflow always requires rebuilding the metadata, as otherwise a fossil pull would not pick up the changes. Also, Fossil supports some nice features that Git does not, like giving colors to branches, and with that workflow I was missing out on them.

So I looked into committing to Fossil and exporting to Git instead, which is supported out of the box by fossil git export. However, it can only export to a new repository, not to an existing one. That would have meant losing all Git history, and everybody would have had to delete their checkout and create a new one. Besides rewriting history and being cumbersome, this also carries the risk that, if anything went wrong during the conversion, the good state would be lost forever.

Therefore, I looked into how to make this work with an existing repository. Since fossil git export works incrementally, it has to keep state somewhere, right? It needs to keep track of which Git commit matches which Fossil commit. And that is exactly what the marks files generated by git fast-export and fossil import do.

When I converted the repository to Fossil, I used the following commands (simplified; the actual ones needed some extra hoops to cross security boundaries on my server):

```sh
# Use the C locale so the tools treat the stream as raw bytes; this is
# important if there is any non-UTF-8 anywhere in the repository
export LC_ALL=C
git fast-export --all --signed-tags=strip --export-marks=git.marks |
	sed 's/^committer Jonathan.*>/committer js <js>/' |
	sed 's/^author Jonathan.*>/author js <js>/' |
	sed 's/^tagger Jonathan.*>/tagger js <js>/' |
	USER=js fossil import --git --rename-master trunk \
	--export-marks fossil.marks repo.fossil
```

For later, incremental imports, I additionally passed --import-marks with the same marks file to both git and fossil. The sed invocations are necessary so that the commits are attributed to the user js instead of the e-mail address being used as the username.
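The three sed invocations in the conversion pipeline above can also be folded into a single filter. Below is a minimal sketch that uses the same patterns. The function name `rewrite_idents` is hypothetical, and the sketch assumes a sed that supports `-E` for extended regular expressions, as GNU and BSD sed do.

```sh
# Hypothetical helper: one sed pass that rewrites the author, committer and
# tagger lines of a git fast-export stream, like the three sed calls above.
# "Jonathan" and "js" are specific to this repository; adjust as needed.
# LC_ALL=C makes sed treat the stream as raw bytes (non-UTF-8 safe).
rewrite_idents() {
    LC_ALL=C sed -E 's/^(author|committer|tagger) Jonathan.*>/\1 js <js>/'
}
```

It slots into the pipeline as `git fast-export ... | rewrite_idents | USER=js fossil import ...`. Like the original sed calls, it also rewrites lines inside commit messages that happen to start with, for example, "author Jonathan". Such a rewrite would no longer match the byte count in the preceding `data` line of the stream.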

The resulting marks files map each mark id to a Git commit and to a Fossil commit, respectively. When I used fossil git export to create a Git repository from the converted Fossil repository again, I noticed that it creates a .mirror_state/db file in the target Git repository. This file is an SQLite3 database that holds the mapping between mark id, Fossil commit and Git commit, and fossil git export uses it for incremental exports. So I wrote a small program that parses the marks files generated during the initial conversion and the later incremental imports, and generates the .mirror_state/db file for an existing repository. With this file in place in the existing repository, a normal `fossil git export` now just incrementally exports to the existing Git repository without rewriting any history.
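The core of such a program is a join of the two marks files on the mark id. Below is a minimal shell sketch of that step. It is not the program described above. The line formats of the marks files and the target table `mmark(id, uuid, githash)` are assumptions. Verify them against your own marks files and against the schema that `fossil git export` creates, which you can print with `sqlite3 .mirror_state/db .schema`. The function name `marks_to_sql` is hypothetical.

```sh
# Hypothetical helper: join git.marks and fossil.marks on the mark id and
# print one SQL INSERT per check-in for the .mirror_state/db mapping table.
# ASSUMED formats (check your own files first):
#   git.marks:    ":<mark> <git-commit-hash>"
#   fossil.marks: "c<rid> :<mark> <fossil-hash>"  (b-lines are blobs, skipped)
# ASSUMED table: mmark(id, uuid, githash)
# Usage: marks_to_sql git.marks fossil.marks
marks_to_sql() {
    awk -v q="'" '
        # First file (git.marks): remember the Git hash for every mark id.
        FILENAME == ARGV[1] { sub(/^:/, "", $1); git[$1] = $2; next }
        # Second file (fossil.marks): only check-in lines are used.
        /^c/ {
            mark = $2
            sub(/^:/, "", mark)
            if (mark in git)
                printf "INSERT INTO mmark(id, uuid, githash) VALUES(%s, %s%s%s, %s%s%s);\n", mark, q, $3, q, q, git[mark], q
        }
    ' "$1" "$2"
}
```

The output can be piped into the database, for example with `marks_to_sql git.marks fossil.marks | sqlite3 .mirror_state/db`, once the tables exist. This sketch only covers the mark mapping. Whether .mirror_state/db needs anything else (such as configuration rows) is outside its scope.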

Now only one problem remained: fossil git export always uses username@noemail.net as the committer's e-mail address, which means the commits are not properly attributed on GitHub. I worked around this by hardcoding a fossil_strcmp() against js there and emitting my e-mail address instead. The proper fix, of course, would be to refactor the relevant code from the deprecated export command so that the new git export can reuse it.

Finally, I set up a cron job that runs every minute. It exports from the Fossil repository to a checked-out Git repository (important: exporting to a bare repository doesn't work!) and pushes the result to both my Git server and GitHub. This is fast enough to run every minute, unlike the other direction. There, the expensive metadata rebuild happens even when nothing has changed, so I hacked around it by first checking whether there are actually new commits since the last import.
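A check like that for the Git to Fossil direction could look roughly like this. It is a minimal sketch and not necessarily the check used here. It assumes that the Git repository's ref listing is a good enough fingerprint for "new commits", and the helper names and the state file are hypothetical.

```sh
# Hypothetical helpers for skipping the expensive Git -> Fossil import when
# nothing changed. The caller passes the current ref listing (one line per
# ref) and the path of a state file.

# Succeeds (exit status 0) if the refs differ from the last recorded run,
# or if there is no recorded run yet.
refs_changed() {
    [ -f "$2" ] || return 0
    [ "$1" != "$(cat "$2")" ]
}

# Records the ref listing after a successful import.
save_refs() {
    printf '%s\n' "$1" > "$2"
}
```

In a cron script, the listing could come from `git -C /path/to/repo.git for-each-ref --format='%(objectname) %(refname)'` (the path is a placeholder). The script would run the fast-export/import pipeline only if `refs_changed` succeeds, then call `save_refs` once the import has finished without errors.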

You can see a Fossil commit here and the Git commit that was automatically created by this process.