Incrementally exporting a Fossil repository to an existing Git repository
Created: 28.05.2020 18:48 UTC
As mentioned in my last blog post, I recently migrated ObjFW from Git to Fossil, while still committing to the Git repository and automatically converting it to Fossil.
This incremental and automatic conversion to Fossil worked well, but it was
slow, as the `fossil import` workflow always requires rebuilding the metadata;
otherwise, a `fossil pull` would not pick up the changes. Also, Fossil supports
some nice features that Git does not, like giving colors to branches, so I was
missing out on these features by using that workflow.
So I looked into committing to Fossil and exporting to Git instead, which is
supported out of the box using `fossil git export`. However, it can only export
to a new repository, not to an existing one. This would have meant losing all
existing Git history, and everybody would have had to delete their checkout and
create a new one. Not only is this cumbersome and a rewrite of history, but it
also carries the risk that, if there were problems during the conversion, the
good state would be lost forever.
Therefore, I looked into how to make this work with an existing repository.
Since `fossil git export` works incrementally, there has to be a way to keep
state, right? There needs to be a way to keep track of which Git commit matches
which Fossil commit. And that's exactly what the marks files generated by
`git fast-export` and `fossil import` do.
When I converted the repository to Fossil, I used the following (simplified; these are not the actual commands I used, since some hoops were necessary to cross security boundaries on my server):
```shell
# This is important if there is any non-UTF-8 anywhere in the repository
export LC_ALL=C
git fast-export --all --signed-tags=strip --export-marks=git.marks |
    sed 's/^committer Jonathan.*>/committer js <js>/' |
    sed 's/^author Jonathan.*>/author js <js>/' |
    sed 's/^tagger Jonathan.*>/tagger js <js>/' |
    USER=js fossil import --git --rename-master trunk \
        --export-marks fossil.marks repo.fossil
```
For later invocations, I then added `--import-marks` with the same marks file
to both `git` and `fossil` for the incremental import. The `sed` invocations
are necessary so that the commits get the correct user `js` instead of the
e-mail address being used as the username.
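As a rough sketch, a later incremental run could look like the following. The function names `rewrite_identity` and `incremental_import` are made up for illustration, and the `--incremental` flag is my assumption about what `fossil import` needs in order to import into an already existing repository; it is not taken from the commands above.

```shell
# Rewrite committer/author/tagger lines so Fossil sees the user "js"
# (the same substitutions as in the initial conversion).
rewrite_identity() {
    sed -e 's/^committer Jonathan.*>/committer js <js>/' \
        -e 's/^author Jonathan.*>/author js <js>/' \
        -e 's/^tagger Jonathan.*>/tagger js <js>/'
}

# Incremental import: both tools read the marks written by the previous run
# and write the updated marks back to the same file.
incremental_import() {
    # This is important if there is any non-UTF-8 anywhere in the repository
    export LC_ALL=C
    git fast-export --all --signed-tags=strip \
        --import-marks=git.marks --export-marks=git.marks |
    rewrite_identity |
    # --incremental is an assumption, see the note above
    USER=js fossil import --git --incremental --rename-master trunk \
        --import-marks fossil.marks --export-marks fossil.marks repo.fossil
}
```

Calling `incremental_import` from the Git checkout would then only feed commits to Fossil that are not yet recorded in `git.marks`.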
The resulting marks files contain a mapping from mark id to Git commit and from
mark id to Fossil commit. When using `fossil git export` to create a Git
repository again from the converted Fossil repository, I noticed that it
creates a `.mirror_state/db` file in the target Git repository. This file is an
SQLite3 database with the mapping between mark id, Fossil commit and Git commit
and is used for incremental Git exports. So I wrote a small program that parses
the marks files generated during the initial conversion and later incremental
imports and generates the `.mirror_state/db` file for an existing repository.
After creating this file in the existing repository, a normal
`fossil git export` now just incrementally exports to the existing Git
repository without rewriting any history.
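To illustrate the core of that idea, here is a minimal sketch of joining the two marks files on the mark id. This is not the actual program. It assumes that `git.marks` has lines of the form `:<mark> <git hash>` and that `fossil.marks` has lines of the form `c<rid> :<mark> <fossil hash>` for commits; the function name `join_marks` is made up. Loading the result into `.mirror_state/db` is left out on purpose, as the table layout should be checked against the Fossil source rather than guessed.

```shell
# Print "<mark id> TAB <fossil hash> TAB <git hash>" for every commit that
# appears in both marks files.
# $1: marks file written by git fast-export
# $2: marks file written by fossil import
join_marks() {
    awk '
        # First file (Git marks): ":<mark> <git hash>"
        NR == FNR { git[$1] = $2; next }
        # Second file (assumed Fossil format): "c<rid> :<mark> <fossil hash>"
        # Only commit lines (prefix "c") are of interest here.
        /^c/ && ($2 in git) {
            printf "%s\t%s\t%s\n", substr($2, 2), $3, git[$2]
        }
    ' "$1" "$2"
}
```

Running `join_marks git.marks fossil.marks` gives one line per commit, which is the data the mapping database needs.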
Now only one problem remained: `fossil git export` always uses
`username@noemail.net` as the committer's e-mail address (code). This means the
commits are not properly attributed on GitHub. I fixed this by just hardcoding
a `fossil_strcmp()` against `js` there and then emitting my e-mail address
instead. But of course the proper fix would be to refactor the code from the
deprecated export command to be reusable by the new export.
Finally, I set up a cron job that runs every minute. It exports from the Fossil repository to a checked-out Git repository (important: exporting to a bare repository doesn't work!) and pushes the result to both my Git server and GitHub. This is fast enough to run every minute, unlike the other direction. There, the metadata rebuild is expensive and happens even when nothing changed, which I hacked around by first checking whether there were actually new commits since the last import.
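A script for such a cron job might look roughly like this. All paths and the remote names `origin` and `github` are placeholders, the function name `export_and_push` is made up, and passing the repository via `-R` is an assumption about how `fossil git export` is invoked outside of a Fossil checkout.

```shell
#!/bin/sh
# Placeholder paths; the commands can be overridden through the environment.
FOSSIL_REPO=${FOSSIL_REPO:-/path/to/repo.fossil}
GIT_MIRROR=${GIT_MIRROR:-/path/to/git-checkout}
FOSSIL=${FOSSIL:-fossil}
GIT=${GIT:-git}

export_and_push() {
    # Incrementally export new Fossil commits into the (non-bare) Git checkout.
    "$FOSSIL" git export "$GIT_MIRROR" -R "$FOSSIL_REPO" || return 1
    # Push branches and tags to each remote (assumed remote names).
    for remote in origin github; do
        "$GIT" -C "$GIT_MIRROR" push --all "$remote" || return 1
        "$GIT" -C "$GIT_MIRROR" push --tags "$remote" || return 1
    done
}
```

With a call to `export_and_push` added at the end of the script, a crontab entry such as `* * * * * /path/to/export.sh` would run it every minute.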
You can see a Fossil commit here and the Git commit that was automatically created by this process.