Syncthing — How Syncing Works
Previously we explored how Syncthing detects local changes and indexes information about files. Now we’re going to look closer at how Syncthing handles changes from other devices.
Recall that files are divided into blocks, typically 128 KiB each but possibly larger for larger files. Each device calculates the hash (cryptographic checksum) of every block making up a file and informs its peers about the file contents. Let’s imagine a file consisting of eight blocks, the grey ones here.
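As a rough sketch of the block list idea, the following splits a byte string into fixed-size blocks and hashes each one. It assumes SHA-256 as the cryptographic hash and a fixed 128 KiB block size; the names `BLOCK_SIZE` and `block_hashes` are made up for this illustration and are not Syncthing's own.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # 128 KiB, the typical block size mentioned above


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return one hex digest per block."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]
```

The last block may be shorter than the others; it is hashed as-is.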
When Syncthing gets an index update from a peer device containing a new block list, it compares the new block list with the one it already has in the index. Any difference means the file contents have changed and we should synchronize the file. Let’s say we get a list of eight blocks from another device, the blue ones in this illustration.
We compare the blue blocks we got from the other device to our grey ones. In some cases they are the same: the block data hasn’t changed. Those go onto a have list. In some cases they differ: the block data has changed. These go onto a need list.
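The comparison can be sketched as a position-by-position walk over the two block lists. This is a minimal illustration, assuming both lists hold one hash string per block; `split_have_need` is a hypothetical helper name.

```python
def split_have_need(local: list[str], remote: list[str]) -> tuple[list[int], list[int]]:
    """Return (have, need) as lists of block indexes into the remote block list."""
    have, need = [], []
    for index, remote_hash in enumerate(remote):
        # A block is "had" only if the same hash sits at the same position locally.
        if index < len(local) and local[index] == remote_hash:
            have.append(index)
        else:
            need.append(index)
    return have, need
```

With eight blocks where the second, third and fifth differ, the need list comes out as indexes 1, 2 and 4 (counting from zero).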
Now that we know what we have and what we need, it’s time to start syncing. Syncthing never alters an existing file in place, so the user never sees a half-updated, inconsistent file. Instead, we create a new temporary file of the right size with no contents.
Now that we have a temporary file we copy all the unchanged blocks from the existing, old version of the file. After reading each block we calculate the hash and make sure it is what we expect.
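The temp file and copy steps might look roughly like this. It is a sketch under assumptions: SHA-256 hashes, a fixed block size, and two hypothetical helpers, `create_temp` and `copy_have_blocks`. Blocks that fail verification are reported back so the caller can treat them as needed blocks.

```python
import hashlib


def create_temp(temp_path: str, size: int) -> None:
    """Create an empty temp file that already has the final size."""
    with open(temp_path, "wb") as temp:
        temp.truncate(size)


def copy_have_blocks(old_path, temp_path, have, expected, block_size):
    """Copy the blocks listed in `have` from the old file into the temp file.

    Returns the indexes whose data did not match the expected hash.
    """
    failed = []
    with open(old_path, "rb") as old, open(temp_path, "r+b") as temp:
        for index in have:
            offset = index * block_size
            old.seek(offset)
            block = old.read(block_size)
            if hashlib.sha256(block).hexdigest() != expected[index]:
                failed.append(index)  # treat as a needed block instead
                continue
            temp.seek(offset)
            temp.write(block)
    return failed
```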
At this point we’ve done what we can with just the data from the existing copy of the file. We still need to handle the blocks in the need list - B2, B3, and B5. For each of these Syncthing will attempt a database lookup of the hash. The index database not only maps files to block lists, but also maps block hashes to files and offsets. This means that if a certain block exists locally in another file, we will find it and copy it. As always, the hash is verified while copying.
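The reverse mapping from block hash to file and offset can be pictured as a dictionary. The sketch below is an in-memory stand-in for the index database, not its real layout; `build_block_index` and `find_local_sources` are hypothetical names.

```python
from collections import defaultdict


def build_block_index(files, block_size):
    """Map each block hash to every (file name, offset) where it occurs."""
    index = defaultdict(list)
    for name, hashes in files.items():
        for i, block_hash in enumerate(hashes):
            index[block_hash].append((name, i * block_size))
    return index


def find_local_sources(index, needed_hashes):
    """Split needed hashes into those found locally and those still missing."""
    found = {h: index[h][0] for h in needed_hashes if h in index}
    missing = [h for h in needed_hashes if h not in index]
    return found, missing
```

Anything left in `missing` has to come from another device.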
Any remaining blocks at this point can’t be found locally so we must ask other devices for them. Syncthing sends requests for the blocks, verifies the responses against the expected hash, and writes the block to the temp file.
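A sketch of the request-and-verify step, assuming SHA-256 and modelling each peer as a plain callable that takes a block index and returns bytes or `None`. Trying peers one after another is an assumption made for the example; the text above only says that responses are verified before being written.

```python
import hashlib


def fetch_block(peers, index, expected_hash):
    """Ask each peer in turn; return the first response that verifies, else None."""
    for request in peers:
        data = request(index)  # hypothetical network call
        if data is not None and hashlib.sha256(data).hexdigest() == expected_hash:
            return data
    return None
```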
Once all blocks have been put in place the temp file gets the correct permissions, attributes and modification time set. Then the old file is removed or archived, and the temp file is moved into place.
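The final step can be approximated with standard library calls. This sketch simplifies things: it sets only the permission bits and modification time, and it lets the rename overwrite the old file instead of removing or archiving it first. `finish_sync` is a hypothetical name.

```python
import os


def finish_sync(temp_path, final_path, mode, mtime):
    """Apply metadata to the temp file, then move it over the old file."""
    os.chmod(temp_path, mode)
    os.utime(temp_path, (mtime, mtime))  # (access time, modification time)
    os.replace(temp_path, final_path)    # replaces final_path in one step
```

Setting the metadata before the rename means the file never appears under its final name with the wrong timestamp.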
The sync is complete and we can update the index database, and send index updates to our peers.
Simplifications
I glossed over a couple of things.
In addition to the straight up block hashes there is also a rolling hash that is computed over the file. We use this to find blocks that have moved in a file – for example if a file was rewritten with more data at the beginning, causing a bunch of blocks to move to a new offset. This comes into play in the copying step, allowing us to find blocks that have the right content but are not at an exact block offset any more.
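To show why a rolling hash helps here, the sketch below slides a window over old data one byte at a time and updates a cheap checksum incrementally, confirming any match with SHA-256. It uses an rsync-style sum-based checksum chosen for the example; it is not Syncthing's actual weak hash, and all names are hypothetical.

```python
import hashlib

MOD = 1 << 16


def weak_hash(block: bytes) -> tuple[int, int]:
    """Cheap checksum: a plain byte sum and a position-weighted byte sum."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a, b, n, outgoing, incoming):
    """Slide the window one byte to the right without rehashing it."""
    a = (a - outgoing + incoming) % MOD
    b = (b - n * outgoing + a) % MOD
    return a, b


def find_block(data: bytes, wanted_weak, wanted_strong: str, n: int) -> int:
    """Return the first offset in data where the wanted block occurs, or -1."""
    if len(data) < n:
        return -1
    a, b = weak_hash(data[:n])
    offset = 0
    while True:
        # Only pay for the strong hash when the cheap one already matches.
        if (a, b) == wanted_weak and \
                hashlib.sha256(data[offset:offset + n]).hexdigest() == wanted_strong:
            return offset
        if offset + n >= len(data):
            return -1
        a, b = roll(a, b, n, data[offset], data[offset + n])
        offset += 1
```

If three bytes are inserted at the start of a file, a block that used to sit at offset 0 is found again at offset 3.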
When an index update includes both a new file and a delete of another file with the same block list we handle it as a rename.
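One way to picture the rename check is to pair deleted and new files by their block lists. This is a simplified sketch with a hypothetical `detect_renames` helper; both inputs map a file name to its list of block hashes.

```python
def detect_renames(deleted, created):
    """Pair deleted and created files whose block lists are identical.

    Returns a list of (old_name, new_name) pairs.
    """
    by_blocks = {}
    for name, hashes in deleted.items():
        by_blocks.setdefault(tuple(hashes), []).append(name)
    renames = []
    for new_name, hashes in created.items():
        candidates = by_blocks.get(tuple(hashes))
        if candidates:
            # Use each deleted file for at most one rename.
            renames.append((candidates.pop(0), new_name))
    return renames
```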
A sync might fail for whatever reason, perhaps because a required block can’t be found in the cluster. In this case Syncthing will keep the half-done temp file around, and process it for blocks to reuse on the next sync attempt.
There’s a lot of juggling of permissions, to be able to create files in read-only directories for example. There are also various safety checks along the way to detect if the user suddenly changed the file themselves while we’re syncing it, to avoid stomping on their changes.
Syncthing can also send index updates while creating the temp file, and other devices can request blocks from our temp file while we are still working on it.