Along with todays ownCloud 4.5 release we released the new ownCloud Client 1.1.0 with a new syncing concept.
This blog will shed some light on the details. I apologize, it’s a long read.
Time Issues
ownCloud Client versions 1.0.x worked with csyncs traditional way of using the file modification times to detect updates between the two repositories that should be synced to each other. That works fine and conforms to our idea to ideally not use any other metadata in syncing than what the file system has anyway.
However, there is one drawback which we all know from daily life: If at least two parties sync on time its important that all clocks are set exactly the same way. Remember good crime movies where a bank robbery always starts with a clock adjustment of all gangsters? We have exactly the same in ownClouds syncing: All involved have to have the same time setting, otherwise modification times of files can not be compared reliably.
There are solutions for computers to set the exact time (like ntp) so in general that works. However, in real life scenarios these are not reliable because either people do not have them started on the system or because the daemon updates the time once in a while and in that time span the clock skews already too much.
Users all the time reported problems with that and other experts continued to advise that we never get around that problems if we don’t change something fundamental and go away from pure time based syncing.
Well, we did that with our csync version 0.60.0 which is the sync engine for ownCloud Client 1.1.0.
An Unique Id
Now, every file and directory inside a sync directory has an unique Id associated. The idea is that the Id changes if the file changes. So in the sync process the need for a file update in either direction can be computed by comparing the two Ids of the file. If the id has changed on one repository the file was changed there and needs to be synced to the other side.
The Ids are generated on the ownCloud server and one challenge for the client is to always download the correct Id of a file. The Ids are just random tags for a file version. It is not associated to the file content as MD5 sums would be. Actually it was a frequent advise to use MD5 sums or a similar approach which digests the files content to detect updates. That would have come very handy because that means comparing file contents directly and, more important, it’s reproducable on either side. Also the client would have been able to recalculate the MD5-Sum of the local files and would not have depended on a local database with Ids that were pulled from the server before.
But we decided against hashes. Calculating MD5-Sums is costly in terms of CPU and time, especially for large files. The CPU problem is small on clients, but not on servers where a lot of clients connect to. Even though the sums can be calculated during upload, the problems remain for the case where the server does not see the upload stream, think of the “mount my Dropbox” case.
For files on the ownCloud server, the Id is always updated when the file gets updated. On the client side the last Id of a file is in the client database. It is invalidated in case the files modification time changed meanwhile to detect local changes.
Change Propagation
Another remarkable change in the 1.1.0 client is that change events in the file tree propagate up to the top directory on the owncloud server, ie. if a file changes in a directory, the id of the directory changes as well as the one of its parent directory etc.
That means that to detect if a file tree has changed, it’s enough to check the top most directories Id. If that has changed, ok, than the client needs to dig deeper, but in the not so rare case that nothing has changed, the one call is enough to detect that. That dramatically lowers the server load with clients because instead of digging through the whole directory structure what we did with the 1.0.x series it is a few requests now.
CSync and ownCloud for Success
These are very intrusive changes to csync. For example, we had to add two additional fields to the database, add code that is able to build a representation of the local file tree from the database and make csync query for the file Ids from the server if needed. Deep under the hood the updater, reconciler and propagator code needed changes to work with the Ids. All these changes did not go back to csync upstream yet.
To not conflict with the upstream version of csync we decided to rename our csync version to ocsync. But: This is a temporar solution for the time we need to catch up with upstream again. That will take a while until everything is sorted again but we will work on that.
I am are very excited about the new version of csync. But obviously there are other changes in the ownCloud Client 1.1.0 which will be subject of another blog post.