I have been using Jekyll to generate both this blog and my academic website for the past year, and I can confidently say that it has solved more problems for me than it has created. (This may sound like faint praise, but I assure you that it is not.)
Recently I have been annoyed at how long it takes to deploy updates to my website due to the way that Jekyll mangles timestamps, which rsync depends heavily on. I finally broke down and spent some time improving the process by tweaking rsync to work better with my Jekyll setup.
It has always bothered me that Jekyll mangles timestamps. When you run jekyll to regenerate your site, all timestamps are updated to the current time. (This is because all pages are regenerated, a separate and also annoying issue.) So to anything that uses timestamps to determine when a page has changed, it appears that every page changes whenever a single page changes.
There is no solution to this problem within the Jekyll framework. Each output file is created from several input files, so you could imagine setting the timestamp of each output file to be the maximum timestamp from all of the input files. But the input files often live on several computers and/or in a git repository, which makes the timestamp of the input files both ambiguous and worthless. In these circumstances, the timestamp of a file is not the same as the last modified time of the actual data. The only way to preserve the latter is through some external database, the avoidance of which is essentially Jekyll’s raison d’être.
I can overlook the fact that the file metadata on my web server is meaningless, but I have a harder time ignoring the slow deployment this causes. My academic website currently has 43 megabytes in 434 files. All but 400 kilobytes of that is archival material that never changes, and I usually change only a few files at a time. Nevertheless, rsync usually takes 15 seconds, even when I am transferring within the campus network.
I have two sets of files. I want to take all the differences from my local set and send them to the server set. For each pair of files, rsync checks that the sizes and modification times match, and if not, it copies the local file to the server. It has an efficient copy mechanism, so if the files are identical despite having different modification times, very little data is sent. If a large file has only changed in a few places, only the changed chunks are sent.
If you use Jekyll, the modification times never match, so all files are always copied, albeit in an efficient manner. Despite the efficient transfer mechanism, this is slow.
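To make the size-and-time comparison concrete, here is a rough emulation of it in plain shell. This is only a sketch of the decision rule described above, not rsync's actual implementation; the function name quick_check_differs and the directory names are made up for illustration.

```shell
#!/bin/sh
# Rough emulation of a size-and-mtime "quick check" (NOT rsync's real code).
work=$(mktemp -d)
mkdir "$work/local" "$work/server"

# Same content on both sides, but different modification times,
# as happens after Jekyll regenerates an unchanged page.
printf '<p>hello</p>\n' > "$work/local/index.html"
printf '<p>hello</p>\n' > "$work/server/index.html"
touch -t 202001010000 "$work/server/index.html"
touch -t 202001020000 "$work/local/index.html"

# Hypothetical helper: succeeds if the pair would be scheduled for transfer.
quick_check_differs() {
    # Different sizes: transfer.
    [ "$(wc -c < "$1")" -ne "$(wc -c < "$2")" ] && return 0
    # Same size: transfer only if one file is newer than the other.
    [ "$1" -nt "$2" ] || [ "$1" -ot "$2" ]
}

if quick_check_differs "$work/local/index.html" "$work/server/index.html"; then
    verdict="transfer"
else
    verdict="skip"
fi

# Compare the actual bytes to see whether the transfer was necessary.
if cmp -s "$work/local/index.html" "$work/server/index.html"; then
    content="identical"
else
    content="different"
fi

echo "quick check: $verdict, content: $content"
rm -rf "$work"
```

The file is flagged for transfer even though its contents are byte-for-byte identical, which is exactly the situation a freshly regenerated Jekyll site creates for every page.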
What you want is for rsync to compute and compare checksums for each pair of files, and to transfer only the files whose checksums differ. You can do this by using the --checksum (-c) option. Despite a warning from the rsync manual that “this option can be quite slow”, it reduced my transfer time from 15 seconds to 2 seconds.
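You can see the difference without touching a real server by doing a dry run between two local directories. The sketch below assumes rsync is installed; the temporary directories stand in for the local site and the server, and the file is identical on both sides apart from its modification time.

```shell
#!/bin/sh
# Compare what rsync would send with and without --checksum (local dry run).
work=$(mktemp -d)
mkdir "$work/_site" "$work/public_html"

printf '<p>hello</p>\n' > "$work/_site/index.html"
cp "$work/_site/index.html" "$work/public_html/index.html"
touch -t 202001010000 "$work/public_html/index.html"   # deployed earlier
touch -t 202001020000 "$work/_site/index.html"         # regenerated later

# -n is --dry-run (nothing is changed); -i is --itemize-changes.
# Lines starting with ">f" are regular files that would be transferred.
quick=$(rsync -rni "$work/_site/" "$work/public_html/" | grep -c '^>f' || true)
by_sum=$(rsync -rnic "$work/_site/" "$work/public_html/" | grep -c '^>f' || true)

echo "files to send with the default check: $quick"
echo "files to send with --checksum:        $by_sum"
rm -rf "$work"
```

With the default check the unchanged file is scheduled for transfer; with --checksum it is not.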
Here is the command I recommend to deploy a Jekyll site:
    rsync --compress \
          --recursive \
          --checksum \
          --delete \
          _site/ [email protected]:public_html/
Or, if you prefer the short version:
rsync -crz --delete _site/ [email protected]:public_html/
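Because --delete removes files on the receiving side, it can be worth previewing a deployment first. The following is a minimal sketch, assuming rsync is installed: preview_deploy is a hypothetical helper name, and the demonstration uses two temporary local directories rather than a real server.

```shell
#!/bin/sh
# preview_deploy is a hypothetical helper, not part of rsync or Jekyll.
# It runs the deploy command with --dry-run so nothing is actually changed.
preview_deploy() {
    rsync -crz --delete --dry-run --itemize-changes "$1" "$2"
}

# Demonstration on temporary local directories.
work=$(mktemp -d)
mkdir "$work/_site" "$work/public_html"
printf 'new page\n' > "$work/_site/new.html"
printf 'stale page\n' > "$work/public_html/old.html"

plan=$(preview_deploy "$work/_site/" "$work/public_html/")
echo "$plan"

# The dry run should not have removed the stale file.
if [ -e "$work/public_html/old.html" ]; then
    untouched="yes"
else
    untouched="no"
fi
echo "old.html still present after dry run: $untouched"
rm -rf "$work"
```

The preview lists new.html as a file to send and old.html as a file to delete. Dropping --dry-run from the helper performs the real deployment.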
A side benefit of this tweak is that server timestamps have meaning again. If the local and server files have the same checksum, nothing is copied. The timestamp of the file on the server is now the time the file was last copied to the server.
If you use the --times (-t) option, the server timestamps are manipulated to match the (meaningless) local file timestamps. This is not what you want.
If you use the --archive (-a) option, which is recommended by almost every rsync tutorial out there, you are implicitly using the --times option, as -a is equivalent to -rlptgoD. This is also not what you want. For a Jekyll site, the only part of -a that you care about is the -r. So don’t use -a.
The --itemize-changes (-i) option is a useful way of seeing what is transferred.
The --ignore-times (-I) option ignores timestamps, but not in the way you want. It simply copies all files no matter what (but still using the efficient transfer mechanism).
If you don’t use the --times option and don’t use --checksum, then all files which have matching timestamps are skipped, and all other files are transferred, which changes their timestamp on the server to the current time. If you continue this over time, more and more files have different timestamps even though their contents are the same, which means they are copied every time.
There is also a --size-only option, which skips files if they have the same size on the local computer and the server, even if they have different modification times. You are tempting fate if you use this option.