Nathan Grigg

Fastmail JMAP backup

I use Fastmail for my personal email, and I like to keep a backup of my email on my personal computer. Why make a backup? When I am done reading or replying to an email, I make a split-second decision on whether to delete or archive it on Fastmail’s server. If it turns out I deleted something that I need later, I can always look in my backup. The backup also predates my use of Fastmail and serves as a service-independent store of my email.

My old method of backing up the email was to forward all my email to a Gmail account, then use POP to download the email with a hacked-together script. This had the added benefit that the Gmail account also served as a searchable backup.

Unfortunately, the Gmail account ran out of storage and the POP script kept hanging for some reason. Together, these problems motivated me to get away from this convoluted backup strategy.

The replacement script uses JMAP to connect directly to Fastmail and download all messages. It is intended to run periodically. Each run picks an end time 24 hours in the past, downloads the email received before that time (as written, the first run only reaches back one week from the end time), and then records the end time. The next run searches for mail between the previous end time and a new end time, which is again 24 hours in the past.
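
As a rough sketch of how the windows chain together from one run to the next, here is the same arithmetic pulled out into a standalone function. The name compute_window and the example dates are made up for illustration; the one-week fallback for the first run mirrors the default in the script below.

```python
import datetime


def compute_window(now, last_end_time=None, delay_hours=24):
    """Return (start, end) for one backup run.

    `end` trails `now` by `delay_hours`. `start` is the previous run's end,
    or one week before `end` when there is no previous run.
    """
    end = now.replace(microsecond=0) - datetime.timedelta(hours=delay_hours)
    if last_end_time is None:
        start = end - datetime.timedelta(weeks=1)
    else:
        start = last_end_time
    return start, end


# Two runs a day apart: the second window starts where the first one ended.
run1 = datetime.datetime(2024, 1, 10, 12, 0, 0)
start1, end1 = compute_window(run1)
start2, end2 = compute_window(run1 + datetime.timedelta(days=1), last_end_time=end1)
```

Because each window starts exactly where the previous one ended, no stretch of time is skipped between runs, as long as the endpoints are handled consistently.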

Why pick a time in the past? Well, I’m not confident that if you search up until this exact moment, you are guaranteed to get every message. A message could come in, then two seconds later you send a query, but it hits a server that doesn’t know about your message yet. I’m sure an hour is more than enough leeway, but since this is a backup, we might as well make it a 24-hour delay.

Note that I am querying all mail, regardless of which mailbox it is in, so even if I have put a message in the trash, my backup script will find it and download it.
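
To make that concrete, here is a small sketch of the filter with and without a mailbox restriction. The build_filter helper and the mailbox ID 'mbox-trash' are hypothetical; inMailbox is the property the JMAP mail spec uses to limit a query to one mailbox, and the script simply never sets it.

```python
import datetime


def build_filter(start, end, mailbox_id=None):
    """Build an Email/query filter for a time window (illustrative helper)."""
    flt = {
        'after': start.isoformat() + 'Z',
        'before': end.isoformat() + 'Z',
    }
    # Leaving out 'inMailbox' means every mailbox is searched, trash included.
    if mailbox_id is not None:
        flt['inMailbox'] = mailbox_id
    return flt


start = datetime.datetime(2024, 1, 2, 12, 0, 0)
end = datetime.datetime(2024, 1, 9, 12, 0, 0)
all_mail = build_filter(start, end)
one_mailbox = build_filter(start, end, mailbox_id='mbox-trash')
```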

JMAP is a modern JSON-based replacement for IMAP and much easier to use. The entire script is 136 lines, even with my not-exactly-terse use of Python.

Here is the script, with some notes below.

  1 import argparse
  2 import collections
  3 import datetime
  4 import os
  5 import requests
  6 import string
  7 import sys
  8 import yaml
  9 
 10 
 11 Session = collections.namedtuple('Session', 'auth account_id download_template')
 12 def get_session(username, password):
 13     auth = requests.auth.HTTPBasicAuth(username, password)
 14     r = requests.get('https://jmap.fastmail.com/.well-known/jmap', auth=auth)
 15     [account_id] = list(r.json()['accounts'])
 16     download_template = r.json()['downloadUrl']
 17     return Session(auth, account_id, download_template)
 18 
 19 
 20 Email = collections.namedtuple('Email', 'id blob_id date subject')
 21 def query(session, start, end):
 22     json_request = {
 23         'using': ['urn:ietf:params:jmap:core', 'urn:ietf:params:jmap:mail'],
 24         'methodCalls': [
 25             [
 26                 'Email/query',
 27                 {
 28                     'accountId': session.account_id,
 29                     'sort': [{'property': 'receivedAt', 'isAscending': False}],
 30                     'filter': {
 31                         'after': start.isoformat() + 'Z',
 32                         'before': end.isoformat() + 'Z',
 33                     },
 34                     'limit': 50,
 35                 },
 36                 '0',
 37             ],
 38             [
 39                 'Email/get',
 40                 {
 41                     'accountId': session.account_id,
 42                     '#ids': {
 43                         'name': 'Email/query',
 44                         'path': '/ids/*',
 45                         'resultOf': '0',
 46                     },
 47                     'properties': ['blobId', 'receivedAt', 'subject'],
 48                 },
 49                 '1',
 50             ],
 51         ],
 52     }
 53 
 54     while True:
 55         full_response = requests.post(
 56             'https://jmap.fastmail.com/api/', json=json_request, auth=session.auth
 57         ).json()
 58 
 59         if any(x[0].lower() == 'error' for x in full_response['methodResponses']):
 60             sys.exit(f'Error received from server: {full_response!r}')
 61 
 62         response = [x[1] for x in full_response['methodResponses']]
 63 
 64         if not response[0]['ids']:
 65             return
 66 
 67         for item in response[1]['list']:
 68             date = datetime.datetime.fromisoformat(item['receivedAt'].rstrip('Z'))
 69             yield Email(item['id'], item['blobId'], date, item['subject'])
 70 
 71         # Set anchor to get the next set of emails.
 72         query_request = json_request['methodCalls'][0][1]
 73         query_request['anchor'] = response[0]['ids'][-1]
 74         query_request['anchorOffset'] = 1
 75 
 76 
 77 def email_filename(email):
 78     subject = email.subject.translate(str.maketrans('', '', string.punctuation))[:50]
 79     date = email.date.strftime('%Y%m%d_%H%M%S')
 80     return f'{date}_{email.id}_{subject}.eml'
 81 
 82 
 83 def download_email(session, email, folder):
 84     r = requests.get(
 85         session.download_template.format(
 86             accountId=session.account_id,
 87             blobId=email.blob_id,
 88             name='email',
 89             type='application/octet-stream',
 90         ),
 91         auth=session.auth,
 92     )
 93 
 94     with open(os.path.join(folder, email_filename(email)), 'wb') as fh:
 95         fh.write(r.content)
 96 
 97 
 98 if __name__ == '__main__':
 99     # Parse args.
100     parser = argparse.ArgumentParser(description='Backup jmap mail')
101     parser.add_argument('--config', help='Path to config file', nargs=1)
102     args = parser.parse_args()
103 
104     # Read config.
105     with open(args.config[0], 'r') as fh:
106         config = yaml.safe_load(fh)
107 
108     # Compute window.
109     session = get_session(config['username'], config['password'])
110     delay_hours = config.get('delay_hours', 24)
111 
112     end_window = datetime.datetime.utcnow().replace(microsecond=0) - datetime.timedelta(
113         hours=delay_hours
114     )
115 
116     # On first run, 'last_end_time' won't exist; download the most recent week.
117     start_window = config.get('last_end_time', end_window - datetime.timedelta(weeks=1))
118 
119     folder = config['folder']
120 
121     # Do backup.
122     num_results = 0
123     for email in query(session, start_window, end_window):
124         # We want our search window to be exclusive of the right endpoint.
125         # It should be this way in the server, according to the spec, but
126         # Fastmail's query implementation is inclusive of both endpoints.
127         if email.date == end_window:
128             continue
129         download_email(session, email, folder)
130         num_results += 1
131     print(f'Archived {num_results} emails')
132 
133     # Write config
134     config['last_end_time'] = end_window
135     with open(args.config[0], 'w') as fh:
136         yaml.dump(config, fh)

The get_session function is run once at the beginning of the script, and fetches some important data from the server including the account ID and a URL to use to download individual emails.
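
For a feel of what comes back, here is a sketch that parses a made-up session document and fills in the download URL template the same way the script does. The sample document, the example.com URL, and the helper names parse_session and download_url are assumptions for illustration; a real session response contains many more fields.

```python
import collections

Session = collections.namedtuple('Session', 'account_id download_template')

# Made-up session document, trimmed to the two fields the script reads.
SAMPLE_SESSION = {
    'accounts': {'u1234': {'name': 'me@example.com'}},
    'downloadUrl': 'https://jmap.example.com/download/{accountId}/{blobId}/{name}?type={type}',
}


def parse_session(doc):
    # Like the script, insist on exactly one account: the unpacking raises
    # ValueError if the server lists zero or several.
    [account_id] = list(doc['accounts'])
    return Session(account_id, doc['downloadUrl'])


def download_url(session, blob_id):
    # The template placeholders happen to be valid str.format fields.
    return session.download_template.format(
        accountId=session.account_id,
        blobId=blob_id,
        name='email',
        type='application/octet-stream',
    )


session = parse_session(SAMPLE_SESSION)
url = download_url(session, 'Gabc')
```

Note that str.format does no percent-encoding, so this only works as long as the substituted values are safe to place in a URL as they are.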

The query function does the bulk of the work, sending a single JSON request multiple times to page through the search results. It is actually a two-part request, first Email/query, which returns a list of ids, and then Email/get, which gets some email metadata for each result. I wrote this as a generator to make the main part of my script simpler. The paging is performed by capturing the ID of the final result of one query, and asking the next query to start at that position plus one (lines 73-74). We are done when the query returns no results (line 64).
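
The paging idea is easier to see with the network stripped away. The sketch below replaces the server with a toy function over an in-memory list; fake_email_query and iter_ids are made-up names, and the toy only imitates the anchor and anchorOffset behavior the script relies on.

```python
def fake_email_query(all_ids, limit, anchor=None, anchor_offset=0):
    """Toy stand-in for Email/query paging; not a real JMAP client."""
    if anchor is None:
        start = 0
    else:
        # The first result sits anchor_offset positions after the anchor.
        start = all_ids.index(anchor) + anchor_offset
    return all_ids[start:start + limit]


def iter_ids(all_ids, limit):
    """Yield every ID, one page at a time, the way the query generator does."""
    anchor = None
    while True:
        page = fake_email_query(all_ids, limit, anchor, anchor_offset=1)
        if not page:
            return  # An empty page means we have walked off the end.
        yield from page
        anchor = page[-1]  # Next page starts one past the last ID seen.


ids = [f'M{n}' for n in range(7)]
paged = list(iter_ids(ids, limit=3))
```

With seven IDs and a limit of three, this makes four requests: two full pages, one page with a single ID, and a final empty page that ends the loop.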

The download_email function uses the blob ID to fetch the entire email and saves it to disk. This doesn’t really need to be its own function, but it will help if I later decide to use multiple threads to do the downloading.
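
If I do go the multithreaded route, it might look something like this sketch using the standard library's thread pool. The download_all helper and the fake_download stand-in are hypothetical, and nothing here has been run against Fastmail.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(emails, download_one, max_workers=4):
    """Call download_one(email) for every email using a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator, so an exception raised in any worker
        # is re-raised here rather than silently dropped.
        return list(pool.map(download_one, emails))


# Stand-in for the real download: just report a pretend filename.
def fake_download(email_id):
    return f'{email_id}.eml'


results = download_all(['M1', 'M2', 'M3'], fake_download)
```

In the script, download_one could be a small wrapper such as lambda email: download_email(session, email, folder), which is where having a separate function pays off.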

Finally, the main part of the script reads configuration from a YAML file, including the last end time. It loops through the results of query, calling download_email on each result. When the loop finishes, it writes the configuration data back out to the YAML file, including the updated last_end_time.