Use Mail.app to send emails from the command line

Apr 27, 2012

AppleScript is a great tool. It is awesome to be able to get the selected text from an application, grab the current URL from Safari, ask the user to choose a file, or show a dialog box requesting text. But writing AppleScript scripts is usually painful.

For anything mildly complicated, I would much rather write something in Python. So a lot of my AppleScripts look like this:

1. Get information from the user or the currently open application
2. do shell script some_python_or_bash_script
3. Do something with the result

Once again, I recently found myself writing an AppleScript where step 3 of the process involved composing an email to someone. It is difficult to take the result of the shell script (which is just a single, structureless string) and parse out multiple fields (body, subject, recipient) to pass to a complicated make new message command.

So instead, I made a Python wrapper around the make new message AppleScript command. Yes, that means I am using AppleScript to call a shell script which runs an AppleScript, but I’m okay with that. (Others have done the same thing, but not with the full set of options that I wanted.)

Why use Mail.app?

There are already command line mail programs. Why not just use one of them? Two reasons.

First, getting mail to transfer properly is always a pain. Comcast won’t let you use their SMTP, and if they did, your message would probably be marked as spam. So you have to figure out how to hook authenticated SMTP up to Google, and then it breaks, and you just get sick of it. Currently, my best solution to this has been to pipe a message over SSH to my work computer, which has a fully functional transfer agent, just to send an email to myself!

Second, and more important, often you want to see the message and maybe edit it a little before you send it. This also minimizes the chance that a script will screw up and either not send the mail or send duplicates.

Create the AppleScript

AppleScript to create a mail message looks about like this:

tell application "Mail"
    make new outgoing message with properties {visible:true,¬
    subject:"Happy Birthday!",content:"The big 60!"}
    tell result
        make new to recipient with properties {address:"[email protected]"}
        make new attachment with properties {file name:"cake.jpg"}
    end tell
end tell

The first half of the Python script does nothing more than create an AppleScript and feed it to the osascript command.

#!/usr/bin/python

import sys
import argparse
import os.path
from subprocess import Popen,PIPE

def escape(s):
    """Escape backslashes and quotes to appease AppleScript"""
    s = s.replace("\\","\\\\")
    s = s.replace('"','\\"')
    return s

def make_message(content,subject=None,to_addr=None,from_addr=None,
    send=False,cc_addr=None,bcc_addr=None,attach=None):
    """Use applescript to create a mail message"""
    if send:
        properties = ["visible:false"]
    else:
        properties = ["visible:true"]
    if subject:
        properties.append('subject:"%s"' % escape(subject))
    if from_addr:
        properties.append('sender:"%s"' % escape(from_addr))
    if len(content) > 0:
        properties.append('content:"%s"' % escape(content))
    properties_string = ",".join(properties)

    template = 'make new %s with properties {%s:"%s"}'
    make_new = []
    if to_addr:
        make_new.extend([template % ("to recipient","address",
            escape(addr)) for addr in to_addr])
    if cc_addr:
        make_new.extend([template % ("cc recipient","address",
            escape(addr)) for addr in cc_addr])
    if bcc_addr:
        make_new.extend([template % ("bcc recipient","address",
            escape(addr)) for addr in bcc_addr])
    if attach:
        make_new.extend([template % ("attachment","file name",
            escape(os.path.abspath(f))) for f in attach])
    if send:
        make_new.append('send')
    if len(make_new) > 0:
        make_new_string = "tell result\n" + "\n".join(make_new) + \
            "\nend tell\n"
    else:
        make_new_string = ""

    script = """tell application "Mail"
    make new outgoing message with properties {%s}
    %s end tell
    """ % (properties_string, make_new_string)

    # run applescript (text mode, so communicate accepts a str in Python 2 and 3)
    p = Popen('/usr/bin/osascript',stdin=PIPE,stdout=PIPE,universal_newlines=True)
    p.communicate(script) # send script to stdin
    return p.returncode
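Because make_message only builds a string before handing it to osascript, the generated AppleScript can be inspected without touching Mail.app. A minimal sketch, assuming a hypothetical build_script helper that copies only the subject, content, and to-recipient logic from make_message:

```python
def escape(s):
    """Escape backslashes and double quotes for an AppleScript string literal."""
    return s.replace("\\", "\\\\").replace('"', '\\"')


def build_script(content, subject=None, to_addr=()):
    """Return the AppleScript text that would go to osascript, without running it.

    Hypothetical helper: mirrors only the visible/subject/content properties
    and the "to recipient" part of make_message.
    """
    properties = ["visible:true"]
    if subject:
        properties.append('subject:"%s"' % escape(subject))
    if content:
        properties.append('content:"%s"' % escape(content))
    lines = ['tell application "Mail"',
             "make new outgoing message with properties {%s}" % ",".join(properties)]
    if to_addr:
        # recipients are added to the message just created ("result")
        lines.append("tell result")
        lines.extend('make new to recipient with properties {address:"%s"}' % escape(a)
                     for a in to_addr)
        lines.append("end tell")
    lines.append("end tell")
    return "\n".join(lines)


print(build_script('He said "hi"', subject="Test", to_addr=["someone@example.com"]))
```

The quotes inside the body come out as \" in the printed script, which is exactly what AppleScript needs inside a string literal.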

Dr. Drang recently complained about how inconvenient it is to send data to a subprocess in Python. I feel his pain, because I have spent plenty of time on trial and error figuring out how Popen and communicate work. The official documentation is no help, either.

In the end, though, there is nothing terribly ugly about the three lines that run the AppleScript. If you want to send anything to the subprocess’s stdin, you need the argument stdin=PIPE (or stdin=subprocess.PIPE, depending on your import statement). Running communicate returns a tuple with the subprocess’s stdout and stderr, but each element is None unless you used the argument stdout=PIPE or stderr=PIPE, respectively. So in my script, communicate returns only the stdout (which I discard), paired with None for stderr.
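The tuple behavior is easy to check with a child process that has nothing to do with Mail.app. This sketch (run_with_input is a hypothetical name) uses the running interpreter, sys.executable, as the child so that it works on any machine:

```python
import sys
from subprocess import Popen, PIPE


def run_with_input(text):
    """Feed text to a child process's stdin and collect what comes back."""
    # The child is a one-line Python program that upper-cases its stdin.
    child = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    # stdin=PIPE lets us send data; stdout=PIPE lets us read it back.
    # stderr is not piped, so the child's errors go straight to our stderr.
    p = Popen(child, stdin=PIPE, stdout=PIPE, universal_newlines=True)
    out, err = p.communicate(text)
    return out, err, p.returncode


print(run_with_input("hello"))  # ('HELLO', None, 0)
```

The None in the middle is the stderr slot: it stays empty because stderr=PIPE was not passed.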

When you don’t specify stderr=PIPE, the error output is just passed along to the main process’s stderr (and likewise for stdout). If you run my script from the command line, any errors from the osascript command will just be printed on your screen (unless, of course, you do something like 2>foo).

Use argparse

My newest rule to myself is “Never parse your own command line arguments.” Especially when I make something that I only ever plan to call from other scripts, and nobody but me is ever going to see, it is very tempting to do something stupid like require 8 positional arguments in a specific order.

Then you change some script somewhere and everything breaks. Or you want to use the script again and there is no --help. So you have to jump into source that you wrote a year ago just to figure out what to do. Not good.

The argparse library is new and replaces the short-lived and now deprecated optparse. It also has lots of useful bells and whistles. For example, with the type=argparse.FileType() option, you can add an argument that expects a filename and automatically opens the file for you. It also creates a --help option automatically.
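A small, self-contained illustration of FileType (the parser below is a hypothetical demo, not part of mailapp): the file named on the command line arrives already opened, and a string default such as "-" goes through the same conversion, which FileType maps to stdin.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="FileType demo")
    # FileType('r') turns the filename into an open file object; "-" means stdin.
    # A string default is converted the same way, so the default is stdin too.
    parser.add_argument("--input", metavar="file", type=argparse.FileType("r"),
                        default="-")
    return parser


# --help text is generated automatically from the add_argument calls.
print(build_parser().format_help())
```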

Here is the second half of the script.

def parse_arguments():
    parser = argparse.ArgumentParser(
        description="Create a new mail message using Mail.app")
    parser.add_argument('recipient',metavar="to-addr",nargs="*",
        help="message recipient(s)")
    parser.add_argument('-s',metavar="subject",help="message subject")
    parser.add_argument('-c',metavar="addr",nargs="+",
        help="carbon copy recipient(s)")
    parser.add_argument('-b',metavar="addr",nargs="+",
        help="blind carbon copy recipient(s)")
    parser.add_argument('-r',metavar="addr",help="from address")
    parser.add_argument('-a',metavar="file",nargs="+",
        help="attachment(s)")
    parser.add_argument('--input',metavar="file",help="Input file",
        type=argparse.FileType('r'),default=sys.stdin)
    parser.add_argument('--send',action="store_true",
        help="Send the message")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_arguments()
    content = args.input.read()
    code = make_message(
        content,
        subject = args.s,
        to_addr = args.recipient,
        from_addr = args.r,
        send = args.send,
        cc_addr = args.c,
        bcc_addr = args.b,
        attach = args.a)
    sys.exit(code)

When you run parse_args, it returns a special Namespace object, which has the parsed arguments as attributes. (Why didn’t they use a dictionary?) In my script, “recipient”, which is a positional argument because it lacks a leading hyphen, is stored in args.recipient. The subject is stored in args.s. If I wanted to, I could call add_argument('-s','--subject',...) with both option strings as separate arguments, and then the subject would be stored in args.subject, but could be specified on the command line as either -s subject or --subject subject. With the action="store_true" argument, args.send will be True if the user gives the --send option, and False otherwise.
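A hedged illustration with a stripped-down parser (not mailapp’s own): giving both option strings names the attribute after the long one, and vars() turns the Namespace into an ordinary dictionary when you want one.

```python
import argparse


def parse(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("recipient", nargs="*")
    # Two option strings for one argument; the attribute is named after
    # the first long option, so the value lands in args.subject.
    parser.add_argument("-s", "--subject")
    parser.add_argument("--send", action="store_true")
    return parser.parse_args(argv)


args = parse(["-s", "Hello", "someone@example.com"])
print(args.subject)  # Hello
print(args.send)     # False
print(vars(args))    # the same data as a plain dict
```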

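I call the script mailapp. For example, this opens a new, visible message in Mail.app whose body is the output of ls, ready for you to add a recipient: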

$ ls | mailapp -s "Here's how my home directory looks"

Rsyncing Jekyll

Apr 10, 2012

I have been using Jekyll to generate both this blog and my academic website for the past year, and I can confidently say that it has solved more problems for me than it has created. (This may sound like faint praise, but I assure you that it is not.)

Recently I have been annoyed at how long it takes to deploy updates to my website due to the way that Jekyll mangles timestamps, which rsync depends heavily on. I finally broke down and spent some time improving the process by tweaking rsync to work better with my Jekyll setup.

The Jekyll timestamp problem

It has always bothered me that Jekyll mangles timestamps. When you run jekyll to regenerate your site, all timestamps are updated to the current time. (This is because all pages are regenerated—a separate and also annoying issue.) So to anything that uses timestamps to determine when a page has changed, it appears that every page changes whenever a single page changes.

There is no solution to this problem within the Jekyll framework. Each output file is created from several input files, so you could imagine setting the timestamp of each output file to be the maximum timestamp from all of the input files. But the input files often live on several computers and/or in a git repository, which makes the timestamp of the input files both ambiguous and worthless. In these circumstances, the timestamp of a file is not the same as the last modified time of the actual data. The only way to preserve the latter is through some external database, the avoidance of which is essentially Jekyll’s raison d’être.

Rsync complications

I can overlook the fact that the file metadata on my web server is meaningless, but I have a harder time ignoring the slow deployment this causes. My academic website currently has 43 megabytes in 434 files. All but 400 kilobytes of that is archival material that never changes, and usually I am only changing a few files at a time. Nevertheless, rsync usually takes 15 seconds, even when I am transferring within the campus network.

I have two sets of files. I want to take all the differences from my local set and send them to the server set. For each pair of files, rsync checks that the sizes and modification times match, and if not, it copies the local file to the server. It has an efficient copy mechanism, so if the files are identical despite having different modification times, very little data is sent. If a large file has only changed in a few places, only the changed chunks are sent.

If you use Jekyll, the modification times never match, so all files are always copied, albeit in an efficient manner. Despite the efficient transfer mechanism, this is slow.

The correct way to use rsync with Jekyll

What you want is for rsync to compute and compare checksums for each pair of files, and only transfer files which have different checksums. You can do this by using the --checksum (or -c) option. Despite a warning from the rsync manual that “this option can be quite slow”, it reduced my transfer time from 15 seconds to 2 seconds.
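The two comparison rules (the default size-and-time check and --checksum) can be modeled in a few lines of Python. This is a simplified sketch, not rsync’s implementation: it ignores permissions, links, and the delta-transfer step, and it uses SHA-256 as a stand-in for rsync’s own checksum. The names needs_transfer and file_digest are hypothetical.

```python
import hashlib
import os


def file_digest(path):
    """SHA-256 of a file's contents, read in chunks (stand-in for rsync's checksum)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_transfer(local, remote, use_checksum=False):
    """Decide whether local should be sent to remote (simplified model)."""
    if not os.path.exists(remote):
        return True
    ls, rs = os.stat(local), os.stat(remote)
    if ls.st_size != rs.st_size:
        return True                      # different sizes: always send
    if use_checksum:                     # like --checksum: compare contents
        return file_digest(local) != file_digest(remote)
    # default quick check: same size, so compare modification times
    return int(ls.st_mtime) != int(rs.st_mtime)
```

With Jekyll, the quick check sees a new modification time on every file and answers True everywhere; the checksum rule answers True only for files whose contents really changed.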

Here is the command I recommend to deploy a Jekyll site:

rsync --compress \
      --recursive \
      --checksum \
      --delete \
      _site/ [email protected]:public_html/

Or, if you prefer the short version:

rsync -crz --delete _site/ [email protected]:public_html/

More meaningful timestamps on the server

A side benefit of this tweak is that server timestamps have meaning again. If the local and server files have the same checksum, nothing is copied. The timestamp of the file on the server is now the time the file was last copied to the server.

If you use the --times (or -t) option, the server timestamps are manipulated to match the (meaningless) local file timestamps. This is not what you want.

If you use the --archive (or -a) option, which is recommended by almost every rsync tutorial out there, you are implicitly using the --times option, as -a is equivalent to -rlptgoD. This is also not what you want. For a Jekyll site, the only part of -a that you care about is the -r. So don’t use -a.

Miscellaneous notes on rsync options

The --itemize-changes (-i) option is a useful way of seeing what is transferred.
The --ignore-times (-I) option ignores timestamps, but not in the way you want. It simply copies all files no matter what (but still using the efficient transfer mechanism).
If you leave off the --times option and don’t use --checksum, then all files whose sizes and timestamps match are skipped, and all other files are transferred, which changes their timestamp on the server to the current time. If you continue this over time, more and more files have different timestamps even though they are the same, which means they are copied every time.
There is a --size-only option which skips files if they have the same size on the local computer and the server, even if they have different modification times. You are tempting fate if you use this option: an edit that happens not to change a file’s size, like fixing a one-letter typo, will never be uploaded.

Drawing a slope field in SVG using Python

Mar 21, 2012

[Note: Due to changes to Heroku’s offerings, the online version of this has been shut down.]

I am teaching a differential equations course this quarter. Most of my students find slope fields very useful to visualize the set of solutions to a first order differential equation. Here is an example:

Since I know y’ as a function of y and t, I draw a short line at each point (t,y) with slope y’. Then every solution to the differential equation must run tangent to all of these little lines. If you think of the lines as the flow direction of a river, the solutions are the lines traced out by dropping a stick in the river and watching it flow.
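The tick function used in the generator code further down is not shown. A minimal sketch of one way to write it, assuming every tick should have the same length regardless of its slope (the length parameter and its default are hypothetical):

```python
import math


def tick(t, y, fn, length=0.4):
    """Endpoints of a segment of the given length centered at (t, y) with slope fn(t, y).

    Works in the (t, y) coordinates of the equation, not in SVG pixels.
    """
    m = fn(t, y)
    # (1, m) points along the slope; dividing by its length gives a unit
    # vector, which is then scaled to half the segment length.
    half = 0.5 * length / math.sqrt(1 + m * m)
    return (t - half, y - m * half), (t + half, y + m * half)


# Slope field for y' = y - t: at (1, 3) the tick should have slope 2.
(t1, y1), (t2, y2) = tick(1.0, 3.0, lambda t, y: y - t)
print((y2 - y1) / (t2 - t1))  # slope of the drawn segment, about 2
```

Normalizing by the length of (1, m) keeps steep ticks from shooting off the grid cell.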

Slope fields are incredibly easy to draw: you just have to plug in 900 or so combinations of t and y. But it isn’t something you want to do by hand. There are quite a few slope field generators out on the internet. Most of them are written as Java applets, like this one that I have pointed my students to in the past. Java applets are always painful to use and don’t work at all on the iPad, so I put together a Python script to draw the slope field.

Here are some of the different technologies that go into making this work. There is something for everyone.

Scalable Vector Graphics (SVG)

Until now, my only real experience with the SVG format has been gnuplot’s SVG terminal, which I use to monitor the weather near my house. It is a vector format, which makes it great for line-based drawings. It is based on XML, which makes it easy to manipulate. Even better, it is human-readable. Here is the code for one tick mark:

<line x1 = "47.2" y1 = "121.5" x2 = "54.3" y2 = "127.1" />

You can imagine how easy this makes it to generate the graph using Python.
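The svg function that the slope field code calls is not shown either. A sketch, assuming the tick endpoints are first mapped from (t, y) coordinates to pixels; the to_pixels helper and its parameters are hypothetical:

```python
def to_pixels(point, tmin, tmax, ymin, ymax, width, height):
    """Map a (t, y) point into SVG pixel coordinates.

    SVG's y axis points down, so larger y values end up nearer the top.
    """
    t, y = point
    px = (t - tmin) / (tmax - tmin) * width
    py = height - (y - ymin) / (ymax - ymin) * height
    return px, py


def svg(segment):
    """Format one tick, ((x1, y1), (x2, y2)) in pixels, as an SVG line element."""
    (x1, y1), (x2, y2) = segment
    return '<line x1="%.1f" y1="%.1f" x2="%.1f" y2="%.1f" />' % (x1, y1, x2, y2)


print(svg(((47.2, 121.5), (54.3, 127.1))))
# <line x1="47.2" y1="121.5" x2="54.3" y2="127.1" />
```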

In HTML5, you can use the svg tag to embed the SVG code directly into the HTML file, and all current versions of major browsers support this. So if you want to see what the code looks like for the slope field above, just view this page’s source.

Python generator functions

Most of the time, I’m writing scripts to do stuff on my personal computer. I make excessive use of long lists and giant strings, never worrying about memory. This is analogous to using UNIX commands one at a time, writing the output to a temporary file each time.

Generator functions can be used to implement a pipeline structure, like when you chain UNIX commands. You can link several generator functions so that each one uses the output of another. As in a UNIX pipeline, data flows through the stages one item at a time: each function produces its next value only when the next stage asks for it, so there is no need to temporarily store the output.

Here is an example, a modified version of what I used to make the slope field.

def slopefield(fn,tmin,tmax,dt,ymin,ymax,dy):
    """Generator for the slopefield ticks"""
    t = tmin + 0.5 * dt
    while t < tmax:
        y = ymin + 0.5 * dy
        while y < ymax:
            # tick calculates the endpoints of the tick
            yield tick(t,y,fn)
            y += dy
        t += dt

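# the loop variable must not be called tick: rebinding that name would
# replace the tick function that slopefield calls on its next step
for segment in slopefield(fn,tmin,tmax,dt,ymin,ymax,dy):
    # svg generates one line of svg code
    print(svg(segment))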

The yield statement makes slopefield a generator function. You could actually do this whole thing without a generator by replacing yield tick(t,y,fn) with print(svg(tick(t,y,fn))). But when I was testing my code, I found it very useful to be able to access the intermediate data, e.g.

for segment in slopefield(fn,tmin,tmax,dt,ymin,ymax,dy):
    print(segment)

The great thing is that slopefield only generates its output as needed. There is no need to store it in memory somewhere.

You can only use a generator once: after one pass through, it is exhausted and yields nothing more.
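A toy pipeline shows both properties: three chained generators (hypothetical names numbers, squares, and evens) hand values along one at a time, and once the chain has been consumed, iterating over it again produces nothing.

```python
def numbers(n):
    """Stage 1: produce 0, 1, ..., n-1 one at a time."""
    for i in range(n):
        yield i


def squares(xs):
    """Stage 2: square each value as it arrives."""
    for x in xs:
        yield x * x


def evens(xs):
    """Stage 3: pass along only the even values."""
    for x in xs:
        if x % 2 == 0:
            yield x


pipeline = evens(squares(numbers(6)))  # nothing is computed yet
print(list(pipeline))  # [0, 4, 16]
print(list(pipeline))  # []  (the generator is already used up)
```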

Input Sanitization

Originally, I thought that I would parse the equations myself, because the eval function can be used to execute arbitrary code, and that sounds pretty scary. I thought about using pyparser, but decided I was being ridiculous.

I convinced myself that it would be safe to use eval with the following sanitization, which checks every word of the string against a whitelist.

import re

def sanitize(fn_str):
    """Sanitizes fn_str, evaluates, and returns a function"""

    VALID_WORDS = ['','sin','cos','tan','t','y','abs','sqrt','e',
        'pi','log','ln','acos','asin','atan','cosh','sinh','tanh']

    # separate into words on number and operator boundaries
    words = re.split(r'[0-9.+\-*/^ ()]+',fn_str)

    for word in words:
        if word not in VALID_WORDS:
            # error() is the script's own error-reporting helper (not shown)
            error('Unrecognized expression in function: %s' % word)

    s = fn_str.replace('^','**')

    # replace 1.234 with float(1.234)
    s = re.sub(r'[0-9.]+', r'float(\g<0>)', s)

    return eval("lambda t,y: " + s)

I wrap the numbers in a float call because Python does arbitrary-precision integer arithmetic, and I don’t want people to be able to type stuff like 9^9^9^9^9^9^9^9 and set the processor going crazy. With floats, an expression like that overflows right away instead. It also avoids integer division issues (in Python 2, 1/2 evaluates to 0).
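As written, sanitize relies on the math functions already being in scope and on the script’s own error helper. Here is a runnable sketch under stated assumptions: the NAMESPACE mapping (including ln as an alias for math.log) and raising ValueError are assumptions, and passing NAMESPACE to eval with empty __builtins__ is an extra precaution the original does not take. It narrows what eval can see; it is not a complete sandbox on its own.

```python
import math
import re

VALID_WORDS = {'', 'sin', 'cos', 'tan', 't', 'y', 'abs', 'sqrt', 'e', 'pi',
               'log', 'ln', 'acos', 'asin', 'atan', 'cosh', 'sinh', 'tanh'}

# Assumed mapping from whitelisted words to implementations ("ln" is an alias
# for math.log). Emptying __builtins__ means eval sees only these names.
NAMESPACE = {'__builtins__': {}, 'float': float, 'abs': abs,
             'e': math.e, 'pi': math.pi, 'ln': math.log}
for _name in ('sin', 'cos', 'tan', 'sqrt', 'log', 'acos', 'asin', 'atan',
              'cosh', 'sinh', 'tanh'):
    NAMESPACE[_name] = getattr(math, _name)


def sanitize(fn_str):
    """Check fn_str against the whitelist, then compile it into f(t, y)."""
    for word in re.split(r'[0-9.+\-*/^ ()]+', fn_str):
        if word not in VALID_WORDS:
            raise ValueError('Unrecognized expression in function: %s' % word)
    s = fn_str.replace('^', '**')
    s = re.sub(r'[0-9.]+', r'float(\g<0>)', s)  # 1.234 -> float(1.234)
    return eval("lambda t, y: " + s, NAMESPACE)


f = sanitize("y^2 - t")
print(f(2.0, 3.0))  # 7.0
```

Under these assumptions, sanitize("9^9^9")(0, 0) raises OverflowError immediately rather than grinding through an enormous integer, and something like __import__ is rejected by the whitelist before eval ever runs.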

As noted above, I had to shut down the sample version of this due to Heroku changes, but the code is on GitHub.

Math publishing, open access, and my arXiv.org workflow

Feb 15, 2012

Publishing a mathematics research paper is a complex and entangled process. If a math professor wants to get tenure, they need to publish their research in prestigious journals, so they submit papers to these journals. The journals, whose editors are other mathematicians, send the papers off to be peer reviewed. Neither the editors nor the reviewers are paid by the journal. The journal doesn’t even have to typeset the paper, since almost all math papers are written in Latex (99 percent, according to the sample I took of papers posted to the math section of arXiv.org). The paper is eventually published, and the journal charges a pretty high price for printed or electronic versions of the paper (sometimes over a dollar per page, which is paid primarily by university libraries). The main thing that the journal provides is its reputation, which serves as a very useful signal about the quality of the papers it contains.

There has been some talk in the past several years about open access in academia. The reasoning is that if taxpayer money is funding basic research, the results should be available to everyone at little or no cost.

The math community has open access almost completely figured out, thanks to the arXiv.org repository (which we usually refer to as “The arXiv”, which is pronounced “The Archive”, because the X stands for the Greek letter chi). Although you transfer copyright to the journal when they publish your paper, what you do beforehand is generally your business. So before you submit your paper to a journal, you submit it to arXiv.org, which will post it online forever. Although I don’t know the exact statistics, the vast majority of math papers published today are listed on arXiv.org.

Journals are still necessary because of the peer review process and the prestige factor. In addition, not all authors incorporate corrections and changes that they make to their paper during the peer review process into the arXiv.org version. Actually, some journals prohibit you from doing this, although in practice, the word is that if they ever tried to crack down on this there would be mass revolt.

That doesn’t mean the community is happy with journals. It would be nice if they allowed more open access, or charged less, etc. To this end, some are participating in a boycott of one company that is seen as a “worst offender”. They have comparatively high prices, they try to stop you from publishing changes to arXiv, and they lobby in favor of legislation against open access. (You probably aren’t surprised to hear they also were in favor of SOPA.)

Organizing arXiv.org papers

The reason I’ve been thinking about this is that a lot of the papers I read come from arXiv.org, and I’ve been trying to improve the way I keep track of the papers I read.

My workflow for managing papers I read goes something like this:

1. Find a paper I want to read, usually from a citation or the arXiv.org RSS feed.
2. Download a PDF to my iPad or Dropbox folder. Usually it has a stupid file name, but I don’t worry yet.

(time passes)

3. Import the citation into BibDesk. BibDesk automatically renames the file using the author and title and moves it to my folder on my Dropbox.

The difficult part is step 3. It is very useful to use BibDesk to organize the metadata for the papers I read, but it can be unnecessarily time consuming. Although BibDesk can import from many different websites, this usually requires you to search, using BibDesk’s built-in browser, for a paper that you already found once before.

This week I wrote a Python script to download arXiv.org references. It takes one or more arXiv identifiers and returns a citation for the corresponding paper in BibTeX format. Using the script I wrote, together with BibDesk’s great AppleScript integration, I can simply enter the arXiv id (which is watermarked onto the article) and everything else happens automatically.
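The script itself isn’t shown here. As a hedged sketch of the same idea, the public arXiv API at export.arxiv.org returns an Atom feed for a comma-separated id_list, and the standard library can turn each entry into BibTeX. The @misc layout, the field choices, and using the versioned arXiv id as the citation key are assumptions, not BibDesk requirements.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
API_URL = "http://export.arxiv.org/api/query?id_list="


def fetch_atom(ids):
    """Download the Atom feed describing the given arXiv identifiers."""
    with urllib.request.urlopen(API_URL + ",".join(ids)) as response:
        return response.read()


def atom_to_bibtex(atom_xml):
    """Turn each <entry> of an arXiv Atom feed into a BibTeX @misc record."""
    records = []
    for entry in ET.fromstring(atom_xml).iter(ATOM + "entry"):
        abs_url = entry.findtext(ATOM + "id")   # e.g. http://arxiv.org/abs/1202.0001v1
        eprint = abs_url.rsplit("/abs/", 1)[-1]
        # titles in the feed may contain line breaks; collapse all whitespace
        title = " ".join(entry.findtext(ATOM + "title").split())
        authors = " and ".join(a.findtext(ATOM + "name")
                               for a in entry.findall(ATOM + "author"))
        year = entry.findtext(ATOM + "published")[:4]
        records.append(
            "@misc{%s,\n  author = {%s},\n  title = {%s},\n  year = {%s},\n"
            "  eprint = {%s},\n  archivePrefix = {arXiv}\n}"
            % (eprint, authors, title, year, eprint))
    return "\n\n".join(records)
```

For example, print(atom_to_bibtex(fetch_atom(["1202.0001"]))) would print one record; this needs network access, and the identifier is only illustrative.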

I miss the old Google plus

Feb 5, 2012

I’m feeling a little pressure after Dr. Drang, whose blog I read regularly and benefit from often, linked to my blog. I’ve always written this blog with an eye toward future searchers of the internet, seeking to return the favor for all the unassuming blogs that have helped me stumble to a solution to my technical problems. So the fact that this blog had an average readership of 2.5 (my wife Amy being one of those) has never bothered me. And while I don’t expect a recommendation from Dr. Drang to change things too much (although I would listen to him), I feel special having suddenly gone from anonymity to having a “Gruber number” of 2 (because I was linked to by a blog that has been linked to on Daring Fireball).

From its early days, Google implemented a plus operator in its searches. When you search, Google makes pretty liberal use of synonyms. So if you search for “pouring”, Google also returns results for “pour” and “pours” and “poured”. If you search for “tech”, Google will also return results for “technology”. “Russian astronaut” includes pages that mention “cosmonaut”. It used to be that if you wanted to disable this behavior for a specific word, you could put a plus sign in front of it. Now that the plus sign has special (and much less useful) meaning on Google, you can get the same functionality by enclosing a single word in quotes.

But quotes are much harder than a plus sign, because they come in pairs.

My most common use scenario is this: I’m searching for something kind of obscure, and nothing useful shows up in the first page of results. Instead of clicking to the next page, I refine my search. (Honestly, if it isn’t on the first page, why would it be on the second? Or do you plan to do an exhaustive search of the index?) By looking at which words Google shows in bold, you can often tell that the synonym feature is messing with your results, and turning it off will get you what you want.

In the old days, this meant one click, one plus sign, and one press of the return key. Now, it is click, type, back to the mouse for another click, another key press, and finally return. And if I’m using my iPod or iPad? You can guarantee that at least one of those taps will require the use of the magnifying glass.

And just to prove that this is not just academic grumbling, here is one of the several times that I used this today. I used homebrew to install an updated version of imagemagick because the stock version was being dumb. While it was doing its thing, homebrew said something about “pouring” imagemagick. I headed to Google to investigate. You can imagine what results you get if you search for homebrew pouring, but even if you throw mac on to keep things relevant, you get install guides instructing you to “pour yourself a cup of coffee while you wait.” In fact, if you narrow down your search further with a term like imagemagick, Google, which is still substituting “pour” for “pouring”, just heads to the French corner of the web, where the word “pour” is found in abundance. Some well-placed quotes and a shake of my fist at Google Plus for stealing my efficiency, and I’m in a much better place.

It turns out that homebrew was downloading a precompiled binary.

Drawing using TikZ

Feb 3, 2012

TikZ is a Latex package that lets you make graphics and diagrams. I have been meaning to check it out for a while now, but this week I finally had a need. Two days of class were cancelled last month due to snow, and the class I’m teaching needs to speed through some topics. So I made some notes for my students, and I drew this depiction of a linear transformation:

The advantage of TikZ over another drawing program is that it is easy to integrate it into a Latex document, matching fonts and font sizes. It is also very easy to include mathematical formulas in your drawings. And the input files are text files, which makes it easy to keep track of or share your drawings.

The advantage of TikZ over pstricks, which is another popular Latex drawing package, is that you can compile it straight to PDF, instead of generating PostScript and distilling.

The code that you use to make a picture is very mathematical. This makes it a little difficult if you want to, say, make a new logo for your business, but for simple mathematical objects, it actually makes things easy. Here is an example of the code to draw one of the planes (which are, of course, actually parallelograms).

\begin{tikzpicture}
\draw[fill=blue]
    (-2, 0.6, 1.4)         % starting point
    -- ++(3.6, 0.6, 0)     % move along this vector
    -- ++(0.3, -1.8, -2.7) % then along this vector
    -- ++(-3.6, -0.6, 0)   % then back along that vector
    -- cycle;              % and back to where you started
\end{tikzpicture}

The -- operator draws a straight line to the next point, and the ++ prefix makes that point relative to the previous one instead of absolute. The code for the entire picture is here.
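To make the ++ arithmetic concrete, here is a small Python sketch (absolute_corners is a hypothetical name) that turns the starting point and the three relative moves from the plane above into absolute corners. Each ++ move is added to the current point, which then becomes the new current point. It also shows why the shape closes into a parallelogram: the third move is the exact negative of the first, so -- cycle returns along the reverse of the second.

```python
def absolute_corners(start, moves):
    """Follow TikZ-style ++ moves: each move is added to the current point."""
    corners = [tuple(start)]
    for move in moves:
        prev = corners[-1]
        corners.append(tuple(p + m for p, m in zip(prev, move)))
    return corners


corners = absolute_corners((-2, 0.6, 1.4),
                           [(3.6, 0.6, 0), (0.3, -1.8, -2.7), (-3.6, -0.6, 0)])
for c in corners:
    # round away floating-point noise such as -0.6000000000000001
    print(tuple(round(x, 2) for x in c))
```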

TikZ also works great for diagrams. Having tried my hand at xypic and diagrams and amscd and probably others, I don’t think I’ll ever use anything else. If you want to see how to use TikZ for diagrams, check out this article by Felix Lenders.

Latex.bbpackage

Jan 28, 2012 (updated Apr 20, 2013)

Here is a BBEdit package I wrote to help me write Latex.

View on github: nathangrigg/Latex.bbpackage

This package makes it easier to use BBEdit to write Latex.

What it can do

Typeset your document and locate any errors
View the pdf output, with SyncTeX support if you use Skim
View warnings and errors from the log file
Close environments, change environments, and add or remove a star from an environment.
Look up documentation using texdoc
Check your semantics using chktex

There are also a number of BBEdit clippings to insert often-used Latex commands. BBEdit makes it easy to assign keyboard shortcuts to the clippings you use most.

Public key cryptography

Jan 24, 2012

Public key cryptography is vital to the way the internet works. It was invented by mathematicians doing “pure math,” the kind of math that always prompts the question “Where in real life is that crazy math actually used?”

Here is a short introduction to how it works.

Imagine that you travel to Mars and find the planet populated by a bunch of cuddly green aliens who immediately appoint you as dictator for life. As you learn more about your newfound subjects, you find that they have an excellent knowledge of multiplication, but do not know how to do division. You make a mental note to teach them sometime.

Encryption

Meanwhile, all of your attention is on the administration of your new kingdom. To save on collection costs, you decide to implement a system where each citizen’s taxes are withdrawn directly from his or her checking account, and you need to gather everyone’s account numbers. Of course, you want to keep things secret and secure, because identity theft has been in the news lately, and now would be a bad time for a mass revolt.

Suddenly, you get an incredible idea. You send out the following proclamation:

Please multiply your bank account number by 4 and send it to me immediately.

Sincerely, your dictator.

Everyone does so, and even though the mail system on your planet is notoriously corrupt, nobody’s account number is stolen, because you are the only one on the planet who can divide all of the numbers by 4 and thus recover the proper account numbers.

This is the first main idea of public key cryptography. You can tell everyone in the world exactly how to encrypt their data, and they all will be able to encrypt their data, but no one will be able to decrypt anybody else’s data. Only you possess that knowledge. The recipe for encrypting the data is called the public key (because everyone knows it) and the recipe for decrypting is called the private key (because only you know it).

Signatures

Some time later, someone sends out a counterfeit proclamation saying that the tax rate will be raised to 8000 martian dollars per year. An anti-tax group, thinking that the first proclamation was from you, sends out a proclamation saying that the first proclamation was a counterfeit and that the tax rate would actually be lowered to 1200 martian dollars per year. By now everyone is confused and another proclamation would probably not do much to clear anything up.

Suddenly, you get another incredible idea. You use your unique ability to divide in the opposite direction, and send the following proclamation:

The tax amount will remain at 5000 martian dollars per year. To know that I am the one that actually sent this proclamation and that the other proclamations were fake, multiply 1250 by 4 and you will get 5000, which is the true tax rate.

Since I am the only one who knows how to divide, no one else could have possibly known that 1250 is the number that you can multiply by 4 to get the current tax rate of 5000. So you can be sure that this message is from me.

Sincerely, your dictator.

This is the second main idea of public key cryptography. You can apply the decryption process (division) to an ordinary, unencrypted message. Then anyone in the world can apply the encryption process (multiplication) and see the message. So you are not hiding anything. But since nobody else could have possibly done what you did, everyone knows that (1) the message is from you and (2) nobody has changed the message.

Because this is analogous to a handwritten signature certifying that you approve a message, it is called a signature. You use your private key to sign a document, and then anyone in the world can use your public key to check that it is really your signature. If someone intercepts your proclamation and changes the 5,000 to 50,000, then it will be obvious to the people that the message is no longer from you.
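On Earth, of course, anyone can divide by 4, so the Mars scheme is only an analogy. RSA, a widely used real system, replaces “multiply by 4” with modular exponentiation, which anyone can compute with the public key but which is believed to be impractical to undo without the private key. A toy sketch using the small textbook key n = 3233, e = 17, d = 2753 (hopelessly insecure, and without the padding real RSA requires) shows both ideas: encrypt with the public key and decrypt with the private key; sign with the private key and verify with the public key.

```python
# Toy RSA key from a standard textbook example (far too small to be secure):
# n = 61 * 53, and e * d leaves remainder 1 when divided by (61 - 1) * (53 - 1).
n = 3233
e = 17      # public exponent: everyone may know (n, e)
d = 2753    # private exponent: only the dictator knows d


def encrypt(m):
    """Anyone can do this with the public key (the 'multiply by 4' step)."""
    return pow(m, e, n)


def decrypt(c):
    """Only the holder of the private key can do this (the 'divide' step)."""
    return pow(c, d, n)


def sign(m):
    """Apply the private operation to an ordinary, unencrypted message."""
    return pow(m, d, n)


def verify(m, signature):
    """Anyone can check a signature with the public key."""
    return pow(signature, e, n) == m


c = encrypt(65)
print(c, decrypt(c))  # 2790 65
s = sign(1234)
print(verify(1234, s), verify(1235, s))  # True False
```

Messages here must be smaller than n, which is one of many reasons real keys are hundreds of digits long.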

Certificates

This is pretty much the process that happens every time you log onto a secure website, such as your bank. Wells Fargo sends you their public key, you use it to encrypt your password, and then they are the only ones in the world who can decrypt your password. (I’m oversimplifying just a tiny bit, but this is the idea.)

There is still one problem, though. When you connect to Wells Fargo, and they send you their public key, how do you know it is really Wells Fargo, and not just someone in Nigeria pretending to be Wells Fargo?

To make sure Wells Fargo really is Wells Fargo, they don’t just send you their public key. They send you a copy of their public key that has been signed by someone your computer trusts. Your computer can check the signature of this trusted third party, which tells you that they checked that this was Wells Fargo’s key, and it hasn’t changed since then. A signed public key is usually called a certificate.

Who is the trusted third party? It is someone called a Certificate Authority (CA), whose job is to make sure that all of these public keys really belong to who they say they do. You may have heard of VeriSign, one of the most popular CAs. As long as the CAs are being honest (and, really, they’d be out of business otherwise), you know you are safe.

Most of this stuff happens behind the scenes, but every now and then, something goes wrong. Your web browser has probably told you “This website has an invalid certificate” at some point. What it meant was, either there are no trusted signatures on the public key, or the ones there are invalid. Usually something is just expired or set up wrong. But for all you know, someone is pretending to be something or someone they aren’t, so you should be extremely careful.
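For the curious, Python's standard `ssl` module performs the same check your browser does. A minimal sketch, assuming Python 3.7 or later and a network connection: `ssl.create_default_context()` loads the system's trusted CA certificates and turns on both certificate and hostname verification, so connecting to a site whose certificate is unsigned, expired, or issued for a different name raises `ssl.SSLCertVerificationError` instead of returning. The function name `peer_certificate` is my own.

```python
import socket
import ssl


def peer_certificate(host: str, port: int = 443) -> dict:
    """Connect over TLS and return the server's certificate, verified against trusted CAs."""
    context = ssl.create_default_context()  # trusted CAs + hostname checking
    with socket.create_connection((host, port), timeout=10) as sock:
        # The handshake happens here; an untrusted, expired, or mismatched
        # certificate raises ssl.SSLCertVerificationError.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

Calling `peer_certificate("www.wellsfargo.com")` returns a dictionary whose `issuer` entry names the CA that signed the key and whose `notAfter` entry gives the date the certificate expires.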

And a small amount of math

The idea of public key cryptography is pretty new. It depends on what are called one-way functions, which are processes that are easy to do in one direction (for example, multiplication) and hard to undo (for example, factoring).
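To get a feel for that asymmetry, here is a small Python sketch. Multiplying two primes is a single operation, while undoing it by trial division takes tens of thousands of steps even for this small example. Trial division is a deliberately naive stand-in: real factoring algorithms are far cleverer, but they still fall hopelessly behind as the numbers grow. The two primes (the 10,000th and 1,000,000th primes) are arbitrary picks.

```python
def factor(n: int):
    """Recover two factors of n by trial division; returns (p, q) with p <= q."""
    if n % 2 == 0:
        return 2, n // 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 2  # only odd candidates after 2
    return 1, n  # no divisor found: n is prime


p, q = 104729, 15485863  # easy direction: one multiplication
n = p * q
print(factor(n))  # (104729, 15485863), after about 52,000 trial divisions
```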

Comparison of Mac genealogy programs

Jan 7, 2012 (updated Mar 14, 2012)

When I was a kid, I used to use PAF to look at the family history research that my mom had done. I remember the old text-based PAF for DOS, which was probably one of the first computer programs I ever used. Back then, and into the early ’00s, PAF was the genealogy program. Sure, there were others, but they tried their best to be compatible with PAF, or at the very least, with the GEDCOM format, which was designed by the same people who designed PAF. If you wanted a full-featured genealogy program that was highly likely to work with other programs, PAF was your best choice. And it was free.

Things have changed since then. PAF is no longer actively developed, as the LDS church has turned its focus to familysearch.org. You can run PAF on current Windows machines, and there are even some companies that make plugins to address some of its usability hurdles. But it is not the standard that it once was.

Of course, none of this helps me, because I no longer run Windows, and PAF for Mac hasn’t worked since OS 9. After several years of storing things in folders and text files, I decided it was time to go app shopping.

What I’m looking for

Future proof. Best would be if it works for the next 50 years of my life, but that’s probably too much to ask. Instead, I will settle for requiring good GEDCOM support. I’m not interested in any bells and whistles that allow me to add information that I can’t export, because then it will be stuck when I am inevitably forced to change programs.
Customizable. In my looking around, I have accepted the fact that I will not be able to find a program that does everything I want. In particular, nothing can print the reports I want to print. One thing I liked about PAF was that it would let you export CSV files of any information you wanted. This was another way of making sure that things you put in would be able to come out.
Easy to use. Screens are 4 times larger than when I was a kid. You should at least be able to fit as much on the screen as the old PAF did.

What I found

Here are my thoughts. To be honest, I started this blog post with the impression that there was nothing that was going to work for me. As I wrote, however, I became convinced that Reunion was what I was looking for. It won’t do everything that I want, but it is in my opinion enough above all the other options that there is no point spending money on any of the others.

Reunion

I found Reunion to be one of the easiest programs to work in. I love the two column view that puts a couple in the middle, their parents above, and their children below. I’ve always found it easiest to do family history work when focused on families instead of individuals, but so many other programs make this difficult. Also, I love that Reunion takes advantage of multiple windows, allowing you to use that screen space well. I love the “ages” view and all the different views it allows you to use. Also, Reunion has been around for a long time and isn’t going to leave. It allows you to make notes with sources appearing as footnotes in the source, much like PAWriter (below), but in a more standard GEDCOM manner. I probably wouldn’t trust these footnotes to import into another system, though. I’d have to see how it exports to GEDCOM.

On the bad side, Reunion is by far the most expensive of the bunch at $100. Its reports are not very good, but then good reports are hard to make.

MacFamilyTree

MacFamilyTree is beautiful and has some very nice features. At one point, I was sure I was going to buy it. It has good integration with familysearch.org. It has some nice modern touches, with Google Earth views and some 3D pedigree charts. In the end, though, I couldn’t look past how difficult it is to enter data. Adding a birth date takes a few clicks and a couple different pages. There is no way to see all the information about a person in a single view.

Geditcom II

There are many things I like about Geditcom. First, it stores all records in GEDCOM format. This means that you always know everything you are doing is going to be exportable. You can even put the file under revision control (which the author of Geditcom actually encourages). The author provides Python and AppleScript libraries to allow you to make completely custom reports. Even the reports that ship with Geditcom can be edited. Geditcom is free if you want to use it as a GEDCOM reader only.

Unfortunately, working in Geditcom is difficult. Data entry is tough, and some things just don’t work.

[Update: The release of Geditcom 1.7 seems to solve some of the things that weren’t working for me (like the Latex report), and the developers are making progress on ease of use. It is still not as easy as Reunion, but maybe it makes up for that with its custom reports and its use of straight GEDCOM. When I finish my thesis, maybe I’ll get back into genealogy and let you know how things turn out.]

PAWriter II

PAWriter is the closest to PAF of any of the programs that I have tried.

I am a little worried, because it seems like there is a single developer working on this program, and I fear that one day he may abandon it and no one will take his place. Also, the way it does footnotes is nonstandard. It certainly wouldn’t import into another program.

Lifelines

Lifelines is very powerful, although probably too difficult to use on a daily basis. One of the best parts of Lifelines is its ability to make completely custom reports. On the down side, the reports are written in a custom programming language, which makes it a bit of a hassle. Also, you cannot open GEDCOM files directly with Lifelines; you have to import them. So if I were to use Lifelines only to print reports (since it is pretty unusable as a tree editor), there would be a lot of extra steps involved every time.

Conclusion

Someday I plan to buy Reunion. Right now I am busy and poor, so spending that much money on something I don’t really have time to use seems silly. But with the tinkering I’ve done, I am convinced that it is the best.

I will also write a Python script to process GEDCOM files and make nice reports. Then when I want a complicated report, I can export the GEDCOM, run my script, and have something pretty to look at.
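As a possible starting point for that script, here is a minimal Python sketch. Each GEDCOM line is a level number, an optional cross-reference ID such as @I1@, a tag, and an optional value; the sketch splits lines into those pieces and collects each individual's name. It assumes a UTF-8 file and ignores continuation lines (CONT/CONC) and most of the standard. The function names are my own.

```python
import re

# level, optional @XREF@, tag, optional value
LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")


def parse_gedcom(lines):
    """Yield (level, xref, tag, value) tuples; lines that don't match are skipped."""
    for raw in lines:
        m = LINE.match(raw.rstrip("\r\n"))
        if m:
            level, xref, tag, value = m.groups()
            yield int(level), xref, tag, value or ""


def individual_names(lines):
    """Map each INDI record's ID to its first NAME, with the /surname/ slashes removed."""
    names, current = {}, None
    for level, xref, tag, value in parse_gedcom(lines):
        if level == 0:
            # A new top-level record starts; remember it only if it is a person.
            current = xref if tag == "INDI" else None
        elif level == 1 and tag == "NAME" and current and current not in names:
            names[current] = value.replace("/", "").strip()
    return names
```

Because `individual_names` accepts any iterable of lines, an open file object works as well as a list, and the resulting dictionary of IDs to names can feed whatever report layout you like.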

Installing AWStats and using it to build static statistic pages

Dec 6, 2011

I recently moved my academic website from the math department web server to the university central web servers because I was getting tired of all the weird problems I was running into.

I’ve never really cared too much about pageview statistics or anything, but with the recent shuffle, I wanted to at least see if there were a bunch of 404 errors due to something I overlooked. So I started looking at the logs, and considered putting together a simple parser to extract and compile the 404’s. Then I found AWStats which would do it for me, plus a bunch of other stats.
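The simple parser I had in mind would not have needed much. A sketch, assuming the Apache common or combined log format (host, identity, user, [time], "request", status, size, ...); the function name `count_404s` is mine.

```python
import re
from collections import Counter

# host ident user [time] "METHOD path PROTOCOL" status ...
LOG_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*" (\d{3})(?:\s|$)')


def count_404s(lines):
    """Return a Counter mapping each requested path to its number of 404 responses."""
    missing = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group(2) == "404":
            missing[m.group(1)] += 1
    return missing
```

Calling `count_404s(open(logfile)).most_common(20)` would list the twenty most-requested missing pages, which is most of what I wanted before I found AWStats.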

Setting up AWStats

Although AWStats is intended to be run as a CGI script, I didn’t want to run it that way. I was already going to have to tweak things because I don’t have root access, and there’s no point making things any more complicated than they need to be. And since my logs are only delivered to me once a day, there’s no reason to be able to update my statistics on demand. Even if the logs came more often, I doubt I would see the point.

I began by following this great guide by George Notaras. Highlights:

Get AWStats and unzip it into some directory (e.g. ~/local/awstats/)
Create a data directory, for use by AWStats (e.g. ~/local/awstats/statdata)
Create a directory where you will store the HTML files produced by AWStats (e.g. public_html/stats). You need to copy the icons folder from wwwroot into it.
Rename the file wwwroot/cgi-bin/awstats.model.conf to awstats.mysite.conf (replacing mysite with whatever you want) and edit the configuration file. The configuration file needs to remain in the same directory as awstats.pl. The model configuration file gives you most of the information you need. Don’t forget to tell it where your log file is kept (LogFile), what format it is in (LogType), where the data directory you created lives (DirData) and the location of the icons directory (DirIcons), relative to where the HTML files will be.

Generating the reports

In the tools directory, there is a script called awstats_buildstaticpages.pl. Running this will optionally run awstats and then use the output to make all of the appropriate report pages.

Important options:

-config=mysite to tell it what configuration file to use
-update tells it to run awstats first, which reads the log and compiles statistics. If you don’t add this, then it will produce a report based on all the statistics compiled the last time awstats was run.
-awstatsprog=/path/to/awstats.pl tells it the location of the awstats script.
-dir=public_html/stats tells it where to put the HTML files
-builddate=%YYYY%MM tells it to include the year and month of the report in the filename of the report. This way you can have more than one month of reports at the same time. This does introduce some complications, however (see below). Also, this only affects the filenames and does not change which month and year’s statistics are generated.
-month=05 tells it to compute the statistics for the month of May. Leave this off and it will do this month’s report. Of course, if you want a complete month’s report, you will have to run it on the first of the next month, so you will need to specify the month.
-year=2011 tells it what year to use. Mostly important to compile December’s statistics in January.
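Since a complete month’s report means running on the first of the following month, the month and year arithmetic (including December’s report in January) is easy to get wrong by hand. A minimal Python sketch that assembles the command from the options above; the directory defaults and the `mysite` config name are placeholders following the examples in the setup list, and the function names are mine.

```python
import datetime
import os


def previous_month(today: datetime.date):
    """(year, month) of the month before `today`; January 2012 gives (2011, 12)."""
    last_day_of_previous = today.replace(day=1) - datetime.timedelta(days=1)
    return last_day_of_previous.year, last_day_of_previous.month


def buildstaticpages_command(today: datetime.date, config="mysite",
                             awstats_dir="~/local/awstats",
                             outdir="~/public_html/stats"):
    """Argument list for a full report on last month, using the options described above."""
    base = os.path.expanduser(awstats_dir)
    year, month = previous_month(today)
    return [
        "perl", os.path.join(base, "tools", "awstats_buildstaticpages.pl"),
        f"-config={config}",
        "-update",  # read the log and compile statistics first
        "-awstatsprog=" + os.path.join(base, "wwwroot", "cgi-bin", "awstats.pl"),
        "-dir=" + os.path.expanduser(outdir),
        "-builddate=%YYYY%MM",  # year and month in the filenames
        f"-month={month:02d}",
        f"-year={year}",
    ]
```

Running it on the first of each month is then a matter of `subprocess.run(buildstaticpages_command(datetime.date.today()), check=True)`; printing the list first is a cheap way to check the paths.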

And watch out for that bug

Update: This bug has been fixed in more recent versions of AWStats.

Unfortunately, things didn’t work exactly like I hoped. Any time I used the -builddate option, the links in the reports didn’t work. This is the fault of awstats_buildstaticpages.pl.

What this script does is make repeated calls to awstats.pl, each with a different -output option, which creates the 10 or so different pages that awstats can generate.

When you specify a -builddate option, it formulates an option something like -staticlinks=mysite.201201 and passes it to awstats. The option should be -staticlinks=awstats.mysite.201201 instead.

If you feel at all comfortable with perl, you can fix the bug by inserting awstats. into the appropriate place. Or you could write a script to call awstats 10 times, essentially rewriting awstats_buildstaticpages.pl, but in your favorite language and simplified because you aren’t going to make it accept as many options. Or you could hope that the bug is fixed by the time you read this.
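For the rewrite-it-yourself route, a rough Python sketch of the loop with the corrected -staticlinks value. I am assuming that awstats.pl accepts -config, -output=type (or a bare -output for the main page), -staticlinks, -month, and -year the way awstats_buildstaticpages.pl uses them, that it writes each page to standard output, and that its links take the form prefix.type.html. The caller supplies the list of report types; the names in the usage line are examples, not the full set.

```python
import os
import subprocess


def staticlinks_prefix(config: str, builddate: str = "") -> str:
    """Link prefix awstats should use: awstats.<config>, plus .<builddate> if given."""
    prefix = f"awstats.{config}"
    return f"{prefix}.{builddate}" if builddate else prefix


def build_pages(awstats_pl, config, outputs, outdir, month, year, builddate="", run=True):
    """Call awstats.pl once per report type and save each page under the linked name.

    Returns the (command, filename) pairs; pass run=False for a dry run.
    """
    prefix = staticlinks_prefix(config, builddate)
    jobs = []
    for output in [None, *outputs]:  # None stands for the main summary page
        cmd = ["perl", awstats_pl, f"-config={config}", f"-staticlinks={prefix}",
               f"-month={month:02d}", f"-year={year}",
               f"-output={output}" if output else "-output"]
        # Assumed naming scheme: links point at <prefix>.<type>.html
        name = f"{prefix}.{output}.html" if output else f"{prefix}.html"
        path = os.path.join(outdir, name)
        jobs.append((cmd, path))
        if run:
            with open(path, "w") as page:
                subprocess.run(cmd, stdout=page, check=True)
    return jobs
```

A dry run such as `build_pages("awstats.pl", "mysite", ["alldomains", "allhosts"], "public_html/stats", 12, 2011, builddate="201201", run=False)` shows every command, each carrying `-staticlinks=awstats.mysite.201201`, before anything is written.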

My computer setup

Nov 17, 2011

Here is my technology setup. Amy always makes fun of me because she thinks I change things around too often. “If the act of optimizing takes more time than the optimization saves, what’s the point?” she often asks. She has a good point. But I’ll continue to ignore it.

At home, I have a 2009 21-inch iMac. For the most part, I stick with the standard tools: I use Safari, Mail, iPhoto, Terminal, and Preview. For Latex, which I use to typeset anything involving math, I use BBEdit to edit and TeXShop to compile and preview. I use BibDesk to keep track of papers I read, and I read most papers in full-screen Preview, unless I’m reading on my iPad. I also use BBEdit for pretty much all the rest of my text-editing needs.

Some other apps that I love and use often: FaceTime, to talk to family, Caffeine, to keep my Mac awake, and Notational Velocity for small notes. I use Soulver as my calculator and Day One for keeping a journal. Oh, and 1Password for passwords.

On my iPad, I use PDF Expert to read and annotate papers. I use Instapaper, Twitter, and Reeder to keep up with goings-on in the world. I play Carcassonne and Lab Solitaire. I use Day One for my journal, the Kindle app to read books, and I use Verbs for instant messaging, mostly with Amy.

I use my iPod touch to listen to podcasts and check my mail or Twitter at times when the iPad would be inconvenient. I use TweetBot because I love its “View replies to this tweet” feature. I use OneBusAway when I take the bus. I use Evernote to take pictures of the whiteboard when it looks like it might be worth saving.

In honor of the 7 billionth baby; or, Not every billion is created equal

Nov 1, 2011

With all the hype that has been going on with the population of the world passing 7 billion people (sample headlines: “Cheers and fears as world population hits 7 billion” and “7 billionth baby: Congratulations are mixed with dire words”), I can’t help but add a little math to the conversation to make things interesting (and meaningful).

Before we talk math, you should know that part of why I care about this is that I take issue with the Malthusian prophecies and general spreading of fear. Many people believe that if we don’t take drastic action (on population growth) soon, the fragile planet will collapse under the weight of its huge population. I think this is false, but also misguided. If we are worried about the planet’s resources (and I think we should be), we should be concentrating on limiting our aggregate effect on the environment, instead of assuming that a smaller population will fix all (or any) of our problems.

I do not think that word means what you think it means

Population growth is exponential. It always has been and always will be. But “exponential” does not mean “huge”. It means that the change in population (number of births minus number of deaths) is a multiple of the current population. This is why we usually talk about population growth in terms of percentages instead of absolute growth.

Of course, when we see a big round number like 7 billion, the percentage talk goes out the window and we start comparing how long it took the world to reach each successive big round number. Did you know that population increased from 6 to 7 billion in less time than it increased from 4 to 5 billion? Therefore population is growing faster now than it was in the 60’s, right?

If we were talking about the number of tater tots produced in a factory, then adding a billion to 4 billion to get 5 billion is exactly the same as adding a billion to 6 billion to get 7 billion. But people aren’t produced by machines. We are the machines. So even though birth rates are lower today than in the 60’s, the population is adding people faster in absolute terms, simply because there are more people doing the adding. In other words, adding a billion to 4 billion (25%) is harder than adding a billion to 6 billion (17%).

Visualizing the data

Mathematicians would say that in the realm of exponential growth, 6 and 7 billion are closer together than 4 and 5 billion are. Then they would write down some formulas and theorems to indicate exactly what they mean by “closer together”. At the end of the process, you’d get what we call a log graph. This is just like any other graph, except we stretch out the y-axis (by applying the log function) to reflect our new understanding of distance. Here’s what you get.
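For anyone who wants to check the “closer together” claim, here is a small Python sketch using the round numbers from the text. On a log-scale axis, the vertical distance between two values is the log of their ratio, so the 6-to-7-billion step really is shorter than the 4-to-5-billion step.

```python
import math


def growth(start: float, end: float):
    """Percentage growth, and the distance between start and end on a log10 axis."""
    percent = (end - start) / start * 100
    log_gap = math.log10(end / start)  # same as log10(end) - log10(start)
    return percent, log_gap


for start, end in [(4e9, 5e9), (6e9, 7e9)]:
    pct, gap = growth(start, end)
    print(f"{start / 1e9:.0f} to {end / 1e9:.0f} billion: +{pct:.0f}%, log gap {gap:.3f}")
# 4 to 5 billion: +25%, log gap 0.097
# 6 to 7 billion: +17%, log gap 0.067
```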

The graph on the bottom is the growth rate of the population. You see here that population growth peaked in the 60’s and has been decreasing since. You can see this in the population graph as an inflection point, where the graph starts to level out. In fact, the UN predicts that before the end of this century, the graph will hit its maximum (of about 10 billion) and then start to decrease slightly, finally stabilizing at about 9 billion. Of course, this is just an extrapolation of past trends, and no one knows how accurate these predictions will be.

The point I’m trying to make, though, is that it is hard to see these trends by looking at the population graph you usually see. Looking at those graphs, you would say that anyone who believes that world population is “leveling off” is way off track.

It is the people that treat population growth as a giant odometer that are not seeing things clearly.

The graph uses numbers from the U.S. Census Bureau and HYDE, from this Wikipedia page. Also, tater tots were invented by my uncle.

Margin-free Latex for on-screen viewing

Oct 28, 2011

I’m spending some time typing up some recent research. Of course, it’s all very mathematical, so I am using Latex. Latex is a markup language, which means that you write the document in a plain text editor, using codes to indicate font changes or things like that. For example, to typeset some text in italics, instead of pushing a button in some program, you type \emph{word} to mean that word should be emphasized (italicized, it turns out). When you are ready to see your document, you run a program which reads in the text file and outputs a PDF file. It is very useful because it creates well-typeset documents and also has features to make typing math really easy.

The rest of this post is meant for people who already know how to use Latex.

One thing that has often bugged me about my Latex workflow is margins. Not margins in the actual printed document, which you don’t really want to change because that would make your lines too long to read easily. No, I mean the preview: while I am working on a document, I keep a preview window open so I can see what the thing will look like when it’s done, and 40% of that window is wasted on margins.

So here is how you remove margins without changing the line length: by changing the paper size. The easiest way is to use the geometry package, which comes standard with any modern TeX distribution. Just place the following in your preamble:

\usepackage[paperwidth=\textwidth + 50pt,
            paperheight=\textheight + 50pt,
            margin=25pt]{geometry}

Boom. Text the same width and height as before, but with tiny margins and a smaller page.

ASCII

Oct 19, 2011

I’ve always loved ASCII. As a kid, I spent a considerable amount of time studying the code chart that was printed in our Epson dot matrix printer’s user manual.

The one thing I always sort of wondered about, but never really asked myself, was “Why do they leave space between the uppercase letters and lowercase letters?” (I’m talking about [, \, ], ^, _, and `.) I thought it was a little annoying, actually, but I never questioned it, because that was just the way it was.

I can’t believe it is only now that I am finding out that they wanted uppercase and lowercase letters to differ by only one bit. For example, the code for N is 4E, and the code for n is 6E. In binary, then, N is 1001110 and n is 1101110. And if you want to change something to all caps? Just change that second 1 to a 0, and you are good.
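That single bit is 0x20 (decimal 32). A short Python sketch of the all-caps trick; it checks for a lowercase letter first, because clearing the bit on other characters would change them (for example, { would become [). The names `CASE_BIT` and `ascii_upper` are mine.

```python
CASE_BIT = 0x20  # the one bit separating 'A'-'Z' (0x41-0x5A) from 'a'-'z' (0x61-0x7A)


def ascii_upper(text: str) -> str:
    """Uppercase ASCII letters by clearing the case bit; leave everything else alone."""
    return "".join(chr(ord(c) & ~CASE_BIT) if "a" <= c <= "z" else c for c in text)


print(f"{ord('N'):07b} {ord('n'):07b}")  # 1001110 1101110
print(ascii_upper("Hello, world"))       # HELLO, WORLD
```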

Printing git information in Latex

Oct 18, 2011

What I wanted was a way to print git information in a Latex file in a way that (1) doesn’t modify the actual source and (2) degrades gracefully, that is, my document will still compile for someone else, even if they do not do things my way.

Setting up the Latex source

I start by putting the macro \RevisionInfo where I want it in my Latex source file. I also put a \providecommand command in the preamble to define the default value and ensure that the document compiles even when the git information is not given.

For example:

\documentclass{amsart}
\providecommand{\RevisionInfo}{}
...
\begin{document}
\maketitle
\RevisionInfo
...
\end{document}

Inserting the git information

With a little effort, you can coax git to output information about the most recent commit in the form you want. For example:

git log -1 --date=short \
    --format=format:'\newcommand{\RevisionInfo}{Revision %h on %ad}'

Then you get Latex to put this at the beginning of the source file as you are compiling:

latex $(git log -1 .....) \input{document.tex}

As I said, I only do this if I’m planning on printing or emailing the pdf. The nice thing is that if I’m working on the project with someone else, and they aren’t using git, it doesn’t matter. Everything still works just fine for them, except copies they compile don’t get commit information on them.

Since I use BBEdit to write most of my Latex, it is easy to make a script that will “Typeset inserting git info.”
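Such a script can be written in any language, so here is a rough Python version of that command, as a sketch under assumptions: `pdflatex` is on the PATH (plain `latex` works the same way), the document is `document.tex`, and the function names are mine. Passing the macro and the `\input` as a single argument keeps the shell from splitting the revision text into separate words.

```python
import subprocess

GIT_FORMAT = r"format:\newcommand{\RevisionInfo}{Revision %h on %ad}"


def revision_macro(repo: str = ".") -> str:
    """Ask git for a \\newcommand line describing the most recent commit."""
    result = subprocess.run(
        ["git", "log", "-1", "--date=short", f"--format={GIT_FORMAT}"],
        cwd=repo, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def tex_input(macro: str, source: str = "document.tex") -> str:
    """First line handed to TeX: define the macro, then read the real source."""
    return f"{macro}\\input{{{source}}}"


def typeset_with_git_info(source: str = "document.tex", engine: str = "pdflatex"):
    """Compile `source` with \\RevisionInfo defined from git; the source file is untouched."""
    subprocess.run([engine, tex_input(revision_macro(), source)], check=True)
```

Because the document’s own `\providecommand` only defines `\RevisionInfo` when it doesn’t already exist, the git version wins when this script runs, and the empty default applies everywhere else.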

In the time between when I figured this stuff out and I wrote this post, a package called gitinfo by Brent Longborough was posted on CTAN. It is almost exactly what I wanted to do, but in package form. It will also compile fine even when not inside the repository and it has the added benefit of being much more automatic (once you set it up). The downside is that whoever compiles it needs a copy of the gitinfo package.