Nathan Grigg

Latex.bbpackage

Here is a BBEdit package I wrote to help me write Latex.

View on github
nathangrigg/Latex.bbpackage

This package makes it easier to use BBEdit to write Latex.

What it can do

The package includes a number of BBEdit clippings that insert often-used Latex commands. BBEdit makes it easy to assign keyboard shortcuts to the clippings you use most.


Public key cryptography

Public key cryptography is vital to the way the internet works. It was invented by mathematicians doing “pure math,” the kind about which everyone always asks, “Where in real life is that crazy math actually used?”

Here is a short introduction to how it works.

Imagine that you travel to Mars and find the planet populated by a bunch of cuddly green aliens who immediately appoint you as dictator for life. As you learn more about your newfound subjects, you find that they have an excellent knowledge of multiplication, but do not know how to do division. You make a mental note to teach them sometime.

Encryption

Meanwhile, all of your attentions are on the administration of your new kingdom. To save on collection costs, you decide to implement a system where each citizen’s taxes are withdrawn directly from his or her checking account, and you need to gather everyone’s account numbers. Of course, you want to keep things secret and secure, because identity theft has been in the news lately, and now would be a bad time for a mass revolt.

Suddenly, you get an incredible idea. You send out the following proclamation:

Please multiply your bank account number by 4 and send it to me immediately.

Sincerely, your dictator.

Everyone does so, and even though the mail system on your planet is notoriously corrupt, nobody’s account number is stolen, because you are the only one on the planet that can divide all of the numbers by 4 and thus recover the proper account numbers.

This is the first main idea of public key cryptography. You can tell everyone in the world exactly how to encrypt their data, and they all will be able to encrypt their data, but no one will be able to decrypt anybody else’s data. Only you possess that knowledge. The recipe for encrypting the data is called the public key (because everyone knows it) and the recipe for decrypting is called the private key (because only you know it).
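The multiply-by-4 scheme from the proclamation can be sketched in a few lines of Python. This is a toy model only, of course; a real public key is built from math that is genuinely hard to invert.

```python
# Toy model of the proclamation's scheme. The public key (multiply by 4)
# is known to everyone; only the dictator can invert it (divide by 4).
PUBLIC_KEY = 4

def encrypt(account_number):
    # anyone can do this
    return account_number * PUBLIC_KEY

def decrypt(ciphertext):
    # only the dictator, who knows division, can do this
    return ciphertext // PUBLIC_KEY

print(encrypt(123456))            # what a citizen mails in: 493824
print(decrypt(encrypt(123456)))   # what the dictator recovers: 123456
```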

Signatures

Some time later, someone sends out a counterfeit proclamation saying that the tax rate will be raised to 8000 martian dollars per year. An anti-tax group, thinking that the first proclamation was from you, sends out a proclamation saying that the first proclamation was a counterfeit and that the tax rate would actually be lowered to 1200 martian dollars per year. By now everyone is confused and another proclamation would probably not do much to clear anything up.

Suddenly, you get another incredible idea. You use your unique ability to divide in the opposite direction, and send the following proclamation:

The tax amount will remain at 5000 martian dollars per year. To know that I am the one who actually sent this proclamation and that the other proclamations were fake, multiply 1250 by 4 and you will get 5000, which is the true tax rate.

Since I am the only one who knows how to divide, no one else could possibly have known that 1250 is the number that you can multiply by 4 to get the current tax rate of 5000. So you can be sure that this message is from me.

Sincerely, your dictator.

This is the second main idea of public key cryptography. You can apply the decryption process (division) to an already unencrypted message. Then anyone in the world can apply the encryption process (multiplication) and see the message. So you are not hiding anything. But since nobody else could have possibly done what you did, everyone knows that (1) the message is from you and (2) nobody has changed the message.

Because this is somewhat analogous to a handwritten signature certifying that you approve a message, it is called a signature. You use your private key to sign a document, and then anyone in the world can use your public key to check that it is really your signature. If someone intercepts your proclamation and changes the 5,000 to 50,000, it will be obvious to the people that the message is no longer from you.
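Continuing the toy model, signing is just running the scheme in reverse. Again, this is only an illustration of the idea:

```python
# Toy signature: the dictator divides (the private operation) and any
# citizen verifies by multiplying (the public operation).
PUBLIC_KEY = 4

def sign(message):
    # private: requires division
    return message // PUBLIC_KEY

def verify(message, signature):
    # public: only multiplication
    return signature * PUBLIC_KEY == message

signature = sign(5000)                # 1250, as in the proclamation
assert verify(5000, signature)        # genuine message checks out
assert not verify(50000, signature)   # a tampered tax rate fails
```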

Certificates

This is pretty much the process that happens every time you log onto a secure website, such as your bank. Wells Fargo sends you their public key, you use it to encrypt your password, and then they are the only ones in the world who can decrypt your password. (I’m oversimplifying just a tiny bit, but this is the idea.)

There is still one problem, though. When you connect to Wells Fargo, and they send you their public key, how do you know it is really Wells Fargo, and not just someone in Nigeria pretending to be Wells Fargo?

To make sure Wells Fargo really is Wells Fargo, they don’t just send you their public key. They send you a copy of their public key that has been signed by someone your computer trusts. Your computer can check the signature of this trusted third party, which tells you that they checked that this was Wells Fargo’s key, and it hasn’t changed since then. A signed public key is usually called a certificate.

Who is the trusted third party? It is someone called a Certificate Authority (CA), whose job is to make sure that all of these public keys really belong to who they say they do. You may have heard of VeriSign, one of the most popular CA’s. As long as the CA’s are being honest (and, really, they’d be out of business otherwise), you know you are safe.

Most of this stuff happens behind the scenes, but every now and then, something goes wrong. Your web browser has probably told you “This website has an invalid certificate” at some point. What it meant was that either there are no trusted signatures on the public key, or the ones it has are invalid. Usually something is just expired or set up wrong. But for all you know, someone is pretending to be something or someone they aren’t, so you should be extremely careful.

And a small amount of math

The idea of public key cryptography is pretty new. It depends on what are called one-way functions: processes that are easy to do in one direction (multiplication, for example) and hard to undo (factoring, for example).
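Here is a small illustration of that asymmetry, with modest primes standing in for the hundreds-of-digits primes used in practice:

```python
# Multiplying two primes takes one step; recovering them by trial
# division takes many. Scale the primes up far enough and the "undo"
# direction becomes computationally infeasible.
p, q = 104723, 104729          # two five-digit primes

n = p * q                      # easy direction: one multiplication

def factor(n):
    # hard direction: trial division from 2 upward
    d = 2
    while n % d:
        d += 1
    return d, n // d

assert factor(n) == (p, q)     # works here, but takes ~100,000 steps
```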


Comparison of Mac genealogy programs

When I was a kid, I used to use PAF to look at the family history research that my mom had done. I remember the old text-based PAF for DOS, which was probably one of the first computer programs I ever used. Back then, and into the early ’00s, PAF was the genealogy program. Sure, there were others, but they tried their best to be compatible with PAF, or at the very least, with the GEDCOM format, which was designed by the same people who designed PAF. If you wanted a full-featured genealogy program that was highly likely to work with other programs, PAF was your best choice. And it was free.

Things have changed since then. PAF is no longer actively developed, as the LDS church has turned its focus to familysearch.org. You can run PAF on current Windows machines, and there are even some companies that make plugins to address some of its usability hurdles. But it is not the standard that it once was.

Of course, none of this helps me, because I no longer run Windows, and PAF for Mac hasn’t worked since OS 9. After several years of storing things in folders and text files, I decided it was time to go app shopping.

What I’m looking for

  1. Future proof. Best would be if it works for the next 50 years of my life, but that’s probably too much to ask. Instead, I will settle for requiring good GEDCOM support. I’m not interested in any bells and whistles that allow me to add information that I can’t export, because then it will be stuck when I am inevitably forced to change programs.

  2. Customizable. In my looking around, I have accepted the fact that I will not be able to find a program that will do everything I want. In particular, nothing can print the reports I want to print. One thing I liked about PAF was that it would let you export csv files of any information you want. This was another way of making sure that things you put in will be able to come out.

  3. Easy to use. Screens are 4 times larger than when I was a kid. You should at least be able to fit as much on the screen as the old PAF did.

What I found

Here are my thoughts. To be honest, I started this blog post with the impression that there was nothing that was going to work for me. As I wrote, however, I became convinced that Reunion was what I was looking for. It won’t do everything that I want, but it is in my opinion enough above all the other options that there is no point spending money on any of the others.

Reunion

I found Reunion to be one of the easiest programs to work in. I love the two-column view that puts a couple in the middle, their parents above, and their children below. I’ve always found it easiest to do family history work when focused on families instead of individuals, but so many other programs make this difficult. I also love that Reunion takes advantage of multiple windows, allowing you to use screen space well, and I love the “ages” view and all the different views it offers. Reunion has been around for a long time and isn’t going anywhere. It lets you make notes with sources appearing as footnotes within the note, much like PAWriter (below), but in a more standard GEDCOM manner. I probably wouldn’t trust these footnotes to import into another system, though; I’d have to see how it exports to GEDCOM.

On the bad side, Reunion is by far the most expensive of the bunch at $100. Its reports are not very good, but then good reports are hard to make.

MacFamilyTree

MacFamilyTree is beautiful and has some very nice features. At one point, I was sure I was going to buy it. It has good integration with familysearch.org. It has some nice modern touches, with Google Earth views and some 3D pedigree charts. In the end, though, I couldn’t look past how difficult it is to enter data. Adding a birth date takes a few clicks and a couple different pages. There is no way to see all the information about a person in a single view.

Geditcom II

There are many things I like about Geditcom. First, it stores all records in GEDCOM format. This means that you always know everything you are doing is going to be exportable. You can even put the file under revision control (which the author of Geditcom actually encourages). The author provides Python and AppleScript libraries to allow you to make completely custom reports. Even the reports that ship with Geditcom can be edited. Geditcom is free if you want to use it as a GEDCOM reader only.

Unfortunately, working in Geditcom is difficult. Data entry is tough, and some things just don’t work.

[Update: The release of Geditcom 1.7 seems to solve some of the things that weren’t working for me (like the Latex report), and the developers are making progress on ease of use. It is still not as easy as Reunion, but maybe it makes up for that with its custom reports and its use of straight GEDCOM. When I finish my thesis, maybe I’ll get back into genealogy and let you know how things turn out.]

PAWriter II

PAWriter is the closest to PAF of any of the programs that I have tried.

I am a little worried, because it seems like there is a single developer working on this program, and I fear that one day he may abandon it and no one will take his place. Also, the way it does footnotes is nonstandard. It certainly wouldn’t import into another program.

Lifelines

Lifelines is very powerful, although probably too difficult to use on a daily basis. One of the best parts of Lifelines is its ability to make completely custom reports. On the down side, the reports are written in a custom programming language, which makes them a bit of a hassle. Also, you cannot open GEDCOM files directly with Lifelines; you have to import them. So if I were to use Lifelines only to print reports (since it is pretty unusable as a tree editor), there would be a lot of extra steps involved every time.

Conclusion

Someday I plan to buy Reunion. Right now I am busy and poor, so spending that much money on something I don’t really have time to use seems silly. But with the tinkering I’ve done, I am convinced that it is the best.

I will also write a Python script to process GEDCOM files and make nice reports. Then when I want a complicated report, I can export the GEDCOM, run my script, and have something pretty to look at.
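A minimal sketch of the kind of script I have in mind is below. The parsing here is drastically simplified; real GEDCOM has nested levels, cross-references, and continuation lines, and this handles only the simplest NAME records.

```python
# Pull (id, name) pairs for individuals out of GEDCOM-formatted lines.
def individual_names(lines):
    names = []
    current = None
    for line in lines:
        parts = line.rstrip().split(' ', 2)
        if len(parts) >= 3 and parts[0] == '0' and parts[2] == 'INDI':
            current = parts[1]               # e.g. '@I1@'
        elif current and len(parts) >= 3 and parts[1] == 'NAME':
            # GEDCOM wraps the surname in slashes: John /Doe/
            names.append((current, parts[2].replace('/', '').strip()))
            current = None
    return names

sample = ['0 @I1@ INDI', '1 NAME John /Doe/', '1 BIRT']
print(individual_names(sample))   # [('@I1@', 'John Doe')]
```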


Installing AWStats and using it to build static statistic pages

I recently moved my academic website from the math department web server to the university central web servers because I was getting tired of all the weird problems I was running into.

I’ve never really cared too much about pageview statistics or anything, but with the recent shuffle, I wanted to at least see if there were a bunch of 404 errors due to something I overlooked. So I started looking at the logs, and considered putting together a simple parser to extract and compile the 404’s. Then I found AWStats, which would do it for me, plus a bunch of other stats.

Setting up AWStats

Although AWStats is intended to be run as a cgi script, I didn’t want to run it that way. I was already going to have to tweak things because I don’t have root access, and there’s no point making things more complicated than they need to be. And since my logs are only delivered to me once a day, there’s no point in being able to update my statistics on demand. Even if there were, I don’t think I’d see the point.

I began by following this great guide by George Notaras. Highlights:

  1. Get AWStats and unzip it into some directory (e.g. ~/local/awstats/)
  2. Create a data directory, for use by AWStats (e.g. ~/local/awstats/statdata)
  3. Create a directory where you will store the html files produced by AWStats (e.g. public_html/stats). You need to copy the icons folder from wwwroot to here.
  4. Rename the file wwwroot/cgi-bin/awstats.model.conf to awstats.mysite.conf (replacing mysite with whatever you want) and edit the configuration file. The configuration file needs to remain in the same directory as awstats.pl. The model configuration file gives you most of the information you need. Don’t forget to tell it where your log file is kept (LogFile), what format it is in (LogType), where the data directory you created lives (DirData) and the location of the icons directory (DirIcons), relative to where the HTML files will be.
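The steps above might look like this as shell commands. The paths and the site name are examples from the list, so adjust them to your own setup (and note that AWStats ships the icons in a directory named icon):

```shell
AWSTATS=~/local/awstats
STATS=~/public_html/stats

# data directory for AWStats, and a home for the generated pages
mkdir -p "$AWSTATS/statdata" "$STATS"

# the report pages expect the icon directory next to them
cp -R "$AWSTATS/wwwroot/icon" "$STATS/" 2>/dev/null || true

# the config must stay in the same directory as awstats.pl;
# afterwards, edit LogFile, LogType, DirData, and DirIcons in it
cp "$AWSTATS/wwwroot/cgi-bin/awstats.model.conf" \
   "$AWSTATS/wwwroot/cgi-bin/awstats.mysite.conf" 2>/dev/null || true
```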

Generating the reports

In the tools directory, there is a script called awstats_buildstaticpages.pl. Running this will optionally run awstats and then use the output to make all of the appropriate report pages.

Important options:

And watch out for that bug

Update: This bug has been fixed in more recent versions of AWStats.

Unfortunately, things didn’t work exactly like I hoped. Any time I used the -builddate option, the links in the reports didn’t work. This is the fault of awstats_buildstaticpages.pl.

What this script does is make repeated calls to awstats.pl, each with a different -output option, which creates the 10 or so different pages that awstats can generate.

When you specify a -builddate option, it formulates an option something like -staticlinks=mysite.201201 and passes it to awstats. The option should be -staticlinks=awstats.mysite.201201 instead.

If you feel at all comfortable with perl, you can fix the bug by inserting awstats. into the appropriate place. Or you could write a script to call awstats 10 times, essentially rewriting awstats_buildstaticpages.pl, but in your favorite language and simplified because you aren’t going to make it accept as many options. Or you could hope that the bug is fixed by the time you read this.
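If you'd rather sidestep the wrapper entirely, the loop it performs is easy to reproduce with the corrected -staticlinks value. A dry-run sketch, where the site name, month, and page list are examples:

```shell
SITE=mysite
MONTH=201201
# note the leading "awstats." that the buggy wrapper leaves off
STATICLINKS="-staticlinks=awstats.$SITE.$MONTH"

# print the commands; remove "echo" to actually run them
for page in alldomains allhosts urldetail errors404; do
    echo perl awstats.pl -config=$SITE "$STATICLINKS" -output=$page
done
```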


My computer setup

Here is my technology setup. Amy always makes fun of me because she thinks I change things around too often. “If the act of optimizing takes more time than the optimization saves, what’s the point?” she often asks. She has a good point. But I’ll continue to ignore it.

At home, I have a 2009 21-inch iMac. For the most part, I stick with the standard tools—I use Safari, Mail, iPhoto, Terminal, and Preview. For Latex, which I use to typeset everything math, I use BBEdit to edit and TeXShop to compile and preview. I use BibDesk to keep track of papers I read, and I read most papers in full-screen Preview, unless I’m reading on my iPad. I also use BBEdit for pretty much all the rest of my text-editing needs.

Some other apps that I love and use often: FaceTime, to talk to family, Caffeine, to keep my Mac awake, and Notational Velocity for small notes. I use Soulver as my calculator and Day One for keeping a journal. Oh, and 1Password for passwords.

On my iPad, I use PDF Expert to read and annotate papers. I use Instapaper, Twitter, and Reeder to keep up with goings-on in the world. I play Carcassone and Lab Solitaire. I use Day One for my journal, the Kindle app to read books, and I use Verbs for instant messaging, mostly with Amy.

I use my iPod touch to listen to podcasts and check my mail or twitter at times when the iPad would be inconvenient. I use TweetBot because I love its “View replies to this tweet” feature. I use OneBusAway when I take the bus. I use Evernote to take pictures of the whiteboard when it looks like it might be worth saving.


In honor of the 7 billionth baby; or, Not every billion is created equal

With all the hype that has been going on with the population of the world passing 7 billion people (sample headlines: “Cheers and fears as world population hits 7 billion” and “7 billionth baby: Congratulations are mixed with dire words”), I can’t help but add a little math to the conversation to make things interesting (and meaningful).

Before we talk math, you should know that part of why I care about this is that I take issue with the Malthusian prophecies and general spreading of fear. Many people believe that if we don’t take drastic action (on population growth) soon, the fragile planet will collapse under the weight of its huge population. I think this belief is not only false but misguided. If we are worried about the planet’s resources—and I think we should be—we should be concentrating on limiting our aggregate effect on the environment, instead of assuming that a smaller population will fix all (or any) of our problems.

I do not think that word means what you think it means

Population growth is exponential. It always has been and always will be. But “exponential” does not mean “huge”. It means that the change in population (number of births minus number of deaths) is a multiple of the current population. This is why we usually talk about population growth in terms of percentages instead of absolute growth.

Of course, when we see a big round number like 7 billion, the percentage talk goes out the window and we start comparing how long it took the world to reach each successive big round number. Did you know that population increased from 6 to 7 billion in less time than it increased from 4 to 5 billion? Therefore population is growing faster now than it was in the 60’s, right?

If we were talking about the number of tater tots produced in a factory, then adding a billion to 4 billion to get 5 billion would be exactly the same as adding a billion to 6 billion to get 7 billion. But people aren’t produced by machines. We are the machines. So even though birth rates are lower today than in the 60’s, the population is growing faster in absolute terms. In other words, adding a billion to 4 billion (a 25% increase) is harder than adding a billion to 6 billion (a 17% increase).
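Putting rough numbers on that, using approximate milestone years (4 billion around 1974, 5 around 1987, 6 around 1999, 7 around 2011):

```python
# Average annual growth rate between two population milestones,
# as a percentage.
def annual_growth(pop0, pop1, years):
    return ((pop1 / pop0) ** (1 / years) - 1) * 100

four_to_five = annual_growth(4, 5, 1987 - 1974)
six_to_seven = annual_growth(6, 7, 2011 - 1999)

print(round(four_to_five, 1))   # about 1.7% per year
print(round(six_to_seven, 1))   # about 1.3% per year
```

Even though the later billion arrived in fewer years, the percentage growth was lower.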

Visualizing the data

Mathematicians would say that in the realm of exponential growth, 6 and 7 billion are closer together than 4 and 5 billion are. Then they would write down some formulas and theorems to indicate exactly what they mean by “closer together”. At the end of the process, you’d get what we call a log graph. This is just like any other graph, except we stretch out the y-axis (by applying the log function) to reflect our new understanding of distance. Here’s what you get.

population of the world, 1700-present

The graph on the bottom is the growth rate of the population. You see here that population growth peaked in the 60’s and has been decreasing since. You can see this in the population graph as an inflection point, where the graph starts to level out. In fact, the UN predicts that before the end of this century, the graph will hit its maximum (of about 10 billion) and then start to decrease slightly, finally stabilizing at about 9 billion. Of course, this is just an extrapolation of past trends, and no one knows how accurate these predictions will be.

The point I’m trying to make, though, is that it is hard to see these trends by looking at the population graph you usually see. Looking at those graphs, you would say that anyone who believes that world population is “leveling off” is way off track.

It is the people who treat population growth as a giant odometer who are not seeing things clearly.

The graph uses numbers from the U.S. Census Bureau and HYDE, from this wikipedia page. Also, tater tots were invented by my uncle.


Margin-free Latex for on-screen viewing

I’m spending some time typing up some recent research. Of course, it’s all very mathematical, so I am using Latex. Latex is a markup language, which means that you write the document in a plain text editor, using codes to indicate font changes or things like that. For example, to typeset some text in italics, instead of pushing a button in some program, you type \emph{word} to mean that word should be emphasized (italicized, it turns out). When you are ready to see your document, you run a program which reads in the text file and outputs a PDF file. It is very useful because it creates well-typeset documents and also has features to make typing math really easy.

The rest of this post is meant for people who already know how to use Latex.

One thing that has often bugged me about my Latex workflow is margins. Not the margins in the actual printed document, which you don’t really want to change because that would make your lines too long to read easily. I’m talking about the margins I see while working on the document: I keep a preview window open so I can see what the thing will look like when it’s done, and 40% of the preview window is wasted on margins.

So here is how you remove margins without changing the line length: by changing the paper size. The easiest way is to use the geometry package, which comes standard with any modern Tex distribution. Just place the following in your preamble:

\usepackage[paperwidth=\textwidth + 50pt,
            paperheight=\textheight + 50pt,
            margin=25pt]{geometry}

Boom. Text the same width and height as before, but with tiny margins and a smaller page.


ASCII

I’ve always loved ASCII. As a kid, I spent a considerable amount of time studying the code chart that was printed in our Epson dot matrix printer’s user manual.

ASCII code chart

The one thing that I always sort of wondered, but never really asked myself, was “Why do they leave space between the uppercase letters and lowercase letters?” (I’m talking about [, \, ], ^, _, and `.) I thought it was a little annoying, actually, but I never questioned it, because that was just the way it was.

I can’t believe that it is only now that I find out that they wanted the lowercase letters and uppercase letters to have only a one bit difference. For example, the code for N is 4E, and the code for n is 6E. In binary, then, N is 1001110 and n is 1101110. And if you want to change something to all caps? Just change that second 1 to a 0, and you are good.
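That “second 1” is the 0x20 bit. A quick check in Python:

```python
# Bit 0x20 is the case bit: clear it to uppercase a letter,
# set it to lowercase one.
CASE_BIT = 0x20

def to_upper(c):
    return chr(ord(c) & ~CASE_BIT)

def to_lower(c):
    return chr(ord(c) | CASE_BIT)

assert to_upper('n') == 'N'            # 0x6E -> 0x4E
assert to_lower('N') == 'n'
assert bin(ord('N')) == '0b1001110'
assert bin(ord('n')) == '0b1101110'
```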


Printing git information in Latex

What I wanted was a way to print git information in a Latex file in a way that (1) doesn’t modify the actual source and (2) degrades gracefully, that is, my document will still compile for someone else, even if they do not do things my way.

Setting up the Latex source

I start by putting the macro \RevisionInfo where I want it in my Latex source file. I also put a \providecommand command in the preamble to define the default value and ensure that the document compiles even when the git information is not given.

For example:

\documentclass{amsart}
\providecommand{\RevisionInfo}{}
...
\begin{document}
\maketitle
\RevisionInfo
...
\end{document}

Inserting the git information

With a little effort, you can coax git to output information about the most recent commit in the form you want. For example:

git log -1 --date=short --format=format:\
    '\newcommand{\RevisionInfo}{Revision %h on %ad}'

Then you get Latex to put this at the beginning of the source file as you are compiling:

latex $(git log -1 .....) \input{document.tex}

As I said, I only do this if I’m planning on printing or emailing the pdf. The nice thing is that if I’m working on the project with someone else, and they aren’t using git, it doesn’t matter. Everything still works just fine for them, except copies they compile don’t get commit information on them.

Since I use BBEdit to write most of my Latex, it is easy to make a script that will “Typeset inserting git info.”
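The script amounts to gluing the two commands together, something like this. The document name is a placeholder and pdflatex stands in for whatever engine you use; the || true keeps things working outside a git repository:

```shell
DOC=document
DEF=$(git log -1 --date=short \
    --format=format:'\newcommand{\RevisionInfo}{Revision %h on %ad}' \
    2>/dev/null || true)

# dry run: remove "echo" to actually typeset
echo pdflatex "$DEF\input{$DOC.tex}"
```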

In the time between when I figured this stuff out and I wrote this post, a package called gitinfo by Brent Longborough was posted on CTAN. It is almost exactly what I wanted to do, but in package form. It will also compile fine even when not inside the repository and it has the added benefit of being much more automatic (once you set it up). The downside is that whoever compiles it needs a copy of the gitinfo package.


What I want from a weather app

I am always a little disappointed when I look up the current temperature on the internet or a weather app. One number can only tell you so much about what’s going on outside. We try to make up for it by reporting the high and low temperature for the day, but there’s a lot more to a function than one data point plus two extreme values. Luckily, the University of Washington records the temperature on the roof of the ATG every minute and allows you to download it in csv format. From there, a little messing with gnuplot makes it readable, and I really know what the temperature is doing. Here’s an example:

Current Weather

The Python script

The Python script downloads the last 12 hours’ worth of temperature readings, available as a csv file, from the University of Washington weather station. It extracts the useful information from the csv file, converts the times into a format that gnuplot understands, and deals with time zone issues. It then feeds the data through gnuplot to draw the graph and outputs the graph to the user, caching the result to prevent unnecessary strain on my server and the weather station’s.
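A sketch of the reformatting step is below. The csv layout (date, time, temperature columns) is an assumption for illustration, not the station's actual format:

```python
import csv
import io

# Convert csv rows of (date, time, temp) into the two-column
# "YYYY-mm-dd-HH-MM temp" lines that gnuplot's
# timefmt "%Y-%m-%d-%H-%M" setting expects.
def gnuplot_data(csv_text):
    lines = []
    for date, time, temp in csv.reader(io.StringIO(csv_text)):
        hh, mm = time.split(':')[:2]
        lines.append('%s-%s-%s %s' % (date, hh, mm, temp))
    return '\n'.join(lines)

sample = "2012-01-15,14:30:00,48.2\n2012-01-15,14:31:00,48.1\n"
print(gnuplot_data(sample))
```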

The gnuplot instruction

The main plot command is

plot "-" using 1:2 smooth bezier lt 3 lw 2 notitle

The "-" means the data file will be given on stdin, but you could also use a filename here. The using 1:2 tells it to use columns 1 and 2 for the x and y data, respectively. Then smooth bezier tells it to smooth the data instead of just connecting all the dots. Color is controlled by lt 3 and line weight by lw 2. Counterintuitively, notitle eliminates the key.

Here is the entire gnuplot code:

reset

# configure svg output
set term svg size 600 480 dynamic fname 'Helvetica'

# tell it that the x-axis represents time
set xdata time

# set the format of the data file
set timefmt "%Y-%m-%d-%H-%M"

# set the format of the axis labels
set format x "%l%p"

# display y-axis labels on the right side with gridlines
set y2tics border
set noytics
set grid y2tics

# axis labels and plot title
set xlabel "Time"
set ylabel "degrees Fahrenheit"
set title "Last 12 hours temperature at UW weather station"

# draw the plot
plot "-" using 1:2 smooth bezier lt 3 lw 2 notitle

Private key authentication for ssh using ssh-keygen

Private key authentication is a way to log into another computer via SSH, and is an alternative to the username/password authentication. It can be more secure, because no one will ever guess your private key, and your private key is never sent over the network, so it cannot be intercepted. It can also be more convenient, because if you don’t assign a password to the private key, you don’t have to type a password to use it.

I create a separate key pair for each computer I use, so that I can always adjust which computers are allowed to log into which computer. I always forget how the ssh-keygen command works, though, and that is the main reason I’m writing this down.

Creating a key pair

The command you want to use is

ssh-keygen -t rsa -b 2048 -C comment

The first two options may be unnecessary because on my computer they are the default values. On at least one of the servers I use, however, they are required. The comment is also unnecessary, but helpful.

Using the keys

If you want to use this key to connect to another computer, that computer needs to have a copy of your public key, usually stored in the file ~/.ssh/authorized_keys.

Once I create a keypair for each computer I use, I copy all the public keys into a subdirectory of ~/.ssh that I call authorized_keys.d. It helps to give each key a more useful name like iMac.pub or office.pub. Then I run

cat authorized_keys.d/* > authorized_keys

Repeat for each host that you want to connect to. The good thing is, if I want to authorize (or unauthorize) another computer, I just add (or remove) the new public key in the directory and rerun this command.


What you should know about keeping your passwords safe

If you want to safely guard your passwords, you should first understand how your password could be “stolen” or discovered. Here are some scenarios.

You tell someone.

Oops. Either you actually tell them (be careful who you trust) or you enter it on a phishing site or respond to an email (don’t do it!).

What you can do: protect your passwords by never telling anyone, for any reason. Minimize the potential damage by using different passwords for different sites.

Someone guesses your password.

Maybe they try your phone number or your birthday or something else that they know about you.

What you can do: try to choose passwords that aren’t about you. Choose random words from the dictionary. If your brother could guess in 5 tries what your password is (or all but one letter of your password), then you should use a different password. Not just because your brother might one day try to steal your identity, but because if he knows something about you, then your Facebook friends probably do too.

Someone steals your password over wireless internet.

There are two main kinds of encryption happening when you use wireless internet. First: if you are visiting a “secure” site, the kind where the URL starts with https, then the stuff you send is encrypted from the moment it leaves your computer until it is received by Google’s or your bank’s computer. Big companies (Facebook, Google, Microsoft, Amazon, your bank) will at the very least make sure your password is sent in this secure method. Often they will encrypt everything you send or receive. Smaller websites may not.

The second encryption happens when you are using secured wireless, the kind where you have to enter a password. In this case everything you do is encrypted from your computer to the wireless access point.

If you are using unsecured wireless and entering your password into an unsecured site, then anybody on the same wireless network as you could be running a program that intercepts your password and steals it.

What you can do: Don’t mix passwords. If you can’t use a different password for everything, you should at least not mix important passwords (which are likely to be safe by method one) with less important passwords. If you use the same password to log into your bank or email as you do to log into some Harry Potter fan site, you are asking for trouble.

Someone hacks into one of the websites you use and discovers your password.

This is much less likely to be a problem for reputable websites, partly because well-run sites store only a scrambled (hashed) version of your password rather than the password itself.

What you can do: Again, don’t mix passwords. If you are dead-set on using the same password for everything, possibly changing the last number at each website just to make things slightly different, at least increase your password pool to two. Use one password for your bank and email and the other for everything else.

Note: I’m not actually recommending this. I’m saying this is the least you should do.

Summary

Use a complicated password that no one can guess. Make it somewhat random, not about you. If the site allows it, make it a phrase, like “trees eat ice cream.” This is easy to remember, easy to type, and much harder to guess than “(your-middle-name)2!”.
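If you want a random phrase of this kind, one way to generate one is to pick words at random from a list. A minimal sketch, using a toy word list (a real one would be a full dictionary file):

```python
import secrets

# Toy word list for illustration; use a real dictionary file in practice.
WORDS = ["trees", "eat", "ice", "cream", "purple", "banjo", "cloud", "seven"]

def passphrase(n_words=4):
    """Join n_words randomly chosen words into a phrase."""
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))
```

The `secrets` module draws from a cryptographically strong random source, which is what you want for passwords.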

Use different passwords for different places, even if you have to write them down. Use 1Password or something similar to keep track of your passwords. Or if you’d rather, write them in a notebook that you keep in that desk drawer with the lock you never found a use for.


Using Unison to sync files

I have been using Unison to sync files for the past several years. It does a great job and can sync between Windows, OS X, and Linux computers. Of course, nowadays you can also use Dropbox for this sort of thing, if you don’t mind the space constraints and security issues. Allway Sync was once my favorite sync program, but it only syncs Windows machines.

It took a bit of work to get Unison going, and I never got the GUI to work, but for the past three years it has synced my files both ways without any problems. I have always used these binaries. If you are going to be syncing from one computer to another, you will need to install the same version of Unison on both machines.

Unison syncs via ssh and only sends the pieces of the files that have changed. I always run it from the command line (usually through a LaunchAgent), as follows:

unison -options /local/folder ssh://remote.host/path
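The LaunchAgent mentioned above could be a plist along these lines. This is only a sketch: the label, interval, and paths are hypothetical, and `-batch` (a real Unison flag) makes it run without prompting:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.unison-sync</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/unison</string>
        <string>-batch</string>
        <string>/local/folder</string>
        <string>ssh://remote.host/path</string>
    </array>
    <!-- Run once an hour -->
    <key>StartInterval</key>
    <integer>3600</integer>
</dict>
</plist>
```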

The options I use are


Creating private links to files using Python

For a while I’ve been wanting to create a private link system. Google Docs, Dropbox, YouTube, and others all give you the option to make a file public but “unlisted,” with a long link that no one will likely guess. You can email the link to others, and no one has to worry about usernames or passwords. This week I implemented a rudimentary system as a Python cgi script.

Schematic

Each file is assigned an id. The ids and corresponding filenames are stored in a text file. When a user requests an id, the Python script checks whether the id is in the table and, if so, serves up the appropriate file. If the id does not have a corresponding file, the user gets an error message.

The id

You can use anything you want here, really. I use a 10-byte id encoded in base 32 as a length-16 string. You could use a shorter id and still be okay. The nice thing about base 32 is that it is URL safe, and it avoids 0’s, 1’s, and 8’s, which could be confused with O’s, I’s, and B’s. You can generate an id using the following code:

import base64
import os

# "id" shadows a builtin, so use a different name
file_id = base64.b32encode(os.urandom(10)).decode("ascii")  # 16-character string

I store the ids in a text file that looks something like this:

NRTDBP5QYKN3WGYP some-file.pdf
WMADW3QOSHSCATWY another-file.pdf
UEGGUKOMB5FXWNR2 a third file.pdf
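Looking an id up in this table can be sketched as follows (the table filename `links.txt` is hypothetical; since filenames may contain spaces, only the first space on each line splits the id from the name):

```python
def lookup(file_id, table_path="links.txt"):
    """Return the filename stored for file_id, or None if it is unknown."""
    with open(table_path) as table:
        for line in table:
            # Split on the first space only; the filename may contain spaces.
            key, _, name = line.rstrip("\n").partition(" ")
            if key == file_id:
                return name
    return None
```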

Serving up the file

As with any cgi script, you just need to print everything to stdout, starting with the headers. The headers I want to use are:

Content-Type: application/pdf
Content-Disposition: inline; filename="name of file.pdf"
Content-Length: (size of file in bytes)

You can replace “inline” with “attachment” if you want the browser to download the file instead of displaying it in the browser. Don’t forget the quotes around the file name if it has any spaces or special characters in it. Also, don’t forget to send a blank line after the headers and before sending the content. Then you finish it off with

import sys
sys.stdout.write(file.read())  # print would append an extra newline
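Putting the header logic together, here is one possible sketch in Python (the helper name is made up, and this is not the linked script itself; Content-Length is computed from the file’s actual size):

```python
import os

def pdf_headers(name, path, disposition="inline"):
    """Build the response headers for serving a PDF file (a sketch)."""
    headers = [
        "Content-Type: application/pdf",
        'Content-Disposition: %s; filename="%s"' % (disposition, name),
        "Content-Length: %d" % os.path.getsize(path),
    ]
    # The blank line after the headers separates them from the body.
    return "\r\n".join(headers) + "\r\n\r\n"
```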

The script is here: private-link.py

A little mod_rewrite

So far, the user needs to enter a URL in the form http://example.com/?id=NRTDBP. With the help of mod_rewrite, we can accept URLs like http://example.com/NRTDBP. Here is the relevant .htaccess file, taking into account that the Python script is named index.cgi.

RewriteEngine On
RewriteBase /path/to/folder/
RewriteRule ^index\.cgi - [L]
RewriteRule ^([A-Z0-9a-z]+)/?$ index\.cgi?id=$1 [L]

If you are confused about the last line, here is some help on regular expressions.
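To see what that pattern accepts, you can check the same regular expression in Python:

```python
import re

# The same pattern as in the RewriteRule above.
pattern = re.compile(r"^([A-Z0-9a-z]+)/?$")

assert pattern.match("NRTDBP/").group(1) == "NRTDBP"  # optional trailing slash
assert pattern.match("index.cgi") is None             # a dot never matches the id
```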


Managing my library books with a Python script

(updated )

[As of the most recent upgrade of the Seattle Public Library’s website, you can no longer access your checkouts or holds by RSS, so this no longer works. Sad.]

Seattle Public Library

When I was a kid, my mom used to save all of the receipts from the library, and when it was time to take the books back, we would check each one off to make sure none were left behind. Nowadays, you can just check the library website, but that can get tedious: log into my account, find out which books I have checked out, find out which books are on hold, log out of my account, log into my wife’s account, repeat. And soon my kids will have accounts too? So much clicking!

Luckily, the Seattle Public Library offers both your holds list and your checked-out list in RSS/XML format. It was not hard to write a script to download the RSS file, extract the useful information, and display it nicely. For a long time, I ran this once a day using a LaunchAgent on my home computer. This was inefficient, so I finally decided I should understand how cgi scripting works, because until now php was the only web scripting I had done. Of course, I was embarrassed at how easy cgi scripting really is.

The Python script

The script uses Feed Parser to parse the RSS, which makes things easy. The main idea is this:

import feedparser

feed = feedparser.parse("http://example.com/feed/")
booklist = feed.entries
for book in booklist:
    print book.title    # the title of the RSS entry
    print book.summary  # the summary of the RSS entry

Other than that, the script is doing some basic extraction using str.find and some list sorting.

Making it work as a cgi

This program is the simplest possible cgi script, because it requires no input. The idea behind cgi is that everything the program outputs is served to the user. The only thing you have to do is begin your output with a Content-Type header like this:

print "Content-Type: text/html; charset=UTF-8\n"

Remember that your header should be followed by a blank line, as above. Of course, you should also be careful about catching errors so they aren’t inserted into the html. The script is here: library.cgi
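That error-catching advice can be sketched like this (Python 3 here, and the helper is hypothetical, not the linked script):

```python
import sys
import traceback

def run_cgi(body_fn):
    """Emit the header, then the page body; keep tracebacks out of the html."""
    sys.stdout.write("Content-Type: text/html; charset=UTF-8\n\n")
    try:
        sys.stdout.write(body_fn())
    except Exception:
        # The visitor sees a generic message; the details go to the error log.
        sys.stdout.write("<p>Something went wrong.</p>")
        traceback.print_exc(file=sys.stderr)
```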