I’m spending some time typing up some recent research. Of course, it’s all very mathematical, so I am using Latex. Latex is a markup language, which means that you write the document in a plain text editor, using codes to indicate font changes or things like that. For example, to typeset some text in italics, instead of pushing a button in some program, you type \emph{word} to mean that word should be emphasized (italicized, it turns out). When you are ready to see your document, you run a program which reads in the text file and outputs a PDF file. It is very useful because it creates well-typeset documents and also has features to make typing math really easy.
The rest of this post is meant for people who already know how to use Latex.
One thing that has often bugged me about my Latex workflow is margins. Not margins in the actual printed document, which you don’t really want to change because that would make your lines too long to easily read. No, I’m talking about the margins I see while I am working on the document: I keep a preview window open so I can see what the thing will look like when it’s done, and 40% of that window is wasted on margins.
So here is how you remove margins without changing the line length: by changing the paper size. The easiest way is to use the geometry package, which comes standard with any modern Tex distribution. Just place the following in your preamble:
I’ve always loved ASCII. As a kid, I spent a considerable amount of time studying the code chart that was printed in our Epson dot matrix printer’s user manual.
The one thing that I always sort of wondered, but never really asked myself, was “Why do they leave space between the uppercase letters and lowercase letters?” (I’m talking about [, \, ], ^, _, and `.) I thought it was a little annoying, actually, but I never questioned it, because that was just the way it was.
I can’t believe that it is only now that I find out that they wanted the lowercase letters and uppercase letters to have only a one bit difference.
For example, the code for N is 4E, and the code for n is 6E. In binary, then, N is 1001110 and n is 1101110. And if you want to change something to all caps? Just change that second 1 to a 0, and you are good.
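The one-bit difference can be sketched in a few lines of Python. The names CASE_BIT, to_upper, and to_lower are made up for this illustration; the only fact it relies on is that the differing bit has the value 0x20 (6E minus 4E).

```python
CASE_BIT = 0x20  # the single bit that separates 'N' (0x4E) from 'n' (0x6E)

def to_upper(ch):
    """Clear the case bit of an ASCII lowercase letter; leave anything else alone."""
    code = ord(ch)
    if ord('a') <= code <= ord('z'):
        return chr(code & ~CASE_BIT)
    return ch

def to_lower(ch):
    """Set the case bit of an ASCII uppercase letter; leave anything else alone."""
    code = ord(ch)
    if ord('A') <= code <= ord('Z'):
        return chr(code | CASE_BIT)
    return ch

print(format(ord('N'), '07b'), format(ord('n'), '07b'))  # 1001110 1101110
print(''.join(to_upper(c) for c in 'ascii'))             # ASCII
```

The range checks matter: the six characters between Z and a would turn into other punctuation if the bit were flipped blindly.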
What I wanted was a way to print git information in a Latex file in a way that (1) doesn’t modify the actual source and (2) degrades gracefully, that is, my document will still compile for someone else, even if they do not do things my way.
Setting up the Latex source
I start by putting the macro \RevisionInfo where I want it in
my Latex source file. I also put a \providecommand command in the
preamble to define the default value and ensure that the document
compiles even when the git information is not given.
With a little effort, you can coax git to output information about the most recent commit in the form you want. For example:
git log -1 --date=short --format=format:\
'\newcommand{\RevisionInfo}{Revision %h on %ad}'
Then you get Latex to put this at the beginning of the source file as you are compiling:
latex $(git log -1 .....)\input{document.tex}
As I said, I only do this if I’m planning on printing or emailing the pdf. The nice thing is that if I’m working on the project with someone else, and they aren’t using git, it doesn’t matter. Everything still works just fine for them, except copies they compile don’t get commit information on them.
Some Related Ideas
Since I use BBEdit to write most of my Latex, it is easy to make a script that will “Typeset inserting git info.”
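A script like that could look roughly like the Python sketch below. It is not the BBEdit script itself: the function names are hypothetical, and it assumes that git and latex are on the PATH and that the source file is called document.tex, as in the example above.

```python
import subprocess

# Same format string as the git command above: %h is the short hash, %ad the date.
GIT_FORMAT = r"\newcommand{\RevisionInfo}{Revision %h on %ad}"

def latex_argument(macro_definition, source="document.tex"):
    """Build the single argument handed to latex: the macro, then an input of the source."""
    return macro_definition + r"\input{" + source + "}"

def git_macro():
    """Ask git for the macro definition; return '' when git or the repository is missing."""
    try:
        result = subprocess.run(
            ["git", "log", "-1", "--date=short", "--format=format:" + GIT_FORMAT],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return ""
    return result.stdout.strip()

def typeset(source="document.tex"):
    """Hypothetical entry point: run latex with the git information prepended."""
    return subprocess.call(["latex", latex_argument(git_macro(), source)])
```

When git_macro returns an empty string, latex just receives \input{document.tex}, so the \providecommand default in the preamble takes over.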
In the time between when I figured this stuff out and I wrote this post, a
package called gitinfo by Brent Longborough was posted on CTAN. It is
almost exactly what I wanted to do, but in package form. It will also compile
fine even when not inside the repository and it has the added benefit of being
much more automatic (once you set it up). The downside is that whoever compiles
it needs a copy of the gitinfo package.
I am always a little disappointed when I look up the current temperature
on the internet or a weather app. One number can only tell you so much
about what’s going on outside. We try to make up for it by reporting the
high and low temperature for the day, but there’s a lot more to a
function than one data point plus two extreme values. Luckily the
University of Washington records the temperature on the roof of the ATG
every minute and allows you to download it in csv format. From there, a
little messing with gnuplot makes it readable, and I really know what
the temperature is doing. Here’s an example:
The Python script
The Python script downloads the last 12 hours’ worth of temperature
readings from the University of Washington weather station. The readings
are available as a csv file. The script then extracts the useful
information from the csv file and converts the times into a format that
gnuplot understands. Also, it deals with time zone issues.
It then feeds the data through gnuplot to draw the
graph and outputs the graph to the user. It also caches the graph to
prevent unnecessary strain on my or the weather station’s server.
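The time conversion step might be sketched as follows. This is only an illustration: the layout of the csv file (a UTC timestamp followed by a temperature), the fixed UTC-8 offset, and the sample readings are all assumptions made for the example, not the weather station's real format or data.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

GNUPLOT_TIMEFMT = "%Y-%m-%d-%H-%M"                # the time format handed to gnuplot
PACIFIC_STANDARD = timezone(timedelta(hours=-8))  # assumption: fixed offset, no DST

def to_gnuplot_rows(csv_text, local_tz=PACIFIC_STANDARD):
    """Turn 'timestamp,temperature' rows (assumed UTC) into gnuplot data lines."""
    lines = []
    for stamp, temperature in csv.reader(io.StringIO(csv_text)):
        utc_time = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
        local_time = utc_time.astimezone(local_tz)
        lines.append(f"{local_time.strftime(GNUPLOT_TIMEFMT)} {float(temperature):.1f}")
    return lines

# Made-up readings, only to show the shape of the output.
sample = "2012-01-15 20:00:00,41.3\n2012-01-15 20:01:00,41.2\n"
print("\n".join(to_gnuplot_rows(sample)))
```

Each output line has the time in column 1 and the temperature in column 2, with no spaces inside the time, so gnuplot can split the columns on whitespace.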
The gnuplot instruction
The main plot command is
plot "-" using 1:2 smooth bezier lt 3 lw 2 notitle
The "-" means the data file will be given on stdin, but you could also use a filename here. The using 1:2 tells it to use columns 1 and 2 for the x and y data, respectively. Then smooth bezier tells it to smooth the data instead of just connecting all the dots. Color is controlled by lt 3 and line weight by lw 2. Counterintuitively, notitle eliminates the key.
reset
# configure svg output
set term svg size 600 480 dynamic fname 'Helvetica'
# tell it that the x-axis represents time
set xdata time
# set the format of the data file
set timefmt "%Y-%m-%d-%H-%M"
# set the format of the axis labels
set format x "%l%p"
# display y-axis labels on the right side with gridlines
set y2tics border
set noytics
set grid y2tics
# axis labels and plot title
set xlabel "Time"
set ylabel "degrees Fahrenheit"
set title "Last 12 hours temperature at UW weather station"
# draw the plot
plot "-" using 1:2 smooth bezier lt 3 lw 2 notitle
Private key authentication is a way to log into another computer via SSH, and is an alternative to the username/password authentication. It can be more secure, because no one will ever guess your private key, and your private key is never sent over the network, so it cannot be intercepted. It can also be more convenient, because if you don’t assign a password to the private key, you don’t have to type a password to use it.
I create a separate key pair for each computer I use, so that I can always adjust which computers are allowed to log into which computer. I always forget how the ssh-keygen command works, though, and that is the main reason I’m writing this down.
Creating a key pair
The command you want to use is
ssh-keygen -t rsa -b 2048 -C comment
The first two options may be unnecessary because on my computer they are the default values. On at least one of the servers I use, however, they are required. The comment is also unnecessary, but helpful.
Using the keys
If you want to use this key to connect to another computer, that computer needs to have a copy of your public key, usually stored in the file
~/.ssh/authorized_keys.
Once I create a keypair for each computer I use, I copy all the public keys into a subdirectory of ~/.ssh that I call authorized_keys.d. It helps to give each key a more useful name like iMac.pub or office.pub. Then I run
cat authorized_keys.d/* > authorized_keys
Repeat for each host that you want to connect to. The good thing is, if I want to authorize (or unauthorize) another computer, I just add its public key to (or remove it from) the directory and rerun this command.
If you want to safely guard your passwords, you should first understand
how your password could be “stolen” or discovered. Here are some
scenarios.
You tell someone.
Oops. Either you actually tell them (be careful who you trust) or you enter it on a phishing site or respond to an email (don’t do it!).
What you can do: protect your passwords by never telling anyone, for any reason. Minimize the potential damage by using different passwords for different sites.
Someone guesses your password.
Maybe they try your phone number or your birthday or something else that they know about you.
What you can do: try to choose passwords that aren’t about you. Choose random words from the dictionary. If your brother could guess in 5 tries what your password is (or all but one letter of your password), then you should use a different password. Not just because your brother might one day try to steal your identity, but because if he knows something about you, then your Facebook friends probably do too.
Someone steals your password over wireless internet.
There are two main kinds of encryption happening when you use wireless internet. First: if you are visiting a “secure” site, the kind where the URL starts with https, then the stuff you send is encrypted from the moment it leaves your computer until it is received by Google’s or your bank’s computer. Big companies (Facebook, Google, Microsoft, Amazon, your bank) will at the very least make sure your password is sent in this secure method. Often they will encrypt everything you send or receive. Smaller websites may not.
The second encryption happens when you are using secured wireless, the kind where you have to enter a password. In this case everything you do is encrypted from your computer to the wireless access point.
If you are using unsecured wireless and entering your password into an unsecured site, then anybody on the same wireless network as you could be running a program that intercepts your password and steals it.
What you can do: Don’t mix passwords. If you can’t use a different password for everything, you should at least not mix important passwords (which are likely to be safe by method one) with less important passwords. If you use the same password to log into your bank or email as you do to log into some Harry Potter fan site, you are asking for trouble.
Someone hacks into one of the websites you use and discovers your password.
This is much less likely to be a problem for reputable websites for many reasons.
What you can do: Again, don’t mix passwords. If you are dead-set on using the same password for everything, possibly changing the last number at each website just to make things slightly different, at least increase your password pool to two. Use one password for your bank and email and the other for everything else.
Note: I’m not actually recommending this. I’m saying this is the least you should do.
Summary
Use a complicated password that no one can guess. Make it kind of random, not about you. If they let you, make it a phrase, like “trees eat ice cream.” This is easy to remember, easy to type, and much harder to guess than “(your-middle-name)2!”.
Use different passwords for different places. Even if you have to write it down somewhere. Use 1Password or something similar to keep track of your passwords. Or if you’d rather, write them in a notebook that you keep in that locked desk drawer that you never knew what the lock was for.
I have been using Unison to
sync files for the past several years. It does a great job, and can sync
between Windows, OS X, and Linux computers. Of course, nowadays you can
also use Dropbox for this sort of thing, if you
don’t mind the space constraints and security issues. Allway
Sync was once my favorite sync program, but it
only syncs Windows machines. It took a bit to get Unison going, and I
never got the GUI to work, but for the past 3 years it has synched my
files both ways without any problems. I have always used these
binaries. If you are
going to be synching from one computer to another, you will need to
install the same version of Unison on both machines. It syncs via ssh,
and only sends the pieces of the files that have changed. I always run
unison from the command line (usually through a LaunchAgent), as
follows:
-perms 960 This mask is applied to permissions of everything. Note
960=0o1700, so in my case I am making sure that my local files
(which are usually world-readable by default) are only readable by
me on the server.
-auto syncs without asking, unless there are conflicts
-addversionno calls unison-40 instead of unison on the remote
server. I need this because the remote server has a really old
version of unison installed.
-batch -silent Ignores conflicts completely, instead of asking the
user, and prints no output. I only use these in the automated
version that runs once an hour. I rarely (less than twice a year) have
conflicts.
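The arithmetic behind the -perms value can be checked in Python. This only illustrates the bit mask described above, not how Unison works internally; the name masked and the example mode 0o644 (a typical world-readable file) are chosen for the illustration.

```python
PERMS_MASK = 960  # the value passed to -perms

def masked(mode, mask=PERMS_MASK):
    """Keep only the permission bits that the mask allows."""
    return mode & mask

print(oct(PERMS_MASK))     # 0o1700
print(oct(masked(0o644)))  # 0o600
```

The group and other bits fall outside the mask, which leaves only the owner's read and write bits.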
For a while I’ve been wanting to create a private link system. Google
Docs, Dropbox, YouTube, and others all give you the option to make a
file public but “unlisted,” with a long link that no one will likely
guess. You can email the link to others, and no one has to worry about
usernames or passwords. This week I implemented a rudimentary system as
a Python cgi script.
Schematic
Each file is assigned an id. The ids and corresponding filenames are
stored in a text file. When a user requests an id, the Python script
checks if the id is in the table, and, if so, serves up the appropriate
file. If the id does not have a corresponding file, the user gets an
error message.
The id
You can use anything you want here, really. I use a 10-byte id encoded
in base 32 as a length-16 string. You could really use a shorter id and
still be okay. The nice thing about base 32 is that it is URL safe, and
it doesn’t use 0’s, 1’s or 8’s, to avoid confusion with O’s, I’s, and
B’s. You can generate an id using the following code:
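A minimal sketch of one way to generate such an id with the Python standard library (not necessarily the code originally used; the function name new_id is made up):

```python
import base64
import os

def new_id(num_bytes=10):
    """Return random bytes encoded in base 32; 10 bytes become exactly 16 characters."""
    return base64.b32encode(os.urandom(num_bytes)).decode("ascii")

print(new_id())
```

Ten bytes is 80 bits, which divides evenly into sixteen 5-bit characters, so the result never needs = padding.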
I store the ids in a text file that looks something like this
NRTDBP5QYKN3WGYP some-file.pdf
WMADW3QOSHSCATWY another-file.pdf
UEGGUKOMB5FXWNR2 a third file.pdf
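Reading that table and looking up an id might be sketched like this. The function names are hypothetical, and the sketch assumes the id and the filename are separated by whitespace, with everything after the id being the filename.

```python
def parse_table(text):
    """Map each id to its filename; the filename may itself contain spaces."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        file_id, filename = line.split(None, 1)
        table[file_id] = filename.strip()
    return table

def lookup(table, requested_id):
    """Return the filename for an id, or None so the caller can show an error."""
    return table.get(requested_id)

TABLE = parse_table("""\
NRTDBP5QYKN3WGYP some-file.pdf
WMADW3QOSHSCATWY another-file.pdf
UEGGUKOMB5FXWNR2 a third file.pdf
""")
print(lookup(TABLE, "UEGGUKOMB5FXWNR2"))  # a third file.pdf
```

Splitting only on the first run of whitespace is what keeps "a third file.pdf" in one piece.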
Serving up the file
As with any cgi script, you just need to print everything to stdout,
starting with the headers. The headers I want to use are
Content-Type: application/pdf
Content-Disposition: inline; filename="name of file.pdf"
Content-Length: (size of file in bytes)
You can replace “inline” with “attachment” if you want the browser to
download the file instead of displaying it in the browser. Don’t forget
the quotes around the file name if it has any spaces or special
characters in it. Also, don’t forget to send a blank line after the
headers and before sending the content. Then you finish it off with
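Putting the headers and the file contents together could look like the following Python 3 sketch. The function names build_headers and serve are made up for the example, and it assumes the file is small enough to read into memory.

```python
import sys

def build_headers(filename, size, disposition="inline"):
    """Return the header block, ending with the blank line that precedes the body."""
    return (
        "Content-Type: application/pdf\r\n"
        f'Content-Disposition: {disposition}; filename="{filename}"\r\n'
        f"Content-Length: {size}\r\n"
        "\r\n"
    )

def serve(path, filename, out=None):
    """Write the headers and then the raw file bytes to stdout (or to `out`)."""
    out = out if out is not None else sys.stdout.buffer
    with open(path, "rb") as handle:
        body = handle.read()
    out.write(build_headers(filename, len(body)).encode("utf-8"))
    out.write(body)
    out.flush()
```

Writing to sys.stdout.buffer rather than sys.stdout keeps the PDF bytes from being mangled by text encoding.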
So far, the user needs to enter a URL in the form
http://example.com/?id=NRTDBP. With the help of mod_rewrite, we can
accept URLs like http://example.com/NRTDBP. Here is the relevant
.htaccess file, taking into account that the Python script is named
index.cgi.
[As of the most recent upgrade of the Seattle Public Library’s
website, you can no longer access your checkouts or holds by RSS, so
this no longer works. Sad.]
When I was a kid, my mom used to save all of the receipts from the library
and when it was time to take the books back, we would check each one off
to make sure none were left behind. Nowadays, you can just check the
library website, but that can get tedious: log into my account, find out
which books I have checked out, find out which books are on hold, log
out of my account, log into my wife’s account, repeat. And soon my kids
will have accounts too? So much clicking! Ahh! Luckily, the Seattle
Public Library offers both your holds list and your checked-out list in
RSS/XML format. It was not hard to write a script to download
the RSS file, extract the useful information, and display it nicely. For
a long time, I ran this once a day using a LaunchAgent on my home computer. This was inefficient, so I finally decided I should understand how cgi scripting
works, because up till now php was the only web scripting I had done. Of
course, I was embarrassed at how easy
cgi scripting really is.
The Python script
The script uses Feed Parser to parse the
RSS, which makes things easy. The main idea is this:
import feedparser

feed = feedparser.parse("http://example.com/feed/")
booklist = feed.entries
for book in booklist:
    print(book.title)    # the title of the RSS entry
    print(book.summary)  # the summary of the RSS entry
Other than that, the script is doing some basic extraction using
str.find and some list sorting.
Making it work as a cgi
This program is the simplest possible cgi script, because it requires no
input. The idea behind cgi is that everything that the program outputs
is served to the user. The only thing you have to do is begin your
output with an HTTP header like this:
print("Content-Type: text/html; charset=UTF-8\n")
Remember that your header should be followed by a blank line, as above.
Of course, you should also be careful about catching errors so they aren’t
inserted into the html. The script is here:
library.cgi
The computer people just installed an iMac in my office to
replace a very old computer that was running Ubuntu. Unfortunately, they
installed it with a run-of-the-mill keyboard that has the Alt key next
to the space bar and the Windows/Super/Command key between Ctrl and
Alt. My brain can’t handle it, so I started searching for a keyboard
remapper. Eventually I discovered that the ability to remap the modifier
keys is built in to Mac OS X. You just go to System Preferences ->
Keyboard and click “Modifier Keys.”