I have a Python script that downloads OFX files
from each of my banks and credit cards.
For a long time, I have been intending to make the HTTP requests multithreaded,
since it is terribly inefficient to wait for one response to arrive before
sending the next request.
Here is the single-threaded code block I was working with.
def ReadOfx(accounts):
    downloaded = []
    for account in accounts:
        try:
            account.AddOfx(read_ofx.Download(account))
        except urllib.error.HTTPError as err:
            print("Unable to download {}: {}".format(account, err))
        else:
            downloaded.append(account)

    return downloaded
Using the Python 2.7 standard library, I would probably use either the
threading module or multiprocessing.pool.ThreadPool.
In both cases, you can call a function in a separate thread but you cannot
access the return value. In my code, I would need to alter Download
to take a second parameter and store the output there. If the second parameter
is shared across multiple threads, I have to worry about thread safety.
Doable, but ugly.
In Python 3.2 and higher, the concurrent.futures module
makes this much easier. (It is also backported to Python 2.)
Each time you submit a function to be run on a separate thread, you get a Future
object. When you ask for the result, the main thread blocks until your thread is
complete. But the main benefit is that I don’t have to make any changes to
Download.
# Among other imports, we have `from concurrent import futures`.

def ReadOfx(accounts):
    with futures.ThreadPoolExecutor(max_workers=10) as ex:
        # Submit one download per account; each submit returns a Future.
        ofx_futures = [(account, ex.submit(read_ofx.Download, account))
                       for account in accounts]
        print("Started {} downloads".format(len(ofx_futures)))

    downloaded = []
    for account, future in ofx_futures:
        try:
            account.AddOfx(future.result())
        except urllib.error.HTTPError as err:
            print("Unable to download {}: {}".format(account, err))
        else:
            downloaded.append(account)

    return downloaded
In a typical run, my 6 accounts take 3, 4, 5, 6, 8, and 10 seconds to
download. Using a single thread, this is more than 30 seconds. Using multiple
threads, we just have to wait 10 seconds for all responses to arrive.
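The timing claim is easy to check without touching a real bank. The sketch below is only an illustration: `fake_download` is a hypothetical stand-in for `read_ofx.Download` that sleeps instead of making an HTTP request, and the durations are the six above divided by 100 so the example finishes quickly.

```python
import time
from concurrent import futures

# The 3, 4, 5, 6, 8, and 10 second downloads, scaled down by a factor of 100.
DURATIONS = [0.03, 0.04, 0.05, 0.06, 0.08, 0.10]


def fake_download(seconds):
    """Hypothetical stand-in for read_ofx.Download: just wait, then return."""
    time.sleep(seconds)
    return seconds


def timed(func):
    """Run func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def serial():
    return [fake_download(d) for d in DURATIONS]


def threaded():
    with futures.ThreadPoolExecutor(max_workers=10) as ex:
        pending = [ex.submit(fake_download, d) for d in DURATIONS]
        # result() blocks until that particular download has finished.
        return [f.result() for f in pending]


serial_results, serial_time = timed(serial)
threaded_results, threaded_time = timed(threaded)
print("serial:   {:.2f}s".format(serial_time))
print("threaded: {:.2f}s".format(threaded_time))
```

The serial run takes roughly the sum of the durations, while the threaded run takes roughly as long as the slowest one.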
I have been using IPython for interactive Python shells for several
years.
For most of that time, I have resisted the web-browser-based
notebook interface and mainly used the console version.
Despite my love of all things texty, I
finally gave in, and began using the web version almost exclusively.
So much that I got annoyed at constantly needing to start and stop the IPython
server and having a terminal dedicated to running it.
Always running server using Launchd
My first step was to always keep the IPython server running.
I did this with a KeepAlive launchd job.
Here is the plist:
This job runs ipython notebook with the --port flag, so that the port
stays the same each time.
I used LaunchControl to create and load this launch agent,
but you can also just save it in ~/Library/LaunchAgents and run launchctl load.
If you want, you can be done now. The notebook browser is running at
http://localhost:10223.
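For anyone who would rather generate such a job than write the XML by hand, here is a minimal sketch using Python's plistlib. The label and the path to ipython are assumptions made up for illustration; only the KeepAlive key, the notebook subcommand, and the fixed port come from the description above.

```python
import plistlib

# Hypothetical values: adjust the label and the ipython path for your machine.
job = {
    "Label": "example.ipython-notebook",
    "ProgramArguments": [
        "/usr/local/bin/ipython",
        "notebook",
        "--port=10223",  # fixed port, so the URL never changes
    ],
    "KeepAlive": True,  # launchd restarts the server whenever it exits
}

# plistlib.dumps returns the XML property list as bytes.
plist_xml = plistlib.dumps(job).decode("utf-8")
print(plist_xml)
```

Saving the printed XML in ~/Library/LaunchAgents and loading it with launchctl load is the same manual route mentioned above.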
Virtual host and proxy using Apache
But I was not done, because I already had too many processes on my machine that
were serving content at some localhost port. This required me to memorize port
numbers, made Safari’s autocorrect not very useful, and felt barbaric.
What I needed was a domain name that resolved to http://localhost:10223.
To do this, I needed a virtual host and a proxy.
Before reading further, you should know that
I am not an Apache expert.
In fact, I have never managed an Apache webserver
except as a hobby. The best I can promise you is that this works for me, on my
OS X computer, for now.
In /etc/hosts, I created a new host called py.
127.0.0.1 py
This resolves py to 127.0.0.1, i.e., localhost.
Now in /etc/apache2/httpd.conf I created a virtual host and a proxy.
This forwards all traffic to py on port 80 to
localhost on port 10223.
Note that the order of the ProxyPass directives is
apparently important.
Also, if you use * instead of the address in the
VirtualHost directive, you might also be forwarding requests originating
outside of your machine, which sounds dangerous.
Then I ran sudo apachectl restart, and everything seemed to work.
Note that Safari interprets py as a Google search, so I have to type py/.
Chrome does the same thing, except that after I load py/ once,
the trailing slash is optional.
Apparently I wrote my own. It’s nothing complicated, but it is full featured.
It parses the first line to get the starting line number, then replaces any
existing line numbers with the correct ones.
My script changes
9: a
b
c
to
9: a
10: b
11: c
and
1: a
10: b
11: c
to
1: a
2: b
3: c
It can even remove line numbers if the first line starts with “X:”, so it changes
X: a
2: b
3: c
to
a
b
c
I save it as ~/bin/numberlines, so I can use it on the command line
or by selecting lines and typing :!numberlines in Vim.
#!/usr/bin/python
"""
Number lines on a block of text, preserving indentation.
Allow the first line to indicate start number.
If the first line starts with 'X:' then remove line numbers.
"""
import re
import sys

LINE_NO = re.compile(r"^\s*(\d*|[Xx]*):( |$)")

lines = sys.stdin.readlines()
first_line_number = LINE_NO.match(lines[0])

# Determine indentation.
indent = min(
    (re.match(" *|\t*", line).group(0)
     for line in lines if line.strip()),
    key=len)

if first_line_number and first_line_number.group(1)[0] in "Xx":
    formatter = "{indent}{line}"
    # These are irrelevant.
    start = 1
    padding = 0
else:
    formatter = "{indent}{number:>{padding}}: {line}"
    start = int(first_line_number.group(1)) if first_line_number else 1
    padding = len(str(start + len(lines)))

for i, line in enumerate(lines):
    # Strip the common indent and any existing line number.
    line = LINE_NO.sub("", line[len(indent):])
    if not line:
        line = "\n"
    sys.stdout.write(formatter.format(
        indent=indent,
        number=start + i,
        padding=padding,
        line=line))
My favorite shell, zsh, allows you to define a function without assigning
it a name. This turns out to be incredibly useful, and for none of the reasons
that anonymous functions are usually used.
Let’s say I want to copy four or five files from one place to another, but in a
more complicated way than the standard cp command allows.
Obviously, you can just run these commands by using your shell’s history and
editing each time. But editing is hard, especially in the middle of a line and
in more than one place per line.
One way to solve this is with a for loop:
$ for d in 20140508 20140610 20140731 20140802; do
> cp $d/old.txt $d/new.txt
> done
But this is not very flexible. If one of the commands fails, you
will probably end up with some copies completed and others not. After fixing the
problem, you will have to remove the copies that succeeded from the for loop.
Another way to solve this problem is to write a function and then use it.
(Note that in bash you need to use three lines to define the function.)
This solves the problem of a single command failing, since you can fix it up,
rerun the failed command, and continue along. But it also turns a one-step
process into two. If I later find I need to do two more
copies, I have probably opened a new shell, so I have to first redefine
mycp before reusing it.
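With an anonymous function, you define the function and call it in a single
command, with the argument as the final word.
Each time you run the command, you just have to find the previous command in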
your shell history and edit the final word. If you need to use this same command
tomorrow, you can search your shell history, and again you only need to edit the
final word.
This week Dr. Drang wrote about a useful feature in recent versions of
OS X’s Terminal.app,
which allows you to display a man page in its own special window.
This is especially useful when you want to look something up but you are in the
middle of typing a long command.
It’s not polite to respond to a good solution to a common problem
by claiming to have
a better solution, and even less so when the better solution requires you to
change your shell. But this is the internet, so here goes.
When I need to look at a man page while writing a command, I use zsh’s
push-line editing command.
This clears the prompt and waits for you to type something else.
After executing this new command, it restores your original prompt.
The nice thing about this is that it is useful beyond just looking at manuals.
Often while typing some command,
I realize I need a quick mkdir or cd or even ls before I’m ready to
execute.
You can bind push-line to a key (I use Ctrl-B) by putting bindkey '^B' push-line
in your .zshrc.
Even better, you can use push-line-or-edit to get the same behavior
with one very useful addition. Normally, if you are typing a continuation line
of a multi-line command, you cannot make changes to prior lines.
But push-line-or-edit redraws the lines as a single block of text,
which allows you to edit anything you have typed so far.
When I started using a script to add items to my TaskPaper file,
I was a little worried about the script making changes to my file while it
was open in TaskPaper. So I used TaskPaper’s preference to save
my files every five seconds, and nothing bad happened for a while.
Then I started seeing corrupted files. It seems like OS X autosave is doing
something weird. If I poke at it, I can get parts of the file to go missing,
or sometimes a dialog box pops up to complain. But everything works fine as
long as I do an actual “⌘S” save.
To prevent corruption, I added
a few lines to my shell script, which use AppleScript to save my
TaskPaper file before making the changes.
I use pgrep to check if TaskPaper is running, and a
heredoc to send the text of the script to the osascript binary.
if pgrep TaskPaper > /dev/null; then
/usr/bin/osascript << EOM
tell application "TaskPaper"
    repeat with Doc in documents whose name is "tasks.taskpaper"
        save Doc
    end repeat
end tell
EOM
fi
(It is so much easier to embed AppleScript in a bash script than the other
way around.)
The most widely read post on this site is my 2012 post on scheduling tasks
using launchd. But my knowledge of launchd is limited to my
experience. In particular, I was mistaken about how to set up a task when your
computer has multiple accounts.
(For many years, my wife and I shared an account, mostly because it’s still so
difficult to switch between
accounts and properly share files. But now, with iPhones and
iCloud, it’s even more painful to share an account, so we finally split things
up.)
In my post, I wrote:
If you have multiple users and need something to
run no matter who is logged in, you should look into putting it in
/Library/LaunchAgents.
But this isn’t quite right. For system-wide jobs, there are two
folders that can
contain your launchd plists: /Library/LaunchAgents and
/Library/LaunchDaemons.
The difference is that system-wide Launch Agents
run exactly like per-user
Launch Agents, except that they run once for each user. If you have two users
logged in, the system will run two instances of the Launch Agent job.
Each job will run with that user’s permissions. (This may actually
cause problems. For example, if you need to write to a file, you must use a
different file for each user or use a file that is world-writable.)
Launch Daemons, on the other hand, spawn a single instance, regardless of who is
or is not logged in. By default, these run with root permissions (be careful!),
although you can (and almost always should) customize this with the UserName key.
Here’s my new favorite way to get tasks into TaskPaper.
It’s a combination of Drafts, Dropbox,
launchd, a Python script, and
a shell script.
That sounds convoluted, but once each piece of the pipeline
is in place, I just enter one or more tasks into Drafts on my phone,
and three seconds later, it is in my TaskPaper file on my Mac.
It’s like iCloud, but without the mystery.
Merge new tasks into TaskPaper
I wrote a Python script to insert new tasks in the proper place
in my TaskPaper file. Since TaskPaper files are just plain text, this is not too
complicated.
My script reads in a text file and interprets each line as a new task. If the
task has a project tag, it removes the tag, and then it groups the tasks by
project. Anything without a project is assumed to be in the inbox. Next, it
reads my main TaskPaper file, and figures out where each project begins and
ends. Finally, it inserts each new task at the end of the appropriate project.
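The merging logic described above can be sketched in a few dozen lines. This is not the actual script, just a minimal illustration under assumptions that are mine: a project tag written as @project(Name), project headers that are unindented lines ending in a colon, an inbox project called "Inbox", and tasks written as tab-indented "- " lines.

```python
import re
from collections import OrderedDict

# Assumed conventions (not taken from the post): a project tag looks like
# "@project(Name)", a project header is an unindented line ending in ":",
# and untagged tasks belong to a project called "Inbox".
PROJECT_TAG = re.compile(r"\s*@project\(([^)]*)\)")
DEFAULT_PROJECT = "Inbox"


def group_by_project(new_lines):
    """Return {project: [task, ...]} with the project tag stripped."""
    groups = OrderedDict()
    for line in new_lines:
        text = line.strip()
        if not text:
            continue
        match = PROJECT_TAG.search(text)
        project = match.group(1) if match else DEFAULT_PROJECT
        task = PROJECT_TAG.sub("", text).strip()
        groups.setdefault(project, []).append(task)
    return groups


def merge(task_lines, new_lines):
    """Return a copy of task_lines with each new task at the end of its project."""
    groups = group_by_project(new_lines)
    merged = []
    current = None  # project whose body is currently being copied

    def flush(project):
        # Insert before any blank lines that trail the project.
        pos = len(merged)
        while pos > 0 and not merged[pos - 1].strip():
            pos -= 1
        merged[pos:pos] = ["\t- " + task for task in groups.pop(project, [])]

    for line in task_lines:
        is_header = line.endswith(":") and not line[:1].isspace()
        if is_header:
            flush(current)  # the previous project ends here
            current = line[:-1]
        merged.append(line)
    flush(current)

    # Tasks for projects missing from the file go in new projects at the end.
    for project, tasks in groups.items():
        merged.append(project + ":")
        merged.extend("\t- " + task for task in tasks)
    return merged


tasks = ["Inbox:", "\t- old task", "", "Home:", "\t- mow lawn"]
inbox = ["buy milk", "fix door @project(Home)"]
print("\n".join(merge(tasks, inbox)))
```

Working on lists of lines rather than file names keeps the function generic, which matches the idea of leaving the file handling to the shell script.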
A shell script calls the Python script with the correct arguments, merging
my inbox.txt file into my tasks.taskpaper file, and deleting the
now-redundant inbox.txt file. Update: To avoid corrupting
my TaskPaper file, I use some AppleScript within this shell script
to first save the file if it is open.
(Of course, the Python script could have done these last steps also, but it’s much
better to make the Python script generic, so I can use it for other purposes.)
Watch inbox for changes
The next step is to automate the merging. This is where OS X’s launchd
is useful. One solution would be to run the shell script on some kind of timed
interval. But launchd is smarter than that.
Using the WatchPaths key, I can have the shell script run whenever my inbox.txt
file is modified.
Since OS X keeps an eye on all filesystem changes, this actually
has a very low overhead and means that my shell script will be run within seconds
of any modifications to inbox.txt.
Here is my Launch Agent definition, stored in a plist file in ~/Library/LaunchAgents.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>net.nathangrigg.taskpaper-merge-inbox</string>
    <key>Program</key>
    <string>/Users/grigg/bin/taskpaper_merge_inbox.sh</string>
    <key>StandardErrorPath</key>
    <string>/Users/grigg/Library/Logs/LaunchAgents/taskpaper_merge_inbox.log</string>
    <key>StandardOutPath</key>
    <string>/Users/grigg/Library/Logs/LaunchAgents/taskpaper_merge_inbox.log</string>
    <key>WatchPaths</key>
    <array>
        <string>/Users/grigg/Dropbox/Tasks/inbox.txt</string>
    </array>
</dict>
</plist>
Drafts and Dropbox
With the hard work out of the way, I just define a custom Dropbox action in Drafts
that appends text to inbox.txt in my Dropbox folder.
With no fuss, Drafts sends the new task or tasks off to Dropbox, which dutifully
copies them to my Mac, which springs into action, merging them into my TaskPaper
file.
With so many applications and services fighting to be the solution to all of our
problems, it is refreshing to see tools that are happy solving their portion
of a problem and letting you go elsewhere to solve the rest.
I use Time Machine to back up my home iMac to a USB external hard drive.
But I don’t want the Time Machine volume mounted all of the time.
It adds clutter and slows down Finder.
I’ve been using a shell script and a Launch Agent to automatically mount
my Time Machine volume, back it up, and unmount it again.
Since this takes care of running Time Machine, I have Time Machine turned off
in System Preferences.
Shell script
The shell script used to be more complicated, but Apple has been
improving their tools. You could actually do this in three commands:
1. Mount the volume (line 6).
2. Start the backup (line 14). The --block flag prevents the command from
   exiting before the backup is complete.
3. Eject the volume (line 16).
Everything else is either logging or to make sure that I only eject the volume
if it wasn’t mounted to begin with. In particular, line 4 checks if the Time
Machine volume is mounted at the beginning.
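The control flow is simple enough to sketch in Python. This is not the shell script itself (the line numbers above refer to that script, not to this sketch), and it makes some assumptions: that the volume is named "Time Machine Backups", and that diskutil mount and diskutil eject accept the volume name. The `run` and `is_mounted` parameters are hypothetical hooks so the logic can be exercised without a real drive.

```python
import os
import subprocess

# Assumed volume name; it matches the diskutil example later in this post.
VOLUME = "Time Machine Backups"


def backup(run=subprocess.check_call, is_mounted=os.path.ismount):
    """Mount, back up, and eject, but only eject if we did the mounting."""
    already_mounted = is_mounted(os.path.join("/Volumes", VOLUME))
    if not already_mounted:
        run(["diskutil", "mount", VOLUME])
    try:
        # --block keeps tmutil from returning before the backup is complete.
        run(["tmutil", "startbackup", "--block"])
    finally:
        if not already_mounted:
            run(["diskutil", "eject", VOLUME])
    return already_mounted


# Dry run: record the commands instead of executing them.
commands = []
backup(run=commands.append, is_mounted=lambda path: False)
for command in commands:
    print(" ".join(command))
```

The try/finally is a design choice of this sketch: the drive is ejected even if the backup command fails.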
Nothing complicated here. This uses launchd
to run the shell script every two hours
and capture the output to a log file.
I save this as “net.nathangrigg.time-machine.plist” in “/Library/LaunchDaemons”,
so that it is run no matter who is logged in. If you do this, you need to use
chown to set the owner to root, or it will not be run.
If you are the only one that uses your computer, you can just save it in
“~/Library/LaunchAgents”, and you don’t have to worry about changing the owner.
Either way, run launchctl load /path/to/plist to load your agent for the first time.
(Otherwise, it will load next time you log in to your computer.)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>net.nathangrigg.time-machine</string>
    <key>Program</key>
    <string>/Users/grigg/bin/time-machine.sh</string>
    <key>StandardErrorPath</key>
    <string>/Users/grigg/Library/Logs/LaunchAgents/time-machine.log</string>
    <key>StandardOutPath</key>
    <string>/Users/grigg/Library/Logs/LaunchAgents/time-machine.log</string>
    <key>StartInterval</key>
    <integer>7200</integer>
</dict>
</plist>
Fstab
OS X will still mount your Time Machine volume every time you log in.
You can fix this by adding one line to “/etc/fstab” (which you may
need to create).
Replace the UUID with your drive’s UUID, which you can find using
diskutil info "/Volumes/Time Machine Backups". For more detailed instructions,
see this article by Topher Kessler.
Launchd is a Mac OS X job scheduler, similar to cron.
One key advantage is that if your computer is asleep at a job’s scheduled time,
it will run the job when your computer wakes up.
LaunchControl is a Mac app by soma-zone that helps manage
launchd jobs. It aims to do “one thing well” and succeeds spectacularly.
Whether you are new to writing launchd agents or you already have
some system in place, go buy LaunchControl now.
(I tried to make this not sound like an advertisement, but I failed. This
is not a paid advertisement.)
Complete control
At its core, LaunchControl is a launchd-specific plist editor.
There is no magic. You simply drag the keys you want into your document
and set their values. There is no translation layer, forcing you to guess
what to type into the app to get the functionality you know launchd provides.
It is an excellent launchd reference. Every option is fully
annotated, so you won’t have to search the man page or the internet to know what
arguments you need to specify.
Helpful hints
LaunchControl is extremely helpful. If you specify an option that doesn’t make
sense, it will tell you. If the script you want to run doesn’t exist or is not
executable, it will warn you. If you are anything like me, this will save you
four or five test runs as you iron out all of the details of a new job.
Debugging
LaunchControl also acts as a launchd dashboard.
It lets you start jobs manually.
It shows you which jobs are running, and for each job,
whether the last run succeeded or failed.
For jobs that fail, it offers to show you the console output.
This is all information you could have found on your own,
but it is very useful to have it all in one place and available when you need
it.
I’ve been kicking the tires of TaskPaper lately. I’m intrigued by its
minimalist, flexible, plain-text approach to managing a to-do list.
I have a lot of repeating tasks, some with strange intervals. For
example, once per year, I download a free copy of my credit report. But I can’t
just do it every year on January 1, because if I’m busy one year and don’t do it
until the 4th, I have to wait until at least the 4th the following year. You see
the problem. The solution is to give myself a buffer, and plan on downloading
my credit report every 55 weeks.
TaskPaper has no built-in support for repeating tasks, but its plain-text format
makes it easy to manipulate using external scripts. So, for example, I can keep
my repeating tasks in an external file, and then once a month have them inserted
into my to-do list.
The plain-text calendar tool when, which I also use to remember
birthdays, seems like the perfect tool for the job. You store your
calendar entries in a text file using a cron-like syntax. You can also
do more complicated patterns. For example, I put this line in my file:
!(j%385-116), Transunion credit report
The expression !(j%385-116) is true whenever the modified Julian day is
equal to 116 modulo 385. This happens every 385 days, starting today.
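To see the modular arithmetic at work, here is a small sketch. It counts days from a chosen start date instead of using the modified Julian day, so it illustrates the idea rather than reproducing when's exact calculation. The function names are made up for this example.

```python
from datetime import date, timedelta

INTERVAL = 385  # 55 weeks, in days


def is_due(day, start, interval=INTERVAL):
    """True when `day` is a whole number of intervals away from `start`."""
    return (day - start).days % interval == 0


def occurrences(start, count, interval=INTERVAL):
    """The first `count` due dates, beginning with `start` itself."""
    return [start + timedelta(days=interval * k) for k in range(count)]


start = date(2014, 2, 22)
for day in occurrences(start, 3):
    print(day.isoformat())
```

Each due date lands 20 days later in the calendar than the previous one (19 when a leap day intervenes), which is exactly the buffer described above.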
When I run when with my new calendar file, I get this output:
today 2014 Feb 22 Transunion credit report
I wrote a quick Python script to translate this into TaskPaper syntax.
#!/usr/bin/python
import argparse
from datetime import datetime
import re
import subprocess

WHEN = "/usr/local/bin/when"


def When(start, days, filename):
    command = [
        WHEN,
        "--future={}".format(days),
        "--past=0",
        "--calendar={}".format(filename),
        "--wrap=0",
        "--noheader",
        "--now={:%Y %m %d}".format(start),
    ]
    # universal_newlines=True makes check_output return text on Python 3 too.
    return subprocess.check_output(command, universal_newlines=True)


def Translate(line):
    m = re.match(r"^\S*\s*(\d{4} \w{3} +\d+) (.*)$", line)
    try:
        d = datetime.strptime(m.group(1), "%Y %b %d")
    except (AttributeError, ValueError):
        # No date found (m is None) or the date did not parse: pass through.
        return line
    return " - {} @start({:%Y-%m-%d})".format(m.group(2), d)


def NextMonth(date):
    if date.month < 12:
        return date.replace(month=(date.month + 1))
    else:
        return date.replace(year=(date.year + 1), month=1)


def StartDateAndDays(next_month=False):
    date = datetime.today().replace(day=1)
    if next_month:
        date = NextMonth(date)
    days = (NextMonth(date) - date).days - 1
    return date, days


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Print calendar items in taskpaper format")
    parser.add_argument("filename", help="Name of calendar file")
    parser.add_argument("-n", "--next", action="store_true",
                        help="Use next month instead of this month")
    args = parser.parse_args()

    date, days = StartDateAndDays(args.next)
    out = When(date, days, args.filename)
    for line in out.split('\n'):
        if line:
            print(Translate(line))
This takes the when output, and translates it into something I can dump into
my TaskPaper file:
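As a rough illustration of that translation step, the snippet below applies the same regular expression and format string to the sample line of when output shown earlier. It is a standalone sketch with a lowercase `translate` helper, and it assumes an English locale for the month abbreviation.

```python
import re
from datetime import datetime


def translate(line):
    """Turn one line of `when` output into a TaskPaper task."""
    m = re.match(r"^\S*\s*(\d{4} \w{3} +\d+) (.*)$", line)
    try:
        d = datetime.strptime(m.group(1), "%Y %b %d")
    except (AttributeError, ValueError):
        # Lines without a recognizable date are passed through unchanged.
        return line
    return " - {} @start({:%Y-%m-%d})".format(m.group(2), d)


task = translate("today 2014 Feb 22 Transunion credit report")
print(task)
```

The relative-day column ("today") is dropped, and the date is rewritten in ISO format inside a @start tag.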
After many years of school, I now have a Real Job. Which means I need to
save for retirement. I don’t do anything fancy, just index funds in a
401(k). Nevertheless, I am curious about how my money is growing.
The trouble with caring even a little about the stock market is that all the
news and charts focus on a day at a time. Up five percent, down a percent, down
another two percent. I don’t care about that.
I could average the price changes
over longer periods of time, but that is not helpful because I’m making
periodic contributions, so some dollars have been in the account longer than
others.
What I really want to know is, if I put all my money into a savings account with
a constant interest rate, what would that rate need to be to have the same final
balance as my retirement account?
Now it’s math. A single chunk of money P with interest rate r
becomes the well-known $Pe^{rt}$ after t years.
So if I invest a bunch of amounts $P_i$,
each for a different $t_i$ years at
interest rate r, I get
$\sum_i P_i e^{r t_i}$.
I need to set this equal to the
actual balance B of my account and solve for r.
At this point, I could solve the equation using something from
scipy.optimize. But since I’m doing this for fun, I may as well
write something myself. The nice thing about my interest function is that it
increases if I increase r and decreases if I decrease r. (This is called
monotonic and is a property
of the exponential function, but is also intuitively obvious.)
So I can just pick values for r and plug them in, and I’ll
immediately know if I need to go higher or lower. This is a textbook scenario
for a binary search algorithm.
The following Python function will find when our monotonic function is zero.
from __future__ import division  # For Python 2.

def FindRoot(f, lower, upper, tolerance):
    """Find the root of a monotonically increasing function."""
    r = (lower + upper) / 2
    while abs(upper - lower) > tolerance:
        r = (lower + upper) / 2
        if f(r) > 0:
            upper = r
        else:
            lower = r
    return (lower + upper) / 2
This will look for a root between lower and upper, stopping when it gets
within tolerance. At each stage of the loop, the difference between lower
and upper is cut in half, which is why it is called binary search, and which
means it will find the answer quickly.
Now suppose that I have a Python list transactions of pairs (amount, time),
where amount is the transaction amount and time is how long ago in years
(or fractions of years, in my case)
the transaction happened. Also, I have the current balance stored in balance.
The difference between our hypothetical savings account and our actual account
is computed as follows:
This will go through the loop about 16 times, which is
$\log_2((\text{upper} - \text{lower}) / \text{tolerance})$.
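Putting the pieces together might look like the sketch below. The transactions, the search bounds, and the tolerance are invented for illustration, and the balance is constructed to correspond to exactly 5% so the expected answer is known in advance. `find_root` is a compact restatement of the bisection idea so the example runs on its own.

```python
import math


def find_root(f, lower, upper, tolerance):
    """Bisection search for the root of a monotonically increasing function."""
    while abs(upper - lower) > tolerance:
        r = (lower + upper) / 2
        if f(r) > 0:
            upper = r
        else:
            lower = r
    return (lower + upper) / 2


# Hypothetical data: (amount, years ago) pairs.
transactions = [(1000.0, 2.0), (1000.0, 1.0), (500.0, 0.5)]
# Build a balance that corresponds to a 5% continuous rate.
balance = sum(p * math.exp(0.05 * t) for p, t in transactions)


def difference(r):
    """Hypothetical savings account at rate r, minus the actual balance."""
    return sum(p * math.exp(r * t) for p, t in transactions) - balance


rate = find_root(difference, lower=-1.0, upper=1.0, tolerance=1e-6)
print("{:.4f}".format(rate))
```

Because `difference` increases with r, it is negative below the true rate and positive above it, which is all the search needs.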
The U.S. government mandates that interest rates be given as annual
percentage yield (APY), which is the amount of interest you would earn on
one dollar in one year, taking compounding into consideration. Since I have assumed
interest is compounded continuously, I should convert to APY for easier
comparison. In one year, one dollar compounded continuously becomes
$e^r$. Subtracting the original dollar, I get the
APY: $e^r - 1$.
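As a quick numeric check of that conversion, assuming the APY is the continuously compounded dollar minus the original dollar, a one-line helper (the name `apy` is made up here) does the job:

```python
import math


def apy(r):
    """Convert a continuously compounded rate r to annual percentage yield."""
    # expm1(r) computes e**r - 1 accurately even for small r.
    return math.expm1(r)


print("{:.3%}".format(apy(0.05)))
```

A continuous rate of 5% comes out slightly above 5% as an APY, since the interest itself earns interest during the year.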
I have used Jekyll for this site ever since I first created it.
I’ve contemplated switching to something Python and Jinja based,
since I’m much more familiar with these tools than I am with Ruby.
But there is something about Jekyll’s simple model that keeps me here.
It’s probably for the best, since it mostly keeps me from fiddling, and
there are better directions to steer my urge to fiddle.
Having said that, I couldn’t help but write one little plugin.
I wrote this so I can look up a page or post by its URL.
It is an excellent companion to Jekyll’s recent support for data files.
The plugin defines a new Liquid
tag called assign_page which works kind of
like the built-in assign tag. If you write
{% assign_page foo = '/archive.html' %}, it creates
a variable called foo that refers to an object containing information
about archive.html. You can then follow with
{{ foo.title }} to get the page’s title.
The plugin code
Here is the code that I store in my _plugins folder.
module Jekyll
  module Tags
    class AssignPage < Liquid::Assign
      TrailingIndex = /index\.html$/

      def page_hash(context)
        reg = context.registers
        site = reg[:site]
        if reg[:page_hash].nil?
          reg[:page_hash] = Hash[(site.posts + site.pages).collect {
            |x| [x.url.sub(TrailingIndex, ''), x]}]
        end
        return reg[:page_hash]
      end

      # Assign's Initializer stores variable name
      # in @to and the value in @from.
      def render(context)
        url = @from.render(context)
        page = page_hash(context)[url.sub(TrailingIndex, '')]
        raise ArgumentError.new "No page with url #{url}." if page.nil?
        context.scopes.last[@to] = page
        ''
      end
    end
  end
end

Liquid::Template.register_tag('assign_page', Jekyll::Tags::AssignPage)
On Line 3, you see that my AssignPage class is a subclass of Liquid’s Assign
class. Assign defines an initialize method to parse the tag, storing
the variable name in @to and the value in @from.
By not overriding initialize, I get that functionality for free.
On Line 6, I define a function that creates a hash table
associating URLs with
pages. Liquid lets you store stuff in context.registers, and Jekyll stores
the site’s structure in context.registers[:site]. Lines 10 and 11 create the
hash table and store it in context.registers so I don’t have to recreate it
for each assign_page tag. Ignoring the removal of trailing index.html,
this is the same as the Python dictionary comprehension
{x.url: x for x in site.posts + site.pages}
Line 20 uses the hash table to look up the URL. The rest of the lines are pretty
much copied from Assign. Line 19 evaluates @from,
which lets you specify a variable containing the URL instead of just a URL.
Line 22 puts the page in the
proper variable. Line 23 is very important because Ruby functions return
the result of the last statement. Since Liquid will print our function’s
return value, we want to make sure it is blank.
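For readers more at home in Python, here is a rough sketch of the same lookup, this time including the removal of a trailing index.html. The page objects are hypothetical stand-ins built with SimpleNamespace; nothing here touches Jekyll or Liquid.

```python
import re
from types import SimpleNamespace

TRAILING_INDEX = re.compile(r"index\.html$")


def build_page_hash(pages):
    """Map each URL, minus any trailing index.html, to its page."""
    return {TRAILING_INDEX.sub("", p.url): p for p in pages}


def lookup(page_hash, url):
    """Find a page by URL, with or without a trailing index.html."""
    try:
        return page_hash[TRAILING_INDEX.sub("", url)]
    except KeyError:
        raise ValueError("No page with url {}.".format(url))


# Hypothetical pages standing in for site.posts + site.pages.
pages = [
    SimpleNamespace(url="/archive.html", title="Archive"),
    SimpleNamespace(url="/blog/index.html", title="Blog"),
]
page_hash = build_page_hash(pages)
print(lookup(page_hash, "/blog/").title)
```

Normalizing both the stored keys and the requested URL means "/blog/" and "/blog/index.html" find the same page.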
Apple has a history of erasing Python’s site-packages folder during operating
system upgrades, leaving users without their third-party Python modules and
breaking scripts everywhere. Although I’ve heard some reports
of the upgrade to 10.9 leaving things alone, mine were wiped once again.
Last year when this happened, I vowed to switch everything over to
virtualenv,
which allows you to install packages in a custom location. With
this setup, getting things working again was as easy as recreating
my local.pth file:
sudo vim /Library/Python/2.7/site-packages/local.pth
with a single line containing the path to my virtualenv site packages:
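If .pth files are unfamiliar, the following sketch shows the mechanism in isolation. It uses two temporary directories as hypothetical stand-ins for the system site-packages folder and the virtualenv's site-packages, and calls site.addsitedir, which processes .pth files the same way Python does for real site directories at startup.

```python
import os
import site
import sys
import tempfile

# Stand-ins for /Library/Python/2.7/site-packages and the virtualenv folder.
site_dir = tempfile.mkdtemp()
venv_packages = tempfile.mkdtemp()

# A .pth file is plain text: each line naming an existing directory
# is appended to sys.path.
with open(os.path.join(site_dir, "local.pth"), "w") as f:
    f.write(venv_packages + "\n")

site.addsitedir(site_dir)
print(venv_packages in sys.path)
```

Because the .pth file only points at the virtualenv, recreating that one small file is enough to restore access to every package installed there.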
It’s a long story, but for the last six months, I have been using Vim as my primary text editor. As I began to use Vim more often, I was frustrated by the lack of a tutorial that went beyond the basics. I finally found what I was looking for in Steve Losh’s Learn Vimscript the Hard Way, which is an excellent introduction to Vim’s power features. I also discovered the real reason there are no advanced tutorials, which is that everything you need to know is contained in Vim’s help files.
Vim’s documentation is incredibly complete and very useful. Unfortunately, it makes heavy use of cross references, and the cross references only work with Vim’s internal help viewer. I have no qualms about reading a reference document, but I would strongly prefer to do this kind of reading reclining on a couch with an iPad, rather than Control+F-ing my way through a read-only Vim buffer.
I wanted a way to read and annotate the help files on my iPad. The
files were available as HTML, but annotating HTML files is complicated. There are some apps that can annotate HTML, but there is no standard or portable way to do so.
I converted the HTML files to ePub using Calibre, but Vim’s help is very dependent on having lines that are 80 characters long. This caused problems in iBooks.
So instead, I settled on the old favorite, PDF. I can easily annotate a PDF on my iPad and then move those annotations to my computer or another device. Actually, the Vim documentation was already available in PDF format, but without the internal links.
To convert the Vim help files, which are specially formatted plain text, into a hyperlinked PDF, I started with Carlo Teubner’s HTML conversion script, which takes care of the syntax highlighting and linking. I just needed a way to programmatically make a PDF file.
Latex
Latex is clearly the wrong tool for the job. I don’t need the hyphenation or intelligent line breaking that Latex excels at. All I need is to display the text on a PDF page in a monospace font, preserving whitespace and line breaks. Latex ignores whitespace and line breaks.
But Latex is what I know, and I am very familiar with the hyperref package, which can make internal links for the cross references, so I used it anyway.
I used the fancyvrb package, which allows you to preserve whitespace and special characters, like the built-in verbatim environment does, but also allows you to use some Latex commands. This allowed me to do syntax highlighting and internal hyperlinks.
At one point, I ran into an issue where Latex was botching hyphenated urls. The good people at the Latex StackExchange site figured out how to fix it. The level at which they understand the inner workings of Tex amazes me.