After my post two weeks ago about managing my library of academic papers, someone mentioned that I should upload the arxiv2bib script that I wrote to the Python Package Index (PyPI).
I had been curious about Python packaging before, but had never really understood it. This script was the perfect candidate for me to experiment on: a single Python file with no dependencies. So I dove in.
I’ll be honest, it wasn’t easy. In the end it worked, and I was even able to use my newfound knowledge to upload my more complicated Day One Export script, which has templates, multiple files, and dependencies. But I spent more time than I wanted to screwing things up. Worst of all, I don’t see any way I could have avoided all these mistakes. It really is that convoluted.
So here is my story. This is not a tutorial, but hopefully my journey will enlighten you. The links should be helpful too.
Python packaging is centered around the setup.py script. Python comes with the distutils package, which makes creating that script really easy, assuming that you don’t want to do anything complicated. (Caveat: you often need to do something complicated.) Without any extra code, distutils lets setup.py build and install Python modules, upload them to PyPI, and even create a bare-bones Windows graphical installer.
I followed this guide from Dive into Python 3 (everything in it applies to Python 2 as well). All you have to do is fill in the arguments to the setup() call in the script. Then you run python setup.py sdist to create a tar.gz containing the source. Running python setup.py register registers the current version of the project on PyPI (you will need an account first). Finally, python setup.py sdist upload builds the distribution and uploads it to PyPI; the upload command only sends files built in the same invocation, so it has to follow sdist on the same command line.
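For a single-file script like arxiv2bib, the whole setup.py can be a few lines. Below is a minimal distutils sketch; the version, description, author, and URL are placeholders I made up, not the real package’s metadata. The commands in the comments are the ones described above and reflect the tooling of the time. Also note that distutils was removed from the standard library in Python 3.12, so the import is only attempted when a command is actually given.

```python
# setup.py: a minimal distutils sketch for a single-module script.
# The metadata values are placeholders, not arxiv2bib's real ones.
import sys

SETUP_KWARGS = dict(
    name="arxiv2bib",
    version="0.1.0",                      # placeholder version
    description="One-line summary here",  # placeholder
    author="Your Name",                   # placeholder
    url="https://example.com/arxiv2bib",  # placeholder
    py_modules=["arxiv2bib"],             # installs arxiv2bib.py as a module
)

# Only hand off to distutils when a command such as sdist was given;
# with no command, distutils just prints a usage error and exits.
if __name__ == "__main__" and len(sys.argv) > 1:
    # Typical workflow, run from the project directory:
    #   python setup.py sdist          builds a source archive in dist/
    #   python setup.py register       creates the project entry on PyPI
    #   python setup.py sdist upload   rebuilds and uploads in one invocation
    from distutils.core import setup  # removed from the stdlib in Python 3.12
    setup(**SETUP_KWARGS)
```

Keeping the arguments in a plain dict makes it easy to inspect or reuse the metadata without triggering a build.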
At this point, things were working, but not as well as I wanted. First of all, I wanted my script to work with either Python 2 or Python 3. This isn’t too difficult; I just copied some install code from Beautiful Soup.
I also wanted things to work on Windows, but this was much more difficult. You can designate certain files as “scripts”, and distutils will copy them into /usr/local/bin (or similar). On Windows, it copies them to C:\Python27\Scripts, but Windows won’t recognize a Python script as executable unless it ends in .py. So I made the install script rename the file if it was installing on Windows.
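One way to handle the Windows rename is to work out the script’s name before calling setup() and pass the result to the scripts argument. The helpers below are hypothetical (they are not the code I actually shipped) and assume the script is stored without an extension; on Windows they copy it to a .py file.

```python
import shutil
import sys


def windows_script_name(name, platform=None):
    """Return the name the installed script should have on `platform`.

    Windows only runs a Python script directly if it ends in .py, so the
    extension is appended there; on other platforms the name is unchanged.
    """
    platform = sys.platform if platform is None else platform
    if platform == "win32" and not name.endswith(".py"):
        return name + ".py"
    return name


def prepare_script(path, platform=None):
    """Copy `path` to its platform-appropriate name and return that path."""
    target = windows_script_name(path, platform)
    if target != path:
        shutil.copyfile(path, target)
    return target
```

In setup.py this would be used as something like scripts=[prepare_script("scripts/arxiv2bib")], where the path is again an assumption. Copying rather than renaming leaves the source tree untouched for the next build.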
Because setup.py is just a Python file, it can do just about anything. (Another reason to use virtualenv, so that you don’t have to sudo someone else’s script.) But if you find yourself doing crazy things, take a deep breath, and just use setuptools.
Setuptools is not part of the Python standard library, but it is almost universally installed. It is the package that brought us eggs and easy_install, and it can resolve dependencies on other packages. It extends the capabilities of distutils and makes a lot of things possible with a lot less hacking.
There are plenty of setuptools guides. For my Day One Export script, I was most interested in declaring dependencies, which is done with the install_requires argument (as in this example). For both scripts, I was also interested in the entry_points argument, which lets you create executable scripts that work both on Windows (by generating an exe wrapper) and on Unix (the usual way).
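Here is a sketch of how install_requires and entry_points fit together in a setuptools-based setup.py. Every project name, module path, and dependency below is hypothetical, chosen only to resemble a small multi-file package with templates; it is not the real Day One Export configuration.

```python
# setup.py: a setuptools sketch. All names here are hypothetical.
import sys

SETUP_KWARGS = dict(
    name="dayone_export",                             # hypothetical project name
    version="0.1.0",                                  # placeholder
    packages=["dayone_export"],                       # hypothetical package dir
    package_data={"dayone_export": ["templates/*"]},  # ship template files too
    install_requires=["Jinja2"],                      # hypothetical dependency list
    entry_points={
        # Creates a `dayone_export` command that calls dayone_export.cli.main().
        # setuptools writes a Unix launcher script or a Windows .exe wrapper.
        "console_scripts": ["dayone_export = dayone_export.cli:main"],
    },
)

# As before, only run setup when a command (sdist, install, ...) is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    from setuptools import setup
    setup(**SETUP_KWARGS)
```

Because the launcher is generated at install time, an entry point also sidesteps the manual .py rename on Windows.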
If I were to do it again, I would skip distutils and just use setuptools.
One thing I did stress about was what to do for users who don’t have setuptools installed. Some packages fall back to distutils, while others don’t. In the end, I settled on printing a nice error message if setuptools is not installed.
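A minimal sketch of the “nice error message” approach, assuming a helper at the top of setup.py; the function name and the message wording are my own illustration, not the exact text I used.

```python
import sys

# Hypothetical wording for the error shown to users without setuptools.
SETUPTOOLS_MISSING = (
    "setuptools is required to install this package.\n"
    "Please install setuptools and then run setup.py again."
)


def load_setup():
    """Return setuptools.setup, or exit with a readable message if missing."""
    try:
        from setuptools import setup
    except ImportError:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(SETUPTOOLS_MISSING)
    return setup
```

Calling setup = load_setup() near the top of setup.py replaces a bare ImportError traceback with a message that tells the user what to do.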
Here is where things get really confusing. Distribute is a fork of setuptools. For the most part, you can pretend it doesn’t exist. It acts like setuptools (but with fewer bugs), so some people will technically use distribute to install the package instead of setuptools. But this doesn’t affect you.
Distribute also has Python 3 support, so all Python 3 users will be using it instead of setuptools. Again, this doesn’t affect you much, except that distribute offers some tools to automatically run 2to3 during installation.
Update: Now you have even less reason to care about distribute, because it was merged back into setuptools.
It is confusing enough to have three different packaging systems, but the Python maintainers aren’t happy enough with setuptools/distribute to bring them into the standard library. The goal is to replace all three with a better system. It is currently being developed as distutils2, but will be called packaging when it lands in the standard library. At one point, this was scheduled to happen with Python 3.3, but that has been pushed back to version 3.4.
Well, I’m glad I finally understand how these things work. And this is the sort of thing that you will never understand until you do it yourself. So in that sense, it was worth it.
The number of people who will use a script that can be installed, especially via pip or easy_install, is likely an order of magnitude larger than the number of people who would use the script otherwise. So packaging is the nice thing to do.
Even for my own use, it is nice to have these scripts “installed” instead of “in ~/bin”. I can use pip to check the version number, quickly install a new version, or uninstall.
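On the command line those three tasks are pip show arxiv2bib, pip install --upgrade arxiv2bib, and pip uninstall arxiv2bib. If you want the version from inside Python instead, a small sketch using the standard library’s importlib.metadata (Python 3.8 or newer) reads the same installed metadata that pip show reports; the helper name is my own.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None.

    Reads the installed package metadata, the same information that
    `pip show <name>` displays.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version, or None if arxiv2bib is not installed here.
    print(installed_version("arxiv2bib"))
```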