Math publishing, open access, and my arXiv.org workflow

Feb 15, 2012

Publishing a mathematics research paper is a complex and entangled process. If a math professor wants to get tenure, they need to publish their research in prestigious journals, so they submit papers to these journals. The journals, whose editors are other mathematicians, send the papers off to be peer reviewed. Neither the editors nor the reviewers are paid by the journal. The journal doesn’t even have to typeset the paper, since almost all math papers are written in LaTeX (99 percent, according to the sample I took of papers posted to the math section of arXiv.org). The paper is eventually published, and the journal charges a pretty high price for printed or electronic versions of the paper (sometimes over a dollar per page, which is paid primarily by university libraries). The main thing that the journal provides is its reputation, which serves as a very useful signal about the quality of the papers it contains.

There has been some talk in the past several years about open access in academia. The reasoning is that if taxpayer money is funding basic research, the results should be available to everyone at little or no cost.

The math community has open access almost completely figured out, thanks to the arXiv.org repository (which we usually refer to as “The arXiv”, pronounced “The Archive”, because the X stands for the Greek letter chi). Although you transfer copyright to the journal when they publish your paper, what you do beforehand is generally your business. So before you submit your paper to a journal, you submit it to arXiv.org, which will post it online forever. Although I don’t know the statistics, the vast majority of math papers published today are listed on arXiv.org.

Journals are still necessary because of the peer review process and the prestige factor. In addition, not all authors incorporate the corrections and changes they make during peer review into the arXiv.org version. Actually, some journals prohibit you from doing this, although in practice, the word is that if they ever tried to crack down on it, there would be a mass revolt.

That doesn’t mean the community is happy with journals. It would be nice if they allowed more open access, or charged less, etc. To this end, some are participating in a boycott of one company that is seen as a “worst offender”. They have comparatively high prices, they try to stop you from publishing changes to arXiv, and they lobby in favor of legislation against open access. (You probably aren’t surprised to hear they also were in favor of SOPA.)

Organizing arXiv.org papers

The reason I’ve been thinking about this is that a lot of the papers I read come from arXiv.org, and I’ve been trying to improve the way I keep track of the papers I read.

My workflow for managing papers I read goes something like this:

1. Find a paper I want to read, usually from a citation or the arXiv.org RSS feed.
2. Download a PDF to my iPad or Dropbox folder. Usually it has a stupid file name, but I don’t worry yet.

(time passes)

3. Import the citation into BibDesk. BibDesk automatically renames the file using the author and title and moves it to my folder on Dropbox.

The difficult part is step 3. It is very useful to use BibDesk to organize the metadata for the papers I read, but it can be unnecessarily time consuming. BibDesk can import from many different websites, but this usually requires you to do a search, using BibDesk’s built-in browser, for a paper that you already found once before.

This week I wrote a Python script to download arXiv.org references. It takes one or more arXiv identifiers and returns a citation for the corresponding paper in BibTeX format. Using the script I wrote, together with BibDesk’s great AppleScript integration, I can simply enter the arXiv id (which is watermarked onto the article) and everything else happens automatically.
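
A script along these lines could look roughly like the following. This is a minimal sketch, not the actual script: it assumes the public arXiv API at export.arxiv.org/api/query, which returns an Atom feed, and the function names `fetch_atom` and `atom_to_bibtex`, the `@misc` entry type, and the author-plus-year citation key are all illustrative choices.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET

# Namespace prefix used by every element in an Atom feed.
ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed endpoint: the public arXiv API query interface.
API_URL = "http://export.arxiv.org/api/query?id_list="


def fetch_atom(arxiv_ids):
    """Download the Atom feed (as bytes) for a list of arXiv identifiers."""
    with urllib.request.urlopen(API_URL + ",".join(arxiv_ids)) as response:
        return response.read()


def atom_to_bibtex(atom_xml):
    """Convert an Atom feed from the arXiv API into BibTeX entries."""
    entries = []
    root = ET.fromstring(atom_xml)
    for entry in root.iter(ATOM + "entry"):
        # The entry id is a URL such as http://arxiv.org/abs/1202.0001v1;
        # keep the part after /abs/ and drop the version suffix.
        abs_url = entry.findtext(ATOM + "id", default="").strip()
        arxiv_id = re.sub(r"v\d+$", "", abs_url.split("/abs/")[-1])
        # Titles can be wrapped across lines, so collapse the whitespace.
        title = " ".join(entry.findtext(ATOM + "title", default="").split())
        authors = [
            author.findtext(ATOM + "name", default="").strip()
            for author in entry.iter(ATOM + "author")
        ]
        year = entry.findtext(ATOM + "published", default="")[:4]
        # Hypothetical key scheme: first author's last name plus the year.
        last_name = authors[0].split()[-1] if authors else "anon"
        key = re.sub(r"\W", "", last_name).lower() + year
        fields = [
            ("author", " and ".join(authors)),
            ("title", title),
            ("year", year),
            ("eprint", arxiv_id),
            ("archivePrefix", "arXiv"),
            ("url", "http://arxiv.org/abs/" + arxiv_id),
        ]
        body = "\n".join("  %s = {%s}," % field for field in fields)
        entries.append("@misc{%s,\n%s\n}" % (key, body))
    return "\n\n".join(entries)
```

Passing a list of identifiers to `fetch_atom` and the result to `atom_to_bibtex` gives text that can be handed to a reference manager. Keeping the download separate from the conversion means the conversion can be tried on a saved feed without touching the network.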