Nathan Grigg

Monitor long-running commands: capture return status

This is the first in a series of posts describing the system I built to monitor long-running commands.

For me, this started as a way to light a blink(1) LED at work when a long build completed, bringing me back to the more important task from whatever may have distracted me. After a recent post by Casey Liss, I’ve added watch notifications to the mix.

The problem, for me at least, is that I never know when a command is going to take long enough that I lose interest and want a notification, so appending a command at the end of the original is not an option:

build; notify

You could always run build, and then when you realize you want to be notified, type notify blindly, but any stray keys or sometimes even mouse scrolling might mess up the follow-up command. (Or whatever long-running command you are using might be capturing stdin.)

What I really wanted was a way for any terminal window to wait on any other terminal window. And of course, it would be nice to be able to do something different depending on what the return value is.

The first step is to capture and log the return value of every command in every window. Not long ago, this would have seemed ludicrous, but with SSDs, it really isn’t a big deal to write to disk once per command.

Unique window identifiers

To be able to tell one command from another, you need some kind of an identifier for each window. At work, I always use tmux, which calls windows panes, and gives each pane an incrementing identifier, stored in the TMUX_PANE variable. At home, I use iTerm2, which sets the TERM_SESSION_ID variable. This variable actually has two parts separated by a colon, and the unique id is in the second part, which is accessible using parameter substitution as ${TERM_SESSION_ID#*:}.
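
For scripts that are not written in shell, the same lookup can be sketched in Python. This is only an illustration: the function name window_id is made up, and it assumes the two variables behave as described above.

```python
def window_id(env):
    """Return a per-window identifier from an environment mapping, or None."""
    pane = env.get("TMUX_PANE")
    if pane:
        # tmux pane ids look like "%3".
        return pane
    session = env.get("TERM_SESSION_ID")
    if session:
        # Drop everything up to and including the first colon, like
        # ${TERM_SESSION_ID#*:} does; keep the value as-is if there is none.
        head, sep, tail = session.partition(":")
        return tail if sep else session
    return None
```

In a real script you would call it as window_id(os.environ).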

Log the return value

Both zsh and bash can run an arbitrary command right before drawing the prompt, that is, right after any command finishes. This is a perfect place to capture and log the return value.

In zsh, this is precmd, so you can define something like this in your zshrc (using tmux pane):

precmd() {
    echo "$?" > "$HOME/.local/logs/return-$TMUX_PANE"
}

The variable $? holds the return value of the last command.

In bash, this is PROMPT_COMMAND, so you can put this in your bashrc (using iTerm window id):

PROMPT_COMMAND='echo "$?" > "$HOME/.local/logs/return-${TERM_SESSION_ID#*:}"'
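
To show what ends up on disk, here is a rough Python sketch that reads a logged value back. The function name read_status and the log_dir parameter are hypothetical; the file naming follows the two snippets above.

```python
import os


def read_status(window_id, log_dir=None):
    """Return the last logged exit status for a window, or None if unavailable."""
    if log_dir is None:
        log_dir = os.path.expanduser("~/.local/logs")
    path = os.path.join(log_dir, "return-" + window_id)
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        # Either no command has finished in that window yet, or the file
        # was read in the instant between truncation and write.
        return None
```

For example, read_status("%3") would return 0 if the last command in tmux pane %3 succeeded.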

In the next post, I’ll show you how to use this logged value to wait for a command to finish.


My poor 911 track record

When I was 20 and living in Chile, I was in a car accident. I was driving through an intersection in the Antofagasta centro and didn’t notice that the traffic signal was out. Neither did another driver, who happened to be drunk and driving 70 mph. The truck I was driving rolled twice and landed on its side. When I was sure my passengers were okay (we were), I climbed out the window. I pulled out my cell phone and dialed 133, the Chilean 911 equivalent.

I got a busy signal. It didn’t matter. We were downtown and I could already hear sirens.

A few years ago, my neighbor broke his hip in his driveway. I called 911 on my cell phone and was connected to the wrong city’s dispatch center. This was understandable, since I lived on the border of the two cities. They transferred my call, which was not cumbersome since dispatchers pick up on the first ring.

A few weeks ago, I was driving behind an erratic driver who I thought was drunk. He was driving slowly and in the middle or on the wrong side of the road. I honked, thinking maybe he was just inattentive, and he pulled over, let me pass, and started following me. At this point I was home, but a little freaked out, so I drove around the block and circled back, only to find him stopped and blocking the one-lane street that my house is on. California advertises 911 as the number to call to report drunk driving, so I gave up on solving things myself and called 911.

I got a Verizon message that the number I was dialing had been discontinued or moved.

I tried again. Same thing.

While I sat in my car in disbelief, the driver of the car parked in front of my house and exited the vehicle. It was an elderly lady and despite her poor driving was probably not drunk and definitely not out to get me.

I felt a little silly while I parked my car, but what if that had been a more urgent situation? What if there had been a threat or an accident or a heart attack and minutes mattered?

How do I fix this?

I spent an hour on the phone with Verizon. I felt only somewhat better to hear that none of the technicians I worked with had heard of this before. They had me reset my network settings, reboot my phone, and call 911 again.

Same thing.

I got escalated to the next level. They had me read off a bunch of settings, which checked out. They had me turn off my phone, they did some kind of hard reset of my network state on their end, and then had me turn my phone back on and try again.

This time it worked. (I was apologetic and the dispatchers were understanding.)

And that was it. Verizon is sorry, but nobody has any idea what went wrong.

I programmed my city’s 10-digit emergency number into my phone. Hopefully that works as a backup.

(Wifi calling was enabled, but I had no wifi connection at the time. I’ve since disabled it. Voice over LTE was also enabled. I’ve disabled it.)


This looks ominous. (But it wasn’t actually.)


My family was sick all last week, and I fell behind on a tight deadline at work. So in the last two days, I’ve been working like crazy to catch up. Almost done.


I got a debit card for my oldest child so I could move to direct deposit allowance. Dealing with cash gets old fast.


I’ve often wished there were counterparts to the “for Dummies” series labeled “for mathematicians.” Replace 500 pages of babble and sidebars with five pages of formulas and analysis.

I’m trying to figure out a few U.S. tax laws that apply to me for the first time this year.


My thermostat just showed me an advertisement. I’m not happy about this.


Here is a great article on OmniFocus, its current state, and useful tips by Gabe Weatherhead. Like Gabe, I recently started using OmniFocus 2 after some years away.


Brute force

One of the main lessons that I have learned in the last three years in my job as a programmer is that if you only have to do a job once or twice, use brute force.

Today, I was setting up a Synology NAS. The instructions for getting started are extremely sparse, consisting mainly of “type in the IP address in a web browser and follow the instructions there.”

In the old days, I would have started typing IP addresses until I found one that worked. Today, I made a new directory called “ips” and ran

for i in $(seq 50); do
  echo "curl 10.0.1.$i > $i"
done | parallel -j 25

Then, since I am (obviously) impatient, I ran wc -l * every few seconds to see which files were not empty. I found it at 10.0.1.5.

Later I discovered find.synology.com, which is the official way to find your Synology.
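
The same brute-force scan could be sketched in Python with a thread pool instead of parallel. This is an illustration, not what I ran: the names responds and scan, the two-second timeout, and the injectable probe argument are all assumptions.

```python
import concurrent.futures
import http.client
import urllib.error
import urllib.request


def responds(url, timeout=2):
    """Return True if an HTTP request to url gets any response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timed out, unreachable, or garbled reply.
        return False


def scan(prefix="10.0.1.", count=50, workers=25, probe=responds):
    """Probe prefix1..prefixN in parallel; return the addresses that answered."""
    hosts = [f"{prefix}{i}" for i in range(1, count + 1)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(probe, (f"http://{h}/" for h in hosts))
        return [h for h, ok in zip(hosts, results) if ok]
```

Calling scan() with the defaults covers the same 50 addresses with 25 workers, and returns the list of hosts instead of leaving files to count.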


I got an Apple Watch for my birthday. So far, the best part is easy access to my calendar while I’m at work.


I would like to write more regularly, so I’m experimenting with a shorter post format, what Manton Reece calls a microblog post. I spent some time today tweaking my Jekyll configuration to enable them on this site.


Uploading an Image with Workflow and Flask

Workflow is an iOS app that lets you build a simple program by dragging blocks around, similar to Apple’s Automator app that ships with macOS. A recent update makes it possible to send a wider variety of HTTP requests, which allows you to interact with web APIs that aren’t otherwise supported.

Or, if you have a web server, write your own API.

Upload images workflow

Here is a workflow to take images from my phone and upload them to my server. It makes one request per image. It sets the custom header Grigg-Authentication to make sure that random people aren’t uploading images. It puts a file into the POST request with field name image. The responses will be HTML image tags, which are collected and then copied to the clipboard.
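
For testing without a phone, the request that the workflow sends can be approximated in Python. This is a sketch under assumptions: the function name build_upload_request is made up, the header and field names follow the description above, and the multipart body is assembled by hand so that only the standard library is needed.

```python
import urllib.request
import uuid


def build_upload_request(url, key, filename, data, content_type="image/jpeg"):
    """Build (but do not send) a multipart POST carrying one image."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return urllib.request.Request(
        url,
        data=head + data + tail,
        headers={
            # Custom header checked by the server.
            "Grigg-Authentication": key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen and reading the response should give back the image tag, assuming the server behaves as described in the rest of this post.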

Flask

Flask is a Python web framework. It makes it very easy to map URLs to Python functions.

The first thing I wrote was a private decorator that checks the HTTP headers for my authentication key. It doesn’t have to be a decorator, but that makes it easier to reuse in the future.

import functools

import flask

KEY = 'password1!'

def private(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        if flask.request.headers.get('Grigg-Authentication') != KEY:
            flask.abort(403)
        return f(*args, **kwargs)
    return wrapper

If you are not using a secure (HTTPS) connection, somebody could read your authentication key and pretend to be you. You can set this up directly with Flask, but since I’m already running nginx, I used that. (I will share the details in a future post.)

Next, there is some basic Flask setup. I changed the response MIME type to plain text and registered an error handler that will report any exceptions in the response, rather than logging an error where I won’t see it.

app = flask.Flask(__name__)

class TextResponse(flask.Response):
    default_mimetype = 'text/plain'

app.response_class = TextResponse

@app.errorhandler(Exception)
def handle_generic_exception(e):
    return 'Unhandled exception: {!r}\n'.format(e)

Then, there is the routing code. This function is called every time someone visits /blog/upload-image, as specified in the route decorator.

@app.route('/blog/upload-image', methods=['POST'])
@private
def blog_upload_image():
    try:
        fh = flask.request.files['image']
    except KeyError:
        flask.abort(400, 'Expected a file with key "image", not found')

    _, extension = os.path.splitext(fh.filename)
    filename = upload_image(fh, extension)
    return '<img src="{}" class="centered">\n'.format(filename)

Finally, the actual work is done by the upload_image function. I save the image into a dated directory with a random filename, then run a bunch of git commands.

import base64
import datetime
import os
import subprocess
import uuid

class Error(Exception):
    pass

def random_chars(size):
    return base64.b32encode(
            uuid.uuid4().bytes).decode('ascii').lower().rstrip('=')[:size]

def upload_image(fh, extension):
    """Upload image to blog and return filename, relative to site root."""
    subdir = 'images/{:%Y}'.format(datetime.datetime.today())
    try:
        os.mkdir(os.path.join(PATH, subdir))
    except FileExistsError:
        pass

    basename = ''.join((random_chars(8), extension))
    filename = os.path.join(subdir, basename)
    fh.save(os.path.join(PATH, filename))

    output = []
    def run(args):
        # Record the command, then its combined stdout/stderr as text.
        output.append(' '.join(args))
        output.append(subprocess.check_output(
            args, cwd=PATH, stderr=subprocess.STDOUT,
            universal_newlines=True))
    try:
        run(['git', 'pull', '--ff-only'])
        run(['git', 'add', filename])
        run(['git', 'commit', '-m', 'Add image'])
        run(['git', 'push'])
    except subprocess.CalledProcessError as e:
        # Include the failing command's own output, which run() never appended.
        output.append(e.output)
        raise Error('Git operation failed. Output:\n{}'.format(
            '\n'.join(output)))

    return filename

Time Zone News

If you like time zones—who doesn’t?—you should check out Time Zone News. Once a month or so, I get gems like this in my news feed:

Haiti cancels daylight saving time with two days notice

The planned change to daylight saving time in Haiti at 2 am local time on 13 March 2016 has been cancelled.

Or this one:

Chile reintroduces DST

Chile’s Ministry of Energy announced today that Chile will be observing daylight saving time again. Chile Standard Time will be changed back to UTC -4 at 00:00 on 15 May, and DST will be observed from 00:00 on 14 August 2016, changing time in Chile to UTC -3.

Chile used to observe DST every year until a permanent UTC offset of -3 was introduced in 2015.

It is unclear whether the time change also applies to Easter Island.


Filter App.net RSS

I was looking to make more room on my phone’s home screen, and I realized that my use of App.net had dwindled more than enough to remove it. I never post any more, but there are a couple of people I would still like to follow that don’t cross post to Twitter.

App.net has RSS feeds for every user, but they include both posts and replies. I only want to see posts. So I brushed off my primitive XSLT skills.

I wrote an XSLT program to delete RSS items that begin with @. While I was at it, I replaced each title with the user’s name, since the text of the post is also available in the description tag.

Here is the transformation that would filter my posts, if I had any:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Default identity transformation -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<!-- Replace title with my username -->
<xsl:template match="item/title/text()">nathangrigg</xsl:template>

<!-- Remove completely items which are directed at other users.
     The RSS feed has titles of the form @username: text of post. -->
<xsl:template match="item[contains(title, '@nathangrigg: @')]" />
</xsl:stylesheet>

Now I can use xsltproc to filter the RSS. In order to fill in the username automatically, I wrapped the XSLT program in a shell script that also invokes curl.

#!/bin/bash
set -o errexit
set -o pipefail
set -o nounset

if (( $# != 1 )); then
    >&2 echo "USAGE: $0 username"
    exit 1
fi

username=$1

xslt() {
cat << EOM
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Default identity transformation -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>
<!-- Replace title with just the username -->
<xsl:template match="item/title/text()">$username</xsl:template>
<!-- Remove completely items which are directed at other users.
        The RSS feed has titles of the form @username: text of post. -->
<xsl:template match="item[contains(title, '@$username: @')]" />
</xsl:stylesheet>
EOM
}

rss() {
    curl --silent --fail https://api.app.net/feed/rss/users/@$username/posts
}

xsltproc <(xslt) <(rss)
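
For comparison, roughly the same filter can be sketched in Python with the standard library. This is not what I run: the function name filter_feed is made up, it assumes the usual rss/channel/item layout, and it tests whether the title starts with the reply prefix rather than merely containing it.

```python
import xml.etree.ElementTree as ET


def filter_feed(rss_text, username):
    """Drop reply items and replace each remaining title with the username."""
    root = ET.fromstring(rss_text)
    reply_prefix = f"@{username}: @"
    for channel in root.findall("channel"):
        # findall returns a list, so removing items while looping is safe.
        for item in channel.findall("item"):
            title = item.find("title")
            text = (title.text or "") if title is not None else ""
            if text.startswith(reply_prefix):
                channel.remove(item)
            elif title is not None:
                title.text = username
    return ET.tostring(root, encoding="unicode")
```

The input would be the text that the rss function above downloads, and the return value is the filtered feed as a string.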

Illustrating Python multithreading vs multiprocessing

While adding multithreading support to a Python script, I found myself thinking again about the difference between multithreading and multiprocessing in the context of Python.

For the uninitiated, Python multithreading uses threads to do parallel processing. This is the most common way to do parallel work in many programming languages. But CPython has the Global Interpreter Lock (GIL), which means that no two Python statements (bytecodes, strictly speaking) can execute at the same time. So this form of parallelization is only helpful if most of your threads are either not actively doing anything (for example, waiting for input), or doing something that happens outside the GIL (for example, launching a subprocess or doing a numpy calculation). Threads are very lightweight; for example, they share memory space.

Python multiprocessing, on the other hand, uses multiple system-level processes, that is, it starts up multiple instances of the Python interpreter. This gets around the GIL limitation, but obviously has more overhead. In addition, communicating between processes is not as easy as reading and writing shared memory.

To illustrate the difference, I wrote two functions. The first is called idle and simply sleeps for two seconds. The second is called busy and computes a large sum. I ran each 15 times using 5 workers, once using threads and once using processes. Then I used matplotlib to visualize the results.
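
The experiment can be reconstructed roughly as follows. This is a sketch, not the notebook code: the helper names timed and run_tasks, the use of concurrent.futures, and the exact sum computed by busy are assumptions, and the plotting is left out.

```python
import concurrent.futures
import time


def idle(seconds):
    """Sleep; the GIL is released while waiting."""
    time.sleep(seconds)


def busy(n):
    """Compute a large sum in pure Python; holds the GIL while running."""
    return sum(i * i for i in range(n))


def timed(func, arg):
    """Run func(arg) and return its (start, end) wall-clock times."""
    start = time.time()
    func(arg)
    return start, time.time()


def run_tasks(executor_cls, func, arg, tasks=15, workers=5):
    """Run func(arg) `tasks` times on `workers` workers; return sorted spans."""
    with executor_cls(max_workers=workers) as pool:
        futures = [pool.submit(timed, func, arg) for _ in range(tasks)]
        return sorted(f.result() for f in futures)
```

Calling run_tasks(concurrent.futures.ThreadPoolExecutor, idle, 2) gives the spans for the idle threads case, and swapping in concurrent.futures.ProcessPoolExecutor or busy with a large n gives the other three. When using processes, make the call from under an if __name__ == "__main__": guard so that worker processes can import the module safely.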

Here are the two idle graphs, which look essentially identical. (Although if you look closely, you can see that the multiprocess version is slightly slower.)

[Figure: Idle threads. The tasks of each group run in parallel.]
[Figure: Idle processes. The tasks of each group run in parallel.]

And here are the two busy graphs. The threads are clearly not helping anything.

[Figure: Busy threads. Each task runs sequentially, despite multithreading.]
[Figure: Busy processes. The tasks of each group run in parallel.]

As is my custom these days, I did the computations in an IPython notebook.