Software Design for the Command Line

Periodically in the course of our work we are told, "Just use the command line tool! Jim wrote it last year to make it easy." So naturally you call the tool with --help and unceremoniously get "Error" with exit code 1. That's fine; next you check out the source code to figure out the arguments, only to discover you can't specify the host because the tool predates going multiregional and has it hardcoded. "Should be a quick little change and I'll be done in an hour," but a few days later you're in the middle of refactoring the damned thing, and there is still the question of whether it will work safely in production, because there are no tests.

We write command line tools for all kinds of things, ranging from convenience scripts to mission-critical tasks that are essential to rescuing your software from emergencies. However, we frequently fall into the trap of thinking "this is one-time use." There is definitely a tone from some engineers when talking about "scripting" versus "programming," but this status difference is largely a self-fulfilling prophecy: if you don't apply the same software design principles you use for well-factored production code to your tools, then of course they will not meet your quality bar. The programming language used to solve a problem is rarely the root cause of low quality.

There are useful design principles to follow when writing a command line tool. Whether you're building from scratch, struggling to refactor an existing tool, or just feel like the tests for a new tool are awkward, this post aims to provide a short guide to what a well-designed tool might look like. In addition to this post, there is a GitHub repository that concretely demonstrates some of the principles. It is in Python, and this piece will reference Python libraries, but there are equivalent libraries in all major languages and the software design is language agnostic.

Any new tool should be easy to distribute reliably, which means investing in the necessary packaging early. Scripting languages do not encourage this because they let you run loose scripts that can autodiscover their dependencies. This is excellent for prototyping, and it probably works great on your machine, or when you have all the necessary dependencies on some remote tools host, but it breaks down when you need to reliably get your tool working on a new target machine. If your scripting language has packaging and dependency declarations (e.g., setup.py and requirements.txt in Python), invest in them early, especially if doing so gives you standard tooling. Figuring out the deployment method and dependencies of a tool is much less painful when done early.
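As a minimal sketch of what that early packaging investment might look like in Python, here is a bare-bones setup.py; the tool name "mytool", the module path mytool.cli:main, and the requests dependency are all placeholders:

```python
# setup.py -- minimal packaging sketch for a hypothetical tool named "mytool"
from setuptools import setup, find_packages

setup(
    name="mytool",
    version="0.1.0",
    packages=find_packages(),
    # declare the dependencies the tool actually needs (requests is a placeholder)
    install_requires=["requests>=2.0"],
    entry_points={
        # installs a `mytool` executable that invokes mytool.cli:main
        "console_scripts": ["mytool = mytool.cli:main"],
    },
)
```

A console_scripts entry point also gives you, for free, the command line entry point discussed below once the package is installed.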

Keep three levels of abstraction in mind when writing for the command line: the entry point for a tool, the main function, and the library function(s). Scripting languages let you treat your files like bash scripts, but that is very painful to extend safely. In many languages the entry point and the main function are the same thing, with the language runtime handling invocation of the main function with the raw command line arguments. Languages like Python simply execute the file, so you need a separate main function and a statement that invokes it. Wrap your logic in a main function and use the bare minimum to get the command line arguments to it. This is just basic encapsulation, and it brings many advantages: other tools can import your tool to leverage its functionality now that it has a main function that can be safely included, and you can write a very simple dry-run test. A simple test of a main function may seem pointless, but in dynamic scripting languages that don't reveal problems until runtime it can be really helpful for catching mistakes in newly written code.
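Here is a minimal sketch of that split in Python; the file and function names are placeholders:

```python
# cli.py -- sketch of separating the entry point from the main function
import sys


def main(argv=None):
    """Main function: takes raw arguments, returns an exit code."""
    argv = sys.argv[1:] if argv is None else argv
    # ... parse argv and call into the library here ...
    return 0


if __name__ == "__main__":
    # Entry point: the bare minimum needed to hand the arguments to main()
    sys.exit(main())
```

Because main() is an ordinary function, another tool can import it, and a dry-run test can call it directly without spawning a process.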

The main function will need to process the raw command line arguments. Make sure to use an argument parser for this. Even if a tool has simple inputs (say, a series of numbers), assume that future users of the tool do not know this. That means the inputs for the tool include those simple inputs... and --help to explain them to new users. Do not roll your own argument parser. A good argument parser takes a lot of love, and when writing a tool it is tempting to take shortcuts so you can focus on the tool itself. For example, it is commonly expected that both --help and -h work. Do not waste effort on this; find a good argument parsing library and use it. New arguments are almost sure to follow, and a handrolled argument parser just makes them harder to add.
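As a sketch of what this might look like with Python's standard argparse library (the tool name and arguments here are placeholders):

```python
import argparse


def build_parser():
    # argparse generates --help and -h for free from these declarations
    parser = argparse.ArgumentParser(
        prog="mytool",
        description="Sum a series of numbers.",
    )
    parser.add_argument("numbers", type=float, nargs="+", help="numbers to sum")
    parser.add_argument("--verbose", action="store_true", help="print extra detail")
    return parser
```

Adding a new argument later is one more add_argument call, and the help text stays in sync automatically.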

Invoking an argument parser in the main function is part of a larger pattern: abstracting application logic, inputs, and outputs away from the core tool logic, which should live in a library function. Dealing with environment variables, command line inputs, and command line exit codes (e.g., sys.exit) is not the responsibility of the library function and should be isolated to the main function or its clearly namespaced helpers. Similarly, the main function should be isolated from any complex orchestration or logic that belongs in the library. It can be tempting to put a simple print for output in your library or a useful loop in your main function, but both of these decisions mix responsibilities, creating code that is brittle, hard to test, and hard to re-use.
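A sketch of that separation, reusing the hypothetical parser from above:

```python
import sys


def summarize(numbers, output):
    """Library function: pure logic, no knowledge of argv, environment, or exit codes."""
    total = sum(numbers)
    output.write(f"total={total}\n")
    return total


def main(argv=None):
    """Main function: owns the process-level concerns and delegates to the library."""
    args = build_parser().parse_args(argv)  # build_parser from the earlier sketch
    summarize(args.numbers, output=sys.stdout)
    return 0
```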

Error handling is an instructive example. Expected errors might warrant special handling, generally a transformation into a useful help message. Putting that transformation into the library degrades its re-usability, and it can end up scattered across multiple places, making it difficult for future contributors to make effective changes.
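One way this might look, assuming a hypothetical expected-error type raised by the library:

```python
import sys


class KnownToolError(Exception):
    """Hypothetical expected error raised by the library function."""


def main(argv=None):
    args = build_parser().parse_args(argv)
    try:
        summarize(args.numbers, output=sys.stdout)
    except KnownToolError as exc:
        # The translation from expected error to user-facing message happens
        # once, in main, rather than being scattered through the library.
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0
```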

Imagine the library function being invoked by other callers, in particular your tests. Getting this abstraction right results in a single integration test to ensure command line resources are correctly passed to the library function, and then as many unit tests of the library function as deemed necessary. This distinction is important because it localizes the most painful-to-write mocks, which makes setting up test conditions for the unit tests far easier. If everything were flattened into the same layer of abstraction, that pain would be shared by every single unit test. If a new unit test requires a bunch of mocking, that may be a code smell indicating a resource that should be created and owned by the main function. A good example is testing stdout output from a function. Instead of awkwardly mocking stdout for a test, the library function should return some state (that a caller could convert to a string and send to stdout) or receive a file handle for the output. Similarly, moving the creation of a required API client up to the main function radically simplifies mocking for many library functions. Sometimes the mocking just needs some additional unit test helpers: for example, discovering that io.StringIO in Python is a drop-in replacement for most file handles is a game changer for many unit tests. Likewise, Python context managers can make a lot of sense in a library function but require a lot of boilerplate to mock, so providing a helper for that case can have a lot of value.
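For example, a unit test of the hypothetical summarize function above needs no mocking at all, because io.StringIO can stand in for the output file handle:

```python
import io
import unittest


class SummarizeTest(unittest.TestCase):
    def test_writes_total(self):
        out = io.StringIO()  # drop-in replacement for sys.stdout in this test
        total = summarize([1, 2, 3], output=out)
        self.assertEqual(total, 6)
        self.assertEqual(out.getvalue(), "total=6\n")
```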

Deciding what belongs in the main function and what belongs in the library is a classic challenge of software design, but simply having the distinction already puts you on the right path. Think about what someone invoking the library function without the main function would want as an interface. A good rule of thumb is to ensure your library function has a single responsibility, which often means breaking it up into components that get called by a simple wrapper for your tool's use case. These components make up the library for your tool, and doing this correctly can make it trivial to create well-tested new tools with very different use cases in the same domain as the original tool. When it comes to operational tooling, this can be a massive win when you have to develop similar but new tooling in response to a specific incident.
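As a sketch of that decomposition (all of these names are hypothetical):

```python
def load_records(handle):
    """Read one record per line from an open file handle."""
    return [line.strip() for line in handle if line.strip()]


def filter_records(records, prefix):
    """Keep only records that match the given prefix."""
    return [record for record in records if record.startswith(prefix)]


def run_report(handle, prefix, output):
    """Thin wrapper for this tool's use case, composing the components above."""
    for record in filter_records(load_records(handle), prefix):
        output.write(record + "\n")
```

A new tool in the same domain can reuse load_records and filter_records behind a different wrapper and a different main function.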