Peter Neumark
Published in Prezi Engineering
5 min read · Nov 28, 2014

Snakebasket — Recursively Install Python Dependencies

If you’re not structuring your Python application components into packages for reuse, you should be. If you are already, you know that Python packaging can at times be annoying, to say the least. Also, you probably use pip, a tool for installing packages from various sources, including tarballs on the web, git repositories, or the default Python Package Index, PyPI.

Pip promotes an approach to specifying dependencies for Python packages that’s similar to Java’s classpath. Rather than listing only the packages that your application directly depends on (typically in a file called requirements.txt), pip suggests you list every dependency in that file, both direct and indirect. Then, all you need to do is run one command to set up the environment for your application:

$ pip install -r requirements.txt

What we’d rather do is have all of our internal packages specify their direct dependencies in their requirements.txt, and have pip recursively chase down and install dependencies as needed. Unfortunately, pip doesn’t support this recursive behavior, so we created snakebasket.

Snakebasket is a drop-in replacement for pip. For non-git package requirements it’ll behave just like pip. But when it comes across a requirement that points to a git repo, it’ll clone that repo and run itself on that repo’s requirements.txt file — if it exists — before attempting to install the package. Here’s an example.

Say you’re building a web application using Django. Let’s call it Prezi. Your requirements.txt file contains some package names pegged to versions that pip will fetch from PyPI. It may also contain packages that you haven’t published publicly that live in a git repo. It’ll look something like this:

# prezi python github dependencies
-e git+git@github.com:prezi/aws.git@v1.0.1#egg=prezi_aws
-e git+git@github.com:prezi/config.git@v1.1.9#egg=prezi_config
-e git+git@github.com:prezi/django-utils.git@v1.2.4#egg=prezi_django_utils
-e git+git@github.com:prezi/mac-client.git@v1.0.2#egg=prezi_mac_client
-e git+git@github.com:prezi/prezi-utils.git@v1.1.5#egg=prezi_utils
# Test related
mock==0.8.0
unittest2==0.5.1
-e git+git@github.com:prezi/django-test.git@v1.0.2#egg=prezi-django-test
# For the upload django command:
poster==0.8.1
# For the media API
django-tastypie==0.9.14
django-configglue==0.6.1
Django==1.5.1
anyjson==0.3.1
boto==2.9.0
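To make the two kinds of lines in that file concrete, here is a minimal sketch that classifies them. It is not pip's or snakebasket's actual parser; the `GIT_REQ` pattern and the `classify` helper are hypothetical names, and the sketch only handles the `-e git+...#egg=...` and `name==version` forms that appear in the listing above.

```python
import re

# Matches editable git requirements such as:
#   -e git+git@github.com:prezi/aws.git@v1.0.1#egg=prezi_aws
# The "@v1.0.1" tag is optional.
GIT_REQ = re.compile(
    r"^-e\s+git\+(?P<repo>.+?\.git)(?:@(?P<ref>[^#@]+))?#egg=(?P<egg>[\w.-]+)$"
)


def classify(line):
    """Return a small dict describing one requirements.txt line, or None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment
    match = GIT_REQ.match(line)
    if match:
        return {"kind": "git", **match.groupdict()}
    # Anything else is treated as a PyPI requirement, optionally pinned.
    name, _, version = line.partition("==")
    return {"kind": "pypi", "name": name, "version": version or None}
```

Only the lines classified as `git` point at repositories that may carry a requirements.txt of their own.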

If you run $ pip install -r requirements.txt, it will succeed. But when you try to run your web application, it will throw an error the first time you import a module from the prezi_logging package. You may be asking yourself, what is prezi_logging and why isn’t it listed in the requirements.txt file for your web application? What is going on? To answer those questions, you need to look at the requirements.txt file for prezi_mac_client:

-e git+git@github.com:prezi/logging.git#egg=prezi-logging

The only package that prezi_mac_client depends on is prezi_logging. Presumably it also imports modules and calls functions from prezi_logging. When you’re installing dependencies for your web application, pip installs prezi_mac_client, but it won’t install any of that package’s dependencies, in particular prezi-logging. Pip does not inspect each dependency’s requirements.txt. It more or less runs $ python setup.py install for each dependency.

Snakebasket, in a word, installs everything. Point it to a requirements.txt file, and it will clone all the git repos that provide packages to check whether they themselves contain a requirements.txt file. If one does, snakebasket recursively performs the same task on it, collecting the names of all the packages that need to be installed. Once it’s done with the traversal, it’ll install all the packages either from the git repo specified, or from PyPI. Here’s how you install and use it:

curl -ss -L http://href.prezi.com/snakebasket | bash -s
sb install -r requirements.txt

After that, you can run your web application without running into any errors caused by missing dependencies.
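The traversal described above can be sketched in a few lines of Python. This is only an illustration of the idea, not snakebasket's code: `FAKE_REPOS` is a hypothetical in-memory stand-in for cloning a repo and reading its requirements.txt, and `repo_of` and `collect` are names made up for this sketch.

```python
# Hypothetical stand-in for "clone the repo and read its requirements.txt".
# Keys are repo URLs; values are the file's contents, or None if there is none.
FAKE_REPOS = {
    "git@github.com:prezi/mac-client.git":
        "-e git+git@github.com:prezi/logging.git#egg=prezi-logging\n",
    "git@github.com:prezi/logging.git": None,
}


def repo_of(line):
    """Extract the repo URL from an '-e git+<repo>[@tag]#egg=<name>' line."""
    url = line[len("-e git+"):].split("#egg=")[0]
    head, sep, _tag = url.rpartition(".git@")
    return head + ".git" if sep else url


def collect(requirements_text, seen=None):
    """Depth-first walk that gathers every requirement line exactly once."""
    seen = [] if seen is None else seen
    for raw in requirements_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line in seen:
            continue  # skip blanks, comments, and lines already visited
        seen.append(line)
        if line.startswith("-e git+"):
            nested = FAKE_REPOS.get(repo_of(line))
            if nested:  # the repo ships its own requirements.txt
                collect(nested, seen)
    return seen


top_level = (
    "-e git+git@github.com:prezi/mac-client.git@v1.0.2#egg=prezi_mac_client\n"
    "mock==0.8.0\n"
)
for requirement in collect(top_level):
    print(requirement)
```

Running it lists prezi-logging alongside the two top-level requirements, even though the top-level file never mentions it. The sketch stops at collecting; installation would happen afterwards.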

Dependency Hell Avoidance

There’s one other way in which snakebasket differs from pip: the way it treats versions. Building an application involves managing its dependencies and their versions. That alone is simple, though not always easy. To complicate things further, the packages your application uses themselves have dependencies, and those are not as simple to control. When your application, through its tree of package dependencies, requires different versions of the same package, that’s called dependency hell.

Dependency hell is an unsolved problem in Computer Science right up there with P ≠ NP. Snakebasket does not solve this problem, but it does take an opinionated approach to dealing with it that works most of the time, and is typically what you’d do anyway. While traversing a project’s dependency tree, it records all package names and their associated version specifications. For each package, it will install the latest version allowed by at least one of the version specifications it collected, and no other versions. For example, if one of your packages requires ‘mock==0.8.0’ and another requires ‘mock==1.0.1’, snakebasket will install mock version 1.0.1 and no other versions.
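Here is a rough sketch of that "latest pinned version wins" rule. It assumes every requirement is an exact `==` pin with a purely numeric version, which is simpler than what real version specifiers allow, and `parse_version` and `resolve` are hypothetical helper names rather than snakebasket functions.

```python
def parse_version(text):
    """Turn '1.0.1' into (1, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve(requirements):
    """Map each package name to the highest version any requirement pins."""
    chosen = {}
    for req in requirements:
        name, _, version = req.partition("==")
        if not version:
            continue  # unpinned requirement; nothing to compare in this sketch
        if name not in chosen or parse_version(version) > parse_version(chosen[name]):
            chosen[name] = version
    return chosen


print(resolve(["mock==0.8.0", "Django==1.5.1", "mock==1.0.1"]))
# prints {'mock': '1.0.1', 'Django': '1.5.1'}
```

Comparing integer tuples rather than raw strings matters here: as strings, '0.10.0' would sort below '0.9.0'.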

Other Approaches

Snakebasket enables you to specify the dependencies of your packages without using setuptools features or setting up your own pip mirror. In essence, it turns your git repositories into a pip mirror. This solution works for us, but it might not be right for everybody. Here are a few other approaches you can take.

The first option is to use the dependency_links option for setuptools to specify the location of your git projects, and then use install_requires to declare them as dependencies of your project. We decided against this because we never want to install a package from PyPI that shares a name with one of our internal packages, accidentally or otherwise.
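For illustration, this is roughly what that first option could look like for the prezi_mac_client package from the earlier example. The version number and the repository URL form are assumptions made for this sketch, and `build_setup_kwargs` is a hypothetical helper; only `install_requires` and `dependency_links` are the setuptools options named above.

```python
def build_setup_kwargs():
    """Keyword arguments a setup.py could pass to setuptools.setup()."""
    return {
        "name": "prezi_mac_client",
        "version": "1.0.2",  # assumed to match the git tag used earlier
        # Declares the dependency by name only.
        "install_requires": ["prezi-logging"],
        # Tells the installer where that name may be found.
        # Hypothetical URL form; adjust to however your repos are reachable.
        "dependency_links": [
            "git+ssh://git@github.com/prezi/logging.git#egg=prezi-logging",
        ],
    }


# In a real setup.py this would be followed by:
#   from setuptools import setup
#   setup(**build_setup_kwargs())
```

Because install_requires carries only a name, an installer is free to satisfy it from PyPI if a package with that name exists there, which is the risk described above. Newer pip releases also no longer process dependency_links.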

The other option is to set up your own PyPI mirror, publish your internal packages to that, and use setuptools’ install_requires to specify package dependencies. While you’re at it you can host your 3rd-party dependencies there as well, and fix any dependency hell issues that arise. We decided against this approach for two reasons. First, the PyPI mirror is another service you have to maintain and operate, and we didn’t want to do that. The second and more important reason is that this solution doesn’t address the dependency hell problem.

If you resolve all version conflicts when you initially set up your PyPI mirror and pin all your package versions, then yes, you’re out of dependency hell for the moment. But chances are that those conflicts won’t stay resolved unless you strictly adhere to a process that keeps them that way. Pip will eventually start failing when you try to install your package, and you’ll find yourself back in dependency hell. At that point you’ll want to use a tool like snakebasket to resolve those version conflicts for you.

You could just not pin versions to avoid dependency hell, but that has even worse consequences. If you don’t pin package versions, you won’t be able to reliably reproduce the state of a deployment. The state of the deployment depends on the state of your PyPI mirror, which is recorded nowhere in your application code or configuration files. This approach makes the task of debugging your application much harder than it should be, and that’s something we wanted to avoid.

Towards Open Source

Snakebasket is the first internal project that we’ve open-sourced, but not the last. As a company that’s benefited from open-source software throughout our four years of existence, we’re proud and excited to finally start giving back. You can find all of our open source contributions on Prezi’s GitHub page, and browse, download, or fork snakebasket here.

Expect more to come.

April 19, 2013 By Ryan Lane and Peter Neumark
