Dependency sets for pip

One of the things I enjoy about building projects with nodejs
is using npm, specifically the devDependencies part of
package.json. This allows you to have one set of dependencies that are
installed in production, but have extra dependencies installed for development,
such as test libraries, deploy tools, etc. To get the development dependencies
with npm you run:

$ npm install --dev

How about pip?

It turns out if you are using pip 1.2 or newer, you can now do the same thing
in your setup.py file for Python packages.

An example setup.py file:

#!/usr/bin/env python

from setuptools import setup
from my_project import __version__

required = [
    'gevent',
    'flask',
    ...
]

extras = {
    'develop': [
        'Fabric',
        'nose',
    ]
}

setup(
    name="my-project",
    version=__version__,
    description="My awesome project.",
    packages=[
        "my_project"
    ],
    include_package_data=True,
    zip_safe=False,
    scripts=[
        'runmyproject',
    ],
    install_requires=required,
    extras_require=extras,
)

To install this normally (in “editable” mode) you’d run:

$ pip install -e .

To install the develop set of dependencies you can run:

$ pip install -e .[develop]

As you can see, you can have multiple sets of extra dependencies and call them
whatever you want.
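For instance (just a sketch; the docs group and the Sphinx dependency below are hypothetical, not part of the project above), you could declare several groups and install more than one at once:

extras = {
    'develop': [
        'Fabric',
        'nose',
    ],
    'docs': [
        'Sphinx',  # hypothetical extras group for documentation tools
    ],
}

and then:

$ pip install -e .[develop,docs]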

Have fun,
Aaron

setuptools, pip, and custom python index

In modern-day software development for the web I find that we end up trying many different ways to deploy code. While at work we’re using python as our primary programming language, I’ve enjoyed the node.js philosophy, especially the practice of Small Kernels of Functionality and Loosely Coupled Components.

From the article

“…why package two modules together if you can simply break them apart into two kernels of functionality which are codependent?”

Problem

One of the core sore points for me right now is the existence of “common” libraries in our work. It’s common to have a piece of code that is needed in the current project, but doesn’t particularly belong there. The approach (I often see) is to create said “common” library and deploy that with all of the projects that need the code. The major resistance to putting this in an individual package is probably the overhead of maintaining a separate repository for the individual code, along with the pull/commit/push/tag/release cycle that comes with it to make changes to a potentially developing module. So in the end, we end up with the “common” library.

The problem with this is many-fold though:

  • dependency chains are not explicit,
  • the “common” library grows over time,
  • the same library becomes disorganized,
  • it’s not clear later on how to break things out because it’s not clear what projects are using what parts of the library,
  • the library with all these different pieces of functionality breaks the rule of single responsibility.

Back to the node.js philosophy, if you’ve ever used npm before, you know that there are tons and tons of modules available for node (as an interesting sidenote, npmjs module counts are growing by 94 modules/day at the time of writing [link]). The recommended approach is to keep modules small, and publish them independently so they can be used explicitly across applications. James Halliday writes about this approach on his blog.

Back to Python

Python has been criticized for having painful package management. At work, we currently use setuptools for installing packages from Github, and it does a pretty decent job. As I’ve written before, you can specify dependency_links in the setup.py file to pull tarballs from any source control system that will provide them. Like I said, this works pretty well.
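As a rough sketch of what that looks like (the package name and GitHub URL here are made-up placeholders, not a real project), a setup.py pulling a tarball straight from GitHub might contain:

from setuptools import setup

setup(
    name="my-project",
    version="0.1.0",
    install_requires=[
        'some-private-lib',  # hypothetical dependency
    ],
    dependency_links=[
        # hypothetical tarball URL for a tagged release on GitHub;
        # the #egg fragment tells setuptools which requirement it satisfies
        'https://github.com/example/some-private-lib/tarball/0.1.0#egg=some-private-lib-0.1.0',
    ],
)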

Mypi

I’ve also recently set up a mypi private package index for our work, so we can start moving towards small, reusable python packages. I’ve also looked at djangopypi and djangopypi2, the latter being a bootstrap-converted fork of the former. Both these projects seem to add a little more functionality around user management, and of course they’re built on Django, which means you get the nice Django admin at the same time. I haven’t had time to do a full comparison; that will have to come later. For the time being, mypi seems to do the trick nicely.

Where setuptools falls apart

Turns out, using pip, you can just specify a custom index in your ~/.pip/pip.conf and then pip install <packagename> and you’re good to go. That’s fine for installing one-off modules; however, automating the entire dependency installation process wasn’t obvious at first.
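For reference, a ~/.pip/pip.conf along these lines is all it takes (the index URL below is a placeholder for wherever your private index happens to be served):

[global]
; placeholder URL - point this at your own mypi instance
index-url = https://pypi.example.com/simple/
; optionally keep the public PyPI available as well
extra-index-url = https://pypi.python.org/simple/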

Setuptools fail

My scenario had 2 projects, Project A and Project B. Project A relies on custom packages in my mypi index, and is published to that index also. Project B has a single dependency on Project A. Using setuptools, python setup.py install would find Project A in the private package index (via dependency_links), but none of Project A’s custom index dependencies were being found, despite having specified the dependency_links in that project.

Long story longer (and the answer)

The answer just turned out to be a little bit more understanding of the evolution of python package management, specifically this little tidbit about pip:

“Internally, pip uses the setuptools package, and the pkg_resources module, which are available from the project, Setuptools.”

Turns out pip spits out the setuptools configuration (whatever you have in your setup.py) into a <project-name>.egg-info/ folder, including dependency_links.
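For example, assuming the hypothetical setup.py sketched earlier, after an editable install you can see the links pip recorded (the URL here is just the placeholder from that sketch):

$ pip install -e .
$ cat my_project.egg-info/dependency_links.txt
https://github.com/example/some-private-lib/tarball/0.1.0#egg=some-private-lib-0.1.0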

To get the pip equivalent of python setup.py develop just run:

# -e means 'editable'
$ pip install -e .

To get the same for python setup.py install run:

$ pip install .

The super-cool thing about this is that dependency_links no longer need to be set in the setup.py files as pip will use the custom index set up in the ~/.pip/pip.conf file.
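Continuing the hypothetical two-project scenario (the names here are placeholders), Project B’s setup.py can now depend on Project A by name alone and let pip resolve it, and everything it needs, from the private index:

from setuptools import setup

setup(
    name="project-b",
    version="0.1.0",
    packages=["project_b"],
    install_requires=[
        'project-a',  # found via the index-url in ~/.pip/pip.conf
    ],
    # no dependency_links needed any more
)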

Done and done

I think this solution will solve some of the problem of having all the git/Github overhead involved in releases. With a simple fab setup, release candidates and formal releases can be incremented and deployed in a way that feels a little cleaner and independent of the git workflow, while still maintaining source control. I’m hoping it will encourage users to publish modules early in a ‘sharable’ way to the private index so they can be easily installed by others. All in all, it feels cleaner to do it this way for me.
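As a sketch of what such a fab setup could look like (the task name and the ‘mypi’ repository alias in ~/.pypirc are assumptions, not our actual fabfile):

# fabfile.py
from fabric.api import local

def release():
    """Build a source distribution and upload it to the private index."""
    # 'mypi' would be a repository alias configured in ~/.pypirc
    local("python setup.py sdist upload -r mypi")

Running fab release then builds and publishes a version without any extra git ceremony.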

Hope that helps someone else down the road. Now we have a nice private registry for our python packages, and an easy way to automate their installation.

Note: It appears that djangopypi is actually maintained by Disqus; that may be a good reason to use the project, as it will probably be maintained for a longer period. I will explore that option and write up a comparison later.