Modules, Packages, and all that

One of the key features of Python is that the actual core language is fairly small. This is an intentional design feature to maintain simplicity. Much of the powerful functionality comes through external modules and packages.

The main work of installation so far has been to supplement the core Python with useful modules for science analysis.

A module is simply a file containing Python definitions, functions, and statements. Putting code into modules is useful because of the ability to import the module functionality into your script or IPython session, for instance:

import atpy
data = atpy.Table('my_table.fits')

You’ll see import in virtually every Python script and soon it will be second nature.

Question: Importing modules and putting the module name in front is such a bother. Why do I need to do this?

Answer: It keeps everything modular and separate. For instance, many modules have a read() function since this is a common thing to do. Without the <module>.<function>(…) syntax there would be no way to know which one to call.
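As a small illustration of why the prefix matters, here is a sketch using two standard-library modules that both define a function called sqrt (a stand-in for the read() case above; the variable names are made up for this example):

```python
import cmath
import math

# Both modules define sqrt(), but they behave differently.
# The module prefix makes it explicit which one is called.
real_root = math.sqrt(4)        # works on real numbers only
complex_root = cmath.sqrt(-4)   # accepts negative input, returns a complex number

print(real_root)
print(complex_root)
```

Without the prefix, a reader (and the interpreter) would have no way to tell which sqrt was intended.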

Sometimes it is convenient to make an end-run around the <module>. prefixing. For instance, when you run ipython --pylab, the interpreter does some startup processing so that a number of functions from the numpy and matplotlib modules are available without the prefix.

Python allows this with the following syntax:

from <module> import *

This imports every function and definition from the module into the current namespace (in other words, it makes them available without prefixing). For instance, you could do:

from atpy import *
data = Table('my_table.fits')

A general rule of thumb is that from <module> import * is OK for interactive analysis within IPython but you should avoid using it within scripts.
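One reason for this rule of thumb is that a second star-import can silently replace names brought in by the first. The following sketch runs two star-imports inside a scratch dictionary (via exec, purely so the result can be inspected; this is an illustration, not something you would normally write):

```python
import cmath
import math

# Run two star-imports in a scratch namespace so the effect is easy to inspect.
namespace = {}
exec("from math import *", namespace)
exec("from cmath import *", namespace)

# The second star-import silently replaced math.sqrt with cmath.sqrt.
sqrt_was_shadowed = namespace["sqrt"] is cmath.sqrt

# Names that only math defines (such as floor) are left untouched.
floor_was_kept = namespace["floor"] is math.floor

print(sqrt_was_shadowed, floor_was_kept)
```

In a script, importing only the names you need (for example, from math import sqrt) or keeping the module prefix avoids this kind of surprise.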

A package is just a way of collecting related modules together within a single tree-like hierarchy. Very complex packages like NumPy or SciPy have hundreds of individual modules, so putting them into a directory-like structure keeps things organized and avoids name collisions. For example, here is a partial list of sub-packages available within SciPy:

scipy.fftpack                     Discrete Fourier Transform algorithms
scipy.stats                       Statistical functions
scipy.lib                         Python wrappers to external libraries
scipy.lib.blas                    Wrappers to BLAS library
scipy.lib.lapack                  Wrappers to LAPACK library
scipy.integrate                   Integration routines
scipy.linalg                      Linear algebra routines
scipy.sparse.linalg               Sparse linear algebra
scipy.sparse.linalg.eigen         Sparse eigenvalue solvers
scipy.sparse.linalg.eigen.arpack  Eigenvalue solver using iterative methods
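The same dotted hierarchy can be explored from within Python. The sketch below uses the standard-library json package as a stand-in (so that it runs without SciPy installed) and lists the modules it contains with pkgutil:

```python
import json
import json.decoder  # dotted name: the decoder module inside the json package
import pkgutil

# A package exposes the directories it was loaded from as __path__;
# pkgutil.iter_modules() walks them and yields (finder, name, is_package) tuples.
submodules = sorted(name for _, name, _ in pkgutil.iter_modules(json.__path__))
print(submodules)

# Objects inside a sub-module are reached through the full dotted path.
decoder_class = json.decoder.JSONDecoder
print(decoder_class.__module__)
```

Assuming SciPy is installed, replacing json with scipy in the pkgutil call should list its top-level sub-packages in the same way.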

If you’ve gotten this far you have a working scientific Python environment that has most of what you will ever need. Nevertheless it is almost certain that you will eventually find a need that is not met within your current installation. Here we learn where to find other useful packages and how to install them.

Package resources

Good vs. bad resources

When you find some package on the web, look for a few things:

  1. Good modern-looking documentation with examples
  2. Installs easily without lots of dependencies (or has detailed installation instructions)
  3. Actively developed

Search engines

Enter some keywords into your favorite search engine: “python blah blah” or “python seismology blah blah”

PyPI

The Python Package Index is the main repository for third-party Python packages (about 14000 packages at the time of writing, and growing).

The advantage of being on PyPI is the ease of installation using pip install <package_name>.

There are two standard methods for installing a package.

pip install

The pip install script is available within our scientific Python installation and is very easy to use (when it works). During the installation process you already saw many examples of pip install in action.

Features include:

  • If supplied with a package name then it will query the PyPI site to find out about that package. Assuming the package is there then pip install will automatically download and install the package.
  • Will accept a local tar file (assuming it contains an installable Python package) or a URL pointing to a tar file.
  • Can install in the user package area via pip install <package or URL> --user (see discussion further down)
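Before installing, it can be handy to check from Python whether a module is already importable. The sketch below is only an illustration: is_installed and pip_install_command are hypothetical helper names, and the command is built with the python -m pip form so that it targets the interpreter currently running. Note that the name used with import is not always the same as the package name on PyPI.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if a top-level module of this name can be found on the search path."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package, user=False):
    """Build (but do not run) a pip install command for the current interpreter."""
    command = [sys.executable, "-m", "pip", "install", package]
    if user:
        command.append("--user")
    return command


# asciitable is the example package used elsewhere in this text.
if not is_installed("asciitable"):
    print(" ".join(pip_install_command("asciitable", user=True)))
```

The printed command can be pasted into a terminal, or passed to subprocess.check_call() if you prefer to run it from a script.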

python setup.py install

Some packages may fail to install via pip install. Most often there will be some obvious (or not) error message about compilation or a missing dependency. If compilation fails, most likely the development package of the missing dependency (usually named <dependency>-dev in the Linux package repository) is not installed. If installing the development packages doesn't solve the problem, the likely next step is to download the installation tar file and untar it. Go into the package directory and look for files like:

INSTALL
README
setup.py
setup.cfg

If there is an INSTALL or README file then hopefully you will find useful installation instructions. Most well-behaved Python packages do the installation via a standard setup.py script. This is used as follows:

python setup.py --help  # get options
python setup.py install # install in the python area (root / admin req'd)
python setup.py install --user # install to user's package area

More information is available in the Installing Python Modules page.

An important option in the installation process is where to put the package files. We’ve seen the --user option in pip install and python setup.py install. What’s up with that? In general, if you don’t have to you should not use --user, but see the discussion in Multiple Pythons on your computer for a reason you might.

WITH --user

Packages get installed in a local user-owned directory when you do something like either of the following:

pip install --user asciitable
python setup.py install --user

This puts the packages into:

Mac      ~/Library/Python/2.x/lib/python/site-packages
Linux    ~/.local/lib/python2.x/site-packages
Windows  %APPDATA%/Python/Python2x/site-packages

On Mac, if you did not use the EPD Python Framework, you may see user packages within ~/.local/lib as on Linux. This depends on whether Python is installed as a MacOS Framework or not.
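Rather than guessing which of these locations applies, you can ask Python directly. This short sketch uses the standard-library site module, which reports the user package area for whichever interpreter is running it:

```python
import site

# Directory where "pip install --user" and "setup.py install --user"
# place packages for this particular Python installation.
user_site = site.getusersitepackages()
print(user_site)

# Whether this interpreter will actually look in the user directory
# (it can be disabled, for example inside some virtual environments).
print(site.ENABLE_USER_SITE)
```

The exact path printed depends on your platform and Python version.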

WITHOUT --user

This option may require root or admin privilege because the package will be installed in the system area instead of your own local directories. For most astronomers running on a single-user machine this is a good option.

Installing this way has the benefit of making the package available for all users of the Python installation, but has the downside that it is a bit more difficult to back out changes if required.

How do I find a package once installed?

Finding the file associated with a package or module is simple. Just use the help command in IPython:

import scipy
help(scipy)
 
This gives something like:
 
NAME
    scipy
 
FILE
    /usr/local/lib/python2.6/site-packages/scipy/__init__.py
 
DESCRIPTION
    SciPy: A scientific computing package for Python
    ================================================
 
    Documentation is available in the docstrings and
    online at http://docs.scipy.org.
    ...
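If you only want the file location without the rest of the help text, the module's __file__ attribute holds the same information. A minimal sketch, using the standard-library json package as a stand-in for scipy (module_location is a hypothetical helper name):

```python
import importlib


def module_location(module_name):
    """Import a module by name and return the file it was loaded from, if any."""
    module = importlib.import_module(module_name)
    # Built-in modules compiled into the interpreter have no __file__.
    return getattr(module, "__file__", None)


location = module_location("json")
print(location)
```

For a package, the path points at the __init__.py file inside the package directory, as in the FILE entry shown above.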

There is no simple and fully consistent way to uninstall a package. The Python community is working on this one. In most simple cases, however, you can just delete the module file or directory that is revealed by the technique shown above.

You can also use pip uninstall <packagename> to uninstall the package, but make sure to check that all the files are really deleted.

If you attempt to install a package but it does not work, your basic options are:

  • Dig in your heels and start reading the error messages to see why it is unhappy. Often when you find a specific message it’s time to start googling by pasting in the relevant parts of the message.
  • Send a message containing a detailed description of the installation error to a dedicated mailing list.

The official reference on Modifying Python’s Search Path gives all the details. In summary:

When the Python interpreter executes an import statement, it looks for modules on a search path. A default value for the path is configured into the Python binary when the interpreter is built. You can determine the path by importing the sys module and printing the value of sys.path:

$ python
Python 2.2 (#11, Oct  3 2002, 13:31:27)
[GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-112)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/local/lib/python2.3', '/usr/local/lib/python2.3/plat-linux2',
'/usr/local/lib/python2.3/lib-tk', '/usr/local/lib/python2.3/lib-dynload',
'/usr/local/lib/python2.3/site-packages']
>>>

Within a script it is possible to adjust the search path by modifying sys.path, which is just a Python list. Generally speaking you will want to put your path at the front of the list using insert:

import sys
sys.path.insert(0, '/my/path/python/packages')

You can also add paths to the search path using the PYTHONPATH environment variable.
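To see the search path mechanism in action, the following sketch writes a tiny module into a temporary directory, puts that directory at the front of sys.path, and then imports the module. The names my_tools and greet are invented for this example:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway directory containing a one-function module.
package_dir = Path(tempfile.mkdtemp())
(package_dir / "my_tools.py").write_text(
    "def greet():\n"
    "    return 'hello from my_tools'\n"
)

# Put the directory at the front of the search path so it is checked first.
sys.path.insert(0, str(package_dir))

# Tell the import system to notice files created after Python started.
importlib.invalidate_caches()

my_tools = importlib.import_module("my_tools")  # same effect as "import my_tools"
message = my_tools.greet()
print(message)
```

Listing the same directory in PYTHONPATH before starting Python would make my_tools importable without touching sys.path in the script.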

This text is based on a chapter from Practical Python for Astronomers by Tom Aldcroft, Tom Robitaille, Brian Refsdal and Gus Muench (Smithsonian Astrophysical Observatory, 2011).