Friday, December 21, 2007

Filesystem structure of a Python project

Do:
  • name the directory something related to your project. For example, if your project is named "Twisted", name the top-level directory for its source files Twisted. When you do releases, you should include a version number suffix: Twisted-2.5.
  • create a directory Twisted/bin and put your executables there, if you have any. Don't give them a .py extension, even if they are Python source files. Don't put any code in them except an import of and call to a main function defined somewhere else in your project. (Slight wrinkle: since on Windows, the interpreter is selected by the file extension, your Windows users actually do want the .py extension. So, when you package for Windows, you may want to add it. Unfortunately there's no easy distutils trick that I know of to automate this process. Considering that on POSIX the .py extension is only a wart, whereas on Windows the lack is an actual bug, if your userbase includes Windows users, you may want to opt to just have the .py extension everywhere.)
  • If your project is expressible as a single Python source file, then put it into the directory and name it something related to your project. For example, Twisted/twisted.py. If you need multiple source files, create a package instead (Twisted/twisted/, with an empty Twisted/twisted/__init__.py) and place your source files in it. For example, Twisted/twisted/internet.py.
  • put your unit tests in a sub-package of your package (note - this means that the single Python source file option above was a trick - you always need at least one other file for your unit tests). For example, Twisted/twisted/test/. Of course, make it a package with Twisted/twisted/test/__init__.py. Place tests in files like Twisted/twisted/test/test_internet.py.
  • add Twisted/README and Twisted/setup.py to explain and install your software, respectively, if you're feeling nice.
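As a rough illustration of the "Do" list, here is a minimal sketch that creates such a layout on disk. The names Example, example, main.py and make_skeleton are hypothetical placeholders chosen for this sketch, not anything Twisted ships, and the README and setup.py contents are stubs only.

```python
import tempfile
from pathlib import Path

# Hypothetical launcher body: nothing but an import of and a call to main().
LAUNCHER = (
    "#!/usr/bin/env python\n"
    "from {package}.main import main\n"
    "main()\n"
)


def make_skeleton(root, project="Example", package="example"):
    """Create the layout described above under root; return the project dir."""
    top = Path(root) / project
    pkg = top / package
    tests = pkg / "test"
    tests.mkdir(parents=True)  # creates Example/example/test/ and its parents
    (top / "bin").mkdir()

    # Empty __init__.py files make the package and its test sub-package.
    (pkg / "__init__.py").write_text("")
    (tests / "__init__.py").write_text("")
    (pkg / "main.py").write_text("def main():\n    print('hello')\n")
    (tests / "test_main.py").write_text(
        "# unit tests for {}.main go here\n".format(package)
    )

    # The executable has no .py extension and contains no real logic.
    script = top / "bin" / package
    script.write_text(LAUNCHER.format(package=package))
    script.chmod(0o755)

    # Stubs only; real contents depend on the project.
    (top / "README").write_text("Describe the project here.\n")
    (top / "setup.py").write_text("# installation script goes here\n")
    return top


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        top = make_skeleton(tmp)
        for path in sorted(top.rglob("*")):
            print(path.relative_to(top))
```

Running it lists the created files relative to the project directory, which is a quick way to compare a real checkout against the recommendations above.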
Don't:
  • put your source in a directory called src or lib. This makes it hard to run without installing.
  • put your tests outside of your Python package. This makes it hard to run the tests against an installed version.
  • create a package that only has a __init__.py and then put all your code into __init__.py. Just make a module instead of a package; it's simpler.
  • try to come up with magical hacks to make Python able to import your module or package without having the user add the directory containing it to their import path (either via PYTHONPATH or some other mechanism). You will not correctly handle all cases and users will get angry at you when your software doesn't work in their environment.
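To make the last point concrete, here is a small sketch of running a package's tests straight from a source checkout by putting the checkout directory on PYTHONPATH, with no installation step and no import hacks. The package name layoutdemo, the add function and the helper run_tests_from_checkout are hypothetical and exist only for this example; it assumes Python 3.7 or later.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_tests_from_checkout(checkout, test_module):
    """Run test_module in a child Python with checkout added to PYTHONPATH."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    # Prepend the checkout so it wins over any installed copy.
    env["PYTHONPATH"] = (
        str(checkout) if not existing else str(checkout) + os.pathsep + existing
    )
    return subprocess.run(
        [sys.executable, "-m", "unittest", test_module],
        env=env,
        capture_output=True,
        text=True,
    )


def demo():
    """Build a throwaway checkout with tests inside the package and run them."""
    with tempfile.TemporaryDirectory() as tmp:
        checkout = Path(tmp) / "LayoutDemo"
        tests = checkout / "layoutdemo" / "test"
        tests.mkdir(parents=True)
        (checkout / "layoutdemo" / "__init__.py").write_text("")
        (tests / "__init__.py").write_text("")
        (checkout / "layoutdemo" / "internet.py").write_text(
            "def add(a, b):\n    return a + b\n"
        )
        (tests / "test_internet.py").write_text(
            "import unittest\n"
            "from layoutdemo.internet import add\n"
            "\n"
            "class AddTests(unittest.TestCase):\n"
            "    def test_add(self):\n"
            "        self.assertEqual(add(1, 2), 3)\n"
        )
        return run_tests_from_checkout(checkout, "layoutdemo.test.test_internet")


if __name__ == "__main__":
    result = demo()
    print("unittest exit code:", result.returncode)
```

Because the tests live inside the package, the same module name (layoutdemo.test.test_internet here) would also work against an installed copy; only the way the package gets onto the import path differs.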

30 comments:

  1. Yay added to memories.

  2. That was awesome. Thanks. I tend to use Eclipse with PyDev, and having the src/ dir can get annoying.

  3. Thanks! Very timely post for me!

    http://pytomation.sourceforge.net

  4. foa, thanks a lot.
    Your article is the only one I could find on python projects structure.
    It is weird there are no other resources on the topic. Or is there anything else?

    Regarding "ProjName\bin: ...Don't give them a .py extension, even if they are Python source files": I do not see a good reason for this recommendation under Windows, as you cannot just mark a file as executable. Am I missing the point, or what would be your suggestion?

    Thanks,
    IM

  5. What do you recommend for a project that has two modules: a main module and a testing module?

  6. Directories and empty files are pretty cheap, so I suggest creating a package in that case. It makes it obvious where everything goes in the repository and how it can be installed. If you want to avoid having an extra dot in your APIs, then you can still import things into the package namespace to do this (i.e., instead of foo.bar.baz, you can put "from foo.bar import baz" into foo/__init__.py and then no one will even notice you used a package).

    You *can* avoid making a package, just have foo.py and test_foo.py, but then you have two top-level names instead of one (so you have to specify twice as many things in setup.py, etc). If you do this though, then you might be tempted not to create a package when you add your third file... ;) So I'd just create a package.

  7. It's bothersome to have so much code duplicated in every script. Fortunately, this can be factored into a reusable location, but unfortunately doing so probably complicates the packaging story. I'm not sure which unfortunateness is worse.

    The refactoring, by the way, is this. Put a module in the bin/ directory that does the path mangling on import. Then, import it at the top of each script. This works, because the directory the script is in is added to sys.path by Python. But then you have to /distribute/ different scripts than sit in your repository, or distribute a dummy module that does something different from the VCS'd one. And then you get into trouble with people who don't know this, or you have to jump through crazy distutils hoops that break in the next version of Python, or whatever.

    It's sad. These days I don't rely on the preamble (at least not for Twisted scripts), because I use an environment management tool (Combinator) which makes sure my path is set up right. That's probably not the ultimate solution, but it goes a long way.

    Maybe someone else can come up with the real solution.

  8. The advice to *not* put the '.py' extension on executable scripts is bad for Windows. Without the file extension the scripts won't be executable on Windows.

  9. Use virtualenv and have your development version on the path.

  10. True. Windows is a very sad place. Most likely, if you want scripts on Windows, you should do them with batch files or some other native Windows format. Of course, it's more work. I think there are probably some nice features you get by doing this (something along the lines of resistance to Python upgrades or environment isolation) but actually describing them in detail (or investigating to find out if they actually exist) feels too difficult right now. ;)

  11. Is virtualenv a real project yet? The only time I tried to use it, it broke, and then I couldn't find an issue tracker, a website, a mailing list, or even an IRC channel for it. I ended up mailing Ian directly and never heard back.

  12. Well, providing them as a .py file works fine. You just set up your PATH and PATHEXT environment variables to handle them. In this context your advice not to use the .py extension for scripts would screw this royally, and so classifies as what I like to call 'bad advice'. :-)

  13. You're right. Provide them with the .py on Windows, and without elsewhere. I'll update the post to make this distinction.

  14. There is still not a lot to be found on Python project structure. I've been searching for the past day or so and still can't find anything apart from this post. I'm very intrigued as to why nowhere else contains info like this.

  15. Every now and then I re-read this file and think it should be added to docs.python.org. Anyway, to avoid duplicating a preamble I usually create a 'launcher' file and symlink it with several names. The preamble uses argv[0] to decide what it should run from the 'main' module or similar. This makes the preamble a bit more complex, but also means it can easily be anywhere on the filesystem (most sensibly with the rest of the project) and doesn't repeat itself.

  16. Thanks for the great article. For those that don't know, there's also discussion of packages and modules here:

    http://docs.python.org/py3k/tutorial/modules.html

  17. I just looked up this post again. A classic.

    Part of the problem with using a package for 'single module and tests' is that you have to make up a second, irrelevant name.

    e.g.
    * foo/__init__.py
    * foo/bar.py
    * foo/tests/test_foo.py

    And, maybe that should be test_bar.py.

    I guess you *could* get by with:
    * foo/__init__.py
    * foo/_foo.py
    * foo/tests/test_foo.py

    Which has the nice side-effect of communicating that direct imports from foo._foo are unsupported.

  18. If you put your executables in a "bin/" subdir, you'll probably want batch/shell scripts at the top level so that people can just double-click something in their file manager.

  19. What stops people from clicking on things in the bin folder? :)

    Also, this is only about the source tree structure. The installed layout can include icons on the desktop or in an application menu or on the deskbar or wherever (and perhaps installed layouts merit their own post, but they're a lot harder and platform specific).

  20. "Don't put your source in a directory called src or lib. This makes it hard to run without installing." Could you elaborate?

  21. Thanks. I like the idea of putting your executables in a Twisted/bin because that way you can give your main executable the same name ("twisted") as the name of your package folder ("twisted").

    Question: is there a way to facilitate people running the executables directly from the bin more easily -- without first installing your project (e.g. in the case of developers working on your project)? Not everyone is comfortable or used to managing their PYTHONPATH, etc.

  22. Piotr, the idea is that the source layout of your code - your packages and modules - should not get in the way of using the code from a source checkout. Since Python uses directories to represent packages, this means you don't want extra directories hanging around that aren't meant to be Python packages. My top-level namespace isn't "src", it's "twisted", so my source belongs in a directory named "twisted". Sometimes people name their top-level directory "src" or "lib" and let the installer fix the name in the installed copy - this is the worst, since it means the source is completely unusable until you install it (it won't have the right top-level name; instead of "import twisted" I would have to write "import lib", and that breaks _completely_ as soon as two different projects decide this is a good idea). Slightly less bad is to have something like "src/twisted/". At least I can use this code via its normal names if I add "src" to sys.path somehow (probably via PYTHONPATH). However, it's still pointless extra work.

  23. Chris, the solution Twisted takes is to have the scripts understand their placement in the project and do the sys.path mangling themselves. This used to be duplicated in each script, but we recently refactored this. You can find the new version here: . The _preamble module (which you can find in the same directory) has all the logic for figuring out what to add to sys.path. As a bonus, we don't install _preamble.py, so the import fails after Twisted has been installed - exactly what you want, since an installed Twisted has no need for sys.path management. I won't say that this is a perfect solution, but in practice it seems to be working okay for us.

  24. I think the post should be updated to recommend using entry_points in setuptools/Distribute rather than placing .py files in a 'bin' folder. It avoids all the mess of Unix and Windows using different conventions. You might want to have an equivalent .py file in the bin folder anyway so you can run without installing, though I frankly don't see the need... why not just install the thing? You can just do "pip uninstall foo" afterward anyway if you don't want to keep it.

  25. Thanks for your input, furrykef. I disagree that entry_points should be recommended in general. There are a lot of issues with setuptools (eg, unmaintained) and distribute (eg, planned abandonment). I also generally don't recommend pip. It can be a tolerable fallback, but it's not a replacement for proper packaging (eg, it often does not correctly install things, and sometimes it catastrophically fails to correctly uninstall them).

    These projects are all trying to provide some useful functionality, but they're not really there yet. All the other recommendations in this post I can make wholeheartedly. Anything about setuptools, distribute, and pip I would have some reservations about. Perhaps someday there'll be something I feel good enough about in that space to recommend though.

  26. This is an excellent post. How would I adapt this for situations where I have other file types in addition to Python code? For example, I am using the PIL ImageFont module to draw text, for which I have some fonts (*.pil files). The fonts are used by a module I have stored in "twisted". Should I just put the fonts in the "twisted" module directory? I also have some static content such as HTML pages, CSS, JavaScript and images. How should I handle this? So far I have the following structure:

    Twisted/bin/ # main python script
    Twisted/twisted/ # for modules and pil files
    Twisted/static/css
    Twisted/static/html
    Twisted/static/images
    Twisted/static/js

    Thanks in advance for the comments.

    Replies
    1. Hi Vince,

      Unfortunately, installing non-source files is something distutils is extremely bad at. It's lots of extra work to get them included, and then every platform has basically gone and made a different decision about where the files should actually be installed to. This means that once you have non-Python files you're installing, no matter what you do, you'll probably have people complaining at you that you're doing it wrong. Ideally what you would do is create a platform-specific package for each platform you intend to support (which is, of course, a huge amount of work). Sorry for this non-answer! If you come across a real solution, I'd love to hear about it. Without knowing what a good solution is, I'm not going to recommend any particular layout in the source tree, since one might be related to the other. Lacking an actual solution, I guess I'd just suggest a layout that makes it easy to test and develop the software.

  27. I have a question about the bin folder.
    If I put my start script in the bin folder, how will I import my module which is inside the twisted folder without messing around with the libraries path?

    Replies
    1. Two possibilities.

      First, you *should* mess around with the libraries path (sys.path). You want to make a new Python package available to the Python import system, that means you should have a new entry on sys.path (during development, at least, or sometimes for specialized cases of installation where you are not installing to something like site-packages; once you install to site-packages, of course, the package is importable and you don't have to do anything else).

      This is what I do with 99% of the packages I develop. It works great.

      The second option is what Twisted itself has decided to do. The decision was made about ten years ago, and so now we're mostly just carrying history along with us. I'm not sure we'd choose to do the same thing again, if we had to make the decision today. Anyhow, the option is to do the messing around inside the script. You can see how we implemented this in these two files: http://twistedmatrix.com/trac/browser/trunk/bin/manhole and http://twistedmatrix.com/trac/browser/trunk/bin/_preamble.py

  28. This was exactly the advice I needed, thank you.
