Monday, July 20, 2009

How to define a main entry point into a Python program

Here's a common pattern you'll find in many Python programs:


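import some.modules  # placeholder for whatever your program imports

def ineSomeFunctions(whatever):
    pass

class Whatever:
    pass

def main():
    ineSomeFunctions(Whatever())


# True only when this file is the one invoked on the command line.
if __name__ == '__main__':
    main()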


This works because Python sets the global __name__ to "__main__" while evaluating the code in the file invoked on the command line. There is a problem, though: it also puts all of those functions and classes into a module named "__main__". Sometimes this isn't an issue, but it tends to become one, for example when other code imports the same file under its real name or when you serialize instances of those classes.
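To see the naming effect concretely, here is a minimal sketch. It assumes a throwaway file called mymodule.py written to a temporary directory, and it uses the standard library's runpy.run_path with run_name="__main__" as a stand-in for invoking that file from the command line. The file contents are illustrative only.

```python
import os
import runpy
import tempfile

# Hypothetical module contents: a single class definition.
SOURCE = "class Whatever:\n    pass\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mymodule.py")
    with open(path, "w") as f:
        f.write(SOURCE)
    # run_name="__main__" mimics what `python mymodule.py` does to __name__.
    namespace = runpy.run_path(path, run_name="__main__")

# The class records the module name that was in effect when it was defined.
whatever_module = namespace["Whatever"].__module__
print(whatever_module)
```

The class reports "__main__" as its module even though the file is named mymodule.py, which is the mismatch the rest of this post is about.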

So what should you do instead? This:


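# This file is mymodule.py. When it is run as a script, import it again
# under its real name and use that copy instead of the __main__ one.
if __name__ == '__main__':
    import mymodule
    raise SystemExit(mymodule.main())

import some.modules  # placeholder for whatever your program imports

def ineSomeFunctions(whatever):
    pass

class Whatever:
    pass

def main():
    ineSomeFunctions(Whatever())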


It's probably possible to do even better than this, but even this simple change buys a lot - suddenly no more __main__ wackiness. So, do it this way!
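If you want to check the effect for yourself, here is a rough sketch. It assumes a temporary mymodule.py that uses the guard-at-the-top layout, with a main() that prints the module name of Whatever and returns 3 as an arbitrary, made-up exit code. The file is run in a child interpreter so the __main__ handling is the real thing.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical contents of mymodule.py, following the layout in this post.
SOURCE = textwrap.dedent("""\
    if __name__ == '__main__':
        import mymodule
        raise SystemExit(mymodule.main())

    class Whatever:
        pass

    def main():
        # Report which module the class ended up belonging to.
        print(Whatever.__module__)
        return 3  # arbitrary exit code chosen for this demo
""")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mymodule.py")
    with open(path, "w") as f:
        f.write(SOURCE)
    # Equivalent to running `python mymodule.py` from that directory.
    result = subprocess.run(
        [sys.executable, path],
        capture_output=True, text=True, cwd=tmp,
    )

printed_name = result.stdout.strip()
exit_code = result.returncode
print(printed_name, exit_code)
```

The child process prints mymodule rather than __main__, and its exit status is whatever main() returned, since raise SystemExit(...) passes that value along.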

7 comments:

  1. I'm not sure how this could become a problem either... Would you please give us an example?

  2. In the solution you offer, are there two different files? The first one would end after the 'raise' and the second one begins with the 'import' but we do not see it with this blog engine.

    Does this solve the problem that the buildout entry point solves? http://www.buildout.org/docs/recipe.html#packaging-recipe

  3. I think he's saying you can do it with one file. That file is "mymodule.py", and that conditional at the top will trigger when mymodule.py is being evaluated as __main__, and import it as "mymodule" instead.

    It sounds a little weird that Python would evaluate the same module twice for an import, as mymodule and __main__, but perhaps that's one of the quirks that this is attempting to defend against.

  4. Hi Devin,

    I agree completely. Mutable global state is a bad, bad thing. :) Part of my motivation for this post, though, was to suggest a very simple alteration to a common idiom which results in a slight improvement in the resulting behavior. On the other hand, it's a *big* job to educate someone enough so that they realize they should stop relying on globals.

    Introducing a second source file is also a good way to address this. But again, that's a slightly larger change than the one I proposed here. I wholeheartedly endorse that approach, but didn't cover it here because it's not quite as simple.

    By the way, there's also another class of problems that this change is meant to address. Even if you don't have mutable globals, defining functions and classes in the __main__ module gives them a funny name - __main__.Whatever. Using the definition imported from mymodule fixes that problem as well. This most often comes up as an issue when someone tries to pickle or otherwise serialize something from a __main__ module. A name like __main__.Whatever is much less likely to result in something that's recoverable, as compared to mymodule.Whatever.

    This problem is also fixed by your suggested solutions, of course. So I'll again second everything you said. :)

    By the way, I certainly do find this to be a common idiom. The Python standard library itself contains 674 occurrences of it! I think this is quite unfortunate.

  5. Hi Alan,

    I think Devin did a good job explaining one of the possible problems - two copies of some global mutable state diverging - in his comment above. In my reply to him, I mentioned another - that of naming and how this can interact poorly with serialization. Does that help?

  6. Yep, Kevin inferred my intent correctly. There is only one file in the solution I proposed. That's why it's raising SystemExit - to avoid having execution continue on through the rest of the file (plus it adds a feature - letting main specify the exit code, but that's secondary).

  7. I haven't used buildout, but I suspect that the entry point feature does solve this problem as well, and probably better than I did in my post. :) Since the buildout entry point takes responsibility for handling the script, it removes the need to have a __main__ check in your code, and I assume also removes the case where the implementation module is evaluated as the __main__ module.

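The diverging-state problem mentioned in the comments above can be sketched roughly like this. It assumes a temporary mymodule.py written with the common idiom (guard at the bottom) plus a made-up module-level list named registry. The import inside main() stands in for any other code that imports the file by its real name.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical contents of mymodule.py, using the common idiom.
SOURCE = textwrap.dedent("""\
    registry = []  # made-up mutable global, for illustration only

    def main():
        # Importing by name evaluates this file a second time,
        # producing a separate module with its own registry.
        import mymodule
        registry.append("added by __main__")
        print(len(registry), len(mymodule.registry))

    if __name__ == '__main__':
        main()
""")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mymodule.py")
    with open(path, "w") as f:
        f.write(SOURCE)
    # Equivalent to running `python mymodule.py` from that directory.
    result = subprocess.run(
        [sys.executable, path],
        capture_output=True, text=True, cwd=tmp,
    )

output = result.stdout.strip()
print(output)
```

The two lengths differ (1 and 0) because __main__ and mymodule are separate module objects, each with its own copy of the list.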