```python
# import some.modules  (placeholder for whatever the script really imports)

def ineSomeFunctions(whatever):
    pass

class Whatever:
    pass

def main():
    ineSomeFunctions(Whatever())

if __name__ == '__main__':
    main()
```
This works because the global `__name__` is set to `"__main__"` when evaluating the code in the file invoked on the command line. This has a problem, though. It also puts all of those functions and classes into a module named `"__main__"`. Sometimes this isn't an issue, but usually it will become one. So what should you do instead? Put this in `mymodule.py`:
```python
if __name__ == '__main__':
    import mymodule
    raise SystemExit(mymodule.main())

# import some.modules  (placeholder for whatever the script really imports)

def ineSomeFunctions(whatever):
    pass

class Whatever:
    pass

def main():
    ineSomeFunctions(Whatever())
```
It's probably possible to do even better than this, but even this simple change buys a lot: suddenly there's no more `__main__` wackiness. So, do it this way!
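To see the difference for yourself, here is a small experiment, assuming Python 3.7 or later. It writes the same body into a scratch `mymodule.py` twice, once with the usual guard at the bottom and once with the guard at the top as above, runs each as a script, and prints the module the class used by `main()` belongs to. The helper `run_script` and the `BODY`/`NAIVE`/`FIXED` names are hypothetical, chosen only for this sketch.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Shared body of mymodule.py; main() reports the module its class came from.
BODY = textwrap.dedent("""\
    class Whatever:
        pass

    def main():
        print(type(Whatever()).__module__)
""")

# The common layout: guard at the bottom, so main() runs inside __main__.
NAIVE = BODY + "\nif __name__ == '__main__':\n    main()\n"

# The layout from the post: guard at the top, defer to the importable copy.
FIXED = (
    "if __name__ == '__main__':\n"
    "    import mymodule\n"
    "    raise SystemExit(mymodule.main())\n\n"
) + BODY


def run_script(source):
    """Save source as mymodule.py in a scratch directory, run it, return stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "mymodule.py"
        path.write_text(source)
        result = subprocess.run(
            [sys.executable, str(path)], capture_output=True, text=True, check=True
        )
        return result.stdout.strip()


print(run_script(NAIVE))  # __main__
print(run_script(FIXED))  # mymodule
```

Only the position of the guard changes; the rest of the file is identical in both runs.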
I'm not sure either how this could become a problem... would you please give us an example?
In the solution you offer, are there two different files? The first one would end after the 'raise' and the second one would begin with the 'import', but we can't see the split because of this blog engine.
Does this solve the problem that the buildout entry point solves? http://www.buildout.org/docs/recipe.html#packaging-recipe
I think he's saying you can do it with one file. That file is "mymodule.py", and that conditional at the top will trigger when mymodule.py is being evaluated as __main__, and import it as "mymodule" instead.
It sounds a little weird that Python would evaluate the same module twice for an import, as mymodule and __main__, but perhaps that's one of the quirks that this is attempting to defend against.
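The double evaluation is easy to observe. In this sketch (assuming Python 3.7+), the module body announces each time it runs, and `main()` does an `import mymodule`, standing in for any other module that imports this file by name. With the guard at the bottom the body runs twice and there are two distinct `Whatever` classes; with the guard at the top the body runs once. The `run_script` helper and the constant names are hypothetical.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Shared body: announce each evaluation, then compare the two class objects.
BODY = textwrap.dedent("""\
    print("evaluating module body as", __name__)

    class Whatever:
        pass

    def main():
        import mymodule  # stands in for any other code doing "import mymodule"
        print(Whatever is mymodule.Whatever)
""")

NAIVE = BODY + "\nif __name__ == '__main__':\n    main()\n"
FIXED = (
    "if __name__ == '__main__':\n"
    "    import mymodule\n"
    "    raise SystemExit(mymodule.main())\n\n"
) + BODY


def run_script(source):
    """Run source as the script mymodule.py in a scratch directory; return stdout lines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "mymodule.py"
        path.write_text(source)
        result = subprocess.run(
            [sys.executable, str(path)], capture_output=True, text=True, check=True
        )
        return result.stdout.splitlines()


print(run_script(NAIVE))
# ['evaluating module body as __main__', 'evaluating module body as mymodule', 'False']
print(run_script(FIXED))
# ['evaluating module body as mymodule', 'True']
```

Python isn't importing the same module twice so much as treating `__main__` and `mymodule` as two unrelated modules that happen to come from one file.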
Hi Devin,
I agree completely. Mutable global state is a bad, bad thing. :) Part of my motivation for this post, though, was to suggest a very simple alteration to a common idiom that results in a slight improvement in behavior. On the other hand, it's a *big* job to educate someone enough that they realize they should stop relying on globals.
Introducing a second source file is also a good way to address this. But again, that's a slightly larger change than the one I proposed here. I wholeheartedly endorse that approach, but didn't cover it here because it's not quite as simple.
By the way, there's also another class of problems that this change is meant to address. Even if you don't have mutable globals, defining functions and classes in the `__main__` module gives them a funny name: `__main__.Whatever`. Using the definition imported from mymodule fixes that problem as well. This most often comes up when someone tries to pickle or otherwise serialize something from a `__main__` module. A reference to `__main__.Whatever` is much less likely than one to `mymodule.Whatever` to be recoverable later, since the program that loads the data has its own, different `__main__`.
This problem is also fixed by your suggested solutions, of course. So I'll again second everything you said. :)
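Here is a sketch of the pickling case, assuming Python 3.7+. Each variant of `mymodule.py` pickles a `Whatever` instance to stdout; a second, fresh interpreter started in the same scratch directory then tries to unpickle it. With the guard at the bottom the pickle names `__main__`, and the loader fails because its own `__main__` has no `Whatever`; with the guard at the top the pickle names `mymodule`, which the loader can simply import. The helper `dump_and_reload` and the constants are hypothetical.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Shared body: main() writes a pickled Whatever instance to stdout.
BODY = textwrap.dedent("""\
    import pickle
    import sys

    class Whatever:
        pass

    def main():
        sys.stdout.buffer.write(pickle.dumps(Whatever()))
""")

NAIVE = BODY + "\nif __name__ == '__main__':\n    main()\n"
FIXED = (
    "if __name__ == '__main__':\n"
    "    import mymodule\n"
    "    raise SystemExit(mymodule.main())\n\n"
) + BODY

# Code for the second interpreter: just try to unpickle what arrives on stdin.
LOADER = "import pickle, sys; pickle.load(sys.stdin.buffer)"


def dump_and_reload(source):
    """Run source as mymodule.py to get a pickle, then unpickle it in a new process.

    Returns (pickle_bytes, loader_exit_status).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "mymodule.py"
        path.write_text(source)
        data = subprocess.run(
            [sys.executable, str(path)], capture_output=True, check=True
        ).stdout
        loader = subprocess.run(
            [sys.executable, "-c", LOADER], input=data, capture_output=True, cwd=tmp
        )
        return data, loader.returncode


data, status = dump_and_reload(NAIVE)
print(b"__main__" in data, status)  # True 1 (AttributeError in the loader)
data, status = dump_and_reload(FIXED)
print(b"mymodule" in data, status)  # True 0
```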
By the way, I certainly do find this to be a common idiom. The Python standard library itself contains 674 occurrences of it! I think this is quite unfortunate.
Hi Alan,
I think Devin did a good job explaining one of the possible problems - two copies of some global mutable state diverging - in his comment above. In my reply to him, I mentioned another - that of naming and how this can interact poorly with serialization. Does that help?
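For a runnable example of the diverging-globals problem, here is a sketch assuming Python 3.7+. The module keeps a mutable global list, and `main()` updates it once directly and once through `import mymodule`, the way a helper module elsewhere in a program might. With the guard at the bottom the two updates land in two separate lists; with the guard at the top there is only one. The names `REGISTRY`, `register`, and `run_script` are hypothetical.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Shared body: a mutable global reached both directly and via "import mymodule".
BODY = textwrap.dedent("""\
    REGISTRY = []

    def register(name):
        REGISTRY.append(name)

    def main():
        register("first")
        import mymodule
        mymodule.register("second")
        print(REGISTRY, mymodule.REGISTRY)
""")

NAIVE = BODY + "\nif __name__ == '__main__':\n    main()\n"
FIXED = (
    "if __name__ == '__main__':\n"
    "    import mymodule\n"
    "    raise SystemExit(mymodule.main())\n\n"
) + BODY


def run_script(source):
    """Run source as the script mymodule.py in a scratch directory; return stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "mymodule.py"
        path.write_text(source)
        result = subprocess.run(
            [sys.executable, str(path)], capture_output=True, text=True, check=True
        )
        return result.stdout.strip()


print(run_script(NAIVE))  # ['first'] ['second']
print(run_script(FIXED))  # ['first', 'second'] ['first', 'second']
```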
Yep, Kevin inferred my intent correctly. There is only one file in the solution I proposed. That's why it's raising SystemExit - to avoid having execution continue on through the rest of the file (plus it adds a feature - letting main specify the exit code, but that's secondary).
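The exit-code part works because `SystemExit(None)` exits with status 0 and `SystemExit(n)` with an integer `n` exits with status `n`. A quick check, assuming Python 3.7+; the `TEMPLATE` and `exit_status` names are hypothetical, and the template is just the one-file layout from the post with a trivial `main()`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# One-file layout from the post; {value} becomes main()'s return value.
TEMPLATE = """\
if __name__ == '__main__':
    import mymodule
    raise SystemExit(mymodule.main())

def main():
    return {value}
"""


def exit_status(value):
    """Run mymodule.py with main() returning `value` (a Python expression); return the exit status."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "mymodule.py"
        path.write_text(TEMPLATE.format(value=value))
        return subprocess.run([sys.executable, str(path)]).returncode


print(exit_status("None"))  # 0
print(exit_status("3"))     # 3
```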
I haven't used buildout, but I suspect that the entry point feature does solve this problem as well, and probably better than I did in my post. :) Since the buildout entry point takes responsibility for handling the script, it removes the need to have a __main__ check in your code, and I assume also removes the case where the implementation module is evaluated as the __main__ module.
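Roughly, the entry-point approach looks like the sketch below, assuming Python 3.7+. `mymodule.py` contains no `__main__` check at all, and a separate wrapper script imports `main` by name and exits with its result, so the implementation module is only ever loaded as `mymodule`. The wrapper here is hand-written and hypothetical, as is the `mytool` file name; it is only in the spirit of what an entry point such as `mymodule:main` produces, not the exact file buildout or any other tool generates.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# mymodule.py as plain library code: no __main__ check anywhere.
MYMODULE = textwrap.dedent("""\
    class Whatever:
        pass

    def main():
        print(type(Whatever()).__module__)
        return 0
""")

# Hypothetical stand-in for a generated console-script wrapper.
WRAPPER = textwrap.dedent("""\
    import sys
    from mymodule import main
    sys.exit(main())
""")


def run_wrapper():
    """Put both files in a scratch directory, run the wrapper, return (status, stdout)."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "mymodule.py").write_text(MYMODULE)
        wrapper = Path(tmp, "mytool")
        wrapper.write_text(WRAPPER)
        result = subprocess.run(
            [sys.executable, str(wrapper)], capture_output=True, text=True
        )
        return result.returncode, result.stdout.strip()


print(run_wrapper())  # (0, 'mymodule')
```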