Friday, November 11, 2005

There is an easy way to determine if a Python RPC library is broken

If the library employs Python's pickle module, it is broken. Period.

When will people learn that pickle is not suitable for this task? "Oh, it looks so easy." "Oh, it's so fast, just look at those objects fly." "Oh it will core my process when handling a maliciously constructed string, there goes my server." "Oh, it allows arbitrary code to be executed by a remote party, woops there goes my credit card database."

Not convinced? Run this code on your computer:

import pickle
pickle.loads("cposix\nsystem\np0\n(S'cat /etc/passwd'\np1\ntp2\nRp3\n.")

Wake up. This is not news. The pickle documentation explicitly points out the fact that it is not intended to be used in this fashion (although frankly, this warning could be a little closer to the beginning of the pickle documentation). Stop doing it. Stop releasing software that does it. Just stop, already.


  1. We use pickle for RPC. The servers are firewalled off from the outside world, and check a crypto HMAC on the message before attempting to unpickle it, to be sure that the message was generated by 'us'.

  2. I accept that pickle/cPickle are a pile of smelly *censored*.

    It's also a fact that pickles are by default unsafe.

    But I think Jp knows how to use pickle/cPickle safely, and ignores it for the argument sake:

    import cPickle as pickle
    import cStringIO as StringIO
    import sys

    allowed_modules = {"mymodule":set(["myClass"])}

    def checkfunction(module, klass):
    if allowed_modules.has_key(module) and klass in allowed_modules[module]:
    mod = __import__(module, {}, {}, ["__all__"])
    _class = getattr(mod, klass)
    return _class
    raise ValueError("Not supported: %r/%r" % (module, klass))

    unpickler = pickle.Unpickler(StringIO.StringIO("cposix\nsystem\np0\n(S'cat /etc/passwd'\np1\ntp2\nRp3\n."))
    unpickler.find_global = checkfunction
    print unpickler.load()

  3. Pickle has other problems beyond creating unnecessary vulnerabilities to the guy you're receiving serialized objects from. Glyph has written cogent posts in the past about how automatically serializing things without thinking about them makes it very difficult to maintain backwards-compatibility, and even simple things like rearranging your namespace hierarchy can introduce such problems. And it doesn't really support incremental serialization.

    But I still think it's very useful outside the context of RPC. This little OED-browsing app uses it to log user-entered guideword information to a file; pickle.dump((first_word, last_word, pageno), self.logfile) is considerably simpler than worrying about whether first_word contains newlines or spaces or colons or whatever, and I don't have to explicitly convert the page number back into a number when I read it back out of the file. (Please don't put newlines in the guidewords unless you're going to delete them shortly afterwards.)

    Presumably I could use Jelly for this too --- that program already depends on Twisted and Nevow --- but I haven't yet taken the time to learn how.

  4. That sounds like an eminently reasonable use of Pickle. If the scope of the storage requirements were to grow, I might re-examine the choice, but pickle is like totally convenient and using it as you described doesn't introduce any security issues (that I know of ;).

    Converting to Jelly (actually Jelly + Banana, or perhaps just Banana) would probably be easy (replace pickle.dumps() with banana.encode(jelly.jelly()) or just banana.banana(), and loads() with jelly.unjelly(banana.decode()) or just banana.decode()), but I don't know that I see much value in it.

  5. I don't think it's that easy to core Python with a pickle string is it?

    Certainly agree with the rest.

  6. It's not easy, but in the course of developing with cPickle/Python 2.3, I've done it twice.

  7. Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pickle
    >>> pickle.loads("cposix\nsystem\np0\n(S'cat /etc/passwd'\np1\ntp2\nRp3\n.")
    Traceback (most recent call last):
    File "", line 1, in ?
    File "C:\Python24\lib\", line 1394, in loads
    return Unpickler(file).load()
    File "C:\Python24\lib\", line 872, in load
    File "C:\Python24\lib\", line 1104, in load_global
    klass = self.find_class(module, name)
    File "C:\Python24\lib\", line 1138, in find_class
    ImportError: No module named posix
    >>> pickle.loads("cposix\nsystem\np0\n(S'cat /etc/passwd'\np1\ntp2\nRp3\n.")

    but it works surely on unix