Tuesday, April 5, 2005

socket.recv -- three ways to turn it into recvinto

Inspired by http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/408859

While some people are busy worrying about how to make Python's builtin sockets less efficient, one might wonder if the reverse is possible: how do you make them more efficient? After all, you usually want your program to run more quickly, or tax your CPU less heavily, or consume fewer resources, not the reverse. Fortunately, I have just the solution for you [1]. The approach explored below is to avoid allocating new memory when reading from the socket. Since malloc() is a relatively expensive operation, this will save us a bunch of CPU time, as well as saving us memory by reducing the chances of heap fragmentation and so forth.

  1. Solution the first: readinto

    exarkun@boson:~$ python
    Python 2.4.1 (#2, Mar 30 2005, 21:51:10)
    [GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import socket, array, os
    >>> s = socket.socket()
    >>> s.bind(('', 4321))
    >>> s.listen(3)
    >>> c, a = s.accept()
    >>> buf = array.array('c', '\0' * 50)
    >>> os.fdopen(c.fileno()).readinto(buf)  # [2]
    50
    >>> buf.tostring()
    'apiodjwoaidjaowidjalskdjlaksdjlaksjdawd\r\naiopjwdoa'
    >>> c.recv(10)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    socket.error: (9, 'Bad file descriptor')
    >>>

    As you can see, the handy readinto method of file objects can be used to provide a pre-allocated memory space for a read to use. Unfortunately, it is a file method, not a socket method (also, its documentation recommends strongly against its use, though I can't imagine why!). We can get around this, though, since a file descriptor is just a file descriptor. os.fdopen will happily give us a file object wrapped around the socket we're really interested in. Then it's a simple matter of calling readinto on the resulting file object with an array we have previously allocated.


    "Great!" you say. "Why even bother with the other two examples?" you wonder. Well, there are a few problems. Even if we accept the os.fdopen hack, and even if we do not let the strong words in the file.readinto docstring dissuade us, there's still a tiny problem. file.readinto closes the file descriptor before returning! Damn, there goes our socket. Maybe the next solution will fare better.

  2. Solution the second: recv(2)

    Okay, that stuff with file.readinto was just silly. Let's get serious here. libc already provides the functionality we need here, and has for decades. This is basic BSD sockets 101. Stevens would cry (if he were still with us) if he saw us doing anything else. So let's cut the funny business and just do what a C programmer would do: call recv.

    exarkun@boson:~$ python
    Python 2.4.1 (#2, Mar 30 2005, 21:51:10)
    [GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import dl
    >>> libc = dl.open('libc.so.6')
    >>> import socket, array
    >>> s = socket.socket()
    >>> s.bind(('', 4321))
    >>> s.listen(3)
    >>> c, a = s.accept()
    >>> buf = array.array('c', '\0' * 50)
    >>> libc.call('recv', c.fileno(), buf.buffer_info()[0], 50, 0)
    29
    >>> buf.tostring()
    'aldjiawoidjaskdjlacnwmoqawd\r\n\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
    >>>
    >>> libc.call('recv', c.fileno(), buf.buffer_info()[0], 50, 0)
    30
    >>> buf.tostring()
    'ncbnczmnxbcmznxcbzmnxbcu7wyw\r\n\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

    Sweet. We open libc so we can call recv in it, create a socket as usual, and another array object to act as our pre-allocated memory location. Note we use the buffer_info method this time, because recv() does not expect a "read-write buffer object" (like file.readinto did), but a pointer to a location in memory, which is exactly what buffer_info()[0] gives us. Then we just call recv. Easy as eatin' pancakes. We can even do it twice, demonstrating that recv isn't doing anything ridiculous, like closing the socket for us (I did it with the same array object, overwriting the previous contents, demonstrating that our no-allocation trick is working just fine).
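The dl module used above is gone from current Pythons, so here is a rough equivalent of the same trick with ctypes. Treat it as a sketch under my own assumptions: Python 3 on Linux or a similar POSIX system, a bytearray instead of an array, socket.socketpair() in place of telnet, and a helper named raw_recv that is made up for this example.

```python
import ctypes
import ctypes.util
import os
import socket

# find_library may return None; CDLL(None) then exposes the symbols already
# loaded into the process, which on Linux include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.recv.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.recv.restype = ctypes.c_ssize_t

peer, conn = socket.socketpair()
buf = bytearray(50)
# A ctypes array sharing buf's memory; this is the pointer recv() writes to.
cbuf = (ctypes.c_char * len(buf)).from_buffer(buf)

def raw_recv(sock):
    """Hypothetical helper: call libc recv() straight into buf."""
    n = libc.recv(sock.fileno(), cbuf, len(buf), 0)
    if n < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return n

first = b"aldjiawoidj\r\n"
peer.sendall(first)
n1 = raw_recv(conn)
got1 = bytes(buf[:n1])

# Second call reuses the same memory, overwriting the front of the buffer.
second = b"ncbnczm\r\n"
peer.sendall(second)
n2 = raw_recv(conn)
got2 = bytes(buf[:n2])
print(n1, got1, n2, got2)
```

Declaring argtypes and restype matters here: without them ctypes assumes int arguments and an int return, which is not safe for pointers on 64-bit systems.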

    I know what you're thinking, though. array objects? What the hell can you do with an array object? Well, here's what. All kinds of stuff! Why, you can build one from a string. Or build a string from one. Or, uh, swap the byte order... umm, oh yeah, you can reverse them too. Cool deal, eh? Err, no, maybe not actually... None of those cool string methods are around, unfortunately. You can create a string from the array but that kind of defeats the purpose... in doing so you've just allocated a pile of memory. Nuts. Well, wait, don't give up yet, we may be able to improve upon this situation...
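For comparison, later Pythons offer a mutable buffer type that does carry string-style methods. The sketch below assumes Python 3 and its bytearray and memoryview types, neither of which is in the 2.4 session above; the data is just pasted in to stand for something a recv already wrote.

```python
buf = bytearray(50)
data = b"aldjiawoidjaskdjlacnwmoqawd\r\n"
buf[:len(data)] = data  # pretend a recv filled the front of the buffer

end = buf.find(b"\r\n")            # string-style search, done in place
starts_ok = buf.startswith(b"ald")
line = memoryview(buf)[:end]       # a window onto buf, not a copy

buf[0] = ord("A")                  # change the buffer...
first_byte = line[0]               # ...and the view sees the change
print(end, starts_ok, chr(first_byte))
```

Slicing the bytearray itself would copy; slicing the memoryview does not, which is the point of the exercise.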

  3. Solution the ultimate: recv(2) (uh yeah, again).

    The only problem we really have with recv isn't actually with recv: it's with array! Let's not throw the baby out with the bathwater, then. Solution: drop array, keep recv. We want a string. Well, let's use a string.


    exarkun@boson:~$ python
    Python 2.4.1 (#2, Mar 30 2005, 21:51:10)
    [GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import socket, dl
    >>> libc = dl.open('libc.so.6')
    >>> s = socket.socket()
    >>> s.bind(('', 4321))
    >>> s.listen(3)
    >>> c, a = s.accept()
    >>> buf = '\0' * 50
    >>> libc.call('recv', c.fileno(), id(buf) + 20, 50, 0)
    36
    >>> buf
    'aodijaacnwuihaiuwdhkasjnbkawuhdawd\r\n\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
    >>>

    It's the perfect solution. No wasted memory allocation, but the same level of convenience as a normal call to socket.recv. Rarely are we lucky enough to find such elegant and flawless solutions in computer science. The astute reader might object to the magical 20 in the recv call as being inelegant or flawed; however, the value can easily be computed at runtime. The code to do so is extremely simple and only omitted because it is slightly too large to fit in the margin.
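For readers who would rather not scribble over a supposedly immutable string, Python 2.5 and later ship socket.recv_into, which reads straight into a caller-supplied writable buffer. A minimal sketch, assuming Python 3, with socket.socketpair() standing in for the accepted connection and the telnet session:

```python
import socket

peer, conn = socket.socketpair()
buf = bytearray(50)        # allocated once, up front
view = memoryview(buf)     # lets us target a sub-range without copying

peer.sendall(b"aodijaacnw\r\n")
n = conn.recv_into(buf)            # fills buf from the front, returns byte count

peer.sendall(b"uihaiuwd\r\n")
m = conn.recv_into(view[n:])       # lands right after the first chunk

received = bytes(view[:n + m])     # copy out only when a real bytes object is needed
print(n, m, received)

peer.close()
conn.close()
```

No magic offsets are involved, and the buffer is one the interpreter expects to be written to.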


So there you have it. Happy networking.


[1] Sorry, it's way too late to post something useful. Especially when I could post something fun instead.
[2] Note: in each example where socket IO occurs, I have launched telnet in another terminal and typed in some random bytes.

1 comment:

  1. You should guess why I ROAR at you cruelly.
    Could it be because you are using extreme undocumented behaviour?????
    Hint: what would happen if you recv(2)ed with one-character strings!! Would it not BE ULTIMATELY EVIL FOR YOU WOULD BE WRITING MEMORY WHICH WOULD MAKE ALL PYTHON INVALID! It might!
    I will also poke you and squish your shoulders with lots of evil and squish.
