Friday, May 26, 2006

A Note for Python Users

If you use Twisted (or any other network software which uses non-blocking sockets) and you handle large numbers of concurrent connections, you probably want to avoid Python 2.4.3:

exarkun@kunai:~$ python
Python 2.4.3 (#2, Apr 27 2006, 14:43:58)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> files = []
>>> for i in range(1024):
...     files.append(file('/dev/null'))
...
>>> import socket
>>> s = socket.socket()
>>> s.connect(('', 80))
>>> s.send('GET / HTTP/1.1\r\n')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
socket.error: unable to select on socket

Hopefully this will be fixed in Python 2.4.4 and Python 2.5.0. You can monitor the progress of this bug on the SourceForge bug tracker.
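
For what it's worth, the session above seems to hinge on the descriptor number the socket ends up with, not on the open files as such. That is my assumption rather than something the traceback spells out: select() can only watch descriptors below FD_SETSIZE, which is commonly 1024, and a process is handed the lowest free descriptor number first. Here is a small sketch in modern Python 3 (the helper name socket_fd_after_opening is made up for illustration) showing that holding N files open pushes the next socket to descriptor N or higher:

```python
import os
import socket


def socket_fd_after_opening(n_files):
    """Hold n_files descriptors open, create a socket, and return its fd number."""
    # Hypothetical helper for illustration only; it is not part of any library.
    files = [open(os.devnull) for _ in range(n_files)]
    try:
        s = socket.socket()
        try:
            # The kernel assigns the lowest unused descriptor, so this number
            # sits above every file opened just before it.
            return s.fileno()
        finally:
            s.close()
    finally:
        for f in files:
            f.close()


if __name__ == "__main__":
    print(socket_fd_after_opening(0))
    print(socket_fd_after_opening(20))
```

With 1024 files held open, as in the session above, the socket would land on descriptor 1024 or higher, provided the process's open file limit allows that many.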

Monday, May 22, 2006

Limiting Parallelism

Concurrency can be a great way to speed things up, but what happens when you have too much concurrency? Overloading a system or a network can be detrimental to performance. Often there is a peak in performance at a particular level of concurrency. Executing a particular number of tasks in parallel will be easier than ever with Twisted 2.5 and Python 2.5:

from twisted.internet import defer, task

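def parallel(iterable, count, callable, *args, **named):
    coop = task.Cooperator()
    # A single generator shared by every worker: each task is started lazily,
    # only when some worker asks for the next element.
    work = (callable(elem, *args, **named) for elem in iterable)
    # Start `count` workers over the same generator, so at most `count`
    # tasks are outstanding at once.
    return defer.DeferredList([coop.coiterate(work) for i in xrange(count)])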

Here's an example of using this to save the contents of a bunch of URLs which are listed one per line in a text file, downloading at most fifty at a time:

from twisted.python import log
from twisted.internet import reactor
from twisted.web import client

def download((url, fileName)):
    return client.downloadPage(url, file(fileName, 'wb'))

# Strip the trailing newline that iterating over a file leaves on each URL.
urls = [(url.strip(), str(n)) for (n, url) in enumerate(file('urls.txt'))]
finished = parallel(urls, 50, download)
finished.addCallback(lambda ign: reactor.stop())
reactor.run()
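
The trick in parallel is that all of the workers share one iterator, so a new task only starts when a worker has finished its previous one. To show that mechanism without needing Twisted installed, here is a rough analog using asyncio from the Python 3 standard library. This is a swapped-in technique, not the Twisted API, and the demo and job names are invented for the illustration:

```python
import asyncio


async def parallel(iterable, count, func, *args, **kwargs):
    """Await func(elem, ...) for each elem, with at most `count` calls in flight."""
    work = iter(iterable)  # one iterator shared by every worker

    async def worker():
        # A worker pulls its next element only after its previous call finishes.
        for elem in work:
            await func(elem, *args, **kwargs)

    await asyncio.gather(*(worker() for _ in range(count)))


async def demo(n_tasks, limit):
    """Run n_tasks short jobs and return the highest concurrency observed."""
    running = 0
    peak = 0

    async def job(i):
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)  # stand-in for real network work
        running -= 1

    await parallel(range(n_tasks), limit, job)
    return peak


if __name__ == "__main__":
    print(asyncio.run(demo(20, 5)))
```

The peak never exceeds the worker count, because each worker holds at most one task at a time.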

[Edit: The original generator expression in this post was of the form ((yield foo()) for x in y). The yield here is completely superfluous, of course, so I have removed it.]

[Edit: The original post talked about Twisted 2.4 and Python 2.5. It has since turned out that Python 2.5 is too dissimilar to Python 2.4 for Twisted 2.4 to run on it. Twisted 2.5 is required to use Python 2.5.]

Monday, May 15, 2006