Tuesday, August 21, 2007

Twisted Trial performance improvements

Trial, Twisted's xUnit(-esque) testing package (library, discovery tool, runner, etc.), has long inherited its overall testing flow from the Python standard library unittest module. It's quite simple: iterate over a sequence of test cases and invoke each one with the result object. Lately, this has led to some noticeable performance problems. Creating a test case instance for each test method isn't unreasonable in itself, but as they all run, more and more objects accumulate and the process balloons to an unreasonable size. The Twisted test suite typically uses about 800MB of RAM by the time the last test method is run.
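For readers unfamiliar with the flow being described, here's a minimal sketch of the classic unittest-style run loop: one TestCase instance per test method, all held by a suite, each invoked with a shared result object. The DemoTest class and its method names are just illustrative, not anything from Twisted:

```python
import unittest

class DemoTest(unittest.TestCase):
    def test_one(self):
        self.assertEqual(1 + 1, 2)

    def test_two(self):
        self.assertTrue(isinstance("", str))

# One TestCase instance per test method, all collected into a suite.
suite = unittest.TestSuite(
    DemoTest(name) for name in ("test_one", "test_two")
)

# The whole testing flow: iterate over the tests, invoking each one
# with the result object. suite.run(result) is essentially
# "for test in suite: test(result)".
result = unittest.TestResult()
suite.run(result)
print(result.testsRun)  # → 2
```

The important detail for what follows is that the suite keeps a reference to every test case instance for the entire run, so nothing a test case holds onto can be collected until the run finishes.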

So, in order to be able to run the Twisted tests on machines without 800MB of free memory, we changed our TestSuite so that it drops each test after running it. The suite now takes about one quarter as much memory to run. As an unexpected bonus, it also runs almost 50% faster (66% for trial --force-gc, which calls gc.collect between each test method). I can only explain the speedup as time saved inside the garbage collector itself due to there being far fewer objects to examine (this is not a completely satisfying explanation, but I cannot think of a better one).
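The change can be sketched as a TestSuite whose run method releases its reference to each test as soon as that test has run, so each finished test becomes garbage immediately instead of at the end of the run. This is an illustration of the technique, not Trial's actual code; the ForgetfulTestSuite name is made up for the example:

```python
import unittest

class ForgetfulTestSuite(unittest.TestSuite):
    """A suite that drops each test after running it (hypothetical
    name, sketching the idea rather than Trial's implementation)."""

    def run(self, result):
        for index, test in enumerate(self._tests):
            if result.shouldStop:
                break
            test(result)
            # Overwrite the suite's reference to the finished test so
            # it (and everything it holds) can be garbage collected now.
            self._tests[index] = None
        return result

class DemoTest(unittest.TestCase):
    def test_passes(self):
        self.assertEqual(2 * 2, 4)

suite = ForgetfulTestSuite([DemoTest("test_passes")])
result = unittest.TestResult()
suite.run(result)
print(result.testsRun)  # → 1; suite._tests is now [None]
```

The trade-off is that the suite can't be re-run afterwards, which is a fine price for a test runner that only ever makes one pass.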

If you're using trial to run your test suite, you may notice reduced memory requirements and reduced overall runtime, too. :)


  1. Another explanation (and the first thing that came to mind when you described the total memory requirements) is that less time will be spent thrashing the HD to access the swap file.

    Obviously this will vary between platforms, but on a Win box this will be a serious time-saving change. I'm less hip with Linux's swap file dynamics, but I do know that accessing something in memory takes a lot less time than fetching it from magnetic media no matter the platform, and the act of dumping stuff from active memory to a swap file will bind up other processes, too.

    2c, spend wherever.

  2. Good thought. I don't think any of the runs I timed were actually going into swap themselves, but maybe the cost of pushing other stuff into swap contributed to the slowdown.

    Now that this is resolved, I'm going to go back to trying pypy-c with Twisted, which could use so much memory before that it went way into swap and basically never came back. :)

  3. If one absolutely HAD to know, I suppose turning off swap would level out the playing field. Although, I don't know if you can even DO that with standard Linux builds. You can on Win, if you're suicidal :)