Sunday, October 25, 2009

Twisted Web in 60 seconds: WSGI


Welcome to the 13th installment of Twisted Web in 60 seconds. For a while, I've been writing about how you can implement pages by working with the Twisted Web resource model. The very first example I showed you used an existing Resource subclass to serve static content from the filesystem. In this installment, I'll show you how to use WSGIResource, another existing Resource subclass which lets you serve WSGI applications in a Twisted Web server.




First, a few things about WSGIResource. It is a multithreaded WSGI container. Like any other WSGI container, you can't do anything asynchronous in your WSGI applications, even though this is a Twisted WSGI container. In the latest release of Twisted as of this post, 8.2, WSGIResource also has a few significant bugs. These are fixed in trunk (and the fixes will be included in 9.0), so if you want to play around with WSGI in any significant way, you probably want trunk for now.




The first new thing in this example is the import of WSGIResource:



  from twisted.web.wsgi import WSGIResource



Nothing too surprising there. We still need one of the other usual suspects, too:



  from twisted.internet import reactor



You'll see why in a minute. Next, we need a WSGI application. Here's a really simple one just to get things going:



  def application(environ, start_response):
     start_response('200 OK', [('Content-type', 'text/plain')])
     return ['Hello, world!']



If this doesn't make sense to you, take a look at one of these fine tutorials. Otherwise, or once you're done with that, the next step is to create a WSGIResource instance and bind it to the name resource, since this is going to be another rpy script example.



  resource = WSGIResource(reactor, reactor.getThreadPool(), application)



I need to dwell on this line for a minute. The first parameter passed to WSGIResource is the reactor. Despite the fact that the reactor is global and any code that wants it can always just import it (as, in fact, this rpy script simply does itself), passing it around as a parameter leaves the door open for certain future possibilities. For example, having more than one reactor. There are also testing implications. Consider how much easier it is to unit test a function that accepts a reactor - perhaps a mock reactor specially constructed to make your tests easy to write ;) - rather than importing the real global reactor. Anyhow, that's why WSGIResource requires you to pass the reactor to it.
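To make the testing point a bit more concrete, here's a minimal sketch in plain Python. Both FakeReactor and greet_later are hypothetical names I made up for this illustration; the fake only mimics callLater and is not part of Twisted.

```python
class FakeReactor(object):
    """A hypothetical stand-in that records scheduled calls instead of running them."""

    def __init__(self):
        self.calls = []

    def callLater(self, delay, function, *args):
        # Remember the request; a real reactor would run it after the delay.
        self.calls.append((delay, function, args))


def greet_later(reactor, delay, greetings):
    """Schedule a greeting using whichever reactor the caller supplies."""
    reactor.callLater(delay, greetings.append, "Hello, world!")


# In a test, no real time has to pass and no global state is touched.
fake = FakeReactor()
greetings = []
greet_later(fake, 5, greetings)

# Run the recorded call by hand, as if the delay had elapsed.
delay, function, args = fake.calls[0]
function(*args)
```

Because greet_later never imports the global reactor, the test decides exactly when (and whether) the scheduled call runs.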




The second parameter passed to WSGIResource is a thread pool. WSGIResource uses this to actually call the application object passed in to it. To keep this example short, I'm passing in the reactor's internal threadpool here, letting me skip its creation and shutdown-time destruction. For finer control over how many WSGI requests are served in parallel, you may want to create your own thread pool to use with your WSGIResource. But for simple testing, using the reactor's is fine (although I'm cheating here a little - I apologize - getThreadPool is a new API, not present in 8.2: you need trunk for this example to work; please ask Chris Armstrong to release 9.0 already).




The final argument is the application object. This is pretty typical of how WSGI containers work.
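If you're curious what a container does with that application object, here's a rough sketch using only the standard library (Python 3, so the body is bytes). The call_application helper is a hypothetical name for this illustration; a real container such as WSGIResource does a good deal more, including running the call in a thread.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [b'Hello, world!']


def call_application(app, path='/'):
    """Call a WSGI application once, roughly the way a container would."""
    environ = {'PATH_INFO': path}
    setup_testing_defaults(environ)  # fills in the other required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        # The application reports its status line and headers here.
        captured['status'] = status
        captured['headers'] = headers

    # The return value is an iterable of byte strings forming the body.
    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


status, headers, body = call_application(application)
```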




The example, sans interruption:



  from twisted.web.wsgi import WSGIResource
  from twisted.internet import reactor

  def application(environ, start_response):
      start_response('200 OK', [('Content-type', 'text/plain')])
      return ['Hello, world!']

  resource = WSGIResource(reactor, reactor.getThreadPool(), application)



Up to the point where the WSGIResource instance defined here exists in the resource hierarchy, the normal resource traversal rules apply - getChild will be called to handle each segment. Once the WSGIResource is encountered, though, that process stops and all further URL handling is the responsibility of the WSGI application. Of course this application does nothing with the URL, so you won't be able to tell that.
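For contrast, here's a sketch of an application that does look at the URL. It uses only the standard library on Python 3, and the SCRIPT_NAME and PATH_INFO values are made up for the demonstration; the exact values a given container supplies are an assumption here.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    # PATH_INFO holds the part of the URL path left over for the
    # application to interpret.
    path = environ.get('PATH_INFO', '')
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [('You asked for %r' % (path,)).encode('utf-8')]


# Pretend the application is mounted at /example.rpy and a child was requested.
environ = {'SCRIPT_NAME': '/example.rpy', 'PATH_INFO': '/some/child'}
setup_testing_defaults(environ)
body = b''.join(application(environ, lambda status, headers: None))
```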




Oh, and as was the case with the first static file example, there's also a command line option you can use to avoid a lot of this. If you just put the above application function, without all of the WSGIResource stuff, into a file, say, foo.py, then you can launch a roughly equivalent server like this:



  $ twistd -n web --wsgi foo.application



Tune in next time, when I'll discuss HTTP authentication.

Thursday, October 22, 2009

Twisted Web in 60 seconds: logging errors


Welcome to the twelfth installment of "Twisted Web in 60 seconds". The previous installment created a server which dealt with response errors by aborting response generation, potentially avoiding pointless work. However, it did this silently for any error. In this installment, I'll modify the previous example so that it logs each failed response.




This example will use the Twisted API for logging errors. As I mentioned in the first post covering Deferreds, errbacks are passed an error. In the previous example, the _responseFailed errback accepted this error as a parameter but ignored it. The only way this example will differ is that this _responseFailed will use that error parameter to log a message.




This example will require all of the imports required by the previous example, which I will not repeat here, plus one new import:



  from twisted.python.log import err



The only other part of the previous example which changes is the _responseFailed errback, which will now log the error passed to it:



      def _responseFailed(self, failure, call):
         call.cancel()
         err(failure, "Async response demo interrupted response")



I'm passing two arguments to err here. The first is the error which is being passed in to the callback. This is always an object of type Failure, a class which represents an exception and (sometimes, but not always) a traceback. err will format this nicely for the log. The second argument is a descriptive string that tells someone reading the log what the source of the error was.




Here's the full example with the two above modifications:



from twisted.web.resource import Resource
from twisted.web.server import NOT_DONE_YET
from twisted.internet import reactor
from twisted.python.log import err

class DelayedResource(Resource):
   def _delayedRender(self, request):
       request.write("Sorry to keep you waiting.")
       request.finish()

   def _responseFailed(self, failure, call):
       call.cancel()
       err(failure, "Async response demo interrupted response")

   def render_GET(self, request):
       call = reactor.callLater(5, self._delayedRender, request)
       request.notifyFinish().addErrback(self._responseFailed, call)
       return NOT_DONE_YET

resource = DelayedResource()



Run this server (see the end of the previous installment if you need a reminder about how to do that) and interrupt a request. Unlike the previous example, where the server gave no indication that this had happened, you'll see a message in the log output with this version.




Next time I'll tell you about a resource that lets you host WSGI applications in a Twisted Web server.

Sunday, October 18, 2009

Twisted Web in 60 seconds: interrupted responses


Welcome to the eleventh installment of "Twisted Web in 60 seconds". Previously I gave an example of a Resource which generates its response asynchronously rather than immediately upon the call to its render method. When generating responses asynchronously, the possibility is introduced that the connection to the client may be lost before the response is generated. In such a case, it is often desirable to abandon the response generation entirely, since there is nothing to do with the data once it is produced. In this installment, I'll show you how to be notified that the connection has been lost.




This example will build upon the example from installment nine which simply (if not very realistically) generated its response after a fixed delay. I will expand that resource so that as soon as the client connection is lost, the delayed event is canceled and the response is never generated.




The feature this example relies on is provided by another Request method: notifyFinish. This method returns a new Deferred which will fire with None if the request is successfully responded to or with an error otherwise - for example if the connection is lost before the response is sent.




The example starts in a familiar way, with the requisite Twisted imports and a resource class with the same _delayedRender used previously:



  from twisted.web.resource import Resource
  from twisted.web.server import NOT_DONE_YET
  from twisted.internet import reactor

  class DelayedResource(Resource):
      def _delayedRender(self, request):
          request.write("<html><body>Sorry to keep you waiting.</body></html>")
          request.finish()



Before defining the render method, though, I'm going to define an errback (a callback that gets called when there's an error). This will be the errback attached to the Deferred returned by Request.notifyFinish. It will cancel the delayed call to _delayedRender.



      def _responseFailed(self, err, call):
         call.cancel()



Finally, the render method will set up the delayed call just as it did before, and return NOT_DONE_YET likewise. However, it will also use Request.notifyFinish to make sure _responseFailed is called if appropriate.



      def render_GET(self, request):
         call = reactor.callLater(5, self._delayedRender, request)
         request.notifyFinish().addErrback(self._responseFailed, call)
         return NOT_DONE_YET



Notice that since _responseFailed needs a reference to the delayed call object in order to cancel it, I passed that object to addErrback. Any additional arguments passed to addErrback (or addCallback) will be passed along to the errback after the Failure instance which is always passed as the first argument. Passing call here means it will be passed to _responseFailed, where it is expected and required.




That covers almost all the code for this example. Here's the entire example without interruptions, as an rpy script:



from twisted.web.resource import Resource
from twisted.web.server import NOT_DONE_YET
from twisted.internet import reactor

class DelayedResource(Resource):
   def _delayedRender(self, request):
       request.write("Sorry to keep you waiting.")
       request.finish()

   def _responseFailed(self, err, call):
       call.cancel()

   def render_GET(self, request):
       call = reactor.callLater(5, self._delayedRender, request)
       request.notifyFinish().addErrback(self._responseFailed, call)
       return NOT_DONE_YET

resource = DelayedResource()



Toss this into example.rpy, fire it up with twistd -n web --path ., and hit http://localhost:8080/example.rpy. If you wait five seconds, you'll get the page content. If you interrupt the request before then, say by hitting escape (in Firefox, at least), then you'll see perhaps the most boring demonstration ever - no page content, and nothing in the server logs. Success!




Next time I'll digress slightly to cover the basics of Twisted logging and expand this example to use it to show when clients fail to receive the response they requested.

Twisted Security Outreach


Following the second of Matasano's recommendations for how to get security right, Twisted now has a security outreach page. All you security researchers out there who've been holding back because you thought we wouldn't pay attention, bring it on. :)

Saturday, October 10, 2009

Twisted Web in 60 seconds: asynchronous responses (via Deferred)


Welcome to the tenth installment of "Twisted Web in 60 (or rather 90) seconds". Previously I gave an example of a Resource which generates its response asynchronously rather than immediately upon the call to its render method. Though it was a useful demonstration of the NOT_DONE_YET feature of Twisted Web, the example itself didn't reflect what a realistic application might want to do. In this installment, I'll introduce Deferred, the Twisted class which is used to provide a uniform interface to many asynchronous events, and show you an example of using a Deferred-returning API to generate an asynchronous response to a request in Twisted Web [1].




Deferred is the result of two consequences of the asynchronous programming approach. First, asynchronous code is frequently (if not always) concerned with some data (in Python, an object) which is not yet available but which probably will be soon. Asynchronous code needs a way to define what will be done to the object once it does exist. It also needs a way to define how to handle errors in the creation or acquisition of that object. These two needs are satisfied by the callbacks and errbacks of a Deferred. Callbacks are added to a Deferred with Deferred.addCallback; errbacks are added with Deferred.addErrback. When the object finally does exist, it is passed to Deferred.callback which passes it on to the callback added with addCallback. Similarly, if an error occurs, Deferred.errback is called and the error is passed along to the errback added with addErrback. Second, the events that make asynchronous code actually work often take many different, incompatible forms. Deferred acts as the uniform interface which lets different parts of an asynchronous application interact and isolates them from implementation details they shouldn't be concerned with.




That's almost all there is to Deferred. To solidify your new understanding, now consider this rewritten version of DelayedResource which uses a Deferred-based delay API. It does exactly the same thing as the previous example. Only the implementation is different.




First, the example must import that new API I just mentioned, deferLater:



  from twisted.internet.task import deferLater



Next, all the other imports (these are the same as last time):



  from twisted.web.resource import Resource
  from twisted.web.server import NOT_DONE_YET
  from twisted.internet import reactor



With the imports done, here's the first part of the DelayedResource implementation. Again, this part of the code is identical to the previous version:



  class DelayedResource(Resource):
     def _delayedRender(self, request):
         request.write("<html><body>Sorry to keep you waiting.</body></html>")
         request.finish()



Next I also need to define the render method. Here's where things change a bit. Instead of using callLater, I'm going to use deferLater this time. deferLater accepts a reactor, a delay (in seconds, as with callLater), and a function to call after the delay to produce that elusive object I was talking about above in my description of Deferreds. I'm also going to use _delayedRender as the callback to add to the Deferred returned by deferLater. Since it expects the request object as an argument, I'm going to set up the deferLater call to return a Deferred which has the request object as its result.



      def render_GET(self, request):
         d = deferLater(reactor, 5, lambda: request)



The Deferred referenced by d now needs to have the _delayedRender callback added to it. Once this is done, _delayedRender will be called with the result of d (which will be request, of course — the result of (lambda: request)()).



          d.addCallback(self._delayedRender)



Finally, the render method still needs to return NOT_DONE_YET, for exactly the same reasons as it did in the previous version of the example.



          return NOT_DONE_YET



And with that, DelayedResource is now implemented based on a Deferred. The example still isn't very realistic, but remember that since Deferreds offer a uniform interface to many different asynchronous event sources, this code now resembles a real application even more closely; you could easily replace deferLater with another Deferred-returning API and suddenly you might have a resource that does something useful.




Finally, here's the complete, uninterrupted example source, as an rpy script:



from twisted.internet.task import deferLater
from twisted.web.resource import Resource
from twisted.web.server import NOT_DONE_YET
from twisted.internet import reactor

class DelayedResource(Resource):
   def _delayedRender(self, request):
       request.write("Sorry to keep you waiting.")
       request.finish()

   def render_GET(self, request):
       d = deferLater(reactor, 5, lambda: request)
       d.addCallback(self._delayedRender)
       return NOT_DONE_YET

resource = DelayedResource()



[1] I know I promised an example of handling lost client connections, but I realized that example would also involve Deferreds, so I wanted to introduce Deferreds by themselves first. Tune in next time for the example I told you I'd show you this time.

Twisted Web in 60 seconds: Index

Here's an index of all the "Twisted Web in 60 seconds" entries, for your linking and searching convenience:

Wednesday, October 7, 2009

Twisted Web in 60 seconds: asynchronous responses


Welcome to the ninth installment of "Twisted Web in 60 seconds". In all the previous installments, the resource examples I presented generated responses immediately. One of the most interesting features of Twisted Web, though, is the ability to generate a response over a longer period of time while leaving the server free to respond to other requests. In other words, asynchronously. In this installment, I'll show you how you can write a resource like this.




A resource which generates a response asynchronously looks like one which generates a response synchronously in many ways. The same base class, Resource, is used either way; the same render methods are used. There are three basic differences, though.




First, instead of returning the string which will be used as the body of the response, the resource uses Request.write. This method can be called repeatedly. Each call appends another string to the response body. Second, when the entire response body has been passed to Request.write, the application must call Request.finish. As you might expect from the name, this ends the response. Finally, in order to make Twisted Web not end the response as soon as the render method returns, the render method must return NOT_DONE_YET. Consider this example:



  from twisted.web.resource import Resource
  from twisted.web.server import NOT_DONE_YET
  from twisted.internet import reactor

  class DelayedResource(Resource):
      def _delayedRender(self, request):
          request.write("<html><body>Sorry to keep you waiting.</body></html>")
          request.finish()

      def render_GET(self, request):
          reactor.callLater(5, self._delayedRender, request)
          return NOT_DONE_YET



If you're not familiar with reactor.callLater, all you really need to know about it to understand this example is that the above usage of it arranges to have self._delayedRender(request) run about 5 seconds after callLater is invoked from this render method and that it returns immediately.




All three of the elements I mentioned earlier can be seen in this example. The resource uses Request.write to set the response body. It uses Request.finish after the entire body has been specified (all with just one call to write in this case). And it returns NOT_DONE_YET from its render method. So there you have it, asynchronous rendering with Twisted Web.




Here's a complete rpy script based on this resource class (see the previous installment if you need a reminder about rpy scripts):



from twisted.web.resource import Resource
from twisted.web.server import NOT_DONE_YET
from twisted.internet import reactor

class DelayedResource(Resource):
   def _delayedRender(self, request):
       request.write("<html><body>Sorry to keep you waiting.</body></html>")
       request.finish()

   def render_GET(self, request):
       reactor.callLater(5, self._delayedRender, request)
       return NOT_DONE_YET

resource = DelayedResource()



Drop this source into a .rpy file and fire up a server using twistd -n web --path /directory/containing/script/. You'll see that loading the page takes 5 seconds. If you request the page a second time before the first request completes, the second response will also take 5 seconds from the time you request it (but it won't be delayed by any other outstanding requests).




Something else to consider when generating responses asynchronously is that the client may not wait around to get the response to its request. Next time I'll demonstrate how to detect that the client has abandoned the request and that the server shouldn't bother to finish generating its response.

Friday, October 2, 2009

Twisted Web in 60 seconds: rpy scripts (or, how to save yourself some typing)


Welcome to the eighth installment of "Twisted Web in 60 seconds". In the previous installment, I griped about how much typing I had to do for each of the examples. The goal of this installment is to show you another way to run a Twisted Web server with a custom resource which doesn't require as much code.




The feature I'm talking about is called an rpy script. An rpy script is a Python source file which defines a resource and can be loaded into a Twisted Web server. The advantages of this approach are that you don't have to write code to create the site or set up a listening port with the reactor. That means fewer lines of code that aren't dedicated to the task you're trying to accomplish.




There are some disadvantages, though. An rpy script must have the extension .rpy. This means you can't import it using the usual Python import statement. This means it's hard to re-use code in an rpy script. This also means you can't easily unit test it. The code in an rpy script is evaluated in an unusual context. So, while rpy scripts may be useful for testing out ideas, I would not recommend them for much more than that.




Okay, with that warning out of the way, let's dive in. First, as I mentioned, rpy scripts are Python source files with the .rpy extension. So, open up an appropriately named file (for example, example.rpy) and put this code in it:



import time

from twisted.web.resource import Resource

class ClockPage(Resource):
   isLeaf = True
   def render_GET(self, request):
       return "<html><body>%s</body></html>" % (time.ctime(),)

resource = ClockPage()




You may recognize this as the resource from the first dynamic rendering example. What's different is what you don't see: I didn't import reactor or Site. There are no calls to listenTCP or run. Instead, and this is the core idea for rpy scripts, I just bound the name resource to the resource I want the script to serve. Every rpy script must bind this name, and this name is the only thing Twisted Web will pay attention to in an rpy script.




All that's left is to drop this rpy script into a Twisted Web server. There are a few ways to do this. The simplest way is with twistd:




$ twistd -n web --path .




Hit http://localhost:8080/example.rpy to see it run. You can pass other arguments here too. twistd web has options for specifying which port number to bind, whether to set up an HTTPS server, and plenty more. You can also pass options to twistd here, for example to configure logging to work differently, to select a different reactor, etc. For a full list of options, see twistd --help and twistd web --help.




That's it for rpy scripts for now. I'll probably make use of them in future examples to keep the focus on the new material. And speaking of which, check out the next installment to learn about asynchronous rendering.

Twisted Web in 60 seconds: handling POSTs


Welcome to the seventh installment of "Twisted Web in 60 seconds" in which I'll show you how to handle POST requests. All of the previous installments have focused on GET requests. Unlike GET requests, POST requests can have a request body - extra data after the request headers; for example, data representing the contents of an HTML form. Twisted Web makes this data available to applications via the Request object.




Here's an example web server which renders a static HTML form and then generates a dynamic page when that form is posted back to it. (While it's convenient for this example, it's often not a good idea to make a resource that POSTs to itself; this isn't about Twisted Web, but the nature of HTTP in general; if you do this, make sure you understand the possible negative consequences).




As usual, we start with some imports (see previous installments for details). In addition to the Twisted imports, this example uses the cgi module to escape user-entered content for inclusion in the output.

  from twisted.web.server import Site
  from twisted.web.resource import Resource
  from twisted.internet import reactor

  import cgi




Next, we'll define a resource which is going to do two things. First, it will respond to GET requests with a static HTML form:

  class FormPage(Resource):
     def render_GET(self, request):
         return '<html><body><form method="POST"><input name="the-field" type="text" /></form></body></html>'

This is similar to the static resource I used as an example in a previous installment. However, I'll now add one more method to give it a second behavior; this render_POST method will allow it to accept POST requests:

      def render_POST(self, request):
         return '<html><body>You submitted: %s</body></html>' % (cgi.escape(request.args["the-field"][0]),)




The main thing to note here is the use of request.args. This is a dictionary-like object that provides access to the contents of the form. The keys in this dictionary are the names of inputs in the form. Each value is a list containing strings (since there can be multiple inputs with the same name), which is why I had to extract the first element to pass to cgi.escape. request.args will be populated from form contents whenever a POST request is made with a content type of application/x-www-form-urlencoded or multipart/form-data (it's also populated by query arguments for any type of request).
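To see why each value is a list, here's a sketch that parses a form body using only the Python 3 standard library. This is not how Twisted builds request.args; it just produces a dictionary of the same shape. The sample body is made up, and I've used html.escape because cgi.escape is gone from recent Python versions.

```python
from html import escape
from urllib.parse import parse_qs

# A made-up urlencoded form body with two inputs sharing one name.
body = "the-field=%3Cb%3Ehi%3C%2Fb%3E&the-field=second"

# Each key maps to a list, since an input name can appear more than once.
args = parse_qs(body)

# Take the first value and escape it before putting it into HTML.
first = args["the-field"][0]
page = '<html><body>You submitted: %s</body></html>' % (escape(first),)
```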




Finally, the example just needs the usual site creation and port setup:

  root = Resource()
  root.putChild("form", FormPage())
  factory = Site(root)
  reactor.listenTCP(8880, factory)
  reactor.run()

Run the server and visit http://localhost:8880/form, submit the form, and watch it generate a page including the value you entered into the single field.




Here's the complete source for the example:

from twisted.web.server import Site
from twisted.web.resource import Resource
from twisted.internet import reactor

import cgi

class FormPage(Resource):
   def render_GET(self, request):
       return '<html><body><form method="POST"><input name="the-field" type="text" /></form></body></html>'

   def render_POST(self, request):
       return '<html><body>You submitted: %s</body></html>' % (cgi.escape(request.args["the-field"][0]),)

root = Resource()
root.putChild("form", FormPage())
factory = Site(root)
reactor.listenTCP(8880, factory)
reactor.run()




Since I'm getting a little bored with some of the boilerplate involved in these examples, the next installment will introduce rpy files, a good way to try out new concepts and APIs (like the ones presented in this series) without all the repetitive boilerplate.