Saturday, December 29, 2012

Distant Scheduled Events Unreliable in Twisted

Someone recently asked a question about whether reactor.callLater could be used to precisely schedule events in the very distant future. The person gave an example of scheduling now (December 2012) an event to run at a particular time in December 2014, a date two years in the future.

One thing to keep in mind about scheduled events (actually, any events) in a Twisted-using application is that the whole system is cooperative and single-threaded. So an event scheduled to run at a specific time - Twisted uses Python floats to represent a time (seconds since the epoch) at which to run scheduled events - cannot run at that time if something else happens to be running when that time comes around. For example, if you have a prime calculator and you're busy calculating a 10,000-digit prime number on December 30, 2014 at 15:30:00, then your event scheduled for that time is not going to run at 15:30:00. Instead, it'll run sometime after the prime calculation is finished.
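To make the cooperative part concrete, here is a toy model. It is not Twisted's reactor and uses no Twisted APIs; `ToyLoop`, `call_later`, `busy`, and the simulated clock are hypothetical names for this sketch only. A blocking "prime search" due at second 10 holds the loop for 600 simulated seconds, so an event due at second 60 does not run until second 610.

```python
import heapq


class ToyLoop:
    """A hypothetical single-threaded event loop driven by a simulated clock.

    This is not Twisted's reactor. It only models the cooperative rule:
    a timed call cannot start until whatever is running returns control.
    """

    def __init__(self):
        self.now = 0.0     # simulated seconds since an arbitrary start
        self._queue = []   # heap of (due_time, sequence_number, callable)
        self._sequence = 0
        self.log = []      # (name, time the call actually ran)

    def call_later(self, delay, fn):
        # As in Twisted, the due time is "current time + delay", a float.
        heapq.heappush(self._queue, (self.now + delay, self._sequence, fn))
        self._sequence += 1

    def busy(self, seconds):
        # Stand-in for CPU-bound work: the clock moves forward and nothing
        # else gets a chance to run in the meantime.
        self.now += seconds

    def run(self):
        while self._queue:
            due, _, fn = heapq.heappop(self._queue)
            # If earlier work overran, the clock is already past `due`,
            # so the call simply runs late.
            self.now = max(self.now, due)
            fn()


loop = ToyLoop()


def prime_search():
    loop.log.append(("prime search", loop.now))
    loop.busy(600)  # ten simulated minutes of blocking computation


def timed_event():
    loop.log.append(("timed event", loop.now))


loop.call_later(10, prime_search)
loop.call_later(60, timed_event)
loop.run()
print(loop.log)  # [('prime search', 10.0), ('timed event', 610.0)]
```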

The asker knew this, though, and was only curious about whether there were any intrinsic scheduling limitations related to very distant times. The answer is that there is a limitation, and I've already alluded to it in the paragraph above.

Twisted uses Python floats to represent time. The absolute precision available to floating point numbers declines as the values themselves get larger (this is why they're called "floating point"! The point can move around).

At 2 ** 51, adjacent floats are 0.5 apart, so values that differ by less than that cannot be distinguished. At 2 ** 52, the gap grows to 1.0, and above 2 ** 53, the representation only supports even integers (you can probably see a pattern here: add a bit to the magnitude, subtract a bit from the precision). As Twisted is treating this value as a number of seconds since an arbitrary starting point, this means that 2 ** 53 seconds after that starting point, scheduling granularity has dropped to 2 seconds; ie, there is only one point in time out of every two-second interval at which an event can be scheduled. Try to schedule it for a different time and it will silently be rounded to the nearest time that can be represented. As a specific example, if you tried to schedule an event to run at 9007199254740993 seconds after the epoch, it would most likely run at 9007199254740992 seconds after the epoch instead.
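The spacing can be checked with the standard library alone. This sketch assumes Python 3.9 or later for `math.ulp`, which returns the gap between a float and the next larger one. It shows the gap doubling at each power of two and the odd second from the example being rounded away (2 ** 53 + 1 sits exactly halfway between two floats, and round-half-to-even picks 2 ** 53).

```python
import math

# math.ulp(x) (Python 3.9+) is the distance from x to the next larger float.
for exponent in (51, 52, 53):
    value = float(2 ** exponent)
    print(f"2 ** {exponent}: adjacent floats are {math.ulp(value)} apart")

# 2 ** 53 + 1 has no exact float representation; conversion rounds it
# to the nearest float, ties to even, which is 2 ** 53.
requested = 9007199254740993
stored = float(requested)
print(int(stored))  # 9007199254740992
```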

What does this mean in practical terms? Perhaps not a lot. Assuming our current system of marking the passage of time continues in use until then, 9007199254740992 seconds after the epoch falls about 25 minutes before midnight on November 11th (a Sunday), in the year 285,428,751.
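The date arithmetic can be checked too, with one wrinkle: Python's `datetime` stops at year 9999. Because the Gregorian calendar repeats exactly every 400 years (146097 days, which is also a whole number of weeks), this sketch strips off whole 400-year cycles, converts the remainder with `datetime`, and adds the cycles back onto the year. `utc_after_epoch` is a hypothetical helper name, and it reports the moment in UTC.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86400
# The Gregorian calendar repeats exactly every 400 years (146097 days);
# 146097 is a multiple of 7, so weekdays repeat on the same cycle.
DAYS_PER_400_YEARS = 146097
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def utc_after_epoch(seconds):
    """Hypothetical helper: describe a UTC moment beyond datetime's year 9999."""
    days, remainder = divmod(seconds, SECONDS_PER_DAY)
    cycles, days = divmod(days, DAYS_PER_400_YEARS)
    # Convert within a representable 400-year window, then add the
    # skipped cycles back onto the year.
    moment = datetime(1970, 1, 1) + timedelta(days=days, seconds=remainder)
    return (moment.year + 400 * cycles, moment.month, moment.day,
            moment.strftime("%H:%M:%S"), WEEKDAYS[moment.weekday()])


print(utc_after_epoch(2 ** 53))  # (285428751, 11, 12, '07:36:32', 'Monday')
```

07:36:32 UTC on Monday, November 12 is 23:36:32 on Sunday, November 11 at UTC-8, so the figure above appears to be expressed in a UTC-8 local clock such as US Pacific Standard Time.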

Friday, August 31, 2012

January to August Reading List

Down on the Farm. Charles Stross.
Children of the Sky. Vernor Vinge.
Toast. Charles Stross.
The Etched City. K. J. Bishop.
American Fascists. Chris Hedges.
Why We Get Fat. Gary Taubes.
The Forever War. Joe Haldeman.
The Accidental Time Machine. Joe Haldeman.
Cyborg Assault. Vaughn Heppner.
Planet Wrecker. Vaughn Heppner.
Star Fortress. Vaughn Heppner.
The Restoration Game. Ken MacLeod.
The Night Sessions. Ken MacLeod.
Redshirts. John Scalzi.
The Ghost Brigades. John Scalzi.
Old Man's War. John Scalzi.
The Last Colony. John Scalzi.
The Year of the Jackpot. Robert Heinlein.
The Vegetarian Myth. Lierre Keith.
Raising Pastured Pigs. Samantha Biggers.
The China Study. T. Colin Campbell, Thomas M. Campbell II.
Orion. Ben Bova.
Vengeance of Orion. Ben Bova.
Heir to the Empire. Timothy Zahn.
Dark Force Rising. Timothy Zahn.
The Last Command. Timothy Zahn.
The Whole Soy Story. Kaayla T. Daniel.
Stop Alzheimer's Now! Bruce Fife.
Pastured Poultry Profits. Joel Salatin.
Greener Pastures on Your Side of the Fence. Bill Murphy.

Wednesday, August 15, 2012

Heads Up, San Francisco

I will be visiting San Francisco in December and January. To my various acquaintances in the Bay Area, let's get together and do something fun. To anyone interested in Python or Twisted, my recently formed company would be happy to offer on-site training or other consulting services while I'm in the area. Drop us a line.

Wednesday, May 9, 2012

Commercial support contracts for Twisted

Last week I posted a survey to the Twisted mailing list to gauge interest in commercial Twisted support contracts:

http://twistedmatrix.com/pipermail/twisted-python/2012-May/025537.html

If you think this might be applicable to your interests and you didn't see the initial posting or haven't had a chance to respond yet, please take a minute or two to fill it out (it's very short, no essay questions at all). Thanks!

Friday, April 20, 2012

GNOME Bug Reports

Almost ten years after jwz coined "CADT" (Cascade of Attention-Deficit Teenagers), they're still at it. Way to keep the dream alive, guys.

Sunday, February 26, 2012

Side Project: Crop Planning Software

Elsewhere, I wrote about the beginning of growing season and some software I've written to help us out this year. The software I was talking about is very spartan right now. It tries to serve exactly our needs, with just enough user interface so that we can get at the information we need. If you notice where exactly on Launchpad I'm currently hosting it, you'll get some idea about how much effort I've put into making this a real, distributable, useful-to-anyone-else project so far.

What the software does at this point is this:

  • Load data from a (semi-)structured file (CSV, because it's easy to create and export data in this format using OpenOffice). The data it can load describes certain crops and certain varieties of those crops, including information about start and end of season, required growing days, anticipated yields, etc.
  • Plan out a seed order, based on that yield data and additional product data (also in the input file). Doing this without wasting a ton of money ends up being something like a solution to the covering problem, due to discounts for buying greater quantities (sometimes unbelievable discounts, with marginal costs for additional seed ranging as low as 5% of the base cost). This is also a very tedious part of the program, as common suppliers offer seed in well over a dozen different package sizes (with "packages" with the same name containing different amounts of seed for different kinds of vegetables, and of course different vegetables requiring different amounts of seed to produce a particular yield).
  • Predict various kinds of resource usage at each point in the season. Resources include things like bed feet (eg, we have 22 beds, each 100 feet long, so we have 2200 bed feet; our crop plan cannot exceed this, or we'll have plants that have nowhere to be planted), plug flat usage (the flats where seeds are started and grow until they're hardy enough to be transplanted outside), and man-hours (there are two of us, and we don't want to plant so much that we would need to hire help to deal with it).
  • Generate a schedule of when to seed each variety, when to expect to transplant them outdoors, and when to harvest them. The schedule can be displayed as a list or it can be generated as an iCalendar file and loaded into something like Google Calendar or Apple's iCal.
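The seed-order step can be sketched as a small dynamic program. This is a min-cost covering formulation (closer to an unbounded knapsack than to general set cover): find the cheapest combination of package sizes whose total is at least the quantity needed. It is not the planner's actual code; `cheapest_order` is a hypothetical name, and the package names, seed counts, and prices in `PACKAGES` are made up for illustration. Prices are integer cents to avoid float rounding.

```python
# Hypothetical package table for one variety: name -> (seed count, price in cents).
PACKAGES = {
    "packet": (250, 395),
    "1/4 oz": (2500, 1450),
    "1 oz": (10000, 3995),
}


def cheapest_order(required, packages):
    """Hypothetical helper: cheapest packages covering at least `required` seeds.

    Returns (total_cost, {package_name: count}). Every package size must be
    a positive integer.
    """
    # best[q] holds the cheapest (cost, counts) covering at least q seeds.
    best = [(0, {})]
    for q in range(1, required + 1):
        options = []
        for name, (size, price) in packages.items():
            # Buy one more of this package on top of the best cover for
            # whatever is still missing (nothing, if it alone covers q).
            prev_cost, prev_counts = best[max(0, q - size)]
            counts = dict(prev_counts)
            counts[name] = counts.get(name, 0) + 1
            options.append((prev_cost + price, counts))
        best.append(min(options, key=lambda option: option[0]))
    return best[required]


print(cheapest_order(300, PACKAGES))   # (790, {'packet': 2})
print(cheapest_order(6000, PACKAGES))  # (3995, {'1 oz': 1})
```

The second call shows the bulk-discount effect: one large package beats any mix of smaller ones that covers 6000 seeds. Since packages with the same name hold different amounts for different vegetables, a real planner would need a separate table per crop or variety.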

These are all pretty basic pieces of information that someone growing vegetables would want to know. On a small scale, they're the kinds of things you can plan out in your head, or keep track of on paper. As you want to do more, though, it can be overwhelming. For example, our schedule for this season has 376 events on it. I wouldn't have wanted to generate that manually.

There is also some rudimentary graphing functionality. This is for visualizing some of the pieces of information I mentioned above (eg plug flat usage). So far this part has been mostly for fun, since it's hard to make any specific decisions from the graphs beyond what the textual, numerical output already supports. One thing it has been useful for, though, is sanity checking the output. It's easier to see a crazy spike or a mysterious plateau on a graph than in numerical data.
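The numbers behind a plug flat graph amount to a per-week tally. A minimal sketch, assuming each planting occupies a fixed number of flats from the week it is seeded until it is transplanted; `weekly_flat_usage`, `PLANTINGS`, and `FLAT_CAPACITY` are hypothetical, and the values are not from our actual plan. Weeks over capacity are the kind of spike a graph makes easy to spot.

```python
from collections import Counter

# Hypothetical plantings: (variety, week seeded into flats,
# weeks spent in flats before transplanting, number of flats).
PLANTINGS = [
    ("lettuce", 8, 4, 6),
    ("broccoli", 9, 6, 10),
    ("tomato", 10, 8, 12),
]
FLAT_CAPACITY = 20  # hypothetical number of flats the starting area holds


def weekly_flat_usage(plantings):
    """Hypothetical helper: {week: flats in use} for every week with flats."""
    usage = Counter()
    for _variety, seed_week, weeks_in_flats, flats in plantings:
        for week in range(seed_week, seed_week + weeks_in_flats):
            usage[week] += flats
    return dict(sorted(usage.items()))


usage = weekly_flat_usage(PLANTINGS)
over = [week for week, flats in usage.items() if flats > FLAT_CAPACITY]
print(usage)
print("over capacity in weeks:", over)  # [10, 11, 12, 13, 14]
```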

As far as the implementation goes, there's nothing really fancy going on here. I've added a lot of features that I hadn't originally planned on (or realized would be useful). As I mentioned, this is a new domain for me to be working in. There is some unit test coverage now, but I didn't start out doing test-driven development. This has bitten me a few times already, as some of the scheduling logic is subtle enough that I can't change it without introducing bugs. Fortunately that part of the code is somewhat well tested now. Well, not completely untested, at least. Development has been test-driven for a month or two now, so I expect things to get easier going forward.

Everything is written in Python, of course. I used vobject to generate the iCalendar output, with pytz to help with the timezone math (oh, timezones, how I loathe you). A pleasantly small amount of code suffices for that.
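For comparison, here is a dependency-free sketch of the same kind of output. It deliberately swaps vobject and pytz for hand-written RFC 5545 text using only the standard library, and it sidesteps timezone math by emitting all-day events. `ics_calendar`, the event fields, and the `PRODID` value are hypothetical; long lines are not folded, so it assumes short summaries.

```python
from datetime import date, datetime, timedelta, timezone


def _escape(text):
    # RFC 5545 TEXT values escape backslashes, semicolons, commas, newlines.
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def ics_calendar(events, stamp=None):
    """Hypothetical helper: render (uid, summary, day) tuples as all-day events.

    Plain dates need no timezone conversion. Lines longer than 75 octets
    should be folded; this sketch does not do that.
    """
    stamp = (stamp or datetime.now(timezone.utc)).astimezone(timezone.utc)
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//crop plan sketch//EN",  # hypothetical identifier
    ]
    for uid, summary, day in events:
        next_day = day + timedelta(days=1)  # DTEND is exclusive for dates
        lines += [
            "BEGIN:VEVENT",
            f"UID:{uid}",
            f"DTSTAMP:{stamp:%Y%m%dT%H%M%SZ}",
            f"DTSTART;VALUE=DATE:{day:%Y%m%d}",
            f"DTEND;VALUE=DATE:{next_day:%Y%m%d}",
            f"SUMMARY:{_escape(summary)}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"  # RFC 5545 uses CRLF line endings


sample = ics_calendar(
    [("seed-lettuce-1@example.invalid", "Seed lettuce, 6 flats", date(2012, 3, 1))],
    stamp=datetime(2012, 2, 26, 12, 0, tzinfo=timezone.utc),
)
print(sample)
```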

I used matplotlib and dateutil to generate the graphs. I have a tolerate/hate relationship with matplotlib. It clearly does a lot of stuff, and I've seen people use it to good effect. Most of its functionality escapes me, though, and I can hardly learn about a new API without observing that it is completely terrible. Still, I used it because it can do the job, and better than the other options, in my experience.

For the highly tedious structure definition, I used a class from Epsilon. epsilon.structlike.record is a lot like the Python standard library's collections.namedtuple. Any time I use the latter, though, I remember how it is implemented and I feel bad. So I stick to the former.

I also used Twisted and html5lib to write a simple web scraper to turn variety names into Johnny's product identifiers. Even if ordering seeds this way ends up being a one-off task, writing the scraper to get this information was definitely easier than chasing down product identifiers in a Johnny's catalog or from the Johnny's website, which each have their own... unique approach to organization. I asked Johnny's if they could make this information available in any sort of structured format and they told me they couldn't. Maybe I should sell it back to them?

Many features are still missing from the planning software. Some of them are simple, like reporting how many flats to seed in the iCalendar event it generates, instead of just reporting how many bed feet will be used after the seeds germinate and are transplanted out into the field. Others are a bit bigger, like having a more coherent model for the underlying data. I might want to put this off until the end of the season, when I might have a better idea if I've fully understood the underlying data myself.

I don't expect this to be useful to a lot of people. In case this sort of tool does appeal to you, though, I'd love feedback (particularly from people more experienced with planning and executing these kinds of agricultural tasks) - but no feature requests, please :)

Saturday, January 21, 2012

Cleaning Up Branch Checkouts

Since Twisted development typically involves at least one branch per ticket, a Twisted developer can end up with a lot of branches checked out.  For example, this morning I had 177 Twisted branches checked out on my laptop.  Many of these were branches that I contributed code to, and perhaps even merged into trunk myself when they were complete.  I could probably have deleted them at that point, but I usually can't be bothered.  Besides, I put everything I have into the branch itself; by the time I'm merging it, I'm done.  Other branches are ones I've done code reviews on for other developers.  I don't keep track of when these get merged into trunk as closely, since typically someone else is going to do those merges.

The incremental cost of another Twisted branch is pretty minimal.  A few more megs used on my hard drive is barely noticeable.  The aggregate cost can get pretty high, though (seven GB for the 177 branches I had this morning).  At some point this can cause problems.

Not all of these branches have been merged into trunk, either, or I could just wipe them all out with ease.  And while I try never to leave uncommitted changes in a branch checkout, nobody's perfect...  What I really want to do is get rid of the branches that just aren't relevant anymore.

So I use cleanup-local.py to deal with the mess.  It looks at my branch checkouts and talks to the Twisted issue tracker to learn the state of each associated ticket (due to the naming convention for Twisted branches, it is easy to determine which ticket is associated with a branch, given just the branch name).  Then it deletes all the checkouts associated with closed tickets (due to the Twisted workflow, if a ticket is closed, it is a very safe bet that you won't need its branch anymore).
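The core of that workflow can be sketched in a few functions. This is not cleanup-local.py itself. The branch-name pattern is inferred from the sample run (a description, the ticket number, and an optional "-N" suffix), and the ticket lookup is a hypothetical `lookup_status` callable so the sketch runs offline instead of talking to the issue tracker. The messages only loosely mimic the sample output. The real tool reports "ticket(s)", so it may handle names that mention several tickets; this sketch does not.

```python
import os
import re
import shutil
import tempfile

# Hypothetical reading of the naming convention, inferred from the sample
# run: description, ticket number, optional "-N" suffix ("poll-default-2234-2").
BRANCH_NAME = re.compile(r"(?P<description>.+?)-(?P<ticket>\d+)(?:-(?P<attempt>\d+))?")


def ticket_for_branch(name):
    """Return the ticket number encoded in a branch name, or None."""
    match = BRANCH_NAME.fullmatch(name)
    return int(match.group("ticket")) if match else None


def cleanup(checkout_root, lookup_status, dry_run=True):
    """Delete checkouts whose ticket is closed; return the names removed.

    lookup_status (hypothetical) maps a ticket number to a status string.
    The real tool asks the issue tracker instead.
    """
    removed = []
    for name in sorted(os.listdir(checkout_root)):
        path = os.path.join(checkout_root, name)
        ticket = ticket_for_branch(name)
        if ticket is None or not os.path.isdir(path):
            continue
        status = lookup_status(ticket)
        print(f"Found {name} for ticket {ticket}, status {status}")
        if status == "closed":
            print(f"Removing closed: {name}")
            if not dry_run:
                shutil.rmtree(path)
            removed.append(name)
    return removed


def demo():
    """Run cleanup() on a throwaway directory using the sample run's statuses."""
    statuses = {4536: "assigned", 4459: "closed", 2409: "closed", 2234: "closed"}
    names = ["password-comparison-4536-2", "pb-chat-example-4459",
             "plugin-cache-2409", "poll-default-2234-2", "trunk"]
    with tempfile.TemporaryDirectory() as root:
        for name in names:
            os.mkdir(os.path.join(root, name))
        removed = cleanup(root, statuses.get, dry_run=False)
        remaining = sorted(os.listdir(root))
    return removed, remaining


print(demo())
```

Defaulting to `dry_run=True` is a cautious choice for a tool that deletes directories: a first run only reports what it would remove.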

The net result is that in (far) less time than it took to write this post, my laptop went from having 177 Twisted branches to having just 34.  To save even more time, I could probably set this up as a weekly cron job or something similar.  It's easy enough to run now, though, that I just do so manually once every couple of months to keep things tidy.

Here's a brief snippet from today's run:

Found password-comparison-4536-2 for ticket(s): 4536
Status of 4536 is assigned
Found pb-chat-example-4459 for ticket(s): 4459
Status of 4459 is closed
Removing closed: pb-chat-example-4459
Found plugin-cache-2409 for ticket(s): 2409
Status of 2409 is closed
Removing closed: plugin-cache-2409
Found poll-default-2234-2 for ticket(s): 2234
Status of 2234 is closed
Removing closed: poll-default-2234-2

Tuesday, January 10, 2012

Learn About Twisted at PyCon 2012

At PyCon this year I'll be presenting a tutorial to introduce Python programmers to Twisted. This tutorial has two goals. First, to give attendees a firm grasp of Twisted's concurrency model, both in the abstract and the concrete. Second, to remove the mystery around the tools Twisted provides for developing robust, testable concurrent applications. If you attend, you'll come away with an understanding of how event loops work and how to write code that works best in Twisted's event loop.

I am a long-time core Twisted developer with real-world experience building maintainable, scalable systems with Twisted. I've also presented similar introductory Twisted tutorials several times in the past, which has taught me the common sticking points and the teaching approaches that help overcome them.

Check out the tutorial's page on the PyCon 2012 website for details about what will be covered. Come learn how to leverage Twisted and Twisted-based libraries to their fullest extent!