Sunday, February 26, 2012

Side Project: Crop Planning Software

Elsewhere, I wrote about the beginning of growing season and some software I've written to help us out this year. The software I was talking about is very spartan right now. It tries to serve exactly our needs, with just enough user interface so that we can get at the information we need. If you notice where exactly on Launchpad I'm currently hosting it, you'll get some idea about how much effort I've put into making this a real, distributable, useful-to-anyone-else project so far.

What the software does at this point is this:

  • Load data from a (semi-)structured file (CSV, because it's easy to create and export data in this format using OpenOffice). The data it can load describes certain crops and certain varieties of those crops, including information about start and end of season, required growing days, anticipated yields, etc.
  • Plan out a seed order, based on that yield data and additional product data (also in the input file). Doing this without wasting a ton of money ends up being something like a solution to the covering problem, due to discounts for buying greater quantities (sometimes unbelievable discounts, with marginal costs for additional seed ranging as low as 5% of the base cost). This is also a very tedious part of the program, as common suppliers offer seed in well over a dozen different package sizes (with "packages" with the same name containing different amounts of seed for different kinds of vegetables, and of course different vegetables requiring different amounts of seed to produce a particular yield).
  • Predict various kinds of resource usage at each point in the season. Resources include things like bed feet (e.g., we have 22 beds, each 100 feet long, so we have 2200 bed feet; our crop plan cannot exceed this, or we'll have plants with nowhere to be planted), plug flat usage (plug flats are where seeds are started and grow until they're hardy enough to be transplanted outside), and man hours (there are two of us, and we don't want to plant so much that we would need to hire help to deal with it).
  • Generate a schedule of when to seed each variety, when to expect to transplant them outdoors, and when to harvest them. The schedule can be displayed as a list or it can be generated as an iCalendar file and loaded into something like Google Calendar or Apple's iCal.
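
To make the seed order problem concrete, here is a minimal sketch (not the planner's actual code) of picking the cheapest mix of packages that covers a required seed count. The package names, sizes, and prices are made up for illustration, prices are in cents, and the dynamic program over seed counts is just one simple way to attack it.

```python
# Hypothetical packages for one variety: name -> (seeds per package, price in cents).
PACKAGES = {
    "packet": (100, 395),
    "1M": (1000, 1250),
    "5M": (5000, 4000),
}


def cheapest_order(needed, packages):
    """Return (total_cost, {package_name: count}) covering at least `needed` seeds."""
    # best[a] holds (cheapest cost, last package chosen) for covering >= a seeds.
    best = [(0, None)] + [(float("inf"), None)] * needed
    for amount in range(1, needed + 1):
        for name, (size, price) in packages.items():
            # Buying this package leaves max(0, amount - size) seeds to cover.
            cost = best[max(0, amount - size)][0] + price
            if cost < best[amount][0]:
                best[amount] = (cost, name)

    # Walk back through the recorded choices to recover the order.
    order = {}
    amount = needed
    while amount > 0:
        name = best[amount][1]
        order[name] = order.get(name, 0) + 1
        amount = max(0, amount - packages[name][0])
    return best[needed][0], order


cost_1800, order_1800 = cheapest_order(1800, PACKAGES)
cost_3500, order_3500 = cheapest_order(3500, PACKAGES)
```

With these made-up prices, covering 3500 seeds is cheapest with a single 5000-seed package, which is the "buy more than you need because of the discount" effect described above.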

These are all pretty basic pieces of information that someone growing vegetables would want to know. On a small scale, they're the kinds of things you can plan out in your head, or keep track of on paper. As you want to do more, though, it can be overwhelming. For example, our schedule for this season has 376 events on it. I wouldn't have wanted to generate that manually.
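
As a rough illustration of how such a schedule can be produced mechanically, the sketch below derives seeding and harvest dates from a transplant date. The field names, varieties, and day counts are assumptions made up for the example; the real planner's data model and rules are more involved.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Variety:
    name: str
    days_in_flat: int   # assumed: days from seeding to transplant
    days_in_field: int  # assumed: days from transplant to first harvest


def events_for(variety, transplant_on):
    """Return (date, action, variety name) tuples for one planting."""
    seed_on = transplant_on - timedelta(days=variety.days_in_flat)
    harvest_on = transplant_on + timedelta(days=variety.days_in_field)
    return [
        (seed_on, "seed", variety.name),
        (transplant_on, "transplant", variety.name),
        (harvest_on, "harvest", variety.name),
    ]


# Made-up plantings: (variety, planned transplant date).
plantings = [
    (Variety("Example tomato", 42, 70), date(2012, 5, 15)),
    (Variety("Example lettuce", 28, 30), date(2012, 4, 20)),
]

# Merge every planting's events into one list, ordered by date.
schedule = sorted(
    event
    for variety, transplant_on in plantings
    for event in events_for(variety, transplant_on)
)

for day, action, name in schedule:
    print(day.isoformat(), action, name)
```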

There is also some rudimentary graphing functionality. This is for visualizing some of the pieces of information I mentioned above (e.g., plug flat usage). So far this part has been mostly for fun, as it's hard to make any additional specific decisions based on the graphs, as opposed to the textual, numerical output also generated. One thing it has been useful for, though, is sanity checking the output. It's easier to see a crazy spike or a mysterious plateau on a graph than in numerical data.
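
The numbers behind such a graph can also be checked directly. The following is a minimal sketch, assuming each planting is described by a start date, an end date, and the bed feet it occupies (a made-up representation, not the planner's own). It totals bed feet per day and lists the days that exceed the 2200 bed feet available.

```python
from collections import Counter
from datetime import date, timedelta

TOTAL_BED_FEET = 22 * 100  # 22 beds, each 100 feet long


def daily_bed_feet(plantings):
    """Sum bed feet in use for every day covered by any planting."""
    usage = Counter()
    for start, end, feet in plantings:
        day = start
        while day <= end:  # both endpoints count as occupied
            usage[day] += feet
            day += timedelta(days=1)
    return usage


def overbooked_days(usage, capacity=TOTAL_BED_FEET):
    """Return the days, in order, on which usage exceeds capacity."""
    return sorted(day for day, feet in usage.items() if feet > capacity)


# Made-up plantings: (first day in the bed, last day in the bed, bed feet).
plantings = [
    (date(2012, 5, 1), date(2012, 5, 10), 1500),
    (date(2012, 5, 8), date(2012, 5, 20), 900),
]

usage = daily_bed_feet(plantings)
overbooked = overbooked_days(usage)
```

Here the two plantings overlap from May 8 through May 10, when they would need 2400 bed feet, so those three days are reported.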

As far as the implementation goes, there's nothing really fancy going on here. I've added a lot of features that I hadn't originally planned on (or realized would be useful). As I mentioned, this is a new domain for me to be working in. There is some unit test coverage now, but I didn't start out doing test-driven development. This has bitten me a few times already, as some of the scheduling logic is subtle enough that I can't change it without introducing bugs. Fortunately that part of the code is somewhat well tested now. Well, not completely untested, at least. Development has been test-driven for a month or two now, so I expect things to get easier going forward.

Everything is written in Python, of course. I used vobject to generate the iCalendar output, with pytz to help with the timezone math (oh, timezones, how I loathe you). A pleasantly small amount of code suffices for that.
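
For readers who haven't looked inside an iCalendar file, here is a hand-rolled sketch of the kind of output involved. It deliberately does not use vobject or pytz, only the standard library, and it sidesteps timezones entirely by emitting all-day events. The event text, UID scheme, and PRODID are invented for the example, and it skips details such as folding long lines.

```python
from datetime import date, datetime, timezone


def escape_text(value):
    """Escape characters that are special in iCalendar TEXT values."""
    return (
        value.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def to_icalendar(events, stamp):
    """Render (date, summary) pairs as all-day events in one calendar."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//crop-plan-sketch//EN",  # made-up identifier
    ]
    for index, (day, summary) in enumerate(events):
        lines += [
            "BEGIN:VEVENT",
            f"UID:crop-plan-{index}@example.invalid",  # made-up UID scheme
            f"DTSTAMP:{stamp:%Y%m%dT%H%M%SZ}",
            f"DTSTART;VALUE=DATE:{day:%Y%m%d}",
            f"SUMMARY:{escape_text(summary)}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar content lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"


# Made-up events for illustration.
events = [
    (date(2012, 4, 3), "Seed tomatoes, 2 flats"),
    (date(2012, 5, 15), "Transplant tomatoes"),
]
stamp = datetime(2012, 2, 26, 12, 0, tzinfo=timezone.utc)
ics_text = to_icalendar(events, stamp)
```

When writing `ics_text` to disk, opening the file with `newline=""` keeps the CRLF line endings intact.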

I used matplotlib and dateutil to generate the graphs. I have a tolerate/hate relationship with matplotlib. It clearly does a lot of stuff, and I've seen people use it to good effect. Most of its functionality escapes me, though, and I can hardly learn about a new API without observing that it is completely terrible. Still, I used it because it can do the job, and better than the other options, in my experience.

For the highly tedious structure definition, I used a class from Epsilon. epsilon.structlike.record is a lot like the Python standard library collections.namedtuple. Any time I use the latter, though, I remember how it is implemented and I feel bad. So I stick to the former.
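
To show the general shape of "a plain record type filled from CSV rows", here is a sketch that uses a standard library dataclass as a stand-in for epsilon.structlike.record. The column names and sample rows are assumptions made up for the example and do not reflect the planner's real input format.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Crop:
    # Hypothetical fields; the real input file has its own columns.
    name: str
    variety: str
    days_to_maturity: int
    yield_lbs_per_bed_foot: float


def load_crops(csv_file):
    """Build one Crop per CSV row, converting numeric columns."""
    return [
        Crop(
            name=row["crop"],
            variety=row["variety"],
            days_to_maturity=int(row["days_to_maturity"]),
            yield_lbs_per_bed_foot=float(row["yield_lbs_per_bed_foot"]),
        )
        for row in csv.DictReader(csv_file)
    ]


# Made-up sample data in place of a file exported from a spreadsheet.
SAMPLE = """crop,variety,days_to_maturity,yield_lbs_per_bed_foot
Tomato,Example Red,70,2.5
Lettuce,Example Green,30,0.5
"""

crops = load_crops(io.StringIO(SAMPLE))
```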

I also used Twisted and html5lib to write a simple web scraper to turn variety names into Johnny's product identifiers. Even if ordering seeds this way ends up being a one-off task, writing the scraper to get this information was definitely easier than chasing down product identifiers in a Johnny's catalog or from the Johnny's website, which each have their own... unique approach to organization. I asked Johnny's if they could make this information available in any sort of structured format and they told me they couldn't. Maybe I should sell it back to them?

Many features are still missing from the planning software. Some of them are simple, like reporting how many flats to seed in the iCalendar event it generates, instead of just reporting how many bed feet will be used after the seeds germinate and are transplanted out into the field. Others are a bit bigger, like having a more coherent model for the underlying data. I might want to put this off until the end of the season, when I might have a better idea if I've fully understood the underlying data myself.

I don't expect this to be useful to a lot of people. In case this sort of tool does appeal to you, though, I'd love feedback (particularly from people more experienced with planning and executing these kinds of agricultural tasks) - but no feature requests, please :)

4 comments:

  1. what version of python,

    with 2.6 so far i get the following error

    File "/usr/local/lib/python2.6/dist-packages/dateutil/rrule.py", line 13, in
    import _thread
    ImportError: No module named _thread

    1. Weird! I'm using Python 2.7, but I don't think that makes any
      difference in this case. This looks like a bug in dateutil. I'm
      using dateutil 1.4.1 (the version packaged for Ubuntu 11.10). It
      looks like you might have dateutil 2.0, which only works with
      Python 3.x (not Python 2.6 or 2.7). If that's the case, can you
      try going back to dateutil 1.4 or 1.5?
  2. can you post sample input files? seems like you require seeds/oz and other things that could be scraped from johnny's, so i'm a bit confused. thanks!
  3. oops, ignore my last comment, i found input files!