Tuesday, May 31, 2005

Time out!

Back to work after a long weekend, and what a productive weekend it was: I paired with Itamar on some fun stuff (more about this later), and some Twisted fixes and enhancements too ;)

Today I paired with Glyph on work stuff. It didn't feel like the most productive day, but we did accomplish a few concrete tasks: a deployment-blocking bug got fixed (it's hard to feel too great about this, though: it was a regression, and one I introduced, no less), and a few demo features got hammered out (mostly refinements of previous ideas and implementations). Pairing is great even on bad days, though: the result is always that much better, the work goes that much faster, and the extra communication helps settle other problems I might be having (even ones I'm not aware of yet), which carries over into subsequent days.

After work (well, eventually, it was probably about 10:30 PM by the time we got started), Glyph, Ying and I relaxed with _Sin City_. Okay: relaxed is not the right word. Still, one must qualify these things as entertainment (I think I actually did like the movie (this is not to say I enjoyed it (don't ask me what the difference between those things is, I'm still working on it myself)), but I did feel physically ill by the time it was over).

But there is one distinct and unwavering downside to a day of pairing: since I'm not constantly checking my mailbox, things tend to pile up. Today I came home to 113 messages, about 15 of which I could reasonably be expected to reply to, and about half of those I really, really should take care of (not tonight, though; I decided to blog about it instead). I suppose I could look on the bright side: I know exactly what I'm doing tomorrow morning before work.

Tuesday, May 24, 2005

Doubling bytestreams

For a while now, every day, we have been moving a somewhat hefty chunk of bytes, about 15 GB worth, from one machine to another. The bytes are a tar file, generated in real time and piped to ssh, which is connected to a host that untars them. Pretty standard stuff, really. A while back we decided we wanted to send this tar to two hosts instead of one. No big deal: we just ran tar twice, piping each run's output to an ssh process connected to a different host. Worked like a charm. Recently, we decided the load incurred by the second tar run was heavy enough to be worth avoiding. Obvious solution: pipe tar to tee, and send one of tee's outputs to one ssh and the other to the other.

That doesn't work. Woops.

For whatever reason, tee chokes after a bit less than 8 GB of data: it reports a write error, and poof, one of the streams is dead (always the same one, interestingly; the other always carries on just fine). Rather than waste too much time trying to figure out who is at fault here (or perhaps as a way of doing so ;), I wrote this quickie:


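#!/usr/bin/python

"""Write bytes read from one file to one or more files.
"""

import os, sys

from twisted.python import usage, log

class Options(usage.Options):
    def opt_out(self, arg):
        """Copy the input to this file (may be given more than once)."""
        vars(self).setdefault('out', []).append(arg)
    opt_o = opt_out

    def postOptions(self):
        self.outfds = []
        for fname in vars(self).get('out', []):
            # Like tee, create the output if needed and truncate it.
            self.outfds.append(os.open(
                fname, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644))

def writeAll(fd, data):
    # os.write() may accept only part of the buffer; keep going until done.
    while data:
        data = data[os.write(fd, data):]

def main(infd, *outfds):
    # *outfds arrives as a tuple, which doesn't support item deletion.
    outfds = list(outfds)
    # Stop once every output has failed; there's no one left to feed.
    while outfds:
        chunk = os.read(infd, 2 ** 16)
        if not chunk:
            break
        # Walk backwards so deleting a dead fd doesn't skip the next one.
        for i in range(len(outfds) - 1, -1, -1):
            try:
                writeAll(outfds[i], chunk)
            except OSError:
                log.msg("Error writing to %d" % (outfds[i],))
                log.err()
                del outfds[i]

if __name__ == '__main__':
    o = Options()
    try:
        o.parseOptions()
    except usage.UsageError as e:
        sys.exit(str(e))
    log.startLogging(sys.stdout)
    main(0, *o.outfds)  # fd 0 is stdin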

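I called it yjoint, uncreative clod that I am (hey, at least I didn't call it pytee). It's not exactly a drop-in tee replacement: outputs are named with --out (or -o) rather than as plain arguments, and nothing is copied to stdout. We use it more or less like this: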


exarkun@boson:~/$ tar c yummy_data | yjoint \
> --out >(ssh host1 tar x) \
> --out >(ssh host2 tar x)
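The >(...) bits are bash process substitution: bash starts each ssh with its stdin attached to a pipe and hands yjoint a path for the other end. If you'd rather not depend on bash for that, here is a rough Python 3 sketch of the same fan-out that spawns the consumers itself with subprocess. It's an illustration under assumptions, not what we actually run: tee_to_commands and CONSUMER are hypothetical names, and CONSUMER is a stand-in for something like ["ssh", "host1", "tar", "x"] that simply copies its stdin into a file so the result can be checked locally.

```python
import io
import os
import shutil
import subprocess
import sys
import tempfile

# Hypothetical stand-in consumer: copies its stdin into the file named by argv[1].
# In the real pipeline this would be something like ["ssh", "host1", "tar", "x"].
CONSUMER = ("import shutil, sys; "
            "shutil.copyfileobj(sys.stdin.buffer, open(sys.argv[1], 'wb'))")


def tee_to_commands(source, commands, chunk_size=2 ** 16):
    """Copy the binary file object `source` to the stdin of each command.

    A consumer whose stdin breaks is dropped; the others keep going.
    Returns the exit codes, in the order the commands were given.
    """
    procs = [subprocess.Popen(cmd, stdin=subprocess.PIPE) for cmd in commands]
    alive = list(procs)
    while alive:
        chunk = source.read(chunk_size)
        if not chunk:  # EOF on the source
            break
        for p in list(alive):  # iterate over a copy so removal is safe
            try:
                p.stdin.write(chunk)
            except OSError:  # e.g. BrokenPipeError: that consumer went away
                alive.remove(p)
    for p in procs:
        try:
            p.stdin.close()  # flushes, then signals EOF to the consumer
        except OSError:
            pass  # a dead consumer's leftover buffer can't be flushed
    return [p.wait() for p in procs]


if __name__ == "__main__":
    payload = os.urandom(200000)
    outdir = tempfile.mkdtemp()
    paths = [os.path.join(outdir, name) for name in ("copy1", "copy2")]
    codes = tee_to_commands(io.BytesIO(payload),
                            [[sys.executable, "-c", CONSUMER, p] for p in paths])
    for p in paths:
        with open(p, "rb") as f:
            print(p, "matches:", f.read() == payload)
    print("exit codes:", codes)
    shutil.rmtree(outdir)
```

Each consumer reads concurrently with the writer, so the only back-pressure is the ordinary pipe buffer; a consumer that dies just drops out of the loop, which is the same behavior that lets yjoint keep one stream going when the other fails.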

Swapped tee out and yjoint in, and suddenly we are in business again.
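The property doing the work here is that a failed output gets logged and dropped while the rest keep receiving data. Below is a minimal, Twisted-free Python 3 sketch of that loop, exercised against in-process pipes standing in for the two ssh processes. The names write_all, read_all and fan_out are hypothetical. It relies on CPython ignoring SIGPIPE by default, so a write to a pipe with no reader raises BrokenPipeError (an OSError) instead of killing the process.

```python
import os


def write_all(fd, data):
    """Write all of data to fd; os.write() may accept only part of it."""
    while data:
        written = os.write(fd, data)
        data = data[written:]


def read_all(fd, chunk_size=1024):
    """Read fd until EOF and return everything as one bytes object."""
    parts = []
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:
            return b"".join(parts)
        parts.append(chunk)


def fan_out(infd, outfds, chunk_size=2 ** 16):
    """Copy infd to every fd in outfds until EOF, dropping outputs that fail.

    Returns the output fds that were still alive at the end.
    """
    alive = list(outfds)
    while alive:
        chunk = os.read(infd, chunk_size)
        if not chunk:  # EOF on the input
            break
        for fd in list(alive):  # iterate over a copy so removal is safe
            try:
                write_all(fd, chunk)
            except OSError as e:  # e.g. BrokenPipeError: the reader went away
                print("Error writing to %d: %s" % (fd, e))
                alive.remove(fd)
    return alive


if __name__ == "__main__":
    # 4 KiB fits in a pipe buffer, so no reader thread is needed.
    payload = bytes(range(256)) * 16

    in_r, in_w = os.pipe()
    write_all(in_w, payload)
    os.close(in_w)  # close the write end so fan_out sees EOF

    good_r, good_w = os.pipe()  # stands in for the healthy ssh
    dead_r, dead_w = os.pipe()  # stands in for the ssh that dies
    os.close(dead_r)  # no reader left, so writes to dead_w fail with EPIPE

    survivors = fan_out(in_r, [good_w, dead_w], chunk_size=1024)
    os.close(good_w)
    print("survivors are just the healthy pipe:", survivors == [good_w])
    print("healthy pipe got every byte:", read_all(good_r) == payload)
```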

I wonder what the deal with tee is?