Tuesday, December 9, 2008

ioctl TIOCSTTY EIO

What does it mean if a TIOCSTTY ioctl fails with EIO (errno 5) on Linux?

7 comments:

  1. Surely you mean TIOCSCTTY? There does not seem to be any return path leading to EIO in drivers/char/tty_io.c. Maybe your file descriptor is not a terminal?

  2. Oops, yes, TIOCSCTTY.

    Superficial inspection of the code which encounters this error suggests the file descriptor is for a terminal. The file descriptor comes back from Python's pty.openpty (a wrapper around openpty(3)). Then a fork() happens and the child tries to do the ioctl. Once in a while, the ioctl fails. Interestingly, os.isatty (a wrapper around isatty(3)) returns True for the FD being passed to the ioctl call before the ioctl call and usually returns True after the ioctl, but in the cases where the ioctl fails with EIO, it returns False afterwards. So the ioctl is breaking the file descriptor? Or the file description? But again, only sometimes (perhaps 10% or so, but it seems non-deterministic).

  3. I think it's a bit more involved than that. The master does get closed in the relevant codepath, but only in the child process after fork(). So it's still open in the parent, so the slave is still valid. I actually instrumented the code in question (it's not really a minimal example yet, unfortunately) and saw that isatty returned True immediately before the ioctl call and then False immediately afterwards when the ioctl itself failed with EIO.

  4. What happens between the first call to isatty and the ioctl? I bet this is a timing issue with the parent somehow closing the master. Is this code publicly visible? Twisted.Conch maybe?

  5. Hm, under strace it seems to fail a lot quicker. I still think it's a timing issue and not a problem with the ioctl in particular. Changing the ioctl call to os.write seems to confirm that. I'm not familiar enough with the reactor stuff to see what's really going on there and it's too late for me to dig in further, sorry.

  6. strace seems to make it take longer to fail on my system. :) I did finally manage to get it to fail under strace a couple of times, though. The first time, strace itself had a bit of a problem, spitting out "trace: ptrace(PTRACE_SYSCALL, ...): No such process" right before the trial process failed.

    Anyhow, thanks for the discussion so far. At least I'm a bit closer to understanding the problem than I was before.

  7. Figured out what was going on, at least. It ended up being a bug in the unit test. The test didn't wait for the process to get into any particular state, it just immediately started cleaning up. Sometimes it would close the PTY before the child got past the setup. :/ So the fix is just to make the test wait for the child to finish initialization before trying to clean it up.

    So it is actually the case you gave an example of earlier, albeit split across two processes.
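The conclusion above can be checked deterministically, with no race involved, by closing the master before the child ever reaches the ioctl. The sketch below is a hypothetical reproduction (the names reproduce_eio and _errname are made up for illustration, not taken from the original test) and assumes Linux semantics, where closing the last master descriptor hangs up every descriptor still open on the slave side. After that hangup, isatty() on the slave should report False, and both os.write and the TIOCSCTTY ioctl should fail with EIO.

```python
import errno
import fcntl
import os
import pty
import termios


def _errname(code):
    """Map an errno value to its symbolic name ("OK" for success)."""
    return "OK" if code == 0 else errno.errorcode.get(code, str(code))


def reproduce_eio():
    """Hypothetical repro: close the master before the child's TIOCSCTTY.

    Assumes Linux, where closing the last master descriptor hangs up
    every descriptor still open on the slave side.
    """
    master_fd, slave_fd = pty.openpty()
    result = {"isatty_before": os.isatty(slave_fd)}

    # Simulate the racing cleanup: the master is closed first.
    os.close(master_fd)
    result["isatty_after_close"] = os.isatty(slave_fd)

    try:
        os.write(slave_fd, b"x")
        result["write"] = "OK"
    except OSError as exc:
        result["write"] = _errname(exc.errno)

    pid = os.fork()
    if pid == 0:
        # Child: a fresh session has no controlling terminal, so any
        # failure of the ioctl here comes from the hung-up slave.
        code = 0
        try:
            os.setsid()
            fcntl.ioctl(slave_fd, termios.TIOCSCTTY, 0)
        except OSError as exc:
            code = exc.errno
        except BaseException:
            code = 255
        os._exit(code)  # never return into the caller's code path

    # Parent: the child's exit status carries the ioctl's errno.
    _, status = os.waitpid(pid, 0)
    result["ioctl"] = _errname(os.WEXITSTATUS(status))
    os.close(slave_fd)
    return result
```

On a Linux box, calling reproduce_eio() would be expected to show isatty_before as True, isatty_after_close as False, and "EIO" for both the write and the ioctl, which lines up with the symptoms described in the thread. Other kernels may behave differently.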

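Here is a minimal sketch of the fix from the last comment, assuming a setup like the one in the question (pty.openpty, then fork(), then TIOCSCTTY in the child). The child reports over a pipe once it has acquired the slave as its controlling terminal, and the parent does not close the master until that report arrives. The function name spawn_on_pty and the pipe-based handshake are illustrative assumptions; the original unit test may wait for the child in a different way.

```python
import fcntl
import os
import pty
import termios


def spawn_on_pty():
    """Hypothetical fixed setup: no PTY cleanup until the child is ready.

    Returns True if the child reported that TIOCSCTTY succeeded.
    """
    master_fd, slave_fd = pty.openpty()
    ready_r, ready_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: must never return into the caller's code path.
        try:
            os.close(ready_r)
            os.close(master_fd)  # the parent still holds the master
            os.setsid()  # new session, no controlling terminal yet
            fcntl.ioctl(slave_fd, termios.TIOCSCTTY, 0)
            report = b"1" if os.isatty(slave_fd) else b"0"
        except OSError:
            report = b"0"
        try:
            # Tell the parent that initialization has finished.
            os.write(ready_w, report)
        finally:
            os._exit(0)

    os.close(ready_w)
    os.close(slave_fd)  # the parent only needs the master side
    # The fix: block until the child reports that setup is done.
    # os.read returns b"" if the child died before reporting.
    report = os.read(ready_r, 1)
    os.close(ready_r)
    # Only now is it safe to tear down the PTY.
    os.close(master_fd)
    os.waitpid(pid, 0)
    return report == b"1"
```

Running spawn_on_pty() in a loop should return True every time. Moving os.close(master_fd) above the os.read call would reopen the window described in the thread, where the child's ioctl can land after the slave has already been hung up.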