SSH to EC2 (Refrain)
2017-09-15

Recently Moshe wrote up a demonstration of the simple steps needed to retrieve an SSH public key from an EC2 instance to populate a known_hosts file. Moshe’s example uses the highly capable boto3 library for its EC2 interactions. However, since his blog is syndicated on Planet Twisted, reading it left me compelled to present an implementation based on txAWS instead.

First, as in Moshe’s example, we need argv and expanduser so that we can determine which instance the user is interested in (accepted as a command line argument to the tool) and find the user’s known_hosts file (conventionally located in ~/.ssh):

from sys import argv
from os.path import expanduser

Next, we’ll get an abstraction for working with filesystem paths. This is commonly used in Twisted APIs because it avoids many of the path manipulation mistakes that are easy to make when paths are represented as simple strings:

from filepath import FilePath

Now, get a couple of abstractions for working with SSH. Twisted Conch is Twisted’s SSH library (client & server). KnownHostsFile knows how to read and write the known_hosts file format. We’ll use it to update the file with the new key. Key knows how to read and write SSH-format keys. We’ll use it to interpret the bytes we find in the EC2 console output and serialize them to be written to the known_hosts file.

from twisted.conch.client.knownhosts import KnownHostsFile
from twisted.conch.ssh.keys import Key

And speaking of the EC2 console output, we’ll use txAWS to retrieve it. AWSServiceRegion is the main entrypoint into the txAWS API. From it, we can get an EC2 client object to use to retrieve the console output.

from txaws.service import AWSServiceRegion

And last among the imports, we’ll write the example with inlineCallbacks to minimize the quantity of explicit callback-management code. Due to the simplicity of the example and the lack of any need to write tests for it, I won’t worry about the potential problems with confusing tracebacks or hard-to-test code this might produce. We’ll also use react to drive the whole thing so we don’t need to explicitly import, start, or stop the reactor.

from twisted.internet.defer import inlineCallbacks
from twisted.internet.task import react
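To make the idea of suspending a function at each yield a little more concrete, here is a toy sketch using only plain generators. It is not how Twisted implements inlineCallbacks: the drive helper and the zero-argument callables standing in for Deferreds are assumptions invented for this illustration, and everything here runs synchronously.

```python
# Toy illustration only: this is NOT how Twisted implements inlineCallbacks.
# Each yielded value is a zero-argument callable standing in for a Deferred.


def drive(generator_function, *args):
    """Run a generator to completion, feeding each yielded result back in."""
    generator = generator_function(*args)
    result = None
    try:
        while True:
            pending = generator.send(result)  # run until the next yield
            result = pending()                # "wait" for the pending result
    except StopIteration as stop:
        return stop.value                     # the generator's return value


def fetch_greeting(name):
    # Hypothetical stand-ins for asynchronous operations.
    prefix = yield (lambda: "Hello")
    suffix = yield (lambda: "!")
    return "{}, {}{}".format(prefix, name, suffix)


greeting = drive(fetch_greeting, "EC2")
print(greeting)
```

The body of fetch_greeting reads top to bottom like blocking code, which is the property the real decorator gives us with real Deferreds.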

With that sizable preamble out of the way, the example can begin in earnest. First, define the main function using inlineCallbacks and accepting the reactor (to be passed by react) and the EC2 instance identifier (taken from the command line later on):

@inlineCallbacks
def main(reactor, instance_id):

Now, get the EC2 client. This usage of the txAWS API will find AWS credentials in the usual way (looking at AWS_PROFILE and in ~/.aws for us):

    region = AWSServiceRegion()
    ec2 = region.get_ec2_client()

Then it’s a simple matter to get an object representing the desired instance and that instance’s console output. Notice that these APIs return Deferreds, so we use yield to let inlineCallbacks suspend this function until the results are available.

    [instance] = yield ec2.describe_instances(instance_id)
    output = yield ec2.get_console_output(instance_id)

Next comes some simple parsing logic, much like the code in Moshe’s implementation (since exactly the same text is being operated on). We do take the extra step of deserializing each key into an object that we can use later with a KnownHostsFile object.

    keys = (
        Key.fromString(key)
        for key in extract_ssh_key(output.output)
    )
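For a rough sense of what deserializing a key line involves, here is a standard-library sketch that splits an OpenSSH-style public key line and checks that the algorithm name embedded in the decoded blob matches the textual key type. This is an illustration rather than Conch’s implementation; parse_public_key_line is a hypothetical helper and the sample blob is made up, not a usable key.

```python
import base64
import struct


def parse_public_key_line(line):
    """Split an OpenSSH public key line into (key_type, blob, comment)."""
    fields = line.strip().split(None, 2)
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64> [comment]'")
    key_type, encoded = fields[0], fields[1]
    comment = fields[2] if len(fields) == 3 else ""
    blob = base64.b64decode(encoded)
    # The blob starts with a 4-byte big-endian length followed by the
    # algorithm name, which should match the textual key type.
    (name_length,) = struct.unpack(">I", blob[:4])
    embedded_type = blob[4:4 + name_length].decode("ascii")
    if embedded_type != key_type:
        raise ValueError("key type mismatch")
    return key_type, blob, comment


# Made-up blob: a valid type header followed by placeholder bytes, not a real key.
fake_blob = struct.pack(">I", 7) + b"ssh-rsa" + b"\x00" * 16
sample_line = "ssh-rsa {} root@example".format(
    base64.b64encode(fake_blob).decode("ascii")
)
key_type, blob, comment = parse_public_key_line(sample_line)
print(key_type, len(blob), comment)
```

A real Key object goes further and interprets the remaining bytes as the key’s numeric parameters, which is why the example in this post leaves that job to Conch.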

Then write the extracted keys to the known hosts file:

    known_hosts = KnownHostsFile.fromPath(
        FilePath(expanduser("~/.ssh/known_hosts")),
    )
    for key in keys:
        for name in [instance.dns_name, instance.ip_address]:
            known_hosts.addHostKey(name, key)
    known_hosts.save()
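As a reminder of what ends up in the file, a plain (unhashed) known_hosts entry is a comma-separated list of host names, the key type, and the base64 key data on one line. The sketch below builds such a line with the standard library only; format_known_hosts_entry is a hypothetical helper, the host names and key data are placeholders, and entries written by a library may use the hashed host name variant instead.

```python
def format_known_hosts_entry(names, key_type, encoded_key):
    """Build one unhashed known_hosts line: 'host1,host2 type base64'."""
    # Skip empty names in case the instance is missing one of them.
    hosts = ",".join(name for name in names if name)
    if not hosts:
        raise ValueError("at least one host name is required")
    return "{} {} {}\n".format(hosts, key_type, encoded_key)


# Placeholder values for illustration only.
entry = format_known_hosts_entry(
    ["ec2-203-0-113-10.compute-1.amazonaws.com", "203.0.113.10"],
    "ssh-rsa",
    "BASE64KEYDATA",
)
print(entry, end="")
```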

There’s also the small matter of actually parsing the console output for the keys:

def extract_ssh_key(output):
    return (
        line for line in output.splitlines()
        if line.startswith(u"ssh-rsa ")
    )
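To see the filter at work without an EC2 instance, here is a self-contained sketch run against made-up console output, loosely modeled on the host key section that instances print at boot. The extract_ssh_keys variant, its list of key-type prefixes, and the sample text are all assumptions for illustration; the function in this post matches only ssh-rsa lines.

```python
# Assumed key-type prefixes; the original example matches only "ssh-rsa ".
KEY_PREFIXES = ("ssh-rsa ", "ssh-ed25519 ", "ecdsa-sha2-nistp256 ")


def extract_ssh_keys(output, prefixes=KEY_PREFIXES):
    """Return the console output lines that look like SSH public keys."""
    return [
        line.strip() for line in output.splitlines()
        if line.strip().startswith(prefixes)
    ]


# Made-up console output for illustration; the key data is placeholder text.
SAMPLE_OUTPUT = u"""\
cloud-init: Generating public/private rsa key pair.
-----BEGIN SSH HOST KEY KEYS-----
ssh-rsa AAAAPLACEHOLDERRSA root@ip-10-0-0-1
ssh-ed25519 AAAAPLACEHOLDERED25519 root@ip-10-0-0-1
-----END SSH HOST KEY KEYS-----
"""

for key_line in extract_ssh_keys(SAMPLE_OUTPUT):
    print(key_line)
```

The placeholder key data here would not survive Key.fromString; the point is only to show which lines the filter selects.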

And then kicking off the whole process:

react(main, argv[1:])
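Since react passes each element of the argument list to main as an extra positional argument, running the tool with no instance identifier (or with several) would most likely surface as a TypeError rather than a helpful message. A small check up front is one way to avoid that; parse_instance_id is a hypothetical helper, not part of Twisted or txAWS, and the instance identifier below is a placeholder.

```python
def parse_instance_id(arguments):
    """Return the single instance identifier from the command line arguments."""
    if len(arguments) != 1:
        # SystemExit with a string prints the message and exits non-zero.
        raise SystemExit("usage: <script> INSTANCE_ID")
    return arguments[0]


# Placeholder identifier for illustration.
instance_id = parse_instance_id(["i-0123456789abcdef0"])
print(instance_id)
```

With this in place, the last line of the script could become react(main, [parse_instance_id(argv[1:])]).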

Putting it all together:

from sys import argv
from os.path import expanduser

from filepath import FilePath

from twisted.conch.client.knownhosts import KnownHostsFile
from twisted.conch.ssh.keys import Key

from txaws.service import AWSServiceRegion

from twisted.internet.defer import inlineCallbacks
from twisted.internet.task import react

@inlineCallbacks
def main(reactor, instance_id):
    region = AWSServiceRegion()
    ec2 = region.get_ec2_client()

    [instance] = yield ec2.describe_instances(instance_id)
    output = yield ec2.get_console_output(instance_id)

    keys = (
        Key.fromString(key)
        for key in extract_ssh_key(output.output)
    )

    known_hosts = KnownHostsFile.fromPath(
        FilePath(expanduser("~/.ssh/known_hosts")),
    )
    for key in keys:
        for name in [instance.dns_name, instance.ip_address]:
            known_hosts.addHostKey(name, key)
    known_hosts.save()

def extract_ssh_key(output):
    return (
        line for line in output.splitlines()
        if line.startswith(u"ssh-rsa ")
    )

react(main, argv[1:])

So, there you have it. The complexity is roughly equivalent to using boto3, and on its own there’s little reason to prefer this to what Moshe has written about. However, if you have a larger Twisted-based application, then you may prefer the natively asynchronous txAWS to blocking boto3 calls or to managing boto3 in a thread somehow.

Also, I’d like to thank LeastAuthority (my current employer and operator of the Tahoe-LAFS-based S4 service which just so happens to lean heavily on txAWS) for originally implementing get_console_output for txAWS (which, minor caveat, will not be available until the next release of txAWS is out).
