22 Sep 2018, 12:00

Distributed Capabilities via Cryptography

This is a follow-up to my previous capabilities post. As before, you probably want to read Capability Myths Demolished and the Noise Protocol specification first to extract full value. This is a pretty rough draft: having left it for several weeks after writing it, I was going to rewrite it but decided just to publish it as is, as a work in progress, and write another post later. This stuff needs a much clearer explanation.

I went to a Protocol Labs dinner last night (thanks for the invite Mike) and managed to corner Brian Warner from Agoric and ask about cryptographic distributed capabilities, which was quite helpful. This stuff has not really been written down, so here is an attempt to do so. I should probably add references at some point.

Non cryptographic capabilities

For reference and completeness, let us cover how you transmit capabilities without any cryptography, and what the downsides are. The basic model is called the “Swiss number”, after a (I suspect somewhat mythical) model of an anonymous Swiss bank account, where you just present the account number, which you must keep secret, in order to deposit and withdraw money, no questions asked. This is pretty much the standard model in the historic literature, largely written before public key cryptography was feasible to use. In modern terms, the Swiss number capability should be a random 256 bit number, and the connection should of course be encrypted to prevent snooping. The implementation is easy, just check (in constant time!) that the presented number is equal to the object’s number. Minting these is trivial. The capability is a random bearer token.
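
As a sketch of these two operations (the function names here are mine, not from any particular system), minting and checking a Swiss number might look like this:

```python
import hmac
import secrets

def mint_capability() -> bytes:
    """Mint a new Swiss number: a random 256 bit bearer token."""
    return secrets.token_bytes(32)

def check_capability(presented: bytes, actual: bytes) -> bool:
    """Check a presented token against the object's token in constant time,
    so an attacker cannot learn the token byte by byte via timing."""
    return hmac.compare_digest(presented, actual)

cap = mint_capability()
assert check_capability(cap, cap)
assert not check_capability(secrets.token_bytes(32), cap)
```

The constant-time comparison matters: a naive `==` on byte strings can short-circuit at the first differing byte, leaking timing information.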

The downsides are pretty clear. First, you may present the capability to the wrong party for checking. Checking and transfer of capabilities are very different operations, and we would like checking not to reveal the token. This is a general problem with bearer tokens, such as JWTs: they can easily be presented to the wrong party, or to a man in the middle attacker. We would like cryptographic protection for the check operation to avoid this. The second downside, which is somewhat related, is that we have no way to identify the intended object. Any party who has a copy of the capability can pretend to be the object it refers to, as there is no asymmetry between the parties. We have to rely on some external naming system, which might be subverted. The third issue is that we have to build our own encryption, and the token we have does not help, as it does not act as a key or help identify the other party. So we have to rely on anonymous key exchange, which is subject to man in the middle attacks as we do not know an identity for the other participant, or again on some sort of external source of truth, such as the PKI system.

These downsides are pretty critical for modern secure software, so we need to do better. We will refer to these three properties (check does not reveal, object identity, and encryption included) to analyze some alternatives.

There are some things I am not going to discuss in this post. I mentioned the model of secret public keys, which appears in some of the literature, in an earlier post, but will ignore it here as it has security issues. I am not going to cover macaroons either; they are another form of bearer token with differently interesting properties.

Cryptographic Capabilities

The obvious way to solve the second problem, being able to identify the object that the capability refers to securely, is to give the object an asymmetric key. We can then hand out the public key, which can usefully be the object identifier and be used to locate it, while the object keeps its private key secure, and does not hand it out to any other object (it can be kept in a TPM type device as it is only needed for restricted operations). We can now set up an encrypted channel with this object, and as we know the public key up front, we can be sure we have connected to the right object if we validate this correctly. In Noise Protocol terms, we can use an NK handshake pattern, where the connecting object is anonymous but knows the public key it is connecting to. We can also use XK (or IK) if we want to pass the identity of the connecting object, for example for audit purposes. Once we have connected, we can use the Swissnum model to demonstrate we have the capability, but without the risk of passing the capability to the wrong party.

However, we can improve this, by using the Swissnum as a symmetric key, and incorporating it as a secret known by both parties into the asymmetric handshake. In Noise Protocol terms this is the NKpsk0 handshake (or XKpsk0) that I mentioned in my previous post. The handshake will only succeed if both parties have the same key, as the key is securely mixed into the shared symmetric key that is generated from the Diffie-Hellman exchange of the public keys. This is even better than the Swissnum method above, as the handshake is shorter as you do not need the extra phase to pass and potentially acknowledge the Swissnum; it looks pretty similar as a symmetric key is generally just an arbitrary random sequence of 256 bits or so anyway.
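
As a toy illustration of why the psk mixing works (this is emphatically not the real Noise key schedule, which runs HKDF over the whole handshake transcript; it is just a hashing sketch), the derived session key only matches if both sides hold the same pre-shared capability key:

```python
import hashlib
import secrets

def derive_session_key(dh_shared_secret: bytes, psk: bytes) -> bytes:
    """Toy stand-in for the Noise key schedule: mix the Diffie-Hellman
    output and the pre-shared capability key into the session key."""
    return hashlib.sha256(dh_shared_secret + psk).digest()

# Pretend both parties completed a Diffie-Hellman exchange and share this.
dh_secret = secrets.token_bytes(32)
capability = secrets.token_bytes(32)   # the Swissnum, used as a psk

holder_key = derive_session_key(dh_secret, capability)
object_key = derive_session_key(dh_secret, capability)
intruder_key = derive_session_key(dh_secret, secrets.token_bytes(32))

assert holder_key == object_key    # same psk: handshake succeeds
assert holder_key != intruder_key  # wrong psk: keys differ, handshake fails
```

The point is that the capability is never sent over the wire at all; a party without it simply cannot derive matching keys, so decryption fails and the handshake aborts.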

This model does solve all three of our issues: a handshake to the wrong party does not reveal the capability, the object cannot be spoofed by another (without stealing the private key), and the keys support an encrypted channel. It is not the only mechanism however. Minting new capabilities is easy, you just create a new symmetric key, and creating objects is easy, just create an asymmetric keypair.

Instead of using an asymmetric key and a symmetric key, Brian Warner pointed out to me yesterday that we can present a certificate instead of the symmetric key. This is slightly more complex. To demonstrate possession of a capability, we present a certificate to the object. We have to sign an ephemeral value that the object presents to us, and the simplest method is for the object that the capability is for to hold the public key to check the signature, while the capability is the private signing key. Anyone with the capability can directly sign the certificate, and you pass the private key around to transfer the capability. Note that the subject of the capability does not need to know the private signing key, so it cannot necessarily pass on a capability to access itself. This might be an advantage in some circumstances. Note also that the holders of the capability need to transfer a private key to pass the capability on, so they cannot hold the key in a TPM device that does not allow key export, or indeed in a general cryptographic API that only supports a private key type with signing operations but no export operation, which has been common practice. Note that the Noise Protocol Framework support for signatures is a work in progress, scheduled for a revision later this year.

If you don’t want to pass around private keys, you could use a chained signature model, where each party that passes on a capability adds to a signature chain, authenticating the public key of the next party, all chaining down to the original key. This would mean unbounded chain lengths though, which would be a problem for many use cases. It would provide an audit trail of how each party got the capability, but transparency logs probably do this more effectively.

Thinking about this model, we actually do not need to use signatures, we can just use encryption keys directly. As before, the object the capability is granted over has a private encryption key, but instead of using signatures, we create an asymmetric encryption keypair, give the object the public key, while capability holders get the private key, and pass the private key around as the capability. So to validate an encryption handshake, the object checks that the capability holder has the correct private key, while the capability holder validates that it is talking to the object that possesses the identity private key. In Noise Protocol terms this is a KK handshake, where both parties know the public key of the other party, and verify that each possesses the corresponding private key. The signature version is a KK variant with one encryption key substituted by a signature, and there is another variant where both keys are replaced by signatures; the Noise signature protocol modifiers allow signatures to substitute for long-term-with-ephemeral Diffie-Hellman key agreement in any combination, with some deferral modifications.

So we see that rather than using the mixed symmetric and asymmetric key model (NKpsk) that I discussed before, we can use asymmetric key only (KK) models for distributed capabilities. The differences for the user are relatively small, as both methods fulfil our three criteria; the differences are that in the public key only model the object need not be able to pass on capabilities to itself, and that we have to pass around asymmetric private keys, which there is sometimes a reluctance to do. For quantum resistance, it is possible to use a combination of both symmetric and asymmetric keys, sharing a symmetric key among all parties.

27 Aug 2018, 16:00

Open Source and Cloud: who eats who?

Having been away on holiday, I returned to a bit of an outburst of discussion around new licenses for open source, or less open source code. But in many ways the more interesting piece, which puts it in context, was Joseph Jacks’ seventh Open Consensus piece, “OSS Will Eat Cloud Computing”. The Redis Labs arguments about changing some code from the anti cloud provider AGPL to a no commercial exploitation license were only aimed at the cloud providers making open source code available as a service while often not contributing back significantly, and not providing any revenue to upstream open source companies.

But if OSS is going to eat up cloud, and the commercial open source companies are also going to be increasingly successful, then this starts to seem like just a teething issue, soon to be forgotten. So let us look at what the optimistic case looks like. First though, I think it helps to look at the open source strategies of the cloud providers themselves, to give some context to the current position.

Open source and the cloud providers

AWS does not produce a lot of open source software of any great interest itself. It has started to show some interest in strategic moves, such as backing the nodeless Kubernetes work, but overall, nothing of interest. There is plenty of open source to help you use its services of course. On the other hand it is the main co-opter that Redis Labs complains about, shipping often slightly old and modified versions of open source software. There are many examples, from the widely used MySQL service to many other projects, such as Redis sold as “Amazon ElastiCache for Redis”, which is “Redis compatible” but does not confirm or deny being upstream Redis. Another example is Amazon ECS, which bundled open source Docker with an orchestration layer quite different from the direction the open source community was going, as a proprietary way to run Docker that has ultimately been unsuccessful.

Azure produces the largest quantity of open source of any of the three major US based cloud providers, with much of the Azure infrastructure actually being open source, including Service Fabric, the entirety of the Function as a service platform and so on. However the sheer quantity of open source coming out of Microsoft means that it needs curation and most of it has not created a deep community. They urgently need to prioritise a smaller number of core projects, get community contribution or shut them down if this fails, and get them into the CNCF to counteract the Google centric nature of the projects there. With careful management Azure could be very well positioned for an OSS eats the cloud future but it needs the community behind it, or failing that to follow the community instead.

Google is the smallest of the cloud players but has so far used open source the most strategically, and has shown that a large investment, such as that in Kubernetes, can push developer mindshare in the direction you want, and gain mass adoption. This has been combined with other successful projects, such as Go, which has been one of the most successful programming language launches of recent times. However it is not clear to me that this success can necessarily be replicated on an ongoing basis, and there are opportunities for other players to disrupt the strategic play. First, Google demands extreme control over its projects, and many of its “open source” projects are just code over the wall plays, with no interest in real community involvement. Others offer community involvement very late, once the direction is already strongly defined, as is clear in the decision not to donate Istio to the CNCF until well past the 1.0 release. There is a whole strategic roadmap being mapped out, pushing the Google way of doing things, and I predict that much of it will not stick in the end. Not every project is going to be in the right place at the right time that Kubernetes was. Another issue is that the suite of “GIFEE” (Google infrastructure for everyone else) companies that ex-Googlers like to start up and Google Ventures likes to fund, which further spread the Google way, have a problem: Google already has an internal project that matches each of them, so in many cases it has no interest in actually buying a company with an open source reimplementation. So there is no real process for an exit to Google, unlike the classic Cisco model, where startups spin out and the viable companies get bought up by the original company again. The biggest exit in this space has been CoreOS, which was purchased to remove a competitor in the Kubernetes market; the Linux distribution it started with added no value to the transaction.

The other impact of all three cloud providers that is important is the hiring. Many engineers who might otherwise be working on other open source projects are being hired by the cloud providers to work on their projects, which are largely not open source. The rapidly growing revenues and high margins mean that hiring by the three is very significant, and both Amazon and Microsoft have profits and stock prices (and hence equity compensation) that are now largely being driven by the growth in cloud. Google is still largely an advertising company, and Google Cloud is very small compared to the other two, so there is less of a direct multiplier there. This adds to pressure on salaries in the rest of the industry and shifts people to working on the cloud providers open and closed source strategic directions.

What the near term would look like

If open source is to eat cloud, cloud has to become the commodity layer. We have a similar recent model in the mobile phone providers of the 1990s. Suddenly there was a huge opportunity for telecom companies, who were in a mature low margin business, and for upstart businesses, to enter a high growth high margin business. Huge profits were made, new global giant companies such as Vodafone were created, and, well, in the end mobile became just a commodity business to run the (open source driven) internet over. Margins continue to fall, and no new value was captured by the network owners despite many attempts. The details of how this failed are not that relevant perhaps; the important thing is that the trillions of dollars of value capture that was hoped for, even expected, did not in the end materialize. The key is the “dumb pipes” phrase that telcos worried about so much.

Dumb cloud pipes

The route to dumb cloud pipes involves a fair number of forces converging. First the explosive growth in cloud as a whole, as happened in mobile, removes much price pressure, while there is an explosion of services (especially at AWS, which is constantly launching them). With the huge demand, there is initially little price sensitivity, and there is an exploration of services while people discover which are most useful. Pricing is opaque and users do not initially realise exactly what they are going to consume or how much value it has. This is our current phase, with cloud provider margins being very high. Counteracting this comes customer maturity, such as better budget control, standardised feature sets, and better price negotiation. Prices could easily start to fall again, at a rate of 20%-30% a year or higher for long periods. The clouds will try to build moats and lock in at this point, building on the current points of lock in. These especially include IAM, where models and APIs differ substantially between providers, hence the build out of deeper security related services such as cloud HSM and other areas that deepen use of these APIs. Data gravity is another moat, and some people have suggested that data storage might end up being subsidised until it is free, anything to get more data in; transit costs dominate for many use cases anyway, and highly discourage cross cloud use cases. Cloud provider network costs are deliberately high.

In general, like the old saying about the internet, that it sees censorship as something to route around, open source tends to see lock in and moats as something to fill in. We already have a situation where the S3 API (the oldest part of AWS) is the standard for all providers, and has open source implementations such as Minio. Managed Kubernetes is another area where all the providers are being forced by the market to provide a standard interface. Pure compute is not so standardised but is undifferentiated. The next thing we see coming is higher level interfaces over whole classes of API; one example of this type of approach is Pulumi, which provides a very different interface, programming language focused rather than API focused, designed to work across arbitrary clouds without caring which one is underneath. Note that some of the Google open source efforts promote these types of changes, in order to try to make their cloud more interchangeable with AWS in particular, but Google also has a large amount of proprietary technology that it is using to attempt moat building at the same time.

Community of purpose

There are some open source companies already working in this space, including my employer Docker and several other large scale companies, as well as the wealth and diversity of smaller, growing companies that make up the current community. As Joseph points out in his post, these commercial open source companies are growing very rapidly, but this is being largely ignored as cloud is more obvious. There is plenty more room of course, and as customers gradually realise that the cloud provision is a dumb pipe and the open source software they run on top is where the actual value is, they will want to get it from the real providers, and engage directly with the communities to contribute and pay for changes and support.

Ultimately it is the end customers who swing the situation: they realise that pipes and hardware are just utility, and the people there, as we have seen elsewhere, continue to move towards and engage with open communities, open source communities, and demand that their organizations fully engage too. So far we have seen that every enterprise has engaged in the consumption of open source software, but its value is still only tangentially embedded in the majority of organisations. A decade ago we used to sell open source software because people would decide they would have one open source vendor in a pitch, and we would win it. Soon there will be whole areas of software, especially infrastructure, where closed source, including closed source cloud services, just won’t be viable.

Countering the rosy view that the experience of open source as a better development model will inevitably lead to growth in understanding and use of open source, what if people just like free or cheap and convenient, and cloud delivered proprietary services are good enough? Essentially though that is just an argument that cloud providers are the best at producing convenient interfaces; historically that has been true, as it is their business, but it is not an exclusive ability, just one that needs working on. As Sophie Haskins points out, open source companies have often undervalued the work of actually making their code deployable and maintainable in production, which the cloud providers have done instead, in a closed way. Taking back ownership of this will clearly help.

Overall the question is will open communities simply fold over in the path of cloud provision, or will they route around blockages to open innovation and co-opt the infrastructure for new purposes and tools. It is hard not to be optimistic given the current rate of innovation.

08 Jul 2018, 22:21

From Filesystems to CRUD and Beyond

In the beginning there was the filesystem. Well, in 1965 to be precise, when Multics introduced the first filesystem. The Multics filesystem was then implemented in Unix, and with a few tweaks is pretty recognisable as what we use as filesystems now. Files are byte addressed, with read, write and seek operations, and there are read, write and execute permissions by user. Multics had permissions via a list of users rather than groups, but the basic structure is similar.

We have got pretty used to filesystems, and they are very convenient, but they are problematic for distributed and high performance systems. In particular, as standardised by Posix, there are severe performance problems. A write() is required to write data before it returns, even in a situation where multiple writers are writing to the same file, and the write should then be visible to all other readers. This is relatively easy to organize on a single machine, but it requires a lot of synchronisation on a cluster. In addition there are lots of metadata operations: access control, timestamps that are technically required (although access time updates are often disabled), and complex directory operations.

The first well known alternative to the Posix filesystem was probably Amazon S3, launched in 2006. This removed a lot of the problematic Posix constraints. First, there are no directories, although there is a prefix based listing that can be used to give an impression of directories. Second, files can only be updated atomically as a whole. This makes it essentially a key value store with a listing function, and events. Later, optional versioning was added too, so previous versions of a value could be retrieved. Access control is a rather complex extended version of per user ACLs, with read, write and the ability to make changes. S3 is the most successful and largest distributed filesystem ever created. People rarely complain that it is not Posix compatible; atomic file update actually seems to capture most use cases. Perhaps the most common complaint is the inability to append, as people are not used to the model of treating a set of files as a log rather than appending to an individual file. There are interfaces to treat S3 as a Posix-like filesystem, such as via Fuse. Although they rarely attempt to emulate full semantics and may do a lot of copying behind the scenes, they can be convenient for use cases where users like to have a familiar interface.
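
The resulting data model is small enough to sketch in a few lines. This toy in-memory store (my naming, not the real S3 API) captures the two key constraints: whole-object atomic puts, and prefix-based listing instead of directories:

```python
class ObjectStore:
    """Toy model of the S3 data model: a key value store where values are
    only ever replaced whole, plus a prefix based listing operation."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, value: bytes) -> None:
        # Whole-object atomic update: no seek, no partial write, no append.
        self._objects[key] = value

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)

    def list(self, prefix: str = "") -> list:
        # Prefix listing gives the impression of directories without any
        # real directory metadata or directory operations.
        return sorted(k for k in self._objects if k.startswith(prefix))

store = ObjectStore()
store.put("logs/2018/01.txt", b"jan")
store.put("logs/2018/02.txt", b"feb")
store.put("images/cat.png", b"\x89PNG")
assert store.list("logs/") == ["logs/2018/01.txt", "logs/2018/02.txt"]
```

Everything a cluster finds hard about Posix (shared offsets, partial writes, visibility ordering) is simply absent from this interface.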

One of the reasons for the match between S3 and programs was that it was designed around the HTTP verbs: GET, PUT, DELETE. The HTTP resource model, REST, was documented by Fielding in 2000, before S3, and PATCH, which you can argue has slightly more Posixy semantics, was only added in 2010. S3 is an HTTP service, and feels web native. This was the same move that led to Ruby on Rails in 2005, and the growth of the CRUD (create, read, update, delete) application as a design pattern, even if that was originally typically database backed.

So, we went from the Multics model to a key value model for scale out, while keeping access control unchanged? Is this all you need to know about files? Well actually no, there are two other important shifts, that remain less well understood.

The first of these is exemplified by Git, released in 2005, although the model had been around before that. The core part of git is a content addressed store. I quite like the term “value store” for these; essentially they are just like a key value store only you do not get to choose the key, usually it is some sort of hash of the content, SHA1 (for now) in Git, SHA256 for Docker images and most modern versions. Often this is implemented on top of a key value store, but you can optimise, as keys are always the same size and more uniformly distributed. Listing is not a basic operation either. The user interface for content addressed stores is much simpler: of the CRUD operations, only create and read are meaningful. Delete is usually a garbage collection operation, and there is no update. From the distributed system point of view there is no longer any need to deal with update races, ETags and so on, removing a lot of complexity. The content provides the key, so there are no clashes. Many applications will use a small key value store as well, such as for naming tags, but this is a very small part of the overall system.
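
A minimal content addressed store, sketched here with SHA256 as modern systems use, shows how small the interface is: only create and read are meaningful, and the key falls out of the content:

```python
import hashlib

class ValueStore:
    """Content addressed ("value") store: you do not get to choose the key,
    it is the SHA256 hash of the content."""

    def __init__(self):
        self._blobs = {}

    def create(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        # No update races: the same content always maps to the same key,
        # and different content cannot clash.
        self._blobs[key] = content
        return key

    def read(self, key: str) -> bytes:
        return self._blobs[key]

store = ValueStore()
key = store.create(b"hello world")
assert store.read(key) == b"hello world"
# Creating identical content is idempotent; the key is determined by the value.
assert store.create(b"hello world") == key
```

Readers can also verify what they fetch by rehashing it, which is why content addressed data can be cached forever and distributed by untrusted parties.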

Sadly, content addressable systems are not as common as they should be. Docker image layers were originally not content addressed, but this was switched for security reasons a long time ago. There are libraries such as Irmin that present a git model for data structures, but CRUD based models still dominate developer mindshare. This is despite the advantages for things like caches, where content addressed data can have an infinite cache lifetime, and the greater ease of distribution. It is now possible to build a content addressed system on S3 based on SHA256 hashes, as signed URLs, now that S3 supports an x-amz-content-sha256 header; see this gist for example code. The other cloud providers, and Minio, currently still only support MD5 based content hashes, or CRC32c in the case of Google, which are of no use at all. Hopefully they will update this to a modern useful content hash soon. I would highly recommend looking at whether you can build the largest part of your systems on a content addressed store.

The second big change is even less common so far, but it starts to follow on from the first. Access control via ACLs is complicated and easy to make mistakes with. Even with content addressed storage, in situations where access is not uniform, such as private images on Docker hub, access control is complicated. Ownership is also complicated, as many people could have access to some pieces of content. The effective solution here is to encrypt content and use key management for access control. Encryption as an access control method has upsides and downsides. It simplifies the common read path, as no access control is needed on the read side at all. On the write side, with content addressing, you just need a minimal level of access control to stop spam. On the downside there is key management to deal with, and a possible performance hit. Note the cloud providers provide server side encryption APIs, so they will encrypt the contents of your S3 buckets with keys that they have access to and which you can use IAM to delegate, but this is somewhat pointless, as you still have exactly the same IAM access controls, and no end to end encryption; it is mainly a checkbox for people who think it fixes regulatory requirements.
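
The shape of this model can be sketched as follows. The cipher here is a deliberately toy SHA256-counter keystream, for illustration only; a real system would use an authenticated cipher such as AES-GCM or ChaCha20-Poly1305, and the store and key names are mine:

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy SHA256-counter keystream cipher, illustration only: do not use
    this for real data, use an authenticated cipher (AES-GCM etc.)."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

# The store is public: anyone may fetch ciphertext, no ACL on the read path.
public_store = {}

key = secrets.token_bytes(32)  # possession of the key IS the access control
ciphertext = keystream_xor(key, b"private image layer")
blob_id = hashlib.sha256(ciphertext).hexdigest()  # content addressed as before
public_store[blob_id] = ciphertext

# Only a key holder can recover the plaintext; everyone else sees noise.
assert keystream_xor(key, public_store[blob_id]) == b"private image layer"
```

Granting access then becomes a key management operation (share the key with the reader) rather than an ACL edit on the storage system.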

So in summary, don’t use filesystems for large distributed systems, keep them local to one machine. See if you can design your systems based on content addressing, which scales best, and failing that use a key value store. User ACL based access control is complicated to manage at scale, although cloud providers like it as it gives them lock in. Consider encrypting data that needs to be private as an access control model instead.

06 Feb 2018, 20:40

Making Immutable Infrastructure simpler with LinuxKit

Config Management Camp

I gave this talk at Config Management Camp 2018, in Gent. This is a great event, and I recommend you go if you are interested in systems and how to make them work.

Did I mention that Gent is a lovely Belgian town?

The slides can be downloaded here.

Below is a brief summary.


Some history of the ideas behind immutability.

“The self-modifying behavior of both manual and automatic administration techniques helps explain the difficulty and expense of maintaining high availability and security in conventionally-administered infrastructures. A concise and reliable way to describe any arbitrary state of a disk is to describe the procedure for creating that state.”

Steve Traugott, Why Order Matters: Turing Equivalence in Automated Systems Administration, 2002

“In the cloud, we know exactly what we want a server to be, and if we want to change that we simply terminate it and launch a new server with a new AMI.”

Netflix Building with Legos, 2011

“As a system administrator, one of the scariest things I ever encounter is a server that’s been running for ages. If you absolutely know a system has been created via automation and never changed since the moment of creation, most of the problems disappear.”

Chad Fowler, Trash Your Servers and Burn Your Code, 2013

“Use container-specific OSes instead of general-purpose ones to reduce attack surfaces. When using a container-specific OS, attack surfaces are typically much smaller than they would be with a general-purpose OS, so there are fewer opportunities to attack and compromise a container-specific OS.”

NIST Application Container Security Guide, 2017


Updating software is a hard thing to do. Sometimes you can update a config file and send a SIGHUP, other times you have to kill the process. Updating a library may mean restarting everything that depends on it. If you want to change the Docker config that means restarting all the containers potentially. Usually only Erlang programs self update correctly. Our tooling has got a domain specific view of how to do all this, but it is difficult. Usually there is some downtime on a single machine. But in a distributed system we always allow for single system downtime, so why be so hardcore about updates? Just restart the machine with a new image, that is the immutable infrastructure idea. Not immutable, just disposable.


Immutability does not mean there is no state. Twelve factor apps are not that interesting. Everything has data. But we have built Unix systems based on state being all mixed up everywhere in the filesystem. We want to try to split between immutable code and mutable application state.

Functional programming is a useful model. There is state in functional programs, but it is always explicit not implicit. Mutable global state is the thing that functional programming was a reaction against. Control and understand your state mutation.

Immutability was something that we made people do for containers, well Docker did. LXC said treat containers like VMs, Docker said treat them as immutable. Docker had better usability and somehow we managed to get people to think they couldn’t update container state dynamically and to just redeploy. Sometimes people invent tooling to update containers, with Puppet or Chef or whatever, those people are weird.

The hard problems are about distributed systems. Really hard. We can’t even know what the state is. These are the interesting configuration management problems. Focus on these. Make the individual machine as simple as possible, and just think about the distribution issues. Those are really hard. You don’t want configuration drift on machines messing up your system, there are plenty of ways to mess up distributed systems anyway.


Why are there no immutable system products? Actually the sales model does not work well with something that is at build time only, not running on your infrastructure. The billing models for config management products don’t really work well. Immutable system tooling is likely to remain open source and community led for now. Cloud vendors may well be selling you products based on immutable infrastructure though.


LinuxKit was originally built for Docker for Mac. We needed a simple, embedded, maintainable, invisible Linux host system to run Docker. The first commit message said "not required: self update: treated as immutable". This project became LinuxKit, open sourced in 2017. The only really related tooling is Packer, but that is much more complex. One of the goals for LinuxKit was that you should be able to build an AWS AMI from your laptop without actually booting a machine. Essentially, LinuxKit is a filesystem manipulation tool, based on containers.

LinuxKit is based on a really simple model, the same as a Kubernetes pod. First a sequential series of containers runs to set up the system state, then containerd runs the main services. This model corresponds directly to the yaml config file, which is used to build the filesystem. Additional tooling lets you build any kind of disk format, for EFI or BIOS, such as ISOs, disk images or initramfs. There are development tools to run images on cloud providers and locally, but you can use any tooling, such as Terraform, for production workloads.
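As an illustration, here is a minimal sketch of such a config. The image names follow LinuxKit conventions but the tags here are made up, so treat this as showing the shape of the file rather than a working build:

```yaml
kernel:
  image: linuxkit/kernel:4.14.70      # illustrative tag
  cmdline: "console=tty0"
init:
  - linuxkit/init:v0.6
  - linuxkit/runc:v0.6
  - linuxkit/containerd:v0.6
onboot:                               # run once, sequentially, to set up state
  - name: sysctl
    image: linuxkit/sysctl:v0.6
  - name: dhcpcd
    image: linuxkit/dhcpcd:v0.6
services:                             # long-running, managed by containerd
  - name: getty
    image: linuxkit/getty:v0.6
  - name: sshd
    image: linuxkit/sshd:v0.6
```

The `onboot` section is the sequential setup phase and `services` is the containerd-managed phase described above; everything in the file is a container image, which is what makes the whole thing buildable on a laptop without booting anything.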

Why are people not using immutable infrastructure?

Lack of tooling is one thing. Packer is really the only option other than LinuxKit, and it has a much more complex workflow involving booting a machine to install, which makes a CI pipeline much more complex. There are also nearly immutable distros like Container Linux, but these are very hard to customise compared to LinuxKit.


This is a very brief summary of the talk. Please check out LinuxKit; it is an easy, different and fun way to use Linux.

21 Jan 2018, 23:00

Using the Noise Protocol Framework to design a distributed capability system


In order to understand this blog post you should know about capability-based security. Perhaps still the best introduction, especially if you are mainly familiar with role based access control, is the Capability Myths Demolished paper.

You will also need to be familiar with the Noise Protocol Framework. Noise is a fairly new crypto meta protocol, somewhat in the tradition of the NaCl Cryptobox: protocols you can use easily, without error. It is used in modern secure applications like Wireguard. Before reading the specification, this short (20m) talk from Real World Crypto 2018 by Trevor Perrin, the author, is an excellent introduction.


Our stacks have become increasingly complicated. One of the things I have been thinking about is protocols for lighter weight interactions. The smaller services get, and the more we want high performance from them, the less well protocols designed for large scale monoliths perform. We cannot replace larger scale systems with nanoservices, serverless and edge services if they cannot perform. In addition to performance, we need scalable security and identity for nanoservices. Currently nanoservices and serverless are not really competitive in performance with larger monolithic code, which can serve millions of requests a second; current serverless stacks hack around this by persisting containers for minutes at a time to answer a single request. Edge devices need simpler protocols too; you don't really want gRPC on microcontrollers. I will write more about this in future.

Noise is a great framework for simple secure crypto. In particular, we need understandable guarantees on the properties of the crypto between services. We also need a workable identity model, which is where capabilities come in.


Capability systems, and distributed capability systems especially, are not terribly widely used at present. Early designs included KeyKOS and the E language, whose model has been taken up in the Cap'n Proto RPC design. Pony is also capability based, although its deny capabilities are somewhat different. Many systems include some capability-like pieces though; Unix file descriptors are capabilities, for example, which is why file descriptor based Unix APIs are so useful for security.

With a large number of small services, we want to give out fine grained capabilities. With dynamic services, this is by far the most flexible way of identifying and authorizing services. Capabilities are inherently decentralised, with no CAs or other centralised infrastructure; services can create and distribute capabilities independently, and decide on their own trust boundaries. Of course you can also use them just for systems you control and trust.

While it has been recognised for quite a while that there is [an equivalence between public key cryptography and capabilities](http://www.cap-lore.com/CapTheory/Dist/PubKey.html), this has not been used much. I think part of the reason is that historically public key cryptography was slow, but of course computers are faster now, and encryption is much more important.

The correspondence works as follows. In order for Alice to send an encrypted message to Bob, she must have his public key. Usually people just publish public keys so that anyone can send them messages, but if you do not necessarily do this, things get more interesting. Possession of Bob's public key gives the capability of talking to Bob; without it you cannot construct an encrypted message that Bob can decode. Actually it is more useful to think of the keys in this case not as belonging to people but to roles or services. Having the service public key allows connecting to it; having the private key lets you serve it. Note you still need to find the service location in order to connect; a hash of the public key could be a suitable DNS name.
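As a rough sketch of that last idea, here is one way you might derive a DNS label from a service public key. The zone name and the 20-byte truncation are my assumptions, not any standard; hashing keeps the name fixed length and avoids handing the capability itself to the naming system:

```python
import base64
import hashlib

def service_dns_name(public_key: bytes, zone: str = "svc.example") -> str:
    # Hash the key so the name is fixed length and does not reveal
    # the capability itself to whoever runs the naming system.
    digest = hashlib.sha256(public_key).digest()
    # 20 bytes of hash -> 32 base32 characters, well under DNS's
    # 63-character label limit, with no '=' padding to strip.
    label = base64.b32encode(digest[:20]).decode("ascii").lower()
    return label + "." + zone

# A 32-byte X25519-style public key (illustrative value only).
pk = bytes(range(32))
print(service_dns_name(pk))
```

Anyone holding the capability can compute the same name and look the service up, with no central registry mapping names to keys.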

On single hosts, capabilities are usually managed by a privileged process, such as the operating system. This can give out secure references, such as small integers like file descriptors, or object pointers protected by the type system. These methods don't really work in a distributed setup, where capabilities need a representation on the wire. One of the concerns in the literature is that if a (distributed) capability is just a string of (unguessable) bits that can be passed around, then it might get distributed maliciously. There are two aspects to this. First, if a malicious agent has a capability at all, it can use it maliciously, including proxying for other malicious users, if it has network access; so being able to pass the capability on is no worse. Generally, only pass capabilities to trusted code, ideally code that is confined by (lack of) capabilities in where it can communicate and has no other back channels, and don't run untrusted code. As for keys being exfiltrated unintentionally, this is an issue we already have with private keys; with capabilities, all keys become things that, especially in these times of Spectre, we have to be very careful with. Mechanisms that avoid simply passing keys, and pass references instead, seem to me to be more complicated and likely to have their own security issues.

Using Noise to build a capability framework

The Noise spec says "The XX pattern is the most generically useful, since it supports mutual authentication and transmission of static public keys." However, we will see that there are different options that make sense for our use case. The XX pattern allows two parties who do not know each other to communicate and exchange keys, but the XX exchange still requires some sort of authentication, such as certificates, to determine whether the two parties should trust each other.

Note that Trevor Perrin pointed out that just using a public key is dangerous and using a pre-shared key (psk) in addition is a better design. So you should use psk+public key as the capability. This means that accidentally sharing the public key in a handshake is not a disastrous event.
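A capability token could then simply be the concatenation of the two. This is a hypothetical wire format of my own; the lengths match the 256-bit psk and Curve25519 public keys common in Noise, but nothing here is specified anywhere:

```python
import os

PSK_LEN = 32   # 256-bit pre-shared key
KEY_LEN = 32   # e.g. a Curve25519 public key

def mint_capability(service_public_key: bytes) -> bytes:
    # The capability is the pair (psk, public key): the public key lets
    # you address and encrypt to the service, and the psk is mixed into
    # the handshake so that leaking the public key alone grants nothing.
    assert len(service_public_key) == KEY_LEN
    return os.urandom(PSK_LEN) + service_public_key

def split_capability(token: bytes):
    # Recover the (psk, public key) pair at the point of use.
    assert len(token) == PSK_LEN + KEY_LEN
    return token[:PSK_LEN], token[PSK_LEN:]
```

A service that wants to hand out several independently revocable capabilities to itself would mint one token per grantee, each with a fresh psk but the same public key.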

When using keys as capabilities, though, we always know the public key (aka capability) of the service we want to connect to. In Noise spec notation, that is all the patterns with <- s in the pre-message pattern. This indicates that prior to the start of the handshake phase, the responder (service) has sent their public key to the initiator (directly or indirectly); that is, the initiator of the communication possesses the capability required to connect to the service, in capability speak. So these patterns are the ones that correspond to capability systems; for the interactive patterns, that is NK, KK and XK.
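For reference, these three patterns in the Noise spec notation, as I read the pattern catalogue, are below; the lines before the ... are the pre-messages, so in each case the responder's static key s is known to the initiator in advance:

```
NK:                     KK:                     XK:
  <- s                    -> s                    <- s
  ...                     <- s                    ...
  -> e, es                ...                     -> e, es
  <- e, ee                -> e, es, ss            <- e, ee
                          <- e, ee, se            -> s, se
```

Note that KK also has -> s as a pre-message, which is exactly the mutual capability arrangement discussed below, and XK transmits the initiator's static key s in the third handshake message rather than assuming it in advance.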

NK corresponds to a communication where the initiator does not provide any identification. This is the normal situation for many capability systems; once you have the capability you perform an action. If the capability is public, or widely distributed, this corresponds more or less to a public web API, although with encryption.

XK and IK are the equivalent, in web terms, of providing a (validated) login along with the connection. The initiator passes a public key (which could be a capability, or just used as a key) during the handshake. If you want to store some data attached to the identity you use as the passed public key, this handshake makes sense. Note that the initiator can create any number of public keys, so the key is not a unique identifier, just one chosen identity. IK has the same semantics but a different, shorter handshake with slightly different security properties; it is the one used by Wireguard.

KK is an unusual handshake in traditional capability terms; it requires that both parties know each other's public key in advance, i.e. that there is, in a sense, a mutual capability arrangement, or rendezvous. You could just connect with XK and then check the key, but having this in the handshake may make sense. An XK handshake could be a precursor to a future KK relationship.

In addition to the more common two way handshakes, Noise supports unidirectional one way messages. At present it is not common to use public key encryption for offline messages, such as encrypting a file or database record; usually symmetric keys are used. The Noise one way patterns use public keys, and all three of N, X and K require the recipient's public key (otherwise they would not be confidential), so they all correspond to capability based exchanges. Just like the interactive patterns, they can be anonymous or can pass or require sender keys. These patterns have disadvantages: there is no replay protection, as the receiver cannot provide an ephemeral key, but for offline uses, such as store and forward, or file or database encryption, this is inherent. Unlike symmetric keys for this use case, there is a separate sender and receiver role, so the ability to read database records does not imply the ability to forge them, improving security. It also fits much better in a capabilities world, and is simpler, as there is only one type of key rather than two types with complex key management.
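The one way patterns, again as I read the spec's notation, look like this; in all three the responder's static key is a pre-message, i.e. a capability the sender must already hold:

```
N:                      K:                      X:
  <- s                    -> s                    <- s
  ...                     <- s                    ...
  -> e, es                ...                     -> e, es, s, ss
                          -> e, es, ss
```

N is the anonymous sender, K assumes the recipient already knows the sender's key, and X transmits the sender's key inside the message, mirroring the NK/KK/XK distinctions in the interactive case.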


I don’t have an implementation right now. I was working on a prototype previously, but using Noise XX and still trying to work out what to do for authentication. I much prefer this design, which answers those questions. There are a bunch of practicalities that are needed to make this usable, and some conventions and guidelines around key usage.

We can see that the Noise Protocol gives us a set of three interactive and three one way patterns for capability based exchanges. No additional certificates or central source of authorisation is needed, other than public and private key pairs for Diffie-Hellman exchanges. Public keys can be used as capabilities; private keys give the ability to provide services. The system is decentralised, encrypted and simple. And there are interesting properties of mutual capabilities that can be used if required.
