26 Jan 2020, 13:30


Next weekend is Fosdem, the largest open source event in Europe. A lot of people will no doubt be coming for the first time, or thinking about coming another year, so I thought it might be helpful to explain what it is. Fosdem is not really like any other event, so Americans in particular find it confusing, thinking it might be like OSCON or something. It is not. Of US events I know, it is perhaps most like All Things Open, but it really is a different thing. My qualifications for writing this are that I have been going on and off since about 2004, I worked there for a few years, back when Greenpeace ran the conference WiFi, before Cisco took over, and I have spoken once.

The first practicality you notice is that you don't have to register, or indeed pay. You should, however, donate (on site) if you can afford it, although they will try to give you a really ugly t-shirt if you do. Most people do not donate, so the conference relies on volunteers, the Université libre de Bruxelles, which provides the space, and, increasingly, corporate sponsors. The next practicality is where to stay. The location is not very central, and while there is a tram link it can get extraordinarily full. The best plan is either to stay within walking distance, or to stay near the start of the tram line, which is near St Catherine in the centre of Brussels. You can also use a taxi or Uber, but the sheer number of people trying to get to and from the location can mean delays. Brussels is one of my favourite cities in Europe, and, along with my friends who live there, it is one of the reasons I usually decide to attend. I highly recommend you spend some time visiting the city. It is February though, so bring a hat, gloves and warm clothes. Some years it has been snowy, and the hills get slippery, so be careful walking around and allow extra time.

Brussels snow

The next practicality is that this conference is overwhelmingly attended by white men. Most tracks will not have any women speakers. We know tech has a diversity problem, but it is really in your face here more than in other places. Since 2016 there has at least been a code of conduct, after Sarah Mei wrote about it in 2015. Richard Stallman attended as recently as 2016. Sarah's piece says it "feels like 2007", and this is changing very slowly.

Fosdem attendees

Fosdem started as a developer meetup place, where distributed communities would meet to hack on things and talk about what they have done. So everything is divided into project-like groupings. There are a large number of rooms for talks, but not enough for all the diversity of modern open source, so some years projects like Perl, which always used to have Fosdem community meetings, don't get a room, and things get grouped where they used to be split, like "small languages" or "desktop". From an audience point of view that's better, and the community meetings do tend to happen anyway, in the hacking rooms, over meals and so on. The traditional thing to do is sit in one room all day, but of course lots of people are interested in learning about new things and want to wander around. And some things are massively popular and in smallish rooms (most rooms are smallish), such as the Go room in recent years.

So the talk you want to go to might well be full. Full means full: if the sign is on the door it means you won't get in. Remember that all the talks are recorded and streamed, and the AV team is pretty amazing. Years ago only some of the rooms were recorded, but now you won't miss anything. So have a backup plan. I remember a particularly enjoyable "we can't get into the Go devroom" meetup with Jaana and others one year. Overall my strategy is generally to go to a few things at random that might be interesting, maybe target a few specific ones that I really want to go to (and go early, maybe for the previous talk), but not regret it if I can't get in, and spend most of the time talking to people. The random things can be great; that is how I started working on NetBSD and rump kernels, after going to a talk pretty much because I thought a talk about testing kernels might be interesting. You never know what paths you might go down in future.

Note there is a growing Fringe of events around Fosdem: before, after and during. No doubt, as with the Edinburgh Festival, the Fringe will soon dwarf the original event.

The whole event is really hectic, and there are going to be maybe 6,000 people there, maybe more. This gets overwhelming, so take time out for yourself. I am only planning to attend on Saturday this year, and just to chill out in Brussels on Sunday.

Posters from Fosdem

Fosdem has a strong culture of open source as freedom and as a political statement, and there is widespread antipathy to corporate open source. For a long time there was no real sign of the larger tech companies, but this has changed in recent years, with Google, AWS and the CNCF sponsoring this year, and a visible presence of more corporate and industry open source rather than grassroots. You will meet people who don't like this, don't like permissive licenses, and might object to your company's open source policies. In many ways this feels kind of refreshing.

Food is very important. Talks run all day, so you need to plan some time for lunch. The quickest thing is the baguettes that are available at various places, eg downstairs at the back of Janson. They are very efficient about dispensing these fast, but there isn't much choice. There are food trucks out front, with huge queues at lunchtime. I usually go down the road to Le Pain Quotidien (eat in or take away) in the small cluster of shops down the road. That is busy, but less so. There is really not much else around this area.

Coffee is important too. There is a GitHub-sponsored coffee stall that is good, but it is free so the queue tends to be very long. The next best coffee is at the cafeteria. Le Pain Quotidien does coffee too. If you want tea, on Saturday this year OpenUK are serving tea, biscuits and Brexit commiseration on their stand.

Beer is a fixture at Fosdem. Belgium makes some of the finest beers in the world, and some ok ones too. Beers are sold at several points in the venue, and it is common to take them to talks and so on. Beware: most Belgian beers are strong. Also, the kriek they sell at the venue is terrible, even though Belgium makes some amazing examples of this beer style. There is a pre-conference "beer event" on Friday; I haven't even tried to go for many years, as even though they take over an entire street it is too crowded to be enjoyable or to find anyone you want to talk to. Yes, there are a lot of alcohol-focused events, and events in bars, which could be offputting if you don't drink.

Antique shop

Brussels is a lovely city. The architecture is beautiful, both the old, as exemplified by the Grand Place, which is magical in the evening, and the art nouveau gems, such as the Musical Instrument Museum, once a shop, and the diversity everywhere. It is said that there is a rule against copying buildings; I am not sure this is really the cause, but Belgium does not have terraces of identical houses, and every building is totally different. The Belgians are also as eccentric as the British, if not more so. Also don't miss the Galeries Royales Saint-Hubert, the first glazed shopping street in Europe, from 1847.

Perhaps my favourite area is the stretch between Sablon, which has a grand antique market and excellent chocolate shops, and the Marché aux Puces, the flea market, which is full of junk. In between are several streets largely filled with antique shops, selling midcentury furniture and, well, everything. Some are huge inside, full of things and stuff of every kind just jumbled up anyhow. There are often amazing window displays like the one below.

window display

Food in Brussels is really good, although Fosdem is not always the best time to eat, as you are often with indeterminate numbers of people, and getting reservations, which are often needed on Friday and Saturday nights, is hard. Also most places are small. Brussels is very international, and all kinds of food are available there. Most people just think that Belgian food is frites with mayonnaise and waffles, but there is both traditional French and Flemish food, and great seafood, not just mussels. The local beer is lambic, the sourdough of beer styles, made with wild yeast and only brewed in the region. Cantillon is one of the best, and has an amazing museum in its working brewery in Brussels. This style of beer is sour, but it is absolutely delicious. If you love this style, Moeder Lambic is a great place to try it. A number of new breweries have opened recently; de la Senne is excellent and available in good bars.

This year, the Friday night before Fosdem is Brexit. Brussels has a large UK community, and Fosdem always has a large UK contingent, usually filling whole Eurostar trains on Friday evening. So be nice to any of us you see.

So, yeah, that is Fosdem. Unique. Could be better. Enjoy Brussels.

Brussels market

01 Nov 2019, 17:27

Linearity among the toctou

Illustration from Paul Klee, Pedagogical Sketchbook, 1925

I have been reading a lot of papers on linear types recently. Originally it was to understand better why Rust went down the path it did, but I found a lot more interesting stuff there. While some people now are familiar with linear types as the basis for Rust's memory management, they have been around for a long time and have lots of other potential uses. In particular they are interesting for improving resource allocation in functional programming languages, by reusing storage in place where possible. Generally they are useful for reasoning about resource allocation. While the Rust implementation is probably the most widely used at present, it somewhat obscures the underlying simple principles by adding borrowing, so I will only mention it a little in this post.

So what are linear types? I recommend you read "Use-once" variables and linear objects: storage management, reflection and multi-threading by Henry Baker, as it is the best general overview I have found. The basic idea is extremely simple: linear variables can only be used once, so any function that receives one must either return it, or pass it to another function that consumes it. Using values only once sounds kind of weird and restrictive, but there are some ways it can be made easier. Some linear types may have an explicit copy operation to duplicate them, and others may have operations that return a new value, in a sequential way. For example, a file object might have a read operation that returns the portion read and a new linear object to read for the next part, preserving a functional model: side effects are fine if you cannot reuse a variable. You won't really recognise much of the Rust model here, as it allows borrows, which gives a much less austere effect. It does all sound fairly odd until you get used to it, even though it is simpler than, say, monads as a way of sequencing. Note also that there are related affine types, where you can use values zero or one times, so values can be discarded, as well as uniqueness types and many other fun variants in the literature.
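Rust's move semantics can be used to sketch the "use-once" reading pattern described above. The `Reader` type and its `read_byte` operation here are hypothetical illustrations, not any real API: each operation consumes the value and hands back a new one, so the old state can never be reused.

```rust
// A sketch of a "use-once" linear object. Each read consumes the
// Reader and returns a fresh one advanced past the byte read, so
// reads are forced into a single linear sequence.
pub struct Reader {
    data: Vec<u8>,
    pos: usize,
}

impl Reader {
    pub fn new(data: Vec<u8>) -> Reader {
        Reader { data, pos: 0 }
    }

    // Consumes `self`; the caller must rebind the returned Reader
    // to continue. Using the old Reader again will not compile.
    pub fn read_byte(self) -> (Option<u8>, Reader) {
        let byte = self.data.get(self.pos).copied();
        let next = Reader { pos: self.pos + 1, ..self };
        (byte, next)
    }
}
```

The functional flavour survives: the "side effect" of advancing the position is harmless because the pre-read value can never be observed again.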

Memory is probably the easiest way to understand the use cases. Think about variables as referring to a chunk of memory, rather than being a pointer. Memory can be copied, but it is an explicit, relatively costly operation (i.e. memcpy) on the memory type, so normal access should be linear, with explicit copying only if needed. Because the value of the memory may be changed at any time by a write, you need to make sure there are not multiple writers, or readers whose reads are not in a deterministic order with respect to the writes. Rust does this with mutable borrows, and C++ has a related thing with move semantics.
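In Rust this "move by default, copy only explicitly" discipline falls out of ownership. The `Chunk` type and `fill` function below are made up for illustration: duplication requires a visible `clone()` call (the memcpy analogue), while ordinary use moves the value linearly.

```rust
// A chunk of memory as a linear value: moves are the default,
// duplication needs an explicit, visibly costly copy.
#[derive(Clone, Debug, PartialEq)]
pub struct Chunk(pub Vec<u8>);

// Consumes the chunk and returns it: a linear use, so no other
// reference can observe the memory while it is being written.
pub fn fill(mut c: Chunk, byte: u8) -> Chunk {
    for b in c.0.iter_mut() {
        *b = byte;
    }
    c
}
```

Without the `clone()`, passing a `Chunk` to `fill` moves it, and the compiler rejects any later use of the original binding, which is exactly the single-writer guarantee the paragraph describes.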

Rust’s borrow checker allows either a single reference with read and write access, or multiple readers when there is no write access. Multiple readers is of course not a linear access pattern, but is safe as multiple reads of an immutable object return the same value. The complexity of the borrow checker comes from the fact that objects can change between these states, which requires making sure statically that all the borrows have finished. Some of the use cases for linearity in functional languages relate to this, such as efficiently initialising an object that will be immutable later, so you want linear write access in the initialisation phase, followed by a non linear read phase. There are definitely interesting language tradeoffs in how to expose these types of properties.
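The initialise-then-freeze pattern mentioned above can be sketched directly in Rust. The function names are illustrative only: an exclusive write phase (one mutable borrow) followed by a shared read phase (any number of immutable borrows).

```rust
// Write phase: `buf` is exclusively borrowed, so no aliases exist
// while it is being initialised.
fn init(buf: &mut Vec<u32>) {
    for i in 0..4u32 {
        buf.push(i * i);
    }
}

// After initialisation the mutable borrow has ended, and shared
// readers can alias freely, which is safe because the data is now
// effectively immutable.
pub fn build() -> Vec<u32> {
    let mut buf = Vec::new();
    init(&mut buf); // exclusive write access ends here
    let a = &buf;   // read phase: shared borrows...
    let b = &buf;   // ...can alias freely
    assert_eq!(a[2] + b[3], 13);
    buf
}
```

The borrow checker's job is verifying statically that the write phase is over before the read phase begins, which is the transition the paragraph describes.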

Anyway, I was thinking about inter-process communication (IPC) again recently, in particular ring buffer communication between processes, and it occurred to me that this is another area where linearity is a useful tool. One of the problems with shared memory buffers for communication, where one process has read access and the other write access for each direction of communication, is that the writing process may try to attack the reader by continuing to write after reading has started. The same issue applies to userspace-to-kernel communication, where another userspace thread may write to a buffer that the kernel has already read. The aim is to trigger a time of check to time of use (toctou) attack, for example if there is a check that a size is in range, but after that check the attacker increases it. The standard defence is to copy the buffer to a private buffer, where validation may happen undisturbed. This of course has a performance hit, but many IPC implementations, and the Linux kernel, do this for security reasons.

Thinking about toctou as a linearity problem, we can see that "time of check" and "time of use" are two different reads, and if we treat the read buffer as a linear object, requiring that each part of its contents is only read once, then time of check and time of use cannot differ. Note of course that it does not matter exactly which version gets read; all that matters is that it is a consistent one. We have to remember the value of the part we check and keep it for later if we can't use it immediately. So linear read has its uses. Of course, it is not something that programming languages give us at present; generally a compiler will assume that it can reload from memory if it needs to. Which is why copying is used: copying is a simple linear operation that is available. But there are often cases where the work being done on the buffer could be done in a linear way without copying, if only we had a way of telling the compiler, or expressing it in the language.
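The standard defensive copy can be sketched as follows. `handle_message` and its length-prefixed message format are invented for illustration: the single copy is the one linear read of the untrusted memory, and both the check and the use then operate on the private copy, so a concurrent writer cannot change the value between them.

```rust
// Defensive copy against toctou: copy the shared buffer once, then
// check and use the same private copy. The message format assumed
// here is a one-byte length prefix followed by a payload.
pub fn handle_message(shared: &[u8]) -> Option<Vec<u8>> {
    // The single linear read of the untrusted shared memory.
    let private: Vec<u8> = shared.to_vec();

    // Time of check: validate the length prefix on the copy.
    let len = *private.first()? as usize;
    if len > private.len() - 1 {
        return None;
    }
    // Time of use: same copy, so the checked length cannot have
    // been changed by the other side in the meantime.
    Some(private[1..1 + len].to_vec())
}
```

If the compiler could be told that `shared` is read linearly, the `to_vec()` copy could often be elided, which is the optimisation the paragraph is after.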

Overall, I have found the linear types literature helpful in finding ways to think about resource allocation, and I would recommend exploring in this space.

21 Jul 2019, 20:52

Fuzz rising

Go and read the excellent blog post from Cloudflare on their recent outage if you haven’t already.

I am not going to talk about most of it, just a few small points that especially interest me right now, which are definitely not the most important things from the outage point of view. This post got a bit long so I split it up, so this is part one.

Fuzz testing has been around for quite some time. American Fuzzy Lop was released in 2013, and was the first fuzzer to need very little configuration to find security issues. This paper on mutational fuzzing is a starting point if you are interested in the details of how this works. The basic idea is that you start with a valid input, and gradually mutate it, looking for “interesting” changes that change the path the code takes. This is often coverage guided, so that you attempt to cover all code paths by changing input data.
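A toy mutational fuzzer makes the basic loop concrete. Everything here is a made-up sketch, not a real fuzzing API: `target` stands in for instrumented code under test, and a real coverage-guided fuzzer would also keep mutants that exercise new code paths rather than blindly chaining mutations.

```rust
// Stand-in for the code under test; "crashes" on any 0xFF byte.
pub fn target(input: &[u8]) -> Result<(), &'static str> {
    if input.contains(&0xFF) {
        return Err("crash");
    }
    Ok(())
}

// Mutate one byte, using a tiny xorshift PRNG to avoid dependencies.
pub fn mutate(input: &[u8], seed: &mut u64) -> Vec<u8> {
    *seed ^= *seed << 13;
    *seed ^= *seed >> 7;
    *seed ^= *seed << 17;
    let mut out = input.to_vec();
    if !out.is_empty() {
        let i = (*seed as usize) % out.len();
        out[i] ^= (*seed >> 8) as u8; // flip some bits in one byte
    }
    out
}

// Start from a valid seed input, keep mutating, and return the first
// mutant that crashes the target.
pub fn fuzz(seed_input: &[u8], iterations: u32) -> Option<Vec<u8>> {
    let mut rng = 0x12345678u64;
    let mut current = seed_input.to_vec();
    for _ in 0..iterations {
        let mutant = mutate(&current, &mut rng);
        if target(&mutant).is_err() {
            return Some(mutant);
        }
        current = mutant;
    }
    None
}
```

The "interesting changes" and coverage guidance in the paragraph above replace the blind `current = mutant` step: real fuzzers maintain a corpus and prioritise inputs that take new paths through the code.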

Fuzz testing is not the only tool in the space of automated security issue detection. There is traditional static analysis tooling, although it is generally not very efficient at finding most security issues, other than a few things like SQL injection that are often well covered. It tends to have a high false positive rate, and unlike fuzz testing will not give you a helpful test case. Of course there are many other things to consider in comprehensive security testing, this list of considerations is very useful. Another technique is automated variant analysis, taking an existing issue and finding other cases of the same issue, as done by platforms such as Semmle.

Fuzzing as a service is available too. Operationally, fuzzing is not something you want to run in your CI pipeline, as it is not a test that finishes; it is something that you should run continuously, 24/7, on the latest version of your code, as it still takes a long time to find issues, and is randomised. Services include Fuzzbuzz, a fairly new commercial service (with a free tier) who are very friendly, Microsoft Security Risk Detection, and Google's OSS-Fuzz for open source projects.

As Cloudflare commented “In the last few years we have seen a dramatic increase in vulnerabilities in common applications. This has happened due to the increased availability of software testing tools, like fuzzing for example.” Some numbers give an idea of the scale: as of January 2019, Google’s ClusterFuzz has found around 16,000 bugs in Chrome and around 11,000 bugs in over 160 open source projects integrated with OSS-Fuzz. We can see the knock on effect on the rate of CVEs being reported.

If we look at the kinds of issues found, data from a 2017 Google blog post the breakdown is interesting.

As you can see, a very large proportion are buffer overflows, manual memory management issues like use after free, and the "ubsan" category, which is all the stuff in C or C++ code that, if you happen to write it, lets the compiler turn your program into hot garbage if it feels like it. Memory safety is still a major cause of errors, as you can see if you follow the @LazyFishBarrel twitter account. Note that the majority of projects are still not running comprehensive automated testing for these issues, and this problem is rapidly increasing. There are two factors at play: first, memory errors are an easier target than many other sorts of errors for current tooling to find, but second, there is a huge codebase with huge numbers of these errors.

Microsoft Security Response Center also just released a blog post with some more numbers. While ostensibly about Microsoft’s gradually increasing coding in Rust, the important quote is that “~70% of the vulnerabilities Microsoft assigns a CVE each year continue to be memory safety issues”.

In my talk at Kubecon I touch on some of these issues with C (and to some extent C++) code. The majority of the significant issues found in the CNCF security audits were in C or C++ code, despite the fact that there is not much of this code in the reviewed projects.

Most of the C and C++ code that causes the majority of open source CVEs is shipped in Linux distributions. Linux distros are the de facto package manager for C code, and for C++ to a lesser extent; neither of these languages has developed its own language-specific package management yet. From the Debian stats, of the billion or so lines of code, 43% is ANSI C and 24% is C++, which has many of the same problems in many codebases. So 670 million lines of code, in general without enough maintainers to deal with the existing and coming waves of security issues that fuzzing will find. This is the backdrop to increasing complaints about unfixed CVEs in Docker containers, where these tend to be more visible due to wider use of scanning tools.

Is it worth fuzzing safer languages such as Go and Rust? Yes, you will still find edge conditions, and potentially other cases such as race conditions, although the payoff will not be nearly as high. For C code it is absolutely essential, but bugs and security issues are found elsewhere too. Oh, and fuzzing is fun!

My view is that we are just at the beginning of this spike, and we will not just find all the issues and move on. Rather, the Linux distributions, which house this code, will end up as toxic industrial waste areas, the Agbogbloshie of the C era. As the incumbents, no, they will not rewrite it in Rust; instead smaller, more nimble, different types of competitor will outmanoeuvre the dinosaurs. Linux distros generally consider that most of their role is packaging, not creation, with a few exceptions like systemd; most of their engineering work is in the long term support business, which still pays well despite being increasingly out of step with how non-C software is used, and how cloud deployments work, where updating software is part of normal life, and five or ten year software lifetimes without updates are not the target. We are not going to see the Linux distros work on solving this issue.

Is this code exploitable? Almost certainly yes, with sufficient effort. We discussed Thomas Dullien's paper Weird machines, exploitability, and provable unexploitability at the Säntis Systems Summit recently; I highly recommend it if you are interested in exploitability. But overall, proving code is not exploitable is in general not going to be possible, and attackers always have the advantage. Sure, they will pick the easiest things first, but most attacks are automated now, and attacking scales well. Security is risk management, but with memory safety being a relatively easy exploit in many cases, it is a high risk. Obviously not all this code is exposed to attackers via the network or attacker-supplied data, especially in containerised environments, but some is, and you will spend increasing amounts of time working out what is a risk. The sheer volume of security issues just makes risk management more difficult.

If you are a die hard C hacker and want to remain one, the last bastion of C is of course OpenBSD. Throw up the pledge barricades, remove anything you can, keep reviewing. That is the only heroic path left.

In the short term, start to explore and invest in ways to replace every legacy C dependency you are currently using. Write a deprecation roadmap. Cut down your dependencies on Linux distributions. Shift to memory safe languages everywhere, and if you use C++ make sure you only use the safer subset. Look to smaller more nimble Linux distributions that start shipping memory safe code; although the moves here have been slow so far, you only need a little as once distros stop having to be C package managers they can do a better job of being minimal userspaces. There isn’t much code you really need to run modern applications that themselves do not have many C dependencies, as implementations like LinuxKit show. If you just sit on top of the kernel, using its ABI stability guarantees there is little you need to do other than a little configuration; well other than worry about the bugs in a kernel written in … C.

Memory unsafe languages are not going to get better, or safe. It is time to move on.

27 Jan 2019, 19:00

Kubernetes as an API standard

There is now a rustyk8s mailing list to discuss implementations of the Kubernetes API in Rust.

There was a lot of interest in my tweet a couple of months ago about writing an implementation of the Kubernetes API in Rust. I had a good conversation at Kubecon with some people about it, and thought I should explain more about why it is interesting.

Kubernetes is an excellent API for running code reliably. So much so that people want to run it everywhere. People have described it as the universal distributed systems API, and something that will eventually be embedded into hardware, or be the kernel (or Linux) of distributed systems. Maybe some of these claims are ambitious, but there is nothing wrong with ambition, and hey, it is a nice, simple API at its core. Essentially it just does reconciliation between the world and desired state for an extensible set of things, things that include a concept of a pod by default. That is pretty much it, a simple idea.
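That reconciliation idea can be sketched in a few lines of Rust (fitting, given the rest of this post). This is not the Kubernetes API, just an invented toy model: diff the desired state against what is observed and emit the create and delete actions that close the gap, which is the loop every controller runs.

```rust
use std::collections::{HashMap, HashSet};

// Toy reconciler: desired state maps a pod name to a wanted replica
// count; observed state is the set of running instance names.
// Returns (instances to create, instances to delete).
pub fn reconcile(
    desired: &HashMap<String, u32>,
    observed: &HashSet<String>,
) -> (Vec<String>, Vec<String>) {
    let mut to_create = Vec::new();
    let mut to_delete = Vec::new();

    // Create anything desired that is not yet running.
    for (name, replicas) in desired {
        for i in 0..*replicas {
            let instance = format!("{}-{}", name, i);
            if !observed.contains(&instance) {
                to_create.push(instance);
            }
        }
    }
    // Delete anything running that no desired pod accounts for.
    for instance in observed {
        let wanted = desired.iter().any(|(name, replicas)| {
            (0..*replicas).any(|i| format!("{}-{}", name, i) == *instance)
        });
        if !wanted {
            to_delete.push(instance.clone());
        }
    }
    (to_create, to_delete)
}
```

A real controller runs this in a loop against watch events and retries on failure; the point is that the core contract really is this small.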

A simple idea, but not simply expressed. If you build a standalone Kubernetes system, somehow that simple idea amounts to a gigabyte of compiled code. Sure, there are some extraneous debug symbols, and a few extra versions of etcd for version upgrades, and maybe one day Go will produce less bloated code, but that is not going to cut it for embedded systems and other interesting potential use cases of Kubernetes. Nor is it easy to understand, find your way around the code and hack on it.

Another problem with Kubernetes is that it suffers from the problem that the implementation is the specification. Lots of projects start like that but as they mature the specification is often separated, and alternative implementations can thrive. Without an independent specification, alternative implementations often have to copy every accidental nuance of the original, and even replicate bugs. Kubernetes is in the right state where starting to move towards an independent specification would be productive. We know that there are some rough edges in the implementation that need to be cleared up, and some parts where the API is not yet the best it could be.

One approach is to try to cut back the current implementation to a more manageable size, by removing parts. This is what Darren Shepherd of Rancher has done with “k3s”, removing a million or so lines of code. But a second, complementary approach is to build a new simple implementation from the ground up without any baggage to start with. Then by looking at differences in behaviour, you can start to understand which parts are the core specification, and which parts are accidental. Given that the way the code for Kubernetes is written has been described as a “clusterfuck” by Kris Nova, this seems a productive route: “Unknown to most, Kubernetes was originally written in Java… If the anti patterns weren’t enough we also observe how Kubernetes has over 20 main() functions in a monolithic “build” directory… Kubernetes successfully made vendoring even more challenging than it already was, and discuss the pitfalls with this design. We look at what it would take to begin undoing the spaghetti code that is the various Kubernetes binaries.”

Of course we could write a new implementation in Go, but the temptation would then be to import bunches of existing code, and it might not end up that different. A different language makes sense to stop that. The aim should be to build the minimum needed to implement the core API. So what language? Rust seems to make the most sense, although there are some other options.

There is a small but growing community of cloud native Rust projects. In the CNCF, there is TiKV from PingCAP and the Linkerd 2 data plane. Another project that has recently been launched in the space is AWS Firecracker. The Rust ecosystem is especially strong in security and control of memory usage, both of which are important for effective scalable systems. In the last year or so the core libraries needed in the cloud native space have really been filled in.

So are you interested in hacking on a greenfield implementation of Kubernetes in Rust? There is not yet a public codebase to hack on, but I know that there are some people hacking in private. The minimal viable project is something that you can talk to with kubectl and that can run pods, plus API extensions. The conformance tests should help, although they are not complete enough to constitute a specification by any means, but starting to pass some tests would be a satisfying achievement. If you want to meet the cloud native Rust community, a bunch of people will be at Fosdem in early February, and I will sort out a fringe event at KubeCon EU as well. Happy hacking!

01 Jan 2019, 18:00


You might have noticed me tweeting a bunch about RISC-V in recent months. It is actually something I have been following for several years now, since the formation of LowRISC in Cambridge quite some time ago, but this year has suddenly seen a huge maturing of the ecosystem.

In case you have been sitting under a rock hacking on something for some time, RISC-V is an open instruction set for CPUs. It is pronounced “risk five”. It looks a bit like MIPS, if you know your instruction sets, and yes it is very RISC, pretty minimal really. It is designed to be cleanly extended, and has 32, 64 and 128 bit implementations. So far the 32 bit version is for microcontrollers, the 64 bit for operating systems like Linux with MMUs, and the 128 bit version is for future dreams.

But an instruction set, even one without licensing and patent issues, is not that interesting on its own. There are some other options there after all, although they all have some issues. What is more interesting is that there are open and freely modifiable open source implementations. Lots of them. There are proprietary ones too, and hybrid ones with some closed IP and some open, but the community has been building open. Not just open cores, but new open toolchains (largely written in Scala) for design, test, simulation and so on.

SiFive core designer

The community has grown hugely this year, starting with the launch by SiFive of the first commercially available RISC-V machine that could run Linux, at Fosdem in January. Going to a RISC-V meetup (they are springing up in Silicon Valley, Cambridge, Bristol and Israel) you feel that this is hardware done by people who want to do hardware the way open source software is done. People are building cores, running in silicon or on FPGA, tooling, secure enclaves, operating systems, VC funded businesses and revenue funded businesses. You meet people from Arm at these meetups, finding out what is going on, while Intel is funding RISC-V businesses, as if they want to make serious competition for Arm or something! Meanwhile MIPS has opened its ISA as a somewhat late reaction.

A few years ago RISC-V was replacing a few small microcontrollers and custom CPUs; now we see companies like Western Digital announcing they will switch all their cores to RISC-V, while opening their designs. There are lots of AI/TPU chips being built with RISC-V cores, and Esperanto is building chips with over a thousand 64 bit RISC-V cores on them. The market for specialist AI chips came along just as RISC-V was maturing, and it was a logical new market.

RISC-V is by no means mature; it is forecast to ship 10-100 million cores in 2019, the majority of them 32 bit microcontrollers, but that adds to the interest: it is at the stage where you can now start building things, and lots of people are building things for fun or for serious reasons, or porting code, or developing formal ISA models, or whatever. Open source wins because a huge community just decides it is the future and rallies around every piece of the ecosystem. 2018 was the year that movement became really visible for RISC-V.

I haven't started hacking on any RISC-V code yet, though I have an idea for a little side project, but I have joined the RISC-V Foundation as an individual member and hope to get to the RISC-V Workshop in Zurich and several meetups. See you there and happy hacking!
