26 Apr 2020, 18:00

Home Work Setup

The other day I got a message from Jenny “the Burce” saying that I had to get some equipment to upgrade my live streaming setup for the DockerCon dry run. Cameras and microphone and things, a list from Bret Fisher. The only problem, I soon discovered, was that nothing on the list was actually available. Somehow just after lockdown everything that people might need to live stream audio and video had been panic bought, along with the flour, toilet paper and eggs. So over the next month or so I have gradually put together a setup that works, with the aim of improving the audio and video quality.

It has also been the first time I have worked at home for long periods; previously I mostly went to the office, with a few meetings at home at the start and end of the day. Given that we are all going to be homeworking for a long while, we may as well make it better. Everything at the low end was sold out when I was buying, but I will give some pointers and suggestions as to what is worthwhile or not, and supply chains should start to improve soon. I am lucky enough to have a reasonable amount of space; if you are working in a constrained space I would imagine the choices are more limited.

Also I am lucky enough to be able to work at home, or at all in these difficult times. Tech workers are so lucky and safe compared to so many others.

Desk

1970s Danish teak desk, bought on the Holloway Road some years back. Not in perfect condition or anything, a desk for using. Hard to move around. Big, not going to fit in a small space. I don’t remember the price, it wasn’t a lot and it will last another 40 years. The lamp is a German, asymmetric one from the 1930s.

Computer

MacBook Pro from a few years back. I have been wondering about getting a desktop as, well, not going anywhere. However I want something silent and that seems really difficult now. I do have a Linux box (carefully constructed with large slow fans) and a FreeBSD FreeNAS box under the table, but although they are fairly quiet I find them too noisy when working so I mostly keep them switched off. The cloud is silent of course, a great advantage. I may go down the silent PC building route again soon, will keep you posted.

Monitor

Dell 27 inch 4k monitor I took from the office. I thought it was too small on this desk, but then realised I had it too far back. I would probably get a USB C monitor now, just to get more ports nearby but this is fine. I don’t like double monitors due to the gap, I prefer a single large one.

Internet connection

It is going to be difficult to improve quality for live conversations without a good internet connection. Obviously there may not be much choice where you are, so changing this can be difficult. I use Andrews and Arnold with 80/20Mbps VDSL; they are a high quality service with static IPs and IPv6, and they do not oversubscribe. It costs a bit more than other providers.

Keyboard and trackpad

Need cleaning. Apple bluetooth ones. I also have a (noisy) Hacker’s keyboard around. I much prefer trackpads to mice or trackballs now.

Dock

Realising I was about to plug in more things than the computer has ports, I got the Caldigit TS3 Plus as recommended by someone on Twitter. This provides power down one Thunderbolt cable to the computer, while everything else plugs into it. It has DisplayPort for the monitor, and wired ethernet, meaning I can avoid wifi issues. The wired ethernet goes via ethernet over mains adaptors downstairs to the router. Note that if you have the new 16 inch MacBook Pro, it consumes a peak of 97W, which is more than this dock delivers, although maybe there will be a firmware fix. CPU peak power consumption is getting ridiculous now, 100W laptops!

Webcam

I managed to order a Logitech StreamCam direct from Logitech just before all webcams sold out. It is excellent quality, see pictures below. I sit it on top of the monitor, and it has USB C. It has a very wide angle of view, but I eventually found out that the Logitech Camera Settings App allows you to modify this, with a narrower setting too. This is just a crop, so it is not as high quality. The Logitech software is much worse on Mac than Windows it seems, with far less control available; some of the Windows controls appear to be done in software with a software video out that other applications can connect to which is not available on Mac. The Logitech 4k cameras apparently have three zoom options as well as ability to set frame rates, and it looks like some stock may become available again, so these could be better for a cropped view. Actually using the 4k option is not really possible with most software at present though, and it requires lots of CPU to encode.

Having the camera above you on the monitor is way better than using the camera on a laptop, which is generally low down unless you raise it up a lot; also as you want to use a monitor generally the laptop is probably to the side, which looks strange on calls. I don’t know why Apple do not improve the quality of laptop cameras to match their phone cameras, and I have heard of people using phones to stream.

Another option a friend is exploring is using a digital camera; most recent cameras can stream video, although generally only via HDMI out, so you need something like the Elgato Cam Link and these are also hard to get now. With a choice of lenses and zoom and excellent picture quality this is an option if you already have a suitable camera; you probably want to use a lens around 35mm it seems. You will need to mount it behind the monitor, which needs some work. Obviously this is a substantially more expensive option and only makes sense if you have a camera already for other uses.

Lighting

Cameras are way better quality with lights. You might not immediately notice, so here are some crops to give you an idea of low light versus a reasonable light. I have the Elgato Key Light, which is wifi controlled. You probably need something this bright, I had a small LED panel and it was not bright enough.

The pictures below show crops of the video in the dark without lighting, with light from the window only and lit with additional lighting.

The Key Light has a slightly annoying property of occasionally losing wifi access and needing to be reset, although it stays on during this time, so I am not sure I can entirely recommend it, although it hasn’t happened for a while now. It is also expensive, but generally good. Lights are difficult to buy. This clamps to the table which is good, as tripod type stands take up loads of desk space or floor space around.

I also have a window to the side, which provides most of the light during the day, but I use the light at a lower level as a fill light, or else the side of my face away from the window is very dark. At night I use the light as a key light, and don’t use a fill, so it is a bit like Rembrandt lighting. Look at three-point lighting to get an idea of how to place lights; you ideally want them diagonal, not directly in front, or else it looks very flat. I place the webcam a little bit asymmetrically, pointing into the room so it does not catch the very bright window. The worst setup is if you have a window behind you, when the camera will have a hard time, as you can see when having calls with people with that setup.

Audio

Audio gets complicated very fast. Your options are to use your laptop, or to use the microphone on your webcam, which is what I was doing for a while, and still do sometimes. There is another problem though about how to listen to the audio, and avoiding the microphone picking up the sound of the other party, or yourself. I had a bias towards audio/music equipment as I have used it in the past a little and it is currently relatively easily available; there are very different routes you could take here.

The original recommendation from Bret was to get the Samson Q2U, but this remains totally unobtainable. Actually all USB microphones were unobtainable. If you get a USB dynamic microphone, such as the Q2U or the Audio Technica ATR2100 which is similar but more expensive (but maybe available now) then your route will be simpler and cheaper than mine below.

So I went the traditional route. Generally the advice seemed to be that unless your room is a soundproofed studio, get a dynamic microphone not a condenser microphone, as they are more directional and likely to mostly pick up your voice, not what is going on outside or downstairs or even the noise from your keyboard. I went for the classic Shure SM57, a microphone that has been around so long it has its own Wikipedia page and White House stories. I ordered direct from the manufacturer, which was very quick; apparently there are a lot of fakes of these so it is worth buying from a reputable place. You can’t see it clearly in the photo above as it is pointing straight at me; as I am sitting it does not obstruct the view, but I can move it away and back on the mic stand, see below.

As the mic has XLR analogue outputs you need to plug it into the computer. The easiest way is to get an audio interface, which combines a microphone pre-amp and an analogue to digital converter. I got the Audient EVO4, which seems really nice and excellent quality. Audient is a UK company that makes mixers and other professional audio recording hardware; this is their “diffusion line” but has the same high quality hardware. This also acts as a headphone amp, and can live mix the audio from the mic into the headphone so you can listen to yourself speaking. It supports two mics, or a mic and an instrument, and there is also a four channel version, for a future world without social distancing when we are in the same room again. There is only one potential issue with this combination, which is that the microphone outputs at a very low level. The EVO4 has 58dB of gain, which is quite a bit more than most units I looked at, but if you have a quiet normal speaking voice and don’t project it, then even with the gain set to maximum the level is a little quieter than ideal once you are more than around two inches from the mic. At around two inches away it is fine, although with some extra bass emphasis, and speaking up a bit also helps, but I am not really used to doing either of those on calls. I should probably get used to it; the recommendation is to be less than 15cm away.

I ended up, in the spirit of testing every option, getting a FetHead, which is a tiny microphone preamp that fits inline with the mic and provides an additional 27dB of gain, powered from the preamp. This is designed for exactly this use case with dynamic microphones. Adding it suddenly shifted from having to use max gain at all times to being in the middle of the scale with plenty of room to adjust. It also cut the already small noise level even lower. I would say if your preamp has less than 58dB of gain you would need this with this mic; otherwise you could get away without it, but it gives a little more flexibility. I chose the EVO4 partly because it has relatively high gain, but with the FetHead any audio interface will be fine, although the EVO4 is still a nice choice I think.
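As a back-of-envelope check on the gain numbers above (a sketch only; the `db_to_ratio` helper is invented for illustration): gains of stages in series simply add in dB, and a voltage ratio converts to dB as 20 times log10 of the ratio, so the FetHead’s 27dB on top of the preamp’s 58dB gives 85dB of total available gain.

```rust
// Decibel arithmetic for the mic chain: dB values of stages in
// series add, and ratio = 10^(dB/20) converts dB back to a voltage ratio.
fn db_to_ratio(db: f64) -> f64 {
    10f64.powf(db / 20.0)
}

fn main() {
    let preamp = 58.0; // EVO4 maximum gain, in dB
    let fethead = 27.0; // FetHead inline boost, in dB
    let total = preamp + fethead; // series stages: just add
    assert_eq!(total, 85.0);
    // 27dB on its own is roughly a 22x voltage boost.
    println!("FetHead alone is a {:.0}x voltage boost", db_to_ratio(fethead));
}
```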

Usually you are recommended to use headphones for audio recording, so as not to record the output sounds along with input. Much software has echo cancellation built in, and the Mac has some hardware cancellation, although that may just be on the built in microphone and speakers. This means that you don’t necessarily need to wear headphones for many use cases, although they will give you a better idea of relative volume levels if you have multiple sources, and depending on your exact setup and mic they will reduce echo or noise. Your voice will sound a little different in the headphones than you are used to, but there is no lag, and you get used to it. Having the audio in your headphones stops you shouting which people tend to do with headphones as they cannot hear themselves and compensate. A dynamic mic like the Shure is also fine for recording with speakers even without cancellation, that is a normal stage recording setup that they are often used for, ideally with the speakers at 65 degrees behind the mic as that is the zone of least sensitivity. I may well set up some speakers later; the EVO4 has line out for speakers too. It is less clear where to put the speakers on the desk though.

You really want a mic “boom stand” with this setup so you can move the mic out of the way, and then place it back in the right place, as mic placement is important. I had no idea about stands and got the Neewer NB-35 which is very cheap, and it does the job but it is a bit annoying as the part that holds the mic is hard to keep at the right angle, and the whole thing moves in a slightly annoying way. I may try a different one.

I originally got the Audio Technica M30x headphones. These are not too expensive, good quality closed ear headphones, which block out external noise well. I did find that wearing them for long periods made my ears hot and slightly squashed, and they are not great after an hour or so. I ended up getting open backed, around the ear headphones, the Sennheiser HD600, which are way more comfortable to wear for long periods, and sound great. As they aren’t closed, other people can hear your audio, so you wouldn’t wear them travelling or in a shared office, but if you have your own room to work in this design works really well, if you don’t want total sound isolation and noise cancellation (you can hear the doorbell ring, which is useful). You can also hear yourself speak, although I do like a little microphone mixed in; you could use these with any kind of microphone without a mixer, and some come with built in mics. I tested recording while having music playing in the headphones, and with the Shure mic the recording level even with quite loud music is negligible with your head in the normal direction; if you point your ears at the mic it clearly picks up the sound. With a less directional mic, such as the one in the webcam, it picks up quite a bit of the noise though.

Overall I would say that with a dynamic microphone you get a lot more flexibility in your headphone options. For recording something offline I would probably use the closed ear headphones or not listen at all during the recording (the EVO4 can show mic line level). For talking to other people and daily use the open back headphones are so much more comfortable that they make a lot of sense, and you can just switch from listening to music to making calls.

I didn’t make any effort to choose portable equipment, as this is lockdown, but other than the mic stand it is all relatively portable. The EVO4 can be plugged into an iPad with USB C, or an iPhone if you have the Lightning to USB3 Camera Adapter, which despite its name is a generic USB3 adapter that accepts input power over another Lightning port to power external devices that need more power than the phone will provide. I tested recording and playback on my phone with this adapter and it worked fine.

The best place I have found for buying audio equipment, other than ordering direct from the manufacturer, is Thomann. They are a German family firm but with a global online shop, and deliver fast and efficiently to the UK, and their prices are a lot lower than Amazon.

Comparing the options

Below is a video comparing the internal camera and the webcam, and the internal mic, the webcam mic, AirPods and the Shure mic. I used the Zoom cloud recording, so this gives an idea of what someone would see and hear at the other end of a call with me, rather than the best quality for local recording. Note that I had the window open and a motorbike goes past a couple of times, though sadly not while I was using each microphone; I did type on the keyboard though, so you can hear what some non-directional noise pickup is like. Overall the audio quality and resistance to noise pickup of the Shure SM57 is substantially better than any of the other options. So be nice to your co-workers and improve your audio.

Linux

I haven’t yet tested any of this equipment on Linux. I use my Linux machines as servers, not desktop machines, at present. The EVO4 is a standard USB audio device so should just work, and I think the Logitech cameras at base settings are standard USB video devices too, but there may well be no control of settings, probably including the crop, as this is maybe not standard; I am not entirely sure. Probably best to check.

Is it worth it?

Well, it is not necessary. As I spend a lot of time on calls and do quite a few conference talks that will all be online for at least the next year or so, I think improving the quality is worth it. The differences are noticeable as you can see from the recordings. Audio quality makes a lot of difference to meetings, and I would make that a priority if you want to work on something. Supply chains should get better over the next few months so it should get easier to find more choices.

26 Jan 2020, 13:30

WTFosdem

Next weekend is Fosdem, the largest open source event in Europe. A lot of people will no doubt be coming for the first time, or thinking about coming another year, so I thought it might be helpful to explain what it is. Fosdem is not really like any other event, so Americans in particular find it confusing, thinking it might be like OSCON or something. It is not. Of US events I know, it is perhaps most like All Things Open, but it really is a different thing. My qualifications for writing this are that I have been going on and off since about 2004. I worked there for a few years, back when Greenpeace ran the conference WiFi, before Cisco took over, and I have spoken once.

The first practicality you notice is that you don’t have to register, or indeed pay. You should however donate (on site) if you can afford it, although they will try to give you a really ugly t-shirt if you do. Most people do not donate, so the conference relies on volunteers, the Université libre de Bruxelles, which provides the space, and, increasingly, corporate sponsors. The next practicality is where to stay. The location is not very central, and while there is a tram link it can get extraordinarily full. The best plan is either to stay within walking distance, or to stay near the start of the tram line, which is near St Catherine in the centre of Brussels. You can also use a taxi or Uber, but the sheer number of people trying to get to and from the location can mean delays. Brussels is one of my favourite cities in Europe, and along with my friends who live there, one of the reasons I usually decide to attend. I highly recommend you spend some time visiting the city. It is February though, so bring hat, gloves and warm clothes. Some years it has been snowy and the hills get slippery, so be careful walking around, and allow extra time.

Brussels snow

The next practicality is that this conference is overwhelmingly attended by white men. Most tracks will not have any women speakers. We know tech has a diversity problem, but it is really in your face here more than other places. Since 2016 there has at least been a code of conduct after Sarah Mei wrote about it in 2015. Richard Stallman attended as recently as 2016. Sarah’s piece says it “feels like 2007”, and this is changing very slowly.

Fosdem attendees

Fosdem started as a developer meetup place, where distributed communities would meet to hack on things, and talk about what they have done. So everything is divided into project-like groupings. There are a large number of rooms for talks, but not enough for all the diversity of modern open source, so some years projects like Perl that always used to have Fosdem community meetings don’t get a room, and things get grouped where they used to be split, like “small languages” or “desktop”. From an audience point of view that’s better, and the community meetings still tend to happen, in the hacking rooms, over meals and so on. The traditional thing to do is sit in one room all day, but of course lots of people are interested in learning about new things and want to wander around. And some things are massively popular and in smallish rooms (most rooms are smallish), such as the Go room in recent years.

So the talk you want to go to might well be full. Full means full: if the sign is on the door it means you won’t get in. Remember all the talks are recorded and streamed, and the AV team is pretty amazing. Years ago only some of the rooms were recorded, but now you won’t miss anything. So have a backup plan. I remember a particularly enjoyable “we can’t get into the Go devroom” meetup with Jaana and others one year. Overall my strategy is generally to go to a few things at random that might be interesting, maybe target a few specific ones that I really want to go to (and go early, maybe for the previous talk) but not regret it if I can’t get in, and spend most of the time talking to people. The random things can be great; that is how I started working on NetBSD and rump kernels, after going to a talk pretty much because I thought a talk about testing kernels might be interesting. You never know what paths you might go down in future.

Note there is a growing Fringe of events around Fosdem, before, after and during. No doubt, like with the Edinburgh Festival, the Fringe will soon dwarf the original event.

The whole event is really hectic, and there are going to be maybe 6,000 people there, maybe more. This gets overwhelming, so take time out for yourself. I am only planning to attend on Saturday this year, and just to chill out in Brussels on Sunday.

Posters from Fosdem

Fosdem has a strong culture of open source as freedom and as a political statement, and there is widespread antipathy to corporate open source. For a long time there was no real sign of the larger tech companies, but this has changed in recent years, with Google, AWS and the CNCF sponsoring this year, and a visible presence of more corporate and industry rather than grassroots open source. You will meet people who don’t like this, don’t like permissive licenses, and might object to your company’s open source policies. In many ways this feels kind of refreshing.

Food is very important. Talks run all day, so you need to plan some time for lunch. The quickest thing is the baguettes that are available at various places, eg downstairs at the back of Janson. They are very efficient about dispensing these fast. There isn’t much choice. There are food trucks out front, with huge queues at lunchtime. I usually go to Le Pain Quotidien (eat in or take away) in the small cluster of shops down the road. That is busy, but less so. There is really not much else around this area.

Coffee is important too. There is a GitHub sponsored coffee stall that is good, but it is free so the queue tends to be very long. The next best coffee is at the cafeteria. Le Pain Quotidien does coffee too. If you want tea, on Saturday this year OpenUK are serving tea, biscuits and Brexit commiseration on their stand.

Beer is a fixture at Fosdem. Belgium makes some of the finest beers in the world, and some ok ones too. Beers are sold at several points in the venue, and it is common to take them to talks and so on. Beware most Belgian beers are strong. Also the kriek they sell at the venue is terrible, even though Belgium makes some amazing examples of this beer style. There is a pre-conference “beer event” on Friday, I haven’t even tried to go for many years, even though they take over an entire street it is too crowded to be enjoyable or find anyone you want to talk to. Yes, there are a lot of alcohol focused events, and events in bars which could be offputting if you don’t drink.

Antique shop

Brussels is a lovely city. The architecture is beautiful, both the old, as exemplified by the Grand Place, which is magical in the evening, and the art deco gems, such as the Musical Instrument Museum, once a shop, and the diversity everywhere. It is said that there is a rule about not copying buildings, although I am not sure this is really the cause, but Belgium does not have terraces of identical houses; every building is totally different. The Belgians are also as eccentric as the British, if not more so. Also don’t miss the Galeries Royales Saint-Hubert, the first glazed shopping street in Europe, from 1847.

Perhaps my favourite area is the part between Sablon, which has a grand antique market and excellent chocolate shops, and the Marché aux Puces, the flea market which is full of junk. In between are several streets largely filled with antique shops, selling midcentury furniture and, well, everything. Some are huge inside, full of things and stuff of every kind just jumbled up anyhow. There are often amazing window displays like the one below.

window display

Food in Brussels is really good, although Fosdem is not always the best time to eat, as you are often with indeterminate numbers of people, and getting reservations, which are often needed on Friday and Saturday nights, is hard. Also most places are small. Brussels is very international, and all kinds of food are available there. Most people just think that Belgian food is frites with mayonnaise and waffles, but there is traditional French and Flemish food, and great seafood, not just mussels. The local beer is lambic, the sourdough of beer styles, made with wild yeast and only brewed in the region. Cantillon is one of the best, and has an amazing museum in the working brewery in Brussels. This style of beer is sour, but it is absolutely delicious. If you love this style, Moeder Lambic is a great place to try it. There have been a number of new breweries open recently; de la Senne is excellent and available in good bars.

This year, the Friday night before Fosdem is Brexit. Brussels has a large UK community, and Fosdem always has a large UK contingent, with whole Eurostar trains being filled on Friday evening usually. So be nice to any of us you see.

So, yeah, that is Fosdem. Unique. Could be better. Enjoy Brussels.

Brussels market

01 Nov 2019, 17:27

Linearity among the toctou

Illustration from Paul Klee, Pedagogical Sketchbook, 1925

I have been reading a lot of papers on linear types recently. Originally it was to understand better why Rust went down the path it did, but I found a lot more interesting stuff there. While some people now are familiar with linear types as the basis for Rust’s memory management, they have been around for a long time and have lots of other potential uses. In particular they are interesting for improving resource allocation in functional programming languages by reusing storage in place where possible. Generally they are useful for reasoning about resource allocation. While the Rust implementation is probably the most widely used at present, it kind of obscures the underlying simple principles by adding borrowing, so I will only mention it a little in this post.

So what are linear types? I recommend you read “Use-once” variables and linear objects: storage management, reflection and multi-threading by Henry Baker, as it is the best general overview I have found. The basic idea is extremely simple, linear variables can only be used once, so any function that receives one must either return it, or pass it to another function that consumes it. Using values only once sounds kind of weird and restrictive, but there are some ways it can be made easier. Some linear types may have an explicit copy operation to duplicate them, and others may have operations that return a new value, in a sequential way. For example a file object might have a read operation that returns the portion read and a new linear object to read for the next part, preserving a functional model: side effects are fine if you cannot reuse a variable. You won’t really recognise much of the Rust model here, as it allows borrows, which presents a much less austere effect. It does all sound fairly odd until you get used to it, even though it is simpler than say monads as a way of sequencing. Note also that there are related affine types, where you can use values zero or one times, so values can be discarded, and other forms such as uniqueness types, and many other fun variants in the literature.
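To make the file example concrete, here is a sketch of a use-once read interface in Rust, whose move semantics give a linear-ish flavour by default (the `Reader` type is invented for illustration, not a real library type): each `read` consumes the handle by value and hands back the data plus a fresh handle for the next part.

```rust
// A "use-once" linear handle: reading consumes the handle and
// returns the chunk read plus a new handle, sequencing the reads.
struct Reader {
    data: Vec<u8>,
    pos: usize,
}

impl Reader {
    fn new(data: Vec<u8>) -> Reader {
        Reader { data, pos: 0 }
    }

    // Takes `self` by value: the old handle is moved (consumed),
    // so it cannot be read twice; a new handle continues the sequence.
    fn read(self, n: usize) -> (Vec<u8>, Reader) {
        let end = (self.pos + n).min(self.data.len());
        let chunk = self.data[self.pos..end].to_vec();
        (chunk, Reader { data: self.data, pos: end })
    }
}

fn main() {
    let r = Reader::new(b"hello world".to_vec());
    let (first, r) = r.read(5); // old `r` is moved here...
    // r.read(1);               // ...so reusing it would not compile
    let (rest, _r) = r.read(6); // the returned handle reads the next part
    assert_eq!(first, b"hello");
    assert_eq!(rest, b" world");
    println!("ok");
}
```

Because `read` takes `self` by value, the compiler rejects any attempt to use the old handle again, which is exactly the use-once discipline described above.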

Memory is probably the easiest way to understand the use cases. Think about variables as referring to a chunk of memory, rather than being a pointer. Memory can be copied, but it is an explicit relatively costly operation (ie memcpy) on the memory type, so the normal access should be linear with explicit copying only if needed. Because the value of the memory may be changed at any time by a write, you need to make sure there are not multiple writers or readers that are not reading in a deterministic order. Rust does this with mutable borrows, and C++ has a related thing with move semantics.

Rust’s borrow checker allows either a single reference with read and write access, or multiple readers when there is no write access. Multiple readers is of course not a linear access pattern, but is safe as multiple reads of an immutable object return the same value. The complexity of the borrow checker comes from the fact that objects can change between these states, which requires making sure statically that all the borrows have finished. Some of the use cases for linearity in functional languages relate to this, such as efficiently initialising an object that will be immutable later, so you want linear write access in the initialisation phase, followed by a non linear read phase. There are definitely interesting language tradeoffs in how to expose these types of properties.
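A small Rust sketch of those two phases (the `init` helper is made up for the example): a single exclusive mutable borrow for the write/initialisation phase, then any number of shared readers once the data is no longer changing.

```rust
// Write phase: only one &mut reference may exist at a time,
// so initialisation is linear.
fn init(buf: &mut [u8]) {
    for (i, b) in buf.iter_mut().enumerate() {
        *b = i as u8;
    }
}

fn main() {
    let mut buf = vec![0u8; 4];
    init(&mut buf); // the exclusive borrow ends when `init` returns...

    // ...so multiple shared readers are now allowed: duplicating
    // reads is safe because every reader sees the same bytes.
    let r1 = &buf;
    let r2 = &buf;
    assert_eq!(r1, r2);
    assert_eq!(buf, vec![0u8, 1, 2, 3]);
    // Taking `&mut buf` again while r1/r2 were still in use
    // would be rejected by the borrow checker.
    println!("ok");
}
```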

Anyway, I was thinking about inter process communication (IPC) again recently, in particular ring buffer communication between processes, and it occurred to me that this is another area where linearity is a useful tool. One of the problems with shared memory buffers for communication, where one process has read access and the other write access for each direction of communication, is that the writing process may try to attack the reader by continuing to write after reading has started. The same issue applies for userspace to kernel communication, where another userspace thread may write to a buffer that the kernel has already read. This can trigger a time of check time of use (toctou) attack, for example if there is a check that a size is in range, but after that the attacker increases it. The standard defence is to copy buffers to a private buffer, where validation may happen undisturbed. This of course has a performance hit, but many IPC implementations, and the Linux kernel, do this for security reasons.

Thinking about toctou as a linearity problem, we can see that “time of check” and “time of use” are two different reads, and if we treat the read buffer as a linear object, and require that its contents are each only read once, then time of check and time of use cannot be different. Note of course that it does not matter exactly which version gets read, all that matters is that it is a consistent one. We have to remember the value of the part we check and keep that for later if we can’t use it immediately. So linear read has its uses. Of course it is not something that programming languages give us at present, generally a compiler will assume that it can reload from memory if it needs to. Which is why copying is used; copying is a simple linear operation that is available. But there are often cases where the work being done on the buffer can be done in a linear way without copying, if only we had a way of telling the compiler or expressing it in the language.
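Here is a minimal sketch of the copy defence in Rust; the one-byte length-prefixed message layout and the `read_message` function are invented for illustration:

```rust
// Snapshot the attacker-writable bytes once, then check and use only
// the private copy, so "time of check" and "time of use" are reads
// of the same value. Layout (hypothetical): [len byte][payload...].
fn read_message(shared: &[u8]) -> Option<Vec<u8>> {
    // The single linear read: one copy of the shared region.
    let snapshot: Vec<u8> = shared.to_vec();

    // Time of check: validate the length field in the snapshot.
    let len = *snapshot.first()? as usize;
    if len + 1 > snapshot.len() {
        return None;
    }

    // Time of use: index with the same value we checked. Even if the
    // writer scribbles on `shared` now, the snapshot is unchanged.
    Some(snapshot[1..1 + len].to_vec())
}

fn main() {
    let shared = [3u8, b'a', b'b', b'c', 0, 0];
    assert_eq!(read_message(&shared), Some(b"abc".to_vec()));
    // An out-of-range length fails the check instead of overreading.
    assert_eq!(read_message(&[200u8, 1, 2]), None);
    println!("ok");
}
```

The `to_vec` is the one linear read; everything after it works on the private snapshot, so a concurrent writer cannot change the length between the check and the use.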

Overall, I have found the linear types literature helpful in finding ways to think about resource allocation, and I would recommend exploring in this space.

21 Jul 2019, 20:52

Fuzz rising

Go and read the excellent blog post from Cloudflare on their recent outage if you haven’t already.

I am not going to talk about most of it, just a few small points that especially interest me right now, which are definitely not the most important things from the outage point of view. This post got a bit long so I split it up, so this is part one.

Fuzz testing has been around for quite some time. American Fuzzy Lop was released in 2013, and was the first fuzzer to need very little configuration to find security issues. This paper on mutational fuzzing is a starting point if you are interested in the details of how this works. The basic idea is that you start with a valid input, and gradually mutate it, looking for “interesting” changes that change the path the code takes. This is often coverage guided, so that you attempt to cover all code paths by changing input data.
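The mutate-and-keep-what's-interesting loop can be sketched in a few dozen lines. This is a toy, with all names invented for illustration: real fuzzers like AFL instrument the compiled binary to observe coverage, whereas here the target simply reports which branches it took, and a tiny xorshift PRNG avoids any external crates.

```rust
// Toy mutational, coverage-guided fuzzer sketch (illustrative only).

use std::collections::HashSet;

// The target under test: returns the ids of the branches it took.
// Reaching branch 3 requires the input to start with "FUZ".
fn target(input: &[u8]) -> Vec<u32> {
    let mut branches = vec![0];
    if input.first() == Some(&b'F') {
        branches.push(1);
        if input.get(1) == Some(&b'U') {
            branches.push(2);
            if input.get(2) == Some(&b'Z') {
                branches.push(3);
            }
        }
    }
    branches
}

// Marsaglia xorshift64, so the sketch needs no external RNG crate.
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

// Returns how many distinct branches were covered after `iterations` mutations.
fn fuzz(seed_input: &[u8], iterations: u32) -> usize {
    let mut rng = 0x12345678u64;
    let mut corpus = vec![seed_input.to_vec()];
    let mut coverage: HashSet<u32> = target(seed_input).into_iter().collect();
    for _ in 0..iterations {
        // Pick a corpus entry and mutate one random byte.
        let pick = (xorshift(&mut rng) as usize) % corpus.len();
        let mut mutant = corpus[pick].clone();
        if mutant.is_empty() { continue; }
        let pos = (xorshift(&mut rng) as usize) % mutant.len();
        mutant[pos] = (xorshift(&mut rng) % 256) as u8;
        // Keep the mutant only if it reaches a branch we have not seen before.
        let new: Vec<u32> = target(&mutant)
            .into_iter()
            .filter(|b| !coverage.contains(b))
            .collect();
        if !new.is_empty() {
            coverage.extend(new);
            corpus.push(mutant);
        }
    }
    coverage.len()
}

fn main() {
    // Starting from an uninteresting seed, the mutation loop incrementally
    // discovers the "F", "FU", "FUZ" path one byte at a time.
    println!("branches covered: {}", fuzz(b"AAA", 200_000));
}
```

The key point the sketch shows is why coverage guidance beats pure random input: guessing “FUZ” blind is a 1-in-16-million shot, but keeping each intermediate discovery in the corpus turns it into three cheap 1-in-a-few-hundred steps.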

Fuzz testing is not the only tool in the space of automated security issue detection. There is traditional static analysis tooling, although it is generally not very efficient at finding most security issues, other than a few things like SQL injection that are often well covered. It tends to have a high false positive rate, and unlike fuzz testing will not give you a helpful test case. Of course there are many other things to consider in comprehensive security testing, this list of considerations is very useful. Another technique is automated variant analysis, taking an existing issue and finding other cases of the same issue, as done by platforms such as Semmle.

Fuzzing as a service is available too. Operationally, fuzzing is not something you want to run in your CI pipeline, as it is not a test that finishes: it is something you should run continuously, 24/7, on the latest version of your code, as it still takes a long time to find issues, and is randomised. Services include Fuzzbuzz, a fairly new commercial service (with a free tier) who are very friendly, Microsoft Security Risk Detection, and Google’s OSS-Fuzz for open source projects.

As Cloudflare commented “In the last few years we have seen a dramatic increase in vulnerabilities in common applications. This has happened due to the increased availability of software testing tools, like fuzzing for example.” Some numbers give an idea of the scale: as of January 2019, Google’s ClusterFuzz has found around 16,000 bugs in Chrome and around 11,000 bugs in over 160 open source projects integrated with OSS-Fuzz. We can see the knock on effect on the rate of CVEs being reported.

If we look at the kinds of issues found, using data from a 2017 Google blog post, the breakdown is interesting.

As you can see, a very large proportion are buffer overflows, manual memory management issues like use after free, and the “ubsan” category, which covers all the undefined behaviour in C or C++ code that, if you happen to write it, lets the compiler turn your program into hot garbage if it feels like it. Memory safety is still a major cause of errors, as you can see if you follow the @LazyFishBarrel twitter account. Note that the majority of projects are still not running comprehensive automated testing for these issues, so this problem is rapidly growing. There are two factors at play: first, memory errors are an easier target than many other sorts of errors for current tooling to find, but second, there is a huge codebase with huge numbers of these errors.
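To make the “ubsan” category concrete: in C, signed integer overflow and out-of-bounds reads are undefined behaviour, so the compiler may assume they never happen and optimise accordingly. As a contrast, here is a small sketch of how a memory-safe language forces each of these cases to have an explicit, defined outcome:

```rust
// Where C has undefined behaviour, Rust makes the outcome a visible choice.

fn main() {
    // Signed overflow: pick the behaviour you want, instead of getting UB.
    assert_eq!(i32::MAX.checked_add(1), None);        // overflow detected
    assert_eq!(i32::MAX.wrapping_add(1), i32::MIN);   // explicit two's-complement wrap
    assert_eq!(i32::MAX.saturating_add(1), i32::MAX); // clamp at the limit

    // Indexing: an out-of-range access is a defined panic with [],
    // or a recoverable None with .get() — never a silent out-of-bounds read.
    let buf = [10u8, 20, 30];
    assert_eq!(buf.get(2), Some(&30));
    assert_eq!(buf.get(3), None);
}
```

This is why fuzzing pays off so much less against memory-safe code: the whole ubsan and overflow category simply is not there to find.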

Microsoft Security Response Center also just released a blog post with some more numbers. While ostensibly about Microsoft’s gradually increasing coding in Rust, the important quote is that “~70% of the vulnerabilities Microsoft assigns a CVE each year continue to be memory safety issues”.

In my talk at Kubecon I touch on some of these issues with C (and to some extent C++) code. The majority of the significant issues found in the CNCF security audits were in C or C++ code, despite the fact that there is not much of this code in the reviewed projects.

Most of the C and C++ code that causes the majority of open source CVEs is shipped in Linux distributions. Linux distros are the de facto package manager for C code, and for C++ to a lesser extent; neither of these languages has developed its own language-specific package management yet. From the Debian stats, of the billion or so lines of code, 43% is ANSI C and 24% is C++, which shares many of the same problems in many codebases. So 670 million lines of code, in general without enough maintainers to deal with the existing and coming waves of security issues that fuzzing will find. This is the backdrop of increasing complaints about unfixed CVEs in Docker containers, where these tend to be more visible due to wider use of scanning tools.

Is it worth fuzzing safer languages such as Go and Rust? Yes, you will still find edge conditions, and potentially other cases such as race conditions, although the payoff will not be nearly as high. For C code it is absolutely essential, but bugs and security issues are found in other languages too. Oh, and fuzzing is fun!

My view is that we are just at the beginning of this spike, and we will not simply find all the issues and move on. Rather, the Linux distributions, which carry this code, will end up as toxic industrial waste areas, the Agbogbloshie of the C era. As the incumbents, no, they will not rewrite it in Rust; instead smaller, more nimble, different types of competitor will outmanoeuvre the dinosaurs. Linux distros generally consider that most of their role is packaging, not creation, with a few exceptions like systemd; most of their engineering work is in the long term support business, which still pays well despite being increasingly out of step with how non-C software is used, and with how cloud deployments work, where updating software is part of normal life, and five or ten year software lifetimes without updates are not the target. We are not going to see the Linux distros work on solving this issue.

Is this code exploitable? Almost certainly yes, with sufficient effort. We discussed Thomas Dullien’s paper Weird machines, exploitability, and provable unexploitability at the Säntis Systems Summit recently; I highly recommend it if you are interested in exploitability. But overall, proving code is not exploitable is in general not going to be possible, and attackers always have the advantage. Sure, they will pick the easiest things first, but most attacks are automated now, and attacking scales well. Security is risk management, but with memory safety being a relatively easy exploit in many cases, it is a high risk. Obviously not all this code is exposed to attackers via the network or attacker-supplied data, especially in containerised environments, but some is, and you will spend increasing amounts of time working out what is a risk. The sheer volume of security issues just makes risk management more difficult.

If you are a die hard C hacker and want to remain one, the last bastion of C is of course OpenBSD. Throw up the pledge barricades, remove anything you can, keep reviewing. That is the only heroic path left.

In the short term, start to explore and invest in ways to replace every legacy C dependency you are currently using. Write a deprecation roadmap. Cut down your dependencies on Linux distributions. Shift to memory safe languages everywhere, and if you use C++ make sure you only use the safer subset. Look to smaller more nimble Linux distributions that start shipping memory safe code; although the moves here have been slow so far, you only need a little as once distros stop having to be C package managers they can do a better job of being minimal userspaces. There isn’t much code you really need to run modern applications that themselves do not have many C dependencies, as implementations like LinuxKit show. If you just sit on top of the kernel, using its ABI stability guarantees there is little you need to do other than a little configuration; well other than worry about the bugs in a kernel written in … C.

Memory unsafe languages are not going to get better, or safe. It is time to move on.

27 Jan 2019, 19:00

Kubernetes as an API standard

There is now a rustyk8s mailing list to discuss implementations of the Kubernetes API in Rust.

There was a lot of interest in my tweet a couple of months ago about writing an implementation of the Kubernetes API in Rust. I had a good conversation at Kubecon with some people about it, and thought I should explain more about why it is interesting.

Kubernetes is an excellent API for running code reliably. So much so that people want to run it everywhere. People have described it as the universal distributed systems API, and something that will eventually be embedded into hardware, or the kernel (or Linux) of distributed systems. Maybe some of these are ambitious, but there is nothing wrong with ambition, and hey, it is a nice, simple API at its core. Essentially it just does reconciliation between the world and desired state for an extensible set of things, things that include a concept of a pod by default. That is pretty much it, a simple idea.
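That reconciliation idea fits in a few lines of code. The sketch below is hypothetical and nothing like the real Kubernetes API surface: desired and observed state are just sets of names, and a controller loop diffs them and applies create/delete actions until they converge.

```rust
// Minimal sketch of the reconciliation loop at the core of Kubernetes.
// All names are illustrative, not the real API.

use std::collections::BTreeSet;

type Name = String;

#[derive(Debug, PartialEq)]
enum Action {
    Create(Name),
    Delete(Name),
}

// One reconciliation step: compute the actions that move `world` toward `desired`.
fn reconcile(desired: &BTreeSet<Name>, world: &BTreeSet<Name>) -> Vec<Action> {
    let mut actions = Vec::new();
    for missing in desired.difference(world) {
        actions.push(Action::Create(missing.clone()));
    }
    for extra in world.difference(desired) {
        actions.push(Action::Delete(extra.clone()));
    }
    actions
}

fn main() {
    let desired: BTreeSet<Name> = ["pod-a", "pod-b"].iter().map(|s| s.to_string()).collect();
    let mut world: BTreeSet<Name> = ["pod-b", "pod-c"].iter().map(|s| s.to_string()).collect();
    // Apply the computed actions, as a controller's loop body would.
    for action in reconcile(&desired, &world) {
        match action {
            Action::Create(n) => { world.insert(n); }
            Action::Delete(n) => { world.remove(&n); }
        }
    }
    assert_eq!(world, desired); // converged
}
```

Everything else — watches, the API machinery, extensibility via new resource types — is elaboration around this loop, which is what makes the gap between the idea and a gigabyte of compiled code so striking.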

A simple idea, but not simply expressed. If you build a standalone Kubernetes system, somehow that simple idea amounts to a gigabyte of compiled code. Sure, there are some extraneous debug symbols, and a few extra versions of etcd for version upgrades, and maybe one day Go will produce less bloated code, but that is not going to cut it for embedded systems and other interesting potential use cases of Kubernetes. Nor is it easy to understand, find your way around the code and hack on it.

Another problem with Kubernetes is that it suffers from the problem that the implementation is the specification. Lots of projects start like that but as they mature the specification is often separated, and alternative implementations can thrive. Without an independent specification, alternative implementations often have to copy every accidental nuance of the original, and even replicate bugs. Kubernetes is in the right state where starting to move towards an independent specification would be productive. We know that there are some rough edges in the implementation that need to be cleared up, and some parts where the API is not yet the best it could be.

One approach is to try to cut back the current implementation to a more manageable size, by removing parts. This is what Darren Shepherd of Rancher has done with “k3s”, removing a million or so lines of code. But a second, complementary approach is to build a new simple implementation from the ground up without any baggage to start with. Then by looking at differences in behaviour, you can start to understand which parts are the core specification, and which parts are accidental. Given that the way the code for Kubernetes is written has been described as a “clusterfuck” by Kris Nova, this seems a productive route: “Unknown to most, Kubernetes was originally written in Java… If the anti patterns weren’t enough we also observe how Kubernetes has over 20 main() functions in a monolithic “build” directory… Kubernetes successfully made vendoring even more challenging than it already was, and discuss the pitfalls with this design. We look at what it would take to begin undoing the spaghetti code that is the various Kubernetes binaries.”

Of course we could write a new implementation in Go, but the temptation would then be to import bunches of existing code, and it might not end up that different. A different language makes sense to stop that. The aim should be to build the minimum needed to implement the core API. So what language? Rust makes the most sense, it seems, although there are some other options.

There is a small but growing community of cloud native Rust projects. In the CNCF, there is TiKV from PingCAP and the Linkerd 2 data plane. Another project that has recently been launched in the space is AWS Firecracker. The Rust ecosystem is especially strong in security, and in control of memory usage, both of which are important for effective scalable systems. In the last year or so the core libraries needed in the cloud native space have really been filled in.

So are you interested in hacking on a greenfield implementation of Kubernetes in Rust? There is not yet a public codebase to hack on, but I know that there are some people hacking in private. The minimal viable project is something that you can talk to with kubectl, run pods on, and extend via the API. The conformance tests should help, although they are not complete enough to constitute a specification by any means, but starting to pass some tests would be a satisfying achievement. If you want to meet up with the cloud native Rust community, a bunch of people will be at Fosdem in early February, and I will sort out a fringe event at KubeCon EU as well. Happy hacking!
