100 Million Lines Of Code? 100 Million Problems.

The modern world has a scaling problem. We live in an era of huge numbers, where words like “million” and “billion” get tossed about freely — 10 million bits per second is considered slow, and 10 billion bits of disk space is considered small. And yet here’s another “million” that’s been floating around since 2014: modern cars will frequently contain 100 million lines of code. Bear in mind, that’s without self-driving. Given the size of all things ‘computer’, many people will — very reasonably — ask: Is, uhm, that a lot of code?

The short answer? Absolutely. 100 million lines of code is a terrifying amount.

But why is this particular “million” — of all things mega- and giga-sized — so large, or so concerning? For a start, it’s doing a heck of a lot of work.

Code (the instructions for a computer) first met the automobile in 1977, when General Motors used a chip to display information, such as speed, in its Oldsmobile Toronado. By 1981, the code had grown to about 50,000 lines, which is 50,000 separate instructions (roughly speaking).

Since then, code has truly become king of the carriage, taking charge of the entire vehicle — from steering to brakes, all the way down to the entertainment system. After all, hell hath no fury like a driver whose car isn’t compatible with their smart phone.

So, those 100 million lines of code are certainly busy — possibly even powerful. Unfortunately, the amount of code isn’t an indication of good workmanship. In fact, the opposite is often the case: the more code, the lower the quality.

The problem isn’t so much the code itself, which might well be working perfectly. Instead, it’s the code writers, readers, and maintainers that we need to worry about. As is so often the case, we people are the problem.

This is really what makes 100 million lines of code unique among “millions”: each line had to be put there by a person. Vast numbers mean nothing to a computer; as soon as a human becomes involved, the impact of those numbers skyrockets. We see this everywhere: downloading 1,000 books onto your Kindle could take a matter of hours; actually reading all those books could take an entire lifetime.

Books act as a decent analogy for coding. Each line of code — like each sentence in a book — has significance. It has a reason for existing, a reason for being put there by its author, and an effect on the overall narrative. As a result, it also has the potential to cause great damage. One wrong code change could have your car emailing your speeding offences to the local police station. I mean, it probably won’t … but if programmers don’t understand the code they’re working on, then anything is possible.

Understanding code is a taxing job, despite what Hollywood would have us believe. Old code can be obscure for any number of reasons, from the complexity of the problem, to slightly confusing terminology, to sheer laziness from the original programmer — sometimes you, sometimes someone else. Either way, maintaining code becomes a kind of psychological game. Each line leads to a series of questions: Why have they written it that way? What is this doing? Does it even work?

100 million lines of code comes burdened by 500 million such questions.

It seems counter-intuitive, but good programmers tend to spend their time deleting code, rather than creating it. They simplify the picture — big and small — rather than complicate it. Even when they do have to add code, they’ll often remove some from elsewhere. Like being carbon neutral, they’re code neutral.

Of course, this doesn’t come easily. Code — like so many things — suffers from Ernest Hemingway’s famous (and possibly apocryphal) observation: “The first draft of anything is shit”. Good code doesn’t just come out beautifully formed. The early attempts at solving a problem can appear to be the ramblings of a madman. These then need to be polished into something that makes for easy reading.

100 million lines of code suggests this polishing stage hasn’t happened. Instead, the code is almost certainly bloated in ways that will cause problems. For example, one of the easiest ways to rack up lines is through Copy – Paste. If two very similar tasks are required, the lazy (or ignorant or under-pressure or bored or simply just tired) programmer might paste a chunk of code, and tweak a line or two. This is a recipe for disaster — it will escalate quickly. Before you know it, every fix will have to be made in 10 places … until it’s inevitably left out somewhere. And like that, you’ve introduced a bug.
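
To make the Copy – Paste trap concrete, here is a small sketch in Python. Everything in it is hypothetical: the function names, the speed-warning scenario, and the negative-reading fix are invented for illustration and are not taken from any real vehicle software. It shows two pasted near-twins where a fix reached only one copy, followed by a single shared version where the fix lives in exactly one place.

```python
# Hypothetical example: names and scenario are illustrative assumptions only.

# Copy-pasted version: two near-identical functions. A fix for bad
# (negative) sensor readings was applied to the first copy only.
def speed_warning_kmh(speed_kmh, limit_kmh):
    if speed_kmh < 0:  # the fix: reject impossible sensor readings
        raise ValueError("speed cannot be negative")
    return speed_kmh > limit_kmh


def speed_warning_mph(speed_mph, limit_mph):
    # The negative-speed fix was never pasted here, so a bad reading
    # slips through silently.
    return speed_mph > limit_mph


# Deduplicated version: one function, so any fix is made exactly once.
def speed_warning(speed, limit):
    """Return True when speed exceeds limit (both in the same unit)."""
    if speed < 0:
        raise ValueError("speed cannot be negative")
    return speed > limit
```

With the pasted pair, a call such as speed_warning_mph(-5, 30) quietly returns an answer for a nonsensical reading, while speed_warning(-5, 30) raises an error. The shared version is also shorter, which is the point: less code, fewer places for a fix to go missing.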

So what can be done? Some prominent figures in the programming community, such as Robert “Uncle Bob” Martin, have suggested a certain standard be met for software professionals. This would mean taking a test — a sort-of bar exam for programmers. The argument: code is now so important, so prominent, that it is increasingly ridiculous to think that we can have things any other way. Eventually, someone will bring those standards in. It might as well be us.

As a programmer, the idea of regulation can seem a restrictive — almost draconian — one; as someone with a family who use cars, the idea of regulation is a no-brainer.

Whatever the solution, something’s got to give. With 100 million lines of code, we already travel on a bloated system of possible misunderstandings and probable mistakes. Add self-driving to the mix, and those 100 million lines could well rise to a billion. Our world’s scaling problems aren’t going to disappear any time soon.