Most corporate code bases end up like Emacs

Once or twice a day, I check Hacker News. Today an article called “Emacs is Not Enough” caught my attention. I grew up on Emacs and I enjoy it, but it certainly has its warts. I have slowly transitioned away from it, and I now reach for a web-based IDE like DevSpaces for most of my programming. The article gives an accurate and scathing critique of the modern state of Emacs, and it got me thinking about software in general. I'll quote from it throughout this post.

One principle we strongly believe in is that “Code is a Liability, Not an Asset.” Rahul’s recent talk at re:Invent 2022 returned to this point repeatedly. Code is often seen as an asset because code written in-house is unique and crafted to fit a particular problem. That line of thinking is understandable, but it can lead you down a bad path.

> Emacs is pretty much incompatible with this idea of being structured in any way. And so, all its 10 gazillion lines of Elisp are a liability, not an asset.

This problem is not unique to Emacs. Not at all. Most corporate code bases are filled with cruft: special-case handling, fixes made under tight deadlines that were never properly architected, undocumented behaviors that downstream code is forced to work around, and much more.

> At this point it's all sunk costs and damage control.

This is the sunk cost fallacy at work. It is tempting to believe that code which took so much effort to write must be worth keeping. However, that code also carries an ongoing cost, which may be far larger than anyone accounted for. I have seen many bespoke, in-house machine learning pipelines, platforms, and the like. All of them were pet projects that ballooned in scope. When the caretaker of a pet project moves on, some poor team is stuck maintaining it in perpetuity. Every day more code gets written against that custom code, and the technical debt grows. I have seen the same thing happen with databases, analytics toolkits, and every kind of utility code. Often the code that embodies a company's true insight and value proposition is quite small.

That is not to say that this small core isn’t valuable. A semi-famous example is a single line of Octave that uses Singular Value Decomposition to separate two sound sources captured in the same room by two spatially separated microphones (the classic “cocktail party problem”). For the record, that one line is:
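```octave
% x: 2-by-N matrix of mixed recordings, one row per microphone, one column per sample
[W,s,v]=svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
```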
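The Octave line is dense, so here is a dependency-free Python sketch of what it computes in the two-microphone case. It assumes the same layout as the Octave code (one row per microphone, one column per sample) and builds the 2×2 matrix that the expression hands to `svd()`, namely the sum over samples of ‖x_t‖² x_t x_tᵀ. Because that matrix is symmetric positive semi-definite, its singular vectors are its eigenvectors, so the sketch swaps a general SVD routine for a closed-form 2×2 eigendecomposition. The function names are illustrative, and this reproduces only the matrix and its decomposition, not a complete source-separation pipeline.

```python
import math


def weighted_outer_sum(x):
    """Build the 2x2 matrix that the Octave one-liner passes to svd().

    x is a pair of equal-length sample lists, one per microphone, mirroring
    the 2-by-N Octave matrix. The result is the sum over samples t of
    ||x_t||^2 * x_t x_t^T, which is what
    (repmat(sum(x.*x,1),size(x,1),1).*x)*x' evaluates to.
    """
    m = [[0.0, 0.0], [0.0, 0.0]]
    for x0, x1 in zip(x[0], x[1]):
        weight = x0 * x0 + x1 * x1  # one entry of sum(x.*x,1)
        sample = (x0, x1)
        for i in range(2):
            for j in range(2):
                m[i][j] += weight * sample[i] * sample[j]
    return m


def svd_symmetric_psd_2x2(m):
    """Closed-form SVD of a symmetric positive semi-definite 2x2 matrix.

    For such a matrix the singular vectors are its eigenvectors and the
    singular values are its eigenvalues. Returns (w, s): the columns of w
    are the singular vectors and s is in descending order. Column signs
    may differ from those returned by Octave's svd().
    """
    a, b, d = m[0][0], m[0][1], m[1][1]
    # Angle of the dominant eigenvector: tan(2*theta) = 2b / (a - d).
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    w = [[cos_t, -sin_t], [sin_t, cos_t]]
    return w, [mean + radius, mean - radius]


# Example: two short "recordings" of three samples each.
x = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
m = weighted_outer_sum(x)
w, s = svd_symmetric_psd_2x2(m)
print(m)  # [[21.0, 10.0], [10.0, 6.0]]
print(s)
```

With NumPy available, `numpy.linalg.svd` applied to the same matrix would give the same singular values, with singular vectors that may differ in sign.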

When you find yourself maintaining a huge codebase full of mostly utilitarian code, the right course is to transition to AWS managed services such as SageMaker, RDS, and ElastiCache. In software terms, AWS is going to maintain SageMaker forever, and RDS will still be serving data long after 2038, when 32-bit Unix timestamps overflow.

> We want to flush it down the drain of eternity, not keep it.

> And we don't need to be even looking at the packages, we need to be looking at our experience using this software as a whole. That's the source of real value. Not code.

Customers don’t care that we wrote all those lines of code ourselves. They care that a given set of inputs produces the expected output. And if they find out that, behind the scenes, our product mostly glues together a patchwork of AWS services, that should inspire confidence. Each of these services is built and maintained by a team whose structure and incentives are dedicated exclusively to running that product well, and many companies depend on it. In other words, by using AWS managed services, you are relying on code that many other companies, as well as AWS itself, are invested in.

> Can you imagine to be building ever more crap on top of it?

> This is a perfect example why Emacs has persisted for so long - when things get tough, it lets you slog your way through just another day.

Many codebases start to feel like this eventually. A voice in the back of your mind says, “We will need to rewrite this eventually. This code is a duct-taped mess.” But “eventually” rarely, if ever, comes. Until things break drastically, very few shops have the resources and the discipline to pause for “the big rewrite.” And as Joel Spolsky has argued, a big rewrite is almost always a mistake anyway.

Circling back to the original article, the author is advocating for a new project: not even a rewrite, but a complete re-envisioning of how we interact with code. It sounds really interesting, and I’ll certainly keep an eye on it over the years. However, it is a passion project. In corporate codebases, we rarely have the luxury of a “big rewrite” whose results may not be realized for years.

We recommend finding “seams” in your code where managed services can be swapped in to replace their aging, bespoke counterparts. Do you have a predictive model that has been maintained for years and now needs constant tuning and retraining? Replace it with SageMaker. Do you have code that extracts tabular data from PDFs, keeps breaking, and depends on an ancient version of Java? Swap it out for Textract! By shrinking your code footprint, you free your developers to focus on the critical, value-adding parts of your code. Rather than treating the number of lines of code in your organization as a bragging point, start thinking about value per line.
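Here is a minimal sketch of such a seam for the PDF example, in Python. Callers depend only on a small `TableExtractor` interface, so a Textract-backed class can replace a legacy implementation without touching them. It assumes boto3's Textract client and its `analyze_document` call with the `TABLES` feature, whose response links `TABLE` blocks to `CELL` blocks, and `CELL` blocks to `WORD` blocks, through `CHILD` relationships. It also assumes a synchronous call is acceptable (multi-page PDFs would go through the asynchronous `StartDocumentAnalysis` API instead). The names `TableExtractor`, `TextractTableExtractor`, and `first_table` are hypothetical. The client is passed in rather than created inside the class so the code can be exercised with a stand-in.

```python
from typing import Iterator, List, Protocol

Table = List[List[str]]  # rows of cell text


class TableExtractor(Protocol):
    """The seam: callers depend on this interface, not on any implementation."""

    def extract_tables(self, document: bytes) -> List[Table]:
        ...


class TextractTableExtractor:
    """Managed-service implementation backed by Amazon Textract.

    `client` is assumed to behave like boto3.client("textract"). It is
    injected rather than created here so the class can be exercised
    without AWS credentials.
    """

    def __init__(self, client) -> None:
        self._client = client

    def extract_tables(self, document: bytes) -> List[Table]:
        # Synchronous analysis; multi-page PDFs would need the asynchronous
        # StartDocumentAnalysis API instead.
        response = self._client.analyze_document(
            Document={"Bytes": document}, FeatureTypes=["TABLES"]
        )
        blocks = {block["Id"]: block for block in response["Blocks"]}

        def children(block: dict, block_type: str) -> Iterator[dict]:
            # Follow CHILD relationships, keeping only blocks of one type.
            for rel in block.get("Relationships", []):
                if rel["Type"] == "CHILD":
                    for child_id in rel["Ids"]:
                        child = blocks[child_id]
                        if child["BlockType"] == block_type:
                            yield child

        tables = []
        for block in response["Blocks"]:
            if block["BlockType"] != "TABLE":
                continue
            cells = list(children(block, "CELL"))
            n_rows = max((c["RowIndex"] for c in cells), default=0)
            n_cols = max((c["ColumnIndex"] for c in cells), default=0)
            grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
            for cell in cells:
                words = [w["Text"] for w in children(cell, "WORD")]
                # Textract row and column indices are 1-based.
                grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = " ".join(words)
            tables.append(grid)
        return tables


def first_table(extractor: TableExtractor, document: bytes) -> Table:
    """Hypothetical caller: it only knows about the TableExtractor seam."""
    tables = extractor.extract_tables(document)
    return tables[0] if tables else []
```

In production the extractor would be constructed as `TextractTableExtractor(boto3.client("textract"))`. The existing Java-based extractor can be wrapped to expose the same `extract_tables` method until the switch-over, so callers such as `first_table` never change. That is what makes this a seam rather than a rewrite.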

tl;dr

Most corporate code bases consist mostly of utility code: ETL, simple ML models, data validation, and so on. Over time, these code bases become brittle, and better approaches can’t be adopted without breaking the many layers of code built on top of them. Much of this utility code can be replaced by AWS managed services, freeing time and resources for the core, value-adding code that truly differentiates your business.