M32 — Andromeda Galaxy

KISS, YAGNI!

Andreas Maier
5 min read · Jan 20, 2023

KISS

KISS (Keep it simple and stupid) is the most important principle for IT projects. It requires that every problem is solved with the simplest possible solution. In particular, you should always be on the lookout for cases where a requirement is simple, but the proposed solution turns out to be extremely complex. For example:

  • Several XML files need to be generated. A Groovy-derived DSL is used as a solution instead of standard tools for generating XML files.
  • A simple PHP web application is to be set up for internal use. As a solution, a Kubernetes cluster will be set up in the Google Cloud using Terraform. A Helm chart will be written for the application and deployed using Tiller.
  • 30 MB of CSV files need to be made searchable via a website. As a solution, an Elasticsearch cluster is set up, which obtains the data of its index from a MySQL table. The Elasticsearch cluster requires 3 servers with a total of 6GB RAM.
  • 300GB of data need to be filtered and processed monthly. As a solution, a single VM with 3TB of RAM will be set up in AWS. A single-node Spark cluster is installed on top of it and several interlocking pyspark scripts are implemented to read and write the data from an Amazon S3 bucket.

All of these (real-world!) examples violate the KISS principle in obvious ways. Most often, this happens when a new, mostly unproven technology is introduced into a project. This is often done without considering whether the technology will solve the existing problems at all. Or, conversely, whether the problems that these technologies solve even exist. But the KISS principle requires you to work with simple and stupid technologies first. Only when you reach the limits of the simple technologies should you evaluate and use more complex ones. Because, despite KISS, the following still applies:

“Everything should be made as simple as possible, but not simpler.”

Albert Einstein
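To make the CSV example above a little more concrete, here is a minimal sketch of what a "simple and stupid" first attempt could look like: load the rows into memory and scan them. The sample data, the column names and the function names `load_rows` and `search` are assumptions made up for illustration; at roughly 30 MB, the real files would probably still fit comfortably into the memory of a single process.

```python
import csv
import io


def load_rows(csv_file):
    """Read all rows of an open CSV file into a list of dicts."""
    return list(csv.DictReader(csv_file))


def search(rows, term):
    """Return the rows in which any field contains `term` (case-insensitive).

    Assumes well-formed rows, i.e. no more fields than header columns.
    """
    needle = term.lower()
    return [
        row
        for row in rows
        if any(needle in (value or "").lower() for value in row.values())
    ]


# Hypothetical sample data standing in for the real CSV files.
SAMPLE = io.StringIO(
    "id,name,city\n"
    "1,Alice,Berlin\n"
    "2,Bob,Munich\n"
    "3,Carol,Bern\n"
)

rows = load_rows(SAMPLE)
matches = search(rows, "ber")
print([m["name"] for m in matches])
```

Only if a linear scan like this turns out to be measurably too slow would the next step be something like a database index, and a search cluster would come much later than that.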

YAGNI

YAGNI (you ain’t gonna need it) means, in a narrower sense, that you shouldn’t program anything you don’t really need. You can derive this principle from KISS, because, by definition, the simplest possible solution contains nothing superfluous. Nevertheless, violations of the YAGNI principle (also called overengineering) happen quite often, especially among developers who are no longer beginners, but not yet full professionals either. These developers/architects are often aware of important principles and best practices, but always apply them blindly, without considering whether they are really needed. For example:

  • Microservices: An application is divided into several independent services according to the microservice principle. Instead of a single DB, each service now has its own DB, source repo, deployment pipeline, version number and runs in its own Docker container. However, it turns out that in the project, each microservice is only called by exactly one other microservice. And in the end, the application is only executable if all services are running at the same time in the correct version on the now necessary Kubernetes cluster.
  • Dependency injection frameworks (Spring, Angular) are popular for making code (unit) testable. The steep learning curve, the performance loss (due to reflection at runtime and memory consumption) and the problems when debugging with such frameworks are often overlooked. In addition, all the unit tests are of little use if the application as a whole does not work because there are no integration tests or end-to-end tests. For integration and end-to-end testing, however, dependency injection frameworks are of little use.
  • Kafka is a highly scalable stream processing engine capable of handling 10,000–100,000 messages per second. The customer has a data volume of 10–100 messages per second. But Kafka, along with a schema registry, is known to be the standard among messaging systems and therefore supposedly has no alternative. In the end, the customer wonders why messages get lost in the complex and expensive system or are not delivered in the right order.
  • Two developers decide to use Scrum in their two-man team because it is more agile.
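As an illustration of the dependency injection example above, the following sketch shows that unit testability does not necessarily require a framework: the dependency is simply passed to the constructor by hand. All class names here (`SignupService`, `SmtpMailer`, `FakeMailer`) are hypothetical and only serve the example.

```python
class SmtpMailer:
    """Hypothetical production collaborator that would send real e-mail."""

    def send(self, to, subject):
        raise NotImplementedError("no real mail server in this sketch")


class FakeMailer:
    """Test double that records calls instead of sending anything."""

    def __init__(self):
        self.sent = []

    def send(self, to, subject):
        self.sent.append((to, subject))


class SignupService:
    def __init__(self, mailer):
        # The dependency is handed in explicitly: no container, no reflection.
        self.mailer = mailer

    def register(self, email):
        self.mailer.send(email, "Welcome!")
        return email.lower()


# In a unit test, the fake is injected instead of the real mailer.
fake = FakeMailer()
service = SignupService(fake)
result = service.register("Ann@Example.com")
print(result, fake.sent)
```

In production code, the same class would be wired up as `SignupService(SmtpMailer())`. A framework may still pay off once the object graph becomes large, but that is a need to be demonstrated, not assumed.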

All these (real-world!) examples are deliberately not taken from pure software development, because for software development itself there are two more concrete rules, which in turn can be derived from KISS and YAGNI.

Premature optimization

“Premature optimization is the root of all evil” (Donald Knuth): This principle states that you should not speculate about performance problems while developing software. Instead, you should focus on the performance problems that you have actually identified through performance testing, and then fix them. Anything else is premature optimization, which often leads to unnecessarily complex code. For example:

  • Unnecessary caching (think about memory consumption) of results that are in fact rarely requested.
  • Unrolling of deeply nested for-loops to avoid O(N²) performance problems. This overlooks the fact that N is often very small and that unrolling makes the code unnecessarily hard to read.
  • Obscure micro-optimizations, often forgetting that the compiler does these micro-optimizations internally anyway. The result is not faster code, but simply unreadable code (like the infamous Perl one-liners).
  • Asynchronous multithreading of processes that are not performance relevant at all. Instead, you create additional problems because the order of the results can no longer be guaranteed.
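The advice to measure first can be sketched with Python's standard `timeit` module. The two duplicate-check functions and the input size of 20 elements are assumptions chosen for illustration; the point is only that the decision is based on a measurement with realistic input, not on the O(N²) label alone.

```python
import timeit


def has_duplicate_pairs(items):
    """Naive O(N^2) duplicate check using two nested loops."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_set(items):
    """O(N) alternative that relies on a set."""
    return len(set(items)) != len(items)


# Assumed typical input size; in a real project this comes from real data.
small_input = list(range(20))

RUNS = 1_000
for func in (has_duplicate_pairs, has_duplicate_set):
    seconds = timeit.timeit(lambda: func(small_input), number=RUNS)
    print(f"{func.__name__}: {seconds / RUNS * 1e6:.2f} µs per call")
```

Whatever numbers come out, they only matter in relation to the runtime of the whole program. If the function accounts for a negligible share of it, neither variant needs further tuning and the more readable one wins.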

Premature Flexibilization

“Premature flexibilization is the root of whatever evil is left”: This principle states that you should not make things flexible in software development just because you can. It is directed against unnecessary layers of abstraction (variables, classes, abstract classes, …) that are introduced whenever you want to prepare your code (often without a concrete requirement) for changes that you suspect might be required in the future. Often, however, it turns out that the customer or the project requires quite different kinds of flexibility than the ones you imagined when programming on your own. Then you end up with unnecessarily flexible code, which is unfortunately also unnecessarily complex. For example:

  • The customer surely wants all variables to be configurable and changeable at runtime. So we need a configuration system that reads configuration files at regular intervals (even better, we listen to file system events, because push is better than pull) and sets them internally in the program.
  • We need rollback functionality (also for the database, configuration files, etc.) in our deployment process in case we have bugs! This overlooks the fact that roll-forward (i.e. rapid deployment of a version with a bug fix) is often the much simpler solution.
  • Object-relational mapping (ORM) frameworks are, of course, the only way to talk to a database! That’s why the Python script that runs once a month has to work with an ORM framework instead of just hardcoding SQL statements. The fact that ORM frameworks often generate extremely odd SQL code and are difficult to debug in case of problems is often overlooked.
  • Code duplications must be avoided at all costs! It is often overlooked that there are now extremely good IDEs that make refactoring and replacing, even at multiple points in a large code base, fast and easy. Instead, unnecessary and wrong abstractions are often introduced, which only increase the problems instead of making the code more maintainable. But the wrong abstraction is worse than duplicating code!
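For the ORM example above, a hedged sketch of the "just hardcode the SQL" alternative could look as follows. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in; the `orders` table, its columns and the function `monthly_total` are invented for illustration and are not taken from any real project.

```python
import sqlite3

# In-memory database with a hypothetical `orders` table as a stand-in
# for whatever database the monthly script actually talks to.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (month, amount) VALUES (?, ?)",
    [("2023-01", 10.0), ("2023-01", 15.5), ("2023-02", 7.0)],
)


def monthly_total(connection, month):
    """Sum all order amounts of one month with a plain, parameterized query."""
    row = connection.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE month = ?",
        (month,),
    ).fetchone()
    return row[0]


print(monthly_total(conn, "2023-01"))
```

The SQL that runs is exactly the SQL that is written down, which makes a script like this easy to debug. The query is still parameterized, so hardcoding the statement does not mean building it by string concatenation.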

Summary

All in all, the principles mentioned here are meant to prevent one thing: unnecessary complexity. Why is this?

Because complexity is the enemy of stability!

That’s why every software developer, big data architect, data scientist and IT project manager should always keep in mind:

Keep it simple and stupid! You ain’t gonna need a complex solution!

Andreas Maier

PhD in Astrophysics, currently working as Senior Data Scientist & Machine Learning Engineer