Sometimes, after looking at old code, we say: “It’s easier to rewrite it than to change it.” It is especially sad when the code in question is our own, written with such love several years ago. In his talk at TechLead Conf 2020, Grigory Petrov, Head of Developer Relations at Evrone, analyzed the problems that lead to this situation and explained how to deal with the software complexity problem.
In this article, seemingly disjoint things intersect: neurophysiology, the curse of the zero price of copying, cognitive and social intuition. And, of course, it raises the topic of code complexity. You will learn about where it comes from, why it can’t be removed, and how to live with it.
Evrone is engaged in custom development of complex software. Therefore, it is important for its employees to write readable code so that customers can maintain it themselves and thank the company for a job well done.
But despite twenty years of programming experience, Grigory admits that writing readable code is still difficult. In this article we will discuss the difficulties involved.
When an artist paints a picture, he does it in strokes: he takes a brush and paint and applies one stroke at a time. But he can always take a step back to look at his creation in its entirety.
We can say that programmers also write code in “brushstrokes” of a kind: identifier by identifier, operator by operator, expression by expression, statement by statement, line by line, until there are creations of 10, 100, 1000 lines of code.
But unlike an artist, a programmer can’t “take a step back”. The artist relies on the machinery of the visual cortex, which has a zoom mechanism. Writing code relies on short-term and long-term memory, which, unfortunately, have no zoom mechanism built into their architecture.
Therefore, one of the main problems in software development is what we call the software complexity problem.
But there are a huge number of other areas where complexity also accumulates. For example, rocket science. People who make spaceships have a lot of difficulties, don’t they?
But they also have years of training that start right from kindergarten. Future rocket scientists learn to count, first aloud, then on paper, then go to school, where they study physics and mathematics. Years pass, and they enter a university, where they learn how rockets are actually built. They have established practices for building rockets, and their skills are reinforced by repetition. The future rocket builder designs, experiments, and improves, and the fifteenth rocket finally makes it beyond the atmosphere.
Our brain, of course, is not a “blank slate” from birth, but neither is it a computer with pre-installed software. It is believed that we can only think the thoughts, and only in the ways, that we have learned during our lives. Intuitive thinking does a good job at the everyday level when judging things on a scale: how beautiful a picture is, how well a renovation was done, how talented an artist is.
But if we try to apply intuitive thinking to someone else’s code, our brain automatically returns the result: “This code is bad, because you didn’t write it. Rewrite everything.”
We don’t have an intuitive way to evaluate “code quality”. Programming is a fundamentally new field, and our brains can’t intuitively apply life experiences from the real world to it.
In addition, programmers, unlike artists, find it difficult to learn from the masters. Of course, we can visit our equivalent of an art gallery, GitHub, and look at big projects there. But if you check out a project from GitHub, it may contain half a million lines of code. This is a lot, and we don’t have the optical zoom to just glance over the code without delving into it. Therefore, it is very difficult for programmers to learn by example. And that is before mentioning that GitHub is more of a building materials warehouse than an art gallery.
It’s also hard for software customers, who don’t have the intuition to understand what technical debt and refactoring are, or why the team wants a lot of money for work that doesn’t seem to produce anything special.
So, back to the question of accumulating complexity: in this respect programming is the same as rocket science. But because it lacks such a foundation, complexity accumulates much faster, and accumulated complexity makes the code unreadable.
Unfortunately for us, complexity cannot be removed from the code. After all, it is the benefit that the program we have written brings.
But complexity can be redistributed! This is exactly what the rest of the article is about.
The hippocampus is a part of the brain that is supposedly involved in memory formation. It is known that when the hippocampus fails, memory breaks down.
How this happens is not entirely clear. But there is a well-known pattern, Miller’s law (known in Russian as “Miller’s wallet”): when a person looks at new objects, they can keep, on average, from 5 to 9 of them in focus.
Modern neuroscientists have concluded that Miller was a big optimist, and in reality the number of objects held in focus is closer to 5. This is how many new pieces can be stored in short-term memory before it starts to fail.
But we also have long-term memory, the volume of which is quite large. However, putting something in long-term memory takes a long time.
When a person is just learning to play chess, they slowly scan the chessboard, recalling the rules and trying to find certain combinations. The window of their attention, which holds about 5 elements, slowly crawls along the board.
But if we are talking about a player who has been sitting at the chessboard for 10-15 years, then his brain automatically uses the usual patterns: combinations of pieces, typical attacks and defenses.
When an experienced chess player looks at the board, there is very little new information for him. It is this new information that he holds in short-term memory, and it is usually only about 2-3 elements. Everything else is already in long-term memory. But this kind of training takes years.
Libraries and frameworks for programming languages can be a way to transfer information from short-term memory to long-term memory.
If a programmer has been writing in Python and using the requests library for many years, they get used to both. The typical ways of using requests (how to make a request, how to send and receive JSON, how to deal with speed and latency issues) are familiar to them. Code that uses the library becomes readable for this programmer. There is no more complexity in this code, at least for this particular programmer.
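A minimal sketch of the kind of construction that becomes familiar after years with requests. The `fetch_user` helper and the `api.example.com` URL are hypothetical and exist only for illustration; the session is passed in as a parameter and is assumed to be a `requests.Session`.

```python
def fetch_user(session, user_id):
    """Fetch one user as a dict.

    `session` is assumed to be a requests.Session; the URL is a placeholder.
    """
    response = session.get(
        f"https://api.example.com/users/{user_id}",  # hypothetical endpoint
        timeout=5,  # seconds; avoids waiting forever on a slow server
    )
    response.raise_for_status()  # turn HTTP 4xx/5xx into an exception
    return response.json()  # decode the JSON body
```

With requests installed, this could be called as `fetch_user(requests.Session(), 42)`. To someone who has used the library for years, the three steps (get, check the status, decode JSON) read as a single familiar pattern rather than three new things.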
If a programmer starts using a different library, the readability of the code decreases for them. Therefore, it may sometimes be reasonable to choose a library or framework that is not optimal in terms of speed or usability but is nevertheless hugely popular. Code that uses it will be much more readable for a large number of programmers.
A coding style standard works the same way. Developers’ code can be readable to each other, but only after they have lived with the standard for at least a few months. Neurophysiology says that it takes a few months and a few hundred repetitions for our memory to build lasting long-term connections of whatever kind.
All this is now very conveniently packaged in linters. So if we want to make sure that the code that programmers in our team write is readable primarily for themselves, we pack the coding standard into linters and configure linters in their IDE.
But building long-term memory takes a long time. This is the simplest, but also the most time-consuming, way to deal with complexity.
The second most popular method is decomposition into parts of 5 elements.
Let’s look at the evolution of a typical, idealized program, a “spherical program in a vacuum”.
As a rule, it starts as a single file that implements a minimum of functionality. Then, as lines of code are added, the programmer intuitively begins to split the program into smaller files. After some time, when there are several dozen files, a more or less experienced programmer starts grouping them into the modules that the programming language provides.
A little later, the programmer starts using language abstractions. These are usually classes, mixins, interfaces, protocols, and syntactic sugar. Modern programming languages generally allow the programmer to deal with complexity by adding high-level abstractions and syntactic sugar.
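As a small illustration of decomposition with language abstractions, here is a sketch in Python. The order data, the 10% loyalty discount, and all names are hypothetical. The point is only that the top-level function holds three named steps, and each helper holds just a couple of things.

```python
from dataclasses import dataclass


@dataclass
class OrderLine:
    price: float
    quantity: int


def subtotal(lines):
    """Sum of price * quantity over all order lines."""
    return sum(line.price * line.quantity for line in lines)


def discount(amount, is_loyal_customer):
    """Hypothetical rule: loyal customers get 10% off."""
    return amount * 0.10 if is_loyal_customer else 0.0


def order_total(lines, is_loyal_customer, tax_rate):
    """Top-level function: three named steps instead of one long body."""
    amount = subtotal(lines)
    amount -= discount(amount, is_loyal_customer)
    return round(amount * (1 + tax_rate), 2)
```

A reader of `order_total` only needs to keep the three step names in mind and can open a helper when its details matter.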
After a while, when the programming language abstractions exhaust themselves, and there are several tens of thousands of lines, developers begin to look with interest towards DSLs: YAML, JSON, Ruby, XML, etc.
The interest is especially strong in the Java world, where XML configs are a de facto standard. But even a team that doesn’t write in Java is more than happy to move the excess complexity out into JSON, YAML, and any other place it can find.
Finally, when there are a lot of lines of code, programs begin to divide into microservices that are now fashionable.
This brings to mind the old joke that any architectural problem can be solved by introducing an additional layer of abstraction, except for the problem of too many layers of abstraction.
The good thing is that we have other tools to write readable code.
First of all, there is meta-information: information that is needed not by the compiler or the programming language, but by people.
It is like road signs placed throughout the code. Where too much complexity has accumulated, the code is divided into parts with an indication of what is in each part. The whole is still just as huge, but the external “road signs” let you look at it from different angles.
The main and most fundamental road signs are identifiers.
Identifiers are variables, constants, function and class names — all the names that we give to entities in the code.
A little-known feature of our brain is that the parts of the cortex that process words (Broca’s and Wernicke’s areas) are very good at gluing them together. Therefore, no matter how long a word is, from the point of view of our working memory it will almost always be a single entity (within reasonable limits).
“Mosgorvodokanalstroy” is a single entity for our brain.
An identifier is great for writing readable code if it answers the question “what is it?”. A programmer joins the project, looks at the identifier, and immediately understands what it refers to. Modern programming languages have PascalCase, camelCase, and snake_case for this purpose. When choosing a specific style, we pick the one that is most familiar to our team.
In Go, dealing with complexity is particularly hard, because the language provides almost no syntactic sugar. The book “How to write in the Go programming language” has a paragraph about the evolution of identifiers that explains how to deal with cognitive complexity in code. When choosing a name for an identifier, the authors suggest looking at what surrounds it. If the surroundings are very simple (a function of 2-3 obvious lines), the identifier can be i or v:
v => users => admin_users
But as the complexity and amount of code increases, we want to increase the length of the identifier so that it better answers the question “what is this?” if such information is not clear from the context.
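The same evolution can be sketched in Python (the text's example comes from Go; the user records and function names here are hypothetical). In a two-line function a one-letter name is enough, while in a longer scope the name carries the answer to “what is this?”.

```python
def total_age(users):
    # Two obvious lines: a one-letter name is enough here.
    return sum(u["age"] for u in users)


def notify_admins(users, send):
    # More is going on in this scope, so the names say what the values are.
    admin_users = [user for user in users if user["role"] == "admin"]
    for admin_user in admin_users:
        send(admin_user["email"], "Quarterly access review is due")
    return len(admin_users)
```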
After identifiers come comments, which answer the question “why is this here?”.
The worst comment is one that retells what is happening in the code, because you can already see that by reading the code. Information about why it happens, on the other hand, usually lives only in the developer’s head.
Top world programmers often write code without comments at all. The identifiers they use for variables, constants, functions, and classes, and the way they break code down using the tools provided by the programming language, tell the story better than the most successful comments. The best comment is the code itself.
But writing the way the best programmers do is hard. Therefore, we can add comments to the code that answer the question “why?”.
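A minimal sketch of the difference between a “what” comment and a “why” comment. The `retry` helper and the rate-limiting upstream service are hypothetical and serve only to give the comment something to explain.

```python
import time


def retry(operation, attempts=3, delay_seconds=0.0):
    """Call `operation` until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise
            # A "what" comment would say: sleep for delay_seconds.
            # The "why" comment says: the upstream service (hypothetical)
            # rate-limits bursts, so an immediate retry would fail too.
            time.sleep(delay_seconds)
```

The “what” is visible from `time.sleep(delay_seconds)` itself. The reason for pausing is not in the code at all, and this is the part worth writing down.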
Similarly, commit messages can explain why a commit was made. And if the message links to a ticket, part of the complexity is redistributed there, giving additional points of support to someone reading the code many years later and asking “why was this done?”.
Documentation can be considered the last bastion. If we couldn’t write code that answers the question “why?” by itself, couldn’t add comments that answer it, and commit messages and tickets didn’t work out either, we open readme.md and write a big architectural paragraph there.
Documentation has huge risks of getting out of sync with the code, so when we write readable code, we should try to put something in the documentation only if there is no choice. For example, when we have a very large project.
Auto-generated documentation is a different story. Many of the examples of good code that we see are frameworks and libraries. When building them, documentation is important to us, so we document each method and then generate the documentation automatically. But this must be done wisely.
Tests can also serve as documentation, because they show execution paths.
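A small sketch of tests read as documentation, using Python's standard `unittest` module. The `parse_port` helper is hypothetical. Each test name describes one execution path, so the test class doubles as a list of supported and rejected inputs.

```python
import unittest


def parse_port(text):
    """Parse a TCP port number from user input (hypothetical helper)."""
    port = int(text.strip())
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


class ParsePortTest(unittest.TestCase):
    # Each test name reads as one documented execution path.

    def test_surrounding_whitespace_is_ignored(self):
        self.assertEqual(parse_port(" 8080\n"), 8080)

    def test_port_zero_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_port("0")

    def test_non_numeric_input_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_port("http")
```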
In the last 5-10 years, types have been introduced into dynamic programming languages. They serve as a kind of “trap” for mistakes. When writing code, a programmer can use the gradual typing that modern languages offer and add types only where complexity is high, so that a few “traps” are placed in the code.
If the programmer uses this code incorrectly some time later (for example, six months on), the “trap” will spring: the IDE will underline the line in red, and everything will immediately become clear.
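A minimal sketch of such a “trap”, using Python's optional type hints. Both functions and their names are hypothetical. The simple helper is left untyped, and annotations are added only where a mistake is likely.

```python
from typing import Optional


def find_user_name(users, user_id):
    # Simple, obvious helper: left untyped on purpose.
    return users.get(user_id)


def greeting(name: Optional[str]) -> str:
    # Annotated because callers tend to forget the "not found" case.
    # Without the None check below, a type checker such as mypy would
    # report name.upper() as an error.
    if name is None:
        return "Hello, stranger"
    return f"Hello, {name.upper()}"
```

The annotations do not change how the code runs. They only give the type checker and the IDE enough information to flag a misuse before the program is executed.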
The Gradual approach to writing readable code mostly revolves around the number 5.
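There are several ways to redistribute complexity, all described above: moving knowledge into long-term memory (popular libraries and frameworks, coding standards enforced by linters), decomposition into parts, and meta-information (identifiers, comments, commit messages, documentation, tests, and types).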
The Gradual approach to working with complexity can be formulated in one sentence: if the number of new things in the code is much higher than 5, you need to use one of the ways to redistribute complexity from the list above.
The question of what a “new thing” is remains a bit off-screen. It depends on the developer’s background: how many years they have been writing code, what programming languages, frameworks, and approaches they know.
If the team has developers at different levels (for example, juniors and seniors), they will not be able to write code that is readable to each other. What is not new to a senior who has been writing code for 20 years will be new to a junior. Therefore, the code that the senior writes will be very simple, clear, and readable, but only for seniors. For juniors, the amount of “new” and, accordingly, the complexity in such code will be off the scale.
Practice shows that if we want the code that our developers write to be readable primarily for themselves, the qualifications of those who do this in the same team should be approximately the same.
Writing readable code is difficult. And the Gradual approach discussed in the article is not always applicable. Software development is very different: there is microcontroller development, there is game development, there is business automation according to specifications, and there the rules of the game are completely different.
But in most cases, the Gradual approach, which revolves around the number 5, is a good starting point.