Tips on reading code - What I learnt from reading code in 24 languages
Code is everywhere, and as software engineers, QAs and operations teams we are likely to read more code than we write. Over recent years that reading has only become more complex as the number of languages in use has grown.
You may be reading code to peer review it, trying to understand a legacy code base untouched in years, trying to understand how an open source project works, or trying to decipher a code sample online that looks like it may solve your problem. Most of the time this will be in a language you work in daily and understand, but as teams adopt a polyglot landscape of code this may not be the case. The chance increases as we use more and more open source projects. Want to look at a Terraform provider? That is in Go. Need to extend TeamCity? Then you need to read Java. In addition, code samples online may solve our issue but be written in another language, which we first need to understand.
It is unrealistic to fully understand every language you come across but there are a number of techniques we can use to help quickly understand the intentions of code.
In my career I have had to read code for all the reasons above. I have had to read a code base as part of company acquisitions, fix an issue in an old open source C++ library that we consumed, and carry out countless code reviews.
To begin with they were all in languages I was familiar with, but soon I found myself trying to work out code that looked unfathomable to me, like this APL sample.
V←3 1 9 1 5 12 7 9 18 135
I would not normally compare computing languages with spoken ones as they are very different. However, at first glance the feeling was similar. It was like going travelling: some countries use a familiar alphabet but the words are different or have a few extra letters, and there you can make an educated guess. In others the words are read in the opposite direction with a different set of symbols, and you feel out of your depth. All of this of course depends on your past experience, and each person will respond differently to each challenge.
I persevered and learnt a lot about a range of languages, but more importantly I expanded my set of techniques and tools for reading unfamiliar code. The tasks also showed me why each of these languages exists.
I won’t go through every challenge or language. If you are interested, they can be found here: https://code-reading.org/blog/. I will however talk about the techniques I used, drawing on some examples. The aim is not to become an expert in the languages but rather to get a grasp of what the code is trying to achieve and to understand why that language is suited to it. This is often the case when reading code in our daily jobs, where we need only enough to translate it and extract the information we need.
Understand the Basics of the Language
Each language exists for a reason. Before diving in, it’s important to get an initial grasp of its features and core concepts. For every language I came across in this challenge I was able to find an introductory video, blog post or website that took me through the core concepts. It is important to identify the type of language. Is it strongly typed? Is it a functional language? What makes this language unique? A language may have operators or concepts that differ from what you would first assume.
In code snippet 13, we were presented with a sample in Hedy. You have probably not heard of Hedy before; it is a training language to help teach programming. The Hedy introduction video quickly outlined its purpose.
Use an Editor to Syntax Highlight
Modern editors such as Visual Studio Code offer syntax highlighting for a great range of languages. This can start to make the code easier to read by picking out strings, keywords, variables and other structures. Once the keywords are identified, you can search for their meaning and context, and use the variable names to understand the intention.
Loading this PHP example from challenge 9 into VS Code makes it much easier to read.
Read the Docs
Once the parts of the code have been separated we can search the documentation for built-in methods or unfamiliar operators. I often avoid reading long documentation; however, if we focus on a few key methods we can start to form a picture of the intent of the code.
In challenge 11, I could see it used SQL regular expressions, but I was not sure of the flavour of SQL. Searching for part of the statement led me to the PostgreSQL documentation on pattern matching, where I could compare it to the example I had and understand how it worked.
What you Already Know
You may be able to narrow down what the code is doing based on what you already know. In code snippet 20, the question was to determine which sorting algorithm was being used in the sample code. I was not familiar with the language GCL, but I was familiar with a number of sorting algorithms. This allowed me to compare them with the code and quickly rule some of them out.
You must also be careful about what you think you know and the assumptions you make. Languages may look the same but operate differently. It is therefore worth testing your assumptions.
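As a small illustration of why assumptions need testing, here is a sketch in Python (my choice for the example, not one of the challenge languages). It shows two operators that look familiar from other languages or from maths notation but behave differently. The variable names are made up for the illustration.

```python
# In maths notation and in some languages, ^ reads as "to the power of".
# In Python, ^ is bitwise XOR and exponentiation is written **.
xor_result = 2 ^ 3       # 0b10 XOR 0b11 = 0b01
power_result = 2 ** 3    # 2 * 2 * 2

# Division is another common trap. In Python 3, / always returns a float,
# while // floors the result towards negative infinity.
true_division = 7 / 2
floor_division = 7 // 2
negative_floor = -7 // 2  # floors down to -4, it does not truncate to -3

print(xor_result, power_result)                       # 1 8
print(true_division, floor_division, negative_floor)  # 3.5 3 -4
```

Running a couple of one-liners like this takes seconds and is far cheaper than carrying a wrong assumption through the rest of the code.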
Code Converters
Much like Google Translate, there are online converters that will convert code from one language to another. They only work between similar languages, will not produce production-quality code and, much like spoken-language translators, give mixed results. That said, they can be useful for getting an initial understanding of code. I have often used them to convert individual lines of code or to find equivalent built-in methods.
Online Compilers and Containers
It used to be quite a lot of work to run code in a new language on your local machine. Now we can use containers or, even more conveniently, online compilers. If it’s safe to do so, you can take the code and run it in the browser, stepping through line by line to understand what is happening. The other advantage is that the code can be modified to prove your understanding.
Not all of the code examples were compilable, as some of them were short snippets. I did however use online compilers to run single lines of code to test my theory of what a certain command did.
Sometimes you are lucky enough to be able to run the full code in the browser. This was the case with the final puzzle which asked what the LOGO code would draw. Using JSLogo I ran the code in sections to understand and see what it was drawing.
Improve your Search Skills
As programmers we spend a lot of time searching online. Unlike C#, some languages such as Go are hard to search for because the name has other meanings. It is also important to make sure we are searching for the right thing.
Code in other spoken languages
It may be a surprise that not all code is in English. Wikipedia has a list of non-English programming languages. This can add an extra level of complexity to reading code. A more common challenge, and one I have faced before, is code where the comments and variable names are written in a language I do not speak. Luckily Google Translate can do a good enough job to assist.
In the code reading challenge this scenario came up with challenge 14. To make it harder, I did not know what language it was. The first variable was teller. This is also an English word; however, the second, tafel, could be detected as the Dutch for table, allowing me to identify teller as counter. It really highlighted the importance of variable names. Once I had these, the code made more sense and I could see it was printing the times tables.
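To show how much the names matter, here is a hypothetical sketch in Python. It is not the challenge code, which was in another language. Only teller and tafel come from the original snippet; the function names, the regels and maximum variables, and the output format are my own assumptions for illustration.

```python
def times_table_dutch(tafel, maximum=10):
    # Hypothetical version using Dutch names, as a reader might first meet it.
    regels = []
    teller = 1
    while teller <= maximum:
        regels.append(f"{teller} x {tafel} = {teller * tafel}")
        teller += 1
    return regels


def times_table(table, maximum=10):
    # The same logic after translating the names: teller -> counter,
    # tafel -> table, regels -> lines.
    lines = []
    counter = 1
    while counter <= maximum:
        lines.append(f"{counter} x {table} = {counter * table}")
        counter += 1
    return lines


for line in times_table(3, maximum=3):
    print(line)
# 1 x 3 = 3
# 2 x 3 = 6
# 3 x 3 = 9
```

The two functions behave identically. The only change is the names, yet the second one states its purpose almost without needing to read the loop.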
Putting it all Together
In challenge 15 there was a code snippet of APL with the question “APL’s grade up (⍋) operation can be used to sort. What does its output mean and how can it be used to sort value?”
V←3 1 9 1 5 12 7 9 18 135
At an initial glance this looked very confusing. I first found out a bit more about APL and learnt that it is a symbol-based language from the 1960s built around multidimensional arrays. Its main aim is to be concise.
Searching for ⍋ was not going to be much use; however, I did learn that the Unicode symbol is dedicated to its use in the APL language.
We did however know this is called Grade Up, so a search for “apl language grade up” led to notes on how it is used. It’s worth noting that for many searches across this challenge I needed to append the word “language” to my search to find results.
In short, this operator returns the indexes the items would be taken from if they were put in ascending order. This makes sense as we know APL is built around arrays.
We can then try this at https://tryapl.org/
⍋23 1 13
2 3 1
From this we can see the first item is at index 2 (value 1), then index 3 (value 13) and finally index 1 (value 23). The thing that jumped out at me when testing it was that the indexing is not zero-based, something I had not read.
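To check my understanding I find it useful to restate the behaviour in a language I know well. The sketch below is my own Python approximation of Grade Up, not APL’s implementation. The names grade_up, sort_with_grade and index_origin are made up for the example, and I am assuming the one-based indexing seen in the tryapl.org session above.

```python
def grade_up(values, index_origin=1):
    """Return the positions that would put `values` in ascending order.

    sorted() is stable, so equal items keep their original relative order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    return [i + index_origin for i in order]


def sort_with_grade(values):
    """Sort by picking items out in graded order."""
    # Python lists are zero-based, so ask for zero-based positions here.
    return [values[i] for i in grade_up(values, index_origin=0)]


small_grade = grade_up([23, 1, 13])
v = [3, 1, 9, 1, 5, 12, 7, 9, 18, 135]
v_grade = grade_up(v)
v_sorted = sort_with_grade(v)

print(small_grade)  # [2, 3, 1]
print(v_grade)      # [2, 4, 1, 5, 7, 3, 8, 6, 9, 10]
print(v_sorted)     # [1, 1, 3, 5, 7, 9, 9, 12, 18, 135]
```

This also answers the second half of the question. The output of Grade Up is a list of positions, and indexing the original array with it gives the sorted values. In APL that is written V[⍋V].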
These are a set of techniques I have personally used over the years and are far from exhaustive.
I have just started to read The Programmer’s Brain by Felienne Hermans, which was sent to me. This book talks about the theory behind reading code along with many different techniques.
This challenge was created by https://codereading.club/ and they host a public reading club. If you are interested in the topic please do visit their site to learn more.
Code is everywhere and we often read more than we write. In this post we covered the reasons we read code and then looked at the tips to help understand what we see:
- Understand the basics of the language
- Use an editor to syntax highlight
- Read the docs
- Rely on what you already know
- Code converters
- Online compilers and containers
- Improve your search skills
- Code in other spoken languages
If this is a topic you are interested in you can read more at https://code-reading.org/