Obfuscation (from the English obfuscate: to make non-obvious, confusing), in a broad sense, is the transformation of a program's source text or executable code into a form that preserves its functionality but makes it difficult to analyze, to understand its algorithms, and to modify after decompilation.
As is clear from the above, obfuscation methods should complicate the code, transforming it in such a way as to hide the logic of its operation from third parties.
Ideally, a program that has passed through obfuscation would reveal no more information than a black box that mimics the behavior of the original program. A hypothetical algorithm that implements such a transformation is called “black-box obfuscation”. Decompiling a program obfuscated in this way would give attackers no more information than decompiling a messenger client that is just a wrapper over the API of the “real” application, which would completely solve the problem posed in the previous block. However, it has been shown that such an algorithm cannot be implemented for an arbitrary program.
How it works
Most obfuscation methods transform the following aspects of the code:
• Structure: changing the formatting, renaming identifiers, removing code comments, etc.
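To make the structural transformations more concrete, here is a minimal Python sketch that renames arguments and local variables and drops comments by round-tripping the source through the standard `ast` module. The names `LocalRenamer`, `obfuscate_layout` and the sample function `total_price` are my own, made up for illustration. The sketch assumes Python 3.9+ (for `ast.unparse`) and a simple function that is not called with keyword arguments; a real obfuscator has to handle scopes, attributes and imports much more carefully.

```python
import ast


class LocalRenamer(ast.NodeTransformer):
    """Replace argument and local variable names with meaningless ones."""

    def __init__(self):
        self.mapping = {}

    def _alias(self, name):
        # Give every distinct identifier a short, meaningless replacement.
        if name not in self.mapping:
            self.mapping[name] = f"_{len(self.mapping)}"
        return self.mapping[name]

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node):
        # Rename a name when it is assigned to, or when it was renamed earlier.
        # Builtins and globals that are only read are left alone.
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._alias(node.id)
        return node


def obfuscate_layout(source):
    tree = LocalRenamer().visit(ast.parse(source))
    # ast.unparse regenerates code from the tree; comments are not part of
    # the tree, so they disappear along with the original formatting.
    return ast.unparse(tree)


SOURCE = '''
def total_price(price, quantity):
    # apply a flat 10% discount
    subtotal = price * quantity
    discount = subtotal // 10
    return subtotal - discount
'''

obfuscated = obfuscate_layout(SOURCE)
print(obfuscated)
```

The function name is deliberately kept so that callers still work; only the internals lose their meaningful names and comments.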
Obfuscation tools can work with source code, bytecode, or binary code, but obfuscating binary files is more complex and has to vary depending on the system architecture.
When obfuscating code, it is important to properly evaluate which parts of the code can be effectively obfuscated. Obfuscation of performance-critical code should be avoided.
One of the most important elements of obfuscation is transforming the data used by the program into a different form that has minimal impact on the performance of the code but makes reverse engineering much more difficult for hackers.
Interesting examples include writing numbers in binary form to make the code harder to read, changing the form in which data is stored, and replacing values with various equivalent expressions.
Control flow obfuscation can be performed by changing the order in which program statements are executed. The control-flow graph is modified by inserting arbitrary jump instructions and by converting tree-like conditional constructs into flat switch statements.
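The conversion of nested conditions into a flat switch can be sketched in Python as follows. The function `classify` and the state numbers are made up for the example; the flattened variant uses an if/elif dispatcher inside a loop as a stand-in for a switch statement. This is only a hand-written illustration of the idea, not the output of any particular tool.

```python
def classify(n):
    # Original, tree-like control flow.
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    if n % 2 == 0:
        return "even"
    return "odd"


def classify_flat(n):
    # Flattened control flow: every basic block becomes a numbered state,
    # and a single dispatcher loop decides which block runs next.
    state = 4          # entry block; the numbers are arbitrary labels
    result = None
    while state != 0:  # 0 means "exit"
        if state == 4:
            state = 9 if n < 0 else 2
        elif state == 9:
            result, state = "negative", 0
        elif state == 2:
            state = 6 if n == 0 else 7
        elif state == 6:
            result, state = "zero", 0
        elif state == 7:
            state = 3 if n % 2 == 0 else 5
        elif state == 3:
            result, state = "even", 0
        elif state == 5:
            result, state = "odd", 0
    return result
```

The order of the blocks inside the loop no longer says anything about the order of execution, which is exactly what makes the flattened version harder to follow.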
Address obfuscation changes the way data is laid out in memory to make it more difficult to exploit. For example, an algorithm can randomize the addresses of data in memory as well as the relative distances between different data elements. This approach is notable because even if an attacker manages to “decode” the data used by the application on one particular device, they still will not be able to reproduce that success on other devices.
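Real address randomization happens at the level of native code and memory, so it cannot be shown faithfully in a few lines. Still, the idea can be modeled in Python: the sketch below keeps three fields of a record at random offsets inside a buffer filled with random padding, so that each instance gets its own layout. The class `ScrambledRecord`, the field names and the buffer size are all my assumptions for the sake of the example.

```python
import random
import struct

FIELDS = ("user_id", "balance", "flags")  # hypothetical record fields


class ScrambledRecord:
    """Model of a record whose fields live at randomized offsets."""

    SLOT = 4    # bytes per field (unsigned 32-bit)
    SIZE = 64   # total buffer size, most of it is padding

    def __init__(self, rng, **values):
        # Fill the buffer with random bytes so padding looks like data.
        self._buf = bytearray(rng.getrandbits(8) for _ in range(self.SIZE))
        # Pick distinct aligned slots, so both the absolute positions and
        # the distances between fields differ from instance to instance.
        slots = rng.sample(range(self.SIZE // self.SLOT), len(FIELDS))
        self._offsets = {name: s * self.SLOT for name, s in zip(FIELDS, slots)}
        for name in FIELDS:
            self.set(name, values[name])

    def set(self, name, value):
        struct.pack_into("<I", self._buf, self._offsets[name], value)

    def get(self, name):
        return struct.unpack_from("<I", self._buf, self._offsets[name])[0]


# Two "devices" hold the same data but use independently chosen layouts.
device_a = ScrambledRecord(random.Random(), user_id=42, balance=1000, flags=3)
device_b = ScrambledRecord(random.Random(), user_id=42, balance=1000, flags=3)
```

An offset table learned by inspecting `device_a` is of little use against `device_b`, which mirrors the argument about attacks not transferring between devices.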
More about address obfuscation can be found here.
Another method counters attacks by regularly releasing updates to the obfuscated software. Timely replacement of parts of the existing software with newly obfuscated instances may force an attacker to abandon the results of earlier reverse engineering, since the effort needed to crack the code may then exceed the value gained from it.
Transforming and modifying assembly code can also make reverse engineering more difficult. One such method is the use of overlapping instructions (jump-in-the-middle), which can cause a disassembler to produce incorrect output. Assembly code can also be hardened against analysis by inserting useless control statements and other junk code.
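Overlapping instructions are specific to machine code, so I will not try to imitate them here, but the junk-code part translates to any language. The Python sketch below adds a useless branch guarded by an opaque predicate (a condition whose value is known in advance but is not obvious at first glance) and a computation whose result is never used. The function `checksum` and all constants are hypothetical, chosen only for the example.

```python
def checksum(data):
    # Original routine.
    total = 0
    for b in data:
        total = (total * 31 + b) % 65521
    return total


def checksum_junk(data):
    # Same routine with junk code mixed in.
    total = 0
    decoy = 0x1F
    for i, b in enumerate(data):
        # Opaque predicate: i * (i + 1) is a product of two consecutive
        # integers, so it is always even and this condition is always true.
        if (i * (i + 1)) % 2 == 0:
            total = (total * 31 + b) % 65521
        else:
            # Dead branch: never executed, exists only to mislead the reader.
            total = (total ^ decoy) + b
        # Junk computation: decoy only feeds the dead branch above.
        decoy = (decoy * 3 + b) & 0xFF
    return total
```

Both functions return the same value for any input, but the second one forces an analyst to prove that the extra branch and the `decoy` variable do not matter.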
Debugging information can be used to reverse engineer the program, so it is important to block unauthorized access to debugging data. Obfuscation tools achieve this by changing the line numbers and file names in the debug data, or by completely removing debugging information from the program.
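Python code objects carry their own small piece of debugging data (file name, function name, first line number), which makes them a convenient stand-in for showing the idea. The sketch below overwrites these fields using `CodeType.replace`, available since Python 3.8. The helper `strip_debug_info` and the function `secret` are my own names, and the sketch does not touch nested code objects; for native binaries the corresponding work is done on the symbol and debug sections instead.

```python
def strip_debug_info(func):
    """Overwrite the file name, code name and first line number of func."""
    func.__code__ = func.__code__.replace(
        co_filename="<hidden>",  # shown in tracebacks instead of the real path
        co_name="_",             # name stored in the code object
        co_firstlineno=1,        # shifts all reported line numbers
    )
    return func


def secret(x):
    return x * 2 + 1


strip_debug_info(secret)
print(secret.__code__.co_filename, secret.__code__.co_firstlineno)
```

The function still works exactly as before, but a traceback or a code inspection tool no longer reveals where it came from.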
I have not described the history of the various approaches to obfuscation because, in my opinion, it is well covered in an article that already exists on Habr.
That article was written in 2015, and I could not find a significant number of articles or other materials on the topic of my post that have appeared on the Internet since then. In my opinion, this is because the development of all kinds of web applications, which have little need for obfuscation as a method of protecting information, is becoming increasingly popular. Even so, compressing the source code of such applications with obfuscation methods is often useful.
In conclusion, I would like to add that when using obfuscation methods, you should not neglect other methods of protecting your code, because obfuscation is far from a silver bullet in protecting programs from hacking.