How analyze an algorithm with only the binary program?, how improved a program without having the source code?, how to divide without subtracting?, a curious user will always feel drawn to reveal the secrets behind the code and this document is a good starting point
The term reverse engineering has a multitude of possible applications and is natural, as thinking beings that we are, have the ability to imagine its meaning and its possible scope simply considering only derivations of the same term.
Personally, I am interested in of greatly due to the remarkable intellectual challenges what tends to impose the "investment" of systems, procedure that manages to merge work into a kind of indistinct logic, knowledge and effort mixture, even formulating one greater challenge, way back, the original obstacle faced by the team of the project in which we focus. Specifically, this article will try to, at least for the most part, exposing the procedure of reverse engineering on a small, but quite real example, to embed the author in this extraordinary world.
The origin after reverse engineering
In the computational environment, it is very common to have a sort of black box, where we know what we entered and the results that we get, but we do not know the procedure with the precision that we would need to replicate it.
This lack of documentation processes is the main thrust of the reverse engineering.
The secret behind a program
Many times have seen that in this site I use implicitly hypothesis about behaviorsfor example if we want to avoid a error message and we do not know exactly the place where work in binary, just look for where the text of the message is loaded.
This is not coincidence and thus encompass the problems has direct relationship with the known scientific method.
Imagine now that we have an application that gives us a different value for each entry made. Without much difficulty, we can get the idea of something similar to the following illustration:
Now, this application only thing it does is to display a message in the following way:
How to discover the algorithm after those results?
It is substantial to try to achieve an inductive thinking where we will be able to identify the particular principle of the problem, to get to extrapolate a general solution.
Entry "0", exit "30"
Enter "1", exit "46"
Enter "2", exit "65"
Enter "3", exit "90"
Enter "4", exit "120"
Enter "5", exit "156"
Already to have observed the application we could decide to process data that gives us and manage to create an equivalent code or investigate the inner workings. Usually the first form of research will help us to cover simple problems, but our goal will always be to be able to deal with the biggest possible challenge so our journey will continue with the internal investigation.
Step by step
First, we look for the code segment involved in deploying that message. Already almost mechanically simply seek a call to MessageBox o MessageBoxEx system library User32.dll.
Any programmer intuiría easily do the upper lines of this application, in particular that sprintf It is part of the C standard library. Clearly we see also the title of the window in the code, these values are interpreted in an intelligent manner by this decompiler)OllyDbg), which assists us in very much to understand the code.
Taking into account the above considerations, the truth is that by logic we can foresee that our simple research already gave us sufficient evidence to draw upon the steps of the routine, which begins a little further up:
Note that intermediate steps of memory allocations and other formalisms of assembler will be skipped to encourage the understanding of the reader.
Initially the value of entry supposedly already read from the dialog of the program we have, now we focus on the first important action. We will assume the input as an unknown value. In the following piece of code interest, added seven to the unknown:
I.e., at the moment we have a very simple formula that looks so:
Then, the resulting value of the sum is multiplied by the same:
We again multiply the result by the former, provided after the sum. In this second time we can already express operation simply by raising to the cube the previous sum:
Now we subtract ten to the result:
The procedure that follows offers a greater intellectual challenge and will be addressed in the next item.
The genius hiding assembler
The following steps are an overview of several hours working on the code and start can seem "mystical" operation.
The truth is that after all the above steps, a constant number is loaded into the register EAX:
What makes the code is to multiply the result of temporary exit by that number:
By such a large number multiplication produces a sort of overflow in the container of the result record to exceed the size of the (in this case 32-bit processor) Word. These extra bits are stored in contiguous EDX register in a natural way (records are contiguous in the processor), within the processor registers. Now, moves a bit (will truncate the last bit) of the EDX register:
What happened after that tangle of operations without an apparent logical sense?
I will reveal you the response giving the final equation:
It is true, and although it seems incredible, the only thing accomplished is dividing the result of previous operations by eleven. Now wonder but how?, in two words: magic numbers.
To divide, we have to do is subtract cyclically until we reach zero or less, for example, to divide by 13 42, would have to perform the following sequence:
Thus, we get 42 13 division, is 3 with remainder 3. Now, remember the number that was loaded to the EAX register?
Imagine that we have any number. In this case I'll use number 7326, selected completely randomly, in decimal:
We multiply it by the number charged in EAX, following the same procedure above, but in decimal:
The result is large enough:
We express the result in hexadecimal. As in this case, we work with a 32-bit word size, have only 8 digits in hexadecimal number corresponds to those 32-bit, FFFFFFFF in this case would be the maximum figure represented in hex with 32 zeros or ones, i.e., the remaining "shall be" in the next record:
Now that we have the number that remained in the registry later:
Following the procedure, we must truncate the last bit of the result, we first convert the figure to binary:
What we get?, remember that removing the last bit we really do is divide by base, similarly to what it represents add numbers at the end, e.g. in decimal is simply multiply by ten, in this case, in base two, are divided by two:
Even more evident, in base ten:
It is not to be something demonic. It is a mere coincidence, but still missing the last step of this small demonstration. How much is the result, respect to the initial number?:
Incredibly, it is the division for eleven of the value:
Do amazing?, it is amazing how a costly operation, such as division, may be reduced to these levels.
The magic numbers are special numbers that allow split values in this particular way, there are tables of these numbers, for 32- and 64-bit and usually use compilers when they detect a division by a constant, that know their "magic" number.
Resources used in the documentexecutables, code source and images (1.7 MB)
It is clear that the issue of reverse engineering is one of the most interesting topics that could face is a technology professional and generally enjoy the long process involved thanks to the unpredictable challenge presented.
After repeatedly meditate throughout this topic appear older controversies that make us think if someday it will be possible to apply these techniques, and discover the secret of the human brain, the mysterious effects of quantum mechanics or even something as fundamental as the origin of life.