I was sitting in my room, installing Arch for the fourth time, when it hit me. How do you figure out how a piece of software works, and what methods do you use to get there? How do you create a different version of an OS or an application without knowing the ingredients? Reading through thick-as-an-elephant’s-foot software books and scrolling r/programmerhumor reveals the answer, but let me save you a lot of time: it’s reverse engineering.
Reverse engineering means dissecting a piece of software to find out how it works and what tools, libraries, and modules its developers used to create it. The knowledge you get from dismantling it serves as a blueprint for a new piece of code, software, network firewall, or hardware.
What is reverse engineering?
Technically speaking, it’s the process of deconstructing, analyzing, and understanding a system, product, or piece of software without access to its original documentation or source code. The goal is to learn how it works, which design patterns it uses, how it’s structured, and so forth.
In the software development world, reverse engineering means examining a program at a microscopic level and disassembling or decompiling its code. It’s like playing Dora the Explorer, but with machine code.
Reverse engineering has helped researchers better understand microprocessors and uncover their flaws. It’s also a staple technique for vulnerability research teams such as Google’s Project Zero.
Use cases of reverse engineering
Reverse engineering is one of the cornerstones of software testing and a handy tool in these cases:
- Debugging: By understanding how a software system works, developers can more easily identify and fix bugs. Even better, reverse engineering lets them get ahead of problems and anticipate them before they happen. It’s like using a crystal ball, but with code.
- Security: Using reverse engineering, security teams identify vulnerabilities in software, such as flaws an attacker could exploit. Nobody wants a tiny secret backdoor in their code through which hackers steal customer data.
- Interoperability: When two software systems need to work together, reverse engineering helps developers understand how the systems communicate so they fit together seamlessly. It’s about finding a common language between the two and building a uniform software ecosystem.
- Compatibility: Reverse engineering lets developers increase compatibility between different software systems, such as enabling a new application to work with an older operating system. It’s what makes projects like Wine, a compatibility layer for running Windows applications on Linux, possible.
A well-known example of reverse engineering ticks all four of those boxes: the development of ReactOS, an open-source operating system built to run Windows applications and drivers.
Running drivers and applications from another OS in a secure, open-source environment is reverse engineering at its best.
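To make the interoperability case concrete, here is a minimal Python sketch of decoding a message once you have worked out its layout by observation. The format itself (an `RVSE` magic value, a 16-bit version, a 16-bit payload length) and the name `decode_message` are made up for illustration and don’t belong to any real system.

```python
import struct

# Hypothetical layout, as if recovered by studying captured messages:
#   4 bytes  magic "RVSE"
#   2 bytes  version, little-endian unsigned
#   2 bytes  payload length, little-endian unsigned
#   N bytes  payload
HEADER = struct.Struct("<4sHH")


def decode_message(data: bytes) -> dict:
    """Decode one message in the assumed format, rejecting malformed input."""
    if len(data) < HEADER.size:
        raise ValueError("message shorter than header")
    magic, version, length = HEADER.unpack_from(data, 0)
    if magic != b"RVSE":
        raise ValueError("unexpected magic bytes")
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return {"version": version, "payload": payload}


# Build a sample message in the same assumed format and decode it again.
sample = HEADER.pack(b"RVSE", 2, 5) + b"hello"
decoded = decode_message(sample)
```

Once a decoder like this agrees with every message you have captured, you have the "common language" the two systems need.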
How does it work?
So how do you actually reverse engineer something? What steps do you take to create a how-to manual for addressing compatibility issues or recreating legacy parts from something you know nothing about?
Reverse engineering consists of a handful of steps you need to master:
- Extracting the code: Take the machine code (a bunch of 0s and 1s) and run it through a disassembler or decompiler. The tool translates it into assembly or an approximation of the original source code, which is the main target of this step. Working in a higher-level representation makes complex systems easier to deal with and sheds light on how and why they work.
- Inspecting the gathered information: Developers run the code under debuggers to test its execution. It’s about getting as familiar with the system as possible, including the values of the variables and the memory regions the code uses.
- Recording the behavior: Every piece of reverse-engineered code needs three things recorded: its functionality, its data flow, and its control flow. Writing down in a structured language how each module processes its inputs puts every hidden ace of spades on the table. The data flow tells you how data moves between the software’s processes and how they communicate. The control flow is the essence of the software’s behavior, so the control structures the system uses and how they nest are a must-record.
- Revision: If the reverse-engineered software doesn’t work the way you want, it won’t do you much good. Reviewing and testing the recovered code confirms that it runs consistently and correctly.
- Creating the documentation: Playing the librarian, you document the result for future use: the software requirements specification (SRS), the history, and a design overview. Because who wants to reverse engineer something from scratch all over again?
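The first and third steps can be tried out in miniature with Python’s standard `dis` module. This is only a sketch under a big assumption: Python bytecode stands in for machine code here, and a native binary would need a dedicated disassembler instead. The target function `check_license` and the helper `summarize` are hypothetical names invented for the example.

```python
import dis


def check_license(key):
    # Stand-in for the "unknown" routine; pretend only its compiled form exists.
    if len(key) == 8 and key.startswith("RE-"):
        return True
    return False


def summarize(func):
    """Collect a rough record of what the compiled function references and does."""
    code = func.__code__
    instructions = list(dis.get_instructions(func))
    return {
        # Names the code looks up: a hint at its functionality.
        "names": sorted(code.co_names),
        # Literal values baked into the code: a hint at its data.
        "constants": [c for c in code.co_consts if c is not None],
        # Offsets that other instructions jump to: a hint at its control flow.
        "jump_targets": [ins.offset for ins in instructions if ins.is_jump_target],
        # The raw instruction sequence, for the written record.
        "opnames": [ins.opname for ins in instructions],
    }


report = summarize(check_license)
```

Calling `dis.dis(check_license)` prints the full listing. Exact opcode names vary between Python versions, so the summary sticks to names and constants, which are more stable.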
Sure, doing all of the steps above by hand is fine. Many junior devs do it. But what if I told you that doing reverse engineering with automated tools is much better and more enjoyable? Because it is!
What I’ve described above is only one approach to reverse engineering, and it leans mostly on static analysis. The second approach, usually called dynamic analysis, examines the software’s runtime behavior: you observe how it behaves while in use. That’s done by intercepting and analyzing the data exchanged between the software and other systems, such as the operating system or hardware components.
It’s like having a sneak peek into how software communicates, behaves, and functions under the hood while the engine keeps revving.
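Here is a minimal Python sketch of that interception idea: temporarily wrap a function the target program calls and log everything that passes through it. All names here (`backend`, `read_register`, `opaque_program`, `observe`) are hypothetical stand-ins. Real runtime analysis would usually hook system calls or network traffic with dedicated tooling rather than a Python attribute.

```python
import functools
import types

# Hypothetical dependency the target talks to (think: OS or hardware interface).
backend = types.SimpleNamespace(read_register=lambda address: address * 2 + 1)


def opaque_program():
    """Pretend this body is unreadable; we only get to watch it run."""
    return backend.read_register(0x10) + backend.read_register(0x20)


def observe(namespace, name, run):
    """Run `run()` while recording every call made to `namespace.name`."""
    log = []
    original = getattr(namespace, name)

    @functools.wraps(original)
    def spy(*args, **kwargs):
        result = original(*args, **kwargs)
        log.append((args, kwargs, result))  # what went in, what came out
        return result

    setattr(namespace, name, spy)
    try:
        output = run()
    finally:
        setattr(namespace, name, original)  # always put the real function back
    return output, log


output, log = observe(backend, "read_register", opaque_program)
```

The log now shows which addresses the program read and what it got back, without anyone having looked at its code.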
Reverse engineering: TL;DR
Let’s be honest: Reverse engineering is a complex and time-consuming process that requires a strong understanding of computer science and software development. It’s also essential to be aware of any legal issues surrounding the reverse engineering of technology because of potential violations of copyright or intellectual property laws.
Despite these challenges, reverse engineering is a powerful tool for understanding and improving existing software and hardware systems. It has played a key role in developing many crucial technologies and continues to be an important part of the software development and research communities.