Reverse Engineering, why and how.

Postby Erant » Sat Jan 23, 2010 3:25 pm

I'm writing this article at my desk. On this desk are, from left to right, an FPGA board, an oscilloscope, a MacBook, a soldering station, a power supply and a multimeter. Connected to the MacBook are an RFID reader, a cheap Chinese portable media player and a serial-to-USB converter. Running on the MacBook is Mac OS X, with a VMware session running Windows XP and IDA Pro v5.6. On the RFID reader is my university identification. Strewn across the desk are various parts, batteries, LCD screens and assorted PCBs. Next to me is a whiteboard filled with scribblings of hexadecimal numbers.

Reverse engineering is the art of divining the inner workings of an electronic device or piece of software. People who practice this art call themselves hackers, crackers, or reverse engineers. Ask any reverse engineer to describe their desk to you and I guarantee it won't be far off from my description. We are an odd bunch of people, regularly sitting up deep into the night, peering at assembly code, writing C code or poking at an electronic device with a soldering iron.

Reverse engineering and hacking

Reverse engineers and hackers are usually the same people. As noted, reverse engineering is trying to figure out how a device works. This includes figuring out how security works on a device and how to circumvent it. At that point you are 'hacking' the device. In this article I will use the terms hacker and reverse engineer interchangeably. I will, however, focus more on the reverse engineering part, as it is broader and easier to explain.

The why.

Making a device or piece of software do what it wasn't intended to do is usually the end goal in reverse engineering. However, the goal is usually not the thing that matters most. It's the road leading up to the goal that is the most interesting to a reverse engineer. The puzzle that unfolds as you dig deeper into the bowels of your subject is the real treat. The euphoria of finding that one function that checks the hash of that payload you're trying to inject into your Nintendo Wii is indescribable. These small 'eureka' moments are what we thrive on.

Commercially, reverse engineering is becoming more and more important. Industrial espionage, figuring out what the competition is doing, is an important field of work where reverse engineering is applied. Companies that specialize in such areas are beginning to emerge. Another application lies with the companies creating the devices we want to hack. They can hire hackers to try and hack their devices. If the hackers succeed, that security hole can be patched before the product hits the market.

The how.

The device.

To illustrate the how of reverse engineering I will discuss a small case study. The device at hand is the cheap Chinese media player that is connected to my laptop. These devices have a 320x240 pixel LCD screen, an SD slot, USB host mode, a 2D graphics engine, a 2 GB internal flash chip, and a small mobile phone camera. At their heart is an SPMP3050 CPU, based on the ARM926EJ-S core and running at 166 MHz. All this will run you roughly $35 when ordered from a well-known Chinese website.

The seed.

The way this usually goes is that one hacker buys such a device, looking to work on something simple between the larger projects. He opens it up, finds interesting hardware and then proceeds to post about the device. This sparks interest from other hackers, and then suddenly you have yourself a 'scene' around the little device. It was no different here. One hacker bought the thing, wanting something to hack on. Soon, a group of experienced hackers surrounded this innocuous device.

Starting.

Almost all embedded devices have a very low-level way to talk to them. This is because, unlike a regular PC, these devices are very hard to debug. Once something crashes, it is likely to take down the screen, or the fancy USB code that gives you debugging information, along with it. For this reason, most devices have a simple serial port. Our media player is no exception: opening up the device reveals a pair of solder pads. We quickly identified these as a serial port, soldered on a USB-to-serial converter and booted the device. As expected, we were greeted with some debug information:

Code:
S+                                                   
Ver:1.7                                             
Boot From Nand                                       
S+sdram_iotrap:0x2

SPMP3050 SW LABEL 202.9


With a way to get low-level debug information out of the device, we decided to rewrite the firmware that came with it. This posed our first problem: we don't have the datasheet for the CPU, and the manufacturer doesn't want to give it to us. This means that we don't know how to 'talk' to all the shiny peripherals in this device. We will have to reverse engineer the processor.

This is where we whip out the first of our tools, the disassembler. The disassembler allows you to look at portions of the device's firmware as assembly code. This is useful, because the firmware DOES know how to talk to all the peripherals. Looking on the internet, we find some firmware upgrades. Ripping the firmware out of one of these upgrades, figuring out some addresses and throwing the firmware into our disassembler shows us:

Code:
; FUNCTION CHUNK AT ROM:24000358 SIZE 00000024 BYTES

BL      sub_24000138
BL      sub_24000268
BL      sub_24000340
BL      sub_2400037C
MOV     R0, #'S'
BL      sub_240003B4
BL      sub_240003E0
MOV     R0, #'+'
BL      sub_240003B4
BL      sub_240003FC
BL      sub_24000424
B       loc_24000358


This is the initialization sequence of the device. Immediately we notice two calls to the same function, sub_240003B4, that take 'S' and '+' as arguments. This looks familiar... Looking back at the trace we got from the serial port, we notice the string "S+"! This strongly suggests that the function pushes a character out over the serial port. Looking at the actual code in this function, we find a fairly simple sequence that writes to a single register:

Code:
ROM:240003B4                 LDR     R1, =unk_10001000 ; Load 0x10001000 into R1
ROM:240003B8                 STRB    R0, [R1,#0x822] ; Store whatever's in R0 to R1 + 0x822 (0x10001822)
ROM:240003BC                 BX      LR ; return to caller


With some prior knowledge of how serial ports are usually exposed in hardware, this tells us that this register is at least the transmit FIFO. More reversing will later show that, when read, this register functions as the receive FIFO as well. Most peripherals need some initialization, and the serial port is no different. Baud rate, parity, and start and stop bits are all set in different registers. Locating the initialization sequence is easy once you know how peripherals are usually laid out in memory. To keep things simple, a peripheral's registers are usually clustered together in one memory range. The range for the serial port apparently starts at 0x10001800, so any register from 0x10001800 to 0x100018FF will probably have to do with the serial port. This is not always true, but it is a good starting point nonetheless.
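
The clustering rule of thumb can be turned into a small helper for sifting through disassembly. What follows is only a sketch in C. It assumes 256-byte peripheral blocks, which is what the 0x10001800 to 0x100018FF range suggests but is not confirmed by any datasheet, and the names periph_block, touches_uart and UART_BLOCK are invented here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: peripheral blocks are 256 bytes wide, as the serial port's
 * 0x10001800-0x100018FF range suggests. Not taken from a datasheet. */
#define PERIPH_BLOCK_SIZE 0x100u
#define UART_BLOCK        0x10001800u   /* hypothetical name */

/* Round a register address down to the start of its block. */
static uint32_t periph_block(uint32_t addr)
{
    /* The mask trick below only works for power-of-two block sizes. */
    assert((PERIPH_BLOCK_SIZE & (PERIPH_BLOCK_SIZE - 1u)) == 0u);
    return addr & ~(PERIPH_BLOCK_SIZE - 1u);
}

/* True if a store target (base register + immediate offset), such as
 * 0x10001000 + 0x822, lands in the serial port's block. */
static bool touches_uart(uint32_t base, uint32_t offset)
{
    return periph_block(base + offset) == UART_BLOCK;
}
```

Feeding it the base and offset of every STRB in a function gives a quick first guess at which peripheral that function belongs to. A store to 0x10001000 + 0x822 matches, while one to 0x10001000 + 0x900 would not.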

Because the serial port has to be initialized before sending the two characters S and + over the line, we look at the code before these calls. Only one function here uses registers in the 0x10001800 range, so this is a likely candidate:

Code:
ROM:2400037C                 LDR     R0, =unk_10001000
ROM:24000380                 MOV     R1, #0x68
ROM:24000384                 STRB    R1, [R0,#0x820]
ROM:24000388                 MOV     R1, #0
ROM:2400038C                 STRB    R1, [R0,#0x821]
ROM:24000390                 MOV     R1, #0xD0
ROM:24000394                 STRB    R1, [R0,#0x824]
ROM:24000398                 MOV     R1, #0x11
ROM:2400039C                 STRB    R1, [R0,#0x825]
ROM:240003A0                 MOV     R1, #0x88
ROM:240003A4                 STRB    R1, [R0,#0x82F]
ROM:240003A8                 MOV     R1, #2
ROM:240003AC                 STRB    R1, [R0,#0x880]
ROM:240003B0                 BX      LR
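
For readers more at home in C than in ARM assembly, the listing translates almost line for line. This is a sketch rather than the project's actual code: uart_init, uart_init_seq and struct reg_write are names invented here, and what each value means (baud rate divisor, framing, enable bits) is still unknown at this point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register base as loaded by the firmware (LDR R0, =unk_10001000). */
#define SPMP_IO_BASE 0x10001000u

/* One byte-wide register write: offset from the base, value to store. */
struct reg_write {
    uint16_t offset;
    uint8_t  value;
};

/* The six STRB writes from the listing, in the same order. */
static const struct reg_write uart_init_seq[] = {
    { 0x820, 0x68 },
    { 0x821, 0x00 },
    { 0x824, 0xD0 },
    { 0x825, 0x11 },
    { 0x82F, 0x88 },
    { 0x880, 0x02 },
};

/* Replay the sequence against io. On the device io would be
 * (volatile uint8_t *)SPMP_IO_BASE; taking it as a parameter lets the
 * same code be exercised against a plain array on a PC. */
static void uart_init(volatile uint8_t *io)
{
    assert(io != NULL);   /* sanity check, mainly useful on the PC side */
    for (size_t i = 0; i < sizeof uart_init_seq / sizeof uart_init_seq[0]; i++)
        io[uart_init_seq[i].offset] = uart_init_seq[i].value;
}
```

Keeping the writes in a table makes it easy to change one value at a time and watch what happens on the serial line, which is one way to work out what each register does.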


Now that we have a decent idea of how to output some data, we need to upload our code. Looking at the firmware update procedure we find that you have to press a special button combination, boot the device, and then use the firmware upgrade tool. Doing this gives us a different trace on the serial port:

Code:
S+                                                   
Ver:1.7                                             
Boot From USB


This gives rise to the idea that we can load our own code over USB. A bit of quick reverse engineering of the firmware update tool, and 'spload' is born: a program that loads our own code over USB. Compiling a quick C program that runs the initialization sequence for the serial port and outputs a simple string gives us:

Code:
S+                                                   
Ver:1.7                                             
Boot From Nand
Hello World!


Success! We have succeeded in initialising the serial port and outputting a string, not to mention running our own code. This is one of the eureka moments that we thrive on.
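
The output half of such a test program could look roughly like the sketch below. It is a reconstruction, not the original source: uart_putc mirrors the firmware's sub_240003B4 (one byte stored to base + 0x822), uart_puts and UART_DATA_OFFSET are hypothetical names, and the serial port is assumed to have been initialised already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UART_DATA_OFFSET 0x822u   /* transmit register seen at 0x10001822 */

/* Same job as the firmware's sub_240003B4: store one byte to base + 0x822. */
static void uart_putc(volatile uint8_t *io, char c)
{
    io[UART_DATA_OFFSET] = (uint8_t)c;
}

/* Send a NUL-terminated string one character at a time and return the
 * number of characters written. Assumption: like the firmware routine,
 * this does not wait for room in the FIFO before each write. */
static size_t uart_puts(volatile uint8_t *io, const char *s)
{
    size_t n = 0;
    assert(io != NULL && s != NULL);
    while (s[n] != '\0')
        uart_putc(io, s[n++]);
    return n;
}
```

On the device the call would be something like uart_puts((volatile uint8_t *)0x10001000, "Hello World!\n"). Longer strings would presumably need a check on a status register, which has not been identified at this stage.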

As we continue stepping through the firmware, we find initialization routines for the LCD, the NAND flash, the 2D engine and several other peripherals. These are all quite a bit more complex than the serial port, so we write some supporting code to help us reverse engineer these peripherals. CBL is a program I wrote, and is short for "Crappy BootLoader". CBL can read and change memory addresses, inject code into the existing firmware and interrupt the running firmware. These functions will help us reverse the rest of the peripherals. For example, we can now let the firmware run until a picture appears on the LCD, and then interrupt it. By looking at where in the firmware the drawing is being done, we can find out how to initialize the LCD.

Another technique we can now apply is incremental implementation. This means that we let the firmware run up to a certain point, and then run our own replacement code. If something shows up on the LCD, we know we have successfully implemented the initialization up to that point.
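
Reading and changing memory addresses comes down to a pair of very small primitives, often called peek and poke. The sketch below shows the general idea only. How CBL actually implements this is not shown in this article, and peek32, poke32 and poke32_masked are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Read one 32-bit word from an arbitrary address. */
static uint32_t peek32(uintptr_t addr)
{
    assert((addr & 3u) == 0);   /* word accesses must be 4-byte aligned */
    return *(volatile uint32_t *)addr;
}

/* Write one 32-bit word to an arbitrary address. */
static void poke32(uintptr_t addr, uint32_t value)
{
    assert((addr & 3u) == 0);
    *(volatile uint32_t *)addr = value;
}

/* Read-modify-write: change only the bits selected by mask and return
 * the previous contents, so the change can be undone afterwards. */
static uint32_t poke32_masked(uintptr_t addr, uint32_t mask, uint32_t value)
{
    uint32_t old = peek32(addr);
    poke32(addr, (old & ~mask) | (value & mask));
    return old;
}
```

Flipping one bit of an unknown register at a time with poke32_masked, and watching what the hardware does, is a slow but dependable way to map out a peripheral.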

With these techniques and a few more we can fully reverse the device, and do with it what we will. Right now we can run Prex, a real-time operating system with a preemptive kernel, on the device. We have full interrupt support, the LCD is fully working, we can read from the internal NAND chip, and 2D graphics acceleration support is almost complete.

This example is of course a trivial one, and the project is more a fun little side project than anything very important. But the procedure here is very much the same for larger projects, such as a gaming console or security system. I have not discussed the security side of reverse engineering (even though a form of security does apply here, called security through obscurity), because this is an advanced subject and is still a grey area as far as legality goes.

In conclusion, reverse engineering is an exciting field of Computer Science and is becoming more and more important. Companies will continue to fight the hackers and come up with more inventive ways of keeping us out. However, in order to know how to keep the hackers out, one needs to know the way hackers try to gain entry. In other words, to protect a system, one needs to know how to hack it. Only a hacker can keep a hacker out.

-- Erant

Re: Reverse Engineering, why and how.

Postby c1de0x » Sun Jan 24, 2010 8:24 am

Nice article Erant.

Re: Reverse Engineering, why and how.

Postby iZsh » Sun Jan 24, 2010 12:38 pm

Nice one, except we now write Python and Ruby code instead of C ;)

There are a lot of crackme/crack tutorials on the internet. I think more general RE tutorials (such as this one, with even more details) would help a lot toward showing newcomers that there's a lot more to discover, and much more interesting things, than just patching serial numbers.

Re: Reverse Engineering, why and how.

Postby matt » Sun Jan 24, 2010 5:47 pm

Great article and introduction to reversing.

My desktop looks a lot like yours, but I use a native XP box with three 23" screens, and I haven't bothered updating IDA to 5.6 from 5.5. The soldering stuff, logic analyzer, target PCBs etc are on a bench in another room.

The systems I work with generally use a RAM buffer to load the serial output. So if you search for the output you see on the serial line, you'll find a number of sequential RAM addresses that get loaded with the data, then some other subroutine (called elsewhere) will eventually copy them into the transmit buffer SFR. Just one more layer to dig through.
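
The buffered arrangement described here is commonly built as a ring buffer. A minimal sketch in C follows; it illustrates the general pattern and is not code from any particular firmware. The names tx_buf, tx_queue and tx_drain_one and the buffer size are all made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TX_BUF_SIZE 64u   /* arbitrary size chosen for this sketch */

struct tx_buf {
    uint8_t data[TX_BUF_SIZE];
    size_t  head;   /* next slot to fill */
    size_t  tail;   /* next byte to send */
};

/* Called by application code: park one byte in RAM.
 * Returns 1 on success, 0 if the buffer is full. */
static int tx_queue(struct tx_buf *b, uint8_t c)
{
    assert(b != NULL);
    size_t next = (b->head + 1u) % TX_BUF_SIZE;
    if (next == b->tail)
        return 0;
    b->data[b->head] = c;
    b->head = next;
    return 1;
}

/* Called from elsewhere (a timer or transmit interrupt, say): move one
 * queued byte into the transmit register. Returns 0 if nothing is queued. */
static int tx_drain_one(struct tx_buf *b, volatile uint8_t *tx_reg)
{
    assert(b != NULL && tx_reg != NULL);
    if (b->tail == b->head)
        return 0;
    *tx_reg = b->data[b->tail];
    b->tail = (b->tail + 1u) % TX_BUF_SIZE;
    return 1;
}
```

In a disassembly this shows up as two unrelated-looking routines: one that only touches RAM, and one that is the only code writing to the transmit register.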

Re: Reverse Engineering, why and how.

Postby Erant » Sun Jan 24, 2010 6:41 pm

matt wrote:The systems I work with generally use a RAM buffer to load the serial output. So if you search for the output you see on the serial line, you'll find a number of sequential RAM addresses that get loaded with the data, then some other subroutine (called elsewhere) will eventually copy them into the transmit buffer SFR. Just one more layer to dig through.


This system does the same, albeit at a later stage in the boot process. The approach I've depicted here is very common in the init sequence of an embedded device. The high-level functionality you described isn't available until an OS has been loaded, or at the very least a bootloader.

Re: Reverse Engineering, why and how.

Postby hugochavez » Mon Jan 25, 2010 1:07 pm

This was a fascinating article, from someone at the periphery of anything reverse engineering-related. I'd like to learn more :)

Are you able to say which device it is that you're working on? Or will that be revealed in good time?

thanks!
HC

Re: Reverse Engineering, why and how.

Postby Erant » Mon Jan 25, 2010 7:26 pm

hugochavez wrote:Are you able to say which device it is that you're working on? Or will that be revealed in good time?


It's a cheap Chinese media player, like I said. It can be found at DealExtreme: http://www.dealextreme.com/details.dx/sku.21965

I actually wrote this article a while back, when we were still going at it full force. We've since become slightly bored with the project, and are only sometimes working on it. It's something that I'll probably pick up from time to time, when I don't really feel like doing anything else.

Re: Reverse Engineering, why and how.

Postby sdevlin » Mon Jan 25, 2010 7:44 pm

Interesting article. Any chance you could post some resources where we can learn more?