Reverse Engineering: Cracking Grand Prix 2.a for MS-DOS



Introduction


Formula One Grand Prix was released by MicroProse for MS-DOS in 1992, and its follow-up game (Grand Prix 2) was released in 1996. Grand Prix 1 (GP1) came on anywhere between three and six 3.5" floppy disks, while Grand Prix 2 (GP2) was distributed on CD-ROM. A supposed internal development version named Grand Prix 2.á, dated 1994, came on six 3.5" floppy disks of up to 1.44MB each. This release was distributed on various warez CDs including Tango #7, and is referenced as existing in "Coke 4" on Now That's What I Call Games 5. The only information provided with the game files was a FILE_ID.DIZ and an .NFO file. The FILE_ID.DIZ reads as follows:

This is the highly keap
secret MicroProse
Gprix 2.á.
Just released to very few
people, but there are no
.nfo fileZ and no SN.


This wouldn't be the first time source code or an internal development version of a game has appeared on a warez CD; other examples include a debug version of SimCopter with its PDB information, and the source code for Fleet Defender: F-14 Tomcat. The GPRIX2.NFO file also left some information, although its author was not named:

Sorry dudez couldn't hack this baby. If ya'z have any luck then UP the
SN as soon as posible. BTW this aint an easy one to hack, I tried for hours.


So it's clear that there was a serial number that needed to be cracked, and that nobody had managed to crack it since 1994.




Early Problems and Tool Selection


My first attempt at getting this to run was to use Windows XP and just run the program with NTVDM (as I did with Weather Wizard). This is one of the few times that NTVDM couldn't properly emulate/translate a 16-bit program, which meant running the program under an emulator such as DosBox.


When running INSTALL.EXE under DosBox, the installer launches properly and we're greeted with a seemingly straightforward installer. We are prompted for a serial number, and after five incorrect attempts we are booted back to DOS with a message that says "Do not pirate software". It's now time to open this application in a disassembler that supports 16-bit DOS; the free version of IDA no longer supports this, so my choice was Ghidra. Unfortunately Ghidra did not like the INSTALL.EXE file and refused to even perform analysis.


Without IDA and Ghidra, I stumbled across two disassemblers that I was unfamiliar with: Reko and radare2. Since radare2 appeared to be a bit more popular, I decided to give it a try first. Although its front-end iaito doesn't have up-to-date pre-compiled releases and only compiles and works properly on Ubuntu, it was able to start disassembling. With radare2 open and debugging INSTALL.EXE at the same time, it quickly became apparent that this installer was compressed. Additionally, the analysis in both Reko and radare2 had an issue where int 0x20 (Terminate Program) would not stop the analysis, meaning that the basic blocks which followed were built from invalid opcodes. Below is a screenshot that demonstrates this issue under radare2.



These blocks are likely generated by some recursive descent algorithm - therefore the block at cs:0029 should have ended at the int 0x20. Another problem with such algorithms is that they cannot accurately track a ret; this instruction will just pop a value off the stack and use it as the return address. In our case, we have a ret in our entry function, and since an entry doesn't have a caller - an address is simply pushed onto the stack which is used as the return address. In this very first section, 0xC7 WORDs are copied from ds:si to es:di, then we ret to es:di. Or in other words, we are loading executable data into a segment which we are about to execute. This new executable section was analyzed as part of our existing entry function, because the analysis didn't stop on int 0x20. This resulted in the first instruction becoming malformed in the static analysis, but it's not a big deal. Below is a screenshot comparing the static analysis versus the actual code execution.
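The effect of not treating int 0x20 as the end of a block can be illustrated with a toy block walker. This is only a sketch: the instruction listing is invented for illustration (it is not the real INSTALL.EXE disassembly), and `walk_block` and `is_terminator` are hypothetical names rather than radare2 or Reko internals.

```python
# Toy listing: address -> (mnemonic, operand). Invented for illustration only.
PROGRAM = {
    0x0029: ("mov", "ah, 0x09"),
    0x002B: ("int", 0x21),
    0x002D: ("int", 0x20),    # DOS "terminate program": nothing executes after this
    0x002F: ("db", 0xC7),     # bytes after the terminator are data, not code
    0x0030: ("db", 0x01),
}


def is_terminator(mnemonic, operand, int20_terminates):
    """Return True if a basic block must end after this instruction."""
    if mnemonic in ("ret", "retf", "jmp"):
        return True
    return int20_terminates and mnemonic == "int" and operand == 0x20


def walk_block(program, start, int20_terminates):
    """Collect the addresses of one basic block, starting at `start`."""
    block = []
    for addr in sorted(a for a in program if a >= start):
        mnemonic, operand = program[addr]
        block.append(addr)
        if is_terminator(mnemonic, operand, int20_terminates):
            break
    return block


# Expected behaviour: the block ends at int 0x20.
good = walk_block(PROGRAM, 0x0029, int20_terminates=True)
# Behaviour described above: the walker runs on into the data bytes.
bad = walk_block(PROGRAM, 0x0029, int20_terminates=False)
print([hex(a) for a in good])
print([hex(a) for a in bad])
```

With the terminator honoured, the block stops at the int 0x20; without it, the trailing data bytes are swept into the same block, which is the kind of malformed output described above.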



I reported the int 0x20 behavior to both Reko and radare2, and both projects have addressed the issue in subsequent releases. The rest of the code in the current segment loads bytes and sets up other segments, so it was clear that the binary was packed and that I needed an unpacker before I could perform further static analysis. This is where I tried Reko to see what it came up with. What's interesting about this disassembler is that it recognized that the binary was packed, and told me that the packer was PKLite. This however was a fruitless venture, because even with PKLite unpackers I couldn't get it to fully unpack. After talking with the developer behind the project, it appears that Reko was capable of unpacking the file, but it was failing for a reason that was later resolved.


Without an unpacker, the only other option was to dump the memory of each segment. At first I used DosBox; however, MEMDUMPBIN and MEMDUMP did not work. I switched to DosBox-X, where dumping worked, so from here on that was the tool I used. What's interesting about dumping segments under emulation is that DosBox and DosBox-X placed the program at different segment addresses, although each emulator was consistent run-to-run and after system restarts. For the remainder of the analysis I worked with radare2, because of its IDA-like graph view, along with DosBox-X.
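When working with dumped segments it helps to be able to translate a segment:offset pair into a position inside a flat memory image. The following is a minimal sketch of the standard real-mode address calculation; the helper names and the zero-filled 1 MiB buffer are assumptions for illustration, and the segment value is only an example.

```python
def linear_address(segment, offset):
    """Convert a real-mode segment:offset pair to a linear address."""
    return (segment << 4) + offset


def slice_segment(memory, segment, length=0x10000):
    """Return up to `length` bytes of `memory` starting at segment:0000."""
    start = linear_address(segment, 0)
    return memory[start:start + length]


# Hypothetical 1 MiB memory image with a marker byte planted at 0D69:0000.
memory = bytearray(0x100000)
memory[linear_address(0x0D69, 0x0000)] = 0x55
segment_bytes = slice_segment(memory, 0x0D69, length=16)

print(hex(linear_address(0x0D69, 0x09CD)))  # 0xe05d
```

Because the emulators load the program at different segments, the same code ends up at different linear addresses in each dump, while offsets within a segment stay the same.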



Binary Analysis


I started by searching for string references, and radare2 did a great job of discerning these. The relationship between data and code is not transparent, not only because this is a packed binary, but also because I dumped the memory segments for static analysis. When searching for the string "Incorrect, please try again", radare2 correctly found a reference to it in the serial checking loop. At the end of this loop's code block there is a stack variable which is checked against 5: if it's less than five, the variable is incremented. Otherwise, the loop exits and the message "Do not pirate software" appears. I've reconstructed the code flow graph below.



This would mean that the only exit condition of this function is when we've exhausted our attempts. From this perspective, one of the functions called in this loop would have to contain the exit path for the program once the serial number check has passed, instead of returning to the caller of this function. Not returning to the caller would be an unusual code design choice, requiring something like an int 0x20 as we've seen previously.

Side Note: From here on, all function and variable names are ones that I've assigned. This binary (or rather, the dumped memory segments) does not have debug information.

Now it's time to check what the functions do in this loop, so I set breakpoints on all of them. When the debugger isn't trapped on a breakpoint, it reveals the function which takes user input - GetInputPassword at 0D69:09CD. This function is minimal and simply provides parameters to another function called GetInputAndExecute at 0D69:07A0. The passed parameters are the structure which ends up holding the password we've entered, and a function pointer to StoreInputPassword. GetInputAndExecute will call GetInput at 0CFD:039F which will actually receive the user input to store into the string structure (0E58:135A), and then call the function pointer which was passed - StoreInputPassword.

1. GetInputPassword
2. GetInputAndExecute(struct string_info*, void* func)
3a. GetInput(string_info)
3b. func()


This summarizes how the password input is received and stored. The function which follows is UNK_GetSecretPassword at 0D69:08A4. Similar to GetInputPassword, this function also calls GetInputAndExecute, but with a function pointer to GetEndOfString. When the user entered the password, it likely came with a carriage return and line feed (CRLF) from pressing Enter. All GetEndOfString does is update the string length to exclude these characters. Lastly, UNK_GetSecretPassword calls ExecuteStringFunc_18h (0D69:092E) which, as the name implies, takes the function pointer at offset 0x18 in this unknown string struct and executes it. This is where things get weird, because the function at offset 0x18 for our password struct (0E58:135A) is EmptyFunction at 0CFD:047D, a function that just returns zero. After this, nothing happens and control returns to the serial check loop.
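The call chain described so far can be modelled in a few lines. This is a rough sketch only: the `StringInfo` class and its field names are guesses, the Python functions merely mirror the labels I gave the original routines, and the input string is a made-up example.

```python
class StringInfo:
    """Stand-in for the string structure; field names are guesses."""

    def __init__(self, handler):
        self.text = ""           # raw buffer, possibly ending in CRLF
        self.length = 0          # length without the trailing CRLF
        self.handler = handler   # the function pointer stored at offset 0x18


def empty_function(info):
    """Mirrors EmptyFunction: does nothing and returns zero."""
    return 0


def get_input(info, line):
    """Mirrors GetInput: store what the user typed into the structure."""
    info.text = line


def get_end_of_string(info):
    """Mirrors GetEndOfString: record the length without CR/LF."""
    info.length = len(info.text.rstrip("\r\n"))


def get_input_and_execute(info, func, line):
    """Mirrors GetInputAndExecute: read input, then call the callback."""
    get_input(info, line)
    func(info)


def execute_string_func_18h(info):
    """Mirrors ExecuteStringFunc_18h: call whatever handler the struct holds."""
    return info.handler(info)


password = StringInfo(handler=empty_function)
get_input_and_execute(password, get_end_of_string, "SECRET123\r\n")
result = execute_string_func_18h(password)
print(password.length, result)  # 9 0
```

Note that nothing in this chain ever compares the stored text against anything; the handler simply returns zero.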



One thing that is certain is that GetInputPassword stores the user input into a structure. Where I had uncertainty was the UNK_GetSecretPassword function, which would be the only logical place to check the result of our password and proceed to installation. This function calls GetInputAndExecute, which doesn't check the password result, but it also calls the function pointer at offset 0x18 in our unknown string structure. That call currently resolves to EmptyFunction, and it's the very last function called before the serial check loop iterates. The only logical explanation I can come up with is that this pointer must be meant for the password checking function, and for some reason it's pointing to EmptyFunction instead. I wanted to finish reverse engineering this loop, and below are my final results.



The filled-out code graph seems logical, except of course for the password checking. On each iteration the status is set to an empty string, then we accept user input. If attempts is less than five we display "Incorrect, please try again"; otherwise the status string is set to "Do not pirate software". Whereas the password struct (135A) has a 0x18 string function which resolves to EmptyFunction, the status structure points to DisplaySerialBox at 0CFD:044C. This makes sense: there was an update to the status string, therefore we call a handler function which notifies the screen to be redrawn. This further suggests that when the password string is updated, there should be a handler function which reads and processes the new attempt, instead of a call to EmptyFunction. Below is a table that describes these structures in further detail.
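Put together, the loop behaves roughly like the sketch below. The starting value of the attempt counter is an assumption (I only know it is compared against five before being incremented), so the exact number of "Incorrect" messages may be off by one; the function and parameter names are mine.

```python
MAX_ATTEMPTS = 5


def empty_function(entered):
    """Stand-in for EmptyFunction: ignores its input and returns zero."""
    return 0


def serial_check_loop(inputs, handler=empty_function, attempts=0):
    """Replay the loop over `inputs` and return every status message shown."""
    shown = []
    for entered in inputs:
        handler(entered)              # return value is never inspected
        if attempts < MAX_ATTEMPTS:
            attempts += 1
            shown.append("Incorrect, please try again")
        else:
            shown.append("Do not pirate software")
            break                     # the only way out of the loop
    return shown


messages = serial_check_loop(["any guess"] * 10)
print(messages[-1])  # Do not pirate software
```

Whatever is typed, the outcome is the same, because the handler's result never feeds into the branch.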

Offset  Size   Description               135A Password                 145A Status
0x00    WORD   Unknown                   0x00                          0x01
0x02    WORD   Unique id                 0xD7B1                        0xD7B2
0x04    WORD   Unknown                   0x80                          0x80
0x06    WORD   Unknown                   0x00                          0x00
0x08    WORD   String length w/o CRLF    -                             -
0x0A    WORD   String length w/ CRLF     -                             -
0x0C    WORD   String ptr in es          13DA                          14DA
0x10    DWORD  Function ptr              0CFD:0367 (SetString18h)      0CFD:0367 (SetString18h)
0x14    DWORD  Function ptr              0CFD:039F (GetInputString)    0CFD:044C (DisplaySerialBox)
0x18    DWORD  Function ptr (handler)    0CFD:047D (EmptyFunction)     0CFD:044C (DisplaySerialBox)
0x1A    DWORD  Function ptr              0CFD:047D (EmptyFunction)     0CFD:047D (EmptyFunction)


The last thing left to examine in the password check is how the function handler (offset 0x18) is populated into the struct. When reverse engineering the two string structures I found a function pointer at offset 0x10, now named SetString18h at 0CFD:0367. This function is called during initialization of INSTALL.EXE, and sets various function pointers in the two structs, including the handler (offset 0x18). If the unique id is 0xD7B1, then the handler is EmptyFunction. If the unique id is 0xD7B2, then the handler is DisplaySerialBox.
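The handler selection amounts to a simple dispatch on the unique id. In this sketch the Python functions are stand-ins for the routines I labelled, their return values are placeholders, and raising an error for any other id is my own assumption; I did not observe what the original does in that case.

```python
PASSWORD_ID = 0xD7B1
STATUS_ID = 0xD7B2


def empty_function():
    """Stand-in for EmptyFunction at 0CFD:047D."""
    return 0


def display_serial_box():
    """Stand-in for DisplaySerialBox at 0CFD:044C (placeholder return value)."""
    return "redraw"


def set_string_18h(unique_id):
    """Pick the offset-0x18 handler from the structure's unique id."""
    if unique_id == PASSWORD_ID:
        return empty_function
    if unique_id == STATUS_ID:
        return display_serial_box
    # Assumption: behaviour for other ids is unknown, so fail loudly here.
    raise ValueError(f"unexpected unique id {unique_id:#06x}")


password_handler = set_string_18h(PASSWORD_ID)
status_handler = set_string_18h(STATUS_ID)
print(password_handler.__name__, status_handler.__name__)
```

The password structure is therefore wired to a do-nothing handler from the moment the installer initializes, not as the result of some later state change.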



It's now apparent that the only exit condition from the serial check loop is to fail the password check five times. My conclusion at this point, after about a month of reverse engineering, is that this is a fake program.



Inspecting Disk Files


If the program was fake, then what were those six disks? Up until this point I had never even looked at those files, so I decided to open them in a hex editor. One immediate problem is that the "header" for each disk was different. Disks 1, 2, 3 and 5 were consistent; however, Disk 4 had the label "Disk 4/6" on line 0x60 where the others had theirs on line 0x50. Disk 6 had its data start on line 0xB0 instead of 0xA0 like the rest. The biggest inconsistency was Disk 6: if the header is ASCII text, a program would need to assume a static offset for where the data begins, and Disk 6 breaks that assumption.
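The same comparison can be scripted instead of eyeballed in a hex editor. This sketch assumes that "line" refers to the 16-byte row offset shown by the hex editor; the byte strings below are synthetic stand-ins for the real disk files, and `find_label_row` is a hypothetical helper.

```python
import re

LABEL = re.compile(rb"Disk (\d)/6")


def find_label_row(data):
    """Return (disk number, hex-editor row offset) of the 'Disk N/6' label."""
    match = LABEL.search(data)
    if match is None:
        return None
    row = match.start() & ~0xF   # round down to the start of the 16-byte row
    return int(match.group(1)), row


# Synthetic headers standing in for the real disk files.
disk1 = b" " * 0x52 + b"Disk 1/6" + b" " * 0x40
disk4 = b" " * 0x62 + b"Disk 4/6" + b" " * 0x40

for data in (disk1, disk4):
    number, row = find_label_row(data)
    print(f"Disk {number}: label on row {row:#04x}")
```

Running this over all six files would list the row of each label, so an outlier like Disk 4 stands out immediately.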



I took a look at other disks from MicroProse for comparison: F-15 Strike Eagle III had MZ headers and Grand Prix 1 had PK headers. None of the disks wasted 150+ bytes on a flashy header. Grand Prix 2 was released in 1996, which puts Grand Prix 2.a around midway through development, assuming production started in 1992. The final release was on a CD-ROM with a total content size of 500MB+, and I would assume that even midway through development it would have required more than six disks.
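Checking the leading magic bytes is easy to automate. A minimal sketch, assuming only the two signatures mentioned above (MZ for DOS executables and PK for PKZIP archives); `classify_header` is a hypothetical helper and the sample inputs are synthetic.

```python
def classify_header(data):
    """Name the container type suggested by the first two bytes."""
    magic = bytes(data[:2])
    if magic == b"MZ":
        return "DOS executable (MZ)"
    if magic == b"PK":
        return "PKZIP archive (PK)"
    return "unknown"


# Synthetic samples: an MZ stub, a ZIP local file header, and ASCII banner text.
samples = [b"MZ\x90\x00", b"PK\x03\x04", b"**** Disk 1/6 ****"]
for sample in samples:
    print(classify_header(sample))
```

A file that begins with banner text falls into the "unknown" bucket, which is where the six disks discussed here would land.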



MicroProse Hard Disk Installation Utility


One inconsistency that stuck out from the beginning is that the installer says Grand Prix 2.β (2.b) whereas all other warez files (and even the disks) say 2.á (2.a). This made me wonder about the origin of the installer, because it appeared to be an official MicroProse installer that was also used on many other DOS games. I wanted to check how the other MicroProse installers worked, so I reached out to Brian Reynolds on Twitter, who said that the serial section didn't exist when he created the software. Since I was unfamiliar with MicroProse games and there was little information available about the installer, I made a small tracker of which games used which version of the installer.

Game                        Year  Version
Grand Prix 2.b              1994  Ver 1.59 Lib 3.76
UFO                         1994  Ver 1.06 Lib 3.36
Grand Prix (1.03)           1992  Ver 1.06 Lib 3.36
Grand Prix (1.12)           1992  Ver 1.06 Lib 3.36
Pirates! Gold               1993  Ver 1.17 Lib 4.26
Railroad Tycoon Deluxe      1993  Ver 1.16 Lib 4.26
Task Force 1942 (41101.1)   1992  Ver 1.12 Lib 3.36
Darklands                   1992  Ver 1.14 Lib 4.13
F-15 Strike Eagle III       1992  Ver 1.12 Lib 4.02
Master of Orion             1993  Ver 1.17 Lib 4.32

The way the versioning appears to work is that as the Ver number increases, the Lib version either increases or stays the same. The years don't really matter, because a game could ship (and most likely did ship) with whichever installer version was at hand. The important point is that a higher Ver number is never paired with a lower Lib version. Grand Prix 2.b is Ver 1.59, which would mean a Lib version of at least 4.32 (the Lib that Master of Orion shipped with Ver 1.17), but its Lib version is 3.76, which is an anomaly.
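This rule can be checked mechanically against the tracker. The sketch below copies the rows from the table and flags any installer whose Lib version is lower than one that shipped with a lower Ver number; the function names are mine, and treating "1.06" as major 1, minor 6 is an assumption about how the versions are meant to be read.

```python
INSTALLERS = [
    ("Grand Prix 2.b", "1.59", "3.76"),
    ("UFO", "1.06", "3.36"),
    ("Grand Prix (1.03)", "1.06", "3.36"),
    ("Grand Prix (1.12)", "1.06", "3.36"),
    ("Pirates! Gold", "1.17", "4.26"),
    ("Railroad Tycoon Deluxe", "1.16", "4.26"),
    ("Task Force 1942 (41101.1)", "1.12", "3.36"),
    ("Darklands", "1.14", "4.13"),
    ("F-15 Strike Eagle III", "1.12", "4.02"),
    ("Master of Orion", "1.17", "4.32"),
]


def parse(version):
    """Turn '1.06' into (1, 6) so versions compare numerically."""
    major, minor = version.split(".")
    return int(major), int(minor)


def find_anomalies(rows):
    """Return games whose Lib is older than a Lib seen with a lower Ver."""
    anomalies = []
    for game, ver, lib in rows:
        for _, other_ver, other_lib in rows:
            if parse(other_ver) < parse(ver) and parse(other_lib) > parse(lib):
                anomalies.append(game)
                break
    return anomalies


print(find_anomalies(INSTALLERS))  # ['Grand Prix 2.b']
```

Only the Grand Prix 2.b installer violates the ordering; every other entry in the tracker is consistent with it.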



Conclusion


Based purely on code analysis, the installer is simply a fake program, and this is supported by the totality of information regarding the disk data and the installer comparisons. It's still a mystery which team or person created this fake program, and how they were able to add the serial key prompt. It would likely mean the author had access to source code, as it didn't appear to be a modified existing installer. Although I didn't get to crack some unreleased and top-secret game, it was a good challenge and I learned a lot more about DOS reversing and the warez scene in general.