BYTE* / Shellcode Part 1: How to Swallow Exceptions in Win32 Assembly

Date: Aug 26, 2017

Last-Modified: Sep 18, 2018

NOTE: See the Limitations section for information about how the SafeSEH linker option impacts exception handling.

SUMMARY:
This is the first article in a two-part series aimed at beginner to intermediate reverse engineers, or any programmer who wants to understand how to perform code branching using Win32's Structured Exception Handling (SEH) facilities without relying on the Standard C Runtime library or any external functions. More precisely, we will use pure x86 assembly instructions to illustrate bare-metal exception-handling code that complies with what the operating system requires, while ignoring the rest of the complexity associated with the equivalent exception-handling code produced by a C/C++ compiler. If you need to understand or create your own anti-debugging/anti-disassembly tricks involving exceptions, the information in this article may come in handy. You will need some Win32 x86 assembly language experience to follow along. Also note that these techniques work in any 32-bit process running on either 32-bit or 64-bit versions of Windows.

SEH is what allows the __try/__except/__finally blocks supported by Microsoft's C/C++ compiler (and its C Runtime library) to capture exceptions and branch appropriately without letting those exceptions be seen outside the function blocks in which they are declared. In Part Two of this series, we'll apply this knowledge to one method of discovering the randomly loaded location of KERNEL32.DLL so that we may dynamically access any API in the system, whether or not the function(s) have been formally imported into the current process. This will bypass the Address Space Layout Randomization (ASLR) security feature present in post-XP versions of Windows, and it will also work on older NT-based versions of Windows.

A firm grasp of how to encapsulate exceptions produced by your own code is necessary before we move on to Part Two, because that code must be able to recover from the Access Violation exceptions it triggers whenever it encounters regions of memory that are not mapped into the current process.

In this article, we will also go to the effort of writing position-independent code that doesn't rely on static external dependencies or hardcoded addresses of any kind. This kind of code, also known as "shellcode", is what security professionals and malware authors are most interested in, because its portability makes it so powerful: since it makes no assumptions about where it will be loaded in memory, it works whether it is compiled directly into a program or injected into a process. Although exception handling and writing shellcode are two separate topics, we need this flexibility for Part Two of this series, where the code will be used as a building block in an injectable bootstrapping routine.

RECOMMENDED READING:
While the concepts of low-level Win32 exception handling are not new and have been discussed in several hacking and security-related tutorials and publications over the past decade, most sample code only shows how to insert and remove stack exception handler frames. There is little information on how that handler, once called by the operating system, might get back to the same context the program was in prior to the exception. This would be the assembly equivalent of branching into a C/C++ __except block followed by cleanly exiting the block and proceeding with the remainder of the function normally. I wrote this article not only to fill that information gap, but to share some of the things I've discovered along the way.

Almost all articles that deal with the low-level mechanics of Win32 SEH refer readers to Matt Pietrek's legendary 1997 article, "A Crash Course on the Depths of Win32 Structured Exception Handling". If you've never read it, I highly recommend reading at least everything prior to the "Compiler-level SEH" section. If you have the time, though, it's worth reading the whole thing.

PART 1: READING ARBITRARY MEMORY LOCATIONS WITHOUT CRASHING:
We want the capability to "poke" at any location in memory that may or may not be mapped into the current process. If an exception is generated because we've attempted to access an invalid memory location, we don't want the hosting process to crash. The first thing we need is a function that tells us whether or not we can READ memory at any given 32-bit address. It would seem lucky for us that the KERNEL32.DLL function IsBadReadPtr() has the exact functionality we need. Its prototype is:

    BOOL IsBadReadPtr(
        CONST VOID *lp,  // address of memory block
        UINT ucb         // size of block
    );

This function has been present in all versions of Windows NT since the early days, simply returning NONZERO if the passed memory range cannot be read, or ZERO otherwise.

Although all of KERNEL32.DLL's functions are accessible to all Win32 processes, shellcode can't predict where these functions will reside in memory because each time Windows boots (Vista and up), the base load addresses for all system DLLs are randomized for security. So we must write our own version of IsBadReadPtr() from scratch and it must be able to handle exceptions resulting from accessing "bad" memory addresses.

The easiest way to determine whether memory is accessible is simply to try reading from it. If an Access Violation exception is generated, we know the memory is not accessible and we return 1; otherwise we return 0. This is the same infamous exception that is generated any time code attempts to access values through a NULL pointer. Nothing could be simpler than the following C implementation using a __try/__except block:

    #include <windows.h>

    BOOL CustomIsBadReadPtr(const DWORD* p)
    {
        //assume all memory is readable
        BOOL bBad = 0;
        __try
        {
            //volatile keeps an optimizing compiler from discarding the otherwise unused read
            volatile DWORD dwDummy = *p;
            //if we got here, memory was readable!
        }
        __except(EXCEPTION_EXECUTE_HANDLER)
        {
            //safe to assume we got an ACCESS VIOLATION exception
            bBad = 1;
        }
        return(bBad);
    }

The KERNEL32.DLL implementation of IsBadReadPtr() is also built using __try/__except blocks and is not much different from my version above, except that it supports reading a range of bytes. For our purposes, we'll just test that we can read a single DWORD at the specified memory location.

While the function above seems simple enough, the assembly code generated by the Visual C++ 7.1 compiler is rather ugly:

    00404788  55                PUSH EBP
    00404789  8BEC              MOV EBP, ESP
    0040478B  6A FF             PUSH -1
    0040478D  68 588D4000       PUSH OFFSET asmtest.00408D58
    00404792  68 1E604000       PUSH <&MSVCR71._except_handler3>       ;external dependency!
    00404797  64:A1 00000000    MOV EAX, DWORD PTR FS:[0]
    0040479D  50                PUSH EAX
    0040479E  64:8925 00000000  MOV DWORD PTR FS:[0], ESP
    004047A5  51                PUSH ECX
    004047A6  51                PUSH ECX
    004047A7  51                PUSH ECX
    004047A8  51                PUSH ECX
    004047A9  53                PUSH EBX
    004047AA  56                PUSH ESI
    004047AB  57                PUSH EDI
    004047AC  8965 E8           MOV DWORD PTR SS:[EBP-18], ESP
    004047AF  8365 E4 00        AND DWORD PTR SS:[EBP-1C], 00000000    ;bBad = 0
    004047B3  8365 FC 00        AND DWORD PTR SS:[EBP-4], 00000000
    004047B7  8B45 08           MOV EAX, DWORD PTR SS:[EBP+8]
    004047BA  8B00              MOV EAX, DWORD PTR DS:[EAX]            ;DWORD dwDummy = *p;
    004047BC  8945 E0           MOV DWORD PTR SS:[EBP-20], EAX         ;if we got here, memory was readable!
    004047BF  834D FC FF        OR DWORD PTR SS:[EBP-4], FFFFFFFF
    004047C3  EB 12             JMP SHORT asmtest.004047D7
    004047C5  33C0              XOR EAX, EAX                           ;__except filter expression begin
    004047C7  40                INC EAX                                ;EAX = 1 (EXCEPTION_EXECUTE_HANDLER)
    004047C8  C3                RETN                                   ;return to MSVC handler
    004047C9  8B65 E8           MOV ESP, DWORD PTR SS:[EBP-18]         ;__except block body; finish remainder of function
    004047CC  C745 E4 01000000  MOV DWORD PTR SS:[EBP-1C], 1           ;bBad = 1
    004047D3  834D FC FF        OR DWORD PTR SS:[EBP-4], FFFFFFFF
    004047D7  8B45 E4           MOV EAX, DWORD PTR SS:[EBP-1C]         ;return value = bBad
    004047DA  8B4D F0           MOV ECX, DWORD PTR SS:[EBP-10]         ;tear down MSVCRT exception frame
    004047DD  64:890D 00000000  MOV DWORD PTR FS:[0], ECX
    004047E4  5F                POP EDI
    004047E5  5E                POP ESI
    004047E6  5B                POP EBX
    004047E7  C9                LEAVE
    004047E8  C3                RETN

This is no fault of the compiler; it's just that C/C++ needs to support features like nested __try/__except blocks, object cleanup, etc., so the majority of the code here exists to comply with the Visual C++ implementation of SEH.

Besides its size of 97 bytes, the worst thing about the generated code is that it depends on the C Runtime Library function _except_handler3(). Shellcode can't have external static dependencies for the reasons already discussed, so the code above is unusable for our purposes in its current state. Instead, we're going to create a version of the same function that has no external dependencies. Because we will also remove the code specific to Visual C++'s exception-handling semantics, the resulting code will be roughly half to two-thirds the size!

At a minimum, we must register an exception handling frame within the current thread and then perform the read operation through the pointer passed in. To simplify things, we can embed the exception handling block directly within the function, just like any other conditional branch; it simply sets the return value to 1. The rest of the function restores the exception handler chain to its original state and returns the result (0 if the read succeeded). You might write the initial version of the function like this:

    00111236  55                PUSH EBP                               ;set up function's stack frame
    00111237  8BEC              MOV EBP, ESP
    00111239  68 55121100       PUSH exception_handler (0x111255)      ;build EXCEPTION_REGISTRATION structure on stack, first with our handler
    0011123E  64:FF35 00000000  PUSH DWORD PTR FS:[0]                  ; followed by previous handler in chain
    00111245  64:8925 00000000  MOV DWORD PTR FS:[0], ESP              ;install our exception_frame
    0011124C  8B45 08           MOV EAX, DWORD PTR SS:[EBP+8]          ;eax = pointer passed as argument #1
    0011124F  8B00              MOV EAX, DWORD PTR DS:[EAX]            ;eax = *eax / dereference pointer - can we read DWORD memory?
    00111251  33C0              XOR EAX, EAX                           ;if we got here, no exception occurred, memory is readable, return zero
    00111253  EB 03             JMP SHORT cleanup (0x111258)           ;skip past exception_handler code to cleanup and exit
    00111255  33C0              XOR EAX, EAX                           ;exception_handler entry point
    00111257  40                INC EAX                                ;return 1
    00111258  64:8F05 00000000  POP DWORD PTR FS:[0]                   ;cleanup; restore previous handler
    0011125F  83C4 04           ADD ESP, 4                             ;remove exception_handler from stack
    00111262  5D                POP EBP                                ;restore caller's stack frame
    00111263  C2 0400           RETN 4                                 ;remove STDCALL argument and return from function

The 48-byte code above starts by setting up a normal function stack frame, builds an EXCEPTION_REGISTRATION structure on the stack, and then installs that frame as the current exception handler. Under Win32, the FS segment points at the current thread's Thread Environment Block (TEB), so reading FS:[0] returns the first DWORD of the TEB: a pointer to the first frame of the thread's exception handler chain. We make our frame the first in this list, chaining on to the original frame, which is used if our handler chooses not to handle the exception.

After all of the setup code, we load the memory location we are testing and attempt to dereference it at instruction 0x11124F. If nothing happens, execution simply drops through to the next instruction, which zeroes EAX (our return value) and jumps past the exception_handler code to the cleanup code. Cleanup consists of assigning the previous exception handling frame back to the TEB and removing our exception frame from the stack.

If an exception does occur at 0x11124F (which we can safely assume will be a first-chance Access Violation exception, 0xC0000005), the CPU fault is first handled in kernel mode. The kernel then propagates the exception into user mode by changing our thread's EIP from the faulting instruction to NTDLL.DLL code that ultimately walks the thread's exception handler chain, searching for someone to handle it. Since our handler happens to be first in the list, our exception handling block is the first to gain control.
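The frame bookkeeping described above can be modeled in portable C. This is a simplified sketch, not Windows code: the hypothetical global g_exception_list stands in for FS:[0], install_frame() mirrors the PUSH/PUSH/MOV sequence at 0x111239 through 0x111245, and remove_frame() mirrors the POP FS:[0] / ADD ESP, 4 cleanup. Windows terminates the real chain with 0xFFFFFFFF; the model uses NULL instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified mirror of the 8-byte x86 EXCEPTION_REGISTRATION record.
   'prev' is the first field (lower address) because it is pushed last. */
typedef struct ExceptionRegistration {
    struct ExceptionRegistration *prev;  /* previous frame in the chain */
    uintptr_t handler;                   /* address of the handler code */
} ExceptionRegistration;

/* Hypothetical stand-in for FS:[0]; NULL marks the end of the chain in this
   model, where Windows uses 0xFFFFFFFF. */
static ExceptionRegistration *g_exception_list = NULL;

/* PUSH handler / PUSH FS:[0] / MOV FS:[0], ESP */
static void install_frame(ExceptionRegistration *frame, uintptr_t handler) {
    frame->handler = handler;
    frame->prev = g_exception_list;   /* chain on to the current head */
    g_exception_list = frame;         /* our frame becomes the head */
}

/* POP FS:[0] / ADD ESP, 4: restore the previous head, discard the handler */
static void remove_frame(ExceptionRegistration *frame) {
    g_exception_list = frame->prev;
}
```

Installing an outer frame and then our frame leaves ours at the head with prev pointing at the outer one; calling remove_frame() on ours restores the outer frame, which is exactly what the cleanup code at 0x111258 does on the non-exception path.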

If we passed a readable memory address to the function above, it would return zero without any problems. If an invalid address was passed, we'd run into a couple of problems after our exception handler gained control. The first hint that something is wrong would be returning to a seemingly random location in memory after the final RET instruction. Why?

The first of our problems is that Win32 expects our handler to be a function whose signature is formally:

    typedef EXCEPTION_DISPOSITION (CDECL *ExceptionHandler)(
        EXCEPTION_RECORD*       ExceptionRecord,
        EXCEPTION_REGISTRATION* EstablisherFrame,
        CONTEXT*                ContextRecord,
        DISPATCHER_CONTEXT*     DispatcherContext);

Within an exception handler, the stack is set up so that a simple RET (without operands) takes you back to the operating system's NTDLL.ExecuteHandler(), not to the caller of the function that generated the exception. Essentially, the stack is in a different state than it was prior to the exception. Regardless of its proximity to where the exception was generated, an exception handler runs in a different stack context, which allows the operating system's SEH semantics to kick in. This includes providing the handler with all sorts of information about the exception and even giving it a chance to "fix things" and resume execution where the exception occurred. In other words, the operating system pushed a bunch of data onto the stack after the exception, so when our handler falls through to the function's cleanup code, the values it pops belong to the operating system rather than to our function, and the final RETN 4 "returns" to whatever address happens to be there.

Since our function's primary purpose is to set a flag that an exception was hit and return to our caller, how can we safely break out of the handler?

HOW TO EXIT AN EXCEPTION HANDLER:
A Win32 exception handler must handle an exception in one of three ways, two of which require the handler to return a value in EAX to the operating system's NTDLL.ExecuteHandler():

1. Return 0 (ExceptionContinueExecution)
   This tells the OS to retry executing the excepting instruction (with or without modifications to the passed CONTEXT structure).
2. Return 1 (ExceptionContinueSearch)
   This tells the OS "I'm not handling that exception, try the next handler in the chain."
3. DON'T RETURN
   Just keep executing from the handler, usually to terminate the process.
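The difference between the first two choices can be illustrated with a deliberately simplified model of the dispatcher's search. This is not NTDLL's actual algorithm (the real dispatcher also validates frames, tracks nested exceptions and rejects unexpected dispositions); the Frame type, dispatch() and the two sample handlers are hypothetical names, and only the EXCEPTION_DISPOSITION values come from the Windows headers. Choice #3 cannot appear in the model at all, because a handler that never returns produces no disposition.

```c
#include <assert.h>
#include <stddef.h>

/* Values match EXCEPTION_DISPOSITION in the Windows headers. */
typedef enum {
    ExceptionContinueExecution = 0,
    ExceptionContinueSearch    = 1,
    ExceptionNestedException   = 2,
    ExceptionCollidedUnwind    = 3
} EXCEPTION_DISPOSITION;

#define ACCESS_VIOLATION_CODE 0xC0000005u  /* STATUS_ACCESS_VIOLATION */

typedef struct Frame {
    struct Frame *prev;                              /* older frame */
    EXCEPTION_DISPOSITION (*handler)(unsigned code); /* simplified signature */
} Frame;

/* Choice #2: decline every exception. */
static EXCEPTION_DISPOSITION decline_all(unsigned code) {
    (void)code;
    return ExceptionContinueSearch;
}

/* Choice #1: accept access violations (a real handler would fix CONTEXT first). */
static EXCEPTION_DISPOSITION accept_access_violation(unsigned code) {
    return code == ACCESS_VIOLATION_CODE ? ExceptionContinueExecution
                                         : ExceptionContinueSearch;
}

/* Walks the chain from the newest frame. Returns the frame whose handler asked
   to continue execution, or NULL if every handler kept the search going (the
   point where Windows would fall back to its unhandled-exception path).
   Any value other than ExceptionContinueExecution is treated as "keep
   searching" purely to keep the model small. */
static Frame *dispatch(Frame *head, unsigned code) {
    for (Frame *f = head; f != NULL; f = f->prev) {
        if (f->handler(code) == ExceptionContinueExecution)
            return f;
    }
    return NULL;
}
```

With a declining frame in front of an accepting one, an access violation is passed along to the older frame, while any other exception code falls off the end of the chain.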

It is possible to accomplish our goal with choice #1 by resuming execution at a different branch within the function (which requires modifying the CONTEXT structure); however, this method has already been discussed at length and isn't as lightweight. We definitely don't want choice #2, because our code would lose control, usually resulting in process termination. For our purposes, we want choice #3, so you can see how to manually clean up all the data the operating system placed on the stack and avoid returning to the operating system. There may be other names for unwinding the stack inside an exception handler, but I call it swallowing the exception. Although this method is perfectly safe and super-elegant, it's not exactly going to be endorsed by Microsoft. Besides being unlikely to change between versions of Windows, it offers other benefits, including smaller code and faster execution; what's not to love?!

Swallowing an exception from within the handler requires, at a minimum, restoring the original value of the ESP register. As long as you don't make assumptions about the contents of the other registers, you should be able to pick up right where your function left off. Let's first discuss three reliable ways to restore ESP:

GRAB IT FROM THE CONTEXT STRUCTURE:
The handler can access a CONTEXT structure that the operating system placed on the stack and passes as argument #3. This structure contains the register values as they existed at the time of the exception. On entry to the handler, the CONTEXT pointer is at [ESP+0x0C], and the original value of the ESP register is at offset 0xC4 into this structure.
MARK THE STACK WITH A SENTINEL VALUE:
PUSH any "unnaturally" occurring DWORD pattern (e.g., 0xBAADBEEF, 0xBADC0FEE, etc.) onto the stack after the local exception handler frame has been established. The exception handler can then unwind the stack by POPing values off in a loop until the sentinel is encountered.
REFERENCE THE CURRENT EXCEPTION-HANDLER FRAME:
If we are in the handler for a "leaf" function (a function that doesn't call any other functions), we can take advantage of the fact that FS:[0] will always point at the frame that belongs to the current handler. A side effect of this is that FS:[0] also happens to be the value of ESP after the exception frame was established, which is usually the value of ESP we want to restore.

Avoid this method if your function calls other functions within your established exception frame, as those functions could set up their own frames and defer back to your handler. In other words, if your handler could ever catch a nested function's exception because the nested function's handler chose not to handle it, FS:[0] will lead you to the nested function's frame rather than yours, you will restore the nested function's stack context, and you'll surely crash.
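For the first method, the 0xC4 offset can be double-checked without a Windows toolchain by mirroring the 32-bit x86 CONTEXT layout with fixed-width types. The ContextX86 and FloatSaveX86 names and the read_saved_u32() helper are hypothetical; the field order and sizes follow the x86 CONTEXT and FLOATING_SAVE_AREA definitions in winnt.h, and the sketch assumes the compiler inserts no padding between 4-byte fields (true for common compilers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of the x86 FLOATING_SAVE_AREA (112 bytes). */
typedef struct {
    uint32_t ControlWord, StatusWord, TagWord, ErrorOffset, ErrorSelector;
    uint32_t DataOffset, DataSelector;
    uint8_t  RegisterArea[80];          /* SIZE_OF_80387_REGISTERS */
    uint32_t Cr0NpxState;
} FloatSaveX86;

/* Mirror of the 32-bit x86 CONTEXT structure. */
typedef struct {
    uint32_t ContextFlags;
    uint32_t Dr0, Dr1, Dr2, Dr3, Dr6, Dr7;
    FloatSaveX86 FloatSave;
    uint32_t SegGs, SegFs, SegEs, SegDs;
    uint32_t Edi, Esi, Ebx, Edx, Ecx, Eax;
    uint32_t Ebp, Eip, SegCs, EFlags, Esp, SegSs;
    uint8_t  ExtendedRegisters[512];    /* MAXIMUM_SUPPORTED_EXTENSION */
} ContextX86;

/* Reads a saved register the way the handler would: CONTEXT base + offset. */
static uint32_t read_saved_u32(const uint8_t *context_bytes, size_t offset) {
    uint32_t value;
    memcpy(&value, context_bytes + offset, sizeof value);
    return value;
}
```

A handler using this method would fetch the CONTEXT pointer from [ESP+0x0C] and then load ESP from offset 0xC4 (and, if it needs it, EBP from offset 0xB4), which is what read_saved_u32() imitates on an ordinary buffer.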

I'm going with method #3 because it results in the smallest, most elegant solution for our simple function. It avoids hardcoding any values or structure offsets, and it also illustrates another point about how the operating system calls into your exception handler, which we'll get into below.

Referring back to address 0x111245 in the code shown above, notice the instruction that installed our exception handler frame: MOV FS:[0],ESP. Since that was the last thing we placed on the stack prior to the exception, FS:[0] will still contain the value of ESP we need to restore. Therefore, all we need is:

MOV ESP, FS:[0]

However there is one catch to this method that illustrates another important point about how the operating system calls exception handlers. Just prior to invoking your handler, NTDLL.ExecuteHandler() will have installed yet another exception handling frame in front of your frame to catch what is known as a nested exception. This handler exists to prevent an infinite loop should an exception occur within your exception handler. Therefore for our code to work, we must dereference the current frame's "previous" pointer to get back to our frame which will now be second in the chain. The code should instead look like this:

    MOV ESP, FS:[0]    ;ESP = OS-provided nested exception handler frame
    MOV ESP, [ESP]     ;ESP = our frame
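The two-instruction sequence can be modeled in C to show why the extra dereference lands on our frame. In this sketch, the hypothetical g_exception_list stands in for FS:[0], and a frame called os_nested plays the role of the frame NTDLL.ExecuteHandler() installs just before calling our handler; its prev field is our frame because ours was at the head of the chain when the exception occurred.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified EXCEPTION_REGISTRATION: 'prev' at offset 0 is what [ESP] reads. */
typedef struct Registration {
    struct Registration *prev;
    uintptr_t handler;
} Registration;

static Registration *g_exception_list;  /* hypothetical stand-in for FS:[0] */

/* MOV ESP, FS:[0]  -> the OS-provided nested exception frame
   MOV ESP, [ESP]   -> its 'prev' field, i.e. our frame, which is also the
                       value ESP had right after MOV FS:[0], ESP */
static Registration *recover_our_frame(void) {
    Registration *esp = g_exception_list;
    esp = esp->prev;
    return esp;
}
```

Once ESP points at our frame again, the existing POP FS:[0] restores the frame that preceded ours, so the OS-provided frame simply disappears from the chain along with ours.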

Note that EBP will still not be within the context of our function, but since this simple function doesn't use EBP after the exception, we'll just let the caller's EBP be restored as it normally is in the function's epilogue (cleanup code). More complex functions that need EBP to access variables after the exception handling block completes might restore EBP from a saved position on the stack after ESP is restored and the exception frame is removed. Or you might simplify your life by pulling both EBP and ESP from the CONTEXT structure, killing two birds with one stone, and not bother referencing FS:[0] at all.

THE FIXED FUNCTION:
Incorporating the changes above, the fixed version of our function is now 63 bytes and it can fully encapsulate an exception while maintaining stack integrity:

    00AF1270  55                PUSH EBP                               ;set up function's stack frame
    00AF1271  8BEC              MOV EBP, ESP
    00AF1273  E8 00000000       CALL $+5 (next_instruction)            ;get EIP of next instruction (after 5-byte CALL) - MASM syntax
    00AF1278  58                POP EAX                                ;EAX now equals whatever memory location THIS instruction is loaded at
    00AF1279  83C0 1C           ADD EAX, 01Ch                          ;apply relative offset so EAX now points to our exception-handler entry point
    00AF127C  50                PUSH EAX                               ;build EXCEPTION_REGISTRATION structure on stack, first with our handler
    00AF127D  64:FF35 00000000  PUSH DWORD PTR FS:[0]                  ; followed by previous handler in chain
    00AF1284  64:8925 00000000  MOV DWORD PTR FS:[0], ESP              ;install our exception_frame to be first in chain
    00AF128B  8B45 08           MOV EAX, DWORD PTR SS:[EBP+8]          ;eax = pointer passed as argument #1
    00AF128E  8B00              MOV EAX, DWORD PTR DS:[EAX]            ;eax = *eax / dereference pointer - can we read DWORD memory?
    00AF1290  33C0              XOR EAX, EAX                           ;if we got here, no exception occurred, memory is readable, return zero (eax=0)
    00AF1292  EB 0D             JMP SHORT CLEANUP_label (0x00AF12A1)   ;skip past exception_handler code
    00AF1294  33C0              XOR EAX, EAX                           ;exception_handler entry point
    00AF1296  40                INC EAX                                ;return 1
    00AF1297  64:8B25 00000000  MOV ESP, DWORD PTR FS:[0]              ; **** SWALLOW EXCEPTION BY ****
    00AF129E  8B2424            MOV ESP, DWORD PTR SS:[ESP]            ; **** RESTORING ESP ****
    00AF12A1  64:8F05 00000000  POP DWORD PTR FS:[0]                   ;CLEANUP_label - restore previous exception handler
    00AF12A8  83C4 04           ADD ESP, 4                             ;remove our exception_handler from stack
    00AF12AB  5D                POP EBP                                ;restore caller's stack frame
    00AF12AC  C2 0400           RETN 4                                 ;remove STDCALL argument and return from function

The code above is also a little larger because we've replaced the third instruction, which pushed a hardcoded exception handler address, with a sequence that dynamically calculates the address relative to where these instructions are currently executing in memory. Most shellcode that needs to reference itself in a portable manner must employ some technique like this to find where it is loaded in memory.
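The relative-offset arithmetic can be checked in isolation. The sketch below uses offsets measured from the listing (handler_address() and the enum names are hypothetical): the POP EAX sits 0x08 bytes into the function and the handler entry point sits 0x24 bytes in, so adding the 0x1C immediate to the address that CALL $+5 pushed always lands on the handler, wherever the code is loaded.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets measured from the first byte of the listing (0x00AF1270). */
enum {
    POP_EAX_OFFSET = 0x08,  /* 0x00AF1278 - 0x00AF1270: address CALL $+5 pushes */
    ADD_IMMEDIATE  = 0x1C,  /* operand of ADD EAX, 01Ch */
    HANDLER_OFFSET = 0x24   /* 0x00AF1294 - 0x00AF1270: exception_handler entry */
};

/* Handler address the code computes when its first byte is loaded at 'base'. */
static uint32_t handler_address(uint32_t base) {
    uint32_t eax = base + POP_EAX_OFFSET;  /* CALL $+5 / POP EAX */
    return eax + ADD_IMMEDIATE;            /* ADD EAX, 01Ch */
}
```

For the load address shown in the listing, handler_address(0x00AF1270) yields 0x00AF1294, the exception_handler entry point; any other base shifts both values by the same amount, which is what makes the code position-independent.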

CONCLUSION:
So now we have a shellcode function that will safely indicate whether a particular memory address is accessible. We'll use this function as a building-block for Part Two of this series where we search memory for the location of KERNEL32.DLL.

LIMITATIONS:
SafeSEH can prevent this technique from working, depending on the module that hosts the code rather than on the shellcode itself. If that module was linked with the Visual Studio 7.1 (2003) or later linker without the /SAFESEH:NO option, AND the linker determined that all statically linked libraries are compatible with SafeSEH (MASM code requires a SafeSEH opt-in using the .safeseh directive), the resulting PE module is created with an IMAGE_LOAD_CONFIG_DIRECTORY. This directory contains a list of pre-registered exception handlers that enables the SafeSEH security mechanism, which is designed to prevent code from using exception handlers created on the fly. Writing a new frame to FS:[0] still succeeds, but when an exception is dispatched, NTDLL validates the handler address against the list of registered handlers and refuses to call one that is not registered. As a result, our intentional access violation cannot be swallowed; it goes to the default exception handler instead, and the thread crashes. Because the shellcode and the sample themselves don't opt in to SafeSEH, they work on their own; the problem only affects modules that run the injected code. If SafeSEH is enabled in the host module, the shellcode will cause it to crash; otherwise, it will work. You can check for the presence of a module's IMAGE_LOAD_CONFIG_DIRECTORY using the -t option of the pelook tool. It should be noted that zeroing out this directory entry is all that is needed to disable SafeSEH and allow this technique to work once again.
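For a rough, offline version of the directory check described above, the sketch below reads a 32-bit PE file that has already been loaded into a byte buffer and reports whether DataDirectory entry 10 (IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG) is populated. It is only an illustration under stated assumptions: has_load_config() is a hypothetical helper, not part of pelook or any Windows API; it handles PE32 images only; and, like the check described above, it looks only at whether the directory exists, not at the SafeSEH handler table inside it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOAD_CONFIG_INDEX 10   /* IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG */

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 1 if the PE32 image has a non-empty load-config directory entry,
   0 if it does not, and -1 if the headers cannot be parsed. */
static int has_load_config(const uint8_t *buf, size_t len) {
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;
    uint32_t pe = rd32(buf + 0x3C);                 /* e_lfanew */
    if (pe > len || len - pe < 24 + 96)             /* signature + file header + fixed optional header */
        return -1;
    if (memcmp(buf + pe, "PE\0\0", 4) != 0)
        return -1;
    const uint8_t *opt = buf + pe + 24;             /* 4-byte signature + 20-byte IMAGE_FILE_HEADER */
    if (rd16(opt) != 0x10B)                         /* PE32 optional header magic */
        return -1;
    uint32_t count = rd32(opt + 92);                /* NumberOfRvaAndSizes */
    if (count <= LOAD_CONFIG_INDEX)
        return 0;
    size_t dir = (size_t)pe + 24 + 96 + (size_t)LOAD_CONFIG_INDEX * 8;
    if (dir + 8 > len)
        return -1;
    uint32_t rva = rd32(buf + dir), size = rd32(buf + dir + 4);
    return (rva != 0 && size != 0) ? 1 : 0;
}
```

Zeroing the eight bytes of that directory entry, as described above, makes the function report 0 for the same image.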

Also, if any VEH handlers (Vectored Exception Handling) were previously registered by the process, note that a VEH chain receives exception notifications before the thread's SEH chain. Available in Windows XP and up, VEH is primarily used for debugging purposes. How a VEH handler chooses to react to an exception may prevent a normal exception handler from gaining control. Please refer to Matt Pietrek's VEH article or MSDN for more information.

NOTES:
A final note worth mentioning: while swallowing exceptions is perfect for exceptions occurring in small "leaf" functions, you probably shouldn't do it when your exception handler could catch exceptions originating from nested functions whose own exception handlers chose not to handle them. Swallowing these exceptions short-circuits the operating system's normal unwind semantics. When a compiler-generated handler accepts an exception, it first performs a global unwind (via RtlUnwind), which calls each newer frame's handler again so that frame can run its cleanup code. Because we never return to NTDLL and never unwind, we prevent a nested handler's cleanup code (destructors and __finally blocks) from executing.

<END OF ARTICLE>
