NOTE: See the Limitations section for information about how the SafeSEH linker option impacts exception handling.
This is the first article in a two-part series aimed at beginner-to-intermediate reverse-engineers, or any
programmer who wants to understand how to perform code branching using Win32's Structured Exception Handling
(SEH) facilities without relying on the Standard C Runtime library or any external functions. More precisely, we
will use pure x86 assembly instructions to illustrate bare-metal exception handling code that complies with what the
operating system requires, while ignoring the rest of the complexity associated with the
equivalent exception-handling code produced by a C/C++ compiler.
If you need to understand or create your own anti-debugging/anti-disassembly tricks involving exceptions,
the information in this article may come in handy.
You're going to need some experience with Win32 x86 assembly language.
Also note that these techniques will work in any 32-bit process running on either 32-bit or 64-bit versions of Windows.
SEH is the operating system mechanism underneath the __try/__except/__finally blocks provided by Microsoft's C/C++
compiler and runtime library; it lets code capture exceptions and branch appropriately without allowing those exceptions
to be seen outside the blocks in which they are declared. In Part Two of this series, we'll apply this knowledge to one method
of discovering the randomly loaded location of KERNEL32.DLL so that we can dynamically access any API in the
system, whether or not the functions have been formally imported into the current process. This
bypasses the Address Space Layout Randomization (ASLR) security feature present in post-XP versions of Windows,
but also works on any of the older NT-based versions of Windows.
A firm grasp of how to encapsulate exceptions produced by your own code is necessary
before we move on to Part Two, where code must be able to recover from
Access Violation exceptions as it encounters regions of memory that are not mapped into the current process.
In this article, we will also go to the effort of writing position-independent code that doesn't rely on static external dependencies
or use hardcoded addresses of any kind. Also known as "shellcode", this is the type of code that security professionals and malware
authors are most interested in, because its portability makes it the most powerful.
Because it makes no assumptions about where it is loaded in memory, it will work whether it is compiled directly into a program or injected into a process.
Although exception handling and writing shellcode are two separate things, we need this flexibility
for Part Two of this series, where it will be used as a building block in an injectable bootstrapping routine.
While the concepts of low-level Win32 exception handling are not new and have been discussed in several hacking
and security-related tutorials and publications over the past decade, most sample code only shows how to insert
and remove stack exception handler frames. There is little information on how that handler, once called by the
operating system, might get back to the same context the program was in prior to the exception.
This would be the assembly equivalent of branching into a C/C++ __except block
followed by cleanly exiting the block and proceeding with the remainder of the function normally. I wrote this
article not only to fill that information gap, but to share some of the things I've discovered along the way.
Almost all articles that deal with the low-level mechanics of Win32 SEH refer readers to Matt Pietrek's
legendary 1997 article, A Crash Course on the Depths of Win32 Structured Exception Handling.
If you've never read it, I highly recommend reading at least everything prior to the "Compiler-level SEH" section.
If you have the time, though, it's worth reading the whole thing.
We want the capability to "poke" at any location in memory that may or may not be mapped into the current process.
If an exception is generated because we've attempted to access an invalid memory location, we
don't want the hosting process to crash.
The first thing we need is a function that tells us whether or not we can READ memory at any given 32-bit address.
It would seem lucky for us that the KERNEL32.DLL function IsBadReadPtr() has the exact functionality we need.
Its prototype is:
BOOL IsBadReadPtr(
    CONST VOID *lp, // address of memory block
    UINT ucb        // size of block
);
This function has been present in all versions of Windows NT since the early days, simply returning NONZERO if the passed memory range cannot be read,
or ZERO otherwise.
Although all of KERNEL32.DLL's functions are accessible to all Win32 processes,
shellcode can't predict where these functions will reside in memory because each time Windows boots (Vista and
up), the base load addresses for all system DLLs are randomized for security.
So we must write our own version of IsBadReadPtr() from scratch and it must be able to handle
exceptions resulting from accessing "bad" memory addresses.
The easiest way to determine whether memory is accessible is simply to try reading from that memory location.
If an Access Violation exception is generated, we know that memory is not accessible
and return 1; otherwise we return 0.
This is the same infamous exception that is generated any time code attempts to access values through
a NULL pointer. Nothing could be simpler than the following C implementation using a __try/__except block:
BOOL CustomIsBadReadPtr(const DWORD* p)
{
    //assume all memory is readable
    BOOL bBad = 0;
    __try
    {
        DWORD dwDummy = *p;
        //if we got here, memory was readable!
    }
    __except(EXCEPTION_EXECUTE_HANDLER)
    {
        //safe to assume we got an ACCESS VIOLATION exception
        bBad = 1;
    }
    return(bBad);
}
The KERNEL32.DLL implementation of IsBadReadPtr() was also built using __try/__except blocks and is not
much different from my version above, except that it supports reading a range of bytes.
For our purposes, we'll just test that we can read a single DWORD at the specified memory location.
While the function above seems simple enough, the assembly code generated by the Visual C++ 7.1 compiler is
rather ugly:
00404788 55 PUSH EBP
00404789 8BEC MOV EBP, ESP
0040478B 6A FF PUSH -1
0040478D 68 588D4000 PUSH OFFSET asmtest.00408D58
00404792 68 1E604000 PUSH <&MSVCR71._except_handler3> ;external dependency!
00404797 64:A1 00000000 MOV EAX, DWORD PTR FS:[0]
0040479D 50 PUSH EAX
0040479E 64:8925 00000000 MOV DWORD PTR FS:[0], ESP
004047A5 51 PUSH ECX
004047A6 51 PUSH ECX
004047A7 51 PUSH ECX
004047A8 51 PUSH ECX
004047A9 53 PUSH EBX
004047AA 56 PUSH ESI
004047AB 57 PUSH EDI
004047AC 8965 E8 MOV DWORD PTR SS:[EBP-18], ESP
004047AF 8365 E4 00 AND DWORD PTR SS:[EBP-1C], 00000000
004047B3 8365 FC 00 AND DWORD PTR SS:[EBP-4], 00000000
004047B7 8B45 08 MOV EAX, DWORD PTR SS:[EBP+8]
004047BA 8B00 MOV EAX, DWORD PTR DS:[EAX] ;DWORD dwDummy = *p;
004047BC 8945 E0 MOV DWORD PTR SS:[EBP-20], EAX ;if we got here, memory was readable!
004047BF 834D FC FF OR DWORD PTR SS:[EBP-4], FFFFFFFF
004047C3 EB 12 JMP SHORT asmtest.004047D7
004047C5 33C0 XOR EAX, EAX ;__except block begin
004047C7 40 INC EAX
004047C8 C3 RETN ;return to MSVC handler
004047C9 8B65 E8 MOV ESP, DWORD PTR SS:[EBP-18] ;__except block cleaned up; finish remainder of function
004047CC C745 E4 01000000 MOV DWORD PTR SS:[EBP-1C], 1
004047D3 834D FC FF OR DWORD PTR SS:[EBP-4], FFFFFFFF
004047D7 8B45 E4 MOV EAX, DWORD PTR SS:[EBP-1C] ;tear down MSVCRT exception frame
004047DA 8B4D F0 MOV ECX, DWORD PTR SS:[EBP-10]
004047DD 64:890D 00000000 MOV DWORD PTR FS:[0], ECX
004047E4 5F POP EDI
004047E5 5E POP ESI
004047E6 5B POP EBX
004047E7 C9 LEAVE
004047E8 C3 RETN
This is no fault of the compiler; it's just that C/C++ needs to support
features like nested __try/__except blocks, object cleanup, etc., so the majority of the code
here exists to comply with the Visual C++ implementation of SEH.
Besides the size of 97 bytes, the worst thing about the generated code is that it depends on
the C Runtime Library function _except_handler3(). Shellcode can't have external static dependencies for reasons already discussed, so
the code above is unusable for our purposes in its current state.
Instead we're going to create a version of the same function that will have no external dependencies.
Because we will also remove the code specific to the Visual C++ exception handling semantics, the resulting
code will be only about half the size!
At a minimum, we must register an exception handling frame within the current thread and then perform the read operation
through the pointer passed.
To simplify things, we can embed the exception handling block directly within the function, just like any other conditional branch;
it simply sets the return value to 1. The remainder of the function resets the exception handler back to its original state
and returns. You might write the initial version of the function like this:
00111236 55 PUSH EBP ;set up function's stack frame
00111237 8BEC MOV EBP, ESP
00111239 68 55121100 PUSH exception_handler (0x111255) ;build EXCEPTION_REGISTRATION structure on stack, first with our handler
0011123E 64:FF35 00000000 PUSH DWORD PTR FS:[0] ; followed by previous handler in chain
00111245 64:8925 00000000 MOV DWORD PTR FS:[0], ESP ;install our exception_frame
0011124C 8B45 08 MOV EAX, DWORD PTR SS:[EBP+8] ;eax = pointer passed as argument #1
0011124F 8B00 MOV EAX, DWORD PTR DS:[EAX] ;eax = *eax / dereference pointer - can we read DWORD memory?
00111251 33C0 XOR EAX, EAX ;if we got here, no exception occurred, memory is readable, return zero
00111253 EB 03 JMP SHORT cleanup (0x111258) ;skip past exception_handler code to cleanup and exit
00111255 33C0 XOR EAX, EAX ;exception_handler entry point
00111257 40 INC EAX ;return 1
00111258 64:8F05 00000000 POP DWORD PTR FS:[0] ;cleanup; restore previous handler
0011125F 83C4 04 ADD ESP, 4 ;remove exception_handler from stack
00111262 5D POP EBP ;restore caller's stack frame
00111263 C2 0400 RETN 4 ;remove STDCALL argument and return from function
The 48-byte code above starts by setting up a normal function stack frame, builds an
EXCEPTION_REGISTRATION structure on the stack, and then installs that frame as the current exception handler.
Under Win32 on x86, the FS segment register addresses the current thread's Thread Environment Block (TEB), so FS:[0]
reads the first DWORD of the TEB, which points to the first frame of the thread's exception handler chain.
We make our frame the first in this list, chained to the original frame, which is used if our handler
chooses not to handle the exception. After all of the setup code, we load the memory address we are
testing and attempt to dereference it at instruction 0x11124F. If no exception occurs, execution simply falls through to
the instruction following the dereference, which zeroes EAX (our return value), and then jumps past the exception_handler code to
the cleanup code. Cleanup consists of restoring the previous exception handling frame in the TEB and removing our
exception frame from the stack. If an exception does occur at 0x11124F (which we
can safely assume will be a first-chance Access Violation exception, 0xC0000005), the CPU raises a fault that the operating system
first handles in kernel mode. Kernel mode then propagates the exception into user mode,
where our thread's EIP is changed from where the exception occurred into NTDLL.DLL code that ultimately walks
our thread's exception handler chain, searching for someone to handle it. Since our handler happens to be first
in the list, our exception handling block is the first to gain control.
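To make the frame bookkeeping concrete without needing a Win32 machine, here is a minimal portable C sketch that models the EXCEPTION_REGISTRATION frame as a linked-list node and uses a global variable as a stand-in for FS:[0]. The names exception_registration, fs0, install_frame() and remove_frame() are hypothetical illustrations, not Windows APIs; on real 32-bit x86 the two fields are DWORDs at offsets 0 and 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable model of EXCEPTION_REGISTRATION. On 32-bit x86 both fields are
 * DWORDs: "prev" (pushed last, copied from FS:[0]) sits at offset 0 and
 * "handler" (pushed first) at offset 4, so the frame's address is simply
 * the value of ESP after the two PUSHes. */
typedef struct exception_registration {
    struct exception_registration *prev; /* next frame down the chain */
    uintptr_t handler;                   /* handler entry point */
} exception_registration;

/* Hypothetical stand-in for FS:[0], the head of the thread's chain. */
static exception_registration *fs0 = NULL;

/* Models: PUSH handler / PUSH FS:[0] / MOV FS:[0], ESP */
static void install_frame(exception_registration *frame, uintptr_t handler)
{
    frame->handler = handler;
    frame->prev = fs0;
    fs0 = frame;
}

/* Models: POP DWORD PTR FS:[0] / ADD ESP, 4 */
static void remove_frame(exception_registration *frame)
{
    assert(fs0 == frame);   /* frames must be removed in LIFO order */
    fs0 = frame->prev;
}
```

Installing pushes our frame to the head of the list; removing it simply copies our saved "prev" pointer back into the head, which is all the POP DWORD PTR FS:[0] instruction does.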
If we passed a readable memory address to the function above, it would return zero without any problems.
If an invalid address was passed, however, we'd run into a couple of problems after our exception handler gained control.
The first hint that something is wrong would be returning to a seemingly random location in memory after the final RET instruction.
Why?
The first of our problems is that Win32 expects our handler to be a function
whose signature is formally:
typedef EXCEPTION_DISPOSITION (CDECL *ExceptionHandler)(
    EXCEPTION_RECORD* ExceptionRecord,
    EXCEPTION_REGISTRATION* EstablisherFrame,
    CONTEXT* ContextRecord,
    DISPATCHER_CONTEXT* DispatcherContext);
Within an exception handler, the stack is set up so that a simple RET (without operands) will take
you back to the operating system's NTDLL.ExecuteHandler(), not to the caller of the function
that generated the exception.
Essentially, the stack is in a different state than it was prior to the exception.
Regardless of its proximity to where the exception was generated, an exception handler runs in a different stack context,
which allows the operating system's SEH semantics to kick in. This includes providing the handler with all sorts of information
about the exception and even giving it the choice to "fix things" and resume execution where the exception occurred.
In other words, the operating system pushed a bunch of data onto the stack after the exception, so ESP no longer points
at our function's saved frame and return address.
Since our function's primary purpose is to set a flag that an exception was hit and return to our caller,
how can we safely break out of the handler?
A Win32 exception handler must handle an exception in one of three ways, two of which require the handler to return a value through EAX back to the operating system's NTDLL.ExecuteHandler():
- Return 0 (ExceptionContinueExecution): This tells the OS to retry executing the excepting instruction (with or without modifications to the passed
  CONTEXT structure).
- Return 1 (ExceptionContinueSearch): This tells the OS "I'm not handling that exception; try the next handler in the chain."
- DON'T RETURN: Just keep executing from the handler, usually to terminate the process.
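The following portable C sketch models, in a simplified way, how the dispatcher reacts to the first two return values as it walks the chain. The EXCEPTION_DISPOSITION values mirror those in the Windows SDK's excpt.h, but the one-argument handler signature, the frame type, dispatch() and the two example handlers are hypothetical simplifications; the real dispatcher passes four arguments and handles more cases. Choice #3 cannot appear in this loop at all, because a handler that never returns never hands control back to it.

```c
#include <assert.h>
#include <stddef.h>

/* Disposition codes with the same values as the Windows SDK's excpt.h. */
typedef enum {
    ExceptionContinueExecution = 0,
    ExceptionContinueSearch    = 1,
    ExceptionNestedException   = 2,
    ExceptionCollidedUnwind    = 3
} EXCEPTION_DISPOSITION;

/* Simplified handler: the real one also receives the EXCEPTION_RECORD,
 * establisher frame, CONTEXT and dispatcher context. */
typedef EXCEPTION_DISPOSITION (*handler_fn)(unsigned long code);

typedef struct frame {
    struct frame *prev;   /* the real chain ends at 0xFFFFFFFF; NULL here */
    handler_fn handler;
} frame;

/* Walks the chain from the head (FS:[0]). Returns the frame whose handler
 * returned ExceptionContinueExecution, or NULL if nobody did. */
static frame *dispatch(frame *head, unsigned long code)
{
    for (frame *f = head; f != NULL; f = f->prev) {
        EXCEPTION_DISPOSITION d = f->handler(code);
        if (d == ExceptionContinueExecution)
            return f;     /* OS would resume from the (possibly edited) CONTEXT */
        if (d != ExceptionContinueSearch)
            return NULL;  /* other dispositions are not modeled here */
    }
    return NULL;          /* unhandled: default handling would follow */
}

/* Hypothetical example handlers. */
static EXCEPTION_DISPOSITION pass_handler(unsigned long code)
{
    (void)code;
    return ExceptionContinueSearch;
}

static EXCEPTION_DISPOSITION av_handler(unsigned long code)
{
    return code == 0xC0000005UL ? ExceptionContinueExecution
                                : ExceptionContinueSearch;
}
```

With pass_handler at the head and av_handler behind it, an access violation (0xC0000005) is passed along by the first frame and accepted by the second, while any other code falls off the end of the chain.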
It is possible to accomplish our goal with choice #1 by resuming execution at a different branch within the
function (which requires modifying the CONTEXT structure);
however, this method has already been discussed extensively and isn't as lightweight. We definitely don't want choice #2, because
our code will lose control, usually resulting in process termination. For our purposes, we want choice #3, so
you can see how to manually clean up everything the operating system placed on the stack
and avoid returning to the operating system at all. There may be other names for unwinding the stack inside an
exception handler, but I call it swallowing the exception.
Although this method is perfectly safe for simple leaf functions like ours and super-elegant, it's not exactly going to be endorsed by Microsoft.
Besides being unlikely to change between versions of Windows, it also yields smaller code and
faster execution; what's not to love?!
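Swallowing has a close cousin on POSIX systems, which may help readers who think in those terms: a SIGSEGV handler that leaves via siglongjmp() instead of returning. The sketch below is a swapped-in technique, not Win32 SEH. It uses sigaction(), sigsetjmp() and siglongjmp() to build a hypothetical posix_is_bad_read_ptr() counterpart of CustomIsBadReadPtr(), assuming a Linux or similar POSIX system where reading a PROT_NONE page raises SIGSEGV. As with swallowing, the jump discards everything the kernel pushed for the signal handler and resumes with the stack pointer saved by sigsetjmp().

```c
#define _GNU_SOURCE          /* MAP_ANONYMOUS and sigsetjmp on glibc */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>        /* mmap/munmap for the usage example */
#include <unistd.h>          /* sysconf for the usage example */

static sigjmp_buf probe_env;

/* Fault handler: never returns to the kernel's signal trampoline. It jumps
 * straight back into posix_is_bad_read_ptr(), discarding the signal frame,
 * which is the POSIX counterpart of "swallowing" the exception. */
static void probe_fault(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Returns 1 if reading a 32-bit value at p faults, 0 otherwise.
 * Not thread-safe: shared jump buffer, process-wide signal handlers. */
static int posix_is_bad_read_ptr(const void *p)
{
    struct sigaction sa, old_segv, old_bus;
    int bad = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    if (sigsetjmp(probe_env, 1) == 0) {   /* 1: also save/restore signal mask */
        volatile uint32_t dummy = *(const volatile uint32_t *)p;
        (void)dummy;                      /* got here: p was readable */
    } else {
        bad = 1;                          /* got here via siglongjmp */
    }

    sigaction(SIGSEGV, &old_segv, NULL);  /* restore previous handlers */
    sigaction(SIGBUS, &old_bus, NULL);
    return bad;
}
```

Like the Win32 version, this is only reasonable for a tiny leaf routine: the handler is process-wide while installed, and the shared jump buffer makes the sketch unsafe to call from several threads at once.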
Swallowing an exception from within the handler requires, at a minimum, restoring the original value of
the ESP register. As long as you don't make assumptions about the contents of the other registers, you should be able to
pick up right where your function left off. Let's first discuss three reliable ways to restore ESP:
- GRAB IT FROM THE CONTEXT STRUCTURE:
  The handler can access the CONTEXT structure the operating system placed on the stack as argument #3.
  This structure contains the original register values as they existed prior to the exception.
  On entry to the handler, the CONTEXT pointer is at [ESP+0x0C], and the original value of the ESP register is at offset 0xC4 within this structure.
- MARK THE STACK WITH A SENTINEL VALUE:
  PUSH any "unnaturally" occurring DWORD pattern (e.g., 0xBAADBEEF, 0xBADC0FEE, etc.) onto the stack after
  the local exception handler frame has been established. The exception handler can then unwind the stack
  by POPing values off in a loop until the sentinel is encountered.
- REFERENCE THE CURRENT EXCEPTION-HANDLER FRAME:
  If we are in the handler for a "leaf" function (a function that doesn't call any other functions), we can
  take advantage of the fact that FS:[0] will always point at the frame that belongs to the current
  handler. A side effect of this is that FS:[0] also happens to be the value
  of ESP right after the exception frame was established, which is usually the value of ESP we want
  to restore.
  Avoid this method if your function calls other functions within your established exception frame, as
  these functions could set up their own frames and defer back to your handler. In other words, if your
  handler could ever catch a nested function's exception because the nested function's handler chose not
  to handle it, your handler will see the nested function's stack context and you'll surely crash.
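The sentinel method (the second option above) can be simulated in portable C with a toy downward-growing stack. Everything here is hypothetical: toy_stack, push(), pop() and unwind_to_sentinel() only model the PUSH/POP sequence, and the sketch assumes the sentinel value never appears among the DWORDs the operating system pushes after the exception.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SENTINEL 0xBAADBEEFu   /* "unnatural" marker from the text */
#define STACK_SLOTS 64

/* Toy descending stack: esp is the index of the top element, like ESP. */
typedef struct {
    uint32_t slots[STACK_SLOTS];
    size_t esp;                      /* STACK_SLOTS means empty */
} toy_stack;

static void push(toy_stack *s, uint32_t v)
{
    assert(s->esp > 0);              /* toy overflow check */
    s->slots[--s->esp] = v;
}

static uint32_t pop(toy_stack *s)
{
    assert(s->esp < STACK_SLOTS);    /* never pop past the bottom */
    return s->slots[s->esp++];
}

/* Sentinel method: POP until the marker itself comes off. ESP then equals
 * its value just before the sentinel was PUSHed, i.e. it points at our
 * exception frame again. Returns how many DWORDs were discarded. */
static size_t unwind_to_sentinel(toy_stack *s)
{
    size_t popped = 0;
    uint32_t v;
    do {
        v = pop(s);
        popped++;
    } while (v != STACK_SENTINEL);
    return popped;
}
```

In a usage run, pushing the frame, then the sentinel, then three stand-in DWORDs for the operating system's data, the unwind discards four values and leaves the toy ESP exactly where MOV FS:[0], ESP left it.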
I'm going with method #3 because it results in the smallest, most elegant solution for our simple function.
It avoids hardcoding any values or structure offsets, and it also illustrates
another point about how the operating system calls into your exception handler, which we'll get into below.
Referring back to address 0x111245 in the code shown above, notice the instruction that installed our exception
handler frame: MOV FS:[0],ESP.
Since our frame was the last thing we placed on the stack prior
to the exception, FS:[0] will still contain the value of ESP we need to restore.
Therefore, all we need is:
MOV ESP, FS:[0] ;ESP = our frame
However, there is one catch to this method that illustrates another important point about how the operating
system calls exception handlers. Just prior to invoking your handler, NTDLL.ExecuteHandler() will have installed
yet another exception handling frame in front of your frame to catch what is known as a nested exception.
This handler exists to prevent an infinite loop should an exception occur within your exception handler.
Therefore, for our code to work, we must dereference the current frame's "previous" pointer to get back to our
frame, which is now second in the chain. The code should instead look like this:
MOV ESP, FS:[0] ;ESP = OS-provided nested exception handler frame
MOV ESP, [ESP]  ;ESP = our frame
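Modeled in portable C, the two instructions amount to following the first field of the frame that FS:[0] points at. In this hypothetical sketch, reg_frame mirrors the two-DWORD EXCEPTION_REGISTRATION layout, os_push_nested_frame() stands in for what NTDLL.ExecuteHandler() does before calling us, and recover_our_frame() performs the MOV ESP, FS:[0] / MOV ESP, [ESP] pair. It relies on the "previous" pointer being the frame's first field, which is exactly why MOV ESP, [ESP] works.

```c
#include <assert.h>
#include <stddef.h>

/* Two-DWORD EXCEPTION_REGISTRATION layout: "prev" must be the first field. */
typedef struct reg_frame {
    struct reg_frame *prev;
    void *handler;
} reg_frame;

/* Models NTDLL.ExecuteHandler() linking its nested-exception frame in
 * front of ours just before it calls our handler. */
static void os_push_nested_frame(reg_frame **fs0, reg_frame *nested)
{
    nested->prev = *fs0;
    *fs0 = nested;
}

/* MOV ESP, FS:[0] then MOV ESP, [ESP]: reading the first DWORD of the
 * frame FS:[0] points at yields the frame behind it, which is ours. */
static reg_frame *recover_our_frame(reg_frame *fs0)
{
    reg_frame *esp = fs0;             /* MOV ESP, FS:[0]  -> OS frame  */
    esp = *(reg_frame **)(void *)esp; /* MOV ESP, [ESP]   -> our frame */
    return esp;
}
```

Stopping after the first instruction would leave ESP pointing at the operating system's nested frame, which is the bug the second instruction fixes.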
Note that EBP will still not be within the context of our function, but since this simple function doesn't use EBP after the exception,
we'll just let the caller's EBP get restored as it normally does in the function's epilogue (cleanup code).
More complex functions that need EBP to access variables after the exception handling block completes might restore EBP from
a saved position on the stack after ESP is restored and the exception frame is removed. Or you might
simplify your life by pulling both EBP and ESP from the CONTEXT structure to kill two birds with one stone,
and not bother referencing FS:[0] at all.
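If you prefer the CONTEXT route, the offsets involved can be checked with a portable C model of the 32-bit x86 CONTEXT structure. The sketch below re-declares the layout with fixed-width types (X86_CONTEXT and X86_FLOATING_SAVE_AREA are renamed stand-ins for the SDK's CONTEXT and FLOATING_SAVE_AREA, so it compiles on any host) and confirms that Ebp lives at offset 0xB4 and Esp at 0xC4. saved_esp() is a hypothetical helper showing the read a handler would perform after loading the CONTEXT pointer from [ESP+0x0C].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit x86 FLOATING_SAVE_AREA re-declared with fixed-width types. */
typedef struct {
    uint32_t ControlWord, StatusWord, TagWord;
    uint32_t ErrorOffset, ErrorSelector, DataOffset, DataSelector;
    uint8_t  RegisterArea[80];
    uint32_t Cr0NpxState;
} X86_FLOATING_SAVE_AREA;            /* 112 bytes */

/* 32-bit x86 CONTEXT, same field order as the i386 definition in winnt.h. */
typedef struct {
    uint32_t ContextFlags;
    uint32_t Dr0, Dr1, Dr2, Dr3, Dr6, Dr7;
    X86_FLOATING_SAVE_AREA FloatSave;
    uint32_t SegGs, SegFs, SegEs, SegDs;
    uint32_t Edi, Esi, Ebx, Edx, Ecx, Eax;
    uint32_t Ebp, Eip, SegCs, EFlags, Esp, SegSs;
    uint8_t  ExtendedRegisters[512];
} X86_CONTEXT;

/* At handler entry: [ESP] = return address into NTDLL, [ESP+0x04] =
 * ExceptionRecord, [ESP+0x08] = EstablisherFrame, [ESP+0x0C] = ContextRecord. */
#define HANDLER_ARG_CONTEXT 0x0C

/* Reads the pre-exception ESP out of a raw CONTEXT image. */
static uint32_t saved_esp(const uint8_t *context)
{
    uint32_t esp;
    memcpy(&esp, context + offsetof(X86_CONTEXT, Esp), sizeof esp);
    return esp;
}
```

In assembly, the same read is two loads: the CONTEXT pointer from [ESP+0x0C], then the saved ESP from offset 0xC4 of that structure (and the saved EBP from offset 0xB4).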
Incorporating the changes above, the fixed version of our function is now 63 bytes and it can fully encapsulate
an exception while maintaining stack integrity:
00AF1270 55 PUSH EBP ;set up function's stack frame
00AF1271 8BEC MOV EBP, ESP
00AF1273 E8 00000000 CALL $+5 (next_instruction) ;get EIP of next instruction (after 5-byte CALL) - MASM-syntax
00AF1278 58 POP EAX ;EAX now equals whatever memory location THIS instruction is loaded at
00AF1279 83C0 1C ADD EAX, 01Ch ;apply relative offset so EAX now points to our exception-handler entry point
00AF127C 50 PUSH EAX ;build EXCEPTION_REGISTRATION structure on stack, first with our handler
00AF127D 64:FF35 00000000 PUSH DWORD PTR FS:[0] ; followed by previous handler in chain
00AF1284 64:8925 00000000 MOV DWORD PTR FS:[0], ESP ;install our exception_frame to be first in chain
00AF128B 8B45 08 MOV EAX, DWORD PTR SS:[EBP+8] ;eax = pointer passed as argument #1
00AF128E 8B00 MOV EAX, DWORD PTR DS:[EAX] ;eax = *eax / dereference pointer - can we read DWORD memory?
00AF1290 33C0 XOR EAX, EAX ;if we got here, no exception occurred, memory is readable, return zero (eax=0)
00AF1292 EB 0D JMP SHORT CLEANUP_label (0000000Fh) ;skip past exception_handler code
00AF1294 33C0 XOR EAX, EAX ;exception_handler entry point
00AF1296 40 INC EAX ;return 1
00AF1297 64:8B25 00000000 MOV ESP, DWORD PTR FS:[0] ; **** SWALLOW EXCEPTION BY ****
00AF129E 8B2424 MOV ESP, DWORD PTR SS:[ESP] ; **** RESTORING ESP ****
00AF12A1 64:8F05 00000000 POP DWORD PTR FS:[0] ;CLEANUP_label - restore previous exception handler
00AF12A8 83C4 04 ADD ESP, 4 ;remove our exception_handler from stack
00AF12AB 5D POP EBP ;restore caller's stack frame
00AF12AC C2 0400 RETN 4 ;remove STDCALL argument and return from function
Besides the two ESP-restoring instructions, the code above is also a little larger because we've replaced the 3rd instruction, which pushed a hardcoded
exception handler address, with code that dynamically calculates the address relative to where these instructions
are currently executing in memory. Most shellcode that needs to reference
itself in a portable manner must employ some technique to find where it is loaded in memory.
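The arithmetic behind the CALL $+5 / POP EAX / ADD EAX, 01Ch sequence, the short jumps, and the byte counts quoted for each listing can be checked with a few lines of portable C. The helper names below are hypothetical; the addresses and immediates are taken straight from the listings above.

```c
#include <assert.h>
#include <stdint.h>

/* CALL $+5 pushes the address of the next instruction; POP EAX retrieves
 * it, and ADD EAX, imm8 turns it into the handler's address, wherever the
 * code happens to be loaded. */
static uint32_t handler_from_call_pop(uint32_t pop_address, uint8_t add_imm8)
{
    return pop_address + add_imm8;
}

/* JMP SHORT rel8 (opcode EB, 2 bytes): the target is the address of the
 * following instruction plus the signed displacement. */
static uint32_t jmp_short_target(uint32_t jmp_address, int8_t disp)
{
    return jmp_address + 2u + (uint32_t)(int32_t)disp;
}

/* Size of a listing: last instruction's address + its length - first address. */
static uint32_t code_size(uint32_t first, uint32_t last, uint32_t last_len)
{
    return last + last_len - first;
}
```

For example, POP EAX sits at 0xAF1278 and 0xAF1278 + 0x1C = 0xAF1294, the handler entry point. If the same bytes were loaded at 0x00400000 instead, POP EAX would sit at 0x00400008 and the sum would be 0x00400024, still 0x24 bytes past the function start, which is the whole point of the position-independent version. The same helper shows the listings are 97, 48 and 63 bytes long.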
So now we have a shellcode function that will safely indicate whether a particular memory address is accessible.
We'll use this function as a building block for Part Two of this series, where we search memory for the location of KERNEL32.DLL.
If the module that executes the exception-swallowing technique described here (the host module, not the shellcode itself)
was linked using the Visual Studio 7.1 (2003) or later linker without the /SAFESEH:NO option, AND the linker determines that all statically linked libraries
are compatible with SAFESEH (MASM code requires a SafeSEH opt-in using the .safeseh directive),
the resulting PE module will be created with an IMAGE_LOAD_CONFIG_DIRECTORY. Within this directory is a list of pre-registered exception handlers
that effectively enables the SafeSEH security mechanism. SafeSEH is designed to prevent code from
creating exception handlers on the fly. Specifically, the instructions that assign our frame to the top-level exception frame pointer fs:[0] still execute,
but when an exception is dispatched, the operating system validates the handlers it finds in the chain and declines to call one that has not been
pre-registered in the list of safe exception handlers. The result is that our intentional access violations cannot be swallowed
and will go to the default exception handler, causing the thread to crash. This doesn't prevent the shellcode itself or the sample from working, since we didn't opt in;
however, it does affect any module that runs the injected code. If SafeSEH is enabled in the module, the shellcode will cause the module to crash; otherwise it will work. You can
check for the presence of a module's IMAGE_LOAD_CONFIG_DIRECTORY using the -t option to the pelook tool.
It should be noted that zeroing out this directory entry is all that is needed to disable SafeSEH and allow this technique to work once again.
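As a rough illustration of what checking for (and zeroing) that directory entry involves, here is a portable C sketch that works on a PE32 file image held in memory. It assumes the standard PE32 layout: e_lfanew at offset 0x3C, the "PE\0\0" signature, a 20-byte file header, an optional header with magic 0x10B, and the data directory array at offset 96 of the optional header, where index 10 is the load-config entry. The function names are hypothetical, the sketch ignores PE32+ images, and it does not recompute the optional header's CheckSum field or consider anything else the load-config directory may control in a particular image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOAD_CONFIG_INDEX 10u   /* IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG */

static uint32_t rd32(const uint8_t *p)   /* little-endian DWORD */
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t rd16(const uint8_t *p)   /* little-endian WORD */
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Returns the file offset of the load-config data-directory entry (an
 * 8-byte {VirtualAddress, Size} pair) in a PE32 image, or 0 if absent. */
static size_t load_config_entry_offset(const uint8_t *img, size_t len)
{
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return 0;
    size_t pe = rd32(img + 0x3C);                  /* e_lfanew */
    if (pe > len || len - pe < 4 + 20 + 96 + 8 * (LOAD_CONFIG_INDEX + 1))
        return 0;                                  /* truncated headers */
    size_t opt = pe + 4 + 20;                      /* skip signature + file header */
    if (memcmp(img + pe, "PE\0\0", 4) != 0 || rd16(img + opt) != 0x10B)
        return 0;                                  /* not a PE32 image */
    if (rd32(img + opt + 92) <= LOAD_CONFIG_INDEX)
        return 0;                                  /* NumberOfRvaAndSizes too small */
    return opt + 96 + 8 * LOAD_CONFIG_INDEX;
}

static int has_load_config(const uint8_t *img, size_t len)
{
    size_t e = load_config_entry_offset(img, len);
    return e != 0 && (rd32(img + e) != 0 || rd32(img + e + 4) != 0);
}

/* Zeroes the entry's RVA and size, as described in the text. */
static int clear_load_config(uint8_t *img, size_t len)
{
    size_t e = load_config_entry_offset(img, len);
    if (e == 0)
        return 0;
    memset(img + e, 0, 8);
    return 1;
}
```

Against a real module you would load the file's bytes into a buffer, call has_load_config() to see whether the directory is present, and write the buffer back after clear_load_config() only if you really intend to disable it.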
Also, if any VEH (Vectored Exception Handling) handlers were previously registered by the process,
note that the VEH chain receives exception notifications before the thread's SEH chain.
Available in Windows XP and up, VEH is primarily used for debugging purposes.
How a VEH handler chooses to react to an exception may prevent a normal exception handler from gaining control.
Please refer to Matt Pietrek's VEH article or MSDN for more information.
A final note worth mentioning is that while swallowing exceptions is perfect for exceptions occurring in
small "leaf" functions, you probably shouldn't do this when your exception handler could catch
exceptions that a nested function's handlers chose not to handle.
Swallowing these exceptions short-circuits the operating system's normal unwind semantics, in which a handler
that volunteers to handle an exception first asks the operating system to unwind (RtlUnwind) so that every frame registered after it gets a chance to clean up.
Because we simply reset ESP without unwinding, we prevent a nested function's cleanup code (destructors and __finally blocks) from executing.
<END OF ARTICLE>