Semantic Jargon Export Obfuscation
What It Does
A PE export table is populated with hundreds of plausible-sounding function names drawn from unrelated technical domains (machine learning, networking, game engines, DevOps). The names are syntactically valid and semantically coherent within their domains, but all resolve to a handful of tiny ret-only stubs. This creates a veneer of a large, legitimate software project, drowning real functionality in noise and frustrating signature-based detection.
Detection / Fingerprint
- Export count > 400 with unique RVA count < 25 (high name-to-body ratio)
- Names are grammatically consistent English compound words drawn from 2–4 technical domain vocabularies mixed together (e.g., `BackoffExtrapolate`, `CorruptTurbulence`, `TokenizeDrag`, `CrossEntropyRevokePinch`)
- No meaningful cross-references from the stub bodies: each stub is `push rbp; mov rbp,rsp; ret` or shorter
- High-entropy `.data` section suggests the actual payload is separate from the export façade
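The first indicator above can be checked with a few lines of Python. The following is a minimal sketch, not the rule used in the original analysis: the function names `export_fingerprint` and `load_exports` are hypothetical, the thresholds are simply the ones quoted in the list, and the loader assumes the third-party `pefile` package is installed.

```python
def export_fingerprint(exports, min_names=400, max_bodies=25):
    """Summarise an export table given as (name, rva) pairs.

    Thresholds default to the fingerprint quoted above:
    more than 400 names sharing fewer than 25 distinct RVAs.
    """
    rvas = [rva for _name, rva in exports]
    total = len(rvas)
    unique = len(set(rvas))
    return {
        "names": total,
        "unique_rvas": unique,
        # Distinct bodies per exported name; low values mean heavy aliasing.
        "ratio": unique / total if total else 0.0,
        "flagged": total > min_names and unique < max_bodies,
    }


def load_exports(path):
    """Read named exports from a PE file (hypothetical helper, needs pefile)."""
    import pefile  # third-party; imported lazily so the pure function works without it

    pe = pefile.PE(path)
    directory = getattr(pe, "DIRECTORY_ENTRY_EXPORT", None)
    if directory is None:
        return []  # image has no export directory
    return [
        (sym.name.decode("ascii", "replace"), sym.address)
        for sym in directory.symbols
        if sym.name  # skip ordinal-only exports
    ]
```

Calling `export_fingerprint(load_exports("sample.dll"))` on a file of your choosing returns the counts and a `flagged` boolean. Keeping the arithmetic separate from the PE parsing makes the thresholds easy to tune against a known-good corpus.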
Implementation Patterns Observed
In the sunwukong sample:
- 503 exported names mapped to ~21 unique RVAs in the `.text` section ^[sample fa16b64a/pefile.txt:338-500]
- Names generated by combining a vocabulary of ~100 base terms (e.g., `Backoff`, `CrossEntropy`, `Tokenize`, `Drag`, `Perplexity`, `Turbulence`) with random pairing
- The export directory is large (0x3475 bytes) and sits in `.rdata`, consuming a notable portion of initialized data
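Because the names are built from a small shared vocabulary, splitting them back into word tokens makes the reuse visible. This is a rough sketch, assuming the names are CamelCase compounds. The helper names `split_compound` and `vocabulary_counts` are hypothetical. The split is also finer than the generator's base terms, so `CrossEntropy` comes back as two tokens.

```python
import re
from collections import Counter

# A capitalised word, or a run of capitals that is not the start of one (acronyms).
_TOKEN = re.compile(r"[A-Z][a-z0-9]+|[A-Z]+(?![a-z])")


def split_compound(name):
    """Split a CamelCase export name into its word tokens."""
    return _TOKEN.findall(name)


def vocabulary_counts(names):
    """Count how often each token appears across all export names."""
    counts = Counter()
    for name in names:
        counts.update(split_compound(name))
    return counts


if __name__ == "__main__":
    sample = ["BackoffExtrapolate", "CorruptTurbulence", "TokenizeDrag", "CrossEntropyRevokePinch"]
    for token, hits in vocabulary_counts(sample).most_common():
        print(token, hits)
```

On a flooded export table, a few dozen tokens should account for nearly every name. A genuine library of similar size tends to show a much longer tail of rare tokens.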
Reproduce on Your Own VMs
Toolchain: Python 3 + Visual Studio 2022 (MSVC 14.50) or mingw-w64.
- Generate a vocabulary file (`vocab.txt`) with ~100 technical terms from ML, networking, game dev, and misc domains.
- Use a Python script to produce 500+ unique compound names:

      import itertools

      # Abbreviated lists: 8 x 8 gives only 64 names. Extend both lists
      # from vocab.txt to reach 500+.
      first = ["Backoff", "Perplexity", "CrossEntropy", "Tokenize", "Corrupt", "Gradient", "Bandwidth", "PacketLoss"]
      second = ["Extrapolate", "Turbulence", "Drag", "RevokePinch", "Shrink", "Activate", "Bounce", "Reshard"]
      # One compound name per (first, second) pair.
      names = [f"{a}{b}" for a, b in itertools.product(first, second)]

- Create a minimal DLL project in C with an exports `.def` file mapping all names to a single `_Stub` function:

      /* An empty body compiles to a ret-only stub in an optimized build.
         MSVC inline __asm is x86-only, so it is avoided here. */
      __declspec(dllexport) void _Stub(void) { }

- Build with `/SUBSYSTEM:WINDOWS` and link with the `.def` file. Inspect with `dumpbin /exports`.
- Observe the export table in `pefile.py` or `rabin2 -iE` showing 500+ names and very few unique RVAs.
Verification step: Scan your reproducer with YARA using the rule from the sunwukong analysis. It should match on both export count and name-set overlap.
Defensive Countermeasures
- Sigma / EDR: Alert on PE images with export count > 300 and unique-RVA-to-name ratio < 0.05.
- Hunt query:

      | from PEExports | where ExportCount > 300 and len(set(RVA)) < 30

- YARA: Combine `pe.number_of_exports` with string sets of known semantic-jargon vocabulary hits.
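The hunt logic above can be prototyped offline before it is ported to a real query language. The sketch below assumes export telemetry arrives as one row per exported name; the field names `image` and `rva` and the function name `hunt` are hypothetical. Requiring both the Sigma ratio and the hunt query's unique-RVA cap is a design choice made here, not something the source specifies.

```python
from collections import defaultdict


def hunt(rows, min_exports=300, max_unique_rvas=30, max_ratio=0.05):
    """Return (image, export_count, unique_rvas) for images that look flooded.

    rows: iterable of dicts with hypothetical keys "image" and "rva",
    one row per exported name.
    """
    rvas_by_image = defaultdict(list)
    for row in rows:
        rvas_by_image[row["image"]].append(row["rva"])

    hits = []
    for image, rvas in rvas_by_image.items():
        count = len(rvas)
        unique = len(set(rvas))
        # Assumption: require both published thresholds to reduce false positives.
        if count > min_exports and unique < max_unique_rvas and unique / count < max_ratio:
            hits.append((image, count, unique))
    return sorted(hits)
```

Legitimate libraries with many exports normally give each export its own body, so they fail the ratio test even when they clear the export-count threshold. Forwarder-heavy or API-set stub DLLs are worth allow-listing separately.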
Pages Where Observed
- sunwukong — malware family employing this technique
- hippamsascom — Littel LLC / "wireless sensor" sibling (9a3c18be) displaying identical export flooding pattern
- /intel/analyses/fa16b64ae95d6492be2074e65a0d6eae3ddb8adb9706f41f1fb0ad71c50aa7ce.html — primary analysis