Know your Malware – A Beginner’s Guide to Encoding Techniques Used to Obfuscate Malware
With the launch of Wordfence CLI, our high performance security scanner that can detect the vast majority of PHP malware targeting WordPress, Wordfence continues to emphasize the importance of malware detection and remediation. Malware targeting WordPress uses a variety of obfuscation techniques to avoid detection, and today’s post dives into some of the most common built-in PHP functionality malware often makes use of in order to do this.
What is Obfuscation?
Obfuscation is the process of concealing the purpose or functionality of code or data so that it evades detection and is more difficult for a human or security software to analyze, but still fulfills its intended purpose.
Obfuscation makes use of various types of encoding techniques, but is not exactly the same thing as encoding. There are countless legitimate uses for encoding data, including saving space through compression, transmitting data over a network, and packaging code so that it can be easily interpreted by programs in an expected format. Meanwhile obfuscation is intentionally designed to prevent understanding and detection by humans and security software.
Obfuscation is also different from encryption in that it can typically be reversed without a “key”, though there are some encoding techniques, such as XOR encoding, which do use keys and are used in both encryption and obfuscation.
Since obfuscation often relies heavily on encoding techniques, It’s important to understand what these techniques look like, their typical legitimate use cases, and signs that they’re being used to hide something potentially malicious. In today’s article, we will cover some of the most commonly used encoding techniques, and teach you how to spot legitimate uses as well as potentially suspicious patterns.
What is Base64 encoding?
Base64 encoding is widely used to send and store data. If you’ve ever played with Linux and tried to look at an executable file using the
cat command, you might have noticed that your terminal starts acting very strangely. This is because binary data includes an enormous number of potential byte sequences, and software that’s not designed to interpret a particular file format can incorrectly interpret some of these sequences as commands.
Base64 encoding allows any data, including binary data, to be stored and transmitted as text which makes it very convenient for programs to talk to one another without being misunderstood, especially over a network.
It uses 26 lower-case letters, 26 upper-case letters, the digits 0-9, and the ‘+’ and ‘/’ symbols for a total of 64 characters, plus ‘=’ for padding.
Note that, unlike the Base 8(Octal) and Base 16(Hexadecimal) encodings we’ll cover later, base64 is not a direct representation of the underlying bytes. Instead, it converts their octal representations to Base 10(Decimal) and then uses a lookup table to assign a character value. You can find out more about this process in the Wikipedia article on Base64 encoding.
How is Base64 Encoding Used Legitimately?
You’ve likely seen base64 encoded data in the past, and it’s very easy to spot – for instance,
SGVsbG8sIFdvcmxkIQ== decodes to “Hello, World!” and you can run the code snippet:
This means that any attacker that knows the value of
$odqwv can thus send commands to the file that have already been XORd against that value, which will then be reversed and executed.
In this example,
$odqwv is the XORd value of
trhvvbuzeadricgobq which turns out to be “base64_decode.” You can find this value by creating a simple one liner
which prints the value. In this case
$odqwv is the literal string “base64_decode” but this is simply used as a key and does not refer to the built-in function itself.
The value in
$_COOKIE[“dj”] is then XORd against the
$odqwv key, which is ‘base64_decode’, and the result is called as a function, with similar steps occurring throughout the rest of the code.
Putting it All Together
Most obfuscated malware uses a combination of these techniques to hide its functionality, and combined techniques are one of the clearest indications of malicious activity. For example, take the following code:
If supplied with the correct
$xor_key, it will output “Hello, World!”.
Let’s take a look at how we did this:
First, we took the code ‘echo “Hello, World!”;’ and XOR-encoded it with a key value of ‘K’, resulting in the output
We then ran it through the
gzdeflate function, which results in a binary output that can’t be rendered here, but after base64-encoding that output it turns into
If you placed the code in a
hello.php file on your site and accessed it, you’d get a blank screen unless you sent a request to
/hello.php?k=K, which would output “Hello, World!”.
While this example only outputs “Hello, World!” when it is passed the right key, it is trivial to disguise any PHP code in this manner, including destructive code that adds malicious administrators, creates additional malicious files, or alters system settings.
In today’s article, we discussed the most commonly used encoding techniques in PHP, their legitimate applications, and how malicious code uses them to obfuscate its purpose and intent. While obfuscation is an arms race, the Wordfence scanner and Wordfence CLI both use our incredibly effective malware detection signatures and are able to detect the vast majority of obfuscated malware targeting WordPress. A large part of why this is possible is due to our expertise and deep understanding of these encoding techniques and which combinations of encoding tend to indicate malicious behavior. Our experienced security analysts are continuously writing new signatures to improve our detection capabilities.
In a future article, we’ll cover more advanced obfuscation techniques that rely on other properties and quirks of PHP, but it’s necessary to understand basic encoding methods first because of how frequently they’re used, even when they’re not the primary method of obfuscation.
We encourage readers who want to learn more about this to experiment with the various code snippets we have presented. More advanced readers may wish to review public malware repositories in order to better learn to spot these indicators, but be sure to be careful with any actual malware samples you find and only execute them in a virtual environment, as even PHP malware can be used for local privilege escalation on vulnerable machines.
For security researchers looking to disclose vulnerabilities responsibly and obtain a CVE ID, you can submit your findings to Wordfence Intelligence and potentially earn a spot on our leaderboard.