How to Optimize Data Sources: Collection Management Framework

What if I told you there was a system that allowed you to structure your intelligence gathering, manage your data sources, and guide your team to answers during investigations… would you believe me? Well, let me introduce you to the idea of a Collection Management Framework – a structured approach to organizing your data.

This article details what a Collection Management Framework is and the major benefits it can provide your entire security team, from incident responders to threat intelligence analysts. You will then learn how to create your own using a five-phase process. This process will allow you to develop a Collection Management Framework, comprehensively assess it, and continuously improve it. 

Let’s get started building a system to optimize how you use your data sources!


What is a Collection Management Framework?

A Collection Management Framework (CMF) is a tool for identifying the data sources available to you and what information you can get from them. It organizes your data sources and records which questions each one can answer, where to ask those questions, and how far back the data goes.

Data Source

A data source is any system, tool, or platform from which you can gather information. This could be external information from a threat intelligence feed about atomic indicators (e.g., malware samples, domain names, or IP addresses) or threat actors (e.g., TTPs, targets, history), but it can also be internal information generated by a security tool like a SIEM or EDR. Anything you can reliably query for information is a data source.

A CMF acts as a guide for intelligence analysts, incident responders, and other defenders when they need to fulfill an intelligence requirement. For instance, if an incident responder needs to quickly determine whether a piece of software is malicious, they can consult their organization's CMF. There they can find out which data sources can answer that question, what questions they can ask, and how accurate, trustworthy, or complete the information is likely to be.
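To make that lookup concrete, here is a minimal Python sketch that models a CMF as a list of records and returns the sources able to answer a given question, highest confidence first. The field names, question labels, three-level confidence scale, and example sources are all hypothetical; a real CMF usually lives in a spreadsheet or knowledge base rather than code.

```python
from dataclasses import dataclass, field

@dataclass
class DataSource:
    """One hypothetical CMF row: a data source and what it can tell you."""
    name: str
    location: str          # where an analyst queries it
    retention_days: int    # how far back the data goes
    confidence: str        # analyst rating: "high", "medium", or "low"
    questions: set[str] = field(default_factory=set)

def sources_for(cmf: list[DataSource], question: str) -> list[DataSource]:
    """Return every source that can answer `question`, highest confidence first."""
    rank = {"high": 0, "medium": 1, "low": 2}
    matches = [s for s in cmf if question in s.questions]
    # Unknown ratings sort last; sorted() keeps the original order for ties.
    return sorted(matches, key=lambda s: rank.get(s.confidence, len(rank)))

# Hypothetical example CMF.
cmf = [
    DataSource("Malware sandbox", "Sandbox web UI", 365, "high",
               {"is_file_malicious", "file_behavior"}),
    DataSource("Threat intel feed", "TIP portal", 90, "medium",
               {"is_file_malicious", "is_domain_malicious"}),
    DataSource("EDR", "EDR console", 30, "high",
               {"process_execution", "file_hashes_on_host"}),
]

for s in sources_for(cmf, "is_file_malicious"):
    print(s.name, "|", s.location, "|", s.retention_days, "days |", s.confidence)
```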

CMFs can include internal or external data sources, as shown below.

What you include in your CMF will depend on how you intend to use it, what intelligence requirements you want the CMF to fulfill, and what data is available. There is no one-size-fits-all CMF. Instead, you must create a CMF that matches your team's requirements and ensure analysts can use it effectively.

Analysts who do not understand their data sources at a technical level cannot use them to satisfy intelligence requirements or recognize their limitations.


Why Use a Collection Management Framework?

But why bother using a CMF? What tangible benefits does it provide besides getting more use out of Microsoft Excel?

The main advantage a CMF provides is structure. These frameworks let defenders identify and organize their data sources so they are well prepared to respond to cyber threats. Defenders know what data is available to them, what information they can produce from it, and how to generate that information efficiently to satisfy an intelligence requirement during an investigation.

Another advantage of using a CMF is gap analysis. A CMF will allow you to identify what data is available, as well as what you are lacking and need to obtain to fulfill your intelligence requirements. It may show that you have a data source you can query for answers about a domain name, but it may also highlight that you cannot ask any questions about an SSL certificate and need to find a new data source for these questions. 

Ultimately, this gap analysis allows you to assess your preparedness for various threat scenarios and perform strategic planning to fill gaps. This may involve investing in people, processes, or technology if a gap needs to be filled or determining a gap is not a priority based on your threat model. Again, your CMF will be unique to your organization, its intelligence requirements, and the questions it wants answered. 
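A gap analysis can be sketched as a set difference: collect every question your sources can answer, then check each requirement's questions against that set. The example below reuses the domain name and SSL certificate scenario; the source names, question labels, and requirements are hypothetical, and the last two lines show one simple, assumed way to turn the result into a preparedness figure.

```python
# Hypothetical CMF: each data source mapped to the question types it can answer.
cmf = {
    "Passive DNS": {"domain_resolution_history", "domain_first_seen"},
    "WHOIS lookup": {"domain_registrant"},
    "Threat intel feed": {"domain_reputation", "ip_reputation"},
}

# Hypothetical intelligence requirements and the questions each one needs answered.
requirements = {
    "Is this phishing domain malicious?": {"domain_reputation", "domain_first_seen"},
    "Which sites share this SSL certificate?": {"ssl_certificate_history"},
}

def find_gaps(cmf, requirements):
    """Map each requirement to the questions that no data source can answer."""
    answerable = set().union(*cmf.values())
    gaps = {}
    for req, needed in requirements.items():
        missing = needed - answerable
        if missing:
            gaps[req] = sorted(missing)
    return gaps

gaps = find_gaps(cmf, requirements)
for req, missing in gaps.items():
    print(f"GAP: {req} -> no source for {', '.join(missing)}")

# Assumed preparedness metric: share of requirements with no gaps.
print(f"Requirements fully answerable: {1 - len(gaps) / len(requirements):.0%}")
```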

Key Benefits of a Collection Management Framework

  • Structure: Allows you to structure your data sources and what questions you can ask of them.
  • Efficiency: Improves efficiency of investigations by providing a structured and centralized source to help analysts answer questions.
  • Gap analysis: Allows you to identify gaps in data sources, the questions you cannot ask, and intelligence requirements that cannot be fulfilled due to a lack of data.
  • Strategic planning: Identifies areas that need improvement, investment, or prioritization.
  • Metric for preparedness: Helps measure how prepared you are for various threat scenarios based on how well you can answer questions during an investigation or incident.

So, how should you go about creating a CMF? Let’s take a look.


Developing a Collection Management Framework

Although each organization needs a CMF tailored to its individual needs, the core questions that drive a CMF remain the same:

  • What data is collected and from where? 
  • How long is the data stored? 
  • What types of questions can the data answer? 
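The second question, how long data is stored, decides how far back an investigation can look. As an illustration, assuming each source's retention is recorded as a fixed number of days (the sources and periods below are hypothetical), you can check which sources still hold data for a given date:

```python
from datetime import date, timedelta

# Hypothetical retention periods, in days, for a few data sources.
RETENTION_DAYS = {
    "Firewall logs": 30,
    "Windows Event Logs (SIEM)": 90,
    "Proxy logs": 180,
}

def sources_covering(event_day: date, today: date) -> list[str]:
    """Return the sources whose retention window still includes event_day."""
    age = (today - event_day).days
    return [name for name, days in RETENTION_DAYS.items() if age <= days]

today = date(2024, 6, 1)
print(sources_covering(today - timedelta(days=60), today))
# ['Windows Event Logs (SIEM)', 'Proxy logs']
```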

To answer these questions and develop a CMF, you can follow this excellent Dragos whitepaper outlining the five phases required to create and improve a CMF. These phases follow a lifecycle that ensures your CMF is continually updated and improved. Let’s take a look at each phase. 


Phase 1: Develop New Requirements

To create a CMF, you must know what requirements it aims to fulfill and what questions it should help you answer. This can be difficult. Many organizations struggle to create these initial requirements because they fail to understand their business’s risks fully. 

Creating good intelligence requirements is discussed at length in What Are Intelligence Requirements? A Comprehensive Guide, which presents a three-step method for building effective intelligence requirements that satisfy business needs.

To form your initial requirements, here are several methods you can use:

  • Interview business owners: Understanding the challenges key stakeholders face will help you identify requirements specific to your organization.
  • Perform tabletop exercises: These are risk-based scenarios where you “play out” how a real-world incident would unfold and how your organization would respond. This helps your incident response team identify what data is available and what would be required to effectively triage and investigate an incident. This enables you to fill out your CMF and find any gaps.
  • Consult existing documentation: Risk registers, asset inventories, and incident response plans can provide enough information to begin building your initial requirements.
  • Use crown jewel analysis: This approach assesses which assets would have the most business impact if compromised. It allows you to spot the assets, networks, or environments that are critical to business processes and must be protected at all costs. Identifying these assets can then help you create requirements for protecting them. Learn more in this guide.
  • Threat model: A threat model identifies threats likely to impact your organization based on industry, demographics, threat landscape, tradecraft, technology, and other factors. Ensure your threat modeling considers past threats and current or emerging tradecraft (attacker behavior). A good way to do this is through threat profiling.

Example Requirements

  • Do I have a log source that tells me whether our servers are exposed to the Log4j vulnerability?
  • Do I know if large amounts of data are being exfiltrated from my production network? 
  • Can I tell if C2 communication is happening inside my DMZ?

These methods should produce a list of requirements that you can feed into the next phase of the CMF development cycle, where you find what information is available to fulfill these requirements.

Requirements will change over time as new threats emerge, feedback comes in, and your business’s priorities shift. This is good. However, you will often need to define triggers that ensure your requirements are up-to-date and accurate, such as feedback from incident response activities, threat hunts, and the Threat Intelligence Lifecycle. These create or refine requirements based on your organization’s real-world experiences.

Phase 2: Develop a Collection Plan

Phase 2 involves matching your requirements list against your available data sources. For example, if you're assessing whether Log4j impacts you, you will need data sources that tell you which servers you have running, what software they have installed, whether they use any third-party software that could be affected, what a successful exploit would look like, and how long all this data is retained.
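One lightweight way to start a collection plan is to break a requirement into the individual pieces of information it needs and match each piece to the sources that supply it; anything left unmatched is a gap to carry forward. The information needs and source names in this sketch are hypothetical placeholders for your own inventory.

```python
# Hypothetical information needs derived from the Log4j requirement.
log4j_needs = [
    "servers_in_estate",
    "installed_software",
    "third_party_software",
    "exploit_indicators",
]

# Hypothetical data sources and the information each one provides.
sources = {
    "CMDB / asset inventory": {"servers_in_estate"},
    "Software inventory agent": {"installed_software"},
    "Web server access logs": {"exploit_indicators"},
}

def collection_plan(needs, sources):
    """Return {need: [matching sources]}, with an empty list where nothing matches."""
    return {
        need: [name for name, provides in sources.items() if need in provides]
        for need in needs
    }

for need, matched in collection_plan(log4j_needs, sources).items():
    print(f"{need:22} {', '.join(matched) or 'NO SOURCE (gap)'}")
```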

Your collection plan can take any form suitable to your team. It could be a full knowledge base system like Notion that links to resources or a simple Excel spreadsheet that points to where an analyst can find data. Here is a basic spreadsheet you could use to assess the impact of Log4j.

This CMF uses the Cyber Kill Chain to structure the questions you can ask. For instance, the Windows Event Logs can be used to answer questions about Exploitation, Installation, and Actions on Objectives. You could query these to find out if you have seen a successful exploitation of the Log4j vulnerability within the last 60 days.
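If you keep the CMF in code or export it from a script, the Kill Chain layout can be generated as a CSV file that opens in Excel. In this sketch, the Windows Event Logs phases follow the example above, while the 60-day retention and the firewall row are hypothetical values. The helper answers the same kind of question: which sources cover Exploitation and keep at least 60 days of data?

```python
import csv
import io

KILL_CHAIN = ["Reconnaissance", "Weaponization", "Delivery", "Exploitation",
              "Installation", "Command and Control", "Actions on Objectives"]

# Retention values and the firewall row are hypothetical placeholders.
cmf = [
    {"source": "Windows Event Logs", "retention_days": 60,
     "phases": {"Exploitation", "Installation", "Actions on Objectives"}},
    {"source": "Firewall logs", "retention_days": 30,
     "phases": {"Reconnaissance", "Delivery", "Command and Control"}},
]

def to_csv(cmf) -> str:
    """Render the CMF as spreadsheet-style CSV with one column per Kill Chain phase."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Retention (days)", *KILL_CHAIN])
    for row in cmf:
        marks = ["X" if phase in row["phases"] else "" for phase in KILL_CHAIN]
        writer.writerow([row["source"], row["retention_days"], *marks])
    return buf.getvalue()

def sources_for_phase(cmf, phase, lookback_days):
    """Return sources that cover `phase` and retain data for at least lookback_days."""
    return [r["source"] for r in cmf
            if phase in r["phases"] and r["retention_days"] >= lookback_days]

print(to_csv(cmf))
print(sources_for_phase(cmf, "Exploitation", 60))  # ['Windows Event Logs']
```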

This is just one way of structuring your CMF. You could use external data sources to answer requirements, just like you saw with the malware CMF before, or perhaps even populate the CMF with different questions, such as those focusing on the MITRE ATT&CK matrix. These questions can help you align your prevention, detection, and response capabilities with the TTPs used by threat actors. 

Ultimately, how you structure your CMF will depend on your organization’s requirements, your team’s capabilities, and your use cases.

If your CMF is becoming too big and hard to use, try splitting it into different CMFs that focus on specific use cases. For example, you might have one for incident response, another for threat hunting, and a third for querying external data sources.
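A split does not have to mean maintaining separate documents by hand: if each row is tagged with the use cases it supports, the smaller CMFs can be generated from one master list. The tags and sources below are hypothetical, and a source such as EDR can appear in more than one resulting CMF.

```python
from collections import defaultdict

# Hypothetical master CMF rows, each tagged with the use cases it supports.
rows = [
    {"source": "EDR", "use_cases": {"incident_response", "threat_hunting"}},
    {"source": "Proxy logs", "use_cases": {"threat_hunting"}},
    {"source": "Malware repository", "use_cases": {"external_lookup"}},
]

def split_by_use_case(rows):
    """Split one monolithic CMF into smaller CMFs, one per use case tag."""
    split = defaultdict(list)
    for row in rows:
        for use_case in sorted(row["use_cases"]):
            split[use_case].append(row["source"])
    return dict(split)

for use_case, sources in split_by_use_case(rows).items():
    print(use_case, "->", sources)
```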

Phase 3: Implement

At this phase, you have a list of requirements and have identified available data sources to help you fulfill these requirements. Now, you can focus on creating new procedures that use your CMF and identify new data sources that can be leveraged. 

As you begin using your CMF, you will soon find opportunities to improve existing data sources, such as centralizing logging, tuning configurations, and lengthening data retention. You will discover ways to streamline your existing data sources and make it easier for your team to perform their daily operations, preventing bottlenecks and reducing overhead.

Additionally, you may identify new capabilities that data sources offer. This could include advanced logging options that let you build better detections or threat hunting queries, or that reveal opportunities for pivoting or follow-on data collection. Documenting your data sources, their connections, and what questions they can answer allows you to pivot between data sets and find relevant intelligence quickly.

The Implement phase will also help identify intelligence gaps that cannot be filled with your current data sources and require new data sources. This could be simple changes like enabling a Group Policy setting in Microsoft Active Directory to gather PowerShell logs or something more complex like collecting NetFlow data to get better visibility of your network traffic. Gathering this information will help you prioritize future security investments and outline to management what your current security program can and cannot protect against. 

Phase 4: Test

With your data sources operationalized and new procedures in place, you can now test them to ensure they are effective and fit for purpose. At this phase, you can use two metrics to test the completeness of your CMF, quantity and quality of coverage:

  • Quantity of coverage: A complete CMF will allow you to accurately assess how much of your IT estate you have visibility over and where additional resources are required for protection and detection against threats.
  • Quality of coverage: You must also assess the quality of data available to analysts. How devices are configured will affect the retention time, completeness, and overall quality of the log data available. Your CMF needs to consider this, as it can limit the questions that can be asked.
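Both metrics reduce to simple ratios once you record, per asset, which sources cover it and how long its data is kept. This sketch assumes a hypothetical asset inventory and a hypothetical 30-day minimum retention, and it treats retention as the only quality signal; completeness and configuration quality would need additional fields.

```python
# Hypothetical asset inventory: which sources log each asset and how long
# that asset's data is retained (None means no visibility at all).
assets = {
    "web-01":    {"sources": {"EDR", "SIEM"}, "retention_days": 90},
    "web-02":    {"sources": {"EDR"},         "retention_days": 14},
    "db-01":     {"sources": {"SIEM"},        "retention_days": 90},
    "switch-01": {"sources": set(),           "retention_days": None},
}

REQUIRED_RETENTION_DAYS = 30  # assumed organizational minimum

def coverage_metrics(assets, required_days):
    """Return (quantity, quality) of coverage as fractions between 0 and 1."""
    covered = [a for a in assets.values() if a["sources"]]
    quantity = len(covered) / len(assets)
    good = [a for a in covered if a["retention_days"] >= required_days]
    quality = len(good) / len(covered) if covered else 0.0
    return quantity, quality

qty, qual = coverage_metrics(assets, REQUIRED_RETENTION_DAYS)
print(f"Quantity of coverage: {qty:.0%}")  # Quantity of coverage: 75%
print(f"Quality of coverage:  {qual:.0%}")  # Quality of coverage:  67%
```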

The Test phase may lead to prioritizing additional security controls, extending your visibility, and narrowing down the questions your CMF can answer based on organizational constraints (e.g., budget). 

Phase 5: Update the Collection Plan

Your collection requirements and collection plan will grow. New requirements will be added as the threat landscape shifts, and new data sources will be added as your organization reprioritizes. If you don't prune your CMF by regularly reviewing your requirements and data sources, it will become an unusable mess. You must schedule time for this review and create procedures for sharing any changes with all teams that use your CMF.
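A scheduled review is easier when the CMF records when each row was last checked and which requirements it supports. The sketch below, assuming a hypothetical 180-day review cadence and made-up requirement IDs, flags rows that are overdue for review and sources no longer tied to any requirement as pruning candidates.

```python
from datetime import date

# Hypothetical CMF rows with a last-reviewed date and the requirements each supports.
cmf = [
    {"source": "EDR", "last_reviewed": date(2024, 5, 1), "requirements": {"REQ-1"}},
    {"source": "Legacy IDS", "last_reviewed": date(2023, 1, 15), "requirements": set()},
    {"source": "Proxy logs", "last_reviewed": date(2023, 9, 1), "requirements": {"REQ-2"}},
]

REVIEW_INTERVAL_DAYS = 180  # assumed review cadence

def review_queue(cmf, today):
    """Flag rows that are overdue for review or no longer support any requirement."""
    flagged = []
    for row in cmf:
        reasons = []
        if (today - row["last_reviewed"]).days > REVIEW_INTERVAL_DAYS:
            reasons.append("overdue for review")
        if not row["requirements"]:
            reasons.append("no linked requirement (prune candidate)")
        if reasons:
            flagged.append((row["source"], reasons))
    return flagged

for source, reasons in review_queue(cmf, date(2024, 6, 1)):
    print(source, "->", "; ".join(reasons))
```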


Practical Recommendations For Creating a Collection Management Framework

A Collection Management Framework is a great tool for structuring your data collection and empowering analysts to fulfill intelligence requirements rapidly. However, it can be difficult to create one from scratch. Here are some practical recommendations that should keep you on track and ensure your CMF is effective:

  • Split up monolith CMFs: If your CMF is becoming too big and hard to use, split it into multiple CMFs focusing on specific use cases.
  • Schedule time to review your CMF: Having a CMF is just the start. You must keep it up-to-date, add new data sources when they become available, and prune it to save it from becoming a mess. Scheduling time to do this will ensure your CMF is not neglected and forgotten.
  • Keep it simple: Don’t go overboard by making your CMF too detailed. A CMF is supposed to act as a guide that helps analysts answer questions and fulfill requirements. It is not a comprehensive knowledge base.
  • Focus on coverage, then quality: Aim to have visibility of every asset in your environment, be it endpoint devices, servers, cloud assets, network switches, or anything else your organization owns. Getting to 100% coverage is difficult, and it will take time. Once you get there, you can gradually improve the quality of the data you obtain from your assets, which will likely require additional buy-in and investment. 
  • Look for new ways of using data sources: Fully investigate what each of your data sources can provide you. You will be amazed at how much more data you can get from fine-tuning configuration settings or turning on options you didn’t know were available. Always read the manual. 

Conclusion

Congrats, you made it to the end! Now you know what a Collection Management Framework is and some of the benefits of having one. That said, you need to create one to see all the benefits. Get your hands dirty, follow the five-phase process outlined in this article, and keep the practical recommendations in mind as you build one.

A Collection Management Framework is a fundamental pillar for any security operations or threat intelligence team. It ensures you structure your data sources to improve the efficiency of your investigations, identify gaps in coverage, and measure your preparedness for a cyber attack. Not using one is like going down a dark alley with only a flashlight.

Turn on the lights and start discovering what you can defend against, what can be improved, and what you must invest in to ensure your organization is protected!


Frequently Asked Questions

What Are the Components of a Collection Management Framework?

A Collection Management Framework (CMF) should be unique to your organization and its requirements. This means that the components you include will be specific to you. In general, a CMF will consist of the following:

  • Data sources: The data being collected and where it is being collected from.
  • Questions: The types of questions the data can answer.
  • Storage location and duration: Where an analyst can find the data (or query the data from) and how long this data is stored.

Who Is Responsible for Maintaining the Collection Management Framework?

A CMF is generally the responsibility of a team lead. They are ultimately responsible for maintaining and improving a CMF over its life. However, all analysts should aim to contribute new data sources to the CMF that enhance its data completeness, trustworthiness, or accuracy. This may include optimizing current data sources or adding new ones to help analysts during an investigation. It should be a team effort to improve data collection.

What Are the Types of Collection Management Frameworks?

There are two main types of CMFs: internal and external. Internal CMFs will include data sources that map to the organization’s internal assets, such as endpoints, servers, networks, firewalls, etc.

External CMFs will include data sources outside the organization that you either pay to access or that are open source and freely available. These are commonly threat intelligence feeds, malware repositories, or network data (e.g., DNS records, SSL certificates, domain reputation data, etc.).