Data collection is integral to cyber threat intelligence, making your threat intelligence collection sources one of your program’s most important components. Failure to have strong intelligence collection sources will cloud your visibility of threats and prevent you from generating accurate intelligence that bolsters your organization’s cyber defenses.

This guide will teach you what intelligence collection sources are by breaking down the differences between closed and open, technical and human, and internal and external sources. It will then showcase what you can use as a collection source and the potential benefits and drawbacks.

It is vital that you define your intelligence collection sources and streamline the collection process so you and your team can effectively collect, analyze, and disseminate actionable intelligence. Let’s get started learning how!

What are Threat Intelligence Collection Sources?

In cyber threat intelligence (CTI), you structure your work around the six stages of the threat intelligence lifecycle: planning, collection, processing, analysis, dissemination, and feedback. Following this lifecycle allows you to fulfill the intelligence requirements you are tasked with completing for your organization or individual teams.

For instance, an intelligence requirement could be to gather the tactics, techniques, and procedures (TTPs) that a specific threat actor has used during the last six months. This would allow the security operations team to prioritize which security controls they invest time and effort in bolstering.

To fulfill intelligence requirements, you need to collect data. This is the second stage of the threat intelligence lifecycle: collection. At this stage, you gather information that addresses the intelligence requirements you defined in the planning stage by identifying relevant data sources from which to collect information.

A data source is a channel, system, or repository that you can use to gather information about cyber security threats relevant to your organization. Thus, collection sources are simply data sources you can reuse to gather new information repeatedly.

In the intelligence community, there are various collection sources, such as human intelligence (HUMINT), geospatial intelligence (GEOINT), measurement and signature intelligence (MASINT), signals intelligence (SIGINT), open-source intelligence (OSINT), and more. However, in CTI, you can simplify collection sources into two spectrums:

Technical vs. human: Is the information you are collecting from a technical control (e.g., logs) or a human operator (e.g., a person online)?
Open vs. closed: Is the information freely available or behind an access restriction (paywall)?

Types of Cyber Threat Intelligence Collection Sources

Technical Intelligence

This includes all the technical aspects of cyber threats, such as malware, exploits, hacking tools, attacker infrastructure (domains and IP addresses), and TTPs. You will commonly collect this from external CTI feeds (commercial and open source) and your own internal log data.

Human Intelligence

This is information collected through direct human interactions and observations. In CTI, this usually involves interacting with threat actors and affiliates on private messaging groups (e.g., Telegram) and monitoring the dark web, social media, or data breach websites.

Open Source

Open intelligence sources are those that you can collect data from freely. They can be internal (your own log data) or external (open-source CTI feeds).

Closed Source

These are proprietary or restricted sources that are either behind a paywall or have restrictions on who can access them. They include commercial CTI feeds and private messaging platforms like Telegram or Signal.

Now that you know the basics of intelligence collection sources, let’s examine specific examples and how to use them.

Threat Intelligence Collection Sources

At a high level, collection sources can be technical or human and open or closed. Delving into the specifics, they can also be grouped into two main categories based on where the data comes from.

Internal Collection Sources: They contain information specific to your organization.
External Collection Sources: They contain information about threats observed outside your organization.
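
The three ways of describing a source (technical or human, open or closed, internal or external) can be recorded as simple labels on each source. The following is a minimal sketch of that idea; the class names, the `select` helper, and the example entries are illustrative assumptions, not part of any standard or tool.

```python
from dataclasses import dataclass
from enum import Enum


class Nature(Enum):
    TECHNICAL = "technical"
    HUMAN = "human"


class Access(Enum):
    OPEN = "open"
    CLOSED = "closed"


class Origin(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"


@dataclass(frozen=True)
class CollectionSource:
    name: str
    nature: Nature
    access: Access
    origin: Origin


# Hypothetical inventory; labels follow the definitions given in this guide.
SOURCES = [
    CollectionSource("Proxy logs", Nature.TECHNICAL, Access.OPEN, Origin.INTERNAL),
    CollectionSource("Open-source CTI feed", Nature.TECHNICAL, Access.OPEN, Origin.EXTERNAL),
    CollectionSource("Commercial CTI feed", Nature.TECHNICAL, Access.CLOSED, Origin.EXTERNAL),
    CollectionSource("Private Telegram group", Nature.HUMAN, Access.CLOSED, Origin.EXTERNAL),
]


def select(sources, *, nature=None, access=None, origin=None):
    """Return the sources matching every label that was supplied."""
    return [
        s for s in sources
        if (nature is None or s.nature is nature)
        and (access is None or s.access is access)
        and (origin is None or s.origin is origin)
    ]
```

For example, `select(SOURCES, origin=Origin.EXTERNAL, access=Access.CLOSED)` returns the commercial feed and the private messaging group, which is a quick way to see which sources carry a cost or an access restriction.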


Each category of collection source is valuable to an organization, and you will use both to gather the information required to complete the intelligence lifecycle and fulfill intelligence requirements. Let’s start by looking at internal intelligence collection sources.

Internal Intelligence Collection Sources

Internal collection sources are resources within an organization that provide information about threats. This information could come from security incidents raised by your detection tools (SIEM, EDR, etc.), threat hunts you perform as you sift through logs, or anomalies spotted by User and Entity Behavior Analytics (UEBA) systems (or a clever human analyst).

As threat intelligence goes, this is the best type of information to gather because it is the most specific to your organization. It is relevant, timely, actionable, and gold dust for CTI analysts. Unfortunately, a security incident must be allowed to play out for this data to be collected through intrusion analysis.

If your security operations team blocks a threat before it can perform any malicious actions, the information you can gather about its capabilities or the infrastructure it uses is limited, as is the intelligence you can generate. This leads to an ongoing battle between security and CTI analysts about how long to wait before neutralizing a threat.

The security team wants to block and eradicate it instantly, while the CTI team wants to let it play out and gather as much information as possible. Usually, the security team wins because most businesses are risk-averse and want to take the easy win. This is not necessarily a bad strategy. It is just important you communicate to the wider business how this affects the intelligence that can be generated.

One way to overcome the limitations of requiring real security incidents to produce internal data is by using honeypots. These decoy systems are designed to attract, detect, and analyze threats by mimicking your organization’s systems and capturing the TTPs used to attack them. They are very valuable tools for CTI teams.

With the advantages and limitations in mind, here are the main types of internal intelligence collection sources from which you can gather information.

Malware

Malware is a general term for any malicious software you come across. This could be a Word document containing a malicious macro, ransomware, or a post-exploitation tool like Rubeus. It will be one of your major sources of data collection as most intrusions will use some form of malware, either off-the-shelf malware like Cobalt Strike or custom-made, to achieve their objectives.

Many intrusions don’t involve adversaries developing custom malware or hacking tools. Instead, threat actors will use off-the-shelf, commodity malware because building custom malware and tools requires time, skill, and money. This makes the development of custom malware expensive for adversaries.

Public threat intelligence reports often focus heavily on malware. They discuss where they found the malware, what attack campaign it relates to, and how it works. At the end of the report, they include a list of hash values you can use as IOCs to hunt for in your environment to see if you have been affected.

This is okay, but the real value of malware as a data source comes from analyzing what you have found in your environment. By analyzing malware, you can gain valuable insights into TTPs used by attackers that are specific to your environment. TTPs/malware map to the “capability” of a threat actor (using the Diamond Model) and can be used to track an adversary, measure their maturity, and prioritize your cyber defenses.

Diamond Model With Capability Highlighted

That said, mapping an adversary directly to the malware they use is dangerous for two reasons:

Not all adversary capabilities are malware and not all threats will use it.
If you focus on tracking adversaries and their capabilities, you don’t want to fall into the trap of tracking malware instead. Threat actors will use a variety of malware that changes with each operation (e.g., who they are targeting, how they are doing it, etc.). Tracking malware tells you about the malware’s author, not necessarily about the actor using it.

From a technical perspective, you may want to track commodity malware being used and its capabilities. This can help ensure you have the correct technical controls in place as the malware author adds new functionality to their malware, protecting you against various actors. Just ensure you know what you are tracking, and don’t confuse the malware with the threat actor.

Here are the use cases for malware as an intelligence collection source:

You can use malware to pivot to new data sources (network indicators) and identify commonalities between intrusions. This is great for finding additional intelligence and IOCs and tracking adversaries or attack campaigns.
Malware can be used at various stages of the Pyramid of Pain. You can use it to search for hash values quickly, identify artifacts left behind after it is executed for threat hunting, or carefully dissect it and identify its behavior to write detection rules.
Malware zoos are collections of malware samples uploaded to online sandboxes for analysis. These are fertile hunting grounds you can use to collect data about adversaries and build possible detections. For instance, if you identify actor XYZ using the latest version of PlugX, you can search for samples of this malware, download and analyze them, and then write rules to detect it in your environment.
You can parse the configuration data of malware you find in your environment to reveal information about the adversary. This is particularly useful when commodity malware is used to understand why a certain configuration was used and if you can track this decision across intrusions, helping you track the adversary.
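
The simplest of these use cases, searching for known hash values, can be sketched in a few lines. This is an illustrative example only: the function names are hypothetical, and it assumes your IOCs are SHA-256 hashes and that you have read access to the directory you want to sweep.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large samples do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_known_bad(root: Path, known_bad_hashes: set[str]) -> list[Path]:
    """Return files under root whose SHA-256 appears in the IOC set."""
    wanted = {h.lower() for h in known_bad_hashes}  # reports mix upper and lower case
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and sha256_of_file(p) in wanted
    )
```

Hash matching sits at the bottom of the Pyramid of Pain: it is fast to run but breaks as soon as the adversary changes a single byte, which is why the behavioral analysis described above has more lasting value.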

Malware zoos are the most common place to collect and hunt for malware samples, typically by searching them with YARA rules. They are a great external data source if the malware you can find in your own environment is limited.

However, be careful when uploading your own malware to these sites, as you may tip off an adversary or accidentally upload compromising data. Popular zoos include VirusTotal, Malware Bazaar, and theZoo.

Domains and IPs

Domains and Internet Protocol (IP) addresses are the second most common intelligence collection source. These network indicators map to an adversary’s infrastructure on the Diamond Model.

An adversary will use domains and IP addresses for two main reasons:

To establish a command-and-control (C2) channel with the malware they deployed. This is commonly referred to as a C2 server.
To host malware or phishing sites.

These network indicators are as prominent as malware during a cyber attack in today’s interconnected world. Every attacker needs infrastructure to target a victim.

Diamond Model With Infrastructure Highlighted

Understanding how to collect and use domains and IPs effectively is crucial for your threat intelligence team. They can offer many details about a threat actor, including aliases (who registered the domain), geolocation data, associated malware or attack campaigns, and additional network indicators (TLS certificates, IP addresses, URLs, user agents, etc.) to hunt for.

Each of these is a new data point you can pivot to using the indicator lifecycle and then use to generate actionable intelligence.

You can gather domains and IP addresses from external data sources, like threat intelligence feeds or CTI reports, but gathering them from your own data sources (e.g., network logs) is most valuable. Adversaries can quickly change their network infrastructure between or during campaigns. Your internal data sources contain information about the specific network infrastructure used to target your organization, making them the most actionable.

Here are the use cases for domains and IPs as an intelligence collection source:

Domains and IP addresses are a great data source to perform threat hunting with. They allow you to identify suspicious or malicious traffic in your logs quickly.
You can use external network indicators to enrich your current data during intrusion analysis or incident response to track an attack’s origins.
Domains and IP addresses make great pivot points allowing you to uncover more data about an adversary and their infrastructure.
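
As a rough illustration of the threat hunting use case, the sketch below scans log lines for known-bad IP addresses and domains. It is a simplified example under several assumptions: logs are plain text, only IPv4 is handled, the regular expressions are deliberately loose, and the function names are hypothetical. A real hunt would usually run as a query in your SIEM instead.

```python
import ipaddress
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def matches_domain(host: str, bad_domains: set[str]) -> bool:
    """True if host equals a bad domain or is a subdomain of one."""
    host = host.lower()
    return any(host == d or host.endswith("." + d) for d in bad_domains)


def hunt(log_lines, bad_ips, bad_domains):
    """Return (line_number, indicator) pairs for every IOC seen in the logs."""
    bad_ip_set = {ipaddress.ip_address(ip) for ip in bad_ips}
    bad_domain_set = {d.lower() for d in bad_domains}
    hits = []
    for line_no, line in enumerate(log_lines, start=1):
        for candidate in IPV4.findall(line):
            try:
                if ipaddress.ip_address(candidate) in bad_ip_set:
                    hits.append((line_no, candidate))
            except ValueError:
                continue  # looked like an IP (e.g. 999.1.1.1) but is not valid
        for candidate in DOMAIN.findall(line):
            if matches_domain(candidate, bad_domain_set):
                hits.append((line_no, candidate.lower()))
    return hits
```

Matching subdomains as well as exact domains is a design choice: adversaries often rotate subdomains under a single registered domain, so an exact-match hunt can miss related traffic.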

External Intelligence Collection Sources

External intelligence collection sources are outside of your organization. They provide insights on threats and vulnerabilities that are not specific to you but have been spotted being used or exploited by threat actors.

These insights include the latest attack techniques, infrastructure being used to perform attacks, and industry trends, to name a few. This information is valuable, and you should use it to protect your organization from emerging or ongoing threats if it is relevant to you. However, the fact that it is reported by a third party limits the timeliness and how actionable the intelligence generated from this data can be.

For example, for an indicator of compromise (IOC) to appear on a threat report, intelligence feed, or the dark web, it must pass through many people. The analysts who find it during an intrusion must report it, the vendor the analyst works for needs to circulate that report, it’s then added to a threat feed, and, finally, you can pick that indicator up.

There are many steps, and at each one, the timeliness and actionability of that indicator (or data) degrade to the point where you are left with an indicator that is days, weeks, or even months old. During this time, the threat will likely change as adversaries regularly change their atomic indicators (IP addresses, domains, hash values) and tools to evade detection.
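
One way to make this aging explicit is to record when an indicator was first seen and decide how to treat it based on its age. The sketch below is only an illustration: the function name is hypothetical and the shelf-life values are assumptions chosen for the example, not industry standards. Tune them to how quickly you observe each indicator type changing.

```python
from datetime import datetime, timezone

# Illustrative shelf lives in days; these numbers are assumptions, not a standard.
DEFAULT_MAX_AGE_DAYS = {"ip": 14, "domain": 30, "hash": 180}


def triage_indicator(kind: str, first_seen: datetime, now: datetime) -> str:
    """Return 'hunt' for fresh indicators and 'review' for stale or unknown ones."""
    age_days = (now - first_seen).days
    limit = DEFAULT_MAX_AGE_DAYS.get(kind)
    if limit is None:
        return "review"  # unknown indicator type: let an analyst decide
    return "hunt" if age_days <= limit else "review"
```

For instance, with `now = datetime(2024, 6, 1, tzinfo=timezone.utc)`, an IP address first seen on 25 May 2024 is 7 days old and is returned as `"hunt"`, while one first seen on 1 April 2024 is 61 days old and is returned as `"review"`.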

This is not to say external data is useless. It can be very useful to search through your logs for IOCs. You might find something that wasn’t detected by your security tools or was overlooked by your security team. You just need to recognize the limitations of external data and use it as fishing bait or a pivot point rather than a detection rule or key piece of evidence.

Let’s examine the types of external intelligence collection sources you can use to gather data.

Threat Intelligence Feeds

Threat intelligence feeds are the cyber security equivalent of RSS news or social media feeds. They are never-ending lists that provide real-time data about threats and vulnerabilities, usually including a brief description of the threat and its related IOCs.

There are two main types of threat intelligence feeds:

Open-source feeds

These freely available feeds gather data from public news sites, security blogs, social media posts, and public cyber security reports. Also known as open-source intelligence (OSINT), you can use these feeds without restriction. Unfortunately, they tend to be very generic and require effort to filter down for your specific intelligence requirements.

Closed or commercial feeds

These feeds are locked behind a paywall and provided by a threat intelligence vendor or included in an intelligence-sharing community membership, such as an ISAC, CERT, or private group. They tend to be better than open-source feeds because they are more specific to your organization, industry, or intelligence requirements, and the IOCs are timelier.

Threat intelligence feeds are great in theory, and you should use them as an intelligence source. However, it is important to remember that they contain data external to your organization that is often atomic and easily changeable by an adversary. They will allow you to detect generic threats attacking targets en masse but are not specific to you.

To make the most of these feeds, you must take the time to measure their quality. This includes their relevance to your organization, how well they match your intelligence requirements, and the quality of data included (e.g., the number of false positives, how timely the data is, etc.). You should also create a pipeline that verifies the IOCs included in the feed are legitimate and pushes them to your security tools for collection and threat hunting automatically.
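
The quality measures mentioned above can be turned into a few simple numbers per feed. The following is a minimal sketch, assuming you keep a record of each indicator a feed delivered along with an analyst’s verdict; the `FeedIndicator` fields and the `score_feed` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FeedIndicator:
    value: str
    first_seen: datetime    # when the activity was first observed, per the feed
    published: datetime     # when the feed delivered the indicator to you
    false_positive: bool    # analyst verdict after review
    relevant: bool          # True if it maps to one of your intelligence requirements


def score_feed(indicators: list[FeedIndicator]) -> dict[str, float]:
    """Summarize a feed by false positive rate, relevance rate, and delivery lag."""
    total = len(indicators)
    if total == 0:
        return {"false_positive_rate": 0.0, "relevance_rate": 0.0, "mean_lag_hours": 0.0}
    lag_seconds = sum((i.published - i.first_seen).total_seconds() for i in indicators)
    return {
        "false_positive_rate": sum(i.false_positive for i in indicators) / total,
        "relevance_rate": sum(i.relevant for i in indicators) / total,
        "mean_lag_hours": lag_seconds / total / 3600,
    }
```

Comparing these numbers across feeds over the same period gives you a defensible basis for dropping a noisy feed or justifying the cost of a commercial one.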

A separate guide covers how to build an automated IOC pipeline using Python, MISP, and CrowdStrike Falcon.

Here are the use cases for threat intelligence feeds as an intelligence collection source:

Curated threat intelligence feeds allow you to automate the ingestion, processing, and dissemination of threat intelligence.
You can use multiple feeds focusing on different threats to give a broader picture of the threat landscape.
Threat feeds can be paid or free, giving you plenty of choice about what kind of data you choose to ingest.
You can create your own threat intelligence feeds for ultimate control using news aggregators like Feedly and Inoreader.

TLS Certificates

Transport Layer Security (TLS) certificates are digital certificates that servers present to establish encrypted communication over computer networks. They tie a cryptographic key pair to one or more domain names (and, in scan data, to the IP addresses and web servers that present them) and can reveal significant information about the infrastructure of malicious actors.


Gathering TLS certificates reveals the following key intelligence:

Certificate Details: Information about the certificate holder, issuing CA, validity period, and cryptographic keys.
Domain and IP Associations: Data linking TLS certificates to specific domains and IP addresses used by threat actors.
Certificate Anomalies: Detection of unusual or suspicious patterns in certificate issuance and usage, such as self-signed certificates or certificates with short validity periods. This can indicate the certificate is being used to host a malicious site.
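
The anomaly checks described above can be sketched against certificate fields you have already extracted. This example is illustrative: `CertRecord` and `certificate_flags` are hypothetical names, the 30-day threshold is an assumption, and comparing subject to issuer is only a heuristic for self-signed certificates (a strict check would verify the signature).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CertRecord:
    subject: str
    issuer: str
    not_before: datetime
    not_after: datetime


def certificate_flags(cert: CertRecord, min_validity: timedelta = timedelta(days=30)) -> list[str]:
    """Return a list of anomalies worth a closer look; an empty list means none found."""
    flags = []
    if cert.subject == cert.issuer:
        # Heuristic only: identical subject and issuer suggests a self-signed certificate.
        flags.append("possibly self-signed")
    if cert.not_after - cert.not_before < min_validity:
        flags.append("short validity period")
    return flags
```

Treat the flags as hunting leads rather than verdicts. Plenty of legitimate services use self-signed or short-lived certificates, so these checks are most useful when combined with other indicators.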

You can collect TLS certificates from active scans of the Internet or through threat intelligence feeds. TLS certificates are higher up the Pyramid of Pain than domains and IP addresses, requiring more effort and cost for the adversary to change. It is very common for threat actors to reuse the same TLS certificate across campaigns or make cryptographic mistakes that reveal operational details.

Tools to scan and gather TLS certificates include Google’s Certificate Transparency Project, CertSpotter, and Censys. You can then use tools like Wireshark and Zeek to capture, compare, and analyze SSL/TLS traffic.

Once your internal data sources and threat intelligence feeds are set up, establishing the collection of TLS certificates is often the next step for most organizations.

Here are the use cases for TLS certificates as an intelligence collection source:

You can use TLS certificate data to identify and block malicious websites and servers before they can cause harm.
TLS certificate data can be used to expand your list of IOCs, widen your threat hunts, and enrich your current data set.
TLS certificates are a great data point to track threat actors and attack campaigns as they are often reused due to the time and cost of reproducing them.
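
Because certificates are reused, a certificate’s fingerprint makes a convenient pivot key. The sketch below groups hosts by the SHA-256 fingerprint of the certificate they presented. It assumes you already have the DER-encoded certificate bytes for each host (for example, from scan data); the function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def fingerprint(der_bytes: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()


def hosts_by_certificate(observations):
    """Group hosts by certificate and keep only certificates seen on 2+ hosts.

    observations: iterable of (host, der_bytes) pairs.
    """
    grouped = defaultdict(set)
    for host, der_bytes in observations:
        grouped[fingerprint(der_bytes)].add(host)
    return {fp: sorted(hosts) for fp, hosts in grouped.items() if len(hosts) > 1}
```

Starting from one known-malicious host, every other host sharing its certificate fingerprint becomes a candidate for enrichment and threat hunting. Shared hosting and content delivery networks also reuse certificates, so validate each pivot before treating it as adversary infrastructure.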

Dark Web Monitoring

Dark web monitoring is a relatively new intelligence collection source. It involves observing forums, marketplaces, and other underground communities not on the surface web for early warnings of potential breaches, data leaks, and emerging threats.

Cybercriminals often use the dark web for nefarious activities, including selling stolen data, bragging about breaching an organization, and distributing malware. Here, you will find the latest developments from the criminal underworld, making it a fertile source of information.

Unfortunately, dark web monitoring can be difficult for three main reasons:

Access and Anonymity: Cybercriminals will vet newcomers, making it difficult to access certain groups where capabilities or data are being sold.
Volume and Noise: Filtering for content relevant to your organization in a sea of illicit activity can be challenging.
Ethical and Legal Considerations: Setting up your own dark web monitoring program requires navigating the legal restrictions and ethical considerations of engaging with cyber criminals.
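
The volume and noise problem is usually tackled first with a watchlist of terms that matter to your organization, such as brand names and domains. The following is a minimal sketch of that filtering step, assuming the posts have already been collected lawfully as plain text; the function names and the example terms are hypothetical.

```python
import re


def build_matcher(watchlist):
    """Compile a case-insensitive pattern that matches whole watchlist terms."""
    if not watchlist:
        raise ValueError("watchlist must contain at least one term")
    # Longest terms first so 'examplecorp.test' wins over 'examplecorp'.
    escaped = sorted((re.escape(term) for term in watchlist), key=len, reverse=True)
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)


def relevant_posts(posts, watchlist):
    """Return (post, matched_terms) for every post that mentions a watchlist term."""
    matcher = build_matcher(watchlist)
    results = []
    for post in posts:
        terms = sorted({m.group(0).lower() for m in matcher.finditer(post)})
        if terms:
            results.append((post, terms))
    return results
```

Keyword matching like this only reduces the pile an analyst has to read. It will miss misspellings, slang, and deliberate obfuscation, which is part of why many teams rely on vendors for this collection source.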

To overcome these challenges, many threat intelligence vendors, such as ZeroFox, Rapid7, and Redscan, offer products or services for monitoring the dark web.

Although dark web monitoring can be an excellent collection source, it is an advanced capability you should only consider acquiring once you have solidified your other collection sources (malware, domains and IPs, TLS certificates, and threat intelligence feeds). These traditional collection sources offer more value for the time invested and don’t require expensive external services.

Here are the use cases for dark web monitoring as an intelligence collection source:

Data from the dark web is often on the bleeding edge, covering emerging threats, future attacks, and recent data breaches.
You can gather a wide variety of intelligence from the dark web, including vulnerabilities that are being exploited, malware for sale, emerging threat actors and attack campaigns, leaked credentials, and more!
Dark web monitoring helps you proactively defend your organization by addressing potential risks and threats in real-time, rather than waiting for traditional detection tools or the media to break a news story.

Private Message Groups

Private message groups are the most direct source of intelligence you can collect. This activity involves actively participating in or observing the communications of threat actors in groups not accessible through public forums.

These private groups are typically hosted on platforms like Telegram, Discord, WhatsApp, and Signal and offer early warnings of emerging threats, planned attacks, and new techniques.

Like dark web monitoring, private message groups are difficult to enter and effectively monitor. Cybercriminals will often vet newcomers and take care not to incriminate themselves, and there is a lot of noise to filter through. Thankfully, your aim is not to unmask these criminals but to track their capabilities and potential threats.

Again, several platforms, including Intel471 Titan and Recorded Future, can help you do this. However, it should be low on your priority list as an intelligence collection source.

Here are the use cases for private message groups as an intelligence collection source:

Private message groups provide the earliest warning signs of a targeted attack, giving you time to strengthen your defenses before an attack happens.
Unlike all the other intelligence collection sources discussed, private message groups allow you to directly inject yourself into the intelligence process and deceive an adversary.
Like dark web monitoring, anything can be discussed in a private message group, including vulnerabilities, malware, threat actors and attack campaigns, leaked credentials, etc.

Conclusion

Data collection is crucial for producing and sharing good cyber threat intelligence, making your threat intelligence collection sources key components.

This guide has taught you what intelligence collection sources are, shown you examples of sources you can use, and discussed the strengths and weaknesses of each. You should now have enough knowledge to start defining your intelligence collection sources, refining your collection process, and recognizing how to use your data better.

Remember that you don’t need to immediately use all the collection sources discussed. You just need to accurately define where you are gathering your intelligence and what it can be used for so you can fulfill your intelligence requirements. A Collection Management Framework (CMF) is a great tool for doing this.
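
At its simplest, a CMF is a table that maps each collection source to the kinds of data it can provide, so you can see which sources answer a requirement and where you have gaps. The sketch below shows that idea with a plain dictionary. The entries are drawn from the sources discussed in this guide, but the structure, data type labels, and function names are illustrative assumptions rather than a formal CMF template.

```python
# Illustrative CMF: source name -> where it comes from and what data it yields.
CMF = {
    "Malware from internal incidents": {"origin": "internal", "data": {"hash", "ttp", "malware_config"}},
    "Network logs": {"origin": "internal", "data": {"domain", "ip"}},
    "Threat intelligence feeds": {"origin": "external", "data": {"hash", "domain", "ip"}},
    "TLS certificate scans": {"origin": "external", "data": {"certificate", "domain", "ip"}},
    "Dark web monitoring": {"origin": "external", "data": {"leaked_credential", "exploited_vulnerability"}},
}


def sources_for(data_type, cmf=CMF):
    """List the sources that can provide a given type of data."""
    return sorted(name for name, row in cmf.items() if data_type in row["data"])


def collection_gaps(required_data_types, cmf=CMF):
    """List the required data types that no current source provides."""
    return sorted(d for d in required_data_types if not sources_for(d, cmf))
```

With this inventory, `sources_for("domain")` returns network logs, TLS certificate scans, and threat intelligence feeds, while `collection_gaps({"ttp", "insider_report"})` returns `["insider_report"]`, highlighting a requirement you cannot currently fulfill.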

Frequently Asked Questions

What Is a Source of Threat Intelligence?

A threat intelligence source produces or holds data that you can use to better understand a cyber threat. It can be something internal to your organization (e.g., malware found on systems or network indicators found in your logs) or external (e.g., TLS certificates found on the web or a dark web forum).

It can also be technical (e.g., an IP address), human (e.g., some messaging in a private group), open (freely available), or closed (restricted access). It is a threat intelligence source if you can use it to gain insights about a threat.

What Is an Example of Threat Intelligence?

An example of threat intelligence could be a malicious Internet Protocol (IP) address. An IP address uniquely identifies a network-connected device so it can connect with other devices. Threat actors use command-and-control (C2) servers to control their malware, and each of these servers has an IP address.

If this IP address is seen in an attack campaign, a threat intelligence vendor will categorize it as malicious, and you can use it as a piece of threat intelligence (an indicator) to better inform key decision-makers. They may then block this IP address or perform a threat hunt to see if any employee workstation has communicated with it. The action chosen will be based on the Courses of Action (CoA) matrix.

What Is a Threat Source?

A threat source is anything that can harm an organization’s assets, operations, or data by affecting its confidentiality, integrity, or availability (CIA triad). Threat sources can be human (malicious insiders or cybercriminals), technological (malware and vulnerabilities), organizational (third-party or supply chain risks), or environmental (natural disasters). The threat intelligence collection sources discussed in this guide are used to collect data about threat sources.

What Are the Types of Threat Intelligence Sources?

There are three main spectrums that threat intelligence sources exist on:

Open vs. Closed: Whether access to the information is freely available or restricted by a paywall or other barrier.
Technical vs. Human: Whether the data is collected from technical controls (e.g., malware, exploits, hacking tools, attacker infrastructure, etc.) or through direct human interactions and observations (e.g., private message groups like Telegram).
Internal vs. External: Whether the data comes from inside your organization (e.g., within your data logs) or outside your organization (e.g., third-party logs).

Each threat intelligence source sits somewhere on all three of these spectrums. For example, a threat intelligence feed could be open, technical, and external.