Python Threat Hunting Tools Part 7 — Parsing CSV

Welcome back to this series on building threat hunting tools. In this series, I will be showcasing a variety of threat hunting tools that you can use to hunt for threats, automate tedious processes, and extend to create your own toolkit!

The majority of these tools will be simple, with a focus on being easy to understand and implement. This is so that you, the reader, can learn from these tools and begin to develop your own. There will be no cookie-cutter tutorial on programming fundamentals like data types or control structures; instead, this series focuses on the practical implementation of scripting through small projects.

It is encouraged that you play with these scripts, figure out ways to break or extend them, and try to improve on their basic design to fit your needs. I find this the best way to learn any new programming language/concept and, certainly, the best way to derive value!

In this installment, we will look at extracting data from a CSV file using Python.

What is a CSV File?

A Comma-Separated Values (CSV) file is a plain text file that stores tabular data, using commas to separate the values in each row. It provides a simple and portable way to structure data.

Each column is separated by a comma and each row is separated by a newline character.

Name,Email,Phone
John Doe,[email protected],555-1234
Jane Smith,[email protected],555-5678

The data contained in a CSV file usually consists of a header row (Name,Email,Phone), which is followed by records (each line) that have attributes or fields (each value in the row). For instance, the record for John Doe has the attributes Name (John Doe), Email ([email protected]), and Phone (555-1234).
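To make the header/record/field terminology concrete, here is a minimal sketch that parses a small CSV held in memory. The sample data and email addresses are hypothetical placeholders, and `io.StringIO` simply stands in for a file on disk.

```python
import csv
import io

# Hypothetical sample data, held in memory instead of a file on disk
sample = (
    "Name,Email,Phone\n"
    "John Doe,john@example.com,555-1234\n"
    "Jane Smith,jane@example.com,555-5678\n"
)

reader = csv.reader(io.StringIO(sample))
header = next(reader)    # first row: the column names
records = list(reader)   # remaining rows: one list of strings per record

print(header)
for record in records:
    # pair each field with its column name to show the attributes
    print(dict(zip(header, record)))
```

Note that every field comes back as a string, even ones that look like numbers.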

CSV files are widely used in cyber security for reporting and tracking. They allow analysts to structure data about threats, vulnerabilities, hunts, and so on, that can then easily be shared with colleagues. An analyst can also collate this data and perform statistical analysis to identify trends and patterns in the data. This is common in cyber threat intelligence to get a clear picture of the threat landscape.

The Problem

A colleague of ours has used the revised version of our Python web scraping script from Python Threat Hunting Tools: Part 5 — Command Line Arguments to capture a list of high-severity vulnerabilities published by the Cybersecurity and Infrastructure Security Agency (CISA). They used the command line argument --csv to save this list to a CSV file and have now shared this file with us.

CISA defines a high-severity vulnerability as any vulnerability that scores 7.0 or above on the Common Vulnerability Scoring System (CVSS). As busy security analysts, we need to focus on the most critical vulnerabilities first, such as those that can lead to Remote Code Execution (RCE). To do this, we only want a list of vulnerabilities that have a CVSS score of 9.0 or more.

How can we do this in Python using the CSV file our colleague shared?

The Solution

To solve this problem, we need to parse the CSV file our colleague shared, keep only the vulnerabilities that have a CVSS score of 9.0 or more, and then create a new CSV file that contains just these most severe vulnerabilities.

The Python standard library comes to our rescue here. It has a built-in module called csv that allows us to read and write CSV files. It parses each row of a CSV file into a Python list of strings, from which we can then select the data we need and re-compile our vulnerability list.

Let’s see how that is done!

First, we need to import the Python modules we will be using in this script. We will use the csv module and the argparse module.

Python
import csv
import argparse

Our script will let the user specify the name of the CISA vulnerabilities report they want to use on the command line. To do this, we need to parse this filename from the command line using the argparse module.

Python
# get csv file from command line using argparse
parser = argparse.ArgumentParser()
parser.add_argument("csv_file", help="CSV file (needs to be in same directory)")
args = parser.parse_args()
csv_file = args.csv_file
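If you want to see what argparse produces without running the script from a terminal, one option is to pass the arguments as a list. This is only a sketch: the filename `cisa_report.csv` and the parser description are hypothetical and are not part of the original script.

```python
import argparse

parser = argparse.ArgumentParser(description="Filter a vulnerability CSV report")
parser.add_argument("csv_file", help="CSV file (needs to be in same directory)")

# The list stands in for what the user would type after the script name
args = parser.parse_args(["cisa_report.csv"])  # hypothetical filename
print(args.csv_file)
```

Calling `parse_args()` with no arguments, as the script does, reads the values from the real command line instead.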

Now we can open this file and begin reading its contents with the CSV reader object supplied by the csv module.

Python
# open csv file
with open(csv_file, 'r', newline='') as file:
   # create CSV reader object
   reader = csv.reader(file)
   …

We want to pick out all the critical vulnerabilities in this CSV file (vulnerabilities that have a CVSS score of 9.0 or more). So we need to create a Python list to hold these critical vulnerabilities (critical_vulns) and loop through all the rows in the CSV file.

Python
# create a list to hold critical vulns (9+ cvss)
critical_vulns = []

# loop through rows
for row in reader:
   …

The header row of the CSV file does not contain a score, so we skip the CVSS check for it and append it straight to our critical_vulns list. This saves us from re-typing the column names later.

Next, we need to convert the CVSS score in the file into a floating point number so we can determine if it’s greater than or equal to 9.0 using a Boolean comparison. This is done by casting (changing the data type) from a string (its original type) into a floating point type with the built-in float() function. Now we can make this greater than or equal to comparison and append the entire row to our list of critical vulnerabilities if True.

Python
# skip header row
if row[4] == "CVSS":
   critical_vulns.append(row)
   continue

# check cvss score
if float(row[4]) >= 9.0:
   critical_vulns.append(row)

The CVSS score is the fifth column in the CSV file. However, Python starts counting from 0, so we need to use index 4 to get it from each row.
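A hard-coded index of 4 works for this report, but it will break if the columns are ever reordered, and float() raises a ValueError on a blank score. Below is a rough sketch of one way to guard against both. The sample rows, the column names other than CVSS, and the `parse_score` helper are all hypothetical.

```python
import csv
import io

# Hypothetical report; the column names other than CVSS are made up
sample = (
    "CVE,Vendor,Product,Published,CVSS\n"
    "CVE-0000-0001,Acme,Widget,2023-01-01,9.8\n"
    "CVE-0000-0002,Acme,Gadget,2023-01-02,7.5\n"
    "CVE-0000-0003,Acme,Gizmo,2023-01-03,\n"
)

def parse_score(value):
    """Return the score as a float, or None if the field is blank or not numeric."""
    try:
        return float(value)
    except ValueError:
        return None

reader = csv.reader(io.StringIO(sample))
header = next(reader)
cvss_index = header.index("CVSS")  # look the column up by name, not position

critical_vulns = [header]
for row in reader:
    score = parse_score(row[cvss_index])
    if score is not None and score >= 9.0:
        critical_vulns.append(row)

print(critical_vulns)
```

Reading the header with `next(reader)` before the loop also removes the need to check for the header row on every iteration.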

Finally, we can create a new filename and write all the critical vulnerabilities to a new CSV file. This is done using the CSV writer object supplied by the csv module and its .writerows() method. All we need to do is supply the list of critical vulnerabilities we want in our new CSV file.

Python
# create new CSV file containing only critical vulns
filename = '[CRITICAL] ' + csv_file

with open(filename, 'w', newline='') as file:
   # create CSV writer object
   writer = csv.writer(file)

   # write critical vulns to file
   writer.writerows(critical_vulns)
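To see exactly what the writer produces, the sketch below writes to an in-memory buffer instead of a file. The rows are hypothetical. Notice that the writer quotes any field containing a comma for us, and that it ends each row with `\r\n` by default, which is why the file is opened with newline='' so Python does not translate the line endings a second time.

```python
import csv
import io

# Hypothetical rows: header first, then the records to keep
critical_vulns = [
    ["CVE", "Vendor", "CVSS"],
    ["CVE-0000-0001", "Acme, Inc.", "9.8"],
]

buffer = io.StringIO(newline="")  # in-memory stand-in for the output file
writer = csv.writer(buffer)
writer.writerows(critical_vulns)  # one call writes every row

output = buffer.getvalue()
print(repr(output))
```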

Complete!

We can now scrape vulnerabilities from the latest CISA reports and then filter for only the most critical ones so that we can prioritize our remediation efforts. That's not all, though: the code demonstrated here can be adapted to parse any CSV file you come across. This is very useful in the world of cyber security, where CSV files are used everywhere!

The full script can be found on GitHub and will look like this:

Python
import csv
import argparse

# get csv file from command line using argparse
parser = argparse.ArgumentParser()
parser.add_argument("csv_file", help="CSV file (needs to be in same directory)")
args = parser.parse_args()
csv_file = args.csv_file

# open csv file
with open(csv_file, 'r', newline='') as file:
   # create CSV reader object
   reader = csv.reader(file)

   # create a list to hold critical vulns (9+ cvss)
   critical_vulns = []

   # loop through rows
   for row in reader:
       # skip header row
       if row[4] == "CVSS":
           critical_vulns.append(row)
           continue

       # check cvss score
       if float(row[4]) >= 9.0:
           critical_vulns.append(row)

# create new CSV file containing only critical vulns
filename = '[CRITICAL] ' + csv_file

with open(filename, 'w', newline='') as file:
   # create CSV writer object
   writer = csv.writer(file)

   # write critical vulns to file
   writer.writerows(critical_vulns)

print(f"Finished writing {len(critical_vulns) - 1} critical vulnerabilities to {filename}")

Try exploring the ways you can improve or adapt this script. Currently, the CSV file must be in the same directory as the script to create the critical list of vulnerabilities. 

  • Can you make it so this CSV file can be anywhere on the filesystem? 
  • How about filtering for other things? 
  • Can you filter on the other fields included in the CISA report (Product name, Vendor name, and date Published)?
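As a possible starting point for the first challenge, the sketch below uses pathlib to place the output file next to the input file, wherever that lives. The `critical_path` helper and the example path are hypothetical, and this is only one of several ways to approach it.

```python
from pathlib import Path

def critical_path(csv_file):
    """Build the output path next to the input file, wherever that is."""
    source = Path(csv_file)
    # with_name() swaps only the final component and keeps the directory
    return source.with_name("[CRITICAL] " + source.name)

print(critical_path("reports/cisa.csv"))  # hypothetical input path
```

The script currently prepends the prefix to the whole argument, so a path such as reports/cisa.csv would have the prefix attached to the directory part rather than the filename. Changing only the final path component avoids that.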

Next time in this series we will look at parsing JSON files, another common file format in cyber security. This will allow us to extend our data extraction and analysis capabilities!

Discover more in the Python Threat Hunting Tools series!