Python Threat Hunting Tools: Part 5 — Command Line Arguments

Welcome back to this series on building threat hunting tools!

In this series, I will be showcasing a variety of threat hunting tools that you can use to hunt for threats, automate tedious processes, and extend to create your own toolkit! The majority of these tools will be simple with a focus on being easy to understand and implement. This is so that you, the reader, can learn from these tools and begin to develop your own. 

There will be no cookie-cutter tutorial on programming fundamentals like data types, control structures, etc. Instead, this series will focus on the practical implementation of scripting/programming through small projects. You are encouraged to play with these scripts, figure out ways to break or extend them, and try to improve on their basic design to fit your needs. I find this the best way to learn any new programming language/concept and, certainly, the best way to derive value!

In this installment we will look at how we can give our threat hunting tools command line arguments that modify their behavior when they are executed.

What are Command Line Arguments?

Command line arguments are parameters or options that are passed to a program or script when it is executed via the command line. They let the user of the script modify behavior or configure its settings, typically by providing inputs (e.g. files, strings, the output of another command) or flags (e.g. -h for help). If an input is provided via command line arguments then it is parsed by the script into a usable format and then used somewhere within the script.

Command line arguments are often used to automate repetitive tasks, configure programs, or to pass inputs that are too complex to be entered via a GUI. They are also used to build more complex scripts and programs that can be executed with a variety of options and inputs, making them more versatile and powerful.

The following command executes myscript.py with two arguments: file1.txt and -v. The -v argument is typically referred to as a “flag” and is best thought of as changing the script’s settings, in this case enabling verbose output. The file1.txt argument, by contrast, is an input on which the script will perform some action.

Bash
python myscript.py file1.txt -v

The most popular Python module for parsing command line arguments is argparse. This module allows you to create an ArgumentParser object to which you add arguments using the .add_argument() method. 

Once you have added your arguments, you call the .parse_args() method and the module takes care of the low-level work of collecting, parsing, and storing the command line arguments provided by the script’s user. The method returns an object (commonly assigned to a variable named args), and you access each argument as an attribute of that object (e.g. args.<argument_name>).

It is a simple and intuitive module to use that I highly recommend!
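
As a minimal sketch of the workflow described above, here is how a script like myscript.py might define one input argument and one flag. The argument names file and --verbose are assumptions chosen for illustration, and an explicit argument list is passed to parse_args() so the example runs without a real command line.

```python
import argparse

# build the parser (the description text is illustrative)
parser = argparse.ArgumentParser(description="example script")

# positional input argument: the file the script will work on
parser.add_argument("file", help="file to process")

# optional flag: stored as True when -v/--verbose is present, otherwise False
parser.add_argument("-v", "--verbose", action="store_true", help="verbose output")

# equivalent to running: python myscript.py file1.txt -v
args = parser.parse_args(["file1.txt", "-v"])

# parsed values are attributes of the object returned by parse_args()
print(args.file)
print(args.verbose)
```

In a real script you would call parser.parse_args() with no arguments so that it reads whatever the user typed on the command line.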

The Problem

In a previous installment of this series (Python Threat Hunting Tools: Part 2 — Web Scraping), we created a web scraping script that used Beautiful Soup to get a list of high-severity vulnerabilities published by the Cybersecurity and Infrastructure Security Agency (CISA). This script was awesome and gave us a CSV file containing relevant information about each vulnerability. However, it was not very flexible.

What if a user wanted the output in text or JSON format instead?

Let’s refactor this script to give the user a few more options in terms of output. That way they can incorporate the script into their own vulnerability management process, perhaps one that doesn’t use CSV files.

The Solution

To start let’s import the argparse module into our original web scraping script (which can be found in its entirety here):

Python
# import necessary Python modules
import requests, csv
from bs4 import BeautifulSoup
import argparse, json

[redacted]

We have also imported the json Python module here which we will use to output data in JSON format later.

Now we can create our argument parser object. I have named mine parser.

Python
parser = argparse.ArgumentParser()

Let’s add a few output options to the argument parser so the user can choose different formats for the vulnerability information the script scrapes. The optional action parameter is used to store the argument as a Boolean value (e.g. if --csv is present then store as True), while the help parameter displays a nice message to the user when reading the script’s documentation (e.g. with myscript.py -h).

Python
parser = argparse.ArgumentParser()
parser.add_argument('--csv', action='store_true', help='output in CSV format')
parser.add_argument('--json', action='store_true', help='output in JSON format')
parser.add_argument('--txt', action='store_true', help='output in text format')
args = parser.parse_args()

Perfect! Our argument parser now has three arguments that the user can specify on the command line when invoking this script:

  • --csv which will output the data in CSV format. This was the previous behavior of the script.
  • --json which will output the data in JavaScript Object Notation (JSON). This is a standard text-based format for representing structured data based on JavaScript object syntax and is commonly used for transmitting data over the web. Pretty handy if the user of our script wants to send this information to someone or to a web API.
  • --txt which will output the data in standard text format. This is useful if the user needs to search for something within the vulnerability data (e.g. using grep) or needs a quick visual representation of the data without using a GUI application.
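
To see what these three flags look like once parsed, the sketch below feeds the parser explicit argument lists instead of a real command line. The variable name no_flags is an assumption for illustration only.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--csv', action='store_true', help='output in CSV format')
parser.add_argument('--json', action='store_true', help='output in JSON format')
parser.add_argument('--txt', action='store_true', help='output in text format')

# simulate: python myscript.py --json
args = parser.parse_args(['--json'])
print(args)  # Namespace(csv=False, json=True, txt=False)

# simulate: python myscript.py (no flags at all)
no_flags = parser.parse_args([])
print(vars(no_flags))  # every flag defaults to False
```

Because store_true defaults each flag to False, the script can safely test args.csv, args.json, and args.txt even when the user supplies none of them.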

With our arguments defined, we now need to add the functionality to our web scraping script.

I have restructured our original web scraping script to now include three functions for outputting the scraped data. These are output_csv, output_json, and output_txt. Each of these functions takes in the list vulns (which stores the vulnerability data) and appends a file extension to the global FILENAME variable to build a format-specific output filename. For instance, CISA vulnerabilities - April 3, 2023 becomes CISA vulnerabilities - April 3, 2023.csv.

The output_csv function simply encapsulates the code we created previously for outputting scraped data in CSV form:

Python
def output_csv(vulns):
   # edit filename to be a CSV
   csv_file = FILENAME + ".csv"
   
   # create a CSV file
   with open(csv_file, "w", encoding='UTF8', newline='') as f:
       # create csv writer to write data to file
       writer = csv.writer(f)
       # write header row
       writer.writerow(HEADER_ROW)

       # write vulnerabilities
       for vuln in vulns:
           data_row = [vuln['product'], vuln['vendor'], vuln['description'], vuln['published'],vuln['cvss'], vuln["cve"], vuln['reference']]
           writer.writerow(data_row)

The output_json function takes the list of vulnerabilities (vulns), each of which is a Python dictionary. It creates an appropriate filename by appending the .json extension and opens this file for writing. To write the list of dictionaries to this JSON file, it uses the json module’s dump() function.

Python
def output_json(vulns):
   # edit filename to be a JSON
   json_file = FILENAME + ".json"

   # create a JSON file
   with open(json_file, "w", encoding='UTF8') as f:
       # write the data to the file in JSON format
       json.dump(vulns, f, indent=2)
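
If you want to check what json.dump() produces without scraping anything, a small round trip like the one below can help. It is only a sketch: the record is made-up placeholder data that mirrors the keys used in vulns, and an in-memory buffer stands in for the real output file.

```python
import io
import json

# made-up placeholder record with the same keys as the dictionaries in vulns
sample_vulns = [
    {
        "product": "example-product",
        "vendor": "example-vendor",
        "description": "placeholder description",
        "published": "2023-01-01",
        "cvss": "9.8",
        "cve": "CVE-0000-0000",
        "reference": "https://example.com/advisory",
    }
]

# write to an in-memory text buffer instead of a file on disk
buffer = io.StringIO()
json.dump(sample_vulns, buffer, indent=2)

# parse the JSON text back into Python objects and compare
loaded = json.loads(buffer.getvalue())
print(loaded == sample_vulns)
```

The indent=2 argument only affects readability; the data that comes back from json.loads() is the same list of dictionaries that went in.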

The output_txt function, like the other functions, takes the list of vulnerabilities (vulns) and creates an appropriate filename. Next, it opens the file for writing and joins the HEADER_ROW list into a single string, which it writes to the file. Then a divider is added and each vulnerability in vulns is written out using string concatenation.

Python
def output_txt(vulns):
   # edit filename to be a TXT
   txt_file = FILENAME + ".txt"

   # create a TXT file
   with open(txt_file, "w", encoding='UTF8') as f:
       # write header row
       f.write(" |\t".join(HEADER_ROW) + "\n")
       f.write("-" * 80 + "\n")

       # write the data to the file in text format
       for vuln in vulns:
           data = vuln['product'] + " | " + vuln['vendor'] + " | " + vuln['description'] + " | " + vuln['published'] + " | " + vuln['cvss'] + " | " + vuln['cve'] + " | " + vuln['reference']

           f.write(data + "\n\n")

Once our script has finished scraping, we need to decide which format to output the data in. We do this by using a simple if statement that checks if each of the possible output variables is True (remember argparse stores these variables as True or False depending on if they are present on the command line). If True then we call the relevant output function to generate that data file.

Python
# decide on output format
if args.csv:
   output_csv(vulns)
elif args.json:
   output_json(vulns)
elif args.txt:
   output_txt(vulns)
else:
   print(vulns)
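
One consequence of the if/elif chain is worth seeing in action: if the user passes more than one flag, only the first matching branch runs. The sketch below mirrors the chain with a hypothetical helper, choose_format, that returns a label instead of writing a file.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--csv', action='store_true', help='output in CSV format')
parser.add_argument('--json', action='store_true', help='output in JSON format')
parser.add_argument('--txt', action='store_true', help='output in text format')

def choose_format(args):
    """Hypothetical helper: return the branch the if/elif chain would take."""
    if args.csv:
        return "csv"
    elif args.json:
        return "json"
    elif args.txt:
        return "txt"
    else:
        return "print"

# both flags given: the chain checks --csv first, so --txt is ignored
both = choose_format(parser.parse_args(["--txt", "--csv"]))

# no flags given: the chain falls through to the else branch
none = choose_format(parser.parse_args([]))

print(both, none)
```

The order of the branches, not the order of the flags on the command line, decides which format wins.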

Complete!

You can now run this script with the --csv, --json, or --txt flag specified on the command line and the script will output your chosen format:

  • python .\cisa_vulns_scraper_v2.py --csv for a nicely formatted CSV file
  • python .\cisa_vulns_scraper_v2.py --json for a beautiful JSON file
  • python .\cisa_vulns_scraper_v2.py --txt for an ASCII art-style text file

The full script can be found on GitHub and will look like this:

Python
# import necessary Python modules
import requests, csv
from bs4 import BeautifulSoup
import argparse, json

# command line argument parser
parser = argparse.ArgumentParser()
parser.add_argument('--csv', action='store_true', help='output in CSV format')
parser.add_argument('--json', action='store_true', help='output in JSON format')
parser.add_argument('--txt', action='store_true', help='output in text format')
args = parser.parse_args()

# file header row for CSV and TXT output
HEADER_ROW = ["Product", "Vendor", "Description", "Published", "CVSS", "CVE", "Reference"]

# output functions
def output_csv(vulns):
   # edit filename to be a CSV
   csv_file = FILENAME + ".csv"

   # create a CSV file
   with open(csv_file, "w", encoding='UTF8', newline='') as f:
       # create csv writer to write data to file
       writer = csv.writer(f)
       # write header row
       writer.writerow(HEADER_ROW)

       # write vulnerabilities
       for vuln in vulns:
           data_row = [vuln['product'], vuln['vendor'], vuln['description'], vuln['published'],vuln['cvss'], vuln["cve"], vuln['reference']]
           writer.writerow(data_row)

def output_json(vulns):
   # edit filename to be a JSON
   json_file = FILENAME + ".json"
   # create a JSON file
   with open(json_file, "w", encoding='UTF8') as f:
       # write the data to the file in JSON format
       json.dump(vulns, f, indent=2)

def output_txt(vulns):
   # edit filename to be a TXT
   txt_file = FILENAME + ".txt"

   # create a TXT file
   with open(txt_file, "w", encoding='UTF8') as f:
       # write header row
       f.write(" |\t".join(HEADER_ROW) + "\n")
       f.write("-" * 80 + "\n")

       # write the data to the file in text format
       for vuln in vulns:
           data = vuln['product'] + " | " + vuln['vendor'] + " | " + vuln['description'] + " | " + vuln['published'] + " | " + vuln['cvss'] + " | " + vuln['cve'] + " | " + vuln['reference']
           f.write(data + "\n")

# download the page
WEB_PAGE = "https://www.cisa.gov/news-events/bulletins/sb23-100"
page = requests.get(WEB_PAGE)

# parse the page with Beautiful Soup
soup = BeautifulSoup(page.content, "html.parser")

# variables for output
PAGE_TITLE = soup.title.string
a = soup.title.string.split("of")
b = a[1].split("|")
FILENAME = "CISA vulnerabilities - " + b[0].strip()

# capture high vulnerabilities table
table = soup.find("table")
table_body = table.find("tbody")
rows = table_body.find_all("tr")

# list to hold vulnerability dictionaries
vulns = []

# loop through table rows
for row in rows:
   # create table columns
   cols = [x for x in row.find_all("td")]

   # extract relevant fields
   product, vendor = cols[0].text.split("--")
   description = cols[1].text.strip()
   published = cols[2].text.strip()
   cvss = cols[3].text.strip()
   cve = cols[4].find("a").text.strip()
   reference = cols[4].find("a").get("href")

   # store fields as a dictionary object
   vuln = {
       "product": product.strip(),
       "vendor": vendor.strip(),
       "description": description,
       "published": published,
       "cvss": cvss,
       "cve": cve,
       "reference": reference
   }

   # append dictionary object to vulnerability list
   vulns.append(vuln)

# decide on output format
if args.csv:
   output_csv(vulns)
elif args.json:
   output_json(vulns)
elif args.txt:
   output_txt(vulns)
else:
   print(vulns)

# print message to terminal indicating success
print(f"Printed {PAGE_TITLE}")
print(f"-> see {FILENAME}")

Try exploring the ways you can improve or adapt this script. Currently, the user can pass all three flags at once, but the if/elif chain only acts on the first one that matches. Do you want this, or could you impose a constraint (or write every requested format)? You could also let the user specify which security bulletin from CISA’s site they want to scrape by adding a --url option.
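
One possible starting point for these ideas, offered as a sketch rather than the finished answer, is argparse’s add_mutually_exclusive_group() method, which rejects command lines that combine conflicting flags. The --url option, the DEFAULT_URL name, and the conflict_rejected variable are assumptions introduced here for illustration; the default URL is the bulletin used in the script above.

```python
import argparse

# assumption: fall back to the bulletin used in the original script
DEFAULT_URL = "https://www.cisa.gov/news-events/bulletins/sb23-100"

parser = argparse.ArgumentParser()
parser.add_argument('--url', default=DEFAULT_URL, help='CISA bulletin to scrape')

# only one of these flags may be given at a time
group = parser.add_mutually_exclusive_group()
group.add_argument('--csv', action='store_true', help='output in CSV format')
group.add_argument('--json', action='store_true', help='output in JSON format')
group.add_argument('--txt', action='store_true', help='output in text format')

# a valid command line: one format, default URL
args = parser.parse_args(['--json'])
print(args.url)

# an invalid command line: argparse prints an error and exits
try:
    parser.parse_args(['--csv', '--json'])
    conflict_rejected = False
except SystemExit:
    conflict_rejected = True

print(conflict_rejected)
```

In the scraper itself, you would then pass args.url to requests.get() in place of the hard-coded WEB_PAGE value.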

Next time in this series we will look at how we can create executable files from our Python scripts so we can run them anywhere!

Discover more in the Python Threat Hunting Tools series!
