Welcome back to this series on building threat hunting tools!
In this series, I will be showcasing a variety of threat hunting tools that you can use to hunt for threats, automate tedious processes, and extend to create your own toolkit! The majority of these tools will be simple with a focus on being easy to understand and implement. This is so that you, the reader, can learn from these tools and begin to develop your own.
There will be no cookie-cutter tutorial on programming fundamentals like data types, control structures, etc. Instead, this series will focus on the practical implementation of scripting/programming through small projects. I encourage you to play with these scripts, figure out ways to break or extend them, and try to improve on their basic design to fit your needs. I find this the best way to learn any new programming language or concept and, certainly, the best way to derive value!
In this installment we will look at how we can give our threat hunting tools command line arguments that modify their behavior when they are executed.
What are Command Line Arguments?
Command line arguments are parameters or options that are passed to a program or script when it is executed via the command line. They let the user of the script modify behavior or configure its settings, typically by providing inputs (e.g. files, strings, the output of another command) or flags (e.g. -h for help). If an input is provided via command line arguments, it is parsed by the script into a usable format and then used somewhere within the script.
Command line arguments are often used to automate repetitive tasks, configure programs, or to pass inputs that are too complex to be entered via a GUI. They are also used to build more complex scripts and programs that can be executed with a variety of options and inputs, making them more versatile and powerful.
The following command executes myscript.py with the file.txt and -v arguments. The -v argument is typically referred to as a “flag” and is best thought of as changing the script’s settings, in this case turning on verbose output. The file.txt argument, by contrast, is an input on which the script will perform some action.
python myscript.py file.txt -v
The most popular Python module for parsing command line arguments is argparse. This module allows you to create an ArgumentParser object to which you add arguments using the .add_argument() method.
Once you have added your arguments, you can run the .parse_args() method and the module will take care of the low-level work of collecting, parsing, and storing the command line arguments provided by the script’s user. The .parse_args() method returns an object (conventionally named args), and each argument is available as an attribute of that object (e.g. args.<argument_name>).
It is a simple and intuitive module to use that I highly recommend!
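As a minimal sketch of that workflow, the snippet below builds a parser for the myscript.py example from earlier. The argument names (file and -v/--verbose) are assumptions chosen for illustration, and the argument list is passed to .parse_args() explicitly so the example runs without a real command line.

```python
import argparse

# build a parser with one positional input and one optional flag
parser = argparse.ArgumentParser(description="demo parser (illustrative only)")
parser.add_argument("file", help="input file to process")
parser.add_argument("-v", "--verbose", action="store_true", help="print verbose output")

# parse_args() normally reads sys.argv; a list is passed here to simulate
# running: python myscript.py file.txt -v
args = parser.parse_args(["file.txt", "-v"])

print(args.file)     # file.txt
print(args.verbose)  # True
```

In a real script you would call parser.parse_args() with no arguments so that it reads whatever the user typed.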
The Problem
In a previous installment of this series (Python Threat Hunting Tools: Part 2 — Web Scraping), we created a web scraping script that used Beautiful Soup to get a list of high-severity vulnerabilities published by the Cybersecurity and Infrastructure Security Agency (CISA). This script was awesome and gave us a CSV file containing relevant information about each vulnerability. However, it was not very flexible.
What if a user wanted the output in text or JSON format instead?
Let’s refactor this script to give the user a few more options in terms of output. That way they can incorporate the script into their own vulnerability management process, perhaps one that doesn’t use CSV files.
The Solution
To start let’s import the argparse
module into our original web scraping script (which can be found in its entirety here):
# import necessary Python modules
import requests, csv
from bs4 import BeautifulSoup
import argparse, json
[redacted]
We have also imported the json Python module here which we will use to output data in JSON format later.
Now we can create our argument parser object. I have named mine parser.
parser = argparse.ArgumentParser()
Let’s add a few output options to the argument parser so the user can choose different formats for the vulnerability information the script scrapes. Setting the optional action parameter to 'store_true' stores each argument as a Boolean value (e.g. if --csv is present it is stored as True, otherwise it defaults to False), while the help parameter sets the message shown to the user when they read the script’s documentation (e.g. with myscript.py -h).
parser = argparse.ArgumentParser()
parser.add_argument('--csv', action='store_true', help='output in CSV format')
parser.add_argument('--json', action='store_true', help='output in JSON format')
parser.add_argument('--txt', action='store_true', help='output in text format')
args = parser.parse_args()
Perfect! Our argument parser now has three arguments that the user can specify on the command line when invoking this script:
--csv, which will output the data in CSV format. This was the previous behavior of the script.
--json, which will output the data in JavaScript Object Notation (JSON). This is a standard text-based format for representing structured data based on JavaScript object syntax and is commonly used for transmitting data over the web. Pretty handy if the user of our script wants to send this information to someone or to a web API.
--txt, which will output the data in standard text format. This is useful if the user needs to search for something within the vulnerability data (e.g. using grep) or needs a quick visual representation of the data without using a GUI application.
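To see what these store_true flags look like after parsing, here is a small sketch that reuses the three arguments defined above. The argument lists are passed in by hand (an assumption made so the snippet runs anywhere) instead of being read from the command line.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--csv', action='store_true', help='output in CSV format')
parser.add_argument('--json', action='store_true', help='output in JSON format')
parser.add_argument('--txt', action='store_true', help='output in text format')

# simulate: python cisa_vulns_scraper_v2.py --json
args = parser.parse_args(['--json'])
print(args.csv, args.json, args.txt)  # False True False

# with no flags at all, every option defaults to False
no_flags = parser.parse_args([])
print(vars(no_flags))
```

Each flag that is absent is simply stored as False, which is what makes the later True/False checks possible.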
With our arguments defined, we now need to add the functionality to our web scraping script.
I have restructured our original web scraping script to include three functions for outputting the scraped data: output_csv, output_json, and output_txt. Each of these functions takes in the list vulns (which stores the vulnerability data) and appends the relevant file extension to the global FILENAME variable to build an output filename specific to each format. For instance, CISA vulnerabilities - April 3, 2023 becomes CISA vulnerabilities - April 3, 2023.csv.
The output_csv function just encapsulates the code we created previously for outputting scraped data in CSV form:
def output_csv(vulns):
    # edit filename to be a CSV
    csv_file = FILENAME + ".csv"
    # create a CSV file
    with open(csv_file, "w", encoding='UTF8', newline='') as f:
        # create csv writer to write data to file
        writer = csv.writer(f)
        # write header row
        writer.writerow(HEADER_ROW)
        # write vulnerabilities
        for vuln in vulns:
            data_row = [vuln['product'], vuln['vendor'], vuln['description'], vuln['published'], vuln['cvss'], vuln['cve'], vuln['reference']]
            writer.writerow(data_row)
The output_json function takes the list of vulnerabilities (vulns), which are Python dictionary objects. It creates an appropriate filename by appending the .json extension and opens this file for writing. To output a list of dictionary objects to this JSON file it uses the json module’s .dump() method.
def output_json(vulns):
    # edit filename to be a JSON
    json_file = FILENAME + ".json"
    # create a JSON file
    with open(json_file, "w", encoding='UTF8') as f:
        # write the data to the file in JSON format
        json.dump(vulns, f, indent=2)
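To get a feel for what json.dump() produces from a list of dictionaries, the sketch below writes to an in-memory buffer instead of a file. The record is a made-up placeholder (not real CISA data) that only mirrors the keys used by the scraper.

```python
import io
import json

# placeholder record for illustration only; the values are not real CISA data
sample_vulns = [
    {
        "product": "example-product",
        "vendor": "example-vendor",
        "description": "placeholder description",
        "published": "2023-04-03",
        "cvss": "9.8",
        "cve": "CVE-0000-0000",
        "reference": "https://example.com/advisory",
    }
]

# io.StringIO behaves like an open text file, so json.dump() can write to it
buffer = io.StringIO()
json.dump(sample_vulns, buffer, indent=2)
text = buffer.getvalue()
print(text)

# reading the text back gives the same list of dictionaries
restored = json.loads(text)
```

The indent=2 argument is what makes the output human-readable; without it the whole list is written on a single line.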
The output_txt function, like the other functions, takes the list of vulnerabilities (vulns) and creates an appropriate filename. Next, it opens the file for writing and joins the HEADER_ROW list into a single string, which it writes to the file. Then a divider is added and it writes out each vulnerability in vulns using string concatenation.
def output_txt(vulns):
    # edit filename to be a TXT
    txt_file = FILENAME + ".txt"
    # create a TXT file
    with open(txt_file, "w", encoding='UTF8') as f:
        # write header row
        f.write(" |\t".join(HEADER_ROW) + "\n")
        f.write("-" * 80 + "\n")
        # write the data to the file in text format
        for vuln in vulns:
            data = vuln['product'] + " | " + vuln['vendor'] + " | " + vuln['description'] + " | " + vuln['published'] + " | " + vuln['cvss'] + " | " + vuln['cve'] + " | " + vuln['reference']
            f.write(data + "\n")
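If the long concatenation feels unwieldy, one possible variation is to build each row with str.join() instead. This is only a sketch: format_txt is a hypothetical helper (it is not part of the original script), it returns the report as a string instead of writing a file, and it assumes every dictionary key is the lowercase form of its header name and every value is a string.

```python
HEADER_ROW = ["Product", "Vendor", "Description", "Published", "CVSS", "CVE", "Reference"]

def format_txt(vulns):
    """Hypothetical helper: build the text report as a single string."""
    lines = [" |\t".join(HEADER_ROW), "-" * 80]
    for vuln in vulns:
        # look up each field by the lowercase form of its header name
        fields = [vuln[name.lower()] for name in HEADER_ROW]
        lines.append(" | ".join(fields))
    return "\n".join(lines) + "\n"

# placeholder record for illustration only; the values are not real CISA data
sample = [
    {
        "product": "example-product",
        "vendor": "example-vendor",
        "description": "placeholder description",
        "published": "2023-04-03",
        "cvss": "9.8",
        "cve": "CVE-0000-0000",
        "reference": "https://example.com/advisory",
    }
]

report = format_txt(sample)
print(report)
```

Returning a string also makes the formatting easy to check without touching the file system.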
Once our script has finished scraping, we need to decide which format to output the data in. We do this with a simple if/elif chain that checks whether each of the output options is True (remember, argparse stores these as True or False depending on whether they are present on the command line). The first option that is True triggers the relevant output function to generate that data file. If none is set, the script simply prints the list to the terminal.
# decide on output format
if args.csv:
    output_csv(vulns)
elif args.json:
    output_json(vulns)
elif args.txt:
    output_txt(vulns)
else:
    print(vulns)
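One consequence of an if/elif chain is that only the first matching branch runs. The sketch below mirrors the chain with a hypothetical helper, chosen_format, that returns a label instead of writing a file, so the precedence is easy to see. The argument lists are supplied by hand for illustration.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--csv', action='store_true', help='output in CSV format')
parser.add_argument('--json', action='store_true', help='output in JSON format')
parser.add_argument('--txt', action='store_true', help='output in text format')

def chosen_format(args):
    """Hypothetical helper mirroring the if/elif chain; returns a label only."""
    if args.csv:
        return "csv"
    elif args.json:
        return "json"
    elif args.txt:
        return "txt"
    else:
        return "print"

single = chosen_format(parser.parse_args(['--txt']))
both = chosen_format(parser.parse_args(['--json', '--csv']))
none = chosen_format(parser.parse_args([]))

print(single)  # txt
print(both)    # csv (the first branch wins, regardless of flag order)
print(none)    # print
```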
Complete!
You can now run this script with the --csv, --json, or --txt flag specified on the command line and the script will output your chosen format:
python .\cisa_vulns_scraper_v2.py --csv for a nicely formatted CSV file
python .\cisa_vulns_scraper_v2.py --json for a beautiful JSON file
python .\cisa_vulns_scraper_v2.py --txt for an ASCII art-style text file
The full script can be found on GitHub and will look like this:
# import necessary Python modules
import requests, csv
from bs4 import BeautifulSoup
import argparse, json
# command line argument parser
parser = argparse.ArgumentParser()
parser.add_argument('--csv', action='store_true', help='output in CSV format')
parser.add_argument('--json', action='store_true', help='output in JSON format')
parser.add_argument('--txt', action='store_true', help='output in text format')
args = parser.parse_args()
# file header row for CSV and TXT output
HEADER_ROW= ["Product", "Vendor", "Description", "Published", "CVSS", "CVE", "Reference"]
# output functions
def output_csv(vulns):
    # edit filename to be a CSV
    csv_file = FILENAME + ".csv"
    # create a CSV file
    with open(csv_file, "w", encoding='UTF8', newline='') as f:
        # create csv writer to write data to file
        writer = csv.writer(f)
        # write header row
        writer.writerow(HEADER_ROW)
        # write vulnerabilities
        for vuln in vulns:
            data_row = [vuln['product'], vuln['vendor'], vuln['description'], vuln['published'], vuln['cvss'], vuln['cve'], vuln['reference']]
            writer.writerow(data_row)
def output_json(vulns):
    # edit filename to be a JSON
    json_file = FILENAME + ".json"
    # create a JSON file
    with open(json_file, "w", encoding='UTF8') as f:
        # write the data to the file in JSON format
        json.dump(vulns, f, indent=2)
def output_txt(vulns):
    # edit filename to be a TXT
    txt_file = FILENAME + ".txt"
    # create a TXT file
    with open(txt_file, "w", encoding='UTF8') as f:
        # write header row
        f.write(" |\t".join(HEADER_ROW) + "\n")
        f.write("-" * 80 + "\n")
        # write the data to the file in text format
        for vuln in vulns:
            data = vuln['product'] + " | " + vuln['vendor'] + " | " + vuln['description'] + " | " + vuln['published'] + " | " + vuln['cvss'] + " | " + vuln['cve'] + " | " + vuln['reference']
            f.write(data + "\n")
# download the page
WEB_PAGE = "https://www.cisa.gov/news-events/bulletins/sb23-100"
page = requests.get(WEB_PAGE)
# parse the page with Beautiful Soup
soup = BeautifulSoup(page.content, "html.parser")
# variables for output
PAGE_TITLE = soup.title.string
a = soup.title.string.split("of")
b = a[1].split("|")
FILENAME = "CISA vulnerabilities - " + b[0].strip()
# capture high vulnerabilities table
table = soup.find("table")
table_body = table.find("tbody")
rows = table_body.find_all("tr")
# list to hold vulnerability dictionaries
vulns = []
# loop through table rows
for row in rows:
    # create table columns
    cols = [x for x in row.find_all("td")]
    # extract relevant fields
    product, vendor = cols[0].text.split("--")
    description = cols[1].text.strip()
    published = cols[2].text.strip()
    cvss = cols[3].text.strip()
    cve = cols[4].find("a").text.strip()
    reference = cols[4].find("a").get("href")
    # store fields as a dictionary object
    vuln = {
        "product": product.strip(),
        "vendor": vendor.strip(),
        "description": description,
        "published": published,
        "cvss": cvss,
        "cve": cve,
        "reference": reference
    }
    # append dictionary object to vulnerability list
    vulns.append(vuln)
# decide on output format
if args.csv:
    output_csv(vulns)
elif args.json:
    output_json(vulns)
elif args.txt:
    output_txt(vulns)
else:
    print(vulns)
# print message to terminal indicating success
print(f"Printed {PAGE_TITLE}")
print(f"-> see {FILENAME}")
Try exploring the ways you can improve or adapt this script. Currently, the user can pass all three flags at once, but the if/elif chain means only the first matching format is written. Do you want this, or could you enforce a constraint (or write every requested format)? Also, maybe you could let the user specify which security bulletin from CISA’s site they want to scrape by adding a --url option.
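As one possible starting point for both ideas, argparse offers add_mutually_exclusive_group(), which rejects any command line that combines flags from the same group. The sketch below is an assumption about how you might wire it up and is not part of the original script: the --url option and its default value are illustrative, and the example.com address is a placeholder.

```python
import argparse

parser = argparse.ArgumentParser()
# illustrative option: which bulletin page to scrape
parser.add_argument(
    '--url',
    default="https://www.cisa.gov/news-events/bulletins/sb23-100",
    help='security bulletin page to scrape',
)

# only one of these flags may be given at a time
group = parser.add_mutually_exclusive_group()
group.add_argument('--csv', action='store_true', help='output in CSV format')
group.add_argument('--json', action='store_true', help='output in JSON format')
group.add_argument('--txt', action='store_true', help='output in text format')

# a single format plus a (placeholder) URL parses normally
args = parser.parse_args(['--json', '--url', 'https://example.com/some-bulletin'])
print(args.json, args.url)

# combining two formats makes argparse print an error and exit
try:
    parser.parse_args(['--csv', '--json'])
    conflict_rejected = False
except SystemExit:
    conflict_rejected = True
print(conflict_rejected)  # True
```

In the scraper itself you would then pass args.url to requests.get() in place of the hard-coded WEB_PAGE value.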
Next time in this series we will look at how we can create executable files from our Python scripts so we can run them anywhere!
Discover more in the Python Threat Hunting Tools series!