Python Threat Hunting Tools: Part 4 — Browser Automation

Welcome back to this series on building threat hunting tools!

In this series, I will be showcasing a variety of threat hunting tools that you can use to hunt for threats, automate tedious processes, and extend to create your own toolkit! The majority of these tools will be simple, focusing on being easy to understand and implement. This is so that you, the reader, can learn from these tools and begin to develop your own. 

There will be no cookie-cutter tutorial on programming fundamentals. Instead, this series will focus on the practical implementation of scripting/programming through small projects. I encourage you to play with these scripts, figure out ways to break or extend them, and try to improve on their basic design to fit your needs. I find this the best way to learn any new programming language/concept and, certainly, the best way to derive value!

In this installment, we will delve into browser automation and see how we can automate a threat intelligence process when API access is not available.

What is Browser Automation?

Browser automation refers to the process of automating tasks that are typically performed manually by a user in a web browser. It involves using software tools or scripts to replicate user interactions with a web browser, such as clicking buttons, filling in forms, navigating between pages, and extracting data.

There are two primary purposes for browser automation:

  • Web Scraping: Where data is extracted from websites for analysis or other purposes.
  • Web Testing: Where automated tests are run to ensure that web applications function correctly.

There are several tools and frameworks available for browser automation, including Selenium, Puppeteer, and Playwright. These tools allow developers to write scripts or programs that interact with web browsers in a programmatic way, making it possible to automate repetitive tasks and save time and effort. In this demonstration, we will be using Selenium.

Selenium is an open-source browser automation framework that allows developers to automate web browsers for testing purposes, web scraping, or any other task that requires browser automation. It supports multiple programming languages, including Java, Python, Ruby, and JavaScript, making it a popular choice for automation tasks across different platforms and technologies.

Developers use Selenium through a set of APIs that allow them to interact with web pages in a programmatic way and simulate user interactions such as clicking buttons, filling out forms, and navigating between pages. It also provides features for capturing screenshots, handling alerts, and working with different web elements such as dropdowns, checkboxes, and radio buttons. 

The framework supports various browsers (including Chrome, Firefox, Safari, Internet Explorer, and Edge) as well as a headless mode which allows developers to run automation tasks without opening a visible browser window. This allows developers to be more efficient when running automated tests in a continuous integration environment.

We will be using Selenium through its Python package, which can be installed with Python’s pip package installer using the following command:

Python
pip install selenium

The Problem

You will often find older web applications that don’t support an API or websites that hide their API behind a paywall.

A company I previously worked at was using an EDR solution to run threat hunting queries. This solution had an API but it was behind a paywall. This made it difficult to automate running threat hunting queries because every time you wanted to run one you had to go through the web GUI. As a workaround I put together a Python script that used browser automation to loop through all the threat hunting queries I wanted to run.

It logged into the EDR solution, opened a new browser tab, automatically typed out the query into the search bar and ran it. Then it would open a new tab and type out the next query, and so on. All I had to do was cycle through the browser tabs to see if any results had been returned to investigate. This saved hours of time manually typing out and running queries.

The example in this demonstration will be VirusTotal. This is the de facto platform for gathering intelligence about a threat. However, its web API is notoriously expensive to access. If we want to programmatically check whether an IOC is malicious on VirusTotal, we have to overcome this financial burden. Through the power of browser automation, we can do this!

Note that this works only at a limited capacity. VirusTotal, like most websites, blocks automated requests using CAPTCHAs and rate limits. If you submit too many requests in a short time, VirusTotal will ask you to complete a CAPTCHA to confirm you are not a robot. There are exotic ways to get around this, but I have found that keeping your IOC count below 20 does not trigger this protection.
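
If your list is longer than that, one option is to split it into smaller batches and run them some time apart. The sketch below is a minimal illustration and not part of the original script: batch_iocs is a hypothetical helper, and the default batch size of 19 simply reflects the "below 20" observation above rather than any documented VirusTotal limit.

```python
def batch_iocs(iocs, batch_size=19):
    """Split a list of IOCs into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [iocs[i:i + batch_size] for i in range(0, len(iocs), batch_size)]


# 45 placeholder IOCs are split into batches of 19, 19, and 7
sample = [f"ioc-{n}" for n in range(45)]
batches = batch_iocs(sample)
print([len(batch) for batch in batches])  # [19, 19, 7]
```

Each batch could then be run on its own, with a pause between runs that you tune by experiment.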

The Solution

Let’s use the Selenium browser automation framework to quickly check a list of IOCs against VirusTotal. This tool will take a list of IOCs, navigate to VirusTotal, and search for the first IOC. It will then open a new browser tab and search for the next IOC on the list, and so on, effectively automating VirusTotal lookups without having to splurge on their expensive API access.

First, we need to import our required Python packages. For this project, we will use selenium and, specifically, the webdriver module. We will also be using the Edge web browser (because most people already have it installed), which requires the Edge service and Edge options modules. You can use the Chrome or Firefox drivers instead, but that is left up to you to implement (it’s very similar). We also need to import the validators package (used to validate IP addresses and domain names) and the os package.

Python
from selenium.webdriver.edge.service import Service as EdgeService
from selenium.webdriver.edge.options import Options
from selenium import webdriver
import validators
import os

Now we need a Python function that loads a text file of IOCs into a Python list, so that we can loop through the list and send each IOC to VirusTotal for lookup. This function takes a filename, opens it for reading using a context manager, and then, line by line, strips surrounding whitespace using the .strip() method and appends the IOC to a list called iocs. Once done, it prints a status message to the user and returns the list of IOCs.

Python
# loads IOCs from text file to Python list for processing
def load_iocs(filename):
   # create list to store IOCs
   iocs = []

   # open file in read mode
   with open(filename, "r") as f:
       for line in f.readlines():
           iocs.append(line.strip())

   # print output
   print(f"[+] Loaded {len(iocs)} IOCs for testing.")
   
   # return list of IOCs
   return iocs
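
Because the number of lookups you can make before hitting a CAPTCHA is limited, it can be worth cleaning the list before using it. The following is an optional sketch and not part of the original script: clean_iocs is a hypothetical helper, and treating lines that start with "#" as comments is an assumption about how you might annotate your own IOC file.

```python
def clean_iocs(lines):
    """Strip whitespace, drop blank lines and '#' comments, and remove duplicates."""
    stripped = (line.strip() for line in lines)
    kept = [ioc for ioc in stripped if ioc and not ioc.startswith("#")]
    # dict.fromkeys keeps the first occurrence of each IOC and preserves order
    return list(dict.fromkeys(kept))


raw_lines = ["8.8.8.8\n", "\n", "# resolver seen in alert\n", "example.com\n", "8.8.8.8\n"]
print(clean_iocs(raw_lines))  # ['8.8.8.8', 'example.com']
```

You could pass the list returned by load_iocs() through this helper before handing it to the lookup function.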

Now we need a function that will loop through these IOCs and look them up using VirusTotal. This function needs to take a list of IOCs (iocs), create a web driver to open Edge with, and then for each IOC it needs to send it to the appropriate VirusTotal URL where it will be searched for. 

First, we create the web driver (driver) that will perform the browser automation. It is configured with a service object and an options object. The detach option keeps the browser open after the script exits, so the tabs remain available for review.

Next, we begin our loop through the IOC list. To filter the IOCs we can use the validators package which allows us to determine if an IOC is an IP address or domain. We can also filter based on the length of the IOC. 

Here I have determined that IOCs which are 32 characters in length (and not a domain) are likely to be an MD5 hash, IOCs of 40 characters are likely SHA1 hashes, and IOCs of 64 characters are likely SHA256 hashes. 

If none of these conditions are met then the IOCs are simply skipped. There is also some execution tracking in the script which prints out how the script is progressing so that the user knows when the script has completed. This uses the count variable.

Python
# time is needed for the pause between lookups
import time

# runs IOC list through virus total lookups
def run(iocs):
    # create web driver to run virus total lookups
    # service = EdgeService(verbose = True)
    service = EdgeService()
    edge_options = Options()
    # "detach" keeps the browser open after the script exits
    edge_options.add_experimental_option("detach", True)
    driver = webdriver.Edge(service=service, options=edge_options)

    # keep track of script execution
    count = 1

    print("[+] Starting...")
    for ioc in iocs:
        print(f"{count}/{len(iocs)}")

        # add a new tab to browser window
        driver.switch_to.new_window("tab")

        # filter based on IOC IP address, hash, and domain to query correct URL
        if validators.ip_address.ipv4(ioc):
            driver.get(f"https://www.virustotal.com/gui/ip-address/{ioc}")
        elif validators.domain(ioc):
            driver.get(f"https://www.virustotal.com/gui/domain/{ioc}")
        elif len(ioc) == 32:
            # MD5 hash
            driver.get(f"https://www.virustotal.com/gui/file/{ioc}")
        elif len(ioc) == 40:
            # SHA1 hash
            driver.get(f"https://www.virustotal.com/gui/file/{ioc}")
        elif len(ioc) == 64:
            # SHA256 hash
            driver.get(f"https://www.virustotal.com/gui/file/{ioc}")
        else:
            print(f"[+] Error processing {ioc} ... skipping #{count}")

        # pause between lookups to not overload website
        # (implicitly_wait only sets the element lookup timeout, it does not pause)
        time.sleep(5)
        # increase count by 1
        count += 1

    # output finished
    print(f"=== All {len(iocs)} IOCs have now been run. You can now analyze the results. ===")

Notice how, for each IOC, a new tab is opened using the web driver method switch_to.new_window("tab"). This is so that a new window does not open every time and we can save on system resources.
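
One limitation of the length checks is that any 32-, 40-, or 64-character string is treated as a hash, even if it is not hexadecimal. The sketch below shows a stricter way to pick the URL, under a few assumptions: virustotal_url is a hypothetical helper, it uses Python's built-in ipaddress module and a simplified domain pattern in place of the validators package, and it reuses the same VirusTotal URL paths as the script above.

```python
import ipaddress
import re

VT_GUI = "https://www.virustotal.com/gui"
HEX_ONLY = re.compile(r"^[0-9a-fA-F]+$")
# simplified domain pattern: dot-separated labels ending in an alphabetic TLD
DOMAIN = re.compile(
    r"^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$"
)


def virustotal_url(ioc):
    """Return the VirusTotal GUI URL for an IOC, or None if its type is not recognized."""
    try:
        if ipaddress.ip_address(ioc).version == 4:
            return f"{VT_GUI}/ip-address/{ioc}"
    except ValueError:
        pass  # not an IP address, keep checking

    # MD5 (32), SHA1 (40), and SHA256 (64) hashes must also be hexadecimal
    if len(ioc) in (32, 40, 64) and HEX_ONLY.match(ioc):
        return f"{VT_GUI}/file/{ioc}"

    if DOMAIN.match(ioc):
        return f"{VT_GUI}/domain/{ioc}"

    return None


for candidate in ["8.8.8.8", "example.com", "d41d8cd98f00b204e9800998ecf8427e", "z" * 32]:
    print(candidate, "->", virustotal_url(candidate))
```

Returning None for unrecognized values also makes it possible to skip an IOC before a new tab is opened for it, which avoids leaving empty tabs behind.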

With our functions defined, we can now move on to running this script. Here I have created two variables. The BASE_DIR variable stores the current working directory, and the FILENAME variable uses this to create a full file path to a file named iocs.txt. This is the text file containing each IOC we want to look up using VirusTotal. It holds a list of IPs, domains, and hashes, one per line.

Once we have the full file path we can pass this into our previously created load_iocs() function to gather the IOCs contained in this text file and then pass this list into the run() function to look up these IOCs using browser automation and VirusTotal.

Python
# get ioc list filename
BASE_DIR = os.getcwd()
FILENAME = os.path.join(BASE_DIR, "iocs.txt")
# gather iocs from file
ioc_list = load_iocs(FILENAME)
# run virus total lookup using iocs
run(ioc_list)
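
One thing to be aware of is that os.getcwd() returns the directory the script is run from, which is not necessarily the directory the script lives in. If you would rather keep iocs.txt next to the script, a possible alternative looks like the sketch below. The helper name ioc_file_path and the example path are hypothetical and only for illustration.

```python
from pathlib import Path


def ioc_file_path(script_path, name="iocs.txt"):
    """Return the path to `name` in the same directory as the given script."""
    return Path(script_path).resolve().parent / name


# hypothetical script location, used here only to show the result
print(ioc_file_path("/opt/tools/vt_lookup.py"))
```

Inside the script itself you would call ioc_file_path(__file__), so the lookup works regardless of the directory you launch it from.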

The full code can be found on GitHub and looks like this:

Python
from selenium.webdriver.edge.service import Service as EdgeService
from selenium.webdriver.edge.options import Options
from selenium import webdriver
import validators
import time
import os

# loads IOCs from text file to Python list for processing
def load_iocs(filename):
    # create list to store IOCs
    iocs = []

    # open file in read mode
    with open(filename, "r") as f:
        for line in f.readlines():
            iocs.append(line.strip())

    # print output
    print(f"[+] Loaded {len(iocs)} IOCs for testing.")
    # return list of IOCs
    return iocs

# runs IOC list through virus total lookups
def run(iocs):
    # create web driver to run virus total lookups
    # service = EdgeService(verbose = True)
    service = EdgeService()
    edge_options = Options()
    # "detach" keeps the browser open after the script exits
    edge_options.add_experimental_option("detach", True)
    driver = webdriver.Edge(service=service, options=edge_options)

    # keep track of script execution
    count = 1

    print("[+] Starting...")
    for ioc in iocs:
        print(f"{count}/{len(iocs)}")

        # add a new tab to browser window
        driver.switch_to.new_window("tab")

        # filter based on IOC IP address, hash, and domain to query correct URL
        if validators.ip_address.ipv4(ioc):
            driver.get(f"https://www.virustotal.com/gui/ip-address/{ioc}")
        elif validators.domain(ioc):
            driver.get(f"https://www.virustotal.com/gui/domain/{ioc}")
        elif len(ioc) == 32:
            # MD5 hash
            driver.get(f"https://www.virustotal.com/gui/file/{ioc}")
        elif len(ioc) == 40:
            # SHA1 hash
            driver.get(f"https://www.virustotal.com/gui/file/{ioc}")
        elif len(ioc) == 64:
            # SHA256 hash
            driver.get(f"https://www.virustotal.com/gui/file/{ioc}")
        else:
            print(f"[+] Error processing {ioc} ... skipping #{count}")

        # pause between lookups to not overload website
        # (implicitly_wait only sets the element lookup timeout, it does not pause)
        time.sleep(5)
        # increase count by 1
        count += 1

    # output finished
    print(f"=== All {len(iocs)} IOCs have now been run. You can now analyze the results. ===")

# get ioc list filename
BASE_DIR = os.getcwd()
FILENAME = os.path.join(BASE_DIR, "iocs.txt")

# gather iocs from file
ioc_list = load_iocs(FILENAME)

# run virus total lookup using iocs
run(ioc_list)

Try experimenting with this script; add new IOCs to the iocs.txt file, change variable names to see what breaks, or find another threat intelligence provider you can substitute for VirusTotal. This script is just a starting point to show you how to use browser automation to efficiently gather threat intelligence!

Next time in this series we will look at changing a script’s behavior by using Python command line arguments. Till then, happy hunting!

Discover more in the Python Threat Hunting Tools series!