Python Threat Hunting Tools: Part 8 — Parsing JSON

Welcome back to this series on building threat hunting tools. In this series, I will be showcasing a variety of threat hunting tools that you can use to hunt for threats, automate tedious processes, and extend to create your own toolkit!

The majority of these tools will be simple, with a focus on being easy to understand and implement. This is so that you, the reader, can learn from these tools and begin to develop your own. There will be no cookie-cutter tutorial on programming fundamentals like data types and control structures. Instead, this series will focus on the practical implementation of scripting through small projects.

It is encouraged that you play with these scripts, figure out ways to break or extend them, and try to improve on their basic design to fit your needs. I find this the best way to learn any new programming language/concept and, certainly, the best way to derive value!

In this installment, we will be exploring how to parse JSON data.

What is JSON?

JavaScript Object Notation (JSON) is a data format used to exchange data between machines. It is typically seen in web technologies because it is lightweight, easy for humans to read and write, and supported by many programming and scripting languages. You have likely encountered JSON data before when browsing the web or interacting with web APIs.

JSON rose to popularity because its syntax is based on a subset of JavaScript, the predominant programming language on the web, and because it is widely considered a simpler alternative to XML for representing structured data.

JSON data looks similar to a Python dictionary. It uses key-value pairs to structure data and supports various data types (e.g., strings, numbers, Booleans, arrays, nested objects). Here is an example of JSON data.

JSON
{
 "name": "John Smith",
 "job": "Hacker",
 "age": 25
}
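
To make the comparison with a Python dictionary concrete, here is a minimal sketch that parses the example above with the standard library's json module. The variable names raw and person are placeholders chosen for illustration.

```python
import json

# The JSON example from above, held in a Python string
raw = '{"name": "John Smith", "job": "Hacker", "age": 25}'

# json.loads() converts JSON text into the matching Python types
person = json.loads(raw)

print(type(person).__name__)   # dict
print(person["name"])          # John Smith
print(person["age"] + 1)       # 26, because the JSON number became an int
```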

The Problem

As a threat hunter, you need to interact with web APIs to gather threat intelligence, perform threat hunts, and investigate if Indicators of Compromise (IOCs) are malicious. The investigation of IOCs was covered in a previous article in this series where you got to see how to use the Maltiverse API to query open-source intelligence and determine if an IP address was malicious.

Maltiverse is a threat intelligence platform that collects data on IOCs from various open-source feeds. The platform uses big data analytics and machine learning techniques to process this data and determine if an IOC is malicious. They have a free API you can use to query their data (requests are limited).

In that previous installment, you used the Maltiverse API’s Python module to interact with the API directly rather than over the web. However, not all APIs you interact with will have a Python module for you to use. In that case, you need to use the requests module to make web requests to API endpoints. These endpoints will likely return JSON data, which you must parse and analyze.

In Python Threat Hunting Tools: Part 3 — Interacting with APIs you also saw how to use the requests module to query web APIs. However, that article does not discuss how to parse the JSON data returned and analyze this data for useful threat intelligence. So let’s solve a new problem to showcase this skillset!

Today you need to determine if a domain name is malicious in the most efficient way possible (of course this means using a command line tool).

How do you go about this?

The Solution

To solve this problem, you can send a request to the Maltiverse /hostname/{hostname} web API endpoint using Python’s requests module. This endpoint will return JSON data to you, including if the domain you supplied is malicious. You must parse this JSON data and search through it to find the answer. Once you have this threat intelligence, you can print it to the command line.

Let’s get started!

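First, you need to import the necessary modules. These will be Python’s built-in json module (to parse JSON data), the requests module (to send web requests), and the validators module (to validate domain names). The requests and validators modules are third-party packages, so install them first (for example, with pip install requests validators) if you have not already.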

Python
import json
import requests
import validators

Next, you need to define the URL (API endpoint) where you want to send your web API requests.

Python
url = 'https://api.maltiverse.com/hostname/'

Now let’s create an infinite while loop to continuously ask the user of your command line tool for input until they want to quit. You can ask them for the domain they wish to check, perform some data sanitization on their input to clean it up, and then create the exit condition where the tool terminates if the user types exit.

Python
while True:
    # get user input
    domain = input("Please enter domain to search for (or type exit to quit): ")
    # clean user input
    clean_domain = domain.strip()

    # check if the user wants to exit
    if clean_domain == "exit":
        print("Goodbye.")
        break

    # code continues
    ...
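
The script only strips surrounding whitespace, but users often paste a full URL or a mixed-case name. The following is a sketch of a slightly more forgiving cleanup step. The helper name normalize_domain is hypothetical and is not part of the original script; it relies only on the standard library's urllib.parse.

```python
from urllib.parse import urlsplit


def normalize_domain(raw: str) -> str:
    """Reduce user input to a bare, lowercase hostname (hypothetical helper)."""
    text = raw.strip().lower()
    # urlsplit only recognises the host part when a "//" prefix is present
    if "://" not in text:
        text = "//" + text
    host = urlsplit(text).hostname or ""
    # Drop the trailing dot of a fully qualified name such as "example.com."
    return host.rstrip(".")


print(normalize_domain("  https://Example.com/login  "))
```

If you adopt something like this, the exit check still works because the word exit passes through unchanged.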

Once you know the user doesn’t want to quit, you can move on to checking whether the domain name they provided is valid. You can do this by using the validators module. This Python module can be used to quickly determine if user input is a valid domain name. It returns True if it is, and a falsy error object otherwise, so you can use the result directly in an if statement.

Python
# validate the domain name
if validators.domain(clean_domain):
    # code to execute with valid domain
    ...

If the domain name is valid, you can then query the Maltiverse web API to find out if it is a malicious domain using the requests module. This Python module makes sending web requests easy! Just supply a URL and the Maltiverse web API endpoint will return JSON data.

Remember, you are using the url specified in the previous code.

Python
# query the Maltiverse API for a response (the body is returned as JSON)
response = requests.get(url + clean_domain)

Now comes the main part: parsing the JSON data. To do this, you can use the json module. This Python module parses a JSON object into a Python dictionary, allowing you to interact with the data using familiar dictionary methods such as .items() and .values(), as well as indexing. To parse a JSON string into a dictionary, use the json module’s loads() function.

Python
# parse the JSON text of the response into a Python dictionary
result = json.loads(response.text)
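
The line above assumes the request succeeded and the body is valid JSON. As a sketch of a more defensive approach, the hypothetical helper below (parse_api_response is not part of the original script) takes the status code and body text and returns either a dictionary or None. Treating every non-200 status as "no data" is a simplifying assumption.

```python
import json
from typing import Optional


def parse_api_response(status_code: int, body: str) -> Optional[dict]:
    """Return the parsed JSON object, or None if the response is unusable."""
    # Anything other than 200 OK is treated as "no data" in this sketch
    if status_code != 200:
        return None
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        # The body was not valid JSON (for example, an HTML error page)
        return None
    # Only a JSON object maps to a dict; lists or bare values are rejected
    return data if isinstance(data, dict) else None


print(parse_api_response(200, '{"classification": "whitelist"}'))
print(parse_api_response(500, "Internal Server Error"))
```

In the tool, you would call it as parse_api_response(response.status_code, response.text) and skip the lookup when it returns None.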

With the JSON data in a Python dictionary, you can now perform your analysis on it and determine if Maltiverse has classified the domain name as malicious.

The Maltiverse API returns data in a format similar to the one below (some sections have been redacted for conciseness).

JSON
{
 "as_name":"AS19324 Dosarrest Internet Security LTD",
 "blacklist":[
   {
    "count":1,
    "description":"Amadey",
    "first_seen":"2023-05-06 05:18:01",
    "labels":[
       "malicious-activity"
    ],
    "last_seen":"2023-05-06 05:18:01",
    "source":"ThreatFox Abuse.ch"
   },
   ...
 ],
 "classification":"whitelist",
 "creation_time":"2018-02-28 18:48:33",
 "dnssec":[
   "unsigned",
   "Unsigned"
 ],
 "domain":"test.com",
 "domain_consonants":5,
 "domain_lenght":8,
 "email":[
   "[email protected]",
   "[email protected]",
   "[email protected]"
 ],
 "entropy":2.75,
 "hostname":"test.com",
 "index_selection":"public",
 "is_alive":false,
 "is_cnc":false,
 "is_distributing_malware":false,
 "is_iot_threat":false,
 "is_mining_pool":false,
 "is_phishing":false,
 "is_storing_phishing":false,
 "last_online_time":"2023-06-10 04:07:02",
 "modification_time":"2023-06-10 04:07:02",
 "nameserver":[
   "NS65.WORLDNIC.COM",
   "NS66.WORLDNIC.COM"
 ],
 "number_of_offline_malicious_urls_allocated":0,
 "number_of_online_malicious_urls_allocated":0,
 "resolved_ip":[
   {
    "ip_addr":"69.172.200.235",
    "timestamp":"2018-02-28 18:48:33"
   }
 ],
 ...
}
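
Lists and nested objects in the response become Python lists and dictionaries, so you can walk them with ordinary loops. The sketch below uses a trimmed copy of the sample response above (only a few keys are kept) to pull out the blacklist entries and the resolved IP addresses. The variable names are illustrative.

```python
import json

# A trimmed copy of the sample response shown above
sample = json.loads("""
{
  "blacklist": [
    {
      "description": "Amadey",
      "labels": ["malicious-activity"],
      "source": "ThreatFox Abuse.ch"
    }
  ],
  "classification": "whitelist",
  "hostname": "test.com",
  "resolved_ip": [
    {"ip_addr": "69.172.200.235", "timestamp": "2018-02-28 18:48:33"}
  ]
}
""")

# Each blacklist entry is itself a dictionary inside a list
for entry in sample.get("blacklist", []):
    print(f"{entry['description']} reported by {entry['source']}")

# Collect every resolved IP address into a flat list
ips = [record["ip_addr"] for record in sample.get("resolved_ip", [])]
print(ips)
```

Using .get() with an empty list as the default keeps the loops safe when a key is missing from a response.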

This dictionary of key-value pairs contains various pieces of information. The piece we are most interested in is the value of the classification key. If the domain is classified as malicious, an alert on it is probably a true positive. If it is classified as suspicious or neutral, you probably want to investigate it further. If it is classified as whitelist, the alert is probably a false positive. Here the classification is whitelist, so it’s probably a false positive.
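
The rule of thumb above can be sketched as a small lookup table. The function name triage and the wording of each suggested action are assumptions made for illustration; only the four classification values come from the text.

```python
def triage(classification: str) -> str:
    """Map a Maltiverse classification to a suggested next step."""
    actions = {
        "malicious": "likely true positive",
        "suspicious": "investigate further",
        "neutral": "investigate further",
        "whitelist": "likely false positive",
    }
    # Fall back to a safe default for any value not listed above
    return actions.get(classification, "unknown classification")


print(triage("whitelist"))
```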

To access this value in a Python dictionary, you index the dictionary with the corresponding key (e.g. result['classification']). In your threat hunting tool, you could wrap this in a try/except block.

Python
# check if the 'classification' key is present
try:
    # print the 'classification' of the domain name
    print(f"\n=> The domain {clean_domain} has been identified as {result['classification']} by Maltiverse\n")
except KeyError:
    # no classification available from Maltiverse
    print(f"\n - The domain name {clean_domain} cannot be classified by Maltiverse\n")

# code continues
...

Here we test whether the classification key is present by referencing it inside a Python f-string. If Maltiverse returns no classification for the domain name, a KeyError is raised. You can catch this error with the except clause and print that Maltiverse has not classified this domain.
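
An alternative to try/except is the dictionary's .get() method, which returns None instead of raising a KeyError when the key is missing. Here is a minimal sketch of that approach. The function name describe is hypothetical, and the messages mirror the ones used in the tool.

```python
def describe(domain: str, result: dict) -> str:
    """Build the status message without relying on a KeyError."""
    # .get() returns None when 'classification' is absent
    classification = result.get("classification")
    if classification is None:
        return f"The domain name {domain} cannot be classified by Maltiverse"
    return f"The domain {domain} has been identified as {classification} by Maltiverse"


print(describe("test.com", {"classification": "whitelist"}))
print(describe("test.com", {}))
```

Both styles work. The try/except version is idiomatic when a missing key is the exception, while .get() reads more naturally when a missing key is an expected outcome.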

Done!

Our Python threat hunting tool is complete. You can now start it up and query domain names to see whether they are malicious using the Maltiverse API. This tool allows you to quickly investigate domain names found during your threat hunting activities right from the command line!

Running the Maltiverse Checker with JSON Parsing Script

The full script can be found on GitHub and will look like this.

Python
import json
import requests
import validators

# URL to send Maltiverse API requests related to domain names
url = 'https://api.maltiverse.com/hostname/'

while True:
    # get user input
    domain = input("Please enter domain to search for (or type exit to quit): ")
    # clean user input
    clean_domain = domain.strip()

    # check if the user wants to exit
    if clean_domain == "exit":
        print("Goodbye.")
        break

    # validate the domain name
    if validators.domain(clean_domain):
        # query the Maltiverse API for a response (the body is returned as JSON)
        response = requests.get(url + clean_domain)
        # parse the JSON text of the response into a Python dictionary
        result = json.loads(response.text)

        # check if the 'classification' key is present
        try:
            # print the 'classification' of the domain name
            print(f"\n=> The domain {clean_domain} has been identified as {result['classification']} by Maltiverse\n")
        except KeyError:
            # no classification available from Maltiverse
            print(f"\n - The domain name {clean_domain} cannot be classified by Maltiverse\n")
    else:
        # invalid input: report it and let the loop prompt again
        print(f" - {clean_domain} is not a valid domain name. Please try again...\n")

You should explore ways of improving this script:

  • Can you add the functionality to query if file hashes are malicious?
  • Can you use other threat intelligence APIs to confirm if the results returned by Maltiverse are legitimate?
  • Can you automatically query IP addresses returned from domain names to see if they are considered malicious?
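
As a possible starting point for the first and third ideas, the sketch below guesses what kind of indicator the user typed, using only the standard library. The function name indicator_type and the decision to identify hashes purely by their hexadecimal length are assumptions. Which API endpoint to call for each type is left to the Maltiverse documentation.

```python
import ipaddress
import re

# Hex digest lengths of common hash algorithms
_HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}


def indicator_type(value: str) -> str:
    """Guess whether the input is an IP address, a file hash, or something else."""
    text = value.strip()
    try:
        # ip_address() raises ValueError for anything that is not IPv4/IPv6
        ipaddress.ip_address(text)
        return "ip"
    except ValueError:
        pass
    # A hash is assumed to be pure hexadecimal with a known digest length
    if re.fullmatch(r"[0-9a-fA-F]+", text) and len(text) in _HASH_LENGTHS:
        return _HASH_LENGTHS[len(text)]
    return "other"


print(indicator_type("69.172.200.235"))
```

Anything reported as "other" can then be passed to the existing domain validation step.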

Let me know how you get on adapting this script to your own threat hunting needs!

Discover more in the Python Threat Hunting Tools series!