Author: Allen Ace
Lab: Home DFIR Lab (Oracle VirtualBox + BOTS v2)
Date: 2025-09-25
This document describes a hands-on threat hunting exercise using Splunk and the Boss of the SOC (BOTS) v2 dataset. The focus is detecting reconnaissance against public-facing webservers carried out via a non-standard browser. Key findings include a suspicious user agent (NaenaraBrowser) connecting via an ExpressVPN IP and downloading company_contacts.xlsx.
- Practice Splunk-based threat hunting workflows
- Detect reconnaissance behaviors and anomalous user agents
- Pivot from user agents to source IPs and accessed resources
- Produce IOCs and reproducible hunt queries
VM used: Windows VM1 (Splunk instance)
Dataset: Boss of the SOC v2 (Attack Only)
Ingest steps (summary):
- Download Attack-Only dataset from https://github.com/splunk/botsv2
- Extract with 7-Zip (decompress then unarchive)
- Move botsv2_data_set into C:\Program Files\Splunk\etc\apps
- Restart Splunk (if required) and verify ingestion with:
```spl
index=botsv2 sourcetype=stream:smtp
```
Set time picker to 01 Aug 2017 → 31 Aug 2017.
Threat Intel Excerpt (Hunt seed)
“The unknown adversary is conducting reconnaissance of public-facing webservers using a non-standard browser over port 80.”
Hunt hypothesis: An adversary probed www.froth.ly in August 2017 using a non-standard browser. We expect to find anomalous user agent strings and associated source IPs that accessed sensitive assets.
Our first query uses the stream:http sourcetype to see which fields are available.
```spl
index=botsv2 sourcetype=stream:http
```
As seen below, there are fields for the user agent string (http_user_agent) and the website (site) that we can use to determine which user agents accessed our organization's website, froth.ly.
We can now write a query that lists every user agent string that accessed www.froth.ly along with the number of times each was seen. Sorting by ascending count puts the rarest user agents at the top.
```spl
index=botsv2 sourcetype=stream:http site="www.froth.ly" | stats count by http_user_agent | sort + count
```
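The same least-frequency idea can be sketched outside Splunk, for example on events exported as a list of dictionaries. This is only an illustration under assumptions: the sample events are hypothetical (only the Naenara string comes from this hunt), and `rare_user_agents` is a helper name made up for this sketch.

```python
from collections import Counter

NAENARA_UA = (
    "Mozilla/5.0 (X11; U; Linux i686; ko-KP; rv: 19.1br) Gecko/20130508 "
    "Fedora/1.9.1-2.5.rs3.0 NaenaraBrowser/3.5b4"
)

# Hypothetical sample events; apart from the Naenara string, the values are made up.
events = (
    [{"site": "www.froth.ly", "http_user_agent": "ExampleBrowser/1.0"}] * 3
    + [{"site": "www.froth.ly", "http_user_agent": "ExampleCrawler/2.0"}] * 2
    + [{"site": "www.froth.ly", "http_user_agent": NAENARA_UA}]
    + [{"site": "other.example", "http_user_agent": "ExampleBrowser/1.0"}]
)


def rare_user_agents(events, site):
    """Count user agents seen for one site and return (agent, count) pairs, rarest first."""
    counts = Counter(
        event["http_user_agent"] for event in events if event.get("site") == site
    )
    # Sort by ascending count, then by name so ties are ordered deterministically.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


ranked = rare_user_agents(events, "www.froth.ly")
for agent, count in ranked:
    print(count, agent)
```

Rare does not mean malicious; the ranking only suggests which strings deserve a closer look first.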
Using an open source website, linked below, we can learn more about the user agent string.
https://explore.whatismybrowser.com/useragents/parse/
As seen above, the Naenara browser is being run from a Fedora Linux system. A Linux system is not uncommon; however, the browser deserves more research. A web search shows that Naenara is a North Korean web browser.
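For a quick offline look at the same string, a small sketch like the one below splits it into its platform details and product tokens. It is a rough regular-expression split, not a full user agent parser, and `split_user_agent` is a hypothetical helper name. Keep in mind that the user agent header is set by the client and can be spoofed.

```python
import re

UA = (
    "Mozilla/5.0 (X11; U; Linux i686; ko-KP; rv: 19.1br) Gecko/20130508 "
    "Fedora/1.9.1-2.5.rs3.0 NaenaraBrowser/3.5b4"
)


def split_user_agent(ua):
    """Return (products, details) for a user agent string.

    products maps each name/version token to its version;
    details lists the semicolon-separated items inside the first parentheses.
    """
    match = re.search(r"\(([^)]*)\)", ua)
    details = [part.strip() for part in match.group(1).split(";")] if match else []
    # Drop the parenthesised section, then collect the name/version tokens.
    remainder = re.sub(r"\([^)]*\)", " ", ua)
    products = dict(re.findall(r"(\S+)/(\S+)", remainder))
    return products, details


products, details = split_user_agent(UA)
print(products)
print(details)
```

The `ko-KP` item in the details is a locale tag (Korean, with the country code for North Korea), which is consistent with the Naenara finding but is not proof of origin.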
While this is informative, we should not jump to conclusions about attribution. We can now pivot from the user agent string to discover more, such as the source IP address.
We can pivot by clicking the user agent string and selecting view events. This adds the user agent string to our query.
Now we can add a stats count by source and destination IP address to determine which IP address the Naenara browser session used to connect to our website.
```spl
index=botsv2 sourcetype=stream:http site="www.froth.ly" http_user_agent="Mozilla/5.0 (X11; U; Linux i686; ko-KP; rv: 19.1br) Gecko/20130508 Fedora/1.9.1-2.5.rs3.0 NaenaraBrowser/3.5b4" | stats count by src dest
```
As seen below, all sessions came from a single source IP address, 85.203.47.86. This is the address the user agent used to connect to www.froth.ly.
We can then research the IP address using open source intelligence (OSINT). IPinfo.io, with a free account, provides details on 85.203.47.86.
As seen below, 85.203.47.86 is part of the ExpressVPN service.
At this point we know that a suspicious user agent accessed our public-facing website. We need to continue our analysis to determine what information was accessed. This may give us insight into the intent of the attacker.
By rerunning the user agent search without the stats command, we can review the other available fields and see what else they reveal.
```spl
index=botsv2 sourcetype=stream:http site="www.froth.ly" http_user_agent="Mozilla/5.0 (X11; U; Linux i686; ko-KP; rv: 19.1br) Gecko/20130508 Fedora/1.9.1-2.5.rs3.0 NaenaraBrowser/3.5b4"
```
As seen below, there is a field named http_content_type. That field shows the type of content that was accessed. The most interesting content type accessed is a spreadsheet.
By adding the interesting content type to our search and reviewing the uri_path field, we can see the name of the spreadsheet is company_contacts.xlsx.
Creating a table presents the information in a more useful format.
```spl
index=botsv2 sourcetype=stream:http site="www.froth.ly" http_user_agent="Mozilla/5.0 (X11; U; Linux i686; ko-KP; rv: 19.1br) Gecko/20130508 Fedora/1.9.1-2.5.rs3.0 NaenaraBrowser/3.5b4" http_content_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" | table _time, src, dest, uri_path, url
```
A spreadsheet with company contacts is the kind of information an adversary would look for during reconnaissance. It could provide them with a target list for social engineering or phishing.
Below is a diagram of what we know.
- Anomalous user agent: NaenaraBrowser/3.5b4 (appears to be Naenara, a DPRK browser)
- Source IP: 85.203.47.86 (identified as an ExpressVPN endpoint via IPinfo)
- Accessed resource: /path/to/company_contacts.xlsx (download of spreadsheet)
- Conclusion: Adversary-level reconnaissance likely aimed at harvesting contact lists for spear-phishing or follow-on attacks. Attribution is not conclusive due to VPN usage; focus on IOCs and response.
- User agent: Mozilla/5.0 (X11; U; Linux i686; ko-KP; rv:19.1br) ... NaenaraBrowser/3.5b4
- IP: 85.203.47.86 (ExpressVPN endpoint)
- URI: /company_contacts.xlsx
- Time range: August 2017 (see timeline or screenshots)
Save these to ioc.yml or iocs.csv for ingestion by detection tools.
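One possible way to produce iocs.csv is sketched below. The indicator values are taken from this hunt (the user agent is the full string from the queries above); the column names `type`, `value`, and `note` are an assumed layout for illustration, not a schema required by any particular detection tool.

```python
import csv
import io

# Indicator values come from the findings above; the column layout is an assumption.
IOCS = [
    {
        "type": "user_agent",
        "value": (
            "Mozilla/5.0 (X11; U; Linux i686; ko-KP; rv: 19.1br) Gecko/20130508 "
            "Fedora/1.9.1-2.5.rs3.0 NaenaraBrowser/3.5b4"
        ),
        "note": "Naenara browser seen against www.froth.ly",
    },
    {"type": "ip", "value": "85.203.47.86", "note": "ExpressVPN endpoint"},
    {"type": "uri", "value": "/company_contacts.xlsx", "note": "Downloaded spreadsheet"},
]


def iocs_to_csv(iocs):
    """Serialize a list of IOC dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["type", "value", "note"])
    writer.writeheader()
    writer.writerows(iocs)
    return buffer.getvalue()


csv_text = iocs_to_csv(IOCS)
print(csv_text)

# To save the result, write csv_text to a file, for example:
# with open("iocs.csv", "w", newline="") as handle:
#     handle.write(csv_text)
```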
- Ingest BOTS v2 Attack dataset into Splunk (place under etc/apps/ and restart).
- Set time picker to Aug 2017.
- Run the queries listed above.
- Pivot on anomalous UAs and follow the src field to get IPs.
- Investigate whether the attacker returned using the same or different UAs/IPs.
- Hunt for suspicious POSTs, form submissions, or exploit attempts in server logs.
- Check web server access logs for HTTP referrers and potential exploit strings.
- Expand hunting to DNS logs and firewall logs for related hostnames/IPs.
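As a starting point for checking whether the attacker returned with a different user agent or IP, the sketch below matches exported events against each indicator separately. It is a minimal illustration: `match_indicators` is a hypothetical helper, and the sample events are made up apart from the IP and user agent found in this hunt (the other addresses come from ranges reserved for documentation).

```python
KNOWN_IP = "85.203.47.86"
KNOWN_UA_TOKEN = "NaenaraBrowser"
NAENARA_UA = (
    "Mozilla/5.0 (X11; U; Linux i686; ko-KP; rv: 19.1br) Gecko/20130508 "
    "Fedora/1.9.1-2.5.rs3.0 NaenaraBrowser/3.5b4"
)


def match_indicators(event):
    """Return the names of the known indicators that an event matches."""
    hits = []
    if event.get("src") == KNOWN_IP:
        hits.append("ip")
    if KNOWN_UA_TOKEN in event.get("http_user_agent", ""):
        hits.append("user_agent")
    return hits


# Hypothetical sample events for illustration only.
events = [
    {"src": "85.203.47.86", "http_user_agent": NAENARA_UA},            # both match
    {"src": "85.203.47.86", "http_user_agent": "ExampleBrowser/1.0"},  # same IP, new UA
    {"src": "192.0.2.10", "http_user_agent": NAENARA_UA},              # same UA, new IP
    {"src": "198.51.100.7", "http_user_agent": "ExampleBrowser/1.0"},  # no match
]

# Events matching exactly one indicator may show the other attribute changing.
partial_matches = [e for e in events if len(match_indicators(e)) == 1]
for event in partial_matches:
    print(match_indicators(event), event["src"])
```

A partial match is only a lead: a shared VPN exit IP can carry unrelated traffic, so each hit still needs review.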
- Splunk BOTS v2: https://github.com/splunk/botsv2
- WhatIsMyBrowser UA parser: https://explore.whatismybrowser.com/useragents/parse/
- IP enrichment: https://ipinfo.io/
This investigation continues into the next phase of the attack lifecycle, focusing on how the adversary gains initial access after reconnaissance.
License
This repository is licensed under the MIT License. See LICENSE for details.











