The Cisco Umbrella Investigate add-on for Splunk leverages the Investigate API to enrich events within Splunk. The add-on can be obtained from Splunkbase at no additional cost. A license for the Investigate API is required in order to use it; as such, this add-on is only available to customers whose Umbrella subscription package includes the API.
The Splunk add-on for Investigate combines your data sources from other security tools, such as SIEM, IDS/IPS, and firewalls, with the power of the Investigate API, allowing you to query data against Investigate quickly and easily within Splunk.
For more information, please see our datasheet: https://learn-umbrella.cisco.com/datasheets/splunk-add-on-for-investigate
The Splunk add-on allows three types of data to be queried: domain names, IP addresses, and file hashes. There are three key steps: installing the add-on with the appropriate settings, creating a scheduled search to pull data, and then reviewing the data with Investigate.
- Cisco Investigate API key
- A running Splunk instance
The instructions here have been tested most recently with Splunk 6.4.2 and 6.5.2 in a Linux environment and Splunk 6.4.2 in a Windows environment.
IMPORTANT: The scheduled search should only be for domains that are alerting on your security events in a SIEM or other system. You cannot use the scheduled search for your entire traffic logs due to API throttling restrictions. Please expect a maximum throughput of 5,000 unique domains per hour. Your saved search should not contain more than this number of domains.
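To illustrate the throttling limit above, a scheduled search's output can be deduplicated and capped before it is sent to the API. This is a minimal sketch of that idea; the helper function is hypothetical and not part of the add-on itself:

```python
# Hypothetical sketch: deduplicate domains and enforce the 5,000-per-hour
# cap described above. This is NOT code from the add-on.
MAX_DOMAINS_PER_HOUR = 5000

def throttle_domains(domains):
    """Return up to MAX_DOMAINS_PER_HOUR unique domains, preserving order."""
    seen = set()
    unique = []
    for d in domains:
        d = d.strip().lower()
        if d and d not in seen:
            seen.add(d)
            unique.append(d)
    return unique[:MAX_DOMAINS_PER_HOUR]

batch = throttle_domains(["Example.com", "example.com", "evil.test", ""])
print(batch)  # ['example.com', 'evil.test']
```

If your saved search can yield more than 5,000 unique domains per hour, narrow it (for example, to alert-generating events only) rather than truncating arbitrarily.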
The add-on can be found on Splunkbase: https://splunkbase.splunk.com/app/3324/
These are the external dependencies used to aid in this add-on's functionality. These dependencies are packaged with the add-on, so there is no need to perform any installation, but they are noted here so that you can make informed decisions about licensing, etc.
Install into Splunk with your method of choice:
Splunk Web: go into the Manage Apps page and click the “Install app from file” option, then follow the instructions.
Splunk CLI: download the add-on's .tgz package to your Splunk node of choice and install with the following command:
$SPLUNK_HOME/bin/splunk install app cisco-umbrella-investigate-add-on_040.tgz -auth <username>:<password>
Both methods will require a restart of the Splunk node.
After starting the node, navigate back to the Manage Apps page, find the listing for the Umbrella Investigate add-on and click the “Set up” option. This will load the standard setup page for the add-on.
Typically the Manage Apps page is found under the gear icon from the main launch page, next to Apps:
If you wish to change the name of the add-on, click View Properties.
The Investigate API key is needed to authenticate the API requests. It can be obtained from the Investigate UI dashboard (note: not all Investigate customers have access to the API). Once you have the key, enter it under data inputs so that it is stored in an encrypted format:
- Go to Settings > Data Inputs > Cisco Investigate Credentials.
- Click 'New'.
- Enter any name you like, then enter the API key gathered from the Investigate dashboard.
- Click next, and your API key will be encrypted and saved.
If you are ever issued a new API key, you can update it here.
Any proxy authentication is also set here.
NOTE: some earlier versions of the add-on had the API key in the setup screens shown below and this may still be the case if you have not yet updated your add-on to the newest version.
Once you've done that, click Set up and the following screen is shown (the screenshot is in two parts for ease of reading):
You will be prompted to enter information into the setup page when you first start the add-on. These include:
- Request destination fields: The add-on expects defined fields. These are the same fields you'll define in the search regex (e.g. domain, host, destination), which typically contain the domain, IP, or hash information. Enter them in comma-separated format.
- Scheduled search name: The name of the saved search you want us to pull domain information from. More information on creating the saved search is below; if the search has not been created yet, this can be skipped.
The next two fields define the pruning of data to ensure the data store does not exceed a certain size.
- Set how far back in time you want to save data: to limit the size of the data store, define how long you would like to save data here. The format should be a Splunk time modifier for search. For example, if you wish to save data for a week, you would enter -7d@d. Leaving this blank will disable timestamp pruning.
- Set how much data you want to save: to prune data so it does not exceed a specific number of rows, set the maximum number of rows here. Any excess will be deleted in time-ascending order (i.e. oldest first). Leaving this blank will disable size pruning.
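The way these two settings combine can be sketched as follows. This is an illustrative model only; the field names (e.g. last_queried) are assumptions, not the add-on's actual schema:

```python
import time

# Illustrative sketch of the two pruning modes described above.
# Field names are assumptions, not the add-on's actual KV store schema.
def prune(rows, max_age_seconds=None, max_rows=None, now=None):
    """Drop rows older than max_age_seconds, then trim oldest-first to max_rows."""
    now = time.time() if now is None else now
    kept = sorted(rows, key=lambda r: r["last_queried"])  # oldest first
    if max_age_seconds is not None:
        kept = [r for r in kept if now - r["last_queried"] <= max_age_seconds]
    if max_rows is not None and len(kept) > max_rows:
        kept = kept[len(kept) - max_rows:]  # delete oldest rows first
    return kept

rows = [{"dest": "a.test", "last_queried": 100},
        {"dest": "b.test", "last_queried": 200},
        {"dest": "c.test", "last_queried": 300}]
# Keep at most 2 rows: the oldest row is pruned.
print([r["dest"] for r in prune(rows, max_rows=2, now=300)])  # ['b.test', 'c.test']
```

Leaving either setting blank corresponds to passing None for that parameter, which disables that pruning mode.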
The final three settings are tied to support for proxy servers and nonstandard hosts and ports:
- Proxies: Set the IP and port of the proxy server used for requests to the Investigate API. Make sure to use the following format: ip:port. Only IP and port are required, not protocol. If this is blank, the add-on will make direct connections to the Investigate API. Proxy authentication is handled in the section above. Note: we currently only support http/https proxying (and not SOCKS proxies).
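If you want to sanity-check the value before saving it, the expected ip:port format can be validated with a small sketch like this (the check itself is an assumption for illustration, not the add-on's own validation):

```python
import re

# Illustrative check for the proxy setting's expected "ip:port" format
# (IP and port only, no protocol prefix). Not the add-on's own validation.
PROXY_RE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})$")

def parse_proxy(value):
    """Return (ip, port) for a valid "ip:port" string, else None."""
    m = PROXY_RE.match(value.strip())
    if not m:
        return None
    ip, port = m.group(1), int(m.group(2))
    return (ip, port) if 0 < port < 65536 else None

print(parse_proxy("10.0.0.5:3128"))         # ('10.0.0.5', 3128)
print(parse_proxy("http://10.0.0.5:3128"))  # None: protocol prefix not allowed
```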
- Host name: use this to set the hostname of the Splunk management server, if different from this host.
- Port: set the Splunk management port.
Next, we need to create a scheduled search. You will need to create one with the ability to get any kind of destination (domain, IP or hash) from log files.
For instance, the field itself may be called “dest”, and in the examples here, we're using example firewall logs, which use the field name “dest_host_blocked”.
The scheduled search should only be for domains that are alerting on your security events in a SIEM or other system. You cannot use the scheduled search for your entire traffic logs due to API throttling restrictions.
A new scheduled search can be created from within Splunk Web by going to Settings > Searches, reports, and alerts and once in that section, clicking the New button. Pick the app context first:
This scheduled search should query specific time ranges. For instance, it may poll every hour for the data from two hours before it is run: running at five minutes past the hour (e.g. 11:05 AM) and looking for data within a one-hour segment beginning two hours before (9:00-10:00 AM if the current time is 11:00 AM).
The schedule should be made carefully to ensure one search is not still running while another one kicks off as this could lead to significant performance issues within Splunk.
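In savedsearches.conf terms, the timing described above can be sketched like this. The stanza name, index, and field are placeholders for your own search, and you can equally configure the same values through Splunk Web:

```ini
# Hypothetical savedsearches.conf stanza illustrating the schedule above;
# the stanza name, index, and field name are placeholders.
[Investigate Scheduled Search]
enableSched = 1
cron_schedule = 5 * * * *
dispatch.earliest_time = -2h@h
dispatch.latest_time = -1h@h
search = index="firewall_logs" | fields dest_host_blocked
```

Running at five minutes past the hour against a snapped, already-closed window helps ensure each run finishes before the next begins.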
Make sure permissions are set correctly so the add-on and user have permission to view the search report. Also make sure the scheduled search name entered in the setup page matches the search's name exactly, as it is case-sensitive.
An example saved search query would be:
index="firewall_logs" earliest=-2h latest=-1h | fields dest_host_blocked
A more complex example, specific to a single host:
index="firewall_events" earliest=-2h latest=-1h cs_host=adobe.com | fields cs_host, cs_hash
This way, the dest_host_blocked field for requests in the firewall logs index is filtered into a form that is simple for the add-on to parse and process. If you want a more complex logical condition, such as >, you can use the 'where' command.
Saved search queries and how they're constructed ultimately depend entirely on your data sources, so Umbrella support may not be able to recommend which fields are appropriate for your indexes.
The scheduled search should only be for domains that are alerting on your security events in a SIEM or other system. You cannot use the scheduled search for your entire traffic logs due to API throttling restrictions, and you may find that volume of data exceeds the natural limits of your Splunk instance. Beyond that, however, the volume of data from non-security-related internet traffic will simply be of no use to your team.
Please expect a maximum throughput of 5,000 unique domains per hour. Your saved search should not contain more than this number of domains.
Next, be sure to enable the scripted input for the add-on. You will need to:
- Go to the Data Inputs settings under "Settings".
- Under "Local inputs", click "Scripts".
- Click to enable the add-on's scripted input: $SPLUNK_HOME/etc/apps/opendns_investigate/bin/investigate_input.py
- Configure the schedule it will run on by clicking its link and modifying the interval value.
Once you've created your scheduled search, go back to the 'Set up' section and add the scheduled search name in the appropriate field.
When installing on a distributed cluster, the add-on (scripted input) must be installed on the search head (or one of the search heads). That node will run the add-on process.
The basics of the Splunk App are three key collectors, each matching a particular set of API results: one for domains, one for IP addresses and one for file hashes.
To view contents of the store containing your Investigate data, create a Splunk search with the following command for domains:
| inputlookup investigate_domains
For IP addresses use:
| inputlookup investigate_ips
For file hashes, use:
| inputlookup investigate_hashes
You can use the contents of the store to enrich event data within Splunk.
Each of the three stores of data (domains, IP addresses and hashes) is treated as a separate set of keys for data. This is because the data types are fundamentally different. The output matches closely, but not exactly, to what you would typically see using the API to query.
Use standard data sorting techniques to build queries, such as:
| inputlookup investigate_domains | where not isnull('cooccurrences.0') | fields dest, cooccurrences.0, status_label, last_queried | sort -last_queried
The 'investigate_domains' query broadly covers the same fields as the API would for any given domain, such as the ASN of the domain, the content categories it matches, any cooccurrences or related domains, DGA score, whether it is in fast flux, general status (known bad or unknown) and WHOIS data. Information about all of these fields can be found earlier in the API documentation.
The 'investigate_ips' query covers the destination (the IP itself), the last queried time, the resource record history for that IP (DNS RR History for an IP), as well as the labels for the domains that resolved to this IP at one point:
The 'investigate_hashes' query covers AV results, as well as network connections, file type (magic type) and security categories:
For more information about some of the above data, as well as information about any additional fields, see the Investigate API documentation above.
There is a custom search command that can filter search results to contain only hosts with a certain status from the Investigate API. For example, you can filter down to only the search results that have a malicious host.
You must be in the Cisco Investigate app context to use this command.
For example, if you have an index named proxy_logs which stores hosts in a field named host, you can run this command in the search box to filter the results to include only those events whose host field is a malicious host, according to the Investigate API:
index="proxy_logs" | investigatefilter host_field=host
By default, the status parameter is assigned an argument of -1 (i.e. malicious). However, you can search for any supported status code (-1, 0, or 1). For example, to filter the results to include only hosts that are deemed benign, you can run:
index="proxy_logs" | investigatefilter host_field=host status=1
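The supported status codes map to dispositions as follows. This is a small illustrative sketch; the label strings are assumptions for readability, not the add-on's exact output:

```python
# Investigate status codes accepted by the investigatefilter command above.
# The label strings here are illustrative, not the add-on's exact wording.
STATUS_LABELS = {-1: "malicious", 0: "unknown", 1: "benign"}

def status_label(code):
    """Translate a supported Investigate status code to a readable label."""
    return STATUS_LABELS.get(code, "unsupported")

print(status_label(-1))  # malicious
print(status_label(1))   # benign
```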
If you like, you can make this your saved search for the Investigate add-on so that it only enriches data with malicious hosts.
A script has been provided for pruning of KV Store collections used by this add-on.
The following two methods can be configured and enabled; this can also be done in the user interface, as options in the setup page.
- time-based: Entries older than a user-supplied time modifier are deleted; e.g. "-7d@d" would delete everything older than 7 days.
- size-based: A limit can be set on the maximum number of rows in a collection. When run, the pruning script will delete rows in time-ascending (i.e. oldest first) order until the number of rows is equal to the maximum.
Both of these options can be set in the add-on setup page.
- Go to the Data Inputs settings under "Settings".
- Under "Local inputs", click "Scripts".
- Click to enable the add-on's scripted input:
- Configure the schedule it will run on by clicking its link and modifying the interval value
Support can be reached at: email@example.com