I've been presenting DFIR Redefined: Deeper Functionality for Investigators with R across the country at various conference venues and thought it would be helpful to provide details for readers.
The basic premise?
Incident responders and investigators need all the help they can get.
Let me lay just a few statistics on you, from Secure360.org's The Challenges of Incident Response, Nov 2016. Per the respondents in their survey of security professionals:
- 38% reported an increase in the number of hours devoted to incident response
- 42% reported an increase in the volume of incident response data collected
- 39% indicated an increase in the volume of security alerts
The 2017 SANS Incident Response Survey, compiled by Matt Bromiley in June, reminds us that “2016 brought unprecedented events that impacted the cyber security industry, including a myriad of events that raised issues with multiple nation-state attackers, a tumultuous election and numerous government investigations.” Further, "seemingly continuous leaks and data dumps brought new concerns about malware, privacy and government overreach to the surface.”
Finally, the survey shows that IR teams are:
- Detecting the attackers faster than before, with a drastic improvement in dwell time
- Containing incidents more rapidly
- Relying more on in-house detection and remediation mechanisms
The scenarios we'll explore with R:
- Have you been pwned?
- Visualization for malicious Windows Event Id sequences
- How do your potential attackers feel, or can you identify an attacker via sentiment analysis?
- Fast Frugal Trees (decision trees) for prioritizing criticality
With R you can interface with data via file ingestion, database connections, and APIs, and benefit from a wide range of packages and strong community investment.
From the Win-Vector Blog, per John Mount “not all R users consider themselves to be expert programmers (many are happy calling themselves analysts). R is often used in collaborative projects where there are varying levels of programming expertise.”
I propose that this represents the vast majority of us: we're not expert programmers, data scientists, or statisticians. More likely, we're security analysts reusing code for our own purposes, be it red team or blue team. With just a few lines of R, investigators may be able to reach conclusions more quickly.
All the code described in the post can be found on my GitHub.
Have you been pwned?
This scenario I covered in an earlier post; I'll refer you to Toolsmith Release Advisory: Steph Locke's HIBPwned R package.
Visualization for malicious Windows Event Id sequences
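As a minimal sketch of the idea, the following plots a sequence of Windows Event IDs over time with ggplot2. The event IDs and timestamps here are synthetic, invented purely for illustration; in practice you'd ingest a parsed event log (e.g., a CSV export) with read.csv().

```r
library(ggplot2)

# Synthetic sequence of Windows Event IDs over a ten-minute window.
# 4624 = successful logon, 4625 = failed logon, 4672 = special privileges,
# 4688 = process creation, 4634 = logoff.
events <- data.frame(
  time = as.POSIXct("2017-08-29 10:00:00", tz = "UTC") + seq(0, 540, by = 60),
  event_id = factor(c(4624, 4672, 4688, 4688, 4625, 4625, 4625, 4624, 4672, 4634))
)

# One point per event, Event ID on the y axis: repeated IDs (e.g., a run
# of 4625 failed logons) stand out as horizontal clusters.
p <- ggplot(events, aes(x = time, y = event_id)) +
  geom_point(size = 3) +
  labs(x = "Time", y = "Windows Event ID",
       title = "Event ID sequence (synthetic data)")
print(p)
```

From there, a real investigation would swap in parsed event log data and look for known-malicious orderings.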
How do your potential attackers feel, or can you identify an attacker via sentiment analysis?
- twitteR: provides access to the Twitter API. Most functionality of the API is supported, with a bias towards API calls that are more useful in data analysis as opposed to daily interaction.
- rtweet: R client for interacting with Twitter's REST and stream APIs.
This gives you an immediate feel for spikes in interest by day as well as time of day, particularly with attention to retweets.
The result in the scenario ironically indicates that the majority of related tweets using our hashtags of interest are coming from Androids per Figure 8. :-)
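That platform breakdown reduces to a simple tally. Here's a toy version: the statusSource column mirrors what twitteR returns for tweet client, but the rows below are invented for illustration.

```r
# Tally tweet clients to see which platform dominates a collected set.
# These rows are synthetic stand-ins for a real twitteR result set.
orig <- data.frame(
  statusSource = c("Twitter for Android", "Twitter for Android",
                   "Twitter for Android", "Twitter Web Client",
                   "Twitter for iPhone"),
  stringsAsFactors = FALSE
)

# Sorted counts, most common client first.
sources <- sort(table(orig$statusSource), decreasing = TRUE)
print(sources)
```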
orig$text[which.max(orig$emotionalValence)] tells us that the most positive tweet is "A bunch of Internet tech companies had to work together to clean up #WireX #Android #DDoS #botnet."
orig$text[which.min(orig$emotionalValence)] tells us that "Dangerous #WireX #Android #DDoS #Botnet Killed by #SecurityGiants" is the most negative tweet.
Interesting right? Almost exactly the same message, but very different valence.
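A toy reproduction of that valence scoring looks like this. The lexicon below is a tiny hand-rolled stand-in for a real sentiment dictionary; its words and weights, and the sample tweets, are invented for illustration.

```r
# Hypothetical mini-lexicon: positive words score > 0, negative < 0.
lexicon <- c(clean = 2, together = 1, dangerous = -2, killed = -3)

# Sum the lexicon scores of each word in a tweet; unknown words score NA
# and are dropped by na.rm = TRUE.
score_valence <- function(text) {
  words <- tolower(unlist(strsplit(text, "[^A-Za-z']+")))
  sum(lexicon[words], na.rm = TRUE)
}

orig <- data.frame(
  text = c("Companies had to work together to clean up the botnet",
           "Dangerous botnet killed by security giants"),
  stringsAsFactors = FALSE
)
orig$emotionalValence <- sapply(orig$text, score_valence)

orig$text[which.max(orig$emotionalValence)]  # most positive tweet
orig$text[which.min(orig$emotionalValence)]  # most negative tweet
```

Two tweets describing the same takedown land at opposite ends of the scale purely because of word choice, which is exactly the effect noted above.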
How do we measure emotional valence changes over the day? A few lines later...
filter(orig, mday(created) == 29) %>%
ggplot(aes(created, emotionalValence)) +
geom_smooth(span = .5)
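The snippet above assumes an existing orig tweet data frame. Here's a self-contained variant on synthetic data, assuming dplyr, lubridate, and ggplot2 are installed; the timestamps and valence values are randomly generated for illustration.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Synthetic stand-in for the 'orig' tweet data frame: 200 tweets spread
# across 29 Aug, each with a random emotional valence score.
set.seed(42)
orig <- data.frame(
  created = as.POSIXct("2017-08-29 00:00:00", tz = "UTC") +
    sort(runif(200, 0, 86399)),
  emotionalValence = rnorm(200)
)

# Smoothed valence over the course of the day, as in the original snippet.
p <- filter(orig, mday(created) == 29) %>%
  ggplot(aes(created, emotionalValence)) +
  geom_smooth(span = .5)
print(p)
```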
Two lines of R...
ggplot(orig, aes(x = emotionalValence, y = retweetCount)) +
geom_point(position = 'jitter')
...and we learn just how popular negative tweets are in Figure 10.
In Part 2 of DFIR Redefined: Deeper Functionality for Investigators with R we'll explore this scenario further via sentiment analysis and Twitter data, as well as Fast Frugal Trees (decision trees) for prioritizing criticality.
Let me know if you have any questions on the first part of this series via @holisticinfosec or russ at holisticinfosec dot org.
Cheers...until next time.