Sunday, February 11, 2018

toolsmith #131 - The HELK vs APTSimulator - Part 1

Ladies and gentlemen, for our main attraction, I give you...The HELK vs APTSimulator, in a Death Battle! The late, great Randy "Macho Man" Savage said many things in his day, in his own special way, but "Expect the unexpected in the kingdom of madness!" could be our toolsmith theme this month and next. Man, am I having a flashback to my college days, many moons ago. :-) The HELK just brought it on. Yes, I know, HELK is the Hunting ELK stack, got it, but it reminded me of the Hulk, and then, I thought of a Hulkamania showdown with APTSimulator, and Randy Savage's classic, raspy voice popped in my head with "Hulkamania is like a single grain of sand in the Sahara desert that is Macho Madness." And that, dear reader, is a glimpse into exactly three seconds or less in the mind of your scribe, a strange place to be certain. But alas, that's how we came up with this fabulous showcase.
In this corner, from Roberto Rodriguez, @Cyb3rWard0g, the specter in SpecterOps, it's...The...HELK! This, my friends, is the s**t, worth every ounce of hype we can muster.
And in the other corner, from Florian Roth, @cyb3rops, the The Fracas of Frankfurt, we have APTSimulator. All your worst adversary apparitions in one APT mic drop. Battle!

Now with that out of our system, let's begin. There's a lot of goodness here, so I'm definitely going to do this in two parts so as not undervalue these two offerings.
HELK is incredibly easy to install. Its also well documented, with lots of related reading material, let me propose that you take the tine to to review it all. Pay particular attention to the wiki, gain comfort with the architecture, then review installation steps.
On an Ubuntu 16.04 LTS system I ran:
  • git clone
  • cd HELK/
  • sudo ./ 
Of the three installation options I was presented with, pulling the latest HELK Docker Image from cyb3rward0g dockerhub, building the HELK image from a local Dockerfile, or installing the HELK from a local bash script, I chose the first and went with the latest Docker image. The installation script does a fantastic job of fulfilling dependencies for you, if you haven't installed Docker, the HELK install script does it for you. You can observe the entire install process in Figure 1.
Figure 1: HELK Installation
You can immediately confirm your clean installation by navigating to your HELK KIBANA URL, in my case
For my test Windows system I created a Windows 7 x86 virtual machine with Virtualbox. The key to success here is ensuring that you install Winlogbeat on the Windows systems from which you'd like to ship logs to HELK. More important, is ensuring that you run Winlogbeat with the right winlogbeat.yml file. You'll want to modify and copy this to your target systems. The critical modification is line 123, under Kafka output, where you need to add the IP address for your HELK server in three spots. My modification appeared as hosts: ["","",""]. As noted in the HELK architecture diagram, HELK consumes Winlogbeat event logs via Kafka.
On your Windows systems, with a properly modified winlogbeat.yml, you'll run:
  • ./winlogbeat -c winlogbeat.yml -e
  • ./winlogbeat setup -e
You'll definitely want to set up Sysmon on your target hosts as well. I prefer to do so with the @SwiftOnSecurity configuration file. If you're doing so with your initial setup, use sysmon.exe -accepteula -i sysmonconfig-export.xml. If you're modifying an existing configuration, use sysmon.exe -c sysmonconfig-export.xml.  This will ensure rich data returns from Sysmon, when using adversary emulation services from APTsimulator, as we will, or experiencing the real deal.
With all set up and working you should see results in your Kibana dashboard as seen in Figure 2.

Figure 2: Initial HELK Kibana Sysmon dashboard.
Now for the showdown. :-) Florian's APTSimulator does some comprehensive emulation to make your systems appear compromised under the following scenarios:
  • POCs: Endpoint detection agents / compromise assessment tools
  • Test your security monitoring's detection capabilities
  • Test your SOCs response on a threat that isn't EICAR or a port scan
  • Prepare an environment for digital forensics classes 
This is a truly admirable effort, one I advocate for most heartily as a blue team leader. With particular attention to testing your security monitoring's detection capabilities, if you don't do so regularly and comprehensively, you are, quite simply, incomplete in your practice. If you haven't tested and validated, don't consider it detection, it's just a rule with a prayer. APTSimulator can be observed conducting the likes of:
  1. Creating typical attacker working directory C:\TMP...
  2. Activating guest user account
    1. Adding the guest user to the local administrators group
  3. Placing a svchost.exe (which is actually srvany.exe) into C:\Users\Public
  4. Modifying the hosts file
    1. Adding mapping to private IP address
  5. Using curl to access well-known C2 addresses
    1. C2:
  6. Dropping a Powershell netcat alternative into the APT dir
  7. Executes nbtscan on the local network
  8. Dropping a modified PsExec into the APT dir
  9. Registering mimikatz in At job
  10. Registering a malicious RUN key
  11. Registering mimikatz in scheduled task
  12. Registering cmd.exe as debugger for sethc.exe
  13. Dropping web shell in new WWW directory
A couple of notes here.
Download and install APTSimulator from the Releases section of its GitHub pages.
APTSimulator includes curl.exe, 7z.exe, and 7z.dll in its helpers directory. Be sure that you drop the correct version of 7 Zip for your system architecture. I'm assuming the default bits are 64bit, I was testing on a 32bit VM.

Let's do a fast run-through with HELK's Kibana Discover option looking for the above mentioned APTSimulator activities. Starting with a search for TMP in the sysmon-* index yields immediate results and strikes #1, 6, 7, and 8 from our APTSimulator list above, see for yourself in Figure 3.

Figure 3: TMP, PS nc, nbtscan, and PsExec in one shot
Created TMP, dropped a PowerShell netcat, nbtscanned the local network, and dropped a modified PsExec, check, check, check, and check.
How about enabling the guest user account and adding it to the local administrator's group? Figure 4 confirms.

Figure 4: Guest enabled and escalated
Strike #2 from the list. Something tells me we'll immediately find svchost.exe in C:\Users\Public. Aye, Figure 5 makes it so.

Figure 5: I've got your svchost right here
Knock #3 off the to-do, including the process.commandline,, and file.creationtime references. Up next, the At job and scheduled task creation. Indeed, see Figure 6.

Figure 6. tasks OR schtasks
I think you get the point, there weren't any misses here. There are, of course, visualization options. Don't forget about Kibana's Timelion feature. Forensicators and incident responders live and die by timelines, use it to your advantage (Figure 7).

Figure 7: Timelion
Finally, for this month, under HELK's Kibana Visualize menu, you'll note 34 visualizations. By default, these are pretty basic, but you quickly add value with sub-buckets. As an example, I selected the Sysmon_UserName visualization. Initially, it yielded a donut graph inclusive of malman (my pwned user), SYSTEM and LOCAL SERVICE. Not good enough to be particularly useful I added a sub-bucket to include process names associated with each user. The resulting graph is more detailed and tells us that of the 242 events in the last four hours associated with the malman user, 32 of those were specific to cmd.exe processes, or 18.6% (Figure 8).

Figure 8: Powerful visualization capabilities
This has been such a pleasure this month, I am thrilled with both HELK and APTSimulator. The true principles of blue team and detection quality are innate in these projects. The fact that Roberto consider HELK still in alpha state leads me to believe there is so much more to come. Be sure to dig deeply into APTSimulator's Advance Solutions as well, there's more than one way to emulate an adversary.
Next month Part 2 will explore the Network side of the equation via the Network Dashboard and related visualizations, as well as HELK integration with Spark, Graphframes & Jupyter notebooks.
Aw snap, more goodness to come, I can't wait.
Cheers...until next time.

Monday, January 01, 2018

toolsmith #130 - OSINT with Buscador

First off, Happy New Year! I hope you have a productive and successful 2018. I thought I'd kick off the new year with another exploration of OSINT. In addition to my work as an information security leader and practitioner at Microsoft, I am privileged to serve in Washington's military as a J-2 which means I'm part of the intelligence directorate of a joint staff. Intelligence duties in a guard unit context are commonly focused on situational awareness for mission readiness. Additionally, in my unit we combine part of J-6 (command, control, communications, and computer systems directorate of a joint staff) with J-2, making Cyber Network Operations a J-2/6 function. Open source intelligence (OSINT) gathering is quite useful in developing indicators specific to adversaries as well as identifying targets of opportunity for red team and vulnerability assessments. We've discussed numerous OSINT offerings as part of toolsmiths past, there's no better time than our 130th edition to discuss an OSINT platform inclusive of previous topics such as Recon-ng, Spiderfoot, Maltego, and Datasploit. Buscador is just such a platform and comes from genuine OSINT experts Michael Bazzell and David Wescott. Buscador is "a Linux Virtual Machine that is pre-configured for online investigators." Michael is the author of Open Source Intelligence Techniques (5th edition) and Hiding from the Internet (3rd edition). I had a quick conversation with him and learned that they will have a new release in January (1.2), which will address many issues and add new features. Additionally, it will also revamp Firefox since the release of version 57. You can download Buscador as an OVA bundle for a variety of virtualization options, or as a ISO for USB boot devices or host operating systems. I had Buscador 1.1 up and running on Hyper-V in a matter of minutes after pulling the VMDK out of the OVA and converting it with QEMU. Buscador 1.1 includes numerous tools, in addition to the above mentioned standard bearers, you can expect the following and others:
  • Creepy
  • Metagoofil
  • MediaInfo
  • ExifTool
  • EmailHarvester
  • theHarvester
  • Wayback Exporter
  • HTTrack Cloner
  • Web Snapper
  • Knock Pages
  • SubBrute
  • Twitter Exporter
  • Tinfoleak 
  • InstaLooter 
  • BleachBit 
Tools are conveniently offered via the menu bar on the UI's left, or can easily be via Show Applications.
To put Buscador through its paces, using myself as a target of opportunity, I tested a few of the tools I'd not prior utilized. Starting with Creepy, the geolocation OSINT tool, I configured the Twitter plugin, one of the four available (Flickr, Google+, Instagram, Twitter) in Creepy, and searched holisticinfosec, as seen in Figure 1.
Figure 1:  Creepy configuration

The results, as seen in Figure 2, include some good details, but no immediate location data.

Figure 2: Creepy results
Had I configured the other plugins or was even a user of Flickr or Google+, better results would have been likely. I have location turned off for my Tweets, but my profile does profile does include Seattle. Creepy is quite good for assessing targets who utilize social media heavily, but if you wish to dig more deeply into Twitter usage, check out Tinfoleak, which also uses geo information available in Tweets and uploaded images. The report for holisticinfosec is seen in Figure 3.

Figure 3: Tinfoleak
If you're looking for domain enumeration options, you can start with Knock. It's as easy as handing it a domain, I did so with as seen in Figure 4, results are in Figure 5.
Figure 4: Knock run
Figure 5: Knock results
Other classics include HTTrack for web site cloning, and ExifTool for pulling all available metadata from images. HTTrack worked instantly as expected for I used Instalooter, "a program that can download any picture or video associated from an Instagram profile, without any API access", to grab sample images, then ran pyExifToolGui against them. As a simple experiment, I ran Instalooter against the infosec.memes Instagram account, followed by pyExifToolGui against all the downloaded images, then exported Exif metadata to HTML. If I were analyzing images for associated hashtags the export capability might be useful for an artifacts list.
Finally, one of my absolute favorites is Metagoofil, "an information gathering tool designed for extracting metadata of public documents." I did a quick run against my domain, with the doc retrieval parameter set at 50, then reviewed full.txt results (Figure 6), included in the output directory (home/Metagoofil) along with authors.csv, companies.csv, and modified.csv.

Figure 6: Metagoofil results

Metagoofil is extremely useful for gathering target data, I consider it a red team recon requirement. It's a faster, currently maintained offering that has some shared capabilities with Foca. It should also serve as a reminder just how much information is available in public facing documents, consider stripping the metadata before publishing. 

It's fantastic having all these capabilities ready and functional on one distribution, it keeps the OSINT discipline close at hand for those who need regular performance. I'm really looking forward to the Buscador 1.2 release, and better still, I have it on good authority that there is another book on the horizon from Michael. This is a simple platform with which to explore OSINT, remember to be a good citizen though, there is an awful lot that can be learned via these passive means.
Cheers...until next time.

Sunday, November 19, 2017

toolsmith #129 - DFIR Redefined: Deeper Functionality for Investigators with R - Part 2

You can have data without information, but you cannot have information without data. ~Daniel Keys Moran

Here we resume our discussion of DFIR Redefined: Deeper Functionality for Investigators with R as begun in Part 1.
First, now that my presentation season has wrapped up, I've posted the related material on the Github for this content. I've specifically posted the most recent version as presented at SecureWorld Seattle, which included Eric Kapfhammer's contributions and a bit of his forward thinking for next steps in this approach.
When we left off last month I parted company with you in the middle of an explanation of analysis of emotional valence, or the "the intrinsic attractiveness (positive valence) or averseness (negative valence) of an event, object, or situation", using R and the Twitter API. It's probably worth your time to go back and refresh with the end of Part 1. Our last discussion point was specific to the popularity of negative tweets versus positive tweets with a cluster of emotionally neutral retweets, two positive retweets, and a load of negative retweets. This type of analysis can quickly give us better understanding of an attacker collective's sentiment, particularly where the collective is vocal via social media. Teeing off the popularity of negative versus positive sentiment, we can assess the actual words fueling such sentiment analysis. It doesn't take us much R code to achieve our goal using the apply family of functions. The likes of apply, lapply, and sapply allow you to manipulate slices of data from matrices, arrays, lists and data frames in a repetitive way without having to use loops. We use code here directly from Michael Levy, Social Scientist, and his Playing with Twitter Data post.

polWordTables = 
  sapply(pol, function(p) {
    words = c(positiveWords = paste(p[[1]]$pos.words[[1]], collapse = ' '), 
              negativeWords = paste(p[[1]]$neg.words[[1]], collapse = ' '))
    gsub('-', '', words)  # Get rid of nothing found's "-"
  }) %>%
  apply(1, paste, collapse = ' ') %>% 
  stripWhitespace() %>% 
  strsplit(' ') %>%

par(mfrow = c(1, 2))
  lapply(1:2, function(i) {
    dotchart(sort(polWordTables[[i]]), cex = .5)

The result is a tidy visual representation of exactly what we learned at the end of Part 1, results as noted in Figure 1.

Figure 1: Positive vs negative words
Content including words such as killed, dangerous, infected, and attacks are definitely more interesting to readers than words such as good and clean. Sentiment like this could definitely be used to assess potential attacker outcomes and behaviors just prior, or in the midst of an attack, particularly in DDoS scenarios. Couple sentiment analysis with the ability to visualize networks of retweets and mentions, and you could zoom in on potential leaders or organizers. The larger the network node, the more retweets, as seen in Figure 2.

Figure 2: Who is retweeting who?
Remember our initial premise, as described in Part 1, was that attacker groups often use associated hashtags and handles, and the minions that want to be "part of" often retweet and use the hashtag(s). Individual attackers either freely give themselves away, or often become easily identifiable or associated, via Twitter. Note that our dominant retweets are for @joe4security, @HackRead,  @defendmalware (not actual attackers, but bloggers talking about attacks, used here for example's sake). Figure 3 shows us who is mentioning who.

Figure 3: Who is mentioning who?
Note that @defendmalware mentions @HackRead. If these were actual attackers it would not be unreasonable to imagine a possible relationship between Twitter accounts that are actively retweeting and mentioning each other before or during an attack. Now let's assume @HackRead might be a possible suspect and you'd like to learn a bit more about possible additional suspects. In reality @HackRead HQ is in Milan, Italy. Perhaps Milan then might be a location for other attackers. I can feed  in Twittter handles from my retweet and mentions network above, query the Twitter API with very specific geocode, and lock it within five miles of the center of Milan.
The results are immediate per Figure 4.

Figure 4: GeoLocation code and results
Obviously, as these Twitter accounts aren't actual attackers, their retweets aren't actually pertinent to our presumed attack scenario, but they definitely retweeted @computerweekly (seen in retweets and mentions) from within five miles of the center of Milan. If @HackRead were the leader of an organization, and we believed that associates were assumed to be within geographical proximity, geolocation via the Twitter API could be quite useful. Again, these are all used as thematic examples, no actual attacks should be related to any of these accounts in any way.

Fast Frugal Trees (decision trees) for prioritizing criticality

With the abundance of data, and often subjective or biased analysis, there are occasions where a quick, authoritative decision can be quite beneficial. Fast-and-frugal trees (FFTs) to the rescue. FFTs are simple algorithms that facilitate efficient and accurate decisions based on limited information.
Nathaniel D. Phillips, PhD created FFTrees for R to allow anyone to easily create, visualize and evaluate FFTs. Malcolm Gladwell has said that "we are suspicious of rapid cognition. We live in a world that assumes that the quality of a decision is directly related to the time and effort that went into making it.” FFTs, and decision trees at large, counter that premise and aid in the timely, efficient processing of data with the intent of a quick but sound decision. As with so much of information security, there is often a direct correlation with medical, psychological, and social sciences, and the use of FFTs is no different. Often, predictive analysis is conducted with logistic regression, used to "describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables." Would you prefer logistic regression or FFTs?

Figure 5: Thanks, I'll take FFTs
Here's a text book information security scenario, often rife with subjectivity and bias. After a breach, and subsequent third party risk assessment that generated a ton of CVSS data, make a fast decision about what treatments to apply first. Because everyone loves CVSS.

Figure 6: CVSS meh
Nothing like a massive table, scored by base, impact, exploitability, temporal, environmental, modified impact, and overall scores, all assessed by a third party assessor who may not fully understand the complexities or nuances of your environment. Let's say our esteemed assessor has decided that there are 683 total findings, of which 444 are non-critical and 239 are critical. Will FFTrees agree? Nay! First, a wee bit of R code.

cvss <- c:="" coding="" csv="" p="" r="" read.csv="" rees="">cvss.fft <- data="cvss)</p" fftrees="" formula="critical">plot(cvss.fft, what = "cues")
     main = "CVSS FFT",
     decision.names = c("Non-Critical", "Critical"))

Guess what, the model landed right on impact and exploitability as the most important inputs, and not just because it's logically so, but because of their position when assessed for where they fall in the area under the curve (AUC), where the specific curve is the receiver operating characteristic (ROC). The ROC is a "graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied." As for the AUC, accuracy is measured by the area under the ROC curve where an area of 1 represents a perfect test and an area of .5 represents a worthless test. Simply, the closer to 1, the better. For this model and data, impact and exploitability are the most accurate as seen in Figure 7.

Figure 7: Cue rankings prefer impact and exploitability
The fast and frugal tree made its decision where impact and exploitability with scores equal or less than 2 were non-critical and exploitability greater than 2 was labeled critical, as seen in Figure 8.

Figure 8: The FFT decides
Ah hah! Our FFT sees things differently than our assessor. With a 93% average for performance fitting (this is good), our tree, making decisions on impact and exploitability, decides that there are 444 non-critical findings and 222 critical findings, a 17 point differential from our assessor. Can we all agree that mitigating and remediating critical findings can be an expensive proposition? If you, with just a modicum of data science, can make an authoritative decision that saves you time and money without adversely impacting your security posture, would you count it as a win? Yes, that was rhetorical.

Note that the FFTrees function automatically builds several versions of the same general tree that make different error trade-offs with variations in performance fitting and false positives. This gives you the option to test variables and make potentially even more informed decisions within the construct of one model. Ultimately, fast frugal trees make very fast decisions on 1 to 5 pieces of information and ignore all other information. In other words, "FFTrees are noncompensatory, once they make a decision based on a few pieces of information, no additional information changes the decision."

Finally, let's take a look at monitoring user logon anomalies in high volume environments with Time Series Regression (TSR). Much of this work comes courtesy of Eric Kapfhammer, our lead data scientist on our Microsoft Windows and Devices Group Blue Team. The ideal Windows Event ID for such activity is clearly 4624: an account was successfully logged on. This event is typically one of the top 5 events in terms of volume in most environments, and has multiple type codes including Network, Service, and RemoteInteractive.
User accounts will begin to show patterns over time, in aggregate, including:
  • Seasonality: day of week, patch cycles, 
  • Trend: volume of logons increasing/decreasing over time
  • Noise: randomness
You could look at 4624 with a Z-score model, which sets a threshold based on the number of standard deviations away from an average count over a given period of time, but this is a fairly simple model. The higher the value, the greater the degree of “anomalousness”.
Preferably, via Time Series Regression (TSR), your feature set is more rich:
  • Statistical method for predicting a future response based on the response history (known as autoregressive dynamics) and the transfer of dynamics from relevant predictors
  • Understand and predict the behavior of dynamic systems from experimental or observational data
  • Commonly used for modeling and forecasting of economic, financial and biological systems
How to spot the anomaly in a sea of logon data?
Let's imagine our user, DARPA-549521, in the SUPERSECURE domain, with 90 days of aggregate 4624 Type 10 events by day.

Figure 9: User logon data
With 210 line of R, including comments, log read, file output, and graphing we can visualize and alert on DARPA-549521's data as seen in Figure 10

Figure 10: User behavior outside the confidence interval
We can detect when a user’s account exhibits  changes in their seasonality as it relates to a confidence interval established (learned) over time. In this case, on 27 AUG 2017, the user topped her threshold of 19 logons thus triggering an exception. Now imagine using this model to spot anomalous user behavior across all users and you get a good feel for the model's power.
Eric points out that there are, of course, additional options for modeling including:
  • Seasonal and Trend Decomposition using Loess (STL)
    • Handles any type of seasonality ~ can change over time
    • Smoothness of the trend-cycle can also be controlled by the user
    • Robust to outliers
  • Classification and Regression Trees (CART)
    • Supervised learning approach: teach trees to classify anomaly / non-anomaly
    • Unsupervised learning approach: focus on top-day hold-out and error check
  • Neural Networks
    • LSTM / Multiple time series in combination
These are powerful next steps in your capabilities, I want you to be brave, be creative, go forth and add elements of data science and visualization to your practice. R and Python are well supported and broadly used for this mission and can definitely help you detect attackers faster, contain incidents more rapidly, and enhance your in-house detection and remediation mechanisms.
All the code as I can share is here; sorry, I can only share the TSR example without the source.
All the best in your endeavors!
Cheers...until next time.

Tuesday, October 17, 2017

McRee added to ISSA's Honor Roll for Lifetime Achievement

HolisticInfoSec's Russ McRee was pleased to be added to ISSA International's Honor Roll this month, a lifetime achievement award recognizing an individual's sustained contributions to the information security community, the advancement of the association and enhancement of the professionalism of the membership.
According to the press release:
"Russ McRee has a strong history in the information security as a teacher, practitioner and writer. He is responsible for 107 technical papers published in the ISSA Journal under his Toolsmith byline in 2006-2015. These articles represent a body of knowledge for the hands-on practitioner that is second to none. These titles span an extremely wide range of deep network security topics. Russ has been an invited speaker at the key international computer security venues including DEFCON, Derby Con, BlueHat, Black Hat, SANSFIRE, RSA, and ISSA International."
Russ greatly appreciates this honor and would like to extend congratulations to the ten other ISSA 2017 award winners. Sincere gratitude to Briana and Erin McRee, Irvalene Moni, Eric Griswold, Steve Lynch, and Thom Barrie for their extensive support over these many years.

toolsmith #128 - DFIR Redefined: Deeper Functionality for Investigators with R - Part 1

“To competently perform rectifying security service, two critical incident response elements are necessary: information and organization.” ~ Robert E. Davis

I've been presenting DFIR Redefined: Deeper Functionality for Investigators with R across the country at various conference venues and thought it would helpful to provide details for readers.
The basic premise?
Incident responders and investigators need all the help they can get.
Let me lay just a few statistics on you, from's The Challenges of Incident Response, Nov 2016. Per their respondents in a survey of security professionals:
  • 38% reported an increase in the number of hours devoted to incident response
  • 42% reported an increase in the volume of incident response data collected
  • 39% indicated an increase in the volume of security alerts
In short, according to Nathan Burke, “It’s just not mathematically possible for companies to hire a large enough staff to investigate tens of thousands of alerts per month, nor would it make sense.”
The 2017 SANS Incident Response Survey, compiled by Matt Bromiley in June, reminds us that “2016 brought unprecedented events that impacted the cyber security industry, including a myriad of events that raised issues with multiple nation-state attackers, a tumultuous election and numerous government investigations.” Further, "seemingly continuous leaks and data dumps brought new concerns about malware, privacy and government overreach to the surface.”
Finally, the survey shows that IR teams are:
  • Detecting the attackers faster than before, with a drastic improvement in dwell time
  • Containing incidents more rapidly
  • Relying more on in-house detection and remediation mechanisms
To that end, what concepts and methods further enable handlers and investigators as they continue to strive for faster detection and containment? Data science and visualization sure can’t hurt. How can we be more creative to achieve “deeper functionality”? I propose a two-part series on Deeper Functionality for Investigators with R with the following DFIR Redefined scenarios:
  • Have you been pwned?
  • Visualization for malicious Windows Event Id sequences
  • How do your potential attackers feel, or can you identify an attacker via sentiment analysis?
  • Fast Frugal Trees (decision trees) for prioritizing criticality
R is “100% focused and built for statistical data analysis and visualization” and “makes it remarkably simple to run extensive statistical analysis on your data and then generate informative and appealing visualizations with just a few lines of code.”

With R you can interface with data via file ingestion, database connection, APIs and benefit from a wide range of packages and strong community investment.
From the Win-Vector Blog, per John Mount “not all R users consider themselves to be expert programmers (many are happy calling themselves analysts). R is often used in collaborative projects where there are varying levels of programming expertise.”
I propose that this represents the vast majority of us, we're not expert programmers, data scientists, or statisticians. More likely, we're security analysts re-using code for our own purposes, be it red team or blue team. With a very few lines of R investigators might be more quickly able to reach conclusions.
All the code described in the post can be found on my GitHub.

Have you been pwned?

This scenario I covered in an earlier post, I'll refer you to Toolsmith Release Advisory: Steph Locke's HIBPwned R package.

Visualization for malicious Windows Event Id sequences

Windows Events by Event ID present excellent sequenced visualization opportunities. A hypothetical scenario for this visualization might include multiple failed logon attempts (4625) followed by a successful logon (4624), then various malicious sequences. A fantastic reference paper built on these principle is Intrusion Detection Using Indicators of Compromise Based on Best Practices and Windows Event Logs. An additional opportunity for such sequence visualization includes Windows processes by parent/children. One R library particularly well suited to is TraMineR: Trajectory Miner for R. This package is for mining, describing and visualizing sequences of states or events, and more generally discrete sequence data. It's primary aim is the analysis of biographical longitudinal data in the social sciences, such as data describing careers or family trajectories, and a BUNCH of other categorical sequence data. Somehow though, the project page somehow fails to mention malicious Windows Event ID sequences. :-) Consider Figures 1 and 2 as retrieved from above mentioned paper. Figure 1 are text sequence descriptions, followed by their related Windows Event IDs in Figure 2.

Figure 1
Figure 2
Taking related log data, parsing and counting it for visualization with R would look something like Figure 3.

Figure 3
How much R code does it take to visualize this data with a beautiful, interactive sunburst visualization? Three lines, not counting white space and comments, as seen in the video below.

A screen capture of the resulting sunburst also follows as Figure 4.

Figure 4

How do your potential attackers feel, or can you identify an attacker via sentiment analysis?

Do certain adversaries or adversarial communities use social media? Yes
As such, can social media serve as an early warning system, if not an actual sensor? Yes
Are certain adversaries, at times, so unaware of OpSec on social media that you can actually locate them or correlate against other geo data? Yes
Some excellent R code to assess Twitter data with includes Jeff Gentry's twitteR and rtweet to interface with the Twitter API.
  • twitteR: provides access to the Twitter API. Most functionality of the API is supported, with a bias towards API calls that are more useful in data analysis as opposed to daily interaction.
  • Rtweet: R client for interacting with Twitter’s REST and stream API’s.
The code and concepts here are drawn directly from Michael Levy, PhD UC Davis: Playing With Twitter.
Here's the scenario: DDoS attacks from hacktivist or chaos groups.
Attacker groups often use associated hashtags and handles and the minions that want to be "part of" often retweet and use the hashtag(s). Individual attackers either freely give themselves away, or often become easily identifiable or associated, via Twitter. As such, here's a walk-through of analysis techniques that may help identify or better understand the motives of certain adversaries and adversary groups. I don't use actual adversary handles here, for obvious reasons. I instead used a DDoS news cycle and journalist/bloggers handles as exemplars. For this example I followed the trail of the WireX botnet, comprised mainly of Android mobile devices utilized to launch a high-impact DDoS extortion campaign against multiple organizations in the travel and hospitality sector in August 2017. I started with three related hashtags: 
  1. #DDOS 
  2. #Android 
  3. #WireX
We start with all related Tweets by day and time of day. The code is succinct and efficient, as noted in Figure 5.

Figure 5
The result is a pair of graphs color coded by tweets and retweets per Figure 6.

Figure 6

This gives you an immediate feels for spikes in interest by day as well as time of day, particularly with attention to retweets.
Want to see what platforms potential adversaries might be tweeting from? No problem, code in Figure 7.
Figure 7

The result in the scenario ironically indicates that the majority of related tweets using our hashtags of interest are coming from Androids per Figure 8. :-)

Figure 8
Now to the analysis of emotional valence, or the "the intrinsic attractiveness (positive valence) or averseness (negative valence) of an event, object, or situation."
orig$text[which.max(orig$emotionalValence)] tells us that the most positive tweet is "A bunch of Internet tech companies had to work together to clean up #WireX #Android #DDoS #botnet."
orig$text[which.min(orig$emotionalValence)] tells us that "Dangerous #WireX #Android #DDoS #Botnet Killed by #SecurityGiants" is the most negative tweet.
Interesting right? Almost exactly the same message, but very different valence.
How do we measure emotional valence changes over the day? Four lines later...
filter(orig, mday(created) == 29) %>%
  ggplot(aes(created, emotionalValence)) +
  geom_point() + 
  geom_smooth(span = .5)
...and we have Figure 9, which tell us that most tweets about WireX were emotionally neutral on 29 AUG 2017, around 0800 we saw one positive tweet, a more negative tweets overall in the morning.

Figure 9
Another line of questioning to consider: which tweets are more often retweeted, positive or negative? As you can imagine with information security focused topics, negativity wins the day.
Three lines of R...
ggplot(orig, aes(x = emotionalValence, y = retweetCount)) +
  geom_point(position = 'jitter') +
...and we learn just how popular negative tweets are in Figure 10.

Figure 10
There are cluster of emotionally neutral retweets, two positive retweets, and a load of negative retweets. This type of analysis can quickly lead to a good feel for the overall sentiment of an attacker collective, particularly one with less opsec and more desire for attention via social media.
In Part 2 of DFIR Redefined: Deeper Functionality for Investigators with R we'll explore this scenario further via sentiment analysis and Twitter data, as well as Fast Frugal Trees (decision trees) for prioritizing criticality.
Let me know if you have any questions on the first part of this series via @holisticinfosec or russ at holisticinfosec dot org.
Cheers...until next time. 

Sunday, September 10, 2017

Toolsmith Tidbit: Windows Auditing with WINspect

WINSpect recently hit the toolsmith radar screen via Twitter, and the author, Amine Mehdaoui, just posted an update a couple of days ago, so no time like the present to give you a walk-through. WINSpect is a Powershell-based Windows Security Auditing Toolbox. According to Amine's GitHub README, WINSpect "is part of a larger project for auditing different areas of Windows environments. It focuses on enumerating different parts of a Windows machine aiming to identify security weaknesses and point to components that need further hardening. The main targets for the current version are domain-joined windows machines. However, some of the functions still apply for standalone workstations."
The current script feature set includes audit checks and enumeration for:

  • Installed security products
  • World-exposed local filesystem shares
  • Domain users and groups with local group membership
  • Registry autoruns
  • Local services that are configurable by Authenticated Users group members
  • Local services for which corresponding binary is writable by Authenticated Users group members
  • Non-system32 Windows Hosted Services and their associated DLLs
  • Local services with unquoted path vulnerability
  • Non-system scheduled tasks
  • DLL hijackability
  • User Account Control settings
  • Unattended installs leftovers
I can see this useful PowerShell script coming in quite handy for assessment using the CIS Top 20 Security Controls. I ran it on my domain-joined Windows 10 Surface Book via a privileged PowerShell and liked the results.

The script confirms that it's running with admin rights, checks PowerShell version, then inspects Windows Firewall settings. Looking good on the firewall, and WINSpect tees right off on my Window Defender instance and its configuration as well.
Not sharing a screenshot of my shares or admin users, sorry, but you'll find them enumerated when you run WINSpect.

 WINSpect then confirmed that UAC was enabled, and that it should notify me only apps try to make changes, then checked my registry for autoruns; no worries on either front, all confirmed as expected.

WINSpect wrapped up with a quick check of configurable services, SMSvcHost is normal as part of .NET, even if I don't like it, but the flowExportService doesn't need to be there at all, I removed that a while ago after being really annoyed with it during testing. No user hosted services, and DLL Safe Search is enable...bonus. Finally, no unattended install leftovers, and all the scheduled tasks are normal for my system. Sweet, pretty good overall, thanks WINSpect. :-)

Give it a try for yourself, and keep an eye out for updates. Amine indicates that Local Security Policy controls, administrative shares configs, loaded DLLs, established/listening connections, and exposed GPO scripts on the to-do list. 
Cheers...until next time.

Monday, August 28, 2017

Toolsmith Release Advisory: Magic Unicorn v2.8

David Kennedy and the TrustedSec crew have released Magic Unicorn v2.8.
Magic Unicorn is "a simple tool for using a PowerShell downgrade attack and inject shellcode straight into memory, based on Matthew Graeber's PowerShell attacks and the PowerShell bypass technique presented by Dave and Josh Kelly at Defcon 18.

Version 2.8:
  • shortens length and obfuscation of unicorn command
  • removes direct -ec from PowerShell command
"Usage is simple, just run Magic Unicorn (ensure Metasploit is installed and in the right path) and Magic Unicorn will automatically generate a PowerShell command that you need to simply cut and paste the PowerShell code into a command line window or through a payload delivery system."

Wednesday, August 16, 2017

Toolsmith #127: OSINT with Datasploit

I was reading an interesting Motherboard article, Legal Hacking Tools Can Be Useful for Journalists, Too, that includes reference to one of my all time OSINT favorites, Maltego. Joseph Cox's article also mentions Datasploit, a 2016 favorite for fellow tools aficionado,, see 2016 Top Security Tools as Voted by Readers. Having not yet explored Datasploit myself, this proved to be a grand case of "no time like the present."
Datasploit is "an #OSINT Framework to perform various recon techniques, aggregate all the raw data, and give data in multiple formats." More specifically, as stated on Datasploit documentation page under Why Datasploit, it utilizes various Open Source Intelligence (OSINT) tools and techniques found to be effective, and brings them together to correlate the raw data captured, providing the user relevant information about domains, email address, phone numbers, person data, etc. Datasploit is useful to collect relevant information about target in order to expand your attack and defense surface very quickly.
The feature list includes:
  • Automated OSINT on domain / email / username / phone for relevant information from different sources
  • Useful for penetration testers, cyber investigators, defensive security professionals, etc.
  • Correlates and collaborate results, shows them in a consolidated manner
  • Tries to find out credentials,  API keys, tokens, sub-domains, domain history, legacy portals, and more as related to the target
  • Available as single consolidating tool as well as standalone scripts
  • Performs Active Scans on collected data
  • Generates HTML, JSON reports along with text files
YouTube: Quick guide to installation and use

Second, a few pointers to keep you from losing your mind. This project is very much work in progress, lots of very frustrated users filing bugs and wondering where the support is. The team is doing their best, be patient with them, but read through the Github issues to be sure any bugs you run into haven't already been addressed.
1) Datasploit does not error gracefully, it just crashes. This can be the result of unmet dependencies or even a missing API key. Do not despair, take note, I'll talk you through it.
2) I suggest, for ease, and best match to documentation, run Datasploit from an Ubuntu variant. Your best bet is to grab Kali, VM or dedicated and load it up there, as I did.
3) My installation guidance and recommendations should hopefully get you running trouble free, follow it explicitly.
4) Acquire as many API keys as possible, see further detail below.

Installation and preparation
From Kali bash prompt, in this order:

  1. git clone /etc/datasploit
  2. apt-get install libxml2-dev libxslt-dev python-dev lib32z1-dev zlib1g-dev
  3. cd /etc/datasploit
  4. pip install -r requirements.txt
  5. mv
  6. With your preferred editor, open and add API keys for the following at a minimum, they are, for all intents and purposes required, detailed instructions to acquire each are here:
    1. Shodan API
    2. Censysio ID and Secret
    3. Clearbit API
    4. Emailhunter API
    5. Fullcontact API
    6. Google Custom Search Engine API key and CX ID
    7. Zoomeye Username and Password
If, and only if, you've done all of this correctly, you might end up with a running instance of Datasploit. :-) Seriously, this is some of the glitchiest software I've tussled with in quite a while, but the results paid handsomely. Run python, where is your target. Obviously, I ran python to acquire results pertinent to your author. 
Datasploit rapidly pulled results as follows:
211 domain references from Github:
Github results
Luckily, no results from Shodan. :-)
Four results from Paste(s): 
Pastebin and Pastie results
Datasploit pulled russ at holisticinfosec dot org as expected, per email harvesting.
Accurate HolisticInfoSec host location data from Zoomeye:

Details regarding HolisticInfoSec sub-domains and page links:
Sub-domains and page links
Finally, a good return on DNS records for and, thankfully, no vulns found via PunkSpider

DataSploit can also be integrated into other code and called as individual scripts for unique functions. I did a quick run with python and the results were impressive:
I love that the first query is of Troy Hunt's Have I Been Pwned. Not sure if you have been? Better check it out. Reminder here, you'll really want to be sure to have as many API keys as possible or you may find these buggy scripts crashing. You'll definitely find yourself compromising between frustration and the rapid, detailed results. I put this offering squarely in the "shows much promise category" if the devs keep focus on it, assess for quality, and handle errors better.
Give Datasploit a try for sure.
Cheers, until next time...

toolsmith #131 - The HELK vs APTSimulator - Part 1

Ladies and gentlemen, for our main attraction, I give you...The HELK vs APTSimulator, in a Death Battle! The late, great Randy "Macho...