As discussed here before, those of us in the DFIR practice, and Blue Teaming at large, are overwhelmed by data and scale. Success truly requires algorithmic methods. If you're not already invested here I have an immediately applicable case study for you in tidy anomaly detection with anomalize.
First, let me give credit where entirely due for the work that follows. Everything I discuss and provide is immediately derivative from Business Science (@bizScienc), specifically Matt Dancho (@mdancho84). He created anomalize, "a tidy anomaly detection algorithm that’s time-based (built on top of tibbletime) and scalable from one to many time series," when a client asked Business Science to build an open source anomaly detection algorithm that suited their needs. I'd say he responded beautifully, when his blogpost hit my radar via R-Bloggers it lived as an open tab in my browser for more than a month until generating this toolsmith. Please consider Matt's post a mandatory read as step one of the process here. I'll quote Matt specifically before shifting context: "Our client had a challenging problem: detecting anomalies in time series on daily or weekly data at scale. Anomalies indicate exceptional events, which could be increased web traffic in the marketing domain or a malfunctioning server in the IT domain. Regardless, it’s important to flag these unusual occurrences to ensure the business is running smoothly. One of the challenges was that the client deals with not one time series but thousands that need to be analyzed for these extreme events."
Key takeaway: Detecting anomalies in time series on daily or weekly data at scale. Anomalies indicate exceptional events.
Now shift context with me to security-specific events and incidents, as the pertain to security monitoring, incident response, and threat hunting. In my November 2017 post, recall that I discussed Time Series Regression with the Holt-Winters method and a focus on seasonality and trends. Unfortunately, I couldn't share the code for how we applied TSR, but pointed out alternate methods, including Seasonal and Trend Decomposition using Loess (STL):
- Handles any type of seasonality ~ can change over time
- Smoothness of the trend-cycle can also be controlled by the user
- Robust to outliers
- anomalize(): Applies anomaly detection methods to the remainder component.
- time_recompose(): Calculates limits that separate the “normal” data from the anomalies
|Figure 1: Initial experiment|
I originally tried to write log data from log.csv straight to an anomalize.R script with logs = read_csv("log.csv") into a tibble (ready your troubles with tibbles jokes), which was not being parsed accurately, particularly time attributes. To correct this, from Matt's Github I grabbed tidyverse_cran_downloads.R, and modified it as follows:
This helped greatly thanks to the tibbletime package, which is "is an extension that allows for the creation of time aware tibbles. Some immediate advantages of this include: the ability to perform time based subsetting on tibbles, quickly summarising and aggregating results by time periods. Guess what, Matt wrote tibbletime too. :-)
I then followed Matt's sequence as he posted on Business Science, but with my logs defined as a function in Security_Access_Logs_Function.R. Following, I'll give you the code snippets, as revised from Matt's examples, followed by their respective results specific to processing my Event ID 4624 daily count log.
First, let's summarize daily login counts across three servers over four months.
The result is evident in Figure 2.
|Figure 2: Server logon counts visualized|
The result is revealed in Figure 3.
|Figure 3: Security event log anomalies|
In Figure 4, you'll note that there are anomalous logon counts on SERVER-549521 in June.
|Figure 4: SERVER-549521 logon anomalies with Twitter & GESD methods|
Again, we note anomalies in June, as seen in Figure 5.
|Figure 5: SERVER-549521 logon anomalies with STL & IQR methods|
The result is a four part visualization, including observed, season, trend, and remainder as seen in Figure 6.
|Figure 6: Decomposition for SERVER-549521 Logins|