Wednesday, September 02, 2015

toolsmith #108: Visualizing Network Data with Network Data

Prerequisites

R development environment (R, RStudio)

This month finds us in a new phase for toolsmith as it will not be associated with ISSA or the ISSA Journal any further. Suffice it to say that the ISSA board and management organization decided they no longer wanted to pay the small monthly stipend I’d been receiving since the inception of the toolsmith column. While I am by no means a profiteer, I am also not a charity, so we simply parted ways. All the better I say, as I have been less than satisfied with ISSA as an organization: Ira Winkler and Mary Ann Davidson should serve to define that dissatisfaction.
I will say this, however. All dissatisfaction aside, it has been my distinct pleasure to write for the ISSA Journal editor, Thom Barrie, who has been a loyal, dedicated, committed, and capable editor and someone I consider a friend. I will miss our monthly banter, I will miss him, and I thank him most sincerely for these nine years as editor. The ISSA Journal is better for his care and attention. Thank you, Thom.
Enough said, what’s next? I’ll continue posting toolsmith here while I consider options for a new home or partnership. I may just stick exclusively to my blog and see if there is a sponsor or two who might be interested in helping me carry the toolsmith message.
I thought I'd use our new circumstances to test a few different ideas with you over the next few months. Your feedback is welcome as always, including ideas regarding what you might like to see us try. As always, toolsmith will continue to offer insights on tools useful to the information security practitioner, typically open source and free.

To that end, I thought I'd offer you a bit of R code I recently cranked out for a MOOC I was taking. The following visualizations with R are the result of fulfilling a recent assignment for Coursera’s online Data Visualization class. The assignment was meant to give the opportunity to do non-coordinate data visualization with network data as it lends itself easily to graph visualization. I chose, with a bit of cheekiness in mind, to visualize network data…wait for it…with security-related network data.

Data Overview

I gathered data for the assignment from a network traffic packet capture specific to malware called Win32/Sirefef or ZeroAccess that uses stealth to hide its presence on victim systems. This Trojan family runs the gamut of expected behaviors, including downloading and running additional binaries, contacting C2, and disabling system security features. The Microsoft Malware Protection Center reference is here.
The packet capture I used was gathered during a ZeroAccess run-time analysis in my lab using a virtualized Windows victim and Wireshark, which allowed me to capture data to be saved as a CSV. The resulting CSV provides an excellent sample set inclusive of nodes and edges useful for network visualization. Keep in mind that this is a small example with a reduced node count to avoid clutter and serve as an exemplar. A few notes about the capture:
  • Where the protocol utilized was HTTP, the resulting packet length was approximately 220 bytes.
  • Where the protocol was TCP other than HTTP, the resulting packet length was approximately 60 bytes.
  • For tidy visualization these approximations are utilized rather than actual packet length.
  • Only some hosts utilized HTTP, so HTTP-specific edges are visualized only where appropriate.
A summary of the data is available for your review after the Graphviz plots at the end of this document.
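
To make the approximation concrete, here is a minimal sketch of how per-protocol packet lengths might be reduced to the tidy values used in the plots. The six sample rows are copied from the head(zeroaccess) output at the end of this post. Rounding the median to the nearest ten is an assumption made for illustration, not necessarily how the 60 and 220 figures were arrived at.

```r
# Sample rows copied from the head(zeroaccess) output shown later in this post.
zeroaccess_sample <- data.frame(
  Source = c("192.168.248.21", "192.168.248.21", "192.168.248.21",
             "176.53.17.23", "192.168.248.21", "192.168.248.21"),
  Destination = c("176.53.17.23", "176.53.17.23", "176.53.17.23",
                  "192.168.248.21", "176.53.17.23", "176.53.17.23"),
  Protocol = c("TCP", "TCP", "TCP", "TCP", "TCP", "HTTP"),
  Length = c(62, 62, 62, 62, 54, 221),
  stringsAsFactors = FALSE
)

# Median packet length per protocol.
median_len <- tapply(zeroaccess_sample$Length, zeroaccess_sample$Protocol, median)

# Assumption: round to the nearest ten to get a tidy label for the plot.
approx_len <- round(median_len, -1)
print(approx_len)
```

On the full capture you would swap the sample data frame for the result of read.csv("zeroaccess.csv") and likely want to eyeball the distribution first, since outliers such as the 1506-byte maximum can skew a mean badly.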

DiagrammeR and Graphviz

The DiagrammeR package for R includes Graphviz, which in turn provides four rendering engines: dot, neato, twopi, and circo. I’ve mentioned Graphviz as part of my discussion of ProcDot and AfterGlow as it is inherent to both projects. The following plots represent a subset of the ZeroAccess malware network traffic data.
- The green node represents the victim system.
- Red nodes represent the attacker systems.
- Orange nodes represent the protocol utilized.
- Cyan nodes represent the approximate length of the packet.
- Black edges represent the network traffic to and from the victim and attackers.
- Orange edges represent hosts conversing over TCP protocol other than HTTP.
- Cyan edges represent the relationship of protocol to packet length.
- Purple edges represent hosts communicating via the HTTP protocol.
Graphs are plotted in order of my preference for effective visualization; code for each follows.

Keep reading after these first four visualizations: I pulled together a way to read in the related CSV and render a network graph automagically.

--------------------------------------------------------------------------------------------------------------------------
Visualization 1: Graphviz ZeroAccess network circo plot



Visualization 1 code

library(DiagrammeR)
grViz("
digraph {
      
      graph [overlap = false]      
      
      node [shape = circle,
      style = filled,
      color = black,
      label = '']
      
      node [fillcolor = green]
      a [label = '192.168.248.21']
      
      node [fillcolor = red]
      b [label = '176.53.17.23']
      c [label = '46.191.175.120']
      d [label = '200.112.252.155']
      e [label = '177.77.205.145']
      f [label = '124.39.226.162']
      
      node [fillcolor = orange]
      g [label = 'TCP']
      h [label = 'HTTP']
      
      node [fillcolor = cyan]
      i [label = '60']
      j [label = '220']
      
      edge [color = black]
      a -> {b c d e f}
      b -> a
      c -> a
      d -> a
      e -> a
      f -> a
      
      edge [color = orange]
      g -> {a b c d e f}
      
      edge [color = purple]
      h -> {a b}
      
      edge [color = cyan]
      g -> i
      h -> j
      }",
engine = "circo")

--------------------------------------------------------------------------------------------------------------------------

Visualization 2: Graphviz ZeroAccess network dot plot


Visualization 2 code


library(DiagrammeR)
grViz("
digraph {
      
      graph [overlap = false]      
      
      node [shape = circle,
      style = filled,
      color = black,
      label = '']
      
      node [fillcolor = green]
      a [label = '192.168.248.21']
      
      node [fillcolor = red]
      b [label = '176.53.17.23']
      c [label = '46.191.175.120']
      d [label = '200.112.252.155']
      e [label = '177.77.205.145']
      f [label = '124.39.226.162']
      
      node [fillcolor = orange]
      g [label = 'TCP']
      h [label = 'HTTP']
      
      node [fillcolor = cyan]
      i [label = '60']
      j [label = '220']
      
      edge [color = black]
      a -> {b c d e f}
      b -> a
      c -> a
      d -> a
      e -> a
      f -> a
      
      edge [color = orange]
      g -> {a b c d e f}
      
      edge [color = purple]
      h -> {a b}
      
      edge [color = cyan]
      g -> i
      h -> j
      }",
engine = "dot")

--------------------------------------------------------------------------------------------------------------------------
Visualization 3: Graphviz ZeroAccess network twopi plot


Visualization 3 code

library(DiagrammeR)
grViz("
digraph {
      
      graph [overlap = false]      
      
      node [shape = circle,
      style = filled,
      color = black,
      label = '']
      
      node [fillcolor = green]
      a [label = '192.168.248.21']
      
      node [fillcolor = red]
      b [label = '176.53.17.23']
      c [label = '46.191.175.120']
      d [label = '200.112.252.155']
      e [label = '177.77.205.145']
      f [label = '124.39.226.162']
      
      node [fillcolor = orange]
      g [label = 'TCP']
      h [label = 'HTTP']
      
      node [fillcolor = cyan]
      i [label = '60']
      j [label = '220']
      
      edge [color = black]
      a -> {b c d e f}
      b -> a
      c -> a
      d -> a
      e -> a
      f -> a
      
      edge [color = orange]
      g -> {a b c d e f}
      
      edge [color = purple]
      h -> {a b}
      
      edge [color = cyan]
      g -> i
      h -> j
      }",
engine = "twopi")

--------------------------------------------------------------------------------------------------------------------------

Visualization 4: Graphviz ZeroAccess network neato plot


Visualization 4 code


library(DiagrammeR)
grViz("
digraph {
      
      graph [overlap = false]      
      
      node [shape = circle,
      style = filled,
      color = black,
      label = '']
      
      node [fillcolor = green]
      a [label = '192.168.248.21']
      
      node [fillcolor = red]
      b [label = '176.53.17.23']
      c [label = '46.191.175.120']
      d [label = '200.112.252.155']
      e [label = '177.77.205.145']
      f [label = '124.39.226.162']
      
      node [fillcolor = orange]
      g [label = 'TCP']
      h [label = 'HTTP']
      
      node [fillcolor = cyan]
      i [label = '60']
      j [label = '220']
      
      edge [color = black]
      a -> {b c d e f}
      b -> a
      c -> a
      d -> a
      e -> a
      f -> a
      
      edge [color = orange]
      g -> {a b c d e f}
      
      edge [color = purple]
      h -> {a b}
      
      edge [color = cyan]
      g -> i
      h -> j
      }",
engine = "neato")
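
The four listings above differ only in the engine argument, so the DOT source can be stored once and rendered with each engine in a loop. The following is a minimal sketch of that idea using a trimmed subset of the graph. The variable names (zeroaccess_dot, engines, plots) are hypothetical, and the requireNamespace guard is only there so the script still runs where DiagrammeR is not installed.

```r
# The DOT source is identical for every plot; only the engine changes.
# This is a trimmed subset of the graph used above.
zeroaccess_dot <- "
digraph {
  graph [overlap = false]
  node [shape = circle, style = filled, color = black]

  node [fillcolor = green]
  a [label = '192.168.248.21']

  node [fillcolor = red]
  b [label = '176.53.17.23']
  c [label = '46.191.175.120']

  edge [color = black]
  a -> {b c}
  b -> a
  c -> a
}"

engines <- c("circo", "dot", "twopi", "neato")

# Render only if DiagrammeR is installed.
if (requireNamespace("DiagrammeR", quietly = TRUE)) {
  plots <- lapply(engines, function(eng) {
    DiagrammeR::grViz(zeroaccess_dot, engine = eng)
  })
  names(plots) <- engines
  # Type plots$circo (or another engine name) at the console to view a layout.
}
```

Keeping one DOT string means a change to a node or edge shows up in all four layouts at once, which makes side-by-side comparison of the engines less error-prone.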

Read in a CSV and render plot

Populating graphs arbitrarily as above is nice...for examples. In the real world, you'd likely just want to read in a CSV derived from a Wireshark capture.
As my code is crap at this time, I reduced zeroaccess.csv to just the source and destination columns; I'll incorporate additional data points later. To use this with your own data, reduce the CSV columns down to source and destination only.
Code first, with comments to explain, derived directly from Rich Iannone's DiagrammeR example for using data frames to define Graphviz graphs.
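
A minimal sketch of the general approach, not the original listing: read a two-column CSV, collapse repeated packets into unique edges, and build Graphviz DOT text to pass to grViz. The temporary CSV is a tiny stand-in for zeroaccess.csv built from rows shown in the data summary in this post, and edges_to_dot is a hypothetical helper. It assembles the DOT text by hand rather than using DiagrammeR's data-frame functions.

```r
# Assumption: a tiny stand-in for zeroaccess.csv, reduced to Source and
# Destination, using rows shown in the head(zeroaccess) output in this post.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Source,Destination",
  "192.168.248.21,176.53.17.23",
  "192.168.248.21,176.53.17.23",
  "176.53.17.23,192.168.248.21"
), csv_path)

edges <- read.csv(csv_path, stringsAsFactors = FALSE)

# Collapse repeated packets into one edge per source/destination pair.
edges <- unique(edges[, c("Source", "Destination")])

# Hypothetical helper: turn an edge data frame into Graphviz DOT text.
# IP addresses contain dots, so node IDs must be quoted.
edges_to_dot <- function(edge_df) {
  edge_lines <- sprintf("  \"%s\" -> \"%s\"", edge_df$Source, edge_df$Destination)
  paste(c("digraph {",
          "  graph [overlap = false]",
          "  node [shape = circle, style = filled, fillcolor = orange]",
          edge_lines,
          "}"),
        collapse = "\n")
}

dot <- edges_to_dot(edges)
cat(dot, "\n")

# Render only if DiagrammeR is installed.
if (requireNamespace("DiagrammeR", quietly = TRUE)) {
  graph <- DiagrammeR::grViz(dot, engine = "neato")
}
```

Pointing csv_path at your own reduced capture file should be the only change needed to try this on real data, though large captures may want a node limit to avoid clutter.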



Visualization 5 is your result. As you can see, 192.168.248.21 is the center of attention and obviously our ZeroAccess victim. Yay, visualization!

Visualization 5

Following is a quick data summary, but you can grab it from GitHub too.

Network Data

Summary: zeroaccess.csv

zeroaccess <- read.csv("zeroaccess.csv", sep = ",")
summary(zeroaccess)

##             Source            Destination  Protocol       Length       
##  192.168.248.21:340   192.168.248.21:152   HTTP: 36   Min.   :  54.00  
##  176.53.17.23  : 90   176.53.17.23  : 90   TCP :456   1st Qu.:  60.00  
##  140.112.251.82:  6   140.112.251.82:  6              Median :  62.00  
##  178.19.22.191 :  6   178.19.22.191 :  6              Mean   :  84.98  
##  89.238.36.146 :  6   89.238.36.146 :  6              3rd Qu.:  62.00  
##  14.96.213.41  :  3   1.160.72.47   :  3              Max.   :1506.00  
##  (Other)       : 41   (Other)       :229

head(zeroaccess)

##           Source    Destination Protocol Length
## 1 192.168.248.21   176.53.17.23      TCP     62
## 2 192.168.248.21   176.53.17.23      TCP     62
## 3 192.168.248.21   176.53.17.23      TCP     62
## 4   176.53.17.23 192.168.248.21      TCP     62
## 5 192.168.248.21   176.53.17.23      TCP     54
## 6 192.168.248.21   176.53.17.23     HTTP    221

In closing

Hopefully this leads you to want to explore visualization of security data a bit further; note the reference material in the Acknowledgements.
I've stuffed all this material on GitHub for you and will keep working on the CSV import version.
Ping me via email or Twitter if you have questions (russ at holisticinfosec dot org or @holisticinfosec). Cheers…until next month.

Acknowledgements

Rich Iannone for DiagrammeR and the using-data-frames-to-define-graphviz-graphs example
Jay and Bob for Data-Driven Security (the security data scientist's bible)
