Network Analysis


Click the button below to explore an app I wrote to visualize the musician network from major recordings of Eric Dolphy.


The analysis below illustrates the procedure for developing network graphs in R. It seeks to characterize the terrorist network responsible for the attacks on the United States on September 11th, 2001. An additional section models the network to determine whether predictions can be made from the modeled data.


DATA - Sourced from publicly released information reported in major newspapers such as the New York Times, Wall Street Journal, Washington Post, and the Los Angeles Times, the dataset was developed by a researcher in the field: Valdis Krebs. The data were downloaded from the Center for Computational Analysis of Social and Organizational Systems website.

METHOD - A script was written in R to generate general network statistics and visualizations. Visualizations were generated using several layout methods, including the Fruchterman-Reingold algorithm. Models were then developed using Exponential-Family Random Graph Models (ERGMs) to assess their predictive capabilities. The models were evaluated with the Akaike and Bayesian Information Criteria (AIC and BIC) and with other goodness-of-fit diagnostics. The code can be found by clicking the black button below.

CONCLUSIONS - The analysis illustrates several properties of the network. After some modifications to the default package settings, it produces visualizations that display these properties appropriately. Several individuals have notably high betweenness and closeness, and the visualizations make their importance to the network apparent. The actor with the highest closeness is Hamza Alghamdi. Hamza Alghamdi also has the highest betweenness, followed closely by Nawaf Alhazmi. As the plot shows, Nawaf Alhazmi and Hamza Alghamdi also have the highest degrees.

 
finalFR.png
 

Let's take a closer look

Visualization and Network Summary

The analysis begins with an examination of some features of the network. The network has a size of 19 nodes. Each of the 19 individuals carries attributes, such as name and flight, which are stored with the network object used for analysis. Let’s discuss a few preliminary findings.

Density is the number of observed ties (also called edges, arcs, or relations) in a network divided by the maximum number of possible ties. Density therefore ranges from 0 to 1, and the closer it is to 1, the more interconnected the network. We find a density of 0.15789. Only about 16% of possible ties are observed, so the network is sparse.

The diameter of a network is a useful measure of its compactness. A path is the series of steps required to go from node A to node B in a network, and the shortest path is the path with the fewest steps. The diameter of the network is the longest of these shortest paths across all pairs of nodes. It measures compactness, or network efficiency, because it reflects the ‘worst case scenario’ for sending information (or any other resource) across the network. We find a diameter of 9.

Clustering, or transitivity, is the number of closed triads (where all three ties are observed) divided by the total number of open and closed triads (where either two or all three ties are observed). Like density, transitivity ranges from 0 to 1. We calculate a transitivity of 0.3970588.
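The three summary statistics above can be reproduced in a few lines of code. The following is a minimal Python sketch, not the R script used for this analysis. It assumes an undirected graph stored as an edge list. The five-node toy graph (`TOY_EDGES`) and all function names are hypothetical and do not come from the hijacker data.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Build undirected adjacency sets from an edge list (isolated nodes are not represented)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj):
    """Observed ties divided by possible ties, n(n - 1) / 2 for an undirected graph."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return m / (n * (n - 1) / 2)


def bfs_distances(adj, source):
    """Shortest-path lengths (in steps) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter(adj):
    """Longest shortest path over all pairs of mutually reachable nodes."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


def transitivity(adj):
    """Closed triads divided by connected triads (two or three ties present)."""
    closed = 0     # each triangle is counted once at each of its 3 vertices
    connected = 0  # triads centred on v = pairs of v's neighbours
    for v, nbrs in adj.items():
        k = len(nbrs)
        connected += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / connected if connected else 0.0


# Hypothetical toy graph: a triangle A-B-C with a tail C-D-E.
TOY_EDGES = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
adj = build_adjacency(TOY_EDGES)
print(density(adj), diameter(adj), transitivity(adj))
```

For the toy graph this prints `0.5 3 0.5`. Five of the ten possible ties are present. The longest shortest path takes three steps (A or B to E). The graph has one triangle, which is counted three times against six connected triads.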

Let's start visualizing the network and see whether we can improve the visualization. Using the default plot settings in R, we get the following.

 
defualtNetPlot.png
 

This plot shows the default visualization from the Networks package. To improve on it, we produce the plot with the Fruchterman-Reingold algorithm (right) and with the circle method (left).

 
circleFR.png
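The circle layout in the left panel places nodes at equal angles around a circle. It ignores the tie structure entirely, which is why it hides the shape of the network. A minimal Python sketch of the idea follows. The function name `circle_layout` is hypothetical, and this is not the R package's implementation.

```python
import math


def circle_layout(nodes, radius=1.0):
    """Place nodes evenly around a circle, in the order given; ties are ignored."""
    n = len(nodes)
    return {
        node: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, node in enumerate(nodes)
    }


positions = circle_layout(["A", "B", "C", "D"])
for node, (x, y) in positions.items():
    print(node, round(x, 3), round(y, 3))
```

Positions depend only on node order. Two tied nodes can therefore land on opposite sides of the circle, with their tie drawn straight across the middle.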
 

The Fruchterman-Reingold layout displays the entire network clearly. Let's now compare it with another plotting method.

 
randomFR.png
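The Fruchterman-Reingold algorithm treats the graph as a physical system. Every pair of nodes repels, tied nodes attract, and each node's movement per iteration is capped by a "temperature" that cools over time. The sketch below is a simplified Python version of the published algorithm, written for illustration. It omits frame clamping and uses linear cooling. The parameters `iterations`, `width`, and `seed` are assumptions and are not the R package's arguments or defaults.

```python
import math
import random


def fruchterman_reingold(nodes, edges, iterations=100, width=1.0, seed=0):
    """Simplified Fruchterman-Reingold force-directed layout in 2-D."""
    rng = random.Random(seed)
    nodes = list(nodes)
    k = width / math.sqrt(len(nodes))  # ideal distance between nodes
    t0 = width / 10                    # initial maximum step ("temperature")
    pos = {v: [rng.uniform(0, width), rng.uniform(0, width)] for v in nodes}

    for it in range(iterations):
        temp = t0 * (1 - it / iterations)  # linear cooling
        disp = {v: [0.0, 0.0] for v in nodes}

        # Repulsion between every pair of nodes: f_r(d) = k^2 / d
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f

        # Attraction between tied nodes: f_a(d) = d^2 / k
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f

        # Move each node along its net force, by at most `temp`.
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy)
            if d > 0:
                step = min(d, temp)
                pos[v][0] += dx / d * step
                pos[v][1] += dy / d * step

    return {v: (x, y) for v, (x, y) in pos.items()}


# Hypothetical example: a single tie should settle near the ideal distance k.
layout = fruchterman_reingold(["A", "B"], [("A", "B")])
print(math.dist(layout["A"], layout["B"]))
```

Running the example prints a distance close to 1/√2 ≈ 0.707, the ideal distance k for two nodes in a unit frame. At that distance attraction (d²/k) and repulsion (k²/d) balance. In a larger graph the same balance pulls tied actors together and pushes unrelated ones apart, which is what makes groups visible.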
 

Of the methods compared, the Fruchterman-Reingold algorithm makes the network easiest to understand. We can now use attributes in the data, together with color, to communicate characteristics of the nodes or the network. In particular, information stored in a categorical node attribute can often be communicated through node color. Let’s apply color and labels to the Fruchterman-Reingold plot.

 
fullFR.png
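Coloring nodes by a categorical attribute comes down to a lookup from category to color. A minimal Python sketch follows. The node names and flight labels are hypothetical placeholders rather than values from the dataset, and the palette is arbitrary. The function also returns the category-to-color mapping so that the same colors can be reused in a plot legend.

```python
def categorical_colors(attribute, palette):
    """Assign one palette color per category, in order of first appearance.

    attribute: dict mapping node -> category (e.g. flight)
    palette: list of color names; colors repeat if categories outnumber them
    """
    legend = {}
    for category in attribute.values():
        if category not in legend:
            legend[category] = palette[len(legend) % len(palette)]
    node_colors = {node: legend[category] for node, category in attribute.items()}
    return node_colors, legend


# Hypothetical node names and flight labels, for illustration only.
flights = {"n1": "AA11", "n2": "AA11", "n3": "UA175", "n4": "AA77", "n5": "UA93"}
node_colors, legend = categorical_colors(flights, ["red", "blue", "green", "orange"])
print(legend)
```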
 

Node size can also improve the network visualization. We calculate three measures of node centrality (closeness, betweenness, and degree) and apply them to the plot in two ways: raw values on the left and the log of each value on the right.

 
calculatedFR.png
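The three centrality measures used for node size can be sketched in Python for an undirected, unweighted graph.

- Degree counts an actor's ties.
- Closeness is computed here as the unnormalized reciprocal of an actor's total distance to all reachable actors. R packages also offer normalized variants.
- Betweenness uses Brandes' algorithm. It sums, over all pairs of other actors, the share of their shortest paths that pass through the actor.

The text does not say how zero values were handled before taking logs, so the `log1p` transform below is an assumption. The toy graph and function names are hypothetical.

```python
import math
from collections import deque


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    return {v: len(nbrs) for v, nbrs in adj.items()}


def closeness(adj):
    """1 / (sum of shortest-path distances to all reachable nodes)."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = 1 / total if total else 0.0
    return result


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)  # dependency of s on each node
        while stack:  # nodes in order of decreasing distance from s
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted from both ends


def log_sizes(values, base_size=1.0):
    """Log-scaled node sizes; log1p avoids log(0) for nodes with zero betweenness."""
    return {v: base_size * math.log1p(x) for v, x in values.items()}


# Hypothetical toy graph: a triangle A-B-C with a tail C-D-E.
adj = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")])
sizes = log_sizes(betweenness(adj))
print(degree(adj), closeness(adj), betweenness(adj), sizes, sep="\n")
```

In the toy graph, C bridges the triangle and the tail, so it has the highest degree, closeness, and betweenness. This is the same pattern that makes the bridging actors stand out in the plots.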
 

Now that we have determined the parameters, we can apply them to optimize the plot. The following produces that optimized visualization.

 
finalFR.png
 

After working through a few variations of the plot, we can now see distinct groups for each color. The plot shows which actors belong to each flight and which actors interacted across flights. Next, let's take a more advanced look at possible subgroups and communities.

Community Detection and Sub Groups

We will look at some of the simplest types of cohesive subgroups, starting with cliques. The largest clique in the network consists of Nawaf Alhazmi, Ahmed Alnami, Saeed Alghamdi, and Hamza Alghamdi. This subgroup contains the individuals identified earlier while examining betweenness and closeness. Cliques are useful here, but they are uncommon in larger social networks, so we will explore other methods of detecting communities.
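Finding the largest clique means listing every maximal clique, which is a set of mutually tied actors that cannot be extended. The sketch below does this in Python with the Bron-Kerbosch algorithm with pivoting, a standard method. It is not necessarily what the R package used. The toy graph and function names are hypothetical.

```python
def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting: list every maximal clique as a set of nodes."""
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(clique)
            return
        # Pick a pivot with many neighbours among the candidates to prune branches.
        pivot = max(candidates | excluded, key=lambda u: len(adj[u] & candidates))
        for v in list(candidates - adj[pivot]):
            expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical toy graph: a 4-clique A-B-C-D plus a pendant tie D-E.
TOY_EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
             ("C", "D"), ("D", "E")]
cliques = maximal_cliques(build_adjacency(TOY_EDGES))
largest = max(cliques, key=len)
print(sorted(largest))
```

The pendant tie D-E is itself a maximal clique of size two. In sparse networks most maximal cliques look like this, which is why cliques reveal little beyond a small core.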

We also calculated modularity and community membership. We consider the resulting modularity high, and the membership calculation shows four distinct subgroups. A plot of this result using the Walktrap algorithm helps illustrate it.

 
groupOne.png
 

We see here the distinct subgroups found by the Walktrap algorithm. Let’s examine some additional methods.

Calculations of modularity and membership for each method show that all the detection algorithms identify either two or five subgroups. Modularity ranges from about 0.46 to 0.50. Let’s take a look at the plots from each of these algorithms.

 
allAlg.png
 

The plot above illustrates the differences among the groups found by each algorithm. The nodes are labeled by flight, since flight is one plausible basis for group formation. However, not all groups correspond to flights. We do see the groups of four and five formed by each algorithm.
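The modularity values reported in this section (about 0.46 to 0.50) measure how much more densely the detected groups are tied internally than the same degrees would produce at random. In its per-community form, modularity is Q = Σ_c [L_c / m − (d_c / 2m)²]. Here L_c is the number of ties inside community c, d_c is the total degree of its members, and m is the total number of ties. The Python sketch below only scores a given partition. It does not implement Walktrap or the other detection algorithms, which search for a high-scoring partition. The toy graph is hypothetical.

```python
def modularity(edges, membership):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    inside = {}      # L_c: ties with both ends in community c
    degree_sum = {}  # d_c: total degree of community c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Hypothetical toy graph: two triangles joined by a single bridge C-D.
TOY_EDGES = [("A", "B"), ("B", "C"), ("A", "C"),
             ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
split = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
whole = dict.fromkeys(split, 1)
q_split = modularity(TOY_EDGES, split)
q_whole = modularity(TOY_EDGES, whole)
print(q_split, q_whole)
```

Splitting the toy graph at the bridge gives Q = 2(3/7 − 1/4) = 5/14 ≈ 0.357. Putting every node in one community always gives Q = 0. Detection algorithms differ mainly in how they search the space of partitions for a high Q.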

Network Modeling

Exponential-family random graph models (ERGMs) will be used to analyze the network further and to see what effect the attributes have on connections in the network. We first fit a null model containing only an edges term and ran a simulation to determine how it performed.

 
tri.png
 

After 100 simulations of the network based on the null model, we find that the null model does not fully capture the number of triangles in the observed network, which is nine. The next step is to examine some more complex models to see whether we can find a better fit. Comparing these models also shows whether any variables significantly help predict a connection between two terrorist “nodes”.
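An edges-only ERGM treats every tie as an independent coin flip with probability p = exp(θ) / (1 + exp(θ)). Its maximum-likelihood θ is therefore logit(density), and its simulations can be reproduced with a plain Bernoulli (Erdős-Rényi) random graph instead of an MCMC sampler. The Python sketch below plugs in the size, density, and triangle count reported above. It assumes the network is undirected, and the seed and helper names are arbitrary.

```python
import math
import random
from itertools import combinations

N_NODES = 19            # network size reported above
DENSITY = 0.15789       # observed density reported above
OBSERVED_TRIANGLES = 9  # observed triangle count reported above
N_SIMS = 100

# Maximum-likelihood edges coefficient of the null model: logit(density).
theta_edges = math.log(DENSITY / (1 - DENSITY))


def simulate_bernoulli_graph(n, p, rng):
    """Each of the n(n - 1)/2 possible ties is present independently with probability p."""
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def count_triangles(adj):
    """Count each triangle once by requiring u < v < w."""
    return sum(
        1
        for u in adj
        for v in adj[u]
        if v > u
        for w in adj[v]
        if w > v and w in adj[u]
    )


rng = random.Random(2001)
p = 1 / (1 + math.exp(-theta_edges))  # back-transform; equals DENSITY
sims = [count_triangles(simulate_bernoulli_graph(N_NODES, p, rng)) for _ in range(N_SIMS)]
share = sum(t >= OBSERVED_TRIANGLES for t in sims) / N_SIMS
print(f"theta_edges = {theta_edges:.3f}")
print(f"mean simulated triangles = {sum(sims) / N_SIMS:.2f}, observed = {OBSERVED_TRIANGLES}")
print(f"share of simulations with >= {OBSERVED_TRIANGLES} triangles = {share:.2f}")
```

Under these assumptions θ ≈ −1.67. The expected triangle count is C(19, 3)·p³ = 969 × 0.15789³ ≈ 3.8, well below the nine observed. This gap is the kind of shortfall the null-model simulation exposes.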

This additional comparison found no performance improvement from any iteration of the model: the AIC increased with each successive model. We did find that the pilots' age affects the model, which captures more edges than the null model and more triangles as well. We therefore proceed with a model that includes the pilots' age and examine its goodness of fit.
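AIC and BIC trade a model's maximized log-likelihood against its number of parameters k: AIC = 2k − 2 log L and BIC = k log(n) − 2 log L, where lower is better. The Python sketch below computes both for an edges-only (Bernoulli) model on a hypothetical toy graph. Treating the number of dyads as the BIC sample size n is an assumption here. Models with attribute terms, such as age, need their own log-likelihoods, and ERGM software usually approximates those.

```python
import math


def bernoulli_graph_loglik(n_nodes, n_edges):
    """Maximized log-likelihood of an edges-only model on an undirected graph."""
    dyads = n_nodes * (n_nodes - 1) // 2
    p = n_edges / dyads  # maximum-likelihood tie probability
    ll = 0.0
    if n_edges > 0:
        ll += n_edges * math.log(p)
    if n_edges < dyads:
        ll += (dyads - n_edges) * math.log(1 - p)
    return ll, dyads


def information_criteria(loglik, n_params, n_obs):
    """AIC = 2k - 2 log L and BIC = k log(n) - 2 log L; lower values are preferred."""
    aic = 2 * n_params - 2 * loglik
    bic = n_params * math.log(n_obs) - 2 * loglik
    return aic, bic


# Hypothetical toy graph: 5 nodes and 5 ties, one parameter (the edges term).
ll, dyads = bernoulli_graph_loglik(5, 5)
aic, bic = information_criteria(ll, n_params=1, n_obs=dyads)
print(f"log L = {ll:.3f}, AIC = {aic:.3f}, BIC = {bic:.3f}")
```

Each added term raises the 2k penalty by 2. A richer model therefore lowers AIC only if its log-likelihood improves by more than one unit per added parameter, which is the trade-off behind an AIC that rises from model to model.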

gof.png
gof2.png

The goodness-of-fit diagnostics here show the final model's performance. The model captured degree and dyad-wise shared partners well, but it had some difficulty capturing edgewise shared partners.
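A shared partner of two actors is a third actor tied to both of them. The edgewise distribution counts shared partners only for pairs that are themselves tied. The dyad-wise distribution counts them for every pair, tied or not. The goodness-of-fit plots compare these observed distributions with those from simulated networks. The Python sketch below computes only the observed distributions on a hypothetical toy graph. It is not the goodness-of-fit routine of any R package.

```python
from collections import Counter
from itertools import combinations


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edgewise_shared_partners(adj):
    """Number of tied pairs having each count of common neighbours."""
    counts = Counter()
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            counts[len(adj[u] & adj[v])] += 1
    return dict(counts)


def dyadwise_shared_partners(adj):
    """Number of pairs (tied or not) having each count of common neighbours."""
    return dict(Counter(len(adj[u] & adj[v]) for u, v in combinations(sorted(adj), 2)))


# Hypothetical toy graph: a 4-clique A-B-C-D plus a pendant tie D-E.
adj = build_adjacency([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
                       ("B", "D"), ("C", "D"), ("D", "E")])
esp = edgewise_shared_partners(adj)
dsp = dyadwise_shared_partners(adj)
print(esp, dsp)
```

The six ties inside the 4-clique each have two shared partners, and the pendant tie D-E has none. Heavy clustering of this kind inflates the upper end of the edgewise distribution. A model without triangle-related terms tends to underpredict it, which matches the weakness seen in the edgewise plot.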