4 minutes
Animating a Monte Carlo Simulation
Introduction
Oftentimes, I run into difficulty trying to explain some of the concepts of statistical sampling with audiences that either have very limited or no understanding of statistics. Given that the majority of communication of analysis has to be digested in a 1-2 hour meeting, data visualization typically is the best route for communicating my methods.
One such case is unboxing how sampling works based on your presumed population and using that sampling to build a monte carlo simulation. I’m a big fan of interactive visualizations or animations that break down in detail the underpinnings of an analysis. The basis of this post is to demonstrate using ggplot2 to construct the frames of the animation and ImageMagick to combine the frames into a gif of a simple monte carlo simulation.
Generate the Data
First, I’m going to use base R’s random sampling functions for the Poisson and the Negative Binomial to generate samples given the presumed parameters. The numbers are then added together to show a very basic monte carlo simulation. In addition, a “slice” of the data is taken that I’m going to use later to break down each step of the simulation.
library(ggplot2)
library(trstyles)
library(dplyr)
library(tidyr)
set.seed(82)
n <- 10000
mcHist <- data_frame(Poisson = rpois(n, 3),
NegBinom = rnbinom(n, 5, .5)) %>%
mutate(Simulation = Poisson + NegBinom) %>%
gather(Distribution, Value) %>%
mutate(Distribution = sub("NegBinom", "Negative Binomial", Distribution))
mcSample <- mcHist %>%
group_by(Distribution) %>%
slice(1:200) %>%
gather(Distribution, Value) %>%
group_by(Distribution) %>%
mutate(rowNum = row_number(Distribution))
lapply(split(mcHist[['Value']], mcHist[['Distribution']]), summary)
$`Negative Binomial`
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 3.000 4.000 5.017 7.000 25.000
$Poisson
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 2.000 3.000 2.991 4.000 12.000
$Simulation
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 5.000 8.000 8.008 10.000 29.000
Histogram of Expected Results
The background image that I want to show is what is the theoretical model that we
are moving towards. I created a large sample of 10,000 so that random number
generators will give a close approximation of the population (the presumed
distributions). I chose discrete distributions above so that it will be easy to
plot histograms without having to worry about the bin widths. With a little
tweaking using facet_grid
and geom_text
, I can lay out each distribution and
annotate the calculation.
g <- ggplot(mcHist) +
geom_histogram(aes(Value, ..density.., color = Distribution),
binwidth = 1, alpha = .3, fill = "transparent") +
scale_color_tr(guide = FALSE) +
facet_grid(~Distribution) +
theme_tr() +
coord_cartesian(ylim = c(0, .23)) +
labs(x = "", y = "Frequency") +
theme(panel.grid = element_blank(),
strip.text = element_text(size = 16, hjust = 0.1),
axis.text.x = element_blank()) +
geom_text(x = rep(21, 3*n), y = rep(.1, 3*n),
label = rep(c("+", "=", ""), each = n), size = 24)
g
Create Frames for Animation
Now that I have my starting frame to show each individual sample, I am going to
use geom_dotplot
to plot over the histogram. To help track the simulation moving
through time, the latest sample will be colored red. The for
loop iterates
over each row number, filters to the current state of the simulation, and
updates what is to be plotted. Then the handy ggsave
saves each frame for the
animated gif.
for (i in 1:length(unique(mcSample$rowNum))) {
dataUpdate <- mcSample %>%
group_by(Distribution) %>%
filter(rowNum %in% 1:i) %>%
group_by(Distribution) %>%
mutate(Last = rowNum == i)
gUpdate <- g +
geom_dotplot(data = dataUpdate,
aes(Value, fill = Last), color = NA,
binwidth = 1, method = "histodot",
dotsize = .6) +
scale_fill_manual(guide = FALSE, values = c("black", "red"))
#gUpdate
ggsave(filename = paste0('frames/plot', sprintf("%03d",i) , '.jpg'),
plot = gUpdate, device = 'jpeg')
}
Convert Using ImageMagick
All the heavy lifting is over now. I just need to use ImageMagick
to process the
frames and convert to a gif. You can use a call to the command line here, but my
personal preference is to just switch to the command line. ImageMagick
is a
standalone utility software that you will need to install separately from R.
convert -delay 20 -loop 0 *.jpg ../mcsim_animation.gif
I now have an animation that I can use to explain my distribution assumptions and show how each sample updates the model. It’s pretty clear over time that the sampling will eventually converge to the theoretical model.