+ - 0:00:00
Notes for current slide
Notes for next slide

Introduction to Geospatial Analysis in R

Thomas Roh
Data Scientist, HDR

2018-09-05

1 / 58

Software Requirements

  • R >= 3.4.3
  • R Packages
    • sf >= 0.6-3
    • sp >= 1.3-1
    • ggplot2 >= 3.0.0.9
    • leaflet >= 2.0.1
    • raster >= 2.6-7
    • httr >= 1.3.1
    • rvest
  • RStudio Deskop (preferred)
    • Any other IDE for R will work as well
  • Census API Key
2 / 58

Outline

  • Introduction to R and GIS
  • GIS Data
    • Shapefiles, GeoJSON, WKT, rasters
    • CRS
    • spatial geometries
  • Reading Spatial Data into R
    • Data Representations
  • sp Package
  • sf Package
  • raster Package
  • leaflet Package
  • ggplot2 Package
3 / 58

Data Types in R

4 / 58

Data Types in R

typeof(1L)
[1] "integer"
4 / 58

Data Types in R

typeof(1L)
[1] "integer"
typeof(1.1)
[1] "double"
4 / 58

Data Types in R

typeof(1L)
[1] "integer"
typeof(1.1)
[1] "double"
typeof(TRUE)
[1] "logical"
4 / 58

Data Types in R

typeof(1L)
[1] "integer"
typeof(1.1)
[1] "double"
typeof(TRUE)
[1] "logical"
typeof("string")
[1] "character"
4 / 58

Data Types in R

5 / 58

Data Types in R

typeof(factor(c("string1", "string2")))
[1] "integer"
5 / 58

Data Types in R

typeof(factor(c("string1", "string2")))
[1] "integer"
factor(c("string1", "string2"))
[1] string1 string2
Levels: string1 string2
5 / 58

Data Types in R

typeof(factor(c("string1", "string2")))
[1] "integer"
factor(c("string1", "string2"))
[1] string1 string2
Levels: string1 string2
ordered(c("string1", "string2"))
[1] string1 string2
Levels: string1 < string2
5 / 58

Data Structures in R

6 / 58

Data Structures in R

## all must be of the same type
c(1L, 2L)
[1] 1 2
6 / 58

Data Structures in R

## all must be of the same type
c(1L, 2L)
[1] 1 2
matrix(rnorm(4), ncol = 2)
[,1] [,2]
[1,] -1.267763 -0.4673907
[2,] 1.734005 -0.2678249
6 / 58

Data Structures in R

## all must be of the same type
c(1L, 2L)
[1] 1 2
matrix(rnorm(4), ncol = 2)
[,1] [,2]
[1,] -1.267763 -0.4673907
[2,] 1.734005 -0.2678249
str(array(rnorm(8), dim = c(2, 2, 2)))
num [1:2, 1:2, 1:2] -1.378 0.351 -1.386 -0.329 -1.006 ...
6 / 58

Data Structures in R

## all must be of the same type
c(1L, 2L)
[1] 1 2
matrix(rnorm(4), ncol = 2)
[,1] [,2]
[1,] -1.267763 -0.4673907
[2,] 1.734005 -0.2678249
str(array(rnorm(8), dim = c(2, 2, 2)))
num [1:2, 1:2, 1:2] -1.378 0.351 -1.386 -0.329 -1.006 ...
## different data types can be stored together
sapply(list(1L, "string", FALSE, 1.1), typeof)
[1] "integer" "character" "logical" "double"
6 / 58

Dataframes in R

7 / 58

Dataframes in R

## special class of list
df <- data.frame(x = 1:2, y = c("string1", c("string2")))
df
x y
1 1 string1
2 2 string2
7 / 58

Dataframes in R

## special class of list
df <- data.frame(x = 1:2, y = c("string1", c("string2")))
df
x y
1 1 string1
2 2 string2
df[1,"y", drop = FALSE]
y
1 string1
7 / 58

Dataframes in R

## special class of list
df <- data.frame(x = 1:2, y = c("string1", c("string2")))
df
x y
1 1 string1
2 2 string2
df[1,"y", drop = FALSE]
y
1 string1
df[df$x == 1, names(df) == 'x']
[1] 1
7 / 58

Subsetting in R

8 / 58

Subsetting in R

df['y']
y
1 string1
2 string2
8 / 58

Subsetting in R

df['y']
y
1 string1
2 string2
df$y
[1] string1 string2
Levels: string1 string2
8 / 58

Subsetting in R

df['y']
y
1 string1
2 string2
df$y
[1] string1 string2
Levels: string1 string2
all.equal(df$y, df[['y']])
[1] TRUE
8 / 58

Subsetting in R

df['y']
y
1 string1
2 string2
df$y
[1] string1 string2
Levels: string1 string2
all.equal(df$y, df[['y']])
[1] TRUE

Advanced R - Data Structures

8 / 58

GIS Data Files (Shapefile)

9 / 58

GIS Data Files (Shapefile)

  • Vector geospatial data format specification developed by ESRI to minimize storage space, improve shape drawing speed, and for interoperatibily with GIS software
9 / 58

GIS Data Files (Shapefile)

  • Vector geospatial data format specification developed by ESRI to minimize storage space, improve shape drawing speed, and for interoperatibily with GIS software

  • Typically in distributed in a zip file and stored in a directory with at least the following files:

    • Main file (.shp)
    • Index file (.shx)
    • dBASE table (.dbf)
9 / 58

GIS Data Files (Shapefile)

  • Vector geospatial data format specification developed by ESRI to minimize storage space, improve shape drawing speed, and for interoperatibily with GIS software

  • Typically in distributed in a zip file and stored in a directory with at least the following files:

    • Main file (.shp)
    • Index file (.shx)
    • dBASE table (.dbf)
  • The other most significant file is the projection format (.prj) which supplies the coordinate reference system
9 / 58

GIS Data Files (Shapefile)

  • Vector geospatial data format specification developed by ESRI to minimize storage space, improve shape drawing speed, and for interoperatibily with GIS software

  • Typically in distributed in a zip file and stored in a directory with at least the following files:

    • Main file (.shp)
    • Index file (.shx)
    • dBASE table (.dbf)
  • The other most significant file is the projection format (.prj) which supplies the coordinate reference system

  • There are many other files that may be in the directory, which can be ignored for now.

9 / 58

GIS Data Files (Shapefile)

  • Vector geospatial data format specification developed by ESRI to minimize storage space, improve shape drawing speed, and for interoperatibily with GIS software

  • Typically in distributed in a zip file and stored in a directory with at least the following files:

    • Main file (.shp)
    • Index file (.shx)
    • dBASE table (.dbf)
  • The other most significant file is the projection format (.prj) which supplies the coordinate reference system

  • There are many other files that may be in the directory, which can be ignored for now.

Source: ESRI

9 / 58

GIS Data Files (GeoJSON)

10 / 58

GIS Data Files (GeoJSON)

  • Spatial data format developed in line with JavaScript Object Notation (JSON). GeoJSON objects are standardized JSON objects.
10 / 58

GIS Data Files (GeoJSON)

  • Spatial data format developed in line with JavaScript Object Notation (JSON). GeoJSON objects are standardized JSON objects.
  • All information about the is contained in a single file (.geojson)
    • FeatureCollection->Feature->Geometry
    • Geometry Types: Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon, and GeometryCollection
    • CRS information (optional)
    • JSON is a named list of object data, can be nested and assymetrical
10 / 58

GIS Data Files (GeoJSON)

  • Spatial data format developed in line with JavaScript Object Notation (JSON). GeoJSON objects are standardized JSON objects.
  • All information about the is contained in a single file (.geojson)
    • FeatureCollection->Feature->Geometry
    • Geometry Types: Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon, and GeometryCollection
    • CRS information (optional)
    • JSON is a named list of object data, can be nested and assymetrical
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
[-96, 43]
]
}
}
10 / 58

GIS Data Files (GeoJSON)

  • Spatial data format developed in line with JavaScript Object Notation (JSON). GeoJSON objects are standardized JSON objects.
  • All information about the is contained in a single file (.geojson)
    • FeatureCollection->Feature->Geometry
    • Geometry Types: Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon, and GeometryCollection
    • CRS information (optional)
    • JSON is a named list of object data, can be nested and assymetrical
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
[-96, 43]
]
}
}

Source: IETF

10 / 58

GIS Data Files (WKT)

11 / 58

GIS Data Files (WKT)

  • Well-known Text (WKT) is a fully open source data specification for spatial objects that is designed to be human and machine readable.
11 / 58

GIS Data Files (WKT)

  • Well-known Text (WKT) is a fully open source data specification for spatial objects that is designed to be human and machine readable.

  • A simple text markup language that is easy to store in a text file or in a database as a character string. For databases that do not support spatial objects, you can store as WKT as character strings and use an interpreter in R to read it.

11 / 58

GIS Data Files (WKT)

  • Well-known Text (WKT) is a fully open source data specification for spatial objects that is designed to be human and machine readable.

  • A simple text markup language that is easy to store in a text file or in a database as a character string. For databases that do not support spatial objects, you can store as WKT as character strings and use an interpreter in R to read it.

  • The print method for geometry objects in the sf package will show WKT representation in the console.

11 / 58

GIS Data Files (WKT)

  • Well-known Text (WKT) is a fully open source data specification for spatial objects that is designed to be human and machine readable.

  • A simple text markup language that is easy to store in a text file or in a database as a character string. For databases that do not support spatial objects, you can store as WKT as character strings and use an interpreter in R to read it.

  • The print method for geometry objects in the sf package will show WKT representation in the console.

POINT (-95.9345 41.2565)
LINESTRING (-96.05645 41.22345, -95.90417 41.11618)
POLYGON ((-95.9345 41.2565, -95.8345 41.2565,
-95.9345 41.3565, -95.9345 41.2565))
11 / 58

GIS Data Files (WKT)

  • Well-known Text (WKT) is a fully open source data specification for spatial objects that is designed to be human and machine readable.

  • A simple text markup language that is easy to store in a text file or in a database as a character string. For databases that do not support spatial objects, you can store as WKT as character strings and use an interpreter in R to read it.

  • The print method for geometry objects in the sf package will show WKT representation in the console.

POINT (-95.9345 41.2565)
LINESTRING (-96.05645 41.22345, -95.90417 41.11618)
POLYGON ((-95.9345 41.2565, -95.8345 41.2565,
-95.9345 41.3565, -95.9345 41.2565))

Source: Open Geospatial Consortium

11 / 58

Spatial Data Types

12 / 58

Spatial Data Types

library(sf)
set.seed(82)
p <- st_point(c(-95.9345, 41.2565))
mp <- st_multipoint(matrix(rnorm(4, sd = .1) +
rep(p, each = 2), ncol = 2))
l <- st_linestring(mp)
ml <- st_multilinestring(list(l + rnorm(4, sd = .05),
l - rnorm(4, sd = .05)))
plgn <- st_polygon(list(matrix(rep(p, 4) + c(0,0,.1,0,0,.1,0,0),
ncol = 2, byrow = TRUE)))
mplgn <- st_multipolygon(list(plgn -.15 ,plgn -.2))
12 / 58

Spatial Data Types

par(mar = rep(0, 4))
plot(p, xlim = p[1]+c(-.1, .01), ylim = p[2]+c(-.2, .1))
plot(mp, add = TRUE, col = 'red')
plot(l, add = TRUE, col = 'blue')
plot(ml, add = TRUE, col = 'orange')
plot(plgn, add = TRUE, col = 'purple')
plot(mplgn, add = TRUE, col = 'yellow')

13 / 58

Spatial Data Types

par(mar = rep(0, 4))
gcol <- st_geometrycollection(list(p, mp, l, ml, plgn, mplgn))
plot(gcol, xlim = p[1]+c(-.1, .01), ylim = p[2]+c(-.2, .1))

14 / 58

Reading a Shapefile in with sf

15 / 58

Reading a Shapefile in with sf

states <- st_read('dataWS/states/cb_2017_us_state_500k.shp',
stringsAsFactors = FALSE,
quiet = TRUE) %>%
st_transform(crs = 4326)
15 / 58

Reading a Shapefile in with sf

states <- st_read('dataWS/states/cb_2017_us_state_500k.shp',
stringsAsFactors = FALSE,
quiet = TRUE) %>%
st_transform(crs = 4326)
str(states)
15 / 58

Changing the Coordinate Reference System

states <- st_transform(states, crs = 4326)
st_crs(states)
Coordinate Reference System:
EPSG: 4326
proj4string: "+proj=longlat +datum=WGS84 +no_defs"
16 / 58

Changing the Coordinate Reference System

states <- st_transform(states, crs = 4326)
st_crs(states)
Coordinate Reference System:
EPSG: 4326
proj4string: "+proj=longlat +datum=WGS84 +no_defs"
states %>%
st_transform(crs = 4326) %>%
st_crs()
16 / 58

Changing the Coordinate Reference System

states <- st_transform(states, crs = 4326)
st_crs(states)
Coordinate Reference System:
EPSG: 4326
proj4string: "+proj=longlat +datum=WGS84 +no_defs"
states %>%
st_transform(crs = 4326) %>%
st_crs()
16 / 58

Mapping with ggplot2

library(ggplot2)
ggplot(states[!states$STUSPS %in%
c('HI', 'AK', 'AS', 'GU', 'MP', 'PR', 'VI'), ]) +
geom_sf(fill = 'transparent', color = 'blue', size = 1) +
theme_bw()

17 / 58

Omaha City Limits

cityLimits <- st_read('dataWS/city_limits.csv',
stringsAsFactors = FALSE,
quiet = TRUE,
crs = 4326)
bb <- st_bbox(cityLimits, crs = st_crs(4326))
tracts <- st_read('dataWS/census_tracts_31/cb_2017_31_tract_500k.shp',
stringsAsFactors = FALSE,
quiet = TRUE) %>%
st_transform(crs = 4326)
18 / 58

Omaha City Limits

ggplot(cityLimits) +
geom_sf(aes(fill = TOWN)) +
geom_sf(data = tracts, fill = 'transparent') +
coord_sf(xlim = c(bb$xmin, bb$xmax), ylim = c(bb$ymin, bb$ymax)) +
scale_fill_viridis_d(option = 'B') +
theme_bw()

19 / 58

Filtering Spatial Data

20 / 58

Filtering Spatial Data

omaha <- cityLimits[cityLimits$TOWN == 'Omaha',]
tracts <- tracts[st_intersects(tracts, omaha, sparse = FALSE), ]
ggplot(omaha) +
geom_sf(aes(fill = TOWN)) +
geom_sf(data = tracts, fill = 'transparent') +
coord_sf(xlim = c(bb$xmin, bb$xmax), ylim = c(bb$ymin, bb$ymax)) +
scale_fill_viridis_d(option = 'B') +
theme_bw()

20 / 58

U.S. Census API

21 / 58

U.S. Census API

library(httr)
query_acs <- function(var, stateNo, apiKey = Sys.getenv("census_token")) {
endpoint <- "https://api.census.gov/data/2016/acs/acs5"
varString <- paste0(c("NAME", var), collapse = ",")
apiCall <- paste0(endpoint,
"?get=", varString,
"&for=tract:*&in=state:", stateNo,
"&key=", apiKey)
response <- GET(apiCall)
census <- content(response)
census <- lapply(census[-1], function(x) {
setNames(data.frame(x, stringsAsFactors = FALSE),
nm = census[[1]])
})
census <- Reduce(rbind, census)
census[['geo_id']] <- paste0(census$state, census$county, census$tract)
census
}
census <- query_acs(c("B00001_001E", "B06011_001E"), 31)
census[1,]
NAME B00001_001E B06011_001E state
1 Census Tract 9654, Adams County, Nebraska 924 32048 31
county tract geo_id
1 001 965400 31001965400

US Census: User Guide, Census Variables, Examples

21 / 58

Chloropleths

22 / 58

Chloropleths

names(census)[1:3] <- c('description', 'population', 'median_income')
census[['population']] <- as.numeric(census[['population']])
census[['median_income']] <- as.numeric(census[['median_income']])
tracts <- merge(tracts, census, by.x = 'GEOID', by.y = 'geo_id')
tracts[['density']] <- tracts[['population']]/(tracts[['ALAND']]/1000000)
22 / 58
ggplot(tracts) +
geom_sf(aes(fill = density)) +
scale_fill_viridis_c(name = 'Population Density (per km2)') +
theme_bw()

tracts[['description']][which.max(tracts$density)]
[1] "Census Tract 39, Douglas County, Nebraska"
23 / 58

Interactive Mapping

library(leaflet)
tracts <- tracts[tracts$median_income > 0,]
centroids <- st_centroid(st_union(omaha))
tractBaseMap <- leaflet(tracts) %>% addTiles() %>%
setView(centroids[[1]][1], centroids[[1]][2], zoom = 10)
tractBaseMap

leaflet

24 / 58

Color Palettes

pal <- colorNumeric("viridis", tracts$median_income)
palBin <- colorBin('viridis', tracts$median_income, 5)
palQuant <- colorQuantile('viridis', tracts$median_income, 5)
tractMap <- tractBaseMap %>%
addPolygons(color = ~pal(median_income),
group = 'Continuous', weight = 1, fillOpacity = .5) %>%
addLegend(pal = pal, values = ~median_income,
group = 'Continuous Legend',
position = 'bottomright')
25 / 58
tractMap
26 / 58

Legends

tractMap <- tractMap %>%
addPolygons(color = ~palBin(median_income),
group = 'Bin', weight = 1, fillOpacity = .5) %>%
addPolygons(color = ~palQuant(median_income),
group = 'Quantile', weight = 1, fillOpacity = .5) %>%
addLegend(pal = palBin, values = ~median_income,
group = 'Bin Legend',
position = 'bottomright') %>%
addLegend(pal = palQuant, values = ~median_income,
group = 'Quantile Legend',
position = 'bottomleft') %>%
addLayersControl(baseGroups = c('Continuous', 'Bin', 'Quantile'),
overlayGroups = c('Continuous Legend',
'Bin Legend',
'Quantile Legend'))
27 / 58
tractMap
28 / 58

Omaha Building Violations

violations <- st_read(
'dataWS/OPBCV/Omaha_Planning_Building_Code_Violations.shp',
quiet = TRUE,
stringsAsFactors = FALSE) %>%
st_transform(crs = 4326)
vColors <- ifelse(grepl('Closed', violations$StatusCurr), 'black', 'red')
tdown <- makeAwesomeIcon('thumbs-down',
library = 'fa',
markerColor = 'white',
iconColor = vColors,
squareMarker = TRUE)

Font Awesome

29 / 58

Markers/Points

tractMap <- tractBaseMap %>%
addPolygons(color = ~pal(median_income),
group = 'Continuous', weight = 1, fillOpacity = .5) %>%
addLegend(pal = pal, values = ~median_income,
group = 'Continuous Legend',
position = 'bottomright') %>%
addAwesomeMarkers(data = violations,
icon = tdown,
clusterOptions = markerClusterOptions(),
popup = ~paste(
"<b>Permit:</b>",PermitNum, "<br>",
"<b>Type:</b>", CaseType, "<br>"))
30 / 58
tractMap
31 / 58

Spatial Aggregation of Violations

violWithin <- st_within(violations, tracts, sparse = FALSE)
nViol <- colSums(violWithin)
nViol[1:5]
[1] 64 83 66 47 71
sum(nViol)
[1] 4325
nrow(violations)
[1] 4331
32 / 58
tracts <- cbind(tracts, nViol)
ggplot(tracts) +
geom_sf(aes(fill = nViol)) +
geom_sf(data = violations[ rowSums(violWithin) == 0, ], color = 'red') +
scale_fill_viridis_c(option = 'plasma') +
theme_bw()

33 / 58

Year Structures Were Built

yrBuilt <- query_acs(paste0("B25034_", sprintf("%03.f", 1:11), "E"), 31)
head(yrBuilt, 5)
NAME B25034_001E B25034_002E
1 Census Tract 9654, Adams County, Nebraska 1866 12
2 Census Tract 9655, Adams County, Nebraska 1689 0
3 Census Tract 9656, Adams County, Nebraska 2246 5
4 Census Tract 9657, Adams County, Nebraska 728 0
5 Census Tract 9658, Adams County, Nebraska 1036 0
B25034_003E B25034_004E B25034_005E B25034_006E B25034_007E B25034_008E
1 27 338 282 241 333 140
2 8 140 36 77 301 173
3 23 131 354 52 361 221
4 0 15 41 14 47 42
5 0 16 0 4 75 128
B25034_009E B25034_010E B25034_011E state county tract geo_id
1 73 58 362 31 001 965400 31001965400
2 240 143 571 31 001 965500 31001965500
3 174 366 559 31 001 965600 31001965600
4 54 46 469 31 001 965700 31001965700
5 332 121 360 31 001 965800 31001965800
yrBuilt <- yrBuilt[ ,!names(yrBuilt) %in% c('state', 'county', 'tract',
'NAME')]
34 / 58

Querying Census Data Labels

query_acs_var <- function(var) {
endpoint <- "https://api.census.gov/data/2016/acs/acs5/variables/"
x <- lapply(paste0(endpoint, var), function(x) {content(GET(x))})
setNames(sapply(x, getElement, 'label'),
var)
}
varNames <- query_acs_var(paste0("B25034_", sprintf("%03.f", 1:11), "E"))
varNames[1:5]
B25034_001E
"Estimate!!Total"
B25034_002E
"Estimate!!Total!!Built 2014 or later"
B25034_003E
"Estimate!!Total!!Built 2010 to 2013"
B25034_004E
"Estimate!!Total!!Built 2000 to 2009"
B25034_005E
"Estimate!!Total!!Built 1990 to 1999"
35 / 58

Prepare the Data

varNames <- gsub(" ", "_", gsub('[A-Za-z]+\\!\\!', "", varNames))
varNames
B25034_001E B25034_002E B25034_003E
"Total" "Built_2014_or_later" "Built_2010_to_2013"
B25034_004E B25034_005E B25034_006E
"Built_2000_to_2009" "Built_1990_to_1999" "Built_1980_to_1989"
B25034_007E B25034_008E B25034_009E
"Built_1970_to_1979" "Built_1960_to_1969" "Built_1950_to_1959"
B25034_010E B25034_011E
"Built_1940_to_1949" "Built_1939_or_earlier"
names(yrBuilt)[1:11] <- varNames[names(yrBuilt)[1:11]]
yrBuilt[, -12] <- lapply(yrBuilt[, -12], as.numeric)
yrBuilt <- na.omit(yrBuilt)
tracts <- st_as_sf(tracts, crs = 4326)
tracts <- merge(tracts, yrBuilt, by.x = 'GEOID', by = 'geo_id')
tracts <- as.data.frame(tracts)
class(tracts)
[1] "data.frame"
36 / 58
trainDT <- tracts[ , c('Built_1939_or_earlier',
'Built_1940_to_1949',
'Built_1950_to_1959',
'Built_1960_to_1969',
'Built_1970_to_1979',
'Built_1980_to_1989',
'Built_1990_to_1999',
'Built_2000_to_2009',
'Built_2010_to_2013',
'Built_2014_or_later',
'Total',
'density',
'median_income',
'nViol')]
trainDT[['built_1949']] <- trainDT$Built_1939_or_earlier +
trainDT$Built_1940_to_1949
trainDT[['built_1969']] <- trainDT$Built_1950_to_1959 +
trainDT$Built_1960_to_1969
trainDT[['built_1989']] <- trainDT$Built_1970_to_1979 +
trainDT$Built_1980_to_1989
trainDT[['built_1990']] <- trainDT$Built_1990_to_1999 +
trainDT$Built_2000_to_2009 +
trainDT$Built_2010_to_2013 +
trainDT$Built_2014_or_later
trainDT <- trainDT[-c(1:10)]
37 / 58

Train a Model

poissonModel <- glm(nViol ~ ., family = 'poisson', data = trainDT)
summary(poissonModel)
Call:
glm(formula = nViol ~ ., family = "poisson", data = trainDT)
Deviance Residuals:
Min 1Q Median 3Q Max
-9.1355 -1.6920 0.2232 1.2533 6.4402
Coefficients: (1 not defined because of singularities)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.526e+00 6.822e-02 66.339 < 2e-16 ***
Total -1.830e-04 6.183e-05 -2.960 0.00308 **
density -1.188e-03 1.856e-04 -6.398 1.57e-10 ***
median_income -6.522e-05 2.261e-06 -28.848 < 2e-16 ***
built_1949 1.254e-03 6.716e-05 18.680 < 2e-16 ***
built_1969 5.714e-04 6.993e-05 8.171 3.05e-16 ***
built_1989 4.775e-04 8.706e-05 5.484 4.15e-08 ***
built_1990 NA NA NA NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 4104.8 on 161 degrees of freedom
Residual deviance: 1061.4 on 155 degrees of freedom
AIC: 1776.6
Number of Fisher Scoring iterations: 5
38 / 58

Singularity

alias(poissonModel)
Model :
nViol ~ Total + density + median_income + built_1949 + built_1969 +
built_1989 + built_1990
Complete :
(Intercept) Total density median_income built_1949 built_1969
built_1990 0 1 0 0 -1 -1
built_1989
built_1990 -1
39 / 58

Results

poissonModel <- glm(nViol ~ ., family = 'poisson', data = trainDT[,-1])
summary(poissonModel)
Call:
glm(formula = nViol ~ ., family = "poisson", data = trainDT[,
-1])
Deviance Residuals:
Min 1Q Median 3Q Max
-9.1355 -1.6920 0.2232 1.2533 6.4402
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.526e+00 6.822e-02 66.339 < 2e-16 ***
density -1.188e-03 1.856e-04 -6.398 1.57e-10 ***
median_income -6.522e-05 2.261e-06 -28.848 < 2e-16 ***
built_1949 1.071e-03 3.782e-05 28.332 < 2e-16 ***
built_1969 3.884e-04 5.340e-05 7.273 3.51e-13 ***
built_1989 2.945e-04 5.688e-05 5.177 2.25e-07 ***
built_1990 -1.830e-04 6.183e-05 -2.960 0.00308 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 4104.8 on 161 degrees of freedom
Residual deviance: 1061.4 on 155 degrees of freedom
AIC: 1776.6
Number of Fisher Scoring iterations: 5
40 / 58

Model Diagnostics

par(mfrow = c(2,2))
plot(poissonModel)

41 / 58

Model Error

tracts[['pred_nViol']] <- predict(poissonModel, type = "response")
tracts[['pred_error']] <- tracts$nViol-tracts$pred_nViol
qplot(poissonModel$y,
predict(poissonModel, type = "response")) +
geom_abline(slope = 1, intercept = 0, color = 'red') +
labs(x = 'Actual', y = 'Predicted') +
theme_bw()

42 / 58

Spatial Comparison

ggplot(tracts) +
geom_sf(aes(fill = nViol)) +
labs(fill = 'Violations') +
scale_fill_viridis_c() +
theme_bw()

ggplot(tracts) +
geom_sf(aes(fill = pred_nViol)) +
labs(fill = 'Violations') +
scale_fill_viridis_c() +
theme_bw()

43 / 58

Spatial Error

ggplot(tracts) +
geom_sf(aes(fill = pred_error)) +
scale_fill_viridis_c(option = 'cividis') +
theme_bw()

44 / 58

GIS Data Files (Rasters)

45 / 58

GIS Data Files (Rasters)

  • Rasters graphics are a very generic object class
45 / 58

GIS Data Files (Rasters)

  • Rasters graphics are a very generic object class
  • For GIS analysis, a spatial raster is divided into equal units of the CRS
45 / 58

GIS Data Files (Rasters)

  • Rasters graphics are a very generic object class
  • For GIS analysis, a spatial raster is divided into equal units of the CRS
  • A raster is an array with pixel location on an X-Y plane and has topological information contained at the each location
45 / 58

GIS Data Files (Rasters)

  • Rasters graphics are a very generic object class
  • For GIS analysis, a spatial raster is divided into equal units of the CRS
  • A raster is an array with pixel location on an X-Y plane and has topological information contained at the each location
aNames <- c(lapply(c("X", "Y"), paste0, 1:2), list(c("R", "G", "B")))
array(rnorm(10), dim = c(2,2,3), dimnames = aNames)
, , R
Y1 Y2
X1 -0.03789851 -0.9039875
X2 -0.31576897 0.4371677
, , G
Y1 Y2
X1 -0.3617482 1.200864
X2 0.6089799 0.646111
, , B
Y1 Y2
X1 -0.2890454 -0.03789851
X2 1.5248314 -0.31576897
45 / 58

Working with Rasters

46 / 58

Working with Rasters

46 / 58

Working with Rasters

  • raster

  • sf does not have raster abilities yet

46 / 58

Working with Rasters

  • raster

  • sf does not have raster abilities yet

  • Works on large files because it can divide and conquer

46 / 58

Working with Rasters

  • raster

  • sf does not have raster abilities yet

  • Works on large files because it can divide and conquer

  • RasterLayer, RasterStack, RasterBrick

46 / 58

Working with Rasters

  • raster

  • sf does not have raster abilities yet

  • Works on large files because it can divide and conquer

  • RasterLayer, RasterStack, RasterBrick

  • rasterfile

46 / 58

Bounding Box for Omaha Area

library(raster)
coords <- centroids[[1]]
bb <- c(xmin = -96.268204, xmax = -95.742829,
ymin = 41.109533, ymax = 41.393063)
plot(extent(bb))

47 / 58

Shuttle Radar Topography Mission

srtm <- getData('SRTM', lat = coords[2], lon = coords[1],
path = 'dataWS/altitude', download = TRUE)
srtm <- crop(srtm, extent(bb), snap='out')
plot(srtm)

CIAT

48 / 58

Slope

slope <- terrain(srtm, opt='slope')
plot(slope)

49 / 58

Aspect

aspect <- terrain(srtm, opt='aspect')
plot(aspect)

50 / 58

Hill Shade

hs <- hillShade(slope, aspect, 10, 217.92)
plot(hs, col= heat.colors(25))

Azimuth Angle

51 / 58

Hill Shade

hs <- hillShade(slope, aspect, 90, 217.92)
plot(hs, col= heat.colors(25))

52 / 58

Contour Lines

hs <- hillShade(slope, aspect, 45, 217.92)
plot(hs, col= heat.colors(25))
omahaContour <- rasterToContour(srtm, maxpixels = 500)
plot(omahaContour, add = TRUE)

53 / 58

Shiny Web Applications

  • Lightweight framework for web applications

  • Use R engine for analysis

  • R Interface to producing HTML, Javascript for User Interface

  • Customizable HTML, CSS, and Javascript

install.packages('shiny')
install.packages('shinythemes')
install.packages('osrm')

Shiny

54 / 58

Open Source Routing Machine

library(osrm)
driveTime <- osrmIsochrone(loc = c(-95.9980, 41.2524))
driveTime
class : SpatialPolygonsDataFrame
features : 6
extent : -96.92043, -95.16736, 40.522, 41.94424 (xmin, xmax, ymin, ymax)
coord. ref. : +init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
variables : 4
names : id, min, max, center
min values : 1, 0, 10, 5
max values : 6, 50, 60, 55

osrm, Project OSRM

55 / 58

Self Service Isochrone

  • User Input

    • Coordinates
    • Drive Times
    • Background map
  • Output

    • map of drive times
    • ability to download shapefiles
56 / 58

Shiny Template

ui <- fluidPage()
server <- function(input, output) {
}
shinyApp(ui, server)
  • reactive programming environment

  • input is provided by the end user

  • output is processed by the server based on input and rendered to the end user

  • single file application

57 / 58

Questions?

58 / 58

Software Requirements

  • R >= 3.4.3
  • R Packages
    • sf >= 0.6-3
    • sp >= 1.3-1
    • ggplot2 >= 3.0.0.9
    • leaflet >= 2.0.1
    • raster >= 2.6-7
    • httr >= 1.3.1
    • rvest
  • RStudio Deskop (preferred)
    • Any other IDE for R will work as well
  • Census API Key
2 / 58
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow