Intro to spatial data in R (2022)

Leah A. Wasser


In this lesson we will learn how to perform some basic cleaning and plotting of spatial data in R.

Learning outcomes

At the end of this 30-minute overview you will be able to:

  1. Open a vector data layer in R using readOGR()
  2. Open a raster data layer in R using raster()
  3. Create basic maps using ggplot()
  4. Reproject and crop raster and vector data

Work with vector data in R

Intro to vector data in R - Earth Data Science website


Point, line OR polygon features can be stored within a vector dataset

There are many ways to import and map vector data in R.

To read the data, you have several options

# unzip data
# library(utils)
setwd("~/Documents/data/oss-institute")
# setwd("~/Documents/github/oss-lessons/spatial-data-gis-law")
library(rgdal)
library(raster)
library(ggplot2)
library(rgeos)
library(mapview)
library(leaflet)
library(broom) # if you plot with ggplot and need to turn sp data into dataframes
options(stringsAsFactors = FALSE)

First, let’s download some data from Natural Earth.

# download the data
download.file("", destfile = '')
# unzip the file
unzip(zipfile = "", exdir = 'ne-coastlines-10m')

Next, we can open the data using readOGR from the rgdal package. It returns an sp (spatial) object.

# load the data
coastlines <- readOGR("ne-coastlines-10m/ne_10m_coastline.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "ne-coastlines-10m/ne_10m_coastline.shp", layer: "ne_10m_coastline"
## with 4132 features
## It has 2 fields

Are the data points, lines or polygons?
CHALLENGE – Looking at the data, what are the 2 possible vector data structures that this data could be stored in?

# view spatial attributes
class(coastlines)
## [1] "SpatialLinesDataFrame"
## attr(,"package")
## [1] "sp"
extent(coastlines)
## class : Extent
## xmin : -180
## xmax : 180
## ymin : -85.22194
## ymax : 83.6341
crs(coastlines)
## CRS arguments:
## +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0

Super speedy quick plot with base R plot() … or not. Be patient: this object has a lot of complex features, so it can take a while to draw.

plot(coastlines, main = "Global Coastlines")


This particular layer is complex. The boundaries contain a lot of detail that we may want when we zoom in but don’t need to produce a global map. Let’s simplify it. The gSimplify function is part of the rgeos package. It removes vertices from complex lines. Remember that a line is composed of vertices. A circle is simply a line with lots of vertices: the more vertices it has, the more ‘round’ the line appears.
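To make the vertex idea concrete, here is a minimal base R sketch. It is not part of rgeos, and the helper names circle_vertices and edge_gap are made up for this illustration. It builds a "circle" from n vertices and measures how far the straight edges sit from a true unit circle. The gap shrinks as vertices are added, which is why the shape looks rounder.

```r
# Hypothetical helper: n evenly spaced vertices on a unit circle,
# closed by repeating the first vertex at the end.
circle_vertices <- function(n) {
  theta <- seq(0, 2 * pi, length.out = n + 1)
  cbind(x = cos(theta), y = sin(theta))
}

# Hypothetical helper: largest gap between the straight edges and the
# true circle, measured at the midpoint of each edge.
edge_gap <- function(v) {
  mid <- (v[-1, , drop = FALSE] + v[-nrow(v), , drop = FALSE]) / 2
  1 - min(sqrt(rowSums(mid^2)))
}

for (n in c(4, 16, 64)) {
  gap <- edge_gap(circle_vertices(n))
  cat(n, "vertices -> largest gap from the true circle:", signif(gap, 3), "\n")
}
```

gSimplify works in the opposite direction: it removes vertices, so shapes get less round and more angular.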

As you use this function keep in mind that you are modifying your data. You probably don’t want to do this if you are performing any sort of quantitative analysis on the data but you definitely want to do this if you are creating online maps and other visual products from your data.

The gSimplify function takes 3 arguments

  1. the data that you want to simplify
  2. tol - the tolerance value - a LARGER number will remove more vertices, make the data smaller AND yield a “blockier” looking object. A SMALLER number will retain more vertices and maintain a smoother looking feature. The value is in the units of the data (here, degrees).
  3. topologyPreserve - whether the simplification should try to preserve the topology of the original features. It is set to TRUE in the code below.

# simplify geometry
coastlines_simp <- gSimplify(coastlines, tol = 3, topologyPreserve = TRUE)
plot(coastlines_simp, main = "map with boundaries simplified")
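If you are curious how a tolerance value turns into fewer vertices, the sketch below may help. gSimplify is documented as using the Douglas-Peucker algorithm. The code here is a small base R illustration of that idea, not the rgeos/GEOS implementation. The function names (dist_to_segment, simplify_line) and the wiggly example line are made up for this example.

```r
# Shortest distance from each point in `pts` to the segment a-b.
dist_to_segment <- function(pts, a, b) {
  ab <- b - a
  len2 <- sum(ab^2)
  if (len2 == 0) return(sqrt(rowSums(sweep(pts, 2, a)^2)))
  # Position of each point projected onto the segment, clamped to [0, 1].
  t <- as.vector(sweep(pts, 2, a) %*% ab) / len2
  t <- pmin(pmax(t, 0), 1)
  nearest <- cbind(a[1] + t * ab[1], a[2] + t * ab[2])
  sqrt(rowSums((pts - nearest)^2))
}

# Illustrative Douglas-Peucker: keep the end points, then keep the
# farthest interior vertex only if it is more than `tol` from the chord.
simplify_line <- function(xy, tol) {
  n <- nrow(xy)
  if (n <= 2) return(xy)
  d <- dist_to_segment(xy[2:(n - 1), , drop = FALSE], xy[1, ], xy[n, ])
  if (max(d) <= tol) return(xy[c(1, n), , drop = FALSE])
  k <- which.max(d) + 1
  left  <- simplify_line(xy[1:k, , drop = FALSE], tol)
  right <- simplify_line(xy[k:n, , drop = FALSE], tol)
  rbind(left, right[-1, , drop = FALSE])
}

# A wiggly example line (made-up data, arbitrary units).
x <- seq(0, 10, by = 0.1)
wiggly <- cbind(x = x, y = sin(x) + 0.1 * sin(15 * x))

nrow(wiggly)                             # vertices in the original line
nrow(simplify_line(wiggly, tol = 1))     # large tol: few vertices, blocky
nrow(simplify_line(wiggly, tol = 0.05))  # small tol: more vertices kept
```

The same trade-off applies to the coastlines: a larger tol throws away more vertices and gives a blockier map.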


Notice that here the map plots faster, but now it looks blocky. We may have simplified TOO MUCH. Let’s reduce the tol argument value to 0.1.

# simplify with a lower tolerance value (keeping more detail)
coastlines_sim2 <- gSimplify(coastlines, tol = .1, topologyPreserve = TRUE)
plot(coastlines_sim2, main = "Map of coastlines - simplified geometry\n tol=.1")


That’s better. We now have enough detail for plotting purposes but have increased speed dramatically. These types of steps become important when creating online interactive maps to optimize speed.

IMPORTANT: when you modify the geometry you are also modifying the data - in this case any calculated perimeter or area values using these data will be compromised.
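Here is a small base R sketch of why measurements are affected. The "coastline" is made-up data in arbitrary planar units, and the simplification is deliberately crude (keep every 4th vertex), which is not how gSimplify picks vertices. The point is only that dropping vertices cuts corners, so the measured length goes down.

```r
# Total length of a line given as a two-column matrix of vertices.
line_length <- function(xy) {
  sum(sqrt(rowSums(diff(xy)^2)))
}

# Made-up jagged "coastline" in arbitrary planar units.
x <- seq(0, 10, by = 0.25)
coast <- cbind(x = x, y = 0.5 * sin(4 * x))

# Crude simplification for illustration only: keep every 4th vertex,
# plus the last one so the end points stay the same.
keep <- unique(c(seq(1, nrow(coast), by = 4), nrow(coast)))
coast_simple <- coast[keep, ]

line_length(coast)         # length measured on the detailed line
line_length(coast_simple)  # shorter: the removed detail no longer counts
```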

ggplot example

Less speedy plot with ggplot, but it looks so nice! NOTE: ggplot throws an error if you don’t name the data = argument on your geom_ element. Be sure to always include it explicitly.

# turn the data into a spatial data frame
coastlines_sim2_df <- SpatialLinesDataFrame(coastlines_sim2, coastlines@data)
# tidy(coastlines_sim2_df)
# plot the data
ggplot() +
  geom_path(data = coastlines_sim2_df, aes(x = long, y = lat, group = group)) +
  labs(title = "Global Coastlines - using ggplot")
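To see what the long, lat and group columns in that call are doing, here is a tiny stand-alone sketch with a made-up data frame in place of the coastlines. The column names mirror the ones used above; the coordinates are invented. The group aesthetic tells geom_path which vertices belong to the same line. Without it, the end of one line would be joined to the start of the next.

```r
# Made-up data: two separate lines, labelled by `group`.
lines_df <- data.frame(
  long  = c(-10, 0, 10, -10, 0, 10),
  lat   = c(0, 5, 0, 20, 25, 20),
  group = rep(c("line1", "line2"), each = 3)
)

# Only draw the plot if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot() +
    geom_path(data = lines_df, aes(x = long, y = lat, group = group)) +
    labs(title = "Two lines drawn with geom_path")
  print(p)
}
```

Note that data = is named explicitly inside geom_path(), as recommended above.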