Leah A. Wasser
In this lesson we will learn how to perform some basic cleaning and plotting of spatial data in R.
At the end of this 30 minute overview you will be able to:
1. Open a vector data layer in R using readOGR()
2. Open a raster data layer in R using raster()
3. Create basic maps using ggplot()
4. Reproject and crop raster and vector data
Work with vector data in R
Point, line, or polygon features can be stored within a vector dataset.
There are many ways to import and map vector data in R.
To read the data, you have several options:
- rgdal / sp: import shapefiles and other data using readOGR(), which returns sp objects
- sf: more recently, the sf package has proved to be both faster and more efficient than the older sp-based tools
- if you have geojson data, there are several json packages that you can use. Check out this tutorial on dealing with geojson imported from APIs in R if you're interested in learning more.
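As a rough illustration of the sf option listed above, the sketch below writes a tiny GeoJSON feature to a temporary file and reads it back with sf::st_read(). The GeoJSON string, the coordinates, and the names geojson_txt, tmp_file and demo_line are made up for this example and are not part of the lesson data; it assumes the sf package is installed.

```r
library(sf)

# A hypothetical one-feature GeoJSON document (not part of the lesson data)
geojson_txt <- '{"type":"FeatureCollection","features":[{"type":"Feature","properties":{"name":"demo"},"geometry":{"type":"LineString","coordinates":[[-105.3,40.0],[-105.2,40.1],[-105.1,40.0]]}}]}'

# Write it to a temporary .geojson file so st_read() can open it like any other vector file
tmp_file <- tempfile(fileext = ".geojson")
writeLines(geojson_txt, tmp_file)

# Read the file; quiet = TRUE suppresses the driver summary
demo_line <- st_read(tmp_file, quiet = TRUE)

# Inspect the geometry type and the attribute table
st_geometry_type(demo_line)
st_drop_geometry(demo_line)
```

The same st_read() call also opens shapefiles, so it could stand in for readOGR() if you prefer to work with sf objects.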
# unzip data
# library(utils)
setwd("~/Documents/data/oss-institute")
# setwd("~/Documents/github/oss-lessons/spatial-data-gis-law")
library(rgdal)
library(raster)
library(ggplot2)
library(rgeos)
library(mapview)
library(leaflet)
library(broom) # if you plot with ggplot and need to turn sp data into dataframes
options(stringsAsFactors = FALSE)
First, let's download some data from Natural Earth.
# download the data
download.file("http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_coastline.zip",
              destfile = 'coastlines.zip')
# unzip the file
unzip(zipfile = "coastlines.zip", exdir = 'ne-coastlines-10m')
Next, we can open the data using readOGR() from the rgdal package, which returns an sp (spatial) object.
# load the data
coastlines <- readOGR("ne-coastlines-10m/ne_10m_coastline.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "ne-coastlines-10m/ne_10m_coastline.shp", layer: "ne_10m_coastline"
## with 4132 features
## It has 2 fields
Are the data points, lines, or polygons?
CHALLENGE – Looking at the data, what are the 2 possible vector data structures that this data could be stored in?
# view spatial attributes
class(coastlines)
## [1] "SpatialLinesDataFrame"
## attr(,"package")
## [1] "sp"
extent(coastlines)
## class : Extent
## xmin : -180
## xmax : 180
## ymin : -85.22194
## ymax : 83.6341
crs(coastlines)
## CRS arguments:
## +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
A super speedy quick plot with base R plot() ... or not. Be patient: this object has a lot of complex features.
plot(coastlines, main = "Global Coastlines")
This particular layer is complex. The boundaries contain many details that we may want if we zoom in but may not need to produce a global map. Let's simplify it. The gSimplify function is a part of the rgeos package. It removes vertices from complex lines. Remember that a line is composed of vertices. A circle is simply a line with lots of vertices: the more vertices it has, the more 'round' the line appears.
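To make the point about vertices concrete, here is a small base R sketch that draws the same "circle" with 6, 12 and 100 vertices. The helpers circle_vertices() and line_length() are hypothetical names written for this illustration and are not part of rgeos or sp.

```r
# Vertices of a regular polygon inscribed in a unit circle.
# circle_vertices() is a hypothetical helper written for this example.
circle_vertices <- function(n) {
  # n + 1 angles so the first vertex is repeated and the ring closes
  theta <- seq(0, 2 * pi, length.out = n + 1)
  cbind(x = cos(theta), y = sin(theta))
}

# Length of a line: the sum of distances between consecutive vertices
line_length <- function(xy) {
  sum(sqrt(rowSums(diff(xy)^2)))
}

# Draw the same shape with an increasing number of vertices
op <- par(mfrow = c(1, 3))
for (n in c(6, 12, 100)) {
  xy <- circle_vertices(n)
  plot(xy, type = "l", asp = 1, main = paste(n, "vertices"))
}
par(op)

# The drawn length approaches the true circumference (2 * pi) as vertices are added
sapply(c(6, 12, 100), function(n) line_length(circle_vertices(n)))
```

Removing vertices runs this process in reverse: the shape gets blockier and its measured length moves away from the true value.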
As you use this function keep in mind that you are modifying your data. You probably don’t want to do this if you are performing any sort of quantitative analysis on the data but you definitely want to do this if you are creating online maps and other visual products from your data.
The gSimplify function takes 3 arguments:
- the data that you want to simplify
- tol - the tolerance value. A LARGER number will remove more vertices, make the data smaller AND yield a "blockier" looking object. A SMALLER number will retain more vertices and maintain a smoother looking feature.
- topologyPreserve - set to TRUE, as in the code below, to ask the algorithm to try to preserve the topology of the original geometry
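To show how a tolerance value controls how many vertices survive, the sketch below implements a plain Douglas-Peucker style simplification in base R and applies it to a synthetic wiggly line. This is an illustration of the general idea only, not the code that gSimplify runs; simplify_line(), point_line_dist() and the sine-wave test line are all assumptions made for this example.

```r
# Perpendicular distance from each row of p to the line through a and b.
# Hypothetical helper for this illustration.
point_line_dist <- function(p, a, b) {
  d <- b - a
  len <- sqrt(sum(d^2))
  if (len == 0) {
    # a and b coincide (closed ring): fall back to distance from a
    return(sqrt(rowSums(sweep(p, 2, a)^2)))
  }
  abs(d[1] * (p[, 2] - a[2]) - d[2] * (p[, 1] - a[1])) / len
}

# Douglas-Peucker style simplification: keep a vertex only if it lies
# farther than tol from the straight line joining the current endpoints.
simplify_line <- function(xy, tol) {
  n <- nrow(xy)
  if (n < 3) return(xy)
  d <- point_line_dist(xy[2:(n - 1), , drop = FALSE], xy[1, ], xy[n, ])
  i <- which.max(d)
  if (d[i] <= tol) {
    # every interior vertex is within tolerance: keep only the endpoints
    return(xy[c(1, n), , drop = FALSE])
  }
  split <- i + 1  # index of the farthest vertex in the full matrix
  left  <- simplify_line(xy[1:split, , drop = FALSE], tol)
  right <- simplify_line(xy[split:n, , drop = FALSE], tol)
  # drop the duplicated split vertex when joining the two halves
  rbind(left[-nrow(left), , drop = FALSE], right)
}

# A synthetic wiggly line with 101 vertices
x <- seq(0, 10, length.out = 101)
wiggly <- cbind(x = x, y = sin(x))

fine   <- simplify_line(wiggly, tol = 0.01)  # small tolerance: many vertices kept
coarse <- simplify_line(wiggly, tol = 0.5)   # large tolerance: few vertices kept

c(original = nrow(wiggly), fine = nrow(fine), coarse = nrow(coarse))

plot(wiggly, type = "l", col = "grey", lwd = 3, main = "Effect of tolerance")
lines(fine, col = "blue")
lines(coarse, col = "red")
```

The tolerance is expressed in the units of the coordinates, so for data in longitude / latitude a value such as 3 means three degrees.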
# simplify geometry
coastlines_simp <- gSimplify(coastlines, tol = 3, topologyPreserve = TRUE)
plot(coastlines_simp, main = "map with boundaries simplified")
Notice that here the map plots faster, but now it looks blocky. We may have simplified TOO MUCH. Let's reduce the tol = argument value to .1.
# simplify with a lower tolerance value (keeping more detail)
coastlines_sim2 <- gSimplify(coastlines, tol = .1, topologyPreserve = TRUE)
plot(coastlines_sim2, main = "Map of coastlines - simplified geometry\n tol=.1")
That’s better. We now have enough detail for plotting purposes but have increased speed dramatically. These types of steps become important when creating online interactive maps to optimize speed.
IMPORTANT: when you modify the geometry you are also modifying the data - in this case any calculated perimeter or area values using these data will be compromised.
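A quick way to see this effect is to measure a line before and after vertices are removed. The base R sketch below uses a synthetic line and simply keeps every 20th vertex, which is a cruder thinning than gSimplify but affects measurements in the same direction. The names path_length, detailed and thinned are made up for this example.

```r
# Length of a line: the sum of distances between consecutive vertices.
# path_length() is a hypothetical helper for this illustration.
path_length <- function(xy) {
  sum(sqrt(rowSums(diff(xy)^2)))
}

# A synthetic, detailed line with 401 vertices
x <- seq(0, 20, length.out = 401)
detailed <- cbind(x = x, y = sin(3 * x))

# Keep every 20th vertex (the first and last vertices are retained)
keep <- seq(1, nrow(detailed), by = 20)
thinned <- detailed[keep, ]

# Compare vertex counts and measured lengths
c(vertices_detailed = nrow(detailed), vertices_thinned = nrow(thinned))
c(length_detailed = path_length(detailed), length_thinned = path_length(thinned))
```

The thinned line is shorter because every removed bend is replaced by a straight segment, so lengths (and, for polygons, areas and perimeters) should be calculated on the original geometry.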
A less speedy plot with ggplot, but it looks so nice! NOTE: always name the data = argument explicitly in your geom_ element. The first positional argument of a geom_ function is mapping, not data, so an unnamed data object is misread and ggplot throws an error.
# turn the data into a spatial data frame
coastlines_sim2_df <- SpatialLinesDataFrame(coastlines_sim2, coastlines@data)
# tidy(coastlines_sim2_df)
# plot the data
ggplot() +
  geom_path(data = coastlines_sim2_df, aes(x = long, y = lat, group = group)) +
  labs(title = "Global Coastlines - using ggplot")
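The plot above relies on the long, lat and group columns that appear once the spatial object has been turned into a data frame. The sketch below shows the same geom_path() pattern on a small hand-made data frame so the role of each column is easier to see. The coast_df data frame and its values are invented for this illustration; it assumes ggplot2 is installed.

```r
library(ggplot2)

# Hypothetical stand-in for a tidied spatial lines object:
# one row per vertex, with group identifying which line the vertex belongs to
coast_df <- data.frame(
  long  = c(-10, -5, 0, 5, 20, 25, 30),
  lat   = c(0, 5, 0, 5, 10, 15, 10),
  group = c("a", "a", "a", "a", "b", "b", "b")
)

# data = is named explicitly; group keeps the two lines from being joined
p <- ggplot() +
  geom_path(data = coast_df, aes(x = long, y = lat, group = group)) +
  labs(title = "Two line features drawn with geom_path()")

print(p)
```

Without group = group in aes(), ggplot would connect the last vertex of line "a" to the first vertex of line "b", which is why the coastline plot needs that mapping too.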