*Leah A. Wasser*

### Introduction

In this lesson we will learn how to perform some basic cleaning and plotting of spatial data in R.

### Learning outcomes

At the end of this 30 minute overview you will be able to:

1. Open a vector data layer in R using `readOGR()`

1. Open a raster data layer in R using `raster()`

1. Create basic maps using `ggplot()`

1. Reproject and crop raster and vector data

## Work with vector data in R

Intro to vector data in R - Earth Data Science website

Point, line or polygon features can be stored within a vector dataset.

There are many ways to import and map vector data in R.

To read the data, you have several options:

- `sp`: Import shapefiles and other data using `readOGR()` from the `rgdal` package, which returns `sp` objects.
- `sf`: More recently, the `sf` package has proved to be both faster and more efficient than `sp`.
- If you have geojson data, there are several json packages that you can use. Check out this tutorial on dealing with geojson imported from APIs in R if you’re interested in learning more.

```r
# set the working directory and load the packages used in this lesson
# library(utils)
setwd("~/Documents/data/oss-institute")
# setwd("~/Documents/github/oss-lessons/spatial-data-gis-law")
library(rgdal)
library(raster)
library(ggplot2)
library(rgeos)
library(mapview)
library(leaflet)
library(broom) # if you plot with ggplot and need to turn sp data into dataframes
options(stringsAsFactors = FALSE)
```

First, let’s download some data from Natural Earth.

```r
# download the data
download.file("http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_coastline.zip",
              destfile = 'coastlines.zip')
# unzip the file
unzip(zipfile = "coastlines.zip", exdir = 'ne-coastlines-10m')
```

Next, we can open the data using `readOGR()` from the `rgdal` package, which returns an `sp` (spatial) object.

```r
# load the data
coastlines <- readOGR("ne-coastlines-10m/ne_10m_coastline.shp")
```

```
## OGR data source with driver: ESRI Shapefile
## Source: "ne-coastlines-10m/ne_10m_coastline.shp", layer: "ne_10m_coastline"
## with 4132 features
## It has 2 fields
```

Are the data points, lines or polygons?

CHALLENGE – Looking at the data, what are the 2 possible vector data structures that this data could be stored in?

```r
# view spatial attributes
class(coastlines)
```

```
## [1] "SpatialLinesDataFrame"
## attr(,"package")
## [1] "sp"
```

`extent(coastlines)`

```
## class       : Extent
## xmin        : -180
## xmax        : 180
## ymin        : -85.22194
## ymax        : 83.6341
```

`crs(coastlines)`

```
## CRS arguments:
##  +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
```
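For comparison, the same read-and-inspect steps can be sketched with the `sf` package mentioned earlier. This is only a sketch: it assumes `sf` is installed and that the coastline shapefile sits at the path used in this lesson. The helper `read_vector_sf()` is a hypothetical name introduced here for illustration; it returns `NULL` instead of failing when the package or the file is missing.

```r
# Hypothetical helper (not part of sf): read a vector layer with sf when
# both the package and the file are available, otherwise return NULL.
read_vector_sf <- function(path) {
  if (!requireNamespace("sf", quietly = TRUE) || !file.exists(path)) {
    return(NULL)
  }
  sf::st_read(path, quiet = TRUE)
}

# Path assumed to match the unzip step in this lesson
coastlines_sf <- read_vector_sf("ne-coastlines-10m/ne_10m_coastline.shp")

if (!is.null(coastlines_sf)) {
  # the sf counterparts of class(), extent() and crs() used above
  print(class(coastlines_sf))
  print(sf::st_geometry_type(coastlines_sf, by_geometry = FALSE))
  print(sf::st_bbox(coastlines_sf))
  print(sf::st_crs(coastlines_sf))
}
```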

Super speedy quick plot with R base plot … or not. Be patient: this object has a lot of complex features.

`plot(coastlines, main = "Global Coastlines")`

This particular layer is complex. The boundaries as rendered contain many details that we may want if we zoom in but do not need to produce a global map. Let’s **simplify** it. The `gSimplify` function is a part of the rgeos package. It removes vertices from complex lines. Remember that a line is composed of vertices. A circle is simply a line with lots of vertices: the more vertices it has, the more ‘round’ the line appears.

As you use this function keep in mind that you are modifying your data. You probably don’t want to do this if you are performing any sort of quantitative analysis on the data but you definitely want to do this if you are creating online maps and other visual products from your data.

The `gSimplify` function takes 3 arguments:

- the data that you want to simplify
- `tol` - the **tol**erance value. A LARGER number will remove more vertices, make the data smaller AND yield a “blockier” looking object. A SMALLER number will retain more vertices and maintain a smoother looking feature.
- `topologyPreserve` - a logical value. When `TRUE`, the algorithm attempts to preserve the topology of the original geometry.
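To get a feel for what the tolerance does, here is a minimal base R sketch of Douglas-Peucker style line simplification applied to a half circle. It is an illustration only, not the code that `gSimplify` runs internally; the function names `point_segment_dist()` and `simplify_line()` are hypothetical and introduced just for this example.

```r
# Distance from each point (rows of pts) to the segment between a and b
point_segment_dist <- function(pts, a, b) {
  ab <- b - a
  len2 <- sum(ab^2)
  if (len2 == 0) {
    # a and b coincide: fall back to the distance to that single point
    return(sqrt(rowSums(sweep(pts, 2, a)^2)))
  }
  # position of each projected point along the segment, clamped to [0, 1]
  t <- as.vector(sweep(pts, 2, a) %*% ab) / len2
  t <- pmin(pmax(t, 0), 1)
  proj <- sweep(outer(t, ab), 2, a, "+")
  sqrt(rowSums((pts - proj)^2))
}

# Recursively drop vertices that lie within tol of the simplified line
simplify_line <- function(xy, tol) {
  n <- nrow(xy)
  if (n < 3) return(xy)
  inner <- xy[2:(n - 1), , drop = FALSE]
  d <- point_segment_dist(inner, xy[1, ], xy[n, ])
  if (max(d) <= tol) {
    # every interior vertex is close enough: keep only the end points
    return(xy[c(1, n), , drop = FALSE])
  }
  k <- which.max(d) + 1 # row in xy of the farthest vertex
  left <- simplify_line(xy[1:k, , drop = FALSE], tol)
  right <- simplify_line(xy[k:n, , drop = FALSE], tol)
  # the split vertex ends `left` and starts `right`, so drop one copy
  rbind(left[-nrow(left), , drop = FALSE], right)
}

# A half circle made of 181 vertices
theta <- seq(0, pi, length.out = 181)
half_circle <- cbind(x = cos(theta), y = sin(theta))

blocky <- simplify_line(half_circle, tol = 0.3)  # large tolerance
smooth <- simplify_line(half_circle, tol = 0.01) # small tolerance

# vertex counts: original, small tolerance, large tolerance
c(nrow(half_circle), nrow(smooth), nrow(blocky))
```

The larger tolerance keeps far fewer vertices, which is why the simplified shape looks blockier.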

```r
# simplify geometry
coastlines_simp <- gSimplify(coastlines, tol = 3, topologyPreserve = TRUE)
```

`plot(coastlines_simp, main = "map with boundaries simplified")`

Notice that here the map plots faster, but now it looks blocky. We may have simplified TOO MUCH. Let’s reduce the `tol =` argument value to 0.1.

```r
# simplify with a lower tolerance value (keeping more detail)
coastlines_sim2 <- gSimplify(coastlines, tol = .1, topologyPreserve = TRUE)
plot(coastlines_sim2, main = "Map of coastlines - simplified geometry\n tol=.1")
```

That’s better. We now have enough detail for plotting purposes but have increased speed dramatically. These types of steps become important when creating online interactive maps to optimize speed.

IMPORTANT: when you modify the geometry you are also modifying the data - in this case any calculated perimeter or area values using these data will be compromised.
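The effect on measurements can be seen with a small base R sketch. It compares the perimeter and area of a unit circle drawn with 360 vertices against the same circle drawn with only 8. The helper names (`circle_vertices()`, `polygon_perimeter()`, `polygon_area()`) are hypothetical and exist only for this illustration; the coordinates are unitless, not real coastline data.

```r
# Vertices of a unit circle approximated by n evenly spaced points
circle_vertices <- function(n) {
  theta <- seq(0, 2 * pi, length.out = n + 1)[-(n + 1)]
  cbind(x = cos(theta), y = sin(theta))
}

# Perimeter of a closed polygon given as a matrix of x, y vertices
polygon_perimeter <- function(xy) {
  closed <- rbind(xy, xy[1, , drop = FALSE]) # repeat first vertex to close
  sum(sqrt(rowSums(diff(closed)^2)))
}

# Area of a closed polygon using the shoelace formula
polygon_area <- function(xy) {
  x <- xy[, 1]
  y <- xy[, 2]
  x_next <- c(x[-1], x[1])
  y_next <- c(y[-1], y[1])
  abs(sum(x * y_next - x_next * y)) / 2
}

detailed <- circle_vertices(360)
blocky <- circle_vertices(8)

# compare both shapes with the true circle values (2 * pi and pi)
data.frame(
  vertices  = c(nrow(detailed), nrow(blocky)),
  perimeter = c(polygon_perimeter(detailed), polygon_perimeter(blocky)),
  area      = c(polygon_area(detailed), polygon_area(blocky))
)
```

The 8-vertex version underestimates both values, which is the same kind of error that simplification introduces into real perimeter and area calculations.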

## ggplot example

Less speedy plot with ggplot, but it looks so nice! NOTE: ggplot throws an error if you don’t include the `data =` argument on your `geom_` element. Be sure to always explicitly include it.

```r
# turn the data into a spatial data frame
coastlines_sim2_df <- SpatialLinesDataFrame(coastlines_sim2, coastlines@data)
# tidy(coastlines_sim2_df)
# plot the data
ggplot() +
  geom_path(data = coastlines_sim2_df, aes(x = long, y = lat, group = group)) +
  labs(title = "Global Coastlines - using ggplot")
```
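The `group = group` aesthetic in the plot above is what keeps separate coastline segments from being joined into one continuous path. The sketch below shows the idea with a tiny hand-made data frame that uses the same `long`, `lat` and `group` column names. The coordinates and group labels are made up for illustration, and the plotting step only runs if `ggplot2` is installed.

```r
# Two separate lines stored in one data frame (made-up coordinates)
lines_df <- data.frame(
  long  = c(-10, 0, 10, 20, 30),
  lat   = c(0, 5, 0, 40, 45),
  group = c("1.1", "1.1", "1.1", "2.1", "2.1")
)

# Each group value identifies the vertices that belong to one path
paths <- split(lines_df[, c("long", "lat")], lines_df$group)
sapply(paths, nrow)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # with group = group, the two lines are drawn separately
  p <- ggplot() +
    geom_path(data = lines_df, aes(x = long, y = lat, group = group)) +
    labs(title = "Two lines, one data frame")
  print(p)
}
```

Without the `group` mapping, `geom_path()` would connect the last vertex of the first line to the first vertex of the second.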