💾 Archived View for sylvaindurand.org › spatial-data-analysis-with-r › index.gmi captured on 2022-04-28 at 17:51:59. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
If R language has already become a reference in statistical analysis and data processing, it may be thanks to its hability to represent and visualize data. They can be used in order to visualize spatial data in the form of cartographic representations which, combined with its other features, makes it an excellent geographic information system. This article sets out to show, through the provision of relevant example, how R can handle spatial data by creating maps.
Once R is installed on your computer, few libraries will be used: `rgdal` allows us to import and project shapefiles, `plotrix` creates color scales, and `classInt` assigns colors to map data. Once the libraries installed with `install.packages`, load them at the beginning of the session:
library('rgdal') # Reading and projecting shapefiles library('plotrix') # Creating color scales library('classInt') # Assigning colors to data
Graphics will be plot using R base functions. `ggplot2` is an alternative, but it seems less relevant here: longer and less legible code, unability to plot holes inside polygones, `fortify` and ploting can last much longer.
The `rgdal` library provides `readOGR()` in order to read shapefiles. `dsn` must contain the path where shapefiles are located, and `layer` the shapefile name, without extension. `readOGR` reads `.shp`, `.shx`, `.dbf` and `.prj` files. Departements of France are given by Geofla:
# Reading departements departements <- readOGR(dsn="shp/geofla", layer="DEPARTEMENT") # Reading departements boundaries in order to plot France boundaries bounderies <- readOGR(dsn="shp/geofla", layer="LIMITE_DEPARTEMENT") bounderies <- bounderies[bounderies$NATURE %in% c('Fronti\xe8re internationale','Limite c\xf4ti\xe8re'),]
In order to show neighbouring countries, we will use data provided by Natural Earth. We will select Europe countries only:
# Reading country and selecting Europe europe <- readOGR(dsn="shp/ne/cultural", layer="ne_10m_admin_0_countries") europe <- europe[europe$REGION_UN=="Europe",]
The map will use the French official projection "Lambert 93", already declared in the Geofla `.prj` files. `spTransform` will be used for the European coutries.
Then, we will first plot French boundaries, in order to center the map on France. Borders colors are defined in `border`, their tickness in `lwd` and the filling color in `col`.
# Projection europe <- spTransform(europe, CRS("+init=epsg:2154")) # Plot pdf('france.pdf',width=6,height=4.7) par(mar=c(0,0,0,0)) plot(bounderies, col="#FFFFFF") plot(europe, col="#E6E6E6", border="#AAAAAA",lwd=1, add=TRUE) plot(bounderies, col="#D8D6D4", lwd=6, add=TRUE) plot(departements,col="#FFFFFF", border="#CCCCCC",lwd=.7, add=TRUE) plot(bounderies, col="#666666", lwd=1, add=TRUE) dev.off()
The very large number of communes (the smallest administrative level in France) gives us excellent spatial data. Geofla provides us their bounderies, population and area. So we will plot population density:
# Reading shapefile communes <- readOGR(dsn="shp/geofla", layer="COMMUNE") # Calculate density communes$DENSITY <- communes$POPULATION/communes$SUPERFICIE*100000
In order to create a color scale, we will assign shades of blue to each percentile. `classIntervals` calculates percentiles, `smoothColors` create the blue scale, and `findColours` assigns blues depending on each commune depending on their population density. Then, we create a legend, with only five colors.
# Color scale col <- findColours(classIntervals( communes$DENSITY, 100, style="quantile"), smoothColors("#0C3269",98,"white")) # Legend leg <- findColours(classIntervals( round(communes$DENSITY), 5, style="quantile"), smoothColors("#0C3269",3,"white"), under="less than", over="more than", between="–", cutlabels=FALSE)
We can now plot the map. In order to use an embedded font in the PDF, we will use `cairo_pdf()` instead of `pdf`:
# Exporting in PDF cairo_pdf('densite.pdf',width=6,height=4.7) par(mar=c(0,0,0,0),family="Myriad Pro",ps=8) # Ploting the map plot(bounderies, col="#FFFFFF") plot(europe, col="#E6E6E6", border="#AAAAAA",lwd=1, add=TRUE) plot(bounderies, col="#D8D6D4", lwd=6, add=TRUE) plot(communes, col=col, border=col, lwd=.1, add=TRUE) plot(bounderies, col="#666666", lwd=1, add=TRUE) # Ploting the legend legend(-10000,6387500,fill=attr(leg, "palette"), legend=names(attr(leg,"table")), title = "Density (p/km²):") dev.off()
Image: Population density in France
The main value is to plot data provided by external files. We will plot the median taxable income per consumption unit.
Unfortunately, data is missing for more than 5 000 communes, due to tax secrecy. We can "cheat" in order to improve the global render by assigning to those communes the canton (a larger administrative level) median income, given in the same file.
# Loading communes data incomes <- read.csv('csv/revenus.csv') communes <- merge(communes, incomes, by.x="INSEE_COM", by.y="COMMUNE") # Loading cantons data cantons <- read.csv('csv/cantons.csv') # Merging data communes <- merge(communes, cantons, by="CANTON", all.x=TRUE) # Assigning canton median income to communes with missing data communes$REVENUS[is.na(communes$REVENUS)] <- communes$REVENUC[is.na(communes$REVENUS)]
We generate color scale and legend:
col <- findColours(classIntervals( communes$REVENUS, 100, style="quantile"), smoothColors("#FFFFD7",98,"#F3674C")) leg <- findColours(classIntervals( round(communes$REVENUS,0), 4, style="quantile"), smoothColors("#FFFFD7",2,"#F3674C"), under="moins de", over="plus de", between="–", cutlabels=FALSE)
And we plot:
cairo_pdf('incomes.pdf',width=6,height=4.7) par(mar=c(0,0,0,0),family="Myriad Pro",ps=8) plot(boundaries, col="#FFFFFF") plot(europe, col="#F5F5F5", border="#AAAAAA",lwd=1, add=TRUE) plot(boundaries, col="#D8D6D4", lwd=6, add=TRUE) plot(communes, col=col, border=col, lwd=.1, add=TRUE) plot(boundaries, col="#666666", lwd=1, add=TRUE) legend(-10000,6337500,fill=attr(leg, "palette"), legend=gsub("\\.", ",", names(attr(leg,"table"))), title = "Median income:") dev.off()
Image: Median taxable income per consumption unit in France in 2010
We can also add map data: cities, urban areas, rivers, forests... We will here plot the French road network, thanks to Route 500, from IGN:
roads <- readOGR(dsn="shp/geofla", layer="TRONCON_ROUTE") local <- roads[roads$VOCATION=="Liaison locale",] principal <- roads[roads$VOCATION=="Liaison principale",] regional <- roads[roads$VOCATION=="Liaison r\xe9gionale",] highway <- roads[roads$VOCATION=="Type autoroutier",] cairo_pdf('roads.pdf',width=6,height=4.7) par(mar=c(0,0,0,0),family="Myriad Pro",ps=8) plot(boundaries, col="#FFFFFF") plot(europe, col="#F5F5F5", border="#AAAAAA",lwd=1, add=TRUE) plot(boundaries, col="#D8D6D4", lwd=6, add=TRUE) plot(departements,col="#FFFFFF", border="#FFFFFF",lwd=.7, add=TRUE) plot(local, col="#CBCBCB", lwd=.1, add=TRUE) plot(principal, col="#CCCCCC", lwd=.3, add=TRUE) plot(regional, col="#BBBBBB", lwd=.5, add=TRUE) plot(highway, col="#AAAAAA", lwd=.7, add=TRUE) plot(boundaries, col="#666666", lwd=1, add=TRUE) dev.off()
We will load the following Natural Earth:
# Loading Shapefiles countries <- readOGR(dsn="shp/ne/cultural",layer="ne_110m_admin_0_countries") graticules <- readOGR(dsn="shp/ne/physical",layer="ne_110m_graticules_10") bbox <- readOGR(dsn="shp/ne/physical",layer="ne_110m_wgs84_bounding_box")
We will use the Winkel Tripel projection:
# Winkel Tripel projection countries <- spTransform(countries,CRS("+proj=wintri")) bbox <- spTransform(bbox, CRS("+proj=wintri")) graticules <- spTransform(graticules, CRS("+proj=wintri")) # Ploting world map pdf('world.pdf',width=10,height=6) # PDF export par(mar=c(0,0,0,0)) # Zero margins plot(bbox, col="white", border="grey90",lwd=1) plot(countries, col="#E6E6E6", border="#AAAAAA",lwd=1, add=TRUE) plot(graticules, col="#CCCCCC33", lwd=1, add=TRUE) dev.off() # Saving file
Image: Carte du monde projetée en Winkel Tripel
Most frequent usage consists of visualizing data with a color scale. Let's plot the HDI, provided by the United Nations Develment Programme in a CSV file. The procedure is as described above:
# Loading data and merging dataframes hdi <- read.csv('csv/hdi.csv') countries <- merge(countries, hdi, by.x="iso_a3", by.y="Abbreviation") # Converting HDI in numeric countries$hdi <- as.numeric(levels(countries$X2012.HDI.Value))[countries$X2012.HDI.Value] # Generating color scale and assigning colors col <- findColours(classIntervals(countries$hdi, 100, style="pretty"), smoothColors("#ffffe5",98,"#00441b")) # Assigning grey to missing data col[is.na(countries$hdi)] <- "#DDDDDD" # Generating legend leg <- findColours(classIntervals(round(countries$hdi,3), 7, style="pretty"), smoothColors("#ffffe5",5,"#00441b"), under="less than", over="more than", between="–", cutlabels=FALSE) # Ploting cairo_pdf('hdi.pdf',width=10,height=6) par(mar=c(0,0,0,0),family="Myriad Pro",ps=8) plot(bbox, col="white", border="grey90",lwd=1) plot(countries, col=col, border=col,lwd=.8, add=TRUE) plot(graticules,col="#00000009",lwd=1, add=TRUE) legend(-15000000,-3000000,fill=attr(leg, "palette"), legend= names(attr(leg,"table")), title = "HDI in 2012 :") dev.off()
Image: Human Development Index (HDI) in 2012
An other kind of visualization is given by circles. Population of most populated cities is provided by Natural Earth:
# Loading shapefile cities <- readOGR(dsn="shp/ne/cultural", layer="ne_110m_populated_places") cities <- spTransform(cities, CRS("+proj=wintri"))
The data shall be proportionate to the circles areas, not the radius; so the radius is the square root of the population:
# Calculating circles radius cities$radius <- sqrt(cities$POP2015) cities$radius <- cities$radius/max(cities$radius)*3
We plot the map:
pdf('cities.pdf',width=10,height=6) par(mar=c(0,0,0,0)) plot(bbox, col="white", border="grey90", lwd=1) plot(countries, col="#E6E6E6", border="#AAAAAA", lwd=1, add=TRUE) points(cities, col="#8D111766", bg="#8D111766", lwd=1, pch=21, cex=cities$radius) plot(graticules, col="#CCCCCC33", lwd=1, add=TRUE) dev.off()
Image: Most populated cities in the world
Natural Earth provides urban areas shapefiles, derived from satellite data. Let'us map them with night lights colors:
areas <- readOGR(dsn="shp/ne/cultural",layer="ne_10m_urban_areas") areas <- spTransform(areas, CRS("+proj=wintri")) pdf('areas.pdf',width=10,height=6) par(mar=c(0,0,0,0)) plot(bbox, col="#000000", border="#000000",lwd=1) plot(countries, col="#000000", border="#000000",lwd=1, add=TRUE) plot(areas, col="#FFFFFF", border="#FFFFFF66",lwd=1.5, add=TRUE) dev.off()