This is going to be a series of visualization with R. The main packages used are ggplot2
and ggvis
, and probably shiny
. Hadley Wickham sees ggvis
as the next generation of ggplot2
, it is worth plotting the map by both packages. The first research subject is the China map.
The files (of different format) used for plotting geometry graphics are called ArcGIS files. The 3 key files are of extended name “.shp”, “.dbf” and “.shx”.
For the current subject, we only need the 3 files.
bou2_4p.shp
: The main file that stores the feature geometry.bou2_4p.dbf
: The dBASE table that stores the attribute information of features. There is a one-to-one relationship between geometry and attributes, which is based on record number. Attribute records in the dBASE file must be in the same order as records in the main file.bou2_4p.shx
: The index file that stores the index of the feature geometry.
We use 4 packages for the study.
|
|
Data Structure of .shp File
Normally before we plot something, we need to know the structure of data. The function used to read shape files is maptools::readShapePoly()
:
|
|
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
The direct method to check the structure is to use str(cnmap)
. I don’t suggest to do that but… If you already did… You will find there are 5 main slots:
@data
: adata.frame
mapping the 925 polygons which consist the
entire China map.@polygons
: a list of 925 polygons containing geometric points data@plotOrder
: orders to plot these polygons@bbox
and@proj4string
: not used
|
|
## 'data.frame': 925 obs. of 7 variables:
## $ AREA : num 54.4 129.1 175.6 21.3 15.6 ...
## $ PERIMETER : num 68.5 129.9 84.9 41.2 38.4 ...
## $ BOU2_4M_ : int 2 3 4 5 6 7 8 9 10 11 ...
## $ BOU2_4M_ID: int 23 15 65 22 21 62 13 11 292 292 ...
## $ ADCODE93 : int 230000 150000 650000 220000 210000 620000 130000 110000 210000 210000 ...
## $ ADCODE99 : int 230000 150000 650000 220000 210000 620000 130000 110000 210000 210000 ...
## $ NAME : Factor w/ 33 levels "¸£½¨Ê¡","¸ÊËàÊ¡",..: 33 13 15 5 11 2 28 3 11 11 ...
## - attr(*, "data_types")= chr "N" "N" "N" "N" ...
|
|
## Formal class 'Polygons' [package "sp"] with 5 slots
## ..@ Polygons :List of 1
## .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
## .. .. .. ..@ labpt : num [1:2] 127.8 47.9
## .. .. .. ..@ area : num 54.4
## .. .. .. ..@ hole : logi FALSE
## .. .. .. ..@ ringDir: int 1
## .. .. .. ..@ coords : num [1:5784, 1:2] 121 121 122 122 122 ...
## ..@ plotOrder: int 1
## ..@ labpt : num [1:2] 127.8 47.9
## ..@ ID : chr "0"
## ..@ area : num 54.4
If you use English operation system, you may find messy codes in Column NAME
of the 1st slot. Try this (under WIN7):
|
|
## [1] "LC_COLLATE=Chinese (Simplified)_People's Republic of China.936;LC_CTYPE=Chinese (Simplified)_People's Republic of China.936;LC_MONETARY=Chinese (Simplified)_People's Republic of China.936;LC_NUMERIC=C;LC_TIME=Chinese (Simplified)_People's Republic of China.936"
# in Linux or OS X system, the value for `locale` should be "zh_CN"
Then set the default text coding as UTF-8
.
If you want to label the province in the future in English, here is a solution:
|
|
Directly Draw China Map
plot()
is the easiest function we can use for the map.
|
|
In the map, we can see Nansha (Spratly) Islands and lots of the small islands surrounding the south-east coast (that’s why there are more than 900 polygons). We can simply remove them by filtering the AREA
(the 1st column of the 1st slot).
|
|
## [1] 56
Now we only need to plot 56 polygons.
|
|
Draw China Map Using ggplot2
Package
Now we try ggplot2
. ggplot2::fortify()
is a amazing function which converts a generic R object into a data frame useful for plotting. When you plot a SpatialPolygonsDataFrame
data, ggplot2
converts it by default to data.frame
.
|
|
## Regions defined for each Polygons
## 'data.frame': 79095 obs. of 7 variables:
## $ long : num 121 121 122 122 122 ...
## $ lat : num 53.3 53.3 53.3 53.3 53.3 ...
## $ order: int 1 2 3 4 5 6 7 8 9 10 ...
## $ hole : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ piece: Factor w/ 53 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ group: Factor w/ 109 levels "0.1","1.1","2.1",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ id : chr "0" "0" "0" "0" ...
The first 2 columns are long
(longitude) and lat
(latitude) which are used to sketch the contour of polygons.
|
|
|
|
The difference between the 2 plots is the map projected coordinate system. There are dozens of projection methods, you could enter ?mapproject
to check more.
Draw China Map Using ggvis
Package
I didn’t find if there is a function like fortify()
in ggvis
package, we need to manually convert it to a data.frame
and use group_by
to group the polygons.
|
|