What is Principal Component Analysis in GIS?
PCA. You’ve probably seen this acronym before. PCA stands for “Principal Component Analysis”.
But what is it and how is it used in GIS and remote sensing?
Sometimes variables are so highly correlated that one largely duplicates the information found in another. Principal component analysis identifies this duplicate information across several datasets. The essential information is condensed into a small number of new, uncorrelated variables called “principal components”.
The power of PCA is that it creates a new dataset with only the essential information.
The bottom line is that you reduce redundancy when using PCA.
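To make the idea of redundancy concrete, here is a minimal NumPy sketch. It is not part of any GIS workflow. The three variables are synthetic and hypothetical: `b` is nearly a scaled copy of `a`, while `c` is independent. The sketch standardizes the variables, which is a choice made for this illustration, and reports the share of variance behind each component.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 1,000 samples of three variables.
# b is almost a scaled copy of a (redundant); c is independent.
a = rng.normal(size=1000)
b = 2.0 * a + rng.normal(scale=0.1, size=1000)
c = rng.normal(size=1000)
data = np.column_stack([a, b, c])

# Standardize so that no variable dominates because of its units.
z = (data - data.mean(axis=0)) / data.std(axis=0)

# Eigen-decomposition of the covariance matrix of the standardized data.
cov = np.cov(z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; sort them descending.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of the total variance captured by each principal component.
percent = 100.0 * eigenvalues / eigenvalues.sum()
print(np.round(percent, 1))
```

Because `a` and `b` carry almost the same information, the last component should capture only a tiny share of the variance, so two components are enough to describe all three variables.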
Principal Component Analysis Example in ArcGIS
What about elevation, slope and hillshade data? Is there redundancy in these three datasets?
Here’s how to run a PCA with elevation, hillshade and slope bands in ArcGIS:
1 Run the “Composite Bands” tool
The composite bands tool combines the elevation, hillshade and slope rasters into a single 3-band raster. Use the following rasters as inputs:
- Band 1: Elevation
- Band 2: Hillshade
- Band 3: Slope
Output the new raster as Composite
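Outside ArcGIS, the same stacking idea can be sketched with NumPy arrays. This is only an analogy to the Composite Bands tool, not the tool itself. The three arrays below are random, hypothetical stand-ins for real elevation, hillshade and slope rasters.

```python
import numpy as np

rng = np.random.default_rng(1)
rows, cols = 4, 5

# Hypothetical stand-ins for the three single-band rasters.
elevation = rng.uniform(200, 1500, size=(rows, cols))
hillshade = rng.uniform(0, 255, size=(rows, cols))
slope = rng.uniform(0, 60, size=(rows, cols))

# Stack in band order (elevation, hillshade, slope); axis 0 is the band index.
composite = np.stack([elevation, hillshade, slope], axis=0)

# PCA treats every pixel as one observation with three variables,
# so flatten the grid into a (pixels, bands) table.
pixels = composite.reshape(composite.shape[0], -1).T

print(composite.shape, pixels.shape)
```

The band order matters only for reading the results later: column 0 of `pixels` is elevation, column 1 is hillshade and column 2 is slope.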
2 Execute the “Principal Components” tool
Using the Spatial Analyst extension, execute the “Principal Components” tool with the following parameters:
- Input Raster: Composite
- Output Raster: PCA
- Number of Principal Components: 3
- Output Data File: PrincipalComponents.txt
The result will be a 3-band PCA raster and a data file listing the eigenvalues, which indicate how much redundancy there is among the input bands.
3 Analyze the Principal Components Table
The “Percent of Eigenvalues” column shows the share of the total variance that each principal component accounts for.
|PC Layer|EigenValue|Percent of EigenValues|Accumulative of EigenValues|
This table shows that the first component accounts for 67.1% of the variance.
Together, the first and second components account for 98.1% of the variance. The third component contributes only the remaining 1.9%, so it can be dropped with little loss of information.
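The accumulative column is just a running total of the percent column. A small sketch, using only the percentages quoted in the text (the second component's 31.0% is derived as 98.1 minus 67.1), shows the arithmetic. The 95% cutoff is an assumption chosen for illustration.

```python
import numpy as np

# Percent of eigenvalues per component, taken from the text:
# PC1 = 67.1, PC3 = 1.9, and PC2 = 98.1 - 67.1 = 31.0 (derived).
percent = np.array([67.1, 31.0, 1.9])

# The accumulative column is the running sum of the percent column.
cumulative = np.cumsum(percent)

for i, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"PC{i}: {p:5.1f}%  accumulative {c:5.1f}%")

# Number of components needed to reach a chosen threshold
# (95% is an arbitrary cutoff used here for illustration).
threshold = 95.0
n_keep = int(np.searchsorted(cumulative, threshold) + 1)
print("components to keep:", n_keep)
```

With these figures, two components already pass the 95% mark, which matches the conclusion that the third component adds little.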
How is PCA used for remote sensing image classification?
Running a principal component analysis on three bands was useful – we found the third component did not add much information.
What about a 10-band (multispectral) raster? Or even 100 or 200 bands (hyperspectral)?
This is where PCA is really useful – multispectral and hyperspectral analysis.
For example, if most of the variance is found in principal components one, two and three (the ones with the largest eigenvalues), you only need to use these three components. Land cover classification is much easier with three bands than with all 10.
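As a rough sketch of this reduction, the NumPy example below builds a hypothetical 10-band image from three underlying signals plus a little noise, then projects it onto its first three principal components. The image, its size and the noise level are all made up for illustration. A real workflow would read the bands from a raster file.

```python
import numpy as np

rng = np.random.default_rng(2)
bands, rows, cols = 10, 50, 50

# Hypothetical 10-band image: every band is a mix of 3 underlying
# signals plus a little noise, so the bands are strongly correlated.
signals = rng.normal(size=(3, rows * cols))
mixing = rng.normal(size=(bands, 3))
noise = 0.05 * rng.normal(size=(bands, rows * cols))
image = (mixing @ signals + noise).reshape(bands, rows, cols)

# One row per pixel, one column per band, centered on the band means.
pixels = image.reshape(bands, -1).T
centered = pixels - pixels.mean(axis=0)

# Eigen-decomposition of the band covariance matrix, sorted descending.
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Keep the first k components and reshape back into a k-band image.
k = 3
reduced = (centered @ eigenvectors[:, :k]).T.reshape(k, rows, cols)

# Share of the total variance retained by the k components.
explained = 100.0 * eigenvalues[:k].sum() / eigenvalues.sum()
print(reduced.shape, round(explained, 1))
```

The 3-band `reduced` array could then be passed to a classifier in place of the original 10 bands. Real imagery rarely compresses as cleanly as this synthetic case.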
In summary, PCA identifies duplicate data over multiple channels, reduces redundancy and speeds up processing time. This is why it is so useful in image processing.
When you’re working with highly correlated variables, you can run a principal component analysis to find out how much of their information overlaps.