Waldo Tobler’s First Law of Geography states that “everything is related to everything else, but near things are more related than distant things.”
In a semi-variogram, this means that closer things are more predictable and vary less, while distant things are less predictable and less related.
For example, the terrain one meter ahead of you is more likely to resemble the ground you are standing on than the terrain 100 meters away.
As you’ll learn, a semi-variogram charts this critically important concept: how sample values (pollution, elevation, noise, etc.) vary with distance.
Soil Moisture Samples
Our example contains 73 soil moisture samples in a 10-acre field. In the northwest corner, the samples are much wetter, with higher water content. But in the eastern quadrant, they are much drier, as the color-coded image below shows.
- How predictable are values from place to place?
- Are known values closer together more similar than values farther apart?
This idea can be described with statistical dependence, or autocorrelation. Autocorrelation (things closer together being more similar than things farther apart) provides valuable information for prediction.
To understand spatial dependence, you can estimate it with a semi-variogram. A semi-variogram takes pairs of sample locations and calls the distance between the two points h. On the x-axis, it plots distance (h) in lags, which are simply grouped distances. For each pair of sample locations, it computes the semivariance, half the squared difference in the response variable (water content in the soil), and plots it on the y-axis.
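The pairing step can be sketched in Python. This is a minimal sketch, assuming NumPy, planar x/y coordinates and the classical per-pair semivariance (half the squared difference between the two values); the function name `variogram_cloud` and the four toy samples are hypothetical, not the 73 field samples.

```python
import numpy as np

def variogram_cloud(coords, values):
    """Hypothetical helper: distance h and semivariance for every unique pair.

    coords: (n, 2) x/y positions; values: (n,) measurements such as water content.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)  # each unordered pair exactly once
    h = np.linalg.norm(coords[i] - coords[j], axis=1)  # Euclidean separation distance
    gamma = 0.5 * (values[i] - values[j]) ** 2  # half the squared difference
    return h, gamma

# Toy data (hypothetical): four samples spaced 1 unit apart along a line.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
moisture = [30.0, 28.0, 24.0, 18.0]
h, gamma = variogram_cloud(coords, moisture)
# h     -> 1, 2, 3, 1, 2, 1
# gamma -> 2, 18, 72, 8, 50, 18
```

Each (h, gamma) pair is one dot on the semi-variogram; even four samples already produce six dots.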
To the untrained eye, a semi-variogram can look like a big mess of points. For example, our soil moisture plot looks like this:
But you can do some real detective work by selecting individual points. Take this single point on the semi-variogram:
You can see which two sample locations it represents on the map. This makes sense because they are far apart from each other, hence the point’s far-right position on the semi-variogram. It’s actually this pair highlighted below:
The two samples also differ greatly in value, so this point sits well above the average semivariance for its lag. The higher a point sits on the y-axis, the higher its semivariance. As you probably noticed, the semivariance is smaller at closer distances and increases with larger lag distances.
A semi-variogram considers every pair of samples, plotting the distance between them against their semivariance. That’s why semi-variograms contain so many points: n samples form n(n - 1)/2 pairs, which is 2,628 pairs for our 73 samples. Here’s a subset of the data set above showing the different pairs of points being plotted in a semi-variogram.
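Grouping those pairs into lags and averaging within each lag gives the empirical semi-variogram that a model is later fitted to. The sketch below is one way to do this, assuming NumPy and equal-width lags; `empirical_semivariogram`, `lag_width` and `n_lags` are hypothetical names, and the lag width and number of lags are choices you make for your own data. Averaging half the squared differences in each lag is the classical (Matheron) estimator.

```python
import numpy as np

def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Hypothetical helper: mean semivariance per equal-width lag.

    Returns (lag_centers, gamma_hat, pair_counts); empty lags get NaN.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)  # n(n - 1)/2 unique pairs
    h = np.linalg.norm(coords[i] - coords[j], axis=1)
    pair_gamma = 0.5 * (values[i] - values[j]) ** 2
    lag = np.floor(h / lag_width).astype(int)  # lag index of each pair
    keep = lag < n_lags  # ignore pairs beyond the last lag
    counts = np.bincount(lag[keep], minlength=n_lags)
    sums = np.bincount(lag[keep], weights=pair_gamma[keep], minlength=n_lags)
    with np.errstate(invalid="ignore", divide="ignore"):
        gamma_hat = sums / counts  # average per lag; 0/0 gives NaN
    centers = (np.arange(n_lags) + 0.5) * lag_width
    return centers, gamma_hat, counts

# Hypothetical toy samples: four points spaced 1 unit apart along a line.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
moisture = [30.0, 28.0, 24.0, 18.0]
centers, gamma_hat, counts = empirical_semivariogram(coords, moisture, lag_width=1.5, n_lags=3)
# counts -> 3, 2, 1 pairs; gamma_hat -> about 9.33, 34, 72 (rising with distance)
```

Note that the farthest lag rests on a single pair, which is why estimates at long distances tend to be noisier.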
Range, Sill and Nugget
At close distances, the difference in values between sample points tends to be small. In other words, the semivariance is small.
But when sample points are farther apart, the points being compared are less likely to be similar. This means that the semivariance becomes large.
Beyond a certain distance, there is no longer a relationship between sample points. The semivariance levels off, and the sample values are no longer related to one another.
Sill: The value at which the model first flattens out.
Range: The distance at which the model first flattens out.
Nugget: The value at which the semi-variogram (almost) intercepts the y-axis.
When two sample points are at the same location, they are expected to have the same value, so in theory the nugget should be zero. Sometimes they don’t, because of measurement error or variation at scales smaller than the sample spacing, and this adds randomness. But up to the range, before the graph starts leveling off, the values are spatially autocorrelated.
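In the notation commonly used for semi-variogram models, with c0 for the nugget, c0 + c for the sill (taking the sill as the total plateau value, so c is the partial sill) and a for the range, the three terms can be written as below. By definition the semi-variogram is 0 at exactly zero distance, so a nonzero nugget shows up as a jump just above zero. The range definition applies to models that actually reach the sill; models that only approach it use a practical range instead.

$$
\gamma(0) = 0, \qquad
c_0 = \lim_{h \to 0^{+}} \gamma(h), \qquad
c_0 + c = \lim_{h \to \infty} \gamma(h), \qquad
a = \min\{\, h > 0 : \gamma(h) = c_0 + c \,\}
$$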
As expected, the semivariance increases as distance increases, because sample points farther apart are less correlated with one another.
But as the sill and range indicate, the semi-variogram eventually reaches a flat, asymptotic level. To model this behavior, you fit a mathematical function to it.
Mathematical functions and models
You select a model type based on how well it fits the data, because the model provides a mathematical function relating values to distances. Various functions can be used as a best fit, such as exponential, linear, spherical, circular and Gaussian.
Ideally, you want the model to fit as closely as possible, for example by raising the R-squared value (or, equivalently, lowering the residual sum of squares). However, when you understand how the phenomenon behaves with distance, you can better choose which model to apply to the semi-variogram.
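One simple way to find the best fit, sketched here with the spherical model as the example, is to try a series of candidate ranges, solve for the nugget and partial sill by ordinary least squares at each one (once the range is fixed, the model is linear in those two parameters), and keep the range with the highest R-squared. This is an illustrative assumption, not how any particular GIS package fits variograms: it uses NumPy only, does not force the parameters to be non-negative, and the names `fit_spherical` and `spherical_shape` and the synthetic lag data are hypothetical.

```python
import numpy as np

def spherical_shape(h, a):
    """Spherical structure rising from 0 to 1, reaching 1 exactly at range a."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return 1.5 * r - 0.5 * r ** 3

def fit_spherical(h, gamma, candidate_ranges):
    """Hypothetical helper: grid over ranges, least squares for nugget and partial sill.

    Returns (r_squared, nugget, partial_sill, range) for the best candidate range.
    """
    h = np.asarray(h, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    ss_tot = np.sum((gamma - gamma.mean()) ** 2)
    best = None
    for a in candidate_ranges:
        # With the range fixed, c0 + c * shape is linear in (c0, c).
        X = np.column_stack([np.ones_like(h), spherical_shape(h, a)])
        (c0, c), *_ = np.linalg.lstsq(X, gamma, rcond=None)
        ss_res = np.sum((gamma - X @ np.array([c0, c])) ** 2)
        r2 = 1.0 - ss_res / ss_tot  # higher is better
        if best is None or r2 > best[0]:
            best = (r2, c0, c, a)
    return best

# Synthetic lag data (hypothetical): nugget 2, partial sill 8, range 60.
lags = np.linspace(5, 100, 20)
observed = 2.0 + 8.0 * spherical_shape(lags, 60.0)
r2, c0, c, a = fit_spherical(lags, observed, candidate_ranges=np.arange(10, 101, 10))
# a -> 60, with an R-squared of (almost exactly) 1
```

With real, noisy lag averages the best R-squared will be below 1, and comparing it across model types is one input to the choice, alongside what you know about the phenomenon.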
A linear model means that spatial variability increases linearly with distance. It’s the simplest type of model and has no plateau, meaning that the user has to select the sill and range arbitrarily.
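As a sketch, assuming NumPy and the hypothetical names `linear_variogram` and `linear_with_cap`, here are the linear model and a bounded variant in which the user picks the range and the curve is held flat beyond it. Both are written for distances greater than zero, since a semi-variogram is 0 at exactly zero distance by definition.

```python
import numpy as np

def linear_variogram(h, nugget, slope):
    """Linear model for h > 0: semivariance grows in proportion to distance, no sill."""
    return nugget + slope * np.asarray(h, dtype=float)

def linear_with_cap(h, nugget, slope, range_):
    """Hypothetical bounded variant: linear up to a user-chosen range, flat after it."""
    h = np.minimum(np.asarray(h, dtype=float), range_)  # hold distance at the range
    return nugget + slope * h
```

In the capped version the sill is simply nugget + slope * range_, so choosing the range also fixes the sill.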
The spherical model is one of the most common models used in variogram modelling. It is a modified cubic function in which spatial dependence levels off at the sill, which it reaches exactly at the range.
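Written out, with nugget c0, partial sill c and range a, the spherical model is γ(h) = c0 + c·[1.5(h/a) − 0.5(h/a)³] for 0 < h ≤ a, and c0 + c beyond the range. The sketch below assumes NumPy; `spherical_variogram` is a hypothetical name.

```python
import numpy as np

def spherical_variogram(h, nugget, partial_sill, range_):
    """Spherical model for h > 0 (gamma(0) = 0 by definition).

    Rises as 1.5*(h/a) - 0.5*(h/a)**3 and equals the sill exactly at h = range_.
    """
    r = np.minimum(np.asarray(h, dtype=float) / range_, 1.0)  # clamp beyond the range
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

# Nugget 2, partial sill 8 (sill 10), range 60:
# h = 30 -> 7.5, h = 60 -> 10, h = 120 -> 10
```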
The exponential model resembles the spherical model in that semivariance rises toward the sill gradually. However, it only approaches the sill asymptotically: the relationship between two sample points decays gradually, and spatial dependence dissipates completely only at an infinite distance.
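A sketch of the exponential model, assuming NumPy and the practical-range convention in which the range a is the distance where the curve reaches about 95% of the partial sill: γ(h) = c0 + c·[1 − exp(−3h/a)]. Some software parameterizes it as exp(−h/a) instead, so check which convention your tool uses; `exponential_variogram` is a hypothetical name.

```python
import numpy as np

def exponential_variogram(h, nugget, partial_sill, range_):
    """Exponential model for h > 0, using an assumed practical-range convention.

    Approaches the sill asymptotically; at h = range_ it reaches about 95% of the partial sill.
    """
    h = np.asarray(h, dtype=float)
    return nugget + partial_sill * (1.0 - np.exp(-3.0 * h / range_))

# Nugget 0, partial sill 1, practical range 60:
# h = 60 -> about 0.95; even h = 120 stays just below 1
```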
The Gaussian model is derived from the normal probability distribution curve. It is useful where phenomena are very similar at short distances, because the curve rises slowly near the origin before climbing more steeply up the y-axis.
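A matching sketch of the Gaussian model, under the same assumptions (NumPy, practical-range convention, hypothetical function name): γ(h) = c0 + c·[1 − exp(−3h²/a²)]. Squaring the distance is what keeps the curve flat near the origin.

```python
import numpy as np

def gaussian_variogram(h, nugget, partial_sill, range_):
    """Gaussian model for h > 0, using an assumed practical-range convention."""
    h = np.asarray(h, dtype=float)
    return nugget + partial_sill * (1.0 - np.exp(-3.0 * (h / range_) ** 2))

# At one-tenth of the range (h = 6, a = 60) this gives about 0.03 of the partial sill,
# versus about 0.26 for the exponential model with the same convention.
```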
The circular model uses a circular function to fit spatial variability in a semi-variogram. It resembles the spherical model, with spatial dependence fading away as the curve levels off at the sill.
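The circular model can be sketched the same way, assuming NumPy and the usual form γ(h) = c0 + c·[1 − (2/π)·arccos(h/a) + (2h/(πa))·√(1 − h²/a²)] for 0 < h ≤ a, held at c0 + c beyond the range; `circular_variogram` is a hypothetical name.

```python
import numpy as np

def circular_variogram(h, nugget, partial_sill, range_):
    """Circular model for h > 0; reaches the sill exactly at h = range_."""
    r = np.minimum(np.asarray(h, dtype=float) / range_, 1.0)  # clamp beyond the range
    shape = 1.0 - (2.0 / np.pi) * np.arccos(r) + (2.0 / np.pi) * r * np.sqrt(1.0 - r ** 2)
    return nugget + partial_sill * shape

# Nugget 0, partial sill 1, range 60: h = 30 -> about 0.61 (spherical gives 0.6875),
# h = 60 and beyond -> 1
```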
Semi-variograms provide a useful preliminary step in understanding the nature of data.
Each phenomenon has its own semi-variogram and its own mathematical function. The user uncovers the relationship between values and distances and then chooses the best-fitting model.
Although semi-variograms are handy for understanding variation with distance, the model fitted to a semi-variogram is most commonly used in the interpolation technique known as kriging. This is because the variogram model influences how unknown values are predicted during kriging interpolation. More on this later…