Hyperspectral data sets contain useful information for characterizing vegetation canopies that was not previously available from multi-spectral data sources. However, to make full use of this information content, one has to find ways to cope with the strong multi-collinearity in the data. The redundancy results directly from the fact that only a few variables effectively control the vegetation signature. This low dimensionality contrasts strongly with the often more than 100 spectral channels provided by modern spectroradiometers and by imaging spectroscopy. In this study we evaluated three different chemometric techniques specifically designed to deal with redundant (and small) data sets. In addition, a widely used 2-band vegetation index (NDVI) was chosen as a baseline approach.
A multi-site and multi-date field campaign was conducted to acquire the necessary reference observations. On small subplots, the total canopy chlorophyll content was measured and the corresponding canopy signature (450–2500 nm) was recorded (n_obs = 42). Using this data set, we investigated the predictive power and noise sensitivity of stepwise multiple linear regression (SMLR) and two ‘full spectrum’ methods: principal component regression (PCR) and partial least squares regression (PLSR). The NDVI was fitted to the canopy chlorophyll content using an exponential relation. For all techniques, a jackknife approach was used to obtain cross-validated statistics. PLSR clearly outperformed all other techniques, giving a cross-validated RMSE of 51 mg m⁻² for canopy chlorophyll contents ranging between 38 and 475 mg m⁻² (0.99 ≤ LAI ≤ 8.74 m² m⁻²). The lowest accuracy was achieved using PCR (RMSEcv = 82 mg m⁻²).
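The baseline procedure described above (NDVI, exponential fit, jackknife statistics) can be illustrated with a minimal sketch. Several points are assumptions rather than details taken from the study: the jackknife is implemented as plain leave-one-out cross-validation, the exponential relation is fitted by least squares on the log-transformed chlorophyll content, and the reflectance and chlorophyll values are invented for illustration only. The function names are hypothetical.

```python
import math

def ndvi(nir, red):
    """Normalized difference of a near-infrared and a red reflectance."""
    return (nir - red) / (nir + red)

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by ordinary least squares on ln(y) (assumed fitting method)."""
    n = len(x)
    ln_y = [math.log(v) for v in y]
    mean_x = sum(x) / n
    mean_ln_y = sum(ln_y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_ln_y) for xi, li in zip(x, ln_y))
    b = sxy / sxx
    a = math.exp(mean_ln_y - b * mean_x)
    return a, b

def loo_rmse(x, y):
    """Leave-one-out RMSE: refit without sample i, then predict sample i."""
    squared_errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_exponential(x_train, y_train)
        squared_errors.append((a * math.exp(b * x[i]) - y[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Invented reflectances and canopy chlorophyll contents (mg m^-2), illustration only.
red = [0.08, 0.06, 0.05, 0.04, 0.03, 0.03]
nir = [0.30, 0.35, 0.40, 0.45, 0.48, 0.52]
chl = [60.0, 110.0, 170.0, 260.0, 340.0, 430.0]

index = [ndvi(n, r) for n, r in zip(nir, red)]
a, b = fit_exponential(index, chl)   # model fitted on all samples
rmse_cv = loo_rmse(index, chl)       # cross-validated error of the same model form
```

Because each sample is predicted by a model that never saw it, `rmse_cv` is a more honest error estimate than the residual error of the full fit, which matters for a data set as small as the one described here.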
The NDVI, even using chlorophyll-optimized band settings, could not reach the accuracy of PLSR. Regarding the sensitivity to artificially created (white) noise, PCR showed some advantages, whereas SMLR was the most sensitive chemometric technique. For relatively small, highly multi-collinear data sets, the use of partial least squares regression is recommended. PLSR makes full use of the rich spectral information while being relatively insensitive to sensor noise. It provides a regression model in which the entire spectral information is taken into account in weighted form. This method therefore seems much better suited to dealing with potentially confounding factors than any 2-band vegetation index, which can only avoid the most harmful factor of variation.
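To make the phrase "in weighted form" concrete, the following is a minimal sketch of a one-component PLS1 model, in which every spectral band receives a regression weight proportional to its covariance with the response. It is an illustration under stated assumptions, not the implementation used in the study: a single latent component is assumed, no scaling beyond mean-centering is applied, and the function names and the small four-band data set are hypothetical.

```python
import math

def pls1_one_component(X, y):
    """Return (coefficients, x_means, y_mean) of a one-component PLS1 model."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: covariance of each band with the response, scaled to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: projection of each centered spectrum onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centered response on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    coefficients = [q * wj for wj in w]
    return coefficients, x_means, y_mean

def predict(model, spectrum):
    """Predict the response for one spectrum from a fitted model."""
    coefficients, x_means, y_mean = model
    return y_mean + sum(c * (s - m) for c, s, m in zip(coefficients, spectrum, x_means))

# Invented four-band reflectance spectra and chlorophyll contents, illustration only.
spectra = [
    [0.08, 0.12, 0.30, 0.25],
    [0.06, 0.10, 0.36, 0.24],
    [0.05, 0.09, 0.41, 0.22],
    [0.04, 0.08, 0.46, 0.21],
    [0.03, 0.07, 0.50, 0.20],
]
chl = [60.0, 130.0, 210.0, 300.0, 390.0]

model = pls1_one_component(spectra, chl)
estimate = predict(model, [0.05, 0.09, 0.40, 0.22])
```

In contrast to a 2-band index, every band contributes to `estimate` through its own coefficient. A practical model would typically use several components, with their number chosen by cross-validation.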