Monday, March 10, 2008

Mann-Kendall Trend (part 1)

Judging from the technical support questions, most people use the MAROS software for its convenient groundwater trend analysis tool -- specifically the Mann-Kendall analysis.

I have never claimed that the Mann-Kendall analysis is the One-True-Statistical method or the be-all and end-all; however, it is often a pretty darn good way to look at data -- especially groundwater monitoring data.

The reason Mann-Kendall is a pretty good approach is that it is a non-parametric method, meaning that it does not assume the data follow a particular statistical distribution (e.g. a normal distribution). Most groundwater data are not normally distributed, due to the problem of left censoring (no values recorded below the detection limit) and the occasional very high concentration, orders of magnitude above the detection limit.

Another annoying feature of most groundwater monitoring programs is the propensity of site managers to sample wells irregularly (e.g. quarterly sampling in 2000 and 2004, semiannual sampling in 2001 and 2003, monthly in 2002, not at all in 2005, and whenever in 2006).

So, statistical analysis of groundwater data can be complicated by a variety of factors from the nature of the analytical results to the site management decisions. Luckily, the Mann-Kendall analysis of trend can handle these problems pretty well. Frankly, when your groundwater data starts to look as high and tight as medical monitoring data, you can look to other methods. Until then, we may all be stuck with Mann-Kendall to a certain extent.

The way the Mann-Kendall analysis is handled in MAROS is a little different from other methods (Gilbert, 1987), and may require a little explanation.

The Mann-Kendall trend evaluation relies on three statistical metrics -- the 'S' statistic, the coefficient of variation (COV), and what we call the confidence factor (CF).
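For readers who want to see the first two metrics in action, here is a minimal sketch in Python. It is not the MAROS code. The function names are mine, and two choices are assumptions: S is computed the textbook way (the sum of the signs of every later-minus-earlier difference, with results in time order), and COV is taken as the sample standard deviation divided by the mean.

```python
from statistics import mean, stdev


def mann_kendall_s(values):
    """Sum the signs of all later-minus-earlier differences.

    `values` must be in time order. Tied pairs contribute zero.
    """
    s = 0
    for i in range(len(values) - 1):
        for j in range(i + 1, len(values)):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # +1, -1, or 0
    return s


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (an assumed definition)."""
    return stdev(values) / mean(values)


# Hypothetical concentrations for one well, oldest result first.
concentrations = [12.0, 9.5, 10.1, 7.2, 6.8, 5.0]
print("S   =", mann_kendall_s(concentrations))
print("COV =", round(coefficient_of_variation(concentrations), 2))
```

Note that only the order of the sample events matters for S, not the spacing between them, which is why irregular sampling schedules are less of a headache here.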

This third critical statistical metric in the Mann-Kendall evaluation can cause a lot of confusion, as the CF represents a small modification of the usual approach to the Mann-Kendall analysis. The CF is a measure of confidence in rejecting the null hypothesis.

The null hypothesis (H0) states that the dataset shows no distinct trend. The Mann-Kendall method tests H0 against the alternative hypothesis (HA) -- that the data show a trend. The probability (p) of accepting H0 is determined from the Mann-Kendall table of probabilities, based on the number of samples (n, for n less than 40) and the absolute value of S. Specifically, p is the probability of obtaining a value of S equal to or greater than the calculated value for n when no trend is present. We will reject H0 when p < alpha (alpha = 0.1).

Typically, the Mann-Kendall test results in ‘No Trend’, or ‘Increasing’ or ‘Decreasing’ designations for the dataset. However, in order to develop a finer resolution of outcomes, the concept of ‘confidence factor’ has been developed.
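If you do not have the table handy, the lookup for p can be sketched directly. This is a rough illustration, not how MAROS does it, and it assumes there are no tied values. Under H0 every ordering of the n results is equally likely, so p is just the fraction of orderings whose S is at least as large as the absolute value of the one observed. The function names below are hypothetical.

```python
from math import factorial


def discordant_pair_counts(n):
    """counts[k] = number of orderings of n distinct values with k discordant pairs."""
    counts = [1]
    for m in range(2, n + 1):
        # Adding the m-th value can create 0 to m-1 new discordant pairs.
        new = [0] * (len(counts) + m - 1)
        for k, c in enumerate(counts):
            for j in range(m):
                new[k + j] += c
        counts = new
    return counts


def mann_kendall_p(s, n):
    """One-sided probability of S >= |s| for n untied samples when no trend exists."""
    pairs = n * (n - 1) // 2
    s = abs(s)
    if s > pairs:
        return 0.0  # S can never exceed the number of pairs
    # S = pairs - 2 * discordant, so S >= s means discordant <= (pairs - s) / 2.
    max_discordant = (pairs - s) // 2
    counts = discordant_pair_counts(n)
    return sum(counts[: max_discordant + 1]) / factorial(n)


print(mann_kendall_p(-26, 12))
```

With ties or non-detects in the data the null distribution changes, so treat this as a check on the table rather than a replacement for it.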

We define the CF as (1 - p), expressed as a percentage. The CF rises as p falls and, for a given n, a larger absolute value of S gives a higher CF. When the CF is below 90% (p > 0.1), H0 is accepted, and the data are judged to show no distinct trend. For the method used by MAROS, data showing no trend can be classified in one of two ways -- Stable or No Trend. A ‘Stable’ result occurs when S is zero or negative and variability is low (COV < 1). A ‘No Trend’ result occurs with high variability (COV > 1) and an S of any value.

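When the CF is between 90 – 95% (0.1 > p > 0.05), H0 is rejected, but the trend is weak. The weakness of the trend is identified by using the terms “Probably Increasing” or “Probably Decreasing” to describe the data. For CF > 95% (p < 0.05), H0 is rejected with high confidence, and the data are described as “Increasing” or “Decreasing”, depending on the sign of S.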

By using the method described above, data from each well location can be categorized in one of 8 ways: Increasing trend (I), Decreasing trend (D), Probably Increasing trend (PI), Probably Decreasing trend (PD), Stable (S), No Trend (NT), non-detect (ND) or insufficient data to determine a trend (N/A) (for n less than 4).
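The eight categories can be written out as a small decision function. This is my own sketch, not code lifted from MAROS. The function name, the argument names, and the order of the checks are assumptions. In particular, the split between Stable and No Trend (S zero or negative with COV below 1 gives Stable, everything else with CF below 90% gives No Trend) is my reading of the MAROS decision matrix, so check it against the software documentation before relying on it.

```python
def classify_trend(s, cf, cov, n, all_nondetect=False):
    """Return one of the eight trend codes for a single well.

    s   : Mann-Kendall S statistic
    cf  : confidence factor as a fraction (0.957 for 95.7%)
    cov : coefficient of variation
    n   : number of sample events
    """
    if n < 4:
        return "N/A"  # insufficient data to determine a trend
    if all_nondetect:
        return "ND"
    if cf > 0.95:
        return "I" if s > 0 else "D"
    if cf >= 0.90:
        return "PI" if s > 0 else "PD"
    # CF below 90%: H0 is accepted.
    # Assumed rule: low variability with zero or negative S is Stable.
    if s <= 0 and cov < 1:
        return "S"
    return "NT"


# Hypothetical inputs, one per category boundary of interest.
print(classify_trend(s=14, cf=0.92, cov=0.5, n=10))   # weak upward trend
print(classify_trend(s=-3, cf=0.60, cov=0.4, n=8))    # low variability, no trend
print(classify_trend(s=-3, cf=0.60, cov=1.4, n=8))    # high variability, no trend
```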

As an example, consider a dataset with 12 sample events with an S statistic of -26. In the Mann-Kendall table, the p value for n=12 and abs[S] = 26 is 0.043. The coefficient of variation in the dataset is 0.65.

So: CF = (1 - p) = (1 - 0.043) = 0.957 or 95.7%
S = -26
COV = 0.65

Conclusion: Decreasing Trend

With a CF of 95.7% we have very high confidence in rejecting the H0. The probability of accepting the null hypothesis is only 4.3%, well below our standard of 10%. So, we conclude that the data show a strongly Decreasing trend.

The power of a statistical test can be calculated from the number of samples (n), the variance in the data, alpha (or the false positive rate), and the critical effect size. For a specific statistical evaluation, the critical effect size can be difficult to identify. In the case of groundwater data, the practical quantitation limit is often designated as the critical effect size. Given these parameters, the practitioner has control only over the sample size and the detection limit.

Increasing n increases the power of the statistical analysis as long as variability in the data remains fairly low. For the Mann-Kendall analysis described above, increasing n can increase the CF, as long as the magnitude of S stays relatively consistent. Increasing the sample size can require time, as the sampling interval should be sufficient to produce independent-ish samples.

More on this later!





Thursday, March 6, 2008

LTMO Unleashed

After many years of conducting Long-Term Monitoring Optimization (LTMO) site evaluations, supporting LTMO software, conducting trainings, answering technical support questions, and sitting through hours and hours of meetings . . . I have decided to start posting LTMO information on the web. I hope that some of the insights and observations I post here can help those of you laboring in the groundwater monitoring trenches.

By way of explanation, Long-Term Monitoring in this blog will refer to monitoring of groundwater affected by common chemical contaminants (TCE, PCE, metals, BTEX, munitions, etc.). Information posted here is not necessarily applicable to initial site characterizations, but is intended as guidance for those with well characterized sites and lots and lots of data. I will cover some of the statistical methods used to support site decision-making and software tools available. I also plan to provide observations on qualitative evaluations, which can often appear subjective and confusing.

This blog will not cover other forms of monitoring, such as air or health -- I leave that to others. I will be posting whenever I feel like it, or whenever I have the time. No promises.

Onward.