In research fields ranging from climate science to seismology, analysts routinely examine time series data: records of how variables such as temperature or micro-seismic activity change over time. Identifying meaningful patterns in these series provides insight for everything from stock predictions to climate models.
However, a common pitfall arises in time series analysis: the problem of autocorrelation.
More specifically, in a time series that exhibits autocorrelation, the data values at any given time are statistically dependent on prior values in the same series. This violates the assumption of independence that many statistical techniques rely on.
Positive autocorrelation implies that a high value in the series will likely be followed by another high value, and vice versa for low values. This may reflect an underlying cyclical pattern or inertia in the system generating the time series.
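This can be quantified with the lag-1 sample autocorrelation, the correlation of each value with the one before it. The sketch below (standard-library Python only; the AR(1) process and its 0.8 coefficient are illustrative assumptions, not data from the text) shows a series built to have positive autocorrelation:

```python
import random

def lag1_autocorr(series):
    """Sample lag-1 autocorrelation: correlation of x[t] with x[t-1]."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in series)
    return num / den

# Simulate an AR(1) process x[t] = 0.8 * x[t-1] + noise, which has
# positive autocorrelation by construction.
random.seed(42)
x, xs = 0.0, []
for _ in range(5000):
    x = 0.8 * x + random.gauss(0, 1)
    xs.append(x)

print(lag1_autocorr(xs))  # close to the AR coefficient, 0.8
```

A high value here means that knowing today's observation tells you a lot about tomorrow's, which is exactly the dependence that violates the independence assumption.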
While sometimes subtle, autocorrelation, if ignored, can severely undermine the validity of analysis results in just about every area of research that deals with time series data.
The climate research field, for example, is rife with datasets that exhibit strong autocorrelation due to non-stationarity.
In the simplest terms, non-stationary data is data whose patterns and statistical properties are not stable over time; they change or shift instead of staying the same. For example, if you measure the temperature each month and the average temperature rises over time, that data is non-stationary: the mean is changing instead of holding steady.
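The monthly-temperature example can be made concrete with a small sketch (hypothetical synthetic temperatures with an assumed warming trend; all numbers are illustrative, not real measurements):

```python
import random

random.seed(0)
# Hypothetical monthly temperatures with a slow warming trend of
# 0.01 degrees per month: the series is non-stationary by construction.
temps = [15.0 + 0.01 * month + random.gauss(0, 0.5) for month in range(240)]

first_decade_mean = sum(temps[:120]) / 120
second_decade_mean = sum(temps[120:]) / 120
print(second_decade_mean > first_decade_mean)  # the mean shifts over time
```

Comparing the two decade averages shows the defining symptom of non-stationarity: a summary statistic that depends on when you compute it.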
Numerous scientific and commercial research projects work with non-stationary data, which can lead to autocorrelation and undermine sound evaluation and interpretation. Besides climate research, these fields include:
- Finance - Stock prices, exchange rates, interest rates, etc.
- Macroeconomics - GDP, unemployment, inflation rates.
- Demographics - Population, birth rates, migration data.
- Econometrics - Supply/demand curves, sales forecasts.
- Signal Processing - Network traffic data, audio/video signals.
- Neuroscience - EEG brain wave recordings, neural spike trains.
- Meteorology - Precipitation, pressure, wind speed time series.
- Oceanography - Sea level measurements, wave heights.
- Seismology - Earthquake occurrence rates, fault displacements.
- Astronomy - Flux measurements of astronomical objects, solar activity.
Autocorrelation in non-stationary data arises for multiple reasons. Many systems exhibit inherent inertia and "memory": today's value is influenced by yesterday's. Gradual shifts, like trends, also induce autocorrelation. Cyclical forces like seasons impose autocorrelated oscillations. And measurement issues can artificially introduce autocorrelation.
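The trend mechanism in particular is easy to demonstrate: a deterministic trend alone induces strong autocorrelation even when the fluctuations around it are fully independent. A minimal sketch (synthetic data, arbitrary trend slope):

```python
import random

def lag1_autocorr(series):
    # Sample lag-1 autocorrelation of a sequence.
    n = len(series)
    m = sum(series) / n
    num = sum((series[t] - m) * (series[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in series)
    return num / den

random.seed(1)
# Independent noise around a deterministic trend vs. the same noise alone.
trended = [0.05 * t + random.gauss(0, 1) for t in range(1000)]
noise_only = [random.gauss(0, 1) for _ in range(1000)]

print(lag1_autocorr(trended))     # near 1: the trend dominates
print(lag1_autocorr(noise_only))  # near 0: independent points
```

The "memory" measured here belongs entirely to the trend, not to the noise, which is why detrending or differencing is such a common remedy.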
Regardless of origin, the presence of autocorrelation can wreak havoc on analysis results if left unchecked.
Thus, regardless of the research domain, it is incumbent on researchers to carefully identify autocorrelation, account for it, and correct it when necessary in order to keep the analysis and its interpretation objective.
For example, consider analyzing historical home sales data to forecast prices. Autocorrelation caused by inertia in housing markets would result in each month’s sales being correlated with prior months. Failing to account for this could lead to wildly overconfident price forecasts due to underestimated uncertainty.
Or take epidemiological models of disease spread, such as the recent COVID-19 outbreak. Autocorrelation from clustering of infections could distort transmission rate estimates across a region and globally if not addressed properly.
The bottom line is that improperly handling autocorrelation impairs the accuracy of short-term and long-term time series statistics and models.
More insidiously, autocorrelation can profoundly mislead causal analysis as a result of spurious correlations.
For instance, say a company's quarterly sales numbers exhibit autocorrelation due to economic cycles. Meanwhile, their advertising spending is steadily increasing each quarter. Simplistic analysis could suggest higher ad spending is driving sales growth. But the correlation is an artifact—sales are autocorrelated for macroeconomic reasons unrelated to the company's ads. Properly correcting for autocorrelation reveals the spuriousness.
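This kind of spurious correlation is easy to reproduce with synthetic data. In the sketch below (hypothetical sales and ad-spend series; the trends and noise levels are arbitrary assumptions for illustration), the two series are generated independently, yet their shared upward trends produce a high correlation in levels that largely vanishes after differencing:

```python
import random

def pearson(a, b):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a)
           * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def diff(series):
    # First differences: the change from one period to the next.
    return [b - a for a, b in zip(series, series[1:])]

random.seed(7)
# Two independently generated trending series (all numbers are arbitrary).
sales = [100 + 2.0 * t + random.gauss(0, 5) for t in range(80)]
ads = [10 + 0.5 * t + random.gauss(0, 2) for t in range(80)]

print(pearson(sales, ads))              # high: a spurious, trend-driven link
print(pearson(diff(sales), diff(ads)))  # near zero after differencing
```

Because no causal link was built into the simulation, the near-zero correlation of the differenced series is the honest answer; the level correlation is an artifact of the shared trends.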
And climate data provides many cautionary examples of how unaddressed autocorrelation can lead researchers astray, especially in determining causality, i.e., cause and effect.
Due to inertia in Earth's climate system, including the atmosphere and the oceans, temperature and rainfall measurements exhibit significant autocorrelation. Ignoring this can result in claiming strong predictive relationships between climate variables that are merely statistical artifacts.
A familiar and highly politicized example in the climate field is the slow but steady rise in long-term temperatures in relation to rising atmospheric CO2 concentrations.
The simplest interpretation is that CO2 causes the warming.
But if the persistently high autocorrelation of the temperature time series is removed, both the statistical and causal relationship with CO2 disappear. (See the two included charts.)
Yet consensus-narrative climate scientists frequently ignore or minimize the severe autocorrelation that arises from non-stationarity and other characteristics of the data.
Instead, they claim that most climate warming and weather phenomena are due to human CO2 emissions.
However, in reality, as the autocorrelation correction suggests, atmospheric CO2 may have some warming impact but is not the dominant force among the many climate variables that contribute.
While the complexities of time series analysis make autocorrelation a thorny issue, as this global temperature and CO2 example indicates, there are multiple well-established techniques to detect and mitigate it.
For example, transforming the data by simply differencing consecutive points can help remove the autocorrelation. (See the two included charts.)
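First differencing can be sketched in a few lines (a synthetic random walk stands in here for a non-stationary series; the walk itself is an illustrative assumption):

```python
import random

def lag1_autocorr(series):
    # Sample lag-1 autocorrelation of a sequence.
    n = len(series)
    m = sum(series) / n
    num = sum((series[t] - m) * (series[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in series)
    return num / den

random.seed(3)
# A random walk: each value is the previous value plus an independent step.
walk, level = [], 0.0
for _ in range(2000):
    level += random.gauss(0, 1)
    walk.append(level)

# Differencing recovers the independent steps from the autocorrelated levels.
diffs = [b - a for a, b in zip(walk, walk[1:])]
print(lag1_autocorr(walk))   # near 1: levels are highly autocorrelated
print(lag1_autocorr(diffs))  # near 0: differences are independent
```

The levels inherit all of the walk's memory, while the differences are the original independent increments, which is exactly what makes differencing effective for this class of series.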
The key to avoiding the perils of autocorrelation is the requisite due diligence to identify and, if necessary, correct the autocorrelation as a mandatory first step in any time series investigation.
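One standard screening tool for that first step is the Durbin-Watson statistic, computed from a model's residuals: it sits near 2 when residuals are independent and falls well below 2 under positive autocorrelation. A minimal sketch (synthetic residual series with an assumed AR(1) coefficient of 0.9):

```python
import random

def durbin_watson(residuals):
    # Durbin-Watson statistic: ~2 for independent residuals,
    # well below 2 under positive autocorrelation.
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

random.seed(5)
independent = [random.gauss(0, 1) for _ in range(3000)]

persistent, e = [], 0.0
for _ in range(3000):
    e = 0.9 * e + random.gauss(0, 1)  # strongly autocorrelated AR(1) series
    persistent.append(e)

print(durbin_watson(independent))  # roughly 2
print(durbin_watson(persistent))   # far below 2: autocorrelation detected
```

A low Durbin-Watson value is a signal to apply a correction such as differencing before trusting confidence intervals or p-values from the model.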
The benefits of removing autocorrelation from a time series are many. They include:
1. Allows for Better Statistical Testing:
- Many statistical tests and models assume independence between data points.
- Autocorrelation violates this assumption and can lead to incorrect confidence intervals, p-values, etc. Correcting for autocorrelation reduces the risk of biased parameter estimates and improves the accuracy of models.
2. Reveals True Relationships and Correlations:
- Autocorrelation can inflate apparent relationships between variables that are actually independent, or obscure real ones. Removing autocorrelation clarifies the true correlations.
3. Allows for Causal Analysis:
- Autocorrelation makes it difficult to infer causal relationships between variables. Removing autocorrelation helps isolate the true impact of explanatory variables on the outcome variable over time.
4. Facilitates Signal Extraction:
- Autocorrelation acts as a smoothing filter that blurs the underlying signal. Eliminating autocorrelation helps recover the true signal and patterns.
5. Enables Identification of Exogenous Shocks:
- Major shocks and disruptions are easier to detect in the absence of autocorrelation's smoothing effects, providing insight into structural breaks.
6. Enhances Accuracy in Long-Term Planning:
- For time series data used in long-term planning or strategic decision-making, accurate models are essential. Correcting for autocorrelation helps ensure that the models used for planning are reliable and produce realistic forecasts.
7. Preserves Scientific Validity and Integrity:
- In scientific research, correcting for autocorrelation helps maintain the scientific validity and integrity of the research.
While autocorrelation may seem arcane to the general public, the benefits of identifying and properly addressing it are significant.
Despite these benefits, many researchers still choose to ignore autocorrelation. Doing so puts policymakers and citizens in jeopardy: unaddressed autocorrelation can bias decision-making toward ineffective solutions and costly failures by misrepresenting actual causes and effects.
In science and in policy, true cause and effect matter.