M.J. Menne, C.N. Williams, Jr., and R.S. Vose
National Climatic Data Center, National
Oceanic and Atmospheric Administration
MONTHLY TEMPERATURE AND PRECIPITATION DATA
Please cite data as: M. J. Menne, C. N. Williams, Jr., and R. S. Vose,
2009. United States Historical Climatology Network (USHCN) Version 2 Serial Monthly Dataset.
Carbon Dioxide Information Analysis Center, Oak Ridge National
Laboratory, Oak Ridge, Tennessee.
Last updated June 2009.
INTRODUCTION
The United States Historical Climatology Network (USHCN) is essentially a subset of the
U.S. Cooperative Observer Network operated by NOAA's National Weather Service (NWS).
The approximately 1200 HCN stations were originally selected according to factors
such as record longevity, percentage of missing values, spatial coverage, as well
as the number of station moves and/or other station changes that may affect data homogeneity.
Most HCN stations are situated in rural areas or small towns; however, a smaller
number of stations are also part of the NOAA NWS synoptic network, whose stations
are generally located at airports in more urbanized environments. USHCN
datasets have been developed at NOAA's National Climatic Data Center (NCDC) in
collaboration with the Department of Energy's Carbon Dioxide Information Analysis Center (CDIAC).
The USHCN project dates to the mid-1980s (Quinlan et al. 1987).
At that time, in response to the need for an accurate, unbiased, modern historical
climate record for the United States, personnel at the Global
Change Research Program of the U.S. Department of Energy and at NCDC
defined a network of 1219 stations in the contiguous United States whose
observations would comprise a key baseline dataset for monitoring
U.S. climate. Since then, the USHCN dataset has been updated
several times (e.g., Karl et al., 1990; Easterling et al., 1996). The USHCN
version 2 serial monthly data release is the most recent update to the
HCN datasets. Version 2 data were produced using a new set of quality
control and homogeneity assessment algorithms. Two papers have been
prepared (Menne and Williams, 2008; Menne et al., 2009) that provide an
overall description of the adjustment methodology as well as an
assessment of the version 2 maximum and minimum temperature trends. A
brief summary of HCN processing steps is also provided below. The
methodology used in previous releases of the version 1 monthly data is
described at the NCDC USHCN Version 1 web site.
The USHCN database is used by NOAA to monitor temperature and precipitation
over the U.S. This includes the calculation of trends over roughly the last century
and regular updates to yearly and monthly state/regional rankings of temperature
and precipitation (see
NCDC's Climate Monitoring web page). Further background on the USHCN's use in this work may be found at NCDC's
National Temperature Trends: The Science Behind the Calculations web page.
VERSION 2 DATA PROCESSING STEPS
The data from each HCN station were subject to the following quality control and homogeneity testing and adjustment procedures.
QUALITY EVALUATION AND DATABASE CONSTRUCTION
First, daily maximum and minimum temperatures and total precipitation were
extracted from a number of different NCDC data sources and subjected to a
series of quality evaluation checks. The three sources of daily observations
were DSI-3200, DSI-3206, and DSI-3210.
Daily maximum and minimum temperature values that passed the evaluation checks were used to
compute monthly average values. However, no monthly temperature average or total precipitation
value was calculated for station-months in which more than 9 daily observations were missing or flagged as erroneous.
Monthly values calculated from the three daily data sources then were merged with two additional
sources of monthly data values to form a comprehensive dataset of serial monthly
temperature and precipitation values for each HCN station. Duplicate records between data sources were
eliminated. Following the merging procedure, the monthly values from all stations were subject to an
additional set of quality evaluation procedures, which removed between 0.1 and 0.2% of monthly temperature
values and less than 0.02% of monthly precipitation values.
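The missing-data rule for monthly temperature averages can be sketched as follows. This is a minimal illustration, not NCDC's processing code: the function name, the use of `None` to mark a missing or flagged day, and the sample values are assumptions made for the example. Precipitation would use a monthly total rather than a mean.

```python
def monthly_mean(daily_values, max_missing=9):
    """Average the usable daily values for one station-month.

    daily_values holds one entry per calendar day; None marks a day that is
    missing or was flagged as erroneous. Returns None when more than
    max_missing days are unusable, or when no usable day remains.
    """
    valid = [v for v in daily_values if v is not None]
    n_missing = len(daily_values) - len(valid)
    if n_missing > max_missing or not valid:
        return None
    return sum(valid) / len(valid)


# 30-day month with 9 unusable days: a monthly value is still computed.
print(monthly_mean([50.0] * 21 + [None] * 9))
# 30-day month with 10 unusable days: no monthly value is computed.
print(monthly_mean([50.0] * 20 + [None] * 10))
```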
TIME OF OBSERVATION BIAS ADJUSTMENTS
Next, monthly temperature values were adjusted for the time-of-observation
bias (Karl et al., 1986; Vose et al., 2003).
The Time of Observation Bias (TOB) arises when the 24-hour daily summary period at a
station begins and ends at an hour other than local midnight. When the summary period
ends at an hour other than midnight, monthly mean temperatures exhibit a
systematic bias relative to the local midnight standard (Baker, 1975).
In the U.S. Cooperative Observer Network, the ending hour of the 24-hour climatological day
typically varies from station to station and can change at a given station during its period of record.
The TOB-adjustment software uses an empirical model to estimate and adjust the monthly
temperature values so that they more closely resemble values based on the local midnight
summary period. The metadata archive is used to determine the time of observation for any
given period in a station's observational history.
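The origin of the bias can be illustrated with a toy calculation. The sketch below is not the empirical TOB model of Karl et al. (1986). It assumes a synthetic hourly series (a cosine diurnal cycle around a daily base temperature that alternates between cool and warm days) and compares the mean of daily (max + min) / 2 values for 24-hour windows ending at midnight with windows ending at 17:00. All names and numbers are hypothetical.

```python
import math


def hourly_temps(daily_base, amplitude=5.0, peak_hour=15):
    """Toy hourly series: a cosine diurnal cycle around each day's base value."""
    temps = []
    for base in daily_base:
        for hour in range(24):
            phase = 2 * math.pi * (hour - peak_hour) / 24
            temps.append(base + amplitude * math.cos(phase))
    return temps


def mean_midrange(temps, end_hour, n_days):
    """Mean of (max + min) / 2 over the 24-hour windows that end at end_hour
    on days 1 .. n_days - 1 (day 0 only supplies the start of the first window).
    end_hour = 24 gives windows that coincide with calendar days."""
    midranges = []
    for day in range(1, n_days):
        stop = 24 * day + end_hour        # index of the first hour after the window
        window = temps[stop - 24:stop]
        midranges.append((max(window) + min(window)) / 2)
    return sum(midranges) / len(midranges)


base = [10.0, 16.0] * 15                  # 30 synthetic days, alternating cool and warm
temps = hourly_temps(base)
midnight = mean_midrange(temps, 24, len(base))
afternoon = mean_midrange(temps, 17, len(base))
bias = afternoon - midnight
print(f"midnight-to-midnight mean: {midnight:.2f}")
print(f"17:00 observer mean:       {afternoon:.2f}")
print(f"difference:                {bias:+.2f}")
```

In this toy series the afternoon windows carry the warm late-afternoon reading of the previous day into the next window's maximum more often than they carry over a cold reading, so the 17:00 mean comes out warmer than the midnight mean.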
HOMOGENEITY TESTING AND ADJUSTMENT PROCEDURES
Following the TOB adjustments, the homogeneity of the TOB-adjusted temperature series is
assessed. In previous releases of the USHCN monthly dataset, homogeneity adjustments were
performed using the procedure described in Karl and Williams (1987). This procedure was
used to evaluate non-climatic discontinuities (artificial changepoints) in a station's
temperature or precipitation series caused by known changes to a station such as
equipment relocations and changes. Since knowledge of changes in the status of
observations comes from the station history metadata archive maintained at NCDC, the
original USHCN homogenization algorithm was known as the Station History
Adjustment Program (SHAP).
Unfortunately, station histories are often incomplete so artificial
discontinuities in a data series may occur on dates with no associated
record in the metadata archive. Undocumented station changes obviously
limit the effectiveness of SHAP. To remedy the problem of incomplete
station histories, the version 2 homogenization algorithm addresses both
documented and undocumented discontinuities.
The potential for undocumented discontinuities adds a layer of
complexity to homogeneity testing. Tests for undocumented changepoints, for
example, require different sets of test-statistic percentiles than those used in
analogous tests for documented discontinuities (Lund and Reeves, 2002).
For this reason, tests for undocumented changepoints are inherently less sensitive than
their counterparts used when changes are documented. Tests for documented changes
should, therefore, also be conducted where possible to maximize the power of detection for
all artificial discontinuities. In addition, since undocumented changepoints can occur in
all series, accurate attribution of any particular discontinuity between two climate series is more
challenging (Menne and Williams, 2005).
The USHCN version 2 "pairwise" homogenization algorithm addresses these and other
issues according to the following steps, which are described in detail in
Menne and Williams (2008).
At present, only temperature series are evaluated for artificial changepoints.
- First, a series of monthly temperature differences is formed between
numerous pairs of station series in a region. Specifically, difference series are
calculated between each target station series and a number (up to 40) of highly
correlated series from nearby stations. In effect, a matrix of difference
series is formed for a large fraction of all possible combinations of station
series pairs in each localized region. The station pool for this pairwise comparison of
series includes USHCN stations as well as other U.S. Cooperative Observer Network stations.
- Tests for undocumented changepoints are then applied to each paired difference
series. A hierarchy of changepoint models is used to distinguish whether the changepoint
appears to be a change in mean with no trend
(Alexandersson and Moberg, 1997), a
change in mean within a general trend (Wang, 2003), or a
change in mean coincident with a change in trend
(Lund and Reeves, 2002). Since all
difference series are composed of values from two series, a changepoint date in any one
difference series is temporarily attributed to both station series used to calculate the
differences. The result is a matrix of potential changepoint dates for each station series.
- The full matrix of changepoint dates is then "unconfounded" by identifying the
series common to multiple paired-difference series that have the same changepoint date. Since each
series is paired with a unique set of neighboring series, it is possible to determine whether more than one nearby
series share the same changepoint date.
- The magnitude of each relative changepoint is calculated using the most
appropriate two-phase regression model (e.g., a jump in mean with no trend in the series, a
jump in mean within a general linear trend, etc.). This magnitude is used to
estimate the "window of uncertainty" for each changepoint date since the most probable
date of an undocumented changepoint is subject to some sampling uncertainty, the
magnitude of which is a function of the size of the changepoint. Any cluster of
undocumented changepoint dates that falls within overlapping windows of
uncertainty is conflated to a single changepoint date according to
  - a known change date as documented in the target station's history
    archive (meaning the discontinuity does not appear to be undocumented), or
  - the most common undocumented changepoint date within the uncertainty window (meaning the discontinuity
    appears to be truly undocumented).
- Finally, multiple pairwise estimates of relative step change magnitude are re-calculated
(as a simple difference in mean) at all documented and undocumented discontinuities attributed to the
target series. The range of the pairwise estimates for each target step change is used to calculate
confidence limits for the magnitude of the discontinuity. Adjustments are made to the
target series using the estimates for each shift in the series.
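The first two steps above can be sketched for a single station pair. This is a deliberately simplified illustration, not the Menne and Williams (2008) algorithm: it forms one difference series and picks the split that best separates two constant means, with no significance test, no trend models, and no attribution across multiple pairs. The function names and the synthetic data are assumptions made for this example.

```python
from statistics import mean


def difference_series(target, neighbor):
    """Element-wise target minus neighbor; the shared regional signal cancels."""
    return [t - n for t, n in zip(target, neighbor)]


def best_mean_shift(diff, min_segment=3):
    """Return (k, shift) for the split diff[:k] / diff[k:] that maximizes the
    between-segment sum of squares under a two-constant-means model.
    shift is mean(after) - mean(before)."""
    n = len(diff)
    best_k, best_score = None, -1.0
    for k in range(min_segment, n - min_segment + 1):
        before, after = diff[:k], diff[k:]
        shift = mean(after) - mean(before)
        score = len(before) * len(after) / n * shift ** 2
        if score > best_score:
            best_k, best_score = k, score
    return best_k, mean(diff[best_k:]) - mean(diff[:best_k])


# Synthetic data: both stations share a regional signal; the target
# acquires an artificial +1.0 step starting at index 6.
signal = [10.0, 10.5, 9.8, 10.2, 10.1, 9.9, 10.3, 10.0, 10.4, 9.7, 10.2, 10.1]
neighbor = list(signal)
target = [s + (1.0 if i >= 6 else 0.0) for i, s in enumerate(signal)]

diff = difference_series(target, neighbor)
k, shift = best_mean_shift(diff)
print(k, round(shift, 3))
```

Because the difference series removes the climate signal common to both stations, the step stands out even though it is comparable in size to the month-to-month variability of either series alone.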
ESTIMATION OF MISSING VALUES
Following the homogenization process, estimates for missing data are
calculated using a weighted average of
values from highly correlated neighboring stations. The weights are determined using a procedure
similar to the SHAP routine. This program, called FILNET, uses the results from the
TOB and homogenization algorithms to obtain a more accurate estimate of the
climatological relationship between stations. The FILNET program also estimates
data across intervals in a station record where discontinuities occur in a short
time interval, which prevents the reliable estimation of appropriate adjustments.
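A weighted neighbor average of this kind might look like the sketch below. It is not FILNET: the weights here are arbitrary placeholders (the source says only that they come from a procedure similar to SHAP), and the sketch ignores any climatological offset between the target and its neighbors. The function name and sample values are hypothetical.

```python
def estimate_missing(neighbor_values, weights):
    """Weighted average of the neighbor values that are present.

    neighbor_values: values for the same month at neighboring stations,
                     with None where a neighbor is itself missing.
    weights:         one non-negative weight per neighbor (placeholder values).
    Returns None when no neighbor with positive weight has a value.
    """
    pairs = [(v, w) for v, w in zip(neighbor_values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        return None
    return sum(v * w for v, w in pairs) / total_weight


# Two neighbors report a value; the third is missing and is skipped.
print(estimate_missing([50.0, 52.0, None], [0.5, 0.5, 0.9]))
```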
URBANIZATION EFFECTS
In the original HCN, the regression-based approach of Karl et al. (1988)
was employed to account for urban heat islands. In contrast, no specific urban correction is
applied in HCN version 2 because the change-point detection algorithm effectively accounts for
any "local" trend at any individual station. In other words, the impact of urbanization and other
changes in land use is likely small in HCN version 2. Figure 1 - the minimum temperature
time series for Reno, Nevada - provides anecdotal evidence in this regard. In brief, the
black line represents unadjusted data, and the blue line represents fully adjusted data.
The unadjusted data clearly indicate that the station at Reno experienced both major step
changes (e.g., a move from the city to the airport during the 1930s) and trend
changes (e.g., a possible growing urban heat island beginning in the 1970s).
In contrast, the fully adjusted (homogenized) data indicate that both the
step-type changes and the trend changes have been effectively addressed
through the change-point detection process used in HCN version 2.
Figure 1. (a) Mean annual unadjusted and fully adjusted minimum temperatures at
Reno, Nevada. Error bars indicating the magnitude of uncertainty (±1 standard error) were
calculated via 100 Monte Carlo simulations that sampled within the range of the pairwise
estimates for the magnitude of each inhomogeneity; (b) difference between minimum temperatures at
Reno and the mean from its 10 nearest neighbors.
STATION INFORMATION
The format of each record in the USHCN station inventory file
(ushcn-stations.txt) is as follows.
| Variable | Columns | Type |
| COOP ID | 1-6 | Character |
| LATITUDE | 8-15 | Real |
| LONGITUDE | 17-25 | Real |
| ELEVATION | 27-32 | Real |
| STATE | 34-35 | Character |
| NAME | 37-66 | Character |
| COMPONENT 1 | 68-73 | Character |
| COMPONENT 2 | 75-80 | Character |
| COMPONENT 3 | 82-87 | Character |
| UTC OFFSET | 89-90 | Integer |
These variables have the following definitions:
| COOP ID | is the U.S. Cooperative Observer Network station identification code. Note that the first two digits in the Coop ID correspond to the assigned state number (see Table 1 below). |
| LATITUDE | is the latitude of the station (in decimal degrees). |
| LONGITUDE | is the longitude of the station (in decimal degrees). |
| ELEVATION | is the elevation of the station (in meters, missing = -999.9). |
| STATE | is the U.S. postal code for the state. |
| NAME | is the name of the station location. |
| COMPONENT 1 | is the Coop ID for the first station (in chronological order) whose records were joined with those of the HCN site to form a longer time series. "------" indicates "not applicable". |
| COMPONENT 2 | is the Coop ID for the second station (if applicable) whose records were joined with those of the HCN site to form a longer time series. |
| COMPONENT 3 | is the Coop ID for the third station (if applicable) whose records were joined with those of the HCN site to form a longer time series. |
| UTC OFFSET | is the time difference between Coordinated Universal Time (UTC) and local standard time at the station (i.e., the number of hours that must be added to local standard time to match UTC). |
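The column layout above translates directly into a fixed-width parser. The sketch below is one possible reading of it; the field names, the `None` conventions for "missing" and "not applicable", and the sample record are assumptions for illustration (the sample is a made-up line, not an entry from ushcn-stations.txt).

```python
# (name, first column, last column, converter); columns are 1-based and inclusive,
# exactly as listed in the station inventory format.
STATION_FIELDS = [
    ("coop_id", 1, 6, str),
    ("latitude", 8, 15, float),
    ("longitude", 17, 25, float),
    ("elevation", 27, 32, float),
    ("state", 34, 35, str),
    ("name", 37, 66, str),
    ("component_1", 68, 73, str),
    ("component_2", 75, 80, str),
    ("component_3", 82, 87, str),
    ("utc_offset", 89, 90, int),
]


def parse_station(line):
    """Split one inventory record into a dict of typed fields."""
    record = {}
    for name, first, last, convert in STATION_FIELDS:
        raw = line[first - 1:last].strip()
        record[name] = convert(raw) if raw else None
    if record["elevation"] == -999.9:          # documented missing-elevation code
        record["elevation"] = None
    for key in ("component_1", "component_2", "component_3"):
        if record[key] == "------":            # documented "not applicable" marker
            record[key] = None
    return record


# Made-up record built with the documented column widths (hypothetical station).
sample = (f"{'019999':<6} {31.0583:8.4f} {-87.0550:9.4f} {25.9:6.1f} "
          f"{'AL':<2} {'EXAMPLE STATION':<30} ------ ------ ------ {6:2d}")
station = parse_station(sample)
print(station)
```

The first two characters of `coop_id` give the state number in Table 1 (here `01`, Alabama).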
TABLE 1. State numbers and abbreviations for the contiguous United States
| State number | State abbreviation | State |
| 01 | AL | Alabama |
| 02 | AZ | Arizona |
| 03 | AR | Arkansas |
| 04 | CA | California |
| 05 | CO | Colorado |
| 06 | CT | Connecticut |
| 07 | DE | Delaware |
| 08 | FL | Florida |
| 09 | GA | Georgia |
| 10 | ID | Idaho |
| 11 | IL | Illinois |
| 12 | IN | Indiana |
| 13 | IA | Iowa |
| 14 | KS | Kansas |
| 15 | KY | Kentucky |
| 16 | LA | Louisiana |
| 17 | ME | Maine |
| 18 | MD | Maryland |
| 19 | MA | Massachusetts |
| 20 | MI | Michigan |
| 21 | MN | Minnesota |
| 22 | MS | Mississippi |
| 23 | MO | Missouri |
| 24 | MT | Montana |
| 25 | NE | Nebraska |
| 26 | NV | Nevada |
| 27 | NH | New Hampshire |
| 28 | NJ | New Jersey |
| 29 | NM | New Mexico |
| 30 | NY | New York |
| 31 | NC | North Carolina |
| 32 | ND | North Dakota |
| 33 | OH | Ohio |
| 34 | OK | Oklahoma |
| 35 | OR | Oregon |
| 36 | PA | Pennsylvania |
| 37 | RI | Rhode Island |
| 38 | SC | South Carolina |
| 39 | SD | South Dakota |
| 40 | TN | Tennessee |
| 41 | TX | Texas |
| 42 | UT | Utah |
| 43 | VT | Vermont |
| 44 | VA | Virginia |
| 45 | WA | Washington |
| 46 | WV | West Virginia |
| 47 | WI | Wisconsin |
| 48 | WY | Wyoming |
DATA FILES
USHCN data files may be downloaded from CDIAC's anonymous FTP area (see the
USHCN Data Access page).
There are four data files, two "estimated uncertainty" files, the station inventory file described above, and
a "status" file for the USHCN version 2 database. Filenames and further descriptions are as follows.
| FILENAME | DESCRIPTION |
| 9641C_YYYYMM_F52.max.gz | GZIP-compressed file of bias-adjusted mean monthly maximum temperatures |
| 9641C_YYYYMM_F52.min.gz | GZIP-compressed file of bias-adjusted mean monthly minimum temperatures |
| 9641C_YYYYMM_F52.avg.gz | GZIP-compressed file of the average of bias-adjusted mean monthly maximum and minimum temperatures |
| 9641C_YYYYMM_F52.pcp.gz | GZIP-compressed file of total monthly precipitation (un-adjusted) |
| 9641C_err_52d.max.gz | GZIP-compressed file of the estimated uncertainty associated with the bias-adjusted mean monthly maximum temperatures (1 standard error) |
| 9641C_err_52d.min.gz | GZIP-compressed file of the estimated uncertainty associated with the bias-adjusted mean monthly minimum temperatures (1 standard error) |
| ushcn-stations.txt | List of U.S. HCN stations and their coordinates |
| status.txt | Notes on the current status of USHCN Version 2 Monthly Data |
Each USHCN data file contains data for all 1218 stations for one of the four
meteorological variables (also known as data "elements").
Each record (line) in the files contains one year of 12 monthly values plus an
annual value, with formatting as follows:
| Variable | Columns | Type |
| STATION ID | 1-6 | Character |
| ELEMENT | 7 | Integer |
| YEAR | 8-11 | Integer |
| VALUE1 | 13-17 | Integer |
| FLAG1 | 18 | Character |
| VALUE2 | 20-24 | Integer |
| FLAG2 | 25 | Character |
| . | . | . |
| . | . | . |
| VALUE13 | 97-101 | Integer |
| FLAG13 | 102 | Character |
These variables have the following definitions:
| STATION ID | is the station identification code. Note that the first two characters in the Station ID correspond to the state number in Table 1. |
| ELEMENT | is the element code. There are four values corresponding to the element contained in the file: |
| | 1 = mean maximum temperature (in tenths of degrees F) |
| | 2 = mean minimum temperature (in tenths of degrees F) |
| | 3 = average temperature (in tenths of degrees F) |
| | 4 = total precipitation (in hundredths of inches) |
| YEAR | is the year of the record. |
| VALUE1 | is the value for January in the year of record (missing = -9999). |
| FLAG1 | is the flag for January in the year of record. There are five possible values: |
| | Blank = no flag is applicable; |
| | E = value is an estimate from surrounding values; no original value is available; |
| | I = monthly value calculated from incomplete daily data (1 to 9 days were missing); |
| | Q = value is an estimate from surrounding values; the original value was flagged by the monthly quality control algorithms; |
| | X = value is an estimate from surrounding values; the original was part of a block of monthly values that was too short to adjust in the temperature homogenization algorithm. |
| VALUE2 | is the value for February in the year of record. |
| FLAG2 | is the flag for February in the year of record. |
| . | . |
| . | . |
| VALUE12 | is the value for December in the year of record. |
| FLAG12 | is the flag for December in the year of record. |
| VALUE13 | is the annual value (mean for temperature; total for precipitation). |
| FLAG13 | is the flag for the annual value. |
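A data record can be read with the same fixed-width approach. The sketch below assumes the column positions and units listed above: each VALUE/FLAG pair occupies 7 characters starting at column 12, with the value in the next 5 columns and the flag in the last. The function name, the returned dict layout, and the sample line are hypothetical, and the sample values are made up.

```python
MISSING = -9999
# Divisors that convert stored integers to physical units, keyed by element code:
# elements 1-3 are tenths of degrees F, element 4 is hundredths of inches.
SCALE = {1: 10.0, 2: 10.0, 3: 10.0, 4: 100.0}


def parse_record(line):
    """Parse one station-year record into typed fields."""
    station_id = line[0:6]                # columns 1-6
    element = int(line[6])                # column 7
    year = int(line[7:11])                # columns 8-11
    values, flags = [], []
    for i in range(13):                   # 12 months plus the annual value
        start = 12 + 7 * i                # 0-based offset of VALUE(i+1)
        raw = int(line[start:start + 5])
        values.append(None if raw == MISSING else raw / SCALE[element])
        flags.append(line[start + 5:start + 6].strip())
    return {"station_id": station_id, "element": element, "year": year,
            "values": values, "flags": flags}


# Made-up record for a hypothetical station: element 1 (mean maximum temperature).
raw_values = [452, 481, -9999, 620, 700, 781, 805, 799, 742, 640, 550, 470, -9999]
raw_flags = [" ", "E", " ", "I", " ", " ", " ", " ", " ", " ", " ", " ", " "]
sample = "0199991" + "2008" + "".join(
    f" {v:5d}{f}" for v, f in zip(raw_values, raw_flags))
record = parse_record(sample)
print(record["values"][:4], record["flags"][:4])
```

An empty string in `flags` corresponds to the blank flag (no flag applicable).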
DATA ACCESS
The USHCN monthly data are available via FTP or a Web interface that allows users to
query, plot, and download individual station data. Please see the
USHCN Data Access page.
REFERENCES
- Alexandersson, H. and A. Moberg, 1997: Homogenization of
Swedish temperature data. Part I: Homogeneity test for linear trends. Int. J. Climatol., 17, 25-34.
- Baker, D. G., 1975: Effect of observation time on mean temperature
estimation. J. Appl. Meteor., 14, 471-476.
- Easterling, D. R., T. R. Karl, E.H. Mason, P. Y. Hughes, and D. P. Bowman. 1996.
United States Historical Climatology Network (U.S. HCN) Monthly Temperature and Precipitation Data.
ORNL/CDIAC-87, NDP-019/R3.
Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, U.S. Department of Energy, Oak Ridge, Tennessee.
- Karl, T.R., H.F. Diaz, and G. Kukla, 1988: Urbanization: its detection and effect in the United States climate record. J. Climate, 1, 1099-1123.
- Karl, T.R., and C.N. Williams Jr., 1987: An approach to adjusting climatological time series for discontinuous inhomogeneities. J. Climate Appl. Meteor., 26, 1744-1763.
- Karl, T.R., C.N. Williams, Jr., P.J. Young, and W.M. Wendland, 1986: A model to estimate the time of observation bias associated with monthly mean maximum, minimum, and mean temperature for the United States, J. Climate Appl. Meteor., 25, 145-160.
- Lund, R., and J. Reeves, 2002: Detection of undocumented changepoints: a revision of the two-phase regression model. J. Climate, 15, 2547-2554.
- Menne, M.J., and C.N. Williams, Jr., 2005: Detection of undocumented changepoints using multiple test statistics and composite reference series. J. Climate, 18, 4271-4286.
- Menne, M.J., and C.N. Williams, Jr., 2008: Homogenization of temperature series via pairwise comparisons. J. Climate, early online release, doi: 10.1175/2008JCLI2263.1.
- Menne, M.J., C.N. Williams, and R.S. Vose, 2009: The United States Historical Climatology Network Monthly Temperature Data - Version 2. Bull. Amer. Meteor. Soc., 90, 993-1007, doi: 10.1175/2008BAMS2613.1.
- Quinlan, F. T., T. R. Karl, and C. N. Williams, Jr. 1987. United States Historical Climatology Network (HCN) Serial Temperature and Precipitation Data. NDP-019. Carbon Dioxide Information Analysis Center. Oak Ridge National Laboratory, U.S. Department of Energy, Oak Ridge, Tennessee.
- Vose, R.S., C.N. Williams, T.C. Peterson, T.R. Karl, and D.R. Easterling, 2003: An evaluation of the time of observation bias adjustment in the US Historical Climatology Network. Geophysical Research Letters, 30 (20), 2046, CLIM 3-1 to 3-4, doi:10.1029/2003GL018111.
- Wang, X.L., 2003: Comments on "Detection of undocumented changepoints: A revision of the two-phase model". J. Climate, 16, 3383-3385.
CONTACTS
Questions regarding the USHCN web site or data may be directed to
Dale Kaiser at CDIAC.