Long-Term Daily and Monthly Climate Records from
Stations Across the Contiguous United States
 
  UNITED STATES HISTORICAL CLIMATOLOGY NETWORK  
       

M.J. Menne, C.N. Williams, Jr., and R.S. Vose
National Climatic Data Center, National Oceanic and Atmospheric Administration


       
  MONTHLY TEMPERATURE AND PRECIPITATION DATA
 


       
 

   
 

Please cite data as: M. J. Menne, C. N. Williams, Jr., and R. S. Vose, 2009. United States Historical Climatology Network (USHCN) Version 2 Serial Monthly Dataset. Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, Oak Ridge, Tennessee.
Last updated June 2009.

   

   

  INTRODUCTION

The United States Historical Climatology Network (USHCN) is essentially a subset of the U.S. Cooperative Observer Network operated by NOAA's National Weather Service (NWS). The approximately 1200 HCN stations were originally selected according to factors such as record longevity, percentage of missing values, spatial coverage, and the number of station moves or other station changes that may affect data homogeneity. Most HCN stations are situated in rural areas or small towns; however, a smaller number of stations are also part of the NOAA NWS synoptic network, whose stations are generally located at airports in more urbanized environments. USHCN datasets have been developed at NOAA's National Climatic Data Center (NCDC) in collaboration with the Department of Energy's Carbon Dioxide Information Analysis Center (CDIAC).

The USHCN project dates to the mid-1980s (Quinlan et al. 1987). At that time, in response to the need for an accurate, unbiased, modern historical climate record for the United States, personnel at the Global Change Research Program of the U.S. Department of Energy and at NCDC defined a network of 1219 stations in the contiguous United States whose observations would comprise a key baseline dataset for monitoring U.S. climate. Since then, the USHCN dataset has been updated several times (e.g., Karl et al., 1990; Easterling et al., 1996). The USHCN version 2 serial monthly data release is the most recent update to the HCN datasets. Version 2 data were produced using a new set of quality control and homogeneity assessment algorithms. Two papers have been prepared (Menne and Williams, 2008 and Menne et al., 2008) that provide an overall description of the adjustment methodology as well as an assessment of the version 2 maximum and minimum temperature trends. A brief summary of HCN processing steps is also provided below. The methodology used in previous releases of the version 1 monthly data is described at the NCDC USHCN Version 1 web site.

The USHCN database is used by NOAA to monitor temperature and precipitation over the U.S. This includes the calculation of trends over roughly the last century and regular updates to yearly and monthly state/regional rankings of temperature and precipitation (see NCDC's Climate Monitoring web page). Further background on the USHCN's use in this work may be found at NCDC's National Temperature Trends: The Science Behind the Calculations web page.


 
  VERSION 2 DATA PROCESSING STEPS

The data from each HCN station were subject to the following quality control and homogeneity testing and adjustment procedures.

 
  QUALITY EVALUATION AND DATABASE CONSTRUCTION

First, daily maximum and minimum temperatures and total precipitation were extracted from a number of different NCDC data sources and subjected to a series of quality evaluation checks. The three sources of daily observations included DSI-3200, DSI-3206 and DSI-3210. Daily maximum and minimum temperature values that passed the evaluation checks were used to compute monthly average values. However, no monthly temperature average or total precipitation value was calculated for station-months in which more than 9 daily observations were missing or flagged as erroneous. Monthly values calculated from the three daily data sources then were merged with two additional sources of monthly data values to form a comprehensive dataset of serial monthly temperature and precipitation values for each HCN station. Duplicate records between data sources were eliminated. Following the merging procedure, the monthly values from all stations were subject to an additional set of quality evaluation procedures, which removed between 0.1 and 0.2% of monthly temperature values and less than 0.02% of monthly precipitation values.
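The missing-data rule above can be illustrated with a minimal sketch. It assumes daily values arrive as a list in which `None` marks a day that is missing or flagged as erroneous; the function name and the data layout are illustrative only and are not taken from the NCDC processing software.

```python
from typing import Optional, Sequence

MAX_MISSING_DAYS = 9  # threshold stated in the text above


def monthly_mean(daily: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of the valid daily values, or None if more than 9 days are missing.

    A None entry stands for a day that was missing or flagged as erroneous
    (an assumed representation, chosen for this sketch).
    """
    valid = [v for v in daily if v is not None]
    missing = len(daily) - len(valid)
    if missing > MAX_MISSING_DAYS or not valid:
        return None  # no monthly value is calculated for this station-month
    return sum(valid) / len(valid)
```

With this rule, a 30-day month with 9 missing days still yields a monthly mean, while the same month with 10 missing days does not.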

 
  TIME OF OBSERVATION BIAS ADJUSTMENTS

Next, monthly temperature values were adjusted for the time-of-observation bias (Karl et al., 1986; Vose et al., 2003). The Time of Observation Bias (TOB) arises when the 24-hour daily summary period at a station begins and ends at an hour other than local midnight. When the summary period ends at an hour other than midnight, monthly mean temperatures exhibit a systematic bias relative to the local midnight standard (Baker, 1975). In the U.S. Cooperative Observer Network, the ending hour of the 24-hour climatological day typically varies from station to station and can change at a given station during its period of record. The TOB-adjustment software uses an empirical model to estimate the bias and adjust the monthly temperature values so that they more closely resemble values based on the local midnight summary period. The metadata archive is used to determine the time of observation for any given period in a station's observational history.

 
  HOMOGENEITY TESTING AND ADJUSTMENT PROCEDURES

Following the TOB adjustments, the homogeneity of the TOB-adjusted temperature series is assessed. In previous releases of the USHCN monthly dataset, homogeneity adjustments were performed using the procedure described in Karl and Williams (1987). This procedure was used to evaluate non-climatic discontinuities (artificial changepoints) in a station's temperature or precipitation series caused by known changes to a station such as equipment relocations and changes. Since knowledge of changes in the status of observations comes from the station history metadata archive maintained at NCDC, the original USHCN homogenization algorithm was known as the Station History Adjustment Program (SHAP).

Unfortunately, station histories are often incomplete so artificial discontinuities in a data series may occur on dates with no associated record in the metadata archive. Undocumented station changes obviously limit the effectiveness of SHAP. To remedy the problem of incomplete station histories, the version 2 homogenization algorithm addresses both documented and undocumented discontinuities.

The potential for undocumented discontinuities adds a layer of complexity to homogeneity testing. Tests for undocumented changepoints, for example, require different sets of test-statistic percentiles than those used in analogous tests for documented discontinuities (Lund and Reeves, 2002). For this reason, tests for undocumented changepoints are inherently less sensitive than their counterparts used when changes are documented. Tests for documented changes should, therefore, also be conducted where possible to maximize the power of detection for all artificial discontinuities. In addition, since undocumented changepoints can occur in all series, accurate attribution of any particular discontinuity between two climate series is more challenging (Menne and Williams, 2005).

The USHCN version 2 "pairwise" homogenization algorithm addresses these and other issues according to the following steps, which are described in detail in Menne and Williams (2008). At present, only temperature series are evaluated for artificial changepoints.

  1. First, a series of monthly temperature differences is formed between numerous pairs of station series in a region. Specifically, difference series are calculated between each target station series and a number (up to 40) of highly correlated series from nearby stations. In effect, a matrix of difference series is formed for a large fraction of all possible combinations of station series pairs in each localized region. The station pool for this pairwise comparison of series includes USHCN stations as well as other U.S. Cooperative Observer Network stations.
  2. Tests for undocumented changepoints are then applied to each paired difference series. A hierarchy of changepoint models is used to distinguish whether the changepoint appears to be a change in mean with no trend (Alexandersson and Moberg, 1997), a change in mean within a general trend (Wang, 2003), or a change in mean coincident with a change in trend (Lund and Reeves, 2002). Since each difference series is composed of values from two series, a changepoint date in any one difference series is temporarily attributed to both station series used to calculate the differences. The result is a matrix of potential changepoint dates for each station series.
  3. The full matrix of changepoint dates is then "unconfounded" by identifying the series common to multiple paired-difference series that have the same changepoint date. Since each series is paired with a unique set of neighboring series, it is possible to determine whether more than one nearby series share the same changepoint date.
  4. The magnitude of each relative changepoint is calculated using the most appropriate two-phase regression model (e.g., a jump in mean with no trend in the series, a jump in mean within a general linear trend, etc.). This magnitude is used to estimate the "window of uncertainty" for each changepoint date since the most probable date of an undocumented changepoint is subject to some sampling uncertainty, the magnitude of which is a function of the size of the changepoint. Any cluster of undocumented changepoint dates that falls within overlapping windows of uncertainty is conflated to a single changepoint date according to
    • a known change date as documented in the target station's history archive (meaning the discontinuity does not appear to be undocumented), or
    • the most common undocumented changepoint date within the uncertainty window (meaning the discontinuity appears to be truly undocumented)
  5. Finally, multiple pairwise estimates of relative step change magnitude are re-calculated (as a simple difference in mean) at all documented and undocumented discontinuities attributed to the target series. The range of the pairwise estimates for each target step change is used to calculate confidence limits for the magnitude of the discontinuity. Adjustments are made to the target series using the estimates for each shift in the series.
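Two of the simpler operations in the steps above, forming a paired difference series (step 1) and estimating a step change as a simple difference in means (step 5), can be sketched as follows. This is an illustration only, not the pairwise algorithm of Menne and Williams (2008): the changepoint index is assumed to be already known, none of the changepoint tests or the attribution logic is implemented, and the function names are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence


def difference_series(target: Sequence[Optional[float]],
                      neighbor: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Target minus neighbor, month by month; None where either value is missing."""
    return [None if t is None or n is None else t - n
            for t, n in zip(target, neighbor)]


def step_magnitude(diff: Sequence[Optional[float]], changepoint: int) -> float:
    """Mean of the differences from `changepoint` onward minus the mean before it.

    `changepoint` is the index of the first value after the shift and is
    assumed to have been identified elsewhere.
    """
    before = [d for d in diff[:changepoint] if d is not None]
    after = [d for d in diff[changepoint:] if d is not None]
    return mean(after) - mean(before)


# Made-up monthly values: the target series shifts upward by 1.0 at index 5.
target = [10.0] * 5 + [11.0] * 5
neighbor = [9.5] * 10
shift = step_magnitude(difference_series(target, neighbor), 5)
```

In the procedure described above, such an estimate would be repeated for each neighbor paired with the target, and the range of the resulting values would inform the confidence limits for the adjustment.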

 
  ESTIMATION OF MISSING VALUES

Following the homogenization process, estimates for missing data are calculated using a weighted average of values from highly correlated neighboring stations. The weights are determined using a procedure similar to the SHAP routine. This program, called FILNET, uses the results from the TOB and homogenization algorithms to obtain a more accurate estimate of the climatological relationship between stations. The FILNET program also estimates data across intervals in a station record where discontinuities occur in a short time interval, which prevents the reliable estimation of appropriate adjustments.
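The idea of a weighted neighbor average can be sketched in a few lines. The weights here are plain inputs supplied by the caller; how FILNET derives its weights, and how it accounts for the climatological relationship between stations, is not reproduced, so this is an assumption-laden illustration with a hypothetical function name.

```python
from typing import Optional, Sequence


def estimate_missing(neighbor_values: Sequence[Optional[float]],
                     weights: Sequence[float]) -> Optional[float]:
    """Weighted average of the neighbor values that are present.

    Neighbors with a missing value (None) or a non-positive weight are
    skipped. Returns None if no usable neighbor remains.
    """
    pairs = [(v, w) for v, w in zip(neighbor_values, weights)
             if v is not None and w > 0]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        return None
    return sum(v * w for v, w in pairs) / total_weight
```

A neighbor that is itself missing in the month being estimated simply drops out, and the remaining weights are renormalized by the division.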

 
  URBANIZATION EFFECTS

In the original HCN, the regression-based approach of Karl et al. (1988) was employed to account for urban heat islands. In contrast, no specific urban correction is applied in HCN version 2 because the change-point detection algorithm effectively accounts for any "local" trend at any individual station. In other words, the impact of urbanization and other changes in land use is likely small in HCN version 2. Figure 1, the minimum temperature time series for Reno, Nevada, provides anecdotal evidence in this regard. In brief, the black line represents unadjusted data, and the blue line represents fully adjusted data. The unadjusted data clearly indicate that the station at Reno experienced both major step changes (e.g., a move from the city to the airport during the 1930s) and trend changes (e.g., a possible growing urban heat island beginning in the 1970s). In contrast, the fully adjusted (homogenized) data indicate that both the step-type changes and the trend changes have been effectively addressed through the change-point detection process used in HCN version 2.


Figure 1. (a) Mean annual unadjusted and fully adjusted minimum temperatures at Reno, Nevada. Error bars indicating the magnitude of uncertainty (±1 standard error) were calculated via 100 Monte Carlo simulations that sampled within the range of the pairwise estimates for the magnitude of each inhomogeneity; (b) difference between minimum temperatures at Reno and the mean from its 10 nearest neighbors.

  STATION INFORMATION

The format of each record in the USHCN station inventory file (ushcn-stations.txt) is as follows.
Variable   Columns   Type
COOP ID   1-6   Character
LATITUDE   8-15   Real
LONGITUDE   17-25   Real
ELEVATION   27-32   Real
STATE   34-35   Character
NAME   37-66   Character
COMPONENT 1   68-73   Character
COMPONENT 2   75-80   Character
COMPONENT 3   82-87   Character
UTC OFFSET   89-90   Integer

These variables have the following definitions:

COOP ID   is the U.S. Cooperative Observer Network station identification code. Note that the first two digits in the Coop ID correspond to the assigned state number (see Table 1 below).
     
LATITUDE   is latitude of the station (in decimal degrees).
     
LONGITUDE   is the longitude of the station (in decimal degrees).
     
ELEVATION   is the elevation of the station (in meters, missing = -999.9).
     
STATE   is the U.S. postal code for the state.
     
NAME   is the name of the station location.
     
COMPONENT 1   is the Coop Id for the first station (in chronologic order) whose records were joined with those of the HCN site to form a longer time series. "------" indicates "not applicable".
     
COMPONENT 2   is the Coop Id for the second station (if applicable) whose records were joined with those of the HCN site to form a longer time series.
     
COMPONENT 3   is the Coop Id for the third station (if applicable) whose records were joined with those of the HCN site to form a longer time series.
     
UTC OFFSET   is the time difference between Coordinated Universal Time (UTC) and local standard time at the station (i.e., the number of hours that must be added to local standard time to match UTC).
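A record in this fixed-width layout can be read by slicing on the column ranges listed above. The following is a minimal sketch; the `Station` type, the function names, and the sample record are made up for illustration, and the treatment of "------" and -999.9 as "not available" follows the definitions above.

```python
from typing import NamedTuple, Optional, Tuple


class Station(NamedTuple):
    coop_id: str
    latitude: float
    longitude: float
    elevation_m: Optional[float]  # None when the file holds -999.9
    state: str
    name: str
    components: Tuple[Optional[str], ...]  # None where the file holds "------"
    utc_offset: int


def _component(field: str) -> Optional[str]:
    field = field.strip()
    return None if not field or set(field) == {"-"} else field


def parse_station(line: str) -> Station:
    """Slice one inventory record; columns in the table are 1-based, inclusive."""
    elevation = float(line[26:32])
    return Station(
        coop_id=line[0:6],
        latitude=float(line[7:15]),
        longitude=float(line[16:25]),
        elevation_m=None if elevation == -999.9 else elevation,
        state=line[33:35],
        name=line[36:66].strip(),
        components=tuple(_component(line[a:b])
                         for a, b in ((67, 73), (74, 80), (81, 87))),
        utc_offset=int(line[88:90]),
    )


# A made-up record laid out in the documented columns (not a real station).
sample = (f"{'019999':6} {31.0583:8.4f} {-87.0550:9.4f} {25.9:6.1f} {'AL':2} "
          f"{'EXAMPLE STATION':30} {'------':6} {'------':6} {'------':6} {6:2d}")
station = parse_station(sample)
```

The first two characters of `coop_id` give the state number listed in Table 1.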


  TABLE 1. State numbers and abbreviations for the contiguous United States  
 
State number   State abbreviation   State
01 AL Alabama
02 AZ Arizona
03 AR Arkansas
04 CA California
05 CO Colorado
06 CT Connecticut
07 DE Delaware
08 FL Florida
09 GA Georgia
10 ID Idaho
11 IL Illinois
12 IN Indiana
13 IA Iowa
14 KS Kansas
15 KY Kentucky
16 LA Louisiana
17 ME Maine
18 MD Maryland
19 MA Massachusetts
20 MI Michigan
21 MN Minnesota
22 MS Mississippi
23 MO Missouri
24 MT Montana
25 NE Nebraska
26 NV Nevada
27 NH New Hampshire
28 NJ New Jersey
29 NM New Mexico
30 NY New York
31 NC North Carolina
32 ND North Dakota
33 OH Ohio
34 OK Oklahoma
35 OR Oregon
36 PA Pennsylvania
37 RI Rhode Island
38 SC South Carolina
39 SD South Dakota
40 TN Tennessee
41 TX Texas
42 UT Utah
43 VT Vermont
44 VA Virginia
45 WA Washington
46 WV West Virginia
47 WI Wisconsin
48 WY Wyoming

  DATA FILES

USHCN data files may be downloaded from CDIAC's anonymous FTP area (see the USHCN Data Access page). There are four data files, two "estimated uncertainty" files, the station inventory file described above, and a "status" file for the USHCN version 2 database. Filenames and further descriptions are as follows.

FILENAME   DESCRIPTION
     
9641C_YYYYMM_F52.max.gz   GZIP-compressed file of bias-adjusted mean monthly maximum temperatures
9641C_YYYYMM_F52.min.gz   GZIP-compressed file of bias-adjusted mean monthly minimum temperatures
9641C_YYYYMM_F52.avg.gz   GZIP-compressed file of the average of bias-adjusted mean monthly maximum and minimum temperatures
9641C_YYYYMM_F52.pcp.gz   GZIP-compressed file of total monthly precipitation (un-adjusted)
9641C_err_52d.max.gz   GZIP-compressed file of the estimated uncertainty associated with the bias-adjusted mean monthly maximum temperatures (1 standard error)
9641C_err_52d.min.gz   GZIP-compressed file of the estimated uncertainty associated with the bias-adjusted mean monthly minimum temperatures (1 standard error)
ushcn-stations.txt   List of U.S. HCN stations and their coordinates
status.txt   Notes on the current status of USHCN Version 2 Monthly Data


Each USHCN data file contains data for all 1218 stations for one of the four meteorological variables (also known as data "elements"). Each record (line) in the files contains one year of 12 monthly values plus an annual value, with formatting as follows:
Variable   Columns   Type
STATION ID   1-6   Character
ELEMENT   7-7   Integer
YEAR   8-11   Integer
VALUE1   13-17   Integer
FLAG1   18-18   Character
VALUE2   20-24   Integer
FLAG2   25   Character
.   .   .
.   .   .
VALUE13   97-101   Integer
FLAG13   102   Character

These variables have the following definitions:

STATION ID   is the station identification code. Note that the first two characters in the Station ID correspond to the state number in Table 1.
     
ELEMENT   is the element code. There are four values corresponding to the element contained in the file:
    1 = mean maximum temperature (in tenths of degrees F)
    2 = mean minimum temperature (in tenths of degrees F)
    3 = average temperature (in tenths of degrees F)
    4 = total precipitation (in hundredths of inches)
     
YEAR   is the year of the record.
     
VALUE1   is the value for January in the year of record (missing = -9999).
 
FLAG1   is the flag for January in the year of record. There are five possible values:
  Blank = no flag is applicable
  E = value is an estimate from surrounding values; no original value is available;
  I = monthly value calculated from incomplete daily data (1 to 9 days were missing);
  Q = value is an estimate from surrounding values; the original value was flagged by the monthly quality control algorithms;
  X = value is an estimate from surrounding values; the original was part of a block of monthly values that was too short to adjust in the temperature homogenization algorithm.
     
VALUE2   is the value for February in the year of record.
     
FLAG2   is the flag for February in the year of record.
.   .
.   .
VALUE12   is the value for December in the year of record.
     
FLAG12   is the flag for December in the year of record.
     
VALUE13   is the annual value (mean for temperature; total for precipitation).
     
FLAG13   is the flag for the annual value.
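Given the column layout above, each value/flag pair occupies a 7-column stride starting at column 13, so a data record can be parsed with simple slicing. The sketch below is illustrative: the `Record` type and function name are hypothetical, values are converted from stored integers to degrees F or inches using the units listed for each element code, and -9999 is mapped to `None`.

```python
from typing import List, NamedTuple, Optional

MISSING = -9999
# Divisors implied by the units given for each element code
# (tenths of degrees F for 1-3, hundredths of inches for 4).
SCALE = {1: 10.0, 2: 10.0, 3: 10.0, 4: 100.0}


class Record(NamedTuple):
    station_id: str
    element: int
    year: int
    values: List[Optional[float]]  # 12 monthly values, then the annual value
    flags: List[str]               # "" where no flag applies


def parse_record(line: str) -> Record:
    """Slice one data record; columns in the table are 1-based, inclusive."""
    element = int(line[6:7])
    values: List[Optional[float]] = []
    flags: List[str] = []
    for k in range(13):
        start = 12 + 7 * k  # VALUE(k+1) occupies columns 13+7k through 17+7k
        raw = int(line[start:start + 5])
        values.append(None if raw == MISSING else raw / SCALE[element])
        flags.append(line[start + 5:start + 6].strip())
    return Record(line[0:6], element, int(line[7:11]), values, flags)


# A made-up record: station 019999, element 1 (mean maximum), year 2000.
raw_values = [551, -9999, 650, 720, 801, 880, 915, 905, 850, 750, 640, 560, 747]
raw_flags = [" ", " ", "E", " ", " ", "I", " ", " ", " ", " ", " ", " ", " "]
sample = "01999912000" + "".join(f" {v:5d}{f}" for v, f in zip(raw_values, raw_flags))
record = parse_record(sample)
```

The files themselves are GZIP-compressed; Python's standard `gzip.open(path, "rt")` returns a text stream whose lines could be passed to a parser like this one.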

   
 

DATA ACCESS

The USHCN monthly data are available via FTP or a Web interface that allows users to query, plot, and download individual station data. Please see the USHCN Data Access page.

   
 


 

CONTACTS

Questions regarding the USHCN web site or data may be directed to Dale Kaiser at CDIAC.

     

This site provided by the Oak Ridge National Laboratory
ORNL is managed by UT-Battelle LLC. for the U.S. Department of Energy