Last updated February 21, 2019
This methodology explains how EPI measures wages, hours, and compensation using a variety of government data sources. Primarily, EPI’s wage data come from the Current Population Survey (CPS), the federal government survey that is best known for providing the monthly estimates of unemployment. This document covers:
- the methodology for measuring hourly wages using the Current Population Survey Outgoing Rotation Group (CPS ORG) and for measuring annual wages and hours worked using the Current Population Survey Annual Social and Economic Supplement (CPS ASEC)
- a discussion of wage data from the Social Security Administration (SSA)
- an explanation of EPI’s measurement of compensation (which includes wages and benefits) using Bureau of Economic Analysis National Income and Product Accounts (NIPA) tables and Bureau of Labor Statistics National Compensation Survey’s Employer Costs for Employee Compensation (ECEC) data
The methodology outlined here is used in the wages, wage gaps, and benefits and compensation sections of EPI’s State of Working America Data Library as well as EPI reports that use this data.
The CPS is a monthly household survey prepared by the U.S. Census Bureau for the Bureau of Labor Statistics (BLS). Specifically, for 1979 and beyond, we analyze microdata files that contain a full year’s data on the outgoing rotation groups (ORG) in the CPS. (For years prior to 1979, we use the CPS May files; our use of these files is discussed later on this page.) We believe that the CPS-ORG files allow for timely and accurate analyses of wage trends that are in keeping with the familiar labor force definitions and concepts employed by BLS.
The sampling framework of the CPS is a “rolling panel,” in which households are in the survey for four consecutive months, out for eight, and then back in for four months. The ORG files provide data on those CPS respondents in either their fourth or eighth month of the survey (i.e., in groups four or eight, out of a total of eight groups). Therefore, in any given month, the ORG file represents a quarter of the CPS sample. For a given year, the ORG file is equivalent in size to three months of CPS files (one-fourth of 12 months). For our analyses, we use a sample drawn from the full-year ORG sample, the size of which ranges from 170,000 to 199,000 observations during the 1979 to 1995 period. Due to a decrease in the overall sample size of the CPS, the ORG shrank to an average of 154,000 cases from 1996 to 1998, and averaged 171,000 cases from 1999 to 2014. Our most recent sample contains about 159,000 cases.
Our subsample includes all wage and salary workers with valid wage and hour data, whether paid weekly or by the hour. Additionally, in order to be included in our subsample, respondents had to meet the following criteria:
- be age 16 or older
- be employed in the public or private sector (self-employed were excluded)
- have hours worked within the valid range in the survey (1–99 per week, or “hours vary”—see discussion below)
- have either hourly or weekly wages within the valid survey range (top-coding discussed below)
Exceptions to this subsample are few and include the entry-level worker sample, which is 17- to 24-year-olds.
For those who met these criteria, an hourly wage was calculated in the following manner: If a valid hourly wage was reported, that wage was used throughout our analysis. For salaried workers (those who report only a weekly wage), the hourly wage was the weekly wage divided by the hours worked. Outliers, i.e., persons with hourly wages below 50 cents or above $100 in 1989 dollars (adjusted by the CPI-U-RS consumer price index), were removed from the analysis. These yearly upper and lower bounds are presented in Table 1. CPS demographic weights were applied to make the sample nationally representative.
Table 1. Wage earner sample, hourly wage lower and upper limits, 1973–2019
| Year | CPI-U-RS, extended | Lower limit ($) | Upper limit ($) |
Source: EPI analysis of Current Population Survey Outgoing Rotation Group microdata
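The hourly wage construction and outlier screen described above can be sketched as follows. This is a minimal illustration, not EPI's actual code: the function names and the argument layout are hypothetical, and the index values passed in must come from the CPI-U-RS series itself (none are supplied here).

```python
from typing import Optional, Tuple

# Outlier bounds in 1989 dollars, as stated in the text.
LOWER_1989 = 0.50
UPPER_1989 = 100.00


def wage_bounds(cpi_year: float, cpi_1989: float) -> Tuple[float, float]:
    """Express the 1989-dollar bounds in the dollars of the survey year."""
    factor = cpi_year / cpi_1989
    return LOWER_1989 * factor, UPPER_1989 * factor


def hourly_wage(
    reported_hourly: Optional[float],
    weekly_wage: Optional[float],
    weekly_hours: Optional[float],
    cpi_year: float,
    cpi_1989: float,
) -> Optional[float]:
    """Return the hourly wage, or None if it is missing or an outlier."""
    if reported_hourly is not None:
        wage = reported_hourly              # hourly workers: use the reported rate
    elif weekly_wage is not None and weekly_hours:
        wage = weekly_wage / weekly_hours   # salaried workers: weekly wage / hours
    else:
        return None
    lower, upper = wage_bounds(cpi_year, cpi_1989)
    return wage if lower <= wage <= upper else None
```

For instance, with a purely illustrative index that has doubled since 1989, the bounds become $1.00 and $200.00, and a salaried worker earning $800 for 40 hours is assigned an hourly wage of $20.00.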
The hourly wage reported by hourly workers in the CPS excludes overtime, tips, or commissions (OTTC), thus introducing a potential undercount in the hourly wage of workers who regularly receive tips or premium pay. OTTC is included in the usual weekly earnings of hourly workers, which raises the possibility of assigning an imputed hourly wage to hourly workers based on the reported weekly wage and hours worked per week. Conceptually, using this imputed wage is preferable to using the reported hourly wage because it is more inclusive. We have chosen, however, not to use this broader wage measure, because the extra information on OTTC seems unreliable. We compared the imputed hourly wage (reported weekly earnings divided by weekly hours) with the reported hourly wage; the difference presumably reflects OTTC. This comparison showed that significant percentages of the hourly workforce appeared to receive negative OTTC. These error rates range from a low of 0 percent of the hourly workforce in 1989–1993 to a high of 16–17 percent in 1973–1988, and persist across the survey change from 1993 to 1994. Since negative OTTC is clearly implausible, we rejected this imputed hourly wage series and rely strictly on the hourly rate of pay as reported directly by hourly workers, subject to the sample criteria discussed above.
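The consistency check behind this decision can be sketched as below. The record layout (weekly earnings, weekly hours, reported hourly wage, survey weight) is a hypothetical simplification, and the text does not say whether any tolerance was applied, so this sketch flags any strictly negative difference.

```python
from typing import Iterable, Tuple

# Each record: (weekly_earnings, weekly_hours, reported_hourly_wage, weight).
Record = Tuple[float, float, float, float]


def negative_ottc_share(hourly_workers: Iterable[Record]) -> float:
    """Weighted share of hourly workers whose implied OTTC is negative."""
    total_weight = 0.0
    negative_weight = 0.0
    for weekly_earnings, weekly_hours, reported_hourly, weight in hourly_workers:
        imputed_hourly = weekly_earnings / weekly_hours
        implied_ottc = imputed_hourly - reported_hourly  # should not be below zero
        total_weight += weight
        if implied_ottc < 0:
            negative_weight += weight
    return negative_weight / total_weight
```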
Constructing wage deciles is often complicated by the fact that reported wages in the data tend to “clump,” particularly around round numbers. So, for example, a large share of workers might report their hourly wage as $10.00 per hour, even if they actually make $9.87 or $10.14. And, of course, the true wage distribution might itself be a bit clumpy, meaning that an unusually large share of workers really might make exactly $10.00 per hour. This becomes a problem if the 10th percentile wage happens to fall right into such a clump: more than 10 percent of workers can then appear to earn the 10th percentile wage or less. A final problem arises when a percentile (say the 10th) lands right on a large clump of workers. This can make that percentile show no growth for a couple of years. Again, say that many workers report a wage of $10.00, and they do so in a number of successive years even as their actual wage changes from (say) $9.90 to $10.10. If the 10th percentile landed right on the $10.00 clump, it could look static for years in the nominal data.
For these reasons, we enforce some smoothness in the wage data to ensure that exactly 10 percent of our sample falls at or under the 10th percentile wage (and 20 percent falls under the 20th percentile and so on). The smoothing technique involves creating a categorical hourly wage distribution, in which the categories are 25-cent intervals1 (also referred to as 25-cent bins). We then find the bins on either side of each decile and perform a weighted, linear interpolation to locate the wage cutoffs for each of the particular deciles. The weights for the interpolation are derived from differences in the cumulative percentages on either side of the decile. For example, suppose that the 48th percentile of the wage distribution of workers by wage level is in the $9.51–$9.75 wage bin, and the 51st percentile is in the next higher bin, $9.76–$10.00. The weight for the interpolation (in this case, the median, or 50th percentile) is (50–48)/ (51–48), or two-thirds. The interpolated median equals this weight, times the width of the bin ($.25), plus the upper bound of the previous bin ($9.75); $9.92 in this example.
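The binning and interpolation just described can be sketched as follows. This is an illustrative reading of the procedure, not EPI's code: the function name is hypothetical, and bins are assumed to be closed on the right (so $9.76–$10.00 is one bin).

```python
import math
from typing import Dict, Sequence

BIN_WIDTH = 0.25  # 25-cent bins, as in the text


def interpolated_percentile(
    wages: Sequence[float], weights: Sequence[float], pct: float
) -> float:
    """Locate a percentile by linear interpolation within 25-cent wage bins."""
    # Accumulate weight by bin, keyed by the index of the bin's upper edge
    # (e.g. wages from 9.76 to 10.00 share upper edge 40 * 0.25 = 10.00).
    bin_weight: Dict[int, float] = {}
    for wage, weight in zip(wages, weights):
        edge = math.ceil(round(wage / BIN_WIDTH, 6))
        bin_weight[edge] = bin_weight.get(edge, 0.0) + weight

    target = pct / 100.0 * sum(bin_weight.values())
    cumulative = 0.0
    for edge in sorted(bin_weight):
        below = cumulative
        cumulative += bin_weight[edge]
        if cumulative >= target:
            # Weight = distance from the lower cumulative share to the target,
            # relative to the share contained in this bin.
            w = (target - below) / (cumulative - below)
            return (edge - 1) * BIN_WIDTH + w * BIN_WIDTH
    raise ValueError("pct must be between 0 and 100")
```

With 48 percent of workers at or below $9.75 and 51 percent at or below $10.00, the function reproduces the worked example: the weight is two-thirds and the interpolated median is about $9.92.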
In order to preserve the confidentiality of respondents, the income variables in the public-use files of the CPS are top-coded, that is, values above a certain level are capped at a single common value. The reasoning is that since so few individuals, if any, have incomes above this “top-code,” reporting the exact income number could allow somebody to use that information (along with other information from the CPS, such as state of residence, age, ethnicity, etc.) to actually identify a specific survey respondent. For the survey years 1973–1985, the weekly wage is top-coded at $999.00; an extended top-code value of $1,923 is available in 1986–1997; the top-code value changes to $2,884.61 in 1998 and remains at that level. Particularly for the later years, this truncation of the wage distribution creates a downward bias in the mean wage. We dealt with the top-coding issue by imputing a new weekly wage for top-coded individuals. The imputed value is the Pareto-imputed mean for the upper tail of the weekly earnings distribution, based on the distribution of weekly earnings up to the 80th percentile. (The Pareto distribution is defined as c/(x^(a+1)), where c and a are positive constants that we estimate using the top 20 percent of the empirical distribution. More precisely, c is a scale parameter assumed known; a is the key parameter for estimation.) The estimate uses the shape of the upper part of the distribution (in our case, the top 20 percent) to extrapolate to the part that is unobservable due to the top-codes. This procedure was done for men and women separately. The imputed values for men and women appear in Table 2. A new hourly wage, equal to the new estimated value for weekly earnings, divided by that person’s usual hours per week, was calculated.
Table 2. Pareto-imputed mean values for top-coded weekly earnings, and share top coded, 1973–2016
| Share (percent hours) | Value |
Source: EPI analysis of Current Population Survey Outgoing Rotation Group microdata
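One way to carry out a Pareto imputation of this kind is sketched below. The text does not spell out the estimator, so the censored maximum-likelihood formula for the shape parameter used here is an assumption, as are the function names; survey weights are omitted for brevity, and in practice the calculation would be run separately for men and women.

```python
import math
from typing import Sequence


def pareto_shape(earnings: Sequence[float], threshold: float, top_code: float) -> float:
    """Estimate the Pareto shape `a` from the tail above `threshold`.

    Observations at or above `top_code` are treated as right-censored.
    Assumed estimator: a = (# uncensored) / sum of log(x / threshold),
    where censored cases contribute log(top_code / threshold).
    """
    tail = [x for x in earnings if x >= threshold]
    uncensored = [x for x in tail if x < top_code]
    n_censored = len(tail) - len(uncensored)
    log_sum = sum(math.log(x / threshold) for x in uncensored)
    log_sum += n_censored * math.log(top_code / threshold)
    return len(uncensored) / log_sum


def pareto_mean_above(top_code: float, a: float) -> float:
    """Mean of a Pareto variable conditional on exceeding the top-code."""
    if a <= 1.0:
        raise ValueError("the conditional mean is finite only for a > 1")
    return top_code * a / (a - 1.0)
```

The imputed weekly wage for every top-coded respondent is then `pareto_mean_above(top_code, a)`, where `threshold` would be the 80th-percentile weekly wage. With a shape of 2, for example, the imputed mean is twice the top-code.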
In January 1994, a new survey instrument was introduced into the CPS; many labor force items were added and improved. This presents a significant challenge to researchers who wish to make comparisons over time. The most careful research on the impact of the survey change has been conducted by BLS researcher Anne Polivka.2 Interestingly, Polivka did not find that the survey changes had a major impact on broad measures of unemployment or wage levels, though significant differences did surface for some subgroups (e.g., weekly earnings for those with less than a high school diploma and those with advanced degrees, and the unemployment rate of older workers). However, a change in the reporting of weekly hours did call for the alteration of our methodology. In 1994 the CPS began allowing people to report that their usual hours worked per week vary. In order to include nonhourly workers who report varying hours in our wage analyses, we estimated their usual hours using a regression-based imputation procedure, in which we predicted the usual hours of work for “hours vary” cases based on the usual hours worked of persons with similar characteristics. An hourly wage was calculated by dividing weekly earnings by the estimate of hours for these workers. The share of our sample that received such a wage in the 1994–2016 period is presented in Table 3. The reported hourly wage of hourly workers was preserved.
Table 3. Share of wage earners assigned an hourly wage from imputed weekly hours, 1994–2016
| Percent hours vary |
Source: EPI analysis of Current Population Survey Outgoing Rotation Group microdata
BLS analysts Randy Ilg and Steven E. Haugen,3 following a 2000 Polivka study,4 did adjust the 10th-percentile wage because “changes to the survey in 1994 led to lower reported earnings for relatively low-paid workers, compared with pre-1994 estimates.” We make no such adjustments for both practical and empirical reasons. Practically, the BLS has provided no adjustment factors for hourly wage trends that we can use; Polivka’s work is for weekly wages. More importantly, the trends in 10th-percentile hourly wages differ from those reported by Ilg and Haugen for 10th-percentile weekly earnings. This is perhaps not surprising, since the composition of earners at the “bottom” will differ when measured by weekly rather than hourly wages, with low-weekly-earners being almost exclusively part-timers. Empirically, Ilg and Haugen show the unadjusted gap between the 50th-percentile wage and the 10th-percentile wage increasing between 1993 and 1994, when the new survey begins. In contrast, our 50/10 wage gap for hourly wages decreases between 1993 and 1994. Thus, the pattern of wage change in their data differs greatly from that in our data. In fact, our review of the 1993–1994 trends across all of the deciles shows no discontinuities whatsoever. Consequently, we make no adjustments to account for any effect of the 1994 survey change. Had we made the sort of adjustments suggested by Polivka, our measured fall in the 50/10 wage gap in the 1990s would be even larger, and the overall pattern (wage gaps shrinking at 50/10, widening at 90/50, and, especially, at 95/50) would remain the same.
When a response is not obtained for weekly earnings, or an inconsistency is detected, the CPS assigns an “imputed” response using a “hot deck” method, whereby a response from another sample person with similar demographic and economic characteristics is used for the nonresponse. This procedure for imputing missing wage data appears to bias comparisons between union and nonunion members. For analysis of the union wage premium only, we therefore restrict our sample to observations with non-imputed wages.
Racial/ethnic demographic variables are also used in tables and in results reporting wage regression analyses. Starting in January of 2003, individuals surveyed by the CPS are asked directly if they belong to Spanish, Hispanic, or Latino categories. Persons who report they are Spanish, Hispanic, or Latino also may be of any race. For consistency, we categorize them as Hispanic and our race/ethnicity variable includes four mutually exclusive categories across years:
- white, non-Hispanic
- black, non-Hispanic
- Hispanic, any race
- all others
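The precedence rule implied by these categories (Hispanic origin is checked first, so the four groups never overlap) can be sketched as below. The input codes `"white"` and `"black"` are hypothetical stand-ins for the actual CPS race codes.

```python
def race_ethnicity(is_hispanic: bool, race: str) -> str:
    """Assign one of four mutually exclusive race/ethnicity categories."""
    if is_hispanic:
        return "Hispanic, any race"  # takes precedence over reported race
    if race == "white":
        return "white, non-Hispanic"
    if race == "black":
        return "black, non-Hispanic"
    return "all others"
```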
Beginning in 1992, the CPS employed a new coding scheme for education, providing data on respondents’ highest education level attained. In earlier years, the CPS provided data on years of schooling completed. The challenge of making a consistent wage series by education level is to either make the new data consistent with the past or to make the old “years of schooling” data consistent with the new educational attainment measures. To that end, we assume that completing 12 years of schooling equates with a high school diploma, 16 years with a college degree, and 18 years with an advanced degree. Any value from 13 through 15 years is coded as some college. We redistribute the “17s” to the 16 years category (presumably a four-year degree).
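The mapping from years of schooling to attainment categories can be sketched as follows. Two details are assumptions not stated in the text: fewer than 12 years is labeled “less than high school,” and any value of 18 or more is treated as an advanced degree.

```python
def education_category(years_of_schooling: int) -> str:
    """Map pre-1992 years of schooling to an attainment category."""
    if years_of_schooling < 12:
        return "less than high school"  # assumed label for under 12 years
    if years_of_schooling == 12:
        return "high school"
    if years_of_schooling <= 15:
        return "some college"           # 13 through 15 years
    if years_of_schooling <= 17:
        return "college"                # 16, plus the redistributed 17s
    return "advanced degree"            # 18 years (assumed: or more)
```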
We employ these education categories in various tables where we present wage trends by education over time. For the data for 1992 and later, we compute the “some college” trends by aggregating those “with some college but no degree beyond high school” and those with an associate or other degree that is not a four-year college degree.
Annual wages and hours
Changes in annual or weekly earnings can result from changes in hourly earnings or changes in time worked (hours worked per week or weeks worked per year). Our analyses focus on the hourly wage, which represents the pure price of labor (exclusive of benefits), because we are interested in changing pay levels for the workforce and its subgroups. This enables us to clearly distinguish changes in annual earnings that are due to an increase (or decrease) in work hours from changes that are due to increases (or decreases) in hourly pay. Most of our wage analyses, therefore, do not account for weekly or annual earnings changes due to reduced or increased work hours or opportunities for employment.
An exception to this is “annual wages and work hours” in the State of Working America Data Library, which presents annual hours, earnings, and hourly wages from the CPS Annual Social and Economic Supplement (CPS ASEC, also referred to as the March CPS). Our analysis of this data shows that the overwhelming driver of annual wage trends between business cycle peaks has been trends in hourly wages. In this analysis, weekly and hourly wage data are “hour weighted,” obtained by dividing annual wages by weeks worked and annual hours worked. The 1967 and 1973 values are derived from unpublished tabulations of CPS data provided by Kevin Murphy from an update of his 1989 paper with Finis Welch6; they include self-employment as well as wage and salary workers. The values displayed in this table were bridged from CPS 1979 values using the growth rates in the Murphy and Welch series. Hours of work were derived from differences between annual, weekly, and hourly wage trends.
In our view, the ORG files provide a better source of data for wage analyses than the traditionally used CPS ASEC files. In order to calculate hourly wages from the CPS ASEC, analysts must make calculations using three retrospective variables: the annual earnings, weeks worked, and usual weekly hours worked in the year prior to the survey. In contrast, respondents in the ORG are asked a set of questions about hours worked, weekly wages, and, for workers paid by the hour, hourly wages in the week prior to the survey. In this regard, the data from the ORG are likely to be more reliable than data from the CPS ASEC. See Jared Bernstein and Lawrence Mishel’s 1997 article7 for a detailed discussion of these differences.
Another exception to the use of the CPS ORG data for wages includes average annual wages by wage group taken from a 2010 article by Wojciech Kopczuk, Emmanuel Saez, and Jae Song8, Table A-3. Data for 2006 through 2015 are extrapolated from 2004 data using changes in wage shares computed from Social Security Administration (SSA) wage statistics (data at http://www.ssa.gov/cgi-bin/netcomp.cgi). The final results of the paper by Kopczuk, Saez, and Song printed in a journal used a more restrictive definition of wages, so we employ the original definition, as recommended in private correspondence with Kopczuk. SSA provides data on the share of total wages and employment in annual wage brackets, such as the bracket for those earning between $95,000.00 and $99,999.99. We employ the midpoint of the bracket to compute total wage income in each bracket and sum all brackets. Our estimate of total wage income using this method replicates the total wage income presented by SSA with a difference of less than 0.1 percent. We use interpolation to derive cutoffs building from the bottom up to obtain the 0–90th percentile bracket and then estimate the remaining categories. This allows us to estimate the wage shares for upper wage groups. We use these wage shares computed for 2004 and later years to extend the Kopczuk, Saez, and Song series by adding the changes in share between 2004 and the relevant year to their series. To obtain absolute wage trends we use the SSA data on the total wage pool and employment and compute the real wage per worker (based on their share of wages and employment) in the different groups in 2015 dollars.
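The bracket arithmetic can be sketched as below. The midpoint rule comes from the text; the specific interpolation used to split the bracket that straddles the 90th percentile (a simple proportional split of that bracket's workers and wages) is an assumption, and the function names and bracket layout are hypothetical. The open-ended top bracket, which has no midpoint, is left out of this sketch.

```python
from typing import List, Sequence, Tuple

# Each bracket: (lower_bound, upper_bound, number_of_workers), sorted by wage.
Bracket = Tuple[float, float, float]


def bracket_wage_totals(brackets: Sequence[Bracket]) -> List[float]:
    """Total wage income per bracket, valuing each worker at the midpoint."""
    return [((lower + upper) / 2.0) * workers for lower, upper, workers in brackets]


def bottom_wage_share(brackets: Sequence[Bracket], pct: float = 90.0) -> float:
    """Share of total wages earned by the bottom `pct` percent of workers."""
    totals = bracket_wage_totals(brackets)
    total_wages = sum(totals)
    target = pct / 100.0 * sum(workers for _, _, workers in brackets)

    cum_workers = 0.0
    cum_wages = 0.0
    for (_, _, workers), wages in zip(brackets, totals):
        if cum_workers + workers >= target:
            # Assumed rule: take a proportional slice of the straddling bracket.
            fraction = (target - cum_workers) / workers
            return (cum_wages + fraction * wages) / total_wages
        cum_workers += workers
        cum_wages += wages
    return 1.0
```

The top 10 percent share is then `1 - bottom_wage_share(brackets, 90.0)`, and the change in that share between 2004 and a later year is what gets added to the Kopczuk, Saez, and Song series.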
Compensation and benefits
Note: For information on the measure of workers’ compensation used to analyze CEO-to-worker compensation, please see the next section.
We measure compensation (which includes wages and benefits) using two main data sources: Bureau of Economic Analysis National Income and Product Accounts (NIPA) tables and Bureau of Labor Statistics National Compensation Survey Employer Costs for Employee Compensation (ECEC) data. NIPA data are for the entire economy. Wages are deflated by the personal consumption expenditures (PCE) index for all items, except health, which is deflated by the PCE medical index. Data are computed from the NIPA tables. Our “wages” category includes wages and salaries and thus “wages” are calculated by dividing wage and salary accruals (NIPA Table 6.3) by hours worked by full-time and part-time employees (NIPA Table 6.9). “Compensation” is the sum of wages and salaries and benefits (it includes payroll taxes and health, pension, and other nonwage benefits). Payroll taxes are calculated as total compensation (NIPA Table 6.2) minus the sum of voluntary benefits (sum of health and nonhealth benefits; see NIPA Table 6.11) and wages and salaries. “Nonwage benefits” is the difference between total compensation and wages and salaries. These data were deflated using the NIPA personal consumption expenditure (PCE, chain-weighted) index, with health insurance adjusted by the PCE medical care (chained) index. These data include both public- and private-sector workers.
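The NIPA accounting identities in this paragraph can be sketched as follows. The function and key names are hypothetical, the inputs are nominal aggregates taken from the cited tables, and deflation is left out of the sketch.

```python
from typing import Dict


def nipa_components(
    total_compensation: float,   # NIPA Table 6.2
    wages_and_salaries: float,   # NIPA Table 6.3
    health_benefits: float,      # NIPA Table 6.11
    nonhealth_benefits: float,   # NIPA Table 6.11
    hours_worked: float,         # NIPA Table 6.9
) -> Dict[str, float]:
    """Derive hourly wages and the benefit components from NIPA aggregates."""
    voluntary_benefits = health_benefits + nonhealth_benefits
    payroll_taxes = total_compensation - (voluntary_benefits + wages_and_salaries)
    nonwage_benefits = total_compensation - wages_and_salaries
    return {
        "hourly_wage": wages_and_salaries / hours_worked,
        "hourly_compensation": total_compensation / hours_worked,
        "payroll_taxes": payroll_taxes,
        "nonwage_benefits": nonwage_benefits,
    }
```

By construction, nonwage benefits equal payroll taxes plus the health and nonhealth benefits.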
ECEC data are for the private sector. The data provide cost levels for March for private-sector workers, available starting in 1987. We categorize wages and salaries differently than BLS, putting all wage-related items (including paid leave and supplemental pay) into the hourly wage column. This makes the definition of wages and salaries comparable to workers’ W-2 earnings and to the definition of wages in the CPS Outgoing Rotation Group data that are tabulated for the “Wages” and “Wage gaps” sections of the State of Working America Data Library and other analyses that focus on hourly wage trends. Nonwage benefits, in our definition, include only payroll taxes, pensions, and insurance. The sum of wages and salaries and benefits equals total compensation. It is important to use the ECEC (the current-weighted series) rather than the other, fixed-weighted series (the ECI) from the same National Compensation Survey (NCS) data because composition shifts (in the distribution of employment across occupations and industries) can have large effects over time. Employer costs for insurance are deflated by the medical-care component of the CPI-U-RS (Consumer Price Index Research Series Using Current Methods). All other pay is deflated by the CPI-U-RS for “all items.” Inflation is measured for the first quarter of each year.
CEO and worker compensation
Consistent with EPI’s annual report on CEO-to-worker compensation (see: Mishel and Schieder 2017), we compare executive compensation for CEOs at the top 350 firms based on sales using the Execucomp database, and then we calculate a comparable worker compensation estimate using series from the BLS Current Employment Statistics program and BEA.
In order to calculate a worker annual compensation series, we use data from the Bureau of Economic Analysis (BEA) and the Bureau of Labor Statistics (BLS). Because no data exist for the compensation of an average worker in a firm, we had to create our own proxy. Compensation data were collected from the BEA’s National Income and Product Accounts (NIPA) Tables 6.2C and 6.2D, “Compensation of Employees by Industry.” These tables give total compensation of all workers by industry since 1992, with a one-year lag in data release. Wage and salary data corresponding to these compensation data come from NIPA Tables 6.3C and 6.3D, “Wage and Salary Accruals by Industry.” These tables provide total wage and salary disbursements to all workers in each industry since 1992. Using these two datasets, we are able to create an industry-specific compensation-to-wage ratio by dividing total compensation by total wage and salary accruals in each industry. Applying this ratio to a measure of the wages of typical workers provides an estimate of compensation.
Average worker hourly earnings data are from the BLS Current Employment Statistics program (CES). We use average hourly earnings of production and nonsupervisory employees for each industry at the 3-digit NAICS (North American Industry Classification System) level. This series is based on the regular establishment survey used to generate the payroll establishment employment data that are released by BLS each month along with the unemployment rate. Production and nonsupervisory employees represent over 80 percent of payroll employment.
To find average hourly worker compensation, the compensation-to-wage ratios from the BEA are multiplied by each respective average hourly earnings figure. Because data from the BEA were only available through the previous year, the most recent year’s compensation-to-wage ratio was applied to the current average hourly earnings from the CES. This results in average hourly worker compensation by industry at the 3-digit NAICS level. For some industries, wage data from the CES were available, but compensation-to-wage ratios from the BEA were not. In these instances, the compensation-to-wage ratio for the larger 2-digit NAICS-level industry that encompasses the 3-digit-level industry with the missing value was used.
Because the data used are average hourly earnings and the CEO compensation data are presented as annual numbers, the final industry-level typical worker compensation data are multiplied by 2,080. This converts hourly compensation of production/nonsupervisory workers to annual average worker compensation, which can now be directly compared with the annual CEO compensation figures used to calculate CEO-to-worker compensation ratios (explained later). Most workers do not work full-time and year-round, so the annual compensation measure we are employing clearly overstates the actual annual compensation of a typical worker.
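The worker compensation steps above can be sketched as follows. Function names and the dictionary layout are hypothetical. The fallback to the broader industry is simplified here to truncating the 3-digit code to its first two digits, which is an assumption: some NAICS sectors span a range of 2-digit codes, so a real implementation would need an explicit mapping.

```python
from typing import Dict

HOURS_PER_YEAR = 2080  # annualization factor given in the text


def comp_to_wage_ratios(
    compensation: Dict[str, float], wages: Dict[str, float]
) -> Dict[str, float]:
    """Industry compensation-to-wage ratios from BEA totals, keyed by NAICS code."""
    return {
        code: compensation[code] / wages[code]
        for code in compensation
        if code in wages
    }


def annual_worker_compensation(
    naics3: str, avg_hourly_earnings: float, ratios: Dict[str, float]
) -> float:
    """Annualized compensation for one 3-digit industry."""
    ratio = ratios.get(naics3)
    if ratio is None:
        # No 3-digit ratio available: fall back to the broader 2-digit industry.
        ratio = ratios[naics3[:2]]
    return avg_hourly_earnings * ratio * HOURS_PER_YEAR
```

With an illustrative ratio of 1.25 and average hourly earnings of $20, the annual figure is 20 × 1.25 × 2,080 = $52,000.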
Average hourly compensation of production/nonsupervisory workers
Production/nonsupervisory workers make up approximately 82 percent of the workforce. Wage data for these workers serve as a useful proxy for the median hourly wage when we extend our analysis back to 1948, as data on median wages (from the CPS ORG) only go back to 1973. The trend of average earnings (i.e., wages) for production/nonsupervisory workers is similar to the trend in median hourly wages since 1973, so it’s reasonable to assume a similar pattern held for 1948–1973.
The most recent series of average hourly earnings for production/nonsupervisory workers (available from the BLS Current Employment Statistics [CES]) extends from 1964 to the present. Prior to 1964, the series of average hourly earnings of production workers (also available from the BLS CES) measured the earnings of a similar pool of workers. We backcast the average hourly earnings of production/nonsupervisory workers from 1964 to 1948 using the percent changes in the average hourly earnings of production workers.
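The backcast can be sketched as a chain of year-over-year growth rates applied backward from the anchor year. The function name and the dictionary layout are hypothetical, and the sketch assumes the older series has a value for every year from its start through the anchor year.

```python
from typing import Dict


def backcast(
    anchor_value: float, anchor_year: int, older_series: Dict[int, float]
) -> Dict[int, float]:
    """Extend a series backward using percent changes in an older series.

    `older_series` must cover `anchor_year` and every earlier year wanted.
    """
    result = {anchor_year: anchor_value}
    first_year = min(older_series)
    for year in range(anchor_year - 1, first_year - 1, -1):
        # Ratio of the older series between this year and the following year.
        growth = older_series[year] / older_series[year + 1]
        result[year] = result[year + 1] * growth
    return result
```

Here `anchor_value` would be the 1964 average hourly earnings of production/nonsupervisory workers and `older_series` the production-worker series from 1948 through 1964.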
Data on the average hourly earnings of production/nonsupervisory workers are then converted to real dollars by deflating them by the CPI-U-RS. Finally, we multiply the real average hourly earnings by the real compensation-to-wage ratio to obtain the real average hourly compensation of production/nonsupervisory workers.
Data on union coverage rates are from the CPS ORG and Barry Hirsch and David Macpherson,9 updated at unionstats.com. The data on union coverage begin in 1977 and are extended back to 1973, based on percentage-point changes in union membership shares in Hirsch and Macpherson’s analysis.
This methodology is an update of Appendix B from the Economic Policy Institute book The State of Working America, 12th Edition.
1. EPI changed its method from 50-cent intervals to 25-cent intervals in August 2016. The narrower bins provide more precise measurement of wages of subgroups. For more details on our previous method, see Appendix B of The State of Working America.
2. Anne Polivka, “Data Watch: The Redesigned Current Population Survey,” Journal of Economic Perspectives, vol. 10, no. 3, pp. 169-180, 1996.
4. Anne Polivka, “Using Earnings Data from the Current Population Survey,” Bureau of Labor Statistics working paper, 2000.
5. David A. Jaeger, “Reconciling the Old and New Census Bureau Education Questions: Recommendations for Researchers.” Journal of Business and Economic Statistics, vol. 15, no. 3, pp. 300–309, 1997.
6. Kevin Murphy and Finis Welch, Recent Trends in Real Wages: Evidence From Household Data. Paper prepared for the Health Care Financing Administration of the U.S. Department of Health and Human Services by the University of Chicago, 1989.
8. Wojciech Kopczuk, Emmanuel Saez, and Jae Song, “Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937,” The Quarterly Journal of Economics, February 2010.
9. Barry Hirsch and David Macpherson, “Union Membership and Coverage Database from the Current Population Survey: Note.” Industrial and Labor Relations Review, vol. 56, no. 2, pp. 349–54, 2003.