
2001 State Estimates of Substance Use

Appendix E: State Estimation Methodology

This report includes estimates of 19 substance use measures. Twelve of the measures used the same definition for 1999 through 2001 and have estimates of change between 1999–2000 and 2000–2001, the difference of two 2-year moving averages. Six substance abuse and dependence measures used the same definition for 2000 and 2001, but not for 1999; therefore, only the estimates for 2000–2001 are provided. One new measure, serious mental illness (SMI), was introduced in 2001, and State estimates have been produced for that single year.

This appendix describes the methodology used to measure change in State estimates (Section E.1), the validation of that methodology (Section E.2), the validation of the estimates of prevalence levels based on the combined 1999–2000 National Household Survey on Drug Abuse (NHSDA) data (Section E.3), caveats regarding small area estimation (SAE) (Section E.4), and the general methodology (hierarchical Bayes) used to create the State estimates (Section E.5). Included at the end of this appendix are tables showing the State response rates for 1999–2001, the State sample sizes for 1999–2001, and the State sample sizes for the 2001 incentive experiment.

E.1. Measuring Change in State Estimates Between 1999–2000 and 2000–2001

The estimates of change in State estimates presented in this report are based on the 1999 through 2001 NHSDAs. State estimates for 1999–2000 and 2000–2001 were produced by combining State-level NHSDA data with local-area county and Census block group/tract-level predictor variables from the States for the two time periods. The SAE methodology for estimating change is described in this section, while Section E.5 provides a general overview of SAE methodology. The moving average State prevalence estimates displayed in Appendix A for the overlapping 1999–2000 and 2000–2001 time periods were obtained from independent applications of RTI's survey-weighted hierarchical Bayes (SWHB) methodology.

The State estimates for 1999–2000 are the model-based small area estimates previously published by the Substance Abuse and Mental Health Services Administration (SAMHSA) (see Wright, 2002a, 2002b). These estimates were derived by first fitting logistic mixed models to the pooled 1999–2000 survey dataset. These models fit separate fixed and random effects for each of four age groups. Each age group model had 51 State-level random effects and 300 substate region-level random effects. The fixed predictor variables for each age group were defined at five levels, namely, person-level demographics, 1990 decennial Census block group-level items, tract-level items, county variables, and State variables. The same fixed predictors were used for all 3 years (1999, 2000, and 2001) of data but annual updates were made when more current versions became available.

Having estimated the common fixed and random effects from the pooled 1999–2000 dataset, year-specific predicted probabilities of substance use were formed at the block group–b level for each of eight gender (2) by race/ethnicity (4) domains-d within each of four age groups-a.

Year specificity in the State estimates was induced by updating the fixed predictor variables annually and by using year-specific block group-level population projections for the 32 age by gender by race/ethnicity domains to weight together the domain-specific probabilities of use. These year-t population projections, $N_{bad}(t)$, were purchased from Claritas Inc. Letting $\pi_{bad}(t)$ denote the predicted probability of substance use for the age group-a by race/ethnicity by gender subpopulation-d in block group-b for year-t, the age group-specific estimates for State i were computed as population-weighted averages of the form

$$\pi_{ia}(t) = \frac{\sum_{b \in \Omega_i} \sum_{d=1}^{8} N_{bad}(t)\, \pi_{bad}(t)}{\sum_{b \in \Omega_i} \sum_{d=1}^{8} N_{bad}(t)},$$

where the summation extends over all the block groups-b belonging to the State i universe $\Omega_i$. Note that the domain-d summations extend over the eight age group-specific gender by race/ethnicity domains within each block group.
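The population-weighted average above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `state_age_estimates`, the record layout, and the example numbers are hypothetical and do not come from the NHSDA processing system.

```python
from collections import defaultdict


def state_age_estimates(cells):
    """Population-weighted prevalence by (state, age group).

    `cells` is an iterable of dicts, one per block group by domain cell,
    with hypothetical keys: state, age_group, population, prob.
    """
    numerator = defaultdict(float)    # sum of N_bad(t) * pi_bad(t)
    denominator = defaultdict(float)  # sum of N_bad(t)
    for cell in cells:
        key = (cell["state"], cell["age_group"])
        numerator[key] += cell["population"] * cell["prob"]
        denominator[key] += cell["population"]
    return {k: numerator[k] / denominator[k] for k in numerator if denominator[k] > 0}


# Two made-up cells for a fictitious State "XX" in a single year.
cells = [
    {"state": "XX", "age_group": "12-17", "population": 100.0, "prob": 0.10},
    {"state": "XX", "age_group": "12-17", "population": 300.0, "prob": 0.20},
]
estimates = state_age_estimates(cells)
```

In this toy input the larger cell dominates, so the State estimate falls closer to 0.20 than to 0.10.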

To produce the 1999–2000 pooled estimates, the common fixed and random effect estimates were first employed to form State estimates $\pi_{ia}(99)$ and $\pi_{ia}(00)$ for 1999 and 2000, respectively. These annualized State estimates were then combined as population-weighted averages of the form

$$\pi_{ia}(99,00) = \frac{N_{ia}(99)\, \pi_{ia}(99) + N_{ia}(00)\, \pi_{ia}(00)}{N_{ia}(99) + N_{ia}(00)},$$

where $N_{ia}(t) = \sum_{b \in \Omega_i} \sum_{d=1}^{8} N_{bad}(t)$ is the year-t population projection for age group a in State i, that is, the block group-level projections $N_{bad}(t)$ summed over all eight gender by race/ethnicity domains and all the block groups in State i. The SWHB versions of these pooled estimates were computed as posterior means over 1,250 Gibbs samples drawn from the joint posterior distribution of the fixed and random effects. The 95 percent asymmetric prediction intervals (PIs) for these pooled 1999–2000 prevalence estimates were first formed as symmetric, approximately Gaussian, Bayes credible intervals on the log-odds scale. The end points of these log-odds symmetric intervals then were transformed back to the prevalence scale.
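As a rough sketch of the two steps just described, the following Python pools two annual estimates and forms an interval that is symmetric on the log-odds scale and asymmetric on the prevalence scale. The function names are hypothetical, and centering the interval at the mean of the log-odds draws with a 1.96 multiplier is an assumption made for illustration; the report does not spell out those details.

```python
import math
import statistics


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pooled_prevalence(p_first, n_first, p_second, n_second):
    """Population-weighted average of two annual prevalence estimates."""
    return (n_first * p_first + n_second * p_second) / (n_first + n_second)


def prediction_interval(draws, z=1.96):
    """Interval from posterior draws of a prevalence.

    Assumption: symmetric Gaussian interval on the log-odds scale, centered
    at the mean of the log-odds draws, then mapped back to prevalences.
    """
    logits = [logit(p) for p in draws]
    center = statistics.fmean(logits)
    spread = statistics.stdev(logits)
    return inv_logit(center - z * spread), inv_logit(center + z * spread)


# Made-up posterior draws standing in for the 1,250 Gibbs samples.
draws = [0.08, 0.10, 0.12, 0.09, 0.11]
lower, upper = prediction_interval(draws)
```

For prevalences below 50 percent, the back-transformed upper end point lies farther from the center than the lower one, which is the asymmetry the text refers to.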

The State by age group prevalence estimates derived from the pooled 2000 and 2001 survey data were produced by refitting the logistic mixed models. In this independent refitting of the models, updated versions of the fixed predictors were used with the 2001 survey responses when updates were available. This refitting resulted in a new set of age group-specific fixed and random effects for the combined 2000 and 2001 surveys. As described previously, 1,250 Gibbs sample draws from the joint posterior distribution of these fixed and random effect parameters were used to calculate posterior means and 95 percent prediction intervals for the 2000 and 2001 State i by age group-a prevalence estimates $\pi_{ia}(00,01)$.

The 2000 and 2001 models were fit independently of the previously fit 1999 and 2000 models. This independent analysis approach was followed because there was no desire to revise the previous estimates and the associated moving average change measures as the result of jointly modeling all 3 years of survey data. This approach does have a shortcoming when computing the Bayes significance level for an estimated moving average change measure. Specifically, one needs to estimate the posterior variance of a change measure defined as the log-odds ratio:

$$\mathrm{lor}_{ia} = \log\left[\frac{\pi_{ia}(00,01)}{1 - \pi_{ia}(00,01)}\right] - \log\left[\frac{\pi_{ia}(99,00)}{1 - \pi_{ia}(99,00)}\right],$$

where $\pi_{ia}(99,00)$ and $\pi_{ia}(00,01)$ are the pooled 1999–2000 and 2000–2001 prevalence estimates for State i and age group a.

A change measure like the log-odds ratio is favored over the simple difference because the Bayes significance calculation is much less burdensome when the posterior distribution of the change measure is approximately Gaussian, as is the case for $\mathrm{lor}_{ia}$ but not for the simple difference. Calculating the posterior variance of $\mathrm{lor}_{ia}$ can be accomplished by using the posterior variance statistics that were previously obtained from the independent Markov chain Monte Carlo (MCMC) chains.

To complete the variance calculation for $\mathrm{lor}_{ia}$, a correlation estimate for the two log-odds statistics is required. To approximate this correlation, the 1999–2000 and 2000–2001 models were fit simultaneously. This simultaneous fit yielded an MCMC sample of 1,250 draws from the joint posterior distribution of both sets of fixed and random effects. To accommodate this simultaneous fitting of the 1999–2000 and 2000–2001 models, a concatenated dataset containing both of the pooled samples was created. Because the PROC GIBBS software allows for separate logistic mixed models for a set of nonoverlapping subpopulations, it was possible to simultaneously fit eight age group (4) by dataset (2) models as if there were no overlap in the two datasets. This simultaneous solution yielded a set of 1,250 MCMC replicates for the two overlapping log-odds statistics. In these simultaneous models, the eight age group by dataset random effects for each State and for each substate region were allowed to have a general variance-covariance matrix. It was hoped that these random effect covariances between datasets would largely account for the 2000 survey overlap.
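The role of the correlation can be made concrete with the standard variance formula for a difference of two correlated quantities. The sketch below is illustrative only: the function names are hypothetical, and treating the Bayes significance level as a two-sided Gaussian tail probability is an assumption, not a statement of how the report computed it.

```python
import math


def lor_variance(var_later, var_earlier, correlation):
    """Variance of the difference of two correlated log-odds statistics."""
    return var_later + var_earlier - 2.0 * correlation * math.sqrt(var_later * var_earlier)


def two_sided_gaussian_p(lor, variance):
    """Two-sided tail probability for lor under a Gaussian approximation (assumed form)."""
    z = abs(lor) / math.sqrt(variance)
    return math.erfc(z / math.sqrt(2.0))


# Made-up inputs: equal posterior variances and a log-odds ratio of 0.2.
p_low_corr = two_sided_gaussian_p(0.2, lor_variance(0.04, 0.04, 0.2))
p_high_corr = two_sided_gaussian_p(0.2, lor_variance(0.04, 0.04, 0.6))
```

Under these assumptions, a larger correlation shrinks the variance of the log-odds ratio and therefore yields a smaller significance level for the same estimated change.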

In the process of conducting the SAE change measure validation study (reported on in Section E.2), it was observed that the 95 percent prediction intervals for two of the SAE odds ratios (namely, past month alcohol use and past month cigarette use) were approximately the same as or wider than the 95 percent confidence intervals (CIs) for the associated design-based odds ratio estimates. These interval comparisons are displayed in Table E.1. It had also been previously noted that the prediction intervals for the two SAE-based log-odds statistics involved in the log-odds ratios were substantially narrower than the corresponding design-based intervals. Therefore, it was clear that the correlations between the two log-odds statistics over the MCMC samples were substantially smaller than their design-based counterparts. Table E.2 shows these underestimated correlations as compared with their design-based counterparts.

These model-based MCMC correlations were underestimated as a consequence of the faulty assumption that the eight age group by dataset subpopulations in the simultaneous models were nonoverlapping. The overlap associated with the 2000 survey data was not adequately accounted for by the random effect correlations. There is an alternative form of the odds ratio estimator that employs nonoverlapping subpopulations and provides for proper MCMC-based correlation estimation. This odds ratio for change is based on simultaneously fitting the three annual models to produce 1,250 MCMC samples from the joint posterior distribution of the triple $\tilde{\pi}_{ia}(99)$, $\tilde{\pi}_{ia}(00)$, and $\tilde{\pi}_{ia}(01)$, where $\tilde{\pi}_{ia}(t)$ denotes the predicted probability of substance use in year t for State i and age group a under the simultaneous fit. For this simultaneous model, there are 12 age (4) by year (3) subpopulation-specific models, each with its own set of fixed and random effects. In this case, the general covariance matrices for the State and substate random effects are 12 by 12 matrices corresponding to the 12 element (age group by year) vectors of random effects. The associated odds ratio is based on the pooled prevalences:

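$$\tilde{\pi}_{ia}(99,00) = \frac{N_{ia}(99)\, \tilde{\pi}_{ia}(99) + N_{ia}(00)\, \tilde{\pi}_{ia}(00)}{N_{ia}(99) + N_{ia}(00)}$$

and

$$\tilde{\pi}_{ia}(00,01) = \frac{N_{ia}(00)\, \tilde{\pi}_{ia}(00) + N_{ia}(01)\, \tilde{\pi}_{ia}(01)}{N_{ia}(00) + N_{ia}(01)}.$$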

Note that the survey-weighted Bernoulli-type log likelihood employed in PROC GIBBS was appropriate for this simultaneous model because the 12 age group by year subpopulations were nonoverlapping. The purpose of using the more complex 2-year averaging scheme described previously was to minimize bias. If one assumes the fixed and random effects are common for the 2 years being pooled, this should yield small area estimates that are closer to the design-based estimates than the $\tilde{\pi}_{ia}(t)$ estimators above, where year-specific parameters were assumed. For the odds ratio based on the $\tilde{\pi}_{ia}(t)$ averaged prevalence estimates, it is clear that the correlation between the two log-odds statistics should be high. This follows from the fact that $\tilde{\pi}_{ia}(00)$ is common to the two population-weighted averages. These correlation estimates based on $\tilde{\pi}_{ia}(t)$ more properly reflect the true correlations associated with the $\pi_{ia}(t)$ type of averages presented in the body of this report. Table E.3 is similar to Table E.1 except that the prediction intervals were obtained using the correlations from the alternative method. Table E.4 displays the correlations from the alternative method and the corresponding design-based correlations. Tables E.5 to E.8 contrast the Bayes significance levels for these two correlation estimators. Note that the revised significance estimates [p value(2)] are smaller than the original ones [p value(1)]; they are about 20 percent smaller for past month use of cigarettes, alcohol, and marijuana, and about 6 percent smaller for past year use of cocaine.
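A small simulation can illustrate why the shared year-2000 term induces correlation between the two pooled log-odds statistics. Everything in this sketch is assumed for illustration: the annual draws are generated independently with made-up parameters and equal population weights, whereas the actual simultaneous model allowed the annual effects to be correlated through the 12 by 12 covariance matrices.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2001)
logodds_9900, logodds_0001 = [], []
for _ in range(1250):  # same number of replicates as the MCMC samples
    # Hypothetical, independently drawn annual prevalences near 12 percent.
    p99 = inv_logit(-2.0 + rng.gauss(0.0, 0.1))
    p00 = inv_logit(-2.0 + rng.gauss(0.0, 0.1))
    p01 = inv_logit(-2.0 + rng.gauss(0.0, 0.1))
    # Equal population weights are assumed, so pooling is a simple average.
    logodds_9900.append(logit((p99 + p00) / 2.0))
    logodds_0001.append(logit((p00 + p01) / 2.0))

correlation = pearson(logodds_9900, logodds_0001)
```

Even with independent annual draws, the two pooled statistics are positively correlated because each contains the 2000 term; positive dependence among the annual effects would push the correlation higher still.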

E.2. Validation of Methodology to Measure Change

To validate the SAE models for estimating change between the pooled 1999–2000 small area estimates and the pooled 2000–2001 small area estimates, the design-based estimates of change for the eight large sample States were used as internal benchmarks. The eight large sample States had 2-year sample sizes that ranged between 6,200 and 9,700. Estimates were produced for four outcome variables representative of a range of prevalence rates: past year use of cocaine, past month use of marijuana, past month use of cigarettes, and past month use of alcohol. The goal of the validation was to compare the estimates for small States utilizing the SAE methodology with estimates based on the internal benchmarks.

E.2.1 Replicate Formation Methodology

The validation study was performed by first subsampling the eight large States; for each of these large States, four sample replicates ("pseudo" small States) were formed that mimicked the design properties of the 42 small States and the District of Columbia. A key feature of this replicate formation strategy was mimicking the 50 percent overlap between the 1999 and 2000 samples of 96 area segments and between the 2000 and 2001 segment samples in each small sample State. Because new samples of dwellings and persons were drawn from all sample segments every year, the survey design-induced covariance between years is limited to this 50 percent overlap of sample block groups/segments.

Exhibit E.1 presents the 50 percent segment overlap plan for the 3 survey years. Note that there are 48 field interviewer (FI) regions in each of the eight large States and 12 FI regions in each of the 42 small States and the District of Columbia. Each FI region has four quarters, and each quarter is then expected to have two area segments. For various reasons, some of the FI region-by-quarter slots may be empty. In the following illustration, segments A, C, E, and G in 1999 were kept in 2000. Segments B, D, F, and H were replaced by segments I, J, K, and L in 2000. In 2001, the segments I, J, K, and L of 2000 were kept, and segments A, C, E, and G from 2000 were replaced by segments M, N, O, and P.

Exhibit E.1 Sample Segment 50 Percent Overlap Plan for the 1999, 2000, and 2001 NHSDAs

FI Region   Quarter   Segments
                      1999   2000   2001
1           1         A      A      M
                      B      I      I
            2         C      C      N
                      D      J      J
            3         E      E      O
                      F      K      K
            4         G      G      P
                      H      L      L

FI = field interviewer.

To select the four pseudo small State samples from each large State, 12 pseudo FI regions were first created within each large sample State by pooling their 48 initial FI regions into groups of 4. Each of these pseudo FI regions then was expected to have eight area segments per calendar quarter (see Exhibit E.2). For each of these pseudo FI region-by-quarter sets of eight area segments, any segments that were devoid of interviews were first randomly replaced by a selection from the non-empty segments in the set. The segments for the 1999, 2000, and 2001 NHSDA data were filled in separately. Once complete sets of eight non-empty segments for the 1999, 2000, and 2001 NHSDA data in each of the pseudo FI region-by-quarter sets were assembled, the 1999, 2000, and 2001 data were linked using State-by-pseudo FI region-by-quarter-by-segment identification codes.

Exhibit E.2 An Example of Sample Segment Assignment in Pseudo FI Regions in 1999, 2000, and 2001 NHSDAs

Pseudo FI Region   Quarter   Segments
                             1999   2000   2001
1                  1         a      a      m
                             b      i      i
                             c      c      n
                             d      j      j
                             e      e      o
                             f      k      k
                             g      g      p
                             h      l      l

FI = field interviewer.

Let a, b, c, d, e, f, g, and h denote the eight segments in quarter 1 of pseudo FI region 1 in 1999. Approximately half of the eight segments represented cases where the 1999 segments were reused in 2000 (i.e., common segments a, c, e, and g in 1999 and 2000), and the remaining segments b, d, f, and h represented cases where 1999 segments were linked with new 2000 replacement segments i, j, k, and l. Similarly, between 2000 and 2001, segments i, j, k, and l are common segments, whereas segments a, c, e, and g are linked to new segments m, n, o, and p.

Next, the eight linked 1999 and 2000 segment pairs were stratified into two strata: the common segment pairs and the uncommon 1999 and 2000 segment pairs. One segment pair was then randomly drawn from each of these strata, and the two pairs were combined. Repeating the draw until all eight pairs were used formed four pseudo small States, each containing one segment pair that was common to the 1999 and 2000 surveys and one pair that was not. The 2001 segments then were forced into the same pseudo States according to the linkage between the 2000 and 2001 sample segments. For example, suppose segment "g" was assigned to pseudo State 1 in 1999. Because "g" was common to 1999 and 2000, it was linked to the new segment "p" in 2001, so segment "g" in 2000 and segment "p" in 2001 were both forced into pseudo State 1. Exhibit E.3 demonstrates a typical assignment of segments among the four pseudo States for the 1999, 2000, and 2001 NHSDAs.

Exhibit E.3 Typical Assignment of Segments among Four Pseudo States for 1999, 2000, and 2001 NHSDAs

Pseudo FI Region   Quarter   Pseudo State   Segments
                                            1999   2000   2001
1                  1         1              g      g      p
                                            b      i      i
                             2              a      a      m
                                            h      l      l
                             3              e      e      o
                                            d      j      j
                             4              c      c      n
                                            f      k      k

FI = field interviewer.
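The stratified assignment illustrated in Exhibit E.3 can be sketched as follows for a single pseudo FI region-by-quarter set. The function name and data layout are hypothetical, and the random shuffle is only one way to realize "one pair randomly drawn from each stratum"; the actual selection procedure may have differed in its details.

```python
import random


def assign_pseudo_states(linked_segments, rng):
    """Split eight linked (1999, 2000, 2001) segment triples into four pseudo States.

    Each pseudo State receives one triple whose 1999 and 2000 segments are the
    same (common stratum) and one whose 1999 segment was replaced in 2000.
    The 2001 segment travels with its triple, preserving the 2000-2001 linkage.
    """
    common = [t for t in linked_segments if t[0] == t[1]]
    uncommon = [t for t in linked_segments if t[0] != t[1]]
    rng.shuffle(common)
    rng.shuffle(uncommon)
    return {k + 1: [c, u] for k, (c, u) in enumerate(zip(common, uncommon))}


# Linked triples for quarter 1 of pseudo FI region 1, as in Exhibit E.2.
linked = [
    ("a", "a", "m"), ("b", "i", "i"), ("c", "c", "n"), ("d", "j", "j"),
    ("e", "e", "o"), ("f", "k", "k"), ("g", "g", "p"), ("h", "l", "l"),
]
pseudo_states = assign_pseudo_states(linked, random.Random(0))
```

Because every pseudo State gets exactly one common and one replaced pair in each adjacent-year comparison, the 50 percent segment overlap of the small State design is preserved.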

This subsampling validation exercise was repeated for all four quarters in a pseudo FI region and for all 12 pseudo FI regions in each of the eight large States. This resulted in 32 (8 large States × 4 subsamples from each large State) pseudo small States from eight large States. These pseudo small States mimicked the design properties of small States with the 50 percent sample segment overlap preserved across adjacent survey years.

E.2.2 Results of Validating the Small Area Estimates of Change Between 1999–2000 and 2000–2001

Tables E.9 to E.12 present the internal benchmark estimate (labeled "design-based") and the corresponding average estimate using the SAE procedures for the four substance use measures for each of the eight large States and the relative absolute bias (RAB) for each of the substance use measures. The estimate in each case is the odds of having used the substance in 2000–2001 divided by the odds of having used the substance in 1999–2000. In general, the average relative biases for the age 12 or older population are fairly small for substance use measures with larger prevalence rates and somewhat larger for the others. The average relative bias is worst for past year use of cocaine (12.7 percent for the population age 12 or older). Note, however, that the relative bias is generally conservative, producing SAE odds ratios that are closer to "no change" relative to the design-based odds ratios. For example, of the 32 pairs of State-by-age group estimates for cocaine, the SAE odds ratios are closer to 1.0 for 29 of the pairs and the design-based odds ratios are closer to 1.0 for only 3 pairs.
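The quantities compared in these tables can be illustrated with a short sketch. The odds ratio follows the definition given in the text; the formula used here for relative absolute bias (absolute difference divided by the design-based value) is an assumption, since the appendix does not state it explicitly, and all function names and inputs are hypothetical.

```python
def odds(p):
    return p / (1.0 - p)


def odds_ratio(p_later, p_earlier):
    """Odds of use in the later period divided by odds in the earlier period."""
    return odds(p_later) / odds(p_earlier)


def relative_absolute_bias(sae_value, design_value):
    """Assumed definition: |SAE - design-based| / |design-based|."""
    return abs(sae_value - design_value) / abs(design_value)


def closer_to_no_change(sae_or, design_or):
    """True when the SAE odds ratio is nearer to 1.0 than the design-based one."""
    return abs(sae_or - 1.0) < abs(design_or - 1.0)


# Made-up prevalences for illustration only.
example_or = odds_ratio(0.20, 0.10)
example_rab = relative_absolute_bias(0.9, 1.0)
```

A conservative bias in the sense of the text corresponds to `closer_to_no_change` returning true for most State-by-age group pairs.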

Table E.3 presents the ratio of the widths of the 95 percent prediction intervals from the SAE data to the widths of the 95 percent confidence intervals from a direct estimate based on the same size sample. The estimates in the table are based on the recalculated (larger) estimate of the correlation between the two 2-year moving averages. As one can see, the 95 percent prediction intervals are much narrower on average for each of the four substance measures validated, with width ratios ranging from 0.60 for past month use of marijuana and past year use of cocaine to 0.77 for past month use of cigarettes for persons age 12 or older. This represents an improved precision that is equivalent to a sample size almost 3 times larger for marijuana and cocaine and about 2 times larger for cigarettes, relative to the precision obtained from the corresponding direct design-based estimate.
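The link between an interval width ratio and an equivalent sample size can be sketched as below. The sketch assumes that interval width is proportional to one over the square root of the sample size, which is the usual large-sample approximation rather than something stated in the appendix; the function name is hypothetical.

```python
def effective_sample_size_factor(width_ratio):
    """Sample size multiplier implied by a ratio of interval widths.

    Assumes width is proportional to 1 / sqrt(n), so a ratio r of SAE width
    to design-based width corresponds to a sample 1 / r**2 times larger.
    """
    if width_ratio <= 0.0:
        raise ValueError("width_ratio must be positive")
    return 1.0 / width_ratio ** 2


# Width ratios quoted in the text for persons age 12 or older.
factors = {
    "marijuana": effective_sample_size_factor(0.60),
    "cocaine": effective_sample_size_factor(0.60),
    "cigarettes": effective_sample_size_factor(0.77),
}
```

Under this approximation the ratios 0.60 and 0.77 give multipliers of roughly 2.8 and 1.7, in line with the rounded figures quoted above.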

E.3. Validation of Combined Prevalence-Level Estimates for 1999–2000

The 2-year estimates had been validated in the 2000 State report for four variables: past month use of marijuana, past year use of cocaine, past month binge alcohol use, and past month use of cigarettes. The results of that validation are repeated here in Tables E.13 to E.16. On average, the relative absolute biases (RABs) were quite small. For the 12 or older age group, the RABs were as follows:

Also, compared with the design-based confidence intervals, the 95 percent prediction intervals were much shorter, about 75 percent as large for marijuana, binge alcohol, and cigarettes and 65 percent as large for cocaine (Table E.17).

In addition, the 2-year estimates were compared with the corresponding 1-year estimates to ascertain the extent of improvement in estimation for the 42 States and the District of Columbia, given that those sample sizes would now be approximately double their size in 1999. For example, comparing the prediction intervals' widths across the 50 States and the District of Columbia, the SAE average prediction interval width for past month use of marijuana among persons 12 or older was 2.40 percent in 1999, but only 1.98 percent for 1999 and 2000 combined (see Section B.4.2 from Wright, 2002b). Just as importantly, because the States (and the District of Columbia) had smaller single-year sample sizes, the national model had a greater relative influence in the SAE estimates for 1999 than for 1999 and 2000 combined. Therefore, the 1999–2000 pooled State estimates would not be shrunk as much toward the national model-based estimate as would similar estimates based on a single year of data. One result is that the 2-year small area estimates would tend to be closer to their corresponding design-based estimates than small area estimates based on a single year of data. The other implication is that States with design-based estimates that were relatively lower or higher than other States would retain that distinction, and the overall range and spread of the State estimates would tend to be larger, for example, than it was in 1999. This should make it easier to identify States that have notably lower or higher substance use prevalence rates than other States.
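The effect of sample size on shrinkage can be illustrated with the textbook composite form of a small area estimate. This is not the SWHB model used for the report; it is a deliberately simplified sketch in which the function names, the variance components, and all numbers are hypothetical.

```python
def shrinkage_weight(n, between_state_var, unit_var):
    """Weight on the direct estimate in a simple composite estimator.

    Assumed textbook form: w = tau^2 / (tau^2 + sigma^2 / n), where tau^2 is
    the between-State variance and sigma^2 / n the sampling variance.
    """
    return between_state_var / (between_state_var + unit_var / n)


def composite_estimate(direct, model, n, between_state_var, unit_var):
    """Weighted combination of a direct estimate and a model-based prediction."""
    w = shrinkage_weight(n, between_state_var, unit_var)
    return w * direct + (1.0 - w) * model


# Made-up inputs: a State whose direct estimate sits above the model prediction.
one_year = composite_estimate(0.060, 0.048, 900, 0.0004, 0.09)
two_year = composite_estimate(0.060, 0.048, 1800, 0.0004, 0.09)
```

Doubling the sample size raises the weight on the direct estimate, so the 2-year value stays closer to the State's own data, which is the behavior described in the paragraph above.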

E.4. Caveats

Some of the caveats regarding SAE are addressed in Chapter 7 in Volume I of this report. Tables E.18 to E.20 show the screening, interview, and overall response rates for the 50 States and the District of Columbia from 1999 to 2001, respectively. The response rates are somewhat higher in both 2000 and 2001.

In 2001, an incentive experiment was embedded in the regular data collection during quarters 1 and 2. For that experiment, small random samples were selected in each State proportionate to their population size, and sampled persons were assigned to receive $0, $20, or $40 for completing the questionnaire. Analysis of that data revealed that the response rates were significantly higher among those receiving an incentive than among those who did not receive an incentive and that the overall cost of the survey was less due to the much smaller number of callbacks that were necessary (Eyerman & Bowman, 2002). Initial analysis of that data did not indicate any significant differences in estimated prevalence levels between the incentive and nonincentive cases; however, subsequent analysis has revealed higher prevalence rates for the incentive cases for some of the substance measures. Because the incentive sample size is relatively small compared to the total State sample size, the decision was made to combine both incentive and nonincentive samples in 2001 to produce the national estimates and to produce the State estimates for 2000 and 2001 combined. For example, the incentive sample size for Alabama totaled 98 cases that received either the $20 or $40 incentive (Table E.21), but the total sample size for 2000–2001 for Alabama was 1,821 (Table E.22). The largest allocation of incentive sample cases was in Illinois. There, 442 cases received either the $20 or $40 incentive out of a total combined sample size of 7,218, about 6 percent. Table E.22 also presents the State sample sizes for 1999 through 2001. Table E.21 presents the State sample allocations for just the incentive experiment.

One other possible contributor to bias in the State estimates, and the estimates in general, is the effect of editing and imputation of the summary data. In developing the editing and imputation process for 1999 and subsequent years, the desire was to minimize the amount of editing because of its somewhat subjective nature, and instead let the random imputation process supply any partially missing information. Overall, the percentage of imputed information is quite small for any given substance.

The imputation method is based on a multivariate imputation in which some demographic and other substance use information from the respondent is used to determine a donor who is similar in those characteristics but has supplied data for the drug in question (Grau et al., 2001, 2002, 2003). Often, information also is available from the partial respondent on the recency of drug use. For example, respondents may have indicated that they used the drug in their lifetime or in the past year, but left blank the question about use in the past month. For many of the records, this type of auxiliary information was available. In a small proportion of cases, no auxiliary information was available, in which case a random donor with similar drug use patterns and demographic characteristics was used. For the different substances, the largest differences between the edited and the imputed estimates typically occurred when there was a lot of auxiliary information. For past month use of marijuana, based on the 1999 data, the State with the largest percentage change from edited to imputed data was Alabama, whose edited rate of use of marijuana was 2.1 percent and whose imputed rate of use was 3.1 percent, a relative increase of almost 50 percent.

E.5. SAE Methodology

E.5.1 Background

In response to the need for State-level information on substance abuse problems, SAMHSA began developing and testing SAE methods for the NHSDA in 1994 under a contract with RTI of Research Triangle Park, North Carolina. That developmental work used logistic regression models with data from the combined 1991 to 1993 NHSDAs and local area indicators, such as drug-related arrests, alcohol-related death rates, and block group/tract-level characteristics from the 1990 Census that were found to be associated with substance abuse. In 1996, the results were published for 25 States for which there were sufficient sample data (OAS, 1996). A subsequent report described the methodology in detail and noted areas in which improvements were needed (Folsom & Judkins, 1997).

The increasing need for State-level estimates of substance use led to the decision to expand the NHSDA to provide estimates for all 50 States and the District of Columbia on an annual basis beginning in 1999. It was determined that, with the use of modeling similar to that used with the 1991 to 1993 NHSDA data in conjunction with a sample designed for State-level estimation, a sample of about 67,500 persons would be sufficient to make reasonably precise estimates.

The State-based NHSDA sample design implemented in 1999 through 2001 had the following characteristics:

In preparation for the modeling of the 1999 data, RTI used the data from the combined 1994–1996 NHSDAs to develop an improved methodology that utilized more local area data and produced better estimates of the accuracy of the State estimates (Folsom, Shah, & Vaish, 1999). That effort involved the development of procedures that would validate the results for geographic areas with large samples. This work was reviewed by a panel with SAE expertise.1 They approved of the methodology, but suggested further improvements for the modeling to be used to produce the 1999 State estimates. Those improvements were incorporated into the methodology finally used for the 1999 State estimates. Similar methodology (as described earlier) was used for the 2000 State report and this 2001 State report. The SWHB methodology is described below.

E.5.2 Goals of Modeling

There were several goals underlying the estimation process. The first was to model drug use at the lowest possible level and aggregate over the levels to form the State estimates. The chosen level of aggregation was the 32 age group (12 to 17, 18 to 25, 26 to 34, 35+) by race/ethnicity (white, non-Hispanic; black, non-Hispanic; Hispanic; Other non-Hispanic) by gender cells at the block group level. Estimated population counts were obtained from a private vendor for each block group for each of the 32 cells. This level of aggregation was desired because the NHSDA first stage of sample selection was at the block group level, so that there would be data at this level to fit a model. In addition, there was a great deal of information from the Census at the block group level that could be used as predictors in the models. If prevalence rates could be estimated for each of the 32 cells at the block group level, it would only be necessary to multiply the rates by the estimated population counts and aggregate to the State level.

Another goal of the estimation process was to include the sampling weight in the model in such a way that the small area estimates would converge to the design-based (sample-weighted) estimates when they were aggregated to a sufficient sample size. There was a desire for the estimates to have this characteristic so that there would be consistency with the survey-weighted national estimates based on the entire sample.

A third goal was to include as much local source data as possible, especially data related to each substance use measure. This would help provide a better fit beyond the strictly sociodemographic information. The desire was to use national sources of these data so that there would be consistency of collection and estimation methodology across States.

Recognizing that estimates based solely on these "fixed" effects would not reflect differences across States due to differences in laws, enforcement activities, advertising campaigns, outreach activities, and other such unique State contributions, a fourth goal was to include "random" effects to compensate for these differences. The types of random effects that could be supported by the NHSDA data were a function of the size of sample and the model fit to the sample data. Random effects were included at the State level and for substate regions comprising three FI regions. Although this grouping of the three FI regions was principally motivated by the need to accumulate enough of a sample to support good model fitting for the low-prevalence NHSDA outcomes, it also was reasoned that it would be possible to produce substate hierarchical Bayes (HB) estimates for areas comprised of these FI region groups, once 2 or 3 years of NHSDA data were available, because that would yield substate region samples of at least 400 respondents. For substate areas that do not conform to the substate region boundaries (e.g., counties and large municipalities), HB estimates could be derived from their elemental block group-level contributions, but the design-based data employed in the estimation of the associated substate region effects would not be restricted to the county or city of interest. This mismatch of FI region and county/large municipality boundaries weakens the theoretical appeal of the associated HB estimate. For this reason, substate HB estimates probably should be restricted to areas that can be matched reasonably well to FI region groups.

One of the difficulties of typical SAE has been obtaining good estimates of the accuracy of the SAEs with prediction intervals that give a good representation of the true probability of coverage of the intervals. Therefore, the final major goal was to provide accurate prediction intervals—ones that would approach the usual sample-based intervals as the sample size increases.

E.5.3 Variables Modeled

In the 2001 NHSDA, a set of 19 measures covering a variety of aspects of substance use and abuse was designated for estimation. For the first 12, three sets of estimates have been produced: one based on pooled 1999 and 2000 NHSDA data, another based on pooled 2000 and 2001 NHSDA data, and a third measuring the change between the first two. Estimates of change between two consecutive single years had not been precise enough for the observed annual changes to be declared statistically significant. For the next six variables, only estimates based on the pooled 2000 and 2001 data were possible because the definitions of those variables had changed between 1999 and 2000. The final variable, serious mental illness (SMI), was added in 2001. The 19 outcome variables are as follows:

  1. past month use of any illicit drug,
  2. past month use of marijuana,
  3. perceptions of great risk of smoking marijuana once a month,
  4. average annual rates of first use of marijuana,
  5. past month use of any illicit drug other than marijuana,
  6. past year use of cocaine,
  7. past month use of alcohol,
  8. past month binge alcohol use,
  9. perceptions of great risk of having five or more drinks of an alcoholic beverage once or twice a week,
  10. past month use of any tobacco product,
  11. past month use of cigarettes,
  12. perceptions of great risk of smoking one or more packs of cigarettes per day,
  13. past year alcohol dependence or abuse,
  14. past year alcohol dependence,
  15. past year any illicit drug dependence or abuse,
  16. past year any illicit drug dependence,
  17. past year dependence or abuse for any illicit drug or alcohol,
  18. past year treatment gap, and
  19. past year serious mental illness.

E.5.4 Predictors Used in Logistic Regression Models

Local area data used as potential predictor variables in the logistic regression models were obtained from several sources, including Claritas, the Census Bureau, the FBI (Uniform Crime Reports), Health Resources and Services Administration (Area Resource File), SAMHSA (Uniform Facility Data Set), and the National Center for Health Statistics (mortality data). The following lists give, by source, the specific independent variables that were potential predictors in the models.

Claritas Data
Description Level
% Population aged 0–18 in block group Block group
% Population aged 19–24 in block group Block group
% Population aged 25–34 in block group Block group
% Population aged 35–44 in block group Block group
% Population aged 45–54 in block group Block group
% Population aged 55–64 in block group Block group
% Population aged 65+ in block group Block group
% Blacks in block group Block group
% Hispanics in block group Block group
% Other race in block group Block group
% Whites in block group Block group
% Males in block group Block group
% Females in block group Block group
% American Indian, Eskimo, Aleut in tract Tract
% Asian, Pacific Islander in tract Tract
% Population aged 0–18 in tract Tract
% Population aged 19–24 in tract Tract
% Population aged 25–34 in tract Tract
% Population aged 35–44 in tract Tract
% Population aged 45–54 in tract Tract
% Population aged 55–64 in tract Tract
% Population aged 65+ in tract Tract
% Blacks in tract Tract
% Hispanics in tract Tract
% Other race in tract Tract
% Whites in tract Tract
% Males in tract Tract
% Females in tract Tract
% Population aged 0–18 in county County
% Population aged 19–24 in county County
% Population aged 25–34 in county County
% Population aged 35–44 in county County
% Population aged 45–54 in county County
% Population aged 55–64 in county County
% Population aged 65+ in county County
% Blacks in county County
% Hispanics in county County
% Other race in county County
% Whites in county County
% Males in county County
% Females in county County

1990 Census Data
Description Level
% Population who dropped out of high school Tract
% Housing units built in 1940–1949 Tract
% Persons 16–64 with a work disability Tract
% Hispanics who are Cuban Tract
% Females 16 years or older in labor force Tract
% Females never married Tract
% Females separated/divorced/widowed/other Tract
% One-person households Tract
% Female head of household, no spouse, child <18 Tract
% Males 16 years or older in labor force Tract
% Males never married Tract
% Males separated/divorced/widowed/other Tract
% Housing units built in 1939 or earlier Tract
Average persons per room Tract
% Families below poverty level Tract
% Households with public assistance income Tract
% Housing units rented Tract
% Population 9–12 years of school, no high school diploma Tract
% Population 0–8 years of school Tract
% Population with associate's degree Tract
% Population some college and no degree Tract
% Population with bachelor's, graduate, professional degree Tract
Median rents for rental units Tract
Median value of owner-occupied housing units Tract
Median household income Tract

Uniform Crime Report Data
Description Level
Drug possession arrest rate County
Drug sale/manufacture arrest rate County
Drug violations' arrest rate County
Marijuana possession arrest rate County
Marijuana sale/manufacture arrest rate County
Opium cocaine possession arrest rate County
Opium cocaine sale/manufacture arrest rate County
Other drug possession arrest rate County
Other dangerous non-narcotics arrest rate County
Serious crime arrest rate County
Violent crime arrest rate County
Driving under influence arrest rate [1] County

Other Categorical Data
Description Source Level
=1 if Hispanic, =0 otherwise Sample Person
=1 if non-Hispanic Black, =0 otherwise Sample Person
=1 if non-Hispanic Other, =0 otherwise Sample Person
=1 if male, =0 if female Sample Person
=1 if Northeast region, =0 otherwise 1990 Census State
=1 if Midwest region, =0 otherwise 1990 Census State
=1 if South region, =0 otherwise 1990 Census State
=1 if MSA with 1 million +, =0 otherwise 1990 Census County
=1 if MSA with <1 million, =0 otherwise 1990 Census County
=1 if non-MSA urban, =0 otherwise 1990 Census Tract
=1 if underclass tract Urban Institute Tract
=1 if no Cubans in tract, =0 otherwise 1990 Census Tract
=1 if urban area, =0 if rural area 1990 Census Tract
=1 if no arrests for dangerous non-narcotics, =0 otherwise UCR County

Miscellaneous Data
Variable Description Source Level
Alcohol death rate, direct cause ICD-9 County
Alcohol death rate, indirect cause ICD-9 County
Cigarettes death rate, direct cause ICD-9 County
Cigarettes death rate, indirect cause ICD-9 County
Drug death rate, direct cause ICD-9 County
Drug death rate, indirect cause ICD-9 County
Alcohol treatment rate UFDS County
Alcohol and drug treatment rate UFDS County
Drug treatment rate UFDS County
% Families below poverty level ARF County
Unemployment rate ARF County
Per capita income (in thousands) ARF County
Food stamp participation rate Census Bureau County
Single state agency maintenance of effort [2] National Association of State Alcohol and Drug Abuse Directors (NASADAD) State
Block grant awards [2] SAMHSA State
Cost of Services Factor Index (2001–2003) [2] SAMHSA State
Total Taxable Resources Per Capita Index (1998) [2] U.S. Department of Treasury State
Average suicide rate (1996–1998, per 10,000) [1] ARF County
[1] Indicates additional predictors used to model serious mental illness for 2001.
[2] Indicates additional predictors used to model treatment gap for 2000–2001.

E.5.5 Selection of Independent Variables for the Models

For serious mental illness (SMI), modeled using 2001 data alone, independent variables for each age group were identified by a Chi-squared Automatic Interaction Detector (CHAID) algorithm, which does not use sample weights. Prior to this process, all the continuous variables were categorized using deciles and were treated as ordinal in CHAID. Region was treated as a nominal categorical variable in CHAID. Significant (at the 3 percent level) independent variables from each age group model and final nodes in the tree-growing process were identified as predictor variables destined for inclusion at a later step.

Independently, a SAS stepwise logistic regression model was fit for each age group. The SAS stepwise was used because it was able to quickly run all of the variables for all of the models, although it was recognized that the software would not take into account the complex sample design. The independent variables included all the first-order or linear polynomial trend contrasts across the 10 levels of the categorized variables plus the gender, region, and race variables. Significant variables (at the 3 percent level) were identified from this process. Based on the combined list from CHAID and SAS, a list of variables was created that included the corresponding second- and third-order polynomials and the interaction of the first-order polynomials with the gender, race, and region variables.
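The decile categorization and first-order trend coding described above can be illustrated with a short Python sketch. The function names are hypothetical, and the centered integer scores (level minus 5.5) are only one common way to code a linear contrast across 10 ordered levels; the report does not state the exact coding that was used.

```python
import bisect
import statistics

def decile_levels(values):
    """Assign each value a decile level from 1 to 10, using the nine
    cut points returned by statistics.quantiles."""
    cuts = statistics.quantiles(values, n=10)
    return [bisect.bisect_right(cuts, v) + 1 for v in values]

def linear_trend_scores(levels):
    """Centered integer scores for the 10 ordered levels (assumed coding)."""
    return [level - 5.5 for level in levels]

# Hypothetical continuous predictor with 100 distinct values.
values = list(range(1, 101))
levels = decile_levels(values)
scores = linear_trend_scores(levels)
```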

Next, the variables were entered into a SAS stepwise logistic model at the 1 percent significance level. Because of past concerns about overfitting of the data in earlier estimation using the 1991 to 1993 NHSDA data, the significance levels were made quite stringent. These variables were then entered into a SUrvey DAta ANalysis (SUDAAN) logistic regression model because the SUDAAN software would adjust for the effects of the weights and other aspects of the complex sample design (RTI, 2001). All variables that were still significant at the 1 percent significance level were entered into the survey-weighted hierarchical Bayes (SWHB) process.

For outcome variables modeled using pooled 2000 and 2001 data, the predictor set was the same one used in the 1999–2000 analyses, which was obtained using the same variable selection method described above for SMI.

E.5.6 General Model Description

The model can be characterized as a complex mixed model (including both fixed and random effects) of the form:

$$\lambda = X\beta + ZU$$

Each of the symbols represents a matrix or vector. The leading term $X\beta$ is the usual (fixed) regression contribution, and $ZU$ represents random effects for the States and field interviewer (FI) region groups that the data will support and for which estimates are desired. Not obvious from the notation is that the model is a logistic model used to estimate dichotomous data. The vector $\lambda$ has elements $\lambda_{ijk} = \log[\pi_{ijk}/(1 - \pi_{ijk})]$, where $\pi_{ijk}$ is the propensity for the kth person in the jth FI composite region in the ith State to engage in the behavior of interest (e.g., to use marijuana in the past month). Also not obvious from the notation is that the model fitting utilizes the final "sample" weights as discussed above. The "sample" weights have been adjusted for nonresponse and poststratified to known Census counts.
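To make the logistic form concrete, the following sketch converts a log odds value (fixed regression contribution plus State and FI region group random effects) into a propensity. All names and numbers are hypothetical, and the survey weights used in the actual model fitting are not represented here.

```python
import math

def log_odds(x, beta, state_effect, region_effect):
    """Linear predictor: fixed part x'beta plus the two random effects."""
    fixed = sum(xi * bi for xi, bi in zip(x, beta))
    return fixed + state_effect + region_effect

def propensity(lam):
    """Inverse logit: maps a log odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-lam))

# Hypothetical person: intercept plus one predictor.
lam = log_odds(x=[1.0, 0.3], beta=[-2.0, 1.0], state_effect=0.1, region_effect=-0.05)
pi = propensity(lam)
```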

The estimate for each State behaves like a "weighted" average of the design-based estimate in that State and the predicted value based on the national regression model. The "weights" in this case are functions of the relative precision of the sample-based estimate for the State and the predicted estimate based on the national model. The eight large States have large samples, and thus more "weight" is given to the sample estimate relative to the model-based regression estimate. The 42 small States and the District of Columbia put relatively more "weight" on the regression estimate because of their smaller samples. The national regression estimate actually uses national parameters that are based on the pooled 2000 and 2001 sample; however, the regression estimate for a specific State is based on applying the national regression parameters to that State's "local" county, block group, and tract-level predictor variables and summing to the State level. Therefore, even the national regression component of the estimate for a State includes "local" State data.
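The "weighted" average behavior can be illustrated with a simple precision-weighted composite. This is a simplified stand-in, not the hierarchical Bayes computation itself; the function name and the variances are hypothetical.

```python
def composite_estimate(direct, direct_var, model, model_var):
    """Weight the design-based (direct) estimate by its relative precision.
    A small direct_var (large sample) pushes the weight toward 1."""
    weight = model_var / (model_var + direct_var)
    return weight * direct + (1.0 - weight) * model, weight

# Large-sample State: the direct estimate is precise, so it dominates.
large_est, large_w = composite_estimate(0.10, 0.0001, 0.06, 0.0004)
# Small-sample State: the direct estimate is noisy, so the model dominates.
small_est, small_w = composite_estimate(0.10, 0.0016, 0.06, 0.0004)
```

In both cases the composite lies between the direct and the model-based values, and the State with the larger sample stays closer to its direct estimate.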

The goal then was to obtain the best estimates of $\beta$ and $U$. These would lead to the best estimates of $\lambda$, which would in turn lead to the best estimates of $\pi$. Once the best estimate of $\pi$ for each age/race/gender cell within each block group has been obtained, the results could be weighted by the projected Census population counts at that level to make estimates for any geographic area larger than a block group.

In the model fitting for the pooled 2000 and 2001 data, the small number of predictor variables updated in 2001 were used in both their 2000 and 2001 versions when they appeared in a model. To produce the 2000–2001 pooled small area estimates, the common fixed and random effects were first employed to form State estimates $\pi_{00}$ and $\pi_{01}$, the predicted probabilities of substance use for 2000 and 2001, respectively. These annualized State estimates then were combined as population-weighted averages of the form

$$\pi_{00\text{-}01} = \frac{N_{00}\,\pi_{00} + N_{01}\,\pi_{01}}{N_{00} + N_{01}},$$

where $N_{00}$ and $N_{01}$ are the population projections for 2000 and 2001 obtained from Claritas Inc.
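The population-weighted pooling of the two annual estimates amounts to the following arithmetic. The function name and the example values are hypothetical.

```python
def pooled_estimate(pi_00, n_00, pi_01, n_01):
    """Population-weighted average of the 2000 and 2001 State estimates."""
    return (n_00 * pi_00 + n_01 * pi_01) / (n_00 + n_01)

# Hypothetical State: prevalence and population projection for each year.
pooled = pooled_estimate(pi_00=0.05, n_00=1000, pi_01=0.07, n_01=3000)
```

When the two population projections are nearly equal, as would be typical for consecutive years, the pooled value is close to the simple mean of the two annual estimates.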

E.5.7 Implementation of Modeling

The solution to the model equation in Section E.5.6 is not straightforward; it involves a series of iterative steps that generate values of the desired fixed and random effects from the underlying joint distribution. The basic process can be described as follows.

Let $\beta$ denote the matrix of fixed effects, $\eta$ the matrix of State random effects for States i = 1, ..., 51, and $\nu$ the matrix of FI composite region effects j within State i. Because the goal is to estimate separate models for four age groups, it is assumed that the random effect vectors are four-variate Normal with null mean vectors and 4×4 covariance matrices $D_\eta$ and $D_\nu$, respectively. To estimate the individual effects, a Bayesian approach is used to represent the joint density function given the data by $p(\beta, \eta, \nu, D_\eta, D_\nu \mid y)$. According to the Bayes process, this can be estimated once the conditional distributions are known:

$p(\beta \mid \eta, \nu, D_\eta, D_\nu, y)$, $p(D_\eta, D_\nu \mid \beta, \eta, \nu, y)$, and $p(\eta, \nu \mid \beta, D_\eta, D_\nu, y)$.

To generate random draws from these distributions, Markov chain Monte Carlo (MCMC) processes need to be used. MCMC is a body of methods for generating pseudo-random draws from probability distributions via Markov chains. A Markov chain is fully specified by its starting distribution $p(X_0)$, where $X_0$ is the starting point, and its transition kernel $p(X_t \mid X_{t-1})$, where t indexes the current step.

Each MCMC step that involves the vector of binary outcome variables y in the conditioning set needs first to be modified by defining a pseudolikelihood using survey weights. In defining pseudolikelihood, weights are introduced after scaling them to the effective sample size based on a suitable design effect. Note that with the pseudolikelihood, the covariance matrix of the pseudoscore functions is no longer equal to the pseudoinformation matrix; therefore, a sandwich type of covariance matrix was used to compute the design effect. In this process, weights are largely assumed to be noninformative (i.e., unrelated to the outcome variable y). The assumption of noninformative weights is useful in finding tractable expressions for the appropriate information matrix of the pseudoscore functions. The pseudo log-likelihood remains an unbiased estimate of the finite-population log-likelihood regardless of this assumption.
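A minimal sketch of a weighted pseudo log-likelihood for binary outcomes follows. The design effect is simply passed in as a number here; the sandwich-based computation described above is not reproduced, and the function names are hypothetical.

```python
import math

def scale_weights(weights, design_effect):
    """Rescale weights so that they sum to the effective sample size n / deff."""
    effective_n = len(weights) / design_effect
    total = sum(weights)
    return [w * effective_n / total for w in weights]

def pseudo_loglik(y, lam, scaled_weights):
    """Weighted Bernoulli log-likelihood with log odds lam[i] for person i."""
    value = 0.0
    for yi, li, wi in zip(y, lam, scaled_weights):
        p = 1.0 / (1.0 + math.exp(-li))
        value += wi * (yi * math.log(p) + (1 - yi) * math.log(1.0 - p))
    return value

# Three hypothetical respondents with an assumed design effect of 1.5.
scaled = scale_weights([1.0, 1.0, 2.0], design_effect=1.5)
ll = pseudo_loglik(y=[1, 0, 0], lam=[0.0, -1.0, -1.0], scaled_weights=scaled)
```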

Step I: $p(\beta \mid \eta, \nu, y)$ (this does not depend on $D_\eta$ or $D_\nu$)

With a flat prior for $\beta_a$, where a denotes a specific age group, the conditional posterior is proportional to the pseudolikelihood function. For large samples, this posterior can be approximated by the multivariate normal distribution with mean vector equal to the pseudomaximum likelihood estimate and with asymptotic covariance matrix having the associated sandwich form. Assuming that the survey weights are noninformative makes the age group-specific $\beta_a$ vectors conditionally independent of each other. Therefore, the $\beta_a$ can be updated separately at each MCMC cycle.

Step II: $p(\eta_i \mid \beta, \nu, D_\eta, y)$ (this does not depend on $D_\nu$)

Here, the conditional posterior is proportional to the product of the prior $p(\eta_i \mid D_\eta)$, the pseudo-likelihood function $p(y \mid \beta, \eta, \nu)$, and the prior $p(\beta, D_\eta)$; this last prior can be omitted because it does not involve $\eta_i$. Calculating the denominator (or normalization constant) of the posterior distribution for $\eta_i$ requires multidimensional integration and is numerically intractable. To get around this problem, the Metropolis-Hastings (M-H) algorithm is used, which requires a dominating density convenient for Monte Carlo sampling. For this purpose, the mode and curvature of the conditional posterior distribution are used; these can be obtained simply from its numerator. A Gaussian distribution with matching mode and curvature then defines the dominating density for M-H. As with the age group-specific $\beta_a$ parameters, the State-specific random effect vectors $\eta_i$ are conditionally independent of each other and can be updated separately at each MCMC cycle.
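The idea of a Gaussian density matched to the posterior mode and curvature can be sketched for a single scalar random effect. This toy version uses an unweighted binomial-logit likelihood with a Normal prior and an independence-type M-H sampler; the actual procedure works with four-variate effects and the survey-weighted pseudolikelihood. All names and values are hypothetical.

```python
import math
import random

def log_posterior(eta, successes, trials, offset, prior_var):
    """Unnormalized log posterior (the 'numerator'): binomial-logit
    likelihood times a N(0, prior_var) prior."""
    lam = offset + eta
    return (successes * lam - trials * math.log1p(math.exp(lam))
            - eta * eta / (2.0 * prior_var))

def mode_and_curvature(successes, trials, offset, prior_var, steps=50):
    """Newton iterations for the mode; curvature is the negative second
    derivative of the log posterior, evaluated at the final iterate."""
    eta = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + math.exp(-(offset + eta)))
        grad = successes - trials * p - eta / prior_var
        eta += grad / (trials * p * (1.0 - p) + 1.0 / prior_var)
    p = 1.0 / (1.0 + math.exp(-(offset + eta)))
    return eta, trials * p * (1.0 - p) + 1.0 / prior_var

def mh_draws(successes, trials, offset, prior_var, n_draws, seed=0):
    rng = random.Random(seed)
    mode, curv = mode_and_curvature(successes, trials, offset, prior_var)
    sd = 1.0 / math.sqrt(curv)

    def log_weight(x):
        # Log target minus log Gaussian proposal; constants cancel in the ratio.
        return (log_posterior(x, successes, trials, offset, prior_var)
                + 0.5 * curv * (x - mode) ** 2)

    current, draws = mode, []
    for _ in range(n_draws):
        proposal = rng.gauss(mode, sd)
        accept = math.exp(min(0.0, log_weight(proposal) - log_weight(current)))
        if rng.random() < accept:
            current = proposal
        draws.append(current)
    return mode, draws

mode, draws = mh_draws(successes=12, trials=100, offset=-2.0,
                       prior_var=0.25, n_draws=2000)
```

Because only differences of the log numerator enter the acceptance ratio, the intractable normalization constant never has to be computed.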

Step III: $p(\nu_{ij} \mid \beta, \eta, D_\nu, y)$ (this does not depend on $D_\eta$)

This step is similar to Step II.

Step IV: $p(D_\eta \mid \eta)$, $p(D_\nu \mid \nu)$ (here, $\eta$ and $\nu$ include all the information from y)

Here, the pseudo-likelihood involving design weights comes in implicitly through the conditioning parameters $\eta$ and $\nu$ evaluated at the current cycle. An exact conditional posterior distribution is obtained because the inverse Wishart priors for $D_\eta$ and $D_\nu$ are conjugate.
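The conjugate update can be illustrated in its scalar special case, where the inverse Wishart reduces to an inverse gamma distribution for a single variance. This is a simplification of the 4×4 case used in the report; the prior degrees of freedom, prior scale, and effect values below are hypothetical.

```python
import random

def posterior_params(effects, prior_df, prior_scale):
    """Conjugate update for the variance of mean-zero Normal effects:
    degrees of freedom grow by the number of effects, and the scale
    grows by their sum of squares."""
    return prior_df + len(effects), prior_scale + sum(e * e for e in effects)

def draw_variance(effects, prior_df, prior_scale, rng):
    """One exact draw from the inverse gamma conditional posterior,
    taken as the reciprocal of a gamma draw."""
    df, scale = posterior_params(effects, prior_df, prior_scale)
    return 1.0 / rng.gammavariate(df / 2.0, 2.0 / scale)

rng = random.Random(1)
effects = [0.2, -0.1, 0.4, -0.3, 0.0]  # current-cycle random effects
variance_draws = [draw_variance(effects, 3, 1.0, rng) for _ in range(1000)]
```

No Metropolis-Hastings step is needed here because the draw comes directly from the exact conditional posterior.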

E.5.8 Remarks

E.6. References

Eyerman, J., & Bowman, K. (2002, January). 2001 National Household Survey on Drug Abuse: Incentive experiment combined quarter 1 and quarter 2 analysis. Rockville, MD: Substance Abuse and Mental Health Services Administration, Office of Applied Studies. [Available as a PDF at /nhsda/methods/incentive.pdf]

Folsom, R. E., & Judkins, D. R. (1997). Substance abuse in states and metropolitan areas: Model based estimates from the 1991–1993 National Household Surveys on Drug Abuse: Methodology report (DHHS Publication No. SMA 97–3140, Methodology Series M-1). Rockville, MD: Substance Abuse and Mental Health Services Administration, Office of Applied Studies. [Available at /methods.htm#methods]

Folsom, R. E., Shah, B., & Vaish, A. (1999). Substance abuse in states: A methodological report on model based estimates from the 1994–1996 National Household Surveys on Drug Abuse. In Proceedings of the Section on Survey Research Methods of the American Statistical Association (pp. 371–375). Washington, DC: American Statistical Association.

Grau, E. A., Bowman, K. R., Giacoletti, K. E. D., Odom, D. M., & Sathe, N. S. (2001, July). Imputation report. In 1999 National Household Survey on Drug Abuse: Methodological resource book (Vol. 1, Section 4). Rockville, MD: Substance Abuse and Mental Health Services Administration, Office of Applied Studies. [Available as a PDF at /nhsda/methods.cfm]

Grau, E. A., Bowman, K. R., Copello, E., Frechtel, P., Licata, A., & Odom, D. M. (2002, July). Imputation report. In 2000 National Household Survey on Drug Abuse: Methodological resource book (Vol. 1, Section 4). Rockville, MD: Substance Abuse and Mental Health Services Administration, Office of Applied Studies. [Available as a PDF at /nhsda/methods.cfm]

Grau, E. A., Barnett-Walker, K., Copello, E., Frechtel, P., Licata, A., Liu, B., & Odom, D. M. (2003, May). Imputation report. In 2001 National Household Survey on Drug Abuse: Methodological resource book (Vol. 1, Section 4). Rockville, MD: Substance Abuse and Mental Health Services Administration, Office of Applied Studies. [Available as a PDF at /nhsda/methods.cfm]

Office of Applied Studies. (1996). Substance abuse in states and metropolitan areas: Model based estimates from the 1991–1993 National Household Surveys on Drug Abuse—Summary report. Rockville, MD: Substance Abuse and Mental Health Services Administration. [Available as a WordPerfect 6.1 file at /analytic.htm]

RTI. (2001). SUDAAN user's manual: Release 8.0. Research Triangle Park, NC: RTI.

Wright, D. (2002a). State estimates of substance use from the 2000 National Household Survey on Drug Abuse: Volume I. Findings (DHHS Publication No. SMA 02–3731, NHSDA Series H-15). Rockville, MD: Substance Abuse and Mental Health Services Administration, Office of Applied Studies. [Available at /states.htm]

Wright, D. (2002b). State estimates of substance use from the 2000 National Household Survey on Drug Abuse: Volume II. Supplementary technical appendices (DHHS Publication No. SMA 02–3732, NHSDA Series H-16). Rockville, MD: Substance Abuse and Mental Health Services Administration, Office of Applied Studies. [Available at /states.htm]

Table E.1 Ratio of Average Widths of Change Between the 1999–2000 Pooled Data and the 2000–2001 Pooled Data (Based on the Underestimated Model-Based Correlations)
State  Age 12–17  Age 18–25  Age 26+  Total
Past Month Use of Marijuana
CA 0.84 0.94 0.77 0.72
FL 0.74 1.02 0.72 0.73
IL 0.87 0.99 0.60 0.71
MI 0.74 1.04 0.88 0.84
NY 0.72 0.94 0.54 0.64
OH 0.75 1.01 0.75 0.80
PA 0.74 1.00 0.86 0.86
TX 0.94 1.01 0.38 0.67
Average 0.79 0.99 0.69 0.75
Past Year Use of Cocaine
CA 0.99 0.83 0.59 0.60
FL 0.64 1.20 0.92 1.05
IL 0.90 0.81 0.32 0.50
MI 0.09 0.96 0.79 0.79
NY 0.48 0.75 0.52 0.61
OH 0.44 1.07 0.69 0.87
PA 0.59 0.77 0.46 0.52
TX 0.86 0.97 0.39 0.67
Average 0.63 0.92 0.59 0.70
Past Month Use of Alcohol
CA 0.98 1.08 1.01 1.00
FL 0.82 0.91 1.03 1.01
IL 0.91 1.00 0.92 0.90
MI 0.96 0.99 1.00 0.95
NY 0.98 0.76 0.96 0.96
OH 0.93 0.87 1.09 1.10
PA 0.96 0.83 0.92 0.90
TX 1.25 1.03 1.10 1.07
Average 0.97 0.93 1.01 0.99
Past Month Use of Cigarettes
CA 1.03 1.14 1.02 0.99
FL 0.97 1.05 1.14 1.13
IL 1.04 1.20 1.10 1.12
MI 0.95 1.05 1.04 1.01
NY 0.81 1.10 1.11 1.08
OH 1.05 1.22 1.02 1.02
PA 1.02 1.05 1.10 1.07
TX 1.11 1.27 1.03 1.02
Average 1.00 1.14 1.07 1.05
Note: Ratio = Average width of model-based PIs of change for substates / Average width of design-based CIs of change for substates
Note: The change measure is defined as the odds ratio {P2/(1-P2)}/{P1/(1-P1)}, where P1 is the pooled 1999–2000 small area estimate and P2 is the pooled 2000–2001 small area estimate.
CI = confidence interval; PI = prediction interval.
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1999, 2000, and 2001.
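The change measure defined in the table note can be computed as follows. The function name and the example prevalences are hypothetical.

```python
def odds_ratio(p1, p2):
    """Change measure {P2/(1-P2)}/{P1/(1-P1)}; a value of 1 means no change."""
    return (p2 / (1.0 - p2)) / (p1 / (1.0 - p1))

no_change = odds_ratio(0.20, 0.20)
increase = odds_ratio(0.20, 0.50)
```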

Table E.2 Average Correlation Between the 1999–2000 and the 2000–2001 Model-Based and Design-Based Estimates (Based on the Underestimated Model-Based Correlations)
State  12–17 DB  12–17 MB  18–25 DB  18–25 MB  26+ DB  26+ MB  Total DB  Total MB
Past Month Use of Marijuana
CA 0.3204 0.1217 0.4943 0.1508 0.4107 0.3515 0.3273 0.3701
FL 0.5079 0.1998 0.5020 0.1456 0.3114 0.3024 0.3492 0.3308
IL 0.4133 0.1733 0.4996 0.1649 0.5736 0.3816 0.5988 0.3986
MI 0.3316 0.1322 0.4838 0.1203 0.5615 0.3651 0.5476 0.3843
NY 0.4372 0.2003 0.5343 0.1757 0.4083 0.3752 0.4609 0.3991
OH 0.3827 0.1516 0.6195 0.1514 0.5057 0.3990 0.5723 0.3984
PA 0.4838 0.1611 0.5863 0.1533 0.5799 0.3420 0.6406 0.3549
TX 0.5088 0.1337 0.5064 0.1675 0.3134 0.4462 0.4329 0.4362
Average 0.4346 0.1634 0.5321 0.1540 0.4633 0.3725 0.5094 0.3856
Past Year Use of Cocaine
CA 0.4937 0.1827 0.3807 0.1365 0.4240 0.2833 0.4380 0.3131
FL 0.3228 0.2723 0.4839 0.1286 0.6494 0.2852 0.5982 0.2994
IL 0.6058 0.3017 0.4796 0.1452 0.3945 0.2476 0.4316 0.2724
MI 0.4221 0.2550 0.5056 0.1419 0.5341 0.2935 0.5134 0.3233
NY 0.4502 0.2938 0.4186 0.1903 0.4097 0.2728 0.3996 0.3012
OH 0.5629 0.2872 0.4782 0.1389 0.5790 0.2679 0.5704 0.2887
PA 0.3517 0.2333 0.5553 0.1465 0.4333 0.2681 0.4394 0.2972
TX 0.3932 0.2160 0.3400 0.1274 0.2720 0.2830 0.3627 0.2952
Average 0.4455 0.2633 0.4635 0.1453 0.4662 0.2743 0.4726 0.2972
Past Month Use of Alcohol
CA 0.3987 0.0866 0.5756 0.0821 0.5560 0.1390 0.5808 0.1562
FL 0.4226 0.0998 0.5331 0.1181 0.4971 0.1539 0.5078 0.1659
IL 0.3669 0.1073 0.5651 0.0958 0.4712 0.1379 0.4637 0.1563
MI 0.4200 0.1142 0.4815 0.0836 0.5311 0.1466 0.4978 0.1634
NY 0.4680 0.1147 0.4835 0.1540 0.4485 0.1382 0.4914 0.1571
OH 0.3443 0.1063 0.5001 0.1032 0.4647 0.1207 0.4843 0.1383
PA 0.4636 0.0793 0.6181 0.1300 0.4895 0.1264 0.4856 0.1471
TX 0.6342 0.0738 0.5562 0.1084 0.6464 0.1576 0.6509 0.1700
Average 0.4444 0.0990 0.5351 0.1124 0.5083 0.1401 0.5136 0.1569
Past Month Use of Cigarettes
CA 0.3284 0.0717 0.5193 0.0491 0.5963 0.0760 0.5655 0.0910
FL 0.4907 0.0863 0.5048 0.0912 0.5069 0.0788 0.5184 0.0848
IL 0.4375 0.0827 0.5203 0.0861 0.5016 0.0367 0.5550 0.0577
MI 0.4284 0.0440 0.5433 0.0493 0.4787 0.0555 0.4999 0.0647
NY 0.3974 0.0829 0.5050 0.0715 0.4655 0.0581 0.4643 0.0706
OH 0.4731 0.0688 0.5462 0.0461 0.4433 0.0596 0.4696 0.0714
PA 0.4733 0.0734 0.5898 0.0483 0.4217 0.0558 0.4253 0.0727
TX 0.5882 0.0766 0.6083 0.0544 0.6135 0.1101 0.6321 0.1200
Average 0.4659 0.0735 0.5447 0.0634 0.4931 0.0652 0.5108 0.0778
Note: The design-based (DB) correlation is derived from the SUDAAN sampling variance and covariance calculations for P1 and P2, where P1 is the 1999–2000 pooled small area estimate and P2 is the 2000–2001 pooled small area estimate. SUDAAN uses between-replicate, within-FI (field interviewer) region mean squares and cross products. The DB correlation on the log-odds scale is the same as on the prevalence scale. The model-based (MB) correlations are Bayes posterior correlations for the log-odds calculated from the Markov chain Monte Carlo (MCMC) samples. The MB correlations are underestimated because the software cannot properly account for the sampling covariance resulting from the 2000 data overlap.
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1999, 2000, and 2001.
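The link between these correlations and the interval widths for change follows from the variance of a difference. The sketch below is a general illustration with hypothetical variances, not a reproduction of the report's calculations: for two estimates with variances v1 and v2 and correlation rho, the variance of their difference is v1 + v2 - 2·rho·sqrt(v1·v2), so an underestimated correlation inflates the standard error of the change.

```python
import math

def change_se(v1, v2, rho):
    """Standard error of the difference of two correlated estimates."""
    return math.sqrt(v1 + v2 - 2.0 * rho * math.sqrt(v1 * v2))

# Same hypothetical variances, two different assumed correlations.
se_low_corr = change_se(0.04, 0.04, rho=0.15)
se_high_corr = change_se(0.04, 0.04, rho=0.50)
width_ratio = se_high_corr / se_low_corr
```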

Table E.3 Ratio of Average Widths of Change Between the 1999–2000 Pooled Data and the 2000–2001 Pooled Data (Based on the Appropriately Estimated Model-Based Correlations)
State 12–17 18–25 26+ Total
(Columns 12–17, 18–25, and 26+ are age in years.)
Past Month Use of Marijuana
CA 0.65 0.72 0.60 0.56
FL 0.55 0.78 0.58 0.58
IL 0.69 0.73 0.49 0.57
MI 0.58 0.76 0.72 0.68
NY 0.56 0.72 0.46 0.54
OH 0.57 0.71 0.62 0.63
PA 0.57 0.71 0.67 0.66
TX 0.68 0.73 0.34 0.55
Average 0.61 0.73 0.56 0.60
Past Year Use of Cocaine
CA 0.71 0.66 0.53 0.54
FL 0.48 0.91 0.76 0.87
IL 0.69 0.64 0.28 0.42
MI 0.07 0.73 0.71 0.71
NY 0.36 0.63 0.44 0.52
OH 0.33 0.82 0.59 0.73
PA 0.47 0.56 0.39 0.43
TX 0.65 0.76 0.36 0.59
Average 0.47 0.71 0.51 0.60
Past Month Use of Alcohol
CA 0.72 0.76 0.73 0.72
FL 0.59 0.66 0.76 0.74
IL 0.67 0.70 0.65 0.63
MI 0.71 0.70 0.73 0.69
NY 0.70 0.55 0.71 0.71
OH 0.68 0.62 0.77 0.77
PA 0.70 0.58 0.66 0.65
TX 0.88 0.71 0.76 0.72
Average 0.71 0.66 0.72 0.70
Past Month Use of Cigarettes
CA 0.77 0.84 0.72 0.70
FL 0.71 0.78 0.81 0.80
IL 0.79 0.85 0.78 0.80
MI 0.69 0.76 0.81 0.78
NY 0.60 0.80 0.82 0.80
OH 0.78 0.84 0.77 0.76
PA 0.77 0.73 0.81 0.78
TX 0.81 0.90 0.75 0.74
Average 0.74 0.81 0.78 0.77
Note: Ratio = Average width of model-based PIs of change for substates / Average width of design-based CIs of change for substates
Note: The change measure is defined as the odds ratio {P2/(1-P2)}/{P1/(1-P1)}, where P1 is the pooled 1999–2000 small area estimate and P2 is the pooled 2000–2001 small area estimate.
CI = confidence interval; PI = prediction interval.
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1999, 2000, and 2001.
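
The two quantities defined in the notes to Table E.3 can be sketched in Python as follows. This is a minimal illustration only: the function names, the prevalence values, and the interval endpoints are hypothetical and do not come from the survey data.

```python
def odds(p):
    """Odds corresponding to a prevalence p (0 < p < 1)."""
    return p / (1.0 - p)


def odds_ratio_change(p1, p2):
    """Change measure {P2/(1-P2)} / {P1/(1-P1)}."""
    return odds(p2) / odds(p1)


def average_width(intervals):
    """Average width of a list of (lower, upper) intervals."""
    return sum(upper - lower for lower, upper in intervals) / len(intervals)


def width_ratio(mb_intervals, db_intervals):
    """Average model-based PI width divided by average design-based CI width."""
    return average_width(mb_intervals) / average_width(db_intervals)


# Hypothetical pooled prevalences for one area (assumption, not survey values).
p1, p2 = 0.20, 0.25
print("odds ratio of change:", odds_ratio_change(p1, p2))

# Hypothetical intervals for the odds ratio in two substate areas.
mb = [(0.8, 1.2), (0.9, 1.5)]   # model-based prediction intervals
db = [(0.7, 1.5), (0.8, 2.0)]   # design-based confidence intervals
print("width ratio:", width_ratio(mb, db))
```

A ratio below 1 in this sketch means the model-based intervals are, on average, narrower than the design-based ones, which is how the entries of Table E.3 would be read.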

Table E.4 Average Correlation Between the 1999–2000 and the 2000–2001 Model-Based and Design-Based Estimates (Based on the Appropriately Estimated Model-Based Correlations)
State 12–17 DB 12–17 MB 18–25 DB 18–25 MB 26+ DB 26+ MB Total DB Total MB
(Columns 12–17, 18–25, and 26+ are age in years; DB = design-based, MB = model-based.)
Past Month Use of Marijuana
CA 0.3204 0.4760 0.4943 0.4916 0.4107 0.5962 0.3273 0.6235
FL 0.5079 0.5380 0.5020 0.5025 0.3114 0.5441 0.3492 0.5775
IL 0.4133 0.4812 0.4996 0.5351 0.5736 0.5820 0.5988 0.6067
MI 0.3316 0.4588 0.4838 0.5279 0.5615 0.5752 0.5476 0.5944
NY 0.4372 0.5092 0.5343 0.5221 0.4083 0.5293 0.4609 0.5668
OH 0.3827 0.5138 0.6195 0.5711 0.5057 0.5844 0.5723 0.6269
PA 0.4838 0.4861 0.5863 0.5708 0.5799 0.5904 0.6406 0.6112
TX 0.5088 0.5371 0.5064 0.5606 0.3134 0.5498 0.4329 0.6190
Average 0.4346 0.5027 0.5321 0.5400 0.4633 0.5659 0.5094 0.6010
Past Year Use of Cocaine
CA 0.4937 0.5673 0.3807 0.4353 0.4240 0.4077 0.4380 0.4349
FL 0.3228 0.5644 0.4839 0.4814 0.6494 0.4919 0.5982 0.5117
IL 0.6058 0.5783 0.4796 0.4570 0.3945 0.4344 0.4316 0.4747
MI 0.4221 0.5396 0.5056 0.4837 0.5341 0.4272 0.5134 0.4568
NY 0.4502 0.5941 0.4186 0.4262 0.4097 0.4536 0.3996 0.4855
OH 0.5629 0.5787 0.4782 0.4728 0.5790 0.4549 0.5704 0.4816
PA 0.3517 0.4995 0.5553 0.526