The methodology used
for these small area estimates employs logistic regression models that
combine NHSDA data with local area indicators found to be associated
with substance abuse. Three innovations, described in Section 2.2, were
introduced to produce these estimates.
2.1 Potential Estimation Strategies for Small Areas
Researchers in other fields have
used several procedures to produce health and economic statistics
for small areas. Schaible (footnote#6) describes
three common indirect estimation methods that have been used
by federal statistical agencies: synthetic estimators, regression estimators,
and composite estimators.(footnote#7)
State level synthetic estimators are
constructed by taking national estimates for demographic subgroups and
applying them to the demographic composition of the particular States
for which estimates are desired. Synthetic estimators, however, often
fail to reflect the actual variation across local areas because demographic
characteristics (age, gender, race/ethnicity, etc.) do not fully
determine the phenomena being estimated. Regression estimators
incorporate additional factors or predictors in the estimation procedure
in an attempt to improve the estimates for the small area. Synthetic and
regression estimators are indirect estimation methods and do not
require any direct measures of the phenomena for the small area. In many
cases, however, direct estimates are available for the small area but
the sample size for the small area is not large enough to yield precise
estimates. Composite estimators are constructed as a weighted combination
of direct survey estimators and indirect estimators. They take
maximum advantage of any direct information from the survey for the
small area, reduce the bias associated with indirect estimators, and
are likely to be more accurate than either the direct or the
indirect estimator used alone.(footnote#8)
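The synthetic estimator described above can be illustrated with a minimal sketch. The subgroup labels, rates, and population counts below are hypothetical values chosen only for illustration; they are not taken from the NHSDA or from this report.

```python
def synthetic_estimate(national_rates, state_population):
    """Synthetic estimate of a State prevalence rate.

    national_rates:   national prevalence rate for each demographic subgroup
    state_population: State population count for each of the same subgroups
    Both arguments are hypothetical inputs used for illustration.
    """
    total_population = sum(state_population.values())
    # Apply each national subgroup rate to the State's count in that subgroup.
    expected_cases = sum(
        national_rates[group] * count for group, count in state_population.items()
    )
    return expected_cases / total_population


# Hypothetical national rates and State demographic composition.
example_rates = {"12-17": 0.10, "18-25": 0.20, "26+": 0.05}
example_population = {"12-17": 1000, "18-25": 2000, "26+": 7000}
example_estimate = synthetic_estimate(example_rates, example_population)
```

Because only the demographic mix differs between States in this calculation, two States with the same composition receive the same estimate, which is the limitation noted in the text.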
In developing the small area
estimates for the States and MSAs included in this report, we
built on earlier small area estimation work, which focused
on a variety of social, health, and economic phenomena. The
previously cited works by Schaible (1996) and Ghosh and Rao (1994) discuss
these types of estimators and note that a range of research has
investigated optimal weighting and estimation schemes
for composite estimators. In this study, similar methods were developed
to produce estimates of substance abuse.
2.2 Small Area Estimation Method Used
This study built on prior methodologies and
improved them in several ways. The estimator used can be approximated
mathematically as a composite estimator which is the weighted average
of an indirect regression estimator and a direct survey estimator. The
basic form of the composite estimator approximation is:
\[
\hat{\theta} = \phi_s \, \Pi + (1 - \phi_s) \, \left[ \, p_s - (\Pi_s - \Pi) \, \right] \tag{1}
\]
where θ is
the estimated prevalence rate for a given small area. In this equation,
∏ is the indirect regression
estimator, obtained as the population weighted average of block
group level predictions of substance abuse prevalence, and the term in
square brackets is the direct survey estimator, which is a function
of the actual NHSDA survey estimator ps
for the local area.(footnote#9)
In the bracketed direct survey estimator, ps
is the NHSDA survey weighted prevalence rate
estimate and ∏s is
the corresponding survey weighted mean of person level logistic model
predicted probabilities of use.(footnote#10)
The weighting factors фs
and 1 - фs in
the composite estimator are a function of the sampling variance of the
local area effects and the variances of the logistic regression estimators.
They were constructed so that when the NHSDA sample size in a State is
large, the estimate is very close to the direct survey estimate.
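As a minimal sketch of how the composite approximation combines its pieces, the function below evaluates the weighted average for given inputs. The function and argument names are hypothetical, and the weight is simply passed in; the report derives it from variance components that are not reproduced here.

```python
def composite_estimate(phi_s, pi_area, p_s, pi_s):
    """Evaluate the composite estimator approximation.

    phi_s:   weight placed on the indirect regression estimator (0 to 1)
    pi_area: indirect regression estimator for the small area
    p_s:     survey weighted direct prevalence estimate for the area
    pi_s:    survey weighted mean of model predicted probabilities
             for the sampled persons in the area
    All names are illustrative and are not taken from the report.
    """
    # Bracketed term: direct estimate adjusted by the difference between
    # the sample-based and population-based model predictions.
    direct_component = p_s - (pi_s - pi_area)
    return phi_s * pi_area + (1.0 - phi_s) * direct_component


# Hypothetical inputs: the result moves from the direct component toward
# the indirect estimator as phi_s grows.
mostly_direct = composite_estimate(0.25, 0.06, 0.09, 0.07)
fully_indirect = composite_estimate(1.0, 0.06, 0.09, 0.07)
```

With a weight of zero the function returns the bracketed direct component, which matches the statement that estimates for areas with large samples stay close to the direct survey estimate.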
In fitting these models, three
innovations were introduced:
First, logistic regression models for
the relationship between substance abuse and a variety of predictors,
including demographic characteristics and a number of social and economic
characteristics, were fit at the block group level using Census block
group and tract level predictors of substance abuse. These block group level
estimates were then summed to arrive at the estimates for the States and
MSAs.
Second, additional county-level predictors
were included in the model to account for still more variation across
local areas.
Third, the survey design weights were
used in the estimation of the logistic regression coefficients and the
State and MSA local area effects. This innovation causes the State small
area estimates to sum to the NHSDA estimates for US regions and the nation.(footnote#11)
2.3 Fitting the Small Area Estimation Model
The final
estimates of the logistic regression coefficients and the local area effects
for States and MSAs were obtained using an iteratively reweighted least
squares algorithm patterned after Breslow and Clayton's (1993)(footnote#12)
prescription for analyzing generalized linear mixed models (GLMM).
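The core of an iteratively reweighted least squares fit can be sketched for the simpler case of a survey weighted logistic regression with fixed effects only. This is an illustration under stated assumptions, not the algorithm used in the study: it omits the random local area effects that the Breslow and Clayton approach adds, it treats the survey weights as case weights, and all function names are hypothetical.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def weighted_logistic_irls(X, y, w, n_iter=50, tol=1e-10):
    """Fit logistic regression coefficients by weighted IRLS.

    X: list of predictor rows (include a 1.0 for the intercept)
    y: list of 0/1 outcomes
    w: list of survey weights, used here as case weights (an assumption)
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        xtwx = [[0.0] * p for _ in range(p)]
        xtwz = [0.0] * p
        for xi, yi, wi in zip(X, y, w):
            eta = sum(b * x for b, x in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            v = max(mu * (1.0 - mu), 1e-10)  # binomial variance function
            z = eta + (yi - mu) / v          # working response
            wt = wi * v                      # working weight
            for j in range(p):
                xtwz[j] += wt * xi[j] * z
                for k in range(p):
                    xtwx[j][k] += wt * xi[j] * xi[k]
        new_beta = solve_linear_system(xtwx, xtwz)
        converged = max(abs(nb - b) for nb, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

For an intercept-only model the fitted coefficient is the logit of the weighted mean of the outcome, which gives a quick check that the weights enter the fit as intended.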
2.3.1 Data Used
Four types
of data were used in the estimation: NHSDA data, Census data, county level
correlates of substance abuse (social indicators), and block group level
population projections.
The NHSDA
data were used to fit models for each of the 11 outcome measures described
in Chapter 1. Estimates were made for four age groups (12-17, 18-25, 26-34,
and 35+). In addition, separate models were fit for two geographic subpopulations:
1) the six large MSAs and 2) the remainder of the nation.
Altogether, this resulted in the fitting of 88 models (11 outcome measures
x 4 age groups x 2 geographic subpopulations). The dependent variables
in these models were the rates for the outcome measures. In fitting these
models, two types of auxiliary data were used: Census data and county
level indicators of drug use. After the models were fit, the population projections
were applied to the estimated rates at the block group level.
NHSDA
data: The data used
for the estimation came from respondents to the 1991-1993 NHSDA.
Essentially the same survey methodology was used in each of these three
years: the sample was a deeply stratified, multistage national probability
sample of civilians age 12 and older living in households and
certain group quarters, such as college dormitories and homeless shelters.
Civilians living on military installations were included in the target
population. Military personnel on active duty and
most transient populations, such as homeless people not residing in
shelters, were not included in the target population.
Each year,
roughly 120 primary sampling units (PSUs) were selected at the first stage
of sampling. These PSUs were generally individual counties or groups of
adjacent counties constituting Metropolitan Statistical Areas (MSAs).
At the second stage of selection, groups of Census blocks, called sample
segments, were selected. Within each segment, dwelling units were selected;
within each successfully screened dwelling unit, either zero, one or two
occupants were selected for the NHSDA interview. Generally, the NHSDA
yields dwelling unit screening response rates of approximately 94 percent
and interview response rates of approximately 80 percent. The pooled 1991-1993
NHSDAs yielded a combined sample size of 87,915 people. In 1991-1993,
the NHSDA included a special sample from six large MSAs. This sample was
designed to provide independent estimates of prevalence of substance abuse
in Chicago, Denver, Los Angeles, Miami, New York, and Washington, D.C.
Since nearly
all of the substance abuse data that are collected in the survey are highly
sensitive, rigorous methods are used in the NHSDA to protect the privacy
and confidentiality of responses. The interviewer works with the respondent
to find a private place for the interview. After the respondent answers
some of the less sensitive questions by responding to interviewer queries,
he or she is trained in the completion of a series of self-administered
questionnaires. These self-administered questionnaires allow the respondent
to conceal his or her answers from both the interviewer and any household
members who may be nearby. These privacy and confidentiality
protections have been shown to increase
the reporting of substance abuse.(footnote#13)
Census
data: Exhibit 2.1
lists the Census data used in the modeling: nineteen groups of variables
that were considered as potential predictors
in each of the 88 models formulated for this study. All of the attributes
come from the 1990 U.S. Census long form sample.(footnote#15)
Exhibit 2.1 1990 Census Variables Used to Model Prevalence of Substance Abuse.
From Summary Tape File 3; 1990 Census of Population and Housing.(footnote#14)
1. Race x Hispanic -- Percent: | White nonHispanic; Black nonHispanic; Hispanic; Other
2. Education for persons 18 or older -- Percent with: | 0-8 years; 9-12 years and no H.S. diploma; H.S. graduate; some college and no degree; associate degree; bachelors, graduate, or professional degree
3. Age -- Percent aged: | 0-18 years; 19-24 years; 25-34 years; 35-44 years; 45-54 years; 55-64 years; 65 and over
4. Poverty -- Percent: | families below poverty level
5. Public Assistance -- Percent of: | households with public assistance income
6. Disability -- Percent: | persons 16-64 with a work disability
7. Household composition -- Percent: | one-person households; of households with female heads (no spouse present) with children under 18
8. Employment -- Percent: | of men 16 years and older in the labor force; of women 16 years and older in the labor force
9. Housing value -- owner occupied units: | Median value of owner occupied housing units
10. Housing rent -- rental units: | Median rents for rental units
11. Sex by marital status (persons 16 years and older) -- Percent: | Females currently married and not separated; Females separated, divorced, or widowed; Females never married; Males currently married and not separated; Males separated, divorced, or widowed; Males never married
12. Income: | Median Household Income
13. Urbanicity -- Percent: | of persons residing in an urban place
14. Urbanized Area -- Percent: | of persons in an MSA urbanized area
15. Age of Housing Units (HU) -- Percent: | of HUs built before 1939; of HUs built from 1940 to 1949
16. High School Dropout Rate (Tract level only) -- Percent: | of high school age children who have dropped out
17. Underclass Tract Indicator (Tract level only) |
18. Hispanic Subpopulations -- Percent: | of Hispanics that are Cuban; of Hispanics that are Puerto Rican
19. Other Race Subpopulations -- Percent: | Population that is Asian and Pacific Islander; Population that is Native American, Alaskan, or Aleut
County
level (social indicator) correlates of substance abuse:
In addition to the Census variables, recoded county level 'social indicators'
of substance abuse were also considered. These county level variables
were obtained from three sources. The first is the FBI's
Uniform Crime Reports data base for 1991, which yielded data on
arrest rates per 10,000 persons for illegal drug possession and drug
sales/manufacture, by several reported drug categories, and on total violent
crime arrest rates. The second source combined data from the 1991 and
1992 National Drug and Alcoholism Treatment Unit Survey (NDATUS), conducted
by the Substance Abuse and Mental Health Services Administration. From
this source, data were obtained on the 1991 and 1992 average treatment
rates per 1,000 county residents for (1) alcohol treatment alone and (2)
illicit drug treatment (including treatment for both drug and alcohol
use). The third source is the National Center for Health Statistics national
death certificate registry, from which 1990 alcohol related death rates
per 10,000 county residents were obtained. Two such rates were considered in this research:
(a) the "any related rate," which includes all ICD-8 cause-of-death codes
that are deemed to have a significant link to alcohol abuse, and (b) a
more restrictive rate, which requires explicit mention of alcohol on the
death certificate.
Block group level
population projections: Population projections
for 1992 for each Census block group were obtained from Claritas, Inc.(footnote#16)
These projections included counts by age group (12-17, 18-24, 25-29, 30-34,
35+ years old), by gender, by race (white, black, American Indian plus
Asians and Pacific Islanders, and other races), and by Hispanic indicator.
These block group level population projections were adjusted by gender
category to conform to the four age groups and the four race/ethnicity
groups employed for data reporting by the NHSDA.

2.3.2 States and MSAs Selected for Estimation
Composite
small area estimators require at least some direct information for the
small areas under consideration. Thus, only those States and MSAs that
had some NHSDA sample points were included in the study. The States and
MSAs selected for small area estimation are presented in Exhibit 2.2.
This exhibit shows the number of people who responded to the combined
1991-1993 NHSDA surveys, along with the numbers of sample MSA/County
units and sample block groups and the estimated
1992 population.
The States
and MSAs presented in Exhibit 2.2 were chosen for small
area estimation because:
- The NHSDA sample size was large enough
(~ 400 persons) to support model based, indirect estimation, though not necessarily
large enough to support direct NHSDA estimation,
- The number of distinct sample MSA/County
units was at least 4, and
- The number of distinct sample segments
was greater than 40.
Four exceptions
to the four or more MSA/County unit rule were allowed for States that
met the sample person and area segment minimums.
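The three selection rules can be written as a simple check. The sketch below is illustrative only: the function name and argument names are hypothetical, the 400 person threshold is stated in the text as approximate, and the exceptions the report allowed are not modeled.

```python
def meets_selection_rules(persons, units, segments,
                          min_persons=400, min_units=4, min_segments=40):
    """Return True when an area satisfies all three stated rules.

    persons:  number of responding sample persons
    units:    number of distinct sample MSA/County units
    segments: number of distinct sample segments (block groups)
    The default thresholds follow the text; the person threshold is
    approximate there, so treating it as exact is an assumption.
    """
    return (
        persons >= min_persons
        and units >= min_units
        and segments > min_segments
    )


# Hypothetical areas: one passing all rules, one with only 3 sample units.
passes_all = meets_selection_rules(persons=494, units=4, segments=63)
fails_unit_rule = meets_selection_rules(persons=690, units=3, segments=73)
```

An area like the second one would need to be admitted as one of the exceptions to the MSA/County unit rule.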

Exhibit 2.2A States Selected for Inclusion in the Study: Population Size and NHSDA Sample Characteristics.
STATE | SAMPLE MSA/COUNTIES(1) | SAMPLE BLOCK GROUPS(2) | SAMPLE RESPONDING PERSONS(3) | 1992 POPULATION PROJECTION(4)
TOTAL UNITED STATES | 213 | 8,942 | 84,974 | 205,945
NORTH EAST REGION | 34 | 1,489 | 13,681 | 42,236
New Jersey | 6 | 167 | 1,523 | 6,443
New York | 7 | 843 | 8,505 | 14,892
Pennsylvania | 11 | 269 | 2,133 | 9,945
SOUTH REGION | 49 | 1,723 | 15,456 | 71,396
Florida | 13 | 964 | 10,066 | 11,265
Georgia | 5 | 118 | 1,061 | 5,442
Kentucky | 6 | 112 | 1,081 | 3,053
Louisiana | 6 | 133 | 1,099 | 3,387
North Carolina | 12 | 222 | 1,863 | 5,641
Oklahoma | 4 | 63 | 494 | 2,563
South Carolina | 4 | 48 | 330 | 2,939
Tennessee | 4 | 92 | 780 | 4,117
Texas | 13 | 503 | 5,082 | 13,751
Virginia | 9 | 362 | 3,538 | 5,227
West Virginia | 3 | 44 | 394 | 1,497
NORTH CENTRAL REGION | 85 | 3,271 | 32,346 | 48,968
Illinois | 6 | 799 | 8,088 | 9,378
Indiana | 6 | 114 | 978 | 4,581
Kansas | 4 | 60 | 515 | 2,010
Michigan | 5 | 175 | 1,187 | 7,615
Minnesota | 3 | 76 | 684 | 3,576
Missouri | 6 | 120 | 1,059 | 4,223
Ohio | 12 | 264 | 2,021 | 8,946
Wisconsin | 3 | 55 | 475 | 4,021
WEST REGION | 45 | 2,459 | 23,491 | 43,346
California | 22 | 1,320 | 12,364 | 24,342
New Mexico | 5 | 74 | 676 | 1,199
Oregon | 4 | 59 | 412 | 2,397
Washington | 3 | 73 | 690 | 4,094
(1) MSA/Counties refers to geographic entities
formed to estimate random effect terms in the logistic model and which
are generally analogous to NHSDA primary sampling units (PSUs). The exceptions
are the distinct MSA constituents of PSUs that crossed State boundaries
or combined more than one MSA.
(2) Block groups refers to the sample segments
that were selected at the second stage of selection in the 1991-1993
NHSDA.
(3) 87,915 people responded to the 1991-1993 NHSDA;
however, 2,941 people were omitted from the small area estimation research
because of missing local area indicator variables that were used as potential
predictors in the models.
(4) Population projections presented in 1000's.

Exhibit 2.2B MSA Small Areas Selected for Inclusion in the Study: Population Size and NHSDA Sample Characteristics
MSA | SAMPLE BLOCK GROUPS(2) | SAMPLE RESPONDING PERSONS(3) | 1992 POPULATION PROJECTION(4)
Anaheim-Santa Ana, CA | * | * | 1,996
Atlanta, GA | * | * | 2,425
Baltimore, MD | * | * | 1,996
Boston, MA | * | * | 3,145
Chicago, IL | 735 | 7,537 | 4,981
Dallas, TX | * | * | 2,120
Denver, CO | 719 | 7,585 | 1,346
Detroit, MI | * | * | 3,593
El Paso, TX | * | * | 456
Houston, TX | * | * | 2,661
Los Angeles, CA | 768 | 7,533 | 7,127
Miami-Hialeah, FL | 725 | 8,142 | 1,600
Minneapolis-St. Paul, MN | * | * | 2,035
Nassau-Suffolk, NY | * | * | 2,178
New York, NY | 730 | 7,676 | 7,086
Newark, NJ | * | * | 1,500
Oakland, CA | * | * | 1,727
Philadelphia, PA-NJ | * | * | 4,037
Phoenix, AZ | * | * | 1,770
San Antonio, TX | * | * | 1,040
San Bernardino, CA | * | * | 2,122
San Diego, CA | * | * | 2,089
St. Louis, MO-IL | * | * | 2,006
Tampa-St. Petersburg, FL | * | * | 1,822
Washington, DC | 725 | 7,795 | 3,345
* Number of sample block groups ranged from 40 to 110; number
of respondents ranged from 400 to 1200 in these MSAs.
(2) Block groups refers to the sample segments
that were selected at the second stage of selection in the 1991-1993
NHSDA.
(3) 87,915 people responded to the 1991-1993 NHSDA;
however, 2,941 people were omitted from the small area estimation research
because of missing local area indicator variables that were used as potential
predictors in the models.
(4) Population projections presented in 1000's.
See Appendix B for a list of counties included in each MSA.

2.3.3 Summary of Methodology
In summary,
the estimates were produced by completing the following three basic steps:
- Estimate regression parameters:
Using NHSDA data, logistic regression models were developed that identified
(and estimated the parameters associated with) local area indicators that
were significant predictors of the 11 substance abuse measures in
NHSDA sample locations. Separate models were run for each of the 11 outcome
measures and for each of four age groups (12-17, 18-25, 26-34, and 35+).
Because the 1991-1993 NHSDA included very large samples for six large
US cities, separate models were fit collectively for these six large cities
and for the remainder of the nation. Thus, a total of 88 separate models
were used for this small area estimation. An important feature of these
logistic regression models was the inclusion of random-effect parameters
that adjust for the actual "direct" estimates obtained from NHSDA sample
data in the States and MSAs for which small area estimates were to be
generated.
- Apply regression parameters to the
entire U.S.: Predicted estimates
of substance abuse rates were generated for every Census block group in
the U.S. by applying the regression parameters to the local area indicators,
which were known for every block group. Within each Census block group,
estimates were made for each of the 11 measures and for 32 demographic
subgroups defined by four age groups, four race/ethnicity groups (Hispanic, non-Hispanic
black, non-Hispanic white, non-Hispanic other race), and gender
(4 x 4 x 2 = 32 demographic subgroups).
- Sum up block group estimates to the
State and MSA levels: Estimated
rates were multiplied by population estimates for each block group, resulting
in estimated numbers of people for each of the 11 measures. Data
from all block groups in States and MSAs were then summed to give final
State and MSA total estimates.
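The second and third steps above can be sketched as follows. This is a simplified illustration, not the study's implementation: it uses fixed effect coefficients only, ignores the random local area effects and the 32 demographic subgroups, and every name, coefficient, and input value is hypothetical.

```python
import math


def predict_rate(coefficients, indicators):
    """Predicted prevalence for one block group from a logistic model."""
    linear_predictor = sum(
        coefficients[name] * indicators[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def area_estimate(block_groups, coefficients):
    """Sum block group predictions to an area total and an area rate.

    block_groups: list of dicts with 'indicators' (predictor values) and
                  'population' (projected count); a hypothetical layout.
    Returns (estimated number of people, estimated prevalence rate).
    """
    total_people = 0.0
    total_population = 0.0
    for block_group in block_groups:
        rate = predict_rate(coefficients, block_group["indicators"])
        # Step 3: rate times population gives an estimated count.
        total_people += rate * block_group["population"]
        total_population += block_group["population"]
    return total_people, total_people / total_population


# Hypothetical coefficients and two hypothetical block groups.
example_coefficients = {"intercept": -2.0, "pct_poverty": 0.03}
example_block_groups = [
    {"indicators": {"intercept": 1.0, "pct_poverty": 10.0}, "population": 1200},
    {"indicators": {"intercept": 1.0, "pct_poverty": 25.0}, "population": 800},
]
example_total, example_rate = area_estimate(example_block_groups, example_coefficients)
```

The resulting area rate is the population weighted average of the block group predictions, which corresponds to the indirect regression component of the composite estimator.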

2.3.4 Confidence Intervals
Asymmetric
95 percent confidence intervals were constructed based on the logit transformation.
This is the typical procedure used when estimating characteristics
with small prevalence. The mean square errors used to construct
the confidence intervals include contributions that account for the
variances and covariances of the logistic regression coefficients incorporated
in the indirect component (∏) and for the sampling variance of the direct local
area estimate. The weight put on these mean square error contributions
depends on the values of the weights in Equation (1).
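A common way to build a logit based interval is sketched below. The report does not give its exact formula, so this is an assumption: the interval is formed on the logit scale using a delta method standard error and then transformed back. The function name is hypothetical, and the standard error argument stands in for the square root of the mean square error described above.

```python
import math


def logit_confidence_interval(estimate, standard_error, z=1.96):
    """Asymmetric confidence interval for a prevalence rate.

    estimate:       prevalence estimate, strictly between 0 and 1
    standard_error: standard error (or root mean square error) of the estimate
    z:              normal quantile; 1.96 gives an approximate 95 percent interval
    """
    logit = math.log(estimate / (1.0 - estimate))
    # Delta method: standard error of the logit of the estimate.
    logit_se = standard_error / (estimate * (1.0 - estimate))
    lower_logit = logit - z * logit_se
    upper_logit = logit + z * logit_se
    # Transform the endpoints back to the prevalence scale.
    lower = 1.0 / (1.0 + math.exp(-lower_logit))
    upper = 1.0 / (1.0 + math.exp(-upper_logit))
    return lower, upper


# Hypothetical estimate of 5 percent with a standard error of 1 point.
example_lower, example_upper = logit_confidence_interval(0.05, 0.01)
```

For a small prevalence the upper limit lies farther from the estimate than the lower limit, and both limits always stay between 0 and 1, which is why this construction suits rare characteristics.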