In any small area estimation (SAE) project, questions arise about how well
the model describes the outcome variables; in this case, how effectively
the small area estimation models describe substance abuse in the States
and MSAs. When producing small area estimates, there are usually no
"gold standards" (i.e., direct estimates with known reliability and
validity) that can be used to evaluate the overall methodology. In
fact, if such estimates were available, policy makers could use them
and would not need small area estimates. Thus, in the absence of such
gold standards, one uses multiple evaluation procedures which, when
considered jointly, give an overall indication of the quality of the
small area estimates. In this small area estimation study, a large
variety of evaluations were conducted. The following summarizes
some of these evaluations.
(footnote#17)
This SAE study
developed a method that 1) reduced bias associated with synthetic/regression
estimates which often fail to reflect the full range of variability across
small areas, 2) made maximum use of the information that was collected
in the NHSDA, and 3) produced estimates that summed to the NHSDA direct
estimates for the nation. The evaluations discussed in this chapter found
that:
- The statistical evaluations based on examining
the agreement between direct NHSDA estimates and the SAE estimates within
evaluation subgroups (see below for an explanation of how this was done)
indicated that the SAE estimates had moderate to high correlations with
direct estimates, that there were few significant differences between
the direct estimates and SAE estimates, and that the SAE estimates adequately
reflected the range of prevalence in substance abuse.
- The SAE national estimates closely track
the NHSDA direct estimates.
- Statistical evaluations based on comparisons
to external data on cigarette and alcohol use indicated that the SAE estimates
were at about the same level as the external data, that the correlations
with the external data were moderate to high, and that the estimates were
good to very good at reflecting the range of prevalence of cigarette and
alcohol use across the States.
- Statistical evaluations based on comparisons
to external data from administrative data on arrest and past year treatment
for substance abuse indicated that the SAE estimates for States were lower
than those based on administrative data (as are the NHSDA estimates at
the national level) and had only poor to fair correlations with the corresponding
administrative data.
- Substance abuse characteristics that have
higher prevalence were more adequately reflected by the modeling than
those characteristics with lower levels of prevalence.
4.1 Evaluation Strategies
Three strategies were used
to evaluate the estimates:
1) Goodness-of-fit tests
2) Comparisons
to external measures of substance abuse
3) Comparisons
of the final SAE model to other models
1) Goodness-of-fit
tests for the final SAE model
Statistical tests for how
well models predict the outcome variables (substance abuse in this case)
are termed goodness-of-fit tests. Different goodness-of-fit tests detect
different types of problems in the estimates, i.e., different lack-of-fit
attributes.
(footnote#18)
Two sets of tests were completed, each of which used three
goodness-of-fit measures (the correlation, χ2 probability,
and range ratio). The first set evaluated the final SAE model by partitioning
the entire 1991-1993 NHSDA sample
into evaluation subgroups,
(footnote#19)
calculating both direct survey-based and model-based
estimates for each evaluation subgroup, and comparing these using the
three goodness-of-fit statistics. The evaluation subgroups are constructed
so that the sample size in each evaluation subgroup is large enough to
produce reliable direct estimates. This can be thought of as using the
sample to create a set of artificial subpopulations that are predicted
to have similar rates of substance abuse within each subgroup and different
rates across subgroups. These predicted prevalence estimates are then
compared to those that were actually observed in the NHSDA for the artificial
populations.
The second
set of goodness-of-fit tests was conducted using what is termed a cross-validation
approach. Under the cross-validation approach, the models are first fit
to part of the data, and the resulting prediction equations are then used
to make estimates for the remainder of the full data set; that is, the
part not used initially to fit the model.
(footnote#20)
This sample splitting is done
multiple times and the associated remainder subsample estimates are used
separately to form the evaluation subgroups and calculate the associated
goodness-of-fit statistics. Finally, the multiple goodness-of-fit measures
are averaged to form a single measure.
(footnote#21)
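The cross-validation steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure used in the study: the "model" is a stand-in that predicts the observed prevalence within each covariate cell, the split is a random half of the sample, the evaluation subgroups are formed by sorting the held-out records on their predicted values, and only the correlation measure is computed. The records are synthetic, and every function name is hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def fit_cell_rates(train):
    """Stand-in model: observed prevalence within each covariate cell."""
    totals = {}
    for cell, y in train:
        n, s = totals.get(cell, (0, 0))
        totals[cell] = (n + 1, s + y)
    return {cell: s / n for cell, (n, s) in totals.items()}

def holdout_correlation(train, test, n_groups):
    """Fit on one part, predict on the remainder, compare subgroup means."""
    rates = fit_cell_rates(train)
    overall = sum(y for _, y in train) / len(train)
    scored = [(rates.get(cell, overall), y) for cell, y in test]
    scored.sort(key=lambda pair: pair[0])  # order by predicted value only
    size = len(scored) // n_groups
    pred_means, obs_means = [], []
    for g in range(n_groups):
        stop = (g + 1) * size if g < n_groups - 1 else len(scored)
        chunk = scored[g * size:stop]
        pred_means.append(sum(p for p, _ in chunk) / len(chunk))
        obs_means.append(sum(y for _, y in chunk) / len(chunk))
    return pearson(pred_means, obs_means)

def cross_validated_correlation(records, n_splits=5, n_groups=4, seed=0):
    """Repeat the sample split and average the goodness-of-fit measure."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_splits):
        shuffled = records[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        results.append(holdout_correlation(shuffled[:half], shuffled[half:], n_groups))
    return sum(results) / len(results)

# Synthetic records: (covariate cell, 0/1 outcome). The rates are made up.
data_rng = random.Random(42)
true_rate = {0: 0.05, 1: 0.15, 2: 0.30, 3: 0.50}
records = [(cell, 1 if data_rng.random() < true_rate[cell] else 0)
           for cell in range(4) for _ in range(500)]
cv_corr = cross_validated_correlation(records)
print(round(cv_corr, 3))
```

Because each split is scored only on records that were not used to fit the model, the averaged measure is less flattering than one computed on the fitting sample.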
2) Comparisons to external
measures of substance abuse
The second evaluation strategy
was to compare the SAE estimates to indicators of substance
abuse from external sources. Three different comparisons were made:
- Comparisons of the SAE estimates of use
of alcohol and cigarettes to similar estimates from the Behavioral Risk
Factor Surveillance System (BRFSS), which is a telephone survey conducted
in all 50 States under cooperative agreements with the Centers for Disease
Control and Prevention.
- Comparison of the SAE estimates for drug
treatment to data from the National Drug and Alcoholism Treatment Unit
Survey (NDATUS).
- Comparison of the SAE estimates of arrest
in the past year to Uniform Crime Reports (UCR) data on arrests.
Although external
data were available for only a subset of the estimates from the SAE study,
these comparisons were selected because they provided another vehicle
for assessing the quality of the entire estimation methodology.
3) Comparisons
of the final SAE model to other models
Another approach
used to evaluate the full SAE model was to compare it to other
models using the cross-validation goodness-of-fit tests and the comparisons
to external data. Six models were investigated:
- The final SAE model: The composite
estimator which is approximated by a weighted combination of an indirect
logistic regression estimator and a local area effect that is a function
of the direct survey estimates. The indirect logistic regression estimator
was constructed by fitting models which contained county and block group
demographic characteristics which were hypothesized to have a relationship
to the substance abuse outcome variable. These are listed in Exhibit
2.1. The block group demographic characteristics comprised two types:
basic demographic variables (gender, age, and race/ethnicity) and the
other demographic variables. The local area effect is a function of
the actual NHSDA estimates for the local area. Because the 1991-1993
NHSDA included very large samples for six large US cities, separate
models were fit for these six large cities and the remainder of the
nation.
- The big-city-subsample model
is similar to the full SAE in that it is a composite estimator that
includes both an indirect and direct component; however, in this case
a subsample of the respondents from the six large cities was selected
and pooled with the full sample from the remainder of the nation before
fitting the models. This model was investigated because it reflects
the composition of continuing NHSDA surveys which do not have large
oversamples from these big cities.
- The SAE indirect estimator
drops the local area direct effects from the full SAE model. This
model was fit to evaluate the usefulness
of using the composite estimator. Differences between this model and
the full SAE model give one an idea as to whether adding the local area
effects to the model improves its ability to predict substance abuse.
Also, comparing this model with the next simpler model, the county demographic
model (below), measures the impact of the addition of the other demographic
variables.
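To make the difference between a composite estimator and a purely indirect one concrete, the following Python sketch combines a direct and an indirect estimate for one area. It is an assumption-laden illustration rather than the estimator fitted in this study: the weight shown (model variance divided by the sum of model and sampling variance) is a common shrinkage form chosen for the example, and the function name, arguments, and numbers are hypothetical.

```python
def composite_estimate(direct, indirect, sampling_var, model_var):
    """Indirect estimate plus a local area effect based on the direct estimate.

    The weight is an assumed shrinkage form: it approaches 1 when the direct
    estimate is precise and 0 when the direct estimate is noisy.
    """
    weight = model_var / (model_var + sampling_var)
    local_area_effect = weight * (direct - indirect)
    return indirect + local_area_effect

# Same direct (0.30) and indirect (0.20) estimates, different sample sizes.
large_sample_area = composite_estimate(0.30, 0.20, sampling_var=0.0001, model_var=0.0009)
small_sample_area = composite_estimate(0.30, 0.20, sampling_var=0.0081, model_var=0.0009)
print(round(large_sample_area, 3), round(small_sample_area, 3))
```

Under this form, an area with a large sample stays close to its direct estimate, while an area with a small sample is pulled toward the indirect (regression) estimate. Setting the weight to zero gives the indirect estimator alone.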
Three
simpler models were also used: a basic demographic model,
a demographic model that used the six-large-city/non-large-city split, and
a model that included the basic demographics and county-level predictors.
The basic demographic model
uses only demographic predictors of gender, age and race/ethnicity, and
applies the coefficients for the significant demographic predictors (main
effects and their interactions) to population distributions in the block
groups to estimate the prevalence in the block groups. Separate models
were not fit for the six large cities and the remainder of the
nation. This model is analogous to the simple
synthetic estimators and was constructed to investigate the gains that
come from using additional predictors (other than demographic characteristics)
in the logistic regression models.
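A synthetic estimate of the kind described for the basic demographic model can be sketched in Python as follows. The coefficients, demographic cells, and block group counts are hypothetical and serve only to show the mechanics: a logistic model gives a predicted rate for each demographic cell, and those rates are averaged using the block group's population distribution.

```python
import math

# Hypothetical logistic regression coefficients (not taken from the study).
COEF = {"intercept": -2.0, "male": 0.5, "age_18_25": 0.8}

def cell_rate(gender, age_group):
    """Predicted prevalence for one demographic cell from the logistic model."""
    z = COEF["intercept"]
    if gender == "male":
        z += COEF["male"]
    if age_group == "18-25":
        z += COEF["age_18_25"]
    return 1.0 / (1.0 + math.exp(-z))

def synthetic_prevalence(block_group_population):
    """Population-weighted average of the cell rates for one block group."""
    total = sum(block_group_population.values())
    expected_users = sum(cell_rate(gender, age) * count
                         for (gender, age), count in block_group_population.items())
    return expected_users / total

# Hypothetical population counts for one block group.
block_group = {
    ("male", "18-25"): 100,
    ("female", "18-25"): 100,
    ("male", "26-34"): 50,
    ("female", "26-34"): 150,
}
estimate = synthetic_prevalence(block_group)
print(round(estimate, 4))
```

Two block groups with the same demographic mix receive the same estimate under this approach, which is why such estimators tend to understate the variation across small areas.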
The six large city/non-large city demographic
model is analogous to the basic demographic model in that it uses
only the demographic predictors gender, age, and race/ethnicity. However,
in this model, the six-big-city/non-six-big city sample split was used
to determine if it was possible to improve these estimates by reflecting
the fact that there is a different association between substance abuse
and demographic composition in the six-big-cities and the remainder of
the nation. Comparing this
model to the basic demographic model provides an indication of whether
or not simple demographic models could be improved if they were fit within
areas that were likely to have divergent associations between demographic
characteristics and substance abuse.
The county demographic model adds
county level predictors to the six-big-city/non-big-city demographic model.
This model was investigated to
determine if it was possible to improve over simple demographic indirect
estimators by including county level indicators of substance abuse. This
type of model could be a good candidate for monitoring drug use over time
because it is simpler than the full SAE model and because it may be possible
to monitor changes over time by observing changes in the county level
indicators of drug use.
Using these
three strategies (goodness-of-fit tests, comparisons to external
data, and comparisons to other models), we evaluated the SAE model
by: 1) carrying out goodness-of-fit tests for the final SAE model using
the evaluation subgroups, 2) conducting cross-validation goodness-of-fit
tests on the final SAE and several other models, and 3) comparing estimates
from the SAE model and the other types of models to external data by calculating
rank correlations and range ratios for the various sets of estimates.
The results are given in the next section.
4.2 Results of the Evaluations
Goodness-of-fit
tests for the final SAE model using evaluation subgroups:
Exhibit 4.1
summarizes the evaluations of the final SAE model based on the Goodness-of-fit
tests that were constructed by dividing the sample into evaluation subgroups.
Three statistics are presented:
The correlation between the sample mean
of the predicted values and the associated NHSDA direct estimates for
each of the evaluation subgroups. Values closer to 1 are better than smaller
values.
The chi-square (χ2) probabilities from a comparison
of the observed (direct) and predicted estimates for each of
the evaluation subgroups.
(footnote#22) According to statistical theory, when a large number
of tests is performed, as in Exhibit
4.1, and the distributions of the observed and predicted estimates are the same,
about half of the χ2
probabilities will be above 0.5 and half below.
Exhibit 4.1: Summary of Evaluation of the Final
SAE Model Composite Based on Forming Evaluation Subgroups and Comparing
the Model Based Estimates with the Direct Survey Estimates. Correlations,
Chi-square Probabilities*, and Range Ratios

| NHSDA SAE Outcome Measure | Statistic | Age 12-17 | Age 18-25 | Age 26-34 | Age 35 Plus | All Ages |
|---|---|---|---|---|---|---|
| Licit Drugs | | | | | | |
| Past Month Cigarette Use | Correlation | 0.917 | 0.952 | 0.952 | 0.946 | 0.978 |
| | χ2 probability | 0.038 | 0.054 | 0.136 | 0.846 | 0.788 |
| | Range Ratio | 0.870 | 0.934 | 1.016 | 1.027 | 0.985 |
| Past Month Alcohol Use | Correlation | 0.871 | 0.960 | 0.972 | 0.981 | 0.990 |
| | χ2 probability | 0.056 | 0.066 | 0.049 | 0.984 | 0.759 |
| | Range Ratio | 0.878 | 0.872 | 0.894 | 0.968 | 0.948 |
| Illicit Drugs | | | | | | |
| Past Month Any Illicit Drug Use | Correlation | 0.939 | 0.973 | 0.962 | 0.942 | 0.990 |
| | χ2 probability | 0.340 | 0.639 | 0.384 | 0.474 | 0.450 |
| | Range Ratio | 0.905 | 0.827 | 0.880 | 0.881 | 0.868 |
| Past Month Any Illicit But Marijuana Use | Correlation | 0.916 | 0.926 | 0.941 | 0.845 | 0.973 |
| | χ2 probability | 0.611 | 0.089 | 0.774 | 0.566 | 0.609 |
| | Range Ratio | 0.923 | 0.801 | 1.028 | 0.821 | 0.879 |
| Past Month Cocaine Use | Correlation | 0.824 | 0.878 | 0.920 | 0.903 | 0.970 |
| | χ2 probability | 0.864 | 0.094 | 0.803 | 0.611 | 0.607 |
| | Range Ratio | 0.856 | 0.740 | 0.880 | 0.889 | 0.849 |
| Dependence | | | | | | |
| Past Year Dependence On Illicit Drugs | Correlation | 0.912 | 0.868 | 0.827 | 0.723 | 0.966 |
| | χ2 probability | 0.871 | 0.778 | 0.625 | 0.670 | 0.537 |
| | Range Ratio | 1.031 | 0.882 | 0.894 | 1.147 | 1.000 |
| Past Year Dependence On Alcohol | Correlation | 0.787 | 0.898 | 0.943 | 0.927 | 0.978 |
| | χ2 probability | 0.221 | 0.007 | 0.688 | 0.523 | 0.020 |
| | Range Ratio | 0.892 | 0.716 | 0.899 | 0.878 | 0.807 |
| Treatment | | | | | | |
| Past Year Treatment For Illicit Drugs | Correlation | 0.908 | 0.772 | 0.918 | 0.884 | 0.962 |
| | χ2 probability | 0.489 | 0.437 | 0.539 | 0.354 | 0.184 |
| | Range Ratio | 0.879 | 0.920 | 0.923 | 0.984 | 0.944 |
| Past Year Treatment For Alcohol | Correlation | 0.776 | 0.842 | 0.763 | 0.878 | 0.948 |
| | χ2 probability | 0.428 | 0.680 | 0.612 | 0.037 | 0.159 |
| | Range Ratio | 0.978 | 0.748 | 0.899 | 0.814 | 0.826 |
| Needing Treatment In Past Year | Correlation | 0.899 | 0.910 | 0.923 | 0.934 | 0.980 |
| | χ2 probability | 0.112 | 0.255 | 0.355 | 0.859 | 0.403 |
| | Range Ratio | 0.892 | 0.851 | 0.937 | 0.823 | 0.872 |
| Arrest | | | | | | |
| Past Year Arrested | Correlation | 0.961 | 0.883 | 0.911 | 0.878 | 0.977 |
| | χ2 probability | 0.623 | 0.225 | 0.902 | 0.589 | 0.416 |
| | Range Ratio | 0.778 | 0.883 | 0.952 | 1.123 | 0.950 |

*Probability of observing the calculated difference
in the predicted and direct estimates across evaluation subgroups given
that there is no difference.
The range ratio, that is, the ratio of the range of the predicted
values to the range of the direct estimates. Range ratios close to 1 indicate
better agreement between the predicted and the direct estimates. Ratios
larger than 1 indicate that the predicted values are more dispersed than
the actual values and probably mean that the model predictions are too
large for the subgroups with the highest prevalence and too small for the
subgroups with the lowest prevalence. Range ratios smaller than 1 indicate
that the model fails to reflect the actual variability in prevalence across subgroups.
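The range ratio can be computed with a few lines of Python. The subgroup values below are hypothetical and were chosen only to show one set of predictions that is compressed toward the middle and one that is stretched toward the extremes; the function name is likewise an assumption.

```python
def range_ratio(predicted, direct):
    """Range of the predicted values divided by the range of the direct estimates."""
    return (max(predicted) - min(predicted)) / (max(direct) - min(direct))

# Hypothetical prevalence estimates for four evaluation subgroups.
direct = [0.05, 0.10, 0.20, 0.40]
compressed = [0.10, 0.13, 0.20, 0.31]   # predictions pulled toward the middle
stretched = [0.01, 0.08, 0.22, 0.50]    # predictions pushed toward the extremes

print(round(range_ratio(compressed, direct), 2))
print(round(range_ratio(stretched, direct), 2))
```

The compressed predictions give a ratio below 1 and the stretched predictions a ratio above 1, even though both sets rank the subgroups in the same order as the direct estimates. This is why the range ratio is reported alongside the correlation.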
Summary
test statistics are presented for 55 estimates: 11 outcome measures
by 4 age groups plus all ages combined. Examining Exhibit
4.1, it can be seen that all of the tests indicate that the
full SAE model works quite well.
In all but five of the 55 cases shown in Exhibit 4.1, the correlation
between the predicted and the direct estimates is above 0.8. In addition,
these correlations are slightly higher for the more
prevalent substance abuse measures.
The Chi-Square probabilities
show a similar picture. The actual median of the values in the table is
0.467.
The range ratios are quite good. This
shows that the final SAE model eliminates one of the major disadvantages
of the prior methods in that they failed to reflect the full range of
variation in the actual estimates.
Thus, the
evaluations summarized in Exhibit 4.1 indicate that the
final SAE model worked well. However, because the estimates within the
evaluation subgroups are not independent, this may present an overly
optimistic picture of the quality of the estimates. The following
cross-validation approach corrects for this lack of independence.
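The benchmark used in reading the χ2 probabilities (about half of them above 0.5 when the predicted and observed distributions agree) can be checked with a small simulation. The Python sketch below rests on assumptions that are not taken from the study: the test statistic is a sum of squared standardized differences over the evaluation subgroups, the number of subgroups is 10, and the closed-form upper-tail probability used here is valid only for an even number of degrees of freedom.

```python
import math
import random

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds (x/2)^j / j!
        total += term
    return math.exp(-half) * total

def simulate_p_values(n_tests, n_subgroups, seed=0):
    """Chi-square probabilities when predictions and observations truly agree."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_tests):
        # Standardized differences are pure noise when there is no lack of fit.
        statistic = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_subgroups))
        p_values.append(chi2_upper_tail_even_df(statistic, n_subgroups))
    return p_values

p_values = simulate_p_values(n_tests=2000, n_subgroups=10)
share_above_half = sum(p > 0.5 for p in p_values) / len(p_values)
print(round(share_above_half, 3))
```

When there is no lack of fit, the probabilities are spread evenly between 0 and 1, so the share above 0.5 is close to one half. A collection of probabilities bunched near 0 would instead point to lack of fit.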
Exhibit 4.2: Summary of Evaluations of Final SAE Model and Comparisons to Alternative
Models Based on Cross-validation Goodness-of-fit Tests. Correlations,
χ2 Probabilities*, and Range Ratios.

| SAE Outcome Measure | Statistic | Final Composite SAE Model | Indirect: Final SAE Model | Indirect: County Demographic Model | Indirect: Large City/Non-Large City Demographic Model | Indirect: Basic Demographic Model |
|---|---|---|---|---|---|---|
| Past Month Cigarette Use | Correlation | 0.765 | 0.737 | 0.622 | 0.514 | 0.584 |
| | χ2 probability | 0.015 | 0.007 | 0.008 | 0.000 | 0.000 |
| | Range Ratio | 1.095 | 1.084 | 0.956 | 0.440 | 0.272 |
| Past Month Alcohol Use | Correlation | 0.866 | 0.858 | 0.832 | 0.824 | 0.839 |
| | χ2 probability | 0.280 | 0.231 | 0.108 | 0.023 | 0.001 |
| | Range Ratio | 0.969 | 0.841 | 0.966 | 0.685 | 0.524 |
| Past Month Any Illicit Drug Use | Correlation | 0.728 | 0.704 | 0.659 | 0.573 | 0.637 |
| | χ2 probability | 0.043 | 0.036 | 0.132 | 0.080 | 0.109 |
| | Range Ratio | 1.618 | 1.500 | 1.038 | 0.510 | 0.310 |
| Past Month Any Illicit Drug Use But Marijuana | Correlation | 0.636 | 0.615 | 0.392 | 0.212 | 0.297 |
| | χ2 probability | 0.412 | 0.249 | 0.242 | 0.131 | 0.131 |
| | Range Ratio | 1.450 | 1.451 | 1.213 | 0.256 | 0.153 |
| Past Year Treatment For Illicit Drugs | Correlation | 0.588 | 0.481 | 0.407 | 0.532 | 0.561 |
| | χ2 probability | 0.287 | 0.275 | 0.386 | 0.294 | 0.297 |
| | Range Ratio | 1.420 | 1.509 | 1.348 | 0.278 | 0.219 |
| Past Year Arrested | Correlation | 0.641 | 0.662 | 0.543 | 0.639 | 0.635 |
| | χ2 probability | 0.418 | 0.399 | 0.324 | 0.278 | 0.165 |
| | Range Ratio | 1.552 | 1.537 | 1.430 | 0.373 | 0.425 |
| MEAN | Correlation | 0.704 | 0.676 | 0.576 | 0.549 | 0.592 |
| | χ2 probability | 0.243 | 0.199 | 0.200 | 0.134 | 0.117 |
| | Range Ratio | 1.351 | 1.321 | 1.159 | 0.424 | 0.317 |

Note: These tests were restricted to the 26- to 34-year-old
age group due to the cost of computations.
*Probability of observing the calculated difference in the
predicted and direct estimates across evaluation subgroups given that
there is no difference.
Cross-validation
goodness-of-fit tests for the final SAE model and comparison to other
models:
Exhibit
4.2 summarizes the cross-validation analyses. (footnote#23)
In this table, we also introduce for the first time four of the other
models that were examined. For example, looking at the first row
of Exhibit 4.2, which shows the correlations between the
predicted and the direct estimates in the evaluation subgroups, we note
that the final SAE model has the highest correlation, the SAE model without
the direct local area effects the next highest, and so on.
Exhibit 4.2
- The cross-validation approach presents
a somewhat less optimistic view of the final SAE model than did the
goodness-of-fit tests using evaluation subgroups. The correlations are
lower and the range ratios are higher than those observed in Exhibit
4.1. This probably means that the final SAE model included some
predictors that should not have been included. This is sometimes called
"over-fitting," in that some of the factors that were identified as
being good predictors of substance abuse in one sample were actually
not very good predictors when applied to a different sample.
(footnote#24)
- The models that used only the demographic
characteristics are the poorest performers in almost all cases. The
range ratios indicate that simply using demographic characteristics
to predict drug use in a small area will probably not reflect the true
range in prevalence estimates across the small areas. This indicates
that improvements can be achieved by including more predictors in the
models.
- The county demographic model worked
fairly well.
(footnote#25) Although the full SAE model was the overall best performer,
both the county demographic model and the SAE model without the direct
effects worked fairly well. The county demographic model estimates are
simpler to calculate than the SAE model without the direct effects;
therefore, it may be a good candidate for future SAE modeling activities.
Comparisons
to other sources of substance abuse data:
For each of
the six models examined, Exhibits 4.3 through 4.6
compare the estimated prevalences to those from other sources using both
the rank correlation and the range ratios. Methodological and definitional
differences between the NHSDA and the other sources are considered in
making these comparisons. Adjustments were made to the external data in
some cases, but it was not possible to fully account for the differences.
The other data sources and the adjustments made for comparison with SAE
estimates are described below:
Behavioral
Risk Factor Surveillance System (BRFSS):
Comparisons of the SAE estimates of the prevalence of past month alcohol
and cigarette use are made to estimates from the BRFSS without making
any adjustments. The BRFSS is a telephone survey conducted in all 50 States
under cooperative agreements with the Centers for Disease Control and
Prevention. Definitions used are comparable to NHSDA definitions, since
the BRFSS estimates reflect past month use. Studies have shown that reporting
of substance use behaviors may be lower in telephone surveys than in face-to-face
surveys, particularly for illicit drugs. The BRFSS State estimates are
simple averages over the three years 1991 through 1993.
National
Drug and Alcoholism Treatment Unit Survey (NDATUS):
SAE estimates of the number of persons receiving treatment for drug abuse
are compared to estimates constructed from NDATUS. NDATUS is an inventory
of all specialty substance abuse treatment facilities in the U.S. Based
on reporting by State substance abuse agencies, it provides estimates
(including adjustments for nonresponse) of the number of clients in treatment
at a given point in time. To develop an estimate of persons treated during
a year, the NDATUS client counts (including drug only and combined drug
and alcohol clients) were multiplied by the reciprocals of average lengths
of stay, and adjusted to account for multiple treatment episodes in a
year by the same individual. Estimates of length of stay and multiple
episodes were obtained from the Drug Services Research Survey, conducted
in 1990. These calculations were done within categories of treatment modality
and applied separately to each State. No adjustment was made to account
for the inclusion in the SAE estimates of persons reporting treatment
through self-help groups, private physicians, or emergency rooms, none
of which are counted in NDATUS. The State estimates are averages over
only 1992 and 1993 since the 1991 estimates could not be adequately adjusted
for nonresponse.
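The conversion from a point-in-time client count to persons treated during a year can be sketched as follows. This Python example is a minimal illustration under assumptions: lengths of stay are expressed in days (so the reciprocal of the length of stay in years is 365 divided by the days), the adjustment for multiple episodes is taken to be a division by the average number of episodes per person, and all modality names and numbers are hypothetical rather than NDATUS or Drug Services Research Survey values.

```python
def persons_treated_per_year(clients, avg_stay_days, episodes_per_person):
    """Annual persons treated implied by a point-in-time client count."""
    # Clients on a given day times turnovers per year gives episodes per year.
    episodes_per_year = clients * (365.0 / avg_stay_days)
    # Assumed adjustment: several episodes by one person count as one person.
    return episodes_per_year / episodes_per_person

# Hypothetical inputs for one State, by treatment modality.
modalities = {
    "outpatient": {"clients": 1000, "avg_stay_days": 182.5, "episodes_per_person": 1.25},
    "residential": {"clients": 200, "avg_stay_days": 73.0, "episodes_per_person": 1.0},
}

state_total = sum(persons_treated_per_year(**inputs) for inputs in modalities.values())
print(round(state_total))
```

Short stays imply many more persons treated per year than the point-in-time count suggests, which is why the calculation is done separately within each treatment modality before summing.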
Uniform
Crime Reports (UCR): The SAE
estimates of the number of persons arrested in the past year are compared
to estimates derived from the UCR. The UCR compiles data from local jurisdictions
on the number of arrests. For comparison with the SAE estimates, an adjustment
to the UCR data was made to account for persons arrested more than once
during a year, so the adjusted UCR estimates reflect number of persons
arrested at least once. This adjustment was made within the four Census
regions, using data on multiple arrests reported by arrestees in the NHSDA
sample. The State estimates are simple averages over the three years 1991
through 1993.
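The UCR adjustment and the three-year averaging can be illustrated in the same way. In this Python sketch the regional ratios of arrests to persons arrested and the yearly arrest counts are hypothetical, and dividing the arrest count by that ratio is an assumed form of the adjustment described above.

```python
# Hypothetical average number of arrests per arrested person, by Census region.
ARRESTS_PER_ARRESTEE = {"Northeast": 1.15, "Midwest": 1.18, "South": 1.2, "West": 1.25}

def persons_arrested(arrest_count, region):
    """Convert a count of arrests into an estimate of persons arrested at least once."""
    return arrest_count / ARRESTS_PER_ARRESTEE[region]

# Hypothetical yearly arrest counts for one State in the South region.
yearly_arrests = {1991: 120000, 1992: 126000, 1993: 114000}

yearly_persons = [persons_arrested(count, "South") for count in yearly_arrests.values()]
state_estimate = sum(yearly_persons) / len(yearly_persons)  # simple three-year average
print(round(state_estimate))
```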
For each of
the six models examined, Exhibits 4.3 through 4.6
compare the estimated prevalences to those from other sources
using both the rank correlation
(footnote#26)
and the range ratios.
(footnote#27)
Of note in these
exhibits is the following:
Exhibit
4.3 presents the results
that compare the final SAE estimates to the BRFSS estimates for alcohol
use. The rank correlations between the final SAE estimates and the BRFSS
estimates are quite high (over 0.85) and the range ratio is good (nearly
0.6). This indicates that the SAE model produced estimates that are very
consistent with what is found in the BRFSS. In addition, both sources
estimate a similar prevalence of use at the national level with the NHSDA
estimates being somewhat larger. This higher level of reporting is consistent
with findings from methodology studies which have shown that telephone
surveys yield lower reports of use than self-administered surveys.
(footnote#28)
In
addition, we note that the rank correlations and range ratios are much
better for the final SAE model than the corresponding statistics for the
two demographic models. This indicates that the SAE model is performing
much better than typical synthetic estimators which states might construct
by applying the NHSDA rates of use to their population distribution.
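As an illustration of how these two statistics are computed, the Python sketch below applies them to the twelve States whose complete rows appear in this section's copy of Exhibit 4.3 (BRFSS estimate and final SAE model estimate for alcohol use). Because it covers only that subset, its results need not match the figures quoted above for the full set of States. The range ratio is taken here as the SAE range divided by the BRFSS range, which is an assumption about the definition, and the function names are hypothetical.

```python
def ranks(values):
    """Ranks 1..n in ascending order (the data below contain no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation for untied data."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

def range_ratio(model, external):
    """Range of the model estimates divided by the range of the external estimates."""
    return (max(model) - min(model)) / (max(external) - min(external))

# State: (BRFSS estimate, final SAE model estimate), from Exhibit 4.3.
alcohol = {
    "New Jersey": (56.60, 59.94), "New York": (53.10, 57.04),
    "Pennsylvania": (57.10, 55.82), "Florida": (53.70, 48.45),
    "Georgia": (35.90, 48.57), "Kentucky": (32.40, 41.18),
    "Louisiana": (46.00, 49.40), "North Carolina": (36.10, 46.73),
    "Oklahoma": (35.50, 39.81), "South Carolina": (36.90, 46.84),
    "Tennessee": (26.40, 40.70), "Texas": (52.00, 52.88),
}
brfss = [pair[0] for pair in alcohol.values()]
sae = [pair[1] for pair in alcohol.values()]

rho = spearman(brfss, sae)
ratio = range_ratio(sae, brfss)
print(round(rho, 3), round(ratio, 3))
```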
The comparison
of the final SAE estimates to the BRFSS smoking data (Exhibit 4.4) presents
a similar picture. The range ratio is very good (over 0.9) and the rank
correlations are moderate (about 0.5). The fact that the BRFSS uses a
somewhat different question than the NHSDA may account for some of the
lack of comparability. Again, we observe that the NHSDA estimates higher
levels of cigarette use than the BRFSS, which is consistent with the difference
in interview methodology. The demographic model does not do as well as
the final SAE model, indicating that the SAE model is better than using
a synthetic estimator.
Exhibits
4.5 and 4.6
present the results for the past year drug treatment and past year arrest.
Although we again observe that the SAE models perform better than the
demographic models, the rank correlations are still only 0.38 for treatment
and 0.35 for arrest. These low correlations are probably due to the lack
of correspondence in methodology between the divergent data sources, the
low prevalence rates of these items, and the less restricted range across
States.
(footnote#29)
One of the
characteristics that is desirable in a small area estimation procedure
is that it produce estimates that adequately reflect the range of differences
across areas. Because the estimated NHSDA prevalence rates for treatment
and arrest are lower than the corresponding estimates from NDATUS and
UCR, the calculated range ratios are going to appear artificially small.
That is, the range ratios in Exhibits 4.5 and 4.6 are not a good indication
of how well the models reflect the range of differences across
areas because of the differences in overall prevalence levels.
4.3 Summary of the Evaluations
As was noted
in the beginning of this section, a variety of approaches must be used
to evaluate small area estimation methods. Considering all of the evidence
presented,
(footnote#30)
the following findings are noteworthy:
- The full SAE model is generally the best model in that it tends to
have the highest correlations with the direct estimates, to adequately
reflect the range of prevalence, and to have exhibited few significant
differences when compared to direct estimates in Goodness-of-fit tests.
- The full SAE model is a better predictor for the more prevalent behaviors
than for the less prevalent behaviors and better at reflecting the differences
in prevalence rates across areas when there is a wider dispersion of
rates across the areas.
- The demographic models are poor predictors
of substance abuse. The estimates presented in this report are much better
than States could achieve by simply applying the NHSDA prevalence rates
to the population distribution in their States.
- Estimates of substance abuse are quite sensitive to method effects as
evidenced by the fact that the correspondence between SAE models and external
small area estimates was increasingly lower as the methodologies between
the two information sources were increasingly divergent.
- Using a NHSDA sample that does not oversample the largest cities and
an associated global SAE model appears to work almost as well as the full
SAE model. This result bodes well for any future NHSDA small area estimator
projects based on data where the six cities were not oversampled.
- The county demographic model is a very promising alternative to the
full SAE model particularly if modeling resources are limited. The county
level predictors are appealing since they can be updated from year to
year reflecting temporal trends.
Exhibit 4.3: Comparison
of Alternative Small Area Estimators of Prevalence of Alcohol Use to Direct
Survey Estimates from the Behavioral Risk Factor Surveillance System (BRFSS).

| State | BRFSS Estimate | Composite: Final SAE Model | Composite: Big City Sub-Sampled Model | Indirect: SAE Model | Indirect: County Demog Model | Indirect: Big City/Remaind Demog Model | Indirect: Basic Demog Model | Direct 91-93 NHSDA Estimates |
|---|---|---|---|---|---|---|---|---|
| Total United States | 50.90 | 53.46 | 53.59 | 53.43 | 53.01 | 53.41 | 53.40 | 53.01 |
| North East Region | | | | | | | | |
| New Jersey | 56.60 | 59.94 | 60.25 | 62.03 | 60.32 | 52.62 | 52.91 | 61.10 |
| New York | 53.10 | 57.04 | 57.35 | 56.60 | 56.51 | 53.60 | 52.52 | 56.96 |
| Pennsylvania | 57.10 | 55.82 | 56.14 | 53.75 | 55.90 | 53.39 | 53.96 | 52.70 |
| South Region | | | | | | | | |
| Florida | 53.70 | 48.45 | 48.71 | 48.52 | 49.75 | 52.43 | 52.78 | 49.67 |
| Georgia | 35.90 | 48.57 | 47.04 | 48.21 | 47.48 | 52.28 | 52.78 | 48.78 |
| Kentucky | 32.40 | 41.18 | 40.48 | 44.26 | 40.97 | 54.18 | 54.79 | 32.03 |
| Louisiana | 46.00 | 49.40 | 49.77 | 44.13 | 44.66 | 51.53 | 52.02 | 56.62 |
| North Carolina | 36.10 | 46.73 | 45.18 | 48.17 | 46.44 | 52.49 | 53.01 | 43.04 |
| Oklahoma | 35.50 | 39.81 | 39.74 | 44.02 | 44.22 | 52.84 | 53.16 | 36.50 |
| South Carolina | 36.90 | 46.84 | 44.32 | 46.67 | 41.34 | 51.83 | 52.36 | 47.03 |
| Tennessee | 26.40 | 40.70 | 38.56 | 45.10 | 39.07 | 53.06 | 53.65 | 35.76 |
| Texas | 52.00 | 52.88 | 53.09 | 48.80 | 50.07 | 53.10 | 53.06 | 55.23 |
Virginia
|
| |