FAQs
1. Is “household head” a term no longer used by the SCF?
6. How are missing values (item nonresponse) dealt with in the SCF?
9. What adjustments should be made to regressions when using all five implicates of an SCF data set?
Q1. Is “household head” a term no longer used by the SCF?
A1. The term “reference person” is the new descriptor as of the 2019 survey, replacing the outdated “household head” terminology used in previous surveys.
Q2. I tried to import the SAS transport file created using PROC CPORT with StatTransfer and SPSS, but the error log says that NO transport file is found or transport file is bad. Is there something wrong with the SAS transport file?
A2. StatTransfer and SPSS do not support SAS transport files created using PROC CPORT. They support only uncompressed SAS transport files created using PROC COPY with the XPORT option. All SCF SAS data sets are available in both forms. SAS users can download either form, but StatTransfer and SPSS users should download the COPY/XPORT version of the SAS transport files.
Q3. I downloaded an SCF SAS data set using a PC and tried to use it on a Unix machine. My SAS error log reported "SAS transport file is bad". What is wrong with the data set?
A3. SAS transport files are not text files. They are binary files in ASCII format. If you downloaded the data sets using a PC and attempted to use them on a Unix machine, you must be certain that the binary format is preserved when transferring the data sets from the PC to the Unix platform.
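One way to confirm that a transfer preserved the binary contents is to compare a checksum of the file computed on each machine. The following is a minimal Python sketch and is not part of the SCF materials: the function names and file paths are hypothetical, and the choice of SHA-256 is an assumption about a reasonable check, not a documented SCF procedure.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file read in binary mode."""
    digest = hashlib.sha256()
    # "rb" matters: text mode could translate line endings and alter the bytes.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_preserved(original_path, copied_path):
    """True if both files have identical contents (same SHA-256 digest)."""
    return sha256_of_file(original_path) == sha256_of_file(copied_path)
```

In practice, `sha256_of_file` would be run once on the PC and once on the Unix machine and the two digests compared by eye. If they differ, the usual remedy is to repeat the transfer in binary mode.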
Q4. I have tried to download files from the SCF Web site, but it does not seem to be working. What could be wrong?
A4. There are numerous reasons why you may have trouble downloading files from the SCF Web site. Because of the size of some of the files, it takes a substantial amount of time to download them. Internet problems may break the connection before the file is completely retrieved. It is best to compare the size of the downloaded file to the size reported on the SCF site to ensure you have received the complete file. If you do not receive the complete file, the best option is to attempt the download again. Also, some files require certain software for downloading. For example, you must have Acrobat Reader software to download PDF files from the SCF site. This software is available at no charge from Adobe.
Q5. I downloaded an SCF SAS data set on my PC, but the size of the zipped and unzipped data sets is slightly different than the amount of disk space given in the instructions. Is there something wrong?
A5. No. The given disk space is for users who download the data using Unix. Unix systems and PC systems store bytes differently. Thus, depending on the system used, the space required for each data set will be slightly different.
NOTE: To make sure you have downloaded the entire data set, you should always run a PROC CONTENTS on the data set.
Q6. How are missing values (item nonresponse) dealt with in the SCF?
A6. Missing values are replaced by use of a multiple imputation technique.
Q7. How do I compute statistics like means, medians, and frequencies using the weights and all five implicates?
A7. NOTE: The code below applies to the SCFs from 1989 forward.
There are two approaches: an exact method, and a method that is exact for means and frequencies but only a close approximation for medians.
For the exact method, compute the desired statistic separately for each implicate using the sample weight (X42001). The final point estimate is given by the average of the estimates for the five implicates.
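For readers working outside SAS, the exact method can be sketched in Python as follows. This is an illustrative sketch only: the records are made-up toy numbers, the `(implicate, weight, value)` layout is a hypothetical stand-in for an implicate indicator, X42001, and the analysis variable, and the weighted median uses one common definition (the smallest value at which the cumulative weight reaches half of the total), which may differ slightly from the definition a given statistical package uses.

```python
from collections import defaultdict


def weighted_mean(values, weights):
    """Weighted mean: sum(w * v) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= half:
            return value
    raise ValueError("no observations")


def exact_estimate(records, statistic):
    """Compute `statistic` within each implicate, then average the results.

    `records` is an iterable of (implicate, weight, value) tuples.
    Returns (point_estimate, {implicate: estimate}).
    """
    values = defaultdict(list)
    weights = defaultdict(list)
    for implicate, weight, value in records:
        values[implicate].append(value)
        weights[implicate].append(weight)
    per_implicate = {
        imp: statistic(values[imp], weights[imp]) for imp in values
    }
    point = sum(per_implicate.values()) / len(per_implicate)
    return point, per_implicate


# Toy data (hypothetical): three households, five implicates each.
# The weight is the same across a household's implicates; imputed values vary.
TOY_RECORDS = [
    (1, 100, 10), (1, 300, 20), (1, 200, 30),
    (2, 100, 12), (2, 300, 20), (2, 200, 30),
    (3, 100, 10), (3, 300, 22), (3, 200, 30),
    (4, 100, 10), (4, 300, 20), (4, 200, 36),
    (5, 100, 11), (5, 300, 20), (5, 200, 30),
]

mean_estimate, _ = exact_estimate(TOY_RECORDS, weighted_mean)
median_estimate, _ = exact_estimate(TOY_RECORDS, weighted_median)
print(round(mean_estimate, 4))    # 22.3667
print(round(median_estimate, 4))  # 20.4
```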
The second method is given in the following steps using SAS (similar methods apply in other languages):
Begin by dividing the weight (X42001) by 5 so that the sum of the weights across all five implicates equals the correct population total.
LIBNAME IN 'scf data set library';
DATA SCF;
  SET IN.scf data set;
  WGT=X42001/5;
RUN;
Next, using PROC UNIVARIATE with the FREQ option compute the desired statistic. NOTE: You must use the FREQ or WEIGHT option to compute weighted medians with PROC UNIVARIATE.
PROC UNIVARIATE DATA=SCF;
  FREQ WGT;
  VAR selected variables;
RUN;
Or, using PROC MEANS with the FREQ option, compute the desired statistic. NOTE: The WEIGHT option can also be used with PROC MEANS, but the computed standard deviations will be unweighted. Consult the SAS Procedures Guide for more information.
PROC MEANS DATA=SCF;
  FREQ WGT;
  VAR selected variables;
RUN;
Or, using PROC FREQ with the WEIGHT option, compute the desired frequencies or crosstabulations.
PROC FREQ DATA=SCF;
  WEIGHT WGT;
  TABLES selected variables;
RUN;
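The same pooled approach can be sketched outside SAS. The Python example below is an illustration under stated assumptions, not a translation of the SAS procedures: the records are made-up toy numbers, the `(implicate, weight, value)` layout is hypothetical, and the weighted median is defined as the smallest value at which the cumulative weight reaches half of the total, which may not match the definition PROC UNIVARIATE uses.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * v) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= half:
            return value
    raise ValueError("no observations")


# Toy data (hypothetical): (implicate, weight, value) for three households.
TOY_RECORDS = [
    (1, 100, 10), (1, 300, 20), (1, 200, 30),
    (2, 100, 12), (2, 300, 20), (2, 200, 30),
    (3, 100, 10), (3, 300, 22), (3, 200, 30),
    (4, 100, 10), (4, 300, 20), (4, 200, 36),
    (5, 100, 11), (5, 300, 20), (5, 200, 30),
]

# Step 1: divide each weight by 5 so the pooled weights sum to the
# population total (here 600) rather than five times that total.
pooled_values = [value for _, _, value in TOY_RECORDS]
pooled_weights = [weight / 5 for _, weight, _ in TOY_RECORDS]

# Step 2: compute the statistics once, over all five implicates together.
pooled_mean = weighted_mean(pooled_values, pooled_weights)
pooled_median = weighted_median(pooled_values, pooled_weights)

print(sum(pooled_weights))    # 600.0
print(round(pooled_mean, 4))  # 22.3667
print(pooled_median)          # 20
```

Because each household carries the same weight in every implicate, the pooled mean coincides with the average of the five per-implicate means. The pooled median need not equal the average of the per-implicate medians, which is why the text describes it as an approximation.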
Q8. How can I compute proper standard errors of estimates from the SCF data that account for both imputation and sampling variance?
A8. A SAS program to compute standard errors using replicate weights is provided in the 2004 Codebook. Similar code can be used for the 1989-2001 SCFs.
Q9. What adjustments should be made to regressions when using all five implicates of an SCF data set?
A9. Users who want to estimate regressions should be cautious in their treatment of the implicates. Many regression packages will treat each of the five implicates as an independent observation and correspondingly inflate the reported significance of results. Users who want to calculate regression estimates but who have no immediate use for proper significance tests (perhaps for exploratory work) could regress the average of the dependent and independent values across the implicates or multiply the standard errors of the regression (on all observations) by the square root of five.
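The inflation of significance can be seen in a small numerical sketch. The Python example below uses made-up data and takes the extreme case in which all five implicates are identical copies (that is, nothing was imputed), an assumption chosen only to make the arithmetic transparent. Stacking the copies leaves the slope unchanged but shrinks the conventional standard error by roughly the square root of five, and multiplying by the square root of five approximately restores it. The function and variable names are hypothetical.

```python
import math


def ols_slope_and_se(x, y):
    """One-regressor OLS with intercept: return (slope, conventional SE)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # Residual variance uses n - 2 degrees of freedom (slope and intercept).
    se = math.sqrt(ssr / (n - 2) / sxx)
    return slope, se


# Made-up data: 50 observations with a deterministic "noise" pattern.
n = 50
x = [float(i) for i in range(n)]
y = [2.0 * xi + ((7 * i) % 11 - 5) for i, xi in enumerate(x)]

# Regression on a single copy of the data.
slope_single, se_single = ols_slope_and_se(x, y)

# Regression on five stacked identical copies, treated as 250 observations.
slope_stacked, se_stacked = ols_slope_and_se(x * 5, y * 5)

# Rough correction described in the text: multiply by the square root of five.
se_adjusted = se_stacked * math.sqrt(5)

print(se_stacked < se_single)             # True: stacking understates the SE
print(round(se_adjusted / se_single, 3))  # 0.984: close to, not exactly, 1
```

The adjusted figure is only approximate because the stacked regression also counts degrees of freedom as if there were 250 independent observations. With real implicates, which differ wherever values were imputed, this shortcut also ignores the variation between implicates.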
For details on the proper treatment of the implicates in regression, see Multiple Imputation for Nonresponse in Surveys, by Donald B. Rubin, Wiley, 1987. Here we provide two examples to compute standard errors that account for multiple imputations in regressions.
A SAS MACRO written by Catherine Montalto and Jaimie Sung will compute regressions and standard errors accounting for multiple imputations. Further documentation for this model is given in their paper "Multiple Imputation in the 1992 Survey of Consumer Finances," Financial Counseling and Planning, Volume 7, 1996, pages 133-146.
A second approach, which is applicable to other types of models (probits, etc.), is given in the SAS MACRO MISECOMP and the Stata program StataMICode.do in the 2004 Codebook.
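For orientation, the combining rules from Rubin (1987) that underlie this kind of calculation can be sketched in a few lines of Python. This is a minimal illustration and not the Montalto and Sung macro, MISECOMP, or StataMICode.do: the function name is hypothetical, the coefficient estimates and standard errors are made-up numbers, and the sketch covers only the imputation adjustment for a single coefficient estimated separately on each implicate.

```python
import math


def combine_rubin(estimates, variances):
    """Combine per-implicate estimates of one coefficient using Rubin's rules.

    estimates: the coefficient estimated separately on each implicate
    variances: the squared standard error from each of those regressions
    Returns (combined point estimate, combined standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two implicates")
    point = sum(estimates) / m                 # average of the estimates
    within = sum(variances) / m                # average within-implicate variance
    between = sum((q - point) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between     # Rubin's total variance
    return point, math.sqrt(total)


# Made-up results from five per-implicate regressions (hypothetical numbers).
coefficients = [0.50, 0.52, 0.48, 0.51, 0.49]
standard_errors = [0.10, 0.10, 0.10, 0.10, 0.10]

point, se = combine_rubin(coefficients, [s ** 2 for s in standard_errors])
print(round(point, 4))  # 0.5
print(round(se, 4))     # 0.1015
```

The combined standard error exceeds each per-implicate standard error because the between-implicate term adds the uncertainty that comes from imputation.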
Q10. The numbers I have computed do not exactly match the numbers reported in the various Federal Reserve Bulletin articles on the SCFs. Why?
A10. Estimates made using the public versions of the SCF data may differ from those reported in the Bulletin for two main reasons. First, some of the articles used preliminary data, which have been superseded by the final data provided on this site. Second, to protect the privacy of respondents, the data made available to the public have been systematically altered using statistical procedures that should not significantly affect the estimates made with the final public data. Where the differences are substantial, it is likely that either the user has made an error or the differences are actually not statistically significant. For more on disclosure review, see the following papers:
Multiple Imputation in the Survey of Consumer Finances
Multiple Imputation and Disclosure Protection: The Case of the 1995 Survey of Consumer Finances
Disclosure Review and the 1998 Survey of Consumer Finances
Analyzing the Disclosure Review Procedures for the 1995 Survey of Consumer Finances
The Challenges of Preparing Sensitive Data for Public Release
Q11. For the racial and ethnic identification variables in the data sets, what does the “other” category include?
A11. In the Full Public SCF data sets, the other race group consists of a very racially/ethnically diverse set of families, including those identifying as Asian, American Indian, Alaska Native, Native Hawaiian, Pacific Islander, and other race. Because of small sample sizes, we do not have statistical power to further disaggregate this group of families. Because of the varied composition of the other group and changes in its composition over time, readers should exercise caution when making inferences about the other race group.
In the Summary Extract SCF data sets, there are multiple race variables constructed for users. For the RACE variable, the other race group consists of a very racially/ethnically diverse set of families, including those identifying as Asian, American Indian, Alaska Native, Native Hawaiian, Pacific Islander, and other race. For the RACECL4 variable, the other race group consists of a very racially/ethnically diverse set of families, including those identifying as Asian, American Indian, Alaska Native, Native Hawaiian, Pacific Islander, other race, and all respondents reporting more than one racial identification. Because of small sample sizes, we do not have statistical power to further disaggregate this group of families for either race variable.
As an example, for the RACECL4 variable in 2016, families reporting more than one racial identification were the largest subgroup of the other or multiple race group (about 50 percent of families), followed by Asian families (about 30 percent of families), though the composition of this group varies over time. Because of the varied composition of the other group and changes in its composition over time, readers should exercise caution when making inferences.
Q12. I am interested in estimates for demographic groups or variable responses that are not included in the public dataset. Is there any way to compute these estimates?
A12. Often demographic groups or variable responses with small populations are not included in order to protect the confidentiality of the respondents. In addition, the number of observations is often insufficient to compute reliable estimates.