National Survey of Small Business Finances


The Bootstrap Data Set for the 1993 NSSBF
Bootstrapping has become the most widely used replication method. With many surveys, the bootstrap samples are constructed from sampling information available in the survey data themselves. For the 1993 NSSBF, however, creating the weights for the bootstrap samples requires confidential data that are not available to the public. Therefore, a special data set containing 1,000 bootstrap samples was created to allow researchers to use bootstrap methods in their own estimations. Each bootstrap sample was drawn with replacement, so in each bootstrap sample some firms appear multiple times and some firms do not appear at all. In the data set BOOTSTRP, each of the 4,637 firms in the public data set has 1,000 multiplicity variables (labeled MULT1-MULT1000), which indicate how many times the firm was sampled in each of the bootstrap samples, and 1,000 weights (labeled BWGT1-BWGT1000), which were calculated so that the bootstrap sample weights sum to the target population.

SAS Transport File
The SAS transport file is available in two compressed formats: Unix compress (.Z file) and PKZIP (.ZIP). Note: When uncompressed, the SAS transport file requires approximately 74 MB of disk space. Example programs that convert the SAS transport file to a SAS data set on the Unix and Windows platforms are provided in the file transport.
SAS data for PCs (25.8 MB .ZIP)
SAS data for Unix (31.6 MB .Z)

ASCII Flat File
The ASCII flat file is also available in two compressed formats: Unix compress (.Z file) and PKZIP (.ZIP). Note: When uncompressed, the ASCII flat file requires approximately 49 MB of disk space.

Further information regarding the ASCII flat file (20.2 KB ASCII)
List of variables (15.8 KB ASCII)
ASCII data for PCs (18.8 MB .ZIP)
ASCII data for Unix (18.8 MB .Z)

Example: Using the 1993 NSSBF Bootstrap Data Set
The number of bootstrap replicates required depends on the needs of the researcher and the sparseness of the data. When the statistics are simple, such as the variance of a single variable, and the responses are mostly non-missing, 100 bootstrap samples will usually suffice. For complicated functions of variables, or for sparsely reported data, more replicates may be needed.

The following example shows how to use the BOOTSTRP data set to estimate the variance of a statistic. We assume that the researcher is already familiar with bootstrap estimation. To estimate the mean assets of firms in the target population using the variable ASSETS from the data set NSSBF93, we first compute the weighted mean of ASSETS using the final weight variable FIN_WGT. Most statistical packages also report a standard error for this estimate, but the researcher realizes that, for the purpose at hand, the packaged standard error is incorrect for this sample and decides to estimate the standard error of the mean using bootstrap methods instead. Because the variable ASSETS is reported for almost every firm in the sample, the researcher decides that 100 bootstrap replicates will suffice.
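As a minimal sketch of this first step, the weighted mean is the sum of weight times value divided by the sum of the weights. The Python below is illustrative only: the three records and their ASSETS and FIN_WGT values are made up, the function name weighted_mean is hypothetical, and only the variable names PWCODE, ASSETS, and FIN_WGT come from the data sets described here.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical records standing in for NSSBF93 (all values are made up).
nssbf93 = [
    {"PWCODE": 34123, "ASSETS": 100000.0, "FIN_WGT": 2000.0},
    {"PWCODE": 41290, "ASSETS": 250000.0, "FIN_WGT": 1000.0},
    {"PWCODE": 20013, "ASSETS": 400000.0, "FIN_WGT": 1000.0},
]

# Weighted mean of ASSETS using the final weight FIN_WGT.
mean_assets = weighted_mean(
    [row["ASSETS"] for row in nssbf93],
    [row["FIN_WGT"] for row in nssbf93],
)
print(mean_assets)
```

The sketch ignores missing values; with real data, firms with a missing ASSETS value would need to be excluded from both the numerator and the denominator.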

The first bootstrap mean is computed using the first bootstrap replicate in BOOTSTRP. A temporary data set with two variables, PWCODE and WEIGHT, is first created from BOOTSTRP, with the variable BWGT1 renamed to WEIGHT. This temporary data set has one observation for each firm with MULT1=1, two observations for each firm with MULT1=2, and so on. Firms with MULT1=0 are dropped from this temporary data set. The resulting data set might look like this:
 
 

PWCODE     WEIGHT
34123    1871.230
41290     891.345
41290     891.345
20013     367.891
 

This data set is then merged on PWCODE with the NSSBF93 data set (dropping all observations not in the temporary bootstrap replicate data set). The weighted mean of ASSETS is computed again, using the bootstrap weight variable, WEIGHT. This bootstrap mean is saved.
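The expansion and merge steps for one replicate can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names build_replicate and replicate_mean are hypothetical, the ASSETS values and the firm with PWCODE 55555 are made up, and the PWCODE and weight values for the other firms are taken from the example table above.

```python
def build_replicate(bootstrp, r):
    """Expand BOOTSTRP rows for replicate r into (PWCODE, WEIGHT) rows."""
    mult_name, wgt_name = f"MULT{r}", f"BWGT{r}"
    rows = []
    for firm in bootstrp:
        # A firm sampled k times contributes k identical rows; k = 0 drops it.
        for _ in range(firm[mult_name]):
            rows.append({"PWCODE": firm["PWCODE"], "WEIGHT": firm[wgt_name]})
    return rows


def replicate_mean(replicate, survey, variable):
    """Merge replicate rows with the survey on PWCODE; return the weighted mean."""
    by_code = {row["PWCODE"]: row for row in survey}
    numerator = denominator = 0.0
    for rep in replicate:
        value = by_code[rep["PWCODE"]][variable]
        numerator += rep["WEIGHT"] * value
        denominator += rep["WEIGHT"]
    return numerator / denominator


# Hypothetical extracts of BOOTSTRP and NSSBF93 (ASSETS values are made up).
bootstrp = [
    {"PWCODE": 34123, "MULT1": 1, "BWGT1": 1871.230},
    {"PWCODE": 41290, "MULT1": 2, "BWGT1": 891.345},
    {"PWCODE": 55555, "MULT1": 0, "BWGT1": 0.0},  # not sampled in replicate 1
    {"PWCODE": 20013, "MULT1": 1, "BWGT1": 367.891},
]
nssbf93 = [
    {"PWCODE": 34123, "ASSETS": 100000.0},
    {"PWCODE": 41290, "ASSETS": 250000.0},
    {"PWCODE": 55555, "ASSETS": 900000.0},
    {"PWCODE": 20013, "ASSETS": 400000.0},
]

replicate1 = build_replicate(bootstrp, 1)
boot_mean1 = replicate_mean(replicate1, nssbf93, "ASSETS")
print(len(replicate1), boot_mean1)
```

Because a firm with MULT1=2 appears twice in the replicate, its weight enters the weighted mean twice, which is the effect of merging the expanded temporary data set with NSSBF93.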

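This process is repeated for 99 more replicates (for example, replicates 2 through 100), giving a total of 100 bootstrap estimates of the mean. The standard error of the mean is then estimated by the standard deviation of the 100 bootstrap estimates, and the variance of the mean by the square of that standard deviation.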
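A minimal sketch of this last step, assuming the bootstrap means have already been collected in a list: the standard error is taken as the sample standard deviation of the bootstrap estimates. The function name bootstrap_standard_error is hypothetical, the three estimates are made up to keep the arithmetic easy to check, and the n - 1 divisor used by statistics.stdev is one common convention (some authors divide by n instead).

```python
import statistics


def bootstrap_standard_error(estimates):
    """Return the sample standard deviation (n - 1 divisor) of the estimates."""
    if len(estimates) < 2:
        raise ValueError("need at least two bootstrap estimates")
    return statistics.stdev(estimates)


# Hypothetical bootstrap means; in the example above there would be 100.
boot_means = [10.0, 12.0, 14.0]

std_error = bootstrap_standard_error(boot_means)
variance = std_error ** 2
print(std_error, variance)
```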





Last update: November 1, 2006