This section provides an evaluation of whether credit scoring in general, and the factors included in credit-scoring models in particular, may result in negative or differential effects on specific subpopulations and, if so, whether such effects could be mitigated by changes in the model development process. As stated earlier, a credit characteristic in a credit-scoring model has a differential effect related to a particular demographic characteristic if the weights assigned to that credit characteristic differ from the weights that would be estimated in a demographically neutral environment. Thus, identifying such credit characteristics requires the estimation of both the FRB base model and models estimated in demographically neutral environments. These model estimations allow an evaluation of the differential effect for all credit characteristics that are included in the FRB base model. In addition, inferences about credit characteristics not included in the FRB base model can be gleaned by incrementally adding such characteristics one at a time to the existing model and determining their effect on the credit scores of different population groups.
Results in this section cover several different topics. First, descriptive information is provided on the univariate relationship between credit characteristics and both performance and demographics. Second, an assessment is made of the extent to which differences in mean credit scores across different population groups can be attributed to individual credit characteristics included in the FRB base model. Third, an assessment is made of the effect on different groups that would result from dropping each of the credit characteristics included in the FRB base model from the model. Fourth, a similar analysis of adding each excluded credit characteristic to the FRB base model is presented. Each of these four topics provides interesting descriptive information, but, as stated, the full assessment of differential effect requires the estimation of models in demographically neutral environments. Such analysis is provided in the next two subsections, but the focus is limited to race or ethnicity and age, which exhibited the highest potential propensity to experience a differential effect. (Sex was also tested, but the results showed little evidence of differential effect and are not presented.) The final subsection discusses the implications of finding differential effects and ways in which they might be mitigated.
As stated earlier, for a credit characteristic to have a differential effect for a particular demographic population, the credit characteristic at a minimum must be correlated with both the demographic characteristic and performance. Technically, such an assessment should be made in a multivariate environment controlling for other credit characteristics included in the model. However, univariate correlations of both of these relationships can provide useful insight into which credit characteristics are most likely to raise concerns regarding differential effects. In this section, we examine the correlations of each of the 312 credit characteristics provided by TransUnion both with subsequent credit performance and with each demographic characteristic considered in the study.
The first step of the analysis of correlations examines each of the credit characteristics to identify the degree to which they are correlated with performance and with demographic characteristics. Those that are found to have a high correlation with both are possible sources of a differential effect. Because performance and demographic characteristics have arbitrary signs, the correlations are expressed as positive values ranging from zero to 1. For those demographic characteristics that are categorical in nature and take on more than one value, such as race or ethnicity, multiple correlations are computed using a base group. For example, for race and ethnicity, non-Hispanic whites are the base, or comparison, group. Thus, each credit characteristic is correlated with the variable black versus non-Hispanic white, with the variable Asian versus non-Hispanic white, and so on for each minority group.
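The correlation step described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes Pearson correlations, and the function names, group labels, and toy data are hypothetical.

```python
from math import sqrt

def abs_corr(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series is treated as uncorrelated
    return abs(sxy / sqrt(sxx * syy))

def group_vs_base_corr(characteristic, groups, group, base):
    """Correlate a characteristic with a 0/1 indicator for `group` versus `base`.

    Individuals belonging to neither group are excluded from the calculation.
    """
    pairs = [(c, 1.0 if g == group else 0.0)
             for c, g in zip(characteristic, groups) if g in (group, base)]
    return abs_corr([p[0] for p in pairs], [p[1] for p in pairs])

# Hypothetical toy data: one credit characteristic, a bad-performance flag,
# and a demographic label for eight individuals.
months_since_delinquency = [2, 40, 35, 5, 60, 8, 50, 12]
bad = [1, 0, 0, 1, 0, 1, 0, 0]
race = ["black", "white", "white", "black", "white", "asian", "white", "black"]

corr_performance = abs_corr(months_since_delinquency, bad)
corr_demographic = group_vs_base_corr(months_since_delinquency, race, "black", "white")

# A characteristic more correlated with performance than with the demographic
# indicator would plot above the 45-degree line in a scatter of the two values.
above_45_degree_line = corr_performance > corr_demographic
```

Taking the absolute value reflects the statement that signs are arbitrary; a characteristic is flagged as a possible concern only when both values are high.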
The twelve panels of figure 11 are scatter plots of the correlation of each credit characteristic with performance and with a demographic characteristic. Credit characteristics that appear above the 45-degree line are more correlated with performance than with the demographic characteristic, and credit characteristics below the line are more correlated with demographic characteristics than performance. For purposes of exposition, each credit characteristic is coded according to its assignment to one of the five distinct credit-characteristic groupings identified by Fair Isaac as discussed above. The twelve panels of figure 12 display the same correlations as those in figure 11, but for just the 19 credit characteristics that constitute the three scorecards of the FRB base model.
For race and ethnicity, almost all of the credit characteristics appear above the 45-degree line (that is, are more correlated with performance than with the demographic characteristic) regardless of the specific group considered. Indeed, most credit characteristics are only minimally correlated with race and ethnicity, many are not correlated at all, and none are highly correlated. A virtually identical result is found when the census-tract proxy for race or ethnicity is used as a substitute for an individual's race and ethnicity.
For the comparison of performance on accounts held by blacks with the performance on accounts held by non-Hispanic whites, the characteristics that are most correlated with both performance and race are all related to past payment history. Each of these characteristics is also highly correlated with performance. With respect to the analysis for other racial or ethnic categories, most of the credit characteristics are not correlated at all, a few are only minimally correlated, and none are highly correlated.
The relationships for age differ significantly from those for race and ethnicity. Many credit characteristics are highly correlated with age. Most of the credit characteristics that appear to be highly correlated with age involve characteristics from the "length of credit history" group defined by Fair Isaac, such as "total number of months since the oldest account was opened," and several come from the four other credit characteristic groups. Some credit characteristics, such as "total number of months since the most recent account delinquency," which belongs in the payment history group, have aspects of credit history length in them. Other credit characteristics, however, such as one representing the ratio of revolving balance to high credit, which is in the "amounts owed" group, have no clear connection to length of credit history. These univariate results suggest that several credit characteristics are candidates for introducing differential effect across age groups.
Results for sex show that the vast majority of credit characteristics are much more highly correlated with performance than with sex. However, a significant number of credit characteristics, each involving a department store or retail trade account, exhibit correlations of more than 0.2 with sex, though each of these characteristics is only minimally related to performance. For marital status, the results are similar to those for race or ethnicity in that most credit characteristics are only minimally correlated with marital status.
The analysis of location characteristics finds that few credit characteristics are related to any significant degree to the proportion of minority population in the census tract, relative census-tract income, or degree of urbanization. Also, almost all of the credit characteristics show little or no correlation with foreign-born and recent immigrant populations. The few credit characteristics that are at least somewhat correlated with these demographic characteristics all involve characteristics related to the length of an individual's credit history.
The correlations for the characteristics included in the FRB base model exhibit patterns similar to those shown for the credit characteristics not included in the model. Regarding race and ethnicity, correlations between the demographic characteristics and credit characteristics are generally quite low. None of the correlations exceed 0.1, and nearly all are much smaller. The only racial group that appears to have any notable correlations between demographics and credit characteristics included in the model is blacks, but, even for this group, none of the correlations is substantial. Patterns for the race proxy, sex, marital status, and foreign-born status are similar to those for individual race and ethnicity. None of the credit characteristics included in the FRB base model is highly correlated with these demographic characteristics.
Findings regarding age, however, are notable. Several of the credit characteristics included in the FRB base model have relatively strong correlations with age, especially characteristics included in the "length of credit history" group or indirectly related to the individual's length of credit history. Such correlations are not surprising because younger individuals, by definition, have had only a relatively short time in which to establish credit histories.
In this section, we examine the extent to which differences in mean credit scores across populations can be attributed to the different credit characteristics in the model. We first decompose mean credit-score differences across populations into differences in the distribution of individuals in each population across the three scorecards used in the FRB base model (thin, clean, and major derogatory) and differences in the mean scores for each population within each scorecard. In the second decomposition, for each scorecard, we decompose differences in the mean score into the contributions of the predictive credit characteristics that are used in the scorecard.
The first decomposition has two stages. In the first stage, the portion of the mean credit-score differences that is attributable to disproportionate representation on the thin-file and major-derogatory scorecards is derived by calculating the change in the score that would have resulted if each population had the same mean score on each scorecard. Because mean scores are, on average, lower on the thin-file and major-derogatory scorecards than on the clean-file scorecard, population groups that have proportionately greater representation on these scorecards will have lower mean scores, even if all of the populations have the same mean scores on each individual scorecard. The second stage takes the remaining difference and attributes it to differences in population mean scores within each scorecard. The credit characteristics are sorted into five groups that are consistent with the groups of credit characteristics discussed above in the derivation of the FICO score. These calculations result in five sources of credit-score differences that will sum exactly to the total difference in mean scores across population groups.
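One way to carry out a two-stage decomposition of this kind is sketched below. It is an illustration under stated assumptions rather than the study's procedure: the "same mean score on each scorecard" is taken to be the base group's mean, representation effects are measured relative to the clean-file scorecard, and all names and numbers are hypothetical.

```python
SCORECARDS = ("thin", "clean", "major_derog")

def overall_mean(share, mean):
    """Population mean score as a share-weighted average of scorecard means."""
    return sum(share[k] * mean[k] for k in SCORECARDS)

def decompose(share_g, mean_g, share_b, mean_b):
    """Split the mean-score gap (group minus base) into five additive parts."""
    parts = {}
    # Stage 1: representation effects. Scorecard means are held at the base
    # group's values (an assumption), so only the shares differ.
    for k in ("thin", "major_derog"):
        parts["representation_" + k] = (
            (share_g[k] - share_b[k]) * (mean_b[k] - mean_b["clean"])
        )
    # Stage 2: the remainder, attributed to within-scorecard mean differences.
    for k in SCORECARDS:
        parts["within_" + k] = share_g[k] * (mean_g[k] - mean_b[k])
    return parts

# Hypothetical shares and mean scores for a base group and a comparison group.
share_base = {"thin": 0.1, "clean": 0.7, "major_derog": 0.2}
mean_base = {"thin": 40.0, "clean": 65.0, "major_derog": 25.0}
share_grp = {"thin": 0.2, "clean": 0.4, "major_derog": 0.4}
mean_grp = {"thin": 38.0, "clean": 60.0, "major_derog": 20.0}

parts = decompose(share_grp, mean_grp, share_base, mean_base)
total_gap = overall_mean(share_grp, mean_grp) - overall_mean(share_base, mean_base)
```

Because the shares sum to 1 in each population, the two representation terms and the three within-scorecard terms add up exactly to `total_gap`.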
Results are shown as a decomposition of the difference in scores between individuals in each population and a "base" group (table 27). For racial and ethnic groupings, the base group is non-Hispanic whites; for national origin, it is non-foreign-born; for sex, males; for marital status, married males; for age, individuals aged 62 or older; for census-tract income, middle-income tracts; for tract minority percentage, tracts with a minority population less than 10 percent; and for degree of urbanization, urban census tracts.
Looking across populations, the largest differences are between blacks and non-Hispanic whites and between individuals younger than age 30 and those aged 62 or older. The following discussion focuses on these two comparisons, although the tables present differences for all populations.
The difference in mean FRB base score between blacks and non-Hispanic whites, 28.3 points, is primarily due to the differences in the population distributions on the different scorecards. More than half of the point difference is attributable to the fact that blacks have higher representation than non-Hispanic whites on the thin-file and major-derogatory scorecards combined, and most of that higher representation comes from the major-derogatory scorecard. Differences in mean scores within each scorecard are also substantial. A similar pattern is observed for the differences in scores between Hispanics and non-Hispanic whites.
Scorecard differences account for a portion of the differences in mean credit scores across age groups. However, patterns are different from those found for race or ethnicity. Young individuals are disproportionately represented on the thin-file scorecard, but that is not the major factor explaining score differences between those younger than age 30 and the base group. As noted below, differences in mean scores within scorecards are the source of most of the difference in overall mean scores between the young and the old.
For all comparisons among populations, differences in mean scores within scorecard play an important role. Mean differences across the three scorecards are generally of the same sign, although magnitudes vary. Groups that are disproportionately represented on the major-derogatory scorecard also have lower mean scores on the three scorecards, with one glaring exception: Recent immigrants are overrepresented on the clean-file scorecard but have much lower mean scores within the clean-file scorecard than either other foreign-born or non-foreign-born individuals.
Of the three scorecards, the major-derogatory scorecard accounted for the largest portion of the difference in mean scores between blacks and non-Hispanic whites. Differences in mean scores within the major-derogatory scorecard accounted for almost one-fifth of the total difference in mean scores between blacks and non-Hispanic whites.
For age differences, the largest portion of the difference between individuals younger than age 30 and those aged 62 or older derives from differences in mean scores between these two groups within the clean-file scorecard.
The second decomposition is scorecard specific and focuses on individual credit characteristics. For each scorecard, we attribute differences in the mean scores across demographic groups to specific individual credit characteristics (tables 28.A--C).
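A characteristic-level attribution of this kind can be illustrated with the sketch below. It assumes a purely additive scorecard in which each characteristic contributes points that sum to the score (before any renormalization), so that the mean-score gap splits exactly into per-characteristic gaps. The characteristic names and point values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def contributions(points_group, points_base):
    """Per-characteristic contribution to the mean-score gap (group minus base).

    Each argument maps a characteristic name to the list of points that the
    characteristic contributes to each individual's score.
    """
    return {name: mean(points_group[name]) - mean(points_base[name])
            for name in points_base}

# Hypothetical points for three individuals in each population.
group_points = {
    "months_since_delinquency": [5, 10, 15],
    "revolving_utilization": [20, 10, 30],
    "public_records": [0, 5, 10],
}
base_points = {
    "months_since_delinquency": [15, 20, 25],
    "revolving_utilization": [25, 25, 25],
    "public_records": [10, 10, 10],
}

gap = contributions(group_points, base_points)
total_gap = sum(gap.values())
# Share of the within-scorecard gap accounted for by each characteristic.
shares = {name: value / total_gap for name, value in gap.items()}
```

When contributions have opposite signs, individual shares can exceed 100 percent or be negative, which is one reason small gaps can look complex.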
For the thin-file scorecard, a difference of 3 points in mean scores on this scorecard was found between non-Hispanic whites and blacks. More than 80 percent of this difference is accounted for by three credit characteristics ("the total number of public records and derogatory accounts with an amount owed greater than $100," "total number of months since the most recent account delinquency," and "percentage of total remaining balance to total maximum credit for all open revolving accounts reported in the past 12 months"). Almost two-thirds of the 6.8 point difference in mean scores on the thin-file scorecard between younger and older individuals is due to the same three credit characteristics. For all other groups, mean differences in credit scores across populations on the thin-file scorecard are small (at most a couple of points) and complex, as the effects of credit characteristics are often in different directions.
For the major-derogatory scorecard, three credit characteristics ("total number of public records and derogatory accounts with an amount owed greater than $100," "total number of months since the most recent account delinquency," and "percentage of accounts with no late payments reported") are found to account for more than 60 percent of the difference between blacks and non-Hispanic whites on that scorecard. All other credit characteristics played some role, but no other individual characteristic accounted for as much as 10 percent of the mean score difference within that scorecard. Some differences across age cohorts also appear on this scorecard. The credit characteristic that accounts for the largest portion (about one-fifth) of the age difference is "average age of accounts on credit report."
The clean-file scorecard contains significant differences in mean scores across age cohorts. The credit characteristic that accounts for the largest portion of the difference in the mean scores between those younger than age 30 and those aged 62 or older is "average age of accounts on credit report." As noted above, recent immigrants had substantial differences in mean score within the clean-file scorecard. More than two-thirds of this difference can be attributed to differences in the credit characteristic "average age of accounts on credit report." The differences between blacks and non-Hispanic whites on the clean-file scorecard arise primarily from "total number of months since most recent account delinquency" and "percentage of total remaining balance to total maximum credit for all open revolving accounts reported in the past 12 months."
Differences in mean credit scores across populations can also be decomposed into the portions attributable to each of the five groups of credit characteristics designated by Fair Isaac: (1) types of credit in use, (2) payment history, (3) amounts owed, (4) length of credit history, and (5) new credit. The within-scorecard differences in mean credit scores across population groups can be aggregated across the three scorecards (table 29). The results from these credit-characteristic-group decompositions are similar to those for individual credit characteristics. Of the 11.0 point difference in mean credit scores between blacks and non-Hispanic whites that is attributable to within-scorecard differences, 7.7 points, or 70.2 percent, of the difference derives from credit characteristics related to the group "payment history."
The within-scorecard differences in mean credit scores by age, which were as high as 22.1 points between those younger than age 30 and those aged 62 or older, are primarily attributable to differences in credit characteristics related to the group "payment history," which accounts for 10.9 points, or 49.5 percent, of this difference. That group of credit characteristics also explained an even higher share (about 60 percent) of the (smaller) within-scorecard differences in mean credit score between the other age groups and individuals aged 62 or older.
The final population with relatively large within-scorecard differences in mean credit scores was recent immigrants. The credit characteristic group that contributes the most to the difference in mean credit scores between recent immigrants and non-foreign-born individuals is "length of credit history," which accounts for 12.4 points. About half of the 12.4 point difference in mean credit scores between recent immigrants and non-foreign-born individuals is offset by higher mean scores for recent immigrants in the credit characteristic group "payment history." As a result, the overall within-scorecard difference in mean credit scores between recent immigrants and non-foreign-born individuals is 8.4 points.
The previous section examined the extent to which differences in mean credit scores across demographic groups could be attributed to specific credit characteristics. Another way of providing an inference about the potential for credit characteristics to have differential effects is to examine what the effect would be on the scores of each demographic group if each credit characteristic included in the model were dropped in turn. Also, the effects of dropping groups of related credit characteristics are evaluated. As in the preceding exercise, this evaluation must be conducted separately for each scorecard.
The analysis required two steps. First, each of the three scorecards of the FRB base model was reestimated (and renormalized to a rank-order scale of zero to 100) by dropping each included characteristic one at a time. Second, credit scores derived from each of the models that exclude an individual characteristic were compared, for each population, with scores from the original FRB base model to determine how the exclusion of that characteristic affects scores across demographic groups. If the excluded characteristic is highly correlated with a demographic characteristic, then the scores of individuals with that demographic characteristic should change substantially. This process is repeated for each of the credit characteristics on each of the three scorecards of the FRB base model.
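The comparison step can be sketched as follows. The example does not reestimate a scorecard; it assumes the raw outputs of the base model and of a model with one characteristic dropped are already available (the numbers here are hypothetical), and it uses a simple percentile-rank mapping as one possible reading of "rank-order scale of zero to 100."

```python
def rank_normalize(raw):
    """Map raw model outputs to a 0-100 rank-order scale.

    Tied values share their average rank. Requires at least two observations.
    """
    n = len(raw)
    order = sorted(range(n), key=lambda i: raw[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and raw[order[j + 1]] == raw[order[i]]:
            j += 1
        average_rank = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return [100 * r / (n - 1) for r in ranks]

def mean_change_by_group(base_raw, alt_raw, groups):
    """Mean change in normalized score (alternative minus base) per group."""
    base, alt = rank_normalize(base_raw), rank_normalize(alt_raw)
    totals = {}
    for b, a, g in zip(base, alt, groups):
        running_sum, count = totals.get(g, (0.0, 0))
        totals[g] = (running_sum + (a - b), count + 1)
    return {g: s / c for g, (s, c) in totals.items()}

# Hypothetical raw outputs for five individuals from the base model and from
# a model reestimated without one characteristic.
base_raw = [0.1, 0.4, 0.2, 0.9, 0.6]
dropped_raw = [0.3, 0.4, 0.2, 0.9, 0.5]
age_group = ["under_30", "62_plus", "62_plus", "62_plus", "under_30"]

changes = mean_change_by_group(base_raw, dropped_raw, age_group)
```

Because both sets of scores are rank-normalized over the same individuals, gains for one group are necessarily offset by losses elsewhere; the quantity of interest is how the changes are distributed across groups.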
Results of this analysis indicate that, for most populations, dropping any single characteristic has only a slight effect on credit scores, typically 1 point or less (tables 30.A--C). Thus, such changes have little effect on differences in mean score between population groups. The small change in scores when a single characteristic is dropped reflects the high degree of correlation among the characteristics in the scoring model. The small effect of dropping a single characteristic holds across the three scorecards.
One exception to this pattern occurs on the clean-file scorecard and affects age groups and foreign-born individuals. Specifically, dropping the characteristic "average age of accounts on credit report" and reestimating the clean-file-scorecard model significantly raises mean credit scores for individuals on the clean scorecard younger than age 30 (5.4 points) and recent immigrants (6.7 points). The effect of dropping this credit characteristic is smaller for other groups and both raises and lowers scores. The net effect is to reduce the differences in mean score on the clean-file scorecard between individuals younger than age 30 and those aged 62 or older by about 7 points, or about one-fourth. Also, dropping the credit characteristic "average age of accounts on credit report" reduces the differences in mean score on the major-derogatory scorecard between individuals younger than age 30 and those aged 62 or older by about 2.5 points, or about one-fifth.
The analysis was extended to consider the effects of dropping groups of related credit characteristics as defined by Fair Isaac. The effects of dropping groups of credit characteristics were largely similar to the effects found when individual characteristics were dropped from the FRB base model. While changes in scores were somewhat larger when a group of characteristics was dropped, for the most part, the effects on credit scores were small for all populations. For example, for blacks, the group of credit characteristics whose exclusion had the largest effect on mean scores on the thin-file and major-derogatory scorecards was the group related to "payment history"; dropping it raised the mean credit score for blacks by over 5 points on the thin-file scorecard and about 2 points on the major-derogatory scorecard (tables 31.A--C).
Large changes in mean credit scores by age and for recent immigrants were observed when the group of credit characteristics related to "length of credit history" was dropped from the clean-file or major-derogatory scorecards. (Only one credit characteristic from this group appeared on these two scorecards, and it was the same characteristic.) The largest differences for these two demographic groups were observed on the clean-file scorecard, where the exclusion of credit characteristics relating to "length of credit history" raised the mean credit scores of those younger than age 30 by 5.4 points and those of recent immigrants by 6.7 points. Notably, the net result of dropping the group of credit characteristics related to "length of credit history" is to narrow the difference between the mean credit scores of recent immigrants and non-foreign-born individuals on the clean-file scorecard from 14.6 points to 7.6 points (data for non-foreign-born individuals are not shown in tables).
The analysis up to this point has been limited to credit characteristics in the FRB base model. In this section, we examine the effect on credit scores of adding other characteristics one by one to each scorecard. The model for each scorecard was reestimated (and renormalized) with the addition of a particular characteristic not in the base model for that scorecard, and the resulting credit scores were compared with those from the FRB base model.
Across population groups, credit scores change very little following the addition of a new credit characteristic. None of the additional credit characteristics changed the mean credit score for blacks on any of the three scorecards by more than 0.39 point (tables 32.A--C). In fact, on the major-derogatory scorecard, on which more than three-fifths of blacks are scored, the largest change in mean scores was a decrease of 0.1 point, which resulted when the characteristic "total number of finance installment accounts" was added to the model. For Hispanics, the results were largely the same, though the changes in mean scores on each of the three scorecards generally varied over a somewhat wider range than for blacks.
The changes in mean scores resulting from the above process were generally larger (both positive and negative) for age groups than for racial and ethnic groups. The range of changes was still small, however. The largest negative effect on the mean scores of any age group came from the inclusion of the credit characteristic "average balance of all open accounts reported in the past 12 months" on the thin-file scorecard, which produced a 1.78 point decline in the mean scores of individuals aged 62 or older. The largest positive effect came from the addition of the credit characteristic "total number of months consumer has had a credit report" to the thin-file scorecard, which raised the credit scores of individuals aged 62 and older by 1.24 points.
Although none of the credit characteristics that were omitted from the FRB base model was found to have a significant effect on mean credit scores for any demographic group, the omitted credit characteristics that relate specifically to finance company trades were identified to the extent possible and analyzed in detail because of concerns that have been raised publicly about their potential for a differential effect on blacks. Of the 312 credit characteristics included in the TransUnion data, 24 relate specifically to credit accounts involving finance companies (table 33). Both positive and negative changes in the mean credit scores of blacks result from the addition of each of the omitted credit characteristics related to finance companies, although the largest change was a decrease of only 0.1 point from the addition of the credit characteristic "total number of finance installment accounts" on the major-derogatory scorecard. The largest positive change in mean scores for blacks was only 0.09 point and came from the addition of either of two characteristics, "percentage of total remaining balance to total maximum credit for all open personal loan accounts" and "total number of finance installment accounts," on the clean-file scorecard.
In the previous sections, the potential for individual credit characteristics to have a differential effect was explored by dropping or adding such characteristics one by one from the FRB base model and, after each removal or addition, evaluating the change in credit scores for different populations and the overall fit of the model. Although informative, these analyses are inferential and do not provide a definitive assessment of differential effects for different populations and credit characteristics. As stated earlier, a definitive assessment requires a comparison of the weights credit characteristics receive in the FRB base model with those that would be estimated in a demographically neutral environment. These assessments can be made for individual credit characteristics. Assessments can also be made for the model as a whole by examining changes in mean credit scores for different populations using both the FRB base model and models estimated in demographically neutral environments. Assessments made for the model as a whole reflect the collective differential effect arising from all of the credit characteristics included in the model.
Because of the lack of evidence for sex-based differential effect, the detailed results are not presented here. The remaining analysis focuses on the protected populations--the racial or ethnic groups and the age groups--which, as discussed in the previous section, exhibited the highest potential propensity to experience a differential effect. Consequently, additional estimations were conducted in a "race neutral" environment (meaning racially and ethnically neutral) and in an "age neutral" environment.
The general approach taken was the same for both race and age estimations. The credit characteristics and attributes of the FRB base model were frozen and the attribute weights reestimated (and scores recalculated) in demographically neutral environments. For each group, two different concepts of demographic neutrality were employed. The first way of creating neutrality was to restrict the estimating sample to a single demographic group. For the racial assessment, the sample was restricted to non-Hispanic whites (the "white only" model). For age, the estimating sample was limited to individuals aged 40 or older (the "older age" models).
Restricting the sample for the white-only and older-age models has the virtue of ensuring that the estimation of associated weights does not reflect correlations between credit characteristics and other racial, ethnic, or age groups. A disadvantage is that the estimated attribute weights reflect the relationship between performance and credit characteristics only for the population group used in the estimation. In the present case, another disadvantage is that the sample sizes are smaller.
In the second way of creating neutrality, the entire sample is used for the estimation, but in reestimating the attribute weights the estimations include shifts in the racial intercept (the "racial-indicator variable" model) or shifts in the age intercept (the "age-indicator variable" model). The shifts in the racial or age intercepts are used only in model estimation; they are not used in creating credit scores.
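The indicator-variable approach can be written schematically as follows. The notation is illustrative and not taken from the report: it assumes a generic linear index $\eta_i$ for individual $i$ (passed through whatever link function the scorecard estimation uses), attribute variables $x_{ik}$ with weights $\beta_k$, and indicators $D_{ig}$ equal to 1 when individual $i$ belongs to demographic group $g$, with one group $g_0$ omitted as the reference.

```latex
% Estimation: attribute weights are fitted jointly with group intercept shifts.
\eta_i \;=\; \alpha \;+\; \sum_{k} \beta_k \, x_{ik} \;+\; \sum_{g \neq g_0} \gamma_g \, D_{ig}

% Scoring: the estimated shifts \hat{\gamma}_g are discarded, so the score
% depends only on credit characteristics.
s_i \;=\; \hat{\alpha} \;+\; \sum_{k} \hat{\beta}_k \, x_{ik}
```

Under this formulation the $\hat{\beta}_k$ are estimated net of any level difference in performance across groups, while individuals with identical credit characteristics still receive identical scores.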
The race- and age-indicator-variable models have the advantage of using the full sample and of using all population groups in estimating the relationship between performance and credit characteristics. A disadvantage of this method is that race- and age-neutrality is defined very simply as a shift in mean credit scores in which everyone in the same racial or ethnic group or age group experiences an identical shift (up or down) in their scores. This common shift precludes accounting for the more-complex ways that age or race may affect model estimation.
Reestimating the attribute weights in demographically neutral environments is not a complete test of the potential for differential effect. It is possible that the presence of a large differential effect could mute the importance of a credit characteristic, and consequently that characteristic might not be included in a model estimated in a demographically neutral environment. To test for this possibility, each of the credit characteristics not included in the FRB base model was added, one at a time, to the race- and age-neutral versions of the model, and their effects on scores for different populations were evaluated. This process was identical to the process described earlier when the effects of adding credit characteristics to the FRB base model were evaluated.
The interpretation of the results in this section focuses on the implications for a differential effect. As discussed above, if none of the credit characteristics in a scoring model impose a differential effect, then model results estimated in a demographically neutral environment would be nearly identical to those estimated with the entire sample or estimated without controls for personal demographics. That is, credit scores and the weights assigned to attributes should change little. The overall predictiveness of the model should also be largely unaffected.
Alternatively, one or more of the credit characteristics included in the model might embody at least some element of differential effect for age, race, or ethnicity. In that event, two effects should be observed when the model is estimated in a demographically neutral environment: (1) The overall model predictiveness should weaken and (2) some change should appear in the relative scores across population groups. The implication of this second item is that those groups whose scores rise are the groups that are hurt by differential effect; the groups that experience a decline in scores benefit from differential effect. Finally, if differential effect works by muting the effects of a credit characteristic, then adding the muted characteristic to the FRB base model in a demographically neutral environment should increase the predictiveness of the model and change mean scores of some groups.
As described previously, one aspect of differential effect is model fit or predictiveness. There are several different ways that the predictiveness of models can be compared. One is with the KS statistic and another is with the divergence statistic. A third way is to look at changes in the distribution of scores for individuals with good performance and for individuals with bad performance; these changes can be examined in different ways. Also, the performance measure and sample over which model fit is assessed must be defined. Here, we assess the predictiveness of each model for the full sample of 232,467 individuals using the five performance measures defined earlier.
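As a point of reference, the two summary statistics can be sketched as follows. The sketch assumes the common textbook definitions: the KS statistic as the largest gap between the cumulative score distributions of bad and good performers, and the divergence statistic as the squared difference in mean scores divided by the average of the two score variances. The study's exact computations may differ in detail, and the function names are illustrative.

```python
from statistics import mean, pvariance


def split_by_performance(scores, is_bad):
    """Return (scores of good performers, scores of bad performers)."""
    goods = [s for s, b in zip(scores, is_bad) if not b]
    bads = [s for s, b in zip(scores, is_bad) if b]
    return goods, bads


def ks_statistic(scores, is_bad):
    """Largest gap, in percentage points, between the cumulative score
    distributions of bad and good performers."""
    goods, bads = split_by_performance(scores, is_bad)
    gap = 0.0
    for cut in sorted(set(scores)):
        share_bad = sum(s <= cut for s in bads) / len(bads)
        share_good = sum(s <= cut for s in goods) / len(goods)
        gap = max(gap, abs(share_bad - share_good))
    return 100.0 * gap


def divergence(scores, is_bad):
    """Squared difference in mean scores over the average score variance."""
    goods, bads = split_by_performance(scores, is_bad)
    numerator = (mean(goods) - mean(bads)) ** 2
    denominator = 0.5 * (pvariance(goods) + pvariance(bads))
    return numerator / denominator
```

A model that separates good and bad performers perfectly yields a KS statistic of 100 under this definition; a model with no separating power yields a value near zero.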
A comparison of the KS statistics for different populations using the FRB base model reveals relatively small differences across groups (table 34). We present two different versions of KS statistics. The first column is the "raw" KS statistic for each population. The use of this statistic can be problematic in comparing fit across different groups since it is affected by the distribution of credit scores within a population group. The second column shows a normalized or "adjusted" KS statistic that displays what the KS statistic would be if each population group were reweighted to have the same overall score distribution as the population as a whole. The adjusted KS statistic is the more meaningful one to use in comparing model fit across different models.
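One way such an adjustment could be carried out is sketched below. The reweighting scheme is an assumption made for illustration, not a description of the calculation behind table 34: scores are grouped into bands of a hypothetical width, each group member is weighted by the ratio of the population share to the group share in that member's band, and the KS statistic is then computed from the weighted distributions.

```python
from collections import Counter


def weighted_ks(scores, is_bad, weights):
    """KS statistic (percentage points) from weighted good/bad distributions."""
    total_bad = sum(w for w, b in zip(weights, is_bad) if b)
    total_good = sum(w for w, b in zip(weights, is_bad) if not b)
    rows = sorted(zip(scores, is_bad, weights))
    cum_bad = cum_good = gap = 0.0
    for i, (s, b, w) in enumerate(rows):
        if b:
            cum_bad += w
        else:
            cum_good += w
        # Compare the distributions only after the last record at this score.
        if i + 1 == len(rows) or rows[i + 1][0] != s:
            gap = max(gap, abs(cum_bad / total_bad - cum_good / total_good))
    return 100.0 * gap


def adjusted_ks(group_scores, group_is_bad, population_scores, band_width=10):
    """KS for a group after reweighting it to the population's score bands.

    Bands in which the group has no members cannot be represented, so the
    match to the population distribution is only approximate in that case.
    """
    def band(s):
        return int(s // band_width)

    pop = Counter(band(s) for s in population_scores)
    grp = Counter(band(s) for s in group_scores)
    n_pop, n_grp = len(population_scores), len(group_scores)
    weights = [(pop[band(s)] / n_pop) / (grp[band(s)] / n_grp)
               for s in group_scores]
    return weighted_ks(group_scores, group_is_bad, weights)
```

When a group's score distribution already matches that of the population, every weight equals 1 and the adjusted statistic coincides with the raw one.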
A comparison of either KS statistics or mean score differences between goods and bads (the numerator in the calculation of the divergence statistic) between the FRB base model and the two racially neutral models shows virtually no difference in fit (table 35). Further, examining the mean scores for individuals with good or bad performance reveals that the mean scores are almost identical between the FRB base model and either of the two racially neutral models.
The second part of assessing differential effect is to look at changes in credit scores between the FRB base model and the racially neutral models. Descriptive statistics by racial group for the FRB base model and the two racially neutral models indicate that there is virtually no difference between the group mean and median scores or distribution by decile across the models (table 36). For example, mean scores for blacks are only 0.1 point higher for the white-only and racial-indicator-variable models. Score changes are also quite small when the population is segmented by credit-score quintile (table 37). Overall, only about 2 percent of individuals have a score change of 5 points or more (and virtually none of the individuals in the bottom 2 credit-score quintiles change scores by 5 points or more).
Another way of looking at differential effect is to examine changes in mean performance residuals for different population groups (table 38). Because performance residuals reflect the average difference between actual performances for each racial group and the predicted performance at each score level based upon the entire population, changes in these residuals can only occur if credit scores change for the population group when estimated in a demographically neutral environment, thus reflecting differential effect. Performance residuals are virtually unchanged for blacks or other racial groups in each of the two racially neutral models.
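The residual calculation described above can be sketched as follows, under stated assumptions: performance is coded 1 for bad and 0 for good, predicted performance at a score level is the bad rate among all individuals at that level, and a positive residual therefore indicates underperformance. The sign convention and the function name are illustrative and may not match those used in table 38.

```python
from collections import defaultdict


def mean_performance_residual(scores, is_bad, groups, target_group):
    """Mean of (actual - predicted) bad performance for one group, in
    percentage points, with predictions taken from the whole population."""
    # Bad count and total count at each score level, whole population.
    totals = defaultdict(lambda: [0, 0])
    for s, b in zip(scores, is_bad):
        totals[s][0] += int(b)
        totals[s][1] += 1

    residuals = [
        int(b) - totals[s][0] / totals[s][1]
        for s, b, g in zip(scores, is_bad, groups)
        if g == target_group
    ]
    return 100.0 * sum(residuals) / len(residuals)
```

Because the predictions come from the population bad rate at each score level, a group's mean residual can change between models only if its members' scores change.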
In contrast to race, it appears that mean credit scores and performance residuals for recent immigrants differ between the models estimated in a race-neutral environment and the FRB base model. Notably, mean credit scores for recent immigrants are 0.3 point higher in the two racially neutral models, and their overperformance declines also by about 0.3 percentage point. This result suggests that the FRB base model embeds a slight negative differential effect, as measured by the treatment of this group in a racially neutral environment. This pattern is found only for recent immigrants, as scores and performance measures for foreign-born individuals as a whole are unchanged.134
Tests of adding credit characteristics to the white-only and the racial-indicator-variable models showed no evidence of important excluded credit characteristics. Results are not presented since they are virtually identical to those presented in the previous section, where credit characteristics were added to the FRB base model in a non-demographically neutral environment.135
Differential effects and race or ethnicity. There is little evidence from the analysis here that any of the credit characteristics included in the FRB base model embeds negative differential effects for any racial or ethnic group or that any important credit characteristic was left out of the model because a differential effect muted its predictiveness. Performance residuals and mean credit scores by group are virtually unchanged between those estimated using the FRB base model and either of the racially neutral models. Further, the lack of a differential effect is also evidenced by the lack of improvement in predictiveness in moving to the FRB base model from the racially neutral models. The lack of a differential effect for race or ethnicity appears to be driven mainly by the lack of correlation between credit characteristics and race or ethnicity.
These results strongly suggest that, in the aggregate, there is no differential effect for race or ethnicity in the FRB base model. Nonetheless, it is possible that individual credit characteristics have offsetting effects that go in different directions. To investigate this possibility, we compared the attribute weights assigned in the FRB base model with those estimated for the racially neutral models. The differences in the weights assigned to the attributes are minor.
For example, the attribute weights for the finance company credit characteristic "total number of open personal finance installment accounts reported in the past 12 months," a credit characteristic for which concerns have been raised, are virtually the same across the three models (table 39). Further, dropping the finance company credit characteristic would have an adverse effect on model predictiveness. This can be seen by examining changes in the evaluation of good performers and bad performers between the FRB base model and the model dropping the credit characteristic, "total number of open personal finance installment accounts reported in the past 12 months." The loss of predictiveness is shown by a comparison of the sum of the total percentage of bad performers that have score decreases plus good performers that have score increases with the sum of the percentage of bad performers whose scores increase plus good performers whose scores decline: The greater the difference, the greater the loss in predictiveness. Results from dropping the characteristic "total number of open personal finance installment accounts reported in the past 12 months" from the clean scorecard are shown in table 40. For each racial or ethnic group, as well as for the total population, the percentage of individuals whose scores move 1 point or more in the direction of improved model predictiveness is significantly smaller than the percentage of individuals whose scores move 1 point or more in the direction that implies less model predictiveness.
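The comparison just described amounts to a simple tally, sketched here with hypothetical function and argument names. Each individual whose score moves by at least the threshold is classified by whether the move improves the separation of good and bad performers (a bad performer's score falls or a good performer's score rises) or worsens it; shares are expressed as a percentage of all individuals.

```python
def predictiveness_shift(base_scores, new_scores, is_bad, threshold=1.0):
    """Return (percent moving toward better separation,
               percent moving toward worse separation)."""
    improved = worsened = 0
    for base, new, bad in zip(base_scores, new_scores, is_bad):
        change = new - base
        if abs(change) < threshold:
            continue  # moves smaller than the threshold are ignored
        if (bad and change < 0) or (not bad and change > 0):
            improved += 1
        else:
            worsened += 1
    n = len(base_scores)
    return 100.0 * improved / n, 100.0 * worsened / n
```

If the second share exceeds the first, the alternative model is, on net, less predictive than the base model by this measure.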
A similar differential effect analysis was conducted for the age of individuals. A slight modification to the process had to be made, as age is a continuous rather than a categorical variable. The model was estimated using only individuals aged 40 or older as the restricted sample, an approach comparable to that for the restricted sample used to estimate the white-only models. However, even the restricted age sample still has some variation due to age and thus is not completely age neutral. To account for this age variation, the older-person model was estimated with age-indicator variables for each year from age 40 to age 75 and then in five-year intervals up to age 90, with a final indicator variable for those older than age 90. The full age-indicator-variable model was also estimated using the entire population with the same age-based indicator variables as used in the older-age model, but with additional indicator variables for each age between 18 and 39 and with an additional indicator variable for those younger than age 18.
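The age-indicator scheme can be written out as a mapping from age to indicator variable. The sketch below assumes integer ages and assumes that the five-year intervals are 76 to 80, 81 to 85, and 86 to 90; the text does not state the exact interval boundaries, and the label names are hypothetical.

```python
def age_indicator(age, older_sample_only=False):
    """Return the label of the age-indicator variable for an integer age."""
    if older_sample_only and age < 40:
        raise ValueError("the older-person model uses only ages 40 and older")
    if age < 18:
        return "under_18"
    if age <= 75:
        return f"age_{age}"  # one indicator per single year of age
    if age <= 90:
        low = 76 + 5 * ((age - 76) // 5)  # assumed bins: 76-80, 81-85, 86-90
        return f"age_{low}_{low + 4}"
    return "over_90"
```

Under these assumptions the older-person model carries 40 distinct age categories and the full-sample model carries 63; in an actual regression one category would typically serve as the omitted reference group.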
There appears to be no change in overall predictiveness for the age-neutral models relative to the FRB base model (table 41). The result holds whether model predictiveness is measured by the KS statistic or by the relative mean scores of individuals experiencing good or bad performance. Indeed, the KS statistics for the age-neutral models actually increase by 0.1 point over the FRB base model.
Although overall predictiveness does not change when credit scores are estimated in an age-neutral environment, mean scores of some groups do change (tables 42 and 43). For example, the mean score of individuals younger than age 30 falls 0.4 point when the age-indicator-variable model is compared with the FRB base model. However, the scores of individuals aged 62 or older increase by 1.5 points.136 Changes in mean performance residuals are consistent with the score changes (table 44). For example, underperformance of individuals younger than age 30 falls from 0.4 point in the FRB base model to 0.1 point in the age-indicator-variable model. The slight underperformance of individuals aged 62 or older in the FRB base model widens from 0.1 to 0.3 point. Recent immigrants also show differences in mean scores and performance residuals between the FRB base model and the age-neutral models. Scores for this group are about 0.7 point lower with the age-neutral models compared with the FRB base model, and the overperformance residuals are about 2 percentage points higher.
Results from adding credit characteristics to the age-neutral models showed little evidence of differential effect. As with the race-neutral models, results are not presented here since they are virtually unchanged from those found when characteristics were added to the FRB base model.
Differential effects and age. Unlike race and ethnicity (except as reflected by recent immigrant status), there is some evidence that the FRB base model credit characteristics may embed some differential effects by age, but the effect appears small. Individuals younger than age 30 experience positive differential effect, and individuals aged 62 or older experience negative differential effect in the FRB base model. This is reflected in the fact that mean scores for individuals younger than age 30 are about 0.4 point higher in the FRB base model than in the age-neutral models, but scores for individuals aged 62 or older are about 1.5 points lower. As was the case with the racially neutral models, recent immigrants also appear to experience an age-related differential effect. However, it is in the opposite direction from that found when comparisons were made in racially neutral environments. Mean scores of recent immigrants are about 0.7 point higher in the FRB base model than in models estimated in age-neutral environments.
To further understand a potential source of the differential effect, changes in the weights associated with each attribute and credit characteristic were examined. Much of the change in scores can be traced to changes in the attribute weights associated with the credit characteristic "average age of accounts on credit report." The weights associated with the attributes for this characteristic have a wider range in the age-neutral models than in the FRB base model (table 45). Consequently, those individuals with shorter average account histories (for example, younger individuals and recent immigrants) have higher scores in the FRB base model, and individuals with longer average account histories (typically older individuals) have lower scores in the FRB base model.
The impact of these changes on the younger group is more complex than is apparent from the aggregate changes in mean scores and performance for this group. As shown in table 46, FRB base scores are lower than, or about the same as, those of the age-neutral models for individuals aged 19 and 20 and somewhat higher for individuals aged 21 through 29 [sentence corrected as of August 23, 2007]. In part these changes in different directions reflect the fact that individuals aged 19 through 22 underperform in the age-neutral environment, whereas individuals aged 23 through 29 overperform.
As noted, recent immigrants experience a positive differential effect in the FRB base model. However, it is also the case that this group overperforms, in part because their credit profiles resemble those of younger individuals, though they perform like members of their own age cohort. The positive differential effect helps this group by increasing their average scores in the FRB base model, but the score increase is not sufficient to eliminate their overperformance. As noted, much of their overperformance stems from lower score levels as a consequence of having short credit histories, at least as represented in U.S. credit records. Mitigating the effects of a short credit history on recent immigrants would come at a cost. For example, dropping the credit characteristic "average age of accounts on credit report" from the clean-file and major-derogatory scorecards and dropping another length-of-credit-history characteristic, "total number of months since the most recent update on an account," from the thin-file scorecard would lower the overall KS statistic for the model from 73.0 to 72.8.
Another way of looking at the effect of dropping credit characteristics related to length of credit history is to examine the changes in evaluation of good performers and bad performers when these characteristics are dropped from the FRB base model (table 47). For example, when the credit characteristic, "average age of accounts on credit report," is dropped from the clean scorecard, 46 percent of sample individuals' scores move by 1 point or more in the direction consistent with worse model performance. In contrast, 30 percent of individuals have scores that move by 1 point or more in the direction consistent with improved model predictiveness. On net, these changes imply a significant decrease in model predictiveness. Thus, dropping the characteristics related to length of credit history in order to raise the scores of recent immigrants, which are too low even in an age-neutral environment, would result in a significant decrease in model predictiveness for other individuals.
The investigation of differential effects arising through individual credit characteristics was restricted to the FRB base model developed for this purpose, and thus these results are dependent upon the choices made in building this model and may not apply to other models used in the industry. Nevertheless, several generalizations are suggested by these findings.
First, there is little evidence that any of the credit characteristics included in the FRB base model embed negative differential effects for any racial or ethnic group, and there is no evidence that any important credit characteristic was excluded from the model because its predictiveness was muted by differential effect. Those results appear to be due mainly to the lack of correlation between credit characteristics and race or ethnicity. To the extent that the credit characteristics examined here are typical of those used in other generic credit history scoring models, the results presented here would likely apply to those models as well. A similar conclusion can be drawn about sex.137
Second, the analysis did find mild evidence of differential effects by age, with younger individuals and recent immigrants experiencing positive differential effects (higher scores) in the FRB base model, and older individuals experiencing negative differential effects (lower scores) in the FRB base model. These effects appear to be caused by credit characteristics related to length of credit history having somewhat more muted effects in the FRB base model than they would have in a model estimated in an age-neutral environment. The consequence of this more muted effect is to reduce the scores of individuals with long credit histories and to increase the scores of individuals with short credit histories.
Mitigating the differential effect by dropping the credit characteristics related to length of credit history would be counterproductive because these characteristics are receiving too little weight in the FRB base model rather than too much. An alternative way of mitigating the effect would be to use the weights estimated in an age-neutral environment, although a choice must be made about which age-neutral environment to use for estimation since the resulting weights differ depending upon the way age-neutrality is achieved. For example, if the weights estimated for the attributes associated with the credit characteristic "average age of accounts on credit report" in the age-neutral age-indicator-variable model are substituted for the original weights for these attributes in the FRB base model (but all other attribute weights are left unchanged), the positive differential effect for recent immigrants and younger individuals is virtually eliminated. However, for individuals aged 62 or older, the fact that only about one-half of the negative differential effect is eliminated implies that other credit characteristics must be contributing to this effect. Predictiveness drops a small amount when the different weights are used; however, the reduction stems entirely from the elimination of the proxying effects in the characteristic weights.
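The substitution described above can be sketched with a scorecard represented as a mapping from each credit characteristic to the weights of its attributes. The representation, the attribute labels, and every number below are hypothetical and chosen only to show the mechanics; they are not the estimated weights from table 45.

```python
import copy


def substitute_weights(base_scorecard, neutral_scorecard, characteristic):
    """Copy the base scorecard, replacing the attribute weights of one
    characteristic with those from the age-neutral scorecard."""
    adjusted = copy.deepcopy(base_scorecard)
    adjusted[characteristic] = dict(neutral_scorecard[characteristic])
    return adjusted


def score(scorecard, profile):
    """Sum the weights of the attributes that describe one individual."""
    return sum(scorecard[c][attribute] for c, attribute in profile.items())


# Hypothetical weights: the age-neutral range for account age is wider.
base = {
    "average_age_of_accounts": {"short": -5.0, "long": 5.0},
    "utilization": {"low": 3.0, "high": -3.0},
}
neutral = {
    "average_age_of_accounts": {"short": -8.0, "long": 8.0},
    "utilization": {"low": 2.5, "high": -2.5},
}
adjusted = substitute_weights(base, neutral, "average_age_of_accounts")
short_history = {"average_age_of_accounts": "short", "utilization": "low"}
```

With these illustrative numbers, an individual with a short average account age and low utilization scores -2.0 on the base scorecard and -5.0 on the adjusted one, while the utilization weights are untouched.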
In any event, if the effect is not mitigated, the size of the differential effect is relatively small. Mean scores of different age groups derived from the FRB base model and the age-neutral models differ by at most 1.5 points.
Recent immigrants appear to have somewhat lower FRB base model scores than would be appropriate given their performance. However, this is not due to a negative differential effect. Rather, it is due to the tendency of recent immigrants to have credit profiles similar to those of young people in terms of the lengths of their credit histories, as reflected in their U.S. credit records.
The scores of recent immigrants might be made more consistent with performance by changes in the credit-reporting process. For example, it might be possible to gather information on the credit histories of recent immigrants from their home countries to supplement the credit records maintained by the three credit-reporting agencies in the United States. More generally, ongoing industry efforts to incorporate into credit records items traditionally not collected, such as utility and rental payments, and experiences with nontraditional sources of finance, such as payday lenders and pawn shops, would broaden the information included in credit records and may serve to lengthen the period over which individuals will be recorded as having a credit record.