Guide to Credit Scoring in R

By DS (ds5j@excite.com) (Interdisciplinary Independent Scholar with 9+ years
experience in risk management)
Summary
To date (September 23, 2009), as Ross Gayler has pointed out, there is no guide or
documentation on credit scoring using R (Gayler, 2008). This document is the first
guide to credit scoring using the R system. It is a brief practical guide, based on
experience, showing how to do common credit scoring development and validation in
R. In addition, the paper highlights cutting edge algorithms available in R but not in
other commercial packages, and discusses an approach to improving existing credit
scorecards using the Random Forest package.
Note: This is not meant to be a tutorial on basic R or on its benefits; other
documentation, e.g. http://cran.r-project.org/other-docs.html, does a good job of
introducing R.
Acknowledgements: Thanks to Ross Gayler for the idea and for generous and detailed
feedback. Thanks also to Carolin Strobl for her help on unbiased random forest
variable importance and the party package.
Thanks also to George Overstreet and Peter Beling for helpful discussions and
guidance, and much thanks to Jorge Velez and other people on R-help who helped
with coding and R solutions.


Table of Contents

Goals
Approach to Model Building
Architectural Suggestions
Practical Suggestions
R Code Examples
Reading Data In
Binning Example
Example of Binning or Coarse Classifying in R
Breaking Data into Training and Test Sample
Traditional Credit Scoring
Traditional Credit Scoring Using Logistic Regression in R
Calculating ROC Curve for model
Calculating KS Statistic
Calculating top 3 variables affecting Credit Score Function in R
Cutting Edge techniques Available in R
Using Traditional recursive Partitioning
Comparing Complexity and out of Sample Error
Compare ROC Performance of Trees
Converting Trees to Rules
Bayesian Networks in Credit Scoring
Using Traditional recursive Partitioning
Comparing Complexity and out of Sample Error
Compare ROC Performance of Trees
Converting Trees to Rules
Conditional inference Trees
Using Random Forests
Calculating Area under the Curve
Cross Validation
Cutting Edge techniques: Party Package (Unbiased Non-parametric methods - Model Based Trees)
Appendix of Useful Functions
References
Appendix: German Credit Data


Goals
The goal of this guide is to show basic credit scoring computations in R using
simple code.

Approach to Model Building
It is suggested that credit scoring practitioners adopt a systems approach to model
development and maintenance. From this point of view one can use the SOAR
methodology, developed by Don Brown at UVA (Brown, 2005). The SOAR process
comprises understanding the goal of the system being developed and specifying it in
clear terms, along with a clear understanding and specification of the data, observing the
data, analyzing the data, and making recommendations (2005). For references on the
traditional credit scoring development process, such as Lewis, Siddiqi, or Anderson,
please see Ross Gayler's Credit Scoring references page
(http://r.gayler.googlepages.com/creditscoringresources).

Architectural Suggestions
Clearly, in the commercial statistical computing world, SAS is the industry leading
product to date. This is partly due to the vast amount of legacy code already in existence
in corporations and also because of its memory management and data manipulation
capabilities. R, in contrast to SAS, offers open source support along with cutting edge
algorithms and facilities. To successfully use R in a large scale industrial environment it
is important to run it on machines where memory is plentiful because R, unlike SAS,
loads all data into memory. Windows has a 2 gigabyte per-process memory limit which
can be problematic for very large data sets.
Although SAS is used in many companies as a one stop shop, most statistical
departments would benefit in the long run by moving all data manipulation to the
database layer (using SQL), leaving only statistical computing to be performed.
Once these two functions are decoupled it becomes clear that R offers a lot in terms of
robust statistical software.

Practical Suggestions
Building high performing models requires skill, the ability to conceptualize and
understand data relationships, and some theory. It is helpful to be versed in the
appropriate literature, brainstorm relationships that should exist in the data, and test them
out. This is an ad hoc process I have used and found to be effective. For formal methods
like Geschka's brainwriting and Zwicky's morphological box, see Gibson's guide to
systems analysis (Gibson et al, 2004). For the advantages of R and introductory tutorials
see http://cran.r-project.org/other-docs.html.


R Code Examples
In the credit scoring examples below the German Credit Data set is used
(Asuncion et al, 2007). It has 300 bad loans and 700 good loans and is a better data set
than other open credit data as it is performance based vs. modeling the decision to grant a
loan or not. The bad loans did not pay as intended. It is common in credit scoring to
classify bad accounts as those which have ever had a 60 day delinquency or worse (in
mortgage loans often 90 day plus is often used).

Reading Data In
# read comma separated file into memory
data<-read.csv("C:/Documents and Settings/My Documents/GermanCredit.csv")

Binning Example
In R categorical variables are called factors, while numeric or double are numeric types.
#code to convert variable to factor
data$property <-as.factor(data$property)
#code to convert to numeric
data$age <-as.numeric(data$age)
#code to convert to decimal
data$amount<-as.double(data$amount)
Often in credit scoring it is recommended that continuous variables like Loan to Value
ratios, expense ratios, and other continuous variables be converted to dummy variables to
improve performance (Mays, 2000).

Example of Binning or Coarse Classifying in R:
data$amount<-as.factor(ifelse(data$amount<=2500,'0-2500',
                       ifelse(data$amount<=5000,'2600-5000','5000+')))
Note: Having a variable in both continuous and binned (discrete form) can result in
unstable or poorer performing results.

Breaking Data into Training and Test Sample
The following code creates a training data set comprised of randomly selected 60% of the
data and the out of sample test sample being a random 40% sample remaining.


d = sort(sample(nrow(data), nrow(data)*.6))
#select training sample
train<-data[d,]
test<-data[-d,]
train<-subset(train,select=-default)
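Because sample() is random, the split above differs on every run. A minimal sketch of making it reproducible by fixing the random seed (the seed value 42 is an arbitrary assumption):

```r
# fix the random seed so the 60/40 split is reproducible across runs
set.seed(42)
d <- sort(sample(nrow(data), nrow(data)*.6))
train <- data[d,]
test  <- data[-d,]
```

Any fixed seed works; the point is that reruns of the script then produce the same training and test samples.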

Traditional Credit Scoring
Traditional Credit Scoring Using Logistic Regression in R
m<-glm(good_bad~.,data=train,family=binomial())
# for those interested in stepwise selection one can use step(m)
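The fitted model can then be scored on the hold-out sample. The following sketch uses the ROCR package (also used later in this guide) to draw an ROC curve and compute the KS statistic; the column name score is an assumption:

```r
library(ROCR)

# score the hold-out sample with the fitted logistic regression
test$score <- predict(m, newdata=test, type='response')

# ROC curve: true positive rate vs. false positive rate
pred <- prediction(test$score, test$good_bad)
perf <- performance(pred, "tpr", "fpr")
plot(perf)

# KS statistic: maximum vertical gap between the TPR and FPR curves
ks <- max(attr(perf, 'y.values')[[1]] - attr(perf, 'x.values')[[1]])
ks
```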


Rule number: 18 [yval=bad cover=50 N=16 Y=34 (8%) prob=0.09]
  checking< 2.5
  afford< 54
  history>=3.5
  job>=2.5
The rules show that loans with low checking, affordability, history, and no co-applicants
are much riskier.
For other, more robust recursive partitioning see Breiman's Random Forests and Zeileis
and Hothorn's conditional inference trees and model based recursive partitioning, which
gives econometricians the ability to use theory to guide the development of tree logic
(2007).
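The rules above are printed from an rpart tree. A minimal sketch of fitting one on the training sample (the object name fit1 matches the listrules(fit1) call in the appendix, but the settings are assumptions):

```r
library(rpart)

# fit a classification tree on the training sample
fit1 <- rpart(good_bad~., data=train)

# inspect the splits and the complexity table used for pruning decisions
plot(fit1); text(fit1)
printcp(fit1)
```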

Bayesian Networks in Credit Scoring
The ability to understand the relationships between credit scoring variables is critical in
building sound models. Bayesian Networks provide a powerful technique to understand
causal relationships between variables via graphical directed graphs showing
relationships among variables. The lack of causal analysis in econometric papers is an
issue raised by Pearl and discussed at length in his beautiful work on causal inference
(Pearl, 200). The technique treats the variables are random variables and uses markov
chain monte carlo methods to assess relationships between variables. It is
computationally intensive but another important tool to have in the credit scoring tool kit.
For literature on the applications of Bayesian networks to credit scoring please see
Baesens et al(2001) and Chang et al (2000).
For details on Bayesian network Package see the deal package (Bøttcher & Dethlefsen,
2003).
Bayesian Network Credit Scoring in R
#load library
library(deal)
#make copy of train
ksl<-train
#discrete cannot inherit from continuous so binary good/bad
#must be converted to numeric for deal package
ksl$good_bad<-as.numeric(train$good_bad)
#no missing values allowed so set any missing to 0
# ksl$history[is.na(ksl$history1)] <- 0
#drops empty factors
# ksl$property<-ksl$property[drop=TRUE]

ksl.nw<-network(ksl)
ksl.prior <- jointprior(ksl.nw)
#The ban list is a matrix with two columns. Each row
#contains a directed edge that is not allowed.
#banlist <- matrix(c(5,5,6,6,7,7,9,8,9,8,9,8,9,8),ncol=2)
## ban arrows towards Sex and Year
#     [,1] [,2]
#[1,]    5    8
#[2,]    5    9
#[3,]    6    8
#[4,]    6    9
#[5,]    7    8
#[6,]    7    9
#[7,]    9    8
# note this is a computationally intensive procedure; if you
# know that certain variables should have no relationships,
# specify the arcs to exclude in the banlist
ksl.nw <- learn(ksl.nw,ksl,ksl.prior)$nw
#this step appears expensive so reset restart from 2 to 1
#and degree from 10 to 1
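The structure search that the restart/degree comment refers to can be sketched with deal's heuristic() function; the exact call and settings below are assumptions based on that comment:

```r
# hedged sketch: greedy structure search with random restarts (deal package);
# restart=1 and degree=1 keep the search cheap, per the comment above
result <- heuristic(ksl.nw, ksl, ksl.prior, restart=1, degree=1, trace=TRUE)

# the returned network family is sorted by score; plot the best network found
thebest <- result$nw[[1]]
plot(thebest)
```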

Conditional inference Trees
Conditional inference trees are the next generation of recursive partitioning
methodology and overcome the instability and biases found in traditional recursive
partitioning methods like CART and CHAID. Conditional inference trees offer a concept
of statistical significance based on a Bonferroni metric, unlike traditional tree methods
such as CHAID. Conditional inference trees perform as well as rpart and are robust and
stable, with statistically significant tree partitions being selected (Hothorn et al, 2007).

#conditional inference trees corrects for known biases in chaid and cart
library(party)
cfit1<-ctree(good_bad~.,data=train)
plot(cfit1);
Conditional inference Tree Plot

ctree plot shows the distribution of classes under each branch.
resultdfr <- as.data.frame(do.call("rbind", treeresponse(cfit1, newdata = test)))
test$tscore3<-resultdfr[,2]
pred9<-prediction(test$tscore3,test$good_bad)
perf9 <- performance(pred9,"tpr","fpr")
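The same ROCR prediction object also gives the area under the ROC curve; a minimal sketch (the name auc9 is an assumption):

```r
# AUC for the ctree scores on the test sample
auc9 <- performance(pred9, "auc")@y.values[[1]]
auc9
```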


plot(perf5,col='red',lty=1,main='Tree vs Tree with Prior Prob vs Ctree');
plot(perf6, col='green',add=TRUE,lty=2);
plot(perf9, col='blue',add=TRUE,lty=3);
legend(0.6,0.6,c('simple tree','tree with 90/10 prior','Ctree'),
       col=c('red','green','blue'),lwd=3)

Performance of Trees vs. Ctrees

Using Random Forests
Given the known instability of traditional recursive partitioning techniques, random
forests offer a great alternative to traditional credit scoring and give better insight into
variable interactions than traditional logistic regression.
library(randomForest)
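A minimal random forest fit on the training sample can be sketched as follows; the object name arf and the settings shown are assumptions:

```r
# hedged sketch: random forest on the training sample
arf <- randomForest(good_bad~., data=train, importance=TRUE, ntree=500)

# which inputs drive the forest's predictions
varImpPlot(arf)

# probability scores for the hold-out sample; the second column corresponds
# to the second factor level - check levels(train$good_bad) to be sure
test$rfscore <- predict(arf, newdata=test, type='prob')[,2]
```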


listrules<-function(model)
{
  if (!inherits(model, "rpart")) stop("Not a legitimate rpart tree")
  frm     <- model$frame
  names   <- row.names(frm)
  ylevels <- attr(model, "ylevels")
  ds.size <- model$frame[1,]$n
  for (i in 1:nrow(frm))
  {
    if (frm[i,1] == "<leaf>")
    {
      # The following [,5] is hardwired - needs work!
      cat("\n")
      cat(sprintf(" Rule number: %s ", names[i]))
      cat(sprintf("[yval=%s cover=%d (%.0f%%) prob=%0.2f]\n",
          ylevels[frm[i,]$yval], frm[i,]$n,
          round(100*frm[i,]$n/ds.size), frm[i,]$yval2[,5]))
      pth <- path.rpart(model, nodes=as.numeric(names[i]), print.it=FALSE)
      cat(sprintf("   %s\n", unlist(pth)[-1]), sep="")
    }
  }
}


My modified version of the function needs to be tweaked depending on the data set. As
written, the following function only prints rules which classify bad loans. If your data
uses a different label for the bad class, then that line in the code needs to be changed for
your use.
listrules<-function(model)
{
  if (!inherits(model, "rpart")) stop("Not a legitimate rpart tree")
  #
  # Get some information.
  #
  frm     <- model$frame
  names   <- row.names(frm)
  ylevels <- attr(model, "ylevels")
  ds.size <- model$frame[1,]$n
  #
  # Print each leaf node as a rule.
  #
  for (i in 1:nrow(frm))
  {
    if (frm[i,1] == "<leaf>" & ylevels[frm[i,]$yval]=='bad')
    {
      # The following [,5] is hardwired - needs work!
      cat("\n")
      cat(sprintf(" Rule number: %s ", names[i]))
      cat(sprintf("[yval=%s cover=%d N=%s Y=%s (%.0f%%) prob=%0.2f]\n",
          ylevels[frm[i,]$yval], frm[i,]$n,
          formatC(frm[i,]$yval2[,2], format = "f", digits = 0),
          formatC(frm[i,]$n - frm[i,]$yval2[,2], format = "f", digits = 0),
          round(100*frm[i,]$n/ds.size), frm[i,]$yval2[,5]))
      pth <- path.rpart(model, nodes=as.numeric(names[i]), print.it=FALSE)
      cat(sprintf("   %s\n", unlist(pth)[-1]), sep="")
    }
  }
}


References
Asuncion, A. & Newman, D.J. (2007). UCI Machine Learning Repository
[http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of
California, School of Information and Computer Science. Source of German Credit Data.
Baesens, B; Egmont-Petersen, M; Castelo, R; and Vanthienen, J. (2001)
Learning Bayesian network classifiers for credit scoring using Markov Chain Monte
Carlo search. Retrieved from
http://www.cs.uu.nl/research/techreps/repo/CS-2001/200158.pdf
Beling P, Covaliu Z and Oliver RM (2005). Optimal Scoring cutoff policies and
efficient frontiers. J. Opl Res Soc 56: 1016–1029.
Bøttcher,SG; Dethlefsen,C (2003) Deal: A package for learning bayesian networks.
Journal of Statistical Software. Retrieved from http://www.jstatsoft.org/v08/i20/paper
Breiman,L. (2002) Wald 2: Looking Inside the Black Box. Retrieved from
www.stat.berkeley.edu/users/breiman/wald2002-2.pdf
Brown, Don (2005) Linear Models Unpublished Manuscript at University of Virginia.
Chang,KC; Fund, R.; Lucas,A; Oliver, R and Shikaloff, N (2000)
Bayesian networks applied to credit scoring. IMA Journal of Management Mathematics
2000 11(1):1-18
Gayler, R. (1995) Is the Wholesale Modeling of interactions Worthwhile? (Proceedings
of Conference on CreditScoring and Credit Control, University of Edinburgh
Management School, U.K.).
Gayler, R (2008) Credit Risk Analytics Occasional Newsletter. Retrieved from
http://r.gayler.googlepages.com/CRAON01.pdf
Gibson, J; Scherer, W.T. (2004) How to Do a Systems Analysis?
Hand, D. J. (2005). Good practice in retail credit score-card assessment. Journal of the
Operational Research Society, 56, 1109–1117.
Kowalczyk, W (2003) Heuristics for building scorecard Trees. Retrieved from
http://www.crc.man.ed.ac.uk/conference/archive/2003/abstracts/kowalkzyk.pdf
Hothorn, T; Hornik, K & Zeileis, A (2007) Unbiased Recursive Partitioning: A
Conditional Inference Framework. Retrieved from
http://statmath.wu.ac.at/~zeileis/papers/Hothorn+Hornik+Zeileis-2006.pdf
Maindonald, J.H. and Braun, W.J. (2007) "Data Analysis and Graphics Using R".
Retrieved from http://cran.ms.unimelb.edu.au/web/packages/DAAG/DAAG.pdf
Mays, Elizabeth. (2000) Handbook of Credit Scoring. Chicago: Glenlake.
Overstreet, GA; Bradly, E. (1996) Applicability of Generic Linear Scoring Models in the
U.S credit-union environment. IMA Journal of Math Applied in Business and Industry. 7.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge,
England: Cambridge University Press.
Perlich, C; Provost, F; & Simonoff, J.S. (2003) Tree Induction vs. Logistic Regression: A
Learning-Curve Analysis. Journal of Machine Learning Research 4
(2003) 211-255
Sharma, D; Overstreet, George; Beling, Peter (2009) Not If Affordability Data Adds
Value but How to Add Real Value by Leveraging Affordability Data: Enhancing
Predictive Capability of Credit Scoring Using Affordability Data. CAS (Casualty
Actuarial Society) Working Paper. Retrieved from
http://www.casact.org/research/wp/index.cfm?fa=workingpapers
Sing,Tobias; Sander, Oliver; Beerenwinkel, Niko; & Lengauer, Thomas. (2005)
ROCR: visualizing classifier performance in R. Bioinformatics 21(20):3940-3941
Schauerhuber, M; Zeileis, Achim; Meyer, David; and Hornik, Kurt (2007)
Benchmarking Open-Source Tree Learners in R/RWeka. Retrieved from
http://epub.wu.ac.at/dyn/virlib/wp/eng/mediate/epub-wu-01_bd8.pdf?ID=epub-wu-01_bd8
Strobl, C.; Boulesteix, A.-L.; Zeileis, A. and Hothorn, T. (2007). Bias in random forest
variable importance measures: Illustrations, sources and a solution. BMC
Bioinformatics 8:25.
Strobl, C.; Hothorn, T. & Zeileis, A. (2009). Party on! A new, conditional variable
importance measure for random forests available in the party package. Technical report
(submitted). Retrieved from http://epub.ub.uni-muenchen.de/9387/1/techreport.pdf
Strobl,Carolin; Malley,J. and Tutz,G (2009) An Introduction to Recursive Partitioning.
Retrieved from http://epub.ub.uni-muenchen.de/10589/1/partitioning.pdf


Therneau, T.M.; Atkinson, E.J. (1997) An Introduction to Recursive Partitioning Using
the RPART Routines. Retrieved from www.mayo.edu/hsr/techrpt/61.pdf
TransUnion (2006) Segmentation for Credit-Based Delinquency Models. White Paper.
Retrieved from
http://www.transunion.com/corporate/vantageScore/documents/segmentationr6.pdf
Velez, Jorge Ivan (2008) R-Help Support group email.
http://tolstoy.newcastle.edu.au/R/e4/help/08/07/16432.html
Williams, Graham Desktop Guide to Data Mining Retrieved from
http://www.togaware.com/datamining/survivor/ and
http://datamining.togaware.com/survivor/Convert_Tree.html
Zeileis, A; Hothorn, T; and Hornik, K (2006) Evaluating Model-based Trees in Practice.
Retrieved from
http://epub.wu-wien.ac.at/dyn/virlib/wp/eng/mediate/epub-wu-01_95a.pdf?ID=epub-wu-01_95a
Zeileis, A; Hothorn, T; and Hornik, K (2006) Party with the mob: Model-based Recursive
Partitioning in R. Retrieved from
http://cran.r-project.org/web/packages/party/vignettes/MOB.pdf
For the party package in R see http://cran.r-project.org/web/packages/party/party.pdf


Appendix: German Credit Data
http://ocw.mit.edu/NR/rdonlyres/Sloan-School-of-Management/15-062DataMiningSpring2003/94F99F14-189D-4FBA-91A8-D648D1867149/0/GermanCredit.pdf

Var. #  Variable Name   Description                                  Type         Code Description
1       OBS#            Observation No.                              Categorical
2       CHK_ACCT        Checking account status                      Categorical  0: < 0 DM; 1: 0 < ... < 200 DM; 2: >= 200 DM; 3: unknown
3       DURATION        Duration of credit in months                 Numerical
4       HISTORY         Credit history                               Categorical  0: no credits taken; 1: all credits at this bank paid back duly; 2: existing credits paid back duly till now; 3: delay in paying off in the past; 4: critical account
5       NEW_CAR         Purpose of credit: car (new)                 Binary       0: No, 1: Yes
6       USED_CAR        Purpose of credit: car (used)                Binary       0: No, 1: Yes
7       FURNITURE       Purpose of credit: furniture/equipment       Binary       0: No, 1: Yes
8       RADIO/TV        Purpose of credit: radio/television          Binary       0: No, 1: Yes
9       EDUCATION       Purpose of credit: education                 Binary       0: No, 1: Yes
10      RETRAINING      Purpose of credit: retraining                Binary       0: No, 1: Yes
11      AMOUNT          Credit amount                                Numerical
12      SAV_ACCT        Average balance in savings account           Categorical  0: < 100 DM; 1: 100 <= ... < 500 DM; 2: 500 <= ... < 1000 DM; 3: >= 1000 DM; 4: unknown
13      EMPLOYMENT      Present employment since                     Categorical  0: unemployed; 1: < 1 year; 2: 1 <= ... < 4 years; 3: 4 <= ... < 7 years; 4: >= 7 years
14      INSTALL_RATE    Installment rate as % of disposable income   Numerical
15      MALE_DIV        Applicant is male and divorced               Binary       0: No, 1: Yes
16      MALE_SINGLE     Applicant is male and single                 Binary       0: No, 1: Yes
17      MALE_MAR        Applicant is male and married or widower     Binary       0: No, 1: Yes
18      CO-APPLICANT    Application has a co-applicant               Binary       0: No, 1: Yes
19      GUARANTOR       Applicant has a guarantor                    Binary       0: No, 1: Yes
20      TIME_RES        Present resident since - years               Categorical  0: <= 1 year; 1: 1 < ... <= 2 years; 2: 2 < ... <= 3 years; 3: > 4 years
21      REAL_ESTATE     Applicant owns real estate                   Binary       0: No, 1: Yes
22      PROP_NONE       Applicant owns no property (or unknown)      Binary       0: No, 1: Yes
23      AGE             Age in years                                 Numerical
24      OTHER_INSTALL   Applicant has other installment plan credit  Binary       0: No, 1: Yes
25      RENT            Applicant rents                              Binary       0: No, 1: Yes
26      OWN_RES         Applicant owns residence                     Binary       0: No, 1: Yes
27      NUM_CREDITS     Number of existing credits at this bank      Numerical
28      JOB             Nature of job                                Categorical  0: unemployed/unskilled - non-resident; 1: unskilled - resident; 2: skilled employee/official; 3: management/self-employed/highly qualified employee/officer
29      NUM_DEPEND      Number of dependents                         Numerical
30      TELEPHONE       Applicant has phone in his or her name       Binary       0: No, 1: Yes
31      FOREIGN         Foreign worker                               Binary       0: No, 1: Yes
32      RESPONSE        Fulfilled terms of credit agreement          Binary       0: No, 1: Yes

Sample of Full R code in One Shot
(in case one wants to copy paste and run all the code at once)
data<-read.csv("C:/Documents and Settings/GermanCredit.csv")
data$afford<-data$checking*data$savings*data$installp*data$housing
#code to convert variable to factor
data$property <-as.factor(data$property)
#code to convert to numeric
data$age <-as.numeric(data$age)
#code to convert to decimal
data$amount<-as.double(data$amount)
data$amount<-as.factor(ifelse(data$amount<=2500,'0-2500',
                       ifelse(data$amount<=5000,'2600-5000','5000+')))
d = sort(sample(nrow(data), nrow(data)*.6))


#select training sample
train<-data[d,]
test<-data[-d,]
train<-subset(train,select=-default)
m<-glm(good_bad~.,data=train,family=binomial())
listrules<-function(model)
{
  if (!inherits(model, "rpart")) stop("Not a legitimate rpart tree")
  frm     <- model$frame
  names   <- row.names(frm)
  ylevels <- attr(model, "ylevels")
  ds.size <- model$frame[1,]$n
  for (i in 1:nrow(frm))
  {
    if (frm[i,1] == "<leaf>" & ylevels[frm[i,]$yval]=='bad')
    {
      # The following [,5] is hardwired - needs work!
      cat("\n")
      cat(sprintf(" Rule number: %s ", names[i]))
      cat(sprintf("[yval=%s cover=%d N=%s Y=%s (%.0f%%) prob=%0.2f]\n",
          ylevels[frm[i,]$yval], frm[i,]$n,
          formatC(frm[i,]$yval2[,2], format = "f", digits = 0),
          formatC(frm[i,]$n - frm[i,]$yval2[,2], format = "f", digits = 0),
          round(100*frm[i,]$n/ds.size), frm[i,]$yval2[,5]))
      pth <- path.rpart(model, nodes=as.numeric(names[i]), print.it=FALSE)
      cat(sprintf("   %s\n", unlist(pth)[-1]), sep="")
    }
  }
}
listrules(fit1)
listrules(fit2)
library(deal)
#make copy of train
ksl<-train
#discrete cannot inherit from continuous so binary good/bad
#must be converted to numeric for deal package
ksl$good_bad<-as.numeric(train$good_bad)
#no missing values allowed so set any missing to 0
# ksl$history[is.na(ksl$history1)] <- 0
#drops empty factors
# ksl$property<-ksl$property[drop=TRUE]

ksl.nw<-network(ksl)
ksl.prior <- jointprior(ksl.nw)
#The ban list is a matrix with two columns. Each row
#contains a directed edge that is not allowed.
#banlist <- matrix(c(5,5,6,6,7,7,9,8,9,8,9,8,9,8),ncol=2)
## ban arrows towards Sex and Year
#     [,1] [,2]
#[1,]    5    8
#[2,]    5    9
#[3,]    6    8
#[4,]    6    9
#[5,]    7    8
#[6,]    7    9
#[7,]    9    8
# note this is a computationally intensive procedure; if you
# know that certain variables should have no relationships,
# specify the arcs to exclude in the banlist
ksl.nw <- learn(ksl.nw,ksl,ksl.prior)$nw
#this step appears expensive so reset restart from 2 to 1
#and degree from 10 to 1
