Credit Scoring in R 1 of 45

Guide to Credit Scoring in R
By DS (ds5j@excite.com)
(Interdisciplinary Independent Scholar with 9+ years experience in risk management)

Summary
To date, Sept 23 2009, as Ross Gayler has pointed out, there is no guide or documentation on credit scoring using R (Gayler, 2008). This document is the first guide to credit scoring using the R system. It is a brief practical guide, based on experience, showing how to do common credit scoring development and validation using R. In addition the paper highlights cutting edge algorithms available in R but not in other commercial packages, and discusses an approach to improving existing credit scorecards using the Random Forest package.

Note: This is not meant to be a tutorial on basic R or its benefits; other documentation, e.g. http://cran.r-project.org/other-docs.html, does a good job of introducing R.

Acknowledgements: Thanks to Ross Gayler for the idea and generous and detailed feedback. Thanks also to Carolin Strobl for her help on unbiased random forest variable importance and the party package. Thanks also to George Overstreet and Peter Beling for helpful discussions and guidance. Also much thanks to Jorge Velez and other people on R-help who helped with coding and R solutions.
Table of Contents
Goals
Approach to Model Building
Architectural Suggestions
Practical Suggestions
R Code Examples
  Reading Data In
  Binning Example
  Example of Binning or Coarse Classifying in R
  Breaking Data into Training and Test Sample
Traditional Credit Scoring
  Traditional Credit Scoring Using Logistic Regression in R
  Calculating ROC Curve for Model
  Calculating KS Statistic
  Calculating Top 3 Variables Affecting Credit Score Function in R
Cutting Edge Techniques Available in R
  Using Bayesian Networks in Credit Scoring
  Using Traditional Recursive Partitioning
  Comparing Complexity and Out of Sample Error
  Compare ROC Performance of Trees
  Converting Trees to Rules
  Conditional Inference Trees
  Using Random Forests
  Calculating Area under the Curve
  Cross Validation
  Cutting Edge Techniques: Party Package (Unbiased Non Parametric Methods - Model Based Trees)
Appendix of Useful Functions
References
Appendix: German Credit Data

Goals
The goal of this guide is to show basic credit scoring computations in R using simple code.

Approach to Model Building
It is suggested that credit scoring practitioners adopt a systems approach to model development and maintenance. From this point of view one can use the SOAR methodology, developed by Don Brown at UVA (Brown, 2005). The SOAR process comprises understanding the goal of the system being developed and specifying it in clear terms, along with a clear understanding and specification of the data; observing the data; analyzing the data; and then making recommendations (2005). For references on the traditional credit scoring development process, such as Lewis, Siddiqi, or Anderson, please see Ross Gayler's credit scoring references page (http://r.gayler.googlepages.com/creditscoringresources).

Architectural Suggestions
Clearly, in the commercial statistical computing world SAS is the industry leading product to date. This is partly due to the vast amount of legacy code already in existence in corporations and also because of its memory management and data manipulation capabilities. R, in contrast to SAS, offers open source support along with cutting edge algorithms and facilities. To successfully use R in a large scale industrial environment it is important to run it on large scale computers where memory is plentiful, as R, unlike SAS, loads all data into memory. Windows has a 2 gigabyte memory limit which can be problematic for very large data sets. Although SAS is used in many companies as a one stop shop, most statistical departments would benefit in the long run by separating all data manipulation into the database layer (using SQL), leaving only statistical computing to be performed in R. Once these two functions are decoupled it becomes clear that R offers a lot in terms of robust statistical software.
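As an illustration of this decoupling, data preparation can live in SQL while R pulls only the finished modeling extract. This is a minimal sketch using the DBI interface with an in-memory SQLite backend; the table name loans and the column filter are hypothetical stand-ins, and any DBI driver (Oracle, Postgres, etc.) works the same way:

```r
# minimal sketch: keep data manipulation in SQL, pull only the modeling extract into R
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")  # stand-in for a production database
dbWriteTable(con, "loans", data)                 # 'data' as read in below

# the SELECT does the data manipulation; R only sees the finished extract
model_data <- dbGetQuery(con,
  "SELECT age, amount, good_bad FROM loans WHERE amount <= 5000")

dbDisconnect(con)
```

Swapping databases then only means changing the driver passed to dbConnect; the statistical code downstream is untouched.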
Practical Suggestions
Building high performing models requires skill, the ability to conceptualize and understand data relationships, and some theory. It is helpful to be versed in the appropriate literature, brainstorm relationships that should exist in the data, and test them out. This is an ad hoc process I have used and found to be effective. For formal methods like Geschka's brainwriting and Zwicky's morphological box see Gibson's guide to systems analysis (Gibson et al, 2004). For the advantages of R and introductory tutorials see http://cran.r-project.org/other-docs.html.

R Code Examples
In the credit scoring examples below the German Credit Data set is used (Asuncion et al, 2007). It has 300 bad loans and 700 good loans and is a better data set than other open credit data, as it is performance based rather than modeling the decision to grant a loan or not. The bad loans did not pay as intended. It is common in credit scoring to classify bad accounts as those which have ever had a 60 day delinquency or worse (in mortgage loans 90 day plus is often used).

Reading Data In
# read comma separated file into memory
data<-read.csv("C:/Documents and Settings/My Documents/GermanCredit.csv")

Binning Example
In R categorical variables are called factors, and numeric or double are numeric types.
#code to convert variable to factor
data$property <-as.factor(data$property)
#code to convert to numeric
data$age <-as.numeric(data$age)
#code to convert to decimal
data$amount<-as.double(data$amount)

Often in credit scoring it is recommended that continuous variables like loan to value ratios, expense ratios, and other continuous variables be converted to dummy variables to improve performance (Mays, 2000).
Example of Binning or Coarse Classifying in R:
data$amount<-as.factor(ifelse(data$amount<=2500,'0-2500',ifelse(data$amount<=5000,'2600-5000','5000+')))

Note: Having a variable in both continuous and binned (discrete) form can result in unstable or poorer performing results.

Breaking Data into Training and Test Sample
The following code creates a training data set comprised of a randomly selected 60% of the data, with the remaining random 40% serving as the out of sample test set.

d = sort(sample(nrow(data), nrow(data)*.6))
#select training sample
train<-data[d,]
test<-data[-d,]
train<-subset(train,select=-default)

Traditional Credit Scoring

Traditional Credit Scoring Using Logistic Regression in R
m<-glm(good_bad~.,data=train,family=binomial())
# for those interested in the step function one can use m<-step(m) for stepwise regression

Rule number: 18 [yval=bad cover=50 N=16 Y=34 (8%) prob=0.09]
  checking< 2.5
  afford< 54
  history>=3.5
  job>=2.5

The rules show that loans with low checking, affordability, history, and no co-applicants are much riskier. For other more robust recursive partitioning see Breiman's Random Forests and Zeileis and Hothorn's conditional inference trees and model based recursive partitioning, which gives econometricians the ability to use theory to guide the development of tree logic (2007).

Bayesian Networks in Credit Scoring
The ability to understand the relationships between credit scoring variables is critical in building sound models. Bayesian networks provide a powerful technique to understand causal relationships between variables via directed graphs showing relationships among variables. The lack of causal analysis in econometric papers is an issue raised by Pearl and discussed at length in his beautiful work on causal inference (Pearl, 2000). The technique treats the variables as random variables and uses Markov chain Monte Carlo methods to assess relationships between variables.
It is computationally intensive, but another important tool to have in the credit scoring tool kit. For literature on the applications of Bayesian networks to credit scoring please see Baesens et al (2001) and Chang et al (2000). For details on the Bayesian network package see the deal package (Bøttcher & Dethlefsen, 2003).

Bayesian Network Credit Scoring in R
#load library
library(deal)
#make copy of train
ksl<-train
#discrete cannot inherit from continuous so binary good/bad must be converted to numeric for deal package
ksl$good_bad<-as.numeric(train$good_bad)
#no missing values allowed so set any missing to 0
# ksl$history[is.na(ksl$history1)] <- 0
#drops empty factors
# ksl$property<-ksl$property[drop=TRUE]
ksl.nw<-network(ksl)
ksl.prior <- jointprior(ksl.nw)
#The ban list is a matrix with two columns. Each row contains the directed edge
#that is not allowed.
#banlist <- matrix(c(5,5,6,6,7,7,9,8,9,8,9,8,9,8),ncol=2)
## ban arrows towards Sex and Year
# [,1] [,2]
#[1,] 5 8
#[2,] 5 9
#[3,] 6 8
#[4,] 6 9
#[5,] 7 8
#[6,] 7 9
#[7,] 9 8
# note this is a computationally intensive procedure and if you know that certain
# variables should not have relationships you should specify the arcs between
# variables to exclude in the banlist
ksl.nw <- learn(ksl.nw,ksl,ksl.prior)$nw
#this step appears expensive so reset restart from 2 to 1 and degree from 10 to 1
result <- heuristic(ksl.nw,ksl,ksl.prior,restart=1,degree=1)$nw
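Before committing to the expensive search step it helps to eyeball the network structure; deal supplies plot and print methods for its network objects. A minimal sketch, assuming the ksl.nw object built from the training data as above:

```r
library(deal)
# draw the directed graph of the current Bayesian network structure
plot(ksl.nw)
# print the node list and any arcs found so far
print(ksl.nw)
```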
Conditional Inference Trees
Conditional inference trees are the next generation of recursive partitioning methodology and overcome the instability and biases found in traditional recursive partitioning like CART and CHAID. Conditional inference trees offer a concept of statistical significance based on a Bonferroni metric, unlike traditional tree methods such as CHAID. Conditional inference trees perform as well as rpart and are robust and stable, with statistically significant tree partitions being selected (Hothorn et al, 2007).

#conditional inference trees correct for known biases in chaid and cart
library(party)
cfit1<-ctree(good_bad~.,data=train)
plot(cfit1);

Conditional Inference Tree Plot
The ctree plot shows the distribution of classes under each branch.

resultdfr <- as.data.frame(do.call("rbind", treeresponse(cfit1, newdata = test)))
test$tscore3<-resultdfr[,2]
pred9<-prediction(test$tscore3,test$good_bad)
perf9 <- performance(pred9,"tpr","fpr")
plot(perf5,col='red',lty=1,main='Tree vs Tree with Prior Prob vs Ctree');
plot(perf6, col='green',add=TRUE,lty=2);
plot(perf9, col='blue',add=TRUE,lty=3);
legend(0.6,0.6,c('simple tree','tree with 90/10 prior','Ctree'),col=c('red','green','blue'),lwd=3)

Performance of Trees vs. Ctrees

Using Random Forests
Given the known issues of instability of traditional recursive partitioning techniques, Random Forests offer a great alternative to traditional credit scoring and offer better insight into variable interactions than traditional logistic regression.

library(randomForest)
arf

 if (frm[i,1] == "<leaf>")
 {
  # The following [,5] is hardwired - needs work!
  cat("\n")
  cat(sprintf(" Rule number: %s ", names[i]))
  cat(sprintf("[yval=%s cover=%d (%.0f%%) prob=%0.2f]\n",
      ylevels[frm[i,]$yval], frm[i,]$n,
      round(100*frm[i,]$n/ds.size), frm[i,]$yval2[,5]))
  pth <- path.rpart(model, nodes=as.numeric(names[i]), print.it=FALSE)
  cat(sprintf(" %s\n", unlist(pth)[-1]), sep="")
 }
 }
}

My modified version of the function needs to be tweaked depending on the data set. If the predicted variable is bad then the following function will only print rules which classify bad loans. If your data has a different value then that line in the code needs to be changed for your use.

listrules<-function(model)
{
 if (!inherits(model, "rpart")) stop("Not a legitimate rpart tree")
 #
 # Get some information.
 #
 frm <- model$frame
 names <- row.names(frm)
 ylevels <- attr(model, "ylevels")
 ds.size <- model$frame[1,]$n
 #
 # Print each leaf node as a rule.
 #
 for (i in 1:nrow(frm))
 {
  if (frm[i,1] == "<leaf>" & ylevels[frm[i,]$yval]=='bad')
  {
   # The following [,5] is hardwired - needs work!
   cat("\n")
   cat(sprintf(" Rule number: %s ", names[i]))
   cat(sprintf("[yval=%s cover=%d N=%.0f Y=%.0f (%.0f%%) prob=%0.2f]\n",
       ylevels[frm[i,]$yval], frm[i,]$n,
       formatC(frm[i,]$yval2[,2], format = "f", digits = 2),
       formatC(frm[i,]$n-frm[i,]$yval2[,2], format = "f", digits = 2),
       round(100*frm[i,]$n/ds.size), frm[i,]$yval2[,5]))
   pth <- path.rpart(model, nodes=as.numeric(names[i]), print.it=FALSE)
   cat(sprintf(" %s\n", unlist(pth)[-1]), sep="")
  }
 }
}

References
Asuncion, A. & Newman, D.J. (2007). UCI Machine Learning Repository [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, School of Information and Computer Science. Source of German Credit Data.

Baesens, B; Egmont-Petersen, M; Castelo, R; and Vanthienen, J. (2001) Learning Bayesian network classifiers for credit scoring using Markov Chain Monte Carlo search.
Retrieved from http://www.cs.uu.nl/research/techreps/repo/CS-2001/200158.pdf

Beling, P; Covaliu, Z and Oliver, RM (2005). Optimal scoring cutoff policies and efficient frontiers. J Opl Res Soc 56: 1016-1029.

Bøttcher, SG; Dethlefsen, C (2003) deal: A package for learning Bayesian networks. Journal of Statistical Software. Retrieved from http://www.jstatsoft.org/v08/i20/paper

Breiman, L. (2002) Wald 2: Looking Inside the Black Box. Retrieved from www.stat.berkeley.edu/users/breiman/wald2002-2.pdf

Brown, Don (2005) Linear Models. Unpublished manuscript, University of Virginia.

Chang, KC; Fung, R; Lucas, A; Oliver, R and Shikaloff, N (2000) Bayesian networks applied to credit scoring. IMA Journal of Management Mathematics 11(1): 1-18.

Gayler, R. (1995) Is the Wholesale Modeling of Interactions Worthwhile? (Proceedings of Conference on Credit Scoring and Credit Control, University of Edinburgh Management School, U.K.)

Gayler, R (2008) Credit Risk Analytics Occasional Newsletter. Retrieved from http://r.gayler.googlepages.com/CRAON01.pdf

Gibson, J; Scherer, W.T. (2004) How to Do a Systems Analysis?

Hand, D. J. (2005). Good practice in retail credit score-card assessment. Journal of the Operational Research Society, 56, 1109-1117.

Kowalczyk, W (2003) Heuristics for building scorecard trees. Retrieved from http://www.crc.man.ed.ac.uk/conference/archive/2003/abstracts/kowalkzyk.pdf

Hothorn, T; Hornik, K & Zeileis, A (2007) Unbiased Recursive Partitioning: A Conditional Inference Framework. Retrieved from http://statmath.wu.ac.at/~zeileis/papers/Hothorn+Hornik+Zeileis-2006.pdf

Maindonald, J.H. and Braun, W.J. (2007) "Data Analysis and Graphics Using R". http://cran.ms.unimelb.edu.au/web/packages/DAAG/DAAG.pdf

Mays, Elizabeth. (2000) Handbook of Credit Scoring. Chicago: Glenlake.

Overstreet, GA; Bradley, E. (1996) Applicability of generic linear scoring models in the U.S. credit-union environment. IMA Journal of Math Applied in Business and Industry. 7.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge, England: Cambridge University Press.

Perlich, C; Provost, F; & Simonoff, J.S. (2003) Tree induction vs. logistic regression: A learning-curve analysis. Journal of Machine Learning Research 4: 211-255.

Sharma, D; Overstreet, George; Beling, Peter (2009) Not If Affordability Data Adds Value but How to Add Real Value by Leveraging Affordability Data: Enhancing Predictive Capability of Credit Scoring Using Affordability Data. CAS (Casualty Actuarial Society) Working Paper. Retrieved from http://www.casact.org/research/wp/index.cfm?fa=workingpapers

Sing, Tobias; Sander, Oliver; Beerenwinkel, Niko; & Lengauer, Thomas. (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21(20): 3940-3941.

Schauerhuber, M; Zeileis, Achim; Meyer, David; and Hornik, Kurt (2007) Benchmarking Open-Source Tree Learners in R/RWeka. Retrieved from http://epub.wu.ac.at/dyn/virlib/wp/eng/mediate/epub-wu-01_bd8.pdf?ID=epub-wu-01_bd8

Strobl, C; Boulesteix, A.-L.; Zeileis, A; and Hothorn, T (2007). Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics 8:25.

Strobl, C; Hothorn, T & Zeileis, A (2009). Party on! A new, conditional variable importance measure for random forests available in the party package. Technical report (submitted). Retrieved from http://epub.ub.uni-muenchen.de/9387/1/techreport.pdf

Strobl, Carolin; Malley, J. and Tutz, G (2009) An Introduction to Recursive Partitioning. Retrieved from http://epub.ub.uni-muenchen.de/10589/1/partitioning.pdf

Therneau, T.M.; Atkinson, E.J. (1997) An Introduction to Recursive Partitioning Using the RPART Routines. Retrieved from www.mayo.edu/hsr/techrpt/61.pdf

TransUnion (2006) Segmentation for Credit-Based Delinquency Models. White Paper.
Retrieved from http://www.transunion.com/corporate/vantageScore/documents/segmentationr6.pdf

Velez, Jorge Ivan (2008) R-help support group email. http://tolstoy.newcastle.edu.au/R/e4/help/08/07/16432.html

Williams, Graham. Desktop Guide to Data Mining. Retrieved from http://www.togaware.com/datamining/survivor/ and http://datamining.togaware.com/survivor/Convert_Tree.html

Zeileis, A; Hothorn, T; and Hornik, K (2006) Evaluating Model-based Trees in Practice. Retrieved from http://epub.wu-wien.ac.at/dyn/virlib/wp/eng/mediate/epub-wu-01_95a.pdf?ID=epub-wu-01_95a

Zeileis, A; Hothorn, T; and Hornik, K (2006) Party with the mob: Model-based Recursive Partitioning in R. Retrieved from http://cran.r-project.org/web/packages/party/vignettes/MOB.pdf

For the party package in R see http://cran.r-project.org/web/packages/party/party.pdf

Appendix: German Credit Data
http://ocw.mit.edu/NR/rdonlyres/Sloan-School-of-Management/15-062DataMiningSpring2003/94F99F14-189D-4FBA-91A8-D648D1867149/0/GermanCredit.pdf

Var #  Variable Name   Description                                  Variable Type  Code Description
1      OBS#            Observation No.
2      CHK_ACCT        Checking account status                      Categorical    0: < 0 DM; 1: 0 <...< 200 DM; 2: >= 200 DM; 3: unknown
3      DURATION        Duration of credit in months                 Numerical
4      HISTORY         Credit history                               Categorical    0: no credits taken; 1: all credits at this bank paid back duly; 2: existing credits paid back duly till now; 3: delay in paying off in the past; 4: critical account
5      NEW_CAR         Purpose of credit: car (new)                 Binary         0: No, 1: Yes
6      USED_CAR        Purpose of credit: car (used)                Binary         0: No, 1: Yes
7      FURNITURE       Purpose of credit: furniture/equipment       Binary         0: No, 1: Yes
8      RADIO/TV        Purpose of credit: radio/television          Binary         0: No, 1: Yes
9      EDUCATION       Purpose of credit: education                 Binary         0: No, 1: Yes
10     RETRAINING      Purpose of credit: retraining                Binary         0: No, 1: Yes
11     AMOUNT          Credit amount                                Numerical
12     SAV_ACCT        Average balance in savings account           Categorical    0: < 100 DM; 1: 100 <=...< 500 DM; 2: 500 <=...< 1000 DM; 3: >= 1000 DM; 4: unknown
13     EMPLOYMENT      Present employment since                     Categorical    0: unemployed; 1: < 1 year; 2: 1 <=...< 4 years; 3: 4 <=...< 7 years; 4: >= 7 years
14     INSTALL_RATE    Installment rate as % of disposable income   Numerical
15     MALE_DIV        Applicant is male and divorced               Binary         0: No, 1: Yes
16     MALE_SINGLE     Applicant is male and single                 Binary         0: No, 1: Yes
17     MALE_MAR        Applicant is male and married or widower     Binary         0: No, 1: Yes
18     CO-APPLICANT    Application has a co-applicant               Binary         0: No, 1: Yes
19     GUARANTOR       Applicant has a guarantor                    Binary         0: No, 1: Yes
20     TIME_RES        Present resident since - years               Categorical    0: <= 1 year; 1: 1 <...<= 2 years; 2: 2 <...<= 3 years; 3: > 4 years
21     REAL_ESTATE     Applicant owns real estate                   Binary         0: No, 1: Yes
22     PROP_NONE       Applicant owns no property (or unknown)      Binary         0: No, 1: Yes
23     AGE             Age in years                                 Numerical
24     OTHER_INSTALL   Applicant has other installment plan credit  Binary         0: No, 1: Yes
25     RENT            Applicant rents                              Binary         0: No, 1: Yes
26     OWN_RES         Applicant owns residence                     Binary         0: No, 1: Yes
27     NUM_CREDITS     Number of existing credits at this bank      Numerical
28     JOB             Nature of job                                Categorical    0: unemployed/unskilled non-resident; 1: unskilled resident; 2: skilled employee/official; 3: management/self-employed/highly qualified employee/officer
29     NUM_DEPEND      Number of dependents                         Numerical
30     TELEPHONE       Applicant has phone in his or her name       Binary         0: No, 1: Yes
31     FOREIGN         Foreign worker                               Binary         0: No, 1: Yes
32     RESPONSE        Fulfilled terms of credit agreement          Binary         0: No, 1: Yes

Sample of Full R Code in One Shot (in case one wants to copy, paste, and run all the code at once)

data<-read.csv("C:/Documents and Settings/GermanCredit.csv")
data$afford<-data$checking*data$savings*data$installp*data$housing
#code to convert variable to factor
data$property <-as.factor(data$property)
#code to convert to numeric
data$age <-as.numeric(data$age)
#code to convert to decimal
data$amount<-as.double(data$amount)
data$amount<-as.factor(ifelse(data$amount<=2500,'0-2500',ifelse(data$amount<=5000,'2600-5000','5000+')))
d = sort(sample(nrow(data), nrow(data)*.6))
#select training sample
train<-data[d,]
test<-data[-d,]
train<-subset(train,select=-default)
#m

 if (frm[i,1] == "<leaf>" & ylevels[frm[i,]$yval]=='bad')
 {
  # The following [,5] is hardwired - needs work!
  cat("\n")
  cat(sprintf(" Rule number: %s ", names[i]))
  cat(sprintf("[yval=%s cover=%d N=%.0f Y=%.0f (%.0f%%) prob=%0.2f]\n",
      ylevels[frm[i,]$yval], frm[i,]$n,
      formatC(frm[i,]$yval2[,2], format = "f", digits = 2),
      formatC(frm[i,]$n-frm[i,]$yval2[,2], format = "f", digits = 2),
      round(100*frm[i,]$n/ds.size), frm[i,]$yval2[,5]))
  pth <- path.rpart(model, nodes=as.numeric(names[i]), print.it=FALSE)
  cat(sprintf(" %s\n", unlist(pth)[-1]), sep="")
 }
 }
}
listrules(fit1)
listrules(fit2)

library(deal)
#make copy of train
ksl<-train
#discrete cannot inherit from continuous so binary good/bad must be converted to numeric for deal package
ksl$good_bad<-as.numeric(train$good_bad)
#no missing values allowed so set any missing to 0
# ksl$history[is.na(ksl$history1)] <- 0
#drops empty factors
# ksl$property<-ksl$property[drop=TRUE]
ksl.nw<-network(ksl)
ksl.prior <- jointprior(ksl.nw)
#The ban list is a matrix with two columns. Each row contains the directed edge
#that is not allowed.
#banlist <- matrix(c(5,5,6,6,7,7,9,8,9,8,9,8,9,8),ncol=2)
## ban arrows towards Sex and Year
# [,1] [,2]
#[1,] 5 8
#[2,] 5 9
#[3,] 6 8
#[4,] 6 9
#[5,] 7 8
#[6,] 7 9
#[7,] 9 8
# note this is a computationally intensive procedure and if you know that certain
# variables should not have relationships you should specify the arcs between
# variables to exclude in the banlist
ksl.nw <- learn(ksl.nw,ksl,ksl.prior)$nw
#this step appears expensive so reset restart from 2 to 1 and degree from 10 to 1
result <- heuristic(ksl.nw,ksl,ksl.prior,restart=1,degree=1)$nw
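The ROC plots used throughout reduce naturally to two summary numbers, the area under the curve and the KS statistic, via ROCR's performance object. A minimal sketch; the score and labels vectors below are made-up stand-ins for a model's test-set probabilities and actual outcomes (e.g. test$tscore3 and test$good_bad):

```r
library(ROCR)
# hypothetical scores and 0/1 outcomes standing in for test-set predictions
score  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)
labels <- c(1, 1, 0, 1, 1, 0, 0, 0)

pred <- prediction(score, labels)

# area under the ROC curve
auc <- performance(pred, "auc")@y.values[[1]]

# KS statistic: maximum separation between the TPR and FPR curves
perf <- performance(pred, "tpr", "fpr")
ks <- max(perf@y.values[[1]] - perf@x.values[[1]])

auc
ks
```

The same prediction object feeds both measures, so a single scoring pass over the test set is enough.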