Credit Card Fraud Detection - XGBOOST*

Javier Ng, EY

* PDF is available on the EY FDA shared drive. Current version: March 11, 2019. Corresponding author: javier.ng@sg.ey.com.

The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

Keywords: xgboost, machine learning, R

Overview

Loading of Libraries

setwd("C:/Users/zm679xs/Desktop/R/Essential-R-Codes/EDA Projects/Credit Card Fraud Detection")

library(ggplot2)  # Data visualization
library(readr)    # CSV file I/O, e.g. the read_csv function
library(caret)    # createDataPartition, confusionMatrix
library(DMwR)     # SMOTE
library(xgboost)
library(Matrix)
library(reshape)  # melt
library(pROC)     # AUC

# Input data files are available in the "../input/" directory.
load("ccdata.RData")

Split Data

Split the data into train, test and cv sets using caret.

## split into train, test, cv
set.seed(1900)
inTrain <- createDataPartition(y = ccdata$Class, p = .6, list = F)   # 60% train
train   <- ccdata[inTrain, ]
testcv  <- ccdata[-inTrain, ]
inTest  <- createDataPartition(y = testcv$Class, p = .5, list = F)   # 20% test
test    <- testcv[inTest, ]
cv      <- testcv[-inTest, ]                                         # 20% cv
train$Class <- as.factor(train$Class)
rm(inTrain, inTest, testcv)
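As a quick sanity check, not part of the original script, the sketch below (with a hypothetical helper prop_fraud) looks at the fraud rate in each partition. Since y is numeric here, createDataPartition groups it by percentiles rather than by class, so the rates should be close to the overall 0.172% but need not match it exactly.

prop_fraud <- function(d) round(mean(d$Class == 1), 5)   # share of fraud cases
sapply(list(train = train, test = test, cv = cv), prop_fraud)
# each value should sit near the overall fraud rate of roughly 0.00172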
Imbalanced Dataset

Use SMOTE to make the training set more balanced.

# using SMOTE instead
i <- grep("Class", colnames(train))   # get index of the Class column
train_smote <- SMOTE(Class ~ ., as.data.frame(train), perc.over = 20000, perc.under = 100)
table(train_smote$Class)

##
##     0     1
## 60400 60702
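These counts follow from DMwR's documented semantics of perc.over and perc.under: perc.over = 20000 creates 200 synthetic cases per original fraud, and perc.under = 100 samples one majority case per synthetic case. A short sketch of the arithmetic, with illustrative variable names that are not part of the original script:

n_fraud     <- sum(train$Class == 1)       # frauds in the 60% training split (302 here)
n_synthetic <- n_fraud * 20000 / 100       # 200 synthetic frauds per original fraud
n_minority  <- n_fraud + n_synthetic       # 302 + 60400 = 60702 fraud rows after SMOTE
n_majority  <- n_synthetic * 100 / 100     # 60400 sampled non-fraud rows
c(minority = n_minority, majority = n_majority)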
# Identify the predictors and the dependent variable, aka label.
predictors <- colnames(train_smote[-ncol(train_smote)])   # remove last column (Class)

# xgboost works only if the labels are numeric, hence convert the labels (Class) to numeric.
label <- as.numeric(train_smote[, ncol(train_smote)])
# xgboost also requires the numeric labels to start from 0, hence subtract 1 from the label.
label <- as.numeric(train_smote[, ncol(train_smote)]) - 1
print(table(label))

## label
##     0     1
## 60400 60702

Setting Parameters for XGBoost

# set parameters:
parameters <- list(
  # General Parameters
  booster           = "gbtree",
  silent            = 0,
  # Booster Parameters
  eta               = 0.3,
  gamma             = 0,
  max_depth         = 6,
  min_child_weight  = 1,
  subsample         = 1,
  colsample_bytree  = 1,
  colsample_bylevel = 1,
  lambda            = 1,
  alpha             = 0,
  # Task Parameters
  objective         = "binary:logistic",
  eval_metric       = "auc",
  seed              = 1900
)

Train Model

Cross-validate the model with the parameters set above over 200 rounds; increasing nrounds further does not improve the model. Train and cv AUC rise quickly in the early rounds and stagnate at later rounds, as expected (see the plotting sketch after the log below).

## Training of models
# Original
cv.nround = 200   # number of rounds; this can be set to a lower or higher value if you wish

bst.cv <- xgb.cv(
  param = parameters,
  data = as.matrix(train_smote[, predictors]),
  label = label,
  nfold = 3,
  nrounds = cv.nround,
  prediction = T)

## [1]   train-auc:0.988126+0.000290  test-auc:0.987120+0.000203
## [2]   train-auc:0.994458+0.000626  test-auc:0.993682+0.000973
## [3]   train-auc:0.996438+0.000303  test-auc:0.995692+0.000592
## ...
## [47]  train-auc:1.000000+0.000000  test-auc:0.999973+0.000018
## ...
## [139] train-auc:1.000000+0.000000  test-auc:0.999988+0.000009
## ...
## [200] train-auc:1.000000+0.000000  test-auc:0.999987+0.000011
## (full 200-round log omitted: train AUC reaches 1.000000 by round 47, test AUC plateaus around 0.999988)
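The stagnation is easier to see when the cross-validation log is plotted. The sketch below is not part of the original script; it uses the already-loaded reshape and ggplot2 packages, and the names cv_log and cv_long are illustrative.

# plot mean train vs. test AUC per boosting round
cv_log  <- as.data.frame(bst.cv$evaluation_log)[, c("iter", "train_auc_mean", "test_auc_mean")]
cv_long <- melt(cv_log, id.vars = "iter")            # reshape::melt -> iter, variable, value
ggplot(cv_long, aes(x = iter, y = value, colour = variable)) +
  geom_line() +
  labs(x = "Boosting round", y = "Mean AUC (3-fold CV)") +
  theme_bw()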
# Find the round with the maximum cross-validated test AUC
min.loss.idx = which.max(bst.cv$evaluation_log[, test_auc_mean])
cat("Minimum logloss occurred in round : ", min.loss.idx, "\n")

## Minimum logloss occurred in round :  139

# CV metrics at that round
print(bst.cv$evaluation_log[min.loss.idx, ])

##    iter train_auc_mean train_auc_std test_auc_mean test_auc_std
## 1:  139              1             0     0.9999883 9.392662e-06

Predict

Train the final model with the selected number of rounds, predict on the test set and apply a threshold of 0.5 (a sketch exploring other cut-offs follows the code below).

## Make predictions
bst <- xgboost(
  param = parameters,
  data = as.matrix(train_smote[, predictors]),   # training it without the output variable!
  label = label,
  nrounds = min.loss.idx)

## [1]   train-auc:0.988286
## [2]   train-auc:0.995025
## [3]   train-auc:0.996358
## ...
## [49]  train-auc:1.000000
## ...
## [139] train-auc:1.000000
## (per-round training log omitted: train AUC reaches 1.000000 from round 49 onwards)

# Make prediction on the testing data.
test$prediction <- predict(bst, as.matrix(test[, predictors]))   # here, the output variable is removed
test$prediction <- ifelse(test$prediction >= 0.5, 1, 0)
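The 0.5 cut-off is a modelling choice rather than something dictated by xgboost. The sketch below is illustrative and not part of the original script (probs and cut are hypothetical names); it shows how the number of flagged transactions and the number of frauds caught move as the cut-off changes.

probs <- predict(bst, as.matrix(test[, predictors]))   # raw probabilities, before thresholding
for (cut in c(0.3, 0.5, 0.7)) {
  pred <- as.integer(probs >= cut)
  cat("cut-off", cut, ": flagged", sum(pred), "transactions;",
      sum(pred == 1 & test$Class == 1), "of", sum(test$Class == 1), "frauds caught\n")
}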
Confusion Matrix

Sensitivity (TPR) = 0.9991, Specificity (TNR) = 0.9135, Accuracy = 0.9989.

# Compute the accuracy of predictions.
confmatrix_table <- confusionMatrix(as.factor(test$prediction), as.factor(test$Class))   # sensitivity, specificity; accuracy = 0.9989

plot_confusion_matrix <- function(test_df, sSubtitle) {
  tst <- data.frame(round(test_df$prediction, 0), test_df$Class)
  opts <- c("Predicted", "True")
  names(tst) <- opts
  cf <- plyr::count(tst)
  cf[opts][cf[opts] == 0] <- "Not Fraud"
  cf[opts][cf[opts] == 1] <- "Fraud"

  ggplot(data = cf, mapping = aes(x = True, y = Predicted)) +
    labs(title = "Confusion matrix", subtitle = sSubtitle) +
    geom_tile(aes(fill = freq), colour = "grey") +
    geom_text(aes(label = sprintf("%1.0f", freq)), vjust = 1) +
    scale_fill_gradient(low = "lightblue", high = "Green") +
    theme_bw() +
    theme(legend.position = "none")
}

plot_confusion_matrix(test, paste("XGBoost with",
                                  paste("min logloss at round: ", min.loss.idx,
                                        "Sensitivity:", round(confmatrix_table[[4]][1], 4), "\n",
                                        "Specificity:", round(confmatrix_table[[4]][2], 4))))

[Figure: confusion-matrix heat map, subtitled "XGBoost with min logloss at round: 139, Sensitivity: 0.9991, Specificity: 0.9135". Cell counts: 56,803 legitimate transactions correctly predicted Not Fraud, 54 legitimate transactions flagged as Fraud, 95 frauds correctly predicted Fraud, 9 frauds predicted Not Fraud.]
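Because the classes are so unbalanced, accuracy alone says little. A short sketch, derived from the cell counts in the figure above rather than from the original script, computes precision and recall for the fraud class:

# precision and recall for the fraud class, using the confusion-matrix counts above
tp <- 95    # frauds correctly flagged
fp <- 54    # legitimate transactions flagged as fraud
fn <- 9     # frauds missed
precision <- tp / (tp + fp)   # about 0.64: roughly a third of alerts are false alarms
recall    <- tp / (tp + fn)   # about 0.91: matches the reported specificity for the fraud class
c(precision = round(precision, 4), recall = round(recall, 4))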
Importance Matrix

Identify the features that are most important.

## plot feature importance
importance_matrix <- xgb.importance(model = bst)
xgb.plot.importance(importance_matrix)

[Figure: feature importance plot. The top features are V14, V4, V8, V11, V10, V12, V1, V26, V3 and V17, followed by Amount, V7, V13, V18 and Time; importance scores range from roughly 0 to 0.4.]
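If the numeric scores are needed rather than the plot, the importance table returned by xgb.importance can be inspected directly; a trivial sketch:

# top 10 features with their Gain, Cover and Frequency scores
head(importance_matrix, 10)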
ROC Curve

AUC = 0.956

###############################################################
library(ROCR)

## Loading required package: gplots
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
##     lowess

# Use ROCR package to plot ROC Curve
xgb.pred <- prediction(test$prediction, test$Class)
xgb.perf <- performance(xgb.pred, "tpr", "fpr")

plot(xgb.perf,
     avg = "threshold",
     colorize = TRUE,
     lwd = 1,
     main = "ROC Curve w/ Thresholds",
     print.cutoffs.at = seq(0, 1, by = 0.05),
     text.adj = c(-0.5, 0.5),
     text.cex = 0.5)
grid(col = "lightgray")
axis(1, at = seq(0, 1, by = 0.1))
axis(2, at = seq(0, 1, by = 0.1))
abline(v = c(0.1, 0.3, 0.5, 0.7, 0.9), col = "lightgray", lty = "dotted")
abline(h = c(0.1, 0.3, 0.5, 0.7, 0.9), col = "lightgray", lty = "dotted")
lines(x = c(0, 1), y = c(0, 1), col = "black", lty = "dotted")

[Figure: "ROC Curve w/ Thresholds" - average true positive rate against average false positive rate, with cut-off values printed along the curve and the diagonal drawn as a dotted reference line.]

auc_ROCR <- performance(xgb.pred, measure = "auc")   # gives the AUC rate
auc_ROCR <- auc_ROCR@y.values[[1]]
auc_ROCR

## [1] 0.9562559

# AUC = 0.956
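Note that test$prediction was overwritten with 0/1 labels before this step, so the curve and the AUC of 0.956 are computed from hard class predictions rather than from the model's probability scores. If the AUC of the underlying scores is of interest, the pROC package loaded at the top can be used; a minimal sketch, where prob is a hypothetical variable holding the raw predictions:

# AUC from the raw predicted probabilities (pROC)
prob    <- predict(bst, as.matrix(test[, predictors]))   # probabilities, before thresholding
roc_obj <- roc(test$Class, prob)                         # pROC::roc(response, predictor)
auc(roc_obj)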