H2O Package
2018-04-14
"h2o" April 14, 2018 R topics documented: h2o-package . . . . . . . aaa . . . . . . . . . . . . apply . . . . . . . . . . as.character.H2OFrame . as.data.frame.H2OFrame as.factor . . . . . . . . . as.h2o . . . . . . . . . . as.matrix.H2OFrame . . as.numeric . . . . . . . . as.vector.H2OFrame . . australia . . . . . . . . . colnames . . . . . . . . dim.H2OFrame . . . . . dimnames.H2OFrame . . h2o.abs . . . . . . . . . h2o.acos . . . . . . . . . h2o.aggregated_frame . . h2o.aggregator . . . . . h2o.aic . . . . . . . . . . h2o.all . . . . . . . . . . h2o.anomaly . . . . . . . h2o.any . . . . . . . . . h2o.anyFactor . . . . . . h2o.arrange . . . . . . . h2o.ascharacter . . . . . h2o.asfactor . . . . . . . h2o.asnumeric . . . . . . h2o.assign . . . . . . . . h2o.as_date . . . . . . . h2o.auc . . . . . . . . . h2o.automl . . . . . . . h2o.betweenss . . . . . . h2o.biases . . . . . . . . h2o.bottomN . . . . . . h2o.cbind . . . . . . . . h2o.ceiling . . . . . . . h2o.centers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 8 8 9 9 10 11 12 13 13 14 14 15 15 16 17 17 18 19 20 20 21 21 22 22 23 23 24 24 25 25 27 28 28 29 29 30 R topics documented: 2 h2o.centersSTD . . . . . . . . . . . . . . h2o.centroid_stats . . . . . . . . . . . . . h2o.clearLog . . . . . . . . . . . . . . . h2o.clusterInfo . . . . . . . . . . . . . . h2o.clusterIsUp . . . . . . . . . . . . . . h2o.clusterStatus . . . . . . . . . . . . . h2o.cluster_sizes . . . . . . . . . . . . . h2o.coef . . . . . . . . . . . . . . . . . . 
h2o.coef_norm . . . . . . . . . . . . . . h2o.colnames . . . . . . . . . . . . . . . h2o.columns_by_type . . . . . . . . . . . h2o.computeGram . . . . . . . . . . . . h2o.confusionMatrix . . . . . . . . . . . h2o.connect . . . . . . . . . . . . . . . . h2o.cor . . . . . . . . . . . . . . . . . . h2o.cos . . . . . . . . . . . . . . . . . . h2o.cosh . . . . . . . . . . . . . . . . . . h2o.coxph . . . . . . . . . . . . . . . . . h2o.createFrame . . . . . . . . . . . . . . h2o.cross_validation_fold_assignment . . h2o.cross_validation_holdout_predictions h2o.cross_validation_models . . . . . . . h2o.cross_validation_predictions . . . . . h2o.cummax . . . . . . . . . . . . . . . . h2o.cummin . . . . . . . . . . . . . . . . h2o.cumprod . . . . . . . . . . . . . . . h2o.cumsum . . . . . . . . . . . . . . . . h2o.cut . . . . . . . . . . . . . . . . . . h2o.day . . . . . . . . . . . . . . . . . . h2o.dayOfWeek . . . . . . . . . . . . . . h2o.dct . . . . . . . . . . . . . . . . . . h2o.ddply . . . . . . . . . . . . . . . . . h2o.decryptionSetup . . . . . . . . . . . h2o.deepfeatures . . . . . . . . . . . . . h2o.deeplearning . . . . . . . . . . . . . h2o.deepwater . . . . . . . . . . . . . . . h2o.deepwater.available . . . . . . . . . . h2o.describe . . . . . . . . . . . . . . . . h2o.difflag1 . . . . . . . . . . . . . . . . h2o.dim . . . . . . . . . . . . . . . . . . h2o.dimnames . . . . . . . . . . . . . . . h2o.distance . . . . . . . . . . . . . . . . h2o.downloadAllLogs . . . . . . . . . . . h2o.downloadCSV . . . . . . . . . . . . h2o.download_mojo . . . . . . . . . . . h2o.download_pojo . . . . . . . . . . . . h2o.entropy . . . . . . . . . . . . . . . . h2o.exp . . . . . . . . . . . . . . . . . . h2o.exportFile . . . . . . . . . . . . . . . h2o.exportHDFS . . . . . . . . . . . . . h2o.fillna . . . . . . . . . . . . . . . . . h2o.filterNACols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 30 31 31 32 32 33 33 34 34 35 35 36 37 38 39 39 40 41 42 43 43 44 44 45 45 46 46 47 48 48 49 50 51 52 58 62 62 63 63 64 64 65 65 66 67 68 68 69 70 70 71 R topics documented: h2o.findSynonyms . . . . . . . . . h2o.find_row_by_threshold . . . . . h2o.find_threshold_by_max_metric h2o.floor . . . . . . . . . . . . . . . h2o.flow . . . . . . . . . . . . . . . h2o.gainsLift . . . . . . . . . . . . h2o.gbm . . . . . . . . . . . . . . . h2o.getConnection . . . . . . . . . h2o.getFrame . . . . . . . . . . . . h2o.getFutureModel . . . . . . . . . h2o.getGLMFullRegularizationPath h2o.getGrid . . . . . . . . . . . . . h2o.getId . . . . . . . . . . . . . . h2o.getModel . . . . . . . . . . . . h2o.getTimezone . . . . . . . . . . h2o.getTypes . . . . . . . . . . . . h2o.getVersion . . . . . . . . . . . h2o.giniCoef . . . . . . . . . . . . h2o.glm . . . . . . . . . . . . . . . h2o.glrm . . . . . . . . . . . . . . . h2o.grep . . . . . . . . . . . . . . . h2o.grid . . . . . . . . . . . . . . . h2o.group_by . . . . . . . . . . . . h2o.gsub . . . . . . . . . . . . . . . h2o.head . . . . . . . . . . . . . . . h2o.hist . . . . . . . . . . . . . . . h2o.hit_ratio_table . . . . . . . . . h2o.hour . . . . . . . . . . . . . . . h2o.ifelse . . . . . . . . . . . . . . h2o.importFile . . . . . . . . . . . h2o.import_sql_select . . . . . . . . h2o.import_sql_table . . . . . . . . h2o.impute . . . . . . . . . . . . . h2o.init . . . . . . . . . . . . . . . h2o.insertMissingValues . . . . . . h2o.interaction . 
. . . . . . . . . . h2o.isax . . . . . . . . . . . . . . . h2o.ischaracter . . . . . . . . . . . h2o.isfactor . . . . . . . . . . . . . h2o.isnumeric . . . . . . . . . . . . h2o.is_client . . . . . . . . . . . . . h2o.kfold_column . . . . . . . . . . h2o.killMinus3 . . . . . . . . . . . h2o.kmeans . . . . . . . . . . . . . h2o.kurtosis . . . . . . . . . . . . . h2o.levels . . . . . . . . . . . . . . h2o.listTimezones . . . . . . . . . . h2o.list_all_extensions . . . . . . . h2o.list_api_extensions . . . . . . . h2o.list_core_extensions . . . . . . h2o.loadModel . . . . . . . . . . . h2o.log . . . . . . . . . . . . . . . 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 72 72 73 73 73 74 78 79 79 79 80 80 81 81 82 82 82 83 87 90 91 92 93 94 94 95 95 96 97 98 99 100 101 103 104 105 106 106 107 107 107 108 108 110 110 111 111 111 112 112 113 R topics documented: 4 h2o.log10 . . . . . . . . . . h2o.log1p . . . . . . . . . . h2o.log2 . . . . . . . . . . . h2o.logAndEcho . . . . . . h2o.logloss . . . . . . . . . h2o.ls . . . . . . . . . . . . h2o.lstrip . . . . . . . . . . h2o.mae . . . . . . . . . . . h2o.makeGLMModel . 
. . . h2o.make_metrics . . . . . . h2o.match . . . . . . . . . . h2o.max . . . . . . . . . . . h2o.mean . . . . . . . . . . h2o.mean_per_class_error . h2o.mean_residual_deviance h2o.median . . . . . . . . . h2o.merge . . . . . . . . . . h2o.metric . . . . . . . . . . h2o.min . . . . . . . . . . . h2o.mktime . . . . . . . . . h2o.month . . . . . . . . . . h2o.mse . . . . . . . . . . . h2o.nacnt . . . . . . . . . . h2o.naiveBayes . . . . . . . h2o.names . . . . . . . . . . h2o.na_omit . . . . . . . . . h2o.nchar . . . . . . . . . . h2o.ncol . . . . . . . . . . . h2o.networkTest . . . . . . . h2o.nlevels . . . . . . . . . h2o.no_progress . . . . . . . h2o.nrow . . . . . . . . . . h2o.null_deviance . . . . . . h2o.null_dof . . . . . . . . . h2o.num_iterations . . . . . h2o.num_valid_substrings . h2o.openLog . . . . . . . . h2o.parseRaw . . . . . . . . h2o.parseSetup . . . . . . . h2o.partialPlot . . . . . . . . h2o.performance . . . . . . h2o.pivot . . . . . . . . . . h2o.prcomp . . . . . . . . . h2o.predict_json . . . . . . . h2o.print . . . . . . . . . . . h2o.prod . . . . . . . . . . . h2o.proj_archetypes . . . . . h2o.quantile . . . . . . . . . h2o.r2 . . . . . . . . . . . . h2o.randomForest . . . . . . h2o.range . . . . . . . . . . h2o.rbind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 114 114 115 115 116 116 117 117 118 118 119 120 121 122 122 123 124 126 126 127 127 128 129 131 131 132 132 133 133 133 134 134 135 135 136 136 137 138 139 140 141 141 143 144 144 145 146 147 147 151 151 R topics documented: h2o.reconstruct . . . . . h2o.relevel . . . . . . . . h2o.removeAll . . . . . h2o.removeVecs . . . . . h2o.rep_len . . . . . . . h2o.residual_deviance . . h2o.residual_dof . . . . h2o.rm . . . . . . . . . . h2o.rmse . . . . . . . . . h2o.rmsle . . . . . . . . h2o.round . . . . . . . . h2o.rstrip . . . . . . . . h2o.runif . . . . . . . . . h2o.saveModel . . . . . h2o.saveModelDetails . . h2o.saveMojo . . . . . . h2o.scale . . . . . . . . h2o.scoreHistory . . . . h2o.sd . . . . . . . . . . h2o.sdev . . . . . . . . . h2o.setLevels . . . . . . h2o.setTimezone . . . . h2o.show_progress . . . h2o.shutdown . . . . . . h2o.signif . . . . . . . . h2o.sin . . . . . . . . . . h2o.skewness . . . . . . h2o.splitFrame . . . . . h2o.sqrt . . . . . . . . . h2o.stackedEnsemble . . h2o.startLogging . . . . h2o.std_coef_plot . . . . h2o.stopLogging . . . . h2o.str . . . . . . . . . . h2o.stringdist . . . . . . h2o.strsplit . . . . . . . h2o.sub . . . . . . . . . h2o.substring . . . . . . h2o.sum . . . . . . . . . h2o.summary . . . . . . h2o.svd . . . . . . . . . h2o.table . . . . . . . . . h2o.tabulate . . . . . . . h2o.tan . . . . . . . . . h2o.tanh . . . . 
. . . . . h2o.target_encode_apply h2o.target_encode_create h2o.toFrame . . . . . . . h2o.tokenize . . . . . . . h2o.tolower . . . . . . . h2o.topN . . . . . . . . h2o.totss . . . . . . . . . 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 153 153 154 154 155 155 156 156 157 158 158 159 159 160 161 162 162 163 163 164 164 165 165 166 166 167 167 168 169 170 171 171 172 172 173 173 174 175 175 176 178 178 179 180 180 181 182 183 184 184 185 R topics documented: 6 h2o.tot_withinss . . . . . . . . . . . 
. . . h2o.toupper . . . . . . . . . . . . . . . . h2o.transform . . . . . . . . . . . . . . . h2o.trim . . . . . . . . . . . . . . . . . . h2o.trunc . . . . . . . . . . . . . . . . . h2o.unique . . . . . . . . . . . . . . . . h2o.var . . . . . . . . . . . . . . . . . . h2o.varimp . . . . . . . . . . . . . . . . h2o.varimp_plot . . . . . . . . . . . . . . h2o.week . . . . . . . . . . . . . . . . . h2o.weights . . . . . . . . . . . . . . . . h2o.which . . . . . . . . . . . . . . . . . h2o.which_max . . . . . . . . . . . . . . h2o.which_min . . . . . . . . . . . . . . h2o.withinss . . . . . . . . . . . . . . . . h2o.word2vec . . . . . . . . . . . . . . . h2o.xgboost . . . . . . . . . . . . . . . . h2o.xgboost.available . . . . . . . . . . . h2o.year . . . . . . . . . . . . . . . . . . H2OAutoML-class . . . . . . . . . . . . H2OClusteringModel-class . . . . . . . . H2OConnection-class . . . . . . . . . . . H2OConnectionMutableState . . . . . . . H2OCoxPHModel-class . . . . . . . . . H2OCoxPHModelSummary-class . . . . H2OFrame-class . . . . . . . . . . . . . H2OFrame-Extract . . . . . . . . . . . . H2OGrid-class . . . . . . . . . . . . . . H2OModel-class . . . . . . . . . . . . . H2OModelFuture-class . . . . . . . . . . H2OModelMetrics-class . . . . . . . . . housevotes . . . . . . . . . . . . . . . . . iris . . . . . . . . . . . . . . . . . . . . . is.character . . . . . . . . . . . . . . . . is.factor . . . . . . . . . . . . . . . . . . is.h2o . . . . . . . . . . . . . . . . . . . is.numeric . . . . . . . . . . . . . . . . . Logical-or . . . . . . . . . . . . . . . . . ModelAccessors . . . . . . . . . . . . . . names.H2OFrame . . . . . . . . . . . . . Ops.H2OFrame . . . . . . . . . . . . . . plot.H2OModel . . . . . . . . . . . . . . plot.H2OTabulate . . . . . . . . . . . . . predict.H2OAutoML . . . . . . . . . . . predict.H2OModel . . . . . . . . . . . . predict_leaf_node_assignment.H2OModel print.H2OFrame . . . . . . . . . . . . . . print.H2OTable . . . . . . . . . . 
. . . . prostate . . . . . . . . . . . . . . . . . . range.H2OFrame . . . . . . . . . . . . . show,H2OCoxPHModelSummary-method str.H2OFrame . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 186 186 187 187 188 188 189 189 190 191 191 192 192 193 193 194 197 198 198 199 199 200 200 201 201 201 202 203 203 204 204 205 205 206 206 206 207 207 208 208 210 211 212 212 213 214 214 215 215 216 216 h2o-package 7 summary,H2OCoxPHModel-method summary,H2OGrid-method . . . . . summary,H2OModel-method . . . . use.package . . . . . . . . . . . . . walking . . . . . . . . . . . . . . . zzz . . . . . . . . . . . . . . . . . . && . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217 217 218 218 219 219 220 221 h2o-package H2O R Interface Description This is a package for running H2O via its REST API from within R. To communicate with a H2O instance, the version of the R package must match the version of H2O. When connecting to a new H2O cluster, it is necessary to re-run the initializer. 
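A minimal sketch of starting or attaching to H2O from R, assuming the h2o package and a Java runtime are installed. The `strict_version_check` argument of `h2o.init` (TRUE by default) is expected to make the call fail when the cluster version differs from the installed R package version. The remote address `192.168.1.10` is a hypothetical placeholder.

```r
library(h2o)

# Local cluster: start (or attach to) H2O at 127.0.0.1:54321.
# strict_version_check = TRUE (the default) should make this call fail
# if the cluster version differs from the installed R package version.
h2o.init(ip = "127.0.0.1", port = 54321, strict_version_check = TRUE)

# Confirm the connection and print cluster details, including its version.
stopifnot(h2o.clusterIsUp())
h2o.clusterInfo()
print(packageVersion("h2o"))  # version of the installed R package

# Remote cluster (hypothetical address): re-run the initializer against
# the new cluster instead of launching a local JVM.
# h2o.init(ip = "192.168.1.10", port = 54321, startH2O = FALSE)
```

Each time you switch to a different cluster, call `h2o.init` again with that cluster's address so that subsequent commands are sent to it.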
Details

Package: h2o
Type: Package
Version: 3.18.0.7
Branch: rel-wolpert
Date: Sat Apr 14 22:16:02 UTC 2018
License: Apache License (== 2.0)
Depends: R (>= 2.13.0), RCurl, jsonlite, statmod, tools, methods, utils

This package lets you run basic H2O commands from R. To use it, H2O must already be running. To run H2O on your local machine, call h2o.init() without any arguments; H2O is then launched automatically at localhost:54321 (IP "127.0.0.1", port 54321). If H2O is running on a remote cluster, pass the IP address and port of that machine as arguments to h2o.init().

H2O supports a number of standard statistical models, such as GLM, K-means, and Random Forest. For example, to fit a GLM, call h2o.glm with the parsed H2O data and the model parameters (response variable, error distribution, and so on) as arguments. The computation is performed on the H2O server that holds the data, not within the R environment.

Note that no data is stored in the R workspace and no computation is carried out by R. R saves only named objects that uniquely identify the data set, model, and so on, on the server. When the user makes a request, R queries the server via the REST API, which returns a JSON response with the relevant information; R then displays that information in the console.

If you are using an older version of H2O, use the following porting guide to update your scripts: Porting Scripts

Author(s)

Maintainer: The H2O.ai team

References

• H2O.ai Homepage
• H2O Documentation
• H2O on GitHub

aaa: Starting H2O For examples

Description

Starts H2O for use in the examples.

Examples

```r
library(h2o)
# Exit on one specific macOS release (Darwin 13.4.0); otherwise start H2O with 2 threads.
if (Sys.info()['sysname'] == "Darwin" && Sys.info()['release'] == '13.4.0') {
  quit(save = "no")
} else {
  h2o.init(nthreads = 2)
}
```

apply: Apply on H2O Datasets

Description

Method for apply on H2OFrame objects.

Usage

```r
apply(X, MARGIN, FUN, ...)
```

Arguments

X: an H2OFrame object on which apply will operate.
MARGIN  the dimension over which the function will be applied: 1 for rows or 2 for columns.
FUN     the function to be applied.
...     optional arguments to FUN.

Value

Produces a new H2OFrame of the output of the applied function. The output is stored in H2O so that it can be used in subsequent H2O processes.

See Also

apply for the base generic.

Examples

h2o.init()
irisPath <- system.file("extdata", "iris.csv", package="h2o")
iris.hex <- h2o.importFile(path = irisPath, destination_frame = "iris.hex")
summary(apply(iris.hex, 2, sum))

as.character.H2OFrame   Convert an H2OFrame to a String

Description

Convert an H2OFrame to a String

Usage

## S3 method for class 'H2OFrame'
as.character(x, ...)

Arguments

x       An H2OFrame object
...     Further arguments to be passed from or to other methods.

Examples

h2o.init()
pretrained.frame <- as.h2o(data.frame(
  C1 = c("a", "b"), C2 = c(0, 1), C3 = c(1, 0), C4 = c(0.2, 0.8),
  stringsAsFactors = FALSE))
pretrained.w2v <- h2o.word2vec(pre_trained = pretrained.frame, vec_size = 3)
words <- as.character(as.h2o(c("b", "a", "c", NA, "a")))
vecs <- h2o.transform(pretrained.w2v, words = words)

as.data.frame.H2OFrame  Converts parsed H2O data into an R data frame

Description

Downloads the H2O data and then scans it into an R data frame.

Usage

## S3 method for class 'H2OFrame'
as.data.frame(x, ...)

Arguments

x       An H2OFrame object.
...     Further arguments to be passed down from other methods.

Details

Method as.data.frame.H2OFrame will use fread if the data.table package is installed in the required version.

See Also

use.package

Examples

h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
as.data.frame(prostate.hex)

as.factor               Convert H2O Data to Factors

Description

Convert a column into a factor column.

Usage

as.factor(x)

Arguments

x       a column from an H2OFrame data set.

See Also

as.factor.
Examples

h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
prostate.hex[,2] <- as.factor(prostate.hex[,2])
summary(prostate.hex)

as.h2o                  Create H2OFrame

Description

Import an R object to the H2O cloud.

Usage

as.h2o(x, destination_frame = "", ...)

## Default S3 method:
as.h2o(x, destination_frame = "", ...)

## S3 method for class 'H2OFrame'
as.h2o(x, destination_frame = "", ...)

## S3 method for class 'data.frame'
as.h2o(x, destination_frame = "", ...)

## S3 method for class 'Matrix'
as.h2o(x, destination_frame = "", ...)

Arguments

x                  An R object.
destination_frame  A string with the desired name for the H2OFrame.
...                arguments passed on to methods.

Details

Method as.h2o.data.frame will use fwrite if the data.table package is installed in the required version.

To speed up execution time for large sparse matrices, use H2O's data.table support. Make sure you have installed and imported the data.table and slam packages, then turn it on with options("h2o.use.data.table"=TRUE).

References

http://blog.h2o.ai/2016/04/fast-csv-writing-for-r/

See Also

use.package

Examples

h2o.init()
hi <- as.h2o(iris)
he <- as.h2o(euro)
hl <- as.h2o(letters)
hm <- as.h2o(state.x77)
hh <- as.h2o(hi)
stopifnot(is.h2o(hi), dim(hi)==dim(iris),
          is.h2o(he), dim(he)==c(length(euro),1L),
          is.h2o(hl), dim(hl)==c(length(letters),1L),
          is.h2o(hm), dim(hm)==dim(state.x77),
          is.h2o(hh), dim(hh)==dim(hi))
if (requireNamespace("Matrix", quietly=TRUE)) {
  data <- rep(0, 100)
  data[(1:10)^2] <- 1:10 * pi
  m <- matrix(data, ncol = 20, byrow = TRUE)
  m <- Matrix::Matrix(m, sparse = TRUE)
  hs <- as.h2o(m)
  stopifnot(is.h2o(hs), dim(hs)==dim(m))
}

as.matrix.H2OFrame      Convert an H2OFrame to a matrix

Description

Convert an H2OFrame to a matrix

Usage

## S3 method for class 'H2OFrame'
as.matrix(x, ...)

Arguments

x       An H2OFrame object
...     Further arguments to be passed down from other methods.
Examples

h2o.init()
irisPath <- system.file("extdata", "iris.csv", package="h2o")
iris <- h2o.uploadFile(path = irisPath)
iris.hex <- as.h2o(iris)
describe <- h2o.describe(iris.hex)
mins <- as.matrix(apply(iris.hex, 2, min))
print(mins)

as.numeric              Convert H2O Data to Numeric

Description

Converts an H2O column into a numeric value column.

Usage

as.numeric(x)

Arguments

x       a column from an H2OFrame data set.
...     Further arguments to be passed from or to other methods.

Examples

h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
prostate.hex[,2] <- as.factor(prostate.hex[,2])
prostate.hex[,2] <- as.numeric(prostate.hex[,2])

as.vector.H2OFrame      Convert an H2OFrame to a vector

Description

Convert an H2OFrame to a vector

Usage

## S3 method for class 'H2OFrame'
as.vector(x, mode)

Arguments

x       An H2OFrame object
mode    Mode to coerce vector to

Examples

h2o.init()
irisPath <- system.file("extdata", "iris.csv", package="h2o")
# Load the H2O copy under its own name so the base R iris data frame stays intact
hex <- h2o.uploadFile(path = irisPath)
cor_R <- cor(as.matrix(iris[,1]))
cor_h2o <- cor(hex[,1])
iris_Rcor <- cor(iris[,1:4])
iris_H2Ocor <- as.data.frame(cor(hex[,1:4]))
h2o_vec <- as.vector(unlist(iris_H2Ocor))
r_vec <- as.vector(unlist(iris_Rcor))

australia               Australia Coastal Data

Description

Temperature, soil moisture, runoff, and other environmental measurements from the Australia coast. The data is available from http://cs.colby.edu/courses/S11/cs251/labs/lab07/AustraliaSubset.csv.

Format

A data frame with 251 rows and 8 columns

colnames                Returns the column names of an H2OFrame

Description

Returns the column names of an H2OFrame

Usage

colnames(x, do.NULL = TRUE, prefix = "col")

Arguments

x        An H2OFrame object.
do.NULL  logical. If FALSE and names are NULL, names are created.
prefix   prefix for created names.
Examples

h2o.init()
iris.hex <- as.h2o(iris)
colnames(iris.hex)  # Returns "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

dim.H2OFrame            Returns the Dimensions of an H2OFrame

Description

Returns the number of rows and columns for an H2OFrame object.

Usage

## S3 method for class 'H2OFrame'
dim(x)

Arguments

x       An H2OFrame object.

See Also

dim for the base R method.

Examples

h2o.init()
iris.hex <- as.h2o(iris)
dim(iris.hex)

dimnames.H2OFrame       Column names of an H2OFrame

Description

Get the column names of an H2OFrame

Usage

## S3 method for class 'H2OFrame'
dimnames(x)

Arguments

x       An H2OFrame

Examples

h2o.init()
n <- 2000
# Generate variables V1, ... V10
X <- matrix(rnorm(10*n), n, 10)
# y = +1 if sum_i x_{ij}^2 > chisq median on 10 df
y <- rep(-1, n)
y[apply(X*X, 1, sum) > qchisq(.5, 10)] <- 1
# Assign names to the columns of X:
dimnames(X)[[2]] <- c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10")

h2o.abs                 Compute the absolute value of x

Description

Compute the absolute value of x

Usage

h2o.abs(x)

Arguments

x       An H2OFrame object.

See Also

abs for the base R implementation.

Examples

h2o.init()
url <- "https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/smtrees.csv"
smtreesH2O <- h2o.importFile(url)
smtreesR <- read.csv(url)
fith2o <- h2o.gbm(x=c("girth", "height"), y="vol", ntrees=3, max_depth=1,
                  distribution="gaussian", min_rows=2, learn_rate=.1,
                  training_frame=smtreesH2O)
pred <- as.data.frame(predict(fith2o, newdata=smtreesH2O))
diff <- pred-smtreesR[,4]
diff_abs <- abs(diff)
print(diff_abs)

h2o.acos                Compute the arc cosine of x

Description

Compute the arc cosine of x

Usage

h2o.acos(x)

Arguments

x       An H2OFrame object.

See Also

acos for the base R implementation.
Examples

h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
h2o.acos(prostate.hex[,2])

h2o.aggregated_frame    Retrieve an aggregated frame from an Aggregator model

Description

Retrieve an aggregated frame from the Aggregator model and use it to create a new frame.

Usage

h2o.aggregated_frame(model)

Arguments

model   an H2OClusteringModel resulting from an h2o.aggregator call.

Examples

library(h2o)
h2o.init()
df <- h2o.createFrame(rows=100, cols=5, categorical_fraction=0.6, integer_fraction=0,
                      binary_fraction=0, real_range=100, integer_range=100,
                      missing_fraction=0)
target_num_exemplars=1000
rel_tol_num_exemplars=0.5
encoding="Eigen"
agg <- h2o.aggregator(training_frame=df, target_num_exemplars=target_num_exemplars,
                      rel_tol_num_exemplars=rel_tol_num_exemplars,
                      categorical_encoding=encoding)
# Use the aggregated frame to create a new dataframe
new_df <- h2o.aggregated_frame(agg)

h2o.aggregator          Build an Aggregated Frame

Description

Builds an Aggregated Frame of an H2OFrame.

Usage

h2o.aggregator(training_frame, x, model_id = NULL, ignore_const_cols = TRUE,
  target_num_exemplars = 5000, rel_tol_num_exemplars = 0.5,
  transform = c("NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE"),
  categorical_encoding = c("AUTO", "Enum", "OneHotInternal", "OneHotExplicit",
  "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited"),
  save_mapping_frame = FALSE)

Arguments

training_frame         Id of the training data frame.
x                      A vector containing the character names of the predictors in the model.
model_id               Destination id for this model; auto-generated if not specified.
ignore_const_cols      Logical. Ignore constant columns. Defaults to TRUE.
target_num_exemplars   Targeted number of exemplars. Defaults to 5000.
rel_tol_num_exemplars  Relative tolerance for number of exemplars (e.g., 0.5 is +/- 50 percent). Defaults to 0.5.
transform              Transformation of training data. Must be one of: "NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE". Defaults to NORMALIZE.
categorical_encoding   Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO.
save_mapping_frame     Logical. Whether to export the mapping of the aggregated frame. Defaults to FALSE.

Examples

library(h2o)
h2o.init()
df <- h2o.createFrame(rows=100, cols=5, categorical_fraction=0.6, integer_fraction=0,
                      binary_fraction=0, real_range=100, integer_range=100,
                      missing_fraction=0)
target_num_exemplars=1000
rel_tol_num_exemplars=0.5
encoding="Eigen"
agg <- h2o.aggregator(training_frame=df, target_num_exemplars=target_num_exemplars,
                      rel_tol_num_exemplars=rel_tol_num_exemplars,
                      categorical_encoding=encoding)

h2o.aic                 Retrieve the Akaike information criterion (AIC) value

Description

Retrieves the AIC value. If "train", "valid", and "xval" parameters are FALSE (default), then the training AIC value is returned. If more than one parameter is set to TRUE, then a named vector of AICs is returned, where the names are "train", "valid" or "xval".

Usage

h2o.aic(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object  An H2OModel or H2OModelMetrics.
train   Retrieve the training AIC
valid   Retrieve the validation AIC
xval    Retrieve the cross-validation AIC

Examples

h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
p.sid <- h2o.runif(prostate.hex)
prostate.train <- h2o.assign(prostate.hex[p.sid > .2,], "prostate.train")
prostate.glm <- h2o.glm(x=3:7, y=2, training_frame=prostate.train)
aic.basic <- h2o.aic(prostate.glm)
print(aic.basic)

h2o.all                 Given a set of logical vectors, are all of the values true?

Description

Given a set of logical vectors, are all of the values true?
Usage

h2o.all(x)

Arguments

x       An H2OFrame object.

See Also

all for the base R implementation.

h2o.anomaly             Anomaly Detection via H2O Deep Learning Model

Description

Detect anomalies in an H2O dataset using an H2O deep learning model with auto-encoding.

Usage

h2o.anomaly(object, data, per_feature = FALSE)

Arguments

object       An H2OAutoEncoderModel object that represents the model to be used for anomaly detection.
data         An H2OFrame object.
per_feature  Whether to return the per-feature squared reconstruction error.

Value

Returns an H2OFrame object containing the reconstruction MSE or the per-feature squared error.

See Also

h2o.deeplearning for making an H2OAutoEncoderModel.

Examples

library(h2o)
h2o.init()
prosPath = system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex = h2o.importFile(path = prosPath)
prostate.dl = h2o.deeplearning(x = 3:9, training_frame = prostate.hex, autoencoder = TRUE,
                               hidden = c(10, 10), epochs = 5)
prostate.anon = h2o.anomaly(prostate.dl, prostate.hex)
head(prostate.anon)
prostate.anon.per.feature = h2o.anomaly(prostate.dl, prostate.hex, per_feature=TRUE)
head(prostate.anon.per.feature)

h2o.any                 Given a set of logical vectors, is at least one of the values true?

Description

Given a set of logical vectors, is at least one of the values true?

Usage

h2o.any(x)

Arguments

x       An H2OFrame object.

See Also

any for the base R implementation.

h2o.anyFactor           Check H2OFrame columns for factors

Description

Determines if any column of an H2OFrame object contains categorical data.

Usage

h2o.anyFactor(x)

Arguments

x       An H2OFrame object.

Value

Returns a logical value indicating whether any of the columns in x are factors.

Examples

library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris_wheader.csv", package="h2o")
iris.hex <- h2o.importFile(path = irisPath)
h2o.anyFactor(iris.hex)

h2o.arrange             Sorts an H2O frame by columns

Description

Sorts H2OFrame by the columns specified.
The H2OFrame can contain String columns, but sorting on a String column throws an error. To sort column c1 in descending order, use desc(c1). Returns a new H2OFrame, like dplyr::arrange.

Usage

h2o.arrange(x, ...)

Arguments

x       The H2OFrame input to be sorted.
...     The column names to sort by.

h2o.ascharacter         Convert H2O Data to Characters

Description

Convert H2O Data to Characters

Usage

h2o.ascharacter(x)

Arguments

x       An H2OFrame object.

See Also

as.character for the base R implementation.

h2o.asfactor            Convert H2O Data to Factors

Description

Convert H2O Data to Factors

Usage

h2o.asfactor(x)

Arguments

x       An H2OFrame object.

See Also

as.factor for the base R implementation.

h2o.asnumeric           Convert H2O Data to Numerics

Description

Convert H2O Data to Numerics

Usage

h2o.asnumeric(x)

Arguments

x       An H2OFrame object.

See Also

as.numeric for the base R implementation.

h2o.assign              Rename an H2O object.

Description

Makes a copy of the data frame and gives it the desired key.

Usage

h2o.assign(data, key)

Arguments

data    An H2OFrame object
key     The hex key to be associated with the H2O parsed data object

h2o.as_date             Convert between character representations and objects of Date class

Description

Functions to convert between character representations and objects of class "Date" representing calendar dates.

Usage

h2o.as_date(x, format, ...)

Arguments

x       H2OFrame column of strings or factors to be converted
format  A character string indicating date pattern
...     Further arguments to be passed from or to other methods.

h2o.auc                 Retrieve the AUC

Description

Retrieves the AUC value from an H2OBinomialMetrics. If "train", "valid", and "xval" parameters are FALSE (default), then the training AUC value is returned. If more than one parameter is set to TRUE, then a named vector of AUCs is returned, where the names are "train", "valid" or "xval".
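To make concrete what the returned AUC measures, the base R sketch below computes the area under the ROC curve with the rank-based (Mann-Whitney) formula: the probability that a randomly chosen positive row scores higher than a randomly chosen negative row, with tied scores counted as one half. It illustrates the metric only and is not H2O's server-side computation; the helper name auc_sketch is hypothetical.

```r
# Hypothetical helper: rank-based (Mann-Whitney) AUC for a binary response.
# labels: 0/1 vector of actual classes; scores: predicted probability of class 1.
auc_sketch <- function(labels, scores) {
  stopifnot(length(labels) == length(scores), all(labels %in% c(0, 1)))
  r <- rank(scores)              # average ranks, so tied scores count as 1/2
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  # Rank sum of the positives minus its minimum possible value,
  # divided by the number of (positive, negative) pairs
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_sketch(labels = c(0, 0, 1, 1), scores = c(0.1, 0.4, 0.35, 0.8))
# 0.75: three of the four (positive, negative) pairs are ordered correctly
```

A value of 1 means every positive row outranks every negative row; 0.5 is what a constant score produces.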
Usage

h2o.auc(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object  An H2OBinomialMetrics object.
train   Retrieve the training AUC
valid   Retrieve the validation AUC
xval    Retrieve the cross-validation AUC

See Also

h2o.giniCoef for the Gini coefficient, h2o.mse for MSE, and h2o.metric for the various threshold metrics. See h2o.performance for creating H2OModelMetrics objects.

Examples

library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
hex <- h2o.uploadFile(prosPath)
hex[,2] <- as.factor(hex[,2])
model <- h2o.gbm(x = 3:9, y = 2, training_frame = hex, distribution = "bernoulli")
perf <- h2o.performance(model, hex)
h2o.auc(perf)

h2o.automl              Automatic Machine Learning

Description

The Automatic Machine Learning (AutoML) function automates the supervised machine learning model training process. The current version of AutoML trains and cross-validates a Random Forest, an Extremely-Randomized Forest, a random grid of Gradient Boosting Machines (GBMs), a random grid of Deep Neural Nets, and then trains a Stacked Ensemble using all of the models.

Usage

h2o.automl(x, y, training_frame, validation_frame = NULL, leaderboard_frame = NULL,
  nfolds = 5, fold_column = NULL, weights_column = NULL, max_runtime_secs = 3600,
  max_models = NULL, stopping_metric = c("AUTO", "deviance", "logloss", "MSE", "RMSE",
  "MAE", "RMSLE", "AUC", "lift_top_group", "misclassification", "mean_per_class_error"),
  stopping_tolerance = NULL, stopping_rounds = 3, seed = NULL, project_name = NULL,
  exclude_algos = NULL)

Arguments

x                   A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used.
y                   The name or index of the response variable in the model. For classification, the y column must be a factor, otherwise regression will be performed. Indexes are 1-based in R.
training_frame      Training frame (H2OFrame or ID).
validation_frame    Validation frame (H2OFrame or ID); Optional. This frame is used for early stopping of individual models and early stopping of the grid searches (unless max_models or max_runtime_secs overrides metric-based early stopping).
leaderboard_frame   Leaderboard frame (H2OFrame or ID); Optional. If provided, the Leaderboard will be scored using this data frame instead of using cross-validation metrics, which is the default.
nfolds              Number of folds for k-fold cross-validation. Defaults to 5. Use 0 to disable cross-validation; this will also disable Stacked Ensemble (thus decreasing the overall model performance).
fold_column         Column with cross-validation fold index assignment per observation; used to override the default, randomized, 5-fold cross-validation scheme for individual models in the AutoML run.
weights_column      Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
max_runtime_secs    Maximum allowed runtime in seconds for the entire model training process. Use 0 to disable. Defaults to 3600 secs (1 hour).
max_models          Maximum number of models to build in the AutoML process (does not include Stacked Ensembles). Defaults to NULL.
stopping_metric     Metric to use for early stopping (AUTO is logloss for classification, deviance for regression). Must be one of "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "lift_top_group", "misclassification", "mean_per_class_error". Defaults to AUTO.
stopping_tolerance  Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much). This value defaults to 0.001 if the dataset is at least 1 million rows; otherwise it defaults to a bigger value determined by the size of the dataset and the non-NA-rate. In that case, the value is computed as 1/sqrt(nrows * non-NA-rate).
stopping_rounds     Integer. Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k (stopping_rounds) scoring events. Defaults to 3 and must be a non-negative integer. Use 0 to disable early stopping.
seed                Integer. Set a seed for reproducibility. AutoML can only guarantee reproducibility if max_models or early stopping is used because max_runtime_secs is resource limited, meaning that if the resources are not the same between runs, AutoML may be able to train more models on one run than on another.
project_name        Character string to identify an AutoML project. Defaults to NULL, which means a project name will be auto-generated based on the training frame ID.
exclude_algos       Vector of character strings naming the algorithms to skip during the model-building phase. An example use is exclude_algos = c("GLM", "DeepLearning", "DRF"), and the full list of options is: "GLM", "GBM", "DRF" (Random Forest and Extremely-Randomized Trees), "DeepLearning" and "StackedEnsemble". Defaults to NULL, which means that all appropriate H2O algorithms will be used, if the search stopping criteria allow. Optional.

Details

AutoML finds the best model, given a training frame and response, and returns an H2OAutoML object, which contains a leaderboard of all the models that were trained in the process, ranked by a default model performance metric.

Value

An H2OAutoML object.

Examples

library(h2o)
h2o.init()
votes_path <- system.file("extdata", "housevotes.csv", package = "h2o")
votes_hf <- h2o.uploadFile(path = votes_path, header = TRUE)
aml <- h2o.automl(y = "Class", training_frame = votes_hf, max_runtime_secs = 30)

h2o.betweenss           Get the between cluster sum of squares

Description

Get the between cluster sum of squares. If "train", "valid", and "xval" parameters are FALSE (default), then the training betweenss value is returned.
If more than one parameter is set to TRUE, then a named vector of betweenss values is returned, where the names are "train", "valid" or "xval".

Usage

h2o.betweenss(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object  An H2OClusteringModel object.
train   Retrieve the training between cluster sum of squares
valid   Retrieve the validation between cluster sum of squares
xval    Retrieve the cross-validation between cluster sum of squares

h2o.biases              Return the respective bias vector

Description

Return the respective bias vector

Usage

h2o.biases(object, vector_id = 1)

Arguments

object     An H2OModel or H2OModelMetrics
vector_id  An integer, ranging from 1 to number of layers + 1, that specifies the bias vector to return.

h2o.bottomN             H2O bottomN

Description

The bottomN function will grab the bottom N percent of values of a column and return it in an H2OFrame.

Usage

h2o.bottomN(x, column, nPercent)

Arguments

x         an H2OFrame
column    a column name or column index to grab the bottom N percent values from
nPercent  a bottom percentage value to grab

Value

An H2OFrame with 2 columns. The first column contains the original row indices; the second column contains the bottomN values.

h2o.cbind               Combine H2O Datasets by Columns

Description

Takes a sequence of H2O data sets and combines them by column.

Usage

h2o.cbind(...)

Arguments

...     A sequence of H2OFrame arguments. All datasets must exist on the same H2O instance (IP and port) and contain the same number of rows.

Value

An H2OFrame object containing the combined ... arguments column-wise.

See Also

cbind for the base R method.
Examples

library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
prostate.cbind <- h2o.cbind(prostate.hex, prostate.hex)
head(prostate.cbind)

h2o.ceiling             Take a single numeric argument and return a numeric vector with the smallest integers

Description

ceiling takes a single numeric argument x and returns a numeric vector containing the smallest integers not less than the corresponding elements of x.

Usage

h2o.ceiling(x)

Arguments

x       An H2OFrame object.

See Also

ceiling for the base R implementation.

h2o.centers             Retrieve the Model Centers

Description

Retrieve the Model Centers

Usage

h2o.centers(object)

Arguments

object  An H2OClusteringModel object.

h2o.centersSTD          Retrieve the Model Centers STD

Description

Retrieve the Model Centers STD

Usage

h2o.centersSTD(object)

Arguments

object  An H2OClusteringModel object.

h2o.centroid_stats      Retrieve centroid statistics

Description

Retrieve the centroid statistics. If "train", "valid", and "xval" parameters are FALSE (default), then the training centroid stats value is returned. If more than one parameter is set to TRUE, then a named list of centroid stats data frames is returned, where the names are "train", "valid" or "xval".

Usage

h2o.centroid_stats(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object  An H2OClusteringModel object.
train   Retrieve the training centroid statistics
valid   Retrieve the validation centroid statistics
xval    Retrieve the cross-validation centroid statistics

h2o.clearLog            Delete All H2O R Logs

Description

Clear all H2O R command and error response logs from the local disk. Used primarily for debugging purposes.
Usage

h2o.clearLog()

See Also

h2o.startLogging, h2o.stopLogging, h2o.openLog

Examples

library(h2o)
h2o.init()
h2o.startLogging()
ausPath = system.file("extdata", "australia.csv", package="h2o")
australia.hex = h2o.importFile(path = ausPath)
h2o.stopLogging()
h2o.clearLog()

h2o.clusterInfo         Print H2O cluster info

Description

Print H2O cluster info

Usage

h2o.clusterInfo()

h2o.clusterIsUp         Determine if an H2O cluster is up or not

Description

Determine if an H2O cluster is up or not

Usage

h2o.clusterIsUp(conn = h2o.getConnection())

Arguments

conn    H2OConnection object

Value

TRUE if the cluster is up; FALSE otherwise

h2o.clusterStatus       Return the status of the cluster

Description

Retrieve information on the status of the cluster running H2O.

Usage

h2o.clusterStatus()

See Also

H2OConnection, h2o.init

Examples

h2o.init()
h2o.clusterStatus()

h2o.cluster_sizes       Retrieve the cluster sizes

Description

Retrieve the cluster sizes. If "train", "valid", and "xval" parameters are FALSE (default), then the training cluster sizes value is returned. If more than one parameter is set to TRUE, then a named list of cluster size vectors is returned, where the names are "train", "valid" or "xval".

Usage

h2o.cluster_sizes(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object  An H2OClusteringModel object.
train   Retrieve the training cluster sizes
valid   Retrieve the validation cluster sizes
xval    Retrieve the cross-validation cluster sizes

h2o.coef                Return the coefficients that can be applied to the non-standardized data.

Description

Note: standardize = TRUE by default. If set to FALSE, then coef() returns the coefficients that are fit directly.

Usage

h2o.coef(object)

Arguments

object  an H2OModel object.

h2o.coef_norm           Return coefficients fitted on the standardized data (requires standardize = TRUE, which is on by default). These coefficients can be used to evaluate variable importance.
Description

Return coefficients fitted on the standardized data (requires standardize = TRUE, which is on by default). These coefficients can be used to evaluate variable importance.

Usage

h2o.coef_norm(object)

Arguments

object  an H2OModel object.

h2o.colnames            Return column names of an H2OFrame

Description

Return column names of an H2OFrame

Usage

h2o.colnames(x)

Arguments

x       An H2OFrame object.

See Also

colnames for the base R implementation.

h2o.columns_by_type     Obtain a list of columns that are specified by 'coltype'

Description

Obtain a list of columns that are specified by 'coltype'

Usage

h2o.columns_by_type(object, coltype = "numeric", ...)

Arguments

object   H2OFrame object
coltype  A character string indicating which column type to filter by. This must be one of the following:
         "numeric" - Numeric, but not categorical or time
         "categorical" - Integer, with a categorical/factor String mapping
         "string" - String column
         "time" - Long msec since the Unix Epoch - with a variety of display/parse options
         "uuid" - UUID
         "bad" - No non-NA rows (triple negative! all NAs or zero rows)
...      Ignored

Value

A list of column indices that correspond to 'coltype'

Examples

h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
h2o.columns_by_type(prostate.hex, coltype="numeric")

h2o.computeGram         Compute weighted gram matrix.

Description

Compute weighted gram matrix.

Usage

h2o.computeGram(X, weights = "", use_all_factor_levels = FALSE, standardize = TRUE,
  skip_missing = FALSE)

Arguments

X                      an H2OFrame.
weights                character corresponding to name of weight vector in frame.
use_all_factor_levels  logical flag telling h2o whether or not to skip first level of categorical variables during one-hot encoding.
standardize            logical flag telling h2o whether or not to standardize data
skip_missing           logical flag telling h2o whether to skip rows with missing data or impute them with the mean

h2o.confusionMatrix     Access H2O Confusion Matrices

Description

Retrieve either a single or many confusion matrices from H2O objects.

Usage

h2o.confusionMatrix(object, ...)

## S4 method for signature 'H2OModel'
h2o.confusionMatrix(object, newdata, valid = FALSE, ...)

## S4 method for signature 'H2OModelMetrics'
h2o.confusionMatrix(object, thresholds = NULL, metrics = NULL)

Arguments

object      Either an H2OModel object or an H2OModelMetrics object.
...         Extra arguments for extracting train or valid confusion matrices.
newdata     An H2OFrame object that can be scored on. Requires a valid response column.
valid       Retrieve the validation metric.
thresholds  (Optional) A value or a list of valid values between 0.0 and 1.0. This value is only used in the case of H2OBinomialMetrics objects.
metrics     (Optional) A metric or a list of valid metrics ("min_per_class_accuracy", "absolute_mcc", "tnr", "fnr", "fpr", "tpr", "precision", "accuracy", "f0point5", "f2", "f1"). This value is only used in the case of H2OBinomialMetrics objects.

Details

The H2OModelMetrics version of this function will only take H2OBinomialMetrics or H2OMultinomialMetrics objects. If no threshold is specified, all possible thresholds are selected.

Value

Calling this function on H2OModel objects returns a confusion matrix corresponding to the predict function. If used on an H2OBinomialMetrics object, returns a list of matrices corresponding to the number of thresholds specified.

See Also

predict for generating prediction frames, h2o.performance for creating H2OModelMetrics.
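The Value section notes that an H2OBinomialMetrics object yields one confusion matrix per requested threshold. The base R sketch below shows what a single binary confusion matrix at a given threshold contains, and how several thresholds give a list of matrices. The helper name confusion_at_threshold is hypothetical, and treating a score exactly equal to the threshold as the positive class is an assumption, not confirmed H2O behavior.

```r
# Hypothetical helper: 2x2 confusion matrix at one probability threshold.
# actual: 0/1 labels; prob: predicted probability of class 1.
confusion_at_threshold <- function(actual, prob, threshold = 0.5) {
  # Assumption: a score equal to the threshold is predicted as class 1
  predicted <- as.integer(prob >= threshold)
  table(Actual = factor(actual, levels = c(0, 1)),
        Predicted = factor(predicted, levels = c(0, 1)))
}

actual <- c(0, 0, 1, 1, 1)
prob <- c(0.2, 0.6, 0.4, 0.7, 0.9)
cm <- confusion_at_threshold(actual, prob, threshold = 0.5)
print(cm)

# One matrix per threshold, analogous to passing several thresholds
cms <- lapply(c(0.3, 0.5, 0.8), function(t) confusion_at_threshold(actual, prob, t))
```

Raising the threshold moves rows from the Predicted = 1 column to the Predicted = 0 column, which is why the matrices differ across thresholds.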
Examples

library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
hex <- h2o.uploadFile(prosPath)
hex[,2] <- as.factor(hex[,2])
model <- h2o.gbm(x = 3:9, y = 2, training_frame = hex, distribution = "bernoulli")
h2o.confusionMatrix(model, hex)
# Generating a ModelMetrics object
perf <- h2o.performance(model, hex)
h2o.confusionMatrix(perf)

h2o.connect             Connect to a running H2O instance.

Description

Connect to a running H2O instance.

Usage

h2o.connect(ip = "localhost", port = 54321, strict_version_check = TRUE,
  proxy = NA_character_, https = FALSE, insecure = FALSE,
  username = NA_character_, password = NA_character_,
  cookies = NA_character_, context_path = NA_character_, config = NULL)

Arguments

ip                    Object of class character representing the IP address of the server where H2O is running.
port                  Object of class numeric representing the port number of the H2O server.
strict_version_check  (Optional) Setting this to FALSE is unsupported and should only be done when advised by technical support.
proxy                 (Optional) A character string specifying the proxy path.
https                 (Optional) Set this to TRUE to use https instead of http.
insecure              (Optional) Set this to TRUE to disable SSL certificate checking.
username              (Optional) Username to login with.
password              (Optional) Password to login with.
cookies               (Optional) Vector (or list) of cookies to add to request.
context_path          (Optional) The last part of connection URL: http://<ip>:<port>/<context_path>
config                (Optional) A list describing connection parameters.

Value

An instance of H2OConnection object representing a connection to the running H2O instance.

Examples

## Not run:
library(h2o)
# Try to connect to a H2O instance running at http://localhost:54321/cluster_X
# If not found, start a local H2O instance from R with the default settings.
#h2o.connect(ip = "localhost", port = 54321, context_path = "cluster_X")
# Or
#config = list(ip = "localhost", port = 54321, context_path = "cluster_X")
#h2o.connect(config = config)

# Skip strict version check during connecting to the instance
#h2o.connect(config = c(strict_version_check = FALSE, config))
## End(Not run)

h2o.cor                 Correlation of columns.

Description

Compute the correlation matrix of one or two H2OFrames.

Usage

h2o.cor(x, y = NULL, na.rm = FALSE, use)

cor(x, ...)

Arguments

x       An H2OFrame object.
y       NULL (default) or an H2OFrame. The default is equivalent to y = x.
na.rm   logical. Should missing values be removed?
use     An optional character string indicating how to handle missing values. This must be one of the following:
        "everything" - outputs NaNs whenever one of its contributing observations is missing
        "all.obs" - presence of missing observations will throw an error
        "complete.obs" - discards missing values along with all observations in their rows so that only complete observations are used
...     Further arguments to be passed down from other methods.

Examples

h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
cor(prostate.hex$AGE)

h2o.cos                 Compute the cosine of x

Description

Compute the cosine of x

Usage

h2o.cos(x)

Arguments

x       An H2OFrame object.

See Also

cos for the base R implementation.

h2o.cosh                Compute the hyperbolic cosine of x

Description

Compute the hyperbolic cosine of x

Usage

h2o.cosh(x)

Arguments

x       An H2OFrame object.

See Also

cosh for the base R implementation.
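The use values accepted by h2o.cor carry the names of the corresponding options of base R's stats::cor. For comparison, the base R sketch below shows how stats::cor itself treats a single missing value under each option; it does not call h2o.cor. Note that base R reports NA where the h2o.cor entry above mentions NaN for "everything".

```r
# Base R behavior of the `use` options, on data with one missing value.
# stats:: is written explicitly because attaching h2o masks cor().
x <- c(1, 2, 3, NA)
y <- c(2, 4, 6, 8)

r_everything <- stats::cor(x, y, use = "everything")    # NA: a contributing value is missing
r_complete   <- stats::cor(x, y, use = "complete.obs")  # uses rows 1-3 only, a perfect fit
r_all_obs    <- tryCatch(stats::cor(x, y, use = "all.obs"),
                         error = function(e) "error")   # missing values raise an error
print(c(r_everything, r_complete))
```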
h2o.coxph    Trains a Cox Proportional Hazards Model (CoxPH) on an H2O dataset

Description
Trains a Cox Proportional Hazards Model (CoxPH) on an H2O dataset.

Usage
h2o.coxph(x, event_column, training_frame, model_id = NULL,
  start_column = NULL, stop_column = NULL, weights_column = NULL,
  offset_column = NULL, ties = c("efron", "breslow"), init = 0,
  lre_min = 9, iter_max = 20)

Arguments
x  (Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except event_column, start_column and stop_column are used.
event_column  The name of the binary data column in the training frame indicating the occurrence of an event.
training_frame  Id of the training data frame.
model_id  Destination id for this model; auto-generated if not specified.
start_column  Start time column.
stop_column  Stop time column.
weights_column  Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
offset_column  Offset column. This will be added to the combination of columns before applying the link function.
ties  Method for handling tied event times. Must be one of: "efron", "breslow". Defaults to efron.
init  Coefficient starting value. Defaults to 0.
lre_min  Minimum log-relative error. Defaults to 9.
iter_max  Maximum number of iterations. Defaults to 20.

h2o.createFrame    Data H2OFrame Creation in H2O

Description
Creates a data frame in H2O with real-valued, categorical, integer, and binary columns specified by the user.
Usage
h2o.createFrame(rows = 10000, cols = 10, randomize = TRUE, value = 0,
  real_range = 100, categorical_fraction = 0.2, factors = 100,
  integer_fraction = 0.2, integer_range = 100, binary_fraction = 0.1,
  binary_ones_fraction = 0.02, time_fraction = 0, string_fraction = 0,
  missing_fraction = 0.01, response_factors = 2, has_response = FALSE,
  seed, seed_for_column_types)

Arguments
rows  The number of rows of data to generate.
cols  The number of columns of data to generate. Excludes the response column if has_response = TRUE.
randomize  A logical value indicating whether data values should be randomly generated. This must be TRUE if either categorical_fraction or integer_fraction is non-zero.
value  If randomize = FALSE, then all real-valued entries will be set to this value.
real_range  The range of randomly generated real values.
categorical_fraction  The fraction of total columns that are categorical.
factors  The number of (unique) factor levels in each categorical column.
integer_fraction  The fraction of total columns that are integer-valued.
integer_range  The range of randomly generated integer values.
binary_fraction  The fraction of total columns that are binary-valued.
binary_ones_fraction  The fraction of values in a binary column that are set to 1.
time_fraction  The fraction of randomly created date/time columns.
string_fraction  The fraction of randomly created string columns.
missing_fraction  The fraction of total entries in the data frame that are set to NA.
response_factors  If has_response = TRUE, then this is the number of factor levels in the response column.
has_response  A logical value indicating whether an additional response column should be prepended to the final H2O data frame. If set to TRUE, the total number of columns will be cols + 1.
seed  A seed used to generate random values when randomize = TRUE.
seed_for_column_types  A seed used to generate random column types when randomize = TRUE.
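As a rough guide to how the *_fraction arguments translate into column counts, the hypothetical helper below does the arithmetic in plain R. It assumes each fraction is applied to cols and rounded, and that every column not assigned another type is real-valued; H2O's own rounding may differ, and expected_column_types() is not part of the h2o package.

```r
# Hypothetical helper (not an h2o function): expected column-type counts
# for h2o.createFrame(). Assumptions: each fraction is applied to `cols`
# and rounded, and all remaining columns are real-valued.
expected_column_types <- function(cols, categorical_fraction = 0.2,
                                  integer_fraction = 0.2,
                                  binary_fraction = 0.1,
                                  time_fraction = 0, string_fraction = 0,
                                  has_response = FALSE) {
  counts <- round(c(
    categorical = cols * categorical_fraction,
    integer     = cols * integer_fraction,
    binary      = cols * binary_fraction,
    time        = cols * time_fraction,
    string      = cols * string_fraction
  ))
  # Whatever is left over is assumed to be real-valued
  counts["real"] <- cols - sum(counts)
  # has_response = TRUE adds one extra column on top of `cols`
  if (has_response) counts["response"] <- 1
  counts
}

ct <- expected_column_types(cols = 100, categorical_fraction = 0.1,
                            integer_fraction = 0.5, has_response = TRUE)
print(ct)
sum(ct)  # cols + 1 when has_response = TRUE
```

With the settings of the first createFrame example (cols = 100, categorical_fraction = 0.1, integer_fraction = 0.5, has_response = TRUE), this predicts about 10 categorical, 50 integer, 10 binary and 30 real columns plus one response column.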
Value
Returns an H2OFrame object.

Examples
library(h2o)
h2o.init()
hex <- h2o.createFrame(rows = 1000, cols = 100, categorical_fraction = 0.1,
  factors = 5, integer_fraction = 0.5, integer_range = 1,
  has_response = TRUE)
head(hex)
summary(hex)
hex2 <- h2o.createFrame(rows = 100, cols = 10, randomize = FALSE, value = 5,
  categorical_fraction = 0, integer_fraction = 0)
summary(hex2)

h2o.cross_validation_fold_assignment    Retrieve the cross-validation fold assignment

Description
Retrieve the cross-validation fold assignment.

Usage
h2o.cross_validation_fold_assignment(object)

Arguments
object  An H2OModel object.

Value
Returns an H2OFrame.

h2o.cross_validation_holdout_predictions    Retrieve the cross-validation holdout predictions

Description
Retrieve the cross-validation holdout predictions.

Usage
h2o.cross_validation_holdout_predictions(object)

Arguments
object  An H2OModel object.

Value
Returns an H2OFrame.

h2o.cross_validation_models    Retrieve the cross-validation models

Description
Retrieve the cross-validation models.

Usage
h2o.cross_validation_models(object)

Arguments
object  An H2OModel object.

Value
Returns a list of H2OModel objects.

h2o.cross_validation_predictions    Retrieve the cross-validation predictions

Description
Retrieve the cross-validation predictions.

Usage
h2o.cross_validation_predictions(object)

Arguments
object  An H2OModel object.

Value
Returns a list of H2OFrame objects.

h2o.cummax    Return the cumulative max over a column or across a row

Description
Return the cumulative max over a column or across a row.

Usage
h2o.cummax(x, axis = 0)

Arguments
x  An H2OFrame object.
axis  An integer that indicates whether to operate down a column (0) or across a row (1).

See Also
cummax for the base R implementation.

h2o.cummin    Return the cumulative min over a column or across a row

Description
Return the cumulative min over a column or across a row.

Usage
h2o.cummin(x, axis = 0)

Arguments
x  An H2OFrame object.
axis  An integer that indicates whether to operate down a column (0) or across a row (1).

See Also
cummin for the base R implementation.

h2o.cumprod    Return the cumulative product over a column or across a row

Description
Return the cumulative product over a column or across a row.

Usage
h2o.cumprod(x, axis = 0)

Arguments
x  An H2OFrame object.
axis  An integer that indicates whether to operate down a column (0) or across a row (1).

See Also
cumprod for the base R implementation.

h2o.cumsum    Return the cumulative sum over a column or across a row

Description
Return the cumulative sum over a column or across a row.

Usage
h2o.cumsum(x, axis = 0)

Arguments
x  An H2OFrame object.
axis  An integer that indicates whether to operate down a column (0) or across a row (1).

See Also
cumsum for the base R implementation.

h2o.cut    Cut H2O Numeric Data to Factor

Description
Divides the range of the H2O data into intervals and codes the values according to which interval they fall in. The leftmost interval corresponds to level one, the next is level two, etc.

Usage
h2o.cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE,
  dig.lab = 3, ...)
## S3 method for class 'H2OFrame'
cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE,
  dig.lab = 3, ...)

Arguments
x  An H2OFrame object with a single numeric column.
breaks  A numeric vector of two or more unique cut points.
labels  Labels for the levels of the resulting category. By default, labels are constructed using "(a,b]" interval notation.
include.lowest  Logical, indicating if an x[i] equal to the lowest (or highest, for right = FALSE) breaks value should be included.
right  Logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa.
dig.lab  Integer which is used when labels are not given; determines the number of digits used in formatting the break numbers.
...  Further arguments passed to or from other methods.
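The interval conventions above (right-closed "(a,b]" labels, include.lowest, and level numbering starting from the leftmost interval) are the same ones used by base R's cut(). The sketch below applies base R cut() to a few plain numbers with the breaks used in the iris example, assuming the H2OFrame method mirrors base R's behavior.

```r
# Base R cut() with the breaks from the iris example.
# Assumption: cut() on an H2OFrame follows the same interval conventions.
breaks <- c(4.2, 4.8, 5.8, 6, 8)
x <- c(4.2, 4.5, 5.8, 7.9)

# right = TRUE (default): intervals are (a,b], so 4.2 is not in any interval
f_default <- cut(x, breaks)

# include.lowest = TRUE closes the first interval on the left: [4.2,4.8]
f_lowest <- cut(x, breaks, include.lowest = TRUE)

print(levels(f_default))      # "(4.2,4.8]" "(4.8,5.8]" "(5.8,6]" "(6,8]"
print(as.integer(f_default))  # level codes: NA 1 2 4
print(as.integer(f_lowest))   # level codes: 1 1 2 4
```

Note that 5.8 falls in level 2, (4.8,5.8], because the intervals are closed on the right; passing right = FALSE would move it into (5.8,6]'s counterpart [5.8,6).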
Value
Returns an H2OFrame object containing the factored data with intervals as levels.

Examples
library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris_wheader.csv", package="h2o")
iris.hex <- h2o.uploadFile(path = irisPath, destination_frame = "iris.hex")
summary(iris.hex)
# Cut sepal length column into intervals determined by min/max/quantiles
sepal_len.cut <- cut(iris.hex$sepal_len, c(4.2, 4.8, 5.8, 6, 8))
head(sepal_len.cut)
summary(sepal_len.cut)

h2o.day    Convert Milliseconds to Day of Month in H2O Datasets

Description
Converts the entries of an H2OFrame object from milliseconds to days of the month (on a 1 to 31 scale).

Usage
h2o.day(x)
day(x)
## S3 method for class 'H2OFrame'
day(x)

Arguments
x  An H2OFrame object.

Value
An H2OFrame object containing the entries of x converted to days of the month.

See Also
h2o.month

h2o.dayOfWeek    Convert Milliseconds to Day of Week in H2O Datasets

Description
Converts the entries of an H2OFrame object from milliseconds to days of the week (on a 0 to 6 scale).

Usage
h2o.dayOfWeek(x)
dayOfWeek(x)
## S3 method for class 'H2OFrame'
dayOfWeek(x)

Arguments
x  An H2OFrame object.

Value
An H2OFrame object containing the entries of x converted to days of the week.

See Also
h2o.day, h2o.month

h2o.dct    Compute DCT of an H2OFrame

Description
Compute the Discrete Cosine Transform of every row in the H2OFrame.

Usage
h2o.dct(data, destination_frame, dimensions, inverse = FALSE)

Arguments
data  An H2OFrame object representing the dataset to transform.
destination_frame  A frame ID for the result.
dimensions  An array containing the 3 integer values for height, width, depth of each sample. The product HxWxD must not exceed the number of columns. For 1D, use c(L,1,1); for 2D, use c(N,M,1).
inverse  Whether to perform the inverse transform.

Value
Returns an H2OFrame object.
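In the 1D case (dimensions = c(L,1,1)), each row of length L is transformed on its own. The base R sketch below builds an orthonormal DCT-II matrix and checks that applying its inverse (the transpose) after the forward transform recovers the original row, which is what inverse = TRUE is meant to undo. The normalization is an assumption for illustration: this reference does not say which DCT variant or scaling h2o.dct uses, and dct_matrix() is a hypothetical helper.

```r
# Hypothetical helper: orthonormal 1-D DCT-II matrix of size n x n.
# Assumption: illustrative normalization; h2o.dct() may scale differently.
dct_matrix <- function(n) {
  k <- 0:(n - 1)  # frequency index (rows)
  j <- 0:(n - 1)  # sample index (columns)
  m <- outer(k, j, function(k, j) cos(pi * (2 * j + 1) * k / (2 * n)))
  m[1, ] <- m[1, ] * sqrt(1 / n)    # scale the DC (k = 0) row
  m[-1, ] <- m[-1, ] * sqrt(2 / n)  # scale the remaining rows
  m
}

set.seed(1)
x <- rnorm(16)                 # one "row" of length L = 16
dct16 <- dct_matrix(16)
coeffs <- dct16 %*% x          # forward transform
x_back <- t(dct16) %*% coeffs  # inverse transform (matrix is orthogonal)
max(abs(x_back - x))           # round-trip error, close to zero
```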
Examples
library(h2o)
h2o.init()
df <- h2o.createFrame(rows = 1000, cols = 8*16*24,
  categorical_fraction = 0, integer_fraction = 0, missing_fraction = 0)
# Each pair applies the forward transform and then the inverse;
# the round-trip error max(abs(df - df2)) should be close to zero.
df1 <- h2o.dct(data = df, dimensions = c(8*16*24, 1, 1))
df2 <- h2o.dct(data = df1, dimensions = c(8*16*24, 1, 1), inverse = TRUE)
max(abs(df - df2))
df1 <- h2o.dct(data = df, dimensions = c(8*16, 24, 1))
df2 <- h2o.dct(data = df1, dimensions = c(8*16, 24, 1), inverse = TRUE)
max(abs(df - df2))
df1 <- h2o.dct(data = df, dimensions = c(8, 16, 24))
df2 <- h2o.dct(data = df1, dimensions = c(8, 16, 24), inverse = TRUE)
max(abs(df - df2))

h2o.ddply    Split H2O Dataset, Apply Function, and Return Results

Description
For each subset of an H2O data set, apply a user-specified function, then combine the results. This is an experimental feature.

Usage
h2o.ddply(X, .variables, FUN, ..., .progress = "none")

Arguments
X  An H2OFrame object to be processed.
.variables  Variables to split X by, either the indices or names of a set of columns.
FUN  Function to apply to each subset grouping.
...  Additional arguments passed on to FUN.
.progress  Name of the progress bar to use. (Currently unimplemented.)

Value
Returns an H2OFrame object containing the results from the split/apply operation, arranged

See Also
ddply for the plyr library implementation.

Examples
library(h2o)
h2o.init()
# Import iris dataset to H2O
irisPath <- system.file("extdata", "iris_wheader.csv", package = "h2o")
iris.hex <- h2o.uploadFile(path = irisPath, destination_frame = "iris.hex")
# Define a function taking the mean of the sepal_len column
fun <- function(df) { sum(df[,1], na.rm = TRUE)/nrow(df) }
# Apply function to groups by class of flower
# uses h2o's ddply, since iris.hex is an H2OFrame object
res <- h2o.ddply(iris.hex, "class", fun)
head(res)

h2o.decryptionSetup    Set up a Decryption Tool

Description
If your source file is encrypted, set up a decryption tool and then provide the reference (the result of this function) to the import functions.
Usage
h2o.decryptionSetup(keystore, keystore_type = "JCEKS",
  key_alias = NA_character_, password = NA_character_,
  decrypt_tool = "", decrypt_impl = "water.parser.GenericDecryptionTool",
  cipher_spec = NA_character_)

Arguments
keystore  An H2OFrame object referencing a loaded Java Keystore (see example).
keystore_type  (Optional) Specification of Keystore type; defaults to JCEKS.
key_alias  Which key from the keystore to use for decryption.
password  Password to the keystore and the key.
decrypt_tool  (Optional) Name of the decryption tool.
decrypt_impl  (Optional) Java class name implementing the Decryption Tool.
cipher_spec  Specification of a cipher (e.g., AES/ECB/PKCS5Padding).

See Also
h2o.importFile, h2o.parseSetup

Examples
## Not run:
library(h2o)
h2o.init()
ksPath <- system.file("extdata", "keystore.jks", package = "h2o")
keystore <- h2o.importFile(path = ksPath, parse = FALSE) # don't parse, keep as a binary file
cipher <- "AES/ECB/PKCS5Padding"
pwd <- "Password123"
kAlias <- "secretKeyAlias"
dt <- h2o.decryptionSetup(keystore, key_alias = kAlias, password = pwd,
  cipher_spec = cipher)
dataPath <- system.file("extdata", "prostate.csv.aes", package = "h2o")
data <- h2o.importFile(dataPath, decrypt_tool = dt)
summary(data)
## End(Not run)

h2o.deepfeatures    Feature Generation via H2O Deep Learning or DeepWater Model

Description
Extract the non-linear features from an H2O data set using an H2O deep learning model.

Usage
h2o.deepfeatures(object, data, layer)

Arguments
object  An H2OModel object that represents the deep learning model to be used for feature extraction.
data  An H2OFrame object.
layer  Index (for DeepLearning, integer) or name (for DeepWater, string) of the hidden layer to extract.

Value
Returns an H2OFrame object with as many features as the number of units in the hidden layer of the specified index.

See Also
h2o.deeplearning for making H2O Deep Learning models.
h2o.deepwater for making H2O DeepWater models.
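Conceptually, the features returned for a hidden layer are that layer's activations for each input row, so the output has one column per hidden unit. The base R sketch below illustrates this for a single fully connected Rectifier layer with made-up random weights. It is not how H2O computes features internally (input standardization, categorical expansion and the trained weights are all omitted); the sizes simply mirror the example (7 predictors, 100 units in the first hidden layer).

```r
# Conceptual sketch of hidden-layer feature extraction in base R.
# Assumptions: one fully connected layer, Rectifier activation,
# random made-up weights, and no input preprocessing.
relu <- function(z) {
  z[z < 0] <- 0  # element-wise max(z, 0); keeps matrix dimensions
  z
}

set.seed(42)
n_rows <- 5
n_inputs <- 7
n_units <- 100
X <- matrix(rnorm(n_rows * n_inputs), nrow = n_rows)               # input rows
W <- matrix(rnorm(n_inputs * n_units, sd = 0.1), nrow = n_inputs)  # weights
b <- rnorm(n_units, sd = 0.1)                                      # biases

# One feature per hidden unit: activation(X %*% W + b)
features <- relu(sweep(X %*% W, 2, b, "+"))
dim(features)  # 5 rows, 100 features
```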
Examples
library(h2o)
h2o.init()
prosPath = system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex = h2o.importFile(path = prosPath)
prostate.dl = h2o.deeplearning(x = 3:9, y = 2, training_frame = prostate.hex,
  hidden = c(100, 200), epochs = 5)
prostate.deepfeatures_layer1 = h2o.deepfeatures(prostate.dl, prostate.hex, layer = 1)
prostate.deepfeatures_layer2 = h2o.deepfeatures(prostate.dl, prostate.hex, layer = 2)
head(prostate.deepfeatures_layer1)
head(prostate.deepfeatures_layer2)

#if (h2o.deepwater.available()) {
#  prostate.dl = h2o.deepwater(x = 3:9, y = 2, backend="mxnet", training_frame = prostate.hex,
#                              hidden = c(100, 200), epochs = 5)
#  prostate.deepfeatures_layer1 =
#    h2o.deepfeatures(prostate.dl, prostate.hex, layer = "fc1_w")
#  prostate.deepfeatures_layer2 =
#    h2o.deepfeatures(prostate.dl, prostate.hex, layer = "fc2_w")
#  head(prostate.deepfeatures_layer1)
#  head(prostate.deepfeatures_layer2)
#}

h2o.deeplearning    Build a Deep Neural Network model using CPUs

Description
Builds a feed-forward multilayer artificial neural network on an H2OFrame.
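Several of the arguments documented for h2o.deeplearning (rate, rate_annealing, rate_decay, momentum_start, momentum_ramp, momentum_stable) describe a learning-rate and momentum schedule by formula. The sketch writes those formulas out in base R. Assumptions: the schedule matters only when adaptive_rate = FALSE, the annealing and per-layer decay factors multiply together, and momentum ramps linearly from momentum_start to momentum_stable (the ramp shape is not stated in this reference). effective_rate() and momentum_at() are hypothetical helpers, not h2o functions.

```r
# Hypothetical helpers (not h2o functions) for the manual learning-rate
# schedule. Assumption: annealing and layer decay combine multiplicatively.
effective_rate <- function(samples, layer, rate = 0.005,
                           rate_annealing = 1e-06, rate_decay = 1) {
  # Annealing: rate / (1 + rate_annealing * samples)
  # Layer decay: N-th layer uses rate * rate_decay^(N - 1)
  rate * rate_decay^(layer - 1) / (1 + rate_annealing * samples)
}

# Assumption: momentum ramps linearly over `momentum_ramp` training samples
momentum_at <- function(samples, momentum_start = 0, momentum_ramp = 1e6,
                        momentum_stable = 0) {
  frac <- min(samples / momentum_ramp, 1)  # fraction of the ramp completed
  momentum_start + frac * (momentum_stable - momentum_start)
}

effective_rate(samples = 1e6, layer = 1)                 # 0.005 / 2 = 0.0025
effective_rate(samples = 0, layer = 2, rate_decay = 0.5)  # 0.005 * 0.5 = 0.0025
momentum_at(5e5, momentum_start = 0.5, momentum_stable = 0.99)  # halfway: 0.745
```

With the defaults (rate = 0.005, rate_annealing = 1e-06), the learning rate halves after one million training samples.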
Usage
h2o.deeplearning(x, y, training_frame, model_id = NULL,
  validation_frame = NULL, nfolds = 0,
  keep_cross_validation_predictions = FALSE,
  keep_cross_validation_fold_assignment = FALSE,
  fold_assignment = c("AUTO", "Random", "Modulo", "Stratified"),
  fold_column = NULL, ignore_const_cols = TRUE,
  score_each_iteration = FALSE, weights_column = NULL,
  offset_column = NULL, balance_classes = FALSE,
  class_sampling_factors = NULL, max_after_balance_size = 5,
  max_hit_ratio_k = 0, checkpoint = NULL, pretrained_autoencoder = NULL,
  overwrite_with_best_model = TRUE, use_all_factor_levels = TRUE,
  standardize = TRUE, activation = c("Tanh", "TanhWithDropout",
  "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout"),
  hidden = c(200, 200), epochs = 10, train_samples_per_iteration = -2,
  target_ratio_comm_to_comp = 0.05, seed = -1, adaptive_rate = TRUE,
  rho = 0.99, epsilon = 1e-08, rate = 0.005, rate_annealing = 1e-06,
  rate_decay = 1, momentum_start = 0, momentum_ramp = 1e+06,
  momentum_stable = 0, nesterov_accelerated_gradient = TRUE,
  input_dropout_ratio = 0, hidden_dropout_ratios = NULL, l1 = 0, l2 = 0,
  max_w2 = 3.4028235e+38, initial_weight_distribution = c("UniformAdaptive",
  "Uniform", "Normal"), initial_weight_scale = 1, initial_weights = NULL,
  initial_biases = NULL, loss = c("Automatic", "CrossEntropy", "Quadratic",
  "Huber", "Absolute", "Quantile"), distribution = c("AUTO", "bernoulli",
  "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace",
  "quantile", "huber"), quantile_alpha = 0.5, tweedie_power = 1.5,
  huber_alpha = 0.9, score_interval = 5, score_training_samples = 10000,
  score_validation_samples = 0, score_duty_cycle = 0.1,
  classification_stop = 0, regression_stop = 1e-06, stopping_rounds = 5,
  stopping_metric = c("AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE",
  "RMSLE", "AUC", "lift_top_group", "misclassification",
  "mean_per_class_error"), stopping_tolerance = 0, max_runtime_secs = 0,
  score_validation_sampling = c("Uniform", "Stratified"), diagnostics = TRUE,
  fast_mode = TRUE, force_load_balance = TRUE, variable_importances = TRUE,
  replicate_training_data = TRUE, single_node_mode = FALSE,
  shuffle_training_data = FALSE, missing_values_handling = c("MeanImputation",
  "Skip"), quiet_mode = FALSE, autoencoder = FALSE, sparse = FALSE,
  col_major = FALSE, average_activation = 0, sparsity_beta = 0,
  max_categorical_features = 2147483647, reproducible = FALSE,
  export_weights_and_biases = FALSE, mini_batch_size = 1,
  categorical_encoding = c("AUTO", "Enum", "OneHotInternal", "OneHotExplicit",
  "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited"),
  elastic_averaging = FALSE, elastic_averaging_moving_rate = 0.9,
  elastic_averaging_regularization = 0.001, verbose = FALSE)

Arguments
x  (Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used.
y  The name or column index of the response variable in the data. The response must be either a numeric or a categorical/factor variable. If the response is numeric, then a regression model will be trained; otherwise, a classification model will be trained.
training_frame  Id of the training data frame.
model_id  Destination id for this model; auto-generated if not specified.
validation_frame  Id of the validation data frame.
nfolds  Number of folds for K-fold cross-validation (0 to disable or >= 2). Defaults to 0.
keep_cross_validation_predictions  Logical. Whether to keep the predictions of the cross-validation models. Defaults to FALSE.
keep_cross_validation_fold_assignment  Logical. Whether to keep the cross-validation fold assignment. Defaults to FALSE.
fold_assignment  Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems. Must be one of: "AUTO", "Random", "Modulo", "Stratified". Defaults to AUTO.
fold_column  Column with cross-validation fold index assignment per observation.
ignore_const_cols  Logical. Ignore constant columns. Defaults to TRUE.
score_each_iteration  Logical. Whether to score during each iteration of model training. Defaults to FALSE.
weights_column  Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
offset_column  Offset column. This will be added to the combination of columns before applying the link function.
balance_classes  Logical. Balance training data class counts via over/under-sampling (for imbalanced data). Defaults to FALSE.
class_sampling_factors  Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
max_after_balance_size  Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. Defaults to 5.0.
max_hit_ratio_k  Max. number (top K) of predictions to use for hit ratio computation (for multiclass only, 0 to disable). Defaults to 0.
checkpoint  Model checkpoint to resume training with.
pretrained_autoencoder  Pretrained autoencoder model to initialize this model with.
overwrite_with_best_model  Logical. If enabled, override the final model with the best model found during training. Defaults to TRUE.
use_all_factor_levels  Logical. Use all factor levels of categorical variables. Otherwise, the first factor level is omitted (without loss of accuracy). Useful for variable importances and auto-enabled for autoencoder. Defaults to TRUE.
standardize  Logical. If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data. Defaults to TRUE.
activation  Activation function. Must be one of: "Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout". Defaults to Rectifier.
hidden  Hidden layer sizes (e.g., c(100, 100)). Defaults to c(200, 200).
epochs  How many times the dataset should be iterated (streamed); can be fractional. Defaults to 10.
train_samples_per_iteration  Number of training samples (globally) per MapReduce iteration. Special values are 0: one epoch, -1: all available data (e.g., replicated training data), -2: automatic. Defaults to -2.
target_ratio_comm_to_comp  Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration = -2 (auto-tuning). Defaults to 0.05.
seed  Seed for random numbers (affects certain parts of the algo that are stochastic and those might or might not be enabled by default). Note: only reproducible when running single threaded. Defaults to -1 (time-based random number).
adaptive_rate  Logical. Adaptive learning rate. Defaults to TRUE.
rho  Adaptive learning rate time decay factor (similarity to prior updates). Defaults to 0.99.
epsilon  Adaptive learning rate smoothing factor (to avoid divisions by zero and allow progress). Defaults to 1e-08.
rate  Learning rate (higher => less stable, lower => slower convergence). Defaults to 0.005.
rate_annealing  Learning rate annealing: rate / (1 + rate_annealing * samples). Defaults to 1e-06.
rate_decay  Learning rate decay factor between layers (N-th layer: rate * rate_decay ^ (n - 1)). Defaults to 1.
momentum_start  Initial momentum at the beginning of training (try 0.5). Defaults to 0.
momentum_ramp  Number of training samples for which momentum increases. Defaults to 1000000.
momentum_stable  Final momentum after the ramp is over (try 0.99). Defaults to 0.
nesterov_accelerated_gradient  Logical. Use Nesterov accelerated gradient (recommended). Defaults to TRUE.
input_dropout_ratio  Input layer dropout ratio (can improve generalization, try 0.1 or 0.2). Defaults to 0.
hidden_dropout_ratios  Hidden layer dropout ratios (can improve generalization); specify one value per hidden layer. Defaults to 0.5.
l1  L1 regularization (can add stability and improve generalization, causes many weights to become 0). Defaults to 0.
l2  L2 regularization (can add stability and improve generalization, causes many weights to be small). Defaults to 0.
max_w2  Constraint for squared sum of incoming weights per unit (e.g. for Rectifier). Defaults to 3.4028235e+38.
initial_weight_distribution  Initial weight distribution. Must be one of: "UniformAdaptive", "Uniform", "Normal". Defaults to UniformAdaptive.
initial_weight_scale  Uniform: -value...value, Normal: stddev. Defaults to 1.
initial_weights  A list of H2OFrame ids to initialize the weight matrices of this model with.
initial_biases  A list of H2OFrame ids to initialize the bias vectors of this model with.
loss  Loss function. Must be one of: "Automatic", "CrossEntropy", "Quadratic", "Huber", "Absolute", "Quantile". Defaults to Automatic.
distribution  Distribution function. Must be one of: "AUTO", "bernoulli", "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber". Defaults to AUTO.
quantile_alpha  Desired quantile for Quantile regression; must be between 0 and 1. Defaults to 0.5.
tweedie_power  Tweedie power for Tweedie regression; must be between 1 and 2. Defaults to 1.5.
huber_alpha  Desired quantile for Huber/M-regression (threshold between quadratic and linear loss; must be between 0 and 1). Defaults to 0.9.
score_interval  Shortest time interval (in seconds) between model scoring. Defaults to 5.
score_training_samples  Number of training set samples for scoring (0 for all). Defaults to 10000.
score_validation_samples  Number of validation set samples for scoring (0 for all). Defaults to 0.
score_duty_cycle  Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring). Defaults to 0.1.
classification_stop  Stopping criterion for classification error fraction on training data (-1 to disable). Defaults to 0.
regression_stop  Stopping criterion for regression error (MSE) on training data (-1 to disable). Defaults to 1e-06.
stopping_rounds  Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable). Defaults to 5.
stopping_metric  Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Must be one of: "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "lift_top_group", "misclassification", "mean_per_class_error". Defaults to AUTO.
stopping_tolerance  Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much). Defaults to 0.
max_runtime_secs  Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.
score_validation_sampling  Method used to sample validation dataset for scoring. Must be one of: "Uniform", "Stratified". Defaults to Uniform.
diagnostics  Logical. Enable diagnostics for hidden layers. Defaults to TRUE.
fast_mode  Logical. Enable fast mode (minor approximation in back-propagation). Defaults to TRUE.
force_load_balance  Logical. Force extra load balancing to increase training speed for small datasets (to keep all cores busy). Defaults to TRUE.
variable_importances  Logical. Compute variable importances for input features (Gedeon method); can be slow for large networks. Defaults to TRUE.
replicate_training_data  Logical. Replicate the entire training dataset onto every node for faster training on small datasets. Defaults to TRUE.
single_node_mode  Logical. Run on a single node for fine-tuning of model parameters. Defaults to FALSE.
shuffle_training_data  Logical. Enable shuffling of training data (recommended if training data is replicated and train_samples_per_iteration is close to #nodes x #rows, or if using balance_classes). Defaults to FALSE.
missing_values_handling  Handling of missing values. Either MeanImputation or Skip. Must be one of: "MeanImputation", "Skip". Defaults to MeanImputation.
quiet_mode  Logical. Enable quiet mode for less output to standard output. Defaults to FALSE.
autoencoder  Logical. Auto-Encoder. Defaults to FALSE.
sparse  Logical. Sparse data handling (more efficient for data with lots of 0 values). Defaults to FALSE.
col_major  Logical. #DEPRECATED Use a column major weight matrix for input layer. Can speed up forward propagation, but might slow down backpropagation. Defaults to FALSE.
average_activation  Average activation for sparse auto-encoder. #Experimental Defaults to 0.
sparsity_beta  Sparsity regularization. #Experimental Defaults to 0.
max_categorical_features  Max. number of categorical features, enforced via hashing. #Experimental Defaults to 2147483647.
reproducible  Logical. Force reproducibility on small data (will be slow; only uses 1 thread). Defaults to FALSE.
export_weights_and_biases  Logical. Whether to export Neural Network weights and biases to H2O Frames. Defaults to FALSE.
mini_batch_size  Mini-batch size (smaller leads to better fit, larger can speed up and generalize better). Defaults to 1.
categorical_encoding  Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO.
elastic_averaging  Logical. Elastic averaging between compute nodes can improve distributed model convergence. #Experimental Defaults to FALSE.
elastic_averaging_moving_rate  Elastic averaging moving rate (only if elastic averaging is enabled). Defaults to 0.9.
elastic_averaging_regularization  Elastic averaging regularization strength (only if elastic averaging is enabled). Defaults to 0.001.
verbose  Logical. Print scoring history to the console (Metrics per tree for GBM, DRF, & XGBoost. Metrics per epoch for Deep Learning). Defaults to FALSE.

See Also
predict.H2OModel for prediction

Examples
library(h2o)
h2o.init()
iris.hex <- as.h2o(iris)
iris.dl <- h2o.deeplearning(x = 1:4, y = 5, training_frame = iris.hex, seed = 123456)
# now make a prediction
predictions <- h2o.predict(iris.dl, iris.hex)

h2o.deepwater    Build a Deep Learning model using multiple native GPU backends

Description
Builds a deep neural network on an H2OFrame containing various data sources.

Usage
h2o.deepwater(x, y, training_frame, model_id = NULL, checkpoint = NULL,
  autoencoder = FALSE, validation_frame = NULL, nfolds = 0,
  balance_classes = FALSE, max_after_balance_size = 5,
  class_sampling_factors = NULL, keep_cross_validation_predictions = FALSE,
  keep_cross_validation_fold_assignment = FALSE,
  fold_assignment = c("AUTO", "Random", "Modulo", "Stratified"),
  fold_column = NULL, offset_column = NULL, weights_column = NULL,
  score_each_iteration = FALSE, categorical_encoding = c("AUTO", "Enum",
  "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder",
  "SortByResponse", "EnumLimited"), overwrite_with_best_model = TRUE,
  epochs = 10, train_samples_per_iteration = -2,
  target_ratio_comm_to_comp = 0.05, seed = -1, standardize = TRUE,
  learning_rate = 0.001, learning_rate_annealing = 1e-06,
  momentum_start = 0.9, momentum_ramp = 10000, momentum_stable = 0.9,
  distribution = c("AUTO", "bernoulli", "multinomial", "gaussian",
  "poisson", "gamma", "tweedie", "laplace", "quantile", "huber"),
  score_interval = 5,
  score_training_samples = 10000, score_validation_samples = 0,
  score_duty_cycle = 0.1, classification_stop = 0, regression_stop = 0,
  stopping_rounds = 5, stopping_metric = c("AUTO", "deviance", "logloss",
  "MSE", "RMSE", "MAE", "RMSLE", "AUC", "lift_top_group",
  "misclassification", "mean_per_class_error"), stopping_tolerance = 0,
  max_runtime_secs = 0, ignore_const_cols = TRUE,
  shuffle_training_data = TRUE, mini_batch_size = 32, clip_gradient = 10,
  network = c("auto", "user", "lenet", "alexnet", "vgg", "googlenet",
  "inception_bn", "resnet"), backend = c("mxnet", "caffe", "tensorflow"),
  image_shape = c(0, 0), channels = 3, sparse = FALSE, gpu = TRUE,
  device_id = c(0), cache_data = TRUE, network_definition_file = NULL,
  network_parameters_file = NULL, mean_image_file = NULL,
  export_native_parameters_prefix = NULL, activation = c("Rectifier",
  "Tanh"), hidden = NULL, input_dropout_ratio = 0,
  hidden_dropout_ratios = NULL, problem_type = c("auto", "image", "dataset"))

Arguments
x  (Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used.
y  The name or column index of the response variable in the data. The response must be either a numeric or a categorical/factor variable. If the response is numeric, then a regression model will be trained; otherwise, a classification model will be trained.
training_frame  Id of the training data frame.
model_id  Destination id for this model; auto-generated if not specified.
checkpoint  Model checkpoint to resume training with.
autoencoder  Logical. Auto-Encoder. Defaults to FALSE.
validation_frame  Id of the validation data frame.
nfolds  Number of folds for K-fold cross-validation (0 to disable or >= 2). Defaults to 0.
balance_classes  Logical. Balance training data class counts via over/under-sampling (for imbalanced data). Defaults to FALSE.
max_after_balance_size: Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. Defaults to 5.0.
class_sampling_factors: Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
keep_cross_validation_predictions: Logical. Whether to keep the predictions of the cross-validation models. Defaults to FALSE.
keep_cross_validation_fold_assignment: Logical. Whether to keep the cross-validation fold assignment. Defaults to FALSE.
fold_assignment: Cross-validation fold assignment scheme, if fold_column is not specified. The "Stratified" option will stratify the folds based on the response variable, for classification problems. Must be one of: "AUTO", "Random", "Modulo", "Stratified". Defaults to AUTO.
fold_column: Column with cross-validation fold index assignment per observation.
offset_column: Offset column. This will be added to the combination of columns before applying the link function.
weights_column: Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
score_each_iteration: Logical. Whether to score during each iteration of model training. Defaults to FALSE.
categorical_encoding: Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO.
overwrite_with_best_model: Logical. If enabled, override the final model with the best model found during training. Defaults to TRUE.
epochs: How many times the dataset should be iterated (streamed); can be fractional. Defaults to 10.
train_samples_per_iteration: Number of training samples (globally) per MapReduce iteration. Special values are 0: one epoch, -1: all available data (e.g., replicated training data), -2: automatic. Defaults to -2.
target_ratio_comm_to_comp: Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration = -2 (auto-tuning). Defaults to 0.05.
seed: Seed for random numbers (affects certain parts of the algorithm that are stochastic and those might or might not be enabled by default). Note: only reproducible when running single threaded. Defaults to -1 (time-based random number).
standardize: Logical. If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data. Defaults to TRUE.
learning_rate: Learning rate (higher => less stable, lower => slower convergence). Defaults to 0.001.
learning_rate_annealing: Learning rate annealing: rate / (1 + rate_annealing * samples). Defaults to 1e-06.
momentum_start: Initial momentum at the beginning of training (try 0.5). Defaults to 0.9.
momentum_ramp: Number of training samples for which momentum increases. Defaults to 10000.
momentum_stable: Final momentum after the ramp is over (try 0.99). Defaults to 0.9.
distribution: Distribution function. Must be one of: "AUTO", "bernoulli", "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber". Defaults to AUTO.
score_interval: Shortest time interval (in seconds) between model scoring. Defaults to 5.
score_training_samples: Number of training set samples for scoring (0 for all). Defaults to 10000.
score_validation_samples: Number of validation set samples for scoring (0 for all). Defaults to 0.
score_duty_cycle: Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring). Defaults to 0.1.
classification_stop: Stopping criterion for classification error fraction on training data (-1 to disable). Defaults to 0.
regression_stop: Stopping criterion for regression error (MSE) on training data (-1 to disable). Defaults to 0.
stopping_rounds: Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k := stopping_rounds scoring events (0 to disable). Defaults to 5.
stopping_metric: Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Must be one of: "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "lift_top_group", "misclassification", "mean_per_class_error". Defaults to AUTO.
stopping_tolerance: Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much). Defaults to 0.
max_runtime_secs: Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.
ignore_const_cols: Logical. Ignore constant columns. Defaults to TRUE.
shuffle_training_data: Logical. Enable global shuffling of training data. Defaults to TRUE.
mini_batch_size: Mini-batch size (smaller leads to better fit, larger can speed up and generalize better). Defaults to 32.
clip_gradient: Clip gradients once their absolute value is larger than this value. Defaults to 10.
network: Network architecture. Must be one of: "auto", "user", "lenet", "alexnet", "vgg", "googlenet", "inception_bn", "resnet". Defaults to auto.
backend: Deep Learning backend. Must be one of: "mxnet", "caffe", "tensorflow". Defaults to mxnet.
image_shape: Width and height of image. Defaults to c(0, 0).
channels: Number of (color) channels. Defaults to 3.
sparse: Logical. Sparse data handling (more efficient for data with lots of 0 values). Defaults to FALSE.
gpu: Logical. Whether to use a GPU (if available). Defaults to TRUE.
device_id: Device IDs (which GPUs to use). Defaults to c(0).
cache_data: Logical. Whether to cache the data in memory (automatically disabled if data size is too large). Defaults to TRUE.
network_definition_file: Path of file containing network definition (graph, architecture).
network_parameters_file: Path of file containing network (initial) parameters (weights, biases).
mean_image_file: Path of file containing the mean image data for data normalization.
export_native_parameters_prefix: Path (prefix) where to export the native model parameters after every iteration.
activation: Activation function. Only used if no user-defined network architecture file is provided, and only for problem_type = "dataset". Must be one of: "Rectifier", "Tanh".
hidden: Hidden layer sizes (e.g., c(200, 200)). Only used if no user-defined network architecture file is provided, and only for problem_type = "dataset".
input_dropout_ratio: Input layer dropout ratio (can improve generalization; try 0.1 or 0.2). Defaults to 0.
hidden_dropout_ratios: Hidden layer dropout ratios (can improve generalization); specify one value per hidden layer. Defaults to 0.5.
problem_type: Problem type, auto-detected by default. If set to "image", the first column of the H2OFrame must be a string column containing the path (URI or URL) to the images. If set to "text", the first column of the H2OFrame must be a string column containing the text. If set to "dataset", Deep Water behaves just like any other H2O model and builds a model on the provided H2OFrame (non-string columns). Must be one of: "auto", "image", "dataset". Defaults to auto.

h2o.deepwater.available
Determines whether Deep Water is available

Description
Ask the H2O server whether a Deep Water model can be built (this depends on the availability of native backends). Returns TRUE if a Deep Water model can be built, or FALSE otherwise.
Usage
h2o.deepwater.available(h2oRestApiVersion = .h2o.__REST_API_VERSION)

Arguments
h2oRestApiVersion: (Optional) Specific version of the REST API to use.

h2o.describe
H2O Description of a Dataset

Description
Reports the "Flow"-style summary rollups on an instance of H2OFrame. Includes information about column types, mins/maxs/missing/zero counts/stds/number of levels.

Usage
h2o.describe(frame)

Arguments
frame: An H2OFrame object.

Value
A table with the Frame stats.

Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.importFile(path = prosPath)
h2o.describe(prostate.hex)

h2o.difflag1
Conduct a lag 1 transform on a numeric H2OFrame column

Description
Conduct a lag 1 transform on a numeric H2OFrame column.

Usage
h2o.difflag1(object)

Arguments
object: H2OFrame object.

Value
Returns an H2OFrame object.

h2o.dim
Returns the number of rows and columns for an H2OFrame object.

Description
Returns the number of rows and columns for an H2OFrame object.

Usage
h2o.dim(x)

Arguments
x: An H2OFrame object.

See Also
dim for the base R implementation.

h2o.dimnames
Column names of an H2OFrame

Description
Column names of an H2OFrame.

Usage
h2o.dimnames(x)

Arguments
x: An H2OFrame object.

See Also
dimnames for the base R implementation.

h2o.distance
Compute a pairwise distance measure between all rows of two numeric H2OFrames.

Description
Compute a pairwise distance measure between all rows of two numeric H2OFrames.

Usage
h2o.distance(x, y, measure)

Arguments
x: An H2OFrame object (large, references).
y: An H2OFrame object (small, queries).
measure: An optional string indicating what distance measure to use.
Must be one of:
  "l1" - Absolute distance (L1-norm, >= 0)
  "l2" - Euclidean distance (L2-norm, >= 0)
  "cosine" - Cosine similarity (-1...1)
  "cosine_sq" - Squared cosine similarity (0...1)

Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
h2o.distance(prostate.hex[11:30, ], prostate.hex[1:10, ], "cosine")

h2o.downloadAllLogs
Download H2O Log Files to Disk

Description
h2o.downloadAllLogs downloads all H2O log files to local disk in .zip format. Generally used for debugging purposes.

Usage
h2o.downloadAllLogs(dirname = ".", filename = NULL)

Arguments
dirname: (Optional) A character string indicating the directory that the log file should be saved in.
filename: (Optional) A character string indicating the name that the log file should be saved to. Note that the saved format is .zip, so the file name must include the .zip extension.

Examples
h2o.downloadAllLogs(dirname = './your_directory_name/', filename = 'autoh2o_log.zip')

h2o.downloadCSV
Download H2O Data to Disk

Description
Download an H2O data set to a CSV file on the local disk.

Usage
h2o.downloadCSV(data, filename)

Arguments
data: An H2OFrame object to be downloaded.
filename: A string indicating the name that the CSV file should be saved to.

Warning
Files located on the H2O server may be very large! Make sure you have enough hard drive space to accommodate the entire file.

Examples
library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris_wheader.csv", package = "h2o")
iris.hex <- h2o.uploadFile(path = irisPath)
myFile <- paste(getwd(), "my_iris_file.csv", sep = .Platform$file.sep)
h2o.downloadCSV(iris.hex, myFile)
file.info(myFile)
file.remove(myFile)

h2o.download_mojo
Download the model in MOJO format.

Description
Download the model in MOJO format.
Usage
h2o.download_mojo(model, path = getwd(), get_genmodel_jar = FALSE, genmodel_name = "")

Arguments
model: An H2OModel.
path: The path where the MOJO file should be saved. Saved to the current directory by default.
get_genmodel_jar: If TRUE, then also download h2o-genmodel.jar and store it in the folder given by path.
genmodel_name: Custom name of the genmodel jar.

Value
Name of the MOJO file written to the path.

Examples
library(h2o)
h <- h2o.init()
fr <- as.h2o(iris)
my_model <- h2o.gbm(x = 1:4, y = 5, training_frame = fr)
h2o.download_mojo(my_model)  # save to the current working directory

h2o.download_pojo
Download the Scoring POJO (Plain Old Java Object) of an H2O Model

Description
Download the Scoring POJO (Plain Old Java Object) of an H2O Model.

Usage
h2o.download_pojo(model, path = NULL, getjar = NULL, get_jar = TRUE, jar_name = "")

Arguments
model: An H2OModel.
path: The path to the directory to store the POJO (no trailing slash). If NULL, then print to the console. The file name will be a compilable Java file name.
getjar: (DEPRECATED) Whether to also download the h2o-genmodel.jar file needed to compile the POJO. This argument is now called get_jar.
get_jar: Whether to also download the h2o-genmodel.jar file needed to compile the POJO.
jar_name: Custom name of the genmodel jar.

Value
If path is NULL, then pretty print the POJO to the console. Otherwise save it to the specified directory and return the POJO file name.

Examples
library(h2o)
h <- h2o.init()
fr <- as.h2o(iris)
my_model <- h2o.gbm(x = 1:4, y = 5, training_frame = fr)
h2o.download_pojo(my_model)  # print the model to screen
# h2o.download_pojo(my_model, getwd())  # save the POJO and jar file to the current working directory, NOT RUN
# h2o.download_pojo(my_model, getwd(), get_jar = FALSE)  # save only the POJO to the current working directory, NOT RUN
h2o.download_pojo(my_model, getwd())  # save to the current working directory

h2o.entropy
Shannon entropy

Description
Return the Shannon entropy of a string column.
If the string is empty, the entropy is 0.

Usage
h2o.entropy(x)

Arguments
x: The column on which to calculate the entropy.

Examples
library(h2o)
h2o.init()
buys <- as.h2o(c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "no"))
buys_entropy <- h2o.entropy(buys)

h2o.exp
Compute the exponential function of x

Description
Compute the exponential function of x.

Usage
h2o.exp(x)

Arguments
x: An H2OFrame object.

See Also
exp for the base R implementation.

h2o.exportFile
Export an H2O Data Frame (H2OFrame) to a File or to a collection of Files.

Description
Exports an H2OFrame (which can be either VA or FV) to a file. This file may be on the H2O instance's local filesystem, on HDFS (preface the path with hdfs://), or on S3N (preface the path with s3n://).

Usage
h2o.exportFile(data, path, force = FALSE, parts = 1)

Arguments
data: An H2OFrame object.
path: The path to write the file to. Must include the directory and also the filename if exporting to a single file. May be prefaced with hdfs:// or s3n://. Each row of data appears as a line of the file.
force: Logical; indicates how to deal with files that already exist.
parts: Integer; number of part files to export to. Default is to write to a single file. Large data can be exported to multiple "part" files, where each part file contains a subset of the data. The user can specify the maximum number of part files or use the value -1 to indicate that H2O should itself determine the optimal number of files. The path parameter will be considered a path to a directory if export to multiple part files is desired. Part files conform to the naming scheme "part-m-?????".

Details
In the case of existing files, force = TRUE will overwrite the file. Otherwise, the operation will fail.
Examples
## Not run:
library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris.csv", package = "h2o")
iris.hex <- h2o.uploadFile(path = irisPath)
# These aren't real paths
h2o.exportFile(iris.hex, path = "/path/on/h2o/server/filesystem/iris.csv")
h2o.exportFile(iris.hex, path = "hdfs://path/in/hdfs/iris.csv")
h2o.exportFile(iris.hex, path = "s3n://path/in/s3/iris.csv")
## End(Not run)

h2o.exportHDFS
Export a Model to HDFS

Description
Exports an H2OModel to HDFS.

Usage
h2o.exportHDFS(object, path, force = FALSE)

Arguments
object: An H2OModel class object.
path: The path to write the model to. Must include the directory and filename.
force: Logical; indicates how to deal with files that already exist.

h2o.fillna
fillNA

Description
Fill NAs in a sequential manner up to a specified limit.

Usage
h2o.fillna(x, method = "forward", axis = 1, maxlen = 1L)

Arguments
x: An H2OFrame.
method: A string: "forward" or "backward".
axis: An integer: 1 for row-wise fill (default), 2 for column-wise fill.
maxlen: An integer for the maximum number of consecutive NAs to fill.

Value
An H2OFrame after filling missing values.

Examples
library(h2o)
h2o.init()
fr.with.nas <- h2o.createFrame(categorical_fraction = 0.0, missing_fraction = 0.7, rows = 6, cols = 2, seed = 123)
fr <- h2o.fillna(fr.with.nas, "forward", axis = 1, maxlen = 2L)

h2o.filterNACols
Filter NA Columns

Description
Filter NA Columns.

Usage
h2o.filterNACols(data, frac = 0.2)

Arguments
data: A dataset to filter on.
frac: The threshold of NAs to allow per column (columns >= this threshold are filtered).

Value
Returns a numeric vector of indexes that pertain to non-NA columns.

h2o.findSynonyms
Find synonyms using a word2vec model.

Description
Find synonyms using a word2vec model.

Usage
h2o.findSynonyms(word2vec, word, count = 20)

Arguments
word2vec: A word2vec model.
word: A single word to find synonyms for.
count: The top count synonyms will be returned.
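The fill and filter rules documented for h2o.fillna and h2o.filterNACols can be sketched in base R on an ordinary vector and data.frame. This is a minimal local illustration, not H2O's implementation: fill_forward() and keep_non_na_cols() are hypothetical helpers, fill_forward() handles a single vector (it does not reproduce the axis argument), and the returned indexes are assumed to be 1-based column positions.

```r
# Hypothetical helper: forward fill as documented for
# h2o.fillna(method = "forward", maxlen = k), applied to one vector.
fill_forward <- function(x, maxlen = 1L) {
  last <- NA   # most recent non-missing value seen so far
  run <- 0L    # NAs already filled since that value was seen
  for (i in seq_along(x)) {
    if (is.na(x[i])) {
      # fill only if a previous value exists and the run limit is not reached
      if (!is.na(last) && run < maxlen) {
        x[i] <- last
        run <- run + 1L
      }
    } else {
      last <- x[i]
      run <- 0L
    }
  }
  x
}

# Hypothetical helper: keep columns whose NA fraction is below frac,
# i.e. columns at or above the threshold are filtered out.
keep_non_na_cols <- function(df, frac = 0.2) {
  na_frac <- colMeans(is.na(df))
  unname(which(na_frac < frac))
}

fill_forward(c(1, NA, NA, NA, 5, NA), maxlen = 2L)
df <- data.frame(a = 1:5, b = c(NA, NA, 3, 4, 5), c = c(1, NA, 3, 4, 5))
keep_non_na_cols(df, frac = 0.2)
```

In the example, the third NA in the run stays missing because maxlen = 2, and column c is dropped because its NA fraction (0.2) is not below frac.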
h2o.find_row_by_threshold
Find the row for a given threshold. No duplicate thresholds allowed

Description
Find the row of the threshold/metric table that corresponds to a given threshold. No duplicate thresholds allowed.

Usage
h2o.find_row_by_threshold(object, threshold)

Arguments
object: An H2OBinomialMetrics object.
threshold: A number between 0 and 1.

h2o.find_threshold_by_max_metric
Find the threshold, given the max metric

Description
Find the threshold, given the max metric.

Usage
h2o.find_threshold_by_max_metric(object, metric)

Arguments
object: An H2OBinomialMetrics object.
metric: The metric to maximize, for example "F1".

h2o.floor
Take a single numeric argument and return a numeric vector with the largest integers not greater than each element

Description
floor takes a single numeric argument x and returns a numeric vector containing the largest integers not greater than the corresponding elements of x.

Usage
h2o.floor(x)

Arguments
x: An H2OFrame object.

See Also
floor for the base R implementation.

h2o.flow
Open H2O Flow

Description
Open H2O Flow in your browser.

Usage
h2o.flow()

h2o.gainsLift
Access H2O Gains/Lift Tables

Description
Retrieve either a single or many Gains/Lift tables from H2O objects.

Usage
h2o.gainsLift(object, ...)

## S4 method for signature 'H2OModel'
h2o.gainsLift(object, newdata, valid = FALSE, xval = FALSE, ...)

## S4 method for signature 'H2OModelMetrics'
h2o.gainsLift(object)

Arguments
object: Either an H2OModel object or an H2OModelMetrics object.
...: Further arguments to be passed to/from this method.
newdata: An H2OFrame object that can be scored on. Requires a valid response column.
valid: Retrieve the validation metric.
xval: Retrieve the cross-validation metric.

Details
The H2OModelMetrics version of this function will only take H2OBinomialMetrics objects.

Value
Calling this function on H2OModel objects returns a Gains/Lift table corresponding to the predict function.

See Also
predict for generating prediction frames, h2o.performance for creating H2OModelMetrics.
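To make the cumulative columns of a Gains/Lift table concrete, the following base-R sketch computes them for a 0/1 response ranked by a score. It assumes rows are split into equal-sized groups after sorting by score; H2O builds its own table on the server from quantiles of the predicted probabilities and reports more columns, so gains_lift() is a hypothetical helper and its column names are only chosen to resemble H2O's.

```r
# Hypothetical helper: cumulative gains/lift for a binary (0/1) response.
# Assumes higher `score` means "more likely 1" and splits the score-sorted
# rows into `groups` contiguous, roughly equal-sized groups.
gains_lift <- function(actual, score, groups = 10L) {
  ord <- order(score, decreasing = TRUE)      # best-scored rows first
  actual <- actual[ord]
  n <- length(actual)
  group <- ceiling(seq_len(n) * groups / n)   # group ids 1..groups
  responses <- as.numeric(tapply(actual, group, sum))
  counts <- as.numeric(tapply(actual, group, length))
  data.frame(
    group = seq_along(counts),
    cumulative_data_fraction = cumsum(counts) / n,
    response_rate = responses / counts,
    cumulative_capture_rate = cumsum(responses) / sum(actual),
    # cumulative response rate relative to the overall response rate
    cumulative_lift = (cumsum(responses) / cumsum(counts)) / mean(actual)
  )
}

actual <- c(1, 1, 0, 1, 0, 0, 0, 1, 0, 0)
score <- 10:1
gl <- gains_lift(actual, score, groups = 5L)
gl
```

Because the last group covers the whole data set, its cumulative lift is always 1; a lift above 1 in the first groups means the score concentrates responders at the top of the ranking.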
Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
hex <- h2o.uploadFile(prosPath)
hex[, 2] <- as.factor(hex[, 2])
model <- h2o.gbm(x = 3:9, y = 2, distribution = "bernoulli", training_frame = hex, validation_frame = hex, nfolds = 3)
h2o.gainsLift(model)  ## extract training metrics
h2o.gainsLift(model, valid = TRUE)  ## extract validation metrics (here: the same)
h2o.gainsLift(model, xval = TRUE)  ## extract cross-validation metrics
h2o.gainsLift(model, newdata = hex)  ## score on new data (here: the same)
# Generating a ModelMetrics object
perf <- h2o.performance(model, hex)
h2o.gainsLift(perf)  ## extract from existing metrics object

h2o.gbm
Build gradient boosted classification or regression trees

Description
Builds gradient boosted classification trees and gradient boosted regression trees on a parsed data set. The default distribution function will guess the model type based on the response column type. In order to run properly, the response column must be numeric for "gaussian" or an enum for "bernoulli" or "multinomial".
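The Description says the default distribution is guessed from the response column type. A minimal base-R restatement of that rule follows, assuming a two-level factor maps to "bernoulli", a factor with more levels to "multinomial", and a numeric column to "gaussian"; guess_distribution() is hypothetical, and H2O makes the real decision on the server.

```r
# Hypothetical helper: restate the AUTO distribution rule described for h2o.gbm.
guess_distribution <- function(response) {
  if (is.factor(response)) {
    k <- nlevels(response)
    if (k < 2L) stop("a categorical response needs at least two levels")
    if (k == 2L) "bernoulli" else "multinomial"
  } else if (is.numeric(response)) {
    "gaussian"
  } else {
    stop("response must be numeric or a factor")
  }
}

guess_distribution(iris$Sepal.Length)              # numeric response
guess_distribution(iris$Species)                   # three-level factor
guess_distribution(factor(c("no", "yes", "no")))   # two-level factor
```

One consequence: a 0/1 response stored as numbers is treated as a regression target, which is why the examples in this manual convert the response with as.factor() before fitting a bernoulli model.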
Usage
h2o.gbm(x, y, training_frame, model_id = NULL, validation_frame = NULL,
  nfolds = 0, keep_cross_validation_predictions = FALSE,
  keep_cross_validation_fold_assignment = FALSE,
  score_each_iteration = FALSE, score_tree_interval = 0,
  fold_assignment = c("AUTO", "Random", "Modulo", "Stratified"),
  fold_column = NULL, ignore_const_cols = TRUE, offset_column = NULL,
  weights_column = NULL, balance_classes = FALSE,
  class_sampling_factors = NULL, max_after_balance_size = 5,
  max_hit_ratio_k = 0, ntrees = 50, max_depth = 5, min_rows = 10,
  nbins = 20, nbins_top_level = 1024, nbins_cats = 1024,
  r2_stopping = Inf, stopping_rounds = 0,
  stopping_metric = c("AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE",
  "RMSLE", "AUC", "lift_top_group", "misclassification",
  "mean_per_class_error"), stopping_tolerance = 0.001,
  max_runtime_secs = 0, seed = -1, build_tree_one_node = FALSE,
  learn_rate = 0.1, learn_rate_annealing = 1,
  distribution = c("AUTO", "bernoulli", "quasibinomial", "multinomial",
  "gaussian", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber"),
  quantile_alpha = 0.5, tweedie_power = 1.5, huber_alpha = 0.9,
  checkpoint = NULL, sample_rate = 1, sample_rate_per_class = NULL,
  col_sample_rate = 1, col_sample_rate_change_per_level = 1,
  col_sample_rate_per_tree = 1, min_split_improvement = 1e-05,
  histogram_type = c("AUTO", "UniformAdaptive", "Random",
  "QuantilesGlobal", "RoundRobin"), max_abs_leafnode_pred = Inf,
  pred_noise_bandwidth = 0,
  categorical_encoding = c("AUTO", "Enum", "OneHotInternal",
  "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse",
  "EnumLimited"), calibrate_model = FALSE, calibration_frame = NULL,
  custom_metric_func = NULL, verbose = FALSE)

Arguments
x: (Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used.
y: The name or column index of the response variable in the data.
The response must be either a numeric or a categorical/factor variable. If the response is numeric, then a regression model will be trained; otherwise, a classification model will be trained.
training_frame: Id of the training data frame.
model_id: Destination id for this model; auto-generated if not specified.
validation_frame: Id of the validation data frame.
nfolds: Number of folds for K-fold cross-validation (0 to disable or >= 2). Defaults to 0.
keep_cross_validation_predictions: Logical. Whether to keep the predictions of the cross-validation models. Defaults to FALSE.
keep_cross_validation_fold_assignment: Logical. Whether to keep the cross-validation fold assignment. Defaults to FALSE.
score_each_iteration: Logical. Whether to score during each iteration of model training. Defaults to FALSE.
score_tree_interval: Score the model after every so many trees. Disabled if set to 0. Defaults to 0.
fold_assignment: Cross-validation fold assignment scheme, if fold_column is not specified. The "Stratified" option will stratify the folds based on the response variable, for classification problems. Must be one of: "AUTO", "Random", "Modulo", "Stratified". Defaults to AUTO.
fold_column: Column with cross-validation fold index assignment per observation.
ignore_const_cols: Logical. Ignore constant columns. Defaults to TRUE.
offset_column: Offset column. This will be added to the combination of columns before applying the link function.
weights_column: Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well.
During training, rows with higher weights matter more, due to the larger loss function pre-factor.
balance_classes: Logical. Balance training data class counts via over/under-sampling (for imbalanced data). Defaults to FALSE.
class_sampling_factors: Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
max_after_balance_size: Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. Defaults to 5.0.
max_hit_ratio_k: Max. number (top K) of predictions to use for hit ratio computation (for multiclass only, 0 to disable). Defaults to 0.
ntrees: Number of trees. Defaults to 50.
max_depth: Maximum tree depth. Defaults to 5.
min_rows: Fewest allowed (weighted) observations in a leaf. Defaults to 10.
nbins: For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point. Defaults to 20.
nbins_top_level: For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by a factor of two per level. Defaults to 1024.
nbins_cats: For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting. Defaults to 1024.
r2_stopping: r2_stopping is no longer supported and will be ignored if set; please use stopping_rounds, stopping_metric and stopping_tolerance instead. Previous versions of H2O would stop making trees when the R^2 metric equals or exceeds this value. Defaults to 1.797693135e+308.
stopping_rounds: Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k := stopping_rounds scoring events (0 to disable). Defaults to 0.
stopping_metric: Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Must be one of: "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "lift_top_group", "misclassification", "mean_per_class_error". Defaults to AUTO.
stopping_tolerance: Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much). Defaults to 0.001.
max_runtime_secs: Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.
seed: Seed for random numbers (affects certain parts of the algorithm that are stochastic and those might or might not be enabled by default). Defaults to -1 (time-based random number).
build_tree_one_node: Logical. Run on one node only; no network overhead but fewer CPUs used. Suitable for small datasets. Defaults to FALSE.
learn_rate: Learning rate (from 0.0 to 1.0). Defaults to 0.1.
learn_rate_annealing: Scale the learning rate by this factor after each tree (e.g., 0.99 or 0.999). Defaults to 1.
distribution: Distribution function. Must be one of: "AUTO", "bernoulli", "quasibinomial", "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber". Defaults to AUTO.
quantile_alpha: Desired quantile for Quantile regression; must be between 0 and 1. Defaults to 0.5.
tweedie_power: Tweedie power for Tweedie regression; must be between 1 and 2. Defaults to 1.5.
huber_alpha: Desired quantile for Huber/M-regression (threshold between quadratic and linear loss; must be between 0 and 1). Defaults to 0.9.
checkpoint: Model checkpoint to resume training with.
sample_rate: Row sample rate per tree (from 0.0 to 1.0). Defaults to 1.
sample_rate_per_class: A list of row sample rates per class (relative fraction for each class, from 0.0 to 1.0), for each tree.
col_sample_rate: Column sample rate (from 0.0 to 1.0). Defaults to 1.
col_sample_rate_change_per_level: Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0). Defaults to 1.
col_sample_rate_per_tree: Column sample rate per tree (from 0.0 to 1.0). Defaults to 1.
min_split_improvement: Minimum relative improvement in squared error reduction for a split to happen. Defaults to 1e-05.
histogram_type: What type of histogram to use for finding optimal split points. Must be one of: "AUTO", "UniformAdaptive", "Random", "QuantilesGlobal", "RoundRobin". Defaults to AUTO.
max_abs_leafnode_pred: Maximum absolute value of a leaf node prediction. Defaults to 1.797693135e+308.
pred_noise_bandwidth: Bandwidth (sigma) of Gaussian multiplicative noise ~N(1, sigma) for tree node predictions. Defaults to 0.
categorical_encoding: Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO.
calibrate_model: Logical. Use Platt Scaling to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities. Defaults to FALSE.
calibration_frame: Calibration frame for Platt Scaling.
custom_metric_func: Reference to a custom evaluation function, in the format "language:keyName=funcName".
verbose: Logical. Print scoring history to the console (metrics per tree for GBM, DRF, and XGBoost; metrics per epoch for Deep Learning). Defaults to FALSE.
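The learn_rate_annealing argument above multiplies the learning rate by a constant factor after each tree. Reading that literally, tree t (counting from 1) uses learn_rate * learn_rate_annealing^(t - 1). The base-R sketch below tabulates that schedule; effective_learn_rates() is a hypothetical helper, not an h2o function.

```r
# Hypothetical helper: per-tree learning rate when the rate is multiplied
# by `annealing` after every tree (tree 1 uses the unscaled learn_rate).
effective_learn_rates <- function(ntrees, learn_rate = 0.1, annealing = 1) {
  learn_rate * annealing^(seq_len(ntrees) - 1)
}

# Schedule for the first five trees with learn_rate = 0.1, annealing = 0.99
rates <- effective_learn_rates(ntrees = 5, learn_rate = 0.1, annealing = 0.99)
print(signif(rates, 4))
```

With the default learn_rate_annealing = 1 the rate stays constant. Values below 1 make later trees contribute less, so a model may need more trees to reach the same fit.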
See Also
predict.H2OModel for prediction.

Examples
library(h2o)
h2o.init()
# Run regression GBM on australia.hex data
ausPath <- system.file("extdata", "australia.csv", package = "h2o")
australia.hex <- h2o.uploadFile(path = ausPath)
independent <- c("premax", "salmax", "minairtemp", "maxairtemp", "maxsst", "maxsoilmoist", "Max_czcs")
dependent <- "runoffnew"
h2o.gbm(y = dependent, x = independent, training_frame = australia.hex, ntrees = 3, max_depth = 3, min_rows = 2)

h2o.getConnection
Retrieve an H2O Connection

Description
Attempt to recover an H2O connection.

Usage
h2o.getConnection()

Value
Returns an H2OConnection object.

h2o.getFrame
Get an R Reference to an H2O Dataset that will NOT be GC'd by default

Description
Get the reference to a frame with the given id in the H2O instance.

Usage
h2o.getFrame(id)

Arguments
id: A string indicating the unique frame id of the dataset to retrieve.

h2o.getFutureModel
Get future model

Description
Get future model.

Usage
h2o.getFutureModel(object, verbose = FALSE)

Arguments
object: H2OModel.
verbose: Print model progress to console. Default is FALSE.

h2o.getGLMFullRegularizationPath
Extract full regularization path from a GLM model

Description
Extract the full regularization path from a GLM model (assuming it was run with the lambda search option).

Usage
h2o.getGLMFullRegularizationPath(model)

Arguments
model: An H2OModel resulting from an h2o.glm call.

h2o.getGrid
Get a grid object from H2O distributed K/V store.

Description
Note that if neither cross-validation nor a validation frame is used in the grid search, then the training metrics will display in the "get grid" output. If a validation frame is passed to the grid, and nfolds = 0, then the validation metrics will display. However, if nfolds > 1, then cross-validation metrics will display even if a validation frame is provided.
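The display rule in the Description can be restated as a small decision function. grid_metric_source() below is hypothetical and does not contact H2O; it only encodes which metrics ("train", "valid" or "xval") the Description says will appear in the h2o.getGrid() output.

```r
# Hypothetical helper: which metrics h2o.getGrid() output shows, per the
# Description. Cross-validation takes precedence over a validation frame.
grid_metric_source <- function(nfolds = 0, has_validation_frame = FALSE) {
  if (nfolds > 1) {
    "xval"
  } else if (has_validation_frame) {
    "valid"
  } else {
    "train"
  }
}

grid_metric_source()                                         # no CV, no validation frame
grid_metric_source(nfolds = 0, has_validation_frame = TRUE)  # validation frame only
grid_metric_source(nfolds = 5, has_validation_frame = TRUE)  # both: CV wins
```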
Usage
h2o.getGrid(grid_id, sort_by, decreasing)

Arguments
grid_id: ID of existing grid object to fetch.
sort_by: Sort the models in the grid space by a metric. Choices are "logloss", "residual_deviance", "mse", "auc", "accuracy", "precision", "recall", "f1", etc.
decreasing: Specify whether sort order should be decreasing.

Examples
library(h2o)
library(jsonlite)
h2o.init()
iris.hex <- as.h2o(iris)
h2o.grid("gbm", grid_id = "gbm_grid_id", x = c(1:4), y = 5, training_frame = iris.hex, hyper_params = list(ntrees = c(1, 2, 3)))
grid <- h2o.getGrid("gbm_grid_id")
# Get grid summary
summary(grid)
# Fetch grid models
model_ids <- grid@model_ids
models <- lapply(model_ids, function(id) { h2o.getModel(id) })

h2o.getId
Get back-end distributed key/value store id from an H2OFrame.

Description
Get back-end distributed key/value store id from an H2OFrame.

Usage
h2o.getId(x)

Arguments
x: An H2OFrame.

Value
The id of the H2OFrame.

h2o.getModel
Get an R reference to an H2O model

Description
Returns a reference to an existing model in the H2O instance.

Usage
h2o.getModel(model_id)

Arguments
model_id: A string indicating the unique model_id of the model to retrieve.

Value
Returns an object that is a subclass of H2OModel.

Examples
library(h2o)
h2o.init()
iris.hex <- as.h2o(iris, "iris.hex")
model_id <- h2o.gbm(x = 1:4, y = 5, training_frame = iris.hex)@model_id
model.retrieved <- h2o.getModel(model_id)

h2o.getTimezone
Get the Time Zone on the H2O Cloud

Description
Get the Time Zone on the H2O Cloud. Returns a string.

Usage
h2o.getTimezone()

h2o.getTypes
Get the types-per-column

Description
Get the types-per-column.

Usage
h2o.getTypes(x)

Arguments
x: An H2OFrame.

Value
A list of types per column.

h2o.getVersion
Get h2o version

Description
Get h2o version.

Usage
h2o.getVersion()

h2o.giniCoef
Retrieve the GINI Coefficient

Description
Retrieves the GINI coefficient from an H2OBinomialMetrics object.
If the "train", "valid", and "xval" parameters are all FALSE (default), then the training GINI value is returned. If more than one parameter is set to TRUE, then a named vector of GINIs is returned, where the names are "train", "valid" or "xval".
Usage
h2o.giniCoef(object, train = FALSE, valid = FALSE, xval = FALSE)
Arguments
object    An H2OBinomialMetrics object.
train    Retrieve the training GINI coefficient.
valid    Retrieve the validation GINI coefficient.
xval    Retrieve the cross-validation GINI coefficient.
See Also
h2o.auc for AUC, h2o.giniCoef for the GINI coefficient, and h2o.metric for the various threshold metrics. See h2o.performance for creating H2OModelMetrics objects.
Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
hex <- h2o.uploadFile(prosPath)
hex[, 2] <- as.factor(hex[, 2])
model <- h2o.gbm(x = 3:9, y = 2, training_frame = hex, distribution = "bernoulli")
perf <- h2o.performance(model, hex)
h2o.giniCoef(perf)

h2o.glm    Fit a generalized linear model
Description
Fits a generalized linear model, specified by a response variable, a set of predictors, and a description of the error distribution.
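The alpha and lambda arguments described below control an elastic-net penalty that mixes L1 (Lasso) and L2 (Ridge) terms. The sketch below assumes the common glmnet-style parametrization, lambda * (alpha * sum(|beta|) + (1 - alpha) / 2 * sum(beta^2)), with the intercept excluded from beta; elastic_net_penalty is a hypothetical base-R helper for illustration, not an h2o function, and H2O's internal scaling of the objective may differ.

```r
# Hypothetical helper (not part of the h2o package): elastic-net penalty in
# the glmnet-style parametrization. beta should exclude the intercept.
elastic_net_penalty <- function(beta, alpha, lambda) {
  l1    <- sum(abs(beta))  # Lasso part
  l2_sq <- sum(beta^2)     # Ridge part (squared L2 norm)
  lambda * (alpha * l1 + (1 - alpha) / 2 * l2_sq)
}

beta <- c(0.5, -1.2, 0, 2)
elastic_net_penalty(beta, alpha = 1,   lambda = 0.1)  # 0.37    (pure Lasso)
elastic_net_penalty(beta, alpha = 0,   lambda = 0.1)  # 0.2845  (pure Ridge)
elastic_net_penalty(beta, alpha = 0.5, lambda = 0.1)  # 0.32725 (mixture)
```

With alpha = 1 only the absolute values of the coefficients are penalized, which is what drives some of them exactly to zero in Lasso; with alpha = 0 only the squared magnitudes are penalized.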
Usage
h2o.glm(x, y, training_frame, model_id = NULL, validation_frame = NULL,
  nfolds = 0, seed = -1, keep_cross_validation_predictions = FALSE,
  keep_cross_validation_fold_assignment = FALSE,
  fold_assignment = c("AUTO", "Random", "Modulo", "Stratified"),
  fold_column = NULL, ignore_const_cols = TRUE, score_each_iteration = FALSE,
  offset_column = NULL, weights_column = NULL,
  family = c("gaussian", "binomial", "quasibinomial", "ordinal", "multinomial",
  "poisson", "gamma", "tweedie"), tweedie_variance_power = 0,
  tweedie_link_power = 1, solver = c("AUTO", "IRLSM", "L_BFGS",
  "COORDINATE_DESCENT_NAIVE", "COORDINATE_DESCENT", "GRADIENT_DESCENT_LH",
  "GRADIENT_DESCENT_SQERR"), alpha = NULL, lambda = NULL,
  lambda_search = FALSE, early_stopping = TRUE, nlambdas = -1,
  standardize = TRUE, missing_values_handling = c("MeanImputation", "Skip"),
  compute_p_values = FALSE, remove_collinear_columns = FALSE,
  intercept = TRUE, non_negative = FALSE, max_iterations = -1,
  objective_epsilon = -1, beta_epsilon = 1e-04, gradient_epsilon = -1,
  link = c("family_default", "identity", "logit", "log", "inverse", "tweedie",
  "ologit", "oprobit", "ologlog"), prior = -1, lambda_min_ratio = -1,
  beta_constraints = NULL, max_active_predictors = -1, interactions = NULL,
  interaction_pairs = NULL, obj_reg = -1, balance_classes = FALSE,
  class_sampling_factors = NULL, max_after_balance_size = 5,
  max_hit_ratio_k = 0, max_runtime_secs = 0, custom_metric_func = NULL)
Arguments
x    (Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used.
y    The name or column index of the response variable in the data. The response must be either a numeric or a categorical/factor variable. If the response is numeric, then a regression model will be trained, otherwise it will train a classification model.
training_frame    Id of the training data frame.
model_id    Destination id for this model; auto-generated if not specified.
validation_frame    Id of the validation data frame.
nfolds    Number of folds for K-fold cross-validation (0 to disable or >= 2). Defaults to 0.
seed    Seed for random numbers (affects certain parts of the algorithm that are stochastic and those might or might not be enabled by default). Defaults to -1 (time-based random number).
keep_cross_validation_predictions    Logical. Whether to keep the predictions of the cross-validation models. Defaults to FALSE.
keep_cross_validation_fold_assignment    Logical. Whether to keep the cross-validation fold assignment. Defaults to FALSE.
fold_assignment    Cross-validation fold assignment scheme, if fold_column is not specified. The "Stratified" option will stratify the folds based on the response variable, for classification problems. Must be one of: "AUTO", "Random", "Modulo", "Stratified". Defaults to AUTO.
fold_column    Column with cross-validation fold index assignment per observation.
ignore_const_cols    Logical. Ignore constant columns. Defaults to TRUE.
score_each_iteration    Logical. Whether to score during each iteration of model training. Defaults to FALSE.
offset_column    Offset column. This will be added to the combination of columns before applying the link function.
weights_column    Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
family    Family. Use binomial for classification with logistic regression; the others are for regression problems.
Must be one of: "gaussian", "binomial", "quasibinomial", "ordinal", "multinomial", "poisson", "gamma", "tweedie". Defaults to gaussian.
tweedie_variance_power    Tweedie variance power. Defaults to 0.
tweedie_link_power    Tweedie link power. Defaults to 1.
solver    AUTO will set the solver based on the given data and the other parameters. IRLSM is fast on problems with a small number of predictors and for lambda search with an L1 penalty; L_BFGS scales better for datasets with many columns. Coordinate descent is experimental (beta). Must be one of: "AUTO", "IRLSM", "L_BFGS", "COORDINATE_DESCENT_NAIVE", "COORDINATE_DESCENT", "GRADIENT_DESCENT_LH", "GRADIENT_DESCENT_SQERR". Defaults to AUTO.
alpha    Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. The default value of alpha is 0 when solver = "L_BFGS"; 0.5 otherwise.
lambda    Regularization strength.
lambda_search    Logical. Use lambda search starting at lambda max; the given lambda is then interpreted as lambda min. Defaults to FALSE.
early_stopping    Logical. Stop early when there is no more relative improvement on train or validation (if provided). Defaults to TRUE.
nlambdas    Number of lambdas to be used in a search. Default indicates: if alpha is zero, with lambda search set to TRUE, the value of nlambdas is set to 30 (fewer lambdas are needed for ridge regression); otherwise it is set to 100. Defaults to -1.
standardize    Logical. Standardize numeric columns to have zero mean and unit variance. Defaults to TRUE.
missing_values_handling    Handling of missing values. Either MeanImputation or Skip. Must be one of: "MeanImputation", "Skip". Defaults to MeanImputation.
compute_p_values    Logical. Request p-values computation; p-values work only with the IRLSM solver and no regularization. Defaults to FALSE.
remove_collinear_columns    Logical.
In case of linearly dependent columns, remove some of the dependent columns. Defaults to FALSE.
intercept    Logical. Include a constant term in the model. Defaults to TRUE.
non_negative    Logical. Restrict coefficients (not intercept) to be non-negative. Defaults to FALSE.
max_iterations    Maximum number of iterations. Defaults to -1.
objective_epsilon    Converge if the objective value changes less than this. Default indicates: if lambda_search is set to TRUE, the value of objective_epsilon is set to .0001. If lambda_search is set to FALSE and lambda is equal to zero, the value of objective_epsilon is set to .000001; for any other value of lambda, the default value of objective_epsilon is set to .0001. Defaults to -1.
beta_epsilon    Converge if beta changes less (using L-infinity norm) than beta_epsilon; ONLY applies to the IRLSM solver. Defaults to 0.0001.
gradient_epsilon    Converge if the objective changes less (using L-infinity norm) than this; ONLY applies to the L-BFGS solver. Default indicates: if lambda_search is set to FALSE and lambda is equal to zero, the default value of gradient_epsilon is equal to .000001; otherwise the default value is .0001. If lambda_search is set to TRUE, the conditional values above are 1E-8 and 1E-6 respectively. Defaults to -1.
link    Link function. Must be one of: "family_default", "identity", "logit", "log", "inverse", "tweedie", "ologit", "oprobit", "ologlog". Defaults to family_default.
prior    Prior probability for y==1. To be used only for logistic regression, and only if the data has been sampled and the mean of the response does not reflect reality. Defaults to -1.
lambda_min_ratio    Minimum lambda used in lambda search, specified as a ratio of lambda_max (the smallest lambda that drives all coefficients to zero). Default indicates: if the number of observations is greater than the number of variables, then lambda_min_ratio is set to 0.0001; if the number of observations is less than the number of variables, then lambda_min_ratio is set to 0.01. Defaults to -1.
beta_constraints    Beta constraints.
max_active_predictors    Maximum number of active predictors during computation. Use as a stopping criterion to prevent expensive model building with many predictors. Default indicates: if the IRLSM solver is used, the value of max_active_predictors is set to 5000; otherwise it is set to 100000000. Defaults to -1.
interactions    A list of predictor column indices to interact. All pairwise combinations will be computed for the list.
interaction_pairs    A list of pairwise (first order) column interactions.
obj_reg    Likelihood divider in objective value computation; default is 1/nobs. Defaults to -1.
balance_classes    Logical. Balance training data class counts via over/under-sampling (for imbalanced data). Defaults to FALSE.
class_sampling_factors    Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
max_after_balance_size    Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. Defaults to 5.0.
max_hit_ratio_k    Maximum number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable). Defaults to 0.
max_runtime_secs    Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.
custom_metric_func    Reference to custom evaluation function, format: 'language:keyName=funcName'
Value
A subclass of H2OModel is returned. The specific subclass depends on the machine learning task at hand (if it's binomial classification, then an H2OBinomialModel is returned; if it's regression, then an H2ORegressionModel is returned). The default printout of the models is shown, but further GLM-specific information can be queried out of the object. To access these various items, please refer to the See Also section below.
Upon completion of the GLM, the resulting object has coefficients, normalized coefficients, residual/null deviance, AIC, and a host of model metrics including MSE, AUC (for logistic regression), degrees of freedom, and confusion matrices. Please refer to the more in-depth GLM documentation available here: https://h2o-release.s3.amazonaws.com/h2o-dev/rel-shannon/2/docs-website/h2o-docs/index.html#Data+Science+Algorithms-GLM
See Also
predict.H2OModel for prediction, h2o.mse, h2o.auc, h2o.confusionMatrix, h2o.performance, h2o.giniCoef, h2o.logloss, h2o.varimp, h2o.scoreHistory
Examples
library(h2o)
h2o.init()
# Run GLM of CAPSULE ~ AGE + RACE + PSA + DCAPS
prostatePath = system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex = h2o.importFile(path = prostatePath, destination_frame = "prostate.hex")
h2o.glm(y = "CAPSULE", x = c("AGE", "RACE", "PSA", "DCAPS"), training_frame = prostate.hex,
        family = "binomial", nfolds = 0, alpha = 0.5, lambda_search = FALSE)
# Run GLM of VOL ~ CAPSULE + AGE + RACE + PSA + GLEASON
myX = setdiff(colnames(prostate.hex), c("ID", "DPROS", "DCAPS", "VOL"))
h2o.glm(y = "VOL", x = myX, training_frame = prostate.hex, family = "gaussian",
        nfolds = 0, alpha = 0.1, lambda_search = FALSE)
# GLM variable importance
# Also see:
# https://github.com/h2oai/h2o/blob/master/R/tests/testdir_demos/runit_demo_VI_all_algos.R
data.hex = h2o.importFile(
  path = "https://s3.amazonaws.com/h2o-public-test-data/smalldata/demos/bank-additional-full.csv",
  destination_frame = "data.hex")
myX = 1:20
myY = "y"
my.glm = h2o.glm(x = myX, y = myY, training_frame = data.hex, family = "binomial",
                 standardize = TRUE, lambda_search = TRUE)

h2o.glrm    Generalized low rank decomposition of an H2O data frame
Description
Builds a generalized low rank decomposition of an H2O data frame.
Usage
h2o.glrm(training_frame, cols = NULL, model_id = NULL,
  validation_frame = NULL, ignore_const_cols = TRUE,
  score_each_iteration = FALSE, loading_name = NULL,
  transform = c("NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE"),
  k = 1, loss = c("Quadratic", "Absolute", "Huber", "Poisson", "Hinge",
  "Logistic", "Periodic"), loss_by_col = c("Quadratic", "Absolute", "Huber",
  "Poisson", "Hinge", "Logistic", "Periodic", "Categorical", "Ordinal"),
  loss_by_col_idx = NULL, multi_loss = c("Categorical", "Ordinal"),
  period = 1, regularization_x = c("None", "Quadratic", "L2", "L1",
  "NonNegative", "OneSparse", "UnitOneSparse", "Simplex"),
  regularization_y = c("None", "Quadratic", "L2", "L1", "NonNegative",
  "OneSparse", "UnitOneSparse", "Simplex"), gamma_x = 0, gamma_y = 0,
  max_iterations = 1000, max_updates = 2000, init_step_size = 1,
  min_step_size = 1e-04, seed = -1, init = c("Random", "SVD", "PlusPlus",
  "User"), svd_method = c("GramSVD", "Power", "Randomized"), user_y = NULL,
  user_x = NULL, expand_user_y = TRUE, impute_original = FALSE,
  recover_svd = FALSE, max_runtime_secs = 0)
Arguments
training_frame    Id of the training data frame.
cols    (Optional) A vector containing the data columns on which GLRM operates.
model_id    Destination id for this model; auto-generated if not specified.
validation_frame    Id of the validation data frame.
ignore_const_cols    Logical. Ignore constant columns. Defaults to TRUE.
score_each_iteration    Logical. Whether to score during each iteration of model training. Defaults to FALSE.
loading_name    Frame key to save the resulting X.
transform    Transformation of training data. Must be one of: "NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE". Defaults to NONE.
k    Rank of matrix approximation. Defaults to 1.
loss    Numeric loss function. Must be one of: "Quadratic", "Absolute", "Huber", "Poisson", "Hinge", "Logistic", "Periodic". Defaults to Quadratic.
loss_by_col    Loss function by column (override). Must be one of: "Quadratic", "Absolute", "Huber", "Poisson", "Hinge", "Logistic", "Periodic", "Categorical", "Ordinal".
loss_by_col_idx    Loss function by column index (override).
multi_loss    Categorical loss function. Must be one of: "Categorical", "Ordinal".
Defaults to Categorical.
period    Length of period (only used with periodic loss function). Defaults to 1.
regularization_x    Regularization function for X matrix. Must be one of: "None", "Quadratic", "L2", "L1", "NonNegative", "OneSparse", "UnitOneSparse", "Simplex". Defaults to None.
regularization_y    Regularization function for Y matrix. Must be one of: "None", "Quadratic", "L2", "L1", "NonNegative", "OneSparse", "UnitOneSparse", "Simplex". Defaults to None.
gamma_x    Regularization weight on X matrix. Defaults to 0.
gamma_y    Regularization weight on Y matrix. Defaults to 0.
max_iterations    Maximum number of iterations. Defaults to 1000.
max_updates    Maximum number of updates, defaults to 2*max_iterations. Defaults to 2000.
init_step_size    Initial step size. Defaults to 1.
min_step_size    Minimum step size. Defaults to 0.0001.
seed    Seed for random numbers (affects certain parts of the algorithm that are stochastic and those might or might not be enabled by default). Defaults to -1 (time-based random number).
init    Initialization mode. Must be one of: "Random", "SVD", "PlusPlus", "User". Defaults to PlusPlus.
svd_method    Method for computing SVD during initialization (Caution: Randomized is currently experimental and unstable). Must be one of: "GramSVD", "Power", "Randomized". Defaults to Randomized.
user_y    User-specified initial Y.
user_x    User-specified initial X.
expand_user_y    Logical. Expand categorical columns in user-specified initial Y. Defaults to TRUE.
impute_original    Logical. Reconstruct original training data by reversing transform. Defaults to FALSE.
recover_svd    Logical. Recover singular values and eigenvectors of XY. Defaults to FALSE.
max_runtime_secs    Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.
Value
Returns an object of class H2ODimReductionModel.
References
M. Udell, C. Horn, R. Zadeh, S. Boyd (2014). Generalized Low Rank Models (http://arxiv.org/abs/1410.0342). Unpublished manuscript, Stanford Electrical Engineering Department.
N. Halko, P.G. Martinsson, J.A. Tropp (2011). Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions (http://arxiv.org/abs/0909.4061). SIAM Rev., Survey and Review section, Vol. 53, num. 2, pp. 217-288, June 2011.
See Also
h2o.kmeans, h2o.svd, h2o.prcomp
Examples
library(h2o)
h2o.init()
ausPath <- system.file("extdata", "australia.csv", package = "h2o")
australia.hex <- h2o.uploadFile(path = ausPath)
h2o.glrm(training_frame = australia.hex, k = 5, loss = "Quadratic", regularization_x = "L1",
         gamma_x = 0.5, gamma_y = 0, max_iterations = 1000)

h2o.grep    Search for matches to an argument pattern
Description
Searches for matches to argument 'pattern' within each element of a string column.
Usage
h2o.grep(pattern, x, ignore.case = FALSE, invert = FALSE, output.logical = FALSE)
Arguments
pattern    A character string containing a regular expression.
x    An H2O frame that wraps a single string column.
ignore.case    If TRUE, case is ignored during matching.
invert    Identify elements that do not match the pattern.
output.logical    If TRUE, returns a logical vector of indicators instead of a list of matching positions.
Details
This function has similar semantics to R's native grep function, and it supports a subset of its parameters. The default behavior is to return the indices of the elements matching the pattern. Parameter 'output.logical' can be used to return a logical vector indicating whether each element matches the pattern (1) or not (0).
Value
H2OFrame holding the matching positions, or a logical vector if 'output.logical' is enabled.
Examples
library(h2o)
h2o.init()
addresses <- as.h2o(c("2307", "Leghorn St", "Mountain View", "CA", "94043"))
zip.codes <- addresses[h2o.grep("[0-9]{5}", addresses, output.logical = TRUE), ]

h2o.grid    H2O Grid Support
Description
Provides a set of functions to launch a grid search and get its results.
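To make the difference between the two search_criteria strategies concrete, the following base-R sketch expands a hyper_params list the way a Cartesian search enumerates candidates, then draws a random subset capped at max_models as a stand-in for a RandomDiscrete search with early stopping. It uses only expand.grid() and sample() and does not call H2O; H2O's actual sampling order and stopping logic may differ.

```r
# Base-R sketch (not H2O's implementation) of how a hyper_params list
# expands into candidate models.
hyper_params <- list(ntrees = c(1, 2), max_depth = c(5, 7))

# "Cartesian" strategy: every combination is a candidate (2 x 2 = 4 models).
cartesian <- expand.grid(hyper_params)
print(cartesian)

# "RandomDiscrete"-style search: visit the combinations in random order and
# stop early; a simple max_models cap stands in for the stopping criteria.
set.seed(1)
max_models <- 3
visit_order <- sample(nrow(cartesian))
random_subset <- cartesian[head(visit_order, max_models), ]
print(random_subset)
```

With the Cartesian strategy the number of candidates is the product of the lengths of all hyperparameter vectors, so it grows quickly as parameters are added; that is the motivation for pairing RandomDiscrete with max_models or max_runtime_secs.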
Usage
h2o.grid(algorithm, grid_id, x, y, training_frame, ..., hyper_params = list(),
  is_supervised = NULL, do_hyper_params_check = FALSE, search_criteria = NULL)
Arguments
algorithm    Name of algorithm to use in grid search (gbm, randomForest, kmeans, glm, deeplearning, naivebayes, pca).
grid_id    (Optional) ID for the resulting grid search. If it is not specified, then it is autogenerated.
x    (Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used.
y    The name or column index of the response variable in the data. The response must be either a numeric or a categorical/factor variable. If the response is numeric, then a regression model will be trained, otherwise it will train a classification model.
training_frame    Id of the training data frame.
...    Arguments describing parameters to use with the algorithm (e.g., x, y, training_frame). See the specific algorithm (h2o.gbm, h2o.glm, h2o.kmeans, h2o.deeplearning) for available parameters.
hyper_params    List of lists of hyper parameters (e.g., list(ntrees = c(1,2), max_depth = c(5,7))).
is_supervised    (Optional) If specified, then override the default heuristic which decides if the given algorithm name and parameters specify a supervised or unsupervised algorithm.
do_hyper_params_check    Perform a client-side check of the specified hyperparameters. It can be time-consuming for a large hyperparameter space.
search_criteria    (Optional) List of control parameters for smarter hyperparameter search. The default strategy 'Cartesian' covers the entire space of hyperparameter combinations. Specify the 'RandomDiscrete' strategy to get a random search of all the combinations of your hyperparameters. RandomDiscrete should usually be combined with at least one early stopping criterion, max_models and/or max_runtime_secs, e.g.
list(strategy = "RandomDiscrete", max_models = 42, max_runtime_
or
list(strategy = "RandomDiscrete", stopping_metric = "AUTO", stopping_tolerance =
or
list(strategy = "RandomDiscrete", stopping_metric = "misclassification", stoppin
Details
Launch grid search with given algorithm and parameters.
Examples
library(h2o)
library(jsonlite)
h2o.init()
iris.hex <- as.h2o(iris)
grid <- h2o.grid("gbm", x = c(1:4), y = 5, training_frame = iris.hex,
                 hyper_params = list(ntrees = c(1, 2, 3)))
# Get grid summary
summary(grid)
# Fetch grid models
model_ids <- grid@model_ids
models <- lapply(model_ids, function(id) { h2o.getModel(id) })

h2o.group_by    Group and Apply by Column
Description
Performs a group by and apply similar to ddply.
Usage
h2o.group_by(data, by, ..., gb.control = list(na.methods = NULL, col.names = NULL))
Arguments
data    An H2OFrame object.
by    A list of column names.
...    Any supported aggregate function. See Details for more help.
gb.control    A list of how to handle NA values in the dataset as well as how to name output columns. The method is specified using the na.methods argument. See Details for more help.
Details
In the case of na.methods within gb.control, there are three possible settings. "all" will include NAs in computation of functions. "rm" will completely remove all NA fields. "ignore" will remove NAs from the numerator but keep the rows for computational purposes. If a list smaller than the number of column groups is supplied, the list will be padded by "ignore".
Note that to specify a list of column names in the gb.control list, you must add the col.names argument. Similar to na.methods, col.names will pad the list with the default column names if the length is less than the number of column groups supplied.
Supported functions include nrow. This function is required and accepts a string for the name of the generated column.
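The three na.methods settings can be illustrated with a per-group mean in base R. This sketch uses tapply() on an ordinary data frame rather than h2o.group_by(), and the "ignore" case encodes one reading of the description above (NAs dropped from the sum but still counted as rows); H2O's exact arithmetic may differ.

```r
# Toy data: group "b" contains one missing value.
df <- data.frame(
  g  = c("a", "a", "b", "b", "b"),
  x1 = c(1, 3, 2, NA, 6)
)

# Row count per group (the role played by nrow); NA rows still count.
n_rows <- tapply(df$x1, df$g, length)

# "all": NAs take part in the computation, so any NA makes the result NA.
mean_all <- tapply(df$x1, df$g, mean)

# "rm": NA values are removed entirely before computing the mean.
mean_rm <- tapply(df$x1, df$g, mean, na.rm = TRUE)

# "ignore" (assumed reading): NAs are left out of the numerator, but their
# rows still count in the denominator.
mean_ignore <- tapply(df$x1, df$g, function(v) sum(v, na.rm = TRUE) / length(v))

print(rbind(n_rows, mean_all, mean_rm, mean_ignore))
```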
Other supported aggregate functions accept col and na arguments for specifying columns and the handling of NAs ("all", "ignore", and "rm"). … GroupBy object; max calculates the maximum of each column specified in col for each group of a GroupBy object; mean calculates the mean of each column specified in col for each group of a GroupBy object; min calculates the minimum of each column specified in col for each group of a GroupBy object; mode calculates the mode of each column specified in col for each group of a GroupBy object; sd calculates the standard deviation of each column specified in col for each group of a GroupBy object; ss calculates the sum of squares of each column specified in col for each group of a GroupBy object; sum calculates the sum of each column specified in col for each group of a GroupBy object; and var calculates the variance of each column specified in col for each group of a GroupBy object.
If an aggregate is provided without a value (for example, as max in sum(col="X1", na="all").mean(col="X5", na="all").max()), then it is assumed that the aggregation should apply to all columns except the GroupBy columns. Note again that nrow is required and cannot be empty.
Value
Returns a new H2OFrame object with one row per group created.

h2o.gsub    String Global Substitute
Description
Creates a copy of the target column in which each string has all occurrences of the regex pattern replaced with the replacement substring.
Usage
h2o.gsub(pattern, replacement, x, ignore.case = FALSE)
Arguments
pattern    The pattern to replace.
replacement    The replacement pattern.
x    The column on which to operate.
ignore.case    Whether matching should ignore case (FALSE means case-sensitive).
Examples
library(h2o)
h2o.init()
string_to_gsub <- as.h2o("r tutorial")
sub_string <- h2o.gsub("r ", "H2O ", string_to_gsub)

h2o.head    Return the Head or Tail of an H2O Dataset.
Description
Returns the first or last rows of an H2OFrame object.
Usage
h2o.head(x, n = 6L, ...)
## S3 method for class 'H2OFrame'
head(x, n = 6L, ...)
h2o.tail(x, n = 6L, ...)
## S3 method for class 'H2OFrame'
tail(x, n = 6L, ...)
Arguments
x    An H2OFrame object.
n    (Optional) A single integer. If positive, the number of rows in x to return. If negative, all but the first/last n rows of x.
...    Ignored.
Value
An H2OFrame containing the first or last n rows of an H2OFrame object.
Examples
library(h2o)
h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
ausPath <- system.file("extdata", "australia.csv", package = "h2o")
australia.hex <- h2o.uploadFile(path = ausPath)
head(australia.hex, 10)
tail(australia.hex, 10)

h2o.hist    Compute A Histogram
Description
Compute a histogram over a numeric column. If breaks == "FD", the MAD is used instead of the IQR in computing bin width. Note that we do not beautify the breakpoints as R does.
Usage
h2o.hist(x, breaks = "Sturges", plot = TRUE)
Arguments
x    A single numeric column from an H2OFrame.
breaks    Can be one of the following:
    A string: "Sturges", "Rice", "sqrt", "Doane", "FD", "Scott"
    A single number: the range of the column is split into that many bins of equal width
    A vector of numbers giving the split points, e.g., c(-50,213.2123,9324834)
plot    A logical value indicating whether or not a plot should be generated (default is TRUE).

h2o.hit_ratio_table    Retrieve the Hit Ratios
Description
If the "train", "valid", and "xval" parameters are all FALSE (default), then the training Hit Ratio table is returned. If more than one parameter is set to TRUE, then a named list of Hit Ratio tables is returned, where the names are "train", "valid" or "xval".
Usage
h2o.hit_ratio_table(object, train = FALSE, valid = FALSE, xval = FALSE)
Arguments
object    An H2OModel object.
train    Retrieve the training Hit Ratio.
valid    Retrieve the validation Hit Ratio.
xval    Retrieve the cross-validation Hit Ratio.

h2o.hour    Convert Milliseconds to Hour of Day in H2O Datasets
Description
Converts the entries of an H2OFrame object from milliseconds to hours of the day (on a 0 to 23 scale).
Usage
h2o.hour(x)
hour(x)
## S3 method for class 'H2OFrame'
hour(x)
Arguments
x    An H2OFrame object.
Value
An H2OFrame object containing the entries of x converted to hours of the day.
See Also
h2o.day

h2o.ifelse    H2O Apply Conditional Statement
Description
Applies conditional statements to vectors in H2O parsed data objects.
Usage
h2o.ifelse(test, yes, no)
ifelse(test, yes, no)
Arguments
test    A logical description of the condition to be met (>, <, ==, etc.).
yes    The value to return if the condition is TRUE.
no    The value to return if the condition is FALSE.
Details
Both numeric and categorical values can be tested. However, the yes and no values must either both be categorical or both be numeric.
Value
Returns a vector of new values matching the conditions stated in the ifelse call.
Examples
library(h2o)
h2o.init()
ausPath <- system.file("extdata", "australia.csv", package = "h2o")
australia.hex <- h2o.importFile(path = ausPath)
australia.hex[, 9] <- ifelse(australia.hex[, 3] < 279.9, 1, 0)
summary(australia.hex)

h2o.importFile    Import Files into H2O
Description
Imports files into an H2O cloud. The default behavior is to pass-through to the parse phase automatically.
Usage
h2o.importFile(path, destination_frame = "", parse = TRUE, header = NA,
  sep = "", col.names = NULL, col.types = NULL, na.strings = NULL,
  decrypt_tool = NULL)
h2o.importFolder(path, pattern = "", destination_frame = "", parse = TRUE,
  header = NA, sep = "", col.names = NULL, col.types = NULL,
  na.strings = NULL, decrypt_tool = NULL)
h2o.importHDFS(path, pattern = "", destination_frame = "", parse = TRUE,
  header = NA, sep = "", col.names = NULL, na.strings = NULL)
h2o.uploadFile(path, destination_frame = "", parse = TRUE, header = NA,
  sep = "", col.names = NULL, col.types = NULL, na.strings = NULL,
  progressBar = FALSE, parse_type = NULL, decrypt_tool = NULL)
Arguments
path    The complete URL or normalized file path of the file to be imported. Each row of data appears as one line of the file.
destination_frame    (Optional) The unique hex key assigned to the imported file. If none is given, a key will automatically be generated based on the URL path.
parse    (Optional) A logical value indicating whether the file should be parsed after import; for details see h2o.parseRaw.
header    (Optional) A logical value indicating whether the first line of the file contains column headers. If left empty, the parser will try to automatically detect this.
sep    (Optional) The field separator character. Values on each line of the file are separated by this character. If sep = "", the parser will automatically detect the separator.
col.names    (Optional) An H2OFrame object containing a single delimited line with the column names for the file.
col.types    (Optional) A vector to specify whether columns should be forced to a certain type upon import parsing.
na.strings    (Optional) H2O will interpret these strings as missing.
decrypt_tool    (Optional) Specify a Decryption Tool (key-reference acquired by calling h2o.decryptionSetup).
pattern    (Optional) Character string containing a regular expression to match file(s) in the folder.
progressBar    (Optional) When FALSE, tell the H2O parse call to block synchronously instead of polling. This can be faster for small datasets but loses the progress bar.
parse_type    (Optional) Specify which parser type H2O will use. Valid types are "ARFF", "XLS", "CSV", "SVMLight".
Details
h2o.importFile is a parallelized reader and pulls information from the server from a location specified by the client. The path is a server-side path. This is a fast, scalable, highly optimized way to read data. H2O pulls the data from a data store and initiates the data transfer as a read operation.
Unlike the import function, which is a parallelized reader, h2o.uploadFile is a push from the client to the server. The specified path must be a client-side path. This is not scalable and is only intended for smaller data sizes. The client pushes the data from a local filesystem (for example, on your machine where R is running) to H2O. For big-data operations, you don't want the data stored on or flowing through the client.
h2o.importFolder imports an entire directory of files. If the given path is relative, then it will be relative to the start location of the H2O instance. The default behavior is to pass-through to the parse phase automatically.
h2o.importHDFS is deprecated. Instead, use h2o.importFile.
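Because the pattern argument of h2o.importFolder is a regular expression, an unescaped "." matches any character. The base-R sketch below uses grepl() on a few hypothetical file names to show how a loose pattern such as ".*.csv" compares with an escaped one. grepl() searches anywhere in the string; the H2O server's regex engine may differ in dialect or require a full match, which is why the stricter pattern is anchored at both ends so that it means the same thing under either behavior.

```r
# Hypothetical file names as they might appear in a folder.
files <- c("prostate_1.csv", "prostate_2.csv", "notes_csv.txt", "backup.csv.gz")

# Loose pattern: "." matches any character and grepl() searches anywhere,
# so "_csv" and ".csv.gz" also satisfy it.
loose <- grepl(".*.csv", files)

# Escaped dot, anchored at both ends: only names ending in ".csv" match.
strict <- grepl("^.*\\.csv$", files)

data.frame(files, loose, strict)
```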
See Also
h2o.import_sql_select, h2o.import_sql_table, h2o.parseRaw
Examples
library(h2o)
h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
prosPath = system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex = h2o.importFile(path = prosPath, destination_frame = "prostate.hex")
class(prostate.hex)
summary(prostate.hex)
# Import files with a certain regex pattern by utilizing h2o.importFolder()
# In this example we import all .csv files in the directory prostate_folder
prosPath = system.file("extdata", "prostate_folder", package = "h2o")
prostate_pattern.hex = h2o.importFolder(path = prosPath, pattern = ".*.csv",
                                        destination_frame = "prostate.hex")
class(prostate_pattern.hex)
summary(prostate_pattern.hex)

h2o.import_sql_select    Import SQL table that is result of SELECT SQL query into H2O
Description
Creates a temporary SQL table from the specified sql_query. Runs multiple SELECT SQL queries on the temporary table concurrently for parallel ingestion, then drops the table. Be sure to start the h2o.jar in the terminal with your downloaded JDBC driver in the classpath: 'java -cp : : water.H2OApp'. Also see h2o.import_sql_select. Currently supported SQL databases are MySQL, PostgreSQL, and MariaDB. Support for Oracle 12g and Microsoft SQL Server
Usage
h2o.import_sql_table(connection_url, table, username, password, columns = NULL, optimize = NULL)
Arguments
connection_url    URL of the SQL database connection as specified by the Java Database Connectivity (JDBC) Driver. For example, "jdbc:mysql://localhost:3306/menagerie?&useSSL=false"
table    Name of SQL table.
username    Username for SQL server.
password    Password for SQL server.
columns    (Optional) Character vector of column names to import from SQL table. Default is to import all columns.
optimize    (Optional) Optimize import of SQL table for faster imports. Experimental. Default is TRUE.
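The connection_url follows the JDBC URL format shown in the example above. As a convenience sketch, a MySQL URL of that shape can be assembled from its parts in base R; build_mysql_jdbc_url is a hypothetical helper, not an h2o function, and it only reproduces the "?&key=value" form used in this documentation.

```r
# Hypothetical helper (not part of the h2o package): assemble a MySQL JDBC
# URL in the same shape as the connection_url example.
build_mysql_jdbc_url <- function(host, port, database,
                                 params = c(useSSL = "false")) {
  # Join driver properties as key=value pairs separated by "&".
  query <- paste0(names(params), "=", params, collapse = "&")
  # The "?&" prefix mirrors the documented example URL.
  sprintf("jdbc:mysql://%s:%d/%s?&%s", host, as.integer(port), database, query)
}

build_mysql_jdbc_url("localhost", 3306, "menagerie")
# returns "jdbc:mysql://localhost:3306/menagerie?&useSSL=false"
```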
Details
For example,
my_sql_conn_url <- "jdbc:mysql://172.16.2.178:3306/ingestSQL?&useSSL=false"
table <- "citibike20k"
username <- "root"
password <- "abc123"
my_citibike_data <- h2o.import_sql_table(my_sql_conn_url, table, username, password)

h2o.impute    Basic Imputation of H2O Vectors

Description
Perform in-place imputation by filling missing values with aggregates computed on the "na.rm'd" vector. Imputation can also be performed within groups defined by other columns of data; pass these columns by index or name to the by parameter. If a factor column is supplied, the method must be "mode".

Usage
h2o.impute(data, column = 0, method = c("mean", "median", "mode"),
  combine_method = c("interpolate", "average", "lo", "hi"), by = NULL,
  groupByFrame = NULL, values = NULL)

Arguments
data  The dataset containing the column to impute.
column  A specific column to impute. The default of 0 means impute the whole frame.
method  "mean" replaces NAs with the column mean; "median" replaces NAs with the column median; "mode" replaces NAs with the most common level (for factor columns only).
combine_method  If method is "median", how to combine quantiles on even sample sizes. This parameter is ignored in all other cases.
by  Columns to group by.
groupByFrame  Impute the column col with this pre-computed grouped frame.
values  A vector of impute values (one per column). NaN indicates that the column is skipped.

Details
The default method is selected based on the type of the column to impute. If the column is numeric, "mean" is selected; if it is categorical, "mode" is selected. Other column types (e.g. String, Time, UUID) are not supported.
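As a rough, local illustration of aggregate-based imputation, the base-R sketch below fills the NAs of one data.frame column with the mean, median, or mode of its non-missing values, optionally computed separately within groups given by by. impute_sketch() is a hypothetical helper, not part of the h2o API: it runs on an ordinary data.frame rather than an H2OFrame, and it does not model combine_method, groupByFrame, or values.

```r
# impute_sketch() is a hypothetical helper, not part of the h2o package.
# It fills NAs in one column of a local data.frame with an aggregate of the
# column's non-missing values, optionally computed separately per group.
impute_sketch <- function(df, column, method = c("mean", "median", "mode"),
                          by = NULL) {
  method <- match.arg(method)
  agg <- function(v) {
    v <- v[!is.na(v)]                          # the "na.rm'd" vector
    if (length(v) == 0) return(NA)             # nothing to aggregate
    switch(method,
           mean   = mean(v),
           median = median(v),
           mode   = {
             counts <- table(v)
             top <- names(counts)[which.max(counts)]  # most frequent value
             if (is.numeric(v)) as.numeric(top) else top
           })
  }
  fill <- function(v) {
    v[is.na(v)] <- agg(v)
    v
  }
  if (is.null(by)) {
    df[[column]] <- fill(df[[column]])
  } else {
    groups <- interaction(df[by], drop = TRUE)  # one level per group
    df[[column]] <- unsplit(lapply(split(df[[column]], groups), fill), groups)
  }
  df
}

# Usage: median-impute three blanked-out values within each Species
ir <- iris
ir$Sepal.Length[c(1, 60, 120)] <- NA
ir <- impute_sketch(ir, "Sepal.Length", "median", by = "Species")
sum(is.na(ir$Sepal.Length))  # 0
```

Grouping with split() and unsplit() keeps every row in its original position, which is what an in-place imputation needs.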
Value
An H2OFrame with imputed values.

Examples
h2o.init()
fr <- as.h2o(iris, destination_frame = "iris")
fr[sample(nrow(fr), 40), 5] <- NA  # randomly replace 40 values with NA
# impute with a group by
fr <- h2o.impute(fr, "Species", "mode", by = c("Sepal.Length", "Sepal.Width"))

h2o.init    Initialize and Connect to H2O

Description
Attempts to start and/or connect to an H2O instance.

Usage
h2o.init(ip = "localhost", port = 54321, startH2O = TRUE, forceDL = FALSE,
  enable_assertions = TRUE, license = NULL, nthreads = -1,
  max_mem_size = NULL, min_mem_size = NULL, ice_root = tempdir(),
  strict_version_check = TRUE, proxy = NA_character_, https = FALSE,
  insecure = FALSE, username = NA_character_, password = NA_character_,
  cookies = NA_character_, context_path = NA_character_,
  ignore_config = FALSE, extra_classpath = NULL)

Arguments
ip  Object of class character representing the IP address of the server where H2O is running.
port  Object of class numeric representing the port number of the H2O server.
startH2O  (Optional) A logical value indicating whether to try to start H2O from R if no connection with H2O is detected. This is only possible if ip = "localhost" or ip = "127.0.0.1". If an existing connection is detected, R does not start H2O.
forceDL  (Optional) A logical value indicating whether to force download of the H2O executable. Defaults to FALSE, so the executable is only downloaded if it does not already exist in the h2o R library resources directory h2o/java/h2o.jar. This value is only used when R starts H2O.
enable_assertions  (Optional) A logical value indicating whether H2O should be launched with assertions enabled. Used mainly for error checking and debugging. This value is only used when R starts H2O.
license  (Optional) A character string specifying the full path of the license file. This value is only used when R starts H2O.
nthreads  (Optional) Number of threads in the thread pool. This relates very closely to the number of CPUs used. -1 (the default) means use all CPUs on the host; a positive integer specifies the number of CPUs directly. This value is only used when R starts H2O.
max_mem_size  (Optional) A character string specifying the maximum size, in bytes, of the memory allocation pool to H2O. This value must be a multiple of 1024 greater than 2MB. Append the letter m or M to indicate megabytes, or g or G to indicate gigabytes. This value is only used when R starts H2O.
min_mem_size  (Optional) A character string specifying the minimum size, in bytes, of the memory allocation pool to H2O. This value must be a multiple of 1024 greater than 2MB. Append the letter m or M to indicate megabytes, or g or G to indicate gigabytes. This value is only used when R starts H2O.
ice_root  (Optional) A directory to handle object spillage. The default varies by OS.
strict_version_check  (Optional) Setting this to FALSE is unsupported and should only be done when advised by technical support.
proxy  (Optional) A character string specifying the proxy path.
https  (Optional) Set this to TRUE to use https instead of http.
insecure  (Optional) Set this to TRUE to disable SSL certificate checking.
username  (Optional) Username to log in with.
password  (Optional) Password to log in with.
cookies  (Optional) Vector (or list) of cookies to add to the request.
context_path  (Optional) The last part of the connection URL: http:// : /
ignore_config  (Optional) A logical value indicating whether to skip the search for a .h2oconfig file. Default value is FALSE.
extra_classpath  (Optional) A vector of paths to libraries to be added to the Java classpath when H2O is started from R.

Details
By default, this method first checks whether an H2O instance is connectible. If it cannot connect and startH2O = TRUE with ip = "localhost", it will attempt to start an instance of H2O at localhost:54321. If an open IP and port of your choice are passed in, this method will attempt to start an H2O instance at that specified IP and port.
When initializing H2O locally, this method searches for h2o.jar in the R library resources (system.file("java", "h2o. and if the file does not exist, it automatically attempts to download the correct version from Amazon S3. The user must have Internet access for this process to succeed.

Once connected, the method checks whether the local H2O R package version matches the version of H2O running on the server. If there is a mismatch and the user indicates she wishes to upgrade, it removes the local H2O R package and downloads and installs the H2O R package from the server.

Value
Returns an H2OConnection object containing the IP address and port number of the H2O server.

Note
Users may wish to upgrade their package manually (rather than waiting to be prompted), which requires that they fully uninstall and reinstall both the H2O package and the H2O client package. You must unload packages running in the environment before upgrading. It's recommended that users restart R or RStudio after upgrading.

See Also
H2O R package documentation for more details. h2o.shutdown for shutting down from R.

Examples
## Not run:
# Try to connect to a local H2O instance that is already running.
# If not found, start a local H2O instance from R with the default settings.
h2o.init()
# Try to connect to a local H2O instance.
# If not found, raise an error.
h2o.init(startH2O = FALSE)
# Try to connect to a local H2O instance that is already running.
# If not found, start a local H2O instance from R with 5 gigabytes of memory.
h2o.init(max_mem_size = "5g")
## End(Not run)

h2o.insertMissingValues    Insert Missing Values into an H2OFrame

Description
Randomly replaces a user-specified fraction of entries in an H2O dataset with missing values.
Usage
h2o.insertMissingValues(data, fraction = 0.1, seed = -1)

Arguments
data  An H2OFrame object representing the dataset.
fraction  A number between 0 and 1 indicating the fraction of entries to replace with missing values.
seed  A random number used to select which entries to replace with missing values. The default of seed = -1 automatically generates a seed in H2O.

Value
Returns an H2OFrame object.

WARNING
This will modify the original dataset. Unless this is intended, call this function only on a subset of the original.

Examples
library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris.csv", package = "h2o")
iris.hex <- h2o.importFile(path = irisPath)
summary(iris.hex)
irismiss.hex <- h2o.insertMissingValues(iris.hex, fraction = 0.25)
head(irismiss.hex)
summary(irismiss.hex)

h2o.interaction    Categorical Interaction Feature Creation in H2O

Description
Creates a data frame in H2O with n-th order interaction features between categorical columns, as specified by the user.

Usage
h2o.interaction(data, destination_frame, factors, pairwise, max_factors,
  min_occurrence)

Arguments
data  An H2OFrame object containing the categorical columns.
destination_frame  A string indicating the destination key. If empty, this will be auto-generated by H2O.
factors  Factor columns (either indices or column names).
pairwise  Whether to create pairwise interactions between factors (otherwise create one higher-order interaction). Only applicable if there are 3 or more factors.
max_factors  Maximum number of factor levels in pairwise interaction terms (if enforced, one extra catch-all factor will be made).
min_occurrence  Minimum occurrence threshold for factor levels in pairwise interaction terms.

Value
Returns an H2OFrame object.
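The effect of max_factors and min_occurrence on a pairwise interaction can be sketched in base R. This is an assumption-laden illustration, not H2O's implementation: interaction_sketch() is a hypothetical helper, the "_" separator and the "other" catch-all label are placeholders, and H2O's actual level naming and tie-breaking may differ.

```r
# interaction_sketch() is a hypothetical helper, not part of the h2o package.
# It crosses two categorical vectors into one factor, keeps only combined
# levels seen at least `min_occurrence` times, caps the kept levels at the
# `max_factors` most frequent, and maps everything else to a catch-all level.
interaction_sketch <- function(a, b, max_factors = 100, min_occurrence = 1,
                               other = "other") {
  combo <- paste(a, b, sep = "_")              # e.g. "x" and "p" -> "x_p"
  counts <- table(combo)
  counts <- counts[counts >= min_occurrence]   # drop rare combinations
  # Most frequent first; ties broken alphabetically for a stable result
  keep <- names(counts)[order(-as.vector(counts), names(counts))]
  keep <- head(keep, max_factors)
  combo[!combo %in% keep] <- other             # the single catch-all level
  factor(combo)
}

a <- c("x", "x", "y", "y", "y", "z")
b <- c("p", "p", "q", "q", "p", "p")
f <- interaction_sketch(a, b, max_factors = 1, min_occurrence = 2)
table(f)
```

Here "x_p" and "y_q" both occur twice, so both pass min_occurrence = 2; max_factors = 1 then keeps only "x_p" (the alphabetical tie-break), and the remaining four rows fall into the catch-all level.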
Examples
library(h2o)
h2o.init()
# Create some random data
myframe <- h2o.createFrame(rows = 20, cols = 5,
  seed = -12301283, randomize = TRUE, value = 0,
  categorical_fraction = 0.8, factors = 10, real_range = 1,
  integer_fraction = 0.2, integer_range = 10,
  binary_fraction = 0, binary_ones_fraction = 0.5,
  missing_fraction = 0.2,
  response_factors = 1)
# Turn integer column into a categorical
myframe[,5] <- as.factor(myframe[,5])
head(myframe, 20)
# Create pairwise interactions
pairwise <- h2o.interaction(myframe, destination_frame = 'pairwise',
  factors = list(c(1,2),c("C2","C3","C4")),
  pairwise=TRUE, max_factors = 10, min_occurrence = 1)
head(pairwise, 20)
h2o.levels(pairwise,2)
# Create fifth-order interaction
higherorder <- h2o.interaction(myframe, destination_frame = 'higherorder', factors = c(1,2,3,4,5),
  pairwise=FALSE, max_factors = 10000, min_occurrence = 1)
head(higherorder, 20)
# Limit the number of factors of the "categoricalized" integer column
# to at most 3 factors, and only if they occur at least twice
head(myframe[,5], 20)
trim_integer_levels <- h2o.interaction(myframe, destination_frame = 'trim_integers', factors = "C5",
  pairwise = FALSE, max_factors = 3, min_occurrence = 2)
head(trim_integer_levels, 20)
# Put all together
myframe <- h2o.cbind(myframe, pairwise, higherorder, trim_integer_levels)
myframe
head(myframe,20)
summary(myframe)

h2o.isax    iSAX

Description
Compute the iSAX index for an H2OFrame that is assumed to contain numeric time-series data.

Usage
h2o.isax(x, num_words, max_cardinality, optimize_card = FALSE)

Arguments
x  An H2OFrame.
num_words  Number of iSAX words for the time series, i.e., the granularity along the time series.
max_cardinality  Maximum cardinality of the iSAX word. Each word can have less than the maximum.
optimize_card  An optimization flag that will find the maximum cardinality regardless of what is passed in for max_cardinality.
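iSAX builds on SAX, which discretizes a time series by z-normalizing it, averaging it into num_words equal segments, and mapping each segment mean to one of a fixed number of equiprobable Gaussian bins. The base-R sketch below shows that SAX step for a single series. sax_sketch() and to_bits() are hypothetical helpers; the sketch does not reproduce h2o.isax's output format, its variable-cardinality iSAX words, or the optimize_card search.

```r
# sax_sketch() and to_bits() are hypothetical helpers, not part of h2o.
# sax_sketch() applies classic SAX to one numeric series: z-normalize it,
# average it into `num_words` equal segments (PAA), then map each segment
# mean to one of `cardinality` equiprobable bins of a standard normal.
sax_sketch <- function(x, num_words, cardinality) {
  stopifnot(length(x) %% num_words == 0)      # equal-sized segments only
  z <- (x - mean(x)) / sd(x)                  # z-normalization
  seg <- rep(seq_len(num_words), each = length(x) / num_words)
  paa <- as.vector(tapply(z, seg, mean))      # piecewise aggregate approximation
  breaks <- qnorm(seq_len(cardinality - 1) / cardinality)
  findInterval(paa, breaks)                   # symbols 0 .. cardinality - 1
}

# Fixed-width binary form of each symbol, most significant bit first
to_bits <- function(sym, bits) {
  vapply(sym, function(s) {
    paste(rev(as.integer(intToBits(s))[seq_len(bits)]), collapse = "")
  }, character(1))
}

w <- sax_sketch(1:8, num_words = 4, cardinality = 4)
w                 # 0 1 2 3
to_bits(w, 2)     # "00" "01" "10" "11"
```

With a cardinality that is a power of two, the binary strings give the bit-prefix form that iSAX uses to refine or coarsen a word.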
Value
An H2OFrame containing the name of each time series and the string representation of its iSAX word, followed by the binary representation.

References
http://www.cs.ucr.edu/~eamonn/iSAX_2.0.pdf
http://www.cs.ucr.edu/~eamonn/SAX.pdf

h2o.ischaracter    Check if character

Description
Check if character

Usage
h2o.ischaracter(x)

Arguments
x  An H2OFrame object.

See Also
is.character for the base R implementation.

h2o.isfactor    Check if factor

Description
Check if factor

Usage
h2o.isfactor(x)

Arguments
x  An H2OFrame object.

See Also
is.factor for the base R implementation.

h2o.isnumeric    Check if numeric

Description
Check if numeric

Usage
h2o.isnumeric(x)

Arguments
x  An H2OFrame object.

See Also
is.numeric for the base R implementation.

h2o.is_client    Check Client Mode Connection

Description
Check Client Mode Connection

Usage
h2o.is_client()

h2o.kfold_column    Produce a k-fold column vector.

Description
Create a k-fold vector useful for H2O algorithms that take a fold_assignments argument.

Usage
h2o.kfold_column(data, nfolds, seed = -1)

Arguments
data  A dataframe against which to create the fold column.
nfolds  The number of desired folds.
seed  A random seed; -1 indicates that H2O will choose one.

Value
Returns an H2OFrame object with fold assignments.

h2o.killMinus3    Dump the stack into the JVM's stdout.

Description
A poor man's profiler, but effective.
Usage
h2o.killMinus3()

h2o.kmeans    Performs k-means clustering on an H2O dataset

Description
Performs k-means clustering on an H2O dataset

Usage
h2o.kmeans(training_frame, x, model_id = NULL, validation_frame = NULL,
  nfolds = 0, keep_cross_validation_predictions = FALSE,
  keep_cross_validation_fold_assignment = FALSE,
  fold_assignment = c("AUTO", "Random", "Modulo", "Stratified"),
  fold_column = NULL, ignore_const_cols = TRUE,
  score_each_iteration = FALSE, k = 1, estimate_k = FALSE,
  user_points = NULL, max_iterations = 10, standardize = TRUE,
  seed = -1, init = c("Random", "PlusPlus", "Furthest", "User"),
  max_runtime_secs = 0, categorical_encoding = c("AUTO", "Enum",
  "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder",
  "SortByResponse", "EnumLimited"))

Arguments
training_frame  Id of the training data frame.
x  A vector containing the character names of the predictors in the model.
model_id  Destination id for this model; auto-generated if not specified.
validation_frame  Id of the validation data frame.
nfolds  Number of folds for K-fold cross-validation (0 to disable or >= 2). Defaults to 0.
keep_cross_validation_predictions  Logical. Whether to keep the predictions of the cross-validation models. Defaults to FALSE.
keep_cross_validation_fold_assignment  Logical. Whether to keep the cross-validation fold assignment. Defaults to FALSE.
fold_assignment  Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems. Must be one of: "AUTO", "Random", "Modulo", "Stratified". Defaults to AUTO.
fold_column  Column with cross-validation fold index assignment per observation.
ignore_const_cols  Logical. Ignore constant columns. Defaults to TRUE.
score_each_iteration  Logical. Whether to score during each iteration of model training. Defaults to FALSE.
k  The maximum number of clusters. If estimate_k is disabled, the model will find k centroids; otherwise it will find up to k centroids. Defaults to 1.
estimate_k  Logical. Whether to estimate the number of clusters (<= k) iteratively and deterministically. Defaults to FALSE.
user_points  A dataframe in which each row represents an initial cluster center. The user-specified points must have the same number of columns as the training observations, and the number of rows must equal the number of clusters.
max_iterations  Maximum training iterations (if estimate_k is enabled, this applies to each inner Lloyd's iteration). Defaults to 10.
standardize  Logical. Standardize columns before computing distances. Defaults to TRUE.
seed  Seed for random numbers (affects certain parts of the algorithm that are stochastic, which might or might not be enabled by default). Defaults to -1 (time-based random number).
init  Initialization mode. Must be one of: "Random", "PlusPlus", "Furthest", "User". Defaults to Furthest.
max_runtime_secs  Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.
categorical_encoding  Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO.

Value
Returns an object of class H2OClusteringModel.

See Also
h2o.cluster_sizes, h2o.totss, h2o.num_iterations, h2o.betweenss, h2o.tot_withinss, h2o.withinss, h2o.centersSTD, h2o.centers

Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
h2o.kmeans(training_frame = prostate.hex, k = 10, x = c("AGE", "RACE", "VOL", "GLEASON"))

h2o.kurtosis    Kurtosis of a column

Description
Obtain the kurtosis of a column of a parsed H2O data object.

Usage
h2o.kurtosis(x, ..., na.rm = TRUE)
kurtosis.H2OFrame(x, ..., na.rm = TRUE)

Arguments
x  An H2OFrame object.
...  Further arguments to be passed from or to other methods.
na.rm  A logical value indicating whether NA or missing values should be stripped before the computation.

Value
Returns a list containing the kurtosis for each column (NaN for non-numeric columns).

Examples
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
h2o.kurtosis(prostate.hex$AGE)

h2o.levels    Return the levels of the requested column.

Description
Return the levels of the requested column.

Usage
h2o.levels(x, i)

Arguments
x  An H2OFrame object.
i  Optional, the index of the column whose domain is to be returned.

See Also
levels for the base R method.

Examples
iris.hex <- as.h2o(iris)
h2o.levels(iris.hex, 5)  # returns "setosa" "versicolor" "virginica"

h2o.listTimezones    List all of the Time Zones Acceptable by the H2O Cloud.

Description
List all of the Time Zones Acceptable by the H2O Cloud.

Usage
h2o.listTimezones()

h2o.list_all_extensions    List all H2O registered extensions

Description
List all H2O registered extensions

Usage
h2o.list_all_extensions()

h2o.list_api_extensions    List registered API extensions

Description
List registered API extensions

Usage
h2o.list_api_extensions()

h2o.list_core_extensions    List registered core extensions

Description
List registered core extensions

Usage
h2o.list_core_extensions()

h2o.loadModel    Load H2O Model from HDFS or Local Disk

Description
Load a saved H2O model from disk. (Note that ensemble binary models can now be loaded using this method.)

Usage
h2o.loadModel(path)

Arguments
path  The path of the H2O Model to be imported.

Value
Returns an H2OModel object of the class corresponding to the type of model built.
See Also
h2o.saveModel, H2OModel

Examples
## Not run:
# library(h2o)
# h2o.init()
# prosPath = system.file("extdata", "prostate.csv", package = "h2o")
# prostate.hex = h2o.importFile(path = prosPath, destination_frame = "prostate.hex")
# prostate.glm = h2o.glm(y = "CAPSULE", x = c("AGE","RACE","PSA","DCAPS"),
#   training_frame = prostate.hex, family = "binomial", alpha = 0.5)
# glmmodel.path = h2o.saveModel(prostate.glm, path = "/Users/UserName/Desktop")
# glmmodel.load = h2o.loadModel(glmmodel.path)
## End(Not run)

h2o.log    Compute the logarithm of x

Description
Compute the logarithm of x

Usage
h2o.log(x)

Arguments
x  An H2OFrame object.

See Also
log for the base R implementation.

h2o.log10    Compute the log10 of x

Description
Compute the log10 of x

Usage
h2o.log10(x)

Arguments
x  An H2OFrame object.

See Also
log10 for the base R implementation.

h2o.log1p    Compute the log1p of x

Description
Compute the log1p of x

Usage
h2o.log1p(x)

Arguments
x  An H2OFrame object.

See Also
log1p for the base R implementation.

h2o.log2    Compute the log2 of x

Description
Compute the log2 of x

Usage
h2o.log2(x)

Arguments
x  An H2OFrame object.

See Also
log2 for the base R implementation.

h2o.logAndEcho    Log a message on the server-side logs

Description
This is helpful when running several pieces of work one after the other on a single H2O cluster and you want to make a note in the H2O server-side log of where one piece of work ends and the next begins.

Usage
h2o.logAndEcho(message)

Arguments
message  A character string with the message to write to the log.

Details
h2o.logAndEcho sends a message to H2O for logging. Generally used for debugging purposes.

h2o.logloss    Retrieve the Log Loss Value

Description
Retrieves the log loss output for an H2OBinomialMetrics or H2OMultinomialMetrics object. If "train", "valid", and "xval" parameters are FALSE (default), then the training Log Loss value is returned. If more than one parameter is set to TRUE, then a named vector of Log Losses is returned, where the names are "train", "valid" or "xval".

Usage
h2o.logloss(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments
object  An H2OModelMetrics object of the correct type.
train  Retrieve the training Log Loss
valid  Retrieve the validation Log Loss
xval  Retrieve the cross-validation Log Loss

h2o.ls    List Keys on an H2O Cluster

Description
Accesses a list of object keys in the running instance of H2O.

Usage
h2o.ls()

Value
Returns a list of hex keys in the current H2O instance.

Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
h2o.ls()

h2o.lstrip    Strip set from left

Description
Return a copy of the target column with leading characters removed. The set argument is a string specifying the set of characters to be removed. If omitted, the set argument defaults to removing whitespace.

Usage
h2o.lstrip(x, set = " ")

Arguments
x  The column whose strings should be lstrip-ed.
set  String of characters to be removed.

Examples
library(h2o)
h2o.init()
string_to_lstrip <- as.h2o("1234567890")
lstrip_string <- h2o.lstrip(string_to_lstrip, "123")  # Remove leading "1", "2", and "3" characters

h2o.mae    Retrieve the Mean Absolute Error Value

Description
Retrieves the mean absolute error (MAE) value from an H2O model. If "train", "valid", and "xval" parameters are FALSE (default), then the training MAE value is returned. If more than one parameter is set to TRUE, then a named vector of MAEs is returned, where the names are "train", "valid" or "xval".

Usage
h2o.mae(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments
object  An H2OModel object.
train  Retrieve the training MAE
valid  Retrieve the validation set MAE if a validation set was passed in during model build time.
xval  Retrieve the cross-validation MAE

Examples
library(h2o)
h <- h2o.init()
fr <- as.h2o(iris)
m <- h2o.deeplearning(x=2:5,y=1,training_frame=fr)
h2o.mae(m)

h2o.makeGLMModel    Set betas of an existing H2O GLM Model

Description
This function allows setting betas of an existing GLM model.

Usage
h2o.makeGLMModel(model, beta)

Arguments
model  An H2OModel returned by an h2o.glm call.
beta  A new set of betas (a named vector).

h2o.make_metrics    Create Model Metrics from predicted and actual values in H2O

Description
Given predicted values (target for regression, class-1 probabilities for binomial, or per-class probabilities for multinomial), compute a model metrics object.

Usage
h2o.make_metrics(predicted, actuals, domain = NULL, distribution = NULL)

Arguments
predicted  An H2OFrame containing predictions.
actuals  An H2OFrame containing actual values.
domain  Vector with response factors for classification.
distribution  Distribution for regression.

Value
Returns an object of the H2OModelMetrics subclass.

Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
prostate.hex$CAPSULE <- as.factor(prostate.hex$CAPSULE)
prostate.gbm <- h2o.gbm(3:9, "CAPSULE", prostate.hex)
pred <- h2o.predict(prostate.gbm, prostate.hex)[,3] ## class-1 probability
h2o.make_metrics(pred,prostate.hex$CAPSULE)

h2o.match    Value Matching in H2O

Description
match and %in% return values similar to the base R generic functions.

Usage
h2o.match(x, table, nomatch = 0, incomparables = NULL)
match.H2OFrame(x, table, nomatch = 0, incomparables = NULL)
x %in% table

Arguments
x  A categorical vector from an H2OFrame object with values to be matched.
table  An R object to match x against.
nomatch  The value to be returned when no match is found.
incomparables  A vector of values that cannot be matched. Any value in x matching a value in this vector is assigned the nomatch value.
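Because h2o.match is described as returning values similar to base R's match, its semantics can be checked locally with base R on a plain character vector. The only setting carried over from h2o.match is nomatch = 0L, its documented default; base R's own default is NA. This does not touch an H2OFrame.

```r
# Base R only: h2o.match is documented to mirror base R's match(), so these
# calls on a plain character vector illustrate the intended semantics.
species <- c("setosa", "virginica", "versicolor", "setosa")
lookup <- c("setosa", "versicolor")

# Position of the first match in `lookup`; nomatch = 0L mirrors the
# h2o.match default (base R's own default is NA)
pos <- match(species, lookup, nomatch = 0L)
pos          # 1 0 2 1

# %in% is the logical form of the same lookup
in_lookup <- species %in% lookup
in_lookup    # TRUE FALSE TRUE TRUE

# Values listed in `incomparables` are never matched and get nomatch
excl <- match(species, lookup, nomatch = 0L, incomparables = "setosa")
excl         # 0 0 2 0
```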
Value
Returns a vector of the positions of (first) matches of its first argument in its second.

See Also
match for the base R implementation.

Examples
h2o.init()
hex <- as.h2o(iris)
h2o.match(hex[,5], c("setosa", "versicolor"))

h2o.max    Returns the maxima of the input values.

Description
Returns the maxima of the input values.

Usage
h2o.max(x, na.rm = FALSE)

Arguments
x  An H2OFrame object.
na.rm  Logical. Indicates whether missing values should be removed.

See Also
max for the base R implementation.

h2o.mean    Compute the frame's mean by-column (or by-row).

Description
Compute the frame's mean by-column (or by-row).

Usage
h2o.mean(x, na.rm = FALSE, axis = 0, return_frame = FALSE, ...)
## S3 method for class 'H2OFrame'
mean(x, na.rm = FALSE, axis = 0, return_frame = FALSE, ...)

Arguments
x  An H2OFrame object.
na.rm  Logical. Indicates whether missing values should be removed.
axis  Integer. Indicates whether to calculate the mean down a column (0) or across a row (1). NOTE: This is only applied when return_frame is set to TRUE; otherwise, this parameter is ignored.
return_frame  Logical. Indicates whether to return an H2OFrame or a list. Default is FALSE (returns a list).
...  Further arguments to be passed from or to other methods.

Value
Returns a list containing the mean for each column (NaN for non-numeric columns) if return_frame is set to FALSE. If return_frame is set to TRUE, returns an H2OFrame with means per column or row (depending on the axis argument).

See Also
mean, rowMeans, or colMeans for the base R implementation

Examples
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
# Default behavior. Will return list of means per column.
h2o.mean(prostate.hex$AGE)
# return_frame set to TRUE. This will return an H2OFrame
# with mean per row or column (depends on axis argument)
h2o.mean(prostate.hex,na.rm=TRUE,axis=1,return_frame=TRUE)

h2o.mean_per_class_error    Retrieve the mean per class error

Description
Retrieves the mean per class error from an H2OBinomialMetrics object. If "train", "valid", and "xval" parameters are FALSE (default), then the training mean per class error value is returned. If more than one parameter is set to TRUE, then a named vector of mean per class errors is returned, where the names are "train", "valid" or "xval".

Usage
h2o.mean_per_class_error(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments
object  An H2OBinomialMetrics object.
train  Retrieve the training mean per class error
valid  Retrieve the validation mean per class error
xval  Retrieve the cross-validation mean per class error

See Also
h2o.mse for MSE, and h2o.metric for the various threshold metrics. See h2o.performance for creating H2OModelMetrics objects.

Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
hex <- h2o.uploadFile(prosPath)
hex[,2] <- as.factor(hex[,2])
model <- h2o.gbm(x = 3:9, y = 2, training_frame = hex, distribution = "bernoulli")
perf <- h2o.performance(model, hex)
h2o.mean_per_class_error(perf)
h2o.mean_per_class_error(model, train=TRUE)

h2o.mean_residual_deviance    Retrieve the Mean Residual Deviance value

Description
Retrieves the Mean Residual Deviance value from an H2O model. If "train", "valid", and "xval" parameters are FALSE (default), then the training Mean Residual Deviance value is returned. If more than one parameter is set to TRUE, then a named vector of Mean Residual Deviances is returned, where the names are "train", "valid" or "xval".

Usage
h2o.mean_residual_deviance(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments
object  An H2OModel object.
train  Retrieve the training Mean Residual Deviance
valid  Retrieve the validation Mean Residual Deviance
xval  Retrieve the cross-validation Mean Residual Deviance

Examples
library(h2o)
h <- h2o.init()
fr <- as.h2o(iris)
m <- h2o.deeplearning(x=2:5,y=1,training_frame=fr)
h2o.mean_residual_deviance(m)

h2o.median    H2O Median

Description
Compute the median of an H2OFrame.

Usage
h2o.median(x, na.rm = TRUE)
## S3 method for class 'H2OFrame'
median(x, na.rm = TRUE)

Arguments
x  An H2OFrame object.
na.rm  A logical value indicating whether NAs are omitted.

Value
Returns a list containing the median for each column (NaN for non-numeric columns).

Examples
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath, destination_frame = "prostate.hex")
h2o.median(prostate.hex)

h2o.merge    Merge Two H2O Data Frames

Description
Merges two H2OFrame objects with the same arguments and meanings as merge() in base R. However, we do not support all=TRUE, all.x=TRUE and all.y=TRUE. The default method is auto, which defaults to the radix method. The radix method returns the correct merge result regardless of duplicated rows in the right frame, and it can perform the merge even if your frames contain string columns. If there are duplicated rows in your right frame, they will not be included if you use the hash method, and the hash method cannot perform the merge if your left frame contains string columns. Hence, we consider the radix method superior to the hash method, and it is the default method to use.
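The duplicated-rows point above can be seen with base R's merge(), whose arguments and meanings h2o.merge follows: a key repeated in the right frame yields one output row per matching pair. Per the description, that is the result the radix method preserves. This runs on local data.frames only and does not exercise either H2O merge method.

```r
# Base R only, on local data.frames: a key that is duplicated in the right
# frame produces one output row per matching (left, right) pair.
left <- data.frame(fruit = c("apple", "banana", "lemon"),
                   color = c("red", "yellow", "yellow"),
                   stringsAsFactors = FALSE)
right <- data.frame(fruit = c("apple", "apple", "banana"),
                    origin = c("WA", "NY", "EC"),
                    stringsAsFactors = FALSE)

inner <- merge(left, right, by = "fruit")
nrow(inner)    # 3: two "apple" rows plus one "banana" row
inner
```

"lemon" has no partner in right, so the default inner join drops it.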
Usage
h2o.merge(x, y, by = intersect(names(x), names(y)), by.x = by,
  by.y = by, all = FALSE, all.x = all, all.y = all, method = "auto")

Arguments
x, y  H2OFrame objects.
by  Columns used for merging; by default, the common column names.
by.x  x columns used for merging, by name or number.
by.y  y columns used for merging, by name or number.
all  TRUE includes all rows in x and all rows in y, even if there is no match in the other frame.
all.x  If all.x is TRUE, all rows in x will be included, even if there is no matching row in y, and vice versa for all.y.
all.y  See all.x.
method  One of "auto" (default), "radix", or "hash".

Examples
h2o.init()
left <- data.frame(fruit = c('apple', 'orange', 'banana', 'lemon', 'strawberry', 'blueberry'),
  color = c('red', 'orange', 'yellow', 'yellow', 'red', 'blue'))
right <- data.frame(fruit = c('apple', 'orange', 'banana', 'lemon', 'strawberry', 'watermelon'),
  citrus = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE))
l.hex <- as.h2o(left)
r.hex <- as.h2o(right)
left.hex <- h2o.merge(l.hex, r.hex, all.x = TRUE)

h2o.metric    H2O Model Metric Accessor Functions

Description
A series of functions that retrieve model metric details.

Usage
h2o.metric(object, thresholds, metric)
h2o.F0point5(object, thresholds)
h2o.F1(object, thresholds)
h2o.F2(object, thresholds)
h2o.accuracy(object, thresholds)
h2o.error(object, thresholds)
h2o.maxPerClassError(object, thresholds)
h2o.mean_per_class_accuracy(object, thresholds)
h2o.mcc(object, thresholds)
h2o.precision(object, thresholds)
h2o.tpr(object, thresholds)
h2o.fpr(object, thresholds)
h2o.fnr(object, thresholds)
h2o.tnr(object, thresholds)
h2o.recall(object, thresholds)
h2o.sensitivity(object, thresholds)
h2o.fallout(object, thresholds)
h2o.missrate(object, thresholds)
h2o.specificity(object, thresholds)

Arguments
object  An H2OModelMetrics object of the correct type.
thresholds  (Optional) A value or a list of values between 0.0 and 1.0.
metric  (Optional) A specified parameter to retrieve.
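Each accessor above reads one cell-derived quantity of a threshold-dependent confusion matrix. The base-R sketch below computes several of those quantities at a single threshold from class-1 probabilities and 0/1 labels. threshold_metrics() is a hypothetical helper, and treating a probability exactly equal to the threshold as class 1 is an assumption, not a documented H2O rule.

```r
# threshold_metrics() is a hypothetical helper, not part of the h2o package.
# It turns class-1 probabilities into hard predictions at one threshold and
# computes a few of the confusion-matrix metrics named above.
threshold_metrics <- function(prob, actual, threshold) {
  pred <- prob >= threshold            # assumption: ties count as class 1
  tp <- sum(pred & actual == 1)
  fp <- sum(pred & actual == 0)
  fn <- sum(!pred & actual == 1)
  tn <- sum(!pred & actual == 0)
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)             # also called tpr or sensitivity
  c(accuracy    = (tp + tn) / length(actual),
    precision   = precision,
    recall      = recall,
    fpr         = fp / (fp + tn),      # also called fallout
    specificity = tn / (tn + fp),      # also called tnr
    F1          = 2 * precision * recall / (precision + recall))
}

prob   <- c(0.9, 0.8, 0.6, 0.4, 0.3, 0.1)
actual <- c(1, 1, 0, 1, 0, 0)
round(threshold_metrics(prob, actual, threshold = 0.5), 3)
```

With these six rows and a 0.5 threshold there are 2 true positives, 1 false positive, 1 false negative, and 2 true negatives, so precision, recall, and F1 each equal 2/3.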
Details Many of these functions have an optional thresholds parameter. Currently only increments of 0.1 are allowed. If not specified, the functions will return all possible values. Otherwise, the function will return the value for the indicated threshold. Currently, the these functions are only supported by H2OBinomialMetrics objects. Value Returns either a single value, or a list of values. See Also h2o.auc for AUC, h2o.giniCoef for the GINI coefficient, and h2o.mse for MSE. See h2o.performance for creating H2OModelMetrics objects. Examples library(h2o) h2o.init() prosPath <- system.file("extdata", "prostate.csv", package="h2o") hex <- h2o.uploadFile(prosPath) hex[,2] <- as.factor(hex[,2]) model <- h2o.gbm(x = 3:9, y = 2, training_frame = hex, distribution = "bernoulli") perf <- h2o.performance(model, hex) h2o.F1(perf) 126 h2o.mktime h2o.min Returns the minima of the input values. Description Returns the minima of the input values. Usage h2o.min(x, na.rm = FALSE) Arguments x An H2OFrame object. na.rm logical. indicating whether missing values should be removed. See Also min for the base R implementation. h2o.mktime Compute msec since the Unix Epoch Description Compute msec since the Unix Epoch Usage h2o.mktime(year = 1970, month = 0, day = 0, hour = 0, minute = 0, second = 0, msec = 0) Arguments year Defaults to 1970 month zero based (months are 0 to 11) day zero based (days are 0 to 30) hour hour minute minute second second msec msec h2o.month h2o.month 127 Convert Milliseconds to Months in H2O Datasets Description Converts the entries of an H2OFrame object from milliseconds to months (on a 1 to 12 scale). Usage h2o.month(x) month(x) ## S3 method for class 'H2OFrame' month(x) Arguments x An H2OFrame object. Value An H2OFrame object containing the entries of x converted to months of the year. See Also h2o.year h2o.mse Retrieves Mean Squared Error Value Description Retrieves the mean squared error value from an H2OModelMetrics object. 
If the "train", "valid", and "xval" parameters are all FALSE (default), then the training MSE value is returned. If more than one parameter is set to TRUE, then a named vector of MSEs is returned, where the names are "train", "valid" or "xval".
Usage
h2o.mse(object, train = FALSE, valid = FALSE, xval = FALSE)
Arguments
object An H2OModelMetrics object of the correct type.
train Retrieve the training MSE.
valid Retrieve the validation MSE.
xval Retrieve the cross-validation MSE.
Details
This function only supports H2OBinomialMetrics, H2OMultinomialMetrics, and H2ORegressionMetrics objects.
See Also
h2o.auc for AUC, h2o.mse for MSE, and h2o.metric for the various threshold metrics. See h2o.performance for creating H2OModelMetrics objects.
Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
hex <- h2o.uploadFile(prosPath)
hex[,2] <- as.factor(hex[,2])
model <- h2o.gbm(x = 3:9, y = 2, training_frame = hex, distribution = "bernoulli")
perf <- h2o.performance(model, hex)
h2o.mse(perf)
h2o.nacnt Count of NAs per column
Description
Gives the count of NAs per column.
Usage
h2o.nacnt(x)
Arguments
x An H2OFrame object.
Value
Returns a list containing the count of NAs per column.
Examples
h2o.init()
iris.hex <- as.h2o(iris)
h2o.nacnt(iris.hex) # should return all 0s
h2o.insertMissingValues(iris.hex)
h2o.nacnt(iris.hex)
h2o.naiveBayes Compute naive Bayes probabilities on an H2O dataset.
Description
The naive Bayes classifier assumes independence between predictor variables conditional on the response, and a Gaussian distribution of numeric predictors with mean and standard deviation computed from the training dataset. When building a naive Bayes classifier, every row in the training dataset that contains at least one NA will be skipped completely. If the test dataset has missing values, then those predictors are omitted in the probability calculation during prediction.
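The two NA rules above are easy to misread, so here is a minimal base-R sketch of a Gaussian naive Bayes classifier that applies them: incomplete training rows are dropped, and missing predictors in a test row are left out of the likelihood. nb_fit and nb_posterior are hypothetical helpers, not h2o functions. H2O also handles categorical predictors and the min_sdev, eps_sdev and min_prob settings; this sketch only approximates those with a single standard-deviation floor.

```r
# Hypothetical helpers (not h2o functions): Gaussian naive Bayes in base R.
nb_fit <- function(X, y) {
  keep <- complete.cases(X) & !is.na(y)   # skip training rows with any NA
  X <- as.matrix(X[keep, , drop = FALSE])
  y <- factor(y[keep])
  classes <- levels(y)
  # K x p matrix: one row per class, one column per predictor
  per_class <- function(f) {
    do.call(rbind, lapply(classes, function(k)
      apply(X[y == k, , drop = FALSE], 2, f)))
  }
  list(classes = classes,
       prior   = as.numeric(table(y)) / length(y),
       mean    = per_class(mean),
       sd      = per_class(sd))
}

nb_posterior <- function(model, x, min_sdev = 1e-3) {
  use <- !is.na(x)                        # omit missing predictors
  log_post <- vapply(seq_along(model$classes), function(i) {
    mu <- model$mean[i, use]
    s  <- pmax(model$sd[i, use], min_sdev) # floor tiny standard deviations
    log(model$prior[i]) + sum(dnorm(x[use], mu, s, log = TRUE))
  }, numeric(1))
  p <- exp(log_post - max(log_post))      # stabilise before normalising
  setNames(p / sum(p), model$classes)
}

X <- data.frame(a = c(1, 2, 3, 10, 11, 12, NA),
                b = c(1, 1.5, 2, 5, 5.5, 6, 0))
y <- c("lo", "lo", "lo", "hi", "hi", "hi", "lo")
m <- nb_fit(X, y)              # the row with a = NA is skipped
nb_posterior(m, c(2, 1.5))     # "lo" should dominate
nb_posterior(m, c(11, NA))     # b is ignored; "hi" should dominate
```

Working in log space avoids underflow when many small densities are multiplied together.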
Usage
h2o.naiveBayes(x, y, training_frame, model_id = NULL, nfolds = 0, seed = -1, fold_assignment = c("AUTO", "Random", "Modulo", "Stratified"), fold_column = NULL, keep_cross_validation_predictions = FALSE, keep_cross_validation_fold_assignment = FALSE, validation_frame = NULL, ignore_const_cols = TRUE, score_each_iteration = FALSE, balance_classes = FALSE, class_sampling_factors = NULL, max_after_balance_size = 5, max_hit_ratio_k = 0, laplace = 0, threshold = 0.001, min_sdev = 0.001, eps = 0, eps_sdev = 0, min_prob = 0.001, eps_prob = 0, compute_metrics = TRUE, max_runtime_secs = 0)
Arguments
x (Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used.
y The name or column index of the response variable in the data. The response must be either a numeric or a categorical/factor variable. If the response is numeric, then a regression model will be trained; otherwise, a classification model will be trained.
training_frame Id of the training data frame.
model_id Destination id for this model; auto-generated if not specified.
nfolds Number of folds for K-fold cross-validation (0 to disable or >= 2). Defaults to 0.
seed Seed for random numbers (affects certain parts of the algo that are stochastic and those might or might not be enabled by default). Defaults to -1 (time-based random number).
fold_assignment Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems. Must be one of: "AUTO", "Random", "Modulo", "Stratified". Defaults to AUTO.
fold_column Column with cross-validation fold index assignment per observation.
keep_cross_validation_predictions Logical. Whether to keep the predictions of the cross-validation models. Defaults to FALSE.
keep_cross_validation_fold_assignment Logical. Whether to keep the cross-validation fold assignment. Defaults to FALSE.
validation_frame Id of the validation data frame.
ignore_const_cols Logical. Ignore constant columns. Defaults to TRUE.
score_each_iteration Logical. Whether to score during each iteration of model training. Defaults to FALSE.
balance_classes Logical. Balance training data class counts via over/under-sampling (for imbalanced data). Defaults to FALSE.
class_sampling_factors Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
max_after_balance_size Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. Defaults to 5.0.
max_hit_ratio_k Max. number (top K) of predictions to use for hit ratio computation (for multiclass only, 0 to disable). Defaults to 0.
laplace Laplace smoothing parameter. Defaults to 0.
threshold This argument is deprecated; use 'min_sdev' instead. The minimum standard deviation to use for observations without enough data. Must be at least 1e-10.
min_sdev The minimum standard deviation to use for observations without enough data. Must be at least 1e-10.
eps This argument is deprecated; use 'eps_sdev' instead. A threshold cutoff to deal with numeric instability; must be positive.
eps_sdev A threshold cutoff to deal with numeric instability; must be positive.
min_prob Minimum probability to use for observations without enough data.
eps_prob Cutoff below which probability is replaced with min_prob.
compute_metrics Logical. Compute metrics on training data. Defaults to TRUE.
max_runtime_secs Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.
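The laplace argument matters for categorical predictors: without smoothing, a level never observed together with a given class gets probability zero and wipes out that class's posterior. The sketch below uses the textbook form (count + laplace) / (n_class + laplace * n_levels). It is an assumption that H2O uses exactly this form, and laplace_cond_prob is a hypothetical helper.

```r
# Hypothetical helper: Laplace-smoothed P(level | class) for one categorical
# predictor x (a factor without NAs), using the textbook formula
#   (count + laplace) / (n_class + laplace * n_levels)
laplace_cond_prob <- function(x, y, class, laplace = 0) {
  x_k <- x[y == class]            # predictor values observed with this class
  counts <- table(x_k)            # one count per level, zeros included
  p <- (as.numeric(counts) + laplace) / (length(x_k) + laplace * nlevels(x))
  setNames(p, levels(x))
}

x <- factor(c("yes", "yes", "no", "no", "no"), levels = c("yes", "no", "other"))
y <- c("dem", "dem", "dem", "rep", "rep")
laplace_cond_prob(x, y, "dem")               # yes: 2/3, no: 1/3, other: 0
laplace_cond_prob(x, y, "dem", laplace = 1)  # yes: 1/2, no: 1/3, other: 1/6
```

With laplace = 1 the unseen level "other" keeps a small nonzero probability, and the smoothed probabilities still sum to 1.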
Value
Returns an object of class H2OBinomialModel if the response has two categorical levels, and H2OMultinomialModel otherwise.
Examples
h2o.init()
votesPath <- system.file("extdata", "housevotes.csv", package="h2o")
votes.hex <- h2o.uploadFile(path = votesPath, header = TRUE)
h2o.naiveBayes(x = 2:17, y = 1, training_frame = votes.hex, laplace = 3)
h2o.names Column names of an H2OFrame
Description
Column names of an H2OFrame.
Usage
h2o.names(x)
Arguments
x An H2OFrame object.
See Also
names for the base R implementation.
h2o.na_omit Remove Rows With NAs
Description
Remove Rows With NAs.
Usage
h2o.na_omit(object, ...)
Arguments
object H2OFrame object.
... Ignored.
Value
Returns an H2OFrame object containing non-NA rows.
h2o.nchar String length
Description
String length.
Usage
h2o.nchar(x)
Arguments
x The column whose string lengths will be returned.
Examples
library(h2o)
h2o.init()
string_to_nchar <- as.h2o("r tutorial")
nchar_string <- h2o.nchar(string_to_nchar)
h2o.ncol Return the number of columns present in x.
Description
Return the number of columns present in x.
Usage
h2o.ncol(x)
Arguments
x An H2OFrame object.
See Also
ncol for the base R implementation.
h2o.networkTest View Network Traffic Speed
Description
View speed with various file sizes.
Usage
h2o.networkTest()
Value
Returns a table listing the network speed for 1B, 10KB, and 10MB.
h2o.nlevels Get the number of factor levels for this frame.
Description
Get the number of factor levels for this frame.
Usage
h2o.nlevels(x)
Arguments
x An H2OFrame object.
See Also
nlevels for the base R method.
h2o.no_progress Disable Progress Bar
Description
Disable Progress Bar.
Usage
h2o.no_progress()
h2o.nrow Return the number of rows present in x.
Description
Return the number of rows present in x.
Usage
h2o.nrow(x)
Arguments
x An H2OFrame object.
See Also
nrow for the base R implementation.
h2o.null_deviance Retrieve the null deviance
Description
If the "train", "valid", and "xval" parameters are all FALSE (default), then the training null deviance value is returned. If more than one parameter is set to TRUE, then a named vector of null deviances is returned, where the names are "train", "valid" or "xval".
Usage
h2o.null_deviance(object, train = FALSE, valid = FALSE, xval = FALSE)
Arguments
object An H2OModel or H2OModelMetrics object.
train Retrieve the training null deviance.
valid Retrieve the validation null deviance.
xval Retrieve the cross-validation null deviance.
h2o.null_dof Retrieve the null degrees of freedom
Description
If the "train", "valid", and "xval" parameters are all FALSE (default), then the training null degrees of freedom value is returned. If more than one parameter is set to TRUE, then a named vector of null degrees of freedom is returned, where the names are "train", "valid" or "xval".
Usage
h2o.null_dof(object, train = FALSE, valid = FALSE, xval = FALSE)
Arguments
object An H2OModel or H2OModelMetrics object.
train Retrieve the training null degrees of freedom.
valid Retrieve the validation null degrees of freedom.
xval Retrieve the cross-validation null degrees of freedom.
h2o.num_iterations Retrieve the number of iterations.
Description
Retrieve the number of iterations.
Usage
h2o.num_iterations(object)
Arguments
object An H2OClusteringModel object.
... Further arguments to be passed on (currently unimplemented).
h2o.num_valid_substrings Count of substrings >= 2 chars that are contained in file
Description
Find the count of all possible substrings >= 2 chars that are contained in the specified line-separated text file.
Usage
h2o.num_valid_substrings(x, path)
Arguments
x The column on which to calculate the number of valid substrings.
path Path to text file containing line-separated strings to be referenced.
h2o.openLog View H2O R Logs
Description
Open existing logs of H2O R POST commands and error responses on local disk. Used primarily for debugging purposes.
Usage
h2o.openLog(type)
Arguments
type Currently unimplemented.
See Also
h2o.startLogging, h2o.stopLogging, h2o.clearLog
Examples
## Not run:
h2o.init()
h2o.startLogging()
ausPath = system.file("extdata", "australia.csv", package="h2o")
australia.hex = h2o.importFile(path = ausPath)
h2o.stopLogging()
# Not run to avoid windows being opened during R CMD check
# h2o.openLog("Command")
# h2o.openLog("Error")
## End(Not run)
h2o.parseRaw H2O Data Parsing
Description
The second phase in the data ingestion step.
Usage
h2o.parseRaw(data, pattern = "", destination_frame = "", header = NA, sep = "", col.names = NULL, col.types = NULL, na.strings = NULL, blocking = FALSE, parse_type = NULL, chunk_size = NULL, decrypt_tool = NULL)
Arguments
data An H2OFrame object to be parsed.
pattern (Optional) Character string containing a regular expression to match file(s) in the folder.
destination_frame (Optional) The hex key assigned to the parsed file.
header (Optional) A logical value indicating whether the first row is the column header. If missing, H2O will automatically try to detect the presence of a header.
sep (Optional) The field separator character. Values on each line of the file are separated by this character. If sep = "", the parser will automatically detect the separator.
col.names (Optional) An H2OFrame object containing a single delimited line with the column names for the file.
col.types (Optional) A vector specifying the types to attempt to force over columns.
na.strings (Optional) H2O will interpret these strings as missing.
blocking (Optional) Tell the H2O parse call to block synchronously instead of polling. This can be faster for small datasets but loses the progress bar.
parse_type (Optional) Specify which parser type H2O will use. Valid types are "ARFF", "XLS", "CSV", "SVMLight".
chunk_size Size of chunk of (input) data in bytes.
decrypt_tool (Optional) Specify a Decryption Tool (key-reference acquired by calling h2o.decryptionSetup).
Details
Parse the Raw Data produced by the import phase.
See Also
h2o.importFile, h2o.parseSetup
h2o.parseSetup Get a parse setup back for the staged data.
Description
Get a parse setup back for the staged data.
Usage
h2o.parseSetup(data, pattern = "", destination_frame = "", header = NA, sep = "", col.names = NULL, col.types = NULL, na.strings = NULL, parse_type = NULL, chunk_size = NULL, decrypt_tool = NULL)
Arguments
data An H2OFrame object to be parsed.
pattern (Optional) Character string containing a regular expression to match file(s) in the folder.
destination_frame (Optional) The hex key assigned to the parsed file.
header (Optional) A logical value indicating whether the first row is the column header. If missing, H2O will automatically try to detect the presence of a header.
sep (Optional) The field separator character. Values on each line of the file are separated by this character. If sep = "", the parser will automatically detect the separator.
col.names (Optional) An H2OFrame object containing a single delimited line with the column names for the file.
col.types (Optional) A vector specifying the types to attempt to force over columns.
na.strings (Optional) H2O will interpret these strings as missing.
parse_type (Optional) Specify which parser type H2O will use. Valid types are "ARFF", "XLS", "CSV", "SVMLight".
chunk_size Size of chunk of (input) data in bytes.
decrypt_tool (Optional) Specify a Decryption Tool (key-reference acquired by calling h2o.decryptionSetup).
See Also
h2o.parseRaw
h2o.partialPlot Partial Dependence Plots
Description
A partial dependence plot gives a graphical depiction of the marginal effect of a variable on the response. The effect of a variable is measured as the change in the mean response. Note: unlike randomForest's partialPlot, when plotting partial dependence the mean response (probabilities) is returned rather than the mean of the log class probability.
Usage
h2o.partialPlot(object, data, cols, destination_key, nbins = 20, plot = TRUE, plot_stddev = TRUE)
Arguments
object An H2OModel object.
data An H2OFrame object used for scoring and constructing the plot.
cols Feature(s) for which partial dependence will be calculated.
destination_key A key reference to the created partial dependence tables in H2O.
nbins Number of bins used. For categorical columns, make sure the number of bins exceeds the level count.
plot A logical specifying whether to plot the partial dependence table.
plot_stddev A logical specifying whether to add the standard error to the partial dependence plot.
Value
Plot and list of calculated mean response tables for each feature requested.
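Partial dependence at a value v is the model's mean prediction over the data after every row's value of the feature has been set to v. A minimal base-R sketch of that computation, with an lm fit standing in for an H2O model; partial_dependence is a hypothetical helper and, unlike h2o.partialPlot, it takes an explicit grid instead of deriving one from nbins.

```r
# Hypothetical helper: partial dependence of a fitted model on one column.
partial_dependence <- function(predict_fun, data, col, grid) {
  mean_response <- vapply(grid, function(v) {
    d <- data
    d[[col]] <- rep(v, nrow(d))    # overwrite the feature for every row
    mean(predict_fun(d))           # mean response at this grid point
  }, numeric(1))
  data.frame(value = grid, mean_response = mean_response)
}

# Base R linear model standing in for an H2OModel
fit  <- lm(mpg ~ wt + hp, data = mtcars)
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 5)
pd <- partial_dependence(function(d) predict(fit, newdata = d), mtcars, "wt", grid)
pd
```

For a linear model the curve is a straight line whose slope equals the coefficient of wt, which makes the output easy to sanity-check; for tree models such as GBM the curve is generally nonlinear.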
Examples
library(h2o)
h2o.init()
prostate.path <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prostate.path, destination_frame = "prostate.hex")
prostate.hex[, "CAPSULE"] <- as.factor(prostate.hex[, "CAPSULE"])
prostate.hex[, "RACE"] <- as.factor(prostate.hex[, "RACE"])
prostate.gbm <- h2o.gbm(x = c("AGE","RACE"), y = "CAPSULE", training_frame = prostate.hex, ntrees = 10, max_depth = 5, learn_rate = 0.1)
h2o.partialPlot(object = prostate.gbm, data = prostate.hex, cols = c("AGE", "RACE"))
h2o.performance Model Performance Metrics in H2O
Description
Given a trained H2O model, compute its performance on the given dataset.
Usage
h2o.performance(model, newdata = NULL, train = FALSE, valid = FALSE, xval = FALSE, data = NULL)
Arguments
model An H2OModel object.
newdata An H2OFrame. The model will make predictions on this dataset and subsequently score them. The dataset should match the dataset that was used to train the model in terms of column names, types, and dimensions. If newdata is passed in, then train, valid, and xval are ignored.
train A logical value indicating whether to return the training metrics (constructed during training). Note: when the trained H2O model uses balance_classes, the training metrics constructed during training will be from the balanced training dataset. For more information, visit: https://0xdata.atlassian.net/browse/TN-9
valid A logical value indicating whether to return the validation metrics (constructed during training).
xval A logical value indicating whether to return the cross-validation metrics (constructed during training).
data (DEPRECATED) An H2OFrame. This argument is now called 'newdata'.
Value
Returns an object of the H2OModelMetrics subclass.
Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
prostate.hex$CAPSULE <- as.factor(prostate.hex$CAPSULE)
prostate.gbm <- h2o.gbm(3:9, "CAPSULE", prostate.hex)
h2o.performance(model = prostate.gbm, newdata = prostate.hex)
## If the model uses balance_classes,
## the results from train = TRUE will not match the results from newdata = prostate.hex
prostate.gbm.balanced <- h2o.gbm(3:9, "CAPSULE", prostate.hex, balance_classes = TRUE)
h2o.performance(model = prostate.gbm.balanced, newdata = prostate.hex)
h2o.performance(model = prostate.gbm.balanced, train = TRUE)
h2o.pivot Pivot a frame
Description
Pivot the frame designated by the three columns: index, column, and value. Index and column should be of type enum, int, or time. For cases of multiple indexes for a column label, the aggregation method is to pick the first occurrence in the data frame.
Usage
h2o.pivot(x, index, column, value)
Arguments
x An H2OFrame.
index The column on which pivoted rows should be aligned.
column The column to pivot.
value Values of the pivoted table.
Value
An H2OFrame with columns from the column argument, aligned on the index argument, with values from the value argument.
h2o.prcomp Principal component analysis of an H2O data frame
Description
Principal components analysis of an H2O data frame using the power method to calculate the singular value decomposition of the Gram matrix.
Usage
h2o.prcomp(training_frame, x, model_id = NULL, validation_frame = NULL, ignore_const_cols = TRUE, score_each_iteration = FALSE, transform = c("NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE"), pca_method = c("GramSVD", "Power", "Randomized", "GLRM"), k = 1, max_iterations = 1000, use_all_factor_levels = FALSE, compute_metrics = TRUE, impute_missing = FALSE, seed = -1, max_runtime_secs = 0)
Arguments
training_frame Id of the training data frame.
x A vector containing the character names of the predictors in the model.
model_id Destination id for this model; auto-generated if not specified.
validation_frame Id of the validation data frame.
ignore_const_cols Logical. Ignore constant columns. Defaults to TRUE.
score_each_iteration Logical. Whether to score during each iteration of model training. Defaults to FALSE.
transform Transformation of training data. Must be one of: "NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE". Defaults to NONE.
pca_method Method for computing PCA (Caution: GLRM is currently experimental and unstable). Must be one of: "GramSVD", "Power", "Randomized", "GLRM". Defaults to GramSVD.
k Rank of matrix approximation. Defaults to 1.
max_iterations Maximum training iterations. Defaults to 1000.
use_all_factor_levels Logical. Whether the first factor level is included in each categorical expansion. Defaults to FALSE.
compute_metrics Logical. Whether to compute metrics on the training data. Defaults to TRUE.
impute_missing Logical. Whether to impute missing entries with the column mean. Defaults to FALSE.
seed Seed for random numbers (affects certain parts of the algo that are stochastic and those might or might not be enabled by default). Defaults to -1 (time-based random number).
max_runtime_secs Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.
Value
Returns an object of class H2ODimReductionModel.
References
N. Halko, P.G. Martinsson, J.A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions [http://arxiv.org/abs/0909.4061]. SIAM Rev., Survey and Review section, Vol. 53, No. 2, pp. 217-288, June 2011.
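The Description refers to the power method applied to the Gram matrix. A minimal base-R sketch of that idea for the first principal component with standardized columns (the analogue of transform = "STANDARDIZE", k = 1), checked against stats::prcomp. pca_power_first is a hypothetical helper; it is not how H2O's GramSVD, Power, or Randomized methods are implemented internally.

```r
# Hypothetical helper: first principal component by power iteration on the
# Gram (here: correlation) matrix of the standardized data.
pca_power_first <- function(X, max_iterations = 1000, tol = 1e-10) {
  Z <- scale(as.matrix(X))               # center and scale each column
  G <- crossprod(Z) / (nrow(Z) - 1)      # p x p Gram matrix of Z
  v <- rep(1, ncol(G)) / sqrt(ncol(G))   # unit-length starting vector
  for (i in seq_len(max_iterations)) {
    w <- drop(G %*% v)
    w <- w / sqrt(sum(w^2))              # renormalise every step
    done <- sum(abs(w - v)) < tol
    v <- w
    if (done) break
  }
  list(rotation = v,                               # loading vector
       sdev = sqrt(drop(crossprod(v, G %*% v))))   # sqrt of top eigenvalue
}

res <- pca_power_first(iris[, 1:4])
ref <- prcomp(iris[, 1:4], scale. = TRUE)
c(power = res$sdev, prcomp = ref$sdev[1])   # should agree
```

Power iteration converges at a rate set by the ratio of the two largest eigenvalues, so it is fast when the first component dominates, as it does for the iris measurements.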
See Also
h2o.svd, h2o.glrm
Examples
library(h2o)
h2o.init()
ausPath <- system.file("extdata", "australia.csv", package="h2o")
australia.hex <- h2o.uploadFile(path = ausPath)
h2o.prcomp(training_frame = australia.hex, k = 8, transform = "STANDARDIZE")
h2o.predict_json H2O Prediction from R without having H2O running
Description
Provides the method h2o.predict_json, with which you can predict with a MOJO or POJO Jar model from R.
Usage
h2o.predict_json(model, json, genmodelpath, labels, classpath, javaoptions)
Arguments
model String with the file name of the MOJO or POJO Jar.
json JSON string with inputs to the model.
genmodelpath (Optional) Path name to h2o-genmodel.jar; if not set, defaults to the same directory as the MOJO.
labels (Optional) If TRUE, show output labels in the result.
classpath (Optional) Extra items for the class path of where to look for Java classes, e.g., h2o-genmodel.jar.
javaoptions (Optional) Java options string; the default is "-Xmx4g".
Value
Returns an object with the prediction result.
Examples
library(h2o)
h2o.predict_json('~/GBM_model_python_1473313897851_6.zip', '{"C7":1}')
h2o.predict_json('~/GBM_model_python_1473313897851_6.zip', '{"C7":1}', c(".", "lib"))
h2o.print Print An H2OFrame
Description
Print An H2OFrame.
Usage
h2o.print(x, n = 6L)
Arguments
x An H2OFrame object.
n (Optional) A single integer. If positive, the number of rows in x to return. If negative, all but the n first/last number of rows in x. Anything bigger than 20 rows will require asking the server (the first 20 rows are cached on the client).
... Further arguments to be passed from or to other methods.
h2o.prod Return the product of all the values present in its arguments.
Description
Return the product of all the values present in its arguments.
Usage
h2o.prod(x)
Arguments
x An H2OFrame object.
See Also
prod for the base R implementation.
h2o.proj_archetypes Convert Archetypes to Features from H2O GLRM Model
Description
Project each archetype in an H2O GLRM model into the corresponding feature space from the H2O training frame.
Usage
h2o.proj_archetypes(object, data, reverse_transform = FALSE)
Arguments
object An H2ODimReductionModel object that represents the model containing archetypes to be projected.
data An H2OFrame object representing the training data for the H2O GLRM model.
reverse_transform (Optional) A logical value indicating whether to reverse the transformation from model-building by re-scaling columns and adding back the offset to each column of the projected archetypes.
Value
Returns an H2OFrame object containing the projection of the archetypes down into the original feature space, where each row is one archetype.
See Also
h2o.glrm for making an H2ODimReductionModel.
Examples
library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris_wheader.csv", package="h2o")
iris.hex <- h2o.uploadFile(path = irisPath)
iris.glrm <- h2o.glrm(training_frame = iris.hex, k = 4, loss = "Quadratic", multi_loss = "Categorical", max_iterations = 1000)
iris.parch <- h2o.proj_archetypes(iris.glrm, iris.hex)
head(iris.parch)
h2o.quantile Quantiles of H2O Frames.
Description
Obtain and display quantiles for H2O parsed data.
Usage
h2o.quantile(x, probs = c(0.001, 0.01, 0.1, 0.25, 0.333, 0.5, 0.667, 0.75, 0.9, 0.99, 0.999), combine_method = c("interpolate", "average", "avg", "low", "high"), weights_column = NULL, ...)
## S3 method for class 'H2OFrame'
quantile(x, probs = c(0.001, 0.01, 0.1, 0.25, 0.333, 0.5, 0.667, 0.75, 0.9, 0.99, 0.999), combine_method = c("interpolate", "average", "avg", "low", "high"), weights_column = NULL, ...)
Arguments
x An H2OFrame object with a single numeric column.
probs Numeric vector of probabilities with values in [0,1].
combine_method How to combine quantiles for even sample sizes. The default is linear interpolation. For example, if the method is "lo", the low value of the quantile is taken. Abbreviations for average, low, and high are acceptable (avg, lo, hi).
weights_column (Optional) String name of the observation weights column in x, or an H2OFrame object with a single numeric column of observation weights.
... Further arguments passed to or from other methods.
Details
quantile.H2OFrame, a method for the quantile generic. Obtain and return quantiles for an H2OFrame object.
Value
A vector describing the percentiles at the given cutoffs for the H2OFrame object.
Examples
# Request quantiles for an H2O parsed data set:
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
# Request quantiles for a subset of columns in an H2O parsed data set
quantile(prostate.hex[,3])
for (i in 1:ncol(prostate.hex)) print(quantile(prostate.hex[,i]))
h2o.r2 Retrieve the R2 value
Description
Retrieves the R2 value from an H2O model. Will return R^2 for GLM models and NaN otherwise. If the "train", "valid", and "xval" parameters are all FALSE (default), then the training R2 value is returned. If more than one parameter is set to TRUE, then a named vector of R2s is returned, where the names are "train", "valid" or "xval".
Usage
h2o.r2(object, train = FALSE, valid = FALSE, xval = FALSE)
Arguments
object An H2OModel object.
train Retrieve the training R2.
valid Retrieve the validation set R2 if a validation set was passed in during model build time.
xval Retrieve the cross-validation R2.
Examples
library(h2o)
h <- h2o.init()
fr <- as.h2o(iris)
m <- h2o.glm(x=2:5,y=1,training_frame=fr)
h2o.r2(m)
h2o.randomForest Build a Random Forest model
Description
Builds a Random Forest model on an H2OFrame.
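The sample_rate default of 0.6320000291 in the Usage is 0.632 to float precision, which is close to 1 - 1/e, the expected fraction of distinct rows in a classic bootstrap sample. Reading the default that way is an interpretation, not something this page states, and whether H2O samples rows with or without replacement is not covered here. A quick base-R check of the arithmetic; bootstrap_unique_fraction is a hypothetical helper.

```r
# Expected fraction of distinct rows when drawing n rows with replacement
# from n rows: 1 - (1 - 1/n)^n, which tends to 1 - 1/e as n grows.
bootstrap_unique_fraction <- function(n) 1 - (1 - 1 / n)^n

bootstrap_unique_fraction(c(10, 1000, 1e6))   # approaches 1 - 1/e
1 - exp(-1)                                   # about 0.6321

# Empirical check with a fixed seed
set.seed(42)
n <- 1e5
draws <- sample.int(n, n, replace = TRUE)
length(unique(draws)) / n                     # fraction of distinct rows drawn
```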
Usage
h2o.randomForest(x, y, training_frame, model_id = NULL, validation_frame = NULL, nfolds = 0, keep_cross_validation_predictions = FALSE, keep_cross_validation_fold_assignment = FALSE, score_each_iteration = FALSE, score_tree_interval = 0, fold_assignment = c("AUTO", "Random", "Modulo", "Stratified"), fold_column = NULL, ignore_const_cols = TRUE, offset_column = NULL, weights_column = NULL, balance_classes = FALSE, class_sampling_factors = NULL, max_after_balance_size = 5, max_hit_ratio_k = 0, ntrees = 50, max_depth = 20, min_rows = 1, nbins = 20, nbins_top_level = 1024, nbins_cats = 1024, r2_stopping = Inf, stopping_rounds = 0, stopping_metric = c("AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "lift_top_group", "misclassification", "mean_per_class_error"), stopping_tolerance = 0.001, max_runtime_secs = 0, seed = -1, build_tree_one_node = FALSE, mtries = -1, sample_rate = 0.6320000291, sample_rate_per_class = NULL, binomial_double_trees = FALSE, checkpoint = NULL, col_sample_rate_change_per_level = 1, col_sample_rate_per_tree = 1, min_split_improvement = 1e-05, histogram_type = c("AUTO", "UniformAdaptive", "Random", "QuantilesGlobal", "RoundRobin"), categorical_encoding = c("AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited"), calibrate_model = FALSE, calibration_frame = NULL, distribution = c("AUTO", "bernoulli", "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber"), custom_metric_func = NULL, verbose = FALSE)
Arguments
x (Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used.
y The name or column index of the response variable in the data. The response must be either a numeric or a categorical/factor variable. If the response is numeric, then a regression model will be trained; otherwise, a classification model will be trained.
training_frame Id of the training data frame.
model_id Destination id for this model; auto-generated if not specified.
validation_frame Id of the validation data frame.
nfolds Number of folds for K-fold cross-validation (0 to disable or >= 2). Defaults to 0.
keep_cross_validation_predictions Logical. Whether to keep the predictions of the cross-validation models. Defaults to FALSE.
keep_cross_validation_fold_assignment Logical. Whether to keep the cross-validation fold assignment. Defaults to FALSE.
score_each_iteration Logical. Whether to score during each iteration of model training. Defaults to FALSE.
score_tree_interval Score the model after every so many trees. Disabled if set to 0. Defaults to 0.
fold_assignment Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems. Must be one of: "AUTO", "Random", "Modulo", "Stratified". Defaults to AUTO.
fold_column Column with cross-validation fold index assignment per observation.
ignore_const_cols Logical. Ignore constant columns. Defaults to TRUE.
offset_column Offset column. This argument is deprecated and has no use for Random Forest.
weights_column Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
balance_classes Logical. Balance training data class counts via over/under-sampling (for imbalanced data). Defaults to FALSE.
class_sampling_factors Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
max_after_balance_size Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. Defaults to 5.0.
max_hit_ratio_k Max. number (top K) of predictions to use for hit ratio computation (for multiclass only, 0 to disable). Defaults to 0.
ntrees Number of trees. Defaults to 50.
max_depth Maximum tree depth. Defaults to 20.
min_rows Fewest allowed (weighted) observations in a leaf. Defaults to 1.
nbins For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point. Defaults to 20.
nbins_top_level For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by a factor of two per level. Defaults to 1024.
nbins_cats For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting. Defaults to 1024.
r2_stopping r2_stopping is no longer supported and will be ignored if set; please use stopping_rounds, stopping_metric and stopping_tolerance instead. Previous versions of H2O would stop making trees when the R^2 metric equals or exceeds this value. Defaults to 1.797693135e+308.
stopping_rounds Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable). Defaults to 0.
stopping_metric Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Must be one of: "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "lift_top_group", "misclassification", "mean_per_class_error". Defaults to AUTO.
stopping_tolerance Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much). Defaults to 0.001.
max_runtime_secs Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.
seed Seed for random numbers (affects certain parts of the algo that are stochastic and those might or might not be enabled by default). Defaults to -1 (time-based random number).
build_tree_one_node Logical. Run on one node only; no network overhead but fewer CPUs used. Suitable for small datasets. Defaults to FALSE.
mtries Number of variables randomly sampled as candidates at each split. If set to -1, defaults to sqrt(p) for classification and p/3 for regression (where p is the number of predictors). Defaults to -1.
sample_rate Row sample rate per tree (from 0.0 to 1.0). Defaults to 0.6320000291.
sample_rate_per_class A list of row sample rates per class (relative fraction for each class, from 0.0 to 1.0), for each tree.
binomial_double_trees Logical. For binary classification: build 2x as many trees (one per class); can lead to higher accuracy. Defaults to FALSE.
checkpoint Model checkpoint to resume training with.
col_sample_rate_change_per_level Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0). Defaults to 1.
col_sample_rate_per_tree Column sample rate per tree (from 0.0 to 1.0). Defaults to 1.
min_split_improvement Minimum relative improvement in squared error reduction for a split to happen. Defaults to 1e-05.
histogram_type What type of histogram to use for finding optimal split points. Must be one of: "AUTO", "UniformAdaptive", "Random", "QuantilesGlobal", "RoundRobin". Defaults to AUTO.
categorical_encoding  Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO.
calibrate_model  Logical. Use Platt Scaling to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities. Defaults to FALSE.
calibration_frame  Calibration frame for Platt Scaling.
distribution  Distribution. This argument is deprecated and has no use for Random Forest.
custom_metric_func  Reference to custom evaluation function, format: ‘language:keyName=funcName’.
verbose  Logical. Print scoring history to the console (metrics per tree for GBM, DRF, and XGBoost; metrics per epoch for Deep Learning). Defaults to FALSE.
Value
Creates an H2OModel object of the right type.
See Also
predict.H2OModel for prediction.

h2o.range
Returns a vector containing the minimum and maximum of all the given arguments.
Description
Returns a vector containing the minimum and maximum of all the given arguments.
Usage
h2o.range(x, na.rm = FALSE, finite = FALSE)
Arguments
x  An H2OFrame object.
na.rm  Logical, indicating whether missing values should be removed.
finite  Logical, indicating whether all non-finite elements should be omitted.
See Also
range for the base R implementation.

h2o.rbind
Combine H2O Datasets by Rows
Description
Takes a sequence of H2O data sets and combines them by rows.
Usage
h2o.rbind(...)
Arguments
...  A sequence of H2OFrame arguments. All datasets must exist on the same H2O instance (IP and port) and contain the same number and types of columns.
Value
An H2OFrame object containing the combined ... arguments row-wise.
See Also
rbind for the base R method.
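The column requirement for h2o.rbind can be illustrated with base R data frames, which also refuse to stack frames with a different number of columns. This sketch runs without an H2O cluster. Treating base R's behaviour as a stand-in for H2O's is an assumption. Note also that base R coerces differing column types instead of failing, whereas the h2o.rbind description requires matching types.

```r
# Base-R analogue (an assumption, not h2o code): frames with the same columns
# stack by rows; a column-count mismatch is rejected with an error.
a <- data.frame(id = 1:2, score = c(0.5, 0.7))
b <- data.frame(id = 3L, score = 0.9)

stacked <- rbind(a, b)   # 3 rows, columns id and score
nrow(stacked)            # 3

too_narrow <- data.frame(id = 4L)   # one column fewer than a
mismatch <- tryCatch(rbind(a, too_narrow),
                     error = function(e) conditionMessage(e))
mismatch                 # the error message reported by rbind
```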
Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
prostate.rbind <- h2o.rbind(prostate.hex, prostate.hex)
head(prostate.rbind)

h2o.reconstruct
Reconstruct Training Data via H2O GLRM Model
Description
Reconstruct the training data and impute missing values from the H2O GLRM model by computing the matrix product of X and Y, and transforming back to the original feature space by minimizing each column’s loss function.
Usage
h2o.reconstruct(object, data, reverse_transform = FALSE)
Arguments
object  An H2ODimReductionModel object that represents the model to be used for reconstruction.
data  An H2OFrame object representing the training data for the H2O GLRM model. Used to set the domain of each column in the reconstructed frame.
reverse_transform  (Optional) A logical value indicating whether to reverse the transformation from model-building by re-scaling columns and adding back the offset to each column of the reconstructed frame.
Value
Returns an H2OFrame object containing the approximate reconstruction of the training data.
See Also
h2o.glrm for making an H2ODimReductionModel.
Examples
library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris_wheader.csv", package = "h2o")
iris.hex <- h2o.uploadFile(path = irisPath)
iris.glrm <- h2o.glrm(training_frame = iris.hex, k = 4, transform = "STANDARDIZE",
                      loss = "Quadratic", multi_loss = "Categorical", max_iterations = 1000)
iris.rec <- h2o.reconstruct(iris.glrm, iris.hex, reverse_transform = TRUE)
head(iris.rec)

h2o.relevel
Reorders levels of an H2O factor, similarly to standard R’s relevel.
Description
The levels of a factor are reordered so that the reference level is at level 0; the remaining levels are moved down as needed.
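Because h2o.relevel is described as working like standard R's relevel, the base-R version is a convenient way to see the effect on level order. Keep in mind that R's integer codes start at 1, whereas the H2O description counts the reference level as level 0. Any equivalence beyond what the description states is an assumption.

```r
# Base-R relevel, used here as a stand-in for h2o.relevel (assumption).
species <- factor(c("setosa", "versicolor", "virginica", "setosa"))
levels(species)                  # "setosa" "versicolor" "virginica"

releveled <- relevel(species, ref = "virginica")
levels(releveled)                # "virginica" "setosa" "versicolor"

# Values are unchanged; only the level order and the integer codes move.
as.character(releveled)          # same strings as before
as.integer(releveled)            # 2 3 1 2
```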
Usage
h2o.relevel(x, y)
Arguments
x  Factor column in an H2O frame.
y  Reference level (string).
Value
New reordered factor column.

h2o.removeAll
Remove All Objects on the H2O Cluster
Description
Removes the data from the H2O cluster, but does not remove the local references.
Usage
h2o.removeAll(timeout_secs = 0)
Arguments
timeout_secs  Timeout in seconds. Default is no timeout.
See Also
h2o.rm
Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
h2o.ls()
h2o.removeAll()
h2o.ls()

h2o.removeVecs
Delete Columns from an H2OFrame
Description
Delete the specified columns from the H2OFrame. Returns an H2OFrame without the specified columns.
Usage
h2o.removeVecs(data, cols)
Arguments
data  The H2OFrame.
cols  The columns to remove.

h2o.rep_len
Replicate Elements of Vectors or Lists into H2O
Description
h2o.rep_len performs just as rep does: it replicates the values in x in the H2O backend.
Usage
h2o.rep_len(x, length.out)
Arguments
x  An H2O frame.
length.out  Non-negative integer. The desired length of the output vector.
Value
Creates an H2OFrame of the same type as x.

h2o.residual_deviance
Retrieve the residual deviance
Description
If "train", "valid", and "xval" parameters are FALSE (default), then the training residual deviance value is returned. If more than one parameter is set to TRUE, then a named vector of residual deviances is returned, where the names are "train", "valid" or "xval".
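For a Gaussian model, the residual deviance is the residual sum of squares. The residual degrees of freedom are the number of observations minus the number of estimated coefficients. The base-R sketch below checks both standard definitions with lm() on the built-in mtcars data. It is an assumption that a Gaussian H2O GLM reports the same quantities through h2o.residual_deviance and h2o.residual_dof. Other GLM families use their own deviance.

```r
# Standard Gaussian-model definitions, computed with base R's lm().
fit <- lm(mpg ~ wt + hp, data = mtcars)

rss <- sum(residuals(fit)^2)   # residual sum of squares
n   <- nrow(mtcars)            # 32 observations
p   <- length(coef(fit))       # intercept + 2 slopes = 3

deviance(fit)                  # equals rss for a Gaussian linear model
df.residual(fit)               # n - p = 29
```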
Usage
h2o.residual_deviance(object, train = FALSE, valid = FALSE, xval = FALSE)
Arguments
object  An H2OModel or H2OModelMetrics.
train  Retrieve the training residual deviance.
valid  Retrieve the validation residual deviance.
xval  Retrieve the cross-validation residual deviance.

h2o.residual_dof
Retrieve the residual degrees of freedom
Description
If "train", "valid", and "xval" parameters are FALSE (default), then the training residual degrees of freedom value is returned. If more than one parameter is set to TRUE, then a named vector of residual degrees of freedom is returned, where the names are "train", "valid" or "xval".
Usage
h2o.residual_dof(object, train = FALSE, valid = FALSE, xval = FALSE)
Arguments
object  An H2OModel or H2OModelMetrics.
train  Retrieve the training residual degrees of freedom.
valid  Retrieve the validation residual degrees of freedom.
xval  Retrieve the cross-validation residual degrees of freedom.

h2o.rm
Delete Objects In H2O
Description
Remove the H2O Big Data object(s) having the key name(s) from ids.
Usage
h2o.rm(ids)
Arguments
ids  The object or hex key associated with the object to be removed, or a vector/list of those things.
See Also
h2o.assign, h2o.ls

h2o.rmse
Retrieves Root Mean Squared Error Value
Description
Retrieves the root mean squared error value from an H2OModelMetrics object. If "train", "valid", and "xval" parameters are FALSE (default), then the training RMSE value is returned. If more than one parameter is set to TRUE, then a named vector of RMSEs is returned, where the names are "train", "valid" or "xval".
Usage
h2o.rmse(object, train = FALSE, valid = FALSE, xval = FALSE)
Arguments
object  An H2OModelMetrics object of the correct type.
train  Retrieve the training RMSE.
valid  Retrieve the validation RMSE.
xval  Retrieve the cross-validation RMSE.
Details
This function only supports H2OBinomialMetrics, H2OMultinomialMetrics, and H2ORegressionMetrics objects.
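The metrics retrieved by h2o.mse, h2o.rmse and h2o.rmsle follow the usual textbook formulas. The plain-R helpers below compute them from vectors of actual and predicted values. The names mse, rmse and rmsle are hypothetical and are not h2o functions. H2O may also apply observation weights, which this sketch ignores.

```r
# Hypothetical plain-R helpers (not h2o functions) for the standard formulas.
mse  <- function(actual, predicted) mean((actual - predicted)^2)
rmse <- function(actual, predicted) sqrt(mse(actual, predicted))
# RMSLE compares log(1 + x), so values must be greater than -1
rmsle <- function(actual, predicted) {
  sqrt(mean((log1p(predicted) - log1p(actual))^2))
}

actual    <- c(3, 5, 2.5, 7)
predicted <- c(2.5, 5, 4, 8)
mse(actual, predicted)    # (0.25 + 0 + 2.25 + 1) / 4 = 0.875
rmse(actual, predicted)   # sqrt(0.875), about 0.935
rmsle(actual, predicted)
```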
See Also
h2o.auc for AUC, h2o.mse for MSE, and h2o.metric for the various threshold metrics. See h2o.performance for creating H2OModelMetrics objects.
Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
hex <- h2o.uploadFile(prosPath)
hex[, 2] <- as.factor(hex[, 2])
model <- h2o.gbm(x = 3:9, y = 2, training_frame = hex, distribution = "bernoulli")
perf <- h2o.performance(model, hex)
h2o.rmse(perf)

h2o.rmsle
Retrieve the Root Mean Squared Log Error
Description
Retrieves the root mean squared log error (RMSLE) value from an H2O model. If "train", "valid", and "xval" parameters are FALSE (default), then the training RMSLE value is returned. If more than one parameter is set to TRUE, then a named vector of RMSLEs is returned, where the names are "train", "valid" or "xval".
Usage
h2o.rmsle(object, train = FALSE, valid = FALSE, xval = FALSE)
Arguments
object  An H2OModel object.
train  Retrieve the training RMSLE.
valid  Retrieve the validation set RMSLE if a validation set was passed in during model build time.
xval  Retrieve the cross-validation RMSLE.
Examples
library(h2o)
h <- h2o.init()
fr <- as.h2o(iris)
m <- h2o.deeplearning(x = 2:5, y = 1, training_frame = fr)
h2o.rmsle(m)

h2o.round
Round doubles/floats to the given number of decimal places.
Description
Round doubles/floats to the given number of decimal places.
Usage
h2o.round(x, digits = 0)
round(x, digits = 0)
Arguments
x  An H2OFrame object.
digits  Number of decimal places to round doubles/floats. Rounding to a negative number of decimal places is
See Also
round for the base R implementation.

h2o.rstrip
Strip set from right
Description
Return a copy of the target column with trailing characters removed. The set argument is a string specifying the set of characters to be removed. If omitted, the set argument defaults to removing whitespace.
Usage
h2o.rstrip(x, set = " ")
Arguments
x  The column whose strings should be rstrip-ed.
set  String of characters to be removed.
Examples
library(h2o)
h2o.init()
string_to_rstrip <- as.h2o("1234567890")
rstrip_string <- h2o.rstrip(string_to_rstrip, "890") # Remove "890"

h2o.runif
Produce a Vector of Random Uniform Numbers
Description
Creates a vector of random uniform numbers equal in length to the length of the specified H2O dataset.
Usage
h2o.runif(x, seed = -1)
Arguments
x  An H2OFrame object.
seed  A random seed used to generate draws from the uniform distribution.
Value
A vector of random, uniformly distributed numbers. The elements are between 0 and 1.
Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.importFile(path = prosPath, destination_frame = "prostate.hex")
s <- h2o.runif(prostate.hex)
summary(s)
prostate.train <- prostate.hex[s <= 0.8, ]
prostate.train <- h2o.assign(prostate.train, "prostate.train")
prostate.test <- prostate.hex[s > 0.8, ]
prostate.test <- h2o.assign(prostate.test, "prostate.test")
nrow(prostate.train) + nrow(prostate.test)

h2o.saveModel
Save an H2O Model Object to Disk
Description
Save an H2OModel to disk. (Note that ensemble binary models can be saved.)
Usage
h2o.saveModel(object, path = "", force = FALSE)
Arguments
object  An H2OModel object.
path  String indicating the directory the model will be written to.
force  Logical; indicates how to deal with files that already exist.
Details
In the case of existing files, force = TRUE will overwrite the file. Otherwise, the operation will fail.
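The force rule above (overwrite an existing file only when force = TRUE, otherwise fail) can be mimicked in plain R. save_text is a hypothetical helper and not an h2o function. It writes to a temporary file only to illustrate the documented behaviour.

```r
# Hypothetical helper (not part of h2o) mimicking the documented `force` rule.
save_text <- function(text, path, force = FALSE) {
  if (file.exists(path) && !force) {
    stop("File already exists: ", path, " (use force = TRUE to overwrite)")
  }
  writeLines(text, path)
  invisible(path)
}

path <- tempfile(fileext = ".txt")
save_text("first version", path)                      # creates the file
failed <- tryCatch({ save_text("second", path); FALSE },
                   error = function(e) TRUE)          # refused: file exists
save_text("second", path, force = TRUE)               # overwrites
readLines(path)                                       # "second"
```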
See Also
h2o.loadModel for loading a model to H2O from disk.
Examples
## Not run:
# library(h2o)
# h2o.init()
# prostate.hex <- h2o.importFile(path = paste("https://raw.github.com",
#   "h2oai/h2o-2/master/smalldata/logreg/prostate.csv", sep = "/"),
#   destination_frame = "prostate.hex")
# prostate.glm <- h2o.glm(y = "CAPSULE", x = c("AGE","RACE","PSA","DCAPS"),
#   training_frame = prostate.hex, family = "binomial", alpha = 0.5)
# h2o.saveModel(object = prostate.glm, path = "/Users/UserName/Desktop", force = TRUE)
## End(Not run)

h2o.saveModelDetails
Save H2O Model Details
Description
Save Model Details of an H2O Model in JSON Format.
Usage
h2o.saveModelDetails(object, path = "", force = FALSE)
Arguments
object  An H2OModel object.
path  String indicating the directory the model details will be written to.
force  Logical; indicates how to deal with files that already exist.
Details
Model Details will download as a JSON file. In the case of existing files, force = TRUE will overwrite the file. Otherwise, the operation will fail.
Examples
## Not run:
# library(h2o)
# h2o.init()
# prostate.hex <- h2o.uploadFile(path = system.file("extdata", "prostate.csv", package="h2o"))
# prostate.glm <- h2o.glm(y = "CAPSULE", x = c("AGE","RACE","PSA","DCAPS"),
#   training_frame = prostate.hex, family = "binomial", alpha = 0.5)
# h2o.saveModelDetails(object = prostate.glm, path = "/Users/UserName/Desktop", force = TRUE)
## End(Not run)

h2o.saveMojo
Save an H2O Model Object as Mojo to Disk
Description
Save a MOJO (Model Object, Optimized) to disk.
Usage
h2o.saveMojo(object, path = "", force = FALSE)
Arguments
object  An H2OModel object.
path  String indicating the directory the model will be written to.
force  Logical; indicates how to deal with files that already exist.
Details
MOJO will download as a zip file. In the case of existing files, force = TRUE will overwrite the file. Otherwise, the operation will fail.
See Also
h2o.saveModel for saving a model to disk as a binary object.
Examples
## Not run:
# library(h2o)
# h2o.init()
# prostate.hex <- h2o.uploadFile(path = system.file("extdata", "prostate.csv", package="h2o"))
# prostate.glm <- h2o.glm(y = "CAPSULE", x = c("AGE","RACE","PSA","DCAPS"),
#   training_frame = prostate.hex, family = "binomial", alpha = 0.5)
# h2o.saveMojo(object = prostate.glm, path = "/Users/UserName/Desktop", force = TRUE)
## End(Not run)

h2o.scale
Scaling and Centering of an H2OFrame
Description
Centers and/or scales the columns of an H2O dataset.
Usage
h2o.scale(x, center = TRUE, scale = TRUE)
## S3 method for class 'H2OFrame'
scale(x, center = TRUE, scale = TRUE)
Arguments
x  An H2OFrame object.
center  Either a logical value or numeric vector of length equal to the number of columns of x.
scale  Either a logical value or numeric vector of length equal to the number of columns of x.
Examples
library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris_wheader.csv", package = "h2o")
iris.hex <- h2o.uploadFile(path = irisPath, destination_frame = "iris.hex")
summary(iris.hex)
# Scale and center all the numeric columns in iris data set
scale(iris.hex[, 1:4])

h2o.scoreHistory
Retrieve Model Score History
Description
Retrieve Model Score History
Usage
h2o.scoreHistory(object)
Arguments
object  An H2OModel object.

h2o.sd
Standard Deviation of a column of data.
Description
Obtain the standard deviation of a column of data.
Usage
h2o.sd(x, na.rm = FALSE)
sd(x, na.rm = FALSE)
Arguments
x  An H2OFrame object.
na.rm  Logical. Should missing values be removed?
See Also
h2o.var for variance, and sd for the base R implementation.
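Centering and scaling, as done by h2o.scale, come down to two steps: subtract the column mean, then divide by the standard deviation that h2o.sd reports. The base-R sketch below checks these steps against sd() and scale(). The n - 1 denominator is the convention used by base R's sd(). It is an assumption that H2O uses the same denominator.

```r
# Manual standardization compared with base R's sd() and scale().
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Sample standard deviation with an n - 1 denominator (base R convention)
manual_sd <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))
z <- (x - mean(x)) / manual_sd         # centre, then scale

all.equal(manual_sd, sd(x))            # TRUE
all.equal(z, as.vector(scale(x)))      # TRUE
```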
Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
sd(prostate.hex$AGE)

h2o.sdev
Retrieve the standard deviations of principal components
Description
Retrieve the standard deviations of principal components.
Usage
h2o.sdev(object)
Arguments
object  An H2ODimReductionModel object.

h2o.setLevels
Set Levels of H2O Factor Column
Description
Works on a single categorical vector. New domains must be aligned with the old domains. This call has SIDE EFFECTS and mutates the column in place (change of the levels will also affect all the frames that are referencing this column). If you want to make a copy of the column instead, use parameter in.place = FALSE.
Usage
h2o.setLevels(x, levels, in.place = TRUE)
Arguments
x  A single categorical column.
levels  A character vector specifying the new levels. The number of new levels must match the number of old levels.
in.place  Indicates whether the new domain will be directly applied to the column (in-place change) or whether a copy of the column will be created with the given domain levels.
Examples
library(h2o)
h2o.init()
iris.hex <- as.h2o(iris)
new.levels <- c("setosa", "versicolor", "caroliniana")
iris.hex$Species <- h2o.setLevels(iris.hex$Species, new.levels, in.place = FALSE)
h2o.levels(iris.hex$Species)

h2o.setTimezone
Set the Time Zone on the H2O Cloud
Description
Set the Time Zone on the H2O Cloud.
Usage
h2o.setTimezone(tz)
Arguments
tz  The desired timezone.

h2o.show_progress
Enable Progress Bar
Description
Enable Progress Bar
Usage
h2o.show_progress()

h2o.shutdown
Shut Down H2O Instance
Description
Shut down the specified instance. All data will be lost.
Usage
h2o.shutdown(prompt = TRUE)
Arguments
prompt  A logical value indicating whether to prompt the user before shutting down the H2O server.
Details
This method checks if H2O is running at the specified IP address and port, and if it is, shuts down that H2O instance.
WARNING
All data, models, and other values stored on the server will be lost! Only call this function if you and all other clients connected to the H2O server are finished and have saved your work.
Note
Users must call h2o.shutdown explicitly in order to shut down the local H2O instance started by R. If R is closed before H2O, then an attempt will be made to automatically shut down H2O. This only applies to local instances started with h2o.init, not remote H2O servers.
See Also
h2o.init
Examples
# Don't run automatically to prevent accidentally shutting down a cloud
## Not run:
library(h2o)
h2o.init()
h2o.shutdown()
## End(Not run)

h2o.signif
Round doubles/floats to the given number of significant digits.
Description
Round doubles/floats to the given number of significant digits.
Usage
h2o.signif(x, digits = 6)
signif(x, digits = 6)
Arguments
x  An H2OFrame object.
digits  Number of significant digits to round doubles/floats.
See Also
signif for the base R implementation.

h2o.sin
Compute the sine of x
Description
Compute the sine of x.
Usage
h2o.sin(x)
Arguments
x  An H2OFrame object.
See Also
sin for the base R implementation.

h2o.skewness
Skewness of a column
Description
Obtain the skewness of a column of a parsed H2O data object.
Usage
h2o.skewness(x, ..., na.rm = TRUE)
skewness.H2OFrame(x, ..., na.rm = TRUE)
Arguments
x  An H2OFrame object.
...  Further arguments to be passed from or to other methods.
na.rm  A logical value indicating whether NA or missing values should be stripped before the computation.
Value
Returns a list containing the skewness for each column (NaN for non-numeric columns).
Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
h2o.skewness(prostate.hex$AGE)

h2o.splitFrame
Split an H2O Data Set
Description
Split an existing H2O data set according to user-specified ratios.
The number of subsets is always 1 more than the number of given ratios. Note that this does not give an exact split. H2O is designed to be efficient on big data using a probabilistic splitting method rather than an exact split. For example, when specifying a split of 0.75/0.25, H2O will produce a train/test split with an expected value of 0.75/0.25 rather than exactly 0.75/0.25. On small datasets, the sizes of the resulting splits will deviate from the expected value more than on big data, where they will be very close to exact.
Usage
h2o.splitFrame(data, ratios = 0.75, destination_frames, seed = -1)
Arguments
data  An H2OFrame object representing the dataset to split.
ratios  A numeric value or array indicating the ratio of total rows contained in each split. Must total up to less than 1.
destination_frames  An array of frame IDs equal to the number of ratios specified plus one.
seed  Random seed.
Value
Returns a list of split H2OFrames.
Examples
library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris.csv", package = "h2o")
iris.hex <- h2o.importFile(path = irisPath)
iris.split <- h2o.splitFrame(iris.hex, ratios = c(0.2, 0.5))
head(iris.split[[1]])
summary(iris.split[[1]])

h2o.sqrt
Compute the square root of x
Description
Compute the square root of x.
Usage
h2o.sqrt(x)
Arguments
x  An H2OFrame object.
See Also
sqrt for the base R implementation.

h2o.stackedEnsemble
Builds a Stacked Ensemble
Description
Build a stacked ensemble (a.k.a. Super Learner) using the H2O base learning algorithms specified by the user.
Usage
h2o.stackedEnsemble(x, y, training_frame, model_id = NULL,
  validation_frame = NULL, base_models = list(),
  metalearner_algorithm = c("AUTO", "glm", "gbm", "drf", "deeplearning"),
  metalearner_nfolds = 0,
  metalearner_fold_assignment = c("AUTO", "Random", "Modulo", "Stratified"),
  metalearner_fold_column = NULL, keep_levelone_frame = FALSE, seed = -1,
  metalearner_params = NULL)
Arguments
x  (Optional).
A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used. Training frame is used only to compute ensemble training metrics.
y  The name or column index of the response variable in the data. The response must be either a numeric or a categorical/factor variable. If the response is numeric, then a regression model will be trained; otherwise, a classification model will be trained.
training_frame  Id of the training data frame.
model_id  Destination id for this model; auto-generated if not specified.
validation_frame  Id of the validation data frame.
base_models  List of models (or model ids) to ensemble/stack together. Models must have been cross-validated using nfolds > 1, and folds must be identical across models. Defaults to [].
metalearner_algorithm  Type of algorithm to use as the metalearner. Options include 'AUTO' (GLM with non-negative weights; if validation_frame is present, a lambda search is performed), 'glm' (GLM with default parameters), 'gbm' (GBM with default parameters), 'drf' (Random Forest with default parameters), or 'deeplearning' (Deep Learning with default parameters). Must be one of: "AUTO", "glm", "gbm", "drf", "deeplearning". Defaults to AUTO.
metalearner_nfolds  Number of folds for K-fold cross-validation of the metalearner algorithm (0 to disable or >= 2). Defaults to 0.
metalearner_fold_assignment  Cross-validation fold assignment scheme for metalearner cross-validation. Defaults to AUTO (which is currently set to Random). The 'Stratified' option will stratify the folds based on the response variable, for classification problems. Must be one of: "AUTO", "Random", "Modulo", "Stratified".
metalearner_fold_column  Column with cross-validation fold index assignment per observation for cross-validation of the metalearner.
keep_levelone_frame  Logical. Keep level one frame used for metalearner training. Defaults to FALSE.
seed  Seed for random numbers; passed through to the metalearner algorithm. Defaults to -1 (time-based random number).
metalearner_params  Parameters for metalearner algorithm. Defaults to NULL.
Examples
# See example R code here:
# http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/stacked-ensembles.html

h2o.startLogging
Start Writing H2O R Logs
Description
Begin logging H2O R POST commands and error responses to local disk. Used primarily for debugging purposes.
Usage
h2o.startLogging(file)
Arguments
file  A character string name for the file, automatically generated.
See Also
h2o.stopLogging, h2o.clearLog, h2o.openLog
Examples
library(h2o)
h2o.init()
h2o.startLogging()
ausPath <- system.file("extdata", "australia.csv", package = "h2o")
australia.hex <- h2o.importFile(path = ausPath)
h2o.stopLogging()

h2o.std_coef_plot
Plot Standardized Coefficient Magnitudes
Description
Plot a GLM model’s standardized coefficient magnitudes.
Usage
h2o.std_coef_plot(model, num_of_features = NULL)
Arguments
model  A trained generalized linear model.
num_of_features  The number of features to be shown in the plot.
See Also
h2o.varimp_plot for variable importances plot of random forest, GBM, deep learning.
Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.importFile(prosPath)
prostate.hex[, 2] <- as.factor(prostate.hex[, 2])
prostate.glm <- h2o.glm(y = "CAPSULE", x = c("AGE", "RACE", "PSA", "DCAPS"),
                        training_frame = prostate.hex, family = "binomial",
                        nfolds = 0, alpha = 0.5, lambda_search = FALSE)
h2o.std_coef_plot(prostate.glm)

h2o.stopLogging
Stop Writing H2O R Logs
Description
Halt logging of H2O R POST commands and error responses to local disk. Used primarily for debugging purposes.
Usage
h2o.stopLogging()
See Also
h2o.startLogging, h2o.clearLog, h2o.openLog
Examples
library(h2o)
h2o.init()
h2o.startLogging()
ausPath <- system.file("extdata", "australia.csv", package = "h2o")
australia.hex <- h2o.importFile(path = ausPath)
h2o.stopLogging()

h2o.str
Display the structure of an H2OFrame object
Description
Display the structure of an H2OFrame object.
Usage
h2o.str(object, ..., cols = FALSE)
Arguments
object  An H2OFrame.
...  Further arguments to be passed from or to other methods.
cols  Print the per-column str for the H2OFrame.

h2o.stringdist
Compute element-wise string distances between two H2OFrames
Description
Compute element-wise string distances between two H2OFrames. Both frames need to have the same shape (N x M) and only contain string/factor columns. Return a matrix (H2OFrame) of shape N x M.
Usage
h2o.stringdist(x, y, method = c("lv", "lcs", "qgram", "jaccard", "jw", "soundex"),
  compare_empty = TRUE)
Arguments
x  An H2OFrame.
y  A comparison H2OFrame.
method  A string identifier indicating what string distance measure to use. Must be one of:
  "lv" - Levenshtein distance
  "lcs" - Longest common substring distance
  "qgram" - q-gram distance
  "jaccard" - Jaccard distance between q-gram profiles
  "jw" - Jaro, or Jaro-Winkler distance
  "soundex" - Distance based on soundex encoding
compare_empty  If set to FALSE, empty strings will be handled as NaNs.
Examples
library(h2o)
h2o.init()
x <- as.h2o(c("Martha", "Dwayne", "Dixon"))
y <- as.character(as.h2o(c("Marhta", "Duane", "Dicksonx")))
h2o.stringdist(x, y, method = "jw")

h2o.strsplit
String Split
Description
String Split
Usage
h2o.strsplit(x, split)
Arguments
x  The column whose strings must be split.
split  The pattern to split on.
Value
An H2OFrame where each column is the outcome of the string split.
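The "lv" method of h2o.stringdist counts the minimum number of single-character insertions, deletions and substitutions that turn one string into another. The dynamic-programming sketch below illustrates that definition in plain R. The helper name levenshtein is hypothetical, and this is not H2O's implementation. Base R's utils::adist computes the same distance and is used in the checks as a cross-reference.

```r
# Hypothetical helper: classic dynamic-programming Levenshtein distance.
levenshtein <- function(a, b) {
  chars_a <- strsplit(a, "")[[1]]
  chars_b <- strsplit(b, "")[[1]]
  d <- matrix(0, nrow = length(chars_a) + 1, ncol = length(chars_b) + 1)
  d[, 1] <- 0:length(chars_a)   # cost of deleting each prefix of a
  d[1, ] <- 0:length(chars_b)   # cost of inserting each prefix of b
  for (i in seq_along(chars_a)) {
    for (j in seq_along(chars_b)) {
      cost <- if (chars_a[i] == chars_b[j]) 0 else 1
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1,   # deletion
                             d[i + 1, j] + 1,   # insertion
                             d[i, j] + cost)    # substitution or match
    }
  }
  d[length(chars_a) + 1, length(chars_b) + 1]
}

levenshtein("kitten", "sitting")   # 3
levenshtein("Martha", "Marhta")    # 2 (a transposition costs two substitutions)
```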
Examples
library(h2o)
h2o.init()
string_to_split <- as.h2o("Split at every character.")
split_string <- h2o.strsplit(string_to_split, "")

h2o.sub
String Substitute
Description
Creates a copy of the target column in which each string has the first occurrence of the regex pattern replaced with the replacement substring.
Usage
h2o.sub(pattern, replacement, x, ignore.case = FALSE)
Arguments
pattern  The pattern to replace.
replacement  The replacement pattern.
x  The column on which to operate.
ignore.case  Logical. If TRUE, pattern matching ignores case; if FALSE (the default), it is case sensitive.
Examples
library(h2o)
h2o.init()
string_to_sub <- as.h2o("r tutorial")
sub_string <- h2o.sub("r ", "H2O ", string_to_sub)

h2o.substring
Substring
Description
Returns a copy of the target column that is a substring at the specified start and stop indices, inclusive. If the stop index is not specified, then the substring extends to the end of the original string. If start is greater than the number of characters in the original string, or is greater than stop, an empty string is returned. Negative start is coerced to 0.
Usage
h2o.substring(x, start, stop = "[]")
h2o.substr(x, start, stop = "[]")
Arguments
x  The column on which to operate.
start  The index of the first element to be included in the substring.
stop  Optional. The index of the last element to be included in the substring.
Examples
library(h2o)
h2o.init()
string_to_substring <- as.h2o("1234567890")
substring_result <- h2o.substring(string_to_substring, 2) # Get substring from second index onwards

h2o.sum
Compute the frame’s sum by-column (or by-row).
Description
Compute the frame’s sum by-column (or by-row).
Usage
h2o.sum(x, na.rm = FALSE, axis = 0, return_frame = FALSE)
Arguments
x  An H2OFrame object.
na.rm  Logical, indicating whether missing values should be removed.
axis  An int that indicates whether to sum down a column (0) or across a row (1).
return_frame  A boolean that indicates whether to return an H2O frame or a list. Default is FALSE.
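The axis argument of h2o.sum uses the familiar column-versus-row convention. The base-R helper below sketches the same semantics with colSums and rowSums. frame_sum is a hypothetical name and not part of h2o. The sketch does not model return_frame. It is an assumption that h2o.sum propagates NA exactly as shown when na.rm = FALSE.

```r
# Hypothetical base-R helper mirroring the documented `axis` argument:
# axis = 0 sums down each column, axis = 1 sums across each row.
frame_sum <- function(df, axis = 0, na.rm = FALSE) {
  m <- as.matrix(df)
  if (axis == 0) colSums(m, na.rm = na.rm) else rowSums(m, na.rm = na.rm)
}

df <- data.frame(a = c(1, 2, NA), b = c(10, 20, 30))
frame_sum(df, axis = 0)                 # a = NA, b = 60
frame_sum(df, axis = 0, na.rm = TRUE)   # a = 3,  b = 60
frame_sum(df, axis = 1, na.rm = TRUE)   # 11, 22, 30 (one sum per row)
```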
See Also
sum for the base R implementation.

h2o.summary
Summarizes the columns of an H2OFrame.
Description
A method for the summary generic. Summarizes the columns of an H2O data frame or subset of columns and rows using vector notation (e.g. dataset[row, col]).
Usage
h2o.summary(object, factors = 6L, exact_quantiles = FALSE, ...)
## S3 method for class 'H2OFrame'
summary(object, factors, exact_quantiles, ...)
Arguments
object  An H2OFrame object.
factors  The number of factors to return in the summary. Default is the top 6.
exact_quantiles  Compute exact quantiles or use approximation. Default is to use approximation.
...  Further arguments passed to or from other methods.
Details
By default, quantiles are computed with an approximation; set the exact_quantiles argument to TRUE to compute exact quantiles instead.
Value
A table displaying the minimum, 1st quartile, median, mean, 3rd quartile and maximum for each numeric column, and the levels and category counts of the levels in each categorical column.
Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.importFile(path = prosPath)
summary(prostate.hex)
summary(prostate.hex$GLEASON)
summary(prostate.hex[, 4:6])
summary(prostate.hex, exact_quantiles = TRUE)

h2o.svd
Singular value decomposition of an H2O data frame using the power method
Description
Singular value decomposition of an H2O data frame using the power method.
Usage
h2o.svd(training_frame, x, destination_key, model_id = NULL,
  validation_frame = NULL, ignore_const_cols = TRUE,
  score_each_iteration = FALSE,
  transform = c("NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE"),
  svd_method = c("GramSVD", "Power", "Randomized"), nv = 1,
  max_iterations = 1000, seed = -1, keep_u = TRUE, u_name = NULL,
  use_all_factor_levels = TRUE, max_runtime_secs = 0)
Arguments
training_frame  Id of the training data frame.
x  A vector containing the character names of the predictors in the model.
destination_key  (Optional) The unique hex key assigned to the resulting model. Automatically generated if none is provided.
model_id  Destination id for this model; auto-generated if not specified.
validation_frame  Id of the validation data frame.
ignore_const_cols  Logical. Ignore constant columns. Defaults to TRUE.
score_each_iteration  Logical. Whether to score during each iteration of model training. Defaults to FALSE.
transform  Transformation of training data. Must be one of: "NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE". Defaults to NONE.
svd_method  Method for computing SVD (Caution: Randomized is currently experimental and unstable). Must be one of: "GramSVD", "Power", "Randomized". Defaults to GramSVD.
nv  Number of right singular vectors. Defaults to 1.
max_iterations  Maximum iterations. Defaults to 1000.
seed  Seed for random numbers (affects certain parts of the algorithm that are stochastic and those might or might not be enabled by default). Defaults to -1 (time-based random number).
keep_u  Logical. Save left singular vectors? Defaults to TRUE.
u_name  Frame key to save left singular vectors.
use_all_factor_levels  Logical. Whether first factor level is included in each categorical expansion. Defaults to TRUE.
max_runtime_secs  Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.
Value
Returns an object of class H2ODimReductionModel.
References
N. Halko, P.G. Martinsson, J.A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions [http://arxiv.org/abs/0909.4061]. SIAM Rev., Survey and Review section, Vol. 53, num. 2, pp. 217-288, June 2011.
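h2o.svd is described as using the power method, and it offers "Power" as an svd_method. The plain-R sketch below shows the core idea for the leading singular triplet only. It repeatedly applies t(A) %*% A to a unit vector and renormalises, then reads off the singular value as the length of A v. power_svd1 is a hypothetical helper, not H2O's implementation. It is checked against base R's svd().

```r
# Power iteration for the leading singular triplet (illustrative sketch).
power_svd1 <- function(A, max_iterations = 1000, tol = 1e-14) {
  v <- rep(1, ncol(A)) / sqrt(ncol(A))      # deterministic unit start vector
  for (i in seq_len(max_iterations)) {
    w <- as.vector(crossprod(A, A %*% v))   # w = t(A) %*% A %*% v
    v_new <- w / sqrt(sum(w^2))             # renormalise to unit length
    converged <- sum((v_new - v)^2) < tol
    v <- v_new
    if (converged) break
  }
  Av <- as.vector(A %*% v)
  d <- sqrt(sum(Av^2))                      # leading singular value
  list(d = d, u = Av / d, v = v)
}

set.seed(1)
A <- matrix(rnorm(50), nrow = 10, ncol = 5)
power_result <- power_svd1(A)
exact <- svd(A)
c(power = power_result$d, svd = exact$d[1]) # the two values should agree closely
```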
Examples
library(h2o)
h2o.init()
ausPath <- system.file("extdata", "australia.csv", package = "h2o")
australia.hex <- h2o.uploadFile(path = ausPath)
h2o.svd(training_frame = australia.hex, nv = 8)

h2o.table
Cross Tabulation and Table Creation in H2O
Description
Uses the cross-classifying factors to build a table of counts at each combination of factor levels.
Usage
h2o.table(x, y = NULL, dense = TRUE)
table.H2OFrame(x, y = NULL, dense = TRUE)
Arguments
x  An H2OFrame object with at most two columns.
y  An H2OFrame similar to x, or NULL.
dense  A logical for dense representation, which lists only non-zero counts, 1 combination per row. Set to FALSE to expand counts across all combinations.
Value
Returns a tabulated H2OFrame object.
Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath, destination_frame = "prostate.hex")
summary(prostate.hex)
# Counts of the ages of all patients
head(h2o.table(prostate.hex[, 3]))
h2o.table(prostate.hex[, 3])
# Two-way table of ages (rows) and race (cols) of all patients
head(h2o.table(prostate.hex[, c(3, 4)]))
h2o.table(prostate.hex[, c(3, 4)])

h2o.tabulate
Tabulation between Two Columns of an H2OFrame
Description
Simple co-occurrence based tabulation of X vs Y, where X and Y are two Vecs in a given dataset. Uses histogram of given resolution in X and Y. Handles numerical/categorical data and missing values. Supports observation weights.
Usage
h2o.tabulate(data, x, y, weights_column = NULL, nbins_x = 50, nbins_y = 50)
Arguments
data  An H2OFrame object.
x    Predictor column.
y    Response column.
weights_column    (Optional) Observation weights column.
nbins_x    Number of bins for the predictor column.
nbins_y    Number of bins for the response column.

Value
Returns two TwoDimTables of three columns each:
count_table: X, Y, counts
response_table: X, meanY, counts

Examples
library(h2o)
h2o.init()
df <- as.h2o(iris)
tab <- h2o.tabulate(data = df, x = "Sepal.Length", y = "Petal.Width",
                    weights_column = NULL, nbins_x = 10, nbins_y = 10)
plot(tab)

h2o.tan    Compute the tangent of x

Description
Compute the tangent of x.

Usage
h2o.tan(x)

Arguments
x    An H2OFrame object.

See Also
tan for the base R implementation.

h2o.tanh    Compute the hyperbolic tangent of x

Description
Compute the hyperbolic tangent of x.

Usage
h2o.tanh(x)

Arguments
x    An H2OFrame object.

See Also
tanh for the base R implementation.

h2o.target_encode_apply    Apply Target Encoding Map to Frame

Description
Applies a target encoding map to an H2OFrame object. Computing target encoding for high-cardinality categorical columns can improve the performance of supervised learning models.

Usage
h2o.target_encode_apply(data, x, y, target_encode_map, holdout_type, fold_column = NULL, blended_avg = TRUE, noise_level = NULL, seed = -1)

Arguments
data    An H2OFrame object to which the target encoding map is applied.
x    A list containing the names or indices of the variables to encode. A target encoding column will be created for each element in the list. Items in the list can be multiple columns. For example, if `x = list(c("A"), c("B", "C"))`, then the resulting frame will have a target encoding column for A and a target encoding column for B & C (in this case, we group by two columns).
y    The name or column index of the response variable in the data. The response variable can be either numeric or binary.
target_encode_map    A list of H2OFrame objects that is the result of the h2o.target_encode_create function.
holdout_type    The holdout type used.
Must be one of: "LeaveOneOut", "KFold", "None".
fold_column    (Optional) The name or column index of the fold column in the data. Defaults to NULL (no `fold_column`). Only required if `holdout_type` = "KFold".
blended_avg    Logical. (Optional) Whether to perform blended average.
noise_level    (Optional) The amount of random noise added to the target encoding. This helps prevent overfitting. Defaults to 0.01 * range of y.
seed    (Optional) A random seed used to generate draws from the uniform distribution for random noise. Defaults to -1.

Value
Returns an H2OFrame object containing the target encoding per record.

See Also
h2o.target_encode_create for creating the target encoding map.

Examples
library(h2o)
h2o.init()
# Get Target Encoding Frame on bank-additional-full data with numeric `y`
data <- h2o.importFile(
  path = "https://s3.amazonaws.com/h2o-public-test-data/smalldata/demos/bank-additional-full.csv",
  destination_frame = "data")
splits <- h2o.splitFrame(data, seed = 1234)
train <- splits[[1]]
test <- splits[[2]]
mapping <- h2o.target_encode_create(data = train, x = list(c("job"), c("job", "marital")), y = "age")
# Apply mapping to the training dataset
train_encode <- h2o.target_encode_apply(data = train, x = list(c("job"), c("job", "marital")), y = "age",
                                        target_encode_map = mapping, holdout_type = "LeaveOneOut")
# Apply mapping to a test dataset
test_encode <- h2o.target_encode_apply(data = test, x = list(c("job"), c("job", "marital")), y = "age",
                                       target_encode_map = mapping, holdout_type = "None")

h2o.target_encode_create    Create Target Encoding Map

Description
Creates a target encoding map based on group-by columns (`x`) and a numeric or binary target column (`y`). Computing target encoding for high-cardinality categorical columns can improve the performance of supervised learning models.

Usage
h2o.target_encode_create(data, x, y, fold_column = NULL)

Arguments
data    An H2OFrame object with which to create the target encoding map.
x    A list containing the names or indices of the variables to encode. A target encoding map will be created for each element in the list. Items in the list can be multiple columns. For example, if `x = list(c("A"), c("B", "C"))`, then there will be one mapping frame for A and one mapping frame for B & C (in this case, we group by two columns).
y    The name or column index of the response variable in the data. The response variable can be either numeric or binary.
fold_column    (Optional) The name or column index of the fold column in the data. Defaults to NULL (no `fold_column`).

Value
Returns a list of H2OFrame objects containing the target encoding mapping for each column in `x`.

See Also
h2o.target_encode_apply for applying the target encoding mapping to a frame.

Examples
library(h2o)
h2o.init()
# Get Target Encoding Map on bank-additional-full data with numeric response
data <- h2o.importFile(
  path = "https://s3.amazonaws.com/h2o-public-test-data/smalldata/demos/bank-additional-full.csv",
  destination_frame = "data")
mapping_age <- h2o.target_encode_create(data = data, x = list(c("job"), c("job", "marital")), y = "age")
head(mapping_age)
# Get Target Encoding Map on bank-additional-full data with binary response
mapping_y <- h2o.target_encode_create(data = data, x = list(c("job"), c("job", "marital")), y = "y")
head(mapping_y)

h2o.toFrame    Convert a word2vec model into an H2OFrame

Description
Converts a given word2vec model into an H2OFrame. The frame represents learned word embeddings.

Usage
h2o.toFrame(word2vec)

Arguments
word2vec    A word2vec model.
Examples
library(h2o)
h2o.init()
# Build a dummy word2vec model
data <- as.character(as.h2o(c("a", "b", "a")))
w2v.model <- h2o.word2vec(data, sent_sample_rate = 0, min_word_freq = 0, epochs = 1, vec_size = 2)
# Convert the learned word embeddings into an H2OFrame
h2o.toFrame(w2v.model) # -> Frame made of 2 rows and 2 columns

h2o.tokenize    Tokenize String

Description
h2o.tokenize is similar to h2o.strsplit; the difference is that h2o.tokenize stores the tokenized text in a single column, which makes further processing (filtering stop words, the word2vec algorithm, ...) easier.

Usage
h2o.tokenize(x, split)

Arguments
x    The column or columns whose strings to tokenize.
split    The regular expression to split on.

Value
An H2OFrame with a single column representing the tokenized strings. Original rows of the input frame are separated by NA.

Examples
library(h2o)
h2o.init()
string_to_tokenize <- as.h2o("Split at every character and tokenize.")
tokenize_string <- h2o.tokenize(as.character(string_to_tokenize), "")

h2o.tolower    Convert strings to lowercase

Description
Convert strings to lowercase.

Usage
h2o.tolower(x)

Arguments
x    An H2OFrame object whose strings should be lowercased.

Value
An H2OFrame with all entries in lowercase format.

Examples
library(h2o)
h2o.init()
string_to_lower <- as.h2o("ABCDE")
lowered_string <- h2o.tolower(string_to_lower)

h2o.topN    H2O topN

Description
Extract the top N percent of values of a column and return them in an H2OFrame.

Usage
h2o.topN(x, column, nPercent)

Arguments
x    An H2OFrame.
column    A column name or column index to grab the top N percent values from.
nPercent    The top percentage of values to grab.

Value
An H2OFrame with 2 columns. The first column contains the original row indices; the second column contains the topN values.

h2o.totss    Get the total sum of squares.

Description
If "train", "valid", and "xval" parameters are FALSE (default), then the training totss value is returned.
If more than one parameter is set to TRUE, then a named vector of totss values is returned, where the names are "train", "valid" or "xval".

Usage
h2o.totss(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments
object    An H2OClusteringModel object.
train    Retrieve the training total sum of squares.
valid    Retrieve the validation total sum of squares.
xval    Retrieve the cross-validation total sum of squares.

h2o.tot_withinss    Get the total within cluster sum of squares.

Description
If "train", "valid", and "xval" parameters are FALSE (default), then the training tot_withinss value is returned. If more than one parameter is set to TRUE, then a named vector of tot_withinss values is returned, where the names are "train", "valid" or "xval".

Usage
h2o.tot_withinss(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments
object    An H2OClusteringModel object.
train    Retrieve the training total within cluster sum of squares.
valid    Retrieve the validation total within cluster sum of squares.
xval    Retrieve the cross-validation total within cluster sum of squares.

h2o.toupper    Convert strings to uppercase

Description
Convert strings to uppercase.

Usage
h2o.toupper(x)

Arguments
x    An H2OFrame object whose strings should be uppercased.

Value
An H2OFrame with all entries in uppercase format.

Examples
library(h2o)
h2o.init()
string_to_upper <- as.h2o("abcde")
upper_string <- h2o.toupper(string_to_upper)

h2o.transform    Transform words (or sequences of words) to vectors using a word2vec model.

Description
Transform words (or sequences of words) to vectors using a word2vec model.

Usage
h2o.transform(word2vec, words, aggregate_method = c("NONE", "AVERAGE"))

Arguments
word2vec    A word2vec model.
words    An H2OFrame made of a single column containing source words.
aggregate_method    Specifies how to aggregate sequences of words. If method is "NONE", then no aggregation is performed and each input word is mapped to a single word-vector.
If method is "AVERAGE", then input is treated as sequences of words delimited by NA. Each word of a sequence is internally mapped to a vector, and vectors belonging to the same sentence are averaged and returned in the result.

Examples
library(h2o)
h2o.init()
# Build a dummy word2vec model
data <- as.character(as.h2o(c("a", "b", "a")))
w2v.model <- h2o.word2vec(data, sent_sample_rate = 0, min_word_freq = 0, epochs = 1, vec_size = 2)
# Transform words to vectors without aggregation
sentences <- as.character(as.h2o(c("b", "c", "a", NA, "b")))
h2o.transform(w2v.model, sentences) # -> 5 rows total, 2 rows NA ("c" is not in the vocabulary)
# Transform words to vectors and return average vector for each sentence
h2o.transform(w2v.model, sentences, aggregate_method = "AVERAGE") # -> 2 rows

h2o.trim    Trim Space

Description
Trim leading and trailing white space from strings.

Usage
h2o.trim(x)

Arguments
x    The column whose strings should be trimmed.

Examples
library(h2o)
h2o.init()
string_to_trim <- as.h2o("r tutorial")
trim_string <- h2o.trim(string_to_trim)

h2o.trunc    Truncate values in x toward 0

Description
trunc takes a single numeric argument x and returns a numeric vector containing the integers formed by truncating the values in x toward 0.

Usage
h2o.trunc(x)

Arguments
x    An H2OFrame object.

See Also
trunc for the base R implementation.

h2o.unique    H2O Unique

Description
Extract unique values in the column.

Usage
h2o.unique(x)

Arguments
x    An H2OFrame object.

Value
Returns an H2OFrame object.

h2o.var    Variance of a column or covariance of columns.

Description
Compute the variance or covariance matrix of one or two H2OFrames.

Usage
h2o.var(x, y = NULL, na.rm = FALSE, use)
var(x, y = NULL, na.rm = FALSE, use)

Arguments
x    An H2OFrame object.
y    NULL (default) or an H2OFrame. The default is equivalent to y = x.
na.rm    Logical. Should missing values be removed?
use    An optional character string indicating how to handle missing values.
This must be one of the following:
"everything" - outputs NaNs whenever one of its contributing observations is missing.
"all.obs" - presence of missing observations will throw an error.
"complete.obs" - discards missing values along with all observations in their rows so that only complete observations are used.

See Also
var for the base R implementation. h2o.sd for standard deviation.

Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
var(prostate.hex$AGE)

h2o.varimp    Retrieve the variable importance.

Description
Retrieve the variable importance.

Usage
h2o.varimp(object)

Arguments
object    An H2OModel object.

h2o.varimp_plot    Plot Variable Importances

Description
Plot Variable Importances.

Usage
h2o.varimp_plot(model, num_of_features = NULL)

Arguments
model    A trained model (accepts a trained random forest, GBM, or deep learning model; will use h2o.std_coef_plot for a trained GLM).
num_of_features    The number of features shown in the plot (default is 10 or all if less than 10).

See Also
h2o.std_coef_plot for GLM.

Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
hex <- h2o.importFile(prosPath)
hex[,2] <- as.factor(hex[,2])
model <- h2o.gbm(x = 3:9, y = 2, training_frame = hex, distribution = "bernoulli")
h2o.varimp_plot(model)
# for deep learning set the variable_importances parameter to TRUE
iris.hex <- as.h2o(iris)
iris.dl <- h2o.deeplearning(x = 1:4, y = 5, training_frame = iris.hex, variable_importances = TRUE)
h2o.varimp_plot(iris.dl)

h2o.week    Convert Milliseconds to Week of Week Year in H2O Datasets

Description
Converts the entries of an H2OFrame object from milliseconds to weeks of the week year (starting from 1).

Usage
h2o.week(x)
week(x)
## S3 method for class 'H2OFrame'
week(x)

Arguments
x    An H2OFrame object.

Value
An H2OFrame object containing the entries of x converted to weeks of the week year.
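The conversion h2o.week performs can be approximated locally to check expected values. The sketch below is plain base R, not H2O, and rests on two assumptions that the text above does not confirm: that "week of the week year" follows ISO-8601 week numbering (weeks start on Monday; week 1 contains the year's first Thursday), and that timestamps are interpreted in UTC. The helper name ms_to_iso_week is hypothetical.

```r
# Base R sketch (not H2O). Assumptions: ISO-8601 week numbering and UTC time zone.
# ms_to_iso_week is a hypothetical helper, not part of the h2o package.
ms_to_iso_week <- function(ms) {
  # H2O time columns hold milliseconds since the Unix epoch; POSIXct wants seconds
  t <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
  as.integer(format(t, "%V"))  # %V = ISO-8601 week of the year (01-53)
}

ms <- c(1483228800000, 1514764800000)  # 2017-01-01 and 2018-01-01, 00:00 UTC
ms_to_iso_week(ms)                     # 52 1
```

Under ISO numbering, 2017-01-01 is a Sunday and still belongs to week 52 of the previous week year. This is why a "week year" can differ from the calendar year near January 1.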
See Also
h2o.month

h2o.weights    Retrieve the respective weight matrix

Description
Retrieve the respective weight matrix.

Usage
h2o.weights(object, matrix_id = 1)

Arguments
object    An H2OModel or H2OModelMetrics.
matrix_id    An integer, ranging from 1 to number of layers + 1, that specifies the weight matrix to return.

h2o.which    Which indices are TRUE?

Description
Give the TRUE indices of a logical object, allowing for array indices.

Usage
h2o.which(x)

Arguments
x    An H2OFrame object.

Value
Returns an H2OFrame object.

See Also
which for the base R method.

Examples
library(h2o)
h2o.init()
iris.hex <- as.h2o(iris)
h2o.which(iris.hex[,1]==4.4)

h2o.which_max    Which index contains the max value?

Description
Get the index of the max value in a column or row.

Usage
h2o.which_max(x, na.rm = TRUE, axis = 0)
which.max.H2OFrame(x, na.rm = TRUE, axis = 0)
which.min.H2OFrame(x, na.rm = TRUE, axis = 0)

Arguments
x    An H2OFrame object.
na.rm    Logical. Indicate whether missing values should be removed.
axis    Integer. Indicate whether to search down a column (0) or across a row (1).

Value
Returns an H2OFrame object.

See Also
which.max for the base R method.

h2o.which_min    Which index contains the min value?

Description
Get the index of the min value in a column or row.

Usage
h2o.which_min(x, na.rm = TRUE, axis = 0)

Arguments
x    An H2OFrame object.
na.rm    Logical. Indicate whether missing values should be removed.
axis    Integer. Indicate whether to search down a column (0) or across a row (1).

Value
Returns an H2OFrame object.

See Also
which.min for the base R method.

h2o.withinss    Get the Within SS

Description
Get the within-cluster sums of squares.

Usage
h2o.withinss(object)

Arguments
object    An H2OClusteringModel object.
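The clustering accessors h2o.totss, h2o.tot_withinss and h2o.withinss (together with h2o.betweenss) report the standard k-means sums of squares. Base R's stats::kmeans returns the same four quantities, so a local sketch on the iris measurements can show how they fit together. This uses kmeans, not H2O; the correspondence to the H2O accessors in the comments is inferred from their descriptions. Exact values depend on the random start.

```r
# Base R sketch (stats::kmeans, not H2O): how the sums of squares relate
set.seed(42)
X  <- as.matrix(iris[, 1:4])
km <- kmeans(X, centers = 3, nstart = 10)

km$totss         # total SS about the grand mean   (cf. h2o.totss)
km$withinss      # one within-cluster SS per cluster (cf. h2o.withinss)
km$tot.withinss  # sum of the per-cluster values    (cf. h2o.tot_withinss)
km$betweenss     # between-cluster SS                (cf. h2o.betweenss)

# Recompute the total SS by hand: squared deviations from the column means
totss_manual <- sum(sweep(X, 2, colMeans(X))^2)

# Identity: totss = tot.withinss + betweenss
all.equal(km$totss, km$tot.withinss + km$betweenss)  # TRUE
```

Because the total is fixed by the data, lowering the total within-cluster SS (tighter clusters) necessarily raises the between-cluster SS by the same amount.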
h2o.word2vec    Trains a word2vec model on a String column of an H2O data frame

Description
Trains a word2vec model on a String column of an H2O data frame.

Usage
h2o.word2vec(training_frame = NULL, model_id = NULL, min_word_freq = 5, word_model = c("SkipGram"), norm_model = c("HSM"), vec_size = 100, window_size = 5, sent_sample_rate = 0.001, init_learning_rate = 0.025, epochs = 5, pre_trained = NULL, max_runtime_secs = 0)

Arguments
training_frame    Id of the training data frame.
model_id    Destination id for this model; auto-generated if not specified.
min_word_freq    Discard words that appear fewer than this many times. Defaults to 5.
word_model    Use the Skip-Gram model. Must be one of: "SkipGram". Defaults to SkipGram.
norm_model    Use Hierarchical Softmax. Must be one of: "HSM". Defaults to HSM.
vec_size    Set size of word vectors. Defaults to 100.
window_size    Set max skip length between words. Defaults to 5.
sent_sample_rate    Set threshold for occurrence of words. Those that appear with higher frequency in the training data will be randomly down-sampled; useful range is (0, 1e-5). Defaults to 0.001.
init_learning_rate    Set the starting learning rate. Defaults to 0.025.
epochs    Number of training iterations to run. Defaults to 5.
pre_trained    Id of a data frame that contains a pre-trained (external) word2vec model.
max_runtime_secs    Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.

h2o.xgboost    Build an eXtreme Gradient Boosting model

Description
Builds an eXtreme Gradient Boosting model using the native XGBoost backend.
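Among the arguments of h2o.xgboost, weights_column is described as follows: a row weight of 0 is equivalent to excluding the row, and a weight of 2 is equivalent to repeating it twice. For a weighted squared-error loss this holds exactly, which a base R sketch with stats::lm can check. The sketch only illustrates that statement; it does not call XGBoost or H2O, and the data are hypothetical.

```r
# Base R sketch (stats::lm, not H2O or XGBoost); the data are hypothetical.
d <- data.frame(x = c(1, 2, 3, 4, 5),
                y = c(1.2, 1.9, 3.2, 3.8, 5.3))

# Weight 2 on row 2 versus physically repeating row 2
w_two <- c(1, 2, 1, 1, 1)
fit_weighted   <- lm(y ~ x, data = d, weights = w_two)
fit_duplicated <- lm(y ~ x, data = d[c(1, 2, 2, 3, 4, 5), ])

# Weight 0 on row 2 versus dropping row 2
w_zero <- c(1, 0, 1, 1, 1)
fit_zero    <- lm(y ~ x, data = d, weights = w_zero)
fit_dropped <- lm(y ~ x, data = d[-2, ])

all.equal(coef(fit_weighted), coef(fit_duplicated))  # TRUE
all.equal(coef(fit_zero), coef(fit_dropped))         # TRUE
```

The weight multiplies each row's contribution to the loss, so the weighted fit needs no extra rows in memory. This matches the note that weights "do not increase the size of the data frame".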
Usage
h2o.xgboost(x, y, training_frame, model_id = NULL, validation_frame = NULL,
  nfolds = 0, keep_cross_validation_predictions = FALSE,
  keep_cross_validation_fold_assignment = FALSE, score_each_iteration = FALSE,
  fold_assignment = c("AUTO", "Random", "Modulo", "Stratified"),
  fold_column = NULL, ignore_const_cols = TRUE, offset_column = NULL,
  weights_column = NULL, stopping_rounds = 0,
  stopping_metric = c("AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC",
    "lift_top_group", "misclassification", "mean_per_class_error"),
  stopping_tolerance = 0.001, max_runtime_secs = 0, seed = -1,
  distribution = c("AUTO", "bernoulli", "multinomial", "gaussian", "poisson", "gamma",
    "tweedie", "laplace", "quantile", "huber"),
  tweedie_power = 1.5,
  categorical_encoding = c("AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary",
    "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited"),
  quiet_mode = TRUE, ntrees = 50, max_depth = 6, min_rows = 1, min_child_weight = 1,
  learn_rate = 0.3, eta = 0.3, sample_rate = 1, subsample = 1, col_sample_rate = 1,
  colsample_bylevel = 1, col_sample_rate_per_tree = 1, colsample_bytree = 1,
  max_abs_leafnode_pred = 0, max_delta_step = 0, score_tree_interval = 0,
  min_split_improvement = 0, gamma = 0, max_bins = 256, max_leaves = 0,
  min_sum_hessian_in_leaf = 100, min_data_in_leaf = 0,
  sample_type = c("uniform", "weighted"), normalize_type = c("tree", "forest"),
  rate_drop = 0, one_drop = FALSE, skip_drop = 0,
  tree_method = c("auto", "exact", "approx", "hist"),
  grow_policy = c("depthwise", "lossguide"), booster = c("gbtree", "gblinear", "dart"),
  reg_lambda = 0, reg_alpha = 0, dmatrix_type = c("auto", "dense", "sparse"),
  backend = c("auto", "gpu", "cpu"), gpu_id = 0, verbose = FALSE)

Arguments
x    (Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used.
y    The name or column index of the response variable in the data.
The response must be either a numeric or a categorical/factor variable. If the response is numeric, then a regression model will be trained; otherwise a classification model will be trained.
training_frame    Id of the training data frame.
model_id    Destination id for this model; auto-generated if not specified.
validation_frame    Id of the validation data frame.
nfolds    Number of folds for K-fold cross-validation (0 to disable or >= 2). Defaults to 0.
keep_cross_validation_predictions    Logical. Whether to keep the predictions of the cross-validation models. Defaults to FALSE.
keep_cross_validation_fold_assignment    Logical. Whether to keep the cross-validation fold assignment. Defaults to FALSE.
score_each_iteration    Logical. Whether to score during each iteration of model training. Defaults to FALSE.
fold_assignment    Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems. Must be one of: "AUTO", "Random", "Modulo", "Stratified". Defaults to AUTO.
fold_column    Column with cross-validation fold index assignment per observation.
ignore_const_cols    Logical. Ignore constant columns. Defaults to TRUE.
offset_column    Offset column. This will be added to the combination of columns before applying the link function.
weights_column    Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
stopping_rounds    Early stopping based on convergence of stopping_metric.
Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable). Defaults to 0.
stopping_metric    Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Must be one of: "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "lift_top_group", "misclassification", "mean_per_class_error". Defaults to AUTO.
stopping_tolerance    Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much). Defaults to 0.001.
max_runtime_secs    Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.
seed    Seed for random numbers (affects certain parts of the algo that are stochastic and those might or might not be enabled by default). Defaults to -1 (time-based random number).
distribution    Distribution function. Must be one of: "AUTO", "bernoulli", "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber". Defaults to AUTO.
tweedie_power    Tweedie power for Tweedie regression, must be between 1 and 2. Defaults to 1.5.
categorical_encoding    Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO.
quiet_mode    Logical. Enable quiet mode. Defaults to TRUE.
ntrees    (same as n_estimators) Number of trees. Defaults to 50.
max_depth    Maximum tree depth. Defaults to 6.
min_rows    (same as min_child_weight) Fewest allowed (weighted) observations in a leaf. Defaults to 1.
min_child_weight    (same as min_rows) Fewest allowed (weighted) observations in a leaf. Defaults to 1.
learn_rate    (same as eta) Learning rate (from 0.0 to 1.0). Defaults to 0.3.
eta    (same as learn_rate) Learning rate (from 0.0 to 1.0). Defaults to 0.3.
sample_rate    (same as subsample) Row sample rate per tree (from 0.0 to 1.0). Defaults to 1.
subsample    (same as sample_rate) Row sample rate per tree (from 0.0 to 1.0). Defaults to 1.
col_sample_rate    (same as colsample_bylevel) Column sample rate (from 0.0 to 1.0). Defaults to 1.
colsample_bylevel    (same as col_sample_rate) Column sample rate (from 0.0 to 1.0). Defaults to 1.
col_sample_rate_per_tree    (same as colsample_bytree) Column sample rate per tree (from 0.0 to 1.0). Defaults to 1.
colsample_bytree    (same as col_sample_rate_per_tree) Column sample rate per tree (from 0.0 to 1.0). Defaults to 1.
max_abs_leafnode_pred    (same as max_delta_step) Maximum absolute value of a leaf node prediction. Defaults to 0.0.
max_delta_step    (same as max_abs_leafnode_pred) Maximum absolute value of a leaf node prediction. Defaults to 0.0.
score_tree_interval    Score the model after every so many trees. Disabled if set to 0. Defaults to 0.
min_split_improvement    (same as gamma) Minimum relative improvement in squared error reduction for a split to happen. Defaults to 0.0.
gamma    (same as min_split_improvement) Minimum relative improvement in squared error reduction for a split to happen. Defaults to 0.0.
max_bins    For tree_method=hist only: maximum number of bins. Defaults to 256.
max_leaves    For tree_method=hist only: maximum number of leaves. Defaults to 0.
min_sum_hessian_in_leaf    For tree_method=hist only: the minimum sum of hessian in a leaf to keep splitting. Defaults to 100.0.
min_data_in_leaf    For tree_method=hist only: the minimum data in a leaf to keep splitting. Defaults to 0.0.
sample_type    For booster=dart only: sample_type. Must be one of: "uniform", "weighted". Defaults to uniform.
normalize_type    For booster=dart only: normalize_type. Must be one of: "tree", "forest". Defaults to tree.
rate_drop    For booster=dart only: rate_drop (0..1). Defaults to 0.0.
one_drop    Logical. For booster=dart only: one_drop. Defaults to FALSE.
skip_drop    For booster=dart only: skip_drop (0..1). Defaults to 0.0.
tree_method    Tree method. Must be one of: "auto", "exact", "approx", "hist". Defaults to auto.
grow_policy    Grow policy: depthwise is standard GBM, lossguide is LightGBM. Must be one of: "depthwise", "lossguide". Defaults to depthwise.
booster    Booster type. Must be one of: "gbtree", "gblinear", "dart". Defaults to gbtree.
reg_lambda    L2 regularization. Defaults to 0.0.
reg_alpha    L1 regularization. Defaults to 0.0.
dmatrix_type    Type of DMatrix. For sparse, NAs and 0 are treated equally. Must be one of: "auto", "dense", "sparse". Defaults to auto.
backend    Backend. By default (auto), a GPU is used if available. Must be one of: "auto", "gpu", "cpu". Defaults to auto.
gpu_id    Which GPU to use. Defaults to 0.
verbose    Logical. Print scoring history to the console (metrics per tree for GBM, DRF, and XGBoost; metrics per epoch for Deep Learning). Defaults to FALSE.

h2o.xgboost.available    Determines whether an XGBoost model can be built

Description
Ask the H2O server whether an XGBoost model can be built (depends on the availability of the native backend). Returns TRUE if an XGBoost model can be built, or FALSE otherwise.

Usage
h2o.xgboost.available()

h2o.year    Convert Milliseconds to Years in H2O Datasets

Description
Convert the entries of an H2OFrame object from milliseconds to years, indexed starting from 1900.

Usage
h2o.year(x)
year(x)
## S3 method for class 'H2OFrame'
year(x)

Arguments
x    An H2OFrame object.

Details
This method calls the function of the MutableDateTime class in Java.

Value
An H2OFrame object containing the entries of x converted to years.

See Also
h2o.month

H2OAutoML-class    The H2OAutoML class

Description
This class represents an H2OAutoML object.

H2OClusteringModel-class    The H2OClusteringModel object.

Description
This virtual class represents a clustering model built by H2O.
Details
This object has slots for the key, which is a character string that points to the model key existing in the H2O cloud, and for the data used to build the model (an object of class H2OFrame).

Slots
model_id    A character string specifying the key for the model fit in the H2O cloud's key-value store.
algorithm    A character string specifying the algorithm that was used to fit the model.
parameters    A list containing the parameter settings that were used to fit the model that differ from the defaults.
allparameters    A list containing all parameters used to fit the model.
model    A list containing the characteristics of the model returned by the algorithm.
size    The number of points in each cluster.
totss    Total sum of squared error to grand mean.
withinss    A vector of within-cluster sum of squared error.
tot_withinss    Total within-cluster sum of squared error.
betweenss    Between-cluster sum of squared error.

H2OConnection-class    The H2OConnection class.

Description
This class represents a connection to an H2O cloud.

Usage
## S4 method for signature 'H2OConnection'
show(object)

Arguments
object    An H2OConnection object.

Details
Because H2O is not a master-slave architecture, there is no restriction on which H2O node is used to establish the connection between R (the client) and H2O (the server). A new H2O connection is established via the h2o.init() function, which takes as parameters the `ip` and `port` of the machine running an instance to connect with. The default behavior is to connect with a local instance of H2O at port 54321, or to boot a new local instance if one is not found at port 54321.

Slots
ip    A character string specifying the IP address of the H2O cloud.
port    A numeric value specifying the port number of the H2O cloud.
proxy    A character string specifying the proxy path of the H2O cloud.
https    Set this to TRUE to use https instead of http.
insecure    Set this to TRUE to disable SSL certificate checking.
username    Username to login with.
password    Password to login with.
cookies    Cookies to add to requests.
context_path    Context path which is appended to the H2O server location.
mutable    An H2OConnectionMutableState object to hold the mutable state for the H2O connection.

H2OConnectionMutableState    The H2OConnectionMutableState class

Description
This class represents the mutable aspects of a connection to an H2O cloud.

Slots
session_id    A character string specifying the H2O session identifier.
key_count    An integer value specifying the number of keys generated for the session_id.

H2OCoxPHModel-class    The H2OCoxPHModel object.

Description
Virtual object representing H2O’s CoxPH Model.

Usage
## S4 method for signature 'H2OCoxPHModel'
show(object)

Arguments
object    An H2OCoxPHModel object.

H2OCoxPHModelSummary-class    The H2OCoxPHModelSummary object.

Description
Wrapper object for summary information compatible with the survival package.

Slots
summary    A list containing a summary compatible with the CoxPH summary used in the survival package.

H2OFrame-class    The H2OFrame class

Description
This class represents an H2OFrame object.

H2OFrame-Extract    Extract or Replace Parts of an H2OFrame Object

Description
Operators to extract or replace parts of H2OFrame objects.

Usage
## S3 method for class 'H2OFrame'
data[row, col, drop = TRUE]
## S3 method for class 'H2OFrame'
x$name
## S3 method for class 'H2OFrame'
x[[i, exact = TRUE]]
## S3 replacement method for class 'H2OFrame'
data[row, col, ...] <- value
## S3 replacement method for class 'H2OFrame'
data$name <- value
## S3 replacement method for class 'H2OFrame'
data[[name]] <- value

Arguments
data    Object from which to extract element(s) or in which to replace element(s).
row    Index specifying row element(s) to extract or replace. Indices are numeric or character vectors or empty (missing) or will be matched to the names.
col    Index specifying column element(s) to extract or replace.
drop    Unused.
x    An H2OFrame.
name    A literal character string or a name (possibly backtick quoted).
i    Index.
exact    Controls possible partial matching of [[ when extracting a character.
...    Further arguments passed to or from other methods.
value    To be assigned.

H2OGrid-class    H2O Grid

Description
A class to contain the information about grid results. The show method formats the grid object in a user-friendly way.

Usage
## S4 method for signature 'H2OGrid'
show(object)

Arguments
object    An H2OGrid object.

Slots
grid_id    The final identifier of the grid.
model_ids    List of model IDs which are included in the grid object.
hyper_names    List of parameter names used for grid search.
failed_params    List of model parameters which caused a failure during model building; it can contain a null value.
failure_details    List of detailed messages which correspond to the failed_params field.
failure_stack_traces    List of stack traces corresponding to model failures reported by the failed_params and failure_details fields.
failed_raw_params    List of failed raw parameters.
summary_table    Table of models built with parameters and metric information.

See Also
H2OModel for the final model types.

H2OModel-class    The H2OModel object.

Description
This virtual class represents a model built by H2O.

Usage
## S4 method for signature 'H2OModel'
show(object)

Arguments
object    An H2OModel object.

Details
This object has slots for the key, which is a character string that points to the model key existing in the H2O cloud, and for the data used to build the model (an object of class H2OFrame).

Slots
model_id    A character string specifying the key for the model fit in the H2O cloud's key-value store.
algorithm    A character string specifying the algorithm that was used to fit the model.
parameters    A list containing the parameter settings that were used to fit the model that differ from the defaults.
allparameters   A list containing all parameters used to fit the model.
have_pojo   A logical indicating whether export to POJO is supported.
have_mojo   A logical indicating whether export to MOJO is supported.
model   A list containing the characteristics of the model returned by the algorithm.

H2OModelFuture-class   H2O Future Model
Description
A class to contain the information for background model jobs.
Slots
job_key   A character key identifying the job process.
model_id   The final identifier for the model.
See Also
H2OModel for the final model types.

H2OModelMetrics-class   The H2OModelMetrics Object.
Description
A class for constructing performance measures of H2O models.
Usage
## S4 method for signature 'H2OModelMetrics'
show(object)
## S4 method for signature 'H2OBinomialMetrics'
show(object)
## S4 method for signature 'H2OMultinomialMetrics'
show(object)
## S4 method for signature 'H2OOrdinalMetrics'
show(object)
## S4 method for signature 'H2ORegressionMetrics'
show(object)
## S4 method for signature 'H2OClusteringMetrics'
show(object)
## S4 method for signature 'H2OAutoEncoderMetrics'
show(object)
## S4 method for signature 'H2ODimReductionMetrics'
show(object)
Arguments
object   An H2OModelMetrics object.

housevotes   United States Congressional Voting Records 1984
Description
This data set includes votes for each of the U.S. House of Representatives Congressmen on the 16 key votes identified by the CQA. The CQA lists nine different types of votes: voted for, paired for, and announced for (these three simplified to yea); voted against, paired against, and announced against (these three simplified to nay); and voted present, voted present to avoid conflict of interest, and did not vote or otherwise make a position known (these three simplified to an unknown disposition).
Format
A data frame with 435 rows and 17 columns.
Source
Congressional Quarterly Almanac, 98th Congress, 2nd session 1984, Volume XL: Congressional Quarterly Inc., Washington, D.C., 1985.
References
Newman, D.J., Hettich, S., Blake, C.L., and Merz, C.J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.

iris   Edgar Anderson's Iris Data
Description
Measurements in centimeters of the sepal length and width and petal length and width, respectively, for three species of iris flowers.
Format
A data frame with 150 rows and 5 columns.
Source
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, 179-188.
The data were collected by Anderson, Edgar (1935). The irises of the Gaspe Peninsula. Bulletin of the American Iris Society, 59, 2-5.

is.character   Check if character
Description
Check if character.
Usage
is.character(x)
Arguments
x   An H2OFrame object.

is.factor   Check if factor
Description
Check if factor.
Usage
is.factor(x)
Arguments
x   An H2OFrame object.

is.h2o   Is H2O Frame object
Description
Test if an object is an H2O Frame.
Usage
is.h2o(x)
Arguments
x   An R object.

is.numeric   Check if numeric
Description
Check if numeric.
Usage
is.numeric(x)
Arguments
x   An H2OFrame object.

Logical-or   Logical or for H2OFrames
Description
Logical or for H2OFrames.
Usage
"||"(x, y)
Arguments
x   An H2OFrame object.
y   An H2OFrame object.

ModelAccessors   Accessor Methods for H2OModel Object
Description
Function accessor methods for various H2O output fields.
Usage
getParms(object)
## S4 method for signature 'H2OModel'
getParms(object)
getCenters(object)
getCentersStd(object)
getWithinSS(object)
getTotWithinSS(object)
getBetweenSS(object)
getTotSS(object)
getIterations(object)
getClusterSizes(object)
## S4 method for signature 'H2OClusteringModel'
getCenters(object)
## S4 method for signature 'H2OClusteringModel'
getCentersStd(object)
## S4 method for signature 'H2OClusteringModel'
getWithinSS(object)
## S4 method for signature 'H2OClusteringModel'
getTotWithinSS(object)
## S4 method for signature 'H2OClusteringModel'
getBetweenSS(object)
## S4 method for signature 'H2OClusteringModel'
getTotSS(object)
## S4 method for signature 'H2OClusteringModel'
getIterations(object)
## S4 method for signature 'H2OClusteringModel'
getClusterSizes(object)
Arguments
object   An H2OModel class object.

names.H2OFrame   Column names of an H2OFrame
Description
Column names of an H2OFrame.
Usage
## S3 method for class 'H2OFrame'
names(x)
Arguments
x   An H2OFrame.

Ops.H2OFrame   S3 Group Generic Functions for H2O
Description
Methods for group generic functions and H2O objects.
Usage
## S3 method for class 'H2OFrame'
Ops(e1, e2)
## S3 method for class 'H2OFrame'
Math(x, ...)
## S3 method for class 'H2OFrame'
Summary(x, ..., na.rm)
## S3 method for class 'H2OFrame'
!x
## S3 method for class 'H2OFrame'
is.na(x)
## S3 method for class 'H2OFrame'
t(x)
log(x, ...)
log10(x)
log2(x)
log1p(x)
trunc(x, ...)
x %*% y
nrow.H2OFrame(x)
ncol.H2OFrame(x)
## S3 method for class 'H2OFrame'
length(x)
h2o.length(x)
## S3 replacement method for class 'H2OFrame'
names(x) <- value
colnames(x) <- value
Arguments
e1   Object.
e2   Object.
x   Object.
...   Further arguments passed to or from other methods.
na.rm   Logical.
Whether or not missing values should be removed.
y   Object.
value   The value to be assigned.

plot.H2OModel   Plot an H2O Model
Description
Plots the training set (and validation set, if available) scoring history for an H2O model.
Usage
## S3 method for class 'H2OModel'
plot(x, timestep = "AUTO", metric = "AUTO", ...)
Arguments
x   A fitted H2OModel object for which the scoring history plot is desired.
timestep   A unit of measurement for the x-axis.
metric   A unit of measurement for the y-axis.
...   Additional arguments to pass on.
Details
This method dispatches on the type of H2O model to select the correct scoring history. The timestep and metric arguments are restricted to what is available in the scoring history for a particular type of model.
Value
Returns a scoring history plot.
See Also
h2o.deeplearning, h2o.gbm, h2o.glm, h2o.randomForest for model generation in h2o.
Examples
# The whole example is guarded, so nothing runs when mlbench is not installed
if (requireNamespace("mlbench", quietly = TRUE)) {
  library(h2o)
  h2o.init()
  # Simulated Friedman #1 regression data, pushed into H2O
  df <- as.h2o(mlbench::mlbench.friedman1(10000, 1))
  # Random ~80/20 split into training and validation frames
  rng <- h2o.runif(df, seed = 1234)
  train <- df[rng < 0.8, ]
  valid <- df[rng >= 0.8, ]
  # Score every iteration so the scoring history has one point per tree
  gbm <- h2o.gbm(x = 1:10, y = "y",
                 training_frame = train, validation_frame = valid,
                 ntrees = 500, learn_rate = 0.01, score_each_iteration = TRUE)
  plot(gbm)
  plot(gbm, timestep = "duration", metric = "deviance")
  plot(gbm, timestep = "number_of_trees", metric = "deviance")
  plot(gbm, timestep = "number_of_trees", metric = "rmse")
  plot(gbm, timestep = "number_of_trees", metric = "mae")
}

plot.H2OTabulate   Plot an H2O Tabulate Heatmap
Description
Plots the simple co-occurrence based tabulation of X vs Y as a heatmap, where X and Y are two Vecs in a given dataset. This function requires the suggested ggplot2 package.
Usage
## S3 method for class 'H2OTabulate'
plot(x, xlab = x$cols[1], ylab = x$cols[2], base_size = 12, ...)
Arguments
x   An H2OTabulate object for which the heatmap plot is desired.
xlab   A title for the x-axis. Defaults to what is specified in the given H2OTabulate object.
ylab   A title for the y-axis.
Defaults to what is specified in the given H2OTabulate object.
base_size   Base font size for the plot.
...   Additional arguments to pass on.
Value
Returns a ggplot2-based heatmap of co-occurrence.
See Also
h2o.tabulate
Examples
library(h2o)
h2o.init()
df <- as.h2o(iris)
tab <- h2o.tabulate(data = df, x = "Sepal.Length", y = "Petal.Width",
                    weights_column = NULL, nbins_x = 10, nbins_y = 10)
plot(tab)

predict.H2OAutoML   Predict on an AutoML object
Description
Obtains predictions from an AutoML object.
Usage
## S3 method for class 'H2OAutoML'
predict(object, newdata, ...)
Arguments
object   A fitted H2OAutoML object for which prediction is desired.
newdata   An H2OFrame object in which to look for variables with which to predict.
...   Additional arguments to pass on.
Details
This method generates predictions with the leader model from an AutoML run. The order of the rows in the results is the same as the order in which the data was loaded, even if some rows fail (for example, due to missing values or unseen factor levels).
Value
Returns an H2OFrame object with probabilities and default predictions.

predict.H2OModel   Predict on an H2O Model
Description
Obtains predictions from various fitted H2O model objects.
Usage
## S3 method for class 'H2OModel'
predict(object, newdata, ...)
h2o.predict(object, newdata, ...)
Arguments
object   A fitted H2OModel object for which prediction is desired.
newdata   An H2OFrame object in which to look for variables with which to predict.
...   Additional arguments to pass on.
Details
This method dispatches on the type of H2O model to select the correct prediction/scoring algorithm. The order of the rows in the results is the same as the order in which the data was loaded, even if some rows fail (for example, due to missing values or unseen factor levels).
Value
Returns an H2OFrame object with probabilities and default predictions.
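Because the output rows follow the input row order, predictions can be bound column-wise to the frame they were scored on. The following is a minimal sketch, assuming a local H2O cluster can be started and using the prostate.csv file shipped in the package's extdata directory; the variable names (prostate, fit, preds, scored) and the small ntrees/seed settings are illustrative choices, not part of this topic's documented API.

```r
library(h2o)
h2o.init()

# Load the prostate data bundled with the package; treat the response as categorical
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.uploadFile(path = prostate_path)
prostate$CAPSULE <- as.factor(prostate$CAPSULE)

# Fit a small GBM (illustrative settings); predict() dispatches to predict.H2OModel
fit <- h2o.gbm(x = 3:9, y = "CAPSULE", training_frame = prostate,
               ntrees = 10, seed = 1)
preds <- predict(fit, newdata = prostate)

# Rows come back in input order, so the two frames line up when bound by position
scored <- h2o.cbind(prostate, preds)
print(scored, n = 5L)
```

Binding by position relies on the documented row-order behavior; if a frame may have been reordered in between, joining on a key column (for example with h2o.merge) is the safer choice.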
See Also
h2o.deeplearning, h2o.gbm, h2o.glm, h2o.randomForest for model generation in h2o.

predict_leaf_node_assignment.H2OModel   Predict the Leaf Node Assignment on an H2O Model
Description
Obtains leaf node assignments from fitted H2O model objects.
Usage
predict_leaf_node_assignment.H2OModel(object, newdata, ...)
h2o.predict_leaf_node_assignment(object, newdata, ...)
Arguments
object   A fitted H2OModel object for which prediction is desired.
newdata   An H2OFrame object in which to look for variables with which to predict.
...   Additional arguments to pass on.
Details
For every row in the test set, returns a set of factors that identify the leaf placements of the row in all the trees in the model. The order of the rows in the results is the same as the order in which the data was loaded.
Value
Returns an H2OFrame object with categorical leaf assignment identifiers for each tree in the model.
See Also
h2o.gbm and h2o.randomForest for model generation in h2o.
Examples
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
prostate.hex$CAPSULE <- as.factor(prostate.hex$CAPSULE)
prostate.gbm <- h2o.gbm(3:9, "CAPSULE", prostate.hex)
h2o.predict(prostate.gbm, prostate.hex)
h2o.predict_leaf_node_assignment(prostate.gbm, prostate.hex)

print.H2OFrame   Print An H2OFrame
Description
Print An H2OFrame.
Usage
## S3 method for class 'H2OFrame'
print(x, n = 6L, ...)
Arguments
x   An H2OFrame object.
n   (Optional) A single integer. If positive, the number of rows of x to return. If negative, all but the n first/last rows of x. Requesting more than 20 rows requires a call to the server (the first 20 rows are cached on the client).
...   Further arguments to be passed from or to other methods.

print.H2OTable   Print method for H2OTable objects
Description
This will print a truncated view of the table if there are more than 20 rows.
Usage
## S3 method for class 'H2OTable'
print(x, header = TRUE, ...)
Arguments
x   An H2OTable object.
header   A logical value dictating whether or not the table name should be printed.
...   Further arguments passed to or from other methods.
Value
The original x object.

prostate   Prostate Cancer Study
Description
Baseline exam results on prostate cancer patients from Dr. Donn Young at The Ohio State University Comprehensive Cancer Center.
Format
A data frame with 380 rows and 9 columns.
Source
Hosmer and Lemeshow (2000). Applied Logistic Regression: Second Edition.

range.H2OFrame   Range of an H2O Column
Description
Range of an H2O Column.
Usage
## S3 method for class 'H2OFrame'
range(..., na.rm = TRUE)
Arguments
...   An H2OFrame object.
na.rm   Ignore missing values.

show,H2OCoxPHModelSummary-method   Print the CoxPH Model Summary
Description
Print the CoxPH Model Summary.
Usage
## S4 method for signature 'H2OCoxPHModelSummary'
show(object)
Arguments
object   An H2OCoxPHModelSummary object.
...   Further arguments to be passed on (currently unimplemented).

str.H2OFrame   Display the structure of an H2OFrame object
Description
Display the structure of an H2OFrame object.
Usage
## S3 method for class 'H2OFrame'
str(object, ..., cols = FALSE)
Arguments
object   An H2OFrame.
...   Further arguments to be passed from or to other methods.
cols   Print the per-column str for the H2OFrame.

summary,H2OCoxPHModel-method   Print the CoxPH Model Summary
Description
Print the CoxPH Model Summary.
Usage
## S4 method for signature 'H2OCoxPHModel'
summary(object, conf.int = 0.95, scale = 1)
Arguments
object   An H2OCoxPHModel object.
conf.int   The confidence level for the confidence intervals (0.95 by default).
scale   A scale.

summary,H2OGrid-method   Format grid object in user-friendly way
Description
Format grid object in a user-friendly way.
Usage
## S4 method for signature 'H2OGrid'
summary(object, show_stack_traces = FALSE)
Arguments
object   An H2OGrid object.
show_stack_traces   A flag to show stack traces for model failures.

summary,H2OModel-method   Print the Model Summary
Description
Print the Model Summary.
Usage
## S4 method for signature 'H2OModel'
summary(object, ...)
Arguments
object   An H2OModel object.
...   Further arguments to be passed on (currently unimplemented).

use.package   Use optional package
Description
Tests the availability of an optional package, its version, and an extra global default. This function is used internally. It is exported and documented because users can control its behavior through a global option.
Usage
use.package(package,
            version = "1.9.8"[package == "data.table"],
            use = getOption("h2o.use.data.table", FALSE)[package == "data.table"])
Arguments
package   Character scalar: name of a package that h2o Suggests or Enhances.
version   Character scalar: required version of the package.
use   Logical scalar: extra escape option, to be used as a global option.
Details
This function controls CSV read/write with the optional data.table package. Currently data.table is disabled by default; to enable it, set options("h2o.use.data.table" = TRUE). It is possible to control just fread or fwrite with options("h2o.fread" = FALSE, "h2o.fwrite" = FALSE). The h2o.fread and h2o.fwrite options are not handled in this function but next to the fread and fwrite calls.
See Also
as.h2o.data.frame, as.data.frame.H2OFrame
Examples
# Temporarily enable data.table and remember the previous option value
op <- options("h2o.use.data.table" = TRUE)
if (use.package("data.table")) {
  cat("optional package data.table 1.9.8+ is available\n")
} else {
  cat("optional package data.table 1.9.8+ is not available\n")
}
# Restore the previous option value
options(op)

walking   Muscular Actuations for Walking Subject
Description
The musculoskeletal model, experimental data, settings files, and results for three-dimensional, muscle-actuated simulations at walking speed as described in Hamner and Delp (2013). Simulations were generated using OpenSim 2.4. The data is available from https://simtk.org/project/xml/downloads.xml?group_id=603.
Format
A data frame with 151 rows and 124 columns.
References
Hamner, S.R., Delp, S.L. (2013). Muscle contributions to fore-aft and vertical body mass center accelerations over a range of running speeds. Journal of Biomechanics, 46, 780-787.

zzz   Shutdown H2O cloud after examples run
Description
Shut down the H2O cloud after the examples have run.
Examples
library(h2o)
h2o.init()
h2o.shutdown(prompt = FALSE)
Sys.sleep(3)

&&   Logical and for H2OFrames
Description
Logical and for H2OFrames.
Usage
"&&"(x, y)
Arguments
x   An H2OFrame object.
y   An H2OFrame object.