SparkCognition Darwin API User Guide Spark Cognition V1.6


SparkCognition Darwin API User Guide

Contents

About this guide
Darwin overview
Accessing the API
Expectation
Technical routes
  analyze
  auth
  clean
  download
  job
  lookup
  run
  train
  upload
Revision Table

About this guide

This manual describes the Darwin™ API and its use in automated model building. It is intended for data scientists, software engineers, and analysts who want to use the Darwin API to create and train models, monitor jobs, and perform analysis.

Darwin overview

Darwin is a SparkCognition™ tool that automates model building processes to solve specific problems. This tool enhances data scientist potential because it automates various tasks that are often performed manually. These tasks include data cleaning, latent relationship extraction, and optimal model determination. Darwin promotes rapid and accurate feature generation through both automated windowing and risk generation. Darwin quickly creates highly accurate, dynamic models using both supervised and unsupervised learning methods.


For additional information on Darwin, contact your local SparkCognition partner for access to the white paper titled: Darwin - A Neurogenesis Platform.

Accessing the API

The Darwin API can normally be accessed through one of three methods:

• the Darwin Python SDK (preferred, recommended)

• the https://darwin-api.sparkcognition.com/v1 endpoint

• optionally, through user-created curl commands

For additional information on the Darwin SDK, see the SparkCognition Darwin Python SDK Guide.
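As a minimal sketch of direct access to the endpoint, the following Python snippet builds (but does not send) an authenticated request using only the standard library. The helper name build_request and the placeholder token are assumptions made for illustration; this guide itself only specifies the base URL and the "Authorization: Bearer token" header.

```python
import urllib.request

# Base URL of the endpoint named in this guide.
BASE_URL = "https://darwin-api.sparkcognition.com/v1"


def build_request(route, token, method="GET"):
    """Build (without sending) a request for a Darwin API route.

    `route` is a path such as "/lookup/limits"; `token` is the access
    token returned by one of the auth login routes.
    """
    request = urllib.request.Request(BASE_URL + route, method=method)
    # Authenticated routes in this guide expect a bearer token header.
    request.add_header("Authorization", "Bearer " + token)
    return request


# Hypothetical token value, for illustration only.
limits_request = build_request("/lookup/limits", token="token_string")
# Sending it would be: urllib.request.urlopen(limits_request)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before any network call is made.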

Expectation

This document assumes that the reader is a data scientist or software engineer who is familiar with data science techniques and the associated programming tasks.

Technical routes

The Darwin API includes the following API operations:

•analyze - analyze a model or dataset

•auth - register and authenticate

•clean - preprocess a dataset

•download - download or delete a generated artifact

•job - return status on jobs

•lookup - get model or dataset metadata

•run - run a model on a dataset

•train - train a model

•upload - upload or delete a dataset

analyze

Request Type: POST

URI: /v1/analyze/model/{model_name}

Headers:

• Authorization: Bearer token

Form Data:

•model_name: The name of the model to be analyzed

•job_name: (optional) If not specified, a uuid is created as the job_name.

•artifact_name: (optional) If not specified, a uuid is created as the artifact_name.

•category_name: (optional) The name of the class (for supervised models) or cluster (for unsupervised models) for which to get feature importances. If this is not specified, the feature importances are computed over all classes/clusters.

•model_type: (optional) Model type from the population. Possible values include: DeepNeuralNetwork, RandomForest, GradientBoosted.

Description: Analyze the universal feature importances for a particular model given the model name.

Note: This API is capable of returning the structure of the model in the form of a pandas Series.

Response Codes: 201, 400, 401, 403, 422

Successful Response:

{

"job_name": "string",

"artifact_name": "string"

}

Request Type: POST

URI: /v1/analyze/model/predictions/{model_name}/{dataset_name}

Headers:

• Authorization: Bearer token

Form Data:

•dataset_name: The name of the dataset containing the data to analyze predictions for. This is a new dataset that was not used during training and for which you want feature importance scores for each row. This dataset has a limit of 500 rows. There is no limit for columns.

•model_name: The name of the model to be analyzed

•job_name: (optional) If not specified, a uuid is created as the job_name.

•artifact_name: (optional) If not specified, a uuid is created as the artifact_name.

•start_index: (optional) Index to start at in the dataset when analyzing model predictions.

•end_index: (optional) Index to stop at in the dataset when analyzing model predictions.

•model_type: (optional) Model type from the population. Possible values include: DeepNeuralNetwork, RandomForest, GradientBoosted.

Description: Analyze specific feature importances for a particular sample or samples given the model name and sample data. Analyze predictions cannot be used if you trained your model with a dataset that is larger than 500 MB.

Response Codes: 201, 400, 401, 403, 422

Successful Response:

{
  "job_name": "string",
  "artifact_name": "string"
}

Request Type: POST

URI: /v1/analyze/data/{dataset_name}

Headers:

• Authorization: Bearer token

Description: Analyze a dataset and return statistics/metadata concerning designated data.

Parameter Descriptions:

•dataset_name: The name of the dataset to analyze and return statistics/metadata for

•job_name: The job name

•artifact_name: The artifact name

•max_unique_values: Threshold for automatic pruning of categorical columns prior to one hot encoding based on the number of unique values

Note: If a categorical column contains at least max_unique_values unique values, it is dropped during preprocessing prior to one hot encoding.

Payload:

{

"job_name": "string",

"artifact_name": "string",

"max_unique_values": 30

}

Response Codes: 201, 400, 401, 403, 408, 422

Successful Response:

{

"job_name": "string",

"artifact_name": "string"

}
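The payload above can be assembled programmatically. The sketch below builds (without sending) a POST request for /v1/analyze/data/{dataset_name}. It assumes the payload is sent as a JSON body with a Content-Type of application/json, which this guide does not state explicitly; the function name and the example values are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://darwin-api.sparkcognition.com/v1"


def build_analyze_data_request(dataset_name, token, job_name, artifact_name,
                               max_unique_values=30):
    """Build (without sending) an analyze/data request."""
    payload = {
        "job_name": job_name,
        "artifact_name": artifact_name,
        "max_unique_values": max_unique_values,
    }
    # Percent-encode the dataset name so spaces or slashes stay in one path segment.
    url = BASE_URL + "/analyze/data/" + urllib.parse.quote(dataset_name, safe="")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            # Assumption: the payload is transmitted as JSON.
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical names, for illustration only.
analyze_request = build_analyze_data_request(
    "my data", "token_string", "analyze_job", "analyze_artifact"
)
```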

auth

Request Type: PATCH

URI: /v1/auth/email

Headers:

• Authorization: Bearer token


Description: Add or change an email address.

Form Data:

•email: Email address

Response Codes: 204, 400, 401, 422

Successful Response:

{

'access_token': 'token_string'

}

Request Type: POST

URI: /v1/auth/login

Headers:

• Authorization: Bearer token

Description: Login as a service.

Form Data:

•api_key: The api key of the service

•pass1: The service level password

Response Codes: 201, 400, 401

Successful Response:

{

'access_token': 'token_string'

}

Request Type: POST

URI: /v1/auth/login/user

Description: Login as a user.

Form Data:

•username: The end user’s name

•pass1: The end user’s password

Response Codes: 201, 400, 401, 422

Successful Response:

{

'access_token': 'token_string'

}
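A user login can be sketched as follows. The snippet builds the form-encoded POST body for /v1/auth/login/user and extracts the access token from a response body. It assumes that "Form Data" means application/x-www-form-urlencoded and that the response is valid JSON; the function names, credentials, and sample response string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://darwin-api.sparkcognition.com/v1"


def build_user_login_request(username, password):
    """Build (without sending) a user login request."""
    # Field names follow the Form Data list for /v1/auth/login/user.
    form = urllib.parse.urlencode({"username": username, "pass1": password})
    return urllib.request.Request(
        BASE_URL + "/auth/login/user",
        data=form.encode("ascii"),
        method="POST",
    )


def extract_token(response_body):
    """Return the access token from a JSON login response body."""
    return json.loads(response_body)["access_token"]


# Hypothetical credentials and response, for illustration only.
login_request = build_user_login_request("alice", "s3cret")
sample_body = '{"access_token": "token_string"}'
token = extract_token(sample_body)
```

The returned token is the value to place after "Bearer " in the Authorization header of subsequent requests.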


Request Type: PATCH

URI: /v1/auth/password

Headers:

• Authorization: Bearer token

Description: Change the password.

Form Data:

•curpass: Current password

•newpass1: New password

•newpass2: Confirmation of new password

Response Codes: 204, 400, 401, 422

Successful Response:

{

'access_token': 'token_string'

}

Request Type: PATCH

URI: /v1/auth/password/reset

Headers:

Description: Reset a user’s password. An email will be sent to the user’s email address with a temporary password and instructions for changing it.

Form Data:

•username: The username of the user whose password needs resetting

Response Codes: 201, 400, 401, 422

Successful Response:

{

'access_token': 'token_string'

}

Request Type: POST

URI: /v1/auth/register

Headers:

Description: Register as a service.

Form Data:


•api_key: The api key of the service

•pass1: The service level password

•pass2: The service level password confirmation

•email: Email address

Response Codes: 201, 400, 401, 403

Successful Response:

{

'access_token': 'token_string'

}

Request Type: POST

URI: /v1/auth/register/user

Headers:

• Authorization: Bearer token

Description: Register a user for your service.

Form Data:

•username: The end user’s name

•pass1: The end user’s password

•pass2: The end user’s password confirmation

•email: The end user’s email address

Response Codes: 201, 400, 401, 422

Successful Response:

{

'access_token': 'token_string'

}

Request Type: DELETE

URI: /v1/auth/register/user/{username}

Headers:

• Authorization: Bearer token

Description: Remove/Unregister a user.

Form Data:

•username: The username of the user to remove


Response Codes: 201, 401, 403

Successful Response: None

clean

Request Type: POST

URI: /v1/clean/dataset/{dataset_name}

Headers:

• Authorization: Bearer token

Description: Clean a named dataset. The output is the cleaned dataset, which is scaled and one-hot encoded based on parameters in /analyze/data. Use /download/dataset to retrieve the cleaned dataset. /clean/dataset is only used for visualizing what Darwin would do or for when you want to use the cleaned data outside of Darwin. Do not clean data and then train on the cleaned data with Darwin: /train/model performs its own cleaning as part of the model creation process.

Form Data:

•dataset_name: Name of dataset to clean

•job_name: Name of job

•artifact_name: Name given to the cleaned dataset

•target: (Mandatory for Supervised Model Building) String denoting target prediction column in input data.

•impute: String alias that indicates how to fill in missing values in input data.

ALIAS | DESCRIPTION | COMPLEXITY
'ffill' (Default) | Forward Fill: Propagate values forward from one example into the missing cell of the next example. Might be useful for timeseries data, but also applicable for both numerical and categorical data. | Linear, Fast
'bfill' | Backward Fill: Propagate values backward from one example into the missing cell of the previous example. Might be useful for timeseries data, but also applicable for both numerical and categorical data. | Linear, Fast
'mean' | Mean Fill: Computes the mean value of all non-missing examples in a column to fill in missing examples. The result may or may not be interpretable in terms of the input space for categorical variables. | Linear, Fast

•max_int_uniques: Expected input/type: integer. Threshold for automatic encoding of categorical variables. If a column contains fewer than max_int_uniques unique values, it is treated as categorical and one hot encoded during preprocessing.

Note: If the target has more numeric values than the max_int_uniques threshold, the problem is treated as a regression problem and uses MSE.

•max_unique_values: Expected input/type: integer. Threshold for automatic pruning of categorical columns prior to one hot encoding based on the number of unique values.

Note: If a categorical column contains at least max_unique_values unique values, it is dropped during preprocessing prior to one hot encoding.

Response Codes: 400, 401, 403, 422

Successful Response:

{

"job_name": "string",

"artifact_name": "string"

}
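The three impute aliases described for this route can be illustrated on a single column. The sketch below mirrors the documented behavior of each alias in plain Python; it is not Darwin's implementation. The function names are hypothetical, missing cells are represented by None, and the choice to leave a leading (or trailing) gap unfilled when there is no neighbor to propagate from is an assumption.

```python
def forward_fill(values):
    """'ffill': copy the last seen value forward into missing cells."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def backward_fill(values):
    """'bfill': copy the next seen value backward into missing cells."""
    return forward_fill(values[::-1])[::-1]


def mean_fill(values):
    """'mean': replace missing cells with the mean of the non-missing ones."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to average; leave the column unchanged
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


column = [None, 2.0, None, 4.0, None]
print(forward_fill(column))   # [None, 2.0, 2.0, 4.0, 4.0]
print(backward_fill(column))  # [2.0, 2.0, 4.0, 4.0, None]
print(mean_fill(column))      # [3.0, 2.0, 3.0, 4.0, 3.0]
```

Each strategy makes a single pass over the column, which is consistent with the "Linear" complexity listed for every alias.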

download

Request Type: GET

URI: /v1/download/artifacts/{artifact_name}

Headers:

• Authorization: Bearer token

Description: Download an artifact by name.

Form Data:

•artifact_name: Name of the artifact to download

Response Codes: 201, 401, 404, 408, 422

Successful Response:

{

'artifact': 'artifact_name'

}

Request Type: DELETE

URI: /v1/download/artifacts/{artifact_name}

Headers:

• Authorization: Bearer token

Description: Delete an artifact.

Form Data:

•artifact_name: Name of the artifact to delete


Response Codes: 204, 401, 404, 408, 422

Successful Response: None

Request Type: GET

URI: /v1/download/dataset/{dataset_name}

Headers:

• Authorization: Bearer token

Description: Download a dataset by name. It can be an original or cleaned dataset.

Form Data:

•dataset_name: Name of the dataset to download. In the case of downloading a cleaned dataset, this would be the name returned by /clean/dataset/{dataset_name}.

•file_part: Part number of a multi-part dataset, expressed as an integer.

Response Codes: 401, 404, 408, 422

Successful Response:

{

"dataset": "string",

"part": 1,

"note": "string"

}

Request Type: GET

URI: /v1/download/model/{model_name}

Headers:

• Authorization: Bearer token

Description: Download a supervised model by name.

Form Data:

•model_name: Name of the model to download

•path: (optional) Relative or absolute path of the directory to download the model to. This directory must already exist prior to model download. If no path is specified, the current directory is used. There are two files associated with a model: 'model' and 'data_profiler'.

•model_type: (optional) Model type of the model to be downloaded. Possible values include: DeepNeuralNetwork, RandomForest, GradientBoosted.

•model_format: (optional) Format in which the model is to be downloaded. Possible values include: json, onnx.


Response Codes: 401, 404, 408, 422

Successful Response:

A successful response returns a .zip file, which contains two files: the supervised model itself and the data profiler. Downloading unsupervised models is not supported.

job

Request Type: GET

URI: /v1/job/status

Headers:

• Authorization: Bearer token

Query Parameters:

•age: List jobs that are less than X units old (for example, 3 weeks, 2 days)

•status: List jobs with a particular status, for example, Running

Description: Get the status for all jobs. Note that only 2 jobs can be running concurrently.

Response Codes: 200, 400, 401, 422

Successful Response:

[
  {
    "job_name": "job1_name",
    "status": "Requested",
    "starttime": "2018-01-30T13:27:46.449865",
    "endtime": "2018-01-30T13:28:46.449865",
    "percent_complete": 0,
    "job_type": "TrainModel",
    "loss": 0,
    "generations": 0,
    "dataset_names": [
      "phone_data"
    ],
    "artifact_names": [
      "art1"
    ],
    "model_name": null,
    "job_error": "string"
  },
  {
    "job_name": "job2_name",
    "status": "Running",
    "starttime": "2018-01-30T13:27:46.449865",
    "endtime": "2018-01-30T13:28:46.449865",
    "percent_complete": 23,
    "job_type": "UpdateModel",
    "loss": 0.92,
    "generations": 50,
    "dataset_names": [
      "language_data"
    ],
    "artifact_names": null,
    "model_name": "test_model",
    "job_error": "string"
  }
]

Request Type: GET

URI: /v1/job/status/{job_name}

Headers:

• Authorization: Bearer token

Description: Get the status for a particular job.

Form Data:

•job_name: The job name you want status on.

Response Codes: 200, 400, 401, 403, 404, 422

Successful Response:

{
  "status": "Requested, Running, Completed",
  "starttime": "string",
  "endtime": "string",
  "percent_complete": 30,
  "job_type": "string",
  "loss": 0,
  "generations": 0,
  "dataset_names": [
    "string"
  ],
  "artifact_names": [
    "string"
  ],
  "model_name": "string",
  "job_error": "string"
}
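Because jobs run asynchronously, a client typically polls this route until the job finishes. The following is a minimal polling sketch, assuming that "Completed" is the terminal value of the status field; this guide lists only Requested, Running, and Completed, so any failure states are not handled here. The function name, the fetch_status callable, and the polling limits are hypothetical.

```python
import time


def wait_for_job(fetch_status, job_name, poll_seconds=5.0, max_polls=120,
                 sleep=time.sleep):
    """Poll a job until its status is "Completed".

    `fetch_status(job_name)` is a caller-supplied function that performs
    GET /v1/job/status/{job_name} and returns the parsed JSON as a dict.
    """
    for _ in range(max_polls):
        status = fetch_status(job_name)
        if status["status"] == "Completed":
            return status
        sleep(poll_seconds)
    raise TimeoutError("job %r did not complete after %d polls" % (job_name, max_polls))


# Usage with a stand-in fetcher that replays three canned responses.
canned = iter([
    {"status": "Requested", "percent_complete": 0},
    {"status": "Running", "percent_complete": 30},
    {"status": "Completed", "percent_complete": 100},
])
result = wait_for_job(lambda name: next(canned), "my_job", sleep=lambda s: None)
```

Passing the fetcher and the sleep function as arguments keeps the loop independent of any particular HTTP client and makes it testable without network access.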


Request Type: PATCH

URI: /v1/job/status/{job_name}

Headers:

• Authorization: Bearer token

Description: Stop a running job.

Form Data:

•job_name: The job name you want to stop.

Response Codes: 200, 400, 401, 403, 404, 422

Successful Response:

"Job is scheduled to stop"

Request Type: DELETE

URI: /v1/job/status/{job_name}

Headers:

• Authorization: Bearer token

Description: Soft delete a running job

Form Data:

•job_name: The job name you want to delete.

Response Codes: 200, 400, 401, 403, 404, 422

Successful Response:

None

lookup

Request Type: GET

URI: /v1/lookup/limits

Headers:

• Authorization: Bearer token

Description: Get a client’s usage limit metadata.

Response Codes: 200, 401, 422

Successful Response:


{

"username": "string",

"tier": 0,

"model_limit": 0,

"job_limit": 0,

"upload_limit": 0,

"user_limit": 0

}

Request Type: GET

URI: /v1/lookup/artifact

Headers:

• Authorization: Bearer token

Query Parameters:

•type: Filter on the type of artifact (for example, Model, Dataset, Test, or Run)

Description: Get artifact metadata

Response Codes: 200, 401, 422

Successful Response:

[

{

"id": "string",

"name": "string",

"type": "string",

"created_at": "2018-01-22T19:00:39.863Z",

"mbytes": 0

}

]

Request Type: GET

URI: /v1/lookup/artifact/{artifact_name}

Headers:

• Authorization: Bearer token

Description: Get artifact metadata for a single artifact

Form Data:

•artifact_name: The artifact name you want to look up.

Response Codes: 200, 401, 404, 422

Successful Response:


{

"name": "string",

"type": "string",

"created_at": "2018-01-22T19:00:39.869Z",

"mbytes": 0

}

Request Type: GET

URI: /v1/lookup/model

Headers:

• Authorization: Bearer token

Description:

Get the model metadata for a user. This is useful if a user has forgotten certain model

names.

Response Codes: 200, 401, 422

Successful Response:

[
  {
    "id": {},
    "name": "model1_name",
    "type": "Supervised",
    "updated_at": "2017-02-03T073000",
    "problem_type": "string",
    "trained_on": ["dataset1_id", "dataset2_id"],
    "generations": 100,
    "loss": 0.8,
    "complete": {},
    "parameters": {},
    "train_time_seconds": 240,
    "algorithm": "string",
    "running_job_id": "string",
    "description": {"best_genome": "RandomForestClassifier", "recurrent": false}
  },
  {
    "id": {},
    "name": "model2_name",
    "type": "Ensembled",
    "updated_at": "2017-08-22T175022",
    "trained_on": ["dataset3_id"],
    "loss": 0.82,
    "complete": {},
    "generations": 80,
    "parameters": {
      "target": "target1"
    },
    "train_time_seconds": 180,
    "algorithm": "string",
    "running_job_id": "string",
    "description": {"best_genome": "DeepNet(\n (l0): LSTM(20, 18, num_layers=2)\n (l1): Linear(in_features=18, out_features=1, bias=True)\n)", "recurrent": true}
  }
]

Note: running_job_id is only returned when complete is False.

Request Type: GET

URI: /v1/lookup/model/{model_name}

Headers:

• Authorization: Bearer token

Description: Get all of the model metadata for a particular model.

Form Data:

•model_name: The model name you want to look up.

Response Codes: 200, 401, 404, 422

Successful Response:

{
  "type": "Unsupervised",
  "updated_at": "2017-02-03T073000",
  "trained_on": ["dataset1_id", "dataset2_id"],
  "generations": 100,
  "loss": 0.8,
  "parameters": {},
  "train_time_seconds": 180,
  "algorithm": "string",
  "running_job_id": "string",
  "description": {"best_genome": "RandomForestClassifier", "recurrent": false}
}

Note: running_job_id is only returned when complete is False.

Request Type: GET

URI: /v1/lookup/model/{model_name}/population


Headers:

• Authorization: Bearer token

Description: Get model descriptions of the best genomes for all model types that were trained. The population is displayed for unsupervised models only.

Form Data:

•model_name: The model name or identifier.

Response Codes: 201, 401, 404, 422

Successful Response:

{
  "population": {
    "model_types": {
      "DeepNeuralNetwork": {
        "model_description": "string",
        "loss_function": "string",
        "fitness": Double
      },
      "RandomForest": {
        "model_description": "string",
        "loss_function": "string",
        "fitness": Double
      },
      "GradientBoosted": {
        "model_description": "string",
        "loss_function": "string",
        "fitness": Double
      }
    }
  }
}

Request Type: GET

URI: /v1/lookup/dataset

Headers:

• Authorization: Bearer token

Description:

Get the dataset metadata for a user. This is useful if a user has forgotten certain dataset

names.

Response Codes: 200, 401, 422

Successful Response:


[
  {
    "name": "dataset1_name",
    "mbytes": 0.2,
    "minimum_recommended_train_time": "string",
    "updated_at": "20170924T000000",
    "categorical": false,
    "sequential": true,
    "imbalanced": true
  },
  {
    "name": "dataset2_name",
    "mbytes": 3.5,
    "minimum_recommended_train_time": "string",
    "updated_at": "20170902T010101",
    "categorical": true,
    "sequential": false,
    "imbalanced": false
  }
]

Request Type: GET

URI: /v1/lookup/dataset/{dataset_name}

Headers:

• Authorization: Bearer token

Description: Get all of the metadata for a particular dataset.

Form Data:

•dataset_name: The dataset name for which you want the metadata.

Response Codes: 200, 401, 404, 422

Successful Response:

{
  "mbytes": 0.2,
  "minimum_recommended_train_time": "string",
  "updated_at": "20170924T000000",
  "categorical": false,
  "sequential": true,
  "imbalanced": true
}


Request Type: GET

URI: /v1/lookup/tier

Headers:

• Authorization: Bearer token

Description: Get all of the tier metadata.

Response Codes: 200, 401, 422

Successful Response:

[

{

"tier": 0,

"model_limit": 0,

"job_limit": 0,

"upload_limit": 0,

"user_limit": 0

}

]

Request Type: GET

URI: /v1/lookup/tier/{tier_num}

Headers:

• Authorization: Bearer token

Description: Get the metadata for a particular tier.

Form Data:

•tier_num: Tier for which you want metadata.

Response Codes: 200, 401, 404, 422

Successful Response:

{

"tier": 0,

"model_limit": 0,

"job_limit": 0,

"upload_limit": 0,

"user_limit": 0

}

Request Type: GET

URI: /v1/lookup/user

Headers:


• Authorization: Bearer token

Description: Get user metadata for all users.

Response Codes: 200, 401, 422

Successful Response:

[

{

"user_id": "string",

"internal_name": "string",

"username": "string",

"tier": 0,

"created_at": "string",

"client_api_key": "string",

"expires_on": "string",

"parent_id": "string"

}

]

Request Type: GET

URI: /v1/lookup/user/{username}

Headers:

• Authorization: Bearer token

Description: Get user metadata for a particular user.

Form Data:

•username: Username for which you want user metadata.

Response Codes: 200, 401, 404, 422

Successful Response:

{

"user_id": "string",

"internal_name": "string",

"username": "string",

"tier": 0,

"created_at": "string",

"client_api_key": "string",

"expires_on": "string",

"parent_id": "string"

}


run

Request Type: POST

URI: /v1/run/model/{model_name}/{dataset_name}

Headers:

• Authorization: Bearer token

Form Data:

•model_name: The name of the model.

•artifact_name: The name of the artifact.

•dataset_name: The name of the dataset.

•anomaly: Setting this parameter to True indicates that an isolation forest should be built for anomaly detection. If set to True, clustering will automatically be interpreted as False.

•supervised: (Deprecated. This argument exists only for backward compatibility.) A boolean (True/False) indicating whether the model is supervised; for example, set this to False for an unsupervised model.

•model_type: (optional) Model type of the model to be run. Possible values include: DeepNeuralNetwork, RandomForest, GradientBoosted.

Description: Run a model on a dataset and return the predictions/classifications/clusters found by the model.

Response Codes: 201, 400, 401, 403, 404, 408, 422

Successful Response:

{

"job_name": "name_of_job",

"artifact_name": "name_of_artifact"

}
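A run request can be sketched in Python as follows. The snippet builds (without sending) the POST request, percent-encodes the two path parameters, and rejects model types other than the three listed above. It assumes the form fields are sent as application/x-www-form-urlencoded, which this guide does not state; the function name and example values are hypothetical.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://darwin-api.sparkcognition.com/v1"
# Model types listed for the model_type parameter in this guide.
MODEL_TYPES = ("DeepNeuralNetwork", "RandomForest", "GradientBoosted")


def build_run_request(model_name, dataset_name, token, artifact_name,
                      model_type=None):
    """Build (without sending) a request to run a model on a dataset."""
    if model_type is not None and model_type not in MODEL_TYPES:
        raise ValueError("unsupported model_type: " + model_type)
    url = "{}/run/model/{}/{}".format(
        BASE_URL,
        urllib.parse.quote(model_name, safe=""),
        urllib.parse.quote(dataset_name, safe=""),
    )
    form = {"artifact_name": artifact_name}
    if model_type is not None:
        form["model_type"] = model_type
    return urllib.request.Request(
        url,
        data=urllib.parse.urlencode(form).encode("ascii"),
        headers={"Authorization": "Bearer " + token},
        method="POST",
    )


# Hypothetical names, for illustration only.
run_request = build_run_request(
    "my model", "test/data", "token_string", "predictions",
    model_type="RandomForest",
)
```

The artifact named in the request is what would later be retrieved through /v1/download/artifacts/{artifact_name} once the job has completed.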

train

Request Type: POST

URI: /v1/train/model

Headers:

• Authorization: Bearer token

Description: Create a model trained on the dataset identified by dataset_names.

Parameter descriptions:


•dataset_names: A list of dataset names to use for training. The maximum file size is 500 MB for unsupervised and NBM and 10 GB for supervised.

Note: Only one dataset is currently supported.

•job_name: The job name.

•model_name: The string identifier of the model to be trained.

•loss_fn_name: Specify the loss function. Possible values include: "CrossEntropy", "MSE", "BCE", "L1", "NLL", "BCEWithLogits", "SmoothL1". "CrossEntropy" can be used for classification data, while all others can be used for regression data. The default value is "CrossEntropy" if this field is left empty.

•fitness_fn_name: Specify the fitness function. This represents the name of the fitness function used for evolution of the model population during training. Possible values include: "Accuracy", "F1", "R2", "MSE". "F1" is the default for classification and "R2" is the default for regression problems. "Accuracy" and "F1" are for classification only. "R2" and "MSE" are for regression only.

•max_train_time (supervised only): Sets the training time for the model in 'HH:MM' format. Default value is 00:01.

•max_epochs (unsupervised only): Expected input/type: numeric. Sets the training time for the model in epochs. Default value is 10.

•recurrent: Expected input/type: True/False. Enables recurrent connections to be evolved in the model. This option can be useful for timeseries or sequential data.

Note: This option is automatically enabled if a datetime column is detected in the input data. This may result in slower model evolution.

•impute: String alias that indicates how to fill in missing values in input data.

ALIAS | DESCRIPTION | COMPLEXITY
'ffill' (Default) | Forward Fill: Propagate values forward from one example into the missing cell of the next example. Might be useful for timeseries data, but also applicable for both numerical and categorical data. | Linear, Fast
'bfill' | Backward Fill: Propagate values backward from one example into the missing cell of the previous example. Might be useful for timeseries data, but also applicable for both numerical and categorical data. | Linear, Fast
'mean' | Mean Fill: Computes the mean value of all non-missing examples in a column to fill in missing examples. The result may or may not be interpretable in terms of the input space for categorical variables. | Linear, Fast

•anomaly: Setting this parameter to True indicates that an isolation forest should be built for anomaly detection. If set to True, clustering will automatically be interpreted as False.

•n_clusters (unsupervised only): Specifies the number of clusters to be used.

Note: If this value is not provided, the number of clusters will be heuristically determined.

•anomaly_prior (unsupervised only): Expected input/type: between [0,1]. Significance level at which a point is defined as anomalous. This is only used for unsupervised problems if clustering is disabled.

•lead_time_days (nbm only): Expected input/type: integer. Default value is . The number of days prior to failure when the behavior starts trending toward either abnormal behavior or failure.

•nbm_window_size (nbm only): Expected input/type: integer. Default value is 256. The number of sample points to consider for each failure detection.

•nbm (nbm only): Expected input/type: True/False. Default value is False. Set value to True for a normal behavioral model (NBM).

•failure_dates (nbm only): Expected input/type: string. List of failure dates to use for the calculation. Currently, only a list of one date can be used in the query. Example date format: "07/01/2015"

•recovery_dates (nbm only): Expected input/type: string. List of recovery dates to use for the calculation. Currently, only a list of one date can be used in the query. Example date format: "11/01/2015"

Payload:

{
  "dataset_names": ["dataset_name1"],
  "job_name": "my_job",
  "model_name": "string",
  "loss_fn_name": "CrossEntropy",
  "fitness_fn_name": "Accuracy",
  "max_train_time": "00:01",
  "max_epochs": 0,
  "recurrent": true,
  "impute": "mean",
  "drop": "no",
  "feature_eng": "mi",
  "feature_select": 1,
  "outlier": "mad",
  "imbalance": true,
  "anomaly": false,
  "n_clusters": 5,
  "anomaly_prior": 0.01,
  "lead_time_days": 60,
  "nbm_window_size": 256,
  "nbm": false,
  "return_risk": true,
  "failure_dates": ["string"],
  "recovery_dates": ["string"],
  "scaler": "MinMax",
  "target_scaler": "MinMax"
}

Response Codes: 201, 400, 401, 403, 404, 408, 422

Successful Response:

{
  "job_name": "nameofjob",
  "model_name": "nameofmodel"
}
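The constraints on the training parameters can be checked on the client before a request is sent. The sketch below assembles a small subset of the train payload for a supervised model and enforces two rules stated in the parameter descriptions: the fitness function must match the problem type, and max_train_time must be in 'HH:MM' format. The function name, the classification flag, and the exact two-digit pattern used to validate the time are assumptions for illustration, not part of the API.

```python
import json
import re

# Fitness functions per problem type, as listed for fitness_fn_name.
CLASSIFICATION_FITNESS = {"Accuracy", "F1"}
REGRESSION_FITNESS = {"R2", "MSE"}


def build_train_payload(dataset_name, model_name, job_name,
                        classification=True, fitness_fn_name=None,
                        max_train_time="00:01"):
    """Return a dict holding a subset of the /v1/train/model payload."""
    allowed = CLASSIFICATION_FITNESS if classification else REGRESSION_FITNESS
    if fitness_fn_name is None:
        # Documented defaults: "F1" for classification, "R2" for regression.
        fitness_fn_name = "F1" if classification else "R2"
    if fitness_fn_name not in allowed:
        raise ValueError("fitness function %r does not match the problem type"
                         % fitness_fn_name)
    # Assumption: 'HH:MM' means two digits for hours and a minute value of 00-59.
    if not re.fullmatch(r"\d{2}:[0-5]\d", max_train_time):
        raise ValueError("max_train_time must be in 'HH:MM' format")
    return {
        "dataset_names": [dataset_name],  # only one dataset is supported
        "job_name": job_name,
        "model_name": model_name,
        "fitness_fn_name": fitness_fn_name,
        "max_train_time": max_train_time,
    }


# Hypothetical names, for illustration only.
payload = build_train_payload("dataset_name1", "my_model", "my_job",
                              classification=False, max_train_time="00:30")
body = json.dumps(payload)
```

Validating locally avoids spending a request on a combination, such as "Accuracy" with a regression target, that the parameter descriptions already rule out.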

Request Type: PATCH

URI: /v1/train/model/{model_name}

Headers:

• Authorization: Bearer token

Description: Resume training for a model on the dataset identified by dataset_names.

Parameter Descriptions:

•dataset_names: A list of dataset names to use for training.

Note: Only one dataset is currently supported.

•job_name: The job name

•max_train_time (supervised only): Sets the training time for the model in 'HH:MM' format. Default value is 00:01.

•max_epochs (unsupervised only): Sets the training time for the model in epochs. Default value is 10.

Payload:

{

"dataset_names": ["dataset_name1"],

"job_name": "my_job",

"max_train_time": "00:01",

"max_epochs": 0

}

Response Codes: 201, 401, 403, 404, 408, 422

Successful Response:

{
  "job_name": "nameofjob",
  "model_name": "nameofmodel"
}

Request Type: DELETE


URI: /v1/train/model/{model_name}

Headers:

• Authorization: Bearer token

Description: Delete a model.

Form Data:

•model_name: Name of the model to delete.

Response Codes: 204, 400, 401, 403, 404, 408, 422

Successful Response: None

upload

Request Type: POST

URI: /v1/upload

Headers:

• Authorization: Bearer token

Description: Upload a dataset.

Form Data:

•dataset: a dataset file in a supported format (csv, h5)

•dataset_name: the name for the uploaded dataset

Note: If not set, a guid will be provided

Response Codes: 201, 400, 401, 403, 408, 413, 422

Successful Response:

{

"dataset_name": "name_of_dataset"

}

Request Type: DELETE

URI: /v1/upload/{dataset_name}

Headers:

• Authorization: Bearer token

Description: Delete a dataset.

Form Data:

•dataset_name: Name or identifier of dataset to delete.


Response Codes: 204, 401, 403, 404, 422

Successful Response: None

Revision Table

Version Date Notes

v 1.0 02-Feb-2018 First Release

v 1.1 15-Feb-2018 added types: supervised and ensembled

v 1.2(pre) 16-Mar-2018 added Status: Type= PATCH

v 1.2 27-Mar-2018 Added or changed:

• /v1/job/status/{job_name}

• /v1/lookup/user

• /v1/lookup/username/{username}

• /v1/train/model

• /v1/run/model/{model_name}/{dataset_name}

Name change: /v1/lookup/client to /v1/lookup/limits

v 1.3 23-May-2018 Added or changed:

• /v1/analyze/model/{model_name}

• /v1/analyze/model/predictions/{model_name}/{dataset_name}

• /v1/auth/email

• /v1/auth/password/reset

• /v1/auth/register

• /v1/train/model

• /v1/train/model/{model_name}

Name change: /v1/lookup/client to /v1/lookup/limits

v 1.3.1 14-Jun-2018 Edits to:

• /v1/job/status/

• /v1/download/artifacts

• Model uses example

v 1.4 31-Jul-2018 Edits to:

• /v1/analyze/model/{model_name}

• /v1/analyze/data/{dataset_name}

• /v1/lookup/model

• /v1/lookup/model/{model_name}

• /v1/train/model

• /v1/train/model/{model_name}


v 1.5 15-Oct-2018 Added:

• /v1/clean/dataset/{dataset_name}

• /v1/download/dataset/{dataset_name}

• /v1/download/model/{model_name}

Edits to:

• /v1/analyze/data/{dataset_name}

• /v1/lookup/model

• /v1/train/model

• /v1/download/artifacts/{artifact_name}

v 1.6 16-Jan-2019 Added:

• /v1/lookup/model/{model_name}/population

Edits to:

• /v1/analyze/model/predictions/{model_name}/{dataset_name}

• /v1/analyze/model/{model_name}

• /v1/clean/dataset/{dataset_name}

• /v1/download/model/{model_name}

• /v1/train/model

• /v1/run/model/{model_name}/{dataset_name}
