SparkCognition Darwin API User Guide v1.6


Contents
• About this guide
• Darwin overview
  • Accessing the API
• Expectation
• Technical routes
  • analyze
  • auth
  • clean
  • download
  • job
  • lookup
  • run
  • train
  • upload
• Revision Table

About this guide
This manual describes the Darwin™ API and its use in automated model building. It is intended for data
scientists, software engineers, and analysts who want to use the Darwin API to interact with Darwin to
create and train models, monitor jobs, and perform analysis.

Darwin overview
Darwin is a SparkCognition™ tool that automates the model-building process for a specific problem.
It extends what a data scientist can accomplish by automating tasks that are often performed manually,
including data cleaning, latent relationship extraction, and optimal model determination. Darwin promotes
rapid and accurate feature generation through both automated windowing and risk generation, and it
quickly creates highly accurate, dynamic models using both supervised and unsupervised learning methods.

For additional information on Darwin, contact your local SparkCognition partner for access to the white
paper titled: Darwin - A Neurogenesis Platform.

Accessing the API
The Darwin API can normally be accessed through one of three methods:
• the Darwin Python SDK (preferred, recommended)
• the https://darwin-api.sparkcognition.com/v1 endpoint
• optionally, through user-created curl commands
For additional information on the Darwin SDK, see the SparkCognition Darwin Python SDK Guide.
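As a minimal sketch of the second access method, the following Python helper builds (but does not send) an authenticated request against the endpoint listed above. It uses only the standard library. The helper name build_request, the placeholder token, and the choice of URL-encoded form data are illustrative assumptions and are not part of the Darwin SDK.

```python
import urllib.parse
import urllib.request

# Base endpoint taken from the list above.
BASE_URL = "https://darwin-api.sparkcognition.com/v1"


def build_request(method, route, token=None, form=None):
    """Build (but do not send) a request for a Darwin API route.

    `route` is the path after /v1, for example "/lookup/limits".
    `form` is an optional dict sent as URL-encoded form data (assumption).
    """
    headers = {}
    if token is not None:
        # Most routes in this guide expect a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    data = urllib.parse.urlencode(form).encode("utf-8") if form else None
    return urllib.request.Request(
        BASE_URL + route, data=data, headers=headers, method=method
    )


req = build_request("GET", "/lookup/limits", token="YOUR_TOKEN")
print(req.get_method(), req.full_url)
# To actually send the request: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before any call reaches the service.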

Expectation
This document assumes that the reader is a data scientist or software engineer who is familiar with
data science techniques and the associated programming tasks.

Technical routes
The Darwin API includes the following API operations:
• analyze - analyze a model or dataset
• auth - register and authenticate
• clean - preprocess a dataset
• download - download or delete a generated artifact
• job - return status on jobs
• lookup - get model or dataset metadata
• run - run a model on a dataset
• train - train a model
• upload - upload or delete a dataset

analyze
Request Type: POST
URI: /v1/analyze/model/{model_name}
Headers:
• Authorization: Bearer token
Form Data:
• model_name: The name of the model to be analyzed
• job_name: (optional) If not specified, a uuid is created as the job_name.
• artifact_name: (optional) If not specified, a uuid is created as the artifact_name.
• category_name: (optional) The name of the class (for supervised models) or cluster (for unsupervised models) for which to get feature importances. If this is not specified, the feature importances are computed over all classes/clusters.
• model_type: (optional) Model type from the population. Possible values include: DeepNeuralNetwork,
RandomForest, GradientBoosted.
Description: Analyze the universal feature importances for a particular model given the model name.
Note: This API is capable of returning the structure of the model in the form of a pandas Series.
Response Codes: 201, 400, 401, 403, 422
Successful Response:
{
"job_name": "string",
"artifact_name": "string"
}
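
The optional form fields above can be handled by sending only the values that were actually set, so that the service applies its own defaults (such as a generated uuid for job_name). The sketch below assumes URL-encoded form data. The function name analyze_model_request and the sample values are hypothetical.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://darwin-api.sparkcognition.com/v1"


def analyze_model_request(token, model_name, job_name=None,
                          artifact_name=None, category_name=None,
                          model_type=None):
    """Build a POST request for /v1/analyze/model/{model_name}."""
    optional = {
        "job_name": job_name,
        "artifact_name": artifact_name,
        "category_name": category_name,
        "model_type": model_type,
    }
    # Leave out unset optional fields so the server applies its defaults.
    form = {"model_name": model_name}
    form.update({k: v for k, v in optional.items() if v is not None})
    # Percent-encode the name so it is safe to place in the URL path.
    url = f"{BASE_URL}/analyze/model/{urllib.parse.quote(model_name, safe='')}"
    return urllib.request.Request(
        url,
        data=urllib.parse.urlencode(form).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


req = analyze_model_request("YOUR_TOKEN", "my_model", model_type="RandomForest")
print(req.full_url)
print(req.data)
```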

Request Type: POST
URI: /v1/analyze/model/predictions/{model_name}/{dataset_name}
Headers:
• Authorization: Bearer token
Form Data:
• dataset_name: The name of the dataset containing the data to analyze predictions for. This is a new
dataset, not used during training, for which you want feature importance scores for each row.
The dataset is limited to 500 rows. There is no limit on the number of columns.
• model_name: The name of the model to be analyzed
• job_name: (optional) If not specified, a uuid is created as the job_name.
• artifact_name: (optional) If not specified, a uuid is created as the artifact_name.
• start_index: (optional) Index to start at in the dataset when analyzing model predictions.
• end_index: (optional) Index to stop at in the dataset when analyzing model predictions.
• model_type: (optional) Model type from the population. Possible values include: DeepNeuralNetwork,
RandomForest, GradientBoosted.
Description: Analyze specific feature importances for a particular sample or samples given the model
name and sample data. Analyze predictions cannot be used if you trained your model with a dataset that
is larger than 500 MB.
Response Codes: 201, 400, 401, 403, 422
Successful Response:
{
"job_name": "string",
"artifact_name": "string"
}

Request Type: POST
URI: /v1/analyze/data/{dataset_name}
Headers:
• Authorization: Bearer token
Description: Analyze a dataset and return statistics/metadata concerning designated data.
Parameter Descriptions:
• dataset_name: The name of the dataset to analyze and return statistics/metadata for
• job_name: The job name
• artifact_name: The artifact name
• max_unique_values: Threshold for automatic pruning of categorical columns prior to one-hot encoding, based on the number of unique values
Note: If a categorical column contains at least max_unique_values unique values, it is dropped during preprocessing prior to one-hot encoding.
Payload:
{
"job_name": "string",
"artifact_name": "string",
"max_unique_values": 30
}
Response Codes: 201, 400, 401, 403, 408, 422
Successful Response:
{
"job_name": "string",
"artifact_name": "string"
}
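
The pruning rule described in the note for max_unique_values can be sketched in a few lines of Python. This illustrates only the threshold comparison as stated in this guide. The function name and sample columns are hypothetical, the default of 30 is copied from the example payload, and Darwin's actual preprocessing may differ.

```python
def columns_pruned(categorical_columns, max_unique_values=30):
    """Return the names of categorical columns that the rule would drop.

    `categorical_columns` maps a column name to its list of values.
    A column is dropped when it has at least `max_unique_values`
    distinct values.
    """
    return [
        name
        for name, values in categorical_columns.items()
        if len(set(values)) >= max_unique_values
    ]


sample = {
    "color": ["red", "green", "red", "blue"],   # 3 distinct values
    "serial": ["a1", "b2", "c3", "d4"],         # 4 distinct values
}
print(columns_pruned(sample, max_unique_values=4))  # ['serial']
```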

auth
Request Type: PATCH
URI: /v1/auth/email
Headers:
• Authorization: Bearer token
Description: Add or change an email address.
Form Data:
• email: Email address
Response Codes: 204, 400, 401, 422
Successful Response:
{
"access_token": "token_string"
}

Request Type: POST
URI: /v1/auth/login
Headers:
• Authorization: Bearer token
Description: Log in as a service.
Form Data:
• api_key: The API key of the service
• pass1: The service level password
Response Codes: 201, 400, 401
Successful Response:
{
"access_token": "token_string"
}

Request Type: POST
URI: /v1/auth/login/user
Description: Log in as a user.
Form Data:
• username: The end user’s name
• pass1: The end user’s password
Response Codes: 201, 400, 401, 422
Successful Response:
{
"access_token": "token_string"
}
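
A typical flow is to log in once and reuse the returned access_token as the bearer token on later calls. The sketch below builds the login request and converts a response body into an Authorization header. The function names are hypothetical, the form encoding is an assumption, and the sample body simply mirrors the successful response shown above.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://darwin-api.sparkcognition.com/v1"


def login_user_request(username, password):
    """Build a POST request for /v1/auth/login/user."""
    form = {"username": username, "pass1": password}
    return urllib.request.Request(
        f"{BASE_URL}/auth/login/user",
        data=urllib.parse.urlencode(form).encode("utf-8"),
        method="POST",
    )


def bearer_header(response_body):
    """Turn a login response body into an Authorization header."""
    token = json.loads(response_body)["access_token"]
    return {"Authorization": f"Bearer {token}"}


# Stand-in for the body returned by a successful login.
sample_body = '{"access_token": "token_string"}'
print(bearer_header(sample_body))
```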

Request Type: PATCH
URI: /v1/auth/password
Headers:
• Authorization: Bearer token
Description: Change the password.
Form Data:
• curpass: Current password
• newpass1: New password
• newpass2: Confirmation of new password
Response Codes: 204, 400, 401, 422
Successful Response:
{
"access_token": "token_string"
}

Request Type: PATCH
URI: /v1/auth/password/reset
Headers: None
Description: Reset a user’s password. An email will be sent to the user’s email address with a temporary
password and instructions for changing it.
Form Data:
• username: The username of the user whose password needs resetting
Response Codes: 201, 400, 401, 422
Successful Response:
{
"access_token": "token_string"
}

Request Type: POST
URI: /v1/auth/register
Headers: None
Description: Register as a service.
Form Data:
• api_key: The API key of the service
• pass1: The service level password
• pass2: The service level password confirmation
• email: Email address
Response Codes: 201, 400, 401, 403
Successful Response:
{
"access_token": "token_string"
}

Request Type: POST
URI: /v1/auth/register/user
Headers:
• Authorization: Bearer token
Description: Register a user for your service.
Form Data:
• username: The end user’s name
• pass1: The end user’s password
• pass2: The end user’s password confirmation
• email: The end user’s email address
Response Codes: 201, 400, 401, 422
Successful Response:
{
"access_token": "token_string"
}

Request Type: DELETE
URI: /v1/auth/register/user/{username}
Headers:
• Authorization: Bearer token
Description: Remove/Unregister a user.
Form Data:
• username: The username of the user to remove

Response Codes: 201, 401, 403
Successful Response: None

clean
Request Type: POST
URI: /v1/clean/dataset/{dataset_name}
Headers:
• Authorization: Bearer token
Description: Clean a named dataset. The output is the cleaned dataset, which is scaled and one-hot
encoded based on the parameters in /analyze/data. Use /download/dataset to retrieve the cleaned dataset.
Use /clean/dataset only to visualize what Darwin would do or to obtain cleaned data for use outside
of Darwin. Do not clean data and then train on the cleaned data with Darwin: /train/model performs
its own cleaning as part of the model creation process.
Form Data:
• dataset_name: Name of dataset to clean
• job_name: Name of job
• artifact_name: Name given to the cleaned dataset
• target: (Mandatory for Supervised Model Building) String denoting target prediction column in input
data.
• impute: String alias that indicates how to fill in missing values in input data.
ALIAS | DESCRIPTION | COMPLEXITY
‘ffill’ | (Default) Forward Fill: Propagate values forward from one example into the missing cell of the next example. Might be useful for timeseries data, but also applicable for both numerical and categorical data. | Linear, Fast
‘bfill’ | Backward Fill: Propagate values backward from one example into the missing cell of the previous example. Might be useful for timeseries data, but also applicable for both numerical and categorical data. | Linear, Fast
‘mean’ | Mean Fill: Computes the mean value of all non-missing examples in a column to fill in missing examples. The result may or may not be interpretable in terms of the input space for categorical variables. | Linear, Fast
• max_int_uniques: Expected input/type: integer. Threshold for automatic encoding of categorical
variables. If a column contains fewer than max_int_uniques unique values, it is treated as categorical
and one-hot encoded during preprocessing. Note: If the target has more unique numeric values than
max_int_uniques, the problem is treated as a regression and uses MSE.
• max_unique_values: Expected input/type: integer. Threshold for automatic pruning of categorical
columns prior to one-hot encoding, based on the number of unique values.
Note: If a categorical column contains at least max_unique_values unique values, it is dropped during preprocessing prior to one-hot encoding.
Response Codes: 400, 401, 403, 422
Successful Response:
{
"job_name": "string",
"artifact_name": "string"
}
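
To make the three impute aliases concrete, the following pure-Python sketch applies each strategy to a single numeric column with missing entries. It illustrates the semantics described for the impute parameter only and is not Darwin's implementation. The function name impute and the use of None for missing cells are assumptions.

```python
def impute(values, how="ffill"):
    """Fill None entries in a list of numbers using one impute alias."""
    filled = list(values)
    if how == "ffill":
        # Carry the previous value forward into each missing cell.
        for i in range(1, len(filled)):
            if filled[i] is None:
                filled[i] = filled[i - 1]
    elif how == "bfill":
        # Carry the next value backward into each missing cell.
        for i in range(len(filled) - 2, -1, -1):
            if filled[i] is None:
                filled[i] = filled[i + 1]
    elif how == "mean":
        present = [v for v in filled if v is not None]
        if present:
            mean = sum(present) / len(present)
            filled = [mean if v is None else v for v in filled]
    else:
        raise ValueError(f"unknown impute alias: {how}")
    return filled


column = [1.0, None, 3.0, None]
print(impute(column, "ffill"))  # [1.0, 1.0, 3.0, 3.0]
print(impute(column, "bfill"))  # [1.0, 3.0, 3.0, None]
print(impute(column, "mean"))   # [1.0, 2.0, 3.0, 2.0]
```

Note that a forward fill cannot fill a leading gap and a backward fill cannot fill a trailing gap, as the second output line shows.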

download
Request Type: GET
URI: /v1/download/artifacts/{artifact_name}
Headers:
• Authorization: Bearer token
Description: Download an artifact by name.
Form Data:
• artifact_name: Name of the artifact to download
Response Codes: 201, 401, 404, 408, 422
Successful Response:
{
"artifact": "artifact_name"
}

Request Type: DELETE
URI: /v1/download/artifacts/{artifact_name}
Headers:
• Authorization: Bearer token
Description: Delete an artifact.
Form Data:
• artifact_name: Name of the artifact to delete
Response Codes: 204, 401, 404, 408, 422
Successful Response: None

Request Type: GET
URI: /v1/download/dataset/{dataset_name}
Headers:
• Authorization: Bearer token
Description: Download a dataset by name. It can be an original or cleaned dataset.
Form Data:
• dataset_name: Name of the dataset to download. In the case of downloading a cleaned dataset, this
would be the name returned by /clean/dataset/{dataset_name}.
• file_part: Part number of a multi-part dataset, expressed as an integer.
Response Codes: 401, 404, 408, 422
Successful Response:
{
"dataset": "string",
"part": 1,
"note": "string"
}

Request Type: GET
URI: /v1/download/model/{model_name}
Headers:
• Authorization: Bearer token
Description: Download a supervised model by name.
Form Data:
• model_name: Name of the model to download
• path: (optional) Relative or absolute path of the directory to download the model to. This directory
must already exist prior to model download. If no path is specified, the current directory is used.
There are two files associated with a model: ’model’ and ’data_profiler’.
• model_type: (optional) Model type of the model to be downloaded. Possible values include: DeepNeuralNetwork, RandomForest, GradientBoosted.
• model_format: (optional) Format in which the model is to be downloaded. Possible values include:
json, onnx.

Response Codes: 401, 404, 408, 422
Successful Response:
A successful response returns a .zip file, which contains two files: the supervised model itself and the
data profiler. Downloading unsupervised models is not supported.
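
Once the .zip response has been received, it can be unpacked with the Python standard library. The sketch below assumes that the archive members are named model and data_profiler, following the file names mentioned in the path parameter description; the actual member names may differ. The function name extract_model is hypothetical, and the archive is built in memory so that the example runs without calling the API.

```python
import io
import os
import tempfile
import zipfile


def extract_model(zip_bytes, path="."):
    """Extract a downloaded model archive into an existing directory."""
    if not os.path.isdir(path):
        # Mirrors the rule above: the target directory must already exist.
        raise FileNotFoundError(f"directory does not exist: {path}")
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(path)
        return sorted(archive.namelist())


# Stand-in archive so the sketch runs without calling the API.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("model", b"placeholder")
    archive.writestr("data_profiler", b"placeholder")

with tempfile.TemporaryDirectory() as target:
    extracted = extract_model(buffer.getvalue(), target)
print(extracted)  # ['data_profiler', 'model']
```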

job
Request Type: GET
URI: /v1/job/status
Headers:
• Authorization: Bearer token
Query Parameters:
• age: List jobs that are less than X units old (for example, 3 weeks or 2 days)
• status: List jobs with a particular status, for example, Running
Description: Get the status of all jobs. Note that only 2 jobs can run concurrently.
Response Codes: 200, 400, 401, 422
Successful Response:
[
{
"job_name": "job1_name",
"status": "Requested",
"starttime": "2018-01-30T13:27:46.449865",
"endtime": "2018-01-30T13:28:46.449865",
"percent_complete": 0,
"job_type": "TrainModel",
"loss": 0,
"generations": 0,
"dataset_names": [
"phone_data"
],
"artifact_names": [
"art1"
],
"model_name": null,
"job_error": "string"
},
{
"job_name": "job2_name",
"status": "Running",
"starttime": "2018-01-30T13:27:46.449865",
"endtime": "2018-01-30T13:28:46.449865",
"percent_complete": 23,
"job_type": "UpdateModel",
"loss": 0.92,
"generations": 50,
"dataset_names": [
"language_data"
],
"artifact_names": null,
"model_name": "test_model",
"job_error": "string"
}
]
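
The age and status filters are passed as query parameters. The following sketch builds such a URL with the standard library. The exact string format the service accepts for age is an assumption based on the examples given above ("3 weeks", "2 days"), and the function name job_status_request is hypothetical.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://darwin-api.sparkcognition.com/v1"


def job_status_request(token, age=None, status=None):
    """Build a GET request for /v1/job/status with optional filters."""
    # Keep only the filters that were actually supplied.
    query = {k: v for k, v in {"age": age, "status": status}.items() if v}
    url = f"{BASE_URL}/job/status"
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


req = job_status_request("YOUR_TOKEN", age="2 days", status="Running")
print(req.full_url)
```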

Request Type: GET
URI: /v1/job/status/{job_name}
Headers:
• Authorization: Bearer token
Description: Get the status for a particular job.
Form Data:
• job_name: The job name you want status on.
Response Codes: 200, 400, 401, 403, 404, 422
Successful Response:
{
"status": "Requested, Running, Completed",
"starttime": "string",
"endtime": "string",
"percent_complete": 30,
"job_type": "string",
"loss": 0,
"generations": 0,
"dataset_names": [
"string"
],
"artifact_names": [
"string"
],
"model_name": "string",
"job_error": "string"
}
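
Because analyze, clean, run, and train calls return a job_name rather than a result, a client usually polls this route until the job finishes. The sketch below shows one way to do that. It treats any status other than Requested or Running as final, which is an assumption, since only Requested, Running, and Completed are listed here. The function name wait_for_job and the injected fetch_status callable are hypothetical; the canned responses stand in for real API calls.

```python
import time


def wait_for_job(fetch_status, job_name, poll_seconds=15.0, max_polls=240,
                 sleep=time.sleep):
    """Poll until the job is no longer Requested or Running.

    `fetch_status` is a caller-supplied function that performs
    GET /v1/job/status/{job_name} and returns the decoded JSON dict.
    """
    for _ in range(max_polls):
        status = fetch_status(job_name)
        if status["status"] not in ("Requested", "Running"):
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_name!r} still in progress after {max_polls} polls")


# Canned responses stand in for real calls to the API.
responses = iter([
    {"status": "Requested", "percent_complete": 0},
    {"status": "Running", "percent_complete": 30},
    {"status": "Completed", "percent_complete": 100},
])
final = wait_for_job(lambda name: next(responses), "my_job", sleep=lambda s: None)
print(final["status"])  # Completed
```

After the loop returns, checking job_error in the returned dict is a reasonable next step before downloading any artifact.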

Request Type: PATCH
URI: /v1/job/status/{job_name}
Headers:
• Authorization: Bearer token
Description: Stop a running job.
Form Data:
• job_name: The job name you want to stop.
Response Codes: 200, 400, 401, 403, 404, 422
Successful Response:
"Job is scheduled to stop"

Request Type: DELETE
URI: /v1/job/status/{job_name}
Headers:
• Authorization: Bearer token
Description: Soft delete a running job.
Form Data:
• job_name: The job name you want to delete.
Response Codes: 200, 400, 401, 403, 404, 422
Successful Response:
None

lookup
Request Type: GET
URI: /v1/lookup/limits
Headers:
• Authorization: Bearer token
Description: Get a client’s usage limit metadata.
Response Codes: 200, 401, 422
Successful Response:

{
"username": "string",
"tier": 0,
"model_limit": 0,
"job_limit": 0,
"upload_limit": 0,
"user_limit": 0
}
Request Type: GET
URI: /v1/lookup/artifact
Headers:
• Authorization: Bearer token
Query Parameters:
• type: Filter on the type of artifact (for example, Model, Dataset, Test, or Run)
Description: Get artifact metadata.
Response Codes: 200, 401, 422
Successful Response:
[
{
"id": "string",
"name": "string",
"type": "string",
"created_at": "2018-01-22T19:00:39.863Z",
"mbytes": 0
}
]

Request Type: GET
URI: /v1/lookup/artifact/{artifact_name}
Headers:
• Authorization: Bearer token
Description: Get artifact metadata for a single artifact.
Form Data:
• artifact_name: The artifact name you want to look up.
Response Codes: 200, 401, 404, 422
Successful Response:

{
"name": "string",
"type": "string",
"created_at": "2018-01-22T19:00:39.869Z",
"mbytes": 0
}

Request Type: GET
URI: /v1/lookup/model
Headers:
• Authorization: Bearer token
Description: Get the model metadata for a user. This is useful if a user has forgotten certain model
names.
Response Codes: 200, 401, 422
Successful Response:
[
{
"id": {},
"name": "model1_name",
"type": "Supervised",
"updated_at": "2017-02-03T073000",
"problem_type": "string",
"trained_on": ["dataset1_id", "dataset2_id"],
"generations": 100,
"loss": 0.8,
"complete": {},
"parameters": {},
"train_time_seconds": 240,
"algorithm": "string",
"running_job_id": "string",
"description": {"best_genome": "RandomForestClassifier", "recurrent": False}
},
{
"id": {},
"name": "model2_name",
"type": "Ensembled",
"updated_at": "2017-08-22T175022",
"trained_on": ["dataset3_id"],
"loss": 0.82,
"complete": {},
"generations": 80,
"parameters": {
"target": "target1"
},
"train_time_seconds": 180,
"algorithm": "string",
"running_job_id": "string",
"description": {"best_genome": "DeepNet(\n (l0): LSTM(20, 18, num_layers=2)\n (l1): Linear(in_features=18, out_features=1, bias=True)\n)", "recurrent": True}
}
]
Note: running_job_id is only returned when complete is False.

Request Type: GET
URI: /v1/lookup/model/{model_name}
Headers:
• Authorization: Bearer token
Description: Get all of the model metadata for a particular model.
Form Data:
• model_name: The model name you want to look up.
Response Codes: 200, 401, 404, 422
Successful Response:
{
"type": "Unsupervised",
"updated_at": "2017-02-03T073000",
"trained_on": ["dataset1_id", "dataset2_id"],
"generations": 100,
"loss": 0.8,
"parameters": {},
"train_time_seconds": 180,
"algorithm": "string",
"running_job_id": "string",
"description": {"best_genome": "RandomForestClassifier", "recurrent": False}
}
Note: running_job_id is only returned when complete is False.

Request Type: GET
URI: /v1/lookup/model/{model_name}/population
Headers:
• Authorization: Bearer token
Description: Get model descriptions of the best genomes for all model types that were trained. The
population is displayed for unsupervised models only.
Form Data:
• model_name: The model name or identifier.
Response Codes: 201, 401, 404, 422
Successful Response:
{
"population": {
"model_types": {
"DeepNeuralNetwork": {
"model_description": "string",
"loss_function": "string",
"fitness": Double
},
"RandomForest": {
"model_description": "string",
"loss_function": "string",
"fitness": Double
},
"GradientBoosted": {
"model_description": "string",
"loss_function": "string",
"fitness": Double
}
}
}
}

Request Type: GET
URI: /v1/lookup/dataset
Headers:
• Authorization: Bearer token
Description: Get the dataset metadata for a user. This is useful if a user has forgotten certain dataset
names.
Response Codes: 200, 401, 422
Successful Response:

[
{
"name": "dataset1_name",
"mbytes": 0.2,
"minimum_recommended_train_time": "string",
"updated_at": "20170924T000000",
"categorical": False,
"sequential": True,
"imbalanced": True
},
{
"name": "dataset2_name",
"mbytes": 3.5,
"minimum_recommended_train_time": "string",
"updated_at": "20170902T010101",
"categorical": True,
"sequential": False,
"imbalanced": False
}
]

Request Type: GET
URI: /v1/lookup/dataset/{dataset_name}
Headers:
• Authorization: Bearer token
Description: Get all of the metadata for a particular dataset.
Form Data:
• dataset_name: The dataset name for which you want the metadata.
Response Codes: 200, 401, 404, 422
Successful Response:
{
"mbytes": 0.2,
"minimum_recommended_train_time": "string",
"updated_at": "20170924T000000",
"categorical": False,
"sequential": True,
"imbalanced": True
}

Request Type: GET
URI: /v1/lookup/tier
Headers:
• Authorization: Bearer token
Description: Get all of the tier metadata.
Response Codes: 200, 401, 422
Successful Response:
[
{
"tier": 0,
"model_limit": 0,
"job_limit": 0,
"upload_limit": 0,
"user_limit": 0
}
]

Request Type: GET
URI: /v1/lookup/tier/{tier_num}
Headers:
• Authorization: Bearer token
Description: Get the metadata for a particular tier.
Form Data:
• tier_num: Tier for which you want metadata.
Response Codes: 200, 401, 404, 422
Successful Response:
{
"tier": 0,
"model_limit": 0,
"job_limit": 0,
"upload_limit": 0,
"user_limit": 0
}

Request Type: GET
URI: /v1/lookup/user
Headers:
• Authorization: Bearer token
Description: Get user metadata for all users.
Response Codes: 200, 401, 422
Successful Response:
[
{
"user_id": "string",
"internal_name": "string",
"username": "string",
"tier": 0,
"created_at": "string",
"client_api_key": "string",
"expires_on": "string",
"parent_id": "string"
}
]
Request Type: GET
URI: /v1/lookup/user/{username}
Headers:
• Authorization: Bearer token
Description: Get user metadata for a particular user.
Form Data:
• username: Username for which you want user metadata.
Response Codes: 200, 401, 404, 422
Successful Response:
{
"user_id": "string",
"internal_name": "string",
"username": "string",
"tier": 0,
"created_at": "string",
"client_api_key": "string",
"expires_on": "string",
"parent_id": "string"
}

run
Request Type: POST
URI: /v1/run/model/{model_name}/{dataset_name}
Headers:
• Authorization: Bearer token
Form Data:
• model_name: The name of the model.
• artifact_name: The name of the artifact.
• dataset_name: The name of the dataset.
• anomaly: Setting this parameter to True indicates that an isolation forest should be built for
anomaly detection. If set to True, clustering will automatically be interpreted as False.
• supervised: (Deprecated. This argument exists only for backward compatibility.) A boolean
(True/False) indicating whether the model is supervised or not, for example, set this to False for
unsupervised.
• model_type: (optional) Model type of the model to be run. Possible values include: DeepNeuralNetwork, RandomForest, GradientBoosted.
Description: Run a model on a dataset and return the predictions/classifications/clusters found by the
model.
Response Codes: 201, 400, 401, 403, 404, 408, 422
Successful Response:
{
"job_name": "name_of_job",
"artifact_name": "name_of_artifact"
}

train
Request Type: POST
URI: /v1/train/model
Headers:
• Authorization: Bearer token
Description: Create a model trained on the dataset identified by dataset_names.
Parameter Descriptions:
• dataset_names: A list of dataset names to use for training. The maximum file size is 500 MB for
unsupervised and NBM and 10 GB for supervised.
Note: Using only 1 dataset is currently supported.
• job_name: The job name.
• model_name: The string identifier of the model to be trained.
• loss_fn_name: Specify the loss function. Possible values include: "CrossEntropy", "MSE", "BCE",
"L1", "NLL", "BCEWithLogits", "SmoothL1". "CrossEntropy" can be used for classification data, while
all others can be used for regression data. The default value is "CrossEntropy" if this field is left
empty.
• fitness_fn_name: Specify the fitness function. This represents the name of the fitness function used
for evolution of the model population during training. Possible values include: "Accuracy", "F1",
"R2", "MSE". "F1" is the default for classification and "R2" is the default for regression problems.
"Accuracy" and "F1" are for classification only. "R2" and "MSE" are for regression only.
• max_train_time (supervised only): Sets the training time for the model in ‘HH:MM’ format. Default
value is 00:01.
• max_epochs (unsupervised only): Expected input/type: numeric. Sets the training time for the
model in epochs. Default value is 10.
• recurrent: Expected input/type: True/False. Enables recurrent connections to be evolved in the
model. This option can be useful for timeseries or sequential data.
Note: This option is automatically enabled if a datetime column is detected in the input data. This
may result in slower model evolution.
• impute: String alias that indicates how to fill in missing values in input data.
ALIAS | DESCRIPTION | COMPLEXITY
‘ffill’ | (Default) Forward Fill: Propagate values forward from one example into the missing cell of the next example. Might be useful for timeseries data, but also applicable for both numerical and categorical data. | Linear, Fast
‘bfill’ | Backward Fill: Propagate values backward from one example into the missing cell of the previous example. Might be useful for timeseries data, but also applicable for both numerical and categorical data. | Linear, Fast
‘mean’ | Mean Fill: Computes the mean value of all non-missing examples in a column to fill in missing examples. The result may or may not be interpretable in terms of the input space for categorical variables. | Linear, Fast
• anomaly: Setting this parameter to True indicates that an isolation forest should be built for
anomaly detection. If set to True, clustering will automatically be interpreted as False.
• n_clusters (unsupervised only): Specifies the number of clusters to be used.
Note: If this value is not provided, the number of clusters will be heuristically determined.
• anomaly_prior (unsupervised only): Expected input/type: number in the range [0, 1]. Significance
level at which a point is defined as anomalous. This is used only for unsupervised problems when
clustering is disabled.
• lead_time_days (nbm only): Expected input/type: integer. Default value is 60. The number of days
prior to failure when the behavior starts trending toward either abnormal behavior or failure.
• nbm_window_size (nbm only): Expected input/type: integer. Default value is 256. The number of
sample points to consider for each failure detection.
• nbm (nbm only): Expected input/type: True/False. Default value is False. Set value to True for a
normal behavioral model (NBM).
• failure_dates (nbm only): Expected input/type: string. List of failure dates to use for the calculation.
Currently, only a list of one date can be used in the query. Example date format: "07/01/2015"
• recovery_dates (nbm only): Expected input/type: string. List of recovery dates to use for the
calculation. Currently, only a list of one date can be used in the query. Example date format:
"11/01/2015"
Payload:
{
"dataset_names": ["dataset_name1"],
"job_name": "my_job",
"model_name": "string",
"loss_fn_name": "CrossEntropy",
"fitness_fn_name": "Accuracy",
"max_train_time": "00:01",
"max_epochs": 0,
"recurrent": True,
"impute": "mean",
"drop": "no",
"feature_eng": "mi",
"feature_select": 1,
"outlier": "mad",
"imbalance": True,
"anomaly": False,
"n_clusters": 5,
"anomaly_prior": 0.01,
"lead_time_days": 60,
"nbm_window_size": 256,
"nbm": False,
"return_risk": True,
"failure_dates": ["string"],
"recovery_dates": ["string"],
"scaler": "MinMax",
"target_scaler": "MinMax"
}
Response Codes: 201, 400, 401, 403, 404, 408, 422
Successful Response:
{
"job_name": "nameofjob",
"model_name": "nameofmodel"
}
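
The constraints stated in the parameter descriptions (valid loss and fitness names, the default fitness function per problem type, and the HH:MM time format) can be checked client-side before a training job is submitted. The sketch below assembles a minimal supervised payload under those rules. The function name train_payload is hypothetical, and inferring the problem type from the loss function follows the wording of the loss_fn_name description rather than any documented server behavior.

```python
import json
import re

LOSS_FUNCTIONS = {"CrossEntropy", "MSE", "BCE", "L1", "NLL",
                  "BCEWithLogits", "SmoothL1"}
FITNESS_BY_PROBLEM = {
    "classification": {"default": "F1", "allowed": {"Accuracy", "F1"}},
    "regression": {"default": "R2", "allowed": {"R2", "MSE"}},
}


def train_payload(dataset_name, model_name, job_name,
                  loss_fn_name="CrossEntropy", fitness_fn_name=None,
                  max_train_time="00:01"):
    """Assemble a supervised /v1/train/model payload with basic checks."""
    if loss_fn_name not in LOSS_FUNCTIONS:
        raise ValueError(f"unknown loss function: {loss_fn_name}")
    # Per the parameter descriptions, CrossEntropy is the classification
    # loss and the others are regression losses.
    problem = "classification" if loss_fn_name == "CrossEntropy" else "regression"
    rules = FITNESS_BY_PROBLEM[problem]
    fitness_fn_name = fitness_fn_name or rules["default"]
    if fitness_fn_name not in rules["allowed"]:
        raise ValueError(f"{fitness_fn_name} is not valid for {problem}")
    if not re.fullmatch(r"\d{2}:[0-5]\d", max_train_time):
        raise ValueError("max_train_time must use HH:MM format")
    return {
        "dataset_names": [dataset_name],  # only one dataset is supported
        "job_name": job_name,
        "model_name": model_name,
        "loss_fn_name": loss_fn_name,
        "fitness_fn_name": fitness_fn_name,
        "max_train_time": max_train_time,
    }


payload = train_payload("dataset_name1", "my_model", "my_job",
                        max_train_time="00:30")
print(json.dumps(payload, indent=2))
```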

Request Type: PATCH
URI: /v1/train/model/{model_name}
Headers:
• Authorization: Bearer token
Description: Resume training for a model on the dataset identified by dataset_names.
Parameter Descriptions:
• dataset_names: A list of dataset names to use for training.
Note: Using only 1 dataset is currently supported.
• job_name: The job name
• max_train_time (supervised only): Sets the training time for the model in ‘HH:MM’ format. Default
value is 00:01.
• max_epochs (unsupervised only): Sets the training time for the model in epochs. Default value is 10.
Payload:
{
"dataset_names": ["dataset_name1"],
"job_name": "my_job",
"max_train_time": "00:01",
"max_epochs": 0
}
Response Codes: 201, 401, 403, 404, 408, 422
Successful Response:
{
"job_name": "nameofjob",
"model_name": "nameofmodel"
}

Request Type: DELETE
URI: /v1/train/model/{model_name}
Headers:
• Authorization: Bearer token
Description: Delete a model.
Form Data:
• model_name: Name of the model to delete.
Response Codes: 204, 400, 401, 403, 404, 408, 422
Successful Response: None

upload
Request Type: POST
URI: /v1/upload
Headers:
• Authorization: Bearer token
Description: Upload a dataset.
Form Data:
• dataset: A dataset file in a supported format (csv, h5)
• dataset_name: The name for the uploaded dataset
Note: If not set, a GUID is assigned.
Response Codes: 201, 400, 401, 403, 408, 413, 422
Successful Response:
{
"dataset_name": "name_of_dataset"
}
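
File uploads sent as form data are commonly encoded as multipart/form-data. The sketch below builds such a request body by hand with the standard library. That the Darwin endpoint expects multipart encoding, and the part content type used for the file, are assumptions. The function name upload_request and the sample file content are hypothetical.

```python
import urllib.request
import uuid

BASE_URL = "https://darwin-api.sparkcognition.com/v1"


def upload_request(token, dataset_name, filename, content):
    """Build a multipart POST request for /v1/upload.

    `content` is the raw bytes of the csv or h5 file.
    """
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        # Text field: dataset_name
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="dataset_name"',
        b"",
        dataset_name.encode("utf-8"),
        # File field: dataset
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="dataset"; '
        f'filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        # Closing boundary
        f"--{boundary}--".encode(),
        b"",
    ])
    return urllib.request.Request(
        f"{BASE_URL}/upload",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


req = upload_request("YOUR_TOKEN", "phone_data", "phone_data.csv", b"a,b\n1,2\n")
print(req.get_method(), req.full_url, len(req.data))
```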

Request Type: DELETE
URI: /v1/upload/{dataset_name}
Headers:
• Authorization: Bearer token
Description: Delete a dataset.
Form Data:
• dataset_name: Name or identifier of dataset to delete.
Response Codes: 204, 401, 403, 404, 422
Successful Response: None

Revision Table
Version | Date | Notes
v 1.0 | 02-Feb-2018 | First Release
v 1.1 | 15-Feb-2018 | Added types: supervised and ensembled
v 1.2(pre) | 16-Mar-2018 | Added Status: Type= PATCH
v 1.2 | 27-Mar-2018 | Added or changed: /v1/job/status/{job_name}, /v1/lookup/user, /v1/lookup/username/{username}, /v1/train/model, /v1/run/model/{model_name}/{dataset_name}. Name change: /v1/lookup/client to /v1/lookup/limits
v 1.3 | 23-May-2018 | Added or changed: /v1/analyze/model/{model_name}, /v1/analyze/model/predictions/{model_name}/{dataset_name}, /v1/auth/email, /v1/auth/password/reset, /v1/auth/register, /v1/train/model, /v1/train/model/{model_name}. Name change: /v1/lookup/client to /v1/lookup/limits
v 1.3.1 | 14-Jun-2018 | Edits to: /v1/job/status/, /v1/download/artifacts, Model uses example
v 1.4 | 31-Jul-2018 | Edits to: /v1/analyze/model/{model_name}, /v1/analyze/data/{dataset_name}, /v1/lookup/model, /v1/lookup/model/{model_name}, /v1/train/model, /v1/train/model/{model_name}
v 1.5 | 15-Oct-2018 | Added: /v1/clean/dataset/{dataset_name}, /v1/download/dataset/{dataset_name}, /v1/download/model/{model_name}. Edits to: /v1/analyze/data/{dataset_name}, /v1/lookup/model, /v1/train/model, /v1/download/artifacts/{artifact_name}
v 1.6 | 16-Jan-2019 | Added: /v1/lookup/model/{model_name}/population. Edits to: /v1/analyze/model/predictions/{model_name}/{dataset_name}, /v1/analyze/model/{model_name}, /v1/clean/dataset/{dataset_name}, /v1/download/model/{model_name}, /v1/train/model, /v1/run/model/{model_name}/{dataset_name}