UNCLASSIFIED

ASSEMBLYLINE 3.2
REFERENCE MANUAL

2016 - Communications Security Establishment

Revision 6


TABLE OF CONTENTS

Introduction
    Typical workflows
        Direct user submission
        High volume ingest
Infrastructure
    Servers
        Core
        Datastore
        Workers
        Logger (optional)
        Support (optional)
    Components
        Dispatcher
        Middleman
        Expiry / Expiry workers
        Journalist
        Alerter
        Alert actions
        Workflow filters
        System metrics
        Metricsd
        Controller
        Hostagent
        QuotaSniper
Deployments
    Types
        Development VM
        Appliance
        Production cluster
        Custom instance
Services
    Service VM
Data layout
    Common fields
    Buckets
        alert
        blob
        emptyresult
        error
        file
        filescore
        node
        profile
        result
        signature
        submission
        user
Data security
    Text representation
    Parts
        Level
        Required
        Group
        Subgroup
Managing the system
    Dashboard
    Management UI
        Build Documentation
        Configuration
        Error Viewer
        Hosts
        Profiles
        Provisioning
        Services
        Site Map
        Users
        Virtual Machines
    CLI
        backup
        backup [follow] [force]
        delete [force]
        delete submission full [force]
        index commit []
        restore
        signature change_status by_query [force]
    iPython
        Load datastore
        What to and not to do with a datastore object
    Repairing Corrupted eleveldb Index
Seed demystified
    seed.auth
        seed.auth.internal
        seed.auth.internal.users
    seed.core
        seed.core.alerter
        seed.core.bulk
        seed.core.dispatcher
            seed.core.dispatcher.max
            seed.core.dispatcher.timeouts
        seed.core.expiry
            seed.core.expiry.journal
        seed.core.middleman
            seed.core.middleman.sampling_at
        seed.core.redis
            seed.core.redis.nonpersistent
            seed.core.redis.persistent
    seed.datasources
    seed.datastore
        seed.datastore.riak
            seed.datastore.riak.nvals
            seed.datastore.riak.solr
            seed.datastore.riak.tweaks
    seed.filestore
    seed.installation
        seed.installation.docker
        seed.installation.external_packages
        seed.installation.hooks
        seed.installation.repositories
            seed.installation.repositories.realms
            seed.installation.repositories.repos
        seed.installation.supplementary_packages
    seed.logging
        seed.logging.logserver
            seed.logging.logserver.elasticsearch
            seed.logging.logserver.kibana
            seed.logging.logserver.ssl
    seed.monitoring
    seed.services
        seed.services.limits
        seed.services.master_list
        seed.services.timeouts
    seed.statistics
    seed.submissions
        seed.submissions.max
    seed.system
        seed.system.classification
        seed.system.internal_repository
        seed.system.yara
    seed.ui
        seed.ui.ssl
            seed.ui.ssl.certs
    seed.workers
        seed.workers.virtualmachines
            seed.workers.virtualmachines.master_list
                seed.workers.virtualmachines.master_list.cfg

INTRODUCTION
Assemblyline is a scalable distributed file analysis framework. It is designed to process millions of files per day but can also be installed on a single server. It is built in Python and has a web interface through which users submit tasks. Assemblyline also provides a web API for easy scripting.
Think of Assemblyline as an in-out service. You have a file and you need to know as much as you can about it, including whether or not it is malicious. You send it to Assemblyline, it gets scanned by as many services as possible, and the data is aggregated for you in a report so you can make a decision about the file. The files and reports in Assemblyline have a time to live; they are not stored forever.
Assemblyline can also be integrated into your workflow. It has an ingestion API that can generate alerts for analysts to monitor.

TYPICAL WORKFLOWS

There are two typical workflows. The first is low volume task submission (direct user submission). The second is high volume file ingestion. These workflows are described below.

DIRECT USER SUBMISSION

In the case of direct submissions, the task received by the API is sent directly to the dispatcher for processing. The dispatcher analyses the type of file submitted and routes the submission to the appropriate workers for analysis. When the workers are done, they save their analysis results directly in the Datastore, notify the dispatcher that they have completed their task, and move on to the next one. Once the dispatcher has received all completion messages for a given submission, it marks the submission as completed in the Datastore.
Direct submissions are subject to submission quotas: a given user cannot have more than X concurrent submissions. Submission quotas prevent the system from being overloaded. In this mode, however, the user is guaranteed to get a result for their submission.
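A concurrent-submission quota of this kind can be sketched in a few lines. This is an illustrative in-memory version, not Assemblyline's actual implementation; the class, method names and default limit are invented for the example:

```python
class QuotaExceeded(Exception):
    """Raised when a user already has too many inflight submissions."""

class SubmissionQuota:
    # Hypothetical in-memory tracker; a real deployment tracks this centrally.
    def __init__(self, max_concurrent=5):
        self.max_concurrent = max_concurrent
        self.inflight = {}  # user -> number of active submissions

    def acquire(self, user):
        """Reserve a submission slot or refuse the submission outright."""
        count = self.inflight.get(user, 0)
        if count >= self.max_concurrent:
            raise QuotaExceeded(f"{user} already has {count} submissions inflight")
        self.inflight[user] = count + 1

    def release(self, user):
        """Free a slot when the dispatcher marks the submission completed."""
        self.inflight[user] = max(0, self.inflight.get(user, 0) - 1)
```

Because a slot is only released on completion, a user's refused submissions never queue up; they must retry once earlier submissions finish.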


HIGH VOLUME INGEST

For high volume ingest, all submissions are routed from the API entry point to a persistent ingest queue. The middleman process pulls from this queue, removing duplicates and whitelisted files, to reduce the load on the system before sending submissions for processing. For each unique file that middleman encounters, it sends the submission to the dispatcher. Once a submission makes it to the dispatcher, the workflow is the same as a direct user submission with one addition: when a submission is completed, the dispatcher notifies middleman. If the submission's score reaches the system's alert threshold, middleman tasks Alerter to create an alert for the given submission.
In this mode, submissions are not subject to quotas; they are instead prioritized and submitted in order of priority. When the system is overloaded, queued submissions are randomly deleted to reduce load on the system. You are not guaranteed to get a result for your submission if the system is overloaded.


INFRASTRUCTURE
The Assemblyline infrastructure consists of five different types of servers: Core, Datastore, Worker, Logger and Support.

SERVERS
CORE
The Core server is the central server. It is responsible for orchestrating the operation of the system. It hosts the web interface/API, the components responsible for receiving and dispatching tasks to the workers, and a local cache of the files being processed.
The following off-the-shelf components run on the core server:
● Redis (Queuing)
● ProFTPd (File transfer to the workers)
● NGinx, UWSGI, Flask, AngularJS (Web server/API)
● Gunicorn (Websocket)
● HAProxy (Load balancing)
The following custom components (described later) also run on the core server:
● Dispatcher (Tasking)
● Middleman (High volume data ingestion)
● Expiry, Journalist (Data deletion)
● Alerter, Alert Action, Workflow Filter (Alerting)
● System metrics, Metricsd (Infrastructure health management)
● QuotaSniper (Quota and session manager)


DATASTORE
The Datastore servers are where the data is stored before it expires. These servers use Riak KV to store the data, with full text indexing provided by SOLR. Riak is fully distributed with redundancy and is horizontally scalable.
The following off-the-shelf components run on the datastore servers:
● Riak KV (Data storage)
● SOLR (Full text indexing - built into Riak KV)
The following custom components (described later) also run on the datastore servers:
● System metrics (Infrastructure health management)

WORKERS
The worker boxes are responsible for hosting the services. There are two types of workers: provisioned and flex.
Provisioned workers load a profile associated with their MAC address from the datastore and spawn the different services and VMs associated with the profile. They ensure the services and the VMs stay up 100% of the time for faster throughput.
Flex workers don't have a specific job to do. Instead, they inspect current bottlenecks in the infrastructure and spawn services or VMs to address them.
The following custom components (described later) run on the workers:
● Controller (Tasking for hostagent)
● Hostagent (Running services and VMs)
● System metrics (Infrastructure health management)


LOGGER (OPTIONAL)
The Logger server is where all the metrics and logs from the different servers and components are aggregated. This server is optional because it is not required for proper processing of the data. That said, for a cluster deployment it is recommended to have a Logger server to make sure the system stays in good health.
The following off-the-shelf components run on the logger:
● NGinx (Web proxy)
● Kibana 4 (Logs and dashboard display)
● Elasticsearch (Logs and metrics storage)
● Syslogd (Logs aggregator)
● Logstash (Parser for logs going to Elasticsearch)
● Filebeat (File ingestor for Logstash)
The following custom components (described later) also run on the logger:
● System metrics (Infrastructure health management)

SUPPORT (OPTIONAL)

Support servers are optional, but some services require external components to be installed to work properly.
A support server is useful for:
● Storing the VM disks for the multiple Assemblyline VMs you might have in your infrastructure.
● Hosting a Docker registry for the different services that use Docker containers.
● Hosting a database that a service can query during execution (NSRL, for example).
● Hosting an ICAP proxy for scanning your files through an AV scanner.
There are no install procedures for support servers and no internally built components run on them. They are just another thing you have to consider when you plan a deployment.


COMPONENTS
The following is a list of all the components in the system, their purpose, and how they fit into the
infrastructure.

DISPATCHER
Dispatcher is the core tasking component in the system. It checks the type of the file submitted against the list of services that are registered and currently online, and routes each file to the appropriate services. It keeps track of the children of a given file (zip extraction, etc.) and ensures that a submission is not completed until all the children have been processed and all files have been sent to the appropriate services. It keeps track of errors in the system and re-queues jobs if it detects that a failure is recoverable. It is the dispatcher's job to mark a submission completed when all work is done.
Dispatcher does all its queuing using non-persistent Redis queues. If the dispatcher is restarted, all inflight submissions are restarted from the beginning. This is usually not a problem because Assemblyline has service level caching.
Dispatcher also keeps metrics on how many files are being completed in the system.

MIDDLEMAN
Middleman is our high volume ingestion component. It takes all submissions created using the ingest API and sorts them into different priority queues¹ (Critical, High, Medium and Low). It then fills half of the dispatcher's maximum inflight queue with submissions, starting with the highest priority queue and continuing until all queues are exhausted.
Middleman can also deal with impossible-to-finish backlogs. When the queues reach a certain threshold, middleman starts sampling the queues, using a method that becomes more aggressive in proportion to the size of the backlog, to randomly remove submissions from the priority queues and ensure that the queues don't keep growing forever. When sampling, submissions are removed from both the front and the back of the queues.
Middleman also ensures that we don't duplicate work: it dedups submissions before sending them to the dispatcher. It also applies whitelisting regular expressions to the metadata of the submissions, if provided.
Metrics are reported on the number of duplicate, ingested, whitelisted and completed files, as well as the number of bytes ingested and completed. Messages are also sent to Alerter to create alerts for submissions whose scores meet the system alert threshold and for which an alert was requested.

¹ The priority queues are starving queues: all critical submissions are processed before any high-priority ones, and so on.
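The starving-queue behaviour can be sketched as follows. The queue names come from the manual; the class itself is illustrative, not middleman's actual code:

```python
from collections import deque

# Priority order from the manual: lower queues starve until higher ones drain.
PRIORITIES = ["critical", "high", "medium", "low"]

class StarvingQueues:
    """Always serve the highest non-empty priority queue first."""
    def __init__(self):
        self.queues = {p: deque() for p in PRIORITIES}

    def push(self, priority, submission):
        self.queues[priority].append(submission)

    def pop(self):
        # Walk queues from Critical down; a Low submission is only
        # served once Critical, High and Medium are all empty.
        for p in PRIORITIES:
            if self.queues[p]:
                return self.queues[p].popleft()
        return None  # all queues exhausted
```

Middleman would repeatedly `pop()` from such a structure to keep half of the dispatcher's maximum inflight queue filled.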


EXPIRY / EXPIRY WORKERS

Expiry takes care of data deletion². It cleans up every piece of information that has reached its Time To Live (TTL). Every single piece of information in Assemblyline is tagged with an __expiry_ts__ field which dictates the time at which that information will disappear from the system. Expiry uses SOLR indexing to search for expired data, then queues items of data for the expiry workers, which in turn delete the data from the system.
As part of system optimisation, one of the data buckets (emptyresult) does not use search to expire its data. Instead we use journal files and avoid having to index an insane amount of data that we only use for caching. This is explained in more detail later in the data layout section.
Assemblyline is not designed to be a knowledge base. This is why data is expired from the system. The default TTL for the data is 15 days.
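TTL-based expiry as described above can be sketched in a few lines. The field name __expiry_ts__ comes from the manual; the helper functions and document shape are illustrative:

```python
from datetime import datetime, timedelta

DEFAULT_TTL_DAYS = 15  # the manual's default TTL

def tag_with_expiry(doc, now, ttl_days=DEFAULT_TTL_DAYS):
    """Stamp a document with the time at which it should disappear."""
    doc["__expiry_ts__"] = now + timedelta(days=ttl_days)
    return doc

def find_expired(docs, now):
    """What an expiry search would return: every doc past its TTL."""
    return [d for d in docs if d["__expiry_ts__"] <= now]
```

In the real system the second step is a SOLR search over the indexed __expiry_ts__ field rather than a scan, and the matching items are queued for the expiry workers to delete.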

JOURNALIST
Journalist is an expiry system optimization; it takes care of writing journal files for emptyresult items. It reads those items from a Redis queue and writes them to timestamped files for expiry to process later.

ALERTER
Alerter is the component responsible for generating alerts. It receives a notification from middleman for every submission where an alert was requested and the submission's score reaches the system's alert threshold. When creating an alert, alerter gathers the features (tags) our system has extracted for a submission and generates an alert based on the mix of these features and the metadata that was part of the original submission.
Even though submissions are deduped at the middleman level, middleman keeps track of these duplicates and, in the case of an alert, sends a notification for each duplicate to alerter. Alerter then creates one individual alert for every file ingested. This way, if one file was seen 100 times, we will have 100 alerts and the associated metadata for each individual alert. We can then use all that data to create a threat profile and more easily mitigate the problem.

ALERT ACTIONS
Alert actions makes sure that two actions taken on a specific alert happen one after the other and not at the same time. All API calls or workflow actions that label, change the priority of, take ownership of or change the status of an alert are placed in a Redis queue and dispatched to an internal alert action instance using a deterministic feature of the alert. This ensures the actions are processed sequentially but still distributed to multiple processes for speed. Alert actions reports metrics on the number of alerts created.
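The deterministic dispatch can be sketched as follows. The hashing scheme, names and instance count are illustrative; the manual does not specify which feature of the alert is used:

```python
import hashlib

NUM_INSTANCES = 4  # illustrative number of alert action processes

def route(alert_id, num_instances=NUM_INSTANCES):
    """Map an alert to an instance index; the same alert always lands on
    the same instance, so its actions are applied sequentially there,
    while different alerts spread across all instances."""
    digest = hashlib.sha256(alert_id.encode()).hexdigest()
    return int(digest, 16) % num_instances
```

Because routing depends only on the alert itself, two concurrent actions on the same alert always queue behind each other on one instance, which is exactly the ordering guarantee described above.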

² TTL in Assemblyline should be thought of as DTL, or Days To Live.


WORKFLOW FILTERS
On the alert page of the system, the analyst can build and save search queries that will then be used for labelling, changing the priority or changing the status of an alert. Workflow filter runs those queries against newly inserted alerts and sends messages to alert actions to apply the described action to all alerts matching the query.
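Applying a saved query to newly inserted alerts can be sketched like this. The query format here is a simple field/value match invented for illustration; real workflow filters use the datastore's search syntax:

```python
def matches(alert, query):
    """True if every field/value pair in the query appears in the alert."""
    return all(alert.get(field) == value for field, value in query.items())

def apply_workflow(alerts, query, action):
    """Run `action` on every new alert matching the saved query and
    return the modified alerts."""
    return [action(alert) for alert in alerts if matches(alert, query)]
```

In the real system the `action` step is not applied directly: a message is sent to alert actions so the change is serialized with any other actions on the same alert.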

SYSTEM METRICS
System metrics is in charge of gathering CPU, memory, load, network and many other metrics and shipping them directly to the Elasticsearch database on the Logger server. It also gathers Riak- and SOLR-specific metrics. This component is only installed when a Logger server is defined in the seed.

METRICSD
The Metricsd component is in charge of aggregating Assemblyline-specific counters reported by middleman, dispatcher, alerter and hostagent over one-minute intervals and storing these aggregated counters in the Elasticsearch database on the Logger server. This component is only installed when a Logger server is defined in the seed.

CONTROLLER
Controller is an extremely lightweight component that runs on the workers. Controller's only responsibility is to start, stop and restart the hostagent component. This functionality is used by the host management page so we can restart hostagents in batch without having to log in to the different boxes.

HOSTAGENT
Hostagent is the component responsible for reading the worker's profile from the datastore and loading the number of services and VMs described in the profile. It keeps track of each service and VM that it launches and makes sure that they stay alive. It is also responsible for providing heartbeats to the system to let the UI and the dispatcher know that the different services are alive and ready to receive tasks. When the hostagent tries to instantiate a VM, it makes sure that it actually has the VM disks on the worker host and downloads them if they are missing.
Hostagents report metrics on the effectiveness of the service caching.

QUOTASNIPER
QuotaSniper is the component that makes sure the different quotas and sessions for each user expire at the right time.


DEPLOYMENTS
There are different ways to deploy Assemblyline and each type of deployment has a set of minimum requirements that must be met to ensure a pleasant experience with the system. It should also be noted that when you install Assemblyline on a server or VM, it takes over that box. Assemblyline does not play nice if installed alongside other software because it does not share resources. Stay away from custom partitioning; just give Assemblyline machines one partition with the full drive or you will run into issues. Assemblyline writes tons of logs and makes heavy use of the /tmp folder. You've been warned...

TYPES
The most common deployment types are described below, along with particularities and recommendations for each type of deployment.

DEVELOPMENT VM
This is the most basic deployment type. You can use this type of deployment when you want to test the system or if you are working on a new service for the system.
On a development VM, you install the Core, Datastore and Worker all on the same box. There is no need to install a Logger or any support server for that (unless the service you're working on requires a support server).
The minimum requirements for a development VM are as follows:
● 2 cores
● 6 GB of RAM
● 20 GB of disk space
You can follow the online documentation for step by step commands on how to install your development VM:
https://bitbucket.org/cse-assemblyline/assemblyline/src/master/docs/install_development_vm.md
When you're done installing your development VM, it will be provisioned with the default set of services, which can run out of the box without additional intervention. Because you will do development on this VM, you should run the services you are developing using the run_service_live script.


APPLIANCE
This is basically an extremely lightweight deployment, for when you send only a few thousand files a day at most: the kind of installation that helps your analysts go through their daily samples.
An appliance installation is a one-server installation. Ideally you can get your hands on something a bit bigger than the spec you'd use for a development VM. For an appliance installation, you install the Core, Datastore and Worker all on the same box, like the development VM. You can even install what you'd normally put on a support server directly on that box as well. Again, there is no need to install a Logger for such a small install.
An appliance install can still be done inside a VM with better specs than the development VM, but if you do that you will not be able to run services that need to instantiate a VM (Cuckoo, or any Windows-based service) because nested virtualization does not work that well. For this reason, we recommend doing the appliance install on bare metal.
Recommended specs for an appliance install:
● 16+ threads (8 cores + hyperthreading)
● 96+ GB of RAM
● 1+ TB of disk space (get some RAID 5 configuration because this is your only redundancy) - SSDs are preferred but not required
You can follow the online documentation for step by step commands on how to install your appliance:
https://bitbucket.org/cse-assemblyline/assemblyline/src/master/docs/install_appliance.md
When you're done installing your appliance, it will be provisioned with the default set of services, which can run out of the box without additional intervention.


PRODUCTION CLUSTER
This is the go big or go home type of install. It can easily do millions of files per day depending on the number of worker boxes you assign to it. The buy-in for this kind of install is significant: it requires at minimum 12 boxes if you want something that is manageable and that makes sense. It scales somewhat linearly, with the exception of the core server, which cannot be scaled out yet. On our current production cluster this has yet to become an issue.
The minimum setup for this type of installation is: 5 Riak nodes, 5 workers, 1 core and 1 logger.
● Riak recommends, for proper distribution of the replicated data, that your cluster be at minimum 5 nodes, which is why we recommend 5 Riak nodes.
● We recommend at minimum 5 workers because, with the size of the Riak cluster and the speed of the components on the core server, you'll need at least that many workers to make sense of the 6 boxes you've spent on Riak and Core.
● Core cannot be scaled out yet, therefore only one box is assigned to it.
● For this type of install it is highly recommended (read: almost mandatory) that you run a Logger server. If you don't have one, it will be really hard for you to maintain your infrastructure in good health.
● You may need a support box, which brings the total to 13 boxes. If your Logger box is beefy enough, though, you can probably have it fulfill a dual role as both a Logger and Support box.
● You could technically create everything inside a virtual environment like VMware vSphere or even Amazon AWS, but if you do, you'll lose the ability to run virtualized services (Cuckoo, or any Windows-based service) from inside your infrastructure, which is why we recommend running on bare metal hardware.

The recommended specs for the boxes detailed above are as follows:
● 1x Core
  o 16+ threads (8 cores + hyperthreading)
  o 96+ GB of RAM
  o 1+ TB of disk space (unless you've configured your seed otherwise, this server stores your files, so you'd better have lots of HD space)
● 5x Riak
  o 16+ threads (8 cores + hyperthreading)
  o 96+ GB of RAM
  o 1+ TB of disk space (SSDs are highly recommended for Riak as they will reduce latency spikes when the combination of your data and the search index size does not fit in OS cache)
● 5x Workers
  o Any number of cores (obviously the bigger the better, but this is where you can easily recycle old hardware)
  o 4 GB of RAM per core (this will allow you to better utilize your CPU power)
  o 10 GB of disk space per logical core, with a minimum of 256 GB
● 1x Logger
  o 8+ threads (4 cores + hyperthreading)
  o 48+ GB of RAM (depending on how long you want to keep logs and metrics; you might have to set elasticsearch to use up to 31 GB of heap, and the more of the elasticsearch indexes that can be kept in OS cache, the better)
  o 1+ TB of disk space (all logs and metrics counters will be shipped to this box and indexed, and some of our logs are quite verbose)

You can follow the online documentation for step by step commands on how to install your cluster:
https://bitbucket.org/cse-assemblyline/assemblyline/src/master/docs/install_cluster.md
When you're done installing your cluster, it will be provisioned with the default set of services, which can run out of the box without additional intervention. However, it will most likely be over-allocated and extremely unbalanced, because the default profile instantiates one of each service on each worker regardless of CPU and RAM usage. You should use the provisioner to provision and balance your cluster; you'll quickly find that it is much easier than creating a profile for each worker by hand.

CUSTOM INSTANCE

Assemblyline provides a foundation for distributed file analysis, but when deploying it to your own infrastructure you should set up a custom instance so you can add your own authentication layer (LDAP, PKI, ...), your own branding, and/or any other features particular to your use case or features that may not be shareable with the rest of the community.
Assemblyline provides a create_deployment.py script, which you should have run if you followed any of the installation documents. This script asks you questions about your deployment in order to create the boilerplate for your custom deployment, showing you how to add custom installation hooks, custom APIs, custom web pages, etc.
You can refer to the Assemblyline UI readme file for more details on how to add functionality to your deployment: https://bitbucket.org/cse-assemblyline/al_ui/overview


SERVICES
The premise of Assemblyline is that any command line tool, API or database can be wrapped with a thin Python layer we call a service. Services are easy to write. We provide a quick guide on how to add services to the system in our documentation:
https://bitbucket.org/cse-assemblyline/assemblyline/src/master/docs/create_new_service.md

If you create a new service that can be shared with the rest of the community, you are encouraged to share the service with a pull request.
Assemblyline ships with more than 30 services which you can use in your deployment. Some of these services require external licences or access to external datasets. It is your responsibility to make sure you have all the appropriate licence agreements in place for the services you intend to use.

SERVICE VM

Assemblyline provides a means to use virtual machines to run services that could be damaging to your infrastructure (for example, they execute the file that they are scanning), that require special packages that may conflict with other service packages, or that simply cannot execute inside an Ubuntu 14.04 host (e.g. Windows-only services).
Creating a service VM can be done in a few simple steps:
● Create a base disk for the OS you'll run the service on (if it does not exist already):
https://bitbucket.org/cse-assemblyline/assemblyline/src/master/docs/build_vm_base_disk_ubuntu_14.04.md
https://bitbucket.org/cse-assemblyline/assemblyline/src/master/docs/build_vm_base_disk_windows_10.md
● Derive a VM from the base disk and install dependencies:
https://bitbucket.org/cse-assemblyline/assemblyline/src/master/docs/build_worker_vm.md
● Drop the disk in the vmm/disks/ directory on the server referenced in the support_urls section of the seed.
● Create a Virtual Machine entry in the management interface for your service that references the disk you've just created.
● From then on, in the provisioner or from the profile editor, you can add instances of your VM and hostagent will download the VM disk and run it.


DATA LAYOUT

Assemblyline data is stored in Riak KV, using a key as a unique identifier for a JSON object. The data is stored in different buckets with distinct indexes and replication factors.

COMMON FIELDS

Buckets have a set of common fields related to data expiry and data security.
Buckets whose data is expired after a certain TTL will have an __expiry_ts__ field at the root level of their JSON document, which dictates the time at which the data will be expired.
Buckets that are subject to object-level data security will have the following fields at the root level of their JSON document: __access_grp1__, __access_grp2__, __access_lvl__, __access_req__ and classification.
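As an illustration, a record in such a bucket might carry these common fields at its root. The field names come from the manual; all values and value types here are invented for the example, not taken from a real deployment:

```python
from datetime import datetime

# Hypothetical record showing the common expiry and security fields
# alongside the bucket's own data.
record = {
    "__expiry_ts__": datetime(2016, 6, 15).isoformat(),  # when the data disappears
    "__access_lvl__": 100,             # assumed: numeric security level
    "__access_req__": [],              # assumed: required access markers
    "__access_grp1__": [],             # assumed: group restrictions
    "__access_grp2__": [],             # assumed: subgroup restrictions
    "classification": "UNCLASSIFIED",  # human-readable marking
    # ...bucket-specific fields follow...
}
```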

BUCKETS
These are the different types of buckets you will find in the datastore:

ALERT
The alert bucket stores metadata about the submissions that were ingested through the ingest API and that meet the scoring threshold for an alert-able submission. This bucket has a non-fixed set of fields and the index for this bucket features a catch-all field to index all fields as strings.
Any and all submission metadata is preserved as part of the alert.

BLOB

The blob bucket is one of the few non-indexed buckets. It stores some configuration data and some aggregated statistics about the system.

EMPTYRESULT

The emptyresult bucket is used for service level caching in the case where the service had nothing to report. Data in the emptyresult bucket is not indexed and is deleted using a special journaling technique to make it easier on Riak and SOLR.

ERROR

The error bucket is used to store errors reported by the different services while scanning a given file. This bucket is fully indexed so admins can search for errors in the system.

FILE

The file bucket is used to store information about a given file: its different hashes, its size, its type... basically any static properties of the file. The file bucket is also used to keep the file storage in sync with the datastore: every time a file object is deleted from the datastore, its equivalent file on disk in the filestore is deleted as well. This bucket is fully indexed so anyone can search for file properties in the system.
FILESCORE

The filescore bucket is used for submission caching by the ingest API. It tracks each ingested file along with its resulting score, so if the same file is ingested multiple times in a short period of time, the system will not rescan the file but will reuse its previous results. This bucket is indexed only on the __expiry_ts__ field so expiry can delete the data.
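The caching behaviour can be sketched as follows; this is a simplified stand-in for the real ingest API (which stores entries in Riak), and all names and the TTL are invented for illustration:

```python
import time

# Simplified sketch of filescore-style caching: reuse a recent score
# instead of rescanning the same file. Names and TTL are illustrative.
_filescore = {}  # sha256 -> (score, expiry timestamp)

def store_score(sha256, score, ttl=86400, now=None):
    now = time.time() if now is None else now
    _filescore[sha256] = (score, now + ttl)

def cached_score(sha256, now=None):
    now = time.time() if now is None else now
    entry = _filescore.get(sha256)
    if entry and entry[1] > now:
        return entry[0]  # fresh entry: skip rescanning the file
    return None          # unseen or expired: a full rescan is needed

store_score("deadbeef", 500, ttl=3600, now=0)
print(cached_score("deadbeef", now=1800))  # within the TTL -> 500
print(cached_score("deadbeef", now=7200))  # past the TTL -> None
```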
NODE

The node bucket stores information about the different worker nodes that are registered. This bucket is fully
indexed​ ​but​ ​only​ ​admins​ ​are​ ​allowed​ ​to​ ​search​ ​it.
PROFILE

The profile bucket stores the different service allocation profiles that the worker nodes can use. Profiles usually
contain the number of each service that will run directly on the host and number of VMs of each service that
the​ ​host​ ​will​ ​launch.​ ​This​ ​bucket​ ​is​ ​fully​ ​indexed​ ​but​ ​only​ ​admins​ ​are​ ​allowed​ ​to​ ​search​ ​it.
RESULT

The result bucket stores the result of a given service running on a given file. This bucket is fully indexed so
anyone​ ​can​ ​search​ ​for​ ​results​ ​in​ ​the​ ​system.
SIGNATURE

The signature bucket stores the different Yara signatures that our Yara service can run. The Yara signatures are
stored into a special JSON encapsulation that makes it easier to enforce our metadata requirements from our
malware​ ​standard.​ ​This​ ​bucket​ ​is​ ​fully​ ​indexed​ ​so​ ​anyone​ ​can​ ​search​ ​for​ ​signatures​ ​in​ ​the​ ​system.
SUBMISSION

The submission bucket stores details about a submission to the system. Optional metadata can be provided when using the ingest API, which will then be propagated into the alerts when the alerting score threshold is reached. This bucket has a fixed set of fields and is fully indexed, except for the metadata section where a catch-all field indexes all fields found as strings.
USER

The user bucket stores the user’s details, access controls and options. It is fully indexed but only admins are
allowed​ ​to​ ​search​ ​it.

DATA​ ​SECURITY

Assemblyline has object-level security built into all API endpoints. The data security model is referred to as classification and is based on the Canadian classification marking model. It has been made generic so you can create your own classification model. The classification model is saved into 5 fields that are used by either SOLR or Python to only serve data to users who are allowed to see it.
In the case of SOLR, a filter query is added to all queries made to SOLR to filter out the data that the user is not allowed to see. For direct Riak KV access, the classification string is compared against the user's maximum classification to make sure they have access to the data.

TEXT​ ​REPRESENTATION

The text representation of the classification is saved into the 'classification' field in the JSON document. This field is used to display the classification to the user and is also used by the Python part of the classification engine to validate user access to data.

PARTS
The classification can be exploded into 4 different parts that each serve a different purpose. Each part has a name and a short name value to make it human readable.

LEVEL
Level is a 0 to X integer value; the user's level must be greater than or equal to that of the object they are trying to access. This value is saved in the __access_lvl__ field in the JSON document.

REQUIRED
The required part of the classification is a list of tokens that the user must have in order to access the object. This value is saved in the __access_req__ field in the JSON document.

GROUP
The group part of the classification is used more as a dissemination list. A user who tries to access an object must be part of at least one of the groups the object disseminates to. If no groups are specified on an object, all groups have access to it. This value is saved in the __access_grp1__ field in the JSON document.

SUBGROUP
The subgroup part of the classification is used as a second-level dissemination list. A user who tries to access an object must be part of at least one of the subgroups the object disseminates to. If no subgroups are specified on an object, all subgroups have access to it. This value is saved in the __access_grp2__ field in the JSON document.
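Combining the four parts, the access decision can be sketched like this. This is a simplified illustration only, not the actual classification engine (which also handles the human-readable string and SOLR filter queries); it assumes that membership in at least one listed group or subgroup grants access:

```python
# Simplified sketch of the object-level access check described above.
# Empty group/subgroup lists on the object mean unrestricted access.
def is_accessible(user, obj):
    if user["lvl"] < obj["__access_lvl__"]:                  # LEVEL
        return False
    if not set(obj["__access_req__"]) <= set(user["req"]):   # REQUIRED
        return False
    if obj["__access_grp1__"] and \
            not set(obj["__access_grp1__"]) & set(user["groups"]):     # GROUP
        return False
    if obj["__access_grp2__"] and \
            not set(obj["__access_grp2__"]) & set(user["subgroups"]):  # SUBGROUP
        return False
    return True

analyst = {"lvl": 100, "req": ["TOKEN_A"], "groups": ["GRP1"], "subgroups": []}
secret_doc = {"__access_lvl__": 100, "__access_req__": ["TOKEN_A"],
              "__access_grp1__": ["GRP1"], "__access_grp2__": []}
print(is_accessible(analyst, secret_doc))  # True
```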

MANAGING​ ​THE​ ​SYSTEM

There are numerous ways to manage the system. We've tried to build as much functionality into the UI as possible, but where the UI falls short, we have a couple of other ways to fix the system or to perform some routine management tasks.

DASHBOARD
The dashboard is the only piece of management that is accessible to all users of the system. It gives you an indication of how busy the system is and which components are falling behind. You get one card for middleman's health, one for the dispatcher's and one for each of the services currently loaded in the seed. It is a basic "green card is OK, red you have a problem" type of interface. When a card goes red, the problematic part will be highlighted so you can easily see what is wrong in the system.

MANAGEMENT​ ​UI

The management UI should cover 90% of your administrative needs.

BUILD​ ​DOCUMENTATION
The Build documentation section is where you'll find step-by-step instructions on how to create services, how to create service VMs, how to build an appliance or a cluster, etc. You can add your own build documentation to this section by adding markdown documents to /opt/al/pkg/private/docs/.

CONFIGURATION
The configuration section is where you can edit your currently deployed seed. You can also use this page to diff
your current running seed with the seed you've used for installation and the seed as it is saved in the code. The
idea is that you can use this section to tweak parameters of the system and that when you are comfortable
with them, you diff your running seed with the seed in the code and you apply the changes that are working
well​ ​in​ ​the​ ​live​ ​seed​ ​to​ ​the​ ​seed​ ​in​ ​the​ ​code.

ERROR​ ​VIEWER
The error viewer is a place where you can search through the errors that the different services generated so you can fix those errors once and for all.

HOSTS
The hosts management page is used to display the health of all the physical workers and virtual machines of the system. For each host, it displays CPU usage, RAM usage, disk usage, and the number of services and VMs the host is running. Each card in the hosts management page can be flipped to reveal the full list of services and VMs on the host. It also reveals buttons that you can use to interact with the hostagent service of each worker. You can restart, stop and start the hostagent as well as pause and resume task execution. The flipped side of the card also lets you enable/disable a host or change its profile if you click the expand button.

When multiple cards are flipped, there are buttons at the top of the interface that will let you interact with all the flipped cards at the same time. This is especially handy to restart all hostagents at the same time when you push a patch to a given service.
Cards in the host management page are color coded:
● Green cards mean that all required processes are running like they are supposed to
● Gray cards mean that the host is either disabled or the hostagent is not running
● Orange cards mean the hostagent is running but the controller isn't; this means that the buttons to interact with the host won't work, you'll have to SSH to the box
● Red cards mean that the host is unreachable; both hostagent and controller are down

PROFILES
The profile management page lets you edit the different profiles each node is running. For each profile, you can set the number of instances of each service you'd like to run and the number of VMs of each type you'd like to run. You can even set configuration overrides for each service in case one box does not work like the rest.
That said, you should not really have to use this page at all; the preferred way to provision the system is by using the automatic provisioner. It is much simpler to use and it prevents you from over-provisioning the worker nodes and potentially reducing the performance of the system.
In the case where you are running an appliance, the recommended way to provision it is to add services or VMs to the al-worker-default profile created at installation and to restart the hostagent from the hosts management page.

PROVISIONING
The provisioning page gives you an overview of how many cores and how much RAM you have in your infrastructure, then lets you provision the different services, VMs and flex nodes without letting you overprovision the cluster. This is the easiest way to provision the cluster when you have a lot of nodes.

SERVICES
The service management page lets you change the properties of a service. It is the interface you want to use to change the configuration variables of a service. When changing properties through this interface, you are in fact changing properties inside the seed. You could technically use the seed editor and get the same results.
The​ ​different​ ​properties​ ​of​ ​the​ ​services​ ​will​ ​be​ ​explained​ ​in​ ​the​ ​'Seed​ ​demystified'​ ​section​ ​of​ ​this​ ​document.

SITE​ ​MAP

The site map gives you a list of all pages and APIs with their corresponding function in the code, as well as:
● Which HTTP methods you can use on the page
● Whether it is protected by authentication
● Whether it requires admin privileges
● Whether the page parameters are audited in the UI audit logs


USERS
The user management page gives you a way to edit the settings of the different users or to add/remove users
from​ ​the​ ​system.
Note: By default, Assemblyline uses its own internal authentication. You can override the authentication layer with your own authentication.

VIRTUAL​ ​MACHINES
The Virtual Machines management page lets you change the properties of an AL VM. This is the interface you
will want to use to change the configuration variables for VMs. When changing properties through this
interface, you are in fact changing properties inside the seed. You could technically use the seed editor and get
the​ ​same​ ​results.
The​ ​different​ ​properties​ ​of​ ​the​ ​VMs​ ​will​ ​be​ ​explained​ ​in​ ​the​ ​'Seed​ ​demystified'​ ​section​ ​of​ ​this​ ​document.

CLI
Even though the majority of the management tasks can be done in the UI, there is the occasional time where you'll have to go through the CLI to right some wrong or to backup/restore data in the system. The Assemblyline CLI provides loads of functionality that is extremely dangerous and could lead to disastrous consequences for your production infrastructure. Use with caution...
We will explain some features of the CLI that you may have to use one day. For the rest, you can always run al_cli help or al_cli help <command> for more detailed help.
BACKUP <destination_folder>

This backup command will back up the data from all system buckets to a destination folder. The destination folder must be placed in a directory where the AL user can write, such as /opt/al or /tmp.
BACKUP <destination_folder> <bucket> [FOLLOW] [FORCE] <query>

This backup command lets you issue a SOLR query against a given bucket then backs up all content that
matched your query. The query part of this command must be placed into quotes if the query contains spaces.
Once you run the command, it will show you a piece of the data that matches your query and give you a count
of how many objects will be backed up. It will ask you to confirm that you indeed want to proceed with the
backup.
Note: If you want to skip the confirmation, add the force keyword to the command.
You can also create a follow backup by adding the follow keyword to the command. A follow backup is an intelligent backup that will not only back up the bucket where you ran the query, but will also find any links between the objects that match your query and related objects in other buckets. A follow backup of a submission, for example, will also back up: file, result, emptyresult and error.

DELETE <bucket> [FORCE] <query>

This command is especially useful to get rid of bad data in your system. Do a simple SOLR query to isolate the
data​ ​you​ ​want​ ​to​ ​delete​ ​in​ ​a​ ​specific​ ​bucket​ ​and​ ​this​ ​command​ ​will​ ​delete​ ​it​ ​all.
DELETE SUBMISSION FULL [FORCE] <query>

This command will help you fix the case where someone submitted a massive number of files that they were not supposed to, or simply misclassified their submissions. Write a simple SOLR query that isolates those submissions and this command will delete the submissions and all other associated objects, removing any trace that the submission ever existed.
Note: The full keyword only works on the submission bucket. Also, you can delete a single submission using the web interface while viewing the submission.

INDEX COMMIT [<bucket>]

Some buckets have their index committed only every 15 minutes for performance reasons. This command instructs SOLR to commit all its indexes now instead of waiting for the commit timeout. Optionally, you can specify the bucket for which the indexes should be committed.
RESTORE <backup_directory>

This​ ​command​ ​is​ ​used​ ​to​ ​restore​ ​backups​ ​created​ ​using​ ​the​ ​backup​ ​command.
SIGNATURE CHANGE_STATUS BY_QUERY [FORCE] <query> <status>

This command allows you to switch a bunch of signatures to another status at the same time. Let's say you have 100 signatures in STAGING that you want to promote to DEPLOYED: simply write a SOLR query that isolates those signatures and run this command to promote them all at once instead of doing it one signature at a time in the UI.


IPYTHON
In the rare case where some data gets corrupted by either a patch gone wrong or something else, you might have to programmatically fix the datastore by iterating through its data. iPython will be your friend in those cases. We will provide you with the necessary lines of Python to connect to the datastore and a few pointers regarding which datastore functions not to use on a production cluster.

LOAD​ ​DATASTORE

It​ ​is​ ​extremely​ ​simple​ ​to​ ​load​ ​the​ ​datastore​ ​in​ ​iPython.​ ​Simply​ ​type​ ​the​ ​two​ ​following​ ​lines:
from assemblyline.al.common import forge
ds = forge.get_datastore()

This​ ​will​ ​give​ ​you​ ​a​ ​datastore​ ​object​ ​"ds".
Note: On the rare occasion that your seed is corrupted in a way where forge refuses to load, you can assign a static seed to be used to load the forge object when launching iPython:
~# AL_SEED_STATIC=python.path.to.seed ipython

WHAT​ ​TO​ ​AND​ ​NOT​ ​TO​ ​DO​ ​WITH​ ​A​ ​DATASTORE​ ​OBJECT
With your datastore object you should avoid the list_debug_keys() functions because they can completely take down your cluster if you have too much data.
Also avoid the _stream_bucket_keys() function because it can also take down your cluster.
Try to be as precise as possible and, if you need to go through a lot of data, use the stream_search function to get all the keys related to your data.
Remember that Riak is a key/value pair datastore. To edit an object, you need to know its key first. For each of the buckets in the datastore, the datastore object has a get_<bucket> and a save_<bucket> function to get an object using a key and to save that object back to the datastore.
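As a concrete illustration of that pattern, the sketch below walks keys with stream_search and uses the per-bucket get_/save_ accessors. A stand-in class replaces the real datastore so the example is self-contained; the '_yz_rk' key field, the query and the fixed-up field are assumptions for illustration:

```python
# Stand-in for the real datastore so this sketch runs on its own.
# With a live system you would instead do:
#   from assemblyline.al.common import forge
#   ds = forge.get_datastore()
class FakeDatastore:
    def __init__(self):
        self._submission = {"sid1": {"state": "submitted"}}

    def stream_search(self, bucket, query):
        # The real function streams every SOLR hit for the query;
        # '_yz_rk' (the Riak key field in results) is an assumption here.
        for key in self._submission:
            yield {"_yz_rk": key}

    def get_submission(self, key):
        return self._submission[key]

    def save_submission(self, key, data):
        self._submission[key] = data

ds = FakeDatastore()
for hit in ds.stream_search("submission", "state:submitted"):
    key = hit["_yz_rk"]
    sub = ds.get_submission(key)   # fetch the object by key...
    sub["state"] = "completed"     # ...repair the bad field...
    ds.save_submission(key, sub)   # ...and write it back
```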
If​ ​you’re​ ​in​ ​that​ ​deep,​ ​the​ ​only​ ​other​ ​thing​ ​we​ ​can​ ​add​ ​is,​ ​good​ ​luck!


REPAIRING​ ​CORRUPTED​ ​ELEVELDB​ ​INDEX

In the event of major hardware or filesystem problems, LevelDB can become corrupted. These failures are uncommon, but they can happen, as heavy loads can push I/O limits.
To check whether one of your nodes has a corrupted eleveldb index, you will need to run a shell command that searches for 'Compaction error' in each LOG file.
find /var/lib/riak/leveldb -name "LOG" -exec grep -l 'Compaction error' {} \;

If​ ​there​ ​are​ ​compaction​ ​errors​ ​in​ ​any​ ​of​ ​your​ ​vnodes,​ ​those​ ​will​ ​be​ ​listed​ ​in​ ​the​ ​console.​ ​If​ ​any​ ​vnode​ ​has
experienced​ ​such​ ​errors,​ ​you​ ​would​ ​see​ ​output​ ​like​ ​this:
/var/lib/riak/leveldb/442446784738847563128068650529343492278651453440/LOG
/var/lib/riak/leveldb/442446784738847563128068650529343492278651453441/LOG

When​ ​you​ ​have​ ​discovered​ ​corruption​ ​in​ ​your​ ​LevelDB​ ​backend,​ ​follow​ ​the​ ​steps​ ​below​ ​to​ ​heal​ ​your​ ​corrupted
LevelDB:​ ​(​Note:​ ​Repairing​ ​eleveldb​ ​indexes​ ​may​ ​take​ ​several​ ​minutes.)
1. Stop​ ​the​ ​node:
riak stop

2. Open​ ​an​ ​erlang​ ​shell​ ​to​ ​run​ ​the​ ​eleveldb:repair​​ ​function:
`riak ertspath`/erl

3. Once in the shell, run the following commands one by one; make sure the VNodeList variable reflects the output of the find command (commands end with a '.'):
application:set_env(eleveldb, data_root, "").
Options = [].
DataRoot = "/var/lib/riak/leveldb".
VNodeList = ["442446784738847563128068650529343492278651453440",
"442446784738847563128068650529343492278651453441"].
RepairPath = fun(DataRoot, VNodeNumber) -> Path = lists:flatten(DataRoot ++ "/" ++ VNodeNumber), io:format("Repairing ~s.~n", [Path]), Path end.
[eleveldb:repair(RepairPath(DataRoot, VNodeNumber), Options) || VNodeNumber <- VNodeList].

4. Stop​ ​erlang​ ​shell:
ctrl + g
q

5. Restart​ ​riak:
riak start


SEED​ ​DEMYSTIFIED

The seed is basically a one-stop shop for the configuration of your infrastructure. It is a giant JSON dictionary that contains both the installation parameters and the live values. It is comparable to the Windows registry.
We will explain every single parameter of the seed, its impact and effects on your system after installation, and which components of our infrastructure it affects. If a setting is used at installation, changing it on a live system will have no effect, but if the setting is used live, you need only restart the affected components and the setting will take effect.
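Since the seed is just a nested dictionary, the dotted setting names used throughout the rest of this section map directly onto nested keys. A minimal sketch (the fragment and helper below are illustrative, not the real seed):

```python
# Toy seed fragment; dotted setting names resolve as nested dict keys.
seed = {
    "auth": {"allow_2fa": True,
             "internal": {"enabled": True, "max_failures": 5}},
    "core": {"dispatcher": {"shards": 2}},
}

def seed_get(seed, dotted):
    """Resolve a 'core.dispatcher.shards'-style path against the seed."""
    node = seed
    for part in dotted.split("."):
        node = node[part]
    return node

print(seed_get(seed, "core.dispatcher.shards"))     # 2
print(seed_get(seed, "auth.internal.max_failures")) # 5
```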

SEED.AUTH
seed.auth.allow_2fa
Description:​ ​Turn​ ​on/off​ ​2-Factor​ ​authentication​ ​via​ ​One-Time​ ​password​ ​(Google​ ​Authenticator,​ ​Microsoft​ ​Authenticator...)
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​UWSGI

seed.auth.allow_apikeys
Description:​ ​If​ ​‘true’,​ ​allow​ ​users​ ​to​ ​create​ ​apikeys​ ​to​ ​log​ ​into​ ​the​ ​system​ ​via​ ​scripts​ ​instead​ ​of​ ​using​ ​their​ ​personal​ ​creds.
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​UWSGI

seed.auth.allow_u2f
Description:​ ​Turn​ ​on/off​ ​2-Factor​ ​authentication​ ​via​ ​FIDO​ ​U2F​ ​certified​ ​hardware​ ​tokens
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​UWSGI

seed.auth.apikey_handler
Description:​ ​Python​ ​class​ ​path​ ​to​ ​the​ ​method​ ​used​ ​to​ ​handle​ ​apikey​ ​login​ ​requests.
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​UWSGI

seed.auth.dn_handler
Description:​ ​Python​ ​class​ ​path​ ​to​ ​the​ ​method​ ​used​ ​to​ ​transform​ ​a​ ​DN​ ​into​ ​a​ ​username.
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​UWSGI

seed.auth.encrypted_login
Description: Turn on/off second level encryption for all passwords sent to the login API. (Man-in-the-middle protection)
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​UWSGI

seed.auth.userpass_handler
Description:​ ​Python​ ​class​ ​path​ ​to​ ​the​ ​method​ ​used​ ​to​ ​handle​ ​user/password​ ​login​ ​requests.
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​UWSGI
SEED.AUTH.INTERNAL

seed.auth.internal.enabled
Description: Enable or disable internal authentication
Datatype:​ ​bool
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​uwsgi

seed.auth.internal.failure_ttl
Description: Number of seconds a user will be disabled after too many failed login attempts
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.auth.internal.max_failures
Description:​ ​Maximum​ ​number​ ​of​ ​login​ ​failures​ ​before​ ​the​ ​user​ ​account​ ​is​ ​locked​ ​down
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.auth.internal.strict_requirements
Description: Enable or disable strict password requirements (when users change their password)
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi
SEED.AUTH.INTERNAL.USERS

seed.auth.internal.users.<username>.classification
Description:​ ​Max​ ​classification​ ​for​ ​the​ ​given​ ​user
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Riak

seed.auth.internal.users.<username>.groups
Description:​ ​List​ ​of​ ​groups​ ​the​ ​user​ ​is​ ​a​ ​member​ ​of
Datatype:​ ​list[string]
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Riak

seed.auth.internal.users.<username>.is_admin
Description: Whether the user is an admin
Datatype:​ ​bool
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Riak

seed.auth.internal.users.<username>.name
Description: Full name used when creating this user at installation
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Riak

seed.auth.internal.users.<username>.password
Description​:​ ​Password​ ​used​ ​for​ ​the​ ​user
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Riak

seed.auth.internal.users.<username>.uname
Description:​ ​Username​ ​of​ ​the​ ​user
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​​ ​Riak
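Putting the seed.auth.internal.users.<username>.* settings together, a single internal user entry in the seed might look like the fragment below (every value, including the username key, is illustrative only):

```python
# Hypothetical fragment of seed['auth']['internal']['users'] creating one
# user at installation time. All values are made up for illustration.
internal_users = {
    "admin": {
        "classification": "UNCLASSIFIED",  # max classification for the user
        "groups": ["USERS", "ADMINS"],     # group memberships
        "is_admin": True,                  # grants admin privileges
        "name": "Default Admin",           # full name at creation
        "password": "changeme",            # installation-time password
        "uname": "admin",                  # username
    }
}
```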

SEED.CORE
seed.core.nodes
Description: IPs or domain names for the core nodes. The core server is not scalable yet, so this list must have only one entry.
Datatype:​ ​list[string]
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All
SEED.CORE.ALERTER

seed.core.alerter.constant_alert_fields
Description: List of fields whose content should not change during an alert update.
Datatype:​ ​list[string]
Used​ ​when:​ ​Live
Components​ ​affected:​ ​alert_action

seed.core.alerter.create_alert
Description:​ ​Python​ ​class​ ​path​ ​to​ ​the​ ​function​ ​used​ ​to​ ​generate​ ​an​ ​alert
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​alerter

seed.core.alerter.default_group_field
Description:​ ​Default​ ​field​ ​used​ ​for​ ​grouping​ ​in​ ​the​ ​alerts​ ​perspective​ ​in​ ​the​ ​UI
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.core.alerter.filtering_group_fields
Description: List of possible grouping fields that reduce the total dataset
Datatype:​ ​list[string]
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.core.alerter.metadata_aliases
Description:​ ​Dictionary​ ​of​ ​aliases​ ​for​ ​the​ ​different​ ​metadata​ ​fields​ ​in​ ​an​ ​alert
Datatype:​ ​Dict[string]
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.core.alerter.metadata_fields
Description:​ ​Dictionary​ ​that​ ​enforces​ ​field​ ​data​ ​types​ ​for​ ​the​ ​alerts.
Datatype:​ ​Dict[string]
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.core.alerter.non_filtering_group_fields
Description: List of possible grouping fields that do not reduce the dataset
Datatype:​ ​list[string]
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.core.alerter.shards
Description:​ ​Number​ ​of​ ​alerter​ ​instances​ ​that​ ​will​ ​be​ ​launched​ ​at​ ​start​ ​to​ ​be​ ​able​ ​to​ ​handle​ ​the​ ​processing​ ​load
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​alerter
SEED.CORE.BULK

seed.core.bulk.compute_notice_field
Description: Generates computed fields based on underlying metadata fields
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​alerter

seed.core.bulk.get_whitelist_verdict
Description:​ ​Python​ ​class​ ​path​ ​to​ ​the​ ​function​ ​used​ ​to​ ​check​ ​if​ ​a​ ​file​ ​should​ ​be​ ​whitelisted
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.bulk.is_low_priority
Description: Python class path to the function that will lower the priority of an ingest submission based on its metadata
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.bulk.whitelist
Description:​ ​Python​ ​class​ ​path​ ​to​ ​the​ ​whitelist​ ​dictionary
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman
SEED.CORE.DISPATCHER

seed.core.dispatcher.shards
Description: Number of dispatcher processes to launch when the dispatcher service restarts
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman,​ ​dispatcher,​ ​hostagent,​ ​uwsgi
SEED.CORE.DISPATCHER.MAX

seed.core.dispatcher.max.depth
Description:​ ​Maximum​ ​depth​ ​level​ ​for​ ​children​ ​in​ ​a​ ​submission
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​dispatcher

seed.core.dispatcher.max.files
Description:​ ​Maximum​ ​number​ ​of​ ​files​ ​a​ ​submission​ ​can​ ​have
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​dispatcher

seed.core.dispatcher.max.inflight
Description:​ ​Maximum​ ​number​ ​of​ ​concurrent​ ​inflight​ ​requests
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​dispatcher,​ ​middleman

seed.core.dispatcher.max.retries
Description:​ ​Maximum​ ​number​ ​of​ ​times​ ​a​ ​submission​ ​will​ ​be​ ​resent​ ​to​ ​a​ ​service​ ​before​ ​generating​ ​an​ ​error
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​dispatcher
SEED.CORE.DISPATCHER.TIMEOUTS

seed.core.dispatcher.timeouts.child
Description:​ ​Maximum​ ​amount​ ​of​ ​time​ ​to​ ​wait​ ​for​ ​a​ ​child​ ​to​ ​be​ ​submitted​ ​before​ ​we​ ​expire​ ​it
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​dispatcher

seed.core.dispatcher.timeouts.watch_queue
Description:​ ​Maximum​ ​amount​ ​of​ ​time​ ​in​ ​seconds​ ​a​ ​watch_queue​ ​message​ ​stays​ ​in​ ​redis​ ​before​ ​expiring
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​dispatcher
SEED.CORE.EXPIRY

seed.core.expiry.delete_storage
Description: Whether expiry should also delete the file from the filestore when it deletes a file object entry
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​expiry

seed.core.expiry.workers
Description: Number of expiry_workers processes that will be launched when the expiry_workers service starts
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​expiry_workers
SEED.CORE.EXPIRY.JOURNAL

seed.core.expiry.journal.directory
Description: Directory where the journal files are stored for expiry to delete the keys they contain
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​journalist,​ ​expiry

seed.core.expiry.journal.ttl
Description:​ ​Number​ ​of​ ​days​ ​old​ ​the​ ​journal​ ​files​ ​have​ ​to​ ​be​ ​for​ ​expiry​ ​to​ ​start​ ​processing​ ​them
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​expiry
SEED.CORE.MIDDLEMAN

seed.core.middleman.classification
Description:​ ​Default​ ​classification​ ​of​ ​the​ ​submission​ ​started​ ​by​ ​middleman
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.middleman.default_prefix
Description: Default prefix that will be added to the description of all ingest submissions
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.middleman.dropper_threads
Description: Number of dropper threads spawned by each of the middleman processes
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.middleman.expire_after
Description:​ ​Time​ ​in​ ​seconds​ ​after​ ​which​ ​the​ ​internal​ ​caching​ ​expires
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.middleman.incomplete_expire_after
Description: Time in seconds after which an incomplete submission expires
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.middleman.incomplete_stale_after
Description: Time in seconds after which an incomplete submission becomes stale
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.middleman.ingester_threads
Description:​ ​Number​ ​of​ ​ingester​ ​threads​ ​started​ ​by​ ​each​ ​of​ ​the​ ​middleman​ ​processes
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.middleman.max_extracted
Description: Maximum number of files extraction services can extract per level for middleman submissions
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.middleman.max_supplementary
Description:​ ​Maximum​ ​number​ ​of​ ​supplementary​ ​files​ ​services​ ​can​ ​submit​ ​for​ ​middleman​ ​submissions
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.middleman.max_value_size
Description: Maximum size of metadata field values, after which they are removed from the metadata section
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.middleman.shards
Description: Number of middleman processes that will be started by the service
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.middleman.stale_after
Description:​ ​Number​ ​of​ ​seconds​ ​after​ ​which​ ​cache​ ​entries​ ​are​ ​considered​ ​stale
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.middleman.submitter_threads
Description:​ ​Number​ ​of​ ​submitter​ ​threads​ ​started​ ​by​ ​each​ ​of​ ​the​ ​middleman​ ​processes
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.middleman.user
Description:​ ​Username​ ​that​ ​middleman​ ​will​ ​use​ ​to​ ​submit​ ​files​ ​to​ ​the​ ​system
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman
SEED.CORE.MIDDLEMAN.SAMPLING_AT

seed.core.middleman.sampling_at.critical
Description:​ ​Number​ ​of​ ​queued​ ​critical​ ​priority​ ​samples​ ​after​ ​which​ ​middleman​ ​will​ ​start​ ​sampling
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.middleman.sampling_at.high
Description:​ ​Number​ ​of​ ​queued​ ​high​ ​priority​ ​samples​ ​after​ ​which​ ​middleman​ ​will​ ​start​ ​sampling
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.middleman.sampling_at.low
Description:​ ​Number​ ​of​ ​queued​ ​low​ ​priority​ ​samples​ ​after​ ​which​ ​middleman​ ​will​ ​start​ ​sampling
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman

seed.core.middleman.sampling_at.medium
Description:​ ​Number​ ​of​ ​queued​ ​medium​ ​priority​ ​samples​ ​after​ ​which​ ​middleman​ ​will​ ​start​ ​sampling
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman
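Taken together, the four sampling_at keys map each priority band to a queue depth past which middleman begins sampling. A minimal sketch of that structure as a Python fragment (the threshold values are illustrative, not shipped defaults):

```python
# Illustrative middleman sampling thresholds; the numbers are example
# values, not defaults shipped with Assemblyline.
sampling_at = {
    'low': 1000,        # start sampling low priority after 1000 queued
    'medium': 2000,
    'high': 5000,
    'critical': 500000,
}

def should_sample(priority, queued):
    """True once the queue for a priority band exceeds its threshold."""
    return queued > sampling_at[priority]
```

For example, should_sample('low', 1500) is True while should_sample('high', 1500) is False.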
SEED.CORE.REDIS
SEED.CORE.REDIS.NONPERSISTENT

seed.core.redis.nonpersistent.db
Description:​ ​Redis​ ​db​ ​number​ ​used​ ​to​ ​store​ ​non-persistent​ ​keys
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​All

seed.core.redis.nonpersistent.host
Description:​ ​IP​ ​or​ ​domain​ ​of​ ​the​ ​redis​ ​non-persistent​ ​host
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.core.redis.nonpersistent.port
Description: Port on which the redis non-persistent server listens
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All
SEED.CORE.REDIS.PERSISTENT

seed.core.redis.persistent.db
Description:​ ​Redis​ ​db​ ​number​ ​used​ ​to​ ​store​ ​persistent​ ​keys
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​All

seed.core.redis.persistent.host
Description:​ ​IP​ ​or​ ​domain​ ​of​ ​the​ ​redis​ ​persistent​ ​host
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.core.redis.persistent.port
Description: Port on which the redis persistent server listens
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All
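Both Redis instances are addressed by the same host/port/db triplet pattern. A small sketch, assuming the seed is expressed as a Python dictionary (the hosts and ports below are placeholders):

```python
# Placeholder Redis settings mirroring the two seed sections above.
redis_config = {
    'nonpersistent': {'host': 'localhost', 'port': 6379, 'db': 1},
    'persistent':    {'host': 'localhost', 'port': 6380, 'db': 0},
}

def redis_url(section):
    # Build a redis:// connection URL from one of the two sections.
    c = redis_config[section]
    return "redis://%s:%d/%d" % (c['host'], c['port'], c['db'])
```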

SEED.DATASOURCES
seed.datasources.<name>.classpath
Description:​ ​Python​ ​class​ ​path​ ​to​ ​the​ ​actual​ ​datasource​ ​object
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi,​ ​hostagent

seed.datasources.<name>.config
Description: Either a dictionary containing the config, or a path in the seed from which to get its config
Datatype:​ ​dictionary​ ​or​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi,​ ​hostagent
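A datasource's config is either an inline dictionary or a dotted path pointing elsewhere in the seed. This sketch shows how that indirection could be resolved; the datasource names and paths are hypothetical:

```python
# Hypothetical seed fragment: 'alert_db' carries an inline config while
# 'nsrl' points at another location in the seed for its config.
seed = {
    'datasources': {
        'alert_db': {
            'classpath': 'al_private.datasources.alert_db.AlertDB',
            'config': {'max_hits': 10},
        },
        'nsrl': {
            'classpath': 'al_private.datasources.nsrl.NSRL',
            'config': 'datastore.nsrl',   # dotted path into the seed
        },
    },
    'datastore': {'nsrl': {'host': 'nsrl.example.com'}},
}

def resolve_config(seed, name):
    cfg = seed['datasources'][name]['config']
    if isinstance(cfg, str):         # string => follow the dotted path
        node = seed
        for part in cfg.split('.'):
            node = node[part]
        return node
    return cfg                       # dict => use as-is
```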

SEED.DATASTORE
seed.datastore.default_timeout
Description:​ ​Default​ ​timeout​ ​for​ ​connections​ ​to​ ​the​ ​datastore​ ​(future​ ​-​ ​currently​ ​unused)
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.datastore.hosts
Description: List of host IPs or domains used as datastore jump points when bootstrapping. Usually the core server.
Datatype:​ ​list[string]
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.datastore.port
Description:​ ​Port​ ​used​ ​to​ ​connect​ ​to​ ​Riak​ ​PB.​ ​Only​ ​change​ ​this​ ​value​ ​in​ ​appliance​ ​mode.
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.datastore.solr_port
Description:​ ​Port​ ​used​ ​to​ ​connect​ ​to​ ​SOLR.​ ​Only​ ​change​ ​this​ ​value​ ​in​ ​appliance​ ​mode.
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.datastore.stream_port
Description: Port used to connect to the Riak HTTP interface. Only change this value in appliance mode.
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All
SEED.DATASTORE.RIAK

seed.datastore.riak.nodes
Description: List of all the Riak nodes' IPs or domain names
Datatype:​ ​list[string]
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.datastore.riak.ring_size
Description:​ ​Number​ ​of​ ​partitions​ ​Riak​ ​will​ ​split​ ​the​ ​data​ ​in.​ ​(Once​ ​set​ ​this​ ​can​ ​never​ ​be​ ​changed)
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Riak
SEED.DATASTORE.RIAK.NVALS

seed.datastore.riak.nvals.high
Description: Number of replicas for high-concurrency data buckets (Once set, this can't be changed)
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Riak,​ ​SOLR

seed.datastore.riak.nvals.low
Description: Number of replicas for low-concurrency data buckets (Once set, this can't be changed)
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Riak,​ ​SOLR

seed.datastore.riak.nvals.med
Description: Number of replicas for medium-concurrency data buckets (Once set, this can't be changed)
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Riak,​ ​SOLR
SEED.DATASTORE.RIAK.SOLR

seed.datastore.riak.solr.gc
Description: Extra parameters applied to the SOLR garbage collector (tread lightly...)
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​SOLR

seed.datastore.riak.solr.heap_max_gb
Description:​ ​Maximum​ ​amount​ ​of​ ​heap​ ​memory​ ​in​ ​GB​ ​allocated​ ​to​ ​SOLR​ ​(Don’t​ ​go​ ​over​ ​31GB)
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​SOLR

seed.datastore.riak.solr.heap_min_gb
Description:​ ​Minimum​ ​amount​ ​of​ ​heap​ ​memory​ ​in​ ​GB​ ​allocated​ ​to​ ​SOLR​ ​(Might​ ​as​ ​well​ ​set​ ​the​ ​same​ ​value​ ​as​ ​maximum)
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​SOLR
SEED.DATASTORE.RIAK.TWEAKS

seed.datastore.riak.tweaks.10gnic
Description: Should we apply the recommended tweaks for 10Gb network cards on the Riak servers
Datatype:​ ​bool
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Riak

seed.datastore.riak.tweaks.disableswap
Description:​ ​Should​ ​we​ ​disable​ ​swap​ ​on​ ​the​ ​different​ ​Riak​ ​nodes
Datatype:​ ​bool
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Riak

seed.datastore.riak.tweaks.fs
Description:​ ​Should​ ​we​ ​apply​ ​recommended​ ​tweaks​ ​to​ ​the​ ​file​ ​system​ ​for​ ​Riak
Datatype:​ ​bool
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Riak

seed.datastore.riak.tweaks.jetty
Description:​ ​Should​ ​we​ ​apply​ ​tweaks​ ​to​ ​SOLR’s​ ​jetty​ ​server​ ​on​ ​the​ ​Riak​ ​servers​ ​(May​ ​not​ ​actually​ ​do​ ​much,​ ​leave​ ​OFF)
Datatype:​ ​bool
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Riak

seed.datastore.riak.tweaks.net
Description:​ ​Should​ ​we​ ​apply​ ​the​ ​recommended​ ​network​ ​tweaks​ ​on​ ​the​ ​Riak​ ​Servers
Datatype:​ ​bool
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Riak

seed.datastore.riak.tweaks.noop_scheduler
Description:​ ​Should​ ​we​ ​apply​ ​the​ ​recommended​ ​disk​ ​scheduler​ ​tweak​ ​to​ ​the​ ​Riak​ ​servers
Datatype:​ ​bool
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Riak

seed.datastore.riak.tweaks.tuned_solr_configs
Description: Should we tune each individual SOLR index for high volume (you should leave this ON)
Datatype:​ ​bool
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Riak


SEED.FILESTORE
seed.filestore.ftp_ip_restriction
Description: CIDR block of IPs allowed to access the FTP server
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​ProFTPd

seed.filestore.ftp_password
Description:​ ​Password​ ​used​ ​to​ ​login​ ​to​ ​Core’s​ ​FTP​ ​server
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​ProFTPd

seed.filestore.ftp_root
Description: Directory into which the Core FTP user is locked
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​ProFTPd

seed.filestore.ftp_user
Description:​ ​Username​ ​used​ ​to​ ​login​ ​to​ ​Core’s​ ​FTP​ ​server
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​ProFTPd

seed.filestore.support_urls
Description:​ ​List​ ​of​ ​fully​ ​qualified​ ​URLs​ ​(http://user:pass@domain/path)​ ​where​ ​support​ ​files​ ​are​ ​found​ ​(VM​ ​disks​ ​and​ ​such)
Datatype:​ ​list[string]
Used​ ​when:​ ​Live
Components​ ​affected:​ ​All

seed.filestore.urls
Description: List of fully qualified URLs (http://user:pass@domain/path) where submitted files are stored
Datatype:​ ​list[string]
Used​ ​when:​ ​Live
Components​ ​affected:​ ​All
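The FTP settings above supply the pieces of the fully qualified URLs listed in seed.filestore.urls. A sketch of assembling one such URL (the credentials and paths are placeholders):

```python
# Hypothetical FTP filestore settings from the seed.
ftp_user = 'alfiles'
ftp_password = 'changeme'
ftp_host = 'core.example.com'
ftp_root = '/opt/al/var/storage'

# A fully qualified filestore URL of the form scheme://user:pass@host/path
filestore_url = "ftp://%s:%s@%s%s" % (ftp_user, ftp_password, ftp_host, ftp_root)
```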


SEED.INSTALLATION
seed.installation.pip_index_url
Description: URL of your local PIP repo, if you are using one
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All
SEED.INSTALLATION.DOCKER

seed.installation.docker.apt_repo_info
Description:​ ​Apt​ ​repo​ ​descriptor​ ​to​ ​add​ ​to​ ​your​ ​apt​ ​repos
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.installation.docker.apt_repo_key_url
Description: URL from which to fetch the key for the docker apt repo before adding it to your repos
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.installation.docker.private_registry
Description:​ ​Domain:port​ ​of​ ​your​ ​private​ ​registry
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All
SEED.INSTALLATION.EXTERNAL_PACKAGES

seed.installation.external_packages.<name>.args
Description: Arguments passed to the transport (see the PackageFetcher class in /opt/al/pkg/assemblyline/al/install/__init__.py
for more details about the arguments each transport accepts)
Datatype:​ ​dictionary
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.installation.external_packages.<name>.transport
Description:​ ​Type​ ​of​ ​transport​ ​to​ ​be​ ​instantiated​ ​by​ ​the​ ​PackageFetcher​ ​class
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All
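Each external package entry pairs a transport name with a dict of args handed to that transport by the PackageFetcher class. A hypothetical pair of entries (the transport names and argument keys are illustrative, not a definitive list):

```python
# Hypothetical external package entries; transport names and argument
# keys are illustrative, not a definitive list.
external_packages = {
    'assemblyline': {
        'transport': 'http',
        'args': {'base': 'http://mirror.example.com/packages'},
    },
    'vm_disks': {
        'transport': 'ftp',
        'args': {'host': 'core.example.com', 'user': 'alfiles'},
    },
}

# A fetcher would select the transport by name and pass the args dict
# through to it:
for name, pkg in external_packages.items():
    assert 'transport' in pkg and isinstance(pkg['args'], dict)
```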
SEED.INSTALLATION.HOOKS

seed.installation.hooks.core_pre
Description: List of Python class paths of the hooks that will be executed during core installation
Datatype:​ ​list[string]
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Core

seed.installation.hooks.riak_pre
Description: List of Python class paths of the hooks that will be executed during Riak installation
Datatype:​ ​list[string]
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Riak

seed.installation.hooks.ui_pre
Description: List of Python class paths of the hooks that will be executed during the UI installation
Datatype:​ ​list[string]
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​uwsgi,​ ​nginx
SEED.INSTALLATION.REPOSITORIES
SEED.INSTALLATION.REPOSITORIES.REALMS

seed.installation.repositories.realms.<name>.branch
Description:​ ​Name​ ​of​ ​the​ ​branch​ ​to​ ​checkout​ ​for​ ​this​ ​repo​ ​during​ ​install​ ​time
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.installation.repositories.realms.<name>.key
Description: Private key of the user that will check out the repo (optional)
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.installation.repositories.realms.<name>.password
Description: Password of the user that will check out the repo during install time
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.installation.repositories.realms.<name>.url
Description: Fully qualified URL used to check out the repo, excluding the repo name
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.installation.repositories.realms.<name>.user
Description: Username used to check out the repo during install time
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All
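A realm bundles checkout credentials and a base URL; since the realm URL excludes the repo name, the full checkout URL is the realm URL plus the repo name. A sketch (the names and URLs below are placeholders):

```python
# Illustrative realm/repo wiring; names and URLs are placeholders.
realms = {
    'bitbucket': {
        'url': 'https://bitbucket.org/cse-assemblyline/',
        'user': 'al_installer',
        'password': 'changeme',
        'branch': 'master',
    },
}
repos = {'assemblyline': {'realm': 'bitbucket'}}

def clone_url(repo_name):
    # The realm URL excludes the repo name, so append it here.
    realm = realms[repos[repo_name]['realm']]
    return realm['url'] + repo_name
```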
SEED.INSTALLATION.REPOSITORIES.REPOS

seed.installation.repositories.repos.<name>.realm
Description: Name of the realm in which the repo is found.
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All
SEED.INSTALLATION.SUPPLEMENTARY_PACKAGES

seed.installation.supplementary_packages.apt
Description:​ ​List​ ​of​ ​apt​ ​packages​ ​that​ ​are​ ​not​ ​required​ ​by​ ​the​ ​system​ ​to​ ​function​ ​but​ ​are​ ​useful​ ​to​ ​manage​ ​it
Datatype:​ ​list[string]
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.installation.supplementary_packages.pip
Description:​ ​List​ ​of​ ​pip​ ​packages​ ​that​ ​are​ ​not​ ​required​ ​by​ ​the​ ​system​ ​to​ ​function​ ​but​ ​are​ ​useful​ ​to​ ​manage​ ​it
Datatype:​ ​list[string]
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All


SEED.LOGGING
seed.logging.directory
Description: Directory where the logs are stored on the system (it must exist and be writable by the AL user)
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​All

seed.logging.log_to_console
Description:​ ​Should​ ​the​ ​logger​ ​log​ ​to​ ​console
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​All

seed.logging.log_to_file
Description:​ ​Should​ ​the​ ​logger​ ​log​ ​to​ ​file
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​All

seed.logging.log_to_syslog
Description:​ ​Should​ ​the​ ​logger​ ​log​ ​to​ ​syslog
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​All
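The three log_to_* flags select logging destinations independently, so a logger setup attaches one handler per enabled flag. A minimal sketch with Python's standard logging module (the flag values and log path are illustrative):

```python
import logging
import logging.handlers

# Illustrative logging flags and directory from the seed.
log_to_console = True
log_to_file = False
log_to_syslog = False
log_directory = '/opt/al/var/log'

logger = logging.getLogger('assemblyline')
if log_to_console:
    logger.addHandler(logging.StreamHandler())
if log_to_file:
    logger.addHandler(logging.FileHandler(log_directory + '/al.log'))
if log_to_syslog:
    logger.addHandler(logging.handlers.SysLogHandler())
```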
SEED.LOGGING.LOGSERVER

seed.logging.logserver.node
Description:​ ​IP​ ​or​ ​Domain​ ​of​ ​the​ ​log​ ​server
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​All

SEED.LOGGING.LOGSERVER.ELASTICSEARCH

seed.logging.logserver.elasticsearch.heap_size
Description:​ ​Size​ ​of​ ​elasticsearch​ ​java​ ​heap​ ​(Do​ ​not​ ​make​ ​bigger​ ​than​ ​31GB)
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​elasticsearch

seed.logging.logserver.elasticsearch.index_ttl.al_metrics
Description:​ ​Time​ ​to​ ​live​ ​in​ ​days​ ​for​ ​the​ ​al_metrics​ ​index​ ​in​ ​elasticsearch
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​cron

seed.logging.logserver.elasticsearch.index_ttl.audit
Description:​ ​Time​ ​to​ ​live​ ​in​ ​days​ ​for​ ​the​ ​audit​ ​index​ ​in​ ​elasticsearch
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​cron

seed.logging.logserver.elasticsearch.index_ttl.logs
Description:​ ​Time​ ​to​ ​live​ ​in​ ​days​ ​for​ ​the​ ​logs​ ​index​ ​in​ ​elasticsearch
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​cron

seed.logging.logserver.elasticsearch.index_ttl.riak
Description:​ ​Time​ ​to​ ​live​ ​in​ ​days​ ​for​ ​the​ ​riak​ ​index​ ​in​ ​elasticsearch
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​cron

seed.logging.logserver.elasticsearch.index_ttl.solr
Description:​ ​Time​ ​to​ ​live​ ​in​ ​days​ ​for​ ​the​ ​solr​ ​index​ ​in​ ​elasticsearch
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​cron

seed.logging.logserver.elasticsearch.index_ttl.system_metrics
Description:​ ​Time​ ​to​ ​live​ ​in​ ​days​ ​for​ ​the​ ​system_metrics​ ​index​ ​in​ ​elasticsearch
Datatype:​ ​int
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​cron
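Each index_ttl value is a per-index retention period in days, enforced by a cron job. A sketch of computing the purge cutoff date (the TTL values below are illustrative, not defaults):

```python
from datetime import date, timedelta

# Illustrative TTLs in days for each elasticsearch index.
index_ttl = {'al_metrics': 30, 'audit': 365, 'logs': 14,
             'riak': 14, 'solr': 14, 'system_metrics': 30}

def purge_cutoff(index, today):
    """Date before which daily indices of this type can be dropped."""
    return today - timedelta(days=index_ttl[index])
```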
SEED.LOGGING.LOGSERVER.KIBANA

seed.logging.logserver.kibana.dashboards
Description:​ ​List​ ​of​ ​available​ ​dashboards​ ​in​ ​the​ ​kibana​ ​interface​ ​that​ ​will​ ​be​ ​exported​ ​to​ ​Assemblyline​ ​UI​ ​in​ ​an​ ​iFrame
Datatype:​ ​list[string]
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.logging.logserver.kibana.extra_indices
Description:​ ​List​ ​of​ ​extra​ ​indices​ ​to​ ​be​ ​loaded​ ​at​ ​install​ ​time
Datatype:​ ​list[string]
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​uwsgi

seed.logging.logserver.kibana.extra_viz
Description:​ ​List​ ​of​ ​extra​ ​visualisations​ ​and​ ​dashboards​ ​to​ ​be​ ​loaded​ ​at​ ​install​ ​time
Datatype:​ ​list[string]
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​uwsgi

seed.logging.logserver.kibana.host
Description: Host used to access the kibana dashboard
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.logging.logserver.kibana.password
Description: Password used to connect to the kibana interface on the log server
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​Kibana

seed.logging.logserver.kibana.port
Description:​ ​Port​ ​to​ ​access​ ​the​ ​kibana​ ​dashboard
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.logging.logserver.kibana.scheme
Description:​ ​Scheme​ ​to​ ​use​ ​to​ ​access​ ​the​ ​kibana​ ​dashboard​ ​(HTTPS)
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi
SEED.LOGGING.LOGSERVER.SSL

seed.logging.logserver.ssl.crt
Description:​ ​Path​ ​to​ ​the​ ​public​ ​key​ ​of​ ​the​ ​web​ ​server
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​nginx

seed.logging.logserver.ssl.key
Description:​ ​Path​ ​to​ ​the​ ​private​ ​key​ ​of​ ​the​ ​logserver
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​nginx

SEED.MONITORING
seed.monitoring.harddrive
Description:​ ​Should​ ​the​ ​hard​ ​drive​ ​monitoring​ ​tool​ ​be​ ​installed​ ​on​ ​all​ ​servers
Datatype:​ ​bool
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​None


SEED.SERVICES
seed.services.categories
Description:​ ​List​ ​of​ ​all​ ​the​ ​categories​ ​of​ ​services​ ​for​ ​easier​ ​selection​ ​in​ ​the​ ​UI
Datatype:​ ​list[string]
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi,​ ​dispatcher,​ ​hostagent

seed.services.flex_blacklist
Description:​ ​List​ ​of​ ​services​ ​that​ ​cannot​ ​be​ ​instantiated​ ​by​ ​the​ ​flex​ ​nodes
Datatype:​ ​list[string]
Used​ ​when:​ ​Live
Components​ ​affected:​ ​Hostagent

seed.services.stages
Description:​ ​List​ ​of​ ​stages​ ​of​ ​execution​ ​in​ ​their​ ​actual​ ​execution​ ​order
Datatype:​ ​list[string]
Used​ ​when:​ ​Live
Components​ ​affected:​ ​Dispatcher,​ ​hostagent

seed.services.system_category
Description: Category for the system services (aka services that are automatically added to all tasks)
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi,​ ​dispatcher,​ ​hostagent
SEED.SERVICES.LIMITS

seed.services.limits.max_extracted
Description:​ ​Maximum​ ​number​ ​of​ ​extracted​ ​files​ ​a​ ​service​ ​can​ ​create
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​Hostagent,​ ​uwsgi

seed.services.limits.max_supplementary
Description: Maximum number of supplementary files a service can create
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​Hostagent,​ ​uwsgi

SEED.SERVICES.MASTER_LIST

seed.services.master_list.<name>.accepts
Description:​ ​Regular​ ​expression​ ​of​ ​file​ ​types​ ​accepted​ ​by​ ​the​ ​service
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​dispatcher,​ ​hostagent

seed.services.master_list.<name>.class_name
Description:​ ​Name​ ​of​ ​the​ ​class​ ​instantiated​ ​once​ ​installed
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.services.master_list.<name>.category
Description:​ ​Category​ ​that​ ​the​ ​service​ ​falls​ ​under
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​dispatcher,​ ​hostagent

seed.services.master_list.<name>.classpath
Description:​ ​Python​ ​class​ ​path​ ​to​ ​the​ ​main​ ​service​ ​class
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​dispatcher,​ ​hostagent

seed.services.master_list.<name>.config
Description: Dictionary of configuration for a given service. This is a free-for-all; any service can have any kind of config.
Datatype:​ ​dict
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent

seed.services.master_list.<name>.cpu_cores
Description:​ ​Max​ ​cpu​ ​utilization​ ​while​ ​the​ ​service​ ​is​ ​running​ ​at​ ​full​ ​capacity​ ​-​ ​used​ ​for​ ​provisioning​ ​(1.0​ ​=​ ​1​ ​core​ ​at​ ​100%)
Datatype:​ ​float
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.services.master_list.<name>.description
Description: Description of what the service does
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​None

seed.services.master_list.<name>.enabled
Description:​ ​Is​ ​the​ ​service​ ​enabled​ ​in​ ​the​ ​system​ ​or​ ​not
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​dispatcher,​ ​hostagent

seed.services.master_list.<name>.install_by_default
Description: Should we call the service installer during installation
Datatype:​ ​bool
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​hostagent

seed.services.master_list.<name>.name
Description:​ ​Name​ ​of​ ​the​ ​service.​ ​(It​ ​has​ ​to​ ​match​ ​the​ ​service​ ​class​ ​name​ ​and​ ​should​ ​never​ ​be​ ​changed)
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​dispatcher,​ ​uwsgi,​ ​hostagent

seed.services.master_list.<name>.ram_mb
Description:​ ​Maximum​ ​amount​ ​of​ ​ram​ ​in​ ​MB​ ​the​ ​service​ ​can​ ​consume​ ​while​ ​processing​ ​files​ ​-​ ​used​ ​for​ ​provisioning
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.services.master_list.<name>.realm
Description:​ ​Name​ ​of​ ​the​ ​realm​ ​to​ ​use​ ​to​ ​checkout​ ​the​ ​repository
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.services.master_list.<name>.rejects
Description:​ ​Regular​ ​expression​ ​of​ ​file​ ​types​ ​rejected​ ​by​ ​the​ ​service
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​dispatcher,​ ​hostagent

seed.services.master_list.<name>.repo
Description:​ ​Name​ ​of​ ​the​ ​repository​ ​on​ ​the​ ​realm
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.services.master_list.<name>.stage
Description:​ ​Stage​ ​at​ ​which​ ​the​ ​service​ ​executes
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​dispatcher,​ ​hostagent

seed.services.master_list.<name>.submission_params
Description: List​ ​of​ ​dictionaries​ ​describing​ ​the​ ​possible​ ​parameters​ ​that​ ​can​ ​be​ ​added​ ​to​ ​a​ ​submission​ ​for​ ​a​ ​given​ ​service.
The dict objects in this list should look like this:

{ "default": "",   # Default value for the parameter
  "type": "",      # Type of parameter (int, str, bool, list)
  "name": "",      # Name of the parameter
  "list": "",      # (optional) If a list type, the list of possible values
  "value": "" }    # Value of the parameter

Datatype:​ ​list[dict]
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.services.master_list.<name>.supported_platforms
Description:​ ​List​ ​of​ ​OS​ ​the​ ​service​ ​can​ ​run​ ​on​ ​(possible​ ​values:​ ​windows,​ ​linux)
Datatype:​ ​list[string]
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent

seed.services.master_list.<name>.timeout
Description: Time in seconds after which a task is cancelled if the service has not completed
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​dispatcher,​ ​hostagent
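Putting the master_list fields above together, a single service entry might look like the following. This is an illustrative sketch; the service name, repo, and all values are examples, not a canonical entry:

```python
# Illustrative master_list entry; all values are examples only.
master_list_entry = {
    'name': 'Extract',                 # must match the service class name
    'class_name': 'Extract',
    'classpath': 'al_services.alsvc_extract.Extract',
    'category': 'Extraction',
    'stage': 'EXTRACT',
    'accepts': '(archive|executable)/.*',   # regex of accepted file types
    'rejects': 'empty',                     # regex of rejected file types
    'enabled': True,
    'install_by_default': True,
    'cpu_cores': 0.4,                  # 1.0 = one core at 100%
    'ram_mb': 1024,
    'timeout': 60,                     # seconds before the task is cancelled
    'supported_platforms': ['linux'],
    'realm': 'bitbucket',
    'repo': 'alsvc_extract',
    'config': {},                      # free-form, service-specific
    'submission_params': [],
}
```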
SEED.SERVICES.TIMEOUTS

seed.services.timeouts.default
Description: Default service timeout in seconds for services that do not provide a timeout
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent

SEED.STATISTICS
seed.statistics.alert_statistics_fields
Description:​ ​When​ ​generating​ ​statistics​ ​about​ ​currently​ ​viewed​ ​alerts,​ ​this​ ​is​ ​the​ ​list​ ​of​ ​fields​ ​that​ ​will​ ​be​ ​faceted
Datatype:​ ​list[string]
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.statistics.submission_meta_fields
Description​:​ ​List​ ​of​ ​submission​ ​fields​ ​faceted​ ​when​ ​viewing​ ​detailed​ ​results​ ​about​ ​a​ ​file
Datatype:​ ​list[string]
Used​ ​when:​ ​Live
Components​ ​affected:​ ​ ​uwsgi

SEED.SUBMISSIONS
seed.submissions.decode_file
Description:​ ​Python​ ​path​ ​to​ ​the​ ​built-in​ ​file​ ​un-neutering​ ​function
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi,​ ​hostagent

seed.submissions.password
Description:​ ​Password​ ​used​ ​to​ ​connect​ ​to​ ​the​ ​submission​ ​host​ ​by​ ​the​ ​services
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent

seed.submissions.ttl
Description:​ ​Default​ ​time​ ​to​ ​live​ ​in​ ​days​ ​for​ ​submissions
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman,​ ​uwsgi,​ ​dispatcher

seed.submissions.url
Description: URL (scheme, host and port) to connect to when doing resubmissions (e.g. https://localhost:443)
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent

seed.submissions.user
Description: Username used to connect to the submission host by the services
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent

seed.submissions.working_dir
Description:​ ​Working​ ​directory​ ​for​ ​the​ ​submissions​ ​to​ ​be​ ​processed​ ​in​ ​(deprecated)
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent
SEED.SUBMISSIONS.MAX

seed.submissions.max.priority
Description:​ ​Highest​ ​priority​ ​that​ ​can​ ​be​ ​set​ ​in​ ​the​ ​system
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​middleman,​ ​uwsgi,​ ​dispatcher,​ ​hostagent

seed.submissions.max.size
Description: Maximum size of each file in a submission
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi
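A submission client would validate against these two limits before ingesting. A sketch, with illustrative limit values:

```python
# Illustrative submission limits; not shipped defaults.
max_priority = 10000
max_size = 100 * 1024 * 1024   # 100 MB per file, as an example value

def validate_submission(priority, file_sizes):
    # Reject anything above the configured priority or per-file size cap.
    if priority > max_priority:
        raise ValueError("priority above seed.submissions.max.priority")
    for size in file_sizes:
        if size > max_size:
            raise ValueError("file exceeds seed.submissions.max.size")
    return True
```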

SEED.SYSTEM
seed.system.constants
Description:​ ​Python​ ​path​ ​to​ ​the​ ​constants​ ​file
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​All

seed.system.country_code_map
Description:​ ​Python​ ​path​ ​to​ ​a​ ​class​ ​that​ ​does​ ​geolocation​ ​for​ ​an​ ​IP
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent

seed.system.load_config_from_riak
Description:​ ​Should​ ​the​ ​seed​ ​information​ ​be​ ​loaded​ ​from​ ​Riak​ ​(Should​ ​be​ ​True​ ​in​ ​a​ ​normal​ ​install)
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​ ​All

seed.system.name
Description: Name of the system. If set to 'production', the UI displays the production interface; otherwise it displays the dev interface
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.system.organisation
Description:​ ​Acronym​ ​of​ ​the​ ​organisation​ ​which​ ​deployed​ ​the​ ​system
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.system.password
Description:​ ​Password​ ​of​ ​the​ ​system​ ​user​ ​(leave​ ​null​ ​for​ ​production​ ​system)
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.system.root
Description:​ ​Assemblyline​ ​installation​ ​directory​ ​(you​ ​should​ ​leave​ ​it​ ​to​ ​/opt/al)
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.system.update_interval
Description:​ ​Time​ ​interval​ ​in​ ​seconds​ ​at​ ​which​ ​heartbeats​ ​are​ ​sent​ ​throughout​ ​the​ ​system
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​All

seed.system.use_proxy
Description: Should the system proxy connections to Riak and Redis (leave set to True)
Datatype:​ ​bool
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All

seed.system.user
Description:​ ​Username​ ​of​ ​the​ ​user​ ​that​ ​will​ ​execute​ ​the​ ​various​ ​components
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​All
SEED.SYSTEM.CLASSIFICATION

seed.system.classification.definition
Description:

Dictionary that defines how the classification engine works in the system. This is much too complicated to
explain here; the documentation in the class itself is very complete. Check the classification file for more details:
https://bitbucket.org/cse-assemblyline/assemblyline/src/master/al/common/classification.py

Datatype:​ ​dict
Used​ ​when:​ ​Live
Components​ ​affected:​ ​All

seed.system.classification.engine
Description:​ ​Python​ ​path​ ​to​ ​the​ ​classification​ ​engine​ ​class
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​All
SEED.SYSTEM.INTERNAL_REPOSITORY

seed.system.internal_repository.<name>.branch
Description: Branch workers will check out when checking out a given repository
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​None

seed.system.internal_repository.<name>.url
Description: Fully qualified URL workers will use to check out a given repository
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​None
SEED.SYSTEM.YARA

seed.system.yara.externals
Description:​ ​List​ ​of​ ​external​ ​task​ ​values​ ​that​ ​will​ ​be​ ​accessible​ ​from​ ​inside​ ​the​ ​yara​ ​rules
Datatype:​ ​list[string]
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi,​ ​hostagent

seed.system.yara.parser
Description:​ ​Python​ ​path​ ​to​ ​the​ ​YaraParser​ ​class
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi,​ ​hostagent

seed.system.yara.importer
Description:​ ​Python​ ​path​ ​to​ ​the​ ​YaraImporter​ ​class
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi,​ ​hostagent


SEED.UI
seed.ui.allow_raw_downloads
Description:​ ​Should​ ​the​ ​UI​ ​allow​ ​the​ ​user​ ​to​ ​download​ ​the​ ​files​ ​in​ ​raw​ ​format
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.ui.allowed_checkout_range
Description:​ ​CIDR​ ​of​ ​IPs​ ​that​ ​are​ ​allowed​ ​to​ ​checkout​ ​the​ ​code​ ​from​ ​the​ ​core​ ​server​ ​(restrict​ ​this​ ​as​ ​much​ ​as​ ​possible)
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​nginx

seed.ui.audit
Description:​ ​Should​ ​the​ ​UI​ ​audit​ ​the​ ​different​ ​requests​ ​it​ ​receives
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.ui.context
Description:​ ​Python​ ​path​ ​to​ ​the​ ​UI​ ​context​ ​dictionary
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.ui.debug
Description: Should the debug features be turned on in the UI
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.ui.download_encoding
Description: Default download encoding for all files downloaded from the system
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.ui.email
Description:​ ​Administrative​ ​email​ ​displayed​ ​in​ ​the​ ​terms​ ​of​ ​service​ ​page
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.ui.enforce_quota
Description: Should the UI enforce submission and API quotas
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.ui.fqdn
Description:​ ​Fully​ ​qualified​ ​domain​ ​name​ ​of​ ​the​ ​web​ ​server
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​nginx

seed.ui.install_path
Description:​ ​File​ ​path​ ​where​ ​the​ ​different​ ​repos​ ​are​ ​checked​ ​out
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.ui.rsa_key_size
Description: Size of the RSA key used to encrypt the user's password during the login process
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.ui.secret_key
Description: Flask secret key to make sure the session cookies are safe (make this long and random...)
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.ui.tos
Description:​ ​Terms​ ​of​ ​service​ ​in​ ​markdown​ ​format
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi

seed.ui.tos_lockout
Description: Should the UI lock out the user for administrative review after they have agreed to the terms of service
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​uwsgi
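
Taken together, the seed.ui keys above might appear in a deployment seed roughly as follows. This is a hypothetical sketch: every value shown (host names, paths, the context module) is an example, not a shipped default.

```python
# Hypothetical override of the seed.ui.* keys documented above.
# Key names follow the reference; all values are examples only.
seed = {
    'ui': {
        'audit': True,                          # log every request the UI receives
        'context': 'al_ui.site_specific.context',  # Python path to the UI context dict (example)
        'debug': False,                         # keep debug features off in production
        'download_encoding': 'cart',            # encoding applied to downloaded files
        'email': 'admin@assemblyline.local',    # shown on the terms of service page
        'enforce_quota': True,
        'fqdn': 'assemblyline.local',
        'install_path': '/opt/al',
        'rsa_key_size': 2048,                   # RSA key used at login
        'secret_key': 'CHANGE_ME_TO_A_LONG_RANDOM_STRING',
        'tos': '## Terms of service\n...',      # markdown
        'tos_lockout': False,
    }
}
```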
SEED.UI.SSL

seed.ui.ssl.enabled
Description:​ ​Should​ ​SSL​ ​be​ ​enabled​ ​on​ ​the​ ​core​ ​server
Datatype:​ ​bool
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​nginx
SEED.UI.SSL.CERTS

seed.ui.ssl.certs.autogen
Description:​ ​Should​ ​the​ ​system​ ​automatically​ ​generate​ ​certs​ ​for​ ​SSL
Datatype:​ ​bool
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​nginx

seed.ui.ssl.certs.ca
Description: Path, relative to the install_dir, to the client certificate CA
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​nginx

seed.ui.ssl.certs.crl
Description: Path, relative to the install_dir, to the client certificate revocation list (CRL)
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​nginx

seed.ui.ssl.certs.crt
Description: Path, relative to the install_dir, to the server's public certificate
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​nginx

seed.ui.ssl.certs.key
Description: Path, relative to the install_dir, to the server's private key
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​nginx

seed.ui.ssl.certs.tc
Description: Path, relative to the install_dir, to the server's certificate trust chain
Datatype:​ ​string
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​nginx
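
A hypothetical seed.ui.ssl block using the certificate keys above might look like this; the file names and directory layout are examples only, and every path is interpreted relative to the install_dir:

```python
# Hypothetical seed.ui.ssl configuration; all paths are examples,
# relative to the install_dir, and all files must exist at install time.
seed = {
    'ui': {
        'ssl': {
            'enabled': True,
            'certs': {
                'autogen': False,               # use the certs below instead of generated ones
                'ca': 'certs/client_ca.pem',    # client certificate CA
                'crl': 'certs/client_ca.crl',   # client certificate revocation list
                'crt': 'certs/server.crt',      # server public certificate
                'key': 'certs/server.key',      # server private key
                'tc': 'certs/trust_chain.pem',  # server certificate trust chain
            }
        }
    }
}
```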

SEED.WORKERS
seed.workers.default_profile
Description:​ ​Default​ ​profile​ ​loaded​ ​by​ ​the​ ​workers​ ​when​ ​they​ ​don’t​ ​have​ ​a​ ​specific​ ​profile​ ​assigned​ ​to​ ​them
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent

seed.workers.install_kvm
Description:​ ​Should​ ​the​ ​necessary​ ​components​ ​to​ ​run​ ​VMs​ ​be​ ​installed​ ​on​ ​the​ ​workers
Datatype:​ ​bool
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​None

seed.workers.nodes
Description: List of IPs or domain names of all the workers in the system
Datatype:​ ​list[string]
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​None

seed.workers.proxy_redis
Description: Should connections to Redis be proxied
Datatype:​ ​bool
Used​ ​when:​ ​Installation
Components​ ​affected:​ ​haproxy
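
The seed.workers keys above could be set in a deployment seed as in the sketch below; the profile name and worker addresses are hypothetical examples:

```python
# Hypothetical seed.workers configuration; profile name and node
# addresses are examples only.
seed = {
    'workers': {
        'default_profile': 'al-worker-default',   # used when a worker has no assigned profile
        'install_kvm': True,                      # install KVM so workers can run service VMs
        'nodes': ['worker01.local', 'worker02.local', '10.0.0.30'],
        'proxy_redis': True,                      # route Redis traffic through haproxy
    }
}
```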
SEED.WORKERS.VIRTUALMACHINES

seed.workers.virtualmachines.disk_root
Description: Path to the folder where the VM disks are saved on the workers
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent

seed.workers.virtualmachines.use_parent_as_datastore
Description: Should the VMs use their parent to access the datastore
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent

seed.workers.virtualmachines.use_parent_as_queue
Description:​ ​Should​ ​the​ ​VMs​ ​use​ ​their​ ​parent​ ​to​ ​access​ ​the​ ​queues
Datatype:​ ​bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent
SEED.WORKERS.VIRTUALMACHINES.MASTER_LIST

seed.workers.virtualmachines.master_list.<vm name>.num_workers
Description: Number of worker processes that will be instantiated inside the VM
Datatype: int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent
SEED.WORKERS.VIRTUALMACHINES.MASTER_LIST.CFG

seed.workers.virtualmachines.master_list.<vm name>.cfg.enabled
Description: Is the VM enabled
Datatype: bool
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent,​ ​dispatcher

seed.workers.virtualmachines.master_list.<vm name>.cfg.name
Description:​ ​Name​ ​of​ ​the​ ​VM.​ ​(Must​ ​match​ ​the​ ​name​ ​of​ ​the​ ​service​ ​it​ ​launches​ ​internally)
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent,​ ​dispatcher

seed.workers.virtualmachines.master_list.<vm name>.cfg.os_type
Description:​ ​Type​ ​of​ ​OS​ ​the​ ​VM​ ​runs​ ​(windows​ ​or​ ​linux)
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent

seed.workers.virtualmachines.master_list.<vm name>.cfg.os_variant
Description:​ ​Variant​ ​of​ ​OS​ ​the​ ​VM​ ​runs​ ​(win7,​ ​win2k8,​ ​ubuntu​ ​precise,​ ​…)
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent

seed.workers.virtualmachines.master_list.<vm name>.cfg.ram
Description: Amount of RAM allocated to the VM
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent

seed.workers.virtualmachines.master_list.<vm name>.cfg.revert_every
Description:​ ​Time​ ​in​ ​seconds​ ​after​ ​which​ ​the​ ​VM​ ​is​ ​automatically​ ​reverted
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent

seed.workers.virtualmachines.master_list.<vm name>.cfg.vcpus
Description:​ ​Number​ ​of​ ​virtual​ ​CPUs​ ​allocated​ ​to​ ​the​ ​VM
Datatype:​ ​int
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent

seed.workers.virtualmachines.master_list.<vm name>.cfg.virtual_disk_url
Description: Name of the QCOW2 disk used to create the VM
Datatype:​ ​string
Used​ ​when:​ ​Live
Components​ ​affected:​ ​hostagent
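
Putting the virtualmachines keys together, a master_list entry for a single service VM might be sketched as follows. The VM name "Extract", the disk file name and the resource figures are hypothetical; num_workers is shown as an int, matching its description as a count of worker processes:

```python
# Hypothetical seed.workers.virtualmachines block with one master_list
# entry; the VM name, disk file and resource figures are examples only.
seed = {
    'workers': {
        'virtualmachines': {
            'disk_root': '/opt/al/vmm/disks',
            'use_parent_as_datastore': True,
            'use_parent_as_queue': True,
            'master_list': {
                'Extract': {
                    'num_workers': 1,               # worker processes inside the VM
                    'cfg': {
                        'enabled': True,
                        'name': 'Extract',          # must match the service launched inside the VM
                        'os_type': 'linux',
                        'os_variant': 'ubuntuprecise',
                        'ram': 2048,
                        'revert_every': 86400,      # revert the VM once a day
                        'vcpus': 2,
                        'virtual_disk_url': 'extract.001.qcow2',
                    }
                }
            }
        }
    }
}
```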
