Assemblyline 3.2 Reference Manual
User Manual: Pdf
Open the PDF directly: View PDF
.
Page Count: 67
| Download | |
| Open PDF In Browser | View PDF |
UNCLASSIFIED ASSEMBLYLINE 3.2 REFERENCE MANUAL 2016 - Communication Security Establishment Revision 6 UNCLASSIFIED TABLE OF CONTENTS Introduction 7 Typical workflows 7 Direct user submission 7 High volume ingest 8 Infrastructure 9 Servers 9 Core 9 Datastore 9 Workers 10 Logger (optional) 11 Support (optional) 11 Components 12 Dispatcher 12 Middleman 12 Expiry / Expiry workers 13 Journalist 13 Alerter 13 Alert actions 13 Workflow filters 14 System metrics 14 Metricsd 14 Controller 14 Hostagent 14 QuotaSniper 14 Deployments 15 Types 15 Development VM 15 Appliance 16 Production cluster 17 Page | 2 UNCLASSIFIED Custom instance Services Service VM Data layout 18 19 19 20 Common fields 20 Buckets 20 Alert 20 blob 20 emptyresult 20 error 20 file 21 filescore 21 node 21 profile 21 result 21 signature 21 submission 21 user 21 Data security 22 Text representation 22 Parts 22 Level 22 Required 22 Group 22 Subgroup 22 Managing the system 23 Dashboard 23 Management UI 23 Build Documentation 23 Configuration 23 Page | 3 UNCLASSIFIED Error Viewer 23 Hosts 23 Profiles 24 Provisioning 24 Services 24 Site Map 24 Users 25 Virtual Machines 25 CLI 25 backup 25 backup [follow] [force] 25 delete [force] 26 delete submission full [force] 26 index commit [ ] 26 restore 26 signature change_status by_query [force] 26 iPython 27 Load datastore 27 What to and not to do with a datastore object 27 Repairing Corrupted eleveldb Index 28 Seed demystified 29 seed.auth 29 seed.auth.internal seed.auth.internal.users seed.core 30 31 32 seed.core.alerter 32 seed.core.bulk 33 seed.core.dispatcher 34 seed.core.dispatcher.max 34 seed.core.dispatcher.timeouts 35 Page | 4 UNCLASSIFIED seed.core.expiry seed.core.expiry.journal seed.core.middleman seed.core.middleman.sampling_at seed.core.redis 35 35 36 38 39 seed.core.redis.nonpersistent 39 seed.core.redis.persistent 39 seed.datasources 40 seed.datastore 40 seed.datastore.riak 41 seed.datastore.riak.nvals 41 seed.datastore.riak.solr 42 seed.datastore.riak.tweaks 42 seed.filestore 44 seed.installation 45 seed.installation.docker 45 seed.installation.external_packages 45 seed.installation.hooks 46 seed.installation.repositories 46 seed.installation.repositories.realms 46 seed.installation.repositories.repos 47 seed.installation.supplementary_packages 47 seed.logging seed.logging.logserver 48 48 seed.logging.logserver.elasticsearch 49 seed.logging.logserver.kibana 50 seed.logging.logserver.ssl 51 seed.monitoring 51 seed.services 52 seed.services.limits 52 Page | 5 UNCLASSIFIED seed.services.master_list 53 seed.services.timeouts 56 seed.statistics 56 seed.submissions 56 seed.submissions.max seed.system 57 58 seed.system.classification 59 seed.system.internal_repository 60 seed.system.yara 60 seed.ui 61 seed.ui.ssl seed.ui.ssl.certs seed.workers seed.workers.virtualmachines seed.workers.virtualmachines.master_list seed.workers.virtualmachines.master_list.cfg 63 63 64 65 65 65 Page | 6 UNCLASSIFIED INTRODUCTION Assemblyline is a scalable distributed file analysis framework. It is designed to process millions of files per day but can also be installed on a single server. It is built in python and has a web interface for users to submit tasks. Assemblyline also provides a web API for easy scripting. Think of Assemblyline as an in-out service. You have a file, you need to know as much as you can about it and potentially if it is malicious or not. You send it to Assemblyline, it gets scanned with as many services as possible and then the data is aggregated for you in a report so you can make a decision about this file. The files and reports in Assemblyline have a time to live, they are not stored forever. Assemblyline can also be integrated in your workflow. It has an ingestion API that can generate alerts for analysts to monitor. TYPICAL WORKFLOWS There are two typical workflows. The first is low volume task submission (Direct user submission). The second is high volume file ingestion. These workflows are described below. DIRECT USER SUBMISSION In the case of direct submissions, the task received by the API will be sent directly to the dispatcher for processing. Dispatcher will analyse the type of file submitted and route the submission to the appropriate workers for analysis. When the workers are done, they save their analysis results directly in the Datastore and notify the dispatcher that they have completed their task and move on to the next one. Once the dispatcher has received all completion messages for a given submission, it marks the submission as completed in the Datastore. Direct submissions are subject to submission quotas. What this means is that a given user cannot have more than X concurrent submissions. Submission quotas prevent the system from being overloaded. Although in this case, the user is guaranteed to get a result for their submission. Page | 7 UNCLASSIFIED HIGH VOLUME INGEST For high volume ingest, all submissions are routed from the API entry point to a persistent ingest queue. The middleman process will pull from this queue, removing duplicates and whitelisting files, to reduce the load on the system before sending them for processing. For each unique file that middleman encounters, it sends the submission to the dispatcher. Once a submission makes it to the dispatcher the workflow is the same as a direct user submission with one addition. When a submission is completed the dispatcher notifies middleman that the submission is completed. If the submission’s score reaches the system’s alert threshold, middleman will task Alerter to create an alert for the given submission. In this mode, submissions are not subjected to quotas they are instead prioritized and submitted in order of priority. When the system is overloaded, queued submissions will be randomly deleted to reduce load on the system. You are not guaranteed to have a result for your submission if the system is overloaded. Page | 8 UNCLASSIFIED INFRASTRUCTURE The Assemblyline infrastructure consists of 5 different type of servers: Core, Datastore, Worker, Logger and Support. SERVERS CORE The Core server is the central server. It is responsible for orchestrating the operation of the system. It hosts the web interface/API; the components responsible for receiving and dispatching tasks to the workers; as well as caching the files being processed, locally.. The following off-the-shelf components run on the core server: ● Redis (Queuing) ● ProFTPd (File transfer to the workers) ● NGinx, UWSGI, Flask, Angularjs (Web server/API) ● Gunicorn (Websocket) ● HaProxy (Load balancing) The following custom components (described later) also run on the core server: ● Dispatcher (Tasking) ● Middleman (High volume data ingestion) ● Expiry, Journalist (Data deletion) ● Alerter, Alert Action, Workflow Filter (Alerting) ● System metrics, metricsd (Infrastructure health management) ● Quota Sniper (Quota and session manager) Page | 9 UNCLASSIFIED DATASTORE The Datastore servers are where the data is stored before it expires. These server use Riak RV to store the data with full text indexing using SOLR. Riak is fully distributed with redundancy and is horizontally scalable. The following off-the-shelf components run on the datastore servers: ● Riak RV (Data storage) ● SOLR (Full text indexing - Built-in Riak KV) The following custom components (described later) also run on the datastore servers: ● System metrics (Infrastructure health management) WORKERS The worker boxes are responsible for hosting the services. There are two types of workers: provisioned and flex. Provisioned workers load a profile associated to their MAC address from the datastore and spawn the different services and VMs associated to the profile. They ensure the services and the VMs stay up 100% of the time for faster throughput. Flex workers don't have a specific job to do. Instead they inspect current bottlenecks in the infrastructure and spawn services or VMs to address the current bottleneck(s).. The following custom components (described later) run on the workers: ● Controller (Tasking for hostagent) ● Hostagent (Running services and VMs) ● System metrics (Infrastructure health management) Page | 10 UNCLASSIFIED LOGGER (OPTIONAL) The Logger server is where all the metrics and logs from the different servers and components are aggregated. This server is optional because it is not required for proper processing of the data. That said for cluster deployment, it is recommended to have a Logger server to make sure the system stays in good health. The following off-the-shelf components run on the logger: ● NGinx (Web proxy) ● Kibana 4 (Logs and Dashboard display) ● Elasticsearch (Logs and metrics storage) ● Syslogd (Logs aggregator) ● Logstash (Parser for logs going to Elasticsearch) ● Filebeat (File ingestor for Logstash) The following custom components (described later) also run on the logger: ● System metrics (Infrastructure health management) SUPPORT (OPTIONAL) Support servers are optional but some services require external components to be installed to work properly. A support server is useful for: ● Storing the VMs disks for the multiple Assemblyline VMs you might have in your infrastructure. ● Hosting a Docker registry for the different services that uses Docker containers. ● Hosting a database that a service can query during execution (NSRL for example) ● Hosting an ICAP proxy for scanning your files through an AV scanner There are no install procedures for support server, there are no internally built components that run on support servers. They are just another thing you have to consider when you do a deployment. Page | 11 UNCLASSIFIED COMPONENTS The following is a list of all the components in the system, their purpose, and how they fit into the infrastructure. DISPATCHER Dispatcher is the core tasking component in the system. It checks the type of the file submitted and the list of services that are registered and currently online and routes each file to the appropriate service. It keeps track of children for a given file (zip extraction, etc...) and ensures that a submission is not completed until all the children have been processed and all files sent to appropriate services. It keeps track of errors in the system and re-queues jobs if it detects that a failure is recoverable. It is the dispatcher’s job the mark a submission completed when all work is done. Dispatcher does all its queuing using non-persistent Redis queues. If the dispatcher is restarted, all inflight submissions are restarted from the beginning. This is usually not a problem because Assemblyline has service level caching. Dispatcher also keeps metrics on how many files are being completed in the system. MIDDLEMAN Middleman is our high volume ingestion component. It takes all submissions created using the ingest API and 1 sorts them into different priority queues (Critical, High, Medium and Low). It then fills half of the dispatcher's maximum inflight queue with submissions starting with the highest priority queues and continuing until all queues are exhausted. Middleman can also deal with impossible to finish backlogs. When the queues reach a certain threshold, middleman will start sampling the queues using method that is increasingly aggressive in proportion to the size of the backlog to randomly remove submissions from the priority queues to ensure that the queues don't keep growing forever. When sampling, submissions are removed from both the front and back of the queues. Middleman also ensures that we don't duplicate work and will dedup submissions before it sends them to the dispatcher. It also applies whitelisting regular expressions to the metadata of the submissions, if provided. Metrics are reported on number of duplicate, ingested, whitelisted, and completed files as well as the number bytes ingested and completed. Messages are also sent to Alerter to create alerts for submissions with scores that meet the system alert threshold threshold and where an alert was requested. 1 The priority queues are starving queues. All critical submissions are processed before starting highs and so on... Page | 12 UNCLASSIFIED EXPIRY / EXPIRY WORKERS 2 Expiry takes care of data deletion It cleans up of every piece of information that has reached it’s Time To Live (TTL). Every single piece of information in Assemblyline is tagged with an __expiry_ts__ field which dictates the time at which this information will disappear from the system. Expiry uses SOLR indexing and search for expired data then queues items of data for the expiry workers which in turn delete the data from the system. As part of system optimisation, one of the data buckets (emptyresult) does not use search to expire the data. Instead we use journal files and avoid having to index an insane amount of data that we only use for caching. This is explained in more detail later in the data layout section. Assemblyline is not designed to be a Knowledge Base. This is why data is expired from the system. Default TTL for the data is 15 days. JOURNALIST Journalist is an expiry system optimization, it takes care of writing journal files for emptyresult items. It reads those items from a Redis queues and writes them to timestamped files for expiry to process later. ALERTER Alerter is the component responsible for generating alerts. It receives a notification from middleman for all submissions where an alert was requested and the submission’s score reaches the systems alert threshold. When creating an alert, alerter gathers the features (Tags) our system has extracted for a submission and generates an alert based on the mix of these features and the metadata that was part of the original submission. Even though submissions are deduped at the middleman level, middleman keeps track of these duplicates and, in the case of an alert, sends a notification for each duplicate to alerter. Alerter then creates one individual alert for every file ingested. This way, if one file was seen a 100 times we will have a 100 alerts and the associated metadata for each individual alert. We can then use all that data to create a threat profile and more easily mitigate the problem. ALERT ACTIONS Alert actions is used to makes sure two actions taken on a specific alerts happen one after the other and not at the same time. All API calls or workflow actions to label, change priority, take ownership or change status of an alert are placed in a Redis queue and dispatched to an internal alert action instance using a deterministic feature of the alert. This ensures the actions are processed sequentially but still distributed to multiple processes for speed. Alert actions reports metrics on the number of alerts created. 2 TTL in assemblyline should be thought of as DTL or Days To Live. Page | 13 UNCLASSIFIED WORKFLOW FILTERS In the alert page of the system, the analyst can build and save search queries that will then be used for labelling, changing the priority or changing the status of an alert. Workflow filter runs those queries on newly inserted alerts and sends messages to alert actions to apply the action described to all alerts matching the query. SYSTEM METRICS System metrics is in charge of gathering CPU, memory, load, network and many other metrics and shipping those directly the elasticsearch database on the Logger server. It also gathers Riak and SOLR specific metrics. This component is only installed when a Logger server is defined in the seed. METRICSD The Metricsd component is in charge of aggregating Assemblyline specific counters reported by middleman, dispatcher, alerter and hostagent over minute intervals and storing these aggregated counters in the elasticsearch database on the Logger server. This component is only installed when a Logger server is defined in the seed. CONTROLLER Controller is an extremely lightweight component that runs on the workers. Controller’s only responsibility is to start, stop and restart the hostagent component. This functionality is used by the host management page so we can restart the hostagent in batch without having the log in to the different boxes. HOSTAGENT Hostagent is the component responsible for reading the worker’s profile from the datastore and loading the number of services and VMs as described in the profile. It keeps track of each service and VM that it launches and makes sure that they stay alive. It is also responsible for providing heartbeats to the system to let the UI and the dispatcher know that the different services are alive and are ready to receive tasks. When the hostagent tries to instantiate a VM, it will make sure that it actually has the VM disks on the worker host and download them if they are missing. Hostagents report metrics on the effectiveness of the service caching. QUOTASNIPER QuotaSniper is the component that makes sure that the different quotas and sessions for each user expire at the right time. Page | 14 UNCLASSIFIED DEPLOYMENTS There are different ways to deploy Assemblyline and each type of deployment has a set of minimum requirements that must be met to ensure a pleasant experience with the system. It should also be noted that when you install Assemblyline on a server or VM, it takes over that box. Assemblyline does not play nice if installed alongside other software because it does not share resources. Stay away from custom partitioning. Just give Assemblyline machines one partition with the full drive or you will run into issues. Assemblyline writes tons of logs and makes heavy use of the /tmp folder. You've been warned... TYPES The most common deployment types are described below along with particularities and recommendations for each type of deployment. DEVELOPMENT VM This is the most basic deployment type. You can use this type of deployment when you want to test the system or if you are working on a new service for the system. On a development VM, you install the Core, Datastore and Worker all on the same box. No need to install Logger or any support server for that. (unless the service you're working on requires a support server...) The minimum requirements for a development VM are as follows: ● 2 cores ● 6 GB of ram ● 20 GB Disk space You can follow the documentation on GitHub for step by step commands on how to install your development VM: https://bitbucket.org/cse-assemblyline/assemblyline/src/master/docs/install_development_vm.md When you’re done installing your development VM, it will be provisioned with the default set of services that can run out of the box without additional interventions. Because you will do development on this VM, you should run the services you are developing using the run_service_live script. Page | 15 UNCLASSIFIED APPLIANCE This is basically an extremely light weight deployment. When you send only a few thousand files a day at most. The kind of installation to help your analysts go through their daily samples. An appliance installation is a one server installation. Ideally you can get your hands on something a bit bigger then the spec you'd use for a Development VM. For an Appliance installation, you install the Core, Datastore and Worker all on the same box like the Development VM. You can even install what you'd install on a support server directly on that box as well. Again, no need to install Logger for such a small install. An appliance install can still be done inside a VM with better specs than the Development VM but if you do that you will not be able to run services that need to instantiate a VM (Cuckoo, or any windows based service) because nested virtualization does not work that well. For this reason, we recommend to do the appliance install on bare metal. Recommended specs for an appliance install: ● 16+ threads (8 cores + hyper threading) ● 96+ GB Ram ● 1+ TB Disk space (Get some RAID 5 configuration because this is your only redundancy) - SSD are preferred but not required You can follow the documentation on GitHub for step by step commands on how to install your appliance: https://bitbucket.org/cse-assemblyline/assemblyline/src/master/docs/install_appliance.md When you’re done installing your appliance, it will be provisioned with the default set of services that can run out of the box without additional interventions. Page | 16 UNCLASSIFIED PRODUCTION CLUSTER This is the go big or go home type of install. It can easily do millions of files per day depending on the number of worker boxes you assign to it. The buy-in for this kind of install is significant. It requires at minimum 12 boxes if you want something that is manageable and that makes sense. It scales somewhat linearly with the exception of the core server which cannot be scaled out yet. On our current production cluster this has yet to become an issue. The minimum setup for this type of installation is: 5 Riak nodes, 5 workers, 1 core and 1 logger. ● Riak recommends for proper distribution of the replicated data that your cluster be at minimum 5 nodes which is why we recommend 5 Riak nodes. ● We recommend at minimum 5 workers because with the size of the Riak cluster and the speed of the components on the core server, you'll need at least that many workers to make sense of the 6 boxes you've spent on Riak and Core. ● Core cannot be scaled out yet therefore only one box is assigned to it. ● For this type of install it is highly recommended (see almost mandatory) that you run a Logger server. If you don't have one, it will be really hard for you to maintain your infrastructure in good health. ● You may need a support box, which brings the total to 13 boxes. Although if your Logger box is beefy enough you can probably have it fulfill a dual role as both a Logger and Support box. ● You could technically create everything inside a virtual environment like VMWare vSphere or even Amazon AWS but if you do, you'll lose the ability to run virtualize services (Cuckoo, or any windows based service) from inside your infrastructure which is why we recommend running on bare metal hardware. The recommended specs for the boxes detailed above are as follows: ● ● 1x Core o 16+ threads (8 cores + hyper threading) o 96+ GB Ram o 1+ TB Disk space (Unless you've configured your seed otherwise this server stores your files better have lots of HD space.) 5x Riak o 16+ threads (8 cores + hyper threading) o 96+ GB Ram o 1+ TB Disk Space (SSDs are highly recommended for Riak as they will reduce latency spikes when the combination of your data and the search index size does not fit in OS cache.) Page | 17 UNCLASSIFIED ● ● 5x Workers o Any number of cores (Obviously the bigger the better but this is where you can easily recycle old hardware.) o 4 GB of ram / core (This will allow you better utilize your CPU power.) o 10GB Disk space per logical core with a minimum of 256 GB 1x Logger o 8+ threads (4 cores + hyper threading) o 48+ GB Ram (Depending of how long you want to keep logs and metrics. You might have to set elasticsearch to use up to 31GB of heap and if the elasticsearch indexes can be kept in OS cache the better...) o 1+ TB Disk Space (All logs and metrics counters will be shipped to this box and indexed and some of our logs are quite verbose.) You can follow the documentation on GitHub for step by step commands on how to install your cluster: https://bitbucket.org/cse-assemblyline/assemblyline/src/master/docs/install_cluster.md When you’re done installing your cluster, it will be provisioned with the default set of services that can run out of the box without additional interventions. However it will most likely be over allocated and extremely un-balanced because the default profile instantiate one of each service on each workers regardless of their CPU and RAM usage. You should use the provisioner to provision and balance your cluster because you'll quickly find that it is much easier than creating a profile for each worker by hand. CUSTOM INSTANCE Assemblyline provides a foundation for distributed file analysis but when deploying it to your own infrastructure you should setup a custom instance so you can add in your own authentication layer (LDAP, PKI, …); your own branding; and/or any other features particular to you use case or features that may not be shareable with the rest of the community. Assemblyline provides a create_deployment.py script which you should have ran if you followed any of the installation documents. This script will ask you questions about your deployment in order to create the boilerplate for your custom deployment showing you how to add custom installation hooks, custom APIs, custom web pages, etc… You can refer to the Assemblyline UI readme file for more details on how to add functionalities to your deployment: https://bitbucket.org/cse-assemblyline/al_ui/overview Page | 18 UNCLASSIFIED SERVICES The premise of Assemblyline is that any command line tool, API or database can be wrapped with a thin python layer we call services. Services are easy to write. We provide a quick guide on how to add services to the system in our documentation: https://bitbucket.org/cse-assemblyline/assemblyline/src/master/docs/create_new_service.md If you create a new service that can be shared with the rest of the community, you are encouraged to share the service with a pull request. Assemblyline ships with more than 30 services which you can use in your deployment. Some of these service require external licences or access to external datasets. It is your responsibility to make sure you have all the appropriate license agreements in place for the services you intend to use. SERVICE VM Assemblyline provides a means to use virtual machines to run services that could be damaging to your infrastructure (for example, they execute the file that they are scanning), that require special packages installed that may conflict with other service packages or that simply cannot execute from inside a Ubuntu 14.04 host (e.g. windows only services). Creating a service VM can be done in a few simple steps: ● Create a base disk for the OS you'll run the service on (if it does not exist already) https://bitbucket.org/cse-assemblyline/assemblyline/src/master/docs/build_vm_base_disk_ubuntu_14.04.md https://bitbucket.org/cse-assemblyline/assemblyline/src/master/docs/build_vm_base_disk_windows_10.md ● Derive a VM from the base disk and install dependencies https://bitbucket.org/cse-assemblyline/assemblyline/src/master/docs/build_worker_vm.md ● Drop the disk in the vmm/disks/ directory on the server referenced in the support_urls section of the seed ● Create a Virtual Machine entry in the management interface for your service that reference the disk that you've just created. ● From then on, in the provisioner or from the profile editor, you can add instances of your VM and then hostagent will download the VM disk and run it. Page | 19 UNCLASSIFIED DATA LAYOUT Assemblyline data is stored in Riak KV using a key as a unique identifier for a JSON object. The data is stored in different buckets with distinct indexes and replication factors. COMMON FIELDS Buckets have a set of common fields related to data expiry and data security. Buckets that have their data expired after a certain TTL will have and __expiry_ts__ field at the root level in their JSON document which dictates the time at which the data will be expired. Buckets that are subject to object level data security will have the following fields at the root level in their JSON document: __access_grp1__, __access_grp2__, __access_lvl__, __access_req__ and c lassification. BUCKETS These are the different types of bucket you will find in the datastore: ALERT The alert bucket stores metadata about the submissions that were ingested through the ingest API and that meet the scoring threshold for an alert-able submission. This bucket has a non-fixed set of fields and the index for this bucket features a catch-all field to index all fields as strings. Any and all submission metadata is preserved as part of the alert. BLOB The blob bucket is one of the few non-indexed buckets. It stores some configuration data and some aggregated statistics about the system. EMPTYRESULT The emptyresult bucket is used for service level caching in the case where the service had nothing to report. Data in the emptyresult bucket is not indexed and is deleted using a special journaling technique to make it easier on Riak and SOLR. ERROR The error bucket is used to store errors reported by the different services while scanning a given file. This bucket is fully indexed so admins can search for errors in the system. Page | 20 UNCLASSIFIED FILE The file bucket is used to store information about a given file like the different hashes, its size, its type, ... basically any static properties of the file. The file bucket is also use to keep the file storage in sync with the datastore. Every time a file object is deleted from the datastore, it's file equivalent on disk on the filestore is deleted as well. This bucket is fully indexed so anyone can search for file properties in the system. FILESCORE The filescore bucket is used for submission caching by the ingest API. It tracks each file ingested along with their associated resulting score so if the same file is ingested multiple times in a short period of time, it will not rescan the file but use its previous results. This bucket indexed only on the __expiry_ts__ field for expiry to delete the data. NODE The node bucket stores information about the different worker nodes that are registered. This bucket is fully indexed but only admins are allowed to search it. PROFILE The profile bucket stores the different service allocation profiles that the worker nodes can use. Profiles usually contain the number of each service that will run directly on the host and number of VMs of each service that the host will launch. This bucket is fully indexed but only admins are allowed to search it. RESULT The result bucket stores the result of a given service running on a given file. This bucket is fully indexed so anyone can search for results in the system. SIGNATURE The signature bucket stores the different Yara signatures that our Yara service can run. The Yara signatures are stored into a special JSON encapsulation that makes it easier to enforce our metadata requirements from our malware standard. This bucket is fully indexed so anyone can search for signatures in the system. SUBMISSION The submission bucket store details about a submission to the system. Optional metadata can be provided if using the ingest API which will then be propagated into the alerts when the alerting score threshold is reached. This bucket has fields and is fully indexed except a metadata section where a catch-all field indexes all fields found as strings. USER The user bucket stores the user’s details, access controls and options. It is fully indexed but only admins are allowed to search it. Page | 21 UNCLASSIFIED DATA SECURITY Assemblyline has object level security built into all API endpoints. The data security model is referred to as classification and is based on the Canadian classification marking model. It has been made generic so you can create your own classification model. The classification model is saved into 5 fields that are used by either SOLR or python to only serve data to users that are allowed to see it. In the case of SOLR, a filter query is added to all queries made to SOLR to filter out the data that the user is not allowed to see. For direct Riak KV access, the classification string is compared against the user's maximum classification to be sure he has access to the data. TEXT REPRESENTATION The text representation of the classification if saved into the 'classification' field in the JSON document. This field is used to display the classification to the user and is also used by the python part of the classification engine to validate user access to data. PARTS The classification can be exploded in 4 different parts that each serve different purposes. Each part has a name and short name value to make it human readable. LEVEL Level is a 0 to X integer value where the user has to have a number greater or equal to the object they are trying to get access. This value is saved in the __access_lvl__ field in the JSON document. REQUIRED The required part of the classification is a list of tokens that the user must have in order to access to the object. This value is saved in the __access_req__ field in the JSON document. GROUP The group part of the classification is used more as a dissemination list. A user that tries to access an object must be part of at least the groups the object disseminates to. If no groups are specified on an object, it means that all groups have access to it. This value is saved in the __access_grp1__ field in the JSON document. SUBGROUP The subgroup part of the classification is used as a second level dissemination list. A user that tries to access an object must be part of at least the subgroups the object disseminates to. If no subgroups are specified on an object, it means that all subgroups have access to it. This value is saved in the __access_grp2__ field in the JSON document. Page | 22 UNCLASSIFIED MANAGING THE SYSTEM There are numerous ways to manage the system. We've tried to build as much functionality in the UI as possible but where the UI falls short, we have a couple of ways of other ways to fix the system or to perform some routine management tasks. DASHBOARD The dashboard is the only piece of management that is accessible to all users of the system. It gives you an indication of how busy the system is and which components are falling behind. You get one card for middleman' s health, one for dispatcher's and one for each of the services currently loaded in the seed. It is the basic “green card is OK, red you have a problem”-type of interface. When a card goes red, the part that is problematic will be highlighted so you can easily what is wrong in the system. MANAGEMENT UI The management UI should help you perform 90 % of your administrative needs. BUILD DOCUMENTATION The Build documentation section is where you'll find step by step instructions on how to create services, how to create service VMs, how to build an appliance or a cluster, etc... You can add your own build documentation to this section by adding markdown documents to /opt/al/pkg/private/docs/. CONFIGURATION The configuration section is where you can edit your currently deployed seed. You can also use this page to diff your current running seed with the seed you've used for installation and the seed as it is saved in the code. The idea is that you can use this section to tweak parameters of the system and that when you are comfortable with them, you diff your running seed with the seed in the code and you apply the changes that are working well in the live seed to the seed in the code. ERROR VIEWER The error viewer is a place where you can search through the errors that the different services generated so you can fix those error once and for all. HOSTS The hosts management page is used to display the health of all the physical workers and virtual machines of the system. For each host, it displays CPU usage, RAM Usage, Disk Usage, Number of services and Number of VMs the host is running. Each card in the hosts management page can be flipped to reveal the full list of services and VMs on the host. It also reveals buttons that you can use to interact with the hostagent service of each worker. You can restart, stop and start the hostagent as well as pause and resume task execution. The flipped side of the card also lets you enable/disable a host or change its profile if you click the expand button. Page | 23 UNCLASSIFIED When multiple cards are flipped, there are buttons at the top of the interface that will let you interact with all the flipped cards at the same time. This is especially handy to restart all hostagents and the same time when you push a patch to a given service. Cards in the host management page are color coded: ● ● ● ● Green cards mean that all required processes are running like they are supposed to Gray cards mean that the hosts is either disabled or the hostagent is not running Orange cards mean the hostagent is running but the controller isn’t this mean that the buttons to interact with the host won't work on this host, you'll have to SSH to the box Red cards mean that the host is unreachable, both hostagent and controller are down PROFILES The profile management page lets you edit the different profiles each node is running. For each profile, you can set the number of each service you'd like to run and the number of each VMs you'd like to run. You can even set configuration overrides for each service in case one box does not work like the rest. That said you should not really have to use this page at all, the preferred way to provision the system is by using the automatic provisioner. It is much simpler to use and it prevents you from over-provisioning the worker nodes and potentially reducing the performance of the system. In the case where you are running an appliance, the recommended way to provision an appliance is to add services or VMs to the al-worker-default profile created at installation and to restart the hostagent from the hosts management page. PROVISIONING The provisioning page gives you an overview of how many cores and how much ram you have in your infrastructure then lets you provision the different services, VMs and flex nodes without letting you overprovision the cluster. This is the easiest way to the provision the cluster when you have a lot of nodes. SERVICES The service management page lets you change the properties of a service. It is the interface you want to use to change the configuration variable of a service. When changing properties through this interface, you are in fact changing properties inside the seed. You could technically use the seed editor and get the same results. The different properties of the services will be explained in the 'Seed demystified' section of this document. SITE MAP The site map gives you a lists of all pages and API with their corresponding function in the code as well as: ● ● ● ● Which HTTP methods you can use on this page Is it protected by authentication? Does it require admin privileges? Are the page parameters are audited in the UI audit logs? Page | 24 UNCLASSIFIED USERS The user management page gives you a way to edit the settings of the different users or to add/remove users from the system. Note: By default, Assemblyline uses it’s own internal authentication. You can override the authentication layer with your own authentication. VIRTUAL MACHINES The Virtual Machines management page lets you change the properties of an AL VM. This is the interface you will want to use to change the configuration variables for VMs. When changing properties through this interface, you are in fact changing properties inside the seed. You could technically use the seed editor and get the same results. The different properties of the VMs will be explained in the 'Seed demystified' section of this document. CLI Even though the majority of the management tasks can be done in the UI, there is the occasional time where you'll have to go through the CLI to right some wrong or to backup/restore data in the system. The Assemblyline CLI provides loads of functionalities that are extremely dangerous and could lead to disastrous consequences for your production infrastructure. Use with caution... We will explain some features of the CLI that you may have to use one day. For the rest, you can always run al_cli help or al_cli help for more detailed help. BACKUP This backup command will backup the data from all system buckets to a destination folder. The destination folder must be placed in a directory where the AL user can write, maybe you should backup to /opt/al or /tmp. BACKUP [FOLLOW] [FORCE] This backup command lets you issue a SOLR query against a given bucket then backs up all content that matched your query. The query part of this command must be placed into quotes if the query contains spaces. Once you run the command, it will show you a piece of the data that matches your query and give you a count of how many objects will be backed up. It will ask you to confirm that you indeed want to proceed with the backup. Note: If you want to skip the confirmation, you add the force keyword to the command You can also create a follow backup by adding the follow keyword to the command. A follow backup is an intelligent backup that will not only backup the bucket where you run the query but will also find any links between the objects that match your query and other objects in other buckets that are related to what matches your query. A follow backup of a submission, for example, will also backup: file, result, emptyresult and error. Page | 25 UNCLASSIFIED DELETE [FORCE] This command is especially useful to get rid of bad data in your system. Do a simple SOLR query to isolate the data you want to delete in a specific bucket and this command will delete it all. DELETE S UBMISSION F ULL [FORCE] This command will help you fix the problem where someone submitted a massive number of files that they were not supposed to or they simply misclassified their submissions. Write a simple SOLR query that isolates those submissions and this command will delete the submissions and all other associated objects removing any trace that the submission ever existed. Note: The full keyword only works on the submission bucket. Also, you can delete a single submission using the web interface while viewing the submission. INDEX C OMMIT [ ] Some buckets have their index committed only every 15 minutes for performance reason. This command instructs SOLR to commit all its indexes now instead of waiting for the commit timeout. Optionally you can specify the bucket you need the indexes to be committed. RESTORE This command is used to restore backups created using the backup command. SIGNATURE C HANGE_STATUS B Y_QUERY [FORCE] This command allows you to switch a bunch of signatures at the same time to another status. Let's say you have 100 signature in STAGING that you want to promote to DEPLOYED, you can simply write a SOLR query that can isolate those signatures and run that command to promoted them all at the same time instead of doing that one signature at the time in the UI. Page | 26 UNCLASSIFIED IPYTHON In the rare case where some data would be corrupted by either a patch gone wrong or something else, you might have to programmatically fix the datastore by iterating through its data. iPython will be your friend in those cases. We will provide you with the necessary lines of python to connect to the datastore and a few pointers regarding what functions from the datastore not to use on a production cluster. LOAD DATASTORE It is extremely simple to load the datastore in iPython. Simply type the two following lines: from assemblyline.al.common import forge ds = forge.get_datastore() This will give you a datastore object "ds". Note: In the rare occasion that your seed is corrupted in a way where forge refuses to load. You can assign a static seed to be used to load the forge object when launching iPython: ~# AL_SEED_STATIC=python.path.to.seed ipython WHAT TO AND NOT TO DO WITH A DATASTORE OBJECT With your datastore object you should avoid any list_debug_keys() function because they can completely take down your cluster if you have too much data. Also avoid _stream_bucket_keys() function because this can also take down your cluster. Try to be as precise as possible and if you need to go through a lot of data, use the stream_search function to get all the keys related to your data. Remember that Riak is a key/value pair datastore. To edit an object, you need to know it's key first. For each of the buckets in the datastore, a datastore object should have a get_ and a save_ function to get an object using a key and to save that object back to the datastore. If you’re in that deep, the only other thing we can add is, good luck! Page | 27 UNCLASSIFIED REPAIRING CORRUPTED ELEVELDB INDEX In the event of major hardware or filesystem problems, LevelDB can become corrupted. These failures are uncommon, but they could happen, as heavy loads can push I/O limits. To check whether one of your nodes has corrupted eleveldb index, you will need to run a shell command that searches for Compaction Error in each LOG file. find/ var/lib/riak/leveldbname" LOG"execg repl' Compactione rror'{ }\ ; If there are compaction errors in any of your vnodes, those will be listed in the console. If any vnode has experienced such errors, you would see output like this: /var/lib/riak/leveldb/ 442446784738847563128068650529343492278651453440 /LOG /var/lib/riak/leveldb/ 442446784738847563128068650529343492278651453441 /LOG When you have discovered corruption in your LevelDB backend, follow the steps below to heal your corrupted LevelDB: (Note: Repairing eleveldb indexes may take several minutes.) 1. Stop the node: riaks top 2. Open an erlang shell to run the eleveldb:repair function: `riake rtspath`/erl 3. Once in the shell, run the following commands one by one, make sure the VNodeList variable reflects the output of the find command: (commands ends with a ‘.’) application:set_env(eleveldb,d ata_root," "). Options= [ ]. DataRoot= " /var/lib/riak/leveldb". VNodeList= [ "442446784738847563128068650529343492278651453440", "442446784738847563128068650529343492278651453441"]. RepairPath= f un(DataRoot,V NodeNumber)>P ath= l ists:flatten(DataRoot+ +" /"+ + VNodeNumber),i o:format("Repairing~ s.~n",[Path]),P athe nd. [eleveldb:repair(RepairPath(DataRoot,V NodeList),O ptions)| |V NodeNumber< VNodeList]. 4. Stop erlang shell: ctrl+ g q 5. Restart riak: riaks tart Page | 28 UNCLASSIFIED SEED DEMYSTIFIED The seed is basically a one stop shop for the configuration of your infrastructure. It is a giant JSON dictionary that contains and the installation parameters and the live values. It is comparable to the Windows registry. We will explain you every single parameter of the seed, their impact and effects on your system after installation and which components of our infrastructure they affect. If a setting is used at installation, changing it on a live system will have no effects but if the setting is used live, you need only restart the affected components and the setting will be in effect. SEED.AUTH seed.auth.allow_2fa Description: Turn on/off 2-Factor authentication via One-Time password (Google Authenticator, Microsoft Authenticator...) Datatype: bool Used when: Live Components affected: UWSGI seed.auth.allow_apikeys Description: If ‘true’, allow users to create apikeys to log into the system via scripts instead of using their personal creds. Datatype: bool Used when: Live Components affected: UWSGI seed.auth.allow_u2f Description: Turn on/off 2-Factor authentication via FIDO U2F certified hardware tokens Datatype: bool Used when: Live Components affected: UWSGI seed.auth.apikey_handler Description: Python class path to the method used to handle apikey login requests. Datatype: string Used when: Live Components affected: UWSGI Page | 29 UNCLASSIFIED seed.auth.dn_handler Description: Python class path to the method used to transform a DN into a username. Datatype: string Used when: Live Components affected: UWSGI seed.auth.encrypted_login Description: Turn on/off second level encryption for all passwords send to the login API. (Man in the middle protection) Datatype: bool Used when: Live Components affected: UWSGI seed.auth.userpass_handler Description: Python class path to the method used to handle user/password login requests. Datatype: string Used when: Live Components affected: UWSGI SEED.AUTH.INTERNAL seed.auth.internal.enabled Description: Enable or not internal authentication Datatype: bool Used when: Installation Components affected: uwsgi seed.auth.internal.failure_ttl Description: Amount of seconds a user will be disabled after too many failed login attempts Datatype: int Used when: Live Components affected: uwsgi seed.auth.internal.max_failures Description: Maximum number of login failures before the user account is locked down Datatype: int Used when: Live Components affected: uwsgi Page | 30 UNCLASSIFIED seed.auth.internal.strict_requirements Description: Enable or not strict password requirements (when users change their password) Datatype: bool Used when: Live Components affected: uwsgi SEED.AUTH.INTERNAL.USERS seed.auth.internal.users. .classification Description: Max classification for the given user Datatype: string Used when: Installation Components affected: Riak seed.auth.internal.users. .groups Description: List of groups the user is a member of Datatype: list[string] Used when: Installation Components affected: Riak seed.auth.internal.users. .is_admin Description: If the user is admin or not Datatype: bool Used when: Installation Components affected: Riak seed.auth.internal.users. .name Description: Full name of the user to create this user at installation Datatype: string Used when: Installation Components affected: Riak seed.auth.internal.users. .password Description: Password used for the user Datatype: string Used when: Installation Components affected: Riak Page | 31 UNCLASSIFIED seed.auth.internal.users. .uname Description: Username of the user Datatype: string Used when: Installation Components affected: Riak SEED.CORE seed.core.nodes Description: IPs or domains names for the core nodes. This server is not scalable yet so it needs to have only one entry. Datatype: list[string] Used when: Installation Components affected: All SEED.CORE.ALERTER seed.core.alerter.constant_alert_fields Description: List of fields where their content should not change during an alert update. Datatype: list[string] Used when: Live Components affected: alert_action seed.core.alerter.create_alert Description: Python class path to the function used to generate an alert Datatype: string Used when: Live Components affected: alerter seed.core.alerter.default_group_field Description: Default field used for grouping in the alerts perspective in the UI Datatype: string Used when: Live Components affected: uwsgi seed.core.alerter.filtering_group_fields Description: List of possible fields used for grouping the data that reduce the total dataset Datatype: list[string] Used when: Live Components affected: uwsgi Page | 32 UNCLASSIFIED seed.core.alerter.metadata_aliases Description: Dictionary of aliases for the different metadata fields in an alert Datatype: Dict[string] Used when: Live Components affected: uwsgi seed.core.alerter.metadata_fields Description: Dictionary that enforces field data types for the alerts. Datatype: Dict[string] Used when: Live Components affected: uwsgi seed.core.alerter.non_filtering_group_fields Description: List of possible fields used for grouping the data that does not reduce the dataset Datatype: list[string] Used when: Live Components affected: uwsgi seed.core.alerter.shards Description: Number of alerter instances that will be launched at start to be able to handle the processing load Datatype: int Used when: Live Components affected: alerter SEED.CORE.BULK seed.core.bulk.compute_notice_field Description: Generates computed fields based off of underlying metadata fields Datatype: string Used when: Live Components affected: alerter seed.core.bulk.get_whitelist_verdict Description: Python class path to the function used to check if a file should be whitelisted Datatype: string Used when: Live Components affected: middleman Page | 33 UNCLASSIFIED seed.core.bulk.is_low_priority Description: Python class path to the function that will lower priority of a ingest submission based on its metadata Datatype: string Used when: Live Components affected: middleman seed.core.bulk.whitelist Description: Python class path to the whitelist dictionary Datatype: string Used when: Live Components affected: middleman SEED.CORE.DISPATCHER seed.core.dispatcher.shards Description: Number of dispatcher processes to launch when the dispatcher service restart Datatype: int Used when: Live Components affected: middleman, dispatcher, hostagent, uwsgi SEED.CORE.DISPATCHER.MAX seed.core.dispatcher.max.depth Description: Maximum depth level for children in a submission Datatype: int Used when: Live Components affected: dispatcher seed.core.dispatcher.max.files Description: Maximum number of files a submission can have Datatype: int Used when: Live Components affected: dispatcher seed.core.dispatcher.max.inflight Description: Maximum number of concurrent inflight requests Datatype: int Used when: Live Components affected: dispatcher, middleman Page | 34 UNCLASSIFIED seed.core.dispatcher.max.retries Description: Maximum number of times a submission will be resent to a service before generating an error Datatype: int Used when: Live Components affected: dispatcher SEED.CORE.DISPATCHER.TIMEOUTS seed.core.dispatcher.timeouts.child Description: Maximum amount of time to wait for a child to be submitted before we expire it Datatype: int Used when: Live Components affected: dispatcher seed.core.dispatcher.timeouts.watch_queue Description: Maximum amount of time in seconds a watch_queue message stays in redis before expiring Datatype: int Used when: Live Components affected: dispatcher SEED.CORE.EXPIRY seed.core.expiry.delete_storage Description: Does expiry need to delete the file in the filestorage as well when it delete a file object entry Datatype: bool Used when: Live Components affected: expiry seed.core.expiry.workers Description: Number of expiry_workers processes that will be launched when the expiry_workers service start Datatype: int Used when: Live Components affected: expiry_workers SEED.CORE.EXPIRY.JOURNAL seed.core.expiry.journal.directory Description: Directory where the journal files are stored for expiry to delete the containing keys Datatype: string Used when: Live Components affected: journalist, expiry Page | 35 UNCLASSIFIED seed.core.expiry.journal.ttl Description: Number of days old the journal files have to be for expiry to start processing them Datatype: int Used when: Live Components affected: expiry SEED.CORE.MIDDLEMAN seed.core.middleman.classification Description: Default classification of the submission started by middleman Datatype: string Used when: Live Components affected: middleman seed.core.middleman.default_prefix Description: Default prefix that will be added to all ingest submissions description Datatype: string Used when: Live Components affected: middleman seed.core.middleman.dropper_threads Description: Number of dropper thread spawned by each of the middleman processes Datatype: int Used when: Live Components affected: middleman seed.core.middleman.expire_after Description: Time in seconds after which the internal caching expires Datatype: int Used when: Live Components affected: middleman seed.core.middleman.incomplete_expire_after Description: Time in seconds after which and incomplete submission expires Datatype: int Used when: Live Components affected: middleman Page | 36 UNCLASSIFIED seed.core.middleman.incomplete_stale_after Description: Time in seconds after which and incomplete submission becomes stale Datatype: int Used when: Live Components affected: middleman seed.core.middleman.ingester_threads Description: Number of ingester threads started by each of the middleman processes Datatype: int Used when: Live Components affected: middleman seed.core.middleman.max_extracted Description: Maximum number of extracted files extraction services can do per level for middleman submissions Datatype: int Used when: Live Components affected: middleman seed.core.middleman.max_supplementary Description: Maximum number of supplementary files services can submit for middleman submissions Datatype: int Used when: Live Components affected: middleman seed.core.middleman.max_value_size Description: Max size of the metadata fields that after which they are removed from the metadata section Datatype: int Used when: Live Components affected: middleman seed.core.middleman.shards Description: Number of processes of middleman that will be started by the service Datatype: int Used when: Live Components affected: middleman Page | 37 UNCLASSIFIED seed.core.middleman.stale_after Description: Number of seconds after which cache entries are considered stale Datatype: int Used when: Live Components affected: middleman seed.core.middleman.submitter_threads Description: Number of submitter threads started by each of the middleman processes Datatype: int Used when: Live Components affected: middleman seed.core.middleman.user Description: Username that middleman will use to submit files to the system Datatype: string Used when: Live Components affected: middleman SEED.CORE.MIDDLEMAN.SAMPLING_AT seed.core.middleman.sampling_at.critical Description: Number of queued critical priority samples after which middleman will start sampling Datatype: int Used when: Live Components affected: middleman seed.core.middleman.sampling_at.high Description: Number of queued high priority samples after which middleman will start sampling Datatype: int Used when: Live Components affected: middleman seed.core.middleman.sampling_at.low Description: Number of queued low priority samples after which middleman will start sampling Datatype: int Used when: Live Components affected: middleman Page | 38 UNCLASSIFIED seed.core.middleman.sampling_at.medium Description: Number of queued medium priority samples after which middleman will start sampling Datatype: int Used when: Live Components affected: middleman SEED.CORE.REDIS SEED.CORE.REDIS.NONPERSISTENT seed.core.redis.nonpersistent.db Description: Redis db number used to store non-persistent keys Datatype: int Used when: Live Components affected: All seed.core.redis.nonpersistent.host Description: IP or domain of the redis non-persistent host Datatype: string Used when: Installation Components affected: All seed.core.redis.nonpersistent.port Description: Port on which the redis non-persistent server listen on Datatype: int Used when: Installation Components affected: All SEED.CORE.REDIS.PERSISTENT seed.core.redis.persistent.db Description: Redis db number used to store persistent keys Datatype: int Used when: Live Components affected: All seed.core.redis.persistent.host Description: IP or domain of the redis persistent host Datatype: string Used when: Installation Components affected: All Page | 39 UNCLASSIFIED seed.core.redis.persistent.port Description: Port on which the redis persistent server listen on Datatype: int Used when: Installation Components affected: All SEED.DATASOURCES seed.datasources. .classpath Description: Python class path to the actual datasource object Datatype: string Used when: Live Components affected: uwsgi, hostagent seed.datasources. .config Description: Either a dictionary containing the config or a path in the seed where to get it’s config from Datatype: dictionary or string Used when: Live Components affected: uwsgi, hostagent SEED.DATASTORE seed.datastore.default_timeout Description: Default timeout for connections to the datastore (future - currently unused) Datatype: int Used when: Installation Components affected: All seed.datastore.hosts Description: List of hosts IP or domain used as datastore jump points when bootstrapping. Usually the core server. Datatype: list[string] Used when: Installation Components affected: All seed.datastore.port Description: Port used to connect to Riak PB. Only change this value in appliance mode. Datatype: int Used when: Installation Components affected: All Page | 40 UNCLASSIFIED seed.datastore.solr_port Description: Port used to connect to SOLR. Only change this value in appliance mode. Datatype: int Used when: Installation Components affected: All seed.datastore.stream_port Description: Port use to connect to Riak HTTP interface. Only change this value in appliance mode. Datatype: int Used when: Installation Components affected: All SEED.DATASTORE.RIAK seed.datastore.riak.nodes Description: List of all the Riak nodes IPs or Domain names Datatype: list[string] Used when: Installation Components affected: All seed.datastore.riak.ring_size Description: Number of partitions Riak will split the data in. (Once set this can never be changed) Datatype: int Used when: Installation Components affected: Riak SEED.DATASTORE.RIAK.NVALS seed.datastore.riak.nvals.high Description: Number of replication for high concurrency data buckets (Once set, this can’t be changed) Datatype: int Used when: Installation Components affected: Riak, SOLR seed.datastore.riak.nvals.low Description: Number of replication for low concurrency data buckets (Once set, this can’t be changed) Datatype: int Used when: Installation Components affected: Riak, SOLR Page | 41 UNCLASSIFIED seed.datastore.riak.nvals.med Description: Number of replication for medium concurrency data buckets (Once set, this can’t be changed) Datatype: int Used when: Installation Components affected: Riak, SOLR SEED.DATASTORE.RIAK.SOLR seed.datastore.riak.solr.gc Description: Extra parameters applied to SOLR Garbage collector (thread lightly... ) Datatype: string Used when: Installation Components affected: SOLR seed.datastore.riak.solr.heap_max_gb Description: Maximum amount of heap memory in GB allocated to SOLR (Don’t go over 31GB) Datatype: int Used when: Installation Components affected: SOLR seed.datastore.riak.solr.heap_min_gb Description: Minimum amount of heap memory in GB allocated to SOLR (Might as well set the same value as maximum) Datatype: int Used when: Installation Components affected: SOLR SEED.DATASTORE.RIAK.TWEAKS seed.datastore.riak.tweaks.10gnic Description: Should we apply the recommended tweaks for 10GB Network cards on the Riak Servers Datatype: bool Used when: Installation Components affected: Riak seed.datastore.riak.tweaks.disableswap Description: Should we disable swap on the different Riak nodes Datatype: bool Used when: Installation Components affected: Riak Page | 42 UNCLASSIFIED seed.datastore.riak.tweaks.fs Description: Should we apply recommended tweaks to the file system for Riak Datatype: bool Used when: Installation Components affected: Riak seed.datastore.riak.tweaks.jetty Description: Should we apply tweaks to SOLR’s jetty server on the Riak servers (May not actually do much, leave OFF) Datatype: bool Used when: Installation Components affected: Riak seed.datastore.riak.tweaks.net Description: Should we apply the recommended network tweaks on the Riak Servers Datatype: bool Used when: Installation Components affected: Riak seed.datastore.riak.tweaks.noop_scheduler Description: Should we apply the recommended disk scheduler tweak to the Riak servers Datatype: bool Used when: Installation Components affected: Riak seed.datastore.riak.tweaks.tuned_solr_configs Description: Should we tuned each individual SOLR index for high volume (You should leave that ON) Datatype: bool Used when: Installation Components affected: Riak Page | 43 UNCLASSIFIED SEED.FILESTORE seed.filestore.ftp_ip_restriction Description: CIDR block of allowed IPs to the FTP server Datatype: string Used when: Installation Components affected: ProFTPd seed.filestore.ftp_password Description: Password used to login to Core’s FTP server Datatype: string Used when: Installation Components affected: ProFTPd seed.filestore.ftp_root Description: Directory where the Core FTP user is locked into Datatype: string Used when: Installation Components affected: ProFTPd seed.filestore.ftp_user Description: Username used to login to Core’s FTP server Datatype: string Used when: Installation Components affected: ProFTPd seed.filestore.support_urls Description: List of fully qualified URLs (http://user:pass@domain/path) where support files are found (VM disks and such) Datatype: list[string] Used when: Live Components affected: All seed.filestore.urls Description: List of fully qualified URLs (http://user:pass@domain/path) where submitted files are found (VM disks and such) Datatype: list[string] Used when: Live Components affected: All Page | 44 UNCLASSIFIED SEED.INSTALLATION seed.installation.pip_index_url Description: Url to your local PIP repo if you are using one Datatype: string Used when: Installation Components affected: All SEED.INSTALLATION.DOCKER seed.installation.docker.apt_repo_info Description: Apt repo descriptor to add to your apt repos Datatype: string Used when: Installation Components affected: All seed.installation.docker.apt_repo_key_url Description: Url to fetch the apt repo key to add the docker apt repo to your repos Datatype: string Used when: Installation Components affected: All seed.installation.docker.private_registry Description: Domain:port of your private registry Datatype: string Used when: Installation Components affected: All SEED.INSTALLATION.EXTERNAL_PACKAGES seed.installation.external_packages. .args Description: Arguments passed to the transport (see PackageFetcher class in /opt/al/pkg/assemblyline/al/install/__init__.py for more details about the possible arguments of the possible transports) Datatype: dictionary Used when: Installation Components affected: All Page | 45 UNCLASSIFIED seed.installation.external_packages. .transport Description: Type of transport to be instantiated by the PackageFetcher class Datatype: string Used when: Installation Components affected: All SEED.INSTALLATION.HOOKS seed.installation.hooks.core_pre Description: List of python class path of the different hooks that will be executed during core installation Datatype: list[string] Used when: Installation Components affected: Core seed.installation.hooks.riak_pre Description: List of python class path of the different hooks that will be executed during Riak installation Datatype: list[string] Used when: Installation Components affected: Riak seed.installation.hooks.ui_pre Description: List of python class path of the different hooks that will be executed during the UI installation Datatype: list[string] Used when: Installation Components affected: uwsgi, nginx SEED.INSTALLATION.REPOSITORIES SEED.INSTALLATION.REPOSITORIES.REALMS seed.installation.repositories.realms. .branch Description: Name of the branch to checkout for this repo during install time Datatype: string Used when: Installation Components affected: All seed.installation.repositories.realms. .key Description: Private key of the user the will checkout the repo (optional) Datatype: string Used when: Installation Components affected: All Page | 46 UNCLASSIFIED seed.installation.repositories.realms. .password Description: Password of the user the will checkout the repo during install time Datatype: string Used when: Installation Components affected: All seed.installation.repositories.realms. .url Description: Fully qualified url to checkout the repo except the repo name Datatype: string Used when: Installation Components affected: All seed.installation.repositories.realms. .user Description: Username used to checkout the repo during install time Datatype: string Used when: Installation Components affected: All SEED.INSTALLATION.REPOSITORIES.REPOS seed.installation.repositories.repos. .realm Description: Name of the realm where the repo is found in. Datatype: string Used when: Installation Components affected: All SEED.INSTALLATION.SUPPLEMENTARY_PACKAGES seed.installation.supplementary_packages.apt Description: List of apt packages that are not required by the system to function but are useful to manage it Datatype: list[string] Used when: Installation Components affected: All seed.installation.supplementary_packages.pip Description: List of pip packages that are not required by the system to function but are useful to manage it Datatype: list[string] Used when: Installation Components affected: All Page | 47 UNCLASSIFIED SEED.LOGGING seed.logging.directory Description: Directory where the logs are being stored on the system (as to exists and be writeable by AL user) Datatype: string Used when: Live Components affected: All seed.logging.log_to_console Description: Should the logger log to console Datatype: bool Used when: Live Components affected: All seed.logging.log_to_file Description: Should the logger log to file Datatype: bool Used when: Live Components affected: All seed.logging.log_to_syslog Description: Should the logger log to syslog Datatype: bool Used when: Live Components affected: All SEED.LOGGING.LOGSERVER seed.logging.logserver.node Description: IP or Domain of the log server Datatype: string Used when: Live Components affected: All Page | 48 UNCLASSIFIED SEED.LOGGING.LOGSERVER.ELASTICSEARCH seed.logging.logserver.elasticsearch.heap_size Description: Size of elasticsearch java heap (Do not make bigger than 31GB) Datatype: int Used when: Installation Components affected: elasticsearch seed.logging.logserver.elasticsearch.index_ttl.al_metrics Description: Time to live in days for the al_metrics index in elasticsearch Datatype: int Used when: Installation Components affected: cron seed.logging.logserver.elasticsearch.index_ttl.audit Description: Time to live in days for the audit index in elasticsearch Datatype: int Used when: Installation Components affected: cron seed.logging.logserver.elasticsearch.index_ttl.logs Description: Time to live in days for the logs index in elasticsearch Datatype: int Used when: Installation Components affected: cron seed.logging.logserver.elasticsearch.index_ttl.riak Description: Time to live in days for the riak index in elasticsearch Datatype: int Used when: Installation Components affected: cron seed.logging.logserver.elasticsearch.index_ttl.solr Description: Time to live in days for the solr index in elasticsearch Datatype: int Used when: Installation Components affected: cron Page | 49 UNCLASSIFIED seed.logging.logserver.elasticsearch.index_ttl.system_metrics Description: Time to live in days for the system_metrics index in elasticsearch Datatype: int Used when: Installation Components affected: cron SEED.LOGGING.LOGSERVER.KIBANA seed.logging.logserver.kibana.dashboards Description: List of available dashboards in the kibana interface that will be exported to Assemblyline UI in an iFrame Datatype: list[string] Used when: Live Components affected: uwsgi seed.logging.logserver.kibana.extra_indices Description: List of extra indices to be loaded at install time Datatype: list[string] Used when: Installation Components affected: uwsgi seed.logging.logserver.kibana.extra_viz Description: List of extra visualisations and dashboards to be loaded at install time Datatype: list[string] Used when: Installation Components affected: uwsgi seed.logging.logserver.kibana.host Description: Datatype: string Used when: Live Components affected: uwsgi seed.logging.logserver.kibana.password Description: Password use to connect to the kibana interface on the log server Datatype: string Used when: Installation Components affected: Kibana Page | 50 UNCLASSIFIED seed.logging.logserver.kibana.port Description: Port to access the kibana dashboard Datatype: int Used when: Live Components affected: uwsgi seed.logging.logserver.kibana.scheme Description: Scheme to use to access the kibana dashboard (HTTPS) Datatype: string Used when: Live Components affected: uwsgi SEED.LOGGING.LOGSERVER.SSL seed.logging.logserver.ssl.crt Description: Path to the public key of the web server Datatype: string Used when: Installation Components affected: nginx seed.logging.logserver.ssl.key Description: Path to the private key of the logserver Datatype: string Used when: Installation Components affected: nginx SEED.MONITORING seed.monitoring.harddrive Description: Should the hard drive monitoring tool be installed on all servers Datatype: bool Used when: Installation Components affected: None Page | 51 UNCLASSIFIED SEED.SERVICES seed.services.categories Description: List of all the categories of services for easier selection in the UI Datatype: list[string] Used when: Live Components affected: uwsgi, dispatcher, hostagent seed.services.flex_blacklist Description: List of services that cannot be instantiated by the flex nodes Datatype: list[string] Used when: Live Components affected: Hostagent seed.services.stages Description: List of stages of execution in their actual execution order Datatype: list[string] Used when: Live Components affected: Dispatcher, hostagent seed.services.system_category Description: Category for the system services (aka services that are automatically added to all tasks) Datatype: string Used when: Live Components affected: uwsgi, dispatcher, hostagent SEED.SERVICES.LIMITS seed.services.limits.max_extracted Description: Maximum number of extracted files a service can create Datatype: int Used when: Live Components affected: Hostagent, uwsgi seed.services.limits.max_supplementary Description: Maximum number of supplementary file a service can create Datatype: int Used when: Live Components affected: Hostagent, uwsgi Page | 52 UNCLASSIFIED SEED.SERVICES.MASTER_LIST seed.services.master_list. .accepts Description: Regular expression of file types accepted by the service Datatype: string Used when: Live Components affected: dispatcher, hostagent seed.services.master_list. .class_name Description: Name of the class instantiated once installed Datatype: string Used when: Installation Components affected: All seed.services.master_list. .category Description: Category that the service falls under Datatype: string Used when: Live Components affected: dispatcher, hostagent seed.services.master_list. .classpath Description: Python class path to the main service class Datatype: string Used when: Live Components affected: dispatcher, hostagent seed.services.master_list. .config Description: Dictionary of configuration for a given service. This is free for all any service can have any kind of config. Datatype: dict Used when: Live Components affected: hostagent seed.services.master_list. .cpu_cores Description: Max cpu utilization while the service is running at full capacity - used for provisioning (1.0 = 1 core at 100%) Datatype: float Used when: Live Components affected: uwsgi Page | 53 UNCLASSIFIED seed.services.master_list. .description Description: Description of what the service do Datatype: string Used when: Live Components affected: None seed.services.master_list. .enabled Description: Is the service enabled in the system or not Datatype: bool Used when: Live Components affected: dispatcher, hostagent seed.services.master_list. .install_by_default Description: Should with call the service installer during installation Datatype: bool Used when: Installation Components affected: hostagent seed.services.master_list. .name Description: Name of the service. (It has to match the service class name and should never be changed) Datatype: string Used when: Installation Components affected: dispatcher, uwsgi, hostagent seed.services.master_list. .ram_mb Description: Maximum amount of ram in MB the service can consume while processing files - used for provisioning Datatype: int Used when: Live Components affected: uwsgi seed.services.master_list. .realm Description: Name of the realm to use to checkout the repository Datatype: string Used when: Installation Components affected: All Page | 54 UNCLASSIFIED seed.services.master_list. .rejects Description: Regular expression of file types rejected by the service Datatype: string Used when: Live Components affected: dispatcher, hostagent seed.services.master_list. .repo Description: Name of the repository on the realm Datatype: string Used when: Installation Components affected: All seed.services.master_list. .stage Description: Stage at which the service executes Datatype: string Used when: Live Components affected: dispatcher, hostagent seed.services.master_list. .submission_params Description: List of dictionaries describing the possible parameters that can be added to a submission for a given service. The dict objects in this list should look like this: { “default”: “”, # Default value for the parameter “type”: “”, # Type of parameter (int, str, bool, list) “name”: “”, # Name of the parameter “list”: “”, # (optional) If a list type, the list of possible values “value”: “” } # Value of the parameter Datatype: list[dict] Used when: Live Components affected: uwsgi seed.services.master_list. .supported_platforms Description: List of OS the service can run on (possible values: windows, linux) Datatype: list[string] Used when: Live Components affected: hostagent Page | 55 UNCLASSIFIED seed.services.master_list. .timeout Description: Time in seconds that after which a task is cancelled if the service did not have time to complete Datatype: int Used when: Live Components affected: dispatcher, hostagent SEED.SERVICES.TIMEOUTS seed.services.timeouts.default Description: Default service timeout in seconds for service that do not provide a timeout Datatype: int Used when: Live Components affected: hostagent SEED.STATISTICS seed.statistics.alert_statistics_fields Description: When generating statistics about currently viewed alerts, this is the list of fields that will be faceted Datatype: list[string] Used when: Live Components affected: uwsgi seed.statistics.submission_meta_fields Description: List of submission fields faceted when viewing detailed results about a file Datatype: list[string] Used when: Live Components affected: uwsgi SEED.SUBMISSIONS seed.submissions.decode_file Description: Python path to the built-in file un-neutering function Datatype: string Used when: Live Components affected: uwsgi, hostagent Page | 56 UNCLASSIFIED seed.submissions.password Description: Password used to connect to the submission host by the services Datatype: string Used when: Live Components affected: hostagent seed.submissions.ttl Description: Default time to live in days for submissions Datatype: int Used when: Live Components affected: middleman, uwsgi, dispatcher seed.submissions.url Description: Url containing scheme, host and port to connect to when doing resubmissions (https://localhost:443) Datatype: string Used when: Live Components affected: hostagent seed.submissions.user Description: Uname used to connect to the submission host by the services Datatype: string Used when: Live Components affected: hostagent seed.submissions.working_dir Description: Working directory for the submissions to be processed in (deprecated) Datatype: string Used when: Live Components affected: hostagent SEED.SUBMISSIONS.MAX seed.submissions.max.priority Description: Highest priority that can be set in the system Datatype: int Used when: Live Components affected: middleman, uwsgi, dispatcher, hostagent Page | 57 UNCLASSIFIED seed.submissions.max.size Description: Maximum file size of each files in the submission Datatype: int Used when: Live Components affected: uwsgi SEED.SYSTEM seed.system.constants Description: Python path to the constants file Datatype: string Used when: Live Components affected: All seed.system.country_code_map Description: Python path to a class that does geolocation for an IP Datatype: string Used when: Live Components affected: hostagent seed.system.load_config_from_riak Description: Should the seed information be loaded from Riak (Should be True in a normal install) Datatype: bool Used when: Live Components affected: All seed.system.name Description: Name of the system. If ‘production’ the UI will display the production interface else will display the dev interface Datatype: string Used when: Live Components affected: uwsgi seed.system.organisation Description: Acronym of the organisation which deployed the system Datatype: string Used when: Live Components affected: uwsgi Page | 58 UNCLASSIFIED seed.system.password Description: Password of the system user (leave null for production system) Datatype: string Used when: Installation Components affected: All seed.system.root Description: Assemblyline installation directory (you should leave it to /opt/al) Datatype: string Used when: Installation Components affected: All seed.system.update_interval Description: Time interval in seconds at which heartbeats are sent throughout the system Datatype: int Used when: Live Components affected: All seed.system.use_proxy Description: Should the system proxy connections to Riak and Redis (leave to True) Datatype: bool Used when: Installation Components affected: All seed.system.user Description: Username of the user that will execute the various components Datatype: string Used when: Installation Components affected: All SEED.SYSTEM.CLASSIFICATION seed.system.classification.definition Description: Dictionary that defines how the classification engine works in the system. This is much too complicated to explain here and the documentation on the class is very complete. Check the classification file for more details about this: https://bitbucket.org/cse-assemblyline/assemblyline/src/master/al/common/classification.py Datatype: dict Used when: Live Components affected: All Page | 59 UNCLASSIFIED seed.system.classification.engine Description: Python path to the classification engine class Datatype: string Used when: Live Components affected: All SEED.SYSTEM.INTERNAL_REPOSITORY seed.system.internal_repository. .branch Description: Branch workers will checkout when checking out a given repository Datatype: string Used when: Live Components affected: None seed.system.internal_repository. .url Description: Fully qualified url workers will used to checkout a given repository Datatype: string Used when: Live Components affected: None SEED.SYSTEM.YARA seed.system.yara.externals Description: List of external task values that will be accessible from inside the yara rules Datatype: list[string] Used when: Live Components affected: uwsgi, hostagent seed.system.yara.parser Description: Python path to the YaraParser class Datatype: string Used when: Live Components affected: uwsgi, hostagent seed.system.yara.importer Description: Python path to the YaraImporter class Datatype: string Used when: Live Components affected: uwsgi, hostagent Page | 60 UNCLASSIFIED SEED.UI seed.ui.allow_raw_downloads Description: Should the UI allow the user to download the files in raw format Datatype: bool Used when: Live Components affected: uwsgi seed.ui.allowed_checkout_range Description: CIDR of IPs that are allowed to checkout the code from the core server (restrict this as much as possible) Datatype: string Used when: Installation Components affected: nginx seed.ui.audit Description: Should the UI audit the different requests it receives Datatype: bool Used when: Live Components affected: uwsgi seed.ui.context Description: Python path to the UI context dictionary Datatype: string Used when: Live Components affected: uwsgi seed.ui.debug Description: Should the debug features be turned on into the UI Datatype: bool Used when: Live Components affected: uwsgi seed.ui.download_encoding Description: Default download encoding of every files downloaded in the system Datatype: string Used when: Live Components affected: uwsgi Page | 61 UNCLASSIFIED seed.ui.email Description: Administrative email displayed in the terms of service page Datatype: string Used when: Live Components affected: uwsgi seed.ui.enforce_quota Description: Should the UI enforme submission and API quotas Datatype: bool Used when: Live Components affected: uwsgi seed.ui.fqdn Description: Fully qualified domain name of the web server Datatype: string Used when: Installation Components affected: nginx seed.ui.install_path Description: File path where the different repos are checked out Datatype: string Used when: Live Components affected: uwsgi seed.ui.rsa_key_size Description: Size of the rsa key used to encode user’s password during the login process Datatype: int Used when: Live Components affected: uwsgi seed.ui.secret_key Description: Flask secret key to make sure the session cookies are safe (make this is long and random…) Datatype: string Used when: Live Components affected: uwsgi Page | 62 UNCLASSIFIED seed.ui.tos Description: Terms of service in markdown format Datatype: string Used when: Live Components affected: uwsgi seed.ui.tos_lockout Description: Should the UI lockout the user for administrative review after they’ve agreed to terms of service Datatype: bool Used when: Live Components affected: uwsgi SEED.UI.SSL seed.ui.ssl.enabled Description: Should SSL be enabled on the core server Datatype: bool Used when: Installation Components affected: nginx SEED.UI.SSL.CERTS seed.ui.ssl.certs.autogen Description: Should the system automatically generate certs for SSL Datatype: bool Used when: Installation Components affected: nginx seed.ui.ssl.certs.ca Description: Path, local to the install_dir, to the client certs ca Datatype: string Used when: Installation Components affected: nginx seed.ui.ssl.certs.crl Description: Path, local to the install_dir, to the client certs revocation list Datatype: string Used when: Installation Components affected: nginx Page | 63 UNCLASSIFIED seed.ui.ssl.certs.crt Description: Path, local to the install_dir, to the server's public key Datatype: string Used when: Installation Components affected: nginx seed.ui.ssl.certs.key Description: Path, local to the install_dir, to the server's private key Datatype: string Used when: Installation Components affected: nginx seed.ui.ssl.certs.tc Description: Path, local to the install_dir, to the server's cert trust chain Datatype: string Used when: Installation Components affected: nginx SEED.WORKERS seed.workers.default_profile Description: Default profile loaded by the workers when they don’t have a specific profile assigned to them Datatype: string Used when: Live Components affected: hostagent seed.workers.install_kvm Description: Should the necessary components to run VMs be installed on the workers Datatype: bool Used when: Installation Components affected: None seed.workers.nodes Description: List of IP or domains of all the workers in the system Datatype: list[string] Used when: Installation Components affected: None Page | 64 UNCLASSIFIED seed.workers.proxy_redis Description: Should connections to redis be proxied Datatype: bool Used when: Installation Components affected: haproxy SEED.WORKERS.VIRTUALMACHINES seed.workers.virtualmachines.disk_root Description: Path to the folder where the VMs disks are save on the workers Datatype: string Used when: Live Components affected: hostagent seed.workers.virtualmachines.use_parent_as_datastore Description: Should the VMs use their parent to access datastore Datatype: bool Used when: Live Components affected: hostagent seed.workers.virtualmachines.use_parent_as_queue Description: Should the VMs use their parent to access the queues Datatype: bool Used when: Live Components affected: hostagent SEED.WORKERS.VIRTUALMACHINES.MASTER_LIST seed.workers.virtualmachines.master_list. .num_workers Description: Number of worker processes that will be instantiated inside the VMs Datatype: string Used when: Live Components affected: hostagent SEED.WORKERS.VIRTUALMACHINES.MASTER_LIST.CFG seed.workers.virtualmachines.master_list. .cfg.enabled Description: Is the VM enabled Datatype: string Used when: Live Components affected: hostagent, dispatcher Page | 65 UNCLASSIFIED seed.workers.virtualmachines.master_list. .cfg.name Description: Name of the VM. (Must match the name of the service it launches internally) Datatype: string Used when: Live Components affected: hostagent, dispatcher seed.workers.virtualmachines.master_list. .cfg.os_type Description: Type of OS the VM runs (windows or linux) Datatype: string Used when: Live Components affected: hostagent seed.workers.virtualmachines.master_list. .cfg.os_variant Description: Variant of OS the VM runs (win7, win2k8, ubuntu precise, …) Datatype: string Used when: Live Components affected: hostagent seed.workers.virtualmachines.master_list. .cfg.ram Description: Amount of ram allocated to the VM Datatype: int Used when: Live Components affected: hostagent seed.workers.virtualmachines.master_list. .cfg.revert_every Description: Time in seconds after which the VM is automatically reverted Datatype: int Used when: Live Components affected: hostagent seed.workers.virtualmachines.master_list. .cfg.vcpus Description: Number of virtual CPUs allocated to the VM Datatype: int Used when: Live Components affected: hostagent Page | 66 UNCLASSIFIED seed.workers.virtualmachines.master_list. .cfg.virtual_disk_url Description: Name of the QCOW2 disk use to create the VM Datatype: string Used when: Live Components affected: hostagent Page | 67
Source Exif Data:
File Type : PDF File Type Extension : pdf MIME Type : application/pdf PDF Version : 1.5 Linearized : Yes Producer : Skia/PDF m64 Page Count : 67EXIF Metadata provided by EXIF.tools