ArangoDB v3.3.19 User Manual
Table of Contents

- 1.1 Introduction
- 1.2 Getting Started (Installing on Linux, Mac OS X and Windows, Compiling, Authentication, Accessing the Web Interface, Coming from SQL)
- 1.3 Tutorials (Kubernetes, Datacenter to datacenter Replication on Kubernetes)
- 1.4 Highlights
- 1.5 Scalability (Architecture, Data models, Limitations)
- 1.6 Data models & modeling (Concepts, Databases, Collections, Documents, Graphs, Vertices & Edges, Naming Conventions)
- 1.7 Indexing (Index Basics, Which index to use when, Index Utilization, Working with Indexes: Hash, Skiplist, Persistent, Fulltext, Geo and Vertex Centric Indexes)
- 1.8 Graphs (General Graphs, SmartGraphs, Traversals, Working with Edges, Pregel)
- 1.9 Foxx Microservices (service manifest, service context, configuration, dependencies, routers, GraphQL, sessions, serving files, testing, migrating 2.x services, user management, authentication)
- 1.10 Transactions (invocation, passing parameters, locking and isolation, durability, limitations)
- 1.11 Deployment (Single instance, Cluster, Multiple Datacenters, Kubernetes)
- 1.12 Backup & Restore
- 1.13 Administration (Web Interface, ArangoDB Shell, Arangoimp, Arangodump, Arangorestore, Arangoexport, Managing Users, Server Configuration, Durability, Encryption, Auditing, Replication, Sharding, Upgrading)
- 1.14 Troubleshooting (arangod, Emergency Console, Arangoinspect, Datafile Debugger, Arangobench)
- 1.15 Architecture (Write-ahead log, Storage Engines)
- 1.16 Release notes (What's New and Incompatible changes for versions 2.1 through 3.3, Known Issues)
- 1.17 Appendix (References, JavaScript Modules, Deprecated features, Error codes and meanings, Glossary)
Introduction

ArangoDB v3.3.19 Documentation

Welcome to the ArangoDB documentation!

New and eager to try out ArangoDB? Start right away with our beginner's guide: Getting Started.

The documentation is organized in four handbooks:

- This manual describes ArangoDB and its features in detail for you as a user, developer and administrator.
- The AQL handbook explains ArangoDB's query language AQL.
- The HTTP handbook describes the internal API of ArangoDB that is used to communicate with clients. In general, the HTTP handbook will be of interest to driver developers. If you use any of the existing drivers for the language of your choice, you can skip this handbook.
- Our cookbook with recipes for specific problems and solutions.

Features are illustrated with interactive usage examples; you can cut'n'paste them into arangosh to try them out.
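For instance, a snippet as simple as the following can be pasted into a connected arangosh session to check which server and database you are talking to (a minimal sketch, not taken from the interactive examples themselves):

```js
// arangosh exposes the currently selected database as `db`
db._version();   // server version string, e.g. "3.3.19"
db._name();      // name of the database the shell is connected to ("_system" by default)
```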
The HTTP REST-API for driver developers is demonstrated with cut'n'paste recipes intended to be used with cURL. Drivers may provide their own examples based on these .js based examples to improve understandability for their respective users, e.g. for the Java driver some of the samples are re-implemented.

Overview

ArangoDB is a native multi-model, open-source database with flexible data models for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions. Use ACID transactions if you require them. Scale horizontally and vertically with a few mouse clicks.

Key features include:

- Installing ArangoDB on a cluster is as easy as installing an app on your mobile
- Flexible data modeling: model your data as combination of key-value pairs, documents or graphs - perfect for social relations
- Powerful query language (AQL) to retrieve and modify data
- Use ArangoDB as an application server and fuse your application and database together for maximal throughput
- Transactions: run queries on multiple documents or collections with optional transactional consistency and isolation
- Replication and Sharding: set up the database in a master-slave configuration or spread bigger datasets across multiple servers
- Configurable durability: let the application decide if it needs more durability or more performance
- No-nonsense storage: ArangoDB uses all of the power of modern storage hardware, like SSD and large caches
- JavaScript for all: no language zoo, you can use one language from your browser to your back-end
- ArangoDB can be easily deployed as a fault-tolerant distributed state machine, which can serve as the animal brain of distributed appliances
- It is open source (Apache License 2.0)

Community

If you have questions regarding ArangoDB, Foxx, drivers, or this documentation, don't hesitate to contact us on:

- GitHub for issues and misbehavior or pull requests
- Google Groups for discussions about ArangoDB in general or to announce your new Foxx App
- StackOverflow for questions about AQL, usage scenarios etc.
- Slack, our community chat

When reporting issues, please describe:

- the environment you run ArangoDB in
- the ArangoDB version you use
- whether you're using Foxx
- the client you're using
- which parts of the documentation you're working with (link)
- what you expect to happen
- what is actually happening

We will respond as soon as possible.

Getting Started

Overview

This beginner's guide will make you familiar with ArangoDB. We will cover how to:

- install and run a local ArangoDB server
- use the web interface to interact with it
- store example data in the database
- query the database to retrieve the data again
- edit and remove existing data

Installation

Head to arangodb.com/download, select your operating system and download ArangoDB. You may also follow the instructions on how to install with a package manager, if available.

If you installed a binary package under Linux, the server is automatically started.

If you installed ArangoDB using homebrew under Mac OS X, start the server by running /usr/local/sbin/arangod.

If you installed ArangoDB under Windows as a service, the server is automatically started. Otherwise, run the arangod.exe located in the installation folder's bin directory. You may have to run it as administrator to grant it write permissions to C:\Program Files.
For more in-depth information on how to install ArangoDB, as well as available startup parameters, installation in a cluster and so on, see Installing.

ArangoDB offers two storage engines: MMFiles and RocksDB. Choose the one which suits your needs best in the installation process or on first startup.

Securing the installation

The default installation contains one database _system and a user named root.

Debian based packages and the Windows installer will ask for a password during the installation process. Red-Hat based packages will set a random password. For all other installation packages you need to execute

shell> arango-secure-installation

This will ask for a root password and set it.

Web interface

The server itself (arangod) speaks HTTP / REST, but you can use the graphical web interface to keep it simple. There's also arangosh, a synchronous shell for interaction with the server. If you're a developer, you might prefer the shell over the GUI. It does not provide features like syntax highlighting however.

When you start using ArangoDB in your project, you will likely use an official or community-made driver written in the same language as your project. Drivers implement a programming interface that should feel natural for that programming language, and do all the talking to the server. Therefore, you can most certainly ignore the HTTP API unless you want to write a driver yourself or explicitly want to use the raw interface.

To get familiar with the database system you can even put drivers aside and use the web interface (code name Aardvark) for basic interaction. The web interface will become available shortly after you started arangod. You can access it in your browser at http://localhost:8529 - if not, please see Troubleshooting.

By default, authentication is enabled. The default user is root. Depending on the installation method used, the installation process either prompted for the root password or the default root password is empty (see above).

Next you will be asked which database to use. Every server instance comes with a _system database. Select this database to continue. You should then be presented the dashboard with server statistics. For a more detailed description of the interface, see Web Interface.

Databases, collections and documents

Databases are sets of collections. Collections store records, which are referred to as documents. Collections are the equivalent of tables in RDBMS, and documents can be thought of as rows in a table. The difference is that you don't define what columns (or rather attributes) there will be in advance. Every document in any collection can have arbitrary attribute keys and values. Documents in a single collection will likely have a similar structure in practice however, but the database system itself does not impose it and will operate stably and fast no matter what your data looks like. Read more in the data-model concepts chapter.
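As a side note, collections and documents can also be created programmatically from arangosh; a minimal sketch (the collection name and attribute values are just examples):

```js
// create a document collection and store a first document in it
db._create("users");                              // roughly what the "Add Collection" dialog does
db.users.save({ name: "John Smith", age: 32 });   // returns the generated _key, _id and _rev
```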
For now, you can stick with the default _system database and use the web interface to create collections and documents. Start by clicking the COLLECTIONS menu entry, then the Add Collection tile. Give it a name, e.g. users, leave the other settings unchanged (we want it to be a document collection) and Save it. A new tile labeled users should show up, which you can click to open.

There will be No documents yet. Click the green circle with the white plus on the right-hand side to create a first document in this collection. A dialog will ask you for a _key. You can leave the field blank and click Create to let the database system assign an automatically generated (unique) key. Note that the _key property is immutable, which means you can not change it once the document is created. What you can use as document key is described in the naming conventions.

An automatically generated key could be "9883" (_key is always a string!), and the document _id would be "users/9883" in that case. Aside from a few system attributes, there is nothing in this document yet. Let's add a custom attribute by clicking the icon to the left of (empty object), then Append. Two input fields will become available, FIELD (attribute key) and VALUE (attribute value). Type name as key and your name as value. Append another attribute, name it age and set it to your age. Click Save to persist the changes.

If you click on Collection: users at the top on the right-hand side of the ArangoDB logo, the document browser will show the documents in the users collection and you will see the document you just created in the list.

Querying the database

Time to retrieve our document using AQL, ArangoDB's query language. We can directly look up the document we created via the _id, but there are also other options. Click the QUERIES menu entry to bring up the query editor and type the following (adjust the document ID to match your document):

RETURN DOCUMENT("users/9883")

Then click Execute to run the query. The result appears below the query editor:

[
  {
    "_key": "9883",
    "_id": "users/9883",
    "_rev": "9883",
    "age": 32,
    "name": "John Smith"
  }
]

As you can see, the entire document including the system attributes is returned. DOCUMENT() is a function to retrieve a single document or a list of documents of which you know the _keys or _ids. We return the result of the function call as our query result, which is our document inside of the result array (we could have returned more than one result with a different query, but even for a single document as result, we still get an array at the top level).

This type of query is called a data access query. No data is created, changed or deleted. There is another type of query called data modification query. Let's insert a second document using a modification query:

INSERT { name: "Katie Foster", age: 27 } INTO users

The query is pretty self-explanatory: the INSERT keyword tells ArangoDB that we want to insert something. What to insert, a document with two attributes in this case, follows next. The curly braces { } signify documents, or objects. When talking about records in a collection, we call them documents. Encoded as JSON, we call them objects. Objects can also be nested. Here's an example:

{
  "name": {
    "first": "Katie",
    "last": "Foster"
  }
}

INTO is a mandatory part of every INSERT operation and is followed by the collection name that we want to store the document in. Note that there are no quote marks around the collection name.

If you run the above query, there will be an empty array as result, because we did not specify what to return using a RETURN keyword. The keyword is optional in modification queries, but mandatory in data access queries. Even with RETURN, the return value can still be an empty array, e.g. if the specified document was not found. Despite the empty result, the above query still created a new user document. You can verify this with the document browser.
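If you prefer the shell, the same queries can be run from arangosh via db._query(); a small sketch (the document key is the generated one from this example and will differ on your system):

```js
// data access query: fetch a document by its _id
db._query('RETURN DOCUMENT("users/9883")').toArray();

// data modification query: insert a document and return it
db._query('INSERT { name: "Katie Foster", age: 27 } INTO users RETURN NEW').toArray();
```

toArray() materializes the query cursor into a plain JavaScript array, which is convenient for small result sets like these.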
Let's add another user, but return the newly created document this time:

INSERT { name: "James Hendrix", age: 69 } INTO users RETURN NEW

NEW is a pseudo-variable, which refers to the document created by INSERT. The result of the query will look like this:

[
  {
    "_key": "10074",
    "_id": "users/10074",
    "_rev": "10074",
    "age": 69,
    "name": "James Hendrix"
  }
]

Now that we have 3 users in our collection, how to retrieve them all with a single query? The following does not work:

RETURN DOCUMENT("users/9883")
RETURN DOCUMENT("users/9915")
RETURN DOCUMENT("users/10074")

There can only be a single RETURN statement here and a syntax error is raised if you try to execute it. The DOCUMENT() function offers a secondary signature to specify multiple document handles, so we could do:

RETURN DOCUMENT( ["users/9883", "users/9915", "users/10074"] )

An array with the _ids of all 3 documents is passed to the function. Arrays are denoted by square brackets [ ] and their elements are separated by commas.

But what if we add more users? We would have to change the query to retrieve the newly added users as well. All we want to say with our query is: "For every user in the collection users, return the user document". We can formulate this with a FOR loop:

FOR user IN users
  RETURN user

It expresses to iterate over every document in users and to use user as variable name, which we can use to refer to the current user document. It could also be called doc, u or ahuacatlguacamole, this is up to you. It is advisable to use a short and self-descriptive name however. The loop body tells the system to return the value of the variable user, which is a single user document. All user documents are returned this way:

[
  { "_key": "9915", "_id": "users/9915", "_rev": "9915", "age": 27, "name": "Katie Foster" },
  { "_key": "9883", "_id": "users/9883", "_rev": "9883", "age": 32, "name": "John Smith" },
  { "_key": "10074", "_id": "users/10074", "_rev": "10074", "age": 69, "name": "James Hendrix" }
]

You may have noticed that the order of the returned documents is not necessarily the same as they were inserted. There is no order guaranteed unless you explicitly sort them. We can add a SORT operation very easily:

FOR user IN users
  SORT user._key
  RETURN user

This does still not return the desired result: James (10074) is returned before John (9883) and Katie (9915). The reason is that the _key attribute is a string in ArangoDB, and not a number. The individual characters of the strings are compared. 1 is lower than 9 and the result is therefore "correct". If we wanted to use the numerical value of the _key attributes instead, we could convert the string to a number and use it for sorting. There are some implications however. We are better off sorting something else. How about the age, in descending order?

FOR user IN users
  SORT user.age DESC
  RETURN user

The users will be returned in the following order: James (69), John (32), Katie (27). Instead of DESC for descending order, ASC can be used for ascending order. ASC is the default though and can be omitted.

We might want to limit the result set to a subset of users, based on the age attribute for example. Let's return users older than 30 only:

FOR user IN users
  FILTER user.age > 30
  SORT user.age
  RETURN user

This will return John and James (in this order). Katie's age attribute does not fulfill the criterion (greater than 30), she is only 27 and therefore not part of the result set.
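In arangosh, such a filter can also be written with bind parameters, so the values stay out of the query string; a hedged sketch (the parameter name minAge is just an example):

```js
// bind parameters are passed as a separate object; @minAge is a placeholder in the query
db._query(
  "FOR user IN users FILTER user.age > @minAge SORT user.age RETURN user.name",
  { minAge: 30 }
).toArray();   // e.g. [ "John Smith", "James Hendrix" ]
```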
We can change her age to make her part of the result set again, using a modification query:

UPDATE "9915" WITH { age: 40 } IN users RETURN NEW

UPDATE allows to partially edit an existing document. There is also REPLACE, which would remove all attributes (except for _key and _id, which remain the same) and only add the specified ones. UPDATE on the other hand only replaces the specified attributes and keeps everything else as-is.

The UPDATE keyword is followed by the document key (or a document / object with a _key attribute) to identify what to modify. The attributes to update are written as an object after the WITH keyword. IN denotes in which collection to perform this operation, just like INTO (both keywords are actually interchangeable here).

The full document with the changes applied is returned if we use the NEW pseudo-variable:

[
  {
    "_key": "9915",
    "_id": "users/9915",
    "_rev": "12864",
    "age": 40,
    "name": "Katie Foster"
  }
]

If we used REPLACE instead, the name attribute would be gone. With UPDATE, the attribute is kept (the same would apply to additional attributes if we had them).

Let us run our FILTER query again, but only return the user names this time:

FOR user IN users
  FILTER user.age > 30
  SORT user.age
  RETURN user.name

This will return the names of all 3 users:

[
  "John Smith",
  "Katie Foster",
  "James Hendrix"
]

It is called a projection if only a subset of attributes is returned. Another kind of projection is to change the structure of the results:

FOR user IN users
  RETURN { userName: user.name, age: user.age }

The query defines the output format for every user document. The user name is returned as userName instead of name, the age keeps the attribute key in this example:

[
  { "userName": "James Hendrix", "age": 69 },
  { "userName": "John Smith", "age": 32 },
  { "userName": "Katie Foster", "age": 40 }
]

It is also possible to compute new values:

FOR user IN users
  RETURN CONCAT(user.name, "'s age is ", user.age)

CONCAT() is a function that can join elements together to a string. We use it here to return a statement for every user. As you can see, the result set does not always have to be an array of objects:

[
  "James Hendrix's age is 69",
  "John Smith's age is 32",
  "Katie Foster's age is 40"
]
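Because arangosh is a JavaScript shell, query results can be post-processed like any other array; a small sketch based on the CONCAT query above:

```js
// run the projection query and work with the result as a normal JavaScript array
var lines = db._query(
  "FOR user IN users RETURN CONCAT(user.name, \"'s age is \", user.age)"
).toArray();
lines.forEach(function (line) { print(line); });   // print() writes to the arangosh console
```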
Now let's do something crazy: for every document in the users collection, iterate over all user documents again and return user pairs, e.g. John and Katie. We can use a loop inside a loop for this to get the cross product (every possible combination of all user records, 3 * 3 = 9). We don't want pairings like John + John however, so let's eliminate them with a filter condition:

FOR user1 IN users
  FOR user2 IN users
    FILTER user1 != user2
    RETURN [user1.name, user2.name]

We get 6 pairings. Pairs like James + John and John + James are basically redundant, but fair enough:

[
  [ "James Hendrix", "John Smith" ],
  [ "James Hendrix", "Katie Foster" ],
  [ "John Smith", "James Hendrix" ],
  [ "John Smith", "Katie Foster" ],
  [ "Katie Foster", "James Hendrix" ],
  [ "Katie Foster", "John Smith" ]
]

We could calculate the sum of both ages and compute something new this way:

FOR user1 IN users
  FOR user2 IN users
    FILTER user1 != user2
    RETURN { pair: [user1.name, user2.name], sumOfAges: user1.age + user2.age }

We introduce a new attribute sumOfAges and add up both ages for the value:

[
  { "pair": [ "James Hendrix", "John Smith" ], "sumOfAges": 101 },
  { "pair": [ "James Hendrix", "Katie Foster" ], "sumOfAges": 109 },
  { "pair": [ "John Smith", "James Hendrix" ], "sumOfAges": 101 },
  { "pair": [ "John Smith", "Katie Foster" ], "sumOfAges": 72 },
  { "pair": [ "Katie Foster", "James Hendrix" ], "sumOfAges": 109 },
  { "pair": [ "Katie Foster", "John Smith" ], "sumOfAges": 72 }
]

If we wanted to post-filter on the new attribute to only return pairs with a sum less than 100, we should define a variable to temporarily store the sum, so that we can use it in a FILTER statement as well as in the RETURN statement:

FOR user1 IN users
  FOR user2 IN users
    FILTER user1 != user2
    LET sumOfAges = user1.age + user2.age
    FILTER sumOfAges < 100
    RETURN { pair: [user1.name, user2.name], sumOfAges: sumOfAges }

The LET keyword is followed by the designated variable name (sumOfAges), then there's a = symbol and the value or an expression to define what value the variable is supposed to have. We re-use our expression to calculate the sum here. We then have another FILTER to skip the unwanted pairings and make use of the variable we declared before. We return a projection with an array of the user names and the calculated age, for which we use the variable again:

[
  { "pair": [ "John Smith", "Katie Foster" ], "sumOfAges": 72 },
  { "pair": [ "Katie Foster", "John Smith" ], "sumOfAges": 72 }
]

Pro tip: when defining objects, if the desired attribute key and the variable to use for the attribute value are the same, you can use a shorthand notation: { sumOfAges } instead of { sumOfAges: sumOfAges }.

Finally, let's delete one of the user documents:

REMOVE "9883" IN users

It deletes the user John (_key: "9883"). We could also remove documents in a loop (same goes for INSERT, UPDATE and REPLACE):

FOR user IN users
  FILTER user.age >= 30
  REMOVE user IN users

The query deletes all users whose age is greater than or equal to 30.

How to continue

There is a lot more to discover in AQL and much more functionality that ArangoDB offers. Continue reading the other chapters and experiment with a test database to foster your knowledge. If you want to write more AQL queries right now, have a look here:

- Data Queries: data access and modification queries
- High-level operations: detailed descriptions of FOR, FILTER and more operations not shown in this introduction
- Functions: a reference of all provided functions
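For such experiments it can be handy to work in a scratch database instead of _system; a hedged sketch in arangosh (the database name test is just an example):

```js
// create a scratch database for experiments and switch the shell to it
db._createDatabase("test");     // must be issued while connected to the _system database
db._useDatabase("test");
// ... create collections and run queries here ...
db._useDatabase("_system");     // switch back before dropping the scratch database
db._dropDatabase("test");
```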
ArangoDB programs

The ArangoDB package comes with the following programs:

- arangod: The ArangoDB database daemon. This server program is intended to run as a daemon process and to serve the various client connections to the server via TCP / HTTP.
- arangosh: The ArangoDB shell. A client that implements a read-eval-print loop (REPL) and provides functions to access and administrate the ArangoDB server.
- arangoimp: A bulk importer for the ArangoDB server. It supports JSON and CSV.
- arangodump: A tool to create backups of an ArangoDB database in JSON format.
- arangorestore: A tool to load data of a backup back into an ArangoDB database.
- arango-dfdb: A datafile debugger for ArangoDB. It is primarily intended to be used during development of ArangoDB.
- arangobench: A benchmark and test tool. It can be used for performance and server function testing.

Installing

First of all, download and install the corresponding RPM or Debian package or use homebrew on Mac OS X. You can find packages for various operating systems at our install section, including installers for Windows. How to do that in detail is described in the subchapters of this section. On how to set up a cluster, check out the Deployment chapter.

Linux

Visit the official ArangoDB download page and download the correct package for your Linux distribution. You can find binary packages for the most common distributions there. Follow the instructions to use your favorite package manager for the major distributions. After setting up the ArangoDB repository you can easily install ArangoDB using yum, aptitude, urpmi or zypper.

Debian based packages will ask for a password during installation. For an unattended installation for Debian, see below. Red-Hat based packages will set a random password during installation. For other distributions or to change the password, run arango-secure-installation to set a root password.

Alternatively, see Compiling if you want to build ArangoDB yourself.

Start up the database server. Normally, this is done by executing the following command:

unix> /etc/init.d/arangod start

It will start the server, and do that as well at system boot time. To stop the server you can use the following command:

unix> /etc/init.d/arangod stop

The exact commands depend on your Linux distribution. You may require root privileges to execute these commands.

Linux Mint

Please use the corresponding Ubuntu or Debian packages.

Unattended Installation

Debian based packages will ask for a password during installation. For unattended installation, you can set the password using the debconf helpers.

echo arangodb3 arangodb3/password password NEWPASSWORD | debconf-set-selections
echo arangodb3 arangodb3/password_again password NEWPASSWORD | debconf-set-selections

The commands should be executed prior to the installation.

Red-Hat based packages will set a random password during installation. If you want to force a password, execute

ARANGODB_DEFAULT_ROOT_PASSWORD=NEWPASSWORD arango-secure-installation

The command should be executed after the installation.

Non-Standard Installation

If you compiled ArangoDB from source and did not use any installation package – or are using non-default locations and/or multiple ArangoDB instances on the same host – you may want to start the server process manually. You can do so by invoking the arangod binary from the command line as shown below:

unix> /usr/local/sbin/arangod /tmp/vocbase
20ZZ-XX-YYT12:37:08Z [8145] INFO using built-in JavaScript startup files
20ZZ-XX-YYT12:37:08Z [8145] INFO ArangoDB (version 1.x.y) is ready for business
20ZZ-XX-YYT12:37:08Z [8145] INFO Have Fun!

To stop the database server gracefully, you can either press CTRL-C or send the SIGINT signal to the server process. On many systems this can be achieved with the following command:

unix> kill -2 `pidof arangod`

Once you started the server, there should be a running instance of arangod - the ArangoDB database server.
unix> ps auxw | fgrep arangod
arangodb 14536 0.1 0.6 5307264 23464 s002 S 1:21pm 0:00.18 /usr/local/sbin/arangod

If there is no such process, check the log file /var/log/arangodb/arangod.log for errors. If you see a log message like

2012-12-03T11:35:29Z [12882] ERROR Database directory version (1) is lower than server version (1.2).
2012-12-03T11:35:29Z [12882] ERROR It seems like you have upgraded the ArangoDB binary. If this is what you wanted to do, please restart with the --database.auto-upgrade option to upgrade the data in the database directory.
2012-12-03T11:35:29Z [12882] FATAL Database version check failed. Please start the server with the --database.auto-upgrade option

make sure to start the server once with the --database.auto-upgrade option. Note that you may have to enable logging first. If you start the server in a shell, you should see errors logged there as well.

Mac OS X

The preferred method for installing ArangoDB under Mac OS X is homebrew. However, in case you are not using homebrew, we provide a command-line app or graphical app which contains all the executables.

Homebrew

If you are using homebrew, then you can install the latest released stable version of ArangoDB using brew as follows:

brew install arangodb

This will install the current stable version of ArangoDB and all dependencies within your Homebrew tree. Note that the server will be installed as:

/usr/local/sbin/arangod

You can start the server by running the command /usr/local/sbin/arangod &. The configuration file is located at /usr/local/etc/arangodb3/arangod.conf.

The ArangoDB shell will be installed as:

/usr/local/bin/arangosh

You can uninstall ArangoDB using:

brew uninstall arangodb

However, in case you started ArangoDB using launchctl, you need to unload it before uninstalling the server:

launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.arangodb.plist

Then remove the LaunchAgent:

rm ~/Library/LaunchAgents/homebrew.mxcl.arangodb.plist

Note: If the latest ArangoDB version is not shown in homebrew, you also need to update homebrew:

brew update

Known issues

- Performance: the LLVM delivered as of Mac OS X El Capitan builds slow binaries. Use GCC instead, until this issue has been fixed by Apple.
- The command line argument parsing doesn't accept blanks in filenames; the CLI version below does.
- If you need to change the server endpoint when starting the homebrew version, you can edit the arangod.conf file and uncomment the line with the endpoint needed, e.g.:

[server]
endpoint = tcp://0.0.0.0:8529

Graphical App

In case you are not using homebrew, we also provide a graphical app. You can download it from here. Choose Mac OS X. Download and install the application ArangoDB in your application folder.

Command line App

In case you are not using homebrew, we also provide a command-line app. You can download it from here. Choose Mac OS X. Download and install the application ArangoDB-CLI in your application folder.

Starting the application will start the server and open a terminal window showing you the log-file:

ArangoDB server has been started
The database directory is located at '/Applications/ArangoDB-CLI.app/Contents/MacOS/opt/arangodb/var/lib/arangodb'
The log file is located at '/Applications/ArangoDB-CLI.app/Contents/MacOS/opt/arangodb/var/log/arangodb/arangod.log'
You can access the server using a browser at 'http://127.0.0.1:8529/'
or start the ArangoDB shell '/Applications/ArangoDB-CLI.app/Contents/MacOS/arangosh'
Switching to log-file now, killing this windows will NOT stop the server.
2013-10-27T19:42:04Z [23840] INFO ArangoDB (version 1.4.devel [darwin]) is ready for business. Have fun!

Note that it is possible to install both, the homebrew version and the command-line app. You should, however, edit the configuration files of one version and change the port used.

Windows

The default installation directory is C:\Program Files\ArangoDB-3.x.x. During the installation process you may change this. In the following description we will assume that ArangoDB has been installed in the default location. You have to be careful when choosing an installation directory. You need either write permission to this directory or you need to modify the configuration file for the server process. In the latter case the database directory and the Foxx directory have to be writable by the user.

Single- and Multiuser Installation

There are two main modes for the installer of ArangoDB. The installer lets you select:

- multi user installation (default; admin privileges required): installs ArangoDB as a service.
- single user installation: allows installing ArangoDB as a normal user; requires manual starting of the database server.

Checkboxes

The checkboxes allow you to choose whether you want to:

- choose custom install paths
- do an automatic upgrade
- keep a backup of your data
- add executables to path
- create a desktop icon

Custom Install Paths

This checkbox controls if you will be able to override the default paths for the installation in subsequent steps. The default installation paths are:

Multi User Default:
- Installation: C:\Program Files\ArangoDB-3.x.x
- Database: C:\ProgramData\ArangoDB
- Foxx Service: C:\ProgramData\ArangoDB-apps

Single User Default:
- Installation: C:\Users\<username>\AppData\Local\ArangoDB-3.x.x
- Database: C:\Users\<username>\AppData\Local\ArangoDB
- Foxx Service: C:\Users\<username>\AppData\Local\ArangoDB-apps

We are not using the roaming part of the user's profile, because doing so avoids the data being synced to the windows domain controller.

Automatic Upgrade

If this checkbox is selected the installer will attempt to perform an automatic update. For more information please see Upgrading from Previous Version.

Keep Backup

Select this to create a backup of your database directory during automatic upgrade. The backup will be created next to your current database directory suffixed by a time stamp.

Add to Path

Select this to add the binary directory to your system's path (multi user installation) or user's path (single user installation).

Desktop Icon

Select if you want the installer to create desktop icons that let you:

- access the web interface
- start the command line client (arangosh)
- start the database server (single user installation only)

Upgrading from Previous Version

If you are upgrading ArangoDB from an earlier version you need to copy your old database directory to the new default paths. Upgrading will keep your old data, password and choice of storage engine as it is. Switching to the RocksDB storage engine requires an export and reimport of your data.

Starting

If you installed ArangoDB for multiple users (as a service) it is automatically started. Otherwise you need to use the link that was created on your Desktop (if you chose to let the installer create desktop icons) or the executable arangod.exe located in the bin directory of the installation folder. It will use the configuration file arangod.conf located in the etc\arangodb directory of the installation folder, which you can adjust to your needs, and the data directory var\lib\arangodb inside the installation folder. This is the place where all your data (databases and collections) will be stored by default.
Please check the output of the arangod.exe executable before going on. If the server started successfully, you should see a line "ArangoDB is ready for business. Have fun!" at the end of its output.

We now wish to check that the installation is working correctly and to do this we will be using the administration web interface. Execute arangod.exe if you have not already done so, then open up your web browser and point it to the page:

http://127.0.0.1:8529/

Advanced Starting

If you want to provide your own start scripts, you can set the environment variable ARANGODB_CONFIG_PATH. This variable should point to a directory containing the configuration files.

Using the Client

To connect to an already running ArangoDB server instance, there is a shell arangosh.exe located in the bin directory of the installation folder. This starts a shell which can be used – amongst other things – to administer and query a local or remote ArangoDB server.

Note that arangosh.exe does NOT start a separate server, it only starts the shell. To use it you must have a server running somewhere, e.g. by using the arangod.exe executable.

arangosh.exe uses configuration from the file arangosh.conf located in the etc\arangodb directory of the installation folder. Please adjust this to your needs if you want to use different connection settings etc.

Uninstalling

To uninstall the Arango server application you can use the windows control panel (as you would normally uninstall an application). Note however, that any data files created by the Arango server will remain as well as the directory. To complete the uninstallation process, remove the data files and the directory manually.

Limitations for Cygwin

Please note some important limitations when running ArangoDB under Cygwin:

ArangoDB can be started from out of a Cygwin terminal, but pressing CTRL-C will forcefully kill the server process without giving it a chance to handle the kill signal. In this case, a regular server shutdown is not possible, which may leave a file LOCK around in the server's data directory. This file needs to be removed manually to make ArangoDB start again. Additionally, as ArangoDB does not have a chance to handle the kill signal, the server cannot forcefully flush any data to disk on shutdown, leading to potential data loss. When starting ArangoDB from a Cygwin terminal it might also happen that no errors are printed in the terminal output. Starting ArangoDB from an MS-DOS command prompt does not impose these limitations and is thus the preferred method.

Please note that ArangoDB uses UTF-8 as its internal encoding and that the system console must support a UTF-8 codepage (65001) and font. It may be necessary to manually switch the console font to a font that supports UTF-8.

Compiling ArangoDB from scratch

The following sections describe how to compile and build ArangoDB from scratch. ArangoDB will compile on most Linux and Mac OS X systems. We assume that you use the GNU C/C++ compiler or clang/clang++ to compile the source. ArangoDB has been tested with these compilers, but should be able to compile with any POSIX-compliant, C++11-enabled compiler. Please let us know whether you successfully compiled it with another C/C++ compiler.

By default, cloning the GitHub repository will checkout devel. This version contains the development version of ArangoDB. Use this branch if you want to make changes to the ArangoDB source.

On Windows you first need to allow and enable symlinks for your user.

Please check out the cookbook on how to compile ArangoDB.
Authentication

ArangoDB allows you to restrict access to databases to certain users. All users of the system database are considered administrators. During installation a default user root is created, which has access to all databases.

You should create a database for your application together with a user that has access rights to this database. See Managing Users.

Use arangosh to create a new database and user:

arangosh> db._createDatabase("example");
arangosh> var users = require("@arangodb/users");
arangosh> users.save("root@example", "password");
arangosh> users.grantDatabase("root@example", "example");

You can now connect to the new database using the user root@example:

shell> arangosh --server.username "root@example" --server.database example
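The @arangodb/users module can also be used to adjust permissions later on; a small sketch building on the user created above (the read-only grant shown here is just an illustration):

```js
var users = require("@arangodb/users");
// change the grant for the example database to read-only access
users.grantDatabase("root@example", "example", "ro");
// list all users known to the server
users.all();
```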
Accessing the Web Interface

ArangoDB comes with a built-in web interface for administration. The web interface can be accessed via the URL:

http://127.0.0.1:8529

If everything works as expected, you should see the login view. For more information on the ArangoDB web interface, see Web Interface.

Coming from SQL

If you worked with a relational database management system (RDBMS) such as MySQL, MariaDB or PostgreSQL, you will be familiar with its query language, a dialect of SQL (Structured Query Language).

ArangoDB's query language is called AQL. There are some similarities between both languages despite the different data models of the database systems. The most notable difference is probably the concept of loops in AQL, which makes it feel more like a programming language. It suits the schema-less model more naturally and makes the query language very powerful while remaining easy to read and write.

To get started with AQL, have a look at our detailed comparison of SQL and AQL. It will also help you to translate SQL queries to AQL when migrating to ArangoDB.

How do browse vectors translate into document queries?

In traditional SQL you may either fetch all columns of a table row by row, using SELECT * FROM table, or select a subset of the columns. The list of table columns to fetch is commonly called column list or browse vector:

SELECT columnA, columnB, columnZ FROM table

Since documents aren't two-dimensional, and neither do you want to be limited to returning two-dimensional lists, the requirements for a query language are higher. AQL is thus a little bit more complex than plain SQL at first, but offers much more flexibility in the long run. It lets you handle arbitrarily structured documents in convenient ways, mostly leaned on the syntax used in JavaScript.

Composing the documents to be returned

The AQL RETURN statement returns one item per document it is handed. You can return the whole document, or just parts of it. Given that oneDocument is a document (retrieved like LET oneDocument = DOCUMENT("myusers/3456789") for instance), it can be returned as-is like this:

RETURN oneDocument

[
  {
    "_id": "myusers/3456789",
    "_key": "3456789",
    "_rev": "14253647",
    "firstName": "John",
    "lastName": "Doe",
    "address": { "city": "Gotham", "street": "Road To Nowhere 1" },
    "hobbies": [
      { name: "swimming", howFavorite: 10 },
      { name: "biking", howFavorite: 6 },
      { name: "programming", howFavorite: 4 }
    ]
  }
]

Return the hobbies sub-structure only:

RETURN oneDocument.hobbies

[
  [
    { name: "swimming", howFavorite: 10 },
    { name: "biking", howFavorite: 6 },
    { name: "programming", howFavorite: 4 }
  ]
]

Return the hobbies and the address:

RETURN { hobbies: oneDocument.hobbies, address: oneDocument.address }

[
  {
    hobbies: [
      { name: "swimming", howFavorite: 10 },
      { name: "biking", howFavorite: 6 },
      { name: "programming", howFavorite: 4 }
    ],
    address: { "city": "Gotham", "street": "Road To Nowhere 1" }
  }
]

Return the first hobby only:

RETURN oneDocument.hobbies[0].name

[ "swimming" ]

Return a list of all hobby strings:

RETURN { hobbies: oneDocument.hobbies[*].name }

[ { hobbies: ["swimming", "biking", "programming"] } ]

More complex array and object manipulations can be done using AQL functions and operators.

Tutorials

- Kubernetes: Start ArangoDB on Kubernetes in 5 minutes
- Kubernetes: DC2DC on Kubernetes

Start ArangoDB on Kubernetes in 5 minutes

Starting an ArangoDB database (either single server or full blown cluster) on Kubernetes involves a lot of resources. The servers need to run in Pods, you need Secrets for authentication, TLS certificates and Services to enable communication with the database. Use kube-arangodb, the ArangoDB Kubernetes Operator, to greatly simplify this process.

In this guide, we will explain what the ArangoDB Kubernetes Operator is, how to install it and how to use it to deploy your first ArangoDB database in a Kubernetes cluster.

What is kube-arangodb

kube-arangodb is a set of two operators that you deploy in your Kubernetes cluster to (1) manage deployments of the ArangoDB database and (2) provide PersistentVolumes on local storage of your nodes for optimal storage performance.

Note that the operator that provides PersistentVolumes is not needed to run ArangoDB deployments. You can also use PersistentVolumes provided by other controllers. In this guide we will focus on the ArangoDeployment operator.

Installing kube-arangodb

To install kube-arangodb in your Kubernetes cluster, make sure you have access to this cluster and the rights to deploy resources at cluster level. For now, any recent Kubernetes cluster will do (e.g. minikube).

Then run (replace <version> with the version of the operator that you want to install):

kubectl apply -f https://raw.githubusercontent.com/arangodb/kube-arangodb/<version>/manifests/crd.yaml
kubectl apply -f https://raw.githubusercontent.com/arangodb/kube-arangodb/<version>/manifests/arango-deployment.yaml
# Optional
kubectl apply -f https://raw.githubusercontent.com/arangodb/kube-arangodb/<version>/manifests/arango-storage.yaml

The first command installs two CustomResourceDefinitions in your Kubernetes cluster:

- ArangoDeployment is the resource used to deploy ArangoDB databases.
- ArangoLocalStorage is the resource used to provision PersistentVolumes on local storage.

The second command installs a Deployment that runs the operator that controls ArangoDeployment resources.

The optional third command installs a Deployment that runs the operator that provides PersistentVolumes on local disks of the cluster nodes.
Use this when running on bare-metal or if there is no provisioner for fast storage in your Kubernetes cluster.

Deploying your first ArangoDB database

The first database we are going to deploy is a single server database. Create a file called single-server.yaml with the following content:

apiVersion: "database.arangodb.com/v1alpha"
kind: "ArangoDeployment"
metadata:
  name: "single-server"
spec:
  mode: Single

Now insert this resource in your Kubernetes cluster using:

kubectl apply -f single-server.yaml

The ArangoDeployment operator in kube-arangodb will now inspect the resource you just deployed and start the process to run a single server database.

To inspect the current status of your deployment, run:

kubectl describe ArangoDeployment single-server
# or shorter
kubectl describe arango single-server

To inspect the pods created for this deployment, run:

kubectl get pods --selector=arango_deployment=single-server

The result will look similar to this:

NAME                                 READY   STATUS    RESTARTS   AGE
single-server-sngl-cjtdxrgl-fe06f0   1/1     Running   0          1m

Once the pod reports that it has a Running status and is ready, your database is available.

Connecting to your database

The single server database you deployed in the previous chapter is now available from within the Kubernetes cluster as well as outside it. Access to the database from outside the Kubernetes cluster is provided using an external-access service. By default this service is of type LoadBalancer. If this type of service is not supported by your Kubernetes cluster, it will be replaced by a service of type NodePort after a minute.

To see the type of service that has been created, run:

kubectl get service single-server-ea

When the service is of the LoadBalancer type, use the IP address listed in the EXTERNAL-IP column with port 8529. When the service is of the NodePort type, use the IP address of any of the nodes of the cluster, combined with the high (>30000) port listed in the PORT(S) column.

Now you can connect your browser to https://<ip>:<port>/. Your browser will show a warning about an unknown certificate. Accept the certificate for now. Then login using username root and an empty password.
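From an application, one way to talk to the deployed server is the arangojs JavaScript driver; the following is only a rough sketch that assumes the external-access address from above and the default credentials, and the exact configuration options may differ between driver versions:

```js
// npm install arangojs -- driver version and configuration options may differ in your setup
const { Database } = require("arangojs");

const db = new Database({
  url: "https://<ip>:<port>",                     // the external-access address from above
  agentOptions: { rejectUnauthorized: false }     // only for the self-signed demo certificate
});
db.useBasicAuth("root", "");                      // default root user with empty password
db.version().then(info => console.log(info.version));
```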
If you want to delete your single server ArangoDB database, just run:

kubectl delete ArangoDeployment single-server

Deploying a full blown ArangoDB cluster database

The deployment of a full blown cluster is very similar to deploying a single server database. The difference is in the mode field of the ArangoDeployment specification. Create a file called cluster.yaml with the following content:

apiVersion: "database.arangodb.com/v1alpha"
kind: "ArangoDeployment"
metadata:
  name: "cluster"
spec:
  mode: Cluster

Now insert this resource in your Kubernetes cluster using:

kubectl apply -f cluster.yaml

The same commands used in the single server deployment can be used to inspect your cluster. Just use the correct deployment name (cluster instead of single-server).

Where to go from here

- ArangoDB Kubernetes Operator

Start ArangoDB Cluster to Cluster Synchronization on Kubernetes

This tutorial guides you through the steps needed to configure an ArangoDB datacenter to datacenter replication between two ArangoDB clusters running in Kubernetes. This feature is only available in the Enterprise Edition.

Requirements

1. This tutorial assumes that you have 2 ArangoDB clusters running in 2 different Kubernetes clusters.
2. Both Kubernetes clusters are equipped with support for Services of type LoadBalancer.
3. You can create (global) DNS names for configured Services with low propagation times. E.g. use Cloudflare.
4. You have 4 DNS names available:
   - One for the database in the source ArangoDB cluster, e.g. src-db.mycompany.com
   - One for the ArangoDB syncmasters in the source ArangoDB cluster, e.g. src-sync.mycompany.com
   - One for the database in the destination ArangoDB cluster, e.g. dst-db.mycompany.com
   - One for the ArangoDB syncmasters in the destination ArangoDB cluster, e.g. dst-sync.mycompany.com

Step 1: Enable Datacenter Replication Support on source ArangoDB cluster

Set your current Kubernetes context to the Kubernetes source cluster. Edit the ArangoDeployment of the source ArangoDB cluster. Set:

- spec.tls.altNames to ["src-db.mycompany.com"] (can include more names / IP addresses)
- spec.sync.enabled to true
- spec.sync.externalAccess.masterEndpoint to ["https://src-sync.mycompany.com:8629"]
- spec.sync.externalAccess.accessPackageSecretNames to ["src-accesspackage"]

Step 2: Extract access-package from source ArangoDB cluster

Run:

kubectl get secret src-accesspackage --template='{{index .data "accessPackage.yaml"}}' | \
  base64 -D > accessPackage.yaml

Step 3: Configure source DNS names

Run:

kubectl get service

Find the IP address contained in the LoadBalancer column for the following Services:

- For the Service with the -ea suffix, use this IP address for the src-db.mycompany.com DNS name.
- For the Service with the -sync suffix, use this IP address for the src-sync.mycompany.com DNS name.

The process for configuring DNS names is specific to each DNS provider.

Step 4: Enable Datacenter Replication Support on destination ArangoDB cluster

Set your current Kubernetes context to the Kubernetes destination cluster. Edit the ArangoDeployment of the destination ArangoDB cluster. Set:

- spec.tls.altNames to ["dst-db.mycompany.com"] (can include more names / IP addresses)
- spec.sync.enabled to true
- spec.sync.externalAccess.masterEndpoint to ["https://dst-sync.mycompany.com:8629"]

Step 5: Import access package in destination cluster

Run:

kubectl apply -f accessPackage.yaml

Note: This imports two Secrets, containing TLS information about the source cluster, into the destination cluster.

Step 6: Configure destination DNS names

Run:

kubectl get service

Find the IP address contained in the LoadBalancer column for the following Services:

- For the Service with the -ea suffix, use this IP address for the dst-db.mycompany.com DNS name.
- For the Service with the -sync suffix, use this IP address for the dst-sync.mycompany.com DNS name.

The process for configuring DNS names is specific to each DNS provider.

Step 7: Create an ArangoDeploymentReplication resource

Create a yaml file (e.g. called src-to-dst-repl.yaml) with the following content:

apiVersion: "replication.database.arangodb.com/v1alpha"
kind: "ArangoDeploymentReplication"
metadata:
  name: "replication-src-to-dst"
spec:
  source:
    masterEndpoint: ["https://src-sync.mycompany.com:8629"]
    auth:
      keyfileSecretName: src-accesspackage-auth
    tls:
      caSecretName: src-accesspackage-ca
  destination:
    deploymentName: <name of the ArangoDeployment in the destination cluster>

Step 8: Wait for DNS names to propagate

Wait until the DNS names configured in step 3 and 6 resolve to their configured IP addresses. Depending on your DNS provider this can take a few minutes up to 24 hours.

Step 9: Activate replication

Run:

kubectl apply -f src-to-dst-repl.yaml

Replication from the source cluster to the destination cluster will now be configured.
Check the status of the replication by inspecting the status of the ArangoDeploymentReplication resource using:

kubectl describe ArangoDeploymentReplication replication-src-to-dst

As soon as the replication is configured, the Add collection button in the Collections page of the web UI (of the destination cluster) will be grayed out.

Highlights

Version 3.3

Enterprise Edition:

- Datacenter to Datacenter Replication: Replicate the entire structure and content of an ArangoDB cluster asynchronously to another cluster in a different datacenter with ArangoSync. Multi-datacenter support means you can fall back to a replica of your cluster in case of a disaster in one datacenter.
- Encrypted Backups: Arangodump can create backups encrypted with a secret key using the AES256 block cipher.

All Editions:

- Server-level Replication: In addition to per-database replication, there is now an additional globalApplier. Start the global replication on the slave once and all current and future databases will be replicated from the master to the slave automatically.
- Asynchronous Failover: Make a single server instance resilient with a second server instance, one as master and the other as asynchronously replicating slave, with automatic failover to the slave if the master goes down.

Also see What's New in 3.3.

Version 3.2

- RocksDB Storage Engine: You can now use as much data in ArangoDB as you can fit on your disk. Plus, you can enjoy performance boosts on writes by having only document-level locks.
- Pregel: We implemented distributed graph processing with Pregel to discover hidden patterns, identify communities and perform in-depth analytics of large graph data sets.
- Fault-Tolerant Foxx: The Foxx management internals have been rewritten from the ground up to make sure multi-coordinator cluster setups always keep their services in sync and new coordinators are fully initialized even when all existing coordinators are unavailable.
- Enterprise: Working with some of our largest customers, we've added further security and scalability features to ArangoDB Enterprise like LDAP integration, Encryption at Rest, and the brand new Satellite Collections.

Also see What's New in 3.2.

Version 3.1

- SmartGraphs: Scale with graphs to a cluster and stay performant. With SmartGraphs you can use the "smartness" of your application layer to shard your graph efficiently to your machines and let traversals run locally.
- Encryption Control: Choose your level of SSL encryption.
- Auditing: Keep a detailed log of all the important things that happened in ArangoDB.

Also see What's New in 3.1.

Version 3.0

- Self-organizing cluster with synchronous replication, master/master setup, shared nothing architecture, cluster management agency.
- Deeply integrated, native AQL graph traversal.
- VelocyPack as new internal binary storage format as well as for intermediate AQL values.
- Persistent indexes via RocksDB suitable for sorting and range queries.
- Foxx 3.0: overhauled JS framework for data-centric microservices.
- Significantly improved Web Interface.

Also see What's New in 3.0.

Scalability

ArangoDB is a distributed database supporting multiple data models, and can thus be scaled horizontally, that is, by using many servers, typically based on commodity hardware. This approach not only delivers performance as well as capacity improvements, but also achieves resilience by means of replication and automatic fail-over.
Furthermore, one can build systems that scale their capacity dynamically up and down automatically according to demand. One can also scale ArangoDB vertically, that is, by using ever larger servers. There is no built in limitation in ArangoDB, for example, the server will automatically use more threads if more CPUs are present. However, scaling vertically has the disadvantage that the costs grow faster than linear with the size of the server, and none of the resilience and dynamical capabilities can be achieved in this way. In this chapter we explain the distributed architecture of ArangoDB and discuss its scalability features and limitations: ArangoDB's distributed architecture Different data models and scalability Limitations 40 Architecture Architecture The cluster architecture of ArangoDB is a CP master/master model with no single point of failure. With "CP" we mean that in the presence of a network partition, the database prefers internal consistency over availability. With "master/master" we mean that clients can send their requests to an arbitrary node, and experience the same view on the database regardless. "No single point of failure" means that the cluster can continue to serve requests, even if one machine fails completely. In this way, ArangoDB has been designed as a distributed multi-model database. This section gives a short outline on the cluster architecture and how the above features and capabilities are achieved. Structure of an ArangoDB cluster An ArangoDB cluster consists of a number of ArangoDB instances which talk to each other over the network. They play different roles, which will be explained in detail below. The current configuration of the cluster is held in the "Agency", which is a highly-available resilient key/value store based on an odd number of ArangoDB instances running Raft Consensus Protocol. For the various instances in an ArangoDB cluster there are 4 distinct roles: Agents, Coordinators, Primary and Secondary DBservers. In the following sections we will shed light on each of them. Note that the tasks for all roles run the same binary from the same Docker image. Agents One or multiple Agents form the Agency in an ArangoDB cluster. The Agency is the central place to store the configuration in a cluster. It performs leader elections and provides other synchronization services for the whole cluster. Without the Agency none of the other components can operate. While generally invisible to the outside it is the heart of the cluster. As such, fault tolerance is of course a must have for the Agency. To achieve that the Agents are using the Raft Consensus Algorithm. The algorithm formally guarantees conflict free configuration management within the ArangoDB cluster. At its core the Agency manages a big configuration tree. It supports transactional read and write operations on this tree, and other servers can subscribe to HTTP callbacks for all changes to the tree. Coordinators Coordinators should be accessible from the outside. These are the ones the clients talk to. They will coordinate cluster tasks like executing queries and running Foxx services. They know where the data is stored and will optimize where to run user supplied queries or parts thereof. Coordinators are stateless and can thus easily be shut down and restarted as needed. Primary DBservers Primary DBservers are the ones where the data is actually hosted. They host shards of data and using synchronous replication a primary may either be leader or follower for a shard. 
They should not be accessed from the outside but indirectly through the coordinators. They may also execute queries in part or as a whole when asked by a coordinator. Secondaries Secondary DBservers are asynchronous replicas of primaries. If one is using only synchronous replication, one does not need secondaries at all. For each primary, there can be one or more secondaries. Since the replication works asynchronously (eventual consistency), the replication does not impede the performance of the primaries. On the other hand, their replica of the data can be slightly out of date. The secondaries are perfectly suitable for backups as they don't interfere with the normal cluster operation. Cluster ID Every non-Agency ArangoDB instance in a cluster is assigned a unique ID during its startup. Using its ID a node is identifiable throughout the cluster. All cluster operations will communicate via this ID. 41 Architecture Sharding Using the roles outlined above an ArangoDB cluster is able to distribute data in so called shards across multiple primaries. From the outside this process is fully transparent and as such we achieve the goals of what other systems call "master-master replication". In an ArangoDB cluster you talk to any coordinator and whenever you read or write data it will automatically figure out where the data is stored (read) or to be stored (write). The information about the shards is shared across the coordinators using the Agency. Also see Sharding in the Administration chapter. Many sensible configurations This architecture is very flexible and thus allows many configurations, which are suitable for different usage scenarios: 1. The default configuration is to run exactly one coordinator and one primary DBserver on each machine. This achieves the classical master/master setup, since there is a perfect symmetry between the different nodes, clients can equally well talk to any one of the coordinators and all expose the same view to the data store. 2. One can deploy more coordinators than DBservers. This is a sensible approach if one needs a lot of CPU power for the Foxx services, because they run on the coordinators. 3. One can deploy more DBservers than coordinators if more data capacity is needed and the query performance is the lesser bottleneck 4. One can deploy a coordinator on each machine where an application server (e.g. a node.js server) runs, and the Agents and DBservers on a separate set of machines elsewhere. This avoids a network hop between the application server and the database and thus decreases latency. Essentially, this moves some of the database distribution logic to the machine where the client runs. These shall suffice for now. The important piece of information here is that the coordinator layer can be scaled and deployed independently from the DBserver layer. Replication ArangoDB offers two ways of data replication within a cluster, synchronous and asynchronous. In this section we explain some details and highlight the advantages and disadvantages respectively. Synchronous replication with automatic fail-over Synchronous replication works on a per-shard basis. One configures for each collection, how many copies of each shard are kept in the cluster. At any given time, one of the copies is declared to be the "leader" and all other replicas are "followers". 
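To make this concrete, the number of copies is set per collection at creation time through the replicationFactor attribute, which is documented later in this manual. The following arangosh sketch, run against a coordinator, uses a made-up collection name and illustrative values:

arangosh> db._create("orders", { numberOfShards: 3, replicationFactor: 2 });
arangosh> db.orders.properties().replicationFactor;
2

Here each of the 3 shards is kept in 2 copies on different DBservers, one leader and one follower.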
Write operations for this shard are always sent to the DBserver which happens to hold the leader copy, which in turn replicates the changes to all followers before the operation is considered to be done and reported back to the coordinator. Read operations are all served by the server holding the leader copy, this allows to provide snapshot semantics for complex transactions. If a DBserver fails that holds a follower copy of a shard, then the leader can no longer synchronize its changes to that follower. After a short timeout (3 seconds), the leader gives up on the follower, declares it to be out of sync, and continues service without the follower. When the server with the follower copy comes back, it automatically resynchronizes its data with the leader and synchronous replication is restored. If a DBserver fails that holds a leader copy of a shard, then the leader can no longer serve any requests. It will no longer send a heartbeat to the Agency. Therefore, a supervision process running in the Raft leader of the Agency, can take the necessary action (after 15 seconds of missing heartbeats), namely to promote one of the servers that hold in-sync replicas of the shard to leader for that shard. This involves a reconfiguration in the Agency and leads to the fact that coordinators now contact a different DBserver for requests to this shard. Service resumes. The other surviving replicas automatically resynchronize their data with the new leader. When the DBserver with the original leader copy comes back, it notices that it now holds a follower replica, resynchronizes its data with the new leader and order is restored. All shard data synchronizations are done in an incremental way, such that resynchronizations are quick. This technology allows to move shards (follower and leader ones) between DBservers without service interruptions. Therefore, an ArangoDB cluster can move all the data on a specific DBserver to other DBservers and then shut down that server in a controlled way. This allows to scale down an ArangoDB cluster without service interruption, loss of fault tolerance or data loss. Furthermore, one can re-balance the distribution of the shards, either manually or automatically. All these operations can be triggered via a REST/JSON API or via the graphical web UI. All fail-over operations are completely handled within the ArangoDB cluster. 42 Architecture Obviously, synchronous replication involves a certain increased latency for write operations, simply because there is one more network hop within the cluster for every request. Therefore the user can set the replication factor to 1, which means that only one copy of each shard is kept, thereby switching off synchronous replication. This is a suitable setting for less important or easily recoverable data for which low latency write operations matter. Asynchronous replication with automatic fail-over Asynchronous replication works differently, in that it is organized using primary and secondary DBservers. Each secondary server replicates all the data held on a primary by polling in an asynchronous way. This process has very little impact on the performance of the primary. The disadvantage is that there is a delay between the confirmation of a write operation that is sent to the client and the actual replication of the data. If the master server fails during this delay, then committed and confirmed data can be lost. Nevertheless, we also offer automatic fail-over with this setup. 
Contrary to the synchronous case, here the fail-over management is done from outside the ArangoDB cluster. In a future version we might move this management into the supervision process in the Agency, but as of now, the management is done via the Mesos framework scheduler for ArangoDB (see below).

The granularity of the replication is a whole ArangoDB instance with all data that resides on that instance, which means that you need twice as many instances as without asynchronous replication. Synchronous replication is more flexible in that respect: you can have smaller and larger instances, and if one fails, the data can be rebalanced across the remaining ones.

Microservices and zero administration

The design and capabilities of ArangoDB are geared towards usage in modern microservice architectures of applications. With the Foxx services it is very easy to deploy a data-centric microservice within an ArangoDB cluster.

In addition, one can deploy multiple instances of ArangoDB within the same project. One part of the project might need a scalable document store, another might need a graph database, and yet another might need the full power of a multi-model database actually mixing the various data models. There are enormous efficiency benefits to be reaped by being able to use a single technology for various roles in a project.

To simplify the life of devops teams in such a scenario we try as much as possible to use a zero-administration approach for ArangoDB. A running ArangoDB cluster is resilient against failures and essentially repairs itself in case of temporary failures. See the next section for further capabilities in this direction.

Apache Mesos integration

For the distributed setup, we use the Apache Mesos infrastructure by default. ArangoDB is a fully certified package for DC/OS and can thus be deployed essentially with a few mouse clicks or a single command, once you have an existing DC/OS cluster. But even on a plain Apache Mesos cluster one can deploy ArangoDB via Marathon with a single API call and some JSON configuration.

The advantage of this approach is that we can not only implement the initial deployment, but also the later management of automatic replacement of failed instances and the scaling of the ArangoDB cluster (triggered manually or even automatically). Since all manipulations are either via the graphical web UI or via JSON/REST calls, one can even implement auto-scaling very easily.

A DC/OS cluster is a very natural environment to deploy microservice architectures, since it is so convenient to deploy various services, including potentially multiple ArangoDB cluster instances within the same DC/OS cluster. The built-in service discovery makes it extremely simple to connect the various microservices and Mesos automatically takes care of the distribution and deployment of the various tasks.

See the Deployment chapter and its subsections for instructions.

It is possible to deploy an ArangoDB cluster by simply launching a bunch of Docker containers with the right command line options to link them up, or even on a single machine starting multiple ArangoDB processes. In that case, synchronous replication will work within the deployed ArangoDB cluster, and automatic fail-over in the sense that the duties of a failed server will automatically be assigned to another, surviving one. However, since the ArangoDB cluster cannot within itself launch additional instances, replacement of failed nodes is not automatic and scaling up and down has to be managed manually.
This is why we do not recommend this setup for production deployment.

Different data models and scalability

In this section we discuss scalability in the context of the different data models supported by ArangoDB.

Key/value pairs

The key/value store data model is the easiest to scale. In ArangoDB, this is implemented in the sense that a document collection always has a primary key _key attribute and in the absence of further secondary indexes the document collection behaves like a simple key/value store. The only operations that are possible in this context are single key lookups and key/value pair insertions and updates. If _key is the only sharding attribute then the sharding is done with respect to the primary key and all these operations scale linearly. If the sharding is done using different shard keys, then a lookup of a single key involves asking all shards and thus does not scale linearly.

Document store

For the document store case, even in the presence of secondary indexes, essentially the same arguments apply, since an index for a sharded collection is simply the same as a local index for each shard. Therefore, single document operations still scale linearly with the size of the cluster, unless a special sharding configuration makes lookups or write operations more expensive. For a deeper analysis of this topic see this blog post in which good linear scalability of ArangoDB for single document operations is demonstrated.

Complex queries and joins

The AQL query language allows complex queries, using multiple collections, secondary indexes as well as joins. In particular with the latter, scaling can be a challenge, since if the data to be joined resides on different machines, a lot of communication has to happen. The AQL query execution engine organizes a data pipeline across the cluster to put together the results in the most efficient way. The query optimizer is aware of the cluster structure and knows what data is where and how it is indexed. Therefore, it can arrive at an informed decision about what parts of the query ought to run where in the cluster. Nevertheless, for certain complicated joins, there are limits as to what can be achieved.

Graph database

Graph databases are particularly good at queries on graphs that involve paths in the graph of an a priori unknown length. Examples are finding the shortest path between two vertices in a graph, or finding all paths that match a certain pattern starting at a given vertex. However, if the vertices and edges along the occurring paths are distributed across the cluster, then a lot of communication is necessary between nodes, and performance suffers. To achieve good performance at scale, it is therefore necessary to get the distribution of the graph data across the shards in the cluster right. Most of the time, the application developers and users of ArangoDB know best how their graphs are structured. Therefore, ArangoDB allows users to specify according to which attributes the graph data is sharded. A useful first step is usually to make sure that the edges originating at a vertex reside on the same cluster node as the vertex.

Limitations

ArangoDB has no built-in limitations to horizontal scalability. The central resilient Agency will easily sustain hundreds of DBservers and coordinators, and the usual database operations work completely decentrally and do not require assistance of the Agency.
Likewise, the supervision process in the Agency can easily deal with lots of servers, since all its activities are not performance critical. Obviously, an ArangoDB cluster is limited by the available resources of CPU, memory, disk and network bandwidth and latency. 45 Data models & modeling Data models & modeling This chapter introduces ArangoDB's core concepts and covers its data model (or data models respectively), the terminology used throughout the database system and in this documentation, as well as aspects to consider when modeling your data to strike a balance between natural data structures and great performance You will also find usage examples on how to interact with the database system using arangosh, e.g. how to create and drop databases / collections, or how to save, update, replace and remove documents. You can do all this using the web interface as well and may therefore skip these sections as beginner. 46 Concepts Concepts Database Interaction ArangoDB is a database that serves documents to clients. These documents are transported using JSON via a TCP connection, using the HTTP protocol. A REST API is provided to interact with the database system. The web interface that comes with ArangoDB, called Aardvark, provides graphical user interface that is easy to use. An interactive shell, called Arangosh, is also shipped. In addition, there are so called drivers that make it easy to use the database system in various environments and programming languages. All these tools use the HTTP interface of the server and remove the necessity to roll own lowlevel code for basic communication in most cases. Data model The documents you can store in ArangoDB closely follow the JSON format, although they are stored in a binary format called VelocyPack. A document contains zero or more attributes, each of these attributes having a value. A value can either be an atomic type, i. e. number, string, boolean or null, or a compound type, i.e. an array or embedded document / object. Arrays and sub-objects can contain all of these types, which means that arbitrarily nested data structures can be represented in a single document. Documents are grouped into collections. A collection contains zero or more documents. If you are familiar with relational database management systems (RDBM S) then it is safe to compare collections to tables and documents to rows. The difference is that in a traditional RDBM S, you have to define columns before you can store records in a table. Such definitions are also known as schemas. ArangoDB is schema-less, which means that there is no need to define what attributes a document can have. Every single document can have a completely different structure and still be stored together with other documents in a single collection. In practice, there will be common denominators among the documents in a collection, but the database system itself doesn't force you to limit yourself to a certain data structure. There are two types of collections: document collection (also refered to as vertex collections in the context of graphs) as well as edge collections. Edge collections store documents as well, but they include two special attributes, _from and _to, which are used to create relations between documents. Usually, two documents (vertices) stored in document collections are linked by a document (edge) stored in an edge collection. This is ArangoDB's graph data model. 
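As a small illustration of this model (the collection names persons and knows and the keys used here are made up for this example, not taken from the manual's own examples), two vertices and a connecting edge can be created from arangosh like this:

arangosh> db._create("persons");
arangosh> db._createEdgeCollection("knows");
arangosh> db.persons.save({ _key: "alice" });
arangosh> db.persons.save({ _key: "bob" });
arangosh> db.knows.save({ _from: "persons/alice", _to: "persons/bob" });

The edge document in knows relates the two vertex documents via their document handles in _from and _to.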
It follows the mathematical concept of a directed, labeled graph, except that edges don't just have labels, but are full-blown documents. Collections exist inside of databases. There can be one or many databases. Different databases are usually used for multi tenant setups, as the data inside them (collections, documents etc.) is isolated from one another. The default database _system is special, because it cannot be removed. Database users are managed in this database, and their credentials are valid for all databases of a server instance. Data Retrieval Queries are used to filter documents based on certain criteria, to compute new data, as well as to manipulate or delete existing documents. Queries can be as simple as a "query by example" or as complex as "joins" using many collections or traversing graph structures. They are written in the ArangoDB Query Language (AQL). Cursors are used to iterate over the result of queries, so that you get easily processable batches instead of one big hunk. Indexes are used to speed up searches. There are various types of indexes, such as hash indexes and geo indexes. 47 Databases Handling Databases This is an introduction to managing databases in ArangoDB from within JavaScript. When you have an established connection to ArangoDB, the current database can be changed explicitly using the db._useDatabase() method. This will switch to the specified database (provided it exists and the user can connect to it). From this point on, any following action in the same shell or connection will use the specified database, unless otherwise specified. Note: If the database is changed, client drivers need to store the current database name on their side, too. This is because connections in ArangoDB do not contain any state information. All state information is contained in the HTTP request/response data. To connect to a specific database after arangosh has started use the command described above. It is also possible to specify a database name when invoking arangosh. For this purpose, use the command-line parameter --server.database, e.g. > arangosh --server.database test Please note that commands, actions, scripts or AQL queries should never access multiple databases, even if they exist. The only intended and supported way in ArangoDB is to use one database at a time for a command, an action, a script or a query. Operations started in one database must not switch the database later and continue operating in another. 48 Working with Databases Working with Databases Database Methods The following methods are available to manage databases via JavaScript. Please note that several of these methods can be used from the _system database only. Name return the database name db._name() Returns the name of the current database as a string. Examples arangosh> require("@arangodb").db._name(); _system ID return the database id db._id() Returns the id of the current database as a string. Examples arangosh> require("@arangodb").db._id(); 1 Path return the path to database files db._path() Returns the filesystem path of the current database as a string. Examples arangosh> require("@arangodb").db._path(); /tmp/arangosh_Fqdn5s/tmp-8043-1455050571/data/databases/database-1 isSystem return the database type db._isSystem() Returns whether the currently used database is the _system database. The system database has some special privileges and properties, for example, database management operations such as create or drop can only be executed from within this database. 
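For a quick check from arangosh (the result shown assumes the shell is currently connected to the _system database):

arangosh> require("@arangodb").db._isSystem();
true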
Additionally, the _system database itself cannot be dropped. Use Database change the current database db._useDatabase(name) Changes the current database to the database specified by name. Note that the database specified by name must already exist. Changing the database might be disallowed in some contexts, for example server-side actions (including Foxx). 49 Working with Databases When performing this command from arangosh, the current credentials (username and password) will be re-used. These credentials might not be valid to connect to the database specified by name. Additionally, the database only be accessed from certain endpoints only. In this case, switching the database might not work, and the connection / session should be closed and restarted with different username and password credentials and/or endpoint data. List Databases return the list of all existing databases db._databases() Returns the list of all databases. This method can only be used from within the _system database. Create Database create a new database db._createDatabase(name, options, users) Creates a new database with the name specified by name. There are restrictions for database names (see DatabaseNames). Note that even if the database is created successfully, there will be no change into the current database to the new database. Changing the current database must explicitly be requested by using the db._useDatabase method. The options attribute currently has no meaning and is reserved for future use. The optional users attribute can be used to create initial users for the new database. If specified, it must be a list of user objects. Each user object can contain the following attributes: username: the user name as a string. This attribute is mandatory. passwd: the user password as a string. If not specified, then it defaults to an empty string. active: a boolean flag indicating whether the user account should be active or not. The default value is true. extra: an optional JSON object with extra user information. The data contained in extra will be stored for the user but not be interpreted further by ArangoDB. If no initial users are specified, a default user root will be created with an empty string password. This ensures that the new database will be accessible via HTTP after it is created. You can create users in a database if no initial user is specified. Switch into the new database (username and password must be identical to the current session) and add or modify users with the following commands. require("@arangodb/users").save(username, password, true); require("@arangodb/users").update(username, password, true); require("@arangodb/users").remove(username); Alternatively, you can specify user data directly. For example: db._createDatabase("newDB", {}, [{ username: "newUser", passwd: "123456", active: true}]) Those methods can only be used from within the _system database. Drop Database drop an existing database db._dropDatabase(name) Drops the database specified by name. The database specified by name must exist. Note: Dropping databases is only possible from within the _system database. The _system database itself cannot be dropped. Databases are dropped asynchronously, and will be physically removed if all clients have disconnected and references have been garbagecollected. Engine statistics retrieve statistics related to the storage engine-rocksdb db._engineStats() Returns some statistics related to storage engine activity, including figures about data size, cache usage, etc. 
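For instance, from arangosh (the output is omitted here, as the reported figures vary by storage engine and server version):

arangosh> db._engineStats();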
Note: Currently this only produces useful output for the RocksDB engine.

Notes about Databases

Please keep in mind that each database contains its own system collections, which need to be set up when a database is created. This will make the creation of a database take a while. Replication is configured on a per-database level, meaning that any replication logging or applying for a new database must be configured explicitly after a new database has been created. Foxx applications are also available only in the context of the database they have been installed in. A new database will only provide access to the system applications shipped with ArangoDB (that is the web interface at the moment) and no other Foxx applications until they are explicitly installed for the particular database.

JavaScript Interface to Collections

This is an introduction to ArangoDB's interface for collections and how to handle collections from the JavaScript shell arangosh. For other languages see the corresponding language API. The most important call is the call to create a new collection.

Address of a Collection

All collections in ArangoDB have a unique identifier and a unique name. ArangoDB internally uses the collection's unique identifier to look up collections. This identifier, however, is managed by ArangoDB and the user has no control over it. In order to allow users to use their own names, each collection also has a unique name which is specified by the user. To access a collection from the user perspective, the collection name should be used, i.e.:

Collection
db._collection(collection-name)

A collection is created by a "db._create" call. For example: Assume that the collection identifier is 7254820 and the name is demo, then the collection can be accessed as:

db._collection("demo")

If no collection with such a name exists, then null is returned. There is a short-cut that can be used for non-system collections:

Collection name
db.collection-name

This call will either return the collection named db.collection-name or create a new one with that name and a set of default properties. Note: Creating a collection on the fly using db.collection-name is not recommended and does not work in arangosh. To create a new collection, please use

Create
db._create(collection-name)

This call will create a new collection called collection-name. This method is a database method and is documented in detail at Database Methods.

Synchronous replication

Starting in ArangoDB 3.0, the distributed version offers synchronous replication, which means that there is the option to replicate all data automatically within the ArangoDB cluster. This is configured for sharded collections on a per-collection basis by specifying a "replication factor" when the collection is created. A replication factor of k means that altogether k copies of each shard are kept in the cluster on k different servers, and are kept in sync. That is, every write operation is automatically replicated on all copies. This is organised using a leader/follower model. At all times, one of the servers holding replicas for a shard is "the leader" and all others are "followers"; this configuration is held in the Agency (see Scalability for details of the ArangoDB cluster architecture). Every write operation is sent to the leader by one of the coordinators, and then replicated to all followers before the operation is reported to have succeeded. The leader keeps a record of which followers are currently in sync.
In case of network problems or a failure of a follower, a leader can and will drop a follower temporarily after 3 seconds, such that service can resume. In due course, the follower will automatically resynchronize with the leader to restore resilience. 53 Collections If a leader fails, the cluster Agency automatically initiates a failover routine after around 15 seconds, promoting one of the followers to leader. The other followers (and the former leader, when it comes back), automatically resynchronize with the new leader to restore resilience. Usually, this whole failover procedure can be handled transparently for the coordinator, such that the user code does not even see an error message. Obviously, this fault tolerance comes at a cost of increased latency. Each write operation needs an additional network roundtrip for the synchronous replication of the followers, but all replication operations to all followers happen concurrently. This is, why the default replication factor is 1, which means no replication. For details on how to switch on synchronous replication for a collection, see the database method db._create(collection-name) in the section about Database M ethods. 54 Collection M ethods Collection Methods Drop drops a collection collection.drop(options) Drops a collection and all its indexes and data. In order to drop a system collection, an options object with attribute isSystem set to true must be specified. Note: dropping a collection in a cluster, which is prototype for sharing in other collections is prohibited. In order to be able to drop such a collection, all dependent collections must be dropped first. Examples arangosh> col = db.example; [ArangoCollection 15512, "example" (type document, status loaded)] arangosh> col.drop(); arangosh> col; [ArangoCollection 15512, "example" (type document, status deleted)] arangosh> col = db._example; [ArangoCollection 15516, "_example" (type document, status loaded)] arangosh> col.drop({ isSystem: true }); arangosh> col; [ArangoCollection 15516, "_example" (type document, status deleted)] Truncate truncates a collection collection.truncate() Truncates a collection, removing all documents but keeping all its indexes. Examples Truncates a collection: arangosh> col = db.example; arangosh> col.save({ "Hello" : "World" }); arangosh> col.count(); arangosh> col.truncate(); arangosh> col.count(); show execution results Properties gets or sets the properties of a collection collection.properties() Returns an object containing all collection properties. waitForSync: If true creating a document will only return after the data was synced to disk. journalSize : The size of the journal in bytes. This option is meaningful for the M M Files storage engine only. isVolatile: If true then the collection data will be kept in memory only and ArangoDB will not write or sync the data to disk. This option is meaningful for the M M Files storage engine only. 55 Collection M ethods keyOptions (optional) additional options for key generation. This is a JSON array containing the following attributes (note: some of the attributes are optional): type: the type of the key generator used for the collection. allowUserKeys: if set to true, then it is allowed to supply own key values in the _key attribute of a document. If set to false, then the key generator will solely be responsible for generating keys and supplying own key values in the _key attribute of documents is considered an error. increment: increment value for autoincrement key generator. 
Not used for other key generator types. offset: initial offset value for autoincrement key generator. Not used for other key generator types. indexBuckets: number of buckets into which indexes using a hash table are split. The default is 16 and this number has to be a power of 2 and less than or equal to 1024. This option is meaningful for the M M Files storage engine only. For very large collections one should increase this to avoid long pauses when the hash table has to be initially built or resized, since buckets are resized individually and can be initially built in parallel. For example, 64 might be a sensible value for a collection with 100 000 000 documents. Currently, only the edge index respects this value, but other index types might follow in future ArangoDB versions. Changes (see below) are applied when the collection is loaded the next time. In a cluster setup, the result will also contain the following attributes: numberOfShards: the number of shards of the collection. shardKeys: contains the names of document attributes that are used to determine the target shard for documents. replicationFactor: determines how many copies of each shard are kept on different DBServers. collection.properties(properties) Changes the collection properties. properties must be an object with one or more of the following attribute(s): waitForSync: If true creating a document will only return after the data was synced to disk. journalSize : The size of the journal in bytes. This option is meaningful for the M M Files storage engine only. indexBuckets : See above, changes are only applied when the collection is loaded the next time. This option is meaningful for the M M Files storage engine only. replicationFactor : Change the number of shard copies kept on different DBServers, valid values are integer numbers in the range of 1-10 (Cluster only) Note: it is not possible to change the journal size after the journal or datafile has been created. Changing this parameter will only effect newly created journals. Also note that you cannot lower the journal size to less then size of the largest document already stored in the collection. Note: some other collection properties, such as type, isVolatile, or keyOptions cannot be changed once the collection is created. Examples Read all properties arangosh> db.example.properties(); show execution results Change a property arangosh> db.example.properties({ waitForSync : true }); show execution results Figures returns the figures of a collection collection.figures() Returns an object containing statistics about the collection. Note : Retrieving the figures will always load the collection into memory. alive.count: The number of currently active documents in all datafiles and journals of the collection. Documents that are contained in 56 Collection M ethods the write-ahead log only are not reported in this figure. alive.size: The total size in bytes used by all active documents of the collection. Documents that are contained in the write-ahead log only are not reported in this figure. dead.count: The number of dead documents. This includes document versions that have been deleted or replaced by a newer version. Documents deleted or replaced that are contained in the write-ahead log only are not reported in this figure. dead.size: The total size in bytes used by all dead documents. dead.deletion: The total number of deletion markers. Deletion markers only contained in the write-ahead log are not reporting in this figure. datafiles.count: The number of datafiles. 
datafiles.fileSize: The total filesize of datafiles (in bytes). journals.count: The number of journal files. journals.fileSize: The total filesize of the journal files (in bytes). compactors.count: The number of compactor files. compactors.fileSize: The total filesize of the compactor files (in bytes). shapefiles.count: The number of shape files. This value is deprecated and kept for compatibility reasons only. The value will always be 0 since ArangoDB 2.0 and higher. shapefiles.fileSize: The total filesize of the shape files. This value is deprecated and kept for compatibility reasons only. The value will always be 0 in ArangoDB 2.0 and higher. shapes.count: The total number of shapes used in the collection. This includes shapes that are not in use anymore. Shapes that are contained in the write-ahead log only are not reported in this figure. shapes.size: The total size of all shapes (in bytes). This includes shapes that are not in use anymore. Shapes that are contained in the write-ahead log only are not reported in this figure. attributes.count: The total number of attributes used in the collection. Note: the value includes data of attributes that are not in use anymore. Attributes that are contained in the write-ahead log only are not reported in this figure. attributes.size: The total size of the attribute data (in bytes). Note: the value includes data of attributes that are not in use anymore. Attributes that are contained in the write-ahead log only are not reported in this figure. indexes.count: The total number of indexes defined for the collection, including the pre-defined indexes (e.g. primary index). indexes.size: The total memory allocated for indexes in bytes. lastTick: The tick of the last marker that was stored in a journal of the collection. This might be 0 if the collection does not yet have a journal. uncollectedLogfileEntries: The number of markers in the write-ahead log for this collection that have not been transferred to journals or datafiles. documentReferences: The number of references to documents in datafiles that JavaScript code currently holds. This information can be used for debugging compaction and unload issues. waitingFor: An optional string value that contains information about which object type is at the head of the collection's cleanup queue. This information can be used for debugging compaction and unload issues. compactionStatus.time: The point in time the compaction for the collection was last executed. This information can be used for debugging compaction issues. compactionStatus.message: The action that was performed when the compaction was last run for the collection. This information can be used for debugging compaction issues. Note: collection data that are stored in the write-ahead log only are not reported in the results. When the write-ahead log is collected, documents might be added to journals and datafiles of the collection, which may modify the figures of the collection. Also note that waitingFor and compactionStatus may be empty when called on a coordinator in a cluster. Additionally, the filesizes of collection and index parameter JSON files are not reported. These files should normally have a size of a few bytes each. Please also note that the fileSize values are reported in bytes and reflect the logical file sizes. Some filesystems may use optimisations (e.g. sparse files) so that the actual physical file size is somewhat different. 
Directories and sub-directories may also require space in the file system, but this space is not reported in the fileSize results. That means that the figures reported do not reflect the actual disk usage of the collection with 100% accuracy. The actual disk usage of a collection is normally slightly higher than the sum of the reported fileSize values. Still the sum of the fileSize values can still be used as a lower bound approximation of the disk usage. Examples arangosh> db.demo.figures() 57 Collection M ethods show execution results Load loads a collection collection.load() Loads a collection into memory. Note: cluster collections are loaded at all times. Examples arangosh> col = db.example; [ArangoCollection 15586, "example" (type document, status loaded)] arangosh> col.load(); arangosh> col; [ArangoCollection 15586, "example" (type document, status loaded)] Revision returns the revision id of a collection collection.revision() Returns the revision id of the collection The revision id is updated when the document data is modified, either by inserting, deleting, updating or replacing documents in it. The revision id of a collection can be used by clients to check whether data in a collection has changed or if it is still unmodified since a previous fetch of the revision id. The revision id returned is a string value. Clients should treat this value as an opaque string, and only use it for equality/non-equality comparisons. Path returns the physical path of the collection collection.path() The path operation returns a string with the physical storage path for the collection data. Note: this method will return nothing meaningful in a cluster. In a single-server ArangoDB, this method will only return meaningful data for the M M Files engine. Checksum calculates a checksum for the data in a collection collection.checksum(withRevisions, withData) The checksum operation calculates an aggregate hash value for all document keys contained in collection collection. If the optional argument withRevisions is set to true, then the revision ids of the documents are also included in the hash calculation. If the optional argument withData is set to true, then all user-defined document attributes are also checksummed. Including the document data in checksumming will make the calculation slower, but is more accurate. The checksum calculation algorithm changed in ArangoDB 3.0, so checksums from 3.0 and earlier versions for the same data will differ. Note: this method is not available in a cluster. Unload unloads a collection collection.unload() Starts unloading a collection from memory. Note that unloading is deferred until all query have finished. Note: cluster collections cannot be unloaded. Examples 58 Collection M ethods arangosh> col = db.example; [ArangoCollection 7427, "example" (type document, status loaded)] arangosh> col.unload(); arangosh> col; [ArangoCollection 7427, "example" (type document, status unloaded)] Rename renames a collection collection.rename(new-name) Renames a collection using the new-name. The new-name must not already be used for a different collection. new-name must also be a valid collection name. For more information on valid collection names please refer to the naming conventions. If renaming fails for any reason, an error is thrown. If renaming the collection succeeds, then the collection is also renamed in all graph definitions inside the _graphs collection in the current database. Note: this method is not available in a cluster. 
Examples arangosh> c = db.example; [ArangoCollection 15669, "example" (type document, status loaded)] arangosh> c.rename("better-example"); arangosh> c; [ArangoCollection 15669, "better-example" (type document, status loaded)] Rotate rotates the current journal of a collection collection.rotate() Rotates the current journal of a collection. This operation makes the current journal of the collection a read-only datafile so it may become a candidate for garbage collection. If there is currently no journal available for the collection, the operation will fail with an error. Note: this method is specific for the M M Files storage engine, and there it is not available in a cluster. Note: please note that you need appropriate user permissions to execute this. To do the rename collections in first place you need to have administrative rights on the database To have access to the resulting renamed collection you either need to have access to all collections of that database ( * ) or a main system administrator has to give you access to the newly named one. 59 Database M ethods Database Methods Collection returns a single collection or null db._collection(collection-name) Returns the collection with the given name or null if no such collection exists. db._collection(collection-identifier) Returns the collection with the given identifier or null if no such collection exists. Accessing collections by identifier is discouraged for end users. End users should access collections using the collection name. Examples Get a collection by name: arangosh> db._collection("demo"); [ArangoCollection 92, "demo" (type document, status loaded)] Get a collection by id: arangosh> db._collection(123456); [ArangoCollection 123456, "demo" (type document, status loaded)] Unknown collection: arangosh> db._collection("unknown"); null Create creates a new document or edge collection db._create(collection-name) Creates a new document collection named collection-name. If the collection name already exists or if the name format is invalid, an error is thrown. For more information on valid collection names please refer to the naming conventions. db._create(collection-name, properties) properties must be an object with the following attributes: waitForSync (optional, default false): If true creating a document will only return after the data was synced to disk. journalSize (optional, default is a configuration parameter: The maximal size of a journal or datafile. Note that this also limits the maximal size of a single object. M ust be at least 1M B. isSystem (optional, default is false): If true, create a system collection. In this case collection-name should start with an underscore. End users should normally create non-system collections only. API implementors may be required to create system collections in very special occasions, but normally a regular collection will do. isVolatile (optional, default is false): If true then the collection data is kept in-memory only and not made persistent. Unloading the collection will cause the collection data to be discarded. Stopping or re-starting the server will also cause full loss of data in the collection. The collection itself will remain however (only the data is volatile). Setting this option will make the resulting collection be slightly faster than regular collections because ArangoDB does not enforce any synchronization to disk and does not calculate any CRC checksums for datafiles (as there are no datafiles). This option is meaningful for the M M Files storage engine only. 
keyOptions (optional): additional options for key generation. If specified, then keyOptions should be a JSON object containing the following attributes (note: some of them are optional): type: specifies the type of the key generator. The currently available generators are traditional and autoincrement. (note: autoincrement is currently only supported for non-sharded collections) 60 Database M ethods allowUserKeys: if set to true, then it is allowed to supply own key values in the _key attribute of a document. If set to false, then the key generator will solely be responsible for generating keys and supplying own key values in the _key attribute of documents is considered an error. increment: increment value for autoincrement key generator. Not used for other key generator types. offset: initial offset value for autoincrement key generator. Not used for other key generator types. numberOfShards (optional, default is 1): in a cluster, this value determines the number of shards to create for the collection. In a single server setup, this option is meaningless. shardKeys (optional, default is [ "_key" ] ): in a cluster, this attribute determines which document attributes are used to determine the target shard for documents. Documents are sent to shards based on the values they have in their shard key attributes. The values of all shard key attributes in a document are hashed, and the hash value is used to determine the target shard. Note that values of shard key attributes cannot be changed once set. This option is meaningless in a single server setup. When choosing the shard keys, one must be aware of the following rules and limitations: In a sharded collection with more than one shard it is not possible to set up a unique constraint on an attribute that is not the one and only shard key given in shardKeys. This is because enforcing a unique constraint would otherwise make a global index necessary or need extensive communication for every single write operation. Furthermore, if _key is not the one and only shard key, then it is not possible to set the _key attribute when inserting a document, provided the collection has more than one shard. Again, this is because the database has to enforce the unique constraint on the _key attribute and this can only be done efficiently if this is the only shard key by delegating to the individual shards. replicationFactor (optional, default is 1): in a cluster, this attribute determines how many copies of each shard are kept on different DBServers. The value 1 means that only one copy (no synchronous replication) is kept. A value of k means that k-1 replicas are kept. Any two copies reside on different DBServers. Replication between them is synchronous, that is, every write operation to the "leader" copy will be replicated to all "follower" replicas, before the write operation is reported successful. If a server fails, this is detected automatically and one of the servers holding copies take over, usually without an error being reported. When using the Enterprise version of ArangoDB the replicationFactor may be set to "satellite" making the collection locally joinable on every database server. This reduces the number of network hops dramatically when using joins in AQL at the costs of reduced write performance on these collections. distributeShardsLike distribute the shards of this collection cloning the shard distribution of another. 
If this value is set it will copy replicationFactor and numberOfShards from the other collection, the attributes in this collection will be ignored and can be ommited. db._create(collection-name, properties, type) Specifies the optional type of the collection, it can either be document or edge. On default it is document. Instead of giving a type you can also use db._createEdgeCollection or db._createDocumentCollection. db._create(collection-name, properties[, type], options) As an optional third (if the type string is being omitted) or fourth parameter you can specify an optional options map that controls how the cluster will create the collection. These options are only relevant at creation time and will not be persisted: waitForSyncReplication (default: true) When enabled the server will only report success back to the client if all replicas have created the collection. Set to false if you want faster server responses and don't care about full replication. enforceReplicationFactor (default: true) When enabled which means the server will check if there are enough replicas available at creation time and bail out otherwise. Set to false to disable this extra check. Examples With defaults: arangosh> c = db._create("users"); arangosh> c.properties(); show execution results With properties: arangosh> c = db._create("users", { waitForSync : true, 61 Database M ethods ........> journalSize : 1024 * 1204}); arangosh> c.properties(); show execution results With a key generator: arangosh> db._create("users", ........> { keyOptions: { type: "autoincrement", offset: 10, increment: 5 } }); arangosh> db.users.save({ name: "user 1" }); arangosh> db.users.save({ name: "user 2" }); arangosh> db.users.save({ name: "user 3" }); show execution results With a special key option: arangosh> db._create("users", { keyOptions: { allowUserKeys: false } }); arangosh> db.users.save({ name: "user 1" }); arangosh> db.users.save({ name: "user 2", _key: "myuser" }); arangosh> db.users.save({ name: "user 3" }); show execution results creates a new edge collection db._createEdgeCollection(collection-name) Creates a new edge collection named collection-name. If the collection name already exists an error is thrown. The default value for waitForSync is false. db._createEdgeCollection(collection-name, properties) properties must be an object with the following attributes: waitForSync (optional, default false): If true creating a document will only return after the data was synced to disk. journalSize (optional, default is "configuration parameter"): The maximal size of a journal or datafile. Note that this also limits the maximal size of a single object and must be at least 1M B. creates a new document collection db._createDocumentCollection(collection-name) Creates a new document collection named collection-name. If the document name already exists and error is thrown. All Collections returns all collections db._collections() Returns all collections of the given database. Examples arangosh> db._collections(); show execution results Collection Name selects a collection from the vocbase db.collection-name Returns the collection with the given collection-name. If no such collection exists, create a collection named collection-name with the default properties. Examples arangosh> db.example; 62 Database M ethods [ArangoCollection 15402, "example" (type document, status loaded)] Drop drops a collection db._drop(collection) Drops a collection and all its indexes and data. 
db._drop(collection-identifier) Drops a collection identified by collection-identifier with all its indexes and data. No error is thrown if there is no such collection. db._drop(collection-name) Drops a collection named collection-name and all its indexes. No error is thrown if there is no such collection. db._drop(collection-name, options) In order to drop a system collection, one must specify an options object with attribute isSystem set to true. Otherwise it is not possible to drop system collections. Note: cluster collection, which are prototypes for collections with distributeShardsLike parameter, cannot be dropped. Examples Drops a collection: arangosh> col = db.example; [ArangoCollection 15449, "example" (type document, status loaded)] arangosh> db._drop(col); arangosh> col; [ArangoCollection 15449, "example" (type document, status loaded)] Drops a collection identified by name: arangosh> col = db.example; [ArangoCollection 15453, "example" (type document, status loaded)] arangosh> db._drop("example"); arangosh> col; [ArangoCollection 15453, "example" (type document, status deleted)] Drops a system collection arangosh> col = db._example; [ArangoCollection 15457, "_example" (type document, status loaded)] arangosh> db._drop("_example", { isSystem: true }); arangosh> col; [ArangoCollection 15457, "_example" (type document, status deleted)] Truncate truncates a collection db._truncate(collection) Truncates a collection, removing all documents but keeping all its indexes. db._truncate(collection-identifier) Truncates a collection identified by collection-identified. No error is thrown if there is no such collection. db._truncate(collection-name) Truncates a collection named collection-name. No error is thrown if there is no such collection. Examples 63 Database M ethods Truncates a collection: arangosh> col = db.example; arangosh> col.save({ "Hello" : "World" }); arangosh> col.count(); arangosh> db._truncate(col); arangosh> col.count(); show execution results Truncates a collection identified by name: arangosh> col = db.example; arangosh> col.save({ "Hello" : "World" }); arangosh> col.count(); arangosh> db._truncate("example"); arangosh> col.count(); show execution results 64 Documents Documents This is an introduction to ArangoDB's interface for working with documents from the JavaScript shell arangosh or in JavaScript code in the server. For other languages see the corresponding language API. Basics and Terminology: section on the basic approach Collection M ethods: detailed API description for collection objects Database M ethods: detailed API description for database objects 65 Basics and Terminology Basics and Terminology Documents in ArangoDB are JSON objects. These objects can be nested (to any depth) and may contain lists. Each document has a unique primary key which identifies it within its collection. Furthermore, each document is uniquely identified by its document handle across all collections in the same database. Different revisions of the same document (identified by its handle) can be distinguished by their document revision. Any transaction only ever sees a single revision of a document. 
For example:
{
  "_id" : "myusers/3456789",
  "_key" : "3456789",
  "_rev" : "14253647",
  "firstName" : "John",
  "lastName" : "Doe",
  "address" : {
    "street" : "Road To Nowhere 1",
    "city" : "Gotham"
  },
  "hobbies" : [
    { name: "swimming", howFavorite: 10 },
    { name: "biking", howFavorite: 6 },
    { name: "programming", howFavorite: 4 }
  ]
}
All documents contain special attributes: the document handle is stored as a string in _id, the document's primary key in _key, and the document revision in _rev. The value of the _key attribute can be specified by the user when creating a document. The _id and _key values are immutable once the document has been created. The _rev value is maintained by ArangoDB automatically.
Document Handle
A document handle uniquely identifies a document in the database. It is a string and consists of the collection's name and the document key (_key attribute) separated by /.
Document Key
A document key uniquely identifies a document in the collection it is stored in. It can and should be used by clients when specific documents are queried. The document key is stored in the _key attribute of each document. The key values are automatically indexed by ArangoDB in a collection's primary index. Thus looking up a document by its key is a fast operation. The _key value of a document is immutable once the document has been created. By default, ArangoDB will auto-generate a document key if no _key attribute is specified, and use the user-specified _key otherwise.
The generated _key is guaranteed to be unique in the collection it was generated for. This also applies to sharded collections in a cluster. It can't be guaranteed that the _key is unique within a database or across a whole node or instance however.
This behavior can be changed on a per-collection level by creating collections with the keyOptions attribute. Using keyOptions it is possible to disallow user-specified keys completely, or to force a specific regime for auto-generating the _key values.
Document Revision
As ArangoDB supports MVCC (Multiple Version Concurrency Control), documents can exist in more than one revision. The document revision is the MVCC token used to specify a particular revision of a document (identified by its _id). It is a string value that contained (up to ArangoDB 3.0) an integer number and is unique within the list of document revisions for a single document.
In ArangoDB >= 3.1 the _rev strings are in fact time stamps. They use the local clock of the DBserver that actually writes the document and have millisecond accuracy. Actually, a "Hybrid Logical Clock" is used (for this concept see this paper).
Within one shard it is guaranteed that two different document revisions have a different _rev string, even if they are written in the same millisecond, and that these stamps are ascending.
Note however that different servers in your cluster might have a clock skew, and therefore between different shards or even between different collections the time stamps are not guaranteed to be comparable.
The Hybrid Logical Clock feature does one thing to address this issue: Whenever a message is sent from some server A in your cluster to another one B, it is ensured that any timestamp taken on B after the message has arrived is greater than any timestamp taken on A before the message was sent. This ensures that if there is some "causality" between events on different servers, time stamps increase from cause to effect.
A direct consequence of this is that sometimes a server has to take timestamps that seem to come from the future of its own clock. It will however still produce ever increasing timestamps. If the clock skew is small, then your timestamps will relatively accurately describe the time when the document revision was actually written. ArangoDB uses 64bit unsigned integer values to maintain document revisions internally. At this stage we intentionally do not document the exact format of the revision values. When returning document revisions to clients, ArangoDB will put them into a string to ensure the revision is not clipped by clients that do not support big integers. Clients should treat the revision returned by ArangoDB as an opaque string when they store or use it locally. This will allow ArangoDB to change the format of revisions later if this should be required (as has happened with 3.1 with the Hybrid Logical Clock). Clients can use revisions to perform simple equality/non-equality comparisons (e.g. to check whether a document has changed or not), but they should not use revision ids to perform greater/less than comparisons with them to check if a document revision is older than one another, even if this might work for some cases. Document revisions can be used to conditionally query, update, replace or delete documents in the database. In order to find a particular revision of a document, you need the document handle or key, and the document revision. Multiple Documents in a single Command Beginning with ArangoDB 3.0 the basic document API has been extended to handle not only single documents but multiple documents in a single command. This is crucial for performance, in particular in the cluster situation, in which a single request can involve multiple network hops within the cluster. Another advantage is that it reduces the overhead of individual network round trips between the client and the server. The general idea to perform multiple document operations in a single command is to use JSON arrays of objects in the place of a single document. As a consequence, document keys, handles and revisions for preconditions have to be supplied embedded in the individual documents given. M ultiple document operations are restricted to a single document or edge collection. See the API descriptions for collection objects for details. Note that the API for database objects do not offer these operations. 67 Collection M ethods Collection Methods All collection.all() Fetches all documents from a collection and returns a cursor. You can use toArray, next, or hasNext to access the result. The result can be limited using the skip and limit operator. Examples Use toArray to get all documents at once: arangosh> db.five.save({ name : "one" }); arangosh> db.five.save({ name : "two" }); arangosh> db.five.save({ name : "three" }); arangosh> db.five.save({ name : "four" }); arangosh> db.five.save({ name : "five" }); arangosh> db.five.all().toArray(); show execution results Use limit to restrict the documents: arangosh> db.five.save({ name : "one" }); arangosh> db.five.save({ name : "two" }); arangosh> db.five.save({ name : "three" }); arangosh> db.five.save({ name : "four" }); arangosh> db.five.save({ name : "five" }); arangosh> db.five.all().limit(2).toArray(); show execution results Query by example collection.byExample(example) Fetches all documents from a collection that match the specified example and returns a cursor. You can use toArray, next, or hasNext to access the result. 
The result can be limited using the skip and limit operator. An attribute name of the form a.b is interpreted as attribute path, not as attribute. If you use { "a" : { "c" : 1 } } as example, then you will find all documents, such that the attribute a contains a document of the form {c : 1 }. For example the document { "a" : { "c" : 1 }, "b" : 1 } will match, but the document { "a" : { "c" : 1, "b" : 1 } } will not. However, if you use { "a.c" : 1 } 68 Collection M ethods then you will find all documents, which contain a sub-document in a that has an attribute c of value 1. Both the following documents { "a" : { "c" : 1 }, "b" : 1 } and { "a" : { "c" : 1, "b" : 1 } } will match. collection.byExample(path1, value1, ...) As alternative you can supply an array of paths and values. Examples Use toArray to get all documents at once: arangosh> db.users.save({ name: "Gerhard" }); arangosh> db.users.save({ name: "Helmut" }); arangosh> db.users.save({ name: "Angela" }); arangosh> db.users.all().toArray(); arangosh> db.users.byExample({ "_id" : "users/20" }).toArray(); arangosh> db.users.byExample({ "name" : "Gerhard" }).toArray(); arangosh> db.users.byExample({ "name" : "Helmut", "_id" : "users/15" }).toArray(); show execution results Use next to loop over all documents: arangosh> db.users.save({ name: "Gerhard" }); arangosh> db.users.save({ name: "Helmut" }); arangosh> db.users.save({ name: "Angela" }); arangosh> var a = db.users.byExample( {"name" : "Angela" } ); arangosh> while (a.hasNext()) print(a.next()); show execution results First Example collection.firstExample(example) Returns some document of a collection that matches the specified example. If no such document exists, null will be returned. The example has to be specified as paths and values. See byExample for details. collection.firstExample(path1, value1, ...) As alternative you can supply an array of paths and values. Examples arangosh> db.users.firstExample("name", "Angela"); show execution results Range collection.range(attribute, left, right) 69 Collection M ethods Returns all documents from a collection such that the attribute is greater or equal than left and strictly less than right. You can use toArray, next, or hasNext to access the result. The result can be limited using the skip and limit operator. An attribute name of the form a.b is interpreted as attribute path, not as attribute. Note: the range simple query function is deprecated as of ArangoDB 2.6. The function may be removed in future versions of ArangoDB. The preferred way for retrieving documents from a collection within a specific range is to use an AQL query as follows: FOR doc IN @@collection FILTER doc.value >= @left && doc.value < @right LIMIT @skip, @limit RETURN doc Examples Use toArray to get all documents at once: arangosh> db.old.ensureIndex({ type: "skiplist", fields: [ "age" ] }); arangosh> db.old.save({ age: 15 }); arangosh> db.old.save({ age: 25 }); arangosh> db.old.save({ age: 30 }); arangosh> db.old.range("age", 10, 30).toArray(); show execution results Closed range collection.closedRange(attribute, left, right) Returns all documents of a collection such that the attribute is greater or equal than left and less or equal than right. You can use toArray, next, or hasNext to access the result. The result can be limited using the skip and limit operator. An attribute name of the form a.b is interpreted as attribute path, not as attribute. Note: the closedRange simple query function is deprecated as of ArangoDB 2.6. 
The function may be removed in future versions of ArangoDB. The preferred way for retrieving documents from a collection within a specific range is to use an AQL query as follows: FOR doc IN @@collection FILTER doc.value >= @left && doc.value <= @right LIMIT @skip, @limit RETURN doc Examples Use toArray to get all documents at once: arangosh> db.old.ensureIndex({ type: "skiplist", fields: [ "age" ] }); arangosh> db.old.save({ age: 15 }); arangosh> db.old.save({ age: 25 }); arangosh> db.old.save({ age: 30 }); arangosh> db.old.closedRange("age", 10, 30).toArray(); show execution results Any collection.any() Returns a random document from the collection or null if none exists. Note: this method is expensive when using the RocksDB storage engine. 70 Collection M ethods Count collection.count() Returns the number of living documents in the collection. Examples arangosh> db.users.count(); 0 toArray collection.toArray() Converts the collection into an array of documents. Never use this call in a production environment as it will basically create a copy of your collection in RAM which will use resources depending on the number and size of the documents in your collecion. Document collection.document(object) The document method finds a document given an object object containing the _id or _key attribute. The method returns the document if it can be found. If both attributes are given, the _id takes precedence, it is an error, if the collection part of the _id does not match the collection. An error is thrown if _rev is specified but the document found has a different revision already. An error is also thrown if no document exists with the given _id or _key value. Please note that if the method is executed on the arangod server (e.g. from inside a Foxx application), an immutable document object will be returned for performance reasons. It is not possible to change attributes of this immutable object. To update or patch the returned document, it needs to be cloned/copied into a regular JavaScript object first. This is not necessary if the document method is called from out of arangosh or from any other client. collection.document(document-handle) As before. Instead of object a document-handle can be passed as first argument. No revision can be specified in this case. collection.document(document-key) As before. Instead of object a document-key can be passed as first argument. collection.document(array) This variant allows to perform the operation on a whole array of arguments. The behavior is exactly as if document would have been called on all members of the array separately and all results are returned in an array. If an error occurs with any of the documents, no exception is risen! Instead of a document an error object is returned in the result array. 
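As a quick sketch of the array variant described above (the keys used here are purely illustrative, and the exact fields of the returned error object are an assumption based on the description, not a verified signature):
arangosh> db.example.insert({ _key: "known", value: 1 });
arangosh> var res = db.example.document([ "known", "missing" ]);
arangosh> res[0]._key;     // "known" – a regular document result
arangosh> res[1].error;    // true – an error object instead of an exception
arangosh> res[1].errorNum; // assumed to carry the error code, e.g. 1202 (document not found)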
Examples Returns the document for a document-handle: arangosh> db.example.document("example/2873916"); show execution results Returns the document for a document-key: arangosh> db.example.document("2873916"); show execution results Returns the document for an object: arangosh> db.example.document({_id: "example/2873916"}); 71 Collection M ethods show execution results Returns the document for an array of two keys: arangosh> db.example.document(["2873916","2873917"]); show execution results An error is raised if the document is unknown: arangosh> db.example.document("example/4472917"); [ArangoError 1202: document not found] An error is raised if the handle is invalid: arangosh> db.example.document(""); [ArangoError 1205: illegal document handle] Changes in 3.0 from 2.8: document can now query multiple documents with one call. Exists checks whether a document exists collection.exists(object) The exists method determines whether a document exists given an object object containing the _id or _key attribute. If both attributes are given, the _id takes precedence, it is an error, if the collection part of the _id does not match the collection. An error is thrown if _rev is specified but the document found has a different revision already. Instead of returning the found document or an error, this method will only return an object with the attributes _id, _key and _rev, or false if no document with the given _id or _key exists. It can thus be used for easy existence checks. This method will throw an error if used improperly, e.g. when called with a non-document handle, a non-document, or when a crosscollection request is performed. collection.exists(document-handle) As before. Instead of object a document-handle can be passed as first argument. collection.exists(document-key) As before. Instead of object a document-key can be passed as first argument. collection.exists(array) This variant allows to perform the operation on a whole array of arguments. The behavior is exactly as if exists would have been called on all members of the array separately and all results are returned in an array. If an error occurs with any of the documents, the operation stops immediately returning only an error object. Changes in 3.0 from 2.8: In the case of a revision mismatch exists now throws an error instead of simply returning false. This is to make it possible to tell the difference between a revision mismatch and a non-existing document. exists can now query multiple documents with one call. Lookup By Keys collection.documents(keys) 72 Collection M ethods Looks up the documents in the specified collection using the array of keys provided. All documents for which a matching key was specified in the keys array and that exist in the collection will be returned. Keys for which no document can be found in the underlying collection are ignored, and no exception will be thrown for them. This method is deprecated in favour of the array variant of document. Examples arangosh> keys = [ ]; arangosh> for (var i = 0; i < 10; ++i) { ........> db.example.insert({ _key: "test" + i, value: i }); ........> keys.push("test" + i); ........> } arangosh> db.example.documents(keys); show execution results Insert collection.insert(data) Creates a new document in the collection from the given data. The data must be an object. The attributes _id and _rev are ignored and are automatically generated. A unique value for the attribute _key will be automatically generated if not specified. 
If specified, there must not be a document with the given _key in the collection. The method returns a document with the attributes _id, _key and _rev. The attribute _id contains the document handle of the newly created document, the attribute _key the document key and the attribute _rev contains the document revision. collection.insert(data, options) Creates a new document in the collection from the given data as above. The optional options parameter must be an object and can be used to specify the following options: waitForSync: One can force synchronization of the document creation operation to disk even in case that the waitForSync flag is been disabled for the entire collection. Thus, the waitForSync option can be used to force synchronization of just specific operations. To use this, set the waitForSync parameter to true. If the waitForSync parameter is not specified or set to false, then the collection's default waitForSync behavior is applied. The waitForSync parameter cannot be used to disable synchronization for collections that have a default waitForSync value of true. silent: If this flag is set to true, the method does not return any output. returnNew: If this flag is set to true, the complete new document is returned in the output under the attribute new. Note: since ArangoDB 2.2, insert is an alias for save. collection.insert(array) collection.insert(array, options) These two variants allow to perform the operation on a whole array of arguments. The behavior is exactly as if insert would have been called on all members of the array separately and all results are returned in an array. If an error occurs with any of the documents, no exception is risen! Instead of a document an error object is returned in the result array. The options behave exactly as before. Changes in 3.0 from 2.8: The options silent and returnNew are new. The method can now insert multiple documents with one call. Examples arangosh> db.example.insert({ Hello : "World" }); arangosh> db.example.insert({ Hello : "World" }, {waitForSync: true}); show execution results arangosh> db.example.insert([{ Hello : "World" }, {Hello: "there"}]) 73 Collection M ethods arangosh> db.example.insert([{ Hello : "World" }, {}], {waitForSync: true}); show execution results Replace collection.replace(selector, data) Replaces an existing document described by the selector, which must be an object containing the _id or _key attribute. There must be a document with that _id or _key in the current collection. This document is then replaced with the data given as second argument. Any attribute _id, _key or _rev in data is ignored. The method returns a document with the attributes _id, _key, _rev and _oldRev. The attribute _id contains the document handle of the updated document, the attribute _rev contains the document revision of the updated document, the attribute _oldRev contains the revision of the old (now replaced) document. If the selector contains a _rev attribute, the method first checks that the specified revision is the current revision of that document. If not, there is a conflict, and an error is thrown. collection.replace(selector, data, options) As before, but options must be an object that can contain the following boolean attributes: waitForSync: One can force synchronization of the document creation operation to disk even in case that the waitForSync flag is been disabled for the entire collection. Thus, the waitForSync option can be used to force synchronization of just specific operations. 
To use this, set the waitForSync parameter to true. If the waitForSync parameter is not specified or set to false, then the collection's default waitForSync behavior is applied. The waitForSync parameter cannot be used to disable synchronization for collections that have a default waitForSync value of true. overwrite: If this flag is set to true, a _rev attribute in the selector is ignored. returnNew: If this flag is set to true, the complete new document is returned in the output under the attribute new. returnOld: If this flag is set to true, the complete previous revision of the document is returned in the output under the attribute old. silent: If this flag is set to true, no output is returned. collection.replace(document-handle, data) collection.replace(document-handle, data, options) As before. Instead of selector a document-handle can be passed as first argument. No revision precondition is tested. collection.replace(document-key, data) collection.replace(document-key, data, options) As before. Instead of selector a document-key can be passed as first argument. No revision precondition is tested. collection.replace(selectorarray, dataarray) collection.replace(selectorarray, dataarray, options) These two variants allow to perform the operation on a whole array of selector/data pairs. The two arrays given as selectorarray and dataarray must have the same length. The behavior is exactly as if replace would have been called on all respective members of the two arrays and all results are returned in an array. If an error occurs with any of the documents, no exception is risen! Instead of a document an error object is returned in the result array. The options behave exactly as before. Examples Create and update a document: arangosh> a1 = db.example.insert({ a : 1 }); arangosh> a2 = db.example.replace(a1, { a : 2 }); arangosh> a3 = db.example.replace(a1, { a : 3 }); show execution results Use a document handle: arangosh> a1 = db.example.insert({ a : 1 }); 74 Collection M ethods arangosh> a2 = db.example.replace("example/3903044", { a : 2 }); show execution results Changes in 3.0 from 2.8: The options silent, returnNew and returnOld are new. The method can now replace multiple documents with one call. Update collection.update(selector, data) Updates an existing document described by the selector, which must be an object containing the _id or _key attribute. There must be a document with that _id or _key in the current collection. This document is then patched with the data given as second argument. Any attribute _id, _key or _rev in data is ignored. The method returns a document with the attributes _id, _key, _rev and _oldRev. The attribute _id contains the document handle of the updated document, the attribute _rev contains the document revision of the updated document, the attribute _oldRev contains the revision of the old (now updated) document. If the selector contains a _rev attribute, the method first checks that the specified revision is the current revision of that document. If not, there is a conflict, and an error is thrown. collection.update(selector, data, options) As before, but options must be an object that can contain the following boolean attributes: waitForSync: One can force synchronization of the document creation operation to disk even in case that the waitForSync flag is been disabled for the entire collection. Thus, the waitForSync option can be used to force synchronization of just specific operations. To use this, set the waitForSync parameter to true. 
If the waitForSync parameter is not specified or set to false, then the collection's default waitForSync behavior is applied. The waitForSync parameter cannot be used to disable synchronization for collections that have a default waitForSync value of true. overwrite: If this flag is set to true, a _rev attribute in the selector is ignored. returnNew: If this flag is set to true, the complete new document is returned in the output under the attribute new. returnOld: If this flag is set to true, the complete previous revision of the document is returned in the output under the attribute old. silent: If this flag is set to true, no output is returned. keepNull: The optional keepNull parameter can be used to modify the behavior when handling null values. Normally, null values are stored in the database. By setting the keepNull parameter to false, this behavior can be changed so that all attributes in data with null values will be removed from the target document. mergeObjects: Controls whether objects (not arrays) will be merged if present in both the existing and the patch document. If set to false, the value in the patch document will overwrite the existing document's value. If set to true, objects will be merged. The default is true. collection.update(document-handle, data) collection.update(document-handle, data, options) As before. Instead of selector a document-handle can be passed as first argument. No revision precondition is tested. collection.update(document-key, data) collection.update(document-key, data, options) As before. Instead of selector a document-key can be passed as first argument. No revision precondition is tested. collection.update(selectorarray, dataarray) collection.update(selectorarray, dataarray, options) These two variants allow to perform the operation on a whole array of selector/data pairs. The two arrays given as selectorarray and dataarray must have the same length. The behavior is exactly as if update would have been called on all respective members of the two arrays and all results are returned in an array. If an error occurs with any of the documents, no exception is risen! Instead of a document an error object is returned in the result array. The options behave exactly as before. 
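Before the official examples, here is a minimal sketch of keepNull and mergeObjects working together (the collection and attribute names are illustrative, and the commented result assumes the default behavior described above):
arangosh> var d = db.example.insert({ a: { x: 1 }, b: 1 });
arangosh> db.example.update(d, { a: { y: 2 }, b: null },
........>                   { mergeObjects: true, keepNull: false });
arangosh> db.example.document(d._key);
// expected: a is merged to { x: 1, y: 2 }, and b is removed because keepNull is false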
Examples Create and update a document: 75 Collection M ethods arangosh> a1 = db.example.insert({"a" : 1}); arangosh> a2 = db.example.update(a1, {"b" : 2, "c" : 3}); arangosh> a3 = db.example.update(a1, {"d" : 4}); arangosh> a4 = db.example.update(a2, {"e" : 5, "f" : 6 }); arangosh> db.example.document(a4); arangosh> a5 = db.example.update(a4, {"a" : 1, c : 9, e : 42 }); arangosh> db.example.document(a5); show execution results Use a document handle: arangosh> a1 = db.example.insert({"a" : 1}); arangosh> a2 = db.example.update("example/18612115", { "x" : 1, "y" : 2 }); show execution results Use the keepNull parameter to remove attributes with null values: arangosh> db.example.insert({"a" : 1}); arangosh> db.example.update("example/19988371", ........> { "b" : null, "c" : null, "d" : 3 }); arangosh> db.example.document("example/19988371"); arangosh> db.example.update("example/19988371", { "a" : null }, false, false); arangosh> db.example.document("example/19988371"); arangosh> db.example.update("example/19988371", ........> { "b" : null, "c": null, "d" : null }, false, false); arangosh> db.example.document("example/19988371"); show execution results Patching array values: arangosh> db.example.insert({"a" : { "one" : 1, "two" : 2, "three" : 3 }, ........> "b" : { }}); arangosh> db.example.update("example/20774803", {"a" : { "four" : 4 }, ........> "b" : { "b1" : 1 }}); arangosh> db.example.document("example/20774803"); arangosh> db.example.update("example/20774803", { "a" : { "one" : null }, ........> "b" : null }, ........> false, false); arangosh> db.example.document("example/20774803"); show execution results Changes in 3.0 from 2.8: The options silent, returnNew and returnOld are new. The method can now update multiple documents with one call. Remove collection.remove(selector) Removes a document described by the selector, which must be an object containing the _id or _key attribute. There must be a document with that _id or _key in the current collection. This document is then removed. The method returns a document with the attributes _id, _key and _rev. The attribute _id contains the document handle of the removed document, the attribute _rev contains the document revision of the removed document. 76 Collection M ethods If the selector contains a _rev attribute, the method first checks that the specified revision is the current revision of that document. If not, there is a conflict, and an error is thrown. collection.remove(selector, options) As before, but options must be an object that can contain the following boolean attributes: waitForSync: One can force synchronization of the document creation operation to disk even in case that the waitForSync flag is been disabled for the entire collection. Thus, the waitForSync option can be used to force synchronization of just specific operations. To use this, set the waitForSync parameter to true. If the waitForSync parameter is not specified or set to false, then the collection's default waitForSync behavior is applied. The waitForSync parameter cannot be used to disable synchronization for collections that have a default waitForSync value of true. overwrite: If this flag is set to true, a _rev attribute in the selector is ignored. returnOld: If this flag is set to true, the complete previous revision of the document is returned in the output under the attribute old. silent: If this flag is set to true, no output is returned. collection.remove(document-handle) collection.remove(document-handle, options) As before. 
Instead of selector a document-handle can be passed as first argument. No revision check is performed.
collection.remove(document-key)
collection.remove(document-key, options)
As before. Instead of selector a document-key can be passed as first argument. No revision check is performed.
collection.remove(selectorarray)
collection.remove(selectorarray, options)
These two variants allow performing the operation on a whole array of selectors. The behavior is exactly as if remove had been called on all members of the array separately and all results are returned in an array. If an error occurs with any of the documents, no exception is raised! Instead of a document, an error object is returned in the result array. The options behave exactly as before.
Examples
Remove a document:
arangosh> a1 = db.example.insert({ a : 1 });
arangosh> db.example.document(a1);
arangosh> db.example.remove(a1);
arangosh> db.example.document(a1);
show execution results
Remove a document with a conflict:
arangosh> a1 = db.example.insert({ a : 1 });
arangosh> a2 = db.example.replace(a1, { a : 2 });
arangosh> db.example.remove(a1);
arangosh> db.example.remove(a1, true);
arangosh> db.example.document(a1);
show execution results
Changes in 3.0 from 2.8:
The method now returns not only true but information about the removed document(s). The options silent and returnOld are new. The method can now remove multiple documents with one call.
Remove By Keys
collection.removeByKeys(keys)
Looks up the documents in the specified collection using the array of keys provided, and removes all documents from the collection whose keys are contained in the keys array. Keys for which no document can be found in the underlying collection are ignored, and no exception will be thrown for them.
The method will return an object containing the number of removed documents in the removed sub-attribute, and the number of not-removed/ignored documents in the ignored sub-attribute.
This method is deprecated in favour of the array variant of remove.
Examples
arangosh> keys = [ ];
arangosh> for (var i = 0; i < 10; ++i) {
........>   db.example.insert({ _key: "test" + i, value: i });
........>   keys.push("test" + i);
........> }
arangosh> db.example.removeByKeys(keys);
show execution results
Remove By Example
collection.removeByExample(example)
Removes all documents matching an example.
collection.removeByExample(document, waitForSync)
The optional waitForSync parameter can be used to force synchronization of the document deletion operation to disk even in case that the waitForSync flag had been disabled for the entire collection. Thus, the waitForSync parameter can be used to force synchronization of just specific operations. To use this, set the waitForSync parameter to true. If the waitForSync parameter is not specified or set to false, then the collection's default waitForSync behavior is applied. The waitForSync parameter cannot be used to disable synchronization for collections that have a default waitForSync value of true.
collection.removeByExample(document, waitForSync, limit)
The optional limit parameter can be used to restrict the number of removals to the specified value. If limit is specified but less than the number of documents in the collection, it is undefined which documents are removed.
Examples
arangosh> db.example.removeByExample( {Hello : "world"} );
1
Replace By Example
collection.replaceByExample(example, newValue)
Replaces all documents matching an example with a new document body.
The entire document body of each document matching the example will be replaced with newValue. The document meta-attributes _id, _key and _rev will not be replaced. collection.replaceByExample(document, newValue, waitForSync) The optional waitForSync parameter can be used to force synchronization of the document replacement operation to disk even in case that the waitForSync flag had been disabled for the entire collection. Thus, the waitForSync parameter can be used to force synchronization of just specific operations. To use this, set the waitForSync parameter to true. If the waitForSync parameter is not specified or set to false, then the collection's default waitForSync behavior is applied. The waitForSync parameter cannot be used to disable synchronization for collections that have a default waitForSync value of true. collection.replaceByExample(document, newValue, waitForSync, limit) The optional limit parameter can be used to restrict the number of replacements to the specified value. If limit is specified but less than the number of documents in the collection, it is undefined which documents are replaced. 78 Collection M ethods Examples arangosh> db.example.save({ Hello : "world" }); arangosh> db.example.replaceByExample({ Hello: "world" }, {Hello: "mars"}, false, 5); show execution results Update By Example collection.updateByExample(example, newValue) Partially updates all documents matching an example with a new document body. Specific attributes in the document body of each document matching the example will be updated with the values from newValue. The document meta-attributes _id, _key and _rev cannot be updated. Partial update could also be used to append new fields, if there were no old field with same name. collection.updateByExample(document, newValue, keepNull, waitForSync) The optional keepNull parameter can be used to modify the behavior when handling null values. Normally, null values are stored in the database. By setting the keepNull parameter to false, this behavior can be changed so that all attributes in data with null values will be removed from the target document. The optional waitForSync parameter can be used to force synchronization of the document replacement operation to disk even in case that the waitForSync flag had been disabled for the entire collection. Thus, the waitForSync parameter can be used to force synchronization of just specific operations. To use this, set the waitForSync parameter to true. If the waitForSync parameter is not specified or set to false, then the collection's default waitForSync behavior is applied. The waitForSync parameter cannot be used to disable synchronization for collections that have a default waitForSync value of true. collection.updateByExample(document, newValue, keepNull, waitForSync, limit) The optional limit parameter can be used to restrict the number of updates to the specified value. If limit is specified but less than the number of documents in the collection, it is undefined which documents are updated. 
collection.updateByExample(document, newValue, options) Using this variant, the options for the operation can be passed using an object with the following sub-attributes: keepNull waitForSync limit mergeObjects Examples arangosh> db.example.save({ Hello : "world", foo : "bar" }); arangosh> db.example.updateByExample({ Hello: "world" }, { Hello: "foo", World: "bar" }, false); arangosh> db.example.byExample({ Hello: "foo" }).toArray() show execution results Collection type collection.type() Returns the type of a collection. Possible values are: 2: document collection 3: edge collection Get the Version of ArangoDB db._version() 79 Collection M ethods Returns the server version string. Note that this is not the version of the database. Examples arangosh> require("@arangodb").db._version(); 3.3.19 Edges Edges are normal documents that always contain a _from and a _to attribute. Therefore, you can use the document methods to operate on edges. The following methods, however, are specific to edges. edge-collection.edges(vertex) The edges operator finds all edges starting from (outbound) or ending in (inbound) vertex. edge-collection.edges(vertices) The edges operator finds all edges starting from (outbound) or ending in (inbound) a document from vertices, which must be a list of documents or document handles. arangosh> db._create("vertex"); arangosh> db._createEdgeCollection("relation"); arangosh> var myGraph = {}; arangosh> myGraph.v1 = db.vertex.insert({ name : "vertex 1" }); arangosh> myGraph.v2 = db.vertex.insert({ name : "vertex 2" }); arangosh> myGraph.e1 = db.relation.insert(myGraph.v1, myGraph.v2, ........> { label : "knows"}); arangosh> db._document(myGraph.e1); arangosh> db.relation.edges(myGraph.e1._id); show execution results edge-collection.inEdges(vertex) The edges operator finds all edges ending in (inbound) vertex. edge-collection.inEdges(vertices) The edges operator finds all edges ending in (inbound) a document from vertices, which must a list of documents or document handles. Examples arangosh> db._create("vertex"); arangosh> db._createEdgeCollection("relation"); arangosh> myGraph.v1 = db.vertex.insert({ name : "vertex 1" }); arangosh> myGraph.v2 = db.vertex.insert({ name : "vertex 2" }); arangosh> myGraph.e1 = db.relation.insert(myGraph.v1, myGraph.v2, ........> { label : "knows"}); arangosh> db._document(myGraph.e1); arangosh> db.relation.inEdges(myGraph.v1._id); arangosh> db.relation.inEdges(myGraph.v2._id); show execution results edge-collection.outEdges(vertex) The edges operator finds all edges starting from (outbound) vertices. edge-collection.outEdges(vertices) The edges operator finds all edges starting from (outbound) a document from vertices, which must a list of documents or document handles. Examples 80 Collection M ethods arangosh> db._create("vertex"); arangosh> db._createEdgeCollection("relation"); arangosh> myGraph.v1 = db.vertex.insert({ name : "vertex 1" }); arangosh> myGraph.v2 = db.vertex.insert({ name : "vertex 2" }); arangosh> myGraph.e1 = db.relation.insert(myGraph.v1, myGraph.v2, ........> { label : "knows"}); arangosh> db._document(myGraph.e1); arangosh> db.relation.outEdges(myGraph.v1._id); arangosh> db.relation.outEdges(myGraph.v2._id); show execution results Misc collection.iterate(iterator, options) Iterates over some elements of the collection and apply the function iterator to the elements. The function will be called with the document as first argument and the current number (starting with 0) as second argument. 
options must be an object with the following attributes:
limit (optional, default none): use at most limit documents.
probability (optional, default all): a number between 0 and 1. Documents are chosen with this probability.
Examples
arangosh> for (i = -90; i <= 90; i += 10) {
........>   for (j = -180; j <= 180; j += 10) {
........>     db.example.save({ name : "Name/" + i + "/" + j,
........>                       home : [ i, j ],
........>                       work : [ -i, -j ] });
........>   }
........> }
arangosh> db.example.ensureIndex({ type: "geo", fields: [ "home" ] });
arangosh> items = db.example.getIndexes().map(function(x) { return x.id; });
........> db.example.index(items[1]);
show execution results
Database Methods
Document
db._document(object)
The _document method finds a document given an object object containing the _id attribute. The method returns the document if it can be found.
An error is thrown if _rev is specified but the document found has a different revision already. An error is also thrown if no document exists with the given _id.
Please note that if the method is executed on the arangod server (e.g. from inside a Foxx application), an immutable document object will be returned for performance reasons. It is not possible to change attributes of this immutable object. To update or patch the returned document, it needs to be cloned/copied into a regular JavaScript object first. This is not necessary if the _document method is called from out of arangosh or from any other client.
db._document(document-handle)
As before. Instead of object a document-handle can be passed as first argument. No revision can be specified in this case.
Examples
Returns the document:
arangosh> db._document("example/12345");
show execution results
Exists
db._exists(object)
The _exists method determines whether a document exists given an object object containing the _id attribute.
An error is thrown if _rev is specified but the document found has a different revision already.
Instead of returning the found document or an error, this method will only return an object with the attributes _id, _key and _rev, or false if no document with the given _id or _key exists. It can thus be used for easy existence checks.
This method will throw an error if used improperly, e.g. when called with a non-document handle, a non-document, or when a cross-collection request is performed.
db._exists(document-handle)
As before. Instead of object a document-handle can be passed as first argument.
Changes in 3.0 from 2.8:
In the case of a revision mismatch _exists now throws an error instead of simply returning false. This is to make it possible to tell the difference between a revision mismatch and a non-existing document.
Replace
db._replace(selector, data)
Replaces an existing document described by the selector, which must be an object containing the _id attribute. There must be a document with that _id in the current database. This document is then replaced with the data given as second argument. Any attribute _id, _key or _rev in data is ignored.
The method returns a document with the attributes _id, _key, _rev and _oldRev. The attribute _id contains the document handle of the updated document, the attribute _rev contains the document revision of the updated document, the attribute _oldRev contains the revision of the old (now replaced) document.
If the selector contains a _rev attribute, the method first checks that the specified revision is the current revision of that document.
If not, there is a conflict, and an error is thrown. collection.replace(selector, data, options) As before, but options must be an object that can contain the following boolean attributes: waitForSync: One can force synchronization of the document creation operation to disk even in case that the waitForSync flag is been disabled for the entire collection. Thus, the waitForSync option can be used to force synchronization of just specific operations. To use this, set the waitForSync parameter to true. If the waitForSync parameter is not specified or set to false, then the collection's default waitForSync behavior is applied. The waitForSync parameter cannot be used to disable synchronization for collections that have a default waitForSync value of true. overwrite: If this flag is set to true, a _rev attribute in the selector is ignored. returnNew: If this flag is set to true, the complete new document is returned in the output under the attribute new. returnOld: If this flag is set to true, the complete previous revision of the document is returned in the output under the attribute old. silent: If this flag is set to true, no output is returned. db._replace(document-handle, data) db._replace(document-handle, data, options) As before. Instead of selector a document-handle can be passed as first argument. No revision precondition is tested. Examples Create and replace a document: arangosh> a1 = db.example.insert({ a : 1 }); arangosh> a2 = db._replace(a1, { a : 2 }); arangosh> a3 = db._replace(a1, { a : 3 }); show execution results Changes in 3.0 from 2.8: The options silent, returnNew and returnOld are new. Update db._update(selector, data) Updates an existing document described by the selector, which must be an object containing the _id attribute. There must be a document with that _id in the current database. This document is then patched with the data given as second argument. Any attribute _id, _key or _rev in data is ignored. The method returns a document with the attributes _id, _key, _rev and _oldRev. The attribute _id contains the document handle of the updated document, the attribute _rev contains the document revision of the updated document, the attribute _oldRev contains the revision of the old (now updated) document. If the selector contains a _rev attribute, the method first checks that the specified revision is the current revision of that document. If not, there is a conflict, and an error is thrown. db._update(selector, data, options) As before, but options must be an object that can contain the following boolean attributes: waitForSync: One can force synchronization of the document creation operation to disk even in case that the waitForSync flag is been disabled for the entire collection. Thus, the waitForSync option can be used to force synchronization of just specific operations. To use this, set the waitForSync parameter to true. If the waitForSync parameter is not specified or set to false, then the collection's default waitForSync behavior is applied. The waitForSync parameter cannot be used to disable synchronization for collections that have a default waitForSync value of true. overwrite: If this flag is set to true, a _rev attribute in the selector is ignored. returnNew: If this flag is set to true, the complete new document is returned in the output under the attribute new. returnOld: If this flag is set to true, the complete previous revision of the document is returned in the output under the attribute old. 
83 Database M ethods silent: If this flag is set to true, no output is returned. keepNull: The optional keepNull parameter can be used to modify the behavior when handling null values. Normally, null values are stored in the database. By setting the keepNull parameter to false, this behavior can be changed so that all attributes in data with null values will be removed from the target document. mergeObjects: Controls whether objects (not arrays) will be merged if present in both the existing and the patch document. If set to false, the value in the patch document will overwrite the existing document's value. If set to true, objects will be merged. The default is true. db._update(document-handle, data) db._update(document-handle, data, options) As before. Instead of selector a document-handle can be passed as first argument. No revision precondition is tested. Examples Create and update a document: arangosh> a1 = db.example.insert({ a : 1 }); arangosh> a2 = db._update(a1, { b : 2 }); arangosh> a3 = db._update(a1, { c : 3 }); show execution results Changes in 3.0 from 2.8: The options silent, returnNew and returnOld are new. Remove db._remove(selector) Removes a document described by the selector, which must be an object containing the _id attribute. There must be a document with that _id in the current database. This document is then removed. The method returns a document with the attributes _id, _key and _rev. The attribute _id contains the document handle of the removed document, the attribute _rev contains the document revision of the removed eocument. If the selector contains a _rev attribute, the method first checks that the specified revision is the current revision of that document. If not, there is a conflict, and an error is thrown. db._remove(selector, options) As before, but options must be an object that can contain the following boolean attributes: waitForSync: One can force synchronization of the document creation operation to disk even in case that the waitForSync flag is been disabled for the entire collection. Thus, the waitForSync option can be used to force synchronization of just specific operations. To use this, set the waitForSync parameter to true. If the waitForSync parameter is not specified or set to false, then the collection's default waitForSync behavior is applied. The waitForSync parameter cannot be used to disable synchronization for collections that have a default waitForSync value of true. overwrite: If this flag is set to true, a _rev attribute in the selector is ignored. returnOld: If this flag is set to true, the complete previous revision of the document is returned in the output under the attribute old. silent: If this flag is set to true, no output is returned. db._remove(document-handle) db._remove(document-handle, options) As before. Instead of selector a document-handle can be passed as first argument. No revision check is performed. 
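A short sketch of the options described above (the collection name is illustrative, and the comments state the expected behavior rather than literal shell output):
arangosh> var d = db.example.insert({ a: 1 });
arangosh> db._remove(d._id, { returnOld: true }).old;   // the removed document, returned under the attribute old
arangosh> db._remove(d._id, { silent: true });          // would fail now: the document has already been removed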
Examples Remove a document: arangosh> a1 = db.example.insert({ a : 1 }); 84 Database M ethods arangosh> db._remove(a1); arangosh> db._remove(a1); arangosh> db._remove(a1, {overwrite: true}); show execution results Remove the document in the revision a1 with a conflict: arangosh> a1 = db.example.insert({ a : 1 }); arangosh> a2 = db._replace(a1, { a : 2 }); arangosh> db._remove(a1); arangosh> db._remove(a1, {overwrite: true} ); arangosh> db._document(a1); show execution results Remove a document using new signature: arangosh> db.example.insert({ _key: "11265325374", a: 1 } ); arangosh> db.example.remove("example/11265325374", ........> { overwrite: true, waitForSync: false}) show execution results Changes in 3.0 from 2.8: The method now returns not only true but information about the removed document(s). The options silent and returnOld are new. 85 Graphs, Vertices & Edges Graphs, Vertices & Edges Graphs, vertices & edges are defined in the Graphs chapter in details. Related blog posts: Graphs in data modeling - is the emperor naked? Index Free Adjacency or Hybrid Indexes for Graph Databases 86 Naming Conventions Naming Conventions in ArangoDB The following naming conventions should be followed by users when creating databases, collections and documents in ArangoDB. 87 Database Names Database Names ArangoDB will always start up with a default database, named _system. Users can create additional databases in ArangoDB, provided the database names conform to the following constraints: Database names must only consist of the letters a to z (both lower and upper case allowed), the numbers 0 to 9, and the underscore (_) or dash (-) symbols This also means that any non-ASCII database names are not allowed Database names must always start with a letter. Database names starting with an underscore are considered to be system databases, and users should not create or delete those The maximum allowed length of a database name is 64 bytes Database names are case-sensitive 88 Collection Names Collection Names Users can pick names for their collections as desired, provided the following naming constraints are not violated: Collection names must only consist of the letters a to z (both in lower and upper case), the numbers 0 to 9, and the underscore (_) or dash (-) symbols. This also means that any non-ASCII collection names are not allowed User-defined collection names must always start with a letter. System collection names must start with an underscore. All collection names starting with an underscore are considered to be system collections that are for ArangoDB's internal use only. System collection names should not be used by end users for their own collections The maximum allowed length of a collection name is 64 bytes Collection names are case-sensitive 89 Document Keys Document Keys Users can define their own keys for documents they save. The document key will be saved along with a document in the _key attribute. Users can pick key values as required, provided that the values conform to the following restrictions: The key must be a string value. Numeric keys are not allowed, but any numeric value can be put into a string and can then be used as document key. The key must be at least 1 byte and at most 254 bytes long. Empty keys are disallowed when specified (though it may be valid to completely omit the _key attribute from a document) It must consist of the letters a-z (lower or upper case), the digits 0-9 or any of the following punctuation characters: @ ( ) + , = ; $ ! * ' _ - : . 
% Any other characters, especially multi-byte UTF-8 sequences, whitespace or punctuation characters cannot be used inside key values
The key must be unique within the collection it is used in
Keys are case-sensitive, i.e. myKey and MyKEY are considered to be different keys.
Specifying a document key is optional when creating new documents. If no document key is specified by the user, ArangoDB will create the document key itself as each document is required to have a key.
There are no guarantees about the format and pattern of auto-generated document keys other than the above restrictions. Clients should therefore treat auto-generated document keys as opaque values and not rely on their format.
The current format for generated keys is a string containing numeric digits. The numeric values reflect chronological time in the sense that _key values generated later will contain higher numbers than _key values generated earlier. But the exact value that will be generated by the server is not predictable. Note that if you sort on the _key attribute, string comparison will be used, which means "100" is less than "99" etc.
Attribute Names
Users can pick attribute names for document attributes as desired, provided the following attribute naming constraints are not violated:
Attribute names starting with an underscore are considered to be system attributes for ArangoDB's internal use. Such attribute names are already used by ArangoDB for special purposes:
_id is used to contain a document's handle
_key is used to contain a document's user-defined key
_rev is used to contain the document's revision number
In edge collections, the _from and _to attributes are used to reference other documents.
More system attributes may be added in the future without further notice, so end users should try to avoid using their own attribute names starting with underscores.
Theoretically, attribute names can include punctuation and special characters as desired, provided the name is a valid UTF-8 string. For maximum portability, special characters should be avoided though. For example, attribute names may contain the dot symbol, but the dot has a special meaning in JavaScript and also in AQL, so when using such attribute names in one of these languages, the attribute name needs to be quoted by the end user. Overall it might be better to use attribute names which don't require any quoting/escaping in all languages used. This includes languages used by the client (e.g. Ruby, PHP) if the attributes are mapped to object members there.
Attribute names starting with an at-mark (@) will need to be enclosed in backticks when used in an AQL query to tell them apart from bind variables. Therefore we do not encourage the use of attributes starting with at-marks, though they will work when used properly.
ArangoDB does not enforce a length limit for attribute names. However, long attribute names may use more memory in result sets etc. Therefore the use of long attribute names is discouraged.
Attribute names are case-sensitive.
Attributes with empty names (an empty string) are disallowed.
Handling Indexes
This is an introduction to ArangoDB's interface for indexes in general.
There are special sections for
Index Basics: Introduction to all index types
Which index to use when: Index type and options adviser
Index Utilization: How ArangoDB uses indexes
Working with Indexes: How to handle indexes programmatically using the db object
Hash Indexes
Skiplists
Persistent Indexes
Fulltext Indexes
Geo-spatial Indexes
Vertex-centric Indexes
Index basics
Indexes allow fast access to documents, provided the indexed attribute(s) are used in a query. While ArangoDB automatically indexes some system attributes, users are free to create extra indexes on non-system attributes of documents.
User-defined indexes can be created on collection level. Most user-defined indexes can be created by specifying the names of the index attributes. Some index types allow indexing just one attribute (e.g. fulltext index) whereas other index types allow indexing multiple attributes at the same time.
The system attributes _id, _key, _from and _to are automatically indexed by ArangoDB, without the user being required to create extra indexes for them. _id and _key are covered by a collection's primary index, and _from and _to are covered by an edge collection's edge index automatically.
Using the system attribute _id in user-defined indexes is not possible, but indexing _key, _rev, _from, and _to is.
ArangoDB provides the following index types:
Primary Index
For each collection there will always be a primary index which is a hash index for the document keys (_key attribute) of all documents in the collection. The primary index allows quick selection of documents in the collection using either the _key or _id attributes. It will be used from within AQL queries automatically when performing equality lookups on _key or _id.
There are also dedicated functions to find a document given its _key or _id that will always make use of the primary index:
db.collection.document("<document-key>");
db._document("<document-id>");
As the primary index is an unsorted hash index, it cannot be used for non-equality range queries or for sorting.
The primary index of a collection cannot be dropped or changed, and there is no mechanism to create user-defined primary indexes.
Edge Index
Every edge collection also has an automatically created edge index. The edge index provides quick access to documents by either their _from or _to attributes. It can therefore be used to quickly find connections between vertex documents and is invoked when the connecting edges of a vertex are queried.
Edge indexes are used from within AQL when performing equality lookups on _from or _to values in an edge collection. There are also dedicated functions to find edges given their _from or _to values that will always make use of the edge index:
db.collection.edges("<from-value>");
db.collection.edges("<to-value>");
db.collection.outEdges("<from-value>");
db.collection.outEdges("<to-value>");
db.collection.inEdges("<from-value>");
db.collection.inEdges("<to-value>");
Internally, the edge index is implemented as a hash index, which stores the union of all _from and _to attributes. It can be used for equality lookups, but not for range queries or for sorting. Edge indexes are automatically created for edge collections. It is not possible to create user-defined edge indexes. However, it is possible to freely use the _from and _to attributes in user-defined indexes.
An edge index cannot be dropped or changed.
Hash Index
A hash index can be used to quickly find documents with specific attribute values. The hash index is unsorted, so it supports equality lookups but no range queries or sorting.
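As a brief sketch of this point (the collection, attribute and bind parameter names are illustrative):
arangosh> db.users.ensureIndex({ type: "hash", fields: [ "email" ] });
arangosh> db._query("FOR u IN users FILTER u.email == @e RETURN u",
........>           { e: "jd@example.com" }).toArray();   // equality lookup – can use the hash index
arangosh> db._query("FOR u IN users FILTER u.email >= @e RETURN u",
........>           { e: "a" }).toArray();                // range condition – cannot use the hash index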
A hash index can be created on one or multiple document attributes. A hash index will only be used by a query if all index attributes are present in the search condition, and if all attributes are compared using the equality (==) operator. Hash indexes are used from within AQL and several query functions, e.g. byExample, firstExample etc.

Hash indexes can optionally be declared unique, then disallowing saving the same value(s) in the indexed attribute(s). Hash indexes can optionally be sparse.

The different types of hash indexes have the following characteristics:

unique hash index: all documents in the collection must have different values for the attributes covered by the unique index. Trying to insert a document with the same key value as an already existing document will lead to a unique constraint violation.
This type of index is not sparse. Documents that do not contain the index attributes or that have a value of null in the index attribute(s) will still be indexed. A key value of null may only occur once in the index, so this type of index cannot be used for optional attributes.
The unique option can also be used to ensure that no duplicate edges are created, by adding a combined index for the fields _from and _to to an edge collection.

unique, sparse hash index: all documents in the collection must have different values for the attributes covered by the unique index. Documents in which at least one of the index attributes is not set or has a value of null are not included in the index. This type of index can be used to ensure that there are no duplicate keys in the collection for documents which have the indexed attributes set. As the index will exclude documents for which the indexed attributes are null or not set, it can be used for optional attributes.

non-unique hash index: all documents in the collection will be indexed. This type of index is not sparse. Documents that do not contain the index attributes or that have a value of null in the index attribute(s) will still be indexed. Duplicate key values can occur and do not lead to unique constraint violations.

non-unique, sparse hash index: only those documents will be indexed that have all the indexed attributes set to a value other than null. It can be used for optional attributes.

The amortized complexity of lookup, insert, update, and removal operations in unique hash indexes is O(1). Non-unique hash indexes have an amortized complexity of O(1) for insert, update, and removal operations. That means non-unique hash indexes can be used on attributes with low cardinality.

If a hash index is created on an attribute that is missing in all or many of the documents, the behavior is as follows:

if the index is sparse, the documents missing the attribute will not be indexed and not use index memory. These documents will not influence the update or removal performance for the index.
if the index is non-sparse, the documents missing the attribute will be contained in the index with a key value of null.

Hash indexes support indexing array values if the index attribute name is extended with a [*].

Skiplist Index

A skiplist is a sorted index structure. It can be used to quickly find documents with specific attribute values, for range queries and for returning documents from the index in sorted order. Skiplists will be used from within AQL and several query functions, e.g. byExample, firstExample etc.
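The following paragraphs assume a skiplist index over two attributes. As a minimal sketch (value1 and value2 are just placeholder attribute names), such an index could be created like this:

db.collection.ensureIndex({ type: "skiplist", fields: [ "value1", "value2" ] });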
Skiplist indexes will be used for lookups, range queries and sorting only if either all index attributes are provided in a query, or if a leftmost prefix of the index attributes is specified.

For example, if a skiplist index is created on attributes value1 and value2, the following filter conditions can use the index (note: the <= and >= operators are intentionally omitted here for the sake of brevity):

FILTER doc.value1 == ...
FILTER doc.value1 < ...
FILTER doc.value1 > ...
FILTER doc.value1 > ... && doc.value1 < ...
FILTER doc.value1 == ... && doc.value2 == ...
FILTER doc.value1 == ... && doc.value2 > ...
FILTER doc.value1 == ... && doc.value2 > ... && doc.value2 < ...

In order to use a skiplist index for sorting, the index attributes must be specified in the SORT clause of the query in the same order as they appear in the index definition. Skiplist indexes are always created in ascending order, but they can be used to access the indexed elements in both ascending or descending order. However, for a combined index (an index on multiple attributes) this requires that the sort orders in a single query as specified in the SORT clause must be either all ascending (optionally omitted as ascending is the default) or all descending.

For example, if the skiplist index is created on attributes value1 and value2 (in this order), then the following sort clauses can use the index for sorting:

SORT value1 ASC, value2 ASC (and its equivalent SORT value1, value2)
SORT value1 DESC, value2 DESC
SORT value1 ASC (and its equivalent SORT value1)
SORT value1 DESC

The following sort clauses cannot make use of the index order, and require an extra sort step:

SORT value1 ASC, value2 DESC
SORT value1 DESC, value2 ASC
SORT value2 (and its equivalent SORT value2 ASC)
SORT value2 DESC (because the first indexed attribute value1 is not used in the sort clause)

Note: the latter two sort clauses cannot use the index because the sort clause does not refer to a leftmost prefix of the index attributes.

Skiplists can optionally be declared unique, disallowing saving the same value in the indexed attribute. They can be sparse or non-sparse.

The different types of skiplist indexes have the following characteristics:

unique skiplist index: all documents in the collection must have different values for the attributes covered by the unique index. Trying to insert a document with the same key value as an already existing document will lead to a unique constraint violation.
This type of index is not sparse. Documents that do not contain the index attributes or that have a value of null in the index attribute(s) will still be indexed. A key value of null may only occur once in the index, so this type of index cannot be used for optional attributes.

unique, sparse skiplist index: all documents in the collection must have different values for the attributes covered by the unique index. Documents in which at least one of the index attributes is not set or has a value of null are not included in the index. This type of index can be used to ensure that there are no duplicate keys in the collection for documents which have the indexed attributes set. As the index will exclude documents for which the indexed attributes are null or not set, it can be used for optional attributes.

non-unique skiplist index: all documents in the collection will be indexed. This type of index is not sparse. Documents that do not contain the index attributes or that have a value of null in the index attribute(s) will still be indexed.
Duplicate key values can occur and do not lead to unique constraint violations.

non-unique, sparse skiplist index: only those documents will be indexed that have all the indexed attributes set to a value other than null. It can be used for optional attributes.

The operational amortized complexity for skiplist indexes is logarithmically correlated with the number of documents in the index.

Skiplist indexes support indexing array values if the index attribute name is extended with a [*].

Persistent Index

The persistent index is a sorted index with persistence. The index entries are written to disk when documents are stored or updated. That means the index entries do not need to be rebuilt from the collection data when the server is restarted or the indexed collection is initially loaded. Thus using persistent indexes may reduce collection loading times.

The persistent index type can be used for secondary indexes at the moment. That means the persistent index currently cannot be made the only index for a collection, because there will always be the in-memory primary index for the collection in addition, and potentially more indexes (such as the edges index for an edge collection).

The index implementation is using the RocksDB engine, and it provides logarithmic complexity for insert, update, and remove operations. As the persistent index is not an in-memory index, it does not store pointers into the primary index as all the in-memory indexes do, but instead it stores a document's primary key. To retrieve a document via a persistent index via an index value lookup, there will therefore be an additional O(1) lookup into the primary index to fetch the actual document.

As the persistent index is sorted, it can be used for point lookups, range queries and sorting operations, but only if either all index attributes are provided in a query, or if a leftmost prefix of the index attributes is specified.

Geo Index

Users can create additional geo indexes on one or multiple attributes in collections. A geo index is used to quickly find places on the surface of the earth. The geo index stores two-dimensional coordinates. It can be created on either two separate document attributes (latitude and longitude) or a single array attribute that contains both latitude and longitude. Latitude and longitude must be numeric values.

The geo index provides operations to find documents with coordinates nearest to a given comparison coordinate, and to find documents with coordinates that are within a specifiable radius around a comparison coordinate.

The geo index is used via dedicated functions in AQL or the simple queries functions, and it is implicitly applied when in AQL a SORT or FILTER is used with the distance function. Otherwise it will not be used for other types of queries or conditions.

Fulltext Index

A fulltext index can be used to find words, or prefixes of words inside documents. A fulltext index can be created on a single attribute only, and will index all words contained in documents that have a textual value in that attribute. Only words with a (specifiable) minimum length are indexed. Word tokenization is done using the word boundary analysis provided by libicu, which is taking into account the selected language provided at server start. Words are indexed in their lower-cased form. The index supports complete match queries (full words) and prefix queries, plus basic logical operations such as and, or and not for combining partial results.
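As a sketch of what such fulltext queries can look like (the articles collection and text attribute are invented names), the FULLTEXT AQL function accepts comma-separated terms, a prefix: qualifier for prefix matching and a leading - to exclude a word:

db.articles.ensureIndex({ type: "fulltext", fields: [ "text" ] });

// documents containing "quick" and a word starting with "bro", but not containing "lazy"
db._query("FOR doc IN FULLTEXT(articles, 'text', 'quick,prefix:bro,-lazy') RETURN doc").toArray();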
The fulltext index is sparse, meaning it will only index documents for which the index attribute is set and contains a string value. Additionally, only words with a configurable minimum length will be included in the index.

The fulltext index is used via dedicated functions in AQL or the simple queries, but will not be enabled for other types of queries or conditions.

Indexing attributes and sub-attributes

Top-level as well as nested attributes can be indexed. For attributes at the top level, the attribute names alone are required. To index a single field, pass an array with a single element (string of the attribute key) to the fields parameter of the ensureIndex() method. To create a combined index over multiple fields, simply add more members to the fields array:

// { name: "Smith", age: 35 }
db.posts.ensureIndex({ type: "hash", fields: [ "name" ] })
db.posts.ensureIndex({ type: "hash", fields: [ "name", "age" ] })

To index sub-attributes, specify the attribute path using the dot notation:

// { name: {last: "Smith", first: "John" } }
db.posts.ensureIndex({ type: "hash", fields: [ "name.last" ] })
db.posts.ensureIndex({ type: "hash", fields: [ "name.last", "name.first" ] })

Indexing array values

If an index attribute contains an array, ArangoDB will store the entire array as the index value by default. Accessing individual members of the array via the index is not possible this way. To make an index insert the individual array members into the index instead of the entire array value, a special array index needs to be created for the attribute. Array indexes can be set up like regular hash or skiplist indexes using the collection.ensureIndex() function. To make a hash or skiplist index an array index, the index attribute name needs to be extended with [*] when creating the index and when filtering in an AQL query using the IN operator.

The following example creates an array hash index on the tags attribute in a collection named posts:

db.posts.ensureIndex({ type: "hash", fields: [ "tags[*]" ] });
db.posts.insert({ tags: [ "foobar", "baz", "quux" ] });

This array index can then be used for looking up individual tags values from AQL queries via the IN operator:

FOR doc IN posts
  FILTER 'foobar' IN doc.tags
  RETURN doc

It is possible to add the array expansion operator [*], but it is not mandatory. You may use it to indicate that an array index is used, it is purely cosmetic however:

FOR doc IN posts
  FILTER 'foobar' IN doc.tags[*]
  RETURN doc

The following FILTER conditions will not use the array index:

FILTER doc.tags ANY == 'foobar'
FILTER doc.tags ANY IN 'foobar'
FILTER doc.tags IN 'foobar'
FILTER doc.tags == 'foobar'
FILTER 'foobar' == doc.tags

It is also possible to create an index on subattributes of array values. This makes sense if the index attribute is an array of objects, e.g.

db.posts.ensureIndex({ type: "hash", fields: [ "tags[*].name" ] });
db.posts.insert({ tags: [ { name: "foobar" }, { name: "baz" }, { name: "quux" } ] });

The following query will then use the array index (this does require the array expansion operator):

FOR doc IN posts
  FILTER 'foobar' IN doc.tags[*].name
  RETURN doc

If you store a document with an array that contains elements without the sub-attribute, this document will also be indexed with the value null, which in ArangoDB is equal to the attribute not existing.

ArangoDB supports creating array indexes with a single [*] operator per index attribute.
For example, creating an index as follows is not supported:

db.posts.ensureIndex({ type: "hash", fields: [ "tags[*].name[*].value" ] });

Array values will automatically be de-duplicated before being inserted into an array index. For example, if the following document is inserted into the collection, the duplicate array value bar will be inserted only once:

db.posts.insert({ tags: [ "foobar", "bar", "bar" ] });

This is done to avoid redundant storage of the same index value for the same document, which would not provide any benefit.

If an array index is declared unique, the de-duplication of array values will happen before inserting the values into the index, so the above insert operation with two identical values bar will not necessarily fail. It will always fail if the index already contains an instance of the bar value. However, if the value bar is not already present in the index, then the de-duplication of the array values will effectively lead to bar being inserted only once.

To turn off the deduplication of array values, it is possible to set the deduplicate attribute on the array index to false. The default value for deduplicate is true however, so de-duplication will take place if not explicitly turned off.

db.posts.ensureIndex({ type: "hash", fields: [ "tags[*]" ], deduplicate: false });

// will fail now
db.posts.insert({ tags: [ "foobar", "bar", "bar" ] });

If an array index is declared and you store documents that do not have an array at the specified attribute, these documents will not be inserted in the index. Hence the following objects will not be indexed:

db.posts.ensureIndex({ type: "hash", fields: [ "tags[*]" ] });
db.posts.insert({ something: "else" });
db.posts.insert({ tags: null });
db.posts.insert({ tags: "this is no array" });
db.posts.insert({ tags: { content: [1, 2, 3] } });

An array index is able to index explicit null values. When queried for null values, it will only return those documents having null explicitly stored in the array, it will not return any documents that do not have the array at all.

db.posts.ensureIndex({ type: "hash", fields: [ "tags[*]" ] });
db.posts.insert({tags: null}) // Will not be indexed
db.posts.insert({tags: []}) // Will not be indexed
db.posts.insert({tags: [null]}); // Will be indexed for null
db.posts.insert({tags: [null, 1, 2]}); // Will be indexed for null, 1 and 2

Declaring an array index as sparse does not have an effect on the array part of the index, this in particular means that explicit null values are also indexed in the sparse version. If an index is combined from an array and a normal attribute the sparsity will apply for the attribute e.g.:

db.posts.ensureIndex({ type: "hash", fields: [ "tags[*]", "name" ], sparse: true });
db.posts.insert({tags: null, name: "alice"}) // Will not be indexed
db.posts.insert({tags: [], name: "alice"}) // Will not be indexed
db.posts.insert({tags: [1, 2, 3]}) // Will not be indexed
db.posts.insert({tags: [1, 2, 3], name: null}) // Will not be indexed
db.posts.insert({tags: [1, 2, 3], name: "alice"}) // Will be indexed for [1, "alice"], [2, "alice"], [3, "alice"]
db.posts.insert({tags: [null], name: "bob"}) // Will be indexed for [null, "bob"]

Please note that filtering using array indexes only works from within AQL queries and only if the query filters on the indexed attribute using the IN operator. The other comparison operators (==, !=, >, >=, <, <=, ANY, ALL, NONE) currently cannot use array indexes.
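To double-check which of these FILTER variants can actually use the array index, the query explainer can be consulted; a minimal sketch, reusing the posts collection and the tags[*] index from the examples above:

var explain = require("@arangodb/aql/explainer").explain;

// expected to show the array hash index on tags[*] being used
explain("FOR doc IN posts FILTER 'foobar' IN doc.tags RETURN doc", {colors: false});

// expected to show a full collection scan, as == cannot use the array index
explain("FOR doc IN posts FILTER doc.tags == 'foobar' RETURN doc", {colors: false});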
Vertex centric indexes

As mentioned above, the most important indexes for graphs are the edge indexes, indexing the _from and _to attributes of edge collections. They provide very quick access to all edges originating in or arriving at a given vertex, which allows to quickly find all neighbours of a vertex in a graph.

In many cases one would like to run more specific queries, for example finding amongst the edges originating in a given vertex only those with the 20 latest time stamps. Exactly this is achieved with "vertex centric indexes". In a sense these are localized indexes for an edge collection, which sit at every single vertex. Technically, they are implemented in ArangoDB as indexes which sort the complete edge collection first by _from and then by other attributes. If we for example have a skiplist index on the attributes _from and timestamp of an edge collection, we can answer the above question very quickly with a single range lookup in the index.

Since ArangoDB 3.0 one can create sorted indexes (type "skiplist" and "persistent") that index the special edge attributes _from or _to and additionally other attributes. Since ArangoDB 3.1, these are used in graph traversals, when appropriate FILTER statements are found by the optimizer.

For example, to create a vertex centric index of the above type, you would simply do

db.edges.ensureIndex({"type":"skiplist", "fields": ["_from", "timestamp"]});

Then, queries like

FOR v, e, p IN 1..1 OUTBOUND "V/1" edges
  FILTER p.edges[*].timestamp ALL >= "2016-11-09"
  RETURN p

will be considerably faster in case there are many edges originating in vertex "V/1" but only few with a recent time stamp.

Which Index to use when

ArangoDB automatically indexes the _key attribute in each collection. There is no need to index this attribute separately. Please note that a document's _id attribute is derived from the _key attribute, and is thus implicitly indexed, too.

ArangoDB will also automatically create an index on _from and _to in any edge collection, meaning incoming and outgoing connections can be determined efficiently.

Index types

Users can define additional indexes on one or multiple document attributes. Several different index types are provided by ArangoDB. These indexes have different usage scenarios:

hash index: provides quick access to individual documents if (and only if) all indexed attributes are provided in the search query. The index will only be used for equality comparisons. It does not support range queries and cannot be used for sorting.
The hash index is a good candidate if all or most queries on the indexed attribute(s) are equality comparisons. The unique hash index provides an amortized complexity of O(1) for insert, update, remove and lookup operations. The non-unique hash index provides O(1) inserts, updates and removes, and will allow looking up documents by index value with amortized O(n) complexity, with n being the number of documents with that index value.
A non-unique hash index on an optional document attribute should be declared sparse so that it will not index documents for which the index attribute is not set.

skiplist index: skiplists keep the indexed values in an order, so they can be used for equality lookups, range queries and for sorting. For high selectivity attributes, skiplist indexes will have a higher overhead than hash indexes. For low selectivity attributes, skiplist indexes will be more efficient than non-unique hash indexes. Additionally, skiplist indexes allow more use cases (e.g.
range queries, sorting) than hash indexes. Furthermore, they can be used for lookups based on a leftmost prefix of the index attributes.

persistent index: a persistent index behaves much like the sorted skiplist index, except that all index values are persisted on disk and do not need to be rebuilt in memory when the server is restarted or the indexed collection is reloaded. The operations in a persistent index have logarithmic complexity, but operations may have a higher constant factor than the operations in a skiplist index, because the persistent index may need to make extra roundtrips to the primary index to fetch the actual documents.
A persistent index can be used for equality lookups, range queries and for sorting. For high selectivity attributes, persistent indexes will have a higher overhead than skiplist or hash indexes.
Persistent indexes allow more use cases (e.g. range queries, sorting) than hash indexes. Furthermore, they can be used for lookups based on a leftmost prefix of the index attributes. In contrast to the in-memory skiplist indexes, persistent indexes do not need to be rebuilt in-memory so they don't influence the loading time of collections as other in-memory indexes do.

geo index: the geo index provided by ArangoDB allows searching for documents within a radius around a two-dimensional earth coordinate (point), or to find documents which are closest to a point. Document coordinates can either be specified in two different document attributes or in a single attribute, e.g.

{ "latitude": 50.9406645, "longitude": 6.9599115 }

or

{ "coords": [ 50.9406645, 6.9599115 ] }

Geo indexes will be invoked via special functions or AQL optimization. The optimization can be triggered when a collection with geo index is enumerated and a SORT or FILTER statement is used in conjunction with the distance function.

fulltext index: a fulltext index can be used to index all words contained in a specific attribute of all documents in a collection. Only words with a (specifiable) minimum length are indexed. Word tokenization is done using the word boundary analysis provided by libicu, which is taking into account the selected language provided at server start.
The index supports complete match queries (full words) and prefix queries. Fulltext indexes will only be invoked via special functions.

Sparse vs. non-sparse indexes

Hash indexes and skiplist indexes can optionally be created sparse. A sparse index does not contain documents for which at least one of the index attributes is not set or contains a value of null.

As such documents are excluded from sparse indexes, they may contain fewer documents than their non-sparse counterparts. This enables faster indexing and can lead to reduced memory usage in case the indexed attribute does occur only in some, but not all documents of the collection. Sparse indexes will also reduce the number of collisions in non-unique hash indexes in case non-existing or optional attributes are indexed.
In order to create a sparse index, an object with the sparse attribute can be added to the index creation commands:

db.collection.ensureIndex({ type: "hash", fields: [ "attributeName" ], sparse: true });
db.collection.ensureIndex({ type: "hash", fields: [ "attributeName1", "attributeName2" ], sparse: true });
db.collection.ensureIndex({ type: "hash", fields: [ "attributeName" ], unique: true, sparse: true });
db.collection.ensureIndex({ type: "hash", fields: [ "attributeName1", "attributeName2" ], unique: true, sparse: true });
db.collection.ensureIndex({ type: "skiplist", fields: [ "attributeName" ], sparse: true });
db.collection.ensureIndex({ type: "skiplist", fields: [ "attributeName1", "attributeName2" ], sparse: true });
db.collection.ensureIndex({ type: "skiplist", fields: [ "attributeName" ], unique: true, sparse: true });
db.collection.ensureIndex({ type: "skiplist", fields: [ "attributeName1", "attributeName2" ], unique: true, sparse: true });

When not explicitly set, the sparse attribute defaults to false for new indexes. Indexes other than hash and skiplist do not support sparsity.

As sparse indexes may exclude some documents from the collection, they cannot be used for all types of queries. Sparse hash indexes cannot be used to find documents for which at least one of the indexed attributes has a value of null. For example, the following AQL query cannot use a sparse index, even if one was created on attribute attr:

FOR doc IN collection
  FILTER doc.attr == null
  RETURN doc

If the lookup value is non-constant, a sparse index may or may not be used, depending on the other types of conditions in the query. If the optimizer can safely determine that the lookup value cannot be null, a sparse index may be used. When uncertain, the optimizer will not make use of a sparse index in a query in order to produce correct results.

For example, the following queries cannot use a sparse index on attr because the optimizer will not know beforehand whether the values which are compared to doc.attr will include null:

FOR doc IN collection
  FILTER doc.attr == SOME_FUNCTION(...)
  RETURN doc

FOR other IN otherCollection
  FOR doc IN collection
    FILTER doc.attr == other.attr
    RETURN doc

Sparse skiplist indexes can be used for sorting if the optimizer can safely detect that the index range does not include null for any of the index attributes.

Note that if you intend to use joins it may be clever to use non-sparsity and maybe even uniqueness for that attribute, else all items containing the null value will match against each other and thus produce large results.

Index Utilization

In most cases ArangoDB will use a single index per collection in a given query. AQL queries can use more than one index per collection when multiple FILTER conditions are combined with a logical OR and these can be covered by indexes. AQL queries will use a single index per collection when FILTER conditions are combined with logical AND.

Creating multiple indexes on different attributes of the same collection may give the query optimizer more choices when picking an index. Creating multiple indexes on different attributes can also help in speeding up different queries, with FILTER conditions on different attributes.

It is often beneficial to create an index on more than just one attribute. By adding more attributes to an index, an index can become more selective and thus reduce the number of documents that queries need to process.
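As an illustration of this point (the orders collection and its status and createdAt attributes are invented for this sketch), a combined skiplist index is usually far more selective than an index on a low-cardinality attribute alone, and it can additionally serve a range condition on the second attribute:

db.orders.ensureIndex({ type: "hash", fields: [ "status" ] });                  // few distinct values, low selectivity
db.orders.ensureIndex({ type: "skiplist", fields: [ "status", "createdAt" ] }); // more selective combined index

// the combined index can cover the equality on status plus the range on createdAt
db._query("FOR o IN orders FILTER o.status == 'open' && o.createdAt >= '2018-01-01' RETURN o").toArray();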
ArangoDB's primary indexes, edge indexes and hash indexes will automatically provide selectivity estimates. Index selectivity estimates are provided in the web interface, the getIndexes() return value and in the explain() output for a given query.

The more selective an index is, the more documents it will filter on average. The index selectivity estimates are therefore used by the optimizer when creating query execution plans when there are multiple indexes the optimizer can choose from. The optimizer will then select a combination of indexes with the lowest estimated total cost. In general, the optimizer will pick the indexes with the highest estimated selectivity.

Sparse indexes may or may not be picked by the optimizer in a query. As sparse indexes do not contain null values, they will not be used for queries if the optimizer cannot safely determine whether a FILTER condition includes null values for the index attributes. The optimizer policy is to produce correct results, regardless of whether or which index is used to satisfy FILTER conditions. If it is unsure about whether using an index will violate the policy, it will not make use of the index.

Troubleshooting

When in doubt about whether and which indexes will be used for executing a given AQL query, click the Explain button in the web interface in the Queries view or use the explain() method for the statement as follows (from the ArangoShell):

var query = "FOR doc IN collection FILTER doc.value > 42 RETURN doc";
var stmt = db._createStatement(query);
stmt.explain();

The explain() command will return a detailed JSON representation of the query's execution plan. The JSON explain output is intended to be used by code. To get a human-readable and much more compact explanation of the query, there is an explainer tool:

var query = "FOR doc IN collection FILTER doc.value > 42 RETURN doc";
require("@arangodb/aql/explainer").explain(query);

If any of the explain methods shows that a query is not using indexes, the following steps may help:

check if the attribute names in the query are correctly spelled. In a schema-free database, documents in the same collection can have varying structures. There is no such thing as a non-existing attribute error. A query that refers to attribute names not present in any of the documents will not return an error, and obviously will not benefit from indexes.

check the return value of the getIndexes() method for the collections used in the query and validate that indexes are actually present on the attributes used in the query's filter conditions.

if indexes are present but not used by the query, the indexes may have the wrong type. For example, a hash index will only be used for equality comparisons (i.e. ==) but not for other comparison types such as <, <=, >, >=. Additionally hash indexes will only be used if all of the index attributes are used in the query's FILTER conditions. A skiplist index will only be used if at least its first attribute is used in a FILTER condition. If additionally more of the skiplist index attributes are specified in the query (from left-to-right), they may also be used and allow to filter more documents.

using indexed attributes as function parameters or in arbitrary expressions will likely lead to the index on the attribute not being used.
For example, the following queries will not use an index on value:

FOR doc IN collection FILTER TO_NUMBER(doc.value) == 42 RETURN doc
FOR doc IN collection FILTER doc.value - 1 == 42 RETURN doc

In these cases the queries should be rewritten so that only the index attribute is present on one side of the operator, or additional filters and indexes should be used to restrict the amount of documents otherwise.

certain AQL functions such as WITHIN() or FULLTEXT() do utilize indexes internally, but their use is not mentioned in the query explanation for functions in general. These functions will raise query errors (at runtime) if no suitable index is present for the collection in question.

the query optimizer will in general pick one index per collection in a query. It can pick more than one index per collection if the FILTER condition contains multiple branches combined with logical OR. For example, the following queries can use indexes:

FOR doc IN collection FILTER doc.value1 == 42 || doc.value1 == 23 RETURN doc
FOR doc IN collection FILTER doc.value1 == 42 || doc.value2 == 23 RETURN doc
FOR doc IN collection FILTER doc.value1 < 42 || doc.value2 > 23 RETURN doc

The two ORs in the first query will be converted to an IN list, and if there is a suitable index on value1, it will be used. The second query requires two separate indexes on value1 and value2 and will use them if present. The third query can use the indexes on value1 and value2 when they are sorted.

Working with Indexes

Index Identifiers and Handles

An index handle uniquely identifies an index in the database. It is a string and consists of the collection name and an index identifier separated by a /. The index identifier part is a numeric value that is auto-generated by ArangoDB.

A specific index of a collection can be accessed using its index handle or index identifier as follows:

db.collection.index("<index-handle>");
db.collection.index("<index-identifier>");
db._index("<index-handle>");

For example: Assume that the index handle, which is stored in the _id attribute of the index, is demo/362549736 and the index was created in a collection named demo. Then this index can be accessed as:

db.demo.index("demo/362549736");

Because the index handle is unique within the database, you can leave out the collection and use the shortcut:

db._index("demo/362549736");

Collection Methods

Listing all indexes of a collection

returns information about the indexes
getIndexes()

Returns an array of all indexes defined for the collection. Note that _key implicitly has an index assigned to it.

arangosh> db.test.ensureHashIndex("hashListAttribute",
........> "hashListSecondAttribute.subAttribute");
arangosh> db.test.getIndexes();

show execution results

Creating an index

Indexes should be created using the general method ensureIndex. This method obsoletes the specialized index-specific methods ensureHashIndex, ensureSkiplist, ensureUniqueConstraint etc.

ensures that an index exists
collection.ensureIndex(index-description)

Ensures that an index according to the index-description exists. A new index will be created if none exists with the given description.

The index-description must contain at least a type attribute. Other attributes may be necessary, depending on the index type.

type can be one of the following values:
hash: hash index
skiplist: skiplist index
fulltext: fulltext index
geo1: geo index, with one attribute
geo2: geo index, with two attributes

sparse can be true or false.
For hash and skiplist indexes the sparsity can be controlled; fulltext and geo indexes are sparse by definition.

unique can be true or false and is supported by hash or skiplist indexes.

deduplicate can be true or false and is supported by array indexes of type hash or skiplist. It controls whether inserting duplicate index values from the same document into a unique array index will lead to a unique constraint error or not. The default value is true, so only a single instance of each non-unique index value will be inserted into the index per document. Trying to insert a value into the index that already exists in the index will always fail, regardless of the value of this attribute.

Calling this method returns an index object. Whether or not the index object existed before the call is indicated in the return attribute isNewlyCreated.

Examples

arangosh> db.test.ensureIndex({ type: "hash", fields: [ "a" ], sparse: true });
arangosh> db.test.ensureIndex({ type: "hash", fields: [ "a", "b" ], unique: true });

show execution results

Dropping an index

drops an index
collection.dropIndex(index)

Drops the index. If the index does not exist, then false is returned. If the index existed and was dropped, then true is returned. Note that you cannot drop some special indexes (e.g. the primary index of a collection or the edge index of an edge collection).

collection.dropIndex(index-handle)

Same as above. Instead of an index an index handle can be given.

arangosh> db.example.ensureSkiplist("a", "b");
arangosh> var indexInfo = db.example.getIndexes();
arangosh> indexInfo;
arangosh> db.example.dropIndex(indexInfo[0])
arangosh> db.example.dropIndex(indexInfo[1].id)
arangosh> indexInfo = db.example.getIndexes();

show execution results

Load Indexes into Memory

Loads all indexes of this collection into memory.
collection.loadIndexesIntoMemory()

This function tries to cache all index entries of this collection into the main memory. Therefore it iterates over all indexes of the collection and stores the indexed values, not the entire document data, in memory. All lookups that could be found in the cache are much faster than lookups not stored in the cache so you get a nice performance boost. It is also guaranteed that the cache is consistent with the stored data.

For the time being this function is only useful on the RocksDB storage engine, as in the MMFiles engine all indexes are in memory anyways.

On RocksDB this function honors all memory limits: if the indexes you want to load are smaller than your memory limit, this function guarantees that most index values are cached. If the index is larger than your memory limit, this function will fill up values up to this limit and for the time being there is no way to control which indexes of the collection should have priority over others.

arangosh> db.example.loadIndexesIntoMemory();
{
  "result" : true
}

Database Methods

Fetching an index by handle

finds an index
db._index(index-handle)

Returns the index with index-handle or null if no such index exists.

arangosh> db.example.ensureIndex({ type: "skiplist", fields: [ "a", "b" ] });
arangosh> var indexInfo = db.example.getIndexes().map(function(x) { return x.id; });
arangosh> indexInfo;
arangosh> db._index(indexInfo[0])
arangosh> db._index(indexInfo[1])

show execution results

Dropping an index

drops an index
db._dropIndex(index)

Drops the index. If the index does not exist, then false is returned. If the index existed and was dropped, then true is returned.

db._dropIndex(index-handle)

Drops the index with index-handle.
arangosh> db.example.ensureIndex({ type: "skiplist", fields: [ "a", "b" ] });
arangosh> var indexInfo = db.example.getIndexes();
arangosh> indexInfo;
arangosh> db._dropIndex(indexInfo[0])
arangosh> db._dropIndex(indexInfo[1].id)
arangosh> indexInfo = db.example.getIndexes();

show execution results

Revalidating whether an index is used

finds an index

So you've created an index, and since its maintenance isn't for free, you definitely want to know whether your query can utilize it.

You can use explain to verify whether skiplists or hash indexes are used (if you omit colors: false you will get nice colors in the ArangoShell):

arangosh> var explain = require("@arangodb/aql/explainer").explain;
arangosh> db.example.ensureIndex({ type: "skiplist", fields: [ "a", "b" ] });
arangosh> explain("FOR doc IN example FILTER doc.a < 23 RETURN doc", {colors:false});

show execution results

Hash Indexes

Introduction to Hash Indexes

It is possible to define a hash index on one or more attributes (or paths) of a document. This hash index is then used in queries to locate documents in O(1) operations. If the hash index is unique, then no two documents are allowed to have the same set of attribute values.

Creating a new document or updating a document will fail if the uniqueness is violated. If the index is declared sparse, a document will be excluded from the index and no uniqueness checks will be performed if any index attribute value is not set or has a value of null.

Accessing Hash Indexes from the Shell

Unique Hash Indexes

Ensures that a unique constraint exists:
collection.ensureIndex({ type: "hash", fields: [ "field1", ..., "fieldn" ], unique: true })

Creates a unique hash index on all documents using field1, ... fieldn as attribute paths. At least one attribute path has to be given. The index will be non-sparse by default.

All documents in the collection must differ in terms of the indexed attributes. Creating a new document or updating an existing document will fail if the attribute uniqueness is violated.

To create a sparse unique index, set the sparse attribute to true:

collection.ensureIndex({ type: "hash", fields: [ "field1", ..., "fieldn" ], unique: true, sparse: true })

In case that the index was successfully created, the index identifier is returned.

Non-existing attributes will default to null. In a sparse index all documents will be excluded from the index for which all specified index attributes are null. Such documents will not be taken into account for uniqueness checks.

In a non-sparse index, all documents regardless of null-attributes will be indexed and will be taken into account for uniqueness checks.

In case that the index was successfully created, an object with the index details, including the index-identifier, is returned.

arangosh> db.test.ensureIndex({ type: "hash", fields: [ "a", "b.c" ], unique: true });
arangosh> db.test.save({ a : 1, b : { c : 1 } });
arangosh> db.test.save({ a : 1, b : { c : 1 } });
arangosh> db.test.save({ a : 1, b : { c : null } });
arangosh> db.test.save({ a : 1 });

show execution results

Non-unique Hash Indexes

Ensures that a non-unique hash index exists:
collection.ensureIndex({ type: "hash", fields: [ "field1", ..., "fieldn" ] })

Creates a non-unique hash index on all documents using field1, ... fieldn as attribute paths. At least one attribute path has to be given. The index will be non-sparse by default.
To create a sparse non-unique index, set the sparse attribute to true:

collection.ensureIndex({ type: "hash", fields: [ "field1", ..., "fieldn" ], sparse: true })

In case that the index was successfully created, an object with the index details, including the index-identifier, is returned.

arangosh> db.test.ensureIndex({ type: "hash", fields: [ "a" ] });
arangosh> db.test.save({ a : 1 });
arangosh> db.test.save({ a : 1 });
arangosh> db.test.save({ a : null });

show execution results

Hash Array Indexes

Ensures that a hash array index exists (non-unique):
collection.ensureIndex({ type: "hash", fields: [ "field1[*]", ..., "fieldn[*]" ] })

Creates a non-unique hash array index for the individual elements of the array attributes field1[*], ... fieldn[*] found in the documents. At least one attribute path has to be given. The index always treats the indexed arrays as sparse.

It is possible to combine array indexing with standard indexing:

collection.ensureIndex({ type: "hash", fields: [ "field1[*]", "field2" ] })

In case that the index was successfully created, an object with the index details, including the index-identifier, is returned.

arangosh> db.test.ensureIndex({ type: "hash", fields: [ "a[*]" ] });
arangosh> db.test.save({ a : [ 1, 2 ] });
arangosh> db.test.save({ a : [ 1, 3 ] });
arangosh> db.test.save({ a : null });

show execution results

Ensure uniqueness of relations in edge collections

It is possible to create secondary indexes using the edge attributes _from and _to, starting with ArangoDB 3.0. A combined index over both fields together with the unique option enabled can be used to prevent duplicate relations from being created.

For example, a document collection verts might contain vertices with the document handles verts/A, verts/B and verts/C. Relations between these documents can be stored in an edge collection edges for instance. Now, you may want to make sure that the vertex verts/A is never linked to verts/B by an edge more than once. This can be achieved by adding a unique, non-sparse hash index for the fields _from and _to:

db.edges.ensureIndex({ type: "hash", fields: [ "_from", "_to" ], unique: true });

Creating an edge { _from: "verts/A", _to: "verts/B" } in edges will be accepted, but only once. Another attempt to store an edge with the relation A → B will be rejected by the server with a unique constraint violated error. This includes updates to the _from and _to fields.

Note that adding a relation B → A is still possible, so is A → A and B → B, because they are all different relations in a directed graph. Each one can only occur once however.

Skiplists

Introduction to Skiplist Indexes

This is an introduction to ArangoDB's skiplists.

It is possible to define a skiplist index on one or more attributes (or paths) of documents. This skiplist is then used in queries to locate documents within a given range. If the skiplist is declared unique, then no two documents are allowed to have the same set of attribute values.

Creating a new document or updating a document will fail if the uniqueness is violated. If the skiplist index is declared sparse, a document will be excluded from the index and no uniqueness checks will be performed if any index attribute value is not set or has a value of null.
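For example, a skiplist index can serve both a range FILTER and a SORT in the same query; a minimal sketch with an invented events collection and timestamp attribute:

db.events.ensureIndex({ type: "skiplist", fields: [ "timestamp" ] });

// range lookup plus sorted output, both covered by the skiplist index
db._query("FOR e IN events FILTER e.timestamp >= 100 && e.timestamp < 200 SORT e.timestamp RETURN e").toArray();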
Accessing Skiplist Indexes from the Shell

Unique Skiplist Index

Ensures that a unique skiplist index exists:
collection.ensureIndex({ type: "skiplist", fields: [ "field1", ..., "fieldn" ], unique: true })

Creates a unique skiplist index on all documents using field1, ... fieldn as attribute paths. At least one attribute path has to be given. The index will be non-sparse by default.

All documents in the collection must differ in terms of the indexed attributes. Creating a new document or updating an existing document will fail if the attribute uniqueness is violated.

To create a sparse unique index, set the sparse attribute to true:

collection.ensureIndex({ type: "skiplist", fields: [ "field1", ..., "fieldn" ], unique: true, sparse: true })

In a sparse index all documents will be excluded from the index that do not contain at least one of the specified index attributes or that have a value of null in any of the specified index attributes. Such documents will not be indexed, and not be taken into account for uniqueness checks.

In a non-sparse index, these documents will be indexed (for non-present indexed attributes, a value of null will be used) and will be taken into account for uniqueness checks.

In case that the index was successfully created, an object with the index details, including the index-identifier, is returned.

arangosh> db.ids.ensureIndex({ type: "skiplist", fields: [ "myId" ], unique: true });
arangosh> db.ids.save({ "myId": 123 });
arangosh> db.ids.save({ "myId": 456 });
arangosh> db.ids.save({ "myId": 789 });
arangosh> db.ids.save({ "myId": 123 });

show execution results

arangosh> db.ids.ensureIndex({ type: "skiplist", fields: [ "name.first", "name.last" ], unique: true });
arangosh> db.ids.save({ "name" : { "first" : "hans", "last": "hansen" }});
arangosh> db.ids.save({ "name" : { "first" : "jens", "last": "jensen" }});
arangosh> db.ids.save({ "name" : { "first" : "hans", "last": "jensen" }});
arangosh> db.ids.save({ "name" : { "first" : "hans", "last": "hansen" }});

show execution results

Non-unique Skiplist Index

Ensures that a non-unique skiplist index exists:
collection.ensureIndex({ type: "skiplist", fields: [ "field1", ..., "fieldn" ] })

Creates a non-unique skiplist index on all documents using field1, ... fieldn as attribute paths. At least one attribute path has to be given. The index will be non-sparse by default.

To create a sparse non-unique index, set the sparse attribute to true.

collection.ensureIndex({ type: "skiplist", fields: [ "field1", ..., "fieldn" ], sparse: true })

In case that the index was successfully created, an object with the index details, including the index-identifier, is returned.

arangosh> db.names.ensureIndex({ type: "skiplist", fields: [ "first" ] });
arangosh> db.names.save({ "first" : "Tim" });
arangosh> db.names.save({ "first" : "Tom" });
arangosh> db.names.save({ "first" : "John" });
arangosh> db.names.save({ "first" : "Tim" });
arangosh> db.names.save({ "first" : "Tom" });

show execution results

Skiplist Array Index

Ensures that a skiplist array index exists (non-unique):
collection.ensureIndex({ type: "skiplist", fields: [ "field1[*]", ..., "fieldn[*]" ] })

Creates a non-unique skiplist array index for the individual elements of the array attributes field1[*], ... fieldn[*] found in the documents. At least one attribute path has to be given. The index always treats the indexed arrays as sparse.
It is possible to combine array indexing with standard indexing:

collection.ensureIndex({ type: "skiplist", fields: [ "field1[*]", "field2" ] })

In case that the index was successfully created, an object with the index details, including the index-identifier, is returned.

arangosh> db.test.ensureIndex({ type: "skiplist", fields: [ "a[*]" ] });
arangosh> db.test.save({ a : [ 1, 2 ] });
arangosh> db.test.save({ a : [ 1, 3 ] });
arangosh> db.test.save({ a : null });

show execution results

Query by example using a skiplist index

Constructs a query-by-example using a skiplist index:
collection.byExample(example)

Selects all documents from the collection that match the specified example and returns a cursor. A skiplist index will be used if present.

You can use toArray, next, or hasNext to access the result. The result can be limited using the skip and limit operator.

An attribute name of the form a.b is interpreted as attribute path, not as attribute. If you use

{ "a" : { "c" : 1 } }

as example, then you will find all documents, such that the attribute a contains a document of the form {c : 1 }. For example the document

{ "a" : { "c" : 1 }, "b" : 1 }

will match, but the document

{ "a" : { "c" : 1, "b" : 1 } }

will not.

However, if you use

{ "a.c" : 1 }

then you will find all documents, which contain a sub-document in a that has an attribute c of value 1. Both the following documents

{ "a" : { "c" : 1 }, "b" : 1 }

and

{ "a" : { "c" : 1, "b" : 1 } }

will match.

Persistent indexes

Introduction to Persistent Indexes

This is an introduction to ArangoDB's persistent indexes.

It is possible to define a persistent index on one or more attributes (or paths) of documents. The index is then used in queries to locate documents within a given range. If the index is declared unique, then no two documents are allowed to have the same set of attribute values.

Creating a new document or updating a document will fail if the uniqueness is violated. If the index is declared sparse, a document will be excluded from the index and no uniqueness checks will be performed if any index attribute value is not set or has a value of null.

Accessing Persistent Indexes from the Shell

ensures that a unique persistent index exists
collection.ensureIndex({ type: "persistent", fields: [ "field1", ..., "fieldn" ], unique: true })

Creates a unique persistent index on all documents using field1, ... fieldn as attribute paths. At least one attribute path has to be given. The index will be non-sparse by default.

All documents in the collection must differ in terms of the indexed attributes. Creating a new document or updating an existing document will fail if the attribute uniqueness is violated.

To create a sparse unique index, set the sparse attribute to true:

collection.ensureIndex({ type: "persistent", fields: [ "field1", ..., "fieldn" ], unique: true, sparse: true })

In a sparse index all documents will be excluded from the index that do not contain at least one of the specified index attributes or that have a value of null in any of the specified index attributes. Such documents will not be indexed, and not be taken into account for uniqueness checks.

In a non-sparse index, these documents will be indexed (for non-present indexed attributes, a value of null will be used) and will be taken into account for uniqueness checks.

In case that the index was successfully created, an object with the index details, including the index-identifier, is returned.
arangosh> db.ids.ensureIndex({ type: "persistent", fields: [ "myId" ], unique: true });
arangosh> db.ids.save({ "myId": 123 });
arangosh> db.ids.save({ "myId": 456 });
arangosh> db.ids.save({ "myId": 789 });
arangosh> db.ids.save({ "myId": 123 });

show execution results

arangosh> db.ids.ensureIndex({ type: "persistent", fields: [ "name.first", "name.last" ], unique: true });
arangosh> db.ids.save({ "name" : { "first" : "hans", "last": "hansen" }});
arangosh> db.ids.save({ "name" : { "first" : "jens", "last": "jensen" }});
arangosh> db.ids.save({ "name" : { "first" : "hans", "last": "jensen" }});
arangosh> db.ids.save({ "name" : { "first" : "hans", "last": "hansen" }});

show execution results

ensures that a non-unique persistent index exists
collection.ensureIndex({ type: "persistent", fields: [ "field1", ..., "fieldn" ] })

Creates a non-unique persistent index on all documents using field1, ... fieldn as attribute paths. At least one attribute path has to be given. The index will be non-sparse by default.

To create a sparse non-unique index, set the sparse attribute to true.

In case that the index was successfully created, an object with the index details, including the index-identifier, is returned.

arangosh> db.names.ensureIndex({ type: "persistent", fields: [ "first" ] });
arangosh> db.names.save({ "first" : "Tim" });
arangosh> db.names.save({ "first" : "Tom" });
arangosh> db.names.save({ "first" : "John" });
arangosh> db.names.save({ "first" : "Tim" });
arangosh> db.names.save({ "first" : "Tom" });

show execution results

Query by example using a persistent index

constructs a query-by-example using a persistent index
collection.byExample(example)

Selects all documents from the collection that match the specified example and returns a cursor. A persistent index will be used if present.

You can use toArray, next, or hasNext to access the result. The result can be limited using the skip and limit operator.

An attribute name of the form a.b is interpreted as attribute path, not as attribute. If you use

{ "a" : { "c" : 1 } }

as example, then you will find all documents, such that the attribute a contains a document of the form {c : 1 }. For example the document

{ "a" : { "c" : 1 }, "b" : 1 }

will match, but the document

{ "a" : { "c" : 1, "b" : 1 } }

will not.

However, if you use

{ "a.c" : 1 }

then you will find all documents, which contain a sub-document in a that has an attribute c of value 1. Both the following documents

{ "a" : { "c" : 1 }, "b" : 1 }

and

{ "a" : { "c" : 1, "b" : 1 } }

will match.

Persistent Indexes and Server Language

The order of index entries in persistent indexes adheres to the configured server language. If, however, the server is restarted with a different language setting than the one used when the persistent index was created, not all documents may be returned anymore and the sort order of those which are returned can be wrong (whenever the persistent index is consulted).

To fix persistent indexes after a language change, delete and re-create them. Skiplist indexes are not affected, because they are not persisted and automatically rebuilt on every server start.

Fulltext indexes

This is an introduction to ArangoDB's fulltext indexes.

Introduction to Fulltext Indexes

A fulltext index can be used to find words, or prefixes of words inside documents. A fulltext index can be defined on one attribute only, and will include all words contained in documents that have a textual value in the index attribute.
Since ArangoDB 2.6 the index will also include words from the index attribute if the index attribute is an array of strings, or an object with string value members.

For example, given a fulltext index on the translations attribute and the following documents, then searching for лиса using the fulltext index would return only the first document. Searching the index for the exact string Fox would return the first two documents, and searching for prefix:Fox would return all three documents:

{ translations: { en: "fox", de: "Fuchs", fr: "renard", ru: "лиса" } }
{ translations: "Fox is the English translation of the German word Fuchs" }
{ translations: [ "ArangoDB", "document", "database", "Foxx" ] }

Note that deeper nested objects are ignored. For example, a fulltext index on translations would index Fuchs, but not fox, given the following document structure:

{ translations: { en: { US: "fox" }, de: "Fuchs" } }

If you need to search across multiple fields and/or nested objects, you may write all the strings into a special attribute, which you then create the index on (it might be necessary to clean the strings first, e.g. remove line breaks and strip certain words).

If the index attribute is neither a string, an object or an array, its contents will not be indexed. When indexing the contents of an array attribute, an array member will only be included in the index if it is a string. When indexing the contents of an object attribute, an object member value will only be included in the index if it is a string. Other data types are ignored and not indexed.

Currently, fulltext indexes are not yet supported with the RocksDB storage engine. Thus the function FULLTEXT() will be unavailable when using this storage engine. To use fulltext indexes, please use the MMFiles storage engine for the time being.

Accessing Fulltext Indexes from the Shell

Ensures that a fulltext index exists:
collection.ensureIndex({ type: "fulltext", fields: [ "field" ], minLength: minLength })

Creates a fulltext index on all documents on attribute field.

Fulltext indexes are implicitly sparse: all documents which do not have the specified field attribute or that have a non-qualifying value in their field attribute will be ignored for indexing.

Only a single attribute can be indexed. Specifying multiple attributes is unsupported.

The minimum length of words that are indexed can be specified via the minLength parameter. Words shorter than minLength characters will not be indexed. minLength has a default value of 2, but this value might be changed in future versions of ArangoDB. It is thus recommended to explicitly specify this value.

In case that the index was successfully created, an object with the index details is returned.
arangosh> db.example.ensureIndex({ type: "fulltext", fields: [ "text" ], minLength: 3 });
arangosh> db.example.save({ text : "the quick brown", b : { c : 1 } });
arangosh> db.example.save({ text : "quick brown fox", b : { c : 2 } });
arangosh> db.example.save({ text : "brown fox jums", b : { c : 3 } });
arangosh> db.example.save({ text : "fox jumps over", b : { c : 4 } });
arangosh> db.example.save({ text : "jumps over the", b : { c : 5 } });
arangosh> db.example.save({ text : "over the lazy", b : { c : 6 } });
arangosh> db.example.save({ text : "the lazy dog", b : { c : 7 } });
arangosh> db._query("FOR document IN FULLTEXT(example, 'text', 'the') RETURN document");
show execution results

Looks up a fulltext index:

collection.lookupFulltextIndex(attribute, minLength)

Checks whether a fulltext index on the given attribute attribute exists.

Fulltext AQL Functions

Fulltext AQL functions are detailed in Fulltext functions.
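Besides exact word matches, the FULLTEXT() function also accepts the prefix: qualifier shown earlier in this chapter. As a small sketch (reusing the example collection and text attribute created above; not part of the original examples), a prefix search could look like this:

arangosh> db._query("FOR doc IN FULLTEXT(example, 'text', 'prefix:qui') RETURN doc.text").toArray();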
Geo Indexes

Introduction to Geo Indexes

This is an introduction to ArangoDB's geo indexes. AQL's geographic features are described in Geo functions.

ArangoDB uses Hilbert curves to implement geo-spatial indexes. See this blog for details.

A geo-spatial index assumes that the latitude is between -90 and 90 degrees and the longitude is between -180 and 180 degrees. A geo index will ignore all documents which do not fulfill these requirements.

Accessing Geo Indexes from the Shell

Ensures that a geo index exists:

collection.ensureIndex({ type: "geo", fields: [ "location" ] })

Creates a geo-spatial index on all documents using location as the path to the coordinates. The value of the attribute has to be an array with at least two numeric values. The array must contain the latitude (first value) and the longitude (second value).

All documents which do not have the attribute path, or have a non-conforming value in it, are excluded from the index.

A geo index is implicitly sparse, and there is no way to control its sparsity.

In case that the index was successfully created, an object with the index details, including the index-identifier, is returned.

To create a geo index on an array attribute that contains longitude first, set the geoJson attribute to true. This corresponds to the format described in RFC 7946 Position:

collection.ensureIndex({ type: "geo", fields: [ "location" ], geoJson: true })

To create a geo-spatial index on all documents using latitude and longitude as separate attribute paths, two paths need to be specified in the fields array:

collection.ensureIndex({ type: "geo", fields: [ "latitude", "longitude" ] })

In case that the index was successfully created, an object with the index details, including the index-identifier, is returned.

Examples

Create a geo index for an array attribute:

arangosh> db.geo.ensureIndex({ type: "geo", fields: [ "loc" ] });
arangosh> for (i = -90; i <= 90; i += 10) {
........>   for (j = -180; j <= 180; j += 10) {
........>     db.geo.save({ name : "Name/" + i + "/" + j, loc: [ i, j ] });
........>   }
........> }
arangosh> db.geo.count();
arangosh> db.geo.near(0, 0).limit(3).toArray();
arangosh> db.geo.near(0, 0).count();
show execution results

Create a geo index for a hash array attribute:

arangosh> db.geo2.ensureIndex({ type: "geo", fields: [ "location.latitude", "location.longitude" ] });
arangosh> for (i = -90; i <= 90; i += 10) {
........>   for (j = -180; j <= 180; j += 10) {
........>     db.geo2.save({ name : "Name/" + i + "/" + j, location: { latitude : i, longitude : j } });
........>   }
........> }
arangosh> db.geo2.near(0, 0).limit(3).toArray();
show execution results

Use a geo index with the AQL SORT statement:

arangosh> db.geoSort.ensureIndex({ type: "geo", fields: [ "latitude", "longitude" ] });
arangosh> for (i = -90; i <= 90; i += 10) {
........>   for (j = -180; j <= 180; j += 10) {
........>     db.geoSort.save({ name : "Name/" + i + "/" + j, latitude : i, longitude : j });
........>   }
........> }
arangosh> var query = "FOR doc in geoSort SORT DISTANCE(doc.latitude, doc.longitude, 0, 0) LIMIT 5 RETURN doc"
arangosh> db._explain(query, {}, {colors: false});
arangosh> db._query(query);
show execution results

Use a geo index with the AQL FILTER statement:

arangosh> db.geoFilter.ensureIndex({ type: "geo", fields: [ "latitude", "longitude" ] });
arangosh> for (i = -90; i <= 90; i += 10) {
........>   for (j = -180; j <= 180; j += 10) {
........>     db.geoFilter.save({ name : "Name/" + i + "/" + j, latitude : i, longitude : j });
........>   }
........> }
arangosh> var query = "FOR doc in geoFilter FILTER DISTANCE(doc.latitude, doc.longitude, 0, 0) < 2000 RETURN doc"
arangosh> db._explain(query, {}, {colors: false});
arangosh> db._query(query);
show execution results

Constructs a geo index selection:

collection.geo(location_attribute)

Looks up a geo index defined on attribute location_attribute. Returns a geo index object if an index was found. The near or within operators can then be used to execute a geo-spatial query on this particular index. This is useful for collections with multiple defined geo indexes.

collection.geo(location_attribute, true)

Looks up a geo index on a compound attribute location_attribute. Returns a geo index object if an index was found. The near or within operators can then be used to execute a geo-spatial query on this particular index.

collection.geo(latitude_attribute, longitude_attribute)

Looks up a geo index defined on the two attributes latitude_attribute and longitude_attribute. Returns a geo index object if an index was found. The near or within operators can then be used to execute a geo-spatial query on this particular index.

Note: this method is not yet supported by the RocksDB storage engine.

Note: the geo simple query helper function is deprecated as of ArangoDB 2.6. The function may be removed in future versions of ArangoDB. The preferred way for running geo queries is to use their AQL equivalents.
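For instance, the geoSort example above can be extended to also return the computed distance in meters, similar to what the distance() query helper described below provides. This is a minimal AQL sketch and not part of the original examples:

arangosh> db._query("FOR doc IN geoSort SORT DISTANCE(doc.latitude, doc.longitude, 0, 0) LIMIT 3 RETURN MERGE(doc, { distance: DISTANCE(doc.latitude, doc.longitude, 0, 0) })").toArray();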
Examples

Assume you have a location stored as a list in the attribute home and a destination stored in the attribute work. Then you can use the geo operator to select which geo-spatial attributes (and thus which index) to use in a near query.

arangosh> for (i = -90; i <= 90; i += 10) {
........>   for (j = -180; j <= 180; j += 10) {
........>     db.complex.save({ name : "Name/" + i + "/" + j,
........>                       home : [ i, j ],
........>                       work : [ -i, -j ] });
........>   }
........> }
arangosh> db.complex.near(0, 170).limit(5);
arangosh> db.complex.ensureIndex({ type: "geo", fields: [ "home" ] });
arangosh> db.complex.near(0, 170).limit(5).toArray();
arangosh> db.complex.geo("work").near(0, 170).limit(5);
arangosh> db.complex.ensureIndex({ type: "geo", fields: [ "work" ] });
arangosh> db.complex.geo("work").near(0, 170).limit(5).toArray();
show execution results

Constructs a near query for a collection:

collection.near(latitude, longitude)

The returned list is sorted according to the distance, with the nearest document to the coordinate (latitude, longitude) coming first. If there are near documents of equal distance, documents are chosen randomly from this set until the limit is reached. It is possible to change the limit using the limit operator.

In order to use the near operator, a geo index must be defined for the collection. This index also defines which attribute holds the coordinates for the document. If you have more than one geo-spatial index, you can use the geo operator to select a particular index.

Note: near does not support negative skips. However, you can still use limit followed by skip.

collection.near(latitude, longitude).limit(limit)

Limits the result to limit documents instead of the default 100.

Note: Unlike with multiple explicit limits, limit will raise the implicit default limit imposed by within.

collection.near(latitude, longitude).distance()

This will add an attribute distance to all documents returned, which contains the distance between the given point and the document in meters.

collection.near(latitude, longitude).distance(name)

This will add an attribute name to all documents returned, which contains the distance between the given point and the document in meters.

Note: this method is not yet supported by the RocksDB storage engine.

Note: the near simple query function is deprecated as of ArangoDB 2.6. The function may be removed in future versions of ArangoDB. The preferred way for retrieving documents from a collection using the near operator is to use the AQL NEAR function in an AQL query as follows:

FOR doc IN NEAR(@@collection, @latitude, @longitude, @limit) RETURN doc

Examples

To get the nearest two locations:

arangosh> db.geo.ensureIndex({ type: "geo", fields: [ "loc" ] });
arangosh> for (var i = -90; i <= 90; i += 10) {
........>   for (var j = -180; j <= 180; j += 10) {
........>     db.geo.save({
........>       name : "Name/" + i + "/" + j,
........>       loc: [ i, j ] });
........>   }
........> }
arangosh> db.geo.near(0, 0).limit(2).toArray();
show execution results

If you need the distance as well, then you can use the distance operator:

arangosh> db.geo.ensureIndex({ type: "geo", fields: [ "loc" ] });
arangosh> for (var i = -90; i <= 90; i += 10) {
........>   for (var j = -180; j <= 180; j += 10) {
........>     db.geo.save({
........>       name : "Name/" + i + "/" + j,
........>       loc: [ i, j ] });
........>   }
........> }
arangosh> db.geo.near(0, 0).distance().limit(2).toArray();
show execution results
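The parameterized AQL query shown above can be executed from arangosh by supplying bind variables. This is a minimal sketch (not part of the original examples), reusing the geo collection; note that the collection bind parameter @@collection is bound under the key "@collection":

arangosh> db._query(
........>   "FOR doc IN NEAR(@@collection, @latitude, @longitude, @limit) RETURN doc",
........>   { "@collection": "geo", latitude: 0, longitude: 0, limit: 2 }
........> ).toArray();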
Constructs a within query for a collection:

collection.within(latitude, longitude, radius)

This will find all documents within a given radius around the coordinate (latitude, longitude). The returned array is sorted by distance, beginning with the nearest document.

In order to use the within operator, a geo index must be defined for the collection. This index also defines which attribute holds the coordinates for the document. If you have more than one geo-spatial index, you can use the geo operator to select a particular index.

collection.within(latitude, longitude, radius).distance()

This will add an attribute _distance to all documents returned, which contains the distance between the given point and the document in meters.

collection.within(latitude, longitude, radius).distance(name)

This will add an attribute name to all documents returned, which contains the distance between the given point and the document in meters.

Note: this method is not yet supported by the RocksDB storage engine.

Note: the within simple query function is deprecated as of ArangoDB 2.6. The function may be removed in future versions of ArangoDB. The preferred way for retrieving documents from a collection using the within operator is to use the AQL WITHIN function in an AQL query as follows:

FOR doc IN WITHIN(@@collection, @latitude, @longitude, @radius, @distanceAttributeName) RETURN doc

Examples

To find all documents within a radius of 2000 km use:

arangosh> for (var i = -90; i <= 90; i += 10) {
........>   for (var j = -180; j <= 180; j += 10) {
........>     db.geo.save({ name : "Name/" + i + "/" + j, loc: [ i, j ] });
........>   }
........> }
arangosh> db.geo.within(0, 0, 2000 * 1000).distance().toArray();
show execution results

Ensures that a geo index exists:

collection.ensureIndex({ type: "geo", fields: [ "location" ] })

Since ArangoDB 2.5, this method is an alias for ensureGeoIndex, since geo indexes are always sparse, meaning that documents that do not contain the index attributes or have non-numeric values in the index attributes will not be indexed. ensureGeoConstraint is deprecated and ensureGeoIndex should be used instead.

The index does not provide a unique option because of its limited usability. It would only prevent identical coordinates from being inserted, but even a slightly different location (like 1 inch or 1 cm off) would be unique again and not considered a duplicate, although it probably should be. The desired threshold for detecting duplicates may vary for every project (including how to calculate the distance) and needs to be implemented on the application layer as needed. You can write a Foxx service for this purpose and make use of the AQL geo functions to find nearby coordinates supported by a geo index.

Vertex Centric Indexes

Introduction to Vertex Centric Indexes

In ArangoDB there are special indexes designed to speed up graph operations, especially if the graph contains supernodes (vertices that have an exceptionally high number of connected edges). These indexes are called vertex-centric indexes and can be used in addition to the existing edge index.

Motivation

The idea of this index is to index a combination of a vertex, the direction and any arbitrary set of other attributes on the edges. To take an example, if we have an attribute called type on the edges, we can use an outbound vertex-centric index on this attribute to find all edges attached to a vertex with a given type. The following query example could benefit from such an index:

FOR v, e, p IN 3..5 OUTBOUND @start GRAPH @graphName
  FILTER p.edges[*].type ALL == "friend"
  RETURN v

Using the built-in edge index, ArangoDB can find the list of all edges attached to the vertex fast, but it still has to walk through this list and check if all of them have the attribute type == "friend".
Using a vertex-centric index would allow ArangoDB to find all edges for the vertex that have the attribute type == "friend" in the same amount of time, and saves the iteration needed to verify the condition.

Index creation

A vertex-centric index can be either of the following types:

Hash Index
Skiplist Index
Persistent Index

and is created using their respective creation operations. However, in the list of fields used to create the index we have to include either _from or _to. Let us again explain this by an example. Assume we want to create a hash-based outbound vertex-centric index on the attribute type. This can be created in the following way:

arangosh> db.collection.ensureIndex({ type: "hash", fields: [ "_from", "type" ] })
show execution results

All options that are supported by the respective indexes are supported by the vertex-centric index as well.

Index usage

The AQL optimizer can decide to use a vertex-centric index whenever suitable; however, it is not guaranteed that this index is used, as the optimizer may estimate that another index is better. The optimizer will consider this type of index on explicit filtering of _from respectively _to:

FOR edge IN collection
  FILTER edge._from == "vertices/123456" AND edge.type == "friend"
  RETURN edge

and during pattern matching queries:

FOR v, e, p IN 3..5 OUTBOUND @start GRAPH @graphName
  FILTER p.edges[*].type ALL == "friend"
  RETURN v

ArangoDB Graphs

First Steps with Graphs

A Graph consists of vertices and edges. Edges are stored as documents in edge collections. A vertex can be a document of a document collection or of an edge collection (so edges can be used as vertices). Which collections are used within a named graph is defined via edge definitions. A named graph can contain more than one edge definition; at least one is needed.

Graphs allow you to structure your models in line with your domain and group them logically in collections, giving you the power to query them in the same graph queries.

New to graphs? Take our free graph course for freshers and get from zero knowledge to advanced query techniques.

Coming from a relational background - what's a graph?

In SQL you commonly have the construct of a relation table to store n:m relations between two data tables. An edge collection is somewhat similar to these relation tables; vertex collections resemble the data tables with the objects to connect. While simple graph queries with a fixed number of hops via the relation table may be doable in SQL with several nested joins, graph databases can handle an arbitrary number of these hops over edge collections - this is called traversal.

Also, edges in one edge collection may point to several vertex collections. It's common to have attributes attached to edges, i.e. a label naming this interconnection. Edges have a direction, with their relations _from and _to pointing from one document to another document stored in vertex collections. In queries you can define in which directions the edge relations may be followed (OUTBOUND: _from → _to, INBOUND: _from ← _to, ANY: _from ↔ _to).

Named Graphs

Named graphs are completely managed by ArangoDB, and thus also visible in the web interface. They use the full spectrum of ArangoDB's graph features. You may access them via several interfaces.
AQL Graph Operations with several flavors: AQL Traversals on both named and anonymous graphs AQL Shortest Path on both named and anonymous graph JavaScript General Graph implementation, as you may use it in Foxx Services Graph M anagement; creating & manipualating graph definitions; inserting, updating and deleting vertices and edges into graphs Graph Functions for working with edges and vertices, to analyze them and their relations JavaScript Smart Graph implementation, for scalable graphs Smart Graph M anagement; creating & manipualating SmartGraph definitions; Differences to General Graph RESTful General Graph interface used to implement graph management in client drivers Manipulating collections of named graphs with regular document functions The underlying collections of the named graphs are still accessible using the standard methods for collections. However the graph module adds an additional layer on top of these collections giving you the following guarantees: All modifications are executed transactional If you delete a vertex all edges will be deleted, you will never have loose ends If you insert an edge it is checked if the edge matches the edge definitions, your edge collections will only contain valid edges These guarantees are lost if you access the collections in any other way than the graph module or AQL, so if you delete documents from your vertex collections directly, the edges pointing to them will be remain in place. Anonymous graphs Sometimes you may not need all the powers of named graphs, but some of its bits may be valuable to you. You may use anonymous graphs in the traversals and in the Working with Edges chapter. Anonymous graphs don't have edge definitions describing which vertex collection is connected by which edge collection. The graph model has to be maintained in the client side code. This gives you more freedom than the strict named graphs. 124 Graphs AQL Graph Operations are available for both, named and anonymous graphs: AQL Traversals AQL Shortest Path When to choose anonymous or named graphs? As noted above, named graphs ensure graph integrity, both when inserting or removing edges or vertices. So you won't encounter dangling edges, even if you use the same vertex collection in several named graphs. This involves more operations inside the database which come at a cost. Therefore anonymous graphs may be faster in many operations. So this question may be narrowed down to: 'Can I afford the additional effort or do I need the warranty for integrity?'. Multiple edge collections vs. FILTER s on edge document attributes If you want to only traverse edges of a specific type, there are two ways to achieve this. The first would be an attribute in the edge document - i.e. can later type , where you specify a differentiator for the edge - i.e. FILTER e.type = "friends" "friends" , "family" , "married" or , so you "workmates" if you only want to follow the friend edges. Another way, which may be more efficient in some cases, is to use different edge collections for different types of edges, so you have friend_edges , family_edges , and married_edges workmate_edges as collection names. You can then configure several named graphs including a subset of the available edge and vertex collections - or you use anonymous graph queries, where you specify a list of edge collections to take into account in that query. To only follow friend edges, you would specify Both approaches have advantages and disadvantages. FILTER friend_edges as sole edge collection. 
operations on edge attributes will do comparisons on each traversed edge, which may become CPU-intense. When not finding the edges in the first place because of the collection containing them is not traversed at all, there will never be a reason to actualy check for their type attribute with FILTER . The multiple edge collections approach is limited by the number of collections that can be used simultaneously in one query. Every collection used in a query requires some resources inside of ArangoDB and the number is therefore limited to cap the resource requirements. You may also have constraints on other edge attributes, such as a hash index with a unique constraint, which requires the documents to be in a single collection for the uniqueness guarantee, and it may thus not be possible to store the different types of edges in multiple edge collections. So, if your edges have about a dozen different types, it's okay to choose the collection approach, otherwise the preferred. You can still use FILTER operations on edges of course. You can get rid of a FILTER on the type FILTER approach is with the former approach, everything else can stay the same. Which part of my data is an Edge and which a Vertex? The main objects in your data model, such as users, groups or articles, are usually considered to be vertices. For each type of object, a document collection (also called vertex collection) should store the individual entities. Entities can be connected by edges to express and classify relations between vertices. It often makes sense to have an edge collection per relation type. ArangoDB does not require you to store your data in graph structures with edges and vertices, you can also decide to embed attributes such as which groups a user is part of, or _id s of documents in another document instead of connecting the documents with edges. It can be a meaningful performance optimization for 1:n relationships, if your data is not focused on relations and you don't need graph traversal with varying depth. It usually means to introduce some redundancy and possibly inconsistencies if you embed data, but it can be an acceptable tradeoff. Vertices Let's say we have two vertex collections, Users and Groups i.e. when it was founded, its subject, an icon URL and so on. . Documents in the Users Groups collection contain the attributes of the Group, documents contain the data specific to a user - like all names, birthdays, Avatar URLs, hobbies... Edges We can use an edge collection to store relations between users and groups. Since multiple users may be in an arbitrary number of groups, this is an m:n relation. The edge collection can be called _to pointing to Groups/BowlingGroupHappyPin UsersInGroups with i.e. one edge with _from pointing to Users/John and . This makes the user John a member of the group Bowling Group Happy Pin. 125 Graphs Attributes of this relation may contain qualifiers to this relation, like the permissions of John in this group, the date when he joined the group etc. So roughly put, if you use documents and their attributes in a sentence, nouns would typically be vertices, verbs become the edges. You can see this in the knows graph below: Alice knows Bob, who in term knows Charlie. Advantages of this approach Graphs give you the advantage of not just being able to have a fixed number of m:n relations in a row, but an arbitrary number. Edges can be traversed in both directions, so it's easy to determine all groups a user is in, but also to find out which members a certain group has. 
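As a small AQL sketch of these two directions (assuming the Users, Groups and UsersInGroups collections described above; these are illustrative names, not part of the shipped example data):

FOR group IN 1..1 OUTBOUND 'Users/John' UsersInGroups
  RETURN group

FOR member IN 1..1 INBOUND 'Groups/BowlingGroupHappyPin' UsersInGroups
  RETURN member

The first query lists the groups the user John is a member of, the second one lists the members of the bowling group, both over the same edge collection.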
Users could also be interconnected to create a social network. Using the graph data model, dealing with data that has lots of relations stays manageable and can be queried in very flexible ways, whereas it would cause headache to handle it in a relational database system. Backup and restore For sure you want to have backups of your graph data, you can use Arangodump to create the backup, and Arangorestore to restore a backup into a new ArangoDB. You should however note that: you need the system collection _graphs if you backup named graphs. you need to backup the complete set of all edge and vertex collections your graph consists of. Partial dump/restore may not work. Managing graphs By default you should use the interface your driver provides to manage graphs. This is i.e. documented in Graphs-Section of the ArangoDB Java driver. Example Graphs ArangoDB comes with a set of easily graspable graphs that are used to demonstrate the APIs. You can use the create graph window in the webinterface, or load the module @arangodb/graph-examples/example-graph add samples tab in the in arangosh and use it to create instances of these graphs in your ArangoDB. Once you've created them, you can inspect them in the webinterface - which was used to create the pictures below. You can easily look into the innards of this script for reference about howto manage graphs programatically. The Knows_Graph 126 Graphs A set of persons knowing each other: The knows graph consists of one vertex collection persons connected via one edge collection knows . It will contain five persons Alice, Bob, Charlie, Dave and Eve. We will have the following directed relations: Alice knows Bob Bob knows Charlie Bob knows Dave Eve knows Alice Eve knows Bob This is how we create it, inspect its vertices and edges, and drop it again: arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var g = examples.loadGraph("knows_graph"); arangosh> db.persons.toArray() arangosh> db.knows.toArray(); arangosh> examples.dropGraph("knows_graph"); show execution results The Social Graph A set of persons and their relations: 127 Graphs This example has female and male persons as vertices in two vertex collections the relation female and male . The edges are their connections in edge collection. This is how we create it, inspect its vertices and edges, and drop it again: arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("social"); arangosh> db.female.toArray() arangosh> db.male.toArray() arangosh> db.relation.toArray() arangosh> examples.dropGraph("social"); show execution results The City Graph A set of european cities, and their fictional traveling distances as connections: 128 Graphs The example has the cities as vertices in several vertex collections several edge collections french / german / international Highway germanCity and frenchCity . The edges are their interconnections in . This is how we create it, inspect its edges and vertices, and drop it again: arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var g = examples.loadGraph("routeplanner"); arangosh> db.frenchCity.toArray(); arangosh> db.germanCity.toArray(); arangosh> db.germanHighway.toArray(); arangosh> db.frenchHighway.toArray(); arangosh> db.internationalHighway.toArray(); arangosh> examples.dropGraph("routeplanner"); show execution results The Traversal Graph This graph was designed to demonstrate filters in traversals. 
It has some labels to filter on it. 129 Graphs The example has all its vertices in the circles collection, and an edges edge collection to connect them. Circles have unique numeric labels. Edges have two boolean attributes (theFalse always being false, theTruth always being true) and a label sorting B - D to the left side, G K to the right side. Left and right side split into Paths - at B and G which are each direct neighbours of the root-node A. Starting from A the graph has a depth of 3 on all its paths. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var g = examples.loadGraph("traversalGraph"); arangosh> db.circles.toArray(); arangosh> db.edges.toArray(); arangosh> examples.dropGraph("traversalGraph"); show execution results The World Graph The world country graph structures its nodes like that: world → continent → country → capital. In some cases edge directions aren't forward (therefore it will be displayed disjunct in the graph viewer). It has two ways of creating it. One using the named graph utilities (worldCountry), one without (worldCountryUnManaged). It is used to demonstrate raw traversal operations. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var g = examples.loadGraph("worldCountry"); arangosh> db.worldVertices.toArray(); arangosh> db.worldEdges.toArray(); arangosh> examples.dropGraph("worldCountry"); arangosh> var g = examples.loadGraph("worldCountryUnManaged"); arangosh> examples.dropGraph("worldCountryUnManaged"); show execution results 130 Graphs Cookbook examples The above referenced chapters describe the various APIs of ArangoDBs graph engine with small examples. Our cookbook has some more real life examples: Traversing a graph in full depth Using an example vertex with the java driver Retrieving documents from ArangoDB without knowing the structure Using a custom visitor from node.js AQL Example Queries on an Actors and M ovies Database Higher volume graph examples All of the above examples are rather small so they are easier to comprehend and can demonstrate the way the functionality works. There are however several datasets freely available on the web that are a lot bigger. We collected some of them with import scripts so you may play around with them. Another huge graph is the Pokec social network from Slovakia that we used for performance testing on several databases; You will find importing scripts etc. in this blogpost. 131 General Graphs Graphs This chapter describes the general-graph module. It allows you to define a graph that is spread across several edge and document collections. This allows you to structure your models in line with your domain and group them logically in collections giving you the power to query them in the same graph queries. There is no need to include the referenced collections within the query, this module will handle it for you. 
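For example, once a named graph has been defined with this module, a traversal can reference it by name only (a minimal sketch; the graph name matches the example that follows, while the start vertex 'customer/alice' is a placeholder and not part of the original examples):

FOR vertex, edge IN 1..2 OUTBOUND 'customer/alice' GRAPH 'myGraph'
  RETURN vertex

The edge and vertex collections to traverse are resolved from the graph definition automatically.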
Three Steps to create a graph Create a graph arangosh> var graph_module = require("@arangodb/general-graph"); arangosh> var graph = graph_module._create("myGraph"); arangosh> graph; {[Graph] } Add some vertex collections arangosh> graph._addVertexCollection("shop"); arangosh> graph._addVertexCollection("customer"); arangosh> graph._addVertexCollection("pet"); arangosh> graph; show execution results Define relations on the Graph arangosh> var rel = graph_module._relation("isCustomer", ["shop"], ["customer"]); arangosh> graph._extendEdgeDefinitions(rel); arangosh> graph; show execution results 132 Graph M anagement Graph Management This chapter describes the javascript interface for creating and modifying named graphs. In order to create a non empty graph the functionality to create edge definitions has to be introduced first: Edge Definitions An edge definition is always a directed relation of a graph. Each graph can have arbitrary many relations defined within the edge definitions array. Initialize the list Create a list of edge definitions to construct a graph. graph_module._edgeDefinitions(relation1, relation2, ..., relationN) The list of edge definitions of a graph can be managed by the graph module itself. This function is the entry point for the management and will return the correct list. Parameters relationX (optional) An object representing a definition of one relation in the graph Examples arangosh> var graph_module = require("@arangodb/general-graph"); arangosh> directed_relation = graph_module._relation("lives_in", "user", "city"); arangosh> undirected_relation = graph_module._relation("knows", "user", "user"); arangosh> edgedefinitions = graph_module._edgeDefinitions(directed_relation, undirected_relation); show execution results Extend the list Extend the list of edge definitions to construct a graph. graph_module._extendEdgeDefinitions(edgeDefinitions, relation1, relation2, ..., relationN) In order to add more edge definitions to the graph before creating this function can be used to add more definitions to the initial list. Parameters edgeDefinitions (required) A list of relation definition objects. relationX (required) An object representing a definition of one relation in the graph Examples arangosh> var graph_module = require("@arangodb/general-graph"); arangosh> directed_relation = graph_module._relation("lives_in", "user", "city"); arangosh> undirected_relation = graph_module._relation("knows", "user", "user"); arangosh> edgedefinitions = graph_module._edgeDefinitions(directed_relation); arangosh> edgedefinitions = graph_module._extendEdgeDefinitions(undirected_relation); show execution results Relation Define a directed relation. 133 Graph M anagement graph_module._relation(relationName, fromVertexCollections, toVertexCollections) The relationName defines the name of this relation and references to the underlying edge collection. The fromVertexCollections is an Array of document collections holding the start vertices. The toVertexCollections is an Array of document collections holding the target vertices. Relations are only allowed in the direction from any collection in fromVertexCollections to any collection in toVertexCollections. Parameters relationName (required) The name of the edge collection where the edges should be stored. Will be created if it does not yet exist. fromVertexCollections (required) One or a list of collection names. Source vertices for the edges have to be stored in these collections. Collections will be created if they do not exist. 
toVertexCollections (required) One or a list of collection names. Target vertices for the edges have to be stored in these collections. Collections will be created if they do not exist. Examples arangosh> var graph_module = require("@arangodb/general-graph"); arangosh> graph_module._relation("has_bought", ["Customer", "Company"], ["Groceries", "Electronics"]); show execution results arangosh> var graph_module = require("@arangodb/general-graph"); arangosh> graph_module._relation("has_bought", "Customer", "Product"); show execution results Create a graph After having introduced edge definitions a graph can be created. Create a graph graph_module._create(graphName, edgeDefinitions, orphanCollections) The creation of a graph requires the name of the graph and a definition of its edges. For every type of edge definition a convenience method exists that can be used to create a graph. Optionally a list of vertex collections can be added, which are not used in any edge definition. These collections are referred to as orphan collections within this chapter. All collections used within the creation process are created if they do not exist. Parameters graphName (required) Unique identifier of the graph edgeDefinitions (optional) List of relation definition objects orphanCollections (optional) List of additional vertex collection names Examples Create an empty graph, edge definitions can be added at runtime: arangosh> var graph_module = require("@arangodb/general-graph"); arangosh> graph = graph_module._create("myGraph"); {[Graph] } Create a graph using an edge collection edges and a single vertex collection vertices arangosh> var graph_module = require("@arangodb/general-graph"); arangosh> var edgeDefinitions = [ { collection: "edges", "from": [ "vertices" ], "to" : [ "vertices" ] } ]; 134 Graph M anagement arangosh> graph = graph_module._create("myGraph", edgeDefinitions); {[Graph] "edges" : [ArangoCollection 16646, "edges" (type edge, status loaded)], "vertices" : [ArangoCollection 16641, "vertices" (type document, status loaded)] } Create a graph with edge definitions and orphan collections: arangosh> var graph_module = require("@arangodb/general-graph"); arangosh> graph = graph_module._create("myGraph", ........> [graph_module._relation("myRelation", ["male", "female"], ["male", "female"])], ["sessions"]); show execution results Complete Example to create a graph Example Call: arangosh> var graph_module = require("@arangodb/general-graph"); arangosh> var edgeDefinitions = graph_module._edgeDefinitions(); arangosh> graph_module._extendEdgeDefinitions(edgeDefinitions, graph_module._relation("friend_of", "Customer", "Customer")); arangosh> graph_module._extendEdgeDefinitions( ........> edgeDefinitions, graph_module._relation( ........> "has_bought", ["Customer", "Company"], ["Groceries", "Electronics"])); arangosh> graph_module._create("myStore", edgeDefinitions); show execution results alternative call: arangosh> var graph_module = require("@arangodb/general-graph"); arangosh> var edgeDefinitions = graph_module._edgeDefinitions( ........> graph_module._relation("friend_of", ["Customer"], ["Customer"]), graph_module._relation( ........> "has_bought", ["Customer", "Company"], ["Groceries", "Electronics"])); arangosh> graph_module._create("myStore", edgeDefinitions); show execution results List available graphs List all graphs. graph_module._list() Lists all graph names stored in this database. 
Examples arangosh> var graph_module = require("@arangodb/general-graph"); arangosh> graph_module._list(); [ ] Load a graph 135 Graph M anagement Get a graph graph_module._graph(graphName) A graph can be retrieved by its name. Parameters graphName (required) Unique identifier of the graph Examples Get a graph: arangosh> var graph_module = require("@arangodb/general-graph"); arangosh> graph = graph_module._graph("social"); show execution results Remove a graph Remove a graph graph_module._drop(graphName, dropCollections) A graph can be dropped by its name. This can drop all collections contained in the graph as long as they are not used within other graphs. To drop the collections only belonging to this graph, the optional parameter drop-collections has to be set to true. Parameters graphName (required) Unique identifier of the graph dropCollections (optional) Define if collections should be dropped (default: false) Examples Drop a graph and keep collections: arangosh> var graph_module = require("@arangodb/general-graph"); arangosh> graph_module._drop("social"); true arangosh> db._collection("female"); [ArangoCollection 16733, "female" (type document, status loaded)] arangosh> db._collection("male"); [ArangoCollection 16736, "male" (type document, status loaded)] arangosh> db._collection("relation"); [ArangoCollection 16739, "relation" (type edge, status loaded)] arangosh> var graph_module = require("@arangodb/general-graph"); arangosh> graph_module._drop("social", true); true arangosh> db._collection("female"); null arangosh> db._collection("male"); null arangosh> db._collection("relation"); null Modify a graph definition during runtime After you have created an graph its definition is not immutable. You can still add, delete or modify edge definitions and vertex collections. 136 Graph M anagement Extend the edge definitions Add another edge definition to the graph graph._extendEdgeDefinitions(edgeDefinition) Extends the edge definitions of a graph. If an orphan collection is used in this edge definition, it will be removed from the orphanage. If the edge collection of the edge definition to add is already used in the graph or used in a different graph with different from and/or to collections an error is thrown. Parameters edgeDefinition (required) The relation definition to extend the graph Examples arangosh> var graph_module = require("@arangodb/general-graph") arangosh> var ed1 = graph_module._relation("myEC1", ["myVC1"], ["myVC2"]); arangosh> var ed2 = graph_module._relation("myEC2", ["myVC1"], ["myVC3"]); arangosh> var graph = graph_module._create("myGraph", [ed1]); arangosh> graph._extendEdgeDefinitions(ed2); Modify an edge definition M odify an relation definition graph_module._editEdgeDefinition(edgeDefinition) Edits one relation definition of a graph. The edge definition used as argument will replace the existing edge definition of the graph which has the same collection. Vertex Collections of the replaced edge definition that are not used in the new definition will transform to an orphan. Orphans that are used in this new edge definition will be deleted from the list of orphans. Other graphs with the same edge definition will be modified, too. Parameters edgeDefinition (required) The edge definition to replace the existing edge definition with the same attribute collection. 
Examples arangosh> var graph_module = require("@arangodb/general-graph") arangosh> var original = graph_module._relation("myEC1", ["myVC1"], ["myVC2"]); arangosh> var modified = graph_module._relation("myEC1", ["myVC2"], ["myVC3"]); arangosh> var graph = graph_module._create("myGraph", [original]); arangosh> graph._editEdgeDefinitions(modified); Delete an edge definition Delete one relation definition graph_module._deleteEdgeDefinition(edgeCollectionName, dropCollection) Deletes a relation definition defined by the edge collection of a graph. If the collections defined in the edge definition (collection, from, to) are not used in another edge definition of the graph, they will be moved to the orphanage. Parameters edgeCollectionName (required) Name of edge collection in the relation definition. dropCollection (optional) Define if the edge collection should be dropped. Default false. Examples Remove an edge definition but keep the edge collection: arangosh> var graph_module = require("@arangodb/general-graph") 137 Graph M anagement arangosh> var ed1 = graph_module._relation("myEC1", ["myVC1"], ["myVC2"]); arangosh> var ed2 = graph_module._relation("myEC2", ["myVC1"], ["myVC3"]); arangosh> var graph = graph_module._create("myGraph", [ed1, ed2]); arangosh> graph._deleteEdgeDefinition("myEC1"); arangosh> db._collection("myEC1"); [ArangoCollection 22374, "myEC1" (type edge, status loaded)] Remove an edge definition and drop the edge collection: arangosh> var graph_module = require("@arangodb/general-graph") arangosh> var ed1 = graph_module._relation("myEC1", ["myVC1"], ["myVC2"]); arangosh> var ed2 = graph_module._relation("myEC2", ["myVC1"], ["myVC3"]); arangosh> var graph = graph_module._create("myGraph", [ed1, ed2]); arangosh> graph._deleteEdgeDefinition("myEC1", true); arangosh> db._collection("myEC1"); null Extend vertex Collections Each graph can have an arbitrary amount of vertex collections, which are not part of any edge definition of the graph. These collections are called orphan collections. If the graph is extended with an edge definition using one of the orphans, it will be removed from the set of orphan collection automatically. Add a vertex collection Add a vertex collection to the graph graph._addVertexCollection(vertexCollectionName, createCollection) Adds a vertex collection to the set of orphan collections of the graph. If the collection does not exist, it will be created. If it is already used by any edge definition of the graph, an error will be thrown. Parameters vertexCollectionName (required) Name of vertex collection. createCollection (optional) If true the collection will be created if it does not exist. Default: true. Examples arangosh> var graph_module = require("@arangodb/general-graph"); arangosh> var ed1 = graph_module._relation("myEC1", ["myVC1"], ["myVC2"]); arangosh> var graph = graph_module._create("myGraph", [ed1]); arangosh> graph._addVertexCollection("myVC3", true); Get the orphaned collections Get all orphan collections graph._orphanCollections() Returns all vertex collections of the graph that are not used in any edge definition. 
Examples arangosh> var graph_module = require("@arangodb/general-graph") arangosh> var ed1 = graph_module._relation("myEC1", ["myVC1"], ["myVC2"]); arangosh> var graph = graph_module._create("myGraph", [ed1]); arangosh> graph._addVertexCollection("myVC3", true); arangosh> graph._orphanCollections(); 138 Graph M anagement [ "myVC3" ] Remove a vertex collection Remove a vertex collection from the graph graph._removeVertexCollection(vertexCollectionName, dropCollection) Removes a vertex collection from the graph. Only collections not used in any relation definition can be removed. Optionally the collection can be deleted, if it is not used in any other graph. Parameters vertexCollectionName (required) Name of vertex collection. dropCollection (optional) If true the collection will be dropped if it is not used in any other graph. Default: false. Examples arangosh> var graph_module = require("@arangodb/general-graph") arangosh> var ed1 = graph_module._relation("myEC1", ["myVC1"], ["myVC2"]); arangosh> var graph = graph_module._create("myGraph", [ed1]); arangosh> graph._addVertexCollection("myVC3", true); arangosh> graph._addVertexCollection("myVC4", true); arangosh> graph._orphanCollections(); arangosh> graph._removeVertexCollection("myVC3"); arangosh> graph._orphanCollections(); show execution results Manipulating Vertices Save a vertex Create a new vertex in vertexCollectionName graph.vertexCollectionName.save(data) Parameters data (required) JSON data of vertex. Examples arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("social"); arangosh> graph.male.save({name: "Floyd", _key: "floyd"}); show execution results Replace a vertex Replaces the data of a vertex in collection vertexCollectionName graph.vertexCollectionName.replace(vertexId, data, options) Parameters vertexId (required) _id attribute of the vertex data (required) JSON data of vertex. 139 Graph M anagement options (optional) See collection documentation Examples arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("social"); arangosh> graph.male.save({neym: "Jon", _key: "john"}); arangosh> graph.male.replace("male/john", {name: "John"}); show execution results Update a vertex Updates the data of a vertex in collection vertexCollectionName graph.vertexCollectionName.update(vertexId, data, options) Parameters vertexId (required) _id attribute of the vertex data (required) JSON data of vertex. options (optional) See collection documentation Examples arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("social"); arangosh> graph.female.save({name: "Lynda", _key: "linda"}); arangosh> graph.female.update("female/linda", {name: "Linda", _key: "linda"}); show execution results Remove a vertex Removes a vertex in collection vertexCollectionName graph.vertexCollectionName.remove(vertexId, options) Additionally removes all ingoing and outgoing edges of the vertex recursively (see edge remove). 
Parameters vertexId (required) _id attribute of the vertex options (optional) See collection documentation Examples arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("social"); arangosh> graph.male.save({name: "Kermit", _key: "kermit"}); arangosh> db._exists("male/kermit") arangosh> graph.male.remove("male/kermit") arangosh> db._exists("male/kermit") show execution results Manipulating Edges Save a new edge 140 Graph M anagement Creates an edge from vertex from to vertex to in collection edgeCollectionName graph.edgeCollectionName.save(from, to, data, options) Parameters from (required) _id attribute of the source vertex to (required) _id attribute of the target vertex data (required) JSON data of the edge options (optional) See collection documentation Examples arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("social"); arangosh> graph.relation.save("male/bob", "female/alice", {type: "married", _key: "bobAndAlice"}); show execution results If the collections of from and to are not defined in an edge definition of the graph, the edge will not be stored. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("social"); arangosh> graph.relation.save( ........> ........> "relation/aliceAndBob", "female/alice", ........> {type: "married", _key: "bobAndAlice"}); [ArangoError 1906: invalid edge between relation/aliceAndBob and female/alice. Doesn't conform to any edge definition] Replace an edge Replaces the data of an edge in collection edgeCollectionName. Note that _from and _to are mandatory. graph.edgeCollectionName.replace(edgeId, data, options) Parameters edgeId (required) _id attribute of the edge data (required) JSON data of the edge options (optional) See collection documentation Examples arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("social"); arangosh> graph.relation.save("female/alice", "female/diana", {typo: "nose", _key: "aliceAndDiana"}); arangosh> graph.relation.replace("relation/aliceAndDiana", {type: "knows", _from: "female/alice", _to: "female/diana"}); show execution results Update an edge Updates the data of an edge in collection edgeCollectionName graph.edgeCollectionName.update(edgeId, data, options) Parameters 141 Graph M anagement edgeId (required) _id attribute of the edge data (required) JSON data of the edge options (optional) See collection documentation Examples arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("social"); arangosh> graph.relation.save("female/alice", "female/diana", {type: "knows", _key: "aliceAndDiana"}); arangosh> graph.relation.update("relation/aliceAndDiana", {type: "quarreled", _key: "aliceAndDiana"}); show execution results Remove an edge Removes an edge in collection edgeCollectionName graph.edgeCollectionName.remove(edgeId, options) If this edge is used as a vertex by another edge, the other edge will be removed (recursively). 
Parameters edgeId (required) _id attribute of the edge options (optional) See collection documentation Examples arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("social"); arangosh> graph.relation.save("female/alice", "female/diana", {_key: "aliceAndDiana"}); arangosh> db._exists("relation/aliceAndDiana") arangosh> graph.relation.remove("relation/aliceAndDiana") arangosh> db._exists("relation/aliceAndDiana") show execution results Connect edges Get all connecting edges between 2 groups of vertices defined by the examples graph._getConnectingEdges(vertexExample, vertexExample2, options) The function accepts an id, an example, a list of examples or even an empty example as parameter for vertexExample. Parameters vertexExample1 (optional) See Definition of examples vertexExample2 (optional) See Definition of examples options (optional) An object defining further options. Can have the following values: edgeExamples: Filter the edges, see Definition of examples edgeCollectionRestriction : One or a list of edge-collection names that should be considered to be on the path. vertex1CollectionRestriction : One or a list of vertex-collection names that should be considered on the intermediate vertex steps. vertex2CollectionRestriction : One or a list of vertex-collection names that should be considered on the intermediate vertex steps. Examples A route planner example, all connecting edges between capitals. 142 Graph M anagement arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._getConnectingEdges({isCapital : true}, {isCapital : true}); [ ] 143 Graph Functions Graph Functions This chapter describes various functions on a graph. A lot of these accept a vertex (or edge) example as parameter as defined in the next section. Examples will explain the API on the the city graph: Definition of examples For many of the following functions examples can be passed in as a parameter. Examples are used to filter the result set for objects that match the conditions. These examples can have the following values: null, there is no matching executed all found results are valid. A string, only results are returned, which _id equal the value of the string An example object, defining a set of attributes. Only results having these attributes are matched. A list containing example objects and/or strings. All results matching at least one of the elements in the list are returned. Get vertices from edges. Get vertex from of an edge Get the source vertex of an edge graph._fromVertex(edgeId) Returns the vertex defined with the attribute _from of the edge with edgeId as its _id. Parameters edgeId (required) _id attribute of the edge Examples 144 Graph Functions arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("social"); arangosh> var any = require("@arangodb").db.relation.any(); arangosh> graph._fromVertex("relation/" + any._key); show execution results Get vertex to of an edge Get the target vertex of an edge graph._toVertex(edgeId) Returns the vertex defined with the attribute _to of the edge with edgeId as its _id. 
Parameters edgeId (required) _id attribute of the edge Examples arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("social"); arangosh> var any = require("@arangodb").db.relation.any(); arangosh> graph._toVertex("relation/" + any._key); show execution results _neighbors Get all neighbors of the vertices defined by the example graph._neighbors(vertexExample, options) The function accepts an id, an example, a list of examples or even an empty example as parameter for vertexExample. The complexity of this method is O(n*m^x) with n being the vertices defined by the parameter vertexExamplex, m the average amount of neighbors and x the maximal depths. Hence the default call would have a complexity of O(n*m); Parameters vertexExample (optional) See Definition of examples options (optional) An object defining further options. Can have the following values: direction: The direction of the edges. Possible values are outbound, inbound and any (default). edgeExamples: Filter the edges, see Definition of examples neighborExamples: Filter the neighbor vertices, see Definition of examples edgeCollectionRestriction : One or a list of edge-collection names that should be considered to be on the path. vertexCollectionRestriction : One or a list of vertex-collection names that should be considered on the intermediate vertex steps. minDepth: Defines the minimal number of intermediate steps to neighbors (default is 1). maxDepth: Defines the maximal number of intermediate steps to neighbors (default is 1). Examples A route planner example, all neighbors of capitals. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._neighbors({isCapital : true}); show execution results A route planner example, all outbound neighbors of Hamburg. 145 Graph Functions arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._neighbors('germanCity/Hamburg', {direction : 'outbound', maxDepth : 2}); show execution results _commonNeighbors Get all common neighbors of the vertices defined by the examples. graph._commonNeighbors(vertex1Example, vertex2Examples, optionsVertex1, optionsVertex2) This function returns the intersection of graph_module._neighbors(vertex1Example, optionsVertex1) and graph_module._neighbors(vertex2Example, optionsVertex2). For parameter documentation see _neighbors. The complexity of this method is O(n*m^x) with n being the maximal amount of vertices defined by the parameters vertexExamples, m the average amount of neighbors and x the maximal depths. Hence the default call would have a complexity of O(n*m); Examples A route planner example, all common neighbors of capitals. 
arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._commonNeighbors({isCapital : true}, {isCapital : true}); show execution results A route planner example, all common outbound neighbors of Hamburg with any other location which have a maximal depth of 2 : arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._commonNeighbors( ........> 'germanCity/Hamburg', ........> {}, ........> {direction : 'outbound', maxDepth : 2}, ........> {direction : 'outbound', maxDepth : 2}); show execution results _countCommonNeighbors Get the amount of common neighbors of the vertices defined by the examples. graph._countCommonNeighbors(vertex1Example, vertex2Examples, optionsVertex1, optionsVertex2) Similar to _commonNeighbors but returns count instead of the elements. Examples A route planner example, all common neighbors of capitals. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> var example = { isCapital: true }; arangosh> var options = { includeData: true }; arangosh> graph._countCommonNeighbors(example, example, options, options); show execution results 146 Graph Functions A route planner example, all common outbound neighbors of Hamburg with any other location which have a maximal depth of 2 : arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> var options = { direction: 'outbound', maxDepth: 2, includeData: true }; arangosh> graph._countCommonNeighbors('germanCity/Hamburg', {}, options, options); show execution results _commonProperties Get the vertices of the graph that share common properties. graph._commonProperties(vertex1Example, vertex2Examples, options) The function accepts an id, an example, a list of examples or even an empty example as parameter for vertex1Example and vertex2Example. The complexity of this method is O(n) with n being the maximal amount of vertices defined by the parameters vertexExamples. Parameters vertex1Examples (optional) Filter the set of source vertices, see Definition of examples vertex2Examples (optional) Filter the set of vertices compared to, see Definition of examples options (optional) An object defining further options. Can have the following values: vertex1CollectionRestriction : One or a list of vertex-collection names that should be searched for source vertices. vertex2CollectionRestriction : One or a list of vertex-collection names that should be searched for compare vertices. ignoreProperties : One or a list of attribute names of a document that should be ignored. Examples A route planner example, all locations with the same properties: arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._commonProperties({}, {}); show execution results A route planner example, all cities which share same properties except for population. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._commonProperties({}, {}, {ignoreProperties: 'population'}); show execution results _countCommonProperties Get the amount of vertices of the graph that share common properties. 
graph._countCommonProperties(vertex1Example, vertex2Examples, options) Similar to _commonProperties but returns the count instead of the objects. Examples A route planner example, all locations with the same properties: arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._countCommonProperties({}, {}); show execution results A route planner example, all German cities which share the same properties except for population. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._countCommonProperties({}, {}, {vertex1CollectionRestriction : 'germanCity', ........> vertex2CollectionRestriction : 'germanCity', ignoreProperties: 'population'}); show execution results _paths The _paths function returns all paths of a graph. graph._paths(options) This function determines all available paths in a graph. The complexity of this method is O(n*n*m) with n being the amount of vertices in the graph and m the average amount of connected edges. Parameters options (optional) An object containing options, see below: direction: The direction of the edges. Possible values are any, inbound and outbound (default). followCycles (optional): If set to true the query follows cycles in the graph, default is false. minLength (optional): Defines the minimal length a path must have to be returned (default is 0). maxLength (optional): Defines the maximal length a path must have to be returned (default is 10). Examples Return all paths of the graph "social": arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var g = examples.loadGraph("social"); arangosh> g._paths(); show execution results Return all inbound paths of the graph "social" with a minimal length of 1 and a maximal length of 2: arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var g = examples.loadGraph("social"); arangosh> g._paths({direction : 'inbound', minLength : 1, maxLength : 2}); show execution results _shortestPath The _shortestPath function returns all shortest paths of a graph. graph._shortestPath(startVertexExample, endVertexExample, options) This function determines all shortest paths in a graph. The function accepts an id, an example, a list of examples or even an empty example as parameter for start and end vertex. The length of a path is by default the amount of edges from one start vertex to an end vertex. The option weight allows the user to define an edge attribute representing the length. Parameters startVertexExample (optional) An example for the desired start vertices (see Definition of examples). endVertexExample (optional) An example for the desired end vertices (see Definition of examples). options (optional) An object containing options, see below: direction: The direction of the edges as a string. Possible values are outbound, inbound and any (default). edgeCollectionRestriction: One or multiple edge collection names. Only edges from these collections will be considered for the path. startVertexCollectionRestriction: One or multiple vertex collection names. Only vertices from these collections will be considered as start vertex of a path. endVertexCollectionRestriction: One or multiple vertex collection names. Only vertices from these collections will be considered as end vertex of a path.
weight: The name of the attribute of the edges containing the length as a string. defaultWeight: Only used with the option weight. If an edge does not have the attribute named as defined in option weight this default is used as length. If no default is supplied the default would be positive Infinity so the path could not be calculated. Examples A route planner example, shortest path from all German to all French cities: arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var g = examples.loadGraph("routeplanner"); arangosh> g._shortestPath({}, {}, {weight : 'distance', endVertexCollectionRestriction : 'frenchCity', ........> startVertexCollectionRestriction : 'germanCity'}); show execution results A route planner example, shortest path from Cologne and Munich to Lyon: arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var g = examples.loadGraph("routeplanner"); arangosh> g._shortestPath([{_id: 'germanCity/Cologne'},{_id: 'germanCity/Munich'}], 'frenchCity/Lyon', ........> {weight : 'distance'}); show execution results _distanceTo The _distanceTo function returns all paths and their distance within a graph. graph._distanceTo(startVertexExample, endVertexExample, options) This function is a wrapper of graph._shortestPath. It does not return the actual path but only the distance between two vertices. Examples A route planner example, shortest distance from all German to all French cities: arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var g = examples.loadGraph("routeplanner"); arangosh> g._distanceTo({}, {}, {weight : 'distance', endVertexCollectionRestriction : 'frenchCity', ........> startVertexCollectionRestriction : 'germanCity'}); show execution results A route planner example, shortest distance from Cologne and Munich to Lyon: arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var g = examples.loadGraph("routeplanner"); arangosh> g._distanceTo([{_id: 'germanCity/Cologne'},{_id: 'germanCity/Munich'}], 'frenchCity/Lyon', ........> {weight : 'distance'}); show execution results _absoluteEccentricity Get the eccentricity of the vertices defined by the examples. graph._absoluteEccentricity(vertexExample, options) The function accepts an id, an example, a list of examples or even an empty example as parameter for vertexExample. Parameters vertexExample (optional) Filter the vertices, see Definition of examples options (optional) An object defining further options. Can have the following values: direction: The direction of the edges. Possible values are outbound, inbound and any (default). edgeCollectionRestriction: One or a list of edge-collection names that should be considered to be on the path. startVertexCollectionRestriction: One or a list of vertex-collection names that should be considered for source vertices. endVertexCollectionRestriction: One or a list of vertex-collection names that should be considered for target vertices. weight: The name of the attribute of the edges containing the weight. defaultWeight: Only used with the option weight. If an edge does not have the attribute named as defined in option weight this default is used as weight. If no default is supplied the default would be positive infinity so the path and hence the eccentricity cannot be calculated. Examples A route planner example, the absolute eccentricity of all locations.
arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._absoluteEccentricity({}); show execution results A route planner example, the absolute eccentricity of all locations. This considers the actual distances. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._absoluteEccentricity({}, {weight : 'distance'}); show execution results A route planner example, the absolute eccentricity of all cities regarding only outbound paths. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._absoluteEccentricity({}, {startVertexCollectionRestriction : 'germanCity', ........> direction : 'outbound', weight : 'distance'}); show execution results 150 Graph Functions _eccentricity Get the normalized eccentricity of the vertices defined by the examples. graph._eccentricity(vertexExample, options) Similar to _absoluteEccentricity but returns a normalized result. Examples A route planner example, the eccentricity of all locations. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._eccentricity(); show execution results A route planner example, the weighted eccentricity. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._eccentricity({weight : 'distance'}); show execution results _absoluteCloseness Get the closeness of the vertices defined by the examples. graph._absoluteCloseness(vertexExample, options) The function accepts an id, an example, a list of examples or even an empty example as parameter for vertexExample. Parameters vertexExample (optional) Filter the vertices, see Definition of examples options (optional) An object defining further options. Can have the following values: direction: The direction of the edges. Possible values are outbound, inbound and any (default). edgeCollectionRestriction : One or a list of edge-collection names that should be considered to be on the path. startVertexCollectionRestriction : One or a list of vertex-collection names that should be considered for source vertices. endVertexCollectionRestriction : One or a list of vertex-collection names that should be considered for target vertices. weight: The name of the attribute of the edges containing the weight. defaultWeight: Only used with the option weight. If an edge does not have the attribute named as defined in option weight this default is used as weight. If no default is supplied the default would be positive infinity so the path and hence the closeness can not be calculated. Examples A route planner example, the absolute closeness of all locations. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._absoluteCloseness({}); show execution results A route planner example, the absolute closeness of all locations. This considers the actual distances. 
arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._absoluteCloseness({}, {weight : 'distance'}); show execution results A route planner example, the absolute closeness of all German cities regarding only outbound paths. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._absoluteCloseness({}, {startVertexCollectionRestriction : 'germanCity', ........> direction : 'outbound', weight : 'distance'}); show execution results _closeness Get the normalized closeness of the graph's vertices. graph._closeness(options) Similar to _absoluteCloseness but returns a normalized value. Examples A route planner example, the normalized closeness of all locations. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._closeness(); show execution results A route planner example, the closeness of all locations. This considers the actual distances. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._closeness({weight : 'distance'}); show execution results A route planner example, the closeness of all cities regarding only outbound paths. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._closeness({direction : 'outbound', weight : 'distance'}); show execution results _absoluteBetweenness Get the betweenness of all vertices in the graph. graph._absoluteBetweenness(vertexExample, options) Parameters vertexExample (optional) Filter the vertices, see Definition of examples options (optional) An object defining further options. Can have the following values: direction: The direction of the edges. Possible values are outbound, inbound and any (default). weight: The name of the attribute of the edges containing the weight. defaultWeight: Only used with the option weight. If an edge does not have the attribute named as defined in option weight this default is used as weight. If no default is supplied the default would be positive infinity so the path and hence the betweenness cannot be calculated. Examples A route planner example, the absolute betweenness of all locations. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._absoluteBetweenness({}); show execution results A route planner example, the absolute betweenness of all locations. This considers the actual distances. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._absoluteBetweenness({weight : 'distance'}); { } A route planner example, the absolute betweenness of all cities regarding only outbound paths. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._absoluteBetweenness({direction : 'outbound', weight : 'distance'}); { } _betweenness Get the normalized betweenness of the graph's vertices. graph._betweenness(options) Similar to _absoluteBetweenness but returns normalized values.
Examples A route planner example, the betweenness of all locations. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._betweenness(); show execution results A route planner example, the betweenness of all locations. This considers the actual distances. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._betweenness({weight : 'distance'}); show execution results A route planner example, the betweenness of all cities regarding only outbound paths. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._betweenness({direction : 'outbound', weight : 'distance'}); show execution results _radius Get the radius of a graph. graph._radius(options) Parameters options (optional) An object defining further options. Can have the following values: direction: The direction of the edges. Possible values are outbound, inbound and any (default). weight: The name of the attribute of the edges containing the weight. defaultWeight: Only used with the option weight. If an edge does not have the attribute named as defined in option weight this default is used as weight. If no default is supplied the default would be positive infinity so the path and hence the radius cannot be calculated. Examples A route planner example, the radius of the graph. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._radius(); 1 A route planner example, the radius of the graph. This considers the actual distances. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._radius({weight : 'distance'}); 1 A route planner example, the radius of the graph regarding only outbound paths. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._radius({direction : 'outbound', weight : 'distance'}); 1 _diameter Get the diameter of a graph. graph._diameter(options) Parameters options (optional) An object defining further options. Can have the following values: direction: The direction of the edges. Possible values are outbound, inbound and any (default). weight: The name of the attribute of the edges containing the weight. defaultWeight: Only used with the option weight. If an edge does not have the attribute named as defined in option weight this default is used as weight. If no default is supplied the default would be positive infinity so the path and hence the diameter cannot be calculated. Examples A route planner example, the diameter of the graph. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._diameter(); 1 A route planner example, the diameter of the graph. This considers the actual distances. arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._diameter({weight : 'distance'}); 1 A route planner example, the diameter of the graph regarding only outbound paths.
arangosh> var examples = require("@arangodb/graph-examples/example-graph.js"); arangosh> var graph = examples.loadGraph("routeplanner"); arangosh> graph._diameter({direction : 'outbound', weight : 'distance'}); 1 SmartGraphs This feature is only available in the Enterprise Edition. This chapter describes the smart-graph module. It enables you to manage graphs at scale and gives a vast performance benefit for all graphs sharded in an ArangoDB Cluster. On a single server this feature is pointless, hence it is only available in cluster mode. In terms of querying there is no difference between Smart and General Graphs. The former are a transparent replacement for the latter, so for querying the graph please refer to the AQL Graph Operations and Graph Functions sections. The optimizer is clever enough to identify whether we are on a SmartGraph or not. The difference is only in the management section: creating and modifying the underlying collections of the graph. For a detailed API reference please refer to SmartGraph Management. What makes a graph smart? Most graphs have one feature that divides the entire graph into several smaller subgraphs. These subgraphs have a large amount of edges that only connect vertices in the same subgraph and only few edges connecting vertices from other subgraphs. Examples for these graphs are: Social Networks Typically the feature here is the region/country users live in. Every user typically has more contacts in the same region/country than in other regions/countries. Transport Systems For those the feature is also the region/country. There are many local transport connections, but only few across countries. E-Commerce In this case the category of products is probably a good feature. Often products of the same category are bought together. If this feature is known, SmartGraphs can make use of it. When creating a SmartGraph you have to define a smartAttribute, which is the name of an attribute stored in every vertex. The graph will then be automatically sharded in such a way that all vertices with the same value are stored on the same physical machine, and all edges connecting vertices with identical smartAttribute values are stored on this machine as well. During query time the query optimizer and the query executor both know for every document exactly where it is stored and can thereby minimize network overhead. Everything that can be computed locally will be computed locally. Benefits of SmartGraphs Because of the above described guaranteed sharding, queries that only cover one subgraph have a performance almost equal to a purely local computation. Queries that cover more than one subgraph require some network overhead. The more subgraphs are touched, the more network cost will apply. However, the overall performance is never worse than the same query on a General Graph. Getting started First of all, SmartGraphs cannot use existing collections; when switching to SmartGraphs from an existing data set you have to import the data into a fresh SmartGraph. This switch can be easily achieved with arangodump and arangorestore. The only thing you have to change in this pipeline is that you create the new collections with the SmartGraph before starting arangorestore. Create a graph In comparison to General Graphs we have to add more options when creating the graph. The two options smartGraphAttribute and numberOfShards are required and cannot be modified later.
arangosh> var graph_module = require("@arangodb/smart-graph"); arangosh> var graph = graph_module._create("myGraph", [], [], {smartGraphAttribute: "region", numberOfShards: 9}); arangosh> graph; [ SmartGraph myGraph EdgeDefinitions: [ ] VertexCollections: [ ] ] Add some vertex collections This is again identical to General Graphs. The module will set up the correct sharding for all these collections. Note: The collections have to be new. arangosh> graph._addVertexCollection("shop"); arangosh> graph._addVertexCollection("customer"); arangosh> graph._addVertexCollection("pet"); arangosh> graph; [ SmartGraph myGraph EdgeDefinitions: [ ] VertexCollections: [ "shop", "customer", "pet" ] ] Define relations on the Graph arangosh> var rel = graph_module._relation("isCustomer", ["shop"], ["customer"]); arangosh> graph._extendEdgeDefinitions(rel); arangosh> graph; [ SmartGraph myGraph EdgeDefinitions: [ "isCustomer: [shop] -> [customer]" ] VertexCollections: [ "pet" ] ] SmartGraph Management This chapter describes the JavaScript interface for creating and modifying SmartGraphs. At first you have to note that every SmartGraph is a specialized version of a General Graph, which means all of the General Graph functionality is available on a SmartGraph as well. The major difference between both modules is the handling of the underlying collections: the General Graph does not enforce or maintain any sharding of the collections and can therefore combine arbitrary sets of existing collections. SmartGraphs enforce and rely on a special sharding of the underlying collections and hence can only work with collections that are created through the SmartGraph itself. This also means that SmartGraphs cannot overlap: a collection can either be sharded for one SmartGraph or for another. If you need to make sure that all queries can be executed with SmartGraph performance, just create one large SmartGraph covering everything and query it stating the subset of edge collections explicitly. To generally understand the concept of this module please read the chapter about General Graph Management first. In the following we will only describe the overloaded functionality. Everything else works identically in both modules. Create a graph SmartGraphs also require edge relations to be created; the format of the relations is identical. The only difference is that all collections used within the relations to create a new SmartGraph cannot exist yet. They have to be created by the Graph in order to enforce the correct sharding. Create a graph graph_module._create(graphName, edgeDefinitions, orphanCollections, smartOptions) The creation of a graph requires the name and some SmartGraph options. Due to the API, edgeDefinitions and orphanCollections have to be given, but both can be empty arrays and can be created later. The edgeDefinitions can be created using the convenience method _relation known from the general-graph module, which is also available here. orphanCollections again is just a list of additional vertex collections which are not yet connected via edges but should follow the same sharding to be connected later on. All collections used within the creation process are newly created. The process will fail if one of them already exists. All newly created collections will immediately be dropped again in the failed case.
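As a minimal illustration of this rule (a sketch only, with made-up collection names, not part of the SmartGraph API), you could verify up front that none of the involved collections exist before calling _create: var db = require("@arangodb").db; // hypothetical collection names used only for this sketch var involved = ["isCustomer", "shop", "customer"]; involved.forEach(function (name) { // db._collection() returns null if the collection does not exist if (db._collection(name) !== null) { throw "collection '" + name + "' already exists - SmartGraph creation would fail"; } });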
Parameters graphName (required) Unique identifier of the graph edgeDefinitions (required) List of relation definition objects, may be empty orphanCollections (required) List of additional vertex collection names, may be empty smartOptions (required) A JSON object having the following keys: numberOfShards (required) The number of shards that will be created for each collection. To maintain the correct sharding all collections need an identical number of shards. This cannot be modified after creation of the graph. smartGraphAttribute (required) The attribute that will be used for sharding. All vertices are required to have this attribute set and it has to be a string. Edges derive the attribute from their connected vertices. Examples Create an empty graph, edge definitions can be added at runtime: arangosh> var graph_module = require("@arangodb/smart-graph"); arangosh> var graph = graph_module._create("myGraph", [], [], {smartGraphAttribute: "region", numberOfShards: 9}); [ SmartGraph myGraph EdgeDefinitions: [ ] VertexCollections: [ ] ] Create a graph using an edge collection edges and a single vertex collection vertices: arangosh> var graph_module = require("@arangodb/smart-graph"); arangosh> var edgeDefinitions = [ graph_module._relation("edges", "vertices", "vertices") ]; arangosh> var graph = graph_module._create("myGraph", edgeDefinitions, [], {smartGraphAttribute: "region", numberOfShards: 9}); [ SmartGraph myGraph EdgeDefinitions: [ "edges: [vertices] -> [vertices]" ] VertexCollections: [ ] ] Create a graph with edge definitions and orphan collections: arangosh> var graph_module = require("@arangodb/smart-graph"); arangosh> var edgeDefinitions = [ graph_module._relation("myRelation", ["male", "female"], ["male", "female"]) ]; arangosh> var graph = graph_module._create("myGraph", edgeDefinitions, ["sessions"], {smartGraphAttribute: "region", numberOfShards: 9}); [ Graph myGraph EdgeDefinitions: [ "myRelation: [female, male] -> [female, male]" ] VertexCollections: [ "sessions" ] ] Modify a graph definition during runtime After you have created a SmartGraph its definition is not immutable. You can still add or remove relations. This is again identical to General Graphs. However, there is one important difference: You can only add collections that either do not exist, or that have been created by this graph earlier. The latter can be achieved if you, for example, remove an orphan collection from this graph without dropping the collection itself. Then, after some time, when you decide to add it again, it can be used. This is because the enforced sharding is still applied to this vertex collection, hence it is suitable to be added again. Remove a vertex collection Remove a vertex collection from the graph graph._removeVertexCollection(vertexCollectionName, dropCollection) In most cases this function works identically to the General Graph one. But there is one special case: The first vertex collection added to the graph (either orphan or within a relation) defines the sharding for all collections within the graph. This collection can never be removed from the graph. Parameters vertexCollectionName (required) Name of vertex collection. dropCollection (optional) If true the collection will be dropped if it is not used in any other graph. Default: false. Examples The following example shows that you cannot drop the initial collection. You have to drop the complete graph. If you just want to get rid of the data, truncate it.
arangosh> var graph_module = require("@arangodb/smart-graph") arangosh> var relation = graph_module._relation("edges", "vertices", "vertices"); arangosh> var graph = graph_module._create("myGraph", [relation], ["other"], {smartGraphAttribute: "region", numberOfShards: 9}); arangosh> graph._orphanCollections(); [ "other" ] arangosh> graph._deleteEdgeDefinition("edges"); arangosh> graph._orphanCollections(); [ "vertices", "other" ] arangosh> graph._removeVertexCollection("other"); arangosh> graph._orphanCollections(); [ "vertices" ] arangosh> graph._removeVertexCollection("vertices"); ArangoError 4002: cannot drop this smart collection Traversals ArangoDB provides several ways to query graph data. Very simple operations can be composed with the low-level edge methods edges, inEdges, and outEdges for edge collections. These work on named and anonymous graphs. For more complex operations, ArangoDB provides predefined traversal objects. Traversals have also been added to AQL. Please read the chapter about AQL traversals before you continue reading here. Most of the traversal cases are covered by AQL and will be executed in an optimized way. Only if the logic for your use case is too complex to be defined using AQL filters should you use the traversal object described here, which gives you complete programmatic access to the data. For any of the following examples, we'll be using the example collections v and e, populated with continents, countries and capitals data listed below (see Example Data). Starting from Scratch ArangoDB provides the edges, inEdges, and outEdges methods for edge collections. These methods can be used to quickly determine if a vertex is connected to other vertices, and which. This functionality can be exploited to write very simple graph queries in JavaScript. For example, to determine which edges are linked to the world vertex, we can use inEdges: db.e.inEdges('v/world').forEach(function(edge) { require("@arangodb").print(edge._from, "->", edge.type, "->", edge._to); }); inEdges will give us all ingoing edges for the specified vertex v/world. The result is a JavaScript array that we can iterate over and print the results: v/continent-africa -> is-in -> v/world v/continent-south-america -> is-in -> v/world v/continent-asia -> is-in -> v/world v/continent-australia -> is-in -> v/world v/continent-europe -> is-in -> v/world v/continent-north-america -> is-in -> v/world Note: edges, inEdges, and outEdges return an array of edges. If we want to retrieve the linked vertices, we can use each edge's _from and _to attributes as follows: db.e.inEdges('v/world').forEach(function(edge) { require("@arangodb").print(db._document(edge._from).name, "->", edge.type, "->", db._document(edge._to).name); }); We are using the _document method from the db object to retrieve the connected vertices now. While this may be sufficient for one-level graph operations, writing a traversal by hand may become too complex for multi-level traversals.
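As an illustration of why this gets unwieldy, a minimal hand-written two-level walk over the same example data might look like the following sketch (assuming the v and e collections from the Example Data chapter are loaded); every additional level adds another layer of nesting: db.e.inEdges('v/world').forEach(function(edge) { var continent = db._document(edge._from); // second level: countries whose is-in edges point to this continent db.e.inEdges(continent._id).forEach(function(countryEdge) { require("@arangodb").print(db._document(countryEdge._from).name, "->", countryEdge.type, "->", continent.name); }); }); The predefined traversal objects described next remove the need for this kind of manual nesting.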
Using Traversal Objects Getting started To use a traversal object, we first need to require the traversal module: var traversal = require("@arangodb/graph/traversal"); var examples = require("@arangodb/graph-examples/example-graph.js"); examples.loadGraph("worldCountry"); We then need to set up a configuration for the traversal and determine at which vertex to start the traversal: var config = { datasource: traversal.generalGraphDatasourceFactory("worldCountry"), strategy: "depthfirst", order: "preorder", filter: traversal.visitAllFilter, expander: traversal.inboundExpander, maxDepth: 1 }; var startVertex = db._document("v/world"); Note: The startVertex needs to be a document, not only a document id. We can then create a traverser and start the traversal by calling its traverse method. Note that traverse needs a result object, which it can modify in place: var result = { visited: { vertices: [ ], paths: [ ] } }; var traverser = new traversal.Traverser(config); traverser.traverse(result, startVertex); Finally, we can print the contents of the results object, limited to the visited vertices. We will only print the name and type of each visited vertex for brevity: require("@arangodb").print(result.visited.vertices.map(function(vertex) { return vertex.name + " (" + vertex.type + ")"; })); The full script, which includes all steps carried out so far, is thus: var traversal = require("@arangodb/graph/traversal"); var config = { datasource: traversal.generalGraphDatasourceFactory("worldCountry"), strategy: "depthfirst", order: "preorder", filter: traversal.visitAllFilter, expander: traversal.inboundExpander, maxDepth: 1 }; var startVertex = db._document("v/world"); var result = { visited: { vertices: [ ], paths: [ ] } }; var traverser = new traversal.Traverser(config); traverser.traverse(result, startVertex); require("@arangodb").print(result.visited.vertices.map(function(vertex) { return vertex.name + " (" + vertex.type + ")"; })); The result is an array of vertices that were visited during the traversal, starting at the start vertex (i.e. v/world in our example): [ "World (root)", "Africa (continent)", "Asia (continent)", "Australia (continent)", "Europe (continent)", "North America (continent)", "South America (continent)" ] Note: The result is limited to vertices directly connected to the start vertex. We achieved this by setting the maxDepth attribute to 1. Not setting it would return the full array of vertices. Traversal Direction For the examples contained in this manual, we'll be starting the traversals at vertex v/world. Vertices in our graph are connected like this: v/world <- is-in <- continent (Africa) <- is-in <- country (Algeria) <- is-in <- capital (Algiers) To get any meaningful results, we must traverse the graph in inbound order. This means we'll be following all incoming edges of a vertex. In the traversal configuration, we have specified this via the expander attribute: var config = { ... expander: traversal.inboundExpander }; For other graphs, we might want to traverse via the outgoing edges. For this, we can use the outboundExpander. There is also an anyExpander, which will follow both outgoing and incoming edges. This should be used with care and the traversal should always be limited to a maximum number of iterations (e.g. using the maxIterations attribute) in order to terminate at some point. To invoke the default outbound expander for a graph, simply use the predefined function: var config = { ...
expander: traversal.outboundExpander }; Please note the outbound expander will not produce any output for the examples if we still start the traversal at the v/world vertex. Still, we can use the outbound expander if we start somewhere else in the graph, e.g. var traversal = require("@arangodb/graph/traversal"); var config = { datasource: traversal.generalGraphDatasourceFactory("world_graph"), strategy: "depthfirst", order: "preorder", filter: traversal.visitAllFilter, expander: traversal.outboundExpander }; var startVertex = db._document("v/capital-algiers"); var result = { visited: { vertices: [ ], paths: [ ] } }; 162 Using Traversal Objects var traverser = new traversal.Traverser(config); traverser.traverse(result, startVertex); require("@arangodb").print(result.visited.vertices.map(function(vertex) { return vertex.name + " (" + vertex.type + ")"; })); The result is: [ "Algiers (capital)", "Algeria (country)", "Africa (continent)", "World (root)" ] which confirms that now we're going outbound. Traversal Strategy Depth-first traversals The visitation order of vertices is determined by the strategy and order attributes set in the configuration. We chose depthfirst and preorder, meaning the traverser will visit each vertex before handling connected edges (pre-order), and descend into any connected edges before processing other vertices on the same level (depth-first). Let's remove the maxDepth attribute now. We'll now be getting all vertices (directly and indirectly connected to the start vertex): var config = { datasource: traversal.generalGraphDatasourceFactory("world_graph"), strategy: "depthfirst", order: "preorder", filter: traversal.visitAllFilter, expander: traversal.inboundExpander }; var result = { visited: { vertices: [ ], paths: [ ] } }; var traverser = new traversal.Traverser(config); traverser.traverse(result, startVertex); require("@arangodb").print(result.visited.vertices.map(function(vertex) { return vertex.name + " (" + vertex.type + ")"; })); The result will be a longer array, assembled in depth-first, pre-order order. For each continent found, the traverser will descend into linked countries, and then into the linked capital: [ "World (root)", "Africa (continent)", "Algeria (country)", "Algiers (capital)", "Angola (country)", "Luanda (capital)", "Botswana (country)", "Gaborone (capital)", "Burkina Faso (country)", "Ouagadougou (capital)", ... ] 163 Using Traversal Objects Let's switch the order attribute from preorder to postorder. This will make the traverser visit vertices after all connected vertices were visited (i.e. most distant vertices will be emitted first): [ "Algiers (capital)", "Algeria (country)", "Luanda (capital)", "Angola (country)", "Gaborone (capital)", "Botswana (country)", "Ouagadougou (capital)", "Burkina Faso (country)", "Bujumbura (capital)", "Burundi (country)", "Yaounde (capital)", "Cameroon (country)", "N'Djamena (capital)", "Chad (country)", "Yamoussoukro (capital)", "Cote d'Ivoire (country)", "Cairo (capital)", "Egypt (country)", "Asmara (capital)", "Eritrea (country)", "Africa (continent)", ... 
] Breadth-first traversals If we go back to preorder, but change the strategy to breadth-first and re-run the traversal, we'll see that the return order changes, and items on the same level will be returned adjacently: [ "World (root)", "Africa (continent)", "Asia (continent)", "Australia (continent)", "Europe (continent)", "North America (continent)", "South America (continent)", "Burkina Faso (country)", "Burundi (country)", "Cameroon (country)", "Chad (country)", "Algeria (country)", "Angola (country)", ... ] Note: The order of items returned for the same level is undefined. This is because there is no natural order of edges for a vertex with multiple connected edges. To explicitly set the order for edges on the same level, you can specify an edge comparator function with the sort attribute: var config = { ... sort: function (l, r) { return l._key < r._key ? 1 : -1; } ... }; The arguments l and r are edge documents. This will traverse edges of the same vertex in backward _key order: [ "World (root)", "South America (continent)", "North America (continent)", "Europe (continent)", "Australia (continent)", "Asia (continent)", 164 Using Traversal Objects "Africa (continent)", "Ecuador (country)", "Colombia (country)", "Chile (country)", "Brazil (country)", "Bolivia (country)", "Argentina (country)", ... ] Note: This attribute only works for the usual expanders traversal.inboundExpander, traversal.outboundExpander, traversal.anyExpander and their corresponding "WithLabels" variants. If you are using custom expanders you have to organize the sorting within the specified expander. Writing Custom Visitors So far we have used much of the traverser's default functions. The traverser is very configurable and many of the default functions can be overridden with custom functionality. For example, we have been using the default visitor function (which is always used if the configuration does not contain the visitor attribute). The default visitor function is called for each vertex in a traversal, and will push it into the result. This is the reason why the result variable looked different after the traversal, and needed to be initialized before the traversal was started. Note that the default visitor (named trackingVisitor ) will add every visited vertex into the result, including the full paths from the start vertex. This is useful for learning and debugging purposes, but should be avoided in production because it might produce (and copy) huge amounts of data. Instead, only those data should be copied into the result that are actually necessary. The traverser comes with the following predefined visitors: trackingVisitor: this is the default visitor. It will copy all data of each visited vertex plus the full path information into the result. This can be slow if the result set is huge or vertices contain a lot of data. countingVisitor: this is a very lightweight visitor: all it does is increase a counter in the result for each vertex visited. Vertex data and paths will not be copied into the result. doNothingVisitor: if no action shall be carried out when a vertex is visited, this visitor can be employed. It will not do anything and will thus be fast. It can be used for performance comparisons with other visitors. We can also write our own visitor function if we want to. The general function signature for visitor functions is as follows: var config = { ... visitor: function (config, result, vertex, path, connected) { ... 
} }; Note: the connected parameter value will only be set if the traversal order is set to preorder-expander. Otherwise, this parameter won't be set by the traverser. Visitor functions are not expected to return any values. Instead, they can modify the result variable (e.g. by pushing the current vertex into it), or do anything else. For example, we can create a simple visitor function that only prints information about the current vertex as we traverse: var config = { datasource: traversal.generalGraphDatasourceFactory("world_graph"), strategy: "depthfirst", order: "preorder", filter: traversal.visitAllFilter, expander: traversal.inboundExpander, visitor: function (config, result, vertex, path) { require("@arangodb").print("visiting vertex", vertex.name); } }; var traverser = new traversal.Traverser(config); traverser.traverse(undefined, startVertex); To write a visitor that increments a counter each time a vertex is visited, we could write the following custom visitor: config.visitor = function (config, result, vertex, path, connected) { if (! result) { result = { }; } if (! result.hasOwnProperty('count')) { result.count = 0; } ++result.count; } Note that such a visitor is already predefined (it's the countingVisitor described above). It can be used as follows: config.visitor = traversal.countingVisitor; Another example of a visitor is one that collects the _id values of all vertices visited: config.visitor = function (config, result, vertex, path, connected) { if (! result) { result = { }; } if (! result.hasOwnProperty("visited")) { result.visited = { vertices: [ ] }; } result.visited.vertices.push(vertex._id); } When the traversal order is set to preorder-expander, the traverser will pass a fifth parameter value into the visitor function. This parameter contains the connected edges of the visited vertex as an array. This can be handy because in this case the visitor will get all information about the vertex and the connected edges together. For example, the following visitor can be used to print only leaf nodes (that do not have any further connected edges): config.visitor = function (config, result, vertex, path, connected) { if (connected && connected.length === 0) { require("@arangodb").print("found a leaf-node: ", vertex); } } Note that for this visitor to work, the traversal order attribute needs to be set to the value preorder-expander. Filtering Vertices and Edges Filtering Vertices So far we have printed or returned all vertices that were visited during the traversal. This is not always required. If the result shall be restricted to specific vertices, we can use a filter function for vertices. It can be defined by setting the filter attribute of a traversal configuration, e.g.: var config = { filter: function (config, vertex, path) { if (vertex.type !== 'capital') { return 'exclude'; } } } The above filter function will exclude all vertices that do not have a type value of capital. The filter function will be called for each vertex found during the traversal. It will receive the traversal configuration, the current vertex, and the full path from the traversal start vertex to the current vertex. The path consists of an array of edges, and an array of vertices. We could also filter everything but capitals by checking the length of the path from the start vertex to the current vertex.
Capitals will have a distance of 3 from the v/world start vertex (capital → is-in → country → is-in → continent → is-in → world): var config = { ... filter: function (config, vertex, path) { if (path.edges.length < 3) { return 'exclude'; } } } Note: If a filter function returns nothing (or undefined), the current vertex will be included, and all connected edges will be followed. If a filter function returns exclude, the current vertex will be excluded from the result, but all connected edges will still be followed. If a filter function returns prune, the current vertex will be included, but no connected edges will be followed. For example, the following filter function will not descend into connected edges of continents, limiting the depth of the traversal. Still, continent vertices will be included in the result: var config = { ... filter: function (config, vertex, path) { if (vertex.type === 'continent') { return 'prune'; } } } It is also possible to combine exclude and prune by returning an array with both values: return [ 'exclude', 'prune' ]; Filtering Edges It is possible to exclude certain edges from the traversal. To filter on edges, a filter function can be defined via the expandFilter attribute. The expandFilter is a function which is called for each edge during a traversal. It will receive the current edge (edge variable) and the vertex which the edge connects to (in the direction of the traversal). It also receives the current path from the start vertex up to the current vertex (excluding the current edge and the vertex the edge points to). If the function returns true, the edge will be followed. If the function returns false, the edge will not be followed. Here is a very simple custom edge filter function implementation, which simply includes edges if the (edges) path length is less than 1, and will exclude any other edges. This will effectively terminate the traversal after the first level of edges: var config = { ... expandFilter: function (config, vertex, edge, path) { return (path.edges.length < 1); } }; Writing Custom Expanders The edges connected to a vertex are determined by the expander. So far we have used a default expander (the default inbound expander to be precise). The default inbound expander simply enumerates all connected ingoing edges for a vertex, based on the edge collection specified in the traversal configuration. There is also a default outbound expander, which will enumerate all connected outgoing edges. Finally, there is an any expander, which will follow both ingoing and outgoing edges. If connected edges must be determined in some different fashion for whatever reason, a custom expander can be written and registered by setting the expander attribute of the configuration. The expander function signature is as follows: var config = { ... expander: function (config, vertex, path) { ... } } It is the expander's responsibility to return all edges and vertices directly connected to the current vertex (which is passed via the vertex variable). The full path from the start vertex up to the current vertex is also supplied via the path variable. An expander is expected to return an array of objects, which need to have an edge and a vertex attribute each. Note: If you want to rely on a particular order in which the edges are traversed, you have to sort the edges returned by your expander within the code of the expander. The functions to get outbound, inbound or any edges from a vertex do not guarantee any particular order!
A custom implementation of an inbound expander could look like this (this is a non-deterministic expander, which randomly decides whether or not to include connected edges): var config = { ... expander: function (config, vertex, path) { var connected = [ ]; var datasource = config.datasource; datasource.getInEdges(vertex._id).forEach(function (edge) { if (Math.random() >= 0.5) { connected.push({ edge: edge, vertex: (edge._from) }); } }); return connected; } }; A custom expander can also be used as an edge filter because it has full control over which edges will be returned. Following are two examples of custom expanders that pick edges based on attributes of the edges and the connected vertices. Finding the connected edges / vertices based on an attribute when in the connected vertices. The goal is to follow the edge that leads to the vertex with the highest value in the when attribute: var config = { ... expander: function (config, vertex, path) { var datasource = config.datasource; // determine all outgoing edges var outEdges = datasource.getOutEdges(vertex); if (outEdges.length === 0) { return [ ]; } var data = [ ]; outEdges.forEach(function (edge) { data.push({ edge: edge, vertex: datasource.getInVertex(edge) }); }); // sort outgoing vertices according to "when" attribute value data.sort(function (l, r) { if (l.vertex.when === r.vertex.when) { return 0; } return (l.vertex.when < r.vertex.when ? 1 : -1); }); // pick first vertex found (with highest "when" attribute value) return [ data[0] ]; } ... }; Finding the connected edges / vertices based on an attribute when in the edge itself. The goal is to pick the one edge (out of potentially many) that has the highest when attribute value: var config = { ... expander: function (config, vertex, path) { var datasource = config.datasource; // determine all outgoing edges var outEdges = datasource.getOutEdges(vertex); if (outEdges.length === 0) { return [ ]; // return an empty array } // sort all outgoing edges according to "when" attribute outEdges.sort(function (l, r) { if (l.when === r.when) { return 0; } return (l.when < r.when ? -1 : 1); }); // return first edge (the one with highest "when" value) var edge = outEdges[0]; try { var v = datasource.getInVertex(edge); return [ { edge: edge, vertex: v } ]; } catch (e) { } return [ ]; } ... }; Handling Uniqueness Graphs may contain cycles. To control what happens when a traversal encounters a vertex or an edge it has already visited, there are configuration options. The default configuration is to visit every vertex, regardless of whether it was already visited in the same traversal. However, edges will by default only be followed if they are not already present in the current path. Imagine the following graph which contains a cycle: A -> B -> C -> A When the traversal finds the edge from C to A, it will by default follow it. This is because we have not seen this edge yet. It will also visit vertex A again. This is because by default all vertices will be visited, regardless of whether already visited or not. However, the traversal will not follow the outgoing edge from A to B again. This is because we already have the edge from A to B in our current path. These default settings will prevent infinite traversals.
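Expressed as a traversal configuration, these defaults roughly correspond to the following sketch (the uniqueness attribute and its possible values are listed below and in the Configuration Overview): var config = { ... uniqueness: { vertices: "none", // visit every vertex, even if it was visited before edges: "path" // follow an edge only if it is not already part of the current path } };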
To adjust the uniqueness for visiting vertices, there are the following options for uniqueness.vertices: "none": always visit a vertex, regardless of whether it was already visited or not "global": visit a vertex only if it was not visited in the traversal "path": visit a vertex if it is not included in the current path To adjust the uniqueness for following edges, there are the following options for uniqueness.edges: "none": always follow an edge, regardless of whether it was followed before "global": follow an edge only if it wasn't followed in the traversal "path": follow an edge if it is not included in the current path Note that uniqueness checking will have some effect on both runtime and memory usage. For example, when uniqueness checks are set to "global", arrays of visited vertices and edges must be kept in memory while the traversal is executed. Global uniqueness should thus only be used when a traversal is expected to visit few nodes. In terms of runtime, turning off uniqueness checks (by setting both options to "none") is the best choice, but it is only safe for graphs that do not contain cycles. When uniqueness checks are deactivated in a graph with cycles, the traversal might not abort in a sensible amount of time. Optimizations There are a few options for making a traversal run faster. The best option is to make the amount of visited vertices and followed edges as small as possible. This can be achieved by writing custom filter and expander functions. Such functions should only include vertices of interest, and only follow edges that might be interesting. Traversal depth can also be bounded with the minDepth and maxDepth options. Another way to speed up traversals is to write a custom visitor function. The default visitor function (trackingVisitor) will copy every visited vertex into the result. If vertices contain lots of data, this might be expensive. It is therefore recommended to only copy such data into the result that is actually needed. The default visitor function will also copy the full path to the visited document into the result. This is even more expensive and should be avoided if possible. If the goal of a traversal is to only count the number of visited vertices, the prefab countingVisitor will be much more efficient than the default visitor. For graphs that are known to not contain any cycles, uniqueness checks should be turned off. This can be achieved via the uniqueness configuration options. Note that uniqueness checks should not be turned off for graphs that are known to contain cycles or if there is no information about the graph's structure. By default, a traversal will only process a limited number of vertices. This is to protect the user from unintentionally running a never-ending traversal on a graph with cyclic data. How many vertices will be processed at most is determined by the maxIterations configuration option. If a traversal hits the cap specified by maxIterations, it will abort and throw a too many iterations exception. If this error is encountered, the maxIterations value should be increased only if it is made sure that the other traversal configuration parameters are sane and the traversal will abort naturally at some point. Finally, the buildVertices configuration option can be set to false to avoid looking up and fully constructing vertex data. If all that's needed from vertices are the _id or _key attributes, the buildVertices option can be set to false.
If visitor, filter or expandFilter functions need to access other vertex attributes, the option should not be changed. Configuration Overview This section summarizes the configuration attributes for the traversal object. The configuration can consist of the following attributes: visitor: visitor function for vertices. It will be called for all non-excluded vertices. The general visitor function signature is function (config, result, vertex, path). If the traversal order is preorder-expander, the connecting edges of the visited vertex will be passed as the fifth parameter, extending the function signature to: function (config, result, vertex, path, edges). Visitor functions are not expected to return values, but they may modify the result variable as needed (e.g. by pushing vertex data into the result). expander: expander function that is responsible for returning edges and vertices directly connected to a vertex. The function signature is function (config, vertex, path). The expander function is required to return an array of connection objects, consisting of an edge and vertex attribute each. If there are no connecting edges, the expander is expected to return an empty array. filter: vertex filter function. The function signature is function (config, vertex, path). It may return one of the following values: undefined: vertex will be included in the result and connected edges will be traversed "exclude": vertex will not be included in the result and connected edges will be traversed "prune": vertex will be included in the result but connected edges will not be traversed [ "prune", "exclude" ]: vertex will not be included in the result and connected edges will not be returned expandFilter: filter function applied on each edge/vertex combination determined by the expander. The function signature is function (config, vertex, edge, path). The function should return true if the edge/vertex combination should be processed, and false if it should be ignored. sort: a filter function to determine the order in which connected edges are processed. The function signature is function (l, r). The function is required to return one of the following values: -1 if l should have a sort value less than r 170 Using Traversal Objects 1 if l should have a higher sort value than r 0 if l and r have the same sort value strategy: determines the visitation strategy. Possible values are depthfirst and breadthfirst. order: determines the visitation order. Possible values are preorder, postorder, and preorder-expander. preorder-expander is the same as preorder, except that the signature of the visitor function will change as described above. itemOrder: determines the order in which connections returned by the expander will be processed. Possible values are forward and backward. maxDepth: if set to a value greater than 0, this will limit the traversal to this maximum depth. minDepth: if set to a value greater than 0, all vertices found on a level below the minDepth level will not be included in the result. maxIterations: the maximum number of iterations that the traversal is allowed to perform. It is sensible to set this number so unbounded traversals will terminate at some point. uniqueness: an object that defines how repeated visitations of vertices should be handled. The uniqueness object can have a subattribute vertices, and a sub-attribute edges. Each sub-attribute can have one of the following values: "none": no uniqueness constraints "path": element is excluded if it is already contained in the current path. 
Example Data

The following examples all use a vertex collection v and an edge collection e. The vertex collection v contains continents, countries, and capitals. The edge collection e contains connections between continents and countries, and between countries and capitals. To set up the collections and populate them with initial data, the following script was used:

db._create("v"); db._createEdgeCollection("e"); // vertices: root node db.v.save({ _key: "world", name: "World", type: "root" }); // vertices: continents db.v.save({ _key: "continent-africa", name: "Africa", type: "continent" }); db.v.save({ _key: "continent-asia", name: "Asia", type: "continent" }); db.v.save({ _key: "continent-australia", name: "Australia", type: "continent" }); db.v.save({ _key: "continent-europe", name: "Europe", type: "continent" }); db.v.save({ _key: "continent-north-america", name: "North America", type: "continent" }); db.v.save({ _key: "continent-south-america", name: "South America", type: "continent" }); // vertices: countries db.v.save({ _key: "country-afghanistan", name: "Afghanistan", type: "country", code: "AFG" }); db.v.save({ _key: "country-albania", name: "Albania", type: "country", code: "ALB" }); db.v.save({ _key: "country-algeria", name: "Algeria", type: "country", code: "DZA" }); db.v.save({ _key: "country-andorra", name: "Andorra", type: "country", code: "AND" }); db.v.save({ _key: "country-angola", name: "Angola", type: "country", code: "AGO" }); db.v.save({ _key: "country-antigua-and-barbuda", name: "Antigua and Barbuda", type: "country", code: "ATG" }); db.v.save({ _key: "country-argentina", name: "Argentina", type: "country", code: "ARG" }); db.v.save({ _key: "country-australia", name: "Australia", type: "country", code: "AUS" }); db.v.save({ _key: "country-austria", name: "Austria", type: "country", code: "AUT" }); db.v.save({ _key: "country-bahamas", name: "Bahamas", type: "country", code: "BHS" }); db.v.save({ _key: "country-bahrain", name: "Bahrain", type: "country", code: "BHR" }); db.v.save({ _key: "country-bangladesh", name: "Bangladesh", type: "country", code: "BGD" }); db.v.save({ _key: "country-barbados", name: "Barbados", type: "country", code: "BRB" }); db.v.save({ _key: "country-belgium", name: "Belgium", type: "country", code: "BEL" }); db.v.save({ _key: "country-bhutan", name: "Bhutan", type: "country", code: "BTN" }); db.v.save({ _key: "country-bolivia", name: "Bolivia", type: "country", code: "BOL" }); db.v.save({ _key: "country-bosnia-and-herzegovina", name: "Bosnia and Herzegovina", type: "country", code: "BIH" }); db.v.save({ _key: "country-botswana", name: "Botswana", type: "country", code: "BWA" }); db.v.save({ _key: "country-brazil", name: "Brazil", type: "country", code: "BRA" }); db.v.save({ _key: "country-brunei", name: "Brunei", type: "country",
code: "BRN" }); db.v.save({ _key: "country-bulgaria", name: "Bulgaria", type: "country", code: "BGR" }); db.v.save({ _key: "country-burkina-faso", name: "Burkina Faso", type: "country", code: "BFA" }); db.v.save({ _key: "country-burundi", name: "Burundi", type: "country", code: "BDI" }); db.v.save({ _key: "country-cambodia", name: "Cambodia", type: "country", code: "KHM" }); db.v.save({ _key: "country-cameroon", name: "Cameroon", type: "country", code: "CMR" }); db.v.save({ _key: "country-canada", name: "Canada", type: "country", code: "CAN" }); db.v.save({ _key: "country-chad", name: "Chad", type: "country", code: "TCD" }); db.v.save({ _key: "country-chile", name: "Chile", type: "country", code: "CHL" }); db.v.save({ _key: "country-colombia", name: "Colombia", type: "country", code: "COL" }); db.v.save({ _key: "country-cote-d-ivoire", name: "Cote d'Ivoire", type: "country", code: "CIV" }); db.v.save({ _key: "country-croatia", name: "Croatia", type: "country", code: "HRV" }); db.v.save({ _key: "country-czech-republic", name: "Czech Republic", type: "country", code: "CZE" }); db.v.save({ _key: "country-denmark", name: "Denmark", type: "country", code: "DNK" }); db.v.save({ _key: "country-ecuador", name: "Ecuador", type: "country", code: "ECU" }); db.v.save({ _key: "country-egypt", name: "Egypt", type: "country", code: "EGY" }); db.v.save({ _key: "country-eritrea", name: "Eritrea", type: "country", code: "ERI" }); db.v.save({ _key: "country-finland", name: "Finland", type: "country", code: "FIN" }); db.v.save({ _key: "country-france", name: "France", type: "country", code: "FRA" }); db.v.save({ _key: "country-germany", name: "Germany", type: "country", code: "DEU" }); db.v.save({ _key: "country-people-s-republic-of-china", name: "People's Republic of China", type: "country", code: "CHN" }); // vertices: capitals db.v.save({ _key: "capital-algiers", name: "Algiers", type: "capital" }); db.v.save({ _key: "capital-andorra-la-vella", name: "Andorra la Vella", type: "capital" }); db.v.save({ _key: "capital-asmara", name: "Asmara", type: "capital" }); db.v.save({ _key: "capital-bandar-seri-begawan", name: "Bandar Seri Begawan", type: "capital" }); db.v.save({ _key: "capital-beijing", name: "Beijing", type: "capital" }); db.v.save({ _key: "capital-berlin", name: "Berlin", type: "capital" }); db.v.save({ _key: "capital-bogota", name: "Bogota", type: "capital" }); db.v.save({ _key: "capital-brasilia", name: "Brasilia", type: "capital" }); 172 Example Data db.v.save({ _key: "capital-bridgetown", name: "Bridgetown", type: "capital" }); db.v.save({ _key: "capital-brussels", name: "Brussels", type: "capital" }); db.v.save({ _key: "capital-buenos-aires", name: "Buenos Aires", type: "capital" }); db.v.save({ _key: "capital-bujumbura", name: "Bujumbura", type: "capital" }); db.v.save({ _key: "capital-cairo", name: "Cairo", type: "capital" }); db.v.save({ _key: "capital-canberra", name: "Canberra", type: "capital" }); db.v.save({ _key: "capital-copenhagen", name: "Copenhagen", type: "capital" }); db.v.save({ _key: "capital-dhaka", name: "Dhaka", type: "capital" }); db.v.save({ _key: "capital-gaborone", name: "Gaborone", type: "capital" }); db.v.save({ _key: "capital-helsinki", name: "Helsinki", type: "capital" }); db.v.save({ _key: "capital-kabul", name: "Kabul", type: "capital" }); db.v.save({ _key: "capital-la-paz", name: "La Paz", type: "capital" }); db.v.save({ _key: "capital-luanda", name: "Luanda", type: "capital" }); db.v.save({ _key: "capital-manama", name: "Manama", type: "capital" }); db.v.save({ 
_key: "capital-nassau", name: "Nassau", type: "capital" }); db.v.save({ _key: "capital-n-djamena", name: "N'Djamena", type: "capital" }); db.v.save({ _key: "capital-ottawa", name: "Ottawa", type: "capital" }); db.v.save({ _key: "capital-ouagadougou", name: "Ouagadougou", type: "capital" }); db.v.save({ _key: "capital-paris", name: "Paris", type: "capital" }); db.v.save({ _key: "capital-phnom-penh", name: "Phnom Penh", type: "capital" }); db.v.save({ _key: "capital-prague", name: "Prague", type: "capital" }); db.v.save({ _key: "capital-quito", name: "Quito", type: "capital" }); db.v.save({ _key: "capital-saint-john-s", name: "Saint John's", type: "capital" }); db.v.save({ _key: "capital-santiago", name: "Santiago", type: "capital" }); db.v.save({ _key: "capital-sarajevo", name: "Sarajevo", type: "capital" }); db.v.save({ _key: "capital-sofia", name: "Sofia", type: "capital" }); db.v.save({ _key: "capital-thimphu", name: "Thimphu", type: "capital" }); db.v.save({ _key: "capital-tirana", name: "Tirana", type: "capital" }); db.v.save({ _key: "capital-vienna", name: "Vienna", type: "capital" }); db.v.save({ _key: "capital-yamoussoukro", name: "Yamoussoukro", type: "capital" }); db.v.save({ _key: "capital-yaounde", name: "Yaounde", type: "capital" }); db.v.save({ _key: "capital-zagreb", name: "Zagreb", type: "capital" }); // edges: continent -> world db.e.save("v/continent-africa", "v/world", { type: "is-in" }); db.e.save("v/continent-asia", "v/world", { type: "is-in" }); db.e.save("v/continent-australia", "v/world", { type: "is-in" }); db.e.save("v/continent-europe", "v/world", { type: "is-in" }); db.e.save("v/continent-north-america", "v/world", { type: "is-in" }); db.e.save("v/continent-south-america", "v/world", { type: "is-in" }); // edges: country -> continent db.e.save("v/country-afghanistan", "v/continent-asia", { type: "is-in" }); db.e.save("v/country-albania", "v/continent-europe", { type: "is-in" }); db.e.save("v/country-algeria", "v/continent-africa", { type: "is-in" }); db.e.save("v/country-andorra", "v/continent-europe", { type: "is-in" }); db.e.save("v/country-angola", "v/continent-africa", { type: "is-in" }); db.e.save("v/country-antigua-and-barbuda", "v/continent-north-america", { type: "is-in" }); db.e.save("v/country-argentina", "v/continent-south-america", { type: "is-in" }); db.e.save("v/country-australia", "v/continent-australia", { type: "is-in" }); db.e.save("v/country-austria", "v/continent-europe", { type: "is-in" }); db.e.save("v/country-bahamas", "v/continent-north-america", { type: "is-in" }); db.e.save("v/country-bahrain", "v/continent-asia", { type: "is-in" }); db.e.save("v/country-bangladesh", "v/continent-asia", { type: "is-in" }); db.e.save("v/country-barbados", "v/continent-north-america", { type: "is-in" }); db.e.save("v/country-belgium", "v/continent-europe", { type: "is-in" }); db.e.save("v/country-bhutan", "v/continent-asia", { type: "is-in" }); db.e.save("v/country-bolivia", "v/continent-south-america", { type: "is-in" }); db.e.save("v/country-bosnia-and-herzegovina", "v/continent-europe", { type: "is-in" }); db.e.save("v/country-botswana", "v/continent-africa", { type: "is-in" }); db.e.save("v/country-brazil", "v/continent-south-america", { type: "is-in" }); db.e.save("v/country-brunei", "v/continent-asia", { type: "is-in" }); db.e.save("v/country-bulgaria", "v/continent-europe", { type: "is-in" }); db.e.save("v/country-burkina-faso", "v/continent-africa", { type: "is-in" }); db.e.save("v/country-burundi", "v/continent-africa", { type: "is-in" }); 
db.e.save("v/country-cambodia", "v/continent-asia", { type: "is-in" }); db.e.save("v/country-cameroon", "v/continent-africa", { type: "is-in" }); db.e.save("v/country-canada", "v/continent-north-america", { type: "is-in" }); db.e.save("v/country-chad", "v/continent-africa", { type: "is-in" }); db.e.save("v/country-chile", "v/continent-south-america", { type: "is-in" }); db.e.save("v/country-colombia", "v/continent-south-america", { type: "is-in" }); db.e.save("v/country-cote-d-ivoire", "v/continent-africa", { type: "is-in" }); db.e.save("v/country-croatia", "v/continent-europe", { type: "is-in" }); db.e.save("v/country-czech-republic", "v/continent-europe", { type: "is-in" }); db.e.save("v/country-denmark", "v/continent-europe", { type: "is-in" }); db.e.save("v/country-ecuador", "v/continent-south-america", { type: "is-in" }); 173 Example Data db.e.save("v/country-egypt", "v/continent-africa", { type: "is-in" }); db.e.save("v/country-eritrea", "v/continent-africa", { type: "is-in" }); db.e.save("v/country-finland", "v/continent-europe", { type: "is-in" }); db.e.save("v/country-france", "v/continent-europe", { type: "is-in" }); db.e.save("v/country-germany", "v/continent-europe", { type: "is-in" }); db.e.save("v/country-people-s-republic-of-china", "v/continent-asia", { type: "is-in" }); // edges: capital -> country db.e.save("v/capital-algiers", "v/country-algeria", { type: "is-in" }); db.e.save("v/capital-andorra-la-vella", "v/country-andorra", { type: "is-in" }); db.e.save("v/capital-asmara", "v/country-eritrea", { type: "is-in" }); db.e.save("v/capital-bandar-seri-begawan", "v/country-brunei", { type: "is-in" }); db.e.save("v/capital-beijing", "v/country-people-s-republic-of-china", { type: "is-in" }); db.e.save("v/capital-berlin", "v/country-germany", { type: "is-in" }); db.e.save("v/capital-bogota", "v/country-colombia", { type: "is-in" }); db.e.save("v/capital-brasilia", "v/country-brazil", { type: "is-in" }); db.e.save("v/capital-bridgetown", "v/country-barbados", { type: "is-in" }); db.e.save("v/capital-brussels", "v/country-belgium", { type: "is-in" }); db.e.save("v/capital-buenos-aires", "v/country-argentina", { type: "is-in" }); db.e.save("v/capital-bujumbura", "v/country-burundi", { type: "is-in" }); db.e.save("v/capital-cairo", "v/country-egypt", { type: "is-in" }); db.e.save("v/capital-canberra", "v/country-australia", { type: "is-in" }); db.e.save("v/capital-copenhagen", "v/country-denmark", { type: "is-in" }); db.e.save("v/capital-dhaka", "v/country-bangladesh", { type: "is-in" }); db.e.save("v/capital-gaborone", "v/country-botswana", { type: "is-in" }); db.e.save("v/capital-helsinki", "v/country-finland", { type: "is-in" }); db.e.save("v/capital-kabul", "v/country-afghanistan", { type: "is-in" }); db.e.save("v/capital-la-paz", "v/country-bolivia", { type: "is-in" }); db.e.save("v/capital-luanda", "v/country-angola", { type: "is-in" }); db.e.save("v/capital-manama", "v/country-bahrain", { type: "is-in" }); db.e.save("v/capital-nassau", "v/country-bahamas", { type: "is-in" }); db.e.save("v/capital-n-djamena", "v/country-chad", { type: "is-in" }); db.e.save("v/capital-ottawa", "v/country-canada", { type: "is-in" }); db.e.save("v/capital-ouagadougou", "v/country-burkina-faso", { type: "is-in" }); db.e.save("v/capital-paris", "v/country-france", { type: "is-in" }); db.e.save("v/capital-phnom-penh", "v/country-cambodia", { type: "is-in" }); db.e.save("v/capital-prague", "v/country-czech-republic", { type: "is-in" }); db.e.save("v/capital-quito", "v/country-ecuador", { type: 
"is-in" }); db.e.save("v/capital-saint-john-s", "v/country-antigua-and-barbuda", { type: "is-in" }); db.e.save("v/capital-santiago", "v/country-chile", { type: "is-in" }); db.e.save("v/capital-sarajevo", "v/country-bosnia-and-herzegovina", { type: "is-in" }); db.e.save("v/capital-sofia", "v/country-bulgaria", { type: "is-in" }); db.e.save("v/capital-thimphu", "v/country-bhutan", { type: "is-in" }); db.e.save("v/capital-tirana", "v/country-albania", { type: "is-in" }); db.e.save("v/capital-vienna", "v/country-austria", { type: "is-in" }); db.e.save("v/capital-yamoussoukro", "v/country-cote-d-ivoire", { type: "is-in" }); db.e.save("v/capital-yaounde", "v/country-cameroon", { type: "is-in" }); db.e.save("v/capital-zagreb", "v/country-croatia", { type: "is-in" }); 174 Working with Edges Edges, Identifiers, Handles This is an introduction to ArangoDB's interface for edges. Edges may be used in graphs. Here we work with edges from the JavaScript shell arangosh. For other languages see the corresponding language API. A graph data model always consists of at least two collections: the relations between the nodes in the graphs are stored in an "edges collection", the nodes in the graph are stored in documents in regular collections. Edges in ArangoDB are special documents. In addition to the system attributes _key, _id and _rev, they have the attributes _from and _to, which contain document handles, namely the start-point and the end-point of the edge. Example: the "edge" collection stores the information that a company's reception is sub-unit to the services unit and the services unit is subunit to the CEO. You would express this relationship with the _from and _to attributes the "normal" collection stores all the properties about the reception, e.g. that 20 people are working there and the room number etc _from is the document handle of the linked vertex (incoming relation) _to is the document handle of the linked vertex (outgoing relation) Edge collections are special collections that store edge documents. Edge documents are connection documents that reference other documents. The type of a collection must be specified when a collection is created and cannot be changed afterwards. To change edge endpoints you would need to remove old document/edge and insert new one. Other fields can be updated as in default collection. Working with Edges Edges are normal documents that always contain a _from and a _to attribute. 175 Pregel Distributed Iterative Graph Processing (Pregel) Distributed graph processing enables you to do online analytical processing directly on graphs stored into arangodb. This is intended to help you gain analytical insights on your data, without having to use external processing sytems. Examples of algorithms to execute are PageRank, Vertex Centrality, Vertex Closeness, Connected Components, Community Detection. This system is not useful for typical online queries, where you just do work on a small set of vertices. These kind of tasks are better suited for AQL. The processing system inside ArangoDB is based on: Pregel: A System for Large-Scale Graph Processing – M alewicz et al. (Google) 2010 This concept enables us to perform distributed graph processing, without the need for distributed global locking. Prerequisites If you are running a single ArangoDB instance in single-server mode, there are no requirements regarding the modeling of your data. All you need is at least one vertex collection and one edge collection. 
Distributed Iterative Graph Processing (Pregel)

Distributed graph processing enables you to do online analytical processing directly on graphs stored in ArangoDB. This is intended to help you gain analytical insights on your data, without having to use external processing systems. Examples of algorithms to execute are PageRank, Vertex Centrality, Vertex Closeness, Connected Components, and Community Detection. This system is not useful for typical online queries, where you just do work on a small set of vertices. These kinds of tasks are better suited for AQL.

The processing system inside ArangoDB is based on: Pregel: A System for Large-Scale Graph Processing – Malewicz et al. (Google) 2010. This concept enables us to perform distributed graph processing, without the need for distributed global locking.

Prerequisites

If you are running a single ArangoDB instance in single-server mode, there are no requirements regarding the modeling of your data. All you need is at least one vertex collection and one edge collection. Note that the performance may be better if the number of your shards / collections matches the number of CPU cores.

When you use the ArangoDB Community Edition in cluster mode, you might need to model your collections in a certain way to ensure correct results. For more information see the next section.

Requirements for Collections in a Cluster (Non-Smart Graph)

To enable iterative graph processing for your data, you will need to ensure that your vertex and edge collections are sharded in a specific way. The Pregel computing model requires all edges to be present on the DBServer where the vertex document identified by the _from value is located. This means the vertex collections need to be sharded by '_key' and the edge collections need to be sharded by an attribute which always contains the '_key' of the vertex.

Our implementation currently requires every edge collection to be sharded by a "vertex" attribute. Additionally you will need to specify the key distributeShardsLike and an equal number of shards on every collection. Only if these requirements are met can ArangoDB place the edges and vertices correctly.

For example you might create your collections like this:

// Create main vertex collection:
db._create("vertices", { shardKeys: ['_key'], numberOfShards: 8 });
// Optionally create arbitrary additional vertex collections
db._create("additional", { distributeShardsLike: "vertices", numberOfShards: 8 });
// Create (one or more) edge collections:
db._createEdgeCollection("edges", { shardKeys: ['vertex'], distributeShardsLike: "vertices", numberOfShards: 8 });

You will need to ensure that edge documents contain the proper values in their sharding attribute. For a vertex document with the following content

{ _key: "A", value: 0 }

the corresponding edge documents would look like this:

{ _from: "vertices/A", _to: "vertices/B", vertex: "A" }
{ _from: "vertices/A", _to: "vertices/C", vertex: "A" }
{ _from: "vertices/A", _to: "vertices/D", vertex: "A" }
...

This will ensure that outgoing edge documents will be placed on the same DBServer as the vertex. Without the correct placement of the edges, the Pregel graph processing system will not work correctly, because edges will not be loaded correctly.

Arangosh API

Starting an Algorithm Execution

The Pregel API is accessible through the @arangodb/pregel package. To start an execution you need to specify the algorithm name and the vertex and edge collections. Alternatively you can specify a named graph. Additionally you can specify custom parameters which vary for each algorithm. The start method will always return a unique ID which can be used to interact with the algorithm later on.

The below version of the start method can be used for named graphs:

var pregel = require("@arangodb/pregel");
var params = {};
var execution = pregel.start("<algorithm>", "<graphname>", params);

params needs to be an object; the valid keys are mentioned below in the section Algorithms.

Alternatively you might want to specify the vertex and edge collections directly. The call syntax of the start method changes in this case: the second argument must be an object with the keys vertexCollections and edgeCollections.

var pregel = require("@arangodb/pregel");
var params = {};
var execution = pregel.start("<algorithm>", { vertexCollections: ["vertices"], edgeCollections: ["edges"] }, params);

The last argument is still the parameter object. See below for a list of algorithms and parameters.

Status of an Algorithm Execution
The ID returned by the pregel.start(...) method can be used to track the status of your algorithm.

var execution = pregel.start("sssp", "demograph", { source: "vertices/V" });
var status = pregel.status(execution);

The result will tell you the current status of the algorithm execution. It will tell you the current state of the execution, the current global superstep, the runtime, the global aggregator values, as well as the number of sent and received messages.

Valid values for the state field include:

"running": The algorithm is still running.
"done": The execution is done; the result might not be written back into the collection yet.
"canceled": The execution was permanently canceled, either by the user or by an error.
"in error": The execution is in an error state. This can be caused by primary DBServers being unreachable or unresponsive. The execution might recover later, or switch to "canceled" if it was not able to recover successfully.
"recovering": The execution is actively recovering and will switch back to "running" if the recovery was successful.

The object returned by the status method might for example look something like this:

{
  "state" : "running",
  "gss" : 12,
  "totalRuntime" : 123.23,
  "aggregators" : {
    "converged" : false,
    "max" : true,
    "phase" : 2
  },
  "sendCount" : 3240364978,
  "receivedCount" : 3240364975
}

Canceling an Execution / Discarding Results

To cancel an execution which is still running, and discard any intermediate results, you can use the cancel method. This will immediately free all memory taken up by the execution, and will make you lose all intermediary data.

You might get inconsistent results if you cancel an execution while it is already in its "done" state. The data is written multi-threaded into all collection shards at once, which means there are multiple transactions running simultaneously. A transaction might already be committed when you cancel the execution job, therefore you might still see parts of the result in your collection. This does not apply if you configured the execution to not write data into the collection.

// start a single-source shortest path job
var execution = pregel.start("sssp", "demograph", { source: "vertices/V" });
pregel.cancel(execution);

AQL integration

ArangoDB supports retrieving temporary Pregel results through the ArangoDB query language (AQL). When our graph processing subsystem finishes executing an algorithm, the result can either be written back into the database or kept in memory. In both cases the result can be queried via AQL. If the data was not written to the database store, it is only held temporarily until the user calls the cancel method. For example, a user might want to query only the highest-ranked vertices from the result set of a PageRank execution:

FOR v IN PREGEL_RESULT(<handle>)
  FILTER v.value >= 0.01
  RETURN v._key

Available Algorithms

There are a number of general parameters which apply to almost all algorithms (a combined example follows after this list):

store: true per default; the Pregel engine will write results back to the database. If the value is false then you can query the results via AQL; see AQL integration.
maxGSS: maximum number of global iterations for this algorithm.
parallelism: number of parallel threads to use per worker. Does not influence the number of threads used to load or store data from the database (this depends on the number of shards).
async: algorithms which support async mode will run without synchronized global iterations; this might lead to performance increases if you have load imbalances.
resultField: most algorithms will write the result into this field.
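As a sketch of how these parameters combine with the AQL integration, the following runs PageRank on an assumed named graph called "graphname", keeps the results in memory only (store: false) and reads them back via PREGEL_RESULT; the polling loop and the threshold value are illustrative choices, not requirements:

var pregel = require("@arangodb/pregel");
var db = require("@arangodb").db;

// run PageRank without writing results back to the documents
var handle = pregel.start("pagerank", "graphname", {
  store: false,    // keep results in memory only instead of writing them back
  maxGSS: 100,
  parallelism: 4
});

// wait until the execution is no longer running (simple polling, adjust to your needs)
while (pregel.status(handle).state === "running") {
  require("internal").wait(1);
}

// read the in-memory results via AQL
var topVertices = db._query(
  "FOR v IN PREGEL_RESULT(@handle) FILTER v.value >= 0.01 RETURN v._key",
  { handle: handle }
).toArray();

// discard the in-memory results once they are no longer needed
pregel.cancel(handle);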
Page Rank

PageRank is a well known algorithm to rank documents in a graph. The algorithm will run until the execution converges. Specify a custom threshold with the parameter threshold; to run for a fixed number of iterations use the maxGSS parameter instead.

var pregel = require("@arangodb/pregel");
pregel.start("pagerank", "graphname", { maxGSS: 100, threshold: 0.00000001 });

Single-Source Shortest Path

Calculates the distance of each vertex to a given source vertex. The algorithm will run until it converges; the number of iterations is bounded by the diameter (the longest shortest path) of your graph.

var pregel = require("@arangodb/pregel");
pregel.start("sssp", "graphname", { source: "vertices/1337" });

Connected Components

There are two algorithms to find connected components in a graph. To find weakly connected components (WCC) you can use the algorithm named "connectedcomponents", to find strongly connected components (SCC) you can use the algorithm named "scc". Both algorithms will assign a component ID to each vertex.

A weakly connected component means that there exists a path between every pair of vertices in that component. WCC is a very simple and fast algorithm, which will only work correctly on undirected graphs. Your results on directed graphs may vary, depending on how connected your components are.

In the case of SCC a component means every vertex is reachable from any other vertex in the same component. The algorithm is more complex than the WCC algorithm and requires more RAM, because each vertex needs to store much more state. Consider using WCC if you think your data may be suitable for it.

var pregel = require("@arangodb/pregel");
// weakly connected components
pregel.start("connectedcomponents", "graphname");
// strongly connected components
pregel.start("scc", "graphname");

Hyperlink-Induced Topic Search (HITS)

HITS is a link analysis algorithm that rates web pages, developed by Jon Kleinberg (the algorithm is also known as hubs and authorities).

The idea behind hubs and authorities comes from the typical structure of the web: certain websites, known as hubs, serve as large directories that are not actually authoritative on the information that they hold. These hubs are used as compilations of a broad catalog of information that leads users directly to other, authoritative web pages.

The algorithm assigns each vertex two scores: the authority score and the hub score. The authority score rates how many good hubs point to a particular vertex (or web page), the hub score rates how good (authoritative) the vertices pointed to are. For more information see https://en.wikipedia.org/wiki/HITS_algorithm

Our version of the algorithm converges after a certain amount of time. The parameter threshold can be used to set a limit for the convergence (measured as the maximum absolute difference of the hub and authority scores between the current and last iteration). When you specify the result field name, the hub score will be stored in "<resultField>_hub" and the authority score in "<resultField>_auth". The algorithm can be executed like this:

var pregel = require("@arangodb/pregel");
var handle = pregel.start("hits", "yourgraph", { threshold: 0.00001, resultField: "score" });

Vertex Centrality

Centrality measures help identify the most important vertices in a graph. They can be used in a wide range of applications: for example they can be used to identify influencers in social networks, or middle-men in terrorist networks.
There are various definitions for centrality, the simplest one being the vertex degree. These definitions were not designed with scalability in mind. It is probably impossible to discover an efficient algorithm which computes them in a distributed way. Fortunately there are scalable substitutions available, which should be equally usable for most use cases.

Effective Closeness

A common definition of centrality is the closeness centrality (or closeness). The closeness of a vertex in a graph is the inverse average length of the shortest paths between the vertex and all other vertices. For vertices x, y and shortest distance d(y, x) it is defined as

C(x) = 1 / Σy d(y, x)

(up to normalization by the number of vertices, this is the inverse of the average shortest path length).

Effective Closeness approximates the closeness measure. The algorithm works by iteratively estimating the number of shortest paths passing through each vertex. The score approximates the real closeness score, since it is not possible to actually count all shortest paths due to the horrendous O(n^2 d) memory requirements. The algorithm is from the paper Centralities in Large Networks: Algorithms and Observations (U. Kang et al. 2011).

ArangoDB's implementation approximates the number of shortest paths in each iteration by using a HyperLogLog counter with 64 buckets. This should work well on large graphs and on smaller ones as well. The memory requirements should be O(n * d) where n is the number of vertices and d the diameter of your graph. Each vertex will store a counter for each iteration of the algorithm. The algorithm can be used like this:

const pregel = require("@arangodb/pregel");
const handle = pregel.start("effectivecloseness", "yourgraph", { resultField: "closeness" });

LineRank

Another common measure is the betweenness centrality: it measures the number of times a vertex is part of shortest paths between any pairs of vertices. For a vertex v betweenness is defined as

B(v) = Σ_{x ≠ v ≠ y} σ_xy(v) / σ_xy

where σ_xy represents the number of shortest paths between x and y, and σ_xy(v) represents the number of those paths also passing through the vertex v. By intuition a vertex with higher betweenness centrality will have more information passing through it.

LineRank approximates the random walk betweenness of every vertex in a graph. This is the probability that someone starting on an arbitrary vertex will visit this node when they randomly choose edges to visit. The algorithm essentially builds a line graph out of your graph (switches the vertices and edges), and then computes a score similar to PageRank. This can be considered a scalable equivalent to vertex betweenness, which can be executed in a distributed way in ArangoDB. The algorithm is from the paper Centralities in Large Networks: Algorithms and Observations (U. Kang et al. 2011).

const pregel = require("@arangodb/pregel");
const handle = pregel.start("linerank", "yourgraph", { "resultField": "rank" });

Community Detection

Graphs based on real world networks often have a community structure. This means it is possible to find groups of vertices such that each vertex group is internally more densely connected than to the outside of the group. This has many applications when you want to analyze your networks; for example social networks include community groups (the origin of the term, in fact) based on common location, interests, occupation, etc.

Label Propagation

Label Propagation can be used to implement community detection on large graphs. The idea is that each vertex should be in the community that most of its neighbors are in. We iteratively determine this by first assigning random community IDs.
Then, in each iteration, a vertex will send its current community ID to all of its neighbor vertices. Each vertex then adopts the community ID it received most frequently during the iteration.

The algorithm runs until it converges, which likely never really happens on large graphs. Therefore you need to specify a maximum iteration bound which suits you. The default bound is 500 iterations, which is likely too large for your application. The algorithm should work best on undirected graphs; results on directed graphs might vary depending on the density of your graph.

const pregel = require("@arangodb/pregel");
const handle = pregel.start("labelpropagation", "yourgraph", { maxGSS: 100, resultField: "community" });

Speaker-Listener Label Propagation

The Speaker-Listener Label Propagation (SLPA) algorithm can be used to implement community detection. It works similar to the label propagation algorithm, but now every node additionally accumulates a memory of observed labels (instead of forgetting all but one label).

Before the algorithm runs, every vertex is initialized with a unique ID (the initial community label). During the run three steps are executed for each vertex:

1. The current vertex is the listener, all other vertices are speakers.
2. Each speaker sends out a label from memory; a random label is sent with a probability proportional to the number of times the vertex observed that label.
3. The listener remembers one of the labels; the most frequently observed label is always chosen.

const pregel = require("@arangodb/pregel");
const handle = pregel.start("slpa", "yourgraph", { maxGSS: 100, resultField: "community" });

You can also execute SLPA with the maxCommunities parameter to limit the number of output communities. Internally the algorithm will still keep the memory of all labels, but the output is reduced to just the n most frequently observed labels.

const pregel = require("@arangodb/pregel");
const handle = pregel.start("slpa", "yourgraph", { maxGSS: 100, resultField: "community", maxCommunities: 1 });
// check the status periodically for completion
pregel.status(handle);

Foxx

Traditionally, server-side projects have been developed as standalone applications that guide the communication between the client-side frontend and the database backend. This has led to applications that were either developed as single monoliths or that duplicated data access and domain logic across all services that had to access the database. Additionally, tools to abstract away the underlying database calls could incur a lot of network overhead when using remote databases without careful optimization.

ArangoDB allows application developers to write their data access and domain logic as microservices running directly within the database with native access to in-memory data. The Foxx microservice framework makes it easy to extend ArangoDB's own REST API with custom HTTP endpoints using modern JavaScript running on the same V8 engine you know from Node.js and the Google Chrome web browser.

Unlike traditional approaches to storing logic in the database (like stored procedures), these microservices can be written as regular structured JavaScript applications that can be easily distributed and version controlled. Depending on your project's needs Foxx can be used to build anything from optimized REST endpoints performing complex data access to entire standalone applications running directly inside the database.
Foxx at a glance

Each Foxx service is defined by a JSON manifest specifying the entry point, any scripts defined by the service, possible configuration options and Foxx dependencies, as well as other metadata. Within a service, these options are exposed as the service context.

At the heart of the Foxx framework lies the Foxx Router which is used to define HTTP endpoints. A service can access the database either directly from its context using prefixed collections or the ArangoDB database API.

While Foxx is primarily designed to be used to access the database itself, ArangoDB also provides an API to make HTTP requests to external services.

Scripts can be used to perform one-off tasks, which can also be scheduled to be performed asynchronously using the built-in job queue.

Finally, Foxx services can be installed and managed over the web interface or through ArangoDB's HTTP API.

How does it work

Foxx services consist of JavaScript code running in the V8 JavaScript runtime embedded inside ArangoDB. Each service is mounted in each available V8 context (the number of contexts can be adjusted in the ArangoDB configuration). Incoming requests are distributed across these contexts automatically.

If you're coming from another JavaScript environment like Node.js this is similar to running multiple Node.js processes behind a load balancer: you should not rely on server-side state (other than the database itself) between different requests as there is no way of making sure consecutive requests will be handled in the same context.

Because the JavaScript code is running inside the database, another difference is that all Foxx and ArangoDB APIs are purely synchronous and should be considered blocking. This is especially important for transactions, which in ArangoDB can execute arbitrary code but may have to lock entire collections (effectively preventing any data from being written) until the code has completed.

For information on how this affects interoperability with third-party JavaScript modules written for other JavaScript environments see the chapter on dependencies.

Development mode

Development mode allows you to make changes to deployed services in-place directly on the database server's file system without downloading and re-uploading the service bundle. Additionally, error messages will contain stack traces.

You can toggle development mode on and off in the service settings tab of the web interface or using the HTTP API. Once activated, the service's file system path will be shown in the info tab.

Once enabled, the service's source files and manifest will be re-evaluated, and the setup script (if present) re-executed, every time a route of the service is accessed, effectively re-deploying the service on every request. As the name indicates, this is intended to be used strictly during development and is most definitely a bad idea on production servers.

The additional information exposed during development mode may include file system paths and parts of the service's source code. Also note that if you are serving static files as part of your service, accessing these files from a browser may also trigger a re-deployment of the service. Finally, making HTTP requests to a service running in development mode from within the service (i.e. using the @arangodb/request module to access the service itself) is probably not a good idea either.

Beware of deleting the database the service is deployed on: it will erase the source files of the service along with the collections.
You should back up the code you worked on in development before doing that to avoid losing your progress.

Foxx store

The Foxx store provides access to a number of ready-to-use official and community-maintained Foxx services you can install with a single click, including example services and wrappers for external SaaS tools like transactional e-mail services, bug loggers or analytics trackers. You can find the Foxx store in the web interface by using the Add Service button in the service list.

Cluster-Foxx

When running ArangoDB in a cluster the Foxx services will run on each coordinator. Installing, upgrading and uninstalling services on any coordinator will automatically distribute the service to the other coordinators, making deployments as easy as in single-server mode. However, this means there are some limitations:

You should avoid any kind of file system state beyond the deployed service bundle itself. Don't write data to the file system or encode any expectations of the file system state other than the files in the service folder that were installed as part of the service (e.g. file uploads or custom log files).

Additionally, the development mode will lead to an inconsistent state of the cluster until it is disabled. While a service is running in development mode you can make changes to the service on the file system of any coordinator and see them reflected in real time just like when running ArangoDB as a single server. However, the changes made on one coordinator will not be reflected across other coordinators until the development mode is disabled. When disabling the development mode for a service, the coordinator will create a new bundle and distribute it across the cluster like a manual upgrade of the service.

For these reasons it is strongly recommended not to use development mode in a cluster with multiple coordinators unless you are sure that no requests or changes will be made to other coordinators while you are modifying the service. Using development mode in a production cluster is extremely unsafe and highly discouraged.

Getting Started

We're going to start with an empty folder. This will be the root folder of our services. You can name it something clever, but for the course of this guide we'll assume it's called the same as the name of your service: getting-started.

First we need to create a manifest. Create a new file called manifest.json and add the following content:

{
  "engines": {
    "arangodb": "^3.0.0"
  }
}

This just tells ArangoDB the service is compatible with versions 3.0.0 and later (all the way up to but not including 4.0.0), allowing older versions of ArangoDB to understand that this service likely won't work for them and newer versions what behavior to emulate should they still support it.

The little hat to the left of the version number is not a typo, it's called a "caret" and indicates the version range. Foxx uses semantic versioning (also called "semver") for most of its version handling. You can find out more about how semver works at the official semver website.

Next we'll need to specify an entry point to our service. This is the JavaScript file that will be executed to define our service's HTTP endpoints.
We can do this by adding a "main" field to our manifest:

{
  "engines": {
    "arangodb": "^3.0.0"
  },
  "main": "index.js"
}

That's all we need in our manifest for now, so let's next create the index.js file:

'use strict';
const createRouter = require('@arangodb/foxx/router');
const router = createRouter();
module.context.use(router);

The first line causes our file to be interpreted using strict mode. All examples in the ArangoDB documentation assume strict mode, so you might want to familiarize yourself with it if you haven't encountered it before.

The second line imports the @arangodb/foxx/router module, which provides a function for creating new Foxx routers. We're using this function to create a new router object which we'll be using for our service.

The module.context is the so-called Foxx context or service context. This variable is available in all files that are part of your Foxx service and provides access to Foxx APIs specific to the current service, like the use method, which tells Foxx to mount the router in this service (and to expose its routes to HTTP).

Next let's define a route that prints a generic greeting:

// continued
router.get('/hello-world', function (req, res) {
  res.send('Hello World!');
})
.response(['text/plain'], 'A generic greeting.')
.summary('Generic greeting')
.description('Prints a generic greeting.');

The router provides the methods get, post, etc. corresponding to each HTTP verb, as well as the catch-all all. These methods indicate that the given route should be used to handle incoming requests with the given HTTP verb (or any method when using all).

These methods take an optional path (if omitted, it defaults to "/") as well as a request handler, which is a function taking the req (request) and res (response) objects to handle the incoming request and generate the outgoing response. If you have used the express framework in Node.js, you may already be familiar with how this works, otherwise check out the chapter on routes.

The object returned by the router's methods provides additional methods to attach metadata and validation to the route. We're using summary and description to document what the route does -- these aren't strictly necessary but give us some nice auto-generated documentation. The response method lets us additionally document the response content type and what the response body will represent.
Click the Try it out! button to send a request to the route and you should see an example request with the service's response: "Hello World!". Congratulations! You have just created, installed and used your first Foxx service. Parameter validation Let's add another route that provides a more personalized greeting: // continued const joi = require('joi'); router.get('/hello/:name', function (req, res) { res.send(`Hello ${req.pathParams.name}`); }) .pathParam('name', joi.string().required(), 'Name to greet.') .response(['text/plain'], 'A personalized greeting.') .summary('Personalized greeting') .description('Prints a personalized greeting.'); The first line imports the joi module from npm which comes bundled with ArangoDB. Joi is a validation library that is used throughout Foxx to define schemas and parameter types. Note: You can bundle your own modules from npm by installing them in your service folder and making sure the node_modules folder is included in your zip archive. For more information see the section on module dependencies in the chapter on dependencies. The pathParam method allows us to specify parameters we are expecting in the path. The first argument corresponds to the parameter name in the path, the second argument is a joi schema the parameter is expected to match and the final argument serves to describe the parameter in the API documentation. The path parameters are accessible from the pathParams property of the request object. We're using a template string to generate the server's response containing the parameter's value. 186 Getting started Note that routes with path parameters that fail to validate for the request URL will be skipped as if they wouldn't exist. This allows you to define multiple routes that are only distinguished by the schemas of their path parameters (e.g. a route taking only numeric parameters and one taking any string as a fallback). Let's take this further and create a route that takes a JSON request body: // continued router.post('/sum', function (req, res) { const values = req.body.values; res.send({ result: values.reduce(function (a, b) { return a + b; }, 0) }); }) .body(joi.object({ values: joi.array().items(joi.number().required()).required() }).required(), 'Values to add together.') .response(joi.object({ result: joi.number().required() }).required(), 'Sum of the input values.') .summary('Add up numbers') .description('Calculates the sum of an array of number values.'); Note that we used post to define this route instead of this route's URL (in the absence of a get (which does not support request bodies). Trying to send a GET request to route for the same path) will result in Foxx responding with an appropriate error response, get indicating the supported HTTP methods. As this route not only expects a JSON object as input but also responds with a JSON object as output we need to define two schemas. We don't strictly need a response schema but it helps documenting what the route should be expected to respond with and will show up in the API documentation. Because we're passing a schema to the response method we don't need to explicitly tell Foxx we are sending a JSON response. The presence of a schema in the absence of a content type always implies we want JSON. Though we could just add ["application/json"] as an additional argument after the schema if we wanted to make this more explicit. The body method works the same way as the response method except the schema will be used to validate the request body. 
If the request body can't be parsed as JSON or doesn't match the schema, Foxx will reject the request with an appropriate error response.

Creating collections

The real power of Foxx comes from interacting with the database itself. In order to be able to use a collection from within our service, we should first make sure that the collection actually exists. The right place to create the collections your service is going to use is in a setup script, which Foxx will execute for you when installing or updating the service.

First create a new folder called "scripts" in the service folder, which will be where our scripts are going to live. For simplicity's sake, our setup script will live in a file called setup.js inside that folder:

'use strict';
const db = require('@arangodb').db;
const collectionName = 'myFoxxCollection';

if (!db._collection(collectionName)) {
  db._createDocumentCollection(collectionName);
}

The script uses the db object from the @arangodb module, which lets us interact with the database the Foxx service was installed in and the collections inside that database. Because the script may be executed multiple times (i.e. whenever we update the service or when the server is restarted) we need to make sure we don't accidentally try to create the same collection twice (which would result in an exception); we do that by first checking whether it already exists before creating it.

The _collection method looks up a collection by name and returns null if no collection with that name was found. The _createDocumentCollection method creates a new document collection by name (_createEdgeCollection also exists and works analogously for edge collections).

Note: Because we have hardcoded the collection name, multiple copies of the service installed alongside each other in the same database will share the same collection. Because this may not always be what you want, the Foxx context also provides the collectionName method, which applies a mount point specific prefix to any given collection name to make it unique to the service. It also provides the collection method, which behaves almost exactly like db._collection except it also applies the prefix before looking the collection up. A sketch of this prefixed variant is shown at the end of this section.

Next we need to tell our service about the script by adding it to the manifest file:

{
  "engines": {
    "arangodb": "^3.0.0"
  },
  "main": "index.js",
  "scripts": {
    "setup": "scripts/setup.js"
  }
}

The only thing that has changed is that we added a "scripts" field specifying the path of the setup script we just wrote.

Go back to the web interface and update the service with our new code, then check the Collections tab. If everything worked right, you should see a new collection called "myFoxxCollection".
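For reference, here is a minimal sketch of the prefixed variant mentioned in the note above; the derived collection name shown in the comment assumes a service mounted at /getting-started and is only illustrative:

'use strict';
const db = require('@arangodb').db;

// module.context.collectionName applies the mount-point-specific prefix,
// e.g. 'entries' might become 'getting_started_entries' for a service mounted at /getting-started
const prefixedName = module.context.collectionName('entries');

if (!db._collection(prefixedName)) {
  db._createDocumentCollection(prefixedName);
}

// later, e.g. in a route handler, the same collection can be accessed via
// module.context.collection, which applies the prefix for us
const entries = module.context.collection('entries');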
Accessing collections

Let's expand our service by adding a few more routes to our index.js:

// continued
const db = require('@arangodb').db;
const errors = require('@arangodb').errors;
const foxxColl = db._collection('myFoxxCollection');
const DOC_NOT_FOUND = errors.ERROR_ARANGO_DOCUMENT_NOT_FOUND.code;

router.post('/entries', function (req, res) {
  const data = req.body;
  const meta = foxxColl.save(req.body);
  res.send(Object.assign(data, meta));
})
.body(joi.object().required(), 'Entry to store in the collection.')
.response(joi.object().required(), 'Entry stored in the collection.')
.summary('Store an entry')
.description('Stores an entry in the "myFoxxCollection" collection.');

router.get('/entries/:key', function (req, res) {
  try {
    const data = foxxColl.document(req.pathParams.key);
    res.send(data)
  } catch (e) {
    if (!e.isArangoError || e.errorNum !== DOC_NOT_FOUND) {
      throw e;
    }
    res.throw(404, 'The entry does not exist', e);
  }
})
.pathParam('key', joi.string().required(), 'Key of the entry.')
.response(joi.object().required(), 'Entry stored in the collection.')
.summary('Retrieve an entry')
.description('Retrieves an entry from the "myFoxxCollection" collection by key.');

We're using the save and document methods of the collection object to store and retrieve documents in the collection we created in our setup script. Because we don't care what the documents look like, we allow any attributes on the request body and just accept an object.

Because the key will be automatically generated by ArangoDB when one wasn't specified in the request body, we're using Object.assign to apply the attributes of the metadata object returned by the save method to the document before returning it from our first route.

The document method returns a document in a collection by its _key or _id. However, when no matching document exists it throws an ArangoError exception. Because we want to provide a more descriptive error message than ArangoDB does out of the box, we need to handle that error explicitly.

All ArangoError exceptions have a truthy attribute isArangoError that helps you recognize these errors without having to worry about instanceof checks. They also provide an errorNum and an errorMessage. If you want to check for specific errors you can just import the errors object from the @arangodb module instead of having to memorize numeric error codes.

Instead of defining our own response logic for the error case we just use res.throw, which makes the response object throw an exception Foxx can recognize and convert to the appropriate server response. We also pass along the exception itself so Foxx can provide more diagnostic information when we want it to.

We could extend the post route to support arrays of objects as well, each object following a certain schema:

// store schema in variable to make it re-usable, see .body()
const docSchema = joi.object().required().keys({
  name: joi.string().required(),
  age: joi.number().required()
}).unknown(); // allow additional attributes

router.post('/entries', function (req, res) {
  const multiple = Array.isArray(req.body);
  const body = multiple ? req.body : [req.body];
  let data = [];
  for (var doc of body) {
    const meta = foxxColl.save(doc);
    data.push(Object.assign(doc, meta));
  }
  res.send(multiple ? data : data[0]);
})
.body(joi.alternatives().try(
  docSchema,
  joi.array().items(docSchema)
), 'Entry or entries to store in the collection.')
.response(joi.alternatives().try(
  joi.object().required(),
  joi.array().items(joi.object().required())
), 'Entry or entries stored in the collection.')
.summary('Store entry or entries')
.description('Store a single entry or multiple entries in the "myFoxxCollection" collection.');

Writing database queries

Storing and retrieving entries is fine, but right now we have to memorize each key when we create an entry. Let's add a route that gives us a list of the keys of all entries so we can use those to look an entry up in detail.

The naïve approach would be to use the toArray() method to convert the entire collection to an array and just return that. But we're only interested in the keys and there might potentially be so many entries that first retrieving every single document might get unwieldy. Let's write a short AQL query to do this instead:

// continued
const aql = require('@arangodb').aql;

router.get('/entries', function (req, res) {
  const keys = db._query(aql`
    FOR entry IN ${foxxColl}
    RETURN entry._key
  `);
  res.send(keys);
})
.response(joi.array().items(
  joi.string().required()
).required(), 'List of entry keys.')
.summary('List entry keys')
.description('Assembles a list of keys of entries in the collection.');

Here we're using two new things:

The _query method executes an AQL query in the active database.

The aql template string handler allows us to write multi-line AQL queries and also handles query parameters and collection names. Instead of hardcoding the name of the collection we want to use in the query, we can simply reference the foxxColl variable we defined earlier -- it recognizes the value as an ArangoDB collection object and knows we are specifying a collection rather than a regular value, even though AQL distinguishes between the two.

Note: If you aren't used to JavaScript template strings and template string handlers, just think of aql as a function that receives the multiline string split at every ${} expression as well as an array of the values of those expressions -- that's actually all there is to it.

Alternatively, here's a version without template strings (notice how much cleaner the aql version will be in comparison when you have multiple variables):

const keys = db._query(
  'FOR entry IN @@coll RETURN entry._key',
  {'@coll': foxxColl}
);

Next steps

You now know how to create a Foxx service from scratch, how to handle user input and how to access the database from within your Foxx service to store, retrieve and query data you store inside ArangoDB. This should allow you to build meaningful APIs for your own applications, but there are many more things you can do with Foxx:

Need to go faster? Turn on development mode and hack on your code right on the server.
Concerned about security? You could add authentication to your service to protect access to the data before it even leaves the database.
Writing a single page app? You could store some basic assets right inside your Foxx service.
Need to integrate external services? You can make HTTP requests from inside Foxx and use queued jobs to perform that work in the background.
Tired of reinventing the wheel? Learn about dependencies.
Everything broken? You can write tests to make sure your logic remains sound.

Manifest files

Every service comes with a manifest.json file providing metadata.
Next steps

You now know how to create a Foxx service from scratch, how to handle user input and how to access the database from within your Foxx service to store, retrieve and query data you store inside ArangoDB. This should allow you to build meaningful APIs for your own applications, but there are many more things you can do with Foxx:

Need to go faster? Turn on development mode and hack on your code right on the server.
Concerned about security? You could add authentication to your service to protect access to the data before it even leaves the database.
Writing a single page app? You could store some basic assets right inside your Foxx service.
Need to integrate external services? You can make HTTP requests from inside Foxx and use queued jobs to perform that work in the background.
Tired of reinventing the wheel? Learn about dependencies.
Everything broken? You can write tests to make sure your logic remains sound.

Service manifest

Manifest files

Every service comes with a manifest.json file providing metadata. The following fields are allowed in manifests:

configuration: Object (optional)
An object defining the configuration options this service requires.

defaultDocument: string (optional)
If specified, the / (root) route of the service will automatically redirect to the given relative path, e.g.:

"defaultDocument": "index.html"

This would have the same effect as creating the following route in JavaScript:

const createRouter = require('@arangodb/foxx/router');
const indexRouter = createRouter();
indexRouter.all('/', function (req, res) {
  res.redirect('index.html');
});
module.context.use(indexRouter);

Note: As of 3.0.0 this field can safely be omitted; the value no longer defaults to "index.html".

dependencies: Object (optional) and provides: Object (optional)
Objects specifying other services this service has as dependencies and what dependencies it can provide to other services.

engines: Object (optional)
An object indicating the semantic version ranges of ArangoDB (or compatible environments) the service will be compatible with, e.g.:

"engines": {
  "arangodb": "^3.0.0"
}

This should correctly indicate the minimum version of ArangoDB the service has been tested against. Foxx maintains a strict semantic versioning policy as of ArangoDB 3.0.0, so it is generally safe to use semver ranges (e.g. ^3.0.0 to match any version greater or equal to 3.0.0 and below 4.0.0) for maximum compatibility.

files: Object (optional)
An object defining file assets served by this service.

lib: string (Default: ".")
The relative path to the Foxx JavaScript files in the service, e.g.:

"lib": "lib"

This would result in the main entry point (see below) and other JavaScript paths being resolved as relative to the lib folder inside the service folder.

main: string (optional)
The relative path to the main entry point of this service (relative to lib, see above), e.g.:

"main": "index.js"

This would result in Foxx loading and executing the file index.js when the service is mounted or started.

Note: while it is technically possible to omit this field, you will likely want to provide an entry point to your service as this is the only way to expose HTTP routes or export a JavaScript API.

scripts: Object (optional)
An object defining named scripts provided by this service, which can either be used directly or as queued jobs by other services.

tests: string or Array (optional)
A path or list of paths of JavaScript tests provided for this service.

Additionally manifests can provide the following metadata:

author: string (optional)
The full name of the author of the service (i.e. you). This will be shown in the web interface.

contributors: Array (optional)
A list of names of people that have contributed to the development of the service in some way. This will be shown in the web interface.

description: string (optional)
A human-readable description of the service. This will be shown in the web interface.

keywords: Array (optional)
A list of keywords that help categorize this service. This is used by the Foxx Store installers to organize services.

license: string (optional)
A string identifying the license under which the service is published, ideally in the form of an SPDX license identifier. This will be shown in the web interface.

name: string (optional)
The name of the Foxx service. Allowed characters are A-Z, 0-9, the ASCII hyphen ( - ) and underscore ( _ ) characters. The name must not start with a number. This will be shown in the web interface.
thumbnail: string (optional) The filename of a thumbnail that will be used alongside the service in the web interface. This should be a JPEG or PNG image that looks good at sizes 50x50 and 160x160. version: string (optional) The version number of the Foxx service. The version number must follow the semantic versioning format. This will be shown in the web interface. Examples { "name": "example-foxx-service", "version": "3.0.0-dev", "license": "MIT", "description": "An example service with a relatively full-featured manifest.", "thumbnail": "foxx-icon.png", "keywords": ["demo", "service"], "author": "ArangoDB GmbH", "contributors": [ "Alan Plum " ], "lib": "dist", "main": "entry.js", "defaultDocument": "welcome.html", "engines": { "arangodb": "^3.0.0" }, "files": { 192 Service manifest "welcome.html": "assets/index.html", "hello.jpg": "assets/hello.jpg" "world.jpg": { "path": "assets/world.jpg", "type": "image/jpeg", "gzip": false } }, "tests": "dist/**.spec.js" } 193 Service context Foxx service context The service context provides access to methods and attributes that are specific to a given service. In a Foxx service the context is generally available as the module.context variable. Within a router's request handler the request and response objects' context attribute also provide access to the context of the service the route was mounted in (which may be different from the one the route handler was defined in). Examples // in service /my-foxx-1 const createRouter = require('@arangodb/foxx/router'); const router = createRouter(); // See the chapter on dependencies for more info on // how exports and dependencies work across services module.exports = {routes: router}; router.get(function (req, res) { module.context.mount === '/my-foxx-1'; req.context.mount === '/my-foxx-2'; res.write('Hello from my-foxx-1'); }); // in service /my-foxx-2 const createRouter = require('@arangodb/foxx/router'); const router2 = createRouter(); module.context.use(router2); router2.post(function (req, res) { module.context.mount === '/my-foxx-2'; req.context.mount === '/my-foxx-2'; res.write('Hello from my-foxx-2'); }); const router1 = module.context.dependencies.myFoxx1.routes; module.context.use(router1); The service context specifies the following properties: argv: any Any arguments passed in if the current file was executed as a script or queued job. basePath: string The file system path of the service, i.e. the folder in which the service was installed to by ArangoDB. baseUrl: string The base URL of the service, relative to the ArangoDB server, e.g. collectionPrefix: /_db/_system/my-foxx . string The prefix that will be used by collection and collectionName to derive the names of service-specific collections. This is derived from the service's mount point, e.g. configuration: /my-foxx becomes my_foxx . Object Configuration options for the service. dependencies: Object Configured dependencies for the service. isDevelopment: boolean Indicates whether the service is running in development mode. 194 Service context isProduction: boolean The inverse of isDevelopment. manifest: Object The parsed manifest file of the service. mount: string The mount point of the service, e.g. /my-foxx . apiDocumentation module.context.apiDocumentation([options]): Function DEPRECATED Creates a request handler that serves the API documentation. Note: This method has been deprecated in ArangoDB 3.1 and replaced with the more straightforward createDocumentationRouter method providing the same functionality. 
Arguments See createDocumentationRouter below. Examples // Serve the API docs for the current service router.get('/docs/*', module.context.apiDocumentation()); // Note that the path must end with a wildcard // and the route must use HTTP GET. createDocumentationRouter module.context.createDocumentationRouter([options]): Router Creates a router that serves the API documentation. Note: The router can be mounted like any other child router (see examples below). Arguments options: Object (optional) An object with any of the following properties: mount: string (Default: module.context.mount ) The mount path of the service to serve the documentation of. indexFile: string (Default: "index.html" ) File name of the HTM L file serving the API documentation. swaggerRoot: string (optional) Full path of the folder containing the Swagger assets and the indexFile. Defaults to the Swagger assets used by the web interface. before: Function (optional) A function that will be executed before a request is handled. If the function returns false the request will not be processed any further. If the function returns an object, its attributes will be used to override the options for the current request. 195 Service context Any other return value will be ignored. If options is a function it will be used as the before option. If options is a string it will be used as the swaggerRoot option. Returns a Foxx router. Examples // Serve the API docs for the current service router.use('/docs', module.context.createDocumentationRouter()); // -- or -// Serve the API docs for the service the router is mounted in router.use('/docs', module.context.createDocumentationRouter(function (req) { return {mount: req.context.mount}; })); // -- or -// Serve the API docs only for users authenticated with ArangoDB router.use('/docs', module.context.createDocumentationRouter(function (req, res) { if (req.suffix === 'swagger.json' && !req.arangoUser) { res.throw(401, 'Not authenticated'); } })); collection module.context.collection(name): ArangoCollection | null Passes the given name to collectionName, then looks up the collection with the prefixed name. Arguments name: string Unprefixed name of the service-specific collection. Returns a collection or null if no collection with the prefixed name exists. collectionName module.context.collectionName(name): string Prefixes the given name with the collectionPrefix for this service. Arguments name: string Unprefixed name of the service-specific collection. Returns the prefixed name. Examples module.context.mount === '/my-foxx' module.context.collectionName('doodads') === 'my_foxx_doodads' file module.context.file(name, [encoding]): Buffer | string Passes the given name to fileName, then loads the file with the resulting name. 196 Service context Arguments name: string Name of the file to load, relative to the current service. encoding: string (optional) Encoding of the file, e.g. utf-8 . If omitted the file will be loaded as a raw buffer instead of a string. Returns the file's contents. fileName module.context.fileName(name): string Resolves the given file name relative to the current service. Arguments name: string Name of the file, relative to the current service. Returns the absolute file path. use module.context.use([path], router): Endpoint M ounts a given router on the service to expose the router's routes on the service's mount point. Arguments path: string (Default: "/" ) Path to mount the router at, relative to the service's mount point. 
router: Router | Middleware A router or middleware to mount. Returns an Endpoint for the given router or middleware. Note: M ounting services at run time (e.g. within request handlers or queued jobs) is not supported. 197 Configuration Foxx configuration Foxx services can define configuration parameters to make them more re-usable. The object maps names to configuration parameters: configuration The key is the name under which the parameter will be available on the service context's configuration property. The value is a parameter definition. The parameter definition can have the following properties: description: string Human readable description of the parameter. type: (Default: string "string" ) Type of the configuration parameter. Supported values are: "integer" or "int" : any finite integer number. "boolean" or "bool" : the values true or false "number" : any finite decimal or integer number. "string" : any string value. "json" : any well-formed JSON value. "password" default: . : like string but will be displayed as a masked input field in the web frontend. any Default value of the configuration parameter. required: (Default: true ) Whether the parameter is required. If the configuration has parameters that do not specify a default value, you need to configure the service before it becomes active. In the meantime a fallback servicelication will be mounted that responds to all requests with a HTTP 500 status code indicating a server-side error. The configuration parameters of a mounted service can be adjusted from the web interface by clicking the Configuration button in the service details. Examples "configuration": { "currency": { "description": "Currency symbol to use for prices in the shop.", "default": "$", "type": "string" }, "secretKey": { "description": "Secret key to use for signing session tokens.", "type": "password" } } 198 Dependencies Dependency management There are two things commonly called "dependencies" in Foxx: M odule dependencies, i.e. dependencies on external JavaScript modules (e.g. from the public npm registry) Foxx dependencies, i.e. dependencies between Foxx services Let's look at them in more detail: Module dependencies You can use the folder to bundle third-party Foxx-compatible npm and Node.js modules with your Foxx service. node_modules Typically this is achieved by adding a package.json file to your project specifying npm dependencies using the dependencies attribute and installing them with the npm command-line tool. M ake sure to include the actual node_modules folder in your Foxx service bundle as ArangoDB will not do anything special to install these dependencies. Also keep in mind that bundling extraneous modules like development dependencies may bloat the file size of your Foxx service bundle. Compatibility caveats Unlike JavaScript in browsers or Node.js, the JavaScript environment in ArangoDB is synchronous. This means any modules that depend on asynchronous behaviour like promises or setTimeout will not behave correctly in ArangoDB or Foxx. Additionally unlike Node.js ArangoDB does not support native extensions. All modules have to be implemented in pure JavaScript. While ArangoDB provides a lot of compatibility code to support modules written for Node.js, some Node.js built-in modules can not be provided by ArangoDB. For a closer look at the Node.js modules ArangoDB does or does not provide check out the appendix on JavaScript modules. 
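As a rough sketch of the bundling workflow described above, under these compatibility caveats (the package name pure-js-example is a placeholder, not a real module):

// package.json in the service's root folder (placeholder dependency):
// {
//   "dependencies": {
//     "pure-js-example": "^1.0.0"
//   }
// }
// Run `npm install` locally so the resulting node_modules folder is part of
// the service bundle; inside the service the module is then required as usual:
const pureJsExample = require('pure-js-example');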
These restrictions apply not only to the modules you wish to install but also to the dependencies of those modules. As a rule of thumb: modules written to work in Node.js and the browser that do not rely on async behaviour should generally work; modules that rely on network or filesystem I/O or make heavy use of async behaviour most likely will not.

Foxx dependencies

Foxx dependencies can be declared in a service's manifest using the provides and dependencies fields:

provides lists the dependencies a given service provides, i.e. which APIs it claims to be compatible with
dependencies lists the dependencies a given service uses, i.e. which APIs its dependencies need to be compatible with

A dependency name should generally use the same format as a namespaced (org-scoped) NPM module, e.g. @foxx/sessions. Dependency names refer to the external JavaScript API of a service rather than specific services implementing those APIs. Some dependency names defined by officially maintained services are:

@foxx/auth (version 1.0.0)
@foxx/api-keys (version 1.0.0)
@foxx/bugsnag (versions 1.0.0 and 2.0.0)
@foxx/mailgun (versions 1.0.0 and 2.0.0)
@foxx/postageapp (versions 1.0.0 and 2.0.0)
@foxx/postmark (versions 1.0.0 and 2.0.0)
@foxx/sendgrid (versions 1.0.0 and 2.0.0)
@foxx/oauth2 (versions 1.0.0 and 2.0.0)
@foxx/segment-io (versions 1.0.0 and 2.0.0)
@foxx/sessions (versions 1.0.0 and 2.0.0)
@foxx/users (versions 1.0.0, 2.0.0 and 3.0.0)

A provides definition maps each provided dependency's name to the provided version:

"provides": {
  "@foxx/auth": "1.0.0"
}

A dependencies definition maps the local alias of a given dependency against its name and the supported version range (either as a JSON object or a shorthand string):

"dependencies": {
  "mySessions": "@foxx/sessions:^2.0.0",
  "myAuth": {
    "name": "@foxx/auth",
    "version": "^1.0.0",
    "description": "This description is entirely optional.",
    "required": false
  }
}

Dependencies can be configured from the web interface in a service's settings tab using the Dependencies button. The value for each dependency should be the database-relative mount path of the service (including the leading slash). In order to be usable as the dependency of another service both services need to be mounted in the same database. A service can be used to provide multiple dependencies for the same service (as long as the expected JavaScript APIs don't conflict).

A service that has unconfigured required dependencies can not be used until all of its dependencies have been configured. It is possible to specify the mount path of a service that does not actually declare the dependency as provided. There is currently no validation beyond the manifest formats.

When a service uses another mounted service as a dependency, the dependency's main entry file's exports object becomes available in the module.context.dependencies object of the other service:

Examples

Service A and Service B are mounted in the same database. Service B has a dependency with the local alias "greeter". The dependency is configured to use the mount path of Service A.

// Entry file of Service A
module.exports = {
  sayHi () {
    return 'Hello';
  }
};

// Somewhere in Service B
const greeter = module.context.dependencies.greeter;
res.write(greeter.sayHi());

Routers

const createRouter = require('@arangodb/foxx/router');

Routers let you define routes that extend ArangoDB's HTTP API with custom endpoints.
Routers need to be mounted using the use method of a service context to expose their HTTP routes at a service's mount path. You can pass routers between services mounted in the same database as dependencies. You can even nest routers within each other. Creating a router createRouter(): Router This returns a new, clean router object that has not yet been mounted in the service and can be exported like any other object. Request handlers router.get([path], [...middleware], handler, [name]): Endpoint router.post([path], [...middleware], handler, [name]): Endpoint router.put([path], [...middleware], handler, [name]): Endpoint router.patch([path], [...middleware], handler, [name]): Endpoint router.delete([path], [...middleware], handler, [name]): Endpoint router.all([path], [...middleware], handler, [name]): Endpoint These methods let you specify routes on the router. The all method defines a route that will match any supported HTTP verb, the other methods define routes that only match the HTTP verb with the same name. Arguments path: string (Default: "/" ) The path of the request handler relative to the base path the Router is mounted at. If omitted, the request handler will handle requests to the base path of the Router. For information on defining dynamic routes see the section on path parameters in the chapter on router endpoints. middleware: Function (optional) Zero or more middleware functions that take the following arguments: req: Request An incoming server request object. res: Response An outgoing server response object. next: Function A callback that passes control over to the next middleware function and returns when that function has completed. If a truthy argument is passed, that argument will be thrown as an error. If there is no next middleware function, the handler: handler will be invoked instead (see below). Function A function that takes the following arguments: req: Request An incoming server request object. 201 Routers res: Response An outgoing server response. name: string (optional) A name that can be used to generate URLs for the endpoint. For more information see the reverse method of the request object. Returns an Endpoint for the route. Examples Simple index route: router.get(function (req, res) { res.set('content-type', 'text/plain'); res.write('Hello World!'); }); Restricting access to authenticated ArangoDB users: router.get('/secrets', function (req, res, next) { if (req.arangoUser) { next(); } else { res.throw(404, 'Secrets? What secrets?'); } }, function (req, res) { res.download('allOurSecrets.zip'); }); M ultiple middleware functions: function counting (req, res, next) { if (!req.counter) req.counter = 0; req.counter++; next(); req.counter--; } router.get(counting, counting, counting, function (req, res) { res.json({counter: req.counter}); // {"counter": 3} }); Mounting child routers and middleware router.use([path], middleware, [name]): Endpoint The use method lets you mount a child router or middleware at a given path. Arguments path: string (optional) The path of the middleware relative to the base path the Router is mounted at. If omitted, the middleware will handle requests to the base path of the Router. For information on defining dynamic routes see the section on path parameters in the chapter on router endpoints. middleware: Router | Middleware An unmounted router object or a middleware. name: string (optional) A name that can be used to generate URLs for endpoints of this router. For more information see the reverse method of the request object. 
Has no effect if handler is a M iddleware. Returns an Endpoint for the middleware or child router. 202 Routers 203 Endpoints Endpoints Endpoints are returned by the use , all and HTTP verb (e.g. get , post ) methods of routers as well as the use method of the service context. They can be used to attach metadata to mounted routes, middleware and child routers that affects how requests and responses are processed or provides API documentation. Endpoints should only be used to invoke the following methods. Endpoint methods can be chained together (each method returns the endpoint itself). header endpoint.header(name, [schema], [description]): this Defines a request header recognized by the endpoint. Any additional non-defined headers will be treated as optional string values. The definitions will also be shown in the route details in the API documentation. If the endpoint is a child router, all routes of that router will use this header definition unless overridden. Arguments name: string Name of the header. This should be considered case insensitive as all header names will be converted to lowercase. schema: Schema (optional) A schema describing the format of the header value. This can be a joi schema or anything that has a compatible The value of this header will be set to the value validate method. property of the validation result. A validation failure will result in an automatic 400 (Bad Request) error response. description: string (optional) A human readable string that will be shown in the API documentation. Returns the endpoint. Examples router.get(/* ... */) .header('arangoVersion', joi.number().min(30000).default(30000)); pathParam endpoint.pathParam(name, [schema], [description]): this Defines a path parameter recognized by the endpoint. Path parameters are expected to be filled as part of the endpoint's mount path. Any additional non-defined path parameters will be treated as optional string values. The definitions will also be shown in the route details in the API documentation. If the endpoint is a child router, all routes of that router will use this parameter definition unless overridden. Arguments name: string Name of the parameter. schema: Schema (optional) A schema describing the format of the parameter. This can be a joi schema or anything that has a compatible validate method. 204 Endpoints The value of this parameter will be set to the value property of the validation result. A validation failure will result in the route failing to match and being ignored (resulting in a 404 (Not Found) error response if no other routes match). description: string (optional) A human readable string that will be shown in the API documentation. Returns the endpoint. Examples router.get('/some/:num/here', /* ... */) .pathParam('num', joi.number().required()); queryParam endpoint.queryParam(name, [schema], [description]): this Defines a query parameter recognized by the endpoint. Any additional non-defined query parameters will be treated as optional string values. The definitions will also be shown in the route details in the API documentation. If the endpoint is a child router, all routes of that router will use this parameter definition unless overridden. Arguments name: string Name of the parameter. schema: (optional) Schema A schema describing the format of the parameter. This can be a joi schema or anything that has a compatible The value of this parameter will be set to the value validate method. property of the validation result. 
A validation failure will result in an automatic 400 (Bad Request) error response. description: string (optional) A human readable string that will be shown in the API documentation. Returns the endpoint. Examples router.get(/* ... */) .queryParam('num', joi.number().required()); body endpoint.body([model], [mimes], [description]): this Defines the request body recognized by the endpoint. There can only be one request body definition per endpoint. The definition will also be shown in the route details in the API documentation. In the absence of a request body definition, the request object's body property will be initialized to the unprocessed rawBody buffer. If the endpoint is a child router, all routes of that router will use this body definition unless overridden. If the endpoint is a middleware, the request body will only be parsed once (i.e. the M IM E types of the route matching the same request will be ignored but the body will still be validated again). Arguments model: Model | Schema | null (optional) A model or joi schema describing the request body. A validation failure will result in an automatic 400 (Bad Request) error response. 205 Endpoints If the value is a model with a method, that method will be applied to the parsed request body. fromClient If the value is a schema or a model with a schema, the schema will be used to validate the request body and the value property of the validation result of the parsed request body will be used instead of the parsed request body itself. If the value is a model or a schema and the M IM E type has been omitted, the M IM E type will default to JSON instead. If the value is explicitly set to null , no request body will be expected. If the value is an array containing exactly one model or schema, the request body will be treated as an array of items matching that model or schema. mimes: (optional) Array An array of M IM E types the route supports. Common non-mime aliases like "json" or "html" are also supported and will be expanded to the appropriate M IM E type (e.g. "application/json" and "text/html"). If the M IM E type is recognized by Foxx the request body will be parsed into the appropriate structure before being validated. Currently only JSON, application/x-www-form-urlencoded and multipart formats are supported in this way. If the M IM E type indicated in the request headers does not match any of the supported M IM E types, the first M IM E type in the list will be used instead. Failure to parse the request body will result in an automatic 400 (Bad Request) error response. description: string (optional) A human readable string that will be shown in the API documentation. Returns the endpoint. Examples router.post('/expects/some/json', /* ... */) .body( joi.object().required(), 'This implies JSON.' ); router.post('/expects/nothing', /* ... */) .body(null); // No body allowed router.post('/expects/some/plaintext', /* ... */) .body(['text/plain'], 'This body will be a string.'); response endpoint.response([status], [model], [mimes], [description]): this Defines a response body for the endpoint. When using the response object's send method in the request handler of this route, the definition with the matching status code will be used to generate the response body. The definitions will also be shown in the route details in the API documentation. If the endpoint is a child router, all routes of that router will use this response definition unless overridden. If the endpoint is a middleware, this method has no effect. 
Arguments status: number | string (Default: 200 or 204 ) HTTP status code the response applies to. If a string is provided instead of a numeric status code it will be used to look up a numeric status code using the statuses module. model: Model | Schema | null (optional) A model or joi schema describing the response body. 206 Endpoints If the value is a model with a forClient method, that method will be applied to the data passed to response.send within the route if the response status code matches (but also if no status code has been set). If the value is a schema or a model with a schema, the actual schema will not be used to validate the response body and only serves to document the response in more detail in the API documentation. If the value is a model or a schema and the M IM E type has been omitted, the M IM E type will default to JSON instead. If the value is explicitly set to instead of 200 null and the status code has been omitted, the status code will default to 204 ("no content") . If the value is an array containing exactly one model or schema, the response body will be an array of items matching that model or schema. mimes: Array (optional) An array of M IM E types the route might respond with for this status code. Common non-mime aliases like "json" or "html" are also supported and will be expanded to the appropriate M IM E type (e.g. "application/json" and "text/html"). When using the description: response.send string method the response body will be converted to the appropriate M IM E type if possible. (optional) A human-readable string that briefly describes the response and will be shown in the endpoint's detailed documentation. Returns the endpoint. Examples // This example only provides documentation // and implies a generic JSON response body. router.get(/* ... */) .response( joi.array().items(joi.string()), 'A list of doodad identifiers.' ); // No response body will be expected here. router.delete(/* ... */) .response(null, 'The doodad no longer exists.'); // An endpoint can define multiple response types // for different status codes -- but never more than // one for each status code. router.post(/* ... */) .response('found', 'The doodad is located elsewhere.') .response(201, ['text/plain'], 'The doodad was created so here is a haiku.'); // Here the response body will be set to // the querystring-encoded result of // FormModel.forClient({some: 'data'}) // because the status code defaults to 200. router.patch(function (req, res) { // ... res.send({some: 'data'}); }) .response(FormModel, ['application/x-www-form-urlencoded'], 'OMG.'); // In this case the response body will be set to // SomeModel.forClient({some: 'data'}) because // the status code has been set to 201 before. router.put(function (req, res) { // ... res.status(201); res.send({some: 'data'}); }) .response(201, SomeModel, 'Something amazing happened.'); error 207 Endpoints endpoint.error(status, [description]): this Documents an error status for the endpoint. If the endpoint is a child router, all routes of that router will use this error description unless overridden. If the endpoint is a middleware, this method has no effect. This method only affects the generated API documentation and has not other effect within the service itself. Arguments status: number | string HTTP status code for the error (e.g. 400 for "bad request"). If a string is provided instead of a numeric status code it will be used to look up a numeric status code using the statuses module. 
description: string (optional) A human-readable string that briefly describes the error condition and will be shown in the endpoint's detailed documentation. Returns the endpoint. Examples router.get(function (req, res) { // ... res.throw(403, 'Validation error at x.y.z'); }) .error(403, 'Indicates that a validation has failed.'); summary endpoint.summary(summary): this Adds a short description to the endpoint's API documentation. If the endpoint is a child router, all routes of that router will use this summary unless overridden. If the endpoint is a middleware, this method has no effect. This method only affects the generated API documentation and has not other effect within the service itself. Arguments summary: string A human-readable string that briefly describes the endpoint and will appear next to the endpoint's path in the documentation. Returns the endpoint. Examples router.get(/* ... */) .summary('List all discombobulated doodads') description endpoint.description(description): this Adds a long description to the endpoint's API documentation. If the endpoint is a child router, all routes of that router will use this description unless overridden. If the endpoint is a middleware, this method has no effect. This method only affects the generated API documentation and has not other effect within the service itself. Arguments 208 Endpoints description: string A human-readable string that describes the endpoint in detail and will be shown in the endpoint's detailed documentation. Returns the endpoint. Examples // The "dedent" library helps formatting // multi-line strings by adjusting indentation // and removing leading and trailing blank lines const dd = require('dedent'); router.post(/* ... */) .description(dd` This route discombobulates the doodads by frobnicating the moxie of the request body. `) deprecated endpoint.deprecated([deprecated]): this M arks the endpoint as deprecated. If the endpoint is a child router, all routes of that router will also be marked as deprecated. If the endpoint is a middleware, this method has no effect. This method only affects the generated API documentation and has not other effect within the service itself. Arguments deprecated: boolean (Default: true ) Whether the endpoint should be marked as deprecated. If set to false the endpoint will be explicitly marked as not deprecated. Returns the endpoint. Examples router.get(/* ... */) .deprecated(); tag endpoint.tag(...tags): this M arks the endpoint with the given tags that will be used to group related routes in the generated API documentation. If the endpoint is a child router, all routes of that router will also be marked with the tags. If the endpoint is a middleware, this method has no effect. This method only affects the generated API documentation and has not other effect within the service itself. Arguments tags: string One or more strings that will be used to group the endpoint's routes. Returns the endpoint. Examples router.get(/* ... */) .tag('auth', 'restricted'); 209 Endpoints 210 M iddleware Middleware M iddleware in Foxx refers to functions that can be mounted like routes and can manipulate the request and response objects before and after the route itself is invoked. They can also be used to control access or to provide common logic like logging etc. Unlike routes, middleware is mounted with the Instead of a function the use use method like a router. 
method can also accept an object with a register function that will take a parameter endpoint , the middleware will be mounted at and returns the actual middleware function. This allows manipulating the endpoint before creating the middleware (e.g. to document headers, request bodies, path parameters or query parameters). Examples Restrict access to ArangoDB-authenticated users: module.context.use(function (req, res, next) { if (!req.arangoUser) { res.throw(401, 'Not authenticated with ArangoDB'); } next(); }); Any truthy argument passed to the next function will be thrown as an error: module.context.use(function (req, res, next) { let err = null; if (!req.arangoUser) { err = new Error('This should never happen'); } next(err); // throws if the error was set }) Trivial logging middleware: module.context.use(function (req, res, next) { const start = Date.now(); try { next(); } finally { console.log(`Handled request in ${Date.now() - start}ms`); } }); M ore complex example for header-based sessions: const sessions = module.context.collection('sessions'); module.context.use({ register (endpoint) { endpoint.header('x-session-id', joi.string().optional(), 'The session ID.'); return function (req, res, next) { const sid = req.get('x-session-id'); if (sid) { try { req.session = sessions.document(sid); } catch (e) { delete req.headers['x-session-id']; } } next(); if (req.session) { if (req.session._rev) { sessions.replace(req.session, req.session); res.set('x-session-id', req.session._key); } else { const meta = sessions.save(req.session); res.set('x-session-id', meta._key); } 211 M iddleware } }; } }); 212 Request Request objects The request object specifies the following properties: arangoUser: string | null The authenticated ArangoDB username used to make the request. This value is only set if authentication is enabled in ArangoDB and the request set an authorization header ArangoDB was able to verify. You are strongly encouraged to implement your own authentication logic for your own services but this property can be useful if you need to integrate with ArangoDB's own authentication mechanisms. arangoVersion: number The numeric value of the x-arango-version header or the numeric version of the ArangoDB server (e.g. 30102 for version 3.1.2) if no valid header was provided. baseUrl: string Root-relative base URL of the service, i.e. the prefix body: "/_db/" followed by the value of database. any The processed and validated request body for the current route. If no body has been defined for the current route, the value will be identical to rawBody. For details on how request bodies can be processed and validated by Foxx see the body method of the endpoint object. context: Context The service context in which the router was mounted (rather than the context in which the route was defined). database: string The name of the database in which the request is being handled, e.g. headers: "_system" . object The raw headers object. For details on how request headers can be validated by Foxx see the header method of the endpoint object. hostname: string The hostname (domain name) indicated in the request headers. Defaults to the hostname portion (i.e. excluding the port) of the method: Host header and falls back to the listening address of the server. string The HTTP verb used to make the request, e.g. originalUrl: "GET" . string Root-relative URL of the request, i.e. path followed by the raw query parameters, if any. 
path: string
Database-relative path of the request URL (not including the query parameters).

pathParams: object
An object mapping the names of path parameters of the current route to their validated values. For details on how path parameters can be validated by Foxx see the pathParam method of the endpoint object.

port: number
The port indicated in the request headers. Defaults to the port portion (i.e. excluding the hostname) of the Host header and falls back to the listening port or the appropriate default port (443 for HTTPS or 80 for HTTP, depending on secure) if the header only indicates a hostname.
If the request was made using a trusted proxy (see trustProxy), this is set to the port portion of the X-Forwarded-Host header (or the appropriate default port) if present.

protocol: string
The protocol used for the request. Defaults to "https" or "http" depending on whether ArangoDB is configured to use SSL or not.
If the request was made using a trusted proxy (see trustProxy), this is set to the value of the X-Forwarded-Proto header if present.

queryParams: object
An object mapping the names of query parameters of the current route to their validated values. For details on how query parameters can be validated by Foxx see the queryParam method of the endpoint object.

rawBody: Buffer
The raw, unparsed, unvalidated request body as a buffer.

remoteAddress: string
The IP of the client that made the request.
If the request was made using a trusted proxy (see trustProxy), this is set to the first IP listed in the X-Forwarded-For header if present.

remoteAddresses: Array
A list containing the IP addresses used to make the request. Defaults to the value of remoteAddress wrapped in an array.
If the request was made using a trusted proxy (see trustProxy), this is set to the list of IPs specified in the X-Forwarded-For header if present.

remotePort: number
The listening port of the client that made the request.
If the request was made using a trusted proxy (see trustProxy), this is set to the port specified in the X-Forwarded-Port header if present.

secure: boolean
Whether the request was made over a secure connection (i.e. HTTPS). This is set to false when protocol is "http" and true when protocol is "https".

suffix: string
The trailing path relative to the current route if the current route ends in a wildcard (e.g. /something/*).
Note: Starting with ArangoDB 3.2 the suffix is passed into the service as-is, i.e. percentage escape sequences like %2F will no longer be unescaped. Also note that the suffix may contain path segments like .. which may have special meaning if the suffix is used to build filesystem paths.

trustProxy: boolean
Indicates whether the request was made using a trusted proxy. If the origin server's address was specified in the ArangoDB configuration using --frontend.trusted-proxy or the service's trustProxy setting is enabled, this will be true, otherwise it will be false.

url: string
The URL of the request.

xhr: boolean
Whether the request indicates it was made within a browser using AJAX. This is set to true if the X-Requested-With header is present and is a case-insensitive match for the value "xmlhttprequest".
Note that this value does not guarantee whether the request was made from inside a browser or whether AJAX was used and is merely a convention established by JavaScript frameworks like jQuery.
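These properties can simply be read inside any request handler. The following is a hedged sketch (the route path is invented for illustration) that echoes a few of them back as JSON, which can be handy while debugging proxy or authentication setups:

// Sketch: echo selected request properties for debugging
router.get('/request-info', function (req, res) {
  res.json({
    method: req.method,
    path: req.path,
    database: req.database,
    arangoUser: req.arangoUser, // null unless ArangoDB authentication is enabled
    secure: req.secure,
    remoteAddress: req.remoteAddress,
    trustProxy: req.trustProxy,
    queryParams: req.queryParams
  });
});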
accepts req.accepts(types): string | false req.accepts(...types): string | false req.acceptsCharsets(charsets): string | false req.acceptsCharsets(...charsets): string | false req.acceptsEncodings(encodings): string | false req.acceptsEncodings(...encodings): string | false req.acceptsLanguages(languages): string | false req.acceptsLanguages(...languages): string | false These methods wrap the corresponding content negotiation methods of the accepts module for the current request. Examples if (req.accepts(['json', 'html']) === 'html') { // Client explicitly prefers HTML over JSON res.write(' Client prefers HTML
'); } else { // Otherwise just send JSON res.json({success: true}); } cookie req.cookie(name, options): string | null Gets the value of a cookie by name. Arguments name: string Name of the cookie. options: object (optional) An object with any of the following properties: secret: string (optional) Secret that was used to sign the cookie. If a secret is specified, the cookie's signature is expected to be present in a second cookie with the same name and the suffix .sig . Otherwise the signature (if present) will be ignored. algorithm: string (Default: "sha256" ) Algorithm that was used to sign the cookie. If a string is passed instead of an options object it will be interpreted as the secret option. Returns the value of the cookie or null if the cookie is not set or its signature is invalid. 215 Request get / header req.get(name): string req.header(name): string Gets the value of a header by name. You can validate request headers using the header method of the endpoint. Arguments name: string Name of the header. Returns the header value. is req.is(types): string req.is(...types): string This method wraps the (request body) content type detection method of the type-is module for the current request. Examples const type = req.is('html', 'application/xml', 'application/*+xml'); if (type === false) { // no match handleDefault(req.rawBody); } else if (type === 'html') { handleHtml(req.rawBody); } else { // is XML handleXml(req.rawBody); } json req.json(): any Attempts to parse the raw request body as JSON and returns the result. It is generally more useful to define a request body on the endpoint and use the Returns undefined if the request body is empty. M ay throw a SyntaxError req.body property instead. if the body could not be parsed. makeAbsolute req.makeAbsolute(path, [query]): string Resolves the given path relative to the req.context.service 's mount path to a full URL. Arguments path: string The path to resovle. query: string | object A string or object with query parameters to add to the URL. Returns the formatted absolute URL. params req.param(name): any 216 Request Arguments Looks up a parameter by name, preferring It's probably better style to use the name: pathParams or req.pathParams over queryParams req.queryParams . objects directly. string Name of the parameter. Returns the (validated) value of the parameter. range req.range([size]): Ranges | number This method wraps the range header parsing method of the range-parser module for the current request. Arguments size: number (Default: Infinity ) Length of the satisfiable range (e.g. number of bytes in the full response). If present, ranges exceeding the size will be considered unsatisfiable. Returns undefined if the Range header is absent, -2 if the header is present but malformed, -1 if the range is invalid (e.g. start offset is larger than end offset) or unsatisfiable for the given size. Otherwise returns an array of objects with the properties start and end values for each range. The array has an additional property type indicating the request range type. Examples console.log(req.headers.range); // "bytes=40-80" const ranges = req.range(100); console.log(ranges); // [{start: 40, end: 80}] console.log(ranges.type); // "bytes" reverse req.reverse(name, [params]): string Looks up the URL of a named route for the given parameters. Arguments name: string Name of the route to look up. params: object (optional) An object containing values for the (path or query) parameters of the route. 
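The cookie helper described earlier in this chapter supports signed cookies. Here is a hedged sketch of a round trip (the cookie name, secret and routes are placeholders; res.cookie is covered in the Response chapter): one route issues a signed cookie, the other reads it back with the same secret.

// Sketch: issue and read back a signed cookie
router.post('/remember', function (req, res) {
  res.cookie('mySid', 'some-session-id', {
    ttl: 60 * 60,
    secret: 'cookie-signing-secret'
  });
  res.status(204);
});

router.get('/recall', function (req, res) {
  // Returns null if the cookie is missing or its signature does not match
  const sid = req.cookie('mySid', {secret: 'cookie-signing-secret'});
  if (!sid) {
    res.throw(404, 'No valid cookie found');
  }
  res.json({sid: sid});
});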
Returns the URL of the route for the given parameters. Examples router.get('/items/:id', function (req, res) { /* ... */ }, 'getItemById'); router.post('/items', function (req, res) { // ... const url = req.reverse('getItemById', {id: createdItem._key}); res.set('location', req.makeAbsolute(url)); }); 217 Request 218 Response Response objects The response object specifies the following properties: body: Buffer | string Response body as a string or buffer. Can be set directly or using some of the response methods. context: Context The service context in which the router was mounted (rather than the context in which the route was defined). headers: object The raw headers object. statusCode: number Status code of the response. Defaults to undefined 200 (body set and not an empty string or buffer) or 204 (otherwise) if not changed from . attachment res.attachment([filename]): this Sets the content-disposition header to indicate the response is a downloadable file with the given name. Note: This does not actually modify the response body or access the file system. To send a file from the file system see the or download methods. sendFile Arguments filename: (optional) string Name of the downloadable file in the response body. If present, the extension of the filename will be used to set the response content-type if it has not yet been set. Returns the response object. cookie res.cookie(name, value, [options]): this Sets a cookie with the given name. Arguments name: string Name of the cookie. value: string Value of the cookie. options: object (optional) An object with any of the following properties: ttl: number (optional) Time to live of the cookie in seconds. algorithm: string (Default: "sha256" ) Algorithm that will be used to sign the cookie. 219 Response secret: string (optional) Secret that will be used to sign the cookie. If a secret is specified, the cookie's signature will be stored in a second cookie with the same options, the same name and the suffix .sig . Otherwise no signature will be added. path: string (optional) Path for which the cookie should be issued. domain: string (optional) Domain for which the cookie should be issued. secure: boolean (Default: false ) Whether the cookie should be marked as secure (i.e. HTTPS/SSL-only). httpOnly: (Default: boolean false ) Whether the cookie should be marked as HTTP-only (rather than also exposing it to client-side code). If a string is passed instead of an options object it will be interpreted as the secret option. If a number is passed instead of an options object it will be interpreted as the ttl option. Returns the response object. download res.download(path, [filename]): this The equivalent of calling res.attachment(filename).sendFile(path) . Arguments path: string Path to the file on the local filesystem to be sent as the response body. filename: string (optional) Filename to indicate in the content-disposition header. If omitted the path will be used instead. Returns the response object. getHeader res.getHeader(name): string Gets the value of the header with the given name. Arguments name: string Name of the header to get. Returns the value of the header or undefined . json res.json(data): this Sets the response body to the JSON string value of the given data. 220 Response Arguments data: any The data to be used as the response body. Returns the response object. redirect res.redirect([status], path): this Redirects the response by setting the response location header and status code. 
Arguments status: number | string (optional) Response status code to set. If the status code is the string value "permanent" it will be treated as the value 301 . If the status code is a string it will be converted to a numeric status code using the statuses module first. If the status code is omitted but the response status has not already been set, the response status will be set to path: 302 . string URL to set the location header to. Returns the response object. removeHeader res.removeHeader(name): this Removes the header with the given name from the response. Arguments name: string Name of the header to remove. Returns the response object. send res.send(data, [type]): this Sets the response body to the given data with respect to the response definition for the response's current status code. Arguments data: any The data to be used as the response body. Will be converted according the response definition for the response's current status code (or 200 ) in the following way: If the data is an ArangoDB result set, it will be converted to an array first. If the response definition specifies a model with a array and the response definition has the multiple forClient method, that method will be applied to the data first. If the data is an flag set, the method will be applied to each entry individually instead. Finally the data will be processed by the response type handler to conver the response body to a string or buffer. type: string (Default: "auto" ) 221 Response Content-type of the response body. If set to "auto" the first M IM E type specified in the response definition for the response's current status code (or 200 ) will be used instead. If set to "auto" and no response definition exists, the M IM E type will be determined the following way: If the data is a buffer the M IM E type will be set to binary ( application/octet-stream ). If the data is an object the M IM E type will be set to JSON and the data will be converted to a JSON string. Otherwise the M IM E type will be set to HTM L and the data will be converted to a string. Returns the response object. sendFile res.sendFile(path, [options]): this Sends a file from the local filesystem as the response body. Arguments path: string Path to the file on the local filesystem to be sent as the response body. If no content-type options: object header has been set yet, the extension of the filename will be used to set the value of that header. (optional) An object with any of the following properties: lastModified: If set to true (optional) boolean or if no last-modified header has been set yet and the value is not set to false the last-modified header will be set to the modification date of the file in milliseconds. Returns the response object. Examples // Send the file "favicon.ico" from this service's folder res.sendFile(module.context.fileName('favicon.ico')); sendStatus res.sendStatus(status): this Sends a plaintext response for the given status code. The response status will be set to the given status code, the response body will be set to the status message corresponding to that status code. Arguments status: number | string Response status code to set. If the status code is a string it will be converted to a numeric status code using the statuses module first. Returns the response object. setHeader / set res.setHeader(name, value): this res.set(name, value): this 222 Response res.set(headers): this Sets the value of the header with the given name. Arguments name: string Name of the header to set. 
value: string Value to set the header to. headers: object Header object mapping header names to values. Returns the response object. status res.status(status): this Sets the response status to the given status code. Arguments status: number | string Response status code to set. If the status code is a string it will be converted to a numeric status code using the statuses module first. Returns the response object. throw res.throw(status, [reason], [options]): void Throws an HTTP exception for the given status, which will be handled by Foxx to serve the appropriate JSON error response. Arguments status: number | string Response status code to set. If the status code is a string it will be converted to a numeric status code using the statuses module first. If the status code is in the 500-range (500-599), its stacktrace will always be logged as if it were an unhandled exception. If development mode is enabled, the error's stacktrace will be logged as a warning if the status code is in the 400-range (400-499) or as a regular message otherwise. reason: string (optional) M essage for the exception. If omitted, the status message corresponding to the status code will be used instead. options: object (optional) An object with any of the following properties: cause: Error (optional) Cause of the exception that will be logged as part of the error's stacktrace (recursively, if the exception also has a cause property and so on). 223 Response extra: object (optional) Additional properties that will be added to the error response body generated by Foxx. If development mode is enabled, an exception property will be added to this value containing the error message and a property will be added containing an array with each line of the error's stacktrace. stacktrace If an error is passed instead of an options object it will be interpreted as the cause option. If no reason was provided the error's message will be used as the reason instead. Returns nothing. type res.type([type]): string Sets the response content-type to the given type if provided or returns the previously set content-type. Arguments type: string (optional) Content-type of the response body. Unlike res.set('content-type', type) type (e.g. json becomes file extensions can be provided as values and will be translated to the corresponding M IM E application/json ). Returns the content-type of the response body. vary res.vary(names): this res.vary(...names): this This method wraps the vary header manipulation method of the vary module for the current response. The given names will be added to the response's vary header if not already present. Returns the response object. Examples res.vary('user-agent'); res.vary('cookie'); res.vary('cookie'); // duplicates will be ignored // -- or -res.vary('user-agent', 'cookie'); // -- or -res.vary(['user-agent', 'cookie']); write res.write(data): this Appends the given data to the response body. Arguments data: string | Buffer Data to append. 224 Response If the data is a buffer the response body will be converted to a buffer first. If the response body is a buffer the data will be converted to a buffer first. If the data is an object it will be converted to a JSON string first. If the data is any other non-string value it will be converted to a string first. Returns the response object. 
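To round off this chapter, here is a hedged sketch that combines several of the helpers described above; the route, the widgets collection and the error details are invented for illustration and do not come from the manual itself.

// Sketch: create a document and build the response with status, set and json;
// on failure use res.throw with the cause and extra options described above.
router.post('/widgets', function (req, res) {
  try {
    const meta = widgets.save(req.body); // "widgets" stands in for some collection
    res.status(201);
    res.set('location', req.makeAbsolute(`/widgets/${meta._key}`));
    res.json(Object.assign({}, req.body, meta));
  } catch (e) {
    res.throw(500, 'Failed to store the widget', {
      cause: e,
      extra: {hint: 'Check that the collection exists.'}
    });
  }
})
.body(joi.object().required(), 'Widget to store.');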
Using GraphQL in Foxx

const createGraphQLRouter = require('@arangodb/foxx/graphql');

Foxx bundles version 0.6 of the graphql-sync module, which is a synchronous wrapper for the official JavaScript GraphQL reference implementation, to allow writing GraphQL schemas directly inside Foxx. Additionally the @arangodb/foxx/graphql module lets you create routers for serving GraphQL requests, which closely mimics the behaviour of the express-graphql module.

For more information on graphql-sync see the graphql-js API reference (note that graphql-sync never wraps results in promises).

For an example of a GraphQL schema in Foxx that resolves fields using the database see the GraphQL example service (also available from the Foxx store).

Examples

const graphql = require('graphql-sync');

const graphqlSchema = new graphql.GraphQLSchema({
  // ...
});

// Mounting a graphql endpoint directly in a service:
module.context.use('/graphql', createGraphQLRouter({
  schema: graphqlSchema,
  graphiql: true
}));

// Or at the service's root URL:
module.context.use(createGraphQLRouter({
  schema: graphqlSchema,
  graphiql: true
}));

// Or inside an existing router:
router.get('/hello', function (req, res) {
  res.write('Hello world!');
});

router.use('/graphql', createGraphQLRouter({
  schema: graphqlSchema,
  graphiql: true
}));

Note: ArangoDB aims for stability, which means bundled dependencies will generally not be updated as quickly as their maintainers make updates available on GitHub or NPM. Starting with ArangoDB 3.2, if you want to use a newer version than the one bundled with your target version of ArangoDB, you can provide your own version of the library by passing it via the graphql option:

const graphql = require('graphql-sync');

const graphqlSchema = new graphql.Schema({
  //...
});

module.context.use(createGraphQLRouter({
  schema: graphqlSchema,
  graphiql: true,
  graphql: graphql
}))

Starting with graphql 0.12 you can also use the official graphql library if you include it in the node_modules folder of your service bundle:

const graphql = require('graphql'); // 0.12 or later

const graphqlSchema = new graphql.Schema({
  //...
});

module.context.use(createGraphQLRouter({
  schema: graphqlSchema,
  graphiql: true,
  graphql: graphql
}))

Creating a router

createGraphQLRouter(options): Router

This returns a new router object with POST and GET routes for serving GraphQL requests.

Arguments

options: object
An object with any of the following properties:

schema: GraphQLSchema
A GraphQL Schema object from graphql-sync.

context: any (optional)
The GraphQL context that will be passed to the graphql() function from graphql-sync to handle GraphQL queries.

rootValue: object (optional)
The GraphQL root value that will be passed to the graphql() function from graphql-sync to handle GraphQL queries.

pretty: boolean (Default: false)
If true, JSON responses will be pretty-printed.

formatError: Function (optional)
A function that will be used to format errors produced by graphql-sync. If omitted, the formatError function from graphql-sync will be used instead.

validationRules: Array (optional)
Additional validation rules queries must satisfy in addition to those defined in the GraphQL spec.

graphiql: boolean (Default: false)
If true, the GraphiQL explorer will be served when loaded directly from a browser.

graphql: object (optional)
If you need to use your own copy of the graphql-sync module instead of the one bundled with ArangoDB, here you can pass it in directly.
If a GraphQL Schema object is passed instead of an options object it will be interpreted as the schema option.

Generated routes

The router handles GET and POST requests to its root path and accepts the following parameters, which can be provided either as query parameters or as the POST request body:

query: string
A GraphQL query that will be executed.

variables: object | string (optional)
An object or a string containing a JSON object with runtime values to use for any GraphQL query variables.

operationName: string (optional)
If the provided query contains multiple named operations, this specifies which operation should be executed.

raw: boolean (Default: false)
Forces a JSON response even if graphiql is enabled and the request was made using a browser.

The POST request body can be provided as JSON or as query string using application/x-www-form-urlencoded. A request body passed as application/graphql will be interpreted as the query parameter.

Session Middleware

const sessionMiddleware = require('@arangodb/foxx/sessions');

The session middleware adds the session and sessionStorage properties to the request object and deals with serializing and deserializing the session as well as extracting session identifiers from incoming requests and injecting them into outgoing responses.

Examples

// Create a session middleware
const sessions = sessionMiddleware({
  storage: module.context.collection('sessions'),
  transport: ['header', 'cookie']
});
// First enable the middleware for this service
module.context.use(sessions);
// Now mount the routers that use the session
const router = createRouter();
module.context.use(router);

router.get('/', function (req, res) {
  res.send(`Hello ${req.session.uid || 'anonymous'}!`);
}, 'hello');

router.post('/login', function (req, res) {
  req.session.uid = req.body;
  req.sessionStorage.save(req.session);
  res.redirect(req.reverse('hello'));
})
.body(['text/plain'], 'Username');

Creating a session middleware

sessionMiddleware(options): Middleware

Creates a session middleware.

Arguments

options: Object
An object with the following properties:

  storage: Storage
  Storage that will be used to persist the sessions. The storage is also exposed as the sessionStorage on all request objects and as the storage property of the middleware. If a string or collection is passed instead of a Storage, it will be used to create a Collection Storage.

  transport: Transport | Array
  Transport or array of transports that will be used to extract the session identifiers from incoming requests and inject them into outgoing responses. When attempting to extract a session identifier, the transports will be used in the order specified until a match is found. When injecting (or clearing) session identifiers, all transports will be invoked. The transports are also exposed as the transport property of the middleware. If the string "cookie" is passed instead of a Transport, the Cookie Transport will be used with the default settings instead. If the string "header" is passed instead of a Transport, the Header Transport will be used with the default settings instead.

  autoCreate: boolean (Default: true)
  If enabled the session storage's new method will be invoked to create an empty session whenever the transport failed to return a session for the incoming request. Otherwise the session will be initialized as null.

Returns the session middleware.
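To complement the login route above, a logout route can drop the session again. This is a minimal sketch that assumes the collection storage from the example; clear() is provided by the collection storage, and removing the session lets the configured transports clear the session identifier on the outgoing response:

router.post('/logout', function (req, res) {
  // remove the persisted session (collection storage provides clear())
  req.sessionStorage.clear(req.session);
  // dropping req.session triggers the transports' clear() on the response
  req.session = null;
  res.send({message: 'Goodbye!'});
});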
230 Session storages Session Storages Session storages are used by the sessions middleware to persist sessions across requests. Session storages must implement the and fromClient forClient methods and can optionally implement the new method. The built-in session storages generally provide the following attributes: uid: string (Default: null ) A unique identifier indicating the active user. created: number (Default: Date.now() ) The numeric timestamp of when the session was created. data: any (Default: null ) Arbitrary data to persisted in the session. new storage.new(): Session Generates a new session object representing an empty session. The empty session object should not be persisted unless necessary. The return value will be exposed by the middleware as the session property of the request object if no session identifier was returned by the session transports and auto-creation is not explicitly disabled in the session middleware. Examples new() { return { uid: null, created: Date.now(), data: null }; } fromClient storage.fromClient(sid): Session | null Resolves or deserializes a session identifier to a session object. Arguments sid: string Session identifier to resolve or deserialize. Returns a session object representing the session with the given session identifier that will be exposed by the middleware as the session property of the request object. This method will only be called if any of the session transports returned a session identifier. If the session identifier is invalid or expired, the method should return a null value to indicate no matching session. Examples fromClient(sid) { return db._collection('sessions').firstExample({_key: sid}); } forClient storage.forClient(session): string | null 231 Session storages Derives a session identifier from the given session object. Arguments session: Session Session to derive a session identifier from. Returns a session identifier for the session represented by the given session object. This method will be called with the property of the request object unless that property is empty (e.g. null session ). Examples forClient(session) { if (!session._key) { const meta = db._collection('sessions').save(session); return meta._key; } db._collection('sessions').replace(session._key, session); return session._key; } 232 Session storages Collection Session Storage const collectionStorage = require('@arangodb/foxx/sessions/storages/collection'); The collection session storage persists sessions to a collection in the database. Creating a storage collectionStorage(options): Storage Creates a Storage that can be used in the sessions middleware. Arguments options: Object An object with the following properties: collection: ArangoCollection The collection that should be used to persist the sessions. If a string is passed instead of a collection it is assumed to be the fully qualified name of a collection in the current database. ttl: number (Default: 60 * 60 ) The time in seconds since the last update until a session will be considered expired. pruneExpired: boolean (Default: false ) Whether expired sessions should be removed from the collection when they are accessed instead of simply being ignored. autoUpdate: boolean (Default: true ) Whether sessions should be updated in the collection every time they are accessed to keep them from expiring. Disabling this option will improve performance but means you will have to take care of keeping your sessions alive yourself. 
If a string or collection is passed instead of an options object, it will be interpreted as the collection option. prune storage.prune(): Array Removes all expired sessions from the collection. This method should be called even if the pruneExpired option is enabled to clean up abandoned sessions. Returns an array of the keys of all sessions that were removed. save storage.save(session): Session Saves (replaces) the given session object in the collection. This method needs to be invoked explicitly after making changes to the session or the changes will not be persisted. Assigns a new _key to the session if it previously did not have one. Arguments session: Session A session object. Returns the modified session. clear 233 Session storages storage.clear(session): boolean Removes the session from the collection. Has no effect if the session was already removed or has not yet been saved to the collection (i.e. has no _key ). Arguments session: Session A session object. Returns true if the session was removed or false if it had no effect. 234 Session storages JWT Session Storage const jwtStorage = require('@arangodb/foxx/sessions/storages/jwt'); The JWT session storage converts sessions to and from JSON Web Tokens. Examples // Pass in a secure secret from the Foxx configuration const secret = module.context.configuration.jwtSecret; const sessions = sessionsMiddleware({ storage: jwtStorage(secret), transport: 'header' }); module.context.use(sessions); Creating a storage jwtStorage(options): Storage Creates a Storage that can be used in the sessions middleware. Note: while the "none" algorithm (i.e. no signature) is supported this dummy algorithm provides no security and allows clients to make arbitrary modifications to the payload and should not be used unless you are certain you specifically need it. Arguments options: Object An object with the following properties: algorithm: string (Default: "HS512" ) The algorithm to use for signing the token. Supported values: "HS256" (HM AC-SHA256) "HS384" (HM AC-SHA384) "HS512" (HM AC-SHA512) "none" secret: (no signature) string The secret to use for signing the token. This field is forbidden when using the "none" algorithm but required otherwise. ttl: number (Default: 3600 ) The maximum lifetime of the token in seconds. You may want to keep this short as a new token is generated on every request allowing clients to refresh tokens automatically. verify: boolean If set to false maxExp: (Default: true ) the signature will not be verified but still generated (unless using the "none" algorithm). number (Default: Infinity ) Largest value that will be accepted in an incoming JWT exp (expiration) field. If a string is passed instead of an options object it will be interpreted as the secret option. 235 Session transports Session Transports Session transports are used by the sessions middleware to store and retrieve session identifiers in requests and responses. Session transports must implement the get and/or set methods and can optionally implement the clear method. get transport.get(request): string | null Retrieves a session identifier from a request object. If present this method will automatically be invoked for each transport until a transport returns a session identifier. Arguments request: Request Request object to extract a session identifier from. Returns the session identifier or null if the transport can not find a session identifier in the request. 
Examples get(req) { return req.get('x-session-id') || null; } set transport.set(response, sid): void Attaches a session identifier to a response object. If present this method will automatically be invoked at the end of a request regardless of whether the session was modified or not. Arguments response: Response Response object to attach a session identifier to. sid: string Session identifier to attach to the response. Returns nothing. Examples set(res) { res.set('x-session-id', value); } clear transport.clear(response): void Attaches a payload indicating that the session has been cleared to the response object. This can be used to clear a session cookie when the session has been destroyed (e.g. during logout). If present this method will automatically be invoked instead of set when the req.session attribute was removed by the route handler. Arguments 236 Session transports response: Response Response object to remove the session identifier from. Returns nothing. 237 Session transports Cookie Session Transport const cookieTransport = require('@arangodb/foxx/sessions/transports/cookie'); The cookie transport stores session identifiers in cookies on the request and response object. Examples // Pass in a secure secret from the Foxx configuration const secret = module.context.configuration.cookieSecret; const sessions = sessionsMiddleware({ storage: module.context.collection('sessions'), transport: cookieTransport({ name: 'FOXXSESSID', ttl: 60 * 60 * 24 * 7, // one week in seconds algorithm: 'sha256', secret: secret }) }); module.context.use(sessions); Creating a transport cookieTransport([options]): Transport Creates a Transport that can be used in the sessions middleware. Arguments options: Object (optional) An object with the following properties: name: (Default: string "sid" ) The name of the cookie. ttl: number (optional) Cookie lifetime in seconds. Note that this does not affect the storage TTL (i.e. how long the session itself is considered valid), just how long the cookie should be stored by the client. algorithm: string (optional) The algorithm used to sign and verify the cookie. If no algorithm is specified, the cookie will not be signed or verified. See the cookie method on the response object. secret: string (optional) Secret to use for the signed cookie. Will be ignored if no algorithm is provided. path: string (optional) Path for which the cookie should be issued. domain: string (optional) Domain for which the cookie should be issued. secure: boolean (Default: false ) Whether the cookie should be marked as secure (i.e. HTTPS/SSL-only). httpOnly: boolean (Default: false ) Whether the cookie should be marked as HTTP-only (rather than also exposing it to client-side code). If a string is passed instead of an options object, it will be interpreted as the name option. 238 Session transports 239 Session transports Header Session Transport const headerTransport = require('@arangodb/foxx/sessions/transports/header'); The header transport stores session identifiers in headers on the request and response objects. Examples const sessions = sessionsMiddleware({ storage: module.context.collection('sessions'), transport: headerTransport('X-FOXXSESSID') }); module.context.use(sessions); Creating a transport headerTransport([options]): Transport Creates a Transport that can be used in the sessions middleware. 
Arguments options: Object (optional) An object with the following properties: name: string (Default: X-Session-Id ) Name of the header that contains the session identifier (not case sensitive). If a string is passed instead of an options object, it will be interpreted as the name option. 240 Serving files Static file assets The most flexible way to serve files in your Foxx service is to simply pass them through in your router using the context object's fileName method and the response object's sendFile method: router.get('/some/filename.png', function (req, res) { const filePath = module.context.fileName('some-local-filename.png'); res.sendFile(filePath); }); While allowing for greater control of how the file should be sent to the client and who should be able to access it, doing this for all your static assets can get tedious. Alternatively you can specify file assets that should be served by your Foxx service directly in the service manifest using the files attribute: "files": { "/some/filename.png": { "path": "some-local-filename.png", "type": "image/png", "gzip": false }, "/favicon.ico": "bookmark.ico", "/static": "my-assets-folder" } Each entry in the files attribute can represent either a single file or a directory. When serving entire directories, the key acts as a prefix and requests to that prefix will be resolved within the given directory. Options path: string The relative path of the file or folder within the service. type: string (optional) The M IM E content type of the file. Defaults to an intelligent guess based on the filename's extension. gzip: boolean If set to true (Default: false ) the file will be served with gzip-encoding if supported by the client. This can be useful when serving text files like client-side JavaScript, CSS or HTM L. If a string is provided instead of an object, it will be interpreted as the path option. 241 Writing tests Writing tests Foxx provides out of the box support for running tests against an installed service using the M ocha test runner. Test files have full access to the service context and all ArangoDB APIs but like scripts can not define Foxx routes. Running tests An installed service's tests can be executed from the administrative web interface: 1. Open the "Services" tab of the web interface 2. Click on the installed service to be tested 3. Click on the "Settings" tab 4. Click on the flask icon in the top right 5. Accept the confirmation dialog Note that running tests in a production database is not recommended and may result in data loss if the tests access the database. When running a service in development mode special care needs to be taken as performing requests to the service's own routes as part of the test suites may result in tests being executed while the database is in an inconsistent state, leading to unexpected behaviour. 
Test file paths

In order to tell Foxx about files containing test suites, one or more patterns need to be specified in the tests option of the service manifest:

{
  "tests": [
    "**/test_*.js",
    "**/*_test.js"
  ]
}

These patterns can be either relative file paths or "globstar" patterns where

* matches zero or more characters in a filename
** matches zero or more nested directories

For example, given the following directory structure:

++ test/
|++ a/
||+- a1.js
||+- a2.js
||+- test.js
|+- b.js
|+- c.js
|+- d_test.js
+- e_test.js
+- test.js

The following patterns would match the following files:

test.js:
  test.js

test/*.js:
  /test/b.js
  /test/c.js
  /test/d_test.js

test/**/*.js:
  /test/a/a1.js
  /test/a/a2.js
  /test/a/test.js
  /test/b.js
  /test/c.js
  /test/d_test.js

**/test.js:
  /test/a/test.js

**/*test.js:
  /test/a/test.js
  /test/d_test.js
  /e_test.js
  /test.js

Even if multiple patterns match the same file the tests in that file will only be run once. The order of tests is always determined by the file paths, not the order in which they are matched or specified in the manifest.

Test structure

Mocha test suites can be defined using one of three interfaces: BDD, TDD or Exports. The QUnit interface of Mocha is not supported in ArangoDB. Like all ArangoDB code, test code is always synchronous.

BDD interface

The BDD interface defines test suites using the describe function and each test case is defined using the it function:

'use strict';
const assert = require('assert');
const trueThing = true;

describe('True things', () => {
  it('are true', () => {
    assert.equal(trueThing, true);
  });
});

The BDD interface also offers the alias context for describe and specify for it.

Test fixtures can be handled using before and after for suite-wide fixtures and beforeEach and afterEach for per-test fixtures:

describe('False things', () => {
  let falseThing;
  before(() => {
    falseThing = !true;
  });
  it('are false', () => {
    assert.equal(falseThing, false);
  });
});

TDD interface

The TDD interface defines test suites using the suite function and each test case is defined using the test function:

'use strict';
const assert = require('assert');
const trueThing = true;

suite('True things', () => {
  test('are true', () => {
    assert.equal(trueThing, true);
  });
});

Test fixtures can be handled using suiteSetup and suiteTeardown for suite-wide fixtures and setup and teardown for per-test fixtures:

suite('False things', () => {
  let falseThing;
  suiteSetup(() => {
    falseThing = !true;
  });
  test('are false', () => {
    assert.equal(falseThing, false);
  });
});

Exports interface

The Exports interface defines test cases as methods of plain object properties of the module.exports object:

'use strict';
const assert = require('assert');
const trueThing = true;

exports['True things'] = {
  'are true': function() {
    assert.equal(trueThing, true);
  }
};

The keys before, after, beforeEach and afterEach are special-cased and behave like the corresponding functions in the BDD interface:

let falseThing;
exports['False things'] = {
  before () {
    falseThing = false;
  },
  'are false': function() {
    assert.equal(falseThing, false);
  }
};

Assertions

ArangoDB provides two bundled modules to define assertions:

assert corresponds to the Node.js assert module, providing low-level assertions that can optionally specify an error message.

chai is the popular Chai Assertion Library, providing both BDD and TDD style assertions using a familiar syntax.
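For illustration, here is a small BDD-style suite using chai's expect interface; the values being tested are made up:

'use strict';
const expect = require('chai').expect;

describe('chai assertions', () => {
  it('read fluently', () => {
    const numbers = [1, 2, 3];
    // BDD-style chained assertions
    expect(numbers).to.be.an('array').with.lengthOf(3);
    expect(numbers).to.include(2);
  });
});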
Cross-Origin Resource Sharing (CORS)

To use CORS in your Foxx services you first need to configure ArangoDB for CORS. As of 3.2 Foxx will then automatically whitelist all response headers as they are used.

If you want more control over the whitelist or are using an older version of ArangoDB you can set the following response headers in your request handler:

access-control-expose-headers: a comma-separated list of response headers. This defaults to a list of all headers the response is actually using (but not including any access-control headers).

access-control-allow-credentials: can be set to "false" to forbid exposing cookies. The default value depends on whether ArangoDB trusts the origin. See the notes on http.trusted-origin.

Note that it is not possible to override these headers for the CORS preflight response. It is therefore not possible to accept credentials or cookies only for individual routes, services or databases. The origin needs to be trusted according to the general ArangoDB configuration (see above).

Foxx scripts and queued jobs

Foxx lets you define scripts that can be executed as part of the installation and removal process, invoked manually or scheduled to run at a later time using the job queue.

To register your script, just add a scripts section to your service manifest:

{
  ...
  "scripts": {
    "setup": "scripts/setup.js",
    "send-mail": "scripts/send-mail.js"
  }
  ...
}

The scripts you define in your service manifest can be invoked from the web interface in the service's settings page with the Scripts dropdown.

You can also use the scripts as queued jobs:

'use strict';
const queues = require('@arangodb/foxx/queues');
queues.get('default').push(
  {mount: '/my-service-mount-point', name: 'send-mail'},
  {to: 'user@example.com', body: 'Hello'}
);

Script arguments and return values

If the script was invoked with any arguments, you can access them using the module.context.argv array.

To return data from your script, you can assign the data to module.exports as usual. Please note that this data will be converted to JSON.

Any errors raised by the script will be handled depending on how the script was invoked:

if the script was invoked from the HTTP API (e.g. using the web interface), it will return an error response using the exception's statusCode property if specified or 500.

if the script was invoked from a Foxx job queue, the job's failure counter will be incremented and the job will be rescheduled or marked as failed if no attempts remain.

Examples

Let's say you want to define a script that takes two numeric values and returns the result of multiplying them:

'use strict';
const assert = require('assert');
const argv = module.context.argv;

assert.equal(argv.length, 2, 'Expected exactly two arguments');
assert.equal(typeof argv[0], 'number', 'Expected first argument to be a number');
assert.equal(typeof argv[1], 'number', 'Expected second argument to be a number');

module.exports = argv[0] * argv[1];

Lifecycle Scripts

Foxx recognizes lifecycle scripts if they are defined and will invoke them during the installation, update and removal process of the service if you want.

The following scripts are currently recognized as lifecycle scripts by their name: "setup" and "teardown".

Setup Script

The setup script will be executed without arguments during the installation of your Foxx service.

The setup script may be executed more than once and should therefore be treated as reentrant.
Running the same setup script again should not result in any errors or duplicate data. The setup script is typically used to create collections your service needs or insert seed data like initial administrative user accounts and so on. Examples 'use strict'; const db = require('@arangodb').db; const textsCollectionName = module.context.collectionName('texts'); // `textsCollectionName` is now the prefixed name of this service's "texts" collection. // e.g. "example_texts" if the service has been mounted at `/example` if (db._collection(textsCollectionName) === null) { const collection = db._create(textsCollectionName); collection.save({text: 'entry 1 from collection texts'}); collection.save({text: 'entry 2 from collection texts'}); collection.save({text: 'entry 3 from collection texts'}); } else { console.debug(`collection ${texts} already exists. Leaving it untouched.`); } Teardown Script The teardown script will be executed without arguments during the removal of your Foxx service. It can also optionally be executed before upgrading an service. This script typically removes the collections and/or documents created by your service's setup script. Examples 'use strict'; const db = require('@arangodb').db; const textsCollection = module.context.collection('texts'); if (textsCollection) { textsCollection.drop(); } Queues const queues = require('@arangodb/foxx/queues') Foxx allows defining job queues that let you perform slow or expensive actions asynchronously. These queues can be used to send emails, call external APIs or perform other actions that you do not want to perform directly or want to retry on failure. enable or disable the Foxx queues feature --foxx.queues flag If true, the Foxx queues will be available and jobs in the queues will be executed asynchronously. The default is true. When set to false the queue manager will be disabled and any jobs are prevented from being processed, which may reduce CPU load a bit. Please note that Foxx job queues are database-specific. Queues and jobs are always relative to the database in which they are created or accessed. poll interval for Foxx queues --foxx.queues-poll-interval value 247 Scripts and queued jobs The poll interval for the Foxx queues manager. The value is specified in seconds. Lower values will mean more immediate and more frequent Foxx queue job execution, but will make the queue thread wake up and query the queues more often. When set to a low value, the queue thread might cause CPU load. The default is 1 second. If Foxx queues are not used much, then this value may be increased to make the queues thread wake up less. For the low-level functionality see the chapter on the task management module. Creating or updating a queue queues.create(name, [maxWorkers]): Queue Returns the queue for the given name. If the queue does not exist, a new queue with the given name will be created. If a queue with the given name already exists and maxWorkers is set, the queue's maximum number of workers will be updated. The queue will be created in the current database. Arguments name: string Name of the queue to create. maxWorkers: number (Default: 1 ) The maximum number of workers. Examples // Create a queue with the default number of workers (i.e. 
one)
const queue1 = queues.create("my-queue");

// Create a queue with a given number of workers
const queue2 = queues.create("another-queue", 2);

// Update the number of workers of an existing queue
const queue3 = queues.create("my-queue", 10);

// queue1 and queue3 refer to the same queue
assertEqual(queue1, queue3);

Fetching an existing queue

queues.get(name): Queue

Returns the queue for the given name. If the queue does not exist an exception is thrown instead.

The queue will be looked up in the current database.

Arguments

name: string
Name of the queue to fetch.

Examples

If the queue does not yet exist an exception is thrown:

queues.get("some-queue");
// Error: Queue does not exist: some-queue
//     at ...

Otherwise the queue will be returned:

const queue1 = queues.create("some-queue");
const queue2 = queues.get("some-queue");
assertEqual(queue1, queue2);

Deleting a queue

queues.delete(name): boolean

Returns true if the queue was deleted successfully. If the queue did not exist, it returns false instead.

The queue will be looked up and deleted in the current database.

When a queue is deleted, jobs on that queue will no longer be executed.

Deleting a queue will not delete any jobs on that queue.

Arguments

name: string
Name of the queue to delete.

Examples

const queue = queues.create("my-queue");
queues.delete("my-queue"); // true
queues.delete("my-queue"); // false

Adding a job to a queue

queue.push(script, data, [opts]): string

The job will be added to the specified queue in the current database.

Returns the job id.

Arguments

script: object
A job type definition, consisting of an object with the following properties:

  name: string
  Name of the script that will be invoked.

  mount: string
  Mount path of the service that defines the script.

  backOff: Function | number (Default: 1000)
  Either a function that takes the number of times the job has failed before as input and returns the number of milliseconds to wait before trying the job again, or the delay to be used to calculate an exponential back-off, or 0 for no delay.

  maxFailures: number | Infinity (Default: 0)
  Number of times a single run of a job will be re-tried before it is marked as "failed". A negative value or Infinity means that the job will be re-tried on failure indefinitely.

  schema: Schema (optional)
  Schema to validate a job's data against before enqueuing the job.

  preprocess: Function (optional)
  Function to pre-process a job's (validated) data before serializing it in the queue.

data: any
Job data of the job; must be serializable to JSON.

opts: object (optional)
Object with any of the following properties:

  success: Function (optional)
  Function to be called after the job has been completed successfully.

  failure: Function (optional)
  Function to be called after the job has failed too many times.

  delayUntil: number | Date (Default: Date.now())
  Timestamp in milliseconds (or Date instance) until which the execution of the job should be delayed.

  backOff: Function | number (Default: 1000)
  See script.backOff.

  maxFailures: number | Infinity (Default: 0)
  See script.maxFailures.

  repeatTimes: number (Default: 0)
  If set to a positive number, the job will be repeated this many times (not counting recovery when using maxFailures). If set to a negative number or Infinity, the job will be repeated indefinitely. If set to 0 the job will not be repeated.

  repeatUntil: number | Date (optional)
  If the job is set to automatically repeat, this can be set to a timestamp in milliseconds (or Date instance) after which the job will no longer repeat. Setting this value to zero, a negative value or Infinity has no effect.

  repeatDelay: number (Default: 0)
  If the job is set to automatically repeat, this can be set to a non-negative value to set the number of milliseconds for which the job will be delayed before it is started again.

Note that if you pass a function for the backOff calculation, success callback or failure callback options the function will be serialized to the database as a string and therefore must not rely on any external scope or external variables.

When the job is set to automatically repeat, the failure callback will only be executed when a run of the job has failed more than maxFailures times. Note that if the job fails and maxFailures is set, it will be rescheduled according to the backOff until it has either failed too many times or completed successfully before being scheduled according to the repeatDelay again. Recovery attempts by maxFailures do not count towards repeatTimes.

The success and failure callbacks receive the following arguments:

result: any
The return value of the script for the current run of the job.

jobData: any
The data passed to this method.

job: object
ArangoDB document representing the job's current state.

Examples

Let's say we have a service mounted at /mailer that provides a script called send-mail:

'use strict';
const queues = require('@arangodb/foxx/queues');
const queue = queues.create('my-queue');
queue.push(
  {mount: '/mailer', name: 'send-mail'},
  {to: 'hello@example.com', body: 'Hello world'}
);

This will not work, because log was defined outside the callback function (the callback must be serializable to a string):

// WARNING: THIS DOES NOT WORK!
'use strict';
const queues = require('@arangodb/foxx/queues');
const queue = queues.create('my-queue');
const log = require('console').log; // outside the callback's function scope
queue.push(
  {mount: '/mailer', name: 'send-mail'},
  {to: 'hello@example.com', body: 'Hello world'},
  {success: function () {
    log('Yay!'); // throws 'log is not defined'
  }}
);

Here's an example of a job that will be executed every 5 seconds until tomorrow:

'use strict';
const queues = require('@arangodb/foxx').queues;
const queue = queues.create('my-queue');
queue.push(
  {mount: '/mailer', name: 'send-mail'},
  {to: 'hello@example.com', body: 'Hello world'},
  {
    repeatTimes: Infinity,
    repeatUntil: Date.now() + (24 * 60 * 60 * 1000),
    repeatDelay: 5 * 1000
  }
);

Fetching a job from the queue

queue.get(jobId): Job

Creates a proxy object representing a job with the given job id.

The job will be looked up in the specified queue in the current database.

Returns the job for the given jobId. Properties of the job object will be fetched whenever they are referenced and can not be modified.

Arguments

jobId: string
The id of the job to create a proxy object for.

Examples

const jobId = queue.push({mount: '/logger', name: 'log'}, 'Hello World!');
const job = queue.get(jobId);
assertEqual(job.id, jobId);

Deleting a job from the queue

queue.delete(jobId): boolean

Deletes a job with the given job id. The job will be looked up and deleted in the specified queue in the current database.

Arguments

jobId: string
The id of the job to delete.

Returns true if the job was deleted successfully. If the job did not exist it returns false instead.
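There is no example for queue.delete in this section; a minimal sketch reusing the /logger script from the examples above could look like this:

const jobId = queue.push({mount: '/logger', name: 'log'}, 'Hello World!');
queue.delete(jobId); // true
queue.delete(jobId); // false, the job no longer exists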
Fetching an array of jobs in a queue Examples const logScript = {mount: '/logger', name: 'log'}; queue.push(logScript, 'Hello World!', {delayUntil: Date.now() + 50}); assertEqual(queue.pending(logScript).length, 1); // 50 ms later... 251 Scripts and queued jobs assertEqual(queue.pending(logScript).length, 0); assertEqual(queue.progress(logScript).length, 1); // even later... assertEqual(queue.progress(logScript).length, 0); assertEqual(queue.complete(logScript).length, 1); Fetching an array of pending jobs in a queue queue.pending([script]): Array Returns an array of job ids of jobs in the given queue with the status "pending" , optionally filtered by the given job type. The jobs will be looked up in the specified queue in the current database. Arguments script: object (optional) An object with the following properties: name: string Name of the script. mount: string M ount path of the service defining the script. Fetching an array of jobs that are currently in progress queue.progress([script]) Returns an array of job ids of jobs in the given queue with the status "progress" , optionally filtered by the given job type. The jobs will be looked up in the specified queue in the current database. Arguments script: object (optional) An object with the following properties: name: string Name of the script. mount: string M ount path of the service defining the script. Fetching an array of completed jobs in a queue queue.complete([script]): Array Returns an array of job ids of jobs in the given queue with the status "complete" , optionally filtered by the given job type. The jobs will be looked up in the specified queue in the current database. Arguments script: object (optional) An object with the following properties: name: string Name of the script. mount: string M ount path of the service defining the script. Fetching an array of failed jobs in a queue queue.failed([script]): Array 252 Scripts and queued jobs Returns an array of job ids of jobs in the given queue with the status "failed" , optionally filtered by the given job type. The jobs will be looked up in the specified queue in the current database. Arguments script: object (optional) An object with the following properties: name: string Name of the script. mount: string M ount path of the service defining the script. Fetching an array of all jobs in a queue queue.all([script]): Array Returns an array of job ids of all jobs in the given queue, optionally filtered by the given job type. The jobs will be looked up in the specified queue in the current database. Arguments script: object (optional) An object with the following properties: name: string Name of the script. mount: string M ount path of the service defining the script. Aborting a job job.abort(): void Aborts a non-completed job. Sets a job's status to "failed" if it is not already "complete" , without calling the job's onFailure callback. 253 M igrating 2.x services Migrating 2.x services to 3.0 When migrating services from older versions of ArangoDB it is generally recommended you make sure they work in legacy compatibility mode, which can also serve as a stop-gap solution. This chapter outlines the major differences in the Foxx API between ArangoDB 2.8 and ArangoDB 3.0. General changes The console object in later versions of ArangoDB 2.x implemented a special Foxx console API and would optionally log messages to a collection. ArangoDB 3.0 restores the original behaviour where console is the same object available from the console module. 
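If a 2.x service relied on the Foxx console API persisting log entries to a collection, that behaviour has to be recreated explicitly in 3.0. The following is only a sketch; the "logs" collection and the helper function are assumptions for illustration, not part of any ArangoDB API:

'use strict';
const logs = module.context.collection('logs');

function logAndPersist(level, message) {
  // the standard console module still writes to the server log
  console.log(`[${level}] ${message}`);
  // persisting to a collection is now the service's own responsibility
  logs.save({level: level, message: message, created: Date.now()});
}

logAndPersist('info', 'service started');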
Migrating from pre-2.8

When migrating from a version older than ArangoDB 2.8 please note that starting with ArangoDB 2.8 the behaviour of the require function more closely mimics the behaviour observed in Node.js and module bundlers for browsers, e.g.:

In a file /routes/examples.js (relative to the root folder of the service):

require('./my-module') will be attempted to be resolved in the following order:

1. /routes/my-module (relative to service root)
2. /routes/my-module.js (relative to service root)
3. /routes/my-module.json (relative to service root)
4. /routes/my-module/index.js (relative to service root)
5. /routes/my-module/index.json (relative to service root)

require('lodash') will be attempted to be resolved in the following order:

1. /routes/node_modules/lodash (relative to service root)
2. /node_modules/lodash (relative to service root)
3. ArangoDB module lodash
4. Node compatibility module lodash
5. Bundled NPM module lodash

require('/abs/path') will be attempted to be resolved in the following order:

1. /abs/path (relative to file system root)
2. /abs/path.js (relative to file system root)
3. /abs/path.json (relative to file system root)
4. /abs/path/index.js (relative to file system root)
5. /abs/path/index.json (relative to file system root)

This behaviour is incompatible with the source code generated by the Foxx generator in the web interface before ArangoDB 2.8.

Note: The @arangodb module is aliased to the old name org/arangodb in ArangoDB 3.0.0 and the org/arangodb module was aliased to the new name @arangodb in ArangoDB 2.8.0. Either one will work in 2.8 and 3.0 but outside of legacy services you should use @arangodb going forward.

Foxx queue

In ArangoDB 2.6 Foxx introduced a new way to define queued jobs using Foxx scripts to replace the function-based job type definitions which were causing problems when restarting the server. The function-based jobs have been removed in 2.7 and are no longer supported at all.

CoffeeScript

ArangoDB 3.0 no longer provides built-in support for CoffeeScript source files, even in legacy compatibility mode. If you want to use an alternative language like CoffeeScript, make sure to pre-compile the raw source files to JavaScript and use the compiled JavaScript files in the service.

The request module

The @arangodb/request module when used with the json option previously overwrote the string in the body property of the response object with the parsed JSON body. In 2.8 this was changed so the parsed JSON body is added as the json property of the response object in addition to overwriting the body property. In 3.0 and later (including legacy compatibility mode) the body property is no longer overwritten and you must use the json property instead. Note that this only affects code using the json option when making the request.
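As a sketch of the body/json change described above (the URL is a placeholder and authentication is ignored):

'use strict';
const request = require('@arangodb/request');

const response = request({
  method: 'get',
  url: 'http://localhost:8529/_api/version',
  json: true
});

// ArangoDB 2.8: the parsed object was available as response.body
// ArangoDB 3.0 and later: response.body stays a string, use response.json
const data = response.json;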
To avoid compatibility problems with future versions of ArangoDB you should always specify the engines field, e.g.: { "engines": { "arangodb": "^3.0.0" } } Controllers & exports Previously Foxx distinguished between these have been merged into a single exports main and controllers , each of which could be specified as an object. In ArangoDB 3.0 field specifying an entry file. The easiest way to migrate services using multiple exports and/or controllers is to create a separate entry file that imports these files: Old (manifest.json): { "exports": { "doodads": "doodads.js", "dingbats": "dingbats.js" }, "controllers": { "/doodads": "routes/doodads.js", "/dingbats": "routes/dingbats.js", "/": "routes/root.js" } } New (manifest.json): { "main": "index.js" } New (index.js): 'use strict'; module.context.use('/doodads', require('./routes/doodads')); module.context.use('/dingbats', require('./routes/dingbats')); module.context.use('/', require('./routes/root')); module.exports = { doodads: require('./doodads'), dingbats: require('./dingbats') }; Index redirect If you previously did not define the index.html defaultDocument field, please note that in ArangoDB 3.0 the field will no longer default to the value when omitted: Old: { // no defaultDocument } 257 manifest.json New: { "defaultDocument": "index.html" } This also means it is no longer necessary to specify the serve requests at the / defaultDocument field with an empty value to prevent the redirect and be able to (root) path of the mount point: Old: { "defaultDocument": "" } New: { // no defaultDocument } Assets The assets field is no longer supported in ArangoDB 3.0 outside of legacy compatibility mode. If you previously used the field to serve individual files as-is you can simply use the files field instead: Old: { "assets": { "client.js": { "files": ["assets/client.js"], "contentType": "application/javascript" } } } New: { "files": { "client.js": { "path": "assets/client.js", "type": "application/javascript" } } } If you relied on being able to specify multiple files that should be concatenated, you will have to use build tools outside of ArangoDB to prepare these files accordingly. Root element The rootElement field is no longer supported and has been removed entirely. If your controllers relied on this field being available you need to adjust your schemas and routes to be able to handle the full JSON structure of incoming documents. System services 258 manifest.json The isSystem field is no longer supported. The presence or absence of the field had no effect in most recent versions of ArangoDB 2.x and has now been removed entirely. 259 applicationContext The application context The global applicationContext variable available in Foxx modules has been replaced with the context attribute of the module variable. For consistency it is now referred to as the service context throughout this documentation. Some methods of the service context have changed in ArangoDB 3.0: fileName() path() now behaves like foxxFileName() Additionally the options dependencies The internal did in ArangoDB 2.x fileName() has been removed (use version manifest.version The path() has been removed (use and and name instead) fileName() instead) attributes have been removed and can now only be accessed via the manifest.name manifest attribute (as ). Note that the corresponding manifest fields are now optional and may be omitted. attribute has also been removed as it should be considered an implementation detail. 
You should instead access the and configuration _prefix attributes directly. attribute (which was an alias for basePath ) and the internal comment and clearComments methods (which were used by the magical documentation comments in ArangoDB 2.x) have also been removed. The internal _service attribute (which provides access to the service itself) has been renamed to service . 260 Repositories and M odels Repositories and models Previously Foxx was heavily built around the concept of repositories and models, which provided complex but rarely necessary abstractions on top of ArangoDB collections and documents. In ArangoDB 3.0 these have been removed entirely. Repositories vs collections Repositories mostly wrapped methods that already existed on ArangoDB collection objects and primarily dealt with converting between plain ArangoDB documents and Foxx model instances. In ArangoDB 3.0 you can simply use these collections directly and treat documents as plain JavaScript objects. Old: 'use strict'; const Foxx = require('org/arangodb/foxx'); const myRepo = new Foxx.Repository( applicationContext.collection('myCollection'), {model: Foxx.Model} ); // ... const models = myRepo.byExample({color: 'green'}); res.json(models.map(function (model) { return model.forClient(); })); New: 'use strict'; const myDocs = module.context.collection('myCollection'); // ... const docs = myDocs.byExample({color: 'green'}); res.json(docs); Schema validation The main purpose of models in ArangoDB 2.x was to validate incoming data using joi schemas. In more recent versions of ArangoDB 2.x it was already possible to pass these schemas directly in most places where a model was expected as an argument. The only difference is that schemas should now be considered the default. If you previously relied on the automatic validation of Foxx model instances when setting attributes or instantiating models from untrusted data, you can simply use the schema's validate method directly. Old: 'use strict'; const joi = require('joi'); const mySchema = { name: joi.string().required(), size: joi.number().required() }; const Foxx = require('org/arangodb/foxx'); const MyModel = Foxx.Model.extend({schema: mySchema}); // ... const model = new MyModel(req.json()); if (!model.isValid) { res.status(400); 261 Repositories and M odels res.write('Bad request'); return; } New: 'use strict'; const joi = require('joi'); // Note this is now wrapped in a joi.object() const mySchema = joi.object({ name: joi.string().required(), size: joi.number().required() }).required(); // ... const result = mySchema.validate(req.body); if (result.errors) { res.status(400); res.write('Bad request'); return; } Migrating models While most use cases for models can now be replaced with plain joi schemas, there is still the concept of a "model" in Foxx in ArangoDB 3.0 although it is quite different from Foxx models in ArangoDB 2.x. A model in Foxx now refers to a plain JavaScript object with an optional fromClient schema attribute and the optional methods forClient and . M odels can be used instead of plain joi schemas to define request and response bodies but there are no model "instances" in ArangoDB 3.0. Old: 'use strict'; const _ = require('underscore'); const joi = require('joi'); const Foxx = require('org/arangodb/foxx'); const MyModel = Foxx.Model.extend({ schema: { name: joi.string().required(), size: joi.number().required() }, forClient () { return _.omit(this.attributes, ['_key', '_id', '_rev']); } }); // ... ctrl.get(/* ... 
*/) .bodyParam('body', {type: MyModel}); New: 'use strict'; const _ = require('lodash'); const joi = require('joi'); const MyModel = { schema: joi.object({ name: joi.string().required(), size: joi.number().required() }).required(), forClient (data) { return _.omit(data, ['_key', '_id', '_rev']); } }; 262 Repositories and M odels // ... router.get(/* ... */) .body(MyModel); Triggers When saving, updating, replacing or deleting models in ArangoDB 2.x using the repository methods the repository and model would fire events that could be subscribed to in order to perform side-effects. Note that even in 2.x these events would not fire when using queries or manipulating documents in any other way than using the specific repository methods that operated on individual documents. This behaviour is no longer available in ArangoDB 3.0 but can be emulated by using an EventEmitter directly if it is not possible to solve the problem differently: Old: 'use strict'; const Foxx = require('org/arangodb/foxx'); const MyModel = Foxx.Model.extend({ // ... }, { afterRemove () { console.log(this.get('name'), 'was removed'); } }); // ... const model = myRepo.firstExample({name: 'myName'}); myRepo.remove(model); // -> "myName was removed successfully" New: 'use strict'; const EventEmitter = require('events'); const emitter = new EventEmitter(); emitter.on('afterRemove', function (doc) { console.log(doc.name, 'was removed'); }); // ... const doc = myDocs.firstExample({name: 'myName'}); myDocs.remove(doc); emitter.emit('afterRemove', doc); // -> "myName was removed successfully" Or simply: 'use strict'; function afterRemove(doc) { console.log(doc.name, 'was removed'); } // ... const doc = myDocs.firstExample({name: 'myName'}); myDocs.remove(doc); afterRemove(doc); // -> "myName was removed successfully" 263 Controllers Controllers vs routers Foxx Controllers have been replaced with routers. This is more than a cosmetic change as there are significant differences in behaviour: Controllers were automatically mounted when the file defining them was executed. Routers need to be explicitly mounted using the module.context.use method. Routers can also be exported, imported and even nested. This makes it easier to split up complex routing trees across multiple files. Old: 'use strict'; const Foxx = require('org/arangodb/foxx'); const ctrl = new Foxx.Controller(applicationContext); ctrl.get('/hello', function (req, res) { // ... }); New: 'use strict'; const createRouter = require('org/arangodb/foxx/router'); const router = createRouter(); // If you are importing this file from your entry file ("main"): module.exports = router; // Otherwise: module.context.use(router); router.get('/hello', function (req, res) { // ... }); Some general changes in behaviour that might trip you up: When specifying path parameters with schemas Foxx will now ignore the route if the schema does not match (i.e. no longer match /hello/:num if num specifies a schema that doesn't match the value "foxx" /hello/foxx will ). With controllers this could previously result in users seeing a 400 (bad request) error when they should instead be served a 404 (not found) response. When a request is made with an HTTP verb not supported by an endpoint, Foxx will now respond with a 405 (method not allowed) error with an appropriate Allowed header listing the supported HTTP verbs for that endpoint. Foxx will no longer parse your JSDoc comments to generate route documentation (use the summary and description methods of the endpoint instead). 
The apiDocumentation method now lives on the service context and behaves slightly differently. There is no router equivalent for the activateAuthentication and activateSessions methods. Instead you should use the session middleware (see the section on sessions below). There is no del alias for the delete method on routers. It has always been safe to use keywords as method names in Foxx, so the use of this alias was already discouraged before. The allRoutes proxy is no lot available on routers but can easily be replaced with middleware or child routers. 264 Controllers The request context When defining a route on a controller the controller would return an object called request context. Routers return a similar object called endpoint. Routers also return endpoints when mounting child routers or middleware, as does the use method of the service context. The main differences between the new endpoints and the objects returned by controllers in previous versions of ArangoDB are: bodyParam is now simply called body ; it is no longer neccessary or possible to give the body a name and the request body will not show up in the request parameters. It's also possible to specify a M IM E type body , queryParam and pathParam now take position arguments instead of an object. For specifics see the endpoint documentation. notes onlyIf is now called and description onlyIfAuthenticated and takes a single string argument. are no longer available; they can be emulated with middleware if necessary: Old: ctrl.get(/* ... */) .onlyIf(function (req) { if (!req.user) { throw new Error('Not authenticated!'); } }); New: router.use(function (req, res, next) { if (!req.arangoUser) { res.throw(403, 'Not authenticated!'); } next(); }); router.get(/* ... */); 265 Controllers Error handling The errorResponse method provided by controller request contexts has no equivalent in router endpoints. If you want to handle specific error types with specific status codes you need to catch them explicitly, either in the route or in a middleware: Old: ctrl.get('/puppies', function (req, res) { // Exception is thrown here }) .errorResponse(TooManyPuppiesError, 400, 'Something went wrong!'); New: ctrl.get('/puppies', function (req, res) { try { // Exception is thrown here } catch (e) { if (!(e instanceof TooManyPuppiesError)) { throw e; } res.throw(400, 'Something went wrong!'); } }) // The "error" method merely documents the meaning // of the status code and has no other effect. .error(400, 'Thrown if there are too many puppies.'); Note that errors created with http-errors are still handled by Foxx intelligently. In fact res.throw is just a helper method for creating and throwing these errors. 266 Controllers Before, after and around The before , after and around methods can easily be replaced by middleware: Old: let start; ctrl.before(function (req, res) { start = Date.now(); }); ctrl.after(function (req, res) { console.log('Request handled in ', (Date.now() - start), 'ms'); }); New: router.use(function (req, res, next) { let start = Date.now(); next(); console.log('Request handled in ', (Date.now() - start), 'ms'); }); Note that unlike around middleware receives the next function as the third argument (the "opts" argument has no equivalent). 
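As noted in the error handling section above, specific error types can also be caught in a middleware instead of in each route, so several routes can share the handling. A sketch, assuming TooManyPuppiesError is an error type defined elsewhere in the service:

router.use(function (req, res, next) {
  try {
    next(); // runs the matched route handler synchronously
  } catch (e) {
    if (!(e instanceof TooManyPuppiesError)) {
      throw e;
    }
    res.throw(400, 'Something went wrong!');
  }
});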
267 Controllers Request objects The names of some attributes of the request object have been adjusted to more closely align with those of the corresponding methods on the endpoint objects and established conventions in other JavaScript frameworks: req.urlParameters req.parameters req.params() is now called is now called req.requestType req.param() req.method is now called is now called req.pathParams req.queryParams is now called req.compatibility req.user is now called req.arangoVersion req.arangoUser Some attributes have been removed or changed: has been removed entirely (use req.cookies req.requestBody req.suffix Additionally the req.cookie(name) ) has been removed entirely (see below) is now a string rather than an array req.server and req.client attributes are no longer available. The information is now exposed in a way that can (optionally) transparently handle proxy forwarding headers: req.hostname req.port defaults to defaults to req.remoteAddress req.remotePort Finally, the req.server.address req.server.port defaults to defaults to req.cookie client.address client.port method now takes the signed options directly. Old: const sid = req.cookie('sid', { signed: { secret: 'keyboardcat', algorithm: 'sha256' } }); New: const sid = req.cookie('sid', { secret: 'keyboardcat', algorithm: 'sha256' }); Request bodies The req.body req.rawBody is no longer a method and no longer automatically parses JSON request bodies unless a request body was defined. The now corresponds to the req.rawBodyBuffer of ArangoDB 2.x and is also no longer a method. Old: ctrl.post('/', function (req, res) { const data = req.body(); // ... }); 268 Controllers New: router.post('/', function (req, res) { const data = req.body; // ... }) .body(['json']); Or simply: const joi = require('joi'); router.post('/', function (req, res) { const data = req.body; // ... }) .body(joi.object().optional()); Multipart requests The req.requestParts method has been removed entirely. If you need to accept multipart request bodies, you can simply define the request body using a multipart M IM E type like multipart/form-data : Old: ctrl.post('/', function (req, res) { const parts = req.requestParts(); // ... }); New: router.post('/', function (req, res) { const parts = req.body; // ... }) .body(['multipart/form-data']); 269 Controllers Response objects The response object has a lot of new methods in ArangoDB 3.0 but otherwise remains similar to the response object of previous versions: The res.send method behaves very differently from how the method with the same name behaved in ArangoDB 2.x: the conversion now takes the response body definition of the route into account. There is a new method Note that consecutive calls to The res.contentType you should set the res.write res.write that implements the old behaviour. will append to the response body rather than replacing it like res.send . property is also no longer available. If you want to set the M IM E type of the response body to an explicit value content-type header instead: Old: res.contentType = 'application/json'; res.body = JSON.stringify(results); New: res.set('content-type', 'application/json'); res.body = JSON.stringify(results); Or simply: // sets the content type to JSON // if it has not already been set res.json(results); The res.cookie method now takes the signed options as part of the regular options object. 
Old: res.cookie('sid', 'abcdef', { ttl: 60 * 60, signed: { secret: 'keyboardcat', algorithm: 'sha256' } }); New: res.cookie('sid', 'abcdef', { ttl: 60 * 60, secret: 'keyboardcat', algorithm: 'sha256' }); 270 Controllers Dependency injection There is no equivalent of the addInjector method available in ArangoDB 2.x controllers. M ost use cases can be solved by simply using plain variables but if you need something more flexible you can also use middleware: Old: ctrl.addInjector('magicNumber', function () { return Math.random(); }); ctrl.get('/', function (req, res, injected) { res.json(injected.magicNumber); }); New: function magicMiddleware(name) { return { register () { let magic; return function (req, res, next) { if (!magic) { magic = Math.random(); } req[name] = magic; next(); }; } }; } router.use(magicMiddleware('magicNumber')); router.get('/', function (req, res) { res.json(req.magicNumber); }); Or simply: const magicNumber = Math.random(); router.get('/', function (req, res) { res.json(magicNumber); }); 271 Sessions Sessions The ctrl.activateSessions method and the related util-sessions-local Foxx service have been replaced with the Foxx sessions middleware. It is no longer possible to use the built-in session storage but you can simply pass in any document collection directly. Old: const localSessions = applicationContext.dependencies.localSessions; const sessionStorage = localSessions.sessionStorage; ctrl.activateSessions({ sessionStorage: sessionStorage, cookie: {secret: 'keyboardcat'} }); ctrl.destroySession('/logout', function (req, res) { res.json({message: 'Goodbye!'}); }); New: const sessionMiddleware = require('@arangodb/foxx/sessions'); const cookieTransport = require('@arangodb/foxx/sessions/transports/cookie'); router.use(sessionMiddleware({ storage: module.context.collection('sessions'), transport: cookieTransport('keyboardcat') })); router.post('/logout', function (req, res) { req.sessionStorage.clear(req.session); res.json({message: 'Goodbye!'}); }); 272 Auth and OAuth2 Auth and OAuth2 The util-simple-auth and util-oauth2 Foxx services have been replaced with the Foxx auth and Foxx OAuth2 modules. It is no longer necessary to install these services as dependencies in order to use the functionality. Old: 'use strict'; const auth = applicationContext.dependencies.simpleAuth; // ... const valid = auth.verifyPassword(authData, password); New: 'use strict'; const createAuth = require('@arangodb/foxx/auth'); const auth = createAuth(); // Use default configuration // ... const valid = auth.verifyPassword(authData, password); 273 Foxx Queries Foxx queries The createQuery db._query method has been removed. It can be trivially replaced with plain JavaScript functions and direct calls to the method: Old: 'use strict'; const Foxx = require('org/arangodb/foxx'); const query = Foxx.createQuery({ query: 'FOR u IN _users SORT u.user ASC RETURN u[@propName]', params: ['propName'], transform: function (results, uppercase) { return ( uppercase ? results[0].toUpperCase() : results[0].toLowerCase() ); } }); query('user', true); New: 'use strict'; const db = require('@arangodb').db; const aql = require('@arangodb').aql; function query(propName, uppercase) { const results = db._query(aql` FOR u IN _users SORT u.user ASC RETURN u[${propName}] `); return ( uppercase ? 
results[0].toUpperCase() :
    results[0].toLowerCase()
  );
}

query('user', true);

Legacy compatibility mode for 2.8 services

ArangoDB 3 continues to support Foxx services written for ArangoDB 2.8 by running them in a special legacy compatibility mode that provides access to some of the modules and APIs no longer provided in 3.0 and beyond.

Note: Legacy compatibility mode is strictly intended as a temporary stop-gap solution for supporting existing services while upgrading to ArangoDB 3.0 and should not be considered a permanent feature of ArangoDB or Foxx.

In order to mark an existing service as a legacy service, just make sure the following attribute is defined in the service manifest:

"engines": {
  "arangodb": "^2.8.0"
}

This semantic version range denotes that the service is known to work with ArangoDB 2.8.0 and supports all newer versions of ArangoDB up to but not including 3.0.0 (nor any development version of 3.0.0 and greater). Any similar version range that does not include 3.0.0 or greater will have the same effect (e.g. ^2.5.0 will also trigger the legacy compatibility mode, as will 1.2.3, but >=2.8.0 will not, as it indicates compatibility with all versions greater or equal 2.8.0, not just those within the 2.x version range).

Features supported in legacy compatibility mode

Legacy compatibility mode supports the old manifest format, specifically:

- main is ignored
- controllers will be mounted as in 2.8
- exports will be executed as in 2.8

Additionally the isSystem attribute will be ignored if present but does not result in a warning in legacy compatibility mode.

The Foxx console is available as the console pseudo-global variable (shadowing the global console object).

The service context is available as the applicationContext pseudo-global variable in the controllers, exports, scripts and tests as in 2.8. The following additional properties are available on the service context in legacy compatibility mode:

- path() is an alias for 3.x fileName() (using path.join to build file paths)
- fileName() behaves as in 2.x (using fs.safeJoin to build file paths)
- foxxFileName() is an alias for 2.x fileName()
- version exposes the service manifest's version attribute
- name exposes the service manifest's name attribute
- options exposes the service's raw options

The following methods are removed on the service context in legacy compatibility mode:

- use() -- use @arangodb/foxx/controller instead
- apiDocumentation() -- use controller.apiDocumentation() instead
- registerType() -- not supported in legacy compatibility mode

The following modules that have been removed or replaced in 3.0.0 are available in legacy compatibility mode:

- @arangodb/foxx/authentication
- @arangodb/foxx/console
- @arangodb/foxx/controller
- @arangodb/foxx/model
- @arangodb/foxx/query
- @arangodb/foxx/repository
- @arangodb/foxx/schema
- @arangodb/foxx/sessions
- @arangodb/foxx/template_middleware

The @arangodb/foxx module also provides the same exports as in 2.8, namely:

- Controller from @arangodb/foxx/controller
- Model from @arangodb/foxx/model
- Repository from @arangodb/foxx/repository
- createQuery from @arangodb/foxx/query
- toJSONSchema from @arangodb/foxx/schema
- queues from @arangodb/foxx/queues
- getExports and requireApp from @arangodb/foxx/manager

Any feature not supported in 2.8 will also not work in legacy compatibility mode. When migrating from an older version of ArangoDB it is a good idea to migrate to ArangoDB 2.8 first for an easier upgrade path.
Additionally please note the differences laid out in the chapter Migrating from pre-2.8 in the migration guide. 276 User management User management Foxx does not provide any user management out of the box but it is very easy to roll your own solution: The session middleware provides mechanisms for adding session logic to your service, using e.g. a collection or JSON Web Tokens to store the sessions between requests. The auth module provides utilities for basic password verification and hashing. The following example service demonstrates how user management can be implemented using these basic building blocks. Setting up the collections Let's say we want to store sessions and users in collections. We can use the setup script to make sure these collections are created before the service is mounted. First add a setup script to your manifest if it isn't already defined: "scripts": { "setup": "scripts/setup.js" } Then create the setup script with the following content: 'use strict'; const db = require('@arangodb').db; const sessions = module.context.collectionName('sessions'); const users = module.context.collectionName('users'); if (!db._collection(sessions)) { db._createDocumentCollection(sessions); } if (!db._collection(users)) { db._createDocumentCollection(users); } db._collection(users).ensureIndex({ type: 'hash', fields: ['username'], unique: true }); Creating the router The following main file demonstrates basic user management: 'use strict'; const joi = require('joi'); const createAuth = require('@arangodb/foxx/auth'); const createRouter = require('@arangodb/foxx/router'); const sessionsMiddleware = require('@arangodb/foxx/sessions'); const auth = createAuth(); const router = createRouter(); const users = module.context.collection('users'); const sessions = sessionsMiddleware({ storage: module.context.collection('sessions'), transport: 'cookie' }); module.context.use(sessions); module.context.use(router); router.get('/whoami', function (req, res) { 277 User management try { const user = users.document(req.session.uid); res.send({username: user.username}); } catch (e) { res.send({username: null}); } }) .description('Returns the currently active username.'); router.post('/login', function (req, res) { // This may return a user object or null const user = users.firstExample({ username: req.body.username }); const valid = auth.verify( // Pretend to validate even if no user was found user ? 
user.authData : {}, req.body.password ); if (!valid) res.throw('unauthorized'); // Log the user in req.session.uid = user._key; req.sessionStorage.save(req.session); res.send({sucess: true}); }) .body(joi.object({ username: joi.string().required(), password: joi.string().required() }).required(), 'Credentials') .description('Logs a registered user in.'); router.post('/logout', function (req, res) { if (req.session.uid) { req.session.uid = null; req.sessionStorage.save(req.session); } res.send({success: true}); }) .description('Logs the current user out.'); router.post('/signup', function (req, res) { const user = req.body; try { // Create an authentication hash user.authData = auth.create(user.password); delete user.password; const meta = users.save(user); Object.assign(user, meta); } catch (e) { // Failed to save the user // We'll assume the UniqueConstraint has been violated res.throw('bad request', 'Username already taken', e); } // Log the user in req.session.uid = user._key; req.sessionStorage.save(req.session); res.send({success: true}); }) .body(joi.object({ username: joi.string().required(), password: joi.string().required() }).required(), 'Credentials') .description('Creates a new user and logs them in.'); 278 Related modules Related modules These are some of the modules outside of Foxx you will find useful when writing Foxx services. Additionally there are modules providing some level of compatibility with Node.js as well as a number of bundled NPM modules (like lodash and joi). For more information on these modules see the JavaScript modules appendix. The @arangodb module require('@arangodb') This module provides access to various ArangoDB internals as well as three of the most important exports necessary to work with the database in Foxx: db , aql and errors . You can find a full description of this module in the ArangoDB module appendix. The @arangodb/request module require('@arangodb/request') This module provides a function for making HTTP requests to external services. Note that while this allows communicating with thirdparty services it may affect database performance by blocking Foxx requests as ArangoDB waits for the remote service to respond. If you routinely make requests to slow external services and are not directly interested in the response it is probably a better idea to delegate the actual request/response cycle to a gateway service running outside ArangoDB. You can find a full description of this module in the request module appendix. The @arangodb/general-graph module require('@arangodb/general-graph') This module provides access to ArangoDB graph definitions and various low-level graph operations in JavaScript. For more complex queries it is probably better to use AQL but this module can be useful in your setup and teardown scripts to create and destroy graph definitions. For more information see the chapter on the general graph module. 279 Authentication Authentication const createAuth = require('@arangodb/foxx/auth'); Authenticators allow implementing basic password mechanism using simple built-in hashing functions. For a full example of sessions with authentication and registration see the example in the chapter on User M anagement. Creating an authenticator createAuth([options]): Authenticator Creates an authenticator. Arguments options: Object (optional) An object with the following properties: method: string (Default: "sha256" ) The hashing algorithm to use to create password hashes. 
The authenticator will be able to verify passwords against hashes using any supported hashing algorithm. This only affects new hashes created by the authenticator. Supported values: "md5" "sha1" "sha224" "sha256" "sha384" "sha512" saltLength: number (Default: 16 ) Length of the salts that will be generated for password hashes. Returns an authenticator. Creating authentication data objects auth.create(password): AuthData Creates an authentication data object for the given password with the following properties: method: string The method used to generate the hash. salt: string A random salt used to generate this hash. hash: string The hash string itself. Arguments password: string A password to hash. Returns the authentication data object. 280 Authentication Validating passwords against authentication data objects auth.verify([hash, [password]]): boolean Verifies the given password against the given hash using a constant time string comparison. Arguments hash: AuthData (optional) A authentication data object generated with the create method. password: string (optional) A password to verify against the hash. Returns true if the hash matches the given password. Returns false otherwise. 281 OAuth 1.0a OAuth 1.0a const createOAuth1Client = require('@arangodb/foxx/oauth1'); The OAuth1 module provides abstractions over OAuth 1.0a providers like Twitter, XING and Tumblr. Examples The following extends the user management example: const router = createRouter(); const oauth1 = createOAuth1Client({ // We'll use Twitter for this example requestTokenEndpoint: 'https://api.twitter.com/oauth/request_token', authEndpoint: 'https://api.twitter.com/oauth/authorize', accessTokenEndpoint: 'https://api.twitter.com/oauth/access_token', activeUserEndpoint: 'https://api.twitter.com/1.1/account/verify_credentials.json', clientId: 'keyboardcat', clientSecret: 'keyboardcat' }); module.context.use('/oauth1', router); // See the user management example for setting up the // sessions and users objects used in this example router.use(sessions); router.post('/auth', function (req, res) { const url = req.reverse('oauth1_callback'); const oauth_callback = req.makeAbsolute(url); const requestToken = oauth1.fetchRequestToken(oauth_callback); if (requestToken.oauth_callback_confirmed !== 'true') { res.throw(500, 'Could not fetch OAuth request token'); } // Set request token cookie for five minutes res.cookie('oauth1_request_token', requestToken.oauth_token, {ttl: 60 * 5}); // Redirect to the provider's authorization URL res.redirect(303, oauth1.getAuthUrl(requestToken.oauth_token)); }); router.get('/auth', function (req, res) { // Make sure CSRF cookie matches the URL const expectedToken = req.cookie('oauth1_request_token'); if (!expectedToken || req.queryParams.oauth_token !== expectedToken) { res.throw(400, 'CSRF mismatch.'); } const authData = oauth1.exchangeRequestToken( req.queryParams.oauth_token, req.queryParams.oauth_verifier ); const twitterToken = authData.oauth_token; const twitterSecret = authData.oauth_token_secret; // Fetch the active user's profile info const profile = oauth1.fetchActiveUser(twitterToken, twitterSecret); const twitterId = profile.screen_name; // Try to find an existing user with the user ID // (this requires the users collection) let user = users.firstExample({twitterId}); if (user) { // Update the twitterToken if it has changed if ( user.twitterToken !== twitterToken || user.twitterSecret !== twitterSecret ) { users.update(user, {twitterToken, twitterSecret}); } } else { // Create a new user 
document user = { username: `twitter:${twitterId}`, twitterId, twitterToken 282 OAuth 1.0a } const meta = users.save(user); Object.assign(user, meta); } // Log the user in (this requires the session middleware) req.session.uid = user._key; req.session.twitterToken = authData.twitterToken; req.session.twitterSecret = authData.twitterSecret; req.sessionStorage.save(req.session); // Redirect to the default route res.redirect(303, req.makeAbsolute('/')); }, 'oauth1_callback') .queryParam('oauth_token', joi.string().optional()) .queryParam('oauth_verifier', joi.string().optional()); Creating an OAuth1.0a client createOAuth1Client(options): OAuth1Client Creates an OAuth1.0a client. Arguments options: Object An object with the following properties: requestTokenEndpoint: string The fully-qualified URL of the provider's Temporary Credentials Request endpoint. This URL is used to fetch the unauthenticated temporary credentials that will be used to generate the authorization redirect for the user. authEndpoint: string The fully-qualified URL of the provider's Resource Owner Authorization endpoint. This is the URL the user will be redirected to in order to authorize the OAuth consumer (i.e. your service). accessTokenEndpoint: string The fully-qualified URL of the provider's Token Request endpoint. This URL is used to exchange the authenticated temporary credentials received from the authorization redirect for the actual token credentials that can be used to make requests to the API server. activeUserEndpoint: string (optional) The fully-qualified URL of the provider's endpoint for fetching details about the current user. clientId: string The application's Client ID (or Consumer Key) for the provider. clientS ecret: string The application's Client Secret (or Consumer Secret) for the provider. signatureMethod: string (Default: "HMAC-SHA1" ) The cryptographic method that will be used to sign OAuth 1.0a requests. Only "HMAC-SHA1-" and "PLAINTEXT" are supported at this time. Note that many providers may not implement "PLAINTEXT" as it exposes the Client Secret and oauth_token_secret instead of generating a signature. Returns an OAuth 1.0a client for the given provider. Setting up OAuth 1.0a for Twitter If you want to use Twitter as the OAuth 1.0a provider, use the following options: 283 OAuth 1.0a requestTokenEndpoint: authEndpoint: https://api.twitter.com/oauth/request_token https://api.twitter.com/oauth/authorize accessTokenEndpoint: activeUserEndpoint: https://api.twitter.com/oauth/access_token https://api.twitter.com/1.1/account/verify_credentials.json You also need to obtain a client ID and client secret from Twitter: 1. Create a regular account at Twitter or use an existing account you own. 2. Visit the Twitter Application M anagement dashboard and sign in with your Twitter account. 3. Click on Create New App and follow the instructions provided. The Callback URL should match your oauth_callback later. You may be prompted to add a mobile phone number to your account and verify it. 4. Open the Keys and Access Tones tab, then note down the Consumer Key and Consumer Secret. 5. Set the option clientId to the Consumer Key and the option clientSecret to the Consumer Secret. Note that if you only need read-only access to public information, you can also use the clientId and clientSecret directly without OAuth 1.0a. See Twitter REST API Reference Documentation. 
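To tie the Twitter endpoints above together, the client could be created as in the following sketch. The twitterKey and twitterSecret names are hypothetical configuration options and would need matching entries in the service manifest's configuration section:

'use strict';
const createOAuth1Client = require('@arangodb/foxx/oauth1');

// Credentials are read from the service configuration instead of being hardcoded
const oauth1 = createOAuth1Client({
  requestTokenEndpoint: 'https://api.twitter.com/oauth/request_token',
  authEndpoint: 'https://api.twitter.com/oauth/authorize',
  accessTokenEndpoint: 'https://api.twitter.com/oauth/access_token',
  activeUserEndpoint: 'https://api.twitter.com/1.1/account/verify_credentials.json',
  clientId: module.context.configuration.twitterKey,
  clientSecret: module.context.configuration.twitterSecret
});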
Setting up OAuth 1.0a for XING If you want to use XING as the OAuth 1.0a provider, use the following options: requestTokenEndpoint: authEndpoint: https://api.xing.com/v1/request_token https://api.xing.com/v1/authorize accessTokenEndpoint: activeUserEndpoint: https://api.xing.com/v1/access_token https://api.xing.com/v1/users/me You also need to obtain a client ID and client secret from XING: 1. Create a regular account at XING or use an existing account you own. 2. Visit the XING Developer page and sign in with your XING account. 3. Click on Create app and note down the Consumer key and Consumer secret. 4. Set the option clientId to the Consumer key and the option clientSecret to the Consumer secret. See XING Developer Documentation. Setting up OAuth 1.0a for Tumblr If you want to use Tumblr as the OAuth 1.0a provider, use the following options: requestTokenEndpoint: authEndpoint: https://www.tumblr.com/oauth/request_token https://www.tumblr.com/oauth/authorize accessTokenEndpoint: activeUserEndpoint: https://www.tumblr.com/oauth/access_token https://api.tumblr.com/v2/user/info You also need to obtain a client ID and client secret from Tumblr: 1. Create a regular account at Tumblr or use an existing account you own. 2. Visit the Tumblr Applications dashboard. 3. Click on Register application, then follow the instructions provided. The Default callback URL should match your oauth_callback later. 4. Note down the OAuth Consumer Key and Secret Key. The secret may be hidden by default. 5. Set the option clientId to the OAuth Consumer Key and the option clientSecret to the Secret Key. See Tumblr API Documentation. Fetch an unauthenticated request token oauth1.fetchRequestToken(oauth_callback, opts) Fetches an oauth_token that can be used to create an authorization URL that redirects to the given oauth_callback on confirmation. 284 OAuth 1.0a Performs a POST response to the requestTokenEndpoint. Throws an exception if the remote server responds with an empty response body. Arguments oauth_callback: string The fully-qualified URL of your application's OAuth 1.0a callback. opts: Object (optional) An object with additional query parameters to include in the request. See RFC 5849. Returns the parsed response object. Get the authorization URL oauth1.getAuthUrl(oauth_token, opts): string Generates the authorization URL for the authorization endpoint. Arguments oauth_token: The string oauth_token previously returned by fetchRequestToken . opts: (optional) An object with additional query parameters to add to the URL. See RFC 5849. Returns a fully-qualified URL for the authorization endpoint of the provider by appending the oauth_token and any additional arguments from opts to the authEndpoint. Examples const requestToken = oauth1.fetchRequestToken(oauth_callback); if (requestToken.oauth_callback_confirmed !== 'true') { throw new Error('Provider could not confirm OAuth 1.0 callback'); } const authUrl = oauth1.getAuthUrl(requestToken.oauth_token); Exchange an authenticated request token for an access token oauth1.exchangeRequestToken(oauth_token, oauth_verifier, opts) Takes a pair of authenticated temporary credentials passed to the callback URL by the provider and exchanges it for an oauth_token and than can be used to perform authenticated requests to the OAuth 1.0a provider. oauth_token_secret Performs a POST response to the accessTokenEndpoint. Throws an exception if the remote server responds with an empty response body. 
Arguments oauth_token: The string oauth_token oauth_verifier: The opts: passed to the callback URL by the provider. string oauth_verifier Object passed to the callback URL by the provider. (optional) 285 OAuth 1.0a An object with additional query parameters to include in the request. See RFC 5849. Returns the parsed response object. Fetch the active user oauth1.fetchActiveUser(oauth_token, oauth_token_secret, opts): Object Fetches details of the active user. Performs a GET response to the activeUserEndpoint. Throws an exception if the remote server responds with an empty response body. Returns null if the activeUserEndpoint is not configured. Arguments oauth_token: string An OAuth 1.0a access token as returned by exchangeRequestToken. oauth_token_secret: string An OAuth 1.0a access token secret as returned by exchangeRequestToken. opts: Object (optional) An object with additional query parameters to include in the request. See RFC 5849. Returns the parsed response object. Examples const authData = oauth1.exchangeRequestToken(oauth_token, oauth_verifier); const userData = oauth1.fetchActiveUser(authData.oauth_token, authData.oauth_token_secret); Create an authenticated request object oauth1.createSignedRequest(method, url, parameters, oauth_token, oauth_token_secret) Creates a request object that can be used to perform a request to the OAuth 1.0a provider with the provided token credentials. Arguments method: string HTTP method the request will use, e.g. url: "POST" . string The fully-qualified URL of the provider the request will be performed against. The URL may optionally contain any number of query parameters. parameters: string | Object | null An additional object or query string containing query parameters or body parameters that will be part of the signed request. oauth_token: string An OAuth 1.0a access token as returned by exchangeRequestToken. oauth_token_secret: string 286 OAuth 1.0a An OAuth 1.0a access token secret as returned by exchangeRequestToken. Returns an object with three properties: url: The normalized URL without any query parameters. qs: A normalized query string containing all parameters and query parameters. headers: An object containing the following properties: accept: The string "application/json" . authorization: An OAuth authorization header containing all OAuth parameters and the request signature. Examples Fetch a list of tweets mentioning @arangodb : const request = require('@arangodb/request'); const req = oauth1.createSignedRequest( 'GET', 'https://api.twitter.com/1.1/search/tweets.json', {q: '@arangodb'}, authData.oauth_token, authData.oauth_token_secret ); const res = request(req); console.log(res.json.statuses); Signing a more complex request: const url = 'https://api.example.com/v1/timeline?visible=public'; const params = {hello: 'world', longcat: 'is long'}; const req = oauth1.createSignedRequest( 'POST', url, // URL includes a query parameter that will be signed params, // Request body needs to be signed too authData.oauth_token, authData.oauth_token_secret ); const res = request.post(url, { form: params, headers: { accept: 'application/x-www-form-urlencoded', // Authorization header includes the signature authorization: req.headers.authorization } }); console.log(res.json); 287 OAuth 2.0 OAuth 2.0 const createOAuth2Client = require('@arangodb/foxx/oauth2'); The OAuth2 module provides abstractions over OAuth 2.0 providers like Facebook, GitHub and Google. 
Examples The following extends the user management example: const crypto = require('@arangodb/crypto'); const router = createRouter(); const oauth2 = createOAuth2Client({ // We'll use Facebook for this example authEndpoint: 'https://www.facebook.com/dialog/oauth', tokenEndpoint: 'https://graph.facebook.com/oauth/access_token', activeUserEndpoint: 'https://graph.facebook.com/v2.0/me', clientId: 'keyboardcat', clientSecret: 'keyboardcat' }); module.context.use('/oauth2', router); // See the user management example for setting up the // sessions and users objects used in this example router.use(sessions); router.post('/auth', function (req, res) { const csrfToken = crypto.genRandomAlphaNumbers(32); const url = req.reverse('oauth2_callback', {csrfToken}); const redirect_uri = req.makeAbsolute(url); // Set CSRF cookie for five minutes res.cookie('oauth2_csrf_token', csrfToken, {ttl: 60 * 5}); // Redirect to the provider's authorization URL res.redirect(303, oauth2.getAuthUrl(redirect_uri)); }); router.get('/auth', function (req, res) { // Some providers pass errors as query parameter if (req.queryParams.error) { res.throw(500, `Provider error: ${req.queryParams.error}`) } // Make sure CSRF cookie matches the URL const expectedToken = req.cookie('oauth2_csrf_token'); if (!expectedToken || req.queryParams.csrfToken !== expectedToken) { res.throw(400, 'CSRF mismatch.'); } // Make sure the URL contains a grant token if (!req.queryParams.code) { res.throw(400, 'Provider did not pass grant token.'); } // Reconstruct the redirect_uri used for the grant token const url = req.reverse('oauth2_callback'); const redirect_uri = req.makeAbsolute(url); // Fetch an access token from the provider const authData = oauth2.exchangeGrantToken( req.queryParams.code, redirect_uri ); const facebookToken = authData.access_token; // Fetch the active user's profile info const profile = oauth2.fetchActiveUser(facebookToken); const facebookId = profile.id; // Try to find an existing user with the user ID // (this requires the users collection) let user = users.firstExample({facebookId}); if (user) { // Update the facebookToken if it has changed if (user.facebookToken !== facebookToken) { users.update(user, {facebookToken}); } } else { 288 OAuth 2.0 // Create a new user document user = { username: `fb:${facebookId}`, facebookId, facebookToken } const meta = users.save(user); Object.assign(user, meta); } // Log the user in (this requires the session middleware) req.session.uid = user._key; req.session.facebookToken = authData.facebookToken; req.sessionStorage.save(req.session); // Redirect to the default route res.redirect(303, req.makeAbsolute('/')); }, 'oauth2_callback') .queryParam('error', joi.string().optional()) .queryParam('csrfToken', joi.string().optional()) .queryParam('code', joi.string().optional()); Creating an OAuth 2.0 client createOAuth2Client(options): OAuth2Client Creates an OAuth 2.0 client. Arguments options: Object An object with the following properties: authEndpoint: string The fully-qualified URL of the provider's authorization endpoint. tokenEndpoint: string The fully-qualified URL of the provider's token endpoint. refreshEndpoint: string (optional) The fully-qualified URL of the provider's refresh token endpoint. activeUserEndpoint: string (optional) The fully-qualified URL of the provider's endpoint for fetching details about the current user. clientId: string The application's Client ID (or App ID) for the provider. 
clientS ecret: string The application's Client Secret (or App Secret) for the provider. Returns an OAuth 2.0 client for the given provider. Setting up OAuth 2.0 for Facebook If you want to use Facebook as the OAuth 2.0 provider, use the following options: authEndpoint: tokenEndpoint: https://www.facebook.com/dialog/oauth https://graph.facebook.com/oauth/access_token activeUserEndpoint: https://graph.facebook.com/v2.0/me You also need to obtain a client ID and client secret from Facebook: 1. Create a regular account at Facebook or use an existing account you own. 2. Visit the Facebook Developers page. 289 OAuth 2.0 3. Click on Apps in the menu, then select Register as a Developer (the only option) and follow the instructions provided. You may need to verify your account by phone. 4. Click on Apps in the menu, then select Create a New App and follow the instructions provided. 5. Open the app dashboard, then note down the App ID and App Secret. The secret may be hidden by default. 6. Click on Settings, then Advanced and enter one or more Valid OAuth redirect URIs. At least one of them must match your redirect_uri later. Don't forget to save your changes. 7. Set the option clientId to the App ID and the option clientSecret to the App Secret. Setting up OAuth 2.0 for GitHub If you want to use GitHub as the OAuth 2.0 provider, use the following options: authEndpoint: https://github.com/login/oauth/authorize?scope=user tokenEndpoint: https://github.com/login/oauth/access_token activeUserEndpoint: https://api.github.com/user You also need to obtain a client ID and client secret from GitHub: 1. Create a regular account at GitHub or use an existing account you own. 2. Go to Account Settings > Applications > Register new application. 3. Provide an authorization callback URL. This must match your redirect_uri later. 4. Fill in the other required details and follow the instructions provided. 5. Open the application page, then note down the Client ID and Client Secret. 6. Set the option clientId to the Client ID and the option clientSecret to the Client Secret. Setting up OAuth 2.0 for Google If you want to use Google as the OAuth 2.0 provider, use the following options: authEndpoint: https://accounts.google.com/o/oauth2/auth?access_type=offline&scope=profile tokenEndpoint: https://accounts.google.com/o/oauth2/token activeUserEndpoint: https://www.googleapis.com/plus/v1/people/me You also need to obtain a client ID and client secret from Google: 1. Create a regular account at Google or use an existing account you own. 2. Visit the Google Developers Console. 3. Click on Create Project, then follow the instructions provided. 4. When your project is ready, open the project dashboard, then click on Enable an API. 5. Enable the Google+ API to allow your app to distinguish between different users. 6. Open the Credentials page and click Create new Client ID, then follow the instructions provided. At least one Authorized Redirect URI must match your redirect_uri later. At least one Authorized JavaScript Origin must match your app's fully-qualified domain. 7. When the Client ID is ready, note down the Client ID and Client secret. 8. Set the option clientId to the Client ID and the option clientSecret to the Client secret. Get the authorization URL oauth2.getAuthUrl(redirect_uri, args): string Generates the authorization URL for the authorization endpoint. Arguments redirect_uri: string The fully-qualified URL of your application's OAuth 2.0 callback. 
args: (optional) An object with any of the following properties: response_type: string (Default: "code" ) 290 OAuth 2.0 See RFC 6749. Returns a fully-qualified URL for the authorization endpoint of the provider by appending the client ID and any additional arguments from args to the authEndpoint. Exchange a grant code for an access token oauth2.exchangeGrantToken(code, redirect_uri) Exchanges a grant code for an access token. Performs a POST response to the tokenEndpoint. Throws an exception if the remote server responds with an empty response body. Arguments code: string A grant code returned by the provider's authorization endpoint. redirect_uri: string The original callback URL with which the code was requested. args: Object (optional) An object with any of the following properties: grant_type: string (Default: "authorization_code" ) See RFC 6749. Returns the parsed response object. Fetch the active user oauth2.fetchActiveUser(access_token): Object Fetches details of the active user. Performs a GET response to the activeUserEndpoint. Throws an exception if the remote server responds with an empty response body. Returns null if the activeUserEndpoint is not configured. Arguments access_token: string An OAuth 2.0 access token as returned by exchangeGrantToken. Returns the parsed response object. Examples const authData = oauth2.exchangeGrantToken(code, redirect_uri); const userData = oauth2.fetchActiveUser(authData.access_token); 291 Transactions Transactions Starting with version 1.3, ArangoDB provides support for user-definable transactions. Transactions in ArangoDB are atomic, consistent, isolated, and durable (ACID). These ACID properties provide the following guarantees: The atomicity principle makes transactions either complete in their entirety or have no effect at all. The consistency principle ensures that no constraints or other invariants will be violated during or after any transaction. The isolation property will hide the modifications of a transaction from other transactions until the transaction commits. Finally, the durability proposition makes sure that operations from transactions that have committed will be made persistent. The amount of transaction durability is configurable in ArangoDB, as is the durability on collection level. 292 Transaction invocation Transaction invocation ArangoDB transactions are different from transactions in SQL. In SQL, transactions are started with explicit BEGIN or START TRANSACTION command. Following any series of data retrieval or modification operations, an SQL transaction is finished with a COMMIT command, or rolled back with a ROLLBACK command. There may be client/server communication between the start and the commit/rollback of an SQL transaction. In ArangoDB, a transaction is always a server-side operation, and is executed on the server in one go, without any client interaction. All operations to be executed inside a transaction need to be known by the server when the transaction is started. There are no individual BEGIN, COMMIT or ROLLBACK transaction commands in ArangoDB. Instead, a transaction in ArangoDB is started by providing a description of the transaction to the db._executeTransaction JavaScript function: db._executeTransaction(description); This function will then automatically start a transaction, execute all required data retrieval and/or modification operations, and at the end automatically commit the transaction. 
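For example, a minimal transaction description only needs the collections declaration and the action function; the users collection used here is just an example:

db._executeTransaction({
  collections: {
    write: "users"
  },
  action: function () {
    // executed server-side inside the transaction
    var db = require("@arangodb").db;
    db.users.save({ _key: "example" });
    // reaching the end of the function commits the transaction
  }
});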
If an error occurs during transaction execution, the transaction is automatically aborted, and all changes are rolled back. Execute transaction executes a transaction db._executeTransaction(object) Executes a server-side transaction, as specified by object. object must have the following attributes: collections: a sub-object that defines which collections will be used in the transaction. collections can have these attributes: read: a single collection or a list of collections that will be used in the transaction in read-only mode write: a single collection or a list of collections that will be used in the transaction in write or read mode. action: a Javascript function or a string with Javascript code containing all the instructions to be executed inside the transaction. If the code runs through successfully, the transaction will be committed at the end. If the code throws an exception, the transaction will be rolled back and all database operations will be rolled back. Additionally, object can have the following optional attributes: waitForSync: boolean flag indicating whether the transaction is forced to be synchronous. lockTimeout: a numeric value that can be used to set a timeout for waiting on collection locks. If not specified, a default value will be used. Setting lockTimeout to 0 will make ArangoDB not time out waiting for a lock. params: optional arguments passed to the function specified in action. Declaration of collections All collections which are to participate in a transaction need to be declared beforehand. This is a necessity to ensure proper locking and isolation. Collections can be used in a transaction in write mode or in read-only mode. If any data modification operations are to be executed, the collection must be declared for use in write mode. The write mode allows modifying and reading data from the collection during the transaction (i.e. the write mode includes the read mode). Contrary, using a collection in read-only mode will only allow performing read operations on a collection. Any attempt to write into a collection used in read-only mode will make the transaction fail. Collections for a transaction are declared by providing them in the collections attribute of the object passed to the _executeTransaction function. The collections attribute has the sub-attributes read and write: db._executeTransaction({ collections: { write: [ "users", "logins" ], 293 Transaction invocation read: [ "recommendations" ] } }); read and write are optional attributes, and only need to be specified if the operations inside the transactions demand for it. The contents of read or write can each be lists arrays collection names or a single collection name (as a string): db._executeTransaction({ collections: { write: "users", read: "recommendations" } }); Note: It is currently optional to specify collections for read-only access. Even without specifying them, it is still possible to read from such collections from within a transaction, but with relaxed isolation. Please refer to Transactions Locking for more details. 
In order to make a transaction fail when a non-declared collection is used inside for reading, the optional allowImplicit sub-attribute of collections can be set to false: db._executeTransaction({ collections: { read: "recommendations", allowImplicit: false /* this disallows read access to other collections than specified */ }, action: function () { var db = require("@arangodb").db; return db.foobar.toArray(); /* will fail because db.foobar must not be accessed for reading inside this transaction */ } }); The default value for allowImplicit is true. Write-accessing collections that have not been declared in the collections array is never possible, regardless of the value of allowImplicit. Declaration of data modification and retrieval operations All data modification and retrieval operations that are to be executed inside the transaction need to be specified in a Javascript function, using the action attribute: db._executeTransaction({ collections: { write: "users" }, action: function () { // all operations go here } }); Any valid Javascript code is allowed inside action but the code may only access the collections declared in collections. action may be a Javascript function as shown above, or a string representation of a Javascript function: db._executeTransaction({ collections: { write: "users" }, action: "function () { doSomething(); }" }); Please note that any operations specified in action will be executed on the server, in a separate scope. Variables will be bound late. Accessing any JavaScript variables defined on the client-side or in some other server context from inside a transaction may not work. Instead, any variables used inside action should be defined inside action itself: 294 Transaction invocation db._executeTransaction({ collections: { write: "users" }, action: function () { var db = require(...).db; db.users.save({ ... }); } }); When the code inside the action attribute is executed, the transaction is already started and all required locks have been acquired. When the code inside the action attribute finishes, the transaction will automatically commit. There is no explicit commit command. To make a transaction abort and roll back all changes, an exception needs to be thrown and not caught inside the transaction: db._executeTransaction({ collections: { write: "users" }, action: function () { var db = require("@arangodb").db; db.users.save({ _key: "hello" }); // will abort and roll back the transaction throw "doh!"; } }); There is no explicit abort or roll back command. As mentioned earlier, a transaction will commit automatically when the end of the action function is reached and no exception has been thrown. In this case, the user can return any legal JavaScript value from the function: db._executeTransaction({ collections: { write: "users" }, action: function () { var db = require("@arangodb").db; db.users.save({ _key: "hello" }); // will commit the transaction and return the value "hello" return "hello"; } }); Custom exceptions One may wish to define custom exceptions inside of a transaction. To have the exception propagate upwards properly, please throw an an instance of base JavaScript Error class or a derivative. To specify an error number, include it as the errorNumber field. 
As an example: db._executeTransaction({ collections: {}, action: function () { var err = new Error('My error context'); err.errorNumber = 1234; throw err; } }); Note: In previous versions, custom exceptions which did not have an the exception Error -like form were simply converted to strings and exposed in field of the returned error. This is no longer the case, as it had the potential to leak unwanted information if improperly used. Examples 295 Transaction invocation The first example will write 3 documents into a collection named c1. The c1 collection needs to be declared in the write attribute of the collections attribute passed to the executeTransaction function. The action attribute contains the actual transaction code to be executed. This code contains all data modification operations (3 in this example). // setup db._create("c1"); db._executeTransaction({ collections: { write: [ "c1" ] }, action: function () { var db = require("@arangodb").db; db.c1.save({ _key: "key1" }); db.c1.save({ _key: "key2" }); db.c1.save({ _key: "key3" }); } }); db.c1.count(); // 3 Aborting the transaction by throwing an exception in the action function will revert all changes, so as if the transaction never happened: // setup db._create("c1"); db._executeTransaction({ collections: { write: [ "c1" ] }, action: function () { var db = require("@arangodb").db; db.c1.save({ _key: "key1" }); db.c1.count(); // 1 db.c1.save({ _key: "key2" }); db.c1.count(); // 2 throw "doh!"; } }); db.c1.count(); // 0 The automatic rollback is also executed when an internal exception is thrown at some point during transaction execution: // setup db._create("c1"); db._executeTransaction({ collections: { write: [ "c1" ] }, action: function () { var db = require("@arangodb").db; db.c1.save({ _key: "key1" }); // will throw duplicate a key error, not explicitly requested by the user db.c1.save({ _key: "key1" }); // we'll never get here... } }); db.c1.count(); // 0 As required by the consistency principle, aborting or rolling back a transaction will also restore secondary indexes to the state at transaction start. Cross-collection transactions 296 Transaction invocation There's also the possibility to run a transaction across multiple collections. In this case, multiple collections need to be declared in the collections attribute, e.g.: // setup db._create("c1"); db._create("c2"); db._executeTransaction({ collections: { write: [ "c1", "c2" ] }, action: function () { var db = require("@arangodb").db; db.c1.save({ _key: "key1" }); db.c2.save({ _key: "key2" }); } }); db.c1.count(); // 1 db.c2.count(); // 1 Again, throwing an exception from inside the action function will make the transaction abort and roll back all changes in all collections: // setup db._create("c1"); db._create("c2"); db._executeTransaction({ collections: { write: [ "c1", "c2" ] }, action: function () { var db = require("@arangodb").db; for (var i = 0; i < 100; ++i) { db.c1.save({ _key: "key" + i }); db.c2.save({ _key: "key" + i }); } db.c1.count(); // 100 db.c2.count(); // 100 // abort throw "doh!" } }); db.c1.count(); // 0 db.c2.count(); // 0 297 Passing parameters Passing parameters to transactions Arbitrary parameters can be passed to transactions by setting the params attribute when declaring the transaction. This feature is handy to re-use the same transaction code for multiple calls but with different parameters. 
A basic example: db._executeTransaction({ collections: { }, action: function (params) { return params[1]; }, params: [ 1, 2, 3 ] }); The above example will return 2. Some example that uses collections: db._executeTransaction({ collections: { write: "users", read: [ "c1", "c2" ] }, action: function (params) { var db = require('@arangodb').db; var doc = db.c1.document(params['c1Key']); db.users.save(doc); doc = db.c2.document(params['c2Key']); db.users.save(doc); }, params: { c1Key: "foo", c2Key: "bar" } }); 298 Locking and isolation Locking and Isolation Transactions need to specify from which collections they will read data and which collections they intend do modify. This can be done by setting the read, write, or exclusive attributes in the collections attribute of the transaction: db._executeTransaction({ collections: { read: "users", write: ["test", "log"] }, action: function () { const db = require("@arangodb").db; db.users.toArray().forEach(function(doc) { db.log.insert({ value: "removed user: " + doc.name }); db.test.remove(doc._key); }); } }); write here means write access to the collection, and also includes any read accesses. exclusive is a synonym for write in the M M Files engine, because both exclusive and write will acquire collection-level locks in this engine. In the RocksDB engine, exclusive means exclusive write access to the collection, and write means (shared) write access to the collection, which can be interleaved with write accesses by other concurrent transactions. MMFiles engine The MMFiles engine uses the following locking mechanisms to serialize transactions on the same data: All collections specified in the collections attribute are locked in the requested mode (read or write) at transaction start. Locking of multiple collections is performed in alphabetical order. When a transaction commits or rolls back, all locks are released in reverse order. The locking order is deterministic to avoid deadlocks. While locks are held, modifications by other transactions to the collections participating in the transaction are prevented. A transaction will thus see a consistent view of the participating collections' data. Additionally, a transaction will not be interrupted or interleaved with any other ongoing operations on the same collection. This means each transaction will run in isolation. A transaction should never see uncommitted or rolled back modifications by other transactions. Additionally, reads inside a transaction are repeatable. Note that the above is true only for all collections that are declared in the collections attribute of the transaction. RocksDB engine The RocksDB engine does not lock any collections participating in a transaction for read. Read operations can run in parallel to other read or write operations on the same collections. For all collections that are used in write mode, the RocksDB engine will internally acquire a (shared) read lock. This means that many writers can modify data in the same collection in parallel (and also run in parallel to ongoing reads). However, if two concurrent transactions attempt to modify the same document or index entry, there will be a write-write conflict, and one of the transactions will abort with error 1200 ("conflict"). It is then up to client applications to retry the failed transaction or accept the failure. In order to guard long-running or complex transactions against concurrent operations on the same data, the RocksDB engine allows to access collections in exclusive mode. 
Exclusive accesses will internally acquire a write-lock on the collections, so they are not executed in parallel with any other write operations. Read operations can still be carried out by other concurrent transactions. Lazily adding collections There might be situations when declaring all collections a priori is not possible, for example, because further collections are determined by a dynamic AQL query inside the transaction, for example a query using AQL graph traversal. 299 Locking and isolation In this case, it would be impossible to know beforehand which collection to lock, and thus it is legal to not declare collections that will be accessed in the transaction in read-only mode. Accessing a non-declared collection in read-only mode during a transaction will add the collection to the transaction lazily, and fetch data from the collection as usual. However, as the collection is added lazily, there is no isolation from other concurrent operations or transactions. Reads from such collections are potentially non-repeatable. Examples: db._executeTransaction({ collections: { read: "users" }, action: function () { const db = require("@arangodb").db; /* Execute an AQL query that traverses a graph starting at a "users" vertex. It is yet unknown into which other collections the query might traverse */ db._createStatement({ query: `FOR v IN ANY "users/1234" connections RETURN v` }).execute().toArray().forEach(function (d) { /* ... */ }); } }); This automatic lazy addition of collections to a transaction also introduces the possibility of deadlocks. Deadlocks may occur if there are concurrent transactions that try to acquire locks on the same collections lazily. In order to make a transaction fail when a non-declared collection is used inside a transaction for reading, the optional allowImplicit subattribute of collections can be set to false: db._executeTransaction({ collections: { read: "users", allowImplicit: false }, action: function () { /* The below query will now fail because the collection "connections" has not been specified in the list of collections used by the transaction */ const db = require("@arangodb").db; db._createStatement({ query: `FOR v IN ANY "users/1234" connections RETURN v` }).execute().toArray().forEach(function (d) { /* ... */ }); } }); The default value for allowImplicit is true. Write-accessing collections that have not been declared in the collections array is never possible, regardless of the value of allowImplicit. If users/1234 has an edge in connections, linking it to another document in the users collection, then the following explicit declaration will work: db._executeTransaction({ collections: { read: ["users", "connections"], allowImplicit: false }, /* ... */ If the edge points to a document in another collection however, then the query will fail, unless that other collection is added to the declaration as well. Note that if a document handle is used as starting point for a traversal, e.g. "users/not_linked"} ... false FOR v IN ANY "users/not_linked" ... or , then no error is raised in the case of the start vertex not having any edges to follow, with FOR v IN ANY {_id: allowImplicit: and users not being declared for read access. AQL only sees a string and does not consider it a read access, unless there are edges connected to it. FOR v IN ANY DOCUMENT("users/not_linked") ... will fail even without edges, as it is always considered to be a read access to the users collection. 
300 Locking and isolation Deadlocks and Deadlock detection A deadlock is a situation in which two or more concurrent operations (user transactions or AQL queries) try to access the same resources (collections, documents) and need to wait for the others to finish, but none of them can make any progress. A good example for a deadlock is two concurrently executing transactions T1 and T2 that try to access the same collections but that need to wait for each other. In this example, transaction T1 will write to collection c1 , but will also read documents from collection c2 without announcing it: db._executeTransaction({ collections: { write: "c1" }, action: function () { const db = require("@arangodb").db; /* write into c1 (announced) */ db.c1.insert({ foo: "bar" }); /* some operation here that takes long to execute... */ /* read from c2 (unannounced) */ db.c2.toArray(); } }); Transaction T2 announces to write into collection c2 , but will also read documents from collection c1 without announcing it: db._executeTransaction({ collections: { write: "c2" }, action: function () { var db = require("@arangodb").db; /* write into c2 (announced) */ db.c2.insert({ bar: "baz" }); /* some operation here that takes long to execute... */ /* read from c1 (unannounced) */ db.c1.toArray(); } }); In the above example, a deadlock will occur if transaction T1 and T2 have both acquired their write locks (T1 for collection for collection c2 ) and are then trying to read from the other other (T1 will read from acquire the read lock on collection c2 c2 , T2 will read from c1 and T2 c1 ). T1 will then try to , which is prevented by transaction T2. T2 however will wait for the read lock on collection c1 , which is prevented by transaction T1. In case of such deadlock, there would be no progress for any of the involved transactions, and none of the involved transactions could ever complete. This is completely undesirable, so the automatic deadlock detection mechanism in ArangoDB will automatically abort one of the transactions involved in such deadlock. Aborting means that all changes done by the transaction will be rolled back and error 29 ( deadlock detected ) will be thrown. Client code (AQL queries, user transactions) that accesses more than one collection should be aware of the potential of deadlocks and should handle the error 29 ( deadlock detected ) properly, either by passing the exception to the caller or retrying the operation. To avoid both deadlocks and non-repeatable reads, all collections used in a transaction should be specified in the collections attribute when known in advance. In case this is not possible because collections are added dynamically inside the transaction, deadlocks may occur and the deadlock detection may kick in and abort the transaction. The RocksDB engine uses document-level locks and therefore will not have a deadlock problem on collection level. If two concurrent transactions however modify the same documents or index entries, the RocksDB engine will signal a write-write conflict and abort one of the transactions with error 1200 ("conflict") automatically. 301 Locking and isolation 302 Durability Durability Transactions are executed in main memory first until there is either a rollback or a commit. On rollback, no data will be written to disk, but the operations from the transaction will be reversed in memory. On commit, all modifications done in the transaction will be written to the collection datafiles. 
Durability

Transactions are executed in main memory first until there is either a rollback or a commit. On rollback, no data will be written to disk, but the operations from the transaction will be reversed in memory. On commit, all modifications done in the transaction will be written to the collection datafiles.

These writes will be synchronized to disk if any of the modified collections has the waitForSync property set to true, or if any individual operation in the transaction was executed with the waitForSync attribute. Additionally, transactions that modify data in more than one collection are automatically synchronized to disk. This synchronization is done not only to ensure durability, but also to ensure consistency in case of a server crash.

That means if you only modify data in a single collection, and that collection has its waitForSync property set to false, the whole transaction will not be synchronized to disk instantly, but with a small delay. There is thus the potential risk of losing data between the commit of the transaction and the actual (delayed) disk synchronization. This is the same as writing into collections that have the waitForSync property set to false outside of a transaction. In case of a crash with waitForSync set to false, the operations performed in the transaction will either be visible completely or not at all, depending on whether the delayed synchronization had kicked in or not.

To ensure durability of transactions on a collection that has the waitForSync property set to false, you can set the waitForSync attribute of the object that is passed to executeTransaction. This will force a synchronization of the transaction to disk even for collections that have waitForSync set to false:

db._executeTransaction({
  collections: {
    write: "users"
  },
  waitForSync: true,
  action: function () { ... }
});

An alternative is to perform an operation with an explicit sync request inside the transaction, e.g.

db.users.save({ _key: "1234" }, true);

In this case, the true value will make the whole transaction be synchronized to disk at commit.

In any case, ArangoDB gives users the choice of whether or not they want full durability for single-collection transactions. Using delayed synchronization (i.e. waitForSync with a value of false) will potentially increase throughput and performance of transactions, but introduces the risk of losing the last committed transactions in the case of a crash.

In contrast, transactions that modify data in more than one collection are automatically synchronized to disk. This comes at the cost of several disk sync operations. For a multi-collection transaction, the call to the _executeTransaction function will only return after the data of all modified collections has been synchronized to disk and the transaction has been made fully durable. This not only reduces the risk of losing data in case of a crash but also ensures consistency after a restart.

In case of a server crash, any multi-collection transactions that were not yet committed or were in preparation to be committed will be rolled back on server restart.

For multi-collection transactions, there will be at least one disk sync operation per modified collection. Multi-collection transactions thus have a potentially higher cost than single-collection transactions. There is no configuration to turn off disk synchronization for multi-collection transactions in ArangoDB. The disk sync speed of the system will thus be the most important factor for the performance of multi-collection transactions.
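Which of the two durability modes applies to a single-collection transaction therefore depends on the collection's waitForSync property. As a small aside not taken from the manual, this property can be inspected and changed from arangosh via collection.properties(); the users collection is just an example:

const users = require("@arangodb").db.users;

/* read the current setting; false means writes are synced with a small delay */
users.properties().waitForSync;

/* switch the collection to synchronous writes, so commits are flushed to disk immediately */
users.properties({ waitForSync: true });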
Limitations

In General

Transactions in ArangoDB have been designed with particular use cases in mind. They will be mainly useful for short and small data retrieval and/or modification operations.

The implementation is not optimized for very long-running or very voluminous operations, and may not be usable for these cases. One limitation is that a transaction's operation information must fit into main memory. The transaction information consists of record pointers, revision numbers and rollback information. The actual data modification operations of a transaction are written to the write-ahead log and do not need to fit entirely into main memory.

Ongoing transactions will also prevent the write-ahead logs from being fully garbage-collected. Information in the write-ahead log files cannot be written to collection data files or be discarded while transactions are ongoing. To ensure progress of the write-ahead log garbage collection, transactions should be kept as small as possible, and big transactions should be split into multiple smaller transactions.

Transactions in ArangoDB cannot be nested, i.e. a transaction must not start another transaction. If an attempt is made to call a transaction from inside a running transaction, the server will throw error 1651 (nested transactions detected).

It is also disallowed to execute user transactions on some of ArangoDB's own system collections. This shouldn't be a problem for regular usage, as system collections will not contain user data and there is no need to access them from within a user transaction.

Some operations are not allowed inside transactions in general:

- creation and deletion of databases (db._createDatabase(), db._dropDatabase())
- creation and deletion of collections (db._create(), db._drop(), db.<collection>.rename())
- creation and deletion of indexes (db.<collection>.ensureIndex(), db.<collection>.dropIndex())

If an attempt is made to carry out any of these operations during a transaction, ArangoDB will abort the transaction with error code 1653 (disallowed operation inside transaction).

Finally, all collections that may be modified during a transaction must be declared beforehand, i.e. using the collections attribute of the object passed to the _executeTransaction function. If any attempt is made to carry out a data modification operation on a collection that was not declared in the collections attribute, the transaction will be aborted and ArangoDB will throw error 1652 (unregistered collection used in transaction).

It is legal to not declare read-only collections, but this should be avoided if possible to reduce the probability of deadlocks and non-repeatable reads. Please refer to Locking and Isolation for more details.

In Clusters

Using a single instance of ArangoDB, multi-document / multi-collection queries are guaranteed to be fully ACID. This is more than many other NoSQL database systems support. In cluster mode, single-document operations are also fully ACID. Multi-document / multi-collection queries in a cluster are not ACID, which is equally the case with competing database systems. Transactions in a cluster will be supported in a future version of ArangoDB and make these operations fully ACID as well.

Transactions in the RocksDB storage engine

Data of ongoing transactions is stored in RAM. Transactions that get too big (in terms of the number of operations involved or the total size of data created or modified by the transaction) will be committed automatically. Effectively this means that big user transactions are split into multiple smaller RocksDB transactions that are committed individually. The entire user transaction will not necessarily have ACID properties in this case.

The following global options can be used to control the RAM usage and automatic intermediate commits for the RocksDB engine:

--rocksdb.max-transaction-size

Transaction size limit (in bytes). Transactions store all keys and values in RAM, so large transactions run the risk of causing out-of-memory situations. This setting allows you to ensure that this does not happen by limiting the size of any individual transaction. Transactions whose operations would consume more RAM than this threshold value will abort automatically with error 32 ("resource limit exceeded").

--rocksdb.intermediate-commit-size

If the size of all operations in a transaction reaches this threshold, the transaction is committed automatically and a new transaction is started. The value is specified in bytes.

--rocksdb.intermediate-commit-count

If the number of operations in a transaction reaches this value, the transaction is committed automatically and a new transaction is started.

The above values can also be adjusted per query, by setting the following attributes in the call to db._query():

- maxTransactionSize: transaction size limit in bytes
- intermediateCommitSize: maximum total size of operations after which an intermediate commit is performed automatically
- intermediateCommitCount: maximum number of operations after which an intermediate commit is performed automatically
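For illustration, these attributes can be passed as query options from arangosh. The following is a minimal sketch and not part of the original manual: the AQL statement, the collection name test and the numeric limits are made up, and the options are assumed to be accepted as the third argument to db._query() (the exact invocation style may vary slightly between client tools and versions):

const db = require("@arangodb").db;

db._query(
  "FOR i IN 1..1000000 INSERT { value: i } INTO test",
  {},  /* no bind variables */
  {
    maxTransactionSize: 128 * 1024 * 1024,     /* abort once the transaction would exceed 128 MB */
    intermediateCommitSize: 16 * 1024 * 1024,  /* auto-commit after roughly 16 MB of operations */
    intermediateCommitCount: 100000            /* or after 100,000 operations, whichever is hit first */
  }
);

Lower intermediate-commit thresholds reduce RAM usage during large writes, at the price of giving up all-or-nothing semantics for the query as a whole, as explained above.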
Deployment

In this chapter we describe various possibilities to deploy ArangoDB. In particular for the cluster mode, there are different ways and we want to highlight their advantages and disadvantages. We even document in detail how to set up a cluster by simply starting various ArangoDB processes on different machines, either directly or using Docker containers.

- Single instance
- Cluster
- DC/OS, Apache Mesos and Marathon
- Generic & Docker
- Advanced Topics
- Standalone Agency
- Test setup on a local machine
- Starting processes on different machines
- Launching an ArangoDB cluster using Docker containers
- Multiple Datacenters

Single instance deployment

The latest official builds of ArangoDB for all supported operating systems may be obtained from https://www.arangodb.com/download/.

Linux remarks

Besides the official images which are provided for the most popular Linux distributions there are also a variety of unofficial images provided by the community. We are tracking most of the community contributions (including new or updated images) in our newsletter: https://www.arangodb.com/category/newsletter/

Windows remarks

Please note that ArangoDB will only work on 64-bit Windows.

Docker

The simplest way to deploy ArangoDB is using Docker. To get a general understanding of Docker have a look at their excellent documentation.

Authentication

To start the official Docker container you will have to decide on an authentication method; otherwise the container won't start. Provide one of the following arguments to Docker as an environment variable. There are three options:

1. ARANGO_NO_AUTH=1
   Disable authentication completely. Useful for local testing or for operating in a trusted network (without a public interface).
2. ARANGO_ROOT_PASSWORD=password
   Start ArangoDB with the given password for root.
3. ARANGO_RANDOM_ROOT_PASSWORD=1
   Let ArangoDB generate a random root password.

To get going quickly:

docker run -e ARANGO_RANDOM_ROOT_PASSWORD=1 arangodb/arangodb

For an in-depth guide about Docker and ArangoDB please check the official documentation: https://hub.docker.com/r/arangodb/arangodb/.
Note that we are using the image arangodb arangodb/arangodb here which is always the most current one. There is also the "official" one called whose documentation is here: https://hub.docker.com/_/arangodb/ 307 Cluster Cluster M esos, DC/OS: Distributed deployment using Apache M esos Generic & Docker: Automatic native clusters with ArangoDB Starter Advanced Topics: Standalone Agency, local / distributed / Docker clusters 308 M esos, DC/OS Distributed deployment using Apache Mesos ArangoDB has a sophisticated and yet easy way to use cluster mode. To leverage the full cluster feature set (monitoring, scaling, automatic failover and automatic replacement of failed nodes) you have to run ArangoDB on some kind of cluster management system. Currently ArangoDB relies on Apache M esos in that matter. M esos is a cluster operating system which powers some of the worlds biggest datacenters running several thousands of nodes. DC/OS DC/OS is the recommended way to install a cluster as it eases much of the process to install a M esos cluster. You can deploy it very quickly on a variety of cloud hosters or setup your own DC/OS locally. DC/OS is a set of tools built on top of Apache M esos. Apache M esos is a so called "Distributed Cluster Operation System" and the core of DC/OS. Apache M esos has the concept of so called persistent volumes which make it perfectly suitable for a database. Installing First prepare a DC/OS cluster by going to https://dcos.io and following the instructions there. DC/OS comes with its own package management. Packages can be installed from the so called "Universe". As an official DC/OS partner ArangoDB can be installed from there straight away. 1. Installing via DC/OS UI i. Open your browser and go to the DC/OS admin interface ii. Open the "Universe" tab iii. Locate arangodb and hit "Install Package" iv. Press "Install Package" 2. Installing via the DC/OS command line i. Install the dcos cli ii. Open a terminal and issue dcos install arangodb Both options are essentially doing the same in the background. Both are starting ArangoDB with its default options set. To review the default options using the web interface simply click "Advanced Installation" in the web interface. There you will find a list of options including some explanation. To review the default options using the CLI first type dcos package describe --config arangodb . This will give you a flat list of default settings. To get an explanation of the various command line options please check the latest options here (choose the most recent number and have a look at config.json ): https://github.com/mesosphere/universe/tree/version-3.x/repo/packages/A/arangodb After installation DC/OS will start deploying the ArangoDB cluster on the DC/OS cluster. You can watch ArangoDB starting on the "Services" tab in the web interface. Once it is listed as healthy click the link next to it and you should see the ArangoDB web interface. ArangoDB Mesos framework As soon as ArangoDB was deployed M esos will keep your cluster running. The web interface has many monitoring facilities so be sure to make yourself familiar with the DC/OS web interface. As a fault tolerant system M esos will take care of most failure scenarios automatically. M esos does that by running ArangoDB as a so called "framework". This framework has been specifically built to keep ArangoDB running in a healthy condition on the M esos cluster. From time to time a task might fail. The ArangoDB framework will then take care of rescheduling the failed task. 
As it knows about the very specifics of each cluster task and its role it will automatically take care of most failure scenarios. To inspect what the framework is doing go to http://web-interface-url/mesos in your browser. Locate the task "arangodb" and inspect stderr in the "Sandbox". This can be of interest for example when a slave got lost and the framework is rescheduling the task. 309 M esos, DC/OS Using ArangoDB To use ArangoDB as a datastore in your DC/OS cluster you can facilitate the service discovery of DC/OS. Assuming you deployed a standard ArangoDB cluster the mesos dns will know about arangodb.mesos . By doing a SRV DNS request (check the documentation of mesos dns) you can find out the port where the internal HAProxy of ArangoDB is running. This will offer a round robin load balancer to access all ArangoDB coordinators. Scaling ArangoDB To change the settings of your ArangoDB Cluster access the ArangoDB UI and hit "Nodes". On the scale tab you will have the ability to scale your cluster up and down. After changing the settings the ArangoDB framework will take care of the rest. Scaling your cluster up is generally a straightforward operation as M esos will simply launch another task and be done with it. Scaling down is a bit more complicated as the data first has to be moved to some other place so that will naturally take somewhat longer. Please note that scaling operations might not always work. For example if the underlying M esos cluster is completely saturated with its running tasks scaling up will not be possible. Scaling down might also fail due to the cluster not being able to move all shards of a DBServer to a new destination because of size limitations. Be sure to check the output of the ArangoDB framework. Deinstallation Deinstalling ArangoDB is a bit more difficult as there is much state being kept in the M esos cluster which is not automatically cleaned up. To deinstall from the command line use the following one liner: dcos arangodb uninstall ; dcos package uninstall arangodb This will first cleanup the state in the cluster and then uninstall arangodb. arangodb-cleanup-framework Should you forget to cleanup the state you can do so later by using the arangodb-cleanup-framework container. Otherwise you might not be able to deploy a new arangodb installation. The cleanup framework will announce itself as a normal ArangoDB. M esos will recognize this and offer all persistent volumes it still has for ArangoDB to this framework. The cleanup framework will then properly free the persistent volumes. Finally it will clean up any state left in zookeeper (the central configuration manager in a M esos cluster). To deploy the cleanup framework, follow the instructions in the github repository. After deployment watch the output in the sandbox of the M esos web interface. After a while there shouldn't be any persistent resource offers anymore as everything was cleaned up. After that you can delete the cleanup framework again via M arathon. Apache Mesos and Marathon You can also install ArangoDB on a bare Apache M esos cluster provided that M arathon is running on it. Doing so has the following downsides: 1. M anual M esos cluster setup 2. You need to implement your own service discovery 3. You are missing the dcos cli 4. Installation and deinstallation are tedious 5. You need to setup some kind of proxy tunnel to access ArangoDB from the outside 6. 
Sparse monitoring capabilities However these are things which do not influence ArangoDB itself and operating your cluster like this is fully supported. Installing via Marathon To install ArangoDB via marathon you need a proper config file: { 310 M esos, DC/OS "id": "arangodb", "cpus": 0.25, "mem": 256.0, "ports": [0, 0, 0], "instances": 1, "args": [ "framework", "--framework_name=arangodb", "--master=zk://172.17.0.2:2181/mesos", "--zk=zk://172.17.0.2:2181/arangodb", "--user=", "--principal=pri", "--role=arangodb", "--mode=cluster", "--async_replication=true", "--minimal_resources_agent=mem(*):512;cpus(*):0.25;disk(*):512", "--minimal_resources_dbserver=mem(*):512;cpus(*):0.25;disk(*):1024", "--minimal_resources_secondary=mem(*):512;cpus(*):0.25;disk(*):1024", "--minimal_resources_coordinator=mem(*):512;cpus(*):0.25;disk(*):1024", "--nr_agents=1", "--nr_dbservers=2", "--nr_coordinators=2", "--failover_timeout=86400", "--arangodb_image=arangodb/arangodb-mesos:3.1", "--secondaries_with_dbservers=false", "--coordinators_with_dbservers=false" ], "container": { "type": "DOCKER", "docker": { "image": "arangodb/arangodb-mesos-framework:3.1", "network": "HOST" } }, "healthChecks": [ { "protocol": "HTTP", "path": "/framework/v1/health.json", "gracePeriodSeconds": 3, "intervalSeconds": 10, "portIndex": 0, "timeoutSeconds": 10, "maxConsecutiveFailures": 0 } ] } Carefully review the settings (especially the IPs and the resources). Then you can deploy to M arathon: curl -X POST -H "Content-Type: application/json" http://url-of-marathon/v2/apps -d @arangodb3.json Alternatively use the web interface of M arathon to deploy ArangoDB. It has a JSON mode and you can use the above configuration file. Deinstallation via Marathon As with DC/OS you first need to properly cleanup any state leftovers. The easiest is to simply delete ArangoDB and then deploy the cleanup-framework (see section arangodb-cleanup-framework). Configuration options The Arangodb M esos framework has a ton of different options which are listed and described here: https://github.com/arangodb/arangodb-mesos-framework/tree/3.1 311 Generic & Docker Automatic native Clusters Similarly to how the M esos framework aranges an ArangoDB cluster in a DC/OS environment for you, arangodb can do this for you in a plain environment. By invoking the first arangodb you launch a primary node. It will bind a network port, and output the commands you need to cut'n'paste into the other nodes. Let's review the process of such a startup on three hosts named h01 , h02 , and h03 : arangodb@h01 ~> arangodb --ownAddress h01:4000 2017/06/12 14:59:38 Starting arangodb version 0.5.0+git, build 5f97368 2017/06/12 14:59:38 Serving as master with ID '52698769' on h01:4000... 2017/06/12 14:59:38 Waiting for 3 servers to show up. 2017/06/12 14:59:38 Use the following commands to start other servers: arangodb --dataDir=./db2 --join h01:4000 arangodb --dataDir=./db3 --join h01:4000 2017/06/12 14:59:38 Listening on 0.0.0.0:4000 (h01:4000) So you cut the lines arangodb --data.dir=./db2 --starter.join 127.0.0.1 node on your network, replace the --starter.join 127.0.0.1 and execute them for the other nodes. If you run it on another by the public IP of the first host. arangodbh02 ~> arangodb --dataDir=./db2 --join h01:4000 2017/06/12 14:48:50 Starting arangodb version 0.5.0+git, build 5f97368 2017/06/12 14:48:50 Contacting master h01:4000... 2017/06/12 14:48:50 Waiting for 3 servers to show up... 
2017/06/12 14:48:50 Listening on 0.0.0.0:4000 (:4000) arangodbh03 ~> arangodb --dataDir=./db3 --join h01:4000 2017/06/12 14:48:50 Starting arangodb version 0.5.0+git, build 5f97368 2017/06/12 14:48:50 Contacting master h01:4000... 2017/06/12 14:48:50 Waiting for 3 servers to show up... 2017/06/12 14:48:50 Listening on 0.0.0.0:4000 (:4000) Once the two other processes joined the cluster, and started their ArangoDB server processes (this may take a while depending on your system), it will inform you where to connect the Cluster from a Browser, shell or your programm: ... 2017/06/12 14:55:21 coordinator up and running. At this point you may access your cluster at either coordinator endpoint, http://h01:4002/, http://h02:4002/ or http://h03:4002/. Automatic native local test Clusters If you only want a local test cluster, you can run a single starter with the --starter.local argument. It will start a 3 "machine" cluster on your local PC. arangodb --starter.local Note. A local cluster is intended only for test purposes since a failure of a single PC will bring down the entire cluster. Automatic Docker Clusters ArangoDBStarter can also be used to launch clusters based on docker containers. Its a bit more complicated, since you need to provide information about your environment that can't be autodetected. In the Docker world you need to take care about where persistant data is stored, since containers are intended to be volatile. We use a volume named arangodb1 here: 312 Generic & Docker docker volume create arangodb1 (You can use any type of docker volume that fits your setup instead.) We then need to determine the the IP of the docker host where you intend to run ArangoDB starter on. Depending on your operating system execute ip addr, ifconfig or ipconfig to determine your local ip address. 192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.32 So this example uses the IP 192.168.1.32 : docker run -it --name=adb1 --rm -p 8528:8528 \ -v arangodb1:/data \ -v /var/run/docker.sock:/var/run/docker.sock \ arangodb/arangodb-starter \ --starter.address=192.168.1.32 It will start the master instance, and command you to start the slave instances: Unable to find image 'arangodb/arangodb-starter:latest' locally latest: Pulling from arangodb/arangodb-starter Digest: sha256:b87d20c0b4757b7daa4cb7a9f55cb130c90a09ddfd0366a91970bcf31a7fd5a4 Status: Downloaded newer image for arangodb/arangodb-starter:latest 2017/06/12 13:26:14 Starting arangodb version 0.7.1, build f128884 2017/06/12 13:26:14 Serving as master with ID '46a2b40d' on 192.168.1.32:8528... 2017/06/12 13:26:14 Waiting for 3 servers to show up. 2017/06/12 13:26:14 Use the following commands to start other servers: docker volume create arangodb2 && \ docker run -it --name=adb2 --rm -p 8533:8528 -v arangodb2:/data \ -v /var/run/docker.sock:/var/run/docker.sock arangodb/arangodb-starter:0.7 \ --starter.address=192.168.1.32 --starter.join=192.168.1.32 docker volume create arangodb3 && \ docker run -it --name=adb3 --rm -p 8538:8528 -v arangodb3:/data \ -v /var/run/docker.sock:/var/run/docker.sock arangodb/arangodb-starter:0.7 \ --starter.address=192.168.1.32 --starter.join=192.168.1.32 Once you start the other instances, it will continue like this: 2017/05/11 09:05:45 Added master 'fc673b3b': 192.168.1.32, portOffset: 0 2017/05/11 09:05:45 Added new peer 'e98ea757': 192.168.1.32, portOffset: 5 2017/05/11 09:05:50 Added new peer 'eb01d0ef': 192.168.1.32, portOffset: 10 2017/05/11 09:05:51 Starting service... 
2017/05/11 09:05:51 Looking for a running instance of agent on port 8531 2017/05/11 09:05:51 Starting agent on port 8531 2017/05/11 09:05:52 Looking for a running instance of dbserver on port 8530 2017/05/11 09:05:52 Starting dbserver on port 8530 2017/05/11 09:05:53 Looking for a running instance of coordinator on port 8529 2017/05/11 09:05:53 Starting coordinator on port 8529 2017/05/11 09:05:58 agent up and running (version 3.2.devel). 2017/05/11 09:06:15 dbserver up and running (version 3.2.devel). 2017/05/11 09:06:31 coordinator up and running (version 3.2.devel). And at least it tells you where you can work with your cluster: 2017/05/11 09:06:31 Your cluster can now be accessed with a browser at `http://192.168.1.32:8529` or 2017/05/11 09:06:31 using `arangosh --server.endpoint tcp://192.168.1.32:8529`. Under the hood The first arangodb you ran (as shown above) will become the master in your setup, the --starter.join will be the slaves. 313 Generic & Docker The master determines which ArangoDB server processes to launch on which slave, and how they should communicate. It will then launch the server processes and monitor them. Once it has detected that the setup is complete you will get the prompt. The master will save the setup for subsequent starts. M ore complicated setup options can be found in ArangoDBStarters Readme. 314 Advanced Topics Advanced Topics In contrast to the other topics in this chapter that strive to get you simply set up in prepared environments, The following chapters describe whats going on under the hood in details, the components of ArangoDB Clusters, and how they're put together: Standalone Agency Test setup on a local machine Starting processes on different machines Launching an ArangoDB cluster using Docker containers 315 Advanced Topics Launching ArangoDB's standalone "agency" Multiple ArangoDB instances can be deployed as a fault-tolerant distributed state machine. What is a fault-tolerant state machine in the first place? In many service deployments consisting of arbitrary components distributed over multiple machines one is faced with the challenge of creating a dependable centralised knowledge base or configuration. Implementation of such a service turns out to be one of the most fundamental problems in information engineering. While it may seem as if the realisation of such a service is easily conceivable, dependablity formulates a paradoxon on computer networks per se. On the one hand, one needs a distributed system to avoid a single point of failure. On the other hand, one has to establish consensus among the computers involved. Consensus is the keyword here and its realisation on a network proves to be far from trivial. M any papers and conference proceedings have discussed and evaluated this key challenge. Two algorithms, historically far apart, have become widely popular, namely Paxos and its derivatives and Raft. Discussing them and their differences, although highly enjoyable, must remain far beyond the scope of this document. Find the references to the main publications at the bottom of this page. At ArangoDB, we decided to implement Raft as it is arguably the easier to understand and thus implement. In simple terms, Raft guarantees that a linear stream of transactions, is replicated in realtime among a group of machines through an elected leader, who in turn must have access to and project leadership upon an overall majority of participating instances. 
In ArangoDB we like to call the entirety of the components of the replicated transaction log, that is the machines and the ArangoDB instances, which constitute the replicated log, the agency. Startup The agency must consists of an odd number of agents in order to be able to establish an overall majority and some means for the agents to be able to find one another at startup. The most obvious way would be to inform all agents of the addresses and ports of the rest. This however, is more information than needed. For example, it would suffice, if all agents would know the address and port of the next agent in a cyclic fashion. Another straitforward solution would be to inform all agents of the address and port of say the first agent. Clearly all cases, which would form disjunct subsets of agents would break or in the least impair the functionality of the agency. From there on the agents will gossip the missing information about their peers. Typically, one achieves fairly high fault-tolerance with low, odd number of agents while keeping the necessary network traffic at a minimum. It seems that the typical agency size will be in range of 3 to 7 agents. The below commands start up a 3-host agency on one physical/logical box with ports 8529, 8530 and 8531 for demonstration purposes. The adress of the first instance, port 8529, is known to the other two. After atmost 2 rounds of gossipping, the last 2 agents will have a complete picture of their surrounding and persist it for the next restart. ./build/bin/arangod --agency.activate true --agency.size 3 --agency.my-address tcp://localhost:8529 --server.authentication fal se --server.endpoint tcp://0.0.0.0:8529 agency-8529 ./build/bin/arangod --agency.activate true --agency.size 3 --agency.endpoint tcp://localhost:8529 --agency.my-address tcp://loc alhost:8530 --server.authentication false --server.endpoint tcp://0.0.0.0:8530 agency-8530 ./build/bin/arangod --agency.activate true --agency.size 3 --agency.endpoint tcp://localhost:8529 --agency.my-address tcp://loc alhost:8531 --server.authentication false --server.endpoint tcp://0.0.0.0:8531 agency-8531 The parameter agency.endpoint is the key ingredient for the second and third instances to find the first instance and thus form a complete agency. Please refer to the the shell-script scripts/startStandaloneAgency.sh on github or in the source directory. Key-value-store API The agency should be up and running within a couple of seconds, during which the instances have gossiped their way into knowing the other agents and elected a leader. 
The public API can be checked for the state of the configuration: curl -s localhost:8529/_api/agency/config 316 Advanced Topics { "term": 1, "leaderId": "f5d11cde-8468-4fd2-8747-b4ef5c7dfa98", "lastCommitted": 1, "lastAcked": { "ac129027-b440-4c4f-84e9-75c042942171": 0.21, "c54dbb8a-723d-4c82-98de-8c841a14a112": 0.21, "f5d11cde-8468-4fd2-8747-b4ef5c7dfa98": 0 }, "configuration": { "pool": { "ac129027-b440-4c4f-84e9-75c042942171": "tcp://localhost:8531", "c54dbb8a-723d-4c82-98de-8c841a14a112": "tcp://localhost:8530", "f5d11cde-8468-4fd2-8747-b4ef5c7dfa98": "tcp://localhost:8529" }, "active": [ "ac129027-b440-4c4f-84e9-75c042942171", "c54dbb8a-723d-4c82-98de-8c841a14a112", "f5d11cde-8468-4fd2-8747-b4ef5c7dfa98" ], "id": "f5d11cde-8468-4fd2-8747-b4ef5c7dfa98", "agency size": 3, "pool size": 3, "endpoint": "tcp://localhost:8529", "min ping": 0.5, "max ping": 2.5, "supervision": false, "supervision frequency": 5, "compaction step size": 1000, "supervision grace period": 120 } } To highlight some details in the above output look for "term" and "leaderId" . Both are key information about the current state of the Raft algorithm. You may have noted that the first election term has established a random leader for the agency, who is in charge of replication of the state machine and for all external read and write requests until such time that the process gets isolated from the other two subsequenctly losing its leadership. Read and Write APIs Generally, all read and write accesses are transactions moreover any read and write access may consist of multiple such transactions formulated as arrays of arrays in JSON documents. Read transaction An agency started from scratch will deal with the simplest query as follows: curl -L localhost:8529/_api/agency/read -d '[["/"]]' [{}] The above request for an empty key value store will return with an empty document. The inner array brackets will aggregate a result from multiple sources in the key-value-store while the outer array will deliver multiple such aggregated results. Also note the -L curl flag, which allows the request to follow redirects to the current leader. Consider the following key-value-store: { "baz": 12, "corge": { "e": 2.718281828459045, "pi": 3.14159265359 }, "foo": { "bar": "Hello World" }, 317 Advanced Topics "qux": { "quux": "Hello World" } } The following array of read transactions will yield: curl -L localhost:8529/_api/agency/read -d '[["/foo", "/foo/bar", "/baz"],["/qux"]]' [ { "baz": 12, "foo": { "bar": "Hello World" } }, { "qux": { "quux": "Hello World" } } ] Note that the result is an array of two results for the first and second read transactions from above accordingly. Also note that the results from the first read transaction are aggregated into { "baz": 12, "foo": { "bar": "Hello World" } } The aggregation is performed on 2 levels: 1. /foo/bar is eliminated as a subset of 2. The results from /foo and /bar /foo are joined The word transaction means here that it is guaranteed that all aggregations happen in quasi-realtime and that no write access could have happened in between. Btw, the same transaction on the virgin key-value store would produce [{},{}] Write API: The write API must unfortunately be a little more complex. 
Multiple roads lead to Rome:

curl -L localhost:8529/_api/agency/write -d '[[{"/foo":{"op":"push","new":"bar"}}]]'
curl -L localhost:8529/_api/agency/write -d '[[{"/foo":{"op":"push","new":"baz"}}]]'
curl -L localhost:8529/_api/agency/write -d '[[{"/foo":{"op":"push","new":"qux"}}]]'

and

curl -L localhost:8529/_api/agency/write -d '[[{"foo":["bar","baz","qux"]}]]'

are equivalent, for example, and will create and fill an array at /foo. Here, again, the outermost array is the container for the transaction arrays.

We document a complete guide of the API in the API section.

Launching an ArangoDB cluster for testing

An ArangoDB cluster consists of several running tasks (or server processes) which form the cluster. ArangoDB itself won't start or monitor any of these tasks. So it will need some kind of supervisor which is monitoring and starting these tasks. For production usage we recommend using Apache Mesos as the cluster supervisor.

However starting a cluster manually is possible and is a very easy method to get a first impression of what an ArangoDB cluster looks like.

The easiest way to start a local cluster for testing purposes is to run scripts/startLocalCluster.sh from a clone of the source repository after compiling ArangoDB from source (see instructions in the file README_maintainers.md in the repository). This will start 1 Agency, 2 DBServers and 1 Coordinator. To stop the cluster, issue scripts/stopLocalCluster.sh.

This section will discuss the required parameters for every role in an ArangoDB cluster. Be sure to read the Architecture documentation to get a basic understanding of the underlying architecture and the involved roles in an ArangoDB cluster.

In the following sections we will go through the relevant options per role.

Agency

To start up an agency you first have to activate it. This is done by providing --agency.activate true. To start up the agency in its fault-tolerant mode, set --agency.size to 3. You will then have to provide at least 3 agents before the agency will start operation.

During initialization the agents have to find each other. To do so provide at least one common --agency.endpoint. The agents will then coordinate the startup themselves. They will announce themselves with their external address, which may be specified using --agency.my-address. This is required in bridged Docker setups or NATed environments.
So in summary this is what your startup might look like:

arangod --server.endpoint tcp://0.0.0.0:5001 --agency.my-address=tcp://127.0.0.1:5001 --server.authentication false --agency.activate true --agency.size 3 --agency.endpoint tcp://127.0.0.1:5001 --agency.supervision true --database.directory agency1 &

arangod --server.endpoint tcp://0.0.0.0:5002 --agency.my-address=tcp://127.0.0.1:5002 --server.authentication false --agency.activate true --agency.size 3 --agency.endpoint tcp://127.0.0.1:5001 --agency.supervision true --database.directory agency2 &

arangod --server.endpoint tcp://0.0.0.0:5003 --agency.my-address=tcp://127.0.0.1:5003 --server.authentication false --agency.activate true --agency.size 3 --agency.endpoint tcp://127.0.0.1:5001 --agency.supervision true --database.directory agency3 &

If you are happy with a single agent, then simply use a single command like this:

arangod --server.endpoint tcp://0.0.0.0:5001 --server.authentication false --agency.activate true --agency.size 1 --agency.endpoint tcp://127.0.0.1:5001 --agency.supervision true --database.directory agency1 &

Furthermore, in the following sections, where --cluster.agency-endpoint is used multiple times to specify all three agent addresses, just use a single option --cluster.agency-endpoint tcp://127.0.0.1:5001 instead.

Coordinators and DBServers

These two roles share a common set of relevant options. First you should specify the role using --cluster.my-role. This can either be PRIMARY (a database server) or COORDINATOR. Furthermore provide the external endpoint (IP and port) of the task via --cluster.my-address.

The following is a full example of what it might look like:

arangod --server.authentication=false --server.endpoint tcp://0.0.0.0:8529 --cluster.my-address tcp://127.0.0.1:8529 --cluster.my-role PRIMARY --cluster.agency-endpoint tcp://127.0.0.1:5001 --cluster.agency-endpoint tcp://127.0.0.1:5002 --cluster.agency-endpoint tcp://127.0.0.1:5003 --database.directory primary1 &

arangod --server.authentication=false --server.endpoint tcp://0.0.0.0:8530 --cluster.my-address tcp://127.0.0.1:8530 --cluster.my-role PRIMARY --cluster.agency-endpoint tcp://127.0.0.1:5001 --cluster.agency-endpoint tcp://127.0.0.1:5002 --cluster.agency-endpoint tcp://127.0.0.1:5003 --database.directory primary2 &

arangod --server.authentication=false --server.endpoint tcp://0.0.0.0:8531 --cluster.my-address tcp://127.0.0.1:8531 --cluster.my-role COORDINATOR --cluster.agency-endpoint tcp://127.0.0.1:5001 --cluster.agency-endpoint tcp://127.0.0.1:5002 --cluster.agency-endpoint tcp://127.0.0.1:5003 --database.directory coordinator &

Note in particular that the endpoint descriptions given under --cluster.my-address and --cluster.agency-endpoint must not use the IP address 0.0.0.0, because they must contain an actual address that can be routed to the corresponding server. The 0.0.0.0 in --server.endpoint simply means that the server binds itself to all available network devices with all available IP addresses.

Upon registering with the agency during startup, the cluster will assign an ID to every server. The generated ID will be printed out to the log or can be accessed via the HTTP API by calling http://server-address/_admin/server/id.

You have now launched a complete ArangoDB cluster and can contact its coordinator at the endpoint tcp://127.0.0.1:8531, which means that you can reach the web UI under http://127.0.0.1:8531.
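To double-check that the coordinator is reachable, you can connect an arangosh to it and query the role and ID endpoints mentioned above. This is a small illustrative sketch rather than part of the original manual; it assumes the local test setup from above with authentication disabled:

/* connect with: arangosh --server.endpoint tcp://127.0.0.1:8531 --server.authentication false */
arango.GET("/_admin/server/role");  /* should report the role COORDINATOR */
arango.GET("/_admin/server/id");    /* the ID assigned by the agency, as described above */
db._version();                      /* basic liveness check over the same connection */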
Launching an ArangoDB cluster on multiple machines

Essentially, one can use the method from the previous section to start an ArangoDB cluster on multiple machines as well. The only changes are that one has to replace all local addresses 127.0.0.1 by the actual IP address of the corresponding server.

If we assume that you want to start your ArangoDB cluster on three different machines with IP addresses

192.168.1.1
192.168.1.2
192.168.1.3

then the commands you have to use are (you can use host names if they can be resolved to IP addresses on all machines):

On 192.168.1.1:

sudo arangod --server.endpoint tcp://0.0.0.0:5001 --agency.my-address tcp://192.168.1.1:5001 --server.authentication false --agency.activate true --agency.size 3 --agency.supervision true --database.directory agency

On 192.168.1.2:

sudo arangod --server.endpoint tcp://0.0.0.0:5001 --agency.my-address tcp://192.168.1.2:5001 --server.authentication false --agency.activate true --agency.size 3 --agency.supervision true --database.directory agency

On 192.168.1.3:

sudo arangod --server.endpoint tcp://0.0.0.0:5001 --agency.my-address tcp://192.168.1.3:5001 --server.authentication false --agency.activate true --agency.size 3 --agency.endpoint tcp://192.168.1.1:5001 --agency.endpoint tcp://192.168.1.2:5001 --agency.endpoint tcp://192.168.1.3:5001 --agency.supervision true --database.directory agency

On 192.168.1.1:

sudo arangod --server.authentication=false --server.endpoint tcp://0.0.0.0:8529 --cluster.my-address tcp://192.168.1.1:8529 --cluster.my-role PRIMARY --cluster.agency-endpoint tcp://192.168.1.1:5001 --cluster.agency-endpoint tcp://192.168.1.2:5001 --cluster.agency-endpoint tcp://192.168.1.3:5001 --database.directory primary1 &

On 192.168.1.2:

sudo arangod --server.authentication=false --server.endpoint tcp://0.0.0.0:8530 --cluster.my-address tcp://192.168.1.2:8530 --cluster.my-role PRIMARY --cluster.agency-endpoint tcp://192.168.1.1:5001 --cluster.agency-endpoint tcp://192.168.1.2:5001 --cluster.agency-endpoint tcp://192.168.1.3:5001 --database.directory primary2 &

On 192.168.1.3:

arangod --server.authentication=false --server.endpoint tcp://0.0.0.0:8531 --cluster.my-address tcp://192.168.1.3:8531 --cluster.my-role COORDINATOR --cluster.agency-endpoint tcp://192.168.1.1:5001 --cluster.agency-endpoint tcp://192.168.1.2:5001 --cluster.agency-endpoint tcp://192.168.1.3:5001 --database.directory coordinator &

Obviously, it would no longer be necessary to use different port numbers on different servers. We have chosen to keep all port numbers in comparison to the local setup to minimize the necessary changes.

After having swallowed these longish commands, we hope that you appreciate the simplicity of the setup with Apache Mesos and DC/OS.
To simply make arangodb available on all host interfaces on port 8529: docker run -p 8529:8529 -e ARANGO_NO_AUTH=1 arangodb Another possibility is to start Docker via network mode host . This is possible but generally not recommended. To do it anyway check the Docker documentation for details. Docker and Cluster tasks To start the cluster via Docker is basically the same as starting locally or on multiple machines. However just like with the single networking image we will face networking issues. You can simply use the -p flag to make the individual task available on the host machine or you could use Docker's links to enable task intercommunication. Please note that there are some flags that specify how ArangoDB can reach a task from the outside. These are very important and built for this exact usecase. An example configuration might look like this: docker run -e ARANGO_NO_AUTH=1 -p 192.168.1.1:10000:8529 arangodb/arangodb arangod --server.endpoint tcp://0.0.0.0:8529 --clust er.my-address tcp://192.168.1.1:10000 --cluster.my-role PRIMARY --cluster.agency-endpoint tcp://192.168.1.1:5001 --cluster.agen cy-endpoint tcp://192.168.1.2:5002 --cluster.agency-endpoint tcp://192.168.1.3:5003 This will start a primary DB server within a Docker container with an isolated network. Within the Docker container it will bind to all interfaces (this will be 127.0.0.1:8529 and some internal Docker ip on port 8529). By supplying -p 192.168.1.1:10000:8529 we are establishing a port forwarding from our local IP (192.168.1.1 port 10000 in this example) to port 8529 inside the container. Within the command we are telling arangod how it can be reached from the outside --cluster.my-address tcp://192.168.1.1:10000 . This information will be forwarded to the agency so that the other tasks in your cluster can see how this particular DBServer may be reached. 322 M ultiple Datacenters Datacenter to datacenter replication. This feature is only available in the Enterprise Edition About At some point in the grows of a database, there comes a need for replicating it across multiple datacenters. Reasons for that can be: Fallback in case of a disaster in one datacenter. Regional availability Separation of concerns And many more. This tutorial describes what the ArangoSync datacenter to datacenter replication solution (ArangoSync from now on) offers, when to use it, when not to use it and how to configure, operate, troubleshoot it & keep it safe. What is it ArangoSync is a solution that enables you to asynchronously replicate the entire structure and content in an ArangoDB cluster in one place to a cluster in another place. Typically it is used from one datacenter to another. It is not a solution for replicating single server instances. The replication done by ArangoSync in asynchronous. That means that when a client is writing data into the source datacenter, it will consider the request finished before the data has been replicated to the other datacenter. The time needed to completely replicate changes to the other datacenter is typically in the order of seconds, but this can vary significantly depending on load, network & computer capacity. ArangoSync performs replication in a single direction only. That means that you can replicate data from cluster A to cluster B or from cluster B to cluster A, but never at the same time. Data modified in the destination cluster will be lost! Replication is a completely autonomous process. Once it is configured it is designed to run 24/7 without frequent manual intervention. 
This does not mean that it requires no maintenance or attention at all. As with any distributed system some attention is needed to monitor its operation and keep it secure (e.g. certificate & password rotation). Once configured, ArangoSync will replicate both structure and data of an entire cluster. This means that there is no need to make additional configuration changes when adding/removing databases or collections. Also meta data such as users, foxx application & jobs are automatically replicated. When to use it... and when not ArangoSync is a good solution in all cases where you want to replicate data from one cluster to another without the requirement that the data is available immediately in the other cluster. ArangoSync is not a good solution when one of the following applies: You want to replicate data from cluster A to cluster B and from cluster B to cluster A at the same time. You need synchronous replication between 2 clusters. There is no network connection betwee cluster A and B. You want complete control over which database, collection & documents are replicate and which not. 323 M ultiple Datacenters Requirements To use ArangoSync you need the following: Two datacenters, each running an ArangoDB Enterprise cluster, version 3.3 or higher. A network connection between both datacenters with accessible endpoints for several components (see individual components for details). TLS certificates for ArangoSync master instances (can be self-signed). TLS certificates for Kafka brokers (can be self-signed). Optional (but recommended) TLS certificates for ArangoDB clusters (can be self-signed). Client certificates CA for ArangoSync masters (typically self-signed). Client certificates for ArangoSync masters (typically self-signed). At least 2 instances of the ArangoSync master in each datacenter. One instances of the ArangoSync worker on every machine in each datacenter. Note: In several places you will need a (x509) certificate. The certificates section below provides more guidance for creating and renewing these certificates. Besides the above list, you probably want to use the following: An orchestrator to keep all components running. In this tutorial we will use systemd as an example. A log file collector for centralized collection & access to the logs of all components. A metrics collector & viewing solution such as Prometheus + Grafana. Deployment In the following paragraphs you'll learn how to deploy all the components needed for datacenter to datacenter replication. ArangoDB cluster There are several ways to start an ArangoDB cluster. In this tutorial we will focus on our recommended way to start ArangoDB: the ArangoDB starter. Datacenter to datacenter replication requires the rocksdb storage engine. In this tutorial the example setup will have rocksdb enabled. If you choose to deploy with a different strategy keep in mind to set the storage engine. Also see other possibilities to deploy an ArangoDB cluster. The starter simplifies things for the operator and will coordinate a distributed cluster startup across several machines and assign cluster roles automatically. When started on several machines and enough machines have joined, the starters will start agents, coordinators and dbservers on these machines. When running the starter will supervise its child tasks (namely coordinators, dbservers and agents) and restart them in case of failure. 
To start the cluster using a systemd unit file use the following: [Unit] Description=Run the ArangoDB Starter After=network.target [Service] Restart=on-failure EnvironmentFile=/etc/arangodb.env EnvironmentFile=/etc/arangodb.env.local Environment=DATADIR=/var/lib/arangodb/cluster ExecStartPre=/usr/bin/sh -c "mkdir -p ${DATADIR}" ExecStart=/usr/bin/arangodb \ --starter.address=${PRIVATEIP} \ --starter.data-dir=${DATADIR} \ --starter.join=${STARTERENDPOINTS} \ --server.storage-engine=rocksdb \ --auth.jwt-secret=${CLUSTERSECRETPATH} TimeoutStopSec=60 324 M ultiple Datacenters [Install] WantedBy=multi-user.target Note that we set rocksdb in the unit service file. Cluster authentication The communication between the cluster nodes use a token (JWT) to authenticate. This must be shared between cluster nodes. Sharing secrets is obviously a very delicate topic. The above workflow assumes that the operator will put a secret in a file named ${CLUSTERSECRETPATH} . We recommend to use a dedicated system for managing secrets like HashiCorps' Vault or the secret management of DC/OS . Required ports As soon as enough machines have joined, the starter will begin starting agents, coordinators and dbservers. Each of these tasks needs a port to communicate. Please make sure that the following ports are available on all machines: 8529 for coordinators 8530 for dbservers 8531 for agents The starter itself will use port 8528 . Kafka & Zookeeper How to deploy zookeeper How to deploy kafka Accessible ports Sync Master The Sync M aster is responsible for managing all synchronization, creating tasks and assigning those to workers. At least 2 instances muts be deployed in each datacenter. One instance will be the "leader", the other will be an inactive slave. When the leader is gone for a short while, one of the other instances will take over. With clusters of a significant size, the sync master will require a significant set of resources. Therefore it is recommended to deploy sync masters on their own servers, equiped with sufficient CPU power and memory capacity. To start a sync master using a systemd service, use a unit like this: [Unit] Description=Run ArangoSync in master mode After=network.target [Service] Restart=on-failure EnvironmentFile=/etc/arangodb.env EnvironmentFile=/etc/arangodb.env.local ExecStart=/usr/sbin/arangosync run master \ --log.level=debug \ --cluster.endpoint=${CLUSTERENDPOINTS} \ --cluster.jwtSecret=${CLUSTERSECRET} \ --server.keyfile=${CERTIFICATEDIR}/tls.keyfile \ --server.client-cafile=${CERTIFICATEDIR}/client-auth-ca.crt \ --server.endpoint=https://${PUBLICIP}:${MASTERPORT} \ --server.port=${MASTERPORT} \ --master.jwtSecret=${MASTERSECRET} \ --mq.type=kafka \ --mq.kafka-addr=${KAFKAENDPOINTS} \ --mq.kafka-client-keyfile=${CERTIFICATEDIR}/kafka-client.key \ --mq.kafka-cacert=${CERTIFICATEDIR}/tls-ca.crt \ TimeoutStopSec=60 325 M ultiple Datacenters [Install] WantedBy=multi-user.target The sync master needs a TLS server certificate and a If you want the service to create a TLS certificate & client authentication certificate, for authenticating with sync masters in another datacenter, for every start, add this to the Service section. 
ExecStartPre=/usr/bin/sh -c "mkdir -p ${CERTIFICATEDIR}" ExecStartPre=/usr/sbin/arangosync create tls keyfile \ --cacert=${CERTIFICATEDIR}/tls-ca.crt \ --cakey=${CERTIFICATEDIR}/tls-ca.key \ --keyfile=${CERTIFICATEDIR}/tls.keyfile \ --host=${PUBLICIP} \ --host=${PRIVATEIP} \ --host=${HOST} ExecStartPre=/usr/sbin/arangosync create client-auth keyfile \ --cacert=${CERTIFICATEDIR}/tls-ca.crt \ --cakey=${CERTIFICATEDIR}/tls-ca.key \ --keyfile=${CERTIFICATEDIR}/kafka-client.key \ --host=${PUBLICIP} \ --host=${PRIVATEIP} \ --host=${HOST} The sync master must be reachable on a TCP port ${MASTERPORT} (used with --server.port option). This port must be reachable from inside the datacenter (by sync workers and operations) and from inside of the other datacenter (by sync masters in the other datacenter). Sync Workers The Sync Worker is responsible for executing synchronization tasks. For optimal performance at least 1 worker instance must be placed on every machine that has an ArangoDB dbserver running. This ensures that tasks can be executed with minimal network traffic outside of the machine. Since sync workers will automatically stop once their TLS server certificate expires (which is set to 2 years by default), it is recommended to run at least 2 instances of a worker on every machine in the datacenter. That way, tasks can still be assigned in the most optimal way, even when a worker in temporarily down for a restart. To start a sync worker using a systemd service, use a unit like this: [Unit] Description=Run ArangoSync in worker mode After=network.target [Service] Restart=on-failure EnvironmentFile=/etc/arangodb.env EnvironmentFile=/etc/arangodb.env.local Environment=PORT=8729 ExecStart=/usr/sbin/arangosync run worker \ --log.level=debug \ --server.port=${PORT} \ --server.endpoint=https://${PRIVATEIP}:${PORT} \ --master.endpoint=${MASTERENDPOINTS} \ --master.jwtSecret=${MASTERSECRET} TimeoutStopSec=60 [Install] WantedBy=multi-user.target The sync worker must be reachable on a TCP port ${PORT} (used with --server.port option). This port must be reachable from inside the datacenter (by sync masters). Prometheus & Grafana (optional) ArangoSync provides metrics in a format supported by Prometheus. We also provide a standard set of dashboards for viewing those metrics in Grafana. If you want to use these tools, go to their websites for instructions on how to deploy them. 326 M ultiple Datacenters After deployment, you must configure prometheus using a configuration file that instructs it about which targets to scrape. For ArangoSync you should configure scrape targets for all sync masters and all sync workers. To do so, you can use a configuration such as this: global: scrape_interval: 10s # scrape targets every 10 seconds. 
scrape_configs: # Scrap sync masters - job_name: 'sync_master' scheme: 'https' bearer_token: "${MONITORINGTOKEN}" tls_config: insecure_skip_verify: true static_configs: - targets: - "${IPMASTERA1}:8629" - "${IPMASTERA2}:8629" - "${IPMASTERB1}:8629" - "${IPMASTERB2}:8629" labels: type: "master" relabel_configs: - source_labels: [__address__] regex: ${IPMASTERA1}\:8629|${IPMASTERA2}\:8629 target_label: dc replacement: A - source_labels: [__address__] regex: ${IPMASTERB1}\:8629|${IPMASTERB2}\:8629 target_label: dc replacement: B - source_labels: [__address__] regex: ${IPMASTERA1}\:8629|${IPMASTERB1}\:8629 target_label: instance replacement: 1 - source_labels: [__address__] regex: ${IPMASTERA2}\:8629|${IPMASTERB2}\:8629 target_label: instance replacement: 2 # Scrap sync workers - job_name: 'sync_worker' scheme: 'https' bearer_token: "${MONITORINGTOKEN}" tls_config: insecure_skip_verify: true static_configs: - targets: - "${IPWORKERA1}:8729" - "${IPWORKERA2}:8729" - "${IPWORKERB1}:8729" - "${IPWORKERB2}:8729" labels: type: "worker" relabel_configs: - source_labels: [__address__] regex: ${IPWORKERA1}\:8729|${IPWORKERA2}\:8729 target_label: dc replacement: A - source_labels: [__address__] regex: ${IPWORKERB1}\:8729|${IPWORKERB2}\:8729 target_label: dc replacement: B - source_labels: [__address__] regex: ${IPWORKERA1}\:8729|${IPWORKERB1}\:8729 target_label: instance replacement: 1 - source_labels: [__address__] regex: ${IPWORKERA2}\:8729|${IPWORKERB2}\:8729 target_label: instance replacement: 2 327 M ultiple Datacenters Note: The above example assumes 2 datacenters, with 2 sync masters & 2 sync workers per datacenter. You have to replace all ${...} variables in the above configuration with applicable values from your environment. Configuration Once all components of the ArangoSync solution have been deployed and are running properly, ArangoSync will not automatically replicate database structure and content. For that, it is is needed to configure synchronization. To configure synchronization, you need the following: The endpoint of the sync master in the target datacenter. The endpoint of the sync master in the source datacenter. A certificate (in keyfile format) used for client authentication of the sync master (with the sync master in the source datacenter). A CA certificate (public key only) for verifying the integrity of the sync masters. A username+password pair (or client certificate) for authenticating the configure require with the sync master (in the target datacenter) With that information, run: arangosync configure sync \ --master.endpoint= \ --master.keyfile= \ --source.endpoint= \ --source.cacert= \ --auth.user= \ --auth.password= The command will finish quickly. Afterwards it will take some time until the clusters in both datacenters are in sync. Use the following command to inspect the status of the synchronization of a datacenter: arangosync get status \ --master.endpoint= \ --auth.user= \ --auth.password= \ -v Note: Invoking this command on the target datacenter will return different results from invoking it on the source datacenter. You need insight in both results to get a "complete picture". Where the get status command gives insight in the status of synchronization, there are more detailed commands to give insight in tasks & registered workers. 
Where the get status command gives insight into the status of synchronization, there are more detailed commands that give insight into tasks & registered workers.

Use the following command to get a list of all synchronization tasks in a datacenter:

arangosync get tasks \
    --master.endpoint=<endpoint of the sync master in the datacenter of interest> \
    --auth.user=<username> \
    --auth.password=<password> \
    -v

Use the following command to get a list of all masters in a datacenter and to know which master is the current leader:

arangosync get masters \
    --master.endpoint=<endpoint of the sync master in the datacenter of interest> \
    --auth.user=<username> \
    --auth.password=<password> \
    -v

Use the following command to get a list of all workers in a datacenter:

arangosync get workers \
    --master.endpoint=<endpoint of the sync master in the datacenter of interest> \
    --auth.user=<username> \
    --auth.password=<password> \
    -v

Stop synchronization

If you no longer want to synchronize data from a source to a target datacenter you must stop it. To do so, run the following command:

arangosync stop sync \
    --master.endpoint=<endpoint of the sync master in the target datacenter> \
    --auth.user=<username> \
    --auth.password=<password>

The command will wait until synchronization has completely stopped before returning. If the synchronization is not completely stopped within a reasonable period (2 minutes by default) the command will fail.

If the source datacenter is no longer available it is not possible to stop synchronization in a graceful manner. If that happens, abort the synchronization with the following command:

arangosync abort sync \
    --master.endpoint=<endpoint of the sync master in the target datacenter> \
    --auth.user=<username> \
    --auth.password=<password>

If the source datacenter recovers after an abort sync has been executed, you have to "clean up" ArangoSync in the source datacenter. To do so, execute the following command:

arangosync abort outgoing sync \
    --master.endpoint=<endpoint of the sync master in the source datacenter> \
    --auth.user=<username> \
    --auth.password=<password>

Reversing synchronization direction

If you want to reverse the direction of synchronization (e.g. after a failure in datacenter A and you switched to datacenter B for fallback), you must first stop (or abort) the original synchronization. Once that is finished (and cleanup has been applied in case of abort), you must configure the synchronization again, but with swapped source & target settings.

Operations & Maintenance

ArangoSync is a distributed system with a lot of different components. As with any such system, it requires some, but not a lot, of operational support. This section describes how to monitor the status of ArangoSync, how to keep it alive, and what to do in case of failures or bugs.

What means are available to monitor status

All of the components of ArangoSync provide means to monitor their status. Below you'll find an overview per component.

Sync master & workers: The arangosync servers running as either master or worker provide:

- A status API, see arangosync get status. Make sure that all statuses report running. For even more detail the following commands are also available: arangosync get tasks, arangosync get masters & arangosync get workers.
- A log on the standard output. Log levels can be configured using --log.level settings.
- A metrics API GET /metrics. This API is compatible with Prometheus. Sample Grafana dashboards for inspecting these metrics are available.

ArangoDB cluster: The arangod servers that make up the ArangoDB cluster provide:

- A log file. This is configurable with settings with a log. prefix. E.g. --log.output=file://myLogFile or --log.level=info.
- A statistics API GET /_admin/statistics

Kafka cluster: The kafka brokers provide:

- A log file, see settings with a log. prefix in its server.properties configuration file.

Zookeeper: The zookeeper agents provide:

- A log on standard output.
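The metrics API can also be exercised directly to check that monitoring is wired up correctly; the host below is a placeholder, and the token is the same ${MONITORINGTOKEN} value used as bearer_token in the Prometheus configuration shown earlier:

# Fetch Prometheus-compatible metrics from a sync master (example host).
curl -sk \
    -H "Authorization: Bearer ${MONITORINGTOKEN}" \
    https://dcA-master1.example.com:8629/metrics | head -n 20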
What to look for while monitoring status

The very first thing to do when monitoring the status of ArangoSync is to look into the status provided by arangosync get status ... -v. When not everything is in the running state (on both datacenters), this is an indication that something may be wrong. In case that happens, give it some time (incremental synchronization may take quite some time for large collections) and look at the status again. If the statuses do not change (or change, but do not reach running) it is time to inspect the metrics & log files.

When the metrics or logs seem to indicate a problem in a sync master or worker, it is safe to restart it, as long as only 1 instance is restarted at a time. Give restarted instances some time to "catch up".

What to do when problems remain

When a problem remains and restarting masters/workers does not solve the problem, contact support. Make sure to provide support with the following information:

- Output of arangosync get version ... on both datacenters.
- Output of arangosync get status ... -v on both datacenters.
- Output of arangosync get tasks ... -v on both datacenters.
- Output of arangosync get masters ... -v on both datacenters.
- Output of arangosync get workers ... -v on both datacenters.
- Log files of all components.
- A complete description of the problem you observed and what you did to resolve it.

What to do when a source datacenter is down

When you use ArangoSync for backup of your cluster from one datacenter to another and the source datacenter has a complete outage, you may consider switching your applications to the target (backup) datacenter. This is what you must do in that case:

1. Stop synchronization using arangosync stop sync .... When the source datacenter is completely unresponsive this will not succeed. In that case use arangosync abort sync .... See Configuration above for how to clean up the source datacenter when it becomes available again.
2. Verify that synchronization has completely stopped using arangosync get status ... -v.
3. Reconfigure your applications to use the target (backup) datacenter.

When the original source datacenter is restored, you may switch roles and make it the target datacenter. To do so, use arangosync configure sync ... as described in Configuration.

What to do in case of a planned network outage

All ArangoSync tasks send out heartbeat messages to the other datacenter to indicate "it is still alive". The other datacenter assumes the connection is "out of sync" when it does not receive any messages for a certain period of time.

If you're planning some sort of maintenance where you know the connectivity will be lost for some time (e.g. 3 hours), you can prepare ArangoSync for that, such that it will hold off re-synchronization for a given period of time. To do so, on both datacenters, run:

arangosync set message timeout \
    --master.endpoint=<endpoint of the sync master in the datacenter> \
    --auth.user=<username> \
    --auth.password=<password> \
    3h

The last argument is the period that ArangoSync should hold off re-synchronization for. This can be given in minutes (e.g. 15m) or hours (e.g. 3h). If maintenance is taking longer than expected, you can use the same command to extend the hold-off period (e.g. to 4h). After the maintenance, use the same command to restore the hold-off period to its default of 1h.
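For example, around a three hour maintenance window you might run the following (endpoints and credentials are placeholders; as described above, the command has to be run against the sync masters of both datacenters):

# Before the maintenance: allow up to 3 hours without heartbeats.
for EP in https://dcA-master1.example.com:8629 https://dcB-master1.example.com:8629; do
    arangosync set message timeout \
        --master.endpoint=$EP \
        --auth.user=root --auth.password=secret \
        3h
done

# After the maintenance: run the same loop again with "1h" to restore the default hold-off period.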
What to do in case of a document that exceeds the message queue limits

If you insert/update a document in a collection and the size of that document is larger than the maximum message size of your message queue, the collection will no longer be able to synchronize. It will go into a failed state.

To recover from that, first remove the document from the ArangoDB cluster in the source datacenter. After that, for each failed shard, run:

arangosync reset failed shard \
    --master.endpoint=<endpoint of the sync master> \
    --auth.user=<username> \
    --auth.password=<password> \
    --database=<name of the database> \
    --collection=<name of the collection> \
    --shard=<index of the shard>

After this command, a new set of tasks will be started to synchronize the shard. It can take some time for the shard to reach running state.

Metrics

ArangoSync (master & worker) provides metrics that can be used for monitoring the ArangoSync solution. These metrics are available using the following HTTPS endpoints:

- GET /metrics: Provides metrics in a format supported by Prometheus.
- GET /metrics.json: Provides the same metrics in JSON format.

Both endpoints include help information per metric.

Note: Both endpoints require authentication. Besides the usual authentication methods, these endpoints are also accessible using a special bearer token specified using the --monitoring.token command line option.

The Prometheus output (/metrics) looks like this:

...
# HELP arangosync_master_worker_registrations Total number of registrations
# TYPE arangosync_master_worker_registrations counter
arangosync_master_worker_registrations 2
# HELP arangosync_master_worker_storage Number of times worker info is stored, loaded
# TYPE arangosync_master_worker_storage counter
arangosync_master_worker_storage{kind="",op="save",result="success"} 20
arangosync_master_worker_storage{kind="empty",op="load",result="success"} 1
...

The JSON output (/metrics.json) looks like this:

{
  ...
  "arangosync_master_worker_registrations": {
    "help": "Total number of registrations",
    "type": "counter",
    "samples": [
      {
        "value": 2
      }
    ]
  },
  "arangosync_master_worker_storage": {
    "help": "Number of times worker info is stored, loaded",
    "type": "counter",
    "samples": [
      {
        "value": 8,
        "labels": {
          "kind": "",
          "op": "save",
          "result": "success"
        }
      },
      {
        "value": 1,
        "labels": {
          "kind": "empty",
          "op": "load",
          "result": "success"
        }
      }
    ]
  }
  ...
}

Hint: To get a list of all metrics and their help information, run:

alias jq='docker run --rm -i realguess/jq jq'
curl -sk -u "<user>:<password>" https://<IP of sync master>:8629/metrics.json | \
    jq 'with_entries({key: .key, value:.value.help})'
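If you want Prometheus (or the curl commands above) to use such a token, the ArangoSync servers themselves have to be started with it. A sketch based on the worker unit shown earlier; keeping the token in the same environment files and appending the option like this is an assumption, not a prescribed configuration:

ExecStart=/usr/sbin/arangosync run worker \
    --log.level=debug \
    --server.port=${PORT} \
    --server.endpoint=https://${PRIVATEIP}:${PORT} \
    --master.endpoint=${MASTERENDPOINTS} \
    --master.jwtSecret=${MASTERSECRET} \
    --monitoring.token=${MONITORINGTOKEN}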
Security

Firewall settings

The components of ArangoSync use (TCP) network connections to communicate with each other. Below you'll find an overview of these connections and the TCP ports that should be accessible.

1. The sync masters must be allowed to connect to the following components within the same datacenter:
   - ArangoDB agents and coordinators (default ports: 8531 and 8529)
   - Kafka brokers (default port 9092)
   - Sync workers (default port 8729)
   Additionally the sync masters must be allowed to connect to the sync masters in the other datacenter. By default the sync masters will operate on port 8629.

2. The sync workers must be allowed to connect to the following components within the same datacenter:
   - ArangoDB coordinators (default port 8529)
   - Kafka brokers (default port 9092)
   - Sync masters (default port 8629)
   By default the sync workers will operate on port 8729. Additionally the sync workers must be allowed to connect to the Kafka brokers in the other datacenter.

3. Kafka: The kafka brokers must be allowed to connect to the following components within the same datacenter:
   - Other kafka brokers (default port 9092)
   - Zookeeper (default ports 2181, 2888 and 3888)
   The default port for kafka is 9092. The default kafka installation will also expose some prometheus metrics on port 7071. To gain more insight into kafka, open this port for your prometheus installation.

4. Zookeeper: The zookeeper agents must be allowed to connect to the following components within the same datacenter:
   - Other zookeeper agents
   The setup here is a bit special, as zookeeper uses 3 ports for different operations. All agents need to be able to connect to all of these ports. By default Zookeeper uses:
   - port 2181 for client communication
   - port 2888 for follower communication
   - port 3888 for leader elections

Certificates

Digital certificates are used in many places in ArangoSync for both encryption and authentication.

In ArangoSync all network connections use Transport Layer Security (TLS), a set of protocols that ensure that all network traffic is encrypted. For this, TLS certificates are used. The server side of the network connection offers a TLS certificate. This certificate is (often) verified by the client side of the network connection, to ensure that the certificate is signed by a trusted Certificate Authority (CA). This ensures the integrity of the server.

In several places additional certificates are used for authentication. In those cases the client side of the connection offers a client certificate (on top of an existing TLS connection). The server side of the connection uses the client certificate to authenticate the client and (optionally) decides which rights should be assigned to the client.

Note: ArangoSync does allow the use of certificates signed by a well-known CA (e.g. Verisign), however it is more convenient (and common) to use your own CA.

Formats

All certificates are x509 certificates with a public key, a private key and an optional chain of certificates used to sign the certificate (this chain is typically provided by the Certificate Authority (CA)). Depending on their use, certificates are stored in different formats. The following formats are used:

- Public key only (.crt): A file that contains only the public key of a certificate with an optional chain of parent certificates (public keys of certificates used to sign the certificate). Since this format contains only public keys, it is not a problem if its contents are exposed. It must still be stored in a safe place to avoid losing it.
- Private key only (.key): A file that contains only the private key of a certificate. It is vital to protect these files and store them in a safe place.
- Keyfile with public & private key (.keyfile): A file that contains the public key of a certificate, an optional chain of parent certificates and a private key. Since this format also contains a private key, it is vital to protect these files and store them in a safe place.
- Java keystore (.jks): A file containing a set of public and private keys. It is possible to protect access to the content of this file using a keystore password. Since this format can contain private keys, it is vital to protect these files and store them in a safe place (even when its content is protected with a keystore password).

Creating certificates

ArangoSync provides commands to create all certificates needed.

TLS server certificates

To create a certificate used for TLS servers in the keyfile format, you need the public key of the CA (--cacert), the private key of the CA (--cakey) and one or more hostnames (or IP addresses). Then run:

arangosync create tls keyfile \
    --cacert=my-tls-ca.crt --cakey=my-tls-ca.key \
    --host=<hostname> \
    --keyfile=my-tls-cert.keyfile

Make sure to store the generated keyfile (my-tls-cert.keyfile) in a safe place.

To create a certificate used for TLS servers in the crt & key format, you need the public key of the CA (--cacert), the private key of the CA (--cakey) and one or more hostnames (or IP addresses). Then run:

arangosync create tls certificate \
    --cacert=my-tls-ca.crt --cakey=my-tls-ca.key \
    --host=<hostname> \
    --cert=my-tls-cert.crt \
    --key=my-tls-cert.key

Make sure to protect and store the generated files (my-tls-cert.crt & my-tls-cert.key) in a safe place.

Client authentication certificates

To create a certificate used for client authentication in the keyfile format, you need the public key of the CA (--cacert), the private key of the CA (--cakey) and one or more hostnames (or IP addresses) or email addresses. Then run:

arangosync create client-auth keyfile \
    --cacert=my-client-auth-ca.crt --cakey=my-client-auth-ca.key \
    [--host=<hostname> | --email=<email address>] \
    --keyfile=my-client-auth-cert.keyfile

Make sure to protect and store the generated keyfile (my-client-auth-cert.keyfile) in a safe place.

CA certificates

To create a CA certificate used to sign TLS certificates, run:

arangosync create tls ca \
    --cert=my-tls-ca.crt --key=my-tls-ca.key

Make sure to protect and store both generated files (my-tls-ca.crt & my-tls-ca.key) in a safe place.

Note: CA certificates have a much longer lifetime than normal certificates. Therefore even more care is needed to store them safely.

To create a CA certificate used to sign client authentication certificates, run:

arangosync create client-auth ca \
    --cert=my-client-auth-ca.crt --key=my-client-auth-ca.key

Make sure to protect and store both generated files (my-client-auth-ca.crt & my-client-auth-ca.key) in a safe place.

Note: CA certificates have a much longer lifetime than normal certificates. Therefore even more care is needed to store them safely.

Renewing certificates

All certificates have meta information in them that limits their use in function, target & lifetime.

- A certificate created for client authentication (function) cannot be used as a TLS server certificate (and the same is true for the reverse).
- A certificate for host myserver (target) cannot be used for host anotherserver.
- A certificate that is valid until October 2017 (lifetime) cannot be used after October 2017.

If anything changes in function, target or lifetime you need a new certificate.

The procedure for creating a renewed certificate is the same as for creating a "first" certificate. After creating the renewed certificate the process(es) using them have to be updated. This means restarting them. All ArangoSync components are designed to support stopping and starting single instances, but do not restart more than 1 instance at the same time. As soon as 1 instance has been restarted, give it some time to "catch up" before restarting the next instance.

ArangoDB Kubernetes Operator

The ArangoDB Kubernetes Operator (kube-arangodb) is a set of operators that you deploy in your Kubernetes cluster to:

- Manage deployments of the ArangoDB database
- Provide PersistentVolumes on local storage of your nodes for optimal storage performance.
- Configure ArangoDB Datacenter to Datacenter replication

Each of these uses involves a different custom resource:

- Use an ArangoDeployment resource to create an ArangoDB database deployment.
- Use an ArangoLocalStorage resource to provide local PersistentVolumes for optimal I/O performance.
- Use an ArangoDeploymentReplication resource to configure ArangoDB Datacenter to Datacenter replication.

Continue with Using the ArangoDB Kubernetes Operator to learn how to install the ArangoDB Kubernetes operator and create your first deployment.
Using the ArangoDB Kubernetes Operator

Installation

The ArangoDB Kubernetes Operator needs to be installed in your Kubernetes cluster first. To do so, run (replace <version> with the version of the operator that you want to install):

export URLPREFIX=https://raw.githubusercontent.com/arangodb/kube-arangodb/<version>/manifests
kubectl apply -f $URLPREFIX/crd.yaml
kubectl apply -f $URLPREFIX/arango-deployment.yaml

To use ArangoLocalStorage resources, also run:

kubectl apply -f $URLPREFIX/arango-storage.yaml

To use ArangoDeploymentReplication resources, also run:

kubectl apply -f $URLPREFIX/arango-deployment-replication.yaml

You can find the latest release of the ArangoDB Kubernetes Operator in the kube-arangodb repository.

ArangoDB deployment creation

Once the operator is running, you can create your ArangoDB database deployment by creating an ArangoDeployment custom resource and deploying it into your Kubernetes cluster. For example (all examples can be found in the kube-arangodb repository):

kubectl apply -f examples/simple-cluster.yaml

Deployment removal

To remove an existing ArangoDB deployment, delete the custom resource. The operator will then delete all created resources. For example:

kubectl delete -f examples/simple-cluster.yaml

Note that this will also delete all data in your ArangoDB deployment! If you want to keep your data, make sure to create a backup before removing the deployment.

Operator removal

To remove the entire ArangoDB Kubernetes Operator, remove all clusters first and then remove the operator by running:

kubectl delete deployment arango-deployment-operator
# If `ArangoLocalStorage` operator is installed
kubectl delete deployment -n kube-system arango-storage-operator
# If `ArangoDeploymentReplication` operator is installed
kubectl delete deployment arango-deployment-replication-operator

See also

- Driver configuration
- Scaling
- Upgrading

ArangoDeployment Custom Resource

The ArangoDB Deployment Operator creates and maintains ArangoDB deployments in a Kubernetes cluster, given a deployment specification. This deployment specification is a CustomResource following a CustomResourceDefinition created by the operator.

Example minimal deployment definition of an ArangoDB database cluster:

apiVersion: "database.arangodb.com/v1alpha"
kind: "ArangoDeployment"
metadata:
  name: "example-arangodb-cluster"
spec:
  mode: Cluster

Example more elaborate deployment definition:

apiVersion: "database.arangodb.com/v1alpha"
kind: "ArangoDeployment"
metadata:
  name: "example-arangodb-cluster"
spec:
  mode: Cluster
  environment: Production
  agents:
    count: 3
    args:
      - --log.level=debug
    resources:
      requests:
        storage: 8Gi
    storageClassName: ssd
  dbservers:
    count: 5
    resources:
      requests:
        storage: 80Gi
    storageClassName: ssd
  coordinators:
    count: 3
  image: "arangodb/arangodb:3.3.4"

Specification reference

Below you'll find all settings of the ArangoDeployment custom resource. Several settings are for various groups of servers. These are indicated with spec.<group>.<setting>, where <group> can be any of:

- agents for all agents of a Cluster or ActiveFailover pair.
- dbservers for all dbservers of a Cluster.
- coordinators for all coordinators of a Cluster.
- single for all single servers of a Single instance or ActiveFailover pair.
- syncmasters for all syncmasters of a Cluster.
- syncworkers for all syncworkers of a Cluster.

spec.mode: string

This setting specifies the type of deployment you want to create. Possible values are:

- Cluster (default): Full cluster. Defaults to 3 agents, 3 dbservers & 3 coordinators.
- ActiveFailover: Active-failover single pair. Defaults to 3 agents and 2 single servers.
- Single: Single server only (note this does not provide high availability or reliability).

This setting cannot be changed after the deployment has been created.

spec.environment: string

This setting specifies the type of environment in which the deployment is created. Possible values are:

- Development (default): This value optimizes the deployment for development use. It is possible to run a deployment on a small number of nodes (e.g. minikube).
- Production: This value optimizes the deployment for production use. It puts required affinity constraints on all pods to avoid agents & dbservers from running on the same machine.

spec.image: string

This setting specifies the docker image to use for all ArangoDB servers. In a development environment this setting defaults to arangodb/arangodb:latest. For production environments this is a required setting without a default value. It is highly recommended to use an explicit version (not latest) for production environments.

spec.imagePullPolicy: string

This setting specifies the pull policy for the docker image to use for all ArangoDB servers. Possible values are:

- IfNotPresent (default): to pull only when the image is not found on the node.
- Always: to always pull the image before using it.

spec.storageEngine: string

This setting specifies the type of storage engine used for all servers in the cluster. Possible values are:

- MMFiles: To use the MMFiles storage engine.
- RocksDB (default): To use the RocksDB storage engine.

This setting cannot be changed after the cluster has been created.

spec.downtimeAllowed: bool

This setting is used to allow automatic reconciliation actions that yield some downtime of the ArangoDB deployment. When this setting is set to false (the default), no automatic action that may result in downtime is allowed. If the need for such an action is detected, an event is added to the ArangoDeployment. Once this setting is set to true, the automatic action is executed.

Operations that may result in downtime are:

- Rotating the TLS CA certificate

Note: It is still possible that there is some downtime when the Kubernetes cluster is down, or in a bad state, irrespective of the value of this setting.

spec.rocksdb.encryption.keySecretName

This setting specifies the name of a kubernetes Secret that contains an encryption key used for encrypting all data stored by ArangoDB servers. When an encryption key is used, encryption of the data in the cluster is enabled, without it encryption is disabled. The default value is empty. This requires the Enterprise version.

The encryption key cannot be changed after the cluster has been created.

The secret specified by this setting must have a data field named 'key' containing an encryption key that is exactly 32 bytes long.

spec.externalAccess.type: string

This setting specifies the type of Service that will be created to provide access to the ArangoDB deployment from outside the Kubernetes cluster.
Possible values are: 340 Deployment Resource Reference None To limit access to application running inside the Kubernetes cluster. LoadBalancer Auto To create a To create a NodePort (default) To create a LoadBalancer Service Service of type of type Service LoadBalancer NodePort of type for the ArangoDB deployment. for the ArangoDB deployment. LoadBalancer and fallback to a Service or type NodePort when the is not assigned an IP address. spec.externalAccess.loadBalancerIP: string This setting specifies the IP used to for the LoadBalancer to expose the ArangoDB deployment on. This setting is used when spec.externalAccess.type is set to LoadBalancer or Auto . If you do not specify this setting, an IP will be chosen automatically by the load-balancer provisioner. spec.externalAccess.nodePort: int This setting specifies the port used to expose the ArangoDB deployment on. This setting is used when to NodePort or Auto spec.externalAccess.type is set . If you do not specify this setting, a random port will be chosen automatically. spec.auth.jwtSecretName: string This setting specifies the name of a kubernetes name is specified, it defaults to Secret -jwt If you specify a name of a Secret If you specify a name of a Secret that contains the JWT token used for accessing all ArangoDB servers. When no . To disable authentication, set this value to , that secret must have the token in a data field named token that does not exist, a random token is created and stored in a None . . Secret with given name. Changing a JWT token results in stopping the entire cluster and restarting it. spec.tls.caSecretName: string This setting specifies the name of a kubernetes Secret that contains a standard CA certificate + private key used to sign certificates for individual ArangoDB servers. When no name is specified, it defaults to to None -ca . To disable authentication, set this value . If you specify a name of a Secret that does not exist, a self-signed CA certificate + key is created and stored in a Secret with given name. The specified Secret , must contain the following data fields: ca.crt PEM encoded public key of the CA certificate ca.key PEM encoded private key of the CA certificate spec.tls.altNames: []string This setting specifies a list of alternate names that will be added to all generated certificates. These names can be DNS names or email addresses. The default value is empty. spec.tls.ttl: duration This setting specifies the time to live of all generated server certificates. The default value is 2160h (about 3 month). When the server certificate is about to expire, it will be automatically replaced by a new one and the affected server will be restarted. Note: The time to live of the CA certificate (when created automatically) will be set to 10 years. spec.sync.enabled: bool This setting enables/disables support for data center 2 data center replication in the cluster. When enabled, the cluster will contain a number of syncmaster & syncworker servers. The default value is false . 341 Deployment Resource Reference spec.sync.externalAccess.type: string This setting specifies the type of that will be created to provide access to the ArangoSync syncM asters from outside the Service Kubernetes cluster. Possible values are: None To limit access to applications running inside the Kubernetes cluster. LoadBalancer NodePort Auto To create a To create a Service (default) To create a LoadBalancer of type Service of type Service LoadBalancer NodePort of type for the ArangoSync SyncM asters. 
for the ArangoSync SyncM asters. LoadBalancer and fallback to a Service or type NodePort when the is not assigned an IP address. Note that when you specify a value of None ,a Service will still be created, but of type . ClusterIP spec.sync.externalAccess.loadBalancerIP: string This setting specifies the IP used for the LoadBalancer to expose the ArangoSync SyncM asters on. This setting is used when spec.sync.externalAccess.type is set to LoadBalancer or Auto . If you do not specify this setting, an IP will be chosen automatically by the load-balancer provisioner. spec.sync.externalAccess.nodePort: int This setting specifies the port used to expose the ArangoSync SyncM asters on. This setting is used when spec.sync.externalAccess.type is set to NodePort or Auto . If you do not specify this setting, a random port will be chosen automatically. spec.sync.externalAccess.masterEndpoint: []string This setting specifies the master endpoint(s) advertised by the ArangoSync SyncM asters. If not set, this setting defaults to: If spec.sync.externalAccess.loadBalancerIP Otherwise it defaults to is set, it defaults to https:// :<8629> https:// :<8629> . . spec.sync.externalAccess.accessPackageSecretNames: []string This setting specifies the names of zero of more An access package contains those Secrets Secrets By removing a name from this setting, the corresponding empty array in place ( See the [] that will be created by the deployment operator containing "access packages". that are needed to access the SyncM asters of this Secret ArangoDeployment . is also deleted. Note that to remove all access packages, leave an ). Completely removing the setting results in not modifying the list. ArangoDeploymentReplication specification for more information on access packages. spec.sync.auth.jwtSecretName: string This setting specifies the name of a kubernetes When not specified, the Secret spec.auth.jwtSecretName If you specify a name of a Secret that contains the JWT token used for accessing all ArangoSync master servers. value is used. that does not exist, a random token is created and stored in a Secret with given name. spec.sync.auth.clientCASecretName: string This setting specifies the name of a kubernetes Secret that contains a PEM encoded CA certificate used for client certificate verification in all ArangoSync master servers. This is a required setting when spec.sync.enabled is true . The default value is empty. spec.sync.mq.type: string This setting sets the type of message queue used by ArangoSync. Possible values are: Direct (default) for direct HTTP connections between the 2 data centers. 342 Deployment Resource Reference spec.sync.tls.caSecretName: string This setting specifies the name of a kubernetes Secret that contains a standard CA certificate + private key used to sign certificates for individual ArangoSync master servers. When no name is specified, it defaults to If you specify a name of a Secret -sync-ca . that does not exist, a self-signed CA certificate + key is created and stored in a Secret with given name. The specified , must contain the following data fields: Secret ca.crt PEM encoded public key of the CA certificate ca.key PEM encoded private key of the CA certificate spec.sync.tls.altNames: []string This setting specifies a list of alternate names that will be added to all generated certificates. These names can be DNS names or email addresses. The default value is empty. 
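If you prefer to provide the sync CA yourself rather than letting the operator generate one, a possible sketch is to create it with the arangosync tooling described earlier and wrap it in a Secret with the documented ca.crt and ca.key data fields; the file and secret names below are hypothetical:

# Create a CA for the ArangoSync TLS certificates (see the ArangoSync certificate commands earlier).
arangosync create tls ca --cert=sync-ca.crt --key=sync-ca.key

# Store it as a Kubernetes Secret and reference it from spec.sync.tls.caSecretName.
kubectl create secret generic my-sync-ca \
    --from-file=ca.crt=sync-ca.crt \
    --from-file=ca.key=sync-ca.key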
spec.sync.monitoring.tokenSecretName: string

This setting specifies the name of a kubernetes Secret that contains the bearer token used for accessing all monitoring endpoints of all ArangoSync servers. When not specified, no monitoring token is used. The default value is empty.

spec.ipv6.forbidden: bool

This setting prevents the use of IPv6 addresses by ArangoDB servers. The default is false.

spec.<group>.count: number

This setting specifies the number of servers to start for the given group. For the agent group, this value must be a positive, odd number. The default value is 3 for all groups except single (there the default is 1 for spec.mode: Single and 2 for spec.mode: ActiveFailover).

For the syncworkers group, it is highly recommended to use the same number as for the dbservers group.

spec.<group>.args: [string]

This setting specifies additional commandline arguments passed to all servers of this group. The default value is an empty array.

spec.<group>.resources.requests.cpu: cpuUnit

This setting specifies the amount of CPU requested by each server of this group. See https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ for details.

spec.<group>.resources.requests.memory: memoryUnit

This setting specifies the amount of memory requested by each server of this group. See https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ for details.

spec.<group>.resources.requests.storage: storageUnit

This setting specifies the amount of storage required for each server of this group. The default value is 8Gi.

This setting is not available for the groups coordinators, syncmasters & syncworkers because servers in these groups do not need persistent storage.

spec.<group>.serviceAccountName: string

This setting specifies the serviceAccountName for the Pods created for each server of this group.

Using an alternative ServiceAccount is typically used to separate access rights. The ArangoDB deployments do not require any special rights.

spec.<group>.storageClassName: string

This setting specifies the storageClass for the PersistentVolumes created for each server of this group.

This setting is not available for the groups coordinators, syncmasters & syncworkers because servers in these groups do not need persistent storage.

spec.<group>.tolerations: [Toleration]

This setting specifies the tolerations for the Pods created for each server of this group.

By default, suitable tolerations are set for the following keys with the NoExecute effect:

- node.kubernetes.io/not-ready
- node.kubernetes.io/unreachable
- node.alpha.kubernetes.io/unreachable (will be removed in future version)

For more information on tolerations, consult the Kubernetes documentation.
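If you do want to add your own tolerations, for example to let dbservers run on tainted nodes dedicated to ArangoDB, a sketch could look like this; the taint key and value are hypothetical and depend entirely on how your nodes are tainted:

spec:
  dbservers:
    tolerations:
      - key: "dedicated"        # hypothetical taint key used on your nodes
        operator: "Equal"
        value: "arangodb"
        effect: "NoSchedule"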
Configuring your driver for ArangoDB access

In this chapter you'll learn how to configure a driver for accessing an ArangoDB deployment in Kubernetes. The exact methods to configure a driver are specific to that driver.

Database endpoint(s)

The endpoint(s) (or URLs) to communicate with is the most important parameter you need to configure in your driver. Finding the right endpoints depends on whether your client application is running in the same Kubernetes cluster as the ArangoDB deployment or not.

Client application in same Kubernetes cluster

If your client application is running in the same Kubernetes cluster as the ArangoDB deployment, you should configure your driver to use the following endpoint:

https://<deployment-name>.<namespace>.svc:8529

Only if your deployment has set spec.tls.caSecretName to None, should you use http instead of https.

Client application outside Kubernetes cluster

If your client application is running outside the Kubernetes cluster in which the ArangoDB deployment is running, your driver endpoint depends on the external-access configuration of your ArangoDB deployment.

If the external-access of the ArangoDB deployment is of type LoadBalancer, then use the IP address of that LoadBalancer like this:

https://<load-balancer-ip>:8529

If the external-access of the ArangoDB deployment is of type NodePort, then use the IP address(es) of the Nodes of the Kubernetes cluster, combined with the NodePort that is used by the external-access service. For example:

https://<node-ip>:30123

You can find the type of external-access by inspecting the external-access Service. To do so, run the following command:

kubectl get service -n <namespace-of-your-deployment> <deployment-name>-ea

The output looks like this:

NAME                        TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)          AGE   SELECTOR
example-simple-cluster-ea   LoadBalancer   10.106.175.38   192.168.10.208   8529:31890/TCP   1s    app=arangodb,arango_deployment=example-simple-cluster,role=coordinator

In this case the external-access is of type LoadBalancer with a load-balancer IP address of 192.168.10.208. This results in an endpoint of https://192.168.10.208:8529.

TLS settings

As mentioned before, the ArangoDB deployment managed by the ArangoDB operator will use a secure (TLS) connection unless you set spec.tls.caSecretName to None in your ArangoDeployment.

When using a secure connection, you can choose to verify the server certificates provided by the ArangoDB servers or not.

If you want to verify these certificates, configure your driver with the CA certificate found in a Kubernetes Secret found in the same namespace as the ArangoDeployment.

The name of this Secret is stored in the spec.tls.caSecretName setting of the ArangoDeployment. If you don't set this setting explicitly, it will be set automatically.

Then fetch the CA secret using the following command (or use a Kubernetes client library to fetch it):

kubectl get secret <secret-name> -n <namespace> --template='{{index .data "ca.crt"}}' | base64 -D > ca.crt

This results in a file called ca.crt containing a PEM encoded, x509 CA certificate.

Query requests

For most client requests made by a driver, it does not matter if there is any kind of load-balancer between your client application and the ArangoDB deployment.

Note that even a simple Service of type ClusterIP already behaves as a load-balancer.

The exception to this is cursor-related requests made to an ArangoDB Cluster deployment. The coordinator that handles an initial query request (that results in a Cursor) will save some in-memory state in that coordinator, if the result of the query is too big to be transferred back in the response of the initial request.

Follow-up requests have to be made to fetch the remaining data. These follow-up requests must be handled by the same coordinator to which the initial request was made. As soon as there is a load-balancer between your client application and the ArangoDB cluster, it is uncertain which coordinator will actually handle the follow-up request.

To resolve this uncertainty, make sure to run your client application in the same Kubernetes cluster and synchronize your endpoints before making the initial query request. This will result in the use (by the driver) of internal DNS names of all coordinators. A follow-up request can then be sent to exactly the same coordinator.
If your client application is running outside the Kubernetes cluster this is much harder to solve. The easiest way to work around it, is by making sure that the query results are small enough. When that is not feasible, it is also possible to resolve this when the internal DNS names of your Kubernetes cluster are exposed to your client application and the resuling IP addresses are routeable from your client application. To expose internal DNS names of your Kubernetes cluster, your can use CoreDNS. 346 Authentication Authentication The ArangoDB Kubernetes Operator will by default create ArangoDB deployments that require authentication to access the database. It uses a single JWT secret (stored in a Kubernetes secret) to provide super-user access between all servers of the deployment as well as access from the ArangoDB Operator to the deployment. To disable authentication, set spec.auth.jwtSecretName to None . Initially the deployment is accessible through the web user-interface and API's, using the user root with an empty password. M ake sure to change this password immediately after starting the deployment! See also Secure connections (TLS) 347 Scaling Scaling The ArangoDB Kubernetes Operator supports up and down scaling of the number of dbservers & coordinators. Currently it is not possible to change the number of agents of a cluster. The scale up or down, change the number of servers in the custom resource. E.g. change spec.dbservers.count from 3 to 4 . Then apply the updated resource using: kubectl apply -f yourCustomResourceFile.yaml Inspect the status of the custom resource to monitor the progress of the scaling operation. 348 Upgrading Upgrading The ArangoDB Kubernetes Operator supports upgrading an ArangoDB from one version to the next. Upgrade an ArangoDB deployment To upgrade a cluster, change the version by changing the spec.image setting and the apply the updated custom resource using: kubectl apply -f yourCustomResourceFile.yaml The ArangoDB operator will perform an sequential upgrade of all servers in your deployment. Only one server is upgraded at a time. For patch level upgrades (e.g. 3.3.9 to 3.3.10) each server is stopped and restarted with the new version. For minor level upgrades (e.g. 3.3.9 to 3.4.0) each server is stopped, then the new version is started with --database.auto-upgrade and once that is finish the new version is started with the normal arguments. The process for major level upgrades depends on the specific version. Upgrade the operator itself To update the ArangoDB Kubernetes Operator itself to a new version, update the image version of the deployment resource and apply it using: kubectl apply -f examples/yourUpdatedDeployment.yaml See also Scaling 349 ArangoDB Configuration & Secrets Configuration & secrets An ArangoDB cluster has lots of configuration options. Some will be supported directly in the ArangoDB Operator, others will have to specified separately. Built-in options All built-in options are passed to ArangoDB servers via commandline arguments configured in the Pod-spec. Other configuration options All commandline options of arangod (and arangosync ) are available by adding options to the spec. .args list of a group of servers. These arguments are added to th commandline created for these servers. Secrets The ArangoDB cluster needs several secrets such as JWT tokens TLS certificates and so on. All these secrets are stored as Kubernetes Secrets and passed to the applicable Pods as files, mapped into the Pods filesystem. 
The name of the secret is specified in the custom resource. For example:

apiVersion: "cluster.arangodb.com/v1alpha"
kind: "Cluster"
metadata:
  name: "example-arangodb-cluster"
spec:
  mode: Cluster
  auth:
    jwtSecretName: <name-of-JWT-token-secret>

Metrics

The ArangoDB Kubernetes Operator (kube-arangodb) exposes metrics of its operations in a format that is compatible with Prometheus.

The metrics are exposed through HTTPS on port 8528 under path /metrics.

Look at examples/metrics for examples of Services and ServiceMonitors you can use to integrate with Prometheus through the Prometheus-Operator by CoreOS.

Services and load balancer

The ArangoDB Kubernetes Operator will create services that can be used to reach the ArangoDB servers from inside the Kubernetes cluster. By default, the ArangoDB Kubernetes Operator will also create an additional service to reach the ArangoDB deployment from outside the Kubernetes cluster.

For exposing the ArangoDB deployment to the outside, there are 2 options:

- Using a NodePort service. This will expose the deployment on a specific port (above 30000) on all nodes of the Kubernetes cluster.
- Using a LoadBalancer service. This will expose the deployment on a load-balancer that is provisioned by the Kubernetes cluster.

The LoadBalancer option is the most convenient, but not all Kubernetes clusters are able to provision a load-balancer. Therefore we offer a third (and default) option: Auto. In this option, the ArangoDB Kubernetes Operator tries to create a LoadBalancer service. It then waits for up to a minute for the Kubernetes cluster to provision a load-balancer for it. If that has not happened after a minute, the service is replaced by a service of type NodePort.

To inspect the created service, run:

kubectl get services <deployment-name>-ea

To use the ArangoDB servers from outside the Kubernetes cluster you have to add another service as explained below.

Services

If you do not want the ArangoDB Kubernetes Operator to create an external-access service for you, set spec.externalAccess.type to None. If you want to create external access services manually, follow the instructions below.

Single server

For a single server deployment, the operator creates a single Service named <deployment-name>. This service has a normal cluster IP address.

Full cluster

For a full cluster deployment, the operator creates two Services:

- <deployment-name>-int: a headless Service intended to provide DNS names for all pods created by the operator. It selects all ArangoDB & ArangoSync servers in the cluster.
- <deployment-name>: a normal Service that selects only the coordinators of the cluster. This Service is configured with ClientIP session affinity. This is needed for cursor requests, since they are bound to a specific coordinator.

When the coordinators are asked to provide endpoints of the cluster (e.g. when calling client.SynchronizeEndpoints() in the go driver), the DNS names of the individual Pods will be returned (<pod-name>.<deployment-name>-int.<namespace>.svc).

Full cluster with DC2DC

For a full cluster with datacenter replication deployment, the same Services are created as for a Full cluster, with the following additions:

- <deployment-name>-sync: a normal Service that selects only the syncmasters of the cluster.

Load balancer
This service should select: arango_deployment: role: coordinator The following example yields a service of type cluster can now be reached on LoadBalancer https://1.2.3.4:8529 with a specific load balancer IP address. With this service, the ArangoDB . kind: Service apiVersion: v1 metadata: name: arangodb-cluster-exposed spec: selector: arango_deployment: arangodb-cluster role: coordinator type: LoadBalancer loadBalancerIP: 1.2.3.4 ports: - protocol: TCP port: 8529 targetPort: 8529 The following example yields a service of type NodePort with the ArangoDB cluster exposed on port 30529 of all nodes of the Kubernetes cluster. kind: Service apiVersion: v1 metadata: name: arangodb-cluster-exposed spec: selector: arango_deployment: arangodb-cluster role: coordinator type: NodePort ports: - protocol: TCP port: 8529 targetPort: 8529 nodePort: 30529 353 Deployment Replication Resource Reference ArangoDeploymentReplication Custom Resource The ArangoDB Replication Operator creates and maintains ArangoDB replication specification. This replication specification is a arangosync CustomResource configurations in a Kubernetes cluster, given a following a CustomResourceDefinition created by the operator. Example minimal replication definition for 2 ArangoDB cluster with sync in the same Kubernetes cluster: apiVersion: "replication.database.arangodb.com/v1alpha" kind: "ArangoDeploymentReplication" metadata: name: "replication-from-a-to-b" spec: source: deploymentName: cluster-a auth: keyfileSecretName: cluster-a-sync-auth destination: deploymentName: cluster-b This definition results in: the arangosync a cluster-a in deployment SyncMaster to the syncmasters in cluster-b , the JWT secret found in the deployment of deployment of cluster-b is called to configure a synchronization from the syncmasters in , using the client authentication certificate stored in cluster-b cluster-a is used. To access Secret cluster-b cluster-a-sync-auth cluster- . To access , the JWT secret found in the is used. Example replication definition for replicating from a source that is outside the current Kubernetes cluster to a destination that is in the same Kubernetes cluster: apiVersion: "replication.database.arangodb.com/v1alpha" kind: "ArangoDeploymentReplication" metadata: name: "replication-from-a-to-b" spec: source: masterEndpoint: ["https://163.172.149.229:31888", "https://51.15.225.110:31888", "https://51.15.229.133:31888"] auth: keyfileSecretName: cluster-a-sync-auth tls: caSecretName: cluster-a-sync-ca destination: deploymentName: cluster-b This definition results in: the arangosync SyncMaster in deployment cluster-b given list of endpoint URL's to the syncmasters a-sync-auth . To access cluster-a is called to configure a synchronization from the syncmasters located at the cluster-b , using the client authentication certificate stored in , the keyfile (containing a client authentication certificate) is used. To access JWT secret found in the deployment of cluster-b Secret cluster- cluster-b , the is used. Specification reference Below you'll find all settings of the ArangoDeploymentReplication custom resource. spec.source.deploymentName: string This setting specifies the name of an ArangoDeployment resource that runs a cluster with sync enabled. This cluster configured as the replication source. 354 Deployment Replication Resource Reference spec.source.masterEndpoint: []string This setting specifies zero or more master endpoint URL's of the source cluster. 
Use this setting if the source cluster is not running inside a Kubernetes cluster that is reachable from the Kubernetes cluster the ArangoDeploymentReplication Specifying this setting and resource is deployed in. at the same time is not allowed. spec.source.deploymentName spec.source.auth.keyfileSecretName: string This setting specifies the name of a Secret containing a client authentication certificate called tls.keyfile used to authenticate with the SyncM aster at the specified source. If spec.source.auth.userSecretName has not been set, the client authentication certificate found in the secret with this name is also used to configure the synchronization and fetch the synchronization status. This setting is required. spec.source.auth.userSecretName: string This setting specifies the name of a Secret containing a & username password used to authenticate with the SyncM aster at the specified source in order to configure synchronization and fetch synchronization status. The user identified by the username must have write access in the _system database of the source ArangoDB cluster. spec.source.tls.caSecretName: string This setting specifies the name of a Secret containing a TLS CA certificate ca.crt used to verify the TLS connection created by the SyncM aster at the specified source. This setting is required, unless spec.source.deploymentName has been set. spec.destination.deploymentName: string This setting specifies the name of an ArangoDeployment resource that runs a cluster with sync enabled. This cluster configured as the replication destination. spec.destination.masterEndpoint: []string This setting specifies zero or more master endpoint URL's of the destination cluster. Use this setting if the destination cluster is not running inside a Kubernetes cluster that is reachable from the Kubernetes cluster the ArangoDeploymentReplication Specifying this setting and resource is deployed in. spec.destination.deploymentName at the same time is not allowed. spec.destination.auth.keyfileSecretName: string This setting specifies the name of a Secret containing a client authentication certificate called tls.keyfile used to authenticate with the SyncM aster at the specified destination. If spec.destination.auth.userSecretName has not been set, the client authentication certificate found in the secret with this name is also used to configure the synchronization and fetch the synchronization status. This setting is required, unless Specifying this setting and spec.destination.deploymentName spec.destination.userSecretName or spec.destination.auth.userSecretName has been set. at the same time is not allowed. spec.destination.auth.userSecretName: string 355 Deployment Replication Resource Reference This setting specifies the name of a Secret containing a username & password used to authenticate with the SyncM aster at the specified destination in order to configure synchronization and fetch synchronization status. The user identified by the username must have write access in the Specifying this setting and database of the destination ArangoDB cluster. _system at the same time is not allowed. spec.destination.keyfileSecretName spec.destination.tls.caSecretName: string This setting specifies the name of a Secret containing a TLS CA certificate ca.crt used to verify the TLS connection created by the SyncM aster at the specified destination. This setting is required, unless has been set. 
spec.destination.deploymentName Authentication details The authentication settings in a resource are used for two distinct purposes. ArangoDeploymentReplication The first use is the authentication of the syncmasters at the destination with the syncmasters at the source. This is always done using a client authentication certificate which is found in a field in a secret identified by tls.keyfile spec.source.auth.keyfileSecretName . The second use is the authentication of the ArangoDB Replication operator with the syncmasters at the source or destination. These connections are made to configure synchronization, stop configuration and fetch the status of the configuration. The method used for this authentication is derived as follows (where If spec.X.userSecretName If spec.X.keyfileSecretName X is either or source ): destination is set, the username + password found in the Secret identified by this name is used. is set, the client authentication certificate (keyfile) found in the Secret identifier by this name is used. If spec.X.deploymentName is set, the JWT secret found in the deployment is used. Creating client authentication certificate keyfiles The client authentication certificates needed for the spec.destination.auth.keyfileSecretName keyfile Secrets identified by spec.source.auth.keyfileSecretName are normal ArangoDB keyfiles that can be created by the & arangosync create client-auth command. In order to do so, you must have access to the client authentication CA of the source/destination. If the client authentication CA at the source/destination also contains a private key ( used to create such a keyfile for you, without the need to have arangosync ca.key ), the ArangoDeployment operator can be installed locally. Read the following paragraphs for instructions on how to do that. Creating and using access packages An access package is a YAM L file that contains: A client authentication certificate, wrapped in a Secret A TLS certificate authority public key, wrapped in a in a Secret tls.keyfile data field. in a data field. ca.crt The format of the access package is such that it can be inserted into a Kubernetes cluster using the standard To create an access package that can be used to authenticate with the ArangoDB SyncM asters of an non-existing Secret Secret to the spec.sync.externalAccess.accessPackageSecretNames field of the is created in that Kubernetes cluster, with the given name, that contains a kubectl tool. ArangoDeployment ArangoDeployment accessPackage.yaml , add a name of a . In response, a data field that contains a Kubernetes resource specification that can be inserted into the other Kubernetes cluster. The process for creating and using an access package for authentication at the source cluster is as follows: Edit the ArangoDeployment resource of the source cluster, set spec.sync.externalAccess.accessPackageSecretNames to ["my- access-package"] Wait for the ArangoDeployment operator to create a Secret named my-access-package . Extract the access package from the Kubernetes source cluster using: 356 Deployment Replication Resource Reference kubectl get secret my-access-package --template='{{index .data "accessPackage.yaml"}}' | base64 -D > accessPackage.yaml Insert the secrets found in the access package in the Kubernetes destination cluster using: kubectl apply -f accessPackage.yaml As a result, the destination Kubernetes cluster will have 2 additional Secrets . One contains a client authentication certificate formatted as a keyfile. 
Another contains the public key of the TLS CA certificate of the source cluster. 357 Storage Storage An ArangoDB cluster relies heavily on fast persistent storage. The ArangoDB Kubernetes Operator uses PersistentVolumeClaims to deliver the storage to Pods that need them. Storage configuration In the resource, one can specify the type of storage used by groups of servers using the ArangoDeployment This is an example of a spec. setting. .storageClassName Cluster deployment that stores its agent & dbserver data on PersistentVolumes that use the my-local-ssd StorageClass apiVersion: "database.arangodb.com/v1alpha" kind: "ArangoDeployment" metadata: name: "cluster-using-local-ssh" spec: mode: Cluster agents: storageClassName: my-local-ssd dbservers: storageClassName: my-local-ssd The amount of storage needed is configured using the spec. .resources.requests.storage setting. Note that configuring storage is done per group of servers. It is not possible to configure storage per individual server. This is an example of a Cluster deployment that requests volumes of 80GB for every dbserver, resulting in a total storage capacity of 240GB (with 3 dbservers). apiVersion: "database.arangodb.com/v1alpha" kind: "ArangoDeployment" metadata: name: "cluster-using-local-ssh" spec: mode: Cluster dbservers: resources: requests: storage: 80Gi Local storage For optimal performance, ArangoDB should be configured with locally attached SSD storage. The easiest way to accomplish this is to deploy an PersistentVolumes ArangoLocalStorage resource. The ArangoDB Storage Operator will use it to provide for you. This is an example of an ArangoLocalStorage cluster under the directory resource that will result in /mnt/big-ssd-disk PersistentVolumes created on any node of the Kubernetes . apiVersion: "storage.arangodb.com/v1alpha" kind: "ArangoLocalStorage" metadata: name: "example-arangodb-storage" spec: storageClass: name: my-local-ssd localPath: - /mnt/big-ssd-disk 358 Storage Note that using local storage required VolumeScheduling by default, on version 1.9 you have to enable it with a to be enabled in your Kubernetes cluster. ON Kubernetes 1.10 this is enabled --feature-gate setting. Manually creating PersistentVolumes The alternative is to create a PersistentVolumes manually, for all servers that need persistent storage (single, agents & dbservers). E.g. for with 3 agents and 5 dbservers, you must create 8 volumes. Cluster Note that each volume must have a capacity that is equal to or higher than the capacity needed for each server. To select the correct node, add a required node-affinity annotation as shown in the example below. apiVersion: v1 kind: PersistentVolume metadata: name: volume-agent-1 annotations: "volume.alpha.kubernetes.io/node-affinity": '{ "requiredDuringSchedulingIgnoredDuringExecution": { "nodeSelectorTerms": [ { "matchExpressions": [ { "key": "kubernetes.io/hostname", "operator": "In", "values": ["node-1"] } ]} ]} }' spec: capacity: storage: 100Gi accessModes: - ReadWriteOnce persistentVolumeReclaimPolicy: Delete storageClassName: local-ssd local: path: /mnt/disks/ssd1 For Kubernetes 1.9 and up, you should create a StorageClass which is configured to bind volumes on their first use as shown in the example below. This ensures that the Kubernetes scheduler takes all constraints on a Pod that into consideration before binding the volume to a claim. 
Storage Resource

ArangoLocalStorage Custom Resource

The ArangoDB Storage Operator creates and maintains ArangoDB storage resources in a Kubernetes cluster, given a storage specification. This storage specification is a CustomResource following a CustomResourceDefinition created by the operator.

Example minimal storage definition:

apiVersion: "storage.arangodb.com/v1alpha"
kind: "ArangoLocalStorage"
metadata:
  name: "example-arangodb-storage"
spec:
  storageClass:
    name: my-local-ssd
  localPath:
    - /mnt/big-ssd-disk

This definition results in:

a StorageClass called my-local-ssd
the dynamic provisioning of PersistentVolume's with a local volume on a node, where the local volume starts in a sub-directory of /mnt/big-ssd-disk
the dynamic cleanup of PersistentVolume's (created by the operator) after one is released

The provisioned volumes will have a capacity that matches the requested capacity of volume claims.

Specification reference

Below you'll find all settings of the ArangoLocalStorage custom resource.

spec.storageClass.name: string

This setting specifies the name of the storage class that created PersistentVolumes will use. If empty, this field defaults to the name of the ArangoLocalStorage object. If a StorageClass with the given name does not yet exist, it will be created.

spec.storageClass.isDefault: bool

This setting specifies if the created StorageClass will be marked as default storage class. (default is false)

spec.localPath: stringList

This setting specifies one or more local directories (on the nodes) used to create persistent volumes in.

spec.nodeSelector: nodeSelector

This setting specifies which nodes the operator will provision persistent volumes on.

TLS

Secure connections (TLS)

The ArangoDB Kubernetes Operator will by default create ArangoDB deployments that use secure TLS connections. It uses a single CA certificate (stored in a Kubernetes secret) and one certificate per ArangoDB server (stored in a Kubernetes secret per server).

To disable TLS, set spec.tls.caSecretName to None (a minimal sketch appears at the end of this chapter).

Install CA certificate

If the CA certificate is self-signed, it will not be trusted by browsers until you install it in the local operating system or browser. This process differs per operating system.

To do so, you first have to fetch the CA certificate from its Kubernetes secret:

kubectl get secret <deployment-name>-ca --template='{{index .data "ca.crt"}}' | base64 -D > ca.crt

Windows

To install a CA certificate in Windows, follow the procedure described here.

MacOS

To install a CA certificate in MacOS, run:

sudo /usr/bin/security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain ca.crt

To uninstall a CA certificate in MacOS, run:

sudo /usr/bin/security remove-trusted-cert -d ca.crt

Linux

To install a CA certificate in Linux, on Ubuntu, run:

sudo cp ca.crt /usr/local/share/ca-certificates/<name>.crt
sudo update-ca-certificates

See also Authentication
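As a small illustration of the spec.tls.caSecretName setting mentioned above, the sketch below disables TLS for a deployment. The deployment name and mode are only examples:

apiVersion: "database.arangodb.com/v1alpha"
kind: "ArangoDeployment"
metadata:
  name: "example-without-tls"
spec:
  mode: Cluster
  tls:
    # Setting the CA secret name to None disables TLS for this deployment.
    caSecretName: None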
Troubleshooting

While Kubernetes and the ArangoDB Kubernetes operator will automatically resolve a lot of issues, there are always cases where human attention is needed. This chapter gives you tips & tricks to help you troubleshoot deployments.

Where to look

In Kubernetes all resources can be inspected using kubectl, using either the get or describe command.

To get all details of a resource (both specification & status), run the following command:

kubectl get <resource-type> <resource-name> -n <namespace> -o yaml

For example, to get the entire specification and status of an ArangoDeployment resource named my-arango in the default namespace, run:

kubectl get ArangoDeployment my-arango -n default -o yaml
# or shorter
kubectl get arango my-arango -o yaml

Several types of resources (including all ArangoDB custom resources) support events. These events show what happened to the resource over time.

To show the events (and most important resource data) of a resource, run the following command:

kubectl describe <resource-type> <resource-name> -n <namespace>

Getting logs

Another invaluable source of information is the log of containers being run in Kubernetes. These logs are accessible through the Pods that group these containers.

To fetch the logs of the default container running in a Pod, run:

kubectl logs <pod-name> -n <namespace>
# or with follow option to keep inspecting logs while they are written
kubectl logs <pod-name> -n <namespace> -f

To inspect the logs of a specific container in a Pod, add -c <container-name>. You can find the names of the containers in the Pod using kubectl describe pod ....

Note that the ArangoDB operators are themselves deployed as a Kubernetes Deployment with 2 replicas. This means that you will have to fetch the logs of the 2 Pods running those replicas.

What if ...

The Pods of a deployment stay in Pending state

There are two common causes for this.

1) The Pods cannot be scheduled because there are not enough nodes available. This is usually only the case with a spec.environment setting that has a value of Production.

Solution: Add more nodes.

2) There are no PersistentVolumes available to be bound to the PersistentVolumeClaims created by the operator.

Solution: Use kubectl get persistentvolumes to inspect the available PersistentVolumes and, if needed, use the ArangoLocalStorage operator to provision PersistentVolumes.

When restarting a Node, the Pods scheduled on that node remain in Terminating state

When a Node no longer makes regular calls to the Kubernetes API server, it is marked as not available. Depending on specific settings of your Pods, Kubernetes will at some point decide to terminate the Pod. As long as the Node is not completely removed from the Kubernetes API server, Kubernetes will try to use the Node itself to terminate the Pod.

The ArangoDeployment operator recognizes this condition and will try to replace those Pods with Pods on different nodes. The exact behavior differs per type of server.

What happens when a Node with local data is broken

When a Node with PersistentVolumes hosted on that Node is broken and cannot be repaired, the data in those PersistentVolumes is lost.

If an ArangoDeployment of type Single was using one of those PersistentVolumes, the database is lost and must be restored from a backup.

If an ArangoDeployment of type ActiveFailover or Cluster was using one of those PersistentVolumes, it depends on the type of server that was using the volume:

If an Agent was using the volume, it can be repaired as long as 2 other agents are still healthy.
If a DBServer was using the volume, and the replication factor of all database collections is 2 or higher, and the remaining dbservers are still healthy, the cluster will duplicate the remaining replicas to bring the number of replicas back to the original number.
If a DBServer was using the volume, and the replication factor of a database collection is 1 and happens to be stored on that dbserver, the data is lost.
If a single server of an ActiveFailover deployment was using the volume, and the other single server is still healthy, the other single server will become leader. After replacing the failed single server, the new follower will synchronize with the leader.
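When investigating volume-related problems like the ones above, it usually helps to look at the claims and volumes directly. A short sketch of standard kubectl commands, assuming the deployment runs in the default namespace:

# List the PersistentVolumeClaims created by the operator and their binding status.
kubectl get persistentvolumeclaims -n default
# List all PersistentVolumes and see which are Bound, Available or Released.
kubectl get persistentvolumes
# Show details (node affinity, capacity, reclaim policy) of one volume.
kubectl describe persistentvolume <volume-name>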
Backup and Restore

Backup and restore can be done via the tools arangodump and arangorestore.

Performing frequent backups is important and a recommended best practice that can allow you to recover your data in case unexpected problems occur. Hardware failures, system crashes, or users mistakenly deleting data can always happen. Furthermore, while a big effort is put into the development and testing of ArangoDB (in all its deployment modes), ArangoDB, as any other software product, might include bugs or errors and data loss could occur. It is therefore important to regularly backup your data to be able to recover and get up and running again in case of serious problems.

Creating backups of your data before an ArangoDB upgrade is also a best practice.

Making use of a high availability deployment mode of ArangoDB, like Active Failover, Cluster or data-center to data-center replication, does not remove the need of taking frequent backups, which are recommended also when using such deployment modes.

Administration

Most administration can be managed using the arangosh.

Filesystems

As one would expect for a database, we recommend a locally mounted filesystem. NFS or similar network filesystems will not work.

On Linux we recommend the use of ext4fs, on Windows NTFS and on MacOS HFS+.

We recommend to not use BTRFS on Linux. It is known to not work well in conjunction with ArangoDB. We experienced that ArangoDB faces latency issues on accessing its database files on BTRFS partitions. In conjunction with BTRFS and AUFS we also saw data loss on restart.

Web Interface

ArangoDB comes with a built-in web interface for administration. The interface differs for standalone instances and cluster setups.

Standalone:

Cluster:

Dashboard

The Dashboard tab provides statistics which are polled regularly from the ArangoDB server.

Requests Statistics:
Requests per second
Request types
Number of client connections
Transfer size
Transfer size (distribution)
Average request time
Average request time (distribution)

System Resources:
Number of threads
Memory
Virtual size
Major page faults
Used CPU time

Replication:
Replication state
Totals
Ticks
Progress

Cluster

The cluster section displays statistics about the general cluster performance.

Statistics:
Available and missing coordinators
Available and missing database servers
Memory usage (percent)
Current connections
Data (bytes)
HTTP (bytes)
Average request time (seconds)

Nodes

Overview

The overview shows available and missing coordinators and database servers.

Functions:
Coordinator Dashboard: Clicking on a Coordinator will open a statistics dashboard.

Information (Coordinator / Database servers):
Name
Endpoint
Last Heartbeat
Status
Health

Shards

The shard section displays all available sharded collections.

Functions:
Move Shard Leader: Clicking on the leader database server of a shard will open a move shard dialog. Shards can be transferred to all available database servers, except the leading database server or an available follower.
Move Shard Follower: Clicking on a follower database server of a shard will open a move shard dialog.
Shards can be transferred to all available databas servers, except the leading database server or an available follower. Rebalance Shards: A new database server will not have any shards. With the rebalance functionality the cluster will start to rebalance shards including empty database servers. Information (collection): Shard Leader (green state: sync is complete) Followers 371 Collections Collections The collections section displays all available collections. From here you can create new collections and jump into a collection for details (click on a collection tile). Functions: A: Toggle filter properties B: Search collection by name D: Create collection C: Filter properties H: Show collection details (click tile) Information: E: Collection type F: Collection state(unloaded, loaded, ...) G: Collection name Collection 372 Collections There are four view categories: 1. Content: Create a document Delete a document Filter documents Download documents Upload documents 2. Indices: Create indices Delete indices 3. Info: Detailed collection information and statistics 4. Settings: Configure name, journal size, index buckets, wait for sync Delete collection Truncate collection Unload/Load collection Save modifed properties (name, journal size, index buckets, wait for sync) Additional information: Upload format: I. Line-wise { "_key": "key1", ... } { "_key": "key2", ... } II. JSON documents in a list [ { "_key": "key1", ... }, { "_key": "key2", ... } 373 Collections ] 374 Document Document The document section offers a editor which let you edit documents and edges of a collection. Functions: Edit document Save document Delete docment Switch between Tree/Code - M ode Create a new document Information: Displays: _id, _rev, _key properties 375 Queries Query View The query view offers you three different subviews: Editor Running Queries Slow Query History AQL Query Editor The web interface offers a AQL Query Editor: The editor is split into two parts, the query editor pane and the bind parameter pane. The left pane is your regular query input field, where you can edit and then execute or explain your queries. By default, the entered bind parameter will automatically be recognized and shown in the bind parameter table in the right pane, where you can easily edit them. The input fields are equipped with type detection. This means you don't have to use quote marks around string, just write them as-is. Numbers will be treated as numbers, true and false as booleans, null as null-type value. Square brackets can be used to define arrays, and curly braces for objects (keys and values have to be surrounded by double quotes). This will mostly be what you want. But if you want to force something to be treated as string, use quotation marks for the value: 123 // interpreted as number "123" // interpreted as string ["foo", "bar", 123, true] // interpreted as array ['foo', 'bar', 123, true] // interpreted as string 376 Queries If you are used to work with JSON, you may want to switch the bind parameter editor to JSON mode by clicking on the upper right toggle button. You can then edit the bind parameters in raw JSON format. Custom Queries To save the current query use the Save button in the top left corner of the editor or use the shortcut (see below). By pressing the Queries button in the top left corner of the editor you activate the custom queries view. Here you can select a previously stored custom query or one of our query examples. Click on a query title to get a code preview. 
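As an illustration of how the two panes work together, a query using bind parameters could look like this in the editor; the users collection and the attribute names are only examples:

FOR u IN users
  FILTER u.age >= @minAge AND u.active == @isActive
  RETURN { name: u.name, age: u.age }

In the bind parameter table on the right you would then enter minAge as 21 and isActive as true, relying on the type detection described above; the corresponding raw representation is { "minAge": 21, "isActive": true }.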
In addition, there are action buttons to: Copy to editor Explain query Run query Delete query For the built-in example queries, there is only Copy to editor available. To export or import queries to and from JSON you can use the buttons on the right-hand side. Result 377 Queries Each query you execute or explain opens up a new result box, so you are able to fire up multiple queries and view their results at the same time. Every query result box gives you detailed query information and of course the query result itself. The result boxes can be dismissed individually, or altogether using the Remove results button. The toggle button in the top right corner of each box switches back and forth between the Result and AQL query with bind parameters. Spotlight The spotlight feature opens up a modal view. There you can find all AQL keywords, AQL functions and collections (filtered by their type) to help you to be more productive in writing your queries. Spotlight can be opened by the magic wand icon in the toolbar or via shortcut (see below). AQL Editor Shortcuts 378 Queries Ctrl / Cmd + Return to execute a query Ctrl / Cmd + Shift + Return to explain a query Ctrl / Cmd + Shift + S to save the current query Ctrl / Cmd + Shift + C to toggle comments Ctrl + Space to open up the spotlight search Ctrl + Cmd + Z to undo last change Ctrl + Cmd + Shift + Z to redo last change Running Queries The Running Queries tab gives you a compact overview of all running queries. By clicking the red minus button, you can abort the execution of a running query. Slow Query History 379 Queries The Slow Query History tab gives you a compact overview of all past slow queries. 380 Graphs Graphs The Graphs tab provides a viewer facility for graph data stored in ArangoDB. It allows browsing ArangoDB graphs stored in the _graphs system collection or a graph consisting of an arbitrary vertex and edge collection. Please note that the graph viewer requires canvas (optional: webgl) support in your browser. Especially Internet Explorer browsers older than version 9 are likely to not support this. Graph Viewer 381 Graphs Top Toolbar Functions: Load full graph (Also nodes without connections will be drawn. Useful during graph modeling setup) Take a graph screenshot Start full screen mode Open graph options menu Default Context M enu (mouse-click background): Add a new node Close visible context menu(s) Node Context M enu (mouse-click node): Delete node Edit node Expand node (Show all bound edges) Draw edge (Connect with another node) Set as startnode (The Graph will rerender starting the selected node and given options (graph options menu)) Edge Context M enu (mouse-click edge): Edit edge Delete edge Edge Highlighting (right-mouse-click node): Highlight all edges connected to the node (right-click at the background will remove highlighting) 382 Graphs Graph Viewer Options Graph Options M enu: Startnode (string - valid node id or space seperated list of id's): Heart of your graph. Rendering and traversing will start from here. Empty value means: a random starting point will be used. Layout: Different graph layouting algoritms. No overlap (optimal: big graph), force layout (optimal: medium graph), fruchtermann (optimal: little to medium graph). Renderer: Canvas mode allows editing. WebGL currently offers only display mode (a lot faster with much nodes/edges). Search depth (number): Search depth which is starting from your start node. Limit (number): Limit nodes count. If empty or zero, no limit is set. 
Nodes Options M enu: Label (string): Nodes will be labeled by this attribute. If node attribute is not found, no label will be displayed. Add Collection Name: This appends the collection name to the label, if it exists. Color By Collections: Should nodes be colorized by their collection? If enabled, node color and node color attribute will be ignored. Color: Default node color. Color Attribute (string): If an attribute is given, nodes will then be colorized by the attribute. This setting ignores default node color if set. Size By Connections: Should nodes be sized by their edges count? If enabled, node sizing attribute will be ignored. Sizing Attribute (number): Default node size. Numeric value > 0. Edges Options M enu: Label (string): Edges will be labeled by this attribute. If edge attribute is not found, no label will be displayed. Add Collection Name: This appends the collection name to the label, if it exists. Color By Collections: Should edges be colorized by their collection? If enabled, edge color and edge color attribute will be ignored. Color: Default edge color. Color Attribute (string): If an attribute is given, edges will then be colorized by the attribute. This setting ignores default node color if set. Type: The renderer offers multiple types of rendering. They only differ in their display style, except for the type 'curved'. The curved type allows to display more than one edges between two nodes. 383 Graphs 384 Services Services The services section displays all installed foxx applications. You can create new services or go into a detailed view of a choosen service. Create Service There are four different possibilities to create a new service: 1. Create service via zip file 2. Create service via github repository 3. Create service via official ArangoDB store 4. Create a blank service from scratch 385 Services Service View This section offers several information about a specific service. There are four view categories: 1. Info: Displays name, short description, license, version, mode (production, development) Offers a button to go to the services interface (if available) 386 Services 2. Api: Display API as SwaggerUI Display API as RAW JSON 3. Readme: Displays the services manual (if available) 4. Settings: Download service as zip file Run service tests (if available) Run service scripts (if available) Configure dependencies (if available) Change service parameters (if available) Change mode (production, development) Replace the service Delete the service 387 Users Managing Users in the Web Interface ArangoDB users are globally stored in the _system database and can only be mananged while logged on to this database. There you can find the Users section: General Select a user to bring up the General tab with the username, name and active status, as well as options to delete the user or change the password. 388 Users Permissions Select a user and go to the Permissions tab. You will see a list of databases and their corresponding database access level for that user. 389 Users Please note that server access level follows from the access level on the database _system. Furthermore, the default database access level for this user appear in the artificial row with the database name * . Below this table is another one for the collection category access levels. At first, it shows the list of databases, too. 
If you click on a database, the list of collections in that database will be open and you can see the defined collection access levels for each collection of that database (which can be all unselected which means that nothing is explicitly set). The default access levels for this user and database appear in the artificial row with the collection name * . Also see Managing Users about access levels. 390 Logs Logs The logs section displays all available log entries. Log entries are filterable by their log level types. Functions: Filter log entries by log level (all, info, error, warning, debug) Information: Loglevel Date M essage 391 ArangoDB Shell ArangoDB Shell Introduction The ArangoDB shell (arangosh) is a command-line tool that can be used for administration of ArangoDB, including running ad-hoc queries. The arangosh binary is shipped with ArangoDB. It offers a JavaScript shell environment providing access to the ArangoDB server. Arangosh can be invoked like this: unix> arangosh By default arangosh will try to connect to an ArangoDB server running on server localhost on port 8529. It will use the username root and an empty password by default. Additionally it will connect to the default database (_system). All these defaults can be changed using the following command-line options: --server.database : name of the database to connect to --server.endpoint : endpoint to connect to --server.username : database username --server.password : password to use when connecting --server.authentication : whether or not to use authentication For example, to connect to an ArangoDB server on IP 192.168.173.13 on port 8530 with the user foo and using the database test, use: unix> arangosh \ --server.endpoint tcp://192.168.173.13:8530 --server.username foo --server.database test \ \ \ --server.authentication true arangosh will then display a password prompt and try to connect to the server after the password was entered. To change the current database after the connection has been made, you can use the db._useDatabase() command in arangosh: arangosh> db._createDatabase("myapp"); true arangosh> db._useDatabase("myapp"); true arangosh> db._useDatabase("_system"); true arangosh> db._dropDatabase("myapp"); true To get a list of available commands, arangosh provides a help() function. Calling it will display helpful information. arangosh also provides auto-completion. Additional information on available commands and methods is thus provided by typing the first few letters of a variable and then pressing the tab key. It is recommend to try this with entering db. (without pressing return) and then pressing tab. By the way, arangosh provides the db object by default, and this object can be used for switching to a different database and managing collections inside the current database. For a list of available methods for the db object, type arangosh> db._help(); show execution results you can paste multiple lines into arangosh, given the first line ends with an opening brace: arangosh> for (var i = 0; i < 10; i ++) { 392 ArangoDB Shell ........> require("@arangodb").print("Hello world " + i + "!\n"); ........> } show execution results To load your own JavaScript code into the current JavaScript interpreter context, use the load command: require("internal").load("/tmp/test.js") // <- Linux / MacOS require("internal").load("c:\\tmp\\test.js") // <- Windows Exiting arangosh can be done using the key combination + D or by typing quit Escaping In AQL, escaping is done traditionally with the backslash character: \ . 
As seen above, this leads to double backslashes when specifying Windows paths. Arangosh requires another level of escaping, also with the backslash character. It adds up to four backslashes that need to be written in Arangosh for a single literal backslash ( c:\tmp\test.js ): db._query('RETURN "c:\\\\tmp\\\\test.js"') You can use bind variables to mitigate this: var somepath = "c:\\tmp\\test.js" db._query(aql`RETURN ${somepath}`) 393 Shell Output ArangoDB Shell Output The ArangoDB shell will print the output of the last evaluated expression by default: arangosh> 42 * 23 966 In order to prevent printing the result of the last evaluated expression, the expression result can be captured in a variable, e.g. arangosh> var calculationResult = 42 * 23 There is also the print function to explicitly print out values in the ArangoDB shell: arangosh> print({ a: "123", b: [1,2,3], c: "test" }); show execution results By default, the ArangoDB shell uses a pretty printer when JSON documents are printed. This ensures documents are printed in a human-readable way: arangosh> db._create("five") arangosh> for (i = 0; i < 5; i++) db.five.save({value:i}) arangosh> db.five.toArray() show execution results While the pretty-printer produces nice looking results, it will need a lot of screen space for each document. Sometimes a more dense output might be better. In this case, the pretty printer can be turned off using the command stop_pretty_print(). To turn on pretty printing again, use the start_pretty_print() command. 394 Configuration ArangoDB Shell Configuration arangosh will look for a user-defined startup script named .arangosh.rc in the user's home directory on startup. The home directory will likely be /home/ / %HOMEPATH% on Unix/Linux, and is determined on Windows by peeking into the environment variables %HOMEDRIVE% and . If the file .arangosh.rc is present in the home directory, arangosh will execute the contents of this file inside the global scope. You can use this to define your own extra variables and functions that you need often. For example, you could put the following into the .arangosh.rc file in your home directory: // "var" keyword avoided intentionally... // otherwise "timed" would not survive the scope of this script global.timed = function (cb) { console.time("callback"); cb(); console.timeEnd("callback"); }; This will make a function named timed available in arangosh in the global scope. You can now start arangosh and invoke the function like this: timed(function () { for (var i = 0; i < 1000; ++i) { db.test.save({ value: i }); } }); Please keep in mind that, if present, the .arangosh.rc file needs to contain valid JavaScript code. If you want any variables in the global scope to survive you need to omit the var keyword for them. Otherwise the variables will only be visible inside the script itself, but not outside. 395 Details Details about the ArangoDB Shell After the server has been started, you can use the ArangoDB shell (arangosh) to administrate the server. Without any arguments, the ArangoDB shell will try to contact the server on port 8529 on the localhost. For more information see the ArangoDB Shell documentation. You might need to set additional options (endpoint, username and password) when connecting: unix> ./arangosh --server.endpoint tcp://127.0.0.1:8529 --server.username root The shell will print its own version number and if successfully connected to a server the version number of the ArangoDB server. 
Command-Line Options Use --help to get a list of command-line options: unix> ./arangosh --help STANDARD options: --audit-log audit log file to save commands and results to --configuration read configuration file --help help message --max-upload-size maximum size of import chunks (in bytes) (default: 500000) --no-auto-complete disable auto completion --no-colors deactivate color support --pager output pager (default: "less -X -R -F -L") --pretty-print pretty print values --quiet no banner --temp.path path for temporary files (default: "/tmp/arangodb") --use-pager use pager JAVASCRIPT options: --javascript.check syntax check code JavaScript code from file --javascript.execute execute JavaScript code from file --javascript.execute-string execute JavaScript code from string --javascript.startup-directory startup paths containing the JavaScript files --javascript.unit-tests do not start as shell, run unit tests instead --jslint do not start as shell, run jslint instead LOGGING options: --log.level log level (default: "info") CLIENT options: --server.connect-timeout connect timeout in seconds (default: 3) --server.authentication whether or not to use authentication (default: true) --server.endpoint endpoint to connect to, use 'none' to start without a server (default: "tcp://127.0 .0.1:8529") --server.password password to use when connecting (leave empty for prompt) --server.request-timeout request timeout in seconds (default: 300) --server.username username to use when connecting (default: "root") Database Wrappers The db object is available in arangosh as well as on arangod i.e. if you're using Foxx. While its interface is persistant between the arangosh and the arangod implementations, its underpinning is not. The arangod implementation are JavaScript wrappers around ArangoDB's native C++ implementation, whereas the arangosh implementation wraps HTTP accesses to ArangoDB's RESTfull API. So while this code may produce similar results when executed in arangosh and arangod, the cpu usage and time required will be really different: for (i = 0; i < 100000; i++) { db.test.save({ name: { first: "Jan" }, count: i}); } Since the arangosh version will be doing around 100k HTTP requests, and the arangod version will directly write to the database. 396 Details Using arangosh via unix shebang mechanisms In unix operating systems you can start scripts by specifying the interpreter in the first line of the script. This is commonly called shebang or hash bang . You can also do that with arangosh , i.e. create ~/test.js : #!/usr/bin/arangosh --javascript.execute require("internal").print("hello world") db._query("FOR x IN test RETURN x").toArray() Note that the first line has to end with a blank in order to make it work. M ark it executable to the OS: #> chmod a+x ~/test.js and finaly try it out: #> ~/test.js 397 Arangoimp Arangoimp This manual describes the ArangoDB importer arangoimp, which can be used for bulk imports. The most convenient method to import a lot of data into ArangoDB is to use the arangoimp command-line tool. It allows you to import data records from a file into an existing database collection. It is possible to import document keys with the documents using the _key attribute. When importing into an edge collection, it is mandatory that all imported documents have the _from and _to attributes, and that they contain valid references. Let's assume for the following examples you want to import user data into an existing collection named "users" on the server. 
Importing Data into an ArangoDB Database Importing JSON-encoded Data Let's further assume the import at hand is encoded in JSON. We'll be using these example user records to import: { "name" : { "first" : "John", "last" : "Connor" }, "active" : true, "age" : 25, "likes" : [ "swimming"] } { "name" : { "first" : "Jim", "last" : "O'Brady" }, "age" : 19, "likes" : [ "hiking", "singing" ] } { "name" : { "first" : "Lisa", "last" : "Jones" }, "dob" : "1981-04-09", "likes" : [ "running" ] } To import these records, all you need to do is to put them into a file (with one line for each record to import) and run the following command: > arangoimp --file "data.json" --type jsonl --collection "users" This will transfer the data to the server, import the records, and print a status summary. To show the intermediate progress during the import process, the option --progress can be added. This option will show the percentage of the input file that has been sent to the server. This will only be useful for big import files. > arangoimp --file "data.json" --type json --collection users --progress true It is also possible to use the output of another command as an input for arangoimp. For example, the following shell command can be used to pipe data from the cat process to arangoimp: > cat data.json | arangoimp --file - --type json --collection users Note that you have to use --file - if you want to use another command as input for arangoimp. No progress can be reported for such imports as the size of the input will be unknown to arangoimp. By default, the endpoint tcp://127.0.0.1:8529 will be used. If you want to specify a different endpoint, you can use the --server.endpoint option. You probably want to specify a database user and password as well. You can do so by using the options --server.username and -server.password. If you do not specify a password, you will be prompted for one. > arangoimp --server.endpoint tcp://127.0.0.1:8529 --server.username root --file "data.json" --type json --collection "users" Note that the collection (users in this case) must already exist or the import will fail. If you want to create a new collection with the import data, you need to specify the --create-collection option. Note that by default it will create a document collection and no ede collection. > arangoimp --file "data.json" --type json --collection "users" --create-collection true To create an edge collection instead, use the --create-collection-type option and set it to edge: > arangoimp --file "data.json" --collection "myedges" --create-collection true --create-collection-type edge 398 Arangoimp When importing data into an existing collection it is often convenient to first remove all data from the collection and then start the import. This can be achieved by passing the --overwrite parameter to arangoimp. If it is set to true, any existing data in the collection will be removed prior to the import. Note that any existing index definitions for the collection will be preserved even if --overwrite is set to true. > arangoimp --file "data.json" --type json --collection "users" --overwrite true As the import file already contains the data in JSON format, attribute names and data types are fully preserved. As can be seen in the example data, there is no need for all data records to have the same attribute names or types. Records can be inhomogeneous. Please note that by default, arangoimp will import data into the specified collection in the default database (_system). 
To specify a different database, use the --server.database option when invoking arangoimp. The tool also supports parallel imports, with multiple threads. Using multiple threads may provide a speedup, especially when using the RocksDB storage engine. To specify the number of parallel threads use the --threads option: > arangoimp --threads 4 --file "data.json" --type json --collection "users" Note that using multiple threads may lead to a non-sequential import of the input data. Data that appears later in the input file may be imported earlier than data that appears earlier in the input file. This is normally not a problem but may cause issues when when there are data dependencies or duplicates in the import data. In this case, the number of threads should be set to 1. JSON input file formats Note: arangoimp supports two formats when importing JSON data from a file. The first format that we also used above is commonly known as jsonl). However, in contrast to the JSONL specification it requires the input file to contain one complete JSON document in each line, e.g. { "_key": "one", "value": 1 } { "_key": "two", "value": 2 } { "_key": "foo", "value": "bar" } ... So one could argue that this is only a subset of JSONL. The above format can be imported sequentially by arangoimp. It will read data from the input file in chunks and send it in batches to the server. Each batch will be about as big as specified in the command-line parameter --batch-size. An alternative is to put one big JSON document into the input file like this: [ { "_key": "one", "value": 1 }, { "_key": "two", "value": 2 }, { "_key": "foo", "value": "bar" }, ... ] This format allows line breaks within the input file as required. The downside is that the whole input file will need to be read by arangoimp before it can send the first batch. This might be a problem if the input file is big. By default, arangoimp will allow importing such files up to a size of about 16 M B. If you want to allow your arangoimp instance to use more memory, you may want to increase the maximum file size by specifying the command-line option --batch-size. For example, to set the batch size to 32 M B, use the following command: > arangoimp --file "data.json" --type json --collection "users" --batch-size 33554432 Please also note that you may need to increase the value of --batch-size if a single document inside the input file is bigger than the value of --batch-size. Importing CSV Data 399 Arangoimp arangoimp also offers the possibility to import data from CSV files. This comes handy when the data at hand is in CSV format already and you don't want to spend time converting them to JSON for the import. To import data from a CSV file, make sure your file contains the attribute names in the first row. All the following lines in the file will be interpreted as data records and will be imported. The CSV import requires the data to have a homogeneous structure. All records must have exactly the same amount of columns as there are headers. By default, lines with a different number of values will not be imported and there will be warnings for them. To still import lines with less values than in the header, there is the --ignore-missing option. If set to true, lines that have a different amount of fields will be imported. In this case only those attributes will be populated for which there are values. Attributes for which there are no values present will silently be discarded. 
Example: "first","last","age","active","dob" "John","Connor",25,true "Jim","O'Brady" With --ignore-missing this will produce the following documents: { "first" : "John", "last" : "Connor", "active" : true, "age" : 25 } { "first" : "Jim", "last" : "O'Brady" } The cell values can have different data types though. If a cell does not have any value, it can be left empty in the file. These values will not be imported so the attributes will not "be there" in document created. Values enclosed in quotes will be imported as strings, so to import numeric values, boolean values or the null value, don't enclose the value in quotes in your file. We'll be using the following import for the CSV import: "first","last","age","active","dob" "John","Connor",25,true, "Jim","O'Brady",19,, "Lisa","Jones",,,"1981-04-09" Hans,dos Santos,0123,, Wayne,Brewer,,false, The command line to execute the import is: > arangoimp --file "data.csv" --type csv --collection "users" The above data will be imported into 5 documents which will look as follows: { "first" : "John", "last" : "Connor", "active" : true, "age" : 25 } { "first" : "Jim", "last" : "O'Brady", "age" : 19 } { "first" : "Lisa", "last" : "Jones", "dob" : "1981-04-09" } { "first" : "Hans", "last" : "dos Santos", "age" : 123 } { "first" : "Wayne", "last" : "Brewer", "active" : false } As can be seen, values left completely empty in the input file will be treated as absent. Numeric values not enclosed in quotes will be treated as numbers. Note that leading zeros in numeric values will be removed. To import numbers with leading zeros, please use strings. The literals true and false will be treated as booleans if they are not enclosed in quotes. Other values not enclosed in quotes will be treated as strings. Any values enclosed in quotes will be treated as strings, too. String values containing the quote character or the separator must be enclosed with quote characters. Within a string, the quote character itself must be escaped with another quote character (or with a backslash if the --backslash-escape option is used). Note that the quote and separator characters can be adjusted via the --quote and --separator arguments when invoking arangoimp. The quote character defaults to the double quote ("). To use a literal quote in a string, you can use two quote characters. To use backslash for escaping quote characters, please set the option --backslash-escape to true. The importer supports Windows (CRLF) and Unix (LF) line breaks. Line breaks might also occur inside values that are enclosed with the quote character. 400 Arangoimp Here's an example for using literal quotes and newlines inside values: "name","password" "Foo","r4ndom""123!" "Bar","wow! this is a multine password!" "Bartholomew ""Bart"" Simpson","Milhouse" Extra whitespace at the end of each line will be ignored. Whitespace at the start of lines or between field values will not be ignored, so please make sure that there is no extra whitespace in front of values or between them. Importing TSV Data You may also import tab-separated values (TSV) from a file. This format is very simple: every line in the file represents a data record. There is no quoting or escaping. That also means that the separator character (which defaults to the tabstop symbol) must not be used anywhere in the actual data. As with CSV, the first line in the TSV file must contain the attribute names, and all lines must have an identical number of values. 
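A minimal TSV input file for the users example could therefore look like this, with the fields in each line separated by single tab characters:

first	last	age
John	Connor	25
Jim	O'Brady	19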
If a different separator character or string should be used, it can be specified with the --separator argument. An example command line to execute the TSV import is: > arangoimp --file "data.tsv" --type tsv --collection "users" Attribute Name Translation For the CSV and TSV input formats, attribute names can be translated automatically. This is useful in case the import file has different attribute names than those that should be used in ArangoDB. A common use case is to rename an "id" column from the input file into "_key" as it is expected by ArangoDB. To do this, specify the following translation when invoking arangoimp: > arangoimp --file "data.csv" --type csv --translate "id=_key" Other common cases are to rename columns in the input file to _from and _to: > arangoimp --file "data.csv" --type csv --translate "from=_from" --translate "to=_to" The translate option can be specified multiple types. The source attribute name and the target attribute must be separated with a =. Ignoring Attributes For the CSV and TSV input formats, certain attribute names can be ignored on imports. In an ArangoDB cluster there are cases where this can come in handy, when your documents already contain a _key _key attribute and your collection has a sharding attribute other than : In the cluster this configuration is not supported, because ArangoDB needs to guarantee the uniqueness of the _key attribute in all shards of the collection. > arangoimp --file "data.csv" --type csv --remove-attribute "_key" The same thing would apply if your data contains an _id attribute: > arangoimp --file "data.csv" --type csv --remove-attribute "_id" Importing into an Edge Collection 401 Arangoimp arangoimp can also be used to import data into an existing edge collection. The import data must, for each edge to import, contain at least the _from and _to attributes. These indicate which other two documents the edge should connect. It is necessary that these attributes are set for all records, and point to valid document ids in existing collections. Examples { "_from" : "users/1234", "_to" : "users/4321", "desc" : "1234 is connected to 4321" } Note: The edge collection must already exist when the import is started. Using the --create-collection flag will not work because arangoimp will always try to create a regular document collection if the target collection does not exist. Updating existing documents By default, arangoimp will try to insert all documents from the import file into the specified collection. In case the import file contains documents that are already present in the target collection (matching is done via the _key attributes), then a default arangoimp run will not import these documents and complain about unique key constraint violations. However, arangoimp can be used to update or replace existing documents in case they already exist in the target collection. It provides the command-line option --on-duplicate to control the behavior in case a document is already present in the database. The default value of --on-duplicate is error. This means that when the import file contains a document that is present in the target collection already, then trying to re-insert a document with the same _key value is considered an error, and the document in the database will not be modified. Other possible values for --on-duplicate are: update: each document present in the import file that is also present in the target collection already will be updated by arangoimp. 
update will perform a partial update of the existing document, modifying only the attributes that are present in the import file and leaving all other attributes untouched. The values of system attributes _id, _key, _rev, _from and _to cannot be updated or replaced in existing documents. replace: each document present in the import file that is also present in the target collection already will be replace by arangoimp. replace will replace the existing document entirely, resulting in a document with only the attributes specified in the import file. The values of system attributes _id, _key, _rev, _from and _to cannot be updated or replaced in existing documents. ignore: each document present in the import file that is also present in the target collection already will be ignored and not modified in the target collection. When --on-duplicate is set to either update or replace, arangoimp will return the number of documents updated/replaced in the updated return value. When set to another value, the value of updated will always be zero. When --on-duplicate is set to ignore, arangoimp will return the number of ignored documents in the ignored return value. When set to another value, ignored will always be zero. It is possible to perform a combination of inserts and updates/replaces with a single arangoimp run. When --on-duplicate is set to update or replace, all documents present in the import file will be inserted into the target collection provided they are valid and do not already exist with the specified _key. Documents that are already present in the target collection (identified by _key attribute) will instead be updated/replaced. Arangoimp result output An arangoimp import run will print out the final results on the command line. It will show the number of documents created (created) number of documents updated/replaced (updated/replaced, only non-zero if --on-duplicate was set to update or replace, see below) number of warnings or errors that occurred on the server side (warnings/errors) number of ignored documents (only non-zero if --on-duplicate was set to ignore). Example created: 2 warnings/errors: 0 updated/replaced: 0 ignored: 0 402 Arangoimp For CSV and TSV imports, the total number of input file lines read will also be printed (lines read). arangoimp will also print out details about warnings and errors that happened on the server-side (if any). Attribute Naming and Special Attributes Attributes whose names start with an underscore are treated in a special way by ArangoDB: the optional _key attribute contains the document's key. If specified, the value must be formally valid (e.g. must be a string and conform to the naming conventions). Additionally, the key value must be unique within the collection the import is run for. _from: when importing into an edge collection, this attribute contains the id of one of the documents connected by the edge. The value of _from must be a syntactically valid document id and the referred collection must exist. _to: when importing into an edge collection, this attribute contains the id of the other document connected by the edge. The value of _to must be a syntactically valid document id and the referred collection must exist. _rev: this attribute contains the revision number of a document. However, the revision numbers are managed by ArangoDB and cannot be specified on import. Thus any value in this attribute is ignored on import. If you import values into _key, you should make sure they are valid and unique. 
When importing data into an edge collection, you should make sure that all import documents can _from and _to and that their values point to existing documents. To avoid specifying complete document ids (consisting of collection names and document keys) for _from and _to values, there are the options --from-collection-prefix and --to-collection-prefix. If specified, these values will be automatically prepended to each value in _from (or _to resp.). This allows specifying only document keys inside _from and/or _to. Example > arangoimp --from-collection-prefix users --to-collection-prefix products ... Importing the following document will then create an edge between users/1234 and products/4321: { "_from" : "1234", "_to" : "4321", "desc" : "users/1234 is connected to products/4321" } Automatic pacing with busy or low throughput disk subsystems Arangoimport has an automatic pacing algorithm that limits how fast data is sent to the ArangoDB servers. This pacing algorithm exists to prevent the import operation from failing due to slow responses. Google Compute and other VM providers limit the throughput of disk devices. Google's limit is more strict for smaller disk rentals, than for larger. Specifically, a user could choose the smallest disk space and be limited to 3 M bytes per second. Similarly, other users' processes on the shared VM can limit available throughput of the disk devices. The automatic pacing algorithm adjusts the transmit block size dynamically based upon the actual throughput of the server over the last 20 seconds. Further, each thread delivers its portion of the data in mostly non-overlapping chunks. The thread timing creates intentional windows of non-import activity to allow the server extra time for meta operations. Automatic pacing intentionally does not use the full throughput of a disk device. An unlimited (really fast) disk device might not need pacing. Raising the number of threads via the --threads X command line to any value of X greater than 2 will increase the total throughput used. Automatic pacing frees the user from adjusting the throughput used to match available resources. It is disabled by manually specifying any --batch-size . 16777216 was the previous default for --batch-size. Having --batch-size too large can lead to transmitted data piling- up on the server, resulting in a TimeoutError. The pacing algorithm works successfully with M M Files with disks limited to read and write throughput as small as 1 M byte per second. The algorithm works successfully with RocksDB with disks limited to read and write throughput as small as 3 M byte per second. 403 Arangodump Dumping Data from an ArangoDB database To dump data from an ArangoDB server instance, you will need to invoke arangodump. Dumps can be re-imported with arangorestore. arangodump can be invoked by executing the following command: unix> arangodump --output-directory "dump" This will connect to an ArangoDB server and dump all non-system collections from the default database (_system) into an output directory named dump. Invoking arangodump will fail if the output directory already exists. This is an intentional security measure to prevent you from accidentally overwriting already dumped data. If you are positive that you want to overwrite data in the output directory, you can use the parameter --overwrite true to confirm this: unix> arangodump --output-directory "dump" --overwrite true arangodump will by default connect to the _system database using the default endpoint. 
If you want to connect to a different database or a different endpoint, or use authentication, you can use the following command-line options: --server.database : name of the database to connect to --server.endpoint : endpoint to connect to --server.username : username --server.password : password to use (omit this and you'll be prompted for the password) --server.authentication : whether or not to use authentication Here's an example of dumping data from a non-standard endpoint, using a dedicated database name: unix> arangodump --server.endpoint tcp://192.168.173.13:8531 --server.username backup --server.database mydb --output-directory "dump" When finished, arangodump will print out a summary line with some aggregate statistics about what it did, e.g.: Processed 43 collection(s), wrote 408173500 byte(s) into datafiles, sent 88 batch(es) By default, arangodump will dump both structural information and documents from all non-system collections. To adjust this, there are the following command-line arguments: --dump-data : set to true to include documents in the dump. Set to false to exclude documents. The default value is true. --include-system-collections : whether or not to include system collections in the dump. The default value is false. For example, to only dump structural information of all collections (including system collections), use: unix> arangodump --dump-data false --include-system-collections true --output-directory "dump" To restrict the dump to just specific collections, there is is the --collection option. It can be specified multiple times if required: unix> arangodump --collection myusers --collection myvalues --output-directory "dump" Structural information for a collection will be saved in files with name pattern .structure.json. Each structure file will contains a JSON object with these attributes: parameters: contains the collection properties indexes: contains the collection indexes Document data for a collection will be saved in files with name pattern .data.json. Each line in a data file is a document insertion/update or deletion marker, alongside with some meta data. Starting with Version 2.1 of ArangoDB, the arangodump tool also supports sharding. Simply point it to one of the coordinators and it will behave exactly as described above, working on sharded collections in the cluster. 404 Arangodump However, as opposed to the single instance situation, this operation does not guarantee to dump a consistent snapshot if write operations happen during the dump operation. It is therefore recommended not to perform any data-modifcation operations on the cluster whilst arangodump is running. As above, the output will be one structure description file and one data file per sharded collection. Note that the data in the data file is sorted first by shards and within each shard by ascending timestamp. The structural information of the collection contains the number of shards and the shard keys. Note that the version of the arangodump client tool needs to match the version of the ArangoDB server it connects to. Advanced cluster options Starting with version 3.1.17, collections may be created with shard distribution identical to an existing prototypical collection; i.e. shards are distributed in the very same pattern as in the prototype collection. Such collections cannot be dumped without the reference collection or arangodump with yield an error. 
unix> arangodump --collection clonedCollection --output-directory "dump"

ERROR Collection clonedCollection's shard distribution is based on that of collection prototypeCollection, which is not dumped along. You may dump the collection regardless of the missing prototype collection by using the --ignore-distribute-shards-like-errors parameter.

There are two ways to approach that problem. Solve it, i.e. dump the prototype collection along:

unix> arangodump --collection clonedCollection --collection prototypeCollection --output-directory "dump"

Processed 2 collection(s), wrote 81920 byte(s) into datafiles, sent 1 batch(es)

Or override that behaviour to be able to dump the collection individually:

unix> arangodump --collection clonedCollection --output-directory "dump" --ignore-distribute-shards-like-errors

Processed 1 collection(s), wrote 34217 byte(s) into datafiles, sent 1 batch(es)

Note that in consequence, restoring such a collection without its prototype is affected; see arangorestore.

Encryption

In the ArangoDB Enterprise Edition there are the additional parameters:

Encryption key stored in file

--encryption.keyfile path-of-keyfile

The file path-of-keyfile must contain the encryption key. This file must be secured, so that only arangod can access it. You should also ensure that in case someone steals the hardware, he will not be able to read the file. For example, by encrypting /mytmpfs or creating an in-memory file-system under /mytmpfs.

Encryption key generated by a program

--encryption.key-generator path-to-my-generator

The program path-to-my-generator must output the encryption key on standard output and exit.

Creating keys

The encryption keyfile must contain 32 bytes of random data. You can create it with a command like this:

dd if=/dev/random bs=1 count=32 of=yourSecretKeyFile

For security, it is best to create these keys offline (away from your database servers) and directly store them in your secret management tool.
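Assuming the Enterprise Edition and a keyfile created as shown above, an encrypted dump could then be taken with an invocation like the following sketch; the keyfile path is only an example:

unix> arangodump --encryption.keyfile /secure/yourSecretKeyFile --output-directory "dump"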
Arangorestore

To reload data from a dump previously created with arangodump, ArangoDB provides the arangorestore tool.

Please note that arangorestore must not be used to create several similar database instances in one installation. This means that if you have an arangodump output of database a, and you create a second database b on the same instance of ArangoDB and restore the dump of a into b, data integrity cannot be guaranteed.

Reloading Data into an ArangoDB database

Invoking arangorestore

arangorestore can be invoked from the command-line as follows:

unix> arangorestore --input-directory "dump"

This will connect to an ArangoDB server and reload structural information and documents found in the input directory dump. Please note that the input directory must have been created by running arangodump before.

arangorestore will by default connect to the _system database using the default endpoint. If you want to connect to a different database or a different endpoint, or use authentication, you can use the following command-line options:

--server.database : name of the database to connect to
--server.endpoint : endpoint to connect to
--server.username : username
--server.password : password to use (omit this and you'll be prompted for the password)
--server.authentication : whether or not to use authentication

Since version 2.6 arangorestore provides the option --create-database. Setting this option to true will create the target database if it does not exist. When creating the target database, the username and password passed to arangorestore (in the options --server.username and --server.password) will be used to create an initial user for the new database.

The option --force-same-database allows restricting arangorestore operations to a database with the same name as in the source dump's "dump.json" file. It can thus be used to prevent restoring data into a "wrong" database by accident. For example, if a dump was taken from database a and the restore is attempted into database b, then with the --force-same-database option set to true, arangorestore will abort instantly. The --force-same-database option is set to false by default to ensure backwards-compatibility.

Here's an example of reloading data to a non-standard endpoint, using a dedicated database name:

unix> arangorestore --server.endpoint tcp://192.168.173.13:8531 --server.username backup --server.database mydb --input-directory "dump"

To create the target database when restoring, use a command like this:

unix> arangorestore --server.username backup --server.database newdb --create-database true --input-directory "dump"

arangorestore will print out its progress while running, and will end with a line showing some aggregate statistics:

Processed 2 collection(s), read 2256 byte(s) from datafiles, sent 2 batch(es)

By default, arangorestore will re-create all non-system collections found in the input directory and load data into them. If the target database already contains collections which are also present in the input directory, the existing collections in the database will be dropped and re-created with the data found in the input directory.

The following parameters are available to adjust this behavior:

--create-collection : set to true to create collections in the target database. If the target database already contains a collection with the same name, it will be dropped first and then re-created with the properties found in the input directory. Set to false to keep existing collections in the target database. If set to false and arangorestore encounters a collection that is present in both the target database and the input directory, it will abort. The default value is true.
--import-data : set to true to load document data into the collections in the target database. Set to false to not load any document data. The default value is true.
--include-system-collections : whether or not to include system collections when re-creating collections or reloading data. The default value is false.

For example, to (re-)create all non-system collections and load document data into them, use:

unix> arangorestore --create-collection true --import-data true --input-directory "dump"

This will drop potentially existing collections in the target database that are also present in the input directory.

To include system collections too, use --include-system-collections true:

unix> arangorestore --create-collection true --import-data true --include-system-collections true --input-directory "dump"

To (re-)create all non-system collections without loading document data, use:

unix> arangorestore --create-collection true --import-data false --input-directory "dump"

This will also drop existing collections in the target database that are also present in the input directory.
To just load document data into all non-system collections, use:

unix> arangorestore --create-collection false --import-data true --input-directory "dump"

To restrict reloading to just specific collections, there is the --collection option. It can be specified multiple times if required:

unix> arangorestore --collection myusers --collection myvalues --input-directory "dump"

Collections will be processed in alphabetical order by arangorestore, with all document collections being processed before all edge collections. This is to ensure that, when data is reloaded into edge collections, the document collections referenced in the edges' _from and _to attributes have already been loaded.

Encryption

See arangodump for details.

Restoring Revision Ids and Collection Ids

arangorestore will reload document and edge data with the exact same _key, _from and _to values as found in the input directory. However, when loading document data, it will assign its own values for the _rev attribute of the reloaded documents. Although this difference is intentional (normally, every server should create its own _rev values), there might be situations when it is required to re-use the exact same _rev values for the reloaded data. This can be achieved by setting the --recycle-ids parameter to true:

unix> arangorestore --collection myusers --collection myvalues --recycle-ids true --input-directory "dump"

Note that setting --recycle-ids to true will also cause collections to be (re-)created in the target database with the exact same collection id as in the input directory. Any potentially existing collection in the target database with the same collection id will then be dropped.

Reloading Data into a different Collection

With some creativity you can use arangodump and arangorestore to transfer data from one collection into another (either on the same server or not). For example, to copy data from a collection myvalues in database mydb into a collection mycopyvalues in database mycopy, you can start with the following command:

unix> arangodump --collection myvalues --server.database mydb --output-directory "dump"

This will create two files, myvalues.structure.json and myvalues.data.json, in the output directory. To load data from the datafiles into an existing collection mycopyvalues in database mycopy, rename the files to mycopyvalues.structure.json and mycopyvalues.data.json. After that, run the following command:

unix> arangorestore --collection mycopyvalues --server.database mycopy --input-directory "dump"

Using arangorestore with sharding

As of Version 2.1, the arangorestore tool supports sharding. Simply point it to one of the coordinators in your cluster and it will work as usual but on sharded collections in the cluster.

If arangorestore is asked to drop and re-create a collection, it will use the same number of shards and the same shard keys as when the collection was dumped. The distribution of the shards to the servers will also be the same as at the time of the dump. This means in particular that DBServers with the same IDs as before must be present in the cluster at the time of the restore.

If a collection was dumped from a single instance, one can manually add the structural description for the shard keys and the number and distribution of the shards, and then the restore into a cluster will work (see the sketch below). If you restore a collection that was dumped from a cluster into a single ArangoDB instance, the number of shards and the shard keys will silently be ignored.
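A sketch of what "manually adding the structural description" might look like: numberOfShards and shardKeys are the standard collection properties for this, but the exact layout of the structure file shown here is illustrative only; real structure files contain additional collection properties.

# illustrative file contents only; real files contain more collection properties
unix> cat dump/myvalues.structure.json
{
  "parameters": {
    "name": "myvalues",
    "numberOfShards": 3,
    "shardKeys": ["_key"]
  },
  "indexes": []
}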
Note that in a cluster, every newly created collection will have a new ID; it is not possible to reuse the ID from the originally dumped collection. This is for safety reasons, to ensure consistency of IDs.

Restoring collections with sharding prototypes

arangorestore will yield an error while trying to restore a collection whose shard distribution follows a collection which does not exist in the cluster and which was not dumped along:

unix> arangorestore --collection clonedCollection --server.database mydb --input-directory "dump"

ERROR got error from server: HTTP 500 (Internal Server Error): ArangoError 1486: must not have a distributeShardsLike attribute pointing to an unknown collection
Processed 0 collection(s), read 0 byte(s) from datafiles, sent 0 batch(es)

The collection can be restored by overriding the error message as follows:

unix> arangorestore --collection clonedCollection --server.database mydb --input-directory "dump" --ignore-distribute-shards-like-errors

Restore into an authentication-enabled ArangoDB

Of course you can restore data into a password-protected ArangoDB as well. However, this requires certain user rights for the user used in the restore process. The rights are described in detail in the Managing Users chapter. For restore, this short overview is sufficient:

When importing into an existing database, the given user needs Administrate access on this database.
When creating a new database during restore, the given user needs Administrate access on _system. The user will be promoted with Administrate access on the newly created database.

Arangoexport

Exporting Data from an ArangoDB database

To export data from an ArangoDB server instance, you will need to invoke arangoexport. arangoexport can be invoked by executing the following command:

unix> arangoexport --collection test --output-directory "dump"

This exports the collection test into the directory dump as one big JSON array. Every entry in this array is one document from the collection, without a specific order. To export more than one collection at a time, specify multiple --collection options. The default output directory is export.

arangoexport will by default connect to the _system database using the default endpoint. If you want to connect to a different database or a different endpoint, or use authentication, you can use the following command-line options:

--server.database : name of the database to connect to
--server.endpoint : endpoint to connect to
--server.username : username
--server.password : password to use (omit this and you'll be prompted for the password)
--server.authentication : whether or not to use authentication

Here's an example of exporting data from a non-standard endpoint, using a dedicated database name:

unix> arangoexport --server.endpoint tcp://192.168.173.13:8531 --server.username backup --server.database mydb --collection test --output-directory "my-export"

When finished, arangoexport will print out a summary line with some aggregate statistics about what it did, e.g.:

Processed 2 collection(s), wrote 9031763 Byte(s), 78 HTTP request(s)

Export JSON

unix> arangoexport --type json --collection test

This exports the collection test into the output directory export as one JSON array. Every array entry is one document from the collection test.

Export JSONL

unix> arangoexport --type jsonl --collection test

This exports the collection test into the output directory export as JSONL. Every line in the export is one document from the collection test as JSON.
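As noted above, more than one collection can be exported in a single run by repeating the --collection option. A small sketch; the collection names books and authors are made up for illustration:

# both collections are written to the given output directory
unix> arangoexport --type jsonl --collection books --collection authors --output-directory "my-export"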
Export CSV

unix> arangoexport --type csv --collection test --fields _key,_id,_rev

This exports the collection test into the output directory export as CSV. The first line contains the header with all field names. Each line is one document represented as CSV, with fields separated by commas. Objects and arrays are represented as a JSON string.

Export XML

unix> arangoexport --type xml --collection test

This exports the collection test into the output directory export as generic XML. The root element of the generated XML file is named collection. Each document in the collection is exported in a doc XML element. Each document attribute is exported in a generic att element, which has a type attribute indicating the type of the attribute value, and a value attribute containing the attribute's value.

Export XGMML

XGMML is an XML application based on GML. To view the XGMML file you can use, for example, Cytoscape.

Important note: if you export all attributes (--xgmml-label-only false), keep in mind that an attribute has to have the same type in all documents. It won't work if you have an attribute named rank that is a string in one document and an integer in another.

Bad

// doc1
{ "rank": 1 }

// doc2
{ "rank": "2" }

Good

// doc1
{ "rank": 1 }

// doc2
{ "rank": 2 }

XGMML specific options

--xgmml-label-attribute : specifies the name of the attribute that will become the label in the xgmml file.
--xgmml-label-only : set to true to only export the label, without any attributes in edges or nodes.

export based on collections

unix> arangoexport --type xgmml --graph-name mygraph --collection vertex --collection edge

This exports an unnamed graph with vertex collection vertex and edge collection edge into the xgmml file mygraph.xgmml.

export based on a named graph

unix> arangoexport --type xgmml --graph-name mygraph

This exports the named graph mygraph into the xgmml file mygraph.xgmml.

export XGMML without attributes

unix> arangoexport --type xgmml --graph-name mygraph --xgmml-label-only true

This exports the named graph mygraph into the xgmml file mygraph.xgmml without any attribute tags in nodes and edges.

export XGMML with a specific label

unix> arangoexport --type xgmml --graph-name mygraph --xgmml-label-attribute name

This exports the named graph mygraph into the xgmml file mygraph.xgmml with the label taken from the document attribute name instead of the default attribute label.

Export via AQL query

unix> arangoexport --type jsonl --query "for book in books filter book.sells > 100 return book"

Exporting via an AQL query allows you to export the returned data in the format specified with --type. The example above exports, as JSONL, all books that have been sold more than 100 times.

Managing Users

The user management in ArangoDB 3 is similar to that found in MySQL, PostgreSQL, and other database systems. User management is possible in the web interface and in arangosh while logged on to the _system database. Note that usernames must not start with :role:.

Actions and Access Levels

An ArangoDB server contains a list of users. It also defines various access levels that can be assigned to a user (for details, see below) and that are needed to perform certain actions. These actions can be grouped into three categories:

server actions
database actions
collection actions

The server actions are:

create user: allows to create a new user.
update user: allows to change the access levels and details of an existing user.
drop user: allows to delete an existing user.
create database: allows to create a new database.
drop database: allows to delete an existing database.
shutdown server: remove a server from the cluster and shut it down.

The database actions are tied to a given database, and access levels must be set for each database individually. For a given database the actions are:

create collection: allows to create a new collection in the given database.
update collection: allows to update properties of an existing collection.
drop collection: allows to delete an existing collection.
create index: allows to create an index for an existing collection in the given database.
drop index: allows to delete an index of an existing collection in the given database.

The collection actions are tied to a given collection of a given database, and access levels must be set for each collection individually. For a given collection the actions are:

read document: read a document of the given collection.
create document: creates a new document in the given collection.
modify document: modifies an existing document of the given collection; this can be an update or replace operation.
drop document: deletes an existing document of the given collection.
truncate collection: deletes all documents of a given collection.

To perform actions on the server level, the user needs at least the following access levels. The possible access levels are Administrate and No access:

server action              server level
create a database          Administrate
drop a database            Administrate
create a user              Administrate
update a user              Administrate
update user access level   Administrate
drop a user                Administrate
shutdown server            Administrate

To perform actions in a specific database (like creating or dropping collections), a user needs at least the following access level. The possible access levels for databases are Administrate, Access and No access. The access levels for collections are Read/Write, Read Only and No Access.

database action                database level    collection level
create collection              Administrate      Read/Write
list collections               Access            Read Only
rename collection              Administrate      Read/Write
modify collection properties   Administrate      Read/Write
read properties                Access            Read Only
drop collection                Administrate      Read/Write
create an index                Administrate      Read/Write
drop an index                  Administrate      Read/Write
see index definition           Access            Read Only

Note that the access level Access for a database is always required to perform any action on a collection in that database.

For collections a user needs the following access levels to the given database and the given collection. The access levels for the database are Administrate, Access and No access. The access levels for the collection are Read/Write, Read Only and No Access.

action                  collection level           database level
read a document         Read/Write or Read Only    Administrate or Access
create a document       Read/Write                 Administrate or Access
modify a document       Read/Write                 Administrate or Access
drop a document         Read/Write                 Administrate or Access
truncate a collection   Read/Write                 Administrate or Access

Example

For example, given:

a database example
a collection data in the database example
a user JohnSmith

If the user JohnSmith is assigned the access level Access for the database example and the level Read/Write for the collection data, then the user is allowed to read, create, modify or delete documents in the collection data. But the user is, for example, not allowed to create indexes for the collection data, nor to create new collections in the database example (see the arangosh sketch below).
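A minimal sketch of how the grants from this example map onto the @arangodb/users module documented later in this chapter ('ro' corresponds to Access, 'rw' to Read/Write). The non-interactive --javascript.execute-string invocation is an assumption about your arangosh setup; the same two calls can equally be issued in an interactive arangosh session.

# assumes arangosh can reach the _system database; the flag usage is illustrative
unix> arangosh --server.database _system --javascript.execute-string "
  var users = require('@arangodb/users');
  users.grantDatabase('JohnSmith', 'example', 'ro');           // database level: Access
  users.grantCollection('JohnSmith', 'example', 'data', 'rw'); // collection level: Read/Write
"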
Granting Access Levels

Access levels can be managed via the web interface or in arangosh.

In order to grant an access level to a user, you can assign one of three access levels for each database and one of three levels for each collection in a database. The server access level for the user follows from the database access level in the _system database; it is Administrate if and only if the database access level is Administrate. Note that this means that the database access level Access does not grant a user the server access level Administrate.

Initial Access Levels

When a user creates a database, the access level of the user for that database is set to Administrate. The same is true for creating a collection: in this case the user gets Read/Write access to the collection.

Wildcard Database Access Level

With the above definition, one must define the database access level for all database/user pairs in the server, which would be very tedious. In order to simplify this process, it is possible to define a wildcard database access level for a user. This wildcard is used if the database access level is not explicitly defined for a certain database. Each newly created user has an initial database wildcard of No Access.

Changing the wildcard database access level for a user will change the access level for all databases that have no explicitly defined access level. Note that this includes databases which will be created in the future and for which no explicit access levels are set for that user! If you delete the wildcard, the default access level is defined as No Access.

The root user has an initial database wildcard of Administrate.

Example

Assume user JohnSmith has the following database access levels:

database   access level
*          Access
shop1      Administrate
shop2      No Access

This will give the user JohnSmith the following database level access:

database shop1 : Administrate
database shop2 : No Access
database something : Access

If the wildcard * is changed from Access to No Access, then the permissions will change as follows:

database shop1 : Administrate
database shop2 : No Access
database something : No Access

Wildcard Collection Access Level

For each user and database there is a wildcard collection access level. This level is used for all collections without an explicitly defined collection access level. Note that this includes collections which will be created in the future and for which no explicit access levels are set for that user! Each newly created user has an initial collection wildcard of No Access.

If you delete the wildcard, the system defaults to No Access. The root user has an initial collection wildcard of Read/Write in every database.

When creating a user through db._createDatabase(name, options, users), the access level of the user for this database will be set to Administrate and the wildcard for all collections within this database will be set to Read/Write.
Example

Assume user JohnSmith has the following database access levels:

database   access level
*          Access

and the following collection access levels:

database   collection   access level
*          *            Read/Write
shop1      products     Read-Only
shop1      *            No Access
shop2      *            Read-Only

Then the user JohnSmith will get the following collection access levels:

database shop1 , collection products : Read-Only
database shop1 , collection customers : No Access
database shop2 , collection reviews : Read-Only
database something , collection else : Read/Write

Explanation:

Database shop1, collection products directly matches a defined access level. This level is defined as Read-Only.
Database shop1, collection customers does not match a defined access level. However, database shop1 matches, and the wildcard in this database for the collection level is No Access.
Database shop2, collection reviews does not match a defined access level. However, database shop2 matches, and the wildcard in this database for the collection level is Read-Only.
Database something, collection else does not match a defined access level. The database something does not have a direct match either. Therefore the wildcard is selected. The level is Read/Write.

Permission Resolution

The access levels for databases and collections are resolved in the following way:

For a database "foo":
1. Check if there is a specific database grant for foo; if yes, use the granted access level.
2. Otherwise choose the higher access level of:
   a wildcard database grant (for example grantDatabase('user', '*', 'rw'))
   a database grant on the _system database

For a collection named "bar":
1. Check if there is a specific collection grant for bar; if yes, use the granted access level.
2. Otherwise choose the higher access level of:
   any wildcard access grant in the same database, or on "*/*" (in this example grantCollection('user', 'foo', '*', 'rw'))
   the access level for the current database (in this example grantDatabase('user', 'foo', 'rw'))
   the access level for the _system database

An exception to this are system collections; for them, only the access level for the database is used.

System Collections

The access level for system collections cannot be changed. They follow different rules than user defined collections and may change without further notice. Currently the system collections follow these rules:

collection            access level
_users (in _system)   No Access
_queues               Read-Only
_frontend             Read/Write
*                     same as the database

All other system collections have access level Read/Write if the user has Administrate access to the database. They have access level Read Only if the user has Access to the database.

To modify these system collections you should always use the specialized APIs provided by ArangoDB. For example, no user has access to the _users collection in the _system database. All changes to the access levels must be done using the @arangodb/users module, the /_users/ API or the web interface.

LDAP Users

This feature is only available in the Enterprise Edition.

ArangoDB supports LDAP as an external authentication system. For detailed information please have a look at the LDAP configuration guide.
There are a few differences to normal ArangoDB users:

ArangoDB does not "know" LDAP users before they first authenticate; calls to various APIs using endpoints in _api/users/* will fail until the user first logs in.
Access levels of each user are periodically updated; this happens by default every 5 minutes.
It is not possible to change permissions on LDAP users directly, only on roles.
LDAP users cannot store configuration data per user (this affects, for example, custom settings in the graph viewer).

To grant access for an LDAP user you will need to create roles within the ArangoDB server. A role is just a user with the ":role:" prefix in its name. Role users cannot log in as database users; the ":role:" prefix ensures this. Your LDAP users will need to have at least one role; once a user logs in, they will automatically be granted the union of all access rights of all their roles. Note that a lower access grant in one role will be overridden by a higher access grant in a different role.

Managing Users in the ArangoDB Shell

Please note that, for backward compatibility, the server access levels follow from the database access level on the database _system. Also note that the server and database access levels are represented as:

rw : for Administrate
ro : for Access
none : for No access

This is again for backward compatibility.

Example

Fire up arangosh and require the users module. Use it to create a new user:

arangosh> var users = require('@arangodb/users');
arangosh> users.save('JohnSmith', 'mypassword');

This creates a user called JohnSmith. This user will have no access at all.

arangosh> users.grantDatabase('JohnSmith', 'testdb', 'rw');

This grants the user Administrate access to the database testdb. revokeDatabase will revoke this access level setting.

Note: Be aware that from 3.2 onwards, grantDatabase will not automatically grant users the access level to write or read collections in a database. If you grant access to a database testdb, you will additionally need to explicitly grant access levels to individual collections via grantCollection.

The upgrade procedure from 3.1 to 3.2 sets the wildcard database access level for all users to Administrate and sets the wildcard collection access level for all user/database pairs to Read/Write.

arangosh> users.grantCollection('JohnSmith', 'testdb', 'testcoll', 'rw');

Save

users.save(user, passwd, active, extra)

This will create a new ArangoDB user. The user name must be specified in user and must not be empty.

The password must be given as a string, too, but can be left empty if required. If you pass the special value ARANGODB_DEFAULT_ROOT_PASSWORD, the password will be set to the value stored in the environment variable ARANGODB_DEFAULT_ROOT_PASSWORD. This can be used to pass an instance variable into ArangoDB. For example, the instance identifier from Amazon.

If the active attribute is not specified, it defaults to true. The extra attribute can be used to save custom data with the user.

This method will fail if either the user name or the password is not specified or is given in a wrong format, or if there already exists a user with the specified name.

Note: The user will not have permission to access any database. You need to grant the access rights for one or more databases using grantDatabase.
Examples

arangosh> require('@arangodb/users').save('my-user', 'my-secret-password');

Grant Database

users.grantDatabase(user, database, type)

This grants type ('rw', 'ro' or 'none') access to the database for the user. If database is "*", this sets the wildcard database access level for the user.

The server access level follows from the access level for the database _system.

Revoke Database

users.revokeDatabase(user, database)

This clears the access level setting to the database for the user, and the wildcard database access setting for this user kicks in. In case no wildcard access was defined, the default is No Access. This will also clear the access levels for all the collections in this database.

Grant Collection

users.grantCollection(user, database, collection, type)

This grants type ('rw', 'ro' or 'none') access level to the collection in database for the user. If collection is "*", this sets the wildcard collection access level for the user in the given database.

Revoke Collection

users.revokeCollection(user, database, collection)

This clears the access level setting to the collection for the user. The system will either fall back to the wildcard collection access level or default to No Access.

Replace

users.replace(user, passwd, active, extra)

This will look up an existing ArangoDB user and replace its user data. The user name must be specified in user, and a user with the specified name must already exist in the database.

The password must be given as a string, too, but can be left empty if required. If the active attribute is not specified, it defaults to true. The extra attribute can be used to save custom data with the user.

This method will fail if either the user name or the password is not specified or is given in a wrong format, or if the specified user cannot be found in the database.

Note: this function will not work from within the web interface.

Examples

arangosh> require("@arangodb/users").replace("my-user", "my-changed-password");

Update

users.update(user, passwd, active, extra)

This will update an existing ArangoDB user with a new password and other data. The user name must be specified in user and the user must already exist in the database.

The password must be given as a string, too, but can be left empty if required. If the active attribute is not specified, the current value saved for the user will not be changed. The same is true for the extra attribute.

This method will fail if either the user name or the password is not specified or is given in a wrong format, or if the specified user cannot be found in the database.

Examples

arangosh> require("@arangodb/users").update("my-user", "my-secret-password");

isValid

users.isValid(user, password)

Checks whether the given combination of user name and password is valid. The function returns a boolean value indicating whether the combination is valid.

Each call to this function is penalized by the server sleeping a random amount of time.

Examples

arangosh> require("@arangodb/users").isValid("my-user", "my-secret-password");
true

Remove

users.remove(user)

Removes an existing ArangoDB user from the database. The user name must be specified in user and the specified user must exist in the database.

This method will fail if the user cannot be found in the database.
Examples

arangosh> require("@arangodb/users").remove("my-user");

Document

users.document(user)

Fetches an existing ArangoDB user from the database. The user name must be specified in user.

This method will fail if the user cannot be found in the database.

Examples

arangosh> require("@arangodb/users").document("my-user");

All

users.all()

Fetches all existing ArangoDB users from the database.

Examples

arangosh> require("@arangodb/users").all();

Reload

users.reload()

Reloads the user authentication data on the server.

All user authentication data is loaded by the server once on startup only and is cached after that. When users get added or deleted, a cache flush is done automatically; a flush can also be performed explicitly by calling this method.

Examples

arangosh> require("@arangodb/users").reload();

Permission

users.permission(user, database[, collection])

Fetches the access level to the database or a collection. The user and database name must be specified; optionally you can specify the collection name.

This method will fail if the user cannot be found in the database.

Examples

arangosh> require("@arangodb/users").permission("my-user", "testdb");
rw

Server Configuration

Command-line options

General Options

General help

--help
-h

Prints a list of the most common options available and then exits. In order to see all options use --help-all.

To make use of the startup options from a program, the option --dump-options will print out all options in JSON format and then exit.

Version

--version
-v

Prints the version of the server and exits.

Configuration Files

Options can be specified on the command line or in configuration files. If a string Variable occurs in the value, it is replaced by the corresponding environment variable.

--configuration filename
-c filename

Specifies the name of the configuration file to use.

If this command is not passed to the server, then by default, the server will attempt to first locate a file named ~/.arango/arangod.conf in the user's home directory. If no such file is found, the server will proceed to look for a file arangod.conf in the system configuration directory. The system configuration directory is platform-specific, and may be changed when compiling ArangoDB yourself. It may default to /etc/arangodb or /usr/local/etc/arangodb. This file is installed when using a package manager like rpm or dpkg. If you modify this file and later upgrade to a new version of ArangoDB, then the package manager normally warns you about the conflict. In order to avoid these warnings for small adjustments, you can put local overrides into a file arangod.conf.local.

Only command line options with a value should be set within the configuration file. Command line options which act as flags should be entered on the command line when starting the server.

Each option is specified on a separate line in the form:

key = value

Alternatively, a header section can be specified and options pertaining to that section can be specified in a shorter form:

[log]
level = trace

rather than specifying

log.level = trace

So in general, --section.param value translates to

[section]
param=value
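As an illustrative sketch, a minimal configuration file using this section syntax might look as follows. The option names (server.endpoint, log.level) appear elsewhere in this manual, but the concrete values and the file location are only examples:

# hypothetical fragment of arangod.conf (or arangod.conf.local)
[server]
endpoint = tcp://127.0.0.1:8529

[log]
level = info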
Whitespace around = is ignored in the configuration file. Do not put spaces around additional = characters inside a parameter value, however. The following example shows the correct way to specify a log level of trace for the topic startup:

log.level = startup=trace

Note that there is no whitespace between startup and =, and also none between = and trace.

Where a section occurs multiple times, the last occurrence of a param will become the final value. In case of parameters being vectors, each occurrence adds another item to the vector. Vectors can be identified by the ... in the --help output of the binaries.

Comments can be placed in the configuration file, but only if the line begins with one or more hash symbols (#).

There may be occasions where a configuration file exists and the user wishes to override configuration settings stored in a configuration file. Any settings specified on the command line will overwrite the same setting when it appears in a configuration file. If the user wishes to completely ignore configuration files without necessarily deleting the file (or files), then add the command line option -c none or --configuration none when starting up the server. Note that the word none is case-insensitive.

Operating System Configuration

File Systems (LINUX)

We recommend not to use BTRFS on Linux; it is known to not work well in conjunction with ArangoDB. We have experienced ArangoDB facing latency issues when accessing its database files on BTRFS partitions. In conjunction with BTRFS and AUFS we also saw data loss on restart.

Virtual Memory Page Sizes (LINUX)

By default, ArangoDB uses Jemalloc as the memory allocator. Jemalloc does a good job of reducing virtual memory fragmentation, especially for long-running processes. Unfortunately, some OS configurations can interfere with Jemalloc's ability to function properly. Specifically, Linux's "transparent hugepages", Windows' "large pages" and other similar features sometimes prevent Jemalloc from returning unused memory to the operating system and result in unnecessarily high memory use. Therefore, we recommend disabling these features when using Jemalloc with ArangoDB. Please consult your operating system's documentation for how to do this.

Execute

sudo bash -c "echo madvise >/sys/kernel/mm/transparent_hugepage/enabled"
sudo bash -c "echo madvise >/sys/kernel/mm/transparent_hugepage/defrag"

before executing arangod.

Swap Space (LINUX)

It is recommended to assign swap space for a server that is running arangod. Configuring swap space can prevent the operating system's OOM killer from killing ArangoDB too eagerly on Linux.

Over-Commit Memory

For the MMFiles storage engine, execute

sudo bash -c "echo 0 >/proc/sys/vm/overcommit_memory"

before executing arangod. For the RocksDB storage engine, execute

sudo bash -c "echo 2 >/proc/sys/vm/overcommit_memory"

before starting.

From www.kernel.org:

When this flag is 0, the kernel attempts to estimate the amount of free memory left when userspace requests more memory.
When this flag is 1, the kernel pretends there is always enough memory until it actually runs out.
When this flag is 2, the kernel uses a "never overcommit" policy that attempts to prevent any overcommit of memory.

Note that when using an overcommit_memory setting of 2, this will by default allow processes to use all swap space but only half of the available RAM. This can be changed by adjusting the value of overcommit_ratio as well.

From www.kernel.org:

When overcommit_memory is set to 2, the committed address space is not permitted to exceed swap plus this percentage of physical RAM.
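The echo commands above change the setting only until the next reboot. As a sketch, the value can be persisted in /etc/sysctl.conf (the same file mentioned later in this chapter for vm.max_map_count); pick the value matching your storage engine:

# persist the overcommit setting across reboots (0 for MMFiles, 2 for RocksDB)
echo "vm.overcommit_memory = 2" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p    # reload /etc/sysctl.conf and apply the setting now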
Zone Reclaim

Execute

sudo bash -c "echo 0 >/proc/sys/vm/zone_reclaim_mode"

before executing arangod.

From www.kernel.org:

This value is an OR'ed combination of:

1 = Zone reclaim on
2 = Zone reclaim writes dirty pages out
4 = Zone reclaim swaps pages

NUMA

Multi-processor systems often have Non-Uniform Memory Access (NUMA). ArangoDB should be started with interleaved memory allocation on such systems. This can be achieved using

numactl --interleave=all arangod ...

Max Memory Mappings (LINUX)

Linux kernels by default restrict the maximum number of memory mappings of a single process to about 64K mappings. While this value is sufficient for most workloads, it may be too low for a process that has lots of parallel threads that all require their own memory mappings. In this case all the threads' memory mappings will be accounted to the single arangod process, and the maximum number of 64K mappings may be reached. When the maximum number of mappings is reached, calls to mmap will fail, so the process will think no more memory is available although there may be plenty of RAM left.

To avoid this scenario, it is recommended to raise the default value for the maximum number of memory mappings to a sufficiently high value. As a rule of thumb, one could use 8 times the number of available cores times 8,000. For a 32 core server, a good rule-of-thumb value thus would be 2,048,000 (32 * 8 * 8,000). For certain workloads, it may be sensible to use an even higher value for the number of memory mappings.

To set the value once, use the following command before starting arangod:

sudo bash -c "sysctl -w 'vm.max_map_count=2048000'"

To make the setting durable, it will be necessary to store the adjusted setting in /etc/sysctl.conf or other places that the operating system is looking at.

Environment Variables (LINUX)

It is recommended to set the environment variable GLIBCXX_FORCE_NEW to 1 on systems that use glibc++ in order to disable the memory pooling built into glibc++. That memory pooling is unnecessary because Jemalloc will already do memory pooling. Execute

export GLIBCXX_FORCE_NEW=1

before starting arangod.

32bit

While it is possible to compile ArangoDB on a 32bit system, this is not a recommended environment. 64bit systems can address a significantly bigger memory region.

Managing Endpoints

The ArangoDB server can listen for incoming requests on multiple endpoints. The endpoints are normally specified either in ArangoDB's configuration file or on the command-line, using the --server.endpoint option. ArangoDB supports different types of endpoints:

tcp://ipv4-address:port - TCP/IP endpoint, using IPv4
tcp://[ipv6-address]:port - TCP/IP endpoint, using IPv6
ssl://ipv4-address:port - TCP/IP endpoint, using IPv4, SSL encryption
ssl://[ipv6-address]:port - TCP/IP endpoint, using IPv6, SSL encryption
unix:///path/to/socket - Unix domain socket endpoint

If a TCP/IP endpoint is specified without a port number, then the default port (8529) will be used. If multiple endpoints need to be used, the option can be repeated multiple times.

The default endpoint for ArangoDB is tcp://127.0.0.1:8529 or tcp://localhost:8529.
EXAMPLES

unix> ./arangod --server.endpoint tcp://127.0.0.1:8529 --server.endpoint ssl://127.0.0.1:8530 --ssl.keyfile server.pem /tmp/vocbase
2012-07-26T07:07:47Z [8161] INFO using SSL protocol version 'TLSv1'
2012-07-26T07:07:48Z [8161] INFO using endpoint 'ssl://127.0.0.1:8530' for http ssl requests
2012-07-26T07:07:48Z [8161] INFO using endpoint 'tcp://127.0.0.1:8529' for http tcp requests
2012-07-26T07:07:49Z [8161] INFO ArangoDB (version 1.1.alpha) is ready for business
2012-07-26T07:07:49Z [8161] INFO Have Fun!

TCP Endpoints

Given a hostname:

--server.endpoint tcp://hostname:port

Given an IPv4 address:

--server.endpoint tcp://ipv4-address:port

Given an IPv6 address:

--server.endpoint tcp://[ipv6-address]:port

On one specific ethernet interface, each port can only be bound exactly once. You can look up your available interfaces using the ifconfig command on Linux / MacOSX; the Windows equivalent is ipconfig (see Wikipedia for more details). The general names of the interfaces differ between operating systems and the hardware they run on. However, typically every host has a so-called loopback interface, which is a virtual interface. By convention it always has the address 127.0.0.1 or ::1 (IPv6), and can only be reached from exactly the very same host. Ethernet interfaces usually have names like eth0, wlan0, eth1:17, le0 or a plain text name in Windows.

To find out which services already use ports (so ArangoDB can't bind them anymore), you can use the netstat command (it behaves a little differently on each platform; run it with -lnpt on Linux, -p tcp on MacOSX or with -an on Windows for valuable information).

ArangoDB can also do a so-called broadcast bind using tcp://0.0.0.0:8529. This way it will be reachable on all interfaces of the host. This may be useful on development systems that frequently change their network setup, like laptops.

Special note on IPv6 link-local addresses

ArangoDB can also listen on IPv6 link-local addresses by adding the zone ID to the IPv6 address, in the form [ipv6-link-local-address%zone-id]. However, what you probably want instead is to bind to a local IPv6 address. Local IPv6 addresses start with fd. If you only see an IPv6 address starting with fe80: in your interface configuration but no IPv6 address starting with fd, your interface has no local IPv6 address assigned. You can read more about IPv6 link-local addresses here.

Example

Bind to a link-local and local IPv6 address.

unix> ifconfig

This command lists all interfaces and assigned IP addresses. The link-local address may be fe80::6257:18ff:fe82:3ec6%eth0 (IPv6 address plus interface name). A local IPv6 address may be fd12:3456::789a. To bind ArangoDB to it, start arangod with --server.endpoint tcp://[fe80::6257:18ff:fe82:3ec6%eth0]:8529.

Use telnet to test the connection.

unix> telnet fe80::6257:18ff:fe82:3ec6%eth0 8529
Trying fe80::6257:18ff:fe82:3ec6...
Connected to my-machine.
Escape character is '^]'.
GET / HTTP/1.1

HTTP/1.1 301 Moved Permanently
Location: /_db/_system/_admin/aardvark/index.html
Content-Type: text/html
Server: ArangoDB
Connection: Keep-Alive
Content-Length: 197

Moved
This page has moved to /_db/_system/_admin/aardvark/index.html.
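If telnet is not available, an equivalent check can be made with curl. This is a sketch of the same request against the default endpoint, which should produce the same 301 redirect shown above:

# prints the status line and response headers of the redirect
unix> curl -i http://127.0.0.1:8529/
HTTP/1.1 301 Moved Permanently
Location: /_db/_system/_admin/aardvark/index.html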
Reuse address

--tcp.reuse-address

If this boolean option is set to true, then the socket option SO_REUSEADDR is set on all server endpoints, which is the default. If this option is set to false, it is possible that it takes up to a minute after a server has terminated until it is possible for a new server to use the same endpoint again. This is why this is activated by default.

Please note however that under some operating systems this can be a security risk, because it might be possible for another process to bind to the same address and port, possibly hijacking network traffic. Under Windows, ArangoDB additionally sets the flag SO_EXCLUSIVEADDRUSE as a measure to alleviate this problem.

Backlog size

--tcp.backlog-size

Allows to specify the size of the backlog for the listen system call. The default value is 10. The maximum value is platform-dependent. Specifying a higher value than defined in the system header's SOMAXCONN may result in a warning on server start. The actual value used by listen may also be silently truncated on some platforms (this happens inside the listen system call).

SSL Configuration

SSL Endpoints

Given a hostname:

--server.endpoint ssl://hostname:port

Given an IPv4 address:

--server.endpoint ssl://ipv4-address:port

Given an IPv6 address:

--server.endpoint ssl://[ipv6-address]:port

Note: If you are using SSL-encrypted endpoints, you must also supply the path to a server certificate using the --ssl.keyfile option.

Keyfile

--ssl.keyfile filename

If SSL encryption is used, this option must be used to specify the filename of the server private key. The file must be PEM formatted and contain both the certificate and the server's private key.

The file specified by filename can be generated using openssl:

# create private key in file "server.key"
openssl genrsa -des3 -out server.key 1024

# create certificate signing request (csr) in file "server.csr"
openssl req -new -key server.key -out server.csr

# copy away original private key to "server.key.org"
cp server.key server.key.org

# remove passphrase from the private key
openssl rsa -in server.key.org -out server.key

# sign the csr with the key, creates certificate PEM file "server.crt"
openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt

# combine certificate and key into single PEM file "server.pem"
cat server.crt server.key > server.pem

You may use certificates issued by a Certificate Authority or self-signed certificates. Self-signed certificates can be created by a tool of your choice, for example with the OpenSSL commands shown above. The resulting keyfile combines the certificate and the private key and has the following structure:

-----BEGIN CERTIFICATE-----
(base64 encoded certificate)
-----END CERTIFICATE-----
-----BEGIN RSA PRIVATE KEY-----
(base64 encoded private key)
-----END RSA PRIVATE KEY-----

For further information please check the manuals of the tools you use to create the certificate.

CA File

--ssl.cafile filename

This option can be used to specify a file with CA certificates that are sent to the client whenever the server requests a client certificate. If the file is specified, the server will only accept client requests with certificates issued by these CAs. Do not specify this option if you want clients to be able to connect without specific certificates.

The certificates in filename must be PEM formatted.
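As a sketch of how the keyfile and CA file options fit together when starting the server: server.pem is the file produced by the openssl commands above, while ca.crt is a hypothetical CA certificate bundle of your own.

# ca.crt is an assumed file name; /tmp/vocbase is the database directory as in the earlier example
unix> ./arangod --server.endpoint ssl://127.0.0.1:8530 --ssl.keyfile server.pem --ssl.cafile ca.crt /tmp/vocbase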
SSL protocol

--ssl.protocol value

Use this option to specify the default encryption protocol to be used. The following variants are available:

1: SSLv2
2: SSLv2 or SSLv3 (negotiated)
3: SSLv3
4: TLSv1
5: TLSv1.2

The default value is 5 (TLSv1.2).

SSL cache

--ssl.session-cache value

Set to true if SSL session caching should be used. value has a default value of false (i.e. no caching).

SSL peer certificate

This feature is available in the Enterprise Edition.

--ssl.require-peer-certificate

Require a peer certificate from the client before connecting.

SSL options

--ssl.options value

This option can be used to set various SSL-related options. Individual option values must be combined using bitwise OR. Which options are available on your platform is determined by the OpenSSL version you use. The list of options available on your platform might be retrieved by the following shell command:

> grep "#define SSL_OP_.*" /usr/include/openssl/ssl.h

#define SSL_OP_MICROSOFT_SESS_ID_BUG 0x00000001L
#define SSL_OP_NETSCAPE_CHALLENGE_BUG 0x00000002L
#define SSL_OP_LEGACY_SERVER_CONNECT 0x00000004L
#define SSL_OP_NETSCAPE_REUSE_CIPHER_CHANGE_BUG 0x00000008L
#define SSL_OP_SSLREF2_REUSE_CERT_TYPE_BUG 0x00000010L
#define SSL_OP_MICROSOFT_BIG_SSLV3_BUFFER 0x00000020L
...

A description of the options can be found online in the OpenSSL documentation.

SSL cipher

--ssl.cipher-list cipher-list

This option can be used to restrict the server to certain SSL ciphers only, and to define the relative usage preference of SSL ciphers. The format of cipher-list is documented in the OpenSSL documentation.

To check which ciphers are available on your platform, you may use the following shell command:

> openssl ciphers -v

ECDHE-RSA-AES256-SHA    SSLv3 Kx=ECDH Au=RSA   Enc=AES(256)      Mac=SHA1
ECDHE-ECDSA-AES256-SHA  SSLv3 Kx=ECDH Au=ECDSA Enc=AES(256)      Mac=SHA1
DHE-RSA-AES256-SHA      SSLv3 Kx=DH   Au=RSA   Enc=AES(256)      Mac=SHA1
DHE-DSS-AES256-SHA      SSLv3 Kx=DH   Au=DSS   Enc=AES(256)      Mac=SHA1
DHE-RSA-CAMELLIA256-SHA SSLv3 Kx=DH   Au=RSA   Enc=Camellia(256) Mac=SHA1
...

The default value for cipher-list is "ALL".

LDAP

This feature is only available in the Enterprise Edition.

Basic Concepts

The basic idea is that one can keep the user authentication setup for an ArangoDB instance (single or cluster) outside of ArangoDB in an LDAP server. A crucial feature of this is that one can add and withdraw users and permissions by only changing the LDAP server and in particular without touching the ArangoDB instance. Changes will be effective in ArangoDB within a few minutes.

Since there are many different possible LDAP setups, we must support a variety of possibilities for authentication and authorization. Here is a short overview:

To map ArangoDB user names to LDAP users there are two authentication methods called "simple" and "search". In the "simple" method the LDAP bind user is derived from the ArangoDB user name by prepending a prefix and appending a suffix. For example, a user "alice" could be mapped to the distinguished name uid=alice,dc=arangodb,dc=com to perform the LDAP bind and authentication. See Simple authentication method below for details and configuration options.

In the "search" method there are two phases. In Phase 1 a generic read-only admin LDAP user account is used to bind to the LDAP server first and search for an LDAP user matching the ArangoDB user name. In Phase 2, the actual authentication is then performed against the LDAP user that was found in Phase 1. Both methods are sensible and are recommended for use in production. See Search authentication method below for details and configuration options.
Once the user is authenticated, there are two methods for authorization: (a) "roles attribute" and (b) "roles search".

In method (a) ArangoDB acquires a list of roles the authenticated LDAP user has from the LDAP server. The actual access rights to databases and collections for these roles are configured in ArangoDB itself. The user effectively has the union of all access rights of all their roles. This method is probably the most common one for production use cases. It combines the advantages of managing users and roles outside of ArangoDB in the LDAP server with the fine-grained access control within ArangoDB for the individual roles. See Roles attribute below for details about method (a) and for the associated configuration options.

Method (b) is very similar and only differs from (a) in the way the actual list of roles of a user is derived from the LDAP server. See Roles search below for details about method (b) and for the associated configuration options.

Fundamental options

The fundamental options for specifying how to access the LDAP server are the following:

--ldap.enabled : this is a boolean option which must be set to true to activate the LDAP feature
--ldap.server : is a string specifying the host name or IP address of the LDAP server
--ldap.port : is an integer specifying the port the LDAP server is running on; the default is 389
--ldap.basedn : specifies the base distinguished name under which the search takes place (can alternatively be set via --ldap.url)
--ldap.binddn and --ldap.bindpasswd : are the distinguished name and password for a read-only LDAP user to which ArangoDB can bind to search the LDAP server. Note that it is necessary to configure these for both the "simple" and "search" authentication methods, since even in the "simple" method, ArangoDB occasionally has to refresh the authorization information from the LDAP server even if the user session persists and no new authentication is needed! It is, however, allowed to leave both empty, but then the LDAP server must be readable with anonymous access.
--ldap.refresh-rate : is a floating point value in seconds. The default is 300, which means that ArangoDB will refresh the authorization information for authenticated users after at most 5 minutes. This means that changes in the LDAP server, like removed users or added or removed roles for a user, will be effective after at most 5 minutes.

Note that the --ldap.server and --ldap.port options can alternatively be specified in the --ldap.url string together with other configuration options. For details see Section "LDAP URLs" below.

Here is an example on how to configure the connection to the LDAP server, with anonymous bind:

--ldap.enabled=true \
--ldap.server=ldap.arangodb.com \
--ldap.basedn=dc=arangodb,dc=com

With this configuration ArangoDB binds anonymously to the LDAP server on host ldap.arangodb.com on the default port 389 and executes all searches under the base distinguished name dc=arangodb,dc=com.

If we need a user to read in LDAP, here is the example for it:

--ldap.enabled=true \
--ldap.server=ldap.arangodb.com \
--ldap.basedn=dc=arangodb,dc=com \
--ldap.binddn=uid=arangoadmin,dc=arangodb,dc=com \
--ldap.bindpasswd=supersecretpassword

The connection is identical, but the searches will be executed with the given distinguished name in binddn.

Note here: the given user (or the anonymous one) needs at least read access on all user objects to find them, and in the case of Roles search also read access on the objects storing the roles.
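The same options can also be placed in the server's configuration file using the [section] syntax described in the Command-line options chapter. A sketch using the example values from above:

# hypothetical fragment of arangod.conf
[ldap]
enabled = true
server = ldap.arangodb.com
basedn = dc=arangodb,dc=com
binddn = uid=arangoadmin,dc=arangodb,dc=com
bindpasswd = supersecretpassword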
Up to this point ArangoDB can now connect to a given LDAP server, but it is not yet able to authenticate users properly with it. For this, pick one of the following two authentication methods.

LDAP URLs

As an alternative, one can specify the values of multiple LDAP-related configuration options by specifying a single LDAP URL. Here is an example:

--ldap.url ldap://ldap.arangodb.com:1234/dc=arangodb,dc=com?uid?sub

This one option has the combined effect of setting the following:

--ldap.server=ldap.arangodb.com \
--ldap.port=1234 \
--ldap.basedn=dc=arangodb,dc=com \
--ldap.searchAttribute=uid \
--ldap.searchScope=sub

That is, the LDAP URL consists of the LDAP server and port, a basedn, a search attribute and a scope, which can be one of base, one or sub. There is also the possibility to use the ldaps protocol as in:

--ldap.url ldaps://ldap.arangodb.com:636/dc=arangodb,dc=com?uid?sub

This does exactly the same as the one above, except that it uses the LDAP over TLS protocol. This is a non-standard method which does not involve using the STARTTLS protocol. Note that this does not work in the Windows version! We suggest to use the protocol ldap and STARTTLS as described in the next section.

TLS options

TLS is not supported in the Windows version of ArangoDB!

To configure the usage of encrypted TLS to communicate with the LDAP server, the following options are available:

--ldap.tls : the main switch to activate TLS. It can either be true (use TLS) or false (do not use TLS). It is switched off by default. If you switch this on and do not use the ldaps protocol via the LDAP URL, then ArangoDB will use the STARTTLS protocol to initiate TLS. This is the recommended approach.
--ldap.tls-version : the minimal TLS version that ArangoDB should accept. Available versions are 1.0, 1.1 and 1.2. The default is 1.2. If your LDAP server does not support version 1.2, you have to change this setting.
--ldap.tls-cert-check-strategy : strategy to validate the LDAP server certificate. Available strategies are never, hard, demand, allow and try. The default is hard.
--ldap.tls-cacert-file : a file path to one or more (concatenated) certificate authority certificates in PEM format. By default no file path is configured. This certificate is used to validate the server response.
--ldap.tls-cacert-dir : a directory path to certificate authority certificates in c_rehash format. By default no directory path is configured.

Assuming you have the TLS CAcert file that is given to the server at /path/to/certificate.pem, here is an example on how to configure TLS:

--ldap.tls true \
--ldap.tls-cacert-file /path/to/certificate.pem

You can use TLS with any of the following authentication mechanisms.

Esoteric options

The following options can be used to configure advanced options for LDAP connectivity:

--ldap.serialized : whether or not calls into the underlying LDAP library should be serialized. This option can be used to work around thread-unsafe LDAP library functionality.
--ldap.serialize-timeout : sets the timeout value that is used when waiting to enter the LDAP library call serialization lock. This is only meaningful when --ldap.serialized has been set to true.
--ldap.retries : number of tries to attempt a connection. Setting this to values greater than one will make ArangoDB retry to contact the LDAP server in case no connection can be made initially.
Please note that some of the following options are platform-specific and may not work with all LDAP servers reliably:

--ldap.restart: whether or not the LDAP library should implicitly restart connections.
--ldap.referrals: whether or not the LDAP library should implicitly chase referrals.

The following options can be used to adjust the LDAP configuration on Linux and macOS platforms only, but will not work on Windows:

--ldap.debug: turn on internal OpenLDAP library output (warning: will print to stdout).
--ldap.timeout: timeout value (in seconds) for synchronous LDAP API calls (a value of 0 means default timeout).
--ldap.network-timeout: timeout value (in seconds) after which network operations following the initial connection return in case of no activity (a value of 0 means default timeout).
--ldap.async-connect: whether or not the connection to the LDAP library will be done asynchronously.

Authentication methods

In order to authenticate users in LDAP we have two options available. We need to pick exactly one of them.

Simple authentication method

The simple authentication method is used if and only if both the --ldap.prefix and --ldap.suffix configuration options are specified and are non-empty. In all other cases the "search" authentication method is used.

In the "simple" method the LDAP bind user is derived from the ArangoDB user name by prepending the value of the --ldap.prefix configuration option and by appending the value of the --ldap.suffix configuration option. For example, an ArangoDB user "alice" would be mapped to the distinguished name uid=alice,dc=arangodb,dc=com to perform the LDAP bind and authentication, if --ldap.prefix is set to uid= and --ldap.suffix is set to ,dc=arangodb,dc=com.

ArangoDB binds to the LDAP server and authenticates with the distinguished name and the password provided by the client. If the LDAP server successfully verifies the password, then the user is authenticated.

If you want to use this method, add the following example to your ArangoDB configuration together with the fundamental configuration:

--ldap.prefix uid= \
--ldap.suffix ,dc=arangodb,dc=com

This method will authenticate an LDAP user with the distinguished name {PREFIX}{USERNAME}{SUFFIX}; in this case, for the ArangoDB user alice, it will search for uid=alice,dc=arangodb,dc=com. This distinguished name will be used as {USER} for the roles later on.

Search authentication method

The search authentication method is used if at least one of the two options --ldap.prefix and --ldap.suffix is empty or not specified. ArangoDB uses the LDAP user credentials given by --ldap.binddn and --ldap.bindpasswd to perform a search for LDAP users. In this case, the values of the options --ldap.basedn, --ldap.search-attribute, --ldap.search-filter and --ldap.search-scope are used in the following way:

--ldap.search-scope: an LDAP search scope with possible values base (just search the base distinguished name), sub (recursive search under the base distinguished name) or one (search the base's immediate children) (default: sub).
--ldap.search-filter: an LDAP filter expression which limits the set of LDAP users being considered (default: objectClass=*, which means all objects).
--ldap.search-attribute: specifies the attribute in the user objects which is used to match the ArangoDB user name (default: uid).

Here is an example of how to configure the search method.
Assume we have users like the following stored in LDAP:

dn: uid=alice,dc=arangodb,dc=com
uid: alice
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: top
objectClass: person

Here uid is the username used in ArangoDB, and we only search for objects of type person, so we can add the following to our fundamental LDAP configuration:

--ldap.search-attribute=uid \
--ldap.search-filter=objectClass=person

This will use the sub search scope by default and will find all person objects where the uid is equal to the given username. From these the dn will be extracted and used as {USER} in the roles later on.

Fetching roles for a user

After authentication, the next step is to derive authorization information from the authenticated LDAP user. In order to fetch the roles and thereby the access rights for a user we again have two possible options and need to pick one of them. We can combine each authentication method with each role method.

In any case a user can have no role or more than one. If a user has no role, the user will not get any access to ArangoDB at all. If a user has multiple roles with different rights, then the rights will be combined and the strongest right will win. Example:

alice has the roles project-a and project-b.
project-a has no access to collection BData.
project-b has rw access to collection BData.
Hence alice will have rw on BData.

Note that the actual database and collection access rights will be configured in ArangoDB itself by roles in the users module. The role name is always prefixed with :role:, e.g. :role:project-a and :role:project-b respectively. You can use the normal user permissions tools in the web interface or arangosh to configure these.

Roles attribute

The most important method for this is to read off the roles an LDAP user is associated with from an attribute in the LDAP user object. If the --ldap.roles-attribute-name configuration option is set, then the value of that option is the name of the attribute being used. Here is the example to add to the overall configuration:

--ldap.roles-attribute-name=role

If we have the user stored like the following in LDAP:

dn: uid=alice,dc=arangodb,dc=com
uid: alice
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: top
objectClass: person
role: project-a
role: project-b

then the request will grant the roles project-a and project-b for the user alice after successful authentication, as they are stored on the user object.

Roles search

An alternative method for authorization is to conduct a search in the LDAP server for LDAP objects representing roles a user has. If the configuration option

--ldap.roles-search=<search-expression>

is given, then the string {USER} in <search-expression> is replaced with the distinguished name of the authenticated LDAP user and the resulting search expression is used to match distinguished names of LDAP objects representing roles of that user. Example:

--ldap.roles-search '(&(objectClass=groupOfUniqueNames)(uniqueMember={USER}))'

After an LDAP user was found and authenticated as described in the authentication section above, the {USER} in the search expression will be replaced by its distinguished name, e.g.
uid=alice,dc=arangodb,dc=com, and thus with the above search expression the actual search expression would end up being:

(&(objectClass=groupOfUniqueNames)(uniqueMember=uid=alice,dc=arangodb,dc=com))

This search will find all objects of groupOfUniqueNames where at least one uniqueMember has the dn of alice. The list of results of that search would be the list of roles given by the values of the dn attributes of the found role objects.

Role transformations and filters

For both of the above authorization methods there are further configuration options to tune the role lookup. In this section we describe these further options:

--ldap.roles-include can be used to specify a regular expression that is used to filter roles. Only roles that match the regular expression are used.
--ldap.roles-exclude can be used to specify a regular expression that is used to filter roles. Only roles that do not match the regular expression are used.
--ldap.roles-transformation can be used to specify a regular expression and replacement text as /re/text/. This regular expression is applied to the role name found. This is especially useful in the roles-search variant to extract the real role name out of the dn value.
--ldap.superuser-role can be used to specify the role associated with the superuser. Any user belonging to this role gains superuser status. This role is checked after applying the roles-transformation expression.

Example:

--ldap.roles-include "^arangodb"

will only consider roles that start with arangodb.

--ldap.roles-exclude=disabled

will only consider roles that do not contain the word disabled.

--ldap.superuser-role "arangodb-admin"

anyone belonging to the group "arangodb-admin" will become a superuser.

The roles-transformation deserves a larger example. Assume we are using roles search and have stored roles in the following way:

dn: cn=project-a,dc=arangodb,dc=com
objectClass: top
objectClass: groupOfUniqueNames
uniqueMember: uid=alice,dc=arangodb,dc=com
uniqueMember: uid=bob,dc=arangodb,dc=com
cn: project-a
description: Internal project A

dn: cn=project-b,dc=arangodb,dc=com
objectClass: top
objectClass: groupOfUniqueNames
uniqueMember: uid=alice,dc=arangodb,dc=com
uniqueMember: uid=charlie,dc=arangodb,dc=com
cn: project-b
description: External project B

In this case we will find cn=project-a,dc=arangodb,dc=com as one role of alice. However, we actually want to configure a role name :role:project-a, which is easier to read and maintain for our administrators. If we now apply the following transformation:

--ldap.roles-transformation=/^cn=([^,]*),.*$/$1/

the regex will extract project-a resp. project-b out of the dn attribute.

In combination with the superuser-role we could make all project-a members ArangoDB admins by using:

--ldap.roles-transformation=/^cn=([^,]*),.*$/$1/ \
--ldap.superuser-role=project-a

Complete configuration examples

In this section we would like to present complete examples for a successful LDAP configuration of ArangoDB. All of the following are just combinations of the details described above.

Simple authentication with role search, using anonymous LDAP user

This example connects to the LDAP server with an anonymous read-only user. We use the simple authentication mode (prefix + suffix) to authenticate users and apply a role search for groupOfUniqueNames objects where the user is a uniqueMember. Furthermore we extract only the cn out of the distinguished role name.
--ldap.enabled=true \
--ldap.server=ldap.arangodb.com \
--ldap.basedn=dc=arangodb,dc=com \
--ldap.prefix uid= \
--ldap.suffix ,dc=arangodb,dc=com \
--ldap.roles-search '(&(objectClass=groupOfUniqueNames)(uniqueMember={USER}))' \
--ldap.roles-transformation=/^cn=([^,]*),.*$/$1/ \
--ldap.superuser-role=project-a

Search authentication with roles attribute, using an LDAP admin user and TLS enabled

This example connects to the LDAP server with a given distinguished name of an admin user plus password. Furthermore we activate TLS and give the certificate file to validate server responses. We use the search authentication, searching for the uid attribute of person objects. These person objects have role attribute(s) containing the role(s) of a user.

--ldap.enabled=true \
--ldap.server=ldap.arangodb.com \
--ldap.basedn=dc=arangodb,dc=com \
--ldap.binddn=uid=arangoadmin,dc=arangodb,dc=com \
--ldap.bindpasswd=supersecretpassword \
--ldap.tls true \
--ldap.tls-cacert-file /path/to/certificate.pem \
--ldap.search-attribute=uid \
--ldap.search-filter=objectClass=person \
--ldap.roles-attribute-name=role

Command-Line Options for Logging

Log levels and topics

ArangoDB's log output is grouped into topics. --log.level can be specified multiple times at startup, for as many topics as needed. The log verbosity and output files can be adjusted per log topic. For example

--log.level startup=trace --log.level queries=trace --log.level info

will log messages concerning startup at trace level, AQL queries at trace level and everything else at info level. In a configuration file, it is written like this:

[log]
level = startup=trace
level = queries=trace
level = info

Note that there must not be any whitespace around the second =.

The available log levels are:

fatal: only logs fatal errors
error: only logs errors
warning: only logs warnings and errors
info: logs information messages, warnings and errors
debug: logs debug and information messages, warnings and errors
trace: logs trace, debug and information messages, warnings and errors

Note that the levels debug and trace will be very verbose.

Some relevant log topics available in ArangoDB 3 are:

agency: information about the agency
collector: information about the WAL collector's state
compactor: information about the collection datafile compactor
datafiles: datafile-related operations
mmap: information about memory-mapping operations (including msync)
performance: performance-related messages
queries: executed AQL queries, slow queries
replication: replication-related info
requests: HTTP requests
startup: information about server startup and shutdown
threads: information about threads

Log outputs

The option --log.output <definition> allows directing the global or per-topic log output to different outputs. The output definition <definition> can be one of

- for stdin
+ for stderr
syslog://<syslog-facility>
syslog://<syslog-facility>/<application-name>
file://<relative-path>

The option can be specified multiple times in order to configure the output for different log topics. To set up a per-topic output configuration, use --log.output <topic>=<definition>, e.g.

queries=file://queries.txt

logs all queries to the file "queries.txt".

The old option --log.file is still available in 3.0 for convenience reasons. It is a shortcut for the more general option --log.output file://filename.

The old option --log.requests-file is still available in 3.0. It is now a shortcut for the more general option --log.output requests=file://....
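The same per-topic routing can also be kept in the configuration file. A small sketch, assuming the [log] section accepts output keys with the same <topic>=<definition> syntax as the command line (the file name is a placeholder):

[log]
level = info
level = queries=debug
output = +
output = queries=file://queries.txt

Here everything is written to stderr at info level, while the queries topic is raised to debug and additionally written to its own file.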
--log.output also allows directing log output to different files based on topics. For example, to log all AQL queries to a file "queries.log" one can use the options: --log.level queries=trace --log.output queries=file:///path/to/queries.log To additionally log HTTP request to a file named "requests.log" add the options: --log.level requests=info --log.output requests=file:///path/to/requests.log Forcing direct output The option --log.force-direct can be used to disable logging in an extra logging thread. If set to true , any log messages are immediately printed in the thread that triggered the log message. This is non-optimal for performance but can aid debugging. If set to false , log messages are handed off to an extra logging thread, which asynchronously writes the log messages. Local time Log dates and times in local time zone: --log.use-local-time If specified, all dates and times in log messages will use the server's local time-zone. If not specified, all dates and times in log messages will be printed in UTC / Zulu time. The date and time format used in logs is always UTC time is used, a Z YYYY-MM-DD HH:MM:SS , regardless of this setting. If will be appended to indicate Zulu time. Color logging --log.color value Logging to terminal output is by default colored. Colorful logging can be turned off by setting the value to false. Source file and Line number Log line number: --log.line-number Normally, if an human readable fatal, error, warning or info message is logged, no information about the file and line number is provided. The file and line number is only logged for debug and trace message. This option can be use to always log these pieces of information. Prefix Log prefix: --log.prefix prefix This option is used specify an prefix to logged text. Threads Log thread identifier: --log.thread true Whenever log output is generated, the process ID is written as part of the log information. Setting this option appends the thread id of the calling thread to the process id. For example, 2010-09-20T13:04:01Z [19355] INFO ready for business 440 Logging Options when no thread is logged and 2010-09-20T13:04:17Z [19371-18446744072487317056] ready for business when this command line option is set. To also log thread names, it is possible to set the option. By default --log.thread-name --log.thread-name is set to false . Role Log role: --log.role true When set to true , this option will make the ArangoDB logger print a single character with the server's role into each logged message. The roles are: U: undefined/unclear (used at startup) S: single server C: coordinator P: primary A: agent The default value for this option is false , so no roles will be logged. 441 General Options General Options Database Upgrade --database.auto-upgrade Specifying this option will make the server perform a database upgrade instead of starting the server normally. A database upgrade will first compare the version number stored in the file VERSION in the database directory with the current server version. If the version number found in the database directory is higher than the version number the server is running, the server expects this is an unintentional downgrade and will warn about this. Using the server in these conditions is neither recommended nor supported. If the version number found in the database directory is lower than the version number the server is running, the server will check whether there are any upgrade tasks to perform. It will then execute all required upgrade tasks and print their statuses. 
If one of the upgrade tasks fails, the server will exit with an error. Re-starting the server with the upgrade option will then again trigger the upgrade check and execution until the problem is fixed. Whether or not this option is specified, the server will always perform a version check on startup. Running the server with a nonmatching version number in the VERSION file will make the server refuse to start. Storage Engine As of ArangoDB 3.2 two storage engines are supported. The "traditional" engine is called MMFiles , which is also the default storage engine. An alternative engine based on RocksDB is also provided and can be turned on manually. One storage engine type is supported per server per installation. Live switching of storage engines on already installed systems isn't supported. Configuring the wrong engine (not matching the previously used one) will result in the server refusing to start. You may however use auto to let ArangoDB choose the previously used one. --server.storage-engine [auto|mmfiles|rocksdb] Daemon --daemon Runs the server as a daemon (as a background process). This parameter can only be set if the pid (process id) file is specified. That is, unless a value to the parameter pid-file is given, then the server will report an error and exit. Default Language --default-language default-language The default language ist used for sorting and comparing strings. The language value is a two-letter language code (ISO-639) or it is composed by a two-letter language code with and a two letter country code (ISO-3166). Valid languages are "de", "en", "en_US" or "en_UK". The default default-language is set to be the system locale on that platform. Supervisor --supervisor Executes the server in supervisor mode. In the event that the server unexpectedly terminates due to an internal error, the supervisor will automatically restart the server. Setting this flag automatically implies that the server will run as a daemon. Note that, as with the daemon flag, this flag requires that the pid-file parameter will set. unix> ./arangod --supervisor --pid-file /var/run/arangodb.pid /tmp/vocbase/ 2012-06-27T15:58:28Z [10133] INFO starting up in supervisor mode As can be seen (e.g. by executing the ps command), this will start a supervisor process and the actual database process: 442 General Options unix> ps fax | grep arangod 10137 ? Ssl 0:00 ./arangod --supervisor --pid-file /var/run/arangodb.pid /tmp/vocbase/ 10142 ? Sl 0:00 \_ ./arangod --supervisor --pid-file /var/run/arangodb.pid /tmp/vocbase/ When the database process terminates unexpectedly, the supervisor process will start up a new database process: > kill -SIGSEGV 10142 > ps fax | grep arangod 10137 ? Ssl 0:00 ./arangod --supervisor --pid-file /var/run/arangodb.pid /tmp/vocbase/ 10168 ? Sl 0:00 \_ ./arangod --supervisor --pid-file /var/run/arangodb.pid /tmp/vocbase/ User identity --uid uid The name (identity) of the user the server will run as. If this parameter is not specified, the server will not attempt to change its UID, so that the UID used by the server will be the same as the UID of the user who started the server. If this parameter is specified, then the server will change its UID after opening ports and reading configuration files, but before accepting connections or opening other files (such as recovery files). This is useful when the server must be started with raised privileges (in certain environments) but security considerations require that these privileges be dropped once the server has started work. 
Observe that this parameter cannot be used to bypass operating system security. In general, this parameter (and its corresponding relative gid) can lower privileges but not raise them. Group identity --gid gid The name (identity) of the group the server will run as. If this parameter is not specified, then the server will not attempt to change its GID, so that the GID the server runs as will be the primary group of the user who started the server. If this parameter is specified, then the server will change its GID after opening ports and reading configuration files, but before accepting connections or opening other files (such as recovery files). This parameter is related to the parameter uid. Process identity --pid-file filename The name of the process ID file to use when running the server as a daemon. This parameter must be specified if either the flag daemon or supervisor is set. Check max memory mappings --server.check-max-memory-mappings can be used on Linux to make arangod check the number of memory mappings currently used by the process (as reported in /proc//maps) and compare it with the maximum number of allowed mappings as determined by /proc/sys/vm/max_map_count. If the current number of memory mappings gets near the maximum allowed value, arangod will log a warning and disallow the creation of further V8 contexts temporarily until the current number of mappings goes down again. If the option is set to false, no such checks will be performed. All non-Linux operating systems do not provide this option and will ignore it. Console --console Runs the server in an exclusive emergency console mode. When starting the server with this option, the server is started with an interactive JavaScript emergency console, with all networking and HTTP interfaces of the server disabled. No requests can be made to the server in this mode, and the only way to work with the server in this mode is by using the emergency console. Note that the server cannot be started in this mode if it is already running in this or another mode. 443 General Options Random Generator --random.generator arg The argument is an integer (1,2,3 or 4) which sets the manner in which random numbers are generated. The default method (3) is to use the a non-blocking random (or pseudorandom) number generator supplied by the operating system. Specifying an argument of 2, uses a blocking random (or pseudorandom) number generator. Specifying an argument 1 sets a pseudorandom number generator using an implication of the M ersenne Twister M T19937 algorithm. Algorithm 4 is a combination of the blocking random number generator and the M ersenne Twister. Enable/disable authentication --server.authentication Setting this option to false will turn off authentication on the server side so all clients can execute any action without authorization and privilege checks. The default value is true. JWT Secret --server.jwt-secret secret ArangoDB will use JWTs to authenticate requests. Using this option let's you specify a JWT. When specified, the JWT secret must be at most 64 bytes long. In single server setups and when not specifying this secret ArangoDB will generate a secret. In cluster deployments which have authentication enabled a secret must be set consistently across all cluster nodes so they can talk to each other. Enable/disable authentication for UNIX domain sockets --server.authentication-unix-sockets value Setting value to true will turn off authentication on the server side for requests coming in via UNIX domain sockets. 
With this flag enabled, clients located on the same host as the ArangoDB server can use UNIX domain sockets to connect to the server without authentication. Requests coming in by other means (e.g. TCP/IP) are not affected by this option. The default value is false. Note: this option is only available on platforms that support UNIX domain sockets. Enable/disable authentication for system API requests only --server.authentication-system-only boolean Controls whether incoming requests need authentication only if they are directed to the ArangoDB's internal APIs and features, located at /_api/, /_admin/ etc. If the flag is set to true, then HTTP authentication is only required for requests going to URLs starting with /_, but not for other URLs. The flag can thus be used to expose a user-made API without HTTP authentication to the outside world, but to prevent the outside world from using the ArangoDB API and the admin interface without authentication. Note that checking the URL is performed after any database name prefix has been removed. That means when the actual URL called is /_db/_system/myapp/myaction, the URL /myapp/myaction will be used for authentication-system-only check. The default is true. Note that authentication still needs to be enabled for the server regularly in order for HTTP authentication to be forced for the ArangoDB API and the web interface. Setting only this flag is not enough. You can control ArangoDB's general authentication feature with the --server.authentication flag. Enable authentication cache timeout 444 General Options --server.authentication-timeout value Sets the cache timeout to value (in seconds). This is only necessary if you use an external authentication system like LDAP. Enable local authentication --server.local-authentication value If set to false only use the external authentication system. If true also use the local _users collections. The default value is true. Enable/disable replication applier --database.replication-applier flag If false the server will start with replication appliers turned off, even if the replication appliers are configured with the autoStart option. Using the command-line option will not change the value of the autoStart option in the applier configuration, but will suppress autostarting the replication applier just once. If the option is not used, ArangoDB will read the applier configuration from the file REPLICATION-APPLIER-CONFIG on startup, and use the value of the autoStart attribute from this file. The default is true. Keep-alive timeout --http.keep-alive-timeout Allows to specify the timeout for HTTP keep-alive connections. The timeout value must be specified in seconds. Idle keep-alive connections will be closed by the server automatically when the timeout is reached. A keep-alive-timeout value 0 will disable the keep alive feature entirely. Hide Product header --http.hide-product-header If true, the server will exclude the HTTP header "Server: ArangoDB" in HTTP responses. If set to false, the server will send the header in responses. The default is false. Allow method override --http.allow-method-override When this option is set to true, the HTTP request method will optionally be fetched from one of the following HTTP request headers if present in the request: x-http-method x-http-method-override x-method-override If the option is set to true and any of these headers is set, the request method will be overridden by the value of the header. 
For example, this allows issuing an HTTP DELETE request which to the outside world will look like an HTTP GET request. This allows bypassing proxies and tools that will only let certain request types pass. Setting this option to true may impose a security risk so it should only be used in controlled environments. The default value for this option is false. Server threads --server.threads number Specifies the number of threads that are spawned to handle requests. 445 General Options Toggling server statistics --server.statistics value If this option is value is false, then ArangoDB's statistics gathering is turned off. Statistics gathering causes regular CPU activity so using this option to turn it off might relieve heavy-loaded instances a bit. Session timeout time to live for server sessions --server.session-timeout value The timeout for web interface sessions, using for authenticating requests to the web interface (/_admin/aardvark) and related areas. Sessions are only used when authentication is turned on. Foxx queues enable or disable the Foxx queues feature --foxx.queues flag If true, the Foxx queues will be available and jobs in the queues will be executed asynchronously. The default is true. When set to false the queue manager will be disabled and any jobs are prevented from being processed, which may reduce CPU load a bit. Foxx queues poll interval poll interval for Foxx queues --foxx.queues-poll-interval value The poll interval for the Foxx queues manager. The value is specified in seconds. Lower values will mean more immediate and more frequent Foxx queue job execution, but will make the queue thread wake up and query the queues more often. When set to a low value, the queue thread might cause CPU load. The default is 1 second. If Foxx queues are not used much, then this value may be increased to make the queues thread wake up less. Directory --database.directory directory The directory containing the collections and datafiles. Defaults to /var/lib/arango. When specifying the database directory, please make sure the directory is actually writable by the arangod process. You should further not use a database directory which is provided by a network filesystem such as NFS. The reason is that networked filesystems might cause inconsistencies when there are multiple parallel readers or writers or they lack features required by arangod (e.g. flock()). directory When using the command line version, you can simply supply the database directory as argument. Examples > ./arangod --server.endpoint tcp://127.0.0.1:8529 --database.directory /tmp/vocbase Database directory state precondition --database.require-directory-state state Using this option it is possible to require the database directory to be in a specific state on startup. the options for this value are: non-existing: database directory must not exist existing: database directory must exist empty: database directory must exist but be empty populated: database directory must exist and contain specific files already any: any directory state allowed 446 General Options Journal size --database.maximal-journal-size size M aximal size of journal in bytes. Can be overwritten when creating a new collection. Note that this also limits the maximal size of a single document. The default is 32MB. Wait for sync default wait for sync behavior --database.wait-for-sync boolean Default wait-for-sync value. Can be overwritten when creating a new collection. The default is false. 
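Putting a few of the database options from this section together, here is a minimal startup sketch. The endpoint and directory are taken from the earlier example, the other values are illustrative placeholders rather than recommendations; 33554432 bytes is simply the documented 32MB default written out:

arangod --server.endpoint tcp://127.0.0.1:8529 \
  --database.directory /tmp/vocbase \
  --database.require-directory-state any \
  --database.maximal-journal-size 33554432 \
  --database.wait-for-sync false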
Force syncing of properties force syncing of collection properties to disk --database.force-sync-properties boolean Force syncing of collection properties to disk after creating a collection or updating its properties. If turned off, no fsync will happen for the collection and database properties stored in parameter.json files in the file system. Turning off this option will speed up workloads that create and drop a lot of collections (e.g. test suites). The default is true. Limiting memory for AQL queries --query.memory-limit value The default maximum amount of memory (in bytes) that a single AQL query can use. When a single AQL query reaches the specified limit value, the query will be aborted with a resource limit exceeded exception. In a cluster, the memory accounting is done per shard, so the limit value is effectively a memory limit per query per shard. The global limit value can be overriden per query by setting the memoryLimit option value for individual queries when running an AQL query. The default value is 0, meaning that there is no memory limit. Turning AQL warnings into errors --query.fail-on-warning value When set to true, AQL queries that produce warnings will instantly abort and throw an exception. This option can be set to catch obvious issues with AQL queries early. When set to false, AQL queries that produce warnings will not abort and return the warnings along with the query results. The option can also be overridden for each individual AQL query. Enable/disable AQL query tracking --query.tracking flag If true, the server's AQL slow query tracking feature will be enabled by default. Tracking of queries can be disabled by setting the option to false. The default is true. Enable/disable tracking of bind variables in AQL queries --query.tracking-with-bindvars flag If true, then the bind variables will be tracked for all running and slow AQL queries. This option only has an effect if --query.tracking was set to true. Tracking of bind variables can be disabled by setting the option to false. The default is true. 447 General Options Threshold for slow AQL queries --query.slow-threshold value By setting value it can be controlled after what execution time an AQL query is considered "slow". Any slow queries that exceed the execution time specified in value will be logged when they are finished. The threshold value is specified in seconds. Tracking of slow queries can be turned off entirely by setting the option --query.tracking to false. The default value is 10.0. Query registry timeout --query.registry-ttl value The default timeout for AQL query parts to stay alive in the cluster. The default value is 600 seconds. Query parts that are not used for the configured amount of time will expire automatically and will be aborted. The value of this option normally only needs to be increased for queries that are running longer than the default timeout value (600 seconds) and that time out. The option has no effect in singleserver mode. Limiting the number of query execution plans created by the AQL optimizer --query.optimizer-max-plans value By setting value it can be controlled how many different query execution plans the AQL query optimizer will generate at most for any given AQL query. Normally the AQL query optimizer will generate a single execution plan per AQL query, but there are some cases in which it creates multiple competing plans. M ore plans can lead to better optimized queries, however, plan creation has its costs. 
The more plans are created and shipped through the optimization pipeline, the more time will be spent in the optimizer. Lowering value will make the optimizer stop creating additional plans when it has already created enough plans. Note that this setting controls the default maximum number of plans to create. The value can still be adjusted on a per-query basis by setting the maxNumberOfPlans attribute when running a query. The default value is 128. Throw collection not loaded error --database.throw-collection-not-loaded-error flag Accessing a not-yet loaded collection will automatically load a collection on first access. This flag controls what happens in case an operation would need to wait for another thread to finalize loading a collection. If set to true, then the first operation that accesses an unloaded collection will load it. Further threads that try to access the same collection while it is still loading will get an error (1238, collection not loaded). When the initial operation has completed loading the collection, all operations on the collection can be carried out normally, and error 1238 will not be thrown. If set to false, the first thread that accesses a not-yet loaded collection will still load it. Other threads that try to access the collection while loading will not fail with error 1238 but instead block until the collection is fully loaded. This configuration might lead to all server threads being blocked because they are all waiting for the same collection to complete loading. Setting the option to true will prevent this from happening, but requires clients to catch error 1238 and react on it (maybe by scheduling a retry for later). The default value is false. AQL Query caching mode --query.cache-mode Toggles the AQL query cache behavior. Possible values are: off: do not use query cache on: always use query cache, except for queries that have their cache attribute set to false demand: use query cache only for queries that have their cache attribute set to true AQL Query cache size --query.cache-entries 448 General Options M aximum number of query results that can be stored per database-specific query cache. If a query is eligible for caching and the number of items in the database's query cache is equal to this threshold value, another cached query result will be removed from the cache. This option only has an effect if the query cache mode is set to either on or demand. JavaScript code execution --javascript.allow-admin-execute This option can be used to control whether user-defined JavaScript code is allowed to be executed on server by sending via HTTP to the API endpoint /_admin/execute with an authenticated user account. The default value is false, which disables the execution of user- defined code. This is also the recommended setting for production. In test environments, it may be convenient to turn the option on in order to send arbitrary setup or teardown commands for execution on the server. V8 contexts --javascript.v8-contexts number Specifies the maximum number of V8 contexts that are created for executing JavaScript code. M ore contexts allow executing more JavaScript actions in parallel, provided that there are also enough threads available. Please note that each V8 context will use a substantial amount of memory and requires periodic CPU processing time for garbage collection. Note that this value configures the maximum number of V8 contexts that can be used in parallel. 
Upon server start only as many V8 contexts will be created as are configured in the option --javascript.v8-contexts-minimum. The actual number of available V8 contexts may float at runtime between --javascript.v8-contexts-minimum and --javascript.v8-contexts. When there are unused V8 contexts that linger around, the server's garbage collector thread will automatically delete them.

--javascript.v8-contexts-minimum number

Specifies the minimum number of V8 contexts that will be present at any time the server is running. The actual number of V8 contexts will never drop below this value, but it may go up as high as specified via the option --javascript.v8-contexts. When there are unused V8 contexts that linger around and the number of V8 contexts is greater than --javascript.v8-contexts-minimum, the server's garbage collector thread will automatically delete them.

--javascript.v8-contexts-max-invocations

Specifies the maximum number of invocations after which a used V8 context is disposed. The default value of --javascript.v8-contexts-max-invocations is 0, meaning that the maximum number of invocations per context is unlimited.

--javascript.v8-contexts-max-age

Specifies the time duration (in seconds) after which a V8 context is disposed automatically after its creation. If the time has elapsed, the context will be disposed. The default value for --javascript.v8-contexts-max-age is 60 seconds.

If both --javascript.v8-contexts-max-invocations and --javascript.v8-contexts-max-age are set, then the context will be destroyed when either of the specified threshold values is reached.

Garbage collection frequency (time-based)

--javascript.gc-frequency frequency

Specifies the frequency (in seconds) for the automatic garbage collection of JavaScript objects. This setting is useful to have the garbage collection still work in periods with no or few requests.

Garbage collection interval (request-based)

--javascript.gc-interval interval

Specifies the interval (approximately in number of requests) that the garbage collection for JavaScript objects will be run in each thread.

V8 options

--javascript.v8-options options

Optional arguments to pass to the V8 Javascript engine. The V8 engine will run with default settings unless explicit options are specified using this option. The options passed will be forwarded to the V8 engine, which will parse them on its own. Passing invalid options may result in an error being printed on stderr and the option being ignored.

Options need to be passed in one string, with V8 option names being prefixed with double dashes. Multiple options need to be separated by whitespace. To get a list of all available V8 options, you can use the value "--help" as follows:

--javascript.v8-options="--help"

Another example of specific V8 options being set at startup:

--javascript.v8-options="--log"

Names and features of usable options depend on the version of V8 being used, and might change in the future if a different version of V8 is used in ArangoDB. Not all options offered by V8 might be sensible to use in the context of ArangoDB. Use the specific options only if you are sure that they are not harmful for the regular database operation.

MMFiles Write-ahead log options

Since ArangoDB 2.2, the MMFiles storage engine will write all data-modification operations into its write-ahead log. With ArangoDB 3.2 another storage engine option becomes available: RocksDB.
In case of using RocksDB most of the subsequent options don't have a useful meaning. The write-ahead log is a sequence of logfiles that are written in an append-only fashion. Full logfiles will eventually be garbage-collected, and the relevant data might be transferred into collection journals and datafiles. Unneeded and already garbage-collected logfiles will either be deleted or kept for the purpose of keeping a replication backlog. Directory The WAL logfiles directory: --wal.directory Specifies the directory in which the write-ahead logfiles should be stored. If this option is not specified, it defaults to the subdirectory journals in the server's global database directory. If the directory is not present, it will be created. Logfile size the size of each WAL logfile --wal.logfile-size Specifies the filesize (in bytes) for each write-ahead logfile. The logfile size should be chosen so that each logfile can store a considerable amount of documents. The bigger the logfile size is chosen, the longer it will take to fill up a single logfile, which also influences the delay until the data in a logfile will be garbage-collected and written to collection journals and datafiles. It also affects how long logfile recovery will take at server start. Allow oversize entries whether or not oversize entries are allowed --wal.allow-oversize-entries Whether or not it is allowed to store individual documents that are bigger than would fit into a single logfile. Setting the option to false will make such operations fail with an error. Setting the option to true will make such operations succeed, but with a high potential performance impact. The reason is that for each oversize operation, an individual oversize logfile needs to be created which may also block other operations. The option should be set to false if it is certain that documents will always have a size smaller than a single logfile. Number of reserve logfiles maximum number of reserve logfiles --wal.reserve-logfiles The maximum number of reserve logfiles that ArangoDB will create in a background process. Reserve logfiles are useful in the situation when an operation needs to be written to a logfile but the reserve space in the logfile is too low for storing the operation. In this case, a new logfile needs to be created to store the operation. Creating new logfiles is normally slow, so ArangoDB will try to pre-create logfiles in a background process so there are always reserve logfiles when the active logfile gets full. The number of reserve logfiles that ArangoDB keeps in the background is configurable with this option. Number of historic logfiles maximum number of historic logfiles --wal.historic-logfiles The maximum number of historic logfiles that ArangoDB will keep after they have been garbage-collected. If no replication is used, there is no need to keep historic logfiles except for having a local changelog. In a replication setup, the number of historic logfiles affects the amount of data a slave can fetch from the master's logs. The more historic logfiles, the more historic data is available for a slave, which is useful if the connection between master and slave is unstable or slow. Not having enough historic logfiles available might lead to logfile data being deleted on the master already before a slave has fetched it. 
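As an illustration of the MMFiles WAL options discussed so far, here is a sketch of a replication-friendly configuration. The values are placeholders chosen for the example, not tuned defaults, and the directory shown merely spells out the documented default location (the journals subdirectory inside the database directory):

--wal.directory /var/lib/arango/journals \
--wal.logfile-size 33554432 \
--wal.reserve-logfiles 3 \
--wal.historic-logfiles 10 \
--wal.allow-oversize-entries false

Keeping more historic logfiles gives slaves a longer window to fetch data, at the cost of additional disk space.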
Sync interval 451 Write-Ahead Log Options interval for automatic, non-requested disk syncs --wal.sync-interval The interval (in milliseconds) that ArangoDB will use to automatically synchronize data in its write-ahead logs to disk. Automatic syncs will only be performed for not-yet synchronized data, and only for operations that have been executed without the waitForSync attribute. Flush timeout WAL flush timeout `--wal.flush-timeout The timeout (in milliseconds) that ArangoDB will at most wait when flushing a full WAL logfile to disk. When the timeout is reached and the flush is not completed, the operation that requested the flush will fail with a lock timeout error. Throttling Throttle writes to WAL when at least such many operations are waiting for garbage collection: --wal.throttle-when-pending The maximum value for the number of write-ahead log garbage-collection queue elements. If set to 0, the queue size is unbounded, and no write-throttling will occur. If set to a non-zero value, write-throttling will automatically kick in when the garbage-collection queue contains at least as many elements as specified by this option. While write-throttling is active, data-modification operations will intentionally be delayed by a configurable amount of time. This is to ensure the write-ahead log garbage collector can catch up with the operations executed. Write-throttling will stay active until the garbage-collection queue size goes down below the specified value. Writethrottling is turned off by default. --wal.throttle-wait This option determines the maximum wait time (in milliseconds) for operations that are write-throttled. If write-throttling is active and a new write operation is to be executed, it will wait for at most the specified amount of time for the write-ahead log garbage-collection queue size to fall below the throttling threshold. If the queue size decreases before the maximum wait time is over, the operation will be executed normally. If the queue size does not decrease before the wait time is over, the operation will be aborted with an error. This option only has an effect if --wal.throttle-when-pending has a non-zero value, which is not the default. Number of slots M aximum number of slots to be used in parallel: --wal.slots Configures the amount of write slots the write-ahead log can give to write operations in parallel. Any write operation will lease a slot and return it to the write-ahead log when it is finished writing the data. A slot will remain blocked until the data in it was synchronized to disk. After that, a slot becomes reusable by following operations. The required number of slots is thus determined by the parallelity of write operations and the disk synchronization speed. Slow disks probably need higher values, and fast disks may only require a value lower than the default. Ignore logfile errors Ignore logfile errors when opening logfiles: --wal.ignore-logfile-errors Ignores any recovery errors caused by corrupted logfiles on startup. When set to false, the recovery procedure on startup will fail with an error whenever it encounters a corrupted (that includes only half-written) logfile. This is a security precaution to prevent data loss in case of disk errors etc. When the recovery procedure aborts because of corruption, any corrupted files can be inspected and fixed (or removed) manually and the server can be restarted afterwards. Setting the option to true will make the server continue with the recovery procedure even in case it detects corrupt logfile entries. 
In this case it will stop at the first corrupted logfile entry and ignore all others, which might cause data loss. Ignore recovery errors Ignore recovery errors: --wal.ignore-recovery-errors Ignores any recovery errors not caused by corrupted logfiles but by logical errors. Logical errors can occur if logfiles or any other server datafiles have been manually edited or the server is somehow misconfigured. 452 Write-Ahead Log Options Ignore (non-WAL) datafile errors Ignore datafile errors when loading collections: If set to false --database.ignore-datafile-errors boolean , CRC mismatch and other errors in collection datafiles will lead to a collection not being loaded at all. The collection in this case becomes unavailable. If such collection needs to be loaded during WAL recovery, the WAL recovery will also abort (if not forced with option --wal.ignore-recovery-errors true Setting this flag to false ). protects users from unintentionally using a collection with corrupted datafiles, from which only a subset of the original data can be recovered. Working with such collection could lead to data loss and follow up errors. In order to access such collection, it is required to inspect and repair the collection datafile with the datafile debugger (arango-dfdb). If set to true , CRC mismatch and other errors during the loading of a collection will lead to the datafile being partially loaded, up to the position of the first error. All data up to until the invalid position will be loaded. This will enable users to continue with collection datafiles even if they are corrupted, but this will result in only a partial load of the original data and potential follow up errors. The WAL recovery will still abort when encountering a collection with a corrupted datafile, at least if true --wal.ignore-recovery-errors is not set to . Setting the option to true will also automaticall repair potentially corrupted VERSION files of databases on startup, so that the startup can proceed. The default value is false, so collections with corrupted datafiles will not be loaded at all, preventing partial loads and follow up errors. However, if such collection is required at server startup, during WAL recovery, the server will abort the recovery and refuse to start. 453 Compaction Options MMFiles Compaction options The ArangoDB M M Files storage engine will run a compaction over data files. ArangoDB writes Documents in the WAL file. Once they have been sealed in the wal file, the collector may copy them into a per collection journal file. Once journal files fill up, they're sealed to become data files. => one collection may have documents in the WAL logs, its journal file, and an arbitrary number of data files. If a collection is loaded, each of these files are opened (thus use a file handle) and are mmap'ed. Since file handles and memory mapped files are also a sparse resource, that number should be kept low. Once you update or remove documents from data files (or already did while it was the journal file) these documents are marked as 'dead' with a deletion marker. Over time the number of dead documents may rise, and we don't want to use the previously mentioned resources, plus the disk space should be given back to the system. Thus several journal files can be combined to one, ommitting the dead documents. Combining several of these data files into one is called compaction. The compaction process reads the alive documents from the original data files, and writes them into new data file. 
Once that is done, the memory mappings to the old data files is released, and the files are erased. Since the compaction locks the collection, and also uses I/O resources, its carefully configurable under which conditions the system should perform which amount of these compaction jobs: ArangoDB spawns one compactor thread per database. The settings below vary in scope. Activity control The activity control parameters alter the behaviour in terms of scan / execution frequency of the compaction. Sleep interval between two compaction runs (in seconds): --compaction.db-sleep-time The number of seconds the collector thread will wait between two attempts to search for compactable data files of collections in one Database. If the compactor has actually executed work, a subsequent lookup is done. Scope: Database. M inimum sleep time between two compaction runs (in seconds): --compaction.min-interval When an actual compaction was executed for one collection, we wait for this time before we execute the compaction on this collection again. This is here to let eventually piled up user load be worked out. Scope: collection. Source data files These parameters control which data files are taken into account for a compaction run. You can specify several criteria which each off may be sufficcient alone. The scan over the data files belonging to one collection is executed from oldest data file to newest; if files qualify for a compaction they may be merged with newer files (containing younger documents) Scope: Collection level, some are influenced by collection settings. minimal filesize threshold original data files have to be below for a compaction: --compaction.min-small-data-file-size This is the threshold which controls below which minimum total size a data file will always be taken into account for the compaction. M inimum unused count of documents in a datafile: --compaction.dead-documents-threshold Data files will often contain dead documents. This parameter specifies their top most accetpeable count until the data file qualifies for compaction. How many bytes of the source data file are allowed to be unused at most: --compaction.dead-size-threshold The dead data size varies along with the size of your documents. If you have many big documents, this threshold may hit before the document count threshold. 454 Compaction Options How many percent of the source data file should be unused at least: --compaction.dead-size-percent-threshold since the size of the documents may vary this threshold works on the percentage of the dead documents size. Thus, if you have many huge dead documents, this threshold kicks in earlier. To name an example with numbers, if the data file contains 800 kbytes of alive and 400 kbytes of dead documents, the share of the dead documents is: 400 / (400 + 800) = 33 % . If this value if higher than the specified threshold, the data file will be compacted. Compacted target files Once data files of a collection are qualified for a compaction run, these parameters control how many data files are merged into one, (or even one source data file may be compacted into one smaller target data file) Scope: Collection level, some are influenced by collection settings. M aximum number of files to merge to one file: --compaction.dest-max-files How many data files (at most) we may merge into one resulting data file during one compaction run. 
How large the resulting file may be in comparison to the collection's journal file size: --compaction.dest-max-file-size-factor

In ArangoDB you can configure a default journal filesize globally via database.maximal-journal-size and override it on a per collection level. This value controls the size of collected data files relative to the configured journal file size of the collection in question. A factor of 3 means that the maximum filesize of the compacted file is 3 times the size of the maximum collection journal file size.

How large the compaction result file may become: --compaction.dest-max-result-file-size

Next to the factor above, a total maximum allowed filesize in bytes may be specified. This will overrule all previous parameters.

Cluster Options

Agency endpoint

List of agency endpoints: --cluster.agency-endpoint endpoint

An agency endpoint the server can connect to. The option can be specified multiple times, so the server can use a cluster of agency servers. Endpoints have the following pattern:

tcp://ipv4-address:port - TCP/IP endpoint, using IPv4
tcp://[ipv6-address]:port - TCP/IP endpoint, using IPv6
ssl://ipv4-address:port - TCP/IP endpoint, using IPv4, SSL encryption
ssl://[ipv6-address]:port - TCP/IP endpoint, using IPv6, SSL encryption

At least one endpoint must be specified or ArangoDB will refuse to start. It is recommended to specify at least two endpoints, so ArangoDB has an alternative endpoint if one of them becomes unavailable.

Examples

--cluster.agency-endpoint tcp://192.168.1.1:4001 --cluster.agency-endpoint tcp://192.168.1.2:4002 ...

My address

This server's address / endpoint: --cluster.my-address endpoint

The server's endpoint for cluster-internal communication. If specified, it must have the following pattern:

tcp://ipv4-address:port - TCP/IP endpoint, using IPv4
tcp://[ipv6-address]:port - TCP/IP endpoint, using IPv6
ssl://ipv4-address:port - TCP/IP endpoint, using IPv4, SSL encryption
ssl://[ipv6-address]:port - TCP/IP endpoint, using IPv6, SSL encryption

If no endpoint is specified, the server will look up its internal endpoint address in the agency. If no endpoint can be found in the agency for the server's id, ArangoDB will refuse to start.

Examples

Listen only on the interface with address 192.168.1.1:

--cluster.my-address tcp://192.168.1.1:8530

Listen on all ipv4 and ipv6 addresses, which are configured on port 8530:

--cluster.my-address ssl://[::]:8530

My role

This server's role: --cluster.my-role [dbserver|coordinator]

The server's role. Is this instance a db server (backend data server) or a coordinator (frontend server for external and application access)?

Node ID (deprecated)

This server's id: --cluster.my-local-info info

Some local information about the server in the cluster, this can for example be an IP address with a process ID or any string unique to the server. Specifying info is mandatory on startup if the server id (see below) is not specified. Each server of the cluster must have a unique local info. This is ignored if my-id below is specified.
Default at System replication factor: 1.0 . --cluster.system-replication-factorinteger Change default replication factor for system collections. Default at 2 . 457 RocksDB Engine Options RocksDB engine options RocksDB is a highly configurable key-value store used to power our RocksDB storage engine. M ost of the options on this page are passthrough options to the underlying RocksDB instance, and we change very few of their default settings. Depending on the storage engine you have chosen the availability and the scope of these options changes. In case you have chosen mmfiles some of the following options apply to persistent indexes. In case of rocksdb it will apply to all data stored as well as indexes. Pass-through options --rocksdb.wal-directory Absolute path for the RocksDB WAL files. If left empty, this will use a subdirectory journals inside the data directory. Write buffers --rocksdb.write-buffer-size The amount of data to build up in each in-memory buffer (backed by a log file) before closing the buffer and queuing it to be flushed into standard storage. Default: 64M iB. Larger values may improve performance, especially for bulk loads. --rocksdb.max-write-buffer-number The maximum number of write buffers that built up in memory. If this number is reached before the buffers can be flushed, writes will be slowed or stalled. Default: 2. --rocksdb.total-write-buffer-size (Hidden) The total amount of data to build up in all in-memory buffers (backed by log files). This option, together with the block cache size configuration option, can be used to limit memory usage. If set to 0, the memory usage is not limited. Default: 0 (disabled). --rocksdb.min-write-buffer-number-to-merge M inimum number of write buffers that will be merged together when flushing to normal storage. Default: 1. --rocksdb.max-total-wal-size M aximum total size of WAL files that, when reached, will force a flush of all column families whose data is backed by the oldest WAL files. Setting this to a low value will trigger regular flushing of column family data from memtables, so that WAL files can be moved to the archive. Setting this to a high value will avoid regular flushing but may prevent WAL files from being moved to the archive and being removed. --rocksdb.delayed-write-rate (Hidden) Limited write rate to DB (in bytes per second) if we are writing to the last in-memory buffer allowed and we allow more than 3 buffers. Default: 16M iB/s. LSM tree structure --rocksdb.num-levels The number of levels for the database in the LSM tree. Default: 7. --rocksdb.num-uncompressed-levels The number of levels that do not use compression. The default value is 2. Levels above this number will use Snappy compression to reduce the disk space requirements for storing data in these levels. --rocksdb.dynamic-level-bytes If true, the amount of data in each level of the LSM tree is determined dynamically so as to minimize the space amplification; otherwise, the level sizes are fixed. The dynamic sizing allows RocksDB to maintain a well-structured LSM tree regardless of total data size. Default: true. 458 RocksDB Engine Options --rocksdb.max-bytes-for-level-base The maximum total data size in bytes in level-1 of the LSM tree. Only effective if --rocksdb.dynamic-level-bytes is false. Default: 256M iB. --rocksdb.max-bytes-for-level-multiplier The maximum total data size in bytes for level L of the LSM tree can be calculated as multiplier ^ (L-1)) . 
Only effective if --rocksdb.dynamic-level-bytes max-bytes-for-level-base * (max-bytes-for-level- is false. Default: 10. --rocksdb.level0-compaction-trigger Compaction of level-0 to level-1 is triggered when this many files exist in level-0. Setting this to a higher number may help bulk writes at the expense of slowing down reads. Default: 2. --rocksdb.level0-slowdown-trigger When this many files accumulate in level-0, writes will be slowed down to --rocksdb.delayed-write-rate to allow compaction to catch up. Default: 20. --rocksdb.level0-stop-trigger When this many files accumulate in level-0, writes will be stopped to allow compaction to catch up. Default: 36. File I/O --rocksdb.compaction-read-ahead-size If non-zero, we perform bigger reads when doing compaction. If you're running RocksDB on spinning disks, you should set this to at least 2M iB. That way RocksDB's compaction is doing sequential instead of random reads. Default: 0. --rocksdb.use-direct-reads (Hidden) Only meaningful on Linux. If set, use O_DIRECT for reading files. Default: false. --rocksdb.use-direct-io-for-flush-and-compaction Only meaningful on Linux. If set, use --rocksdb.use-fsync If set, issue an fsync O_DIRECT (Hidden) for writing files. Default: false. (Hidden) call when writing to disk (set to false to issue fdatasync only. Default: false. Background tasks --rocksdb.max-background-jobs M aximum number of concurrent background compaction jobs, submitted to the low priority thread pool. Default: number of processors. --rocksdb.num-threads-priority-high Number of threads for high priority operations (e.g. flush). We recommend setting this equal to max-background-flushes . Default: number of processors / 2. --rocksdb.num-threads-priority-low Number of threads for low priority operations (e.g. compaction). Default: number of processors / 2. Caching --rocksdb.block-cache-size This is the size of the block cache in bytes. Increasing this may improve performance. If there is less than 4GiB of RAM on the system, the default value is 256M iB. If there is more, the default is (system RAM size - 2GiB) * 0.3 . --rocksdb.block-cache-shard-bits The number of bits used to shard the block cache to allow concurrent operations. To keep individual shards at a reasonable size (i.e. at least 512KB), keep this value to at most block-cache-shard-bits / 512KB . Default: block-cache-size / 2^19 . --rocksdb.table-block-size 459 RocksDB Engine Options Approximate size of user data (in bytes) packed per block for uncompressed data. --rocksdb.recycle-log-file-num (Hidden) Number of log files to keep around for recycling. Default: 0. Miscellaneous (Hidden) --rocksdb.optimize-filters-for-hits This flag specifies that the implementation should optimize the filters mainly for cases where keys are found rather than also optimize for the case where keys are not. This would be used in cases where the application knows that there are very few misses or the performance in the case of misses is not as important. Default: false. --rocksdb.wal-recovery-skip-corrupted (Hidden) If true, skip corrupted records in WAL recovery. Default: false. Non-Pass-Through Options --rocksdb.wal-file-timeout (Hidden) Timeout after which unused WAL files are deleted (in seconds). Default: 10.0s. Data of ongoing transactions is stored in RAM . Transactions that get too big (in terms of number of operations involved or the total size of data created or modified by the transaction) will be committed automatically. 
Effectively this means that big user transactions are split into multiple smaller RocksDB transactions that are committed individually. The entire user transaction will not necessarily have ACID properties in this case. The following options can be used to control the RAM usage and automatic intermediate commits for the RocksDB engine: --rocksdb.max-transaction-size Transaction size limit (in bytes). Transactions store all keys and values in RAM , so large transactions run the risk of causing out-ofmemory sitations. This setting allows you to ensure that does not happen by limiting the size of any individual transaction. Transactions whose operations would consume more RAM than this threshold value will abort automatically with error 32 ("resource limit exceeded"). --rocksdb.intermediate-commit-size If the size of all operations in a transaction reaches this threshold, the transaction is committed automatically and a new transaction is started. The value is specified in bytes. --rocksdb.intermediate-commit-count If the number of operations in a transaction reaches this value, the transaction is committed automatically and a new transaction is started. --rocksdb.throttle If enabled, throttles the ingest rate of writes if necessary to reduce chances of compactions getting too far behind and blocking incoming writes. This option is true by default. --rocksdb.sync-interval The interval (in milliseconds) that ArangoDB will use to automatically synchronize data in RocksDB's write-ahead logs to disk. Automatic syncs will only be performed for not-yet synchronized data, and only for operations that have been executed without the waitForSync attribute. The default sync interval in 3.3 is 0, meaning that automatic background syncing is turned off. Automatic syncing was added in the middle of the ArangoDB 3.3 release cycle, so it is opt-in. The default sync interval will change to 100 milliseconds in ArangoDB 3.4 however. Note: this option is not supported on Windows platforms. Setting the option to a value greater 0 will produce a startup warning. --rocksdb.use-file-logging When set to true, enables writing of RocksDB's own informational LOG files into RocksDB's database directory. 460 RocksDB Engine Options This option is turned off by default, but can be enabled for debugging RocksDB internals and performance. --rocksdb.debug-logging When set to true, enables verbose logging of RocksDB's actions into the logfile written by ArangoDB (if option logging is off) or RocksDB's own log (if option --rocksdb.use-file-logging --rocksdb.use-file- is on). This option is turned off by default, but can be enabled for debugging RocksDB internals and performance. 461 Hash Cache Options Hash cache options Since ArangoDB 3.2, the several core components of the server use a cache system which pools memory across many different cache tables. In order to provide intelligent internal memory management, the system periodically reclaims memory from caches which are used less often and reallocates it to caches which get more activity. Cache size Global size limit for all hash caches: --cache.size The global caching system, all caches, and all the data contained therein will fit inside this limit. The size is specified in bytes. If there is less than 4GiB of RAM on the system, the default value is 256M iB. If there is more, the default is (system RAM size - 2GiB) * 0.3 . 
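Both --rocksdb.block-cache-size and --cache.size fall back to the same default formula described above. As a worked example: a host with 16GiB of RAM gets a default of (16GiB - 2GiB) * 0.3 = 4.2GiB, whereas a host with 3GiB of RAM gets the fixed 256MiB default. To pin the cache to an explicit size instead, the value is given in bytes, for example (illustrative):

$ arangod --cache.size 1073741824

(1073741824 bytes = 1GiB.)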
Rebalancing interval Time between cache rebalancing attempts: --cache.rebalancing-interval The value is specified in microseconds with a default of 2 seconds and a minimum of 500 milliseconds. 462 Asynchronous Tasks Asynchronous Tasks maximal queue size M aximum size of the queue for requests: --server.maximal-queue-size size Specifies the maximum size of the queue for asynchronous task execution. If the queue already contains size tasks, new tasks will be rejected until other tasks are popped from the queue. Setting this value may help preventing from running out of memory if the queue is filled up faster than the server can process requests. 463 Durability Durability Configuration Global Configuration There are global configuration values for durability, which can be adjusted by specifying the following configuration options: default wait for sync behavior --database.wait-for-sync boolean Default wait-for-sync value. Can be overwritten when creating a new collection. The default is false. force syncing of collection properties to disk --database.force-sync-properties boolean Force syncing of collection properties to disk after creating a collection or updating its properties. If turned off, no fsync will happen for the collection and database properties stored in parameter.json files in the file system. Turning off this option will speed up workloads that create and drop a lot of collections (e.g. test suites). The default is true. interval for automatic, non-requested disk syncs --wal.sync-interval The interval (in milliseconds) that ArangoDB will use to automatically synchronize data in its write-ahead logs to disk. Automatic syncs will only be performed for not-yet synchronized data, and only for operations that have been executed without the waitForSync attribute. Per-collection configuration You can also configure the durability behavior on a per-collection basis. Use the ArangoDB shell to change these properties. gets or sets the properties of a collection collection.properties() Returns an object containing all collection properties. waitForSync: If true creating a document will only return after the data was synced to disk. journalSize : The size of the journal in bytes. This option is meaningful for the M M Files storage engine only. isVolatile: If true then the collection data will be kept in memory only and ArangoDB will not write or sync the data to disk. This option is meaningful for the M M Files storage engine only. keyOptions (optional) additional options for key generation. This is a JSON array containing the following attributes (note: some of the attributes are optional): type: the type of the key generator used for the collection. allowUserKeys: if set to true, then it is allowed to supply own key values in the _key attribute of a document. If set to false, then the key generator will solely be responsible for generating keys and supplying own key values in the _key attribute of documents is considered an error. increment: increment value for autoincrement key generator. Not used for other key generator types. offset: initial offset value for autoincrement key generator. Not used for other key generator types. indexBuckets: number of buckets into which indexes using a hash table are split. The default is 16 and this number has to be a power of 2 and less than or equal to 1024. This option is meaningful for the M M Files storage engine only. 
For very large collections one should increase this to avoid long pauses when the hash table has to be initially built or resized, since buckets are resized individually and can be initially built in parallel. For example, 64 might be a sensible value for a collection with 100 000 000 documents. Currently, only the edge index respects this value, but other index types might follow in future ArangoDB versions. Changes (see below) are applied when the collection is loaded the next time. In a cluster setup, the result will also contain the following attributes: numberOfShards: the number of shards of the collection. 464 Durability shardKeys: contains the names of document attributes that are used to determine the target shard for documents. replicationFactor: determines how many copies of each shard are kept on different DBServers. collection.properties(properties) Changes the collection properties. properties must be an object with one or more of the following attribute(s): waitForSync: If true creating a document will only return after the data was synced to disk. journalSize : The size of the journal in bytes. This option is meaningful for the M M Files storage engine only. indexBuckets : See above, changes are only applied when the collection is loaded the next time. This option is meaningful for the M M Files storage engine only. replicationFactor : Change the number of shard copies kept on different DBServers, valid values are integer numbers in the range of 1-10 (Cluster only) Note: it is not possible to change the journal size after the journal or datafile has been created. Changing this parameter will only effect newly created journals. Also note that you cannot lower the journal size to less then size of the largest document already stored in the collection. Note: some other collection properties, such as type, isVolatile, or keyOptions cannot be changed once the collection is created. Examples Read all properties arangosh> db.example.properties(); show execution results Change a property arangosh> db.example.properties({ waitForSync : true }); show execution results Per-operation configuration M any data-modification operations and also ArangoDB's transactions allow to specify a waitForSync attribute, which when set ensures the operation data has been synchronized to disk when the operation returns. Disk-Usage Configuration The amount of disk space used by ArangoDB is determined by a few configuration options. Global Configuration The total amount of disk storage required by ArangoDB is determined by the size of the write-ahead logfiles plus the sizes of the collection journals and datafiles. There are the following options for configuring the number and sizes of the write-ahead logfiles: maximum number of reserve logfiles --wal.reserve-logfiles The maximum number of reserve logfiles that ArangoDB will create in a background process. Reserve logfiles are useful in the situation when an operation needs to be written to a logfile but the reserve space in the logfile is too low for storing the operation. In this case, a new logfile needs to be created to store the operation. Creating new logfiles is normally slow, so ArangoDB will try to pre-create logfiles in a background process so there are always reserve logfiles when the active logfile gets full. The number of reserve logfiles that ArangoDB keeps in the background is configurable with this option. 
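Before continuing with the remaining write-ahead log options, here is a brief arangosh sketch that ties together the per-collection and per-operation durability settings described above (the collection name, key options and values are purely illustrative):

arangosh> db._create("orders", {
  waitForSync: false,
  keyOptions: { type: "autoincrement", allowUserKeys: false, increment: 1, offset: 0 }
});
arangosh> db.orders.insert({ value: 1 }, { waitForSync: true });
arangosh> db.orders.properties();

The second statement overrides the collection-level waitForSync for a single insert, as allowed by the per-operation configuration described above.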
465 Durability maximum number of historic logfiles --wal.historic-logfiles The maximum number of historic logfiles that ArangoDB will keep after they have been garbage-collected. If no replication is used, there is no need to keep historic logfiles except for having a local changelog. In a replication setup, the number of historic logfiles affects the amount of data a slave can fetch from the master's logs. The more historic logfiles, the more historic data is available for a slave, which is useful if the connection between master and slave is unstable or slow. Not having enough historic logfiles available might lead to logfile data being deleted on the master already before a slave has fetched it. the size of each WAL logfile --wal.logfile-size Specifies the filesize (in bytes) for each write-ahead logfile. The logfile size should be chosen so that each logfile can store a considerable amount of documents. The bigger the logfile size is chosen, the longer it will take to fill up a single logfile, which also influences the delay until the data in a logfile will be garbage-collected and written to collection journals and datafiles. It also affects how long logfile recovery will take at server start. whether or not oversize entries are allowed --wal.allow-oversize-entries Whether or not it is allowed to store individual documents that are bigger than would fit into a single logfile. Setting the option to false will make such operations fail with an error. Setting the option to true will make such operations succeed, but with a high potential performance impact. The reason is that for each oversize operation, an individual oversize logfile needs to be created which may also block other operations. The option should be set to false if it is certain that documents will always have a size smaller than a single logfile. When data gets copied from the write-ahead logfiles into the journals or datafiles of collections, files will be created on the collection level. How big these files are is determined by the following global configuration value: --database.maximal-journal-size size M aximal size of journal in bytes. Can be overwritten when creating a new collection. Note that this also limits the maximal size of a single document. The default is 32MB. Per-collection configuration The journal size can also be adjusted on a per-collection level using the collection's properties method. 466 Encryption Encryption This feature is only available in the Enterprise Edition When you store sensitive data in your ArangoDB database, you want to protect that data under all circumstances. At runtime you will protect it with SSL transport encryption and strong authentication, but when the data is already on disk, you also need protection. That is where the Encryption feature comes in. The Encryption feature of ArangoDB will encrypt all data that ArangoDB is storing in your database before it is written to disk. The data is encrypted with AES-256-CTR, which is a strong encryption algorithm, that is very suitable for multi-processor environments. This means that your data is safe, but your database is still fast, even under load. M ost modern CPU's have builtin support for hardware AES encryption, which makes it even faster. Note: The Encryption feature requires the RocksDB storage engine. Encryption keys The Encryption feature of ArangoDB requires a single 32-byte key per server. It is recommended to use a different key for each server (when operating in a cluster configuration). M ake sure to protect these keys! 
That means: Do not write them to persistent disks or your server(s), always store them on an in-memory ( tmpfs ) filesystem. Transport your keys safely to your server(s). There are various tools for managing secrets like this (e.g. vaultproject.io). Store a copy of your key offline in a safe place. If you lose your key, there is NO way to get your data back. Configuration To activate encryption of your database, you need to supply an encryption key to the server. M ake sure to pass this option the very first time you start your database. You cannot encrypt a database that already exists. Note: You also have to activate the rocksdb storage engine. Encryption key stored in file Pass the following option to arangod : $ arangod \ --rocksdb.encryption-keyfile=/mytmpfs/mySecretKey \ --server.storage-engine=rocksdb The file /mytmpfs/mySecretKey must contain the encryption key. This file must be secured, so that only arangod can access it. You should also ensure that in case some-one steals the hardware, he will not be able to read the file. For example, by encryption or creating a in-memory file-system under /mytmpfs /mytmpfs . Encryption key generated by a program Pass the following option to arangod : $ arangod \ --rocksdb.encryption-key-generator=path-to-my-generator \ --server.storage-engine=rocksdb 467 Encryption The program path-to-my-generator output the encryption on standard output and exit. Creating keys The encryption keyfile must contain 32 bytes of random data. You can create it with a command line this. dd if=/dev/random bs=1 count=32 of=yourSecretKeyFile For security, it is best to create these keys offline (away from your database servers) and directly store them in you secret management tool. 468 Auditing Auditing This feature is only available in the Enterprise Edition Auditing allows you to monitor access to the database in detail. In general audit logs are of the form 2016-01-01 12:00:00 | server | username | database | client-ip | authentication | text1 | text2 | ... The time-stamp is in GM T. This allows to easily match log entries from servers in different time zones. The name of the server. You can specify a custom name on startup. Otherwise the default hostname is used. The username is the (authenticated or unauthenticated) name supplied by the client. A dash - is printed if no name was given by the client. The database describes the database that was accessed. Please note that there are no database crossing queries. Each access is restricted to one database. The client-ip describes the source of the request. The authentication details the methods used to authenticate the user. Details about the requests follow in the additional fields. 469 Configuration Audit Configuration This feature is available in the Enterprise Edition. Output --audit.output output Specifies the target of the audit log. Possible values are file://filename where filename can be relative or absolute. syslog://facility or syslog://facility/application-name to log into a syslog server. The option can be specified multiple times in order to configure the output for multiple targets. Hostname --audit.hostname name The name of the server used in audit log messages. By default the system hostname is used. 470 Events Audit Events This feature is available in the Enterprise Edition. 
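The events below are only written once auditing has been configured at startup. A minimal sketch for the Enterprise Edition (the log file path and server name are illustrative) could look like this:

$ arangod \
  --audit.output file:///var/log/arangodb3/audit.log \
  --audit.hostname server1

Alternatively, an output such as --audit.output syslog://local0/arangod sends the entries to a syslog server, and the option may be given multiple times to log to several targets, as described above.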
Authentication Unknown authentication methods 2016-10-03 15:44:23 | server1 | - | database1 | 127.0.0.1:61525 | - | unknown authentication method | /_api/version Missing credentials 2016-10-03 15:39:49 | server1 | - | database1 | 127.0.0.1:61498 | - | credentials missing | /_api/version Wrong credentials 2016-10-03 15:47:26 | server1 | user1 | database1 | 127.0.0.1:61528 | http basic | credentials wrong | /_api/version Password change required 2016-10-03 16:18:53 | server1 | user1 | database1 | 127.0.0.1:62257 | - | password change required | /_api/version JWT login succeeded 2016-10-03 17:21:22 | server1 | - | database1 | 127.0.0.1:64214 | http jwt | user 'root' authenticated | /_open/auth Please note, that the user given as third part is the user that requested the login. In general, it will be empty. JWT login failed 2016-10-03 17:21:22 | server1 | - | database1 | 127.0.0.1:64214 | http jwt | user 'root' wrong credentials | /_open/auth Please note, that the user given as third part is the user that requested the login. In general, it will be empty. Authorization User not authorized to access database 2016-10-03 16:20:52 | server1 | user1 | database2 | 127.0.0.1:62262 | http basic | not authorized | /_api/version Databases Create a database 2016-10-04 15:33:25 | server1 | user1 | database1 | 127.0.0.1:56920 | http basic | create database 'database1' | ok | /_api/dat abase 471 Events Drop a database 2016-10-04 15:33:25 | server1 | user1 | database1 | 127.0.0.1:56920 | http basic | delete database 'database1' | ok | /_api/dat abase Collections Create a collection 2016-10-05 17:35:57 | server1 | user1 | database1 | 127.0.0.1:51294 | http basic | create collection 'collection1' | ok | /_api /collection Truncate a collection 2016-10-05 17:36:08 | server1 | user1 | database1 | 127.0.0.1:51294 | http basic | truncate collection 'collection1' | ok | /_a pi/collection/collection1/truncate Drop a collection 2016-10-05 17:36:30 | server1 | user1 | database1 | 127.0.0.1:51294 | http basic | delete collection 'collection1' | ok | /_api /collection/collection1 Indexes Create a index 2016-10-05 18:19:40 | server1 | user1 | database1 | 127.0.0.1:52467 | http basic | create index in 'collection1' | ok | {"field s":["a"],"sparse":false,"type":"skiplist","unique":false} | /_api/index?collection=collection1 Drop a index 2016-10-05 18:18:28 | server1 | user1 | database1 | 127.0.0.1:52464 | http basic | drop index ':44051' | ok | /_api/index/colle ction1/44051 Documents Reading a single document 2016-10-04 12:27:55 | server1 | user1 | database1 | 127.0.0.1:53699 | http basic | create document ok | /_api/document/collecti on1 Replacing a single document 2016-10-04 12:28:08 | server1 | user1 | database1 | 127.0.0.1:53699 | http basic | replace document ok | /_api/document/collect ion1/21456?ignoreRevs=false Modifying a single document 2016-10-04 12:28:15 | server1 | user1 | database1 | 127.0.0.1:53699 | http basic | modify document ok | /_api/document/collecti on1/21456?keepNull=true&ignoreRevs=false 472 Events Deleting a single document 2016-10-04 12:28:23 | server1 | user1 | database1 | 127.0.0.1:53699 | http basic | delete document ok | /_api/document/collecti on1/21456?ignoreRevs=false For example, if someones tries to delete a non-existing document, it will be logged as 2016-10-04 12:28:26 | server1 | user1 | database1 | 127.0.0.1:53699 | http basic | delete document failed | /_api/document/coll ection1/21456?ignoreRevs=false Queries 2016-10-06 12:12:10 | server1 | user1 | database1 | 127.0.0.1:54232 | 
http basic | query document | ok | for i in collection1 r eturn i | /_api/cursor 473 Replication Introduction to Replication Replication allows you to replicate data onto another machine. It forms the base of all disaster recovery and failover features ArangoDB offers. ArangoDB offers asynchronous and synchronous replication, depending on which type of arangodb deployment you are using. Since ArangoDB 3.2 the synchronous replication replication is the only replication type used in a cluster whereas the asynchronous replication is only available between single-server nodes. Future versions of ArangoDB may reintroduce asynchronous replication for the cluster. We will describe pros and cons of each of them in the following sections. Asynchronous replication In ArangoDB any write operation will be logged to the write-ahead log. When using Asynchronous replication slaves will connect to a master and apply all the events from the log in the same order locally. After that, they will have the same state of data as the master database. Synchronous replication Synchronous replication only works within a cluster and is typically used for mission critical data which must be accessible at all times. Synchronous replication generally stores a copy of a shard's data on another db server and keeps it in sync. Essentially, when storing data after enabling synchronous replication the cluster will wait for all replicas to write all the data before greenlighting the write operation to the client. This will naturally increase the latency a bit, since one more network hop is needed for each write. However, it will enable the cluster to immediately fail over to a replica whenever an outage has been detected, without losing any committed data, and mostly without even signaling an error condition to the client. Synchronous replication is organized such that every shard has a leader and The number of followers can be controlled using the replicationFactor replicationFactor r-1 followers, where r denoted the replication factor. parameter whenever you create a collection, the parameter is the total number of copies being kept, that is, it is one plus the number of followers. Satellite collections Satellite collections are synchronously replicated collections having a dynamic replicationFactor. They will replicate all data to all database servers allowing the database servers to join data locally instead of doing heavy network operations. Satellite collections are an enterprise only feature. 474 Asynchronous Replication Asynchronous replication Asynchronous replication works by logging every data modification on a master and replaying these events on a number of slaves. Transactions are honored in replication, i.e. transactional write operations will become visible on slaves atomically. As all write operations will be logged to a master database's write-ahead log, the replication in ArangoDB currently cannot be used for write-scaling. The main purposes of the replication in current ArangoDB are to provide read-scalability and "hot backups" of specific databases. It is possible to connect multiple slave databases to the same master database. Slave databases should be used as read-only instances, and no user-initiated write operations should be carried out on them. Otherwise data conflicts may occur that cannot be solved automatically, and that will make the replication stop. In an asynchronous replication scenario slaves will pull changes from the master database. 
Slaves need to know to which master database they should connect to, but a master database is not aware of the slaves that replicate from it. When the network connection between the master database and a slave goes down, write operations on the master can continue normally. When the network is up again, slaves can reconnect to the master database and transfer the remaining changes. This will happen automatically provided slaves are configured appropriately. Replication lag In this setup, write operations are applied first in the master database, and applied in the slave database(s) afterwards. For example, let's assume a write operation is executed in the master database at point in time t0. To make a slave database apply the same operation, it must first fetch the write operation's data from master database's write-ahead log, then parse it and apply it locally. This will happen at some point in time after t0, let's say t1. The difference between t1 and t0 is called the replication lag, and it is unavoidable in asynchronous replication. The amount of replication lag depends on many factors, a few of which are: the network capacity between the slaves and the master the load of the master and the slaves the frequency in which slaves poll the master for updates Between t0 and t1, the state of data on the master is newer than the state of data on the slave(s). At point in time t1, the state of data on the master and slave(s) is consistent again (provided no new data modifications happened on the master in between). Thus, the replication will lead to an eventually consistent state of data. Replication configuration The replication is turned off by default. In order to create a master-slave setup, the so-called replication applier needs to be enabled on the slave databases. Replication is configured on a per-database level. If multiple database are to be replicated, the replication must be set up individually per database. The replication applier on the slave can be used to perform a one-time synchronization with the master (and then stop), or to perform an ongoing replication of changes. To resume replication on slave restart, the autoStart attribute of the replication applier must be set to true. Replication overhead As the master servers are logging any write operation in the write-ahead-log anyway replication doesn't cause any extra overhead on the master. However it will of course cause some overhead for the master to serve incoming read requests of the slaves. Returning the requested data is however a trivial task for the master and should not result in a notable performance degration in production. 475 Asynchronous Replication Components Replication Logger Purpose The replication logger will write all data-modification operations into the write-ahead log. This log may then be read by clients to replay any data modification on a different server. Checking the state To query the current state of the logger, use the state command: require("@arangodb/replication").logger.state(); The result might look like this: { "state" : { "running" : true, "lastLogTick" : "133322013", "totalEvents" : 16, "time" : "2014-07-06T12:58:11Z" }, "server" : { "version" : "2.2.0-devel", "serverId" : "40897075811372" }, "clients" : { } } The running attribute will always be true. In earlier versions of ArangoDB the replication was optional and this could have been false. The totalEvents attribute indicates how many log events have been logged since the start of the ArangoDB server. 
Finally, the lastLogTick value indicates the id of the last operation that was written to the server's write-ahead log. It can be used to determine whether new operations were logged, and is also used by the replication applier for incremental fetching of data. Note: The replication logger state can also be queried via the HTTP API. To query which data ranges are still available for replication clients to fetch, the logger provides the firstTick and tickRanges functions: require("@arangodb/replication").logger.firstTick(); This will return the minimum tick value that the server can provide to replication clients via its replication APIs. The tickRanges function returns the minimum and maximum tick values per logfile: require("@arangodb/replication").logger.tickRanges(); Replication Applier Purpose The purpose of the replication applier is to read data from a master database's event log, and apply them locally. The applier will check the master database for new operations periodically. It will perform an incremental synchronization, i.e. only asking the master for operations that occurred after the last synchronization. 476 Asynchronous Replication The replication applier does not get notified by the master database when there are "new" operations available, but instead uses the pull principle. It might thus take some time (the so-called replication lag) before an operation from the master database gets shipped to and applied in a slave database. The replication applier of a database is run in a separate thread. It may encounter problems when an operation from the master cannot be applied safely, or when the connection to the master database goes down (network outage, master database is down or unavailable etc.). In this case, the database's replication applier thread might terminate itself. It is then up to the administrator to fix the problem and restart the database's replication applier. If the replication applier cannot connect to the master database, or the communication fails at some point during the synchronization, the replication applier will try to reconnect to the master database. It will give up reconnecting only after a configurable amount of connection attempts. The replication applier state is queryable at any time by using the state command of the applier. This will return the state of the applier of the current database: require("@arangodb/replication").applier.state(); The result might look like this: { "state" : { "running" : true, "lastAppliedContinuousTick" : "152786205", "lastProcessedContinuousTick" : "152786205", "lastAvailableContinuousTick" : "152786205", "progress" : { "time" : "2014-07-06T13:04:57Z", "message" : "fetching master log from offset 152786205", "failedConnects" : 0 }, "totalRequests" : 38, "totalFailedConnects" : 0, "totalEvents" : 1, "lastError" : { "errorNum" : 0 }, "time" : "2014-07-06T13:04:57Z" }, "server" : { "version" : "2.2.0-devel", "serverId" : "210189384542896" }, "endpoint" : "tcp://master.example.org:8529", "database" : "_system" } The running attribute indicates whether the replication applier of the current database is currently running and polling the server at endpoint for new events. The progress.failedConnects attribute shows how many failed connection attempts the replication applier currently has encountered in a row. In contrast, the totalFailedConnects attribute indicates how many failed connection attempts the applier has made in total. 
The totalRequests attribute shows how many requests the applier has sent to the master database in total. The totalEvents attribute shows how many log events the applier has read from the master. The progress.message sub-attribute provides a brief hint of what the applier currently does (if it is running). The lastError attribute also has an optional errorMessage sub-attribute, showing the latest error message. The errorNum sub-attribute of the lastError attribute can be used by clients to programmatically check for errors. It should be 0 if there is no error, and it should be non-zero if the applier terminated itself due to a problem. Here is an example of the state after the replication applier terminated itself due to (repeated) connection problems: { "state" : { "running" : false, "progress" : { "time" : "2014-07-06T13:14:37Z", 477 Asynchronous Replication "message" : "applier stopped", "failedConnects" : 6 }, "totalRequests" : 79, "totalFailedConnects" : 11, "totalEvents" : 0, "lastError" : { "time" : "2014-07-06T13:09:41Z", "errorMessage" : "could not connect to master at tcp://master.example.org:8529: Could not connect to 'tcp:/...", "errorNum" : 1400 }, ... } } Note: the state of a database's replication applier is queryable via the HTTP API, too. Please refer to HTTP Interface for Replication for more details. All-in-one setup To copy the initial data from the slave to the master and start the continuous replication, there is an all-in-one command setupReplication: require("@arangodb/replication").setupReplication(configuration); The following example demonstrates how to use the command for setting up replication for the _system database. Note that it should be run on the slave and not the master: db._useDatabase("_system"); require("@arangodb/replication").setupReplication({ endpoint: "tcp://master.domain.org:8529", username: "myuser", password: "mypasswd", verbose: false, includeSystem: false, incremental: true, autoResync: true }); The command will return when the initial synchronization is finished and the continuous replication is started, or in case the initial synchronization has failed. If the initial synchronization is successful, the command will store the given configuration on the slave. It also configures the continuous replication to start automatically if the slave is restarted, i.e. autoStart is set to true. If the command is run while the slave's replication applier is already running, it will first stop the running applier, drop its configuration and do a resynchronization of data with the master. It will then use the provided configration, overwriting any previously existing replication configuration on the slave. Starting and Stopping To manually start and stop the applier in the current database, the start and stop commands can be used like this: require("@arangodb/replication").applier.start( ); require("@arangodb/replication").applier.stop(); Note: Starting a replication applier without setting up an initial configuration will fail. The replication applier will look for its configuration in a file named REPLICATION-APPLIER-CONFIG in the current database's directory. If the file is not present, ArangoDB will use some default configuration, but it cannot guess the endpoint (the address of the master database) the applier should connect to. Thus starting the applier without configuration will fail. Note that at the first time you start the applier, you should pass the value returned in the lastLogTick attribute of the initial sync operation. 
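When scripting applier management, it is common to inspect the state before deciding whether to (re)start the applier. The following sketch only uses the calls documented above; the tick value is a placeholder for the lastLogTick returned by the initial sync:

var replication = require("@arangodb/replication");
var state = replication.applier.state().state;
if (!state.running) {
  if (state.lastError.errorNum !== 0) {
    /* the applier stopped itself due to a problem */
    print("applier error: " + state.lastError.errorNum);
  }
  replication.applier.start("40694126"); /* placeholder: lastLogTick from the initial sync */
}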
478 Asynchronous Replication Note: Starting a database's replication applier via the start command will not necessarily start the applier on the next and following ArangoDB server restarts. Additionally, stopping a database's replication applier manually will not necessarily prevent the applier from being started again on the next server start. All of this is configurable separately (hang on reading). Note: when stopping and restarting the replication applier of database, it will resume where it last stopped. This is sensible because replication log events should be applied incrementally. If the replication applier of a database has never been started before, it needs some tick value from the master's log from which to start fetching events. There is one caveat to consider when stopping a replication on the slave: if there are still ongoing replicated transactions that are neither committed or aborted, stopping the replication applier will cause these operations to be lost for the slave. If these transactions commit on the master later and the replication is resumed, the slave will not be able to commit these transactions, too. Thus stopping the replication applier on the slave manually should only be done if there is certainty that there are no ongoing transactions on the master. Configuration To configure the replication applier of a specific database, use the properties command. Using it without any arguments will return the applier's current configuration: require("@arangodb/replication").applier.properties(); The result might look like this: { "requestTimeout" : 600, "connectTimeout" : 10, "ignoreErrors" : 0, "maxConnectRetries" : 10, "chunkSize" : 0, "autoStart" : false, "adaptivePolling" : true, "includeSystem" : true, "requireFromPresent" : false, "autoResync" : false, "autoResyncRetries" : 2, "verbose" : false } Note: There is no endpoint attribute configured yet. The endpoint attribute is required for the replication applier to be startable. You may also want to configure a username and password for the connection via the username and password attributes. require("@arangodb/replication").applier.properties({ endpoint: "tcp://master.domain.org:8529", username: "root", password: "secret", verbose: false }); This will re-configure the replication applier for the current database. The configuration will be used from the next start of the replication applier. The replication applier cannot be re-configured while it is running. It must be stopped first to be re-configured. To make the replication applier of the current database start automatically when the ArangoDB server starts, use the autoStart attribute. Setting the adaptivePolling attribute to true will make the replication applier poll the master database for changes with a variable frequency. The replication applier will then lower the frequency when the master is idle, and increase it when the master can provide new events). Otherwise the replication applier will poll the master database for changes with a constant frequency. The idleMinWaitTime attribute controls the minimum wait time (in seconds) that the replication applier will intentionally idle before fetching more log data from the master in case the master has already sent all its log data. This wait time can be used to control the frequency with which the replication applier sends HTTP log fetch requests to the master in case there is no write activity on the master. 
The idleMaxWaitTime attribute controls the maximum wait time (in seconds) that the replication applier will intentionally idle before fetching more log data from the master in case the master has already sent all its log data and there have been previous log fetch attempts that resulted in no more log data. This wait time can be used to control the maximum frequency with which the replication applier sends 479 Asynchronous Replication HTTP log fetch requests to the master in case there is no write activity on the master for longer periods. Note that this configuration value will only be used if the option adaptivePolling is set to true. To set a timeout for connection and following request attempts, use the connectTimeout and requestTimeout values. The maxConnectRetries attribute configures after how many failed connection attempts in a row the replication applier will give up and turn itself off. You may want to set this to a high value so that temporary network outages do not lead to the replication applier stopping itself. The connectRetryWaitTime attribute configures how long the replication applier will wait before retrying the connection to the master in case of connection problems. The chunkSize attribute can be used to control the approximate maximum size of a master's response (in bytes). Setting it to a low value may make the master respond faster (less data is assembled before the master sends the response), but may require more requestresponse roundtrips. Set it to 0 to use ArangoDB's built-in default value. The includeSystem attribute controls whether changes to system collections (such as _graphs or _users) should be applied. If set to true, changes in these collections will be replicated, otherwise, they will not be replicated. It is often not necessary to replicate data from system collections, especially because it may lead to confusion on the slave because the slave needs to have its own system collections in order to start and keep operational. The requireFromPresent attribute controls whether the applier will start synchronizing in case it detects that the master cannot provide data for the initial tick value provided by the slave. This may be the case if the master does not have a big enough backlog of historic WAL logfiles, and when the replication is re-started after a longer pause. When requireFromPresent is set to true, then the replication applier will check at start whether the start tick from which it starts or resumes replication is still present on the master. If not, then there would be data loss. If requireFromPresent is true, the replication applier will abort with an appropriate error message. If set to false, then the replication applier will still start, and ignore the data loss. The autoResync option can be used in conjunction with the requireFromPresent option as follows: when both requireFromPresent and autoResync are set to true and the master cannot provide the log data the slave requests, the replication applier will stop as usual. But due to the fact that autoResync is set to true, the slave will automatically trigger a full resync of all data with the master. After that, the replication applier will go into continuous replication mode again. Additionally, setting autoResync to true will trigger a full resynchronization of data when the continuous replication is started and detects that there is no start tick value. Automatic re-synchronization may transfer a lot of data from the master to the slave and can be expensive. 
It is therefore turned off by default. When turned off, the slave will never perform an automatic re-synchronization with the master.

The autoResyncRetries option can be used to control the number of resynchronization retries that will be performed in a row when automatic resynchronization is enabled and kicks in. Setting this to 0 will effectively disable autoResync. Setting it to some other value will limit the number of retries that are performed. This helps prevent endless retries in case resynchronizations always fail.

The verbose attribute controls the verbosity of the replication logger. Setting it to true will make the replication applier write a line to the log for every operation it performs. This should only be used for diagnosing replication problems.

The following example will set most of the discussed properties for the current database's applier:

require("@arangodb/replication").applier.properties({
  endpoint: "tcp://master.domain.org:8529",
  username: "root",
  password: "secret",
  adaptivePolling: true,
  connectTimeout: 15,
  maxConnectRetries: 100,
  chunkSize: 262144,
  autoStart: true,
  includeSystem: true,
  autoResync: true,
  autoResyncRetries: 2
});

Now that the applier is fully configured, it could theoretically be started. However, we may first need an initial synchronization of all collections and their data from the master before we start the replication applier. The only safe method for doing a full synchronization (or re-synchronization) is thus to
stop the replication applier on the slave (if currently running),
perform an initial full sync with the master database,
note the master database's lastLogTick value, and
start the continuous replication applier on the slave using this tick value.

The initial synchronization for the current database is executed with the sync command:

require("@arangodb/replication").sync({
  endpoint: "tcp://master.domain.org:8529",
  username: "root",
  password: "secret",
  includeSystem: true
});

The includeSystem option controls whether data from system collections (such as _graphs and _users) shall be synchronized.

The initial synchronization can optionally be configured to include or exclude specific collections using the restrictType and restrictCollections parameters. The following command only synchronizes the collections foo and bar:

require("@arangodb/replication").sync({
  endpoint: "tcp://master.domain.org:8529",
  username: "root",
  password: "secret",
  restrictType: "include",
  restrictCollections: [ "foo", "bar" ]
});

Using a restrictType of exclude, all collections but the specified ones will be synchronized.

Warning: sync will do a full synchronization of the collections in the current database with the collections present in the master database. Any local instances of the collections and all their data are removed! Only execute this command if you are sure you want to remove the local data!

As sync does a full synchronization, it might take a while to execute. When sync completes successfully, it returns an array of collections it has synchronized in its collections attribute. It will also return the master database's last log tick value at the time the sync was started on the master. The tick value is contained in the lastLogTick attribute of the sync command:

{
  "lastLogTick" : "231848833079705",
  "collections" : [ ...
] } Now you can start the continuous synchronization for the current database on the slave with the command require("@arangodb/replication").applier.start("231848833079705"); Note: The tick values should be treated as strings. Using numeric data types for tick values is unsafe because they might exceed the 32 bit value and the IEEE754 double accuracy ranges. 481 Asynchronous Replication Per-Database Setup This page describes the replication process based on a specific database within an ArangoDB instance. That means that only the specified database will be replicated. Setting up a working master-slave replication requires two ArangoDB instances: master: this is the instance that all data-modification operations should be directed to slave: on this instance, we'll start a replication applier, and this will fetch data from the master database's write-ahead log and apply its operations locally For the following example setup, we'll use the instance tcp://master.domain.org:8529 as the master, and the instance tcp://slave.domain.org:8530 as a slave. The goal is to have all data from the database _system on master tcp://master.domain.org:8529 be replicated to the database _system on the slave tcp://slave.domain.org:8530. On the master, nothing special needs to be done, as all write operations will automatically be logged in the master's write-ahead log (WAL). All-in-one setup To make the replication copy the initial data from the master to the slave and start the continuous replication on the slave, there is an all-in-one command: require("@arangodb/replication").setupReplication(configuration); The following example demonstrates how to use the command for setting up replication for the _system database. Note that it should be run on the slave and not the master: db._useDatabase("_system"); require("@arangodb/replication").setupReplication({ endpoint: "tcp://master.domain.org:8529", username: "myuser", password: "mypasswd", verbose: false, includeSystem: false, incremental: true, autoResync: true }); The command will return when the initial synchronization is finished and the continuous replication has been started, or in case the initial synchronization has failed. If the initial synchronization is successful, the command will store the given configuration on the slave. It also configures the continuous replication to start automatically if the slave is restarted, i.e. autoStart is set to true. If the command is run while the slave's replication applier is already running, it will first stop the running applier, drop its configuration and do a resynchronization of data with the master. It will then use the provided configration, overwriting any previously existing replication configuration on the slave. Initial synchronization The initial synchronization and continuous replication applier can also be started separately. To start replication on the slave, make sure there currently is no replication applier running. The following commands stop a running applier in the slave's _system database: db._useDatabase("_system"); 482 Asynchronous Replication require("@arangodb/replication").applier.stop(); The stop operation will terminate any replication activity in the _system database on the slave. After that, the initial synchronization can be run. It will copy the collections from the master to the slave, overwriting existing data. 
To run the initial synchronization, execute the following commands on the slave: db._useDatabase("_system"); require("@arangodb/replication").sync({ endpoint: "tcp://master.domain.org:8529", username: "myuser", password: "mypasswd", verbose: false }); Username and password only need to be specified when the master requires authentication. To check what the synchronization is currently doing, supply set the verbose option to true. If set, the synchronization will create log messages with the current synchronization status. Warning: The sync command will replace data in the slave database with data from the master database! Only execute these commands if you have verified you are on the correct server, in the correct database! The sync operation will return an attribute named lastLogTick which we'll need to note. The last log tick will be used as the starting point for subsequent replication activity. Let's assume we got the following last log tick: { "lastLogTick" : "40694126", ... } Initial synchronization from the ArangoShell The initial synchronization via the sync command may take a long time to complete. The shell will block until the slave has completed the initial synchronization or until an error occurs. By default, the sync command in the ArangoShell will poll the slave for a status update every 10 seconds. Optionally the sync command can be made non-blocking by setting its async option to true. In this case, the sync command will return instantly with an id string, and the initial synchronization will run detached on the master. To fetch the current status of the sync progress from the ArangoShell, the getSyncResult function can be used as follows: db._useDatabase("_system"); var replication = require("@arangodb/replication"); /* run command in async mode */ var id = replication.sync({ endpoint: "tcp://master.domain.org:8529", username: "myuser", password: "mypasswd", async: true }); /* now query the status of our operation */ print(replication.getSyncResult(id)); getSyncResult will return false as long as the synchronization is not complete, and return the synchronization result otherwise. Continuous synchronization When the initial synchronization is finished, the continuous replication applier can be started using the last log tick provided by the sync command. Before starting it, there is at least one configuration option to consider: replication on the slave will be running until the slave gets shut down. When the slave server gets restarted, replication will be turned off again. To change this, we first need to configure the slave's replication applier and set its autoStart attribute. 483 Asynchronous Replication Here's the command to configure the replication applier with several options, including the autoStart attribute: db._useDatabase("_system"); require("@arangodb/replication").applier.properties({ endpoint: "tcp://master.domain.org:8529", username: "myuser", password: "mypasswd", autoStart: true, autoResync: true, autoResyncRetries: 2, adaptivePolling: true, includeSystem: false, requireFromPresent: false, idleMinWaitTime: 0.5, idleMaxWaitTime: 1.5, verbose: false }); An important consideration for replication is whether data from system collections (such as _graphs or _users) should be applied. The includeSystem option controls that. If set to true, changes in system collections will be replicated. Otherwise, they will not be replicated. 
It is often not necessary to replicate data from system collections, especially because the slave needs to have its own system collections in order to start and remain operational, so replicating them may lead to confusion on the slave.

The requireFromPresent attribute controls whether the applier will start synchronizing in case it detects that the master cannot provide data for the initial tick value provided by the slave. This may be the case if the master does not have a big enough backlog of historic WAL logfiles, and when the replication is re-started after a longer pause. When requireFromPresent is set to true, the replication applier will check at start whether the start tick from which it starts or resumes replication is still present on the master. If it is not, there would be data loss: with requireFromPresent set to true, the replication applier will abort with an appropriate error message. If set to false, the replication applier will still start, and ignore the data loss.

The autoResync option can be used in conjunction with the requireFromPresent option as follows: when both requireFromPresent and autoResync are set to true and the master cannot provide the log data the slave had requested, the replication applier will stop as usual. But because autoResync is set to true, the slave will automatically trigger a full resync of all data with the master. After that, the replication applier will go into continuous replication mode again. Additionally, setting autoResync to true will trigger a full resynchronization of data when the continuous replication is started and detects that there is no start tick value.

Note that automatic re-synchronization (autoResync option set to true) may transfer a lot of data from the master to the slave and can therefore be expensive. Still it's turned on here so there's less need for manual intervention.

The autoResyncRetries option can be used to control the number of resynchronization retries that will be performed in a row when automatic resynchronization is enabled and kicks in. Setting this to 0 will effectively disable autoResync. Setting it to some other value will limit the number of retries that are performed. This helps prevent endless retries in case resynchronizations always fail.

Now it's time to start the replication applier on the slave using the last log tick we got before:

  db._useDatabase("_system");
  require("@arangodb/replication").applier.start("40694126");

This will replicate all operations happening in the master's _system database and apply them on the slave, too.

After that, you should be able to monitor the state and progress of the replication applier by executing the state command on the slave server:

  db._useDatabase("_system");
  require("@arangodb/replication").applier.state();

Please note that stopping the replication applier on the slave using the stop command should be avoided. The reason is that currently ongoing transactions (that have partly been replicated to the slave) will need to be restarted after a restart of the replication applier. Stopping and restarting the replication applier on the slave should thus only be performed if there is certainty that the master is currently fully idle and all transactions have been replicated fully. Note that while a slave has only partly executed a transaction from the master, it might keep a write lock on the collections involved in the transaction.
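For routine monitoring it can be convenient to print only the most relevant parts of the applier state. The following is a minimal sketch, assuming the object returned by applier.state() exposes a running flag, the last applied tick and the last error as shown here (field layout as observed in 3.3; treat it as illustrative rather than a complete reference):

  db._useDatabase("_system");
  var applierState = require("@arangodb/replication").applier.state();

  /* print a short summary of the applier state */
  print("running: " + applierState.state.running);
  print("last applied tick: " + applierState.state.lastAppliedContinuousTick);
  if (applierState.state.lastError && applierState.state.lastError.errorNum > 0) {
    print("last error: " + applierState.state.lastError.errorMessage);
  }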
You may also want to check the master and slave states via the HTTP APIs (see HTTP Interface for Replication).

Server-level Setup

This page describes the replication process based on a complete ArangoDB instance. That means that all included databases will be replicated.

Setting up a working master-slave replication requires two ArangoDB instances:

- master: this is the instance that all data-modification operations should be directed to
- slave: on this instance, we'll start a replication applier, and this will fetch data from the master database's write-ahead log and apply its operations locally

For the following example setup, we'll use the instance tcp://master.domain.org:8529 as the master, and the instance tcp://slave.domain.org:8530 as the slave.

The goal is to have all data of all databases on the master tcp://master.domain.org:8529 replicated to the slave instance tcp://slave.domain.org:8530.

On the master, nothing special needs to be done, as all write operations will automatically be logged in the master's write-ahead log (WAL).

All-in-one setup

To make the replication copy the initial data from the master to the slave and start the continuous replication on the slave, there is an all-in-one command:

  require("@arangodb/replication").setupReplicationGlobal(configuration);

The following example demonstrates how to use the command for setting up replication for the complete ArangoDB instance. Note that it should be run on the slave and not the master:

  db._useDatabase("_system");
  require("@arangodb/replication").setupReplicationGlobal({
    endpoint: "tcp://127.0.0.1:8529",
    username: "root",
    password: "",
    autoStart: true
  });

The command will return when the initial synchronization is finished and the continuous replication has been started, or when the initial synchronization has failed.

If the initial synchronization is successful, the command will store the given configuration on the slave. It also configures the continuous replication to start automatically when the slave is restarted, i.e. autoStart is set to true.

If the command is run while the slave's replication applier is already running, it will first stop the running applier, drop its configuration and do a resynchronization of data with the master. It will then use the provided configuration, overwriting any previously existing replication configuration on the slave.

Stopping synchronization

The initial synchronization and continuous replication applier can also be started separately. To start replication on the slave, make sure there currently is no replication applier running.

The following commands stop a running applier in the slave's instance:

  db._useDatabase("_system");
  require("@arangodb/replication").globalApplier.stop();

The stop operation will terminate any replication activity in the ArangoDB instance on the slave. After that, the initial synchronization can be run. It will copy the collections from the master to the slave, overwriting existing data.

To run the initial synchronization, execute the following commands on the slave:

  db._useDatabase("_system");
  require("@arangodb/replication").syncGlobal({
    endpoint: "tcp://master.domain.org:8529",
    username: "myuser",
    password: "mypasswd",
    verbose: false
  });

Username and password only need to be specified when the master requires authentication. To check what the synchronization is currently doing, set the verbose option to true.
If set, the synchronization will create log messages with the current synchronization status.

Warning: The syncGlobal command will replace data in the slave database with data from the master database! Only execute these commands if you have verified you are on the correct server, in the correct database!

The sync operation will return an attribute named lastLogTick which we'll need to note. The last log tick will be used as the starting point for subsequent replication activity. Let's assume we got the following last log tick:

  { "lastLogTick" : "40694126", ... }

Initial synchronization from the ArangoShell

The initial synchronization via the syncGlobal command may take a long time to complete. The shell will block until the slave has completed the initial synchronization or until an error occurs. By default, the syncGlobal command in the ArangoShell will poll the slave for a status update every 10 seconds.

Optionally the syncGlobal command can be made non-blocking by setting its async option to true. In this case, the syncGlobal command will return instantly with an id string, and the initial synchronization will run detached on the slave. To fetch the current status of the syncGlobal progress from the ArangoShell, the getSyncResult function can be used as follows:

  db._useDatabase("_system");
  var replication = require("@arangodb/replication");

  /* run command in async mode */
  var id = replication.syncGlobal({
    endpoint: "tcp://master.domain.org:8529",
    username: "myuser",
    password: "mypasswd",
    async: true
  });

  /* now query the status of our operation */
  print(replication.getSyncResult(id));

getSyncResult will return false as long as the synchronization is not complete, and return the synchronization result otherwise.

Continuous synchronization

When the initial synchronization is finished, the continuous replication applier can be started using the last log tick provided by the syncGlobal command. Before starting it, there is at least one configuration option to consider: replication on the slave will run until the slave gets shut down. When the slave server gets restarted, replication will be turned off again. To change this, we first need to configure the slave's replication applier and set its autoStart attribute.

Here's the command to configure the replication applier with several options, including the autoStart attribute:

  db._useDatabase("_system");
  require("@arangodb/replication").globalApplier.properties({
    endpoint: "tcp://master.domain.org:8529",
    username: "myuser",
    password: "mypasswd",
    autoStart: true,
    autoResync: true,
    autoResyncRetries: 2,
    adaptivePolling: true,
    includeSystem: false,
    requireFromPresent: false,
    idleMinWaitTime: 0.5,
    idleMaxWaitTime: 1.5,
    verbose: false
  });

An important consideration for replication is whether data from system collections (such as _graphs or _users) should be applied. The includeSystem option controls that. If set to true, changes in system collections will be replicated. Otherwise, they will not be replicated. It is often not necessary to replicate data from system collections, especially because the slave needs to have its own system collections in order to start and remain operational, so replicating them may lead to confusion on the slave.

The requireFromPresent attribute controls whether the applier will start synchronizing in case it detects that the master cannot provide data for the initial tick value provided by the slave.
This may be the case if the master does not have a big enough backlog of historic WAL logfiles, and when the replication is re-started after a longer pause. When requireFromPresent is set to true, the replication applier will check at start whether the start tick from which it starts or resumes replication is still present on the master. If it is not, there would be data loss: with requireFromPresent set to true, the replication applier will abort with an appropriate error message. If set to false, the replication applier will still start, and ignore the data loss.

The autoResync option can be used in conjunction with the requireFromPresent option as follows: when both requireFromPresent and autoResync are set to true and the master cannot provide the log data the slave had requested, the replication applier will stop as usual. But because autoResync is set to true, the slave will automatically trigger a full resync of all data with the master. After that, the replication applier will go into continuous replication mode again. Additionally, setting autoResync to true will trigger a full resynchronization of data when the continuous replication is started and detects that there is no start tick value.

Note that automatic re-synchronization (autoResync option set to true) may transfer a lot of data from the master to the slave and can therefore be expensive. Still it's turned on here so there's less need for manual intervention.

The autoResyncRetries option can be used to control the number of resynchronization retries that will be performed in a row when automatic resynchronization is enabled and kicks in. Setting this to 0 will effectively disable autoResync. Setting it to some other value will limit the number of retries that are performed. This helps prevent endless retries in case resynchronizations always fail.

Now it's time to start the replication applier on the slave using the last log tick we got before:

  db._useDatabase("_system");
  require("@arangodb/replication").globalApplier.start("40694126");

This will replicate all operations happening on the master and apply them on the slave, too.

After that, you should be able to monitor the state and progress of the replication applier by executing the state command on the slave server:

  db._useDatabase("_system");
  require("@arangodb/replication").globalApplier.state();

Please note that stopping the replication applier on the slave using the stop command should be avoided. The reason is that currently ongoing transactions (that have partly been replicated to the slave) will need to be restarted after a restart of the replication applier. Stopping and restarting the replication applier on the slave should thus only be performed if there is certainty that the master is currently fully idle and all transactions have been replicated fully. Note that while a slave has only partly executed a transaction from the master, it might keep a write lock on the collections involved in the transaction.

You may also want to check the master and slave states via the HTTP APIs (see HTTP Interface for Replication).
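As a rough plausibility check from arangosh, the master's last log tick can be compared with the last tick the slave has applied. This is only a sketch and assumes the field names returned by logger.state() on the master and globalApplier.state() on the slave in 3.3 (lastLogTick and lastAppliedContinuousTick); since ticks are strings, they are merely printed here rather than compared numerically:

  /* on the master */
  var loggerState = require("@arangodb/replication").logger.state();
  print("master last log tick: " + loggerState.state.lastLogTick);

  /* on the slave */
  var applierState = require("@arangodb/replication").globalApplier.state();
  print("slave last applied tick: " + applierState.state.lastAppliedContinuousTick);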
Syncing Collections

In order to synchronize data for a single collection from a master to a slave instance, there is the syncCollection function. It will fetch all documents of the specified collection from the master database and store them in the local instance. After the synchronization, the collection data on the slave will be identical to the data on the master, provided no further data changes happen on the master. Any data changes that are performed on the master after the synchronization was started will not be captured by syncCollection, but need to be replicated using the regular replication applier mechanism.

For the following example setup, we'll use the instance tcp://master.domain.org:8529 as the master, and the instance tcp://slave.domain.org:8530 as the slave.

The goal is to have all data from the collection test in database _system on the master tcp://master.domain.org:8529 replicated to the collection test in database _system on the slave tcp://slave.domain.org:8530.

On the master, the collection test needs to be present in the _system database, with any data in it. To transfer this collection to the slave, issue the following commands there:

  db._useDatabase("_system");
  require("@arangodb/replication").syncCollection("test", {
    endpoint: "tcp://master.domain.org:8529",
    username: "myuser",
    password: "mypasswd"
  });

Warning: The syncCollection command will replace the collection's data in the slave database with data from the master database! Only execute these commands if you have verified you are on the correct server, in the correct database!

Setting the optional incremental attribute in the call to syncCollection will start an incremental transfer of data. This may be useful when the slave already has parts or almost all of the data in the collection and only the differences need to be synchronized. Note that to compute the differences the incremental transfer will build a sorted list of all document keys in the collection on both the slave and the master, which may still be expensive for huge collections in terms of memory usage and runtime. While the list of keys is being built, the collection will be read-locked on the master.

The initialSyncMaxWaitTime attribute in the call to syncCollection controls how long the slave will wait for a master's response. This wait time can be used to control after what time the synchronization will give up and fail.

The syncCollection command may take a long time to complete if the collection is big. The shell will block until the slave has synchronized the entire collection from the master or until an error occurs. By default, the syncCollection command in the ArangoShell will poll for a status update every 10 seconds.

When syncCollection is called from the ArangoShell, the optional async attribute can be used to start the synchronization as a background process on the slave. If async is set to true, the call to syncCollection will return almost instantly with an id string. Using this id string, the status of the sync job on the slave can be queried using the getSyncResult function as follows:

  db._useDatabase("_system");
  var replication = require("@arangodb/replication");

  /* run command in async mode */
  var id = replication.syncCollection("test", {
    endpoint: "tcp://master.domain.org:8529",
    username: "myuser",
    password: "mypasswd",
    async: true
  });

  /* now query the status of our operation */
  print(replication.getSyncResult(id));

getSyncResult will return false as long as the synchronization is not complete, and return the synchronization result otherwise.
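Putting the pieces together, the following sketch starts an incremental synchronization of the test collection in the background and polls until it has finished. It is only an illustration: the endpoint and credentials are the placeholders used throughout this page, and the 5-second poll interval is an arbitrary choice:

  db._useDatabase("_system");
  var replication = require("@arangodb/replication");
  var internal = require("internal");

  /* start an incremental synchronization as a background job on the slave */
  var id = replication.syncCollection("test", {
    endpoint: "tcp://master.domain.org:8529",
    username: "myuser",
    password: "mypasswd",
    incremental: true,  /* only transfer differences if the slave already has data */
    async: true         /* return immediately with a job id */
  });

  /* poll until the job has finished; getSyncResult returns false while it is still running */
  var result = false;
  while (result === false) {
    internal.wait(5);
    result = replication.getSyncResult(id);
  }
  print(result);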
Replication Limitations

The replication in ArangoDB has a few limitations. Some of these limitations may be removed in later versions of ArangoDB:

- there is no feedback from the slaves to the master. If a slave cannot apply an event it got from the master, the master will have a different state of data. In this case, the replication applier on the slave will stop and report an error. Administrators can then either "fix" the problem or re-sync the data from the master to the slave and start the applier again.
- at the moment it is assumed that only the replication applier executes write operations on a slave. ArangoDB currently does not prevent users from carrying out their own write operations on slaves, though this might lead to undefined behavior and the replication applier stopping.
- when a replication slave asks a master for log events, the replication master will return all write operations for user-defined collections, but it will exclude write operations for certain system collections. The following collections are excluded intentionally from replication: _apps, _trx, _replication, _configuration, _jobs, _queues, _sessions, _foxxlog and all statistics collections. Write operations for the following system collections can be queried from a master: _aqlfunctions, _graphs, _users.
- Foxx applications consist of database entries and application scripts in the file system. The file system parts of Foxx applications are not tracked anywhere and thus not replicated in current versions of ArangoDB. To replicate a Foxx application, it is required to copy the application to the remote server and install it there using the foxx-manager utility.
- master servers do not know which slaves are or will be connected to them. All servers in a replication setup are currently only loosely coupled. There currently is no way for a client to query which servers are present in a replication.
- when not using our Mesos integration, failover must be handled by clients or client APIs.
- there currently is one replication applier per ArangoDB database. It is thus not possible to have a slave apply operations from multiple masters into the same target database.
- replication is set up on a per-database level. When using ArangoDB with multiple databases, replication must be configured individually for each database.
- the replication applier is single-threaded, but write operations on the master may be executed in parallel if they affect different collections. Thus the replication applier might not be able to catch up with a very powerful and loaded master.
- replication is only supported between two ArangoDB servers running the same ArangoDB version. It is currently not possible to replicate between different ArangoDB versions.
- a replication applier cannot apply data from itself.

Synchronous Replication

At its core, synchronous replication will replicate write operations to multiple hosts. This feature is only available when operating ArangoDB in a cluster. Whenever a coordinator executes a synchronously replicated write operation, it will only be reported to be successful if it was carried out on all replicas. In contrast to multi-master replication setups known from other systems, ArangoDB's synchronous operation guarantees a consistent state across the cluster.

Implementation

Architecture inside the cluster

Synchronous replication can be configured per collection via the property replicationFactor. Synchronous replication requires a cluster to operate.
Whenever you specify a replicationFactor greater than 1 when creating a collection, synchronous replication will be activated for this collection. The cluster will determine suitable leaders and followers for every requested shard (numberOfShards) within the cluster. When requesting data of a shard, only the current leader will be asked, whereas followers will only keep their copy in sync. This is due to the current implementation of transactions.

Using synchronous replication alone will guarantee consistency and high availability at the cost of reduced performance: write requests will have a higher latency (due to every write request having to be executed on the followers) and read requests won't scale out as only the leader is being asked.

In a cluster, synchronous replication will be managed by the coordinators for the client. The data will always be stored on primaries.

The following example will give you an idea of how synchronous operation has been implemented in ArangoDB:

1. Connect to a coordinator via arangosh
2. Create a collection:

     127.0.0.1:8530@_system> db._create("test", {"replicationFactor": 2})

3. The coordinator will figure out a leader and 1 follower and create 1 shard (as this is the default)
4. Insert data:

     127.0.0.1:8530@_system> db.test.insert({"replication": " "})

5. The coordinator will write the data to the leader, which in turn will replicate it to the follower.
6. Only when both writes were successful is the result reported to be successful:

     { "_id" : "test/7987", "_key" : "7987", "_rev" : "7987" }

When a follower fails, the leader will give up on it after 3 seconds and proceed with the operation. As soon as the follower (or the network connection to the leader) is back up, the two will resynchronize and synchronous replication is resumed. This all happens transparently to the client.

The current implementation of ArangoDB does not allow changing the replicationFactor later. This is subject to change. In the meantime the only way is to dump and restore the collection. See the cookbook recipe about migrating.

Automatic failover

Whenever the leader of a shard is failing and there is a query trying to access data of that shard, the coordinator will continue trying to contact the leader until it times out. The internal cluster supervision running on the agency will check cluster health every few seconds and will take action if there is no heartbeat from a server for 15 seconds. If the leader doesn't come back in time, the supervision will reorganize the cluster by promoting for each shard a follower that is in sync with its leader to be the new leader. From then on, the coordinators will contact the new leader.

The process is best outlined using an example:

1. The leader of a shard (let's name it DBServer001) is going down.
2. A coordinator is asked to return a document:

     127.0.0.1:8530@_system> db.test.document("100069")

3. The coordinator determines which server is responsible for this document and finds DBServer001
4. The coordinator tries to contact DBServer001 and times out because it is not reachable.
5. After a short while the supervision (running in parallel on the agency) will see that heartbeats from DBServer001 are not coming in
6. The supervision promotes one of the followers (say DBServer002) that is in sync to be leader and makes DBServer001 a follower.
7. As the coordinator continues trying to fetch the document, it will see that the leader changed to DBServer002
8. The coordinator tries to contact the new leader (DBServer002) and returns the result:

     { "_key" : "100069", "_id" : "test/100069", "_rev" : "513", "replication" : " " }

9. After a while the supervision declares DBServer001 to be completely dead.
10. A new follower is determined from the pool of DBServers.
11. The new follower syncs its data from the leader and order is restored.

Please note that there may still be timeouts. Depending on when exactly the request has been done (in regard to the supervision) and depending on the time needed to reconfigure the cluster, the coordinator might fail with a timeout error!

Configuration

Requirements

Synchronous replication requires an operational ArangoDB cluster.

Enabling synchronous replication

Synchronous replication can be enabled per collection. When creating a collection you may specify the number of replicas using the replicationFactor parameter. The default value is set to 1, which effectively disables synchronous replication.

Example:

  127.0.0.1:8530@_system> db._create("test", {"replicationFactor": 3})

In the above case, any write operation will require 2 replicas to report success from now on.

Preparing growth

You may create a collection with a higher replication factor than there are db servers currently available. When additional db servers become available, the shards are automatically replicated to the newly available machines. Multiple replicas of the same shard can never coexist on the same db server instance.

Satellite Collections

Satellite Collections are an Enterprise only feature.

When doing joins in an ArangoDB cluster, data has to be exchanged between different servers. Joins will be executed on a coordinator. It will prepare an execution plan and execute it. When executing, the coordinator will contact all shards of the starting point of the join and ask for their data. The database servers carrying out this operation will load all their local data and then ask the cluster for the other part of the join. This again will be distributed to all involved shards of this join part. In sum this results in much network traffic and slow results, depending on the amount of data that has to be sent throughout the cluster.

Satellite collections are collections that are intended to address this issue. They use the synchronous replication infrastructure to replicate all their data to all database servers that are part of the cluster. This enables the database servers to execute that part of any join locally. This greatly improves performance for such joins at the cost of increased storage requirements and poorer write performance on this data.

To create a satellite collection, set the replicationFactor of this collection to "satellite". Using arangosh:

  arangosh> db._create("satellite", {"replicationFactor": "satellite"});

A full example

  arangosh> var explain = require("@arangodb/aql/explainer").explain
  arangosh> db._create("satellite", {"replicationFactor": "satellite"})
  arangosh> db._create("nonsatellite", {numberOfShards: 8})
  arangosh> db._create("nonsatellite2", {numberOfShards: 8})

Let's analyse a normal join not involving satellite collections:

  arangosh> explain("FOR doc in nonsatellite FOR doc2 in nonsatellite2 RETURN 1")

  Query string:
   FOR doc in nonsatellite FOR doc2 in nonsatellite2 RETURN 1

  Execution plan:
   Id   NodeType                  Site   Est.   Comment
    1   SingletonNode             DBS       1   * ROOT
    4   CalculationNode           DBS       1     - LET #2 = 1   /* json expression */   /* const assignment */
    2   EnumerateCollectionNode   DBS       0     - FOR doc IN nonsatellite   /* full collection scan */
   12   RemoteNode                COOR      0       - REMOTE
   13   GatherNode                COOR      0       - GATHER
    6   ScatterNode               COOR      0       - SCATTER
    7   RemoteNode                DBS       0       - REMOTE
    3   EnumerateCollectionNode   DBS       0       - FOR doc2 IN nonsatellite2   /* full collection scan */
    8   RemoteNode                COOR      0         - REMOTE
    9   GatherNode                COOR      0         - GATHER
    5   ReturnNode                COOR      0         - RETURN #2

  Indexes used:
   none

  Optimization rules applied:
   Id   RuleName
    1   move-calculations-up
    2   scatter-in-cluster
    3   remove-unnecessary-remote-scatter

All shards involved in querying the nonsatellite collection will fan out via the coordinator to the shards of nonsatellite2. In sum, 8 shards will open 8 connections to the coordinator, asking for the results of the nonsatellite2 part of the join. The coordinator will fan out to the 8 shards of nonsatellite2. So there will be quite some network traffic.

Let's now have a look at the same using satellite collections:

  arangosh> explain("FOR doc in nonsatellite FOR doc2 in satellite RETURN 1")

  Query string:
   FOR doc in nonsatellite FOR doc2 in satellite RETURN 1

  Execution plan:
   Id   NodeType                  Site   Est.   Comment
    1   SingletonNode             DBS       1   * ROOT
    4   CalculationNode           DBS       1     - LET #2 = 1   /* json expression */   /* const assignment */
    2   EnumerateCollectionNode   DBS       0     - FOR doc IN nonsatellite   /* full collection scan */
    3   EnumerateCollectionNode   DBS       0       - FOR doc2 IN satellite   /* full collection scan, satellite */
    8   RemoteNode                COOR      0         - REMOTE
    9   GatherNode                COOR      0         - GATHER
    5   ReturnNode                COOR      0         - RETURN #2

  Indexes used:
   none

  Optimization rules applied:
   Id   RuleName
    1   move-calculations-up
    2   scatter-in-cluster
    3   remove-unnecessary-remote-scatter
    4   remove-satellite-joins

In this scenario all shards of nonsatellite will be contacted. However, as the join is a satellite join, all shards can do the join locally, as the data is replicated to all servers, reducing the network overhead dramatically.

Caveats

The cluster will automatically keep all satellite collections on all servers in sync by facilitating the synchronous replication. This means that writes will be executed on the leader only, and this server will coordinate replication to the followers. If a follower doesn't answer in time (due to network problems, temporary shutdown etc.) it may be removed as a follower. This is being reported to the Agency.

The follower (once back in business) will then periodically check the Agency and know that it is out of sync. It will then automatically catch up. This may take a while depending on how much data has to be synced. When doing a join involving a satellite collection you can specify how long the DBServer is allowed to wait for sync until the query is being aborted. Check Accessing Cursors for details.

During a network failure there is also a minimal chance that a query was properly distributed to the DBServers but that a previous satellite write could not be replicated to a follower and the leader dropped the follower. The follower however only checks every few seconds if it is really in sync, so it might indeed deliver stale results.

Sharding

ArangoDB organizes its collection data in shards. Sharding allows you to use multiple machines to run a cluster of ArangoDB instances that together constitute a single database. This enables you to store much more data, since ArangoDB distributes the data automatically to the different servers.
In many situations one can also reap a benefit in data throughput, again because the load can be distributed to multiple machines.

Shards are configured per collection, so multiple shards of data form the collection as a whole. To determine the shard in which a document is to be stored, ArangoDB computes a hash across the shard key values. By default, this hash is created from _key.

To configure the number of shards:

  127.0.0.1:8529@_system> db._create("sharded_collection", {"numberOfShards": 4});

To configure the hashing for another attribute:

  127.0.0.1:8529@_system> db._create("sharded_collection", {"numberOfShards": 4, "shardKeys": ["country"]});

This would be useful to keep data of every country in one shard, which would result in better performance for queries working on a per-country basis. You can also specify multiple shardKeys.

Note however that if you change the shard keys from their default ["_key"], then finding a document in the collection by its primary key involves a request to every single shard. Furthermore, in this case one can no longer prescribe the primary key value of a new document but must use the automatically generated one. This latter restriction comes from the fact that ensuring uniqueness of the primary key would be very inefficient if the user could specify the primary key.

On which node in a cluster a particular shard is kept is undefined. There is no option to configure an affinity based on certain shard keys.

Unique indexes (hash, skiplist, persistent) on sharded collections are only allowed if the fields used to determine the shard key are also included in the list of attribute paths for the index:

  shardKeys   indexKeys
  a           a            ok
  a           b            not ok
  a           a, b         ok
  a, b        a            not ok
  a, b        b            not ok
  a, b        a, b         ok
  a, b        a, b, c      ok
  a, b, c     a, b         not ok
  a, b, c     a, b, c      ok
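To illustrate the rule from the table above in arangosh, the following sketch uses a hypothetical collection name (sharded_collection2) and the hypothetical attributes a, b and c: a unique hash index is accepted only when its fields contain all shard keys.

  /* collection sharded by the attributes "a" and "b" */
  db._create("sharded_collection2", { numberOfShards: 4, shardKeys: ["a", "b"] });

  /* allowed: the unique index covers both shard keys (plus an extra attribute) */
  db.sharded_collection2.ensureIndex({ type: "hash", fields: ["a", "b", "c"], unique: true });

  /* not allowed: shard key "b" is not part of the index fields, so this would be rejected */
  /* db.sharded_collection2.ensureIndex({ type: "hash", fields: ["a"], unique: true }); */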
General Upgrade Information

Recommended major upgrade procedure

To upgrade an existing ArangoDB 2.x to 3.0 please use the procedure described here.

Recommended minor upgrade procedure

To upgrade an existing ArangoDB database to a newer version of ArangoDB (e.g. 3.0 to 3.1, or 3.1 to 3.2), the following method is recommended:

- Check the CHANGELOG and the list of incompatible changes for API or other changes in the new version of ArangoDB and make sure your applications can deal with them
- Stop the "old" arangod service or binary
- Copy the entire "old" data directory to a safe place (that is, a backup)
- Install the new version of ArangoDB and start the server with the --database.auto-upgrade option once. This might write to the logfile of ArangoDB, so you may want to check the logs for any issues before going on.
- Start the "new" arangod service or binary regularly and check the logs for any issues. When you're confident everything went well, you may want to check the database directory for any files with the ending .old. These files are created by ArangoDB during upgrades and can be safely removed manually later.

If anything goes wrong during or shortly after the upgrade:

- Stop the "new" arangod service or binary
- Revert to the "old" arangod binary and restore the "old" data directory
- Start the "old" version again

It is not supported to use datafiles created or modified by a newer version of ArangoDB with an older ArangoDB version. For example, it is unsupported and is likely to cause problems when using 3.2 datafiles with an ArangoDB 3.0 instance.

Switching the storage engine

In order to use a different storage engine with an existing data directory, it is required to first create a logical backup of the data using arangodump. After that, the arangod server should be restarted with the desired storage engine selected (this can be done by setting the option --server.storage-engine) and using a non-existing data directory. When the server is up and running with the desired storage engine, the data can be re-imported using arangorestore.

Upgrading to ArangoDB 3.3

Please read the following sections if you upgrade from a previous version to ArangoDB 3.3. Please be sure that you have checked the list of changes in 3.3 before upgrading.

Upgrading to ArangoDB 3.2

Please read the following sections if you upgrade from a previous version to ArangoDB 3.2. Please be sure that you have checked the list of changes in 3.2 before upgrading.

Switching the storage engine

In order to use a different storage engine with an existing data directory, it is required to first create a logical backup of the data using arangodump. That backup should be created before the upgrade to 3.2.

After that, the ArangoDB installation can be upgraded and stopped. The server should then be restarted with the desired storage engine selected (this can be done by setting the option --server.storage-engine) and using a non-existing data directory. This will start the server with the selected storage engine but with no data. When the server is up and running, the data from the logical backup can be re-imported using arangorestore.

Upgrading to ArangoDB 3.1

Please read the following sections if you upgrade from a previous version to ArangoDB 3.1. Please be sure that you have checked the list of changes in 3.1 before upgrading.

Upgrading to ArangoDB 3.0

Please read the following sections if you upgrade from a previous version to ArangoDB 3.0. Please be sure that you have checked the list of changes in 3.0 before upgrading.

Migrating databases and collections from ArangoDB 2.8 to 3.0

ArangoDB 3.0 does not provide an automatic update mechanism for database directories created with the 2.x branches of ArangoDB. In order to migrate data from ArangoDB 2.8 (or an older 2.x version) into ArangoDB 3.0, it is necessary to export the data from 2.8 using arangodump, and then import the dump into a fresh ArangoDB 3.0 with arangorestore.

To do this, first run the 2.8 version of arangodump to export the database data into a directory. arangodump will dump the _system database by default. In order to make it dump multiple databases, it needs to be invoked once per source database, e.g.

  # in 2.8
  arangodump --server.database _system --output-directory dump-system
  arangodump --server.database mydb --output-directory dump-mydb
  ...

That will produce a dump directory for each database that arangodump is called for. If the server has authentication turned on, it may be necessary to provide the required credentials when invoking arangodump, e.g.

  arangodump --server.database _system --server.username myuser --server.password mypasswd --output-directory dump-system

The dumps produced by arangodump can now be imported into ArangoDB 3.0 using the 3.0 version of arangorestore:

  # in 3.0
  arangorestore --server.database _system --input-directory dump-system
  arangorestore --server.database mydb --input-directory dump-mydb
  ...

arangorestore will by default fail if the target database does not exist.
It can be told to create it automatically using the option --create-database true:

  arangorestore --server.database mydb --create-database true --input-directory dump-mydb

And again it may be required to provide access credentials when invoking arangorestore:

  arangorestore --server.database mydb --create-database true --server.username myuser --server.password mypasswd --input-directory dump-mydb

Please note that the version of dump/restore should match the server version, i.e. it is required to dump the original data with the 2.8 version of arangodump and restore it with the 3.0 version of arangorestore.

After that the 3.0 instance of ArangoDB will contain the databases and collections that were present in the 2.8 instance.

Adjusting authentication info

Authentication information was stored per database in ArangoDB 2.8, meaning there could be different users and access credentials per database. In 3.0, the users are stored in a central location in the _system database. To use the same user setup as in 2.8, it may be required to create extra users and/or adjust their permissions.

In order to do that, please connect to the 3.0 instance with an ArangoShell (this will connect to the _system database by default):

  arangosh --server.username myuser --server.password mypasswd

Use the following commands to create a new user with some password and grant them access to a specific database:

  require("@arangodb/users").save(username, password, true);
  require("@arangodb/users").grantDatabase(username, databaseName, "rw");

For example, to create a user myuser with password mypasswd and give them access to databases mydb1 and mydb2, the commands would look as follows:

  require("@arangodb/users").save("myuser", "mypasswd", true);
  require("@arangodb/users").grantDatabase("myuser", "mydb1", "rw");
  require("@arangodb/users").grantDatabase("myuser", "mydb2", "rw");

Existing users can also be updated, removed or listed using the following commands:

  /* update user myuser with password mypasswd */
  require("@arangodb/users").update("myuser", "mypasswd", true);

  /* remove user myuser */
  require("@arangodb/users").remove("myuser");

  /* list all users */
  require("@arangodb/users").all();

Foxx applications

The dump/restore procedure described above will not export and re-import Foxx applications. In order to move these from 2.8 to 3.0, Foxx applications should be exported as zip files via the 2.8 web interface. The zip files can then be uploaded in the "Services" section in the ArangoDB 3.0 web interface. Applications may need to be adjusted manually to run in 3.0. Please consult the migration guide for Foxx apps.

An alternative way of moving Foxx apps into 3.0 is to copy the source directory of a 2.8 Foxx application manually into the 3.0 Foxx apps directory for the target database (which is normally /var/lib/arangodb3-apps/_db/ / but the exact location is platform-specific).

Upgrading to ArangoDB 2.8

Please read the following sections if you upgrade from a previous version to ArangoDB 2.8. Please be sure that you have checked the list of changes in 2.8 before upgrading.

Please note first that a database directory used with ArangoDB 2.8 cannot be used with earlier versions (e.g. ArangoDB 2.7) any more. Upgrading a database directory cannot be reverted. Therefore please make sure to create a full backup of your existing ArangoDB installation before performing an upgrade.

Database Directory Version Check and Upgrade

ArangoDB will perform a database version check at startup.
When ArangoDB 2.8 encounters a database created with earlier versions of ArangoDB, it will refuse to start. This is intentional. The output will then look like this:

  2015-12-04T17:11:17Z [31432] ERROR In database '_system': Database directory version (20702) is lower than current version (20800).
  2015-12-04T17:11:17Z [31432] ERROR In database '_system': ----------------------------------------------------------------------
  2015-12-04T17:11:17Z [31432] ERROR In database '_system': It seems like you have upgraded the ArangoDB binary.
  2015-12-04T17:11:17Z [31432] ERROR In database '_system': If this is what you wanted to do, please restart with the
  2015-12-04T17:11:17Z [31432] ERROR In database '_system':   --upgrade
  2015-12-04T17:11:17Z [31432] ERROR In database '_system': option to upgrade the data in the database directory.
  2015-12-04T17:11:17Z [31432] ERROR In database '_system': Normally you can use the control script to upgrade your database
  2015-12-04T17:11:17Z [31432] ERROR In database '_system':   /etc/init.d/arangodb stop
  2015-12-04T17:11:17Z [31432] ERROR In database '_system':   /etc/init.d/arangodb upgrade
  2015-12-04T17:11:17Z [31432] ERROR In database '_system':   /etc/init.d/arangodb start
  2015-12-04T17:11:17Z [31432] ERROR In database '_system': ----------------------------------------------------------------------
  2015-12-04T17:11:17Z [31432] FATAL Database '_system' needs upgrade. Please start the server with the --upgrade option

To make ArangoDB 2.8 start with a database directory created with an earlier ArangoDB version, you may need to invoke the upgrade procedure once. This can be done by running ArangoDB from the command line and supplying the --upgrade option.

Note: here the same database directory should be specified that is also specified when arangod is started regularly. Please do not run the upgrade command on each individual database subfolder (named database- -- ). For example, if you regularly start your ArangoDB server with

  unix> arangod mydatabasefolder

then running

  unix> arangod mydatabasefolder --upgrade

will perform the upgrade for the whole ArangoDB instance, including all of its databases.

Starting with --upgrade will run a database version check and perform any necessary migrations. As usual, you should create a backup of your database directory before performing the upgrade.

The last line of the output should look like this:

  2015-12-04T17:12:15Z [31558] INFO database upgrade passed

Please check the full output of the --upgrade run. Upgrading may produce errors, which need to be fixed before ArangoDB can be used properly. If no errors are present or they have been resolved manually, you can start ArangoDB 2.8 regularly.

Upgrading a cluster planned in the web interface

A cluster of ArangoDB instances has to be upgraded as well. This involves upgrading all ArangoDB instances in the cluster, as well as running the version check on the whole running cluster in the end.

We have tried to make this procedure as painless and convenient for you. We assume that you planned, launched and administrated a cluster using the graphical front end in your browser. The upgrade procedure is then as follows:

1. First shut down your cluster using the graphical front end as usual.
2. Then upgrade all dispatcher instances on all machines in your cluster using the version check as described above and restart them.
3. Now open the cluster dashboard in your browser by pointing it to the same dispatcher that you used to plan and launch the cluster in the graphical front end.
In addition to the usual buttons "Relaunch", "Edit cluster plan" and "Delete cluster plan" you will see another button marked "Upgrade and relaunch cluster".

4. Hit this button; your cluster will be upgraded and launched and all is done for you behind the scenes. If all goes well, you will see the usual cluster dashboard after a few seconds. If there is an error, you have to inspect the log files of your cluster ArangoDB instances. Please let us know if you run into problems.

There is an alternative way using the ArangoDB shell. Instead of steps 3. and 4. above you can launch arangosh, point it to the dispatcher that you have used to plan and launch the cluster using the option --server.endpoint, and execute

  arangosh> require("org/arangodb/cluster").Upgrade("root","");

This upgrades the cluster and launches it, exactly as with the button above in the graphical front end. You have to replace "root" with a user name and "" with a password that is valid for authentication with the cluster.

Upgrading Foxx apps generated by ArangoDB 2.7 and earlier

The implementation of the require function used to import modules in ArangoDB and Foxx has changed in order to improve compatibility with Node.js modules.

Given an app/service with the following layout:

  manifest.json
  controllers/
    todos.js
  models/
    todo.js
  repositories/
    todos.js
  node_modules/
    models/
      todo.js

The file controllers/todos.js would previously contain the following require calls:

  var _ = require('underscore');
  var joi = require('joi');
  var Foxx = require('org/arangodb/foxx');
  var ArangoError = require('org/arangodb').ArangoError;
  var Todos = require('repositories/todos'); // <-- !
  var Todo = require('models/todo'); // <-- !

The require paths repositories/todos and models/todo were previously resolved locally as relative to the app root. Starting with 2.8 these paths would instead be resolved as relative to the node_modules folder or the global ArangoDB module paths before being resolved locally as a fallback.

In the given example layout the app would break in 2.8 because the module name models/todo would always resolve to node_modules/models/todo.js (which previously would have been ignored) instead of the local models/todo.js. In order to make sure the app still works in 2.8, the require calls in controllers/todos.js would need to be adjusted to look like this:

  var _ = require('underscore');
  var joi = require('joi');
  var Foxx = require('org/arangodb/foxx');
  var ArangoError = require('org/arangodb').ArangoError;
  var Todos = require('../repositories/todos'); // <-- !
  var Todo = require('../models/todo'); // <-- !

Note that the old "global" style require calls may still work in 2.8 but may break unexpectedly if modules with matching names are installed globally.

Upgrading to ArangoDB 2.6

Please read the following sections if you upgrade from a previous version to ArangoDB 2.6. Please be sure that you have checked the list of changes in 2.6 before upgrading.

Please note first that a database directory used with ArangoDB 2.6 cannot be used with earlier versions (e.g. ArangoDB 2.5) any more. Upgrading a database directory cannot be reverted. Therefore please make sure to create a full backup of your existing ArangoDB installation before performing an upgrade.

Database Directory Version Check and Upgrade

ArangoDB will perform a database version check at startup. When ArangoDB 2.6 encounters a database created with earlier versions of ArangoDB, it will refuse to start. This is intentional.
The output will then look like this:

  2015-02-17T09:43:11Z [8302] ERROR In database '_system': Database directory version (20501) is lower than current version (20600).
  2015-02-17T09:43:11Z [8302] ERROR In database '_system': ----------------------------------------------------------------------
  2015-02-17T09:43:11Z [8302] ERROR In database '_system': It seems like you have upgraded the ArangoDB binary.
  2015-02-17T09:43:11Z [8302] ERROR In database '_system': If this is what you wanted to do, please restart with the
  2015-02-17T09:43:11Z [8302] ERROR In database '_system':   --upgrade
  2015-02-17T09:43:11Z [8302] ERROR In database '_system': option to upgrade the data in the database directory.
  2015-02-17T09:43:11Z [8302] ERROR In database '_system': Normally you can use the control script to upgrade your database
  2015-02-17T09:43:11Z [8302] ERROR In database '_system':   /etc/init.d/arangodb stop
  2015-02-17T09:43:11Z [8302] ERROR In database '_system':   /etc/init.d/arangodb upgrade
  2015-02-17T09:43:11Z [8302] ERROR In database '_system':   /etc/init.d/arangodb start
  2015-02-17T09:43:11Z [8302] ERROR In database '_system': ----------------------------------------------------------------------
  2015-02-17T09:43:11Z [8302] FATAL Database '_system' needs upgrade. Please start the server with the --upgrade option

To make ArangoDB 2.6 start with a database directory created with an earlier ArangoDB version, you may need to invoke the upgrade procedure once. This can be done by running ArangoDB from the command line and supplying the --upgrade option.

Note: here the same database directory should be specified that is also specified when arangod is started regularly. Please do not run the upgrade command on each individual database subfolder (named database- -- ). For example, if you regularly start your ArangoDB server with

  unix> arangod mydatabasefolder

then running

  unix> arangod mydatabasefolder --upgrade

will perform the upgrade for the whole ArangoDB instance, including all of its databases.

Starting with --upgrade will run a database version check and perform any necessary migrations. As usual, you should create a backup of your database directory before performing the upgrade.

The last line of the output should look like this:

  2014-12-22T12:03:31Z [12026] INFO database upgrade passed

Please check the full output of the --upgrade run. Upgrading may produce errors, which need to be fixed before ArangoDB can be used properly. If no errors are present or they have been resolved manually, you can start ArangoDB 2.6 regularly.

Upgrading a cluster planned in the web interface

A cluster of ArangoDB instances has to be upgraded as well. This involves upgrading all ArangoDB instances in the cluster, as well as running the version check on the whole running cluster in the end.

We have tried to make this procedure as painless and convenient for you. We assume that you planned, launched and administrated a cluster using the graphical front end in your browser. The upgrade procedure is then as follows:

1. First shut down your cluster using the graphical front end as usual.
2. Then upgrade all dispatcher instances on all machines in your cluster using the version check as described above and restart them.
3. Now open the cluster dashboard in your browser by pointing it to the same dispatcher that you used to plan and launch the cluster in the graphical front end.
In addition to the usual buttons "Relaunch", "Edit cluster plan" and "Delete cluster plan" you will see another button marked "Upgrade and relaunch cluster".

4. Hit this button; your cluster will be upgraded and launched and all is done for you behind the scenes. If all goes well, you will see the usual cluster dashboard after a few seconds. If there is an error, you have to inspect the log files of your cluster ArangoDB instances. Please let us know if you run into problems.

There is an alternative way using the ArangoDB shell. Instead of steps 3. and 4. above you can launch arangosh, point it to the dispatcher that you have used to plan and launch the cluster using the option --server.endpoint, and execute

  arangosh> require("org/arangodb/cluster").Upgrade("root","");

This upgrades the cluster and launches it, exactly as with the button above in the graphical front end. You have to replace "root" with a user name and "" with a password that is valid for authentication with the cluster.

Upgrading to ArangoDB 2.5

Please read the following sections if you upgrade from a previous version to ArangoDB 2.5. Please be sure that you have checked the list of changes in 2.5 before upgrading.

Please note first that a database directory used with ArangoDB 2.5 cannot be used with earlier versions (e.g. ArangoDB 2.4) any more. Upgrading a database directory cannot be reverted. Therefore please make sure to create a full backup of your existing ArangoDB installation before performing an upgrade.

In 2.5 we have also changed the paths for Foxx applications. Please also make sure that you have a backup of all Foxx apps in your javascript.app-path and javascript.dev-app-path. It is sufficient to have the source files for Foxx somewhere else so you can reinstall them on error. To check that everything has worked during the upgrade you could use the web interface Applications tab or

  unix> foxx-manager list

for all your databases. The listed apps should be identical before and after the upgrade.

Database Directory Version Check and Upgrade

ArangoDB will perform a database version check at startup. When ArangoDB 2.5 encounters a database created with earlier versions of ArangoDB, it will refuse to start. This is intentional. The output will then look like this:

  2015-02-17T09:43:11Z [8302] ERROR In database '_system': Database directory version (20401) is lower than current version (20500).
  2015-02-17T09:43:11Z [8302] ERROR In database '_system': ----------------------------------------------------------------------
  2015-02-17T09:43:11Z [8302] ERROR In database '_system': It seems like you have upgraded the ArangoDB binary.
  2015-02-17T09:43:11Z [8302] ERROR In database '_system': If this is what you wanted to do, please restart with the
  2015-02-17T09:43:11Z [8302] ERROR In database '_system':   --upgrade
  2015-02-17T09:43:11Z [8302] ERROR In database '_system': option to upgrade the data in the database directory.
  2015-02-17T09:43:11Z [8302] ERROR In database '_system': Normally you can use the control script to upgrade your database
  2015-02-17T09:43:11Z [8302] ERROR In database '_system':   /etc/init.d/arangodb stop
  2015-02-17T09:43:11Z [8302] ERROR In database '_system':   /etc/init.d/arangodb upgrade
  2015-02-17T09:43:11Z [8302] ERROR In database '_system':   /etc/init.d/arangodb start
  2015-02-17T09:43:11Z [8302] ERROR In database '_system': ----------------------------------------------------------------------
  2015-02-17T09:43:11Z [8302] FATAL Database '_system' needs upgrade. Please start the server with the --upgrade option
To make ArangoDB 2.5 start with a database directory created with an earlier ArangoDB version, you may need to invoke the upgrade procedure once. This can be done by running ArangoDB from the command line and supplying the --upgrade option.

Note: We have changed the Foxx folder structure and implemented an upgrade task to move your applications to the new structure. In order to tell this upgrade task to also move your development Foxx apps, please make sure you give the dev-app-path as well. If you have not used development mode for Foxx apps you can drop the --javascript.dev-app-path. It is only possible to upgrade one dev-app-path together with one data folder.

  unix> arangod data --upgrade --javascript.dev-app-path devapps

where data is ArangoDB's main data directory and devapps is the directory where you develop Foxx apps.

Note: here the same database directory should be specified that is also specified when arangod is started regularly. Please do not run the upgrade command on each individual database subfolder (named database- -- ). For example, if you regularly start your ArangoDB server with

  unix> arangod mydatabasefolder

then running

  unix> arangod mydatabasefolder --upgrade

will perform the upgrade for the whole ArangoDB instance, including all of its databases.

Starting with --upgrade will run a database version check and perform any necessary migrations. As usual, you should create a backup of your database directory before performing the upgrade.

The last line of the output should look like this:

  2014-12-22T12:03:31Z [12026] INFO database upgrade passed

Please check the full output of the --upgrade run. Upgrading may produce errors, which need to be fixed before ArangoDB can be used properly. If no errors are present or they have been resolved manually, you can start ArangoDB 2.5 regularly.

Upgrading a cluster planned in the web interface

A cluster of ArangoDB instances has to be upgraded as well. This involves upgrading all ArangoDB instances in the cluster, as well as running the version check on the whole running cluster in the end.

We have tried to make this procedure as painless and convenient for you. We assume that you planned, launched and administrated a cluster using the graphical front end in your browser. The upgrade procedure is then as follows:

1. First shut down your cluster using the graphical front end as usual.
2. Then upgrade all dispatcher instances on all machines in your cluster using the version check as described above and restart them.
3. Now open the cluster dashboard in your browser by pointing it to the same dispatcher that you used to plan and launch the cluster in the graphical front end. In addition to the usual buttons "Relaunch", "Edit cluster plan" and "Delete cluster plan" you will see another button marked "Upgrade and relaunch cluster".
4. Hit this button; your cluster will be upgraded and launched and all is done for you behind the scenes. If all goes well, you will see the usual cluster dashboard after a few seconds. If there is an error, you have to inspect the log files of your cluster ArangoDB instances. Please let us know if you run into problems.

There is an alternative way using the ArangoDB shell.
Upgrading to ArangoDB 2.4

Please read the following sections if you upgrade from a previous version to ArangoDB 2.4. Please be sure that you have checked the list of changes in 2.4 before upgrading.

Please note first that a database directory used with ArangoDB 2.4 cannot be used with earlier versions (e.g. ArangoDB 2.3) any more. Upgrading a database directory cannot be reverted. Therefore please make sure to create a full backup of your existing ArangoDB installation before performing an upgrade.

Database Directory Version Check and Upgrade

ArangoDB will perform a database version check at startup. When ArangoDB 2.4 encounters a database created with earlier versions of ArangoDB, it will refuse to start. This is intentional. The output will then look like this:

2014-12-22T12:02:28Z [12001] ERROR In database '_system': Database directory version (20302) is lower than current version (20400).
2014-12-22T12:02:28Z [12001] ERROR In database '_system': ----------------------------------------------------------------------
2014-12-22T12:02:28Z [12001] ERROR In database '_system': It seems like you have upgraded the ArangoDB binary.
2014-12-22T12:02:28Z [12001] ERROR In database '_system': If this is what you wanted to do, please restart with the
2014-12-22T12:02:28Z [12001] ERROR In database '_system':   --upgrade
2014-12-22T12:02:28Z [12001] ERROR In database '_system': option to upgrade the data in the database directory.
2014-12-22T12:02:28Z [12001] ERROR In database '_system': Normally you can use the control script to upgrade your database
2014-12-22T12:02:28Z [12001] ERROR In database '_system':   /etc/init.d/arangodb stop
2014-12-22T12:02:28Z [12001] ERROR In database '_system':   /etc/init.d/arangodb upgrade
2014-12-22T12:02:28Z [12001] ERROR In database '_system':   /etc/init.d/arangodb start
2014-12-22T12:02:28Z [12001] ERROR In database '_system': ----------------------------------------------------------------------
2014-12-22T12:02:28Z [12001] FATAL Database '_system' needs upgrade. Please start the server with the --upgrade option

To make ArangoDB 2.4 start with a database directory created with an earlier ArangoDB version, you may need to invoke the upgrade procedure once. This can be done by running ArangoDB from the command line and supplying the --upgrade option:

unix> arangod data --upgrade

where data is ArangoDB's main data directory.

Note: here the same database directory should be specified that is also specified when arangod is started regularly. Please do not run the upgrade command on each individual database subfolder (named database-...). For example, if you regularly start your ArangoDB server with

unix> arangod mydatabasefolder

then running

unix> arangod mydatabasefolder --upgrade

will perform the upgrade for the whole ArangoDB instance, including all of its databases. Starting with --upgrade will run a database version check and perform any necessary migrations. As usual, you should create a backup of your database directory before performing the upgrade.
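To keep the upgrade output around for inspection, a small sketch like the following can capture the run to a log file and search it for errors; the file name upgrade.log and the mydatabasefolder directory are placeholders:

unix> arangod mydatabasefolder --upgrade 2>&1 | tee upgrade.log   # run the upgrade and save all output
unix> grep -i error upgrade.log                                   # a clean upgrade should produce no matches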
The last line of the output should look like this:

2014-12-22T12:03:31Z [12026] INFO database upgrade passed

Please check the full output of the --upgrade run. Upgrading may produce errors, which need to be fixed before ArangoDB can be