The Neo4j Manual v2.3.12
The Neo4j Team neo4j.com1
1 http://neo4j.com/
The Neo4j Manual v2.3.12
by The Neo4j Team neo4j.com1
Publication date 2017-12-0812:26:35
Copyright © 2017 Neo Technology
Starting points
What is the Neo4j graph database?
Cypher Query Language
REST API
Installation
Upgrading
Security
Resources
License: Creative Commons 3.0
This book is presented in open source and licensed through Creative Commons 3.0. You are free to copy, distribute, transmit, and/or
adapt the work. This license is based upon the following conditions:
Attribution You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that
they endorse you or your use of the work).
Share Alike If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or
a compatible license.
Any of the above conditions can be waived if you get permission from the copyright holder.
In no way are any of the following rights affected by the license:
Your fair dealing or fair use rights
The author’s moral rights
Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights
Note
For any reuse or distribution, you must make clear to the others the license terms of this work. The best way to do this is
with a direct link to this page: http://creativecommons.org/licenses/by-sa/3.0/ 2
1 http://neo4j.com/
2 http://creativecommons.org/licenses/by-sa/3.0/
Preface ........................................................................................................................................................ v
I. Introduction ............................................................................................................................................. 1
1. Neo4j Highlights ............................................................................................................................. 3
2. Graph Database Concepts ............................................................................................................. 4
II. Tutorials ................................................................................................................................................ 14
3. Introduction to Cypher ................................................................................................................ 16
4. Use Cypher in an application ...................................................................................................... 46
5. Basic Data Modeling Examples ................................................................................................... 47
6. Advanced Data Modeling Examples ............................................................................................ 62
7. Languages .................................................................................................................................... 96
III. Cypher Query Language ................................................................................................................... 102
8. Introduction ................................................................................................................................ 105
9. Syntax ......................................................................................................................................... 118
10. General Clauses ....................................................................................................................... 136
11. Reading Clauses ....................................................................................................................... 154
12. Writing Clauses ........................................................................................................................ 186
13. Functions .................................................................................................................................. 214
14. Schema ..................................................................................................................................... 243
15. Query Tuning ............................................................................................................................ 253
16. Execution Plans ........................................................................................................................ 259
IV. Reference ........................................................................................................................................... 277
17. Capabilities ............................................................................................................................... 279
18. Transaction Management ........................................................................................................ 285
19. Data Import .............................................................................................................................. 295
20. Graph Algorithms ..................................................................................................................... 296
21. REST API ................................................................................................................................... 297
22. Deprecations ............................................................................................................................ 434
V. Operations .......................................................................................................................................... 435
23. Installation & Deployment ....................................................................................................... 437
24. Configuration & Performance .................................................................................................. 448
25. High Availability ........................................................................................................................ 472
26. Backup ...................................................................................................................................... 494
27. Security ..................................................................................................................................... 499
28. Monitoring ................................................................................................................................ 505
VI. Tools .................................................................................................................................................. 529
29. Import tool ............................................................................................................................... 531
30. Web Interface ........................................................................................................................... 544
31. Neo4j Shell ............................................................................................................................... 545
VII. Advanced Usage ............................................................................................................................... 561
32. Extending the Neo4j Server ..................................................................................................... 563
33. Using Neo4j embedded in Java applications ........................................................................... 576
34. The Traversal Framework ........................................................................................................ 613
35. Legacy Indexing ........................................................................................................................ 621
36. Batch Insertion ......................................................................................................................... 636
Terminology ............................................................................................................................................ 640
A. Resources ........................................................................................................................................... 644
B. Manpages ........................................................................................................................................... 645
neo4j ............................................................................................................................................... 646
neo4j-shell ...................................................................................................................................... 647
neo4j-import ................................................................................................................................... 648
neo4j-backup .................................................................................................................................. 650
neo4j-arbiter ................................................................................................................................... 651
Preface
This is the reference manual for Neo4j version 2.3.12, authored by the Neo4j Team.
The main parts of the manual are:
PartI, “Introduction” [1] introducing graph database concepts and Neo4j.
PartII, “Tutorials” [14] learn how to use Neo4j.
PartIII, “Cypher Query Language” [102] details on the Cypher query language.
PartIV, “Reference” [277] detailed information on Neo4j.
PartV, “Operations” [435] how to install and maintain Neo4j.
PartVI, “Tools” [529] guides on tools.
PartVII, “Advanced Usage” [561] using Neo4j in more advanced ways.
Terminology [640] terminology about graph databases.
AppendixA, Resources [644] find additional documentation resources.
AppendixB, Manpages [645] command line documentation.
The material is practical, technical, and focused on answering specific questions. It addresses how
things work, what to do and what to avoid to successfully run Neo4j in a production environment.
The goal is to be thumb-through and rule-of-thumb friendly.
Each section should stand on its own, so you can hop right to whatever interests you. When possible,
the sections distill “rules of thumb” which you can keep in mind whenever you wander out of the house
without this manual in your back pocket.
The included code examples are executed when Neo4j is built and tested. Also, the REST API request
and response examples are captured from real interaction with a Neo4j server. Thus, the examples are
always in sync with how Neo4j actually works.
There are other documentation resources besides the manual as well; see Appendix A,
Resources [644].
Who should read this?
The topics should be relevant to architects, administrators, developers and operations personnel.
Where to get help?
You can learn a lot about Neo4j at different events. To get information on upcoming Neo4j events, have
a look here:
http://neo4j.com/events/
http://neo4j.meetup.com/
Get help from the Neo4j open source community; here are some starting points.
The neo4j tag at stackoverflow: http://stackoverflow.com/questions/tagged/neo4j
Neo4j Discussions: https://groups.google.com/forum/#!forum/neo4j
Twitter: https://twitter.com/neo4j
Report a bug or add a feature request:
https://github.com/neo4j/neo4j/issues
Questions regarding the documentation: The Neo4j Manual is published online with a comment
function; please use that to post any questions or comments regarding the documentation.
If you want to contribute to the Neo4j open source project, see http://neo4j.com/developer/contribute/.
PartI.Introduction
This part gives a bird’s eye view of what a graph database is and also outlines some specifics of Neo4j.
1. Neo4j Highlights ..................................................................................................................................... 3
2. Graph Database Concepts ..................................................................................................................... 4
2.1. The Neo4j Graph Database ........................................................................................................ 5
2.2. Comparing Database Models ................................................................................................... 11
Chapter1.Neo4j Highlights
As a robust, scalable and high-performance database, Neo4j is suitable for full enterprise deployment.
It features:
true ACID transactions,
high availability,
scalability to billions of nodes and relationships,
high-speed querying through traversals,
a declarative graph query language.
Proper ACID behavior is the foundation of data reliability. Neo4j enforces that all operations that
modify data occur within a transaction, guaranteeing consistent data. This robustness extends
from single instance embedded graphs to multi-server high availability installations. For details, see
Chapter18, Transaction Management [285].
Reliable graph storage can easily be added to any application. A graph can scale in size and complexity
as the application evolves, with little impact on performance. Whether starting new development, or
augmenting existing functionality, Neo4j is only limited by physical hardware.
A single server instance can handle a graph of billions of nodes and relationships. When data
throughput is insufficient, the graph database can be distributed among multiple servers in a high
availability configuration. See Chapter25, High Availability [472] to learn more.
The graph database storage shines when storing richly-connected data. Querying is performed through
traversals, which can perform millions of traversal steps per second. A traversal step resembles a join in
an RDBMS.
Chapter2.Graph Database Concepts
This chapter contains an introduction to the graph data model and also compares it to other data
models used when persisting data.
2.1. The Neo4j Graph Database
A graph database stores data in a graph, the most generic of data structures, capable of elegantly
representing any kind of data in a highly accessible way.
For terminology around graph databases, see Terminology [640].
Here's an example graph which we will approach step by step in the following sections:

[Figure: a Person node (name = 'Tom Hanks', born = 1956) connected by an ACTED_IN relationship (roles = ['Forrest']) to a Movie node (title = 'Forrest Gump', released = 1994); a second Person node (name = 'Robert Zemeckis', born = 1951) is connected to the movie by a DIRECTED relationship.]
Nodes
A graph records data in nodes and relationships. Both can have properties. This is
sometimes referred to as the Property Graph Model.
The fundamental units that form a graph are nodes and relationships. In Neo4j, both nodes and
relationships can contain properties.
Nodes are often used to represent entities, but depending on the domain relationships may be used for
that purpose as well.
Apart from properties and relationships, nodes can also be labeled with zero or more labels.
The simplest possible graph is a single Node. A Node can have zero or more named values referred to
as properties. Let’s start out with one node that has a single property named title:
[Figure: a single node with the property title = 'Forrest Gump'.]
The next step is to have multiple nodes. Let’s add two more nodes and one more property on the node
in the previous example:
[Figure: three nodes: one with name = 'Tom Hanks' and born = 1956; one with title = 'Forrest Gump' and released = 1994; one with name = 'Robert Zemeckis' and born = 1951.]
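In Cypher, the query language covered in Part III, nodes like these could be created with a statement along the following lines (shown only as a sketch for illustration; Cypher itself is introduced later):

```cypher
CREATE ({ name: 'Tom Hanks', born: 1956 }),
       ({ title: 'Forrest Gump', released: 1994 }),
       ({ name: 'Robert Zemeckis', born: 1951 })
```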
Relationships
Relationships organize the nodes by connecting them. A relationship connects two
nodes: a start node and an end node. Just like nodes, relationships can have properties.
Relationships between nodes are a key part of a graph database. They allow for finding related data.
Just like nodes, relationships can have properties.
A relationship connects two nodes, and is guaranteed to have valid start and end nodes.
Relationships organize nodes into arbitrary structures, allowing a graph to resemble a list, a tree,
a map, or a compound entity, any of which can be combined into yet more complex, richly
interconnected structures.
Our example graph will make a lot more sense once we add relationships to it:
[Figure: the Tom Hanks Person node connected by an ACTED_IN relationship (roles = ['Forrest']) to the Forrest Gump Movie node, and the Robert Zemeckis Person node connected to the movie by a DIRECTED relationship.]
Our example uses ACTED_IN and DIRECTED as relationship types. The roles property on the ACTED_IN
relationship has an array value with a single item in it.
Below is an ACTED_IN relationship, with the Tom Hanks node as start node and Forrest Gump as end node.
[Figure: an ACTED_IN relationship (roles = ['Forrest']) from the Tom Hanks node (name = 'Tom Hanks', born = 1956) to the Forrest Gump node (title = 'Forrest Gump', released = 1994).]
You could also say that the Tom Hanks node has an outgoing relationship, while the Forrest Gump node has
an incoming relationship.
Relationships are equally well traversed in either direction.
This means that there is no need to add duplicate relationships in the opposite direction
(with regard to traversal or performance).
While relationships always have a direction, you can ignore the direction where it is not useful in your
application.
Note that a node can have relationships to itself as well:
[Figure: the Tom Hanks node with a KNOWS relationship to itself.]
The example above would mean that Tom Hanks KNOWS himself.
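As a Cypher sketch, such a self-relationship could be created like this (assuming the Tom Hanks node already exists):

```cypher
MATCH (tom { name: 'Tom Hanks' })
CREATE (tom)-[:KNOWS]->(tom)
```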
To further enhance graph traversal all relationships have a relationship type.
Let’s have a look at what can be found by simply following the relationships of a node in our example
graph:
[Figure: the example graph: Tom Hanks ACTED_IN (roles = ['Forrest']) Forrest Gump, and Robert Zemeckis DIRECTED Forrest Gump.]
Using relationship direction and type
What we want to know      Start from     Relationship type   Direction
get actors in movie       movie node     ACTED_IN            incoming
get movies with actor     person node    ACTED_IN            outgoing
get directors of movie    movie node     DIRECTED            incoming
get movies directed by    person node    DIRECTED            outgoing
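As a Cypher sketch, the first two rows of the table correspond to queries like the following:

```cypher
// get actors in movie: start at the movie node, follow ACTED_IN incoming
MATCH (movie { title: 'Forrest Gump' })<-[:ACTED_IN]-(actor)
RETURN actor

// get movies with actor: start at the person node, follow ACTED_IN outgoing
MATCH (person { name: 'Tom Hanks' })-[:ACTED_IN]->(movie)
RETURN movie
```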
Properties
Both nodes and relationships can have properties.
Properties are named values where the name is a string. The supported property values are:
Numeric values,
String values,
Boolean values,
Collections of any other type of value.
NULL is not a valid property value.
NULLs can instead be modeled by the absence of a key.
For further details on supported property values, see Section 33.3, “Property values” [584].
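As a Cypher sketch, modeling "no value" by the absence of a key could look like this (the available property is hypothetical, used only for illustration):

```cypher
MATCH (movie { title: 'Forrest Gump' })
SET movie.available = true       // add or update a property
REMOVE movie.available           // model "no value" by removing the key
RETURN movie.available           // the key is absent, so this yields NULL
```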
Labels
Labels assign roles or types to nodes.
A label is a named graph construct that is used to group nodes into sets; all nodes labeled with the
same label belong to the same set. Many database queries can work with these sets instead of the
whole graph, making queries easier to write and more efficient to execute. A node may be labeled with
any number of labels, including none, making labels an optional addition to the graph.
Labels are used when defining constraints and adding indexes for properties (see the section called
“Schema” [9]).
An example would be a label named User that you apply to all your nodes representing users. With
that in place, you can ask Neo4j to perform operations only on your user nodes, such as finding all
users with a given name.
However, you can use labels for much more. For instance, since labels can be added and removed
during runtime, they can be used to mark temporary states for your nodes. You might create an Offline
label for phones that are offline, a Happy label for happy pets, and so on.
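As a Cypher sketch (the phone node and its number property are hypothetical), such a state label could be added and later removed like this:

```cypher
// mark the phone as offline
MATCH (phone { number: '555-0100' })
SET phone:Offline

// later, clear the temporary state again
MATCH (phone { number: '555-0100' })
REMOVE phone:Offline
```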
In our example, we’ll add Person and Movie labels to our graph:
[Figure: the example graph with the two person nodes labeled Person and the Forrest Gump node labeled Movie; the ACTED_IN (roles = ['Forrest']) and DIRECTED relationships are unchanged.]
A node can have multiple labels; let’s add an Actor label to the Tom Hanks node.
Label names
Any non-empty Unicode string can be used as a label name. In Cypher, you may need to use the
backtick (`) syntax to avoid clashes with Cypher identifier rules or to allow non-alphanumeric characters
in a label. By convention, labels are written with CamelCase notation, with the first letter in upper case.
For instance, User or CarOwner.
Labels have an id space of an int, meaning the maximum number of labels the database can contain is
roughly 2 billion.
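As a sketch, a label containing a space, which would clash with Cypher identifier rules, can be written with backticks (the label and property here are hypothetical examples):

```cypher
CREATE (n:`Car Owner` { name: 'Ann' })
```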
Traversal
A traversal navigates through a graph to find paths.
A traversal is how you query a graph, navigating from starting nodes to related nodes, finding answers
to questions like “what music do my friends like that I don’t yet own,” or “if this power supply goes
down, what web services are affected?”
Traversing a graph means visiting its nodes, following relationships according to some rules. In most
cases only a subgraph is visited, as you already know where in the graph the interesting nodes and
relationships are found.
Cypher provides a declarative way to query the graph powered by traversals and other techniques. See
PartIII, “Cypher Query Language” [102] for more information.
When writing server plugins or using Neo4j embedded, Neo4j provides a callback based traversal API
which lets you specify the traversal rules. At a basic level there’s a choice between traversing
breadth-first or depth-first.
If we want to find out which movies Tom Hanks acted in according to our tiny example database, the
traversal would start from the Tom Hanks node, follow any ACTED_IN relationships connected to the node,
and end up with Forrest Gump as the result (see the dashed lines):
[Figure: starting from the Tom Hanks Person node, the traversal follows the dashed ACTED_IN relationship (roles = ['Forrest']) to the Forrest Gump Movie node; the Robert Zemeckis node and its DIRECTED relationship are not part of the result.]
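Expressed as a Cypher sketch, the same traversal could look like this:

```cypher
MATCH (tom:Person { name: 'Tom Hanks' })-[:ACTED_IN]->(movie:Movie)
RETURN movie.title
```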
Paths
A path is one or more nodes with connecting relationships, typically retrieved as a query
or traversal result.
In the previous example, the traversal result could be returned as a path:
[Figure: a path from the Tom Hanks Person node through the ACTED_IN relationship (roles = ['Forrest']) to the Forrest Gump Movie node.]
The path above has length one.
The shortest possible path has length zero, that is, it contains only a single node and no
relationships, and can look like this:

[Figure: a single node.]

This path has length one:

[Figure: the Tom Hanks node with a KNOWS relationship to itself.]
Schema
Neo4j is a schema-optional graph database.
You can use Neo4j without any schema. Optionally you can introduce it in order to gain performance or
modeling benefits. This allows a way of working where the schema does not get in your way until you
are at a stage where you want to reap the benefits of having one.
Note
Schema commands can only be applied on the master machine in a Neo4j cluster (see
Chapter 25, High Availability [472]). If you apply them on a slave you will receive a
Neo.ClientError.Transaction.InvalidType error code (see Section 21.2, “Neo4j Status
Codes” [307]).
Indexes
Performance is gained by creating indexes, which improve the speed of looking up nodes
in the database.
Note
This feature was introduced in Neo4j 2.0, and is not the same as the legacy indexes (see
Chapter 35, Legacy Indexing [621]).
Once you’ve specified which properties to index, Neo4j will make sure your indexes are kept up to date
as your graph evolves. Any operation that looks up nodes by the newly indexed properties will see a
significant performance boost.
Indexes in Neo4j are eventually available. That means that when you first create an index, the operation
returns immediately. The index is populated in the background and so is not immediately available for
querying. Once it has been fully populated, the index comes online, meaning it is ready to be used in
queries.
If something goes wrong with the index, it can end up in a failed state. While it is in a failed state, it will
not be used to speed up queries. To rebuild it, you can drop and recreate the index. Look at the logs for
clues about the failure.
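As a Cypher sketch of the drop-and-recreate approach (the label and property are examples):

```cypher
DROP INDEX ON :Person(name)
CREATE INDEX ON :Person(name)
```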
You can track the status of your index by asking for the index state through the API you are using. Note,
however, that this is not yet possible through Cypher.
How to use indexes through the different APIs:
Cypher: Section14.1, “Indexes” [244]
REST API: Section21.15, “Indexing” [368]
Listing Indexes via Shell: the section called “Listing Indexes and Constraints” [554]
Java Core API: Section33.4, “User database with indexes” [585]
Constraints
Note
This feature was introduced in Neo4j 2.0.
Neo4j can help you keep your data clean. It does so using constraints, which allow you to specify the
rules for what your data should look like. Any changes that break these rules will be denied.
In this version, unique constraints are the only available constraint type.
How to use constraints through the different APIs:
Cypher: Section14.2, “Constraints” [247]
REST API: Section21.16, “Constraints” [370]
Listing Constraints via Shell: the section called “Listing Indexes and Constraints” [554]
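As a Cypher sketch, a unique constraint could be defined like this (the label and property are examples):

```cypher
CREATE CONSTRAINT ON (person:Person) ASSERT person.name IS UNIQUE
```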
2.2. Comparing Database Models
A graph database stores data structured in the nodes and relationships of a graph. How does this
compare to other persistence models? Because a graph is a generic structure, let’s compare how a few
models would look in a graph.
A Graph Database transforms an RDBMS
Topple the stacks of records in a relational database while keeping all the relationships, and you’ll see
a graph. Where an RDBMS is optimized for aggregated data, Neo4j is optimized for highly connected
data.
Figure2.1.RDBMS
A1
A2
A3
B1
B2
B3
B4
B5
B6
B7
C1
C2
C3
Figure2.2.Graph Database as RDBMS
A1
B1B2
A2
B4B6
A3
B3B5 B7
C1 C2C3
A Graph Database elaborates a Key-Value Store
A Key-Value model is great for lookups of simple values or lists. When the values are themselves
interconnected, you’ve got a graph. Neo4j lets you elaborate the simple data structures into more
complex, interconnected data.
Figure2.3.Key-Value Store
K1
K2
K3
V1
K2
V2
K1
K3
V3
K1
K* represents a key, V* a value. Note that some keys point to other keys as well as plain values.
Figure2.4.Graph Database as Key-Value Store
V1
V2
V3
K1
K2
K3
A Graph Database relates Column-Family
Column family (BigTable-style) databases are an evolution of key-value stores, using "families" to allow
grouping of rows. Stored in a graph, the families could become hierarchical, and the relationships
among data become explicit.
A Graph Database navigates a Document Store
The container hierarchy of a document database accommodates nice, schema-free data that can easily
be represented as a tree. Which is of course a graph. Refer to other documents (or document elements)
within that tree and you have a more expressive representation of the same data. When in Neo4j, those
relationships are easily navigable.
Figure2.5.Document Store
D1
S1
D2
S2S3
V1D2/S2 V2V3V4D1/S1
D=Document, S=Subdocument, V=Value, D2/S2 = reference to subdocument in (other) document.
Figure2.6.Graph Database as Document Store
D1
S1
D2 S2S3
V1
V2
V3
V4
PartII.Tutorials
The tutorial part describes how to use Neo4j. It takes you from Hello World to advanced usage of graphs.
3. Introduction to Cypher ........................................................................................................................ 16
3.1. Background and Motivation ..................................................................................................... 17
3.2. Graphs, Patterns, and Cypher .................................................................................................. 18
3.3. Patterns in Practice ................................................................................................................... 21
3.4. Getting the Results You Want ................................................................................................... 26
3.5. How to Compose Large Statements ......................................................................................... 30
3.6. Labels, Constraints and Indexes ............................................................................................... 32
3.7. Loading Data ............................................................................................................................. 34
3.8. Utilizing Data Structures ........................................................................................................... 37
3.9. Cypher vs. SQL .......................................................................................................................... 40
4. Use Cypher in an application .............................................................................................................. 46
5. Basic Data Modeling Examples ........................................................................................................... 47
5.1. Movie Database ......................................................................................................................... 48
5.2. Social Movie Database .............................................................................................................. 50
5.3. Finding Paths ............................................................................................................................. 52
5.4. Linked Lists ................................................................................................................................ 56
5.5. TV Shows ................................................................................................................................... 58
6. Advanced Data Modeling Examples .................................................................................................... 62
6.1. ACL structures in graphs .......................................................................................................... 63
6.2. Hyperedges ................................................................................................................................ 67
6.3. Basic friend finding based on social neighborhood ................................................................. 69
6.4. Co-favorited places ................................................................................................................... 70
6.5. Find people based on similar favorites .................................................................................... 72
6.6. Find people based on mutual friends and groups ................................................................... 73
6.7. Find friends based on similar tagging ...................................................................................... 74
6.8. Multirelational (social) graphs ................................................................................................... 75
6.9. Implementing newsfeeds in a graph ........................................................................................ 76
6.10. Boosting recommendation results ......................................................................................... 79
6.11. Calculating the clustering coefficient of a network ................................................................ 80
6.12. Pretty graphs ........................................................................................................................... 81
6.13. A multilevel indexing structure (path tree) ............................................................................. 85
6.14. Complex similarity computations ........................................................................................... 89
6.15. The Graphity activity stream model ....................................................................................... 90
6.16. User roles in graphs ................................................................................................................ 92
7. Languages ............................................................................................................................................ 96
7.1. How to use the REST API from Java ......................................................................................... 97
Chapter3.Introduction to Cypher
This friendly guide will introduce you to Cypher, Neo4j’s query language.
The guide will help you:
start thinking about graphs and patterns,
apply this knowledge to simple problems,
learn how to write Cypher statements,
use Cypher for loading data,
transition from SQL to Cypher.
If you want to keep a reference at your side while reading, please see the Cypher Refcard1.
Work in Progress
There may still be unfinished parts in this chapter. Please comment on it so we can make it
suit our readers better!
1 http://neo4j.com/docs/2.3.12/cypher-refcard/
3.1. Background and Motivation
Cypher provides a convenient way to express queries and other Neo4j actions. Although Cypher
is particularly useful for exploratory work, it is fast enough to be used in production. Java-based
approaches (eg, unmanaged extensions) can also be used to handle particularly demanding use cases.
Query processing
To use Cypher effectively, it’s useful to have an idea of how it works. So, let’s take a high-level look at the
way Cypher processes queries.
Parse and validate the query.
Generate the execution plan.
Locate the initial node(s).
Select and traverse relationships.
Change and/or return values.
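As a rough sketch, the clauses of a typical statement line up with these steps (the query below is purely illustrative):
// parsed, validated, and planned before execution
MATCH (p:Person { name:"Tom Hanks" })  // locate the initial node(s)
MATCH (p)-[r:ACTED_IN]->(m:Movie)      // select and traverse relationships
RETURN m.title, r.roles                // return values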
Preparation
Parsing and validating the Cypher statement(s) is important, but mundane. However, generating an
optimal search strategy can be far more challenging.
The execution plan must tell the database how to locate initial node(s), select relationships for traversal,
etc. This involves tricky optimization problems (eg, which actions should happen first), but we can safely
leave the details to the Neo4j engineers. So, let’s move on to locating the initial node(s).
Locate the initial node(s)
Neo4j is highly optimized for traversing property graphs. Under ideal circumstances, it can traverse
millions of nodes and relationships per second, following chains of pointers in the computer’s memory.
However, before traversal can begin, Neo4j must know one or more starting nodes. Unless the user (or,
more likely, a client program) can provide this information, Neo4j will have to search for these nodes.
A “brute force” search of the database (eg, for a specified property value) can be very time consuming.
Every node must be examined, first to see if it has the property, then to see if the value meets the
desired criteria. To avoid this effort, Neo4j creates and uses indexes; it maintains a separate index
for each label/property combination.
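For example, a separate index covering the name property of Person-labeled nodes could be declared as follows (index creation is discussed further in Section 3.6):
CREATE INDEX ON :Person(name)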
Traversal and actions
Once the initial nodes are determined, Neo4j can traverse portions of the graph and perform any
requested actions. The execution plan helps Neo4j to determine which nodes are relevant, which
relationships to traverse, etc.
3.2. Graphs, Patterns, and Cypher
Nodes, Relationships, and Patterns
Neo4j’s Property Graphs are composed of nodes and relationships, either of which may have properties
(ie, attributes). Nodes represent entities (eg, concepts, events, places, things); relationships (which may
be directed) connect pairs of nodes.
However, nodes and relationships are simply low-level building blocks. The real strength of the Property
Graph lies in its ability to encode patterns of connected nodes and relationships. A single node or
relationship typically encodes very little information, but a pattern of nodes and relationships can
encode arbitrarily complex ideas.
Cypher, Neo4j’s query language, is strongly based on patterns. Specifically, patterns are used to match
desired graph structures. Once a matching structure has been found (or created), Neo4j can use it for
further processing.
Simple and Complex Patterns
A simple pattern, which has only a single relationship, connects a pair of nodes (or, occasionally, a node
to itself). For example, a Person LIVES_IN a City or a City is PART_OF a Country.
Complex patterns, using multiple relationships, can express arbitrarily complex concepts and support
a variety of interesting use cases. For example, we might want to match instances where a Person
LIVES_IN a Country. The following Cypher code combines two simple patterns into a (mildly) complex
pattern which performs this match:
(:Person) -[:LIVES_IN]-> (:City) -[:PART_OF]-> (:Country)
Pattern recognition is fundamental to the way that the brain works. Consequently, humans are very
good at working with patterns. When patterns are presented visually (eg, in a diagram or map), humans
can use them to recognize, specify, and understand concepts. As a pattern-based language, Cypher
takes advantage of this capability.
Cypher Concepts
Like SQL2 (used in relational databases3), Cypher is a textual, declarative query language. It uses a form
of ASCII art4 to represent graph-related patterns. SQL-like clauses and keywords (eg, MATCH, WHERE, DELETE)
are used to combine these patterns and specify desired actions.
This combination tells Neo4j which patterns to match and what to do with the matching items (eg,
nodes, relationships, paths, collections). However, as a declarative5 language, Cypher does not tell
Neo4j how to find nodes, traverse relationships, etc. (This level of control is available from Neo4j’s Java6
APIs7; see Section 32.2, “Unmanaged Extensions” [568].)
Diagrams made up of icons and arrows are commonly used to visualize graphs; textual annotations
provide labels, define properties, etc. Cypher’s ASCII-art syntax formalizes this approach, while adapting
it to the limitations of text.
Node Syntax
Cypher uses a pair of parentheses (usually containing a text string) to represent a node, eg: (), (foo).
This is reminiscent of a circle or a rectangle with rounded end caps. Here are some ASCII-art encodings
for example Neo4j nodes, providing varying types and amounts of detail:
()
2 https://en.wikipedia.org/wiki/SQL
3 https://en.wikipedia.org/wiki/Relational_database_management_system
4 https://en.wikipedia.org/wiki/ASCII_art
5 https://en.wikipedia.org/wiki/Declarative_programming
6 https://en.wikipedia.org/wiki/Java_(programming_language)
7 https://en.wikipedia.org/wiki/Application_programming_interface
(matrix)
(:Movie)
(matrix:Movie)
(matrix:Movie {title: "The Matrix"})
(matrix:Movie {title: "The Matrix", released: 1997})
The simplest form, (), represents an anonymous, uncharacterized node. If we want to refer to the
node elsewhere, we can add an identifier, eg: (matrix). Identifiers are restricted (ie, scoped) to a single
statement: an identifier may have different (or no) meaning in another statement.
The Movie label (prefixed in use with a colon) declares the node’s type. This restricts the pattern, keeping
it from matching (say) a structure with an Actor node in this position. Neo4j’s node indexes also use
labels: each index is specific to the combination of a label and a property.
The node’s properties (eg, title) are represented as a list of key/value pairs, enclosed within a pair of
braces, eg: {...}. Properties can be used to store information and/or restrict patterns. For example, we
could match nodes whose title is "The Matrix".
Relationship Syntax
Cypher uses a pair of dashes (--) to represent an undirected relationship. Directed relationships have
an arrowhead at one end (eg, <--, -->). Bracketed expressions (eg: [...]) can be used to add details.
This may include identifiers, properties, and/or type information, eg:
-->
-[role]->
-[:ACTED_IN]->
-[role:ACTED_IN]->
-[role:ACTED_IN {roles: ["Neo"]}]->
The syntax and semantics found within a relationship’s bracket pair are very similar to those used
between a node’s parentheses. An identifier (eg, role) can be defined, to be used elsewhere in the
statement. The relationship’s type (eg, ACTED_IN) is analogous to the node’s label. The properties (eg,
roles) are entirely equivalent to node properties. (Note that the value of a property may be an array.)
Pattern Syntax
Combining the syntax for nodes and relationships, we can express patterns. The following could be a
simple pattern (or fact) in this domain:
(keanu:Person:Actor {name: "Keanu Reeves"} )
-[role:ACTED_IN {roles: ["Neo"] } ]->
(matrix:Movie {title: "The Matrix"} )
Like with node labels, the relationship type ACTED_IN is added as a symbol, prefixed with a colon:
:ACTED_IN. Identifiers (eg, role) can be used elsewhere in the statement to refer to the relationship.
Node and relationship properties use the same notation. In this case, we used an array property for the
roles, allowing multiple roles to be specified.
Pattern Nodes vs. Database Nodes
When a node is used in a pattern, it describes zero or more nodes in the database. Similarly,
each pattern describes zero or more paths of nodes and relationships.
Pattern Identifiers
To increase modularity and reduce repetition, Cypher allows patterns to be assigned to identifiers. This
allows the matching paths to be inspected, used in other expressions, etc.
acted_in = (:Person)-[:ACTED_IN]->(:Movie)
The acted_in variable would contain two nodes and the connecting relationship for each path that was
found or created. There are a number of functions to access details of a path, including nodes(path),
rels(path) (same as relationships(path)), and length(path).
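For example, if we assign a matched pattern to the identifier p, the path becomes available to these functions:
MATCH p = (:Person)-[:ACTED_IN]->(:Movie)
RETURN nodes(p), relationships(p), length(p)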
Clauses
Cypher statements typically have multiple clauses, each of which performs a specific task, eg:
create and match patterns in the graph
filter, project, sort, or paginate results
connect/compose partial statements
By combining Cypher clauses, we can compose more complex statements that express what we want
to know or create. Neo4j then figures out how to achieve the desired goal in an efficient manner.
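For instance, a single statement might combine several of the clause types above; the following (purely illustrative) example matches a pattern, filters, projects, sorts, and paginates:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.released > 1990
RETURN p.name, m.title
ORDER BY m.title
LIMIT 5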
3.3. Patterns in Practice
Creating Data
We’ll start by looking into the clauses that allow us to create data.
To add data, we just use the patterns we already know. By providing patterns we can specify what
graph structures, labels and properties we would like to make part of our graph.
Obviously the simplest clause is called CREATE. It will just go ahead and directly create the patterns that
you specify.
For the patterns we’ve looked at so far this could look like the following:
CREATE (:Movie { title:"The Matrix",released:1997 })
If we execute this statement, Cypher returns the number of changes, in this case adding 1 node, 1 label
and 2 properties.
(empty result)
Nodes created: 1
Properties set: 2
Labels added: 1
As we started out with an empty database, we now have a database with a single node in it:
Movie
title = 'The Matrix'
released = 1997
In case we also want to return the created data, we can add a RETURN clause, which refers to the identifier
we’ve assigned to our pattern elements.
CREATE (p:Person { name:"Keanu Reeves", born:1964 })
RETURN p
This is what gets returned:
p
Node[1]{name:"Keanu Reeves", born:1964}
1 row
Nodes created: 1
Properties set: 2
Labels added: 1
If we want to create more than one element, we can separate the elements with commas or use
multiple CREATE statements.
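For instance, the following two (purely illustrative) forms are equivalent, each creating two unconnected nodes:
CREATE (:Movie { title:"Speed" }),(:Person { name:"Sandra Bullock" })
CREATE (:Movie { title:"Speed" })
CREATE (:Person { name:"Sandra Bullock" })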
We can of course also create more complex structures, like an ACTED_IN relationship with information
about the character, or DIRECTED ones for the director.
CREATE (a:Person { name:"Tom Hanks",
born:1956 })-[r:ACTED_IN { roles: ["Forrest"]}]->(m:Movie { title:"Forrest Gump",released:1994 })
CREATE (d:Person { name:"Robert Zemeckis", born:1951 })-[:DIRECTED]->(m)
RETURN a,d,r,m
This is the part of the graph we just updated:
Person
name = 'Tom Hanks'
born = 1956
Movie
title = 'Forrest Gump'
released = 1994
ACTED_IN
roles = ['Forrest']
Person
name = 'Robert Zemeckis'
born = 1951
DIRECTED
In most cases, we want to connect new data to existing structures. This requires that we know how to
find existing patterns in our graph data, which we will look at next.
Matching Patterns
Matching patterns is a task for the MATCH statement. We pass the same kind of patterns we’ve used so
far to MATCH to describe what we’re looking for. It is similar to query by example, except that our examples
also include the structures.
Note
A MATCH statement will search for the patterns we specify and return one row per successful
pattern match.
To find the data we’ve created so far, we can start looking for all nodes labeled with the Movie label.
MATCH (m:Movie)
RETURN m
Here’s the result:
Movie
title = 'The Matrix'
released = 1997
Movie
title = 'Forrest Gump'
released = 1994
This should show both The Matrix and Forrest Gump.
We can also look for a specific person, like Keanu Reeves.
MATCH (p:Person { name:"Keanu Reeves" })
RETURN p
This query returns the matching node:
Person
name = 'Keanu Reeves'
born = 1964
Note that we only provide enough information to find the nodes; not all properties are required. In
most cases you have key properties, like SSN, ISBN, email, login, geolocation, or product code, to look
for.
We can also find more interesting connections, such as the titles of the movies that Tom Hanks acted
in and the roles he played.
MATCH (p:Person { name:"Tom Hanks" })-[r:ACTED_IN]->(m:Movie)
RETURN m.title, r.roles
m.title r.roles
"Forrest Gump" ["Forrest"]
1 row
In this case we only returned the properties of the nodes and relationships that we were interested in.
You can access them everywhere via the dot notation identifier.property.
Of course this only lists his role as Forrest in Forrest Gump, because that’s all the data we’ve added.
Now we know enough to connect new nodes to existing ones and can combine MATCH and CREATE to
attach structures to the graph.
Attaching Structures
To extend the graph with new information, we first match the existing connection points and then
attach the newly created nodes to them with relationships. Adding Cloud Atlas as a new movie for Tom
Hanks could be achieved like this:
MATCH (p:Person { name:"Tom Hanks" })
CREATE (m:Movie { title:"Cloud Atlas",released:2012 })
CREATE (p)-[r:ACTED_IN { roles: ['Zachry']}]->(m)
RETURN p,r,m
Here’s what the structure looks like in the database:
Person
name = 'Tom Hanks'
born = 1956
Movie
title = 'Cloud Atlas'
released = 2012
ACTED_IN
roles = ['Zachry']
Tip
It is important to remember that we can assign identifiers to both nodes and relationships,
and use them later on, regardless of whether they were created or matched.
It is possible to attach both the node and the relationship in a single CREATE clause, but for readability
it helps to split them up.
Important
A tricky aspect of the combination of MATCH and CREATE is that we get one row per matched
pattern. This causes subsequent CREATE statements to be executed once for each row. In
many cases this is what you want. If that’s not intended, move the CREATE statement before the
MATCH, change the cardinality of the query by means discussed later, or use the get-or-create
semantics of the next clause: MERGE.
Completing Patterns
Whenever we get data from external systems or are not sure if certain information already exists in
the graph, we want to be able to express a repeatable (idempotent) update operation. In Cypher MERGE
has this function. It acts like a combination of MATCH and CREATE, checking for the existence of data
first before creating it. With MERGE you define a pattern to be found or created. Usually, as with MATCH,
you only want to include the key property to look for in your core pattern. MERGE allows you to provide
additional properties you want to set ON CREATE.
If we didn’t know whether our graph already contained Cloud Atlas, we could merge it in again.
MERGE (m:Movie { title:"Cloud Atlas" })
ON CREATE SET m.released = 2012
RETURN m
m
Node[5]{title:"Cloud Atlas", released:2012}
1 row
We get a result in either case: either the data (potentially more than one row) that was already in
the graph, or a single, newly created Movie node.
Note
A MERGE clause without any previously assigned identifiers in it either matches the full pattern
or creates the full pattern. It never produces a partial mix of matching and creating within a
pattern. To achieve a partial match/create, make sure to use already defined identifiers for
the parts that shouldn’t be affected.
So, first and foremost, MERGE makes sure that you can’t create duplicate information or structures, but
it comes at the cost of needing to check for existing matches first. Especially on large graphs, it can be
costly to scan a large set of labeled nodes for a certain property. You can alleviate some of that by
creating supporting indexes or constraints, which we’ll discuss later. But it’s still not free, so whenever
you’re sure you won’t create duplicate data, use CREATE over MERGE.
Tip
MERGE can also assert that a relationship is only created once. For that to work you have to
pass in both nodes from a previous pattern match.
MATCH (m:Movie { title:"Cloud Atlas" })
MATCH (p:Person { name:"Tom Hanks" })
MERGE (p)-[r:ACTED_IN]->(m)
ON CREATE SET r.roles =['Zachry']
RETURN p,r,m
Person
name = 'Tom Hanks'
born = 1956
Movie
title = 'Cloud Atlas'
released = 2012
ACTED_IN
roles = ['Zachry']
In case the direction of a relationship is arbitrary, you can leave off the arrowhead. MERGE will then
check for the relationship in either direction, and create a new directed relationship if no matching
relationship was found.
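For instance, the following sketch would match an existing ACTED_IN relationship between the two nodes in either direction, and only create a new directed one if none was found:
MATCH (m:Movie { title:"Cloud Atlas" })
MATCH (p:Person { name:"Tom Hanks" })
MERGE (p)-[r:ACTED_IN]-(m)
RETURN r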
If you choose to pass in only one node from a preceding clause, MERGE offers an interesting capability:
it will then only match within the direct neighborhood of the provided node for the given pattern, and,
if nothing is found, create it. This can come in very handy when creating, for example, tree structures.
CREATE (y:Year { year:2014 })
MERGE (y)<-[:IN_YEAR]-(m10:Month { month:10 })
MERGE (y)<-[:IN_YEAR]-(m11:Month { month:11 })
RETURN y,m10,m11
This is the graph structure that gets created:
Year
year = 2014
Month
month = 11
IN_YEAR
Month
month = 10
IN_YEAR
Here there is no global search for the two Month nodes; they are only searched for in the context of the
2014 Year node.
3.4. Getting the Results You Want
Let’s first get some data in to retrieve results from:
CREATE (matrix:Movie { title:"The Matrix",released:1997 })
CREATE (cloudAtlas:Movie { title:"Cloud Atlas",released:2012 })
CREATE (forrestGump:Movie { title:"Forrest Gump",released:1994 })
CREATE (keanu:Person { name:"Keanu Reeves", born:1964 })
CREATE (robert:Person { name:"Robert Zemeckis", born:1951 })
CREATE (tom:Person { name:"Tom Hanks", born:1956 })
CREATE (tom)-[:ACTED_IN { roles: ["Forrest"]}]->(forrestGump)
CREATE (tom)-[:ACTED_IN { roles: ['Zachry']}]->(cloudAtlas)
CREATE (robert)-[:DIRECTED]->(forrestGump)
This is the data we will start out with:
Movie
title = 'The Matrix'
released = 1997
Movie
title = 'Cloud Atlas'
released = 2012
Movie
title = 'Forrest Gump'
released = 1994
Person
name = 'Keanu Reeves'
born = 1964
Person
name = 'Robert Zemeckis'
born = 1951
DIRECTED
Person
name = 'Tom Hanks'
born = 1956
ACTED_IN
roles = ['Zachry'] ACTED_IN
roles = ['Forrest']
Filtering Results
So far we’ve matched patterns in the graph and always returned all the results we found. Quite often,
though, there are conditions on what we want to see. As in SQL, those filter conditions are expressed in
a WHERE clause. This clause allows us to use any number of boolean expressions (predicates), combined
with AND, OR, XOR and NOT. The simplest predicates are comparisons, especially equality.
MATCH (m:Movie)
WHERE m.title = "The Matrix"
RETURN m
m
Node[0]{title:"The Matrix", released:1997}
1 row
For equality on one or more properties, a more compact syntax can be used as well:
MATCH (m:Movie { title: "The Matrix" })
RETURN m
Other options are numeric comparisons, matching regular expressions and checking the existence of
values within a collection.
The WHERE clause below includes a regular expression match, a greater than comparison and a test to
see if a value exists in a collection.
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE p.name =~ "K.+" OR m.released > 2000 OR "Neo" IN r.roles
RETURN p,r,m
p r m
Node[5]{name:"Tom Hanks", born:1956} :ACTED_IN[1]{roles:["Zachry"]} Node[1]{title:"Cloud Atlas", released:2012}
1 row
One aspect that might be a little surprising is that you can even use patterns as predicates. Where MATCH
expands the number and shape of patterns matched, a pattern predicate restricts the current result
set: it only lets those paths pass that also satisfy the additional patterns (or, with NOT, fail to satisfy them).
MATCH (p:Person)-[:ACTED_IN]->(m)
WHERE NOT (p)-[:DIRECTED]->()
RETURN p,m
p m
Node[5]{name:"Tom Hanks", born:1956} Node[1]{title:"Cloud Atlas", released:2012}
Node[5]{name:"Tom Hanks", born:1956} Node[2]{title:"Forrest Gump", released:1994}
2 rows
Here we find actors (because they have an ACTED_IN relationship), but then skip those that ever DIRECTED
any movie.
There are also more advanced ways of filtering like collection-predicates which we will look at later on.
Returning Results
So far we’ve returned only nodes, relationships, or paths directly via their identifiers. However, the
RETURN clause can actually return any number of expressions. But what exactly counts as an expression in Cypher?
The simplest expressions are literal values like numbers, strings, and arrays such as [1,2,3], and maps like
{name:"Tom Hanks", born:1964, movies:["Forrest Gump", ...], count:13}. You can access individual
properties of any node, relationship, or map with a dot-syntax like n.name. Individual elements or slices
of arrays can be retrieved with subscripts like names[0] or movies[1..-1]. Each function evaluation
like length(array), toInt("12"), substring("2014-07-01",0,4), or coalesce(p.nickname,"n/a") is also an
expression.
Predicates that you’d use in WHERE count as boolean expressions.
Of course, simpler expressions can be composed and concatenated to form more complex expressions.
By default the expression itself is used as the label for the column; in many cases you’ll want to alias it
with a more understandable name using expression AS alias. You can later refer to that column
using its alias.
MATCH (p:Person)
RETURN p, p.name AS name, upper(p.name), coalesce(p.nickname,"n/a") AS nickname, { name: p.name,
label:head(labels(p))} AS person
p name upper(p.name) nickname person
Node[3]{name:"Keanu Reeves", born:1964} "Keanu Reeves" "KEANU REEVES" "n/a" {name -> "Keanu Reeves", label -> "Person"}
Node[4]{name:"Robert Zemeckis", born:1951} "Robert Zemeckis" "ROBERT ZEMECKIS" "n/a" {name -> "Robert Zemeckis", label -> "Person"}
Node[5]{name:"Tom Hanks", born:1956} "Tom Hanks" "TOM HANKS" "n/a" {name -> "Tom Hanks", label -> "Person"}
3 rows
If you’re interested in unique results you can use the DISTINCT keyword after RETURN to indicate that.
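For example, to list each actor’s name only once, no matter how many movies they acted in:
MATCH (p:Person)-[:ACTED_IN]->(:Movie)
RETURN DISTINCT p.name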
Aggregating Information
In many cases you want to aggregate or group the data that you encounter while traversing patterns
in your graph. In Cypher aggregation happens in the RETURN clause while computing your final results.
Many common aggregation functions are supported, e.g. count, sum, avg, min, and max, but there are
several more.
Counting the number of people in your database could be achieved by this:
MATCH (:Person)
RETURN count(*) AS people
people
3
1 row
Please note that NULL values are skipped during aggregation. For aggregating only unique values use
DISTINCT, like in count(DISTINCT role).
Aggregation in Cypher just works. You specify which result columns you want to aggregate and Cypher
will use all non-aggregated columns as grouping keys.
Aggregation affects which data is still visible in ordering or later query parts.
To find out how often an actor and director worked together, you’d run this statement:
MATCH (actor:Person)-[:ACTED_IN]->(movie:Movie)<-[:DIRECTED]-(director:Person)
RETURN actor,director,count(*) AS collaborations
actor director collaborations
Node[5]{name:"Tom Hanks", born:1956} Node[4]{name:"Robert Zemeckis", born:1951} 1
1 row
Frequently you want to sort and paginate after aggregating a count(x).
Ordering and Pagination
Ordering works like in other query languages, with an ORDER BY expression [ASC|DESC] clause. The
expression can be any expression discussed before as long as it is computable from the returned
information.
So for instance if you return person.name you can still ORDER BY person.age as both are accessible from
the person reference. You cannot order by things that you can’t infer from the information you return.
This is especially important with aggregation and DISTINCT return values as both remove the visibility of
data that is aggregated.
Pagination is a straightforward use of SKIP {offset} LIMIT {count}.
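For example, to return the second and third movie titles in alphabetical order, you could write:
MATCH (m:Movie)
RETURN m.title
ORDER BY m.title
SKIP 1 LIMIT 2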
A common pattern is to aggregate for a count (score or frequency), order by it and only return the top-n
entries.
For instance to find the most prolific actors you could do:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
RETURN a,count(*) AS appearances
ORDER BY appearances DESC LIMIT 10;
a appearances
Node[5]{name:"Tom Hanks", born:1956} 2
1 row
Collecting Aggregation
The most helpful aggregation function is collect, which, as the name says, collects all aggregated
values into a real array or list. This comes in very handy in many situations, as you don’t lose the detail
information while aggregating.
Collect is well suited for retrieving the typical parent-child structures, where one core entity (parent,
root or head) is returned per row, with all its dependent information in associated collections created
with collect. This means there’s no need to repeat the parent information for each child row, or even to
run 1+n statements to retrieve the parent and its children individually.
To retrieve the cast of each movie in our database you could use this statement:
MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
RETURN m.title AS movie, collect(a.name) AS cast, count(*) AS actors
movie cast actors
"Forrest Gump" ["Tom Hanks"] 1
"Cloud Atlas" ["Tom Hanks"] 1
2 rows
The lists created by collect can either be used from the client consuming the Cypher results or directly
within a statement with any of the collection functions or predicates.
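For instance, the collected list can be processed further within the same statement; here the collection function head picks the first name from each cast list:
MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
RETURN m.title, head(collect(a.name)) AS first_listed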
3.5. How to Compose Large Statements
Let’s first get some data in to retrieve results from:
CREATE (matrix:Movie { title:"The Matrix",released:1997 })
CREATE (cloudAtlas:Movie { title:"Cloud Atlas",released:2012 })
CREATE (forrestGump:Movie { title:"Forrest Gump",released:1994 })
CREATE (keanu:Person { name:"Keanu Reeves", born:1964 })
CREATE (robert:Person { name:"Robert Zemeckis", born:1951 })
CREATE (tom:Person { name:"Tom Hanks", born:1956 })
CREATE (tom)-[:ACTED_IN { roles: ["Forrest"]}]->(forrestGump)
CREATE (tom)-[:ACTED_IN { roles: ['Zachry']}]->(cloudAtlas)
CREATE (robert)-[:DIRECTED]->(forrestGump)
Combine statements with UNION
A Cypher statement is usually quite compact. Expressing references between nodes as visual patterns
makes them easy to understand.
If you want to combine the results of two statements that have the same result structure, you can use
UNION [ALL].
For instance if you want to list both actors and directors without using the alternative relationship-type
syntax ()-[:ACTED_IN|:DIRECTED]->() you can do this:
MATCH (actor:Person)-[r:ACTED_IN]->(movie:Movie)
RETURN actor.name AS name, type(r) AS acted_in, movie.title AS title
UNION
MATCH (director:Person)-[r:DIRECTED]->(movie:Movie)
RETURN director.name AS name, type(r) AS acted_in, movie.title AS title
name acted_in title
"Tom Hanks" "ACTED_IN" "Cloud Atlas"
"Tom Hanks" "ACTED_IN" "Forrest Gump"
"Robert Zemeckis" "DIRECTED" "Forrest Gump"
3 rows
Use WITH to Chain Statements
In Cypher it’s possible to chain fragments of statements together, much like you would do within a
data-flow pipeline. Each fragment works on the output from the previous one and its results can feed
into the next one.
You use the WITH clause to combine the individual parts and declare which data flows from one to the
other. WITH is very much like RETURN with the difference that it doesn’t finish a query but prepares the
input for the next part. You can use the same expressions, aggregations, ordering and pagination as in
the RETURN clause.
The only difference is that you must alias all columns, as they would otherwise not be accessible. Only
columns that you declare in your WITH clause are available in subsequent query parts.
See below for an example where we collect the movies someone appeared in, and then filter out those
who appear in only one movie.
MATCH (person:Person)-[:ACTED_IN]->(m:Movie)
WITH person, count(*) AS appearances, collect(m.title) AS movies
WHERE appearances > 1
RETURN person.name, appearances, movies
person.name appearances movies
"Tom Hanks" 2 ["Cloud Atlas", "Forrest Gump"]
1 row
Tip
If you want to filter by an aggregated value in SQL or similar languages you would have to
use HAVING. That’s a single purpose clause for filtering aggregated information. In Cypher,
WHERE can be used in both cases.
3.6. Labels, Constraints and Indexes
Labels are a convenient way to group nodes together. They are used to restrict queries, define
constraints and create indexes.
Using Constraints
You can also specify unique constraints that guarantee uniqueness of a certain property on nodes with
a specific label.
These constraints are also used by the MERGE clause to make certain that a node only exists once.
The following gives an example of how to use labels and add constraints and indexes to them. Let’s
start out by adding a constraint: in this case, we decide that all Movie node titles should be unique.
CREATE CONSTRAINT ON (movie:Movie) ASSERT movie.title IS UNIQUE
Note that adding the unique constraint will add an index on that property, so we won’t do that
separately. If we drop a constraint, and still want an index on the same property, we have to create such
an index.
Constraints can be added after a label is already in use, but that requires that the existing data
complies with the constraints.
Using indexes
For a graph query to run fast, you don’t need indexes on everything; you only need them to find your
starting points. The main reason for using indexes in a graph database is to find those starting points in
the graph as fast as possible. After the initial index seek, you rely on in-graph structures and the first-
class citizenship of relationships in the graph database to achieve high performance.
In this case we want an index to speed up finding actors by name in the database:
CREATE INDEX ON :Actor(name)
Indexes can be added at any time. Note that it will take some time for an index to come online when
there’s existing data.
Now, let’s add some data.
CREATE (actor:Actor { name:"Tom Hanks" }),(movie:Movie { title:'Sleepless in Seattle' }),
(actor)-[:ACTED_IN]->(movie);
Normally you don’t specify indexes when querying for data. They will be used automatically. This means
we can simply look up the Tom Hanks node, and the index will kick in behind the scenes to boost
performance.
MATCH (actor:Actor { name: "Tom Hanks" })
RETURN actor;
Labels
Now let’s say we want to add another label for a node. Here’s how to do that:
MATCH (actor:Actor { name: "Tom Hanks" })
SET actor :American;
To remove a label from nodes, this is what to do:
MATCH (actor:Actor { name: "Tom Hanks" })
REMOVE actor:American;
Related Content
For more information on labels and related topics, see:
the section called “Labels” [7]
Chapter14, Schema [243]
Section14.2, “Constraints” [247]
Section14.1, “Indexes” [244]
Section10.8, “Using” [152]
Section12.3, “Set” [200]
Section12.5, “Remove” [205]
3.7. Loading Data
As you’ve seen, you can not only query data expressively with Cypher statements, but also create data.
Naturally, in most cases you wouldn’t want to write or generate huge statements to build your data,
but would instead use an existing data source that you pass into your statement and that drives the
graph creation process.
That process not only includes creating completely new data but also integrating with existing
structures and updating your graph.
Parameters
In general we recommend passing in varying literal values from the outside as named parameters. This
allows Cypher to reuse existing execution plans for the statements.
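For example, instead of embedding a literal name in the statement text, you would submit a parameterized statement and supply the value separately (the parameter name `name` here is just an illustration):

```
MATCH (actor:Actor { name: {name}})
RETURN actor
```

Submitted with a parameter map such as `{"name": "Tom Hanks"}`, the statement text stays identical regardless of which actor is looked up, so its execution plan can be reused.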
Of course you can also pass in parameters for data to be imported. Those can be scalar values, maps,
lists or even lists of maps.
In your Cypher statement you can then iterate over those values (e.g. with UNWIND) to create your graph
structures.
For instance to create a movie graph from JSON data structures pulled from an API you could use:
{
"movies" : [ {
"title" : "Stardust",
"released" : 2007,
"cast" : [ {
"actor" : {
"name" : "Robert de Niro",
"born" : 1943
},
"characters" : [ "Captain Shakespeare" ]
}, {
"actor" : {
"name" : "Michelle Pfeiffer",
"born" : 1958
},
"characters" : [ "Lamia" ]
} ]
} ]
}
UNWIND {movies} AS movie
MERGE (m:Movie {title:movie.title}) ON CREATE SET m.released = movie.released
FOREACH (role IN movie.cast |
MERGE (a:Person {name:role.actor.name}) ON CREATE SET a.born = role.actor.born
MERGE (a)-[:ACTED_IN {roles:role.characters}]->(m)
)
Importing CSV
Cypher provides an elegant built-in way to import tabular CSV data into graph structures.
The LOAD CSV clause parses a local or remote file into a stream of rows which represent maps (with
headers) or lists. Then you can use whatever Cypher operations you want to apply to either create
nodes or relationships or to merge with existing graph structures.
As CSV files usually represent either node- or relationship-lists, you run multiple passes to create nodes
and relationships separately.
For more details, see Section 11.6, “Load CSV” [182].
movies.csv
id,title,country,year
1,Wall Street,USA,1987
2,The American President,USA,1995
3,The Shawshank Redemption,USA,1994
LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/2.3.12/csv/intro/movies.csv" AS line
CREATE (m:Movie { id:line.id,title:line.title, released:toInt(line.year)});
persons.csv
id,name
1,Charlie Sheen
2,Oliver Stone
3,Michael Douglas
4,Martin Sheen
5,Morgan Freeman
LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/2.3.12/csv/intro/persons.csv" AS line
MERGE (a:Person { id:line.id })
ON CREATE SET a.name=line.name;
roles.csv
personId,movieId,role
1,1,Bud Fox
4,1,Carl Fox
3,1,Gordon Gekko
4,2,A.J. MacInerney
3,2,President Andrew Shepherd
5,3,Ellis Boyd 'Red' Redding
LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/2.3.12/csv/intro/roles.csv" AS line
MATCH (m:Movie { id:line.movieId })
MATCH (a:Person { id:line.personId })
CREATE (a)-[:ACTED_IN { roles: [line.role]}]->(m);
[Resulting graph: Movie nodes Wall Street (1987), The American President (1995), and The Shawshank Redemption (1994); Person nodes Charlie Sheen, Oliver Stone, Michael Douglas, Martin Sheen, and Morgan Freeman; ACTED_IN relationships carrying the roles from roles.csv.]
If your file contains denormalized data, you can either run the same file in multiple passes with simple
operations as shown above, or you may have to use MERGE to create entities uniquely.
For our use-case we can import the data using a CSV structure like this:
movie_actor_roles.csv
title;released;actor;born;characters
Back to the Future;1985;Michael J. Fox;1961;Marty McFly
Back to the Future;1985;Christopher Lloyd;1938;Dr. Emmet Brown
LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/2.3.12/csv/intro/movie_actor_roles.csv" AS line
FIELDTERMINATOR ";"
MERGE (m:Movie { title:line.title })
ON CREATE SET m.released = toInt(line.released)
MERGE (a:Person { name:line.actor })
ON CREATE SET a.born = toInt(line.born)
MERGE (a)-[:ACTED_IN { roles:split(line.characters,",")}]->(m)
[Resulting graph: the movie and person nodes from the previous imports, plus Back to the Future (1985) with Michael J. Fox (born 1961) as 'Marty McFly' and Christopher Lloyd (born 1938) as 'Dr. Emmet Brown'.]
If you import a large amount of data (more than 10000 rows), it is recommended to prefix your LOAD
CSV clause with a PERIODIC COMMIT hint. This allows Neo4j to commit the import in batches, avoiding
the memory churn of a single large transaction state.
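As a sketch, the movies import from above could be prefixed like this (the batch size of 1000 rows is an arbitrary choice):

```
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/2.3.12/csv/intro/movies.csv" AS line
CREATE (m:Movie { id:line.id, title:line.title, released:toInt(line.year)});
```

Neo4j then commits after every 1000 rows instead of holding the whole import in one transaction.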
3.8. Utilizing Data Structures
Cypher can create and consume more complex data structures out of the box. As already mentioned
you can create literal lists ([1,2,3]) and maps ({name: value}) within a statement.
There are a number of functions that work with lists. They range from simple ones like size(list) that
returns the size of a list to reduce, which runs an expression against the elements and accumulates the
results.
Let’s first load a bit of data into the graph. If you want more details on how the data is loaded, see the
section called “Importing CSV” [34].
LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/2.3.12/csv/intro/movies.csv" AS line
CREATE (m:Movie { id:line.id,title:line.title, released:toInt(line.year)});
LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/2.3.12/csv/intro/persons.csv" AS line
MERGE (a:Person { id:line.id })
ON CREATE SET a.name=line.name;
LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/2.3.12/csv/intro/roles.csv" AS line
MATCH (m:Movie { id:line.movieId })
MATCH (a:Person { id:line.personId })
CREATE (a)-[:ACTED_IN { roles: [line.role]}]->(m);
LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/2.3.12/csv/intro/movie_actor_roles.csv" AS line
FIELDTERMINATOR ";"
MERGE (m:Movie { title:line.title })
ON CREATE SET m.released = toInt(line.released)
MERGE (a:Person { name:line.actor })
ON CREATE SET a.born = toInt(line.born)
MERGE (a)-[:ACTED_IN { roles:split(line.characters,",")}]->(m)
Now, let’s try out data structures.
To begin with, collect the names of the actors per movie, and return two of them:
MATCH (movie:Movie)<-[:ACTED_IN]-(actor:Person)
RETURN movie.title AS movie, collect(actor.name)[0..2] AS two_of_cast
movie two_of_cast
"The American President" ["Michael Douglas", "Martin Sheen"]
"Back to the Future" ["Christopher Lloyd", "Michael J. Fox"]
"Wall Street" ["Michael Douglas", "Martin Sheen"]
"The Shawshank Redemption" ["Morgan Freeman"]
4 rows
You can also access individual elements or slices of a list quickly with list[1] or list[5..-5]. Other
functions to access parts of a list are head(list), tail(list) and last(list).
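A quick non-graph sketch of these accessors (the literal list is chosen purely for illustration):

```
WITH [1,2,3,4,5] AS list
RETURN head(list), last(list), tail(list), list[1..3]
```

Here head returns 1, last returns 5, tail returns [2,3,4,5], and the slice list[1..3] returns [2,3] (slices are start-inclusive and end-exclusive).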
List Predicates
When using lists and arrays in comparisons you can use predicates like value IN list or any(x IN list
WHERE x = value). There are list predicates to satisfy conditions for all, any, none and single elements.
MATCH path =(:Person)-->(:Movie)<--(:Person)
WHERE ANY (n IN nodes(path) WHERE n.name = 'Michael Douglas')
RETURN extract(n IN nodes(path)| coalesce(n.name, n.title))
extract(n IN nodes(path) | coalesce(n.name, n.title))
["Martin Sheen", "Wall Street", "Michael Douglas"]
["Charlie Sheen", "Wall Street", "Michael Douglas"]
["Michael Douglas", "Wall Street", "Martin Sheen"]
["Michael Douglas", "Wall Street", "Charlie Sheen"]
["Martin Sheen", "The American President", "Michael Douglas"]
["Michael Douglas", "The American President", "Martin Sheen"]
6 rows
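The other predicates work the same way; a small non-graph sketch (again with a literal list for illustration):

```
WITH [2,4,6,7] AS nums
RETURN all(x IN nums WHERE x % 2 = 0) AS all_even,
       none(x IN nums WHERE x > 10) AS none_large,
       single(x IN nums WHERE x = 7) AS one_seven
```

all_even is false (7 is odd), none_large is true, and one_seven is true, since exactly one element equals 7.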
List Processing
Oftentimes you want to process lists to filter, aggregate (reduce) or transform (extract) their values.
Those transformations can be done within Cypher or in the calling code. This kind of list-processing can
reduce the amount of data handled and returned, so it might make sense to do it within the Cypher
statement.
A simple, non-graph example would be:
WITH range(1,10) AS numbers
WITH extract(n IN numbers | n*n) AS squares
WITH filter(n IN squares WHERE n > 25) AS large_squares
RETURN reduce(a = 0, n IN large_squares | a + n) AS sum_large_squares
sum_large_squares
330
1 row
In a graph query you can instead filter or aggregate collected values, or work on array properties.
MATCH (m:Movie)<-[r:ACTED_IN]-(a:Person)
WITH m.title AS movie, collect({ name: a.name, roles: r.roles }) AS cast
RETURN movie, filter(actor IN cast WHERE actor.name STARTS WITH "M")
movie filter(actor IN cast WHERE actor.name STARTS WITH "M")
"The American President" [{name -> "Michael Douglas", roles -> ["President Andrew Shepherd"]}, {name -> "Martin Sheen", roles -> ["A. J. MacInerney"]}]
"Back to the Future" [{name -> "Michael J. Fox", roles -> ["Marty McFly"]}]
"Wall Street" [{name -> "Michael Douglas", roles -> ["Gordon Gekko"]}, {name -> "Martin Sheen", roles -> ["Carl Fox"]}]
"The Shawshank Redemption" [{name -> "Morgan Freeman", roles -> ["Ellis Boyd 'Red' Redding"]}]
4 rows
Unwind Lists
Sometimes you have collected information into a list, but want to use each element individually as a
row. For instance, you might want to further match patterns in the graph. Or you passed in a collection
of values but now want to create or match a node or relationship for each element. Then you can use
the UNWIND clause to unroll a list into a sequence of rows again.
For instance, a query to find the top 3 co-actors and then follow their movies and again list the cast for
each of those movies:
MATCH (actor:Person)-[:ACTED_IN]->(movie:Movie)<-[:ACTED_IN]-(colleague:Person)
WHERE actor.name < colleague.name
WITH actor, colleague, count(*) AS frequency, collect(movie) AS movies
ORDER BY frequency DESC LIMIT 3
UNWIND movies AS m
MATCH (m)<-[:ACTED_IN]-(a)
RETURN m.title AS movie, collect(a.name) AS cast
movie cast
"The American President" ["Michael Douglas", "Martin Sheen"]
"Back to the Future" ["Christopher Lloyd", "Michael J. Fox"]
"Wall Street" ["Michael Douglas", "Martin Sheen", "Charlie Sheen", "Michael Douglas", "Martin Sheen", "Charlie Sheen"]
3 rows
3.9. Cypher vs. SQL
If you have used SQL and want to learn Cypher, this chapter is for you! We won’t dig very deep into
either of the languages, but focus on bridging the gap.
Data Model
For our example, we will use data about persons who act in, direct, or produce movies.
Here’s an entity-relationship model for the example:
[Entity-relationship diagram: Person and Movie entities connected by the relationships "acted in", "directed", and "produced".]
We have Person and Movie entities, which are related in three different ways, each of which has many-to-many cardinality.
In an RDBMS we would use tables for the entities as well as for the associative entities (join tables)
needed. In this case we decided to go with the following tables: movie, person, acted_in, directed,
produced. You’ll find the SQL for this below.
In Neo4j, the basic data units are nodes and relationships. Both can have properties, which correspond
to attributes in an RDBMS.
Nodes can be grouped by putting labels on them. In the example, we will use the labels Movie and
Person.
When using Neo4j, related entities can be represented directly by using relationships. There’s no need
to deal with foreign keys to handle the relationships; the database takes care of such mechanics.
Also, the relationships always have full referential integrity. There are no constraints to enable for this,
as it’s not optional; it’s really part of the underlying data model. Relationships always have a type, and
we will differentiate the different kinds of relationships by using the types ACTED_IN, DIRECTED, and PRODUCED.
Sample Data
First off, let’s see how to set up our example data in a RDBMS. We’ll start out creating a few tables and
then go on to populate them.
CREATE TABLE movie (
id INTEGER,
title VARCHAR(100),
released INTEGER,
tagline VARCHAR(100)
);
CREATE TABLE person (
id INTEGER,
name VARCHAR(100),
born INTEGER
);
CREATE TABLE acted_in (
role varchar(100),
person_id INTEGER,
movie_id INTEGER
);
CREATE TABLE directed (
person_id INTEGER,
movie_id INTEGER
);
CREATE TABLE produced (
person_id INTEGER,
movie_id INTEGER
);
Populating with data:
INSERT INTO movie (id, title, released, tagline)
VALUES
(1, 'The Matrix', 1999, 'Welcome to the Real World'),
(2, 'The Devil''s Advocate', 1997, 'Evil has its winning ways'),
(3, 'Monster', 2003, 'The first female serial killer of America');
INSERT INTO person (id, name, born)
VALUES
(1, 'Keanu Reeves', 1964),
(2, 'Carrie-Anne Moss', 1967),
(3, 'Laurence Fishburne', 1961),
(4, 'Hugo Weaving', 1960),
(5, 'Andy Wachowski', 1967),
(6, 'Lana Wachowski', 1965),
(7, 'Joel Silver', 1952),
(8, 'Charlize Theron', 1975),
(9, 'Al Pacino', 1940),
(10, 'Taylor Hackford', 1944);
INSERT INTO acted_in (role, person_id, movie_id)
VALUES
('Neo', 1, 1),
('Trinity', 2, 1),
('Morpheus', 3, 1),
('Agent Smith', 4, 1),
('Kevin Lomax', 1, 2),
('Mary Ann Lomax', 8, 2),
('John Milton', 9, 2),
('Aileen', 8, 3);
INSERT INTO directed (person_id, movie_id)
VALUES
(5, 1),
(6, 1),
(10, 2);
INSERT INTO produced (person_id, movie_id)
VALUES
(7, 1),
(8, 3);
Doing this in Neo4j will look quite different. To begin with, we won’t create any schema up front. We’ll
come back to schema later, for now it’s enough to know that labels can be used right away without
declaring them.
In the CREATE statements below, we tell Neo4j what data we want to have in the graph. Simply put, the
parentheses denote nodes, while the arrows (-->, or in our case with a relationship type included -
[:DIRECTED]->) denote relationships. For the nodes we set identifiers like TheMatrix so we can easily refer
to them later on in the statement. Note that the identifiers are scoped to the statement, and not visible
to other Cypher statements. We could use identifiers for the relationships as well, but there’s no need
for that in this case.
CREATE (TheMatrix:Movie { title:'The Matrix', released:1999, tagline:'Welcome to the Real World' })
CREATE (Keanu:Person { name:'Keanu Reeves', born:1964 })
CREATE (Carrie:Person { name:'Carrie-Anne Moss', born:1967 })
CREATE (Laurence:Person { name:'Laurence Fishburne', born:1961 })
CREATE (Hugo:Person { name:'Hugo Weaving', born:1960 })
CREATE (AndyW:Person { name:'Andy Wachowski', born:1967 })
CREATE (LanaW:Person { name:'Lana Wachowski', born:1965 })
CREATE (JoelS:Person { name:'Joel Silver', born:1952 })
CREATE (Keanu)-[:ACTED_IN { roles: ['Neo']}]->(TheMatrix),
(Carrie)-[:ACTED_IN { roles: ['Trinity']}]->(TheMatrix),
(Laurence)-[:ACTED_IN { roles: ['Morpheus']}]->(TheMatrix),
(Hugo)-[:ACTED_IN { roles: ['Agent Smith']}]->(TheMatrix),(AndyW)-[:DIRECTED]->(TheMatrix),
(LanaW)-[:DIRECTED]->(TheMatrix),(JoelS)-[:PRODUCED]->(TheMatrix)
CREATE (TheDevilsAdvocate:Movie { title:"The Devil's Advocate", released:1997,
tagline: 'Evil has its winning ways' })
CREATE (Monster:Movie { title: 'Monster', released: 2003,
tagline: 'The first female serial killer of America' })
CREATE (Charlize:Person { name:'Charlize Theron', born:1975 })
CREATE (Al:Person { name:'Al Pacino', born:1940 })
CREATE (Taylor:Person { name:'Taylor Hackford', born:1944 })
CREATE (Keanu)-[:ACTED_IN { roles: ['Kevin Lomax']}]->(TheDevilsAdvocate),
(Charlize)-[:ACTED_IN { roles: ['Mary Ann Lomax']}]->(TheDevilsAdvocate),
(Al)-[:ACTED_IN { roles: ['John Milton']}]->(TheDevilsAdvocate),
(Taylor)-[:DIRECTED]->(TheDevilsAdvocate),(Charlize)-[:ACTED_IN { roles: ['Aileen']}]->(Monster),
(Charlize)-[:PRODUCED { roles: ['Aileen']}]->(Monster)
Simple read of data
Let’s find all entries in the movie table and output their title attribute in our RDBMS:
SELECT movie.title
FROM movie;
TITLE
The Matrix
The Devil's Advocate
Monster
3 rows
Using Neo4j, find all nodes labeled Movie and output their title property:
MATCH (movie:Movie)
RETURN movie.title;
movie.title
"The Matrix"
"The Devil's Advocate"
"Monster"
3 rows
MATCH tells Neo4j to match a pattern in the graph. In this case the pattern is very simple: any node with a
Movie label on it. We bind the result of the pattern matching to the identifier movie, for use in the RETURN
clause. And as you can see, the RETURN keyword of Cypher is similar to SELECT in SQL.
Now let’s get movies released after 1998.
SELECT movie.title
FROM movie
WHERE movie.released > 1998;
TITLE
The Matrix
Monster
2 rows
In this case the added WHERE clause looks identical in Cypher.
MATCH (movie:Movie)
WHERE movie.released > 1998
RETURN movie.title;
movie.title
"The Matrix"
"Monster"
2 rows
Note however that the semantics of WHERE in Cypher are somewhat different; see Section 11.3,
“Where” [166] for more information.
Join
Let’s list all persons and the movies they acted in.
SELECT person.name, movie.title
FROM person
JOIN acted_in AS acted_in ON acted_in.person_id = person.id
JOIN movie ON acted_in.movie_id = movie.id;
NAME TITLE
Keanu Reeves The Matrix
Keanu Reeves The Devil's Advocate
Carrie-Anne Moss The Matrix
Laurence Fishburne The Matrix
Hugo Weaving The Matrix
Charlize Theron The Devil's Advocate
Charlize Theron Monster
Al Pacino The Devil's Advocate
8 rows
The same using Cypher:
MATCH (person:Person)-[:ACTED_IN]->(movie:Movie)
RETURN person.name, movie.title;
Here we match a Person and a Movie node whenever they are connected with an ACTED_IN relationship.
person.name movie.title
"Hugo Weaving" "The Matrix"
"Laurence Fishburne" "The Matrix"
"Carrie-Anne Moss" "The Matrix"
"Keanu Reeves" "The Matrix"
"Al Pacino" "The Devil's Advocate"
"Charlize Theron" "The Devil's Advocate"
"Keanu Reeves" "The Devil's Advocate"
"Charlize Theron" "Monster"
8 rows
To make things slightly more complex, let’s search for the co-actors of Keanu Reeves. In SQL we use a
self join on the person table and join on the acted_in table once for Keanu, and once for the co-actors.
SELECT DISTINCT co_actor.name
FROM person AS keanu
JOIN acted_in AS acted_in1 ON acted_in1.person_id = keanu.id
JOIN acted_in AS acted_in2 ON acted_in2.movie_id = acted_in1.movie_id
JOIN person AS co_actor
ON acted_in2.person_id = co_actor.id AND co_actor.id <> keanu.id
WHERE keanu.name = 'Keanu Reeves';
NAME
Al Pacino
Carrie-Anne Moss
Charlize Theron
Hugo Weaving
Laurence Fishburne
5 rows
In Cypher, we use a pattern with two paths that target the same Movie node.
MATCH (keanu:Person)-[:ACTED_IN]->(movie:Movie),(coActor:Person)-[:ACTED_IN]->(movie)
WHERE keanu.name = 'Keanu Reeves'
RETURN DISTINCT coActor.name;
You may have noticed that we used the co_actor.id <> keanu.id predicate in SQL only. This is because
Neo4j will only match on the ACTED_IN relationship once in the same pattern. If this is not what we want,
we can split the pattern up by using two MATCH clauses like this:
MATCH (keanu:Person)-[:ACTED_IN]->(movie:Movie)
MATCH (coActor:Person)-[:ACTED_IN]->(movie)
WHERE keanu.name = 'Keanu Reeves'
RETURN DISTINCT coActor.name;
This time Keanu Reeves is included in the result as well:
coActor.name
"Al Pacino"
"Charlize Theron"
"Keanu Reeves"
"Hugo Weaving"
"Laurence Fishburne"
"Carrie-Anne Moss"
6 rows
Next, let’s find out who has both acted in and produced movies.
SELECT person.name
FROM person
WHERE person.id IN (SELECT person_id FROM acted_in)
AND person.id IN (SELECT person_id FROM produced);
NAME
Charlize Theron
1 row
In Cypher, we use patterns as predicates in this case. That is, we require the relationships to exist, but
don’t care about the connected nodes; thus the empty parentheses.
MATCH (person:Person)
WHERE (person)-[:ACTED_IN]->() AND (person)-[:PRODUCED]->()
RETURN person.name
Aggregation
Now let’s find out a bit about the directors in movies that Keanu Reeves acted in. We want to know how
many of those movies each of them directed.
SELECT director.name, count(*)
FROM person keanu
JOIN acted_in ON keanu.id = acted_in.person_id
JOIN directed ON acted_in.movie_id = directed.movie_id
JOIN person AS director ON directed.person_id = director.id
WHERE keanu.name = 'Keanu Reeves'
GROUP BY director.name
ORDER BY count(*) DESC
NAME C2
Andy Wachowski 1
Lana Wachowski 1
Taylor Hackford 1
3 rows
Here’s how we’ll do the same in Cypher:
MATCH (keanu:Person { name: 'Keanu Reeves' })-[:ACTED_IN]->(movie:Movie),
(director:Person)-[:DIRECTED]->(movie)
RETURN director.name, count(*)
ORDER BY count(*) DESC
As you can see there is no GROUP BY in the Cypher equivalent. Instead, Neo4j will automatically figure out
the grouping key.
Chapter4.Use Cypher in an application
The most direct way to use Cypher programmatically is to execute an HTTP POST operation against
the transactional Cypher endpoint. You can send a large number of statements with parameters to
the server with each request. For immediate execution you can use the /db/data/transaction/commit
endpoint with a JSON payload like this:
curl -i -H accept:application/json -H content-type:application/json -XPOST http://localhost:7474/db/data/transaction/commit \
-d '{"statements":[{"statement":"CREATE (p:Person {name:{name},born:{born}}) RETURN p","parameters":{"name":"Keanu
Reeves","born":1964}}]}'
The above command results in:
{"results":[{"columns":["p"],"data":[{"row":[{"name":"Keanu Reeves","born":1964}]}]}],"errors":[]}
You can add as many "statement" objects in the "statements" list as you want.
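For instance, a payload carrying two statements in one request might look like this (the labels and parameter values are illustrative):

```
{"statements": [
  {"statement": "CREATE (p:Person {name:{name}}) RETURN p",
   "parameters": {"name": "Keanu Reeves"}},
  {"statement": "MATCH (p:Person) RETURN count(p)"}
]}
```

The results list in the response contains one entry per statement, in the same order as the statements were submitted.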
For larger use-cases that span multiple requests but whose read-write-read-write operations should
be executed within the same transactional scope, you’d use the /db/data/transaction endpoint. This will
give you a transaction URL as the Location header, which you can continue to write to and read from. At
the end you either commit the whole transaction by POSTing to the (also returned) commit URL, or roll
it back by issuing a DELETE request against the transaction URL.
curl -i -H accept:application/json -H content-type:application/json -XPOST http://localhost:7474/db/data/transaction \
-d '{"statements":[{"statement":"CREATE (p:Person {name:{name},born:{born}}) RETURN p","parameters":{"name":"Clint
Eastwood","born":1930}}]}'
The above command results in:
HTTP/1.1 201 Created
Location: http://localhost:7474/db/data/transaction/261
{"commit":"http://localhost:7474/db/data/transaction/261/commit","transaction":{"expires":"Wed, 03 Sep 2014 23:26:51
+0000"},"errors":[],
"results":[{"columns":["p"],"data":[{"row":[{"name":"Clint Eastwood","born":1930}]}]}]}
See Section21.1, “Transactional Cypher HTTP endpoint” [298] for more information.
Chapter5.Basic Data Modeling Examples
The following chapters contain simple examples to get you started thinking about data modeling with
graphs. If you are looking for more advanced examples you can head straight to Chapter 6, Advanced
Data Modeling Examples [62].
The examples use Cypher queries extensively; read Part III, “Cypher Query Language” [102] for more
information.
5.1. Movie Database
Our example graph consists of movies with a title and year, and actors with a name. Actors have ACTS_IN
relationships to movies; each such relationship also has a role attribute recording the role the actor
played.
We’ll go with three movies and three actors:
CREATE (matrix1:Movie { title : 'The Matrix', year : '1999-03-31' })
CREATE (matrix2:Movie { title : 'The Matrix Reloaded', year : '2003-05-07' })
CREATE (matrix3:Movie { title : 'The Matrix Revolutions', year : '2003-10-27' })
CREATE (keanu:Actor { name:'Keanu Reeves' })
CREATE (laurence:Actor { name:'Laurence Fishburne' })
CREATE (carrieanne:Actor { name:'Carrie-Anne Moss' })
CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix1)
CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix2)
CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix3)
CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix1)
CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix2)
CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix3)
CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix1)
CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix2)
CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix3)
This gives us the following graph to play with:
[Graph: three Movie nodes (The Matrix, The Matrix Reloaded, The Matrix Revolutions) connected by ACTS_IN relationships to three Actor nodes: Keanu Reeves (role 'Neo'), Laurence Fishburne (role 'Morpheus'), and Carrie-Anne Moss (role 'Trinity').]
Let’s check how many nodes we have now:
MATCH (n)
RETURN "Hello Graph with " + count(*)+ " Nodes!" AS welcome;
Return a single node, by name:
MATCH (movie:Movie { title: 'The Matrix' })
RETURN movie;
Return the title and date of the matrix node:
MATCH (movie:Movie { title: 'The Matrix' })
RETURN movie.title, movie.year;
Which results in:
movie.title movie.year
"The Matrix" "1999-03-31"
1 row
Show all actors:
MATCH (actor:Actor)
RETURN actor;
Return just the name, and order them by name:
MATCH (actor:Actor)
RETURN actor.name
ORDER BY actor.name;
Count the actors:
MATCH (actor:Actor)
RETURN count(*);
Get only the actors whose names end with “s”:
MATCH (actor:Actor)
WHERE actor.name =~ ".*s$"
RETURN actor.name;
Here are some exploratory queries for unknown datasets. Don’t do this on live production databases!
Count nodes:
MATCH (n)
RETURN count(*);
Count relationship types:
MATCH (n)-[r]->()
RETURN type(r), count(*);
type(r) count(*)
"ACTS_IN" 9
1 row
List all nodes and their relationships:
MATCH (n)-[r]->(m)
RETURN n AS from, r AS `->`, m AS to;
from -> to
Node[3]{name:"Keanu Reeves"} :ACTS_IN[2]{role:"Neo"} Node[2]{year:"2003-10-27", title:"The Matrix Revolutions"}
Node[3]{name:"Keanu Reeves"} :ACTS_IN[1]{role:"Neo"} Node[1]{year:"2003-05-07", title:"The Matrix Reloaded"}
Node[3]{name:"Keanu Reeves"} :ACTS_IN[0]{role:"Neo"} Node[0]{title:"The Matrix", year:"1999-03-31"}
Node[4]{name:"Laurence Fishburne"} :ACTS_IN[5]{role:"Morpheus"} Node[2]{year:"2003-10-27", title:"The Matrix Revolutions"}
Node[4]{name:"Laurence Fishburne"} :ACTS_IN[4]{role:"Morpheus"} Node[1]{year:"2003-05-07", title:"The Matrix Reloaded"}
Node[4]{name:"Laurence Fishburne"} :ACTS_IN[3]{role:"Morpheus"} Node[0]{title:"The Matrix", year:"1999-03-31"}
Node[5]{name:"Carrie-Anne Moss"} :ACTS_IN[8]{role:"Trinity"} Node[2]{year:"2003-10-27", title:"The Matrix Revolutions"}
Node[5]{name:"Carrie-Anne Moss"} :ACTS_IN[7]{role:"Trinity"} Node[1]{year:"2003-05-07", title:"The Matrix Reloaded"}
Node[5]{name:"Carrie-Anne Moss"} :ACTS_IN[6]{role:"Trinity"} Node[0]{title:"The Matrix", year:"1999-03-31"}
9 rows
5.2. Social Movie Database
Our example graph consists of movies with a title and year, and actors with a name. Actors have ACTS_IN
relationships to movies; each such relationship also has a role attribute recording the role the actor
played.
So far, we have queried the movie data; now let’s update the graph too.
CREATE (matrix1:Movie { title : 'The Matrix', year : '1999-03-31' })
CREATE (matrix2:Movie { title : 'The Matrix Reloaded', year : '2003-05-07' })
CREATE (matrix3:Movie { title : 'The Matrix Revolutions', year : '2003-10-27' })
CREATE (keanu:Actor { name:'Keanu Reeves' })
CREATE (laurence:Actor { name:'Laurence Fishburne' })
CREATE (carrieanne:Actor { name:'Carrie-Anne Moss' })
CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix1)
CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix2)
CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix3)
CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix1)
CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix2)
CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix3)
CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix1)
CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix2)
CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix3)
We will add ourselves, friends and movie ratings.
Here’s how to add a node for yourself and return it; let’s say your name is “Me”:
CREATE (me:User { name: "Me" })
RETURN me;
me
Node[6]{name:"Me"}
1 row
Nodes created: 1
Properties set: 1
Labels added: 1
Let’s check if the node is there:
MATCH (me:User { name: "Me" })
RETURN me.name;
Add a movie rating:
MATCH (me:User { name: "Me" }),(movie:Movie { title: "The Matrix" })
CREATE (me)-[:RATED { stars : 5, comment : "I love that movie!" }]->(movie);
Which movies did I rate?
MATCH (me:User { name: "Me" }),(me)-[rating:RATED]->(movie)
RETURN movie.title, rating.stars, rating.comment;
movie.title rating.stars rating.comment
"The Matrix" 5 "I love that movie!"
1 row
We need a friend!
CREATE (friend:User { name: "A Friend" })
RETURN friend;
Add our friendship idempotently, so we can re-run the query without adding it several times. We return
the relationship to check that it has not been created several times.
MATCH (me:User { name: "Me" }),(friend:User { name: "A Friend" })
CREATE UNIQUE (me)-[friendship:FRIEND]->(friend)
RETURN friendship;
You can rerun the query and see that it doesn’t change anything the second time!
Let’s update our friendship with a since property:
MATCH (me:User { name: "Me" })-[friendship:FRIEND]->(friend:User { name: "A Friend" })
SET friendship.since='forever'
RETURN friendship;
Now let’s put ourselves in our friend’s shoes and see which movies their friends have rated.
MATCH (me:User { name: "A Friend" })-[:FRIEND]-(friend)-[rating:RATED]->(movie)
RETURN movie.title, avg(rating.stars) AS stars, collect(rating.comment) AS comments, count(*);
movie.title stars comments count(*)
"The Matrix" 5.0 ["I love that movie!"] 1
1 row
That’s too little data, let’s add some more friends and friendships.
MATCH (me:User { name: "Me" })
FOREACH (i IN range(1,10)| CREATE (friend:User { name: "Friend " + i }),(me)-[:FRIEND]->(friend));
Show all our friends:
MATCH (me:User { name: "Me" })-[r:FRIEND]->(friend)
RETURN type(r) AS friendship, friend.name;
friendship friend.name
"FRIEND" "Friend 5"
"FRIEND" "Friend 4"
"FRIEND" "Friend 3"
"FRIEND" "Friend 2"
"FRIEND" "Friend 1"
"FRIEND" "Friend 10"
"FRIEND" "Friend 8"
"FRIEND" "Friend 9"
"FRIEND" "Friend 6"
"FRIEND" "Friend 7"
"FRIEND" "A Friend"
11 rows
5.3. Finding Paths
Our example graph consists of movies with a title and year, and actors with a name. Actors have ACTS_IN
relationships to movies; each such relationship also has a role attribute recording the role the actor
played.
We have queried and updated the data so far; now let’s find interesting constellations, a.k.a. paths.
CREATE (matrix1:Movie { title : 'The Matrix', year : '1999-03-31' })
CREATE (matrix2:Movie { title : 'The Matrix Reloaded', year : '2003-05-07' })
CREATE (matrix3:Movie { title : 'The Matrix Revolutions', year : '2003-10-27' })
CREATE (keanu:Actor { name:'Keanu Reeves' })
CREATE (laurence:Actor { name:'Laurence Fishburne' })
CREATE (carrieanne:Actor { name:'Carrie-Anne Moss' })
CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix1)
CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix2)
CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix3)
CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix1)
CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix2)
CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix3)
CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix1)
CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix2)
CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix3)
All other movies that actors in “The Matrix” acted in, ordered by occurrence:
MATCH (:Movie { title: "The Matrix" })<-[:ACTS_IN]-(actor)-[:ACTS_IN]->(movie)
RETURN movie.title, count(*)
ORDER BY count(*) DESC ;
movie.title | count(*)
"The Matrix Revolutions" | 3
"The Matrix Reloaded" | 3
2 rows
Let’s see who acted in each of these movies:
MATCH (:Movie { title: "The Matrix" })<-[:ACTS_IN]-(actor)-[:ACTS_IN]->(movie)
RETURN movie.title, collect(actor.name), count(*) AS count
ORDER BY count DESC ;
movie.title | collect(actor.name) | count
"The Matrix Revolutions" | ["Carrie-Anne Moss", "Laurence Fishburne", "Keanu Reeves"] | 3
"The Matrix Reloaded" | ["Carrie-Anne Moss", "Laurence Fishburne", "Keanu Reeves"] | 3
2 rows
What about co-acting, that is, actors who acted together:
MATCH (:Movie { title: "The Matrix" })<-[:ACTS_IN]-(actor)-[:ACTS_IN]->(movie)<-[:ACTS_IN]-(colleague)
RETURN actor.name, collect(DISTINCT colleague.name);
actor.name | collect(distinct colleague.name)
"Carrie-Anne Moss" | ["Laurence Fishburne", "Keanu Reeves"]
"Keanu Reeves" | ["Carrie-Anne Moss", "Laurence Fishburne"]
"Laurence Fishburne" | ["Carrie-Anne Moss", "Keanu Reeves"]
3 rows
Which of these actors acted most often with anyone from the Matrix cast?
MATCH (:Movie { title: "The Matrix" })<-[:ACTS_IN]-(actor)-[:ACTS_IN]->(movie)<-[:ACTS_IN]-(colleague)
RETURN colleague.name, count(*)
ORDER BY count(*) DESC LIMIT 10;
colleague.name | count(*)
"Carrie-Anne Moss" | 4
"Keanu Reeves" | 4
"Laurence Fishburne" | 4
3 rows
Moving on to paths: a path is a sequence of nodes and relationships from a start node to an end node.
We know that Trinity loves Neo, but how many paths exist between the two actors? We’ll limit the path
length of the pattern, as it would otherwise search the graph exhaustively. This is done by using *0..5 in the
pattern relationship.
MATCH p =(:Actor { name: "Keanu Reeves" })-[:ACTS_IN*0..5]-(:Actor { name: "Carrie-Anne Moss" })
RETURN p, length(p)
LIMIT 10;
p | length(p)
[Node[3]{name:"Keanu Reeves"}, :ACTS_IN[0]{role:"Neo"}, Node[0]{title:"The Matrix", year:"1999-03-31"}, :ACTS_IN[6]{role:"Trinity"}, Node[5]{name:"Carrie-Anne Moss"}] | 2
[Node[3]{name:"Keanu Reeves"}, :ACTS_IN[1]{role:"Neo"}, Node[1]{year:"2003-05-07", title:"The Matrix Reloaded"}, :ACTS_IN[4]{role:"Morpheus"}, Node[4]{name:"Laurence Fishburne"}, :ACTS_IN[3]{role:"Morpheus"}, Node[0]{title:"The Matrix", year:"1999-03-31"}, :ACTS_IN[6]{role:"Trinity"}, Node[5]{name:"Carrie-Anne Moss"}] | 4
[Node[3]{name:"Keanu Reeves"}, :ACTS_IN[2]{role:"Neo"}, Node[2]{year:"2003-10-27", title:"The Matrix Revolutions"}, :ACTS_IN[5]{role:"Morpheus"}, Node[4]{name:"Laurence Fishburne"}, :ACTS_IN[3]{role:"Morpheus"}, Node[0]{title:"The Matrix", year:"1999-03-31"}, :ACTS_IN[6]{role:"Trinity"}, Node[5]{name:"Carrie-Anne Moss"}] | 4
[Node[3]{name:"Keanu Reeves"}, :ACTS_IN[1]{role:"Neo"}, Node[1]{year:"2003-05-07", title:"The Matrix Reloaded"}, :ACTS_IN[7]{role:"Trinity"}, Node[5]{name:"Carrie-Anne Moss"}] | 2
[Node[3]{name:"Keanu Reeves"}, :ACTS_IN[0]{role:"Neo"}, Node[0]{title:"The Matrix", year:"1999-03-31"}, :ACTS_IN[3]{role:"Morpheus"}, Node[4]{name:"Laurence Fishburne"}, :ACTS_IN[4]{role:"Morpheus"}, Node[1]{year:"2003-05-07", title:"The Matrix Reloaded"}, :ACTS_IN[7]{role:"Trinity"}, Node[5]{name:"Carrie-Anne Moss"}] | 4
[Node[3]{name:"Keanu Reeves"}, :ACTS_IN[2]{role:"Neo"}, Node[2]{year:"2003-10-27", title:"The Matrix Revolutions"}, :ACTS_IN[5]{role:"Morpheus"}, Node[4]{name:"Laurence Fishburne"}, :ACTS_IN[4]{role:"Morpheus"}, Node[1]{year:"2003-05-07", title:"The Matrix Reloaded"}, :ACTS_IN[7]{role:"Trinity"}, Node[5]{name:"Carrie-Anne Moss"}] | 4
[Node[3]{name:"Keanu Reeves"}, :ACTS_IN[2]{role:"Neo"}, Node[2]{year:"2003-10-27", title:"The Matrix Revolutions"}, :ACTS_IN[8]{role:"Trinity"}, Node[5]{name:"Carrie-Anne Moss"}] | 2
[Node[3]{name:"Keanu Reeves"}, :ACTS_IN[0]{role:"Neo"}, Node[0]{title:"The Matrix", year:"1999-03-31"}, :ACTS_IN[3]{role:"Morpheus"}, Node[4]{name:"Laurence Fishburne"}, :ACTS_IN[5]{role:"Morpheus"}, Node[2]{year:"2003-10-27", title:"The Matrix Revolutions"}, :ACTS_IN[8]{role:"Trinity"}, Node[5]{name:"Carrie-Anne Moss"}] | 4
[Node[3]{name:"Keanu Reeves"}, :ACTS_IN[1]{role:"Neo"}, Node[1]{year:"2003-05-07", title:"The Matrix Reloaded"}, :ACTS_IN[4]{role:"Morpheus"}, Node[4]{name:"Laurence Fishburne"}, :ACTS_IN[5]{role:"Morpheus"}, Node[2]{year:"2003-10-27", title:"The Matrix Revolutions"}, :ACTS_IN[8]{role:"Trinity"}, Node[5]{name:"Carrie-Anne Moss"}] | 4
9 rows
But that’s a lot of data; we just want to look at the names and titles of the nodes in the path.
MATCH p =(:Actor { name: "Keanu Reeves" })-[:ACTS_IN*0..5]-(:Actor { name: "Carrie-Anne Moss" })
RETURN extract(n IN nodes(p)| coalesce(n.title,n.name)) AS `names AND titles`, length(p)
ORDER BY length(p)
LIMIT 10;
names and titles | length(p)
["Keanu Reeves", "The Matrix", "Carrie-Anne Moss"] | 2
["Keanu Reeves", "The Matrix Reloaded", "Carrie-Anne Moss"] | 2
["Keanu Reeves", "The Matrix Revolutions", "Carrie-Anne Moss"] | 2
["Keanu Reeves", "The Matrix Reloaded", "Laurence Fishburne", "The Matrix", "Carrie-Anne Moss"] | 4
["Keanu Reeves", "The Matrix Revolutions", "Laurence Fishburne", "The Matrix", "Carrie-Anne Moss"] | 4
["Keanu Reeves", "The Matrix", "Laurence Fishburne", "The Matrix Reloaded", "Carrie-Anne Moss"] | 4
["Keanu Reeves", "The Matrix Revolutions", "Laurence Fishburne", "The Matrix Reloaded", "Carrie-Anne Moss"] | 4
["Keanu Reeves", "The Matrix", "Laurence Fishburne", "The Matrix Revolutions", "Carrie-Anne Moss"] | 4
["Keanu Reeves", "The Matrix Reloaded", "Laurence Fishburne", "The Matrix Revolutions", "Carrie-Anne Moss"] | 4
9 rows
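What the variable-length pattern does can be sketched as a bounded depth-first enumeration of paths. The following Python model is illustrative, not Neo4j code: it rebuilds the ACTS_IN graph created above as an undirected adjacency map. Note that Cypher enforces relationship uniqueness along a matched path, while this sketch uses node uniqueness; for this particular graph the two coincide.

```python
# Undirected adjacency map mirroring the ACTS_IN relationships above.
movies = ("The Matrix", "The Matrix Reloaded", "The Matrix Revolutions")
edges = [(actor, m) for actor in
         ("Keanu Reeves", "Laurence Fishburne", "Carrie-Anne Moss")
         for m in movies]

adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def paths(start, end, max_len):
    """Enumerate simple paths from start to end with length <= max_len."""
    found, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end and len(path) > 1:
            found.append(path)
            continue
        if len(path) - 1 >= max_len:   # path length = number of relationships
            continue
        for nxt in adj[path[-1]]:
            if nxt not in path:        # node-unique; Cypher is relationship-unique
                stack.append(path + [nxt])
    return found

result = paths("Keanu Reeves", "Carrie-Anne Moss", 5)
print(len(result))  # 9, matching the 9 rows above
```

As above, three paths of length 2 (one per movie) and six of length 4 (via Laurence Fishburne and a second movie) are found.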
5.4. Linked Lists
A powerful feature of using a graph database is that you can create your own in-graph data structures, for example a linked list.
This data structure uses a single node as the list reference. The reference has an outgoing relationship
to the head of the list, and an incoming relationship from the last element of the list. If the list is empty,
the reference will point to itself.
To make it clear what happens, we will show how the graph looks after each query.
To initialize an empty linked list, we simply create a node, and make it link to itself. Unlike the actual list
elements, it doesn’t have a value property.
CREATE (root { name: 'ROOT' })-[:LINK]->(root)
RETURN root
[Graph: (ROOT)-[:LINK]->(ROOT)]
Adding values is done by finding the relationship where the new value should be placed, and replacing it with a new node and two relationships to and from it. We also have to handle the fact that the before and after nodes could be the same as the root node. The case where before, after and the root node are all the same makes it necessary to use CREATE UNIQUE, so as not to create two new value nodes by mistake.
MATCH (root)-[:LINK*0..]->(before),(after)-[:LINK*0..]->(root),(before)-[old:LINK]->(after)
WHERE root.name = 'ROOT' AND (before.value < 25 OR before = root) AND (25 < after.value OR after = root)
CREATE UNIQUE (before)-[:LINK]->({ value:25 })-[:LINK]->(after)
DELETE old
[Graph: (ROOT)-[:LINK]->({value: 25})-[:LINK]->(ROOT)]
Let’s add one more value:
MATCH (root)-[:LINK*0..]->(before),(after)-[:LINK*0..]->(root),(before)-[old:LINK]->(after)
WHERE root.name = 'ROOT' AND (before.value < 10 OR before = root) AND (10 < after.value OR after = root)
CREATE UNIQUE (before)-[:LINK]->({ value:10 })-[:LINK]->(after)
DELETE old
Basic Data Modeling Examples
57
[Graph: (ROOT)-[:LINK]->({value: 10})-[:LINK]->({value: 25})-[:LINK]->(ROOT)]
Deleting a value, conversely, is done by finding the node with the value and the two relationships going in and out from it, then replacing those relationships with a single new one.
MATCH (root)-[:LINK*0..]->(before),(before)-[delBefore:LINK]->(del)-[delAfter:LINK]->(after),
(after)-[:LINK*0..]->(root)
WHERE root.name = 'ROOT' AND del.value = 10
CREATE UNIQUE (before)-[:LINK]->(after)
DELETE del, delBefore, delAfter
[Graph: (ROOT)-[:LINK]->({value: 25})-[:LINK]->(ROOT)]
Deleting the last value node is what requires us to use CREATE UNIQUE when replacing the relationships.
Otherwise, we would end up with two relationships from the root node to itself, as both before and
after nodes are equal to the root node, meaning the pattern would match twice.
MATCH (root)-[:LINK*0..]->(before),(before)-[delBefore:LINK]->(del)-[delAfter:LINK]->(after),
(after)-[:LINK*0..]->(root)
WHERE root.name = 'ROOT' AND del.value = 25
CREATE UNIQUE (before)-[:LINK]->(after)
DELETE del, delBefore, delAfter
[Graph: (ROOT)-[:LINK]->(ROOT)]
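The same lifecycle can be sketched outside the database. This is a minimal in-memory model of the circular list the queries above maintain, not Neo4j API code; the class and function names are invented for this illustration.

```python
class Node:
    """A list node; value None marks the ROOT sentinel."""
    def __init__(self, value=None):
        self.value = value
        self.next = self  # an empty list links the root to itself

def insert(root, value):
    # Find the LINK (before -> after) where the value belongs, mirroring
    # the WHERE clause: before.value < value < after.value, with the
    # root acting as both lower and upper bound.
    before = root
    while before.next is not root and before.next.value < value:
        before = before.next
    node = Node(value)
    node.next = before.next
    before.next = node

def delete(root, value):
    before = root
    while before.next is not root and before.next.value != value:
        before = before.next
    if before.next is not root:         # found the value node
        before.next = before.next.next  # replace the two LINKs with one

def to_list(root):
    out, cur = [], root.next
    while cur is not root:
        out.append(cur.value)
        cur = cur.next
    return out

root = Node()          # CREATE (root)-[:LINK]->(root)
insert(root, 25)
insert(root, 10)
print(to_list(root))   # [10, 25]
delete(root, 10)
delete(root, 25)
print(to_list(root))   # []
```

The final state, like the final Cypher query, leaves the root linking back to itself.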
5.5. TV Shows
This example shows how TV Shows with Seasons, Episodes, Characters, Actors, Users and Reviews can be modeled in a graph database.
Data Model
Let’s start out with an entity-relationship model of the domain at hand:
[Diagram: a TV Show has Seasons; a Season has Episodes; an Episode has Reviews and featured Characters; a User wrote a Review; an Actor played a Character.]
To implement this in Neo4j we’ll use the following relationship types:
Relationship Type | Description
HAS_SEASON | Connects a show with its seasons.
HAS_EPISODE | Connects a season with its episodes.
FEATURED_CHARACTER | Connects an episode with its characters.
PLAYED_CHARACTER | Connects actors with characters. Note that an actor can play multiple characters in an episode, and that the same character can be played by multiple actors as well.
HAS_REVIEW | Connects an episode with its reviews.
WROTE_REVIEW | Connects users with reviews they contributed.
Sample Data
Let’s create some data and see how the domain plays out in practice:
CREATE (himym:TVShow { name: "How I Met Your Mother" })
CREATE (himym_s1:Season { name: "HIMYM Season 1" })
CREATE (himym_s1_e1:Episode { name: "Pilot" })
CREATE (ted:Character { name: "Ted Mosby" })
CREATE (joshRadnor:Actor { name: "Josh Radnor" })
CREATE UNIQUE (joshRadnor)-[:PLAYED_CHARACTER]->(ted)
CREATE UNIQUE (himym)-[:HAS_SEASON]->(himym_s1)
CREATE UNIQUE (himym_s1)-[:HAS_EPISODE]->(himym_s1_e1)
CREATE UNIQUE (himym_s1_e1)-[:FEATURED_CHARACTER]->(ted)
CREATE (himym_s1_e1_review1 { title: "Meet Me At The Bar In 15 Minutes & Suit Up",
content: "It was awesome" })
CREATE (wakenPayne:User { name: "WakenPayne" })
CREATE (wakenPayne)-[:WROTE_REVIEW]->(himym_s1_e1_review1)<-[:HAS_REVIEW]-(himym_s1_e1)
This is how the data looks in the database:
[Graph: (Josh Radnor:Actor)-[:PLAYED_CHARACTER]->(Ted Mosby:Character)<-[:FEATURED_CHARACTER]-(Pilot:Episode)<-[:HAS_EPISODE]-(HIMYM Season 1:Season)<-[:HAS_SEASON]-(How I Met Your Mother:TVShow). The episode also has a HAS_REVIEW relationship to the review 'Meet Me At The Bar In 15 Minutes & Suit Up', which was written (WROTE_REVIEW) by the user WakenPayne.]
Note that even though we could have modeled the reviews as relationships with title and content
properties on them, we made them nodes instead. We gain a lot of flexibility in this way, for example if
we want to connect comments to each review.
Now let’s add more data:
MATCH (himym:TVShow { name: "How I Met Your Mother" }),(himym_s1:Season),
(himym_s1_e1:Episode { name: "Pilot" }),
(himym)-[:HAS_SEASON]->(himym_s1)-[:HAS_EPISODE]->(himym_s1_e1)
CREATE (marshall:Character { name: "Marshall Eriksen" })
CREATE (robin:Character { name: "Robin Scherbatsky" })
CREATE (barney:Character { name: "Barney Stinson" })
CREATE (lily:Character { name: "Lily Aldrin" })
CREATE (jasonSegel:Actor { name: "Jason Segel" })
CREATE (cobieSmulders:Actor { name: "Cobie Smulders" })
CREATE (neilPatrickHarris:Actor { name: "Neil Patrick Harris" })
CREATE (alysonHannigan:Actor { name: "Alyson Hannigan" })
CREATE UNIQUE (jasonSegel)-[:PLAYED_CHARACTER]->(marshall)
CREATE UNIQUE (cobieSmulders)-[:PLAYED_CHARACTER]->(robin)
CREATE UNIQUE (neilPatrickHarris)-[:PLAYED_CHARACTER]->(barney)
CREATE UNIQUE (alysonHannigan)-[:PLAYED_CHARACTER]->(lily)
CREATE UNIQUE (himym_s1_e1)-[:FEATURED_CHARACTER]->(marshall)
CREATE UNIQUE (himym_s1_e1)-[:FEATURED_CHARACTER]->(robin)
CREATE UNIQUE (himym_s1_e1)-[:FEATURED_CHARACTER]->(barney)
CREATE UNIQUE (himym_s1_e1)-[:FEATURED_CHARACTER]->(lily)
CREATE (himym_s1_e1_review2 { title: "What a great pilot for a show :)",
content: "The humour is great." })
CREATE (atlasredux:User { name: "atlasredux" })
CREATE (atlasredux)-[:WROTE_REVIEW]->(himym_s1_e1_review2)<-[:HAS_REVIEW]-(himym_s1_e1)
Information for a show
For a particular TV show, show all the seasons, all the episodes, all the reviews and all the cast members from that show; that is, all of the information connected to that TV show.
MATCH (tvShow:TVShow)-[:HAS_SEASON]->(season)-[:HAS_EPISODE]->(episode)
WHERE tvShow.name = "How I Met Your Mother"
RETURN season.name, episode.name
season.name | episode.name
"HIMYM Season 1" | "Pilot"
1 row
We could also grab the reviews, if there are any, by slightly tweaking the query:
MATCH (tvShow:TVShow)-[:HAS_SEASON]->(season)-[:HAS_EPISODE]->(episode)
WHERE tvShow.name = "How I Met Your Mother"
WITH season, episode
OPTIONAL MATCH (episode)-[:HAS_REVIEW]->(review)
RETURN season.name, episode.name, review
season.name | episode.name | review
"HIMYM Season 1" | "Pilot" | Node[15]{title:"What a great pilot for a show :)", content:"The humour is great."}
"HIMYM Season 1" | "Pilot" | Node[5]{title:"Meet Me At The Bar In 15 Minutes & Suit Up", content:"It was awesome"}
2 rows
Now let’s list the characters featured in a show. Note that in this query we only put identifiers on the
nodes we actually use later on. The other nodes of the path pattern are designated by ().
MATCH (tvShow:TVShow)-[:HAS_SEASON]->()-[:HAS_EPISODE]->()-[:FEATURED_CHARACTER]->(character)
WHERE tvShow.name = "How I Met Your Mother"
RETURN DISTINCT character.name
character.name
"Lily Aldrin"
"Barney Stinson"
"Robin Scherbatsky"
"Marshall Eriksen"
"Ted Mosby"
5 rows
Now let’s look at how to get all cast members of a show.
MATCH
(tvShow:TVShow)-[:HAS_SEASON]->()-[:HAS_EPISODE]->(episode)-[:FEATURED_CHARACTER]->()<-[:PLAYED_CHARACTER]-(actor)
WHERE tvShow.name = "How I Met Your Mother"
RETURN DISTINCT actor.name
actor.name
"Alyson Hannigan"
"Neil Patrick Harris"
"Cobie Smulders"
"Jason Segel"
"Josh Radnor"
5 rows
Information for an actor
First let’s add another TV show that Josh Radnor appeared in:
CREATE (er:TVShow { name: "ER" })
CREATE (er_s7:Season { name: "ER S7" })
CREATE (er_s7_e17:Episode { name: "Peter's Progress" })
CREATE (tedMosby:Character { name: "The Advocate " })
CREATE UNIQUE (er)-[:HAS_SEASON]->(er_s7)
CREATE UNIQUE (er_s7)-[:HAS_EPISODE]->(er_s7_e17)
WITH er_s7_e17
MATCH (actor:Actor),(episode:Episode)
WHERE actor.name = "Josh Radnor" AND episode.name = "Peter's Progress"
WITH actor, episode
CREATE (keith:Character { name: "Keith" })
CREATE UNIQUE (actor)-[:PLAYED_CHARACTER]->(keith)
CREATE UNIQUE (episode)-[:FEATURED_CHARACTER]->(keith)
And now we’ll create a query to find the episodes that he has appeared in:
MATCH (actor:Actor)-[:PLAYED_CHARACTER]->(character)<-[:FEATURED_CHARACTER]-(episode)
WHERE actor.name = "Josh Radnor"
RETURN episode.name AS Episode, character.name AS Character
Episode | Character
"Peter's Progress" | "Keith"
"Pilot" | "Ted Mosby"
2 rows
Now let’s go for a similar query, but add the season and show to it as well.
MATCH (actor:Actor)-[:PLAYED_CHARACTER]->(character)<-[:FEATURED_CHARACTER]-(episode),
(episode)<-[:HAS_EPISODE]-(season)<-[:HAS_SEASON]-(tvshow)
WHERE actor.name = "Josh Radnor"
RETURN tvshow.name AS Show, season.name AS Season, episode.name AS Episode,
character.name AS Character
Show | Season | Episode | Character
"ER" | "ER S7" | "Peter's Progress" | "Keith"
"How I Met Your Mother" | "HIMYM Season 1" | "Pilot" | "Ted Mosby"
2 rows
Chapter6.Advanced Data Modeling Examples
The following chapters contain simplified examples of how different domains can be modeled
using Neo4j. The aim is not to give full examples, but to suggest possible ways to think using nodes,
relationships, graph patterns and data locality in traversals.
The examples use Cypher queries extensively; read Part III, “Cypher Query Language” [102] for more information.
6.1. ACL structures in graphs
This example gives a generic overview of an approach to handling Access Control Lists (ACLs) in graphs,
and a simplified example with concrete queries.
Generic approach
In many scenarios, an application needs to handle security on some form of managed objects. This
example describes one pattern to handle this through the use of a graph structure and traversers
that build a full permissions-structure for any managed object with exclude and include overriding
possibilities. This results in a dynamic construction of ACLs based on the position and context of the
managed object.
The result is a complex security scheme that can easily be implemented in a graph structure,
supporting permissions overriding, principal and content composition, without duplicating data
anywhere.
Technique
As seen in the example graph layout, there are some key concepts in this domain model:
• The managed content (folders and files) that are connected by HAS_CHILD_CONTENT relationships.
• The Principal subtree pointing out principals that can act as ACL members, pointed out by the PRINCIPAL relationships.
• The aggregation of principals into groups, connected by the IS_MEMBER_OF relationship. One principal (user or group) can be part of many groups at the same time.
• The SECURITY relationships, connecting the content composite structure to the principal composite structure, containing an addition/removal modifier property ("+RW").
Constructing the ACL
The calculation of the effective permissions (e.g. Read, Write, Execute) for a principal for any given ACL-
managed node (content) follows a number of rules that will be encoded into the permissions-traversal:
Top-down traversal
This approach will let you define a generic permission pattern on the root content, and then refine that
for specific sub-content nodes and specific principals.
1. Start at the content node in question and traverse upwards to the content root node to determine the path to it.
2. Start with an effective optimistic permissions list of "all permitted" (111 in a bit-encoded ReadWriteExecute case), or 000 if you prefer pessimistic security handling (everything is forbidden unless explicitly allowed).
3. Beginning from the topmost content node, look for any SECURITY relationships on it.
4. If found, check whether the principal in question is part of the end-principal of the SECURITY relationship.
5. If yes, add the "+" permission modifiers to the existing permission pattern, and revoke the "-" permission modifiers from the pattern.
6. If two principal nodes link to the same content node, apply the more generic principal's modifiers first.
7. Repeat the security modifier search all the way down to the target content node, thus overriding more generic permissions with those set on nodes closer to the target node.
The same algorithm is applicable for the bottom-up approach, basically just traversing from the target
content node upwards and applying the security modifiers dynamically as the traverser goes up.
Example
Now, getting the resulting access rights for, e.g., "user 1" on "My File.pdf" in a top-down approach on the model in the graph above goes like this:
1. Traveling upward, we start with "Root folder", and set the permissions to 11 initially (only considering Read and Write).
2. There are two SECURITY relationships to that folder. User 1 is contained in both of them, but "root" is more generic, so apply it first, then "All principals": +W +R, leaving 11.
3. "Home" has no SECURITY instructions, so continue.
4. "user1 Home" has SECURITY. First apply "Regular Users" (-R -W), giving 00, then "user 1" (+R +W), giving 11.
5. The target node "My File.pdf" has no SECURITY modifiers on it, so the effective permissions for "User 1" on "My File.pdf" are ReadWrite, 11.
Read-permission example
In this example, we are going to examine a tree structure of directories and files. Also, there are users
that own files and roles that can be assigned to users. Roles can have permissions on directory or files
structures (here we model only canRead, as opposed to full rwx Unix permissions) and be nested. A more
thorough example of modeling ACL structures can be found at How to Build Role-Based Access Control
in SQL1.
1 http://www.xaprb.com/blog/2006/08/16/how-to-build-role-based-access-control-in-sql/
[Graph: 'FileRoot' contains 'etc' (which contains 'init.d') and 'Home'; 'Home' contains 'HomeU1' (with leaf 'File1') and 'HomeU2', which contains 'Desktop' (with leaf 'File2'). 'User1' owns 'File1' and 'User2' owns 'File2'; both users are members of the 'User' role. 'Admin1' and 'Admin2' are members of the 'SUDOers' role, which canRead 'FileRoot'; the roles form a hierarchy via subRole relationships.]
Find all files in the directory structure
In order to find all files contained in this structure, we need a variable length query that follows all
contains relationships and retrieves the nodes at the other end of the leaf relationships.
MATCH ({ name: 'FileRoot' })-[:contains*0..]->(parentDir)-[:leaf]->(file)
RETURN file
resulting in:
file
Node[10]{name:"File1"}
Node[9]{name:"File2"}
2 rows
What files are owned by whom?
If we introduce the concept of ownership of files, we can then ask for the owners of the files we find, connected via owns relationships to the file nodes.
MATCH ({ name: 'FileRoot' })-[:contains*0..]->()-[:leaf]->(file)<-[:owns]-(user)
RETURN file, user
Returning the owners of all files below the FileRoot node.
file | user
Node[10]{name:"File1"} | Node[7]{name:"User1"}
Node[9]{name:"File2"} | Node[6]{name:"User2"}
2 rows
Who has access to a File?
If we now want to check which users have read access to all files, we define our ACL as:
• The root directory has no access granted.
• Any user having a role that has been granted canRead access to one of the parent folders of a file has read access.
In order to find users that can read any part of the parent folder hierarchy above the files, Cypher provides optional variable-length paths.
MATCH (file)<-[:leaf]-()<-[:contains*0..]-(dir)
OPTIONAL MATCH (dir)<-[:canRead]-(role)-[:member]->(readUser)
WHERE file.name =~ 'File.*'
RETURN file.name, dir.name, role.name, readUser.name
This will return the file, and the directory where the user has the canRead permission along with the
user and their role.
file.name | dir.name | role.name | readUser.name
"File2" | "Desktop" | <null> | <null>
"File2" | "HomeU2" | <null> | <null>
"File2" | "Home" | <null> | <null>
"File2" | "FileRoot" | "SUDOers" | "Admin2"
"File2" | "FileRoot" | "SUDOers" | "Admin1"
"File1" | "HomeU1" | <null> | <null>
"File1" | "Home" | <null> | <null>
"File1" | "FileRoot" | "SUDOers" | "Admin2"
"File1" | "FileRoot" | "SUDOers" | "Admin1"
9 rows
The results listed above contain null for optional path segments, which can be mitigated by either asking several queries or returning only the values actually needed.
6.2. Hyperedges
Imagine a user who is a member of different groups. A group can have different roles, and the user can have different roles in different groups, apart from the membership itself. The association of a User, a Group and a Role can be referred to as a HyperEdge. However, it can easily be modeled in a property graph as a node that captures this n-ary relationship, as depicted below in the U1G2R1 node.
Figure6.1.Graph
[Graph: 'User1' has hasRoleInGroup relationships to the HyperEdge nodes 'U1G2R1' and 'U1G1R2'. 'U1G2R1' hasGroup 'Group2' and hasRole 'Role1'; 'U1G1R2' hasGroup 'Group1' and hasRole 'Role2'. The groups are instances (isA) of 'Group', the roles of 'Role', and groups canHave roles.]
Find Groups
To find out what roles a user has in a particular group (here Group2), the following query can traverse this HyperEdge node and provide the answer.
Query
MATCH ({ name: 'User1' })-[:hasRoleInGroup]->(hyperEdge)-[:hasGroup]->({ name: 'Group2' }),
(hyperEdge)-[:hasRole]->(role)
RETURN role.name
The role of User1 is returned:
Result
role.name
"Role1"
1 row
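Outside the database, a HyperEdge node like U1G2R1 is simply a reification of a ternary fact. As an illustrative sketch (not Neo4j code), the same facts can be held as (user, group, role) tuples, and the query above becomes a filter:

```python
# Tuples mirroring the two HyperEdge nodes of the example graph.
has_role_in_group = [
    ("User1", "Group1", "Role2"),  # the U1G1R2 hyperedge
    ("User1", "Group2", "Role1"),  # the U1G2R1 hyperedge
]

def roles_in_group(user, group):
    """All roles `user` has in `group`, i.e. the hasRole ends of the
    hyperedges whose hasGroup end is `group`."""
    return [role for u, g, role in has_role_in_group
            if u == user and g == group]

print(roles_in_group("User1", "Group2"))  # ['Role1']
```

The graph model has the same shape: each tuple corresponds to one HyperEdge node with its three relationships.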
Find all groups and roles for a user
Here, find all groups and the roles a user has, sorted by the name of the role.
Query
MATCH ({ name: 'User1' })-[:hasRoleInGroup]->(hyperEdge)-[:hasGroup]->(group),
(hyperEdge)-[:hasRole]->(role)
RETURN role.name, group.name
ORDER BY role.name ASC
The groups and roles of User1 are returned:
Result
role.name | group.name
"Role1" | "Group2"
"Role2" | "Group1"
2 rows
Find common groups based on shared roles
Assume a more complicated graph:
1. Two user nodes User1, User2.
2. User1 is in Group1, Group2, Group3.
3. User1 has Role1, Role2 in Group1; Role2, Role3 in Group2; Role3, Role4 in Group3 (HyperEdges).
4. User2 is in Group1, Group2, Group3.
5. User2 has Role2, Role5 in Group1; Role3, Role4 in Group2; Role5, Role6 in Group3 (HyperEdges).
The graph for this looks like the following (nodes like U1G2R23 representing the HyperEdges):
Figure6.2.Graph
[Graph: 'User1' has hasRoleInGroup relationships to the HyperEdge nodes 'U1G1R12', 'U1G2R23' and 'U1G3R34', and 'User2' to 'U2G1R25', 'U2G2R34' and 'U2G3R56'. Each HyperEdge node has a hasGroup relationship to its group and hasRole relationships to its two roles.]
To return Group1 and Group2 as User1 and User2 share at least one common role in these two groups, the
query looks like this:
Query
MATCH (u1)-[:hasRoleInGroup]->(hyperEdge1)-[:hasGroup]->(group),(hyperEdge1)-[:hasRole]->(role),
(u2)-[:hasRoleInGroup]->(hyperEdge2)-[:hasGroup]->(group),(hyperEdge2)-[:hasRole]->(role)
WHERE u1.name = 'User1' AND u2.name = 'User2'
RETURN group.name, count(role)
ORDER BY group.name ASC
The groups where User1 and User2 share at least one common role:
Result
group.name | count(role)
"Group1" | 1
"Group2" | 1
2 rows
6.3. Basic friend finding based on social neighborhood
Imagine an example graph like the following one:
Figure6.3.Graph
[Graph: 'Joe' knows 'Bill' and 'Sara'; 'Bill' knows 'Derrick' and 'Ian'; 'Sara' knows 'Ian' and 'Jill'.]
To find out the friends of Joe’s friends that are not already his friends, the query looks like this:
Query
MATCH (joe { name: 'Joe' })-[:knows*2..2]-(friend_of_friend)
WHERE NOT (joe)-[:knows]-(friend_of_friend)
RETURN friend_of_friend.name, COUNT(*)
ORDER BY COUNT(*) DESC , friend_of_friend.name
This returns a list of friends-of-friends ordered by the number of connections to them, and secondly by
their name.
Result
friend_of_friend.name | COUNT(*)
"Ian" | 2
"Derrick" | 1
"Jill" | 1
3 rows
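The same neighborhood computation can be sketched over plain adjacency sets. The knows pairs below are read off the example figure (their exact pairing is partly inferred, but they reproduce the result above); the function name is illustrative.

```python
from collections import Counter
from itertools import chain

# Undirected `knows` relationships from the figure, stored once per pair.
pairs = [("Joe", "Bill"), ("Joe", "Sara"), ("Bill", "Derrick"),
         ("Bill", "Ian"), ("Sara", "Ian"), ("Sara", "Jill")]

knows = {}
for a, b in pairs:
    knows.setdefault(a, set()).add(b)
    knows.setdefault(b, set()).add(a)

def friends_of_friends(person):
    """People exactly two `knows` hops away, counted per connecting path,
    excluding the person and their direct friends."""
    direct = knows[person]
    counts = Counter(chain.from_iterable(knows[f] for f in direct))
    return sorted(((n, c) for n, c in counts.items()
                   if n != person and n not in direct),
                  key=lambda nc: (-nc[1], nc[0]))

print(friends_of_friends("Joe"))  # [('Ian', 2), ('Derrick', 1), ('Jill', 1)]
```

Ian is reachable through both Bill and Sara, hence his count of 2.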
6.4. Co-favorited places
Figure6.4.Graph
[Graph: 'Joe' and 'Jill' each favorite three places; both favorite 'CoffeeShop1' and 'MelsPlace', while 'SaunaX' and 'CoffShop2' are each favorited by one of them. 'CoffeeShop1' and 'MelsPlace' are tagged 'Cosy' and 'Cool', 'CoffeeShop2' is tagged 'Cool', and 'CoffeeShop3' is tagged 'Cosy'.]
Co-favorited places: users who like x also like y
Find places that people who favorite this place also like:
• Determine who has favorited place x.
• What else have they favorited that is not place x?
Query
MATCH (place)<-[:favorite]-(person)-[:favorite]->(stuff)
WHERE place.name = 'CoffeeShop1'
RETURN stuff.name, count(*)
ORDER BY count(*) DESC , stuff.name
The list of places that are favorited by people that favorited the start place.
Result
stuff.name | count(*)
"MelsPlace" | 2
"CoffShop2" | 1
"SaunaX" | 1
3 rows
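The ranking above is a plain co-occurrence count, sketched below outside the database. Which of 'SaunaX' and 'CoffShop2' belongs to which person is an assumption read off the figure; it does not affect the counts. (The figure's 'CoffShop2' spelling is kept as-is.)

```python
from collections import Counter

# Assumed split of the two single-favoriter places between Joe and Jill.
favorites = {
    "Joe":  {"CoffeeShop1", "MelsPlace", "SaunaX"},
    "Jill": {"CoffeeShop1", "MelsPlace", "CoffShop2"},
}

def also_liked(place):
    """For everyone who favorited `place`, count their other favorites,
    ranked by count and then by name."""
    counts = Counter()
    for person, favs in favorites.items():
        if place in favs:
            counts.update(favs - {place})
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

print(also_liked("CoffeeShop1"))
# [('MelsPlace', 2), ('CoffShop2', 1), ('SaunaX', 1)]
```

Both people favorited MelsPlace alongside CoffeeShop1, so it tops the list.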
Co-tagged places: places related through tags
Find places that are tagged with the same tags:
• Determine the tags for place x.
• What else is tagged the same as x, but is not x?
Query
MATCH (place)-[:tagged]->(tag)<-[:tagged]-(otherPlace)
WHERE place.name = 'CoffeeShop1'
RETURN otherPlace.name, collect(tag.name)
ORDER BY length(collect(tag.name)) DESC , otherPlace.name
This query returns other places than CoffeeShop1 which share the same tags; they are ranked by the
number of tags.
Result
otherPlace.name | collect(tag.name)
"MelsPlace" | ["Cosy", "Cool"]
"CoffeeShop2" | ["Cool"]
"CoffeeShop3" | ["Cosy"]
3 rows
6.5. Find people based on similar favorites
Figure6.5.Graph
[Graph: 'Joe' favorites 'Bikes' and 'Cats' and is a friend of 'Sara'; 'Sara' favorites 'Bikes' and 'Cats'; 'Derrick' favorites 'Bikes' and 'Cats'; 'Jill' favorites 'Bikes'.]
To find possible new friends based on their liking similar things to the asking person, use a query like this:
Query
MATCH (me { name: 'Joe' })-[:favorite]->(stuff)<-[:favorite]-(person)
WHERE NOT (me)-[:friend]-(person)
RETURN person.name, count(stuff)
ORDER BY count(stuff) DESC
The query returns a list of people who are not yet friends, ranked by the number of shared favorites.
Result
person.name | count(stuff)
"Derrick" | 2
"Jill" | 1
2 rows
6.6. Find people based on mutual friends and groups
Figure6.6.Graph
[Graph: 'Bill' (Node[0]), 'Bob' (Node[2]), 'Jill' (Node[3]) and 'Joe' (Node[4]) are all member_of_group 'Group1' (Node[1]); 'Jill' and 'Joe' each have a knows relationship to 'Bill'.]
In this scenario, the problem is to determine the mutual friends and groups, if any, between persons. If no mutual groups or friends are found, a 0 should be returned.
Query
MATCH (me { name: 'Joe' }),(other)
WHERE other.name IN ['Jill', 'Bob']
OPTIONAL MATCH pGroups=(me)-[:member_of_group]->(mg)<-[:member_of_group]-(other)
OPTIONAL MATCH pMutualFriends=(me)-[:knows]->(mf)<-[:knows]-(other)
RETURN other.name AS name, count(DISTINCT pGroups) AS mutualGroups,
count(DISTINCT pMutualFriends) AS mutualFriends
ORDER BY mutualFriends DESC
The question we are asking is: how many unique paths exist between me and Jill, the paths being common group memberships and common friends? If the paths were mandatory, no results would be returned when me and Bob lack any common friends, and we don’t want that. To make a path optional, you have to make at least one of its relationships optional; that makes the whole path optional.
Result
name | mutualGroups | mutualFriends
"Jill" | 1 | 1
"Bob" | 1 | 0
2 rows
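The zero-default behaviour of the optional paths can be sketched with set intersections, where an empty intersection simply counts as 0. The membership and knows maps below are illustrative stand-ins mirroring the figure, and the function name is invented.

```python
# Group memberships and directed `knows` targets from the figure.
member_of_group = {"Joe": {"Group1"}, "Jill": {"Group1"}, "Bob": {"Group1"}}
knows = {"Joe": {"Bill"}, "Jill": {"Bill"}, "Bob": set()}

def mutual(me, other):
    # An empty intersection yields len(...) == 0, the "optional" default.
    return {"name": other,
            "mutualGroups": len(member_of_group[me] & member_of_group[other]),
            "mutualFriends": len(knows[me] & knows[other])}

print(mutual("Joe", "Jill"))  # {'name': 'Jill', 'mutualGroups': 1, 'mutualFriends': 1}
print(mutual("Joe", "Bob"))   # {'name': 'Bob', 'mutualGroups': 1, 'mutualFriends': 0}
```

Note the Cypher query counts distinct paths rather than distinct nodes; with one shared group and at most one shared friend per pair, counting intersection members gives the same numbers here.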
6.7. Find friends based on similar tagging
Figure6.7.Graph
[Graph: the favorited items 'Surfing', 'Horses', 'Bikes' and 'Cats' are tagged with 'Animals' and/or 'Hobby'. 'Joe' favorites several of them, and 'Sara' and 'Derrick' favorite items sharing tags with Joe's favorites.]
To find people similar to me based on the tagging of their favorited items, one approach could be:
• Determine the tags associated with what I favorite.
• What else is tagged with those tags?
• Who favorites items tagged with the same tags?
• Sort the result by how many of the same things these people like.
Query
MATCH
(me)-[:favorite]->(myFavorites)-[:tagged]->(tag)<-[:tagged]-(theirFavorites)<-[:favorite]-(people)
WHERE me.name = 'Joe' AND NOT me=people
RETURN people.name AS name, count(*) AS similar_favs
ORDER BY similar_favs DESC
The query returns a list of people who like similar stuff, ranked by the number of shared taggings.
Result
name | similar_favs
"Sara" | 2
"Derrick" | 1
2 rows
6.8. Multirelational (social) graphs
Figure6.8.Graph
[Graph: 'Joe', 'Sara', 'Ben' and 'Maria' FOLLOWS and LOVES each other in various combinations, and LIKES things such as 'cats', 'nature', 'cars' and 'bikes'.]
This example shows a multi-relational network between persons and things they like. A multi-relational
graph is a graph with more than one kind of relationship between nodes.
Query
MATCH (me { name: 'Joe' })-[r1:FOLLOWS|:LOVES]->(other)-[r2]->(me)
WHERE type(r1)=type(r2)
RETURN other.name, type(r1)
The query returns people that Joe FOLLOWS or LOVES and who FOLLOW or LOVE Joe back with the same relationship type.
Result
other.name type(r1)
"Maria" "FOLLOWS"
"Maria" "LOVES"
"Sara" "FOLLOWS"
3 rows
6.9. Implementing newsfeeds in a graph
[Figure: Alice, Bob and Joe connected by FRIEND relationships (status CONFIRMED or PENDING), each with a STATUS chain of dated status updates linked by NEXT.]
Implementing a newsfeed or timeline feature is a frequent requirement for social applications. The
following examples are inspired by Newsfeed feature powered by Neo4j Graph Database2. The question
asked here is:
Starting at me, retrieve the time-ordered status feed of the status updates of me and all friends that
are connected via a CONFIRMED FRIEND relationship to me.
Query
MATCH (me { name: 'Joe' })-[rels:FRIEND*0..1]-(myfriend)
WHERE ALL (r IN rels WHERE r.status = 'CONFIRMED')
WITH myfriend
MATCH (myfriend)-[:STATUS]-(latestupdate)-[:NEXT*0..1]-(statusupdates)
RETURN myfriend.name AS name, statusupdates.date AS date, statusupdates.text AS text
ORDER BY statusupdates.date DESC LIMIT 3
To understand the strategy, let’s divide the query into five steps:
1. First, get the list of all my friends (along with me) through the FRIEND relationship (MATCH (me {name:
'Joe'})-[rels:FRIEND*0..1]-(myfriend)). Also, a WHERE predicate can be added to check whether the
friend request is pending or confirmed.
2 https://web.archive.org/web/20121102191919/http://techfin.in/2012/10/newsfeed-feature-powered-by-neo4j-graph-database/
2. Get the latest status update of my friends through Status relationship (MATCH (myfriend)-[:STATUS]-
(latestupdate)).
3. Get subsequent status updates (along with the latest one) of my friends through NEXT relationships
(MATCH (myfriend)-[:STATUS]-(latestupdate)-[:NEXT*0..1]-(statusupdates)) which will give you the
latest and one additional statusupdate; adjust 0..1 to whatever suits your case.
4. Sort the status updates by posted date (ORDER BY statusupdates.date DESC).
5. LIMIT the number of updates you need in every query (LIMIT 3).
Result
name date text
"Joe" 6 "Joe status2"
"Bob" 4 "bobs status2"
"Joe" 3 "Joe status1"
3 rows
Here, the example shows how to add a new status update into the existing data for a user.
Query
MATCH (me)
WHERE me.name='Bob'
OPTIONAL MATCH (me)-[r:STATUS]-(secondlatestupdate)
DELETE r
CREATE (me)-[:STATUS]->(latest_update { text:'Status',date:123 })
WITH latest_update, collect(secondlatestupdate) AS seconds
FOREACH (x IN seconds | CREATE (latest_update)-[:NEXT]->(x))
RETURN latest_update.text AS new_status
Dividing the query into steps, this query resembles adding a new item at the head of a linked list:
1. Get the latest update (if it exists) of the user through the STATUS relationship (OPTIONAL MATCH (me)-
[r:STATUS]-(secondlatestupdate)).
2. Delete the STATUS relationship between user and secondlatestupdate (if it exists), as this would
become the second latest update now and only the latest update would be added through a STATUS
relationship; all earlier updates would be connected to their subsequent updates through a NEXT
relationship. (DELETE r).
3. Now, create the new statusupdate node (with text and date as properties) and connect
this with the user through a STATUS relationship (CREATE (me)-[:STATUS]->(latest_update
{ text:'Status',date:123 })).
4. Pipe over statusupdate or an empty collection to the next query part (WITH latest_update,
collect(secondlatestupdate) AS seconds).
5. Now, create a NEXT relationship between the latest status update and the second latest status update
(if it exists) (FOREACH(x in seconds | CREATE (latest_update)-[:NEXT]->(x))).
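The pointer manipulation in steps 1-5 can be sketched as an in-memory singly linked list (the class and function names below are hypothetical, for illustration only):

```python
class Status:
    """One status update; `next` plays the role of the NEXT relationship."""
    def __init__(self, text, date, next=None):
        self.text, self.date, self.next = text, date, next

def add_status(latest, text, date):
    # The new node takes over the head slot (the STATUS relationship); the old
    # head is re-attached behind it, just as the FOREACH re-links the
    # zero-or-one collected second-latest update via NEXT.
    return Status(text, date, next=latest)

head = Status("bobs status1", 1)        # Bob's existing chain
head = add_status(head, "Status", 123)  # the update performed by the query above
print(head.text, "->", head.next.text)  # Status -> bobs status1
```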
Result
new_status
"Status"
1 row
Nodes created: 1
Relationships created: 2
Properties set: 2
Relationships deleted: 1
[Figure: Bob with his status updates bob_s1 and bob_s2, linked by STATUS and NEXT relationships.]
6.10. Boosting recommendation results
Figure 6.9. Graph
[Figure: Clark Kent, Lois Lane, Jimmy Olsen, Perry White and Anderson Cooper connected by KNOWS relationships (weight 4) and WORKS_AT relationships (weight 2, with an activity property) to the Daily Planet and CNN.]
This query finds recommended friends for the origin: people who work at the same place as the
origin, or who know a person the origin knows; the origin should not already know the target. The
recommendation is weighted by the weight of the relationship r2, and boosted with a factor of 2 if
there is an activity property on that relationship.
Query
MATCH (origin)-[r1:KNOWS|WORKS_AT]-(c)-[r2:KNOWS|WORKS_AT]-(candidate)
WHERE origin.name = "Clark Kent" AND type(r1)=type(r2) AND NOT (origin)-[:KNOWS]-(candidate)
RETURN origin.name AS origin, candidate.name AS candidate, SUM(ROUND(r2.weight
+(COALESCE(r2.activity,
0)* 2))) AS boost
ORDER BY boost DESC LIMIT 10
This returns the recommended friends for the origin nodes and their recommendation score.
Result
origin candidate boost
"Clark Kent" "Perry White" 22. 0
"Clark Kent" "Anderson Cooper" 4. 0
2 rows
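The boost arithmetic can be traced directly from the formula in the query. The specific weight and activity values below are illustrative assumptions chosen to reproduce the result table; the figure is the authoritative source for the actual data:

```python
def boost(weight, activity=None):
    # ROUND(r2.weight + COALESCE(r2.activity, 0) * 2), traced for one
    # relationship r2; the query SUMs this over all matching paths.
    return round(weight + (activity if activity is not None else 0) * 2)

print(boost(2, 10))  # a WORKS_AT relationship: weight 2, activity 10 -> 22
print(boost(4))      # a KNOWS relationship: weight 4, no activity    -> 4
```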
6.11. Calculating the clustering coefficient of a network
Figure 6.10. Graph
[Figure: the startnode and its 2-hop neighborhood of KNOWS relationships.]
In this example, adapted from Niko Gamulin's blog post on Neo4j for Social Network Analysis3,
the graph in question shows the 2-hop relationships of a sample person as nodes with KNOWS
relationships.
The clustering coefficient4 of a selected node is defined as the probability that two randomly selected
neighbors are connected to each other. With the number of neighbors as n and the number of mutual
connections between the neighbors as r, the calculation is C = r / (n(n-1)/2).
The number of possible connections between two neighbors is n!/(2!(n-2)!) = 4!/(2!(4-2)!) = 24/4 =
6, where the number of neighbors is n = 4, and the actual number of connections is r = 1. Therefore the
clustering coefficient of the selected node is 1/6.
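The arithmetic can be checked quickly; the values n = 4 and r = 1 come from the example above:

```python
from math import comb

n = 4  # number of neighbors of the selected node
r = 1  # actual connections between those neighbors

possible = comb(n, 2)       # n! / (2! (n-2)!) possible neighbor pairs
coefficient = r / possible  # the clustering coefficient, here 1/6

print(possible)  # 6
print(coefficient)
```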
n and r are quite simple to retrieve via the following query:
Query
MATCH (a { name: "startnode" })--(b)
WITH a, count(DISTINCT b) AS n
MATCH (a)--()-[r]-()--(a)
RETURN n, count(DISTINCT r) AS r
This returns n and r for the above calculations.
Result
n r
4 1
1 row
3 http://mypetprojects.blogspot.se/2012/06/social-network-analysis-with-neo4j.html
4 http://en.wikipedia.org/wiki/Clustering_coefficient
6.12. Pretty graphs
This section shows how to create some of the named pretty graphs on Wikipedia5.
Star graph
The graph is created by first creating a center node, and then, once per element in the range, creating a
leaf node and connecting it to the center.
Query
CREATE (center)
FOREACH (x IN range(1,6)| CREATE (leaf),(center)-[:X]->(leaf))
RETURN id(center) AS id;
The query returns the id of the center node.
Result
id
0
1 row
Nodes created: 7
Relationships created: 6
Figure6.11.Graph
X
X
XX
X
X
Wheel graph
This graph is created in a number of steps:
Create a center node.
Once per element in the range, create a leaf and connect it to the center.
Connect neighboring leaves.
Find the minimum and maximum leaf and connect these.
Return the id of the center node.
Query
CREATE (center)
FOREACH (x IN range(1,6)| CREATE (leaf { count:x }),(center)-[:X]->(leaf))
WITH center
MATCH (large_leaf)<--(center)-->(small_leaf)
WHERE large_leaf.count = small_leaf.count + 1
CREATE (small_leaf)-[:X]->(large_leaf)
WITH center, min(small_leaf.count) AS min, max(large_leaf.count) AS max
MATCH (first_leaf)<--(center)-->(last_leaf)
WHERE first_leaf.count = min AND last_leaf.count = max
CREATE (last_leaf)-[:X]->(first_leaf)
RETURN id(center) AS id
5 http://en.wikipedia.org/wiki/Gallery_of_named_graphs
The query returns the id of the center node.
Result
id
0
1 row
Nodes created: 7
Relationships created: 12
Properties set: 6
Figure6.12.Graph
count = 6
X
count = 5
X
count = 4
X
count = 3
X
count = 2 X
count = 1
X
X
X
X
X
X
X
Complete graph
To create this graph, we first create 6 nodes and label them with the Leaf label. We then match all the
unique pairs of nodes, and create a relationship between them.
Query
FOREACH (x IN range(1,6)| CREATE (leaf:Leaf { count : x }))
WITH *
MATCH (leaf1:Leaf),(leaf2:Leaf)
WHERE id(leaf1)< id(leaf2)
CREATE (leaf1)-[:X]->(leaf2);
Nothing is returned by this query.
Result
(empty result)
Nodes created: 6
Relationships created: 15
Properties set: 6
Labels added: 6
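The id(leaf1) < id(leaf2) predicate yields each unordered pair of nodes exactly once; a quick check of the expected relationship count (a sketch, not Neo4j code):

```python
from itertools import combinations

leaves = range(1, 7)  # six Leaf nodes
# Each unordered pair appears exactly once, mirroring id(leaf1) < id(leaf2)
pairs = list(combinations(leaves, 2))
print(len(pairs))  # 15, matching "Relationships created: 15"
```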
Figure6.13.Graph
Leaf
count = 1
Leaf
count = 6
X
Leaf
count = 5
X
Leaf
count = 4
X
Leaf
count = 3
X
Leaf
count = 2
X
X
X
X X
X
X
X
X
XX
Friendship graph
This query first creates a center node, and then, once per element in the range, creates a cycle graph
and connects it to the center.
Query
CREATE (center)
FOREACH (x IN range(1,3)| CREATE (leaf1),(leaf2),(center)-[:X]->(leaf1),(center)-[:X]->(leaf2),
(leaf1)-[:X]->(leaf2))
RETURN ID(center) AS id
The id of the center node is returned by the query.
Result
id
0
1 row
Nodes created: 7
Relationships created: 9
Figure6.14.Graph
X
X
X
X
X
X
X
X
X
6.13. A multilevel indexing structure (path tree)
In this example, a multi-level tree structure is used to index event nodes (here Event1, Event2 and Event3),
in this case with a YEAR-MONTH-DAY granularity, making this a timeline indexing structure. However,
this approach should work for a wide range of multi-level ranges.
The structure follows a couple of rules:
Events can be indexed multiple times by connecting the indexing structure leaves with the events via a
VALUE relationship.
The querying is done in a path-range fashion. That is, the start and end paths from the indexing root
to the start and end leaves in the tree are calculated.
Using Cypher, the queries following different strategies can be expressed as path sections and put
together using one single query.
The graph below depicts a structure with 3 Events being attached to an index structure at different
leaves.
Figure6.15.Graph
Root
Year 2010
2010
Year 2011
2011
Month 12
12
Month 01
01
Day 31
31
Day 01
01
Day 02
02
Day 03
03
NEXT
Event1
VALUE
Event2
VALUE
NEXT
VALUE
NEXT
Event3
VALUE
Return zero range
Here, only the events indexed under one leaf (2010-12-31) are returned. The query only needs one path
segment rootPath (color Green) through the index.
Figure6.16.Graph
Root
Year 2010
2010
Year 2011
2011
Month 12
12
Month 01
01
Day 31
31
Day 01
01
Day 02
02
Day 03
03
NEXT
Event1
VALUE
Event2
VALUE
NEXT
VALUE
NEXT
Event3
VALUE
Query
MATCH rootPath=(root)-[:`2010`]->()-[:`12`]->()-[:`31`]->(leaf),(leaf)-[:VALUE]->(event)
WHERE root.name = 'Root'
RETURN event.name
ORDER BY event.name ASC
Returning all events on the date 2010-12-31, in this case Event1 and Event2.
Result
event.name
"Event1"
"Event2"
2 rows
Return the full range
In this case, the range goes from the first to the last leaf of the index tree. Here, startPath (color
Greenyellow) and endPath (color Green) span the range, valuePath (color Blue) then connects the
leaves, and the values can be read from the middle node, hanging off the values (color Red) path.
Figure6.17.Graph
Root
Year 2010
2010
Year 2011
2011
Month 12
12
Month 01
01
Day 31
31
Day 01
01
Day 02
02
Day 03
03
NEXT
Event1
VALUE
Event2
VALUE
NEXT
VALUE
NEXT
Event3
VALUE
Query
MATCH startPath=(root)-[:`2010`]->()-[:`12`]->()-[:`31`]->(startLeaf),
endPath=(root)-[:`2011`]->()-[:`01`]->()-[:`03`]->(endLeaf),
valuePath=(startLeaf)-[:NEXT*0..]->(middle)-[:NEXT*0..]->(endLeaf),
vals=(middle)-[:VALUE]->(event)
WHERE root.name = 'Root'
RETURN event.name
ORDER BY event.name ASC
Returning all events between 2010-12-31 and 2011-01-03, in this case all events.
Result
event.name
"Event1"
"Event2"
"Event2"
"Event3"
4 rows
Return partly shared path ranges
Here, the query range results in partly shared paths when querying the index, making the introduction
of a common path segment commonPath (color Black) necessary before spanning up startPath (color
Greenyellow) and endPath (color Darkgreen). After that, valuePath (color Blue) connects the leaves and
the indexed values are read off the values (color Red) path.
Figure6.18.Graph
Root
Year 2010
2010
Year 2011
2011
Month 12
12
Month 01
01
Day 31
31
Day 01
01
Day 02
02
Day 03
03
NEXT
Event1
VALUE
Event2
VALUE
NEXT
VALUE
NEXT
Event3
VALUE
Query
MATCH commonPath=(root)-[:`2011`]->()-[:`01`]->(commonRootEnd),
startPath=(commonRootEnd)-[:`01`]->(startLeaf), endPath=(commonRootEnd)-[:`03`]->(endLeaf),
valuePath=(startLeaf)-[:NEXT*0..]->(middle)-[:NEXT*0..]->(endLeaf),
vals=(middle)-[:VALUE]->(event)
WHERE root.name = 'Root'
RETURN event.name
ORDER BY event.name ASC
Returning all events between 2011-01-01 and 2011-01-03, in this case Event2 and Event3.
Result
event.name
"Event2"
"Event3"
2 rows
6.14. Complex similarity computations
Calculate similarities by complex calculations
Here, a similarity between two players in a game is calculated by the number of times they have eaten
the same food.
Query
MATCH (me { name: 'me' })-[r1:ATE]->(food)<-[r2:ATE]-(you)
WITH me,count(DISTINCT r1) AS H1,count(DISTINCT r2) AS H2,you
MATCH (me)-[r1:ATE]->(food)<-[r2:ATE]-(you)
RETURN sum((1-ABS(r1.times/H1-r2.times/H2))*(r1.times+r2.times)/(H1+H2)) AS similarity
This returns the similarity measure for the two players.
Result
similarity
-30.0
1 row
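The result can be traced by hand with the values from the figure: each player has a single ATE relationship to 'meat' (so H1 = H2 = 1), with times = 10 for me and times = 5 for you:

```python
# count(DISTINCT r1) and count(DISTINCT r2): one ATE relationship each
H1, H2 = 1, 1
r1_times, r2_times = 10, 5  # r1.times and r2.times from the graph

# The RETURN expression from the query, for the single matched food
similarity = (1 - abs(r1_times / H1 - r2_times / H2)) * (r1_times + r2_times) / (H1 + H2)
print(similarity)  # -30.0
```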
Figure6.19.Graph
nam e = 'me'
nam e = 'meat'
ATE
tim es = 10
nam e = 'you'
ATE
tim es = 5
6.15. The Graphity activity stream model
Find Activity Streams in a network without scaling penalty
This is an approach for scaling the retrieval of activity streams in a friend graph, put forward by Rene
Pickhardt as Graphity6. In short, a linked list is created for every person's friends, in the order that the last
activities of these friends have occurred. When new activities occur for a friend, all the ordered friend
lists that this friend is part of are reordered, transferring computing load to the time of new event
updates instead of activity stream reads.
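The reordering idea can be sketched in a few lines (a hypothetical in-memory model, not Neo4j API code): every person keeps a friend list ordered by the friends' most recent activity, and a new activity moves that friend to the front of every list it appears in.

```python
def on_new_activity(friend_lists, friend):
    # Pay the reordering cost at write time: move the newly active friend
    # to the front of every ordered friend list that contains them.
    for order in friend_lists:
        if friend in order:
            order.remove(friend)
            order.insert(0, friend)

janes_friends = ["Bill", "Joe", "Bob"]  # most recently active first
bobs_friends = ["Ted", "Bill"]

on_new_activity([janes_friends, bobs_friends], "Bob")
print(janes_friends)  # ['Bob', 'Bill', 'Joe']
print(bobs_friends)   # unchanged: Bob is not in this list
```

Reading a stream then only requires walking the already ordered lists, which is what the query below does.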
Tip
This approach of course makes excessive use of relationship types. This needs to be
taken into consideration when designing a production system with this approach. See
Section17.5, “Capacity” [284] for the maximum number of relationship types.
To find the activity stream for a person, just follow the linked list of the friend list, and retrieve the
needed number of activities from the respective activity lists of the friends.
Query
MATCH p=(me { name: 'Jane' })-[:jane_knows*]->(friend),(friend)-[:has]->(status)
RETURN me.name, friend.name, status.name, length(p)
ORDER BY length(p)
This returns the activity stream for Jane.
Result
me.name friend.name status.name length(p)
"Jane" "Bill" "Bill_s1" 1
"Jane" "Joe" "Joe_s1" 2
"Jane" "Bob" "Bob_s1" 3
3 rows
6 http://www.rene-pickhardt.de/graphity-an-efficient-graph-model-for-retrieving-the-top-k-news-feeds-for-users-in-social-
networks/
Figure6.20.Graph
nam e = 'Bill'
nam e = 'Joe'
jane_knows
nam e = 'Bill_s1'
has
nam e = 'Joe_s1'
has
nam e = 'Bob'
jane_knows
nam e = 'Bill_s2'
next
nam e = 'Ted_s1'
nam e = 'Ted_s2'
next
nam e = 'Jane'
jane_knows
nam e = 'Joe_s2'
next
nam e = 'Bob_s1'
has
nam e = 'Ted'
bob_knows
bob_knows
has
6.16. User roles in graphs
This is an example showing a hierarchy of roles. What’s interesting is that a tree is not sufficient for
storing this kind of structure, as elaborated below.
This is an implementation of an example found in the article A Model to Represent Directed Acyclic
Graphs (DAG) on SQL Databases7 by Kemal Erdogan8. The article discusses how to store directed
acyclic graphs9 (DAGs) in SQL based DBs. DAGs are almost trees, but with a twist: it may be possible to
reach the same node through different paths. Trees are restricted from this possibility, which makes
them much easier to handle. In our case it is “Ali” and “Engin”, as they are both admins and users and
thus reachable through these group nodes. Reality often looks this way and can’t be captured by tree
structures.
In the article an SQL Stored Procedure solution is provided. The main idea, which also has some support
from researchers, is to pre-calculate all possible (transitive) paths. Pros and cons of this approach:
decent performance on read
low performance on insert
wastes lots of space
relies on stored procedures
In Neo4j storing the roles is trivial. In this case we use PART_OF (green edges) relationships to model the
group hierarchy and MEMBER_OF (blue edges) to model membership in groups. We also connect the top
level groups to the reference node by ROOT relationships. This gives us a useful partitioning of the graph.
Neo4j has no predefined relationship types; you are free to create any relationship types and give them
the semantics you want.
Let's now have a look at how to retrieve information from the graph. The queries are done using
Cypher, while the Java code uses the Neo4j Traversal API (see Section 34.2, “Traversal Framework Java
API” [615], which is part of Part VII, “Advanced Usage” [561]).
Get the admins
In Cypher, we could get the admins like this:
7 http://www.codeproject.com/Articles/22824/A-Model-to-Represent-Directed-Acyclic-Graphs-DAG-o
8 http://www.codeproject.com/script/Articles/MemberArticles.aspx?amid=274518
9 http://en.wikipedia.org/wiki/Directed_acyclic_graph
MATCH ({ name: 'Admins' })<-[:PART_OF*0..]-(group)<-[:MEMBER_OF]-(user)
RETURN user.name, group.name
resulting in:
user.name group.name
"Ali" "Admins"
"Demet" "HelpDesk"
"Engin" "HelpDesk"
3 rows
And here's the code when using the Java Traversal API:
Node admins = getNodeByName( "Admins" );
TraversalDescription traversalDescription = db.traversalDescription()
.breadthFirst()
.evaluator( Evaluators.excludeStartPosition() )
.relationships( RoleRels.PART_OF, Direction.INCOMING )
.relationships( RoleRels.MEMBER_OF, Direction.INCOMING );
Traverser traverser = traversalDescription.traverse( admins );
resulting in the output
Found: Ali at depth: 0
Found: HelpDesk at depth: 0
Found: Demet at depth: 1
Found: Engin at depth: 1
The result is collected from the traverser using this code:
String output = "";
for ( Path path : traverser )
{
Node node = path.endNode();
output += "Found: " + node.getProperty( NAME ) + " at depth: "
+ ( path.length() - 1 ) + "\n";
}
Get the group memberships of a user
In Cypher:
MATCH ({ name: 'Jale' })-[:MEMBER_OF]->()-[:PART_OF*0..]->(group)
RETURN group.name
group.name
"ABCTechnicians"
"Technicians"
"Users"
3 rows
Using the Neo4j Java Traversal API, this query looks like:
Node jale = getNodeByName( "Jale" );
traversalDescription = db.traversalDescription()
.depthFirst()
.evaluator( Evaluators.excludeStartPosition() )
.relationships( RoleRels.MEMBER_OF, Direction.OUTGOING )
.relationships( RoleRels.PART_OF, Direction.OUTGOING );
traverser = traversalDescription.traverse( jale );
resulting in:
Found: ABCTechnicians at depth: 0
Found: Technicians at depth: 1
Found: Users at depth: 2
Get all groups
In Cypher:
MATCH ({ name: 'Reference_Node' })<-[:ROOT]->()<-[:PART_OF*0..]-(group)
RETURN group.name
group.name
"Users"
"Managers"
"Technicians"
"ABCTechnicians"
"Admins"
"HelpDesk"
6 rows
In Java:
Node referenceNode = getNodeByName( "Reference_Node") ;
traversalDescription = db.traversalDescription()
.breadthFirst()
.evaluator( Evaluators.excludeStartPosition() )
.relationships( RoleRels.ROOT, Direction.INCOMING )
.relationships( RoleRels.PART_OF, Direction.INCOMING );
traverser = traversalDescription.traverse( referenceNode );
resulting in:
Found: Users at depth: 0
Found: Admins at depth: 0
Found: Technicians at depth: 1
Found: Managers at depth: 1
Found: HelpDesk at depth: 1
Found: ABCTechnicians at depth: 2
Get all members of all groups
Now, let’s try to find all users in the system being part of any group.
In Cypher, this looks like:
MATCH ({ name: 'Reference_Node' })<-[:ROOT]->(root), p=(root)<-[PART_OF*0..]-()<-[:MEMBER_OF]-(user)
RETURN user.name, min(length(p))
ORDER BY min(length(p)), user.name
and results in the following output:
user.name min(length(p))
"Ali" 1
"Burcu" 1
"Can" 1
"Engin" 1
"Demet" 2
"Fuat" 2
"Gul" 2
"Hakan" 2
"Irmak" 2
"Jale" 3
10 rows
in Java:
traversalDescription = db.traversalDescription()
.breadthFirst()
.evaluator(
Evaluators.includeWhereLastRelationshipTypeIs( RoleRels.MEMBER_OF ) );
traverser = traversalDescription.traverse( referenceNode );
Found: Can at depth: 1
Found: Burcu at depth: 1
Found: Engin at depth: 1
Found: Ali at depth: 1
Found: Irmak at depth: 2
Found: Hakan at depth: 2
Found: Fuat at depth: 2
Found: Gul at depth: 2
Found: Demet at depth: 2
Found: Jale at depth: 3
As seen above, querying even more complex scenarios can be done using comparatively short
constructs in Cypher or Java.
Chapter7.Languages
Please see http://neo4j.com/developer/language-guides/ for the current set of drivers!
There's an included Java example which shows a “low-level” approach to using the Neo4j REST API from
Java.
Languages
97
7.1. How to use the REST API from Java
Creating a graph through the REST API from Java
The REST API uses HTTP and JSON, so that it can be used from many languages and platforms. Still,
when getting started it's useful to see some patterns that can be re-used. In this brief overview, we'll
show you how to create and manipulate a simple graph through the REST API and also how to query it.
For these examples, we’ve chosen the Jersey1 client components, which are easily downloaded2 via
Maven.
Start the server
Before we can perform any actions on the server, we need to start it as per Section23.2, “Server
Installation” [439]. Next up, we’ll check the connection to the server:
WebResource resource = Client.create()
.resource( SERVER_ROOT_URI );
ClientResponse response = resource.get( ClientResponse.class );
System.out.println( String.format( "GET on [%s], status code [%d]",
SERVER_ROOT_URI, response.getStatus() ) );
response.close();
If the status of the response is 200 OK, then we know the server is running fine and we can continue. If
the code fails to connect to the server, then please have a look at PartV, “Operations” [435].
Note
If you get any other response than 200 OK (particularly 4xx or 5xx responses) then please
check your configuration and look in the log files in the data/log directory.
Sending Cypher
Using the REST API, we can send Cypher queries to the server. This is the main way to use Neo4j. It
allows control of the transactional boundaries as needed.
Let’s try to use this to list all the nodes in the database which have a name property.
final String txUri = SERVER_ROOT_URI + "transaction/commit";
WebResource resource = Client.create().resource( txUri );
String payload = "{\"statements\" : [ {\"statement\" : \"" +query + "\"} ]}";
ClientResponse response = resource
.accept( MediaType.APPLICATION_JSON )
.type( MediaType.APPLICATION_JSON )
.entity( payload )
.post( ClientResponse.class );
System.out.println( String.format(
"POST [%s] to [%s], status code [%d], returned data: "
+ System.lineSeparator() + "%s",
payload, txUri, response.getStatus(),
response.getEntity( String.class ) ) );
response.close();
For more details, see Section21.1, “Transactional Cypher HTTP endpoint” [298].
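The same payload can be assembled and POSTed from any language; here is a Python sketch. The endpoint URI assumes the default server location, and the Cypher statement is only an illustration:

```python
import json

# Default transactional endpoint location; adjust host and port to your setup.
TX_URI = "http://localhost:7474/db/data/transaction/commit"

# Build the statements payload; the query itself is just an example.
payload = json.dumps({
    "statements": [
        {"statement": "MATCH (n) WHERE exists(n.name) RETURN n.name"}
    ]
})
print(payload)

# Sending it requires a running server, e.g.:
# from urllib.request import Request, urlopen
# req = Request(TX_URI, data=payload.encode(), headers={
#     "Accept": "application/json", "Content-Type": "application/json"})
# print(urlopen(req).read().decode())
```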
Fine-grained REST API calls
For exploratory and special purposes, there is a fine grained REST API, see Chapter21, REST API [297].
The following sections highlight some of the basic operations.
1 http://jersey.java.net/
2 https://jersey.java.net/nonav/documentation/1.9/user-guide.html#chapter_deps
Creating a node
The REST API uses POST to create nodes. Encapsulating that in Java is straightforward using the Jersey
client:
final String nodeEntryPointUri = SERVER_ROOT_URI + "node";
// http://localhost:7474/db/data/node
WebResource resource = Client.create()
.resource( nodeEntryPointUri );
// POST {} to the node entry point URI
ClientResponse response = resource.accept( MediaType.APPLICATION_JSON )
.type( MediaType.APPLICATION_JSON )
.entity( "{}" )
.post( ClientResponse.class );
final URI location = response.getLocation();
System.out.println( String.format(
"POST to [%s], status code [%d], location header [%s]",
nodeEntryPointUri, response.getStatus(), location.toString() ) );
response.close();
return location;
If the call completes successfully, under the covers it will have sent an HTTP request containing a JSON
payload to the server. The server will then have created a new node in the database and responded
with a 201 Created response and a Location header with the URI of the newly created node.
In our example, we call this functionality twice to create two nodes in our database.
Adding properties
Once we have nodes in our database, we can use them to store useful data. In this case, we're going
to store information about music in our database. Let’s start by looking at the code that we use to
create nodes and add properties. Here we’ve added nodes to represent "Joe Strummer" and a band
called "The Clash".
URI firstNode = createNode();
addProperty( firstNode, "name", "Joe Strummer" );
URI secondNode = createNode();
addProperty( secondNode, "band", "The Clash" );
Inside the addProperty method we determine the resource that represents properties for the node and
decide on a name for that property. We then proceed to PUT the value of that property to the server.
String propertyUri = nodeUri.toString() + "/properties/" + propertyName;
// http://localhost:7474/db/data/node/{node_id}/properties/{property_name}
WebResource resource = Client.create()
.resource( propertyUri );
ClientResponse response = resource.accept( MediaType.APPLICATION_JSON )
.type( MediaType.APPLICATION_JSON )
.entity( "\"" + propertyValue + "\"" )
.put( ClientResponse.class );
System.out.println( String.format( "PUT to [%s], status code [%d]",
propertyUri, response.getStatus() ) );
response.close();
If everything goes well, we’ll get a 204 No Content back indicating that the server processed the request
but didn’t echo back the property value.
Adding relationships
Now that we have nodes to represent Joe Strummer and The Clash, we can relate them. The REST
API supports this through a POST of a relationship representation to the start node of the relationship.
Correspondingly in Java we POST some JSON to the URI of our node that represents Joe Strummer, to
establish a relationship between that node and the node representing The Clash.
URI relationshipUri = addRelationship( firstNode, secondNode, "singer",
"{ \"from\" : \"1976\", \"until\" : \"1986\" }" );
Inside the addRelationship method, we determine the URI of the Joe Strummer node’s relationships,
and then POST a JSON description of our intended relationship. This description contains the destination
node, a label for the relationship type, and any attributes for the relation as a JSON collection.
private static URI addRelationship( URI startNode, URI endNode,
String relationshipType, String jsonAttributes )
throws URISyntaxException
{
URI fromUri = new URI( startNode.toString() + "/relationships" );
String relationshipJson = generateJsonRelationship( endNode,
relationshipType, jsonAttributes );
WebResource resource = Client.create()
.resource( fromUri );
// POST JSON to the relationships URI
ClientResponse response = resource.accept( MediaType.APPLICATION_JSON )
.type( MediaType.APPLICATION_JSON )
.entity( relationshipJson )
.post( ClientResponse.class );
final URI location = response.getLocation();
System.out.println( String.format(
"POST to [%s], status code [%d], location header [%s]",
fromUri, response.getStatus(), location.toString() ) );
response.close();
return location;
}
If all goes well, we receive a 201 Created status code and a Location header which contains a URI of the
newly created relationship.
Add properties to a relationship
Like nodes, relationships can have properties. Since we're big fans of both Joe Strummer and The Clash,
we'll add a rating to the relationship so that others can see he's a 5-star singer with the band.
addMetadataToProperty( relationshipUri, "stars", "5" );
Inside the addMetadataToProperty method, we determine the URI of the properties of the relationship
and PUT our new values (since it’s PUT it will always overwrite existing values, so be careful).
private static void addMetadataToProperty( URI relationshipUri,
String name, String value ) throws URISyntaxException
{
URI propertyUri = new URI( relationshipUri.toString() + "/properties" );
String entity = toJsonNameValuePairCollection( name, value );
WebResource resource = Client.create()
.resource( propertyUri );
ClientResponse response = resource.accept( MediaType.APPLICATION_JSON )
.type( MediaType.APPLICATION_JSON )
.entity( entity )
.put( ClientResponse.class );
System.out.println( String.format(
"PUT [%s] to [%s], status code [%d]", entity, propertyUri,
response.getStatus() ) );
response.close();
}
Assuming all goes well, we'll get a 204 No Content response back from the server (which we can check
by calling ClientResponse.getStatus()) and we've now established a very small graph that we can query.
Querying graphs
As with the embedded version of the database, the Neo4j server uses graph traversals to look for data
in graphs. Currently the Neo4j server expects a JSON payload describing the traversal to be POST-ed at
the starting node for the traversal (though this is likely to change in time to a GET-based approach).
To start this process, we use a simple class that can turn itself into the equivalent JSON, ready for POST-
ing to the server, and in this case we’ve hardcoded the traverser to look for all nodes with outgoing
relationships with the type "singer".
// TraversalDefinition turns into JSON to send to the Server
TraversalDefinition t = new TraversalDefinition();
t.setOrder( TraversalDefinition.DEPTH_FIRST );
t.setUniqueness( TraversalDefinition.NODE );
t.setMaxDepth( 10 );
t.setReturnFilter( TraversalDefinition.ALL );
t.setRelationships( new Relation( "singer", Relation.OUT ) );
Once we have defined the parameters of our traversal, we just need to transfer it. We do this by
determining the URI of the traversers for the start node, and then POST-ing the JSON representation of
the traverser to it.
URI traverserUri = new URI( startNode.toString() + "/traverse/node" );
WebResource resource = Client.create()
.resource( traverserUri );
String jsonTraverserPayload = t.toJson();
ClientResponse response = resource.accept( MediaType.APPLICATION_JSON )
.type( MediaType.APPLICATION_JSON )
.entity( jsonTraverserPayload )
.post( ClientResponse.class );
System.out.println( String.format(
"POST [%s] to [%s], status code [%d], returned data: "
+ System.lineSeparator() + "%s",
jsonTraverserPayload, traverserUri, response.getStatus(),
response.getEntity( String.class ) ) );
response.close();
Once that request has completed, we get back our dataset of singers and the bands they belong to:
[ {
"outgoing_relationships" : "http://localhost:7474/db/data/node/82/relationships/out",
"data" : {
"band" : "The Clash",
"name" : "Joe Strummer"
},
"traverse" : "http://localhost:7474/db/data/node/82/traverse/{returnType}",
"all_typed_relationships" : "http://localhost:7474/db/data/node/82/relationships/all/{-list|&|types}",
"property" : "http://localhost:7474/db/data/node/82/properties/{key}",
"all_relationships" : "http://localhost:7474/db/data/node/82/relationships/all",
"self" : "http://localhost:7474/db/data/node/82",
"properties" : "http://localhost:7474/db/data/node/82/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/82/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/82/relationships/in",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/82/relationships/in/{-list|&|types}",
"create_relationship" : "http://localhost:7474/db/data/node/82/relationships"
}, {
"outgoing_relationships" : "http://localhost:7474/db/data/node/83/relationships/out",
"data" : {
},
"traverse" : "http://localhost:7474/db/data/node/83/traverse/{returnType}",
"all_typed_relationships" : "http://localhost:7474/db/data/node/83/relationships/all/{-list|&|types}",
"property" : "http://localhost:7474/db/data/node/83/properties/{key}",
"all_relationships" : "http://localhost:7474/db/data/node/83/relationships/all",
"self" : "http://localhost:7474/db/data/node/83",
"properties" : "http://localhost:7474/db/data/node/83/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/83/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/83/relationships/in",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/83/relationships/in/{-list|&|types}",
"create_relationship" : "http://localhost:7474/db/data/node/83/relationships"
} ]
Phew, is that it?
That’s a flavor of what we can do with the REST API. Naturally, any of the HTTP idioms we provide on
the server can be easily wrapped, including removing nodes and relationships through DELETE. If you’ve
gotten this far, then switching .post() for .delete() in the Jersey client code should be straightforward.
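For instance, a deletion might look like the following fragment. This is a sketch only, reusing the Jersey client setup from the examples above; relationshipUri is a hypothetical URI pointing at a previously created relationship, and a running server is assumed:

```java
// Sketch: DELETE a relationship with the same Jersey client used above.
// `relationshipUri` is a hypothetical URI of an existing relationship.
WebResource resource = Client.create().resource( relationshipUri );
ClientResponse response = resource.delete( ClientResponse.class );
System.out.println( String.format(
    "DELETE to [%s], status code [%d]", relationshipUri, response.getStatus() ) );
response.close();
```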
What’s next?
The HTTP API provides a good basis for implementers of client libraries, and it’s also great for HTTP and
REST folks. In the future, though, we expect idiomatic language bindings to appear that take advantage
of the REST API while providing comfortable language-level constructs for developers to use, much as
there are similar bindings for the embedded database.
Appendix: the code
CreateSimpleGraph.java3
Relation.java4
TraversalDefinition.java5
3 https://github.com/neo4j/neo4j/blob/2.3.12/community/server-examples/src/main/java/org/neo4j/examples/server/CreateSimpleGraph.java
4 https://github.com/neo4j/neo4j/blob/2.3.12/community/server-examples/src/main/java/org/neo4j/examples/server/Relation.java
5 https://github.com/neo4j/neo4j/blob/2.3.12/community/server-examples/src/main/java/org/neo4j/examples/server/TraversalDefinition.java
PartIII.Cypher Query Language
The Cypher part is the authoritative source for details on the Cypher Query Language. For a short
introduction, see Section8.1, “What is Cypher?” [106]. To take your first steps with Cypher, see
Chapter3, Introduction to Cypher [16]. For the terminology used, see Terminology [640].
8. Introduction ........................................................................................................................................ 105
8.1. What is Cypher? ...................................................................................................................... 106
8.2. Updating the graph ................................................................................................................. 109
8.3. Transactions ............................................................................................................................ 110
8.4. Uniqueness .............................................................................................................................. 111
8.5. Parameters .............................................................................................................................. 113
8.6. Compatibility ........................................................................................................................... 117
9. Syntax ................................................................................................................................................. 118
9.1. Values ...................................................................................................................................... 119
9.2. Expressions .............................................................................................................................. 120
9.3. Identifiers ................................................................................................................................. 123
9.4. Operators ................................................................................................................................ 124
9.5. Comments ............................................................................................................................... 126
9.6. Patterns ................................................................................................................................... 127
9.7. Collections ............................................................................................................................... 131
9.8. Working with NULL ................................................................................................................. 134
10. General Clauses ............................................................................................................................... 136
10.1. Return .................................................................................................................................... 137
10.2. Order by ................................................................................................................................ 140
10.3. Limit ....................................................................................................................................... 142
10.4. Skip ........................................................................................................................................ 144
10.5. With ....................................................................................................................................... 146
10.6. Unwind .................................................................................................................................. 148
10.7. Union ..................................................................................................................................... 150
10.8. Using ...................................................................................................................................... 152
11. Reading Clauses ............................................................................................................................... 154
11.1. Match ..................................................................................................................................... 155
11.2. Optional Match ...................................................................................................................... 164
11.3. Where .................................................................................................................................... 166
11.4. Start ....................................................................................................................................... 174
11.5. Aggregation ........................................................................................................................... 176
11.6. Load CSV ............................................................................................................................... 182
12. Writing Clauses ................................................................................................................................ 186
12.1. Create .................................................................................................................................... 187
12.2. Merge ..................................................................................................................................... 192
12.3. Set .......................................................................................................................................... 200
12.4. Delete .................................................................................................................................... 204
12.5. Remove .................................................................................................................................. 205
12.6. Foreach .................................................................................................................................. 207
12.7. Create Unique ....................................................................................................................... 208
12.8. Importing CSV files with Cypher ........................................................................................... 211
12.9. Using Periodic Commit ......................................................................................................... 213
13. Functions .......................................................................................................................................... 214
13.1. Predicates .............................................................................................................................. 215
13.2. Scalar functions ..................................................................................................................... 218
13.3. Collection functions ............................................................................................................... 224
13.4. Mathematical functions ........................................................................................................ 229
13.5. String functions ..................................................................................................................... 238
14. Schema ............................................................................................................................................. 243
14.1. Indexes .................................................................................................................................. 244
14.2. Constraints ............................................................................................................................ 247
14.3. Statistics ................................................................................................................................. 252
15. Query Tuning .................................................................................................................................... 253
15.1. How are queries executed? .................................................................................................. 254
15.2. How do I profile a query? ..................................................................................................... 255
15.3. Basic query tuning example ................................................................................................. 256
16. Execution Plans ................................................................................................................................ 259
16.1. Starting point operators ....................................................................................................... 260
16.2. Expand operators .................................................................................................................. 263
16.3. Combining operators ............................................................................................................ 265
16.4. Row operators ....................................................................................................................... 270
16.5. Update Operators ................................................................................................................. 275
Chapter8.Introduction
To get an overview of Cypher, continue reading Section8.1, “What is Cypher?” [106]. The rest of this
chapter deals with the context of Cypher statements, like for example transaction management and
how to use parameters. For the Cypher language reference itself see other chapters at PartIII, “Cypher
Query Language” [102]. To take your first steps with Cypher, see Chapter3, Introduction to Cypher [16].
For the terminology used, see Terminology [640].
8.1.What is Cypher?
Introduction
Cypher is a declarative graph query language that allows for expressive and efficient querying and
updating of the graph store. Cypher is a relatively simple but still very powerful language. Very
complicated database queries can easily be expressed through Cypher. This allows you to focus on
your domain instead of getting lost in database access.
Cypher is designed to be a humane query language, suitable for both developers and (importantly, we
think) operations professionals. Our guiding goal is to make the simple things easy, and the complex
things possible. Its constructs are based on English prose and neat iconography, which helps to make
queries more self-explanatory. We have tried to optimize the language for reading, not for writing.
Being a declarative language, Cypher focuses on the clarity of expressing what to retrieve from a graph,
not on how to retrieve it. This is in contrast to imperative languages like Java, scripting languages like
Gremlin1, and the JRuby Neo4j bindings2. This approach makes query optimization an implementation
detail instead of burdening the user with it and requiring her to update all traversals just because the
physical database structure has changed (new indexes etc.).
Cypher is inspired by a number of different approaches and builds upon established practices for
expressive querying. Most of the keywords like WHERE and ORDER BY are inspired by SQL3. Pattern
matching borrows expression approaches from SPARQL4. Some of the collection semantics have been
borrowed from languages such as Haskell and Python.
Structure
Cypher borrows its structure from SQL: queries are built up using various clauses.
Clauses are chained together, and they feed intermediate result sets between each other. For
example, the matching identifiers from one MATCH clause will be the context that the next clause exists
in.
The query language is comprised of several distinct clauses. You can read more details about them later
in the manual.
Here are a few clauses used to read from the graph:
• MATCH: The graph pattern to match. This is the most common way to get data from the graph.
• WHERE: Not a clause in its own right, but rather part of MATCH, OPTIONAL MATCH and WITH. Adds
constraints to a pattern, or filters the intermediate result passing through WITH.
• RETURN: What to return.
Let’s see MATCH and RETURN in action.
Imagine an example graph like the following one:
1 http://gremlin.tinkerpop.com
2 https://github.com/neo4jrb/neo4j/
3 http://en.wikipedia.org/wiki/SQL
4 http://en.wikipedia.org/wiki/SPARQL
Figure8.1.Example Graph
nam e = 'Sara'
nam e = 'Maria'
friend
nam e = 'Steve'
nam e = 'John'
friend
nam e = 'Joe'
friend
friend
For example, here is a query which finds a user called John and John’s friends-of-friends (though not his
direct friends), returning both John and any friends-of-friends that are found.
MATCH (john {name: 'John'})-[:friend]->()-[:friend]->(fof)
RETURN john.name, fof.name
Resulting in:
john.name fof.name
"John" "Maria"
"John" "Steve"
2 rows
Next up we will add filtering to set more parts in motion:
We take a list of user names, find all nodes with names from this list, match their friends, and return
only those followed users whose name property starts with S.
MATCH (user)-[:friend]->(follower)
WHERE user.name IN ['Joe', 'John', 'Sara', 'Maria', 'Steve'] AND follower.name =~ 'S.*'
RETURN user.name, follower.name
Resulting in:
user.name follower.name
"John" "Sara"
"Joe" "Steve"
2 rows
And here are examples of clauses that are used to update the graph:
• CREATE (and DELETE): Create (and delete) nodes and relationships.
• SET (and REMOVE): Set values to properties and add labels on nodes using SET, and use REMOVE to
remove them.
• MERGE: Match existing or create new nodes and patterns. This is especially useful together with
uniqueness constraints.
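As a sketch of that last point, a uniqueness constraint (covered in Chapter 14) combined with MERGE ensures the node is created at most once, no matter how many times the query runs. The label and property names here are illustrative:

```cypher
CREATE CONSTRAINT ON (u:User) ASSERT u.name IS UNIQUE
```

```cypher
MERGE (u:User { name: 'John' })
RETURN u
```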
For more Cypher examples, see Chapter5, Basic Data Modeling Examples [47] as well as the rest of
the Cypher part with details on the language. To use Cypher from Java, see Section33.15, “Execute
Cypher Queries from Java” [609]. To take your first steps with Cypher, see Chapter3, Introduction to
Cypher [16].
8.2.Updating the graph
Cypher can be used for both querying and updating your graph.
The Structure of Updating Queries
A Cypher query part can’t both match and update the graph at the same time.
Every part can either read and match on the graph, or make updates on it.
If you read from the graph and then update the graph, your query implicitly has two parts: the
reading is the first part, and the writing is the second part.
If your query only performs reads, Cypher will be lazy and not actually match the pattern until you ask
for the results. In an updating query, the semantics are that all the reading will be done before any
writing actually happens.
The only case where the query parts are implicit is when you first read and then write; in any other
order you have to be explicit about your query parts. The parts are separated using the WITH
statement. WITH is like an event horizon: it’s a barrier between a plan and the finished execution of
that plan.
When you want to filter using aggregated data, you have to chain together two reading query
parts: the first one does the aggregating, and the second filters on the results coming from the first
one.
MATCH (n {name: 'John'})-[:FRIEND]-(friend)
WITH n, count(friend) as friendsCount
WHERE friendsCount > 3
RETURN n, friendsCount
Using WITH, you specify how you want the aggregation to happen, and that the aggregation has to be
finished before Cypher can start filtering.
Here’s an example of updating the graph, writing the aggregated data to the graph:
MATCH (n {name: 'John'})-[:FRIEND]-(friend)
WITH n, count(friend) as friendsCount
SET n.friendsCount = friendsCount
RETURN n.friendsCount
You can chain together as many query parts as the available memory permits.
Returning data
Any query can return data. If your query only reads, it has to return data; it serves no purpose if it
doesn’t, and it is not a valid Cypher query. Queries that update the graph don’t have to return anything,
but they can.
After all the parts of the query comes one final RETURN clause. RETURN is not part of any query part: it
is a period symbol at the end of a query. The RETURN clause has three sub-clauses that come with it:
SKIP/LIMIT and ORDER BY.
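As an illustrative sketch (not tied to any particular dataset in this chapter), the sub-clauses combine at the end of a query like this:

```cypher
MATCH (n)
RETURN n.name
ORDER BY n.name
SKIP 10
LIMIT 5
```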
If you return graph elements from a query that has just deleted them, beware: you are holding a
pointer that is no longer valid. Operations on that node are undefined.
8.3.Transactions
Any query that updates the graph will run in a transaction. An updating query will always either fully
succeed, or not succeed at all.
Cypher will either create a new transaction or run inside an existing one:
• If no transaction exists in the running context, Cypher will create one and commit it once the query
finishes.
• In case there already exists a transaction in the running context, the query will run inside it, and
nothing will be persisted to disk until that transaction is successfully committed.
This can be used to have multiple queries be committed as a single transaction:
1. Open a transaction,
2. run multiple updating Cypher queries,
3. and commit all of them in one go.
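With the embedded Java API, those three steps might be sketched as follows. This is a sketch, not a complete program: db is an assumed GraphDatabaseService instance, and the queries are illustrative:

```java
// Sketch: two updating queries committed as one transaction (embedded API).
try ( Transaction tx = db.beginTx() )
{
    db.execute( "CREATE (:User { name: 'Adam' })" );
    db.execute( "CREATE (:User { name: 'Pernilla' })" );
    tx.success(); // mark the transaction as successful; it commits when closed
}
```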
Note that a query will hold the changes in memory until the whole query has finished executing. A large
query will consequently need a JVM with lots of heap space.
For using transactions over the REST API, see Section21.1, “Transactional Cypher HTTP
endpoint” [298].
When writing server extensions or using Neo4j embedded, remember that all iterators returned from
an execution result should be either fully exhausted or closed to ensure that the resources bound to
them will be properly released. Resources include transactions started by the query, so failing to do so
may, for example, lead to deadlocks or other weird behavior.
8.4.Uniqueness
While pattern matching, Neo4j makes sure to not include matches where the same graph relationship
is found multiple times in a single pattern. In most use cases, this is a sensible thing to do.
Example: looking for a user’s friends of friends should not return said user.
Let’s create a few nodes and relationships:
CREATE (adam:User { name: 'Adam' }),(pernilla:User { name: 'Pernilla' }),(david:User { name: 'David' }),
(adam)-[:FRIEND]->(pernilla),(pernilla)-[:FRIEND]->(david)
Which gives us the following graph:
[Figure: (Adam:User)-[:FRIEND]->(Pernilla:User)-[:FRIEND]->(David:User)]
Now let’s look for friends of friends of Adam:
MATCH (user:User { name: 'Adam' })-[r1:FRIEND]-()-[r2:FRIEND]-(friend_of_a_friend)
RETURN friend_of_a_friend.name AS fofName
fofName
"David"
1 row
In this query, Cypher makes sure to not return matches where the pattern relationships r1 and r2 point
to the same graph relationship.
This is however not always desired. If the query should return the user, it is possible to spread the
matching over multiple MATCH clauses, like so:
MATCH (user:User { name: 'Adam' })-[r1:FRIEND]-(friend)
MATCH (friend)-[r2:FRIEND]-(friend_of_a_friend)
RETURN friend_of_a_friend.name AS fofName
fofName
"David"
"Adam"
2 rows
Note that while the following query looks similar to the previous one, it is actually equivalent to the
first one.
MATCH (user:User { name: 'Adam' })-[r1:FRIEND]-(friend),(friend)-[r2:FRIEND]-(friend_of_a_friend)
RETURN friend_of_a_friend.name AS fofName
Here, the MATCH clause has a single pattern with two paths, while the previous query has two distinct
patterns. Relationship uniqueness applies per pattern, so r1 and r2 again cannot match the same
relationship, and Adam is not returned.
fofName
"David"
1 row
8.5.Parameters
Cypher supports querying with parameters. This means developers don’t have to resort to string
building to create a query. In addition, parameters make caching of execution plans much easier for
Cypher.
Parameters can be used for literals and expressions in the WHERE clause, for the index value in the
START clause, in index queries, and finally for node/relationship ids. Parameters cannot be used for
property names, relationship types and labels, since these are part of the query structure that is
compiled into a query plan.
Accepted names for parameters are letters and numbers, and any combination of these.
For details on using parameters via the Neo4j REST API, see Section21.1, “Transactional Cypher
HTTP endpoint” [298]. For details on parameters when using the Neo4j embedded Java API, see
Section33.16, “Query Parameters” [611].
Below follows a comprehensive set of examples of parameter usage. The parameters are given as JSON
here. Exactly how to submit them depends on the driver in use.
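As one concrete illustration of how a driver might submit them: when talking to the transactional HTTP endpoint, the parameters travel in a JSON map next to the statement. The helper below builds such a request body with plain string formatting; the class and method names are invented for this sketch, and a real client would use a proper JSON library:

```java
public class ParameterPayload
{
    // Builds a single-statement request body in the shape accepted by the
    // transactional Cypher HTTP endpoint:
    // {"statements":[{"statement":...,"parameters":{...}}]}
    public static String payload( String statement, String paramName, String paramValue )
    {
        return String.format(
            "{\"statements\":[{\"statement\":\"%s\",\"parameters\":{\"%s\":\"%s\"}}]}",
            statement, paramName, paramValue );
    }

    public static void main( String[] args )
    {
        System.out.println( payload(
            "MATCH (n) WHERE n.name = { name } RETURN n", "name", "Johan" ) );
    }
}
```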
String literal
Parameters
{
"name" : "Johan"
}
Query
MATCH (n)
WHERE n.name = { name }
RETURN n
You can use parameters in this syntax as well:
Parameters
{
"name" : "Johan"
}
Query
MATCH (n { name: { name }})
RETURN n
Regular expression
Parameters
{
"regex" : ".*h.*"
}
Query
MATCH (n)
WHERE n.name =~ { regex }
RETURN n.name
Case-sensitive string pattern matching
Parameters
{
"name" : "Michael"
}
Query
MATCH (n)
WHERE n.name STARTS WITH { name }
RETURN n.name
Create node with properties
Parameters
{
"props" : {
"position" : "Developer",
"name" : "Andres"
}
}
Query
CREATE ({ props })
Create multiple nodes with properties
Parameters
{
"props" : [ {
"position" : "Developer",
"awesome" : true,
"name" : "Andres"
}, {
"position" : "Developer",
"name" : "Michael",
"children" : 3
} ]
}
Query
CREATE (n:Person { props })
RETURN n
Setting all properties on node
Note that this will replace all the current properties.
Parameters
{
"props" : {
"position" : "Developer",
"name" : "Andres"
}
}
Query
MATCH (n)
WHERE n.name='Michaela'
SET n = { props }
SKIP and LIMIT
Parameters
{
"s" : 1,
"l" : 1
}
Query
MATCH (n)
RETURN n.name
SKIP { s }
LIMIT { l }
Node id
Parameters
{
"id" : 0
}
Query
MATCH (n)
WHERE id(n) = { id }
RETURN n.name
Multiple node ids
Parameters
{
"ids" : [ 0, 1, 2 ]
}
Query
MATCH (n)
WHERE id(n) IN { ids }
RETURN n.name
Index value (legacy indexes)
Parameters
{
"value" : "Michaela"
}
Query
START n=node:people(name = { value })
RETURN n
Index query (legacy indexes)
Parameters
{
"query" : "name:Andreas"
}
Query
START n=node:people({ query })
RETURN n
8.6.Compatibility
Cypher is still changing rather rapidly. Some of the changes are internal: we add new pattern
matchers, aggregators and optimizations, or write new query planners, which hopefully makes your
queries run faster.
Other changes are directly visible to our users: the syntax is still changing. New concepts are being
added and old ones changed to fit into new possibilities. To guard you from having to keep up with our
syntax changes, Neo4j allows you to use an older parser, but still gain speed from new optimizations.
There are two ways you can select which parser to use. You can configure your database with the
configuration parameter cypher_parser_version, and enter which parser you’d like to use (see the
section called “Supported Language Versions” [117]). Any Cypher query that doesn’t explicitly say
anything else will get the parser you have configured, or the latest parser if none is configured.
The other way is on a query by query basis. By simply putting CYPHER 2.2 at the beginning, that
particular query will be parsed with the 2.2 version of the parser. Below is an example using the START
clause to access a legacy index:
CYPHER 2.2
START n=node:nodes(name = "A")
RETURN n
Accessing entities by id via START
In versions of Cypher prior to 2.2 it was also possible to access specific nodes or relationships using the
START clause. In this case you could use a syntax like the following:
CYPHER 1.9
START n=node(42)
RETURN n
Note
The use of the START clause to find nodes by ID was deprecated from Cypher 2.0 onwards
and is now entirely disabled in Cypher 2.2 and up. You should instead make use of the MATCH
clause for starting points. See Section11.1, “Match” [155] for more information on the
correct syntax for this. The START clause should only be used when accessing legacy indexes
(see Chapter35, Legacy Indexing [621]).
Supported Language Versions
Neo4j 2.3 supports the following versions of the Cypher language:
• Neo4j Cypher 2.3
• Neo4j Cypher 2.2
• Neo4j Cypher 1.9
Tip
Each release of Neo4j supports a limited number of old Cypher Language Versions. When
you upgrade to a new release of Neo4j, please make sure that it supports the Cypher
language version you need. If not, you may need to modify your queries to work with a
newer Cypher language version.
Chapter9.Syntax
The nitty-gritty details of Cypher syntax.
9.1.Values
All values that are handled by Cypher have a distinct type. The supported types of values are:
• Numeric values,
• String values,
• Boolean values,
• Nodes,
• Relationships,
• Paths,
• Maps from Strings to other values,
• Collections of any other type of value.
Most types of values can be constructed in a query using literal expressions (see Section9.2,
“Expressions” [120]). Special care must be taken when using NULL, as NULL is a value of every type (see
Section9.8, “Working with NULL” [134]). Nodes, relationships, and paths are returned as a result of
pattern matching.
Note that labels are not values but are a form of pattern syntax.
9.2.Expressions
Expressions in general
An expression in Cypher can be:
• A decimal (integer or double) literal: 13, -40000, 3.14, 6.022E23.
• A hexadecimal integer literal (starting with 0x): 0x13af, 0xFC3A9, -0x66eff.
• An octal integer literal (starting with 0): 01372, 02127, -05671.
• A string literal: "Hello", 'World'.
• A boolean literal: true, false, TRUE, FALSE.
• An identifier: n, x, rel, myFancyIdentifier, `A name with weird stuff in it[]!`.
• A property: n.prop, x.prop, rel.thisProperty, myFancyIdentifier.`(weird property name)`.
• A dynamic property: n["prop"], rel[n.city + n.zip], map[coll[0]].
• A parameter: {param}, {0}
• A collection of expressions: ["a", "b"], [1,2,3], ["a", 2, n.property, {param}], [ ].
• A function call: length(p), nodes(p).
• An aggregate function: avg(x.prop), count(*).
• A path-pattern: (a)-->()<--(b).
• An operator application: 1 + 2 and 3 < 4.
• A predicate expression, i.e. an expression that returns true or false: a.prop = "Hello", length(p) > 10,
has(a.name).
• A regular expression: a.name =~ "Tob.*"
• A case-sensitive string matching expression: a.surname STARTS WITH "Sven", a.surname ENDS WITH "son"
or a.surname CONTAINS "son"
• A CASE expression.
Note on string literals
String literals can contain these escape sequences.
Escape sequence  Character
\t               Tab
\b               Backspace
\n               Newline
\r               Carriage return
\f               Form feed
\'               Single quote
\"               Double quote
\\               Backslash
\uxxxx           Unicode UTF-16 code point (4 hex digits must follow the \u)
\Uxxxxxxxx       Unicode UTF-32 code point (8 hex digits must follow the \U)
Case Expressions
Cypher supports CASE expressions, a generic conditional construct similar to if/else
statements in other languages. Two variants of CASE exist: the simple form and the generic form.
Simple CASE
The expression is calculated, and compared in order with the WHEN clauses until a match is found. If no
match is found the expression in the ELSE clause is used, or null, if no ELSE case exists.
Syntax:
CASE test
WHEN value THEN result
[WHEN ...]
[ELSE default]
END
Arguments:
test: A valid expression.
value: An expression whose result will be compared to the test expression.
result: This is the result expression used if the value expression matches the test expression.
default: The expression to use if no match is found.
Query
MATCH (n)
RETURN
CASE n.eyes
WHEN 'blue'
THEN 1
WHEN 'brown'
THEN 2
ELSE 3 END AS result
Result
result
2
1
2
1
3
5 rows
Generic CASE
The predicates are evaluated in order until a true value is found, and the result value is used. If no
match is found the expression in the ELSE clause is used, or null, if no ELSE case exists.
Syntax:
CASE
WHEN predicate THEN result
[WHEN ...]
[ELSE default]
END
Arguments:
predicate: A predicate that is tested to find a valid alternative.
result: This is the result expression used if the predicate matches.
default: The expression to use if no match is found.
Query
MATCH (n)
RETURN
CASE
WHEN n.eyes = 'blue'
THEN 1
WHEN n.age < 40
THEN 2
ELSE 3 END AS result
Result
result
3
1
2
1
3
5 rows
9.3.Identifiers
When you reference parts of a pattern or a query, you do so by naming them. The names you give the
different parts are called identifiers.
In this example:
MATCH (n)-->(b) RETURN b
The identifiers are n and b.
Identifier names are case sensitive, and can contain underscores and alphanumeric characters (a-z,
0-9), but must always start with a letter. If other characters are needed, you can quote the identifier
using backquote (`) signs.
The same rules apply to property names.
Identifiers are only visible in the same query part
Identifiers are not carried over to subsequent queries. If multiple query parts are chained
together using WITH, identifiers have to be listed in the WITH clause to be carried over to the
next part. For more information see Section 10.5, “With” [146].
9.4. Operators
Mathematical operators
The mathematical operators are +, -, *, / and %, ^.
Comparison operators
The comparison operators are =, <>, <, >, <=, >=, IS NULL, and IS NOT NULL. See the section called “Equality
and Comparison of Values” [124] on how they behave.
The operators STARTS WITH, ENDS WITH and CONTAINS can be used to search for a string value by its
content.
Boolean operators
The boolean operators are AND, OR, XOR, NOT.
String operators
Strings can be concatenated using the + operator. For regular expression matching the =~ operator is
used.
Collection operators
Collections can be concatenated using the + operator. To check if an element exists in a collection, you
can use the IN operator.
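A single illustrative query combining a few of these operators (the values are arbitrary; parentheses make the intended grouping explicit):

```
RETURN (2 ^ 3) % 5 AS math,
       "Peter" STARTS WITH "Pet" AS prefix,
       ("graph" + "s") =~ "graph.*" AS regex,
       3 IN ([1, 2] + [3]) AS containment
```

Here ^ and % combine in the math column, + concatenates both strings and collections, =~ applies a regular expression, and IN tests collection membership.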
Property operators
Note
Since version 2.0, the previously existing property operators ? and ! have been removed.
This syntax is no longer supported. Missing properties are now returned as NULL. Please use
(NOT(has(<ident>.prop)) OR <ident>.prop=<value>) if you really need the old behavior of the
? operator. Also, the use of ? for optional relationships has been removed in favor of the
newly introduced OPTIONAL MATCH clause.
Equality and Comparison of Values
Equality
Cypher supports comparing values (see Section 9.1, “Values” [119]) by equality using the = and <>
operators.
Values of the same type are only equal if they are the same identical value (e.g. 3 = 3 and "x" <> "xy").
Maps are only equal if they map exactly the same keys to equal values and collections are only equal if
they contain the same sequence of equal values (e.g. [3, 4] = [1+2, 8/2]).
Values of different types are considered equal according to the following rules:
Paths are treated as collections of alternating nodes and relationships and are equal to all collections
that contain that very same sequence of nodes and relationships.
Testing any value against NULL with both the = and the <> operators always evaluates to NULL. This includes
NULL = NULL and NULL <> NULL. The only way to reliably test if a value v is NULL is by using the special
v IS NULL or v IS NOT NULL operators.
All other combinations of types of values cannot be compared with each other. In particular, nodes,
relationships, and literal maps are incomparable with each other.
It is an error to compare values that cannot be compared.
Ordering and Comparison of Values
The comparison operators <=, < (for ascending) and >=, > (for descending) are used to compare values
for ordering. The following points give some details on how the comparison is performed.
Numerical values are compared for ordering using numerical order (e.g. 3 < 4 is true).
The special value java.lang.Double.NaN is regarded as being larger than all other numbers.
String values are compared for ordering using lexicographic order (e.g. "x" < "xy").
Boolean values are compared for ordering such that false < true.
Comparing for ordering when one argument is NULL is NULL (e.g. NULL < 3 is NULL).
It is an error to compare other types of values with each other for ordering.
Chaining Comparison Operations
Comparisons can be chained arbitrarily, e.g., x < y <= z is equivalent to x < y AND y <= z.
Formally, if a, b, c, ..., y, z are expressions and op1, op2, ..., opN are comparison operators, then a
op1 b op2 c ... y opN z is equivalent to a op1 b and b op2 c and ... y opN z.
Note that a op1 b op2 c does not imply any kind of comparison between a and c, so that, e.g., x < y > z
is perfectly legal (though perhaps not pretty).
The example:
MATCH (n) WHERE 21 < n.age <= 30 RETURN n
is equivalent to
MATCH (n) WHERE 21 < n.age AND n.age <= 30 RETURN n
Thus it will match all nodes where the age is between 21 and 30.
This syntax extends to all equality and inequality comparisons, as well as extending to chains longer
than three.
For example:
a < b = c <= d <> e
Is equivalent to:
a < b AND b = c AND c <= d AND d <> e
For other comparison operators, see the section called “Comparison operators” [124].
9.5. Comments
To add comments to your queries, use double slash. Examples:
MATCH (n) RETURN n //This is an end of line comment
MATCH (n)
//This is a whole line comment
RETURN n
MATCH (n) WHERE n.property = "//This is NOT a comment" RETURN n
9.6. Patterns
Patterns and pattern-matching are at the very heart of Cypher, so being effective with Cypher requires
a good understanding of patterns.
Using patterns, you describe the shape of the data you’re looking for. For example, in the MATCH clause
you describe the shape with a pattern, and Cypher will figure out how to get that data for you.
The pattern describes the data using a form that is very similar to how one typically draws the shape of
property graph data on a whiteboard: usually as circles (representing nodes) and arrows between them
to represent relationships.
Patterns appear in multiple places in Cypher: in MATCH, CREATE and MERGE clauses, and in pattern
expressions. Each of these is described in more detail in:
Section11.1, “Match” [155]
Section11.2, “Optional Match” [164]
Section12.1, “Create” [187]
Section12.2, “Merge” [192]
the section called “Using path patterns in WHERE” [170]
Patterns for nodes
The very simplest “shape” that can be described in a pattern is a node. A node is described using a pair
of parentheses, and is typically given a name. For example:
(a)
This simple pattern describes a single node, and names that node using the identifier a.
Patterns for related nodes
More interesting are patterns that describe multiple nodes and the relationships between them. Cypher
patterns describe relationships by employing an arrow between two nodes. For example:
(a)-->(b)
This pattern describes a very simple data shape: two nodes, and a single relationship from one to the
other. In this example, the two nodes are named a and b respectively, and the relationship is
“directed”: it goes from a to b.
This way of describing nodes and relationships can be extended to cover an arbitrary number of nodes
and the relationships between them, for example:
(a)-->(b)<--(c)
Such a series of connected nodes and relationships is called a "path".
Note that the naming of the nodes in these patterns is only necessary should one need to refer to the
same node again, either later in the pattern or elsewhere in the Cypher query. If this is not necessary
then the name may be omitted, like so:
(a)-->()<--(c)
Labels
In addition to simply describing the shape of a node in the pattern, one can also describe attributes.
The simplest attribute that can be described in the pattern is a label that the node must have. For
example:
(a:User)-->(b)
One can also describe a node that has multiple labels:
(a:User:Admin)-->(b)
Specifying properties
Nodes and relationships are the fundamental structures in a graph. Neo4j uses properties on both of
these to allow for far richer models.
Properties can be expressed in patterns using a map-construct: curly brackets surrounding a number of
key-expression pairs, separated by commas. E.g. a node with two properties on it would look like:
(a { name: "Andres", sport: "Brazilian Ju-Jitsu" })
A relationship with expectations on it could look like:
(a)-[{blocked: false}]->(b)
When properties appear in patterns, they add an additional constraint to the shape of the data. In
the case of a CREATE clause, the properties will be set in the newly created nodes and relationships.
In the case of a MERGE clause, the properties will be used as additional constraints on the shape any
existing data must have (the specified properties must exactly match any existing data in the graph).
If no matching data is found, then MERGE behaves like CREATE and the properties will be set in the newly
created nodes and relationships.
Note that patterns supplied to CREATE may use a single parameter to specify properties, e.g: CREATE
(node {paramName}). This is not possible with patterns used in other clauses, as Cypher needs to know
the property names at the time the query is compiled, so that matching can be done effectively.
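As a sketch of the single-parameter form mentioned above (the parameter name props and its contents are assumptions for illustration):

Parameters

```
{
  "props" : {
    "name" : "Andres",
    "position" : "Developer"
  }
}
```

Query

```
CREATE (n {props})
RETURN n
```

The whole props map becomes the properties of the newly created node.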
Describing relationships
The simplest way to describe a relationship is by using the arrow between two nodes, as in the
previous examples. Using this technique, you can describe that the relationship should exist and the
directionality of it. If you don’t care about the direction of the relationship, the arrow head can be
omitted, like so:
(a)--(b)
As with nodes, relationships may also be given names. In this case, a pair of square brackets is used to
break up the arrow and the identifier is placed between. For example:
(a)-[r]->(b)
Much like labels on nodes, relationships can have types. To describe a relationship with a specific type,
you can specify this like so:
(a)-[r:REL_TYPE]->(b)
Unlike labels, relationships can only have one type. But if we’d like to describe some data such that the
relationship could have any one of a set of types, then they can all be listed in the pattern, separating
them with the pipe symbol | like this:
(a)-[r:TYPE1|TYPE2]->(b)
Note that this form of pattern can only be used to describe existing data (i.e. when using a pattern
with MATCH or as an expression). It will not work with CREATE or MERGE, since it's not possible to create a
relationship with multiple types.
As with nodes, the name of the relationship can always be omitted, in this case like so:
(a)-[:REL_TYPE]->(b)
Variable length
Caution
Variable length pattern matching in versions 2.1.x and earlier does not enforce relationship
uniqueness for patterns described inside of a single MATCH clause. This means that a query
such as the following: MATCH (a)-[r]->(b), (a)-[rs*]->(c) RETURN * may include r as part of
the rs set. This behavior has changed in versions 2.2.0 and later, in such a way that r will be
excluded from the result set, as this better adheres to the rules of relationship uniqueness
as documented in Section 8.4, “Uniqueness” [111]. If you have a query pattern that needs
to retrace relationships rather than ignoring them as the relationship uniqueness rules
normally dictate, you can accomplish this using multiple match clauses, as follows: MATCH
(a)-[r]->(b) MATCH (a)-[rs*]->(c) RETURN *. This will work in all versions of Neo4j that
support the MATCH clause, namely 2.0.0 and later.
Rather than describing a long path using a sequence of many node and relationship descriptions in a
pattern, many relationships (and the intermediate nodes) can be described by specifying a length in the
relationship description of a pattern. For example:
(a)-[*2]->(b)
This describes a graph of three nodes and two relationships, all in one path (a path of length 2). This is
equivalent to:
(a)-->()-->(b)
A range of lengths can also be specified: such relationship patterns are called “variable length
relationships”. For example:
(a)-[*3..5]->(b)
This is a minimum length of 3, and a maximum of 5. It describes a graph of either 4 nodes and 3
relationships, 5 nodes and 4 relationships or 6 nodes and 5 relationships, all connected together in a
single path.
Either bound can be omitted. For example, to describe paths of length 3 or more, use:
(a)-[*3..]->(b)
And to describe paths of length 5 or less, use:
(a)-[*..5]->(b)
Both bounds can be omitted, allowing paths of any length to be described:
(a)-[*]->(b)
As a simple example, let’s take the query below:
Query
MATCH (me)-[:KNOWS*1..2]-(remote_friend)
WHERE me.name = "Filipa"
RETURN remote_friend.name
Result
remote_friend.name
"Dilshad"
"Anders"
2 rows
This query finds data in the graph with a shape that fits the pattern: specifically a node (with the name
property Filipa) and then the KNOWS-related nodes, one or two steps out. This is a typical example of
finding first and second degree friends.
Note that variable length relationships can not be used with CREATE and MERGE.
Assigning to path identifiers
As described above, a series of connected nodes and relationships is called a "path". Cypher allows
paths to be named using an identifier, like so:
p = (a)-[*3..5]->(b)
You can do this in MATCH, CREATE and MERGE, but not when using patterns as expressions.
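A minimal sketch of matching a named path (the relationship type KNOWS is an assumption for illustration):

```
MATCH p = (a)-[:KNOWS*1..2]->(b)
RETURN length(p)
```

The identifier p is bound to each matching path, so functions such as length can then be applied to it.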
9.7. Collections
Cypher has good support for collections.
Collections in general
A literal collection is created by using brackets and separating the elements in the collection with
commas.
Query
RETURN [0,1,2,3,4,5,6,7,8,9] AS collection
Result
collection
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 row
In our examples, we'll use the range function. It gives you a collection containing all numbers between
the given start and end numbers. Range is inclusive at both ends.
To access individual elements in the collection, we use the square brackets again. This will extract from
the start index and up to, but not including, the end index.
Query
RETURN range(0,10)[3]
Result
range(0,10)[3]
3
1 row
You can also use negative numbers, to start from the end of the collection instead.
Query
RETURN range(0,10)[-3]
Result
range(0,10)[-3]
8
1 row
Finally, you can use ranges inside the brackets to return ranges of the collection.
Query
RETURN range(0,10)[0..3]
Result
range(0,10)[0..3]
[0, 1, 2]
1 row
Query
RETURN range(0,10)[0..-5]
Result
range(0,10)[0..-5]
[0, 1, 2, 3, 4, 5]
1 row
Query
RETURN range(0,10)[-5..]
Result
range(0,10)[-5..]
[6, 7, 8, 9, 10]
1 row
Query
RETURN range(0,10)[..4]
Result
range(0,10)[..4]
[0, 1, 2, 3]
1 row
Note
Out-of-bound slices are simply truncated, but out-of-bound single elements return NULL.
Query
RETURN range(0,10)[15]
Result
range(0,10)[15]
<null>
1 row
Query
RETURN range(0,10)[5..15]
Result
range(0,10)[5..15]
[5, 6, 7, 8, 9, 10]
1 row
You can get the size of a collection like this:
Query
RETURN size(range(0,10)[0..3])
Result
size(range(0,10)[0..3])
3
1 row
List comprehension
List comprehension is a syntactic construct available in Cypher for creating a collection based on
existing collections. It follows the form of the mathematical set-builder notation (set comprehension)
instead of the use of map and filter functions.
Query
RETURN [x IN range(0,10) WHERE x % 2 = 0 | x^3] AS result
Result
result
[0.0, 8.0, 64.0, 216.0, 512.0, 1000.0]
1 row
Either the WHERE part or the expression can be omitted, if you only want to filter or map, respectively.
Query
RETURN [x IN range(0,10) WHERE x % 2 = 0] AS result
Result
result
[0, 2, 4, 6, 8, 10]
1 row
Query
RETURN [x IN range(0,10)| x^3] AS result
Result
result
[0.0, 1.0, 8.0, 27.0, 64.0, 125.0, 216.0, 343.0, 512.0, 729.0, 1000.0]
1 row
Literal maps
From Cypher, you can also construct maps. Through REST you will get JSON objects; in Java they will be
java.util.Map<String,Object>.
Query
RETURN { key : "Value", collectionKey: [{ inner: "Map1" }, { inner: "Map2" }]} AS result
Result
result
{key -> "Value", collectionKey -> [{inner -> "Map1"}, {inner -> "Map2"}]}
1 row
9.8. Working with NULL
Introduction to NULL in Cypher
In Cypher, NULL is used to represent missing or undefined values. Conceptually, NULL means “a missing
unknown value” and it is treated somewhat differently from other values. For example getting a
property from a node that does not have said property produces NULL. Most expressions that take NULL
as input will produce NULL. This includes boolean expressions that are used as predicates in the WHERE
clause. In this case, anything that is not TRUE is interpreted as being false.
NULL is not equal to NULL. Not knowing two values does not imply that they are the same value. So the
expression NULL = NULL yields NULL and not TRUE.
Logical operations with NULL
The logical operators (AND, OR, XOR, IN, NOT) treat NULL as the “unknown” value of three-valued logic. Here is
the truth table for AND, OR and XOR.
a      b      a AND b  a OR b  a XOR b
FALSE  FALSE  FALSE    FALSE   FALSE
FALSE  NULL   FALSE    NULL    NULL
FALSE  TRUE   FALSE    TRUE    TRUE
TRUE   FALSE  FALSE    TRUE    TRUE
TRUE   NULL   NULL     TRUE    NULL
TRUE   TRUE   TRUE     TRUE    FALSE
NULL   FALSE  FALSE    NULL    NULL
NULL   NULL   NULL     NULL    NULL
NULL   TRUE   NULL     TRUE    NULL
The IN operator and NULL
The IN operator follows similar logic. If Cypher knows that something exists in a collection, the result
will be TRUE. Any collection that contains a NULL and doesn’t have a matching element will return NULL.
Otherwise, the result will be FALSE. Here is a table with examples:
Expression Result
2 IN [1, 2, 3] TRUE
2 IN [1, NULL, 3] NULL
2 IN [1, 2, NULL] TRUE
2 IN [1] FALSE
2 IN [] FALSE
NULL IN [1,2,3] NULL
NULL IN [1,NULL,3] NULL
NULL IN [] FALSE
Using ALL, ANY, NONE, and SINGLE follows a similar rule. If the result can be calculated definitely, TRUE or
FALSE is returned. Otherwise NULL is produced.
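For example (a sketch, not from the original text), ANY can sometimes be decided even when the collection contains a NULL, while ALL cannot:

```
RETURN ANY (x IN [1, NULL] WHERE x = 1) AS decided,
       ALL (x IN [1, NULL] WHERE x = 1) AS undecided
```

The first column can be answered TRUE because a matching element was found; the second yields NULL because the NULL element cannot be checked against the predicate.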
Expressions that return NULL
Getting a missing element from a collection: [][0], head([])
Trying to access a property that does not exist on a node or relationship: n.missingProperty
Comparisons when either side is NULL: 1 < NULL
Arithmetic expressions containing NULL: 1 + NULL
Function calls where any arguments are NULL: sin(NULL)
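Each of the cases above can be checked directly; a sketch:

```
RETURN [][0] AS missingElement,
       1 + NULL AS arithmetic,
       1 < NULL AS comparison,
       sin(NULL) AS functionCall
```

All four columns evaluate to NULL.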
Chapter10.General Clauses
10.1. Return
The RETURN clause defines what to include in the query result set.
In the RETURN part of your query, you define which parts of the pattern you are interested in. It can be
nodes, relationships, or properties on these.
Tip
If what you actually want is the value of a property, make sure to not return the full node/
relationship. This will improve performance.
Figure10.1.Graph
nam e = 'A'
happy = 'Yes!'
age = 55
nam e = 'B'
BLOCKS KNOWS
Return nodes
To return a node, list it in the RETURN statement.
Query
MATCH (n { name: "B" })
RETURN n
The example will return the node.
Result
n
Node[1]{name:"B"}
1 row
Return relationships
To return a relationship, just include it in the RETURN list.
Query
MATCH (n { name: "A" })-[r:KNOWS]->(c)
RETURN r
The relationship is returned by the example.
Result
r
:KNOWS[0]{}
1 row
Return property
To return a property, use the dot separator, like this:
Query
MATCH (n { name: "A" })
RETURN n.name
The value of the property name gets returned.
Result
n.name
"A"
1 row
Return all elements
When you want to return all nodes, relationships and paths found in a query, you can use the * symbol.
Query
MATCH p=(a { name: "A" })-[r]->(b)
RETURN *
This returns the two nodes, the relationship and the path used in the query.
Result
a | b | p | r
Node[0]{name:"A", happy:"Yes!", age:55} | Node[1]{name:"B"} | [Node[0]{name:"A", happy:"Yes!", age:55}, :BLOCKS[1]{}, Node[1]{name:"B"}] | :BLOCKS[1]{}
Node[0]{name:"A", happy:"Yes!", age:55} | Node[1]{name:"B"} | [Node[0]{name:"A", happy:"Yes!", age:55}, :KNOWS[0]{}, Node[1]{name:"B"}] | :KNOWS[0]{}
2 rows
Identifier with uncommon characters
To introduce an identifier that is made up of characters outside of the English alphabet, you
can use the backquote (`) to enclose the identifier, like this:
Query
MATCH (`This isn't a common identifier`)
WHERE `This isn't a common identifier`.name='A'
RETURN `This isn't a common identifier`.happy
The node with name "A" is returned
Result
`This isn't a common identifier`.happy
"Yes!"
1 row
Column alias
If the name of the column should be different from the expression used, you can rename it by using AS
<new name>.
Query
MATCH (a { name: "A" })
RETURN a.age AS SomethingTotallyDifferent
Returns the age property of a node, but renames the column.
Result
SomethingTotallyDifferent
55
1 row
Optional properties
If a property might or might not be there, you can still select it as usual. It will be treated as NULL if it is
missing.
Query
MATCH (n)
RETURN n.age
This example returns the age when the node has that property, or null if the property is not there.
Result
n.age
55
<null>
2 rows
Other expressions
Any expression can be used as a return item: literals, predicates, properties, functions, and everything
else.
Query
MATCH (a { name: "A" })
RETURN a.age > 30, "I'm a literal",(a)-->()
Returns a predicate, a literal, and a pattern expression.
Result
a.age > 30 | "I'm a literal" | (a)-->()
true | "I'm a literal" | [[Node[0]{name:"A", happy:"Yes!", age:55}, :BLOCKS[1]{}, Node[1]{name:"B"}], [Node[0]{name:"A", happy:"Yes!", age:55}, :KNOWS[0]{}, Node[1]{name:"B"}]]
1 row
Unique results
DISTINCT retrieves only unique rows depending on the columns that have been selected to output.
Query
MATCH (a { name: "A" })-->(b)
RETURN DISTINCT b
The node named B is returned by the query, but only once.
Result
b
Node[1]{name:"B"}
1 row
10.2. Order by
ORDER BY is a sub-clause following RETURN or WITH, and it specifies that the output should be
sorted and how.
Note that you cannot sort on nodes or relationships, only on their properties. ORDER BY relies on
comparisons to sort the output, see the section called “Ordering and Comparison of Values” [125].
In terms of scope of identifiers, ORDER BY follows special rules, depending on if the projecting RETURN
or WITH clause is either aggregating or DISTINCT. If it is an aggregating or DISTINCT projection, only the
identifiers available in the projection are available. If the projection does not alter the output cardinality
(which aggregation and DISTINCT do), identifiers available from before the projecting clause are also
available. When the projection clause shadows already existing identifiers, only the new identifiers are
available.
Lastly, it is not allowed to use aggregating expressions in the ORDER BY sub-clause if they are not also
listed in the projecting clause. This last rule is to make sure that ORDER BY does not change the results,
only the order of them.
Figure10.2.Graph
nam e = 'A'
age = 34
length = 170
nam e = 'B'
age = 34
KNOWS
nam e = 'C'
age = 32
length = 185
KNOWS
Order nodes by property
ORDER BY is used to sort the output.
Query
MATCH (n)
RETURN n
ORDER BY n.name
The nodes are returned, sorted by their name.
Result
n
Node[0]{name:"A", age:34, length:170}
Node[1]{name:"B", age:34}
Node[2]{name:"C", age:32, length:185}
3 rows
Order nodes by multiple properties
You can order by multiple properties by stating each identifier in the ORDER BY clause. Cypher will sort
the result by the first identifier listed and, for equal values, go on to the next property in the ORDER BY
clause, and so on.
Query
MATCH (n)
RETURN n
ORDER BY n.age, n.name
This returns the nodes, sorted first by their age, and then by their name.
Result
n
Node[2]{name:"C", age:32, length:185}
Node[0]{name:"A", age:34, length:170}
Node[1]{name:"B", age:34}
3 rows
Order nodes in descending order
By adding DESC[ENDING] after the identifier to sort on, the sort will be done in reverse order.
Query
MATCH (n)
RETURN n
ORDER BY n.name DESC
The example returns the nodes, sorted by their name in reverse order.
Result
n
Node[2]{name:"C", age:32, length:185}
Node[1]{name:"B", age:34}
Node[0]{name:"A", age:34, length:170}
3 rows
Ordering NULL
When sorting the result set, NULL will always come at the end of the result set for ascending sorting, and
first when doing descending sort.
Query
MATCH (n)
RETURN n.length, n
ORDER BY n.length
The nodes are returned sorted by the length property, with a node without that property last.
Result
n.length n
170 Node[0]{name:"A", age:34, length:170}
185 Node[2]{name:"C", age:32, length:185}
<null> Node[1]{name:"B", age:34}
3 rows
10.3. Limit
LIMIT constrains the number of rows in the output.
LIMIT accepts any expression that evaluates to a positive integer however the expression cannot refer
to nodes or relationships.
Figure10.3.Graph
nam e = 'D' nam e = 'E'
nam e = 'A'
KNOWS KNOWS
nam e = 'C'
KNOWS
nam e = 'B'
KNOWS
Return first part
To return a subset of the result, starting from the top, use this syntax:
Query
MATCH (n)
RETURN n
ORDER BY n.name
LIMIT 3
The top three items are returned by the example query.
Result
n
Node[2]{name:"A"}
Node[3]{name:"B"}
Node[4]{name:"C"}
3 rows
Return first from expression
Limit accepts any expression that evaluates to a positive integer as long as it is not referring to any
external identifiers:
Parameters
{
"p" : 12
}
Query
MATCH (n)
RETURN n
ORDER BY n.name
LIMIT toInt(3 * rand())+ 1
Returns one to three top items
Result
n
Node[2]{name:"A"}
Node[3]{name:"B"}
2 rows
10.4. Skip
SKIP defines from which row to start including the rows in the output.
By using SKIP, the result set will get trimmed from the top. Please note that no guarantees are made on
the order of the result unless the query specifies the ORDER BY clause. SKIP accepts any expression that
evaluates to a positive integer however the expression cannot refer to nodes or relationships.
Figure10.4.Graph
nam e = 'D' nam e = 'E'
nam e = 'A'
KNOWS KNOWS
nam e = 'C'
KNOWS
nam e = 'B'
KNOWS
Skip first three
To return a subset of the result, starting from the fourth result, use the following syntax:
Query
MATCH (n)
RETURN n
ORDER BY n.name
SKIP 3
The first three nodes are skipped, and only the last two are returned in the result.
Result
n
Node[0]{name:"D"}
Node[1]{name:"E"}
2 rows
Return middle two
To return a subset of the result, starting from somewhere in the middle, use this syntax:
Query
MATCH (n)
RETURN n
ORDER BY n.name
SKIP 1
LIMIT 2
Two nodes from the middle are returned.
Result
n
Node[3]{name:"B"}
Node[4]{name:"C"}
2 rows
Skip first from expression
Skip accepts any expression that evaluates to a positive integer as long as it is not referring to any
external identifiers:
Query
MATCH (n)
RETURN n
ORDER BY n.name
SKIP toInt(3*rand())+ 1
One to three nodes are skipped from the top, and the remaining nodes are returned in the result.
Result
n
Node[3]{name:"B"}
Node[4]{name:"C"}
Node[0]{name:"D"}
Node[1]{name:"E"}
4 rows
10.5. With
The WITH clause allows query parts to be chained together, piping the results from one to
be used as starting points or criteria in the next.
Using WITH, you can manipulate the output before it is passed on to the following query parts. The
manipulations can be of the shape and/or number of entries in the result set.
One common usage of WITH is to limit the number of entries that are then passed on to other MATCH
clauses. By combining ORDER BY and LIMIT, it’s possible to get the top X entries by some criteria, and then
bring in additional data from the graph.
Another use is to filter on aggregated values. WITH is used to introduce aggregates which can then be
used in predicates in WHERE. These aggregate expressions create new bindings in the results. WITH can
also, like RETURN, alias expressions that are introduced into the results, using the aliases as binding names.
WITH is also used to separate reading from updating of the graph. Every part of a query must be either
read-only or write-only. When going from a writing part to a reading part, the switch must be done with
a WITH clause.
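A sketch of switching from a writing part to a reading part with WITH (the Person label and names are assumptions for illustration):

```
CREATE (me:Person { name: "Me" })
WITH me
MATCH (other:Person)
WHERE other.name <> me.name
RETURN count(other)
```

The CREATE part writes to the graph; the WITH clause then allows the subsequent MATCH to read from it within the same query.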
Figure10.5.Graph
nam e = 'David'
nam e = 'Anders'
KNOWS
nam e = 'Ceasar'
BLOCKS
nam e = 'Bossman'
KNOWS
nam e = 'Emil'
KNOWS
BLOCKS
KNOWS
Filter on aggregate function results
Aggregated results have to pass through a WITH clause to be able to filter on.
Query
MATCH (david { name: "David" })--(otherPerson)-->()
WITH otherPerson, count(*) AS foaf
WHERE foaf > 1
RETURN otherPerson
Persons connected to David who have more than one outgoing relationship will be returned by the
query.
Result
otherPerson
Node[2]{name:"Anders"}
1 row
Sort results before using collect on them
You can sort your results before passing them to collect, thus sorting the resulting collection.
Query
MATCH (n)
WITH n
ORDER BY n.name DESC LIMIT 3
RETURN collect(n.name)
A list of the names of people in reverse order, limited to 3, in a collection.
Result
collect(n.name)
["Emil", "David", "Ceasar"]
1 row
Limit branching of your path search
You can match paths, limit to a certain number, and then match again using those paths as a base, as
well as any number of similar limited searches.
Query
MATCH (n { name: "Anders" })--(m)
WITH m
ORDER BY m.name DESC LIMIT 1
MATCH (m)--(o)
RETURN o.name
Starting at Anders, find all matching nodes, order by name descending and get the top result, then find
all the nodes connected to that top result, and return their names.
Result
o.name
"Bossman"
"Anders"
2 rows
10.6. Unwind
UNWIND expands a collection into a sequence of rows.
With UNWIND, you can transform any collection back into individual rows. These collections can be
parameters that were passed in, previously COLLECTed results, or other collection expressions.
One common usage of unwind is to create distinct collections. Another is to create data from
parameter collections that are provided to the query.
UNWIND requires you to specify a new name for the inner values.
Unwind a collection
We want to transform the literal collection into rows named x and return them.
Query
UNWIND [1,2,3] AS x
RETURN x
Each value of the original collection is returned as an individual row.
Result
x
1
2
3
3 rows
Create a distinct collection
We want to transform a collection of duplicates into a set using DISTINCT.
Query
WITH [1,1,2,2] AS coll UNWIND coll AS x
WITH DISTINCT x
RETURN collect(x) AS set
Each value of the original collection is unwound and passed through DISTINCT to create a unique set.
Result
set
[1, 2]
1 row
Create nodes from a collection parameter
Create a number of nodes and relationships from a parameter-list without using FOREACH.
Parameters
{
"events" : [ {
"year" : 2014,
"id" : 1
}, {
"year" : 2014,
"id" : 2
} ]
}
Query
UNWIND { events } AS event
MERGE (y:Year { year:event.year })
MERGE (y)<-[:IN]-(e:Event { id:event.id })
RETURN e.id AS x
ORDER BY x
Each value of the original collection is unwound and passed through MERGE to find or create the nodes
and relationships.
Result
x
1
2
2 rows
Nodes created: 3
Relationships created: 2
Properties set: 3
Labels added: 3
10.7. Union
The UNION clause is used to combine the result of multiple queries.
It combines the results of two or more queries into a single result set that includes all the rows that
belong to all queries in the union.
The number and the names of the columns must be identical in all queries combined by using UNION.
To keep all the result rows, use UNION ALL. Using just UNION will combine and remove duplicates from the
result set.
Figure10.6.Graph
Actor
nam e = 'Anthony Hopkins'
Movie
title = 'Hitchcock'
ACTS_IN Actor
nam e = 'Helen Mirren'
KNOWS
ACTS_IN
Actor
nam e = 'Hitchcock'
Combine two queries
Combining the results from two queries is done using UNION ALL.
Query
MATCH (n:Actor)
RETURN n.name AS name
UNION ALL MATCH (n:Movie)
RETURN n.title AS name
The combined result is returned, including duplicates.
Result
name
"Anthony Hopkins"
"Helen Mirren"
"Hitchcock"
"Hitchcock"
4 rows
Combine two queries and remove duplicates
By not including ALL in the UNION, duplicates are removed from the combined result set.
Query
MATCH (n:Actor)
RETURN n.name AS name
UNION
MATCH (n:Movie)
RETURN n.title AS name
The combined result is returned, without duplicates.
Result
name
"Anthony Hopkins"
"Helen Mirren"
"Hitchcock"
3 rows
10.8. Using
USING is used to influence the decisions of the planner when building an execution plan for
a query.
Caution
Forcing planner behavior is an advanced feature, and should be used with caution by
experienced developers and/or database administrators only, as it may cause queries to
perform poorly.
When executing a query, Neo4j needs to decide where in the query graph to start matching. This is
done by looking at the MATCH clause and the WHERE conditions and using that information to find useful
indexes.
The index chosen might not be the best one, though: sometimes multiple indexes could be used, and
Neo4j may have picked the wrong one (from a performance point of view).
You can force Neo4j to use a specific starting point through the USING clause. This is called giving an
index hint.
If your query matches large parts of an index, it might be faster to scan the label and filter out nodes
that do not match. To do this, you can use USING SCAN. It will force Cypher to not use an index that could
have been used, and instead do a label scan.
Note
You cannot use index hints if your query has a START clause.
Query using an index hint
To query using an index hint, use USING INDEX.
Query
MATCH (n:Swede)
USING INDEX n:Swede(surname)
WHERE n.surname = 'Taylor'
RETURN n
Query Plan
+-----------------+----------------+------+---------+-------------+-----------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+------+---------+-------------+-----------------+
| +ProduceResults | 1 | 1 | 0 | n | n |
| | +----------------+------+---------+-------------+-----------------+
| +NodeIndexSeek | 1 | 1 | 2 | n | :Swede(surname) |
+-----------------+----------------+------+---------+-------------+-----------------+
Total database accesses: 2
Query using multiple index hints
To query using multiple index hints, use USING INDEX.
Query
MATCH (m:German)-->(n:Swede)
USING INDEX m:German(surname)
USING INDEX n:Swede(surname)
WHERE m.surname = 'Plantikow' AND n.surname = 'Taylor'
RETURN m
Query Plan
+-------------------+------+---------+----------------+----------------+
| Operator | Rows | DB Hits | Identifiers | Other |
+-------------------+------+---------+----------------+----------------+
| +ColumnFilter | 1 | 0 | m | keep columns m |
| | +------+---------+----------------+----------------+
| +TraversalMatcher | 1 | 11 | anon[17], m, n | n, anon[17], m |
+-------------------+------+---------+----------------+----------------+
Total database accesses: 11
Hinting a label scan
If the best performance is to be had by scanning all nodes in a label and then filtering on that set, use
USING SCAN.
Query
MATCH (m:German)
USING SCAN m:German
WHERE m.surname = 'Plantikow'
RETURN m
Query Plan
+------------------+----------------+------+---------+-------------+------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+------------------+----------------+------+---------+-------------+------------------------------+
| +ProduceResults | 1 | 1 | 0 | m | m |
| | +----------------+------+---------+-------------+------------------------------+
| +Filter | 1 | 1 | 1 | m | m.surname == { AUTOSTRING0} |
| | +----------------+------+---------+-------------+------------------------------+
| +NodeByLabelScan | 1 | 1 | 2 | m | :German |
+------------------+----------------+------+---------+-------------+------------------------------+
Total database accesses: 3
Chapter11.Reading Clauses
The flow of data within a Cypher query is an unordered sequence of maps with key-value pairs: a set
of possible bindings between the identifiers in the query and values derived from the database. This set
is refined and augmented by subsequent parts of the query.
11.1. Match
The MATCH clause is used to search for the pattern described in it.
Introduction
The MATCH clause allows you to specify the patterns Neo4j will search for in the database. This is
the primary way of getting data into the current set of bindings. It is worth reading up more on the
specification of the patterns themselves in Section 9.6, “Patterns” [127].
MATCH is often coupled to a WHERE part which adds restrictions, or predicates, to the MATCH patterns,
making them more specific. The predicates are part of the pattern description, not a filter applied after
the matching is done. This means that WHERE should always be put together with the MATCH clause it belongs
to.
MATCH can occur at the beginning of the query or later, possibly after a WITH. If it is the first clause,
nothing will have been bound yet, and Neo4j will design a search to find the results matching the clause
and any associated predicates specified in any WHERE part. This could involve a scan of the database,
a search for nodes of a certain label, or a search of an index to find starting points for the pattern
matching. Nodes and relationships found by this search are available as bound pattern elements, and
can be used for pattern matching of sub-graphs. They can also be used in any further MATCH clauses,
where Neo4j will use the known elements, and from there find further unknown elements.
Cypher is declarative, and so usually the query itself does not specify the algorithm to use to perform
the search. Neo4j will automatically work out the best approach to finding start nodes and matching
patterns. Predicates in WHERE parts can be evaluated before pattern matching, during pattern matching,
or after finding matches. However, there are cases where you can influence the decisions taken by
the query compiler. Read more about indexes in Section 14.1, “Indexes” [244], and more about
specifying index hints to force Neo4j to use a specific index in Section 10.8, “Using” [152].
Tip
To understand more about the patterns used in the MATCH clause, read Section 9.6,
“Patterns” [127].
The following graph is used for the examples below:
Figure11.1.Graph
Person
name = 'Oliver Stone'
Movie
name = 'WallStreet'
title = 'Wall Street'
DIRECTED
Person
name = 'Charlie Sheen'
ACTED_IN Person
name = 'Martin Sheen'
FATHER
ACTED_IN
Movie
title = 'The American President'
name = 'TheAm ericanPresident'
ACTED_IN
Person
name = 'Rob Reiner'
DIRECTED
Person
name = 'Michael Douglas'
ACTED_IN ACTED_IN
name = 'Rob Reiner'
name = 'Charlie Sheen'
TYPE THAT HAS SPACE IN IT
Basic node finding
Get all nodes
By just specifying a pattern with a single node and no labels, all nodes in the graph will be returned.
Query
MATCH (n)
RETURN n
Returns all the nodes in the database.
Result
n
Node[0]{name:"Oliver Stone"}
Node[1]{name:"Charlie Sheen"}
Node[2]{name:"Martin Sheen"}
Node[3]{title:"The American President", name:"TheAmericanPresident"}
Node[4]{name:"WallStreet", title:"Wall Street"}
Node[5]{name:"Rob Reiner"}
Node[6]{name:"Michael Douglas"}
Node[7]{name:"Rob Reiner"}
Node[8]{name:"Charlie Sheen"}
9 rows
Get all nodes with a label
Getting all nodes with a label on them is done with a single node pattern where the node has a label on
it.
Query
MATCH (movie:Movie)
RETURN movie
Returns all the movies in the database.
Result
movie
Node[3]{title:"The American President", name:"TheAmericanPresident"}
Node[4]{name:"WallStreet", title:"Wall Street"}
2 rows
Related nodes
The symbol -- means related to, without regard to type or direction of the relationship.
Query
MATCH (director { name:'Oliver Stone' })--(movie)
RETURN movie.title
Returns all the movies directed by Oliver Stone.
Result
movie.title
"Wall Street"
1 row
Match with labels
To constrain your pattern with labels on nodes, add them to your pattern nodes using the label syntax.
Query
MATCH (charlie:Person { name:'Charlie Sheen' })--(movie:Movie)
RETURN movie
Return any nodes connected with the Person Charlie that are labeled Movie.
Result
movie
Node[4]{name:"WallStreet", title:"Wall Street"}
1 row
Relationship basics
Outgoing relationships
When the direction of a relationship is interesting, it is shown by using --> or <--, like this:
Query
MATCH (martin { name:'Martin Sheen' })-->(movie)
RETURN movie.title
Returns nodes connected to Martin by outgoing relationships.
Result
movie.title
"The American President"
"Wall Street"
2 rows
Directed relationships and identifier
If an identifier is needed, either for filtering on properties of the relationship, or to return the
relationship, this is how you introduce the identifier.
Query
MATCH (martin { name:'Martin Sheen' })-[r]->(movie)
RETURN r
Returns all outgoing relationships from Martin.
Result
r
:ACTED_IN[3]{}
:ACTED_IN[1]{}
2 rows
Match by relationship type
When you know the relationship type you want to match on, you can specify it by using a colon
together with the relationship type.
Query
MATCH (wallstreet { title:'Wall Street' })<-[:ACTED_IN]-(actor)
RETURN actor
Returns nodes that ACTED_IN Wall Street.
Result
actor
Node[6]{name:"Michael Douglas"}
Node[2]{name:"Martin Sheen"}
Node[1]{name:"Charlie Sheen"}
3 rows
Match by multiple relationship types
To match on one of multiple types, you can specify this by chaining them together with the pipe symbol |.
Query
MATCH (wallstreet { title:'Wall Street' })<-[:ACTED_IN|:DIRECTED]-(person)
RETURN person
Returns nodes with an ACTED_IN or DIRECTED relationship to Wall Street.
Result
person
Node[0]{name:"Oliver Stone"}
Node[6]{name:"Michael Douglas"}
Node[2]{name:"Martin Sheen"}
Node[1]{name:"Charlie Sheen"}
4 rows
Match by relationship type and use an identifier
If you both want to introduce an identifier to hold the relationship, and specify the relationship type
you want, just add them both, like this.
Query
MATCH (wallstreet { title:'Wall Street' })<-[r:ACTED_IN]-(actor)
RETURN r
Returns nodes that ACTED_IN Wall Street.
Result
r
:ACTED_IN[2]{}
:ACTED_IN[1]{}
:ACTED_IN[0]{}
3 rows
Relationships in depth
Note
Inside a single pattern, relationships will only be matched once. You can read more about
this in Section8.4, “Uniqueness” [111].
Relationship types with uncommon characters
Sometimes your database will have types with non-letter characters, or with spaces in them. Use `
(backtick) to quote these.
Query
MATCH (n { name:'Rob Reiner' })-[r:`TYPE THAT HAS SPACE IN IT`]->()
RETURN r
Returns a relationship of a type with spaces in it.
Result
r
:TYPE THAT HAS SPACE IN IT[8]{}
1 row
Multiple relationships
Relationships can be expressed by using multiple statements in the form of ()--(), or they can be
strung together, like this:
Query
MATCH (charlie { name:'Charlie Sheen' })-[:ACTED_IN]->(movie)<-[:DIRECTED]-(director)
RETURN charlie,movie,director
Returns the three nodes in the path.
Result
charlie movie director
Node[1]{name:"Charlie Sheen"} Node[4]{name:"WallStreet",
title:"Wall Street"}
Node[0]{name:"Oliver Stone"}
1 row
Variable length relationships
Nodes that are a variable number of relationship→node hops away can be found using the following
syntax: -[:TYPE*minHops..maxHops]->. minHops and maxHops are optional and default to 1 and infinity
respectively. When no bounds are given the dots may be omitted.
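As an illustrative sketch (these are not queries from the examples below, but they assume the same graph), the bound variants look like this:

```cypher
// exactly two hops
MATCH (martin { name:'Martin Sheen' })-[:ACTED_IN*2]-(x) RETURN x
// two hops or more
MATCH (martin { name:'Martin Sheen' })-[:ACTED_IN*2..]-(x) RETURN x
// at most three hops
MATCH (martin { name:'Martin Sheen' })-[:ACTED_IN*..3]-(x) RETURN x
// any number of hops; with no bounds given, the dots may be omitted
MATCH (martin { name:'Martin Sheen' })-[:ACTED_IN*]-(x) RETURN x
```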
Query
MATCH (martin { name:"Martin Sheen" })-[:ACTED_IN*1..2]-(x)
RETURN x
Returns nodes that are 1 or 2 relationships away from Martin.
Result
x
Node[4]{name:"WallStreet", title:"Wall Street"}
Node[1]{name:"Charlie Sheen"}
Node[6]{name:"Michael Douglas"}
Node[3]{title:"The American President", name:"TheAmericanPresident"}
Node[6]{name:"Michael Douglas"}
5 rows
Relationship identifier in variable length relationships
When the connection between two nodes is of variable length, a relationship identifier becomes a
collection of relationships.
Query
MATCH (actor { name:'Charlie Sheen' })-[r:ACTED_IN*2]-(co_actor)
RETURN r
The query returns a collection of relationships.
Result
r
[:ACTED_IN[0]{}, :ACTED_IN[1]{}]
[:ACTED_IN[0]{}, :ACTED_IN[2]{}]
2 rows
Match with properties on a variable length path
A variable length relationship with properties defined on it means that all relationships in the path
must have the property set to the given value. In this query, there are two paths between Charlie Sheen
and his dad Martin Sheen. One of them includes a “blocked” relationship and the other doesn’t. In this
case we first alter the original graph by using the following query to add “blocked” and “unblocked”
relationships:
MATCH (charlie:Person { name:'Charlie Sheen' }),(martin:Person { name:'Martin Sheen' })
CREATE (charlie)-[:X { blocked:false }]->(:Unblocked)<-[:X { blocked:false }]-(martin)
CREATE (charlie)-[:X { blocked:true }]->(:Blocked)<-[:X { blocked:false }]-(martin);
This means that we are starting out with the following graph:
(Diagram: the original movie graph, extended with Unblocked and Blocked nodes between Charlie Sheen and Martin Sheen. Both Charlie and Martin have an X {blocked:false} relationship to the Unblocked node; Charlie has an X {blocked:true} relationship to the Blocked node, while Martin's X relationship to it has blocked:false.)
Query
MATCH p =(charlie:Person)-[* { blocked:false }]-(martin:Person)
WHERE charlie.name = 'Charlie Sheen' AND martin.name = 'Martin Sheen'
RETURN p
Returns the paths between Charlie and Martin Sheen where all relationships have the blocked property
set to FALSE.
Result
p
[Node[1]{name:"Charlie Sheen"}, :X[9]{blocked:false}, Node[9]{}, :X[10]{blocked:false}, Node[2]
{name:"Martin Sheen"}]
1 row
Zero length paths
Using variable length paths that have the lower bound zero means that two identifiers can point to
the same node. If the distance between two nodes is zero, they are by definition the same node.
Note that when matching zero length paths the result may contain a match even when matching on a
relationship type not in use.
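To illustrate the note above, the following sketch matches on a made-up relationship type that does not exist in the graph; because the lower bound is zero, the Wall Street node itself is still returned:

```cypher
// NONEXISTENT is a hypothetical type not present in the example graph;
// the zero-length part of the pattern still matches the node itself
MATCH (wallstreet:Movie { title:'Wall Street' })-[:NONEXISTENT*0..1]-(x)
RETURN x
```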
Query
MATCH (wallstreet:Movie { title:'Wall Street' })-[*0..1]-(x)
RETURN x
Returns all nodes that are zero or one relationships away from Wall Street.
Result
x
Node[4]{name:"WallStreet", title:"Wall Street"}
Node[1]{name:"Charlie Sheen"}
Node[2]{name:"Martin Sheen"}
Node[6]{name:"Michael Douglas"}
Node[0]{name:"Oliver Stone"}
5 rows
Named path
If you want to return or filter on a path in your pattern graph, you can introduce a named path.
Query
MATCH p =(michael { name:'Michael Douglas' })-->()
RETURN p
Returns the two paths starting from Michael.
Result
p
[Node[6]{name:"Michael Douglas"}, :ACTED_IN[4]{}, Node[3]{title:"The American President",
name:"TheAmericanPresident"}]
[Node[6]{name:"Michael Douglas"}, :ACTED_IN[2]{}, Node[4]{name:"WallStreet", title:"Wall Street"}]
2 rows
Matching on a bound relationship
When your pattern contains a bound relationship, and that relationship pattern doesn’t specify
direction, Cypher will try to match the relationship in both directions.
Query
MATCH (a)-[r]-(b)
WHERE id(r) = 0
RETURN a,b
This returns the two connected nodes, once as the start node, and once as the end node.
Result
a b
Node[1]{name:"Charlie Sheen"} Node[4]{name:"WallStreet", title:"Wall Street"}
Node[4]{name:"WallStreet", title:"Wall Street"} Node[1]{name:"Charlie Sheen"}
2 rows
Shortest path
Single shortest path
Finding a single shortest path between two nodes is as easy as using the shortestPath function. It’s done
like this:
Query
MATCH (martin:Person { name:"Martin Sheen" }),(oliver:Person { name:"Oliver Stone" }),
p = shortestPath((martin)-[*..15]-(oliver))
RETURN p
This means: find a single shortest path between two nodes, as long as the path is max 15 relationships
long. Inside of the parentheses you define a single link of a path: the starting node, the connecting
relationship and the end node. Characteristics describing the relationship like relationship type, max
hops and direction are all used when finding the shortest path. You can also mark the path as optional.
Result
p
[Node[2]{name:"Martin Sheen"}, :ACTED_IN[1]{}, Node[4]{name:"WallStreet", title:"Wall
Street"}, :DIRECTED[5]{}, Node[0]{name:"Oliver Stone"}]
1 row
All shortest paths
Finds all the shortest paths between two nodes.
Query
MATCH (martin:Person { name:"Martin Sheen" }),(michael:Person { name:"Michael Douglas" }),
p = allShortestPaths((martin)-[*]-(michael))
RETURN p
Finds the two shortest paths between Martin and Michael.
Result
p
[Node[2]{name:"Martin Sheen"}, :ACTED_IN[3]{}, Node[3]{title:"The American President",
name:"TheAmericanPresident"}, :ACTED_IN[4]{}, Node[6]{name:"Michael Douglas"}]
[Node[2]{name:"Martin Sheen"}, :ACTED_IN[1]{}, Node[4]{name:"WallStreet", title:"Wall
Street"}, :ACTED_IN[2]{}, Node[6]{name:"Michael Douglas"}]
2 rows
Get node or relationship by id
Node by id
Searching for nodes by id can be done with the id function in a predicate.
Note
Neo4j reuses its internal ids when nodes and relationships are deleted. This means that
applications that use and rely on internal Neo4j ids are brittle and at risk of making mistakes.
It is better to use application-generated ids.
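A minimal sketch of the application-generated id approach; the uuid property name and its value here are hypothetical, not part of the example graph:

```cypher
// store your own identifier as a property when creating the node
CREATE (n:Person { uuid: 'a1b2c3d4', name: 'Charlie Sheen' })

// later, look the node up by that property instead of by id(n)
MATCH (n:Person { uuid: 'a1b2c3d4' })
RETURN n
```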
Query
MATCH (n)
WHERE id(n) = 1
RETURN n
The corresponding node is returned.
Result
n
Node[1]{name:"Charlie Sheen"}
1 row
Relationship by id
Searching for relationships by id can be done with the id function in a predicate.
This is not recommended practice. See the section called “Node by id” [162] for more information on
the use of Neo4j ids.
Query
MATCH ()-[r]->()
WHERE id(r) = 0
RETURN r
The relationship with id 0 is returned.
Result
r
:ACTED_IN[0]{}
1 row
Multiple nodes by id
Multiple nodes are selected by specifying them in an IN clause.
Query
MATCH (n)
WHERE id(n) IN [1, 2, 0]
RETURN n
This returns the nodes listed in the IN expression.
Result
n
Node[0]{name:"Oliver Stone"}
Node[1]{name:"Charlie Sheen"}
Node[2]{name:"Martin Sheen"}
3 rows
11.2. Optional Match
The OPTIONAL MATCH clause is used to search for the pattern described in it, while using NULLs
for missing parts of the pattern.
Introduction
OPTIONAL MATCH matches patterns against your graph database, just like MATCH does. The difference is that
if no matches are found, OPTIONAL MATCH will use NULLs for missing parts of the pattern. OPTIONAL MATCH
could be considered the Cypher equivalent of the outer join in SQL.
Either the whole pattern is matched, or nothing is matched. Remember that WHERE is part of the pattern
description, and the predicates will be considered while looking for matches, not after. This matters
especially in the case of multiple (OPTIONAL) MATCH clauses, where it is crucial to put WHERE together with
the MATCH it belongs to.
Tip
To understand the patterns used in the OPTIONAL MATCH clause, read Section 9.6,
“Patterns” [127].
The following graph is used for the examples below:
Figure11.2.Graph
Person
nam e = 'Oliver Stone'
Movie
nam e = 'WallStreet'
title = 'Wall Street'
DIRECTED
Person
nam e = 'Charlie Sheen'
ACTED_IN Person
nam e = 'Martin Sheen'
FATHER
ACTED_IN
Movie
title = 'The Am erican President'
nam e = 'TheAm ericanPresident'
ACTED_IN
Person
nam e = 'Rob Reiner'
DIRECTED
Person
nam e = 'Michael Douglas'
ACTED_IN ACTED_IN
Relationship
If a relationship is optional, use the OPTIONAL MATCH clause. This is similar to how a SQL outer join works.
If the relationship is there, it is returned. If it is not, NULL is returned in its place.
Query
MATCH (a:Movie { title: 'Wall Street' })
OPTIONAL MATCH (a)-->(x)
RETURN x
Returns NULL, since the node has no outgoing relationships.
Result
x
<null>
1 row
Properties on optional elements
Returning a property from an optional element that is NULL will also return NULL.
Query
MATCH (a:Movie { title: 'Wall Street' })
OPTIONAL MATCH (a)-->(x)
RETURN x, x.name
Returns the element x (NULL in this query), and NULL as its name.
Result
x x.name
<null> <null>
1 row
Optional typed and named relationship
Just as with a normal relationship, you can decide which identifier it goes into, and what relationship
type you need.
Query
MATCH (a:Movie { title: 'Wall Street' })
OPTIONAL MATCH (a)-[r:ACTS_IN]->()
RETURN r
This returns NULL, since the node has no outgoing ACTS_IN relationships.
Result
r
<null>
1 row
11.3. Where
WHERE adds constraints to the patterns in a MATCH or OPTIONAL MATCH clause or filters the results
of a WITH clause.
WHERE is not a clause in its own right; rather, it is part of MATCH, OPTIONAL MATCH, START and WITH.
In the case of WITH and START, WHERE simply filters the results.
For MATCH and OPTIONAL MATCH on the other hand, WHERE adds constraints to the patterns described. It
should not be seen as a filter after the matching is finished.
Important
In the case of multiple MATCH / OPTIONAL MATCH clauses, the predicate in WHERE is always
a part of the patterns in the directly preceding MATCH / OPTIONAL MATCH. Both results and
performance may be impacted if the WHERE is put inside the wrong MATCH clause.
Figure11.3.Graph
address = 'Sweden/Malm o'
nam e = 'Tobias'
age = 25
em ail = 'peter_n@exam ple.com'
nam e = 'Peter'
age = 34
Swedish
nam e = 'Andres'
age = 36
belt = 'white'
KNOWS KNOWS
Basic usage
Boolean operations
You can use the expected boolean operators AND and OR, and also the boolean function NOT. See
Section9.8, “Working with NULL” [134] for more information on how this works with NULL.
Query
MATCH (n)
WHERE n.name = 'Peter' XOR (n.age < 30 AND n.name = "Tobias") OR NOT (n.name = "Tobias" OR
n.name="Peter")
RETURN n
Result
n
Node[0]{address:"Sweden/Malmo", name:"Tobias", age:25}
Node[1]{email:"peter_n@example.com", name:"Peter", age:34}
Node[2]{name:"Andres", age:36, belt:"white"}
3 rows
Filter on node label
To filter nodes by label, write a label predicate after the WHERE keyword using WHERE n:foo.
Query
MATCH (n)
WHERE n:Swedish
RETURN n
The "Andres" node will be returned.
Result
n
Node[2]{name:"Andres", age:36, belt:"white"}
1 row
Filter on node property
To filter on a property, write your clause after the WHERE keyword. Filtering on relationship properties
works just the same way.
Query
MATCH (n)
WHERE n.age < 30
RETURN n
"Tobias" is returned because he is younger than 30.
Result
n
Node[0]{address:"Sweden/Malmo", name:"Tobias", age:25}
1 row
Filter on dynamic node property
To filter on a property using a dynamically computed name, use square bracket syntax.
Parameters
{
"prop" : "AGE"
}
Query
MATCH (n)
WHERE n[toLower({ prop })] < 30
RETURN n
"Tobias" is returned because he is younger than 30.
Result
n
Node[0]{address:"Sweden/Malmo", name:"Tobias", age:25}
1 row
Property exists
Use the EXISTS() function to only include nodes or relationships in which a property exists.
Query
MATCH (n)
WHERE exists(n.belt)
RETURN n
"Andres" will be returned because he is the only one with a belt property.
Important
The HAS() function has been superseded by EXISTS() and will be removed in a future release.
Result
n
Node[2]{name:"Andres", age:36, belt:"white"}
1 row
String matching
The start and end of strings can be matched using STARTS WITH and ENDS WITH. To match regardless of
location in a string, use CONTAINS. The matching is case-sensitive.
Match the start of a string
The STARTS WITH operator is used to perform case-sensitive matching on the start of strings.
Query
MATCH (n)
WHERE n.name STARTS WITH 'Pet'
RETURN n
"Peter" will be returned because his name starts with Pet.
Result
n
Node[1]{email:"peter_n@example.com", name:"Peter", age:34}
1 row
Match the end of a string
The ENDS WITH operator is used to perform case-sensitive matching on the end of strings.
Query
MATCH (n)
WHERE n.name ENDS WITH 'ter'
RETURN n
"Peter" will be returned because his name ends with ter.
Result
n
Node[1]{email:"peter_n@example.com", name:"Peter", age:34}
1 row
Match anywhere in a string
The CONTAINS operator is used to perform case-sensitive matching regardless of location in strings.
Query
MATCH (n)
WHERE n.name CONTAINS 'ete'
RETURN n
"Peter" will be returned because his name contains ete.
Result
n
Node[1]{email:"peter_n@example.com", name:"Peter", age:34}
1 row
String matching negation
Use the NOT keyword to exclude all matches on a given string from your result:
Query
MATCH (n)
WHERE NOT n.name ENDS WITH 's'
RETURN n
"Peter" will be returned because his name does not end with s.
Result
n
Node[1]{email:"peter_n@example.com", name:"Peter", age:34}
1 row
Regular expressions
Cypher supports filtering using regular expressions. The regular expression syntax is inherited from
the Java regular expressions1. This includes support for flags that change how strings are matched,
including case-insensitive (?i), multiline (?m) and dotall (?s). Flags are given at the start of the regular
expression, for example MATCH (n) WHERE n.name =~ '(?i)Lon.*' RETURN n will return nodes with name
London or with name LonDoN.
Regular expressions
You can match on regular expressions by using =~ "regexp", like this:
Query
MATCH (n)
WHERE n.name =~ 'Tob.*'
RETURN n
"Tobias" is returned because his name starts with Tob.
Result
n
Node[0]{address:"Sweden/Malmo", name:"Tobias", age:25}
1 row
Escaping in regular expressions
If you need a forward slash inside of your regular expression, escape it. Remember that the backslash
needs to be escaped in string literals.
Query
MATCH (n)
WHERE n.address =~ 'Sweden\\/Malmo'
RETURN n
"Tobias" is returned because his address is in Sweden/Malmo.
Result
n
Node[0]{address:"Sweden/Malmo", name:"Tobias", age:25}
1 row
Case insensitive regular expressions
By prepending a regular expression with (?i), the whole expression becomes case insensitive.
Query
1 https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
MATCH (n)
WHERE n.name =~ '(?i)ANDR.*'
RETURN n
"Andres" is returned because his name starts with ANDR regardless of case.
Result
n
Node[2]{name:"Andres", age:36, belt:"white"}
1 row
Using path patterns in WHERE
Filter on patterns
Patterns are expressions in Cypher, expressions that return a collection of paths. Collection expressions
are also predicates: an empty collection represents false, and a non-empty one represents true.
So, patterns are not only expressions, they are also predicates. The only limitation to your pattern is
that you must be able to express it in a single path. You cannot use commas between multiple paths
like you do in MATCH. You can achieve the same effect by combining multiple patterns with AND.
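For illustration, a sketch of combining two single-path pattern predicates with AND, against the graph above:

```cypher
// nodes that have a KNOWS relationship to both Tobias and Peter
MATCH (n)
WHERE (n)-[:KNOWS]-({ name:'Tobias' }) AND (n)-[:KNOWS]-({ name:'Peter' })
RETURN n
```

This plays the same role that listing two comma-separated paths would play in a MATCH clause.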
Note that you cannot introduce new identifiers here. Although it might look very similar to the MATCH
patterns, the WHERE clause is all about eliminating matched subgraphs. MATCH (a)-[*]->(b) is very
different from WHERE (a)-[*]->(b); the first will produce a subgraph for every path it can find between
a and b, and the latter will eliminate any matched subgraphs where a and b do not have a directed
relationship chain between them.
Query
MATCH (tobias { name: 'Tobias' }),(others)
WHERE others.name IN ['Andres', 'Peter'] AND (tobias)<--(others)
RETURN others
Nodes that have an outgoing relationship to the "Tobias" node are returned.
Result
others
Node[2]{name:"Andres", age:36, belt:"white"}
1 row
Filter on patterns using NOT
The NOT function can be used to exclude a pattern.
Query
MATCH (persons),(peter { name: 'Peter' })
WHERE NOT (persons)-->(peter)
RETURN persons
Nodes that do not have an outgoing relationship to the "Peter" node are returned.
Result
persons
Node[0]{address:"Sweden/Malmo", name:"Tobias", age:25}
Node[1]{email:"peter_n@example.com", name:"Peter", age:34}
2 rows
Filter on patterns with properties
You can also add properties to your patterns:
Query
MATCH (n)
WHERE (n)-[:KNOWS]-({ name:'Tobias' })
RETURN n
Finds all nodes that have a KNOWS relationship to a node with the name "Tobias".
Result
n
Node[2]{name:"Andres", age:36, belt:"white"}
1 row
Filtering on relationship type
You can put the exact relationship type in the MATCH pattern, but sometimes you want to be able to do
more advanced filtering on the type. You can use the special property TYPE to compare the type with
something else. In this example, the query does a regular expression comparison with the name of the
relationship type.
Query
MATCH (n)-[r]->()
WHERE n.name='Andres' AND type(r)=~ 'K.*'
RETURN r
This returns relationships that have a type whose name starts with K.
Result
r
:KNOWS[1]{}
:KNOWS[0]{}
2 rows
Collections
IN operator
To check if an element exists in a collection, you can use the IN operator.
Query
MATCH (a)
WHERE a.name IN ["Peter", "Tobias"]
RETURN a
This query shows how to check whether a property value is in a literal collection.
Result
a
Node[0]{address:"Sweden/Malmo", name:"Tobias", age:25}
Node[1]{email:"peter_n@example.com", name:"Peter", age:34}
2 rows
Missing properties and values
Default to false if property is missing
As missing properties evaluate to NULL, the comparison in the example will evaluate to FALSE for nodes
without the belt property.
Query
MATCH (n)
WHERE n.belt = 'white'
RETURN n
Only nodes with white belts are returned.
Result
n
Node[2]{name:"Andres", age:36, belt:"white"}
1 row
Default to true if property is missing
If you want to compare a property on a graph element, but only if it exists, you can compare the
property against both the value you are looking for and NULL, like:
Query
MATCH (n)
WHERE n.belt = 'white' OR n.belt IS NULL RETURN n
ORDER BY n.name
This returns all nodes, even those without the belt property.
Result
n
Node[2]{name:"Andres", age:36, belt:"white"}
Node[1]{email:"peter_n@example.com", name:"Peter", age:34}
Node[0]{address:"Sweden/Malmo", name:"Tobias", age:25}
3 rows
Filter on NULL
Sometimes you might want to test if a value or an identifier is NULL. This is done just like SQL does it,
with IS NULL. Also like SQL, the negative is IS NOT NULL, although NOT (x IS NULL) also works.
Query
MATCH (person)
WHERE person.name = 'Peter' AND person.belt IS NULL RETURN person
Nodes that have name Peter but no belt property are returned.
Result
person
Node[1]{email:"peter_n@example.com", name:"Peter", age:34}
1 row
Using ranges
Simple range
To check for an element being inside a specific range, use the inequality operators <, <=, >=, >.
Query
MATCH (a)
WHERE a.name >= 'Peter'
RETURN a
Nodes having a name property lexicographically greater than or equal to Peter are returned.
Result
a
Node[0]{address:"Sweden/Malmo", name:"Tobias", age:25}
Node[1]{email:"peter_n@example.com", name:"Peter", age:34}
2 rows
Composite range
Several inequalities can be used to construct a range.
Query
MATCH (a)
WHERE a.name > 'Andres' AND a.name < 'Tobias'
RETURN a
Nodes having a name property lexicographically between Andres and Tobias are returned.
Result
a
Node[1]{email:"peter_n@example.com", name:"Peter", age:34}
1 row
11.4. Start
Find starting points through legacy indexes.
Important
The START clause should only be used when accessing legacy indexes (see Chapter 35, Legacy
Indexing [621]). In all other cases, use MATCH instead (see Section 11.1, “Match” [155]).
In Cypher, every query describes a pattern, and in that pattern one can have multiple starting points.
A starting point is a relationship or a node where a pattern is anchored. Using START you can only
introduce starting points by legacy index seeks. Note that trying to use a legacy index that doesn’t exist
will generate an error.
This is the graph the examples are using:
Figure11.4.Graph
Node[0]
nam e = 'A'
Node[2]
nam e = 'C'
KNOWS
Node[1]
nam e = 'B'
KNOWS
Get node or relationship from index
Node by index seek
When the starting point can be found by using index seeks, it can be done like this:
node:index-name(key = "value"). In this example, there exists a node index named nodes.
Query
START n=node:nodes(name = "A")
RETURN n
The query returns the node indexed with the name "A".
Result
n
Node[0]{name:"A"}
1 row
Relationship by index seek
When the starting point can be found by using index seeks, it can be done like this:
relationship:index-name(key = "value").
Query
START r=relationship:rels(name = "Andrés")
RETURN r
The relationship indexed with the name property set to "Andrés" is returned by the query.
Result
r
:KNOWS[0]{name:"Andrés"}
1 row
Node by index query
When the starting point can be found by more complex Lucene queries, this is the syntax to use:
node:index-name("query"). This allows you to write more advanced index queries.
Query
START n=node:nodes("name:A")
RETURN n
The node indexed with name "A" is returned by the query.
Result
n
Node[0]{name:"A"}
1 row
11.5. Aggregation
Introduction
To calculate aggregated data, Cypher offers aggregation, much like SQL's GROUP BY.
Aggregate functions take multiple input values and calculate an aggregated value from them. Examples
are avg that calculates the average of multiple numeric values, or min that finds the smallest numeric
value in a set of values.
Aggregation can be done over all the matching subgraphs, or it can be further divided by introducing
key values. These are non-aggregate expressions that are used to group the values going into the
aggregate functions.
So, if the return statement looks something like this:
RETURN n, count(*)
We have two return expressions: n, and count(*). The first, n, is not an aggregate function, so it will be
the grouping key. The latter, count(*), is an aggregate expression. So the matching subgraphs will be
divided into different buckets, depending on the grouping key. The aggregate function will then run on
these buckets, calculating the aggregate values.
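As a rough sketch of this bucketing (plain Python, an analogy rather than Neo4j internals; the `aggregate` helper is hypothetical):

```python
from collections import defaultdict

def aggregate(rows, group_key, agg):
    """Group rows by a non-aggregate key, then run an aggregate per bucket."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group_key]].append(row)
    return {key: agg(bucket) for key, bucket in buckets.items()}

# RETURN n, count(*) -- n is the grouping key, count(*) counts rows per bucket
rows = [{"n": "A"}, {"n": "A"}, {"n": "B"}]
print(aggregate(rows, "n", len))  # {'A': 2, 'B': 1}
```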
If you want to use aggregations to sort your result set, the aggregation must be included in the RETURN
to be used in your ORDER BY.
The last piece of the puzzle is the DISTINCT keyword. It is used to make all values unique before running
them through an aggregate function.
An example might be helpful. In this case, we are running the query against the following data:
Person {name: 'A', property: 13}, Person {name: 'B', property: 33, eyes: 'blue'}, Person {name: 'C', property: 44, eyes: 'blue'}, and two Person nodes with name 'D' (one of them with eyes: 'brown'), connected by KNOWS relationships.
Query
MATCH (me:Person)-->(friend:Person)-->(friend_of_friend:Person)
WHERE me.name = 'A'
RETURN count(DISTINCT friend_of_friend), count(friend_of_friend)
In this example we are trying to find all our friends of friends, and count them. The first aggregate
function, count(DISTINCT friend_of_friend), will only see each friend_of_friend once, because DISTINCT
removes the duplicates. The second aggregate function, count(friend_of_friend), might very well see
the same friend_of_friend multiple times. In this case, both B and C know D, and thus D will get counted
twice when not using DISTINCT.
Result
count(distinct friend_of_friend) count(friend_of_friend)
1 2
1 row
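The difference between the two counts can be sketched in plain Python (an analogy, not Neo4j code):

```python
# friend_of_friend is reached along two paths: both B and C know D,
# so the same node appears twice in the aggregation input
seen = ["D", "D"]

count_distinct = len(set(seen))  # count(DISTINCT friend_of_friend) -> 1
count_all = len(seen)            # count(friend_of_friend)          -> 2
```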
The following examples are assuming the example graph structure below.
Figure11.5.Graph
Person
nam e = 'D'
eyes = 'brown'
Person
nam e = 'A'
property = 13
KNOWS
Person
nam e = 'C'
property = 44
eyes = 'blue'
KNOWS
Person
nam e = 'B'
property = 33
eyes = 'blue'
KNOWS
COUNT
COUNT is used to count the number of rows.
COUNT can be used in two forms: COUNT(*), which just counts the number of matching rows, and
COUNT(<identifier>), which counts the number of non-NULL values in <identifier>.
Count nodes
To count the number of nodes, for example the number of nodes connected to one node, you can use
count(*).
Query
MATCH (n { name: 'A' })-->(x)
RETURN n, count(*)
This returns the start node and the count of related nodes.
Result
n count(*)
Node[1]{name:"A", property:13} 3
1 row
Group Count Relationship Types
To count the groups of relationship types, return the types and count them with count(*).
Query
MATCH (n { name: 'A' })-[r]->()
RETURN type(r), count(*)
The relationship types and their group count are returned by the query.
Result
type(r) count(*)
"KNOWS" 3
1 row
Count entities
Instead of counting the number of results with count(*), it might be more expressive to include the
name of the identifier you care about.
Query
MATCH (n { name: 'A' })-->(x)
RETURN count(x)
The example query returns the number of connected nodes from the start node.
Result
count(x)
3
1 row
Count non-null values
You can count the non-NULL values by using count(<identifier>).
Query
MATCH (n:Person)
RETURN count(n.property)
The count of nodes with the property property set is returned by the query.
Result
count(n.property)
3
1 row
Statistics
sum
The sum aggregation function simply sums all the numeric values it encounters. NULLs are silently
dropped.
Query
MATCH (n:Person)
RETURN sum(n.property)
This returns the sum of all the values in the property property.
Result
sum(n.property)
90
1 row
avg
avg calculates the average of a numeric column.
Query
MATCH (n:Person)
RETURN avg(n.property)
The average of all the values in the property property is returned by the example query.
Result
avg(n.property)
30.0
1 row
percentileDisc
percentileDisc calculates the percentile of a given value over a group, with a percentile from 0.0 to 1.0.
It uses a rounding method, returning the nearest value to the percentile. For interpolated values, see
percentileCont.
Query
MATCH (n:Person)
RETURN percentileDisc(n.property, 0.5)
The 50th percentile of the values in the property property is returned by the example query. In this case,
0.5 is the median, or 50th percentile.
Result
percentileDisc(n.property, 0.5)
33
1 row
percentileCont
percentileCont calculates the percentile of a given value over a group, with a percentile from 0.0 to 1.0.
It uses a linear interpolation method, calculating a weighted average between two values, if the desired
percentile lies between them. For nearest values using a rounding method, see percentileDisc.
Query
MATCH (n:Person)
RETURN percentileCont(n.property, 0.4)
The 40th percentile of the values in the property property is returned by the example query, calculated
with a weighted average.
Result
percentileCont(n.property, 0.4)
29.0
1 row
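The two percentile methods can be sketched in plain Python (an illustration under common definitions; the exact rounding Neo4j applies at boundary percentiles may differ):

```python
import math

def percentile_disc(values, p):
    """Nearest-rank percentile: always returns an actual member of the group."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]

def percentile_cont(values, p):
    """Linear interpolation between the two values nearest the percentile."""
    ordered = sorted(values)
    position = p * (len(ordered) - 1)
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return float(ordered[lower])
    return ordered[lower] + fraction * (ordered[lower + 1] - ordered[lower])

values = [13, 33, 44]  # the property values from the example graph
print(percentile_disc(values, 0.5))  # 33, as in the result above
print(percentile_cont(values, 0.4))  # 29.0, the interpolated 40th percentile
```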
stdev
stdev calculates the standard deviation for a given value over a group. It uses a standard two-pass
method, with N - 1 as the denominator, and should be used when taking a sample of the population
for an unbiased estimate. When the standard deviation of the entire population is being calculated,
stdevp should be used.
Query
MATCH (n)
WHERE n.name IN ['A', 'B', 'C']
RETURN stdev(n.property)
The standard deviation of the values in the property property is returned by the example query.
Result
stdev(n.property)
15.716233645501712
1 row
stdevp
stdevp calculates the standard deviation for a given value over a group. It uses a standard two-pass
method, with N as the denominator, and should be used when calculating the standard deviation for an
entire population. When the standard deviation of only a sample of the population is being calculated,
stdev should be used.
Query
MATCH (n)
WHERE n.name IN ['A', 'B', 'C']
RETURN stdevp(n.property)
The population standard deviation of the values in the property property is returned by the example
query.
Result
stdevp(n.property)
12.832251036613439
1 row
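The two-pass method, with its N - 1 versus N denominators, can be sketched in plain Python (an illustration, not Neo4j internals):

```python
import math

def stdev_sample(values):
    """Two-pass standard deviation with N - 1 as the denominator (stdev)."""
    mean = sum(values) / len(values)                 # first pass: the mean
    squares = sum((v - mean) ** 2 for v in values)   # second pass: squared deviations
    return math.sqrt(squares / (len(values) - 1))

def stdev_population(values):
    """Two-pass standard deviation with N as the denominator (stdevp)."""
    mean = sum(values) / len(values)
    squares = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squares / len(values))

values = [13, 33, 44]  # property values of A, B and C
print(stdev_sample(values))      # 15.716233645501712
print(stdev_population(values))  # 12.832251036613439
```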
max
max finds the largest value in a numeric column.
Query
MATCH (n:Person)
RETURN max(n.property)
The largest of all the values in the property property is returned.
Result
max(n.property)
44
1 row
min
min takes a numeric property as input, and returns the smallest value in that column.
Query
MATCH (n:Person)
RETURN min(n.property)
This returns the smallest of all the values in the property property.
Result
min(n.property)
13
1 row
collect
collect collects all the values into a list. It will ignore NULLs.
Query
MATCH (n:Person)
RETURN collect(n.property)
Returns a single row, with all the values collected.
Result
collect(n.property)
[13, 33, 44]
1 row
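The statistics functions above can be mimicked in plain Python over the example values (an analogy only; the node lacking the property is modeled as None, which the aggregates drop like NULL):

```python
# Property values from the Person nodes; the node without the property
# contributes None, which is filtered out, as the aggregates ignore NULL
values = [13, 33, 44, None]
present = [v for v in values if v is not None]

total = sum(present)             # sum(n.property)     -> 90
average = total / len(present)   # avg(n.property)     -> 30.0
largest = max(present)           # max(n.property)     -> 44
smallest = min(present)          # min(n.property)     -> 13
collected = present              # collect(n.property) -> [13, 33, 44]
```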
DISTINCT
All aggregation functions also take the DISTINCT modifier, which removes duplicates from the values. So,
to count the number of unique eye colors from nodes related to a, this query can be used:
Query
MATCH (a:Person { name: 'A' })-->(b)
RETURN count(DISTINCT b.eyes)
Returns the number of eye colors.
Result
count(distinct b.eyes)
2
1 row
11.6. Load CSV
LOAD CSV is used to import data from CSV files.
The URL of the CSV file is specified by using FROM followed by an arbitrary expression evaluating to the
URL in question.
It is required to specify an identifier for the CSV data using AS.
LOAD CSV supports resources compressed with gzip and Deflate, as well as ZIP archives.
CSV files can be stored on the database server and are then accessible using a file:/// URL.
Alternatively, LOAD CSV also supports accessing CSV files via HTTPS, HTTP, and FTP.
LOAD CSV will follow HTTP redirects, but for security reasons it will not follow redirects that change the
protocol, for example if the redirect is going from HTTPS to HTTP.
Configuration settings for file URLs
allow_file_urls [465]: This setting determines whether Cypher will allow the use of file:/// URLs when loading data using LOAD CSV. Such URLs identify files on the filesystem of the database server. The default is true.
dbms.security.load_csv_file_url_root [468]: Sets the root directory for file:/// URLs used with the Cypher LOAD CSV clause. This must be set to a single directory on the filesystem of the database server, and will make all requests to load from file:/// URLs relative to the specified directory (similar to how a Unix chroot operates). By default, this setting is not configured.
When not set, file URLs will be resolved as relative to the root of the database server filesystem. In this case, a file URL will typically look like file:///home/username/myfile.csv or file:///C:/Users/username/myfile.csv. Using these URLs in LOAD CSV will read content from files on the database server filesystem, specifically /home/username/myfile.csv and C:\Users\username\myfile.csv respectively. For security reasons you may not want users to be able to load files located anywhere on the database server filesystem, and should set dbms.security.load_csv_file_url_root to a safe directory to load files from.
When set, file URLs will be resolved as relative to the directory it is set to. In this case, a file URL will typically look like file:///myfile.csv or file:///myproject/myfile.csv.
If set to data/import, the above URLs in LOAD CSV would read content from data/import/myfile.csv and data/import/myproject/myfile.csv respectively, where both are relative to the database install directory.
If set to /home/neo4j, the above URLs in LOAD CSV would read content from /home/neo4j/myfile.csv and /home/neo4j/myproject/myfile.csv respectively.
See the examples below for further details.
There is also a worked example, see Section 12.8, “Importing CSV files with Cypher” [211].
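The chroot-like resolution described above can be sketched in Python (a hypothetical illustration; `resolve_file_url` and its behavior are assumptions, not Neo4j code):

```python
from pathlib import Path
from urllib.parse import urlparse

def resolve_file_url(url, root):
    """Resolve a file:/// URL against a configured root directory,
    rejecting paths that would escape it (chroot-like behavior)."""
    root_dir = Path(root).resolve()
    relative = urlparse(url).path.lstrip("/")
    resolved = (root_dir / relative).resolve()
    if resolved != root_dir and root_dir not in resolved.parents:
        raise ValueError("file URL escapes the configured root directory")
    return resolved

# With the root set to data/import, file:///myproject/myfile.csv would
# resolve to data/import/myproject/myfile.csv
```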
CSV file format
The CSV file to use with LOAD CSV must have the following characteristics:
the character encoding is UTF-8;
the end-of-line termination is system dependent, e.g., it is \n on Unix or \r\n on Windows;
the default field terminator is ,;
the field terminator character can be changed by using the option FIELDTERMINATOR available in the LOAD
CSV command;
quoted strings are allowed in the CSV file and the quotes are dropped when reading the data;
the character for string quotation is the double quote ";
the escape character is \.
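These characteristics can be illustrated with Python's csv module, whose dialect options roughly correspond to the rules above (this is only an illustration of the file format, not how LOAD CSV is implemented):

```python
import csv
import io

# A fragment in the artists-fieldterminator.csv style, with ';' as the
# field terminator and double quotes around each value
data = '"1";"ABBA";"1992"\n"2";"Roxette";"1986"\n'

# delimiter corresponds to FIELDTERMINATOR; the surrounding quotes are
# dropped when the data is read
rows = list(csv.reader(io.StringIO(data), delimiter=";", quotechar='"'))
print(rows[0])  # ['1', 'ABBA', '1992']
```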
Import data from a CSV file
To import data from a CSV file into Neo4j, you can use LOAD CSV to get the data into your query. Then
you write it to your database using the normal updating clauses of Cypher.
artists.csv
"1","ABBA","1992"
"2","Roxette","1986"
"3","Europe","1979"
"4","The Cardigans","1992"
Query
LOAD CSV FROM 'http://neo4j.com/docs/2.3.12/csv/artists.csv' AS line
CREATE (:Artist { name: line[1], year: toInt(line[2])})
A new node with the Artist label is created for each row in the CSV file. In addition, two columns from
the CSV file are set as properties on the nodes.
Result
(empty result)
Nodes created: 4
Properties set: 8
Labels added: 4
Import data from a CSV file containing headers
When your CSV file has headers, you can view each row in the file as a map instead of as an array of
strings.
artists-with-headers.csv
"Id","Name","Year"
"1","ABBA","1992"
"2","Roxette","1986"
"3","Europe","1979"
"4","The Cardigans","1992"
Query
LOAD CSV WITH HEADERS FROM 'http://neo4j.com/docs/2.3.12/csv/artists-with-headers.csv' AS line
CREATE (:Artist { name: line.Name, year: toInt(line.Year)})
This time, the file starts with a single row containing column names. Indicate this using WITH HEADERS and
you can access specific fields by their corresponding column name.
Result
(empty result)
Nodes created: 4
Properties set: 8
Labels added: 4
Import data from a CSV file with a custom field delimiter
Sometimes, your CSV file has other field delimiters than commas. You can specify which delimiter your
file uses using FIELDTERMINATOR.
artists-fieldterminator.csv
"1";"ABBA";"1992"
"2";"Roxette";"1986"
"3";"Europe";"1979"
"4";"The Cardigans";"1992"
Query
LOAD CSV FROM 'http://neo4j.com/docs/2.3.12/csv/artists-fieldterminator.csv' AS line FIELDTERMINATOR ';'
CREATE (:Artist { name: line[1], year: toInt(line[2])})
As values in this file are separated by a semicolon, a custom FIELDTERMINATOR is specified in the LOAD CSV
clause.
Result
(empty result)
Nodes created: 4
Properties set: 8
Labels added: 4
Importing large amounts of data
If the CSV file contains a significant number of rows (approaching hundreds of thousands or millions),
USING PERIODIC COMMIT can be used to instruct Neo4j to perform a commit after a number of rows. This
reduces the memory overhead of the transaction state. By default, the commit will happen every 1000
rows. For more information, see Section 12.9, “Using Periodic Commit” [213].
Query
USING PERIODIC COMMIT
LOAD CSV FROM 'http://neo4j.com/docs/2.3.12/csv/artists.csv' AS line
CREATE (:Artist { name: line[1], year: toInt(line[2])})
Result
(empty result)
Nodes created: 4
Properties set: 8
Labels added: 4
Setting the rate of periodic commits
You can set the number of rows per commit, as in the following example where it is set to 500 rows.
Query
USING PERIODIC COMMIT 500
LOAD CSV FROM 'http://neo4j.com/docs/2.3.12/csv/artists.csv' AS line
CREATE (:Artist { name: line[1], year: toInt(line[2])})
Result
(empty result)
Nodes created: 4
Properties set: 8
Labels added: 4
Import data containing escaped characters
In this example, we have additional quotes around the values, as well as escaped quotes inside one
value.
artists-with-escaped-char.csv
"1","The ""Symbol""","1992"
Query
LOAD CSV FROM 'http://neo4j.com/docs/2.3.12/csv/artists-with-escaped-char.csv' AS line
CREATE (a:Artist { name: line[1], year: toInt(line[2])})
RETURN a.name AS name, a.year AS year, length(a.name) AS length
Note that strings are wrapped in quotes in the output here. Comparing the output to the returned
length shows that the displayed quotes are not part of the stored value.
Result
name year length
"The "Symbol"" 1992 12
1 row
Nodes created: 1
Properties set: 2
Labels added: 1
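The doubled-quote escaping in this file can be checked with Python's csv module (an illustration only; the module's doubled-quote convention happens to match this file, while LOAD CSV also accepts the \ escape character):

```python
import csv
import io

# The line from artists-with-escaped-char.csv: a quote inside a quoted
# field is escaped by doubling it
line = '"1","The ""Symbol""","1992"\n'

fields = next(csv.reader(io.StringIO(line)))
name = fields[1]   # 'The "Symbol"'
print(len(name))   # 12, matching the length column in the result above
```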
Chapter12.Writing Clauses
Write data to the database.
Writing Clauses
187
12.1. Create
The CREATE clause is used to create graph elements: nodes and relationships.
Tip
In the CREATE clause, patterns are used a lot. Read Section 9.6, “Patterns” [127] for an
introduction.
Create nodes
Create single node
Creating a single node is done by issuing the following query.
Query
CREATE (n)
Nothing is returned from this query, except the count of affected nodes.
Result
(empty result)
Nodes created: 1
Create multiple nodes
Creating multiple nodes is done by separating them with a comma.
Query
CREATE (n),(m)
Result
(empty result)
Nodes created: 2
Create a node with a label
To add a label when creating a node, use the syntax below.
Query
CREATE (n:Person)
Nothing is returned from this query.
Result
(empty result)
Nodes created: 1
Labels added: 1
Create a node with multiple labels
To add labels when creating a node, use the syntax below. In this case, we add two labels.
Query
CREATE (n:Person:Swedish)
Nothing is returned from this query.
Result
(empty result)
Nodes created: 1
Labels added: 2
Create node and add labels and properties
When creating a new node with labels, you can add properties at the same time.
Query
CREATE (n:Person { name : 'Andres', title : 'Developer' })
Nothing is returned from this query.
Result
(empty result)
Nodes created: 1
Properties set: 2
Labels added: 1
Return created node
Creating a single node is done by issuing the following query.
Query
CREATE (a { name : 'Andres' })
RETURN a
The newly created node is returned.
Result
a
Node[0]{name:"Andres"}
1 row
Nodes created: 1
Properties set: 1
Create relationships
Create a relationship between two nodes
To create a relationship between two nodes, we first get the two nodes. Once the nodes are loaded, we
simply create a relationship between them.
Query
MATCH (a:Person),(b:Person)
WHERE a.name = 'Node A' AND b.name = 'Node B'
CREATE (a)-[r:RELTYPE]->(b)
RETURN r
The created relationship is returned by the query.
Result
r
:RELTYPE[0]{}
1 row
Relationships created: 1
Create a relationship and set properties
Setting properties on relationships is done in a similar manner to how it’s done when creating nodes.
Note that the values can be any expression.
Query
MATCH (a:Person),(b:Person)
WHERE a.name = 'Node A' AND b.name = 'Node B'
CREATE (a)-[r:RELTYPE { name : a.name + '<->' + b.name }]->(b)
RETURN r
The newly created relationship is returned by the example query.
Result
r
:RELTYPE[0]{name:"Node A<->Node B"}
1 row
Relationships created: 1
Properties set: 1
Create a full path
When you use CREATE and a pattern, all parts of the pattern that are not already in scope at this time will
be created.
Query
CREATE p =(andres { name:'Andres' })-[:WORKS_AT]->(neo)<-[:WORKS_AT]-(michael { name:'Michael' })
RETURN p
This query creates three nodes and two relationships in one go, assigns the path to an identifier, and
returns it.
Result
p
[Node[0]{name:"Andres"}, :WORKS_AT[0]{}, Node[1]{}, :WORKS_AT[1]{}, Node[2]{name:"Michael"}]
1 row
Nodes created: 3
Relationships created: 2
Properties set: 2
Use parameters with CREATE
Create node with a parameter for the properties
You can also create a graph entity from a map. All the key/value pairs in the map will be set as
properties on the created relationship or node. In this case we add a Person label to the node as well.
Parameters
{
"props" : {
"name" : "Andres",
"position" : "Developer"
}
}
Query
CREATE (n:Person { props })
RETURN n
Result
n
Node[0]{name:"Andres", position:"Developer"}
1 row
Nodes created: 1
Properties set: 2
Labels added: 1
Create multiple nodes with a parameter for their properties
If you provide Cypher with an array of maps, it will create a node for each map.
Parameters
{
"props" : [ {
"name" : "Andres",
"position" : "Developer"
}, {
"name" : "Michael",
"position" : "Developer"
} ]
}
Query
UNWIND { props } AS map
CREATE (n)
SET n = map
Result
(empty result)
Nodes created: 2
Properties set: 4
Create multiple nodes with a parameter for their properties using old syntax
If you provide Cypher with an array of maps, it will create a node for each map.
Note
When you do this, you can’t create anything else in the same CREATE clause.
Note
This syntax is deprecated in Neo4j version 2.3. It may be removed in a future major release.
See the above example using UNWIND for how to achieve the same functionality.
Parameters
{
"props" : [ {
"name" : "Andres",
"position" : "Developer"
}, {
"name" : "Michael",
"position" : "Developer"
} ]
}
Query
CREATE (n { props })
RETURN n
Result
n
Node[0]{name:"Andres", position:"Developer"}
Node[1]{name:"Michael", position:"Developer"}
2 rows
Nodes created: 2
Properties set: 4
12.2. Merge
The MERGE clause ensures that a pattern exists in the graph. Either the pattern already
exists, or it needs to be created.
Introduction
MERGE either matches existing nodes and binds them, or it creates new data and binds that. It’s like a
combination of MATCH and CREATE that additionally allows you to specify what happens if the data was
matched or created.
For example, you can specify that the graph must contain a node for a user with a certain name. If
there isn’t a node with the correct name, a new node will be created and its name property set.
When using MERGE on full patterns, the behavior is that either the whole pattern matches, or the whole
pattern is created. MERGE will not partially use existing patterns: it's all or nothing. If partial matches are
needed, this can be accomplished by splitting a pattern up into multiple MERGE clauses.
As with MATCH, MERGE can match multiple occurrences of a pattern. If there are multiple matches, they will
all be passed on to later stages of the query.
The last part of MERGE is the ON CREATE and ON MATCH clauses. These allow a query to express additional
changes to the properties of a node or relationship, depending on whether the element was MATCHed in
the database or CREATEd.
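As a loose in-memory analogy of match-or-create with these hooks (hypothetical Python, simplified to bind only the first match, whereas real MERGE binds every matching occurrence):

```python
def merge_node(nodes, props, on_create=None, on_match=None):
    """Match-or-create sketch of MERGE with ON CREATE / ON MATCH hooks."""
    for node in nodes:
        if all(node.get(key) == value for key, value in props.items()):
            if on_match:
                on_match(node)       # ON MATCH SET ...
            return node
    node = dict(props)               # no match: create the whole pattern
    nodes.append(node)
    if on_create:
        on_create(node)              # ON CREATE SET ...
    return node

graph = [{"name": "Charlie Sheen"}]
merge_node(graph, {"name": "Charlie Sheen"},
           on_match=lambda n: n.update(found=True))
merge_node(graph, {"name": "Keanu Reeves"},
           on_create=lambda n: n.update(created=1))
```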
The rule planner (see Section 15.1, “How are queries executed?” [254]) expands a MERGE pattern from
the end point that has the identifier with the lowest lexicographical order. This means that it might
choose a suboptimal expansion path, expanding from a node with a higher degree. The pattern MERGE
(a:A)-[:R]->(b:B) will always expand from a to b, so if it is known that b nodes are a better choice for
start point, renaming identifiers could improve performance.
The following graph is used for the examples below:
Figure12.1.Graph
Person
chauffeurNam e = 'Bill White'
name = 'Oliver Stone'
bornIn = 'New York'
Movie
name = 'WallStreet'
title = 'Wall Street '
DIRECTED
Person
chauffeurNam e = 'John Brown'
name = 'Charlie Sheen'
bornIn = 'New York'
ACTED_IN
Person
chauffeurNam e = 'Bob Brown'
name = 'Martin Sheen'
bornIn = 'Ohio'
FATHER
ACTED_IN
Movie
title = 'The American President'
name = 'TheAm ericanPresident'
ACTED_IN
Person
chauffeurNam e = 'Ted Green'
name = 'Rob Reiner'
bornIn = 'New York'
DIRECTED
Person
bornIn = 'New Jersey'
chauffeurNam e = 'John Brown'
name = 'Michael Douglas'
ACTED_IN ACTED_IN
Merge nodes
Merge single node with a label
Merging a single node with a given label.
Query
MERGE (robert:Critic)
RETURN robert, labels(robert)
A new node is created because there are no nodes labeled Critic in the database.
Result
robert labels(robert)
Node[7]{} ["Critic"]
1 row
Nodes created: 1
Labels added: 1
Merge single node with properties
Merging a single node with properties where not all properties match any existing node.
Query
MERGE (charlie { name:'Charlie Sheen', age:10 })
RETURN charlie
A new node with the name Charlie Sheen will be created since not all properties matched the existing
Charlie Sheen node.
Result
charlie
Node[7]{name:"Charlie Sheen", age:10}
1 row
Nodes created: 1
Properties set: 2
Merge single node specifying both label and property
Merging a single node with both label and property matching an existing node.
Query
MERGE (michael:Person { name:'Michael Douglas' })
RETURN michael.name, michael.bornIn
Michael Douglas will be matched and the name and bornIn properties returned.
Result
michael.name michael.bornIn
"Michael Douglas" "New Jersey"
1 row
Merge single node derived from an existing node property
For some property p of each bound node in a set of nodes, a single new node is created for each
unique value of p.
Query
MATCH (person:Person)
MERGE (city:City { name: person.bornIn })
RETURN person.name, person.bornIn, city
Three nodes labeled City are created, each of which contains a name property with the value of New
York, Ohio, and New Jersey, respectively. Note that even though the MATCH clause results in three bound
nodes having the value New York for the bornIn property, only a single New York node (i.e. a City node
with a name of New York) is created. As the New York node is not matched for the first bound node, it
is created. However, the newly-created New York node is matched and bound for the second and third
bound nodes.
Result
person.name person.bornIn city
"Oliver Stone" "New York" Node[7]{name:"New York"}
"Charlie Sheen" "New York" Node[7]{name:"New York"}
"Martin Sheen" "Ohio" Node[8]{name:"Ohio"}
"Rob Reiner" "New York" Node[7]{name:"New York"}
"Michael Douglas" "New Jersey" Node[9]{name:"New Jersey"}
5 rows
Nodes created: 3
Properties set: 3
Labels added: 3
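The deduplication behavior (five bound persons, but only three created City nodes) can be mimicked in plain Python (an analogy, not Neo4j code):

```python
# bornIn values from the five matched Person nodes
born_in = ["New York", "New York", "Ohio", "New York", "New Jersey"]

cities = {}
for name in born_in:
    # MERGE (city:City {name: person.bornIn}): match if it already
    # exists, otherwise create it
    cities.setdefault(name, {"name": name})

print(len(cities))  # 3 distinct City nodes for 5 Person nodes
```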
Use ON CREATE and ON MATCH
Merge with ON CREATE
Merge a node and set properties if the node needs to be created.
Query
MERGE (keanu:Person { name:'Keanu Reeves' })
ON CREATE SET keanu.created = timestamp()
RETURN keanu.name, keanu.created
The query creates the keanu node and sets a timestamp on creation time.
Result
keanu.name keanu.created
"Keanu Reeves" 1512735402884
1 row
Nodes created: 1
Properties set: 2
Labels added: 1
Merge with ON MATCH
Merging nodes and setting properties on found nodes.
Query
MERGE (person:Person)
ON MATCH SET person.found = TRUE
RETURN person.name, person.found
The query finds all the Person nodes, sets a property on them, and returns them.
Result
person.name person.found
"Oliver Stone" true
"Charlie Sheen" true
"Martin Sheen" true
"Rob Reiner" true
"Michael Douglas" true
5 rows
Properties set: 5
Merge with ON CREATE and ON MATCH
Merge a node and set properties if the node needs to be created.
Query
MERGE (keanu:Person { name:'Keanu Reeves' })
ON CREATE SET keanu.created = timestamp()
ON MATCH SET keanu.lastSeen = timestamp()
RETURN keanu.name, keanu.created, keanu.lastSeen
The query creates the keanu node, and sets a timestamp on creation time. If keanu had already existed,
a different property would have been set.
Result
keanu.name keanu.created keanu.lastSeen
"Keanu Reeves" 1512735405486 <null>
1 row
Nodes created: 1
Properties set: 2
Labels added: 1
Merge with ON MATCH setting multiple properties
If multiple properties should be set, simply separate them with commas.
Query
MERGE (person:Person)
ON MATCH SET person.found = TRUE , person.lastAccessed = timestamp()
RETURN person.name, person.found, person.lastAccessed
Result
person.name person.found person.lastAccessed
"Oliver Stone" true 1512735404553
"Charlie Sheen" true 1512735404553
"Martin Sheen" true 1512735404553
"Rob Reiner" true 1512735404553
"Michael Douglas" true 1512735404553
5 rows
Properties set: 10
Merge relationships
Merge on a relationship
MERGE can be used to match or create a relationship.
Query
MATCH (charlie:Person { name:'Charlie Sheen' }),(wallStreet:Movie { title:'Wall Street' })
MERGE (charlie)-[r:ACTED_IN]->(wallStreet)
RETURN charlie.name, type(r), wallStreet.title
Charlie Sheen had already been marked as acting in Wall Street, so the existing relationship is found and
returned. Note that in order to match or create a relationship when using MERGE, at least one bound
node must be specified, which is done via the MATCH clause in the above example.
Result
charlie.name type(r) wallStreet.title
"Charlie Sheen" "ACTED_IN" "Wall Street"
1 row
Merge on multiple relationships
When MERGE is used on a whole pattern, either everything matches, or everything is created.
Query
MATCH (oliver:Person { name:'Oliver Stone' }),(reiner:Person { name:'Rob Reiner' })
MERGE (oliver)-[:DIRECTED]->(movie:Movie)<-[:ACTED_IN]-(reiner)
RETURN movie
In our example graph, Oliver Stone and Rob Reiner have never worked together. When we try to MERGE a
movie between them, Neo4j will not use any of the existing movies already connected to either person.
Instead, a new movie node is created.
Result
movie
Node[7]{}
1 row
Nodes created: 1
Relationships created: 2
Labels added: 1
Merge on an undirected relationship
MERGE can also be used with an undirected relationship. When it needs to create a new one, it will pick a
direction.
Query
MATCH (charlie:Person { name:'Charlie Sheen' }),(oliver:Person { name:'Oliver Stone' })
MERGE (charlie)-[r:KNOWS]-(oliver)
RETURN r
As Charlie Sheen and Oliver Stone do not know each other, this MERGE query will create a :KNOWS
relationship between them. The direction of the created relationship is arbitrary.
Result
r
:KNOWS[8]{}
1 row
Relationships created: 1
Merge on a relationship between two existing nodes
MERGE can be used in conjunction with preceding MATCH and MERGE clauses to create a relationship
between two bound nodes m and n, where m is returned by MATCH and n is created or matched by the
earlier MERGE.
Query
MATCH (person:Person)
MERGE (city:City { name: person.bornIn })
MERGE (person)-[r:BORN_IN]->(city)
RETURN person.name, person.bornIn, city
This builds on the example from the section called “Merge single node derived from an existing node
property” [193]. The second MERGE creates a BORN_IN relationship between each person and a city
corresponding to the value of the person's bornIn property. Charlie Sheen, Rob Reiner and Oliver Stone all
have a BORN_IN relationship to the same City node (New York).
Result
person.name person.bornIn city
"Oliver Stone" "New York" Node[7]{name:"New York"}
"Charlie Sheen" "New York" Node[7]{name:"New York"}
"Martin Sheen" "Ohio" Node[8]{name:"Ohio"}
"Rob Reiner" "New York" Node[7]{name:"New York"}
"Michael Douglas" "New Jersey" Node[9]{name:"New Jersey"}
5 rows
Nodes created: 3
Relationships created: 5
Properties set: 3
Labels added: 3
Merge on a relationship between an existing node and a merged node derived from a
node property
MERGE can be used to simultaneously create both a new node n and a relationship between a bound
node m and n.
Query
MATCH (person:Person)
MERGE (person)-[r:HAS_CHAUFFEUR]->(chauffeur:Chauffeur { name: person.chauffeurName })
RETURN person.name, person.chauffeurName, chauffeur
In our example graph there are no nodes labeled Chauffeur and no HAS_CHAUFFEUR relationships, so MERGE
finds no matches. It therefore creates five nodes labeled Chauffeur, each of which contains a name
property whose value corresponds to the matched Person node's chauffeurName property value. MERGE
also creates a HAS_CHAUFFEUR relationship between each Person node and the newly-created
corresponding Chauffeur node. As Charlie Sheen and Michael Douglas both have a chauffeur with the
same name, John Brown, a new node is created in each case, resulting in two Chauffeur nodes having a
name of John Brown, correctly denoting the fact that even though the name property may be identical,
these are two separate people. This is in contrast to the example shown above in the section called
“Merge on a relationship between two existing nodes” [196], where we used the first MERGE to bind the
City nodes to prevent them from being recreated (and thus duplicated) in the second MERGE.
Result
person.name person.chauffeurName chauffeur
"Oliver Stone" "Bill White" Node[7]{name:"Bill White"}
"Charlie Sheen" "John Brown" Node[8]{name:"John Brown"}
"Martin Sheen" "Bob Brown" Node[9]{name:"Bob Brown"}
"Rob Reiner" "Ted Green" Node[10]{name:"Ted Green"}
"Michael Douglas" "John Brown" Node[11]{name:"John Brown"}
5 rows
Nodes created: 5
Relationships created: 5
Properties set: 5
Labels added: 5
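Why five Chauffeur nodes are created here, including two for the same name, can be sketched in plain Python (an analogy only):

```python
persons = [
    ("Oliver Stone", "Bill White"),
    ("Charlie Sheen", "John Brown"),
    ("Martin Sheen", "Bob Brown"),
    ("Rob Reiner", "Ted Green"),
    ("Michael Douglas", "John Brown"),
]

# The whole pattern (person)-[:HAS_CHAUFFEUR]->(chauffeur) never matches,
# so one Chauffeur node is created per row, even for repeated names:
chauffeurs = [{"name": name} for _, name in persons]

# Binding the chauffeur with its own MERGE first would instead
# deduplicate by name, as the City example did:
unique_names = {name for _, name in persons}
```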
Using unique constraints with MERGE
Cypher prevents getting conflicting results from MERGE when using patterns that involve uniqueness
constraints. In this case, there must be at most one node that matches that pattern.
For example, given two uniqueness constraints on :Person(id) and :Person(ssn), a query such as
MERGE (n:Person {id: 12, ssn: 437}) will fail if there are two different nodes (one with id 12 and one
with ssn 437), or if there is only one node with only one of the properties. In other words, there must be
exactly one node that matches the pattern, or no matching nodes.
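This rule can be sketched as a check in plain Python (hypothetical; `merge_with_unique_props` is an illustration, not a Neo4j API):

```python
def merge_with_unique_props(nodes, props):
    """A pattern touching uniquely-constrained properties must match
    exactly one node, or none (in which case a node is created)."""
    partial = [n for n in nodes
               if any(n.get(k) == v for k, v in props.items())]
    full = [n for n in partial
            if all(n.get(k) == v for k, v in props.items())]
    if full:
        return full[0]                       # exactly one full match
    if partial:                              # partial or conflicting match
        raise ValueError("Merge did not find a matching node and can not "
                         "create a new node due to conflicts")
    node = dict(props)                       # no match at all: create
    nodes.append(node)
    return node
```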
Note that the following examples assume the existence of uniqueness constraints that have been
created using:
Writing Clauses
198
CREATE CONSTRAINT ON (n:Person) ASSERT n.name IS UNIQUE;
CREATE CONSTRAINT ON (n:Person) ASSERT n.role IS UNIQUE;
Merge using unique constraints creates a new node if no node is found
Merge using unique constraints creates a new node if no node is found.
Query
MERGE (laurence:Person { name: 'Laurence Fishburne' })
RETURN laurence.name
The query creates the laurence node. If laurence had already existed, MERGE would just match the existing
node.
Result
laurence.name
"Laurence Fishburne"
1 row
Nodes created: 1
Properties set: 1
Labels added: 1
Merge using unique constraints matches an existing node
Merge using unique constraints matches an existing node.
Query
MERGE (oliver:Person { name:'Oliver Stone' })
RETURN oliver.name, oliver.bornIn
The oliver node already exists, so MERGE just matches it.
Result
oliver.name oliver.bornIn
"Oliver Stone" "New York"
1 row
Merge with unique constraints and partial matches
Merge using unique constraints fails when finding partial matches.
Query
MERGE (michael:Person { name:'Michael Douglas', role:'Gordon Gekko' })
RETURN michael
While there is a matching unique michael node with the name Michael Douglas, there is no unique node
with the role of Gordon Gekko and MERGE fails to match.
Error message
Merge did not find a matching node and can not create a new node due to conflicts
with both existing and missing unique nodes. The conflicting constraints are on:
:Person.name and :Person.role
Merge with unique constraints and conflicting matches
Merge using unique constraints fails when finding conflicting matches.
Query
MERGE (oliver:Person { name:'Oliver Stone', role:'Gordon Gekko' })
RETURN oliver
While there is a matching unique oliver node with the name Oliver Stone, there is also another unique
node with the role of Gordon Gekko and MERGE fails to match.
Error message
Merge did not find a matching node and can not create a new node due to conflicts
with both existing and missing unique nodes. The conflicting constraints are on:
:Person.name and :Person.role
Using map parameters with MERGE
MERGE does not support map parameters the way, for example, CREATE does. To use map parameters with
MERGE, it is necessary to explicitly use the expected properties, as in the following example. For more
information on parameters, see Section 8.5, “Parameters” [113].
Parameters
{
"param" : {
"name" : "Keanu Reeves",
"role" : "Neo"
}
}
Query
MERGE (person:Person { name: { param }.name, role: { param }.role })
RETURN person.name, person.role
Result
person.name person.role
"Keanu Reeves" "Neo"
1 row
Nodes created: 1
Properties set: 2
Labels added: 1
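When most of the properties live in the map, one common workaround (a sketch; note that merging on the name key alone differs semantically from merging on both properties as above) is to merge on the key property and copy the rest with ON CREATE SET:

```cypher
MERGE (person:Person { name: { param }.name })
ON CREATE SET person = { param }
RETURN person.name, person.role
```

If the node is created, all properties from the map are copied onto it; if it already existed, its properties are left untouched.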
12.3. Set
The SET clause is used to update labels on nodes and properties on nodes and
relationships.
SET can also be used with maps from parameters to set properties.
Note
Setting a label on a node is an idempotent operation: if you try to set a label on a node
that already has that label, nothing happens. The query statistics will tell you whether
anything needed to be done.
The examples use this graph as a starting point:
(Figure: nodes Emil; Peter {age: 34}; Stefan, labeled Swedish; Andres {age: 36, hungry: true}; connected by KNOWS relationships.)
Set a property
To set a property on a node or relationship, use SET.
Query
MATCH (n { name: 'Andres' })
SET n.surname = 'Taylor'
RETURN n
The newly changed node is returned by the query.
Result
n
Node[3]{surname:"Taylor", name:"Andres", age:36, hungry:true}
1 row
Properties set: 1
Remove a property
Normally you remove a property by using REMOVE, but it’s sometimes handy to do it using the SET
command. One example is when the property value comes from a parameter.
Query
MATCH (n { name: 'Andres' })
SET n.name = NULL RETURN n
The node is returned by the query, and the name property is now missing.
Result
n
Node[3]{hungry:true, age:36}
1 row
Properties set: 1
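A sketch of that parameter case: with a parameter map such as { "surname": null }, the same SET removes the property when the value is null and sets it otherwise.

```cypher
MATCH (n { name: 'Andres' })
SET n.surname = { surname }
RETURN n
```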
Copying properties between nodes and relationships
You can also use SET to copy all properties from one graph element to another. Remember that doing
this will remove all other properties on the receiving graph element.
Query
MATCH (at { name: 'Andres' }),(pn { name: 'Peter' })
SET at = pn
RETURN at, pn
The Andres node has had all its properties replaced by the properties of the Peter node.
Result
at pn
Node[3]{name:"Peter", age:34} Node[2]{name:"Peter", age:34}
1 row
Properties set: 3
Adding properties from maps
When setting properties from a map (literal, parameter, or graph element), you can use the += form of
SET to only add properties, without removing any of the existing properties on the graph element.
Query
MATCH (peter { name: 'Peter' })
SET peter += { hungry: TRUE , position: 'Entrepreneur' }
Result
(empty result)
Properties set: 2
Set a property using a parameter
Use a parameter to give the value of a property.
Parameters
{
"surname" : "Taylor"
}
Query
MATCH (n { name: 'Andres' })
SET n.surname = { surname }
RETURN n
The Andres node has had a surname added.
Result
n
Node[3]{surname:"Taylor", name:"Andres", age:36, hungry:true}
1 row
Properties set: 1
Set all properties using a parameter
This will replace all existing properties on the node with the new set provided by the parameter.
Parameters
{
"props" : {
"name" : "Andres",
"position" : "Developer"
}
}
Query
MATCH (n { name: 'Andres' })
SET n = { props }
RETURN n
The Andres node has had all its properties replaced by the properties in the props parameter.
Result
n
Node[3]{name:"Andres", position:"Developer"}
1 row
Properties set: 4
Set multiple properties using one SET clause
If you want to set multiple properties in one go, simply separate them with a comma.
Query
MATCH (n { name: 'Andres' })
SET n.position = 'Developer', n.surname = 'Taylor'
Result
(empty result)
Properties set: 2
Set a label on a node
To set a label on a node, use SET.
Query
MATCH (n { name: 'Stefan' })
SET n :German
RETURN n
The newly labeled node is returned by the query.
Result
n
Node[1]{name:"Stefan"}
1 row
Labels added: 1
Set multiple labels on a node
To set multiple labels on a node, use SET and separate the different labels using :.
Query
MATCH (n { name: 'Emil' })
SET n :Swedish:Bossman
RETURN n
The newly labeled node is returned by the query.
Result
n
Node[0]{name:"Emil"}
1 row
Labels added: 2
12.4. Delete
The DELETE clause is used to delete graph elements: nodes, relationships or paths.
For removing properties and labels, see Section 12.5, “Remove” [205]. Remember that you cannot
delete a node without also deleting the relationships that start or end on said node. Either explicitly delete
the relationships, or use DETACH DELETE.
The examples start out with the following database:
(Figure: nodes Tobias {age: 25}, Peter {age: 34} and Andres {age: 36}, connected by KNOWS relationships.)
Delete single node
To delete a node, use the DELETE clause.
Query
MATCH (n:Useless)
DELETE n
Result
(empty result)
Nodes deleted: 1
Delete all nodes and relationships
This query isn’t for deleting large amounts of data, but is nice when playing around with small example
data sets.
Query
MATCH (n)
DETACH DELETE n
Result
(empty result)
Nodes deleted: 3
Relationships deleted: 2
Delete a node with all its relationships
When you want to delete a node and any relationship going to or from it, use DETACH DELETE.
Query
MATCH (n { name:'Andres' })
DETACH DELETE n
Result
(empty result)
Nodes deleted: 1
Relationships deleted: 2
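The explicit alternative mentioned at the start of this section, deleting the relationships in the same query, can be sketched as follows; OPTIONAL MATCH makes the query work even when the node has no relationships (DELETE ignores NULL):

```cypher
MATCH (n { name: 'Andres' })
OPTIONAL MATCH (n)-[r]-()
DELETE r, n
```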
12.5. Remove
The REMOVE clause is used to remove properties and labels from graph elements.
For deleting nodes and relationships, see Section 12.4, “Delete” [204].
Note
Removing labels from a node is an idempotent operation: If you try to remove a label from a
node that does not have that label on it, nothing happens. The query statistics will tell you if
something needed to be done or not.
The examples start out with the following database:
(Figure: nodes Tobias {age: 25}, labeled Swedish; Peter {age: 34}, labeled Swedish and German; Andres {age: 36}, labeled Swedish; connected by KNOWS relationships.)
Remove a property
Neo4j doesn’t allow storing null in properties. Instead, if no value exists, the property is simply not there.
So, removing a property value from a node or a relationship is also done with REMOVE.
Query
MATCH (andres { name: 'Andres' })
REMOVE andres.age
RETURN andres
The node is returned, and no property age exists on it.
Result
andres
Node[2]{name:"Andres"}
1 row
Properties set: 1
Remove a label from a node
To remove labels, you use REMOVE.
Query
MATCH (n { name: 'Peter' })
REMOVE n:German
RETURN n
Result
n
Node[1]{name:"Peter", age:34}
1 row
Labels removed: 1
Removing multiple labels
To remove multiple labels, you use REMOVE.
Query
MATCH (n { name: 'Peter' })
REMOVE n:German:Swedish
RETURN n
Result
n
Node[1]{name:"Peter", age:34}
1 row
Labels removed: 2
12.6. Foreach
The FOREACH clause is used to update data within a collection, whether components of a
path or the result of an aggregation.
Collections and paths are key concepts in Cypher. To use them for updating data, you can use the
FOREACH construct. It allows you to run updating commands on the elements of a collection: a path, or a
collection created by aggregation.
The identifier context inside the FOREACH parentheses is separate from the one outside it. This means
that if you CREATE a node identifier inside a FOREACH, you will not be able to use it outside of the
FOREACH statement, unless you match to find it.
Inside of the FOREACH parentheses, you can do any of the updating commands CREATE, CREATE UNIQUE,
MERGE, DELETE, and FOREACH.
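The scoping rule above can be sketched like this (the :Item label and idx property are made up for illustration): nodes created inside the FOREACH are not visible afterwards until matched again.

```cypher
MATCH (a { name: 'A' })
FOREACH (i IN range(1, 3)| CREATE (a)-[:NUMBERED]->(:Item { idx: i }))
WITH a
MATCH (a)-[:NUMBERED]->(item)
RETURN item.idx
```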
If you want to execute an additional MATCH for each element in a collection then UNWIND (see Section10.6,
“Unwind” [148]) would be a more appropriate command.
Figure12.2.Data for the examples
nam e = 'D'
nam e = 'A'
nam e = 'B'
KNOWS
nam e = 'C'
KNOWS
KNOWS
Mark all nodes along a path
This query will set the property marked to true on all nodes along a path.
Query
MATCH p = (begin)-[*]->(end)
WHERE begin.name = 'A' AND end.name = 'D'
FOREACH (n IN nodes(p)| SET n.marked = TRUE )
Nothing is returned from this query, but four properties are set.
Result
(empty result)
Properties set: 4
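FOREACH also works on a collection built by aggregation, not just on paths. A minimal sketch (the visited property is hypothetical):

```cypher
MATCH (n)
WITH collect(n) AS people
FOREACH (p IN people | SET p.visited = TRUE )
```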
12.7. Create Unique
The CREATE UNIQUE clause is a mix of MATCH and CREATE: it will match what it can, and create
what is missing.
Introduction
Tip
MERGE might be what you want to use instead of CREATE UNIQUE. Note however, that MERGE
doesn’t give as strong guarantees for relationships being unique.
CREATE UNIQUE is in the middle of MATCH and CREATE: it will match what it can, and create what is missing.
CREATE UNIQUE will always make the least change possible to the graph: if it can use parts of the
existing graph, it will.
Another difference to MATCH is that CREATE UNIQUE assumes the pattern to be unique. If multiple matching
subgraphs are found an error will be generated.
Tip
In the CREATE UNIQUE clause, patterns are used a lot. Read Section9.6, “Patterns” [127] for an
introduction.
The examples start out with the following data set:
nam e = 'A'
nam e = 'C'
KNOWS
nam e = 'root'
X
Xnam e = 'B'
X
Create unique nodes
Create node if missing
If the pattern described needs a node, and it can’t be matched, a new node will be created.
Query
MATCH (root { name: 'root' })
CREATE UNIQUE (root)-[:LOVES]-(someone)
RETURN someone
The root node doesn’t have any LOVES relationships, and so a node is created, and also a relationship to
that node.
Result
someone
Node[4]{}
1 row
Nodes created: 1
Relationships created: 1
Create nodes with values
The pattern described can also contain values on the node. These are given using the following syntax:
prop : <expression>.
Query
MATCH (root { name: 'root' })
CREATE UNIQUE (root)-[:X]-(leaf { name:'D' })
RETURN leaf
No node connected with the root node has the name D, and so a new node is created to match the
pattern.
Result
leaf
Node[4]{name:"D"}
1 row
Nodes created: 1
Relationships created: 1
Properties set: 1
Create labeled node if missing
If the pattern described needs a labeled node and there is none with the given labels, Cypher will create
a new one.
Query
MATCH (a { name: 'A' })
CREATE UNIQUE (a)-[:KNOWS]-(c:blue)
RETURN c
The A node is connected by a KNOWS relationship to the B node, but since B doesn’t have the :blue label, a
new node labeled :blue is created along with a KNOWS relationship from A to it.
Result
c
Node[4]{}
1 row
Nodes created: 1
Relationships created: 1
Labels added: 1
Create unique relationships
Create relationship if it is missing
CREATE UNIQUE is used to describe the pattern that should be found or created.
Query
MATCH (lft { name: 'A' }),(rgt)
WHERE rgt.name IN ['B', 'C']
CREATE UNIQUE (lft)-[r:KNOWS]->(rgt)
RETURN r
The left node is matched against the two right nodes. One relationship already exists and can be
matched; the other relationship is created before it is returned.
Result
r
:KNOWS[4]{}
:KNOWS[3]{}
2 rows
Relationships created: 1
Create relationship with values
Relationships to be created can also be matched on values.
Query
MATCH (root { name: 'root' })
CREATE UNIQUE (root)-[r:X { since:'forever' }]-()
RETURN r
In this example, we want the relationship to have a value, and since no such relationship can be found,
a new node and relationship are created. Note that since we are not interested in the created node, we
don’t name it.
Result
r
:X[4]{since:"forever"}
1 row
Nodes created: 1
Relationships created: 1
Properties set: 1
Describe complex pattern
The pattern described by CREATE UNIQUE can be separated by commas, just like in MATCH and CREATE.
Query
MATCH (root { name: 'root' })
CREATE UNIQUE (root)-[:FOO]->(x),(root)-[:BAR]->(x)
RETURN x
This example pattern uses two paths, separated by a comma.
Result
x
Node[4]{}
1 row
Nodes created: 1
Relationships created: 2
12.8. Importing CSV files with Cypher
This tutorial will show you how to import data from CSV files using LOAD CSV.
In this example, we’re given three CSV files: a list of persons, a list of movies, and a list of which role was
played by some of these persons in each movie.
CSV files can be stored on the database server and are then accessible using a file:// URL.
Alternatively, LOAD CSV also supports accessing CSV files via HTTPS, HTTP, and FTP. LOAD CSV will follow HTTP
redirects, but for security reasons it will not follow redirects that change the protocol, for example if
the redirect is going from HTTPS to HTTP.
For more details, see Section11.6, “Load CSV” [182].
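As a sketch of the file:// case (the file name is hypothetical, and where the path is resolved from depends on the server configuration):

```cypher
LOAD CSV WITH HEADERS FROM "file:///persons.csv" AS line
RETURN count(line)
```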
Using the following Cypher queries, we’ll create a node for each person, a node for each movie and a
relationship between the two with a property denoting the role. We’re also keeping track of the country
in which each movie was made.
Let’s start with importing the persons:
LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/2.3.12/csv/import/persons.csv" AS csvLine
CREATE (p:Person { id: toInt(csvLine.id), name: csvLine.name })
The CSV file we’re using looks like this:
persons.csv
id,name
1,Charlie Sheen
2,Oliver Stone
3,Michael Douglas
4,Martin Sheen
5,Morgan Freeman
Now, let’s import the movies. This time, we’re also creating a relationship to the country in which the
movie was made. If you are storing your data in a SQL database, this is the one-to-many relationship
type.
We’re using MERGE to create the nodes that represent countries. Using MERGE avoids creating duplicate
country nodes in the case where multiple movies have been made in the same country.
Important
When using MERGE or MATCH with LOAD CSV we need to make sure we have an index
(see Section14.1, “Indexes” [244]) or a unique constraint (see Section14.2,
“Constraints” [247]) on the property we’re merging. This will ensure the query executes in
a performant way.
Before running our query to connect movies and countries we’ll create an index for the name property
on the Country label to ensure the query runs as fast as it can:
CREATE INDEX ON :Country(name)
LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/2.3.12/csv/import/movies.csv" AS csvLine
MERGE (country:Country { name: csvLine.country })
CREATE (movie:Movie { id: toInt(csvLine.id), title: csvLine.title, year:toInt(csvLine.year)})
CREATE (movie)-[:MADE_IN]->(country)
movies.csv
id,title,country,year
1,Wall Street,USA,1987
2,The American President,USA,1995
3,The Shawshank Redemption,USA,1994
Lastly, we create the relationships between the persons and the movies. Since the relationship is a
many to many relationship, one actor can participate in many movies, and one movie has many actors
in it. We have this data in a separate file.
We’ll index the id property on Person and Movie nodes. The id property is a temporary property used
to look up the appropriate nodes for a relationship when importing the third file. By indexing the id
property, node lookup (e.g. by MATCH) will be much faster. Since we expect the ids to be unique in each
set, we’ll create a unique constraint. This protects us from invalid data since constraint creation will
fail if there are multiple nodes with the same id property. Creating a unique constraint also creates a
unique index (which is faster than a regular index).
CREATE CONSTRAINT ON (person:Person) ASSERT person.id IS UNIQUE
CREATE CONSTRAINT ON (movie:Movie) ASSERT movie.id IS UNIQUE
Now importing the relationships is a matter of finding the nodes and then creating relationships
between them.
For this query we’ll use USING PERIODIC COMMIT (see Section12.9, “Using Periodic Commit” [213]) which
is helpful for queries that operate on large CSV files. This hint tells Neo4j that the query might build up
inordinate amounts of transaction state, and so needs to be periodically committed. In this case we
also set the limit to 500 rows per commit.
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/2.3.12/csv/import/roles.csv" AS csvLine
MATCH (person:Person { id: toInt(csvLine.personId)}),(movie:Movie { id: toInt(csvLine.movieId)})
CREATE (person)-[:PLAYED { role: csvLine.role }]->(movie)
roles.csv
personId,movieId,role
1,1,Bud Fox
4,1,Carl Fox
3,1,Gordon Gekko
4,2,A.J. MacInerney
3,2,President Andrew Shepherd
5,3,Ellis Boyd 'Red' Redding
Finally, as the id property was only necessary to import the relationships, we can drop the constraints
and the id property from all movie and person nodes.
DROP CONSTRAINT ON (person:Person) ASSERT person.id IS UNIQUE
DROP CONSTRAINT ON (movie:Movie) ASSERT movie.id IS UNIQUE
MATCH (n)
WHERE n:Person OR n:Movie
REMOVE n.id
12.9. Using Periodic Commit
Note
See Section12.8, “Importing CSV files with Cypher” [211] on how to import data from CSV
files.
Importing large amounts of data using LOAD CSV with a single Cypher query may fail due to memory
constraints. This will manifest itself as an OutOfMemoryError.
For this situation only, Cypher provides the global USING PERIODIC COMMIT query hint for updating
queries using LOAD CSV. You can optionally set the limit for the number of rows per commit like so: USING
PERIODIC COMMIT 500.
PERIODIC COMMIT will process the rows until the number of rows reaches a limit. Then the current
transaction will be committed and replaced with a newly opened transaction. If no limit is set, a default
value will be used.
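A minimal sketch of the hint in use (the file name and label are hypothetical):

```cypher
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///large.csv" AS row
CREATE (:Record { id: row.id })
```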
See the section called “Importing large amounts of data” [184] in Section11.6, “Load CSV” [182] for
examples of USING PERIODIC COMMIT with and without setting the number of rows per commit.
Important
Using periodic commit will prevent running out of memory when importing large amounts
of data. However, it will also break transactional isolation and thus it should only be used
where needed.
Chapter13.Functions
This chapter contains information on all functions in Cypher. Note that related information exists in
Section9.4, “Operators” [124].
Note
Most functions in Cypher will return NULL if an input parameter is NULL.
Functions
215
13.1. Predicates
Predicates are boolean functions that return true or false for a given set of input. They are most
commonly used to filter out subgraphs in the WHERE part of a query.
See also the section called “Comparison operators” [124].
Figure13.1.Graph
nam e = 'Daniel'
age = 54
eyes = 'brown'
Spouse
array = ['one', 'two', 'three']
nam e = 'Eskil'
age = 41
eyes = 'blue'
foo, bar
nam e = 'Alice'
age = 38
eyes = 'brown'
nam e = 'Charlie'
age = 53
eyes = 'green'
KNOWS
nam e = 'Bob'
age = 25
eyes = 'blue'
KNOWS
KNOWS KNOWS MARRIED
ALL
Tests whether a predicate holds for all elements of this collection.
Syntax: ALL(identifier in collection WHERE predicate)
Arguments:
collection: An expression that returns a collection
identifier: This is the identifier that can be used from the predicate.
predicate: A predicate that is tested against all items in the collection.
Query
MATCH p=(a)-[*1..3]->(b)
WHERE a.name='Alice' AND b.name='Daniel' AND ALL (x IN nodes(p) WHERE x.age > 30)
RETURN p
All nodes in the returned paths will have an age property greater than 30.
Result
p
[Node[2]{name:"Alice", age:38, eyes:"brown"}, :KNOWS[1]{}, Node[4]{name:"Charlie", age:53,
eyes:"green"}, :KNOWS[3]{}, Node[0]{name:"Daniel", age:54, eyes:"brown"}]
1 row
ANY
Tests whether a predicate holds for at least one element in the collection.
Syntax: ANY(identifier in collection WHERE predicate)
Arguments:
collection: An expression that returns a collection
identifier: This is the identifier that can be used from the predicate.
predicate: A predicate that is tested against all items in the collection.
Query
MATCH (a)
WHERE a.name='Eskil' AND ANY (x IN a.array WHERE x = "one")
RETURN a
All nodes returned have at least one 'one' value set in the array property named array.
Result
a
Node[1]{array:["one", "two", "three"], name:"Eskil", age:41, eyes:"blue"}
1 row
NONE
Returns true if the predicate holds for no element in the collection.
Syntax: NONE(identifier in collection WHERE predicate)
Arguments:
collection: An expression that returns a collection
identifier: This is the identifier that can be used from the predicate.
predicate: A predicate that is tested against all items in the collection.
Query
MATCH p=(n)-[*1..3]->(b)
WHERE n.name='Alice' AND NONE (x IN nodes(p) WHERE x.age = 25)
RETURN p
No nodes in the returned paths have an age property set to 25.
Result
p
[Node[2]{name:"Alice", age:38, eyes:"brown"}, :KNOWS[1]{}, Node[4]{name:"Charlie", age:53, eyes:"green"}]
[Node[2]{name:"Alice", age:38, eyes:"brown"}, :KNOWS[1]{}, Node[4]{name:"Charlie", age:53,
eyes:"green"}, :KNOWS[3]{}, Node[0]{name:"Daniel", age:54, eyes:"brown"}]
2 rows
SINGLE
Returns true if the predicate holds for exactly one of the elements in the collection.
Syntax: SINGLE(identifier in collection WHERE predicate)
Arguments:
collection: An expression that returns a collection
identifier: This is the identifier that can be used from the predicate.
predicate: A predicate that is tested against all items in the collection.
Query
MATCH p=(n)-->(b)
WHERE n.name='Alice' AND SINGLE (var IN nodes(p) WHERE var.eyes = "blue")
RETURN p
Exactly one node in every returned path will have the eyes property set to "blue".
Result
p
[Node[2]{name:"Alice", age:38, eyes:"brown"}, :KNOWS[0]{}, Node[3]{name:"Bob", age:25, eyes:"blue"}]
1 row
EXISTS
Returns true if a match for the pattern exists in the graph, or the property exists in the node,
relationship or map.
Syntax: EXISTS( pattern-or-property )
Arguments:
pattern-or-property: A pattern or a property (in the form identifier.prop).
Query
MATCH (n)
WHERE EXISTS(n.name)
RETURN n.name AS name, EXISTS((n)-[:MARRIED]->()) AS is_married
This query returns all the nodes with a name property along with a boolean true/false indicating if they
are married.
Result
name is_married
"Daniel" false
"Eskil" false
"Alice" false
"Bob" true
"Charlie" false
5 rows
13.2. Scalar functions
Scalar functions return a single value.
Important
The LENGTH and SIZE functions are quite similar, so it is important to take note of the
difference. Due to backwards compatibility, LENGTH currently works on four types: strings,
paths, collections and pattern expressions. However, for clarity it is recommended to only
use LENGTH on strings and paths, and to use the new SIZE function on collections and pattern
expressions. LENGTH on those types may be deprecated in the future.
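The recommendation can be summarized in one query against the example graph: LENGTH for paths and strings, SIZE for collections and pattern expressions.

```cypher
MATCH p = (a)-->(b)
WHERE a.name = 'Alice'
RETURN length(p), length(a.name), size([1, 2, 3]), size((a)-->())
```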
Figure13.2.Graph
nam e = 'Daniel'
age = 54
eyes = 'brown'
Spouse
array = ['one', 'two', 'three']
nam e = 'Eskil'
age = 41
eyes = 'blue'
foo, bar
nam e = 'Alice'
age = 38
eyes = 'brown'
nam e = 'Charlie'
age = 53
eyes = 'green'
KNOWS
nam e = 'Bob'
age = 25
eyes = 'blue'
KNOWS
KNOWS KNOWS MARRIED
SIZE
To return or filter on the size of a collection, use the SIZE() function.
Syntax: SIZE( collection )
Arguments:
collection: An expression that returns a collection
Query
RETURN size(['Alice', 'Bob']) AS col
The number of items in the collection is returned by the query.
Result
col
2
1 row
SIZE of pattern expression
This is the same SIZE() method described before, but instead of passing in a collection directly, you
provide a pattern expression that can be used in a match query to provide a new set of results. The size
of the result is calculated, not the length of the expression itself.
Syntax: SIZE( pattern expression )
Arguments:
pattern expression: A pattern expression that returns a collection
Query
MATCH (a)
WHERE a.name='Alice'
RETURN size((a)-->()-->()) AS fof
The number of sub-graphs matching the pattern expression is returned by the query.
Result
fof
3
1 row
LENGTH
To return or filter on the length of a path, use the LENGTH() function.
Syntax: LENGTH( path )
Arguments:
path: An expression that returns a path
Query
MATCH p=(a)-->(b)-->(c)
WHERE a.name='Alice'
RETURN length(p)
The length of the path p is returned by the query.
Result
length(p)
2
2
2
3 rows
LENGTH of string
To return or filter on the length of a string, use the LENGTH() function.
Syntax: LENGTH( string )
Arguments:
string: An expression that returns a string
Query
MATCH (a)
WHERE length(a.name)> 6
RETURN length(a.name)
The length of the name Charlie is returned by the query.
Result
length(a.name)
7
1 row
TYPE
Returns a string representation of the relationship type.
Syntax: TYPE( relationship )
Arguments:
relationship: A relationship.
Query
MATCH (n)-[r]->()
WHERE n.name='Alice'
RETURN type(r)
The relationship type of r is returned by the query.
Result
type(r)
"KNOWS"
"KNOWS"
2 rows
ID
Returns the id of the relationship or node.
Syntax: ID( property-container )
Arguments:
property-container: A node or a relationship.
Query
MATCH (a)
RETURN id(a)
This returns the node id for three nodes.
Result
id(a)
0
1
2
3
4
5 rows
COALESCE
Returns the first non-NULL value in the list of expressions passed to it. In case all arguments are NULL,
NULL will be returned.
Syntax: COALESCE( expression [, expression]* )
Arguments:
expression: The expression that might return NULL.
Query
MATCH (a)
WHERE a.name='Alice'
RETURN coalesce(a.hairColor, a.eyes)
Result
coalesce(a.hairColor, a.eyes)
"brown"
1 row
HEAD
HEAD returns the first element in a collection.
Syntax: HEAD( expression )
Arguments:
expression: This expression should return a collection of some kind.
Query
MATCH (a)
WHERE a.name='Eskil'
RETURN a.array, head(a.array)
The first node in the path is returned.
Result
a.array head(a.array)
["one", "two", "three"] "one"
1 row
LAST
LAST returns the last element in a collection.
Syntax: LAST( expression )
Arguments:
expression: This expression should return a collection of some kind.
Query
MATCH (a)
WHERE a.name='Eskil'
RETURN a.array, last(a.array)
The last node in the path is returned.
Result
a.array last(a.array)
["one", "two", "three"] "three"
1 row
TIMESTAMP
TIMESTAMP returns the difference, measured in milliseconds, between the current time and midnight,
January 1, 1970 UTC. It will return the same value throughout a single query, even if the query is a
long running one.
Syntax: TIMESTAMP()
Arguments:
Query
RETURN timestamp()
The time in milliseconds is returned.
Result
timestamp()
1512735441499
1 row
STARTNODE
STARTNODE returns the starting node of a relationship.
Syntax: STARTNODE( relationship )
Arguments:
relationship: An expression that returns a relationship
Query
MATCH (x:foo)-[r]-()
RETURN startNode(r)
Result
startNode(r)
Node[2]{name:"Alice", age:38, eyes:"brown"}
Node[2]{name:"Alice", age:38, eyes:"brown"}
2 rows
ENDNODE
ENDNODE returns the end node of a relationship.
Syntax: ENDNODE( relationship )
Arguments:
relationship: An expression that returns a relationship
Query
MATCH (x:foo)-[r]-()
RETURN endNode(r)
Result
endNode(r)
Node[4]{name:"Charlie", age:53, eyes:"green"}
Node[3]{name:"Bob", age:25, eyes:"blue"}
2 rows
TOINT
TOINT converts the argument to an integer. A string is parsed as if it were an integer number. If the
parsing fails, NULL will be returned. A floating point number will be cast into an integer.
Syntax: TOINT( expression )
Arguments:
expression: An expression that returns anything
Query
RETURN toInt("42"), toInt("not a number")
Result
toInt("42") toInt("not a number")
42 <null>
1 row
TOFLOAT
TOFLOAT converts the argument to a float. A string is parsed as if it were a floating point number. If the
parsing fails, NULL will be returned. An integer will be cast to a floating point number.
Syntax: TOFLOAT( expression )
Arguments:
expression: An expression that returns anything
Query
RETURN toFloat("11.5"), toFloat("not a number")
Result
toFloat("11.5") toFloat("not a number")
11.5 <null>
1 row
13.3. Collection functions
Collection functions return collections of things: nodes in a path, and so on.
See also the section called “Collection operators” [124].
Figure13.3.Graph
nam e = 'Daniel'
age = 54
eyes = 'brown'
Spouse
array = ['one', 'two', 'three']
nam e = 'Eskil'
age = 41
eyes = 'blue'
foo, bar
nam e = 'Alice'
age = 38
eyes = 'brown'
nam e = 'Charlie'
age = 53
eyes = 'green'
KNOWS
nam e = 'Bob'
age = 25
eyes = 'blue'
KNOWS
KNOWS KNOWS MARRIED
NODES
Returns all nodes in a path.
Syntax: NODES( path )
Arguments:
path: A path.
Query
MATCH p=(a)-->(b)-->(c)
WHERE a.name='Alice' AND c.name='Eskil'
RETURN nodes(p)
All the nodes in the path p are returned by the example query.
Result
nodes(p)
[Node[2]{name:"Alice", age:38, eyes:"brown"}, Node[3]{name:"Bob", age:25, eyes:"blue"}, Node[1]{array:
["one", "two", "three"], name:"Eskil", age:41, eyes:"blue"}]
1 row
RELATIONSHIPS
Returns all relationships in a path.
Syntax: RELATIONSHIPS( path )
Arguments:
path: A path.
Query
MATCH p=(a)-->(b)-->(c)
WHERE a.name='Alice' AND c.name='Eskil'
RETURN relationships(p)
All the relationships in the path p are returned.
Result
relationships(p)
[:KNOWS[0]{}, :MARRIED[4]{}]
1 row
LABELS
Returns a collection of string representations for the labels attached to a node.
Syntax: LABELS( node )
Arguments:
node: Any expression that returns a single node
Query
MATCH (a)
WHERE a.name='Alice'
RETURN labels(a)
The labels of n is returned by the query.
Result
labels(a)
["foo", "bar"]
1 row
KEYS
Returns a collection of string representations for the property names of a node, relationship, or map.
Syntax: KEYS( property-container )
Arguments:
property-container: A node, a relationship, or a literal map.
Query
MATCH (a)
WHERE a.name='Alice'
RETURN keys(a)
The name of the properties of n is returned by the query.
Result
keys(a)
["name", "age", "eyes"]
1 row
EXTRACT
To return a single property, or the value of a function, from a collection of nodes or relationships, you
can use EXTRACT. It will go through a collection, run an expression on every element, and return the
results in a collection with these values. It works like the map method in functional languages such as
Lisp and Scala.
Syntax: EXTRACT( identifier in collection | expression )
Arguments:
collection: An expression that returns a collection
identifier: The closure will have an identifier introduced in its context. Here you decide which
identifier to use.
expression: This expression will run once per value in the collection, and produces the result
collection.
Query
MATCH p=(a)-->(b)-->(c)
WHERE a.name='Alice' AND b.name='Bob' AND c.name='Daniel'
RETURN extract(n IN nodes(p)| n.age) AS extracted
The age property of all nodes in the path is returned.
Result
extracted
[38, 25, 54]
1 row
FILTER
FILTER returns all the elements in a collection that satisfy a predicate.
Syntax: FILTER(identifier in collection WHERE predicate)
Arguments:
collection: An expression that returns a collection
identifier: This is the identifier that can be used from the predicate.
predicate: A predicate that is tested against all items in the collection.
Query
MATCH (a)
WHERE a.name='Eskil'
RETURN a.array, filter(x IN a.array WHERE size(x)= 3)
This returns the property named array, and a list of the values in it that have size 3.
Result
a.array filter(x in a.array WHERE size(x) = 3)
["one", "two", "three"] ["one", "two"]
1 row
TAIL
TAIL returns all but the first element in a collection.
Syntax: TAIL( expression )
Arguments:
expression: This expression should return a collection of some kind.
Query
MATCH (a)
WHERE a.name='Eskil'
RETURN a.array, tail(a.array)
This returns the property named array and all elements of that property except the first one.
Result
a.array tail(a.array)
["one", "two", "three"] ["two", "three"]
1 row
RANGE
Returns numerical values in a range with a non-zero step value step. The range is inclusive at both ends.
Syntax: RANGE( start, end [, step] )
Arguments:
start: A numerical expression.
end: A numerical expression.
step: A numerical expression.
Query
RETURN range(0,10), range(2,18,3)
Two lists of numbers are returned.
Result
range(0,10) range(2,18,3)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] [2, 5, 8, 11, 14, 17]
1 row
REDUCE
To run an expression against individual elements of a collection, and store the result of the expression
in an accumulator, you can use REDUCE. It will go through a collection, run an expression on every
element, storing the partial result in the accumulator. It works like the fold or reduce method in
functional languages such as Lisp and Scala.
Syntax: REDUCE( accumulator = initial, identifier in collection | expression )
Arguments:
accumulator: An identifier that will hold the result and the partial results as the collection is iterated
initial: An expression that runs once to give a starting value to the accumulator
collection: An expression that returns a collection
identifier: The closure will have an identifier introduced in its context. Here you decide which
identifier to use.
expression: This expression will run once per value in the collection, and produces the result value.
Query
MATCH p=(a)-->(b)-->(c)
WHERE a.name='Alice' AND b.name='Bob' AND c.name='Daniel'
RETURN reduce(totalAge = 0, n IN nodes(p)| totalAge + n.age) AS reduction
The age properties of all nodes in the path are summed and returned as a single value.
Result
reduction
117
1 row
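Since the accumulator may hold any type, REDUCE is not restricted to numbers. As a further illustrative sketch against the same example graph (not an example from this manual), the initial value and expression below build up a string instead:

```cypher
MATCH p=(a)-->(b)-->(c)
WHERE a.name='Alice' AND b.name='Bob' AND c.name='Daniel'
RETURN reduce(names = '', n IN nodes(p)| names + n.name + ' ') AS names
```

Here the accumulator names starts as the empty string and each iteration appends one node's name.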
13.4. Mathematical functions
These functions all operate on numerical expressions only, and will return an error if used on any other
values.
See also the section called “Mathematical operators” [124].
Figure13.4.Graph
nam e = 'Daniel'
age = 54
eyes = 'brown'
Spouse
array = ['one', 'two', 'three']
nam e = 'Eskil'
age = 41
eyes = 'blue'
foo, bar
nam e = 'Alice'
age = 38
eyes = 'brown'
nam e = 'Charlie'
age = 53
eyes = 'green'
KNOWS
nam e = 'Bob'
age = 25
eyes = 'blue'
KNOWS
KNOWS KNOWS MARRIED
ABS
ABS returns the absolute value of a number.
Syntax: ABS( expression )
Arguments:
expression: A numeric expression.
Query
MATCH (a),(e)
WHERE a.name = 'Alice' AND e.name = 'Eskil'
RETURN a.age, e.age, abs(a.age - e.age)
The absolute value of the age difference is returned.
Result
a.age e.age abs(a.age - e.age)
38 41 3.0
1 row
ACOS
ACOS returns the arccosine of the expression, in radians.
Syntax: ACOS( expression )
Arguments:
expression: A numeric expression.
Query
RETURN acos(0.5)
The arccosine of 0.5.
Result
acos(0.5)
1.0471975511965979
1 row
ASIN
ASIN returns the arcsine of the expression, in radians.
Syntax: ASIN( expression )
Arguments:
expression: A numeric expression.
Query
RETURN asin(0.5)
The arcsine of 0.5.
Result
asin(0.5)
0.5235987755982989
1 row
ATAN
ATAN returns the arctangent of the expression, in radians.
Syntax: ATAN( expression )
Arguments:
expression: A numeric expression.
Query
RETURN atan(0.5)
The arctangent of 0.5.
Result
atan(0.5)
0.4636476090008061
1 row
ATAN2
ATAN2 returns the arctangent2 of a set of coordinates, in radians.
Syntax: ATAN2( expression , expression)
Arguments:
expression: A numeric expression for y.
expression: A numeric expression for x.
Query
RETURN atan2(0.5, 0.6)
The arctangent2 of 0.5, 0.6.
Result
atan2(0.5, 0.6)
0.6947382761967033
1 row
CEIL
CEIL returns the smallest integer greater than or equal to the number.
Syntax: CEIL( expression )
Arguments:
expression: A numeric expression.
Query
RETURN ceil(0.1)
The ceil of 0.1 is returned.
Result
ceil(0.1)
1.0
1 row
COS
COS returns the cosine of the expression.
Syntax: COS( expression )
Arguments:
expression: A numeric expression.
Query
RETURN cos(0.5)
The cosine of 0.5 is returned.
Result
cos(0.5)
0.8775825618903728
1 row
COT
COT returns the cotangent of the expression.
Syntax: COT( expression )
Arguments:
expression: A numeric expression.
Query
RETURN cot(0.5)
The cotangent of 0.5 is returned.
Result
cot(0.5)
1.830487721712452
1 row
DEGREES
DEGREES converts radians to degrees.
Syntax: DEGREES( expression )
Arguments:
expression: A numeric expression.
Query
RETURN degrees(3.14159)
The number of degrees in something close to pi.
Result
degrees(3.14159)
179.99984796050427
1 row
E
E returns the mathematical constant e.
Syntax: E()
Arguments:
Query
RETURN e()
The constant e is returned (the base of natural log).
Result
e()
2.718281828459045
1 row
EXP
EXP returns the value e raised to the power of the expression.
Syntax: EXP( expression )
Arguments:
expression: A numeric expression.
Query
RETURN exp(2)
The exp of 2 is returned: e^2.
Result
exp(2)
7.38905609893065
1 row
FLOOR
FLOOR returns the greatest integer less than or equal to the expression.
Syntax: FLOOR( expression )
Arguments:
expression: A numeric expression.
Query
RETURN floor(0.9)
The floor of 0.9 is returned.
Result
floor(0.9)
0.0
1 row
HAVERSIN
HAVERSIN returns half the versine of the expression.
Syntax: HAVERSIN( expression )
Arguments:
expression: A numeric expression.
Query
RETURN haversin(0.5)
The haversine of 0.5 is returned.
Result
haversin(0.5)
0.06120871905481362
1 row
Spherical distance using the haversin function
The haversin function may be used to compute the distance on the surface of a sphere between two
points (each given by their latitude and longitude). In this example the spherical distance (in km)
between Berlin in Germany (at lat 52.5, lon 13.4) and San Mateo in California (at lat 37.5, lon -122.3) is
calculated using an average earth radius of 6371 km.
Query
CREATE (ber:City { lat: 52.5, lon: 13.4 }),(sm:City { lat: 37.5, lon: -122.3 })
RETURN 2 * 6371 * asin(sqrt(haversin(radians(sm.lat - ber.lat))+ cos(radians(sm.lat))*
cos(radians(ber.lat))* haversin(radians(sm.lon - ber.lon)))) AS dist
The distance between Berlin and San Mateo is returned (about 9129 km).
Result
dist
9129.969740051658
1 row
Nodes created: 2
Properties set: 4
Labels added: 2
LOG
LOG returns the natural logarithm of the expression.
Syntax: LOG( expression )
Arguments:
expression: A numeric expression.
Query
RETURN log(27)
The log of 27 is returned.
Result
log(27)
3.295836866004329
1 row
LOG10
LOG10 returns the base 10 logarithm of the expression.
Syntax: LOG10( expression )
Arguments:
expression: A numeric expression.
Query
RETURN log10(27)
The log10 of 27 is returned.
Result
log10(27)
1.4313637641589874
1 row
PI
PI returns the mathematical constant pi.
Syntax: PI()
Arguments:
Query
RETURN pi()
The constant pi is returned.
Result
pi()
3.141592653589793
1 row
RADIANS
RADIANS converts degrees to radians.
Syntax: RADIANS( expression )
Arguments:
expression: A numeric expression.
Query
RETURN radians(180)
The number of radians in 180 degrees is returned (pi).
Result
radians(180)
3.141592653589793
1 row
RAND
RAND returns a random double between 0 and 1.0.
Syntax: RAND()
Arguments:
Query
RETURN rand() AS x1
A random number is returned.
Result
x1
0.38708134468955
1 row
ROUND
ROUND returns the numerical expression, rounded to the nearest integer.
Syntax: ROUND( expression )
Arguments:
expression: A numerical expression.
Query
RETURN round(3.141592)
Result
round(3.141592)
3.0
1 row
SIGN
SIGN returns the signum of a number: zero if the expression is zero, -1 for any negative number, and 1
for any positive number.
Syntax: SIGN( expression )
Arguments:
expression: A numerical expression
Query
RETURN sign(-17), sign(0.1)
Result
sign(-17) sign(0.1)
-1.0 1.0
1 row
SIN
SIN returns the sine of the expression.
Syntax: SIN( expression )
Arguments:
expression: A numeric expression.
Query
RETURN sin(0.5)
The sine of 0.5 is returned.
Result
sin(0.5)
0.479425538604203
1 row
SQRT
SQRT returns the square root of a number.
Syntax: SQRT( expression )
Arguments:
expression: A numerical expression
Query
RETURN sqrt(256)
Result
sqrt(256)
16.0
1 row
TAN
TAN returns the tangent of the expression.
Syntax: TAN( expression )
Arguments:
expression: A numeric expression.
Query
RETURN tan(0.5)
The tangent of 0.5 is returned.
Result
tan(0.5)
0.5463024898437905
1 row
13.5. String functions
These functions all operate on string expressions only, and will return an error if used on any other
values. The exception to this rule is TOSTRING(), which also accepts numbers.
See also the section called “String operators” [124].
Figure13.5.Graph
nam e = 'Daniel'
age = 54
eyes = 'brown'
Spouse
array = ['one', 'two', 'three']
nam e = 'Eskil'
age = 41
eyes = 'blue'
foo, bar
nam e = 'Alice'
age = 38
eyes = 'brown'
nam e = 'Charlie'
age = 53
eyes = 'green'
KNOWS
nam e = 'Bob'
age = 25
eyes = 'blue'
KNOWS
KNOWS KNOWS MARRIED
STR
STR returns a string representation of the expression. If the expression returns a string, the result will be
wrapped in quotation marks.
Syntax: STR( expression )
Arguments:
expression: An expression that returns anything
Query
RETURN str(1), str("hello")
Result
str(1) str("hello")
"1" ""hello""
1 row
Note
The STR() function is deprecated from Neo4j version 2.3 and onwards. This means it may be
removed in a future Neo4j major release.
REPLACE
REPLACE returns a string with the search string replaced by the replace string. It replaces all occurrences.
Syntax: REPLACE( original, search, replace )
Arguments:
Functions
239
original: An expression that returns a string
search: An expression that returns a string to search for
replace: An expression that returns the string to replace the search string with
Query
RETURN replace("hello", "l", "w")
Result
replace("hello", "l", "w")
"hewwo"
1 row
SUBSTRING
SUBSTRING returns a substring of the original, with a 0-based index start and length. If length is omitted,
it returns a substring from start until the end of the string.
Syntax: SUBSTRING( original, start [, length] )
Arguments:
original: An expression that returns a string
start: An expression that returns a positive number
length: An expression that returns a positive number
Query
RETURN substring("hello", 1, 3), substring("hello", 2)
Result
substring("hello", 1, 3) substring("hello", 2)
"ell" "llo"
1 row
LEFT
LEFT returns a string containing the left n characters of the original string.
Syntax: LEFT( original, length )
Arguments:
original: An expression that returns a string
length: An expression that returns a positive number
Query
RETURN left("hello", 3)
Result
left("hello", 3)
"hel"
1 row
RIGHT
RIGHT returns a string containing the right n characters of the original string.
Syntax: RIGHT( original, length )
Functions
240
Arguments:
original: An expression that returns a string
length: An expression that returns a positive number
Query
RETURN right("hello", 3)
Result
right("hello", 3)
"llo"
1 row
LTRIM
LTRIM returns the original string with whitespace removed from the left side.
Syntax: LTRIM( original )
Arguments:
original: An expression that returns a string
Query
RETURN ltrim(" hello")
Result
ltrim(" hello")
"hello"
1 row
RTRIM
RTRIM returns the original string with whitespace removed from the right side.
Syntax: RTRIM( original )
Arguments:
original: An expression that returns a string
Query
RETURN rtrim("hello ")
Result
rtrim("hello ")
"hello"
1 row
TRIM
TRIM returns the original string with whitespace removed from both sides.
Syntax: TRIM( original )
Arguments:
original: An expression that returns a string
Functions
241
Query
RETURN trim(" hello ")
Result
trim(" hello ")
"hello"
1 row
LOWER
LOWER returns the original string in lowercase.
Syntax: LOWER( original )
Arguments:
original: An expression that returns a string
Query
RETURN lower("HELLO")
Result
lower("HELLO")
"hello"
1 row
UPPER
UPPER returns the original string in uppercase.
Syntax: UPPER( original )
Arguments:
original: An expression that returns a string
Query
RETURN upper("hello")
Result
upper("hello")
"HELLO"
1 row
SPLIT
SPLIT returns the collection of strings obtained by splitting the original string around the split pattern.
Syntax: SPLIT( original, splitPattern )
Arguments:
original: An expression that returns a string
splitPattern: The string to split the original string with
Query
RETURN split("one,two", ",")
Functions
242
Result
split("one,two", ",")
["one", "two"]
1 row
REVERSE
REVERSE returns the original string reversed.
Syntax: REVERSE( original )
Arguments:
original: An expression that returns a string
Query
RETURN reverse("anagram")
Result
reverse("anagram")
"margana"
1 row
TOSTRING
TOSTRING converts the argument to a string. It converts integral and floating point numbers to strings,
and if called with a string will leave it unchanged.
Syntax: TOSTRING( expression )
Arguments:
expression: An expression that returns a number or a string
Query
RETURN toString(11.5), toString("already a string")
Result
toString(11.5) toString("already a string")
"11.5" "already a string"
1 row
Chapter14.Schema
Neo4j 2.0 introduced an optional schema for the graph, based around the concept of labels. Labels are
used in the specification of indexes, and for defining constraints on the graph. Together, indexes and
constraints are the schema of the graph. Cypher includes data definition language (DDL) statements for
manipulating the schema.
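As a minimal sketch of what such DDL statements look like (using a hypothetical Movie label rather than an example from this chapter), an index and a uniqueness constraint follow the same pattern; each statement is run on its own:

```cypher
// Index on the title property of all :Movie nodes
CREATE INDEX ON :Movie(title)

// Uniqueness constraint on the same label/property combination
CREATE CONSTRAINT ON (m:Movie) ASSERT m.title IS UNIQUE
```

The sections below cover each kind of schema statement in detail.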
Schema
244
14.1. Indexes
A database index is a redundant copy of information in the database, kept for the purpose of making
retrieval of that data more efficient. This comes at the cost of additional storage space and slower writes,
so deciding what to index and what not to index is an important and often non-trivial task.
Cypher allows the creation of indexes over a property for all nodes that have a given label. Once
an index has been created, it will automatically be managed and kept up to date by the database
whenever the graph is changed. Neo4j will automatically pick up and start using the index once it has
been created and brought online.
Create an index
To create an index on a property for all nodes that have a label, use CREATE INDEX ON. Note that the index
is not immediately available, but will be created in the background.
Query
CREATE INDEX ON :Person(name)
Result
(empty result)
Drop an index
To drop an index on all nodes that have a label and property combination, use the DROP INDEX clause.
Query
DROP INDEX ON :Person(name)
Result
(empty result)
Indexes removed: 1
Use index
There is usually no need to specify which indexes to use in a query; Cypher will figure that out by itself.
For example, the query below will use the Person(name) index, if it exists. If you want Cypher to use
specific indexes, you can enforce it using hints. See Section 10.8, “Using” [152].
Query
MATCH (person:Person { name: 'Andres' })
RETURN person
Query Plan
+-----------------+----------------+------+---------+-------------+---------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+------+---------+-------------+---------------+
| +ProduceResults | 1 | 1 | 0 | person | person |
| | +----------------+------+---------+-------------+---------------+
| +NodeIndexSeek | 1 | 1 | 2 | person | :Person(name) |
+-----------------+----------------+------+---------+-------------+---------------+
Total database accesses: 2
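Hints themselves are covered in Section 10.8, “Using” [152]. As a brief sketch only (not one of this section's worked examples), forcing the Person(name) index would look like the following; note that the hinted property must then appear in a WHERE clause:

```cypher
MATCH (person:Person)
USING INDEX person:Person(name)
WHERE person.name = 'Andres'
RETURN person
```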
Use index with WHERE using equality
Indexes are also automatically used for equality comparisons of an indexed property in the WHERE
clause. If you want Cypher to use specific indexes, you can enforce it using hints. See Section 10.8,
“Using” [152].
Query
MATCH (person:Person)
WHERE person.name = 'Andres'
RETURN person
Query Plan
+-----------------+----------------+------+---------+-------------+---------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+------+---------+-------------+---------------+
| +ProduceResults | 1 | 1 | 0 | person | person |
| | +----------------+------+---------+-------------+---------------+
| +NodeIndexSeek | 1 | 1 | 2 | person | :Person(name) |
+-----------------+----------------+------+---------+-------------+---------------+
Total database accesses: 2
Use index with WHERE using inequality
Indexes are also automatically used for inequality (range) comparisons of an indexed property in
the WHERE clause. If you want Cypher to use specific indexes, you can enforce it using hints. See
Section10.8, “Using” [152].
Query
MATCH (person:Person)
WHERE person.name > 'B'
RETURN person
Query Plan
+-----------------------+----------------+------+---------+-------------+---------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------------+----------------+------+---------+-------------+---------------------------------+
| +ProduceResults | 33 | 1 | 0 | person | person |
| | +----------------+------+---------+-------------+---------------------------------+
| +NodeIndexSeekByRange | 33 | 1 | 2 | person | :Person(name) > { AUTOSTRING0} |
+-----------------------+----------------+------+---------+-------------+---------------------------------+
Total database accesses: 2
Use index with IN
The IN predicate on person.name in the following query will use the Person(name) index, if it exists. If you
want Cypher to use specific indexes, you can enforce it using hints. See Section 10.8, “Using” [152].
Query
MATCH (person:Person)
WHERE person.name IN ['Andres', 'Mark']
RETURN person
Query Plan
+-----------------+----------------+------+---------+-------------+---------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+------+---------+-------------+---------------+
| +ProduceResults | 2 | 2 | 0 | person | person |
| | +----------------+------+---------+-------------+---------------+
| +NodeIndexSeek | 2 | 2 | 4 | person | :Person(name) |
+-----------------+----------------+------+---------+-------------+---------------+
Total database accesses: 4
Use index with STARTS WITH
The STARTS WITH predicate on person.name in the following query will use the Person(name) index, if it
exists.
Note
The similar operators ENDS WITH and CONTAINS cannot currently be solved using indexes.
Query
MATCH (person:Person)
WHERE person.name STARTS WITH 'And'
RETURN person
Query Plan
+-----------------------+----------------+------+---------+-------------+-------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------------+----------------+------+---------+-------------+-------------------------------------------+
| +ProduceResults | 26 | 1 | 0 | person | person |
| | +----------------+------+---------+-------------+-------------------------------------------+
| +NodeIndexSeekByRange | 26 | 1 | 2 | person | :Person(name STARTS WITH { AUTOSTRING0}) |
+-----------------------+----------------+------+---------+-------------+-------------------------------------------+
Total database accesses: 2
Use index when checking for the existence of a property
The has(p.name) predicate in the following query will use the Person(name) index, if it exists.
Query
MATCH (p:Person)
WHERE HAS (p.name)
RETURN p
Query Plan
+-----------------+----------------+------+---------+-------------+---------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+------+---------+-------------+---------------+
| +ProduceResults | 2 | 2 | 0 | p | p |
| | +----------------+------+---------+-------------+---------------+
| +NodeIndexScan | 2 | 2 | 3 | p | :Person(name) |
+-----------------+----------------+------+---------+-------------+---------------+
Total database accesses: 3
14.2. Constraints
Neo4j helps enforce data integrity with the use of constraints. Constraints can be applied
to either nodes or relationships. Unique node property constraints can be created, as well
as node and relationship property existence constraints.
You can use unique property constraints to ensure that property values are unique for all nodes with
a specific label. Unique constraints do not mean that all nodes have to have a unique value for the
property; nodes without the property are not subject to this rule.
You can use property existence constraints to ensure that a property exists for all nodes with a specific
label or for all relationships with a specific type. All queries that try to create new nodes or relationships
without the property, or that try to remove the mandatory property, will fail.
Note
Property existence constraints are only available in the Neo4j Enterprise Edition. Note that
databases with property existence constraints cannot be opened using Neo4j Community
Edition.
You can have multiple constraints for a given label and you can also combine unique and property
existence constraints on the same property.
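For instance, running both of the statements below (a sketch reusing the Book examples from later in this section) makes isbn both unique and mandatory for :Book nodes; recall that the existence constraint requires Neo4j Enterprise Edition:

```cypher
// Each statement is run separately
CREATE CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE
CREATE CONSTRAINT ON (book:Book) ASSERT exists(book.isbn)
```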
Remember that adding constraints is an atomic operation that can take a while: all existing data has
to be scanned before Neo4j can turn the constraint “on”.
Note that adding a unique property constraint on a property will also add an index on that property, so
you cannot add such an index separately. Cypher will use that index for lookups just like other indexes.
If you drop a unique property constraint and still want an index on the property, you will have to create
the index.
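In other words, dropping a unique property constraint also removes its backing index. A brief sketch of keeping index-backed lookups afterwards (reusing the Book examples from later in this section):

```cypher
// Dropping the constraint also drops the index it created
DROP CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE

// Recreate a plain index to keep fast lookups on isbn
CREATE INDEX ON :Book(isbn)
```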
The existing constraints can be listed using the REST API; see Section 21.16, “Constraints” [370].
Unique node property constraints
Create uniqueness constraint
To create a constraint that makes sure that your database will never contain more than one node with a
specific label and one property value, use the IS UNIQUE syntax.
Query
CREATE CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE
Result
(empty result)
Unique constraints added: 1
Drop uniqueness constraint
By using DROP CONSTRAINT, you remove a constraint from the database.
Query
DROP CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE
Result
(empty result)
Unique constraints removed: 1
Create a node that complies with unique property constraints
Create a Book node with an isbn that isn’t already in the database.
Query
CREATE (book:Book { isbn: '1449356265', title: 'Graph Databases' })
Result
(empty result)
Nodes created: 1
Properties set: 2
Labels added: 1
Create a node that breaks a unique property constraint
Create a Book node with an isbn that is already used in the database.
Query
CREATE (book:Book { isbn: '1449356265', title: 'Graph Databases' })
In this case the node isn’t created in the graph.
Error message
Node 0 already exists with label Book and property "isbn"=[1449356265]
Failure to create a unique property constraint due to conflicting nodes
Create a unique property constraint on the property isbn on nodes with the Book label when there are
two nodes with the same isbn.
Query
CREATE CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE
In this case the constraint can’t be created because it is violated by existing data. We may choose to use
Section 14.1, “Indexes” [244] instead, or remove the offending nodes and then re-apply the constraint.
Error message
Unable to create CONSTRAINT ON ( book:Book ) ASSERT book.isbn IS UNIQUE:
Multiple nodes with label `Book` have property `isbn` = '1449356265':
node(0)
node(1)
Node property existence constraints
Create node property existence constraint
To create a constraint that makes sure that all nodes with a certain label have a certain property, use
the ASSERT exists(identifier.propertyName) syntax.
Query
CREATE CONSTRAINT ON (book:Book) ASSERT exists(book.isbn)
Result
(empty result)
Property existence constraints added: 1
Drop node property existence constraint
By using DROP CONSTRAINT, you remove a constraint from the database.
Query
DROP CONSTRAINT ON (book:Book) ASSERT exists(book.isbn)
Result
(empty result)
Property existence constraints removed: 1
Create a node that complies with property existence constraints
Create a Book node with an existing isbn property.
Query
CREATE (book:Book { isbn: '1449356265', title: 'Graph Databases' })
Result
(empty result)
Nodes created: 1
Properties set: 2
Labels added: 1
Create a node that breaks a property existence constraint
Trying to create a Book node without an isbn property, given a property existence constraint on
:Book(isbn).
Query
CREATE (book:Book { title: 'Graph Databases' })
In this case the node isn’t created in the graph.
Error message
Node 1 with label "Book" must have the property "isbn" due to a constraint
Removing an existence constrained node property
Trying to remove the isbn property from an existing node book, given a property existence constraint on
:Book(isbn).
Query
MATCH (book:Book { title: 'Graph Databases' })
REMOVE book.isbn
In this case the property is not removed.
Error message
Node 0 with label "Book" must have the property "isbn" due to a constraint
Failure to create a node property existence constraint due to existing node
Create a constraint on the property isbn on nodes with the Book label when there already exists a node
without an isbn.
Query
CREATE CONSTRAINT ON (book:Book) ASSERT exists(book.isbn)
In this case the constraint can’t be created because it is violated by existing data. We may choose to
remove the offending nodes and then re-apply the constraint.
Error message
Unable to create CONSTRAINT ON ( book:Book ) ASSERT exists(book.isbn):
Node(0) with label `Book` has no value for property `isbn`
Relationship property existence constraints
Create relationship property existence constraint
To create a constraint that makes sure that all relationships with a certain type have a certain property,
use the ASSERT exists(identifier.propertyName) syntax.
Query
CREATE CONSTRAINT ON ()-[like:LIKED]-() ASSERT exists(like.day)
Result
(empty result)
Property existence constraints added: 1
Drop relationship property existence constraint
To remove a constraint from the database, use DROP CONSTRAINT.
Query
DROP CONSTRAINT ON ()-[like:LIKED]-() ASSERT exists(like.day)
Result
(empty result)
Property existence constraints removed: 1
Create a relationship that complies with property existence constraints
Create a LIKED relationship with an existing day property.
Query
CREATE (user:User)-[like:LIKED { day: 'yesterday' }]->(book:Book)
Result
(empty result)
Nodes created: 2
Relationships created: 1
Properties set: 1
Labels added: 2
Create a relationship that breaks a property existence constraint
Trying to create a LIKED relationship without a day property, given a property existence constraint on
:LIKED(day).
Query
CREATE (user:User)-[like:LIKED]->(book:Book)
In this case the relationship isn’t created in the graph.
Error message
Relationship 1 with type "LIKED" must have the property "day" due to a constraint
Removing an existence constrained relationship property
Trying to remove the day property from an existing relationship like of type LIKED, given a property
existence constraint on :LIKED(day).
Query
MATCH (user:User)-[like:LIKED]->(book:Book)
REMOVE like.day
In this case the property is not removed.
Error message
Relationship 0 with type "LIKED" must have the property "day" due to a constraint
Failure to create a relationship property existence constraint due to existing relationship
Create a constraint on the property day on relationships with the LIKED type when there already exists a
relationship without a property named day.
Query
CREATE CONSTRAINT ON ()-[like:LIKED]-() ASSERT exists(like.day)
In this case the constraint can’t be created because it is violated by existing data. We may choose to
remove the offending relationships and then re-apply the constraint.
Error message
Unable to create CONSTRAINT ON ()-[ liked:LIKED ]-() ASSERT exists(liked.day):
Relationship(0) with type `LIKED` has no value for property `day`
14.3. Statistics
When you issue a Cypher query, it gets compiled to an execution plan (see Chapter 16, Execution
Plans [259]) that can run and answer your question. To produce an efficient plan for your query,
Neo4j needs information about your database, such as the schema: which indexes and constraints
exist? Neo4j will also use statistical information it keeps about your database to optimize the execution
plan. With this information, Neo4j can decide which access pattern leads to the best performing plans.
The statistical information that Neo4j keeps is:
1. The number of nodes with a certain label.
2. Selectivity per index.
3. The number of relationships by type.
4. The number of relationships by type, ending or starting from a node with a specific label.
Neo4j keeps the statistics up to date in two different ways. For label counts for example, the number
is updated whenever you set or remove a label from a node. For indexes, Neo4j needs to scan the full
index to produce the selectivity number. Since this is potentially a very time-consuming operation,
these numbers are collected in the background when enough data on the index has been changed.
Configuration options
Execution plans are cached and will not be replanned until the statistical information used to produce
the plan has changed. The following configuration options allow you to control how sensitive
replanning should be to updates of the database.
index_background_sampling_enabled
Controls whether indexes will automatically be re-sampled when they
have been updated enough. The Cypher query planner depends on
accurate statistics to create efficient plans, so it is important that they
are kept up to date as the database evolves.
Tip
If background sampling is turned off, make sure to trigger
manual sampling when data has been updated.
index_sampling_update_percentage Controls how large a portion of the index has to have been updated
before a new sampling run is triggered.
dbms.cypher.statistics_divergence_threshold
Controls how much the above statistical information is allowed to
change before an execution plan is considered stale and has to be
replanned. If the relative change in any of the statistics is larger than this
threshold, the plan will be thrown away and a new one will be created.
A threshold of 0.0 means always replan, and a value of 1.0 means
never replan.
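As an illustrative sketch only (the values below are hypothetical, and the exact file layout depends on your installation; the index sampling options typically go in conf/neo4j.properties), the settings could look like this:

```properties
# Re-sample indexes in the background as they change
index_background_sampling_enabled=true
# Trigger a new sampling run after 5% of the index has been updated
index_sampling_update_percentage=5
# Replan once any statistic has drifted by more than 10%
dbms.cypher.statistics_divergence_threshold=0.1
```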
Managing statistics from the shell
Usage:
schema sample -a will sample all indexes.
schema sample -l Person -p name will sample the index for label Person on property name (if it exists).
schema sample -a -f will force a sample of all indexes.
schema sample -f -l :Person -p name will force sampling of a specific index.
Chapter15.Query Tuning
Neo4j works very hard to execute queries as fast as possible.
However, when optimizing for maximum query execution performance, it may be helpful to rephrase
queries using knowledge about the domain and the application.
The overall goal of manual query performance optimization is to ensure that only necessary data is
retrieved from the graph. At the very least, data should be filtered out as early as possible, in order
to reduce the amount of work that has to be done in later stages of query execution. The same applies
to what gets returned: avoid returning whole nodes and relationships; instead, pick the data you need
and return only that. You should also make sure to set an upper limit on variable length patterns, so
they don’t cover larger portions of the dataset than needed.
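For example, bounding the length of a variable length pattern keeps the traversal from exploring more of the graph than necessary. The FRIENDS_WITH relationship type here is just an illustration, assuming a social graph:
MATCH (me:Person { name:"Tom Hanks" })-[:FRIENDS_WITH*1..3]->(friend:Person)
RETURN DISTINCT friend
An unbounded version of the same pattern, -[:FRIENDS_WITH*]->, could follow arbitrarily long paths and touch far more of the dataset.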
Each Cypher query gets optimized and transformed into an execution plan by the Cypher execution
engine. To minimize the resources used for this, make sure to use parameters instead of literals when
possible. This allows Cypher to re-use execution plans for your queries instead of having to parse
them and build new plans.
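For example, instead of embedding the name as a literal, the query can take it as a parameter, so one cached execution plan serves every lookup. The Person label and name property follow the movie data set used later in this chapter:
MATCH (p:Person { name: { name }})
RETURN p
The actual value, e.g. { "name": "Tom Hanks" }, is then supplied separately when the query is run.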
To read more about the execution plan operators mentioned in this chapter, see Chapter 16, Execution
Plans [259].
15.1. How are queries executed?
Each query is turned into an execution plan by something called the execution planner. The execution
plan tells Neo4j which operations to perform when executing the query. Two different execution
planning strategies are included in Neo4j:
Rule This planner has rules that are used to produce execution plans. The planner considers
available indexes, but does not use statistical information to guide the query compilation.
Cost This planner uses the statistics service in Neo4j to assign cost to alternative plans and picks
the cheapest one. While this should lead to superior execution plans in most cases, it is still
under development.
By default, Neo4j 2.3.12 will use the cost planner for some queries, but not all. You can force
it to use a specific planner by using the query.planner.version configuration setting (see
dbms.cypher.planner [467]), or by prepending your query with CYPHER planner=cost or CYPHER
planner=rule. Neo4j might still not use the planner you selected: not all queries are solvable by the
cost planner at this point. Note that using PLANNER COST or PLANNER RULE in order to switch between
planners has been deprecated and will stop working in future versions.
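For example, to force the rule planner for a single query, prepend it like this (the query itself is just an illustration):
CYPHER planner=rule
MATCH (p:Person { name:"Tom Hanks" })
RETURN p
The same query prefixed with CYPHER planner=cost would ask for the cost planner instead.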
You can see which planner was used by looking at the execution plan.
Note
When Cypher is building execution plans, it looks at the schema to see if it can find indexes
it can use. These index decisions are only valid until the schema changes, so adding or
removing indexes leads to the execution plan cache being flushed.
15.2. How do I profile a query?
There are two options to choose from when you want to analyze a query by looking at its execution
plan:
EXPLAIN If you want to see the execution plan but not run the statement, prepend your Cypher
statement with EXPLAIN. The statement will always return an empty result and make no
changes to the database.
PROFILE If you want to run the statement and see which operators are doing most of the work,
use PROFILE. This will run your statement and keep track of how many rows pass through
each operator, and how much each operator needs to interact with the storage layer to
retrieve the necessary data. Please note that profiling your query uses more resources, so
you should not profile unless you are actively working on a query.
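For example, to inspect the plan for a statement without executing it, prepend it with EXPLAIN:
EXPLAIN
MATCH (p:Person { name:"Tom Hanks" })
RETURN p
This returns an empty result together with the execution plan, and is safe to run even for updating statements, since nothing is changed in the database.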
See Chapter16, Execution Plans [259] for a detailed explanation of each of the operators contained in
an execution plan.
Tip
Being explicit about what types and labels you expect relationships and nodes to have in
your query helps Neo4j use the best possible statistical information, which leads to better
execution plans. This means that when you know that a relationship can only be of a certain
type, you should add that to the query. The same goes for labels, where declaring labels on
both the start and end nodes of a relationship helps Neo4j find the best way to execute the
statement.
15.3. Basic query tuning example
We’ll start with a basic example to help you get the hang of profiling queries. The following examples
will use a movies data set.
Let’s start by importing the data:
LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/2.3.12/csv/query-tuning/movies.csv" AS line
MERGE (m:Movie { title:line.title })
ON CREATE SET m.released = toInt(line.released), m.tagline = line.tagline
LOAD CSV WITH HEADERS FROM 'http://neo4j.com/docs/2.3.12/csv/query-tuning/actors.csv' AS line
MATCH (m:Movie { title:line.title })
MERGE (p:Person { name:line.name })
ON CREATE SET p.born = toInt(line.born)
MERGE (p)-[:ACTED_IN { roles:split(line.roles,";")}]->(m)
LOAD CSV WITH HEADERS FROM 'http://neo4j.com/docs/2.3.12/csv/query-tuning/directors.csv' AS line
MATCH (m:Movie { title:line.title })
MERGE (p:Person { name:line.name })
ON CREATE SET p.born = toInt(line.born)
MERGE (p)-[:DIRECTED]->(m)
Let’s say we want to write a query to find Tom Hanks. The naive way of doing this would be to write the
following:
MATCH (p { name:"Tom Hanks" })
RETURN p
This query will find the Tom Hanks node, but as the number of nodes in the database increases it will
become slower and slower. We can profile the query to find out why that is.
You can learn more about the options for profiling queries in Section 15.2, “How do I profile a
query?” [255], but in this case we’re going to prefix our query with PROFILE:
PROFILE
MATCH (p { name:"Tom Hanks" })
RETURN p
+-----------------+----------------+------+---------+-------------+---------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+------+---------+-------------+---------------------------+
| +ProduceResults | 16 | 1 | 0 | p | p |
| | +----------------+------+---------+-------------+---------------------------+
| +Filter | 16 | 1 | 163 | p | p.name == { AUTOSTRING0} |
| | +----------------+------+---------+-------------+---------------------------+
| +AllNodesScan | 163 | 163 | 164 | p | |
+-----------------+----------------+------+---------+-------------+---------------------------+
Total database accesses: 327
The first thing to keep in mind when reading execution plans is that you need to read from the bottom
up.
In that vein, starting from the last row, the first thing we notice is that the value in the Rows column
seems high given there is only one node with the name property Tom Hanks in the database. If we look
across to the Operator column we’ll see that AllNodesScan has been used which means that the query
planner scanned through all the nodes in the database.
Moving up to the previous row we see the Filter operator which will check the name property on each of
the nodes passed through by AllNodesScan.
This seems like an inefficient way of finding Tom Hanks given that we are looking at many nodes that
aren’t even people and therefore aren’t what we’re looking for.
The solution to this problem is that whenever we’re looking for a node we should specify a label to help
the query planner narrow down the search space. For this query we’d need to add a Person label.
MATCH (p:Person { name:"Tom Hanks" })
RETURN p
This query will be faster than the first one, but as the number of people in our database increases we
again notice that the query slows down.
Again we can profile the query to work out why:
PROFILE
MATCH (p:Person { name:"Tom Hanks" })
RETURN p
+------------------+----------------+------+---------+-------------+---------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+------------------+----------------+------+---------+-------------+---------------------------+
| +ProduceResults | 13 | 1 | 0 | p | p |
| | +----------------+------+---------+-------------+---------------------------+
| +Filter | 13 | 1 | 125 | p | p.name == { AUTOSTRING0} |
| | +----------------+------+---------+-------------+---------------------------+
| +NodeByLabelScan | 125 | 125 | 126 | p | :Person |
+------------------+----------------+------+---------+-------------+---------------------------+
Total database accesses: 251
This time the Rows value on the last row has been reduced, so we are no longer scanning all the nodes
we scanned before, which is a good start. The NodeByLabelScan operator indicates that we achieved this
by first doing a linear scan of all the Person nodes in the database.
Once we’ve done that we again scan through all those nodes using the Filter operator, comparing the
name property of each one.
This might be acceptable in some cases but if we’re going to be looking up people by name frequently
then we’ll see better performance if we create an index on the name property for the Person label:
CREATE INDEX ON :Person(name)
Now if we run the query again it will run more quickly:
MATCH (p:Person { name:"Tom Hanks" })
RETURN p
Let’s profile the query to see why that is:
PROFILE
MATCH (p:Person { name:"Tom Hanks" })
RETURN p
+-----------------+----------------+------+---------+-------------+---------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+------+---------+-------------+---------------+
| +ProduceResults | 1 | 1 | 0 | p | p |
| | +----------------+------+---------+-------------+---------------+
| +NodeIndexSeek | 1 | 1 | 2 | p | :Person(name) |
+-----------------+----------------+------+---------+-------------+---------------+
Total database accesses: 2
Our execution plan is down to a single row and uses the NodeIndexSeek operator, which does a
schema index seek (see Section 14.1, “Indexes” [244]) to find the appropriate node.
Chapter16.Execution Plans
Neo4j breaks down the work of executing a query into small pieces called operators. Each operator is
responsible for a small part of the overall query. The operators are connected together in a pattern
called an execution plan.
Each operator is annotated with statistics.
Rows The number of rows that the operator produced. Only available if the query was
profiled.
EstimatedRows If Neo4j used the cost-based compiler you will see the estimated number of rows
that will be produced by the operator. The compiler uses this estimate to choose
a suitable execution plan.
DbHits Each operator will ask the Neo4j storage engine to do work such as retrieving or
updating data. A database hit is an abstract unit of this storage engine work.
See Section15.2, “How do I profile a query?” [255] for how to view the execution plan for your query.
For a deeper understanding of how each operator works, see the relevant section. Operators are
grouped into high-level categories. Please remember that the statistics of the actual database the
queries are run on will decide the plan used. There is no guarantee that a specific query will always
be solved with the same plan.
16.1. Starting point operators
These operators find parts of the graph from which to start.
All Nodes Scan
Reads all nodes from the node store. The identifier that will contain the nodes is seen in the arguments.
If your query is using this operator, you are very likely to see performance problems on any non-trivial
database.
Query
MATCH (n)
RETURN n
Query Plan
+-----------------+----------------+------+---------+-------------+-------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+------+---------+-------------+-------+
| +ProduceResults | 35 | 35 | 0 | n | n |
| | +----------------+------+---------+-------------+-------+
| +AllNodesScan | 35 | 35 | 36 | n | |
+-----------------+----------------+------+---------+-------------+-------+
Total database accesses: 36
Directed Relationship By Id Seek
Reads one or more relationships by id from the relationship store. Produces both the relationship and
the nodes on either side.
Query
MATCH (n1)-[r]->()
WHERE id(r)= 0
RETURN r, n1
Query Plan
+-----------------------------------+----------------+------+---------+-----------------+--------------------------------------------+
| Operator                          | Estimated Rows | Rows | DB Hits | Identifiers     | Other                                      |
+-----------------------------------+----------------+------+---------+-----------------+--------------------------------------------+
| +ProduceResults                   |              1 |    1 |       0 | n1, r           | r, n1                                      |
| |                                 +----------------+------+---------+-----------------+--------------------------------------------+
| +DirectedRelationshipByIdSeekPipe |              1 |    1 |       1 | anon[17], n1, r | EntityByIdRhs(SingleSeekArg({ AUTOINT0})) |
+-----------------------------------+----------------+------+---------+-----------------+--------------------------------------------+
Total database accesses: 1
Node by Id seek
Reads one or more nodes by id from the node store.
Query
MATCH (n)
WHERE id(n)= 0
RETURN n
Query Plan
+-----------------+----------------+------+---------+-------------+-------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+------+---------+-------------+-------+
| +ProduceResults | 1 | 1 | 0 | n | n |
| | +----------------+------+---------+-------------+-------+
| +NodeByIdSeek | 1 | 1 | 1 | n | |
+-----------------+----------------+------+---------+-------------+-------+
Total database accesses: 1
Node by label scan
Fetches all nodes with a specific label from the node label index.
Query
MATCH (person:Person)
RETURN person
Query Plan
+------------------+----------------+------+---------+-------------+---------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+------------------+----------------+------+---------+-------------+---------+
| +ProduceResults | 14 | 14 | 0 | person | person |
| | +----------------+------+---------+-------------+---------+
| +NodeByLabelScan | 14 | 14 | 15 | person | :Person |
+------------------+----------------+------+---------+-------------+---------+
Total database accesses: 15
Node index seek
Finds nodes using an index seek. The node identifier and the index used is shown in the arguments of
the operator. If the index is a unique index, the operator is called NodeUniqueIndexSeek instead.
Query
MATCH (location:Location { name: "Malmo" })
RETURN location
Query Plan
+-----------------+----------------+------+---------+-------------+-----------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+------+---------+-------------+-----------------+
| +ProduceResults | 1 | 1 | 0 | location | location |
| | +----------------+------+---------+-------------+-----------------+
| +NodeIndexSeek | 1 | 1 | 2 | location | :Location(name) |
+-----------------+----------------+------+---------+-------------+-----------------+
Total database accesses: 2
Node index range seek
Finds nodes using an index seek where the value of the property matches a given prefix string. This
operator can be used for STARTS WITH and comparators such as <, >, <= and >=.
Query
MATCH (l:Location)
WHERE l.name STARTS WITH 'Lon'
RETURN l
Query Plan
+-----------------------+----------------+------+---------+-------------+---------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------------+----------------+------+---------+-------------+---------------------------------------------+
| +ProduceResults | 26 | 1 | 0 | l | l |
| | +----------------+------+---------+-------------+---------------------------------------------+
| +NodeIndexSeekByRange | 26 | 1 | 2 | l | :Location(name STARTS WITH { AUTOSTRING0}) |
+-----------------------+----------------+------+---------+-------------+---------------------------------------------+
Total database accesses: 2
Node index scan
An index scan goes through all values stored in an index, and can be used to find all nodes with a
particular label having a specified property (e.g. exists(n.prop)).
Query
MATCH (l:Location)
WHERE exists(l.name)
RETURN l
Query Plan
+-----------------+----------------+------+---------+-------------+-----------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+------+---------+-------------+-----------------+
| +ProduceResults | 10 | 10 | 0 | l | l |
| | +----------------+------+---------+-------------+-----------------+
| +NodeIndexScan | 10 | 10 | 11 | l | :Location(name) |
+-----------------+----------------+------+---------+-------------+-----------------+
Total database accesses: 11
Undirected Relationship By Id Seek
Reads one or more relationships by id from the relationship store. For each relationship, two rows are
produced with start and end nodes arranged differently.
Query
MATCH (n1)-[r]-()
WHERE id(r)= 1
RETURN r, n1
Query Plan
+---------------------------------+----------------+------+---------+-----------------+-------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+---------------------------------+----------------+------+---------+-----------------+-------+
| +ProduceResults | 1 | 2 | 0 | n1, r | r, n1 |
| | +----------------+------+---------+-----------------+-------+
| +UndirectedRelationshipByIdSeek | 1 | 2 | 1 | anon[16], n1, r | |
+---------------------------------+----------------+------+---------+-----------------+-------+
Total database accesses: 1
16.2. Expand operators
These operators explore the graph by expanding graph patterns.
Expand All
Given a start node, expand-all will follow incoming or outgoing relationships, depending on the
pattern relationship. It can also handle variable length pattern relationships.
Query
MATCH (p:Person { name: "me" })-[:FRIENDS_WITH]->(fof)
RETURN fof
Query Plan
+-----------------+----------------+------+---------+------------------+----------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+------+---------+------------------+----------------------------+
| +ProduceResults | 0 | 1 | 0 | fof | fof |
| | +----------------+------+---------+------------------+----------------------------+
| +Expand(All) | 0 | 1 | 2 | anon[30], fof, p | (p)-[:FRIENDS_WITH]->(fof) |
| | +----------------+------+---------+------------------+----------------------------+
| +NodeIndexSeek | 1 | 1 | 2 | p | :Person(name) |
+-----------------+----------------+------+---------+------------------+----------------------------+
Total database accesses: 4
Expand Into
When both the start and end node have already been found, expand-into is used to find all connecting
relationships between the two nodes.
Query
MATCH (p:Person { name: "me" })-[:FRIENDS_WITH]->(fof)-->(p)
RETURN fof
Query Plan
+-----------------+----------------+------+---------+----------------------------+----------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+------+---------+----------------------------+----------------------------+
| +ProduceResults | 0 | 0 | 0 | fof | fof |
| | +----------------+------+---------+----------------------------+----------------------------+
| +Filter | 0 | 0 | 0 | anon[30], anon[53], fof, p | NOT(anon[30] == anon[53]) |
| | +----------------+------+---------+----------------------------+----------------------------+
| +Expand(Into) | 0 | 0 | 0 | anon[30], anon[53], fof, p | (p)-[:FRIENDS_WITH]->(fof) |
| | +----------------+------+---------+----------------------------+----------------------------+
| +Expand(All) | 0 | 0 | 1 | anon[53], fof, p | (p)<--(fof) |
| | +----------------+------+---------+----------------------------+----------------------------+
| +NodeIndexSeek | 1 | 1 | 2 | p | :Person(name) |
+-----------------+----------------+------+---------+----------------------------+----------------------------+
Total database accesses: 3
Optional Expand All
Optional expand traverses relationships from a given node, and makes sure that predicates are
evaluated before producing rows.
If no matching relationships are found, a single row with NULL for the relationship and end node
identifier is produced.
Query
MATCH (p:Person)
OPTIONAL MATCH (p)-[works_in:WORKS_IN]->(l)
WHERE works_in.duration > 180
RETURN p, l
Query Plan
+----------------------+----------------+------+---------+----------------+------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+----------------------+----------------+------+---------+----------------+------------------------------+
| +ProduceResults | 14 | 15 | 0 | l, p | p, l |
| | +----------------+------+---------+----------------+------------------------------+
| +OptionalExpand(All) | 14 | 15 | 44 | l, p, works_in | (p)-[works_in:WORKS_IN]->(l) |
| | +----------------+------+---------+----------------+------------------------------+
| +NodeByLabelScan | 14 | 14 | 15 | p | :Person |
+----------------------+----------------+------+---------+----------------+------------------------------+
Total database accesses: 59
16.3. Combining operators
Node Hash Join
Using a hash table, a node hash join joins the inputs coming from the left with the inputs coming from
the right. The join key is specified in the arguments of the operator.
Query
MATCH (andy:Person { name:'Andreas' })-[:WORKS_IN]->(loc)<-[:WORKS_IN]-(matt:Person { name:'Mattis' })
RETURN loc
Query Plan
+------------------+----------------+------+---------+-------------------------------------+---------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+------------------+----------------+------+---------+-------------------------------------+---------------------------+
| +ProduceResults | 35 | 0 | 0 | loc | loc |
| | +----------------+------+---------+-------------------------------------+---------------------------+
| +Filter | 35 | 0 | 0 | anon[37], anon[56], andy, loc, matt | NOT(anon[37] == anon[56]) |
| | +----------------+------+---------+-------------------------------------+---------------------------+
| +NodeHashJoin | 35 | 0 | 0 | anon[37], anon[56], andy, loc, matt | loc |
| |\ +----------------+------+---------+-------------------------------------+---------------------------+
| | +Expand(All) | 35 | 0 | 0 | anon[56], loc, matt | (matt)-[:WORKS_IN]->(loc) |
| | | +----------------+------+---------+-------------------------------------+---------------------------+
| | +NodeIndexSeek | 1 | 0 | 1 | matt | :Person(name) |
| | +----------------+------+---------+-------------------------------------+---------------------------+
| +Expand(All) | 35 | 0 | 1 | anon[37], andy, loc | (andy)-[:WORKS_IN]->(loc) |
| | +----------------+------+---------+-------------------------------------+---------------------------+
| +NodeIndexSeek | 1 | 1 | 2 | andy | :Person(name) |
+------------------+----------------+------+---------+-------------------------------------+---------------------------+
Total database accesses: 4
Apply
Apply works by performing a nested loop. Every row being produced on the left hand side of the Apply
operator will be fed to the Argument operator on the right hand side, and then Apply will yield the
results coming from the RHS. Apply, being a nested loop, can be seen as a warning that a better plan
was not found.
Query
MATCH (p:Person)-[:FRIENDS_WITH]->(f)
WITH p, count(f) AS fs
WHERE fs > 0
OPTIONAL MATCH (p)-[:WORKS_IN*1..2]->(city)
RETURN p, city
Query Plan
+---------------------------+----------------+------+---------+----------------------------------+--------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+---------------------------+----------------+------+---------+----------------------------------+--------------------------+
| +ProduceResults | 1 | 2 | 0 | city, p | p, city |
| | +----------------+------+---------+----------------------------------+--------------------------+
| +Apply | 1 | 2 | 0 | anon[92], anon[126], city, fs, p | |
| |\ +----------------+------+---------+----------------------------------+--------------------------+
| | +Apply | 1 | 2 | 0 | anon[92], anon[126], city, fs, p | |
| | |\ +----------------+------+---------+----------------------------------+--------------------------+
| | | +Optional | 1 | 2 | 0 | anon[126], city, p | |
| | | | +----------------+------+---------+----------------------------------+--------------------------+
| | | +VarLengthExpand(All) | 1 | 2 | 6 | anon[126], city, p | (p)-[:WORKS_IN*]->(city) |
| | | | +----------------+------+---------+----------------------------------+--------------------------+
| | | +Argument | 1 | 2 | 0 | p | |
| | | +----------------+------+---------+----------------------------------+--------------------------+
| | +Filter | 1 | 2 | 0 | anon[92], fs, p | anon[92] |
| | | +----------------+------+---------+----------------------------------+--------------------------+
| | +Argument | 1 | 2 | 0 | anon[92], fs, p | |
| | +----------------+------+---------+----------------------------------+--------------------------+
| +Projection | 1 | 2 | 0 | anon[92], fs, p | p; fs; fs > { AUTOINT0} |
| | +----------------+------+---------+----------------------------------+--------------------------+
| +EagerAggregation | 1 | 2 | 0 | fs, p | p |
| | +----------------+------+---------+----------------------------------+--------------------------+
| +Expand(All) | 2 | 2 | 16 | anon[17], f, p | (p)-[:FRIENDS_WITH]->(f) |
| | +----------------+------+---------+----------------------------------+--------------------------+
| +NodeByLabelScan | 14 | 14 | 15 | p | :Person |
+---------------------------+----------------+------+---------+----------------------------------+--------------------------+
Total database accesses: 37
Anti Semi Apply
Tests for the absence of a pattern predicate. A pattern predicate that is prefixed with NOT is solved
with AntiSemiApply.
Query
MATCH (me:Person { name: "me" }),(other:Person)
WHERE NOT (me)-[:FRIENDS_WITH]->(other)
RETURN other
Query Plan
+--------------------+----------------+------+---------+---------------------+-------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+--------------------+----------------+------+---------+---------------------+-------------------------------+
| +ProduceResults | 4 | 13 | 0 | other | other |
| | +----------------+------+---------+---------------------+-------------------------------+
| +AntiSemiApply | 4 | 13 | 0 | me, other | |
| |\ +----------------+------+---------+---------------------+-------------------------------+
| | +Expand(Into) | 0 | 0 | 50 | anon[73], me, other | (me)-[:FRIENDS_WITH]->(other) |
| | | +----------------+------+---------+---------------------+-------------------------------+
| | +Argument | 14 | 14 | 0 | me, other | |
| | +----------------+------+---------+---------------------+-------------------------------+
| +CartesianProduct | 14 | 14 | 0 | me, other | |
| |\ +----------------+------+---------+---------------------+-------------------------------+
| | +NodeByLabelScan | 14 | 14 | 15 | other | :Person |
| | +----------------+------+---------+---------------------+-------------------------------+
| +NodeIndexSeek | 1 | 1 | 2 | me | :Person(name) |
+--------------------+----------------+------+---------+---------------------+-------------------------------+
Total database accesses: 67
Let Anti Semi Apply
Tests for the absence of a pattern predicate. When a query contains multiple pattern predicates,
LetAntiSemiApply will be used to evaluate the first of these. It will record the result of evaluating
the predicate but will leave any filtering to another operator. The following query will find all the
people who don’t have any friends, or who work somewhere. The LetAntiSemiApply operator will be used
to check for the absence of the FRIENDS_WITH relationship from each person.
Query
MATCH (other:Person)
WHERE NOT ((other)-[:FRIENDS_WITH]->()) OR (other)-[:WORKS_IN]->()
RETURN other
Query Plan
+--------------------+----------------+------+---------+---------------------------+-----------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+--------------------+----------------+------+---------+---------------------------+-----------------------------+
| +ProduceResults | 11 | 14 | 0 | other | other |
| | +----------------+------+---------+---------------------------+-----------------------------+
| +SelectOrSemiApply | 11 | 14 | 0 | anon[42], other | anon[42] |
| |\ +----------------+------+---------+---------------------------+-----------------------------+
| | +Expand(All) | 15 | 0 | 2 | anon[82], anon[96], other | (other)-[:WORKS_IN]->() |
| | | +----------------+------+---------+---------------------------+-----------------------------+
| | +Argument | 14 | 2 | 0 | other | |
| | +----------------+------+---------+---------------------------+-----------------------------+
| +LetAntiSemiApply | 14 | 14 | 0 | anon[42], other | |
| |\ +----------------+------+---------+---------------------------+-----------------------------+
| | +Expand(All) | 2 | 0 | 14 | anon[50], anon[68], other | (other)-[:FRIENDS_WITH]->() |
| | | +----------------+------+---------+---------------------------+-----------------------------+
| | +Argument | 14 | 14 | 0 | other | |
| | +----------------+------+---------+---------------------------+-----------------------------+
| +NodeByLabelScan | 14 | 14 | 15 | other | :Person |
+--------------------+----------------+------+---------+---------------------------+-----------------------------+
Total database accesses: 31
Let Semi Apply
Tests for the existence of a pattern predicate. When a query contains multiple pattern predicates,
LetSemiApply will be used to evaluate the first of these. It will record the result of evaluating the
predicate but will leave any filtering to another operator. The following query will find all the
people who have a friend, or who work somewhere. The LetSemiApply operator will be used to check for
the existence of the FRIENDS_WITH relationship from each person.
Query
MATCH (other:Person)
WHERE (other)-[:FRIENDS_WITH]->() OR (other)-[:WORKS_IN]->()
RETURN other
Query Plan
+--------------------+----------------+------+---------+---------------------------+-----------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+--------------------+----------------+------+---------+---------------------------+-----------------------------+
| +ProduceResults | 13 | 14 | 0 | other | other |
| | +----------------+------+---------+---------------------------+-----------------------------+
| +SelectOrSemiApply | 13 | 14 | 0 | anon[38], other | anon[38] |
| |\ +----------------+------+---------+---------------------------+-----------------------------+
| | +Expand(All) | 15 | 0 | 12 | anon[77], anon[91], other | (other)-[:WORKS_IN]->() |
| | | +----------------+------+---------+---------------------------+-----------------------------+
| | +Argument | 14 | 12 | 0 | other | |
| | +----------------+------+---------+---------------------------+-----------------------------+
| +LetSemiApply | 14 | 14 | 0 | anon[38], other | |
| |\ +----------------+------+---------+---------------------------+-----------------------------+
| | +Expand(All) | 2 | 0 | 14 | anon[46], anon[64], other | (other)-[:FRIENDS_WITH]->() |
| | | +----------------+------+---------+---------------------------+-----------------------------+
| | +Argument | 14 | 14 | 0 | other | |
| | +----------------+------+---------+---------------------------+-----------------------------+
| +NodeByLabelScan | 14 | 14 | 15 | other | :Person |
+--------------------+----------------+------+---------+---------------------------+-----------------------------+
Total database accesses: 41
Select Or Anti Semi Apply
Tests for the absence of a pattern predicate and evaluates a predicate.
Execution Plans
268
Query
MATCH (other:Person)
WHERE other.age > 25 OR NOT (other)-[:FRIENDS_WITH]->()
RETURN other
Query Plan
+------------------------+----------------+------+---------+---------------------------+-----------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+------------------------+----------------+------+---------+---------------------------+-----------------------------+
| +ProduceResults | 4 | 12 | 0 | other | other |
| | +----------------+------+---------+---------------------------+-----------------------------+
| +SelectOrAntiSemiApply | 4 | 12 | 28 | other | other.age > { AUTOINT0} |
| |\ +----------------+------+---------+---------------------------+-----------------------------+
| | +Expand(All) | 2 | 0 | 14 | anon[68], anon[86], other | (other)-[:FRIENDS_WITH]->() |
| | | +----------------+------+---------+---------------------------+-----------------------------+
| | +Argument | 14 | 14 | 0 | other | |
| | +----------------+------+---------+---------------------------+-----------------------------+
| +NodeByLabelScan | 14 | 14 | 15 | other | :Person |
+------------------------+----------------+------+---------+---------------------------+-----------------------------+
Total database accesses: 57
Select Or Semi Apply
Tests for the existence of a pattern predicate and evaluates a predicate. This operator allows for the
mixing of normal predicates and pattern predicates that check for the existence of a pattern. First the
normal expression predicate is evaluated, and only if it returns FALSE is the more costly pattern
predicate evaluated.
Query
MATCH (other:Person)
WHERE other.age > 25 OR (other)-[:FRIENDS_WITH]->()
RETURN other
Query Plan
+--------------------+----------------+------+---------+---------------------------+-----------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+--------------------+----------------+------+---------+---------------------------+-----------------------------+
| +ProduceResults | 11 | 2 | 0 | other | other |
| | +----------------+------+---------+---------------------------+-----------------------------+
| +SelectOrSemiApply | 11 | 2 | 28 | other | other.age > { AUTOINT0} |
| |\ +----------------+------+---------+---------------------------+-----------------------------+
| | +Expand(All) | 2 | 0 | 14 | anon[64], anon[82], other | (other)-[:FRIENDS_WITH]->() |
| | | +----------------+------+---------+---------------------------+-----------------------------+
| | +Argument | 14 | 14 | 0 | other | |
| | +----------------+------+---------+---------------------------+-----------------------------+
| +NodeByLabelScan | 14 | 14 | 15 | other | :Person |
+--------------------+----------------+------+---------+---------------------------+-----------------------------+
Total database accesses: 57
Semi Apply
Tests for the existence of a pattern predicate. SemiApply takes a row from its child operator and feeds
it to the Argument operator on the right-hand side of SemiApply. If the right-hand side operator tree
yields at least one row, the row from the left-hand side is yielded by the SemiApply operator. This makes
SemiApply a filtering operator, used mostly for pattern predicates in queries.
Query
MATCH (other:Person)
WHERE (other)-[:FRIENDS_WITH]->()
RETURN other
Query Plan
+------------------+----------------+------+---------+---------------------------+-----------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+------------------+----------------+------+---------+---------------------------+-----------------------------+
| +ProduceResults | 11 | 2 | 0 | other | other |
| | +----------------+------+---------+---------------------------+-----------------------------+
| +SemiApply | 11 | 2 | 0 | other | |
| |\ +----------------+------+---------+---------------------------+-----------------------------+
| | +Expand(All) | 2 | 0 | 14 | anon[46], anon[64], other | (other)-[:FRIENDS_WITH]->() |
| | | +----------------+------+---------+---------------------------+-----------------------------+
| | +Argument | 14 | 14 | 0 | other | |
| | +----------------+------+---------+---------------------------+-----------------------------+
| +NodeByLabelScan | 14 | 14 | 15 | other | :Person |
+------------------+----------------+------+---------+---------------------------+-----------------------------+
Total database accesses: 29
Triadic
Triadic is used to solve triangular queries, such as the very common "find my friends-of-friends that are
not already my friends". It does so by putting all the "friends" in a set, and using that set to check
whether the friends-of-friends are already connected to me.
Query
MATCH (me:Person)-[:FRIENDS_WITH]-()-[:FRIENDS_WITH]-(other)
WHERE NOT (me)-[:FRIENDS_WITH]-(other)
RETURN other
Query Plan
+-------------------+----------------+------+---------+-----------------------------------------+----------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-------------------+----------------+------+---------+-----------------------------------------+----------------------------+
| +ProduceResults | 0 | 2 | 0 | other | other |
| | +----------------+------+---------+-----------------------------------------+----------------------------+
| +TriadicSelection | 0 | 2 | 0 | anon[18], anon[35], anon[37], me, other | me, anon[35], other |
| |\ +----------------+------+---------+-----------------------------------------+----------------------------+
| | +Filter | 0 | 2 | 0 | anon[18], anon[35], anon[37], me, other | NOT(anon[18] == anon[37]) |
| | | +----------------+------+---------+-----------------------------------------+----------------------------+
| | +Expand(All) | 0 | 6 | 10 | anon[18], anon[35], anon[37], me, other | ()-[:FRIENDS_WITH]-(other) |
| | | +----------------+------+---------+-----------------------------------------+----------------------------+
| | +Argument | 4 | 4 | 0 | anon[18], anon[35], me | |
| | +----------------+------+---------+-----------------------------------------+----------------------------+
| +Expand(All) | 4 | 4 | 18 | anon[18], anon[35], me | (me)-[:FRIENDS_WITH]-() |
| | +----------------+------+---------+-----------------------------------------+----------------------------+
| +NodeByLabelScan | 14 | 14 | 15 | me | :Person |
+-------------------+----------------+------+---------+-----------------------------------------+----------------------------+
Total database accesses: 43
16.4. Row operators
These operators take rows produced by another operator and transform them into a different set of rows.
Eager
For isolation purposes, this operator makes sure that operations affecting subsequent operations are
executed fully for the whole dataset before execution continues. Without it, the query could enter an
endless loop, repeatedly matching data that was just created. The Eager operator can cause high memory
usage when importing data or migrating graph structures. In such cases, split your operations into
simpler steps, e.g. import nodes and relationships separately. Alternatively, return the records to be
updated and run an update statement afterwards.
Query
MATCH (p:Person)
MERGE (:Person:Clone { name:p.name })
Query Plan
+--------------+------+---------+-------------+----------------------------------+
| Operator | Rows | DB Hits | Identifiers | Other |
+--------------+------+---------+-------------+----------------------------------+
| +EmptyResult | 0 | 0 | | |
| | +------+---------+-------------+----------------------------------+
| +UpdateGraph | 14 | 195 | anon[23], p | MergeNode; p.name; :Person(name) |
| | +------+---------+-------------+----------------------------------+
| +Eager | 14 | 0 | p | |
| | +------+---------+-------------+----------------------------------+
| +NodeByLabel | 14 | 15 | p | :Person |
+--------------+------+---------+-------------+----------------------------------+
Total database accesses: 210
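The advice about splitting an import into simpler steps can be sketched in Cypher. This is an illustration only; the file names and CSV columns (name, from, to) are hypothetical:

```cypher
// Import nodes in one statement ...
LOAD CSV WITH HEADERS FROM "file:///people.csv" AS row
CREATE (:Person { name: row.name });

// ... and relationships in a separate statement, keeping each
// statement simple enough that the planner need not insert Eager.
LOAD CSV WITH HEADERS FROM "file:///friendships.csv" AS row
MATCH (a:Person { name: row.from }),(b:Person { name: row.to })
CREATE (a)-[:FRIENDS_WITH]->(b);
```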
Distinct
Removes duplicate rows from the incoming stream of rows.
Query
MATCH (l:Location)<-[:WORKS_IN]-(p:Person)
RETURN DISTINCT l
Query Plan
+------------------+----------------+------+---------+----------------+----------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+------------------+----------------+------+---------+----------------+----------------------+
| +ProduceResults | 14 | 6 | 0 | l | l |
| | +----------------+------+---------+----------------+----------------------+
| +Distinct | 14 | 6 | 0 | l | l |
| | +----------------+------+---------+----------------+----------------------+
| +Filter | 15 | 15 | 15 | anon[19], l, p | p:Person |
| | +----------------+------+---------+----------------+----------------------+
| +Expand(All) | 15 | 15 | 25 | anon[19], l, p | (l)<-[:WORKS_IN]-(p) |
| | +----------------+------+---------+----------------+----------------------+
| +NodeByLabelScan | 10 | 10 | 11 | l | :Location |
+------------------+----------------+------+---------+----------------+----------------------+
Total database accesses: 51
Eager Aggregation
Eagerly loads the underlying results and stores them in a hash map, using the grouping keys as the keys
for the map.
Query
MATCH (l:Location)<-[:WORKS_IN]-(p:Person)
RETURN l.name AS location, COLLECT(p.name) AS people
Query Plan
+-------------------+----------------+------+---------+--------------------------+----------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-------------------+----------------+------+---------+--------------------------+----------------------+
| +ProduceResults | 4 | 6 | 0 | location, people | location, people |
| | +----------------+------+---------+--------------------------+----------------------+
| +EagerAggregation | 4 | 6 | 15 | location, people | location |
| | +----------------+------+---------+--------------------------+----------------------+
| +Projection | 15 | 15 | 15 | anon[19], l, location, p | l.name; p |
| | +----------------+------+---------+--------------------------+----------------------+
| +Filter | 15 | 15 | 15 | anon[19], l, p | p:Person |
| | +----------------+------+---------+--------------------------+----------------------+
| +Expand(All) | 15 | 15 | 25 | anon[19], l, p | (l)<-[:WORKS_IN]-(p) |
| | +----------------+------+---------+--------------------------+----------------------+
| +NodeByLabelScan | 10 | 10 | 11 | l | :Location |
+-------------------+----------------+------+---------+--------------------------+----------------------+
Total database accesses: 81
Filter
Filters each row coming from the child operator, only passing through rows for which the predicates
evaluate to TRUE.
Query
MATCH (p:Person)
WHERE p.name =~ "^a.*"
RETURN p
Query Plan
+------------------+----------------+------+---------+-------------+-----------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+------------------+----------------+------+---------+-------------+-----------------------------+
| +ProduceResults | 14 | 0 | 0 | p | p |
| | +----------------+------+---------+-------------+-----------------------------+
| +Filter | 14 | 0 | 14 | p | p.name ~= /{ AUTOSTRING0}/ |
| | +----------------+------+---------+-------------+-----------------------------+
| +NodeByLabelScan | 14 | 14 | 15 | p | :Person |
+------------------+----------------+------+---------+-------------+-----------------------------+
Total database accesses: 29
Limit
Returns the first n rows from the incoming input.
Query
MATCH (p:Person)
RETURN p
LIMIT 3
Query Plan
+------------------+----------------+------+---------+-------------+------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+------------------+----------------+------+---------+-------------+------------+
| +ProduceResults | 3 | 3 | 0 | p | p |
| | +----------------+------+---------+-------------+------------+
| +Limit | 3 | 3 | 0 | p | Literal(3) |
| | +----------------+------+---------+-------------+------------+
| +NodeByLabelScan | 14 | 3 | 4 | p | :Person |
+------------------+----------------+------+---------+-------------+------------+
Total database accesses: 4
Projection
For each row from its input, projection evaluates a set of expressions and produces a row with the
results of the expressions.
Query
RETURN "hello" AS greeting
Query Plan
+-----------------+----------------+------+---------+-------------+-----------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+------+---------+-------------+-----------------+
| +ProduceResults | 1 | 1 | 0 | greeting | greeting |
| | +----------------+------+---------+-------------+-----------------+
| +Projection | 1 | 1 | 0 | greeting | { AUTOSTRING0} |
+-----------------+----------------+------+---------+-------------+-----------------+
Total database accesses: 0
Skip
Skips the first n rows from the incoming input.
Query
MATCH (p:Person)
RETURN p
ORDER BY p.id
SKIP 1
Query Plan
+------------------+----------------+------+---------+--------------------------+-----------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+------------------+----------------+------+---------+--------------------------+-----------------------+
| +ProduceResults | 14 | 13 | 0 | p | p |
| | +----------------+------+---------+--------------------------+-----------------------+
| +Projection | 14 | 13 | 0 | anon[35], anon[59], p, p | anon[35] |
| | +----------------+------+---------+--------------------------+-----------------------+
| +Skip | 14 | 13 | 0 | anon[35], anon[59], p | { AUTOINT0} |
| | +----------------+------+---------+--------------------------+-----------------------+
| +Sort | 14 | 14 | 0 | anon[35], anon[59], p | anon[59] |
| | +----------------+------+---------+--------------------------+-----------------------+
| +Projection | 14 | 14 | 28 | anon[35], anon[59], p | anon[35]; anon[35].id |
| | +----------------+------+---------+--------------------------+-----------------------+
| +Projection | 14 | 14 | 0 | anon[35], p | p |
| | +----------------+------+---------+--------------------------+-----------------------+
| +NodeByLabelScan | 14 | 14 | 15 | p | :Person |
+------------------+----------------+------+---------+--------------------------+-----------------------+
Total database accesses: 43
Sort
Sorts rows by a provided key.
Query
MATCH (p:Person)
RETURN p
ORDER BY p.name
Query Plan
+------------------+----------------+------+---------+--------------------------+-------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+------------------+----------------+------+---------+--------------------------+-------------------------+
| +ProduceResults | 14 | 14 | 0 | p | p |
| | +----------------+------+---------+--------------------------+-------------------------+
| +Projection | 14 | 14 | 0 | anon[24], anon[37], p, p | anon[24] |
| | +----------------+------+---------+--------------------------+-------------------------+
| +Sort | 14 | 14 | 0 | anon[24], anon[37], p | anon[37] |
| | +----------------+------+---------+--------------------------+-------------------------+
| +Projection | 14 | 14 | 14 | anon[24], anon[37], p | anon[24]; anon[24].name |
| | +----------------+------+---------+--------------------------+-------------------------+
| +Projection | 14 | 14 | 0 | anon[24], p | p |
| | +----------------+------+---------+--------------------------+-------------------------+
| +NodeByLabelScan | 14 | 14 | 15 | p | :Person |
+------------------+----------------+------+---------+--------------------------+-------------------------+
Total database accesses: 29
Top
Returns the first n rows sorted by a provided key. Rather than sorting the whole input, the Top operator
keeps only the top n rows.
Query
MATCH (p:Person)
RETURN p
ORDER BY p.name
LIMIT 2
Query Plan
+------------------+----------------+------+---------+--------------------------+-------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+------------------+----------------+------+---------+--------------------------+-------------------------+
| +ProduceResults | 2 | 2 | 0 | p | p |
| | +----------------+------+---------+--------------------------+-------------------------+
| +Projection | 2 | 2 | 0 | anon[24], anon[37], p, p | anon[24] |
| | +----------------+------+---------+--------------------------+-------------------------+
| +Top | 2 | 2 | 0 | anon[24], anon[37], p | Literal(2); |
| | +----------------+------+---------+--------------------------+-------------------------+
| +Projection | 14 | 14 | 14 | anon[24], anon[37], p | anon[24]; anon[24].name |
| | +----------------+------+---------+--------------------------+-------------------------+
| +Projection | 14 | 14 | 0 | anon[24], p | p |
| | +----------------+------+---------+--------------------------+-------------------------+
| +NodeByLabelScan | 14 | 14 | 15 | p | :Person |
+------------------+----------------+------+---------+--------------------------+-------------------------+
Total database accesses: 29
Union
Union concatenates the results from the right plan after the results of the left plan.
Query
MATCH (p:Location)
RETURN p.name
UNION ALL MATCH (p:Country)
RETURN p.name
Query Plan
+--------------------+----------------+------+---------+-------------+-----------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+--------------------+----------------+------+---------+-------------+-----------+
| +ProduceResults | 10 | 11 | 0 | p.name | p.name |
| | +----------------+------+---------+-------------+-----------+
| +Union | 10 | 11 | 0 | p.name | |
| |\ +----------------+------+---------+-------------+-----------+
| | +Projection | 1 | 1 | 1 | p, p.name | p.name |
| | | +----------------+------+---------+-------------+-----------+
| | +NodeByLabelScan | 1 | 1 | 2 | p | :Country |
| | +----------------+------+---------+-------------+-----------+
| +Projection | 10 | 10 | 10 | p, p.name | p.name |
| | +----------------+------+---------+-------------+-----------+
| +NodeByLabelScan | 10 | 10 | 11 | p | :Location |
+--------------------+----------------+------+---------+-------------+-----------+
Total database accesses: 24
Unwind
Takes a collection of values and returns one row per item in the collection.
Query
UNWIND range(1,5) AS value
RETURN value;
Query Plan
+-----------------+----------------+------+---------+-------------+-------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+------+---------+-------------+-------+
| +ProduceResults | 10 | 5 | 0 | value | value |
| | +----------------+------+---------+-------------+-------+
| +UNWIND | 10 | 5 | 0 | value | |
| | +----------------+------+---------+-------------+-------+
| +Argument | 1 | 1 | 0 | | |
+-----------------+----------------+------+---------+-------------+-------+
Total database accesses: 0
16.5. Update Operators
These operators are used in queries that update the graph.
Constraint Operation
Creates a constraint on a (label, property) pair. The following query will create a unique constraint on
the name property of nodes with the Country label.
Query
CREATE CONSTRAINT ON (c:Country) ASSERT c.name IS UNIQUE
Query Plan
+----------------------+------+---------+
| Operator | Rows | DB Hits |
+----------------------+------+---------+
| +ConstraintOperation | 0 | 3 |
+----------------------+------+---------+
Total database accesses: 3
Empty Result
Eagerly loads everything coming into the EmptyResult operator and discards it.
Query
CREATE (:Person)
Query Plan
+--------------+------+---------+-------------+------------+
| Operator | Rows | DB Hits | Identifiers | Other |
+--------------+------+---------+-------------+------------+
| +EmptyResult | 0 | 0 | | |
| | +------+---------+-------------+------------+
| +UpdateGraph | 1 | 2 | anon[7] | CreateNode |
+--------------+------+---------+-------------+------------+
Total database accesses: 2
Update Graph
Applies updates to the graph.
Query
CREATE (:Person { name: "Alistair" })
Query Plan
+--------------+------+---------+-------------+------------+
| Operator | Rows | DB Hits | Identifiers | Other |
+--------------+------+---------+-------------+------------+
| +EmptyResult | 0 | 0 | | |
| | +------+---------+-------------+------------+
| +UpdateGraph | 1 | 4 | anon[7] | CreateNode |
+--------------+------+---------+-------------+------------+
Total database accesses: 4
Merge Into
When both the start and end node have already been found, merge-into is used to find all connecting
relationships or to create a new relationship between the two nodes.
Query
MATCH (p:Person { name: "me" }),(f:Person { name: "Andres" })
MERGE (p)-[:FRIENDS_WITH]->(f)
Query Plan
+--------------+------+---------+----------------+--------------------------------+
| Operator | Rows | DB Hits | Identifiers | Other |
+--------------+------+---------+----------------+--------------------------------+
| +EmptyResult | 0 | 0 | | |
| | +------+---------+----------------+--------------------------------+
| +Merge(Into) | 1 | 5 | anon[68], f, p | (p)-[:FRIENDS_WITH]->(f) |
| | +------+---------+----------------+--------------------------------+
| +SchemaIndex | 1 | 2 | f, p | { AUTOSTRING1}; :Person(name) |
| | +------+---------+----------------+--------------------------------+
| +SchemaIndex | 1 | 2 | p | { AUTOSTRING0}; :Person(name) |
+--------------+------+---------+----------------+--------------------------------+
Total database accesses: 9
PartIV.Reference
The reference part is the authoritative source for details on Neo4j usage. It covers details on
capabilities, transactions, indexing and queries among other topics.
17. Capabilities ....................................................................................................................................... 279
17.1. Data Security ......................................................................................................................... 280
17.2. Data Integrity ......................................................................................................................... 281
17.3. Data Integration .................................................................................................................... 282
17.4. Availability and Reliability ..................................................................................................... 283
17.5. Capacity ................................................................................................................................. 284
18. Transaction Management ................................................................................................................ 285
18.1. Interaction cycle .................................................................................................................... 286
18.2. Isolation levels ....................................................................................................................... 287
18.3. Default locking behavior ....................................................................................................... 288
18.4. Deadlocks .............................................................................................................................. 289
18.5. Delete semantics ................................................................................................................... 292
18.6. Creating unique nodes .......................................................................................................... 293
18.7. Transaction events ................................................................................................................ 294
19. Data Import ...................................................................................................................................... 295
20. Graph Algorithms ............................................................................................................................. 296
21. REST API ........................................................................................................................................... 297
21.1. Transactional Cypher HTTP endpoint ................................................................................... 298
21.2. Neo4j Status Codes ............................................................................................................... 307
21.3. REST API Authentication and Authorization ......................................................................... 312
21.4. Service root ........................................................................................................................... 316
21.5. Streaming .............................................................................................................................. 317
21.6. Legacy Cypher HTTP endpoint ............................................................................................. 318
21.7. Property values ..................................................................................................................... 332
21.8. Nodes .................................................................................................................................... 333
21.9. Relationships ......................................................................................................................... 338
21.10. Relationship types ............................................................................................................... 349
21.11. Node properties .................................................................................................................. 350
21.12. Relationship properties ....................................................................................................... 354
21.13. Node labels ......................................................................................................................... 360
21.14. Node degree ........................................................................................................................ 366
21.15. Indexing ............................................................................................................................... 368
21.16. Constraints .......................................................................................................................... 370
21.17. Traversals ............................................................................................................................. 374
21.18. Graph Algorithms ................................................................................................................ 401
21.19. Batch operations ................................................................................................................. 408
21.20. Legacy indexing ................................................................................................................... 416
21.21. Unique Indexing .................................................................................................................. 422
21.22. WADL Support ..................................................................................................................... 432
21.23. Using the REST API from WebLogic .................................................................................... 433
22. Deprecations .................................................................................................................................... 434
Chapter17.Capabilities
Capabilities
280
17.1. Data Security
Some data may need to be protected from unauthorized access (e.g., theft, modification). Neo4j
does not deal with data encryption explicitly, but supports all means built into the Java programming
language and the JVM to protect data by encrypting it before storing.
Furthermore, data can be easily secured by running on an encrypted datastore at the file system level.
Finally, data protection should be considered in the upper layers of the surrounding system in order to
prevent problems with scraping, malicious data insertion, and other threats.
17.2. Data Integrity
In order to keep data consistent, a good database needs mechanisms and structures that guarantee
the integrity of all stored data. In Neo4j, data integrity is guaranteed both for graph elements (Nodes,
Relationships and Properties) and for non-graph data, such as the indexes. Neo4j’s transactional
architecture ensures that data is protected and provides for fast recovery from an unexpected failure,
without the need to rebuild internal indexes or other costly operations.
17.3. Data Integration
Most enterprises rely primarily on relational databases to store their data, but this may cause
performance limitations. In some of these cases, Neo4j can be used as an extension to supplement
search/lookup for faster decision making. However, in any situation where multiple data repositories
contain the same data, synchronization can be an issue.
In some applications, it is acceptable for the search platform to be slightly out of sync with the
relational database. In others, tight data integrity (e.g., between Neo4j and the RDBMS) is necessary.
Typically, this has to be addressed for data changing in real-time and for bulk data changes happening
in the RDBMS.
A few strategies for synchronizing integrated data follow.
Event-based Synchronization
In this scenario, all data stores, both RDBMS and Neo4j, are fed with domain-specific events via
an event bus. Thus, the data held in the different backends is not actually synchronized but rather
replicated.
Periodic Synchronization
Another viable scenario is the periodic export of the latest changes in the RDBMS to Neo4j via some
form of SQL query. This allows a small amount of latency in the synchronization, but has the advantage
of using the RDBMS as the master for all data purposes. The same process can be applied with Neo4j
as the master data source.
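As an illustration of the Neo4j side of such a periodic job, exported rows can be upserted with MERGE so that re-running the job is safe. The sketch below is hypothetical; the sourceId property and the {id} and {name} parameters stand in for the columns of the exported rows:

```cypher
// Run once per exported row, with {id} and {name} supplied as parameters.
// MERGE creates the node on the first run and matches it on later runs,
// so repeated synchronization passes do not create duplicates.
MERGE (p:Person { sourceId: {id} })
SET p.name = {name}
```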
Periodic Full Export/Import of Data
Using the Batch Inserter tools for Neo4j, even large amounts of data can be imported into the database
in very short times. Thus, a full export from the RDBMS and import into Neo4j becomes possible. If the
propagation lag between the RDBMS and Neo4j is not a big issue, this is a very viable solution.
17.4. Availability and Reliability
Most mission-critical systems require the database subsystem to be accessible at all times. Neo4j
ensures availability and reliability through a few different strategies.
Operational Availability
In order not to create a single point of failure, Neo4j supports different approaches which provide
transparent fallback and/or recovery from failures.
Online backup (Cold spare)
In this approach, a single instance of the master database is used, with Online Backup enabled. In
case of a failure, the backup files can be mounted onto a new Neo4j instance and reintegrated into the
application.
Online Backup High Availability (Hot spare)
Here, a Neo4j "backup" instance listens to online transfers of changes from the master. In the event of
a failure of the master, the backup is already running and can directly take over the load.
High Availability cluster
This approach uses a cluster of database instances, with one (read/write) master and a number of
(read-only) slaves. Failing slaves can simply be restarted and brought back online. Alternatively, a new
slave may be added by cloning an existing one. Should the master instance fail, a new master will be
elected by the remaining cluster nodes.
Disaster Recovery / Resiliency
In case of a breakdown of a major part of the IT infrastructure, there need to be mechanisms in place
that enable the fast recovery and regrouping of the remaining services and servers. In Neo4j, there are
different components that are suitable to be part of a disaster recovery strategy.
Prevention
• Online Backup High Availability to other locations outside the current data center.
• Online Backup to different file system locations: this is a simpler form of backup, applying changes
directly to backup files; it is thus more suited for local backup scenarios.
• Neo4j High Availability cluster: a cluster of one write-master Neo4j server and a number of read-slaves,
getting transaction logs from the master. Write-master failover is handled by quorum election
among the read-slaves for a new master.
Detection
SNMP and JMX monitoring can be used for the Neo4j database.
Correction
• Online Backup: a new Neo4j server can be started directly on the backed-up files and take over new
requests.
• Neo4j High Availability cluster: a broken Neo4j read slave can be reinserted into the cluster, getting
the latest updates from the master. Alternatively, a new server can be inserted by copying an existing
server and applying the latest updates to it.
17.5. Capacity
File Sizes
Neo4j relies on Java's Non-blocking I/O subsystem for all file handling. Furthermore, while the storage
file layout is optimized for interconnected data, Neo4j does not require raw devices. Thus, file sizes are
only limited by the underlying operating system's capacity to handle large files. Physically, there is no
built-in limit to the file handling capacity in Neo4j.
Neo4j has a built-in page cache that caches the contents of the storage files. If there is not enough
RAM to keep the storage files resident, Neo4j will page parts of the files in and out as necessary,
while keeping the most popular parts of the files resident at all times. Thus, ACID speed degrades
gracefully as RAM becomes the limiting factor.
Read speed
Enterprises want to optimize the use of hardware to deliver the maximum business value from
available resources. Neo4j’s approach to reading data provides the best possible usage of all available
hardware resources. Neo4j does not block or lock any read operations; thus, there is no danger of
deadlocks in read operations and no need for read transactions. With threaded read access to the
database, queries can be run simultaneously on as many processors as are available. This provides
very good scale-up scenarios with bigger servers.
Write speed
Write speed is a consideration for many enterprise applications. However, there are two different
scenarios:
1. sustained continuous operation and
2. bulk access (e.g., backup, initial or batch loading).
To support the disparate requirements of these scenarios, Neo4j supports two modes of writing to the
storage layer.
In transactional, ACID-compliant normal operation, isolation level is maintained and read operations
can occur at the same time as the writing process. At every commit, the data is persisted to disk and
can be recovered to a consistent state upon system failures. This requires disk write access and a real
flushing of data. Thus, the write speed of Neo4j on a single server in continuous mode is limited by the
I/O capacity of the hardware. Consequently, the use of fast SSDs is highly recommended for production
scenarios.
Neo4j has a Batch Inserter that operates directly on the store files. This mode does not provide
transactional security, so it can only be used when there is a single write thread. Because data is written
sequentially, and never flushed to the logical logs, huge performance boosts are achieved. The Batch
Inserter is optimized for non-transactional bulk import of large amounts of data.
Data size
In Neo4j, data size is mainly limited by the address space of the primary keys for Nodes, Relationships,
Properties and RelationshipTypes. Currently, the address space is as follows:
• nodes: 2^35 (~34 billion)
• relationships: 2^35 (~34 billion)
• properties: 2^36 to 2^38, depending on property types (maximum ~274 billion, always at least ~68 billion)
• relationship types: 2^16 (~65 000)
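These limits follow directly from the key widths; computing them explicitly:

```python
# Address-space limits implied by the key widths above.
nodes = 2 ** 35                # 34359738368, i.e. ~34 billion
relationships = 2 ** 35        # ~34 billion
properties_at_least = 2 ** 36  # ~68 billion
properties_at_most = 2 ** 38   # ~274 billion
relationship_types = 2 ** 16   # 65536, i.e. ~65 000
```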
Chapter18.Transaction Management
In order to fully maintain data integrity and ensure good transactional behavior, Neo4j supports the
ACID properties:
• atomicity: if any part of a transaction fails, the database state is left unchanged.
• consistency: any transaction will leave the database in a consistent state.
• isolation: during a transaction, modified data cannot be accessed by other operations.
• durability: the DBMS can always recover the results of a committed transaction.
Specifically:
• All database operations that access the graph, indexes, or the schema must be performed in a
transaction.
• The default isolation level is READ_COMMITTED.
• Data retrieved by traversals is not protected from modification by other transactions.
• Non-repeatable reads may occur (i.e., only write locks are acquired and held until the end of the
transaction).
• One can manually acquire write locks on nodes and relationships to achieve a higher level of isolation
(SERIALIZABLE).
• Locks are acquired at the Node and Relationship level.
• Deadlock detection is built into the core transaction management.
18.1. Interaction cycle
All database operations that access the graph, indexes, or the schema must be performed in a
transaction. Transactions are thread-confined and can be nested as “flat nested transactions”. Flat
nested transactions means that all nested transactions are added to the scope of the top-level
transaction. A nested transaction can mark the top-level transaction for rollback, meaning the entire
transaction will be rolled back. It is not possible to roll back only the changes made in a nested
transaction.
The interaction cycle of working with transactions looks like this:
1. Begin a transaction.
2. Perform database operations.
3. Mark the transaction as successful or not.
4. Finish the transaction.
It is very important to finish each transaction: the transaction will not release the locks or memory it
has acquired until it has been finished. The idiomatic use of transactions in Neo4j is a try-finally
block: start the transaction, then try to perform the graph operations. The last operation in the
try block should mark the transaction as successful, while the finally block should finish the transaction.
Finishing the transaction will perform commit or rollback depending on the success status.
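The cycle can be mirrored with a toy transaction object (this is a conceptual sketch, not the Neo4j API; the real interface is org.neo4j.graphdb.Transaction):

```python
class ToyTransaction:
    """Minimal stand-in mirroring the interaction cycle:
    begin, work, mark success, finish (commit or roll back)."""
    def __init__(self):            # 1. begin the transaction
        self._success = False
        self.outcome = None
    def success(self):             # 3. mark as successful
        self._success = True
    def finish(self):              # 4. commit or roll back
        self.outcome = "committed" if self._success else "rolled back"

tx = ToyTransaction()
try:
    pass                           # 2. perform database operations
    tx.success()                   # last operation in the try block
finally:
    tx.finish()                    # always finish, releasing locks

print(tx.outcome)  # committed
```

If an operation throws before success() is called, the finally block still runs finish(), and the transaction rolls back.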
Caution
All modifications performed in a transaction are kept in memory. This means that very large
updates have to be split into several top-level transactions to avoid running out of memory.
They must be top-level transactions, since splitting the work into many nested transactions will
just add all the work to the top-level transaction.
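Assuming the update job can be split into independent chunks, the batching itself can be sketched like this (commit_chunk is a hypothetical function that would apply one chunk in its own top-level transaction):

```python
from itertools import islice

def batches(iterable, size):
    """Yield successive chunks of at most `size` items, so that each
    chunk can be applied and committed in its own top-level
    transaction, keeping in-memory transaction state bounded."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

for chunk in batches(range(10), 4):
    pass  # commit_chunk(chunk): one top-level transaction per chunk
```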
In an environment that makes use of thread pooling, other errors may occur when a transaction is not
finished properly. Consider a leaked transaction that did not get finished: it stays tied
to a thread, and when that thread is later scheduled to perform work and starts what looks like a new
top-level transaction, it will actually be a nested transaction. If the leaked transaction state is “marked
for rollback” (which will happen if a deadlock was detected), no more work can be performed on that
transaction; trying to do so will result in an error on each call to a write operation.
18.2. Isolation levels
Transactions in Neo4j use a read-committed isolation level, which means they will see data as soon as it
has been committed and will not see data in other transactions that have not yet been committed. This
type of isolation is weaker than serialization but offers significant performance advantages whilst being
sufficient for the overwhelming majority of cases.
In addition, the Neo4j Java API (see Part VII, “Advanced Usage” [561]) enables explicit locking of nodes
and relationships. Using locks gives the opportunity to simulate the effects of higher levels of isolation
by obtaining and releasing locks explicitly. For example, if a write lock is taken on a common node or
relationship, then all transactions will serialize on that lock, giving the effect of a serializable isolation
level.
Lost Updates in Cypher
In Cypher it is possible to acquire write locks to simulate improved isolation in some cases. Consider
the case where multiple concurrent Cypher queries increment the value of a property. Due to the
limitations of the read-committed isolation level, the increments will not result in a deterministic final
value.
For example, the following query, if run by one hundred concurrent clients, will very likely not
increment the property n.prop to 100, but some value lower than 100.
MATCH (n:X {id: 42})
SET n.prop = n.prop + 1
This is because all queries will read the value of n.prop within their own transaction. They will not
see the incremented value from any other transaction that has not yet committed. In the worst case
scenario the final value could be as low as 1, if all threads perform the read before any has committed
their transaction.
To ensure deterministic behavior, it is necessary to grab a write lock on the node in question. In Cypher
there is no explicit support for this, but we can work around this limitation by writing to a temporary
property.
MATCH (n:X {id: 42})
SET n._LOCK_ = true
SET n.prop = n.prop + 1
REMOVE n._LOCK_
The SET n._LOCK_ statement before the read of n.prop ensures that the write lock is acquired
before the read, so no updates are lost: all concurrent queries on that specific node are
serialized.
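The effect of the lock can be illustrated outside Neo4j with plain threads (a conceptual sketch: the dictionary plays the role of the node, and the Lock the role of the node write lock):

```python
import threading

node = {"prop": 0}
write_lock = threading.Lock()

def increment():
    # Serializing the read-modify-write, as the _LOCK_ trick does,
    # makes the final value deterministic: no increment is lost.
    with write_lock:
        node["prop"] = node["prop"] + 1

clients = [threading.Thread(target=increment) for _ in range(100)]
for t in clients:
    t.start()
for t in clients:
    t.join()

print(node["prop"])  # 100
```

Without the lock, two clients could both read the same old value and one increment would be overwritten, which is exactly the lost-update scenario described above.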
18.3. Default locking behavior
• When adding, changing or removing a property on a node or relationship, a write lock will be taken on
the specific node or relationship.
• When creating or deleting a node, a write lock will be taken for the specific node.
• When creating or deleting a relationship, a write lock will be taken on the specific relationship and
both its nodes.
The locks will be added to the transaction and released when the transaction finishes.
18.4. Deadlocks
Understanding deadlocks
Since locks are used, it is possible for deadlocks to happen. Neo4j will, however, detect any deadlock
(caused by acquiring a lock) before it happens and throw an exception. Before the exception is
thrown, the transaction is marked for rollback. All locks acquired by the transaction are still held,
but will be released when the transaction is finished (in the finally block, as pointed out earlier). Once
the locks are released, other transactions that were waiting for locks held by the transaction causing the
deadlock can proceed. The work performed by the transaction causing the deadlock can then be retried
by the user if needed.
Experiencing frequent deadlocks is an indication of concurrent write requests happening in such a
way that they cannot be executed while preserving the intended isolation and consistency. The
solution is to make sure concurrent updates happen in a reasonable way. For example, given two
specific nodes (A and B), transactions that add or delete relationships to both nodes in random
order will deadlock when two or more of them run concurrently. One solution is to make sure that
updates always happen in the same order (first A, then B). Another solution is to make sure that each
thread/transaction performs no writes that conflict with those of another concurrent transaction.
This can, for example, be achieved by letting a single thread do all updates of a specific type.
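The fixed-ordering rule can be sketched with two plain locks standing in for nodes A and B (a conceptual illustration, not Neo4j code):

```python
import threading

lock_a, lock_b = threading.Lock(), threading.Lock()
completed = []

def update_both(x, y):
    # Sort the locks into one global order before acquiring them.
    # Every worker then takes "A" before "B", so no two workers can
    # each hold one lock while waiting for the other.
    first, second = sorted((x, y), key=id)
    with first:
        with second:
            completed.append(True)  # the writes touching both nodes

# Half the workers name the locks in the opposite order on purpose.
workers = [threading.Thread(target=update_both,
                            args=(lock_a, lock_b) if i % 2 else (lock_b, lock_a))
           for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(len(completed))  # 4 — all workers finish, no deadlock
```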
Important
Deadlocks caused by synchronization other than the locks managed by Neo4j can
still happen. Since all operations in the Neo4j API are thread-safe unless specified otherwise,
there is no need for external synchronization. Other code that requires synchronization
should be synchronized in such a way that it never performs any Neo4j operation in the
synchronized block.
Deadlock handling example code
Below you’ll find examples of how deadlocks can be handled in server extensions/plugins or when
using Neo4j embedded.
Tip
The full source code used for the code snippets can be found at DeadlockDocTest.java1.
When dealing with deadlocks in code, there are several issues you may want to address:
• Only do a limited number of retries, and fail if a threshold is reached.
• Pause between each attempt to allow the other transaction to finish before trying again.
A retry-loop can be useful not only for deadlocks, but for other types of transient errors as well.
In the following sections you’ll find example code in Java which shows how this can be implemented.
Handling deadlocks using TransactionTemplate
If you don’t want to write all the code yourself, there is a class called TransactionTemplate2 that will help
you achieve what’s needed. Below is an example of how to create, customize, and use this template for
retries in transactions.
First, define the base template:
TransactionTemplate template = new TransactionTemplate( ).retries( 5 ).backoff( 3, TimeUnit.SECONDS );
1 https://github.com/neo4j/neo4j/blob/2.3.12/community/kernel/src/test/java/examples/DeadlockDocTest.java
2 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/helpers/TransactionTemplate.html
Next, specify the database to use and a function to execute:
Object result = template.with(graphDatabaseService).execute( new Function<Transaction, Object>()
{
@Override
public Object apply( Transaction transaction ) throws RuntimeException
{
Object result = null;
return result;
}
} );
The operations that could lead to a deadlock should go into the apply method.
The TransactionTemplate uses a fluent API for configuration, and you can choose whether to set
everything at once, or (as in the example) provide some details just before using it. The template allows
setting a predicate for what exceptions to retry on, and also allows for easy monitoring of events that
take place.
Handling deadlocks using a retry loop
If you want to roll your own retry-loop code, see below for inspiration. Here’s an example of what a
retry block might look like:
Throwable txEx = null;
int RETRIES = 5;
int BACKOFF = 3000;
for ( int i = 0; i < RETRIES; i++ )
{
try ( Transaction tx = graphDatabaseService.beginTx() )
{
Object result = doStuff(tx);
tx.success();
return result;
}
catch ( Throwable ex )
{
txEx = ex;
// Add whatever exceptions to retry on here
if ( !(ex instanceof DeadlockDetectedException) )
{
break;
}
}
// Wait so that we don't immediately get into the same deadlock
if ( i < RETRIES - 1 )
{
try
{
Thread.sleep( BACKOFF );
}
catch ( InterruptedException e )
{
throw new TransactionFailureException( "Interrupted", e );
}
}
}
if ( txEx instanceof TransactionFailureException )
{
throw ((TransactionFailureException) txEx);
}
else if ( txEx instanceof Error )
{
throw ((Error) txEx);
}
else if ( txEx instanceof RuntimeException )
{
throw ((RuntimeException) txEx);
}
else
{
throw new TransactionFailureException( "Failed", txEx );
}
The above is the gist of what such a retry block would look like, and which you can customize to fit your
needs.
18.5. Delete semantics
When deleting a node or a relationship, all properties for that entity will be automatically removed,
but the relationships of a node will not be removed.
Caution
Neo4j enforces a constraint (upon commit) that all relationships must have a valid
start node and end node. In effect this means that trying to delete a node that still has
relationships attached to it will throw an exception upon commit. It is however possible
to choose in which order to delete the node and the attached relationships as long as no
relationships exist when the transaction is committed.
The delete semantics can be summarized as follows:
• All properties of a node or relationship will be removed when it is deleted.
• A deleted node cannot have any attached relationships when the transaction commits.
• It is possible to acquire a reference to a deleted relationship or node that has not yet been
committed.
• Any write operation on a node or relationship after it has been deleted (but not yet committed) will
throw an exception.
• After commit, trying to acquire a new reference to, or work with an old reference to, a deleted node
or relationship will throw an exception.
18.6. Creating unique nodes
In many use cases, a certain level of uniqueness is desired among entities. You could for instance
imagine that only one user with a certain e-mail address may exist in a system. If multiple concurrent
threads naively try to create the user, duplicates will be created. There are three main strategies for
ensuring uniqueness, and they all work across High Availability and single-instance deployments.
Single thread
By using a single thread, no two threads will ever try to create a particular entity simultaneously. In a
High Availability setup, an external single-threaded client can perform the operations on the cluster.
Get or create
The preferred way to get or create a unique node is to use unique constraints and Cypher. See the
section called “Get or create unique node using Cypher and unique constraints” [605] for more
information.
By using put-if-absent3 functionality, entity uniqueness can be guaranteed using a legacy index. Here
the legacy index acts as the lock and will only lock the smallest part needed to guarantee uniqueness
across threads and transactions.
See the section called “Get or create unique node using a legacy index” [605] for how to do this using
the core Java API. When using the REST API, see Section 21.21, “Unique Indexing” [422].
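The put-if-absent idea can be sketched with an in-memory index (a conceptual stand-in for the legacy index, not the Neo4j API):

```python
import threading

class PutIfAbsentIndex:
    """Toy index: under the lock, only the first writer for a key
    creates the entity; later callers get the existing entry back."""
    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def get_or_create(self, key, factory):
        with self._lock:
            if key not in self._entries:
                self._entries[key] = factory()
            return self._entries[key]

index = PutIfAbsentIndex()
first = index.get_or_create("user@example.com",
                            lambda: {"email": "user@example.com"})
second = index.get_or_create("user@example.com",
                             lambda: {"email": "would-be duplicate"})
print(first is second)  # True — uniqueness is preserved
```

The real legacy index plays the same role across transactions and threads, locking only the index entry for the key in question.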
Pessimistic locking
Important
While this is a working solution, please consider using the preferred approach described in the
section called “Get or create” [293] instead.
By using explicit, pessimistic locking, unique creation of entities can be achieved in a multi-threaded
environment. It is most commonly done by locking on a single or a set of common nodes.
See the section called “Pessimistic locking for node creation” [606] for how to do this using the core
Java API.
3 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/index/Index.html#putIfAbsent%28T,%20java.lang.String,
%20java.lang.Object%29
18.7. Transaction events
Transaction event handlers can be registered to receive Neo4j transaction events. Once registered
at a GraphDatabaseService instance, a handler receives events about what has happened in each
transaction that is about to be committed. Handlers won't get notified about transactions that haven't
performed any write operation or that won't be committed (either because Transaction.success()
hasn't been called or because the transaction has been marked as failed with Transaction.failure()).
Right before a transaction is about to be committed, the beforeCommit method is called with the entire
diff of modifications made in the transaction. At this point the transaction is still running, so changes
can still be made. However, there's no guarantee that other handlers will see such changes, since the
order in which handlers are executed is undefined. This method can also throw an exception, in which
case it prevents the transaction from being committed (and a call to afterRollback will follow). If
beforeCommit is successfully executed in all registered handlers, the transaction will be committed and
the afterCommit method will be called with the same transaction data as well as the object returned
from beforeCommit. In afterCommit the transaction has been closed, so accessing data outside of what
TransactionData covers requires a new transaction to be opened. A TransactionEventHandler is only
notified about transactions that have changes accessible via TransactionData, so some indexing and
schema changes will not trigger these events.
Chapter19.Data Import
For importing data using Cypher and CSV, see Section12.8, “Importing CSV files with Cypher” [211].
For high-performance data import, see Chapter29, Import tool [531].
Chapter20.Graph Algorithms
Neo4j graph algorithms is a component that contains Neo4j implementations of some common
algorithms for graphs. It includes algorithms like:
Shortest paths,
all paths,
all simple paths,
Dijkstra and
• A*.
The graph algorithms are included with Neo4j.
For usage examples, see Section 21.18, “Graph Algorithms” [401] (REST API) and Section 33.11, “Graph
Algorithm examples” [602] (embedded database). The shortest path algorithm can be used from
Cypher as well; see the section called “Shortest path” [161].
Chapter21.REST API
The Neo4j REST API is designed with discoverability in mind, so that you can start with a GET on
the service root (Section 21.4, “Service root” [316]) and from there discover URIs to perform other
requests. The examples below use concrete URIs; these are subject to change in the future, so to
future-proof your application, discover URIs where possible instead of relying on the current layout.
The default representation is json1, both for responses and for data sent with POST/PUT requests.
Below follows a listing of ways to interact with the REST API. For language bindings to the REST API, see
Chapter 7, Languages [96].
To interact with the JSON interface you must explicitly set the request header Accept: application/json
for those requests that respond with data. You should also set the header Content-Type: application/json
if your request sends data, for example when you're creating a relationship. The examples include
the relevant request and response headers.
The server supports streaming results, with better performance and lower memory overhead. See
Section 21.5, “Streaming” [317] for more information.
1 http://www.json.org/
21.1. Transactional Cypher HTTP endpoint
The default way to interact with Neo4j is by using this endpoint.
The Neo4j transactional HTTP endpoint allows you to execute a series of Cypher statements within
the scope of a transaction. The transaction may be kept open across multiple HTTP requests, until
the client chooses to commit or roll back. Each HTTP request can include a list of statements, and for
convenience you can include statements along with a request to begin or commit a transaction.
The server guards against orphaned transactions by using a timeout. If there are no requests for a
given transaction within the timeout period, the server will roll it back. You can configure the timeout
in the server configuration, by setting org.neo4j.server.transaction.timeout to the number of seconds
before timeout. The default timeout is 60 seconds.
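For example, to lower the timeout to 30 seconds, the property can be set in the server configuration file (conf/neo4j-server.properties in a default 2.3 installation); the value is in seconds:

```
# conf/neo4j-server.properties
org.neo4j.server.transaction.timeout=30
```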
The key difference between the transactional HTTP endpoint for Cypher and the Cypher endpoint (see
Section 21.6, “Legacy Cypher HTTP endpoint” [318]) is the ability to use the same transaction across
multiple HTTP requests. The Cypher endpoint always attempts to commit a transaction at the end of
each HTTP request. There have also been improvements to the serialization format.
Note
• Literal line breaks are not allowed inside Cypher statements.
• Open transactions are not shared among members of an HA cluster. Therefore, if you use
this endpoint in an HA cluster, you must ensure that all requests for a given transaction
are sent to the same Neo4j instance.
• Cypher queries with USING PERIODIC COMMIT (see Section 12.9, “Using Periodic
Commit” [213]) may only be executed when creating a new transaction and immediately
committing it with a single HTTP request (see the section called “Begin and commit a
transaction in one request” [298] for how to do that).
• The serialization format for Cypher results is mostly the same as for the Cypher endpoint.
However, the format for raw entities is slightly less verbose and does not include
hypermedia links.
Tip
In order to speed up queries in repeated scenarios, try not to use literals but replace
them with parameters wherever possible. This will let the server cache query plans. See
Section 8.5, “Parameters” [113] for more information.
Begin and commit a transaction in one request
If there is no need to keep a transaction open across multiple HTTP requests, you can begin a
transaction, execute statements, and commit with just a single HTTP request.
Example request
POST http://localhost:7474/db/data/transaction/commit
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"statements" : [ {
"statement" : "CREATE (n) RETURN id(n)"
} ]
}
Example response
200: OK
Content-Type: application/json
{
"results" : [ {
"columns" : [ "id(n)" ],
"data" : [ {
"row" : [ 18 ]
} ]
} ],
"errors" : [ ]
}
Execute multiple statements
You can send multiple Cypher statements in the same request. The response will contain the result of
each statement.
Example request
POST http://localhost:7474/db/data/transaction/commit
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"statements" : [ {
"statement" : "CREATE (n) RETURN id(n)"
}, {
"statement" : "CREATE (n {props}) RETURN n",
"parameters" : {
"props" : {
"name" : "My Node"
}
}
} ]
}
Example response
200: OK
Content-Type: application/json
{
"results" : [ {
"columns" : [ "id(n)" ],
"data" : [ {
"row" : [ 14 ]
} ]
}, {
"columns" : [ "n" ],
"data" : [ {
"row" : [ {
"name" : "My Node"
} ]
} ]
} ],
"errors" : [ ]
}
Begin a transaction
You begin a new transaction by posting zero or more Cypher statements to the transaction endpoint.
The server will respond with the result of your statements, as well as the location of your open
transaction.
Example request
POST http://localhost:7474/db/data/transaction
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"statements" : [ {
"statement" : "CREATE (n {props}) RETURN n",
"parameters" : {
"props" : {
"name" : "My Node"
}
}
} ]
}
Example response
201: Created
Content-Type: application/json
Location: http://localhost:7474/db/data/transaction/9
{
"commit" : "http://localhost:7474/db/data/transaction/9/commit",
"results" : [ {
"columns" : [ "n" ],
"data" : [ {
"row" : [ {
"name" : "My Node"
} ]
} ]
} ],
"transaction" : {
"expires" : "Fri, 08 Dec 2017 11:04:46 +0000"
},
"errors" : [ ]
}
Execute statements in an open transaction
Given that you have an open transaction, you can make a number of requests, each of which executes
additional statements, and keeps the transaction open by resetting the transaction timeout.
Example request
POST http://localhost:7474/db/data/transaction/11
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"statements" : [ {
"statement" : "CREATE (n) RETURN n"
} ]
}
Example response
200: OK
Content-Type: application/json
{
"commit" : "http://localhost:7474/db/data/transaction/11/commit",
"results" : [ {
"columns" : [ "n" ],
"data" : [ {
"row" : [ { } ]
} ]
} ],
"transaction" : {
"expires" : "Fri, 08 Dec 2017 11:04:46 +0000"
},
"errors" : [ ]
}
Execute statements in an open transaction, returning results in REST format
Given that you have an open transaction, you can make a number of requests, each of which executes
additional statements and keeps the transaction open by resetting the transaction timeout. Specifying
the REST format will give back full Neo4j REST API representations of the Neo4j Nodes, Relationships
and Paths, if returned.
Example request
POST http://localhost:7474/db/data/transaction/1
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"statements" : [ {
"statement" : "CREATE (n) RETURN n",
"resultDataContents" : [ "REST" ]
} ]
}
Example response
200: OK
Content-Type: application/json
{
"commit" : "http://localhost:7474/db/data/transaction/1/commit",
"results" : [ {
"columns" : [ "n" ],
"data" : [ {
"rest" : [ {
"outgoing_relationships" : "http://localhost:7474/db/data/node/12/relationships/out",
"labels" : "http://localhost:7474/db/data/node/12/labels",
"all_typed_relationships" : "http://localhost:7474/db/data/node/12/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/12/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/12",
"property" : "http://localhost:7474/db/data/node/12/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/12/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/12/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/12/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/12/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/12/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/12/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/12/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 12,
"labels" : [ ]
},
"data" : { }
} ]
} ]
} ],
"transaction" : {
"expires" : "Fri, 08 Dec 2017 11:04:42 +0000"
},
"errors" : [ ]
}
Reset transaction timeout of an open transaction
Every orphaned transaction is automatically expired after a period of inactivity. This may be prevented
by resetting the transaction timeout.
The timeout may be reset by sending a keep-alive request to the server that executes an empty list
of statements. This request will reset the transaction timeout and return the new time at which the
transaction will expire as an RFC1123 formatted timestamp value in the “transaction” section of the
response.
Example request
POST http://localhost:7474/db/data/transaction/2
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"statements" : [ ]
}
Example response
200: OK
Content-Type: application/json
{
"commit" : "http://localhost:7474/db/data/transaction/2/commit",
"results" : [ ],
"transaction" : {
"expires" : "Fri, 08 Dec 2017 11:04:45 +0000"
},
"errors" : [ ]
}
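A client can parse the RFC 1123 expiry timestamp with standard tooling to schedule its next keep-alive; for example, in Python (using the standard library, with the timestamp taken from the example response above):

```python
from email.utils import parsedate_to_datetime

# The "expires" value from the transaction section of the response.
expires = parsedate_to_datetime("Fri, 08 Dec 2017 11:04:45 +0000")
print(expires.isoformat())  # 2017-12-08T11:04:45+00:00
```

The client would send its next keep-alive request some safe margin before this deadline.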
Commit an open transaction
Given that you have an open transaction, you can send a commit request. Optionally, you can submit
additional statements along with the request; these will be executed before committing the transaction.
Example request
POST http://localhost:7474/db/data/transaction/6/commit
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"statements" : [ {
"statement" : "CREATE (n) RETURN id(n)"
} ]
}
Example response
200: OK
Content-Type: application/json
{
"results" : [ {
"columns" : [ "id(n)" ],
"data" : [ {
"row" : [ 17 ]
} ]
} ],
"errors" : [ ]
}
Rollback an open transaction
Given that you have an open transaction, you can send a rollback request. The server will roll back the
transaction. Any further statements trying to run in this transaction will fail immediately.
Example request
DELETE http://localhost:7474/db/data/transaction/3
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"results" : [ ],
"errors" : [ ]
}
Include query statistics
By setting includeStats to true for a statement, query statistics will be returned for it.
Example request
POST http://localhost:7474/db/data/transaction/commit
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"statements" : [ {
"statement" : "CREATE (n) RETURN id(n)",
"includeStats" : true
} ]
}
Example response
200: OK
Content-Type: application/json
{
"results" : [ {
"columns" : [ "id(n)" ],
"data" : [ {
"row" : [ 16 ]
} ],
"stats" : {
"contains_updates" : true,
"nodes_created" : 1,
"nodes_deleted" : 0,
"properties_set" : 0,
"relationships_created" : 0,
"relationship_deleted" : 0,
"labels_added" : 0,
"labels_removed" : 0,
"indexes_added" : 0,
"indexes_removed" : 0,
"constraints_added" : 0,
"constraints_removed" : 0
}
} ],
"errors" : [ ]
}
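A client can use the stats object to detect whether a statement actually wrote anything. A small sketch, taking one decoded entry of the "results" array; the function name is our own:

```python
def wrote_data(result):
    # "stats" is only present when the statement was submitted with
    # "includeStats": true; "contains_updates" summarises the counters.
    stats = result.get("stats")
    if stats is None:
        raise ValueError('no stats; was "includeStats": true set?')
    return bool(stats["contains_updates"])
```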
Return results in graph format
If you want to understand the graph structure of nodes and relationships returned by your query, you
can specify the "graph" results data format. For example, this is useful when you want to visualise the
graph structure. The format collates all the nodes and relationships from all columns of the result, and
also flattens collections of nodes and relationships, including paths.
Example request
POST http://localhost:7474/db/data/transaction/commit
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"statements" : [ {
"statement" : "CREATE ( bike:Bike { weight: 10 } ) CREATE ( frontWheel:Wheel { spokes: 3 } ) CREATE ( backWheel:Wheel
{ spokes: 32 } ) CREATE p1 = (bike)-[:HAS { position: 1 } ]->(frontWheel) CREATE p2 = (bike)-[:HAS { position: 2 } ]-
>(backWheel) RETURN bike, p1, p2",
"resultDataContents" : [ "row", "graph" ]
} ]
}
Example response
200: OK
Content-Type: application/json
{
"results" : [ {
"columns" : [ "bike", "p1", "p2" ],
"data" : [ {
"row" : [ {
"weight" : 10
}, [ {
"weight" : 10
}, {
"position" : 1
}, {
"spokes" : 3
} ], [ {
"weight" : 10
}, {
"position" : 2
}, {
"spokes" : 32
} ] ],
"graph" : {
"nodes" : [ {
"id" : "19",
"labels" : [ "Bike" ],
"properties" : {
"weight" : 10
}
}, {
"id" : "21",
"labels" : [ "Wheel" ],
"properties" : {
"spokes" : 32
}
}, {
"id" : "20",
"labels" : [ "Wheel" ],
"properties" : {
"spokes" : 3
}
} ],
"relationships" : [ {
"id" : "9",
"type" : "HAS",
"startNode" : "19",
"endNode" : "20",
"properties" : {
"position" : 1
}
}, {
"id" : "10",
"type" : "HAS",
"startNode" : "19",
"endNode" : "21",
"properties" : {
"position" : 2
}
} ]
}
} ]
} ],
"errors" : [ ]
}
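On the client side, the "graph" sections of several rows (or several statements) can be merged by id, mirroring the collation the format itself performs across columns. A hedged sketch; collate_graph is our own name:

```python
def collate_graph(results):
    # Deduplicate nodes and relationships by their id across every
    # row's "graph" section; returns two dicts keyed by id.
    nodes, rels = {}, {}
    for result in results:
        for row in result.get("data", []):
            graph = row.get("graph", {})
            for n in graph.get("nodes", []):
                nodes[n["id"]] = n
            for r in graph.get("relationships", []):
                rels[r["id"]] = r
    return nodes, rels
```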
Handling errors
The result of any request against the transaction endpoint is streamed back to the client. Therefore
the server does not know whether the request will be successful or not when it sends the HTTP status
code.
Because of this, all requests against the transactional endpoint will return a 200 or 201 status code,
regardless of whether statements were successfully executed. At the end of the response payload, the
server includes a list of errors that occurred while executing statements. If this list is empty, the request
completed successfully.
If any errors occur while executing statements, the server will roll back the transaction.
In this example, we send the server an invalid statement to demonstrate error handling.
For more information on the status codes, see Section 21.2, “Neo4j Status Codes” [307].
Example request
POST http://localhost:7474/db/data/transaction/10/commit
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"statements" : [ {
"statement" : "This is not a valid Cypher Statement."
} ]
}
Example response
200: OK
Content-Type: application/json
{
"results" : [ ],
"errors" : [ {
"code" : "Neo.ClientError.Statement.InvalidSyntax",
"message" : "Invalid input 'T': expected <init> (line 1, column 1 (offset: 0))\n\"This is not a valid Cypher Statement.
\"\n ^"
} ]
}
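Because the HTTP status is 200 even when a statement fails, a client must inspect the errors array itself. A minimal sketch (the function name is our own), taking the decoded response body:

```python
def check_errors(response_body):
    # The "errors" list at the end of the payload is the only reliable
    # success signal; raise on the first entry if it is non-empty.
    errors = response_body.get("errors", [])
    if errors:
        raise RuntimeError("%s: %s" % (errors[0]["code"], errors[0]["message"]))
```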
21.2. Neo4j Status Codes
The transactional endpoint may include zero or more status codes in any response, indicating issues or
information for the client. Each status code follows the same format: "Neo.[Classification].[Category].
[Title]". The fact that a status code is returned by the server does not always mean there is a fatal error.
Status codes can also indicate transient problems that may go away if you retry the request.
What effect a status code has can be determined by its classification.
Note
This is not the same thing as HTTP status codes. Neo4j Status Codes are returned in the
response body, at the very end of the response.
Classifications

Classification | Description | Effect on transaction
ClientError | The client sent a bad request; changing the request might yield a successful outcome. | Rollback
ClientNotification | There are notifications about the request sent by the client. | None
DatabaseError | The database failed to service the request. | Rollback
TransientError | The database cannot service the request right now; retrying later might yield a successful outcome. | Rollback
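The classification, and with it the client-facing consequences from the table above, can be recovered mechanically from the code string. A sketch, assuming the four-part format; the function name and the returned keys are our own:

```python
def classify(status_code):
    # "Neo.[Classification].[Category].[Title]" -> parts plus the
    # consequences implied by the classification table.
    neo, classification, category, title = status_code.split(".", 3)
    assert neo == "Neo"
    return {
        "classification": classification,
        "category": category,
        "title": title,
        # Only TransientError suggests a retry may succeed.
        "retryable": classification == "TransientError",
        # Every classification except ClientNotification rolls back.
        "rolls_back": classification != "ClientNotification",
    }
```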
Status codes

This is a complete list of all status codes Neo4j may return, and what they mean.

Status Code | Description
Neo.ClientError.General.ReadOnly | This is a read only database; writing or modifying the database is not allowed.
Neo.ClientError.LegacyIndex.NoSuchIndex | The request (directly or indirectly) referred to an index that does not exist.
Neo.ClientError.Request.Invalid | The client provided an invalid request.
Neo.ClientError.Request.InvalidFormat | The client provided a request that was missing required fields, or had values that are not allowed.
Neo.ClientError.Schema.ConstraintAlreadyExists | Unable to perform operation because it would clash with a pre-existing constraint.
Neo.ClientError.Schema.ConstraintVerificationFailure | Unable to create constraint because data that exists in the database violates it.
Neo.ClientError.Schema.ConstraintViolation | A constraint imposed by the database was violated.
Neo.ClientError.Schema.IllegalTokenName | A token name, such as a label, relationship type or property key, is not valid. Tokens cannot be empty strings and cannot be null.
Neo.ClientError.Schema.IndexAlreadyExists | Unable to perform operation because it would clash with a pre-existing index.
Neo.ClientError.Schema.IndexBelongsToConstraint | A requested operation cannot be performed on the specified index because the index is part of a constraint. If you want to drop the index, for instance, you must drop the constraint.
Neo.ClientError.Schema.IndexLimitReached | The maximum number of index entries supported has been reached; no more entities can be indexed.
Neo.ClientError.Schema.LabelLimitReached | The maximum number of labels supported has been reached; no more labels can be created.
Neo.ClientError.Schema.NoSuchConstraint | The request (directly or indirectly) referred to a constraint that does not exist.
Neo.ClientError.Schema.NoSuchIndex | The request (directly or indirectly) referred to an index that does not exist.
Neo.ClientError.Security.AuthenticationFailed | The client provided an incorrect username and/or password.
Neo.ClientError.Security.AuthenticationRateLimit | The client has provided incorrect authentication details too many times in a row.
Neo.ClientError.Security.AuthorizationFailed | The client does not have privileges to perform the operation requested.
Neo.ClientError.Statement.ArithmeticError | Invalid use of arithmetic, such as dividing by zero.
Neo.ClientError.Statement.ConstraintViolation | A constraint imposed by the statement is violated by the data in the database.
Neo.ClientError.Statement.EntityNotFound | The statement is directly referring to an entity that does not exist.
Neo.ClientError.Statement.InvalidArguments | The statement is attempting to perform operations using invalid arguments.
Neo.ClientError.Statement.InvalidSemantics | The statement is syntactically valid, but expresses something that the database cannot do.
Neo.ClientError.Statement.InvalidSyntax | The statement contains invalid or unsupported syntax.
Neo.ClientError.Statement.InvalidType | The statement is attempting to perform operations on values with types that are not supported by the operation.
Neo.ClientError.Statement.NoSuchLabel | The statement is referring to a label that does not exist.
Neo.ClientError.Statement.NoSuchProperty | The statement is referring to a property that does not exist.
Neo.ClientError.Statement.ParameterMissing | The statement is referring to a parameter that was not provided in the request.
Neo.ClientError.Transaction.ConcurrentRequest | There were concurrent requests accessing the same transaction, which is not allowed.
Neo.ClientError.Transaction.EventHandlerThrewException | A transaction event handler threw an exception. The transaction will be rolled back.
Neo.ClientError.Transaction.HookFailed | Transaction hook failure.
Neo.ClientError.Transaction.InvalidType | The transaction is of the wrong type to service the request. For instance, a transaction that has had schema modifications performed in it cannot be used to subsequently perform data operations, and vice versa.
Neo.ClientError.Transaction.MarkedAsFailed | Transaction was marked as both successful and failed. Failure takes precedence, and so this transaction was rolled back although it may have looked like it was going to be committed.
Neo.ClientError.Transaction.UnknownId | The request referred to a transaction that does not exist.
Neo.ClientError.Transaction.ValidationFailed | Transaction changes did not pass validation checks.
Neo.ClientNotification.Statement.CartesianProduct | This query builds a cartesian product between disconnected patterns.
Neo.ClientNotification.Statement.DeprecationWarning | This feature is deprecated and will be removed in future versions.
Neo.ClientNotification.Statement.DynamicPropertyWarning | Queries using dynamic properties will use neither index seeks nor index scans for those properties.
Neo.ClientNotification.Statement.EagerWarning | The execution plan for this query contains the Eager operator, which forces all dependent data to be materialized in main memory before proceeding.
Neo.ClientNotification.Statement.IndexMissingWarning | Adding a schema index may speed up this query.
Neo.ClientNotification.Statement.JoinHintUnfulfillableWarning | The database was unable to plan a hinted join.
Neo.ClientNotification.Statement.JoinHintUnsupportedWarning | Queries with join hints are not supported by the RULE planner.
Neo.ClientNotification.Statement.LabelMissingWarning | The provided label is not in the database.
Neo.ClientNotification.Statement.PlannerUnsupportedWarning | This query is not supported by the COST planner.
Neo.ClientNotification.Statement.PropertyNameMissingWarning | The provided property name is not in the database.
Neo.ClientNotification.Statement.RelTypeMissingWarning | The provided relationship type is not in the database.
Neo.ClientNotification.Statement.RuntimeUnsupportedWarning | This query is not supported by the compiled runtime.
Neo.ClientNotification.Statement.UnboundedPatternWarning | The provided pattern is unbounded; consider adding an upper limit to the number of node hops.
Neo.DatabaseError.General.CorruptSchemaRule | A malformed schema rule was encountered. Please contact your support representative.
Neo.DatabaseError.General.FailedIndex | The request (directly or indirectly) referred to an index that is in a failed state. The index needs to be dropped and recreated manually.
Neo.DatabaseError.General.UnknownFailure | An unknown failure occurred.
Neo.DatabaseError.Schema.ConstraintCreationFailure | Creating a requested constraint failed.
Neo.DatabaseError.Schema.ConstraintDropFailure | The database failed to drop a requested constraint.
Neo.DatabaseError.Schema.DuplicateSchemaRule | The request referred to a schema rule that is defined multiple times.
Neo.DatabaseError.Schema.IndexCreationFailure | Failed to create an index.
Neo.DatabaseError.Schema.IndexDropFailure | The database failed to drop a requested index.
Neo.DatabaseError.Schema.NoSuchLabel | The request accessed a label that did not exist.
Neo.DatabaseError.Schema.NoSuchPropertyKey | The request accessed a property that does not exist.
Neo.DatabaseError.Schema.NoSuchRelationshipType | The request accessed a relationship type that does not exist.
Neo.DatabaseError.Schema.NoSuchSchemaRule | The request referred to a schema rule that does not exist.
Neo.DatabaseError.Statement.ExecutionFailure | The database was unable to execute the statement.
Neo.DatabaseError.Transaction.CouldNotBegin | The database was unable to start the transaction.
Neo.DatabaseError.Transaction.CouldNotCommit | The database was unable to commit the transaction.
Neo.DatabaseError.Transaction.CouldNotRollback | The database was unable to roll back the transaction.
Neo.DatabaseError.Transaction.CouldNotWriteToLog | The database was unable to write the transaction to the log.
Neo.DatabaseError.Transaction.ReleaseLocksFailed | The transaction was unable to release one or more of its locks.
Neo.TransientError.General.DatabaseUnavailable | The database is not currently available to serve your request; refer to the database logs for more details. Retrying your request at a later time may succeed.
Neo.TransientError.Network.UnknownFailure | An unknown network failure occurred; a retry may resolve the issue.
Neo.TransientError.Schema.ModifiedConcurrently | The database schema was modified while this transaction was running; the transaction should be retried.
Neo.TransientError.Security.ModifiedConcurrently | The user was modified concurrently to this request.
Neo.TransientError.Statement.ExternalResourceFailure | The external resource is not available.
Neo.TransientError.Transaction.AcquireLockTimeout | The transaction was unable to acquire a lock, for instance due to a timeout or the transaction thread being interrupted.
Neo.TransientError.Transaction.ConstraintsChanged | Database constraints changed since the start of this transaction.
Neo.TransientError.Transaction.DeadlockDetected | This transaction, and at least one more transaction, has acquired locks in a way that it will wait indefinitely, and the database has aborted it. Retrying this transaction will most likely be successful.
Neo.TransientError.Transaction.LockClientStopped | Transaction terminated; no more locks can be acquired.
Neo.TransientError.Transaction.Outdated | Transaction has seen state which has been invalidated by applied updates while the transaction was active. The transaction may succeed if retried.
Neo.TransientError.Transaction.Terminated | Explicitly terminated by the user.
21.3. REST API Authentication and Authorization
In order to prevent unauthorized access to Neo4j, the REST API supports authorization and
authentication. When enabled, requests to the REST API must be authorized using the username
and password of a valid user. Authorization is enabled by default; see the section called “Server
authentication and authorization” [500] for how to disable it.
When Neo4j is first installed you can authenticate with the default user neo4j and the default password
neo4j. However, the default password must be changed (see the section called “User status and
password changing” [313]) before access to resources will be permitted. This can easily be done via
the Neo4j Browser, or via direct HTTP calls.
The username and password combination is local to each Neo4j instance. If you wish to have multiple
instances in a cluster, you should ensure that all instances share the same credentials. For automated
deployments, you may also copy security configuration from another Neo4j instance (see the section
called “Copying security configuration from one instance to another” [315]).
Authenticating
Missing authorization
If an Authorization header is not supplied, the server will reply with an error.
Example request
GET http://localhost:7474/db/data/
Accept: application/json; charset=UTF-8
Example response
401: Unauthorized
Content-Type: application/json; charset=UTF-8
WWW-Authenticate: None
{
"errors" : [ {
"message" : "No authorization header supplied.",
"code" : "Neo.ClientError.Security.AuthorizationFailed"
} ]
}
Authenticate to access the server
Authenticate by sending a username and a password to Neo4j using HTTP Basic Auth. Requests should
include an Authorization header, with a value of Basic <payload>, where "payload" is a base64 encoded
string of "username:password".
Example request
GET http://localhost:7474/user/neo4j
Accept: application/json; charset=UTF-8
Authorization: Basic bmVvNGo6c2VjcmV0
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"username" : "neo4j",
"password_change" : "http://localhost:7474/user/neo4j/password",
"password_change_required" : false
}
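Building the header value is a one-liner with the standard library. For instance, "neo4j:secret" encodes to exactly the payload used in the request above; the function name is our own:

```python
import base64

def basic_auth_header(username, password):
    # Base64-encode "username:password" and prefix with "Basic ".
    payload = base64.b64encode(("%s:%s" % (username, password)).encode("utf-8"))
    return "Basic " + payload.decode("ascii")
```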
Incorrect authentication
If an incorrect username or password is provided, the server replies with an error.
Example request
POST http://localhost:7474/db/data/
Accept: application/json; charset=UTF-8
Authorization: Basic bmVvNGo6aW5jb3JyZWN0
Example response
401: Unauthorized
Content-Type: application/json; charset=UTF-8
WWW-Authenticate: None
{
"errors" : [ {
"message" : "Invalid username or password.",
"code" : "Neo.ClientError.Security.AuthorizationFailed"
} ]
}
Required password changes
In some cases, like the very first time Neo4j is accessed, the user will be required to choose a new
password. The database will signal that a new password is required and deny access.
See the section called “User status and password changing” [313] for how to set a new password.
Example request
GET http://localhost:7474/db/data/
Accept: application/json; charset=UTF-8
Authorization: Basic bmVvNGo6bmVvNGo=
Example response
403: Forbidden
Content-Type: application/json; charset=UTF-8
{
"password_change" : "http://localhost:7474/user/neo4j/password",
"errors" : [ {
"message" : "User is required to change their password.",
"code" : "Neo.ClientError.Security.AuthorizationFailed"
} ]
}
User status and password changing
User status
Given that you know the current password, you can ask the server for the user status.
Example request
GET http://localhost:7474/user/neo4j
Accept: application/json; charset=UTF-8
Authorization: Basic bmVvNGo6c2VjcmV0
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"username" : "neo4j",
"password_change" : "http://localhost:7474/user/neo4j/password",
"password_change_required" : false
}
User status on first access
On first access, and using the default password, the user status will indicate that the user's password
requires changing.
Example request
GET http://localhost:7474/user/neo4j
Accept: application/json; charset=UTF-8
Authorization: Basic bmVvNGo6bmVvNGo=
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"username" : "neo4j",
"password_change" : "http://localhost:7474/user/neo4j/password",
"password_change_required" : true
}
Changing the user password
Given that you know the current password, you can ask the server to change a user's password. You can
choose any password you like, as long as it is different from the current password.
Example request
POST http://localhost:7474/user/neo4j/password
Accept: application/json; charset=UTF-8
Authorization: Basic bmVvNGo6bmVvNGo=
Content-Type: application/json
{
"password" : "secret"
}
Example response
200: OK
Access when auth is disabled
When auth is disabled
When auth has been disabled in the configuration, requests can be sent without an Authorization
header.
Example request
GET http://localhost:7474/db/data/
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"extensions" : { },
"node" : "http://localhost:7474/db/data/node",
"node_index" : "http://localhost:7474/db/data/index/node",
"relationship_index" : "http://localhost:7474/db/data/index/relationship",
"extensions_info" : "http://localhost:7474/db/data/ext",
"relationship_types" : "http://localhost:7474/db/data/relationship/types",
"batch" : "http://localhost:7474/db/data/batch",
"cypher" : "http://localhost:7474/db/data/cypher",
"indexes" : "http://localhost:7474/db/data/schema/index",
"constraints" : "http://localhost:7474/db/data/schema/constraint",
"transaction" : "http://localhost:7474/db/data/transaction",
"node_labels" : "http://localhost:7474/db/data/labels",
"neo4j_version" : "2.3.12"
}
Copying security configuration from one instance to another
In many cases, such as automated deployments, you may want to start a Neo4j instance with
pre-configured authentication and authorization. This is possible by copying the auth database file from a
pre-existing Neo4j instance to your new instance.
This file is located at data/dbms/auth, and simply copying that file into a new Neo4j instance will transfer
your password and authorization token.
21.4. Service root
Get service root
The service root is your starting point to discover the REST API. It contains the basic starting points for
the database, and some version and extension information.
Figure21.1.Final Graph
Example request
GET http://localhost:7474/db/data/
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"extensions" : { },
"node" : "http://localhost:7474/db/data/node",
"node_index" : "http://localhost:7474/db/data/index/node",
"relationship_index" : "http://localhost:7474/db/data/index/relationship",
"extensions_info" : "http://localhost:7474/db/data/ext",
"relationship_types" : "http://localhost:7474/db/data/relationship/types",
"batch" : "http://localhost:7474/db/data/batch",
"cypher" : "http://localhost:7474/db/data/cypher",
"indexes" : "http://localhost:7474/db/data/schema/index",
"constraints" : "http://localhost:7474/db/data/schema/constraint",
"transaction" : "http://localhost:7474/db/data/transaction",
"node_labels" : "http://localhost:7474/db/data/labels",
"neo4j_version" : "2.3.12"
}
21.5. Streaming
All responses from the REST API can be transmitted as JSON streams, resulting in better performance
and lower memory overhead on the server side. To use streaming, supply the header X-Stream: true
with each request.
Example request
GET http://localhost:7474/db/data/
Accept: application/json
X-Stream: true
Example response
200: OK
Content-Type: application/json; charset=UTF-8; stream=true
{
"extensions" : { },
"node" : "http://localhost:7474/db/data/node",
"node_index" : "http://localhost:7474/db/data/index/node",
"relationship_index" : "http://localhost:7474/db/data/index/relationship",
"extensions_info" : "http://localhost:7474/db/data/ext",
"relationship_types" : "http://localhost:7474/db/data/relationship/types",
"batch" : "http://localhost:7474/db/data/batch",
"cypher" : "http://localhost:7474/db/data/cypher",
"indexes" : "http://localhost:7474/db/data/schema/index",
"constraints" : "http://localhost:7474/db/data/schema/constraint",
"transaction" : "http://localhost:7474/db/data/transaction",
"node_labels" : "http://localhost:7474/db/data/labels",
"neo4j_version" : "2.3.12"
}
21.6. Legacy Cypher HTTP endpoint
Note
This endpoint is deprecated. Please transition to using the new transactional endpoint (see
Section 21.1, “Transactional Cypher HTTP endpoint” [298]). Among other things it allows you
to run multiple Cypher statements in the same transaction.
The Neo4j REST API allows querying with Cypher, see Part III, “Cypher Query Language” [102]. The
results are returned as a list of string headers (columns), and a data part consisting of a list of all rows,
where every row is a list of REST representations of the field values: Node, Relationship, Path, or a
simple value like String.
Tip
In order to speed up queries in repeated scenarios, try not to use literals; replace them
with parameters wherever possible to let the server cache query plans. See the
section called “Use parameters” [318] for details, and Section 8.5, “Parameters” [113]
for where parameters can be used.
Use parameters
Cypher supports queries with parameters which are submitted as JSON.
MATCH (x { name: { startName }})-[r]-(friend)
WHERE friend.name = { name }
RETURN TYPE(r)
Figure21.2.Final Graph
Node[79]
nam e = 'you'
Node[80]
nam e = 'I'
know
Example request
POST http://localhost:7474/db/data/cypher
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"query" : "MATCH (x {name: {startName}})-[r]-(friend) WHERE friend.name = {name} RETURN TYPE(r)",
"params" : {
"startName" : "I",
"name" : "you"
}
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"columns" : [ "TYPE(r)" ],
"data" : [ [ "know" ] ]
}
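The request body is just a query string plus a params map. A minimal builder (the function name is ours); keeping the query text constant and varying only the params is what lets the server cache the plan:

```python
import json

def cypher_payload(query, **params):
    # Body for the legacy /db/data/cypher endpoint, as in the
    # request above: a "query" string and a "params" map.
    return json.dumps({"query": query, "params": params})
```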
Create a node
Create a node with a label and a property using Cypher. See the request for the parameter sent with
the query.
CREATE (n:Person { name : { name }})
RETURN n
Figure21.3.Final Graph
Node[66]: Person
nam e = 'Andres'
Example request
POST http://localhost:7474/db/data/cypher
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"query" : "CREATE (n:Person { name : {name} }) RETURN n",
"params" : {
"name" : "Andres"
}
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"columns" : [ "n" ],
"data" : [ [ {
"labels" : "http://localhost:7474/db/data/node/66/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/66/relationships/out",
"data" : {
"name" : "Andres"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/66/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/66/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/66",
"property" : "http://localhost:7474/db/data/node/66/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/66/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/66/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/66/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/66/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/66/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/66/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/66/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 66,
"labels" : [ "Person" ]
}
} ] ]
}
Create a node with multiple properties
Create a node with a label and multiple properties using Cypher. See the request for the parameter
sent with the query.
CREATE (n:Person { props })
RETURN n
Figure21.4.Final Graph
Node[63]: Person
awesom e = true
children = 3
nam e = 'Michael'
position = 'Developer'
Example request
POST http://localhost:7474/db/data/cypher
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"query" : "CREATE (n:Person { props } ) RETURN n",
"params" : {
"props" : {
"position" : "Developer",
"name" : "Michael",
"awesome" : true,
"children" : 3
}
}
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"columns" : [ "n" ],
"data" : [ [ {
"labels" : "http://localhost:7474/db/data/node/63/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/63/relationships/out",
"data" : {
"position" : "Developer",
"awesome" : true,
"name" : "Michael",
"children" : 3
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/63/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/63/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/63",
"property" : "http://localhost:7474/db/data/node/63/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/63/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/63/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/63/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/63/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/63/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/63/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/63/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 63,
"labels" : [ "Person" ]
}
} ] ]
}
Create multiple nodes with properties
Create multiple nodes with properties using Cypher. See the request for the parameter sent with the
query.
UNWIND { props } AS map
CREATE (n:Person)
SET n = map
RETURN n
Figure21.5.Final Graph
Node[66]: Person
nam e = 'Andres'
Node[67]: Person
nam e = 'Andres'
position = 'Developer'
Node[68]: Person
nam e = 'Michael'
position = 'Developer'
Example request
POST http://localhost:7474/db/data/cypher
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"query" : "UNWIND {props} as map CREATE (n:Person) SET n = map RETURN n",
"params" : {
"props" : [ {
"name" : "Andres",
"position" : "Developer"
}, {
"name" : "Michael",
"position" : "Developer"
} ]
}
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"columns" : [ "n" ],
"data" : [ [ {
"labels" : "http://localhost:7474/db/data/node/67/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/67/relationships/out",
"data" : {
"position" : "Developer",
"name" : "Andres"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/67/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/67/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/67",
"property" : "http://localhost:7474/db/data/node/67/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/67/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/67/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/67/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/67/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/67/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/67/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/67/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 67,
"labels" : [ "Person" ]
}
} ], [ {
"labels" : "http://localhost:7474/db/data/node/68/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/68/relationships/out",
"data" : {
"position" : "Developer",
"name" : "Michael"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/68/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/68/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/68",
"property" : "http://localhost:7474/db/data/node/68/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/68/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/68/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/68/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/68/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/68/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/68/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/68/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 68,
"labels" : [ "Person" ]
}
} ] ]
}
Set all properties on a node using Cypher
Set all properties on a node.
CREATE (n:Person { name: 'this property is to be deleted' })
SET n = { props }
RETURN n
Figure21.6.Final Graph
Node[94]: Person
awesom e = true
children = 3
firstName = 'Michael'
position = 'Developer'
Example request
POST http://localhost:7474/db/data/cypher
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"query" : "CREATE (n:Person { name: 'this property is to be deleted' } ) SET n = { props } RETURN n",
"params" : {
"props" : {
"position" : "Developer",
"firstName" : "Michael",
"awesome" : true,
"children" : 3
}
}
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"columns" : [ "n" ],
"data" : [ [ {
"labels" : "http://localhost:7474/db/data/node/94/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/94/relationships/out",
"data" : {
"position" : "Developer",
"awesome" : true,
"children" : 3,
"firstName" : "Michael"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/94/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/94/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/94",
"property" : "http://localhost:7474/db/data/node/94/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/94/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/94/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/94/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/94/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/94/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/94/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/94/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 94,
"labels" : [ "Person" ]
}
} ] ]
}
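A client can build the same parameterized request body with any JSON library; the sketch below uses only Python's standard library and mirrors the example request above (it is not a Neo4j client API):

```python
import json

# Request body for POST http://localhost:7474/db/data/cypher.
# Values live in "params", so they are never string-interpolated
# into the Cypher text itself.
payload = {
    "query": "CREATE (n:Person { name: 'this property is to be deleted' }) "
             "SET n = { props } RETURN n",
    "params": {
        "props": {
            "position": "Developer",
            "firstName": "Michael",
            "awesome": True,
            "children": 3,
        }
    },
}

# Serialize for sending with Content-Type: application/json.
body = json.dumps(payload)
```

Posting this body with the Accept and Content-Type headers from the example request should produce the response shown above.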
Send a query
A simple query returning all nodes connected to a given node, together with the type of each relationship and the connected node's name and age properties; where a property does not exist, NULL is returned:
MATCH (x { name: 'I' })-[r]->(n)
RETURN type(r), n.name, n.age
Figure21.7.Final Graph
Node[86]
nam e = 'you'
Node[87]
nam e = 'him '
age = 25
Node[88]
nam e = 'I'
know know
Example request
POST http://localhost:7474/db/data/cypher
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"query" : "MATCH (x {name: 'I'})-[r]->(n) RETURN type(r), n.name, n.age",
"params" : { }
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"columns" : [ "type(r)", "n.name", "n.age" ],
"data" : [ [ "know", "him", 25 ], [ "know", "you", null ] ]
}
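The endpoint returns results as parallel columns and data lists, so a client often wants one mapping per row. A minimal helper for that (a hypothetical convenience, not part of any Neo4j driver):

```python
# Zip the "columns" list with each row of "data" to get dict-shaped rows.
def rows_as_dicts(result):
    return [dict(zip(result["columns"], row)) for row in result["data"]]

# The response body from the example above.
result = {
    "columns": ["type(r)", "n.name", "n.age"],
    "data": [["know", "him", 25], ["know", "you", None]],
}

rows = rows_as_dicts(result)
```

Note that JSON null arrives as Python None, so a missing property is distinguishable from an empty string.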
Return paths
Paths can be returned just like other return types.
MATCH path =(x { name: 'I' })--(friend)
RETURN path, friend.name
Figure21.8.Final Graph
Node[92]
nam e = 'you'
Node[93]
nam e = 'I'
know
Example request
POST http://localhost:7474/db/data/cypher
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"query" : "MATCH path = (x {name: 'I'})--(friend) RETURN path, friend.name",
"params" : { }
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"columns" : [ "path", "friend.name" ],
"data" : [ [ {
"directions" : [ "->" ],
"start" : "http://localhost:7474/db/data/node/93",
"nodes" : [ "http://localhost:7474/db/data/node/93", "http://localhost:7474/db/data/node/92" ],
"length" : 1,
"relationships" : [ "http://localhost:7474/db/data/relationship/21" ],
"end" : "http://localhost:7474/db/data/node/92"
}, "you" ] ]
}
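A returned path references its nodes and relationships by URL rather than inline. When only the numeric ids are needed, a client can take the last URL segment; that convention is visible in the example response above but is not a documented contract, so treat this helper as an assumption:

```python
# Extract numeric node ids from a path representation's node URLs.
def node_ids(path):
    return [int(url.rstrip("/").rsplit("/", 1)[-1]) for url in path["nodes"]]

# The path from the example response above.
path = {
    "directions": ["->"],
    "start": "http://localhost:7474/db/data/node/93",
    "nodes": ["http://localhost:7474/db/data/node/93",
              "http://localhost:7474/db/data/node/92"],
    "length": 1,
    "relationships": ["http://localhost:7474/db/data/relationship/21"],
    "end": "http://localhost:7474/db/data/node/92",
}
```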
Nested results
When sending queries that return nested results such as lists and maps, these are serialized into nested JSON representations according to their types.
MATCH (n)
WHERE n.name IN ['I', 'you']
RETURN collect(n.name)
Figure21.9.Final Graph
Node[89]
nam e = 'you'
Node[90]
nam e = 'I'
know
Example request
POST http://localhost:7474/db/data/cypher
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"query" : "MATCH (n) WHERE n.name in ['I', 'you'] RETURN collect(n.name)",
"params" : { }
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"columns" : [ "collect(n.name)" ],
"data" : [ [ [ "you", "I" ] ] ]
}
Retrieve query metadata
By passing in the additional GET parameter includeStats=true when you execute a Cypher query, metadata about the query is returned, such as how many labels were added or removed by the query.
MATCH (n { name: 'I' })
SET n:Actor
REMOVE n:Director
RETURN labels(n)
Figure21.10.Final Graph
Node[91]: Actor
nam e = 'I'
Example request
POST http://localhost:7474/db/data/cypher?includeStats=true
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"query" : "MATCH (n {name: 'I'}) SET n:Actor REMOVE n:Director RETURN labels(n)",
"params" : { }
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"columns" : [ "labels(n)" ],
"data" : [ [ [ "Actor" ] ] ],
"stats" : {
"relationships_created" : 0,
"nodes_deleted" : 0,
"relationship_deleted" : 0,
"indexes_added" : 0,
"properties_set" : 0,
"constraints_removed" : 0,
"indexes_removed" : 0,
"labels_removed" : 1,
"constraints_added" : 0,
"labels_added" : 1,
"nodes_created" : 0,
"contains_updates" : true
}
}
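A client can use the stats map to sanity-check that a write query changed exactly what it intended. The helper below is an illustration built on the response shape shown above, not a Neo4j API:

```python
# Verify that a query only swapped one label and made no other changes.
def is_pure_label_swap(stats):
    return (stats["contains_updates"]
            and stats["labels_added"] == 1
            and stats["labels_removed"] == 1
            and stats["nodes_created"] == 0
            and stats["relationships_created"] == 0
            and stats["properties_set"] == 0)

# The "stats" object from the example response above.
stats = {
    "relationships_created": 0, "nodes_deleted": 0, "relationship_deleted": 0,
    "indexes_added": 0, "properties_set": 0, "constraints_removed": 0,
    "indexes_removed": 0, "labels_removed": 1, "constraints_added": 0,
    "labels_added": 1, "nodes_created": 0, "contains_updates": True,
}
```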
Errors
Errors on the server are reported as a JSON-formatted response containing a message, the exception name, and a stacktrace.
MATCH (x { name: 'I' })
RETURN x.dummy/0
Figure21.11.Final Graph
Node[72]
nam e = 'you'
Node[73]
nam e = 'I'
know
Example request
POST http://localhost:7474/db/data/cypher
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"query" : "MATCH (x {name: 'I'}) RETURN x.dummy/0",
"params" : { }
}
Example response
400: Bad Request
Content-Type: application/json; charset=UTF-8
{
"message": "/ by zero",
"exception": "BadInputException",
"fullname": "org.neo4j.server.rest.repr.BadInputException",
"stackTrace": [
"org.neo4j.server.rest.repr.RepresentationExceptionHandlingIterable.exceptionOnNext(RepresentationExceptionHandlingIterable.java:39)",
"org.neo4j.helpers.collection.ExceptionHandlingIterable$1.next(ExceptionHandlingIterable.java:55)",
"org.neo4j.helpers.collection.IteratorWrapper.next(IteratorWrapper.java:47)",
"org.neo4j.server.rest.repr.ListRepresentation.serialize(ListRepresentation.java:64)",
"org.neo4j.server.rest.repr.Serializer.serialize(Serializer.java:75)",
"org.neo4j.server.rest.repr.MappingSerializer.putList(MappingSerializer.java:61)",
"org.neo4j.server.rest.repr.CypherResultRepresentation.serialize(CypherResultRepresentation.java:58)",
"org.neo4j.server.rest.repr.MappingRepresentation.serialize(MappingRepresentation.java:41)",
"org.neo4j.server.rest.repr.OutputFormat.assemble(OutputFormat.java:245)",
"org.neo4j.server.rest.repr.OutputFormat.formatRepresentation(OutputFormat.java:177)",
"org.neo4j.server.rest.repr.OutputFormat.response(OutputFormat.java:160)",
"org.neo4j.server.rest.repr.OutputFormat.ok(OutputFormat.java:73)",
"org.neo4j.server.rest.web.CypherService.cypher(CypherService.java:127)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
],
"cause": {
"message": "/ by zero",
"errors": [
{
"message": "/ by zero",
"code": "Neo.ClientError.Statement.ArithmeticError"
}
],
"cause": {
"message": "/ by zero",
"errors": [
{
"message": "/ by zero",
"code": "Neo.ClientError.Statement.ArithmeticError"
}
],
"cause": {
"message": "/ by zero",
"errors": [
{
"message": "/ by zero",
"code": "Neo.ClientError.Statement.ArithmeticError"
}
],
"cause": {
"errors": [
{
"code": "Neo.DatabaseError.General.UnknownFailure",
"stackTrace": "org.neo4j.cypher.internal.frontend.v2_3.ArithmeticException\n\tat
org.neo4j.cypher.internal.compiler.v2_3.commands.expressions.Divide.apply(Divide.scala:36)\n\tat
org.neo4j.cypher.internal.compiler.v2_3.pipes.ProjectionPipe$$anonfun$internalCreateResults$1$$anonfun$apply
$1.apply(ProjectionPipe.scala:48)\n\tat org.neo4j.cypher.internal.compiler.v2_3.pipes.ProjectionPipe$$anonfun
$internalCreateResults$1$$anonfun$apply$1.apply(ProjectionPipe.scala:46)\n\tat scala.collection.immutable.Map
$Map1.foreach(Map.scala:116)\n\tat org.neo4j.cypher.internal.compiler.v2_3.pipes.ProjectionPipe$$anonfun
$internalCreateResults$1.apply(ProjectionPipe.scala:46)\n\tat org.neo4j.cypher.internal.compiler.v2_3.pipes.ProjectionPipe
$$anonfun$internalCreateResults$1.apply(ProjectionPipe.scala:45)\n\tat scala.collection.Iterator$$anon
$11.next(Iterator.scala:370)\n\tat scala.collection.Iterator$$anon$11.next(Iterator.scala:370)\n\tat
org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator$$anonfun$next$1.apply(ResultIterator.scala:75)\n\tat
org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator$$anonfun$next$1.apply(ResultIterator.scala:72)\n\tat
org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator$$anonfun$failIfThrows$1.apply(ResultIterator.scala:121)\n
\tat org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator.decoratedCypherException(ResultIterator.scala:130)\n
\tat org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator.failIfThrows(ResultIterator.scala:119)\n
\tat org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator.next(ResultIterator.scala:72)\n\tat
org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator.next(ResultIterator.scala:50)\n\tat
org.neo4j.cypher.internal.compiler.v2_3.PipeExecutionResult.next(PipeExecutionResult.scala:77)\n\tat
org.neo4j.cypher.internal.compiler.v2_3.PipeExecutionResult$$anon$2.next(PipeExecutionResult.scala:70)\n\tat
org.neo4j.cypher.internal.compiler.v2_3.PipeExecutionResult$$anon$2.next(PipeExecutionResult.scala:68)\n
\tat org.neo4j.cypher.internal.compatibility.ExecutionResultWrapperFor2_3$$anon$1$$anonfun$next
$1.apply(CompatibilityFor2_3.scala:234)\n\tat org.neo4j.cypher.internal.compatibility.ExecutionResultWrapperFor2_3$$anon$1$
$anonfun$next$1.apply(CompatibilityFor2_3.scala:234)\n\tat
org.neo4j.cypher.internal.compatibility.exceptionHandlerFor2_3$.runSafely(CompatibilityFor2_3.scala:116)\n\tat
org.neo4j.cypher.internal.compatibility.ExecutionResultWrapperFor2_3$$anon$1.next(CompatibilityFor2_3.scala:234)\n\tat
org.neo4j.cypher.internal.compatibility.ExecutionResultWrapperFor2_3$$anon$1.next(CompatibilityFor2_3.scala:229)\n
\tat org.neo4j.cypher.javacompat.ExecutionResult.next(ExecutionResult.java:233)\n
\tat org.neo4j.cypher.javacompat.ExecutionResult.next(ExecutionResult.java:55)\n\tat
org.neo4j.helpers.collection.ExceptionHandlingIterable$1.next(ExceptionHandlingIterable.java:53)\n
\tat org.neo4j.helpers.collection.IteratorWrapper.next(IteratorWrapper.java:47)\n\tat
org.neo4j.server.rest.repr.ListRepresentation.serialize(ListRepresentation.java:64)\n
\tat org.neo4j.server.rest.repr.Serializer.serialize(Serializer.java:75)\n\tat
org.neo4j.server.rest.repr.MappingSerializer.putList(MappingSerializer.java:61)\n\tat
org.neo4j.server.rest.repr.CypherResultRepresentation.serialize(CypherResultRepresentation.java:58)\n
\tat org.neo4j.server.rest.repr.MappingRepresentation.serialize(MappingRepresentation.java:41)\n
\tat org.neo4j.server.rest.repr.OutputFormat.assemble(OutputFormat.java:245)\n\tat
org.neo4j.server.rest.repr.OutputFormat.formatRepresentation(OutputFormat.java:177)\n
\tat org.neo4j.server.rest.repr.OutputFormat.response(OutputFormat.java:160)\n\tat
org.neo4j.server.rest.repr.OutputFormat.ok(OutputFormat.java:73)\n\tat
org.neo4j.server.rest.web.CypherService.cypher(CypherService.java:127)\n\tat
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)\n\tat
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat
java.lang.reflect.Method.invoke(Method.java:606)\n\tat com.sun.jersey.spi.container.JavaMethodInvokerFactory
$1.invoke(JavaMethodInvokerFactory.java:60)\n\tat
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider
$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)\n\tat
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)\n
\tat org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)\n
\tat com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)\n\tat
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)\n\tat
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)\n\tat
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)\n\tat
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)\n\tat
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)\n
\tat com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)\n
\tat com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)\n
\tat com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)\n\tat
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)\n\tat
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)\n\tat
javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:808)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)\n\tat
org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)\n
\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n
\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n
\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n
\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n
\tat org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:497)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\n\tat org.eclipse.jetty.io.AbstractConnection
$2.run(AbstractConnection.java:540)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\n
\tat org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\n\tat
java.lang.Thread.run(Thread.java:745)\n"
}
],
"exception": "ArithmeticException",
"fullname": "org.neo4j.cypher.internal.frontend.v2_3.ArithmeticException",
"stackTrace": [
"org.neo4j.cypher.internal.compiler.v2_3.commands.expressions.Divide.apply(Divide.scala:36)",
"org.neo4j.cypher.internal.compiler.v2_3.pipes.ProjectionPipe$$anonfun$internalCreateResults$1$$anonfun$apply
$1.apply(ProjectionPipe.scala:48)",
"org.neo4j.cypher.internal.compiler.v2_3.pipes.ProjectionPipe$$anonfun$internalCreateResults$1$$anonfun$apply
$1.apply(ProjectionPipe.scala:46)",
"scala.collection.immutable.Map$Map1.foreach(Map.scala:116)",
"org.neo4j.cypher.internal.compiler.v2_3.pipes.ProjectionPipe$$anonfun$internalCreateResults
$1.apply(ProjectionPipe.scala:46)",
"org.neo4j.cypher.internal.compiler.v2_3.pipes.ProjectionPipe$$anonfun$internalCreateResults
$1.apply(ProjectionPipe.scala:45)",
"scala.collection.Iterator$$anon$11.next(Iterator.scala:370)",
"scala.collection.Iterator$$anon$11.next(Iterator.scala:370)",
"org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator$$anonfun$next$1.apply(ResultIterator.scala:75)",
"org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator$$anonfun$next$1.apply(ResultIterator.scala:72)",
"org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator$$anonfun$failIfThrows$1.apply(ResultIterator.scala:121)",
"org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator.decoratedCypherException(ResultIterator.scala:130)",
"org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator.failIfThrows(ResultIterator.scala:119)",
"org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator.next(ResultIterator.scala:72)",
"org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator.next(ResultIterator.scala:50)",
"org.neo4j.cypher.internal.compiler.v2_3.PipeExecutionResult.next(PipeExecutionResult.scala:77)",
"org.neo4j.cypher.internal.compiler.v2_3.PipeExecutionResult$$anon$2.next(PipeExecutionResult.scala:70)",
"org.neo4j.cypher.internal.compiler.v2_3.PipeExecutionResult$$anon$2.next(PipeExecutionResult.scala:68)",
"org.neo4j.cypher.internal.compatibility.ExecutionResultWrapperFor2_3$$anon$1$$anonfun$next
$1.apply(CompatibilityFor2_3.scala:234)",
"org.neo4j.cypher.internal.compatibility.ExecutionResultWrapperFor2_3$$anon$1$$anonfun$next
$1.apply(CompatibilityFor2_3.scala:234)",
"org.neo4j.cypher.internal.compatibility.exceptionHandlerFor2_3$.runSafely(CompatibilityFor2_3.scala:116)",
"org.neo4j.cypher.internal.compatibility.ExecutionResultWrapperFor2_3$$anon
$1.next(CompatibilityFor2_3.scala:234)",
"org.neo4j.cypher.internal.compatibility.ExecutionResultWrapperFor2_3$$anon
$1.next(CompatibilityFor2_3.scala:229)",
"org.neo4j.cypher.javacompat.ExecutionResult.next(ExecutionResult.java:233)",
"org.neo4j.cypher.javacompat.ExecutionResult.next(ExecutionResult.java:55)",
"org.neo4j.helpers.collection.ExceptionHandlingIterable$1.next(ExceptionHandlingIterable.java:53)",
"org.neo4j.helpers.collection.IteratorWrapper.next(IteratorWrapper.java:47)",
"org.neo4j.server.rest.repr.ListRepresentation.serialize(ListRepresentation.java:64)",
"org.neo4j.server.rest.repr.Serializer.serialize(Serializer.java:75)",
"org.neo4j.server.rest.repr.MappingSerializer.putList(MappingSerializer.java:61)",
"org.neo4j.server.rest.repr.CypherResultRepresentation.serialize(CypherResultRepresentation.java:58)",
"org.neo4j.server.rest.repr.MappingRepresentation.serialize(MappingRepresentation.java:41)",
"org.neo4j.server.rest.repr.OutputFormat.assemble(OutputFormat.java:245)",
"org.neo4j.server.rest.repr.OutputFormat.formatRepresentation(OutputFormat.java:177)",
"org.neo4j.server.rest.repr.OutputFormat.response(OutputFormat.java:160)",
"org.neo4j.server.rest.repr.OutputFormat.ok(OutputFormat.java:73)",
"org.neo4j.server.rest.web.CypherService.cypher(CypherService.java:127)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
]
},
"exception": "ArithmeticException",
"fullname": "org.neo4j.cypher.ArithmeticException",
"stackTrace": [
"org.neo4j.cypher.internal.compatibility.exceptionHandlerFor2_3$.arithmeticException(CompatibilityFor2_3.scala:63)",
"org.neo4j.cypher.internal.compatibility.exceptionHandlerFor2_3$.arithmeticException(CompatibilityFor2_3.scala:60)",
"org.neo4j.cypher.internal.frontend.v2_3.ArithmeticException.mapToPublic(CypherException.scala:111)",
"org.neo4j.cypher.internal.compatibility.exceptionHandlerFor2_3$.runSafely(CompatibilityFor2_3.scala:121)",
"org.neo4j.cypher.internal.compatibility.ExecutionResultWrapperFor2_3$$anon$1.next(CompatibilityFor2_3.scala:234)",
"org.neo4j.cypher.internal.compatibility.ExecutionResultWrapperFor2_3$$anon$1.next(CompatibilityFor2_3.scala:229)",
"org.neo4j.cypher.javacompat.ExecutionResult.next(ExecutionResult.java:233)",
"org.neo4j.cypher.javacompat.ExecutionResult.next(ExecutionResult.java:55)",
"org.neo4j.helpers.collection.ExceptionHandlingIterable$1.next(ExceptionHandlingIterable.java:53)",
"org.neo4j.helpers.collection.IteratorWrapper.next(IteratorWrapper.java:47)",
"org.neo4j.server.rest.repr.ListRepresentation.serialize(ListRepresentation.java:64)",
"org.neo4j.server.rest.repr.Serializer.serialize(Serializer.java:75)",
"org.neo4j.server.rest.repr.MappingSerializer.putList(MappingSerializer.java:61)",
"org.neo4j.server.rest.repr.CypherResultRepresentation.serialize(CypherResultRepresentation.java:58)",
"org.neo4j.server.rest.repr.MappingRepresentation.serialize(MappingRepresentation.java:41)",
"org.neo4j.server.rest.repr.OutputFormat.assemble(OutputFormat.java:245)",
"org.neo4j.server.rest.repr.OutputFormat.formatRepresentation(OutputFormat.java:177)",
"org.neo4j.server.rest.repr.OutputFormat.response(OutputFormat.java:160)",
"org.neo4j.server.rest.repr.OutputFormat.ok(OutputFormat.java:73)",
"org.neo4j.server.rest.web.CypherService.cypher(CypherService.java:127)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
]
},
"exception": "QueryExecutionKernelException",
"fullname": "org.neo4j.kernel.impl.query.QueryExecutionKernelException",
"stackTrace": [
"org.neo4j.cypher.javacompat.ExecutionResult.converted(ExecutionResult.java:391)",
"org.neo4j.cypher.javacompat.ExecutionResult.next(ExecutionResult.java:237)",
"org.neo4j.cypher.javacompat.ExecutionResult.next(ExecutionResult.java:55)",
"org.neo4j.helpers.collection.ExceptionHandlingIterable$1.next(ExceptionHandlingIterable.java:53)",
"org.neo4j.helpers.collection.IteratorWrapper.next(IteratorWrapper.java:47)",
"org.neo4j.server.rest.repr.ListRepresentation.serialize(ListRepresentation.java:64)",
"org.neo4j.server.rest.repr.Serializer.serialize(Serializer.java:75)",
"org.neo4j.server.rest.repr.MappingSerializer.putList(MappingSerializer.java:61)",
"org.neo4j.server.rest.repr.CypherResultRepresentation.serialize(CypherResultRepresentation.java:58)",
"org.neo4j.server.rest.repr.MappingRepresentation.serialize(MappingRepresentation.java:41)",
"org.neo4j.server.rest.repr.OutputFormat.assemble(OutputFormat.java:245)",
"org.neo4j.server.rest.repr.OutputFormat.formatRepresentation(OutputFormat.java:177)",
"org.neo4j.server.rest.repr.OutputFormat.response(OutputFormat.java:160)",
"org.neo4j.server.rest.repr.OutputFormat.ok(OutputFormat.java:73)",
"org.neo4j.server.rest.web.CypherService.cypher(CypherService.java:127)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
]
},
"exception": "QueryExecutionException",
"fullname": "org.neo4j.graphdb.QueryExecutionException",
"stackTrace": [
"org.neo4j.kernel.impl.query.QueryExecutionKernelException.asUserException(QueryExecutionKernelException.java:35)",
"org.neo4j.cypher.javacompat.ExecutionResult.converted(ExecutionResult.java:391)",
"org.neo4j.cypher.javacompat.ExecutionResult.next(ExecutionResult.java:237)",
"org.neo4j.cypher.javacompat.ExecutionResult.next(ExecutionResult.java:55)",
"org.neo4j.helpers.collection.ExceptionHandlingIterable$1.next(ExceptionHandlingIterable.java:53)",
"org.neo4j.helpers.collection.IteratorWrapper.next(IteratorWrapper.java:47)",
"org.neo4j.server.rest.repr.ListRepresentation.serialize(ListRepresentation.java:64)",
"org.neo4j.server.rest.repr.Serializer.serialize(Serializer.java:75)",
"org.neo4j.server.rest.repr.MappingSerializer.putList(MappingSerializer.java:61)",
"org.neo4j.server.rest.repr.CypherResultRepresentation.serialize(CypherResultRepresentation.java:58)",
"org.neo4j.server.rest.repr.MappingRepresentation.serialize(MappingRepresentation.java:41)",
"org.neo4j.server.rest.repr.OutputFormat.assemble(OutputFormat.java:245)",
"org.neo4j.server.rest.repr.OutputFormat.formatRepresentation(OutputFormat.java:177)",
"org.neo4j.server.rest.repr.OutputFormat.response(OutputFormat.java:160)",
"org.neo4j.server.rest.repr.OutputFormat.ok(OutputFormat.java:73)",
"org.neo4j.server.rest.web.CypherService.cypher(CypherService.java:127)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
]
},
"errors": [
{
"message": "/ by zero",
"code": "Neo.ClientError.Request.InvalidFormat"
}
]
}
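Because the underlying failure is nested in successive cause objects, a client that wants the most specific error code has to walk the chain. A sketch of that walk, built on the response shape above (not a Neo4j API):

```python
# Follow "cause" links to the innermost error and return its first code.
def innermost_error_code(error):
    while isinstance(error.get("cause"), dict):
        error = error["cause"]
    for e in error.get("errors", []):
        if "code" in e:
            return e["code"]
    return None

# A trimmed version of the nesting shown in the example response above.
error = {
    "message": "/ by zero",
    "exception": "BadInputException",
    "cause": {
        "errors": [{"message": "/ by zero",
                    "code": "Neo.ClientError.Statement.ArithmeticError"}],
        "cause": {
            "errors": [{"code": "Neo.DatabaseError.General.UnknownFailure"}],
        },
    },
}
```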
21.7. Property values
The REST API allows setting properties on nodes and relationships through direct RESTful operations.
However, there are restrictions as to what types of values can be used as property values. The allowed
value types are as follows:
• Numbers: both integer values, with the capacity of Java's Long type, and floating point values, with
the capacity of Java's Double.
• Booleans.
• Strings.
• Arrays of the basic types above.
Arrays
There are two important points to be made about array values. First, all values in the array must be of
the same type. That means either all integers, all floats, all booleans or all strings. Mixing types is not
currently supported.
Second, storing empty arrays is only possible given certain preconditions. Because the JSON transfer
format does not contain type information for arrays, the type is inferred from the values in the array. If the
array is empty, the Neo4j server cannot determine the type. In these cases, it will check whether an array is
already stored for the given property, and will use the stored array's type when storing the empty array.
If no array exists already, the server will reject the request.
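A client can enforce the homogeneity rule before sending a request. The check below is a sketch based on the value-type rules stated above; bool is handled separately because Python treats it as a subtype of int:

```python
# True if the list is a non-empty, single-typed array of an allowed type.
def is_valid_property_array(values):
    if not values:
        # Empty arrays are only accepted when a typed array is already
        # stored for the property, which a client cannot know locally.
        return False
    types = {bool if isinstance(v, bool) else type(v) for v in values}
    return len(types) == 1 and types.pop() in (int, float, bool, str)
```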
Property keys
You can list all property keys ever used in the database. This includes property keys you have used
and since deleted.
There is currently no way to tell which ones are in use and which are not, short of walking the
entire set of properties in the database.
List all property keys
Example request
GET http://localhost:7474/db/data/propertykeys
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ "key", "happy", "since", "name", "öäüÖÄÜß", "value", "age", "non-existent", "ID", "cost" ]
21.8. Nodes
Create node
Figure21.12.Final Graph
Node[8]
Example request
POST http://localhost:7474/db/data/node
Accept: application/json; charset=UTF-8
Example response
201: Created
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/node/8
{
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/8/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/8/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/8/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/8/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/8",
"property" : "http://localhost:7474/db/data/node/8/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/8/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/8/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/8/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/8/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/8/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/8/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/8/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 8,
"labels" : [ ]
},
"data" : { }
}
Create node with properties
Figure21.13.Final Graph
Node[4]
foo = 'bar'
Example request
POST http://localhost:7474/db/data/node
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"foo" : "bar"
}
Example response
201: Created
Content-Length: 1209
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/node/4
{
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/4/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/4/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/4/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/4/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/4",
"property" : "http://localhost:7474/db/data/node/4/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/4/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/4/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/4/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/4/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/4/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/4/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/4/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 4,
"labels" : [ ]
},
"data" : {
"foo" : "bar"
}
}
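The new node's id appears both in the Location header and as the last segment of the self URL in the body. Extracting it can look like this; the URL-segment convention is visible in the example but is not a documented contract, so this helper is an assumption:

```python
# Read the numeric id out of a node representation's "self" URL.
def node_id(node_repr):
    return int(node_repr["self"].rstrip("/").rsplit("/", 1)[-1])

# Minimal slice of the example response above.
created = {"self": "http://localhost:7474/db/data/node/4",
           "data": {"foo": "bar"}}
```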
Get node
Note that the response contains URI templates for the operations available for getting the node's properties
and relationships.
Figure21.14.Final Graph
Node[54]
Example request
GET http://localhost:7474/db/data/node/54
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/54/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/54/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/54/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/54/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/54",
"property" : "http://localhost:7474/db/data/node/54/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/54/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/54/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/54/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/54/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/54/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/54/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/54/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 54,
"labels" : [ ]
},
"data" : { }
}
Get non-existent node
Figure21.15.Final Graph
Node[58]
Example request
GET http://localhost:7474/db/data/node/5800000
Accept: application/json; charset=UTF-8
Example response
404: Not Found
Content-Type: application/json; charset=UTF-8
{
"message": "Cannot find node with id [5800000] in database.",
"exception": "NodeNotFoundException",
"fullname": "org.neo4j.server.rest.web.NodeNotFoundException",
"stackTrace": [
"org.neo4j.server.rest.web.DatabaseActions.node(DatabaseActions.java:174)",
"org.neo4j.server.rest.web.DatabaseActions.getNode(DatabaseActions.java:219)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.getNode(RestfulGraphDatabase.java:279)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
],
"cause": {
"message": "Node 5800000 not found",
"errors": [
{
"message": "Node 5800000 not found",
"code": "Neo.ClientError.Statement.EntityNotFound"
}
],
"cause": {
"message": "Unable to load NODE with id 5800000.",
"errors": [
{
"message": "Unable to load NODE with id 5800000.",
"code": "Neo.ClientError.Statement.EntityNotFound"
}
],
"exception": "EntityNotFoundException",
"fullname": "org.neo4j.kernel.api.exceptions.EntityNotFoundException",
"stackTrace": [
"org.neo4j.kernel.impl.factory.GraphDatabaseFacade.getNodeById(GraphDatabaseFacade.java:228)",
"org.neo4j.server.rest.web.DatabaseActions.node(DatabaseActions.java:170)",
"org.neo4j.server.rest.web.DatabaseActions.getNode(DatabaseActions.java:219)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.getNode(RestfulGraphDatabase.java:279)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
]
},
"exception": "NotFoundException",
"fullname": "org.neo4j.graphdb.NotFoundException",
"stackTrace": [
"org.neo4j.kernel.impl.factory.GraphDatabaseFacade.getNodeById(GraphDatabaseFacade.java:228)",
"org.neo4j.server.rest.web.DatabaseActions.node(DatabaseActions.java:170)",
"org.neo4j.server.rest.web.DatabaseActions.getNode(DatabaseActions.java:219)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.getNode(RestfulGraphDatabase.java:279)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
]
},
"errors": [
{
"message": "Cannot find node with id [5800000] in database.",
"code": "Neo.ClientError.Statement.EntityNotFound"
}
]
}
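Error payloads like the one above carry a machine-readable code in each entry of the "errors" array, and the nested "cause" objects repeat the pattern. A minimal sketch of a client-side helper (not part of the API itself) that walks such a payload and collects every code:

```python
# Illustrative helper: collect the machine-readable error codes from a
# REST API error payload. Nested "cause" objects repeat the structure,
# so we recurse into them.
def collect_error_codes(body):
    codes = []
    for err in body.get("errors", []):
        codes.append(err["code"])
    if "cause" in body:
        codes.extend(collect_error_codes(body["cause"]))
    return codes

payload = {
    "message": "Cannot find node with id [5800000] in database.",
    "errors": [{"message": "Cannot find node with id [5800000] in database.",
                "code": "Neo.ClientError.Statement.EntityNotFound"}],
}
print(collect_error_codes(payload))  # ['Neo.ClientError.Statement.EntityNotFound']
```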
Delete node
Figure21.16.Starting Graph
Node[5]
Figure21.17.Final Graph
Example request
DELETE http://localhost:7474/db/data/node/5
Accept: application/json; charset=UTF-8
Example response
204: No Content
Nodes with relationships cannot be deleted
The relationships on a node have to be deleted before the node itself can be deleted.
Tip
You can use DETACH DELETE in Cypher to delete nodes and their relationships in one go.
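The tip above can be sketched as a request body for the transactional Cypher endpoint (/db/data/transaction/commit); the node id 12 here is just an example value:

```python
import json

# Sketch: a transactional-endpoint request body that deletes a node
# together with its relationships via DETACH DELETE. The node id is a
# hypothetical example value.
payload = {
    "statements": [{
        "statement": "MATCH (n) WHERE id(n) = {id} DETACH DELETE n",
        "parameters": {"id": 12},
    }]
}
body = json.dumps(payload)  # POST this to /db/data/transaction/commit
```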
Figure21.18.Starting Graph
Node[12]
Node[13]
LOVES
Example request
DELETE http://localhost:7474/db/data/node/12
Accept: application/json; charset=UTF-8
Example response
409: Conflict
Content-Type: application/json; charset=UTF-8
{
"message": "The node with id 12 cannot be deleted. Check that the node is orphaned before deletion.",
"exception": "ConstraintViolationException",
"fullname": "org.neo4j.graphdb.ConstraintViolationException",
"stackTrace": [
"org.neo4j.server.rest.web.DatabaseActions.deleteNode(DatabaseActions.java:228)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.deleteNode(RestfulGraphDatabase.java:293)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
],
"errors": [
{
"message": "The node with id 12 cannot be deleted. Check that the node is orphaned before deletion.",
"code": "Neo.ClientError.Schema.ConstraintViolation"
}
]
}
21.9. Relationships
Relationships are first-class citizens in the Neo4j REST API. They can be accessed either stand-alone or
through the nodes they are attached to.
The general pattern to get relationships from a node is:
GET http://localhost:7474/db/data/node/123/relationships/{dir}/{-list|&|types}
Where dir is one of all, in or out, and types is an ampersand-separated list of relationship types. See the
examples below for more information.
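The URL pattern above can be sketched as a small helper. Note that the "&" joining multiple types must be percent-encoded as "%26" inside the path (the function name and base-URL argument are illustrative, not part of the API):

```python
from urllib.parse import quote

# Sketch: build the relationship-listing URL for a node. `direction` is
# one of "all", "in" or "out"; multiple types are joined with "&", which
# has to be percent-encoded as "%26" when it appears in the path.
def relationships_url(base, node_id, direction, types=()):
    url = "%s/node/%d/relationships/%s" % (base, node_id, direction)
    if types:
        url += "/" + quote("&".join(types), safe="")
    return url

print(relationships_url("http://localhost:7474/db/data", 72, "all", ["LIKES", "HATES"]))
# http://localhost:7474/db/data/node/72/relationships/all/LIKES%26HATES
```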
Get Relationship by ID
Figure21.19.Final Graph
Node[18]
nam e = 'you'
Node[19]
nam e = 'I'
know
Example request
GET http://localhost:7474/db/data/relationship/9
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"extensions" : { },
"start" : "http://localhost:7474/db/data/node/19",
"property" : "http://localhost:7474/db/data/relationship/9/properties/{key}",
"self" : "http://localhost:7474/db/data/relationship/9",
"properties" : "http://localhost:7474/db/data/relationship/9/properties",
"type" : "know",
"end" : "http://localhost:7474/db/data/node/18",
"metadata" : {
"id" : 9,
"type" : "know"
},
"data" : { }
}
Create relationship
Upon successful creation of a relationship, the new relationship is returned.
Figure21.20.Starting Graph
Node[0]
nam e = 'Sara'
Node[1]
nam e = 'Joe'
knows
Figure21.21.Final Graph
Node[0]
nam e = 'Sara'
Node[1]
nam e = 'Joe'
LOVES knows
Example request
POST http://localhost:7474/db/data/node/1/relationships
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"to" : "http://localhost:7474/db/data/node/0",
"type" : "LOVES"
}
Example response
201: Created
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/relationship/1
{
"extensions" : { },
"start" : "http://localhost:7474/db/data/node/1",
"property" : "http://localhost:7474/db/data/relationship/1/properties/{key}",
"self" : "http://localhost:7474/db/data/relationship/1",
"properties" : "http://localhost:7474/db/data/relationship/1/properties",
"type" : "LOVES",
"end" : "http://localhost:7474/db/data/node/0",
"metadata" : {
"id" : 1,
"type" : "LOVES"
},
"data" : { }
}
Create a relationship with properties
Upon successful creation of a relationship, the new relationship is returned.
Figure21.22.Starting Graph
Node[10]
nam e = 'Sara'
Node[11]
nam e = 'Joe'
knows
Figure21.23.Final Graph
Node[10]
nam e = 'Sara'
Node[11]
nam e = 'Joe'
LOVES
foo = 'bar' knows
Example request
POST http://localhost:7474/db/data/node/11/relationships
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"to" : "http://localhost:7474/db/data/node/10",
"type" : "LOVES",
"data" : {
"foo" : "bar"
}
}
Example response
201: Created
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/relationship/8
{
"extensions" : { },
"start" : "http://localhost:7474/db/data/node/11",
"property" : "http://localhost:7474/db/data/relationship/8/properties/{key}",
"self" : "http://localhost:7474/db/data/relationship/8",
"properties" : "http://localhost:7474/db/data/relationship/8/properties",
"type" : "LOVES",
"end" : "http://localhost:7474/db/data/node/10",
"metadata" : {
"id" : 8,
"type" : "LOVES"
},
"data" : {
"foo" : "bar"
}
}
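The request bodies in the two examples above share one shape; a minimal sketch of assembling them (the helper name is illustrative, not part of the API):

```python
import json

# Sketch: assemble the request body for creating a relationship, as in
# the examples above. The "data" map of properties is optional.
def create_relationship_body(to_url, rel_type, data=None):
    body = {"to": to_url, "type": rel_type}
    if data:
        body["data"] = data
    return json.dumps(body)

print(create_relationship_body(
    "http://localhost:7474/db/data/node/10", "LOVES", {"foo": "bar"}))
```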
Delete relationship
Figure21.24.Starting Graph
Node[4]
nam e = 'Juliet'
Node[5]
nam e = 'Romeo'
LOVES
cost = 'high'
Figure21.25.Final Graph
Node[4]
nam e = 'Juliet'
Node[5]
nam e = 'Romeo'
Example request
DELETE http://localhost:7474/db/data/relationship/2
Accept: application/json; charset=UTF-8
Example response
204: No Content
Get all properties on a relationship
Figure21.26.Final Graph
Node[10]
nam e = 'Juliet'
Node[11]
nam e = 'Romeo'
LOVES
cost = 'high'
since = '1day'
Example request
GET http://localhost:7474/db/data/relationship/5/properties
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"since" : "1day",
"cost" : "high"
}
Set all properties on a relationship
Figure21.27.Starting Graph
Node[8]
nam e = 'Juliet'
Node[9]
nam e = 'Romeo'
LOVES
cost = 'high'
Figure21.28.Final Graph
Example request
PUT http://localhost:7474/db/data/relationship/10/properties
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"happy" : false
}
Example response
204: No Content
Get single property on a relationship
Figure21.29.Final Graph
Node[12]
nam e = 'Juliet'
Node[13]
nam e = 'Romeo'
LOVES
cost = 'high'
Example request
GET http://localhost:7474/db/data/relationship/6/properties/cost
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
"high"
Set single property on a relationship
Figure21.30.Starting Graph
Node[8]
nam e = 'Juliet'
Node[9]
nam e = 'Romeo'
LOVES
cost = 'high'
Figure21.31.Final Graph
Node[8]
nam e = 'Juliet'
Node[9]
nam e = 'Romeo'
LOVES
cost = 'deadly'
Example request
PUT http://localhost:7474/db/data/relationship/4/properties/cost
Accept: application/json; charset=UTF-8
Content-Type: application/json
"deadly"
Example response
204: No Content
Get all relationships
Figure21.32.Final Graph
Node[97]
Node[100]
HATES
Node[98]
LIKES
Node[99]
LIKES
Node[101]
Example request
GET http://localhost:7474/db/data/node/97/relationships/all
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"start" : "http://localhost:7474/db/data/node/97",
"data" : { },
"self" : "http://localhost:7474/db/data/relationship/84",
"property" : "http://localhost:7474/db/data/relationship/84/properties/{key}",
"properties" : "http://localhost:7474/db/data/relationship/84/properties",
"type" : "HATES",
"extensions" : { },
"end" : "http://localhost:7474/db/data/node/100",
"metadata" : {
"id" : 84,
"type" : "HATES"
}
}, {
"start" : "http://localhost:7474/db/data/node/99",
"data" : { },
"self" : "http://localhost:7474/db/data/relationship/83",
"property" : "http://localhost:7474/db/data/relationship/83/properties/{key}",
"properties" : "http://localhost:7474/db/data/relationship/83/properties",
"type" : "LIKES",
"extensions" : { },
"end" : "http://localhost:7474/db/data/node/97",
"metadata" : {
"id" : 83,
"type" : "LIKES"
}
}, {
"start" : "http://localhost:7474/db/data/node/97",
"data" : { },
"self" : "http://localhost:7474/db/data/relationship/82",
"property" : "http://localhost:7474/db/data/relationship/82/properties/{key}",
"properties" : "http://localhost:7474/db/data/relationship/82/properties",
"type" : "LIKES",
"extensions" : { },
"end" : "http://localhost:7474/db/data/node/98",
"metadata" : {
"id" : 82,
"type" : "LIKES"
}
} ]
Get incoming relationships
Figure21.33.Final Graph
Node[116]
Node[119]
HATES
Node[117]
LIKES
Node[118]
LIKES
Node[120]
Example request
GET http://localhost:7474/db/data/node/116/relationships/in
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"start" : "http://localhost:7474/db/data/node/118",
"data" : { },
"self" : "http://localhost:7474/db/data/relationship/94",
"property" : "http://localhost:7474/db/data/relationship/94/properties/{key}",
"properties" : "http://localhost:7474/db/data/relationship/94/properties",
"type" : "LIKES",
"extensions" : { },
"end" : "http://localhost:7474/db/data/node/116",
"metadata" : {
"id" : 94,
"type" : "LIKES"
}
} ]
Get outgoing relationships
Figure21.34.Final Graph
Node[141]
Node[144]
HATES
Node[142]
LIKES
Node[143]
LIKES
Node[145]
Example request
GET http://localhost:7474/db/data/node/141/relationships/out
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"start" : "http://localhost:7474/db/data/node/141",
"data" : { },
"self" : "http://localhost:7474/db/data/relationship/110",
"property" : "http://localhost:7474/db/data/relationship/110/properties/{key}",
"properties" : "http://localhost:7474/db/data/relationship/110/properties",
"type" : "HATES",
"extensions" : { },
"end" : "http://localhost:7474/db/data/node/144",
"metadata" : {
"id" : 110,
"type" : "HATES"
}
}, {
"start" : "http://localhost:7474/db/data/node/141",
"data" : { },
"self" : "http://localhost:7474/db/data/relationship/108",
"property" : "http://localhost:7474/db/data/relationship/108/properties/{key}",
"properties" : "http://localhost:7474/db/data/relationship/108/properties",
"type" : "LIKES",
"extensions" : { },
"end" : "http://localhost:7474/db/data/node/142",
"metadata" : {
"id" : 108,
"type" : "LIKES"
}
} ]
Get typed relationships
Note that the "&" needs to be encoded as "%26", for example when using cURL (http://curl.haxx.se/) from the terminal.
Figure21.35.Final Graph
Node[72]
Node[75]
HATES
Node[73]
LIKES
Node[74]
LIKES
Node[76]
Example request
GET http://localhost:7474/db/data/node/72/relationships/all/LIKES&HATES
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"start" : "http://localhost:7474/db/data/node/72",
"data" : { },
"self" : "http://localhost:7474/db/data/relationship/69",
"property" : "http://localhost:7474/db/data/relationship/69/properties/{key}",
"properties" : "http://localhost:7474/db/data/relationship/69/properties",
"type" : "HATES",
"extensions" : { },
"end" : "http://localhost:7474/db/data/node/75",
"metadata" : {
"id" : 69,
"type" : "HATES"
}
}, {
"start" : "http://localhost:7474/db/data/node/74",
"data" : { },
"self" : "http://localhost:7474/db/data/relationship/68",
"property" : "http://localhost:7474/db/data/relationship/68/properties/{key}",
"properties" : "http://localhost:7474/db/data/relationship/68/properties",
"type" : "LIKES",
"extensions" : { },
"end" : "http://localhost:7474/db/data/node/72",
"metadata" : {
"id" : 68,
"type" : "LIKES"
}
}, {
"start" : "http://localhost:7474/db/data/node/72",
"data" : { },
"self" : "http://localhost:7474/db/data/relationship/67",
"property" : "http://localhost:7474/db/data/relationship/67/properties/{key}",
"properties" : "http://localhost:7474/db/data/relationship/67/properties",
"type" : "LIKES",
"extensions" : { },
"end" : "http://localhost:7474/db/data/node/73",
"metadata" : {
"id" : 67,
"type" : "LIKES"
}
} ]
Get relationships on a node without relationships
Figure21.36.Final Graph
Node[126]
Node[129]
HATES
Node[127]
LIKES
Node[128]
LIKES
Node[130]
Example request
GET http://localhost:7474/db/data/node/130/relationships/all
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ ]
21.10. Relationship types
Get relationship types
Example request
GET http://localhost:7474/db/data/relationship/types
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json
[ "LOVES", "KNOWS" ]
21.11. Node properties
Set property on node
Setting different properties will retain the existing ones for this node. Note that a single value is
submitted not as a map but just as a raw value (which is valid JSON), as in the example below.
Figure21.37.Final Graph
Node[182]
foo2 = 'bar2'
foo = 'bar'
Example request
PUT http://localhost:7474/db/data/node/182/properties/foo
Accept: application/json; charset=UTF-8
Content-Type: application/json
"bar"
Example response
204: No Content
Update node properties
This will replace all existing properties on the node with the new set of attributes.
Figure21.38.Final Graph
Node[180]
age = '18'
Node[181]
nam e = 'joe'
knows
Example request
PUT http://localhost:7474/db/data/node/180/properties
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"age" : "18"
}
Example response
204: No Content
Get properties for node
Figure21.39.Final Graph
Node[55]
foo = 'bar'
Example request
GET http://localhost:7474/db/data/node/55/properties
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"foo" : "bar"
}
Get property for node
Get a single node property from a node.
Figure21.40.Final Graph
Node[54]
foo = 'bar'
Example request
GET http://localhost:7474/db/data/node/54/properties/foo
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
"bar"
Property values cannot be null
This example shows the response you get when trying to set a property to null.
Example request
POST http://localhost:7474/db/data/node
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"foo" : null
}
Example response
400: Bad Request
Content-Type: application/json; charset=UTF-8
{
"message": "Could not set property \"foo\", unsupported type: null",
"exception": "PropertyValueException",
"fullname": "org.neo4j.server.rest.web.PropertyValueException",
"stackTrace": [
"org.neo4j.server.rest.domain.PropertySettingStrategy.setProperty(PropertySettingStrategy.java:141)",
"org.neo4j.server.rest.domain.PropertySettingStrategy.setProperties(PropertySettingStrategy.java:88)",
"org.neo4j.server.rest.web.DatabaseActions.createNode(DatabaseActions.java:205)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.createNode(RestfulGraphDatabase.java:252)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
],
"errors": [
{
"message": "Could not set property \"foo\", unsupported type: null",
"code": "Neo.ClientError.Statement.InvalidArguments"
}
]
}
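Since null values are rejected as shown above, one client-side workaround is simply to drop null-valued keys before sending the map; a minimal sketch (the helper name is illustrative):

```python
# Sketch: drop null-valued keys from a property map before POSTing it,
# since the server rejects null property values.
def strip_nulls(props):
    return {k: v for k, v in props.items() if v is not None}

print(strip_nulls({"foo": None, "bar": "baz"}))  # {'bar': 'baz'}
```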
Property values cannot be nested
Nesting properties is not supported. You could, for example, store the nested JSON as a string instead.
Example request
POST http://localhost:7474/db/data/node/
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"foo" : {
"bar" : "baz"
}
}
Example response
400: Bad Request
Content-Type: application/json; charset=UTF-8
{
"message": "Could not set property \"foo\", unsupported type: {bar\u003dbaz}",
"exception": "PropertyValueException",
"fullname": "org.neo4j.server.rest.web.PropertyValueException",
"stackTrace": [
"org.neo4j.server.rest.domain.PropertySettingStrategy.setProperty(PropertySettingStrategy.java:141)",
"org.neo4j.server.rest.domain.PropertySettingStrategy.setProperties(PropertySettingStrategy.java:88)",
"org.neo4j.server.rest.web.DatabaseActions.createNode(DatabaseActions.java:205)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.createNode(RestfulGraphDatabase.java:252)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
],
"errors": [
{
"message": "Could not set property \"foo\", unsupported type: {bar\u003dbaz}",
"code": "Neo.ClientError.Statement.InvalidArguments"
}
]
}
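The workaround mentioned above, storing the nested JSON as a string, can be sketched as:

```python
import json

# Sketch: serialize the nested structure to a JSON string, store that
# string as the property value, and decode it again after reading it back.
nested = {"bar": "baz"}
prop_value = json.dumps(nested)    # store this string as the property
restored = json.loads(prop_value)  # after a GET, decode it again
assert restored == nested
```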
Delete all properties from node
Figure21.41.Starting Graph
Node[52]
jim = 'tobias'
Figure21.42.Final Graph
Node[52]
Example request
DELETE http://localhost:7474/db/data/node/52/properties
Accept: application/json; charset=UTF-8
Example response
204: No Content
Delete a named property from a node
To delete a single property from a node, see the example below.
Figure21.43.Starting Graph
Node[51]
nam e = 'tobias'
Figure21.44.Final Graph
Node[51]
Example request
DELETE http://localhost:7474/db/data/node/51/properties/name
Accept: application/json; charset=UTF-8
Example response
204: No Content
21.12. Relationship properties
Update relationship properties
Figure21.45.Starting Graph
Node[108]
Node[109]
KNOWS
Figure21.46.Final Graph
Node[108]
Node[109]
KNOWS
jim = 'tobias'
Example request
PUT http://localhost:7474/db/data/relationship/54/properties
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"jim" : "tobias"
}
Example response
204: No Content
Remove properties from a relationship
Figure21.47.Starting Graph
Node[0]
nam e = 'Juliet'
Node[1]
nam e = 'Romeo'
LOVES
cost = 'high'
Figure21.48.Final Graph
Node[0]
nam e = 'Juliet'
Node[1]
nam e = 'Romeo'
LOVES
Example request
DELETE http://localhost:7474/db/data/relationship/0/properties
Accept: application/json; charset=UTF-8
Example response
204: No Content
Remove property from a relationship
See the example request below.
Figure21.49.Starting Graph
Node[6]
nam e = 'Juliet'
Node[7]
nam e = 'Romeo'
LOVES
cost = 'high'
Figure21.50.Final Graph
Node[6]
nam e = 'Juliet'
Node[7]
nam e = 'Romeo'
LOVES
Example request
DELETE http://localhost:7474/db/data/relationship/3/properties/cost
Accept: application/json; charset=UTF-8
Example response
204: No Content
Remove non-existent property from a relationship
Attempting to remove a property that doesn’t exist results in an error.
Figure21.51.Starting Graph
Node[2]
nam e = 'Juliet'
Node[3]
nam e = 'Romeo'
LOVES
cost = 'high'
Example request
DELETE http://localhost:7474/db/data/relationship/1/properties/non-existent
Accept: application/json; charset=UTF-8
Example response
404: Not Found
Content-Type: application/json; charset=UTF-8
{
"message": "Relationship[1] does not have a property \"non-existent\"",
"exception": "NoSuchPropertyException",
"fullname": "org.neo4j.server.rest.web.NoSuchPropertyException",
"stackTrace": [
"org.neo4j.server.rest.web.DatabaseActions.removeRelationshipProperty(DatabaseActions.java:670)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.deleteRelationshipProperty(RestfulGraphDatabase.java:812)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
],
"errors": [
{
"message": "Relationship[1] does not have a property \"non-existent\"",
"code": "Neo.ClientError.Statement.NoSuchProperty"
}
]
}
Remove properties from a non-existing relationship
Attempting to remove all properties from a relationship which doesn’t exist results in an error.
Example request
DELETE http://localhost:7474/db/data/relationship/1234/properties
Accept: application/json; charset=UTF-8
Example response
404: Not Found
Content-Type: application/json; charset=UTF-8
{
"message": "org.neo4j.graphdb.NotFoundException: Relationship 1234 not found",
"exception": "RelationshipNotFoundException",
"fullname": "org.neo4j.server.rest.web.RelationshipNotFoundException",
"stackTrace": [
"org.neo4j.server.rest.web.DatabaseActions.relationship(DatabaseActions.java:188)",
"org.neo4j.server.rest.web.DatabaseActions.removeAllRelationshipProperties(DatabaseActions.java:660)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.deleteAllRelationshipProperties(RestfulGraphDatabase.java:792)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
],
"cause": {
"message": "Relationship 1234 not found",
"errors": [
{
"message": "Relationship 1234 not found",
"code": "Neo.ClientError.Statement.EntityNotFound"
}
],
"cause": {
"message": "Unable to load RELATIONSHIP with id 1234.",
"errors": [
{
"message": "Unable to load RELATIONSHIP with id 1234.",
"code": "Neo.ClientError.Statement.EntityNotFound"
}
],
"exception": "EntityNotFoundException",
"fullname": "org.neo4j.kernel.api.exceptions.EntityNotFoundException",
"stackTrace": [
"org.neo4j.kernel.impl.factory.GraphDatabaseFacade.getRelationshipById(GraphDatabaseFacade.java:248)",
"org.neo4j.server.rest.web.DatabaseActions.relationship(DatabaseActions.java:184)",
"org.neo4j.server.rest.web.DatabaseActions.removeAllRelationshipProperties(DatabaseActions.java:660)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.deleteAllRelationshipProperties(RestfulGraphDatabase.java:792)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
]
},
"exception": "NotFoundException",
"fullname": "org.neo4j.graphdb.NotFoundException",
"stackTrace": [
"org.neo4j.kernel.impl.factory.GraphDatabaseFacade.getRelationshipById(GraphDatabaseFacade.java:248)",
"org.neo4j.server.rest.web.DatabaseActions.relationship(DatabaseActions.java:184)",
"org.neo4j.server.rest.web.DatabaseActions.removeAllRelationshipProperties(DatabaseActions.java:660)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.deleteAllRelationshipProperties(RestfulGraphDatabase.java:792)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
]
},
"errors": [
{
"message": "org.neo4j.graphdb.NotFoundException: Relationship 1234 not found",
"code": "Neo.ClientError.Statement.EntityNotFound"
}
]
}
Remove property from a non-existing relationship
Attempting to remove a property from a relationship which doesn’t exist results in an error.
Example request
DELETE http://localhost:7474/db/data/relationship/1234/properties/cost
Accept: application/json; charset=UTF-8
Example response
404: Not Found
Content-Type: application/json; charset=UTF-8
{
"message": "org.neo4j.graphdb.NotFoundException: Relationship 1234 not found",
"exception": "RelationshipNotFoundException",
"fullname": "org.neo4j.server.rest.web.RelationshipNotFoundException",
"stackTrace": [
"org.neo4j.server.rest.web.DatabaseActions.relationship(DatabaseActions.java:188)",
"org.neo4j.server.rest.web.DatabaseActions.removeRelationshipProperty(DatabaseActions.java:666)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.deleteRelationshipProperty(RestfulGraphDatabase.java:812)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
],
"cause": {
"message": "Relationship 1234 not found",
"errors": [
{
"message": "Relationship 1234 not found",
"code": "Neo.ClientError.Statement.EntityNotFound"
}
],
"cause": {
"message": "Unable to load RELATIONSHIP with id 1234.",
"errors": [
{
"message": "Unable to load RELATIONSHIP with id 1234.",
"code": "Neo.ClientError.Statement.EntityNotFound"
}
],
"exception": "EntityNotFoundException",
"fullname": "org.neo4j.kernel.api.exceptions.EntityNotFoundException",
"stackTrace": [
"org.neo4j.kernel.impl.factory.GraphDatabaseFacade.getRelationshipById(GraphDatabaseFacade.java:248)",
"org.neo4j.server.rest.web.DatabaseActions.relationship(DatabaseActions.java:184)",
"org.neo4j.server.rest.web.DatabaseActions.removeRelationshipProperty(DatabaseActions.java:666)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.deleteRelationshipProperty(RestfulGraphDatabase.java:812)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
]
},
"exception": "NotFoundException",
"fullname": "org.neo4j.graphdb.NotFoundException",
"stackTrace": [
"org.neo4j.kernel.impl.factory.GraphDatabaseFacade.getRelationshipById(GraphDatabaseFacade.java:248)",
"org.neo4j.server.rest.web.DatabaseActions.relationship(DatabaseActions.java:184)",
"org.neo4j.server.rest.web.DatabaseActions.removeRelationshipProperty(DatabaseActions.java:666)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.deleteRelationshipProperty(RestfulGraphDatabase.java:812)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
]
},
"errors": [
{
"message": "org.neo4j.graphdb.NotFoundException: Relationship 1234 not found",
"code": "Neo.ClientError.Statement.EntityNotFound"
}
]
}
21.13. Node labels
Adding a label to a node
Figure21.52.Starting Graph
Node[146]
nam e = 'Clint Eastwood'
Figure21.53.Final Graph
Node[146]: Person
nam e = 'Clint Eastwood'
Example request
POST http://localhost:7474/db/data/node/146/labels
Accept: application/json; charset=UTF-8
Content-Type: application/json
"Person"
Example response
204: No Content
Adding multiple labels to a node
Figure21.54.Starting Graph
Node[157]
nam e = 'Clint Eastwood'
Figure21.55.Final Graph
Node[157]: Person, Actor
nam e = 'Clint Eastwood'
Example request
POST http://localhost:7474/db/data/node/157/labels
Accept: application/json; charset=UTF-8
Content-Type: application/json
[ "Person", "Actor" ]
Example response
204: No Content
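As the two examples above show, the label endpoint accepts either a single JSON string or a JSON array of strings. A minimal sketch of producing the right body for both cases (the helper name is illustrative):

```python
import json

# Sketch: the labels endpoint accepts either a bare JSON string for one
# label or a JSON array for several; build the matching request body.
def labels_body(labels):
    if len(labels) == 1:
        return json.dumps(labels[0])
    return json.dumps(labels)

print(labels_body(["Person"]))           # "Person"
print(labels_body(["Person", "Actor"]))  # ["Person", "Actor"]
```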
Adding a label with an invalid name
Labels with empty names are not allowed; all other non-empty strings are valid label names.
Attempting to add an invalid label to a node leads to an HTTP 400 response.
Example request
POST http://localhost:7474/db/data/node/164/labels
Accept: application/json; charset=UTF-8
Content-Type: application/json
""
Example response
400: Bad Request
Content-Type: application/json; charset=UTF-8
{
"message": "Unable to add label, see nested exception.",
"exception": "BadInputException",
"fullname": "org.neo4j.server.rest.repr.BadInputException",
"stackTrace": [
"org.neo4j.server.rest.web.DatabaseActions.addLabelToNode(DatabaseActions.java:319)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.addNodeLabel(RestfulGraphDatabase.java:446)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
],
"cause": {
"message": "Invalid label name \u0027\u0027.",
"errors": [
{
"message": "Invalid label name \u0027\u0027.",
"code": "Neo.ClientError.Schema.ConstraintViolation"
}
],
"cause": {
"message": "\u0027\u0027 is not a valid token name. Only non-null, non-empty strings are allowed.",
"errors": [
{
"message": "\u0027\u0027 is not a valid token name. Only non-null, non-empty strings are allowed.",
"code": "Neo.ClientError.Schema.IllegalTokenName"
}
],
"exception": "IllegalTokenNameException",
"fullname": "org.neo4j.kernel.api.exceptions.schema.IllegalTokenNameException",
"stackTrace": [
"org.neo4j.kernel.impl.api.DataIntegrityValidatingStatementOperations.checkValidTokenName(DataIntegrityValidatingStatementOperations.java:280)",
"org.neo4j.kernel.impl.api.DataIntegrityValidatingStatementOperations.labelGetOrCreateForName(DataIntegrityValidatingStatementOperations.java:91)",
"org.neo4j.kernel.impl.api.OperationsFacade.labelGetOrCreateForName(OperationsFacade.java:802)",
"org.neo4j.kernel.impl.core.NodeProxy.addLabel(NodeProxy.java:620)",
"org.neo4j.server.rest.web.DatabaseActions.addLabelToNode(DatabaseActions.java:314)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.addNodeLabel(RestfulGraphDatabase.java:446)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
]
},
"exception": "ConstraintViolationException",
"fullname": "org.neo4j.graphdb.ConstraintViolationException",
"stackTrace": [
"org.neo4j.kernel.impl.core.NodeProxy.addLabel(NodeProxy.java:631)",
"org.neo4j.server.rest.web.DatabaseActions.addLabelToNode(DatabaseActions.java:314)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.addNodeLabel(RestfulGraphDatabase.java:446)",
"java.lang.reflect.Method.invoke(Method.java:606)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"org.neo4j.server.rest.web.CollectUserAgentFilter.doFilter(CollectUserAgentFilter.java:69)",
"java.lang.Thread.run(Thread.java:745)"
]
},
"errors": [
{
"message": "Unable to add label, see nested exception.",
"code": "Neo.ClientError.Request.InvalidFormat"
}
]
}
Replacing labels on a node
This removes any labels currently on a node, and replaces them with the labels passed in as the
request body.
Figure21.56.Starting Graph
Node[147]: Person
nam e = 'Clint Eastwood'
Figure21.57.Final Graph
Node[147]: Actor, Director
nam e = 'Clint Eastwood'
Example request
PUT http://localhost:7474/db/data/node/147/labels
Accept: application/json; charset=UTF-8
Content-Type: application/json
[ "Actor", "Director" ]
Example response
204: No Content
Removing a label from a node
Figure21.58.Starting Graph
Node[148]: Person
nam e = 'Clint Eastwood'
Figure21.59.Final Graph
Node[148]
nam e = 'Clint Eastwood'
Example request
DELETE http://localhost:7474/db/data/node/148/labels/Person
Accept: application/json; charset=UTF-8
Example response
204: No Content
Removing a non-existent label from a node
Figure21.60.Starting Graph
Node[149]
nam e = 'Clint Eastwood'
Figure21.61.Final Graph
Node[149]
nam e = 'Clint Eastwood'
Example request
DELETE http://localhost:7474/db/data/node/149/labels/Person
Accept: application/json; charset=UTF-8
Example response
204: No Content
Listing labels for a node
Figure21.62.Final Graph
Node[153]: Actor, Director
nam e = 'Clint Eastwood'
Example request
GET http://localhost:7474/db/data/node/153/labels
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ "Actor", "Director" ]
Get all nodes with a label
Figure21.63.Final Graph
Node[158]: Director
nam e = 'Steven Spielberg'
Node[159]: Actor, Director
nam e = 'Clint Eastwood'
Node[160]: Actor
nam e = 'Donald Sutherland'
Example request
GET http://localhost:7474/db/data/label/Actor/nodes
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"labels" : "http://localhost:7474/db/data/node/159/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/159/relationships/out",
"data" : {
"name" : "Clint Eastwood"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/159/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/159/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/159",
"property" : "http://localhost:7474/db/data/node/159/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/159/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/159/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/159/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/159/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/159/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/159/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/159/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 159,
"labels" : [ "Actor", "Director" ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/160/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/160/relationships/out",
"data" : {
"name" : "Donald Sutherland"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/160/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/160/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/160",
"property" : "http://localhost:7474/db/data/node/160/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/160/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/160/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/160/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/160/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/160/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/160/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/160/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 160,
"labels" : [ "Actor" ]
}
} ]
Get nodes by label and property
You can retrieve all nodes with a given label and property by passing one property as a query
parameter. Notice that the property value is JSON-encoded and then URL-encoded.
If an index is available on the label/property combination you send, that index will be used. If no
index is available, all nodes with the given label will be scanned to find matching nodes.
Currently, it is not possible to search using multiple properties.
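The two encoding steps can be sketched with Python's standard library; `label_property_url` is an illustrative helper, not part of the API, and the base URL is the default local server used throughout this chapter:

```python
import json
import urllib.parse

def label_property_url(label, prop, value, base="http://localhost:7474/db/data"):
    """Build the URL for fetching nodes by label and one property.

    The property value is JSON-encoded first, then URL-encoded,
    as this endpoint requires.
    """
    encoded = urllib.parse.quote_plus(json.dumps(value))
    return "%s/label/%s/nodes?%s=%s" % (base, label, prop, encoded)

print(label_property_url("Person", "name", "Clint Eastwood"))
# http://localhost:7474/db/data/label/Person/nodes?name=%22Clint+Eastwood%22
```

Note that `json.dumps` supplies the surrounding quotes (`%22` once URL-encoded), which is why a bare `name=Clint+Eastwood` would not match a string property.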
Figure21.64.Final Graph
Node[161]: Person
nam e = 'Steven Spielberg'
Node[162]: Person
nam e = 'Clint Eastwood'
Node[163]: Person
Example request
GET http://localhost:7474/db/data/label/Person/nodes?name=%22Clint+Eastwood%22
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"labels" : "http://localhost:7474/db/data/node/162/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/162/relationships/out",
"data" : {
"name" : "Clint Eastwood"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/162/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/162/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/162",
"property" : "http://localhost:7474/db/data/node/162/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/162/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/162/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/162/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/162/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/162/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/162/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/162/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 162,
"labels" : [ "Person" ]
}
} ]
List all labels
By default, the server will return labels in use only. If you also want to return labels not in use, append
the "in_use=0" query parameter.
Example request
GET http://localhost:7474/db/data/labels
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ "Director", "Person", "Actor" ]
21.14. Node degree
The node degree is the number of relationships associated with a node. Neo4j stores the degree for
each node, making this a useful mechanism to quickly get the number of relationships a node has. You
can also optionally filter degree by direction and/or relationship type.
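The degree semantics (a direction plus an optional relationship-type filter) are compact enough to illustrate with a tiny in-memory sketch. This is plain Python over an edge list, not Neo4j code; the node and type names are borrowed from the examples that follow:

```python
# Each edge is (source, relationship type, target).
edges = [
    ("Root", "knows", "Mattias"),
    ("Root", "knows", "Johan"),
]

def degree(node, direction="all", types=None):
    """Count relationships for a node, mirroring the REST endpoint:
    direction is 'all', 'in' or 'out'; types optionally restricts
    which relationship types are counted."""
    def match(src, rel, dst):
        if types is not None and rel not in types:
            return False
        if direction == "out":
            return src == node
        if direction == "in":
            return dst == node
        return node in (src, dst)
    return sum(1 for e in edges if match(*e))

print(degree("Root", "all"))    # 2
print(degree("Root", "out"))    # 2
print(degree("Mattias", "in"))  # 1
```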
Get the degree of a node
Return the total number of relationships associated with a node.
Figure21.65.Final Graph
Node[19]
nam e = 'Johan'
Node[20]
nam e = 'Root'
knows
Node[21]
nam e = 'Mattias'
knows
Example request
GET http://localhost:7474/db/data/node/20/degree/all
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
2
Get the degree of a node by direction
Return the number of relationships of a particular direction for a node. Specify all, in or out.
Figure21.66.Final Graph
Node[22]
nam e = 'Johan'
Node[23]
nam e = 'Root'
knows
Node[24]
nam e = 'Mattias'
knows
Example request
GET http://localhost:7474/db/data/node/23/degree/out
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
2
Get the degree of a node by direction and types
If you are only interested in the degree of a particular relationship type, or a set of relationship types,
you specify relationship types after the direction. You can combine multiple relationship types by using
the & character.
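The URL scheme described above (direction first, then an optional `&`-joined type list) can be sketched as a small helper; `degree_url` is an illustrative name, and the base URL is the default local server used throughout this chapter:

```python
def degree_url(node_id, direction="all", types=(), base="http://localhost:7474/db/data"):
    """Build a node-degree URL:
    /node/{id}/degree/{direction}[/{type1}&{type2}...]."""
    url = "%s/node/%d/degree/%s" % (base, node_id, direction)
    if types:
        url += "/" + "&".join(types)
    return url

print(degree_url(17, "out", ["KNOWS", "LIKES"]))
# http://localhost:7474/db/data/node/17/degree/out/KNOWS&LIKES
```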
Figure21.67.Final Graph
Node[15]
nam e = 'Cookie'
Node[16]
nam e = 'Johan'
Node[17]
nam e = 'Root'
LIKES KNOWS
Node[18]
nam e = 'Mattias'
KNOWS
Example request
GET http://localhost:7474/db/data/node/17/degree/out/KNOWS&LIKES
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
3
21.15. Indexing
Note
This documents schema-based indexes, a feature introduced in Neo4j 2.0. See
Section 21.20, “Legacy indexing” [416] for legacy indexing.
For more details about indexes and the optional schema in Neo4j, see the section called “Schema” [9].
Create index
This will start a background job in the database that will create and populate the index. You can check
the status of your index by listing all the indexes for the relevant label.
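As a sketch, the request can be constructed with Python's standard library. The label and property names here are illustrative, and actually sending the request (with `urllib.request.urlopen`) assumes a running local server, so the snippet only builds it:

```python
import json
import urllib.request

def create_index_request(label, property_keys, base="http://localhost:7474/db/data"):
    """Build (but do not send) the POST request that creates a schema index
    on the given label and property keys."""
    body = json.dumps({"property_keys": list(property_keys)}).encode("utf-8")
    return urllib.request.Request(
        "%s/schema/index/%s" % (base, label),
        data=body,
        headers={"Content-Type": "application/json",
                 "Accept": "application/json; charset=UTF-8"},
        method="POST",
    )

req = create_index_request("Person", ["name"])
print(req.get_method(), req.full_url)
# POST http://localhost:7474/db/data/schema/index/Person
```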
Example request
POST http://localhost:7474/db/data/schema/index/label_1512731149613_1
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"property_keys" : [ "property_1512731149613_1" ]
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"label" : "label_1512731149613_1",
"property_keys" : [ "property_1512731149613_1" ]
}
List indexes for a label
Example request
GET http://localhost:7474/db/data/schema/index/label_1512731149346_1
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"property_keys" : [ "property_1512731149346_1" ],
"label" : "label_1512731149346_1"
} ]
Drop index
Drop an index on a label and property combination.
Example request
DELETE http://localhost:7474/db/data/schema/index/label_1512731149507_1/property_1512731149507_1
Accept: application/json; charset=UTF-8
Example response
204: No Content
21.16. Constraints
Create uniqueness constraint
Create a uniqueness constraint on a property.
Example request
POST http://localhost:7474/db/data/schema/constraint/label_1512731127400_1/uniqueness/
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"property_keys" : [ "property_1512731127400_1" ]
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"label" : "label_1512731127400_1",
"type" : "UNIQUENESS",
"property_keys" : [ "property_1512731127400_1" ]
}
Get a specific uniqueness constraint
Get a specific uniqueness constraint for a label and a property.
Example request
GET http://localhost:7474/db/data/schema/constraint/label_1512731127985_1/uniqueness/property_1512731127985_1
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"property_keys" : [ "property_1512731127985_1" ],
"label" : "label_1512731127985_1",
"type" : "UNIQUENESS"
} ]
Get all uniqueness constraints for a label
Example request
GET http://localhost:7474/db/data/schema/constraint/label_1512731127891_1/uniqueness/
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"property_keys" : [ "property_1512731127891_1" ],
"label" : "label_1512731127891_1",
"type" : "UNIQUENESS"
}, {
"property_keys" : [ "property_1512731127891_2" ],
"label" : "label_1512731127891_1",
"type" : "UNIQUENESS"
} ]
Drop uniqueness constraint
Drop uniqueness constraint for a label and a property.
Example request
DELETE http://localhost:7474/db/data/schema/constraint/label_1512731127311_1/uniqueness/property_1512731127311_1
Accept: application/json; charset=UTF-8
Example response
204: No Content
Get a specific node property existence constraint
Get a specific node property existence constraint for a label and a property.
Example request
GET http://localhost:7474/db/data/schema/constraint/label_1512734748264_1/existence/property_1512734748264_1
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"property_keys" : [ "property_1512734748264_1" ],
"label" : "label_1512734748264_1",
"type" : "NODE_PROPERTY_EXISTENCE"
} ]
Get all node property existence constraints for a label
Example request
GET http://localhost:7474/db/data/schema/constraint/label_1512734748447_1/existence/
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"property_keys" : [ "property_1512734748447_1" ],
"label" : "label_1512734748447_1",
"type" : "NODE_PROPERTY_EXISTENCE"
}, {
"property_keys" : [ "property_1512734748447_2" ],
"label" : "label_1512734748447_1",
"type" : "NODE_PROPERTY_EXISTENCE"
} ]
Get all constraints for a label
Example request
GET http://localhost:7474/db/data/schema/constraint/label_1512731127693_1
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"property_keys" : [ "property_1512731127693_1" ],
"label" : "label_1512731127693_1",
"type" : "UNIQUENESS"
} ]
Get a specific relationship property existence constraint
Get a specific relationship property existence constraint for a label and a property.
Example request
GET http://localhost:7474/db/data/schema/relationship/constraint/relationshipType_1512734746380_1/existence/property_1512734746380_1
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"relationshipType" : "relationshipType_1512734746380_1",
"property_keys" : [ "property_1512734746380_1" ],
"type" : "RELATIONSHIP_PROPERTY_EXISTENCE"
} ]
Get all relationship property existence constraints for a type
Example request
GET http://localhost:7474/db/data/schema/relationship/constraint/relationshipType_1512734748073_1/existence/
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"relationshipType" : "relationshipType_1512734748073_1",
"property_keys" : [ "property_1512734748073_2" ],
"type" : "RELATIONSHIP_PROPERTY_EXISTENCE"
}, {
"relationshipType" : "relationshipType_1512734748073_1",
"property_keys" : [ "property_1512734748073_1" ],
"type" : "RELATIONSHIP_PROPERTY_EXISTENCE"
} ]
Get all constraints
Example request
GET http://localhost:7474/db/data/schema/constraint
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"property_keys" : [ "property_1512731127457_1" ],
"label" : "label_1512731127457_1",
"type" : "UNIQUENESS"
}, {
"property_keys" : [ "property_1512731127400_1" ],
"label" : "label_1512731127400_1",
"type" : "UNIQUENESS"
} ]
21.17. Traversals
Warning
The Traversal REST endpoint executes arbitrary JavaScript code under the hood as part
of the evaluator definitions. In hosted and open environments, this can constitute a
security risk. In these cases, consider using declarative approaches like Part III, “Cypher Query
Language” [102], write your own server-side plugin that executes the interesting traversals
with the Java API (see Section 32.1, “Server Plugins” [564]), or secure your server, see
Chapter 27, Security [499].
Traversals are performed from a start node. The traversal is controlled by the URI and the body sent
with the request.
returnType The kind of objects in the response is determined by traverse/{returnType} in the URL.
returnType can have one of these values:
node
relationship
path: contains full representations of start and end node, the rest are URIs.
fullpath: contains full representations of all nodes and relationships.
To decide how the graph should be traversed you can use these parameters in the request body:
order Decides in which order to visit nodes. Possible values:
breadth_first: see Breadth-first search3.
depth_first: see Depth-first search4
relationships Decides which relationship types and directions should be followed. The
direction can be one of:
all
in
out
uniqueness Decides how uniqueness should be calculated. For details on different
uniqueness values see the Java API on Uniqueness5. Possible values:
node_global
none
relationship_global
node_path
relationship_path
prune_evaluator Decides whether the traverser should continue down a path, or whether the
path should be pruned so that the traverser won’t continue down it. You can
write your own prune evaluator (see the section called “Traversal using a return
filter” [375]) or use the built-in none prune evaluator.
return_filter Decides whether the current position should be included in the result. You can
provide your own code for this (see the section called “Traversal using a return
filter” [375]), or use one of the built-in filters:
all
all_but_start_node
max_depth Is a short-hand way of specifying a prune evaluator that prunes after a
certain depth. If not specified, a max depth of 1 is used; if a prune_evaluator
is specified instead of a max_depth, no max depth limit is set.
3 http://en.wikipedia.org/wiki/Breadth-first_search
4 http://en.wikipedia.org/wiki/Depth-first_search
5 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/Uniqueness.html
The position object in the body of the return_filter and prune_evaluator is a Path6 object representing
the path from the start node to the current traversal position.
Out of the box, the REST API supports JavaScript code in filters and evaluators. The script body will be
executed in a Java context which has access to the full Neo4j Java API7. See the examples for the exact
syntax of the request.
Traversal using a return filter
In this example, the none prune evaluator is used and a return filter is supplied in order to return all
names containing "t". The result is to be returned as nodes and the max depth is set to 3.
Figure21.68.Final Graph
Node[84]
nam e = 'Sara'
Node[85]
nam e = 'Johan'
Node[86]
nam e = 'Emil'
knows
Node[88]
nam e = 'Tobias'
knows
Node[89]
nam e = 'Peter'
knows
loves
Node[87]
nam e = 'Root'
knows
Node[90]
nam e = 'Mattias'
knows
Example request
POST http://localhost:7474/db/data/node/87/traverse/node
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"order" : "breadth_first",
"return_filter" : {
"body" : "position.endNode().getProperty('name').toLowerCase().contains('t')",
"language" : "javascript"
},
"prune_evaluator" : {
6 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/Path.html
7 http://neo4j.com/docs/2.3.12/javadocs/
"body" : "position.length() > 10",
"language" : "javascript"
},
"uniqueness" : "node_global",
"relationships" : [ {
"direction" : "all",
"type" : "knows"
}, {
"direction" : "all",
"type" : "loves"
} ],
"max_depth" : 3
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"labels" : "http://localhost:7474/db/data/node/87/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/87/relationships/out",
"data" : {
"name" : "Root"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/87/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/87/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/87",
"property" : "http://localhost:7474/db/data/node/87/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/87/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/87/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/87/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/87/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/87/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/87/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/87/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 87,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/90/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/90/relationships/out",
"data" : {
"name" : "Mattias"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/90/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/90/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/90",
"property" : "http://localhost:7474/db/data/node/90/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/90/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/90/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/90/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/90/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/90/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/90/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/90/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 90,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/88/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/88/relationships/out",
"data" : {
"name" : "Tobias"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/88/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/88/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/88",
"property" : "http://localhost:7474/db/data/node/88/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/88/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/88/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/88/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/88/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/88/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/88/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/88/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 88,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/89/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/89/relationships/out",
"data" : {
"name" : "Peter"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/89/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/89/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/89",
"property" : "http://localhost:7474/db/data/node/89/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/89/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/89/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/89/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/89/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/89/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/89/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/89/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 89,
"labels" : [ ]
}
} ]
Return relationships from a traversal
Figure21.69.Final Graph
Node[79]
nam e = 'car'
Node[80]
nam e = 'you'
Node[81]
nam e = 'I'
own know
Example request
POST http://localhost:7474/db/data/node/81/traverse/relationship
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"order" : "breadth_first",
"uniqueness" : "none",
"return_filter" : {
"language" : "builtin",
"name" : "all"
}
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"start" : "http://localhost:7474/db/data/node/81",
"data" : { },
"self" : "http://localhost:7474/db/data/relationship/37",
"property" : "http://localhost:7474/db/data/relationship/37/properties/{key}",
"properties" : "http://localhost:7474/db/data/relationship/37/properties",
"type" : "own",
"extensions" : { },
"end" : "http://localhost:7474/db/data/node/79",
"metadata" : {
"id" : 37,
"type" : "own"
}
}, {
"start" : "http://localhost:7474/db/data/node/81",
"data" : { },
"self" : "http://localhost:7474/db/data/relationship/36",
"property" : "http://localhost:7474/db/data/relationship/36/properties/{key}",
"properties" : "http://localhost:7474/db/data/relationship/36/properties",
"type" : "know",
"extensions" : { },
"end" : "http://localhost:7474/db/data/node/80",
"metadata" : {
"id" : 36,
"type" : "know"
}
} ]
Return paths from a traversal
Figure21.70.Final Graph
Node[69]
nam e = 'car'
Node[70]
nam e = 'you'
Node[71]
nam e = 'I'
own know
Example request
POST http://localhost:7474/db/data/node/71/traverse/path
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"order" : "breadth_first",
"uniqueness" : "none",
"return_filter" : {
"language" : "builtin",
"name" : "all"
}
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"directions" : [ ],
"start" : "http://localhost:7474/db/data/node/71",
"nodes" : [ "http://localhost:7474/db/data/node/71" ],
"length" : 0,
"relationships" : [ ],
"end" : "http://localhost:7474/db/data/node/71"
}, {
"directions" : [ "->" ],
"start" : "http://localhost:7474/db/data/node/71",
"nodes" : [ "http://localhost:7474/db/data/node/71", "http://localhost:7474/db/data/node/70" ],
"length" : 1,
"relationships" : [ "http://localhost:7474/db/data/relationship/28" ],
"end" : "http://localhost:7474/db/data/node/70"
}, {
"directions" : [ "->" ],
"start" : "http://localhost:7474/db/data/node/71",
"nodes" : [ "http://localhost:7474/db/data/node/71", "http://localhost:7474/db/data/node/69" ],
"length" : 1,
"relationships" : [ "http://localhost:7474/db/data/relationship/29" ],
"end" : "http://localhost:7474/db/data/node/69"
} ]
Traversal returning nodes below a certain depth
Here, all nodes at a traversal depth below 3 are returned.
Figure21.71.Final Graph
Node[72]
nam e = 'Sara'
Node[73]
nam e = 'Johan'
Node[74]
nam e = 'Emil'
knows
Node[77]
nam e = 'Peter'
knows
Node[76]
nam e = 'Tobias'
knows
loves
Node[75]
nam e = 'Root'
knows
Node[78]
nam e = 'Mattias'
knows
Example request
POST http://localhost:7474/db/data/node/75/traverse/node
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"return_filter" : {
"body" : "position.length()<3;",
"language" : "javascript"
},
"prune_evaluator" : {
"name" : "none",
"language" : "builtin"
}
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"labels" : "http://localhost:7474/db/data/node/75/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/75/relationships/out",
"data" : {
"name" : "Root"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/75/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/75/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/75",
"property" : "http://localhost:7474/db/data/node/75/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/75/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/75/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/75/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/75/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/75/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/75/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/75/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 75,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/78/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/78/relationships/out",
"data" : {
"name" : "Mattias"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/78/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/78/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/78",
"property" : "http://localhost:7474/db/data/node/78/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/78/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/78/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/78/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/78/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/78/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/78/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/78/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 78,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/73/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/73/relationships/out",
"data" : {
"name" : "Johan"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/73/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/73/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/73",
"property" : "http://localhost:7474/db/data/node/73/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/73/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/73/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/73/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/73/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/73/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/73/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/73/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 73,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/74/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/74/relationships/out",
"data" : {
"name" : "Emil"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/74/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/74/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/74",
"property" : "http://localhost:7474/db/data/node/74/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/74/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/74/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/74/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/74/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/74/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/74/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/74/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 74,
"labels" : [ ]
}
} ]
Creating a paged traverser
Paged traversers are created by POST-ing a traversal description to the link identified by the
paged_traverse key in a node representation. When creating a paged traverser, the same options apply
as for a regular traverser, meaning that node, path, or fullpath can be targeted.
Example request
POST http://localhost:7474/db/data/node/297/paged/traverse/node
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"prune_evaluator" : {
"language" : "builtin",
"name" : "none"
},
"return_filter" : {
"language" : "javascript",
"body" : "position.endNode().getProperty('name').contains('1');"
},
"order" : "depth_first",
"relationships" : {
"type" : "NEXT",
"direction" : "out"
}
}
Example response
201: Created
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/node/297/paged/traverse/node/ab9eaa01c9874006851712ae68e07c76
[ {
"labels" : "http://localhost:7474/db/data/node/298/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/298/relationships/out",
"data" : {
"name" : "1"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/298/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/298/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/298",
"property" : "http://localhost:7474/db/data/node/298/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/298/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/298/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/298/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/298/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/298/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/298/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/298/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 298,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/307/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/307/relationships/out",
"data" : {
"name" : "10"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/307/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/307/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/307",
"property" : "http://localhost:7474/db/data/node/307/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/307/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/307/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/307/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/307/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/307/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/307/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/307/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 307,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/308/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/308/relationships/out",
"data" : {
"name" : "11"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/308/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/308/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/308",
"property" : "http://localhost:7474/db/data/node/308/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/308/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/308/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/308/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/308/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/308/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/308/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/308/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 308,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/309/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/309/relationships/out",
"data" : {
"name" : "12"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/309/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/309/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/309",
"property" : "http://localhost:7474/db/data/node/309/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/309/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/309/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/309/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/309/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/309/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/309/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/309/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 309,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/310/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/310/relationships/out",
"data" : {
"name" : "13"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/310/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/310/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/310",
"property" : "http://localhost:7474/db/data/node/310/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/310/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/310/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/310/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/310/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/310/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/310/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/310/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 310,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/311/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/311/relationships/out",
"data" : {
"name" : "14"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/311/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/311/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/311",
"property" : "http://localhost:7474/db/data/node/311/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/311/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/311/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/311/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/311/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/311/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/311/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/311/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 311,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/312/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/312/relationships/out",
"data" : {
"name" : "15"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/312/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/312/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/312",
"property" : "http://localhost:7474/db/data/node/312/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/312/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/312/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/312/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/312/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/312/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/312/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/312/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 312,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/313/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/313/relationships/out",
"data" : {
"name" : "16"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/313/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/313/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/313",
"property" : "http://localhost:7474/db/data/node/313/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/313/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/313/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/313/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/313/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/313/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/313/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/313/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 313,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/314/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/314/relationships/out",
"data" : {
"name" : "17"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/314/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/314/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/314",
"property" : "http://localhost:7474/db/data/node/314/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/314/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/314/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/314/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/314/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/314/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/314/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/314/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 314,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/315/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/315/relationships/out",
"data" : {
"name" : "18"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/315/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/315/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/315",
"property" : "http://localhost:7474/db/data/node/315/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/315/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/315/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/315/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/315/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/315/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/315/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/315/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 315,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/316/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/316/relationships/out",
"data" : {
"name" : "19"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/316/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/316/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/316",
"property" : "http://localhost:7474/db/data/node/316/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/316/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/316/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/316/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/316/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/316/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/316/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/316/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 316,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/318/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/318/relationships/out",
"data" : {
"name" : "21"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/318/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/318/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/318",
"property" : "http://localhost:7474/db/data/node/318/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/318/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/318/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/318/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/318/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/318/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/318/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/318/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 318,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/328/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/328/relationships/out",
"data" : {
"name" : "31"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/328/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/328/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/328",
"property" : "http://localhost:7474/db/data/node/328/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/328/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/328/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/328/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/328/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/328/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/328/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/328/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 328,
"labels" : [ ]
}
} ]
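As a sketch only, the request above can be assembled with Python's standard library. The node URL, page size, and lease time below are illustrative values, not server defaults; the pageSize and leaseTime query parameters come from the paged_traverse URI template.

```python
import json
import urllib.request

# Hypothetical node URL, for illustration only.
BASE = "http://localhost:7474/db/data/node/297"

# The same traversal description as in the example request above.
description = {
    "prune_evaluator": {"language": "builtin", "name": "none"},
    "return_filter": {
        "language": "javascript",
        "body": "position.endNode().getProperty('name').contains('1');",
    },
    "order": "depth_first",
    "relationships": {"type": "NEXT", "direction": "out"},
}

def build_create_request(base_url, desc, page_size=50, lease_time=60):
    """Build the POST request that creates a paged traverser.

    page_size and lease_time fill the optional pageSize and leaseTime
    query parameters; the values chosen here are illustrative.
    """
    url = "%s/paged/traverse/node?pageSize=%d&leaseTime=%d" % (
        base_url, page_size, lease_time)
    body = json.dumps(desc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json",
                 "Accept": "application/json; charset=UTF-8"},
        method="POST")

req = build_create_request(BASE, description)
# Sending it requires a running server:
#   resp = urllib.request.urlopen(req)
#   traverser_uri = resp.headers["Location"]   # returned with 201 Created
```

The URI of the new traverser arrives in the Location header of the 201 response, as shown above; subsequent GET requests against that URI page through the results.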
Paging through the results of a paged traverser
Paged traversers hold state on the server and allow clients to page through the results of a traversal.
To progress to the next page of traversal results, the client issues an HTTP GET request on the paged
traversal URI, which causes the traversal to fill the next page (or partially fill it if insufficient results are
available).
Note that if a traverser expires through inactivity, the next GET request will receive a 404 response.
A traverser's lease is renewed on every successful access, for the same amount of time as originally
specified.
When the paged traverser reaches the end of its results, the client can expect a 404 response, as the
traverser is disposed of by the server.
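The paging behaviour described above — GET repeatedly until a 404 ends the loop — can be sketched as follows. The fetch_page callable is a placeholder for an actual HTTP GET against the traverser URI; it is an assumption of this sketch, not part of the REST API.

```python
def page_through(traverser_uri, fetch_page):
    """Collect all results from a paged traverser.

    fetch_page(uri) is assumed to return (status, results): the HTTP
    status code and the decoded JSON page. The server answers 404 both
    when the lease has expired and when the results are exhausted, so
    a 404 simply ends the loop here; a client that must distinguish
    the two cases would need to track whether it had seen any pages.
    """
    collected = []
    while True:
        status, results = fetch_page(traverser_uri)
        if status == 404:          # traverser disposed or lease expired
            break
        collected.extend(results)
    return collected

# Demonstration with a stubbed fetch: two pages, then 404.
responses = iter([(200, ["a", "b"]), (200, ["c"]), (404, None)])
collected = page_through("traverser-uri", lambda uri: next(responses))
# collected == ["a", "b", "c"]
```

Because every successful GET renews the lease, a client that pages steadily will not normally see its traverser expire mid-traversal.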
Example request
GET http://localhost:7474/db/data/node/330/paged/traverse/node/4e702c468db740f8a4b990d06c359f4d
Accept: application/json
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"labels" : "http://localhost:7474/db/data/node/661/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/661/relationships/out",
"data" : {
"name" : "331"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/661/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/661/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/661",
"property" : "http://localhost:7474/db/data/node/661/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/661/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/661/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/661/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/661/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/661/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/661/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/661/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 661,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/671/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/671/relationships/out",
"data" : {
"name" : "341"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/671/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/671/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/671",
"property" : "http://localhost:7474/db/data/node/671/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/671/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/671/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/671/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/671/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/671/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/671/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/671/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 671,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/681/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/681/relationships/out",
"data" : {
"name" : "351"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/681/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/681/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/681",
"property" : "http://localhost:7474/db/data/node/681/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/681/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/681/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/681/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/681/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/681/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/681/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/681/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 681,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/691/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/691/relationships/out",
"data" : {
"name" : "361"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/691/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/691/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/691",
"property" : "http://localhost:7474/db/data/node/691/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/691/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/691/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/691/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/691/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/691/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/691/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/691/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 691,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/701/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/701/relationships/out",
"data" : {
"name" : "371"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/701/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/701/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/701",
"property" : "http://localhost:7474/db/data/node/701/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/701/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/701/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/701/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/701/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/701/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/701/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/701/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 701,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/711/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/711/relationships/out",
"data" : {
"name" : "381"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/711/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/711/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/711",
"property" : "http://localhost:7474/db/data/node/711/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/711/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/711/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/711/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/711/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/711/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/711/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/711/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 711,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/721/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/721/relationships/out",
"data" : {
"name" : "391"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/721/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/721/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/721",
"property" : "http://localhost:7474/db/data/node/721/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/721/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/721/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/721/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/721/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/721/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/721/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/721/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 721,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/731/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/731/relationships/out",
"data" : {
"name" : "401"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/731/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/731/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/731",
"property" : "http://localhost:7474/db/data/node/731/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/731/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/731/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/731/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/731/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/731/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/731/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/731/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 731,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/740/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/740/relationships/out",
"data" : {
"name" : "410"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/740/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/740/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/740",
"property" : "http://localhost:7474/db/data/node/740/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/740/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/740/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/740/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/740/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/740/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/740/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/740/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 740,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/741/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/741/relationships/out",
"data" : {
"name" : "411"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/741/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/741/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/741",
"property" : "http://localhost:7474/db/data/node/741/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/741/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/741/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/741/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/741/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/741/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/741/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/741/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 741,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/742/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/742/relationships/out",
"data" : {
"name" : "412"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/742/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/742/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/742",
"property" : "http://localhost:7474/db/data/node/742/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/742/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/742/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/742/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/742/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/742/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/742/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/742/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 742,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/743/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/743/relationships/out",
"data" : {
"name" : "413"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/743/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/743/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/743",
"property" : "http://localhost:7474/db/data/node/743/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/743/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/743/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/743/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/743/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/743/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/743/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/743/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 743,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/744/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/744/relationships/out",
"data" : {
"name" : "414"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/744/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/744/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/744",
"property" : "http://localhost:7474/db/data/node/744/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/744/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/744/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/744/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/744/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/744/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/744/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/744/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 744,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/745/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/745/relationships/out",
"data" : {
"name" : "415"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/745/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/745/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/745",
"property" : "http://localhost:7474/db/data/node/745/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/745/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/745/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/745/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/745/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/745/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/745/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/745/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 745,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/746/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/746/relationships/out",
"data" : {
"name" : "416"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/746/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/746/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/746",
"property" : "http://localhost:7474/db/data/node/746/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/746/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/746/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/746/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/746/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/746/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/746/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/746/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 746,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/747/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/747/relationships/out",
"data" : {
"name" : "417"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/747/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/747/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/747",
"property" : "http://localhost:7474/db/data/node/747/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/747/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/747/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/747/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/747/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/747/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/747/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/747/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 747,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/748/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/748/relationships/out",
"data" : {
"name" : "418"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/748/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/748/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/748",
"property" : "http://localhost:7474/db/data/node/748/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/748/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/748/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/748/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/748/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/748/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/748/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/748/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 748,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/749/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/749/relationships/out",
"data" : {
"name" : "419"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/749/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/749/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/749",
"property" : "http://localhost:7474/db/data/node/749/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/749/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/749/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/749/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/749/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/749/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/749/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/749/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 749,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/751/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/751/relationships/out",
"data" : {
"name" : "421"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/751/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/751/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/751",
"property" : "http://localhost:7474/db/data/node/751/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/751/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/751/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/751/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/751/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/751/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/751/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/751/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 751,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/761/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/761/relationships/out",
"data" : {
"name" : "431"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/761/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/761/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/761",
"property" : "http://localhost:7474/db/data/node/761/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/761/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/761/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/761/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/761/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/761/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/761/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/761/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 761,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/771/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/771/relationships/out",
"data" : {
"name" : "441"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/771/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/771/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/771",
"property" : "http://localhost:7474/db/data/node/771/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/771/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/771/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/771/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/771/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/771/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/771/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/771/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 771,
"labels" : [ ]
}
} ]
Paged traverser page size
The default page size is 50 items, but depending on the application, larger or smaller page sizes might
be appropriate. The page size can be set by adding a pageSize query parameter to the request URI.
Example request
POST http://localhost:7474/db/data/node/33/paged/traverse/node?pageSize=1
Accept: application/json
Content-Type: application/json
{
"prune_evaluator" : {
"language" : "builtin",
"name" : "none"
},
"return_filter" : {
"language" : "javascript",
"body" : "position.endNode().getProperty('name').contains('1');"
},
"order" : "depth_first",
"relationships" : {
"type" : "NEXT",
"direction" : "out"
}
}
Example response
201: Created
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/node/33/paged/traverse/node/2c4a4adef0274953a358c675c884584d
[ {
"labels" : "http://localhost:7474/db/data/node/34/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/34/relationships/out",
"data" : {
"name" : "1"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/34/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/34/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/34",
"property" : "http://localhost:7474/db/data/node/34/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/34/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/34/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/34/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/34/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/34/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/34/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/34/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 34,
"labels" : [ ]
}
} ]
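The creation request above can be issued from any HTTP client. As a minimal sketch, the following helper (build_paged_traverse_url is a hypothetical name used for illustration, not part of the Neo4j API) assembles the creation URL with the optional pageSize and leaseTime query parameters:

```python
from urllib.parse import urlencode

def build_paged_traverse_url(base, node_id, return_type="node",
                             page_size=None, lease_time=None):
    """Build the POST URL that creates a paged traverser.

    pageSize and leaseTime are optional; when omitted the server
    defaults apply (50 items per page, a 60-second lease).
    """
    url = "%s/db/data/node/%d/paged/traverse/%s" % (base, node_id, return_type)
    params = []
    if page_size is not None:
        params.append(("pageSize", page_size))
    if lease_time is not None:
        params.append(("leaseTime", lease_time))
    if params:
        url += "?" + urlencode(params)
    return url

print(build_paged_traverse_url("http://localhost:7474", 33, page_size=1))
# http://localhost:7474/db/data/node/33/paged/traverse/node?pageSize=1
```

POSTing the traversal description to this URL returns 201 Created with a Location header identifying the new paged traverser.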
Paged traverser timeout
The default timeout for a paged traverser is 60 seconds, but depending on the application, longer or
shorter timeouts might be appropriate. The timeout can be set by adding a leaseTime query parameter
with the number of seconds the paged traverser should last.
Example request
POST http://localhost:7474/db/data/node/807/paged/traverse/node?leaseTime=10
Accept: application/json
Content-Type: application/json
{
"prune_evaluator" : {
"language" : "builtin",
"name" : "none"
},
"return_filter" : {
"language" : "javascript",
"body" : "position.endNode().getProperty('name').contains('1');"
},
"order" : "depth_first",
"relationships" : {
"type" : "NEXT",
"direction" : "out"
}
}
Example response
201: Created
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/node/807/paged/traverse/node/e25e0cff49b2449598c322fe313e86b0
[ {
"labels" : "http://localhost:7474/db/data/node/808/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/808/relationships/out",
"data" : {
"name" : "1"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/808/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/808/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/808",
"property" : "http://localhost:7474/db/data/node/808/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/808/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/808/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/808/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/808/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/808/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/808/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/808/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 808,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/817/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/817/relationships/out",
"data" : {
"name" : "10"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/817/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/817/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/817",
"property" : "http://localhost:7474/db/data/node/817/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/817/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/817/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/817/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/817/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/817/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/817/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/817/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 817,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/818/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/818/relationships/out",
"data" : {
"name" : "11"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/818/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/818/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/818",
"property" : "http://localhost:7474/db/data/node/818/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/818/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/818/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/818/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/818/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/818/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/818/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/818/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 818,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/819/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/819/relationships/out",
"data" : {
"name" : "12"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/819/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/819/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/819",
"property" : "http://localhost:7474/db/data/node/819/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/819/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/819/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/819/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/819/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/819/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/819/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/819/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 819,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/820/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/820/relationships/out",
"data" : {
"name" : "13"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/820/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/820/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/820",
"property" : "http://localhost:7474/db/data/node/820/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/820/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/820/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/820/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/820/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/820/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/820/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/820/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 820,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/821/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/821/relationships/out",
"data" : {
"name" : "14"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/821/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/821/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/821",
"property" : "http://localhost:7474/db/data/node/821/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/821/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/821/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/821/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/821/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/821/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/821/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/821/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 821,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/822/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/822/relationships/out",
"data" : {
"name" : "15"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/822/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/822/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/822",
"property" : "http://localhost:7474/db/data/node/822/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/822/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/822/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/822/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/822/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/822/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/822/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/822/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 822,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/823/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/823/relationships/out",
"data" : {
"name" : "16"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/823/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/823/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/823",
"property" : "http://localhost:7474/db/data/node/823/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/823/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/823/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/823/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/823/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/823/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/823/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/823/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 823,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/824/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/824/relationships/out",
"data" : {
"name" : "17"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/824/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/824/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/824",
"property" : "http://localhost:7474/db/data/node/824/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/824/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/824/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/824/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/824/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/824/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/824/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/824/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 824,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/825/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/825/relationships/out",
"data" : {
"name" : "18"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/825/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/825/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/825",
"property" : "http://localhost:7474/db/data/node/825/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/825/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/825/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/825/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/825/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/825/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/825/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/825/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 825,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/826/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/826/relationships/out",
"data" : {
"name" : "19"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/826/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/826/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/826",
"property" : "http://localhost:7474/db/data/node/826/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/826/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/826/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/826/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/826/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/826/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/826/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/826/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 826,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/828/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/828/relationships/out",
"data" : {
"name" : "21"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/828/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/828/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/828",
"property" : "http://localhost:7474/db/data/node/828/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/828/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/828/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/828/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/828/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/828/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/828/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/828/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 828,
"labels" : [ ]
}
}, {
"labels" : "http://localhost:7474/db/data/node/838/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/838/relationships/out",
"data" : {
"name" : "31"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/838/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/838/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/838",
"property" : "http://localhost:7474/db/data/node/838/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/838/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/838/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/838/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/838/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/838/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/838/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/838/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 838,
"labels" : [ ]
}
} ]
21.18. Graph Algorithms
Neo4j comes with a number of built-in graph algorithms. They are performed from a start node. The
traversal is controlled by the URI and the body sent with the request. These are the parameters that can
be used:
algorithm   The algorithm to use. If not set, the default is shortestPath. algorithm can have one of
            these values:
              shortestPath
              allSimplePaths
              allPaths
              dijkstra (optionally with cost_property and default_cost parameters)
max_depth   The maximum depth as an integer for algorithms like shortestPath, where applicable.
            The default is 1.
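Concretely, the endpoint chosen controls how many results come back: POSTing to .../node/{id}/path returns a single path, while .../node/{id}/paths returns every match. A small sketch of assembling the URL and request body (the helper name and the hard-coded 'to' relationship type are illustrative, not part of the API):

```python
def algo_request(base, start_id, to_url, algorithm="shortestPath",
                 max_depth=1, single=True, **extra):
    """Return (url, body) for a graph-algorithm POST.

    single=True targets /path (one result); False targets /paths
    (all results). `extra` carries algorithm-specific parameters,
    e.g. cost_property for dijkstra.
    """
    url = "%s/db/data/node/%d/%s" % (base, start_id,
                                     "path" if single else "paths")
    body = {"to": to_url, "algorithm": algorithm, "max_depth": max_depth,
            "relationships": {"type": "to", "direction": "out"}}
    body.update(extra)
    return url, body

url, body = algo_request("http://localhost:7474", 32,
                         "http://localhost:7474/db/data/node/29",
                         algorithm="dijkstra", cost_property="cost")
print(url)
# http://localhost:7474/db/data/node/32/path
```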
Find all shortest paths
The shortestPath algorithm can find multiple paths between the same nodes, like in this example.
Figure21.72.Final Graph
Node[20]
nam e = 'f'
Node[21]
nam e = 'g'
to
Node[22]
nam e = 'd'
to
Node[23]
nam e = 'e'
to
to
Node[24]
nam e = 'b'
to
Node[25]
nam e = 'c'
to
to
to
Node[26]
nam e = 'a'
to
to
Example request
POST http://localhost:7474/db/data/node/26/paths
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"to" : "http://localhost:7474/db/data/node/21",
"max_depth" : 3,
"relationships" : {
"type" : "to",
"direction" : "out"
},
"algorithm" : "shortestPath"
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"directions" : [ "->", "->" ],
"start" : "http://localhost:7474/db/data/node/26",
"nodes" : [ "http://localhost:7474/db/data/node/26", "http://localhost:7474/db/data/node/22", "http://localhost:7474/db/
data/node/21" ],
"length" : 2,
"relationships" : [ "http://localhost:7474/db/data/relationship/28", "http://localhost:7474/db/data/relationship/34" ],
"end" : "http://localhost:7474/db/data/node/21"
}, {
"directions" : [ "->", "->" ],
"start" : "http://localhost:7474/db/data/node/26",
"nodes" : [ "http://localhost:7474/db/data/node/26", "http://localhost:7474/db/data/node/25", "http://localhost:7474/db/
data/node/21" ],
"length" : 2,
"relationships" : [ "http://localhost:7474/db/data/relationship/27", "http://localhost:7474/db/data/relationship/36" ],
"end" : "http://localhost:7474/db/data/node/21"
} ]
Find one of the shortest paths
If no path algorithm is specified, a shortestPath algorithm with a max depth of 1 will be chosen. In this
example, the max_depth is set to 3 in order to find the shortest path between a maximum of 3 linked
nodes.
Figure21.73.Final Graph
Node[13]
nam e = 'f'
Node[14]
nam e = 'g'
to
Node[15]
nam e = 'd'
to
Node[16]
nam e = 'e'
to
to
Node[17]
nam e = 'b'
to
Node[18]
nam e = 'c'
to
to
to
Node[19]
nam e = 'a'
to
to
Example request
POST http://localhost:7474/db/data/node/19/path
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"to" : "http://localhost:7474/db/data/node/14",
"max_depth" : 3,
"relationships" : {
"type" : "to",
"direction" : "out"
},
"algorithm" : "shortestPath"
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"directions" : [ "->", "->" ],
"start" : "http://localhost:7474/db/data/node/19",
"nodes" : [ "http://localhost:7474/db/data/node/19", "http://localhost:7474/db/data/node/18", "http://localhost:7474/db/
data/node/14" ],
"length" : 2,
"relationships" : [ "http://localhost:7474/db/data/relationship/17", "http://localhost:7474/db/data/relationship/26" ],
"end" : "http://localhost:7474/db/data/node/14"
}
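The depth-limited behaviour of shortestPath can be illustrated with a small breadth-first search sketch in Python. The graph below is a hypothetical stand-in for the example data, not the actual nodes behind the request:

```python
from collections import deque

def shortest_path(graph, start, end, max_depth):
    """Breadth-first search that stops at max_depth hops,
    mirroring the shortestPath algorithm's depth cap."""
    queue = deque([(start, [start])])
    while queue:
        node, path = queue.popleft()
        if node == end:
            return path
        if len(path) - 1 >= max_depth:   # depth limit reached
            continue
        for neighbour in graph.get(node, []):
            if neighbour not in path:    # avoid revisiting nodes on this path
                queue.append((neighbour, path + [neighbour]))
    return None

# Hypothetical stand-in for the example graph: 'a' reaches 'g'
# via 'c' in two hops, and via a longer route through 'b' and 'd'.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["g"], "d": ["g"]}
print(shortest_path(graph, "a", "g", max_depth=3))  # ['a', 'c', 'g']
```

Because BFS explores paths in order of length, the first path that reaches the target is a shortest one; with max_depth=1 no two-hop path can be found and the sketch returns None.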
Execute a Dijkstra algorithm and get a single path
This example runs a Dijkstra algorithm over a graph with different cost properties on different relationships. Note that the request URI ends with /path, which means that a single path is wanted here.
Figure21.74.Final Graph
Node[27]
nam e = 'f'
Node[29]
nam e = 'e'
to
cost = 1.2
Node[28]
nam e = 'd'
to
cost = 0.5
Node[30]
nam e = 'b'
to
cost = 0.5
Node[31]
nam e = 'c'
to
cost = 0.5
Node[32]
nam e = 'a'
to
cost = 0.5 to
cost = 1.5 to
cost = 0.5
Example request
POST http://localhost:7474/db/data/node/32/path
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"to" : "http://localhost:7474/db/data/node/29",
"cost_property" : "cost",
"relationships" : {
"type" : "to",
"direction" : "out"
},
"algorithm" : "dijkstra"
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"directions" : [ "->", "->", "->" ],
"weight" : 1.5,
"start" : "http://localhost:7474/db/data/node/32",
"nodes" : [ "http://localhost:7474/db/data/node/32", "http://localhost:7474/db/data/node/31", "http://localhost:7474/db/
data/node/28", "http://localhost:7474/db/data/node/29" ],
"length" : 3,
"relationships" : [ "http://localhost:7474/db/data/relationship/38", "http://localhost:7474/db/data/relationship/40",
"http://localhost:7474/db/data/relationship/41" ],
"end" : "http://localhost:7474/db/data/node/29"
}
Execute a Dijkstra algorithm with equal weights on relationships
The following executes a Dijkstra search on a graph with equal weights on all relationships. This example is included to show the difference when the same graph structure is used, but the path weight is simply the number of hops.
Figure21.75.Final Graph
Node[33]
nam e = 'f'
Node[35]
nam e = 'e'
to
cost = 1
Node[34]
nam e = 'd'
to
cost = 1
Node[36]
nam e = 'b'
to
cost = 1
Node[37]
nam e = 'c'
to
cost = 1
Node[38]
nam e = 'a'
to
cost = 1 to
cost = 1 to
cost = 1
Example request
POST http://localhost:7474/db/data/node/38/path
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"to" : "http://localhost:7474/db/data/node/35",
"cost_property" : "cost",
"relationships" : {
"type" : "to",
"direction" : "out"
},
"algorithm" : "dijkstra"
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"directions" : [ "->", "->" ],
"weight" : 2.0,
"start" : "http://localhost:7474/db/data/node/38",
"nodes" : [ "http://localhost:7474/db/data/node/38", "http://localhost:7474/db/data/node/33", "http://localhost:7474/db/
data/node/35" ],
"length" : 2,
"relationships" : [ "http://localhost:7474/db/data/relationship/46", "http://localhost:7474/db/data/relationship/50" ],
"end" : "http://localhost:7474/db/data/node/35"
}
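The difference between the two Dijkstra responses above can be reproduced with a plain Dijkstra sketch in Python. The edge set below is a plausible reconstruction of the example graph, chosen so that a cheap three-hop route competes with an expensive two-hop route; it is an illustration, not the actual example data:

```python
import heapq

def dijkstra(edges, start, end):
    """Plain Dijkstra: returns (total cost, path) for the cheapest path."""
    best = {start: 0.0}
    heap = [(0.0, start, [start])]
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == end:
            return cost, path
        for nxt, weight in edges.get(node, []):
            new_cost = cost + weight
            if nxt not in best or new_cost < best[nxt]:
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, path + [nxt]))
    return None

# Hypothetical reconstruction: three cheap hops a -> c -> d -> e
# versus two expensive hops a -> f -> e.
weighted = {"a": [("c", 0.5), ("f", 1.5)], "c": [("d", 0.5)],
            "d": [("e", 0.5)], "f": [("e", 1.2)]}
print(dijkstra(weighted, "a", "e"))   # (1.5, ['a', 'c', 'd', 'e'])

# With equal weights, the cheapest path is the one with fewest hops.
equal = {n: [(t, 1.0) for t, _ in out] for n, out in weighted.items()}
print(dijkstra(equal, "a", "e"))      # (2.0, ['a', 'f', 'e'])
```

With real costs the three-hop route wins (total 1.5), matching the weighted response; with uniform costs the two-hop route wins (total 2.0), matching the equal-weights response.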
Execute a Dijkstra algorithm and get multiple paths
This example runs a Dijkstra algorithm over a graph with different cost properties on different relationships. Note that the request URI ends with /paths, which means that multiple paths are returned, in case they exist.
Figure21.76.Final Graph
Node[7]
nam e = 'f'
Node[9]
nam e = 'e'
to
cost = 1.0
Node[8]
nam e = 'd'
to
cost = 0.5
Node[10]
nam e = 'b'
to
cost = 0.5
Node[11]
nam e = 'c'
to
cost = 0.5
Node[12]
nam e = 'a'
to
cost = 0.5 to
cost = 1.5 to
cost = 0.5
Example request
POST http://localhost:7474/db/data/node/12/paths
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"to" : "http://localhost:7474/db/data/node/9",
"cost_property" : "cost",
"relationships" : {
"type" : "to",
"direction" : "out"
},
"algorithm" : "dijkstra"
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"directions" : [ "->", "->" ],
"weight" : 1.5,
"start" : "http://localhost:7474/db/data/node/12",
"nodes" : [ "http://localhost:7474/db/data/node/12", "http://localhost:7474/db/data/node/7", "http://localhost:7474/db/data/
node/9" ],
"length" : 2,
"relationships" : [ "http://localhost:7474/db/data/relationship/12", "http://localhost:7474/db/data/relationship/16" ],
"end" : "http://localhost:7474/db/data/node/9"
}, {
"directions" : [ "->", "->", "->" ],
"weight" : 1.5,
"start" : "http://localhost:7474/db/data/node/12",
"nodes" : [ "http://localhost:7474/db/data/node/12", "http://localhost:7474/db/data/node/11", "http://localhost:7474/db/
data/node/8", "http://localhost:7474/db/data/node/9" ],
"length" : 3,
"relationships" : [ "http://localhost:7474/db/data/relationship/11", "http://localhost:7474/db/data/relationship/13",
"http://localhost:7474/db/data/relationship/14" ],
"end" : "http://localhost:7474/db/data/node/9"
} ]
21.19. Batch operations
Batch operations let you execute multiple API calls through a single HTTP call. This significantly improves performance for large insert and update operations.
This service is transactional. If any of the operations performed fails (returns a non-2xx HTTP status
code), the transaction will be rolled back and no changes will be applied.
Important
You cannot use this resource to execute Cypher queries with USING PERIODIC COMMIT.
Execute multiple operations in batch
The batch service expects an array of job descriptions as input, each job description describing an
action to be performed via the normal server API.
Each job description should contain a to attribute, with a value relative to the data API root (so http://localhost:7474/db/data/node becomes just /node), and a method attribute containing the HTTP verb to use. Optionally you may provide a body attribute, and an id attribute to help you keep track of responses, although responses are guaranteed to be returned in the same order the job descriptions are received.
The following figure outlines the different parts of the job descriptions:
Figure21.77.Starting Graph
Node[40]
nam e = 'John'
Node[41]
nam e = 'Joe'
knows
Figure21.78.Final Graph
Node[40]
nam e = 'John'
Node[41]
age = 1
knows
Node[42]
age = 1
Node[43]
age = 1
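A job description is just a JSON object with method and to fields plus optional body and id fields, so a batch payload can be assembled like this (a sketch using the same paths and values as the example request below):

```python
import json

def job(method, to, body=None, job_id=None):
    """Build one batch job description; 'to' is relative to the data API root."""
    desc = {"method": method, "to": to}
    if body is not None:
        desc["body"] = body
    if job_id is not None:
        desc["id"] = job_id
    return desc

batch = [
    job("PUT", "/node/41/properties", body={"age": 1}, job_id=0),
    job("GET", "/node/41", job_id=1),
    job("POST", "/node", body={"age": 1}, job_id=2),
]
payload = json.dumps(batch)  # this JSON array is POSTed to /db/data/batch
print(payload)
```

The id values are optional but make it easier to correlate each entry of the response array with the job that produced it.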
Example request
POST http://localhost:7474/db/data/batch
Accept: application/json; charset=UTF-8
Content-Type: application/json
[ {
"method" : "PUT",
"to" : "/node/41/properties",
"body" : {
"age" : 1
},
"id" : 0
}, {
"method" : "GET",
"to" : "/node/41",
"id" : 1
}, {
"method" : "POST",
"to" : "/node",
"body" : {
"age" : 1
},
"id" : 2
}, {
"method" : "POST",
"to" : "/node",
"body" : {
"age" : 1
},
"id" : 3
} ]
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"id" : 0,
"from" : "/node/41/properties"
}, {
"id" : 1,
"body" : {
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/41/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/41/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/41/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/41/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/41",
"property" : "http://localhost:7474/db/data/node/41/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/41/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/41/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/41/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/41/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/41/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/41/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/41/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 41,
"labels" : [ ]
},
"data" : {
"age" : 1
}
},
"from" : "/node/41"
}, {
"id" : 2,
"location" : "http://localhost:7474/db/data/node/42",
"body" : {
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/42/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/42/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/42/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/42/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/42",
"property" : "http://localhost:7474/db/data/node/42/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/42/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/42/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/42/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/42/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/42/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/42/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/42/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 42,
"labels" : [ ]
},
"data" : {
"age" : 1
}
},
"from" : "/node"
}, {
"id" : 3,
"location" : "http://localhost:7474/db/data/node/43",
"body" : {
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/43/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/43/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/43/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/43/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/43",
"property" : "http://localhost:7474/db/data/node/43/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/43/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/43/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/43/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/43/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/43/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/43/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/43/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 43,
"labels" : [ ]
},
"data" : {
"age" : 1
}
},
"from" : "/node"
} ]
Refer to items created earlier in the same batch job
The batch operation API allows you to refer to the URI returned from a created resource in subsequent
job descriptions, within the same batch call.
Use the {[JOB ID]} special syntax to inject URIs from created resources into JSON strings in subsequent
job descriptions.
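The substitution the server performs can be illustrated client-side with a small Python sketch: each {N} placeholder is replaced by the URI of the resource created by job N earlier in the batch. This is an illustration of the semantics, not server code:

```python
def resolve(value, created):
    """Replace {JOB ID} placeholders in a string with the URIs of
    resources created earlier in the same batch."""
    for job_id, uri in created.items():
        value = value.replace("{%d}" % job_id, uri)
    return value

# job id -> location returned for that job (values from the example below)
created = {0: "/node/33", 1: "/node/34"}
print(resolve("{0}/relationships", created))  # /node/33/relationships
print(resolve("{1}", created))                # /node/34
```

This is why later jobs can create a relationship between two nodes that did not exist when the batch request was sent.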
Figure21.79.Final Graph
Node[33]
nam e = 'bob'
Node[34]
age = 12
KNOWS
since = '2010'
Example request
POST http://localhost:7474/db/data/batch
Accept: application/json; charset=UTF-8
Content-Type: application/json
[ {
"method" : "POST",
"to" : "/node",
"id" : 0,
"body" : {
"name" : "bob"
}
}, {
"method" : "POST",
"to" : "/node",
"id" : 1,
"body" : {
"age" : 12
}
}, {
"method" : "POST",
"to" : "{0}/relationships",
"id" : 3,
"body" : {
"to" : "{1}",
"data" : {
"since" : "2010"
},
"type" : "KNOWS"
}
}, {
"method" : "POST",
"to" : "/index/relationship/my_rels",
"id" : 4,
"body" : {
"key" : "since",
"value" : "2010",
"uri" : "{3}"
}
} ]
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"id" : 0,
"location" : "http://localhost:7474/db/data/node/33",
"body" : {
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/33/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/33/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/33/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/33/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/33",
"property" : "http://localhost:7474/db/data/node/33/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/33/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/33/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/33/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/33/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/33/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/33/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/33/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 33,
"labels" : [ ]
},
"data" : {
"name" : "bob"
}
},
"from" : "/node"
}, {
"id" : 1,
"location" : "http://localhost:7474/db/data/node/34",
"body" : {
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/34/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/34/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/34/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/34/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/34",
"property" : "http://localhost:7474/db/data/node/34/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/34/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/34/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/34/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/34/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/34/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/34/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/34/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 34,
"labels" : [ ]
},
"data" : {
"age" : 12
}
},
"from" : "/node"
}, {
"id" : 3,
"location" : "http://localhost:7474/db/data/relationship/14",
"body" : {
"extensions" : { },
"start" : "http://localhost:7474/db/data/node/33",
"property" : "http://localhost:7474/db/data/relationship/14/properties/{key}",
"self" : "http://localhost:7474/db/data/relationship/14",
"properties" : "http://localhost:7474/db/data/relationship/14/properties",
"type" : "KNOWS",
"end" : "http://localhost:7474/db/data/node/34",
"metadata" : {
"id" : 14,
"type" : "KNOWS"
},
"data" : {
"since" : "2010"
}
},
"from" : "http://localhost:7474/db/data/node/33/relationships"
}, {
"id" : 4,
"location" : "http://localhost:7474/db/data/index/relationship/my_rels/since/2010/14",
"body" : {
"extensions" : { },
"start" : "http://localhost:7474/db/data/node/33",
"property" : "http://localhost:7474/db/data/relationship/14/properties/{key}",
"self" : "http://localhost:7474/db/data/relationship/14",
"properties" : "http://localhost:7474/db/data/relationship/14/properties",
"type" : "KNOWS",
"end" : "http://localhost:7474/db/data/node/34",
"metadata" : {
"id" : 14,
"type" : "KNOWS"
},
"data" : {
"since" : "2010"
},
"indexed" : "http://localhost:7474/db/data/index/relationship/my_rels/since/2010/14"
},
"from" : "/index/relationship/my_rels"
} ]
Execute multiple operations in batch streaming
Figure21.80.Final Graph
Node[89]
nam e = 'bob'
Node[90]
age = 12
KNOWS
since = '2010'
Node[92]
nam e = 'Tobias Tester'
Node[93]
nam e = 'Andres Tester'
FRIENDS
nam e = 'tobias-andres' FRIENDS
nam e = 'andres-tobias'
Node[95]
age = 1
Node[97]
nam e = 'John'
Node[98]
age = 1
knows
Node[99]
age = 1
Node[100]
age = 1
Example request
POST http://localhost:7474/db/data/batch
Accept: application/json
Content-Type: application/json
X-Stream: true
[ {
"method" : "PUT",
"to" : "/node/98/properties",
"body" : {
"age" : 1
},
"id" : 0
}, {
"method" : "GET",
"to" : "/node/98",
"id" : 1
}, {
"method" : "POST",
"to" : "/node",
"body" : {
"age" : 1
},
"id" : 2
}, {
"method" : "POST",
"to" : "/node",
"body" : {
"age" : 1
},
"id" : 3
} ]
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"id" : 0,
"from" : "/node/98/properties",
"body" : null,
"status" : 204
}, {
"id" : 1,
"from" : "/node/98",
"body" : {
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/98/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/98/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/98/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/98/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/98",
"property" : "http://localhost:7474/db/data/node/98/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/98/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/98/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/98/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/98/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/98/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/98/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/98/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 98,
"labels" : [ ]
},
"data" : {
"age" : 1
}
},
"status" : 200
}, {
"id" : 2,
"from" : "/node",
"body" : {
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/99/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/99/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/99/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/99/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/99",
"property" : "http://localhost:7474/db/data/node/99/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/99/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/99/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/99/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/99/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/99/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/99/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/99/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 99,
"labels" : [ ]
},
"data" : {
"age" : 1
}
},
"location" : "http://localhost:7474/db/data/node/99",
"status" : 201
}, {
"id" : 3,
"from" : "/node",
"body" : {
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/100/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/100/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/100/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/100/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/100",
"property" : "http://localhost:7474/db/data/node/100/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/100/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/100/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/100/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/100/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/100/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/100/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/100/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 100,
"labels" : [ ]
},
"data" : {
"age" : 1
}
},
"location" : "http://localhost:7474/db/data/node/100",
"status" : 201
} ]
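A streaming batch request differs from a regular one only in the X-Stream: true header. The following Python sketch builds (but does not send) such a request, assuming the example server URL:

```python
import json
import urllib.request

jobs = [{"method": "POST", "to": "/node", "body": {"age": 1}, "id": 0}]
req = urllib.request.Request(
    "http://localhost:7474/db/data/batch",
    data=json.dumps(jobs).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Accept": "application/json",
        "X-Stream": "true",   # ask the server to stream the response
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; here we only build the request.
print(req.get_header("X-stream"))  # urllib normalizes header capitalization
```

With streaming enabled, each response entry additionally carries a status field, since the overall HTTP status line is sent before the individual jobs have run.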
21.20. Legacy indexing
Note
This documents the legacy indexing in Neo4j, which is no longer the preferred way to handle indexes. Consider looking at Section 21.15, "Indexing" [368].
An index can contain either nodes or relationships.
Note
To create an index with default configuration, simply start using it by adding nodes/
relationships to it. It will then be automatically created for you.
What default configuration means depends on how you have configured your database. If you haven’t
changed any indexing configuration, it means the indexes will be using a Lucene-based backend.
All the examples below show how to do operations on node indexes, but they are just as applicable to relationship indexes. Simply change the "node" part of the URL to "relationship".
If you want to customize the index settings, see the section called “Create node index with
configuration” [416].
Create node index
Note
Instead of creating the index this way, you can simply start to use it, and it will be created
automatically with default configuration.
Example request
POST http://localhost:7474/db/data/index/node/
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"name" : "index_1512731131617_1"
}
Example response
201: Created
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/index/node/index_1512731131617_1/
{
"template" : "http://localhost:7474/db/data/index/node/index_1512731131617_1/{key}/{value}"
}
Create node index with configuration
This request is only necessary if you want to customize the index settings. If you are happy with the defaults, you can just start indexing nodes/relationships, as non-existent indexes will be created automatically as you do. See Section 35.10, "Configuration and fulltext indexes" [633] for more information on index configuration.
Example request
POST http://localhost:7474/db/data/index/node/
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"name" : "fulltext",
"config" : {
"type" : "fulltext",
"provider" : "lucene"
}
}
Example response
201: Created
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/index/node/fulltext/
{
"template" : "http://localhost:7474/db/data/index/node/fulltext/{key}/{value}",
"type" : "fulltext",
"provider" : "lucene"
}
Delete node index
Example request
DELETE http://localhost:7474/db/data/index/node/index_1512731130679_1
Accept: application/json; charset=UTF-8
Example response
204: No Content
List node indexes
Example request
GET http://localhost:7474/db/data/index/node/
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"index_1512731130380_1" : {
"template" : "http://localhost:7474/db/data/index/node/index_1512731130380_1/{key}/{value}",
"provider" : "lucene",
"type" : "exact"
},
"index_1512731130775_1" : {
"template" : "http://localhost:7474/db/data/index/node/index_1512731130775_1/{key}/{value}",
"provider" : "lucene",
"type" : "exact"
},
"index_1512731130053_1" : {
"template" : "http://localhost:7474/db/data/index/node/index_1512731130053_1/{key}/{value}",
"provider" : "lucene",
"type" : "exact"
},
"index_1512731130554_1" : {
"template" : "http://localhost:7474/db/data/index/node/index_1512731130554_1/{key}/{value}",
"provider" : "lucene",
"type" : "exact"
},
"index_1512731130024_1" : {
"template" : "http://localhost:7474/db/data/index/node/index_1512731130024_1/{key}/{value}",
"provider" : "lucene",
"type" : "exact"
},
"index_1512731130371_1" : {
"template" : "http://localhost:7474/db/data/index/node/index_1512731130371_1/{key}/{value}",
"provider" : "lucene",
"type" : "exact"
},
"index_1512731130827_1" : {
"template" : "http://localhost:7474/db/data/index/node/index_1512731130827_1/{key}/{value}",
"provider" : "lucene",
"type" : "exact"
},
"index_1512731130887_1" : {
"template" : "http://localhost:7474/db/data/index/node/index_1512731130887_1/{key}/{value}",
"provider" : "lucene",
"type" : "exact"
},
"index_1512731130007_1" : {
"template" : "http://localhost:7474/db/data/index/node/index_1512731130007_1/{key}/{value}",
"provider" : "lucene",
"type" : "exact"
},
"index_1512731130500_1" : {
"template" : "http://localhost:7474/db/data/index/node/index_1512731130500_1/{key}/{value}",
"provider" : "lucene",
"type" : "exact"
},
"index_1512731130883_1" : {
"template" : "http://localhost:7474/db/data/index/node/index_1512731130883_1/{key}/{value}",
"provider" : "lucene",
"type" : "exact"
},
"index_1512731130034_1" : {
"template" : "http://localhost:7474/db/data/index/node/index_1512731130034_1/{key}/{value}",
"provider" : "lucene",
"type" : "exact"
},
"index_1512731130439_1" : {
"template" : "http://localhost:7474/db/data/index/node/index_1512731130439_1/{key}/{value}",
"provider" : "lucene",
"type" : "exact"
},
"index_1512731130318_1" : {
"template" : "http://localhost:7474/db/data/index/node/index_1512731130318_1/{key}/{value}",
"provider" : "lucene",
"type" : "exact"
}
}
Add node to index
Associates a node with the given key/value pair in the given index.
Note
Spaces in the URI have to be encoded as %20.
Caution
This does not overwrite previous entries. If you index the same key/value/item combination
twice, two index entries are created. To do update-type operations, you need to delete the
old entry before adding a new one.
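The required %20 encoding can be produced with urllib.parse.quote, as in this sketch (the index name my_index is hypothetical):

```python
from urllib.parse import quote

key, value = "some-key", "some value"
# quote(..., safe="") percent-encodes spaces and all reserved characters
path = "/db/data/index/node/my_index/%s/%s" % (
    quote(key, safe=""), quote(value, safe=""))
print(path)  # /db/data/index/node/my_index/some-key/some%20value
```

Passing safe="" also encodes "/" and other reserved characters, which matters if a key or value happens to contain them.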
Example request
POST http://localhost:7474/db/data/index/node/index_1512731130380_1
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"value" : "some value",
"uri" : "http://localhost:7474/db/data/node/33",
"key" : "some-key"
}
Example response
201: Created
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/index/node/index_1512731130380_1/some-key/some%20value/33
{
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/33/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/33/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/33/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/33/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/33",
"property" : "http://localhost:7474/db/data/node/33/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/33/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/33/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/33/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/33/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/33/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/33/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/33/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 33,
"labels" : [ ]
},
"data" : { },
"indexed" : "http://localhost:7474/db/data/index/node/index_1512731130380_1/some-key/some%20value/33"
}
Remove all entries with a given node from an index
Example request
DELETE http://localhost:7474/db/data/index/node/index_1512731130827_1/38
Accept: application/json; charset=UTF-8
Example response
204: No Content
Remove all entries with a given node and key from an index
Example request
DELETE http://localhost:7474/db/data/index/node/index_1512731131119_1/kvkey2/41
Accept: application/json; charset=UTF-8
Example response
204: No Content
Remove all entries with a given node, key and value from an index
Example request
DELETE http://localhost:7474/db/data/index/node/index_1512731130439_1/kvkey1/value1/34
Accept: application/json; charset=UTF-8
Example response
204: No Content
Find node by exact match
Note
Spaces in the URI have to be encoded as %20.
Example request
GET http://localhost:7474/db/data/index/node/index_1512731131564_1/key/the%2520value
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"indexed" : "http://localhost:7474/db/data/index/node/index_1512731131564_1/key/the%2520value/49",
"labels" : "http://localhost:7474/db/data/node/49/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/49/relationships/out",
"data" : { },
"all_typed_relationships" : "http://localhost:7474/db/data/node/49/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/49/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/49",
"property" : "http://localhost:7474/db/data/node/49/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/49/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/49/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/49/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/49/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/49/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/49/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/49/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 49,
"labels" : [ ]
}
} ]
Find node by query
The query language used here depends on what type of index you are querying. The default index type is Lucene, in which case you should use the Lucene query language. Below is an example of a fuzzy search over multiple keys.
See: http://lucene.apache.org/core/3_6_2/queryparsersyntax.html
Getting the results with a predefined ordering requires adding the parameter
order=ordering
where ordering is one of index, relevance or score. In this case an additional field named score will be added to each result, holding the float score reported by the query result.
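Lucene queries contain spaces and other characters that must be percent-encoded before being placed in the query string. A sketch of building such a URL (the index name my_index is hypothetical):

```python
from urllib.parse import quote

# Fuzzy match on Name combined with an exact Gender term,
# in Lucene query syntax.
lucene_query = "Name:Build~0.1 AND Gender:Male"

# safe="" encodes spaces as %20 and also encodes ':' and other
# reserved characters, which the server decodes back.
encoded = quote(lucene_query, safe="")
url = "/db/data/index/node/my_index?query=%s&order=relevance" % encoded
print(url)
```

Appending order=relevance (or index, or score) requests the predefined ordering described above.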
Example request
GET http://localhost:7474/db/data/index/node/index_1512731131511_1?query=Name:Build~0.1%20AND
%20Gender:Male
Accept: application/json; charset=UTF-8
Example response
200: OK
Content-Type: application/json; charset=UTF-8
[ {
"labels" : "http://localhost:7474/db/data/node/48/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/48/relationships/out",
"data" : {
"Name" : "Builder"
},
"all_typed_relationships" : "http://localhost:7474/db/data/node/48/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/48/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/48",
"property" : "http://localhost:7474/db/data/node/48/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/48/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/48/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/48/relationships/in",
"extensions" : { },
"create_relationship" : "http://localhost:7474/db/data/node/48/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/48/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/48/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/48/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 48,
"labels" : [ ]
}
} ]
21.21. Unique Indexing
Note
As of Neo4j 2.0, unique constraints have been added. These make Neo4j enforce uniqueness for you, guaranteeing that it is maintained. See the section called "Constraints" [10] for details. For most use cases, unique constraints should be used rather than the features described below.
There are two modes of uniqueness enforcement:
URL parameter uniqueness=get_or_create: Create a new node/relationship and index it if no existing
one can be found. If an existing node/relationship is found, discard the sent data and return the
existing node/relationship.
URL parameter uniqueness=create_or_fail: Create a new node/relationship if no existing one can be
found in the index. If an existing node/relationship is found, return a conflict error.
For more information, see Section 18.6, “Creating unique nodes” [293].
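As a sketch, the two modes differ only in the uniqueness query parameter. The index name people below is hypothetical, and the server URL assumes a default local install:

```shell
# Hypothetical unique node index "people" on a default local install.
BASE="http://localhost:7474/db/data/index/node/people"
BODY='{"key":"name","value":"Tobias","properties":{"name":"Tobias","sequence":1}}'

# get_or_create: 201 Created for a new node, or 200 OK with the existing one.
GET_OR_CREATE_URL="$BASE?uniqueness=get_or_create"
# create_or_fail: 201 Created for a new node, or 409 Conflict if one exists.
CREATE_OR_FAIL_URL="$BASE?uniqueness=create_or_fail"

# Against a running server, either mode is invoked the same way:
# curl -X POST -H 'Content-Type: application/json' \
#      -d "$BODY" "$GET_OR_CREATE_URL"
echo "$GET_OR_CREATE_URL"
echo "$CREATE_OR_FAIL_URL"
```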
Get or create unique node (create)
The node is created if it doesn’t exist in the unique index already.
Example request
POST http://localhost:7474/db/data/index/node/index_1512731131455_1?uniqueness=get_or_create
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"key" : "name",
"value" : "Tobias",
"properties" : {
"name" : "Tobias",
"sequence" : 1
}
}
Example response
201: Created
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/index/node/index_1512731131455_1/name/Tobias/47
{
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/47/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/47/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/47/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/47/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/47",
"property" : "http://localhost:7474/db/data/node/47/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/47/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/47/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/47/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/47/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/47/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/47/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/47/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 47,
"labels" : [ ]
},
"data" : {
"sequence" : 1,
"name" : "Tobias"
},
"indexed" : "http://localhost:7474/db/data/index/node/index_1512731131455_1/name/Tobias/47"
}
Get or create unique node (existing)
Here no node is created, because another node is already indexed with the same data; the existing
node is returned instead, and the sent properties are discarded.
Example request
POST http://localhost:7474/db/data/index/node/index_1512731130775_1?uniqueness=get_or_create
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"key" : "name",
"value" : "Peter",
"properties" : {
"name" : "Peter",
"sequence" : 2
}
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/index/node/index_1512731130775_1/name/Peter/37
{
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/37/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/37/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/37/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/37/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/37",
"property" : "http://localhost:7474/db/data/node/37/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/37/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/37/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/37/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/37/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/37/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/37/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/37/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 37,
"labels" : [ ]
},
"data" : {
"sequence" : 1,
"name" : "Peter"
},
"indexed" : "http://localhost:7474/db/data/index/node/index_1512731130775_1/name/Peter/37"
}
Create a unique node or return fail (create)
In this mode, an error is returned if a matching node already exists. In this example, no existing
indexed node is found, so a new node is created.
Example request
POST http://localhost:7474/db/data/index/node/index_1512731131401_1?uniqueness=create_or_fail
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"key" : "name",
"value" : "Tobias",
"properties" : {
"name" : "Tobias",
"sequence" : 1
}
}
Example response
201: Created
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/index/node/index_1512731131401_1/name/Tobias/46
{
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/46/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/46/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/46/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/46/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/46",
"property" : "http://localhost:7474/db/data/node/46/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/46/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/46/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/46/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/46/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/46/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/46/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/46/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 46,
"labels" : [ ]
},
"data" : {
"sequence" : 1,
"name" : "Tobias"
},
"indexed" : "http://localhost:7474/db/data/index/node/index_1512731131401_1/name/Tobias/46"
}
Create a unique node or return fail (fail)
In this mode, an error is returned if a matching node already exists. In this example, an existing node
indexed with the same data is found, so an error is returned.
Example request
POST http://localhost:7474/db/data/index/node/index_1512731130318_1?uniqueness=create_or_fail
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"key" : "name",
"value" : "Peter",
"properties" : {
"name" : "Peter",
"sequence" : 2
}
}
Example response
409: Conflict
Content-Type: application/json; charset=UTF-8
{
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/31/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/31/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/31/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/31/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/31",
"property" : "http://localhost:7474/db/data/node/31/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/31/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/31/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/31/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/31/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/31/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/31/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/31/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 31,
"labels" : [ ]
},
"data" : {
"sequence" : 1,
"name" : "Peter"
},
"indexed" : "http://localhost:7474/db/data/index/node/index_1512731130318_1/name/Peter/31"
}
Add an existing node to unique index (not indexed)
Associates a node with the given key/value pair in the given unique index.
In this example, we are using create_or_fail uniqueness.
Example request
POST http://localhost:7474/db/data/index/node/index_1512731131177_1?uniqueness=create_or_fail
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"value" : "some value",
"uri" : "http://localhost:7474/db/data/node/42",
"key" : "some-key"
}
Example response
201: Created
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/index/node/index_1512731131177_1/some-key/some%20value/42
{
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/42/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/42/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/42/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/42/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/42",
"property" : "http://localhost:7474/db/data/node/42/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/42/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/42/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/42/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/42/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/42/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/42/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/42/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 42,
"labels" : [ ]
},
"data" : { },
"indexed" : "http://localhost:7474/db/data/index/node/index_1512731131177_1/some-key/some%20value/42"
}
Add an existing node to unique index (already indexed)
In this case, the node already exists in the index, so we get an HTTP 409 status response, as we
have set the uniqueness to create_or_fail.
Example request
POST http://localhost:7474/db/data/index/node/index_1512731131320_1?uniqueness=create_or_fail
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"value" : "some value",
"uri" : "http://localhost:7474/db/data/node/45",
"key" : "some-key"
}
Example response
409: Conflict
Content-Type: application/json; charset=UTF-8
{
"extensions" : { },
"labels" : "http://localhost:7474/db/data/node/44/labels",
"outgoing_relationships" : "http://localhost:7474/db/data/node/44/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/44/relationships/all/{-list|&|types}",
"traverse" : "http://localhost:7474/db/data/node/44/traverse/{returnType}",
"self" : "http://localhost:7474/db/data/node/44",
"property" : "http://localhost:7474/db/data/node/44/properties/{key}",
"properties" : "http://localhost:7474/db/data/node/44/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/44/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/44/relationships/in",
"create_relationship" : "http://localhost:7474/db/data/node/44/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/44/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/44/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/44/relationships/in/{-list|&|types}",
"metadata" : {
"id" : 44,
"labels" : [ ]
},
"data" : {
"some-key" : "some value"
},
"indexed" : "http://localhost:7474/db/data/index/node/index_1512731131320_1/some-key/some%20value/44"
}
Get or create unique relationship (create)
Create a unique relationship in an index. If a relationship matching the given key and value already
exists in the index, it will be returned. If not, a new relationship will be created.
Note
The type and direction of the relationship are not considered when determining uniqueness.
Example request
POST http://localhost:7474/db/data/index/relationship/index_1512731135301_1/?uniqueness=get_or_create
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"key" : "name",
"value" : "Tobias",
"start" : "http://localhost:7474/db/data/node/141",
"end" : "http://localhost:7474/db/data/node/142",
"type" : "knowledge"
}
Example response
201: Created
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/index/relationship/index_1512731135301_1/name/Tobias/63
{
"extensions" : { },
"start" : "http://localhost:7474/db/data/node/141",
"property" : "http://localhost:7474/db/data/relationship/63/properties/{key}",
"self" : "http://localhost:7474/db/data/relationship/63",
"properties" : "http://localhost:7474/db/data/relationship/63/properties",
"type" : "knowledge",
"end" : "http://localhost:7474/db/data/node/142",
"metadata" : {
"id" : 63,
"type" : "knowledge"
},
"data" : {
"name" : "Tobias"
},
"indexed" : "http://localhost:7474/db/data/index/relationship/index_1512731135301_1/name/Tobias/63"
}
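The relationship variant takes the same key and value, plus start, end, and type fields. A minimal sketch of such a request body follows; the node URIs are placeholders, not real nodes:

```shell
# Placeholder node URIs; in practice use the "self" links of existing nodes.
START_NODE="http://localhost:7474/db/data/node/1"
END_NODE="http://localhost:7474/db/data/node/2"
REL_BODY="{\"key\":\"name\",\"value\":\"Tobias\",\"start\":\"$START_NODE\",\"end\":\"$END_NODE\",\"type\":\"KNOWS\"}"

# POST this body to .../index/relationship/<index>?uniqueness=get_or_create
echo "$REL_BODY"
```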
Get or create unique relationship (existing)
Here, since the relationship already exists, the sent data is ignored and the existing relationship is
returned.
Example request
POST http://localhost:7474/db/data/index/relationship/index_1512731135386_1?uniqueness=get_or_create
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"key" : "name",
"value" : "Peter",
"start" : "http://localhost:7474/db/data/node/145",
"end" : "http://localhost:7474/db/data/node/146",
"type" : "KNOWS"
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
"extensions" : { },
"start" : "http://localhost:7474/db/data/node/143",
"property" : "http://localhost:7474/db/data/relationship/64/properties/{key}",
"self" : "http://localhost:7474/db/data/relationship/64",
"properties" : "http://localhost:7474/db/data/relationship/64/properties",
"type" : "KNOWS",
"end" : "http://localhost:7474/db/data/node/144",
"metadata" : {
"id" : 64,
"type" : "KNOWS"
},
"data" : { },
"indexed" : "http://localhost:7474/db/data/index/relationship/index_1512731135386_1/name/Peter/64"
}
Create a unique relationship or return fail (create)
In this mode, an error is returned if a matching relationship already exists. In this example, no
existing relationship is found, so a new relationship is created.
Example request
POST http://localhost:7474/db/data/index/relationship/index_1512731135526_1?uniqueness=create_or_fail
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"key" : "name",
"value" : "Tobias",
"start" : "http://localhost:7474/db/data/node/153",
"end" : "http://localhost:7474/db/data/node/154",
"type" : "KNOWS"
}
Example response
201: Created
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/index/relationship/index_1512731135526_1/name/Tobias/67
{
"extensions" : { },
"start" : "http://localhost:7474/db/data/node/153",
"property" : "http://localhost:7474/db/data/relationship/67/properties/{key}",
"self" : "http://localhost:7474/db/data/relationship/67",
"properties" : "http://localhost:7474/db/data/relationship/67/properties",
"type" : "KNOWS",
"end" : "http://localhost:7474/db/data/node/154",
"metadata" : {
"id" : 67,
"type" : "KNOWS"
},
"data" : {
"name" : "Tobias"
},
"indexed" : "http://localhost:7474/db/data/index/relationship/index_1512731135526_1/name/Tobias/67"
}
Create a unique relationship or return fail (fail)
In this mode, an error is returned if a matching relationship already exists. In this example, an
existing relationship is found, so an error is returned.
Example request
POST http://localhost:7474/db/data/index/relationship/index_1512731135143_1?uniqueness=create_or_fail
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"key" : "name",
"value" : "Peter",
"start" : "http://localhost:7474/db/data/node/133",
"end" : "http://localhost:7474/db/data/node/134",
"type" : "KNOWS"
}
Example response
409: Conflict
Content-Type: application/json; charset=UTF-8
{
"extensions" : { },
"start" : "http://localhost:7474/db/data/node/131",
"property" : "http://localhost:7474/db/data/relationship/59/properties/{key}",
"self" : "http://localhost:7474/db/data/relationship/59",
"properties" : "http://localhost:7474/db/data/relationship/59/properties",
"type" : "KNOWS",
"end" : "http://localhost:7474/db/data/node/132",
"metadata" : {
"id" : 59,
"type" : "KNOWS"
},
"data" : { },
"indexed" : "http://localhost:7474/db/data/index/relationship/index_1512731135143_1/name/Peter/59"
}
Add an existing relationship to a unique index (not indexed)
Associates a relationship with the given key/value pair in the given unique index. If a relationship is
already indexed under that key and value, an HTTP 409 (Conflict) status will be returned instead, as we
are using create_or_fail uniqueness here. It’s possible to use get_or_create uniqueness as well.
Note
The type and direction of the relationship are not considered when determining uniqueness.
Example request
POST http://localhost:7474/db/data/index/relationship/index_1512731135093_1?uniqueness=create_or_fail
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"key" : "name",
"value" : "Peter",
"uri" : "http://localhost:7474/db/data/relationship/58"
}
Example response
201: Created
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/index/relationship/index_1512731135093_1/name/Peter/58
{
"extensions" : { },
"start" : "http://localhost:7474/db/data/node/129",
"property" : "http://localhost:7474/db/data/relationship/58/properties/{key}",
"self" : "http://localhost:7474/db/data/relationship/58",
"properties" : "http://localhost:7474/db/data/relationship/58/properties",
"type" : "KNOWS",
"end" : "http://localhost:7474/db/data/node/130",
"metadata" : {
"id" : 58,
"type" : "KNOWS"
},
"data" : { },
"indexed" : "http://localhost:7474/db/data/index/relationship/index_1512731135093_1/name/Peter/58"
}
Add an existing relationship to a unique index (already indexed)
Example request
POST http://localhost:7474/db/data/index/relationship/index_1512731135193_1?uniqueness=create_or_fail
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
"key" : "name",
"value" : "Peter",
"uri" : "http://localhost:7474/db/data/relationship/61"
}
Example response
409: Conflict
Content-Type: application/json; charset=UTF-8
{
"extensions" : { },
"start" : "http://localhost:7474/db/data/node/135",
"property" : "http://localhost:7474/db/data/relationship/60/properties/{key}",
"self" : "http://localhost:7474/db/data/relationship/60",
"properties" : "http://localhost:7474/db/data/relationship/60/properties",
"type" : "KNOWS",
"end" : "http://localhost:7474/db/data/node/136",
"metadata" : {
"id" : 60,
"type" : "KNOWS"
},
"data" : { },
"indexed" : "http://localhost:7474/db/data/index/relationship/index_1512731135193_1/name/Peter/60"
}
21.22. WADL Support
The Neo4j REST API is a truly RESTful interface relying on hypermedia controls (links) to advertise
permissible actions to users. Hypermedia is a dynamic interface style where declarative constructs
(semantic markup) are used to inform clients of their next legal choices just in time.
Caution
RESTful APIs cannot be modeled by static interface description languages like WSDL or
WADL.
However, for some use cases developers may wish to expose WADL descriptions of the Neo4j REST API,
particularly when using tooling that expects them.
In those cases, WADL generation can be enabled by adding the following to your server’s conf/neo4j.properties file:
unsupported_wadl_generation_enabled=true
Caution
WADL is not an officially supported part of the Neo4j server API, because WADL is
insufficiently expressive to capture the set of potential interactions a client can drive
with the Neo4j server. Expect the WADL description to be incomplete and, in some cases,
contradictory to the real API. Wherever the WADL description disagrees with the REST API,
the REST API should be considered authoritative. WADL generation may be withdrawn at any
point in the Neo4j release cycle.
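Once the property is set and the server restarted, the generated description can be fetched over HTTP. The path below (application.wadl under the data API root) is an assumption based on common JAX-RS behavior, not a documented endpoint; verify it against your own server:

```shell
# Assumed WADL location (JAX-RS/Jersey convention) -- an assumption, not
# a documented Neo4j endpoint. Requires unsupported_wadl_generation_enabled=true.
WADL_URL="http://localhost:7474/db/data/application.wadl"

# Against a running server:
# curl -s "$WADL_URL"
echo "$WADL_URL"
```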
21.23. Using the REST API from WebLogic
When deploying an application to WebLogic you may run into problems when Neo4j responds with an
HTTP status of 204 No Content. The response does not contain an entity body in such cases, which can
cause WebLogic to throw java.net.SocketTimeoutException: Read timed out for no obvious reason.
If you encounter this, try setting UseSunHttpHandler to true. You can, for example, do this by adding
the following to the WebLogic startup script:
-DUseSunHttpHandler=true
The WebLogic startup script is called bin/startWebLogic.sh (bin\startWebLogic.cmd on Windows).
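As a sketch, assuming the conventional JAVA_OPTIONS variable honored by WebLogic start scripts, the flag can be appended near the top of bin/startWebLogic.sh:

```shell
# Work around read timeouts on 204 No Content responses from Neo4j.
# JAVA_OPTIONS is the variable conventionally read by startWebLogic.sh.
JAVA_OPTIONS="${JAVA_OPTIONS} -DUseSunHttpHandler=true"
echo "$JAVA_OPTIONS"
```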
Chapter22.Deprecations
This section outlines deprecations in Neo4j 2.3.12 or earlier in order to help you find a smoother
transition path to future releases. All features listed below may be removed in a future major release.
Cypher ExecutionEngine There’s no need to use ExecutionEngine anymore. Instead, use the
execute1 methods on GraphDatabaseService. ExecutionEngine and the
related classes/interfaces have been deprecated; see the javadocs for
details.
Embedded Java API See Deprecated list in Javadoc2.
Graph Matching The graph-matching component will be removed in future releases.
Windows scripts The .bat files used to operate the database and tools on Windows
are being phased out and will be removed in future releases, in favor
of modern, equivalent PowerShell scripts. For more information, see
Section23.3, “Windows PowerShell module” [442].
STR() function The STR() function is deprecated from Neo4j version 2.3 and onwards.
HAS() function The HAS() function is deprecated from Neo4j version 2.3 and onwards.
Please use EXISTS() instead.
Bare node patterns The usage of node identifiers without enclosing them in parentheses,
such as in MATCH n-->m RETURN n.prop, is deprecated from Neo4j
version 2.3 and onwards. Please use MATCH (n)-->(m) RETURN n.prop
instead.
Create nodes from a parameter list of maps The syntax CREATE (:X {param}), where param is a list of maps, is
deprecated in Neo4j version 2.3 and onwards. To achieve the same
functionality, use UNWIND instead: UNWIND {param} AS props CREATE (n:X)
SET n = props
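Side by side, the deprecated form and its UNWIND replacement look like this (X and param are the placeholders from the entry above):

```cypher
// Deprecated: param is a list of maps
CREATE (:X {param})

// Preferred from Neo4j 2.3 onwards
UNWIND {param} AS props
CREATE (n:X)
SET n = props
```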
1 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/GraphDatabaseService.html#execute(java.lang.String)
2 http://neo4j.com/docs/2.3.12/javadocs/deprecated-list.html
PartV.Operations
This part describes how to install and maintain a Neo4j installation. This includes topics such as backing
up the database and monitoring the health of the database as well as diagnosing issues.
23. Installation & Deployment ............................................................................................................... 437
23.1. System Requirements ........................................................................................................... 438
23.2. Server Installation ................................................................................................................. 439
23.3. Windows PowerShell module ............................................................................................... 442
23.4. Upgrading .............................................................................................................................. 445
23.5. Setup for remote debugging ................................................................................................ 446
23.6. Usage Data Collector ............................................................................................................ 447
24. Configuration & Performance .......................................................................................................... 448
24.1. Introduction ........................................................................................................................... 449
24.2. Server Configuration ............................................................................................................. 450
24.3. Server Performance Tuning .................................................................................................. 454
24.4. Performance Guide ............................................................................................................... 455
24.5. Logical logs ............................................................................................................................ 459
24.6. Compressed storage of property values .............................................................................. 460
24.7. Memory mapped IO settings ................................................................................................ 462
24.8. Configuration Settings Reference ......................................................................................... 464
25. High Availability ................................................................................................................................ 472
25.1. Architecture ........................................................................................................................... 473
25.2. HA Setup and configuration ................................................................................................. 474
25.3. How Neo4j HA operates ....................................................................................................... 480
25.4. Arbiter Instances ................................................................................................................... 481
25.5. Upgrade of a Neo4j HA Cluster ............................................................................................ 482
25.6. High Availability setup tutorial .............................................................................................. 483
25.7. REST endpoint for HA status information ............................................................................ 489
25.8. Setting up HAProxy as a load balancer ................................................................................ 491
26. Backup .............................................................................................................................................. 494
26.1. Introducing Backup ............................................................................................................... 495
26.2. Performing Backups .............................................................................................................. 497
26.3. Restoring Your Data .............................................................................................................. 498
27. Security ............................................................................................................................................. 499
27.1. Securing Neo4j Server .......................................................................................................... 500
28. Monitoring ........................................................................................................................................ 505
28.1. Adjusting remote JMX access to the Neo4j Server ............................................................... 506
28.2. How to connect to a Neo4j instance using JMX and JConsole .............................................. 507
28.3. How to connect to the JMX monitoring programmatically ................................................... 510
28.4. Reference of supported JMX MBeans ................................................................................... 511
28.5. Metrics Reporting .................................................................................................................. 523
Chapter23.Installation & Deployment
Neo4j is accessed as a standalone server, either directly through a REST interface or through a
language-specific driver.
Neo4j can be installed as a server, running either as a headless application or system service. For
information on installing The Neo4j Server, see Section23.2, “Server Installation” [439].
For running Neo4j in high availability mode, see Chapter25, High Availability [472].
23.1. System Requirements
As always, memory constrains graph size and disk I/O constrains read/write performance.
CPU
Performance is generally memory or I/O bound for large graphs, and compute bound for graphs that fit
in memory.
Minimum Intel Core i3
Recommended Intel Core i7
IBM POWER8
Memory
More memory allows larger graphs, but it needs to be configured properly to avoid long garbage
collection pauses. See Section 24.4, “Performance Guide” [455] for suggestions.
Minimum 2GB
Recommended 16-32GB or more
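For the bundled server, heap limits are set in conf/neo4j-wrapper.conf; the values below (in MB) are illustrative only, not recommendations for any particular workload:

```
# Illustrative heap settings for a machine with ample RAM
wrapper.java.initmemory=2048
wrapper.java.maxmemory=8192
```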
Disk
Aside from capacity, the performance characteristics of the disk are the most important when selecting
storage. Neo4j workloads tend significantly toward random reads. Prefer media with low average seek
time: SSD over spinning disks. Consult the section called “Disks, RAM and other tips” [456] for more
details.
Minimum 10GB SATA
Recommended SSD w/ SATA
Filesystem
For proper ACID behavior, the filesystem must support flush (fsync, fdatasync). See the section called
“Linux filesystem tuning” [457] for a discussion on how to configure the filesystem in Linux for
optimal performance.
Minimum ext4 (or similar)
Recommended ext4, ZFS
Software
Neo4j requires a Java Virtual Machine to operate. Community Edition installers for Windows and Mac
include a JVM for convenience. Other distributions, including all distributions of Neo4j Enterprise
Edition, require that the JVM be provided.
Java OpenJDK 8 (preferred) or 71 or Oracle Java 8 (preferred) or 72
IBM Java 8 (preferred) or 73 (POWER8 only)
Operating Systems Linux, HP-UX, Windows Server 2012 for production
Additionally, Windows XP and Mac OS X for development
Architectures x86
OpenPOWER (POWER8, requires IBM Java or OpenJDK, version 7 or 8)
Important
It is strongly recommended to use an up-to-date Java release as there are issues in early
versions of both Java 7 and Java 8 that are known to adversely affect Neo4j.
1 http://openjdk.java.net/
2 http://www.oracle.com/technetwork/java/javase/downloads/index.html
3 http://www.ibm.com/developerworks/java/jdk/
23.2. Server Installation
Deployment Scenarios
As a developer, you may wish to download Neo4j and run it locally on your desktop computer. We
recommend this as an easy way to discover Neo4j.
For Windows, see the section called “Windows” [439].
For Unix/Linux, see the section called “Linux” [440].
For OSX, see the section called “Mac OSX” [440].
As a systems administrator, you may wish to install Neo4j using a packaging system so you can ensure
that a cluster of machines have identical installs. See the section called “Linux Packages” [440] for
more information on this.
For information on High Availability, please refer to Chapter 25, High Availability [472].
Prerequisites
With the exception of our Windows and Mac Installers, you’ll need a Java Virtual Machine installed on
your computer. We recommend that you install OpenJDK 8 (preferred) or 74 or Oracle Java 8 (preferred)
or 75.
Setting Proper File Permissions
When installing Neo4j Server, keep in mind that the bin/neo4j executable will need to be run by
some OS system user, and that user will need write permissions to some files/directories. This goes
specifically for the data/graph.db directory. That user will also need execute permissions on other files,
such as those in bin/.
It is recommended to either choose or create a user who will own and manage the Neo4j Server. This
user should own the entire Neo4j directory, so make sure to untar/unzip it as this user rather than with
sudo (Unix/Linux/OS X).
If data/graph.db is not writable by the user, Neo4j won’t be able to write anything to the store or
its log files. As a result, any logs would be appended to console.log. The following error message
indicates a possible permissions issue: Write transactions to database disabled.
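These steps can be sketched as follows; the user name and install path are examples only, and the privileged commands are left commented since they require root:

```shell
# Example service user and install location (not mandated by Neo4j).
NEO4J_USER="neo4j"
NEO4J_HOME="/opt/neo4j-community-2.3.12"

# Create the user, give it ownership of the install, and start as that user:
# sudo useradd --system --create-home "$NEO4J_USER"
# sudo chown -R "$NEO4J_USER": "$NEO4J_HOME"
# sudo -u "$NEO4J_USER" "$NEO4J_HOME/bin/neo4j" start
echo "$NEO4J_HOME should be owned by $NEO4J_USER"
```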
Windows
Windows Installer
1. Download the version that you want from http://neo4j.com/download/.
Select the appropriate version and architecture for your platform.
2. Double-click the downloaded installer file.
3. Follow the prompts.
Note
The installer will prompt to be granted Administrator privileges. Newer versions of Windows
come with a SmartScreen feature that may prevent the installer from running; you can
make it run anyway by clicking "More info" on the "Windows protected your PC" screen.
Tip
If you install Neo4j using the Windows installer and you already have an existing instance
of Neo4j, the installer will select a new install directory by default. If you specify the same
directory, it will ask if you want to upgrade. This should proceed without issue, although
some users have reported a "JRE is damaged" error. If you see this error, simply install Neo4j
into a different location.
Windows Console Application
1. Download the latest release from http://neo4j.com/download/.
Select the appropriate Zip distribution.
2. Right-click the downloaded file, click Extract All.
Refer to the top-level extracted directory as: NEO4J_HOME
3. Consult Section23.3, “Windows PowerShell module” [442] for how to start or install Neo4j.
Note
Some users have reported problems on Windows when using the ZoneAlarm firewall. If you
are having problems getting large responses from the server, or if the web interface does
not work, try disabling ZoneAlarm. Contact ZoneAlarm support to get information on how to
resolve this.
Linux
Linux Packages
For Debian packages, see the instructions at http://debian.neo4j.org/.
After installation you may have to do some platform-specific configuration and performance tuning. For
that, refer to Section 24.4, “Performance Guide” [455].
Unix Console Application
1. Download the latest release from http://neo4j.com/download/.
Select the appropriate tar.gz distribution for your platform.
2. Extract the contents of the archive, using: tar -xf <filename>
Refer to the top-level extracted directory as: NEO4J_HOME
3. Change directory to: $NEO4J_HOME
Run: ./bin/neo4j console
4. Stop the server by typing Ctrl-C in the console.
Linux Service
The neo4j command can also be used with start, stop, restart or status instead of console. By using
these actions, you can create a Neo4j service. See the neo4j man page for further details.
Caution
This approach to running Neo4j as a service is deprecated. We strongly advise you to run
Neo4j from a package where feasible.
You can build your own init.d script. See for instance the Linux Standard Base specification on system
initialization (http://refspecs.linuxfoundation.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/tocsysinit.html),
or one of the many samples (https://gist.github.com/chrisvest/7673244) and tutorials
(http://www.linux.com/learn/tutorials/442412-managing-linux-daemons-with-init-scripts).
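The dispatch pattern of such a script can be sketched as follows. The installation path is an illustrative assumption, and the function only echoes the command it would run rather than invoking a real installation:

```shell
# Sketch of an init.d-style dispatcher around the neo4j command.
# The /opt/neo4j path is an assumption; a real script would exec the
# binary instead of echoing it.
neo4j_service() {
  home=/opt/neo4j   # illustrative installation directory
  case "$1" in
    start|stop|restart|status)
      echo "would run: $home/bin/neo4j $1"
      ;;
    *)
      echo "usage: neo4j_service {start|stop|restart|status}" >&2
      return 1
      ;;
  esac
}
neo4j_service start
```

A real init script would additionally handle run-as-user switching and PID tracking, which is why the packaged service scripts are preferred.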
Mac OSX
Mac OSX Installer
1. Download the .dmg installer that you want from http://neo4j.com/download/.
2. Click the downloaded installer file.
3. Drag the Neo4j icon into the Applications folder.
Tip
If you install Neo4j using the Mac installer and already have an existing instance of Neo4j, the
installer will ensure that both the old and new versions can co-exist on your system.
Running Neo4j from the Terminal
The server can be started in the background from the terminal with the command neo4j start, and
then stopped again with neo4j stop. The server can also be started in the foreground with neo4j
console, in which case its log output will be printed to the terminal.
The neo4j-shell command can be used to interact with Neo4j from the command line using Cypher. It
will automatically connect to any server that is running on localhost with the default port; otherwise
it will show a help message. You can alternatively start the shell with an embedded Neo4j instance
by using the -path path/to/data argument; note that only a single instance of Neo4j can access the
database files at a time.
OSX Service
Use the standard OSX system tools to create a service based on the neo4j command.
A note on Java on OS X Mavericks
Unlike previous versions, OS X Mavericks does not come with Java pre-installed. You might encounter
this the first time you run Neo4j, when OS X will show a popup offering to install Java SE 6.
Java SE 6 is incompatible with Neo4j 2.3.12, so we strongly advise you to skip installing it if you
have no other uses for it. Instead, for Neo4j 2.3.12 we recommend you install Java SE 8 (preferred) or 7
from Oracle (http://www.oracle.com/technetwork/java/javase/downloads/index.html), as that is what we
support for production use.
Multiple Server instances on one machine
Neo4j can be set up to run as several instances on one machine, providing for instance several
databases for development.
For how to set this up, see the section called “Alternative setup: Creating a local cluster for
testing” [486]. Just use the Neo4j edition of your choice, follow the guide, and remember not to set the
servers to run in HA mode.
23.3. Windows PowerShell module
The Neo4j PowerShell module allows administrators to:
• audit and set Neo4j configuration settings,
• install, start and stop Neo4j Windows® Services, and
• start tools, such as Neo4j Shell and Neo4j Import.
The PowerShell module is installed as part of the ZIP file distributions of Neo4j (http://neo4j.com/download/).
System Requirements
• Requires PowerShell v2.0 or above.
• Supported on either 32 or 64 bit operating systems.
How do I import the module?
The module file is located in the bin directory of your Neo4j installation, i.e. where you unzipped the
downloaded file. For example, if Neo4j was installed in C:\Neo4j then the module would be imported
like this:
Import-Module C:\Neo4j\bin\Neo4j-Management.psd1
This will add the module to the current session.
Note
On Windows it is sometimes necessary to Unblock a downloaded zip file before you can
import its contents as a module. If you right-click on the zip file and choose "Properties" you
will get a dialog. Bottom-right on that dialog you will find an "Unblock" button. Click that.
Then you should be able to import the module.
Note
Running scripts has to be enabled on the system. This can for example be achieved by
executing the following from an elevated PowerShell prompt:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned
For more information see About execution policies (http://go.microsoft.com/fwlink/?LinkID=135).
Once the module has been imported you can start an interactive console version of a Neo4j Server like
this:
'C:\Neo4j' | Start-Neo4jServer -Console -Wait
To stop the server, issue Ctrl-C in the console window that was created by the command.
How do I get help about the module?
Once the module is imported you can query the available commands like this:
Get-Command -Module Neo4j-Management
The output should be similar to the following:
CommandType Name Version Source
----------- ---- ------- ------
Function Get-Neo4jHome 2.3.0 Neo4j-Management
Function Get-Neo4jServer 2.3.0 Neo4j-Management
Function Get-Neo4jSetting 2.3.0 Neo4j-Management
Function Initialize-Neo4jHACluster 2.3.0 Neo4j-Management
Function Initialize-Neo4jServer 2.3.0 Neo4j-Management
Function Install-Neo4jArbiter 2.3.0 Neo4j-Management
Function Install-Neo4jServer 2.3.0 Neo4j-Management
Function Remove-Neo4jSetting 2.3.0 Neo4j-Management
Function Restart-Neo4jArbiter 2.3.0 Neo4j-Management
Function Restart-Neo4jServer 2.3.0 Neo4j-Management
Function Set-Neo4jSetting 2.3.0 Neo4j-Management
Function Start-Neo4jArbiter 2.3.0 Neo4j-Management
Function Start-Neo4jBackup 2.3.0 Neo4j-Management
Function Start-Neo4jImport 2.3.0 Neo4j-Management
Function Start-Neo4jServer 2.3.0 Neo4j-Management
Function Start-Neo4jShell 2.3.0 Neo4j-Management
Function Stop-Neo4jArbiter 2.3.0 Neo4j-Management
Function Stop-Neo4jServer 2.3.0 Neo4j-Management
Function Uninstall-Neo4jArbiter 2.3.0 Neo4j-Management
Function Uninstall-Neo4jServer 2.3.0 Neo4j-Management
The module also supports the standard PowerShell help commands.
Get-Help Initialize-Neo4jServer
To see examples for a command, do like this:
Get-Help Initialize-Neo4jServer -examples
Basic Examples
Retrieve basic information about the Neo4j Server, e.g. version and edition.
Get-Neo4jServer C:\Neo4j
Retrieve all of the settings of a Neo4j Server and display them in a table.
'C:\Neo4j' | Get-Neo4jSetting | `
Select ConfigurationFile, Name, Value | `
Format-Table
The module uses the pipeline, so you can export the settings, modify them or filter them. For example, to
show only settings with the value True:
'C:\Neo4j' | Get-Neo4jSetting | `
Where { $_.Value -eq 'True' } | `
Select ConfigurationFile, Name, Value | `
Format-Table
Quickly configure a Neo4j Server from saved settings in a CSV file.
Import-CSV -Path 'C:\Neo4jSettings.CSV' | Set-Neo4jSetting -Force
Advanced examples
You can quickly configure and start an interactive console version of a Neo4j Server like this:
'C:\Neo4j' | `
Initialize-Neo4jServer -ListenOnIPAddress 127.0.0.1 -PassThru | `
Start-Neo4jServer -Console -Wait
To stop the server, issue Ctrl-C in the console window that was created by the command.
You can quickly configure and start a Service version of a Neo4j Server.
Note
The following must be executed from an elevated PowerShell prompt, where the Neo4j
module has been imported into the session.
'C:\Neo4j' | `
Initialize-Neo4jServer -ListenOnIPAddress 127.0.0.1 -PassThru | `
Install-Neo4jServer -PassThru | `
Start-Neo4jServer
To stop the server do this:
'C:\Neo4j' | Stop-Neo4jServer
Create a three node cluster on the local computer. This example assumes the Enterprise version of
Neo4j is installed at C:\Neo4j-1, C:\Neo4j-2 and C:\Neo4j-3.
'C:\Neo4j-1' |
Initialize-Neo4jServer `
-ListenOnIPAddress 127.0.0.1 `
-HTTPPort 7474 `
-OnlineBackupServer '127.0.0.1:6362' `
-PassThru |
Initialize-Neo4jHACluster `
-ServerID 1 `
-InitialHosts '127.0.0.1:5001' `
-ClusterServer '127.0.0.1:5001' `
-HAServer '127.0.0.1:6001' `
-PassThru |
Start-Neo4jServer -Console
'C:\Neo4j-2' |
Initialize-Neo4jServer `
-ListenOnIPAddress 127.0.0.1 `
-HTTPPort 7475 `
-ClearExistingDatabase `
-OnlineBackupServer '127.0.0.1:6363' `
-PassThru |
Initialize-Neo4jHACluster `
-ServerID 2 `
-InitialHosts '127.0.0.1:5001' `
-ClusterServer '127.0.0.1:5002' `
-HAServer '127.0.0.1:6002' `
-DisallowClusterInit `
-PassThru |
Start-Neo4jServer -Console
'C:\Neo4j-3' |
Initialize-Neo4jServer `
-ListenOnIPAddress 127.0.0.1 `
-HTTPPort 7476 `
-ClearExistingDatabase `
-OnlineBackupServer '127.0.0.1:6364' `
-PassThru |
Initialize-Neo4jHACluster `
-ServerID 3 `
-InitialHosts '127.0.0.1:5001' `
-ClusterServer '127.0.0.1:5003' `
-HAServer '127.0.0.1:6003' `
-DisallowClusterInit `
-PassThru |
Start-Neo4jServer -Console
Common PowerShell parameters
The module commands support the common PowerShell parameters of Verbose, Debug, WhatIf, etc.
23.4. Upgrading
This section describes upgrading a single Neo4j instance. To upgrade a Neo4j HA cluster (Neo4j
Enterprise), a very specific procedure must be followed. Please see Section 25.5, “Upgrade of a Neo4j HA
Cluster” [482].
Throughout these instructions, the files used to store the Neo4j data are referred to as database files. The
location of these files is specified by configuring the org.neo4j.server.database.location variable in conf/
neo4j-server.properties.
Disk space requirements
• An upgrade requires substantial free disk space, as it makes an entire copy of the database.
The upgraded database may also require larger data files overall.
• It is recommended to make available an extra 50% disk space on top of the existing
database files.
• In addition to this, don’t forget to reserve the disk space needed for the pre-upgrade
backup.
Supported upgrade paths
Before upgrading to a new major or minor release, the database must first be upgraded to the latest
version within the relevant release. The latest version is available at this page:
http://neo4j.com/download/other-releases. The following Neo4j upgrade paths are supported:
• 1.9.latest → 2.3.12
• 2.0.latest → 2.3.12
• 2.1.latest → 2.3.12
• 2.2.latest → 2.3.12
• 2.3.any → 2.3.12
Upgrade instructions
1. Cleanly shut down the database if it is running.
2. Make a backup copy of the database files. If using the online backup tool available with Neo4j
Enterprise, ensure that backups have completed successfully.
3. Install Neo4j 2.3.12.
4. Review the parameter settings in the files under the conf directory in the previous installation, and
transfer any custom settings to the 2.3.12 installation. Be aware of parameters that have
changed names between versions. Also, ensure that you configure the 2.3.12 installation to use the
same database file directory as the previous installation.
5. Set the Neo4j configuration parameter allow_store_upgrade=true in the conf/neo4j.properties file of
the 2.3.12 installation. Neo4j will fail to start without this configuration.
6. Start up Neo4j 2.3.12.
7. The database upgrade will take place during startup.
8. Information about the upgrade and a progress indicator are logged into the messages.log file inside
the database file directory.
9. When the upgrade has finished, allow_store_upgrade should be set to false or be removed.
10. It is good practice to make a full backup immediately after the upgrade.
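Step 5 above amounts to a single line in conf/neo4j.properties of the 2.3.12 installation, to be removed or set back to false once the upgrade completes:

```
allow_store_upgrade=true
```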
Cypher compatibility
The Cypher language may evolve between Neo4j versions. For backward compatibility,
Neo4j provides directives which allow explicitly selecting a previous Cypher language
version. This is possible to do globally or for individual statements, as described in the
Cypher Compatibility section.
23.5. Setup for remote debugging
In order to configure the Neo4j server for remote debugging sessions, the Java debugging parameters
need to be passed to the Java process through the configuration. They live in the conf/neo4j-
wrapper.conf file.
In order to specify the parameters, add a line for the additional Java arguments like this:
wrapper.java.additional=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
This configuration will start a Neo4j server ready for remote debugging attachment at localhost on
port 5005. Use these parameters to attach to the process from Eclipse, IntelliJ IDEA, or your remote
debugger of choice after starting the server.
23.6. Usage Data Collector
The Neo4j Usage Data Collector is a sub-system that gathers usage data, reporting it to the UDC server
at udc.neo4j.org. It is easy to disable, and does not collect any data that is confidential. For more
information about what is being sent, see below.
The Neo4j team uses this information as a form of automatic, effortless feedback from the Neo4j
community. We want to verify that we are doing the right thing by matching download statistics with
usage statistics. After each release, we can see if there is a larger retention span of the server software.
The data collected is clearly stated here. If any future versions of this system collect additional data, we
will clearly announce those changes.
The Neo4j team is very concerned about your privacy. We do not disclose any personally identifiable
information.
Technical Information
To gather good statistics about Neo4j usage, UDC collects this information:
• Kernel version: the build number, and if there are any modifications to the kernel.
• Store id: a randomized globally unique id created at the same time a database is created.
• Ping count: UDC holds an internal counter which is incremented for every ping, and reset for every
restart of the kernel.
• Source: this is either "neo4j" or "maven". If you downloaded Neo4j from the Neo4j website, it is
"neo4j"; if you are using Maven to get Neo4j, it will be "maven".
• Java version: the referrer string shows which version of Java is being used.
• Registration id: for registered server instances.
• Tags about the execution context (e.g. test, language, web-container, app-container, spring, ejb).
• Neo4j Edition (community, enterprise).
• A hash of the current cluster name (if any).
• Distribution information for Linux (rpm, dpkg, unknown).
• User-Agent header for tracking usage of REST client drivers.
• MAC address to uniquely identify instances behind firewalls.
• The number of processors on the server.
• The amount of memory on the server.
• The JVM heap size.
• The number of nodes, relationships, labels and properties in the database.
After startup, UDC waits for ten minutes before sending the first ping. It does this for two reasons: first,
we don’t want the startup to be slower because of UDC, and secondly, we want to keep pings from
automatic tests to a minimum. The ping to the UDC servers is done with an HTTP GET.
How to disable UDC
UDC is easily turned off by disabling it in the database configuration: in conf/neo4j.properties for Neo4j
server, or in the configuration passed to the database in embedded mode.
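For the server case, this amounts to one line in conf/neo4j.properties (the setting defaults to true):

```
neo4j.ext.udc.enabled=false
```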
Usage Data Collector configuration settings
• neo4j.ext.udc.enabled: Enable the UDC extension.

neo4j.ext.udc.enabled
Description: Enable the UDC extension.
Valid values: neo4j.ext.udc.enabled is a boolean.
Default value: true
Chapter24.Configuration & Performance
In order to get optimum performance out of Neo4j for your application there are a few parameters that
can be tweaked. The two main components that can be configured are the Neo4j caches and the JVM
that Neo4j runs in. The following sections describe how to tune these.
24.1. Introduction
To gain good performance, these are the things to look into first:
• Make sure the JVM is not spending too much time performing garbage collection. The goal is to
have a large enough heap to make sure that heavy/peak load will not result in so-called GC-thrashing.
Performance can drop by as much as two orders of magnitude when GC-thrashing happens.
• Start the JVM with the -server flag and a good sized heap. Having too large a heap may also hurt
performance, so you may have to try some different heap sizes.
• Use a concurrent garbage collector. We find that -XX:+UseG1GC works well in most use-cases.
• Give the Neo4j page cache generous amounts of memory. On a dedicated server, there are four big
memory allocations you need to balance: the operating system, the Neo4j JVM, the Neo4j page cache,
and the paging memory for the Lucene indexes.
  • The operating system on a dedicated server can usually make do with 1 to 2 GBs of memory, but
  the more physical memory the machine has, the more memory the operating system will need.
  • The Neo4j JVM needs enough heap memory for the transaction state and query processing, plus
  some head-room for the garbage collector. Because the heap memory needs are so workload
  dependent, it is common to see configurations from 1 GB up to 32 GBs of heap memory.
  • The Neo4j page cache should preferably have enough memory to keep the entire data set in
  memory, which is to say, the page cache should be big enough to fit all of the neostore.* files that
  are not neostore.transaction.db.* files.
  • Lastly, leave enough memory for the operating system page cache to fit the contents of the index
  and schema directories, since it will impact index lookup performance if the indexes cannot fit in
  memory.
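The balancing act above can be sketched as a back-of-envelope budget. All the numbers below are illustrative assumptions for a hypothetical 32 GB server, not recommendations from this manual:

```shell
# Back-of-envelope budget for the four big memory allocations.
TOTAL_GB=32
OS_GB=2      # operating system
HEAP_GB=8    # Neo4j JVM heap
STORE_GB=14  # Neo4j page cache, sized to the neostore.* files
INDEX_GB=$((TOTAL_GB - OS_GB - HEAP_GB - STORE_GB))
echo "left for OS caching of index/schema files: ${INDEX_GB} GB"
```

Whatever is left over after the first three allocations is what the operating system has available for caching the Lucene index files, so the store and heap sizes should be chosen with that remainder in mind.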
See the Section24.4, “Performance Guide” [455] chapter for more information on how to tune Neo4j.
How to add configuration settings
When the database is given no configuration, it will try to use reasonable defaults. This is seldom
optimal, however, because the database and the JVM have no knowledge about your workload or your
performance requirements.
The way you configure Neo4j depends on your mode of deployment: whether you are using the
database as an embedded library, or as a server.
Embedded: When creating the embedded Neo4j instance it is possible to pass in parameters
contained in a map where keys and values are strings; see the section called “Starting an
embedded database with configuration settings” [579] for an example.
Server: When using the Neo4j REST server, see Section 24.2, “Server Configuration” [450] for
how to add configuration settings for the database to the server.
24.2. Server Configuration
Quick info
• The server’s primary configuration file is found under conf/neo4j-server.properties.
• Low-level performance tuning parameters and configuration of legacy indexes and the
remote shell are found in conf/neo4j.properties.
• Configuration of the daemonizing wrapper is found in conf/neo4j-wrapper.conf.
• HTTP logging configuration is found in conf/neo4j-http-logging.xml.
Important server configuration parameters
The main configuration file for the server can be found at conf/neo4j-server.properties. This file contains
several important settings, and although the defaults are sensible administrators might choose to make
changes (especially to the port settings).
Set the location on disk of the database directory like this:
org.neo4j.server.database.location=data/graph.db
Note
On Windows systems, absolute locations including drive letters need to read "c:/data/db".
Specify the HTTP server port supporting data, administrative, and UI access:
org.neo4j.server.webserver.port=7474
Specify the client accept pattern for the webserver (default is 127.0.0.1, localhost only):
# allow any client to connect
org.neo4j.server.webserver.address=0.0.0.0
For securing the Neo4j Server, see also Chapter 27, Security [499].
Set the location of the round-robin database directory which gathers metrics on the running server
instance:
org.neo4j.server.webadmin.rrdb.location=data/graph.db/../rrd
Set the URI path for the REST data API through which the database is accessed. This should be a relative
path.
org.neo4j.server.webadmin.data.uri=/db/data/
Set the management URI for the administration API that the Webadmin tool uses. This should be a
relative path.
org.neo4j.server.webadmin.management.uri=/db/manage
Force the server to use IPv4 network addresses: in conf/neo4j-wrapper.conf under the section Java
Additional Parameters, add a new parameter:
wrapper.java.additional=-Djava.net.preferIPv4Stack=true
Specify the number of threads used by the Neo4j Web server to control the level of concurrent HTTP
requests that the server will service.
org.neo4j.server.webserver.maxthreads=200
Note
The default value is the number of CPUs reported available by the JVM, limited to a
maximum of 500. The limit can be exceeded by specifying a larger value.
The server guards against orphaned transactions by using a timeout. If there are no requests for a
given transaction within the timeout period, the server will roll it back. You can configure the timeout
period by setting the following property to the number of seconds before timeout. The default timeout
is 60 seconds.
org.neo4j.server.transaction.timeout=60
Low-level performance tuning parameters can be explicitly set by referring to the following property:
org.neo4j.server.db.tuning.properties=neo4j.properties
If this property isn’t set, the server will look for a file called neo4j.properties in the same directory as the
neo4j-server.properties file.
If this property isn’t set, and there is no neo4j.properties file in the default configuration directory, then
the server will log a warning. Subsequently at runtime the database engine will attempt to tune itself
based on the prevailing conditions.
Neo4j Database performance configuration
The fine-tuning of the low-level Neo4j graph database engine is specified in a separate properties file,
conf/neo4j.properties.
The graph database engine has a range of performance tuning options which are enumerated in
Section 24.3, “Server Performance Tuning” [454]. Note that factors other than Neo4j tuning should
be considered when performance tuning a server, including general server load, memory and file
contention, and even garbage collection penalties on the JVM, though such considerations are beyond
the scope of this configuration document.
HTTP logging configuration
As well as logging events happening within the Neo4j server, it is possible to log the HTTP requests and
responses that the server consumes and produces. Configuring HTTP logging requires operators to
enable the logger and configure where it will log, and then optionally to configure the log format.
Important
By default the HTTP logger uses Common Log Format (http://en.wikipedia.org/wiki/Common_Log_Format),
meaning that most Web server tooling can automatically consume such logs. In general users should only
enable HTTP logging, select an output directory, and if necessary alter the rollover and retention policies.
To enable HTTP logging, edit the conf/neo4j-server.properties file to resemble the following:
org.neo4j.server.http.log.enabled=true
org.neo4j.server.http.log.config=conf/neo4j-http-logging.xml
Using org.neo4j.server.http.log.enabled=true tells the server that HTTP logging is enabled.
HTTP logging can be disabled completely by setting this property to false. The setting
org.neo4j.server.http.log.config=conf/neo4j-http-logging.xml specifies the logging format and rollover
policy file that governs how HTTP log output is presented and archived. The defaults provided with
Neo4j server use an hourly log rotation and Common Log Format.
If logging is set up to use log files then the server will check that the log file directory exists and is
writable. If this check fails, then the server will not start and will report the failure to another available
channel like standard out.
Tip
Neo4j server now has experimental support for logging full request and response bodies. It
is enabled by setting the following property in neo4j-server.properties:
org.neo4j.server.http.unsafe.content_log.enabled=true
The following logging pattern must also be specified in neo4j-http-logging.xml:
<pattern>%fullRequest\n\n%fullResponse</pattern>
This functionality fully duplicates HTTP requests and responses, logging them out to disk. As
such it is strongly advised not to run this in a production setting because of the potential
to constrain performance. However it can prove useful in testing and pre-production
environments.
Using X-Forwarded-Proto and X-Forwarded-Host to parameterize the base
URI for REST responses
There are occasions, for example when you want to host the Neo4j server behind a proxy (e.g. one that
handles HTTPS traffic), when you still want Neo4j to respect the base URI of that externally visible proxy.
Ordinarily Neo4j uses the HOST header of the HTTP request to construct URIs in its responses. Where
a proxy is involved, however, this is often undesirable. Instead Neo4j uses the X-Forwarded-Host and
X-Forwarded-Proto headers provided by proxies to parameterize the URIs in the responses from the
database’s REST API. From the outside it looks as if the proxy generated that payload. If an X-Forwarded-
Host header value contains more than one address (X-Forwarded-Host allows comma-and-space
separated lists of addresses), Neo4j picks the first, which represents the client request.
In order to take advantage of this functionality your proxy server must be configured to transmit these
headers to the Neo4j server. Failure to transmit both X-Forwarded-Host and X-Forwarded-Proto headers
will result in the original base URI being used.
Enabling logging from the garbage collector
To get garbage collection logging output you have to pass the corresponding option to the server JVM
executable by setting the following value in conf/neo4j-wrapper.conf:
wrapper.java.additional=-Xloggc:data/log/neo4j-gc.log
This line is already present and needs uncommenting. Note also that logging is not directed to the console.
You will find the logging statements in data/log/neo4j-gc.log or whatever location you set the option to.
Web Interface configuration settings
Whitelist for remote guides in Browser
The Browser can :play guides from remote locations. You can specify a whitelist of hosts from which
the Browser will be allowed to fetch content.
In the conf/neo4j-server.properties file:
• To allow default hosts:
dbms.browser.remote_content_hostname_whitelist="http://guides.neo4j.com,https://guides.neo4j.com,http://localhost,https://localhost"
• To allow all hosts (enterprise edition only):
dbms.browser.remote_content_hostname_whitelist="*"
Outgoing connections for Browser
The Browser can load services and data from remote locations; with this configuration that can be
restricted.
In the conf/neo4j-server.properties file:
dbms.security.allow_outgoing_browser_connections=true
Disabling console types in Webadmin
You may, for security reasons, want to disable the Neo4j Shell in Webadmin. Shells allow arbitrary
code execution, and so they could constitute a security risk if you do not trust all users of your Neo4j
Server.
In the conf/neo4j-server.properties file:
• To disable all shells:
org.neo4j.server.manage.console_engines=
• To enable only the Neo4j Shell:
org.neo4j.server.manage.console_engines=shell
24.3. Server Performance Tuning
At the heart of the Neo4j server is a regular Neo4j storage engine instance. That engine can be tuned in
the same way as the other embedded configurations, using the same file format. The only difference is
that the server must be told where to find the fine-tuning configuration.
Quick info
The conf/neo4j.properties file is a standard configuration file that databases load in order to
tune their memory use.
Specifying Neo4j tuning properties
The conf/neo4j-server.properties file in the server distribution is the main configuration file for
the server. In this file we can specify a second properties file that contains the database tuning
settings (that is, the neo4j.properties file). This is done by setting a single property to point to a valid
neo4j.properties file:
org.neo4j.server.db.tuning.properties={neo4j.properties file}
On restarting the server the tuning enhancements specified in the neo4j.properties file will be loaded
and configured into the underlying database engine.
Specifying JVM tuning properties
Tuning the standalone server is achieved by editing the neo4j-wrapper.conf file in the conf directory of
NEO4J_HOME.
Edit the following properties:
neo4j-wrapper.conf JVM tuning properties
wrapper.java.initmemory: initial heap size (in MB)
wrapper.java.maxmemory: maximum heap size (in MB)
wrapper.java.additional: additional literal JVM parameter
For more information on the tuning properties, see Section 24.4, “Performance Guide” [455].
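For example, a 4 GB heap with the G1 collector enabled might look like this in conf/neo4j-wrapper.conf (the values are illustrative, not recommendations):

```
wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096
wrapper.java.additional=-XX:+UseG1GC
```

Setting initmemory and maxmemory to the same value avoids heap resizing pauses at runtime.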
24.4. Performance Guide
This is the Neo4j performance guide. It will give you guidance on how to tune Neo4j to achieve
maximum performance.
Try this first
The first thing to look at when Neo4j is not performing as expected is how the Cypher queries are being
executed. Make sure that they don’t do more work than they have to. Some queries may accidentally
be written in a way that generates a large cartesian product. Other queries may have to perform
expensive label scans because an important index is missing. Chapter 15, Query Tuning [253]
has more information on how to investigate Cypher performance issues.
The second thing to look at is the Java Virtual Machine process. Make sure that it has enough memory
to do its work. If there is not enough memory to keep the JVM heap resident, the operating system
will swap it out to storage. When garbage collection takes place, the swapped out heap memory
has to be swapped back in, and something else swapped out. This is called swap-thrashing and has a
dramatic impact on the performance of a database, rendering it practically unusable. A well-tuned
Neo4j database should not have any swap activity in its steady-state.
Assigning sufficient memory to the JVM also limits the time spent in garbage collection. The goal is to
have a large enough heap to handle peak load without thrashing occurring in the garbage collector.
Performance can drop as much as two orders of magnitude when GC-thrashing happens.
Start the JVM with the -server flag and -Xmx<good sized heap>, for example -Xmx512m for 512 MB of
memory or -Xmx3g for 3 GB of memory. Too large a heap may also hurt performance, so you may have
to try out different heap sizes. Make sure that you are using a concurrent garbage collector. We find
that -XX:+UseG1GC works well in most use cases.
The next thing to look at is the file caching memory. Neo4j uses its own page cache for the store files,
and relies on the operating system for caching the index files. Make sure that the dbms.pagecache.memory
setting (in neo4j.properties) is large enough to fit the entire store, if possible. But also make sure that
you are not allocating so much memory to the JVM and the Neo4j page cache that there is no memory
left for the operating system to cache the Lucene index files. For more information on configuration see
Chapter 24, Configuration & Performance [448].
Configuring heap size and garbage collection
The size of the JVM heap is an important aspect of the performance of any Java application. The heap
is separated into an old generation and a young generation. New objects are allocated in the young
generation, and then later moved to the old generation, if they stay live (in use) for long enough. When
a generation fills up, the garbage collector performs a collection, during which all other threads in the
process are paused. The young generation is quick to collect since the pause time correlates with the
live set of objects, and is independent of the size of the young generation. In the old generation, pause
times roughly correlate with the size of the heap. For this reason, the heap should ideally be sized and
tuned such that transaction and query state never makes it to the old generation.
Note
When using Neo4j Server, JVM configuration goes into the conf/neo4j-wrapper.conf file. See
Section 24.2, “Server Configuration” [450].
In server deployments, the heap size is configured with the wrapper.java.maxmemory (in MBs) setting in
the neo4j-wrapper.conf file. For embedded, the heap size is specified by giving the -Xmx???m command
line flag to the java process, where the ??? is the maximum heap size in MBs. The initial size of the heap
is specified by the wrapper.java.initmemory setting, or with the -Xms???m flag, or chosen heuristically
by the JVM itself if left unspecified. The JVM will automatically grow the heap as needed, up to the
maximum size. The growing of the heap requires a full GC cycle. If you know that you will need all the
heap memory, you can set the initial heap size and the maximum heap size to the same value. This way
the pause that happens when the garbage collector grows the heap can be avoided.
The ratio of the size between the old generation and the new generation of the heap is controlled
by the -XX:NewRatio=N flag. N is typically between 2 and 8 by default. A ratio of 2 means that the old
generation size, divided by the new generation size, is equal to 2. In other words, two thirds of the heap
memory will be dedicated to the old generation. A ratio of 3 will dedicate three quarters of the heap
to the old generation, and a ratio of 1 will keep the two generations about the same size. A ratio of 1 is
quite aggressive, but may be necessary if your transactions change a lot of data. Having a large new
generation can also be important if you run Cypher queries that need to keep a lot of data resident, for
example when sorting big result sets.
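The arithmetic behind -XX:NewRatio can be sketched like this (the 4096 MB heap is just an assumed figure):

```shell
# new generation = heap / (NewRatio + 1); the rest is the old generation
heap_mb=4096
for ratio in 1 2 3; do
  new_mb=$(( heap_mb / (ratio + 1) ))
  old_mb=$(( heap_mb - new_mb ))
  echo "NewRatio=$ratio: new=${new_mb}MB old=${old_mb}MB"
done
```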
If the new generation is too small, short-lived objects may be moved to the old generation too soon.
This is called premature promotion and will slow the database down by increasing the frequency of
old generation GC cycles. If the new generation is too big, the garbage collector may decide that the
old generation does not have enough space to fit all the objects it expects to promote from the new to
the old generation. This turns new generation GC cycles into old generation GC cycles, again slowing
the database down. Running more concurrent threads means that more allocations can take place in a
given span of time, in turn increasing the pressure on the new generation in particular.
Caution
The Compressed OOPs feature in the JVM allows object references to be compressed to use
only 32 bits. The feature saves a lot of memory, but is not enabled for heaps larger than
32 GB. Gains from increasing the heap size beyond 32 GB can therefore be small or even
negative, unless the increase is significant (64 GB or above).
Neo4j has a number of long-lived objects that stay around in the old generation, effectively for the
lifetime of the Java process. To process them efficiently, and without adversely affecting the GC pause
time, we recommend using a concurrent garbage collector.
Tip
The recommended garbage collector to use when running Neo4j in production is the
G1 garbage collector. G1 is turned on by default in server deployments. For embedded
deployments, it can be turned on by supplying -XX:+UseG1GC as a JVM parameter.
How to tune the specific GC algorithm depends on both the JVM version and the workload. It is
recommended to test the GC settings under realistic load for days or weeks. Problems like heap
fragmentation can take a long time to surface.
Disks, RAM and other tips
As with any persistence solution, performance depends a lot on the persistence media used. Better
disks equal better performance.
If you have multiple disks or persistence media available it may be a good idea to divide the store
files and transaction logs across those disks. Keeping the store files on disks with low seek time can
do wonders for read operations. Today a typical mechanical drive has an average seek time of about
5 ms. This can cause a query or traversal to be very slow when the amount of RAM assigned to the
page cache is too small. A new, good SATA-enabled SSD has an average seek time of less than 100
microseconds, meaning those scenarios will execute at least 50 times faster. However, this is still tens
or hundreds of times slower than accessing RAM.
To avoid hitting disk you need more RAM. On a standard mechanical drive you can handle graphs with
a few tens of millions of primitives (nodes, relationships and properties) with 2-3 GBs of RAM. A server
with 8-16 GBs of RAM can handle graphs with hundreds of millions of primitives, and a good server
with 16-64 GBs can handle billions of primitives. However, if you invest in a good SSD you will be able to
handle much larger graphs on less RAM.
Use tools like dstat or vmstat to gather information when your application is running. If the swap or
paging numbers are high, that is a sign that the Lucene indexes don’t quite fit in memory. In this case,
queries that do index lookups will have high latencies.
When Neo4j starts up, its page cache is empty and needs to warm up. This can take a while, especially
for large stores. It is not uncommon to see a long period with many blocks being read from the drive,
and high IO wait times.
Neo4j also flushes its page cache in the background, so it is not uncommon to see a steady trickle of
blocks being written to the drive during steady-state. This background flushing only produces a small
amount of IO wait, however. If the IO wait times are high during steady-state, it may be a sign that
Neo4j is bottle-necked on the random IO performance of the drive. The best drives for running Neo4j
are fast SSDs that can take lots of random IOPS.
Linux filesystem tuning
Databases often produce many small and random reads when querying data, and few sequential writes
when committing changes. Neo4j is no different in this regard.
By default, most Linux distributions schedule IO requests using the Completely Fair Queuing (CFQ)
algorithm, which provides a good balance between throughput and latency. The particular IO workload
of a database, however, is better served by the Deadline scheduler. The Deadline scheduler gives
preference to read requests, and processes them as soon as possible. This tends to decrease the
latency of reads, while the latency of writes goes up. Since the writes are usually sequential, their
lingering in the IO queue increases the chance of overlapping or adjacent write requests being merged
together. This effectively reduces the number of writes that are sent to the drive.
On Linux, the IO scheduler for a drive, in this case sda, can be changed at runtime like this:
$ echo 'deadline' > /sys/block/sda/queue/scheduler
$ cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
Another recommended practice is to disable file and directory access time updates. This way, the file
system won’t have to issue writes that update this meta-data, thus improving write performance. This
can be accomplished by setting the noatime,nodiratime mount options in fstab, or when issuing the disk
mount command.
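As a sketch, an fstab entry with these options might look like the following. The device, mount point and filesystem type are hypothetical and will differ on your system:

```
# Hypothetical /etc/fstab line with access-time updates disabled
/dev/sda1  /var/lib/neo4j  ext4  defaults,noatime,nodiratime  0  2
```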
There may be other tuning options relevant to any particular file system, but it is important to make
sure that barriers are enabled. Barriers prevent certain reorderings of writes. They are important for
maintaining the integrity of the transaction log, in case a power failure happens.
Setting the number of open files
Linux platforms impose an upper limit on the number of concurrent files a user may have open. This
number is reported for the current user and session with the ulimit -n command:
user@localhost:~$ ulimit -n
1024
The usual default of 1024 is often not enough. This is especially true when many indexes are used or a
server installation sees too many connections. Network sockets count against the limit as well. Users
are therefore encouraged to increase the limit to a healthy value of 40 000 or more, depending on
usage patterns. It is possible to set the limit with the ulimit command, but only for the root user, and it
only affects the current session. To set the value system wide, follow the instructions for your platform.
What follows is the procedure to set the open file descriptor limit to 40 000 for user neo4j under
Ubuntu 10.04 and later.
Note
If you opted to run the neo4j service as a different user, change the first field in step 2
accordingly.
1. Become root, since all operations that follow require editing protected system files.
user@localhost:~$ sudo su -
Password:
root@localhost:~$
2. Edit /etc/security/limits.conf and add these two lines:
neo4j soft nofile 40000
neo4j hard nofile 40000
3. Edit /etc/pam.d/su and uncomment or add the following line:
session required pam_limits.so
4. A restart is required for the settings to take effect.
After the above procedure, the neo4j user will have a limit of 40 000 simultaneous open files. If you
continue experiencing exceptions on Too many open files or Could not stat() directory, you may
have to raise the limit further.
24.5. Logical logs
Logical logs in Neo4j are the journal of which operations have happened, and they are the source of
truth in scenarios where the database needs to be recovered after a crash or similar. Logs are rotated
every now and then (by default when they surpass 250 MB in size) and the number of legacy logs to
keep can be configured. Purposes of keeping a history of logical logs include being able to serve
incremental backups as well as keeping an HA cluster running.
For any given configuration at least the latest non-empty logical log will be kept, but configuration can
be supplied to control how much more to keep. There are several different means of controlling it and
the format in which configuration is supplied is:
keep_logical_logs=<true/false>
keep_logical_logs=<amount> <type>
For example:
• keep_logical_logs=true will keep logical logs indefinitely.
• keep_logical_logs=false will keep only the most recent non-empty log.
• keep_logical_logs=30 days will keep logical logs which contain any transaction committed within 30 days.
• keep_logical_logs=500k txs will keep logical logs which contain any of the most recent 500 000 transactions.
Full list:

Type    Description                                                                          Example
files   Number of most recent logical log files to keep                                      "10 files"
size    Max disk size to allow log files to occupy                                           "300M size" or "1G size"
txs     Number of latest transactions to keep                                                "250k txs" or "5M txs"
hours   Keep logs which contain any transaction committed within N hours from current time   "10 hours"
days    Keep logs which contain any transaction committed within N days from current time    "50 days"
24.6. Compressed storage of property values
Neo4j can in many cases compress and inline the storage of property values, such as short arrays and
strings.
Compressed storage of short arrays
Neo4j will try to store your primitive arrays in a compressed way, so as to save disk space and possibly
an I/O operation. To do that, it employs a "bit-shaving" algorithm that tries to reduce the number of bits
required for storing the members of the array. In particular:
1. For each member of the array, it determines the position of the leftmost set bit.
2. It determines the largest such position among all members of the array.
3. It reduces all members to that number of bits.
4. It stores those values, prefixed by a small header.
Note that when even a single negative value is included in the array, the original size of the primitives
will be used, since the sign bit of a negative value is its leftmost bit.
There is a possibility that the result can be inlined in the property record if:
• It is less than 24 bytes after compression.
• It has less than 64 members.
For example, an array long[] {0L, 1L, 2L, 4L} will be inlined, as the largest entry (4) will require 3 bits
to store, so the whole array will be stored in 4 × 3 = 12 bits. The array long[] {-1L, 1L, 2L, 4L} however
will require the whole 64 bits for the -1 entry, so it needs 4 × 64 bits = 32 bytes and it will end up in the
dynamic store.
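The size estimate from the first example can be sketched in shell. This illustrates the arithmetic only; it is not Neo4j's actual implementation:

```shell
# bits() finds the position of the leftmost set bit, i.e. bits needed per member
bits() {
  v=$1; n=0
  while [ "$v" -gt 0 ]; do v=$(( v >> 1 )); n=$(( n + 1 )); done
  echo "$n"
}
max=0
for v in 0 1 2 4; do
  b=$(bits "$v")
  if [ "$b" -gt "$max" ]; then max=$b; fi
done
echo "$(( max * 4 )) bits for the whole array"  # 3 bits x 4 members = 12 bits
```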
Compressed storage of short strings
Neo4j will try to classify each string into a short string class, and if it succeeds the string will be treated
accordingly. In that case, it will be stored without indirection in the property store, inlined in the
property record, meaning that the dynamic string store will not be involved in storing that value,
leading to reduced disk footprint. Additionally, when no string record is needed to store the property, it
can be read and written in a single lookup, leading to performance improvements and less disk space
required.
The various classes for short strings are:
• Numerical, consisting of digits 0..9 and the punctuation space, period, dash, plus, comma and
apostrophe.
• Date, consisting of digits 0..9 and the punctuation space, dash, colon, slash, plus and comma.
• Hex (lower case), consisting of digits 0..9 and lower case letters a..f.
• Hex (upper case), consisting of digits 0..9 and upper case letters A..F.
• Upper case, consisting of upper case letters A..Z, and the punctuation space, underscore, period,
dash, colon and slash.
• Lower case, like upper case but with lower case letters a..z instead.
• E-mail, consisting of lower case letters a..z and the punctuation comma, underscore, period, dash,
plus and the at sign (@).
• URI, consisting of lower case letters a..z, digits 0..9 and most punctuation available.
• Alpha-numerical, consisting of both upper and lower case letters a..z, A..Z, digits 0..9 and the
punctuation space and underscore.
• Alpha-symbolical, consisting of both upper and lower case letters a..z, A..Z and the punctuation
space, underscore, period, dash, colon, slash, plus, comma, apostrophe, at sign, pipe and semicolon.
• European, consisting of most accented European characters and digits, plus the punctuation space,
dash, underscore and period; like Latin 1 but with less punctuation.
• Latin 1.
• UTF-8.
In addition to the string’s contents, the number of characters also determines if the string can be
inlined or not. Each class has its own character count limit:

Character count limits
String class                                Character count limit
Numerical, Date and Hex                     54
Uppercase, Lowercase and E-mail             43
URI, Alphanumerical and Alphasymbolical     36
European                                    31
Latin 1                                     27
UTF-8                                       14
That means that the largest inline-able string is 54 characters long and must be of the Numerical class,
and also that all strings of 14 characters or less will always be inlined. Also note that the above limits
are for the default 41 byte PropertyRecord layout; if that parameter is changed by editing the source
and recompiling, the above limits have to be recalculated.
24.7. Memory mapped IO settings
Introduction
Quick info
The page cache is sometimes called low level cache, file system cache or file buffer cache.
It caches the Neo4j data as stored on the durable media.
The default configuration of the cache relies on heuristics and assumes that the machine is
dedicated to running Neo4j, so you might want to tune it yourself to get the most out of the
available memory.
There is just one setting for the file buffer cache: dbms.pagecache.memory. It specifies how
much memory Neo4j is allowed to use for this cache.
Each file in the Neo4j store is accessed through the Neo4j page cache when reading from, or writing
to, the store files. Since there is only one page cache, there is only one setting for specifying how much
memory Neo4j is allowed to use for page caching. The shared page cache ensures that memory is split
across the various store files in the most optimal manner 2, depending on how the database is used
and what data is popular.
The memory for the page cache is allocated outside the normal Java heap, so you need to take both the
Java heap, and the page cache, into consideration in your capacity planning. Other processes running
on the OS will impact the availability of such memory. Neo4j will require all of the heap memory of
the JVM, plus the memory to be used for the page cache, to be available as physical memory. Other
processes may thus not use more than what is available after the configured memory allocation is
made for Neo4j.
Important
Make sure that your system is configured such that it will never need to swap. If memory
belonging to the Neo4j process gets swapped out, it can lead to considerable performance
degradation.
The amount of memory available to the page cache is configured using the dbms.pagecache.memory
setting. With that setting, you specify the number of bytes available to the page cache, e.g. 150m or 4g.
The default page memory setting is 50% of the machine's memory, after subtracting the memory that is
reserved for the Java heap.
For optimal performance, you will want to have as much of your data fit in the page cache as possible.
You can sum up the size of all the *store.db* files in your store file directory, to figure out how big a
page cache you need to fit all your data. For instance, on a POSIX system you can find the total by
running $ du -hc *store.db* in your data/graph.db directory. Obviously the store files will grow as you
add more nodes, relationships and properties, so configuring more page cache memory than you have
data is recommended when possible.
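As a sketch of that sizing step, using throwaway files in a temporary directory instead of a real data/graph.db store:

```shell
# Create fake store files, then total them the same way you would for a real store
store=$(mktemp -d)
dd if=/dev/zero of="$store/neostore.nodestore.db" bs=1024 count=64 2>/dev/null
dd if=/dev/zero of="$store/neostore.relationshipstore.db" bs=1024 count=128 2>/dev/null
du -hc "$store"/*store.db* | tail -1  # the last line is the grand total
rm -r "$store"
```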
Configuration

dbms.pagecache.memory
Possible values  The maximum amount of memory to use for the page cache, either in bytes, or
greater byte-like units, such as 100m for 100 megabytes, or 4g for 4 gigabytes.
Effect           The amount of memory to use for mapping the store files, in a unit of bytes. This will
automatically be rounded down to the nearest whole page. This value cannot be zero. For extremely
small and memory constrained deployments, it is recommended to still reserve at least a couple of
megabytes for the page cache.

dump_configuration
Possible values  true or false
Effect           If set to true the current configuration settings will be written to the default system
output, mostly the console or the logfiles.

2 This is an informal comparison to the store-specific memory mapping settings of previous versions of Neo4j. We are not claiming
that our page replacement algorithms are optimal in the formal sense. Truly optimal page replacement algorithms require
knowledge of events arbitrarily far into the future.
When configuring the amount of memory allowed for the page cache and the JVM heap, make sure to
also leave room for the operating system's page cache, and other programs and services the system
might want to run. It is important to configure the memory usage such that the Neo4j JVM process
won't need to use any swap memory, as this will cause a significant drag on the performance of the
system.
When reading the configuration parameters on startup Neo4j will automatically configure the
parameters that are not specified. The cache size will be configured based on the available memory
on the computer, with the assumption that the machine is dedicated to running Neo4j. Specifically,
Neo4j will look at how much memory the machine has, subtract the JVM heap allocation from that, and
then use 50% of what is left for the page cache. This is the default configuration when nothing else is
specified.
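The heuristic above can be sketched as simple arithmetic. The 16 GB of RAM and 4 GB heap are assumed figures, not recommendations:

```shell
ram_mb=16384    # total machine memory
heap_mb=4096    # configured max Java heap
pagecache_mb=$(( (ram_mb - heap_mb) * 50 / 100 ))
echo "default page cache: ${pagecache_mb}m"  # prints "default page cache: 6144m"
```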
Batch insert example
Read general information on batch insertion in Chapter 36, Batch Insertion [636].
The configuration should suit the data set you are about to inject using BatchInsert. Let's say we have a
random-like graph with 10M nodes and 100M relationships. Each node (and maybe some relationships)
has different properties of string and Java primitive types. The important thing is that the page cache
has enough memory to work with, so that it doesn't slow down the BatchInserter:
dbms.pagecache.memory=4g
The configuration above will more or less fit the entire graph in memory. A rough formula to calculate
the memory needed can look like this:
bytes_needed = number_of_nodes * 15
+ number_of_relationships * 34
+ number_of_properties * 64
Note that the size of an individual property very much depends on what data it contains. The numbers
given in the above formula are only a rough estimate.
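Plugging the example graph into that formula looks like this. The 10M property count is an assumption for illustration, since the text does not give one:

```shell
nodes=10000000        # 10M nodes
rels=100000000        # 100M relationships
props=10000000        # assumed: roughly one property per node
bytes=$(( nodes * 15 + rels * 34 + props * 64 ))
echo "$(( bytes / 1048576 )) MB needed (roughly)"  # about 4 GB, in line with the 4g setting above
```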
24.8. Configuration Settings Reference
On this page you’ll find the main configuration settings you can use with Neo4j. They can be set in the
conf/neo4j.properties file when using the Neo4j Server (see Section 24.2, “Server Configuration” [450]).
If you use the embedded database, you can pass them in as a map (see the section called “Starting an
embedded database with configuration settings” [579]).
For additional configuration settings, see:
• Section 24.2, “Server Configuration” [450]
• Settings for the remote shell extension [546]
• High Availability configuration settings [475]
• Cluster configuration settings [477]
• Online backup configuration settings [495]
• Consistency check configuration settings [495]
• Usage Data Collector configuration settings [447]
• Metrics settings [524]
List of configuration settings
allow_file_urls: Determines if Cypher will allow using file URLs when loading data using LOAD CSV.
allow_store_upgrade: Whether to allow a store upgrade in case the current version of the database
starts against an older store version.
cypher_parser_version: Set this to specify the default parser (language version).
dbms.checkpoint.interval.time: Configures the time interval between check-points.
dbms.checkpoint.interval.tx: Configures the transaction interval between check-points.
dbms.cypher.hints.error: Set this to specify the behavior when Cypher planner or runtime hints
cannot be fulfilled.
dbms.cypher.min_replan_interval: The minimum lifetime of a query plan before a query is considered
for replanning.
dbms.cypher.planner: Set this to specify the default planner for the default language version.
dbms.cypher.statistics_divergence_threshold: The threshold when a plan is considered stale.
dbms.pagecache.memory: The amount of memory to use for mapping the store files, in bytes (or
kilobytes with the k suffix, megabytes with m and gigabytes with g).
dbms.pagecache.swapper: Specify which page swapper to use for doing paged IO.
dbms.querylog.enabled: Log executed queries that take longer than the configured threshold.
dbms.querylog.filename: Log executed queries that take longer than the configured threshold.
dbms.querylog.max_archives: Maximum number of history files for the query log.
dbms.querylog.parameter_logging_enabled: Log parameters for executed queries that took longer
than the configured threshold.
dbms.querylog.rotation.threshold: Specifies at which file size the query log will auto-rotate.
dbms.querylog.threshold: If the execution of a query takes more time than this threshold, the query is
logged - provided query logging is enabled.
dbms.security.load_csv_file_url_root: Sets the root directory for file URLs used with the Cypher LOAD
CSV clause.
dense_node_threshold: Relationship count threshold for considering a node to be dense.
dump_configuration: Print out the effective Neo4j configuration after startup.
index_background_sampling_enabled: Enable or disable background index sampling.
index_sampling_buffer_size: Size of buffer used by index sampling.
index_sampling_update_percentage: Percentage of index updates of total index size required before
sampling of a given index is triggered.
keep_logical_logs: Make Neo4j keep the logical transaction logs for being able to backup the
database.
logical_log_rotation_threshold: Specifies at which file size the logical log will auto-rotate.
lucene_searcher_cache_size: The maximum number of open Lucene index searchers.
query_cache_size: The number of Cypher query execution plans that are cached.
read_only: Only allow read operations from this Neo4j instance.
relationship_grab_size: How many relationships to read at a time during iteration.
store.internal_log.level: Log level threshold.
store.internal_log.max_archives: Maximum number of history files for the internal log.
store.internal_log.rotation_delay: Minimum time interval after last rotation of the internal log before
it may be rotated again.
store.internal_log.rotation_threshold: Threshold for rotation of the internal log.
Deprecated settings
batched_writes: Whether or not transactions are appended to the log in batches.
cache_type: The type of cache to use for nodes and relationships.
log_mapped_memory_stats: Log memory mapping statistics regularly.
log_mapped_memory_stats_filename: The file where memory mapping statistics will be recorded.
log_mapped_memory_stats_interval: The number of records to be loaded between regular logging of
memory mapping statistics.
neostore.nodestore.db.mapped_memory: The size to allocate for memory mapping the node store.
neostore.propertystore.db.arrays.mapped_memory: The size to allocate for memory mapping the
array property store.
neostore.propertystore.db.index.keys.mapped_memory: The size to allocate for memory mapping
the store for property key strings.
neostore.propertystore.db.index.mapped_memory: The size to allocate for memory mapping the
store for property key indexes.
neostore.propertystore.db.mapped_memory: The size to allocate for memory mapping the property
value store.
neostore.propertystore.db.strings.mapped_memory: The size to allocate for memory mapping the
string property store.
neostore.relationshipstore.db.mapped_memory: The size to allocate for memory mapping the
relationship store.
store_dir: The directory where the database files are located.
use_memory_mapped_buffers: Use memory mapped buffers for accessing the native storage layer.
allow_file_urls
Description Determines if Cypher will allow using file URLs when loading data using LOAD CSV.
Setting this value to false will cause Neo4j to fail LOAD CSV clauses that load data
from the file system.
Valid values allow_file_urls is a boolean.
Default value true
allow_store_upgrade
Description Whether to allow a store upgrade in case the current version of the database
starts against an older store version. Setting this to true does not guarantee
successful upgrade, it just allows an upgrade to be performed.
Valid values allow_store_upgrade is a boolean.
Default value false
batched_writes
Description Whether or not transactions are appended to the log in batches.
Valid values batched_writes is a boolean.
Default value true
Deprecated Write batching can no longer be turned off.
cache_type
Description The type of cache to use for nodes and relationships. This configuration setting
is no longer applicable from Neo4j 2.3. Configuration has been simplified to only
require tuning of the page cache.
Valid values cache_type is a string.
Default value deprecated
Deprecated The cache_type configuration setting has been deprecated.
cypher_parser_version
Description Set this to specify the default parser (language version).
Valid values cypher_parser_version is one of 1.9, 2.2, 2.3, default.
Default value default
dbms.checkpoint.interval.time
Description Configures the time interval between check-points. The database will not check-
point more often than this (unless check pointing is triggered by a different
event), but might check-point less often than this interval, if performing a check-
point takes longer than the configured interval. A check-point is a point in
the transaction logs from which recovery would start. Longer check-point
intervals typically mean that recovery will take longer to complete in case of a
crash. On the other hand, a longer check-point interval can also reduce the
I/O load that the database places on the system, as each check-point implies a
flushing and forcing of all the store files. The default is 5m for a check-point every
5 minutes. Other supported units are s for seconds, and ms for milliseconds.
Valid values dbms.checkpoint.interval.time is a duration (valid units are ms, s, m).
Default value 5m
dbms.checkpoint.interval.tx
Description Configures the transaction interval between check-points. The database will not
check-point more often than this (unless check pointing is triggered by a different
event), but might check-point less often than this interval, if performing a check-
point takes longer than the configured interval. A check-point is a point in
the transaction logs from which recovery would start. Longer check-point
intervals typically mean that recovery will take longer to complete in case of a
crash. On the other hand, a longer check-point interval can also reduce the
I/O load that the database places on the system, as each check-point implies a
flushing and forcing of all the store files. The default is 100000 for a check-point
every 100000 transactions.
Valid values dbms.checkpoint.interval.tx is an integer which is minimum 1.
Default value 100000
dbms.cypher.hints.error
Description Set this to specify the behavior when Cypher planner or runtime hints cannot be
fulfilled. If true, then non-conformance will result in an error, otherwise only a
warning is generated.
Valid values dbms.cypher.hints.error is a boolean.
Default value false
dbms.cypher.min_replan_interval
Description The minimum lifetime of a query plan before a query is considered for
replanning.
Valid values dbms.cypher.min_replan_interval is a duration (valid units are ms, s, m).
Default value 10s
dbms.cypher.planner
Description Set this to specify the default planner for the default language version.
Valid values dbms.cypher.planner is one of COST, RULE, default.
Default value default
dbms.cypher.statistics_divergence_threshold
Description The threshold when a plan is considered stale. If any of the underlying statistics
used to create the plan has changed more than this value, the plan is considered
stale and will be replanned. A value of 0 means always replan, and 1 means never
replan.
Valid values dbms.cypher.statistics_divergence_threshold is a double which is minimum 0.0,
and is maximum 1.0.
Default value 0.75
dbms.pagecache.memory
Description The amount of memory to use for mapping the store files, in bytes (or kilobytes
with the k suffix, megabytes with m and gigabytes with g). If Neo4j is running on a
dedicated server, then it is generally recommended to leave about 2-4 gigabytes
for the operating system, give the JVM enough heap to hold all your transaction
state and query context, and then leave the rest for the page cache. The default
page cache memory assumes the machine is dedicated to running Neo4j, and is
heuristically set to 50% of RAM minus the max Java heap size.
Valid values dbms.pagecache.memory is a byte size (valid multipliers are k, m, g, K, M, G) which is
minimum 245760.
Default value 3049197568
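As a worked example of the sizing guidance above: on a hypothetical dedicated server with 32 GB of RAM and an 8 GB JVM heap, leaving about 4 GB for the operating system would leave roughly 20 GB for the page cache. In conf/neo4j.properties that might look like this (illustrative value):

```properties
# ~32 GB RAM - 8 GB heap - ~4 GB for the OS => ~20 GB for the page cache.
dbms.pagecache.memory=20g
```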
dbms.pagecache.swapper
Description Specify which page swapper to use for doing paged IO. This is only used when
integrating with proprietary storage technology.
Valid values dbms.pagecache.swapper is a string.
dbms.querylog.enabled
Description Log executed queries that take longer than the configured threshold. NOTE: This
feature is only available in the Neo4j Enterprise Edition.
Valid values dbms.querylog.enabled is a boolean.
Default value false
dbms.querylog.filename
Description Log executed queries that take longer than the configured threshold.
Valid values dbms.querylog.filename is a path.
dbms.querylog.max_archives
Description Maximum number of history files for the query log.
Valid values dbms.querylog.max_archives is an integer which is minimum 1.
Default value 7
dbms.querylog.parameter_logging_enabled
Description Log parameters for executed queries that took longer than the configured
threshold.
Valid values dbms.querylog.parameter_logging_enabled is a boolean.
Default value true
dbms.querylog.rotation.threshold
Description Specifies at which file size the query log will auto-rotate. 0 means that no rotation
will automatically occur based on file size.
Valid values dbms.querylog.rotation.threshold is a byte size (valid multipliers are k, m, g, K, M, G)
which is minimum 0, and is maximum 9223372036854775807.
Default value 20m
dbms.querylog.threshold
Description If the execution of a query takes longer than this threshold, the query is logged
- provided query logging is enabled. Defaults to 0 seconds, that is, all queries are
logged.
Valid values dbms.querylog.threshold is a duration (valid units are ms, s, m).
Default value 0s
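The query-log settings above combine as in this illustrative conf/neo4j.properties fragment (Enterprise Edition only; thresholds are examples, not defaults):

```properties
dbms.querylog.enabled=true
# Log only queries slower than one second, including their parameters.
dbms.querylog.threshold=1s
dbms.querylog.parameter_logging_enabled=true
# Rotate the log at 20 MB and keep at most 7 archived files.
dbms.querylog.rotation.threshold=20m
dbms.querylog.max_archives=7
```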
dbms.security.load_csv_file_url_root
Description Sets the root directory for file URLs used with the Cypher LOAD CSV clause. This
must be set to a single directory, restricting access to only those files within that
directory and its subdirectories.
Valid values dbms.security.load_csv_file_url_root is a path.
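For example, with the setting below (the path is hypothetical), file URLs in LOAD CSV resolve relative to the configured root, so a clause such as LOAD CSV FROM "file:///orders.csv" AS row would only be able to read files under that directory:

```properties
# Restrict LOAD CSV file URLs to this directory and its subdirectories.
dbms.security.load_csv_file_url_root=/var/lib/neo4j/import
```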
dense_node_threshold
Description Relationship count threshold for considering a node to be dense.
Valid values dense_node_threshold is an integer which is minimum 1.
Default value 50
dump_configuration
Description Print out the effective Neo4j configuration after startup.
Valid values dump_configuration is a boolean.
Default value false
index_background_sampling_enabled
Description Enable or disable background index sampling.
Valid values index_background_sampling_enabled is a boolean.
Default value true
index_sampling_buffer_size
Description Size of buffer used by index sampling.
Valid values index_sampling_buffer_size is a byte size (valid multipliers are k, m, g, K, M, G) which
is minimum 1048576, and is maximum 2147483647.
Default value 64m
index_sampling_update_percentage
Description Percentage of index updates of total index size required before sampling of a
given index is triggered.
Valid values index_sampling_update_percentage is an integer which is minimum 0.
Default value 5
keep_logical_logs
Description Make Neo4j keep the logical transaction logs so that the database can be backed
up. Can be used for specifying the threshold to prune logical logs after.
For example "10 days" will prune logical logs that only contain transactions
older than 10 days from the current time, while "100k txs" will keep the 100k latest
transactions and prune any older transactions.
Valid values keep_logical_logs is a string which must be true/false or of format
<number><optional unit> <type>, for example "100M size" for limiting logical log space
on disk to 100 MB, or "200k txs" for limiting the number of transactions to keep to
200 000.
Default value 7 days
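The value formats described above look like this in practice (illustrative alternatives; only one line would be active at a time):

```properties
# Keep logical logs covering the last 10 days:
keep_logical_logs=10 days
# Or cap by transaction count instead:
# keep_logical_logs=200k txs
# Or cap the total log size on disk:
# keep_logical_logs=100M size
```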
log_mapped_memory_stats
Description Log memory mapping statistics regularly.
Valid values log_mapped_memory_stats is a boolean.
Default value false
Deprecated This is no longer used.
log_mapped_memory_stats_filename
Description The file where memory mapping statistics will be recorded.
Valid values log_mapped_memory_stats_filename is a path which is relative to store_dir.
Default value mapped_memory_stats.log
Deprecated This is no longer used.
log_mapped_memory_stats_interval
Description The number of records to be loaded between regular logging of memory mapping
statistics.
Valid values log_mapped_memory_stats_interval is an integer.
Default value 1000000
Deprecated This is no longer used.
logical_log_rotation_threshold
Description Specifies at which file size the logical log will auto-rotate. 0 means that no rotation
will automatically occur based on file size.
Valid values logical_log_rotation_threshold is a byte size (valid multipliers are k, m, g, K, M, G)
which is minimum 1048576.
Default value 250M
lucene_searcher_cache_size
Description The maximum number of open Lucene index searchers.
Valid values lucene_searcher_cache_size is an integer which is minimum 1.
Default value 2147483647
neostore.nodestore.db.mapped_memory
Description The size to allocate for memory mapping the node store.
Valid values neostore.nodestore.db.mapped_memory is a byte size (valid multipliers are k, m, g, K, M,
G).
Deprecated Replaced by the dbms.pagecache.memory setting.
neostore.propertystore.db.arrays.mapped_memory
Description The size to allocate for memory mapping the array property store.
Valid values neostore.propertystore.db.arrays.mapped_memory is a byte size (valid multipliers
are k, m, g, K, M, G).
Deprecated Replaced by the dbms.pagecache.memory setting.
neostore.propertystore.db.index.keys.mapped_memory
Description The size to allocate for memory mapping the store for property key strings.
Valid values neostore.propertystore.db.index.keys.mapped_memory is a byte size (valid
multipliers are k, m, g, K, M, G).
Deprecated Replaced by the dbms.pagecache.memory setting.
neostore.propertystore.db.index.mapped_memory
Description The size to allocate for memory mapping the store for property key indexes.
Valid values neostore.propertystore.db.index.mapped_memory is a byte size (valid multipliers are
k, m, g, K, M, G).
Deprecated Replaced by the dbms.pagecache.memory setting.
neostore.propertystore.db.mapped_memory
Description The size to allocate for memory mapping the property value store.
Valid values neostore.propertystore.db.mapped_memory is a byte size (valid multipliers are k, m, g,
K, M, G).
Deprecated Replaced by the dbms.pagecache.memory setting.
neostore.propertystore.db.strings.mapped_memory
Description The size to allocate for memory mapping the string property store.
Valid values neostore.propertystore.db.strings.mapped_memory is a byte size (valid multipliers
are k, m, g, K, M, G).
Deprecated Replaced by the dbms.pagecache.memory setting.
neostore.relationshipstore.db.mapped_memory
Description The size to allocate for memory mapping the relationship store.
Valid values neostore.relationshipstore.db.mapped_memory is a byte size (valid multipliers are k,
m, g, K, M, G).
Deprecated Replaced by the dbms.pagecache.memory setting.
query_cache_size
Description The number of Cypher query execution plans that are cached.
Valid values query_cache_size is an integer which is minimum 0.
Default value 1000
read_only
Description Only allow read operations from this Neo4j instance. This mode still requires write
access to the directory for lock purposes.
Valid values read_only is a boolean.
Default value false
relationship_grab_size
Description How many relationships to read at a time during iteration.
Valid values relationship_grab_size is an integer which is minimum 1.
Default value 100
store.internal_log.level
Description Log level threshold.
Valid values store.internal_log.level is one of DEBUG, INFO, WARN, ERROR, NONE.
Default value INFO
store.internal_log.max_archives
Description Maximum number of history files for the internal log.
Valid values store.internal_log.max_archives is an integer which is minimum 1.
Default value 7
store.internal_log.rotation_delay
Description Minimum time interval after last rotation of the internal log before it may be
rotated again.
Valid values store.internal_log.rotation_delay is a duration (valid units are ms, s, m).
Default value 300s
store.internal_log.rotation_threshold
Description Threshold for rotation of the internal log.
Valid values store.internal_log.rotation_threshold is a byte size (valid multipliers are k, m, g, K,
M, G) which is minimum 0, and is maximum 9223372036854775807.
Default value 20m
store_dir
Description The directory where the database files are located.
Valid values store_dir is a path.
Deprecated This is no longer used.
use_memory_mapped_buffers
Description Use memory mapped buffers for accessing the native storage layer.
Valid values use_memory_mapped_buffers is a boolean.
Default value true
Deprecated This setting has been obsoleted. Neo4j no longer relies on the memory-mapping
capabilities of the operating system.
Chapter25.High Availability
Note
The High Availability features are only available in the Neo4j Enterprise Edition.
Neo4j High Availability or “Neo4j HA” provides the following two main features:
1. It enables a fault-tolerant database architecture, where several Neo4j slave databases can be
configured to be exact replicas of a single Neo4j master database. This allows the end-user system to
be fully functional and both read and write to the database in the event of hardware failure.
2. It enables a horizontally scaling read-mostly architecture that enables the system to handle more read
load than a single Neo4j database instance can handle.
25.1. Architecture
Neo4j HA has been designed to make the transition from single-machine to multi-machine operation
simple, without requiring changes to the existing application.
Consider an existing application with Neo4j embedded and running on a single machine. To deploy
such an application in a multi machine setup the only required change is to switch the creation of the
GraphDatabaseService from GraphDatabaseFactory to HighlyAvailableGraphDatabaseFactory. Since both
implement the same interface, no additional changes are required.
Figure25.1.Multiple Neo4j instances in HA mode
When running Neo4j in HA mode there is always a single master and zero or more slaves. Compared to
other master-slave replication setups, Neo4j HA can handle write requests on all machines, so there is
no need to redirect those to the master specifically.
A slave will handle writes by synchronizing with the master to preserve consistency. Writes to the
master can be configured to be optimistically pushed to zero or more slaves. By optimistically we mean
that the master will try to push to slaves before the transaction completes, but if the push fails the
transaction will still be successful (this differs from a normal replication factor). All updates will however
propagate from the master to other slaves eventually, so a write from one slave may not be immediately
visible on all other slaves. This is the only difference between multiple machines running in HA mode
compared to single-machine operation. All other ACID characteristics are the same.
25.2. HA Setup and configuration
Neo4j HA can be set up to accommodate differing requirements for load, fault tolerance and available
hardware.
In HA mode, Neo4j instances form a cluster. The instances monitor each other's availability to take
account of instances joining and leaving the cluster. They elect one instance to be the master, and
designate the other instances to be slaves.
For installation instructions for a High Availability cluster, see Section 25.6, “High Availability setup
tutorial” [483].
Specifying cluster members
Specify the instances that should form the cluster by supplying ha.initial_hosts, a comma-separated
list of URLs. When each instance starts, if it can contact any of the initial hosts, then it will form a cluster
with them, otherwise it will start its own cluster.
Note that the parameter is called ha.initial_hosts because it’s only used when instances initially join
the cluster. This means that you can extend the cluster without changing the configuration of existing
instances.
Server configuration
If you are running Neo4j server, specify org.neo4j.server.database.mode=HA in conf/neo4j-
server.properties.
Settings available in the Enterprise server
dbms.browser.credential_timeout: Configure the Neo4j Browser to time out logged in users after this
idle period.
dbms.browser.remote_content_hostname_whitelist: Whitelist of hosts for the Neo4j Browser to be
allowed to fetch content from.
dbms.browser.store_credentials: Configure the Neo4j Browser to store or not store user credentials.
dbms.security.allow_outgoing_browser_connections: Configure the policy for outgoing Neo4j Browser
connections.
org.neo4j.server.database.mode: Configure the operating mode of the database: SINGLE for stand-
alone operation, or HA for operating as a member in a cluster.
dbms.browser.credential_timeout
Description Configure the Neo4j Browser to time out logged in users after this idle period.
Setting this to 0 indicates no limit.
Valid values dbms.browser.credential_timeout is a duration (valid units are ms, s, m).
Default value 0
dbms.browser.remote_content_hostname_whitelist
Description Whitelist of hosts for the Neo4j Browser to be allowed to fetch content from.
Valid values dbms.browser.remote_content_hostname_whitelist is a string.
Default value
dbms.browser.store_credentials
Description Configure the Neo4j Browser to store or not store user credentials.
Valid values dbms.browser.store_credentials is a boolean.
Default value true
dbms.security.allow_outgoing_browser_connections
Description Configure the policy for outgoing Neo4j Browser connections.
Valid values dbms.security.allow_outgoing_browser_connections is a boolean.
Default value true
org.neo4j.server.database.mode
Description Configure the operating mode of the database: SINGLE for stand-alone
operation, or HA for operating as a member in a cluster.
Valid values org.neo4j.server.database.mode is a string.
Default value SINGLE
Database configuration
HA configuration parameters should be supplied alongside general Neo4j parameters in conf/
neo4j.properties. There are many configurable parameters. In most cases it isn’t necessary to modify the
default values. The only parameters that need to be specified are ha.server_id and ha.initial_hosts.
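A minimal HA configuration for one cluster member might therefore look like the sketch below (hostnames and the server id are illustrative):

```properties
# conf/neo4j.properties
ha.server_id=1
ha.initial_hosts=neo4j-01:5001,neo4j-02:5001,neo4j-03:5001

# conf/neo4j-server.properties (server deployments only)
org.neo4j.server.database.mode=HA
```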
High Availability configuration settings
dbms.security.ha_status_auth_enabled: Require authorization for access to the HA status endpoints.
ha.branched_data_policy: Policy for how to handle branched data.
ha.com_chunk_size: Max size of the data chunks that flows between master and slaves in HA.
ha.internal_state_switch_timeout: Timeout for waiting for internal conditions during state switch, like
for transactions to complete, before switching to master or slave.
ha.lock_read_timeout: Timeout for taking remote (write) locks on slaves.
ha.max_concurrent_channels_per_slave: Maximum number of connections a slave can have to the
master.
ha.pull_apply_batch_size: Size of batches of transactions applied on slaves when pulling from master.
ha.pull_interval: Interval of pulling updates from master.
ha.read_timeout: How long a slave will wait for response from master before giving up.
ha.server: Hostname and port to bind the HA server.
ha.slave_only: Whether this instance should only participate as slave in cluster.
ha.state_switch_timeout: Timeout for request threads waiting for instance to become master or
slave.
ha.tx_push_factor: The number of slaves the master will ask to replicate a committed transaction.
ha.tx_push_strategy: Push strategy of a transaction to a slave during commit.
dbms.security.ha_status_auth_enabled
Description Require authorization for access to the HA status endpoints.
Valid values dbms.security.ha_status_auth_enabled is a boolean.
Default value true
ha.branched_data_policy
Description Policy for how to handle branched data.
Valid values ha.branched_data_policy is one of keep_all, keep_last, keep_none.
Default value keep_all
ha.com_chunk_size
Description Max size of the data chunks that flow between master and slaves in HA. A bigger
size may increase throughput, but may also be more sensitive to variations in
bandwidth, whereas a lower size increases tolerance for bandwidth variations.
Valid values ha.com_chunk_size is a byte size (valid multipliers are k, m, g, K, M, G) which is
minimum 1024.
Default value 2M
ha.internal_state_switch_timeout
Description Timeout for waiting for internal conditions during state switch, like for
transactions to complete, before switching to master or slave.
Valid values ha.internal_state_switch_timeout is a duration (valid units are ms, s, m).
Default value 10s
ha.lock_read_timeout
Description Timeout for taking remote (write) locks on slaves.
Valid values ha.lock_read_timeout is a duration (valid units are ms, s, m).
Default value 20s
ha.max_concurrent_channels_per_slave
Description Maximum number of connections a slave can have to the master.
Valid values ha.max_concurrent_channels_per_slave is an integer which is minimum 1.
Default value 20
ha.pull_apply_batch_size
Description Size of batches of transactions applied on slaves when pulling from master.
Valid values ha.pull_apply_batch_size is an integer.
Default value 100
ha.pull_interval
Description Interval of pulling updates from master.
Valid values ha.pull_interval is a duration (valid units are ms, s, m).
Default value 0s
ha.read_timeout
Description How long a slave will wait for response from master before giving up.
Valid values ha.read_timeout is a duration (valid units are ms, s, m).
Default value 20s
ha.server
Description Hostname and port to bind the HA server.
Valid values ha.server is a hostname and port.
Default value 0.0.0.0:6001-6011
ha.slave_only
Description Whether this instance should only participate as a slave in the cluster. If set to
true, it will never be elected as master.
Valid values ha.slave_only is a boolean.
Default value false
ha.state_switch_timeout
Description Timeout for request threads waiting for instance to become master or slave.
Valid values ha.state_switch_timeout is a duration (valid units are ms, s, m).
Default value 120s
ha.tx_push_factor
Description The number of slaves the master will ask to replicate a committed transaction.
Valid values ha.tx_push_factor is an integer which is minimum 0.
Default value 1
ha.tx_push_strategy
Description Push strategy of a transaction to a slave during commit.
Valid values ha.tx_push_strategy is one of round_robin, fixed, fixed_descending,
fixed_ascending.
Default value fixed_descending
Cluster configuration settings
ha.allow_init_cluster: Whether to allow this instance to create a cluster if unable to join.
ha.broadcast_timeout: Timeout for broadcasting values in cluster.
ha.cluster_name: The name of a cluster.
ha.cluster_server: Host and port to bind the cluster management communication.
ha.configuration_timeout: Timeout for waiting for configuration from an existing cluster member
during cluster join.
ha.default_timeout: Default timeout used for clustering timeouts.
ha.election_timeout: Timeout for waiting for other members to finish a role election.
ha.heartbeat_interval: How often heartbeat messages should be sent.
ha.heartbeat_timeout: Timeout for heartbeats between cluster members.
ha.initial_hosts: A comma-separated list of other members of the cluster to join.
ha.join_timeout: Timeout for joining a cluster.
ha.learn_timeout: Timeout for learning values.
ha.leave_timeout: Timeout for waiting for cluster leave to finish.
ha.max_acceptors: Maximum number of servers to involve when agreeing to membership changes.
ha.paxos_timeout: Default timeout for all Paxos timeouts.
ha.phase1_timeout: Timeout for Paxos phase 1.
ha.phase2_timeout: Timeout for Paxos phase 2.
ha.server_id: Id for a cluster instance.
ha.allow_init_cluster
Description Whether to allow this instance to create a cluster if unable to join.
Valid values ha.allow_init_cluster is a boolean.
Default value true
ha.broadcast_timeout
Description Timeout for broadcasting values in cluster. Must consider the end-to-end duration
of the Paxos algorithm. This value also serves as the default for other cluster
timeout settings.
Valid values ha.broadcast_timeout is a duration (valid units are ms, s, m).
Default value 30s
ha.cluster_name
Description The name of a cluster.
Valid values ha.cluster_name is a string which must be a valid cluster name.
Default value neo4j.ha
ha.cluster_server
Description Host and port to bind the cluster management communication.
Valid values ha.cluster_server is a hostname and port.
Default value 0.0.0.0:5001-5099
ha.configuration_timeout
Description Timeout for waiting for configuration from an existing cluster member during
cluster join.
Valid values ha.configuration_timeout is a duration (valid units are ms, s, m).
Default value 1s
ha.default_timeout
Description Default timeout used for clustering timeouts. Override specific timeout settings
with proper values if necessary. This value also serves as the default for several of
the more specific cluster timeout settings.
Valid values ha.default_timeout is a duration (valid units are ms, s, m).
Default value 5s
ha.election_timeout
Description Timeout for waiting for other members to finish a role election.
Valid values ha.election_timeout is a duration (valid units are ms, s, m).
Default value 5s
ha.heartbeat_interval
Description How often heartbeat messages should be sent.
Valid values ha.heartbeat_interval is a duration (valid units are ms, s, m).
Default value 5s
ha.heartbeat_timeout
Description Timeout for heartbeats between cluster members. Should be at least twice the
value of ha.heartbeat_interval.
Valid values ha.heartbeat_timeout is a duration (valid units are ms, s, m).
Default value 11s
ha.initial_hosts
Description A comma-separated list of other members of the cluster to join.
Valid values ha.initial_hosts is a list separated by "," where items are a hostname and port.
Mandatory The ha.initial_hosts configuration setting is mandatory.
ha.join_timeout
Description Timeout for joining a cluster.
Valid values ha.join_timeout is a duration (valid units are ms, s, m).
Default value 30s
ha.learn_timeout
Description Timeout for learning values.
Valid values ha.learn_timeout is a duration (valid units are ms, s, m).
Default value 5s
ha.leave_timeout
Description Timeout for waiting for cluster leave to finish.
Valid values ha.leave_timeout is a duration (valid units are ms, s, m).
Default value 30s
ha.max_acceptors
Description Maximum number of servers to involve when agreeing to membership changes.
In very large clusters, the probability of half the cluster failing is low, but
protecting against any arbitrary half failing is expensive. Therefore you may wish
to set this parameter to a value less than the cluster size.
Valid values ha.max_acceptors is an integer which is minimum 1.
Default value 21
ha.paxos_timeout
Description Default timeout for all Paxos timeouts. This value also serves as the default for
the Paxos phase timeout settings.
Valid values ha.paxos_timeout is a duration (valid units are ms, s, m).
Default value 5s
ha.phase1_timeout
Description Timeout for Paxos phase 1.
Valid values ha.phase1_timeout is a duration (valid units are ms, s, m).
Default value 5s
ha.phase2_timeout
Description Timeout for Paxos phase 2.
Valid values ha.phase2_timeout is a duration (valid units are ms, s, m).
Default value 5s
ha.server_id
Description Id for a cluster instance. Must be unique within the cluster.
Valid values ha.server_id is an instance id, which has to be a valid integer.
Mandatory The ha.server_id configuration setting is mandatory.
25.3. How Neo4j HA operates
A Neo4j HA cluster operates cooperatively: each database instance contains the logic needed in order
to coordinate with the other members of the cluster. On startup a Neo4j HA database instance will try
to connect to an existing cluster specified by configuration. If the cluster exists, the instance will join it
as a slave. Otherwise the cluster will be created and the instance will become its master.
When performing a write transaction on a slave, each write operation will be synchronized with the
master (locks will be acquired on both master and slave). When the transaction commits it will first be
committed on the master and then, if successful, on the slave. To ensure consistency, a slave has to be
up to date with the master before performing a write operation. This is built into the communication
protocol between the slave and master, so that updates will be applied to a slave communicating with
its master automatically.
Write transactions performed directly through the master will execute in the same way as running
in normal non-HA mode. On success the transaction will be pushed out to a configurable number of
slaves (default one slave). This is done optimistically, meaning that if the push fails the transaction will
still be successful. It is also possible to configure the push factor to 0 for higher write performance when
writing directly through the master, although this increases the risk of losing any transaction not yet
pulled by another slave if the master goes down.
Slaves can also be configured to pull updates asynchronously by setting the ha.pull_interval [476]
option.
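The push and pull behavior described above is controlled by a pair of settings; for example (illustrative values in conf/neo4j.properties):

```properties
# Push committed transactions optimistically to two slaves during commit...
ha.tx_push_factor=2
# ...and additionally have slaves pull updates from the master every 10 seconds.
ha.pull_interval=10s
```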
Whenever a Neo4j database becomes unavailable, for example due to hardware failure or a network
outage, the other database instances in the cluster will detect that and mark it as temporarily
failed. A database instance that becomes available after being unavailable will automatically catch
up with the cluster. If the master goes down, another (best-suited) member will be elected and have
its role switched from slave to master after a quorum has been reached within the cluster. When the
new master has performed its role switch it will broadcast its availability to all the other members of
the cluster. Normally a new master is elected and started within just a few seconds and during this
time no writes can take place (the writes will block or in rare cases throw an exception). The only time
this is not true is when an old master had changes that did not get replicated to any other member
before becoming unavailable. If the new master is elected and performs changes before the old master
recovers, there will be two "branches" of the database after the point where the old master became
unavailable. The old master will move away its database (its "branch") and download a full copy from
the new master, to become available as a slave in the cluster.
All this can be summarized as:
Write transactions can be performed on any database instance in a cluster.
Neo4j HA is fault tolerant and can continue to operate from any number of machines down to a
single machine.
Slaves will be automatically synchronized with the master on write operations.
If the master fails a new master will be elected automatically.
The cluster automatically handles instances becoming unavailable (for example due to network
issues), and also makes sure to accept them as members in the cluster when they are available again.
Transactions are atomic, consistent and durable but eventually propagated out to other slaves.
Updates to slaves are eventually consistent by nature but can be configured to be pushed
optimistically from master during commit.
If the master goes down any running write transaction will be rolled back and new transactions will
block or fail until a new master has become available.
Reads are highly available and the ability to handle read load scales with more database instances in
the cluster.
25.4. Arbiter Instances
A typical deployment of Neo4j will use a cluster of 3 machines to provide fault-tolerance and read
scalability. This setup is described in Section 25.6, “High Availability setup tutorial” [483].
While having at least 3 instances is necessary for failover to happen in case the master becomes
unavailable, it is not required for all instances to run the full Neo4j stack, which includes the database
engine. Instead, so-called arbiter instances can be deployed. They can be regarded as cluster
participants in that their role is to take part in master elections with the single purpose of breaking ties
in the election process. This makes it possible to run a cluster of 2 Neo4j database instances plus an
additional arbiter instance, and still tolerate the failure of any single one of the 3 instances.
Arbiter instances are configured in the same way as Neo4j HA members are through the conf/
neo4j.properties file. Settings that are not cluster specific are of course ignored, so you can easily start
up an arbiter instance in place of a properly configured Neo4j instance.
To start an arbiter instance, call
neo4j_home$ ./bin/neo4j-arbiter start
You can also stop, install and remove it as a service and ask for its status in exactly the same way as for
Neo4j instances. See also Section 23.2, “Server Installation” [439].
25.5. Upgrade of a Neo4j HA Cluster
Upgrading a Neo4j HA cluster to Neo4j 2.3.12 requires following a specific process in order to ensure
that the cluster remains consistent, and that all cluster instances are able to join and participate in the
cluster following their upgrade. Neo4j 2.3.12 does not support rolling upgrades.
Back up the Neo4j database
Before starting any upgrade procedure, it is very important to make a full backup of your database.
For detailed instructions on backing up your Neo4j database, refer to the backup guide.
Shut down the cluster
Shut down the database slave instances one by one.
Shut down the master last.
Upgrade the master
1. Install Neo4j 2.3.12 on the master, keeping the database files untouched.
2. Disable HA in the configuration, by setting the variable org.neo4j.server.database.mode=SINGLE in conf/
neo4j.properties.
3. Upgrade as described for a single instance of Neo4j.
4. When upgrade has finished, shut down Neo4j again.
5. Re-enable HA in the configuration by setting org.neo4j.server.database.mode=HA in conf/
neo4j.properties.
6. Make a full backup of the Neo4j database. Please note that backups from before the upgrade are no
longer valid for update via the incremental online backup. Therefore it is important to perform a full
backup, using an empty target directory, at this point.
Upgrade the slaves
On each slave:
1. Remove all database files.
2. Install Neo4j 2.3.12.
3. Review the parameter settings in the files under the conf directory in the previous installation, and
transfer any custom set parameters to the 2.3.12 installation. Be aware of parameters that have
changed names between versions. Also, ensure that you configure the 2.3.12 installation to use the
same database file directory as the previous installation.
4. If applicable, copy the security configuration from the master, since this is not propagated
automatically. See the section called “Copying security configuration from one instance to
another” [315] for instructions.
Tip
At this point, an alternative is to manually copy the database files from the master to the
slaves. Doing so avoids the need to sync from the master when starting, which can save
considerable time when upgrading large databases.
Restart the cluster
1. Start the master instance.
2. Start the slaves, one by one. Once a slave has joined the cluster, it will sync the database from the
master instance.
25.6. High Availability setup tutorial
This guide will help you understand how to configure and deploy a Neo4j High Availability cluster. Two
scenarios will be considered:

- Configuring three instances to be deployed on three separate machines, in a setting similar to what
  might be encountered in a production environment.
- Modifying the former to make it possible to run a cluster of three instances on the same physical
  machine, which is particularly useful during development.
Background
Each instance in a Neo4j HA cluster must be assigned an integer ID, which serves as its unique
identifier. At startup, a Neo4j instance contacts the other instances specified in the ha.initial_hosts
configuration option.
When an instance establishes a connection to any other, it determines the current state of the cluster
and ensures that it is eligible to join. To be eligible the Neo4j instance must host the same database
store as other members of the cluster (although it is allowed to be in an older state), or be a new
deployment without a database store.
Explicitly configure IP Addresses/Hostnames for a cluster
Neo4j will attempt to configure IP addresses for itself in the absence of explicit
configuration. However, in typical operational environments where machines have multiple
network cards and support both IPv4 and IPv6, it is strongly recommended that the operator
explicitly sets the IP address/hostname configuration for each machine in the cluster.
Let’s examine the available settings and the values they accept.
ha.server_id
ha.server_id is the cluster identifier for each instance. It must be a positive integer and must be unique
among all Neo4j instances in the cluster.
For example, ha.server_id=1.
ha.cluster_server
ha.cluster_server is an address/port setting that specifies where the Neo4j instance will listen for
cluster communications (like heartbeat messages). The default port is 5001. In the absence of a specified
IP address, Neo4j will attempt to find a valid interface for binding. While this behavior typically results
in a well-behaved server, it is strongly recommended that users explicitly choose an IP address bound to
the network interface of their choosing to ensure a coherent cluster deployment.
For example, ha.cluster_server=192.168.33.22:5001 will listen for cluster communications on the
network interface bound to the 192.168.33.0 subnet on port 5001.
ha.initial_hosts
ha.initial_hosts is a comma separated list of address/port pairs, which specify how to reach other
Neo4j instances in the cluster (as configured via their ha.cluster_server option). These hostname/ports
will be used when the Neo4j instances start, to allow them to find and join the cluster. Specifying an
instance’s own address is permitted.
Warning
Do not use any whitespace in this configuration option.
For example, ha.initial_hosts=192.168.33.22:5001,192.168.33.21:5001 will attempt to reach Neo4j
instances listening on 192.168.33.22 on port 5001 and 192.168.33.21 on port 5001 on the 192.168.33.0
subnet.
ha.server
ha.server is an address/port setting that specifies where the Neo4j instance will listen for transactions
(changes to the graph data) from the cluster master. The default port is 6001. In the absence of a
specified IP address, Neo4j will attempt to find a valid interface for binding. While this behavior typically
results in a well-behaved server, it is strongly recommended that users explicitly choose an IP address
bound to the network interface of their choosing to ensure a coherent cluster topology.
ha.server must use a different port from ha.cluster_server.
For example, ha.server=192.168.33.22:6001 will listen for transactions from the cluster master on the
network interface bound to the 192.168.33.0 subnet on port 6001.
Address/port format
The ha.cluster_server and ha.server configuration options are specified as <IP
address>:<port>.
For ha.server the IP address must be the address assigned to one of the host’s network
interfaces.
For ha.cluster_server the IP address must be the address assigned to one of the host’s
network interfaces, or the value 0.0.0.0, which will cause Neo4j to listen on every network
interface.
Either the address or the port can be omitted, in which case the default for that part will be
used. If the address is omitted, then the port must be preceded with a colon (e.g. :5001).
The syntax for setting the port range is: <hostname>:<first port>[-<second port>]. In this
case, Neo4j will test each port in sequence, and select the first that is unused. Note that
this usage is not permitted when the hostname is specified as 0.0.0.0 (the "all interfaces"
address).
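For instance, the forms described above might look like this in conf/neo4j.properties (the addresses are illustrative):

```properties
# Explicit address and port
ha.cluster_server = 192.168.33.22:5001

# Port only; default address selection applies (note the leading colon)
#ha.cluster_server = :5001

# Port range; the first unused port in 5001-5010 is chosen
# (not permitted together with the 0.0.0.0 address)
#ha.cluster_server = 192.168.33.22:5001-5010
```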
Getting started: Setting up a production cluster
Download and configure
Download Neo4j Enterprise from the Neo4j download site (http://neo4j.com/download/), and unpack it on three separate machines.
Configure the HA related settings for each installation as outlined below. Note that all three
installations have the same configuration except for the ha.server_id property.
Neo4j instance #1 — neo4j-01.local
conf/neo4j.properties
# Unique server id for this Neo4j instance
# must be a positive integer and unique
ha.server_id = 1

# List of other known instances in this cluster
ha.initial_hosts = neo4j-01.local:5001,neo4j-02.local:5001,neo4j-03.local:5001
# Alternatively, use IP addresses:
#ha.initial_hosts = 192.168.0.20:5001,192.168.0.21:5001,192.168.0.22:5001
conf/neo4j-server.properties
# HA - High Availability
# SINGLE - Single mode, default.
org.neo4j.server.database.mode=HA

# Let the webserver only listen on the specified IP.
org.neo4j.server.webserver.address=0.0.0.0
Neo4j instance #2 — neo4j-02.local
conf/neo4j.properties
# Unique server id for this Neo4j instance
# must be a positive integer and unique
ha.server_id = 2

# List of other known instances in this cluster
ha.initial_hosts = neo4j-01.local:5001,neo4j-02.local:5001,neo4j-03.local:5001
# Alternatively, use IP addresses:
#ha.initial_hosts = 192.168.0.20:5001,192.168.0.21:5001,192.168.0.22:5001
conf/neo4j-server.properties
# HA - High Availability
# SINGLE - Single mode, default.
org.neo4j.server.database.mode=HA

# Let the webserver only listen on the specified IP.
org.neo4j.server.webserver.address=0.0.0.0
Neo4j instance #3 — neo4j-03.local
conf/neo4j.properties
# Unique server id for this Neo4j instance
# must be a positive integer and unique
ha.server_id = 3

# List of other known instances in this cluster
ha.initial_hosts = neo4j-01.local:5001,neo4j-02.local:5001,neo4j-03.local:5001
# Alternatively, use IP addresses:
#ha.initial_hosts = 192.168.0.20:5001,192.168.0.21:5001,192.168.0.22:5001
conf/neo4j-server.properties
# HA - High Availability
# SINGLE - Single mode, default.
org.neo4j.server.database.mode=HA

# Let the webserver only listen on the specified IP.
org.neo4j.server.webserver.address=0.0.0.0
Start the Neo4j Servers
Start the Neo4j servers as usual. Note that the startup order does not matter.
neo4j-01$ ./bin/neo4j start
neo4j-02$ ./bin/neo4j start
neo4j-03$ ./bin/neo4j start
Startup Time
When running in HA mode, the startup script returns immediately instead of waiting for the
server to become available. This is because the instance does not accept any requests until
a cluster has been formed, which on cold start is when all the servers in ha.initial_hosts
are running and have joined the cluster. In the example above this happens when you have
started all three instances. To keep track of the startup state you can follow the messages in
console.log; the path to that file is printed before the startup script returns.
Now, you should be able to access the three servers and check their HA status. Open the locations
below in a web browser and issue the following command in the editor after having set a password for
the database: :play sysinfo
http://neo4j-01.local:7474/
http://neo4j-02.local:7474/
http://neo4j-03.local:7474/
Tip
You can replace instance #3 with an arbiter instance, see Section 25.4, “Arbiter
Instances” [481].
That’s it! You now have a Neo4j HA cluster of three instances running. Make a change on any
instance and it will be propagated to the others. For more HA related configuration options,
take a look at Section 25.2, “HA Setup and configuration” [474].
Alternative setup: Creating a local cluster for testing
If you want to start a cluster similar to the one described above, but for development and testing
purposes, it is convenient to run all Neo4j instances on the same machine. This is easy to achieve,
although it requires some additional configuration as the defaults will conflict with each other.
Furthermore, the default dbms.pagecache.memory assumes that Neo4j has the machine to itself. If we
in this example assume that the machine has 4 gigabytes of memory, and that each JVM consumes
500 megabytes of memory, then we can allocate 500 megabytes of memory to the page cache of each
server.
Download and configure
1. Download Neo4j Enterprise from the Neo4j download site (http://neo4j.com/download/), and unpack
it into three separate directories on your test machine.
2. Configure the HA related settings for each installation as outlined below.
Neo4j instance #1 — ~/neo4j-01
conf/neo4j.properties
# Reduce the default page cache memory allocation
dbms.pagecache.memory=500m

# Port to listen to for incoming backup requests.
online_backup_server = 127.0.0.1:6366

# Unique server id for this Neo4j instance
# must be a positive integer and unique
ha.server_id = 1

# List of other known instances in this cluster
ha.initial_hosts = 127.0.0.1:5001,127.0.0.1:5002,127.0.0.1:5003

# IP and port for this instance to bind to for communicating cluster information
# with the other neo4j instances in the cluster.
ha.cluster_server = 127.0.0.1:5001

# IP and port for this instance to bind to for communicating data with the
# other neo4j instances in the cluster.
ha.server = 127.0.0.1:6363
conf/neo4j-server.properties
# HA - High Availability
# SINGLE - Single mode, default.
org.neo4j.server.database.mode=HA

# http port (for all data, administrative, and UI access)
org.neo4j.server.webserver.port=7474

# https port (for all data, administrative, and UI access)
org.neo4j.server.webserver.https.port=7484
Neo4j instance #2 — ~/neo4j-02
conf/neo4j.properties
# Reduce the default page cache memory allocation
dbms.pagecache.memory=500m

# Port to listen to for incoming backup requests.
online_backup_server = 127.0.0.1:6367

# Unique server id for this Neo4j instance
# must be a positive integer and unique
ha.server_id = 2

# List of other known instances in this cluster
ha.initial_hosts = 127.0.0.1:5001,127.0.0.1:5002,127.0.0.1:5003

# IP and port for this instance to bind to for communicating cluster information
# with the other neo4j instances in the cluster.
ha.cluster_server = 127.0.0.1:5002

# IP and port for this instance to bind to for communicating data with the
# other neo4j instances in the cluster.
ha.server = 127.0.0.1:6364
conf/neo4j-server.properties
# HA - High Availability
# SINGLE - Single mode, default.
org.neo4j.server.database.mode=HA

# http port (for all data, administrative, and UI access)
org.neo4j.server.webserver.port=7475

# https port (for all data, administrative, and UI access)
org.neo4j.server.webserver.https.port=7485
Neo4j instance #3 — ~/neo4j-03
conf/neo4j.properties
# Reduce the default page cache memory allocation
dbms.pagecache.memory=500m

# Port to listen to for incoming backup requests.
online_backup_server = 127.0.0.1:6368

# Unique server id for this Neo4j instance
# must be a positive integer and unique
ha.server_id = 3

# List of other known instances in this cluster
ha.initial_hosts = 127.0.0.1:5001,127.0.0.1:5002,127.0.0.1:5003

# IP and port for this instance to bind to for communicating cluster information
# with the other neo4j instances in the cluster.
ha.cluster_server = 127.0.0.1:5003

# IP and port for this instance to bind to for communicating data with the
# other neo4j instances in the cluster.
ha.server = 127.0.0.1:6365
conf/neo4j-server.properties
# HA - High Availability
# SINGLE - Single mode, default.
org.neo4j.server.database.mode=HA

# http port (for all data, administrative, and UI access)
org.neo4j.server.webserver.port=7476

# https port (for all data, administrative, and UI access)
org.neo4j.server.webserver.https.port=7486
Start the Neo4j Servers
Start the Neo4j servers as usual. Note that the startup order does not matter.
localhost:~/neo4j-01$ ./bin/neo4j start
localhost:~/neo4j-02$ ./bin/neo4j start
localhost:~/neo4j-03$ ./bin/neo4j start
Now, you should be able to access the three servers and check their HA status. Open the locations
below in a web browser and issue the following command in the editor after having set a password for
the database: :play sysinfo
http://127.0.0.1:7474/
http://127.0.0.1:7475/
http://127.0.0.1:7476/
25.7. REST endpoint for HA status information
Introduction
A common use case for Neo4j HA clusters is to direct all write requests to the master while using slaves
for read operations, distributing the read load across the cluster and gaining failover capabilities for
your deployment. The most common way to achieve this is to place a load balancer in front of the HA
cluster; an example is shown in the HAProxy guide. As that guide shows, the load balancer uses a REST
endpoint to discover which instance is the master and directs write load to it. In this section, we will
deal with this REST endpoint and explain its semantics.
The endpoints
Each HA instance comes with three endpoints regarding its HA status. They are complementary, but
each may be used depending on your load balancing needs and your production setup. They are:
/db/manage/server/ha/master
/db/manage/server/ha/slave
/db/manage/server/ha/available
The /master and /slave endpoints can be used to direct write and non-write traffic respectively to
specific instances. This is the optimal way to take advantage of Neo4j’s scaling characteristics. The
/available endpoint exists for the general case of directing arbitrary request types to instances that
are available for transaction processing.
To use the endpoints, perform an HTTP GET on any of them; the responses are as follows:
HA REST endpoint responses

Endpoint                          Instance State   Returned Code    Body text
/db/manage/server/ha/master       Master           200 OK           true
                                  Slave            404 Not Found    false
                                  Unknown          404 Not Found    UNKNOWN
/db/manage/server/ha/slave        Master           404 Not Found    false
                                  Slave            200 OK           true
                                  Unknown          404 Not Found    UNKNOWN
/db/manage/server/ha/available    Master           200 OK           master
                                  Slave            200 OK           slave
                                  Unknown          404 Not Found    UNKNOWN
Examples
From the command line, a common way to query these endpoints is to use curl. With no arguments, curl
will do an HTTP GET on the URI provided and will output the body text, if any. If you also want to get the
response code, just add the -v flag for verbose output. Here are some examples:
Requesting master endpoint on a running master with verbose output
> curl -v localhost:7474/db/manage/server/ha/master
* About to connect() to localhost port 7474 (0)
* Trying ::1...
* connected
* Connected to localhost (::1) port 7474 (0)
> GET /db/manage/server/ha/master HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8r zlib/1.2.5
> Host: localhost:7474
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Access-Control-Allow-Origin: *
< Transfer-Encoding: chunked
< Server: Jetty(6.1.25)
<
* Connection 0 to host localhost left intact
true* Closing connection 0
Requesting slave endpoint on a running master without verbose output:
> curl localhost:7474/db/manage/server/ha/slave
false
Finally, requesting the master endpoint on a slave with verbose output:
> curl -v localhost:7475/db/manage/server/ha/master
* About to connect() to localhost port 7475 (0)
* Trying ::1...
* connected
* Connected to localhost (::1) port 7475 (0)
> GET /db/manage/server/ha/master HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8r zlib/1.2.5
> Host: localhost:7475
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-Type: text/plain
< Access-Control-Allow-Origin: *
< Transfer-Encoding: chunked
< Server: Jetty(6.1.25)
<
* Connection 0 to host localhost left intact
false* Closing connection 0
Unknown status
The UNKNOWN status exists to describe when a Neo4j instance is neither master nor slave. For
example, the instance could be transitioning between states (master to slave in a recovery
scenario or slave being promoted to master in the event of failure). If the UNKNOWN status is
returned, the client should not treat the instance as a master or a slave, and should instead
pick another instance in the cluster to use, wait for the instance to transition out of the
UNKNOWN state, or undertake restorative action via a systems administrator.
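To make the discovery concrete, here is a small shell sketch (our own helpers, not part of Neo4j; the function names probe_master and find_master are assumptions for illustration) that probes a list of host:port pairs and prints the first one whose /master endpoint reports true:

```shell
#!/bin/sh
# probe_master HOST:PORT: print the body of the master endpoint.
# Wraps curl so it can be swapped out for a stub in testing.
probe_master() {
    curl -s --max-time 2 "http://$1/db/manage/server/ha/master"
}

# find_master HOST:PORT...: print the first host whose /master endpoint
# returns "true"; fail if none does (e.g. during a master re-election,
# when every instance may report UNKNOWN).
find_master() {
    for h in "$@"; do
        if [ "$(probe_master "$h")" = "true" ]; then
            echo "$h"
            return 0
        fi
    done
    return 1
}

# Usage against the local test cluster from the previous section:
# find_master localhost:7474 localhost:7475 localhost:7476
```

A real load balancer performs the equivalent of this probe on every health-check interval.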
25.8. Setting up HAProxy as a load balancer
In the Neo4j HA architecture, the cluster is typically fronted by a load balancer. In this section we will
explore how to set up HAProxy to perform load balancing across the HA cluster.
For this tutorial we will assume a Linux environment with HAProxy already installed. See
http://haproxy.1wt.eu/ for downloads and installation instructions.
Configuring HAProxy
HAProxy can be configured in many ways. The full documentation is available at their website.
For this example, we will configure HAProxy to load balance requests to three HA servers. Simply write
the following configuration to /etc/haproxy.cfg:
global
daemon
maxconn 256
defaults
mode http
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
frontend http-in
bind *:80
default_backend neo4j
backend neo4j
option httpchk GET /db/manage/server/ha/available
server s1 10.0.1.10:7474 maxconn 32
server s2 10.0.1.11:7474 maxconn 32
server s3 10.0.1.12:7474 maxconn 32
listen admin
bind *:8080
stats enable
HAProxy can now be started by running:
/usr/sbin/haproxy -f /etc/haproxy.cfg
You can connect to http://<ha-proxy-ip>:8080/haproxy?stats to view the status dashboard. This
dashboard can be moved to run on port 80, and authentication can also be added. See the HAProxy
documentation for details on this.
Optimizing for reads and writes
Neo4j provides a catalogue of health check URLs (see Section 25.7, “REST endpoint for HA status
information” [489]) that HAProxy (or any load balancer for that matter) can use to distinguish machines
using HTTP response codes. In the example above we used the /available endpoint, which directs
requests to machines that are generally available for transaction processing (they are alive!).
However, it is possible to have requests directed to slaves only, or to the master only. If you are able to
distinguish in your application between requests that write, and requests that only read, then you can
take advantage of two (logical) load balancers: one that sends all your writes to the master, and one
that sends all your read-only requests to a slave. In HAProxy you build logical load balancers by adding
multiple backends.
The trade-off here is that while Neo4j allows slaves to proxy writes for you, this indirection
unnecessarily ties up resources on the slave and adds latency to your write requests. Conversely, you
don’t particularly want read traffic to tie up resources on the master; Neo4j allows you to scale out for
reads, but writes are still constrained to a single instance. If possible, that instance should exclusively
do writes to ensure maximum write performance.
The following example excludes the master from the set of machines using the /slave endpoint.
global
daemon
maxconn 256
defaults
mode http
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
frontend http-in
bind *:80
default_backend neo4j-slaves
backend neo4j-slaves
option httpchk GET /db/manage/server/ha/slave
server s1 10.0.1.10:7474 maxconn 32 check
server s2 10.0.1.11:7474 maxconn 32 check
server s3 10.0.1.12:7474 maxconn 32 check
listen admin
bind *:8080
stats enable
Note
In practice, writing to a slave is uncommon. While writing to slaves has the benefit of
ensuring that data is persisted in two places (the slave and the master), it comes at a
cost. The cost is that the slave must immediately become consistent with the master by
applying any missing transactions, and then synchronously apply the new transaction together
with the master. This is a more expensive operation than writing to the master and having the
master push changes to one or more slaves.
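Putting the two logical load balancers together, a combined configuration could look like the sketch below. This is illustrative rather than a tested production config; the choice of port 81 for the read frontend and the backend names are our own, and the server addresses are the same example ones used above.

```
global
    daemon
    maxconn 256

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

# Writes: only the instance whose /master endpoint returns 200 receives traffic
frontend http-write-in
    bind *:80
    default_backend neo4j-master

backend neo4j-master
    option httpchk GET /db/manage/server/ha/master
    server s1 10.0.1.10:7474 maxconn 32 check
    server s2 10.0.1.11:7474 maxconn 32 check
    server s3 10.0.1.12:7474 maxconn 32 check

# Reads: only instances whose /slave endpoint returns 200 receive traffic
frontend http-read-in
    bind *:81
    default_backend neo4j-slaves

backend neo4j-slaves
    option httpchk GET /db/manage/server/ha/slave
    server s1 10.0.1.10:7474 maxconn 32 check
    server s2 10.0.1.11:7474 maxconn 32 check
    server s3 10.0.1.12:7474 maxconn 32 check
```

The application then sends writes to port 80 and reads to port 81; each backend's health check ensures traffic only reaches instances in the matching role.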
Cache-based sharding with HAProxy
Neo4j HA enables what is called cache-based sharding. If the dataset is too big to fit into the cache of
any single machine, then by applying a consistent routing algorithm to requests, the caches on each
machine will actually cache different parts of the graph. A typical routing key could be user ID.
In this example, the user ID is a query parameter in the URL being requested. This will route the same
user to the same machine for each request.
global
daemon
maxconn 256
defaults
mode http
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
frontend http-in
bind *:80
default_backend neo4j-slaves
backend neo4j-slaves
balance url_param user_id
server s1 10.0.1.10:7474 maxconn 32
server s2 10.0.1.11:7474 maxconn 32
server s3 10.0.1.12:7474 maxconn 32
listen admin
bind *:8080
stats enable
Naturally the health check and query parameter-based routing can be combined to only route requests
to slaves by user ID. Other load balancing algorithms are also available, such as routing by source IP
(source), by URI (uri) or by HTTP headers (hdr()).
Chapter26.Backup
Note
The Backup features are only available in the Neo4j Enterprise Edition.
26.1. Introducing Backup
Backups are performed over the network, from a running Neo4j server and into a local copy of the
database store (the backup). The backup is run using the neo4j-backup tool, which is provided alongside
Neo4j Enterprise.
Important
Neo4j Server must be configured to run a backup service. This is enabled via the
configuration parameter online_backup_enabled, and is enabled by default. The interface and
port the backup service listens on is configured via the parameter online_backup_server and
defaults to the loopback interface and port 6362. It is typical to reconfigure this to listen on
an external interface, by setting online_backup_server=<my-host-ip-address>:6362. It can also
be configured to listen on all interfaces by setting online_backup_server=0.0.0.0:6362.
Performing a backup requires specifying the target host, an optional port, and the backup location. The
backup tool will automatically select a full or incremental backup, based on whether an existing backup
is present at that location.
The relevant configuration settings are found below.
Online backup configuration settings
online_backup_enabled: Enable support for running online backups.
online_backup_server: Listening server for online backups.
online_backup_enabled
Description Enable support for running online backups.
Valid values online_backup_enabled is a boolean.
Default value true
online_backup_server
Description Listening server for online backups.
Valid values online_backup_server is a hostname and port.
Default value 0.0.0.0:6362-6372
Consistency check configuration settings
consistency_check_graph: Perform checks between nodes, relationships, properties, types and
tokens.
consistency_check_indexes: Perform checks on indexes.
consistency_check_label_scan_store: Perform checks on the label scan store.
consistency_check_property_owners: Perform optional additional checking on property ownership.
consistency_check_report_file: File name for inconsistencies log file.
consistency_check_graph
Description Perform checks between nodes, relationships, properties, types and tokens.
Valid values consistency_check_graph is a boolean.
Default value true
consistency_check_indexes
Description Perform checks on indexes. Checking indexes is more expensive than checking
the native stores, so it may be useful to turn off this check for very large
databases.
Valid values consistency_check_indexes is a boolean.
Default value true
consistency_check_label_scan_store
Description Perform checks on the label scan store. Checking this store is more expensive
than checking the native stores, so it may be useful to turn off this check for very
large databases.
Valid values consistency_check_label_scan_store is a boolean.
Default value true
consistency_check_property_owners
Description Perform optional additional checking on property ownership. This can detect a
theoretical inconsistency where a property could be owned by multiple entities.
However, the check is very expensive in time and memory, so it is skipped by
default.
Valid values consistency_check_property_owners is a boolean.
Default value false
consistency_check_report_file
Description File name for inconsistencies log file. If not specified, logs to a file in the store
directory.
Valid values consistency_check_report_file is a path.
26.2. Performing Backups
Tip
When using Neo4j in embedded mode, backups are performed in exactly the same way.
Backup Commands
# Performing a full backup: create a blank directory and run the backup tool
mkdir /mnt/backup/neo4j-backup
./bin/neo4j-backup -host 192.168.1.34 -to /mnt/backup/neo4j-backup

# Performing an incremental backup: just specify the location of your previous backup
./bin/neo4j-backup -host 192.168.1.34 -to /mnt/backup/neo4j-backup

# Performing an incremental backup where the service is listening on a non-default port
./bin/neo4j-backup -host 192.168.1.34 -port 9999 -to /mnt/backup/neo4j-backup
Incremental Backups
An incremental backup is performed whenever an existing backup directory is specified. The backup
tool will then copy any new transactions from the Neo4j server and apply them to the backup. The
result will be an updated backup that is consistent with the current server state.
However, the incremental backup may fail for a number of reasons:

- The existing directory doesn’t contain a valid backup.
- The existing directory contains a backup of a different database store.
- The existing directory contains a backup from a previous database version.
Note
Note that when copying the outstanding transactions, the backup tool needs access to the
historical logical logs. These logical logs are kept by Neo4j and automatically removed after
a period of time, based on the keep_logical_logs configuration. If the required logical logs
have already been removed, the backup tool will do a full backup instead.
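The full-versus-incremental decision described above can be pictured with a small shell sketch. This is our own simplification; the real neo4j-backup tool performs this detection itself, and using the presence of a neostore file to recognise a backup directory is an assumption for illustration:

```shell
#!/bin/sh
# backup_mode DIR: print "incremental" if DIR already looks like a backup
# (it contains a neostore file), otherwise "full". Sketch only.
backup_mode() {
    if [ -f "$1/neostore" ]; then
        echo incremental
    else
        echo full
    fi
}

# First run against an empty directory -> full backup;
# subsequent runs against the same directory -> incremental.
```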
Online Backup from Java
To programmatically back up your data, fully or subsequently incrementally, from a JVM-based
program, you need to write Java code like the following:
// OnlineBackup is provided by the org.neo4j.backup package
OnlineBackup backup = OnlineBackup.from( "127.0.0.1" );
backup.full( backupPath.getPath() );
assertTrue( "Should be consistent", backup.isConsistent() );
backup.incremental( backupPath.getPath() );
For more information, please see the Javadocs for OnlineBackup
(http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/backup/OnlineBackup.html).
26.3. Restoring Your Data
The Neo4j backups are fully functional databases. To use a backup, all you need to do is replace your
database folder with the backup. Just make sure the database isn’t running while replacing the folder.
To restore from backup in a clustered environment, follow these steps:
1. Shut down all instances in the cluster.
2. Restore the backup to the individual database folders.
3. Restart the instances.
Make sure there’s at least one neostore.transaction.db.nnn file included in the backup. If there isn’t, start
up one instance in stand-alone mode first, and issue one updating transaction against it (any sort of
write, like creating a node). Then copy that database to all instances in your cluster.
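The transaction-log requirement in the last paragraph can be checked before restoring; the following is our own helper sketch, not a Neo4j tool:

```shell
#!/bin/sh
# has_tx_log DIR: succeed if the backup directory contains at least one
# neostore.transaction.db.N file, as required before restoring into a cluster.
has_tx_log() {
    ls "$1"/neostore.transaction.db.* >/dev/null 2>&1
}

# Example check before a cluster restore:
# if ! has_tx_log /mnt/backup/neo4j-backup; then
#     echo "backup has no transaction logs; issue one write first" >&2
# fi
```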
Chapter27.Security
Neo4j in itself does not enforce security on the data level. However, there are different aspects that
should be considered when using Neo4j in different scenarios. See Section 27.1, “Securing Neo4j
Server” [500] for details.
27.1. Securing Neo4j Server
Secure the port and remote client connections
By default, the Neo4j Server is bundled with a Web server that binds to host localhost on port 7474,
answering only requests from the local machine.
This is configured in the conf/neo4j-server.properties file:
# http port (for all data, administrative, and UI access)
org.neo4j.server.webserver.port=7474

# Let the webserver only listen on the specified IP. Default is localhost (only
# accept local connections). Uncomment to allow any connection.
#org.neo4j.server.webserver.address=0.0.0.0
If you want the server to listen to external hosts, configure the Web server in conf/neo4j-server.properties
by setting the property org.neo4j.server.webserver.address=0.0.0.0, which will cause
the server to bind to all available network interfaces. Note that firewalls and the like have to be
configured accordingly as well.
Server authentication and authorization
Neo4j requires clients to supply authentication credentials when accessing the REST API. Without valid
credentials, access to the database will be forbidden.
The authentication and authorization data is stored under data/dbms/auth. If necessary, this file can
be copied over to other neo4j instances to ensure they share the same username/password (see the
section called “Copying security configuration from one instance to another” [315]).
Please refer to Section21.3, “REST API Authentication and Authorization” [312] for additional details.
When accessing Neo4j over unsecured networks, make sure HTTPS is configured and used for access
(see the section called “HTTPS support” [500]).
If necessary, authentication may be disabled. This will allow any client to access the database without
supplying authentication credentials.
 Disable authorization
dbms.security.auth_enabled=false
Warning
Disabling authentication is not recommended, and should only be done if the operator
has a good understanding of their network security, including protection against cross-site
scripting (XSS) attacks (see http://en.wikipedia.org/wiki/Cross-site_scripting) via web browsers.
Developers should not disable authentication if they have a local installation using the default
listening ports.
HTTPS support
The Neo4j server includes built-in support for SSL-encrypted communication over HTTPS. The first time
the server starts, it automatically generates a self-signed SSL certificate and a private key. Because the
certificate is self-signed, it is not safe to rely on for production use. Instead, you should provide your
own key and certificate for the server to use.
To provide your own key and certificate, replace the generated key and certificate, or change the
conf/neo4j-server.properties file to set the location of your certificate and key:
# Certificate location (auto generated if the file does not exist)
dbms.security.tls_certificate_file=ssl/snakeoil.cert

# Private key location (auto generated if the file does not exist)
dbms.security.tls_key_file=ssl/snakeoil.key
Note that the key should be unencrypted. Make sure you set correct permissions on the private key, so
that only the Neo4j server user can read/write it.
Neo4j also supports chained SSL certificates. This requires all certificates to be combined in one file in
PEM format, and the private key must be in DER format.
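As a sketch of the required conversion steps with OpenSSL (the first command only generates demo input; in practice the certificates and key come from your CA, and all file names here are placeholders):

```shell
# Demo input only: generate a key and certificate to stand in for CA-issued files.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=example" -keyout server.key -out server.crt
cp server.crt intermediate.crt
cp server.crt root.crt

# 1. Combine the chain into a single PEM file: server certificate first, root last.
cat server.crt intermediate.crt root.crt > chained.cert
# 2. Convert the PEM private key to unencrypted DER, as the server expects.
openssl pkcs8 -topk8 -nocrypt -inform PEM -outform DER \
  -in server.key -out server.der
```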
You can set what port the HTTPS connector should bind to in the same configuration file, as well as turn
HTTPS off:
 Turn https-support on/off
org.neo4j.server.webserver.https.enabled=true
 https port (for all data, administrative, and UI access)
org.neo4j.server.webserver.https.port=443
Arbitrary code execution
Important
The Neo4j server exposes remote scripting functionality by default that allows full access to
the underlying system. Exposing your server without implementing a security layer presents
a substantial security vulnerability.
By default, the Neo4j Server comes with some places where arbitrary code execution can happen.
These are the Section 21.17, “Traversals” [374] REST endpoints. To secure these, either disable them
completely by removing offending plugins from the server classpath, or secure access to these URLs
through proxies or Authorization Rules. Also, the Java Security Manager, see http://docs.oracle.com/
javase/7/docs/technotes/guides/security/index.html, can be used to secure parts of the codebase.
Server authorization rules
Administrators may require more fine-grained security policies in addition to the basic authorization
and/or IP-level restrictions on the Web server. Neo4j server supports administrators in allowing or
disallowing access to specific aspects of the database based on credentials that users or applications
provide.
To facilitate domain-specific authorization policies in Neo4j Server, security rules can
be implemented and registered with the server. This makes scenarios like user and
role based security and authentication against external lookup services possible. See
org.neo4j.server.rest.security.SecurityRule in the javadocs downloadable from Maven Central
(org.neo4j.app:neo4j-server)2.
Caution
The use of Server Authorization Rules may interact unexpectedly with the built-in
authentication and authorization (see the section called “Server authentication and
authorization” [500]), if enabled.
Enforcing Server Authorization Rules
In this example, a (dummy) failing security rule is registered to deny access to all URIs of the server by
listing the rule’s class in neo4j-server.properties:
org.neo4j.server.rest.security_rules=my.rules.PermanentlyFailingSecurityRule
with the rule source code of:
public class PermanentlyFailingSecurityRule implements SecurityRule
{
    public static final String REALM = "WallyWorld"; // as per RFC 2617 :-)
2 http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.neo4j.app%22%20AND%20a%3A%22neo4j-server%22
    @Override
    public boolean isAuthorized( HttpServletRequest request )
    {
        return false; // always fails - a production implementation performs
                      // deployment-specific authorization logic here
    }

    @Override
    public String forUriPath()
    {
        return "/*";
    }

    @Override
    public String wwwAuthenticateHeader()
    {
        return SecurityFilter.basicAuthenticationResponse( REALM );
    }
}
With this rule registered, any access to the server will be denied. In a production-quality
implementation the rule would likely look up credentials/claims in a third-party directory service (e.g.
LDAP) or in a local database of authorized users.
Example request
POST http://localhost:7474/db/data/node
Accept: application/json; charset=UTF-8
Example response
401: Unauthorized
WWW-Authenticate: Basic realm="WallyWorld"
Using Wildcards to Target Security Rules
In this example, a security rule is again registered by listing its class in neo4j-server.properties, but
this time it is bound using a wildcard URI path (where * characters signify any part of the path). For
example, /users* means the rule will be bound to any resources under the /users root path, while
/users*type* will bind the rule to resources matching URIs like /users/fred/type/premium.
org.neo4j.server.rest.security_rules=my.rules.PermanentlyFailingSecurityRuleWithWildcardPath
with the rule source code of:
public String forUriPath()
{
    return "/protected/*";
}
With this rule registered, any access to URIs under /protected/ will be denied by the server. Using
wildcards allows flexible targeting of security rules to arbitrary parts of the server’s API, including any
unmanaged extensions or managed plugins that have been registered.
Example request
GET http://localhost:7474/protected/tree/starts/here/dummy/more/stuff
Accept: application/json
Example response
401: Unauthorized
WWW-Authenticate: Basic realm="WallyWorld"
Using Complex Wildcards to Target Security Rules
In this example, a security rule is registered to deny access to all URIs matching a complex pattern. The
config looks like this:
org.neo4j.server.rest.security_rules=my.rules.PermanentlyFailingSecurityRuleWithComplexWildcardPath
with the rule source code of:
public class PermanentlyFailingSecurityRuleWithComplexWildcardPath implements SecurityRule
{
    public static final String REALM = "WallyWorld"; // as per RFC 2617 :-)

    @Override
    public boolean isAuthorized( HttpServletRequest request )
    {
        return false;
    }

    @Override
    public String forUriPath()
    {
        return "/protected/*/something/else/*/final/bit";
    }

    @Override
    public String wwwAuthenticateHeader()
    {
        return SecurityFilter.basicAuthenticationResponse( REALM );
    }
}
Example request
GET http://localhost:7474/protected/wildcard_replacement/x/y/z/something/else/
more_wildcard_replacement/a/b/c/final/bit/more/stuff
Accept: application/json
Example response
401: Unauthorized
WWW-Authenticate: Basic realm="WallyWorld"
Using a proxy
Although the Neo4j server has a number of security features built in (see the sections above), for
sensitive deployments it is often sensible to front it with a proxy like Apache mod_proxy 3.
This provides a number of advantages:
Control access to the Neo4j server to specific IP addresses, URL patterns and IP ranges. This can be
used, for instance, to make only the /db/data namespace accessible to non-local clients, while the
/db/admin URLs respond only to a specific IP address.
3http://httpd.apache.org/docs/2.2/mod/mod_proxy.html
<Proxy *>
Order Deny,Allow
Deny from all
Allow from 192.168.0
</Proxy>
While it is possible to develop plugins using Neo4j’s SecurityRule (see above), operations
professionals often prefer to configure proxy servers such as Apache. However, it should be noted
that in cases where both approaches are used, they will work harmoniously only provided that the
behavior is consistent across the proxy server and the SecurityRule plugins.
Run Neo4j Server as a non-root user on a Linux/Unix system on a privileged port below 1024 (e.g. port 80) using
ProxyPass /neo4jdb/data http://localhost:7474/db/data
ProxyPassReverse /neo4jdb/data http://localhost:7474/db/data
Simple load balancing in a clustered environment to spread read load, using the Apache
mod_proxy_balancer 4 plugin:
<Proxy balancer://mycluster>
BalancerMember http://192.168.1.50:80
BalancerMember http://192.168.1.51:80
</Proxy>
ProxyPass /test balancer://mycluster
LOAD CSV
The Cypher LOAD CSV clause can load files from the filesystem, and its default configuration allows any
file on the system to be read using a file:/// URL. This presents a security vulnerability in production
environments where database users should not otherwise have access to files on the system. For
production deployments, configure the dbms.security.load_csv_file_url_root [468] setting, which will
make all files identified in a file:/// URL relative to the specified directory, similarly to how a Unix
chroot works. Alternatively, set the allow_file_urls [465] setting to false, which disables the use of
file:/// URLs entirely. Further information can be found in Section 11.6, “Load CSV” [182].
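Put together, the two hardening options might look like this in the server properties file (the import directory is only an example):

```
# Resolve file:/// URLs in LOAD CSV relative to this directory (example path)
dbms.security.load_csv_file_url_root=data/import
# Or disable file:/// URLs in LOAD CSV altogether
allow_file_urls=false
```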
Neo4j Web Interface Security
For configuration settings to consider in order to achieve the level of security you want, see the
section called “Web Interface configuration settings” [452].
4http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html
Chapter28.Monitoring
Note
Most of the monitoring features are only available in the Enterprise edition of Neo4j.
In order to be able to continuously get an overview of the health of a Neo4j database, there are
different levels of monitoring facilities available. Most of these are exposed through JMX1. Neo4j
Enterprise also has the ability to automatically report metrics to commonly used monitoring systems,
like Graphite2 and Ganglia3.
1 http://www.oracle.com/technetwork/java/javase/tech/javamanagement-140525.html
2 http://graphite.wikidot.com
3 http://ganglia.sourceforge.net
Monitoring
506
28.1. Adjusting remote JMX access to the Neo4j Server
By default, the Neo4j Enterprise Server edition does not allow remote JMX connections, since the
relevant options in the conf/neo4j-wrapper.conf configuration file are commented out. To enable this
feature, you have to remove the comment characters (#) from the various com.sun.management.jmxremote
options there.
Once uncommented, the default values are set up to allow remote JMX connections with certain roles;
refer to the conf/jmx.password, conf/jmx.access and conf/neo4j-wrapper.conf files for details.
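Uncommented, the options typically look like the following. The exact lines and values are illustrative; check the conf/neo4j-wrapper.conf shipped with your installation:

```
wrapper.java.additional=-Dcom.sun.management.jmxremote.port=3637
wrapper.java.additional=-Dcom.sun.management.jmxremote.authenticate=true
wrapper.java.additional=-Dcom.sun.management.jmxremote.ssl=false
wrapper.java.additional=-Dcom.sun.management.jmxremote.password.file=conf/jmx.password
wrapper.java.additional=-Dcom.sun.management.jmxremote.access.file=conf/jmx.access
```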
Make sure that conf/jmx.password has the correct file permissions. The owner of the file has to be the
user that will run the service, and the permissions should be read only for that user. On Unix systems,
this is 0600.
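On Unix the permissions can be set as follows (a sketch; the touch only makes the snippet self-contained, and the file must additionally be owned by the user that runs the service, e.g. via chown as root):

```shell
# Restrict conf/jmx.password to owner read/write only (mode 0600).
mkdir -p conf
touch conf/jmx.password
chmod 0600 conf/jmx.password
```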
On Windows, follow the tutorial at http://docs.oracle.com/javase/7/docs/technotes/guides/
management/security-windows.html to set the correct permissions. If you are running the service
under the Local System Account, the user that owns the file and has access to it should be SYSTEM.
With this setup, you should be able to connect to JMX monitoring of the Neo4j server using <IP-OF-
SERVER>:3637, with the username monitor and the password Neo4j.
Note that it is possible that you have to update the permissions and/or ownership of the conf/
jmx.password and conf/jmx.access files; refer to the relevant section in conf/neo4j-wrapper.conf for
details.
Warning
For maximum security, please adjust at least the password settings in conf/jmx.password for
a production installation.
For more details, see: http://docs.oracle.com/javase/7/docs/technotes/guides/management/agent.html.
28.2. How to connect to a Neo4j instance using JMX and
JConsole
First, start your embedded database or the Neo4j Server, for instance using
$NEO4J_HOME/bin/neo4j start
Now, start JConsole with
$JAVA_HOME/bin/jconsole
Connect to the process running your Neo4j database instance:
Figure28.1.Connecting JConsole to the Neo4j Java process
Now, besides the MBeans exposed by the JVM, you will see an org.neo4j section in the MBeans tab.
Under that, you will have access to all the monitoring information exposed by Neo4j.
For opening JMX to remote monitoring access, please see Section 28.1, “Adjusting remote JMX access to
the Neo4j Server” [506] and the JMX documentation4. When using Neo4j in embedded mode, make sure
to pass the com.sun.management.jmxremote.port=portNum or other configuration as JVM parameters to your
running Java process.
4 http://docs.oracle.com/javase/7/docs/technotes/guides/management/agent.html
Figure28.2.Neo4j MBeans View
28.3. How to connect to the JMX monitoring programmatically
In order to programmatically connect to the Neo4j JMX server, there are some convenience methods
in the Neo4j Management component to help you find out the most commonly used monitoring
attributes of Neo4j. See Section33.12, “Reading a management attribute” [604] for an example.
Once you have access to this information, you can use it, for instance, to expose the values to SNMP5 or
other monitoring systems.
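As a minimal sketch of the JMX plumbing involved, using only the standard javax.management API: the example below queries the JVM’s own platform MBean server so that it is self-contained; against a running Neo4j server you would instead obtain the connection via JMXConnectorFactory, and the Neo4j beans live in the org.neo4j domain rather than java.lang.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class JmxAttributeSample
{
    public static void main( String[] args ) throws Exception
    {
        // Local platform MBean server; for a remote server you would use
        // JMXConnectorFactory.connect( new JMXServiceURL( ... ) )
        //     .getMBeanServerConnection() instead.
        MBeanServerConnection mbsc = ManagementFactory.getPlatformMBeanServer();

        // Neo4j exposes its beans under the "org.neo4j" domain; we query the
        // JVM's own "java.lang" domain here so the sketch runs anywhere.
        Set<ObjectName> names = mbsc.queryNames( new ObjectName( "java.lang:*" ), null );
        for ( ObjectName name : names )
        {
            System.out.println( name );
        }

        // Reading a single attribute works the same way for Neo4j MBeans.
        Object uptime = mbsc.getAttribute(
                new ObjectName( "java.lang:type=Runtime" ), "Uptime" );
        System.out.println( "Uptime (ms): " + uptime );
    }
}
```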
5 http://en.wikipedia.org/wiki/Simple_Network_Management_Protocol
28.4. Reference of supported JMX MBeans
MBeans exposed by Neo4j
Branched Store: Information about the branched stores present in this HA cluster member.
Configuration: The configuration parameters used to configure Neo4j.
Diagnostics: Diagnostics provided by Neo4j.
High Availability: Information about an instance participating in a HA cluster.
Index sampler: Handle index sampling.
Kernel: Information about the Neo4j kernel.
Locking: Information about the Neo4j lock status.
Memory Mapping: The status of Neo4j memory mapping.
Page cache: Information about the Neo4j page cache.
Primitive count: Estimates of the numbers of different kinds of Neo4j primitives.
Store file sizes: Information about the sizes of the different parts of the Neo4j graph store.
Transactions: Information about the Neo4j transaction manager.
Note
For additional information on the primitive datatypes (int, long etc.) used in the JMX
attributes, please see Property value types [584] in the section called “Properties” [7].
MBean Branched Store (org.neo4j.management.BranchedStore) Attributes
Name Description Type Read Write
Information about the branched stores present in this HA cluster member
BranchedStores A list of the branched stores org.neo4j.management.BranchedStoreInfo[] as CompositeData[] yes no
MBean Configuration (org.neo4j.jmx.impl.ConfigurationBean) Attributes
Name Description Type Read Write
The configuration parameters used to configure Neo4j
allow_ store_ upgrade Whether to allow a store upgrade in
case the current version of the database
starts against an older store version.
Setting this to true does not guarantee
successful upgrade; it just allows an
upgrade to be performed.
String yes no
array_ block_ size Specifies the block size for storing arrays.
This parameter is only honored when the
store is created, otherwise it is ignored.
The default block size is 120 bytes, and
the overhead of each block is the same
as for string blocks, i.e., 8 bytes.
String yes no
batch_inserter_batch_size
Specifies the number of operations that
the batch inserter will try to group into
one batch before flushing data into
underlying storage.
String yes no
batched_ writes Whether or not transactions are
appended to the log in batches
String yes no
cache_type The type of cache to use for nodes and
relationships. This configuration setting
is no longer applicable from Neo4j 2.3.
Configuration has been simplified to only
require tuning of the page cache.
String yes no
cypher_ parser_ version Set this to specify the default parser
(language version).
String yes no
dbms.checkpoint.interval.time
Configures the time interval between
check-points. The database will not
check-point more often than this (unless
check pointing is triggered by a different
event), but might check-point less
often than this interval, if performing
a check-point takes longer time than
the configured interval. A check-point
is a point in the transaction logs from
which recovery would start. Longer
check-point intervals typically mean that
recovery will take longer to complete
in case of a crash. On the other hand,
a longer check-point interval can also
reduce the I/O load that the database
places on the system, as each check-
point implies a flushing and forcing of
all the store files. The default is 5m for
a check-point every 5 minutes. Other
supported units are s for seconds, and
ms for milliseconds.
String yes no
dbms.checkpoint.interval.tx
Configures the transaction interval
between check-points. The database
will not check-point more often than
this (unless check pointing is triggered
by a different event), but might check-
point less often than this interval, if
performing a check-point takes longer
time than the configured interval. A
check-point is a point in the transaction
logs from which recovery would start.
Longer check-point intervals typically
mean that recovery will take
longer to complete in case of a crash.
On the other hand, a longer check-point
interval can also reduce the I/O load that
the database places on the system, as
each check-point implies a flushing and
forcing of all the store files. The default
is 100000 for a check-point every 100000
transactions.
String yes no
dbms.cypher.compiler_tracing
Enable tracing of compilation in cypher. String yes no
dbms.cypher.hints.error
Set this to specify the behavior when
Cypher planner or runtime hints
cannot be fulfilled. If true, then non-
conformance will result in an error,
otherwise only a warning is generated.
String yes no
dbms. cypher. planner Set this to specify the default planner for
the default language version.
String yes no
dbms. cypher. runtime Set this to specify the default runtime for
the default language version.
String yes no
dbms. pagecache. memory The amount of memory to use for
mapping the store files, in bytes (or
kilobytes with the k suffix, megabytes
with m and gigabytes with g). If Neo4j is
running on a dedicated server, then it is
generally recommended to leave about
2-4 gigabytes for the operating system,
give the JVM enough heap to hold all
your transaction state and query context,
and then leave the rest for the page
cache. The default page cache memory
assumes the machine is dedicated to
running Neo4j, and is heuristically set to
50% of RAM minus the max Java heap
size.
String yes no
dbms.pagecache.pagesize
Target size for pages of mapped
memory. If set to 0, then a reasonable
default is chosen, depending on the
storage device used.
String yes no
dbms.pagecache.swapper
Specify which page swapper to use for
doing paged IO. This is only used when
integrating with proprietary storage
technology.
String yes no
dbms.querylog.enabled Log executed queries that take longer
than the configured threshold. NOTE:
This feature is only available in the Neo4j
Enterprise Edition.
String yes no
dbms.querylog.filename
Log executed queries that take longer
than the configured threshold
String yes no
dbms.querylog.max_archives
Maximum number of history files for the
query log.
String yes no
dbms.querylog.parameter_logging_enabled
Log parameters for executed queries
that took longer than the configured
threshold.
String yes no
dbms.querylog.rotation.threshold
Specifies at which file size the query log
will auto-rotate. 0 means that no rotation
will automatically occur based on file
size.
String yes no
dbms.querylog.threshold
If the execution of a query takes more time
than this threshold, the query is logged
(provided query logging is enabled).
Defaults to 0 seconds, that is, all queries
are logged.
String yes no
dbms.security.ha_status_auth_enabled
Require authorization for access to the
HA status endpoints.
String yes no
dense_ node_ threshold Relationship count threshold for
considering a node to be dense
String yes no
dump_ configuration Print out the effective Neo4j
configuration after startup.
String yes no
edition Configuration attribute String yes no
ephemeral Configuration attribute String yes no
forced_ kernel_ id An identifier that uniquely identifies this
graph database instance within this JVM.
Defaults to an auto-generated number
depending on how many instances are
started in this JVM.
String yes no
gc_ monitor_ threshold The amount of time in ms the monitor
thread has to be blocked before logging
a message it was blocked.
String yes no
gc_ monitor_ wait_ time Amount of time in ms the GC monitor
thread will wait before taking another
measurement.
String yes no
ha. allow_ init_ cluster Whether to allow this instance to create
a cluster if unable to join.
String yes no
ha.branched_data_policy
Policy for how to handle branched data. String yes no
ha. broadcast_ timeout Timeout for broadcasting values in
cluster. Must consider end-to-end
duration of Paxos algorithm. This value is
the default value for the ha.join_timeout
and ha.leave_timeout settings.
String yes no
ha. cluster_ name The name of a cluster. String yes no
ha. cluster_ server Host and port to bind the cluster
management communication.
String yes no
ha. com_ chunk_ size Max size of the data chunks that flows
between master and slaves in HA. Bigger
size may increase throughput, but may
also be more sensitive to variations in
bandwidth, whereas lower size increases
tolerance for bandwidth variations.
String yes no
ha.configuration_timeout
Timeout for waiting for configuration
from an existing cluster member during
cluster join.
String yes no
ha. default_ timeout Default timeout used for clustering
timeouts. Override specific timeout
settings with proper values if necessary.
This value is the default value for the
ha.heartbeat_interval, ha.paxos_timeout
and ha.learn_timeout settings.
String yes no
ha. election_ timeout Timeout for waiting for other members
to finish a role election. Defaults to
ha.paxos_timeout.
String yes no
ha. heartbeat_ interval How often heartbeat messages should
be sent. Defaults to ha.default_timeout.
String yes no
ha. heartbeat_ timeout Timeout for heartbeats between cluster
members. Should be at least twice that
of ha.heartbeat_interval.
String yes no
ha. initial_ hosts A comma-separated list of other
members of the cluster to join.
String yes no
ha.internal_state_switch_timeout
Timeout for waiting for internal
conditions during state switch, like
for transactions to complete, before
switching to master or slave.
String yes no
ha. join_ timeout Timeout for joining a cluster. Defaults to
ha.broadcast_timeout.
String yes no
ha. learn_ timeout Timeout for learning values. Defaults to
ha.default_timeout.
String yes no
ha. leave_ timeout Timeout for waiting for cluster leave to
finish. Defaults to ha.broadcast_timeout.
String yes no
ha. lock_ read_ timeout Timeout for taking remote (write) locks
on slaves. Defaults to ha.read_timeout.
String yes no
ha. max_ acceptors Maximum number of servers to involve
when agreeing to membership changes.
In very large clusters, the probability
of half the cluster failing is low, but
protecting against any arbitrary half
failing is expensive. Therefore you may
wish to set this parameter to a value less
than the cluster size.
String yes no
ha.max_concurrent_channels_per_slave
Maximum number of connections a
slave can have to the master.
String yes no
ha. paxos_ timeout Default timeout for all Paxos timeouts.
Defaults to ha.default_timeout. This
value is the default value for the
ha.phase1_timeout, ha.phase2_timeout
and ha.election_timeout settings.
String yes no
ha. phase1_ timeout Timeout for Paxos phase 1. Defaults to
ha.paxos_timeout.
String yes no
ha. phase2_ timeout Timeout for Paxos phase 2. Defaults to
ha.paxos_timeout.
String yes no
ha.pull_apply_batch_size
Size of batches of transactions applied
on slaves when pulling from master
String yes no
ha. pull_ interval Interval of pulling updates from master. String yes no
ha. read_ timeout How long a slave will wait for response
from master before giving up.
String yes no
ha. server_ id Id for a cluster instance. Must be unique
within the cluster.
String yes no
ha. server Hostname and port to bind the HA
server.
String yes no
ha. slave_ only Whether this instance should only
participate as slave in cluster. If set to
true, it will never be elected as master.
String yes no
ha.state_switch_timeout
Timeout for request threads waiting for
instance to become master or slave.
String yes no
ha. tx_ push_ factor The amount of slaves the master will ask
to replicate a committed transaction.
String yes no
ha. tx_ push_ strategy Push strategy of a transaction to a slave
during commit.
String yes no
index_background_sampling_enabled
Enable or disable background index
sampling
String yes no
index_sampling_buffer_size
Size of buffer used by index sampling String yes no
index_sampling_update_percentage
Percentage of index updates of total
index size required before sampling of a
given index is triggered
String yes no
intercept_deserialized_transactions
Determines whether any
TransactionInterceptors loaded will
intercept externally received transactions
(for example in HA) before they reach the
logical log and are applied to the store.
String yes no
jmx. port Configuration attribute String yes no
keep_ logical_ logs Make Neo4j keep the logical transaction
logs for being able to backup the
database. Can be used for specifying
the threshold to prune logical logs after.
For example "10 days" will prune logical
logs that only contains transactions
older than 10 days from the current
time, or "100k txs" will keep the 100k
latest transactions and prune any older
transactions.
String yes no
label_ block_ size Specifies the block size for storing labels
exceeding in-lined space in node record.
This parameter is only honored when the
store is created, otherwise it is ignored.
The default block size is 60 bytes, and
the overhead of each block is the same
as for string blocks, i.e., 8 bytes.
String yes no
log_mapped_memory_stats_filename
The file where memory mapping
statistics will be recorded.
String yes no
log_mapped_memory_stats_interval
The number of records to be loaded
between regular logging of memory
mapping statistics.
String yes no
log_mapped_memory_stats
Log memory mapping statistics regularly. String yes no
logical_log_rotation_threshold
Specifies at which file size the logical log
will auto-rotate. 0 means that no rotation
will automatically occur based on file
size.
String yes no
neostore.nodestore.db.mapped_memory
The size to allocate for memory mapping
the node store.
String yes no
neostore.propertystore.db.arrays.mapped_memory
The size to allocate for memory mapping
the array property store.
String yes no
neostore.propertystore.db.index.keys.mapped_memory
The size to allocate for memory mapping
the store for property key strings.
String yes no
neostore.propertystore.db.index.mapped_memory
The size to allocate for memory mapping
the store for property key indexes.
String yes no
neostore.propertystore.db.mapped_memory
The size to allocate for memory mapping
the property value store.
String yes no
neostore.propertystore.db.strings.mapped_memory
The size to allocate for memory mapping
the string property store.
String yes no
neostore.relationshipstore.db.mapped_memory
The size to allocate for memory mapping
the relationship store.
String yes no
node_ auto_ indexing Controls the auto indexing feature
for nodes. Setting it to false shuts
it down, while true enables it by
default for properties listed in the
node_keys_indexable setting.
String yes no
node_ keys_ indexable A list of property names (comma
separated) that will be indexed by
default. This applies to nodes only.
String yes no
online_ backup_ enabled Enable support for running online
backups
String yes no
online_ backup_ server Listening server for online backups String yes no
read_ only Only allow read operations from this
Neo4j instance. This mode still requires
write access to the directory for lock
purposes.
String yes no
rebuild_idgenerators_fast
Use a quick approach for rebuilding the
ID generators. This gives quicker recovery
time, but will limit the ability to reuse the
space of deleted entities.
String yes no
relationship_auto_indexing
Controls the auto indexing feature
for relationships. Setting it to false
shuts it down, while true enables it
by default for properties listed in the
relationship_keys_indexable setting.
String yes no
relationship_grab_size
How many relationships to read at a time
during iteration
String yes no
relationship_keys_indexable
A list of property names (comma
separated) that will be indexed by
default. This applies to relationships only.
String yes no
remote_ shell_ enabled Enable a remote shell server which
Neo4j Shell clients can log in to.
String yes no
remote_ shell_ host Remote host for shell. By default, the
shell server listens only on the loopback
interface, but you can specify the IP
address of any network interface or use
0.0.0.0 for all interfaces.
String yes no
remote_ shell_ name The name of the shell. String yes no
remote_ shell_ port The port the shell will listen on. String yes no
remote_shell_read_only
Read only mode. Will only allow read
operations.
String yes no
store.internal_log.debug_contexts
Internal log contexts that should output
debug level logging
String yes no
store.internal_log.level
Log level threshold. String yes no
store.internal_log.location
The location of the internal diagnostics
log.
String yes no
store.internal_log.max_archives
Maximum number of history files for the
internal log.
String yes no
store.internal_log.rotation_delay
Minimum time interval after last rotation
of the internal log before it may be
rotated again.
String yes no
store.internal_log.rotation_threshold
Threshold for rotation of the internal log. String yes no
store.interval.log.rotation
Maximum time interval for log rotation
to wait for active transaction completion
String yes no
store_ dir The directory where the database files
are located.
String yes no
string_ block_ size Specifies the block size for storing
strings. This parameter is only honored
when the store is created, otherwise it
is ignored. Note that each character in a
string occupies two bytes, meaning that
a block size of 120 (the default size) will
hold a 60 character long string before
overflowing into a second block. Also
note that each block carries an overhead
of 8 bytes. This means that if the block
size is 120, the size of the stored records
will be 128 bytes.
String yes no
transaction_start_timeout
The maximum amount of time to wait
for the database to become available,
when starting a new transaction.
String yes no
unsupported.dbms.id_reuse_safe_zone
Duration for which master will buffer ids
and not reuse them to allow slaves to read
consistently. Slaves will also terminate
transactions longer than this duration,
when applying the received transaction
stream, to make sure they do not read
potentially inconsistent/reused records.
String yes no
unsupported.dbms.index.archive_failed
Create an archive of an index before re-
creating it if failing to load on startup.
String yes no
use_memory_mapped_buffers
Use memory mapped buffers for
accessing the native storage layer.
String yes no
MBean Diagnostics (org.neo4j.management.Diagnostics) Attributes
Name Description Type Read Write
Diagnostics provided by Neo4j
DiagnosticsProviders A list of the ids for the registered
diagnostics providers.
List (java.util.List) yes no
MBean Diagnostics (org.neo4j.management.Diagnostics) Operations
Name Description ReturnType Signature
dumpAll Dump diagnostics information to
JMX
String (no parameters)
dumpToLog Dump diagnostics information to the
log.
void (no parameters)
dumpToLog Dump diagnostics information to the
log.
void java. lang. String
extract Operation exposed for management String java. lang. String
MBean High Availability (org.neo4j.management.HighAvailability) Attributes
Name Description Type Read Write
Information about an instance participating in a HA cluster
Alive Whether this instance is alive or not boolean yes no
Available Whether this instance is available or not boolean yes no
InstanceId The identifier used to identify this server
in the HA cluster
String yes no
InstancesInCluster Information about all instances in this
cluster
org.neo4j.management.ClusterMemberInfo[] as CompositeData[]
yes no
LastCommittedTxId The latest transaction id present in this
instance’s store
long yes no
LastUpdateTime The time when the data on this instance
was last updated from the master
String yes no
Role The role this instance has in the cluster String yes no
MBean High Availability (org.neo4j.management.HighAvailability) Operations
Name Description ReturnType Signature
update (If this is a slave) Update the
database on this instance with the
latest transactions from the master
String (no parameters)
MBean Kernel (org.neo4j.jmx.Kernel) Attributes
Name Description Type Read Write
Information about the Neo4j kernel
KernelStartTime The time from which this Neo4j instance was in operational mode. Date (java.util.Date) yes no
KernelVersion The version of Neo4j String yes no
MBeanQuery An ObjectName that can be used as a query for getting all management beans for this Neo4j instance. javax.management.ObjectName yes no
ReadOnly Whether this is a read only instance boolean yes no
StoreCreationDate The time when this Neo4j graph store was created. Date (java.util.Date) yes no
StoreDirectory The location where the Neo4j store is
located
String yes no
StoreId An identifier that, together with store
creation time, uniquely identifies this
Neo4j graph store.
String yes no
StoreLogVersion The current version of the Neo4j store
logical log.
long yes no
MBean Locking (org.neo4j.management.LockManager) Attributes
Name Description Type Read Write
Information about the Neo4j lock status
Locks Information about all locks held by Neo4j java.util.List<org.neo4j.kernel.info.LockInfo> as CompositeData[] yes no
NumberOfAvertedDeadlocks The number of lock sequences that would have led to a deadlock situation that Neo4j has detected and averted (by throwing DeadlockDetectedException). long yes no
MBean Locking (org.neo4j.management.LockManager) Operations
Name Description ReturnType Signature
getContendedLocks getContendedLocks java.util.List<org.neo4j.kernel.info.LockInfo> as CompositeData[] long
MBean Memory Mapping (org.neo4j.management.MemoryMapping) Attributes
Name Description Type Read Write
The status of Neo4j memory mapping
MemoryPools Get information about each pool of memory mapped regions from store files with memory mapping enabled org.neo4j.management.WindowPoolInfo[] as CompositeData[] yes no
MBean Page cache (org.neo4j.management.PageCache) Attributes
Name Description Type Read Write
Information about the Neo4j page cache
BytesRead Number of bytes read from durable
storage
long yes no
BytesWritten Number of bytes written to durable
storage
long yes no
EvictionExceptions Number of exceptions caught during
page eviction
long yes no
Evictions Number of page evictions long yes no
Faults Number of page faults long yes no
FileMappings Number of files that have been mapped
into the page cache
long yes no
FileUnmappings Number of files that have been
unmapped from the page cache
long yes no
Flushes Number of page flushes long yes no
Pins Number of page pins long yes no
Unpins Number of page unpins long yes no
MBean Primitive count (org.neo4j.jmx.Primitives) Attributes
Name Description Type Read Write
Estimates of the numbers of different kinds of Neo4j primitives
NumberOfNodeIdsInUse An estimation of the number of nodes used in this Neo4j instance long yes no
NumberOfPropertyIdsInUse An estimation of the number of properties used in this Neo4j instance long yes no
NumberOfRelationshipIdsInUse An estimation of the number of relationships used in this Neo4j instance long yes no
NumberOfRelationshipTypeIdsInUse The number of relationship types used in this Neo4j instance long yes no
MBean Store file sizes (org.neo4j.jmx.StoreFile) Attributes
Name Description Type Read Write
Information about the sizes of the different parts of the Neo4j graph store
ArrayStoreSize The amount of disk space used to store
array properties, in bytes.
long yes no
LogicalLogSize The amount of disk space used by the
current Neo4j logical log, in bytes.
long yes no
NodeStoreSize The amount of disk space used to store
nodes, in bytes.
long yes no
PropertyStoreSize The amount of disk space used to store
properties (excluding string values and
array values), in bytes.
long yes no
RelationshipStoreSize The amount of disk space used to store
relationships, in bytes.
long yes no
StringStoreSize The amount of disk space used to store
string properties, in bytes.
long yes no
TotalStoreSize The total disk space used by this Neo4j
instance, in bytes.
long yes no
MBean Transactions (org.neo4j.management.TransactionManager) Attributes
Name Description Type Read Write
Information about the Neo4j transaction manager
LastCommittedTxId The id of the latest committed
transaction
long yes no
NumberOfCommittedTransactions The total number of committed transactions long yes no
NumberOfOpenedTransactions The total number of started transactions long yes no
NumberOfOpenTransactions The number of currently open transactions long yes no
NumberOfRolledBackTransactions The total number of rolled back transactions long yes no
PeakNumberOfConcurrentTransactions The highest number of transactions ever opened concurrently long yes no
MBean Index sampler (org.neo4j.management.IndexSamplingManager) Operations
Name Description ReturnType Signature
triggerIndexSampling triggerIndexSampling void java.lang.String, java.lang.String, boolean
28.5.Metrics Reporting
Note
Metrics reporting is only available in the Neo4j Enterprise Edition.
Introducing Metrics
Neo4j Enterprise can be configured to continuously export Neo4j-specific metrics to Graphite, Ganglia,
or CSV files. This makes it easy to monitor the health of running Neo4j instances.
Neo4j Enterprise can expose metrics for the following parts of the database, and does so by default:
// default setting for enabling all supported metrics
metrics.enabled=true
// default setting for enabling all Neo4j specific metrics
metrics.neo4j.enabled=true
// setting for exposing metrics about transactions; number of transactions started, committed, etc.
metrics.neo4j.tx.enabled=true
// setting for exposing metrics about the Neo4j page cache; page faults, evictions, flushes and exceptions, etc.
metrics.neo4j.pagecache.enabled=true
// setting for exposing metrics about approximately how many entities are in the database; nodes, relationships, properties, etc.
metrics.neo4j.counts.enabled=true
// setting for exposing metrics about the network usage of the HA cluster component
metrics.neo4j.network.enabled=true
Graphite Configuration
For Graphite integration add the following settings to neo4j.properties:
metrics.graphite.enabled=true // default is 'false'
metrics.graphite.server=<ip>:2003
metrics.graphite.interval=<how often to send data, defaults to 3s>
metrics.prefix=<Neo4j instance name, e.g. wwwneo1>
Start the Neo4j Server and connect to Graphite via a web browser in order to monitor your Neo4j
metrics.
Ganglia Configuration
For Ganglia integration add the following settings to neo4j.properties:
metrics.ganglia.enabled=true // default is 'false'
metrics.ganglia.server=<ip>:8469
metrics.ganglia.interval=<how often to send data, defaults to 3s>
metrics.prefix=<Neo4j instance name, e.g. wwwneo1>
Export to CSV Configuration
For storing metrics in local CSV files add the following settings to neo4j.properties:
metrics.csv.enabled=true // default is 'false'
metrics.csv.path=<file or directory path, defaults to "metrics/" in the store directory>
metrics.csv.file=<single/split, if split then each metric gets its own file in given directory>
metrics.csv.interval=<how often to store data, defaults to 3s>
Note
The CSV exporter does not automatically rotate the output files, so it is recommended to
also set up a CRON job to periodically archive the files.
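One way to realize that CRON job is sketched below. The helper function name, the directory layout, and the seven-day cutoff are all assumptions, not part of Neo4j; point the first argument at the directory configured via metrics.csv.path and adjust the retention to taste.

```shell
#!/bin/sh
# Sketch of an archiving helper for Neo4j metrics CSV files.
# The function name and the 7-day cutoff are illustrative, not part of Neo4j.
archive_metrics() {
    metrics_dir="$1"   # the directory configured via metrics.csv.path
    archive_dir="$2"   # where rotated files should end up
    mkdir -p "$archive_dir"
    # Move every CSV file that has not been modified in the last 7 days.
    find "$metrics_dir" -name '*.csv' -mtime +7 -exec mv {} "$archive_dir/" \;
}
```

A weekly crontab entry could then invoke archive_metrics with the appropriate paths, for example during a quiet hour.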
Configuration Settings Reference for Metrics
Metrics settings
metrics.csv.enabled: Set to true to enable exporting metrics to CSV files.
metrics.csv.interval: The reporting interval for the CSV files.
metrics.csv.path: The target location of the CSV files.
metrics.enabled: The default enablement value for all the supported metrics.
metrics.graphite.enabled: Set to true to enable exporting metrics to Graphite.
metrics.graphite.interval: The reporting interval for Graphite.
metrics.graphite.server: The hostname or IP address of the Graphite server.
metrics.jvm.buffers.enabled: Enable reporting metrics about the buffer pools.
metrics.jvm.gc.enabled: Enable reporting metrics about the duration of garbage collections.
metrics.jvm.memory.enabled: Enable reporting metrics about the memory usage.
metrics.jvm.threads.enabled: Enable reporting metrics about the current number of threads running.
metrics.neo4j.checkpointing.enabled: Enable reporting metrics about Neo4j check pointing.
metrics.neo4j.cluster.enabled: Enable reporting metrics about HA cluster info.
metrics.neo4j.counts.enabled: Enable reporting metrics about approximately how many entities are
in the database.
metrics.neo4j.enabled: The default enablement value for all Neo4j specific support metrics.
metrics.neo4j.logrotation.enabled: Enable reporting metrics about the Neo4j log rotation.
metrics.neo4j.network.enabled: Enable reporting metrics about the network usage.
metrics.neo4j.pagecache.enabled: Enable reporting metrics about the Neo4j page cache.
metrics.neo4j.tx.enabled: Enable reporting metrics about transactions.
metrics.prefix: A common prefix for the reported metrics field names.
Deprecated settings
metrics.csv.file: Write to a single CSV file or to multiple files.
metrics.ganglia.enabled: Set to true to enable exporting metrics to Ganglia.
metrics.ganglia.interval: The reporting interval for Ganglia.
metrics.ganglia.server: The hostname or IP address of the Ganglia server.
metrics.csv.enabled
Description Set to true to enable exporting metrics to CSV files.
Valid values metrics.csv.enabled is a boolean.
Default value false
metrics.csv.file
Description Write to a single CSV file or to multiple files. Set to single (the default) for reporting the metrics in a single CSV file (given by metrics.csv.path), with a column per metrics field. Or set to split to produce a CSV file for each metrics field, in the directory given by metrics.csv.path.
Valid values metrics.csv.file is one of single, split.
Default value single
Deprecated This setting will be removed in the next major release.
metrics.csv.interval
Description The reporting interval for the CSV files. That is, how often new rows with numbers are appended to the CSV files.
Valid values metrics.csv.interval is a duration (valid units are ms, s, m).
Default value 3s
metrics.csv.path
Description The target location of the CSV files. Depending on the metrics.csv.file setting, this is either the path to an individual CSV file that has each of the reported metrics fields as columns, or a path to a directory wherein a CSV file per reported field will be written. Relative paths will be interpreted relative to the configured Neo4j store directory.
Valid values metrics.csv.path is a path.
metrics.enabled
Description The default enablement value for all the supported metrics. Set this to false to turn off all metrics by default. The individual settings can then be used to selectively re-enable specific metrics.
Valid values metrics.enabled is a boolean.
Default value false
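As an illustration of this default-plus-override behaviour, a neo4j.properties fragment along the following lines turns everything off and re-enables a selection; the chosen metric groups and the interval are only examples:

```
# Disable all metrics by default ...
metrics.enabled=false
# ... then selectively re-enable transaction and page cache metrics.
metrics.neo4j.tx.enabled=true
metrics.neo4j.pagecache.enabled=true
# Export the selected metrics to CSV files every 10 seconds.
metrics.csv.enabled=true
metrics.csv.interval=10s
```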
metrics.ganglia.enabled
Description Set to true to enable exporting metrics to Ganglia.
Valid values metrics.ganglia.enabled is a boolean.
Default value false
Deprecated Ganglia support is experimental, and not guaranteed to work. This built-in support has been deprecated and will be removed in a subsequent version.
metrics.ganglia.interval
Description The reporting interval for Ganglia. That is, how often to send updated metrics to Ganglia.
Valid values metrics.ganglia.interval is a duration (valid units are ms, s, m).
Default value 3s
Deprecated Ganglia support is experimental, and not guaranteed to work. This built-in support has been deprecated and will be removed in a subsequent version.
metrics.ganglia.server
Description The hostname or IP address of the Ganglia server.
Valid values metrics.ganglia.server is a hostname and port.
Default value :8469
Deprecated Ganglia support is experimental, and not guaranteed to work. This built-in support has been deprecated and will be removed in a subsequent version.
metrics.graphite.enabled
Description Set to true to enable exporting metrics to Graphite.
Valid values metrics.graphite.enabled is a boolean.
Default value false
metrics.graphite.interval
Description The reporting interval for Graphite. That is, how often to send updated metrics to Graphite.
Valid values metrics.graphite.interval is a duration (valid units are ms, s, m).
Default value 3s
metrics.graphite.server
Description The hostname or IP address of the Graphite server.
Valid values metrics.graphite.server is a hostname and port.
Default value :2003
metrics.jvm.buffers.enabled
Description Enable reporting metrics about the buffer pools.
Valid values metrics.jvm.buffers.enabled is a boolean.
Default value false
metrics.jvm.gc.enabled
Description Enable reporting metrics about the duration of garbage collections.
Valid values metrics.jvm.gc.enabled is a boolean.
Default value false
metrics.jvm.memory.enabled
Description Enable reporting metrics about the memory usage.
Valid values metrics.jvm.memory.enabled is a boolean.
Default value false
metrics.jvm.threads.enabled
Description Enable reporting metrics about the current number of threads running.
Valid values metrics.jvm.threads.enabled is a boolean.
Default value false
metrics.neo4j.checkpointing.enabled
Description Enable reporting metrics about Neo4j check pointing; when it occurs and how much time it takes to complete.
Valid values metrics.neo4j.checkpointing.enabled is a boolean.
Default value false
metrics.neo4j.cluster.enabled
Description Enable reporting metrics about HA cluster info.
Valid values metrics.neo4j.cluster.enabled is a boolean.
Default value false
metrics.neo4j.counts.enabled
Description Enable reporting metrics about approximately how many entities are in the database; nodes, relationships, properties, etc.
Valid values metrics.neo4j.counts.enabled is a boolean.
Default value false
metrics.neo4j.enabled
Description The default enablement value for all Neo4j specific support metrics. Set this to false to turn off all Neo4j specific metrics by default. The individual metrics.neo4j.* metrics can then be turned on selectively.
Valid values metrics.neo4j.enabled is a boolean.
Default value false
metrics.neo4j.logrotation.enabled
Description Enable reporting metrics about the Neo4j log rotation; when it occurs and how much time it takes to complete.
Valid values metrics.neo4j.logrotation.enabled is a boolean.
Default value false
metrics.neo4j.network.enabled
Description Enable reporting metrics about the network usage.
Valid values metrics.neo4j.network.enabled is a boolean.
Default value false
metrics.neo4j.pagecache.enabled
Description Enable reporting metrics about the Neo4j page cache; page faults, evictions, flushes, exceptions, etc.
Valid values metrics.neo4j.pagecache.enabled is a boolean.
Default value false
metrics.neo4j.tx.enabled
Description Enable reporting metrics about transactions; number of transactions started, committed, etc.
Valid values metrics.neo4j.tx.enabled is a boolean.
Default value false
metrics.prefix
Description A common prefix for the reported metrics field names. By default, this is either neo4j, or a computed value based on the cluster and instance names, when running in an HA configuration.
Valid values metrics.prefix is a string.
Default value neo4j
Available Metrics
Database Metrics
Name Description
neo4j.transaction.peak_concurrent The highest peak of concurrent transactions ever seen on this machine
neo4j.transaction.started The total number of started transactions
neo4j.transaction.terminated The total number of terminated transactions
neo4j.transaction.rollbacks The total number of rolled back transactions
neo4j.transaction.committed The total number of committed transactions
neo4j.transaction.active The number of currently active transactions
neo4j.page_cache.eviction_exceptions The total number of exceptions seen during the eviction process in the page cache
neo4j.page_cache.flushes The total number of flushes executed by the page cache
neo4j.page_cache.unpins The total number of page unpins executed by the page cache
neo4j.page_cache.pins The total number of page pins executed by the page cache
neo4j.page_cache.evictions The total number of page evictions executed by the page cache
neo4j.page_cache.page_faults The total number of page faults that happened in the page cache
neo4j.check_point.events The total number of check point events executed so far
neo4j.check_point.total_time The total time spent in check pointing so far
neo4j.log_rotation.events The total number of transaction log rotations executed so far
neo4j.log_rotation.total_time The total time spent in rotating transaction logs so far
neo4j.ids_in_use.relationship_type The total number of different relationship types stored in the database
neo4j.ids_in_use.property The total number of different property names used in the database
neo4j.ids_in_use.relationship The total number of relationships stored in the database
neo4j.ids_in_use.node The total number of nodes stored in the database
Network Metrics
Name Description
neo4j.network.slave_network_tx_writes The amount of bytes transmitted on the network containing the transaction data from a slave to the master in order to be committed
neo4j.network.master_network_store_writes The amount of bytes transmitted on the network while copying stores from one machine to another
neo4j.network.master_network_tx_writes The amount of bytes transmitted on the network containing the transaction data from a master to the slaves in order to propagate committed transactions
Cluster Metrics
Name Description
neo4j.cluster.slave_pull_updates The total number of update pulls executed by this instance
neo4j.cluster.slave_pull_update_up_to_tx The highest transaction id that has been pulled in the last pull updates by this instance
Java Virtual Machine Metrics
These metrics are environment dependent and they may vary on different hardware and with JVM
configurations. Typically these metrics will show information about garbage collections (for example
the number of events and time spent collecting), memory pools and buffers, and finally the number of
active threads running.
PartVI.Tools
The Tools part describes available Neo4j tools and how to use them.
29. Import tool ....................................................................................................................................... 531
29.1. CSV file header format .......................................................................................................... 532
29.2. Command line usage ............................................................................................................ 533
29.3. Import tool examples ............................................................................................................ 536
30. Web Interface ................................................................................................................................... 544
31. Neo4j Shell ....................................................................................................................................... 545
31.1. Starting the shell ................................................................................................................... 546
31.2. Passing options and arguments ........................................................................................... 548
31.3. Enum options ........................................................................................................................ 549
31.4. Filters ..................................................................................................................................... 550
31.5. Node titles ............................................................................................................................. 551
31.6. How to use (individual commands) ...................................................................................... 552
31.7. An example shell session ...................................................................................................... 557
31.8. A Matrix example .................................................................................................................. 558
Chapter29.Import tool
The import tool is used to create a new Neo4j database from data in CSV files.
This chapter explains how to use the tool and how to format the input data, and concludes with an example bringing everything together.
These are some things you’ll need to keep in mind when creating your input files:
Fields are comma separated by default but a different delimiter can be specified.
All files must use the same delimiter.
Multiple data sources can be used for both nodes and relationships.
A data source can optionally be provided using multiple files.
A header which provides information on the data fields must be on the first row of each data source.
Fields without corresponding information in the header will not be read.
UTF-8 encoding is used.
Tip
Indexes are not created during the import. Instead you’ll need to add indexes afterwards
(see the section called “Indexes” [9]).
Note
Data cannot be imported into an existing database using this tool.
If you want to load small to medium sized CSV files see Section11.6, “Load CSV” [182].
If you want to bulk import into an existing database see Chapter36, Batch
Insertion [636].
29.1.CSV file header format
The header row of each data source specifies how the fields should be interpreted. The same delimiter
is used for the header row as for the rest of the data.
The header contains information for each field, with the format: <name>:<field_type>. The <name> is used
as the property key for values, and ignored in other cases. The following <field_type> settings can be
used for both nodes and relationships:
Property value Use one of int, long, float, double, boolean, byte, short, char, string to designate
the data type. If no data type is given, this defaults to string. To define an
array type, append [] to the type. By default, array values are separated by ;. A
different delimiter can be specified with --array-delimiter.
IGNORE Ignore this field completely.
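For instance, a node header combining these general field types could look as follows; the field names (personId, skills, and so on) and the data row are made up for illustration:

```
personId:ID,name,age:int,skills:string[],comment:IGNORE
p1,"Alice",33,"cooking;archery","this field is not imported"
```

Here skills would be read as a string array split on ;, and the last column would be skipped entirely.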
See below for the specifics of node and relationship data source headers.
Nodes
The following field types do additionally apply to node data sources:
ID Each node must have a unique id which is used during the import. The ids are used to find
the correct nodes when creating relationships. Note that the id has to be unique across all
nodes in the import, even nodes with different labels.
LABEL Read one or more labels from this field. Like array values, multiple labels are separated by ;,
or by the character specified with --array-delimiter.
Relationships
For relationship data sources, there are three mandatory fields:
TYPE The relationship type to use for the relationship.
START_ID The id of the start node of the relationship to create.
END_ID The id of the end node of the relationship to create.
ID spaces
The import tool assumes that node identifiers are unique across node files. If this isn’t the case then we
can define an id space. Id spaces are defined in the ID field of node files.
For example, to specify the Person id space we would use the field type ID(Person) in our persons
node file. We also need to reference that id space in our relationships file i.e. START_ID(Person) or
END_ID(Person).
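A minimal sketch of such files might look as follows; the file names, ids, and properties are hypothetical:

```
persons.csv
personId:ID(Person),name
p1,"Keanu Reeves"

movies.csv
movieId:ID(Movie),title
m1,"The Matrix"

roles.csv
:START_ID(Person),role,:END_ID(Movie),:TYPE
p1,"Neo",m1,ACTED_IN
```

With separate Person and Movie id spaces, the same literal id could safely appear in both node files without clashing.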
29.2.Command line usage
Linux
Under Unix/Linux/OSX, the command is named neo4j-import. Depending on the installation type, the
tool is either available globally, or used by executing ./bin/neo4j-import from inside the installation
directory.
Windows
For help with running the import tool using Windows PowerShell, see Section23.3, “Windows
PowerShell module” [442].
Options
--into <store-dir> Database directory to import into. Must not contain existing database.
--nodes[:Label1:Label2] "<file1>,<file2>,…"
Node CSV header and data. Multiple files will be logically seen as one big file from the perspective of the importer. The first line must contain the header. Multiple data sources like these can be specified in one import, where each data source has its own header. Note that file groups must be enclosed in quotation marks.
--relationships[:RELATIONSHIP_TYPE] "<file1>,<file2>,…"
Relationship CSV header and data. Multiple files will be logically seen as one big file from the perspective of the importer. The first line must contain the header. Multiple data sources like these can be specified in one import, where each data source has its own header. Note that file groups must be enclosed in quotation marks.
--delimiter <delimiter-character>
Delimiter character, or TAB, between values in CSV data. The default option is ,.
--array-delimiter <array-delimiter-character>
Delimiter character, or TAB, between array elements within a value in CSV data. The default option is ;.
--quote <quotation-character>
Character to treat as quotation character for values in CSV data. The default option is ". Quotes inside quotes escaped like """Go away"", he said." and "\"Go away\", he said." are supported. If you have set "'" to be used as the quotation character, you could write the previous example like this instead: '"Go away", he said.'
--multiline-fields <true/false>
Whether or not fields from the input source can span multiple lines, i.e. contain newline characters. Default value: false
--input-encoding <character set>
Character set that input data is encoded in. Provided value must be one out of the available character sets in the JVM, as provided by Charset#availableCharsets(). If no input encoding is provided, the default character set of the JVM will be used.
--ignore-empty-strings <true/false>
Whether or not empty string fields, i.e. "" from the input source, are ignored, i.e. treated as null. Default value: false
--id-type <id-type>
One out of [STRING, INTEGER, ACTUAL], specifying how ids in node/relationship input files are treated. STRING: arbitrary strings for identifying nodes. INTEGER: arbitrary integer values for identifying nodes. ACTUAL: (advanced) actual node ids. The default option is STRING. Default value: STRING
--processors <max processor count>
(advanced) Max number of processors used by the importer. Defaults to the number of available processors reported by the JVM. There is a certain amount of minimum threads needed so for that reason there is no lower bound for this value. For optimal performance this value shouldn’t be greater than the number of available processors.
--stacktrace <true/false> Enable printing of error stack traces.
--bad-tolerance <max number of bad entries>
Number of bad entries before the import is considered failed. This tolerance threshold is about relationships referring to missing nodes. Format errors in input data are still treated as errors. Default value: 1000
--skip-bad-relationships <true/false>
Whether or not to skip importing relationships that refer to missing node ids, i.e. either start or end node id/group referring to a node that wasn’t specified by the node input data. Skipped relationships will be logged, containing at most the number of entities specified by bad-tolerance. Default value: true
--skip-duplicate-nodes <true/false>
Whether or not to skip importing nodes that have the same id/group. In the event of multiple nodes within the same group having the same id, the first encountered will be imported whereas consecutive such nodes will be skipped. Skipped nodes will be logged, containing at most the number of entities specified by bad-tolerance. Default value: false
--ignore-extra-columns <true/false>
Whether or not to ignore extra columns in the data not specified by the header. Skipped columns will be logged, containing at most the number of entities specified by bad-tolerance. Default value: false
--db-config <path/to/neo4j.properties>
(advanced) File specifying database-specific configuration. For more information consult the manual about available configuration options for a neo4j configuration file. Only configuration affecting the store at time of creation will be read. Examples of supported config are: dense_node_threshold, string_block_size, array_block_size
Output and statistics
While an import is running through its different stages, some statistics and figures are printed in the
console. The general interpretation of that output is to look at the horizontal line, which is divided up
into sections, each section representing one type of work going on in parallel with the other sections.
The wider a section is, the more time is spent there relative to the other sections, the widest being the
bottleneck, also marked with *. If a section has a double line, instead of just a single line, it means that
multiple threads are executing the work in that section. To the far right a number is displayed telling
how many entities (nodes or relationships) have been processed by that stage.
As an example:
[*>:20,25 MB/s------------------|PREPARE(3)====================|RELATIONSHIP(2)===============] 16M
Would be interpreted as:
> data being read, and perhaps parsed, at 20,25 MB/s, data that is being passed on to …
PREPARE preparing the data for …
RELATIONSHIP creating actual relationship records and …
v writing the relationships to the store. This step isn’t visible in this example, because it’s so cheap
compared to the other sections.
Observing the section sizes can give hints about where performance can be improved. In the example
above, the bottleneck is the data read section (marked with >), which might indicate that the disk is
being slow, or is poorly handling simultaneous read and write operations (since the last section often
revolves around writing to disk).
Verbose error information
In some cases if an unexpected error occurs it might be useful to supply the command line option --stacktrace to the import (and rerun the import to actually see the additional information). This will have the error printed with additional debug information, useful for both developers and issue reporting.
29.3.Import tool examples
Let’s look at a few examples. We’ll use a data set containing movies, actors and roles.
Tip
While you’ll usually want to store your node identifier as a property on the node for looking
it up later, it’s not mandatory. If you don’t want the identifier to be persisted then don’t
specify a property name in the :ID field.
Basic example
First we’ll look at the movies. Each movie has an id, which is used to refer to it in other data sources, a title and a year. Along with these properties we’ll also add the node labels Movie and Sequel.
By default the import tool expects CSV files to be comma delimited.
movies.csv
movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
Next up are the actors. They have an id - in this case a shorthand - and a name and all have the Actor
label.
actors.csv
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
Finally we have the roles that an actor plays in a movie which will be represented by relationships in
the database. In order to create a relationship between nodes we refer to the ids used in actors.csv
and movies.csv in the START_ID and END_ID fields. We also need to provide a relationship type (in this case
ACTED_IN) in the :TYPE field.
roles.csv
:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
With all data in place, we execute the following command:
neo4j-import --into path_to_target_directory --nodes movies.csv --nodes actors.csv --relationships roles.csv
We’re now ready to start up a database from the target directory (see Section 23.2, “Server
Installation” [439]).
Once we’ve got the database up and running we can add appropriate indexes (see Section 3.6, “Labels,
Constraints and Indexes” [32]).
Tip
It is possible to import only nodes using the import tool - just don’t specify a relationships
file when calling neo4j-import. If you do this you’ll need to create relationships later by
another method - the import tool only works for initial graph population.
Customizing configuration options
We can customize the configuration options that the import tool uses (see the section called
“Options” [533]) if our data doesn’t fit the default format. The following CSV files are delimited by ;, use
| as their array delimiter and use ' for quotes.
movies2.csv
movieId:ID;title;year:int;:LABEL
tt0133093;'The Matrix';1999;Movie
tt0234215;'The Matrix Reloaded';2003;Movie|Sequel
tt0242653;'The Matrix Revolutions';2003;Movie|Sequel
actors2.csv
personId:ID;name;:LABEL
keanu;'Keanu Reeves';Actor
laurence;'Laurence Fishburne';Actor
carrieanne;'Carrie-Anne Moss';Actor
roles2.csv
:START_ID;role;:END_ID;:TYPE
keanu;'Neo';tt0133093;ACTED_IN
keanu;'Neo';tt0234215;ACTED_IN
keanu;'Neo';tt0242653;ACTED_IN
laurence;'Morpheus';tt0133093;ACTED_IN
laurence;'Morpheus';tt0234215;ACTED_IN
laurence;'Morpheus';tt0242653;ACTED_IN
carrieanne;'Trinity';tt0133093;ACTED_IN
carrieanne;'Trinity';tt0234215;ACTED_IN
carrieanne;'Trinity';tt0242653;ACTED_IN
We can then import these files with the following command line options:
neo4j-import --into path_to_target_directory --nodes movies2.csv --nodes actors2.csv --relationships roles2.csv --delimiter
";" --array-delimiter "|" --quote "'"
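The custom format above can be reproduced with any CSV library. As a hedged sketch (not part of the import tool), here Python’s csv module writes and re-reads a file using ; as the delimiter and ' as the quote character; the array delimiter | only appears inside field values, so the CSV layer passes it through untouched:

```python
import csv
import os
import tempfile

# Write movies2.csv in the custom dialect used above:
# ';' as field delimiter and "'" as quote character.
path = os.path.join(tempfile.gettempdir(), "movies2.csv")
rows = [
    ["movieId:ID", "title", "year:int", ":LABEL"],
    ["tt0133093", "The Matrix", "1999", "Movie"],
    ["tt0234215", "The Matrix Reloaded", "2003", "Movie|Sequel"],
    ["tt0242653", "The Matrix Revolutions", "2003", "Movie|Sequel"],
]
with open(path, "w", newline="") as f:
    csv.writer(f, delimiter=";", quotechar="'").writerows(rows)

# Reading it back with the same dialect recovers the fields; the
# multi-label field still contains the '|' array delimiter verbatim.
with open(path, newline="") as f:
    parsed = list(csv.reader(f, delimiter=";", quotechar="'"))
print(parsed[2][3].split("|"))
```

The array delimiter is split only by the import tool itself, which is why the CSV writer above does not need to know about it.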
Using separate header files
When dealing with very large CSV files it’s more convenient to have the header in a separate file. This
makes it easier to edit the header as you avoid having to open a huge data file just to change it.
Tip
The import tool can also process single-file compressed archives, e.g. --nodes nodes.csv.gz or --
relationships rels.zip.
We’ll use the same data as in the previous example but put the headers in separate files.
movies3-header.csv
movieId:ID,title,year:int,:LABEL
movies3.csv
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
actors3-header.csv
personId:ID,name,:LABEL
actors3.csv
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
roles3-header.csv
:START_ID,role,:END_ID,:TYPE
roles3.csv
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
Note how the file groups are enclosed in quotation marks in the command:
neo4j-import --into path_to_target_directory --nodes "movies3-header.csv,movies3.csv" --nodes "actors3-header.csv,actors3.csv"
--relationships "roles3-header.csv,roles3.csv"
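Producing this layout from a combined file is a one-line split. A minimal sketch, assuming you start from a single CSV whose first line is the header (the file names here are illustrative):

```python
import os
import tempfile

# Split a combined CSV into the header/data pair used above:
# the first line goes to movies3-header.csv, the rest to movies3.csv.
combined = 'movieId:ID,title,year:int,:LABEL\ntt0133093,"The Matrix",1999,Movie\n'
d = tempfile.mkdtemp()
header, _, data = combined.partition("\n")
with open(os.path.join(d, "movies3-header.csv"), "w") as f:
    f.write(header + "\n")
with open(os.path.join(d, "movies3.csv"), "w") as f:
    f.write(data)
```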
Multiple input files
As well as using a separate header file you can also provide multiple nodes or relationships files. This
may be useful when processing the output from a Hadoop pipeline, for example. Files within such
an input group can be specified with multiple match strings, delimited by ,, where each match string
is either an exact file name or a regular expression matching one or more files. Multiple matching
files are sorted by their characters, with natural number ordering for file names that contain
numbers.
movies4-header.csv
movieId:ID,title,year:int,:LABEL
movies4-part1.csv
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
movies4-part2.csv
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
actors4-header.csv
personId:ID,name,:LABEL
actors4-part1.csv
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
actors4-part2.csv
carrieanne,"Carrie-Anne Moss",Actor
roles4-header.csv
:START_ID,role,:END_ID,:TYPE
roles4-part1.csv
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
roles4-part2.csv
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
The call to neo4j-import would look like this:
neo4j-import --into path_to_target_directory --nodes "movies4-header.csv,movies4-part1.csv,movies4-part2.csv" --nodes
"actors4-header.csv,actors4-part1.csv,actors4-part2.csv" --relationships "roles4-header.csv,roles4-part1.csv,roles4-
part2.csv"
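The ordering rule matters once part numbers reach two digits: a plain string sort would place part10 before part2. A sketch of such natural ordering (whether this exactly matches the tool’s comparator is an assumption):

```python
import re

# Natural-order key: digit runs compare as numbers, everything else
# compares as text, so "part2" sorts before "part10".
def natural_key(name):
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r"(\d+)", name)]

files = ["movies4-part10.csv", "movies4-part2.csv", "movies4-part1.csv"]
print(sorted(files, key=natural_key))
```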
Types and labels
Using the same label for every node
If you want to use the same node label(s) for every node in your nodes file you can do this by specifying
the appropriate value as an option to neo4j-import. In this example we’ll put the label Movie on every
node specified in movies5.csv:
movies5.csv
movieId:ID,title,year:int
tt0133093,"The Matrix",1999
Tip
There’s then no need to specify the :LABEL field in the node file if you pass it as a command
line option. If you do, then both the label provided in the file and the one provided on the
command line will be added to the node.
In this case, we’ll put the labels Movie and Sequel on the nodes specified in sequels5.csv.
sequels5.csv
movieId:ID,title,year:int
tt0234215,"The Matrix Reloaded",2003
tt0242653,"The Matrix Revolutions",2003
actors5.csv
personId:ID,name
keanu,"Keanu Reeves"
laurence,"Laurence Fishburne"
carrieanne,"Carrie-Anne Moss"
roles5.csv
:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
The call to neo4j-import would look like this:
neo4j-import --into path_to_target_directory --nodes:Movie movies5.csv --nodes:Movie:Sequel sequels5.csv --nodes:Actor
actors5.csv --relationships roles5.csv
Using the same relationship type for every relationship
If you want to use the same relationship type for every relationship in your relationships file you can
do this by specifying the appropriate value as an option to neo4j-import. In this example we’ll put the
relationship type ACTED_IN on every relationship specified in roles6.csv:
movies6.csv
movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
actors6.csv
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
roles6.csv
:START_ID,role,:END_ID
keanu,"Neo",tt0133093
keanu,"Neo",tt0234215
keanu,"Neo",tt0242653
laurence,"Morpheus",tt0133093
laurence,"Morpheus",tt0234215
laurence,"Morpheus",tt0242653
carrieanne,"Trinity",tt0133093
carrieanne,"Trinity",tt0234215
carrieanne,"Trinity",tt0242653
Tip
If you provide a relationship type on the command line and in the relationships file the one
in the file will be applied.
The call to neo4j-import would look like this:
neo4j-import --into path_to_target_directory --nodes movies6.csv --nodes actors6.csv --relationships:ACTED_IN roles6.csv
Property types
The type for properties specified in nodes and relationships files is defined in the header row (see
Section 29.1, “CSV file header format” [532]).
The following example creates a small graph containing one actor and one movie connected by an
ACTED_IN relationship. There is a roles property on the relationship which contains an array of the
characters played by the actor in a movie.
movies7.csv
movieId:ID,title,year:int,:LABEL
tt0099892,"Joe Versus the Volcano",1990,Movie
actors7.csv
personId:ID,name,:LABEL
meg,"Meg Ryan",Actor
roles7.csv
:START_ID,roles:string[],:END_ID,:TYPE
meg,"DeDe;Angelica Graynamore;Patricia Graynamore",tt0099892,ACTED_IN
The arguments to neo4j-import would be the following:
neo4j-import --into path_to_target_directory --nodes movies7.csv --nodes actors7.csv --relationships roles7.csv
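Conceptually, the roles:string[] field is just split on the array delimiter (; by default) before being stored as an array property. A one-line sketch of that assumption:

```python
# The raw field value from roles7.csv, split on the default
# array delimiter ';' to produce the array property value.
field = "DeDe;Angelica Graynamore;Patricia Graynamore"
roles = field.split(";")
print(roles)
```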
ID handling
Each node processed by neo4j-import must provide a unique id. We use this id to find the correct nodes
when creating relationships.
Working with sequential or auto-incrementing identifiers
The import tool assumes that identifiers are unique across node files. This may not be
the case for data sets that use sequential, auto-incremented or otherwise colliding identifiers. Such
data sets can define id spaces, within which identifiers are unique.
For example if movies and people both use sequential identifiers then we would define Movie and Actor
id spaces.
movies8.csv
movieId:ID(Movie),title,year:int,:LABEL
1,"The Matrix",1999,Movie
2,"The Matrix Reloaded",2003,Movie;Sequel
3,"The Matrix Revolutions",2003,Movie;Sequel
actors8.csv
personId:ID(Actor),name,:LABEL
1,"Keanu Reeves",Actor
2,"Laurence Fishburne",Actor
3,"Carrie-Anne Moss",Actor
We also need to reference the appropriate id space in our relationships file so it knows which nodes to
connect together:
roles8.csv
:START_ID(Actor),role,:END_ID(Movie)
1,"Neo",1
1,"Neo",2
1,"Neo",3
2,"Morpheus",1
2,"Morpheus",2
2,"Morpheus",3
3,"Trinity",1
3,"Trinity",2
3,"Trinity",3
The command line arguments would remain the same as before:
neo4j-import --into path_to_target_directory --nodes movies8.csv --nodes actors8.csv --relationships:ACTED_IN roles8.csv
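Conceptually, an id space turns node identity into the pair (space, raw id), which is why the raw id 1 may appear in both files. A small sketch of that idea (not the tool’s actual implementation):

```python
# Node identity is effectively (id space, raw id), so '1' can
# appear once per space without colliding.
nodes = [("Movie", "1"), ("Movie", "2"), ("Movie", "3"),
         ("Actor", "1"), ("Actor", "2"), ("Actor", "3")]
assert len(set(nodes)) == len(nodes)  # no collisions across spaces

# A relationship end point is looked up in its declared space,
# as in roles8.csv's :START_ID(Actor) / :END_ID(Movie) columns.
index = {key: i for i, key in enumerate(nodes)}
start, end = index[("Actor", "1")], index[("Movie", "1")]
print(start, end)
```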
Bad input data
The import tool has a threshold for how many bad entities (nodes/relationships) to tolerate and skip
before failing the import. By default 1000 bad entities are tolerated. A bad tolerance of 0 will, for
example, fail the import on the first bad entity. For more information, see the --bad-tolerance option.
There are different types of bad input, which we will look into.
Relationships referring to missing nodes
Relationships that refer to missing node ids, either for :START_ID or :END_ID, are considered bad
relationships. Whether or not such relationships are skipped is controlled with the --skip-bad-relationships
flag, which can have the value true or false, or no value, which means true. Specifying false means that
any bad relationship is considered an error and will fail the import. For more information, see the --
skip-bad-relationships option.
In the following example there is a missing emil node referenced in the roles file.
movies9.csv
movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
actors9.csv
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
roles9.csv
:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
emil,"Emil",tt0133093,ACTED_IN
The command line arguments would remain the same as before:
neo4j-import --into path_to_target_directory --nodes movies9.csv --nodes actors9.csv --relationships roles9.csv
Since there was only one bad relationship, the import process will complete successfully and a not-
imported.bad file will be created and populated with the bad relationships.
not-imported.bad
InputRelationship:
source: roles9.csv:11
properties: [role, Emil]
startNode: emil
endNode: tt0133093
type: ACTED_IN
refering to missing node emil
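A pre-check for this class of bad input is easy to sketch outside the tool: collect the node ids, then flag relationships whose end points are unknown. This is a hypothetical helper, not part of neo4j-import:

```python
# Node ids gathered from movies9.csv and actors9.csv.
node_ids = {"tt0133093", "tt0234215", "tt0242653",
            "keanu", "laurence", "carrieanne"}
# A few (start, end) pairs from roles9.csv; 'emil' was never
# defined as a node, so its relationship is bad.
rels = [("keanu", "tt0133093"), ("laurence", "tt0242653"),
        ("emil", "tt0133093")]
bad = [(s, e) for s, e in rels
       if s not in node_ids or e not in node_ids]
# The import fails once this count exceeds --bad-tolerance.
print(len(bad))
```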
Multiple nodes with same id within same id space
Nodes that specify an :ID which has already been specified within the id space are considered bad nodes.
Whether or not such nodes are skipped is controlled with the --skip-duplicate-nodes flag, which can have
the value true or false, or no value, which means true. Specifying false means that any duplicate node
is considered an error and will fail the import. For more information, see the --skip-duplicate-nodes
option.
In the following example there is a node id that is specified twice within the same id space.
actors10.csv
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
laurence,"Laurence Harvey",Actor
neo4j-import --into path_to_target_directory --nodes actors10.csv --skip-duplicate-nodes
Since there was only one bad node, the import process will complete successfully and a not-
imported.bad file will be created and populated with the bad node.
not-imported.bad
Id 'laurence' is defined more than once in global id space, at least at actors10.csv:3 and actors10.csv:5
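The duplicate check can be sketched in a few lines: remember the line number of every occurrence of each id and report ids seen more than once, much like the message above. A hypothetical illustration, not the tool’s code:

```python
from collections import defaultdict

# The ids from actors10.csv; the header is line 1, so data starts
# at line 2, matching the line numbers reported in not-imported.bad.
ids = ["keanu", "laurence", "carrieanne", "laurence"]
seen = defaultdict(list)
for lineno, node_id in enumerate(ids, start=2):
    seen[node_id].append(lineno)
dupes = {i: lines for i, lines in seen.items() if len(lines) > 1}
print(dupes)
```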
Chapter30.Web Interface
The Neo4j Web Interface is the primary user interface for Neo4j.
The tool is available at http://127.0.0.1:7474/ after you have installed the Neo4j Server.
See the tool itself for more information!
Chapter31.Neo4j Shell
Neo4j shell is a command-line shell for running Cypher queries. There are also commands to get
information about the database. In addition, you can browse the graph, much as you would browse
your local file system with a Unix shell, using commands like cd, ls and pwd.
It’s a nice tool for development and debugging. This guide will show you how to get it going!
31.1. Starting the shell
When used together with a Neo4j server, simply issue the following at the command line:
./bin/neo4j-shell
For help with running the shell using Windows PowerShell, see Section 23.3, “Windows PowerShell
module” [442].
For the full list of options, see the reference in the Shell manual page.
To connect to a running Neo4j database, use the section called “Read-only mode” [547] for local
databases and see the section called “Enabling the shell server” [546] for remote databases.
You need to make sure that the shell jar file is on the classpath when you start up your Neo4j instance.
Enabling the shell server
The shell is enabled from the main configuration of Neo4j, see Section 24.8, “Configuration Settings
Reference” [464]. Here are the available settings:
Settings for the remote shell extension
remote_shell_enabled: Enable a remote shell server which Neo4j Shell clients can log in to.
remote_shell_host: Remote host for shell.
remote_shell_name: The name of the shell.
remote_shell_port: The port the shell will listen on.
remote_shell_read_only: Read only mode.
remote_shell_enabled
Description Enable a remote shell server which Neo4j Shell clients can log in to.
Valid values remote_shell_enabled is a boolean.
Default value false
remote_shell_host
Description Remote host for shell. By default, the shell server listens only on the loopback
interface, but you can specify the IP address of any network interface or use
0.0.0.0 for all interfaces.
Valid values remote_shell_host is a string which must be a valid name.
Default value 127.0.0.1
remote_shell_name
Description The name of the shell.
Valid values remote_shell_name is a string which must be a valid name.
Default value shell
remote_shell_port
Description The port the shell will listen on.
Valid values remote_shell_port is an integer which must be a valid port number (is in the range
0 to 65535).
Default value 1337
remote_shell_read_only
Description Read only mode. Will only allow read operations.
Valid values remote_shell_read_only is a boolean.
Default value false
There are two ways to start the shell, either by connecting to a remote shell server or by pointing it to a
Neo4j store path.
Connecting to a shell server
To start the shell and connect to a running server, run:
neo4j-shell
Alternatively, supply the -port and -name options depending on how the remote shell server was enabled.
Then you’ll get the shell prompt like this:
neo4j-sh (0)$
Pointing the shell to a path
To start the shell by just pointing it to a Neo4j store path, you run the shell jar file. Given that the right
neo4j-kernel-<version>.jar and JTA jar files are in the same path as your neo4j-shell-<version>.jar file,
you run it with:
$ neo4j-shell -path path/to/neo4j-db
Read-only mode
If you issue the -readonly switch when starting the shell with a store path, no changes can be made to
the database during the session.
$ neo4j-shell -readonly -path path/to/neo4j-db
Run a command and then exit
It is possible to tell the shell to just start, execute a command and then exit. This opens it up for use
in background jobs and for handling the huge output of, for example, an ls command, where you could
pipe the output to less or another reader of your choice, to a file, or even to another neo4j-shell, e.g.
for importing a dump of another database or a Cypher result. When used in command mode the shell
will not output a welcome message. Some examples of usage:
$ neo4j-shell -c "cd -a 24 && set name Mattias"
$ neo4j-shell -c "trav -r KNOWS" | less
Pass Neo4j configuration options
By setting the -config switch, you can provide a properties file that will be used to configure your Neo4j
instance, if started in embedded mode.
$ neo4j-shell -config conf/neo4j.properties -path mydb
Execute a file and then exit
To execute commands from a file and then exit, just provide -file filename. This is faster than piping
to the shell, which still handles the input as if it were user input. You can, for example, read a dump
file directly from the command line and execute it against the given database:
$ neo4j-shell -file export.cql > result.txt
Supplying - as the filename reads from stdin instead.
31.2. Passing options and arguments
Passing options and arguments to your commands is very similar to many CLI commands in a *nix
environment. Options are prefixed with a - and can contain one or more option characters. Some
options expect a value to be associated with them. Arguments are string values which aren’t prefixed
with -. Let’s look at ls as an example:
ls -r -f KNOWS:out -v 12345 will make a verbose listing of node 12345's outgoing relationships of type
KNOWS. The node id, 12345, is an argument to ls which tells it to do the listing on that node instead of the
current node (see the pwd command). However, a shorter version of this can be written:
ls -rfv KNOWS:out 12345. Here all three options are written together after a single - prefix. Even though
f is in the middle it gets associated with the KNOWS:out value. The reason for this is that the ls command
doesn’t expect any values associated with the r or v options, so it can infer the right values for the
right options.
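The grouping rule can be sketched as a tiny parser: flags that take no value consume nothing, and a following token is bound to the one option in the group that expects a value. A hypothetical illustration of the rule, not the shell’s actual parser:

```python
# Assumption for this sketch: only -f expects a value; -r and -v
# are plain flags, as described for ls above.
EXPECTS_VALUE = {"f"}

def parse(tokens):
    opts, args = {}, []
    it = iter(tokens)
    for tok in it:
        if tok.startswith("-"):
            # Each character after '-' is its own option; the one that
            # expects a value consumes the next token.
            for ch in tok[1:]:
                opts[ch] = next(it) if ch in EXPECTS_VALUE else True
        else:
            args.append(tok)
    return opts, args

print(parse(["-rfv", "KNOWS:out", "12345"]))
```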
31.3. Enum options
Some options expect a value which is one of the values in an enum, for example the direction part of
relationship type filtering, where the values are INCOMING, OUTGOING and BOTH. All such values can be
supplied in an easier way: it’s enough to write the start of the value and the interpreter will find what
you really meant, for example out, in, i or even INCOMING.
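The prefix matching described above can be sketched as a case-insensitive lookup against the enum constants, rejecting ambiguous or unknown prefixes. A hedged illustration, not the shell’s code:

```python
DIRECTIONS = ["INCOMING", "OUTGOING", "BOTH"]

def resolve(text):
    # Match the typed value, case-insensitively, against the start
    # of each enum constant; require exactly one hit.
    hits = [d for d in DIRECTIONS if d.startswith(text.upper())]
    if len(hits) != 1:
        raise ValueError("ambiguous or unknown direction: " + text)
    return hits[0]

print(resolve("out"), resolve("i"), resolve("b"))
```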
31.4. Filters
Some commands make use of filters for varying purposes, for example -f in ls and in trav. A filter
is supplied as a JSON (http://www.json.org/) object (with or without the surrounding {} brackets). Both keys
and values can contain regular expressions for more flexible matching. An example of a filter could be
.*url.*:http.*neo4j.*,name:Neo4j. The filter option is also accompanied by the options -i and -l, which
stand for ignore case (ignore the casing of the characters) and loose matching (it’s considered a match even
if the filter value just matches a part of the compared value, not necessarily the entire value). So for a
case-insensitive, loose filter you can supply -f -i -l, or -fil for short.
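The matching rules can be sketched as follows, treating both keys and values as regular expressions, with -i mapped to case-insensitive matching and -l to matching a part of the value rather than the whole of it. A hypothetical model of the semantics, not the shell’s implementation:

```python
import re

def matches(props, filt, ignore_case=False, loose=False):
    # -i: ignore case; -l: a partial match of the value is enough.
    flags = re.IGNORECASE if ignore_case else 0
    value_match = re.search if loose else re.fullmatch
    for kpat, vpat in filt.items():
        ok = any(re.fullmatch(kpat, k, flags)
                 and value_match(vpat, str(v), flags)
                 for k, v in props.items())
        if not ok:
            return False
    return True

# The example filter from the text against hypothetical properties.
props = {"homepage_url": "http://something.neo4j.org", "name": "Neo4j"}
print(matches(props, {".*url.*": "http.*neo4j.*", "name": "Neo4j"}))
```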
31.5. Node titles
To make it easier to navigate your graph the shell can display a title for each node, for example in ls -r.
It will display the relationships as well as the nodes on the other side of the relationships. The title is
displayed together with each node and is its best suited property value, picked from a list of property keys.
If you’re standing on a node which has two KNOWS relationships to other nodes it’d be difficult to know
which friend is which. The title feature addresses this by reading a list of property keys, grabbing
the first existing property value of those keys and displaying it as a title for the node. So you may specify
a list (with or without regular expressions), for example name,title.*,caption, and the title for each node
will be the property value of the first existing key in that list. The list is defined by the client (you) using
the TITLE_KEYS environment variable (see the section called “Environment variables” [553]), the
default being .*name.*,.*title.*.
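The title lookup can be modeled as a first-match scan over the key patterns. A sketch under the assumption that patterns are tried in list order, with the property keys checked against each pattern in turn:

```python
import re

# Default patterns mirroring the TITLE_KEYS default above.
def node_title(props, key_patterns=(".*name.*", ".*title.*")):
    for pattern in key_patterns:
        for key, value in props.items():
            if re.fullmatch(pattern, key):
                return value
    return None

# '.*name.*' is tried first, so the name wins over the title.
print(node_title({"title": "CEO", "name": "Somebody"}))
```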
31.6. How to use (individual commands)
The shell is modeled after Unix shells like bash that you use to walk around your local file system. It
has some of the same commands, like cd and ls. When you first start the shell (see instructions above),
you will get a list of all the available commands. Use man <command> to get more info about a particular
command. Some notes:
Comments
Single line comments, which will be ignored, can be made by using the prefix //. Example:
// This is a comment
Current node/relationship and path
You have a current node/relationship and a "current path" (like a current working directory in bash) that
you’ve traversed so far. When the shell first starts you are not positioned on any entity, but you can
cd your way through the graph (check your current path at any time with the pwd command). cd can be
used in different ways:
cd <node-id> will traverse one relationship to the supplied node id. The node must have a direct
relationship to the current node.
cd -a <node-id> will do an absolute path change, which means the supplied node doesn’t have to
have a direct relationship to the current node.
cd -r <relationship-id> will traverse to a relationship instead of a node. The relationship must have
the current node as either start or end point. To see the relationship ids use the ls -vr command on
nodes.
cd -ar <relationship-id> will do an absolute path change which means the relationship can be any
relationship in the graph.
cd .. will traverse back one step to the previous location, removing the last path item from your
current path (pwd).
cd start (only if your current location is a relationship). Traverses to the start node of the relationship.
cd end (only if your current location is a relationship). Traverses to the end node of the relationship.
Listing the contents of a node/relationship
List contents of the current node/relationship (or any other node) with the ls command. Please note
that it will give an empty output if the current node/relationship has no properties or relationships (for
example in the case of a brand new graph). ls can take a node id as argument as well as filters, see
Section 31.4, “Filters” [550], and for information about how to specify direction see Section 31.3, “Enum
options” [549]. Use man ls for more info.
Creating nodes and relationships
You create new nodes by connecting them with relationships to the current node. For example, mkrel -
t A_RELATIONSHIP_TYPE -d OUTGOING -c will create a new node (-c) and draw to it an OUTGOING relationship
of type A_RELATIONSHIP_TYPE from the current node. If you already have two nodes which you’d like
to draw a relationship between (without creating a new node) you can do for example, mkrel -t
A_RELATIONSHIP_TYPE -d OUTGOING -n <other-node-id> and it will just create a new relationship between
the current node and that other node.
Setting, renaming and removing properties
Property operations are done with the set, mv and rm commands. These commands operate on the
current node/relationship.
Use set <key> <value>, optionally with the -t option (for value type), to set a property. Every type of
value that Neo4j supports is supported. An example of setting a property of type int:
$ set -t int age 29
And an example of setting a double[] property:
$ set -t double[] my_values [1.4,12.2,13]
Example of setting a String property containing a JSON string:
mkrel -c -d i -t DOMAIN_OF --np "{'app':'foobar'}"
rm <key> removes a property.
mv <key> <new-key> renames a property from one key to another.
Deleting nodes and relationships
Deletion of nodes and relationships is done with the rmnode and rmrel commands. rmnode can delete
nodes; if the node to be deleted still has relationships, they can also be deleted by supplying the -f option.
rmrel can delete relationships; it tries to ensure connectedness in the graph, but relationships can be
deleted regardless with the -f option. rmrel can also delete the node on the other side of the deleted
relationship if it’s left with no more relationships; see the -d option.
Environment variables
The shell uses environment variables, à la bash, to keep session information, such as the current path
and more. The commands for this mimic the bash commands export and env. For example, you can at
any time issue an export STACKTRACES=true command to set the STACKTRACES environment variable to true.
This will then result in stacktraces being printed if an exception or error should occur. Allowed values
are all parseable JSON strings, so maps {age:10,name:"Mattias"} and arrays [1,2,3] are also supported.
Variables can also be assigned to each other, e.g. a=b will result in a containing the value of b.
This becomes especially interesting as all shell variables are automatically passed to cypher statements
as parameters. That makes it easy to query for certain start nodes or create nodes and relationships
with certain provided properties (as maps).
Values are removed by setting them to null or an empty value. List environment variables using the env command.
Executing groovy/python scripts
The shell has support for executing scripts, such as Groovy (http://groovy.codehaus.org) and Python
(http://www.python.org, via Jython, http://www.jython.org). As of now the scripts (*.groovy, *.py) must exist
on the server side and are called from a client with, for example, gsh --renamePerson 1234 "Mathias"
"Mattias" --doSomethingElse, where the scripts renamePerson.groovy and doSomethingElse.groovy must
exist on the server side in any of the paths given by the GSH_PATH environment variable (defaults to
.:src:src/script). This variable is like the Java classpath, separated by a :. The Python/Jython scripts can
be executed with jsh in a similar fashion, however the scripts have the .py extension and the environment
variable for the paths is JSH_PATH.
When writing the scripts, assume that an args variable (a String[]) is made available, containing the
supplied arguments. In the case of the renamePerson example above the array would contain ["1234",
"Mathias", "Mattias"]. Also please write your output to the out variable, such as out.println( "My
tracing text" ), so that it will be printed at the shell client instead of on the server.
Traverse
You can traverse the graph with the trav command, which allows for simple traversal from the current
node. You can supply which relationship types (with regex matching) and optionally direction, as well
as property filters for matching nodes. In addition to that you can supply a command line to execute
for each match. An example: trav -o depth -r KNOWS:both,HAS_.*:incoming -c "ls $n". This means:
traverse depth first over relationships of type KNOWS disregarding direction, and over incoming relationships
of type matching HAS_.*, and do a ls <matching node> for each match. The node filtering is supplied
with the -f option, see Section 31.4, “Filters” [550]. See Section 31.3, “Enum options” [549] for the
traversal order option. Relationship types/directions are also supplied using the same format as filters.
Query with Cypher
You can use Cypher to query the graph. For that, use the match or start command. You can also use
create statements to create nodes and relationships and use the cypher VERSION prefix to select a
certain cypher version.
Tip
Cypher queries need to be terminated by a semicolon ;.
Cypher commands are given all shell variables as parameters and the special self parameter for the
current node or relationship.
start n = node(0) return n; will give you a listing of the node with ID 0
cypher 1.9 start n = node(0) return n; will execute the query with Cypher version 1.9
START n = node({self}) MATCH (n)-[:KNOWS]->(friend) RETURN friend; will return the nodes connected
to the current node.
START n=node({me}) CREATE (me)-[r:KNOWS]->(friend {props}); will create the friend and the
relationship according to the variables available.
Listing Indexes and Constraints
The schema command lists all existing indexes and constraints together with their current
status.
Note
This command does not list legacy indexes. For working with legacy indexes, please see the
section called “Legacy Indexing” [554].
List all indexes and constraints:
schema
List indexes or constraints on :Person nodes for the property name:
schema -l :Person -p name
The schema command supports the following parameters:
-l :Label only list indexes or constraints for the given label :Label
-p propertyKey only list indexes or constraints for the given property key propertyKey
-v if an index is in the FAILED state, print a verbose error cause if available
Indexes and constraints can be created or removed using Cypher or the Java Core API. They are
updated automatically whenever the graph is changed. See the section called “Schema” [9] for more
information.
Legacy Indexing
It’s possible to query and manipulate legacy indexes via the index command.
Example: index -i persons name (will index the name for the current node or relationship in the
"persons" legacy index).
-g will do exact lookup in the legacy index and display hits. You can supply -c with a command to be
executed for each hit.
-q will ask the legacy index a query and display hits. You can supply -c with a command to be
executed for each hit.
--cd will change current location to the hit from the query. It’s just a convenience for using the -c
option.
--ls will do a listing of the contents for each hit. It’s just a convenience for using the -c option.
-i will index a key-value pair into a legacy index for the current node/relationship. If no value is given
the property value for that key for the current node is used as value.
-r will remove a key-value pair (if it exists) from a legacy index for the current node/relationship. Key
and value are optional.
-t will set the legacy index type to work with, for example index -t Relationship --delete friends will
delete the friends relationship index.
Transactions
It is useful to be able to test changes, and then commit or roll back said changes.
Transactions can be nested. With a nested transaction, a commit does not write any changes to disk,
except for the top level transaction. A rollback, however, works regardless of the level of the transaction;
it will roll back all open transactions.
begin transaction Starts a transaction.
commit Commits a transaction.
rollback Rolls back all open transactions.
Dumping the database or Cypher statement results
Experimental feature
The dump command has incomplete functionality. It might not work for your use case or data
size.
As a simple way of exporting a database or a subset of it, the dump command converts the graph of a
Cypher result or the whole database into a single Cypher create statement.
Examples:
dump dumps the whole database as a single Cypher create statement
dump START n=node({self}) MATCH p=(n)-[r:KNOWS*]->(m) RETURN n,r,m; dumps the transitive friendship
graph of the current node.
neo4j-shell -path db1 -c 'dump MATCH p=(n:Person {name:"Mattias"})-[r:KNOWS]->(m) RETURN p;' |
neo4j-shell -path db2 -file - imports the subgraph of the first database (db1) into the second (db2)
Example Dump Scripts
 create a new node and go to it
neo4j-sh (?)$ mknode --cd --np "{'name':'Neo'}"
 create a relationship
neo4j-sh (Neo,0)$ mkrel -c -d i -t LIKES --np "{'app':'foobar'}"
 Export the cypher statement results
neo4j-sh (Neo,0)$ dump MATCH (n)-[r]-(m) WHERE n = {self} return n,r,m;
begin
create (_0 {`name`:"Neo"})
create (_1 {`app`:"foobar"})
create _1-[:`LIKES`]->_0
;
commit
 create an index
neo4j-sh (?)$ create index on :Person(name);
+-------------------+
| No data returned. |
+-------------------+
Indexes added: 1
35 ms
 create one labeled node and a relationship
neo4j-sh (?)$ create (m:Person:Hacker {name:'Mattias'}), (m)-[:KNOWS]->(m);
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1
Relationships created: 1
Properties set: 1
Labels added: 2
218 ms
 Export the whole database including indexes
neo4j-sh (?)$ dump
begin
create index on :`Person`(`name`)
create (_0:`Person`:`Hacker` {`name`:"Mattias"})
create _0-[:`KNOWS`]->_0
;
commit
31.7. An example shell session
 Create a node
neo4j-sh (?)$ mknode --cd
 where are we?
neo4j-sh (0)$ pwd
Current is (0)
(0)
 On the current node, set the key "name" to value "Jon"
neo4j-sh (0)$ set name "Jon"
 send a cypher query
neo4j-sh (Jon,0)$ match n where id(n) = 0 return n;
+---------------------+
| n |
+---------------------+
| Node[0]{name:"Jon"} |
+---------------------+
1 row
386 ms
 make an incoming relationship of type LIKES, create the end node with the node properties specified.
neo4j-sh (Jon,0)$ mkrel -c -d i -t LIKES --np "{'app':'foobar'}"
 where are we?
neo4j-sh (Jon,0)$ ls
*name =[Jon]
(me)<-[:LIKES]-(1)
 change to the newly created node
neo4j-sh (Jon,0)$ cd 1
 list relationships, including relationship id
neo4j-sh (1)$ ls -avr
(me)-[:LIKES,0]->(Jon,0)
 create one more KNOWS relationship and the end node
neo4j-sh (1)$ mkrel -c -d i -t KNOWS --np "{'name':'Bob'}"
 print current history stack
neo4j-sh (1)$ pwd
Current is (1)
(Jon,0)-->(1)
 verbose list relationships
neo4j-sh (1)$ ls -avr
(me)-[:LIKES,0]->(Jon,0)
(me)<-[:KNOWS,1]-(Bob,2)
31.8. A Matrix example
This example creates a graph of the characters in the Matrix via the shell and then executes Cypher
queries against it:
Figure31.1.Shell Matrix Example
nam e = 'Thom as Andersson'
ROOT
nam e = 'Morpheus'
KNOWS
age = 3
nam e = 'Trinity'
KNOWS
KNOWS
age = 90
nam e = 'Cypher'
KNOWS
nam e = 'Agent Smith'
KNOWS
nam e = 'The Architect'
CODED_BY
The following is a sample shell session creating the Matrix graph and querying it.
 Create a reference node
neo4j-sh (?)$ mknode --cd
 create the Thomas Andersson node
neo4j-sh (0)$ mkrel -t ROOT -c -v
Node (1) created
Relationship [:ROOT,0] created
 go to the new node
neo4j-sh (0)$ cd 1
 set the name property
neo4j-sh (1)$ set name "Thomas Andersson"
 create Thomas direct friends
neo4j-sh (Thomas Andersson,1)$ mkrel -t KNOWS -cv
Node (2) created
Relationship [:KNOWS,1] created
 go to the new node
neo4j-sh (Thomas Andersson,1)$ cd 2
 set the name property
neo4j-sh (2)$ set name "Trinity"
 go back in the history stack
neo4j-sh (Trinity,2)$ cd ..
 create Thomas direct friends
neo4j-sh (Thomas Andersson,1)$ mkrel -t KNOWS -cv
Node (3) created
Relationship [:KNOWS,2] created
 go to the new node
neo4j-sh (Thomas Andersson,1)$ cd 3
 set the name property
neo4j-sh (3)$ set name "Morpheus"
 create relationship to Trinity
neo4j-sh (Morpheus,3)$ mkrel -t KNOWS 2
 list the relationships of node 3
neo4j-sh (Morpheus,3)$ ls -rv
(me)-[:KNOWS,3]->(Trinity,2)
(me)<-[:KNOWS,2]-(Thomas Andersson,1)
 change the current position to relationship 2
neo4j-sh (Morpheus,3)$ cd -r 2
 set the age property on the relationship
neo4j-sh [:KNOWS,2]$ set -t int age 3
 back to Morpheus
neo4j-sh [:KNOWS,2]$ cd ..
 next relationship
neo4j-sh (Morpheus,3)$ cd -r 3
 set the age property on the relationship
neo4j-sh [:KNOWS,3]$ set -t int age 90
 position to the start node of the current relationship
neo4j-sh [:KNOWS,3]$ cd start
 new node
neo4j-sh (Morpheus,3)$ mkrel -t KNOWS -c
 list relationships on the current node
neo4j-sh (Morpheus,3)$ ls -r
(me)-[:KNOWS]->(4)
(me)-[:KNOWS]->(Trinity,2)
(me)<-[:KNOWS]-(Thomas Andersson,1)
 go to Cypher
neo4j-sh (Morpheus,3)$ cd 4
 set the name
neo4j-sh (4)$ set name Cypher
 create new node from Cypher
neo4j-sh (Cypher,4)$ mkrel -ct KNOWS
 list relationships
neo4j-sh (Cypher,4)$ ls -r
(me)-[:KNOWS]->(5)
(me)<-[:KNOWS]-(Morpheus,3)
 go to the Agent Smith node
neo4j-sh (Cypher,4)$ cd 5
 set the name
neo4j-sh (5)$ set name "Agent Smith"
 outgoing relationship and new node
neo4j-sh (Agent Smith,5)$ mkrel -cvt CODED_BY
Node (6) created
Relationship [:CODED_BY,6] created
 go there
neo4j-sh (Agent Smith,5)$ cd 6
 set the name
neo4j-sh (6)$ set name "The Architect"
 go to the first node in the history stack
neo4j-sh (The Architect,6)$ cd
 Morpheus' friends, looking up Morpheus by name in the Neo4j autoindex
neo4j-sh (?)$ start morpheus = node:node_auto_index(name='Morpheus') match morpheus-[:KNOWS]-zionist return zionist.name;
+--------------------+
| zionist.name |
+--------------------+
| "Cypher" |
| "Trinity" |
| "Thomas Andersson" |
+--------------------+
3 rows
129 ms
 Morpheus' friends, looking up Morpheus by name in the Neo4j autoindex
neo4j-sh (?)$ cypher 2.2 start morpheus = node:node_auto_index(name='Morpheus') match morpheus-[:KNOWS]-zionist return
zionist.name;
+--------------------+
| zionist.name |
+--------------------+
| "Cypher" |
| "Trinity" |
| "Thomas Andersson" |
+--------------------+
3 rows
997 ms
PartVII.Advanced Usage
This part contains information on advanced usage of Neo4j. Among the topics covered are embedding
Neo4j in your own software and writing plugins for the Neo4j Server.
You might want to keep the Neo4j JavaDocs5 handy while reading!
5 http://neo4j.com/docs/2.3.12/javadocs/
32. Extending the Neo4j Server ............................................................................................................. 563
32.1. Server Plugins ........................................................................................................................ 564
32.2. Unmanaged Extensions ........................................................................................................ 568
32.3. Testing your extension .......................................................................................................... 573
32.4. Installing Plugins and Extensions in Neo4j Desktop ............................................................ 575
33. Using Neo4j embedded in Java applications ................................................................................... 576
33.1. Include Neo4j in your project ............................................................................................... 577
33.2. Hello World ............................................................................................................................ 581
33.3. Property values ..................................................................................................................... 584
33.4. User database with indexes ................................................................................................. 585
33.5. User database with legacy index .......................................................................................... 588
33.6. Managing resources when using long running transactions ............................................... 589
33.7. Controlling logging ................................................................................................................ 590
33.8. Basic unit testing ................................................................................................................... 591
33.9. Traversal ................................................................................................................................ 593
33.10. Domain entities ................................................................................................................... 601
33.11. Graph Algorithm examples ................................................................................................. 602
33.12. Reading a management attribute ....................................................................................... 604
33.13. How to create unique nodes .............................................................................................. 605
33.14. Terminating a running transaction ..................................................................................... 607
33.15. Execute Cypher Queries from Java ..................................................................................... 609
33.16. Query Parameters ............................................................................................................... 611
34. The Traversal Framework ................................................................................................................ 613
34.1. Main concepts ....................................................................................................................... 614
34.2. Traversal Framework Java API ............................................................................................... 615
35. Legacy Indexing ................................................................................................................................ 621
35.1. Introduction ........................................................................................................................... 622
35.2. Create .................................................................................................................................... 623
35.3. Delete .................................................................................................................................... 624
35.4. Add ........................................................................................................................................ 625
35.5. Remove .................................................................................................................................. 627
35.6. Update ................................................................................................................................... 628
35.7. Search .................................................................................................................................... 629
35.8. Relationship indexes ............................................................................................................. 631
35.9. Scores .................................................................................................................................... 632
35.10. Configuration and fulltext indexes ..................................................................................... 633
35.11. Extra features for Lucene indexes ...................................................................................... 634
36. Batch Insertion ................................................................................................................................. 636
36.1. Batch Inserter Examples ....................................................................................................... 637
36.2. Index Batch Insertion ............................................................................................................ 639
Chapter32.Extending the Neo4j Server
The Neo4j Server can be extended by either plugins or unmanaged extensions.
Extending the Neo4j Server
564
32.1. Server Plugins
Quick info
The server’s functionality can be extended by adding plugins.
Plugins are user-specified code which extend the capabilities of the database, nodes, or
relationships.
The Neo4j server will then advertise the plugin functionality within representations as clients
interact via HTTP.
Plugins provide an easy way to extend the Neo4j REST API with new functionality, without the need to
invent your own API. Think of plugins as server-side scripts that can add functions for retrieving and
manipulating nodes, relationships, paths, properties or indices.
Tip
If you want to have full control over your API, and are willing to put in the effort, and
understand the risks, then Neo4j server also provides hooks for unmanaged extensions
based on JAX-RS.
The needed classes reside in the org.neo4j:server-api1 jar file. See the linked page for downloads and
instructions on how to include it using dependency management. For Maven projects, add the Server
API dependencies in your pom.xml like this:
<dependency>
<groupId>org.neo4j</groupId>
<artifactId>server-api</artifactId>
<version>2.3.12</version>
</dependency>
To create a plugin, your code must inherit from the ServerPlugin2 class. Your plugin should also:
- ensure that it can produce an (Iterable of) Node, Relationship or Path, any Java primitive or String, or an instance of org.neo4j.server.rest.repr.Representation
- specify parameters
- specify a point of extension
- contain the application logic
- make sure that the discovery point type in the @PluginTarget and the @Source parameter are of the same type
Note
If your plugin class has any constructors defined it must also have a no-arguments
constructor defined.
An example of a plugin which augments the database (as opposed to nodes or relationships) follows:
Get all nodes or relationships plugin
@Description( "An extension to the Neo4j Server for getting all nodes or relationships" )
public class GetAll extends ServerPlugin
{
@Name( "get_all_nodes" )
@Description( "Get all nodes from the Neo4j graph database" )
@PluginTarget( GraphDatabaseService.class )
public Iterable<Node> getAllNodes( @Source GraphDatabaseService graphDb )
1 http://search.maven.org/#search|gav|1|g%3A%22org.neo4j%22%20AND%20a%3A%22server-api%22
2 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/server/plugins/ServerPlugin.html
{
ArrayList<Node> nodes = new ArrayList<>();
try (Transaction tx = graphDb.beginTx())
{
for ( Node node : GlobalGraphOperations.at( graphDb ).getAllNodes() )
{
nodes.add( node );
}
tx.success();
}
return nodes;
}
@Description( "Get all relationships from the Neo4j graph database" )
@PluginTarget( GraphDatabaseService.class )
public Iterable<Relationship> getAllRelationships( @Source GraphDatabaseService graphDb )
{
List<Relationship> rels = new ArrayList<>();
try (Transaction tx = graphDb.beginTx())
{
for ( Relationship rel : GlobalGraphOperations.at( graphDb ).getAllRelationships() )
{
rels.add( rel );
}
tx.success();
}
return rels;
}
}
The full source code is found here: GetAll.java3
Find the shortest path between two nodes plugin
public class ShortestPath extends ServerPlugin
{
@Description( "Find the shortest path between two nodes." )
@PluginTarget( Node.class )
public Iterable<Path> shortestPath(
@Source Node source,
@Description( "The node to find the shortest path to." )
@Parameter( name = "target" ) Node target,
@Description( "The relationship types to follow when searching for the shortest path(s). " +
"Order is insignificant, if omitted all types are followed." )
@Parameter( name = "types", optional = true ) String[] types,
@Description( "The maximum path length to search for, default value (if omitted) is 4." )
@Parameter( name = "depth", optional = true ) Integer depth )
{
PathExpander<?> expander;
List<Path> paths = new ArrayList<>();
if ( types == null )
{
expander = PathExpanders.allTypesAndDirections();
}
else
{
PathExpanderBuilder expanderBuilder = PathExpanderBuilder.empty();
for ( int i = 0; i < types.length; i++ )
{
expanderBuilder = expanderBuilder.add( DynamicRelationshipType.withName( types[i] ) );
}
expander = expanderBuilder.build();
3 https://github.com/neo4j/neo4j/blob/2.3.12/community/server-examples/src/main/java/org/neo4j/examples/server/plugins/
GetAll.java
}
try (Transaction tx = source.getGraphDatabase().beginTx())
{
PathFinder<Path> shortestPath = GraphAlgoFactory.shortestPath( expander,
depth == null ? 4 : depth.intValue() );
for ( Path path : shortestPath.findAllPaths( source, target ) )
{
paths.add( path );
}
tx.success();
}
return paths;
}
}
The full source code is found here: ShortestPath.java4
To deploy the code, simply compile it into a .jar file and place it onto the server classpath (which by
convention is the plugins directory under the Neo4j server home directory).
Caution
Whilst Neo4j is tested to run on JVM 8, Neo4j server is currently compiled using JDK 7, to
ensure compatibility for JVM 7 deployments. When compiling plugins for Neo4j Server, we
strongly recommend using JDK 7 also.
Tip
Make sure the directory listings are retained in the jar file by either building with default
Maven, or with jar -cvf myext.jar *, making sure to jar directories instead of specifying
single files.
The .jar file must include the file META-INF/services/org.neo4j.server.plugins.ServerPlugin with the fully
qualified name of the implementation class. This is an example with multiple entries, each on a
separate line:
org.neo4j.examples.server.plugins.DepthTwo
org.neo4j.examples.server.plugins.GetAll
org.neo4j.examples.server.plugins.ShortestPath
The code above makes an extension visible in the database representation (via the @PluginTarget
annotation) whenever it is served from the Neo4j Server. Simply changing the @PluginTarget parameter
to Node.class or Relationship.class allows us to target those parts of the data model should we wish.
The functionality extensions provided by the plugin are automatically advertised in representations on
the wire. For example, clients can discover the extension implemented by the above plugin easily by
examining the representations they receive as responses from the server, e.g. by performing a GET on
the default database URI:
curl -v http://localhost:7474/db/data/
The response to the GET request will contain (by default) a JSON container that itself contains a
container called "extensions" where the available plugins are listed. In the following case, we only have
the GetAll plugin registered with the server, so only its extension functionality is available. Extension
names will be automatically assigned, based on method names, if not explicitly specified using the
@Name annotation.
{
"extensions-info" : "http://localhost:7474/db/data/ext",
"node" : "http://localhost:7474/db/data/node",
"node_index" : "http://localhost:7474/db/data/index/node",
4 https://github.com/neo4j/neo4j/blob/2.3.12/community/server-examples/src/main/java/org/neo4j/examples/server/plugins/
ShortestPath.java
"relationship_index" : "http://localhost:7474/db/data/index/relationship",
"reference_node" : "http://localhost:7474/db/data/node/0",
"extensions_info" : "http://localhost:7474/db/data/ext",
"extensions" : {
"GetAll" : {
"get_all_nodes" : "http://localhost:7474/db/data/ext/GetAll/graphdb/get_all_nodes",
"get_all_relationships" : "http://localhost:7474/db/data/ext/GetAll/graphdb/getAllRelationships"
}
  }
}
Performing a GET on one of the two extension URIs gives back the meta information about the service:
curl http://localhost:7474/db/data/ext/GetAll/graphdb/get_all_nodes
{
"extends" : "graphdb",
"description" : "Get all nodes from the Neo4j graph database",
"name" : "get_all_nodes",
"parameters" : [ ]
}
To use it, just POST to this URL, with parameters as specified in the description and encoded as JSON
data content. For example, to call the shortest path extension (with the URI obtained from a GET to
http://localhost:7474/db/data/node/123):
curl -X POST http://localhost:7474/db/data/ext/ShortestPath/node/123/shortestPath \
-H "Content-Type: application/json" \
-d '{"target":"http://localhost:7474/db/data/node/456", "depth":"5"}'
If everything is OK, a response code 200 and a list of zero or more items will be returned. If nothing is
returned (null returned from the extension), an empty result and response code 204 will be returned. If
the extension throws an exception, response code 500 and a detailed error message are returned.
Extensions that do any kind of database operation will have to manage their own transactions, i.e.
transactions aren’t managed automatically. Note that the results of traversals or execution of graph
algorithms should be exhausted inside the transaction before returning the result.
Through this model, any plugin can naturally fit into the general hypermedia scheme that Neo4j
espouses, meaning that clients can still take advantage of abstractions like Nodes, Relationships and
Paths, with a straightforward upgrade path as servers are enriched with plugins (old clients don’t break).
32.2. Unmanaged Extensions
Sometimes you’ll want finer grained control over your application’s interactions with Neo4j than cypher
provides. For these situations you can use the unmanaged extension API.
Caution
This is a sharp tool, allowing users to deploy arbitrary JAX-RS5 classes to the server so
be careful when using this. In particular it’s easy to consume lots of heap space on the
server and degrade performance. If in doubt please ask for help via one of the community
channels (see Preface [v]).
Introduction to unmanaged extensions
The first step when writing an unmanaged extension is to create a project which includes dependencies
to the JAX-RS and Neo4j core jars. In Maven this would be achieved by adding the following lines to the
pom file:
<dependency>
<groupId>javax.ws.rs</groupId>
<artifactId>javax.ws.rs-api</artifactId>
<version>2.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.neo4j</groupId>
<artifactId>neo4j</artifactId>
<version>2.3.12</version>
<scope>provided</scope>
</dependency>
Now we’re ready to write our extension.
In our code we’ll interact with the database using GraphDatabaseService, which we can get access to by
using the @Context annotation. The following example serves as a template on which you can base your
extension:
Unmanaged extension example
@Path( "/helloworld" )
public class HelloWorldResource
{
private final GraphDatabaseService database;
public HelloWorldResource( @Context GraphDatabaseService database )
{
this.database = database;
}
@GET
@Produces( MediaType.TEXT_PLAIN )
@Path( "/{nodeId}" )
public Response hello( @PathParam( "nodeId" ) long nodeId )
{
// Do stuff with the database
return Response.status( Status.OK ).entity(
("Hello World, nodeId=" + nodeId).getBytes( Charset.forName("UTF-8") ) ).build();
}
}
5 http://en.wikipedia.org/wiki/JAX-RS
The full source code is found here: HelloWorldResource.java6
Having built your code, the resulting jar file (and any custom dependencies) should be placed in the
$NEO4J_SERVER_HOME/plugins directory. We also need to tell Neo4j where to look for the extension by
adding some configuration in the conf/neo4j-server.properties file:
# Comma-separated list of JAX-RS packages containing JAX-RS resource classes, one package name for each mountpoint.
org.neo4j.server.thirdparty_jaxrs_classes=org.neo4j.examples.server.unmanaged=/examples/unmanaged
Our hello method will now respond to GET requests at the URI: http://{neo4j_server}:{neo4j_port}/
examples/unmanaged/helloworld/{nodeId}. e.g.
curl http://localhost:7474/examples/unmanaged/helloworld/123
which results in
Hello World, nodeId=123
Caution
Whilst Neo4j is tested to run on JVM 8, Neo4j server is currently compiled using JDK 7, to
ensure compatibility for JVM 7 deployments. When compiling plugins for Neo4j Server, we
strongly recommend using JDK 7 also.
Streaming JSON responses
When writing unmanaged extensions we have greater control over the amount of memory that
our Neo4j queries use. If we keep too much state around, it can lead to more frequent full garbage
collections and subsequent unresponsiveness of the Neo4j server.
A common way that state can creep in is the creation of JSON objects to represent the result of a
query which we then send back to our application. Neo4j’s Transactional Cypher HTTP endpoint (see
Section21.1, “Transactional Cypher HTTP endpoint” [298]) streams responses back to the client and we
should follow in its footsteps.
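The principle can be sketched with the JDK alone, independent of Neo4j and Jackson (the class and helper names below are invented for illustration): each value is written to the output stream as soon as it is produced, so the full result never has to sit in the heap. Jackson’s JsonGenerator, used in the extension below, does the same with proper escaping:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

public class StreamingSketch
{
    // Writes one colleague name at a time and flushes, instead of building
    // the whole JSON document in memory first. Real code must JSON-escape
    // the values; Jackson's JsonGenerator handles that for you.
    static void writeColleagues( OutputStream os, List<String> names ) throws IOException
    {
        Writer w = new OutputStreamWriter( os, "UTF-8" );
        w.write( "{\"colleagues\":[" );
        boolean first = true;
        for ( String name : names )
        {
            if ( !first )
            {
                w.write( "," );
            }
            w.write( "\"" + name + "\"" );
            first = false;
            w.flush();  // bytes leave the server as each row is produced
        }
        w.write( "]}" );
        w.flush();
    }

    public static void main( String[] args ) throws IOException
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeColleagues( out, Arrays.asList( "Hugo Weaving", "Carrie-Anne Moss" ) );
        System.out.println( out.toString( "UTF-8" ) );
        // prints {"colleagues":["Hugo Weaving","Carrie-Anne Moss"]}
    }
}
```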
For example, the following unmanaged extension streams an array of a person’s colleagues:
Unmanaged extension streaming example
@Path("/colleagues")
public class ColleaguesResource
{
private GraphDatabaseService graphDb;
private final ObjectMapper objectMapper;
private static final DynamicRelationshipType ACTED_IN = DynamicRelationshipType.withName( "ACTED_IN" );
private static final Label PERSON = DynamicLabel.label( "Person" );
public ColleaguesResource( @Context GraphDatabaseService graphDb )
{
this.graphDb = graphDb;
this.objectMapper = new ObjectMapper();
}
@GET
@Path("/{personName}")
public Response findColleagues( final @PathParam("personName") String personName )
{
StreamingOutput stream = new StreamingOutput()
{
@Override
public void write( OutputStream os ) throws IOException, WebApplicationException
{
JsonGenerator jg = objectMapper.getJsonFactory().createJsonGenerator( os, JsonEncoding.UTF8 );
6 https://github.com/neo4j/neo4j/blob/2.3.12/community/server-examples/src/main/java/org/neo4j/examples/server/
unmanaged/HelloWorldResource.java
jg.writeStartObject();
jg.writeFieldName( "colleagues" );
jg.writeStartArray();
try ( Transaction tx = graphDb.beginTx();
ResourceIterator<Node> persons = graphDb.findNodes( PERSON, "name", personName ) )
{
while ( persons.hasNext() )
{
Node person = persons.next();
for ( Relationship actedIn : person.getRelationships( ACTED_IN, OUTGOING ) )
{
Node endNode = actedIn.getEndNode();
for ( Relationship colleagueActedIn : endNode.getRelationships( ACTED_IN, INCOMING ) )
{
Node colleague = colleagueActedIn.getStartNode();
if ( !colleague.equals( person ) )
{
jg.writeString( colleague.getProperty( "name" ).toString() );
}
}
}
}
tx.success();
}
jg.writeEndArray();
jg.writeEndObject();
jg.flush();
jg.close();
}
};
return Response.ok().entity( stream ).type( MediaType.APPLICATION_JSON ).build();
}
}
The full source code is found here: ColleaguesResource.java7
As well as depending on the JAX-RS API, this example also uses Jackson, a Java JSON library. You’ll need
to add the following dependency to your Maven POM file (or equivalent):
<dependency>
<groupId>org.codehaus.jackson</groupId>
<artifactId>jackson-mapper-asl</artifactId>
<version>1.9.7</version>
</dependency>
Our findColleagues method will now respond to GET requests at the URI: http://{neo4j_server}:
{neo4j_port}/examples/unmanaged/colleagues/{personName}. For example:
curl http://localhost:7474/examples/unmanaged/colleagues/Keanu%20Reeves
which results in
{"colleagues":["Hugo Weaving","Carrie-Anne Moss","Laurence Fishburne"]}
Using Cypher in an unmanaged extension
You can execute Cypher queries by using the GraphDatabaseService that is injected into the extension.
7 https://github.com/neo4j/neo4j/blob/2.3.12/community/server-examples/src/main/java/org/neo4j/examples/server/
unmanaged/ColleaguesResource.java
Note
In Neo4j versions prior to 2.2 you had to retrieve an ExecutionEngine to execute queries.
This has been deprecated, and we recommend you to update any existing code to use
GraphDatabaseService instead.
For example, the following unmanaged extension retrieves a person’s colleagues using Cypher:
Unmanaged extension Cypher execution example
@Path("/colleagues-cypher-execution")
public class ColleaguesCypherExecutionResource
{
private final ObjectMapper objectMapper;
private GraphDatabaseService graphDb;
public ColleaguesCypherExecutionResource( @Context GraphDatabaseService graphDb )
{
this.graphDb = graphDb;
this.objectMapper = new ObjectMapper();
}
@GET
@Path("/{personName}")
public Response findColleagues( final @PathParam("personName") String personName )
{
final Map<String, Object> params = MapUtil.map( "personName", personName );
StreamingOutput stream = new StreamingOutput()
{
@Override
public void write( OutputStream os ) throws IOException, WebApplicationException
{
JsonGenerator jg = objectMapper.getJsonFactory().createJsonGenerator( os, JsonEncoding.UTF8 );
jg.writeStartObject();
jg.writeFieldName( "colleagues" );
jg.writeStartArray();
try ( Transaction tx = graphDb.beginTx();
Result result = graphDb.execute( colleaguesQuery(), params ) )
{
while ( result.hasNext() )
{
Map<String,Object> row = result.next();
jg.writeString( ((Node) row.get( "colleague" )).getProperty( "name" ).toString() );
}
tx.success();
}
jg.writeEndArray();
jg.writeEndObject();
jg.flush();
jg.close();
}
};
return Response.ok().entity( stream ).type( MediaType.APPLICATION_JSON ).build();
}
private String colleaguesQuery()
{
return "MATCH (p:Person {name: {personName} })-[:ACTED_IN]->()<-[:ACTED_IN]-(colleague) RETURN colleague";
}
}
8 https://github.com/neo4j/neo4j/blob/2.3.12/community/server-examples/src/main/java/org/neo4j/examples/server/
unmanaged/ColleaguesCypherExecutionResource.java
The full source code is found here: ColleaguesCypherExecutionResource.java8
Our findColleagues method will now respond to GET requests at the URI: http://{neo4j_server}:
{neo4j_port}/examples/unmanaged/colleagues-cypher-execution/{personName}. e.g.
curl http://localhost:7474/examples/unmanaged/colleagues-cypher-execution/Keanu%20Reeves
which results in
{"colleagues":["Hugo Weaving","Carrie-Anne Moss","Laurence Fishburne"]}
32.3. Testing your extension
Neo4j provides tools to help you write integration tests for your extensions. You can access this toolkit
by adding the following test dependency to your project:
<dependency>
<groupId>org.neo4j.test</groupId>
<artifactId>neo4j-harness</artifactId>
<version>2.3.12</version>
<scope>test</scope>
</dependency>
The test toolkit provides a mechanism to start a Neo4j instance with custom configuration and with
extensions of your choice. It also provides mechanisms to specify data fixtures to include when starting
Neo4j.
Usage example
@Path("")
public static class MyUnmanagedExtension
{
@GET
public Response myEndpoint()
{
return Response.ok().build();
}
}
@Test
public void testMyExtension() throws Exception
{
// Given
try ( ServerControls server = TestServerBuilders.newInProcessBuilder()
.withExtension( "/myExtension", MyUnmanagedExtension.class )
.newServer() )
{
// When
HTTP.Response response = HTTP.GET( server.httpURI().resolve( "myExtension" ).toString() );
// Then
assertEquals( 200, response.status() );
}
}
@Test
public void testMyExtensionWithFunctionFixture() throws Exception
{
// Given
try ( ServerControls server = TestServerBuilders.newInProcessBuilder()
.withExtension( "/myExtension", MyUnmanagedExtension.class )
.withFixture( new Function<GraphDatabaseService, Void>()
{
@Override
public Void apply( GraphDatabaseService graphDatabaseService ) throws RuntimeException
{
try ( Transaction tx = graphDatabaseService.beginTx() )
{
graphDatabaseService.createNode( DynamicLabel.label( "User" ) );
tx.success();
}
return null;
}
} )
.newServer() )
{
// When
Result result = server.graph().execute( "MATCH (n:User) return n" );
// Then
assertEquals( 1, IteratorUtil.count( result ) );
}
}
The full source code of the example is found here: ExtensionTestingDocTest.java9
Note the use of server.httpURI().resolve( "myExtension" ) to ensure that the correct base URI is used.
If you are using the JUnit test framework, there is a JUnit rule available as well.
JUnit example
@Rule
public Neo4jRule neo4j = new Neo4jRule()
.withFixture( "CREATE (admin:Admin)" )
.withFixture( new Function<GraphDatabaseService, Void>()
{
@Override
public Void apply( GraphDatabaseService graphDatabaseService ) throws RuntimeException
{
try (Transaction tx = graphDatabaseService.beginTx())
{
graphDatabaseService.createNode( DynamicLabel.label( "Admin" ));
tx.success();
}
return null;
}
} );
@Test
public void shouldWorkWithServer() throws Exception
{
// Given
URI serverURI = neo4j.httpURI();
// When I access the server
HTTP.Response response = HTTP.GET( serverURI.toString() );
// Then it should reply
assertEquals(200, response.status());
// and we have access to underlying GraphDatabaseService
try (Transaction tx = neo4j.getGraphDatabaseService().beginTx()) {
assertEquals( 2, IteratorUtil.count(
neo4j.getGraphDatabaseService().findNodes( DynamicLabel.label( "Admin" ) )
));
tx.success();
}
}
9 https://github.com/neo4j/neo4j/blob/2.3.12/manual/neo4j-harness-test/src/test/java/org/neo4j/harness/doc/
ExtensionTestingDocTest.java
32.4. Installing Plugins and Extensions in Neo4j Desktop
Neo4j Desktop can also be extended with server plugins and extensions. Neo4j Desktop will add all jars
in %ProgramFiles%\Neo4j Community\plugins to the classpath, but please note that nested directories for
plugins are currently not supported.
Otherwise server plugins and extensions are subject to the same rules as usual. Please note when
configuring server extensions that neo4j-server.properties for Neo4j Desktop lives in %APPDATA%\Neo4j
Community.
Chapter33.Using Neo4j embedded in Java
applications
It’s easy to use Neo4j embedded in Java applications. In this chapter you will find everything
needed from setting up the environment to doing something useful with your data.
Using Neo4j embedded in Java applications
577
33.1. Include Neo4j in your project
After selecting the appropriate edition for your platform, embed Neo4j in your Java application by
including the Neo4j library jars in your build. The following sections will show how to do this by either
altering the build path directly or by using dependency management.
Add Neo4j to the build path
Get the Neo4j libraries from one of these sources:
Extract a Neo4j download1 zip/tarball, and use the jar files found in the lib/ directory.
Use the jar files available from Maven Central Repository2
Add the jar files to your project:
JDK tools: Append to -classpath.
Eclipse: Right-click on the project, then go to Build Path > Configure Build Path. In the dialog,
choose Add External JARs, browse to the Neo4j lib/ directory and select all of the jar files.
Another option is to use User Libraries3.
IntelliJ IDEA: See Libraries, Global Libraries, and the Configure Library dialog4.
NetBeans: Right-click on the Libraries node of the project, choose Add JAR/Folder, browse to
the Neo4j lib/ directory and select all of the jar files. You can also handle libraries from the
project node; see Managing a Project’s Classpath5.
Editions
The following table outlines the available editions and their names for use with dependency
management tools.
Tip
Follow the links in the table for details on dependency configuration with Apache Maven,
Apache Buildr, Apache Ivy, Groovy Grape, Grails, Scala SBT!
Neo4j editions
Community: org.neo4j:neo4j6. A high performance, fully ACID transactional graph database. License: GPLv3.
Enterprise: org.neo4j:neo4j-enterprise7. Adds advanced monitoring, online backup and High Availability clustering. License: AGPLv3.
Note
The listed dependencies do not contain the implementation, but pull it in transitively.
1 http://neo4j.com/download/
2 http://search.maven.org/#search|ga|1|g%3A%22org.neo4j%22
3 http://help.eclipse.org/indigo/index.jsp?topic=/org.eclipse.jdt.doc.user/reference/preferences/java/buildpath/ref-preferences-
user-libraries.htm
4 http://www.jetbrains.com/idea/webhelp/configuring-project-and-global-libraries.html
5 http://netbeans.org/kb/docs/java/project-setup.html#projects-classpath
6 http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.neo4j%22%20AND%20a%3A%22neo4j%22
7 http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.neo4j%22%20AND%20a%3A%22neo4j-enterprise%22
For more information regarding licensing, see the Licensing Guide8.
Javadocs can be downloaded packaged in jar files from Maven Central or read at javadocs9.
Add Neo4j as a dependency
You can either go with the top-level artifact from the table above or include the individual components
directly. The examples included here use the top-level artifact approach.
Maven
Add the dependency to your project along the lines of the snippet below. This is usually done in the
pom.xml file found in the root directory of the project.
Maven dependency
<project>
...
<dependencies>
<dependency>
<groupId>org.neo4j</groupId>
<artifactId>neo4j</artifactId>
<version>2.3.12</version>
</dependency>
...
</dependencies>
...
</project>
Where the artifactId is found in the editions table.
Eclipse and Maven
For development in Eclipse10, it is recommended to install the m2e plugin11 and let Maven manage the
project build classpath instead, see above. This also adds the possibility to build your project both via
the command line with Maven and have a working Eclipse setup for development.
Ivy
Make sure to resolve dependencies from Maven Central, for example using this configuration in your
ivysettings.xml file:
<ivysettings>
<settings defaultResolver="main"/>
<resolvers>
<chain name="main">
<filesystem name="local">
<artifact pattern="${ivy.settings.dir}/repository/[artifact]-[revision].[ext]" />
</filesystem>
<ibiblio name="maven_central" root="http://repo1.maven.org/maven2/" m2compatible="true"/>
</chain>
</resolvers>
</ivysettings>
With that in place you can add Neo4j to the mix by adding something along these lines to your ivy.xml
file:
..
<dependencies>
..
<dependency org="org.neo4j" name="neo4j" rev="2.3.12"/>
8 http://www.neo4j.org/learn/licensing
9 http://neo4j.com/docs/2.3.12/javadocs/
10 http://www.eclipse.org
11 http://www.eclipse.org/m2e/
..
</dependencies>
..
Where the name is found in the editions table above.
Gradle
The following example Gradle build script includes the Neo4j libraries.
def neo4jVersion = "2.3.12"
apply plugin: 'java'
repositories {
mavenCentral()
}
dependencies {
compile "org.neo4j:neo4j:${neo4jVersion}"
}
Where the coordinates (org.neo4j:neo4j in the example) are found in the editions table above.
Starting and stopping
To create a new database or open an existing one you instantiate a GraphDatabaseService12.
graphDb = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH );
registerShutdownHook( graphDb );
Note
The GraphDatabaseService instance can be shared among multiple threads. Note however
that you can’t create multiple instances pointing to the same database.
To stop the database, call the shutdown() method:
graphDb.shutdown();
To make sure Neo4j is shut down properly you can add a shutdown hook:
private static void registerShutdownHook( final GraphDatabaseService graphDb )
{
// Registers a shutdown hook for the Neo4j instance so that it
// shuts down nicely when the VM exits (even if you "Ctrl-C" the
// running application).
Runtime.getRuntime().addShutdownHook( new Thread()
{
@Override
public void run()
{
graphDb.shutdown();
}
} );
}
Starting an embedded database with configuration settings
To start Neo4j with configuration settings, a Neo4j properties file can be loaded like this:
GraphDatabaseService graphDb = new GraphDatabaseFactory()
.newEmbeddedDatabaseBuilder( testDirectory.graphDbDir() )
.loadPropertiesFromFile( pathToConfig + "neo4j.properties" )
.newGraphDatabase();
Configuration settings can also be applied programmatically, like so:
12 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/GraphDatabaseService.html
GraphDatabaseService graphDb = new GraphDatabaseFactory()
.newEmbeddedDatabaseBuilder( testDirectory.graphDbDir() )
.setConfig( GraphDatabaseSettings.pagecache_memory, "512M" )
.setConfig( GraphDatabaseSettings.string_block_size, "60" )
.setConfig( GraphDatabaseSettings.array_block_size, "300" )
.newGraphDatabase();
For configuration settings, see Chapter 24, Configuration & Performance [448].
Starting an embedded read-only instance
If you want a read-only view of the database, create an instance this way:
graphDb = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder(
"target/read-only-db/location" )
.setConfig( GraphDatabaseSettings.read_only, "true" )
.newGraphDatabase();
Obviously the database has to already exist in this case.
Note
Concurrent access to the same database files by multiple (read-only or write) instances is
not supported.
33.2. Hello World
Learn how to create and access nodes and relationships. For information on project setup, see
Section 33.1, “Include Neo4j in your project” [577].
Remember, from Section 2.1, “The Neo4j Graph Database” [5], that a Neo4j graph consists of:
Nodes that are connected by
Relationships, with
Properties on both nodes and relationships.
All relationships have a type. For example, if the graph represents a social network, a relationship type
could be KNOWS. If a relationship of the type KNOWS connects two nodes, that probably represents two
people that know each other. A lot of the semantics (that is, the meaning) of a graph is encoded in the
relationship types of the application. And although relationships are directed, they can be traversed
equally well in either direction.
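To make that point concrete, here is a minimal plain-Java sketch (a stand-in model, not the Neo4j API; the names are illustrative): each directed KNOWS relationship is stored exactly once, yet it answers queries from either end.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration (not the Neo4j API): each directed KNOWS
// relationship is stored exactly once, yet is readable from both ends.
public class DirectionDemo
{
    // each entry is { startNode, endNode }
    static final List<String[]> KNOWS = new ArrayList<>();
    static
    {
        KNOWS.add( new String[]{ "Alice", "Bob" } );
    }

    // follow the relationship in its stored direction
    static List<String> outgoing( String person )
    {
        List<String> result = new ArrayList<>();
        for ( String[] rel : KNOWS )
            if ( rel[0].equals( person ) ) result.add( rel[1] );
        return result;
    }

    // follow the same stored relationship against its direction
    static List<String> incoming( String person )
    {
        List<String> result = new ArrayList<>();
        for ( String[] rel : KNOWS )
            if ( rel[1].equals( person ) ) result.add( rel[0] );
        return result;
    }

    public static void main( String[] args )
    {
        System.out.println( outgoing( "Alice" ) ); // whom Alice knows
        System.out.println( incoming( "Bob" ) );   // who knows Bob
    }
}
```

Both queries are answered from the single stored relationship; no reverse edge needs to be created.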
Tip
The source code of this example is found here: EmbeddedNeo4j.java13
Prepare the database
Relationship types can be created by using an enum. In this example we only need a single relationship
type. This is how to define it:
private static enum RelTypes implements RelationshipType
{
KNOWS
}
We also prepare some variables to use:
GraphDatabaseService graphDb;
Node firstNode;
Node secondNode;
Relationship relationship;
The next step is to start the database server. Note that if the directory given for the database doesn’t
already exist, it will be created.
graphDb = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH );
registerShutdownHook( graphDb );
Note that starting a database server is an expensive operation, so don’t start up a new instance
every time you need to interact with the database! The instance can be shared by multiple threads.
Transactions are thread confined.
As seen, we register a shutdown hook that will make sure the database shuts down when the JVM exits.
Now it’s time to interact with the database.
Wrap operations in a transaction
All operations have to be performed in a transaction. This is a conscious design decision, since we
believe transaction demarcation to be an important part of working with a real enterprise database.
Now, transaction handling in Neo4j is very easy:
try ( Transaction tx = graphDb.beginTx() )
13 https://github.com/neo4j/neo4j/blob/2.3.12/community/embedded-examples/src/main/java/org/neo4j/examples/
EmbeddedNeo4j.java
{
// Database operations go here
tx.success();
}
For more information on transactions, see Chapter 18, Transaction Management [285] and Java API for
Transaction14.
Note
For brevity, we do not spell out wrapping of operations in a transaction throughout the
manual.
Create a small graph
Now, let’s create a few nodes. The API is very intuitive. Feel free to have a look at the Neo4j Javadocs15.
They’re included in the distribution, as well. Here’s how to create a small graph consisting of two nodes,
connected with one relationship and some properties:
firstNode = graphDb.createNode();
firstNode.setProperty( "message", "Hello, " );
secondNode = graphDb.createNode();
secondNode.setProperty( "message", "World!" );
relationship = firstNode.createRelationshipTo( secondNode, RelTypes.KNOWS );
relationship.setProperty( "message", "brave Neo4j " );
We now have a graph that looks like this:
Figure 33.1. Hello World Graph: (message = 'Hello, ') --[KNOWS {message = 'brave Neo4j '}]--> (message = 'World!')
Print the result
After we’ve created our graph, let’s read from it and print the result.
System.out.print( firstNode.getProperty( "message" ) );
System.out.print( relationship.getProperty( "message" ) );
System.out.print( secondNode.getProperty( "message" ) );
Which will output:
Hello, brave Neo4j World!
Remove the data
In this case we’ll remove the data before committing:
// let's remove the data
firstNode.getSingleRelationship( RelTypes.KNOWS, Direction.OUTGOING ).delete();
14 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/Transaction.html
15 http://neo4j.com/docs/2.3.12/javadocs/
firstNode.delete();
secondNode.delete();
Note that deleting a node which still has relationships when the transaction commits will fail. This is to
make sure relationships always have a start node and an end node.
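A plain-Java sketch of that invariant (a stand-in model, not the Neo4j API) shows why the example deletes the relationship first: deletion of a node is refused while any relationship still refers to it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-in model (not the Neo4j API) of the commit-time rule that a
// node may only be deleted once no relationship refers to it.
public class DeleteOrderDemo
{
    static final Set<String> nodes = new HashSet<>();
    static final List<String[]> rels = new ArrayList<>(); // { start, end }

    static boolean canDelete( String node )
    {
        for ( String[] rel : rels )
            if ( rel[0].equals( node ) || rel[1].equals( node ) ) return false;
        return true;
    }

    static void deleteNode( String node )
    {
        if ( !canDelete( node ) )
            throw new IllegalStateException( node + " still has relationships" );
        nodes.remove( node );
    }

    public static void main( String[] args )
    {
        nodes.add( "first" );
        nodes.add( "second" );
        rels.add( new String[]{ "first", "second" } );

        System.out.println( canDelete( "first" ) ); // false: relationship present
        rels.clear();                               // delete the relationship first
        deleteNode( "first" );                      // now succeeds
        deleteNode( "second" );
        System.out.println( nodes.isEmpty() );      // true
    }
}
```

Deleting the relationship before either node, as in the example above, keeps the invariant satisfied at commit time.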
Shut down the database server
Finally, shut down the database server when the application finishes:
graphDb.shutdown();
33.3. Property values
Both nodes and relationships can have properties.
Properties are named values where the name is a string. Property values can be either a primitive or an
array of one primitive type. For example String, int and int[] values are valid for properties.
NULL is not a valid property value.
NULLs can instead be modeled by the absence of a key.
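A small stand-in (a plain Map, not the Neo4j API) shows the pattern: "no value" is expressed by never setting the key, and reads supply a default for missing properties.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in: "no value" is the absence of the key, never a
// stored null; reads supply a default for missing properties.
public class AbsentKeyDemo
{
    static final Map<String, Object> properties = new HashMap<>();

    static Object getProperty( String key, Object defaultValue )
    {
        return properties.containsKey( key ) ? properties.get( key ) : defaultValue;
    }

    public static void main( String[] args )
    {
        properties.put( "name", "Alice" ); // a property that is present
        // "age" is simply never set; nothing null is stored for it
        System.out.println( getProperty( "name", "unknown" ) );
        System.out.println( getProperty( "age", "unknown" ) );
    }
}
```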
Property value types
boolean: true/false
byte: 8-bit integer; -128 to 127, inclusive
short: 16-bit integer; -32768 to 32767, inclusive
int: 32-bit integer; -2147483648 to 2147483647, inclusive
long: 64-bit integer; -9223372036854775808 to 9223372036854775807, inclusive
float: 32-bit IEEE 754 floating-point number
double: 64-bit IEEE 754 floating-point number
char: 16-bit unsigned integer representing a Unicode character; \u0000 to \uffff (0 to 65535)
String: sequence of Unicode characters
For further details on float/double values, see Java Language Specification16.
16 http://docs.oracle.com/javase/specs/jls/se5.0/html/typesValues.html#4.2.3
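The integral ranges in the table are exactly the bounds of the corresponding Java primitives, so they can be checked directly against the standard library constants:

```java
// The integral ranges in the table above are the standard Java primitive
// bounds, checkable via the java.lang wrapper-class constants.
public class PropertyRanges
{
    static String range( String type )
    {
        switch ( type )
        {
        case "byte":
            return Byte.MIN_VALUE + " to " + Byte.MAX_VALUE;
        case "short":
            return Short.MIN_VALUE + " to " + Short.MAX_VALUE;
        case "int":
            return Integer.MIN_VALUE + " to " + Integer.MAX_VALUE;
        case "long":
            return Long.MIN_VALUE + " to " + Long.MAX_VALUE;
        case "char":
            // char is unsigned; print the numeric bounds
            return (int) Character.MIN_VALUE + " to " + (int) Character.MAX_VALUE;
        default:
            return "n/a";
        }
    }

    public static void main( String[] args )
    {
        for ( String type : new String[]{ "byte", "short", "int", "long", "char" } )
            System.out.println( type + ": " + range( type ) );
    }
}
```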
33.4. User database with indexes
You have a user database, and want to retrieve users by name using indexes.
Tip
The source code used in this example is found here:
EmbeddedNeo4jWithNewIndexing.java17
To begin with, we start the database server:
GraphDatabaseService graphDb = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH );
Then we have to configure the database to index users by name. This only needs to be done once.
Note
Schema changes and data changes are not allowed in the same transaction. Each
transaction must either change the schema or the data, but not both.
IndexDefinition indexDefinition;
try ( Transaction tx = graphDb.beginTx() )
{
Schema schema = graphDb.schema();
indexDefinition = schema.indexFor( DynamicLabel.label( "User" ) )
.on( "username" )
.create();
tx.success();
}
Indexes are populated asynchronously when they are first created. It is possible to use the core API to
wait for index population to complete:
try ( Transaction tx = graphDb.beginTx() )
{
Schema schema = graphDb.schema();
schema.awaitIndexOnline( indexDefinition, 10, TimeUnit.SECONDS );
}
It’s time to add the users:
try ( Transaction tx = graphDb.beginTx() )
{
Label label = DynamicLabel.label( "User" );
// Create some users
for ( int id = 0; id < 100; id++ )
{
Node userNode = graphDb.createNode( label );
userNode.setProperty( "username", "user" + id + "@neo4j.org" );
}
System.out.println( "Users created" );
tx.success();
}
Note
Please read Section33.6, “Managing resources when using long running
transactions” [589] on how to properly close ResourceIterators returned from index
lookups.
17 https://github.com/neo4j/neo4j/blob/2.3.12/community/embedded-examples/src/main/java/org/neo4j/examples/
EmbeddedNeo4jWithNewIndexing.java
And here’s how to find a user by id:
Label label = DynamicLabel.label( "User" );
int idToFind = 45;
String nameToFind = "user" + idToFind + "@neo4j.org";
try ( Transaction tx = graphDb.beginTx() )
{
try ( ResourceIterator<Node> users =
graphDb.findNodes( label, "username", nameToFind ) )
{
ArrayList<Node> userNodes = new ArrayList<>();
while ( users.hasNext() )
{
userNodes.add( users.next() );
}
for ( Node node : userNodes )
{
System.out.println( "The username of user " + idToFind + " is " + node.getProperty( "username" ) );
}
}
}
When updating the name of a user, the index is updated as well:
try ( Transaction tx = graphDb.beginTx() )
{
Label label = DynamicLabel.label( "User" );
int idToFind = 45;
String nameToFind = "user" + idToFind + "@neo4j.org";
for ( Node node : loop( graphDb.findNodes( label, "username", nameToFind ) ) )
{
node.setProperty( "username", "user" + ( idToFind + 1 ) + "@neo4j.org" );
}
tx.success();
}
When deleting a user, it is automatically removed from the index:
try ( Transaction tx = graphDb.beginTx() )
{
Label label = DynamicLabel.label( "User" );
int idToFind = 46;
String nameToFind = "user" + idToFind + "@neo4j.org";
for ( Node node : loop( graphDb.findNodes( label, "username", nameToFind ) ) )
{
node.delete();
}
tx.success();
}
In case we change our data model, we can drop the index as well:
try ( Transaction tx = graphDb.beginTx() )
{
Label label = DynamicLabel.label( "User" );
for ( IndexDefinition indexDefinition : graphDb.schema()
.getIndexes( label ) )
{
// There is only one index
indexDefinition.drop();
}
tx.success();
}
33.5. User database with legacy index
Unless you have specific reasons to use the legacy indexing, see Section 33.4, “User database with
indexes” [585] instead.
Note
Please read Section33.6, “Managing resources when using long running
transactions” [589] on how to properly close ResourceIterators returned from index
lookups.
You have a user database, and want to retrieve users by name using the legacy indexing system.
Tip
The source code used in this example is found here: EmbeddedNeo4jWithIndexing.java18
We have created two helper methods to handle user names and adding users to the database:
private static String idToUserName( final int id )
{
return "user" + id + "@neo4j.org";
}
private static Node createAndIndexUser( final String username )
{
Node node = graphDb.createNode();
node.setProperty( USERNAME_KEY, username );
nodeIndex.add( node, USERNAME_KEY, username );
return node;
}
The next step is to start the database server:
graphDb = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH );
registerShutdownHook();
It’s time to add the users:
try ( Transaction tx = graphDb.beginTx() )
{
nodeIndex = graphDb.index().forNodes( "nodes" );
// Create some users and index their names with the IndexService
for ( int id = 0; id < 100; id++ )
{
createAndIndexUser( idToUserName( id ) );
}
And here’s how to find a user by id:
int idToFind = 45;
String userName = idToUserName( idToFind );
Node foundUser = nodeIndex.get( USERNAME_KEY, userName ).getSingle();
System.out.println( "The username of user " + idToFind + " is "
+ foundUser.getProperty( USERNAME_KEY ) );
18 https://github.com/neo4j/neo4j/blob/2.3.12/community/embedded-examples/src/main/java/org/neo4j/examples/
EmbeddedNeo4jWithIndexing.java
33.6. Managing resources when using long running transactions
It is always necessary to open a transaction when accessing the database. Inside a long running
transaction it is good practice to ensure that any ResourceIterators19 obtained inside the transaction are
closed as early as possible. This is achieved either by exhausting the iterator or by explicitly calling
its close() method.
What follows is an example of how to work with a ResourceIterator. As we don’t exhaust the iterator, we
will close it explicitly using the close() method.
Label label = DynamicLabel.label( "User" );
int idToFind = 45;
String nameToFind = "user" + idToFind + "@neo4j.org";
try ( Transaction tx = graphDb.beginTx();
ResourceIterator<Node> users = graphDb.findNodes( label, "username", nameToFind ) )
{
Node firstUserNode;
if ( users.hasNext() )
{
firstUserNode = users.next();
}
users.close();
}
19 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/ResourceIterator.html
33.7. Controlling logging
To control logging in Neo4j embedded, use the Neo4j embedded logging framework.
Neo4j embedded provides logging via its own org.neo4j.logging.Log20 layer, and does not natively use
any existing Java logging framework. All logging events produced by Neo4j have a name, a level and a
message. The name is a FQCN (fully qualified class name).
Neo4j uses the following log levels:
ERROR: For serious errors that are almost always fatal.
WARN: For events that are serious, but not fatal.
INFO: Informational events.
DEBUG: Debugging events.
To enable logging, an implementation of org.neo4j.logging.LogProvider21 must be provided to the
GraphDatabaseFactory22, as follows:
LogProvider logProvider = new MyCustomLogProvider( output );
graphDb = new GraphDatabaseFactory().setUserLogProvider( logProvider ).newEmbeddedDatabase( DB_PATH );
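What such a MyCustomLogProvider might look like, sketched with locally declared stand-ins for the org.neo4j.logging interfaces so the example compiles on its own (the real interfaces carry more methods than shown here):

```java
// Sketch of a custom log provider. Log and LogProvider are local
// stand-ins for the org.neo4j.logging interfaces (which have more
// methods), declared here so this example is self-contained.
interface Log
{
    void info( String message );
    void warn( String message );
}

interface LogProvider
{
    Log getLog( Class<?> loggingClass );
}

public class MyCustomLogProvider implements LogProvider
{
    @Override
    public Log getLog( final Class<?> loggingClass )
    {
        return new Log()
        {
            @Override
            public void info( String message )
            {
                System.out.println( format( "INFO", loggingClass, message ) );
            }

            @Override
            public void warn( String message )
            {
                System.out.println( format( "WARN", loggingClass, message ) );
            }
        };
    }

    // one line per event: level, the FQCN that logged it, the message
    static String format( String level, Class<?> loggingClass, String message )
    {
        return level + " [" + loggingClass.getName() + "] " + message;
    }

    public static void main( String[] args )
    {
        new MyCustomLogProvider().getLog( String.class ).info( "database started" );
    }
}
```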
Neo4j also includes a binding for SLF4J, which is available in the neo4j-slf4j library jar. This can be
obtained via Maven:
<project>
...
<dependencies>
<dependency>
<groupId>org.neo4j</groupId>
<artifactId>neo4j-slf4j</artifactId>
<version>2.3.12</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</dependency>
...
</dependencies>
...
</project>
To use this binding, simply pass an instance of org.neo4j.logging.slf4j.Slf4jLogProvider23 to the
GraphDatabaseFactory24, as follows:
graphDb = new GraphDatabaseFactory().setUserLogProvider( new Slf4jLogProvider() ).newEmbeddedDatabase( DB_PATH );
All log output can then be controlled via SLF4J configuration.
20 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/logging/Log.html
21 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/logging/LogProvider.html
22 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/factory/GraphDatabaseFactory.html
23 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/logging/slf4j/Slf4jLogProvider.html
24 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/factory/GraphDatabaseFactory.html
33.8. Basic unit testing
The basic pattern of unit testing with Neo4j is illustrated by the following example.
To access the Neo4j testing facilities you should have the neo4j-kernel tests.jar together with the
neo4j-io tests.jar on the classpath during tests. You can download them from Maven Central:
org.neo4j:neo4j-kernel25 and org.neo4j:neo4j-io26.
Using Maven as a dependency manager you would typically add this dependency together with JUnit
and Hamcrest like so:
Maven dependency
<project>
...
<dependencies>
<dependency>
<groupId>org.neo4j</groupId>
<artifactId>neo4j-kernel</artifactId>
<version>2.3.12</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.neo4j</groupId>
<artifactId>neo4j-io</artifactId>
<version>2.3.12</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.hamcrest</groupId>
<artifactId>hamcrest-all</artifactId>
<version>1.3</version>
<scope>test</scope>
</dependency>
...
</dependencies>
...
</project>
Observe that the <type>test-jar</type> is crucial. Without it you would get the common neo4j-kernel
jar, not the one containing the testing facilities.
With that in place, we’re ready to code our tests.
Tip
For the full source code of this example see: Neo4jBasicDocTest.java27
Before each test, create a fresh database:
@Before
25 http://search.maven.org/#search|ga|1|g%3A%22org.neo4j%22%20AND%20a%3A%22neo4j-kernel%22
26 http://search.maven.org/#search|ga|1|g%3A%22org.neo4j%22%20AND%20a%3A%22neo4j-io%22
27 https://github.com/neo4j/neo4j/blob/2.3.12/community/embedded-examples/src/test/java/org/neo4j/examples/
Neo4jBasicDocTest.java
public void prepareTestDatabase()
{
graphDb = new TestGraphDatabaseFactory().newImpermanentDatabase();
}
After the test has executed, the database should be shut down:
@After
public void destroyTestDatabase()
{
graphDb.shutdown();
}
During a test, create nodes and check to see that they are there, while enclosing write operations in a
transaction.
Node n = null;
try ( Transaction tx = graphDb.beginTx() )
{
n = graphDb.createNode();
n.setProperty( "name", "Nancy" );
tx.success();
}
// The node should have a valid id
assertThat( n.getId(), is( greaterThan( -1L ) ) );
// Retrieve a node by using the id of the created node. The id's and
// property should match.
try ( Transaction tx = graphDb.beginTx() )
{
Node foundNode = graphDb.getNodeById( n.getId() );
assertThat( foundNode.getId(), is( n.getId() ) );
assertThat( (String) foundNode.getProperty( "name" ), is( "Nancy" ) );
}
If you want to set configuration parameters at database creation, it’s done like this:
GraphDatabaseService db = new TestGraphDatabaseFactory()
.newImpermanentDatabaseBuilder()
.setConfig( GraphDatabaseSettings.pagecache_memory, "512M" )
.setConfig( GraphDatabaseSettings.string_block_size, "60" )
.setConfig( GraphDatabaseSettings.array_block_size, "300" )
.newGraphDatabase();
33.9. Traversal
For reading about traversals, see Chapter 34, The Traversal Framework [613].
For more examples of traversals, see Chapter 5, Basic Data Modeling Examples [47].
The Matrix
This is the first graph we want to traverse:
Figure33.2.Matrix node space view
Tip
The source code of this example is found here: NewMatrix.java28
Friends and friends of friends
private Traverser getFriends(
final Node person )
{
TraversalDescription td = graphDb.traversalDescription()
.breadthFirst()
.relationships( RelTypes.KNOWS, Direction.OUTGOING )
.evaluator( Evaluators.excludeStartPosition() );
return td.traverse( person );
}
Let’s perform the actual traversal and print the results:
int numberOfFriends = 0;
String output = neoNode.getProperty( "name" ) + "'s friends:\n";
Traverser friendsTraverser = getFriends( neoNode );
for ( Path friendPath : friendsTraverser )
{
output += "At depth " + friendPath.length() + " => "
+ friendPath.endNode()
28 https://github.com/neo4j/neo4j/blob/2.3.12/community/embedded-examples/src/main/java/org/neo4j/examples/
NewMatrix.java
.getProperty( "name" ) + "\n";
numberOfFriends++;
}
output += "Number of friends found: " + numberOfFriends + "\n";
Which will give us the following output:
Thomas Anderson's friends:
At depth 1 => Morpheus
At depth 1 => Trinity
At depth 2 => Cypher
At depth 3 => Agent Smith
Number of friends found: 4
Who coded the Matrix?
private Traverser findHackers( final Node startNode )
{
TraversalDescription td = graphDb.traversalDescription()
.breadthFirst()
.relationships( RelTypes.CODED_BY, Direction.OUTGOING )
.relationships( RelTypes.KNOWS, Direction.OUTGOING )
.evaluator(
Evaluators.includeWhereLastRelationshipTypeIs( RelTypes.CODED_BY ) );
return td.traverse( startNode );
}
Print out the result:
String output = "Hackers:\n";
int numberOfHackers = 0;
Traverser traverser = findHackers( getNeoNode() );
for ( Path hackerPath : traverser )
{
output += "At depth " + hackerPath.length() + " => "
+ hackerPath.endNode()
.getProperty( "name" ) + "\n";
numberOfHackers++;
}
output += "Number of hackers found: " + numberOfHackers + "\n";
Now we know who coded the Matrix:
Hackers:
At depth 4 => The Architect
Number of hackers found: 1
Walking an ordered path
This example shows how to use a path context holding a representation of a path.
Tip
The source code of this example is found here: OrderedPath.java29
Create a toy graph
Node A = db.createNode();
Node B = db.createNode();
Node C = db.createNode();
29 https://github.com/neo4j/neo4j/blob/2.3.12/community/embedded-examples/src/main/java/org/neo4j/examples/orderedpath/
OrderedPath.java
Node D = db.createNode();
A.createRelationshipTo( C, REL2 );
C.createRelationshipTo( D, REL3 );
A.createRelationshipTo( B, REL1 );
B.createRelationshipTo( C, REL2 );
The resulting graph: (A)--[REL1]-->(B), (A)--[REL2]-->(C), (B)--[REL2]-->(C), (C)--[REL3]-->(D).
Now, the order of relationships (REL1, REL2, REL3) is stored in an ArrayList. Upon traversal, the
Evaluator checks against it to ensure that only paths with the predefined order of relationships are
included and returned:
Define how to walk the path
final ArrayList<RelationshipType> orderedPathContext = new ArrayList<RelationshipType>();
orderedPathContext.add( REL1 );
orderedPathContext.add( withName( "REL2" ) );
orderedPathContext.add( withName( "REL3" ) );
TraversalDescription td = db.traversalDescription()
.evaluator( new Evaluator()
{
@Override
public Evaluation evaluate( final Path path )
{
if ( path.length() == 0 )
{
return Evaluation.EXCLUDE_AND_CONTINUE;
}
RelationshipType expectedType = orderedPathContext.get( path.length() - 1 );
boolean isExpectedType = path.lastRelationship()
.isType( expectedType );
boolean included = path.length() == orderedPathContext.size() && isExpectedType;
boolean continued = path.length() < orderedPathContext.size() && isExpectedType;
return Evaluation.of( included, continued );
}
} )
.uniqueness( Uniqueness.NODE_PATH );
Note that we set the uniqueness to Uniqueness.NODE_PATH30 as we want to be able to revisit the same
node during the traversal, but not the same path.
Perform the traversal and print the result
Traverser traverser = td.traverse( A );
PathPrinter pathPrinter = new PathPrinter( "name" );
for ( Path path : traverser )
{
output += Paths.pathToString( path, pathPrinter );
}
Which will output:
(A)--[REL1]-->(B)--[REL2]-->(C)--[REL3]-->(D)
In this case we use a custom class to format the path output. This is how it’s done:
static class PathPrinter implements Paths.PathDescriptor<Path>
{
private final String nodePropertyKey;
public PathPrinter( String nodePropertyKey )
{
this.nodePropertyKey = nodePropertyKey;
}
@Override
public String nodeRepresentation( Path path, Node node )
{
return "(" + node.getProperty( nodePropertyKey, "" ) + ")";
}
@Override
public String relationshipRepresentation( Path path, Node from, Relationship relationship )
{
String prefix = "--", suffix = "--";
if ( from.equals( relationship.getEndNode() ) )
{
prefix = "<--";
}
else
{
suffix = "-->";
}
return prefix + "[" + relationship.getType().name() + "]" + suffix;
}
}
Uniqueness of Paths in traversals
This example demonstrates the use of node uniqueness. Below is an imaginary domain graph with
Principals that own pets that are descendants of other pets.
30 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/Uniqueness.html#NODE_PATH
Figure33.3.Descendants Example Graph
Node[0]
nam e = 'Pet1'
Node[1]
nam e = 'Pet2'
Node[2]
nam e = 'Pet0'
descendant descendant
Node[3]
nam e = 'Pet3'
descendant
Node[4]
nam e = 'Principal1'
owns owns
Node[5]
nam e = 'Principal2'
owns
In order to return all descendants of Pet0 which have the relation owns to Principal1 (Pet1 and Pet3), the
Uniqueness of the traversal needs to be set to NODE_PATH rather than the default NODE_GLOBAL, so that
nodes can be traversed more than once, and paths that have different nodes but share some nodes
in common (like the start and end node) can be returned.
final Node target = data.get().get( "Principal1" );
TraversalDescription td = db.traversalDescription()
.uniqueness( Uniqueness.NODE_PATH )
.evaluator( new Evaluator()
{
@Override
public Evaluation evaluate( Path path )
{
boolean endNodeIsTarget = path.endNode().equals( target );
return Evaluation.of( endNodeIsTarget, !endNodeIsTarget );
}
} );
Traverser results = td.traverse( start );
This will return the following paths:
(2)--[descendant,2]-->(3)<--[owns,5]--(4)
(2)--[descendant,0]-->(0)<--[owns,3]--(4)
In the default path.toString() implementation, (1)--[knows,2]-->(4) denotes a node with ID=1 having a
relationship with ID 2 and type knows to a node with ID=4.
Let’s create a new TraversalDescription from the old one, having NODE_GLOBAL uniqueness to see the
difference.
Tip
The TraversalDescription object is immutable, so we have to use the new instance returned
with the new uniqueness setting.
TraversalDescription nodeGlobalTd = td.uniqueness( Uniqueness.NODE_GLOBAL );
results = nodeGlobalTd.traverse( start );
Now only one path is returned:
(2)--[descendant,2]-->(3)<--[owns,5]--(4)
Social network
Note
The following example uses the new enhanced traversal API.
Social networks (known as social graphs on the web) are natural to model with a graph. This example
shows a very simple social model that connects friends and keeps track of status updates.
Tip
The source code of the example is found here: socnet31
Simple social model
Figure33.4.Social network data model
The data model for a social network is pretty simple: Persons with names and StatusUpdates with
timestamped text. These entities are then connected by specific relationships.
Person
friend: relates two distinct Person instances (no self-reference)
status: connects to the most recent StatusUpdate
StatusUpdate
next: points to the next StatusUpdate in the chain, which was posted before the current one
Status graph instance
The StatusUpdate list for a Person is a linked list. The head of the list (the most recent status) is found by
following status. Each subsequent StatusUpdate is connected by next.
Here's an example where Andreas Kollegger micro-blogged his way to work in the morning:
31 https://github.com/neo4j/neo4j/tree/2.3.12/community/embedded-examples/src/main/java/org/neo4j/examples/socnet
[Figure: Andreas Kollegger's status chain, newest first. His status relationship points to "started designing this graph model" (9:30 am); its next points to "rode my awesome Skeppshult to work" (8:45 am); its next points to "is getting used to muesli for breakfast" (8:00 am).]
To read the status updates, we can create a traversal, like so:
TraversalDescription traversal = graphDb().traversalDescription()
.depthFirst()
.relationships( NEXT );
This gives us a traverser that will start at one StatusUpdate and follow the chain of updates until
they run out. Traversers are lazy, so this performs well even when dealing with thousands of
statuses: they are not loaded until we actually consume them.
Activity stream
Once we have friends, and they have status messages, we might want to read our friends' status
messages in reverse time order, latest first. To do this, we go through these steps:
1. Gather each friend's status update iterator in a list, latest date first.
2. Sort the list.
3. Return the first item in the list.
4. If the first iterator is exhausted, remove it from the list. Otherwise, get the next item in that iterator.
5. Go to step 2 until there are no iterators left in the list.
Animated, the sequence looks like this32.
The code looks like:
PositionedIterator<StatusUpdate> first = statuses.get(0);
StatusUpdate returnVal = first.current();
if ( !first.hasNext() )
{
statuses.remove( 0 );
}
else
{
first.next();
sort();
}
return returnVal;
32 http://www.slideshare.net/systay/pattern-activity-stream
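The code above shows a single step of that algorithm. As a self-contained sketch of the whole merge (plain Java, with integer timestamps standing in for StatusUpdates and a minimal stand-in for the PositionedIterator helper; all names here are illustrative, not the socnet example's API):

```java
import java.util.*;

public class ActivityStreamMerge
{
    // Minimal stand-in for the example's PositionedIterator: an iterator
    // that also exposes the element it is currently positioned at.
    static class Positioned
    {
        private final Iterator<Integer> it;
        private Integer current;

        Positioned( List<Integer> statuses )
        {
            this.it = statuses.iterator();
            this.current = it.next();
        }

        int current() { return current; }

        boolean advance()
        {
            if ( !it.hasNext() ) { return false; }
            current = it.next();
            return true;
        }
    }

    // Steps 1-5 from the text: keep the per-friend iterators sorted by
    // their current (newest) element, and repeatedly emit the head.
    static List<Integer> merge( List<List<Integer>> perFriend )
    {
        List<Positioned> statuses = new ArrayList<>();
        for ( List<Integer> s : perFriend )
        {
            if ( !s.isEmpty() ) { statuses.add( new Positioned( s ) ); }
        }
        List<Integer> stream = new ArrayList<>();
        while ( !statuses.isEmpty() )
        {
            // Sort latest (largest timestamp) first.
            statuses.sort( ( a, b ) -> Integer.compare( b.current(), a.current() ) );
            Positioned first = statuses.get( 0 );
            stream.add( first.current() );
            if ( !first.advance() )
            {
                statuses.remove( 0 ); // this iterator is exhausted
            }
        }
        return stream;
    }

    public static void main( String[] args )
    {
        // Timestamps as plain ints, newest first per friend.
        List<Integer> merged = merge( Arrays.asList(
                Arrays.asList( 9, 5, 1 ),
                Arrays.asList( 8, 7, 2 ) ) );
        System.out.println( merged ); // [9, 8, 7, 5, 2, 1]
    }
}
```

Sorting the list on every step, as in the example, is simple; a PriorityQueue keyed on each iterator's current element would avoid the repeated sort.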
33.10.Domain entities
This page demonstrates one way to handle domain entities when using Neo4j. The principle in use is to
wrap each entity around a node (the same approach can be used with relationships as well).
Tip
The source code of the examples is found here: Person.java33
First off, store the node and make it accessible inside the package:
private final Node underlyingNode;
Person( Node personNode )
{
this.underlyingNode = personNode;
}
protected Node getUnderlyingNode()
{
return underlyingNode;
}
Delegate attributes to the node:
public String getName()
{
return (String)underlyingNode.getProperty( NAME );
}
Make sure to override these methods:
@Override
public int hashCode()
{
return underlyingNode.hashCode();
}
@Override
public boolean equals( Object o )
{
return o instanceof Person &&
underlyingNode.equals( ( (Person)o ).getUnderlyingNode() );
}
@Override
public String toString()
{
return "Person[" + getName() + "]";
}
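The effect of delegating equals() and hashCode() to the underlying node is that two wrapper instances around the same node are interchangeable, for example as keys in hash-based collections. A minimal self-contained sketch (using a plain Object in place of a Neo4j Node; the names are illustrative):

```java
public class EntityWrapperDemo
{
    static class Person
    {
        private final Object underlyingNode;

        Person( Object node ) { this.underlyingNode = node; }

        Object getUnderlyingNode() { return underlyingNode; }

        // Delegate identity to the wrapped node, as in the example above.
        @Override
        public int hashCode() { return underlyingNode.hashCode(); }

        @Override
        public boolean equals( Object o )
        {
            return o instanceof Person
                   && underlyingNode.equals( ( (Person) o ).getUnderlyingNode() );
        }
    }

    public static void main( String[] args )
    {
        Object node = new Object();
        Person a = new Person( node );
        Person b = new Person( node );
        System.out.println( a.equals( b ) );                // true: same node
        System.out.println( a.hashCode() == b.hashCode() ); // true
    }
}
```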
33 https://github.com/neo4j/neo4j/blob/2.3.12/community/embedded-examples/src/main/java/org/neo4j/examples/socnet/Person.java
33.11.Graph Algorithm examples
For details on the graph algorithm usage, see the Javadocs34.
Tip
The source code used in the example is found here: PathFindingDocTest.java35
Calculating the shortest path (least number of relationships) between two nodes:
Node startNode = graphDb.createNode();
Node middleNode1 = graphDb.createNode();
Node middleNode2 = graphDb.createNode();
Node middleNode3 = graphDb.createNode();
Node endNode = graphDb.createNode();
createRelationshipsBetween( startNode, middleNode1, endNode );
createRelationshipsBetween( startNode, middleNode2, middleNode3, endNode );
// Will find the shortest path between startNode and endNode via
// "MY_TYPE" relationships (in OUTGOING direction), like f.ex:
//
// (startNode)-->(middleNode1)-->(endNode)
//
PathFinder<Path> finder = GraphAlgoFactory.shortestPath(
PathExpanders.forTypeAndDirection( ExampleTypes.MY_TYPE, Direction.OUTGOING ), 15 );
Iterable<Path> paths = finder.findAllPaths( startNode, endNode );
Using Dijkstra's algorithm36 to calculate the cheapest path between node A and B, where each relationship
can have a weight (i.e. cost), and the path(s) with the least cost are found:
PathFinder<WeightedPath> finder = GraphAlgoFactory.dijkstra(
PathExpanders.forTypeAndDirection( ExampleTypes.MY_TYPE, Direction.BOTH ), "cost" );
WeightedPath path = finder.findSinglePath( nodeA, nodeB );
// Get the weight for the found path
path.weight();
Using A*37 to calculate the cheapest path between node A and B, where cheapest is, for example, the
path in a network of roads which has the shortest length between node A and B. Here's our example
graph:
34 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphalgo/GraphAlgoFactory.html
35 https://github.com/neo4j/neo4j/blob/2.3.12/community/embedded-examples/src/test/java/org/neo4j/examples/PathFindingDocTest.java
36 http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
37 http://en.wikipedia.org/wiki/A*_search_algorithm
Node nodeA = createNode( "name", "A", "x", 0d, "y", 0d );
Node nodeB = createNode( "name", "B", "x", 7d, "y", 0d );
Node nodeC = createNode( "name", "C", "x", 2d, "y", 1d );
Relationship relAC = createRelationship( nodeA, nodeC, "length", 2d );
Relationship relCB = createRelationship( nodeC, nodeB, "length", 3d );
Relationship relAB = createRelationship( nodeA, nodeB, "length", 10d );
EstimateEvaluator<Double> estimateEvaluator = new EstimateEvaluator<Double>()
{
@Override
public Double getCost( final Node node, final Node goal )
{
double dx = (Double) node.getProperty( "x" ) - (Double) goal.getProperty( "x" );
double dy = (Double) node.getProperty( "y" ) - (Double) goal.getProperty( "y" );
double result = Math.sqrt( Math.pow( dx, 2 ) + Math.pow( dy, 2 ) );
return result;
}
};
PathFinder<WeightedPath> astar = GraphAlgoFactory.aStar(
PathExpanders.allTypesAndDirections(),
CommonEvaluators.doubleCostEvaluator( "length" ), estimateEvaluator );
WeightedPath path = astar.findSinglePath( nodeA, nodeB );
33.12.Reading a management attribute
The JmxUtils38 class includes methods to access Neo4j management beans. The common JMX service
can be used as well, but from your code you would probably rather use the approach outlined here.
Tip
The source code of the example is found here: JmxDocTest.java39
This example shows how to get the start time of a database:
private static Date getStartTimeFromManagementBean(
GraphDatabaseService graphDbService )
{
ObjectName objectName = JmxUtils.getObjectName( graphDbService, "Kernel" );
Date date = JmxUtils.getAttribute( objectName, "KernelStartTime" );
return date;
}
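For comparison, the "common JMX service" route mentioned above goes through the platform MBeanServer directly. A sketch using only the JDK (reading the standard java.lang:type=Runtime bean here; a Neo4j kernel bean would instead be looked up by its own ObjectName):

```java
import java.lang.management.ManagementFactory;
import java.util.Date;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class PlainJmxExample
{
    static long runtimeStartTime() throws Exception
    {
        // Query the platform MBean server directly, without JmxUtils.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName runtime = new ObjectName( "java.lang", "type", "Runtime" );
        return (Long) server.getAttribute( runtime, "StartTime" );
    }

    public static void main( String[] args ) throws Exception
    {
        System.out.println( "JVM started at " + new Date( runtimeStartTime() ) );
    }
}
```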
Depending on which Neo4j edition you are using, different sets of management beans are available.
For all editions, see the org.neo4j.jmx40 package.
For the Enterprise edition, see the org.neo4j.management41 package as well.
38 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/jmx/JmxUtils.html
39 https://github.com/neo4j/neo4j/blob/2.3.12/community/embedded-examples/src/test/java/org/neo4j/examples/JmxDocTest.java
40 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/jmx/package-summary.html
41 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/management/package-summary.html
33.13.How to create unique nodes
This section is about how to ensure uniqueness of a property when creating nodes. For an overview of
the topic, see Section18.6, “Creating unique nodes” [293].
Get or create unique node using Cypher and unique constraints
Create a unique constraint
try ( Transaction tx = graphdb.beginTx() )
{
graphdb.schema()
.constraintFor( DynamicLabel.label( "User" ) )
.assertPropertyIsUnique( "name" )
.create();
tx.success();
}
Use MERGE to create a unique node
Node result = null;
ResourceIterator<Node> resultIterator = null;
try ( Transaction tx = graphDb.beginTx() )
{
String queryString = "MERGE (n:User {name: {name}}) RETURN n";
Map<String, Object> parameters = new HashMap<>();
parameters.put( "name", username );
resultIterator = graphDb.execute( queryString, parameters ).columnAs( "n" );
result = resultIterator.next();
tx.success();
return result;
}
Get or create unique node using a legacy index
Important
While this is a working solution, please consider using the preferred solution at the section
called “Get or create unique node using Cypher and unique constraints” [605] instead.
By using put-if-absent42 functionality, entity uniqueness can be guaranteed using an index.
Here the index acts as the lock and will only lock the smallest part needed to guarantee uniqueness
across threads and transactions. To get the higher-level get-or-create functionality, make use of
UniqueFactory43 as seen in the example below.
Create a factory for unique nodes at application start
try ( Transaction tx = graphDb.beginTx() )
{
UniqueFactory.UniqueNodeFactory result = new UniqueFactory.UniqueNodeFactory( graphDb, "users" )
{
@Override
protected void initialize( Node created, Map<String, Object> properties )
{
created.addLabel( DynamicLabel.label( "User" ) );
created.setProperty( "name", properties.get( "name" ) );
}
};
tx.success();
return result;
}
42 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/index/Index.html#putIfAbsent%28T,%20java.lang.String,%20java.lang.Object%29
43 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/index/UniqueFactory.html
Use the unique node factory to get or create a node
try ( Transaction tx = graphDb.beginTx() )
{
Node node = factory.getOrCreate( "name", username );
tx.success();
return node;
}
Pessimistic locking for node creation
Important
While this is a working solution, please consider using the preferred solution at the section
called “Get or create unique node using Cypher and unique constraints” [605] instead.
One might be tempted to use Java synchronization for pessimistic locking, but this is dangerous. By
mixing locks in Neo4j and in the Java runtime, it is easy to produce deadlocks that are not detectable
by Neo4j. As long as all locking is done by Neo4j, all deadlocks will be detected and avoided. Also, a
solution using manual synchronization doesn’t ensure uniqueness in an HA environment.
This example uses a single “lock node” for locking. We create it only as a place to put locks, nothing else.
Create a lock node at application start
try ( Transaction tx = graphDb.beginTx() )
{
final Node lockNode = graphDb.createNode();
tx.success();
return lockNode;
}
Use the lock node to ensure nodes are not created concurrently
try ( Transaction tx = graphDb.beginTx() )
{
Index<Node> usersIndex = graphDb.index().forNodes( "users" );
Node userNode = usersIndex.get( "name", username ).getSingle();
if ( userNode != null )
{
return userNode;
}
tx.acquireWriteLock( lockNode );
userNode = usersIndex.get( "name", username ).getSingle();
if ( userNode == null )
{
userNode = graphDb.createNode( DynamicLabel.label( "User" ) );
usersIndex.add( userNode, "name", username );
userNode.setProperty( "name", username );
}
tx.success();
return userNode;
}
Note that finishing the transaction will release the lock on the lock node.
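The shape of that example is the classic check-lock-re-check pattern. Stripped of Neo4j (so the warning above about mixing JVM locks with Neo4j locks does not apply, since there are no database locks at all here), the control flow can be sketched with plain Java; all names are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class GetOrCreateDemo
{
    private static final Map<String, Object> users = new ConcurrentHashMap<>();
    // Plays the role of the "lock node" in the example above.
    private static final ReentrantLock lock = new ReentrantLock();

    static Object getOrCreateUser( String username )
    {
        // First check without the lock: the common, cheap path.
        Object user = users.get( username );
        if ( user != null )
        {
            return user;
        }
        lock.lock();
        try
        {
            // Re-check under the lock: another thread may have won the race
            // between our first lookup and acquiring the lock.
            user = users.get( username );
            if ( user == null )
            {
                user = new Object();
                users.put( username, user );
            }
            return user;
        }
        finally
        {
            lock.unlock(); // mirrors the lock release at transaction finish
        }
    }

    public static void main( String[] args )
    {
        Object first = getOrCreateUser( "alice" );
        Object second = getOrCreateUser( "alice" );
        System.out.println( first == second ); // true: only one instance per name
    }
}
```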
33.14.Terminating a running transaction
Sometimes you may want to terminate (abort) a long-running transaction from another thread.
Tip
The source code used in this example is found here: TerminateTransactions.java44
To begin with, we start the database server:
GraphDatabaseService graphDb = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH );
Now we start creating an infinite binary tree of nodes in the database, as an example of a long running
transaction.
RelationshipType relType = DynamicRelationshipType.withName( "CHILD" );
Queue<Node> nodes = new LinkedList<>();
int depth = 1;
try ( Transaction tx = graphDb.beginTx() )
{
Node rootNode = graphDb.createNode();
nodes.add( rootNode );
for (; true; depth++) {
int nodesToExpand = nodes.size();
for (int i = 0; i < nodesToExpand; ++i) {
Node parent = nodes.remove();
Node left = graphDb.createNode();
Node right = graphDb.createNode();
parent.createRelationshipTo( left, relType );
parent.createRelationshipTo( right, relType );
nodes.add( left );
nodes.add( right );
}
}
}
catch ( TransactionTerminatedException ignored )
{
return String.format( "Created tree up to depth %s in 1 sec", depth );
}
After waiting for some time, we decide to terminate the transaction. This is done from a separate
thread.
tx.terminate();
Running this will execute the long-running transaction for about one second and print the maximum
depth of the tree that was created before the transaction was terminated. No changes are actually
made to the data, because the transaction has been terminated; the end result is as if no operations
had been performed.
Example output
Created tree up to depth 15 in 1 sec
44 https://github.com/neo4j/neo4j/blob/2.3.12/community/embedded-examples/src/main/java/org/neo4j/examples/TerminateTransactions.java
Finally, let's shut down the database again.
graphDb.shutdown();
33.15.Execute Cypher Queries from Java
Tip
The full source code of the example: JavaQuery.java45
In Java, you can use the Cypher query language as per the example below. First, let’s add some data.
GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH );
try ( Transaction tx = db.beginTx())
{
Node myNode = db.createNode();
myNode.setProperty( "name", "my node" );
tx.success();
}
Execute a query:
try ( Transaction ignored = db.beginTx();
Result result = db.execute( "match (n {name: 'my node'}) return n, n.name" ) )
{
while ( result.hasNext() )
{
Map<String,Object> row = result.next();
for ( Entry<String,Object> column : row.entrySet() )
{
rows += column.getKey() + ": " + column.getValue() + "; ";
}
rows += "\n";
}
}
In the above example, we also show how to iterate over the rows of the Result46.
The code will generate:
n.name: my node; n: Node[0];
Caution
When using a Result, you should consume the entire result (iterate over all rows using
next(), iterate over the iterator from columnAs(), or call for example resultAsString()).
Failing to do so will not properly clean up resources used by the Result object, leading to
unwanted behavior, such as leaking transactions. In case you don't want to iterate over all
of the results, make sure to invoke close() as soon as you are done, to release the resources
tied to the result.
Tip
Using a try-with-resources statement47 will make sure that the result is closed at the end of
the statement. This is the recommended way to handle results.
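The guarantee that try-with-resources gives can be sketched with any AutoCloseable; here a hypothetical stand-in for a result shows that close() runs even when we stop consuming early:

```java
public class TryWithResourcesDemo
{
    // Hypothetical stand-in for a Result: just records whether close() ran.
    static class FakeResult implements AutoCloseable
    {
        boolean closed;

        @Override
        public void close()
        {
            closed = true;
        }
    }

    static boolean demo()
    {
        FakeResult result = new FakeResult();
        try ( FakeResult r = result )
        {
            // Consume only part of the result, then leave the block early;
            // close() still runs automatically on exit.
        }
        return result.closed;
    }

    public static void main( String[] args )
    {
        System.out.println( demo() ); // true
    }
}
```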
You can also get a list of the columns in the result like this:
List<String> columns = result.columns();
45 https://github.com/neo4j/neo4j/blob/2.3.12/community/cypher/docs/cypher-docs/src/test/java/org/neo4j/cypher/example/JavaQuery.java
46 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/Result.html
47 http://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html
This gives us:
[n, n.name]
To fetch the result items from a single column, do as below. In this case we'll have to read the property
from the node and not from the result.
Iterator<Node> n_column = result.columnAs( "n" );
for ( Node node : IteratorUtil.asIterable( n_column ) )
{
nodeResult = node + ": " + node.getProperty( "name" );
}
In this case there's only one node in the result:
Node[0]: my node
Only use this if the result only contains a single column, or you are only interested in a single column of
the result.
Note
resultAsString(), writeAsStringTo() and columnAs() cannot be called more than once on the
same Result object, as they consume the result. In the same way, part of the result gets
consumed for every call to next(). Use only one of these; if you need the facilities of the
other methods on the same query result, create a new Result instead.
For more information on the Java interface to Cypher, see the Java API48.
For more information and examples for Cypher, see PartIII, “Cypher Query Language” [102] and
Chapter5, Basic Data Modeling Examples [47].
48 http://neo4j.com/docs/2.3.12/javadocs/index.html
33.16.Query Parameters
For more information on parameters see Section8.5, “Parameters” [113].
Below follow examples of how to use parameters when executing Cypher queries from Java.
Node id
Map<String, Object> params = new HashMap<String, Object>();
params.put( "id", 0 );
String query = "MATCH n WHERE id(n) = {id} RETURN n.name";
Result result = db.execute( query, params );
Node object
Map<String, Object> params = new HashMap<String, Object>();
params.put( "node", andreasNode );
String query = "MATCH n WHERE n = {node} RETURN n.name";
Result result = db.execute( query, params );
Multiple node ids
Map<String, Object> params = new HashMap<String, Object>();
params.put( "ids", Arrays.asList( 0, 1, 2 ) );
String query = "MATCH n WHERE id(n) in {ids} RETURN n.name";
Result result = db.execute( query, params );
String literal
Map<String, Object> params = new HashMap<String, Object>();
params.put( "name", "Johan" );
String query = "MATCH (n) WHERE n.name = {name} RETURN n";
Result result = db.execute( query, params );
Index value
Map<String, Object> params = new HashMap<String, Object>();
params.put( "value", "Michaela" );
String query = "START n=node:people(name = {value}) RETURN n";
Result result = db.execute( query, params );
Index query
Map<String, Object> params = new HashMap<String, Object>();
params.put( "query", "name:Andreas" );
String query = "START n=node:people({query}) RETURN n";
Result result = db.execute( query, params );
Numeric parameters for SKIP and LIMIT
Map<String, Object> params = new HashMap<String, Object>();
params.put( "s", 1 );
params.put( "l", 1 );
String query = "MATCH (n) RETURN n.name SKIP {s} LIMIT {l}";
Result result = db.execute( query, params );
Regular expression
Map<String, Object> params = new HashMap<String, Object>();
params.put( "regex", ".*h.*" );
String query = "MATCH (n) WHERE n.name =~ {regex} RETURN n.name";
Result result = db.execute( query, params );
Create node with properties
Map<String, Object> props = new HashMap<String, Object>();
props.put( "name", "Andres" );
props.put( "position", "Developer" );
Map<String, Object> params = new HashMap<String, Object>();
params.put( "props", props );
String query = "CREATE ({props})";
db.execute( query, params );
Create multiple nodes with properties
Map<String, Object> n1 = new HashMap<String, Object>();
n1.put( "name", "Andres" );
n1.put( "position", "Developer" );
n1.put( "awesome", true );
Map<String, Object> n2 = new HashMap<String, Object>();
n2.put( "name", "Michael" );
n2.put( "position", "Developer" );
n2.put( "children", 3 );
Map<String, Object> params = new HashMap<String, Object>();
List<Map<String, Object>> maps = Arrays.asList( n1, n2 );
params.put( "props", maps );
String query = "CREATE (n:Person {props}) RETURN n";
db.execute( query, params );
Setting all properties on node
Map<String, Object> n1 = new HashMap<>();
n1.put( "name", "Andres" );
n1.put( "position", "Developer" );
Map<String, Object> params = new HashMap<>();
params.put( "props", n1 );
String query = "MATCH (n) WHERE n.name='Michaela' SET n = {props}";
db.execute( query, params );
Chapter34.The Traversal Framework
The Neo4j Traversal API1 is a callback-based, lazily executed way of specifying desired movements
through a graph in Java. Some traversal examples are collected under Section33.9, “Traversal” [593].
You can also use The Cypher Query Language as a powerful declarative way to query the graph.
1 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/package-summary.html
34.1.Main concepts
Here follows a short explanation of the different concepts that can modify or add to a traversal
description.
Path expanders: define what to traverse, typically in terms of relationship direction and type.
Order: for example depth-first or breadth-first.
Uniqueness: visit nodes (relationships, paths) only once.
Evaluator: decides what to return and whether to stop or continue the traversal beyond the current
position.
Starting nodes: where the traversal will begin.
[Figure: a Traversal Description applies a Uniqueness (avoid duplicates: unique nodes, unique relationships, unique paths, or none), an Evaluator (return and prune/continue policy: include/exclude), an Order (where to go next: depth first or breadth first) and a PathExpander (what to traverse: relationship type and direction), starts from a node, and spawns a Traverser whose results can be read as Paths, Nodes or Relationships.]
See Section34.2, “Traversal Framework Java API” [615] for more details.
34.2.Traversal Framework Java API
The traversal framework consists of a few main interfaces in addition to Node and Relationship:
TraversalDescription, Evaluator, Traverser and Uniqueness are the main ones. The Path interface
also has a special purpose in traversals, since it is used to represent a position in the graph when
evaluating that position. Furthermore the PathExpander (replacing RelationshipExpander and Expander)
interface is central to traversals, but users of the API rarely need to implement it. There is also a set of
interfaces for advanced use, when explicit control over the traversal order is required: BranchSelector,
BranchOrderingPolicy and TraversalBranch.
TraversalDescription
The TraversalDescription2 is the main interface used for defining and initializing traversals. It is
not meant to be implemented by users of the traversal framework, but rather to be provided
by the implementation of the traversal framework as a way for the user to describe traversals.
TraversalDescription instances are immutable; each of its methods returns a new TraversalDescription,
modified relative to the object the method was invoked on according to the arguments of the method.
Relationships
Adds a relationship type to the list of relationship types to traverse. By default that list is empty, which
means that all relationships will be traversed, regardless of type. If one or more relationship types are
added to this list, only the added types will be traversed. There are two methods, one including direction3 and
another one excluding direction4, where the latter traverses relationships in both directions5.
Evaluator
Evaluator6s are used for deciding, at each position (represented as a Path): should the traversal
continue, and/or should the node be included in the result. Given a Path, it asks for one of four actions
for that branch of the traversal:
Evaluation.INCLUDE_AND_CONTINUE: Include this node in the result and continue the traversal
Evaluation.INCLUDE_AND_PRUNE: Include this node in the result, but don’t continue the traversal
Evaluation.EXCLUDE_AND_CONTINUE: Exclude this node from the result, but continue the traversal
Evaluation.EXCLUDE_AND_PRUNE: Exclude this node from the result and don’t continue the traversal
More than one evaluator can be added. Note that evaluators will be called for all positions the traverser
encounters, even for the start node.
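The two independent decisions (include? continue?) multiply out to the four constants. A stand-in enum (not the Neo4j class, though the real Evaluation.of(boolean, boolean) used in the earlier examples behaves the same way) makes the mapping explicit:

```java
public class EvaluationDemo
{
    enum Evaluation
    {
        INCLUDE_AND_CONTINUE, INCLUDE_AND_PRUNE,
        EXCLUDE_AND_CONTINUE, EXCLUDE_AND_PRUNE;

        // Mirrors Evaluation.of(includes, continues) from the traversal API:
        // the two booleans select one of the four constants.
        static Evaluation of( boolean includes, boolean continues )
        {
            return includes
                   ? ( continues ? INCLUDE_AND_CONTINUE : INCLUDE_AND_PRUNE )
                   : ( continues ? EXCLUDE_AND_CONTINUE : EXCLUDE_AND_PRUNE );
        }
    }

    public static void main( String[] args )
    {
        System.out.println( Evaluation.of( true, false ) ); // INCLUDE_AND_PRUNE
        System.out.println( Evaluation.of( false, true ) ); // EXCLUDE_AND_CONTINUE
    }
}
```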
Traverser
The Traverser7 object is the result of invoking traverse()8 of a TraversalDescription object. It represents
a traversal positioned in the graph, and a specification of the format of the result. The actual traversal is
performed lazily each time the next()-method of the iterator of the Traverser is invoked.
Uniqueness
Sets the rules for how positions can be revisited during a traversal, as stated in Uniqueness9. The default,
if not set, is NODE_GLOBAL10.
2 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/TraversalDescription.html
3 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/TraversalDescription.html#relationships
4 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/TraversalDescription.html#relationships
5 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/Direction.html#BOTH
6 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/Evaluator.html
7 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/Traverser.html
8 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/TraversalDescription.html#traverse(org.neo4j.graphdb.Node)
9 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/Uniqueness.html
10 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/Uniqueness.html#NODE_GLOBAL
A Uniqueness can be supplied to the TraversalDescription to dictate under what circumstances a
traversal may revisit the same position in the graph. The various uniqueness levels that can be used in
Neo4j are:
NONE: Any position in the graph may be revisited.
NODE_GLOBAL uniqueness: No node in the entire graph may be visited more than once. This could
potentially consume a lot of memory since it requires keeping an in-memory data structure
remembering all the visited nodes.
RELATIONSHIP_GLOBAL uniqueness: No relationship in the entire graph may be visited more than once.
For the same reasons as NODE_GLOBAL uniqueness, this could use up a lot of memory. But since graphs
typically have a larger number of relationships than nodes, the memory overhead of this uniqueness
level could grow even more quickly.
NODE_PATH uniqueness: A node may not occur previously in the path reaching up to it.
RELATIONSHIP_PATH uniqueness: A relationship may not occur previously in the path reaching up to it.
NODE_RECENT uniqueness: Similar to NODE_GLOBAL uniqueness in that there is a global collection of visited
nodes that each position is checked against. This uniqueness level does, however, have a cap on how
much memory it may consume, in the form of a collection that only contains the most recently visited
nodes. The size of this collection can be specified by providing a number as the second argument to
the TraversalDescription.uniqueness() method, along with the uniqueness level.
RELATIONSHIP_RECENT uniqueness: Works like NODE_RECENT uniqueness, but with relationships instead of
nodes.
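The practical difference between the path and global levels can be sketched without Neo4j at all. In this illustrative path count over a diamond graph (S to A to T, S to B to T), a per-path visited set (NODE_PATH-like) finds both paths to T, while a shared global set (NODE_GLOBAL-like) prunes the second one, just as in the pet example earlier:

```java
import java.util.*;

public class UniquenessSketch
{
    static final Map<String, List<String>> EDGES = new HashMap<>();
    static
    {
        EDGES.put( "S", Arrays.asList( "A", "B" ) );
        EDGES.put( "A", Arrays.asList( "T" ) );
        EDGES.put( "B", Arrays.asList( "T" ) );
    }

    // Count paths from node to target. With global=true the visited set is
    // shared across all branches (NODE_GLOBAL-like); with global=false each
    // branch gets its own copy (NODE_PATH-like), so different paths may
    // share interior nodes.
    static int countPaths( String node, String target, Set<String> visited, boolean global )
    {
        if ( !visited.add( node ) )
        {
            return 0; // the uniqueness constraint prunes this branch
        }
        int found = node.equals( target ) ? 1 : 0;
        for ( String next : EDGES.getOrDefault( node, Collections.<String>emptyList() ) )
        {
            found += countPaths( next, target,
                    global ? visited : new HashSet<>( visited ), global );
        }
        return found;
    }

    public static void main( String[] args )
    {
        System.out.println( countPaths( "S", "T", new HashSet<String>(), false ) ); // 2
        System.out.println( countPaths( "S", "T", new HashSet<String>(), true ) );  // 1
    }
}
```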
Depth First / Breadth First
These are convenience methods for setting preorder depth-first11/breadth-first12
BranchSelector/ordering policies. The same result can be achieved by calling the order13
method with ordering policies from BranchOrderingPolicies14, or by writing your own
BranchSelector/BranchOrderingPolicy and passing it in.
Order: how to move through branches
A more generic version of the depthFirst/breadthFirst methods, in that it allows an arbitrary
BranchOrderingPolicy15 to be injected into the description.
BranchSelector
A BranchSelector/BranchOrderingPolicy is used for selecting which branch of the traversal to attempt
next. This is used for implementing traversal orderings. The traversal framework provides a few basic
ordering implementations:
BranchOrderingPolicies.PREORDER_DEPTH_FIRST: Traversing depth first, visiting each node before visiting
its child nodes.
BranchOrderingPolicies.POSTORDER_DEPTH_FIRST: Traversing depth first, visiting each node after visiting
its child nodes.
BranchOrderingPolicies.PREORDER_BREADTH_FIRST: Traversing breadth first, visiting each node before
visiting its child nodes.
BranchOrderingPolicies.POSTORDER_BREADTH_FIRST: Traversing breadth first, visiting each node after
visiting its child nodes.
Note
Please note that breadth first traversals have a higher memory overhead than depth first
traversals.
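The first and third of those policies can be sketched with plain collections: a stack yields preorder depth-first order, a queue yields breadth-first order (this also makes the memory note concrete: the queue holds a whole frontier, the stack only one branch plus pending siblings). Names here are illustrative:

```java
import java.util.*;

public class OrderingSketch
{
    static final Map<String, List<String>> CHILDREN = new HashMap<>();
    static
    {
        CHILDREN.put( "A", Arrays.asList( "B", "C" ) );
        CHILDREN.put( "B", Arrays.asList( "D", "E" ) );
    }

    static List<String> preorderDepthFirst( String root )
    {
        List<String> visited = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push( root );
        while ( !stack.isEmpty() )
        {
            String node = stack.pop();
            visited.add( node ); // visit the node before its children
            List<String> kids = CHILDREN.getOrDefault( node, Collections.<String>emptyList() );
            for ( int i = kids.size() - 1; i >= 0; i-- )
            {
                stack.push( kids.get( i ) ); // push right-to-left so the left child is expanded first
            }
        }
        return visited;
    }

    static List<String> breadthFirst( String root )
    {
        List<String> visited = new ArrayList<>();
        Queue<String> queue = new ArrayDeque<>();
        queue.add( root );
        while ( !queue.isEmpty() )
        {
            String node = queue.poll();
            visited.add( node ); // visit level by level
            queue.addAll( CHILDREN.getOrDefault( node, Collections.<String>emptyList() ) );
        }
        return visited;
    }

    public static void main( String[] args )
    {
        System.out.println( preorderDepthFirst( "A" ) ); // [A, B, D, E, C]
        System.out.println( breadthFirst( "A" ) );       // [A, B, C, D, E]
    }
}
```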
11 http://en.wikipedia.org/wiki/Depth-first_search
12 http://en.wikipedia.org/wiki/Breadth-first_search
13 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/TraversalDescription.html#order
14 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/BranchOrderingPolicies.html
15 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/BranchOrderingPolicy.html
A BranchSelector carries state and hence needs to be uniquely instantiated for each traversal. Therefore
it is supplied to the TraversalDescription through the BranchOrderingPolicy interface, which is a factory of
BranchSelector instances.
A user of the traversal framework rarely needs to implement their own BranchSelector or
BranchOrderingPolicy; this extension point exists to let graph algorithm implementors provide their own
traversal orders. The Neo4j Graph Algorithms package contains, for example, a BestFirst order
BranchSelector/BranchOrderingPolicy that is used in best-first search algorithms such as A* and Dijkstra.
BranchOrderingPolicy
A factory for creating BranchSelectors to decide in what order branches are returned (where a branch’s
position is represented as a Path16 from the start node to the current node). Common policies are
depth-first17 and breadth-first18, which is why there are convenience methods for those. For
example, calling TraversalDescription#depthFirst()19 is equivalent to:
description.order( BranchOrderingPolicies.PREORDER_DEPTH_FIRST );
TraversalBranch
An object used by the BranchSelector to get more branches from a certain branch. In essence these
are a composite of a Path and a RelationshipExpander that can be used to get new TraversalBranches20
from the current one.
Path
A Path21 is a general interface that is part of the Neo4j API. In the traversal API of Neo4j the use of Paths
is twofold. Traversers can return their results in the form of the Paths of the visited positions in the
graph that are marked for being returned. Path objects are also used in the evaluation of positions in
the graph, for determining if the traversal should continue from a certain point or not, and whether a
certain position should be included in the result set or not.
PathExpander/RelationshipExpander
The traversal framework uses PathExpanders (replacing RelationshipExpander) to discover the relationships
that should be followed from a particular path to further branches in the traversal.
Expander
A more generic version of relationships where a RelationshipExpander is injected, defining all
relationships to be traversed for any given node.
The Expander interface is an extension of the RelationshipExpander interface that makes it possible to
build customized versions of an Expander. The implementation of TraversalDescription uses this to
provide methods for defining which relationship types to traverse; this is the usual way a user of the
API defines a RelationshipExpander, by building it internally in the TraversalDescription.
All the RelationshipExpanders provided by the Neo4j traversal framework also implement the
Expander interface. For a user of the traversal API it is easier to implement the PathExpander/
RelationshipExpander interface, since it only contains one method: the method for getting the
relationships from a path/node. The methods that the Expander interface adds are just for building new
Expanders.
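The role of an expander, deciding which relationships to follow from a given position, can be sketched in plain Java. The Rel and Expander types below are illustrative stand-ins, not the Neo4j interfaces:

```java
import java.util.*;
import java.util.stream.Collectors;

// Illustrative sketch: these types mimic the role of a PathExpander,
// they are not the Neo4j interfaces.
class ExpanderDemo
{
    static final class Rel
    {
        final String type;
        final String to;
        Rel( String type, String to ) { this.type = type; this.to = to; }
    }

    interface Expander
    {
        List<Rel> expand( String node );
    }

    // An expander restricted to one relationship type, analogous in spirit
    // to TraversalDescription.relationships( Rels.KNOWS ).
    static Expander onlyType( Map<String, List<Rel>> graph, String type )
    {
        return node -> graph.getOrDefault( node, Collections.<Rel>emptyList() )
                .stream()
                .filter( r -> r.type.equals( type ) )
                .collect( Collectors.toList() );
    }
}
```

The traversal repeatedly asks the expander which relationships lead out of the current position; everything the expander filters away simply never becomes a branch.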
16 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/Path.html
17 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/TraversalDescription.html#depthFirst()
18 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/TraversalDescription.html#breadthFirst()
19 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/TraversalDescription.html#depthFirst()
20 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/TraversalBranch.html
21 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/Path.html
How to use the Traversal framework
A traversal description22 is built using a fluent interface and such a description can then spawn
traversers23.
Figure34.1.Traversal Example Graph
Node[0]
nam e = 'Lars'
Node[5]
nam e = 'Dirk'
KNOWS
Node[4]
nam e = 'Peter'
KNOWS
Node[1]
nam e = 'Sara'
Node[2]
nam e = 'Ed'
KNOWS
Node[3]
nam e = 'Lisa'
KNOWS
Node[6]
nam e = 'Joe'
LIKES
KNOWS
KNOWS
With the definition of the RelationshipTypes as
private enum Rels implements RelationshipType
{
LIKES, KNOWS
}
The graph can be traversed with, for example, the following traverser, starting at the “Joe” node:
for ( Path position : db.traversalDescription()
.depthFirst()
.relationships( Rels.KNOWS )
.relationships( Rels.LIKES, Direction.INCOMING )
.evaluator( Evaluators.toDepth( 5 ) )
.traverse( node ) )
{
output += position + "\n";
}
22 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/TraversalDescription.html
23 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/Traverser.html
The traversal will output:
(6)
(6)<--[LIKES,1]--(3)
(6)<--[LIKES,1]--(3)--[KNOWS,6]-->(0)
(6)<--[LIKES,1]--(3)--[KNOWS,6]-->(0)<--[KNOWS,5]--(2)
(6)<--[LIKES,1]--(3)--[KNOWS,6]-->(0)--[KNOWS,4]-->(5)
(6)<--[LIKES,1]--(3)--[KNOWS,6]-->(0)--[KNOWS,4]-->(5)--[KNOWS,3]-->(4)
(6)<--[LIKES,1]--(3)--[KNOWS,6]-->(0)--[KNOWS,4]-->(5)--[KNOWS,3]-->(4)--[KNOWS,2]-->(1)
Since TraversalDescriptions24 are immutable it is also useful to create template descriptions which
hold common settings shared by different traversals. For example, let’s start with this traverser:
friendsTraversal = db.traversalDescription()
.depthFirst()
.relationships( Rels.KNOWS )
.uniqueness( Uniqueness.RELATIONSHIP_GLOBAL );
This traverser would yield the following output (we will keep starting from the “Joe” node):
(6)
(6)--[KNOWS,0]-->(1)
(6)--[KNOWS,0]-->(1)<--[KNOWS,2]--(4)
(6)--[KNOWS,0]-->(1)<--[KNOWS,2]--(4)<--[KNOWS,3]--(5)
(6)--[KNOWS,0]-->(1)<--[KNOWS,2]--(4)<--[KNOWS,3]--(5)<--[KNOWS,4]--(0)
(6)--[KNOWS,0]-->(1)<--[KNOWS,2]--(4)<--[KNOWS,3]--(5)<--[KNOWS,4]--(0)<--[KNOWS,6]--(3)
(6)--[KNOWS,0]-->(1)<--[KNOWS,2]--(4)<--[KNOWS,3]--(5)<--[KNOWS,4]--(0)<--[KNOWS,5]--(2)
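The safety of sharing such a template comes from a copy-on-configure pattern: every configuration call returns a new instance instead of mutating the receiver. A minimal plain-Java sketch of that pattern (illustrative, not the actual TraversalDescription implementation):

```java
import java.util.*;

// Illustrative immutable description: each 'with' call copies the state,
// so a shared template is never mutated.
final class Template
{
    private final List<String> settings;

    Template() { this( Collections.<String>emptyList() ); }

    private Template( List<String> settings )
    {
        this.settings = Collections.unmodifiableList( new ArrayList<>( settings ) );
    }

    Template with( String setting )
    {
        List<String> copy = new ArrayList<>( settings );
        copy.add( setting );
        return new Template( copy );
    }

    List<String> settings() { return settings; }
}
```

In the same way, calling a configuration method such as evaluator(...) on friendsTraversal returns a new description and leaves the template untouched, so it can be reused for further traversals.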
Now let’s create a new traverser from it, restricting depth to three:
for ( Path path : friendsTraversal
.evaluator( Evaluators.toDepth( 3 ) )
.traverse( node ) )
{
output += path + "\n";
}
This will give us the following result:
(6)
(6)--[KNOWS,0]-->(1)
(6)--[KNOWS,0]-->(1)<--[KNOWS,2]--(4)
(6)--[KNOWS,0]-->(1)<--[KNOWS,2]--(4)<--[KNOWS,3]--(5)
Or how about from depth two to four? That’s done like this:
for ( Path path : friendsTraversal
.evaluator( Evaluators.fromDepth( 2 ) )
.evaluator( Evaluators.toDepth( 4 ) )
.traverse( node ) )
{
output += path + "\n";
}
This traversal gives us:
(6)--[KNOWS,0]-->(1)<--[KNOWS,2]--(4)
(6)--[KNOWS,0]-->(1)<--[KNOWS,2]--(4)<--[KNOWS,3]--(5)
(6)--[KNOWS,0]-->(1)<--[KNOWS,2]--(4)<--[KNOWS,3]--(5)<--[KNOWS,4]--(0)
For various useful evaluators, see the Evaluators25 Java API or simply implement the Evaluator26
interface yourself.
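The depth-window behavior produced by combining Evaluators.fromDepth and Evaluators.toDepth can be sketched in plain Java. The Evaluation constants below mirror the names of the Neo4j enum, but the class itself is an illustrative re-implementation, not the library code:

```java
// Illustrative sketch of what combining Evaluators.fromDepth( from ) and
// Evaluators.toDepth( to ) decides for a path at a given depth.
class DepthWindow
{
    enum Evaluation { INCLUDE_AND_CONTINUE, INCLUDE_AND_PRUNE, EXCLUDE_AND_CONTINUE, EXCLUDE_AND_PRUNE }

    static Evaluation evaluate( int depth, int from, int to )
    {
        boolean include = depth >= from && depth <= to; // inside the window?
        boolean expand = depth < to;                    // worth going deeper?
        if ( include )
        {
            return expand ? Evaluation.INCLUDE_AND_CONTINUE : Evaluation.INCLUDE_AND_PRUNE;
        }
        return expand ? Evaluation.EXCLUDE_AND_CONTINUE : Evaluation.EXCLUDE_AND_PRUNE;
    }
}
```

Note the two independent decisions an evaluator makes: whether to include the current path in the result, and whether to keep expanding beyond it.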
24 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/TraversalDescription.html
25 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/Evaluators.html
26 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/Evaluator.html
If you’re not interested in the Paths27, but the Nodes28, you can transform the traverser into an iterable of
nodes29 like this:
for ( Node currentNode : friendsTraversal
.traverse( node )
.nodes() )
{
output += currentNode.getProperty( "name" ) + "\n";
}
In this case we use it to retrieve the names:
Joe
Sara
Peter
Dirk
Lars
Lisa
Ed
Relationships30 are fine as well; here’s how to get them:
for ( Relationship relationship : friendsTraversal
.traverse( node )
.relationships() )
{
output += relationship.getType().name() + "\n";
}
Here the relationship type names are printed, and we get:
KNOWS
KNOWS
KNOWS
KNOWS
KNOWS
KNOWS
Tip
The source code for the traversers in this example is available at: TraversalExample.java31
27 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/Path.html
28 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/Node.html
29 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/Traverser.html#nodes()
30 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/traversal/Traverser.html#relationships()
31 https://github.com/neo4j/neo4j/blob/2.3.12/community/embedded-examples/src/main/java/org/neo4j/examples/
TraversalExample.java
Chapter35.Legacy Indexing
Note
This is not the same as indexes defined in the schema; the documentation below covers the
legacy indexing in Neo4j.
This chapter focuses on how to use the manual indexes. As of Neo4j 2.0, this is not the favored method
of indexing data in Neo4j; instead we recommend defining indexes in the database schema.
However, support for legacy indexes remains, because certain features, such as uniqueness
constraints, are not yet handled by the new indexes.
Legacy Indexing
622
35.1. Introduction
Legacy Indexing operations are part of the Neo4j index API1.
Each index is tied to a unique, user-specified name (for example "first_name" or "books") and can index
either nodes2 or relationships3.
The default index implementation is provided by the neo4j-lucene-index component, which is included
in the standard Neo4j download. It can also be downloaded separately from http://repo1.maven.org/
maven2/org/neo4j/neo4j-lucene-index/ . For Maven users, the neo4j-lucene-index component has the
coordinates org.neo4j:neo4j-lucene-index and should be used with the same version of org.neo4j:neo4j-
kernel. Different versions of the index and kernel components are not compatible in the general case.
Both components are included transitively by the org.neo4j:neo4j:pom artifact which makes it simple to
keep the versions in sync.
For initial import of data using indexes, see Section 36.2, “Index Batch Insertion” [639].
Note
All modifying index operations must be performed inside a transaction, as with any
modifying operation in Neo4j.
1 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/index/package-summary.html
2 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/Node.html
3 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/Relationship.html
35.2. Create
An index is created if it doesn’t exist when you ask for it. Unless you give it a custom configuration, it
will be created with default configuration and backend.
To set the stage for our examples, let’s create some indexes to begin with:
IndexManager index = graphDb.index();
Index<Node> actors = index.forNodes( "actors" );
Index<Node> movies = index.forNodes( "movies" );
RelationshipIndex roles = index.forRelationships( "roles" );
This will create two node indexes and one relationship index with default configuration. See
Section 35.8, “Relationship indexes” [631] for more information specific to relationship indexes.
See Section 35.10, “Configuration and fulltext indexes” [633] for how to create fulltext indexes.
You can also check if an index exists like this:
IndexManager index = graphDb.index();
boolean indexExists = index.existsForNodes( "actors" );
35.3. Delete
Indexes can be deleted. When deleting, the entire contents of the index will be removed as well as its
associated configuration. An index can be created with the same name at a later point in time.
IndexManager index = graphDb.index();
Index<Node> actors = index.forNodes( "actors" );
actors.delete();
Note that the actual deletion of the index is made during the commit of the surrounding transaction.
Calls made to such an index instance after delete()4 has been called are invalid inside that transaction
as well as outside (if the transaction is successful), but will become valid again if the transaction is rolled
back.
4 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/index/Index.html#delete%28%29
35.4. Add
Each index supports associating any number of key-value pairs with any number of entities (nodes or
relationships), where each association between entity and key-value pair is performed individually. To
begin with, let’s add a few nodes to the indexes:
// Actors
Node reeves = graphDb.createNode();
reeves.setProperty( "name", "Keanu Reeves" );
actors.add( reeves, "name", reeves.getProperty( "name" ) );
Node bellucci = graphDb.createNode();
bellucci.setProperty( "name", "Monica Bellucci" );
actors.add( bellucci, "name", bellucci.getProperty( "name" ) );
// multiple values for a field, in this case for search only
// and not stored as a property.
actors.add( bellucci, "name", "La Bellucci" );
// Movies
Node theMatrix = graphDb.createNode();
theMatrix.setProperty( "title", "The Matrix" );
theMatrix.setProperty( "year", 1999 );
movies.add( theMatrix, "title", theMatrix.getProperty( "title" ) );
movies.add( theMatrix, "year", theMatrix.getProperty( "year" ) );
Node theMatrixReloaded = graphDb.createNode();
theMatrixReloaded.setProperty( "title", "The Matrix Reloaded" );
theMatrixReloaded.setProperty( "year", 2003 );
movies.add( theMatrixReloaded, "title", theMatrixReloaded.getProperty( "title" ) );
movies.add( theMatrixReloaded, "year", 2003 );
Node malena = graphDb.createNode();
malena.setProperty( "title", "Malèna" );
malena.setProperty( "year", 2000 );
movies.add( malena, "title", malena.getProperty( "title" ) );
movies.add( malena, "year", malena.getProperty( "year" ) );
Note that there can be multiple values associated with the same entity and key.
Next up, we’ll create relationships and index them as well:
// we need a relationship type
DynamicRelationshipType ACTS_IN = DynamicRelationshipType.withName( "ACTS_IN" );
// create relationships
Relationship role1 = reeves.createRelationshipTo( theMatrix, ACTS_IN );
role1.setProperty( "name", "Neo" );
roles.add( role1, "name", role1.getProperty( "name" ) );
Relationship role2 = reeves.createRelationshipTo( theMatrixReloaded, ACTS_IN );
role2.setProperty( "name", "Neo" );
roles.add( role2, "name", role2.getProperty( "name" ) );
Relationship role3 = bellucci.createRelationshipTo( theMatrixReloaded, ACTS_IN );
role3.setProperty( "name", "Persephone" );
roles.add( role3, "name", role3.getProperty( "name" ) );
Relationship role4 = bellucci.createRelationshipTo( malena, ACTS_IN );
role4.setProperty( "name", "Malèna Scordia" );
roles.add( role4, "name", role4.getProperty( "name" ) );
After these operations, our example graph looks like this:
Figure35.1.Movie and Actor Graph
nam e = 'Keanu Reeves'
title = 'The Matrix Reloaded'
year = 2003
ACTS_IN
nam e = 'Neo'
title = 'The Matrix'
year = 1999
ACTS_IN
nam e = 'Neo'
nam e = 'Monica Bellucci'
ACTS_IN
nam e = 'Persephone'
title = 'Malèna'
year = 2000
ACTS_IN
nam e = 'Malèna Scordia'
35.5. Remove
Removing5 from an index is similar to adding, but can be done by supplying one of the following
combinations of arguments:
• entity
• entity, key
• entity, key, value
// completely remove bellucci from the actors index
actors.remove( bellucci );
// remove any "name" entry of bellucci from the actors index
actors.remove( bellucci, "name" );
// remove the "name" -> "La Bellucci" entry of bellucci
actors.remove( bellucci, "name", "La Bellucci" );
5 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/index/Index.html#remove%28T,%20java.lang.String,
%20java.lang.Object%29
35.6. Update
Important
To update an index entry, the old one must be removed and a new one added. For details on
removing index entries, see Section 35.5, “Remove” [627].
Remember that a node or relationship can be associated with any number of key-value pairs in an
index. This means that you can index a node or relationship with many key-value pairs that have
the same key. In the case where a property value changes and you’d like to update the index, it’s not
enough to just index the new value; you have to remove the old value as well.
Here’s a code example that demonstrates how it’s done:
// create a node with a property
// so we have something to update later on
Node fishburn = graphDb.createNode();
fishburn.setProperty( "name", "Fishburn" );
// index it
actors.add( fishburn, "name", fishburn.getProperty( "name" ) );
// update the index entry
// when the property value changes
actors.remove( fishburn, "name", fishburn.getProperty( "name" ) );
fishburn.setProperty( "name", "Laurence Fishburn" );
actors.add( fishburn, "name", fishburn.getProperty( "name" ) );
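The bookkeeping behind this remove-then-add pattern can be modeled as a multimap in plain Java. This is an illustrative sketch, not the Neo4j index implementation:

```java
import java.util.*;

// Illustrative multimap index: key -> value -> set of entity ids.
class LegacyIndexSketch
{
    private final Map<String, Map<Object, Set<Long>>> entries = new HashMap<>();

    void add( long entity, String key, Object value )
    {
        entries.computeIfAbsent( key, k -> new HashMap<>() )
               .computeIfAbsent( value, v -> new HashSet<>() )
               .add( entity );
    }

    void remove( long entity, String key, Object value )
    {
        Map<Object, Set<Long>> byValue = entries.get( key );
        if ( byValue != null && byValue.containsKey( value ) )
        {
            byValue.get( value ).remove( entity );
        }
    }

    Set<Long> get( String key, Object value )
    {
        Map<Object, Set<Long>> byValue = entries.get( key );
        return byValue == null ? Collections.<Long>emptySet()
                : byValue.getOrDefault( value, Collections.<Long>emptySet() );
    }
}
```

An update is then exactly what the section describes: remove the stale key-value pair for the entity, set the new property value, and add the fresh pair.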
35.7. Search
An index can be searched in two ways, get6 and query7. The get method will return exact matches to
the given key-value pair, whereas query exposes querying capabilities directly from the backend used
by the index. For example, the Lucene query syntax8 can be used directly with the default indexing
backend.
Get
This is how to search for a single exact match:
IndexHits<Node> hits = actors.get( "name", "Keanu Reeves" );
Node reeves = hits.getSingle();
IndexHits9 is an Iterable with some additional useful methods. For example getSingle()10 returns the
first and only item from the result iterator, or null if there isn’t any hit.
Here’s how to get a single relationship by exact matching and retrieve its start and end nodes:
Relationship persephone = roles.get( "name", "Persephone" ).getSingle();
Node actor = persephone.getStartNode();
Node movie = persephone.getEndNode();
Finally, we can iterate over all exact matches from a relationship index:
for ( Relationship role : roles.get( "name", "Neo" ) )
{
// this will give us Reeves twice
Node reeves = role.getStartNode();
}
Important
In case you don’t iterate through all the hits, IndexHits.close()11 must be called explicitly.
Query
There are two query methods: one uses a key-value signature, where the value represents a
query for values with the given key only; the other is more generic and supports querying for
more than one key-value pair in the same query.
Here’s an example using the key-query option:
for ( Node actor : actors.query( "name", "*e*" ) )
{
// This will return Reeves and Bellucci
}
In the following example the query uses multiple keys:
for ( Node movie : movies.query( "title:*Matrix* AND year:1999" ) )
{
// This will return "The Matrix" from 1999 only.
}
Note
Beginning a wildcard search with "*" or "?" is discouraged by Lucene, but will nevertheless
work.
6 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/index/Index.html#get%28java.lang.String,%20java.lang.Object%29
7 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/index/Index.html#query%28java.lang.String,%20java.lang.Object%29
8 http://lucene.apache.org/core/3_6_2/queryparsersyntax.html
9 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/index/IndexHits.html
10 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/index/IndexHits.html#getSingle%28%29
11 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/index/IndexHits.html#close%28%29
Caution
You can’t have any whitespace in the search term with this syntax. See the section called
“Querying with Lucene Query objects” [635] for how to do that.
35.8. Relationship indexes
An index for relationships is just like an index for nodes, extended by providing support to constrain
a search to relationships with specific start and/or end nodes. These extra methods reside in the
RelationshipIndex12 interface, which extends Index<Relationship>13.
Example of querying a relationship index:
// find relationships filtering on start node
// using exact matches
IndexHits<Relationship> reevesAsNeoHits;
reevesAsNeoHits = roles.get( "name", "Neo", reeves, null );
Relationship reevesAsNeo = reevesAsNeoHits.iterator().next();
reevesAsNeoHits.close();
// find relationships filtering on end node
// using a query
IndexHits<Relationship> matrixNeoHits;
matrixNeoHits = roles.query( "name", "*eo", null, theMatrix );
Relationship matrixNeo = matrixNeoHits.iterator().next();
matrixNeoHits.close();
And here’s an example for the special case of searching for a specific relationship type:
// find relationships filtering on end node
// using a relationship type.
// this is how to add it to the index:
roles.add( reevesAsNeo, "type", reevesAsNeo.getType().name() );
// Note that to use a compound query, we can't combine committed
// and uncommitted index entries, so we'll commit before querying:
tx.success();
tx.close();
// and now we can search for it:
try ( Transaction tx = graphDb.beginTx() )
{
 IndexHits<Relationship> typeHits = roles.query( "type:ACTS_IN AND name:Neo", null, theMatrix );
 Relationship typeNeo = typeHits.iterator().next();
 typeHits.close();
}
Such an index can be useful if your domain has nodes with a very large number of relationships
between them, since it reduces the search time for a relationship between two nodes. A good example
where this approach pays dividends is in time series data, where we have readings represented as a
relationship per occurrence.
12 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/index/RelationshipIndex.html
13 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/index/Index.html
35.9. Scores
The IndexHits interface exposes scoring14 so that the index can communicate scores for the hits.
Note that the result is not sorted by the score unless you explicitly specify that. See the section called
“Sorting” [634] for how to sort by score.
IndexHits<Node> hits = movies.query( "title", "The*" );
for ( Node movie : hits )
{
System.out.println( movie.getProperty( "title" ) + " " + hits.currentScore() );
}
14 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/graphdb/index/IndexHits.html#currentScore%28%29
35.10. Configuration and fulltext indexes
At the time of creation, extra configuration can be specified to control the behavior of the index and
which backend to use. For example, to create a Lucene fulltext index:
IndexManager index = graphDb.index();
Index<Node> fulltextMovies = index.forNodes( "movies-fulltext",
MapUtil.stringMap( IndexManager.PROVIDER, "lucene", "type", "fulltext" ) );
fulltextMovies.add( theMatrix, "title", "The Matrix" );
fulltextMovies.add( theMatrixReloaded, "title", "The Matrix Reloaded" );
// search in the fulltext index
Node found = fulltextMovies.query( "title", "reloAdEd" ).getSingle();
Here’s an example of how to create an exact index which is case-insensitive:
Index<Node> index = graphDb.index().forNodes( "exact-case-insensitive",
stringMap( "type", "exact", "to_lower_case", "true" ) );
Node node = graphDb.createNode();
index.add( node, "name", "Thomas Anderson" );
assertContains( index.query( "name", "\"Thomas Anderson\"" ), node );
assertContains( index.query( "name", "\"thoMas ANDerson\"" ), node );
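Conceptually, to_lower_case simply normalizes values on both the add path and the query path. A plain-Java sketch of that idea (illustrative only, not the Lucene backend):

```java
import java.util.*;

// Illustrative case-insensitive exact lookup: values are lower-cased
// both when indexed and when queried.
class CaseInsensitiveSketch
{
    private final Map<String, Set<String>> index = new HashMap<>();

    void add( String key, String value, String entity )
    {
        index.computeIfAbsent( key + ":" + value.toLowerCase(), k -> new HashSet<>() )
             .add( entity );
    }

    Set<String> query( String key, String value )
    {
        return index.getOrDefault( key + ":" + value.toLowerCase(),
                Collections.<String>emptySet() );
    }
}
```

Because both sides are normalized the same way, "Thomas Anderson" and "thoMas ANDerson" resolve to the same index entry.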
Tip
In order to search for tokenized words, the query method has to be used. The get method
will only match the full string value, not the tokens.
The configuration of the index is persisted once the index has been created. The provider configuration
key is interpreted by Neo4j, but any other configuration is passed onto the backend index (e.g. Lucene)
to interpret.
Lucene indexing configuration parameters:
• type (exact, fulltext): exact is the default and uses a Lucene keyword analyzer15; fulltext uses a white-space tokenizer in its analyzer.
• to_lower_case (true, false): this parameter goes together with type: fulltext and converts values to lower case during both additions and querying, making the index case insensitive. Defaults to true.
• analyzer (the full class name of an Analyzer16): overrides the type so that a custom analyzer can be used. Note: to_lower_case still affects lowercasing of string queries. If the custom analyzer uppercases the indexed tokens, string queries will not match as expected.
15 http://lucene.apache.org/core/3_6_2/api/core/org/apache/lucene/analysis/KeywordAnalyzer.html
16 http://lucene.apache.org/core/3_6_2/api/core/org/apache/lucene/analysis/Analyzer.html
35.11. Extra features for Lucene indexes
Numeric ranges
Lucene supports smart indexing of numbers, querying for ranges and sorting such results, and so does
its backend for Neo4j. To mark a value so that it is indexed as a numeric value, we can make use of the
ValueContext17 class, like this:
movies.add( theMatrix, "year-numeric", new ValueContext( 1999 ).indexNumeric() );
movies.add( theMatrixReloaded, "year-numeric", new ValueContext( 2003 ).indexNumeric() );
movies.add( malena, "year-numeric", new ValueContext( 2000 ).indexNumeric() );
int from = 1997;
int to = 1999;
hits = movies.query( QueryContext.numericRange( "year-numeric", from, to ) );
Note
The same type must be used for indexing and querying. That is, you can’t index a value as a
Long and then query the index using an Integer.
By giving null as the from/to argument, an open-ended query is created. In the following example we
do that, and also add sorting to the query:
hits = movies.query(
QueryContext.numericRange( "year-numeric", from, null )
.sortNumeric( "year-numeric", false ) );
The from/to bounds of a range are inclusive by default, but you can change this behavior by using two
extra parameters:
movies.add( theMatrix, "score", new ValueContext( 8.7 ).indexNumeric() );
movies.add( theMatrixReloaded, "score", new ValueContext( 7.1 ).indexNumeric() );
movies.add( malena, "score", new ValueContext( 7.4 ).indexNumeric() );
// include 8.0, exclude 9.0
hits = movies.query( QueryContext.numericRange( "score", 8.0, 9.0, true, false ) );
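The bound semantics, null meaning open-ended plus the inclusive/exclusive flags, can be sketched in plain Java (an illustrative filter, not the Lucene backend):

```java
import java.util.*;
import java.util.stream.Collectors;

// Illustrative numeric-range filter mirroring the semantics described above:
// null bounds are open-ended; the flags choose inclusive or exclusive endpoints.
class RangeSketch
{
    static List<Double> range( List<Double> values, Double from, Double to,
                               boolean includeFrom, boolean includeTo )
    {
        return values.stream()
                .filter( v -> ( from == null || ( includeFrom ? v >= from : v > from ) )
                           && ( to == null || ( includeTo ? v <= to : v < to ) ) )
                .collect( Collectors.toList() );
    }
}
```

With the scores above, a range of [8.0, 9.0) keeps only 8.7, exactly as the inclusive-from/exclusive-to query does.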
Sorting
Lucene performs sorting very well, and that is also exposed in the index backend, through the
QueryContext18 class:
hits = movies.query( "title", new QueryContext( "*" ).sort( "title" ) );
for ( Node hit : hits )
{
// all movies with a title in the index, ordered by title
}
// or
hits = movies.query( new QueryContext( "title:*" ).sort( "year", "title" ) );
for ( Node hit : hits )
{
// all movies with a title in the index, ordered by year, then title
}
We sort the results by relevance (score) like this:
hits = movies.query( "title", new QueryContext( "The*" ).sortByScore() );
for ( Node movie : hits )
{
17 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/index/lucene/ValueContext.html
18 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/index/lucene/QueryContext.html
// hits sorted by relevance (score)
}
Querying with Lucene Query objects
Instead of passing in queries written in the Lucene query syntax, you can instantiate such queries
programmatically and pass them in as an argument, for example:
// a TermQuery will give exact matches
Node actor = actors.query( new TermQuery( new Term( "name", "Keanu Reeves" ) ) ).getSingle();
Note that the TermQuery19 is basically the same thing as using the get method on the index.
This is how to perform wildcard searches using Lucene Query Objects:
hits = movies.query( new WildcardQuery( new Term( "title", "The Matrix*" ) ) );
for ( Node movie : hits )
{
System.out.println( movie.getProperty( "title" ) );
}
Note that this allows for whitespace in the search string.
Compound queries
Lucene supports querying for multiple terms in the same query, like so:
hits = movies.query( "title:*Matrix* AND year:1999" );
Caution
Compound queries cannot search across committed index entries and entries that have not yet
been committed at the same time.
Default operator
The default operator (that is, whether AND or OR is used between different terms) in a query is OR.
Changing that behavior is also done via the QueryContext20 class:
QueryContext query = new QueryContext( "title:*Matrix* year:1999" )
.defaultOperator( Operator.AND );
hits = movies.query( query );
19 http://lucene.apache.org/core/3_6_2/api/core/org/apache/lucene/search/TermQuery.html
20 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/index/lucene/QueryContext.html
Chapter36.Batch Insertion
Neo4j has a batch insertion facility intended for initial imports, which bypasses transactions and other
checks in favor of performance. This is useful when you have a big dataset that needs to be loaded
once.
Batch insertion is included in the neo4j-kernel1 component, which is part of all Neo4j distributions and
editions.
Be aware of the following points when using batch insertion:
• The intended use is for initial import of data, but you can use it on an existing database if the existing
database is shut down first.
• Batch insertion is not thread safe.
• Batch insertion is non-transactional.
• Batch insertion does not enforce constraints on the data while inserting it.
• Batch insertion will re-populate all existing indexes and indexes created during batch insertion on
shutdown.
• Batch insertion will verify all existing constraints and constraints created during batch insertion on
shutdown.
• Unless shutdown is successfully invoked at the end of the import, the database files will be corrupt.
Warning
Always perform batch insertion in a single thread (or use synchronization to make only one
thread at a time access the batch inserter) and invoke shutdown when finished.
Warning
Since batch insertion doesn’t enforce constraints during data loading, if the inserted
data violates any constraint the batch inserter will fail on shutdown and the database will be
inconsistent.
1 http://search.maven.org/#search|ga|1|neo4j-kernel
Batch Insertion
637
36.1. Batch Inserter Examples
Initial import
To bulk load data using the batch inserter you’ll need to write a Java application which makes use of the
low level BatchInserter2 interface.
Tip
You can’t have multiple threads using the batch inserter concurrently without external
synchronization.
You can get hold of an instance of BatchInserter by using BatchInserters3. Here’s an example of the
batch inserter in use:
BatchInserter inserter = null;
try
{
inserter = BatchInserters.inserter(
new File( "target/batchinserter-example" ).getAbsolutePath() );
Label personLabel = DynamicLabel.label( "Person" );
inserter.createDeferredSchemaIndex( personLabel ).on( "name" ).create();
Map<String, Object> properties = new HashMap<>();
properties.put( "name", "Mattias" );
long mattiasNode = inserter.createNode( properties, personLabel );
properties.put( "name", "Chris" );
long chrisNode = inserter.createNode( properties, personLabel );
RelationshipType knows = DynamicRelationshipType.withName( "KNOWS" );
inserter.createRelationship( mattiasNode, chrisNode, knows, null );
}
finally
{
if ( inserter != null )
{
inserter.shutdown();
}
}
When creating a relationship you can set properties on the relationship by passing in a map containing
properties rather than null as the last parameter to createRelationship.
It’s important that the call to shutdown is inside a finally block to ensure that it gets called even if
exceptions are thrown. If the batch inserter isn’t cleanly shut down then the consistency of the store is
not guaranteed.
Tip
The source code for the examples on this page can be found here: BatchInsertDocTest.java4
2 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/unsafe/batchinsert/BatchInserter.html
3 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/unsafe/batchinsert/BatchInserters.html
4 https://github.com/neo4j/neo4j/blob/2.3.12/community/kernel/src/test/java/examples/BatchInsertDocTest.java
Setting configuration options
You can pass custom configuration options to the BatchInserter. (See the section called “Batch insert
example” [463] for information on the available options.) For example:
Map<String, String> config = new HashMap<>();
config.put( "dbms.pagecache.memory", "512m" );
BatchInserter inserter = BatchInserters.inserter(
new File( "target/batchinserter-example-config" ).getAbsolutePath(), config );
// Insert data here ... and then shut down:
inserter.shutdown();
Alternatively you could store the configuration in a file:
batchinsert-config
dbms.pagecache.memory=8m
You can then refer to that file when initializing BatchInserter:
try ( FileReader input = new FileReader( new File( "target/docs/batchinsert-config" ).getAbsoluteFile() ) )
{
Map<String, String> config = MapUtil.load( input );
BatchInserter inserter = BatchInserters.inserter(
"target/docs/batchinserter-example-config", config );
// Insert data here ... and then shut down:
inserter.shutdown();
}
Importing into an existing database
Although it’s a less common use case, the batch inserter can also be used to import data into an
existing database. However, you will need to ensure that the existing database is shut down before you
write to it.
Warning
Since the batch importer bypasses transactions there is a possibility of data inconsistency if
the import process crashes midway. We would strongly suggest you take a backup of your
existing database before using the batch inserter against it.
36.2. Index Batch Insertion
For general notes on batch insertion, see Chapter 36, Batch Insertion [636].
Indexing during batch insertion is done using BatchInserterIndex5 instances, which are provided via a
BatchInserterIndexProvider6. An example:
BatchInserter inserter = BatchInserters.inserter( "target/neo4jdb-batchinsert" );
BatchInserterIndexProvider indexProvider =
new LuceneBatchInserterIndexProvider( inserter );
BatchInserterIndex actors =
indexProvider.nodeIndex( "actors", MapUtil.stringMap( "type", "exact" ) );
actors.setCacheCapacity( "name", 100000 );
Map<String, Object> properties = MapUtil.map( "name", "Keanu Reeves" );
long node = inserter.createNode( properties );
actors.add( node, properties );
//make the changes visible for reading, use this sparsely, requires IO!
actors.flush();
// Make sure to shut down the index provider as well
indexProvider.shutdown();
inserter.shutdown();
The configuration parameters are the same as mentioned in Section 35.10, “Configuration and fulltext
indexes” [633].
Best practices
Here are some pointers to get the most performance out of BatchInserterIndex:
Try to avoid flushing7 too often because each flush will result in all additions (since last flush) to be
visible to the querying methods, and publishing those changes can be a performance penalty.
Have (as big as possible) phases where one phase is either only writes or only reads, and don’t forget
to flush after a write phase so that those changes becomes visible to the querying methods.
Enable caching8 for keys you know you’re going to do lookups for later on to increase performance
significantly (though insertion performance may degrade slightly).
Note
Changes to the index only become available for reading after they have been flushed to
disk. Thus, for optimal performance, read and lookup operations should be kept to a
minimum during batch insertion, since they involve IO and negatively impact speed.
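The phased write/flush/read pattern can be sketched with a schematic stand-in for the buffered index. This is plain Java with illustrative names (BufferedIndex is not the real Neo4j API); it only mirrors the visibility semantics described above, where additions are buffered and only become queryable after flush():

```java
import java.util.*;

// Schematic stand-in for BatchInserterIndex (NOT the real Neo4j API):
// additions are buffered and only become queryable after flush().
class BufferedIndex {
    private final Map<String, List<Long>> visible = new HashMap<>();
    private final Map<String, List<Long>> pending = new HashMap<>();

    void add(long nodeId, String name) {
        pending.computeIfAbsent(name, k -> new ArrayList<>()).add(nodeId);
    }

    void flush() { // publish pending additions; in Neo4j this costs IO
        pending.forEach((k, v) ->
            visible.computeIfAbsent(k, x -> new ArrayList<>()).addAll(v));
        pending.clear();
    }

    List<Long> get(String name) {
        return visible.getOrDefault(name, Collections.emptyList());
    }
}

public class PhasedInsertDemo {
    public static void main(String[] args) {
        BufferedIndex index = new BufferedIndex();
        // Write phase: many additions, no lookups.
        for (long id = 0; id < 5; id++) index.add(id, "name" + id);
        System.out.println(index.get("name3")); // [] -- not flushed yet
        index.flush();                          // one flush after the whole phase
        System.out.println(index.get("name3")); // [3] -- now visible
    }
}
```

Note how a single flush after the write phase makes the whole batch visible at once, instead of paying the publication cost per lookup.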
5 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/unsafe/batchinsert/BatchInserterIndex.html
6 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/unsafe/batchinsert/BatchInserterIndexProvider.html
7 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/unsafe/batchinsert/BatchInserterIndex.html#flush%28%29
8 http://neo4j.com/docs/2.3.12/javadocs/org/neo4j/unsafe/batchinsert/BatchInserterIndex.html#setCacheCapacity
%28java.lang.String,%20int%29
Terminology
The terminology used for Cypher and Neo4j is drawn from the worlds of database design and graph
theory. This section provides cross-linked summaries of common terms.
In some cases, multiple terms (e.g., arc, edge, relationship) may be used for the same or similar
concept. An asterisk (*) to the right of a term indicates that the term is commonly used for Neo4j and
Cypher.
acyclic for a graph or subgraph: when there is no way to start at some node n and
follow a sequence of adjacent relationships that eventually loops back to n
again. The opposite of cyclic.
adjacent nodes sharing an incident (that is, directly-connected) relationship or
relationships sharing an incident node.
aggregating expression expression that summarizes a set of values, like computing their sum or
their maximum.
arc graph theory: a synonym for a directed relationship.
array container that holds a number of elements. The element types can be the
types supported by the underlying graph storage layer, but all elements
must be of the same type.
attribute synonym for property.
clause component of a Cypher query or command; starts with an identifying
keyword (for example CREATE). The following clauses currently exist in
Cypher: CREATE, CREATE UNIQUE, DELETE, FOREACH, LOAD CSV, MATCH, MERGE,
OPTIONAL MATCH, REMOVE, RETURN, SET, START, UNION, and WITH.
co-incident alternative term for adjacent relationships, which share a common node.
collection container that holds a number of values. The values can have mixed
types.
command a statement that operates on the database without affecting the data
graph or returning content from it.
commit successful completion of a transaction, ensuring durability of any changes
made.
constraint part of a database schema: defines a contract that the database will never
break (for example, uniqueness of a property on all nodes that have a
specific label).
cyclic The opposite of acyclic.
Cypher a special-purpose programming language for describing queries and
operations on a graph database, with accompanying natural language
concepts.
DAG a directed, acyclic graph: there are no cyclic paths and all the relationships
are directed.
data graph graph stored in the database. See also property graph.
data record a unit of storage containing an arbitrary unordered collection of
properties.
degree of a node: is the number of relationships leaving or entering (if directed)
the node; loops are counted twice.
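The counting rule, including the loops-count-twice detail, can be illustrated in plain Java. The representation below (relationships as hypothetical (start, end) node-id pairs) is illustrative only, not a Neo4j API:

```java
public class DegreeDemo {
    // Degree of node n = number of relationship endpoints at n,
    // so a loop (n -> n) contributes two.
    static int degree(long n, long[][] relationships) {
        int d = 0;
        for (long[] rel : relationships) {
            if (rel[0] == n) d++; // outgoing endpoint
            if (rel[1] == n) d++; // incoming endpoint
        }
        return d;
    }

    public static void main(String[] args) {
        // One ordinary relationship (1 -> 2) and one loop at node 1.
        long[][] rels = { {1, 2}, {1, 1} };
        System.out.println(degree(1, rels)); // 3: one endpoint from (1,2), two from the loop
    }
}
```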
directed relationship a relationship that has a direction; that is the relationship has a source
node and a destination node. The opposite of an undirected relationship.
All relationships in a Neo4j graph are directed.
edge graph theory: a synonym for undirected relationship.
execution plan parsed and compiled statement that is ready for Neo4j to execute.
An execution plan consists of the physical operations that need to be
performed in order to achieve the intent of the statement.
execution result all statements return an execution result. For queries, this can contain an
iterator of result rows.
expression produces values; may be used in projections, as a predicate, or when
setting properties on graph elements.
graph 1. data graph,
2. property graph,
3. graph theory: set of vertices and edges.
graph database a database that uses graph-based structures (for example, nodes,
relationships, properties) to represent and store data.
graph element node or relationship that is part of a graph.
identifier identifiers are named bindings to values (for example, collections, scalars)
in a statement. For example, in MATCH n RETURN n, n is an identifier.
incident adjacent relationship attached to a node or a node attached to a
relationship.
incoming relationship pertaining to a directed relationship: from the point of view of a node n,
this is any relationship r arriving at n, exemplified by ()-[:r]->(n). The
opposite of outgoing.
index data structure that improves performance of a database by redundantly
storing the same information in a way that is faster to read.
intermediate result set of identifiers and values (record) passed from one clause to another
during query execution. This is internal to the execution of a given query.
label marks a node as a member of a named subset. A node may be assigned
zero or more labels. Labels are written as :label in Cypher (the actual
label is prefixed by a colon). Note: graph theory: This differs from
mathematical graphs, where a label applies uniquely to a single vertex.
loop a relationship that connects a node to itself.
neighbor of node: another node, connected by a common relationship; of
relationship: another relationship, connected to a common node.
node* data record within a data graph; contains an arbitrary collection of
properties. Nodes may have zero, one, or more labels and are optionally
connected by relationships. Similar to vertex.
null NULL is a special marker, used to indicate that a data item does not exist in
the graph or that the value of an expression is unknown or inapplicable.
operator there are three categories of operators in Cypher:
1. Arithmetic, such as +, /, % etc.;
2. Logical, such as OR, AND, NOT etc.; and
3. Comparison, such as <, >, = etc.
outgoing relationship pertaining to a directed relationship: from the point of view of a node n, this
is any relationship r leaving n, exemplified by (n)-[:r]->(). The opposite
of incoming relationship.
parameter named value provided when running a statement. Parameters allow
Cypher to efficiently re-use execution plans without having to parse and
recompile every statement when only a literal value changes.
path collection of alternating nodes and relationships that corresponds to a
walk in the data graph.
pattern graph graph used to express the shape (that is, connectivity pattern) of the
data being searched for in the data graph. This is what MATCH and WHERE
describe in a Cypher query.
predicate expression that returns TRUE, FALSE or NULL. When used in WHERE, NULL is
treated as FALSE.
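The WHERE treatment of NULL can be sketched outside Cypher by modelling a predicate result as a nullable Boolean: TRUE keeps a row, while FALSE and NULL (unknown) both drop it. The class and method names below are illustrative, not a Neo4j API:

```java
import java.util.*;
import java.util.stream.*;

public class PredicateDemo {
    // Simulates Cypher's WHERE semantics: only rows whose predicate
    // evaluated to TRUE survive; FALSE and NULL are both filtered out.
    static List<String> filterRows(Map<String, Boolean> rows) {
        return rows.entrySet().stream()
                .filter(e -> Boolean.TRUE.equals(e.getValue())) // NULL treated as FALSE
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Boolean> rows = new HashMap<>();
        rows.put("a", Boolean.TRUE);
        rows.put("b", Boolean.FALSE);
        rows.put("c", null); // predicate evaluated to NULL
        System.out.println(filterRows(rows)); // only "a" survives
    }
}
```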
projection an operation taking result rows as both input and output data. This may
be a subset of the identifiers provided in the input, a calculation based on
single or multiple identifiers in the input, or both. The relevant clauses are
WITH and RETURN.
property graph a graph having directed, typed relationships. Each node or relationship
may have zero or more associated properties.
property* named value stored in a node or relationship. Synonym for attribute.
query statement that reads data from or writes data to the database.
relationship type marks a relationship as a member of a named subset. A relationship must
be assigned one and only one type. For example, in the Cypher pattern
(start)-[:TYPE]->(to), TYPE is the relationship type.
relationship* data record in a property graph that associates an ordered pair of nodes.
Similar to arc and edge.
result row each query returns an iterator of result rows, which represents the result
of executing the query. Each result row is a set of key-value pairs (a
record).
rollback abort of the containing transaction, effectively undoing any changes
defined inside the transaction.
schema persistent database state that describes available indexes and enabled
constraints for the data graph.
schema command statement that updates the schema.
statement text string containing a Cypher query or command.
transaction A transaction comprises a unit of work performed against a database. It is
treated in a coherent and reliable way, independent of other transactions.
A transaction, by definition, must be atomic, consistent, isolated, and
durable.
transitive closure of a graph: a graph which contains a relationship from node x to node
y whenever there is a directed path from x to y. For example, if there is
a relationship from a to b, and another from b to c, then the transitive
closure includes a relationship from a to c.
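The definition can be computed directly; the sketch below applies Warshall's algorithm to a small adjacency matrix in plain Java (illustrative only, not a Neo4j API):

```java
import java.util.*;

public class TransitiveClosure {
    // Warshall's algorithm: c[x][y] becomes true whenever there is
    // a directed path from x to y in the input graph.
    static boolean[][] closure(boolean[][] adj) {
        int n = adj.length;
        boolean[][] c = new boolean[n][n];
        for (int i = 0; i < n; i++) c[i] = Arrays.copyOf(adj[i], n);
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (c[i][k] && c[k][j]) c[i][j] = true;
        return c;
    }

    public static void main(String[] args) {
        // a -> b -> c, as in the glossary example (indices 0, 1, 2).
        boolean[][] adj = {
            {false, true,  false},  // a -> b
            {false, false, true},   // b -> c
            {false, false, false}
        };
        System.out.println(closure(adj)[0][2]); // true: a -> c is in the closure
    }
}
```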
type types classify values. Each value in Cypher has a concrete type. Supported
types are:
• string,
• boolean,
• the number types (double, integer, long),
• the map types (plain maps, nodes, and relationships),
• and collections of any concrete type.
The type hierarchy supports several other types (for example, any, scalar,
derived map, collection). These are used to classify values and collections
of values having different concrete types.
undirected relationship a relationship that doesn’t have a direction. The opposite of directed
relationship.
vertex graph theory: the fundamental unit used to form a mathematical graph
(plural: vertices). See node.
AppendixA.Resources
• Neo4j Cypher Refcard1.
• Neo4j Javadocs2. You can also download a javadocs.jar file from Maven Central; see
org.neo4j.doc:neo4j-javadocs3 or download it from neo4j-javadocs-2.3.12-javadoc.jar4 directly.
• Neo4j GraphGist, an online tool for creating interactive documents with executable Cypher
statements: http://gist.neo4j.org/.
• The main Neo4j site at http://neo4j.com/ is a good starting point to learn about Neo4j.
• See http://neo4j.com/developer/language-guides/ for how to use Neo4j from different programming
languages.
Below are some starting points within this manual:
Section2.1, “The Neo4j Graph Database” [5]
PartIII, “Cypher Query Language” [102]
Chapter3, Introduction to Cypher [16]
Chapter21, REST API [297]
Chapter23, Installation & Deployment [437]
Section23.4, “Upgrading” [445]
Chapter27, Security [499]
1 http://neo4j.com/docs/2.3.12/cypher-refcard/
2 http://neo4j.com/docs/2.3.12/javadocs/
3 http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.neo4j.doc%22%20AND%20a%3A%22neo4j-javadocs%22
4 http://central.maven.org/maven2/org/neo4j/doc/neo4j-javadocs/2.3.12/neo4j-javadocs-2.3.12-javadoc.jar
AppendixB.Manpages
The Neo4j Unix manual pages are included on the following pages.
neo4j
neo4j-shell
neo4j-import
neo4j-backup
neo4j-arbiter
Name
neo4j - Neo4j Server control
Synopsis
neo4j <command>
DESCRIPTION
Neo4j is a graph database, perfect for working with highly connected data. The neo4j command is used
to control the Neo4j Server.
The preferred way to install Neo4j on Linux systems is by using prebuilt installation packages. For
information regarding Windows, see http://neo4j.com/docs/stable/powershell.html.
COMMANDS
console Start the server as an application, running as a foreground process. Stop the server using
CTRL-C.
start Start server as daemon, running as a background process.
stop Stops a running daemonized server.
restart Restarts the server.
status Current running state of the server.
info Displays configuration information, such as the current NEO4J_HOME and CLASSPATH.
FILES
conf/neo4j-server.properties Server configuration.
conf/neo4j-wrapper.conf Configuration for service wrapper.
conf/neo4j.properties Tuning configuration for the database.
Name
neo4j-shell - a command-line tool for exploring and manipulating a graph database
Synopsis
neo4j-shell [REMOTE OPTIONS]
neo4j-shell [LOCAL OPTIONS]
DESCRIPTION
Neo4j shell is a command-line shell for running Cypher queries. There are also commands to get
information about the database. In addition, you can browse the graph much like you browse your
local file system with a Unix shell, using commands such as cd, ls and pwd. The shell can
connect directly to a graph database on the file system. To access a local database used by other
processes, use read-only mode.
REMOTE OPTIONS
-port PORT Port of host to connect to (default: 1337).
-host HOST Domain name or IP of host to connect to (default: localhost).
-name NAME RMI name, i.e. rmi://<host>:<port>/<name> (default: shell).
-readonly Access the database in read-only mode.
LOCAL OPTIONS
-path PATH The path to the database directory. If there is no database at the location, a new
one will be created.
-pid PID Process ID to connect to.
-readonly Access the database in read-only mode.
-c COMMAND Command line to execute. After executing it the shell exits.
-file FILE File to read and execute. After executing it the shell exits. If - is supplied as
filename data is read from stdin instead.
-config CONFIG The path to the Neo4j configuration file to be used.
EXAMPLES
Examples for remote:
neo4j-shell
neo4j-shell -port 1337
neo4j-shell -host 192.168.1.234 -port 1337 -name shell
neo4j-shell -host localhost -readonly
Examples for local:
neo4j-shell -path /path/to/db
neo4j-shell -path /path/to/db -config /path/to/neo4j.properties
neo4j-shell -path /path/to/db -readonly
Name
neo4j-import - Neo4j Import Tool
Synopsis
neo4j-import [options]
DESCRIPTION
neo4j-import is used to create a new Neo4j database from data in CSV files. See the chapter "Import
Tool" in the Neo4j Manual for details on the CSV file format; a special kind of header is required.
For information regarding Windows, see http://neo4j.com/docs/stable/powershell.html.
OPTIONS
--into <store-dir>
    Database directory to import into. Must not contain an existing database.
--nodes[:Label1:Label2] "<file1>,<file2>,…"
    Node CSV header and data. Multiple files will be logically seen as one big file from the
    perspective of the importer. The first line must contain the header. Multiple data sources
    like these can be specified in one import, where each data source has its own header. Note
    that file groups must be enclosed in quotation marks.
--relationships[:RELATIONSHIP_TYPE] "<file1>,<file2>,…"
    Relationship CSV header and data. Multiple files will be logically seen as one big file from
    the perspective of the importer. The first line must contain the header. Multiple data
    sources like these can be specified in one import, where each data source has its own
    header. Note that file groups must be enclosed in quotation marks.
--delimiter <delimiter-character>
    Delimiter character, or TAB, between values in CSV data. The default option is ,.
--array-delimiter <array-delimiter-character>
    Delimiter character, or TAB, between array elements within a value in CSV data. The default
    option is ;.
--quote <quotation-character>
    Character to treat as quotation character for values in CSV data. The default option is ".
    Quotes inside quotes escaped like """Go away"", he said." and "\"Go away\", he said." are
    supported. If you have set "'" to be used as the quotation character, you could write the
    previous example like this instead: '"Go away", he said.'
--multiline-fields <true/false>
    Whether or not fields from the input source can span multiple lines, i.e. contain newline
    characters. Default value: false.
--input-encoding <character set>
    Character set that input data is encoded in. The provided value must be one of the
    character sets available in the JVM, as provided by Charset#availableCharsets(). If no
    input encoding is provided, the default character set of the JVM will be used.
--ignore-empty-strings <true/false>
    Whether or not empty string fields, i.e. "" from the input source, are ignored, i.e.
    treated as null. Default value: false.
--id-type <id-type>
    One out of [STRING, INTEGER, ACTUAL]; specifies how ids in node/relationship input files
    are treated. STRING: arbitrary strings for identifying nodes. INTEGER: arbitrary integer
    values for identifying nodes. ACTUAL: (advanced) actual node ids. Default value: STRING.
--processors <max processor count>
    (advanced) Maximum number of processors used by the importer. Defaults to the number of
    available processors reported by the JVM. A certain minimum number of threads is needed,
    so for that reason there is no lower bound for this value. For optimal performance this
    value shouldn't be greater than the number of available processors.
--stacktrace <true/false>
    Enable printing of error stack traces.
--bad-tolerance <max number of bad entries>
    Number of bad entries before the import is considered failed. This tolerance threshold
    applies to relationships referring to missing nodes. Format errors in input data are still
    treated as errors. Default value: 1000.
--skip-bad-relationships <true/false>
    Whether or not to skip importing relationships that refer to missing node ids, i.e. either
    the start or end node id/group refers to a node that wasn't specified by the node input
    data. Skipped entries will be logged, containing at most the number of entities specified
    by bad-tolerance. Default value: true.
--skip-duplicate-nodes <true/false>
    Whether or not to skip importing nodes that have the same id/group. In the event of
    multiple nodes within the same group having the same id, the first encountered will be
    imported whereas subsequent such nodes will be skipped. Skipped nodes will be logged,
    containing at most the number of entities specified by bad-tolerance. Default value: false.
--ignore-extra-columns <true/false>
    Whether or not to ignore extra columns in the data not specified by the header. Skipped
    columns will be logged, containing at most the number of entities specified by
    bad-tolerance. Default value: false.
--db-config <path/to/neo4j.properties>
    (advanced) File specifying database-specific configuration. For more information, consult
    the manual about available configuration options for a Neo4j configuration file. Only
    configuration affecting the store at time of creation will be read. Examples of supported
    config are: dense_node_threshold, string_block_size, array_block_size.
EXAMPLES
Below is a basic example, where we import movies, actors and roles from three files.
movies.csv
movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
actors.csv
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
roles.csv
:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
The command will look like this:
neo4j-import --into path_to_target_directory --nodes movies.csv --nodes actors.csv --relationships roles.csv
See the Neo4j Manual for further examples.
Name
neo4j-backup - Neo4j Backup Tool
Synopsis
neo4j-backup -host <host> [-port <port>] -to target_directory
DESCRIPTION
A tool to perform live backups over the network from a running Neo4j graph database onto a local
filesystem. Backups can be either full or incremental. The first backup must be a full backup, after that
incremental backups can be performed.
The source(s) are given as host:port pairs, the target is a filesystem location.
For information regarding Windows, see http://neo4j.com/docs/stable/powershell.html.
BACKUP TYPE
-full copies the entire database to a directory.
-incremental copies the changes that have taken place since the last full or incremental backup
to an existing backup store.
The backup tool will automatically detect whether it needs to do a full or an incremental backup.
SOURCE ADDRESS
Backup sources are given in the following format:
-host <host> [-port <port>]
host In single mode, the host of a source database; in HA mode, the cluster address of a cluster
member.
port In single mode, the port of a source database backup service; in HA mode, the port of a
cluster instance. If not given, the default value 6362 will be used for single mode, 5001 for HA.
IMPORTANT
Backups can only be performed on databases which have the configuration parameter
enable_online_backup=true set. That will make the backup service available on the default port (6362). To
enable the backup service on a different port use for example enable_online_backup=port=9999 instead.
EXAMPLES
• Performing a backup the first time: create a blank directory and run the backup tool
mkdir /mnt/backup/neo4j-backup
neo4j-backup -host 192.168.1.34 -to /mnt/backup/neo4j-backup
• Subsequent backups using the same target directory will be incremental and therefore quick
neo4j-backup -host freja -to /mnt/backup/neo4j-backup
• Performing a backup where the service is registered on a custom port
neo4j-backup -host freja -port 9999 -to /mnt/backup/neo4j-backup
• Performing a backup from an HA cluster, specifying a cluster member
./neo4j-backup -host oden -to /mnt/backup/neo4j-backup
• Performing a backup from an HA cluster, specifying a cluster member registered on a custom port
./neo4j-backup -host oden -port 9191 -to /mnt/backup/neo4j-backup
RESTORE FROM BACKUP
The Neo4j backups are fully functional databases. To use a backup, replace your database directory
with the backup.
Name
neo4j-arbiter - Neo4j Arbiter for High-Availability clusters
Synopsis
neo4j-arbiter <command>
DESCRIPTION
Neo4j Arbiter is a service that can help break ties in Neo4j clusters that have an even number of cluster
members.
COMMANDS
console Start the server as an application, running as a foreground process. Stop the server using
CTRL-C.
start Start server as daemon, running as a background process.
stop Stops a running daemonized server.
restart Restarts a running server.
status Current running state of the server.
install Installs the server as a platform-appropriate system service.
remove Uninstalls the system service.
FILES
conf/arbiter.cfg Arbiter server configuration.
conf/arbiter-wrapper.cfg Configuration for service wrapper.