Getting Started with Neo4J - Part 1


Bhaskar S 11/26/2017


Overview

A Graph is a data structure in which data is represented as a set of vertices that are connected by edges. A Directed graph is a special case of a graph in which the edges have direction from one vertex to another.

A Graph database is a type of NoSQL datastore that uses the directed graph data structure under-the-hood to store information (or data). In the Graph database parlance, vertices are referred to as Nodes and the edges are referred to as Relationships.

Neo4J is an open source Graph database implemented using Java. The following are some of the features of Neo4J:

There are two versions of Neo4J - the Community edition (free) and the Enterprise edition ($$$). The Enterprise edition provides more advanced enterprise capabilities such as High Availability, Clustering, Security (authentication and authorization), Monitoring, etc.

For our demonstration, we will be using the Community edition of Neo4J.

Terminology

The following is a simplified example from the Identity Access Management (IAM) space of some nodes (retangular boxes) and relationships (directed arrows with a textual tag) between them:

IAM Example
IAM Example

In this section, we will list and briefly describe some of the terms referred to in this article.

Term Description
Node an entity in a domain. User, Group, Role, and Permission from the above diagram
Relationship named and directional connection between two Nodes. BELONGS_TO, ASSIGNED_TO, and GRANTED from the above diagram
Label A textual name or tag assigned to a Node so they can be grouped together. User, Group, Role, and Permission from the above diagram
Property a name-value pair associated with either a Node or a Relationship. path and action of the Permission Node from the above diagram
Cypher a declarative query language (similar to SQL) for working with Neo4J

Setup

The setup will be on a Ubuntu 16.04 LTS based Linux desktop.

Ensure Docker is installed and setup. Else, refer to the article Introduction to Docker.

Assume a hypothetical user alice with the home directory located at /home/alice.

Create a directory called Neo4J under /home/alice by executing the following command:

mkdir Neo4J

Create three more directories called conf, data, and logs under /home/alice/Neo4J by executing the following command:

mkdir -p /home/alice/Neo4J/conf /home/alice/Neo4J/data /home/alice/Neo4J/logs

Change to the directory /home/alice/Neo4J by executing the following command:

cd /home/alice/Neo4J

For our exploration, we will be downloading and using the official docker image neo4j:3.3.0 (latest version is 3.3.0 as 11/26/2017).

To pull and download the docker image for Neo4J, execute the following command:

docker pull neo4j:3.3.0

The following should be the typical output:

Output.1

3.3.0: Pulling from library/neo4j
b56ae66c2937: Pull complete 
81cebc5bcaf8: Pull complete 
3b27fd892ecb: Pull complete 
f120ec548211: Pull complete 
0c039dfb6b17: Pull complete 
c7bf503f11ca: Pull complete 
94ca2e957a63: Pull complete 
Digest: sha256:d28780b3f37fda0290c7cf64fab077253215ba5a97281c7a33ca0d0c2a140457
Status: Downloaded newer image for neo4j:3.3.0

The docker instance for Neo4J will use default configuration settings built into the image. One can customize the configuration settings by changing the default values. To create an initial configuration file from the docker image for Neo4J, execute the following command:

docker run --rm --volume=$HOME/Neo4J/conf:/conf neo4j:3.3.0 dump-config

This will dump the default configuration used by Neo4J to the file called neo4j.conf under /home/alice/Neo4J/conf. The following are the contents of neo4j.conf:

neo4j.conf
#*****************************************************************
# Neo4j configuration
#
# For more details and a complete list of settings, please see
# https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/
#*****************************************************************

# The name of the database to mount
#dbms.active_database=graph.db

# Paths of directories in the installation.
#dbms.directories.data=data
#dbms.directories.plugins=plugins
#dbms.directories.certificates=certificates
#dbms.directories.logs=logs
#dbms.directories.lib=lib
#dbms.directories.run=run

# This setting constrains all `LOAD CSV` import files to be under the `import` directory. Remove or comment it out to
# allow files to be loaded from anywhere in the filesystem; this introduces possible security problems. See the
# `LOAD CSV` section of the manual for details.
dbms.directories.import=import

# Whether requests to Neo4j are authenticated.
# To disable authentication, uncomment this line
#dbms.security.auth_enabled=false

# Enable this to be able to upgrade a store from an older version.
#dbms.allow_upgrade=true

# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size.
#dbms.memory.heap.initial_size=512m
#dbms.memory.heap.max_size=512m

# The amount of memory to use for mapping the store files, in bytes (or
# kilobytes with the 'k' suffix, megabytes with 'm' and gigabytes with 'g').
# If Neo4j is running on a dedicated server, then it is generally recommended
# to leave about 2-4 gigabytes for the operating system, give the JVM enough
# heap to hold all your transaction state and query context, and then leave the
# rest for the page cache.
# The default page cache memory assumes the machine is dedicated to running
# Neo4j, and is heuristically set to 50% of RAM minus the max Java heap size.
#dbms.memory.pagecache.size=10g

#*****************************************************************
# Network connector configuration
#*****************************************************************

# With default configuration Neo4j only accepts local connections.
# To accept non-local connections, uncomment this line:
#dbms.connectors.default_listen_address=0.0.0.0

# You can also choose a specific network interface, and configure a non-default
# port for each connector, by setting their individual listen_address.

# The address at which this server can be reached by its clients. This may be the server's IP address or DNS name, or
# it may be the address of a reverse proxy which sits in front of the server. This setting may be overridden for
# individual connectors below.
#dbms.connectors.default_advertised_address=localhost

# You can also choose a specific advertised hostname or IP address, and
# configure an advertised port for each connector, by setting their
# individual advertised_address.

# Bolt connector
dbms.connector.bolt.enabled=true
#dbms.connector.bolt.tls_level=OPTIONAL
#dbms.connector.bolt.listen_address=:7687

# HTTP Connector. There must be exactly one HTTP connector.
dbms.connector.http.enabled=true
#dbms.connector.http.listen_address=:7474

# HTTPS Connector. There can be zero or one HTTPS connectors.
dbms.connector.https.enabled=true
#dbms.connector.https.listen_address=:7473

# Number of Neo4j worker threads.
#dbms.threads.worker_count=

#*****************************************************************
# SSL system configuration
#*****************************************************************

# Names of the SSL policies to be used for the respective components.

# The legacy policy is a special policy which is not defined in
# the policy configuration section, but rather derives from
# dbms.directories.certificates and associated files
# (by default: neo4j.key and neo4j.cert). Its use will be deprecated.

# The policies to be used for connectors.
#
# N.B: Note that a connector must be configured to support/require
#      SSL/TLS for the policy to actually be utilized.
#
# see: dbms.connector.*.tls_level

#bolt.ssl_policy=legacy
#https.ssl_policy=legacy

#*****************************************************************
# SSL policy configuration
#*****************************************************************

# Each policy is configured under a separate namespace, e.g.
#    dbms.ssl.policy.<policyname>.*
#
# The example settings below are for a new policy named 'default'.

# The base directory for cryptographic objects. Each policy will by
# default look for its associated objects (keys, certificates, ...)
# under the base directory.
#
# Every such setting can be overriden using a full path to
# the respective object, but every policy will by default look
# for cryptographic objects in its base location.
#
# Mandatory setting

#dbms.ssl.policy.default.base_directory=certificates/default

# Allows the generation of a fresh private key and a self-signed
# certificate if none are found in the expected locations. It is
# recommended to turn this off again after keys have been generated.
#
# Keys should in general be generated and distributed offline
# by a trusted certificate authority (CA) and not by utilizing
# this mode.

#dbms.ssl.policy.default.allow_key_generation=false

# Enabling this makes it so that this policy ignores the contents
# of the trusted_dir and simply resorts to trusting everything.
#
# Use of this mode is discouraged. It would offer encryption but no security.

#dbms.ssl.policy.default.trust_all=false

# The private key for the default SSL policy. By default a file
# named private.key is expected under the base directory of the policy.
# It is mandatory that a key can be found or generated.

#dbms.ssl.policy.default.private_key=

# The private key for the default SSL policy. By default a file
# named public.crt is expected under the base directory of the policy.
# It is mandatory that a certificate can be found or generated.

#dbms.ssl.policy.default.public_certificate=

# The certificates of trusted parties. By default a directory named
# 'trusted' is expected under the base directory of the policy. It is
# mandatory to create the directory so that it exists, because it cannot
# be auto-created (for security purposes).
#
# To enforce client authentication client_auth must be set to 'require'!

#dbms.ssl.policy.default.trusted_dir=

# Client authentication setting. Values: none, optional, require
# The default is to require client authentication.
#
# Servers are always authenticated unless explicitly overridden
# using the trust_all setting. In a mutual authentication setup this
# should be kept at the default of require and trusted certificates
# must be installed in the trusted_dir.

#dbms.ssl.policy.default.client_auth=require

# A comma-separated list of allowed TLS versions.
# By default TLSv1, TLSv1.1 and TLSv1.2 are allowed.

#dbms.ssl.policy.default.tls_versions=

# A comma-separated list of allowed ciphers.
# The default ciphers are the defaults of the JVM platform.

#dbms.ssl.policy.default.ciphers=

#*****************************************************************
# Logging configuration
#*****************************************************************

# To enable HTTP logging, uncomment this line
#dbms.logs.http.enabled=true

# Number of HTTP logs to keep.
#dbms.logs.http.rotation.keep_number=5

# Size of each HTTP log that is kept.
#dbms.logs.http.rotation.size=20m

# To enable GC Logging, uncomment this line
#dbms.logs.gc.enabled=true

# GC Logging Options
# see http://docs.oracle.com/cd/E19957-01/819-0084-10/pt_tuningjava.html#wp57013 for more information.
#dbms.logs.gc.options=-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution

# Number of GC logs to keep.
#dbms.logs.gc.rotation.keep_number=5

# Size of each GC log that is kept.
#dbms.logs.gc.rotation.size=20m

# Size threshold for rotation of the debug log. If set to zero then no rotation will occur. Accepts a binary suffix "k",
# "m" or "g".
#dbms.logs.debug.rotation.size=20m

# Maximum number of history files for the internal log.
#dbms.logs.debug.rotation.keep_number=7

#*****************************************************************
# Miscellaneous configuration
#*****************************************************************

# Enable this to specify a parser other than the default one.
#cypher.default_language_version=3.3

# Determines if Cypher will allow using file URLs when loading data using
# `LOAD CSV`. Setting this value to `false` will cause Neo4j to fail `LOAD CSV`
# clauses that load data from the file system.
#dbms.security.allow_csv_import_from_file_urls=true

# Retention policy for transaction logs needed to perform recovery and backups.
dbms.tx_log.rotation.retention_policy=1 days

# Enable a remote shell server which Neo4j Shell clients can log in to.
#dbms.shell.enabled=true
# The network interface IP the shell will listen on (use 0.0.0.0 for all interfaces).
#dbms.shell.host=127.0.0.1
# The port the shell will listen on, default is 1337.
#dbms.shell.port=1337

# Only allow read operations from this Neo4j instance. This mode still requires
# write access to the directory for lock purposes.
#dbms.read_only=false

# Comma separated list of JAX-RS packages containing JAX-RS resources, one
# package name for each mountpoint. The listed package names will be loaded
# under the mountpoints specified. Uncomment this line to mount the
# org.neo4j.examples.server.unmanaged.HelloWorldResource.java from
# neo4j-server-examples under /examples/unmanaged, resulting in a final URL of
# http://localhost:7474/examples/unmanaged/helloworld/{nodeId}
#dbms.unmanaged_extension_classes=org.neo4j.examples.server.unmanaged=/examples/unmanaged

#********************************************************************
# JVM Parameters
#********************************************************************

# G1GC generally strikes a good balance between throughput and tail
# latency, without too much tuning.
dbms.jvm.additional=-XX:+UseG1GC

# Have common exceptions keep producing stack traces, so they can be
# debugged regardless of how often logs are rotated.
dbms.jvm.additional=-XX:-OmitStackTraceInFastThrow

# Make sure that `initmemory` is not only allocated, but committed to
# the process, before starting the database. This reduces memory
# fragmentation, increasing the effectiveness of transparent huge
# pages. It also reduces the possibility of seeing performance drop
# due to heap-growing GC events, where a decrease in available page
# cache leads to an increase in mean IO response time.
# Try reducing the heap memory, if this flag degrades performance.
dbms.jvm.additional=-XX:+AlwaysPreTouch

# Trust that non-static final fields are really final.
# This allows more optimizations and improves overall performance.
# NOTE: Disable this if you use embedded mode, or have extensions or dependencies that may use reflection or
# serialization to change the value of final fields!
dbms.jvm.additional=-XX:+UnlockExperimentalVMOptions
dbms.jvm.additional=-XX:+TrustFinalNonStaticFields

# Disable explicit garbage collection, which is occasionally invoked by the JDK itself.
dbms.jvm.additional=-XX:+DisableExplicitGC

# Remote JMX monitoring, uncomment and adjust the following lines as needed. Absolute paths to jmx.access and
# jmx.password files are required.
# Also make sure to update the jmx.access and jmx.password files with appropriate permission roles and passwords,
# the shipped configuration contains only a read only role called 'monitor' with password 'Neo4j'.
# For more details, see: http://download.oracle.com/javase/8/docs/technotes/guides/management/agent.html
# On Unix based systems the jmx.password file needs to be owned by the user that will run the server,
# and have permissions set to 0600.
# For details on setting these file permissions on Windows see:
#     http://docs.oracle.com/javase/8/docs/technotes/guides/management/security-windows.html
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.port=3637
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.authenticate=true
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.ssl=false
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.password.file=/absolute/path/to/conf/jmx.password
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.access.file=/absolute/path/to/conf/jmx.access

# Some systems cannot discover host name automatically, and need this line configured:
#dbms.jvm.additional=-Djava.rmi.server.hostname=$THE_NEO4J_SERVER_HOSTNAME

# Expand Diffie Hellman (DH) key size from default 1024 to 2048 for DH-RSA cipher suites used in server TLS handshakes.
# This is to protect the server from any potential passive eavesdropping.
dbms.jvm.additional=-Djdk.tls.ephemeralDHKeySize=2048

#********************************************************************
# Wrapper Windows NT/2000/XP Service Properties
#********************************************************************
# WARNING - Do not modify any of these properties when an application
#  using this configuration file has been installed as a service.
#  Please uninstall the service before modifying this section.  The
#  service can then be reinstalled.

# Name of the service
dbms.windows_service_name=neo4j

#********************************************************************
# Other Neo4j system properties
#********************************************************************
dbms.jvm.additional=-Dunsupported.dbms.udc.source=tarball

The configuration file neo4j.conf under /home/alice/Neo4J/conf by default is owned by root. To change the ownership to user alice, execute the following command:

sudo chown alice:alice conf/neo4j.conf

Now we should be able to change some of the settings in the file neo4j.conf under /home/alice/Neo4J/conf.

The following are the contents of the modified configuration file neo4j.conf under /home/alice/Neo4J/conf:

neo4j.conf (Modified)
#*****************************************************************
# Neo4j configuration
#
# For more details and a complete list of settings, please see
# https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/
#*****************************************************************

# The name of the database to mount
dbms.active_database=demo_graph.db

# Paths of directories in the installation.
dbms.directories.data=/data
dbms.directories.logs=/logs

# This setting constrains all `LOAD CSV` import files to be under the `import` directory. Remove or comment it out to
# allow files to be loaded from anywhere in the filesystem; this introduces possible security problems. See the
# `LOAD CSV` section of the manual for details.
dbms.directories.import=import

# Enable this to be able to upgrade a store from an older version.
dbms.allow_upgrade=true

# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size.
dbms.memory.heap.initial_size=1024m
dbms.memory.heap.max_size=2048m

# The amount of memory to use for mapping the store files, in bytes (or
# kilobytes with the 'k' suffix, megabytes with 'm' and gigabytes with 'g').
# If Neo4j is running on a dedicated server, then it is generally recommended
# to leave about 2-4 gigabytes for the operating system, give the JVM enough
# heap to hold all your transaction state and query context, and then leave the
# rest for the page cache.
# The default page cache memory assumes the machine is dedicated to running
# Neo4j, and is heuristically set to 50% of RAM minus the max Java heap size.
dbms.memory.pagecache.size=1g

#*****************************************************************
# Network connector configuration
#*****************************************************************

# You can also choose a specific advertised hostname or IP address, and
# configure an advertised port for each connector, by setting their
# individual advertised_address.

# Bolt connector
dbms.connector.bolt.enabled=true
dbms.connector.bolt.listen_address=:7687

# HTTP Connector. There must be exactly one HTTP connector.
dbms.connector.http.enabled=true
dbms.connector.http.listen_address=:7474

#*****************************************************************
# Miscellaneous configuration
#*****************************************************************

# Enable this to specify a parser other than the default one.
cypher.default_language_version=3.0

# Determines if Cypher will allow using file URLs when loading data using
# `LOAD CSV`. Setting this value to `false` will cause Neo4j to fail `LOAD CSV`
# clauses that load data from the file system.
dbms.security.allow_csv_import_from_file_urls=true

# Retention policy for transaction logs needed to perform recovery and backups.
dbms.tx_log.rotation.retention_policy=1 days

#********************************************************************
# JVM Parameters
#********************************************************************

# G1GC generally strikes a good balance between throughput and tail
# latency, without too much tuning.
dbms.jvm.additional=-XX:+UseG1GC

# Have common exceptions keep producing stack traces, so they can be
# debugged regardless of how often logs are rotated.
dbms.jvm.additional=-XX:-OmitStackTraceInFastThrow

# Make sure that `initmemory` is not only allocated, but committed to
# the process, before starting the database. This reduces memory
# fragmentation, increasing the effectiveness of transparent huge
# pages. It also reduces the possibility of seeing performance drop
# due to heap-growing GC events, where a decrease in available page
# cache leads to an increase in mean IO response time.
# Try reducing the heap memory, if this flag degrades performance.
dbms.jvm.additional=-XX:+AlwaysPreTouch

# Trust that non-static final fields are really final.
# This allows more optimizations and improves overall performance.
# NOTE: Disable this if you use embedded mode, or have extensions or dependencies that may use reflection or
# serialization to change the value of final fields!
dbms.jvm.additional=-XX:+UnlockExperimentalVMOptions
dbms.jvm.additional=-XX:+TrustFinalNonStaticFields

# Disable explicit garbage collection, which is occasionally invoked by the JDK itself.
dbms.jvm.additional=-XX:+DisableExplicitGC

# Expand Diffie Hellman (DH) key size from default 1024 to 2048 for DH-RSA cipher suites used in server TLS handshakes.
# This is to protect the server from any potential passive eavesdropping.
dbms.jvm.additional=-Djdk.tls.ephemeralDHKeySize=2048

To launch the docker instance for Neo4J, execute the following command:

docker run --rm --name neo4j --publish=7474:7474 --publish=7687:7687 --volume=$HOME/Neo4J/data:/data --volume=$HOME/Neo4J/logs:/logs --volume=$HOME/Neo4J/conf:/conf neo4j:3.3.0

The following should be the typical output:

Output.2

Active database: demo_graph.db
Directories in use:
  home:         /var/lib/neo4j
  config:       /var/lib/neo4j/conf
  logs:         /logs
  plugins:      /var/lib/neo4j/plugins
  import:       /var/lib/neo4j/import
  data:         /data
  certificates: /var/lib/neo4j/certificates
  run:          /var/lib/neo4j/run
Starting Neo4j.
2017-11-26 17:47:33.291+0000 WARN  Unknown config option: causal_clustering.discovery_listen_address
2017-11-26 17:47:33.296+0000 WARN  Unknown config option: causal_clustering.raft_advertised_address
2017-11-26 17:47:33.296+0000 WARN  Unknown config option: causal_clustering.raft_listen_address
2017-11-26 17:47:33.296+0000 WARN  Unknown config option: ha.host.coordination
2017-11-26 17:47:33.296+0000 WARN  Unknown config option: causal_clustering.transaction_advertised_address
2017-11-26 17:47:33.297+0000 WARN  Unknown config option: causal_clustering.discovery_advertised_address
2017-11-26 17:47:33.297+0000 WARN  Unknown config option: ha.host.data
2017-11-26 17:47:33.297+0000 WARN  Unknown config option: causal_clustering.transaction_listen_address
2017-11-26 17:47:33.313+0000 INFO  ======== Neo4j 3.3.0 ========
2017-11-26 17:47:33.339+0000 INFO  Starting...
2017-11-26 17:47:34.567+0000 INFO  Bolt enabled on 0.0.0.0:7687.
2017-11-26 17:47:38.677+0000 INFO  Started.
2017-11-26 17:47:39.667+0000 INFO  Remote interface available at http://localhost:7474/

To check if the docker instance for Neo4J is up and running, execute the following command:

docker ps

The following should be the typical output:

Output.3

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                                      NAMES
812aef6fb1e9        neo4j:3.3.0         "/docker-entrypoint.s"   3 minutes ago       Up 3 minutes        0.0.0.0:7474->7474/tcp, 7473/tcp, 0.0.0.0:7687->7687/tcp   neo4j

Fire up a web-browser and open the URL http://localhost:7474. The following should be the typical output:

Neo4J Browser
Neo4J Browser

The default user name is neo4j and the default initial password is neo4j.

On successful login, one is prompted to change the default password as shown in the screenshot below:

Change Password
Change Password

On successful password change, one is presented with the Neo4J web interface as shown in the screenshot below:

Neo4J Interface
Neo4J Interface

On the left-hand side of the Neo4J web interface is a set of icons, the first of which is for the Database as shown in the screenshot below:

Icon Database
Database

Clicking on the Database icon displays information about the Neo4J database as shown in the screenshot below:

Database Information
Database Info

The second icon is for the Favorites as shown in the screenshot below:

Icon Favorites
Favorites

Clicking on the Favorites icon displays information about the Neo4J favorites as shown in the screenshot below:

Neo4j Favorites
Neo4j Favorites

The third icon is for the Documentation as shown in the screenshot below:

Icon Documentation
Documentation

Clicking on the Documentation icon displays links to various Neo4J documentation as shown in the screenshot below:

Neo4j Documentation
Neo4j Docs

The fifth icon is for the user interface Settings as shown in the screenshot below:

Icon Settings
Settings

Clicking on the Settings icon displays various Neo4J web interface settings as shown in the screenshot below:

Neo4j Settings
Neo4j Settings

References

Neo4J Official Site