Introduction

In the realm of Neo4j, the APOC (Awesome Procedures on Cypher) library stands as a powerful tool. We previously discussed the importance of APOC in optimising Cypher queries and improving query efficiency in our article Exploring Methods of Cypher Query Optimisations. This article takes a closer look at some important functions available in the APOC library.

Developed as a core library supported by Neo4j, APOC serves as a standard utility library that provides access to many common procedures and functions. Beyond improving query efficiency, APOC extends the scope of Cypher into broader areas such as data integration, graph algorithms, and data conversion.

Installation

1. Neo4j Desktop

After creating a Project and a DBMS (Database Management System), click on the DBMS and open the Plugins tab. Look for APOC in the list of plugins.

Click APOC to expand it, then click the Install button.

2. Neo4j Server

APOC depends on Neo4j's internal APIs, so you need an APOC version that matches your Neo4j version: the first two version numbers (major and minor) of Neo4j and APOC must be the same.

Choose the matching version and install it.

For more details, please check the APOC Installation documentation.
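The version rule above can be expressed as a small check. This is a minimal Python sketch. The function name apoc_matches_neo4j and the version strings are hypothetical examples. The check compares only the major and minor numbers, as described above, and does not consult the official compatibility table.

```python
def apoc_matches_neo4j(neo4j_version: str, apoc_version: str) -> bool:
    """Return True if the major and minor version numbers of Neo4j and APOC agree."""
    # Compare only the first two dot-separated components, e.g. "4.4" with "4.4"
    return neo4j_version.split(".")[:2] == apoc_version.split(".")[:2]


print(apoc_matches_neo4j("4.4.12", "4.4.0.10"))  # True
print(apoc_matches_neo4j("5.12.0", "5.11.0"))    # False
```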

Common features

With over 450 standard procedures, APOC provides rich functionality covering utilities, conversions, graph updates, and more. These procedures are well supported and can be called on their own or integrated into larger Cypher queries.

You can find an introductory guide in the APOC User Guide documentation.

Import and export

The APOC library supports importing data in various formats, including JSON, HTML, XML, CSV, and XLS, into a Neo4j database. It also supports exporting data to various formats, including JSON, CSV, GraphML, Excel, Gephi, and Cypher script.

Data integration

The APOC library supports integration with other databases, including relational databases (via JDBC, with apoc.load.jdbc), MongoDB, Elasticsearch, and Couchbase. It also supports importing data from LDAP directories (via apoc.load.ldap) and running queries against other Neo4j databases over the Bolt protocol (via the apoc.bolt procedures). Finally, it supports database modelling (via apoc.model.jdbc), which extracts metadata from any JDBC-compatible database.

Data structure

The APOC library provides functions and procedures for working with data structures. These fall into three main types: conversion functions, map functions, and collection functions. Conversion functions cast a value of type “Any” to a specific type and are mainly located in the apoc.convert package.

Map functions operate on maps and mainly reside in the apoc.map package. Collection functions operate on collections and lists and are mainly located in the apoc.coll package.
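Map functions follow ordinary map semantics. For example, apoc.map.merge(first, second) combines two maps, and values from the second map take precedence for shared keys. The Python sketch below models that behaviour with plain dicts. map_merge is a hypothetical helper and not part of APOC.

```python
def map_merge(first: dict, second: dict) -> dict:
    """Model of apoc.map.merge(first, second): keys in `second` win on conflict."""
    merged = dict(first)   # copy so the input map is not modified
    merged.update(second)  # the later map overrides shared keys
    return merged


first = {"a": 1, "b": 2}
second = {"b": 3, "c": 4}
print(map_merge(first, second))  # {'a': 1, 'b': 3, 'c': 4}
print(first)                     # {'a': 1, 'b': 2} (unchanged)
```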

Temporal (Date Time)

The APOC library adds support for formatting temporal types, timestamps, and date strings, mainly in the apoc.temporal and apoc.date packages.
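Functions in apoc.date, such as apoc.date.format, take an epoch value together with its unit (for example 'ms' or 's') and a format pattern. The Python analogue below shows that idea and assumes UTC. APOC itself uses Java-style patterns such as 'yyyy-MM-dd HH:mm:ss' instead of strftime codes. format_epoch is a hypothetical helper that models only the 'ms' and 's' units.

```python
from datetime import datetime, timezone


def format_epoch(time: int, unit: str = "ms", fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Format an epoch timestamp given in `unit` as a UTC date string."""
    divisors = {"ms": 1000, "s": 1}  # only two units are modelled here
    if unit not in divisors:
        raise ValueError(f"unsupported unit: {unit}")
    seconds = time / divisors[unit]
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)


print(format_epoch(0))                                 # 1970-01-01 00:00:00
print(format_epoch(86_400_000))                        # 1970-01-02 00:00:00
print(format_epoch(86_400, unit="s", fmt="%Y-%m-%d"))  # 1970-01-02
```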

Mathematical

APOC provides functions and procedures for mathematical operations, including math functions (such as rounding and maximum and minimum values), exact math, number format conversion, and bitwise operations. Math functions are mainly located in the apoc.math package. Exact math functions are in the apoc.number.exact package, number format conversion functions are in the apoc.number package, and bitwise operations are in the apoc.bitwise package.
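A separate exact-math package exists because binary floating point introduces rounding errors. The Python sketch below uses the standard decimal module as an analogy and is not APOC code. As far as we know, the functions in apoc.number.exact solve the same problem by taking their operands as strings (for example apoc.number.exact.add('0.1', '0.2')) instead of floating-point numbers; treat that signature as an assumption to check against the APOC documentation.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly
float_sum = 0.1 + 0.2
# Decimal arithmetic on string inputs keeps the exact decimal value
exact_sum = Decimal("0.1") + Decimal("0.2")

print(float_sum)  # 0.30000000000000004
print(exact_sum)  # 0.3
```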

Advanced Graph Querying

The advanced graph querying features of the APOC library include the path expander (apoc.path), neighbour functions (apoc.neighbors), path manipulation (apoc.path), relationship querying (apoc.rel), node querying (apoc.nodes), and parallel node search (apoc.search).

Cypher Execution

  • Running Cypher fragments: a safe, graph-aware, partially compiled scripting Cypher language within APOC (apoc.cypher).
  • Conditional Cypher execution: simulate an if / else structure.
  • Timeboxed Cypher statements: terminate a Cypher statement if it runs longer than a given time threshold.
  • Run multiple statements: run each semicolon-separated statement.
  • Run Cypher script files: run each statement in a file, or in each of several files, separated by semicolons.
  • Parallel Cypher execution: execute Cypher statements in parallel.

Examples

The APOC library shows its adaptability and effectiveness in real-world applications. From node and relationship creation to batch transactions, APOC offers functionality that streamlines workflows and improves efficiency.

Help Command

To access the APOC supported procedures and functions:

CALL apoc.help('apoc');

Create Nodes and Relationships

  1. Use apoc.create.node to create nodes.
  • Create a node with dynamic labels:

apoc.create.node(['Label'], {key: value, ...})

  • For example:
CALL apoc.create.node(["Person", "Actor"], {name: "Tom Hanks"});
Create a Node Result

This creates a node with the labels “Person” and “Actor” and the property name: “Tom Hanks”.

  • Create multiple nodes with dynamic labels:
apoc.create.nodes(['Label'], [{key: value, ...}, {key: value, ...}, ...])

2. Use apoc.create.relationship to create relationships.

  • Create a relationship with a dynamic relationship type:

apoc.create.relationship(person1, 'KNOWS', {key: value, ...}, person2)

  • For example:
MATCH (p:Person {name: "Tom Hanks"})
MATCH (m:Movie {title:"You've Got Mail"})
CALL apoc.create.relationship(p, "ACTED_IN", {roles:['Joe Fox']}, m)
YIELD rel
RETURN rel;
Create Relationship Result

This will create a relationship of type “ACTED_IN” between the node representing Tom Hanks (matched as a Person node with the name “Tom Hanks”) and the movie “You’ve Got Mail” (matched as a Movie node with the title “You’ve Got Mail”). This relationship means that Tom Hanks acted in the movie “You’ve Got Mail” and played the role of “Joe Fox”.

Merge Nodes and Relationships

  1. Use apoc.merge.node to merge nodes.
  • Merge nodes with dynamic labels:

apoc.merge.node(['Label'], identProps: {key: value, ...}, onCreateProps: {key: value, ...}, onMatchProps: {key: value, ...})

  • For example:
CALL apoc.merge.node(
 ["Person", "Actor"],
 {name: "Tom Hanks"},
 {created: datetime()},
 {lastSeen: datetime()}
);
Merge Result to Create node

If a node labelled “Person” and “Actor” with the name “Tom Hanks” already exists, it is matched and its lastSeen property is set to the current time. Otherwise, a new node is created with these labels, the name property, and a created property set to the current time.
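The match-or-create behaviour can be modelled in a few lines of Python. This is a simplified sketch and not the procedure itself. Nodes are kept in an in-memory list, the first matching node is used, and the strings "t1" and "t2" stand in for datetime() values. merge_node is a hypothetical helper. The real procedure performs a Cypher MERGE inside the database.

```python
from typing import Any

Node = dict[str, Any]


def merge_node(store: list[Node], labels: set[str], ident_props: dict,
               on_create_props: dict, on_match_props: dict) -> Node:
    """Simplified in-memory model of apoc.merge.node semantics."""
    for node in store:
        # A node matches if it has all the labels and all identifying properties
        has_labels = labels <= node["labels"]
        has_props = all(node["props"].get(k) == v for k, v in ident_props.items())
        if has_labels and has_props:
            node["props"].update(on_match_props)  # applied only on match
            return node
    # No match: create the node with identifying and on-create properties
    node = {"labels": set(labels), "props": {**ident_props, **on_create_props}}
    store.append(node)
    return node


graph: list[Node] = []
first = merge_node(graph, {"Person", "Actor"}, {"name": "Tom Hanks"},
                   {"created": "t1"}, {"lastSeen": "t1"})
second = merge_node(graph, {"Person", "Actor"}, {"name": "Tom Hanks"},
                    {"created": "t2"}, {"lastSeen": "t2"})
print(len(graph))       # 1
print(second["props"])  # {'name': 'Tom Hanks', 'created': 't1', 'lastSeen': 't2'}
```

The second call changes only lastSeen, which is why created keeps its first value.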

2. Use apoc.merge.relationship to merge relationships.

  • Merge a relationship with a dynamic relationship type:

apoc.merge.relationship(startNode, relType, identProps: {key: value, ...}, onCreateProps: {key: value, ...}, endNode, onMatchProps: {key: value, ...})

  • For example:
MATCH (p:Person {name: "Tom Hanks"})
MATCH (m:Movie {title:"You've Got Mail"})
CALL apoc.merge.relationship(p, "ACTED_IN",
 {roles:['Joe Fox']},
 {created: datetime()},
 m,
 {lastSeen: datetime()}
)
YIELD rel
RETURN rel;
Merge Result with added properties

This merges an “ACTED_IN” relationship from the node representing Tom Hanks to the movie “You’ve Got Mail”, identified by the role “Joe Fox”. If the relationship is created, its created property is set. If an existing relationship is matched, its lastSeen property is set instead.

3. Use apoc.refactor.mergeNodes

  • Merge nodes and relationships together:

apoc.refactor.mergeNodes(nodes LIST<NODE>, config MAP<STRING, ANY>)

- Merges the given LIST<NODE> onto the first NODE in the LIST<NODE>. All RELATIONSHIP values are merged onto that NODE as well.

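Supported config properties (Discard, Override / Overwrite and Combine are strategies assigned under the properties key, either to all properties or to individual property names):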

  • Discard: If a node’s property is already set, no changes are made. If the node’s property is not set yet, the first property in the list is written.
  • Override / Overwrite: The last property in the list is written regardless of whether the node’s property is already set, overriding any existing value.
  • Combine: If there is only one property in the list, it is set as the node’s property. If there are multiple properties in the list, an array is created, and an attempt is made to combine these values together. This may involve value transformation or type coercion.
  • mergeRels: This parameter controls whether to allow merging relationships with the same type and direction. If set to true, relationships with the same type and direction will be merged. If set to false, these relationships will not be merged.
  • produceSelfRel: If set to true (default value), any new self-relationship will be inserted into the target node. If set to false, no new self-relationship will be inserted. Note that this parameter is independent of the mergeRels configuration and does not affect existing self-relationships.
  • preserveExistingSelfRels: This parameter is only effective when mergeRels is set to true. If set to true (default value), existing self-relationships in the target node will be preserved; otherwise, they will be deleted.
  • singleElementAsArray: Only relevant when the property strategy is combine. If set to true, the combined result is always an array, even when it contains a single element. If set to false (default value), a result with a single element is stored as a single value.
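The property strategies in the list above can be modelled per property in plain Python. This sketch is not APOC's implementation. merge_property is a hypothetical helper. The list of values is ordered with the target (first) node first, and None means "not set". For combine, the sketch assumes that duplicate values collapse into one. That assumption is what allows a one-element result, the case singleElementAsArray controls.

```python
from typing import Any


def merge_property(values: list[Any], strategy: str,
                   single_element_as_array: bool = False) -> Any:
    """Model of how one property is resolved when nodes are merged.

    `values` holds the property's value on each node, target (first) node first;
    None means the property is not set on that node.
    """
    present = [v for v in values if v is not None]
    if not present:
        return None
    if strategy == "discard":
        return present[0]  # the first value that is set is kept
    if strategy in ("overwrite", "override"):
        return present[-1]  # the last value wins
    if strategy == "combine":
        combined: list[Any] = []
        for value in present:
            items = value if isinstance(value, list) else [value]
            for item in items:
                if item not in combined:  # assumption: duplicates collapse
                    combined.append(item)
        if len(combined) == 1 and not single_element_as_array:
            return combined[0]
        return combined
    raise ValueError(f"unknown strategy: {strategy}")


keys = ["orcid/001", "orcid/002", "orcid/003"]
print(merge_property(keys, "discard"))    # orcid/001
print(merge_property(keys, "overwrite"))  # orcid/003
print(merge_property(keys, "combine"))    # ['orcid/001', 'orcid/002', 'orcid/003']
print(merge_property(["orcid.org", "orcid.org"], "combine"))  # orcid.org
```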

Batch Transactions

  1. Use apoc.periodic.iterate to perform batch operations.
  • Run the second statement for each item returned by the first statement:

apoc.periodic.iterate('statement returning items', 'statement per item', {batchSize: 1000, iterateList: true, parallel: false, params: {}, concurrency: 50, retries: 0}) YIELD batches, total

  • For example:
CALL apoc.periodic.iterate(
 "MATCH (p:Person) WHERE (p)-[:ACTED_IN]->() RETURN p",
 "SET p:Actor",
 {batchSize:10000, parallel:true})

This iterates over all nodes labelled “Person” that have acted in any movie and adds the label “Actor” to each of them. The work is committed in batches of 10,000 nodes, and the batches are processed in parallel.
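The batching idea behind apoc.periodic.iterate can be modelled sequentially in Python. periodic_iterate is a hypothetical helper. The 25,000 people are made-up data. In APOC each batch would run in its own transaction, and with parallel: true the batches would run concurrently, which this sketch does not model.

```python
def periodic_iterate(items: list, action, batch_size: int = 1000) -> dict:
    """Sequential model of apoc.periodic.iterate: apply `action` per item, in batches."""
    batches = 0
    total = 0
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # In APOC, each batch would be committed in its own transaction
        for item in batch:
            action(item)
        batches += 1
        total += len(batch)
    return {"batches": batches, "total": total}


people = [{"name": f"person-{i}", "labels": {"Person"}} for i in range(25_000)]
result = periodic_iterate(people, lambda p: p["labels"].add("Actor"), batch_size=10_000)
print(result)  # {'batches': 3, 'total': 25000}
```

With batchSize 10,000, the last of the three batches holds only the remaining 5,000 items.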

2. Use apoc.periodic.commit to run a write operation repeatedly in separate transactions.

  • Run the given statement in separate transactions until it returns 0:

apoc.periodic.commit(statement,params)

  • For example:
CALL apoc.periodic.commit(
 "MATCH (person:Person)
 WHERE person.city IS NOT NULL
 WITH person LIMIT $limit
 MERGE (city:City {name: person.city})
 MERGE (person)-[:LIVES_IN]->(city)
 REMOVE person.city
 RETURN count(*)",
 {limit: 1000});

This merges a City node for each city value, creates a LIVES_IN relationship from each Person node with a city property to its City node, removes the city property from that Person node, and returns the number of processed nodes. The statement runs repeatedly, processing up to 1,000 nodes per transaction, until it returns 0.
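The loop that apoc.periodic.commit runs can be modelled in Python. The model also shows why the statement must shrink its own workload, which is the job of REMOVE person.city above. Without that, the count would never reach 0. periodic_commit, move_city_to_relationship and the 2,500 pending people are hypothetical. The returned counts describe this model only and are not claimed to match APOC's own output columns.

```python
def periodic_commit(run_statement, limit: int) -> dict:
    """Model of apoc.periodic.commit: rerun the statement until it reports 0."""
    executions = 0
    updates = 0
    while True:
        count = run_statement(limit)
        executions += 1
        if count == 0:
            break
        updates += count
    return {"updates": updates, "executions": executions}


# Hypothetical data: 2,500 people who still carry a city property
pending = [f"person-{i}" for i in range(2_500)]


def move_city_to_relationship(limit: int) -> int:
    # Stands in for the Cypher statement: it handles up to `limit` people and,
    # like REMOVE person.city, removes them from the remaining work
    batch = pending[:limit]
    del pending[:limit]
    return len(batch)


summary = periodic_commit(move_city_to_relationship, limit=1000)
print(summary)  # {'updates': 2500, 'executions': 4}
```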

Practical Use Case

I will use the APOC library to merge duplicate nodes.

First, create the nodes and relationships:

CALL apoc.create.nodes(["Dataset"], [{doi:'10.1001/965111',key:'orcid/001',title:'Environmental Impact on Agriculture',source:'orcid.org'},{doi:'10.1001/965111',key:'orcid/002',title:'Environmental Impact on Agriculture',source:'orcid.org'},{doi:'10.1001/965111',key:'orcid/003',title:'Environmental Impact on Agriculture',source:'orcid.org'}]);
CALL apoc.create.node(["Researcher"], {name: "Amy Jamison"});
CALL apoc.create.node(["Publication"], {title: "Environment"});
CALL apoc.create.node(["Grant"], {title: "Research Institution"});

MATCH (d1:Dataset {key:'orcid/001'})
MATCH (g:Grant {title: "Research Institution"})
CALL apoc.create.relationship(d1, "HAS", null, g)
YIELD rel
RETURN rel;

MATCH (d2:Dataset {key:'orcid/002'})
MATCH (g:Grant {title: "Research Institution"})
CALL apoc.create.relationship(d2, "HAS", null, g)
YIELD rel
RETURN rel;

MATCH (d3:Dataset {key:'orcid/003'})
MATCH (g:Grant {title: "Research Institution"})
CALL apoc.create.relationship(d3, "HAS", null, g)
YIELD rel
RETURN rel;

MATCH (p:Publication {title: "Environment"})
MATCH (d1:Dataset {key:'orcid/001'})
CALL apoc.create.relationship(p, "PUBLICATED", null, d1)
YIELD rel
RETURN rel;

MATCH (p:Publication {title: "Environment"})
MATCH (d2:Dataset {key:'orcid/002'})
CALL apoc.create.relationship(p, "PUBLICATED", null, d2)
YIELD rel
RETURN rel;

MATCH (p:Publication {title: "Environment"})
MATCH (d3:Dataset {key:'orcid/003'})
CALL apoc.create.relationship(p, "PUBLICATED", null, d3)
YIELD rel
RETURN rel;

MATCH (r:Researcher {name: "Amy Jamison"})
MATCH (d1:Dataset {key:'orcid/001'})
CALL apoc.create.relationship(r, "OWNED", null, d1)
YIELD rel
RETURN rel;

MATCH (r:Researcher {name: "Amy Jamison"})
MATCH (d2:Dataset {key:'orcid/002'})
CALL apoc.create.relationship(r, "OWNED", null, d2)
YIELD rel
RETURN rel;

MATCH (r:Researcher {name: "Amy Jamison"})
MATCH (d3:Dataset {key:'orcid/003'})
CALL apoc.create.relationship(r, "OWNED", null, d3)
YIELD rel
RETURN rel;

This creates six nodes and nine relationships between them. Three of the nodes are Dataset nodes that share the same doi, title, and source but have different keys. The other three are one Researcher, one Publication, and one Grant node.

Create Node and Relationship

Then use apoc.refactor.mergeNodes to merge the duplicate nodes:

MATCH (n:Dataset)
WITH n.doi AS doi, collect(n) as nodes
CALL apoc.refactor.mergeNodes(nodes, {properties: {
 title:'combine',
 source:'combine',
 key:'combine',
 `.*`: 'discard'
},mergeRels:TRUE})
YIELD node
RETURN node

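This merges the Dataset nodes that share the same doi property. The title, source, and key properties use the combine strategy. Every other property uses discard, so the merged node keeps the first node's value. The relationships of the merged nodes are moved onto the resulting node. Because mergeRels is true, relationships with the same type and direction are merged as well.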

Merge Node and Relationship
Merge table

Because I set the key property to combine, the different key values of the merged nodes are combined into an array. And because mergeRels is set to true, relationships of the same type and direction are merged.

If mergeRels is not set to true, the result looks like this:

Merge Result without mergeRels
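The difference between the two results comes down to relationship counts, which a simplified Python model can reproduce. The sketch treats relationships as duplicates only when they share type, direction and the node at the other end, which holds for every relationship in this example. rels_after_merge is a hypothetical helper and not how APOC is implemented.

```python
# Relationships of each duplicate Dataset node from the practical example,
# written as (type, direction seen from the Dataset, node at the other end)
dataset_rels = {
    key: [("HAS", "out", "Grant"),
          ("PUBLICATED", "in", "Publication"),
          ("OWNED", "in", "Researcher")]
    for key in ("orcid/001", "orcid/002", "orcid/003")
}


def rels_after_merge(rels_per_node: dict, merge_rels: bool) -> list:
    """Relationships left on the merged node in this simplified model."""
    moved = [rel for rels in rels_per_node.values() for rel in rels]
    if not merge_rels:
        return moved  # every relationship is moved over, duplicates included
    unique = []
    for rel in moved:
        if rel not in unique:  # same type, direction and other node
            unique.append(rel)
    return unique


print(len(rels_after_merge(dataset_rels, merge_rels=False)))  # 9
print(len(rels_after_merge(dataset_rels, merge_rels=True)))   # 3
```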

Conclusion

In summary, the APOC library is a cornerstone for Neo4j users, whether they are optimising queries, integrating data sources, or performing advanced graph analysis. It is an important asset in every Neo4j practitioner’s toolkit: it simplifies development, saves time and effort, and gives users more flexibility and scalability.

References

Catch the latest version of this article over on Medium.com.