Importing & Exporting Data (Part 1)


📥 Loading Data from CSV in Neo4j

LOAD CSV lets you import external CSV files into Neo4j and turn them into nodes and relationships.


1. Basic Syntax

LOAD CSV FROM 'file:///filename.csv' AS row
RETURN row
  • file:/// → the file must be in Neo4j’s import directory (default: <neo4j_home>/import).

  • row → each CSV row is returned as a list of strings (with WITH HEADERS, it becomes a map keyed by column name).
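
A quick sketch of both access styles (using the users.csv shown below):

LOAD CSV FROM 'file:///users.csv' AS row
RETURN row[1] AS name;   // positional access; the header row comes back as data too

LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row
RETURN row.name AS name; // named access via the header row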


2. Example CSV

📄 users.csv

id,name,age
1,Alice,30
2,Bob,25
3,Charlie,35

3. Load and Create Nodes

LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row
CREATE (:User {id: toInteger(row.id), name: row.name, age: toInteger(row.age)});

✅ Creates 3 User nodes:

(:User {id: 1, name: "Alice", age: 30})
(:User {id: 2, name: "Bob", age: 25})
(:User {id: 3, name: "Charlie", age: 35})
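
A quick way to verify the load:

MATCH (u:User) RETURN count(u) AS users;  // expect 3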

4. Load Relationships

📄 friends.csv

from,to
1,2
2,3
1,3

Query:

LOAD CSV WITH HEADERS FROM 'file:///friends.csv' AS row
MATCH (u1:User {id: toInteger(row.from)})
MATCH (u2:User {id: toInteger(row.to)})
CREATE (u1)-[:FRIEND_WITH]->(u2);

✅ Creates relationships:

(Alice)-[:FRIEND_WITH]->(Bob)
(Bob)-[:FRIEND_WITH]->(Charlie)
(Alice)-[:FRIEND_WITH]->(Charlie)
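
To inspect them:

MATCH (u1:User)-[:FRIEND_WITH]->(u2:User)
RETURN u1.name, u2.name;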

5. MERGE Instead of CREATE (avoid duplicates)

LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row
MERGE (u:User {id: toInteger(row.id)})
SET u.name = row.name, u.age = toInteger(row.age);

👉 Ensures nodes are not duplicated if the CSV is loaded again.
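
MERGE is faster and safer with a uniqueness constraint on the merge key. A minimal sketch (Neo4j 4.4+ syntax; the constraint name user_id is arbitrary):

CREATE CONSTRAINT user_id IF NOT EXISTS
FOR (u:User) REQUIRE u.id IS UNIQUE;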


6. Handling Large CSVs

  • Use USING PERIODIC COMMIT to commit in batches:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///bigfile.csv' AS row
...

This commits every 1000 rows → prevents memory issues.
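
Note: USING PERIODIC COMMIT was deprecated in Neo4j 4.4 and removed in 5.x. The replacement is CALL { ... } IN TRANSACTIONS; a minimal sketch of the same batching (run as an implicit transaction, e.g. with :auto in Browser):

LOAD CSV WITH HEADERS FROM 'file:///bigfile.csv' AS row
CALL {
  WITH row
  CREATE (:User {id: toInteger(row.id), name: row.name})
} IN TRANSACTIONS OF 1000 ROWS;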


7. Extra Tricks

  • Skip empty values:

WITH row WHERE row.name IS NOT NULL
  • Split list values from CSV:

SET u.skills = split(row.skills, ";")

(CSV: "Python;Neo4j;SQL" → stored as ["Python","Neo4j","SQL"])
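
Putting the tricks together, a sketch that assumes users.csv also carries a skills column:

LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row
WITH row WHERE row.name IS NOT NULL        // drop rows with no name
MERGE (u:User {id: toInteger(row.id)})
SET u.name   = row.name,
    u.skills = split(row.skills, ';');     // "Python;Neo4j;SQL" → list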


⚡ Summary

  • LOAD CSV FROM ... AS row → read file.

  • Use WITH HEADERS if CSV has headers.

  • Convert data types using toInteger(), toFloat().

  • Use MERGE (not CREATE) to prevent duplicates.

  • For large imports → USING PERIODIC COMMIT (replaced by CALL { ... } IN TRANSACTIONS in Neo4j 5+).

📥 Importing JSON / APIs into Neo4j

1. Using APOC Procedures (apoc.load.json)

👉 APOC (Awesome Procedures on Cypher) is an extension library for Neo4j.
It adds support for loading JSON directly from files or APIs.
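
Note: loading from local files requires enabling file access in the APOC configuration first:

apoc.import.file.enabled=true

(set in apoc.conf; older APOC releases read it from neo4j.conf).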


Example JSON file

📄 users.json

[
  { "id": 1, "name": "Alice", "age": 30 },
  { "id": 2, "name": "Bob", "age": 25 }
]

Load JSON from file

CALL apoc.load.json("file:///users.json") YIELD value
CREATE (:User {id: value.id, name: value.name, age: value.age});

✅ Creates nodes:

(:User {id: 1, name: "Alice", age: 30})
(:User {id: 2, name: "Bob", age: 25})

2. Import from a REST API

If the API returns JSON, you can fetch it directly:

CALL apoc.load.json("https://jsonplaceholder.typicode.com/users") YIELD value
CREATE (:User {id: value.id, name: value.name, email: value.email});

✅ Imports users from a public API.


3. Handling Nested JSON

📄 Example JSON:

{
  "id": 1,
  "name": "Alice",
  "projects": [
    {"title": "Graph DB", "year": 2023},
    {"title": "AI System", "year": 2024}
  ]
}

Cypher:

CALL apoc.load.json("file:///user_projects.json") YIELD value
MERGE (u:User {id: value.id})
SET u.name = value.name
UNWIND value.projects AS proj
MERGE (p:Project {title: proj.title})
MERGE (u)-[:WORKS_ON {year: proj.year}]->(p);

✅ Creates:

(Alice)-[:WORKS_ON {year: 2023}]->(Graph DB)
(Alice)-[:WORKS_ON {year: 2024}]->(AI System)

4. JSON from Parameters

You can also pass JSON into a query as a parameter (e.g., from a Python/Java app):

WITH $json AS data
UNWIND data AS row
CREATE (:User {id: row.id, name: row.name, age: row.age});

And from your driver (Python example):

session.run("""
    WITH $json AS data
    UNWIND data AS row
    CREATE (:User {id: row.id, name: row.name, age: row.age})
""", json=[{"id": 1, "name": "Alice", "age": 30},
           {"id": 2, "name": "Bob", "age": 25}])

5. Best Practices

  • Use MERGE instead of CREATE for re-runs.

  • Use UNWIND to handle arrays.

  • For huge JSON → break into chunks before loading.

  • Secure API calls → pass headers (APOC supports this via apoc.load.jsonParams; see the sketch below).
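
A sketch of an authenticated call with apoc.load.jsonParams (the URL and the $token parameter are placeholders):

CALL apoc.load.jsonParams(
  "https://api.example.com/users",
  {Authorization: "Bearer " + $token},
  null
) YIELD value
CREATE (:User {id: value.id, name: value.name});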


Summary

  • CSV → LOAD CSV

  • JSON/API → apoc.load.json or apoc.load.jsonParams

  • Nested JSON → UNWIND arrays and create relationships

  • From code → pass JSON params into Cypher

📦 Bulk Imports with neo4j-admin import

1. When to use

  • Fresh database (not an existing one).

  • Loading very large datasets.

  • Input data is available in CSV format.
    ❌ Cannot be run against a running database.


2. Basic Command

neo4j-admin import \
  --database=graph.db \
  --nodes=import/users.csv \
  --nodes=import/projects.csv \
  --relationships=import/works_on.csv

✅ This creates a new database named graph.db with data from CSV files.
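
On Neo4j 5.x the admin CLI was restructured; the equivalent command is roughly (database name neo4j used as an example):

neo4j-admin database import full neo4j \
  --nodes=import/users.csv \
  --nodes=import/projects.csv \
  --relationships=import/works_on.csv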


3. CSV Format Requirements

  • Header row defines columns.

  • Use :ID, :LABEL, :START_ID, :END_ID, and :TYPE.

  • Properties are just column names.


Example: Users

📄 users.csv

userId:ID,name,age:int,:LABEL
1,Alice,30,User
2,Bob,25,User

Example: Projects

📄 projects.csv

projectId:ID,title,year:int,:LABEL
10,Graph DB,2023,Project
20,AI System,2024,Project

Example: Relationships

📄 works_on.csv

:START_ID,:END_ID,role,:TYPE
1,10,Developer,WORKS_ON
1,20,Lead,WORKS_ON
2,10,Tester,WORKS_ON

4. Run Import

neo4j-admin import \
  --database=graph.db \
  --nodes=users.csv \
  --nodes=projects.csv \
  --relationships=works_on.csv

✅ Resulting graph:

(:User {userId: "1", name: "Alice", age: 30})-[:WORKS_ON {role: "Developer"}]->(:Project {projectId: "10", title: "Graph DB", year: 2023})
(:User {userId: "1", name: "Alice", age: 30})-[:WORKS_ON {role: "Lead"}]->(:Project {projectId: "20", title: "AI System", year: 2024})
(:User {userId: "2", name: "Bob", age: 25})-[:WORKS_ON {role: "Tester"}]->(:Project {projectId: "10", title: "Graph DB", year: 2023})

(ID columns keep their header names as property names; their values are imported as strings unless --id-type=INTEGER is set.)
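
Once the database is started, a quick sanity check:

MATCH (u:User)-[r:WORKS_ON]->(p:Project)
RETURN u.name, r.role, p.title;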

5. Advanced Options

  • --high-io → optimize for fast disks (more parallel I/O).

  • --delimiter="|" → custom CSV delimiter.

  • --array-delimiter=";" → delimiter for list-valued properties.

  • --skip-bad-relationships=true → skip relationships that reference missing node IDs instead of aborting.


6. Best Practices

  • Always import into an empty database.

  • Prepare clean CSVs (no missing IDs).

  • Use integer IDs for speed.

  • After import, create indexes & constraints for query performance (see the sketch below).
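
A minimal post-import sketch (Neo4j 4.4+ syntax; names mirror the CSVs above):

CREATE CONSTRAINT user_id IF NOT EXISTS
FOR (u:User) REQUIRE u.userId IS UNIQUE;

CREATE INDEX project_title IF NOT EXISTS
FOR (p:Project) ON (p.title);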


Summary

  • neo4j-admin import = fastest way for first-time bulk loading.

  • Needs structured CSV files with :ID, :START_ID, etc.

  • Great for millions or billions of records.

  • Not for updates → only for new databases.
