📥 Loading Data from CSV in Neo4j
`LOAD CSV` lets you import external CSV files into Neo4j and turn them into nodes and relationships.
1. Basic Syntax
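A minimal sketch of the clause (the file name is just an example):

```cypher
LOAD CSV FROM 'file:///users.csv' AS row
RETURN row
```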
- `file:///` → the file should live in Neo4j's import directory (default: `<neo4j_home>/import`).
- `row` → each row of the CSV is returned as a list of strings.
2. Example CSV
📄 users.csv
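The original file is not reproduced here; a small sample that matches the examples below could look like this:

```csv
id,name,age
1,Alice,30
2,Bob,25
3,Charlie,35
```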
3. Load and Create Nodes
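A sketch of the import query, assuming the sample columns above (`id`, `name`, `age`):

```cypher
LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row
CREATE (:User {id: toInteger(row.id), name: row.name, age: toInteger(row.age)});
```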
✅ Creates 3 `User` nodes.
4. Load Relationships
📄 friends.csv
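Assumed sample contents, referencing the user IDs from users.csv:

```csv
user1_id,user2_id
1,2
2,3
```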
Query:
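A sketch that matches the sample file above (the `FRIENDS_WITH` relationship type is an assumption):

```cypher
LOAD CSV WITH HEADERS FROM 'file:///friends.csv' AS row
MATCH (a:User {id: toInteger(row.user1_id)})
MATCH (b:User {id: toInteger(row.user2_id)})
CREATE (a)-[:FRIENDS_WITH]->(b);
```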
✅ Creates the corresponding relationships between the matched `User` nodes.
5. MERGE Instead of CREATE (avoid duplicates)
👉 Ensures nodes are not duplicated if the CSV is loaded again.
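A sketch of the idempotent version of the user import (same assumed columns as above):

```cypher
LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row
MERGE (u:User {id: toInteger(row.id)})
SET u.name = row.name,
    u.age  = toInteger(row.age);
```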
6. Handling Large CSVs
- Use `USING PERIODIC COMMIT` to commit in batches, as in the sketch below. Committing every 1000 rows prevents memory issues on large files.
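A sketch of a batched load (note that Neo4j 5 replaces this clause with `CALL { ... } IN TRANSACTIONS`):

```cypher
// In Neo4j Browser, prefix the query with :auto so it runs as an auto-commit transaction
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row
MERGE (u:User {id: toInteger(row.id)})
SET u.name = row.name, u.age = toInteger(row.age);
```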
7. Extra Tricks
- Skip empty values (first sketch below).
- Split list values from a CSV cell (second sketch): a cell like "Python;Neo4j;SQL" is stored as ["Python","Neo4j","SQL"].
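Sketches of both tricks; the `skills` column is an assumed addition to users.csv:

```cypher
// 1. Skip rows where the name cell is missing or empty
LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row
WITH row WHERE row.name IS NOT NULL AND trim(row.name) <> ''
MERGE (:User {id: toInteger(row.id), name: row.name});

// 2. Split a delimited cell such as "Python;Neo4j;SQL" into a list property
LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row
MERGE (u:User {id: toInteger(row.id)})
SET u.skills = split(row.skills, ';');
```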
⚡ Summary
- `LOAD CSV FROM ... AS row` → read the file.
- Use `WITH HEADERS` if the CSV has a header row.
- Convert data types with `toInteger()`, `toFloat()`.
- Use `MERGE` (not `CREATE`) to prevent duplicates.
- For large imports → `USING PERIODIC COMMIT`.
📥 Importing JSON / APIs into Neo4j
1. Using APOC Procedures (apoc.load.json)
👉 APOC (Awesome Procedures on Cypher) is an extension library for Neo4j.
It adds support for loading JSON directly from files or APIs.
Example JSON file
📄 users.json
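The original file is not reproduced here; an assumed sample that fits the queries below:

```json
[
  { "id": 1, "name": "Alice", "age": 30 },
  { "id": 2, "name": "Bob", "age": 25 }
]
```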
Load JSON from file
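A sketch of the call (loading local files requires `apoc.import.file.enabled=true` in apoc.conf; `apoc.load.json` yields one row per element of a top-level array):

```cypher
CALL apoc.load.json('file:///users.json') YIELD value
MERGE (u:User {id: value.id})
SET u.name = value.name, u.age = value.age;
```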
✅ Creates the `User` nodes.
2. Import from a REST API
If the API returns JSON, you can fetch it directly:
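A sketch with a placeholder URL; substitute the real endpoint:

```cypher
CALL apoc.load.json('https://example.com/api/users') YIELD value
MERGE (u:User {id: value.id})
SET u.name = value.name;
```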
✅ Imports users from a public API.
3. Handling Nested JSON
📄 Example JSON:
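An assumed sample of a nested document (one user with a list of projects):

```json
{
  "name": "Alice",
  "projects": [
    { "title": "Graph Demo" },
    { "title": "ETL Pipeline" }
  ]
}
```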
Cypher:
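A sketch that unwinds the nested array (file name and property names follow the sample above):

```cypher
CALL apoc.load.json('file:///nested_users.json') YIELD value
MERGE (u:User {name: value.name})
WITH u, value
UNWIND value.projects AS proj
MERGE (p:Project {title: proj.title})
MERGE (u)-[:WORKS_ON]->(p);
```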
✅ Creates the user node, the project nodes, and the relationships between them.
4. JSON from Parameters
You can also pass JSON into a query as a parameter (e.g., from a Python/Java app):
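A sketch of the parameterized query, assuming a `$users` parameter holding a list of maps:

```cypher
UNWIND $users AS user
MERGE (u:User {id: user.id})
SET u.name = user.name;
```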
And from your driver (Python example):
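A minimal sketch using the official `neo4j` Python driver; the connection URI and credentials are placeholders:

```python
from neo4j import GraphDatabase

# Placeholder connection details - adjust for your own instance
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

users = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]

query = """
UNWIND $users AS user
MERGE (u:User {id: user.id})
SET u.name = user.name
"""

with driver.session() as session:
    # The list of dicts is passed to Cypher as the $users parameter
    session.run(query, users=users)

driver.close()
```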
5. Best Practices
- Use `MERGE` instead of `CREATE` for re-runs.
- Use `UNWIND` to handle arrays.
- For huge JSON → break it into chunks before loading.
- Secure API calls → pass headers; APOC supports this via `apoc.load.jsonParams` (see the sketch below).
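A sketch of an authenticated call; the URL and the `$token` parameter are placeholders:

```cypher
// Second argument is a map of request headers, third is an optional payload
CALL apoc.load.jsonParams(
  'https://example.com/api/users',
  {Authorization: 'Bearer ' + $token},
  null
) YIELD value
MERGE (:User {id: value.id, name: value.name});
```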
✅ Summary
- CSV → `LOAD CSV`
- JSON / API → `apoc.load.json` or `apoc.load.jsonParams`
- Nested JSON → `UNWIND` arrays and create relationships
- From code → pass JSON params into Cypher
📦 Bulk Imports with neo4j-admin import
1. When to use
- Fresh database (not an existing one).
- Loading very large datasets.
- Input data is available in CSV format.
❌ Cannot be run against an active, running database.
2. Basic Command
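A sketch of the command; file names are placeholders, and the exact flags vary by version (Neo4j 5 uses `neo4j-admin database import full`):

```bash
neo4j-admin import \
  --database=graph.db \
  --nodes=users.csv \
  --relationships=relationships.csv
```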
✅ This creates a new database named `graph.db` with data from the CSV files.
3. CSV Format Requirements
- The header row defines the columns.
- Use `:ID`, `:LABEL`, `:START_ID`, `:END_ID`, and `:TYPE`.
- Properties are just column names.
Example: Users
📄 users.csv
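Assumed sample contents in the bulk-import header format:

```csv
id:ID,name,age:int,:LABEL
1,Alice,30,User
2,Bob,25,User
```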
Example: Projects
📄 projects.csv
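Assumed sample contents (IDs must be unique across all node files unless ID groups are used):

```csv
id:ID,title,:LABEL
101,Graph Demo,Project
102,ETL Pipeline,Project
```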
Example: Relationships
📄 works_on.csv
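Assumed sample contents linking the user and project IDs above:

```csv
:START_ID,:END_ID,:TYPE
1,101,WORKS_ON
2,102,WORKS_ON
```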
4. Run Import
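A sketch of the full command for the three sample files above (flags vary slightly by Neo4j version):

```bash
neo4j-admin import \
  --database=graph.db \
  --nodes=users.csv \
  --nodes=projects.csv \
  --relationships=works_on.csv
```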
✅ Resulting graph: `User` and `Project` nodes connected by `WORKS_ON` relationships.
5. Advanced Options
- `--high-io` → use maximum disk throughput.
- `--delimiter="|"` → custom CSV delimiter.
- `--array-delimiter=";"` → handle list values.
- `--skip-bad-relationships=true` → skip relationship rows that would cause errors instead of aborting.
6. Best Practices
- Always import into an empty database.
- Prepare clean CSVs (no missing IDs).
- Use integer IDs for speed.
- After the import, create indexes & constraints for your queries (see the sketch below).
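A sketch of post-import indexes and constraints, assuming the `User`/`Project` schema above (Neo4j 4.4+ syntax):

```cypher
// A uniqueness constraint also creates a backing index on User.id
CREATE CONSTRAINT user_id_unique IF NOT EXISTS
FOR (u:User) REQUIRE u.id IS UNIQUE;

CREATE INDEX project_title IF NOT EXISTS
FOR (p:Project) ON (p.title);
```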
✅ Summary
- `neo4j-admin import` = the fastest way to do a first-time bulk load.
- Needs structured CSV files with `:ID`, `:START_ID`, etc.
- Great for millions or billions of records.
- Not for updates → only for new databases.