How to Import CSV Files into Neo4j: A Step-by-Step Tutorial

03.10.2024

Before importing a CSV file into Neo4j, it is crucial to ensure that the data is properly formatted and structured. A well-prepared file minimizes errors and improves query performance.

1. Formatting Requirements

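  • Use a **comma (`,`)** as the delimiter, which is what `LOAD CSV` expects by default. A **semicolon (`;`)** also works, provided you declare it with `FIELDTERMINATOR ';'` in the import query.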
  • Ensure that the file is encoded in UTF-8 to avoid character issues.
  • Column names should be clear, without spaces (use underscores instead).
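
The formatting rules above can be checked before the file ever reaches Neo4j. The following is a minimal sketch using only the Python standard library; the function names `check_headers` and `is_utf8` are illustrative and not part of any Neo4j tooling.

```python
import csv
import io


def check_headers(csv_text, delimiter=","):
    """Return a list of problems found in the header row of csv_text."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, [])
    problems = []
    for name in header:
        if not name.strip():
            problems.append("empty column name")
        elif " " in name:
            problems.append(f"column name contains spaces: {name!r}")
    if len(set(header)) != len(header):
        problems.append("duplicate column names")
    return problems


def is_utf8(raw_bytes):
    """Return True if raw_bytes decodes cleanly as UTF-8."""
    try:
        raw_bytes.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A header with a space in it is reported; a clean header yields an empty list.
print(check_headers("user_id,first name,age\n1,Alice,30\n"))
print(check_headers("user_id,first_name,age\n1,Alice,30\n"))
```

To check a real file, read it in binary mode, pass the bytes to `is_utf8`, and pass the decoded text to `check_headers`.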

2. Structuring Your Data

Your CSV file should be structured in a way that represents nodes and relationships separately:

“A graph database is most effective when data is normalized and stored in a structured manner.”

Example 1: Nodes CSV (Users)

user_id,name,age,email
1,Alice,30,alice@email.com
2,Bob,25,bob@email.com

Example 2: Relationships CSV (Friendship)

user_id_1,user_id_2,since
1,2,2022
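
Because the relationships file refers to users by ID, it is worth confirming that every ID it mentions actually exists in the nodes file. The sketch below does this for the two sample files above, using only the Python standard library; `dangling_references` is a hypothetical helper name.

```python
import csv
import io

USERS_CSV = """user_id,name,age,email
1,Alice,30,alice@email.com
2,Bob,25,bob@email.com
"""

FRIENDSHIPS_CSV = """user_id_1,user_id_2,since
1,2,2022
"""


def dangling_references(users_text, friendships_text):
    """Return (column, value) pairs for relationship IDs with no matching user."""
    known_ids = {row["user_id"] for row in csv.DictReader(io.StringIO(users_text))}
    missing = []
    for row in csv.DictReader(io.StringIO(friendships_text)):
        for column in ("user_id_1", "user_id_2"):
            if row[column] not in known_ids:
                missing.append((column, row[column]))
    return missing


# An empty list means every relationship points at users that exist.
print(dangling_references(USERS_CSV, FRIENDSHIPS_CSV))
```

Rows reported here would otherwise be skipped silently by an import query that looks up both endpoints with MATCH.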

3. Handling Special Cases

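  • Null values: Leave the field empty to represent missing data. `LOAD CSV` reads an empty field as null, whereas a literal `NULL` is imported as the string "NULL".
  • Escape characters: If a field contains commas, enclose it in double quotes (`"`). A double quote inside a quoted field is written as two double quotes (`""`).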
  • Dates: Ensure consistent formats like `YYYY-MM-DD`.
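
If the CSV is generated by a script, a CSV library can take care of quoting automatically. The following sketch uses Python's standard `csv` module; the sample rows and the `joined` column are made up for illustration.

```python
import csv
import io

# Illustrative data: one name contains a comma, one age is missing.
rows = [
    {"user_id": 1, "name": "Smith, Alice", "age": 30, "joined": "2022-05-01"},
    {"user_id": 2, "name": "Bob", "age": None, "joined": "2023-01-15"},
]

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["user_id", "name", "age", "joined"],
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)  # None is written as an empty field

csv_text = buffer.getvalue()
print(csv_text)
```

With the default quoting mode, only the field that contains a comma is wrapped in double quotes, and the missing age appears as an empty field between two commas.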

Once your CSV file is formatted correctly, you are ready to import it into **Neo4j** without issues. Next, we’ll configure Neo4j for a smooth import process.

Before importing a CSV file into Neo4j, you must ensure the file is properly structured. A well-organized dataset reduces errors and improves query efficiency.

1. Formatting Your CSV File

  • Use a comma (`,`) as the delimiter; a semicolon (`;`) also works if the import query declares it with `FIELDTERMINATOR ';'`.
  • Ensure the file is encoded in UTF-8 to prevent character issues.
  • Column names should be simple, without spaces (use underscores instead, e.g., first_name).

2. Structuring Data for Neo4j

Neo4j does not strictly require it, but keeping nodes and relationships in separate CSV files makes the import queries much simpler. Each file should follow a consistent structure.

“Graph databases perform best when data is structured according to relationships rather than traditional table-based formats.”

Example: Nodes CSV (Users)

user_id,name,age,email
1,Alice,30,alice@email.com
2,Bob,25,bob@email.com

Example: Relationships CSV (Friendship)

user_id_1,user_id_2,since
1,2,2022

3. Handling Common Issues

  • Missing values: Leave the field empty where data is unavailable. A literal `NULL` is imported as the string "NULL", not as a missing value.
  • Escape characters: If a field contains commas, enclose it in double quotes (`"`).
  • Date formats: Use a consistent format, such as YYYY-MM-DD, to avoid errors.

4. Verifying Your CSV File

Before importing, review the file to ensure data integrity. You can open it in a text editor or spreadsheet tool.
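
For files too large to inspect by eye, part of this review can be automated. The sketch below flags rows whose number of fields differs from the header, a common sign of an unquoted comma or a truncated line. It uses only the Python standard library, and `find_ragged_rows` is an illustrative name.

```python
import csv
import io


def find_ragged_rows(csv_text, delimiter=","):
    """Return (line_number, field_count) for rows that do not match the header width."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    bad = []
    for row in reader:
        if not row:
            continue  # ignore completely blank lines
        if len(row) != len(header):
            # reader.line_num is the physical line the row ended on
            bad.append((reader.line_num, len(row)))
    return bad


sample = (
    "user_id,name,age,email\n"
    "1,Alice,30,alice@email.com\n"
    "2,Bob,25\n"
)
print(find_ragged_rows(sample))  # [(3, 3)]
```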

With your CSV properly formatted, you’re ready to proceed with importing it into Neo4j efficiently.

Before you can import your CSV file into Neo4j, you need to ensure the database is properly set up. This includes configuring Neo4j settings, placing your CSV files in the correct directory, and verifying database access.

1. Configuring Neo4j for CSV Import

For security reasons, Neo4j by default only lets LOAD CSV read local files from a dedicated import directory. The location of that directory is set in the Neo4j configuration file.

Steps to Enable File Import:

  • Locate the neo4j.conf file in the conf/ directory of your Neo4j installation.
  • Find the following line (in Neo4j 5 the setting is named `server.directories.import`):

    #dbms.directories.import=import

  • If it is commented out, remove the leading `#` so that it reads:

    dbms.directories.import=import

  • Save the file and restart Neo4j.

2. Placing CSV Files in the Import Directory

Neo4j requires CSV files to be in the import directory of your Neo4j installation. To move your files:

mv your_data.csv /path/to/neo4j/import/
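
Since the `file:///` URL in an import query is resolved relative to the import directory, it can help to derive the URL from the file's location rather than typing it by hand. The following is a rough sketch under that assumption; `load_csv_url` is a hypothetical helper, and the paths are the placeholder paths used above. It does not URL-encode special characters such as spaces.

```python
from pathlib import PurePosixPath


def load_csv_url(import_dir, csv_path):
    """Build the file:/// URL for csv_path, which must lie inside import_dir."""
    # relative_to raises ValueError when csv_path is not under import_dir
    relative = PurePosixPath(csv_path).relative_to(PurePosixPath(import_dir))
    if ".." in relative.parts:
        raise ValueError(f"{csv_path} escapes the import directory")
    return "file:///" + relative.as_posix()


print(load_csv_url("/path/to/neo4j/import", "/path/to/neo4j/import/your_data.csv"))
# file:///your_data.csv
```

A `ValueError` here is an early hint that the import query would fail because the file is not where Neo4j is allowed to look.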

3. Verifying Database Access

Ensure Neo4j is running and accessible before attempting the import. Open Neo4j Browser and run:

:server status

If Neo4j is running, proceed with the import. Otherwise, start the service from a terminal:

neo4j start

4. Testing CSV File Readability

Before full import, test whether Neo4j can read your CSV file:

LOAD CSV WITH HEADERS FROM 'file:///your_data.csv' AS row RETURN row LIMIT 5;
  • If the query returns data: The file is correctly placed, and you can proceed with import.
  • If an error occurs: Check file placement and permissions.

Once Neo4j is properly set up, you’re ready to begin the actual data import process.

Once your CSV file is prepared and Neo4j is properly configured, you can import the data using Cypher commands. Cypher provides a powerful and flexible way to load structured data into your Neo4j database.

1. Understanding the LOAD CSV Command

The LOAD CSV command in Cypher is used to read data from a CSV file and process it into nodes and relationships.

Basic Syntax:

LOAD CSV WITH HEADERS FROM 'file:///your_data.csv' AS row RETURN row LIMIT 5;

This command checks if the CSV file is accessible and displays the first five rows.

2. Creating Nodes from CSV Data

To create nodes in Neo4j from your CSV file, use:

LOAD CSV WITH HEADERS FROM 'file:///your_data.csv' AS row CREATE (:Person {name: row.name, age: toInteger(row.age), city: row.city});
  • WITH HEADERS ensures the first row is treated as column headers.
  • CREATE adds a node with the label Person and the listed properties (this example assumes the file has name, age, and city columns).
  • toInteger() converts text values into numbers where necessary.

3. Creating Relationships from CSV Data

If your CSV file contains relationships, look up the two endpoint nodes with MATCH and connect them with CREATE:

LOAD CSV WITH HEADERS FROM 'file:///relationships.csv' AS row MATCH (a:Person {name: row.person1}), (b:Person {name: row.person2}) CREATE (a)-[:FRIENDS_WITH]->(b);
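  • MATCH finds the existing nodes; rows whose nodes cannot be found produce no relationship.
  • CREATE establishes a relationship between them. Use MERGE instead if the import may run more than once, so the same relationship is not created twice.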

4. Handling Large Imports Efficiently

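For large datasets on Neo4j 4.x, prefix the query with USING PERIODIC COMMIT so the import is committed in batches rather than in one large transaction:

USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM 'file:///large_data.csv' AS row CREATE (:Person {name: row.name, age: toInteger(row.age)});

This commits after every 1,000 rows, reducing memory load. USING PERIODIC COMMIT was deprecated in Neo4j 4.4 and removed in Neo4j 5, where the equivalent is to wrap the write in CALL { ... } IN TRANSACTIONS OF 1000 ROWS.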
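
The batching idea itself is independent of Neo4j and can be illustrated in a few lines. The sketch below splits CSV rows into groups of 1,000 using the Python standard library; it only demonstrates the chunking, does not talk to a database, and is not a description of how Neo4j implements batching internally. `batched_rows` is an illustrative name.

```python
import csv
import io
from itertools import islice


def batched_rows(csv_text, batch_size=1000):
    """Yield lists of at most batch_size row dictionaries from csv_text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# 2,500 generated rows are split into two full batches and one partial batch.
sample = "name,age\n" + "".join(f"user{i},{20 + i % 50}\n" for i in range(2500))
sizes = [len(batch) for batch in batched_rows(sample)]
print(sizes)  # [1000, 1000, 500]
```

In a client-side import, each batch would be written in its own transaction, so a failure late in the file does not discard the rows already committed.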

5. Verifying Data After Import

To check imported data, run:

MATCH (p:Person) RETURN p LIMIT 10;

This query retrieves the first ten Person nodes for validation.

By following these steps, you can efficiently import CSV files into Neo4j using Cypher commands, ensuring structured and connected data within your graph database.

When importing CSV files into Neo4j, you may encounter various errors that prevent a smooth data import. Understanding common issues and their solutions can save time and frustration. Below are some of the most frequent problems and how to resolve them.

1. File Not Found Error

If Neo4j can’t locate your CSV file, check the following:

  • Ensure the file is in the import directory. By default, Neo4j only reads files from neo4j/import.
  • Use the correct file path. The command should follow this format:
LOAD CSV WITH HEADERS FROM 'file:///your_data.csv' AS row RETURN row LIMIT 5;

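Loading from file:/// URLs also requires the following setting in neo4j.conf, which is enabled by default:

dbms.security.allow_csv_import_from_file_urls=true

This setting does not widen access beyond the import folder. To read a file stored elsewhere, either move it into the import folder or point dbms.directories.import at the directory that contains it.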

2. Incorrect Data Formatting

Neo4j expects properly formatted CSV files. Common mistakes include:

  • Missing headers: Ensure the first row contains column names.
  • Extra spaces: Use TRIM() to clean up whitespace:
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row CREATE (:Person {name: TRIM(row.name), age: toInteger(row.age)});

3. Type Conversion Errors

If numbers or dates fail to import correctly, check the data types:

  • Convert string to integer: toInteger(row.age)
  • Convert string to float: toFloat(row.salary)
  • Convert string to date: date(row.birthday), which expects an ISO 8601 string such as 1990-05-17
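
Conversion problems are easier to fix in the CSV than to debug after a failed import. The sketch below scans a file for values that would not convert, using the column names age, salary, and birthday from the list above. It relies on Python's own parsing rules, which are an approximation of Cypher's rather than an exact match, and `conversion_errors` is an illustrative name.

```python
import csv
import io
from datetime import date

# Column name -> Python function that raises ValueError on bad input.
CONVERTERS = {"age": int, "salary": float, "birthday": date.fromisoformat}


def conversion_errors(csv_text, converters=CONVERTERS):
    """Return (line_number, column, value) for every value that fails to convert."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        for column, convert in converters.items():
            value = row.get(column)
            if value is None or value == "":
                continue  # missing values are handled separately
            try:
                convert(value)
            except ValueError:
                errors.append((reader.line_num, column, value))
    return errors


sample = (
    "name,age,salary,birthday\n"
    "Alice,30,5000.50,1994-02-01\n"
    "Bob,abc,,01/02/1999\n"
)
print(conversion_errors(sample))
# [(3, 'age', 'abc'), (3, 'birthday', '01/02/1999')]
```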

4. Duplicate Data Issues

To prevent duplicate nodes, use MERGE instead of CREATE:

MERGE (p:Person {name: row.name}) ON CREATE SET p.age = toInteger(row.age);
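
MERGE only avoids duplicates for the property it matches on, so it is useful to know beforehand whether that property is actually unique in the file. The following sketch counts repeated values in one column with the Python standard library; `duplicate_values` is an illustrative name, and the sample data is made up.

```python
import csv
import io
from collections import Counter


def duplicate_values(csv_text, column):
    """Return {value: count} for values that appear more than once in column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[column].strip() for row in reader)
    return {value: n for value, n in counts.items() if n > 1}


sample = "name,age\nAlice,30\nBob,25\nAlice,31\n"
print(duplicate_values(sample, "name"))  # {'Alice': 2}
```

In this sample, merging on name would fold two different people called Alice into one node, which suggests merging on a unique identifier instead.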

5. Memory and Performance Problems

For large imports, Neo4j may slow down or crash. Optimize performance by:

  • Using batch commits: Load data in chunks with the following (Neo4j 4.x syntax; Neo4j 5 uses CALL { ... } IN TRANSACTIONS instead):
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM 'file:///large_data.csv' AS row CREATE (:Person {name: row.name, age: toInteger(row.age)});
  • Indexing nodes: Create an index on the properties used in MATCH or MERGE before importing, so each lookup does not scan every node:
CREATE INDEX FOR (p:Person) ON (p.name);

6. Encoding Issues

If you see strange characters, the file is probably not encoded in UTF-8. Convert it, for example from ISO-8859-1:

iconv -f ISO-8859-1 -t UTF-8 input.csv > output.csv
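
Where iconv is not available, the same conversion can be done with a short script. This is a minimal sketch using only the Python standard library; it assumes the source encoding is known (ISO-8859-1 here, as in the command above), and the function names are illustrative.

```python
def reencode_to_utf8(raw_bytes, source_encoding="iso-8859-1"):
    """Decode raw_bytes from source_encoding and return them encoded as UTF-8."""
    return raw_bytes.decode(source_encoding).encode("utf-8")


def convert_file(src_path, dst_path, source_encoding="iso-8859-1"):
    """Rewrite src_path as UTF-8 at dst_path, line by line, keeping line endings."""
    with open(src_path, "r", encoding=source_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


# "José" is 4 bytes in ISO-8859-1 and 5 bytes in UTF-8.
latin1_bytes = "José".encode("iso-8859-1")
print(len(latin1_bytes), len(reencode_to_utf8(latin1_bytes)))  # 4 5
```

Calling `convert_file("input.csv", "output.csv")` mirrors the iconv command. If the source encoding is guessed wrongly, the script still runs but produces the wrong characters, so spot-check the output.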

7. Verifying Data After Import

To confirm data is correctly loaded, run:

MATCH (p:Person) RETURN p LIMIT 10;

By addressing these common import issues, you can ensure a smooth and error-free data import process in Neo4j.

What is the first step in preparing a CSV file for Neo4j import?

The first step is to ensure that your CSV file is properly formatted with the correct headers and data types.

Which command is used to import CSV files in Neo4j?

The command used is `LOAD CSV WITH HEADERS FROM 'file:///path/to/your/file.csv' AS row`

What should you do if you encounter an import error?

Check the error message for details, ensure your CSV file is correctly formatted, and verify that the Neo4j database is properly configured.

Yan Hadzhyisky

fullstack PHP+JS+REACT developer