Graph Databases Neo4j Guide | Pillai Infotech LLP

Graph Databases: Building Connected Data Applications with Neo4j

When relationships ARE your data (friend networks, fraud rings, supply chains, knowledge graphs), relational databases start to struggle. Graph databases make connected queries far easier to express and, at depth, much faster to run.

🗄️ Database & Data September 27, 2025 12 min read

In This Guide

1. When Relational Databases Struggle with Relationships
2. Graph Database Concepts — Nodes, Edges, Properties
3. Neo4j — Cypher Query Language
4. Real-World Use Cases
5. Neo4j vs Alternatives
6. When NOT to Use a Graph Database
7. Frequently Asked Questions

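In a relational database, finding "friends of friends who like the same products" requires multiple self-JOINs, and the intermediate results grow roughly exponentially with depth, because each hop multiplies the rows by the average number of friends. In a graph database, the same query is a traversal whose cost depends on how much of the neighborhood it touches rather than on the total size of the dataset, so it often runs in milliseconds even on large graphs. When your data is defined by connections, a graph database is more than a convenience: it is a fundamentally better-fitting model.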

1. When Relational Databases Struggle with Relationships

The JOIN Problem

-- SQL: "Find friends of friends who bought the same product"
-- This is already painful at 2 hops...
SELECT DISTINCT f2.name
FROM users u
JOIN friendships f1 ON u.id = f1.user_id
JOIN users friend ON f1.friend_id = friend.id
JOIN friendships f2_link ON friend.id = f2_link.user_id
JOIN users f2 ON f2_link.friend_id = f2.id
JOIN purchases p1 ON u.id = p1.user_id
JOIN purchases p2 ON f2.id = p2.user_id
WHERE u.id = 12345
  AND p1.product_id = p2.product_id
  AND f2.id != u.id;

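-- At 3+ hops, intermediate rows grow roughly as (average friends per user)^hops
-- Rough illustration (varies with indexes, hardware, and degree distribution):
-- 1M users, 10M friendships: 2 hops = seconds, 3 hops = minutes, 4 hops = timeout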

// Cypher (Neo4j): same query
MATCH (u:User {id: 12345})-[:FRIENDS_WITH*2]->(fof:User),
      (u)-[:BOUGHT]->(p:Product)<-[:BOUGHT]-(fof)
WHERE u <> fof
RETURN DISTINCT fof.name
// Cost grows with the neighborhood visited, not with the total graph size
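For intuition, here is a minimal Python sketch of the same query over in-memory adjacency sets. It is a hypothetical stand-in, not how Neo4j actually stores data: the point is that each hop reads a user's neighbor set directly instead of joining against a friendships table, which is the idea behind index-free adjacency. The names `friends`, `bought`, and `fof_with_shared_purchase` are illustrative.

```python
from collections import defaultdict

# Hypothetical in-memory graph: each user maps straight to its neighbors,
# so one hop is a dictionary lookup rather than a JOIN.
friends: dict[int, set[int]] = defaultdict(set)  # user -> users they are FRIENDS_WITH
bought: dict[int, set[str]] = defaultdict(set)   # user -> products bought


def fof_with_shared_purchase(user: int) -> set[int]:
    """Users exactly 2 friendship hops away who bought something `user` bought."""
    two_hops = {fof for friend in friends[user] for fof in friends[friend]}
    two_hops.discard(user)  # mirrors WHERE u <> fof
    mine = bought[user]
    return {fof for fof in two_hops if bought[fof] & mine}


# Tiny example: 1 -> 2 -> 3, and users 1 and 3 both bought a phone.
friends[1] = {2}
friends[2] = {1, 3}
bought[1] = {"phone"}
bought[3] = {"phone", "laptop"}
print(fof_with_shared_purchase(1))  # {3}
```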

2. Graph Database Concepts

Concept	Graph Term	SQL Equivalent	Example
Entity	Node	Row in a table	A person, product, or location
Connection	Edge (Relationship)	Foreign key / JOIN table	FRIENDS_WITH, BOUGHT, WORKS_AT
Attribute	Property	Column	name: "Alice", since: 2024
Category	Label	Table name	:User, :Product, :Company
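The four concepts in the table map onto a very small data model. The sketch below is a hypothetical in-memory property graph in Python (class and function names are illustrative, and Neo4j's real storage format is different): nodes carry labels and properties, relationships carry a type and properties, and each node keeps its own outgoing relationships, so a traversal never consults a global index.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)
class Node:
    labels: set[str]          # Label: the category, roughly a table name
    props: dict[str, Any]     # Properties: roughly columns
    # Outgoing relationships live on the node itself (repr=False avoids
    # infinite recursion, since each Rel points back to its nodes).
    out: list[Rel] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Rel:
    rel_type: str             # Relationship type, e.g. "FRIENDS_WITH"
    start: Node
    end: Node
    props: dict[str, Any] = field(default_factory=dict)  # e.g. {"since": 2022}


def connect(start: Node, rel_type: str, end: Node, **props: Any) -> Rel:
    """Create a directed, typed relationship and attach it to its start node."""
    rel = Rel(rel_type, start, end, props)
    start.out.append(rel)
    return rel


def neighbors(node: Node, rel_type: str) -> list[Node]:
    """Follow outgoing relationships of one type, with no global index lookup."""
    return [r.end for r in node.out if r.rel_type == rel_type]


alice = Node({"User"}, {"name": "Alice", "city": "Mumbai"})
bob = Node({"User"}, {"name": "Bob", "city": "Delhi"})
connect(alice, "FRIENDS_WITH", bob, since=2022)
print([n.props["name"] for n in neighbors(alice, "FRIENDS_WITH")])  # ['Bob']
```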

3. Neo4j — Cypher Query Language

Creating a Social Graph

// Create nodes (run the whole block as one statement so alice, bob, etc. stay in scope)
CREATE (alice:User {name: "Alice", age: 30, city: "Mumbai"})
CREATE (bob:User {name: "Bob", age: 28, city: "Delhi"})
CREATE (charlie:User {name: "Charlie", age: 35, city: "Mumbai"})
CREATE (phone:Product {name: "iPhone 16", price: 999, category: "Electronics"})
CREATE (laptop:Product {name: "MacBook Pro", price: 2499, category: "Electronics"})

// Create relationships (with properties)
CREATE (alice)-[:FRIENDS_WITH {since: 2022}]->(bob)
CREATE (bob)-[:FRIENDS_WITH {since: 2023}]->(charlie)
CREATE (alice)-[:BOUGHT {date: "2026-01-15", amount: 999}]->(phone)
CREATE (charlie)-[:BOUGHT {date: "2026-01-20", amount: 999}]->(phone)
CREATE (bob)-[:BOUGHT {date: "2025-12-01", amount: 2499}]->(laptop)

Querying — Pattern Matching with Cypher

// Recommendation: "People who bought this also bought..."
MATCH (p:Product {name: "iPhone 16"})<-[:BOUGHT]-(other:User)
      -[:BOUGHT]->(rec:Product)
WHERE rec <> p
RETURN rec.name, COUNT(DISTINCT other) AS score
ORDER BY score DESC
LIMIT 5
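The idea behind this query is collaborative-filtering arithmetic: score each other product by how many customers bought both it and the iPhone. A minimal Python sketch of that counting over a hypothetical purchases map (the data and the `also_bought` name are illustrative):

```python
from collections import Counter


def also_bought(purchases: dict[str, set[str]], product: str, limit: int = 5) -> list[tuple[str, int]]:
    """Rank other products by how many distinct buyers of `product` also bought them."""
    scores: Counter[str] = Counter()
    for items in purchases.values():
        if product in items:
            # Each co-buyer adds at most 1 per product, so a score is a count of distinct co-buyers.
            scores.update(items - {product})
    return scores.most_common(limit)  # highest score first, at most `limit` entries


purchases = {
    "alice": {"iPhone 16", "AirPods"},
    "charlie": {"iPhone 16", "AirPods", "MacBook Pro"},
    "bob": {"MacBook Pro"},
}
print(also_bought(purchases, "iPhone 16"))  # [('AirPods', 2), ('MacBook Pro', 1)]
```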

// Fraud detection: Find circular money transfers
MATCH path = (a:Account)-[:TRANSFERRED*3..6]->(a)
WHERE ALL(r IN relationships(path) WHERE r.amount > 10000)
RETURN path, length(path) AS hops
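Outside the database, the same search is a depth-first walk that only follows large transfers and reports paths that return to their start. A minimal Python sketch over a hypothetical list of `(from, to, amount)` transfers; as a simplification it only finds simple cycles (no account visited twice mid-cycle), whereas Cypher's variable-length match only forbids reusing the same relationship.

```python
def circular_transfers(
    transfers: list[tuple[str, str, float]],
    min_hops: int = 3,
    max_hops: int = 6,
    min_amount: float = 10_000,
) -> list[list[str]]:
    """Find cycles a -> ... -> a of min_hops..max_hops transfers, each above min_amount."""
    # Keep only large transfers, so every edge on a found path passes the amount filter.
    adj: dict[str, list[str]] = {}
    for src, dst, amount in transfers:
        if amount > min_amount:
            adj.setdefault(src, []).append(dst)

    cycles: list[list[str]] = []

    def dfs(start: str, node: str, path: list[str]) -> None:
        hops = len(path) - 1  # transfers used so far
        for nxt in adj.get(node, []):
            if nxt == start and min_hops <= hops + 1 <= max_hops:
                cycles.append(path + [start])
            elif nxt not in path and hops + 1 < max_hops:
                dfs(start, nxt, path + [nxt])

    # As with the Cypher query, each cycle is reported once per account on it.
    for account in adj:
        dfs(account, account, [account])
    return cycles


transfers = [
    ("A", "B", 15_000), ("B", "C", 20_000), ("C", "A", 12_000),
    ("C", "D", 500), ("D", "A", 50_000),
]
for cycle in circular_transfers(transfers):
    print(" -> ".join(cycle))
# A -> B -> C -> A
# B -> C -> A -> B
# C -> A -> B -> C
```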

// Shortest path between two users
MATCH path = shortestPath(
    (alice:User {name: "Alice"})-[:FRIENDS_WITH*]-(target:User {name: "Charlie"})
)
RETURN path, length(path) AS degrees_of_separation
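For an unweighted relationship like `FRIENDS_WITH`, a shortest path is exactly what breadth-first search finds: explore everyone one hop away, then two hops, and stop at the first hop count that reaches the target. A minimal Python sketch over a hypothetical edge list (it illustrates the concept and is not a description of Neo4j's internal algorithm):

```python
from collections import deque
from typing import Optional


def shortest_path(edges: list[tuple[str, str]], source: str, target: str) -> Optional[list[str]]:
    """Breadth-first search, treating each edge as undirected like -[:FRIENDS_WITH*]-."""
    adj: dict[str, list[str]] = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    parent: dict[str, Optional[str]] = {source: None}  # doubles as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent links back to the source, then reverse.
            path: list[str] = []
            step: Optional[str] = node
            while step is not None:
                path.append(step)
                step = parent[step]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


friendships = [("Alice", "Bob"), ("Bob", "Charlie")]
path = shortest_path(friendships, "Alice", "Charlie")
print(path, "degrees of separation:", len(path) - 1)
# ['Alice', 'Bob', 'Charlie'] degrees of separation: 2
```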

// Knowledge graph: Find all skills connected to a technology
MATCH (t:Technology {name: "Kubernetes"})-[:REQUIRES|USES*1..3]->(skill)
RETURN DISTINCT skill.name, labels(skill)

4. Real-World Use Cases

Use Case	Graph Pattern	Companies Using
Recommendation engines	Collaborative filtering via shared purchases/views	eBay, Walmart, Airbnb
Fraud detection	Circular transfers, identity clusters, device sharing	PayPal, HSBC, Citi
Knowledge graphs	Entities + relationships + semantic connections	Google, NASA, Novartis
Social networks	Friend suggestions, influence mapping, communities	LinkedIn, Twitter/X
Supply chain	Supplier dependencies, risk propagation	Maersk, Caterpillar
Access control (IAM)	User → Role → Permission → Resource traversals	Auth0, many enterprises
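The access-control row is a good example of a fixed-depth traversal: a user may perform an action on a resource if some path User → Role → Permission → Resource exists. A minimal Python sketch over hypothetical adjacency maps (the relationship names `HAS_ROLE`, `GRANTS`, `ON` and all data are illustrative); in Cypher the same check would be a single three-hop `MATCH` pattern.

```python
# Hypothetical IAM graph, one adjacency map per relationship type.
HAS_ROLE = {"alice": {"admin"}, "bob": {"viewer"}}                 # User -> Roles
GRANTS = {
    "admin": {"billing:write", "billing:read"},                   # Role -> Permissions
    "viewer": {"billing:read", "wiki:read"},
}
ON = {"billing:write": "billing-db", "billing:read": "billing-db", "wiki:read": "wiki"}  # Permission -> Resource
ACTION = {"billing:write": "write", "billing:read": "read", "wiki:read": "read"}        # Permission property


def can(user: str, action: str, resource: str) -> bool:
    """True if some path User -HAS_ROLE-> Role -GRANTS-> Permission -ON-> Resource matches."""
    return any(
        ACTION[perm] == action and ON[perm] == resource
        for role in HAS_ROLE.get(user, set())
        for perm in GRANTS.get(role, set())
    )


print(can("alice", "write", "billing-db"))  # True
print(can("bob", "write", "billing-db"))    # False
```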

5. Neo4j vs Alternatives

Database	Model	Best For	Pricing
Neo4j	Property graph (Cypher)	General graph workloads, developer experience	Community (free) / Enterprise
Amazon Neptune	Property graph + RDF (Gremlin, openCypher, SPARQL)	AWS-native, managed	Pay-per-use
ArangoDB	Multi-model (doc + graph + KV)	Teams needing graph + document in one	Open source / Cloud
Memgraph	Property graph (Cypher)	Real-time streaming graph analytics	Community (free) / Enterprise
PostgreSQL + Apache AGE	Graph extension for PG	Adding graph queries to existing PG	Free (extension)

6. When NOT to Use a Graph Database

Bulk analytics on tabular data — use a data warehouse or lakehouse
Simple CRUD with few relationships — PostgreSQL handles this better
Document storage — use MongoDB or DynamoDB
Time-series data — use a time-series database
Full-text search — use Elasticsearch

Our Approach: We don't recommend a graph database as your primary database. Use PostgreSQL as your system of record and add Neo4j as a secondary store for the specific queries that need graph traversal, syncing it via change data capture (CDC) or application-level events. This gives you the best of both worlds: PostgreSQL's mature transactional tooling for your core data, and Neo4j for relationship queries.
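The application-level sync described above boils down to mapping each row change in PostgreSQL to an idempotent Cypher write. A minimal Python sketch, assuming a hypothetical event shape `{"table", "op", "row"}` (real CDC tools such as Debezium use their own envelope format) and the `users`/`friendships` tables from the SQL example earlier. `MERGE` makes replayed events harmless, and parameters (`$id`, ...) keep values out of the query string.

```python
from typing import Any


def change_event_to_cypher(event: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Map one hypothetical row-change event to a parameterized, idempotent Cypher write."""
    table, op, row = event["table"], event["op"], event["row"]

    if table == "users":
        if op == "delete":
            # DETACH DELETE also removes the user's relationships.
            return "MATCH (u:User {id: $id}) DETACH DELETE u", {"id": row["id"]}
        # Inserts and updates both become an upsert.
        return (
            "MERGE (u:User {id: $id}) SET u.name = $name",
            {"id": row["id"], "name": row["name"]},
        )

    if table == "friendships":
        params = {"user_id": row["user_id"], "friend_id": row["friend_id"]}
        if op == "delete":
            return (
                "MATCH (:User {id: $user_id})-[f:FRIENDS_WITH]->(:User {id: $friend_id}) "
                "DELETE f",
                params,
            )
        # Assumes both user rows were synced before the friendship row.
        return (
            "MATCH (a:User {id: $user_id}), (b:User {id: $friend_id}) "
            "MERGE (a)-[:FRIENDS_WITH]->(b)",
            params,
        )

    raise ValueError(f"no graph mapping for table {table!r}")


query, params = change_event_to_cypher(
    {"table": "users", "op": "insert", "row": {"id": 12345, "name": "Alice"}}
)
print(query)   # MERGE (u:User {id: $id}) SET u.name = $name
print(params)  # {'id': 12345, 'name': 'Alice'}
```

The returned `(query, params)` pair can then be handed to a driver, for example `session.run(query, params)` in the official Neo4j Python driver.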

Frequently Asked Questions

Can't I just use SQL JOINs instead of a graph database?

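For 1-2 hops, yes: SQL JOINs work fine. At 3+ hops (friends of friends of friends), SQL performance tends to degrade sharply. Each hop adds another self-JOIN, and each JOIN multiplies the intermediate rows by the average number of connections, so k hops with average degree d produce on the order of d^k rows. Graph databases use index-free adjacency: each node holds direct references to its relationships, so following one relationship is a constant-time pointer hop rather than an index lookup whose cost grows with table size. If your queries involve variable-depth traversals, a graph database can be orders of magnitude faster.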
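The "each hop multiplies" effect is easy to check numerically. The sketch below builds a hypothetical synthetic graph in which every user has exactly 10 outgoing friendships, then counts the rows a chain of self-JOINs without DISTINCT would produce at each depth, i.e. the number of k-hop paths starting from one user. With degree d, that count is d^k.

```python
def count_walks(adj: dict[int, list[int]], start: int, hops: int) -> int:
    """Count paths of exactly `hops` edges from `start` (repeat visits allowed, as in chained JOINs)."""
    frontier = {start: 1}  # node -> number of paths ending there (merging keeps this sketch cheap)
    for _ in range(hops):
        nxt: dict[int, int] = {}
        for node, paths in frontier.items():
            for neighbor in adj[node]:
                nxt[neighbor] = nxt.get(neighbor, 0) + paths
        frontier = nxt
    return sum(frontier.values())


# Hypothetical graph: 1,000 users, user i befriends users i+1 .. i+10 (wrapping around).
n, degree = 1000, 10
adj = {i: [(i + j) % n for j in range(1, degree + 1)] for i in range(n)}
for k in range(1, 5):
    print(k, count_walks(adj, 0, k))
# 1 10
# 2 100
# 3 1000
# 4 10000
```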

How large can a Neo4j graph be?

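Neo4j can handle billions of nodes and relationships on a single instance. The graph lives on disk and is cached in memory (Neo4j's page cache), so queries are fastest when the frequently accessed part of the graph fits in RAM; anything else is read from disk on demand, at a latency cost. For truly massive graphs, Neo4j 5.x supports sharding via Fabric (composite databases), which lets a single query span several databases; you decide how the graph is partitioned across them. In practice, many enterprise graphs are under 1 billion nodes, well within single-instance capacity.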

Is GQL the new standard for graph queries?

GQL (Graph Query Language) was approved as an ISO standard (ISO/IEC 39075) in 2024. It is heavily influenced by Cypher (Neo4j's language) and aims to play the role for graph databases that SQL plays for relational ones. Neo4j, Oracle, and others are adopting it. Because much of GQL's pattern-matching syntax comes from Cypher, learning Cypher today gets you most of the way to GQL, so Cypher skills are a safe investment.

How do graph databases handle transactions?

Neo4j supports full ACID transactions: a transaction's writes are applied atomically and durably, and by default other transactions see only committed data (read-committed isolation). This is a significant advantage over some NoSQL alternatives. Relational databases, however, offer stricter isolation levels (such as serializable) and more mature tooling for complex multi-table transactional workloads. Use a relational database for financial transactions, a graph database for relationship queries.

Should I use graph databases for RAG (AI knowledge graphs)?

Yes. GraphRAG (combining vector search with knowledge graphs) is one of the most promising approaches for improving LLM accuracy. Store entities and relationships in a graph database, use vector similarity for initial retrieval, then traverse the graph for context enrichment. Neo4j has built-in vector search indexes for this pattern. It's more complex than basic RAG but can produce noticeably better results for domain-specific questions.
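The two-step pattern (vector similarity for initial retrieval, then graph traversal for context) fits in a few lines. A minimal Python sketch with toy 2-dimensional embeddings and a hypothetical one-hop expansion; in a real system the vectors would come from an embedding model and both steps would run against the database (for example, a Neo4j vector index followed by a Cypher traversal).

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def graph_rag_context(
    query_vec: list[float],
    embeddings: dict[str, list[float]],
    edges: dict[str, list[tuple[str, str]]],
    top_k: int = 1,
) -> list[str]:
    """Pick the top_k most similar entities, then add their one-hop relationships as facts."""
    # Step 1: vector similarity for initial retrieval.
    seeds = sorted(embeddings, key=lambda e: cosine(query_vec, embeddings[e]), reverse=True)[:top_k]
    # Step 2: graph traversal for context enrichment.
    return [f"{entity} {rel} {target}" for entity in seeds for rel, target in edges.get(entity, [])]


embeddings = {"Kubernetes": [0.9, 0.1], "Pandas": [0.1, 0.9]}
edges = {
    "Kubernetes": [("USES", "containers"), ("REQUIRES", "YAML")],
    "Pandas": [("USES", "NumPy")],
}
print(graph_rag_context([1.0, 0.0], embeddings, edges))
# ['Kubernetes USES containers', 'Kubernetes REQUIRES YAML']
```

The returned facts would then be added to the LLM prompt alongside the retrieved text.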


Pillai Infotech LLP

We build graph-powered applications — from recommendation engines to knowledge graphs and fraud detection systems. Let's explore how graphs can solve your data challenges.
