Updated Case Study: Addressing MongoDB Vector Search Limitations with Neo4j

3 min read · Oct 6, 2024

In our original case study, we used MongoDB for vector storage and similarity search in the real estate sector. For context, you can read it at the link below:

Case Study — Custom LLM Powered Knowledge Base for Real Estate Industry using Gemma and MongoDB

However, as we scaled, MongoDB presented several challenges.

Scalability

MongoDB’s document-based model struggles with large-scale, high-dimensional vector search. With exact (brute-force) search, the query vector is compared against every stored vector, so each query costs O(N) similarity computations and gets slower as the dataset grows.

Performance

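Query performance degrades because MongoDB has no optimization for the relationships between vectors: every piece of related context has to be resolved with additional lookups on top of the similarity search.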

Limited Graph Capabilities

MongoDB lacks native graph processing, making it inefficient for multi-hop contextual queries and relationship-driven data retrieval.

To overcome these issues, we transitioned to Neo4j, a graph database with built-in vector indexing that is well suited to similarity search when the relationships between vectors are key.

Transition to Neo4j — A Graph-Native Solution

We settled on Neo4j based on the following criteria.

Graph-Based Traversal for Similarity Search

Neo4j’s vector index is built on Hierarchical Navigable Small World (HNSW) graphs, an approximate nearest-neighbor structure designed for high-dimensional vectors. Instead of performing O(N) brute-force comparisons as in MongoDB’s exact search, an HNSW search visits roughly O(log N) vectors on average, which dramatically improves query latency in exchange for returning approximate rather than exact neighbors.
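As a rough sketch of what this can look like in practice, the snippet below holds the Cypher for creating a Neo4j vector index and querying it from Python. The label `Document`, the properties `embedding` and `title`, the index name `document_embeddings`, and the 768-dimension setting are all assumptions for illustration, not the values from our deployment. The `session` argument is expected to be a session from the official `neo4j` Python driver.

```python
# Hypothetical names: adjust the label, properties, index name and dimensions.
INDEX_NAME = "document_embeddings"

# Run once to create the HNSW-backed vector index.
CREATE_INDEX = f"""
CREATE VECTOR INDEX {INDEX_NAME} IF NOT EXISTS
FOR (d:Document) ON (d.embedding)
OPTIONS {{indexConfig: {{
  `vector.dimensions`: 768,
  `vector.similarity_function`: 'cosine'
}}}}
"""

# Approximate k-nearest-neighbor lookup against that index.
QUERY_INDEX = """
CALL db.index.vector.queryNodes($index_name, $k, $embedding)
YIELD node, score
RETURN node.title AS title, score
ORDER BY score DESC
"""


def nearest_documents(session, embedding, k=5):
    """Return (title, score) pairs for the k most similar documents."""
    result = session.run(
        QUERY_INDEX,
        index_name=INDEX_NAME,
        k=k,
        embedding=list(embedding),
    )
    return [(record["title"], record["score"]) for record in result]
```

The index is created once with `session.run(CREATE_INDEX)`; after that, each search is a single procedure call rather than a scan over every stored vector.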

Here is how it differs from MongoDB Vector Search:

MongoDB’s k-NN algorithm searches for the closest vectors by computing pairwise distances between the query vector and all stored vectors.

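For each query, MongoDB computes a distance or similarity score (for example, Euclidean distance or cosine similarity) between the query vector and each stored vector.

This results in O(N) distance computations per query, or O(N·d) arithmetic operations overall, where N is the number of vectors and d is the vector dimensionality.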
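To make the cost concrete, here is a minimal brute-force k-NN sketch in plain Python. It assumes Euclidean distance purely for illustration (cosine similarity would have the same cost profile), and the function names are ours, not a MongoDB API.

```python
import heapq
import math


def euclidean(a, b):
    """O(d) work: one subtraction and one multiplication per dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(query, vectors, k):
    """Return the k (distance, index) pairs closest to `query`.

    Every stored vector is compared exactly once, so a query costs
    N distance computations regardless of where the answer lies.
    """
    distances = ((euclidean(query, v), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, distances)


vectors = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
nearest = brute_force_knn((1.0, 1.0), vectors, k=2)
print([index for _, index in nearest])
```

Doubling the number of stored vectors doubles the work per query, which is the scaling behavior we ran into.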

Neo4j uses HNSW to traverse the graph of vectors. Instead of brute-force comparison, HNSW organizes vectors into hierarchical layers, where nearest neighbors are connected based on a graph structure.

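The search cost in Neo4j drops to roughly O(log N) on average for nearest-neighbor lookups, resulting in significant performance improvements in high-dimensional spaces. The trade-off is that HNSW results are approximate: the search usually finds the true nearest neighbors, but this is not guaranteed.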
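The intuition behind the graph walk can be shown with a deliberately simplified toy: a single-layer graph over 1-D points, with hand-built short and long-range links. This is not Neo4j's implementation and not full HNSW (which builds its layers from the data and keeps a candidate list rather than a single current node); it only illustrates why following links greedily touches far fewer vectors than a full scan.

```python
def build_graph(n):
    """Link node i to i ± 1, 2, 4, 8, ... (short and long-range edges)."""
    graph = {}
    for i in range(n):
        neighbors = []
        step = 1
        while step < n:
            if i - step >= 0:
                neighbors.append(i - step)
            if i + step < n:
                neighbors.append(i + step)
            step *= 2
        graph[i] = neighbors
    return graph


def greedy_search(graph, points, query, entry=0):
    """Move to the neighbor closest to `query` until no neighbor improves.

    Returns (best_index, number_of_distance_evaluations).
    """
    current = entry
    best = abs(points[current] - query)
    evaluations = 1
    while True:
        candidate, candidate_dist = current, best
        for nb in graph[current]:
            dist = abs(points[nb] - query)
            evaluations += 1
            if dist < candidate_dist:
                candidate, candidate_dist = nb, dist
        if candidate == current:  # local minimum reached
            return current, evaluations
        current, best = candidate, candidate_dist


points = [float(i) for i in range(10_000)]  # toy 1-D "vectors"
graph = build_graph(len(points))
index, evaluations = greedy_search(graph, points, query=7321.4)
print(index, evaluations, "of", len(points))
```

The long-range links let the walk cover most of the distance in a few hops, and the short links refine the result, so the number of distance evaluations stays a small fraction of the 10,000 stored points.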

Scalability and Performance

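Neo4j’s graph structure supports faster, more scalable searches through index-free adjacency: each node stores direct references to its neighbors, so following a relationship does not require an index lookup or a join. When vector data is interconnected, traversing these relationships is cheaper than resolving the same links through separate queries in MongoDB.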
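The difference can be sketched in a few lines of Python. The `Node` class, the node names, and the edge list below are hypothetical, and real storage engines are far more involved; the point is only that one approach scans or probes a global structure, while the other follows references already stored on the node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    neighbors: list = field(default_factory=list)  # direct references


def neighbors_via_edge_table(name, edges):
    """Join-style lookup: cost depends on the size of the global edge list."""
    return [dst for src, dst in edges if src == name]


def neighbors_via_adjacency(node):
    """Index-free adjacency: cost depends only on this node's own degree."""
    return [n.name for n in node.neighbors]


# Hypothetical real estate example.
project = Node("Riverside Towers")
permit = Node("Building Permit")
zoning = Node("Zoning Regulation R-4")
project.neighbors.extend([permit, zoning])

edges = [
    ("Riverside Towers", "Building Permit"),
    ("Riverside Towers", "Zoning Regulation R-4"),
    ("Another Project", "Building Permit"),
]

print(neighbors_via_edge_table("Riverside Towers", edges))
print(neighbors_via_adjacency(project))
```

Both calls return the same neighbors, but only the first one has to look at edges belonging to other projects.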

Contextual and Relationship-Based Searches

Unlike MongoDB’s isolated vector searches, Neo4j excels at multi-hop queries. In Neo4j, vectors can be connected via relationships (edges), allowing for better context and richer, more relevant query responses. For example, searching for related projects in real estate becomes more powerful when we account for related compliance documents, zoning regulations, and previous project histories.
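A query of this kind might look like the sketch below, which starts from a vector search and then expands up to two hops along relationships. The label-free pattern, the relationship types `HAS_DOCUMENT` and `SUBJECT_TO`, the properties `name` and `title`, and the index name `project_embeddings` are assumptions for illustration and do not describe our actual schema. As before, `session` is expected to be a `neo4j` driver session.

```python
# Hypothetical index name, relationship types and properties.
INDEX_NAME = "project_embeddings"

CONTEXT_QUERY = """
CALL db.index.vector.queryNodes($index_name, $k, $embedding)
YIELD node AS project, score
OPTIONAL MATCH (project)-[:HAS_DOCUMENT|SUBJECT_TO*1..2]-(related)
RETURN project.name AS project_name,
       score,
       collect(DISTINCT related.title) AS context
ORDER BY score DESC
"""


def similar_projects_with_context(session, embedding, k=3):
    """Vector search first, then gather related documents within two hops."""
    result = session.run(
        CONTEXT_QUERY,
        index_name=INDEX_NAME,
        k=k,
        embedding=list(embedding),
    )
    return [
        {
            "project": record["project_name"],
            "score": record["score"],
            "context": list(record["context"]),
        }
        for record in result
    ]
```

Because the similarity search and the relationship expansion happen in one query, the compliance documents and zoning regulations attached to each matching project come back together with the match itself.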

Here is a detailed comparison of the key aspects we considered:

Conclusion

While MongoDB is effective for simple vector similarity searches, it struggles with performance as the dataset grows and relationships between vectors become more important. Neo4j, with its graph-based architecture, efficiently handles large-scale, high-dimensional data, making it ideal for real estate applications where contextual relationships between documents (e.g., zoning laws, compliance reports, project histories) are crucial. By leveraging Neo4j, we can perform faster, more accurate searches, optimize resource use, and scale the system for future growth.

Incorporating Neo4j has revolutionized our LLM-Powered Knowledge Base, enabling us to provide real estate firms with more powerful, scalable, and contextually relevant data retrieval. As we continue to improve this system, we anticipate even greater efficiencies and insights for our clients.

Feel free to share among your developer friends and also don’t forget to clap :)