Updated Case Study: Addressing MongoDB Vector Search Limitations with Neo4j
In our original case study, we used MongoDB for vector storage and similarity search in the real estate sector, which you can read using below link for the context
However, as we scaled, MongoDB presented several challenges.
Scalability
MongoDB’s document-based model struggles with large-scale, high-dimensional vector searches. MongoDB computes O(N) similarity comparisons using brute-force methods, which slows down searches as datasets grow.
Performance
Performance degradation occurs due to the lack of optimization for complex relationships between vectors.
Limited Graph Capabilities
MongoDB lacks native graph processing, making it inefficient for multi-hop contextual queries and relationship-driven data retrieval.
To overcome these issues, we transitioned to Neo4j, a graph database optimized for vector similarity searches, particularly when relationships between vectors are key.
Transition to Neo4j — A Graph-Native Solution
We have finilized Neo4j for our use based on the below parameters
Graph-Based Traversal for Similarity Search
Neo4j leverages Hierarchical Navigable Small World (HNSW) graphs and other graph traversal techniques for efficient high-dimensional vector searches. Instead of performing O(N) brute-force comparisons like MongoDB, Neo4j reduces this to O(log(N)) using graph-based nearest neighbor searches, dramatically improving performance.
Here how it was different compare to MongoDB Vector Search
MongoDB’s k-NN algorithm searches for the closest vectors by computing pairwise distances between the query vector and all stored vectors.
For each query, MongoDB computes the below formula
This results in an O(N) complexity, where N is the number of vectors and d is the vector dimensionality.
Neo4j uses HNSW to traverse the graph of vectors. Instead of brute-force comparison, HNSW organizes vectors into hierarchical layers, where nearest neighbors are connected based on a graph structure.
The search complexity in Neo4j reduces to O(log(N)) for nearest neighbor lookups, resulting in significant performance improvements in high-dimensional spaces.
Scalability and Performance
Neo4j’s graph structure allows for faster, more scalable searches by leveraging index-free adjacency. When vector data is interconnected, Neo4j optimizes the traversal of these relationships, making it more efficient than MongoDB.
Contextual and Relationship-Based Searches
Unlike MongoDB’s isolated vector searches, Neo4j excels at multi-hop queries. In Neo4j, vectors can be connected via relationships (edges), allowing for better context and richer, more relevant query responses. For example, searching for related projects in real estate becomes more powerful when we account for related compliance documents, zoning regulations, and previous project histories.
Here is a detailed comparison for the key aspects we have considered
Conclusion
While MongoDB is effective for simple vector similarity searches, it struggles with performance as the dataset grows and relationships between vectors become more important. Neo4j, with its graph-based architecture, efficiently handles large-scale, high-dimensional data, making it ideal for real estate applications where contextual relationships between documents (e.g., zoning laws, compliance reports, project histories) are crucial. By leveraging Neo4j, we can perform faster, more accurate searches, optimize resource use, and scale the system for future growth.
Incorporating Neo4j has revolutionized our LLM-Powered Knowledge Base, enabling us to provide real estate firms with more powerful, scalable, and contextually relevant data retrieval. As we continue to improve this system, we anticipate even greater efficiencies and insights for our clients.
Feel free to share among your developer friends and also don’t forget to clap :)