A movie recommendation application that combines Neo4j's graph database capabilities with Google Cloud's Vertex AI to deliver intelligent, natural language-based movie recommendations. The system performs semantic vector search using Vertex AI embeddings, then leverages large language models to generate and execute Cypher queries on the Neo4j knowledge graph, enabling multi-hop reasoning and contextual recommendations powered by the GraphRAG pattern.
Check out the detailed explanation of this project in the blog post: Building an Intelligent Movie Search with Neo4j and Vertex AI
This project demonstrates how to build a GenAI-powered movie recommendation engine using the GraphRAG pattern by integrating:
- Neo4j: A graph database for storing movie data, knowledge graph relationships, and vector embeddings
- Google Vertex AI: For generating semantic embeddings (text-embedding-005) and leveraging Gemini for natural language understanding and Cypher generation
- Gradio: To build an intuitive web interface for interactive recommendations
The system performs semantic search using vector embeddings to retrieve relevant movie context, then dynamically generates Cypher queries using Gemini based on this context and the Neo4j knowledge graph schema. These Cypher queries are executed to fetch precise results, which are then summarized conversationally by Gemini, enabling a powerful, explainable, and context-aware movie recommendation experience.
🎬 Live Demo
- Data Ingestion: Movie metadata (titles, plots, genres, actors, etc.) is loaded into a Neo4j graph database and modeled using nodes and relationships.
- Vector Embeddings: Vertex AI's text-embedding-005 model is used to generate semantic embeddings for movie descriptions, which are stored in Neo4j with a vector index.
- Vector Search: When a user enters a query, the system computes its embedding and performs a vector similarity search in Neo4j to retrieve semantically relevant movies.
- Cypher Query Generation: Using the vector search results and the graph schema (ontology), Gemini generates a Cypher query tailored to the user's intent.
- Graph Reasoning: The generated Cypher query is executed on the Neo4j knowledge graph to perform multi-hop reasoning and extract deeper insights or related entities.
- Natural Language Summary: Gemini then summarizes the Cypher query results in a human-friendly, conversational format.
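The six steps above can be sketched as a single pipeline. Everything below is illustrative: the function names, stub return values, and the toy schema are assumptions, with the real Vertex AI, Gemini, and Neo4j calls stubbed out so the control flow is visible without credentials:

```python
# Illustrative sketch of the GraphRAG pipeline (all names hypothetical).
# The embedding model, vector search, and Gemini calls are stubbed so the
# flow runs without GCP or Neo4j access.

def embed(text):
    # Real code would call Vertex AI's text-embedding-005 model here.
    return [float(len(word)) for word in text.split()]

def vector_search(query_embedding, top_k=5):
    # Real code would run a vector-index similarity query against Neo4j.
    return [{"title": "Inception", "plot": "A thief steals secrets via dreams."}]

def generate_cypher(question, context, schema):
    # Real code would prompt Gemini with the schema plus retrieved context.
    return "MATCH (m:Movie)<-[:ACTED_IN]-(a:Actor) RETURN m.title LIMIT 5"

def run_cypher(cypher):
    # Real code would execute the query with the Neo4j Python driver.
    return [{"m.title": "Inception"}]

def summarize(question, rows):
    # Real code would ask Gemini for a conversational summary of the rows.
    return f"Based on the graph, you might enjoy: {rows[0]['m.title']}."

def recommend(question, schema="(:Movie)<-[:ACTED_IN]-(:Actor)"):
    context = vector_search(embed(question))             # embed + vector search
    cypher = generate_cypher(question, context, schema)  # Cypher generation
    rows = run_cypher(cypher)                            # graph reasoning
    return summarize(question, rows)                     # conversational answer

print(recommend("Which movies are about dreams?"))
```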
- example.env: Template for required environment variables
- .env.yaml: Cloud Run deployment environment configuration
- normalized_data/: Contains the normalized movie dataset
- graph_build.py: Loads movies, genres, actors, and relationships into Neo4j
- generate_embeddings.py: Generates semantic vector embeddings using Vertex AI
- generate_embeddings_to_csv.py / export_embeddings_to_csv.py: Scripts for generating or exporting embeddings to CSV
- load_embeddings.py: Loads generated embeddings into Neo4j with a vector index
- movie_embeddings.csv: Precomputed embeddings file (used for faster loading/testing)
- prompts.py: Prompt templates for Gemini (Cypher generation, summarization, query repair)
- app.py: Main application that powers the Gradio UI and implements the GraphRAG pipeline (vector search + LLM-based Cypher execution)
- Dockerfile: Used to containerize and deploy the application (e.g., to Cloud Run)
- requirements.txt: Python dependencies
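prompts.py holds the Gemini prompt templates. A hedged sketch of what a Cypher-generation template might look like; the wording and variable names here are guesses, not the repository's actual prompts:

```python
# Hypothetical shape of a Cypher-generation template in prompts.py
# (the real wording in the repo will differ).
CYPHER_PROMPT = """You are a Neo4j Cypher expert.
Graph schema:
{schema}

Semantically similar movies already retrieved:
{context}

Write a single Cypher query answering the user's question:
{question}
Return only the Cypher, no explanation."""

def build_cypher_prompt(schema, context, question):
    """Fill the template with the live schema, vector-search hits, and question."""
    return CYPHER_PROMPT.format(schema=schema, context=context, question=question)

prompt = build_cypher_prompt(
    schema="(:Movie)<-[:ACTED_IN]-(:Actor), (:Movie)-[:IN_GENRE]->(:Genre)",
    context="Inception (2010): a thief infiltrates dreams.",
    question="Which time travel movies star Bruce Willis?",
)
```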
- Python 3.7+
- Neo4j database: self-hosted or Neo4j AuraDB (recommended)
- Google Cloud account with Vertex AI API enabled
- Service account with appropriate permissions for Vertex AI
💡 Tip: Run these steps in Google Cloud Shell for a pre-authenticated environment with gcloud, the Vertex AI SDK, and permissions already set up, so there is no need to manually manage service account keys.
- Clone this repository:

  ```
  git clone https://github.com/your-username/neo4j-vertexai-codelab.git
  cd neo4j-vertexai-codelab
  ```

- Copy example.env to .env and fill in your configuration:

  ```
  NEO4J_URI=your-neo4j-connection-string
  NEO4J_USER=your-neo4j-username
  NEO4J_PASSWORD=your-neo4j-password
  PROJECT_ID=your-gcp-project-id
  LOCATION=your-gcp-location
  ```
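The application presumably reads these values with a helper such as python-dotenv; to make the mechanics concrete, here is a minimal stand-in (not the project's actual loader) for the KEY=VALUE format used in .env:

```python
import os

def load_env(path=".env"):
    """Minimal .env parser: KEY=VALUE lines; blank lines and '#' comments ignored."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    os.environ.update(values)  # make values visible to the rest of the app
    return values

# Example: config = load_env(); uri = os.environ["NEO4J_URI"]
```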
- (Optional) Create a service account in Google Cloud and download the JSON key file
- Ensure it has access to Vertex AI and Cloud Storage.
- Grant roles: Vertex AI User, Storage Object Viewer, etc.
- (Optional) Place the service account key JSON file in the project directory (referenced in generate_embeddings.py)
- Set the path using:

  ```
  export GOOGLE_APPLICATION_CREDENTIALS="path/to/your-key.json"
  ```
```
# Create and activate a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

First, load movie data into Neo4j:

```
python graph_build.py
```

Generate vector embeddings for movie descriptions:

```
python generate_embeddings.py
```

Embedding CSV Utilities:
- generate_embeddings_to_csv.py: A one-time script used to generate movie_embeddings.csv, which contains pre-computed vector embeddings for movies.
- export_embeddings_to_csv.py: A utility script to export existing embeddings from Neo4j to a CSV file.
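Why JSON inside a CSV column? Each embedding is a list of floats, and serializing it as a JSON string in a single column is what lets apoc.convert.fromJsonList turn it back into a list on the Neo4j side. A hedged sketch of the round trip, with column names assumed to mirror movie_embeddings.csv:

```python
import csv
import io
import json

# Writing: one row per movie, embedding serialized as a JSON array string.
# Column names (tmdbId, embedding) are assumed to mirror movie_embeddings.csv.
rows = [{"tmdbId": 27205, "embedding": [0.12, -0.05, 0.33]}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["tmdbId", "embedding"])
writer.writeheader()
for row in rows:
    writer.writerow({"tmdbId": row["tmdbId"],
                     "embedding": json.dumps(row["embedding"])})

# Reading: parse the JSON string back into a list of floats,
# the Python-side equivalent of apoc.convert.fromJsonList.
buf.seek(0)
parsed = [{"tmdbId": int(r["tmdbId"]),
           "embedding": json.loads(r["embedding"])}
          for r in csv.DictReader(buf)]

print(parsed[0]["embedding"])
```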
Loading Embeddings Directly from CSV:
If you want to skip generating embeddings and load precomputed embeddings (in a CSV file) directly into Neo4j, you have two options:
a. Running Cypher Directly in Neo4j Aura Console
You can load the CSV file directly into Neo4j using the following Cypher query:
```
LOAD CSV WITH HEADERS FROM 'https://storage.googleapis.com/neo4j-vertexai-codelab/movie_embeddings.csv' AS row
WITH row
MATCH (m:Movie {tmdbId: toInteger(row.tmdbId)})
SET m.embedding = apoc.convert.fromJsonList(row.embedding)
```

b. Using the Python Script
Alternatively, you can run the load_embeddings.py script, which automates this process via the Neo4j Python driver.
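A hedged sketch of what that loader's core might look like; the parsing helper below runs standalone, while the driver call (shown in comments) needs a live Neo4j instance, and all names are assumptions rather than the script's actual contents:

```python
import csv
import json

def csv_to_params(csv_path):
    """Parse movie_embeddings.csv into parameter dicts for a Cypher UNWIND."""
    with open(csv_path) as f:
        return [{"tmdbId": int(row["tmdbId"]),
                 "embedding": json.loads(row["embedding"])}
                for row in csv.DictReader(f)]

# Parameterized query: same effect as the LOAD CSV + apoc variant above.
CYPHER = """
UNWIND $rows AS row
MATCH (m:Movie {tmdbId: row.tmdbId})
SET m.embedding = row.embedding
"""

# With the Neo4j Python driver this would be executed roughly as:
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver(uri, auth=(user, password))
# with driver.session() as session:
#     session.run(CYPHER, rows=csv_to_params("movie_embeddings.csv"))
```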
```
python load_embeddings.py
```

Launch the Gradio web interface:

```
python app.py
```

The application will be available at http://0.0.0.0:8080 by default.
Before deploying to Cloud Run, ensure your requirements.txt file includes all necessary dependencies for Neo4j and Vertex AI integration. Additionally, you need a Dockerfile to containerize your application for deployment.
Both requirements.txt and Dockerfile are present in this repository:
- requirements.txt: Lists all the Python dependencies required to run the application.
- Dockerfile: Defines the container environment, including the base image, required packages, and how the application is executed.
If you want to deploy this application to Google Cloud Run for production use, follow these steps:
```
# Set your Google Cloud project ID
export GCP_PROJECT='your-project-id'  # Change this as per your GCP Project ID

# Set your preferred region
export GCP_REGION='us-central1'  # Change this as per your GCP region

# Set the Artifact Registry repository name
export AR_REPO='movies-reco'  # Change this if needed

# Set your service name
export SERVICE_NAME='movies-reco'  # Change if needed

# Create the Artifact Registry repository
gcloud artifacts repositories create "$AR_REPO" \
  --location="$GCP_REGION" \
  --repository-format=Docker

# Configure Docker to use Google Cloud's Artifact Registry
gcloud auth configure-docker "$GCP_REGION-docker.pkg.dev"

# Build and submit the container image
gcloud builds submit \
  --tag "$GCP_REGION-docker.pkg.dev/$GCP_PROJECT/$AR_REPO/$SERVICE_NAME"
```
Before deploying the application to Cloud Run, create a .env.yaml file in your project root with the following structure:
```
NEO4J_URI: "bolt+s://<your-neo4j-uri>"
NEO4J_USER: "neo4j"
NEO4J_PASSWORD: "<your-neo4j-password>"
PROJECT_ID: "<your-gcp-project-id>"
LOCATION: "<your-gcp-region>"
```

This YAML file is used during Cloud Run deployment to inject environment variables into your container runtime. Once set, you can proceed with the gcloud run deploy command.
The following command deploys your application to Cloud Run using the environment variables defined in .env.yaml and exposes the containerized app with public (unauthenticated) access:
```
gcloud run deploy "$SERVICE_NAME" \
  --port=8080 \
  --image="$GCP_REGION-docker.pkg.dev/$GCP_PROJECT/$AR_REPO/$SERVICE_NAME" \
  --allow-unauthenticated \
  --region=$GCP_REGION \
  --platform=managed \
  --project=$GCP_PROJECT \
  --env-vars-file=.env.yaml
```

After deployment, your application will be accessible at a URL like:

```
https://movies-reco-[unique-id].us-central1.run.app/
```
Note:
- Your requirements.txt should list all Python dependencies.
- Make sure your application's Dockerfile is set up properly to run in a containerized environment; it should include a pip install -r requirements.txt step so all dependencies are installed during the container build.
- You'll need to include your service account credentials (unless running from Google Cloud Shell directly) and environment variables in the container.
- "Which time travel movies star Bruce Willis?"
- "Show me thrillers from the 2000s with mind-bending plots."
- "I'm in the mood for something with superheroes but not too serious"
- "I want a thriller that keeps me on the edge of my seat"
- "Show me movies about artificial intelligence taking over the world"
- Neo4j Vector Search Documentation
- Vertex AI Embeddings
- Gemini API
- Gradio Documentation
- Cloud Run Documentation
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.