CodeGraph turns any codebase into a Neo4j knowledge graph of its files, classes,
functions and interfaces — then layers vector embeddings on top so you can ask it
questions in plain English. Graph traversal and semantic search answer together, through a
clean Streamlit UI. Point it at a GitHub repo or a local folder and start asking.
Structure and meaning are different kinds of information. A call graph is topology; "what does this module do" is semantics. CodeGraph keeps both and bridges them — so a single query can walk relationships and rank by similarity, instead of forcing you to pick one.
Structural layer
A Neo4j knowledge graph of typed entities — Class, Function, Interface, File — wired by the calls, imports and definitions that connect them.
Semantic layer
High-dimensional OpenAI embeddings index every chunk of code and its description, so meaning, not just exact names, drives retrieval. Nearest-neighbour search finds the relevant code even when the words differ.
Bridge & hybrid query
A bridge layer ties graph nodes to their embeddings. A question can traverse the graph for context and pull semantically similar code in one pass — answers come back with the surrounding structure attached.
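To make the three layers concrete, here is a minimal sketch in plain Python. It is not CodeGraph's implementation: the toy graph, the three-dimensional vectors, and the names `GRAPH`, `EMBEDDINGS`, `cosine` and `hybrid_query` are all assumptions made for illustration, standing in for Neo4j and real OpenAI embeddings.

```python
import math

# Toy structural layer: node name -> (node type, names of related nodes).
# Every node and edge here is invented for illustration.
GRAPH = {
    "auth.py": ("File", {"AuthService", "hash_password"}),
    "AuthService": ("Class", {"hash_password", "TokenStore"}),
    "hash_password": ("Function", set()),
    "TokenStore": ("Interface", set()),
    "render_chart": ("Function", set()),
}

# Toy semantic layer: tiny made-up vectors. Sharing keys with GRAPH plays
# the role of the bridge between a node and its embedding.
EMBEDDINGS = {
    "auth.py": [0.9, 0.1, 0.0],
    "AuthService": [0.8, 0.2, 0.1],
    "hash_password": [0.7, 0.3, 0.0],
    "TokenStore": [0.6, 0.1, 0.3],
    "render_chart": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_query(query_vector, top_k=1):
    """Rank nodes by similarity, then attach each hit's graph neighbours."""
    ranked = sorted(
        EMBEDDINGS,
        key=lambda name: cosine(query_vector, EMBEDDINGS[name]),
        reverse=True,
    )
    results = []
    for name in ranked[:top_k]:
        node_type, neighbours = GRAPH[name]
        results.append(
            {"node": name, "type": node_type, "neighbours": sorted(neighbours)}
        )
    return results
```

Calling `hybrid_query([0.8, 0.2, 0.1])` picks the node whose vector is closest to the query and returns it with its type and neighbours, which is the "answer with surrounding structure attached" idea in miniature.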
The whole pipeline runs from the Streamlit interface with a live progress tracker — point it at a codebase and watch it ingest, build, index, and become queryable.
A public GitHub repository or a local folder. Inputs are validated before anything runs.
Configurable parsing walks the source and splits it into overlapping chunks, with progress tracked end to end.
An LLM extracts entities and relationships — classes, functions, interfaces, files — into Neo4j.
OpenAI text-embedding-3-large vectors are computed and indexed for similarity search.
Natural-language questions return answers with rich, structured context drawn from both layers.
Live database stats, health checks, and guarded cleanup — reset the UI without losing your data.
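The overlapping-chunk step in the pipeline above can be sketched in a few lines. This is only an illustration under stated assumptions: it splits by character count, while the real parser may split by tokens or lines, and `chunk_text`, `chunk_size` and `overlap` are hypothetical names rather than CodeGraph's own.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into chunk_size-character chunks that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

Each chunk repeats the tail of the previous one, so a function that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.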
Not a notebook demo — a self-contained app with persistence, safety rails, and a one-command Docker boot.
Hybrid GraphRAG
Graph traversal and vector search combined, so structure and meaning inform every answer instead of competing.
A real interface
A polished Streamlit UI with live Neo4j connection status, a step-by-step workflow tracker, and real-time feedback.
Configurable LLM
Bring any OpenAI model for entity extraction and answers; tune chunk size, overlap, and the embedding model from config.
Persistent & safe
State survives page refreshes; destructive actions need multi-step confirmation. Reset the UI without dropping the database.
Observable
Real-time database metrics and health indicators keep the system legible while it's running, not just after.
One-command Docker
Docker Compose brings up the app and Neo4j together — configure a .env, run, and you're querying.
Two steps with Docker. You need Docker, Docker Compose, and an OpenAI API key.
Put your OPENAI_API_KEY into a .env file; chunking and model choices live here too.

    # 1. clone & configure
    git clone https://github.com/Abhishek-Aditya-bs/CodeGraph
    cd CodeGraph

    # add your keys to .env
    NEO4J_URI=neo4j://neo4j:7687
    OPENAI_API_KEY=sk-...
    EMBEDDING_MODEL=text-embedding-3-large
    LLM_MODEL=gpt-4o

    # 2. bring it all up
    docker compose up --build
    # → open the Streamlit UI and start querying
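For a sense of how an app might consume these settings, here is a hedged sketch of a loader that reads them from the environment. The four key names come from the sample .env above. Everything else is an assumption: `load_settings`, the choice of which keys are required, and the fallback values are not taken from CodeGraph's code.

```python
import os

# Assumed to be mandatory: without these nothing can connect.
REQUIRED_KEYS = ("NEO4J_URI", "OPENAI_API_KEY")

# Fallbacks mirror the sample values shown above; they are assumptions,
# not documented CodeGraph defaults.
OPTIONAL_DEFAULTS = {
    "EMBEDDING_MODEL": "text-embedding-3-large",
    "LLM_MODEL": "gpt-4o",
}


def load_settings(env=None):
    """Return a settings dict, raising if a required key is missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("missing required settings: " + ", ".join(missing))
    settings = {key: env[key] for key in REQUIRED_KEYS}
    for key, default in OPTIONAL_DEFAULTS.items():
        settings[key] = env.get(key) or default
    return settings
```

Passing a plain dict instead of relying on `os.environ` keeps the function easy to test; failing early on a missing key surfaces a misconfigured .env before any ingestion starts.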
Give an unfamiliar codebase a structure you can query and a memory you can search — then ask it anything.