CodeGraph — Turn any codebase into a queryable knowledge graph

Three layers, one answer

GraphRAG

Structure and meaning are different kinds of information. A call graph is topology; "what does this module do" is semantics. CodeGraph keeps both and bridges them — so a single query can walk relationships and rank by similarity, instead of forcing you to pick one.

Structural layer

A Neo4j knowledge graph of typed entities — Class, Function, Interface, File — wired by the calls, imports and definitions that connect them.
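As a rough picture of what "typed entities wired by relationships" means, here is a minimal in-memory sketch. It does not talk to Neo4j; the class name `CodeGraphSketch` and the relationship names `CALLS`, `IMPORTS` and `DEFINES` are assumptions for illustration, not CodeGraph's actual schema.

```python
from dataclasses import dataclass, field

# Entity labels come from the description above; relationship names are assumed.
ENTITY_TYPES = {"Class", "Function", "Interface", "File"}
REL_TYPES = {"CALLS", "IMPORTS", "DEFINES"}


@dataclass
class CodeGraphSketch:
    """Toy stand-in for the structural layer (hypothetical, not the real store)."""
    nodes: dict = field(default_factory=dict)  # entity name -> entity type
    edges: set = field(default_factory=set)    # (source, relationship, target)

    def add_node(self, name, kind):
        if kind not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {kind}")
        self.nodes[name] = kind

    def add_edge(self, source, rel, target):
        if rel not in REL_TYPES:
            raise ValueError(f"unknown relationship: {rel}")
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.add((source, rel, target))

    def neighbours(self, name, rel):
        """Names reachable from `name` over one edge of type `rel`."""
        return sorted(t for s, r, t in self.edges if s == name and r == rel)


graph = CodeGraphSketch()
graph.add_node("app.py", "File")
graph.add_node("load_repo", "Function")
graph.add_node("parse_file", "Function")
graph.add_edge("app.py", "DEFINES", "load_repo")
graph.add_edge("load_repo", "CALLS", "parse_file")
print(graph.neighbours("load_repo", "CALLS"))  # ['parse_file']
```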


Semantic layer

High-dimensional OpenAI embeddings index every chunk of code and its description, so meaning, not just exact names, drives retrieval. Nearest-neighbour search finds relevant code even when the wording differs.
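Nearest-neighbour search over embeddings can be sketched in a few lines. The three-dimensional vectors and chunk names below are made up for the example; real embeddings are far larger and would live in a vector index rather than a dict.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query_vec, index, k=2):
    """Return the ids of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in index.items()]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Hypothetical toy index: chunk id -> embedding.
toy_index = {
    "auth.login": [0.9, 0.1, 0.0],
    "auth.verify_token": [0.8, 0.3, 0.1],
    "ui.render_table": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # stands in for an embedded question about sign-in
top = nearest(query, toy_index, k=2)
print(top)  # ['auth.login', 'auth.verify_token']
```

Neither chunk id has to contain the words of the question; only the vectors are compared.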


Bridge & hybrid query

A bridge layer ties graph nodes to their embeddings. A question can traverse the graph for context and pull semantically similar code in one pass — answers come back with the surrounding structure attached.
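One way to picture the bridge: the same node name keys both an embedding and a set of graph edges, so a query can rank by similarity first and then attach each hit's neighbours. This is a sketch under that assumption; `hybrid_query`, the edge names and the toy data are hypothetical and say nothing about how CodeGraph stores the link.

```python
def hybrid_query(query_vec, embeddings, edges, k=1):
    """Rank nodes by dot-product similarity, then attach outgoing graph edges."""
    def score(vec):
        return sum(q * v for q, v in zip(query_vec, vec))

    ranked = sorted(embeddings, key=lambda name: score(embeddings[name]), reverse=True)
    results = []
    for node in ranked[:k]:
        # The shared node name is the "bridge" between the two layers here.
        context = sorted((rel, dst) for src, rel, dst in edges if src == node)
        results.append({"node": node, "context": context})
    return results


# Hypothetical toy data.
embeddings = {"parse_file": [0.9, 0.1], "render_ui": [0.1, 0.9]}
edges = [
    ("parse_file", "CALLS", "chunk_text"),
    ("parse_file", "DEFINED_IN", "ingest.py"),
    ("render_ui", "CALLS", "st_write"),
]
answer = hybrid_query([1.0, 0.0], embeddings, edges, k=1)
print(answer)
```

The single result is `parse_file`, returned together with the function it calls and the file that defines it.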


From repo to answer in six steps

6 stages

The whole pipeline runs from the Streamlit interface with a live progress tracker — point it at a codebase and watch it ingest, build, index, and become queryable.

01 · Input

Point at a codebase

A public GitHub repository or a local folder. Inputs are validated before anything runs.
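Input validation of this kind might look like the sketch below. It only checks the shape of a GitHub URL or the existence of a local directory; it does not check that the repository is public or reachable. The function name and the regular expression are assumptions, not CodeGraph's own checks.

```python
import re
from pathlib import Path

# Assumed pattern: https://github.com/<owner>/<repo>, optional ".git" or trailing slash.
GITHUB_URL = re.compile(r"^https://github\.com/[\w.-]+/[\w.-]+?(?:\.git)?/?$")


def classify_input(source):
    """Return 'github' or 'local', or raise ValueError for anything else."""
    if GITHUB_URL.match(source):
        return "github"
    if Path(source).is_dir():
        return "local"
    raise ValueError(f"not a GitHub URL or an existing folder: {source!r}")


print(classify_input("https://github.com/Abhishek-Aditya-bs/CodeGraph"))  # github
print(classify_input("."))  # local
```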

02 · Ingestion

Parse & chunk

Configurable parsing walks the source and splits it into overlapping chunks, with progress tracked end to end.
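Overlapping chunking is simple to sketch. This version counts characters, which is an assumption; the real splitter may count tokens or lines, and the default sizes here are illustrative only.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into windows of chunk_size that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the one before it, so a definition that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.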

03 · Graph

Extract the graph

An LLM extracts entities and relationships — classes, functions, interfaces, files — into Neo4j.
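To make the extraction step concrete, suppose the LLM returns JSON listing entities and relationships. The sketch below turns that into parameterised Cypher `MERGE` statements. The JSON shape, the relationship names and `to_cypher` are all assumptions for illustration. Labels and relationship types are checked against an allow-list because Cypher parameters can carry property values but not labels or relationship types.

```python
import json

ALLOWED_LABELS = {"Class", "Function", "Interface", "File"}
ALLOWED_RELS = {"CALLS", "IMPORTS", "DEFINES"}  # assumed relationship names


def to_cypher(extraction_json):
    """Convert assumed LLM output into (statement, parameters) pairs."""
    data = json.loads(extraction_json)
    statements = []
    for entity in data["entities"]:
        if entity["type"] not in ALLOWED_LABELS:
            raise ValueError(f"unexpected label: {entity['type']}")
        statements.append(
            (f"MERGE (n:{entity['type']} {{name: $name}})", {"name": entity["name"]})
        )
    for rel in data["relationships"]:
        if rel["type"] not in ALLOWED_RELS:
            raise ValueError(f"unexpected relationship: {rel['type']}")
        statements.append((
            f"MATCH (a {{name: $source}}), (b {{name: $target}}) "
            f"MERGE (a)-[:{rel['type']}]->(b)",
            {"source": rel["source"], "target": rel["target"]},
        ))
    return statements


# Hypothetical LLM response for one chunk.
sample = json.dumps({
    "entities": [
        {"type": "Function", "name": "parse_file"},
        {"type": "Function", "name": "chunk_text"},
    ],
    "relationships": [
        {"type": "CALLS", "source": "parse_file", "target": "chunk_text"},
    ],
})
statements = to_cypher(sample)
for query, params in statements:
    print(query, params)
```

Each pair could then be handed to a Neo4j session; `MERGE` keeps repeated runs from creating duplicate nodes.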

04 · Index

Embed & index

OpenAI text-embedding-3-large vectors are computed and indexed for similarity search.

05 · Query

Ask in English

Natural-language questions return answers with rich, structured context drawn from both layers.

06 · Manage

Inspect & reset

Live database stats, health checks, and guarded cleanup — reset the UI without losing your data.

Built to actually run

Not a notebook demo — a self-contained app with persistence, safety rails, and a one-command Docker boot.

Hybrid GraphRAG

Graph traversal and vector search combined, so structure and meaning inform every answer instead of competing.

A real interface

A polished Streamlit UI with live Neo4j connection status, a step-by-step workflow tracker, and real-time feedback.

Configurable LLM

Bring any OpenAI model for entity extraction and answers; tune chunk size, overlap, and the embedding model from config.
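Config-driven settings like these are often read from the environment. A minimal sketch: `LLM_MODEL` and `EMBEDDING_MODEL` match the quickstart's .env keys, while `CHUNK_SIZE`, `CHUNK_OVERLAP` and every default value are assumptions rather than CodeGraph's documented settings.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    llm_model: str
    embedding_model: str
    chunk_size: int
    chunk_overlap: int


def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ) and validate them."""
    env = os.environ if env is None else env
    settings = Settings(
        llm_model=env.get("LLM_MODEL", "gpt-4o"),
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-large"),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),       # assumed key and default
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),  # assumed key and default
    )
    if not 0 <= settings.chunk_overlap < settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings


custom = load_settings({"CHUNK_SIZE": "500", "CHUNK_OVERLAP": "50"})
print(custom.chunk_size, custom.chunk_overlap, custom.llm_model)  # 500 50 gpt-4o
```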

Persistent & safe

State survives page refreshes; destructive actions need multi-step confirmation. Reset the UI without dropping the database.
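Multi-step confirmation can be modelled as a tiny state machine: the destructive action runs only after an explicit arming step followed by a typed phrase. The class below is a hypothetical sketch of that idea, not the app's actual guard, and the phrase `DELETE` is an assumption.

```python
class GuardedReset:
    """Run `action` only after arm() and a matching confirmation phrase."""

    def __init__(self, action, phrase="DELETE"):
        self._action = action
        self._phrase = phrase
        self._armed = False

    def arm(self):
        """Step one: the user asks for the destructive action."""
        self._armed = True

    def confirm(self, typed):
        """Step two: the user types the phrase. Returns True if the action ran."""
        if not self._armed:
            raise RuntimeError("reset was not armed first")
        self._armed = False  # every attempt needs a fresh arm()
        if typed != self._phrase:
            return False
        self._action()
        return True


log = []
reset = GuardedReset(lambda: log.append("database cleared"))
reset.arm()
print(reset.confirm("delete"))  # False: wrong phrase, nothing happens
reset.arm()
print(reset.confirm("DELETE"))  # True: the action runs once
print(log)                      # ['database cleared']
```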

Observable

Real-time database metrics and health indicators keep the system legible while it's running, not just after.

One-command Docker

Docker Compose brings up the app and Neo4j together — configure a .env, run, and you're querying.

Quickstart

Two steps with Docker. You need Docker, Docker Compose, and an OpenAI API key.

env · Drop your Neo4j credentials and OPENAI_API_KEY into a .env file; chunking and model choices live here too.
up · One Compose command builds the app and starts Neo4j alongside it.
open · Visit the Streamlit UI, point it at a repo, and run the pipeline.

codegraph — quickstart

# 1. clone & configure
git clone https://github.com/Abhishek-Aditya-bs/CodeGraph
cd CodeGraph

# add your keys to .env
NEO4J_URI=neo4j://neo4j:7687
OPENAI_API_KEY=sk-...
EMBEDDING_MODEL=text-embedding-3-large
LLM_MODEL=gpt-4o

# 2. bring it all up
docker compose up --build

# → open the Streamlit UI and start querying

Read code the way it's actually connected.