Graph-sitter
Graph-sitter is a Python library for manipulating codebases.
It provides a scriptable interface to a powerful, multi-lingual language server built on top of Tree-sitter.
from graph_sitter import Codebase

# Graph-sitter builds a complete graph connecting
# functions, classes, imports and their relationships
codebase = Codebase("./")

# Work with code without dealing with syntax trees or parsing
for function in codebase.functions:
    # Comprehensive static analysis for references, dependencies, etc.
    if not function.usages:
        # Updates references and imports through graph-aware edit APIs
        function.remove()

# Fast, in-memory code index
codebase.commit()

Graph-sitter is designed for graph-aware refactors and codebase analysis. See correctness and parity for the current tested scope and known limits.
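For contrast, here is a rough sketch of what the "is this function used?" check from the example looks like when written by hand against a syntax tree. It uses only Python's standard `ast` module and inspects a single module's source, so it knows nothing about imports or cross-file references. It is not how Graph-sitter computes `function.usages`; the names `unused_functions` and `SAMPLE` are made up for this illustration.

```python
import ast


def unused_functions(source: str) -> list[str]:
    """Return top-level function names that are never referenced in `source`."""
    tree = ast.parse(source)

    # Functions defined directly at module level.
    defined = [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]

    # Every bare name that is read anywhere in the module (calls, arguments, ...).
    referenced = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    # Treat attribute access such as `module.helper` as a reference too.
    referenced |= {
        node.attr for node in ast.walk(tree) if isinstance(node, ast.Attribute)
    }

    return [name for name in defined if name not in referenced]


SAMPLE = '''
def used():
    return 1

def unused():
    return 2

print(used())
'''

print(unused_functions(SAMPLE))  # ['unused']
```

Even this small version has blind spots: a function that only calls itself counts as used, and a function imported by another file would be reported as unused. Those are the cases a codebase-wide graph is meant to cover.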
Quick Start
Graph-sitter requires Python 3.12 or 3.13 (recommended: Python 3.13).
Using UV (Recommended)
uv tool install graph-sitter --python 3.13
Using Pipx
Pipx is not officially supported by Codegen, but it should still work.
pipx install graph-sitter
For more detailed installation instructions, see the installation guide.
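If an install fails, the interpreter version is worth ruling out first. The snippet below is a small sketch that checks a version against the 3.12 to 3.13 range stated above; the helper name `is_supported` and the two constants are made up for this example and are not part of Graph-sitter.

```python
import sys

# Supported range as stated in this guide (inclusive on both ends).
SUPPORTED_MIN = (3, 12)
SUPPORTED_MAX = (3, 13)


def is_supported(version: tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Return True if the (major, minor) part of `version` is in the supported range."""
    return SUPPORTED_MIN <= tuple(version[:2]) <= SUPPORTED_MAX


if not is_supported():
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current} is outside the supported 3.12 to 3.13 range")
```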
What can I do with Graph-sitter?
Graph-sitter's simple yet powerful APIs enable a range of applications, including:
- Generate interactive visualizations of your codebase's structure, dependencies, and relationships.
- Mine Codebase Data: Create high-quality training data for fine-tuning LLMs on your codebase.
- Build Codemods: Create powerful code transformations to automate large-scale changes.
See below for an example call graph visualization generated with Graph-sitter.
View source code on modal/modal-client. View codemod on codegen.sh
Get Started
Follow our step-by-step tutorial to start manipulating code with Graph-sitter.
Tutorials: Learn how to use Graph-sitter for common code transformation tasks.
Star us on GitHub and contribute to the project.
Get help and connect with the Graph-sitter community.
Why Graph-sitter?
Many software engineering tasks - refactors, enforcing patterns, analyzing control flow, etc. - are fundamentally programmatic operations. Yet the tools we use to express these transformations often feel disconnected from how we think about code.
Graph-sitter was engineered backwards from real-world refactors we performed for enterprises at Codegen, Inc. Instead of starting with theoretical abstractions, we built a set of APIs that map directly to how humans and AI think about code changes:
- Natural Mental Model: Express transformations through high-level operations that match how you reason about code changes, not low-level text or AST manipulation.
- Clean Business Logic: Let the engine handle the complexities of imports, references, and cross-file dependencies.
- Scale with Evidence: Make sweeping changes across large codebases using tested Python, TypeScript, JavaScript, and React workflows. See the large-repo benchmarks for current results on Airflow and Next.js.
As AI becomes increasingly sophisticated, we're seeing a fascinating shift: AI agents aren't bottlenecked by their ability to understand code or generate solutions. Instead, they're limited by their ability to efficiently manipulate codebases. The challenge isn't the "brain" - it's the "hands."
We built Graph-sitter with a key insight: future AI agents will need to "act via code," building their own sophisticated tools for code manipulation. Rather than generating diffs or making direct text changes, these agents will:
- Express transformations as composable programs
- Build higher-level tools by combining primitive operations
- Create and maintain their own abstractions for common patterns
This creates a shared language that both humans and AI can reason about effectively, making code changes more predictable, reviewable, and maintainable. Whether you're a developer writing a complex refactoring script or an AI agent building transformation tools, Graph-sitter provides the foundation for expressing code changes as they should be: through code itself.
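As a rough illustration of "transformations as composable programs", the sketch below treats each transformation as a plain function over a codebase object and combines them into a pipeline. To stay runnable without a real repository, it uses tiny in-memory stand-ins: `FakeFunction`, `FakeCodebase`, `remove_unused`, `rename_prefix`, and `pipeline` are all hypothetical names for this example. Only `usages` and `remove()` mirror names from the example at the top of this page; the `rename` method is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical stand-ins for the objects a codebase graph would expose.
@dataclass
class FakeFunction:
    name: str
    usages: list[str] = field(default_factory=list)
    removed: bool = False

    def remove(self) -> None:
        self.removed = True

    def rename(self, new_name: str) -> None:  # assumed method, for illustration
        self.name = new_name


@dataclass
class FakeCodebase:
    functions: list[FakeFunction]


# A transformation edits a codebase and reports how many edits it made.
Transform = Callable[[FakeCodebase], int]


def remove_unused(codebase: FakeCodebase) -> int:
    """Primitive: remove every function that has no usages."""
    count = 0
    for fn in codebase.functions:
        if not fn.usages and not fn.removed:
            fn.remove()
            count += 1
    return count


def rename_prefix(old: str, new: str) -> Transform:
    """Primitive factory: rename surviving functions whose name starts with `old`."""
    def transform(codebase: FakeCodebase) -> int:
        count = 0
        for fn in codebase.functions:
            if fn.name.startswith(old) and not fn.removed:
                fn.rename(new + fn.name[len(old):])
                count += 1
        return count
    return transform


def pipeline(*steps: Transform) -> Transform:
    """Higher-level tool: run several transformations in order, summing their edits."""
    def run(codebase: FakeCodebase) -> int:
        return sum(step(codebase) for step in steps)
    return run


cleanup = pipeline(remove_unused, rename_prefix("old_", "legacy_"))
codebase = FakeCodebase([
    FakeFunction("old_parse", usages=["main"]),
    FakeFunction("old_dump"),
    FakeFunction("main", usages=["__main__"]),
])
edits = cleanup(codebase)
print(edits, [(f.name, f.removed) for f in codebase.functions])
# 2 [('legacy_parse', False), ('old_dump', True), ('main', False)]
```

Because every step has the same shape, a `cleanup` pipeline can itself be passed to another `pipeline` call, which is the sense in which higher-level tools are built from primitive operations.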