Code Intelligence: Structural Understanding for AI Agents
Stop scanning. Start querying. Skeleton Index (<4s, zero deps) + AST graph + architecture diagrams = instant code understanding. Inspired by CodeGraph + GitDiagram. TRIZ-optimized: 10 inventive principles applied.
When to Use
ALWAYS for medium-to-large projects. This is infrastructure, not an action skill.
- Auto-triggered by: cm-start Step 0.7 (project init), which ALWAYS runs Layer 0
- Manually triggered for: "understand this codebase", "what calls X?", "what breaks if I change Y?"
- Skip when: NEVER. Layer 0 (Skeleton) works on any project size
Detection Thresholds (Auto-Trigger)
TRIGGER if ANY of these are true:
- Project has >50 source files
- User wants to refactor or re-code an existing project
- User says "understand the codebase" / "what does this do?"
- cm-execution encounters >3 grep/glob calls for one task
- cm-debugging needs callers/callees to trace a bug
Architecture: 4 Layers
| cm-codeintell | Layer 0 | Layer 1 | Layer 2 | Layer 3 |
|---|---|---|---|---|
| Name | Skeleton Index (Instant) | Code Graph (Structure) | Architecture Diagram (Visual) | Smart Context (Synthesis) |
| Pipeline | grep/find/awk → skeleton.md (~5K tokens) | tree-sitter AST → SQLite graph → MCP server | File tree + LLM → Mermaid.js → .cm/ storage | All layers + qmd → focused context packet |
| Access | ZERO deps, <4 seconds | codegraph_* MCP tools | Auto-generated at project init | Feeds: exec, plan, debug |
| Applies to | ANY project size | 50+ files | 20+ files | All consumers |
TRIZ Principles Applied
| # | Principle | How Applied |
|---|---|---|
| 1 | Segmentation | 4 independent layers, each usable alone |
| 2 | Taking Out | Extract only signatures, discard function bodies |
| 5 | Merging | CodeGraph + GitDiagram + Skeleton → one unified skill |
| 10 | Prior Action | Pre-index at project init, not at query time |
| 13 | Inversion | Code summarizes ITSELF to agent (push, not pull) |
| 15 | Dynamicity | Adaptive: skeleton (<20) vs graph (>50) vs full (>200) |
| 25 | Self-Service | Auto-detect project size → auto-select intelligence level |
| 28 | Mechanics Substitution | Replace file reading (slow) with pattern matching (fast) |
| 35 | Parameter Changes | Unit: file content → function signature → 95% compression |
| 40 | Composite | One skill = skeleton + graph + diagrams + context builder |
Layer 0: Skeleton Index (Instant, Zero Dependencies)
Purpose: Fast grep-based extraction of function signatures, class definitions, exports, and module boundaries. Produces a compact .cm/skeleton.md that gives the agent an immediate overview of any codebase.
How It Works
1. SCAN → find all source files (14 languages supported)
2. EXTRACT → grep for function/class/export signatures only
3. GROUP → organize by directory (module boundaries)
4. CAP → limit per-dir (15 files) + total (600 lines)
5. OUTPUT → .cm/skeleton.md (~5K tokens for 600-file project)
Usage
# Run from project root
bash scripts/index-codebase.sh
# Custom paths
bash scripts/index-codebase.sh /path/to/project /path/to/output.md
What It Extracts (Per Language)
| Language | Patterns Extracted |
|---|---|
| TypeScript/JavaScript | export, function, class, interface, type, enum, const =, routes |
| Python | def, async def, class, @app.route, from...import |
| Go | func, type...struct, type...interface, package |
| Rust | pub fn, struct, enum, impl, trait, mod |
| Java/Kotlin | class, interface, fun, data class, package |
| PHP | function, class, interface, trait, namespace |
| Ruby | def, class, module |
| C/C++ | function declarations, struct, class, typedef, #define |
| Swift | func, class, struct, protocol, extension |
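The scan, extract, group, and cap steps of Layer 0 can be illustrated with a small sketch. This is Python, not the actual scripts/index-codebase.sh (which relies on grep/find/awk); the regexes cover only a simplified subset of the table above, and the names `PATTERNS`, `extract_signatures`, and `build_skeleton` are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed, simplified patterns; the real script's patterns are not shown in this document.
PATTERNS = {
    ".py": re.compile(r"^\s*(async\s+def|def|class)\s+\w+"),
    ".ts": re.compile(r"^\s*(export\b|function\s+\w+|class\s+\w+|interface\s+\w+)"),
    ".go": re.compile(r"^(func|type|package)\b"),
}

def extract_signatures(path, text):
    """Return 'lineno:signature' strings for lines that look like declarations."""
    pattern = PATTERNS.get(PurePosixPath(path).suffix)
    if pattern is None:
        return []
    return [
        f"{lineno}:{line.strip()}"
        for lineno, line in enumerate(text.splitlines(), start=1)
        if pattern.match(line)
    ]

def build_skeleton(files, max_files_per_dir=15, max_lines=600):
    """Group signatures by directory, then apply the per-directory and total caps."""
    by_dir = defaultdict(list)
    for path in sorted(files):
        by_dir[str(PurePosixPath(path).parent)].append(path)

    out = []
    for directory, paths in by_dir.items():
        out.append(f"### `{directory}/`")
        for path in paths[:max_files_per_dir]:
            sigs = extract_signatures(path, files[path])
            if sigs:
                out.append(f"**{PurePosixPath(path).name}**")
                out.extend(sigs)
    return out[:max_lines]

# In-memory stand-in for a project tree: path -> file content.
files = {
    "src/auth/AuthService.ts": "import x from 'y'\n\nexport class AuthService {\n  login() {}\n}\n",
    "src/jobs/worker.py": "import os\n\nasync def run(job):\n    pass\n",
}
skeleton = build_skeleton(files)
print("\n".join(skeleton))
```

Function bodies and imports never reach the output; only the matched declaration lines survive, which is where the compression comes from.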
Output Format
# 🦴 Skeleton Index: my-project
| Meta | Value |
|------|-------|
| Source Files | 127 |
| Languages | typescript(89) python(38) |
| Framework | next.js+cloudflare |
## Entry Points
- `src/index.ts`
- `app/layout.tsx`
## Directory Structure
(compact tree, depth 2)
## Code Skeleton
### `src/auth/`
**AuthService.ts**
```
3:export class AuthService
5:export async function login(email, password)
12:export function validateToken(token)
```
### `src/api/`
**routes.ts**
```
8:export const router
15:router.get('/users'
22:router.post('/auth'
```
Compression Stats
| Project Size | Raw Tokens | Skeleton Tokens | Compression |
|---|---|---|---|
| 50 files (small) | ~20,000 | ~1,500 | 92.5% |
| 200 files (med) | ~80,000 | ~3,000 | 96.3% |
| 600 files (large) | ~240,000 | ~5,000 | 97.9% |
Agent Protocol
AT SESSION START:
1. Check if .cm/skeleton.md exists
2. IF exists → read it (~5K tokens) → instant codebase understanding
3. IF not exists → run: bash scripts/index-codebase.sh
4. Use skeleton to:
   - Know what functions exist and where
   - Understand module boundaries
   - Navigate to the right file for any task
   - Skip grep/list_dir when exploring
WHEN TO RE-GENERATE:
- After major refactoring (>20 files changed)
- After branch switch
- When skeleton is >24h old
- User requests: "re-index the codebase"
Layer 1: Code Graph (Structure)
Purpose: Pre-indexed AST-based knowledge graph. Functions, classes, imports, and call relationships are all queryable without re-scanning files.
Setup
# Install CodeGraph (one-time)
npx @colbymchenry/codegraph
# Initialize for current project
codegraph init .
# Index the codebase (tree-sitter AST extraction)
codegraph index .
MCP Server Setup
Add to your MCP config (.mcp.json, claude_desktop_config.json, etc.):
{
  "mcpServers": {
    "codegraph": {
      "command": "codegraph",
      "args": ["serve"]
    }
  }
}
Key MCP Tools
| Tool | What It Does | Replaces |
|---|---|---|
| codegraph_context(task) | Build focused context for a task | Multiple grep + view_file calls |
| codegraph_search(query) | Find symbols by name or meaning | grep -r "pattern" |
| codegraph_callers(symbol) | What calls this function? | Manual file-by-file search |
| codegraph_callees(symbol) | What does this function call? | Reading entire function + tracing |
| codegraph_impact(symbol) | What breaks if I change this? | Nothing (CM couldn't do this) |
| codegraph_files(path) | Project structure with metadata | list_dir recursive + view_file |
| codegraph_node(symbol) | Full details of one symbol | view_file + manual parsing |
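To make the callers, callees, and impact queries concrete, here is a toy in-memory call graph in Python. It only illustrates the idea and is not CodeGraph's implementation or API: the `CALLS` data and the `callers`, `callees`, and `impact` helpers are hypothetical, and reading `depth` as the number of caller hops is an assumption.

```python
from collections import deque

# Hypothetical call edges: caller -> functions it calls.
CALLS = {
    "checkout": ["validatePayment", "sendReceipt"],
    "retryPayment": ["validatePayment"],
    "validatePayment": ["loadCard"],
    "apiHandler": ["checkout"],
}

def callees(symbol):
    """What does this function call?"""
    return sorted(CALLS.get(symbol, []))

def callers(symbol):
    """What calls this function?"""
    return sorted(c for c, targets in CALLS.items() if symbol in targets)

def impact(symbol, depth=2):
    """Symbols reached by walking caller edges up to `depth` hops (breadth-first)."""
    seen = {symbol}
    queue = deque([(symbol, 0)])
    affected = []
    while queue:
        current, hops = queue.popleft()
        if hops == depth:
            continue  # do not expand beyond the requested depth
        for caller in callers(current):
            if caller not in seen:
                seen.add(caller)
                affected.append(caller)
                queue.append((caller, hops + 1))
    return affected

print(callers("validatePayment"))
print(callees("checkout"))
print(impact("loadCard", depth=2))
```

With a pre-built edge table, each question is a lookup or a bounded traversal rather than a text search across files, which is the trade the "Replaces" column describes.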
When Agents Use These Tools
| Instead of | Use |
|---|---|
| grep -r "UserService" src/ | codegraph_search("UserService") |
| list_dir + view_file × 10 | codegraph_context("implement auth") |
| "What calls validatePayment?" | codegraph_callers("validatePayment") |
| "What if I change this class?" | codegraph_impact("UserService", depth=2) |
| list_dir src/ --recursive | codegraph_files("src/", format="tree") |
Keeping Index Fresh
AUTO-SYNC (built-in):
- CodeGraph hooks auto-sync when files change (if hooks installed)
MANUAL SYNC (if hooks not installed):
- codegraph sync .
WHEN TO RE-INDEX:
- After major refactoring (>20 files changed)
- After branch switch
- When codegraph_status reports stale index
AI RULE: Before starting any task, check:
- codegraph status .
- If stale → codegraph sync . → then proceed
Layer 2: Architecture Diagram (Visual)
Purpose: Auto-generated Mermaid.js architecture diagram from project structure. See the big picture at a glance.
Generation Process
1. EXTRACT → Read file tree structure (codegraph_files or list_dir)
2. ANALYZE → Identify key directories, patterns, entry points
3. GENERATE → Produce Mermaid.js diagram showing:
   - Major modules/directories
   - Key relationships (imports, API boundaries)
   - Entry points (main, routes, handlers)
   - Data flow direction
4. STORE → Save to .cm/architecture.mmd
5. RENDER → Display inline or via Pencil MCP
Diagram Template
When generating the architecture diagram, use this Mermaid structure:
## Architecture Diagram
```mermaid
graph TD
    subgraph "Frontend"
        A[pages/] --> B[components/]
        B --> C[hooks/]
        C --> D[utils/]
    end
    subgraph "Backend"
        E[routes/] --> F[controllers/]
        F --> G[services/]
        G --> H[models/]
    end
    subgraph "Infrastructure"
        I[config/]
        J[middleware/]
        K[database/]
    end
    A -->|API calls| E
    G --> K
    J --> E
```
When to Generate
AUTO-GENERATE at:
- cm-start Step 0.5 (project init)
- cm-brainstorm-idea Phase 1a (codebase scan)
- First time running cm-codeintell on a project
RE-GENERATE when:
- Major architectural change (new module, new service)
- User requests: "update the architecture diagram"
- >30 files added/removed since last generation
STORE at:
- .cm/architecture.mmd (Mermaid source)
- Include in brainstorm-output.md when relevant
Integration with Pencil MCP
If Pencil MCP is available, render the diagram visually:
1. Generate Mermaid code → .cm/architecture.mmd
2. If Pencil MCP available → render as visual node
3. If not → display Mermaid code inline (agents can parse it)
Layer 3: Smart Context Builder (Synthesis)
Purpose: Combine graph data + diagram + text search into a focused context packet for any task.
Context Building Protocol
When any CM skill needs to understand the codebase for a specific task:
1. QUERY GRAPH → codegraph_context(task, maxNodes=20)
   Returns: entry points, related symbols, code snippets
2. CHECK DIAGRAM → Read .cm/architecture.mmd
   Identify which module/layer the task affects
3. SEARCH DOCS → IF qmd available: qmd query "task description"
   Returns: relevant documentation, past decisions
4. COMPOSE PACKET → Merge results into a structured context:
   {
     "task": "...",
     "affected_modules": ["..."],
     "entry_points": ["..."],
     "related_symbols": ["..."],
     "impact_radius": ["..."],
     "relevant_docs": ["..."],
     "architecture_context": "..."
   }
5. FEED DOWNSTREAM → Pass context packet to requesting skill
Adaptive Intelligence Levels
| Project Size | Level | What Activates |
|---|---|---|
| ANY size | SKELETON | Skeleton Index always runs (Layer 0) |
| <20 files | MINIMAL | Skeleton only (no graph, no diagram) |
| 20-50 files | LITE | Skeleton + architecture diagram |
| 50-200 files | STANDARD | Skeleton + CodeGraph + diagram |
| >200 files | FULL | Skeleton + CodeGraph + diagram + qmd |
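The table above can be read as a simple threshold function. The Python sketch below is one possible reading, not the skill's actual detection code: `select_level` is a hypothetical name, and because the table lists 50 under both LITE and STANDARD, treating exactly 50 files as LITE (in line with the ">50 source files" trigger) is an assumption.

```python
def select_level(source_files: int) -> str:
    """Map a source-file count to an intelligence level (boundary handling is assumed)."""
    if source_files < 20:
        return "MINIMAL"    # skeleton only
    if source_files <= 50:
        return "LITE"       # skeleton + architecture diagram
    if source_files <= 200:
        return "STANDARD"   # skeleton + CodeGraph + diagram
    return "FULL"           # skeleton + CodeGraph + diagram + qmd

for count in (12, 20, 50, 127, 201):
    print(count, select_level(count))
```

The Skeleton Index is not a branch in this function because it runs at every level.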
Skeleton Index ALWAYS runs; it is the foundation for all levels.
Detection is automatic at cm-start Step 0.7.
User can override: "Use FULL intelligence mode"
Integration with CodyMaster Skills
cm-start (Step 0.5, enhanced)
EXISTING Step 0.5: Skill Coverage Check
NEW addition:
0.5b. Code Intelligence Setup:
1. Count source files → determine intelligence level
2. IF level >= LITE:
   - Auto-generate architecture diagram → .cm/architecture.mmd
3. IF level >= STANDARD:
   - Check if CodeGraph installed: codegraph status
   - IF not installed → suggest: "npx @colbymchenry/codegraph"
   - IF installed but not indexed → codegraph init . && codegraph index .
   - IF indexed → codegraph sync . (ensure fresh)
4. IF level >= FULL:
   - Also check qmd (cm-deep-search detection)
5. Log intelligence level to CONTINUITY.md
cm-execution (Pre-flight, enhanced)
EXISTING Pre-flight: Skill Coverage Audit
NEW addition:
Pre-flight Step 2: Code Context Loading
IF codegraph available:
- For each task in current batch:
   - context = codegraph_context(task.description, maxNodes=15)
   - Inject context into agent prompt
- For tasks modifying shared code:
   - impact = codegraph_impact(symbol, depth=2)
   - If impact.affected > 10 files → WARN: "High impact change"
Result: Agents start with pre-loaded context instead of exploring
cm-planning (Impact Analysis, new)
NEW addition to Phase A:
Before writing implementation plan:
1. For each proposed change:
   - codegraph_impact(affected_symbol) → list affected files
2. If total impact > 20 files:
   - Flag as HIGH RISK in plan
   - Recommend cm-tdd coverage for all impacted callers
3. Include impact summary in implementation_plan.md
cm-debugging (Trace Analysis, enhanced)
EXISTING Phase 2: Hypothesis Formation
NEW enhancement:
IF codegraph available:
1. From error stack trace → extract function name
2. codegraph_callers(function) → who calls this?
3. codegraph_callees(function) → what does it call?
4. codegraph_impact(function) → what else is affected?
5. Use call chain to narrow hypotheses
Result: The call chain can often narrow the root cause in 1-2 queries instead of 5-10 grep calls
cm-brainstorm-idea (Phase 1a, enhanced)
EXISTING Phase 1a: Codebase Scan
NEW enhancement:
1. Read .cm/architecture.mmd for instant overview
2. IF codegraph available:
   - codegraph_files(".", format="tree", includeMetadata=true)
   - Summary: X symbols, Y edges, Z files
3. Present architecture diagram to user in Discovery output
4. Use graph to identify:
   - Most connected modules (highest coupling)
   - Isolated modules (candidates for parallel work)
   - Dead code (unreferenced symbols)
File Storage
.cm/
├── skeleton.md           # Skeleton Index output (Layer 0)
├── architecture.mmd      # Mermaid architecture diagram
├── codegraph-meta.json   # Graph metadata (last indexed, stats)
├── CONTINUITY.md         # (existing) updated with intelligence level
├── learnings.json        # (existing)
└── decisions.json        # (existing)
.codegraph/               # CodeGraph's own directory (auto-created)
├── codegraph.db          # SQLite graph database
└── config.json           # CodeGraph configuration
codegraph-meta.json Format
{
  "intelligenceLevel": "STANDARD",
  "lastIndexed": "2026-03-25T22:25:00+07:00",
  "stats": {
    "sourceFiles": 127,
    "symbols": 387,
    "edges": 1204,
    "languages": ["typescript", "javascript"]
  },
  "diagramGenerated": "2026-03-25T22:25:30+07:00",
  "codegraphVersion": "1.0.0"
}
Lifecycle Position
cm-project-bootstrap → cm-codeintell (auto) → cm-brainstorm-idea → cm-planning → cm-execution
(create) (index + diagram) (analyze) (plan) (implement)
↑ ↓
cm-debugging ← cm-quality-gate ← cm-tdd
(trace callers) (verify) (test first)
Memory System (Updated)
Tier 1: SENSORY → Temporary session variables
Tier 2: WORKING → CONTINUITY.md (~500 words)
Tier 3: LONG-TERM → learnings.json, decisions.json
Tier 4: SEMANTIC TEXT → qmd (BM25 + vector over docs/text)
Tier 5: STRUCTURAL → CodeGraph (AST symbols + call graph) ← NEW
Integration Table
| Skill | Relationship |
|---|---|
| cm-start | TRIGGERED AT: Step 0.5 (auto-detect, auto-setup) |
| cm-execution | CONSUMER: pre-flight context loading + impact warnings |
| cm-planning | CONSUMER: impact analysis for change proposals |
| cm-debugging | CONSUMER: caller/callee tracing for root cause |
| cm-brainstorm-idea | CONSUMER: architecture diagram + graph summary |
| cm-deep-search | COMPLEMENT: qmd = text search, codegraph = structural |
| cm-continuity | STORES: intelligence level + graph metadata |
| cm-tdd | CONSUMER: know all callers before refactoring |
| cm-safe-deploy | CONSUMER: impact analysis as pre-deploy gate |
| cm-dockit | CONSUMER: auto-generate architecture docs from graph |
Rules
DO:
- Auto-detect project size and select appropriate intelligence level
- Keep graph index fresh (sync before major tasks)
- Use codegraph_context INSTEAD of grep/glob for code exploration
- Generate architecture diagram at project init
- Store metadata in .cm/codegraph-meta.json
- Feed context to downstream skills (execution, planning, debugging)
DON'T:
- Force CodeGraph on tiny projects (<20 files)
- Skip freshness checks (a stale index is worse than no index)
- Use codegraph as a REPLACEMENT for qmd (they complement each other)
- Assume codegraph is installed; always check first
- Generate diagrams without validating Mermaid syntax
- Store sensitive code in architecture diagrams
Requirements
Layer 0 (Skeleton Index):
- ZERO dependencies (grep, find, awk: standard POSIX)
- Works on any OS (macOS, Linux, WSL)
- <4 seconds for 600-file projects
Layer 1 (CodeGraph):
- Node.js 18+ (for tree-sitter binaries)
- npx @colbymchenry/codegraph (one-time install)
- ~50MB disk for SQLite + embeddings per project
Layer 2 (Diagrams):
- No additional dependencies (uses agent's LLM)
- Mermaid.js knowledge (built into agent)
Layer 3 (Smart Context):
- Layer 0 required (always available)
- Layers 1 + 2 optional upgrades
- Optional: qmd for text search complement
The Bottom Line
Skeleton Index = instant understanding. Code graph = deep meaning. Architecture diagrams = big picture. Together = AI that truly understands your code.