§ 0 — The Wrong First Move

Most advice about AI code context starts too late. It assumes the repo has already been chosen, cloned, and promoted into the coding session. From there, the only remaining question seems to be how much of it to compress and hand to the model.

That is often useful. It is also often the wrong first move. Agents spend a lot of time in a messier state: comparing libraries, auditing dependencies, checking whether a pattern exists, or trying to find the one file that matters in a repository they may never clone.

In that state, "clone the repo and build a complete map" is not neutral. It is a commitment. A lot of workflows make that commitment early, then call the resulting pile of context agentic engineering.

It is not.

Agentic engineering is not "give the model more stuff." It is making the next action cheaper, narrower, and easier to verify.

"The question is not which context tool is best. The question is what decision the agent is trying to make right now."

Codemap, Aider, Gitingest, Repomix, and ghx all live near the same problem. They help agents see code without drowning in it. But they do not do the same job.

  • If the decision is "what should I send to the model?", use a packer.
  • If the decision is "how is this local codebase structured?", use a mapper.
  • If the decision is "is this remote repo even worth reading?", use ghx.

ghx is built around a deliberately narrow belief: the best context is often the context you do not load.


§ 1 — Mapping Is Not Packing

Code mapping is not repo packing

Two workflows get blurred together because both produce model-shaped context.

Repo packing prepares a large artifact for a model to consume: XML, Markdown, JSON, plain text, or some digest format. It answers: what should I send to the model?

Code mapping extracts structure before the agent reads implementation: imports, types, functions, classes, methods, line ranges, references, and sometimes comments. It answers: what should I read next?

The operating difference

A packed repo is a commitment to context. A map is a refusal to commit too early.

That refusal matters. The first full-file read is often where an agent goes wrong. It reads implementation before relevance is established, then the next prompt inherits helper functions, generated code, compatibility branches, comments, tests, and whatever else happened to be nearby.

The disciplined path asks structural questions first:

  • What files exist?
  • Which files define the public surface?
  • Which files contain the functions, types, or imports I care about?
  • Which file deserves a full read?

An agent trying to understand an unfamiliar package does not always need every source file. Often it only needs the skeleton.

ghx read gkoreli/ghx --map v2/pkg/ghx/explore.go
=== v2/pkg/ghx/explore.go (3111 bytes) ===
package ghx
import (
type FileEntry struct {
type ExploreResult struct {
func Explore(repo string, path string) (*ExploreResult, error) {

That map is not a replacement for reading the file. It is the step before reading the file.
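The fallback path ghx describes for unsupported languages is regex-based, and the idea behind it fits in a few lines. The sketch below is an illustrative toy, not ghx's implementation; the pattern set and the `map_skeleton` name are invented here:

```python
import re

# Lines that begin a top-level declaration in Go-like source.
# The pattern list is illustrative, not ghx's actual fallback rules.
DECL_PATTERN = re.compile(r"^(package |import |type \w+ |func \w+|const |var )")

def map_skeleton(source: str) -> list[str]:
    """Return only the declaration lines: a skeleton, not the file."""
    return [line for line in source.splitlines() if DECL_PATTERN.match(line)]

source = """package ghx

import "fmt"

type FileEntry struct {
\tName string
}

func Explore(repo string, path string) error {
\tfmt.Println(repo, path)
\treturn nil
}
"""

for line in map_skeleton(source):
    print(line)
```

Indented bodies never match the anchored pattern, so the output is exactly the declaration skeleton shown in the ghx example above.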

Without that step, the agent usually does this:

list directory
read likely file
read another likely file
read third likely file
realize the useful thing was in a fourth file

With ghx, the first pass can stay structural:

ghx read shadcn-ui/ui packages/shadcn/src/utils
ghx read shadcn-ui/ui "packages/shadcn/src/utils/*.ts" --map

The point is not the exact token count in a README example. The point is that the agent sees the neighborhood before choosing a house to enter.


§ 2 — The Search Trap

The GitHub code search correction

There is an attractive shortcut that does not work: "just use GitHub's symbol: search through the API."

GitHub.com has modern code search. The web UI understands queries like symbol:useState. That product is backed by richer infrastructure than the public code-search APIs expose.

  • REST /search/code still exists, but it uses the legacy code search engine.
  • gh search code says results may not match GitHub.com, and newer features like regex search are unavailable through the API.
  • GraphQL v4 has a search field, but SearchType does not include CODE.

So ghx search --symbol is not a cheap feature waiting to be wired up. Pretending otherwise would produce plausible garbage, which is worse than an error. It teaches the agent confidence at the exact moment it should be skeptical.

The honest design is smaller:

  1. Fetch files through documented GitHub APIs.
  2. Parse the content locally.
  3. Return compact structural maps.
  4. Fall back when a parser is unavailable.

That is why --kind is the right interface. It filters structural categories inside fetched files instead of pretending to be GitHub-wide symbol search.

ghx read owner/repo --map --kind func path/to/file.ts
ghx read owner/repo --map --kind type path/to/file.ts

The API cannot reliably answer "where is symbol X defined across GitHub?" ghx can answer "what functions, types, and imports does this fetched file contain?" That second question is smaller, but it is real.
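Once files are fetched and parsed locally, the `--kind` filter reduces to selecting entries by structural category. A minimal sketch; the `(kind, line)` entry shape is invented for illustration, not ghx's internal representation:

```python
# Each mapped entry pairs a structural kind with its declaration line.
# The tuple shape is invented for this sketch.
ENTRIES = [
    ("import", "import ("),
    ("type", "type FileEntry struct {"),
    ("type", "type ExploreResult struct {"),
    ("func", "func Explore(repo string, path string) (*ExploreResult, error) {"),
]

def filter_kind(entries, kind):
    """Keep only entries of one structural kind, like `--kind func`."""
    return [line for k, line in entries if k == kind]

print(filter_kind(ENTRIES, "func"))
print(filter_kind(ENTRIES, "type"))
```

The filter runs over content the tool has already fetched, which is why it stays honest: it never claims to know about symbols in files it has not seen.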

"Real beats pretend. Every time."

§ 3 — Tool Boundaries

The mature tools have different cost models

The weak tools in this category all fail the same way. They say "AI-friendly" and then hand the model a wall of text. The strong tools have a cost model. They know what they are making cheap.

Tool          | Optimizes for                 | Cost paid up front                        | Best output
Codemap       | Deep local understanding      | Local repo plus indexing                  | Structural index, references, maps
Aider repomap | Coding-session relevance      | Local repo plus active chat state         | Ranked context for Aider
Gitingest     | Fast digest generation        | Clone/fetch plus full digest pass         | Prompt-friendly repo extract
Repomix       | Complete AI-ready packaging   | File collection plus packing/compression  | XML, Markdown, JSON, or plain text artifact
ghx           | Remote first-pass exploration | GitHub API calls only for requested paths | Tree, map, grep, selective reads

Codemap: local structural indexing

Codemap is the strongest local code-mapping tool I found. Its README has the right vocabulary: full, standard, compact, minimal, and outline. The implementation backs it up with token-budget reduction, cache refresh, reference updates, nested symbol rendering, call graphs, type hierarchy, and dedicated extraction across several language families.

That is not a toy. Codemap is a local structural database with a CLI on top. It is deeper than ghx. ghx is earlier in the workflow. That difference is not a weakness to apologize for. It is the product boundary.

The point is not only that Codemap extracts symbols. It does the hard boring work around them: DETAIL_LEVELS, reduceDetailLevel, fitToBudget, cache refresh, reference updates, source-map generation, comments at the right detail levels, and reference summaries. Those details matter because repeated local analysis gets worse fast when the cache, budget, and renderer are naive.
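The fitToBudget idea is worth seeing in miniature: drop to a coarser detail level until the rendered map fits the token budget. This is a loose sketch of the concept, not Codemap's code; the level names mirror its documented levels, but the rendering and token estimate are invented:

```python
# Detail levels from richest to sparsest; names mirror Codemap's documented
# levels, but the rendering logic here is invented for the sketch.
LEVELS = ["full", "standard", "compact", "minimal", "outline"]

def render(symbols, level):
    """Pretend-render: richer levels keep more fragments per symbol."""
    keep = {"full": 4, "standard": 3, "compact": 2, "minimal": 1, "outline": 1}[level]
    return "\n".join(" ".join(parts[:keep]) for parts in symbols)

def fit_to_budget(symbols, budget_tokens):
    """Reduce detail until a crude word-count token proxy fits the budget."""
    for level in LEVELS:
        out = render(symbols, level)
        if len(out.split()) <= budget_tokens:
            return level, out
    return LEVELS[-1], render(symbols, LEVELS[-1])

symbols = [
    ["func", "Explore(repo,", "path)", "(*ExploreResult,error)"],
    ["type", "FileEntry", "struct", "{Name string}"],
]
level, out = fit_to_budget(symbols, budget_tokens=4)
print(level)  # a coarser level than "full", because the budget is tight
```

The naive version of this loop is exactly where tools degrade: without caching and a stable renderer, every budget change forces a full re-analysis.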

Aider repomap: context selection inside a coding agent

Aider's repo map is not a neutral "map my repo" product. It is part of Aider's coding loop. It extracts tags, caches them, tracks definitions and references, uses graph ranking, and fits the result to the active token budget. The map is intentionally biased by the current chat files, mentioned identifiers, and model context.

That makes sense. Aider already knows the conversation. ghx is for the earlier moment when an agent is still scouting.

The ranking is the important part. Aider is not just listing symbols. It uses Tree-sitter tags, cached definitions and references, mentioned files, mentioned identifiers, PageRank-style graph scoring, and token-budget fitting. A neutral map would be less useful inside an already-running coding session.
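The heart of that ranking is a PageRank-style pass over a definition/reference graph. A minimal power-iteration sketch over an invented file graph, not Aider's repomap.py:

```python
# Edges point from a file that references a symbol to the file defining it.
# The graph is invented; Aider builds its graph from Tree-sitter tags.
edges = {
    "cli.py": ["core.py", "utils.py"],
    "core.py": ["utils.py"],
    "utils.py": [],
}

def rank(edges, iterations=50, damping=0.85):
    """Standard power iteration with even redistribution for dangling nodes."""
    nodes = list(edges)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, outs in edges.items():
            if not outs:  # dangling node: spread its score evenly
                for n in nodes:
                    new[n] += damping * score[src] / len(nodes)
            else:
                for dst in outs:
                    new[dst] += damping * score[src] / len(outs)
        score = new
    return score

scores = rank(edges)
print(max(scores, key=scores.get))  # the most-referenced file ranks highest
```

Aider then biases these scores toward files and identifiers already in the chat, which is precisely what makes the map useful mid-session and less useful for cold scouting.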

Gitingest: prompt-friendly repo digests

Gitingest lowers friction from GitHub URL to LLM-readable digest. Clone handling, ignore patterns, tree walking, token estimates, and output formatting all serve that job. It is useful. It is also not exploration. Digesting a repo before you know what you need is just a polite way to overfeed the model.

The product idea is memorable because the interface is memorable: replace hub with ingest in a GitHub URL and get a prompt-shaped view of the repository. That is a good workflow. It is just a different workflow from scouting.

Repomix: full-repo packing with smart compression

Repomix is the most polished repo-packing tool in this set. It supports local and remote repositories, several output formats, config files, ignore handling, token counts, secret scanning, git logs, diffs, output splitting, GitHub Actions, Docker, and Tree-sitter compression through WASM.

When the job is "prepare this codebase for an LLM," use it. The mistake is reaching for a packer when the agent has not established which files matter.

Its Tree-sitter compression path is also a good example of practical engineering. Repomix uses WASM Tree-sitter so compression can stay cross-platform, avoid native compilation, bundle parsers, and accept the overhead where the distribution trade-off is worth it.

Tools reveal their philosophy through what they make cheap. Codemap makes repeated local structural queries cheap. Aider makes coding-session context cheap. Gitingest and Repomix make "send the repo to a model" cheap. ghx makes not cloning cheap.

Not cloning is not laziness. It is a different default. "Just clone it" is not engineering guidance. It is a local optimum disguised as wisdom. Sometimes you should clone it. Sometimes cloning is the wrong first IO operation.

The agent does not need a copy of the repo to ask what is in the repo.


§ 4 — The Cost Model

Agentic engineering is a cost model

The phrase "agentic engineering" gets abused because people want it to mean architecture plus vibes. It is simpler than that. In practice, agentic engineering is the discipline of shaping the environment so the agent can make the next correct move with less context, less guessing, and fewer irreversible actions.

That means the tool should care about:

  • what the agent knows right now
  • what the agent needs to decide next
  • what evidence would change that decision
  • how much irrelevant context the tool is about to inject
  • whether the output can be checked by the next command

The line

If a tool cannot answer those questions, it is not an agent tool yet. It is a content hose.

ghx is built from the opposite direction. It assumes the agent is not ready for the whole repo. It assumes the first answer should be small. It assumes the agent should be able to escalate: repo overview, tree, map, filtered map, grep, file read.

That is why this little binary has taken so much work. The hard part is not printing files from GitHub. Anyone can do that. The hard part is designing the sequence so agents stop doing the dumb expensive thing by default.


§ 5 — ghx Territory

Where ghx fits

ghx is not trying to become a local static-analysis database, a full coding agent, or a repo packer. Good. It should not.

Its bet is narrower: agents often need to inspect GitHub repos they have not cloned and may never clone. They compare libraries, audit dependencies, inspect examples, and check whether a pattern exists before reading implementation.

Clone-first tools add friction in that workflow:

  • A clone costs time.
  • A clone costs disk.
  • A clone may be unavailable in locked-down environments.
  • A clone is wasteful if the repo is only being sampled.
  • A local index is overkill if the agent only needs three files.

The deeper cost is posture. Once the repo is local, the temptation is to search broadly, read broadly, and stuff context broadly. That can be correct for implementation work. It is wasteful for reconnaissance.

Do not read the file yet.
First ask what files exist.
Then ask what those files contain.
Then read the smallest thing that can answer the question.

ghx keeps that first pass API-native:

ghx explore owner/repo
ghx tree owner/repo src
ghx read owner/repo --map "src/**/*.ts"
ghx read owner/repo --map --kind func path/to/file.ts
ghx read owner/repo --grep "pattern" path/to/file.ts
ghx read owner/repo path/to/file.ts

The model is progressive disclosure:

  1. Look at repo metadata and top-level files.
  2. List the tree.
  3. Map candidate files.
  4. Filter by symbol kind.
  5. Read only the files that matter.

The old --map implementation was regex-based. Useful, but limited. The current engine is parser-backed for common cases: Go uses go/ast; TypeScript, TSX, JavaScript, JSX, Python, and Rust use Tree-sitter through a CGo-free Go runtime. Unsupported languages fall back to regex.
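That routing decision can be sketched as a plain extension table with a regex fallback. The table below is a simplification of the routing the text describes; the engine labels are descriptive strings, not ghx's identifiers:

```python
# Extension -> engine label, simplified from the routing described above.
ENGINES = {
    ".go": "go/ast",
    ".ts": "tree-sitter", ".tsx": "tree-sitter",
    ".js": "tree-sitter", ".jsx": "tree-sitter",
    ".py": "tree-sitter", ".rs": "tree-sitter",
}

def route(path: str) -> str:
    """Pick a map engine for a file, falling back to regex."""
    for ext, engine in ENGINES.items():
        if path.endswith(ext):
            return engine
    return "regex"  # honest fallback for unsupported languages

print(route("v2/pkg/ghx/explore.go"))  # go/ast
print(route("src/utils/index.ts"))     # tree-sitter
print(route("scripts/build.zig"))      # regex
```

The fallback branch is the important one: an explicit "regex" answer tells the caller the map is approximate, instead of silently degrading.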

That keeps the promise intact: remote code orientation without cloning. The small interface hides real engineering choices: GraphQL batching, directory handling, glob expansion, parser routing, Go AST where it wins, Tree-sitter where it wins, regex fallback where it is honest, and output designed for agents instead of humans scrolling a terminal.

It also keeps ghx honest. If a parser is not available, ghx falls back. If GitHub's public API does not expose modern code search, ghx does not pretend that it does.

"Bad agent tools fail by looking successful." A query returns results, a context blob looks comprehensive, and the model reasons from the wrong slice.

If the tool cannot be honest about uncertainty, the model will be dishonest with confidence.

The product rule

ghx should give the agent enough structure to choose the next read. It should not pretend every GitHub investigation needs a local database, a giant prompt artifact, or a full call graph.


§ 6 — Try It

The smallest useful version

If you want the practical setup, the ghx README has the install commands, MCP config, and ghx skill output for teaching an agent how to use it. The whole idea fits in one instruction:

Agent rule

When inspecting a GitHub repo, use ghx explore, ghx tree, ghx read --map, or ghx read --grep before cloning, packing, or reading whole files.

A good first pass is intentionally boring:

npx @gkoreli/ghx explore owner/repo
npx @gkoreli/ghx read owner/repo --map "src/**/*.ts"
npx @gkoreli/ghx read owner/repo path/to/file.ts

§ 7 — The Decision Framework

Use the tool that matches the decision

Where is the code?
├── Local filesystem
│   ├── Need deep structural/indexed analysis? -> Codemap
│   ├── Coding inside Aider? -> Aider repo map
│   └── Need one packed prompt artifact? -> Repomix or Gitingest
└── Remote GitHub repo
    ├── Need a complete packed artifact? -> Repomix remote or Gitingest
    └── Need to inspect before reading/cloning? -> ghx
        ├── Start with repo shape -> ghx explore / ghx tree
        ├── Need structure -> ghx read --map
        ├── Need only functions/types/imports -> ghx read --map --kind
        └── Need implementation -> ghx read

The next ghx surface should follow the same rule. A top-level ghx map command would be useful for humans, but it should be a wrapper over read --map, not a second implementation. Repo-wide mapping should be capped and filtered by default. If a user wants the whole repo packed, Repomix and Gitingest already exist.

"Before you clone, map." Not always. Not forever. Just first.

That sentence is useful because it changes behavior. It tells the agent to delay the irreversible move from "I am investigating" to "I am loading context."

  • Do not clone before you know the repo matters.
  • Do not pack before you know the files matter.
  • Do not read implementation before structure.
  • Do not call legacy search modern.
  • Do not mistake more context for better engineering.

Codemap is the local structural index. Aider's repo map is context selection inside a coding agent. Gitingest is the quick repo digest. Repomix is the full-featured repo packer. ghx is the remote GitHub exploration layer.

The point is not to make ghx win every row of a comparison table. The point is to stop using the wrong row. Universal claims age badly. Narrow tools with clear cost models survive contact with real workflows.

That is why you do not always need Codemap. Sometimes you absolutely do. But sometimes the agent is standing outside the repo, hand on the doorknob, not yet sure it should go in.

That is ghx territory.

I trust a tool more when it can tell me what not to use it for.

Sources & Evidence
GitHub.com code search supports regex, boolean operators, specialized qualifiers, and symbol search in the web product. GitHub Code Search
GitHub documents the syntax for modern code search, including the symbol qualifier. Code Search Syntax
GraphQL has a search field, but its public SearchType enum does not include CODE. GraphQL Search
The live GraphQL schema checked on 2026-04-15 returned ISSUE, ISSUE_ADVANCED, ISSUE_SEMANTIC, ISSUE_HYBRID, REPOSITORY, USER, and DISCUSSION. SearchType Enum
GitHub CLI warns that code search results may not match GitHub.com and newer features such as regex are unavailable through the API. gh search code
GitHub keeps separate legacy code-search documentation for public API-era search behavior. Legacy Code Search
ghx search currently uses REST code search with text matches. ghx search.go
ghx read batches GitHub file and directory fetches, then maps fetched content locally. ghx read.go
ghx glob handling is part of the selective remote-read path. ghx glob.go
The map command design is captured in the ghx ADR for parser-backed mapping. ADR-0013
ghx parser routing is explicit: Go uses go/ast; TypeScript, TSX, JavaScript, JSX, Python, and Rust use gotreesitter; unsupported languages fall back to regex. mapengine types
The Go map engine uses go/ast instead of Tree-sitter for top-level declaration extraction. goast.go
The Tree-sitter-backed path handles the non-Go language map engine. treesitter.go
Codemap documents named detail levels, local code mapping, references, and call graph features. kcosr/codemap
Codemap implements token-budget reduction and source-map machinery in its core source map module. sourceMap.ts
Codemap rendering groups and formats symbols for mapped output. render.ts
Aider describes repo maps as context selected for the active coding conversation. Aider Repo Map
Aider repomap implementation uses tags, references, ranking, and token-budget fitting. repomap.py
Gitingest is a prompt-friendly repository digest tool with CLI and Python entry points. Gitingest
Gitingest clone handling, ingestion, and output formatting back the digest workflow. output_formatter.py
Repomix packs repositories into AI-friendly artifacts across local, remote, CI, Docker, and website workflows. Repomix
Repomix compression uses web-tree-sitter WASM in its Tree-sitter parsing path. parseFile.ts