Most advice about AI code context starts too late. It assumes the repo has already been chosen, cloned, and promoted into the coding session. From there, the only remaining question seems to be how much of it to compress and hand to the model.
That is often useful. It is also often the wrong first move. Agents spend a lot of time in a messier state: comparing libraries, auditing dependencies, checking whether a pattern exists, or trying to find the one file that matters in a repository they may never clone.
In that state, "clone the repo and build a complete map" is not neutral. It is a commitment. A lot of workflows make that commitment early, then call the resulting pile of context agentic engineering.
It is not.
Agentic engineering is not "give the model more stuff." It is making the next action cheaper, narrower, and easier to verify.
"The question is not which context tool is best. The question is what decision the agent is trying to make right now."
Codemap, Aider, Gitingest, Repomix, and ghx all live near the same problem. They help agents see code without drowning in it. But they do not do the same job.
- If the decision is what should I send to the model?, use a packer.
- If the decision is how is this local codebase structured?, use a mapper.
- If the decision is is this remote repo even worth reading?, use ghx.
ghx is built around a deliberately narrow belief: the best context is often the context you do not load.
Code mapping is not repo packing
Two workflows get blurred together because both produce model-shaped context.
Repo packing prepares a large artifact for a model to consume: XML, Markdown, JSON, plain text, or some digest format. It answers: what should I send to the model?
Code mapping extracts structure before the agent reads implementation: imports, types, functions, classes, methods, line ranges, references, and sometimes comments. It answers: what should I read next?
A packed repo is a commitment to context. A map is a refusal to commit too early.
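The structural pass a mapper performs can be sketched with Go's standard library (the post later notes ghx uses go/ast for Go files). The function name mapGoFile and the one-line output format here are illustrative, not ghx's actual API:

```go
// Sketch: extract the structural skeleton of a Go source file
// (imports, types, functions) without touching implementation bodies.
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// mapGoFile returns one line per top-level import, type, and func.
func mapGoFile(src string) []string {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "file.go", src, 0)
	if err != nil {
		return []string{"parse error: " + err.Error()}
	}
	var out []string
	for _, d := range f.Decls {
		switch decl := d.(type) {
		case *ast.GenDecl:
			for _, spec := range decl.Specs {
				switch s := spec.(type) {
				case *ast.ImportSpec:
					out = append(out, "import "+s.Path.Value)
				case *ast.TypeSpec:
					out = append(out, "type "+s.Name.Name)
				}
			}
		case *ast.FuncDecl:
			out = append(out, "func "+decl.Name.Name)
		}
	}
	return out
}

func main() {
	src := `package ghx

import "fmt"

type FileEntry struct{ Name string }

func Explore(repo, path string) error { fmt.Println(repo, path); return nil }
`
	for _, line := range mapGoFile(src) {
		fmt.Println(line)
	}
}
```

Walking f.Decls instead of reading bodies is the whole trick: the skeleton comes out, the implementation stays behind.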
That refusal matters. The first full-file read is often where an agent goes wrong. It reads implementation before relevance is established, then the next prompt inherits helper functions, generated code, compatibility branches, comments, tests, and whatever else happened to be nearby.
The disciplined path asks structural questions first:
- What files exist?
- Which files define the public surface?
- Which files contain the functions, types, or imports I care about?
- Which file deserves a full read?
An agent trying to understand an unfamiliar package does not always need every source file. Often it only needs the skeleton.
ghx read gkoreli/ghx --map v2/pkg/ghx/explore.go
=== v2/pkg/ghx/explore.go (3111 bytes) ===
package ghx
import (
type FileEntry struct {
type ExploreResult struct {
func Explore(repo string, path string) (*ExploreResult, error) {
That map is not a replacement for reading the file. It is the step before reading the file.
Without that step, the agent usually does this:
- list the directory
- read a likely file
- read another likely file
- read a third likely file
- realize the useful thing was in a fourth file
With ghx, the first pass can stay structural:
ghx read shadcn-ui/ui packages/shadcn/src/utils
ghx read shadcn-ui/ui "packages/shadcn/src/utils/*.ts" --map
The point is not the exact token count in a README example. The point is that the agent sees the neighborhood before choosing a house to enter.
The GitHub code search correction
There is an attractive shortcut that does not work: "just use GitHub's symbol: search through the API."
GitHub.com has modern code search. The web UI understands queries like symbol:useState. That product is backed by richer infrastructure than the public code-search APIs expose.
- REST /search/code still exists, but it uses the legacy code search engine. gh search code says results may not match GitHub.com, and newer features like regex search are unavailable through the API.
- GraphQL v4 has a search field, but SearchType does not include CODE.
So ghx search --symbol is not a cheap feature waiting to be wired up. Pretending otherwise would produce plausible garbage, which is worse than an error. It teaches the agent confidence at the exact moment it should be skeptical.
The honest design is smaller:
- Fetch files through documented GitHub APIs.
- Parse the content locally.
- Return compact structural maps.
- Fall back when a parser is unavailable.
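The fallback itself can be this small. Here is a hedged sketch of a regex-based map for languages without a dedicated parser; the patterns are illustrative, not ghx's actual fallback rules:

```go
// Sketch: a regex fallback that keeps only lines resembling
// structural declarations, for languages with no real parser.
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// declLine matches lines that look like imports or declarations
// across a few common languages (illustrative patterns only).
var declLine = regexp.MustCompile(`^\s*(def |class |function |fn |import |from \S+ import )`)

// fallbackMap returns the trimmed declaration-like lines of src.
func fallbackMap(src string) []string {
	var out []string
	for _, line := range strings.Split(src, "\n") {
		if declLine.MatchString(line) {
			out = append(out, strings.TrimSpace(line))
		}
	}
	return out
}

func main() {
	src := `import math

class Circle:
    def area(self):
        return math.pi * self.r ** 2
`
	for _, l := range fallbackMap(src) {
		fmt.Println(l)
	}
}
```

A fallback like this misses edge cases a parser would catch, which is exactly why labeling it a fallback, rather than pretending it is symbol search, is the honest move.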
That is why --kind is the right interface. It filters structural categories inside fetched files instead of pretending to be GitHub-wide symbol search.
ghx read owner/repo --map --kind func path/to/file.ts
ghx read owner/repo --map --kind type path/to/file.ts
The API cannot reliably answer "where is symbol X defined across GitHub?" ghx can answer "what functions, types, and imports does this fetched file contain?" That second question is smaller, but it is real.
"Real beats pretend. Every time."
The mature tools have different cost models
The weak tools in this category all fail the same way. They say "AI-friendly" and then hand the model a wall of text. The strong tools have a cost model. They know what they are making cheap.
| Tool | Optimizes for | Cost paid up front | Best output |
|---|---|---|---|
| Codemap | Deep local understanding | Local repo plus indexing | Structural index, references, maps |
| Aider repomap | Coding-session relevance | Local repo plus active chat state | Ranked context for Aider |
| Gitingest | Fast digest generation | Clone/fetch plus full digest pass | Prompt-friendly repo extract |
| Repomix | Complete AI-ready packaging | File collection plus packing/compression | XML, Markdown, JSON, or plain text artifact |
| ghx | Remote first-pass exploration | GitHub API calls only for requested paths | Tree, map, grep, selective reads |
Codemap: local structural indexing
Codemap is the strongest local code-mapping tool I found. Its README has the right vocabulary: full, standard, compact, minimal, and outline. The implementation backs it up with token-budget reduction, cache refresh, reference updates, nested symbol rendering, call graphs, type hierarchy, and dedicated extraction across several language families.
That is not a toy. Codemap is a local structural database with a CLI on top. It is deeper than ghx. ghx is earlier in the workflow. That difference is not a weakness to apologize for. It is the product boundary.
The point is not only that Codemap extracts symbols. It does the hard boring work around them: DETAIL_LEVELS, reduceDetailLevel, fitToBudget, cache refresh, reference updates, source-map generation, comments at the right detail levels, and reference summaries. Those details matter because repeated local analysis gets worse fast when the cache, budget, and renderer are naive.
Aider repomap: context selection inside a coding agent
Aider's repo map is not a neutral "map my repo" product. It is part of Aider's coding loop. It extracts tags, caches them, tracks definitions and references, uses graph ranking, and fits the result to the active token budget. The map is intentionally biased by the current chat files, mentioned identifiers, and model context.
That makes sense. Aider already knows the conversation. ghx is for the earlier moment when an agent is still scouting.
The ranking is the important part. Aider is not just listing symbols. It uses Tree-sitter tags, cached definitions and references, mentioned files, mentioned identifiers, PageRank-style graph scoring, and token-budget fitting. A neutral map would be less useful inside an already-running coding session.
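The graph-scoring idea fits in a few lines. This miniature omits everything Aider layers on top (Tree-sitter tags, chat files, mentioned identifiers, token-budget fitting) and uses made-up file names; it only shows why heavily referenced files float to the top:

```go
// Sketch: PageRank-style scoring over "file A references file B"
// edges. Files referenced by many other files score highest.
package main

import (
	"fmt"
	"sort"
)

// rank runs a few power-iteration steps with standard damping.
func rank(edges map[string][]string, iters int) map[string]float64 {
	nodes := map[string]bool{}
	for from, tos := range edges {
		nodes[from] = true
		for _, to := range tos {
			nodes[to] = true
		}
	}
	n := float64(len(nodes))
	score := map[string]float64{}
	for node := range nodes {
		score[node] = 1 / n
	}
	const damping = 0.85
	for i := 0; i < iters; i++ {
		next := map[string]float64{}
		for node := range nodes {
			next[node] = (1 - damping) / n
		}
		for from, tos := range edges {
			if len(tos) == 0 {
				continue
			}
			share := damping * score[from] / float64(len(tos))
			for _, to := range tos {
				next[to] += share
			}
		}
		score = next
	}
	return score
}

func main() {
	// Hypothetical repo: three files all reference explore.go.
	edges := map[string][]string{
		"cli.go":  {"explore.go", "read.go"},
		"read.go": {"explore.go"},
		"map.go":  {"explore.go"},
	}
	score := rank(edges, 20)
	names := make([]string, 0, len(score))
	for name := range score {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool { return score[names[i]] > score[names[j]] })
	for _, name := range names {
		fmt.Printf("%-12s %.3f\n", name, score[name])
	}
}
```

Bias the initial scores toward chat files and mentioned identifiers, as Aider does, and the same iteration produces a session-specific ranking instead of a neutral one.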
Gitingest: prompt-friendly repo digests
Gitingest lowers friction from GitHub URL to LLM-readable digest. Clone handling, ignore patterns, tree walking, token estimates, and output formatting all serve that job. It is useful. It is also not exploration. Digesting a repo before you know what you need is just a polite way to overfeed the model.
The product idea is memorable because the interface is memorable: replace hub with ingest in a GitHub URL and get a prompt-shaped view of the repository. That is a good workflow. It is just a different workflow from scouting.
Repomix: full-repo packing with smart compression
Repomix is the most polished repo-packing tool in this set. It supports local and remote repositories, several output formats, config files, ignore handling, token counts, secret scanning, git logs, diffs, output splitting, GitHub Actions, Docker, and Tree-sitter compression through WASM.
When the job is "prepare this codebase for an LLM," use it. The mistake is reaching for a packer when the agent has not established which files matter.
Its Tree-sitter compression path is also a good example of practical engineering. Repomix uses WASM Tree-sitter so compression can stay cross-platform, avoid native compilation, bundle parsers, and accept the overhead where the distribution trade-off is worth it.
Tools reveal their philosophy through what they make cheap. Codemap makes repeated local structural queries cheap. Aider makes coding-session context cheap. Gitingest and Repomix make "send the repo to a model" cheap. ghx makes not cloning cheap.
Not cloning is not laziness. It is a different default. "Just clone it" is not engineering guidance. It is a local optimum disguised as wisdom. Sometimes you should clone it. Sometimes cloning is the wrong first I/O operation.
The agent does not need a copy of the repo to ask what is in the repo.
Agentic engineering is a cost model
The phrase "agentic engineering" gets abused because people want it to mean architecture plus vibes. It is simpler than that. In practice, agentic engineering is the discipline of shaping the environment so the agent can make the next correct move with less context, less guessing, and fewer irreversible actions.
That means the tool should care about:
- what the agent knows right now
- what the agent needs to decide next
- what evidence would change that decision
- how much irrelevant context the tool is about to inject
- whether the output can be checked by the next command
If a tool cannot answer those questions, it is not an agent tool yet. It is a content hose.
ghx is built from the opposite direction. It assumes the agent is not ready for the whole repo. It assumes the first answer should be small. It assumes the agent should be able to escalate: repo overview, tree, map, filtered map, grep, file read.
That is why this little binary has taken so much work. The hard part is not printing files from GitHub. Anyone can do that. The hard part is designing the sequence so agents stop doing the dumb expensive thing by default.
Where ghx fits
ghx is not trying to become a local static-analysis database, a full coding agent, or a repo packer. Good. It should not.
Its bet is narrower: agents often need to inspect GitHub repos they have not cloned and may never clone. They compare libraries, audit dependencies, inspect examples, and check whether a pattern exists before reading implementation.
Clone-first tools add friction in that workflow:
- A clone costs time.
- A clone costs disk.
- A clone may be unavailable in locked-down environments.
- A clone is wasteful if the repo is only being sampled.
- A local index is overkill if the agent only needs three files.
The deeper cost is posture. Once the repo is local, the temptation is to search broadly, read broadly, and stuff context broadly. That can be correct for implementation work. It is wasteful for reconnaissance.
Do not read the file yet.
First ask what files exist.
Then ask what those files contain.
Then read the smallest thing that can answer the question.
ghx keeps that first pass API-native:
ghx explore owner/repo
ghx tree owner/repo src
ghx read owner/repo --map "src/**/*.ts"
ghx read owner/repo --map --kind func path/to/file.ts
ghx read owner/repo --grep "pattern" path/to/file.ts
ghx read owner/repo path/to/file.ts
The model is progressive disclosure:
- Look at repo metadata and top-level files.
- List the tree.
- Map candidate files.
- Filter by symbol kind.
- Read only the files that matter.
The old --map implementation was regex-based. Useful, but limited. The current engine is parser-backed for common cases: Go uses go/ast; TypeScript, TSX, JavaScript, JSX, Python, and Rust use Tree-sitter through a CGo-free Go runtime. Unsupported languages fall back to regex.
That keeps the promise intact: remote code orientation without cloning. The small interface hides real engineering choices: GraphQL batching, directory handling, glob expansion, parser routing, Go AST where it wins, Tree-sitter where it wins, regex fallback where it is honest, and output designed for agents instead of humans scrolling a terminal.
It also keeps ghx honest. If a parser is not available, ghx falls back. If GitHub's public API does not expose modern code search, ghx does not pretend that it does.
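One documented GitHub GraphQL pattern for batching is aliasing repeated object(expression:) fields on a repository, so several file reads share one round trip. Whether ghx builds its query exactly this way is an assumption; the sketch below only shows the shape:

```go
// Sketch: build one GraphQL query that fetches several file blobs
// from a GitHub repo via aliased object(expression:) fields.
package main

import "fmt"

// batchQuery returns a GraphQL query string requesting each path
// at HEAD as an aliased Blob (f0, f1, ...).
func batchQuery(owner, name string, paths []string) string {
	q := fmt.Sprintf("query {\n  repository(owner: %q, name: %q) {\n", owner, name)
	for i, p := range paths {
		q += fmt.Sprintf("    f%d: object(expression: %q) { ... on Blob { text byteSize } }\n", i, "HEAD:"+p)
	}
	return q + "  }\n}"
}

func main() {
	fmt.Println(batchQuery("gkoreli", "ghx", []string{
		"v2/pkg/ghx/explore.go",
		"README.md",
	}))
}
```

One request instead of one per file is the difference between a cheap first pass and a chatty one, which is the whole point of staying API-native.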
"Bad agent tools fail by looking successful." A query returns results, a context blob looks comprehensive, and the model reasons from the wrong slice.
If the tool cannot be honest about uncertainty, the model will be dishonest with confidence.
ghx should give the agent enough structure to choose the next read. It should not pretend every GitHub investigation needs a local database, a giant prompt artifact, or a full call graph.
The smallest useful version
If you want the practical setup, the ghx README has the install commands, MCP config, and ghx skill output for teaching an agent how to use it. The whole idea fits in one instruction:
When inspecting a GitHub repo, use ghx explore, ghx tree, ghx read --map, or ghx read --grep before cloning, packing, or reading whole files.
A good first pass is intentionally boring:
npx @gkoreli/ghx explore owner/repo
npx @gkoreli/ghx read owner/repo --map "src/**/*.ts"
npx @gkoreli/ghx read owner/repo path/to/file.ts
Use the tool that matches the decision
Where is the code?
├── Local filesystem
│ ├── Need deep structural/indexed analysis? -> Codemap
│ ├── Coding inside Aider? -> Aider repo map
│ └── Need one packed prompt artifact? -> Repomix or Gitingest
└── Remote GitHub repo
├── Need a complete packed artifact? -> Repomix remote or Gitingest
└── Need to inspect before reading/cloning? -> ghx
├── Start with repo shape -> ghx explore / ghx tree
├── Need structure -> ghx read --map
├── Need only functions/types/imports -> ghx read --map --kind
└── Need implementation -> ghx read
The next ghx surface should follow the same rule. A top-level ghx map command would be useful for humans, but it should be a wrapper over read --map, not a second implementation. Repo-wide mapping should be capped and filtered by default. If a user wants the whole repo packed, Repomix and Gitingest already exist.
"Before you clone, map." Not always. Not forever. Just first.
That sentence is useful because it changes behavior. It tells the agent to delay the irreversible move from "I am investigating" to "I am loading context."
- Do not clone before you know the repo matters.
- Do not pack before you know the files matter.
- Do not read implementation before structure.
- Do not call legacy search modern.
- Do not mistake more context for better engineering.
Codemap is the local structural index. Aider's repo map is context selection inside a coding agent. Gitingest is the quick repo digest. Repomix is the full-featured repo packer. ghx is the remote GitHub exploration layer.
The point is not to make ghx win every row of a comparison table. The point is to stop using the wrong row. Universal claims age badly. Narrow tools with clear cost models survive contact with real workflows.
That is why you do not always need Codemap. Sometimes you absolutely do. But sometimes the agent is standing outside the repo, hand on the doorknob, not yet sure it should go in.
That is ghx territory.
I trust a tool more when it can tell me what not to use it for.