gh shim
A gh-compatible binary that answers from local SQLite first and falls through to the real gh for everything else. The fastest way to cut GitHub API load across an agent fleet.
#What it is
The same gitcrawl binary serves a gh-compatible mode. Invoked as gitcrawl gh ..., or as gh / gitcrawl-gh via symlink, it intercepts read-only commands and serves them from the local mirror. Anything it cannot serve locally falls through to the real gh binary you already have installed, with a short persistent cache layered on top.
The shim never adds GitHub write behavior. Mutating commands (gh issue close, gh pr merge, gh api -X POST ..., gh label create, etc.) pass straight through to the real gh, increment a write counter, and clear the relevant cache entries on success.
#Install
# Side-by-side: agents opt in by calling `gitcrawl-gh`.
mkdir -p "$HOME/bin"
ln -sf "$(command -v gitcrawl)" "$HOME/bin/gitcrawl-gh"
# Or replace the global `gh` so every caller picks up the cache automatically.
REAL_GH="$(command -v gh)" # capture this before shadowing gh
ln -sf "$(command -v gitcrawl)" "$HOME/bin/gh"
export GITCRAWL_GH_PATH="$REAL_GH" # tell the shim where the real gh is
Make sure ~/bin is on PATH before the original gh location if you want the shim to be picked up as gh. If GITCRAWL_GH_PATH is unset, the shim probes common Homebrew paths and then PATH. Set it explicitly when you replace the global gh so the shim does not recurse into itself.
#Supported local reads
#gh search issues|prs
gh search issues "download stalls" -R owner/repo --state open \
--match comments --json number,title,url
gh search prs "manifest cache" -R owner/repo --state open \
--json number,title,url --limit 20
Answered from the local FTS index. Honors --state, --json, --limit. --match is accepted for parity (the local index already covers documents). Falls through if an unsupported filter combination is requested.
#gh issue view / gh pr view
gh issue view 123 -R owner/repo --json number,title,state,url,body,labels,author
gh pr view 123 -R owner/repo --json number,title,state,url,isDraft,author,headRef,baseRef
Supported JSON fields include number, title, state, url, body, author, createdAt, updatedAt, closedAt, labels, plus PR-specific isDraft, headRef, baseRef. PR detail fields (files, commits, checks, statusCheckRollup) are answered from cached PR detail and trigger auto-hydration on miss.
#gh issue list / gh pr list
gh issue list -R owner/repo --state open --search "hot loop" \
--author octocat --label bug --label triage --json number,title,url
gh pr list -R owner/repo --state open --search "manifest cache" \
--assignee me --json number,title,url
Supports --state, --search (keyword search), --author, --assignee, repeated --label, --limit, and --json. Falls through for unsupported filters.
#gh pr checks
gh pr checks 123 -R owner/repo --json name,state,conclusion,detailsUrl
Returns the cached check/status summary for the PR. If the cached PR detail is older than 90 seconds or its head SHA is stale, auto-hydration refreshes it before answering. Supported fields: name, state, status, conclusion, detailsUrl, workflow, startedAt, completedAt.
#gh run list / gh run view
gh run list -R owner/repo --branch main --limit 20 \
--json databaseId,workflowName,status,conclusion
gh run view 123456789 -R owner/repo --json status,conclusion,headSha
Workflow runs come from cached PR detail. Filters: --branch, --commit (head SHA). Supported fields: databaseId, id, number, workflowName, name, displayTitle, status, conclusion, url, event, headBranch, headSha, createdAt, updatedAt.
#Read-only fallthroughs (cached)
These commands always run real gh but the response body is cached for the next caller in the same workspace:
gh pr diff— keyed by the cached PR head SHA when available, so the cache is stable across many sequential agent readsgh issue list/status/view,gh pr list/status/view/checks, and unsupported read-only local shim shapesgh release list/view,gh workflow list/view,gh secret list, andgh variable get/listgh project list/view/field-list/item-list,gh ruleset check/list/view,gh gist list/view, andgh org listgh repo view/gh repo listgh search code/commits/issues/prs/reposgh label listgh api <GET path>— onlyGETrequests; never cached forPOST/PATCH/DELETE/PUT
Default cache TTL is short (30 seconds for most reads, 60 seconds for gh api, 5 minutes for gh pr diff without a stable SHA, 7 days for gh pr diff with a stable SHA). Override with GITCRAWL_GH_CACHE_TTL=5m or similar.
Repeat read failures are cached by default too. That avoids a fleet of agents all rediscovering the same missing release, workflow, secret, or unsupported field. Set GITCRAWL_GH_CACHE_ERRORS=0 to cache successful reads only.
#Auto-hydration
When a local PR-detail read misses the cache, the shim can auto-hydrate exactly one PR before falling back:
- Shim detects missing or stale PR detail (older than 90s, or head SHA mismatch)
- If
GITCRAWL_GH_AUTO_HYDRATE != 0(the default), runsgitcrawl sync --numbers <n> --with pr-details - Retries the local query against the freshly populated cache
- Falls through to the real
ghif hydration failed
This keeps gh pr view, gh pr checks, and gh run reads cheap and fresh without manual sync orchestration. Disable with GITCRAWL_GH_AUTO_HYDRATE=0 if you want the shim to be strictly cache-or-fallthrough.
#Cache inspection: xcache
gitcrawl gh xcache stats # summary
gitcrawl gh xcache keys # per-entry detail
gitcrawl gh xcache gc # remove expired entries + stale lock files
gitcrawl gh xcache flush # clear everything
All accept --json for scripting.
stats JSON:
{
"cache_dir": "/Users/me/.config/gitcrawl/cache/gh-shim",
"entries": 142,
"expired": 6,
"locks": 0,
"bytes": 1841234,
"counters": {
"local_hits": 540,
"fallback_hits": 88,
"backend_misses": 12,
"pass_through_writes": 4
},
"commands": {
"pr diff": { "entries": 30, "bytes": 184320 },
"release view": { "entries": 14, "bytes": 18230 }
}
}
local_hits are answered from SQLite; fallback_hits are answered from the fallthrough cache; backend_misses actually hit GitHub. Watching the ratio is the easiest way to confirm the shim is paying for itself.
#Cache key composition
Cache keys are deterministic SHA-256 hashes of:
- A version tag (
v2) - The resolved gitcrawl config path
- The current working directory
- The
GH_HOSTenv var - The
GH_REPOenv var - For
gh pr diff: the stable identitypr-diff:owner/repo:number:head-sha(when available) - The full command argument vector, null-separated
This isolates sibling checkouts and portable stores while still coalescing repeated calls from the same agent workspace. Concurrent cache misses use a lock file so one process populates the entry while peers wait for the result, instead of all of them firing at GitHub.
#What does not flow through the shim
- Mutating commands —
gh issue close,gh pr merge,gh pr comment,gh api -X POST, etc. These pass straight through, incrementpass_through_writes, and clear the relevant cache entries on success. - Auth flows —
gh auth login,gh auth refresh, etc. Always realgh. - Anything the shim does not recognize — falls through unmodified.
#Agent integration
Pattern: replace gh with gitcrawl-gh (or symlink to gh) for every agent in the fleet, then keep your existing prompts and tools. Most read-only triage flows ("look up this issue", "check the PR status", "list open issues for this label") become local-only without any prompt changes.
For best results, schedule a periodic gitcrawl refresh owner/repo (every few minutes per repo, depending on activity) so the local mirror stays warm. The shim's --sync-if-stale (via gitcrawl search) and auto-hydration handle the rest.
See Automation for full agent recipes and JSON contracts.