Reference

Lookup tables for paths, environment variables, and defaults.

#Paths

| Path | Purpose |
| --- | --- |
| ~/.config/gitcrawl/config.toml | Configuration file |
| ~/.config/gitcrawl/gitcrawl.db | SQLite database |
| ~/.config/gitcrawl/cache/ | Caches (PR detail, gh-shim fallthrough) |
| ~/.config/gitcrawl/cache/gh-shim/ | gh-shim fallthrough cache |
| ~/.config/gitcrawl/vectors/ | Vector store backing embeddings |
| ~/.config/gitcrawl/logs/ | Operational logs |
| ~/.config/gitcrawl/portable/ | Portable-store checkout (when configured) |

Override the config root with --config <path> or GITCRAWL_CONFIG.
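
As an illustration of the override, the lookup can be sketched as below. The precedence (flag first, then environment variable, then the default) is an assumption based on common CLI convention; this page does not state the order. The helper name `resolve_config_path` is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default location from the Paths table.
DEFAULT_CONFIG = "~/.config/gitcrawl/config.toml"


def resolve_config_path(cli_config: Optional[str] = None,
                        env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the config path: --config value, then GITCRAWL_CONFIG, then default.

    Assumption: the flag wins over the environment variable.
    """
    env = os.environ if env is None else env
    chosen = cli_config or env.get("GITCRAWL_CONFIG") or DEFAULT_CONFIG
    return Path(chosen).expanduser()
```

Calling `resolve_config_path()` with no arguments reads the real environment; passing an explicit mapping keeps the function easy to test.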

#Environment variables

#Core

| Variable | Default | Used by | Purpose |
| --- | --- | --- | --- |
| GITCRAWL_CONFIG | ~/.config/gitcrawl/config.toml | All commands | Override config path |
| GITCRAWL_DB_PATH | ~/.config/gitcrawl/gitcrawl.db | All commands | Override database path |
| GITHUB_TOKEN | (none) | sync, gh shim | GitHub API token |
| OPENAI_API_KEY | (none) | embed, refresh | OpenAI API key |

#Models

| Variable | Default | Purpose |
| --- | --- | --- |
| GITCRAWL_SUMMARY_MODEL | gpt-5.4 | Summary model (reserved for future commands) |
| GITCRAWL_EMBED_MODEL | text-embedding-3-small | OpenAI embedding model |
| GITCRAWL_OPENAI_RETRY_DISABLED | (off) | Set to 1 to disable OpenAI retry/backoff |
| GITCRAWL_OPENAI_BASE_URL / OPENAI_BASE_URL | OpenAI default | Custom OpenAI endpoint |

#GitHub overrides

| Variable | Default | Purpose |
| --- | --- | --- |
| GITCRAWL_GITHUB_BASE_URL / GITHUB_BASE_URL | GitHub default | Custom GitHub API endpoint |
| GH_HOST | (none) | Included in gh-shim cache key |
| GH_REPO | (none) | Default -R value; included in gh-shim cache key |

#gh shim

| Variable | Default | Purpose |
| --- | --- | --- |
| GITCRAWL_GH_PATH | (probed) | Path to the real gh binary |
| GITCRAWL_GH_AUTO_HYDRATE | (on) | Set to 0 to disable PR auto-hydration on cache miss |
| GITCRAWL_GH_CACHE_TTL | 30s for most commands | Override fallthrough cache TTL (e.g., 5m, 1h) |
| GITCRAWL_GH_CACHE_ERRORS | (on) | Set to 0 to avoid caching read-only fallthroughs that exit non-zero |

#Configuration defaults

| Field | Default |
| --- | --- |
| summary_model | gpt-5.4 |
| embed_model | text-embedding-3-small |
| embed_dimensions | 1024 |
| embedding_basis | title_original |
| batch_size (embeddings) | 64 |
| concurrency (embeddings) | 2 |
| tui_default_sort | size |

#Clustering defaults

| Parameter | Default | Source |
| --- | --- | --- |
| --threshold | 0.80 | cluster, refresh |
| --cross-kind-threshold | 0.93 | cluster, refresh |
| --min-size | 1 | cluster, refresh |
| --max-cluster-size | 40 | cluster, refresh |
| --k (nearest-neighbor fanout) | 16 | cluster, refresh |
| Weak-edge title overlap floor | 0.18 | internal |
| High-confidence edge score | 0.90 | internal |
| Deterministic reference edge score | 0.94 | internal |
| Body-only reference prefix length | 240 chars | internal |

#TUI defaults

| Parameter | Default |
| --- | --- |
| --min-size | 5 |
| --sort | size |
| Working set limit | 500 rows |
| Refresh interval | 15s |

#gh shim cache TTLs

| Cache class | TTL |
| --- | --- |
| Most read-only fallthroughs | 30s |
| gh api (GET) | 60s |
| gh pr diff without stable head SHA | 5m |
| gh pr diff with stable head SHA | 7d |
| Override | GITCRAWL_GH_CACHE_TTL |
| Cache read failures | on by default; disable with GITCRAWL_GH_CACHE_ERRORS=0 |
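
The TTL classes above can be read as a small decision function. The sketch below is illustrative only: `parse_ttl` and `pick_ttl` are hypothetical helpers, the duration syntax is limited to a single unit (`s`, `m`, `h`, `d`), the `gh api` branch does not check that the request is a GET, and it assumes the `GITCRAWL_GH_CACHE_TTL` override applies to every class, which this page does not confirm.

```python
import re
from datetime import timedelta
from typing import Optional, Sequence

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_ttl(text: str) -> timedelta:
    """Parse a single-unit duration such as '30s', '5m', '1h', or '7d'."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported TTL: {text!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def pick_ttl(args: Sequence[str], head_sha: Optional[str] = None,
             override: Optional[str] = None) -> timedelta:
    """Choose a TTL for a gh fallthrough from the documented classes.

    args is the gh argument vector without the leading 'gh'.
    """
    if override:  # value of GITCRAWL_GH_CACHE_TTL, when set (assumed global)
        return parse_ttl(override)
    if list(args[:2]) == ["pr", "diff"]:
        # A known head SHA makes the diff stable, so it can live longer.
        return parse_ttl("7d" if head_sha else "5m")
    if list(args[:1]) == ["api"]:
        return parse_ttl("60s")  # simplification: HTTP method is not inspected
    return parse_ttl("30s")
```

For example, `pick_ttl(["pr", "diff", "12"], head_sha="abc123")` yields a seven-day TTL, while the same call without a head SHA yields five minutes.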

#gh shim cache key composition

A SHA-256 hash of:

  • Version tag (v2)
  • Resolved gitcrawl config path
  • Current working directory
  • GH_HOST env var
  • GH_REPO env var
  • For gh pr diff: pr-diff:owner/repo:number:head-sha (when head SHA is known)
  • Full command argument vector (null-separated)

This isolates sibling checkouts and portable stores while coalescing repeated calls from the same workspace.
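
To make the composition concrete, here is a minimal sketch that hashes the listed components in the listed order. The function name `shim_cache_key`, the field order, and the length-prefixed framing are assumptions of this example; only the component list and the null-separated argument vector come from this page, so the digests will not match gitcrawl's real keys.

```python
import hashlib
from typing import Optional, Sequence


def shim_cache_key(config_path: str, cwd: str, gh_host: str, gh_repo: str,
                   argv: Sequence[str],
                   pr_diff_tag: Optional[str] = None) -> str:
    """Return a hex SHA-256 digest over the cache key components."""
    parts = ["v2", config_path, cwd, gh_host, gh_repo]
    if pr_diff_tag is not None:
        # Shape from the list above: pr-diff:owner/repo:number:head-sha
        parts.append(pr_diff_tag)
    parts.append("\0".join(argv))  # full argument vector, null-separated

    digest = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        # Length-prefix each field so neighbouring fields cannot run together.
        # This framing is an assumption of the sketch.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

Two calls that differ only in `cwd` produce different keys, which is the isolation property described above; identical calls from the same workspace produce the same key and therefore share a cache entry.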

#Output formats

| Format | Where to use |
| --- | --- |
| text | Human terminal use (default) |
| json | Pipelines, scripts, agents (also via --json) |
| log | Internal structured logging output |

#Exit codes

  • 0 — success
  • non-zero — usage error, "not implemented" command, or runtime failure

stderr always carries error messages. stdout is reserved for command output.
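
For scripts and agents, the exit-code and stream conventions above suggest a wrapper along these lines. This is a sketch, not part of gitcrawl: `run_json` is a hypothetical helper that works with any command printing JSON on stdout, and it assumes the command was invoked with JSON output enabled.

```python
import json
import subprocess
from typing import Any, Sequence


def run_json(argv: Sequence[str]) -> Any:
    """Run a command and return the JSON document it printed on stdout.

    A non-zero exit code is treated as failure, and the stderr text is
    carried in the raised error.
    """
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(f"exit {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)
```

A caller would pass the full argument vector, for example a gitcrawl command followed by `--json`, and handle `RuntimeError` for usage errors, "not implemented" commands, and runtime failures alike, since all three exit non-zero.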

#File-system layout (worked example)

~/.config/gitcrawl/
├── config.toml
├── gitcrawl.db                  # SQLite mirror
├── gitcrawl.db-shm              # SQLite shared-memory file
├── gitcrawl.db-wal              # SQLite write-ahead log
├── cache/
│   ├── gh-shim/                 # gh fallthrough cache; inspect with xcache
│   └── pr/                      # hydrated PR detail blobs
├── vectors/                     # vector store backing embeddings
├── logs/
└── portable/                    # portable-store checkout (optional)
    └── data/
        └── owner__repo.sync.db

#See also