Configuration

Where gitcrawl reads settings from, and how to override them.

#Resolution order

For each setting, gitcrawl looks in this order and uses the first match:

  1. CLI flag (e.g., --config, --summary-model)
  2. Environment variable (GITCRAWL_*, then standard GITHUB_TOKEN / OPENAI_API_KEY)
  3. [env] table inside config.toml
  4. Top-level config field inside config.toml
  5. Built-in default
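The lookup order above can be sketched as a first-match search. This is a minimal illustration, not gitcrawl's implementation: the `resolve` function, the `DEFAULTS` table, and the assumption that a setting named `summary_model` maps to `GITCRAWL_SUMMARY_MODEL` (and that the `[env]` table may hold that name) are all hypothetical.

```python
# Hypothetical built-in defaults, copied from the example config on this page.
DEFAULTS = {"summary_model": "gpt-5.4", "embed_model": "text-embedding-3-small"}


def resolve(name, cli_flags, environ, config):
    """Return (value, source) for one setting, using the first match."""
    env_name = "GITCRAWL_" + name.upper()  # assumed naming scheme
    env_table = config.get("env", {})
    candidates = [
        ("flag", cli_flags.get(name)),              # 1. CLI flag
        ("environment", environ.get(env_name)),     # 2. environment variable
        ("config [env]", env_table.get(env_name)),  # 3. [env] table in config.toml
        ("config field", config.get(name)),         # 4. top-level config field
        ("default", DEFAULTS.get(name)),            # 5. built-in default
    ]
    for source, value in candidates:
        if value is not None:
            return value, source
    return None, "unset"


config = {
    "summary_model": "from-config",
    "env": {"GITCRAWL_SUMMARY_MODEL": "from-env-table"},
}
print(resolve("summary_model", {}, {}, config))
# prints ('from-env-table', 'config [env]')
```

Returning the source alongside the value makes it easy to report where a setting came from, which is the kind of information a health check would want to show.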

#Default paths

| Path | Purpose |
| --- | --- |
| ~/.config/gitcrawl/config.toml | Configuration file |
| ~/.config/gitcrawl/gitcrawl.db | SQLite database |
| ~/.config/gitcrawl/cache/ | Caches (PR detail, gh shim fallthrough) |
| ~/.config/gitcrawl/cache/gh-shim/ | gh shim fallthrough cache |
| ~/.config/gitcrawl/vectors/ | Vector store backing embeddings |
| ~/.config/gitcrawl/logs/ | Operational logs |

Override the config file location by setting GITCRAWL_CONFIG=/path/to/config.toml or by passing --config <path> to any command. When both are set, --config takes precedence.

#config.toml

gitcrawl init writes a minimal config. You can edit it by hand or with gitcrawl configure:

summary_model = "gpt-5.4"
embed_model = "text-embedding-3-small"
embed_dimensions = 1024
embedding_basis = "title_original"

[env]
GITHUB_TOKEN = "<github-token>"
OPENAI_API_KEY = "<openai-api-key>"

[portable_store]
url = "https://github.com/org/portable-store.git"
db_path = "data/openclaw__openclaw.sync.db"
checkout_dir = "/Users/me/.config/gitcrawl/portable"

#Notable fields

| Field | Default | Notes |
| --- | --- | --- |
| summary_model | gpt-5.4 | Reserved for future summary commands |
| embed_model | text-embedding-3-small | OpenAI embedding model |
| embed_dimensions | 1024 | Must match the model |
| embedding_basis | title_original | Only title_original is implemented |
| [env] | (empty) | Sets process env at startup; useful for tokens you do not want in your shell rc |
| [portable_store] | (empty) | Used when working from a shared, Git-backed cache |
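Since real environment variables rank above the [env] table in the resolution order, one plausible reading of "sets process env at startup" is that [env] entries only fill in variables that are not already set. The sketch below illustrates that reading; `apply_env_table` is a hypothetical helper, and the config is shown as an already-parsed dictionary.

```python
def apply_env_table(config, environ):
    """Copy [env] entries into environ unless the variable is already set.

    Returns the names that were applied, in config order.
    """
    applied = []
    for key, value in config.get("env", {}).items():
        if key not in environ:  # a value from the shell wins over config.toml
            environ[key] = str(value)
            applied.append(key)
    return applied


# Parsed form of a config.toml with an [env] table (placeholder values).
config = {
    "summary_model": "gpt-5.4",
    "env": {"GITHUB_TOKEN": "<github-token>", "OPENAI_API_KEY": "<openai-api-key>"},
}
environ = {"OPENAI_API_KEY": "from-shell"}  # stand-in for os.environ

print(apply_env_table(config, environ))
# prints ['GITHUB_TOKEN']
print(environ["OPENAI_API_KEY"])
# prints from-shell
```

On Python 3.11 and later, `tomllib.load` returns a dictionary of this shape when given an open config.toml, and `os.environ` can be passed in place of the stand-in dictionary.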

#Environment variables

#Core

| Variable | Purpose |
| --- | --- |
| GITCRAWL_CONFIG | Override config path |
| GITCRAWL_DB_PATH | Override database path |
| GITHUB_TOKEN | GitHub API token (required for sync, gh shim fallthroughs) |
| OPENAI_API_KEY | OpenAI API key (required for embed) |

#Model overrides

| Variable | Purpose |
| --- | --- |
| GITCRAWL_SUMMARY_MODEL | Override summary model |
| GITCRAWL_EMBED_MODEL | Override embedding model |
| GITCRAWL_OPENAI_RETRY_DISABLED | Set to 1 to disable OpenAI retry/backoff |
| GITCRAWL_OPENAI_BASE_URL / OPENAI_BASE_URL | Custom OpenAI endpoint (e.g., for a proxy) |

#GitHub overrides

| Variable | Purpose |
| --- | --- |
| GITCRAWL_GITHUB_BASE_URL / GITHUB_BASE_URL | Custom GitHub API endpoint (used by sync and the gh shim) |
| GH_HOST | GitHub host; included in the gh shim cache key |
| GH_REPO | Default repo when -R is omitted; included in the gh shim cache key |
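Including GH_HOST and GH_REPO in the cache key means the same gh arguments run against a different host or default repo do not share a cache entry. The sketch below shows the general idea only; the key layout, the `cache_key` function, and the use of SHA-256 are assumptions for illustration, not gitcrawl's actual scheme.

```python
import hashlib
import json


def cache_key(argv, environ):
    """Derive a stable key from the gh arguments plus GH_HOST and GH_REPO."""
    payload = {
        "argv": list(argv),
        "GH_HOST": environ.get("GH_HOST", ""),
        "GH_REPO": environ.get("GH_REPO", ""),
    }
    # sort_keys gives a deterministic encoding, so equal inputs hash equally.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


args = ["pr", "view", "123"]
key_a = cache_key(args, {"GH_REPO": "org/one"})
key_b = cache_key(args, {"GH_REPO": "org/two"})
print(key_a == key_b)
# prints False
```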

#gh shim

| Variable | Purpose |
| --- | --- |
| GITCRAWL_GH_PATH | Path to the real gh binary used for fallthrough |
| GITCRAWL_GH_AUTO_HYDRATE | Set to 0 to disable PR auto-hydration on cache miss |
| GITCRAWL_GH_CACHE_TTL | Override fallthrough cache TTL (e.g., 5m, 1h) |
| GITCRAWL_GH_CACHE_ERRORS | Set to 0 to avoid caching read-only fallthroughs that exit non-zero |

If GITCRAWL_GH_PATH is unset, the shim probes common Homebrew install paths and then your PATH. Set it explicitly when you symlink the gitcrawl binary as gh (otherwise the shim will recurse into itself).
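To see why the symlink case recurses, it helps to compare resolved paths: a gh symlink that points at the gitcrawl binary resolves to the shim itself. The sketch below is a hypothetical pre-flight check you could run before choosing a value for GITCRAWL_GH_PATH, not gitcrawl's own lookup code. The two Homebrew locations are the usual default prefixes and are assumed here; `find_real_gh` and `default_candidates` are illustrative names.

```python
import os


def find_real_gh(self_path, candidates):
    """Return the first executable candidate that is not self_path.

    Paths are compared after resolving symlinks, so a `gh` symlink that
    points back at the gitcrawl binary is skipped.
    """
    self_real = os.path.realpath(self_path)
    for candidate in candidates:
        if not (os.path.isfile(candidate) and os.access(candidate, os.X_OK)):
            continue
        if os.path.realpath(candidate) == self_real:
            continue  # this is the shim itself; calling it would recurse
        return candidate
    return None


def default_candidates(environ):
    """Assumed probe order: common Homebrew prefixes, then each PATH entry."""
    paths = ["/opt/homebrew/bin/gh", "/usr/local/bin/gh"]
    for directory in environ.get("PATH", "").split(os.pathsep):
        if directory:
            paths.append(os.path.join(directory, "gh"))
    return paths
```

A result of `None` means every gh found on the probe list is the shim, which is the situation where GITCRAWL_GH_PATH has to be set by hand.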

#Global flags

These flags work on every command:

| Flag | Default | Description |
| --- | --- | --- |
| --config <path> | $GITCRAWL_CONFIG or default | Override config path for this invocation |
| --format text\|json\|log | text | Output format |
| --json | (off) | Shorthand for --format json |
| --no-color | (off) | Suppress ANSI color codes |
| --version | (off) | Print version and exit (global only) |

--json overrides --format. Both are honored on subcommands that produce output.
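The interaction between --json and --format can be modeled in a few lines. This is a sketch of the documented precedence using Python's argparse, with a hypothetical `output_format` helper; it does not reflect how gitcrawl parses its flags internally.

```python
import argparse


def output_format(argv):
    """Return the effective output format for a list of arguments."""
    parser = argparse.ArgumentParser(prog="gitcrawl", add_help=False)
    parser.add_argument("--format", choices=["text", "json", "log"], default="text")
    parser.add_argument("--json", action="store_true")
    # parse_known_args ignores subcommands and flags this sketch does not model.
    args, _rest = parser.parse_known_args(argv)
    return "json" if args.json else args.format


print(output_format(["doctor", "--format", "log", "--json"]))
# prints json
```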

#gitcrawl configure

Edits config values from the command line, without opening the file:

gitcrawl configure --summary-model gpt-5.4
gitcrawl configure --embed-model text-embedding-3-small
gitcrawl configure --embedding-basis title_original
gitcrawl configure --json

Returns the resolved config path, the values that were updated, and the now-current model selection. See gitcrawl configure --help.

#gitcrawl doctor

A health check for everything covered above:

gitcrawl doctor          # human-readable
gitcrawl doctor --json   # for scripts

Reports the config path and whether the file exists, the database path, whether GITHUB_TOKEN and OPENAI_API_KEY are present (and whether each came from the environment or the config), the active summary and embedding models, the embedding basis, counts of repositories, threads, open threads, and clusters, and the last sync timestamp. If the API call surface is unsupported (older Go, missing crypto), the report includes api_supported: false so you can investigate.
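A script can gate on the JSON report before running API-backed commands. The sketch below reads only the api_supported field, since that is the one field name given on this page; treating it as a top-level key is an assumption, and `api_supported` and `run_doctor` are hypothetical helper names.

```python
import json
import subprocess


def api_supported(doctor_json):
    """Return the api_supported value from a doctor report, or None if absent."""
    report = json.loads(doctor_json)
    return report.get("api_supported")  # assumed to be a top-level key


def run_doctor():
    """Run `gitcrawl doctor --json` and return its stdout as text."""
    result = subprocess.run(
        ["gitcrawl", "doctor", "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits non-zero
    )
    return result.stdout


# Example with a hand-written report fragment instead of a live run.
print(api_supported('{"api_supported": false}'))
# prints False
```

In a real script, `api_supported(run_doctor())` would replace the hand-written fragment; a result of `None` means the field was not present in the report.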