You don’t have a hallucination problem. You have a documentation problem. If your operational knowledge lives across Notion, Confluence, Slack threads, Google Docs, and five vendor wikis, your agents (and your new hires) are playing telephone. The result: slow incident response, flaky automations, and a creeping distrust of AI inside engineering.
There’s a fix that’s boring, fast, and works: a Git-backed, plaintext wiki that humans and agents both read and write. The idea echoes recent community chatter about Karpathy-style LLM wikis and the old reminder that plain text endures, but this is not nostalgia. It’s a decision that optimizes for auditability, latency, and cost—exactly what a production AI stack needs.
Why plaintext + Git beats your current sprawl
- Auditability you already know: PRs, diffs, CODEOWNERS, and reviews are muscle memory for your team. “Docs as code” gives you provenance and change control without buying another tool.
- LLM-friendly by default: Markdown is token-friendly, chunkable, and compresses well. You can feed it to RAG without fighting proprietary formats or brittle connectors.
- Cheap and fast: A 100k-chunk vector index (768-dim float32) is ~300 MB of vectors plus index overhead—call it under 500 MB. Modern vector stores return top-k in under 50 ms at that scale.
- Offline-first durability: If your vendors throttle APIs or change formats, your knowledge still compiles. Git mirrors and S3 snapshots outlive tools and org charts.
A CTO’s decision framework for a Git-backed, agent-readable wiki
1) Scope: what belongs (and what doesn’t)
- In: Architecture overviews, runbooks, playbooks, service contracts, API specs, product FAQs, onboarding checklists, CI/CD docs, shadow IT inventories, vendor limits and SLAs.
- Out: Secrets, credentials, customer PII, and unredacted logs. Don’t argue with this. Your wiki is for reference and reasoning, not keys or raw data.
- Classification: Apply a simple triage tag in front matter—green (public-internal), amber (restricted-internal), red (goes elsewhere). Agents never see red. Amber is gated by directory.
2) Repo topology and metadata
- One kb repo per business unit (or a single mono-kb if you’re sub-100 engineers). Over-sharding kills discoverability; under-sharding overloads owners.
- Folder layout by domain: platform/, app/, data/, sec/, ops/, product/. Within each, service-level folders with a standard skeleton: overview.md, runbook.md, metrics.md, limits.md, faq.md.
- Front matter: Title, tags, owners (Git handles), last_verified_at, TTL_days, sensitivity. Your CI can enforce presence and freshness of these fields.
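The presence and freshness checks are a few dozen lines of CI. A minimal sketch in Python, assuming simple `key: value` front matter with lowercased field names (a production job would use a real YAML parser):

```python
import re
from datetime import date, datetime
from pathlib import Path

REQUIRED = {"title", "tags", "owners", "last_verified_at", "ttl_days", "sensitivity"}
FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)

def check_doc(path: Path, today: date) -> list[str]:
    """Return a list of CI violations for one Markdown file."""
    errors = []
    match = FRONT_MATTER.match(path.read_text(encoding="utf-8"))
    if not match:
        return [f"{path}: missing front matter block"]
    # Naive key: value parsing; keys are normalized to lowercase.
    fields = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip().lower()] = value.strip()
    for key in sorted(REQUIRED - fields.keys()):
        errors.append(f"{path}: missing field '{key}'")
    if "last_verified_at" in fields and "ttl_days" in fields:
        verified = datetime.strptime(fields["last_verified_at"], "%Y-%m-%d").date()
        if (today - verified).days > int(fields["ttl_days"]):
            errors.append(f"{path}: stale (verified {verified}, TTL {fields['ttl_days']}d)")
    return errors
```

Wire it into the lint job so a missing owner or an expired TTL fails the build, not a reviewer's patience.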
3) Contribution and review
- PRs only: Humans and agents both propose changes via PR. No direct commits to main. Require at least one human approval for any change to runbooks or limits.
- CI guards: Lint Markdown, spellcheck, broken link check, basic PII regex scan, and a “doc freshness” job that flags outdated last_verified_at. Fail the build for red flags.
- SLA by tier: Tier 0 (incidents, security) PRs get 4-hour review targets; Tier 1 (product-critical) 24 hours; Tier 2 (everything else) a weekly pass.
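The PII gate in the CI guards above can start as a handful of regexes. A sketch with illustrative patterns (email, US SSN, AWS access key IDs, private key headers) — tune the list for your own vendors and locale, and let periodic DLP sweeps catch the subtle cases:

```python
import re
from pathlib import Path

# Deliberately blunt patterns: catch obvious leakage in CI,
# leave the hard cases to scheduled DLP scans.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}

def scan_text(text: str) -> list[str]:
    """Return the names of PII/secret patterns found in a document."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]

def scan_repo(root: Path) -> dict[str, list[str]]:
    """Map each offending Markdown file to the patterns it triggered."""
    hits = {}
    for path in root.rglob("*.md"):
        found = scan_text(path.read_text(encoding="utf-8", errors="ignore"))
        if found:
            hits[str(path)] = found
    return hits
```

Any non-empty `scan_repo` result should fail the build; false positives are cheaper than one leaked credential in Git history.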
4) Storage and indexing: hybrid beats pure vector
- Primary: GitHub or GitLab with protected branches, mirrored to S3 nightly. Keep the wiki repo decoupled from your code repos.
- Search: Use a hybrid index: an inverted index (OpenSearch, Meilisearch) for exact lookup plus a vector store (Qdrant, Weaviate) for semantic similarity. Blend scores at query time so exact identifiers and fuzzy concept queries both resolve.
- Ingestion: Watch git diffs; chunk changed Markdown into 300–500 token spans with 10–15% overlap; embed and upsert. This avoids re-indexing the world on every commit.
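The chunking step can be sketched as follows, using whitespace tokens as a stand-in for your embedding model's tokenizer (the 12.5% default overlap lands inside the 10–15% range above):

```python
def chunk_markdown(text: str, max_tokens: int = 400, overlap_ratio: float = 0.125) -> list[str]:
    """Split a document into ~max_tokens-token spans, with each span
    overlapping the previous one so context isn't cut mid-thought.
    Whitespace tokens stand in for model tokens; swap in your embedding
    model's tokenizer for production."""
    tokens = text.split()
    if not tokens:
        return []
    overlap = int(max_tokens * overlap_ratio)  # 12.5% of 400 -> 50 tokens
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break
    return chunks
```

Run this only over files named in the latest diff, embed the resulting spans, and upsert them keyed by file path and chunk index.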
5) Agent interface: read, cite, and write like an engineer
- Retrieval policy: Top-k=8 hybrid candidates, then cross-encoder re-rank to 3–5 excerpts. Force agents to cite file paths and line ranges in responses. Log citations.
- Updates via PRs: When an agent detects drift (e.g., a 429 from a vendor API contradicts documented limits), it opens a PR with a minimal diff and links to evidence. Humans review; CI enforces freshness.
- Memory hygiene: Give agents a short-term scratchpad (Redis with 24–72 hour TTL) for ephemeral context. Durable memory lives in Git after review.
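The hybrid blend at retrieval time is a few lines once both backends return scored chunk ids. A sketch using min-max normalization and an equal-weight blend (`alpha` is a tuning knob, not a recommendation); cross-encoder reranking and citation formatting sit downstream of this:

```python
def blend(lexical: dict[str, float], semantic: dict[str, float], alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend min-max-normalized lexical and vector scores per chunk id.
    alpha=0.5 weights both retrievers equally; tune it on your query logs.
    Chunk ids map back to file paths and line ranges for citations."""
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}
    lex, sem = normalize(lexical), normalize(semantic)
    ids = lex.keys() | sem.keys()
    blended = {i: alpha * lex.get(i, 0.0) + (1 - alpha) * sem.get(i, 0.0) for i in ids}
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)
```

Take the top 8 blended candidates into the reranker, keep 3–5, and attach each surviving chunk's path and line range to the agent's answer.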
6) Identity, permissions, and blast radius
- AuthN/AuthZ: OIDC SSO to your Git provider; CODEOWNERS for domain gates; protected branches for main; bots scoped to kb repos only.
- Directory-level access, not file-level, to keep complexity tolerable. If you need doc-level ACLs, your problem is classification, not tooling.
- Agent RBAC: Separate service accounts per agent with write limited to draft branches. Every agent PR labels itself with the calling workflow and run id.
7) Toolchain for humans and machines
- Markdown (GFM) with headings, tables, and mermaid diagrams. Avoid proprietary embeds. Keep images alongside docs with descriptive alt text to help OCR/LLM pipelines.
- Static site for humans: Docusaurus or MkDocs-Material published to an internal domain. Fast, searchable, and versioned.
- Diagram discipline: Prefer text-based diagrams checked into Git rather than PNG exports. Agents can diff and update text; they can’t edit your screenshot.
8) Quality, drift detection, and telemetry
- Coverage metric: Track “knowledge coverage” as (# of Tier 0/1 services with current runbooks and limits) divided by total Tier 0/1 services. Target 90%+ in 60 days.
- Automated drift probes: Lightweight jobs that query critical vendor APIs for limits/endpoints and compare with docs. Open a PR when they diverge by a threshold (e.g., 10%).
- Search analytics: Log failed queries and zero-result searches on the static site and agent layer; turn the top 20 into docs each sprint.
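A drift probe reduces to "fetch, compare, open PR." A hedged sketch with the vendor fetch and PR creation injected as callables, since both are vendor- and tooling-specific:

```python
def limits_diverge(documented: float, observed: float, threshold: float = 0.10) -> bool:
    """True when the observed vendor limit differs from the documented one
    by more than `threshold`, relative to the documented value."""
    if documented == 0:
        return observed != 0
    return abs(observed - documented) / documented > threshold

def probe(doc_limits: dict[str, float], fetch_limit, open_pr) -> list[str]:
    """Compare each documented limit with the live value and call open_pr
    on divergence. fetch_limit and open_pr are injected so the probe
    stays testable and vendor-agnostic."""
    drifted = []
    for name, documented in doc_limits.items():
        observed = fetch_limit(name)
        if observed is not None and limits_diverge(documented, observed):
            open_pr(name, documented, observed)
            drifted.append(name)
    return drifted
```

Schedule it nightly; the PR it opens should carry the observed value and a link to the probe run as evidence.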
9) Cost and performance: numbers you can defend
- Scale example: 5,000 Markdown pages averaging 800 tokens, chunked into 400-token spans, yield ≈ 10,000 chunks.
- Storage: 10,000 vectors × 3 KB ≈ 30 MB of vectors; with HNSW index and metadata, budget 100–150 MB.
- Latency: Hybrid search at 10–100k chunks typically returns in 30–70 ms on commodity hardware. Rerank adds 10–25 ms CPU time. End-to-end RAG under 200 ms is achievable on-prem.
- Embedding cost: If you self-host a small embedding model, a single T4-class GPU can embed 10–20k chunks in minutes; CPU-only is slower but cheap for nightly jobs. Managed embeddings are cents-to-dollars per reindex—still negligible at this scale.
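The arithmetic above is worth encoding so sizing debates stay grounded. A back-of-envelope helper, assuming 768-dim float32 embeddings and a rough 4x multiplier for HNSW and metadata overhead:

```python
def index_budget(pages: int, avg_tokens: int, chunk_tokens: int, dims: int = 768) -> dict[str, float]:
    """Back-of-envelope vector index sizing. Real deployments add HNSW
    graph links and per-chunk metadata, so budget roughly 4x raw vectors."""
    chunks = (pages * avg_tokens) // chunk_tokens
    raw_mb = chunks * dims * 4 / 1e6  # 4 bytes per float32 dimension
    return {"chunks": chunks, "raw_vector_mb": raw_mb, "budget_mb": raw_mb * 4}
```

For the scale example above (5,000 pages, 800 tokens, 400-token spans) this lands on 10,000 chunks, ~31 MB of raw vectors, and a budget comfortably inside the 100–150 MB range.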
A pragmatic reference architecture
- Authoring: GitHub/GitLab repo (kb) → PRs (humans + agents) → CI (lint, PII scan, freshness, broken links) → protected main.
- Publish: Static site build to internal domain with built-in keyword search.
- Index: Ingestion service watches repo diffs → chunk/normalize → embed → upsert to Qdrant (semantic) and OpenSearch (lexical).
- Serve: Retrieval API blends lexical + vector → optional rerank → returns ranked excerpts + citations → agent/human consumers.
- Govern: Drift probes + search analytics → PRs to close gaps → dashboards for coverage/freshness.
Security and compliance: boring on purpose
- No secrets in the wiki. Treat red-classified knowledge as non-wiki and point to your vault or ticket system.
- PII guardrails: CI regexes catch obvious leakage; periodic DLP scans catch the rest. Fail builds on hits.
- Backups: Nightly S3 mirror with immutable retention and cross-region replication. Test restores quarterly.
- Legal holds: Tag commits associated with investigations; prevent garbage collection of related branches.
Common failure modes (and how to avoid them)
- Letting agents commit to main: Make every agent change go through a PR with human review. This preserves trust.
- Trying to model permissions at the file level: It’s a trap. Use directory-level scoping and classify better.
- “We’ll sync from Notion later”: Connectors rot and rate limit. Migrate critical docs into Markdown now; leave a deprecation banner behind.
- Binary-only diagrams and PDFs: If an LLM can’t diff it, it can’t maintain it. Prefer text-first assets. Run OCR on legacy PDFs and annotate with alt text.
- Indexing every commit globally: Watch diffs; only re-embed changed chunks. This keeps cost and latency predictable.
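Diff-aware ingestion starts with asking Git what changed. A sketch that shells out to `git diff --name-only` and filters to Markdown; chunking and embedding of the surviving files happens downstream:

```python
import subprocess

def filter_markdown(paths: list[str]) -> list[str]:
    """Keep only Markdown files from a list of changed paths."""
    return [p for p in paths if p.endswith(".md")]

def changed_docs(repo: str, base: str, head: str = "HEAD") -> list[str]:
    """Markdown files touched between two commits. Only their chunks get
    re-embedded and upserted, instead of re-indexing the whole corpus."""
    out = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", f"{base}..{head}"],
        capture_output=True, text=True, check=True,
    )
    return filter_markdown(out.stdout.splitlines())
```

Store the last indexed commit SHA alongside the index, pass it as `base` on the next run, and deletions fall out of the same diff for index cleanup.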
A 30-60-90 day rollout that won’t derail delivery
Days 0–30: Stand up the backbone
- Create kb repo(s), choose static site generator, enforce CI guards (lint, PII scan, freshness), and define front matter and folder skeleton.
- Instrument hybrid search (OpenSearch + Qdrant) and a simple retrieval API. Don’t over-optimize ranking yet.
- Migrate Tier 0 runbooks and limits for your top 10 services. Appoint owners and set SLAs.
Days 31–60: Wire in agents and drift detection
- Enable “agent PRs” with a scoped bot account. Require citations for any agent-authored change.
- Add drift probes for two external dependencies (e.g., auth provider rate limits, billing API pagination). Wire probes to auto-open PRs on divergence.
- Publish usage dashboards: coverage %, freshness %, top failed searches.
Days 61–90: Scale and harden
- Expand to Tier 1 services and platform docs. Target 90% coverage on Tier 0/1.
- Add reranking and tune blending weights. Aim for sub-200 ms end-to-end retrieval.
- Run a red team exercise: attempt to slip PII into the wiki and bypass CI. Fix the findings.
Where nearshore teams help (and where they don’t)
If you’re starved for capacity, a disciplined nearshore team can bootstrap this without hijacking your roadmap. In Brazil you get 6–8 hours of overlap with US time zones, Git-native workflows, and engineers who live in CI. They’re useful for the unglamorous work: doc migration, CI plumbing, drift probes, and indexing pipelines. Keep ownership and review with your domain leads; don’t outsource the voice of your runbooks.
When not to do this
- If 90% of your knowledge is screenshots and vendor PDFs and you can’t commit to text-first going forward, you’ll fight the tide. Fix that first.
- If you don’t have a review culture, a Git wiki becomes a graveyard as fast as your Notion did. Appoint owners and enforce SLAs.
- If your primary constraint is compliance-driven doc access at a per-user, per-file level, accept a heavier CMS with doc-level ACLs and integrate cautiously with agents.
The strategic point: unify human truth and agent truth
Most teams accidentally build two knowledge graphs: one for people (pretty pages) and one for machines (indexes and prompts). That’s where hallucinations and drift originate. A Git-backed, plaintext wiki collapses the distance. Humans propose and review; agents cite and PR. Everyone sees the same truth, with the same guardrails, using the same workflow you already trust for code.
Key Takeaways
- Plaintext + Git gives you auditability, speed, and LLM-friendliness without buying another platform.
- Keep secrets and PII out; classify docs and gate by directory with CODEOWNERS and protected branches.
- Use a hybrid search stack (lexical + vector) fed by a diff-aware ingestion pipeline; aim for sub-200 ms retrieval.
- Force agents to cite sources and submit PRs; never let them commit to main.
- Measure coverage and freshness, add drift probes, and turn failed searches into docs each sprint.
- Start with Tier 0 runbooks and limits; hit 90% coverage in 60–90 days without derailing delivery.