Plaintext Wins: Build a Git-Backed Wiki Your AI Agents and Engineers Trust

By Diogo Hudson Dias

You don’t have a hallucination problem. You have a documentation problem. If your operational knowledge lives across Notion, Confluence, Slack threads, Google Docs, and five vendor wikis, your agents (and your new hires) are playing telephone. The result: slow incident response, flaky automations, and a creeping distrust of AI inside engineering.

There’s a fix that’s boring, fast, and works: a Git-backed, plaintext wiki that humans and agents both read and write. Inspired by recent community chatter about Karpathy-style LLM wikis and the reminder that plain text endures, this is not nostalgia. It’s a decision that optimizes for auditability, latency, and cost—exactly what a production AI stack needs.

Why plaintext + Git beats your current sprawl

  • Auditability you already know: PRs, diffs, CODEOWNERS, and reviews are muscle memory for your team. “Docs as code” gives you provenance and change control without buying another tool.
  • LLM-friendly by default: Markdown is token-friendly, chunkable, and compresses well. You can feed it to RAG without fighting proprietary formats or brittle connectors.
  • Cheap and fast: A 100k-chunk vector index (768-dim float32) is ~300 MB of vectors plus index overhead—call it under 500 MB. Modern vector stores return top-k in under 50 ms at that scale.
  • Offline-first durability: If your vendors throttle APIs or change formats, your knowledge still compiles. Git mirrors and S3 snapshots outlive tools and org charts.

A CTO’s decision framework for a Git-backed, agent-readable wiki

1) Scope: what belongs (and what doesn’t)

  • In: Architecture overviews, runbooks, playbooks, service contracts, API specs, product FAQs, onboarding checklists, CI/CD docs, shadow IT inventories, vendor limits and SLAs.
  • Out: Secrets, credentials, customer PII, and unredacted logs. Don’t argue with this. Your wiki is for reference and reasoning, not keys or raw data.
  • Classification: Apply a simple triage tag in front matter—green (public-internal), amber (restricted-internal), red (goes elsewhere). Agents never see red. Amber is gated by directory.

2) Repo topology and metadata

  • One kb repo per business unit (or a single mono-kb if you’re sub-100 engineers). Over-sharding kills discoverability; under-sharding overloads owners.
  • Folder layout by domain: platform/, app/, data/, sec/, ops/, product/. Within each, service-level folders with a standard skeleton: overview.md, runbook.md, metrics.md, limits.md, faq.md.
  • Front matter: Title, tags, owners (Git handles), last_verified_at, TTL_days, sensitivity. Your CI can enforce presence and freshness of these fields.
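
Here is a minimal sketch of that CI guard. It assumes YAML front matter at the top of each Markdown page and PyYAML in the CI image; the field names mirror the list above, while the kb/ path and the 90-day default TTL are illustrative.

```python
# ci/check_front_matter.py -- minimal front matter guard (field names follow the schema above).
import datetime
import pathlib
import sys

import yaml  # PyYAML

REQUIRED = {"title", "tags", "owners", "last_verified_at", "ttl_days", "sensitivity"}
ALLOWED_SENSITIVITY = {"green", "amber"}  # "red" content never lands in the kb repo

def front_matter(path: pathlib.Path) -> dict:
    """Parse the YAML block between the leading '---' fences of a Markdown file."""
    text = path.read_text(encoding="utf-8")
    if not text.startswith("---"):
        return {}
    _, block, _ = text.split("---", 2)
    return yaml.safe_load(block) or {}

def check(path: pathlib.Path) -> list[str]:
    meta = front_matter(path)
    errors = [f"missing field: {field}" for field in sorted(REQUIRED - meta.keys())]
    if meta.get("sensitivity") not in ALLOWED_SENSITIVITY:
        errors.append(f"sensitivity must be one of {sorted(ALLOWED_SENSITIVITY)}")
    verified = meta.get("last_verified_at")
    ttl = meta.get("ttl_days", 90)  # illustrative default
    if isinstance(verified, datetime.date):
        age = (datetime.date.today() - verified).days
        if age > ttl:
            errors.append(f"stale: last verified {age} days ago (TTL {ttl})")
    return [f"{path}: {e}" for e in errors]

if __name__ == "__main__":
    problems = [p for md in pathlib.Path("kb").rglob("*.md") for p in check(md)]
    print("\n".join(problems))
    sys.exit(1 if problems else 0)
```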

3) Contribution and review

  • PRs only: Humans and agents both propose changes via PR. No direct commits to main. Require at least one human approval for any change to runbooks or limits.
  • CI guards: Lint Markdown, spellcheck, broken link check, basic PII regex scan, and a “doc freshness” job that flags outdated last_verified_at. Fail the build for red flags; the PII scan is sketched after this list.
  • SLA by tier: Tier 0 (incidents, security) PRs get 4-hour review targets; Tier 1 (product-critical) 24 hours; Tier 2 weekly.
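
The PII scan can start embarrassingly simple. The patterns below only catch obvious leakage (emails, card-like digit runs, private keys, AWS access key IDs), and the kb/ path is illustrative; treat it as a first line of defense ahead of periodic DLP scans, not a replacement for them.

```python
# ci/pii_scan.py -- the "basic PII regex scan" CI step; a deliberately blunt first pass.
import pathlib
import re
import sys

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "card_like": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "aws_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def scan(path: pathlib.Path) -> list[str]:
    text = path.read_text(encoding="utf-8", errors="ignore")
    return [f"{path}: possible {name}" for name, rx in PATTERNS.items() if rx.search(text)]

if __name__ == "__main__":
    hits = [h for md in pathlib.Path("kb").rglob("*.md") for h in scan(md)]
    print("\n".join(hits))
    sys.exit(1 if hits else 0)  # fail the build on hits
```

Expect to maintain an allowlist for known-internal addresses and example values so the guard stays strict without becoming noise.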

4) Storage and indexing: hybrid beats pure vector

  • Primary: GitHub or GitLab with protected branches, mirrored to S3 nightly. Keep the wiki repo decoupled from your code repos.
  • Search: Pair an inverted index (OpenSearch, Meilisearch) for exact lookup with a vector store (Qdrant, Weaviate) for semantic similarity, and blend scores at query time for the best of both worlds.
  • Ingestion: Watch git diffs; chunk changed Markdown into 300–500 token spans with 10–15% overlap; embed and upsert. This avoids re-indexing the world on every commit.
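
A sketch of that diff-aware step, with embed() and upsert() standing in for your embedding model and vector-store client, and a simple word window as a crude proxy for token counting:

```python
# ingest/reindex_changed.py -- re-embed only the chunks of Markdown files that changed.
import subprocess

CHUNK_TOKENS = 400  # within the 300-500 token guidance above
OVERLAP = 50        # roughly 12% overlap between consecutive spans

def changed_markdown(base: str = "HEAD~1", head: str = "HEAD") -> list[str]:
    """List Markdown files added or modified between two commits."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=AM", base, head, "--", "*.md"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def chunk(text: str, size: int = CHUNK_TOKENS, overlap: int = OVERLAP) -> list[str]:
    """Split text into overlapping word-window spans."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

def reindex(base: str, head: str, embed, upsert) -> None:
    """embed() and upsert() are stand-ins for your model and vector store clients."""
    for path in changed_markdown(base, head):
        with open(path, encoding="utf-8") as fh:
            for n, span in enumerate(chunk(fh.read())):
                upsert(id=f"{path}#{n}", vector=embed(span), payload={"path": path, "chunk": n})
```

Keying vectors by path plus chunk index keeps re-upserts idempotent and makes cleanup of deleted files a simple prefix operation.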

5) Agent interface: read, cite, and write like an engineer

  • Retrieval policy: Top-k=8 hybrid candidates, then cross-encoder re-rank to 3–5 excerpts. Force agents to cite file paths and line ranges in responses. Log citations. A retrieval sketch follows this list.
  • Updates via PRs: When an agent detects drift (e.g., a 429 from a vendor API contradicts documented limits), it opens a PR with a minimal diff and links to evidence. Humans review; CI enforces freshness.
  • Memory hygiene: Give agents a short-term scratchpad (Redis with 24–72 hour TTL) for ephemeral context. Durable memory lives in Git after review.
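
A sketch of that retrieval policy, with lexical_search, vector_search, and rerank standing in for your OpenSearch, Qdrant, and cross-encoder clients. Scores are assumed normalized to [0, 1] by each backend, and the 0.5 blending weight is a starting point to tune, not a constant to defend.

```python
# retrieve/hybrid.py -- blend lexical + vector candidates, rerank, return citable excerpts.
from dataclasses import dataclass

@dataclass
class Excerpt:
    path: str               # kb file path, cited in the agent's answer
    lines: tuple[int, int]  # start/end line range, also cited
    text: str
    score: float            # assumed normalized to [0, 1] by each backend

def retrieve(query: str, lexical_search, vector_search, rerank,
             k: int = 8, keep: int = 5, alpha: float = 0.5) -> list[Excerpt]:
    """Blend lexical and vector scores, then cross-encoder rerank the top-k candidates."""
    blended: dict[tuple, Excerpt] = {}
    for weight, hits in ((alpha, lexical_search(query, k)), (1 - alpha, vector_search(query, k))):
        for hit in hits:
            key = (hit.path, hit.lines)
            if key in blended:
                blended[key].score += weight * hit.score
            else:
                blended[key] = Excerpt(hit.path, hit.lines, hit.text, weight * hit.score)
    top = sorted(blended.values(), key=lambda e: e.score, reverse=True)[:k]
    return rerank(query, top)[:keep]  # 3-5 excerpts, each carrying a citable path + line range
```

Returning path plus line range with every excerpt is what makes the “cite your sources” rule enforceable downstream.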

6) Identity, permissions, and blast radius

  • AuthN/AuthZ: OIDC SSO to your Git provider; CODEOWNERS for domain gates; protected branches for main; bots scoped to kb repos only.
  • Directory-level access, not file-level, to keep complexity tolerable. If you need doc-level ACLs, your problem is classification, not tooling.
  • Agent RBAC: Separate service accounts per agent with write limited to draft branches. Every agent PR labels itself with the calling workflow and run id.
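
For a GitHub-hosted kb, the bot side of an agent PR might look like the sketch below. The repo name, token variable, branch naming, and label scheme are all illustrative, and a GitLab setup would use merge requests instead.

```python
# agent/open_pr.py -- an agent-scoped bot proposes a docs change from its draft branch.
import os

import requests

API = "https://api.github.com"
REPO = "acme/kb"  # hypothetical kb repo
HEADERS = {
    "Authorization": f"Bearer {os.environ['KB_BOT_TOKEN']}",  # bot token scoped to the kb repo
    "Accept": "application/vnd.github+json",
}

def open_agent_pr(branch: str, title: str, evidence_url: str, workflow: str, run_id: str) -> int:
    """Open a draft PR from an already-pushed agent branch and label it with workflow + run id."""
    body = f"Automated docs update.\n\nEvidence: {evidence_url}\nWorkflow: {workflow}\nRun: {run_id}"
    pr = requests.post(
        f"{API}/repos/{REPO}/pulls",
        headers=HEADERS,
        json={"title": title, "head": branch, "base": "main", "body": body, "draft": True},
        timeout=30,
    )
    pr.raise_for_status()
    number = pr.json()["number"]
    requests.post(  # PRs share the issues API for labels
        f"{API}/repos/{REPO}/issues/{number}/labels",
        headers=HEADERS,
        json={"labels": ["agent", f"workflow:{workflow}", f"run:{run_id}"]},
        timeout=30,
    ).raise_for_status()
    return number
```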

7) Toolchain for humans and machines

  • Markdown (GFM) with headings, tables, and mermaid diagrams. Avoid proprietary embeds. Keep images alongside docs with descriptive alt text to help OCR/LLM pipelines.
  • Static site for humans: Docusaurus or MkDocs-Material published to an internal domain. Fast, searchable, and versioned.
  • Diagram discipline: Prefer text-based diagrams checked into Git rather than PNG exports. Agents can diff and update text; they can’t edit your screenshot.

8) Quality, drift detection, and telemetry

  • Coverage metric: Track “knowledge coverage” as (# of Tier 0/1 services with current runbooks and limits) divided by total Tier 0/1 services. Target 90%+ in 60 days.
  • Automated drift probes: Lightweight jobs that query critical vendor APIs for limits/endpoints and compare with docs. Open a PR when they diverge by a threshold (e.g., 10%); a probe sketch follows this list.
  • Search analytics: Log failed queries and zero-result searches on the static site and agent layer; turn the top 20 into docs each sprint.
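
A drift probe does not need to be clever. The sketch below uses a hypothetical vendor endpoint and rate-limit header; the real work is keeping the documented value machine-readable (front matter or a limits.md table) so the comparison stays trivial.

```python
# probes/rate_limit_drift.py -- compare a documented vendor limit against reality.
import requests

DOCUMENTED_LIMIT = 600  # requests/min, as recorded in kb/platform/vendor-x/limits.md (illustrative)
THRESHOLD = 0.10        # flag when observed and documented values differ by more than 10%

def observed_limit() -> int:
    """Read the vendor's advertised rate limit from a lightweight, harmless request."""
    resp = requests.get("https://api.vendor-x.example/v1/ping", timeout=10)  # hypothetical endpoint
    return int(resp.headers.get("X-RateLimit-Limit", 0))

def drift() -> float | None:
    actual = observed_limit()
    if not actual:
        return None  # header missing: treat as "no signal", not as drift
    return abs(actual - DOCUMENTED_LIMIT) / DOCUMENTED_LIMIT

if __name__ == "__main__":
    d = drift()
    if d is not None and d > THRESHOLD:
        print(f"limits.md looks out of date by {d:.0%}; open a docs PR with the evidence")
```

When the threshold trips, hand off to the same PR-opening path the agents use so the evidence and review flow stay identical.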

9) Cost and performance: numbers you can defend

  • Scale example: 5,000 Markdown pages averaging 800 tokens chunked into 400-token spans ≈ 10,000 chunks (arithmetic sketched after this list).
  • Storage: 10,000 vectors × 3 KB ≈ 30 MB of vectors; with HNSW index and metadata, budget 100–150 MB.
  • Latency: Hybrid search at 10–100k chunks typically returns in 30–70 ms on commodity hardware. Rerank adds 10–25 ms CPU time. End-to-end RAG under 200 ms is achievable on-prem.
  • Embedding cost: If you self-host a small embedding model, a single T4-class GPU can embed 10–20k chunks in minutes; CPU-only is slower but cheap for nightly jobs. Managed embeddings are cents-to-dollars per reindex—still negligible at this scale.
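
If you want to show the arithmetic rather than assert it, it fits in a few lines:

```python
# back-of-the-envelope check of the scale example above
pages, avg_tokens, chunk_tokens = 5_000, 800, 400
chunks = pages * avg_tokens // chunk_tokens              # 10,000 chunks
dims, bytes_per_float = 768, 4                           # 768-dim float32 embeddings
raw_vectors_mb = chunks * dims * bytes_per_float / 1e6   # ~30.7 MB of raw vectors
print(chunks, f"{raw_vectors_mb:.1f} MB")                # HNSW graph + metadata adds a few multiples
```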

A pragmatic reference architecture

  • Authoring: GitHub/GitLab repo (kb) → PRs (humans + agents) → CI (lint, PII scan, freshness, broken links) → protected main.
  • Publish: Static site build to internal domain with built-in keyword search.
  • Index: Ingestion service watches repo diffs → chunk/normalize → embed → upsert to Qdrant (semantic) and OpenSearch (lexical).
  • Serve: Retrieval API blends lexical + vector → optional rerank → returns ranked excerpts + citations → agent/human consumers.
  • Govern: Drift probes + search analytics → PRs to close gaps → dashboards for coverage/freshness.

Security and compliance: boring on purpose

  • No secrets in the wiki. Treat red-classified knowledge as non-wiki and point to your vault or ticket system.
  • PII guardrails: CI regexes catch obvious leakage; periodic DLP scans catch the rest. Fail builds on hits.
  • Backups: Nightly S3 mirror with immutable retention and cross-region replication. Test restores quarterly.
  • Legal holds: Tag commits associated with investigations; prevent garbage collection of related branches.

Common failure modes (and how to avoid them)

  • Letting agents commit to main: Make every agent change go through a PR with human review. This preserves trust.
  • Trying to model permissions at the file level: It’s a trap. Use directory-level scoping and classify better.
  • “We’ll sync from Notion later”: Connectors rot and rate limit. Migrate critical docs into Markdown now; leave a deprecation banner behind.
  • Binary-only diagrams and PDFs: If an LLM can’t diff it, it can’t maintain it. Prefer text-first assets. Run OCR on legacy PDFs and annotate with alt text.
  • Indexing every commit globally: Watch diffs; only re-embed changed chunks. This keeps cost and latency predictable.

A 30-60-90 day rollout that won’t derail delivery

Days 0–30: Stand up the backbone

  • Create kb repo(s), choose static site generator, enforce CI guards (lint, PII scan, freshness), and define front matter and folder skeleton.
  • Instrument hybrid search (OpenSearch + Qdrant) and a simple retrieval API. Don’t over-optimize ranking yet.
  • Migrate Tier 0 runbooks and limits for your top 10 services. Appoint owners and set SLAs.

Days 31–60: Wire in agents and drift detection

  • Enable “agent PRs” with a scoped bot account. Require citations for any agent-authored change.
  • Add drift probes for two external dependencies (e.g., auth provider rate limits, billing API pagination). Wire probes to auto-open PRs on divergence.
  • Publish usage dashboards: coverage %, freshness %, top failed searches.

Days 61–90: Scale and harden

  • Expand to Tier 1 services and platform docs. Target 90% coverage on Tier 0/1.
  • Add reranking and tune blending weights. Aim for sub-200 ms end-to-end retrieval.
  • Run a red team exercise: attempt to slip PII into the wiki and bypass CI. Fix the findings.

Where nearshore teams help (and where they don’t)

If you’re starved for capacity, a disciplined nearshore team can bootstrap this without hijacking your roadmap. In Brazil you get 6–8 hours of overlap with US time zones, Git-native workflows, and engineers who live in CI. They’re useful for the unglamorous work: doc migration, CI plumbing, drift probes, and indexing pipelines. Keep ownership and review with your domain leads; don’t outsource the voice of your runbooks.

When not to do this

  • If 90% of your knowledge is screenshots and vendor PDFs and you can’t commit to text-first going forward, you’ll fight the tide. Fix that first.
  • If you don’t have a review culture, a Git wiki becomes a graveyard as fast as your Notion did. Appoint owners and enforce SLAs.
  • If your primary constraint is compliance-driven doc access at a per-user, per-file level, accept a heavier CMS with doc-level ACLs and integrate cautiously with agents.

The strategic point: unify human truth and agent truth

Most teams accidentally build two knowledge graphs: one for people (pretty pages) and one for machines (indexes and prompts). That’s where hallucinations and drift originate. A Git-backed, plaintext wiki collapses the distance. Humans propose and review; agents cite and PR. Everyone sees the same truth, with the same guardrails, using the same workflow you already trust for code.

Key Takeaways

  • Plaintext + Git gives you auditability, speed, and LLM-friendliness without buying another platform.
  • Keep secrets and PII out; classify docs and gate by directory with CODEOWNERS and protected branches.
  • Use a hybrid search stack (lexical + vector) fed by a diff-aware ingestion pipeline; aim for sub-200 ms retrieval.
  • Force agents to cite sources and submit PRs; never let them commit to main.
  • Measure coverage and freshness, add drift probes, and turn failed searches into docs each sprint.
  • Start with Tier 0 runbooks and limits; hit 90% coverage in 60–90 days without derailing delivery.

Ready to scale your engineering team?

Tell us about your project and we'll get back to you within 24 hours.

Start a conversation