You don’t have a hallucination problem. You have a documentation problem. If your operational knowledge lives across Notion, Confluence, Slack threads, Google Docs, and five vendor wikis, your agents (and your new hires) are playing telephone. The result: slow incident response, flaky automations, and a creeping distrust of AI inside engineering.
There’s a fix that’s boring, fast, and works: a Git-backed, plaintext wiki that humans and agents both read and write. The idea echoes recent community chatter about Karpathy-style LLM wikis and the old reminder that plain text endures, but this is not nostalgia. It’s a decision that optimizes for auditability, latency, and cost—exactly what a production AI stack needs.
Why plaintext + Git beats your current sprawl
- Auditability you already know: PRs, diffs, CODEOWNERS, and reviews are muscle memory for your team. “Docs as code” gives you provenance and change control without buying another tool.
- LLM-friendly by default: Markdown is token-friendly, chunkable, and compresses well. You can feed it to RAG without fighting proprietary formats or brittle connectors.
- Cheap and fast: A 100k-chunk vector index (768-dim float32) is ~300 MB of vectors plus index overhead—call it under 500 MB. Modern vector stores return top-k in under 50 ms at that scale.
- Offline-first durability: If your vendors throttle APIs or change formats, your knowledge still compiles. Git mirrors and S3 snapshots outlive tools and org charts.
A CTO’s decision framework for a Git-backed, agent-readable wiki
1) Scope: what belongs (and what doesn’t)
- In: Architecture overviews, runbooks, playbooks, service contracts, API specs, product FAQs, onboarding checklists, CI/CD docs, shadow IT inventories, vendor limits and SLAs.
- Out: Secrets, credentials, customer PII, and unredacted logs. Don’t argue with this. Your wiki is for reference and reasoning, not keys or raw data.
- Classification: Apply a simple triage tag in front matter—green (public-internal), amber (restricted-internal), red (goes elsewhere). Agents never see red. Amber is gated by directory.
2) Repo topology and metadata
- One kb repo per business unit (or a single mono-kb if you’re sub-100 engineers). Over-sharding kills discoverability; under-sharding overloads owners.
- Folder layout by domain: platform/, app/, data/, sec/, ops/, product/. Within each, service-level folders with a standard skeleton: overview.md, runbook.md, metrics.md, limits.md, faq.md.
- Front matter: Title, tags, owners (Git handles), last_verified_at, TTL_days, sensitivity. Your CI can enforce presence and freshness of these fields.
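The presence and freshness checks are a few dozen lines of CI. A minimal sketch in Python, assuming simple `key: value` front matter with lowercased field names (a production job would use a real YAML parser):

```python
import re
from datetime import date, datetime
from pathlib import Path

REQUIRED = {"title", "tags", "owners", "last_verified_at", "ttl_days", "sensitivity"}
FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)

def check_doc(path: Path, today: date) -> list[str]:
    """Return a list of CI violations for one Markdown file."""
    errors = []
    match = FRONT_MATTER.match(path.read_text(encoding="utf-8"))
    if not match:
        return [f"{path}: missing front matter block"]
    # Naive key: value parsing; keys are normalized to lowercase.
    fields = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip().lower()] = value.strip()
    for key in sorted(REQUIRED - fields.keys()):
        errors.append(f"{path}: missing field '{key}'")
    if "last_verified_at" in fields and "ttl_days" in fields:
        verified = datetime.strptime(fields["last_verified_at"], "%Y-%m-%d").date()
        if (today - verified).days > int(fields["ttl_days"]):
            errors.append(f"{path}: stale (verified {verified}, TTL {fields['ttl_days']}d)")
    return errors
```

Wire it into the lint job so a missing owner or an expired TTL fails the build, not a reviewer's patience.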
3) Contribution and review
- PRs only: Humans and agents both propose changes via PR. No direct commits to main. Require at least one human approval for any change to runbooks or limits.
- CI guards: Lint Markdown, spellcheck, broken link check, basic PII regex scan, and a “doc freshness” job that flags outdated last_verified_at. Fail the build for red flags.
- SLA by tier: Tier 0 (incidents, security) PRs get 4-hour review targets; Tier 1 (product-critical) 24 hours; Tier 2 (everything else) a weekly pass.
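The PII gate in the CI guards above can start as a handful of regexes. A sketch with illustrative patterns (email, US SSN, AWS access key IDs, private key headers) — tune the list for your own vendors and locale, and let periodic DLP sweeps catch the subtle cases:

```python
import re
from pathlib import Path

# Deliberately blunt patterns: catch obvious leakage in CI,
# leave the hard cases to scheduled DLP scans.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}

def scan_text(text: str) -> list[str]:
    """Return the names of PII/secret patterns found in a document."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]

def scan_repo(root: Path) -> dict[str, list[str]]:
    """Map each offending Markdown file to the patterns it triggered."""
    hits = {}
    for path in root.rglob("*.md"):
        found = scan_text(path.read_text(encoding="utf-8", errors="ignore"))
        if found:
            hits[str(path)] = found
    return hits
```

Any non-empty `scan_repo` result should fail the build; false positives are cheaper than one leaked credential in Git history.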
4) Storage and indexing: hybrid beats pure vector
- Primary: GitHub or GitLab with protected branches, mirrored to S3 nightly. Keep the wiki repo decoupled from your code repos.
- Search: Use a hybrid index: an inverted index (OpenSearch, Meilisearch) for exact lookup plus a vector store (Qdrant, Weaviate) for semantic similarity. Blend scores at query time so exact identifiers and fuzzy concept queries both resolve.
- Ingestion: Watch git diffs; chunk changed Markdown into 300–500 token spans with 10–15% overlap; embed and upsert. This avoids re-indexing the world on every commit.
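The chunking step can be sketched as follows, using whitespace tokens as a stand-in for your embedding model's tokenizer (the 12.5% default overlap lands inside the 10–15% range above):

```python
def chunk_markdown(text: str, max_tokens: int = 400, overlap_ratio: float = 0.125) -> list[str]:
    """Split a document into ~max_tokens-token spans, with each span
    overlapping the previous one so context isn't cut mid-thought.
    Whitespace tokens stand in for model tokens; swap in your embedding
    model's tokenizer for production."""
    tokens = text.split()
    if not tokens:
        return []
    overlap = int(max_tokens * overlap_ratio)  # 12.5% of 400 -> 50 tokens
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break
    return chunks
```

Run this only over files named in the latest diff, embed the resulting spans, and upsert them keyed by file path and chunk index.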
5) Agent interface: read, cite, and write like an engineer
- Retrieval policy: Top-k=8 hybrid candidates, then cross-encoder re-rank to 3–5 excerpts. Force agents to cite file paths and line ranges in responses. Log citations.
- Updates via PRs: When an agent detects drift (e.g., a 429 from a vendor API contradicts documented limits), it opens a PR with a minimal diff and links to evidence. Humans review; CI enforces freshness.
- Memory hygiene: Give agents a short-term scratchpad (Redis with 24–72 hour TTL) for ephemeral context. Durable memory lives in Git after review.
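The hybrid blend at retrieval time is a few lines once both backends return scored chunk ids. A sketch using min-max normalization and an equal-weight blend (`alpha` is a tuning knob, not a recommendation); cross-encoder reranking and citation formatting sit downstream of this:

```python
def blend(lexical: dict[str, float], semantic: dict[str, float], alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend min-max-normalized lexical and vector scores per chunk id.
    alpha=0.5 weights both retrievers equally; tune it on your query logs.
    Chunk ids map back to file paths and line ranges for citations."""
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}
    lex, sem = normalize(lexical), normalize(semantic)
    ids = lex.keys() | sem.keys()
    blended = {i: alpha * lex.get(i, 0.0) + (1 - alpha) * sem.get(i, 0.0) for i in ids}
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)
```

Take the top 8 blended candidates into the reranker, keep 3–5, and attach each surviving chunk's path and line range to the agent's answer.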
6) Identity, permissions, and blast radius
- AuthN/AuthZ: OIDC SSO to your Git provider; CODEOWNERS for domain gates; protected branches for main; bots scoped to kb repos only.
- Directory-level access, not file-level, to keep complexity tolerable. If you need doc-level ACLs, your problem is classification, not tooling.
- Agent RBAC: Separate service accounts per agent with write limited to draft branches. Every agent PR labels itself with the calling workflow and run id.
7) Toolchain for humans and machines
- Markdown (GFM) with headings, tables, and mermaid diagrams. Avoid proprietary embeds. Keep images alongside docs with descriptive alt text to help OCR/LLM pipelines.
- Static site for humans: Docusaurus or MkDocs-Material published to an internal domain. Fast, searchable, and versioned.
- Diagram discipline: Prefer text-based diagrams checked into Git rather than PNG exports. Agents can diff and update text; they can’t edit your screenshot.
8) Quality, drift detection, and telemetry
- Coverage metric: Track “knowledge coverage” as (# of Tier 0/1 services with current runbooks and limits) divided by total Tier 0/1 services. Target 90%+ in 60 days.
- Automated drift probes: Lightweight jobs that query critical vendor APIs for limits/endpoints and compare with docs. Open a PR when they diverge by a threshold (e.g., 10%).
- Search analytics: Log failed queries and zero-result searches on the static site and agent layer; turn the top 20 into docs each sprint.
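A drift probe reduces to "fetch, compare, open PR." A hedged sketch with the vendor fetch and PR creation injected as callables, since both are vendor- and tooling-specific:

```python
def limits_diverge(documented: float, observed: float, threshold: float = 0.10) -> bool:
    """True when the observed vendor limit differs from the documented one
    by more than `threshold`, relative to the documented value."""
    if documented == 0:
        return observed != 0
    return abs(observed - documented) / documented > threshold

def probe(doc_limits: dict[str, float], fetch_limit, open_pr) -> list[str]:
    """Compare each documented limit with the live value and call open_pr
    on divergence. fetch_limit and open_pr are injected so the probe
    stays testable and vendor-agnostic."""
    drifted = []
    for name, documented in doc_limits.items():
        observed = fetch_limit(name)
        if observed is not None and limits_diverge(documented, observed):
            open_pr(name, documented, observed)
            drifted.append(name)
    return drifted
```

Schedule it nightly; the PR it opens should carry the observed value and a link to the probe run as evidence.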
9) Cost and performance: numbers you can defend
- Scale example: 5,000 Markdown pages averaging 800 tokens, chunked into 400-token spans, yield ≈ 10,000 chunks.
- Storage: 10,000 vectors × 3 KB ≈ 30 MB of vectors; with HNSW index and metadata, budget 100–150 MB.
- Latency: Hybrid search at 10–100k chunks typically returns in 30–70 ms on commodity hardware. Rerank adds 10–25 ms CPU time. End-to-end RAG under 200 ms is achievable on-prem.
- Embedding cost: If you self-host a small embedding model, a single T4-class GPU can embed 10–20k chunks in minutes; CPU-only is slower but cheap for nightly jobs. Managed embeddings are cents-to-dollars per reindex—still negligible at this scale.
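The arithmetic above is worth encoding so sizing debates stay grounded. A back-of-envelope helper, assuming 768-dim float32 embeddings and a rough 4x multiplier for HNSW and metadata overhead:

```python
def index_budget(pages: int, avg_tokens: int, chunk_tokens: int, dims: int = 768) -> dict[str, float]:
    """Back-of-envelope vector index sizing. Real deployments add HNSW
    graph links and per-chunk metadata, so budget roughly 4x raw vectors."""
    chunks = (pages * avg_tokens) // chunk_tokens
    raw_mb = chunks * dims * 4 / 1e6  # 4 bytes per float32 dimension
    return {"chunks": chunks, "raw_vector_mb": raw_mb, "budget_mb": raw_mb * 4}
```

For the scale example above (5,000 pages, 800 tokens, 400-token spans) this lands on 10,000 chunks, ~31 MB of raw vectors, and a budget comfortably inside the 100–150 MB range.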
A pragmatic reference architecture
- Authoring: GitHub/GitLab repo (kb) → PRs (humans + agents) → CI (lint, PII scan, freshness, broken links) → protected main.
- Publish: Static site build to internal domain with built-in keyword search.
- Index: Ingestion service watches repo diffs → chunk/normalize → embed → upsert to Qdrant (semantic) and OpenSearch (lexical).
- Serve: Retrieval API blends lexical + vector → optional rerank → returns ranked excerpts + citations → agent/human consumers.
- Govern: Drift probes + search analytics → PRs to close gaps → dashboards for coverage/freshness.
Security and compliance: boring on purpose
- No secrets in the wiki. Treat red-classified knowledge as non-wiki and point to your vault or ticket system.
- PII guardrails: CI regexes catch obvious leakage; periodic DLP scans catch the rest. Fail builds on hits.
- Backups: Nightly S3 mirror with immutable retention and cross-region replication. Test restores quarterly.
- Legal holds: Tag commits associated with investigations; prevent garbage collection of related branches.
Common failure modes (and how to avoid them)
- Letting agents commit to main: Make every agent change go through a PR with human review. This preserves trust.
- Trying to model permissions at the file level: It’s a trap. Use directory-level scoping and classify better.
- “We’ll sync from Notion later”: Connectors rot and rate limit. Migrate critical docs into Markdown now; leave a deprecation banner behind.
- Binary-only diagrams and PDFs: If an LLM can’t diff it, it can’t maintain it. Prefer text-first assets. Run OCR on legacy PDFs and annotate with alt text.
- Indexing every commit globally: Watch diffs; only re-embed changed chunks. This keeps cost and latency predictable.
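Diff-aware ingestion starts with asking Git what changed. A sketch that shells out to `git diff --name-only` and filters to Markdown; chunking and embedding of the surviving files happens downstream:

```python
import subprocess

def filter_markdown(paths: list[str]) -> list[str]:
    """Keep only Markdown files from a list of changed paths."""
    return [p for p in paths if p.endswith(".md")]

def changed_docs(repo: str, base: str, head: str = "HEAD") -> list[str]:
    """Markdown files touched between two commits. Only their chunks get
    re-embedded and upserted, instead of re-indexing the whole corpus."""
    out = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", f"{base}..{head}"],
        capture_output=True, text=True, check=True,
    )
    return filter_markdown(out.stdout.splitlines())
```

Store the last indexed commit SHA alongside the index, pass it as `base` on the next run, and deletions fall out of the same diff for index cleanup.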
A 30-60-90 day rollout that won’t derail delivery
Days 0–30: Stand up the backbone
- Create kb repo(s), choose static site generator, enforce CI guards (lint, PII scan, freshness), and define front matter and folder skeleton.
- Instrument hybrid search (OpenSearch + Qdrant) and a simple retrieval API. Don’t over-optimize ranking yet.
- Migrate Tier 0 runbooks and limits for your top 10 services. Appoint owners and set SLAs.
Days 31–60: Wire in agents and drift detection
- Enable “agent PRs” with a scoped bot account. Require citations for any agent-authored change.
- Add drift probes for two external dependencies (e.g., auth provider rate limits, billing API pagination). Wire probes to auto-open PRs on divergence.
- Publish usage dashboards: coverage %, freshness %, top failed searches.
Days 61–90: Scale and harden
- Expand to Tier 1 services and platform docs. Target 90% coverage on Tier 0/1.
- Add reranking and tune blending weights. Aim for sub-200 ms end-to-end retrieval.
- Run a red team exercise: attempt to slip PII into the wiki and bypass CI. Fix the findings.
Where nearshore teams help (and where they don’t)
If you’re starved for capacity, a disciplined nearshore team can bootstrap this without hijacking your roadmap. In Brazil you get 6–8 hours of overlap with US time zones, Git-native workflows, and engineers who live in CI. They’re useful for the unglamorous work: doc migration, CI plumbing, drift probes, and indexing pipelines. Keep ownership and review with your domain leads; don’t outsource the voice of your runbooks.
When not to do this
- If 90% of your knowledge is screenshots and vendor PDFs and you can’t commit to text-first going forward, you’ll fight the tide. Fix that first.
- If you don’t have a review culture, a Git wiki becomes a graveyard as fast as your Notion did. Appoint owners and enforce SLAs.
- If your primary constraint is compliance-driven doc access at a per-user, per-file level, accept a heavier CMS with doc-level ACLs and integrate cautiously with agents.
The strategic point: unify human truth and agent truth
Most teams accidentally build two knowledge graphs: one for people (pretty pages) and one for machines (indexes and prompts). That’s where hallucinations and drift originate. A Git-backed, plaintext wiki collapses the distance. Humans propose and review; agents cite and PR. Everyone sees the same truth, with the same guardrails, using the same workflow you already trust for code.
Key Takeaways
- Plaintext + Git gives you auditability, speed, and LLM-friendliness without buying another platform.
- Keep secrets and PII out; classify docs and gate by directory with CODEOWNERS and protected branches.
- Use a hybrid search stack (lexical + vector) fed by a diff-aware ingestion pipeline; aim for sub-200 ms retrieval.
- Force agents to cite sources and submit PRs; never let them commit to main.
- Measure coverage and freshness, add drift probes, and turn failed searches into docs each sprint.
- Start with Tier 0 runbooks and limits; hit 90% coverage in 60–90 days without derailing delivery.