Quotery is live at https://www.quotery.io. It is a multi-tenant B2B SaaS that takes the single worst part of distribution and services work — turning a supplier PDF or a customer spreadsheet into a clean, priced quote — and makes it a one-click operation. Then it keeps going: reservation, delivery notes, returns, stock receipts, and an embedded assistant that answers questions about your own data. DHD Tech designed, built, and shipped Quotery. This post is the engineering tour behind the launch.
The public promise on the homepage is "Quote faster. Deliver with certainty." Behind that line there are three numbers the product team is willing to defend in public: 70% time saved on quote intake, 99.2% line-item match accuracy, and three supported languages (en-US, pt-BR, es-US). The post below is how those numbers are earned — and why the architecture will still look sane two years from now.
Announcing Quotery
Quotery targets distributors, contractors, and service shops that move fast. The daily pain is familiar to anyone who has worked inside one of these businesses. A customer sends a shopping list as a messy PDF, or a supplier sends an XLSX that looks nothing like the last one. A sales rep retypes line after line into whatever ERP the company settled on years ago, matching each row against an internal SKU by squinting at description columns and hoping the price is current. Then fulfillment happens somewhere else — on paper, in a separate tool, or in someone's head — and by the time stock moves, three different systems disagree about what was sold.
Quotery collapses all of that into one tenant-scoped platform. An incoming document lands in the AI import endpoint, the system deterministically resolves everything it can against your product catalog, asks an LLM to decide the rest, and drops you onto a draft quote with every line classified. When the rep is happy, closing the quote reserves stock. Posting a delivery note consumes it. A return note brings it back. A stock receipt increments it. Every move writes an append-only ledger row and an immutable audit event. Nothing about that loop requires a second system.
The metrics on the public site reflect real usage shape: intake is the expensive step, so 70% time saved is dominated by the AI import doing the retyping for you. The 99.2% match accuracy is the combined hit rate of deterministic code matching and the batched LLM pick — more on that below. And the trilingual UI is there because Quotery's first customers operate across the US, Brazil, and the Caribbean in the same working day.
The Rational Process Behind It
Quotery's backend enforces a layered architecture where every Django app is split into four packages: models, utils, services, views. This is not a preference; it is enforced by convention and by the shape of the test suite. Each layer owns exactly one concern, and the data flow is one-way on the way in and one-way on the way out.
Models
Schema only. Field definitions, Meta, __str__. Every business model inherits from core.models.BaseModel, which gives you a UUIDField primary key, created_at, updated_at, an is_deleted soft-delete flag, and a soft_delete() method. The auth User is the only documented exception. No business logic lives in the model layer, and there are no custom save() overrides that hide side effects from the rest of the codebase.
Utils
The only layer allowed to touch the ORM. If you see .objects, .filter(), .create(), .save(), or .delete(), you are in a utils/ module. The naming is formulaic — build_<entity>_queryset, apply_<field>_filter, create_<entity>, update_<entity>, delete_<entity> — so a reviewer can predict what is in the file without opening it.
Services
Business logic and validation. Services compose utils; they never call the ORM themselves. Payloads get whitelisted against an ALLOWED_FIELDS set so a client can never inject tenant, is_deleted, or similar fields through a stray JSON key. ValueError signals a validation failure; missing records raise Model.DoesNotExist. Every stock-mutating service runs inside a transaction.atomic() block and locks StockItem rows with select_for_update() in deterministic product-id, location-id order to stay deadlock-free under concurrent closes.
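Both rules are small enough to sketch. The snippet below is a framework-free illustration, not Quotery's actual code; `CLIENT_ALLOWED_FIELDS`, `validate_payload`, and `lock_order` are hypothetical names.

```python
# Hypothetical field set for a client-facing payload; illustrative only.
CLIENT_ALLOWED_FIELDS = {"name", "email", "phone", "notes"}

def validate_payload(payload: dict, allowed: set[str]) -> dict:
    """Reject any key outside the whitelist, so fields like tenant or
    is_deleted can never arrive through a stray JSON key."""
    unknown = set(payload) - allowed
    if unknown:
        # ValueError is the layer's validation signal; the view maps it to 400.
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    return dict(payload)

def lock_order(lines: list[tuple]) -> list[tuple]:
    """Sort (product_id, location_id) pairs so every concurrent close
    acquires StockItem row locks in the same order and cannot deadlock."""
    return sorted(lines)
```

The lock-ordering trick is the classic deadlock-avoidance move: two transactions that lock the same rows in the same order can block each other, but never cycle.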
Views
DRF GenericViewSets that do three things: authenticate, deserialize, dispatch to a service. Errors are mapped to HTTP status codes one line at a time — ValueError becomes 400, DoesNotExist becomes 404, PermissionDenied becomes 403. Pagination is DRF's configured PageNumberPagination; nothing is hand-rolled.
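The mapping is mechanical enough to sketch in a few lines. The exception classes below are stand-ins for Django's `Model.DoesNotExist` and DRF's `PermissionDenied`, so the snippet stays self-contained:

```python
# Stand-ins for Django's Model.DoesNotExist and DRF's PermissionDenied.
class DoesNotExist(Exception): ...
class PermissionDenied(Exception): ...

STATUS_BY_EXCEPTION = [
    (ValueError, 400),        # service-layer validation failure
    (DoesNotExist, 404),      # missing record
    (PermissionDenied, 403),  # authorization failure
]

def status_for(exc: Exception) -> int:
    """One line per mapping; anything unmapped is a genuine server error."""
    for exc_type, status in STATUS_BY_EXCEPTION:
        if isinstance(exc, exc_type):
            return status
    return 500
```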
Tests mirror the layers one-to-one: test_<entity>_models.py, test_<entity>_utils.py, test_<entity>_services.py, test_<entity>_views.py. A failure tells you exactly which layer it lives in. Linting is ruff in the api container; pre-commit pins black, isort, flake8, and bandit on the host; Swagger via drf-spectacular is mounted only in local and dev so production never exposes an introspection endpoint.
The Stack
Picks are deliberately boring. Boring compounds.
- Backend: Python 3.12, Django 5, DRF 3.15, PostgreSQL 17, Redis 7, Gunicorn.
- Frontend: React 19, Vite 7, TypeScript, Tailwind CSS, Framer Motion, react-i18next.
- Auth: django-allauth (Google OAuth) for the SPA exchange, djangorestframework-simplejwt where a classic JWT flow is needed, session cookies set HttpOnly so the SPA cannot read them from JavaScript.
- Admin: django-unfold with the Quotery logo baked in; drf-spectacular Swagger mounted only in local and dev environments.
- AI: the OpenAI SDK, pointed at gpt-4.1-mini for import orchestration and gpt-image-1 for document-class imagery.
- Document handling: WeasyPrint for PDF export, python-magic for MIME detection, pypdf and openpyxl for parsing incoming PDF / XLSX / XLS / CSV files.

- Infrastructure: Docker Compose for the full stack locally, Render.com for cloud deploy. The frontend and the landing page are independent Render static sites; the API is a Render web service.
The rule for any new dependency is a one-paragraph ADR under docs/decisions/. If it is not worth a paragraph, it is not worth the install.
Caching That Doesn't Lie
Redis is in front of every expensive read, and it is the part of the system most likely to go subtly wrong in multi-tenant software. Quotery's rule is that every cache key goes through one helper — core.cache.make_key — which produces keys of the shape qf:v1:<group>:<scope>:<params-hash>. <scope> is a tenant UUID, a user:<user_id> segment for per-user caches like the current-user payload, or the literal global for platform-wide reads. Mixing scopes on a single call raises ValueError. A bare cache.set("dashboard", ...) is unreachable.
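A pure-Python sketch of what such a helper might look like. The real `core.cache.make_key` signature is an assumption, as is the SHA-256 parameter hash; only the key shape and the mixed-scope `ValueError` come from the description above.

```python
import hashlib
import json

def make_key(group: str, *, tenant_id=None, user_id=None,
             global_scope: bool = False, **params) -> str:
    """Build a qf:v1:<group>:<scope>:<params-hash> cache key.
    Exactly one scope must be supplied; mixing scopes is a bug."""
    scopes = [s for s in (tenant_id, user_id, global_scope) if s]
    if len(scopes) != 1:
        raise ValueError("exactly one scope: tenant_id, user_id, or global_scope")
    if tenant_id is not None:
        scope = str(tenant_id)            # tenant UUID
    elif user_id is not None:
        scope = f"user:{user_id}"         # per-user caches, e.g. the me payload
    else:
        scope = "global"                  # platform-wide reads
    digest = hashlib.sha256(
        json.dumps(params, sort_keys=True, default=str).encode()
    ).hexdigest()[:16]
    return f"qf:v1:{group}:{scope}:{digest}"
```

Sorting the params before hashing makes the key stable regardless of keyword order, which is what makes the helper safe to call from anywhere.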
The second rule is invalidation over TTL. Every service-layer mutation that touches cached data ends its atomic block with an explicit invalidate_<group>(tenant_id=...) call. TTLs are a safety net, not the correctness bar. A static regression guard walks every @transaction.atomic service function in the repo and fails the suite if the body neither invalidates nor declares itself cache-safe. The static check extends to management commands and ModelAdmin subclasses, since both bypass the service layer. With that net in place, server-side TTLs were raised to 24 hours across the board, except for the authorization-sensitive me cache which stays at one hour. If a role change fails to bust the cache for any reason, the worst case is that one user sees stale permissions for at most an hour — not a day.
The third rule is soft-fail. cache_get_or_set wraps both cache.get and cache.set in try-except and falls through to the producer on any backend exception, logging a WARNING to the core.cache logger. A Redis incident becomes a latency regression, not an outage. Per-group TTLs and invalidation fan-out are documented in an invalidation matrix that lives alongside the cache module, so any new mutation that forgets to update the matrix is caught at review time.
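A minimal sketch of the soft-fail wrapper, assuming only a cache object with `get`/`set` methods; the real helper's signature is not quoted here.

```python
import logging

log = logging.getLogger("core.cache")

def cache_get_or_set(cache, key, producer, ttl):
    """Return the cached value if possible; on any backend exception,
    fall through to the producer so Redis trouble degrades to latency."""
    try:
        hit = cache.get(key)
        if hit is not None:
            return hit
    except Exception:
        log.warning("cache get failed for %s; falling through", key)
    value = producer()
    try:
        cache.set(key, value, ttl)
    except Exception:
        log.warning("cache set failed for %s; value not cached", key)
    return value
```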
AI Quote Import: Three Calls in an Atomic Transaction
This is the feature that earns the 70% time-saved number. A user uploads a document — PDF, XLSX, XLS, CSV, or pasted plain text — and the system returns a draft quote with every line classified and a human-readable summary. The whole flow runs synchronously inside a single @transaction.atomic block, so if anything fails, nothing persists.
Call A: Extract the structure
Normalized document text goes in, a typed {groups, ungrouped_items} payload comes out. The call uses the OpenAI SDK with a strict JSON-schema response format, so the model cannot return shape-drifted data. When Call A returns an empty payload the orchestrator raises ValueError("no_items_detected"), which the surrounding atomic transaction converts into a clean rollback of the empty Quote row that would otherwise leak into the database.
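The empty-payload guard is easy to illustrate. This sketch assumes the `{groups, ungrouped_items}` shape described above, with each group carrying an `items` list:

```python
def ensure_items(payload: dict) -> dict:
    """Raise on an effectively empty Call A extraction; the surrounding
    atomic transaction turns the ValueError into a rollback."""
    groups = payload.get("groups") or []
    ungrouped = payload.get("ungrouped_items") or []
    has_grouped = any(g.get("items") for g in groups)
    if not has_grouped and not ungrouped:
        raise ValueError("no_items_detected")
    return payload
```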
Deterministic matching
Before the model sees a single product, Quotery runs exact case-sensitive equality against four code columns on Product: sku, import_code, internal_code, and export_code. Every column carries a GIN trigram index. Every hit becomes an EXACT_MATCH line and skips the LLM entirely. This is where the cost and accuracy numbers really come from — catalogs with well-maintained codes resolve most lines for free.
Shortlist plus Call B: batched pick-or-reject
For every line the deterministic step missed, Quotery builds a shortlist of up to 15 candidate products using the first eight tokens of the line description against name plus all four code columns. Then all unmatched lines are sent to the LLM in a single batched call. The model picks one candidate id per line or returns a rejection. Batching the decisions trades many round trips for one — a cost and latency win that gets bigger as documents get longer.
Every returned id is validated against the shortlist before it touches the database. If the model hallucinates an id that was never in the candidate set, or a cross-tenant id that leaked in some other way, that line degrades to NOT_FOUND rather than binding to a wrong product. This guardrail matters: a mispriced quote at scale is more expensive than a quote with a manual line left for the rep.
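The guardrail reduces to a set-membership check before any write; the function name here is illustrative:

```python
def classify_pick(picked_id, shortlist_ids: set) -> tuple[str, object]:
    """Bind the model's answer only if it names a product that was in the
    shortlist; a hallucinated or leaked id degrades to NOT_FOUND."""
    if picked_id is not None and picked_id in shortlist_ids:
        return ("AI_DECISION", picked_id)
    return ("NOT_FOUND", None)
```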
Call C: the summary banner
After the lines persist, Quotery asks for a one-to-three-sentence summary in the user's locale — English, Portuguese, or Spanish. The summary is ephemeral. It returns inside the HTTP response envelope, renders once in an import banner on the resulting quote detail page, and clears on reload. It is never written to the database, so the feature has no migration and no cleanup story.
Match-kind chips and the atomicity guarantee
Every persisted line carries an import_match_kind: EXACT_MATCH, AI_DECISION, NOT_FOUND, or MANUAL for hand-entered rows. The SPA renders a chip next to each line's product label so the rep sees at a glance how every row was classified. Nothing about AI import reserves stock — imported quotes land in draft, the rep reviews, and the usual close flow runs reservation with the full concurrency guarantees described below.
Quote Lifecycle + Stock Ledger
A Quote lives on a state machine: draft → sent → closed → partially_delivered → delivered, with cancelled branches from draft, sent, and closed. Per-tenant-per-year numbering uses a gapless allocator — Q-YYYY-NNNN — backed by a SELECT ... FOR UPDATE on a QuoteNumberSequence row so two concurrent closes in the same tenant-year cannot collide. Cancelling a closed quote writes RELEASE ledger rows to unwind its reservation; cancelling a partially-delivered or delivered quote returns a 409 and tells you to post a return note instead.
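The allocator's arithmetic is trivial once the sequence row is locked. This sketch shows only the numbering step, and assumes the caller already holds the `SELECT ... FOR UPDATE` lock on the per-tenant-per-year sequence row:

```python
def next_quote_number(year: int, last_seq: int) -> tuple[int, str]:
    """Advance the gapless Q-YYYY-NNNN sequence. Because the sequence row
    is locked, two concurrent closes serialize here instead of colliding."""
    seq = last_seq + 1
    return seq, f"Q-{year}-{seq:04d}"
```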
Stock is tracked in an append-only ledger. Every change to a StockItem writes a StockMovement row with a typed kind — RECEIPT, DELIVERY, ADJUSTMENT, RETURN — a signed delta, and optional source-document foreign keys. The ledger is enforced as append-only even for staff: StockMovement._meta.default_permissions is empty, and the admin overrides has_add_permission, has_change_permission, and has_delete_permission to False. Manual corrections go through a dedicated POST /api/stock/{id}/adjust/ endpoint that is admin-only and requires a non-blank notes field.
Two invariants deserve calling out. First, on_hand is allowed to go negative — backorders are a real thing in the businesses Quotery serves, and hiding them behind a 409 is worse than surfacing them. Second, reserved is allowed to exceed on_hand: closing a quote when the warehouse is short does not 409; it returns a shortages[] list in the response so the rep can act on it. Availability enforcement happens at delivery time, where it belongs — the moment stock actually moves. Multi-location is native: every StockItem is keyed on a Location, locations are tenant-scoped, and the default location is auto-seeded when a Tenant is created.
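A sketch of close-time reservation under those two invariants, with plain dicts standing in for locked StockItem rows; the `shortages[]` shape is illustrative:

```python
def reserve(lines, stock):
    """Reserve every line and report shortages instead of failing the close."""
    shortages = []
    for product_id, qty in lines:
        item = stock[product_id]
        available = item["on_hand"] - item["reserved"]
        if qty > available:
            shortages.append({"product_id": product_id,
                              "short_by": qty - max(available, 0)})
        item["reserved"] += qty  # reservation proceeds even when short
    return shortages
```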
Fulfillment runs on three documents with a shared draft-to-posted state machine and per-tenant-per-year numbering on a single DocumentNumberSequence table keyed by (tenant, prefix, year). Delivery notes use the prefix DN-YYYY-NNNN; posting one consumes on_hand plus reserved and transitions the parent quote to partially_delivered or delivered. Return notes use RN-YYYY-NNNN; posting one adds to on_hand only and leaves the parent quote unchanged. Stock receipts use SR-YYYY-NNNN; they are supplier-inbound, have no quote link, and add to on_hand.
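The three posting rules can be captured as signed ledger deltas; the field names are illustrative, not Quotery's schema:

```python
def posting_deltas(kind: str, qty: int) -> dict:
    """Signed stock deltas written when a fulfillment document posts."""
    if kind == "DELIVERY":   # consumes stock and releases the reservation
        return {"on_hand": -qty, "reserved": -qty}
    if kind == "RETURN":     # adds stock back; parent quote untouched
        return {"on_hand": +qty, "reserved": 0}
    if kind == "RECEIPT":    # supplier inbound; no quote link
        return {"on_hand": +qty, "reserved": 0}
    raise ValueError(f"unknown document kind: {kind}")
```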
RBAC, Audit, Share Links, and Answers
Authorization runs on a custom permission catalog plus a group-permission mapping. Quotes are visibility-gated by auth.Group membership: commercial users see their own quotes plus quotes shared by their group; admin and manager roles see tenant-wide. Warehouse users see inventory and fulfillment documents, but not the sales surface at all. Every cross-tenant lookup returns 404 rather than 403 so the API never confirms the existence of records in another tenant.
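The 404-over-403 rule is one comparison. This sketch uses a dict in place of a tenant-filtered queryset, and a stand-in `NotFound` exception for DRF's:

```python
class NotFound(Exception):
    """Stand-in for DRF's NotFound (HTTP 404)."""

def get_for_tenant(records: dict, record_id, tenant_id):
    """Cross-tenant lookups raise 404 rather than 403, so the API never
    confirms that a record exists in another tenant."""
    record = records.get(record_id)
    if record is None or record["tenant_id"] != tenant_id:
        raise NotFound
    return record
```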
Every business mutation writes an immutable AuditEvent in the same transaction as the write. StockMovement is the audit trail for inventory; AuditEvent is the audit trail for everything else. Together they give an operator a full reconstruction of who did what, when, and against which record.
PDF export runs through WeasyPrint — the Quote detail page has a one-click "download PDF" action that renders the same layout the customer sees on a public share link. QuoteShare issues a read-only public URL with a rotated token, so a rep can send a link that survives a tab close without opening up the rest of the tenant to anonymous traffic. The contact form at /contact writes anonymous submissions into ContactMessage for the platform team to triage.
The UI is fully trilingual — en-US, pt-BR, and es-US — with locale-prefixed routes on the SPA. A shared qf-language cookie on the .quotery.io parent domain carries the user's choice across the landing page, the app, and any future subdomain. And the product ships with "Answers, not dashboards": an embedded assistant that lets the user ask natural-language questions about their own quotes, clients, and stock. When a rep wants to know which three clients bought the most of a given SKU this quarter, they ask; the assistant composes the query, the tenant-scoping rules keep the answer inside the tenant, and the response comes back as a sentence and a small table. No BI tool. No exports.
What's Next
- Move the synchronous AI import path behind a background worker for documents over a few hundred lines, so the 180-second timeout stack becomes a non-issue.
- Map OpenAI SDK errors to translated toasts in the SPA instead of surfacing them as generic 500s.
- Ship a public, stable REST API surface under a versioned prefix for tenants who want to integrate Quotery into their own ERPs.
- Add streaming of intermediate import stages so the processing page reflects actual progress rather than elapsed-time pacing.
- Expand the embedded assistant's grounding to cover fulfillment documents, not just quotes and stock.
Try It
Quotery is live at https://www.quotery.io. If your team is bleeding hours on quote intake and wants a tenant-scoped platform that runs the rest of the loop, talk to us at https://www.quotery.io/contact. DHD Tech built it and runs the roadmap; we are taking on a small number of early design partners.