Artificial intelligence is reshaping how work gets done, and the flood of startups riding this wave can be disorienting for investors. The upside is obvious: cheaper inference, smarter models, new workflows, and business fundamentals that can improve by double-digit percentages. The hard part is separating demand from experimentation, durable moats from temporary model advantage, and quality revenue from creative accounting.
Below, we provide a list of good practices for evaluating AI companies. We start with the big picture, then move into how to think about market size when AI displaces traditional budgets (like labor), why release cycles and platform dynamics cause user churn, which moats actually reduce switching costs, what to inspect (qualitatively and quantitatively) to judge product quality, and finally how to diligence revenue recognition, ARR, and projections.
We close by connecting diligence to valuation, where conviction in a particular future can be converted into an understanding of today’s value. In fast-moving categories, investors often anchor on comps (similar deals) to triangulate value. In AI, that habit has quickly become procyclical: each hype-fueled round lifts the reference point for the next, creating a feedback loop where pricing outruns fundamentals.
1) AI’s Opportunity and Today’s Market Confusion
There is genuine, compounding economic energy behind AI: as tasks shift from human effort to software, marginal costs fall and throughput rises. Entire workflows, from customer support to coding assistance to claims processing, can be re-imagined. But opportunity is not the same as investability. AI markets are unusually contestable because switching costs are low, underlying model prices fall over time, and the gatekeepers of distribution (clouds, productivity suites, IDEs, CRMs) can quickly redirect demand. The same forces that make AI powerful also compress the average revenue per user (ARPU) and erode margins for undifferentiated vendors.
A rational investor lens needs to do two things at once: acknowledge that the needs-served market can expand dramatically as price falls (because buyers do more when it’s cheaper), and assume more intense competition and faster price decay than in classic SaaS. In other words: big pie, but a fight over the slices.
Investor checklist (big picture).
- Anchor on units of work (tickets, claims, PRs merged), not on budgets.
- Expect ARPU compression as models and orchestration get cheaper.
- Prioritize companies that own the workflow or have distribution advantages (defaults, OEM, marketplace position).
- Treat “state-of-the-art model” claims as ephemeral unless paired with moats.
2) Augmentation vs. Replacement
Many pitches assume that if AI can replace a function, the “TAM” equals today’s payroll for that function. That is usually wrong. When you replace labor with software, the supply curve shifts outward: quantity demanded increases, but price per unit of work often drops faster than usage rises—especially in competitive markets. Your buyer-spend TAM at steady state equals post-automation price × resulting quantity, not the old payroll. Elasticity matters: in highly elastic workflows, revenue pools can grow; in inelastic or capped workflows, vendor revenue can shrink even as customers do more.
Near-term, most adoption is augmentation, not full replacement. Gains come from assistance embedded in familiar tools; headcount displacement arrives later, and often asymmetrically (avoided hires vs. immediate cuts). Model TAM in two layers: Augmentation TAM (today’s workflow with assist) and Replacement TAM (re-engineered workflow), and do not pay replacement multiples for augmentation-only evidence.
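To make the arithmetic concrete, here is a minimal sketch of a units-of-work TAM under an assumed price-decay curve and demand elasticity. Every number and parameter below is an illustrative placeholder, not a benchmark.

```python
# Minimal sketch: units-of-work TAM under price decay and elasticity.
# All figures are hypothetical placeholders, not benchmarks.

baseline_units = 1_000_000      # units of work done today (e.g., tickets resolved per year)
baseline_price = 4.00           # post-automation price per unit at launch ($)
annual_price_decline = 0.25     # assumed 25% price step-down per year
elasticity = 1.4                # >1: usage grows faster than price falls; <1: it does not

def tam_at_year(year: int) -> float:
    """Buyer-spend TAM = post-automation price x resulting quantity."""
    price = baseline_price * (1 - annual_price_decline) ** year
    # Constant-elasticity demand: quantity scales with (price ratio)^-elasticity.
    quantity = baseline_units * (price / baseline_price) ** (-elasticity)
    return price * quantity

for year in range(4):
    print(f"Year {year}: TAM ~ ${tam_at_year(year):,.0f}")
```

With elasticity above 1, the revenue pool expands as prices fall; below 1, it contracts even as customers do more, which is exactly the distinction between elastic and capped workflows above.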
Investor checklist (TAM).
- Build TAM from work units and post-automation price curves.
- Separate Augmentation TAM (assist, partial automation) from Replacement TAM (re-engineered flow).
- Haircut ARPU for annual price step-downs and platform take-rates.
- Assume faster price competition where switching is easy (APIs, connectors).
3) Release Cycles, Distribution, and the “Sloshing User” Problem
AI adoption is spiky. New foundation-model releases trigger trial surges; users “slosh” between providers when benchmarks or usability leap ahead. That volatility is amplified by distribution: if a gatekeeper gives end users a toggle between models or providers inside a default surface (e.g., office suites, IDEs), short-lived model advantage is worth less than placement. For downstream apps (IDEs, agents, support assistants), policy or pricing changes by upstream model vendors can shift performance, cost, or access overnight.
What this means for underwriting: don’t pay for post-launch vanity metrics. Focus on cohorted retention and the startup’s ability to maintain outcomes even when the underlying model changes. Ask for a demonstration of multi-model routing and failover. Then assume your customers can flip, too.
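To see why retained usage matters more than the post-release peak, consider a minimal sketch comparing two hypothetical cohorts with the same launch spike but different decay half-lives; the data is illustrative only.

```python
# Minimal sketch: value retained usage (area under the retention curve),
# not the post-release spike. Cohort data below is purely illustrative.
import math

def retention_curve(peak_users: float, half_life_weeks: float, weeks: int):
    """Exponential decay from a launch spike (no retention floor assumed)."""
    decay = math.log(2) / half_life_weeks
    return [peak_users * math.exp(-decay * w) for w in range(weeks)]

def retained_user_weeks(curve):
    """Area under the retention curve: total user-weeks over the horizon."""
    return sum(curve)

sticky = retention_curve(peak_users=10_000, half_life_weeks=26, weeks=52)
sloshy = retention_curve(peak_users=10_000, half_life_weeks=4, weeks=52)

print("Same peak (10,000 users), very different value:")
print(f"  sticky cohort: {retained_user_weeks(sticky):,.0f} user-weeks")
print(f"  sloshy cohort: {retained_user_weeks(sloshy):,.0f} user-weeks")
```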
Investor checklist (release cycles & distribution).
- Track post-release spike → decay half-life; value the area under the retention curve, not peak MAU.
- Inspect model swap resilience: outcomes stable when the LLM changes?
- Quantify distribution dependence: what share of new users comes via surfaces controlled by others (marketplaces, clouds, productivity suites)?
- Ask for 72-hour re-platform drills: if a provider changes access or price, what breaks?
4) “Real Market Pull”
The cleanest evidence of product truth is a business-owner budget purchasing the tool to move a named KPI, not an IT experimentation line item. Real pull shows up as rapid adoption within a workflow, renewal from the P&L that benefits from it, and “if this vanished tomorrow, we’d need to reassign/re-hire X FTEs” feedback. In practice, that looks like: support average handle time (AHT) falling, claims cycle time shortening, PR throughput rising, or sales email reply rates improving with downstream revenue lift.
Work backward from the job-to-be-done: trigger → input data → decision rules → action → measurable outcome. If the vendor only owns the chat interface and not the data ingestion, enrichment, and actioning layers, they can be swapped out. Market pull couples to depth of workflow ownership.
Investor checklist (market pull).
- Ask for ten reference calls in three segments; record which line item funds the tool.
- Require before/after metrics on the business KPI, not just usage.
- Look for expansion to adjacent workflows inside the same department within 6–12 months.
- Inspect lost and churned accounts; code reasons and time-to-churn.
5) Moats That Matter
In a world where model capability diffuses fast and prices fall, the moats that persist are (a) workflow capture, (b) proprietary context and feedback loops, (c) multi-model orchestration, (d) distribution control, and (e) compliance and governance. Performance that depends on private corpora (tickets, code, claims) and task-specific feedback (labels, QA traces, playbooks) travels with the vendor; performance that depends only on which LLM they call is fragile. Embedding into upstream and downstream systems (so the tool reads, writes, and closes loops) raises switching costs.
A practical litmus test: ask the company to swap out its underlying model and show KPI drift. If the outcomes hold, the moat is in the system, not the model. If the outcomes collapse, you are underwriting someone else’s roadmap.
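One way to operationalize this litmus test is a simple drift check: run the same task set through the current and the candidate model route and compare the KPI against a tolerance. The sketch below assumes hypothetical run_task and score_outcome functions standing in for the vendor’s own pipeline; it is an outline, not any company’s actual harness.

```python
# Minimal sketch of a model-swap drift check. `run_task` and `score_outcome`
# are hypothetical stand-ins for the vendor's own pipeline and KPI logic.
from statistics import mean

TOLERANCE = 0.05  # accept up to 5% relative KPI drift (assumption)

def kpi_for_route(tasks, route, run_task, score_outcome):
    """Average KPI over a fixed task set for one model route."""
    return mean(score_outcome(task, run_task(task, route)) for task in tasks)

def model_swap_holds(tasks, current_route, candidate_route, run_task, score_outcome):
    """True if outcomes survive swapping the underlying model route."""
    baseline = kpi_for_route(tasks, current_route, run_task, score_outcome)
    swapped = kpi_for_route(tasks, candidate_route, run_task, score_outcome)
    drift = abs(swapped - baseline) / baseline
    print(f"baseline={baseline:.3f}, swapped={swapped:.3f}, drift={drift:.1%}")
    return drift <= TOLERANCE
```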
Investor checklist (moats).
- Integration depth: # systems with bi-directional connectors; % of actions executed automatically.
- Data moat: share of inferences augmented by private retrieval; volume/quality of feedback traces.
- Orchestration: policy for provider routing/failover; margin by route.
- Distribution: OEM/default placements; marketplace position; exclusive channels.
- Governance: audit trails, red-teaming, PII/PHI controls; time-to-security-approval at enterprises.
6) What to Examine: Qualitative and Quantitative
Great AI companies explain the workflow in concrete terms and show cohorted, compounding usage. Your diligence should therefore pair narrative depth (who buys, why now, what breaks without it) with hard telemetry (retention curves, engagement ladders, unit economics by provider). Treat this as a funnel: problem → product → adoption → outcome → expansion.
On the numbers side, split cohorts by product, segment, and pricing model (subscription vs. usage). Usage-only expansion is fragile when prices fall; seat/workflow expansion is healthier. Disaggregate gross margin by model route and by product SKU. Inspect contracts for pilots, minimum commits, and credit policies, which shape both revenue recognition and durability.
Qualitative examination.
- Jobs-to-be-done maps for top 3 use cases (trigger → data → decision → action → KPI).
- Buying center interviews: budget owner, economic buyer, power user, security.
- Implementation friction: time-to-value, change management, training, “seed then automate” path.
- Churn narratives: interview five churned customers; identify systemic vs. idiosyncratic causes.
Quantitative examination.
- Cohorts: logo, gross $, and net $ retention at 90/180/360 days; break out by subscription vs. usage (a computation sketch follows this list).
- Engagement ladders: % of seats or workflows active weekly; p50/p90 activity; % tasks fully automated.
- Unit economics: revenue and COGS by provider route (OpenAI/Anthropic/local), by product; trend in inference cost/price.
- Contract anatomy: pilots vs. production, min-commits vs. on-demand, true-ups/overages; credit breakage history.
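For the cohort cuts above, the following is a minimal sketch of how logo, gross dollar, and net dollar retention can be computed from revenue snapshots; the customer names and figures are purely illustrative.

```python
# Minimal sketch: logo, gross $, and net $ retention for one cohort.
# The revenue snapshots below are illustrative, not real figures.

start = {"acme": 10_000, "globex": 6_000, "initech": 4_000}   # MRR at cohort start
day_180 = {"acme": 13_000, "globex": 3_000}                   # initech churned

def retention(start, later):
    logo = len([c for c in start if c in later]) / len(start)
    gross = sum(min(later.get(c, 0), start[c]) for c in start) / sum(start.values())
    net = sum(later.get(c, 0) for c in start) / sum(start.values())
    return logo, gross, net

logo, gross, net = retention(start, day_180)
print(f"logo {logo:.0%}, gross $ {gross:.0%}, net $ {net:.0%}")
# Expected: logo 67%, gross $ 65%, net $ 80% -- expansion at acme masks churn.
```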
7) Revenue Recognition, ARR, and the Classic Pitfalls
AI startups often blend subscriptions, usage, and services. That makes revenue recognition and ARR disclosure error-prone even when intentions are good. Your goal is to separate quality recurring revenue from one-offs and from accounting choices that inflate optics.
Key hot spots:
- Usage-based revenue must be recognized as usage occurs; committed “stand-ready” obligations are ratable only when criteria are met; variable consideration must be constrained.
- Principal vs. agent: reselling a third-party model/API may be net revenue, not gross, which is material for revenue multiples.
- ARR must exclude one-time services and pilots; annualizing a spiky month is misleading.
- Prepaid credits & breakage: recognize breakage only when reliably estimable; disclose policies and history.
- Round-tripping / rebates and channel-stuffing analogs: beware circular deals, end-of-period pre-buys, or refundable “commits.”
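The principal-versus-agent point is easiest to see with numbers. The sketch below contrasts gross and net presentation for a hypothetical reseller of a third-party model API; the figures are illustrative, and the actual gross/net determination is an accounting judgment about control, not a modeling choice.

```python
# Minimal sketch: why principal vs. agent treatment matters for revenue multiples.
# Figures are illustrative; the gross/net determination is an accounting judgment
# (control over the specified service), not a modeling choice.

customer_billings = 10_000_000     # what customers pay the startup per year
model_api_passthrough = 7_000_000  # what the startup pays the upstream model provider

revenue_if_principal = customer_billings                       # gross presentation
revenue_if_agent = customer_billings - model_api_passthrough   # net presentation

multiple = 10  # hypothetical revenue multiple
print(f"Implied value at {multiple}x: "
      f"gross ${revenue_if_principal * multiple:,} vs. net ${revenue_if_agent * multiple:,}")
# Same business, same cash flows -- a 3.3x difference in the headline number.
```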
ARR & revenue diligence checklist.
- ARR reconciliation to revenue by component: subscription, usage, services, pilots, credits.
- Policy memos: usage recognition, variable consideration, principal–agent, credits/breakage.
- ARR hygiene: definitions approved by the board; method notes; changes across periods.
- Quarter-end spikes: commits vs. consumption; rights of return; refunds/cancellations.
- Gross margin by route: provider pass-through vs. value-add; sensitivity to price cuts.
8) Cohorts, Projections, and Scenario Planning
In fast, price-competitive categories you need forecasts that bake in price decay and test elasticity. Build three cases (Base, Bear, Bull) using explicit assumptions for price per unit, adoption rates, seat/workflow expansion, and the proportion of growth that depends on usage vs. subscription. Tie every assumption to observed cohort behaviors. Where distribution gatekeepers can switch models/providers, cap the pace and durability of expansion to what those channels historically allow.
A special focus is gross margin quality under competition. If the company’s growth depends on the most expensive model route, and the category keeps getting cheaper, can routing/pruning keep margins stable? If not, haircut the long-term operating model.
Scenario template (use in models; a minimal sketch follows the list).
- Inputs: seats/workflows adopted; price per unit (declining curve); usage elasticity; mix (subscription vs. usage); provider route margins; distribution constraints.
- Base: gradual price decline, steady adoption, moderate elasticity, small workflow expansion.
- Bear: faster price cuts, gatekeeper routing to competitors, low expansion, minimal seat growth.
- Bull: second workflow lands, high elasticity, automation increases share-of-wallet even as ARPU falls.
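Below is a minimal sketch of the template, assuming simple multiplicative dynamics and illustrative parameters; in a real model, each input should be tied to observed cohort behavior as noted above.

```python
# Minimal sketch of the Base/Bear/Bull template. All parameters are illustrative;
# in practice each input should be tied to observed cohort behavior.

def revenue_path(seats, price, price_decline, seat_growth, elasticity, years=3):
    """Yearly revenue under a declining price curve and elastic usage per seat."""
    path = []
    usage_per_seat = 1.0
    for _ in range(years):
        path.append(seats * usage_per_seat * price)
        new_price = price * (1 - price_decline)
        usage_per_seat *= (new_price / price) ** (-elasticity)  # cheaper -> more usage
        price = new_price
        seats *= (1 + seat_growth)
    return path

scenarios = {
    "Base": dict(price_decline=0.15, seat_growth=0.30, elasticity=1.1),
    "Bear": dict(price_decline=0.35, seat_growth=0.05, elasticity=0.7),
    "Bull": dict(price_decline=0.15, seat_growth=0.60, elasticity=1.5),
}
for name, params in scenarios.items():
    path = revenue_path(seats=1_000, price=1_200, **params)
    print(name, [f"${v:,.0f}" for v in path])
```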
9) Step-by-Step Deal Process
The best AI investments follow a consistent discipline: validate the market’s economic energy, prove product truth with cohorts and outcomes, de-risk moats against the release-cycle storm, and normalize revenue quality before debating valuation. Doing these steps in order prevents overpaying for transient spikes and helps you back the companies most likely to become default workflow layers.
Below is a compact, stage-by-stage guide you can lift into your deal playbook.
Stage 1 — Big-picture screen.
- Map target workflows and units of work; test if demand expands enough to offset expected price drops.
- Identify distribution chokepoints and potential gatekeepers.
- Confirm the startup’s thesis isn’t “we’re the best model today.”
Stage 2 — Problem & product truth.
- Ten customer calls; document KPI change and budget provenance (not “innovation spend”).
- Jobs-to-be-done maps for top 3 use cases; confirm closed-loop actions (not just chat).
- Implementation friction and stickiness: time-to-value, % of users touching product weekly.
Stage 3 — Moats & resilience.
- Live model-swap demo; KPI drift within tolerance?
- Multi-model routing & 72-hour re-platform drill.
- Integration depth, proprietary context, feedback loops; governance/audit.
Stage 4 — Revenue quality & accounting.
- ARR policy and reconciliation; exclude one-offs.
- Revenue recognition memos: usage, variable consideration, principal–agent, credits/breakage.
- Quarter-end analysis: commits vs. consumption; refundable terms; side letters.
Stage 5 — Cohorts, margins & forecasts.
- Cohorts by product/segment; 90/180/360-day logo/$ retention; post-pilot conversion.
- Gross margin by provider route; sensitivity to price cuts and routing shifts.
- Base/Bear/Bull projections with explicit price-decay and elasticity; cap expansion by distribution constraints.
Stage 6 — Pre-term-sheet synthesis.
- Summarize what breaks the thesis (e.g., gatekeeper rerouting, price race, lack of workflow ownership).
- Convert diligence into operating covenants or milestone-based terms where appropriate.
- Only then proceed to valuation discussions.
10) Putting It All Together: Valuation and IC
When you brief your investment committee, frame the narrative around durability and convertibility. Durability comes from workflow ownership, proprietary context, and distribution—not from any single model release. Convertibility is the bridge from technical capability to business results: cohorts that keep using the product after the honeymoon, margins that survive price compression, and revenue that stands on GAAP legs rather than run-rate algebra.
Make it explicit where value accrues: to the startup (moats and defaults), to the platform (gatekeepers), or to the customer (consumer surplus). Your job is to back the companies that keep enough of the value they create to compound.
IC one-pagers (structure).
- Thesis: workflow, units of work, elasticity, price trajectory.
- Evidence: KPI lift, cohort retention, model-swap resilience.
- Risks: distribution dependence, price compression, policy/access changes.
- Economics: ARR quality, margin by route, forecast scenarios.
- Decision: what we’ll pay, what must stay true, what triggers a re-underwrite.
Conclusion: From Conviction to Valuation
You’ve identified a compelling opportunity: a startup that solves a real problem in a market with energy, proves durable advantage beyond “best model today,” and shows cohorts that stick because outcomes matter. You’ve pressure-tested revenue recognition, normalized ARR, and built scenarios that account for price decay, elasticity, and distribution constraints. What happens next?
This is the moment to translate diligence into valuation logic. The inputs you’ve built (normalized current revenue, margin by route, cohort durability, and defensibility) inform your forward cash-flow expectations and the risk profile you should price. If most growth depends on usage expansion amidst falling unit prices, you assign a steeper discount and tighter terms; if growth comes from workflow expansion and distribution lock-ins, you can underwrite a longer compounding runway. In both cases, the move from conviction to terms is best made with a consistent valuation framework.
Once you’ve finished the diligence steps outlined above, you have the clean revenue base, credible scenarios, and risk adjustments needed to feed a transparent valuation process. The output is not just a number; it is a defensible bridge between what is true today and the future you believe the company can earn. It reflects the shared understanding of future expectations you have reached with the founders by scrutinizing their assumptions. It reflects the return you hope to achieve and the risk of getting there. And it reflects the cost of capital: what else you could have done with that capital.
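To make that bridge concrete, one common pattern is to probability-weight the scenario cash flows and discount them at a rate that reflects the risk and the cost of capital. The sketch below is a simplified illustration of that logic with hypothetical figures, not a full valuation methodology.

```python
# Minimal sketch: converting scenarios into a present value. Cash flows,
# probabilities, and the discount rate below are illustrative assumptions.

def present_value(cash_flows, discount_rate):
    return sum(cf / (1 + discount_rate) ** (t + 1) for t, cf in enumerate(cash_flows))

scenarios = {  # (free cash flow forecast per year in $, scenario probability)
    "Bear": ([-2_000_000, -500_000, 1_000_000], 0.30),
    "Base": ([-1_500_000, 1_000_000, 4_000_000], 0.50),
    "Bull": ([-1_000_000, 3_000_000, 9_000_000], 0.20),
}
discount_rate = 0.35  # high rate reflecting stage risk and the cost of capital

value = sum(prob * present_value(cfs, discount_rate)
            for cfs, prob in scenarios.values())
print(f"Scenario-weighted present value: ${value:,.0f}")
```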
Appendix: Fast Reference Checklists
Market & product
- Units-of-work TAM with elasticity; price-decay haircuts
- Budget provenance (line item) and KPI lift
- Jobs-to-be-done maps; closed-loop actions; implementation friction
Moats
- Workflow integration depth; proprietary context + feedback
- Multi-model routing; model-swap resilience; re-platform drill
- Distribution: OEM/defaults/marketplaces; governance/audit trails
Revenue quality
- ARR policy & reconciliation; exclude one-offs/pilots
- Usage recognition, variable consideration, principal–agent
- Credits/breakage, quarter-end spikes, side letters/contra-revenue
Cohorts & margins
- 90/180/360-day logo/$ retention; post-pilot conversion
- Gross margin by provider route; sensitivity to price cuts and routing
- Base/Bear/Bull with explicit price and elasticity assumptions
When those boxes tick the right way, you’ve earned the right to discuss valuation with confidence and to separate enduring AI businesses from the noise.
FAQs: Quick Answers for Investors
How is valuing AI startups different from classic SaaS?
Price decay is faster, markets are more contestable, and distribution gatekeepers can reroute demand. Value defensible workflow ownership and data moats over transient model leads.
Should TAM equal today’s payroll for the function replaced by AI?
No. Use post-automation buyer spend (price × quantity) with elasticities and price-decay haircuts. Separate augmentation from replacement.
What’s the single best sanity check on revenue quality?
A board-approved ARR policy and a line-item ARR→GAAP reconciliation that excludes pilots, services, and credits — plus route-level margin sensitivity to price cuts.
How do I test defensibility quickly?
Ask for a live model-swap demo, a 72-hour re-platform drill, and a margin-by-route breakdown. If outcomes collapse, the moat lives with the model, not the company.
How do you value AI startups?
Once you’ve reached conviction in the business and have confidence in the underlying financial metrics and projections, it’s relatively straightforward to plug those inputs into Equidam to produce a transparent valuation report built on best practices.