The price did not change. The invoice did.
That is the situation facing any SaaS or fintech company running Anthropic or Microsoft AI tools on an enterprise plan. In April 2026, Anthropic shifted from flat-seat pricing to a $20/seat base plus usage-based billing at API token rates. Microsoft is moving GitHub Copilot to the same structure in June. The headline numbers look similar. The actual cost does not.
What changed and why it matters to your P&L
Token-based billing means your AI spend scales with how intensively your team uses the tools, not how many seats you provisioned. That sounds reasonable until you model it.
According to SVB's 2026 CFO Survey of 230 VC-backed finance leaders, median AI tool spend across their portfolios went from roughly $2,000 in 2024 to $20,000 in 2025 and is tracking toward $50,000 in 2026. That is a 25x increase in two years. The survey also found that 85% of organizations miss their AI cost forecasts by more than 10%, and nearly a quarter miss by more than 50%.
The gap between budget and reality is not incompetence. It is that flat-rate pricing made AI spend look like a known fixed cost. It never was.
The tokenizer problem nobody is explaining
The billing model shift is only part of the problem. In mid-April 2026, Anthropic released Claude Opus 4.7 with an updated tokenizer that produces up to 35% more tokens for the same input text, at unchanged published rates. According to Finout's analysis of the release, a request that cost $0.10 on Opus 4.6 costs up to $0.135 on 4.7 with no price change on the rate card.
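The arithmetic is simple but worth making explicit. A minimal sketch, assuming an illustrative blended rate (the rate and token counts below are placeholders, not Anthropic's actual rate card):

```python
# Illustration of tokenizer drift: same rate card, more tokens per
# request, higher effective cost. All numbers are assumed, not actual rates.
RATE_PER_MTOK = 15.00  # assumed blended $ per 1M tokens

def request_cost(tokens: int, rate_per_mtok: float = RATE_PER_MTOK) -> float:
    """Cost of a single request at a flat per-million-token rate."""
    return tokens / 1_000_000 * rate_per_mtok

old_tokens = 6_667                    # tokens under the old tokenizer (illustrative)
new_tokens = int(old_tokens * 1.35)   # same text, up to 35% more tokens

old_cost = request_cost(old_tokens)
new_cost = request_cost(new_tokens)
print(f"old: ${old_cost:.4f}  new: ${new_cost:.4f}  "
      f"increase: {new_cost / old_cost - 1:.0%}")
```

The point of the sketch: the published rate never moves, so the increase is invisible on the rate card and only shows up in metered token counts on the invoice.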
For a SaaS company that embeds AI features into its product, a cost increase of up to 35% flows directly into COGS. If your product pricing is a flat subscription, your gross margin absorbs the entire increase with no offset.
This is not a hypothetical risk. The FinOps Foundation's State of FinOps 2026 report, covering more than $83 billion in managed tech spend, found that only 7.5% of enterprises currently build FinOps practices into AI projects. IDC's FutureScape 2026 warns of up to a 30% rise in underestimated AI infrastructure costs by 2027, driven specifically by what it calls "Context Window Creep": models getting larger, context windows getting longer, and token counts growing with every upgrade cycle.
What AI-embedded gross margins actually look like
If your financial model still shows 75-80% gross margins on AI-touched revenue, the model is wrong.
ICONIQ's 2026 State of AI benchmark, covering roughly 300 software executives across a range of ARR sizes, puts average AI product gross margins at 52% in 2026. That is up from 45% in 2025 and 41% in 2024, but still 15-30 points below the traditional SaaS norm. Inference alone averages approximately 23% of revenue at scaling-stage AI B2B companies. Bessemer's analysis puts thin-wrapper AI gross margins as low as 25%.
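To see how an AI product lands at a 52% gross margin, it helps to decompose the COGS line. A toy decomposition, using the ICONIQ inference figure plus an assumed bucket for hosting and other delivery costs (the revenue and non-inference figures are placeholders):

```python
# Toy gross-margin decomposition for an AI-embedded product line.
revenue = 1_000_000            # annual AI-attributable revenue (assumed)
inference = 0.23 * revenue     # inference at ~23% of revenue (ICONIQ benchmark)
other_cogs = 0.25 * revenue    # hosting, support, data (assumed, not benchmarked)

gross_margin = (revenue - inference - other_cogs) / revenue
print(f"gross margin: {gross_margin:.0%}")
```

With those assumptions the margin works out to 52%: inference alone eats nearly a quarter of revenue before any traditional hosting or support cost is counted, which is why the gap to an 80% SaaS margin is structural rather than an optimization problem.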
Snowflake's FY27 guidance, released in February 2026, shows the company intentionally accepting a 75% non-GAAP product gross margin, down from 75.8%, to fund its Cortex AI platform. That is a best-in-class data platform vendor with dedicated infrastructure teams making a deliberate trade. A $5-15M SaaS company with no FinOps function does not have the optimization muscle to manage that compression the same way.
Three things to do before your June renewal cycle
The first is to reforecast your 2026 AI opex on a per-token basis, not a per-seat basis. Pull actual token consumption from your vendors for the last 90 days and model forward at 1.2x to account for usage growth and tokenizer drift. If you do not have that data, request it from your vendor before the next invoice cycle.
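That reforecast can be sketched in a few lines. A minimal version, assuming you can export monthly token consumption from each vendor (every number below is a placeholder to be replaced with your own actuals and rate card):

```python
# Minimal per-token reforecast: trailing 90 days of actual usage,
# annualized, with a 1.2x buffer for growth and tokenizer drift.
monthly_tokens = [210e6, 240e6, 265e6]  # last 3 months of actual usage (placeholder)
rate_per_mtok = 18.00                   # assumed blended $ per 1M tokens
growth_factor = 1.2                     # buffer for usage growth + tokenizer drift

avg_monthly_tokens = sum(monthly_tokens) / len(monthly_tokens)
annual_forecast = avg_monthly_tokens * 12 * growth_factor / 1e6 * rate_per_mtok
print(f"reforecast annual AI opex: ${annual_forecast:,.0f}")
```

The structure matters more than the placeholder numbers: the forecast is driven by metered tokens and a rate, so a seat count never appears in it, which is exactly the shift the new billing model forces.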
The second is to put usage caps and cost alerts on every AI vendor account now. Most enterprise AI platforms support spend limits or API rate limits at the account or key level. Setting them takes less than an hour and prevents a single power-user workflow from generating a five-figure overage before anyone notices.
The third is to break AI inference into its own COGS line if it does not already exist. Blending it into general hosting or software costs masks the unit economics problem. A board member looking at your gross margin trend needs to be able to see AI-attributable infrastructure as a separate driver, not a buried variance.
If your 2026 model still treats AI spend as a fixed line item and your product roadmap includes AI-embedded features, the gap between your forecast and your actuals is going to show up in the board pack before it shows up in a planning session. Hudson CFO Solutions works with SaaS and fintech founders on exactly this kind of reforecast and cost architecture work. Book a strategy call if you want a second set of eyes on the numbers before June.