Scope check // cost per verified outcome

Humans are not cheaper at every task. They are still cheaper at many complete workflows.

AI can classify, summarize, and generate at extraordinary speed. For high-volume, repeatable work, automation can win decisively. The misleading comparison is one employee salary versus one API invoice. A defensible comparison is total cost per accepted outcome: compute, integration, data preparation, monitoring, human review, exception handling, remediation, compliance, and expected loss when the system is confidently wrong.

This analysis compares those full systems. It does not equate a biological brain with a GPU benchmark, and it does not pretend every AI deployment fails. The practical conclusion is narrower: automate stable volume, preserve accountable judgment, and measure the hybrid workflow instead of the model demo.

01 // The economic reality behind the demo

Technical feasibility is not the same as economic substitution.

MIT FutureTech modeled the end-to-end cost of computer-vision automation, including the system needed to reach a task’s required performance. Its 2024 study found that, at then-current costs, only 23% of wages paid for vision tasks were economically attractive to automate. That result applies to computer vision only and is not a universal score for generative AI, but its method exposes the missing denominator: firms buy functioning systems, not benchmark capability.

23%: share of wages for vision tasks attractive to automate in MIT FutureTech’s model
>80%: share of firms that reported no AI impact on productivity or employment over three years
1.4%: average productivity gain executives forecast for the next three years

A February 2026 NBER survey of almost 6,000 executives in the United States, United Kingdom, Germany, and Australia found that roughly 70% of firms actively used AI. Yet more than 80% reported no impact on either productivity or employment during the previous three years. Executives still expected a 1.4% productivity increase over the next three years. Adoption is real; realized firm-level transformation remains much smaller than the keynote version.

Why? The model is one component inside a larger system of authentication, retrieval, permissions, evaluation, observability, fallback queues, and support. Variable inference charges may fall while engineering and governance remain stubbornly human. Cheap generation can even increase total cost by creating more output that someone must verify.

02 // Hardware metrics without benchmark theater

A 20-watt brain versus a 10.2-kilowatt AI system is useful context—not an apples-to-apples race.

Neurochemistry estimates put the human brain’s metabolic power near 20 watts. NVIDIA specifies up to 700 watts of thermal design power for one H100 SXM GPU. Its eight-GPU DGX H100 datasheet lists approximately 10.2 kilowatts of maximum system power, before allocating facility cooling and networking overhead to a workload. That is roughly 510 times the brain’s 20-watt figure at the system boundary.
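The ratio is plain arithmetic on the published power envelopes, and writing it out shows how much the answer depends on where the boundary is drawn. A minimal sketch using only the figures quoted above; the variable names are illustrative, and the result compares power draw, not work done:

```python
# Power envelopes quoted in the text (watts).
BRAIN_WATTS = 20.0             # approximate metabolic power of a human brain
H100_SXM_TDP_WATTS = 700.0     # maximum thermal design power of one H100 SXM GPU
DGX_H100_MAX_WATTS = 10_200.0  # maximum system power of an eight-GPU DGX H100
GPUS_PER_DGX = 8

# Ratio at the system boundary (whole DGX chassis versus one brain).
system_ratio = DGX_H100_MAX_WATTS / BRAIN_WATTS

# Ratio if only the GPUs are counted, ignoring CPUs, memory, storage, and networking.
gpu_only_watts = GPUS_PER_DGX * H100_SXM_TDP_WATTS
gpu_only_ratio = gpu_only_watts / BRAIN_WATTS

print(f"system boundary: {system_ratio:.0f}x, GPUs only: {gpu_only_ratio:.0f}x")
```

Neither number includes facility cooling or networking outside the chassis, so both understate the power attributable to a real workload.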

Biology: ~20 W (human brain)

General reasoning, perception, memory, social context, and continual adaptation inside one biological system.

Accelerated compute: ~10.2 kW (DGX H100 maximum)

Eight H100 GPUs plus CPUs, memory, storage, and networking engineered for massive parallel computation.

Do not turn that ratio into “one person is 510 times more efficient.” GPUs process workloads at speeds no human can approach, while a person’s full body, workplace, and salary consume resources beyond brain wattage. The honest lesson is that human cognition arrives with unusually broad capabilities already integrated. AI needs servers, orchestration, data pipelines, cooling, security, and a person who knows when the answer is nonsense.

03 // The long tail changes the unit economics

The expensive AI output is not the common answer. It is the rare answer nobody catches.

A workflow may be 95% routine yet remain unsafe to automate because the remaining 5% contains fraud, unusual contracts, vulnerable customers, novel medical presentations, or irreversible actions. Models learn statistical regularity. Businesses are often sued, fined, or abandoned over the exception.

MIT researchers studying chest X-rays found that a self-supervised model surpassed radiologists in predictive ability across a long tail of diseases. That is important evidence against lazy claims that humans always win rare cases. The same paper still noted deployment hurdles: converting model scores into diagnostic decisions requires thresholds, and threshold selection is especially difficult for rare pathologies. Prediction is not the entire clinical workflow; accountability and action remain.

Total cost per verified outcome = (compute + integration + oversight + exceptions + expected failure loss) / number of verified outcomes
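The cost-per-verified-outcome expression can be sketched in a few lines. Everything numeric here is a hypothetical illustration, not data from any cited study, and modeling expected failure loss as outputs × undetected error rate × loss per error is an assumption made for this example:

```python
from dataclasses import dataclass


@dataclass
class WorkflowCosts:
    """Cost terms for one reporting period, in one currency (hypothetical values)."""
    compute: float
    integration: float
    oversight: float
    exceptions: float
    expected_failure_loss: float

    def total(self) -> float:
        return (self.compute + self.integration + self.oversight
                + self.exceptions + self.expected_failure_loss)


def expected_failure_loss(outputs: int, undetected_error_rate: float,
                          loss_per_error: float) -> float:
    """Assumed model: every undetected error costs the same average amount."""
    return outputs * undetected_error_rate * loss_per_error


def cost_per_verified_outcome(costs: WorkflowCosts, verified_outcomes: int) -> float:
    if verified_outcomes <= 0:
        raise ValueError("verified_outcomes must be positive")
    return costs.total() / verified_outcomes


# Hypothetical period: 10,000 outputs, of which 9,600 pass verification.
outputs = 10_000
costs = WorkflowCosts(
    compute=1_500.0,
    integration=4_000.0,
    oversight=6_000.0,
    exceptions=2_500.0,
    expected_failure_loss=expected_failure_loss(outputs, 0.002, 500.0),
)
full_cost = cost_per_verified_outcome(costs, verified_outcomes=9_600)
invoice_only = costs.compute / outputs  # the misleading "API invoice" view

print(f"full: {full_cost:.2f} per verified outcome, invoice only: {invoice_only:.2f} per output")
```

With these made-up inputs the invoice-only figure is more than an order of magnitude below the full figure, which is the gap the comparison is meant to expose.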

Human review is also not free or infallible. The design goal is selective verification: confidence thresholds, risk tiers, sampled audits, mandatory escalation for consequential decisions, and a kill switch. If every output needs line-by-line approval, the system has not removed labor; it has transformed a skilled worker into a slower proofreader.
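Selective verification can be expressed as a small routing rule. The sketch below is one possible arrangement, not a recommended policy: the tier names, thresholds, audit rate, and the `route` function are all hypothetical, and real thresholds would have to be calibrated against measured error rates.

```python
import random
from enum import Enum
from typing import Callable


class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"  # consequential or irreversible decisions


# Hypothetical minimum confidence for automatic handling, per tier.
AUTO_APPROVE_THRESHOLD = {RiskTier.LOW: 0.80, RiskTier.MEDIUM: 0.95}
AUDIT_SAMPLE_RATE = 0.05  # hypothetical share of automated outputs audited afterwards


def route(confidence: float, tier: RiskTier, kill_switch_on: bool = False,
          rng: Callable[[], float] = random.random) -> str:
    """Return 'human_review', 'auto_with_audit', or 'auto' for one model output."""
    if kill_switch_on:
        return "human_review"  # kill switch: nothing ships without a person
    if tier is RiskTier.HIGH:
        return "human_review"  # mandatory escalation for consequential decisions
    if confidence < AUTO_APPROVE_THRESHOLD[tier]:
        return "human_review"  # below the tier's confidence threshold
    if rng() < AUDIT_SAMPLE_RATE:
        return "auto_with_audit"  # ships now, sampled for later human audit
    return "auto"


print(route(0.97, RiskTier.MEDIUM))
print(route(0.99, RiskTier.HIGH))
```

Passing `rng` as a parameter keeps the sampling step testable. The point of the design is that reviewer time goes to the high-risk tier and to low-confidence outputs rather than being spread evenly across every result.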

04 // Operational case study: Klarna

The headline measured conversations. The mature strategy restored a human lane.

Klarna is a useful case because its results are neither pure triumph nor simple reversal. In May 2024, the company said its AI assistant had engaged more than four million customers and would produce $40 million in annualized savings. Later SEC disclosures said the assistant handled 69% of customer-service chats during the twelve months ending June 2025, performed work equivalent to more than 700 full-time agents, and delivered about $39 million in 2024 savings.

Then the hidden metric—quality—became impossible to treat as a footnote. In a May 2025 comment letter, the SEC cited public remarks from Klarna’s CEO describing some AI services as lower quality than human representatives and asked the company to explain how its approach had changed. Klarna’s final filing described a dual-track model: scalable AI plus high-quality human support, with customers retaining the option to reach a person.

Data work

Knowledge cleanup, taxonomy design, labeling, retrieval maintenance, and feedback datasets.

Behavior work

Prompt design, tool permissions, evaluations, regression tests, and model-change reviews.

Assurance work

Human auditing, escalations, incident response, appeals, and customer recovery.

A useful procurement exercise is to build an AI cost ledger before approving the pilot. Put every recurring dependency on it, even when another department currently pays the bill:

  • Model: input and output tokens, reserved capacity, retries, embeddings, fine-tuning, and evaluation calls.
  • Platform: vector storage, orchestration, logging, guardrails, identity controls, networking, and disaster recovery.
  • People: data labeling, prompt and knowledge-base maintenance, domain review, security testing, and support.
  • Control: red-team exercises, compliance evidence, drift monitoring, model upgrades, vendor assessments, and access reviews.
  • Failure: rework, customer credits, appeals, incident investigation, legal advice, notification, and reputation repair.
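A ledger like the one above can start as a flat list of entries tagged with both a cost category and the budget that currently pays for it. This is a minimal sketch; the `LedgerEntry` fields, the department names, and every amount are hypothetical placeholders:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    category: str      # Model, Platform, People, Control, or Failure
    item: str
    monthly_cost: int  # whole currency units (hypothetical amounts)
    paid_by: str       # department whose budget carries the cost today


LEDGER = [
    LedgerEntry("Model", "input and output tokens", 1200, "product"),
    LedgerEntry("Model", "retries and evaluation calls", 300, "product"),
    LedgerEntry("Platform", "vector storage and logging", 800, "cloud"),
    LedgerEntry("People", "domain review", 5000, "operations"),
    LedgerEntry("Control", "drift monitoring and access reviews", 1500, "security"),
    LedgerEntry("Failure", "rework and customer credits", 900, "operations"),
]


def totals_by(entries: list[LedgerEntry], field: str) -> dict[str, int]:
    """Sum monthly cost grouped by the named field ('category' or 'paid_by')."""
    totals: dict[str, int] = defaultdict(int)
    for entry in entries:
        totals[getattr(entry, field)] += entry.monthly_cost
    return dict(totals)


by_category = totals_by(LEDGER, "category")
by_payer = totals_by(LEDGER, "paid_by")
grand_total = sum(entry.monthly_cost for entry in LEDGER)

print(by_category)
print(by_payer)
print(grand_total)
```

Grouping the same entries by `paid_by` shows how the total is spread across budgets, which is why no single department sees the full figure.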

Klarna does not publicly itemize every labeling, prompt-engineering, evaluation, or auditing dollar, so those costs cannot honestly be subtracted from its claimed savings here. That reporting gap is the lesson: gross automation savings are not total cost of ownership. Measure resolution quality by complexity tier, repeat contacts, escalations, complaints, and remediation—not chats closed.

05 // Hallucinations become balance-sheet events

The company owns the answer even when the model wrote it.

In Moffatt v. Air Canada, a customer relied on incorrect bereavement-fare guidance from the airline’s chatbot. The British Columbia Civil Resolution Tribunal found Air Canada liable for negligent misrepresentation and ordered CAD $812.02 in damages, interest, and fees. The amount was small; the operating lesson was not. A chatbot is not an independent legal entity that can absorb responsibility for its publisher.

In healthcare, lending, insurance, employment, and legal services, failure costs include patient harm, discrimination claims, adverse regulatory action, professional negligence, breach notification, rework, and lost trust. The EU AI Act adds statutory exposure: the European Commission’s guidance lists penalties up to €35 million or 7% of worldwide annual turnover for certain prohibited practices or data-related noncompliance, with different thresholds for other violations.

06 // A financially honest automation test

Choose AI, humans, or a hybrid by workflow shape.

Workflow signal | Best starting model | Reason
High volume, stable rules, reversible output | AI-led | Scale can amortize integration and oversight.
Messy context, low volume, changing policy | Human-led | Flexibility beats a large support stack.
Fast draft, expert accountable for final result | Human + AI | Automation compresses routine work without outsourcing judgment.
Irreversible action or severe legal exposure | Human gate | Verification and named accountability are part of the product.

The strongest evidence favors augmentation. An NBER study of roughly 5,000 customer-support agents found AI assistance increased issues resolved per hour by 13.8%, with the largest gains among less experienced workers. The agents could accept or ignore suggestions. That architecture captured machine speed while keeping human context and responsibility in the loop.

Before approving automation, run a shadow deployment and record cost per correctly resolved case, p95 latency, escalation rate, repeat contacts, severity-weighted errors, review minutes, and incident cost. Include model migration and vendor-exit expense. If the business case survives those numbers, automate confidently. If it depends on pretending reviewers, power, compliance, and mistakes are free, the human was cheaper all along.
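The shadow-deployment metrics can be computed from a simple per-case log. The sketch below assumes a hypothetical `ShadowCase` record whose `cost` field already includes compute, review time, and any incident cost for that case; the nearest-rank p95 and the synthetic sample data are illustrative choices:

```python
from dataclasses import dataclass


@dataclass
class ShadowCase:
    latency_ms: float
    correct: bool          # resolved correctly, as judged by a reviewer
    escalated: bool
    repeat_contact: bool
    error_severity: int    # 0 = no error; higher = worse (hypothetical scale)
    review_minutes: float
    cost: float            # all-in cost attributed to this case


def p95(values: list[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def summarize(cases: list[ShadowCase]) -> dict[str, float]:
    if not cases:
        raise ValueError("no cases")
    n = len(cases)
    correct = sum(c.correct for c in cases)
    total_cost = sum(c.cost for c in cases)
    return {
        "cost_per_correct_case": total_cost / correct if correct else float("inf"),
        "p95_latency_ms": p95([c.latency_ms for c in cases]),
        "escalation_rate": sum(c.escalated for c in cases) / n,
        "repeat_contact_rate": sum(c.repeat_contact for c in cases) / n,
        "severity_weighted_errors": float(sum(c.error_severity for c in cases)),
        "review_minutes": sum(c.review_minutes for c in cases),
    }


# Synthetic log of 20 cases, for illustration only.
cases = [
    ShadowCase(
        latency_ms=100.0 * (i + 1),
        correct=i < 16,
        escalated=i >= 18,
        repeat_contact=i in (16, 17),
        error_severity=0 if i < 16 else 3,
        review_minutes=2.0,
        cost=1.0,
    )
    for i in range(20)
]
report = summarize(cases)
print(report)
```

Dividing by correctly resolved cases rather than by all cases is the important choice: every wrong answer raises the unit cost instead of quietly lowering it.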

That does not make human labor free, perfectly accurate, or endlessly scalable. It makes human adaptability an asset with a price that is already visible. AI costs are often fragmented across cloud, product, security, legal, and operations budgets, which makes automation look cheaper than it is. Put every cost on one ledger, assign an owner to every consequential output, and compare systems on verified business value. The future is not humans versus machines; it is disciplined combinations outperforming careless replacement.

Resources // primary research and official records

Sources and further reading

  1. MIT FutureTech — Beyond AI Exposure: Which Tasks Are Cost-Effective to Automate with Computer Vision?
  2. NBER Working Paper 34836 — Firm Data on AI
  3. NBER — Measuring the Productivity Impact of Generative AI
  4. AEA Papers and Proceedings — Comparative Advantage of Humans versus AI in the Long Tail
  5. NVIDIA — DGX H100 Datasheet
  6. NCBI Bookshelf — Regulation of Cerebral Metabolic Rate
  7. Klarna Investor Relations — Q1 2024 results and AI savings claim
  8. U.S. SEC — Klarna final prospectus and dual-track customer-support disclosures
  9. BC Civil Resolution Tribunal — Moffatt v. Air Canada, 2024 BCCRT 149
  10. European Commission — AI Act implementation and penalties

Method: Economic figures are reported with their original scope and dates. Hardware figures compare power envelopes, not equivalent intelligence or work. Company savings are attributed claims, not independently audited total-cost calculations. Legal examples are informational and are not legal advice.

Written, sourced, and technically reviewed by

Kawshik Ahmed Ornob

Cybersecurity specialist, AI and NLP researcher, and full-stack engineer writing about the economics, security, and operational reality of production AI.