Why AI vendor evaluations fail before they start

Most health systems evaluate AI vendors the way they buy medical devices: RFP, committee, score sheet, contract. This process was designed for hardware. It fails for software. It fails catastrophically for AI.

The criteria problem

A medical device does what it says on the label. An AI system does what it learned, and how well that carries over depends on your data, your workflows, and your patient population. Accuracy benchmarks from a vendor's white paper mean almost nothing until you know whether those numbers hold in your environment.

The typical evaluation criteria (HIPAA compliance, SOC 2 attestation, uptime guarantees) are necessary but not sufficient. They tell you that a vendor has passed a bar. They tell you nothing about whether the tool will reduce physician documentation time by 40% or by zero.

The people problem

The committee that evaluates AI vendors in most health systems is made up of IT, procurement, and a few clinical champions who volunteered to participate. The people who will actually use the tool are rarely in the room: the charge nurses, the hospitalists, the ED attending who will have to click through three extra screens to dismiss an AI alert.

This produces evaluations that select for features rather than adoption. A vendor wins because their demo looked clean, not because their change management support is exceptional.

What to do instead

Start with use cases, not vendors. Define the workflow problem with specificity: not "reduce documentation burden" but "reduce the time a primary care physician spends on after-visit notes for a standard 15-minute encounter." Build your evaluation criteria around that problem.

Then identify vendors who solve that specific problem and evaluate them in your environment. Not a demo environment. Yours. With your EHR, your templates, your patient population.

Design a pilot that actually tells you something. Six physicians for four weeks. Measure before and after. Include a control group if you can. The goal is not to validate a vendor's claims but to discover whether those claims hold in your context.
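One way to read a pilot like this is a simple difference-in-differences: the change in the pilot group minus the change in the control group over the same weeks. The sketch below is only an illustration of that arithmetic. The function names are hypothetical, the minutes-per-note figures are invented for the example, and a real analysis would need more than a comparison of means across six physicians.

```python
from statistics import mean


def mean_change(before, after):
    """Average per-physician change (after minus before), paired by position."""
    if len(before) != len(after):
        raise ValueError("before and after must list the same physicians in the same order")
    return mean(a - b for b, a in zip(before, after))


def difference_in_differences(pilot_before, pilot_after, control_before, control_after):
    """Change in the pilot group minus change in the control group."""
    return mean_change(pilot_before, pilot_after) - mean_change(control_before, control_after)


# Invented data: average minutes per after-visit note, one value per physician.
pilot_before = [12.0, 10.0, 14.0, 11.0, 13.0, 12.0]
pilot_after = [8.0, 9.0, 10.0, 8.0, 9.0, 10.0]
control_before = [12.0, 11.0, 13.0, 12.0, 10.0, 14.0]
control_after = [11.0, 11.0, 12.0, 12.0, 9.0, 13.0]

effect = difference_in_differences(pilot_before, pilot_after, control_before, control_after)

# A negative value means the pilot group saved more time than the control group did.
print(f"Estimated effect: {effect:.1f} minutes per note")
```

Subtracting the control group's change matters because documentation time can drift for reasons unrelated to the tool, such as a template update or a seasonal shift in visit mix. Without that subtraction, the drift gets credited to the vendor.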

The vendors who will fight this process hardest are the ones whose claims won't hold. The vendors worth working with will ask for the same thing.
