Assessing AI Vendor Risk: A Due-Diligence Checklist for Procurement and IT
procurement, risk-management, vendor-management

James Whitmore
2026-04-15
16 min read

A practical AI vendor due-diligence checklist and scoring model for procurement and IT teams to assess risk, SLAs, data controls, and compliance.

AI vendor risk is no longer a niche security concern. For procurement teams, IT leaders, and compliance stakeholders, third-party AI has become a supply chain decision with legal, financial, and operational consequences. A model can be brilliant at summarisation and still be unsafe to deploy if its provenance is unclear, its training data controls are weak, its update policy is opaque, or its support commitments collapse under real-world usage. That is why AI procurement needs a structured vendor assessment model, not just a feature comparison.

This guide is a practical checklist for evaluating AI vendors before purchase, during contract negotiation, and after go-live. It is designed for UK-focused technology teams who need to move quickly without accepting hidden risk. If you are also building internal governance for conversational systems, you may want to pair this process with our guides on chat integration for business efficiency, agentic workflow settings, and workflow UX standards so the vendor you choose fits both your operations and your users.

Pro tip: Treat AI procurement like a hybrid of software sourcing, data processing assessment, and cyber due diligence. If the vendor cannot explain model provenance, data retention, and incident response in plain language, you should assume your contract will not protect you either.

1. Why AI vendor risk is different from ordinary SaaS risk

Models can change without the UI changing

Traditional SaaS vendors usually ship known functionality with predictable release notes. AI vendors can change model behaviour, retrieval sources, safety filters, and orchestration logic without altering the interface your users see. That creates a unique form of model drift: the service may still “work,” but performance, accuracy, and compliance posture can change materially between versions. Procurement therefore has to evaluate not just the current product, but the vendor’s policy for change control and notification.

Outputs may be probabilistic, not deterministic

Many business stakeholders still expect software behaviour to be repeatable. Generative AI breaks that assumption because outputs vary by prompt, context window, temperature settings, and upstream model updates. That is especially risky in regulated workflows such as customer support, contract triage, and HR or financial decision support. Teams that are defining evaluation standards may find it useful to compare with other measurement disciplines, such as reliable conversion tracking and market reaction forecasting models, where measurement must remain stable despite external change.

Third-party AI introduces supply chain risk

When an AI product depends on upstream model providers, hosting platforms, plugins, vector databases, or content pipelines, risk multiplies across the chain. A single weak link can expose prompts, metadata, or customer content. That is why the vendor assessment must identify all material sub-processors and dependencies, not just the company name on the invoice. In practice, AI supply chain risk often resembles multi-vendor cloud risk more than classic software procurement.

2. The procurement checklist: what to request before you shortlist

Ask for a model card, not just a sales deck

Your first procurement request should be a model card or equivalent technical disclosure. At minimum, ask which base model is used, whether it is proprietary or open-source-derived, what safety tuning has been applied, and what known limitations exist. You also need to know whether the vendor is using a single model or a routing layer that dynamically chooses between several. This matters because model provenance is a core control in both security reviews and compliance sign-off.

Demand training data and retention disclosures

Training data controls are one of the most overlooked issues in AI vendor risk. You need to know whether customer inputs are used for training, whether prompts are retained for debugging, whether retention can be disabled, and how deletion requests are handled. If the vendor uses third-party model providers, ask whether your data is shared with those providers and under what contractual safeguards. For teams already thinking about broader digital identity and data boundaries, our guide on digital identity in the cloud is a useful companion.

Require a clear update and versioning policy

One of the fastest ways to create hidden operational risk is to buy a vendor that can silently alter model behaviour. Ask how often models are updated, whether updates are automatic or controlled, whether version pinning is available, and how backward compatibility is managed. If your use case is customer-facing, you may need a staged rollout model with canary testing and rollback support. Without that, one unnoticed vendor update can break a support flow, distort recommendations, or produce inconsistent answers across teams.

3. The contract layer: SLAs, security clauses, and accountability

Define SLA metrics that actually matter for AI

AI SLAs should not stop at uptime. They should include API availability, latency thresholds, support response times, incident notification windows, and service credits tied to business-critical failures. For enterprise deployments, it is also worth asking for accuracy or escalation commitments on specific workflows, though those are usually framed as service objectives rather than hard guarantees. If your current sourcing process is weak on measurable service outcomes, compare it with approaches used in cloud service management and AI platform operations, where uptime and throughput are tied directly to revenue.

Insert security and audit rights into the paper

The contract should specify encryption requirements, access controls, logging retention, vulnerability management expectations, and audit rights. You should also include obligations for breach notification, subcontractor management, and cooperation with your own incident response process. In many cases, procurement should insist on the right to review independent security assessments or certifications, especially if the AI service touches personal data or regulated records. Vendors that resist audit language often create downstream surprises when security teams start asking difficult questions.

Make liability fit the actual risk

Standard SaaS contracts often limit liability to a small amount of fees paid, which may be completely out of step with AI-driven harm. If the vendor is processing personal data, assisting decisions, or generating public-facing content, ask legal to assess whether caps, indemnities, and exclusions are adequate. Make sure the warranty language covers authority to process data, compliance with applicable law, and non-infringement where relevant. Procurement should not assume the sales team’s assurances will survive contact with the MSA unless they are written down.

4. Technical due diligence: the questions IT and security teams should ask

How is the model hosted and isolated?

IT teams need clarity on hosting architecture. Is the model hosted in a dedicated tenant, shared environment, or through a public API broker? Are prompts isolated from other customers’ traffic? Are logs encrypted at rest and in transit? If the vendor offers connectors to business tools, ask whether those integrations use scoped permissions and whether tokens are stored securely. The more interconnected the service, the more important it is to understand the blast radius of compromise.

What red-team testing has been done?

Ask for red-team reports, adversarial testing summaries, or safety evaluations that show how the system behaves under prompt injection, jailbreak attempts, data exfiltration prompts, and policy evasion. Mature vendors should be able to explain both the test methodology and the residual issues they have not yet eliminated. If they cannot provide a formal report, ask for evidence of internal abuse testing, external penetration testing, or bug bounty coverage. A vendor that has not tested failure modes is effectively asking your organisation to discover them in production.

How does the system handle prompt and retrieval security?

Many AI failures happen not because the base model is weak, but because the surrounding system is insecure. Retrieval-augmented generation needs controls around source ranking, prompt sanitation, document access, and user permissions. Prompt injection can weaponise external content or untrusted documents, especially where the AI agent has tool access. For organisations building robust automations, the thinking should be similar to the security discipline in vulnerability analysis and to the practical safeguards used in security device placement, where architecture choices shape exposure.

5. A practical scoring model for AI vendor assessment

Use weighted categories, not gut feel

To reduce bias and speed up decisions, score each vendor against the same weighted rubric. A simple 100-point model works well for procurement and IT collaboration. The categories below focus on the factors that most often create operational, legal, and commercial risk. You can adjust the weights for your own sector, but the structure should remain consistent across vendors.

| Category | Weight | What to examine | Scoring guidance |
|---|---|---|---|
| Model provenance | 20 | Base model origin, ownership, lineage, versioning | 20 = fully disclosed; 10 = partial; 0 = opaque |
| Training data controls | 20 | Retention, training opt-out, deletion, sub-processors | 20 = strong controls; 10 = some controls; 0 = unclear |
| Security and red-team evidence | 15 | Testing reports, abuse cases, prompt injection defence | 15 = evidence provided; 7 = limited evidence; 0 = none |
| SLA and support | 15 | Uptime, latency, incidents, response times, credits | 15 = business-aligned SLA; 7 = basic SLA; 0 = weak |
| Compliance posture | 15 | UK GDPR, sector controls, audit rights, DPA terms | 15 = mature compliance; 7 = partial; 0 = weak |
| Vendor stability | 15 | Financial health, funding, roadmap, market position | 15 = stable; 7 = uncertain; 0 = distressed |

Interpret scores with clear thresholds

Once you have the raw score, apply decision thresholds. A score of 85 or above typically indicates a low-risk candidate suitable for broader rollout, assuming your use case is covered by the scope of the due diligence. Scores between 70 and 84 should trigger conditional approval, with specific remediation actions before launch. Anything below 70 should be treated as a high-risk vendor and escalated to legal, security, and senior leadership. If your organisation is comparing vendors in a crowded market, this scoring discipline is similar in spirit to how teams assess product fit in e-commerce tooling or performance in AI planning systems.
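The weighted rubric and decision thresholds above can be sketched in a few lines. This is a minimal illustration, not a standard tool: the category keys, function names, and the example ratings are all assumptions for demonstration; the weights and thresholds come from the rubric and thresholds described in this section.

```python
# Weighted vendor scoring rubric (category weights as per the rubric above).
WEIGHTS = {
    "model_provenance": 20,
    "training_data_controls": 20,
    "security_red_team": 15,
    "sla_support": 15,
    "compliance_posture": 15,
    "vendor_stability": 15,
}

def vendor_score(ratings: dict[str, float]) -> float:
    """Sum category ratings; each rating must fall within 0..weight."""
    for category, score in ratings.items():
        if not 0 <= score <= WEIGHTS[category]:
            raise ValueError(f"{category} score out of range")
    return sum(ratings.values())

def decision(raw_score: float) -> str:
    """Apply the decision thresholds: 85+, 70-84, below 70."""
    if raw_score >= 85:
        return "low risk - approve for broader rollout"
    if raw_score >= 70:
        return "conditional - remediate before launch"
    return "high risk - escalate to legal, security, leadership"

# Illustrative vendor with strong provenance but only a basic SLA.
example = {
    "model_provenance": 20,
    "training_data_controls": 20,
    "security_red_team": 15,
    "sla_support": 7,
    "compliance_posture": 15,
    "vendor_stability": 15,
}
raw = vendor_score(example)  # 92 for this illustrative vendor
print(decision(raw))
```

Keeping the rubric in a shared script or spreadsheet means procurement and IT score against identical definitions, which is what makes cross-vendor comparison defensible.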

Weight the score by use case criticality

Not every AI use case deserves the same level of scrutiny. A marketing copy assistant may tolerate lower operational rigour than a claims triage system or HR chatbot. To account for that, multiply the vendor score by a criticality factor: 1.0 for low-risk internal productivity, 1.2 for customer-facing systems, and 1.5 for regulated or high-impact workflows. This keeps the model simple while reflecting the real-world stakes of the deployment.

6. Financial, market, and dependency risk: the part procurement often misses

Assess the vendor’s runway and concentration risk

Financial instability can become a service problem long before bankruptcy. If the vendor relies heavily on a single hyperscaler, a small number of model licensors, or one dominant customer segment, that concentration creates fragile economics. Ask about runway, funding history, gross margin profile, churn, and the proportion of revenue tied to your own sector. You should also check whether pricing depends on usage patterns that may become unpredictable as your adoption grows.

Review roadmap realism and product dependency

Vendors often sell an ambitious roadmap that assumes future access to cheaper inference, broader model rights, or higher-quality data sources. Procurement should separate what exists today from what is promised later, then contract only for the current capabilities. If the vendor’s differentiation depends on a third-party model provider, ask how quickly they can switch or degrade gracefully if that upstream service changes terms. A useful analogy is found in AI in budget travel, where the product experience can shift dramatically when a single upstream pricing rule changes.

Watch for hidden commercial fragility

Some AI vendors underprice initial pilots and then expose customers to steep consumption-based costs after adoption. Others use aggressive minimum commitments, expensive overage pricing, or paid support tiers that become unavoidable once the system is embedded. Procurement should build a three-scenario cost model: pilot, steady state, and scaled deployment. That model should include not only licence fees, but integration work, monitoring overhead, human review costs, and exit costs.
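The three-scenario cost model above is easy to make concrete. The sketch below is purely illustrative: every figure and line item is a hypothetical placeholder, and the categories simply mirror the cost components named in this section (licence fees, integration work, monitoring overhead, human review, exit costs).

```python
# Three-scenario cost model: pilot, steady state, scaled deployment.
# All amounts are hypothetical placeholders in GBP per year.

def total_cost(licence: int, integration: int, monitoring: int,
               human_review: int, exit_cost: int) -> int:
    """Total cost of ownership beyond the licence fee alone."""
    return licence + integration + monitoring + human_review + exit_cost

scenarios = {
    "pilot":        total_cost(5_000, 10_000, 1_000, 2_000, 0),
    "steady_state": total_cost(40_000, 5_000, 6_000, 12_000, 0),
    "scaled":       total_cost(150_000, 20_000, 15_000, 30_000, 25_000),
}

for name, cost in scenarios.items():
    print(f"{name}: £{cost:,}")
```

Laying the scenarios side by side makes pilot underpricing visible: a vendor whose pilot looks cheap can still dominate the scaled-deployment column once overage pricing and exit costs are included.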

7. Compliance and governance: making vendor risk defensible

Map the use case to regulatory obligations

Before signing, determine whether the AI system processes personal data, special category data, confidential commercial information, or regulated records. Under UK GDPR and related obligations, this can affect lawful basis, DPIA requirements, cross-border transfer controls, retention, and subject rights handling. If the AI vendor supports automated decisions, human review and explainability become central concerns. Procurement is safer when compliance is treated as a design input rather than a final checkbox.

Demand documentation you can actually audit

Vendor assessment should yield artefacts that are useful six months later, not just a one-time sign-off. Keep copies of the DPA, security questionnaire, red-team summary, architecture diagram, subprocessors list, and approved use-case description. Record which internal approvers reviewed the system and which risks were accepted. For organisations that value measurable governance, this approach echoes the reporting rigour in digital audio strategy and high-trust live operations, where documentation shapes confidence.

Plan for human oversight from day one

Even where the vendor is highly rated, your own controls matter. Define who can approve prompts, who can override outputs, when human review is required, and how exceptions are escalated. Document whether users are allowed to paste sensitive data into the system and how that is enforced. Governance fails when responsibility is assumed rather than assigned.

8. A step-by-step due-diligence workflow for procurement and IT

Step 1: triage the use case

Start by classifying the use case as low, medium, or high impact. Ask whether the AI will be customer-facing, whether it will touch personal data, whether it can influence decisions, and whether it has tool access. This triage determines the depth of review needed and keeps procurement cycles focused. A lightweight internal assistant may need only standard controls, while a workflow that writes to CRM or ERP demands full security and legal review.

Step 2: run the vendor questionnaire

Send a standardised questionnaire covering model provenance, data usage, data retention, update policy, SLAs, incident response, subprocessors, red-team testing, and financial stability. Use yes/no answers only as a starting point; insist on evidence, not assertions. Ask for supporting documents in the same package so reviewers can compare vendors quickly. Standardisation reduces the risk of being swayed by polished demos and vague assurances.
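One way to enforce "evidence, not assertions" is to structure each questionnaire item so that a yes answer without a supporting document is automatically flagged for follow-up. The record shape and field names below are illustrative assumptions, not part of any standard questionnaire format.

```python
from dataclasses import dataclass, field

@dataclass
class QuestionnaireItem:
    """One standardised questionnaire item with its supporting evidence."""
    topic: str                  # e.g. "model provenance", "version pinning"
    vendor_answer: bool         # the vendor's yes/no response
    evidence_docs: list[str] = field(default_factory=list)

    def needs_follow_up(self) -> bool:
        # A "yes" with no supporting document is treated as unverified.
        return self.vendor_answer and not self.evidence_docs

items = [
    QuestionnaireItem("model provenance", True, ["model_card.pdf"]),
    QuestionnaireItem("version pinning", True),   # asserted, no evidence
    QuestionnaireItem("red-team testing", False), # honest "no" needs scoring, not chasing
]

follow_ups = [item.topic for item in items if item.needs_follow_up()]
print(follow_ups)
```

Requiring the evidence list in the same package as the answers lets reviewers compare vendors line by line rather than re-reading free-text responses.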

Step 3: score, challenge, and negotiate

Once scored, challenge any category where the score depends on incomplete evidence. If a vendor refuses to disclose model lineage, increase the risk weighting. If they lack version pinning, ask for contractual commitments or pilot-only approval. If their DPA or SLA is weak, make those items gate conditions rather than nice-to-have improvements. The best teams use procurement leverage to shape the final risk profile, not merely to record it.

9. Common red flags and how to respond

Red flag: “We can’t disclose the model”

This usually means the vendor is relying on opaque licensing, unstable upstream dependencies, or a very weak governance posture. Without model provenance, you cannot assess bias, safety, update behaviour, or even whether the vendor has a valid right to commercialise the service. Response: request written disclosure under NDA; if refused, downgrade the vendor materially or reject for regulated use.

Red flag: “We improve the product using customer data by default”

That statement may be acceptable in consumer settings, but it is often unacceptable in enterprise AI procurement. It means your prompts, documents, or outputs could be used for training unless you opt out or negotiate otherwise. Response: require an explicit no-training clause, deletion rights, retention limits, and subprocessor disclosure before pilot approval.

Red flag: “Our SLA is the same as our cloud provider’s”

This is not a real AI SLA. Your vendor is the service you buy, not the infrastructure beneath it. If their contractual obligations stop at pass-through cloud availability, they may not be accountable for model failures, workflow errors, or support gaps. Response: ask for service credits, incident windows, and remediation obligations tied to the product you actually use.

10. Bringing it together: a procurement decision framework that stands up later

Use the checklist to make decisions, not just notes

Due diligence is only valuable if it changes the decision. If a vendor scores well technically but poorly on compliance or financial stability, the board-level answer may still be no. If the score is middling but the use case is low-risk and the vendor agrees to contractual fixes, a controlled pilot may be appropriate. The key is to make the decision traceable, proportional, and repeatable.

Document exceptions and compensating controls

When you accept a risk, write down the reason, the owner, and the compensating control. For example, if a vendor lacks perfect version pinning, you might accept that for a low-risk internal assistant while requiring human review, logging, and monthly regression tests. This creates an auditable trail that supports future reviews, renewals, and incident investigations. It also helps procurement avoid rediscovering the same issues during every contract cycle.

Build renewal reviews into the lifecycle

AI vendor risk is not a one-time exercise. Models evolve, pricing changes, compliance obligations shift, and the vendor’s financial situation can change quickly. Set a renewal review cadence that re-checks model provenance, SLAs, security evidence, and data handling before auto-renewal. If you need a broader operational lens for governance and rollout, our guide on future tech adoption and AI strategy shifts can help your leadership team align vendor choice with business direction.

FAQ: AI Vendor Risk and Due Diligence

1. What is the most important question to ask an AI vendor?

The single most important question is usually: “Can you clearly explain model provenance, training data usage, and update policy?” If the vendor cannot answer that well, most other assurances are less reliable.

2. Do all AI vendors need a red-team report?

For low-risk tools, a brief internal abuse test may be enough. For customer-facing, regulated, or workflow-critical AI, you should ask for a formal red-team or adversarial testing summary.

3. How do we score open-source AI vendors?

Open-source does not automatically mean lower risk. You still need to assess hosting, fine-tuning, data controls, update cadence, support commitments, and the vendor’s right to distribute and support the stack.

4. Should procurement or IT own AI vendor assessment?

Neither should own it alone. Procurement should manage commercial and contractual risk, IT should assess architecture and security, and legal/compliance should validate data and regulatory obligations.

5. What makes an AI vendor unacceptable for enterprise use?

Opaque model provenance, no training data controls, weak or missing SLAs, no incident process, refusal to disclose subprocessors, and unstable financials are common deal-breakers.

6. How often should vendor risk be reviewed?

At minimum, review annually and at every major product or contract change. High-impact deployments should also have quarterly operational reviews and post-incident reassessment.


