Why vendor evaluation matters more than demos
Every AI chatbot vendor can show you an impressive demo. The bot fluently, confidently, and correctly answers the questions it was prepared to answer. What demos do not show you: how the bot behaves when it does not know the answer, how it handles offensive inputs, how quickly the knowledge base can be updated, what accuracy looks like on questions outside the demo set, and what the escalation path is when the bot fails.
These ten questions get past the demo and reveal the system's actual production readiness.
The 10 questions
1. How does the bot handle questions it cannot answer?
The correct answer: the bot acknowledges uncertainty, declines to guess, and escalates to a human channel. Any answer that involves the bot making up a plausible-sounding response to an out-of-scope question should be disqualifying for any customer-facing deployment.
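One concrete way to probe this during a trial: check whether the system has an explicit low-confidence path that declines and escalates rather than generating. A minimal sketch of that behaviour, with entirely hypothetical names and a threshold chosen only for illustration:

```python
# Sketch of the behaviour to look for: decline and escalate when
# retrieval confidence is too low, instead of guessing.
# All names and the threshold are assumptions, not any vendor's API.

CONFIDENCE_THRESHOLD = 0.55  # tuned per deployment in practice

def answer_or_escalate(question, retrieved, generate, escalate):
    """retrieved: list of (doc, score); generate/escalate: callables."""
    best_score = max((score for _, score in retrieved), default=0.0)
    if best_score < CONFIDENCE_THRESHOLD:
        # Acknowledge uncertainty and hand off, rather than fabricating.
        escalate(question)
        return "I'm not sure about that one. I've passed it to our support team."
    return generate(question, [doc for doc, _ in retrieved])
```

In an evaluation, feed the bot questions that are deliberately outside its knowledge base and confirm the decline-and-escalate branch actually fires.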
2. How are knowledge base updates deployed?
The correct answer: within minutes, with no developer involvement required. If updates require a development cycle, a support ticket, or manual reindexing that takes hours, your knowledge base will become stale and your chatbot's accuracy will degrade over time.
3. Can you see exactly what context was used to generate each response?
The correct answer: yes, for every response. Source attribution is essential for trust and for quality assurance. If you cannot see what documents the bot retrieved to answer a question, you cannot diagnose incorrect answers or identify knowledge gaps.
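What "seeing the context" should look like in practice: each response record carries the retrieved documents and their relevance scores. The field names below are illustrative assumptions, not a specific product's schema:

```python
# Hypothetical per-response attribution record; field names are assumptions.
response = {
    "answer": "Refunds are processed within 30 days.",
    "sources": [
        {"doc": "refund-policy.md", "chunk": 4, "score": 0.87},
        {"doc": "faq.md", "chunk": 12, "score": 0.61},
    ],
}

# With a record like this, a wrong answer can be traced back to a stale
# or mis-retrieved document rather than guessed at.
for s in response["sources"]:
    print(f'{s["doc"]}#chunk{s["chunk"]} (score {s["score"]:.2f})')
```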
4. What is the answer accuracy on your specific content?
The correct answer: a number derived from testing on your actual content, not the vendor's benchmark dataset. Request a proof-of-concept evaluation on a sample of your documentation. Measure accuracy against a set of questions with known correct answers.
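A proof-of-concept evaluation can be as simple as a gold question set and a scoring loop. The sketch below uses exact-match scoring to stay short; real evaluations usually need fuzzier matching or human grading, and every name here is hypothetical:

```python
# Minimal evaluation harness sketch: run a gold set of (question,
# expected answer) pairs through the bot and report accuracy.
# Exact-match scoring is a simplification; names are assumptions.

def evaluate(bot, gold_set):
    """gold_set: list of (question, expected_answer); bot: callable."""
    correct = sum(
        1 for question, expected in gold_set
        if bot(question).strip().lower() == expected.strip().lower()
    )
    return correct / len(gold_set)

# Usage with a stub bot that happens to know every answer:
gold = [("What is the refund window?", "30 days"),
        ("Do you ship to the EU?", "Yes")]
lookup = {q: a for q, a in gold}
print(f"accuracy: {evaluate(lambda q: lookup.get(q, ''), gold):.0%}")
```

The important part is that the gold set is drawn from your documentation and your real user questions, not the vendor's benchmark.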
5. How is personally identifiable information handled?
The correct answer: PII is identified, minimised, and handled in compliance with GDPR (and any other applicable regulations). Conversation logs should be encrypted, retained only as long as necessary, and retrievable in response to subject access requests. If the vendor cannot answer this question in detail, do not deploy their product for customer-facing use.
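Minimisation in particular is easy to verify: ask the vendor to show PII being scrubbed before logs are stored. As a naive illustration only (real deployments need a proper PII-detection service, not a single regex):

```python
import re

# Naive sketch of PII minimisation before log storage. One regex for
# email addresses only; production systems need dedicated PII detection.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(text):
    return EMAIL.sub("[EMAIL]", text)

print(redact("Contact me at jane.doe@example.com please"))
# → Contact me at [EMAIL] please
```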
6. What authentication and access control does the system support?
The correct answer: SSO via SAML or OAuth, role-based access to the admin interface, and optionally, user-context-aware retrieval that respects document-level permissions. A bot that shows confidential HR documents to all employees regardless of role is a liability.
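Permission-aware retrieval is worth testing directly: documents a user cannot read should be filtered out before they ever reach the model's context. A minimal sketch of that filter, with an illustrative document schema that is an assumption, not any product's API:

```python
# Sketch of document-level permission filtering on retrieval results.
# The "allowed_roles" schema is an assumption for illustration.

def filter_by_permissions(docs, user_roles):
    """Drop any document the requesting user's roles cannot read."""
    return [d for d in docs if d["allowed_roles"] & user_roles]

docs = [
    {"id": "handbook", "allowed_roles": {"employee", "hr"}},
    {"id": "salary-bands", "allowed_roles": {"hr"}},
]
visible = filter_by_permissions(docs, {"employee"})
# Only "handbook" survives; "salary-bands" never enters the prompt.
```

To test this during evaluation, log in as a low-privilege user and ask questions whose answers live only in restricted documents; the bot should decline, not leak.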
7. How are human escalations handled?
The correct answer: a configurable escalation path that routes to your preferred channel (live chat, email, ticketing system), with conversation context preserved so the human agent does not have to ask the user to repeat themselves.
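The property to verify is that the full transcript travels with the handoff. A sketch of what such an escalation payload might contain; the field names are assumptions for illustration:

```python
import json

# Illustrative escalation payload: the whole conversation accompanies
# the handoff so the agent never asks the user to repeat themselves.
# Field names and channel values are assumptions.

def build_escalation(conversation, channel="live_chat"):
    return {
        "channel": channel,
        "transcript": conversation,  # every prior turn, verbatim
        "last_user_message": next(
            (t["text"] for t in reversed(conversation) if t["role"] == "user"),
            "",
        ),
    }

payload = build_escalation([
    {"role": "user", "text": "My invoice total looks wrong."},
    {"role": "bot", "text": "I couldn't resolve that, so I'm escalating."},
])
print(json.dumps(payload, indent=2))
```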
8. What analytics and reporting does the platform provide?
The correct answer: at minimum, query volume, answer confidence distribution, escalation rate, and top unanswered questions. The top-unanswered-questions report is particularly valuable — it tells you exactly where your knowledge base has gaps.
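All four of these metrics fall out of a flat query log, which is a useful sanity check during evaluation: if the vendor cannot export such a log, they probably cannot report on it either. A sketch under an assumed log schema:

```python
from collections import Counter

# Sketch of the minimum reporting described above, computed from a
# flat query log. The log schema is an assumption for illustration.
log = [
    {"q": "reset password", "confidence": 0.91, "escalated": False},
    {"q": "cancel plan",    "confidence": 0.32, "escalated": True},
    {"q": "cancel plan",    "confidence": 0.35, "escalated": True},
]

volume = len(log)
escalation_rate = sum(e["escalated"] for e in log) / volume
unanswered = Counter(e["q"] for e in log if e["escalated"])

print(volume, f"{escalation_rate:.0%}", unanswered.most_common(1))
# The top-unanswered list points directly at knowledge-base gaps.
```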
9. What is the deployment model?
The correct answer: a deployment model that matches your security and compliance requirements. Cloud-hosted SaaS is fine for most applications. Regulated industries (finance, healthcare, legal) may require private cloud or on-premises deployment. Confirm the vendor can meet your requirements before investing in a deep evaluation.
10. What does the contract say about model changes?
The correct answer: you should have at least 60 days' notice before any change to the underlying model that could affect response quality, and the right to evaluate the change before it is deployed to your production instance. LLM providers update models frequently, and updates that improve general capability can sometimes degrade performance on specific domains.