Enterprise AI Audit: 27 Questions a Consultant Should Ask Before Proposing Anything

Key Takeaways

  • [10 mistakes that sink AI initiatives in mid‑size companies](/magazine/ai-initiative-mistakes-mid-sized-en)
  • [Decision Rights Map: who decides what in an AI system](/magazine/decision-rights-map-ai-system-governance-en)
  • [Governance vs Compliance: why your policy decides nothing](/magazine/governance-vs-compliance-policy-decision-making-en)

Decision

Decide what governance, ownership or cadence is missing before scaling AI.

Room

Executive committee, AI portfolio review, transformation steering.

Risk

Mistaking activity, pilots and tooling for real operating capability.

If a consultant comes to your company and already has a proposal in the first meeting, they haven’t done an audit. They’ve made a sales presentation with your logo.

A serious AI audit starts with questions. Lots of questions. The answers determine whether the problem you feel is the real problem, and whether the solution you envision is the one you need.

These are the 27 questions that structure a well‑executed audit. If your current consultant hasn’t asked most of them, you should know why not.


1. Strategy and Portfolio (5 questions)

P1: How many AI initiatives are currently active in your company, including those that do not go through IT?

Why it matters: Most companies underestimate their AI sprawl by 40‑60%. Departments activate AI tools without coordination with IT. Before designing any strategy, you need to know how many fronts you have open.

Red flags: If the answer is “a few” or cannot be given in less than 5 minutes, there is a visibility problem that will invalidate any strategy built on that void.


P2: Which of those initiatives have a defined business metric, with an owner and review date?

Why it matters: Initiatives without a business metric are experiments without closure. They can live indefinitely consuming resources without producing evidence of value or failure.

Red flags: If fewer than 50 % have a defined metric, the problem is not technical. It is investment governance. Adding more AI before solving this multiplies the problem instead of solving it.
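
As a rough illustration of how the 50 % check could be run against an inventory, here is a minimal Python sketch. The `Initiative` fields, the sample portfolio and the function names are assumptions made for this example, not a prescribed format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Initiative:
    name: str
    metric: Optional[str] = None        # business metric, e.g. "cost per ticket"
    owner: Optional[str] = None         # single accountable person
    review_date: Optional[date] = None  # next scheduled review


def is_governed(item: Initiative) -> bool:
    """An initiative counts only if metric, owner and review date are all set."""
    return bool(item.metric and item.owner and item.review_date)


def metric_coverage(portfolio: list[Initiative]) -> float:
    """Share of initiatives (0.0 to 1.0) with metric, owner and review date."""
    if not portfolio:
        return 0.0
    return sum(is_governed(i) for i in portfolio) / len(portfolio)


# Hypothetical portfolio, for illustration only.
portfolio = [
    Initiative("support-chatbot", "cost per ticket", "Head of Support", date(2026, 3, 31)),
    Initiative("sales-email-drafts", "reply rate"),  # metric but no owner or date
    Initiative("invoice-ocr"),                       # nothing defined
    Initiative("hr-cv-screening", "time to shortlist", "HR Director", date(2026, 6, 30)),
]

coverage = metric_coverage(portfolio)
needs_governance_fix = coverage < 0.5  # the 50 % red-flag line from the text
print(f"{coverage:.0%} of initiatives have metric, owner and review date")
```

The sketch deliberately treats a metric without an owner or a review date as undefined, since the question asks for all three together.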


P3: When was the last time an AI initiative was closed for not meeting its objectives?

Why it matters: The ability to close is the most reliable indicator that real governance exists. Without documented closures, the portfolio only grows. AI sprawl is not a tools problem: it is a problem of nobody having the mandate to stop.

Red flags: If the answer is “never” or “I don’t remember,” the company has no operational kill criteria. Anything built on that foundation will be fragile.


P4: Is there explicit alignment between active AI initiatives and the business objectives for the quarter?

Why it matters: AI that does not point to a concrete business objective produces output, not value. The difference between output and value becomes visible in quarterly results: that’s where uncomfortable questions usually appear.

Red flags: If the answer requires a 30‑page document to justify, alignment does not exist; there is only a narrative of alignment, which is not the same.


P5: Who in leadership can formally say “this stops” on an AI initiative without needing committee consensus?

Why it matters: The power to stop is the core of AI governance. Without a single accountable person with stop authority, initiatives survive by political inertia, not merit.

Red flags: If the answer is “the digital transformation committee” or “we’d have to discuss it,” there is no governance. There is only a process.


2. Decision Rights and Governance (5 questions)

P6: Is there a map of which decisions an AI system can make without human supervision?

Why it matters: Without that map, the company does not know how much power it has delegated to automated systems. The risk is not hypothetical: it is operational. A decision taken by a system that no one supervised can have legal, financial or reputational consequences.

Red flags: If the map does not exist, autonomy boundaries were defined by default (i.e., nobody defined them). This is especially critical in processes that touch customers, personal data or financial decisions.
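
One possible shape for such a map, sketched in Python under stated assumptions: the three autonomy levels, the decision names and the owners below are hypothetical and only show the idea of making the boundary explicit rather than leaving it to default.

```python
from enum import Enum


class Autonomy(Enum):
    AUTOMATIC = "system decides, no human in the loop"
    HUMAN_APPROVAL = "system proposes, a named human approves"
    HUMAN_ONLY = "system may inform, a human decides"


# Hypothetical decisions and owners, for illustration only.
DECISION_RIGHTS = {
    "route_support_ticket": (Autonomy.AUTOMATIC, "Head of Support"),
    "issue_refund_over_100_eur": (Autonomy.HUMAN_APPROVAL, "Finance Controller"),
    "reject_job_candidate": (Autonomy.HUMAN_ONLY, "HR Director"),
}


def autonomy_for(decision: str) -> Autonomy:
    """Look up the autonomy level; unmapped decisions default to HUMAN_ONLY.

    Defaulting to the most restrictive level means a gap in the map
    never silently grants autonomy to the system.
    """
    level, _owner = DECISION_RIGHTS.get(decision, (Autonomy.HUMAN_ONLY, None))
    return level


def unmapped(decisions_in_production: list[str]) -> list[str]:
    """Decisions the systems actually make that nobody has classified."""
    return [d for d in decisions_in_production if d not in DECISION_RIGHTS]


print(unmapped(["route_support_ticket", "change_customer_price"]))
```

The `unmapped` list is the useful audit output here: every entry is a decision whose autonomy boundary was set by nobody.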


P7: What happens when an AI system makes an incorrect decision? Who is responsible and what is the rollback protocol?

Why it matters: Accountability of AI systems is one of the EU AI Act requirements for high‑risk systems. Beyond compliance, it is operationally necessary: if nobody knows what to do when the system fails, response time multiplies.

Red flags: Vague answers like “we would review it” without a documented protocol. The absence of a rollback protocol for critical systems is an active operational risk.


P8: How often does the executive team review AI portfolio performance? What formats are used?

Why it matters: Governance without an executive review cadence is not governance: it is hope. The frequency and format of reviews determine whether real corrective capacity exists before problems scale.

Red flags: “We review when problems arise” is reactive, not governance. “We have a dashboard” without executive decisions attached is reporting, not governance.


P9: Is there a process to approve new AI initiatives before they go into production?

Why it matters: Without an approval process, the portfolio grows unchecked and resources are allocated by perceived urgency, not strategic criteria. Speed of activation without criteria is the number one cause of AI sprawl.

Red flags: “Any department can activate AI tools if they have budget” is a policy, but not a governance process. The difference is that the process includes impact, risk and reversibility criteria.


P10: Are there documented policies on what data can be used to train or feed internal or external AI systems?

Why it matters: Many companies are using customer, financial or employee data to feed third‑party models without clear permission under contracts, GDPR or the EU AI Act.

Red flags: “I think the contract allows it” without documentation is active legal exposure. If the AI provider uses your data to improve its base model and you lack an opt‑out clause, you have a compliance problem.


3. Data and Observability (5 questions)

P11: Can you trace which data fed a specific decision made by an AI system 30 days ago?

Why it matters: Data traceability is a requirement of the EU AI Act for high‑risk systems and a basic condition for internal audits. Without traceability, you can neither demonstrate that a system worked correctly nor investigate why it failed.

Red flags: If the answer is no, any high‑impact AI system (credit, hiring, pricing, compliance) is operating without the minimum control infrastructure.
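
To make the traceability question concrete, the following Python sketch shows one way a decision could be recorded so that it can be looked up 30 days later. The field names, the `trace_record` helper and the sample values are assumptions for illustration; a real system would also need durable, access-controlled storage.

```python
import hashlib
import json
from datetime import datetime, timezone


def trace_record(decision_id: str, model_version: str, inputs: dict, output: str) -> dict:
    """Build a record linking one decision to the exact inputs behind it."""
    # Serialize with sorted keys so the same inputs always give the same hash.
    payload = json.dumps(inputs, sort_keys=True)
    return {
        "decision_id": decision_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "inputs": inputs,
        "inputs_sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "output": output,
    }


# Hypothetical credit decision, for illustration only.
record = trace_record(
    decision_id="d-0001",
    model_version="scoring-v3",
    inputs={"income_eur": 42000, "months_as_customer": 18},
    output="approve",
)
print(json.dumps(record, indent=2))
```

The hash gives a cheap way to check later that the stored inputs were not altered after the decision was made.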


P12: Are there active monitoring systems that alert when a model’s performance falls below a threshold?

Why it matters: AI models degrade over time if the data context changes (data drift). Without active monitoring, degradation can go unnoticed for weeks or months while the system continues to make wrong decisions.

Red flags: “We review it manually” at variable intervals is not monitoring: it’s luck. The alert threshold must be defined before the system goes into production, not after it has failed.
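
A threshold alert does not need to be sophisticated to be better than manual review. The Python sketch below assumes a daily accuracy score per model, a 7-day window and a threshold of 0.90; all three are illustrative choices, not recommended values.

```python
from statistics import mean


def below_threshold(scores: list[float], threshold: float, window: int = 7) -> bool:
    """True if the mean of the last `window` scores is below the threshold."""
    if len(scores) < window:
        return False  # not enough data yet to judge
    return mean(scores[-window:]) < threshold


# Hypothetical daily accuracy figures showing slow degradation.
daily_accuracy = [0.93, 0.92, 0.93, 0.91, 0.90, 0.88, 0.87, 0.86, 0.85, 0.84]
ALERT_THRESHOLD = 0.90  # agreed before go-live, not after the first failure

alert = below_threshold(daily_accuracy, ALERT_THRESHOLD)
if alert:
    print("ALERT: model performance is below the agreed threshold")
```

Using a rolling mean rather than a single day's score avoids alerting on one bad day while still catching a steady decline like the one in the sample data.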


P13: What percentage of your AI systems’ decisions are explainable to an external auditor or a client?

Why it matters: Explainability is not only a regulatory requirement: it is an operational requirement when someone challenges a decision. If the system cannot explain why it made a decision, the legal and reputational cost of defending it is disproportionate.

Red flags: Black‑box systems in legally exposed areas (credit scoring, employee selection, dynamic pricing) are an active vulnerability, not a future one.


P14: How do you measure the real impact of AI systems on business metrics, beyond usage metrics?

Why it matters: Number of queries processed, response time or adoption rate are usage metrics, not impact metrics. Real impact is measured against business metrics: time reduction, cost savings, margin increase, error reduction.

Red flags: If the answer mainly includes usage metrics, the system may be heavily used but deliver little real value. High adoption and low impact is a common combination in pilots that are scaled prematurely.


P15: Is there a centralized log of incidents caused by incorrect AI system decisions?

Why it matters: Without an incident log, failure patterns are invisible. A single failure looks like an accident. Ten similar failures in six months are a systemic problem. Without the log you cannot distinguish between the two.

Red flags: The absence of an incident log for production AI systems indicates that failures are not documented, which in turn means no learning takes place. It is one of the most reliable indicators of immature governance.
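
The difference between an accident and a pattern can be read straight off a log, as this Python sketch suggests. The `Incident` fields, the sample entries and the cut-off of three occurrences are assumptions for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Incident:
    day: date
    system: str
    failure_type: str


def recurring_failures(log: list[Incident], min_count: int = 3) -> dict[tuple[str, str], int]:
    """Group incidents by (system, failure type) and keep the repeated ones."""
    counts = Counter((i.system, i.failure_type) for i in log)
    return {key: n for key, n in counts.items() if n >= min_count}


# Hypothetical incident log, for illustration only.
log = [
    Incident(date(2026, 1, 12), "pricing-engine", "stale input data"),
    Incident(date(2026, 2, 3), "support-chatbot", "wrong refund policy quoted"),
    Incident(date(2026, 2, 20), "pricing-engine", "stale input data"),
    Incident(date(2026, 4, 8), "pricing-engine", "stale input data"),
]

patterns = recurring_failures(log)
print(patterns)
```

In this sample the chatbot incident stays an isolated event, while the pricing engine shows the same failure three times, which is the kind of signal that is invisible without a central log.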


4. Tech Stack and Vendor Lock‑In (4 questions)

P16: Can you switch the primary AI vendor in less than 3 months without disrupting critical operations?

Why it matters: Vendor lock‑in in AI has costs that are underestimated at adoption and become visible when the provider changes pricing, modifies its model, or disappears. The ability to switch is a direct measure of real strategic flexibility.

Red flags: Dependence on a single vendor for more than 60 % of critical AI initiatives, without an abstraction architecture, is a strategic vulnerability that carries a price when the market moves.


P17: Do you have full visibility of the actual costs of your AI systems (API calls, compute, licenses, internal time)?

Why it matters: The real cost of AI in production is often 2‑3 times the estimated pilot cost. Without full cost visibility, calculated ROI is fictitious and scaling decisions are made on wrong data.

Red flags: “The cost is just the licenses” without including engineering time, error cost, rework, or coordination overhead. If you cannot calculate the total cost of an AI initiative, you cannot calculate its profitability.
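
A back-of-the-envelope calculation makes the gap between license cost and total cost visible. In the Python sketch below every figure, the cost categories and the `MonthlyCost` class are hypothetical; the point is only that internal time belongs in the sum.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCost:
    licenses: float           # EUR
    api_calls: float          # EUR
    compute: float            # EUR
    engineering_hours: float  # internal hours spent operating the system
    rework_hours: float       # internal hours spent correcting AI errors
    hourly_rate: float        # EUR per internal hour

    def total(self) -> float:
        """Total monthly cost including internal time, not only invoices."""
        internal_time = (self.engineering_hours + self.rework_hours) * self.hourly_rate
        return self.licenses + self.api_calls + self.compute + internal_time


# Hypothetical figures for a single initiative.
cost = MonthlyCost(
    licenses=2000, api_calls=1500, compute=500,
    engineering_hours=40, rework_hours=10, hourly_rate=60,
)

total = cost.total()
license_share = cost.licenses / total
print(f"Total: {total:.0f} EUR/month, licenses are {license_share:.0%} of it")
```

With these assumed numbers the licenses are well under a third of the real monthly cost, so an ROI computed on licenses alone would be overstated.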


P18: What happens to your proprietary data and models if you terminate the contract with your primary AI provider?

Why it matters: Many AI platform contracts lack clear clauses on the portability and ownership of data, fine‑tuned models, or embeddings generated with proprietary data. Losing those assets when switching providers is a cost that never appears in the initial budget.

Red flags: Not having reviewed portability and ownership clauses in AI provider contracts is an active legal risk, especially with customer or financial data.


P19: Does your AI architecture allow you to update or replace a base model without redesigning the entire application layer?

Why it matters: Base models evolve quickly. An architecture where the model is tightly coupled to the application requires much more engineering work each time the model changes. The accumulated maintenance cost of that coupling is invisible until an update is needed.

Red flags: If the answer is “we’d have to redo a lot,” future maintenance cost is underestimated. An abstraction architecture (separate layers for application, orchestration and model) is not over‑engineering: it is minimal engineering for systems that will live more than 12 months.
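
The layering described above can be sketched in a few lines of Python. The `TextModel` interface and the two vendor adapters are stand-ins invented for this example; real adapters would wrap each provider's own SDK.

```python
from typing import Protocol


class TextModel(Protocol):
    """The only model interface the application layer is allowed to know."""

    def complete(self, prompt: str) -> str: ...


class VendorAModel:
    """Stand-in adapter; a real one would call vendor A's SDK here."""

    def complete(self, prompt: str) -> str:
        return f"[vendor-a] {prompt}"


class VendorBModel:
    """Stand-in adapter for a second, interchangeable provider."""

    def complete(self, prompt: str) -> str:
        return f"[vendor-b] {prompt}"


def summarize_ticket(model: TextModel, ticket_text: str) -> str:
    """Application code depends on TextModel only, never on a vendor class."""
    return model.complete(f"Summarize: {ticket_text}")


# Swapping the provider is a one-line change at the call site.
print(summarize_ticket(VendorAModel(), "Customer cannot log in"))
print(summarize_ticket(VendorBModel(), "Customer cannot log in"))
```

Because `summarize_ticket` never imports a vendor class, replacing the base model means writing one new adapter rather than touching the application layer.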


5. Talent and Operating Model (4 questions)

P20: Who in your company has explicit responsibility for AI systems delivering the expected results?

Why it matters: Without a clear results owner, responsibility diffuses among the technology provider, IT team, user department and leadership. When something fails, everyone points elsewhere. Diffused ownership is the most frequent pattern behind pilots that never scale.

Red flags: “Shared responsibility” without a single accountable name signals that nobody is responsible. In practice, “shared responsibility” means no one has enough incentive to solve the hard problems.


P21: Have you identified which internal skills are needed to operate the AI systems you already have in production?

Why it matters: Total dependence on the provider to operate production systems is an operational risk. Minimum internal skills (understanding outputs, detecting anomalies, executing basic rollbacks) are needed regardless of who built the system.

Red flags: “The provider manages everything” for critical business systems is an externalization of knowledge that creates operational vulnerability. When the provider fails or leaves, the company knows neither what it has nor how to run it.


P22: How do you integrate AI system evaluation into the performance review processes of the teams that use them?

Why it matters: If teams are evaluated on KPIs that do not capture AI impact on their work, there is no incentive to adopt it seriously or to report failures. Superficial adoption and under‑utilization have the same cause: incentives are not aligned with real use.

Red flags: Performance review processes that include no indicator related to the quality of AI‑assisted decisions signal that the company talks about AI as strategy but does not manage it as an operation.


P23: Is there a process to capture and distribute learnings when an AI initiative fails or yields unexpected results?

Why it matters: Knowledge generated from AI system failures is one of the organization’s most valuable assets and the most frequently wasted. Without a post‑mortem and learning distribution process, each team repeats the same mistakes.

Red flags: “We review it internally and fix it” without distributed documentation means learning stays with one person or one team. When that person leaves or the team changes, the knowledge is lost.


6. Compliance: EU AI Act and GDPR (4 questions)

P24: Have you classified your AI systems according to the EU AI Act risk levels (prohibited, high‑risk, limited, minimal)?

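Why it matters: The EU AI Act has been in force since August 2024, and most obligations for high‑risk systems apply from August 2026. Companies operating AI systems without having performed this classification may be non‑compliant without knowing it. Fines reach up to 35 million EUR or 7 % of global annual turnover for prohibited practices, and up to 15 million EUR or 3 % for breaches of high‑risk obligations.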

Red flags: Not having started the classification at this stage is not just a legal risk: it signals that corporate governance does not include the regulatory framework in its decision cycle. The AI Act is not a 2026 surprise: it was proposed in 2021 and published in the Official Journal in 2024.


P25: Do you have technical and compliance documentation for AI systems you would classify as high‑risk?

Why it matters: The EU AI Act requires specific technical documentation for high‑risk systems: system specifications, training data, risk management measures and conformity records. Creating that documentation retroactively is more costly than building it during development.

Red flags: Production systems that touch personnel selection, credit scoring, essential service access or law enforcement without compliance documentation are active regulatory vulnerabilities.


P26: Do you have a documented legal basis under GDPR for each use of personal data to train or feed your AI systems?

Why it matters: GDPR requires an explicit legal basis for each processing of personal data. Using customer, employee or user data to train or feed AI systems without a legal basis is a breach that can trigger investigations by the AEPD, with fines of up to 20 million EUR or 4 % of global annual turnover.

Red flags: “We have generic consent in the terms of use” is usually insufficient as a legal basis for AI processing. Purposes must be specific, limited and known to the user at the time of collection.


P27: Do you have a process to respond to data subject rights requests (access, rectification, erasure) when the contested decision was made or assisted by an AI system?

Why it matters: GDPR includes the right not to be subject to solely automated decisions with significant effects, and the right to have those decisions reviewed by a human. If you lack a process to respond to these requests within the GDPR deadline (one month from receipt, extendable by two further months for complex requests), you are operationally non‑compliant.

Red flags: Absence of a GDPR rights response protocol for AI decisions is especially critical in sectors like banking, insurance, HR and health, where decisions have direct effects on individuals and request frequency is higher.


Conduct the full audit with your team in 30 minutes

These 27 questions are the skeleton of a serious AI audit. Honest answers reveal exactly where the real problem lies and what type of intervention makes sense: governance, architecture, compliance, talent, or all of the above.

If you want to run a structured audit with your executive team and walk away with a prioritized risk map, we work with companies of 50 to 500 employees in diagnostic sprints ranging from 2,500 to 6,000 EUR depending on scope.

Request the full AI audit for your company.



Translated from the Spanish original with AI assistance and reviewed for accuracy. Read the original in Spanish.

Cite this article

Berthelius, V. (2026). “Enterprise AI Audit: 27 Questions a Consultant Should Ask Before Proposing Anything”. BRTHLS Magazine. https://www.brthls.com/magazine/enterprise-ai-audit-questions-en
