Data Q&A with Dawn Kelso

The big story in data this year is one of self-examination. After two years of generative AI pilots, boardrooms around the globe are asking: why are none of these projects in production?
The key issue is data quality and fragmentation, and the advent of mainstream AI usage has suddenly focused a spotlight on this. Businesses are dealing with data trapped in legacy silos, frequently lacking the structure, metadata, guardrails, and governance that AI tools need to use it effectively.
Put simply: many organisations have rushed to build on top of a foundation that wasn’t yet ready.
We reached out to our Global Data Lead, Dawn Kelso, to delve a bit deeper into this data fragmentation issue and what action organisations can take:
QUESTION 1:
Critical data
Can leaders clearly point to where critical data lives, who can access it, and which (if any) AI systems use it?
This is such an interesting question. Business leaders are often endlessly frustrated that getting the data they need, both rapidly and accurately, remains such a challenge.
This frustration is often driven by a disconnect between perception and reality: a recent Dataversity survey found that 75% of executives believe their employees are data-proficient, yet only 21% of employees feel confident working with data.
Data technology has evolved significantly over the last decade, and yet in-house thinking has often lagged sadly behind. We still see legacy data landscapes with siloed data, slow data processing and loading into data warehouses, antique reporting software, and a lack of data consistency.
In-house data teams are up to their eyeballs in business as usual (BAU), keeping those legacy ETL processes and reports running, and often lack strategic direction from those same business leaders, who change direction rapidly from one new priority to another. Meanwhile, analytics roles are popping up across business areas, either from a positive, proactive, strategic drive or from desperation, in which case ‘shadow data’ initiatives are built, creating legacy issues for the future.
The painful reality is that organisations have spent years chasing a single centralised version of truth, and yet for most businesses this was never what was needed. Contextual truth remains absolutely crucial, but applying domain-specific lenses has changed the way we need to think about that single version of truth.
The approach to data, and its supporting technologies, has changed significantly, and organisations that have strategically adopted that change are the ones seeing real value. They are implementing a Data Mesh approach, using shortcutting to avoid the constant data centralisation challenge, devolving governance and ownership to business data owners, and reaping the benefits of this decentralised approach to data and analytics.
Put simply, organisations with this approach are set up for significantly more AI success than those with poorly governed, low quality, sprawling legacy data landscapes.
QUESTION 2:
Guardrails
Which guardrails let teams use AI in delivery without trading away quality, security or accountability?
Data governance has become an increasingly hot topic over recent years, and it has proven crucial in today’s era of rapid AI adoption. Perhaps unfairly, governance has often been seen as a barrier, yet it is proving essential for trustworthy, secure AI.
Well-implemented governance not only ensures compliance and manages risks, but also boosts adoption, aligns efforts, accelerates innovation, and builds stakeholder trust.
Visibility of data lineage allows an understanding of the data being used to drive AI, with clear data ownership, cataloguing and access control all supporting accountability.
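As a rough illustration of how lineage, ownership and access control fit together, the sketch below models a tiny data catalogue in Python. Everything in it is hypothetical and invented for this example (the DatasetRecord shape, the dataset names, the roles, and the churn_model consumer); real catalogue tools hold far richer metadata, so treat it as a thinking aid rather than a reference design.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """One hypothetical catalogue entry for a dataset."""
    name: str
    owner: str  # accountable business data owner ("" means nobody owns it)
    upstream: list[str] = field(default_factory=list)      # direct lineage
    allowed_roles: set[str] = field(default_factory=set)   # who may read it
    ai_consumers: set[str] = field(default_factory=set)    # AI systems using it


def full_lineage(catalogue: dict[str, DatasetRecord], name: str) -> set[str]:
    """Every dataset that feeds `name`, directly or indirectly."""
    seen: set[str] = set()
    stack = list(catalogue[name].upstream)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in catalogue:  # uncatalogued sources have no known lineage
            stack.extend(catalogue[current].upstream)
    return seen


def can_read(catalogue: dict[str, DatasetRecord], name: str, role: str) -> bool:
    """Simple role check against the dataset's access list."""
    return role in catalogue[name].allowed_roles


def ungoverned_inputs(catalogue: dict[str, DatasetRecord], name: str) -> set[str]:
    """Upstream datasets that are uncatalogued or have no named owner."""
    return {
        d for d in full_lineage(catalogue, name)
        if d not in catalogue or not catalogue[d].owner
    }


def ai_systems_touching(catalogue: dict[str, DatasetRecord], name: str) -> set[str]:
    """AI systems that use `name` directly or via a dataset derived from it."""
    return {
        system
        for record in catalogue.values()
        if record.name == name or name in full_lineage(catalogue, record.name)
        for system in record.ai_consumers
    }


# Illustrative data only: three catalogued datasets and one uncatalogued source.
catalogue = {
    "crm_contacts": DatasetRecord(
        "crm_contacts", owner="Sales Ops", allowed_roles={"analyst"}),
    "billing_export": DatasetRecord(
        "billing_export", owner="", upstream=["legacy_ledger"]),
    "churn_features": DatasetRecord(
        "churn_features", owner="Customer Analytics",
        upstream=["crm_contacts", "billing_export"],
        allowed_roles={"data_scientist"}, ai_consumers={"churn_model"}),
}

if __name__ == "__main__":
    print(sorted(full_lineage(catalogue, "churn_features")))
    print(sorted(ungoverned_inputs(catalogue, "churn_features")))
    print(sorted(ai_systems_touching(catalogue, "crm_contacts")))
```

Even at this toy scale, the catalogue can answer which data feeds an AI system, who is accountable for it, and which inputs have no owner at all.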
Perhaps more visibly than all of this, governance done well massively increases the potential success of AI implementations.
QUESTION 3:
Trade-offs
Which data/workloads are ‘non-negotiably local’, and which cost/capability trade-offs follow from that choice?
The wonderful world of cloud offers boundless elasticity and managed infrastructure.
But that boundlessness comes at a cost. Risks, regulatory obligations, and latency constraints can make on-premises deployment a more viable option. That is something I didn’t think I’d be saying in this decade, and yet here we are. Model distillation, which trains smaller models to reproduce the behaviour of larger ones, is a real thing and can bring real benefits when running models locally.
Understanding which workloads must stay local, and what you sacrifice by choosing local execution, can be essential in determining your AI investment strategy.
- Data sensitivity: training models on proprietary data (financial patterns, client intelligence, product roadmaps) could create liability if datasets leave your infrastructure, not to mention giving competitors access to your secrets.
- Real-time inference at scale: autonomous systems, robotics, and high-frequency trading can require extremely low, in some cases sub-millisecond, latency. Cloud inference can introduce 50-200 millisecond latency penalties through API calls. For speed-critical and safety-critical applications (industrial control, autonomous vehicles), this latency can be unacceptable.
- Regulated AI systems: healthcare AI, financial risk models, and critical infrastructure systems frequently operate under data protection and residency requirements (GDPR, HIPAA, sectoral regulations). Compliance with these regulations can mandate local processing, and fines for violations can be sizable, potentially dwarfing the on-premises infrastructure investment.
- Large Language Models (LLMs) on proprietary data: organisations deploying LLMs on confidential documents (contracts, source code, patient records) often cannot risk cloud exposure. Local model hosting using open-source LLMs or licensed models helps prevent data exfiltration and maintains audit control.
- The cost-capability trade-off matrix: local infrastructure delivers both low latency and control, but at the cost of operational complexity, capital commitment, and ongoing maintenance. Your ability to handle burst workloads reduces, and you will need to both provision for peak loads and accept idle capacity during troughs.
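
To put some numbers on that last point, here is a minimal back-of-the-envelope sketch in Python. It compares capacity provisioned for peak (the local case) with capacity billed only for what is used (the elastic case). The demand profile and both unit rates are invented for illustration and are not real prices, and the model deliberately ignores staffing, data egress, hardware depreciation and everything else a proper business case would include.

```python
def local_cost(hourly_demand: list[float], rate_per_unit_hour: float) -> float:
    """Cost when capacity is sized for the peak and paid for every hour."""
    peak = max(hourly_demand)
    return peak * len(hourly_demand) * rate_per_unit_hour


def elastic_cost(hourly_demand: list[float], rate_per_unit_hour: float) -> float:
    """Cost when only the capacity actually used is billed."""
    return sum(hourly_demand) * rate_per_unit_hour


def utilisation(hourly_demand: list[float]) -> float:
    """Share of peak-sized capacity that is actually used on average."""
    return sum(hourly_demand) / (max(hourly_demand) * len(hourly_demand))


def break_even_utilisation(local_rate: float, elastic_rate: float) -> float:
    """Utilisation above which peak-sized local capacity becomes cheaper."""
    return local_rate / elastic_rate


# Illustrative assumptions only: six hours of demand with one burst,
# and an elastic unit rate 2.5x the local one.
demand = [2, 2, 2, 10, 2, 2]
LOCAL_RATE = 1.0
ELASTIC_RATE = 2.5

if __name__ == "__main__":
    print("local cost:  ", local_cost(demand, LOCAL_RATE))
    print("elastic cost:", elastic_cost(demand, ELASTIC_RATE))
    print("utilisation: ", round(utilisation(demand), 3))
    print("break-even:  ", break_even_utilisation(LOCAL_RATE, ELASTIC_RATE))
```

With these made-up figures the local estate sits idle for roughly two thirds of the time, so the elastic option works out cheaper despite its higher unit rate. A steadier demand profile pushes utilisation above the break-even point and reverses the result, which is why the shape of the workload matters as much as the headline price.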