Is the Kirkpatrick Model Still Relevant in the AI Era?
Donald Kirkpatrick developed his four-level training evaluation model in 1959. Sixty-five years later, it remains the most widely used evaluation framework in learning and development (L&D).
But the learning landscape has changed dramatically. AI is transforming both how we deliver training and what we train people on. Does a framework developed for mid-century corporate training still apply?
My answer: yes, but with important adaptations.
The Four Levels, Briefly
For those unfamiliar with Kirkpatrick:
Level 1 - Reaction: How did participants feel about the training? Were they satisfied? Did they find it engaging?
Level 2 - Learning: Did participants actually acquire new knowledge, skills, or attitudes?
Level 3 - Behaviour: Are participants applying what they learned in their work?
Level 4 - Results: Is the organisation seeing business outcomes from the behaviour change?
The model is simple, intuitive, and logically structured. That’s why it’s endured.
The Classic Criticisms
Before addressing AI-specific challenges, it’s worth acknowledging the criticisms that have accumulated over decades:
The levels aren’t necessarily sequential. Behaviour change can occur without positive reaction. Learning can happen without leading to behaviour change. The linear chain of causation is oversimplified.
Level 1 is overused. Most evaluation stops at reaction surveys, which correlate poorly with learning and even less with results. Easy measurement substitutes for useful measurement.
Level 4 attribution is nearly impossible. Isolating training’s contribution to business results from all other factors is methodologically challenging.
The model is descriptive, not prescriptive. It tells you what to measure, not how to improve training.
These criticisms have validity. But none of them is fatal. The model remains useful despite its limitations.
AI-Era Challenges
AI introduces new challenges that the original model didn’t anticipate.
The Pace of Change
Traditional training assumed relatively stable skill requirements. You could train someone on a skill and expect it to remain valuable for years.
AI capabilities evolve weekly. A prompt engineering technique that works today might be obsolete in six months. By the time you’ve measured Level 3 behaviour change, the behaviour might no longer be relevant.
Adaptation: Shorten evaluation cycles. Don’t wait six months for Level 3 assessment—check at four weeks. Focus less on specific techniques and more on adaptive capability.
The Blurring of Learning and Work
Traditional models distinguished between learning time (in training) and work time (applying training). AI tools blur this distinction. People learn while working, experiment on live tasks, and develop skills through daily use.
Adaptation: Don’t limit measurement to formal training. Evaluate the learning that happens when AI skill development is embedded into daily workflows. Assess capability development wherever it occurs.
The Individual Variation in AI Fluency
Traditional training assumed roughly similar starting points. AI training encounters massive variation—from people who’ve never used ChatGPT to people who’ve been experimenting for two years.
Adaptation: Segment evaluation by starting capability. A 20% improvement for someone already proficient is less meaningful than a 20% improvement for someone starting from zero. Measure growth, not just endpoints.
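As a rough sketch of what segmenting by starting capability could look like in practice, here's a minimal Python example. The score bands, participant data, and field names are all illustrative assumptions, not part of any particular assessment tool:

```python
from statistics import mean

# Hypothetical pre/post assessment scores (0-100) for training participants.
participants = [
    {"name": "A", "pre": 5,  "post": 40},   # started near zero
    {"name": "B", "pre": 10, "post": 35},
    {"name": "C", "pre": 70, "post": 85},   # already fairly proficient
    {"name": "D", "pre": 75, "post": 80},
]

def segment(pre_score):
    """Assign an illustrative starting-capability band."""
    if pre_score < 30:
        return "novice"
    if pre_score < 70:
        return "intermediate"
    return "proficient"

# Report growth (post minus pre) per segment rather than a single endpoint average.
by_segment = {}
for p in participants:
    by_segment.setdefault(segment(p["pre"]), []).append(p["post"] - p["pre"])

for band, gains in by_segment.items():
    print(f"{band}: mean growth {mean(gains):.1f} points (n={len(gains)})")
```

The point of the sketch is simply that the report shows growth within each starting band, rather than one blended average that hides who actually moved.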
The Difficulty of Defining Competence
For traditional skills, competence was reasonably well-defined. For AI fluency, we’re still figuring out what “good” looks like. Evaluation criteria are moving targets.
Adaptation: Use criterion-referenced assessment where possible. Define specific outcomes (“can use AI to draft a customer communication in under five minutes”) rather than normative comparisons.
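One way to operationalise criterion-referenced assessment, sketched below with entirely hypothetical criteria and results, is a simple checklist scored pass/fail against each defined outcome rather than against a cohort average:

```python
# Illustrative criterion-referenced checklist for one participant.
# Each criterion is a specific, observable outcome defined in advance.
criteria = {
    "drafts a customer communication with AI in under five minutes": True,
    "identifies at least one factual error in an AI-generated draft": True,
    "explains when a task should not be delegated to AI": False,
}

passed = sum(criteria.values())
print(f"Criteria met: {passed}/{len(criteria)}")

# The judgement is against the defined standard, not against other participants.
if passed == len(criteria):
    print("Meets the defined standard")
else:
    unmet = [c for c, ok in criteria.items() if not ok]
    print("Not yet met:", "; ".join(unmet))
```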
Adapting Each Level
Here’s how I adapt each Kirkpatrick level for AI training evaluation.
Level 1: Reaction
The basic question remains relevant: did participants engage with the training? But supplement it with:
- Confidence ratings for applying AI in specific work contexts
- Assessment of whether content matched current AI capabilities
- Feedback on whether examples felt relevant to participants’ actual work
Be cautious about interpreting high satisfaction. AI training can be entertaining without being useful. Low confidence scores are often more informative than satisfaction scores.
Level 2: Learning
Traditional knowledge tests don’t work well for AI skills. Memorising facts about AI isn’t the point—applying AI effectively is.
Better Level 2 assessments:
- Practical demonstrations with real tasks
- Quality of output from AI-assisted work samples
- Observed problem-solving approaches when using AI tools
- Self-efficacy assessments tied to specific use cases
The assessment should mirror how AI skills will be used in practice.
Level 3: Behaviour
This is often the most valuable level for AI training. Are people actually using these tools in their work?
Measurement approaches:
- Usage analytics from AI platforms (where available and privacy-appropriate)
- Manager observations of work practices
- Self-reported frequency of AI application
- Review of work outputs for evidence of AI assistance
- Follow-up interviews about usage patterns
Watch for the plateau effect. Initial enthusiasm often fades. Assess behaviour at multiple time points.
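A minimal sketch of tracking behaviour across time points and flagging a fade-out might look like the following. The check-in weeks, usage counts, and the 60% drop-off threshold are illustrative assumptions, not recommended values:

```python
# Self-reported AI uses per week, captured at each follow-up check-in (week: count).
checkins = {
    "A": {2: 8, 6: 12, 12: 11},
    "B": {2: 10, 6: 9, 12: 3},   # initial enthusiasm fading
    "C": {2: 2, 6: 5, 12: 6},
}

DROP_THRESHOLD = 0.6  # flag if latest usage falls below 60% of the earlier peak

for person, usage in checkins.items():
    weeks = sorted(usage)
    latest = usage[weeks[-1]]
    peak = max(usage[w] for w in weeks[:-1])
    if peak > 0 and latest < DROP_THRESHOLD * peak:
        print(f"{person}: usage dropped from {peak} to {latest} per week - follow up")
    else:
        print(f"{person}: usage holding at {latest} per week")
```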
Level 4: Results
The connection between AI training and business results is real but difficult to isolate. Focus on:
- Productivity metrics for AI-augmented workflows
- Quality indicators for AI-assisted outputs
- Time savings on specific tasks
- Error rates where AI might reduce mistakes
Use comparison approaches where possible: teams that received training versus those that didn’t, or time periods before and after training.
Be honest about attribution limitations. You can say “results improved after training” without claiming training was the sole cause.
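A before/after comparison of trained and untrained teams can be as simple as the sketch below. The team assignments, the minutes-per-task metric, and all numbers are hypothetical, and the output describes an association, not proof that training was the sole cause:

```python
from statistics import mean

# Average minutes per task, measured before and after the training window.
trained   = {"before": [42, 38, 45, 40], "after": [30, 28, 33, 29]}
untrained = {"before": [41, 39, 44, 43], "after": [39, 37, 41, 40]}

def change(group):
    """Shift in mean task time from before to after (negative = faster)."""
    return mean(group["after"]) - mean(group["before"])

trained_change = change(trained)
untrained_change = change(untrained)

print(f"Trained teams:   {trained_change:+.1f} minutes per task")
print(f"Untrained teams: {untrained_change:+.1f} minutes per task")
# The gap between the two changes is the signal of interest, but other factors
# (tooling updates, workload, seasonality) could still contribute to it.
print(f"Difference in change: {trained_change - untrained_change:+.1f} minutes")
```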
Beyond Kirkpatrick: Additional Considerations
Some AI-era evaluation needs fall outside Kirkpatrick’s framework.
Ethical and Appropriate Use
Did people learn to use AI responsibly? Do they understand limitations? Are they avoiding problematic applications? This is distinct from capability and may require separate assessment.
Adaptability
Did training build adaptive capability, not just current skills? When AI capabilities change, do participants figure out new approaches, or do they need retraining?
Transfer to Novel Situations
AI fluency isn’t about following procedures. It’s about creative application to new situations. Assess whether people can apply principles to scenarios they weren’t explicitly trained on.
A Practical Evaluation Framework
Here’s how I structure AI training evaluation in practice:
Immediately after training:
- Reaction survey with confidence scales
- Practical demonstration of core capabilities
- Questions about planned applications
At two weeks:
- Brief usage check-in
- Troubleshooting support for early barriers
- Peer sharing of early successes
At six weeks:
- Comprehensive behaviour assessment
- Manager observations
- Usage analytics review
- Quality review of AI-assisted work samples
At twelve weeks:
- Results assessment against baseline
- Learning decay analysis
- Recommendations for reinforcement
This provides multiple data points without overwhelming the evaluation effort.
The Bigger Picture
Kirkpatrick’s model has endured because it asks the right fundamental question: did training create value?
That question remains relevant regardless of technology changes. What’s changed is how we answer it—faster cycles, more practical assessment, greater attention to adaptability, and honest acknowledgment of attribution challenges.
The model isn’t obsolete. But it needs thoughtful adaptation for a world where the skills we’re developing evolve as fast as the tools we’re teaching people to use.
The organisations that figure out evaluation in this environment will have significant advantages. They’ll invest in training that works and stop investing in training that doesn’t. That clarity becomes more valuable as the stakes of AI adoption increase.
Kirkpatrick would probably approve. The specifics have changed, but the pursuit of demonstrated training value hasn’t.