
Five lessons on building generative and agentic AI solutions

Surprising takeaways about AI ethical considerations, evaluation metrics, user experience design, and more.

Chatbots were a good beginning—but is it time to rethink them? Discover five lessons we’ve uncovered from building real-world generative and agentic AI solutions. Learn how to manage unpredictable outputs, design a human-centered user experience, stay grounded in ethical considerations, and define evaluation metrics for open-ended, evolving AI systems.

1. Agentic AI solutions should start small

Agentic AI solutions have broad potential, making it challenging to align them with business-specific value. Scope creep and misaligned expectations can derail business strategies. Start with narrowly focused solutions that can be refined over time to build value.

Key considerations for building agentic AI solutions:

  • Define use cases and users early. Establish well-scoped use cases and identify target users from the start to maintain focus, prevent scope creep, and keep the work aligned with the business. 
  • Educate stakeholders. Set and manage realistic expectations about the solution’s applications, value, and limitations. 
  • Prioritize data quality. Accurate and useful outputs depend on well-managed, quality data. Prepare to allocate the necessary resources to data curation. 
  • Fast feature generation does not necessarily reduce production time. A rapid cadence of feature releases doesn’t automatically translate into shorter time to production, greater scale, or more value.

2. Evaluation metrics for agentic AI solutions are open-ended and evolving

Unlike traditional AI, where solutions can be validated against labeled data gathered before model development, agentic AI solutions generally rely on post hoc human feedback during development. Evaluation is particularly challenging because outputs are unstructured and inherently novel (i.e., generated text or images that never previously existed).

Key considerations for AI evaluation metrics:

  • Define KPIs upfront. Establish performance benchmarks early, even if they are approximate. 
  • Establish technical evaluation metrics. Define clear criteria for technical accuracy, relevance, and comprehensiveness, aligned with business requirements to help ensure meaningful and reliable assessments. Use rubric-based scoring, human review, or automated tools for evaluation (a minimal scoring sketch follows this list). 
  • Simulations offer controlled environments for evaluation. Simulated users and environments can exercise agentic AI solutions against benchmarks defined with input from subject matter experts. 
  • Evaluate entire task trajectories. Assess sequential outputs against expectations, including tool use and response chains (see the trajectory sketch after this list). 
  • Implement structured feedback loops. Establish structured feedback loops by conducting user sessions and deploying UI-embedded surveys or rating systems for real-time input. Supplement with semi-automated LLM-generated feedback for scalable evaluation. 
  • Iteratively review and refine. Schedule regular deployment, testing, and improvement cycles, and enhance performance through repeated testing, error analysis, prompt tuning, system adjustments, and user feedback integration.
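
To make rubric-based scoring concrete, the sketch below shows one possible shape it could take in Python. The criteria, weights, scoring heuristics, and review threshold are illustrative assumptions rather than a prescribed implementation; in practice, the placeholder scorers would typically be replaced by comparisons against reference answers, human review, or an LLM-as-judge.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    name: str
    weight: float
    scorer: Callable[[str, str], float]  # (question, response) -> score in [0, 1]

def score_accuracy(question: str, response: str) -> float:
    # Placeholder: a real scorer might compare against reference answers,
    # retrieval sources, or an LLM-as-judge prompt.
    return 1.0 if response.strip() else 0.0

def score_relevance(question: str, response: str) -> float:
    # Placeholder: naive lexical overlap between question and response.
    q_terms = set(question.lower().split())
    r_terms = set(response.lower().split())
    return len(q_terms & r_terms) / max(len(q_terms), 1)

def score_comprehensiveness(question: str, response: str) -> float:
    # Placeholder: treats longer answers as more complete, capped at 1.0.
    return min(len(response.split()) / 100, 1.0)

# Hypothetical rubric aligned with the criteria named above.
RUBRIC = [
    Criterion("accuracy", 0.5, score_accuracy),
    Criterion("relevance", 0.3, score_relevance),
    Criterion("comprehensiveness", 0.2, score_comprehensiveness),
]

def evaluate(question: str, response: str, review_threshold: float = 0.7) -> dict:
    """Return per-criterion scores, a weighted total, and a human-review flag."""
    scores = {c.name: c.scorer(question, response) for c in RUBRIC}
    total = sum(c.weight * scores[c.name] for c in RUBRIC)
    return {"scores": scores, "total": round(total, 2),
            "needs_human_review": total < review_threshold}

if __name__ == "__main__":
    print(evaluate("What drove the Q3 revenue variance?",
                   "Q3 revenue variance was driven by delayed contract renewals."))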
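
Similarly, evaluating a task trajectory can be as simple as comparing the sequence of tool calls an agent actually made against the steps a subject matter expert expects. The tool names and metrics below are hypothetical; the point is to score the whole chain of actions, not just the final answer.

from typing import List

def in_order_matches(expected: List[str], actual: List[str]) -> int:
    """Count how many expected steps appear in the actual trajectory, in order."""
    matched, i = 0, 0
    for step in actual:
        if i < len(expected) and step == expected[i]:
            matched += 1
            i += 1
    return matched

def evaluate_trajectory(expected: List[str], actual: List[str]) -> dict:
    matched = in_order_matches(expected, actual)
    return {
        "step_recall": matched / max(len(expected), 1),   # expected steps completed, in order
        "step_precision": matched / max(len(actual), 1),  # actual steps that were on-plan
        "extra_steps": len(actual) - matched,
    }

if __name__ == "__main__":
    expected = ["search_policy_docs", "extract_clause", "draft_summary"]
    actual = ["search_policy_docs", "search_web", "extract_clause", "draft_summary"]
    print(evaluate_trajectory(expected, actual))
    # {'step_recall': 1.0, 'step_precision': 0.75, 'extra_steps': 1}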

Five lessons from the frontlines

Consider all five lessons we’ve gleaned from building generative and agentic AI solutions, including takeaways on user experience design, surprising challenges, tips for keeping up with the fast pace of change—and an unexpected reflection regarding chatbots.
