Gerard Sans

Google Developer Expert

Developer Evangelist

International Speaker

Spoke 210 times in 43 countries

Founder NextAI London

Founder Axiom Masterclass

Stochastic parrot or AGI?

What is a distribution?

Algorithmic Bias

Pick a number between 1 and 10 without saying it out loud.

Raise your hand if you picked the number

7

10

Seven is picked up to 10 times more often!

x10
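
The show-of-hands experiment can be sketched numerically. The weights below are invented purely for illustration (they are not data from the talk); the sketch only shows how a skewed distribution makes one answer far more likely than another, compared with a uniform pick. The helper name `pick_ratio` is hypothetical.

```python
import random
from collections import Counter

def pick_ratio(picks, a, b):
    """Return how many times more often `a` was picked than `b`."""
    counts = Counter(picks)
    if counts[b] == 0:
        raise ValueError(f"{b} was never picked, so the ratio is undefined")
    return counts[a] / counts[b]

# HYPOTHETICAL weights for the numbers 1..10: 7 is favoured, 10 is rare.
# These are illustrative assumptions, not measurements.
numbers = list(range(1, 11))
weights = [3, 4, 10, 8, 9, 8, 30, 10, 5, 3]

rng = random.Random(42)  # fixed seed so the run is repeatable
biased = rng.choices(numbers, weights=weights, k=10_000)
uniform = rng.choices(numbers, k=10_000)

print("biased  7 vs 10:", round(pick_ratio(biased, 7, 10), 1))
print("uniform 7 vs 10:", round(pick_ratio(uniform, 7, 10), 1))
```

A model trained on human-written text inherits whatever skew the humans had, which is why the same question asked of a model may not return a uniform spread either.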


The Ghost of Out-Of-Distribution

10:10

9:15

Final Boss: Context Contamination

Messi

Ronaldo

Lisbon

Distribution Drift

Original Prompt

New Prompt

What is an AI Agent?

Access to Tools and APIs

Python Sandbox

Google Search

Function Calling

Multi-Agent Systems

Agents: Task Creation Agent, Prioritisation Agent, Context Agent, Execution Agent

Other components: User, Goal/Rules, Task Queue, Task, Memory, Tools

1. Provide objective

2. Add new tasks

3. Query context

4. Complete task

5. Store task/result

6. Update tasks
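
The six numbered steps of the multi-agent diagram can be sketched as a task-queue loop. This is a minimal sketch under assumptions: the function names (`run_agent_loop`, `create_tasks`, `prioritise`, `execute`) are hypothetical, the mapping of steps to agents is a guess from the diagram labels, and plain Python functions stand in for what would be model calls in a real system.

```python
from collections import deque

def run_agent_loop(objective, create_tasks, prioritise, execute, max_steps=10):
    """Run a task-queue loop until the queue is empty or max_steps is reached."""
    memory = []  # stored (task, result) pairs
    # Steps 1-2: the objective is turned into the first tasks.
    queue = deque(create_tasks(objective, None, None))
    steps = 0
    while queue and steps < max_steps:
        task = queue.popleft()
        context = [result for _, result in memory]   # step 3: query context
        result = execute(task, context)              # step 4: complete task
        memory.append((task, result))                # step 5: store task/result
        new_tasks = create_tasks(objective, task, result)
        queue = deque(prioritise(list(queue) + new_tasks))  # step 6: update tasks
        steps += 1
    return memory

# Stand-in "agents" (assumptions for illustration only).
def create_tasks(objective, last_task, last_result):
    if last_task is None:
        return [f"research: {objective}", f"summarise: {objective}"]
    return []  # this toy version creates no follow-up tasks

def prioritise(tasks):
    return sorted(tasks)

def execute(task, context):
    return f"done({task}) with {len(context)} prior results"

memory = run_agent_loop("open a robot cafe", create_tasks, prioritise, execute)
for task, result in memory:
    print(task, "->", result)
```

The `max_steps` cap is a design choice worth keeping in any real version: agents that create their own tasks can otherwise loop indefinitely.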

 

Gemini Deep Research

Gemini Live in Action

Talk

30 Natural Voices

+24 Languages

2-15min

Show

Video

Attachments

Ask

Code Execution

Function Calling

Web Search

AI Voice

Agents

AI Voice Agents Use Cases

Real Time: Native Audio, 200-600 ms, 30 Voices

Interactive: Half-Cascade, 500-800 ms, 8 Voices

On Demand: Text-To-Speech, few seconds, 30 Voices

Tools: Google Search

Run query in Google Search

Tools: Function Calling
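
Function calling can be sketched without any SDK: the application declares a function (name, description, JSON-Schema parameters), the model replies with a structured call, and the application runs it and returns the result. Everything below is an assumption for illustration: the tool `get_menu_item_price`, the in-memory menu, and the shape of the call dictionary are hypothetical and do not reproduce any specific SDK's wire format.

```python
import json

def get_menu_item_price(item: str) -> dict:
    """Hypothetical tool: look up a price in an in-memory menu."""
    menu = {"espresso": 2.5, "latte": 3.5}
    if item not in menu:
        return {"error": f"unknown item: {item}"}
    return {"item": item, "price": menu[item]}

# Declaration the model would be given: name, description, parameters.
TOOL_DECLARATIONS = [{
    "name": "get_menu_item_price",
    "description": "Return the price of a menu item.",
    "parameters": {
        "type": "object",
        "properties": {"item": {"type": "string"}},
        "required": ["item"],
    },
}]

# Maps declared names to the Python callables that implement them.
TOOL_REGISTRY = {"get_menu_item_price": get_menu_item_price}

def dispatch(function_call: dict) -> dict:
    """Run the function the model asked for and wrap the result."""
    name = function_call["name"]
    args = function_call.get("args", {})
    if name not in TOOL_REGISTRY:
        return {"name": name, "response": {"error": "unknown function"}}
    return {"name": name, "response": TOOL_REGISTRY[name](**args)}

# Simulated model output; a real model would emit this structure itself.
model_call = {"name": "get_menu_item_price", "args": {"item": "latte"}}
print(json.dumps(dispatch(model_call)))
```

The model never executes anything itself; it only names a function and its arguments, and the application decides whether and how to run it.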

Function Calling vs MCP

Model 2

Model 1

Model 2

Model 1
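
One way to read the comparison: with function calling, tool declarations are wired into each model's request, while with MCP the tools sit behind a server that speaks JSON-RPC 2.0, so any client can discover them with `tools/list` and invoke them with `tools/call`. The sketch below simulates only the message shapes in-process. It is not a real MCP server: there is no transport or initialization handshake, and the `place_order` tool and `handle_request` helper are hypothetical.

```python
import json

# Hypothetical tool table for a robot cafe.
TOOLS = {
    "place_order": {
        "description": "Place a coffee order (hypothetical tool).",
        "inputSchema": {
            "type": "object",
            "properties": {"drink": {"type": "string"}},
            "required": ["drink"],
        },
        "handler": lambda args: f"Order placed: {args['drink']}",
    }
}

def handle_request(request: dict) -> dict:
    """Answer a JSON-RPC 2.0 request in the style of an MCP server."""
    method, req_id = request["method"], request["id"]
    if method == "tools/list":
        tools = [
            {"name": name, "description": tool["description"],
             "inputSchema": tool["inputSchema"]}
            for name, tool in TOOLS.items()
        ]
        return {"jsonrpc": "2.0", "id": req_id, "result": {"tools": tools}}
    if method == "tools/call":
        params = request["params"]
        tool = TOOLS.get(params["name"])
        if tool is None:
            return {"jsonrpc": "2.0", "id": req_id,
                    "error": {"code": -32602, "message": "Unknown tool"}}
        text = tool["handler"](params.get("arguments", {}))
        return {"jsonrpc": "2.0", "id": req_id,
                "result": {"content": [{"type": "text", "text": text}]}}
    return {"jsonrpc": "2.0", "id": req_id,
            "error": {"code": -32601, "message": "Method not found"}}

listing = handle_request({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
call = handle_request({
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": "place_order", "arguments": {"drink": "flat white"}},
})
print(json.dumps(call["result"]))
```

Because discovery happens at runtime, the same server can serve Model 1 and Model 2 without either one carrying its own copy of the tool declarations.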

Demo: AI Voice Agent using MCP

Voice-First Use Cases

24/7 Bookings using AI Voice Agents

Robot Cafe

Scan to access Slides.

Useful Links and Materials

Building an AI Voice Agent to Automate a Robot Cafe with Google Gemini Live and MCP

By Gerard Sans


The launch of Alexa+ has sparked renewed excitement around the next generation of AI voice assistants powered by generative AI. With Gemini 2.5, the new Gemini Live API, and MCP, developers now have the tools to build voice-driven AI agents that integrate with web applications, backend services, and third-party APIs.

In this talk we will go beyond simple chatbot interactions to explore how AI agents can power real-world automation: in this case, running an entire robot cafe. We’ll walk through building a voice-first assistant capable of executing complex workflows using MCP, streaming real-time audio, querying databases, and interacting with external services. This marks a shift from "ask and respond" to a more dynamic "talk, show, and act" experience.

You might assume taking a coffee order is straightforward, but even a basic interaction involves more than 15 distinct states. These include greeting the customer, handling the order flow, confirming selections, applying offer codes, managing exceptions, and supporting cancellations or changes. Behind the scenes, the AI agent uses MCP to coordinate with multiple systems to fetch menu data, validate inputs, and trigger robotic actions.

You’ll learn how to stream microphone data, integrate with Gemini voice responses, and use the GenAI SDK to connect everything together using MCP. Instead of a traditional chat UI, this project creates a fully voice-automated, hands-free experience where the assistant doesn’t just chat; it runs the operation. Join us for a deep dive into the future of AI automation using MCP, where natural voice is the interface and the AI agent takes care of the rest, including your fancy choice of coffee!
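
The order flow described in the abstract can be sketched as a small state machine. This is a reduced illustration under assumptions: the abstract mentions more than 15 states, while the sketch models only seven, and the state names, event names, and transitions are hypothetical rather than taken from the talk's implementation.

```python
from enum import Enum, auto

class State(Enum):
    GREETING = auto()
    TAKING_ORDER = auto()
    CONFIRMING = auto()
    APPLYING_OFFER = auto()
    PREPARING = auto()
    CANCELLED = auto()
    DONE = auto()

# (current state, event) -> next state. Event names are assumptions.
TRANSITIONS = {
    (State.GREETING, "greeted"): State.TAKING_ORDER,
    (State.TAKING_ORDER, "item_added"): State.CONFIRMING,
    (State.CONFIRMING, "change_requested"): State.TAKING_ORDER,
    (State.CONFIRMING, "offer_code"): State.APPLYING_OFFER,
    (State.APPLYING_OFFER, "offer_applied"): State.CONFIRMING,
    (State.APPLYING_OFFER, "offer_rejected"): State.CONFIRMING,
    (State.CONFIRMING, "confirmed"): State.PREPARING,
    (State.PREPARING, "served"): State.DONE,
}

# States from which the customer may still cancel.
CANCELLABLE = {State.GREETING, State.TAKING_ORDER,
               State.CONFIRMING, State.APPLYING_OFFER}

def next_state(state, event):
    """Return the state reached from `state` on `event`."""
    if event == "cancel" and state in CANCELLABLE:
        return State.CANCELLED
    # Unknown or out-of-order events leave the state unchanged.
    return TRANSITIONS.get((state, event), state)

def run(events, state=State.GREETING):
    """Feed a sequence of events through the machine."""
    for event in events:
        state = next_state(state, event)
    return state

print(run(["greeted", "item_added", "offer_code",
           "offer_applied", "confirmed", "served"]))
```

Keeping the transitions in an explicit table means the voice agent's tool calls can be validated against the current state, so an out-of-order request (for example, serving before confirming) is ignored rather than acted on.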
