Computer, Don't Fail Me

Gleb Bahmutov

Climate Crisis Is Bad

gleb.dev

Adjust your life

Join an organization

Vote

Greenpeace, 350, Sunrise, Citizens' Climate Lobby, 3rd Act, Mothers Out Front

In This Talk

Vibe app coding
Vibe app testing
What can AI even do?
Used car sales
The end

Speaker: Gleb Bahmutov PhD

🦋 bahmutov.bsky.social

gleb.dev

github.com/bahmutov

glebbahmutov.com/blog

C / C++ / C# / Java / CoffeeScript / JavaScript / Node / Angular / Vue / Cycle.js / functional programming / testing

www.youtube.com/glebbahmutov

🌎 🔥 350.org 🌎 🔥 citizensclimatelobby.org 🌎 🔥

https://cypress.tips/courses

Gleb Bahmutov

Sr Director of Engineering

Mercari Does A Lot Of Testing

https://slides.com/bahmutov/decks/mercari

A typical Mercari US Cypress E2E test

image source: https://www.popularmechanics.com/culture/g2759/starship-uss-enterprise-ranked/

Computer, one cup of hot tea

Computer, ????, and three glasses

Computer, one web app!

Now we wait...

Wait some more...

Pretty good result after 90 seconds!

Greenfield development is the best

Q: How does AI know how to answer this?

A: It was trained. A lot.

Bad Training Leads To:

Hallucinations
Weird code
Verbosity

Easy Code Generation Leads To:

Maintenance nightmare

Computer, one end-to-end test!

Hmm, how do I ...

Not a greenfield project

"Computer, browse the acme.co website and write 5 tests."

– Lazy me

NO

test('addition', () => {
  expect(sum(2, 3), '2+3').to.equal(sum(2, 3))
})

This is a bad test.

Do NOT trust your app to be correct

test('addition', () => {
  expect(sum(2, 3), '2+3').to.equal(5)
})

Computer, browse acme.co and check the following behavior:

step A
expect B
step C
expect D

Pick elements to interact with?
Assert the results on the page?
Check other data?

AI needs to "know" your app

Prompt:
- create a "Completes a todo" end-to-end Cypress test
- visit the base url
- confirm the application has finished loading its data
- enter a todo with random text
- confirm the same text is visible in the list of todos

Context:
- the application is hosted at "staging.acme.co"
- the application finishes loading when the "body" element has the class "loaded"
- user can enter new todos using an input element with class "new-todo"
- list of todo items has class "todo-list". Each item has class "todo"
- completed todo items have class "completed"
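
The context and the task steps can be kept as data and joined into one prompt, so the same app context is reused across tasks. A minimal sketch, assuming a hypothetical `buildPrompt` helper (not from the talk) and an abbreviated copy of the lists above:

```javascript
// Hypothetical helper: joins app context and task steps into one prompt.
// The section titles mirror the "Context:" and "Prompt:" slides.
function buildPrompt({ context, steps }) {
  const toList = (items) => items.map((item) => `- ${item}`).join('\n')
  return `Context:\n${toList(context)}\n\nPrompt:\n${toList(steps)}`
}

const prompt = buildPrompt({
  context: [
    'the application is hosted at "staging.acme.co"',
    'the application finishes loading when the "body" element has the class "loaded"',
    'user can enter new todos using an input element with class "new-todo"',
  ],
  steps: [
    'create a "Completes a todo" end-to-end Cypress test',
    'visit the base url',
    'confirm the application has finished loading its data',
  ],
})

console.log(prompt)
```

Keeping the context list in one place means a selector change is fixed once, not in every prompt.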

institutional knowledge (in your head)

Computer, read the source code / design docs and find out!

https://gitingest.com/

Copy the entire repo source code and include it with your prompt...

$ npx repomix path/to/directory
$ npx repomix --remote bahmutov/todo-ai-example

The larger the context, the longer we wait 🕰️ and the more we pay 💰
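
A back-of-the-envelope sketch of why a large context adds up. It assumes a rough heuristic of about 4 characters per token (not a real tokenizer), that every prompt re-sends the full context, and no prompt caching. The names `estimateTokens` and `estimateInputTokens` are hypothetical.

```javascript
// Rough heuristic, not a real tokenizer: ~4 characters per token (assumption).
const CHARS_PER_TOKEN = 4

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN)
}

// Assumes each prompt re-sends the whole context, so the total input grows
// with both the context size and the number of prompts.
function estimateInputTokens(context, prompts) {
  const contextTokens = estimateTokens(context)
  return prompts.reduce(
    (total, prompt) => total + contextTokens + estimateTokens(prompt),
    0,
  )
}

const repoDump = 'x'.repeat(400_000) // stand-in for a packed repository
const prompts = ['write test A', 'write test B', 'write test C', 'write test D']

console.log(estimateInputTokens(repoDump, prompts)) // 400012
```

Under these assumptions the four short prompts contribute 12 tokens in total; the remaining 400,000 come from sending the same context four times.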

Prompt 1: lots of context, "thinking"
Prompt 2: lots of context, "thinking"
Prompt 3: lots of context, "thinking"
Prompt 4: lots of context, "thinking"

Review

Reviewing AI code

Not Ideal

Human code

AI code

[Chart with axes "context" and "time"]

Slow, complex: likely to 🚨
Simple, fast: likely ✅

Inline code completions
Retrieval Augmented Generation

Simple tasks
Picking test tags

Copilot inline suggestions speed up coding 👍👍👍

Comments give Copilot all the context

Inline code completions

Write good comments

They help:

you
your coworkers
AI

Retrieval Augmented Generation

"Build RAG Using Chroma DB" https://glebbahmutov.com/blog/build-rag-using-chroma-db/

Code Generation

Take existing code and comments
Generate more code

Augmented Code Generation

Take existing code and comments
Find relevant high-quality documents
Add found results to the prompt
Generate more code
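
The augmented steps can be sketched in a few lines. This is only an illustration: it ranks documents by shared words, a deliberately simple stand-in for the embedding search a vector database such as Chroma DB would provide. The `examples` list and the function names are hypothetical.

```javascript
// Hypothetical mini knowledge base of vetted, high-quality examples.
const examples = [
  {
    title: 'Confirm the text changes after a click',
    code: [
      "cy.get('#output').invoke('text').then((text) => {",
      "  cy.get('#change').click()",
      "  cy.get('#output').should('not.have.text', text)",
      '})',
    ].join('\n'),
  },
  {
    title: 'Stub a network call',
    code: "cy.intercept('GET', '/api/items', { fixture: 'items.json' })",
  },
]

// Lowercase words only; a naive replacement for real embeddings.
const tokenize = (text) => new Set(text.toLowerCase().match(/[a-z]+/g) ?? [])

// Score = number of query words that also appear in the document.
function score(query, doc) {
  const docWords = tokenize(doc)
  let shared = 0
  for (const word of tokenize(query)) {
    if (docWords.has(word)) shared += 1
  }
  return shared
}

// Step 2: find the most relevant documents for the existing code and comments.
function retrieve(query, docs, topK = 1) {
  return docs
    .map((doc) => ({ doc, score: score(query, `${doc.title} ${doc.code}`) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((entry) => entry.doc)
}

// Step 3: add the found results to the prompt before asking for more code.
function augmentPrompt(existingCode, docs) {
  const snippets = retrieve(existingCode, docs)
    .map((doc) => `// ${doc.title}\n${doc.code}`)
    .join('\n\n')
  return `Relevant examples:\n${snippets}\n\nComplete this code:\n${existingCode}`
}

const existingCode = '// click the button\n// verify the label text has changed'
console.log(augmentPrompt(existingCode, examples))
```

The final step, generating more code, is the model call itself and is left out here.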

Code Generation

it('changes the label after the click', () => {
  cy.visit('/')
  // get the initial label text and store it
  cy.get('#foo')
    .invoke('text')
    .as('initialText')

  // click the button
  cy.get('#bar').click()

  // verify the label text has changed
  cy.get('#foo')
    .invoke('text')
    .then((newText) => {
      cy.get('@initialText').then((initialText) => {
        expect(newText).to.not.equal(initialText)
      })
    })
})

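Subtle timing issue: the second `.invoke('text')` reads the label once, right after the click, and the `expect` inside `.then` does not retry if the label updates a moment later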

Augmented Code Generation

it('changes the label after the click', () => {
  cy.visit('/')
  // get the initial label text and store it
  cy.get('#foo')
    .invoke('text')
    .as('initialText')

  // click the button
  cy.get('#bar').click()

  // verify the label text has changed
})

cy.get('#output')
  .invoke('text')
  .then((text) => {
    cy.get('#change').click()
    cy.get('#output').should('not.have.text', text)
  })

retrieved code example

it('changes the label after the click', () => {
  cy.visit('/')
  // verify the label text has changed
  cy.get('#foo')
    .invoke('text')
    .then((oldText) => {
      // click the button
      cy.get('#bar').click()
      cy.get('#foo').should('not.have.text', oldText)
    })
})

https://glebbahmutov.com/cypress-examples

Training Quality Beats Quantity

(you still need good examples!)

Simple tasks

Triaging a failed Cypress test at Mercari US

Ask an AI agent to fix it via the Slack interface

👍

Likely to succeed
Easy to review
Tested

Picking A Test Tag

Large Language Models

"AI Picks Tests To Run On A Bug"

https://glebbahmutov.com/blog/ai-picks-tests-to-run-on-a-bug/

cypress/e2e/app-spec.js (15 tests)
└─ TodoMVC - React
  ├─ adds 4 todos [@smoke, @add]
  ├─ When page is initially opened
  │ └─ should focus on the todo input field
  ├─ No Todos
  │ └─ should hide #main and #footer [@misc]
  ├─ New Todo [@add]
  │ ├─ should allow me to add todo items
  │ ├─ adds items
  ...

/**
 * These are valid test tags used in our test cases,
 * plus their descriptions
 */
const TEST_TAGS = {
  '@smoke': 'Smoke tests - a small set of tests to check the main features',
  '@misc': 'Miscellaneous unimportant tests',
  '@add': 'Tests related to adding new todo items to the list',
  '@edit': 'Tests related to editing existing todo items in the list',
  '@routing':
    'Tests related to routing between different views and pages in the app',
  ...
}
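
With tags and descriptions in one object, asking a model to pick a tag becomes a small, easy-to-review task. A minimal sketch, assuming hypothetical helpers `buildTagPrompt` and `parseTag` and a shortened copy of the tag list; the model call itself is not shown.

```javascript
// Shortened copy of the tag list above, for illustration only.
const TEST_TAGS = {
  '@smoke': 'Smoke tests - a small set of tests to check the main features',
  '@add': 'Tests related to adding new todo items to the list',
  '@edit': 'Tests related to editing existing todo items in the list',
}

// Hypothetical prompt builder: lists every valid tag with its description.
function buildTagPrompt(tags, bugDescription) {
  const tagList = Object.entries(tags)
    .map(([tag, description]) => `${tag}: ${description}`)
    .join('\n')
  return [
    'Pick the single test tag that best matches the bug below.',
    'Answer with the tag only.',
    '',
    'Tags:',
    tagList,
    '',
    `Bug: ${bugDescription}`,
  ].join('\n')
}

// Guard against hallucinated tags: accept only tags we actually use.
function parseTag(answer, tags) {
  const tag = answer.trim()
  return Object.keys(tags).includes(tag) ? tag : null
}

console.log(buildTagPrompt(TEST_TAGS, 'Cannot change the title of an existing todo'))
console.log(parseTag(' @edit\n', TEST_TAGS)) // @edit
console.log(parseTag('@delete', TEST_TAGS)) // null
```

Validating the answer against the known tags keeps a wrong or invented tag from reaching the test runner.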

[Chart with axes "context" and "time"]

Simple, async, ✅ is optional:
AI code reviews
Code by example
Meaningful abstractions

Simple, fast: likely ✅
Slow, complex: likely to 🚨

When performing a code review:

- confirm that there are no hard-coded magic numbers.
  Prefer using named constants.
- do not allow unreachable code
- check that each HTML element that shows any unique application data,
  like prices, values, names, addresses, etc., has a `data-testid`
  attribute to be used in end-to-end tests. If the attribute is missing,
  add a `data-testid` attribute with a meaningful value.
  Also add `data-testid` attributes to the top-level forms, pages,
  and large components.

copilot-instructions.md

AI code reviews

https://glebbahmutov.com/blog/copilot-pull-request-reviews/

Copilot review can detect page elements without `data-testid` attributes and even suggest good attribute names

Custom "linter" rules

// ANTI-PATTERN: hardcoded wait
cy.wait(45_000)

import { defineConfig } from 'eslint/config'
import pluginCypress from 'eslint-plugin-cypress'
export default defineConfig([
  {
    plugins: {
      cypress: pluginCypress,
    },
    rules: {
      'cypress/no-unnecessary-waiting': 'warn',
    },
  },
])

cypress-io/eslint-plugin-cypress

eslint.config.js

What if I want to warn on waits longer than 30 seconds?!

When performing a code review, if the modified spec file has a `cy.wait(n)` call, suggest replacing it with `cy.wait(seconds(n/1000))`. Also suggest changing it if the duration is longer than 30 seconds.

copilot-instructions.md
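
The instruction mentions a `seconds` helper without showing it. A minimal sketch of what it could look like, plus a plain-code version of the "longer than 30 seconds" check. The slides delegate this check to the Copilot review, so `findLongWaits` is hypothetical, and it uses a regular expression, not a real parser.

```javascript
// Assumed shape of the helper named in the instruction: seconds -> milliseconds.
const seconds = (n) => n * 1000

const MAX_WAIT_MS = seconds(30)

// Finds numeric cy.wait(...) calls in spec source and returns the long ones.
// It does not see cy.wait('@alias') or cy.wait(seconds(45)) calls.
function findLongWaits(source, maxMs = MAX_WAIT_MS) {
  const pattern = /cy\.wait\(\s*([\d_]+)\s*\)/g
  const longWaits = []
  for (const match of source.matchAll(pattern)) {
    // numeric separators like 45_000 are valid JavaScript, strip them first
    const ms = Number(match[1].replace(/_/g, ''))
    if (ms > maxMs) longWaits.push(ms)
  }
  return longWaits
}

const spec = ['cy.wait(45_000)', 'cy.wait(1000)'].join('\n')
console.log(findLongWaits(spec)) // [ 45000 ]
```

With the helper in place, `cy.wait(seconds(45))` states the unit at the call site.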

Simple tasks

Start New Spec File

Meaningful Abstractions

Instead of starting each prompt with:

Look through the entire codebase / app specs and do X

## Use the TodoMVC page object

The preferred way is to use the TodoMVC page object from `cypress/e2e/todomvc.po.js`

```js
import { TodoMVC } from './todomvc.po'
// inside the test or beforeEach hook
TodoMVC.visit()
```

## Reset the backend

A test can reset the backend data to a zero-todos state using the following command

```js
cy.request('POST', '/reset', { todos: [] })
```

## Application loaded

A test can confirm the application has finished loading

```js
cy.get('body.loaded')
```

## Set the backend data

You can set the backend to have specific todos before visiting the app. Let's set 2 todos. Each todo must have an `id`, `title`, and `completed` status.

```js
cy.request('POST', '/reset', {
  todos: [
    { id: '1', title: 'learn testing', completed: false },
    { id: '2', title: 'learn cypress', completed: false },
  ],
})
```

Preferably, use the page object method

```js
import { TodoMVC } from './todomvc.po'
// inside the test or beforeEach hook
TodoMVC.reset([
  { id: '1', title: 'learn testing', completed: false },
  { id: '2', title: 'learn cypress', completed: false },
])
```

copilot-instructions.md
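
The instructions refer to a `TodoMVC` page object, but the file itself is not shown. A minimal sketch of what `cypress/e2e/todomvc.po.js` might contain, built only from the commands listed above. The method bodies are assumptions, and the code is meant to run inside Cypress, where `cy` is a global.

```javascript
// Hypothetical sketch of cypress/e2e/todomvc.po.js.
// In the real file the object would be exported:
//   export const TodoMVC = { ... }
const TodoMVC = {
  // Resets the backend; with no argument it leaves zero todos.
  reset(todos = []) {
    cy.request('POST', '/reset', { todos })
  },

  // Visits the base url and confirms the app has finished loading.
  visit() {
    cy.visit('/')
    cy.get('body.loaded')
  },
}
```

A page object like this gives the AI a small, named vocabulary to reuse, so it does not need to rediscover selectors and endpoints from the whole codebase.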

Tip: when adding a new tool or dependency, update the agent instructions file

https://glebbahmutov.com/blog/npm-install-and-copilot-instructions/

Without AI instructions 👎

With AI instructions ✅

Voice prompt

📝 blog post "Good examples" https://glebbahmutov.com/blog/good-examples/

Sept 14, 2014

📝 blog post "Copilot Instructions Example" https://glebbahmutov.com/blog/copilot-instructions-example/

Oct 9, 2025

examples

Replicator

Small simple steps following a plan

"prompt: assemble Millennium Falcon"

Final Thoughts

AI Codes Problem

[Chart over "time": "app complexity", "understanding how everything works", and "understanding how everything works if AI codes"]

AI Codes Problem

Potential solutions

AI for prototypes and experiments
smaller systems
code-by-example

Will AI Replace us?

https://youtu.be/SpPhm7S9vsQ

Computer, Don't Fail Me

Modern AI coding assistants like GitHub Copilot and Cursor promise easy test automation; just prompt the assistant to write a test, and watch it work. I found the day-to-day experience much different. In this talk, I will show which tasks AI is suitable for, how to collect relevant context for each prompt, and how to guide the AI to code better end-to-end tests. Presented at MaineJS meetup.

Gleb Bahmutov

JavaScript ninja, image processing expert, software quality fanatic

glebbahmutov.com
