I think Claude Code can write very good end-to-end tests given the right constructs.
I have been building an Electron-based desktop app that interacts with Anthropic’s AgentSDK and the local file system.
It’s 100% spec-driven and Claude Code has written every line. I do large features rather than small ones (the spec in each issue is around 300 lines of Markdown).
I have had it generate Playwright tests from the start. It was doing okay, but one change made it excel: I created a spec-driven pull request to use data-testid attributes for selectors.
Every new feature adds tests and verifies it hasn’t broken existing features.
I don’t even bother with unit tests. It’s working amazingly well.
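For anyone curious what the data-testid convention looks like in practice, here is a minimal sketch. The element ids (`send-button`, `message-list`) are hypothetical, not from my app; Playwright’s built-in `getByTestId` locator resolves to a `data-testid` attribute selector by default.

```typescript
// Sketch of the data-testid selector convention; ids are illustrative only.
function byTestId(id: string): string {
  // This is the attribute selector Playwright's page.getByTestId(id)
  // targets by default, so markup and copy can change freely
  // without breaking the test suite.
  return `[data-testid="${id}"]`;
}

// In a Playwright spec this might look like:
//   await page.getByTestId('send-button').click();
//   await expect(page.getByTestId('message-list')).toContainText('hello');

console.log(byTestId('send-button')); // [data-testid="send-button"]
```

The payoff is that the agent never has to guess at brittle CSS or text selectors; it just tags elements as it builds them and reuses the same ids in the tests.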
Interesting approach. I have noticed the same issue: AI tools generate a lot of code and unit tests, but real user-flow and edge-case testing often gets skipped. Something that reads the PR context and suggests missing scenarios could actually catch problems earlier.