Rory Sutherland calls this the "doorman fallacy".
We look at a doorman and naively assume their function is to open doors, so if we can replicate that function, we can replace the person. Then we build the automation, fire the doorman, and subsequently discover the doorman was responsible for a multitude of social tasks: taking in the mail, co-ordinating services, helping residents with small errands, and so on. Physically opening and closing doors was actually the least important part of the job.
Similarly, we think the purpose of code review must obviously be for reviewing code until we look deeper and understand the sociological purposes of code review.
Technologists have a bad habit of entering a field they don't know, observing but never talking to the people they're trying to automate, and assuming only the most legible parts of the job are worth grokking. The value a job or function brings is often completely opaque to an outsider; it takes actually talking to people and understanding whole value chains to see where technology can genuinely improve things.
> Similarly, we think the purpose of code review must obviously be for reviewing code until we look deeper and understand the sociological purposes of code review.
The truth of this was made clear to me when everyone started doing asynchronous online "code reviews" where the reviewers are just looking at the code (in theory, anyway).
That alone eliminated the majority of the benefit of code reviews, which came from the author walking through the code and explaining it in real time. It was amazing how many times I saw devs get partway into their explanation and then stop, because the act of explaining surfaced an important insight that would otherwise never have come up.
One thing I enjoy when working on a new feature is to schedule a team call where I walk through the code, detailing the changes or improvements I've made and my thought process, and just generally give better context for the wall of text they're about to review.
Likewise, I enjoy it when colleagues take the time to do the same with their own PRs. It's something I've tried to promote, but everyone seems to be in a default mode of ship-ship-ship, so people aren't eager to have another meeting to attend.
(Ideally, we shouldn't be submitting PRs that are so huge, but that isn't always feasible.)
It would probably be more aptly called the "elevator operator fallacy".
The elevator operator ensures the safety and security of the passengers, makes sure the elevator is properly maintained, and assists the elderly and disabled, along with numerous other important tasks.
People believe the elevator operator is only responsible for operating the actuating lever, and so mistakenly believe they can be automated away, without understanding their full value to the operation of the elevator.
Is the implication that, just as all elevator operators have been replaced and are not missed, the same will happen to all programmers?
That said, we don't have elevator operators anymore, and we get along just fine. I don't miss them.
Love this take
We use AI code review and it is genuinely helpful, but I agree it mostly just makes my own PRs easier to review by pointing out salient issues I would otherwise not really think about.
It is obviously not a replacement for another human looking at your code, and I would not rely on it in safety-critical environments, but it really helps, especially in small teams where time is precious and you ship fast.
My only issue is that I would love a dedicated UI where I get this review BEFORE another human looks at the code, so their feedback isn't drowned out by the AI noise.
Interestingly, the bar for accuracy we demand of machines before we're willing to grant them autonomy is often much higher than the one we set for humans (baseball ABS systems and self-driving cars come to mind).
I can imagine a future where, even if an autonomous AI code reviewer noticeably improved meaningful metrics like DORA's MTTR or deployment frequency, an organization would flip out the very first time the AI made a mistake, even though a human reviewer would have produced 10x as many impactful errors.
This well-thought-out article restricts the notion of code review to pull request assessment (a diff) on an existing code base, perhaps a mostly human-written legacy code base.
As a Java code generation researcher, I would appreciate any constructive comments on the automated review of automated changes to an entirely generated code base, where the live specifications and requirements have been modified in order to generate the code under review.
One nit-pick with the article:
It links to a Forbes article arguing that the best chess is played by humans and computers working together.
This claim is evidently false. AlphaZero played superhuman chess without any human assistance beyond coding its machine-learning algorithm and inputting the rules of chess.
Furthermore, in a timed chess match, the delay and possible misjudgments introduced by a human would detract from the performance of the human-computer team compared with the chess engine alone.
Fascinating - are you imagining a sort of adversarial AI situation, where one LLM bot writes the PRs, and a different one reviews them, leading to an organically improving codebase? Kind of a cool idea.
Pull requests are too far downstream; a legacy artifact, I believe.
An AGI system that self-improves its code will regenerate every component impacted by the enhancement starting from live system design narratives, useful existing components, relevant design patterns, and intermediate development artifacts that are discarded or become stale in human-driven legacy coding.
I see "agents" as mostly bodies of assembled prompts for LLMs of various strengths used at the appropriate time in the pipeline of code development. A code review agent's prompt would not have the task of generating code and thus not need all that particular context, but would look for historically observed 'gotchas' and flag those for automatic repair, and the repair could go all the way back through the artifact chain to the text requirements and specifications.
https://claude.ai/share/1c24e8c7-9d6f-4156-9ed9-0714b0ba6879
AI won't replace humans at the moment, but it does make humans more productive. I've personally seen a significant increase in productivity since integrating AI into my workflow.
I think an AI can help with code review. I see tons of submitted CRs for stuff that flat out doesn't compile or pass tests.
Is that AI, or just actually running CI?
I really like the general take that LLMs scanning PRs are simply "zero-config CI." We already have a great paradigm for this; we don't need to invent a new category. In that light, we can weigh its value more as a fuzzy linter than as a be-all and end-all.
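In that spirit, here is a minimal sketch of what the "fuzzy linter" stance could look like in practice: run the LLM pass as a non-blocking step that prints warnings but never fails the build. The `complete` callable is a placeholder for whatever model you use; nothing here is a real CI product's API.

```python
import sys
from typing import Callable

def fuzzy_lint(diff: str, complete: Callable[[str], str]) -> int:
    """Run an LLM over the diff like a linter: print its findings as
    warnings, but always return 0 so it advises rather than gates."""
    findings = complete(
        "Act as a lint tool. List likely problems in this diff, "
        "one per line, prefixed with 'warning:'.\n\n" + diff
    )
    print(findings, file=sys.stderr)
    return 0  # never block the pipeline on fuzzy output

if __name__ == "__main__":
    # Hypothetical usage: pipe a diff in and supply your own model call.
    fake_model = lambda prompt: "warning: example finding (stub model)"
    sys.exit(fuzzy_lint(sys.stdin.read(), fake_model))
```

Because the step always exits 0, it can sit in an existing pipeline without becoming a gate, which keeps the advisory framing honest.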
You could use both, though: failing CI -> bot -> simple summary of why CI failed. Even better if this happens before you've requested a reviewer.
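A rough sketch of that flow, assuming you have some way to fetch the build log and post a PR comment (both shown here as placeholder callables rather than any real CI or code-host API):

```python
from typing import Callable

def summarize_ci_failure(
    log_text: str,
    complete: Callable[[str], str],
    post_pr_comment: Callable[[str], None],
    max_log_chars: int = 8000,
) -> None:
    """On a failing CI run, summarize the tail of the log and post it
    to the PR so the author sees it before a reviewer is requested."""
    tail = log_text[-max_log_chars:]  # failures usually show up near the end
    summary = complete(
        "Summarize in three bullet points why this CI run failed, "
        "naming the failing step or test:\n\n" + tail
    )
    post_pr_comment("CI failure summary (bot):\n" + summary)
```

Wiring it to run only on failed builds, before a reviewer is assigned, is then just a matter of the pipeline's trigger configuration.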
Also, humans will not replace AI code review, because humans get tired and lazy, and can't be scaled up to cover the need.
If humans can't scale to review, how are they scaling to code? Humans who code should review; otherwise code review is pointless, and everyone should just push to HEAD and revert on failure.