This is a problem everywhere now, and not just in code. It now takes zero effort to produce something, whether code or a work plan or “deep research” and then lob it over the fence, expecting people to review and act upon it.
It’s an extension of the asymmetric bullshit principle IMO, and I think now all workplaces / projects need norms about this.
This problem statement was actually where the idea for Proof of Work (aka mining) in bitcoin came from. It evolved out of the idea of requiring a computational proof of work for sending an email via cypherpunk remailers as a way of fighting spam. The idea being only a legitimate or determined sender would put in the "proof of work" to use the remailer.
I wonder how it would look if open source projects required $5 to submit a PR or ticket and then paid out a bounty to the successful or at least reasonable PRs. Essentially a "paid proof of legitimacy".
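For anyone who hasn't seen it, the hashcash-style scheme behind that history is tiny: the sender grinds a nonce until a hash of the message plus nonce has enough leading zero bits, and the receiver verifies it with a single hash. A minimal sketch in Python, with the difficulty and stamp format invented for illustration:

    import hashlib
    from itertools import count

    DIFFICULTY_BITS = 20  # illustrative: ~1M hash attempts on average to mint a stamp

    def mint_stamp(message: str) -> int:
        """Grind a nonce until sha256(message:nonce) has DIFFICULTY_BITS leading zero bits."""
        for nonce in count():
            digest = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
            if int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0:
                return nonce

    def verify_stamp(message: str, nonce: int) -> bool:
        # Verification is a single hash: cheap for the receiver, costly for the sender to mint.
        digest = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
        return int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0

The asymmetry is the whole point, and it's the same asymmetry the "$5 to submit a PR" idea tries to buy with money instead of compute.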
It feels like reputation / identity are about to become far more critical in determining whether your contribution, of whatever form, even gets considered.
> expecting people to review and act upon it.
But why should this expectation be honored? If someone spends close to zero effort generating a piece of code and lobs it over the fence to me, why would I even look at it? Particularly if it doesn't even meet the requirements for a pull request (which is what it seems like the article is talking about)?
My music/YouTube algorithms are ruined, because when I flag that I don't like the 100 AI songs/videos presented to me each day, the algorithms take it as me no longer liking those genres. Between my downrating AI music and AI history videos, YouTube now gives me about half a page of recommendations and then gives up. I'm punished, and my experience is worse, because YouTube is fine with hosting so much AI slop and I chose to downrate it and try to curate it out of my feed. The way YouTube works today, it punishes you (or tries to train you not to do it) for flagging 'don't recommend channel' when recommended a channel of AI slop. Flag AI and YouTube will degrade your recommendations.
> This is a problem everywhere now, and not just in code. It now takes zero effort to produce something, whether code or a work plan or “deep research” and then lob it over the fence, expecting people to review and act upon it.
Where is the problem? If I don't have the time to review a PR, I simply reject it. Or if I am flooded with PRs, I only take those from people whose PRs I know to be of high quality. In other words, your assumption "expecting people to review and act upon it" is wrong.

I would also bet that for the kind of code I voluntarily write in my free time, using an LLM to generate lots of code is much less helpful, because I use such private projects to try out novel things that are typically not "digested stuff from the internet".

So the central problem I see is rather the license uncertainty around AI-generated code.
I think people are starting to realize what the “end of work” is going to look like and they don’t like it
Anyone else feel like we're cresting the LLM coding hype curve?
Like a recognition that there's value there, but we're passing the frothing-at-the-mouth stage of replacing all software engineers?
My opinion swings between hype and hate every day. Yesterday all suggestions / edits / answers were hallucinated garbage, and I was ready to remove the Copilot plugin altogether. Today I was stuck on a really annoying problem for hours and hours. For shits and giggles I just gave Claude a stack trace and a description and let it go ham. It produced an amazingly accurate train of thought and found my issue, which was not what I was expecting at all.
I still don't see how it's useful for generating features and codebases, but as a rubber ducky it ain't half bad.
I've been skeptical about LLMs being able to replace humans in their current state (which has gotten marginally better in the last 18 months), but let us not forget that GPT-3.5 (the first truly useful LLM) was only 3 years ago. We aren't even 10 years out from the initial papers about GPTs.
Well, when MS gives OpenAI free use of their servers and OpenAI calls it a $10 billion investment, then OpenAI uses up their tokens and MS books $10 billion in revenue, I think so, yes.
I feel like we need a different programming paradigm that's more suited to LLMs' strengths, one that enables a new kind of application, i.e., an application that's more analog, with higher tolerance for different kinds of user input.

A different way to say it: imagine if programming a computer were more like training a child or a teenager to perform a task that requires a lot of human interaction, and that interaction requires presenting data / making drawings.
When people talk about the “AI bubble popping” this is what they mean. It is clear that AI will remain useful, but the “singularity is nigh” hype is faltering and the company valuations based on perpetual exponential improvement are just not realistic. Worse, the marginal improvements are coming at ever higher resource requirements with each generation, which puts a soft cap on how good an AI can be and still be economical to run.
Maybe, maybe not, it’s hard to tell from articles like this from OSS projects what is generally going on, especially with corporate work. There is no such rhetoric at $job, but also, the massive AI investment seemingly has yet to shift the needle. If it doesn’t they’ll likely fire a bunch of people again and continue.
It's been less than a year and agents have gone from patently useless to very useful if used well.
I was extremely skeptical at the beginning, and therefore critical of what was possible as my default stance. Despite all that, the latest iterations of CLI agents which attach to LSPs and scan codebase context have been surprising me in a positive direction. I've given them tasks that require understanding the project structure and they've been able to do it. So my trajectory has been from skeptic to big proponent of their use, of course with the caveat that at the end of the day, it is my code that will be pushed to prod. I never went through the trough of disillusionment, but I am arriving at productivity and find it great.
I think that happened when GPT-5 was released and pierced OpenAI's veil. While not a bad model, we found out exactly what Mr. Altman's words are worth.
It feels that way to me, too—starting to feel closer to maturity. Like Mr. Saffron here, saying “go ham with the AI for prototyping, just communicate that as a demo/branch/video instead of a PR.”
It feels like people and projects are moving from a pure “get that slop out of here” attitude toward more nuance, more confidence articulating how to integrate the valuable stuff while excluding the lazy stuff.
> “I am closing this but this is interesting, head over to our forum/issues to discuss”
I really like the way Discourse uses "levels" to slowly open up features as new people interact with the community, and I wonder if GitHub could build in a way of allowing people to only be able to open PRs after a certain amount of interaction, too (for example, you can only raise a large PR if you have spent enough time raising small PRs).
This could of course be abused and/or lead to unintended restrictions (e.g. a small change in lots of places), but that's also true of Discourse and it seems to work pretty well regardless.
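To make that concrete, here is a rough sketch of what such a gate could look like; GitHub has no built-in trust levels, so treat the thresholds and inputs as invented for illustration (the counts would come from the hosting platform's API):

    # Hypothetical Discourse-style "trust level" gate for pull requests.
    MAX_LINES_FOR_NEW_CONTRIBUTORS = 200
    MERGED_PRS_NEEDED_FOR_LARGE_CHANGES = 5

    def pr_allowed(merged_pr_count: int, changed_lines: int) -> tuple[bool, str]:
        if changed_lines <= MAX_LINES_FOR_NEW_CONTRIBUTORS:
            return True, "small change: welcome from anyone"
        if merged_pr_count >= MERGED_PRS_NEEDED_FOR_LARGE_CHANGES:
            return True, "established contributor: large change goes to review"
        return False, ("large change from a new contributor: please start with "
                       "smaller PRs or open a discussion first")

    print(pr_allowed(merged_pr_count=0, changed_lines=1500))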
Mailing lists are used as a filter to raise the barrier to entry, to prevent people from contributing code that they have no intention of maintaining and leaving that to the project owners. GitHub, for better or worse, has made the barrier to entry much, much lower and made it significantly easier for people to propose changes and then disappear.
So far I prefer Hashimoto's solution to this, that "AI tooling must be disclosed for contributions": https://news.ycombinator.com/item?id=44976568
I use it like this: if a PR is LLM-generated, you as a maintainer either merge it if it's good or close it if it's not. If it's human-written, you may spend some time reviewing the code and iterating on the PR as you used to.
Saves your time without discarding LLM PRs completely.
But what does LLM-generated mean? What if I use Copilot for completions? Is that considered "AI-generated"? What if I grab the code from Claude and update more than 50% of it? Am I now taking ownership of it as my code?
It's like the Ship of Theseus.
Essay is way more interesting than the title, which doesn't actually capture it.
The title seems perfectly engineered to get upvotes from people who don't read the article, which puts the article in front of more people who would actually read it (which is good because the article is, as you say, very interesting and worth sharing).
I don't like it but I can hardly blame them.
Thanks for pointing this out—it made me take the time to find a sentence in the article body that could serve as a less baity title.
From https://news.ycombinator.com/newsguidelines.html: "Please use the original title, unless it is misleading or linkbait" (note that word unless)
I guess the main question I'm left with after reading this is "what good is a prototype, then?" In a few of the companies I've worked at there was a quarterly or biannual ritual called "hack week" or "innovation week" or "hackathon" where engineers form small teams and try to bang out a pet project super fast. Sometimes these projects get management's attention, and get "promoted" to a product or feature. Having worked on a few of these "promoted" projects, to the last they were unmitigated disasters. See, "innovation" doesn't come from a single junior engineer's 2AM beer and pizza fueled fever dream. And when you make the mistake of believing otherwise, what seemed like some bright spark's clever little dream turns into a nightmare right quick. The best thing you can do with a prototype is delete it.
Completely agree, I hate the “hackathon” for so many reasons, guess I’ll vent here too. All of this from the perspective of one frustrated software engineer in web tech.
First of all, if you want innovation, why are you forcing it into a single week? You very likely have smart people with very good ideas, but they’re held back by your number-driven bullshit. These orgs actively kill innovation by reducing talent to quantifiable rows of data.
A product cobbled together from shit prototype code very obviously stands out. It has various pages that don't quite look/work the same; cross-functional things that "work everywhere else" don't work in some parts.

It rewards only the people who make good presentations or pick the "current hype thing" to work on. Occasionally something good that addresses real problems is at least mentioned, but the hype thing will always win (if judged by your SLT).

Shame on you if the slop prototype is handed off to some team other than the hackathon presenters. The presenters take all the promotion points, then the implementers have to sort out a bunch of bullshit code, very likely being told to just ship the prototype: "it works you idiots, I saw it in the demo, just ship it." Which is so incredibly short-sighted.

I think the depressing truth is that your executives know it's all cobbled-together bullshit, but that it will sell anyway, so why invest time making it actually good? They all have their golden parachutes; what do they care about the suckers stuck on-call for the house of cards they were forced to build, despite possessing the talent to make it stable? All this stupidity happens over and over again, not because it is wise, or even the best way to do this; the truth is just a flaccid "eh, it'll work though, fuck it, let's get paid."
If one claims to be able to write good code with LLMs, it should be just as easy to write comprehensive e2e tests. If you don't hold your code to a high testing standard, then you were always going off 'vibes', whether they came from a silicon neural network or your human meatware biases.
Reviewing test code is arguably harder than reviewing implementation code, because tests enumerate success and failure scenarios. Sometimes the LOC of the tests is an order of magnitude larger than that of the implementation code.

The biggest place I've seen AI-created code with tests produce a false positive is when a specific feature is being tested, but the test case overwrites a global data structure. Fixing the test reveals the implementation to be flawed.
Now imagine you get rewarded for shipping new features and test code, but are derided for refactoring old code. The person who goes in to fix the AI slop is frowned upon while the AI slop driver gets recognition for being a great coder. This dynamic, caused by AI coding tools, is creating perverse workplace incentives.
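A contrived illustration of that global-overwrite false positive (all names hypothetical): the test disables the very flag it was meant to exercise, so it passes against a broken implementation, and restoring the flag is what exposes the bug.

    # Hypothetical feature: in strict mode, records without a name must be rejected.
    FEATURE_FLAGS = {"strict_validation": True}

    def accept_record(record: dict) -> bool:
        if not FEATURE_FLAGS["strict_validation"]:
            return True
        return True  # flawed "strict" path: never actually checks the name

    def test_accepts_valid_record():
        # Overwrites the global, silently bypassing the strict path,
        # so this passes no matter how broken the implementation is.
        FEATURE_FLAGS["strict_validation"] = False
        assert accept_record({"name": "ok"})

    def test_rejects_nameless_record():
        # Restoring the flag (what a proper fixture would do) reveals the bug:
        # this assertion fails against the implementation above.
        FEATURE_FLAGS["strict_validation"] = True
        assert not accept_record({})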
Shouldn't there be guidelines for open source projects where it is clearly stipulated that code submitted for review must follow the project's code format and conventions?
This is the thought that I always have whenever I see a mention of coding standards: not only should there be standards, they should be enforced by tooling.

That being said, a person should feel free to do what they want with their code. It's somewhat tough to justify the work of setting up infrastructure to do that on small projects, but AI PRs aren't likely a big issue for small projects.
Code format and conventions are not the problem. It's the complexity of the change without testing, thinking, or otherwise having ownership of your PR.
Some people will absolutely just run something, let the AI work like a wizard and push it in hopes of getting an "open source contribution".
They need to understand due diligence and reduce the overhead on maintainers, so that maintainers don't have to review things before it's really needed.
It's a hard balance to strike, because you do want to make it easy on new contributors, but this is a great conversation to have.
In a perfect world people would read and understand contribution guidelines before opening a PR or issue.
Alas…
> that code submitted for review must follow the project's code format and conventions
...that's just scratching the surface.
The problem is that LLMs make mistakes that no single human would make, and coding conventions should never be the focus of a code review anyway; they should usually be enforced by tooling.
E.g. when reading/reviewing other people's code you tune into their brain and thought process - after reading a few lines of (non-trivial) code you know subconsciously what 'programming character' this person is and what type of problems to expect and look for.
With LLM generated code it's like trying to tune into a thousand brains at the same time, since the code is a mishmash of what a thousand people have written and published on the internet. Reading a person's thought process via reading their code doesn't work anymore, because there is no coherent thought process.
Personally I'm very hesitant to merge PRs into my open source projects that are more than small changes of a couple dozen lines at most, unless I know and trust the contributor to not fuck things up. E.g. for the PRs I'm accepting I don't really care if they are vibe-coded or not, because the complexity for accepted PRs is so low that the difference shouldn't matter much.
As if people read guidelines. Sure, they're good to have so you can point to them when people violate them, but people (in general) will not read them by default before contributing.
I’ve found LLM coding agents to be quite good at writing linters…
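For what it's worth, most project-specific lint rules are only a few lines once the convention is pinned down. A hypothetical example (the TODO-ownership rule is made up) of the kind of check an agent, or a human, can bang out quickly:

    import re
    import sys
    from pathlib import Path

    # Hypothetical convention: every TODO must name an owner, e.g. "TODO(alice): ...".
    BARE_TODO = re.compile(r"#\s*TODO(?!\()")

    def lint(paths: list[str]) -> int:
        failures = 0
        for path in paths:
            for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1):
                if BARE_TODO.search(line):
                    print(f"{path}:{lineno}: TODO without an owner")
                    failures += 1
        return failures

    if __name__ == "__main__":
        sys.exit(1 if lint(sys.argv[1:]) else 0)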
> You can usually tell a prototype that is pretending to be a human PR, but a real PR a human makes with AI assistance can be indistinguishable.
A couple of weeks ago I needed to stuff some binary data into a string, in a way where it wouldn't be corrupted by whitespace changes.
I wrote some Rust code to generate the string. After I typed "}" to end the method: (1) Copilot suggested a 100% correct method to parse the string back to binary data, and then (2) suggested a 100% correct unit test.
I read both methods, and they were identical to what I would write. It was as if Copilot could read my brain.
BUT: If I relied on Copilot to come up with the serialization form, or even know that it needed to pick something that wouldn't be corrupted by whitespace, it might have picked something completely wrong, that didn't meet what the project needed.
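That division of labor is the interesting part: choosing a whitespace-safe encoding is the judgment call, while the mechanical round trip is trivial once chosen. A minimal sketch of the idea, in Python rather than the commenter's Rust, with made-up data:

    import base64

    def to_string(data: bytes) -> str:
        # Base64 output uses only [A-Za-z0-9+/=], so surrounding whitespace
        # changes or reformatting can't corrupt the payload.
        return base64.b64encode(data).decode("ascii")

    def from_string(text: str) -> bytes:
        # Strip any whitespace an editor or formatter may have introduced.
        return base64.b64decode("".join(text.split()))

    payload = bytes([0, 255, 10, 13, 32, 7])
    assert from_string("  " + to_string(payload) + "\n") == payload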
2 months ago, after I started using Claude Code on my side project, within the space of days, I went from not allowing a single line of AI code into my codebase to almost 100% AI-written code. It basically codes in my exact style and I know ahead of time what code I expect to see so reviewing is really easy.
I cannot justify to myself writing code by hand when there is literally no difference in the output from how I would have done it myself. It might as well be reading my mind, that's what it feels like.
For me, vibe coding is essentially a 5x speed increase with no downside. I cannot believe how fast I can churn out features. All the stuff I used to type out by hand now seems impossibly boring. I just don't have the patience to hand-code anymore.
I've stuck to vanilla JavaScript because I don't have the patience to wait for the TypeScript transpiler. TS iteration speed is too slow. By the time it finishes transpiling, I can't even remember what I was trying to do. So you bet I don't have the patience to write by hand now. I really need momentum (fast iteration speed) when I code and LLMs provide that.
I don't mean to question you personally, after all this is the internet, but comments like yours do make the reader think: if he has 5x'ed his coding, was he any good to begin with? I guess what I'm saying is, without knowing your baseline skill level, I don't know whether to be impressed by your story. Have you become a super-programmer, or is it just cleaning up stupid stuff that you shouldn't have been doing in the first place? If someone is already a clear-headed, efficient, experienced programmer, would that person see anywhere near the benefits you have? Again, this isn't a slight on you personally; it's just that a reader doesn't really know how to place your experience into context.
The problem with AI isn't new; it's the same old problem with technology: computers don't do what you want, only what you tell them.

A lot of PRs can be judged by how well they are described and justified, because the code itself isn't that important; it's the problem you are solving with it that is.

People are often great at defining problems, AIs less so IMHO. Partially because they simply have no understanding, partially because they over-explain everything to the point where you just stop reading, and so you never get to the core of the problem. And even if you do, there's a good chance the AI misunderstood the problem and the solution is wrong in some more or less subtle way.

This is made worse by the sheer overconfidence of AI output, which quickly erodes any trust that it did understand the problem.
The title doesn't do justice to the content.
I really liked the paragraph about LLMs being "alien intelligence":

> Many engineers I know fall into 2 camps, either the camp that find the new class of LLMs intelligent, groundbreaking and shockingly good. In the other camp are engineers that think of all LLM generated content as “the emperor’s new clothes”, the code they generate is “naked”, fundamentally flawed and poison.

> I like to think of the new systems as neither. I like to think about the new class of intelligence as “Alien Intelligence”. It is both shockingly good and shockingly terrible at the exact same time.

> Framing LLMs as “Super competent interns” or some other type of human analogy is incorrect. These systems are aliens and the sooner we accept this the sooner we will be able to navigate the complexity that injecting alien intelligence into our engineering process leads to.
It's a comparison I find compelling. The way they produce code and the way you have to interact with them really does feel "alien", and when you start humanizing them you get emotional when interacting with them, which isn't right. I mean, I do get emotional and frustrated even when good old deterministic programs misbehave and there's some bug to find and squash or work around, but LLM interactions can take that to a whole new level. So, we need to remember they are "alien".

I'm reminded of Dijkstra: “The question of whether machines can think is about as relevant as the question of whether submarines can swim.”
These new submarines are a lot closer to human swimming than the old ones were, but they’re still very different.
Some movements expected alien intelligence to arrive in the early 2020s. They might have been on the mark after all ;)
Isn't the intelligence of every other person alien to ourselves? The article ends with a need to "protect our own engineering brands" but how is that communicated? I found this [https://meta.discourse.org/t/contributing-to-discourse-devel...] which seems woefully inadequate. In practice, conventions are communicated through existing code. Are human contributors capable of grasping an "engineering brand" by working on a few PRs?
This is why at a fundamental level, the concept of AGI doesn't make a lot of sense. You can't measure machine intelligence by comparing it to a human's. That doesn't mean machines can't be intelligent...but rather that the measuring stick cannot be an abstracted human being. It can only be the accumulation of specific tasks.
> As engineers it is our role to properly label our changes.
I've found myself wanting line-level blame for LLMs. If my teammate committed something that was written directly by Claude Code, it's more useful to me to know that than to have the blame assigned to the human through the squash+merge PR process.
Ultimately somebody needs to be on the hook. But if my teammate doesn't understand it any better than I do, I'd rather that be explicit and avoid the dance of "you committed it, therefore you own it," which is better in principle than in practice IMO.
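One low-tech approximation, assuming the team adopts a commit-trailer convention (for example the Co-Authored-By lines some agents already add) and keeps those commits instead of squashing them away: walk git blame's porcelain output and flag lines whose commit carries the trailer. The trailer text and file path below are placeholders.

    import subprocess

    AI_TRAILER = "Co-Authored-By: Claude"   # placeholder convention, adjust to your tooling

    def commit_is_ai_assisted(sha: str, cache: dict) -> bool:
        if sha not in cache:
            body = subprocess.run(
                ["git", "show", "-s", "--format=%B", sha],
                capture_output=True, text=True, check=True,
            ).stdout
            cache[sha] = AI_TRAILER.lower() in body.lower()
        return cache[sha]

    def ai_blame(path: str) -> list:
        """Line numbers in `path` whose blamed commit message carries the AI trailer."""
        out = subprocess.run(
            ["git", "blame", "--line-porcelain", path],
            capture_output=True, text=True, check=True,
        ).stdout
        flagged, cache, sha, final_line = [], {}, None, None
        for row in out.splitlines():
            fields = row.split()
            if not row.startswith("\t") and len(fields) >= 3 and len(fields[0]) == 40:
                sha, final_line = fields[0], int(fields[2])   # header: <sha> <orig> <final>
            elif row.startswith("\t") and sha and commit_is_ai_assisted(sha, cache):
                flagged.append(final_line)
        return flagged

    print(ai_blame("src/main.py"))   # hypothetical path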
If your teammate doesn't understand it, they shouldn't have committed it. This isn't a "dance", it's basic responsibility for your actions.
An idea occurred to me. What if:
1. Someone raises a PR
2. Entry-level maintainers skim through it and either reject or pass higher up
3. If the PR has sufficient quality, the PR gets reviewed by someone who actually has merge permissions
> That said it is a living demo that can help make an idea feel more real. It is also enormously fun. Think of it as a delightful movie set.
[pedantry] It bothers me that the photo for "think of prototype PRs as movie sets" is clearly not a movie set but rather the set of the TV show Seinfeld. Anyone who watched the show would immediately recognize Jerry's apartment.
It's not the set of the TV show, I believe, but a recreation:

https://nypost.com/2015/06/23/you-can-now-visit-the-iconic-s...

It looks a bit different with respect to the stuff on the fridge and the items in the cupboard.
The Fedora policy on AI-assisted contributions seems very reasonable: https://communityblog.fedoraproject.org/council-policy-propo...
Maybe we need open source credit scores. PRs from talented engineers with proven track records of high quality contributions would be presumed good enough for review. Unknown, newer contributors could have a size limit on their PRs, with massive PRs rejected automatically.
The Forgejo project has been gently trying to redirect new contributors into fixing bugs before trying to jump into the project to implement big features (https://codeberg.org/forgejo/discussions/issues/337). This allows a new contributor to get into the community, get used to working with the codebase, do something of clear value... but for the project a lot of it is about establishing reputation.
Will the contributor respond to code-review feedback? Will they follow-up on work? Will they work within the code-of-conduct and learn the contributor guidelines? All great things to figure out on small bugs, rather than after the contributor has done significant feature work.
We don't need more KYC, no.
A bit of a brutal title for what's a pretty constructive and reasonable article. I like the core: AI-produced contributions are prototypes, belong in branches, and require transparency and commitment as a path to being merged.
It is possible that some projects could benefit from triage volunteers?
There are plenty of open source projects where it is difficult to get up to speed with the intricacies of the architecture, which limits the ability of talented coders to contribute on a small scale.
There might be merit in having a channel for AI contributions that casual helpers can assess to see if they pass a minimum threshold before passing on to a project maintainer to assess how the change works within the context of the overall architecture.
It would also be fascinating to see how good an AI would be at assessing the quality of a set of AI generated changes absent the instructions that generated them. They may not be able to clearly identify whether the change would work, but can they at least rank a collection of submissions to select the ones most worth looking at?
At the very least, the pile of PRs counts as data about things people wanted to do. Even if the code was completely unusable, placing it in a pile somewhere might make it minable for the intentions of erstwhile contributors.
Maybe what we need is AI based code review.
> Some go so far as to say “AI not welcome here” find another project.
This feels extremely counterproductive and fundamentally unenforceable to me.
But it's trivially enforceable. Accept PRs from unverified contributors, look at them for inspiration if you like, but don't ever merge one. It's probably not a satisfying answer, but if you want or need to ensure your project hasn't been infected by AI generated code you need to only accept contributions from people you know and trust.
This is sad. The barrier of entry will be raised extremely high, maybe even requiring some real world personal connections to the maintainer.
Author here, thanks heaps for the discussion, I replied to a few of the points in my blog comments:
https://discuss.samsaffron.com/t/your-vibe-coded-slop-pr-is-...
I have a framework: don't use it; if you never used it, don't start using it; publicly shame people; stop talking about it.

Slow down. Think long and deep about your problems. Write less code.
There is NOTHING inevitable about this stuff.
Indeed. "No." is perfectly clear.
>That said, there is a trend among many developers of banning AI. Some go so far as to say “AI not welcome here” find another project.
>This feels extremely counterproductive and fundamentally unenforceable to me. Much of the code AI generates is indistinguishable from human code anyway. You can usually tell a prototype that is pretending to be a human PR, but a real PR a human makes with AI assistance can be indistinguishable.
Isn't that exactly the point? Doesn't this achieve exactly what the whole article is arguing for?
A hard "No AI" rule filters out all the slop, and all the actually good stuff (which may or may not have been made with AI) makes it in.
When the AI assisted code is indistinguishable from human code, that's mission accomplished, yeah?
Although I can see two counterarguments. First, it might just be Covert Slop. Slop that goes under the radar.
And second, there might be a lot of baby thrown out with that bathwater. Stuff that was made in conjunction with AI, contains a lot of "obviously AI", but a human did indeed put in the work to review it.
I guess the problem is there's no way of knowing that? Is there a Proof of Work for code review? (And a proof of competence, to boot?)
Personally, I would not contribute to a project that forced me to lie.
And from the point of view of the maintainers, it seems a terrible idea to set up rules with the expectation that they will be broken.
> I guess the problem is there's no way of knowing that? Is there a Proof of Work for code review?
In a live setting, you could ask the submitter to explain various parts of the code. Async, that doesn’t work, because presumably someone who used AI without disclosing that would do the same for the explanation.
Well, instead of saying "No AI" while accepting that people will lie about it undetectably and being fine with that lying, why not just say "Only AI when you spend the time to turn it into a real, reviewed PR, which looks like X, Y, and Z", giving some actual tips on how to use AI acceptably? Which is what OP suggests.
related discussion: https://news.ycombinator.com/item?id=45330378
We’re fixing this slop problem - engineers write rules that are enforced on PRs. Fixes the problem pretty well so far.
The way we do it is to use AI to review the PR before a human reviewer sees it. Obvious errors, inconsistent patterns, weirdness etc. are flagged before it goes any further. "Vibe coded" slop usually gets caught, but "vibe engineered" surgical changes that adhere to common patterns and standards and have tests etc. get seen by a real live human for their normal review.
It's not rocket science.
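For the mechanical half of that, the rules don't even need a model. A rough sketch of the kind of pre-review gate being described, with thresholds and file conventions invented for illustration:

    import subprocess

    MAX_CHANGED_LINES = 800   # arbitrary illustrative limit

    def changed_files_and_lines(base: str = "origin/main"):
        out = subprocess.run(
            ["git", "diff", "--numstat", base],
            capture_output=True, text=True, check=True,
        ).stdout
        files, total = [], 0
        for row in out.splitlines():
            added, deleted, path = row.split("\t")
            files.append(path)
            if added != "-":               # "-" marks binary files
                total += int(added) + int(deleted)
        return files, total

    def check_pr() -> list:
        files, total = changed_files_and_lines()
        problems = []
        if total > MAX_CHANGED_LINES:
            problems.append(f"diff touches {total} lines; consider splitting it up")
        if any(f.endswith(".py") for f in files) and not any("test" in f for f in files):
            problems.append("code changed but no tests were added or updated")
        return problems

    if __name__ == "__main__":
        for problem in check_pr():
            print(problem)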
Do you work at a profitable company?
Well...just have AI review the PR to have it highlight the slop
/s
[flagged]
I wouldn't call it "vibe coded slop"; the models are getting way better and I can work with my engineers a lot faster.
I am the founder and a product person, so it helps in reducing the number of engineers needed at my business. We are currently doing $2.5M ARR and the engineers aren't complaining; in fact it is the opposite, they are actually more productive.
We still prioritize architecture planning, testing and having a CI, but code is getting less and less important in our team, so we don't need many engineers.
> code is getting less and less important in our team, so we don't need many engineers.
That's a bit reductive. Programmers write code; engineers build systems.
I'd argue that you still need engineers for architecture, system design, protocol design, API design, tech stack evaluation & selection, rollout strategies, etc, and most of this has to be unambiguously documented in a format LLMs can understand.
While I agree that the value of code has decreased now that we can generate and regenerate code from specs, we still need a substantial number of experienced engineers to curate all the specs and inputs that we feed into LLMs.
> the engineers aren't complaining, in fact it is the opposite, they are actually more productive.
More productive isn't the opposite of complaining.
> and a product person
Tells me all I need to know about your ability for sound judgement on technical topics right there.
> so it helps in reducing the number of needed engineers at my business
> the engineers aren't complaining
You're missing a piece of the puzzle here, Mr business person.
> reducing the number of needed engineers at my business
> code is getting less and less important in our team
> the engineers aren't complaining
Lays off engineers for AI trained off of other engineers' code, and says code is less important and engineers aren't complaining.
What do the spends for AI/LLM services look like per person? Do you track any dev/AI metrics related to how the usage is in the company?
[flagged]
Nice Jewish word, mostly meant to mock. Why would I care what a plugin that I don't even see in use has to say to my face (since I had to read this with all the interpretive potential and receptiveness available)? It's the same kind of inserted judgment that lingers, similar to "Yes, I will judge you if you use AI".
There’s nothing wrong with judgment. Judging someone’s character based on whether they use generative “AI” is a valid practice. You may not like being judged, but that’s another matter entirely.
> Nice jewish word
"Slop" doesn't seem to be Yiddish: https://www.etymonline.com/word/slop, and even if it was, so what?
Which word? Slop? I think it is from medieval old English if that is the word you are referring to.