OpenCode was the first open source agent I used, and my main workhorse after experimenting briefly with Claude Code and realizing the potential of agentic coding. Due to that, and because it's a popular an open source alternative, I want to be able to recommend it and be enthusiastic about it. The problem for me is that the development practices of the people that are working on it are suboptimal at best; they're constantly releasing at an extremely high cadence, where they don't even spend the time to test or fix things (or even build a proper list of changes for each release), and they add, remove, refine, change, fix, and break features constantly at that accelerated pace.
More than that, it's an extremely large and complex TypeScript code base — probably larger and more complex than it needs to be — and (partly as a result) it's fairly resource inefficient (often uses 1GB of RAM or more. For a TUI).
On top of that, at least I personally find the TUI to be overbearing and a little bit buggy, and the agent to be so full of features that I don't really need — also mildly buggy — that it sort of becomes hard to use and remember how everything is supposed to work and interact.
> Due to that, and because it's a popular an open source alternative, I want to be able to recommend it and be enthusiastic about it. The problem for me is that the development practices of the people that are working on it are suboptimal at best;
This is my experience with most AI tools that I spend more than a few weeks with. It's happening so often it's making me question my own judgement: "if everything smells of shit, check your own shoes." I left professional software engineering a couple of years ago, and I don't know how much of this is also just me losing touch with the profession, or being an old man moaning about how we used to do it better.
It reminds me of social media: there was a time where social media platforms were defined by their features, Vine was short video, snapchat was disappearing pictures, twitter was short status posts etc. but now they're all bloated messes that try do everything.
The same looks to be happening with AI and agent software. They start off as defined by one features, and then become messes trying to implement the latest AI approach (skills, or tools, or functions, or RAG, or AGENTS.md, or claws etc. etc.)
> and (partly as a result) it's fairly resource inefficient (often uses 1GB of RAM or more. For a TUI).
That's (one of the reasons) why I'm favoring Codex over Claude Code.
Claude Code is an... Electron app (for a TUI? WTH?) and Codex is Rust. The difference is tangible: the former feels sluggish and does some odd redrawing when the terminal size changes, while the latter definitely feels more snappy to me (leaving aside that GPT's responses also seem more concise). At some point, I had both chewing concurrently on the same machine and same project, and Claude Code was using multiple GBs of RAM and 100% CPU whereas Codex was happy with 80 MB and 6%.
Performance _is_ a feature and I'm afraid the amounts of code AI produces without supervision lead to an amount of bloat we haven't seen before...
I think you’re confusing capital c Claude Code, the desktop Electron app, and lowercase c `claude`, the command line tool with an interactive TUI. They’re both TypeScript under the hood, but the latter is React + Ink rendered into the terminal.
The redraw glitches you’re referring to are actually signs of what I consider to be a pretty major feature, a reason to use `claude` instead of `codex` or `opencode`: `claude` doesn’t use the alternate screen, whereas the other two do. Meaning that it uses the standard screen buffer, meaning that your chat history is in the terminal (or multiplexer) scrollback. I much prefer that, and I totally get why they’ve put so much effort into getting it to work well.
In that context handling SIGWINCH has some issues and trickiness. Well worth the tradeoff, imo.
Codex is using its app server protocol to build a nice client/server separation that I enjoy on top of the predictable Rust performance.
You can run a codex instance on machine A and connect the TUI to it from machine B. The same open source core and protocol is shared between the Codex app, VS Code and Xcode.
Java (incl. Scala, Closure, Groovy, Jython, etc.) is better suited to running as a server. Let agents write clean readable code and leave performance concerns to the JIT compiler. If you really want you can let agents rewrite components at runtime without losing context.
Erlang would offer similar benefits, because what we're doing with these things is more message passing than processing.
Rust is what I'd want agents writing for edge devices, things I don't want to have to monitor. Granted, our devices are edge devices to Anthropic, but they're more tightly coupled to their services.
I think Go might be a better choice but not for that reason at all.
Go could implement something like this with no dependencies outside the standard library. It would make sense to take on a few, but a comparable Rust project would have at least several dozens.
Also, Go can deliver a single binary that works on every Linux distribution right out of the box. In Rust, its possible but you have to static compile with muslc and that is a far less well-trodden path with some significant differences to the glibc that most Rust libraries have been tested with.
My personal opinion is that I like Rust much more than Go, but I can’t deny that Rust is a big, and more dauntingly to newcomers, pretty unopinionated language compared to Go.
There are more syntax features, more and more complex semantics, and while rustc and clippy do a great job of explaining like 90% of errors, the remaining 10% suuuuuck.
There’s also some choices imposed by the build system (like cargo allowing multiple versions of the same dep in a workspace) and by the macro system (axum has some unintuitive extractor ordering needs that you won’t find unless you know to look for them), and those things and the hurdles they present become intuitive after a time but just while getting started? Oof
Frankly I don't think one even needs to learn it, if you know a bunch of other languages and the codebase is good. I was able to just make a useful change to an open source project by just doing it, without having written any lines of Go before. Granted the MR needed some revisions.
Rust is my favorite, though. There are values beyond ease of contribution. I can't replicate the experience with a Rust project anymore, but I suspect it would have been tougher.
agents don't really care and they're doing anywhere between 90-100% of the work on CC. if anything, rust is better as it has more built-in verification out of the box.
Rust is accessible to everyone now that Claude Code and Opus can emit it at a high proficiency level.
Rust is designed so the error handling is ergonomic and fits into the flow of the language and the type system. Rust code will be lower defect rate by default.
Plus it's faster and doesn't have a GC.
You can use Rust now even if you don't know the language. It's the best way to start learning Rust.
The learning curve is not as bad as people say. It's really gentle.
I run many instances of Claude Code simultaneously and have not experienced what you are seeing. It sounds like you have a bias of Rust over Typescript.
No, they are describing a typical experience with the two apps. Just open both apps, run a few queries, and take a look at the difference in resource management yourself. It sounds like you have a bias of Claude Code over Codex.
Uh, it sounds like you're having trouble understanding that people in this thread are talking about two wildly different "claude code" applications. Those who are claiming the resources issues don't apply to them are referring to the cli application, ie: `claude` and those are saying things like "Just open both apps..." are surely referring to their GUI versions.
No, I've never used the GUI version. I literally just had to close and reopen the terminal running the Claude Code CLI on my Mac yesterday because it was taking too many resources. It generally happens when I ask Claude to use multiple sub agents. It's an obvious memory leak.
I am more concerned about their, umm, gallant approach to security. Not only that OpenCode is permissive by default in what it is allowed to do, but that it apparently tries to pull its config from the web (provider-based URL) by default [1]. There is also this open GitHub issue [2], which I find quite concerning (worst case, it's an RCE vulnerability).
It also sends all of your prompts to Grok's free tier by default, and the free tier trains on your submitted information, X AI can do whatever they want with that, including building ad profiles, etc.
You need to set an explicit "small model" in OpenCode to disable that.
This. I work on projects that warrant a self hosted model to ensure nothing is leaked to the cloud. Imagine my surprise when I discovered that even though the only configured model is local, all my prompts are sent to the cloud to... generate a session title. Fortunately caught during testing phase.
I’m curious if there’s a reason you’re not just coding in a container without access to the internet, or some similar setup? If I was worried about things in my dev chain accessing any cloud service, I’d be worried about IDE plugins, libraries included in imports, etc. and probably not want internet access at all.
The small_model option configures a separate model for lightweight tasks like title generation. By default, OpenCode tries to use a cheaper model if one is available from your provider, otherwise it falls back to your main model.
I would expect that if you set a local model it would just use the same model. Or if for example you set GPT as main model, it would use something else from OpenAI. I see no mentions of Grok as default
i ran it through mitmproxy, i am using pinned version 1.2.20, 6 march 2026, set up with local chat completions.
on that version, it does not fall back to the main model. it silently calls opencode zen and uses gpt-5-nano, which is listed as having 30 day retention plus openai policy, which is plain text human review by openai AND 3rd party contractors.
They're talking about before it's configured by the user. It defaults to 'free' models so that the user can ask a question immediately on startup. Once you configure a provider, the default models aren't used.
I liked the apple II, and the TRS 80 as I rather like basic. And then I didn’t hate DOS, and then I actively hated the graphical shell of Windows 3, but could not afford a Macintosh -so suffered through it where I had to, but mainly used DOS. Then I discovered UNIX, and did almost all of my work on a timeshare - in the early 90s!
Then Windows 95 came out and I actively hated it, but did think it was amazingly pretty - somehow this was the impetus for me to get a pc again, which I put Windows NT on. Which was profitable for freelance gigs in college. Soon after that, I dual booted it to Linux and spent most of my time in Slackware.
After that, I graduated and had enough money to buy a second rig, which I installed OS/2 warp on - which was good for side gigs. And I really liked. A lot. But my day job required that I have a Windows NT box to shell into the Solaris servers as we ran. Then I got a better class of employer and the next several let me run a Linux box to connect to our solaris (or Aix) servers.
Next my girlfriend at the time got a PowerBook G4 and installed OS X on it. It was obviously amazing. Windows XP came out, and it was once again so much worse than Windows NT - and crashed so much more - which was odd as it was based on Windows NT. (yes 98 was before this but it was really bad). Anyhow, right about here the Linux box I was running at home, died. And it was obvious that I was not going to buy an XP box, so I bought my first Mac.
And it’s been the same for the last 25 years - every time I look at a Windows box it’s horrible. I pretty much always have a Linux box headless somewhere in the house, and one rented in the cloud, and a Mac for interacting with the world.
And like the parent I actively dislike windows. And that’s interesting because I’ve liked most other operating systems I’ve used in my life, including MS-DOS. Modern windows is uniquely bad.
I use windows and absolutely hate the mac UI. Having the current window title bar always at the top of the screen doesn't make any sense when you have a very big monitor. It only made sense with the tiny monitors available when the mac UI was originally created.
No, it is still configurable. You can specify in your opencode.json config that it should be able to run everything. I think they just argued that it shouldn't be the default. Which I agree with.
No, the problem is that when logging in, the provider's website can provide an authentication shell command that OpenCode will send to the shell sight unseen, even if it is "rm -rf /home". This "feature" is completely unnecessary for the agent to function as an agent, or even for authentication. It's not about it being the default, it's about it being there at all and being designed that way.
> The problem for me is that the development practices of the people that are working on it are suboptimal at best; they're constantly releasing at an extremely high cadence, where they don't even spend the time to test or fix things (or even build a proper list of changes for each release), and they add, remove, refine, change, fix, and break features constantly at that accelerated pace.
this is what i notice with openclaw as well. there have been releases where they break production features. unfortunately this is what happens when code becomes a commidity, everyone thinks that shipping fast is the moat but at the expense of suboptimality since they know a fix can be implemented quickly on the next release.
Openclaw has 20k commits, almost 700k lines of code, and it is only four months old. I feel confident that that sort of code base would have a no coherent architecture at all, and also that no human has a good mental model of how the various subsystems interact.
I’m sure we’ll all learn a lot from these early days of agentic coding.
> I’m sure we’ll all learn a lot from these early days of agentic coding.
So far what I am learning (from watching all of this) is that our constant claims that quality and security matter seem to not be true on average. Depressingly.
> So far what I am learning (from watching all of this) is that our constant claims that quality and security matter seem to not be true on average.
Only for the non-pro users. After all, those users were happy to use excel to write the programs.
What we're seeing now is that more and more developers find they are happy with even less determinism than the Excel process.
Maybe they're right; maybe software doesn't need any coherence, stability, security or even correctness. Maybe the class of software they produce doesn't need those things.
I think what we're seeing is a phase transition. In the early days of any paradigm shift, velocity trumps stability because the market rewards first movers.
But as agents move from prototypes to production, the calculus changes. Production systems need:
- Memory continuity across sessions
- Predictable behavior across updates
- Security boundaries that don't leak
The tools that prioritize these will win the enterprise market. The ones that don't will stay in the prototype/hobbyist space.
We're still in the "move fast" phase, but the "break things" part is starting to hurt real users. The pendulum will swing back.
This makes sense. Development velocity is bought by having a short product life with few users. As you gain users that depend on your product, velocity must drop by definition.
The reason for this is that product development involves making decisions which can later be classified as good or bad decisions.
The good decisions must remain stable, while the bad decisions must remain open to change and therefore remain unstable.
The AI doesn't know anything about the user experience, which means it will inevitably change the good decisions as well.
20 for me, and let's not exaggerate. We've given lip service to it this entire time. Hell look at any of the corps we're talking about (including where I work) and they're demanding "velocity without lowering the quality bar", but it's a lie: they don't care about the quality bar in the slightest.
One of my main lessons after a decent long while in security, is that most orgs care about security, *as long as it doesn't get in the way of other priorities* like shipping new features. So when we get something like Agentic LLM tooling where everything moves super fast, security is inevitably going to suffer.
I’m learning that projects, developed with the help of agents, even when developers claim that they review and steer everything, ultimately are not fully understood or owned by the developers, and very soon turns into a thousand reinvented wheels strapped together by tape.
> very soon turns into a thousand reinvented wheels strapped together by tape.
Also most of the long running enterprise projects I’ve seen - there was one that had been around for like 10 years and like about 75% of the devs I hadn’t even heard of and none of the original ones were in the project at all.
The thing had no less than three auditing mechanisms, three ways of interacting with the database, mixed naming conventions, like two validation mechanisms none of which were what Spring recommended and also configurations versioned for app servers that weren’t even in use.
This was all before AI, it’s not like you need it for projects to turn into slop and AI slop isn’t that much different from human slop (none of them gave a shit about ADRs or proper docs on why things are done a certain way, though Wiki had some fossilized meeting notes with nothing actually useful) except that AI can produce this stuff more quickly.
When encountered, I just relied on writing tests and reworking the older slop with something newer (with better AI models and tooling) and the overall quality improved.
Claude Code breaks production features and doesn't say anything about it. The product has just shifted gears with little to no ceremony.
I expect that from something guiding the market, but there have been times where stuff changes, and it isn't even clear if it is a bug or a permanent decision. I suspect they don't even know.
We're still in the very early days of generative AI, and people and markets are already prioritizing quality over quantity. Quantity is irrelevant when it comes value.
All code is not fungible, "irreverent code that kinda looks okay at first glance" might be a commodity, but well-tested, well-designed and well-understood code is what's valuable.
and once you've got your wish: ugly code without tests or a way to comprehend it, but cheap!
How much value are you going to be able to extract over its lifetime once your customers want to see some additional features or improvements?
How much expensive maintenance burden are you incurring once any change (human or LLM generated) is likely to introduce bugs you have no better way of identifying than shipping to your paying customers?
Maybe LLM+tooling is going to get there with producing a comprehensible and well tested system but my anectodal experience is not promising. I find that AI is great until you hit its limit on a topic and then it will merrily generate tokens in a loop suggesting the same won't-work-fix forever.
What you wrote aligns with my experience so far.
It's fast and easy to get something working, but in a number of cases it (Opus) just gets stuck 'spinning' and no number of prompts is going to fix that.
Moreover - when creating things from scratch it tends to use average/insecure/ inefficient approaches that later take a lot of time to fix.
The whole thing reminds me a bit of the many RAD tools that were supposed to 'solve' programming. While it was easy to start and produce something with those tools, at some point you started spending way too much time working around the limitations and wished you started from scratch without it.
I'm of the opinion that the diligence of experts is part of what makes code valuable assets, and that the market does an alright job of eventually differentiating between reliable products/brands and operations that are just winging it with AI[1].
I would think that the better the code is designed and factored and refactored, the easier it is to maintain and evolve, detect and remove bugs and security vulnerabilties from it. The ease of maintenance helps both AI and humans.
There are limits to what even AI can do to code, within practical time-limits. Using AI also costs money. So, easier it is to maintain and evolve a piece of software, the cheaper it will be to the owners of that application.
It's understandable and even desirable that a new piece of code rapidly evolves as they iterate and fix bugs. I'd only be concerned if they keep this pattern for too long. In the early phases, I like keeping up with all the cutting edge developments. Projects where dev get afraid to ship because of breaking things end up becoming bloated with unnecessary backward compatibility.
I recently listened to this episode from the Claude Code creator (here is the video version: https://www.youtube.com/watch?v=PQU9o_5rHC4) and it sounded like their development process was somewhat similar - he said something like their entire codebase has 100% churn every 6 months. But I would assume they have a more professional software delivery process.
I would (incorrectly) assume that a product like this would be heavily tested via AI - why not? AI should be writing all the code, so why would the humans not invest in and require extreme levels of testing since AI is really good at that?
I feel like our industry goes through these phases where there's an obvious thought leader that everyone's copying because they are revolutionary.
Like Rails/DHH was one phase, Git/GitHub another.
And right now it's kinda Claude Code. But they're so obviously really bad at development that it feels like a MLM scam.
I'm just describing the feeling I'm getting, perhaps badly. I use Claude, I recommended Claude for the company I worked at. But by god they're bloody awful at development.
It feels like the point where someone else steps in with a rock solid, dependable, competitor and then everyone forgets Claude Code ever existed.
I use Claude Code because Anthropic requires me to in order to get the generous subscription tokens. But better tools exist. If I was allowed to use Cursor with my Claude sub I would in a heartbeat.
I mean, I'm slowly trying to learn lightweight formal methods (i.e. what stuff like Alloy or Quint do), behavior driven development, more advanced testing systems for UIs, red-green TDD, etc, which I never bothered to learn as much before, precisely because they can handle the boilerplate aspects of these things, so I can focus on specifying the core features or properties I need for the system, or thinking through the behavior, information flow, and architecture of the system, and it can translate that into machine-verifiable stuff, so that my code is more reliable! I'm very early on that path, though. It's hard!
I heard from somebody inside Anthropic that it's really two companies, one which are using AI for everything and the other which spends all their time putting out fires.
OpenCode's creator acknowledged that the ease of shipping has let them ship prototype features that probably weren't worth shipping and that they need to invest more time cleaning up and fixing things.
Uff. This is exactly what Casey Muratori and his friend was talking about in of their more recent podcast. Features that would never get implemented because of time constraints now do thanks to LLMs and now they have a huge codebase to maintain
I'm still trying to figure out how "open" it really is; There are reports that it phones home a lot[0], and there is even a fork that claims to remove this behavior[1]:
I think there’s a conflict between “open” as in “open source”, and “open” as in “open about the practice” paired with the fact we usually don’t review software’s source scrupulously enough to spot unwanted behaviors”.
so how is telemetry not open? If you don't like telemetry for dogmatic reasons then don't use it. Find the alternative magical product whose dev team is able to improve the software blindfolded
> Find the alternative magical product whose dev team is able to improve the software blindfolded
The choice isn't "telemetry or you're blindfolded", the other options include actually interacting with your userbase. Surveys exist, interviews exist, focus groups exist, fostering communities that you can engage is a thing, etc.
For example, I was recruited and paid $500 to spend an hour on a panel discussing what developers want out of platforms like DigitalOcean, what we don't like, where our pain points are. I put the dollar amount there only to emphasize how valuable such information is from one user. You don't get that kind of information from telemetry.
> Surveys exist, interviews exist, focus groups exist, fostering communities that you can engage is a thing, etc.
We all know it’s extremely, extremely hard to interact with your userbase.
> For example I was paid $500 an hour
+the time to find volunteers doubled that, so for $1000 an hour x 10 user interviews, a free software can have feedback from 0.001% of their users. I dislike telemetry, but it’s a lie to say it’s optional.
—a company with no telemetry on neither of our downloadable or cloud product.
> We all know it’s extremely, extremely hard to interact with your userbase.
On the contrary, your users will tell you what you need to know, you just have to pay attention.
> I dislike telemetry, but it’s a lie to say it’s optional.
The lie is believing it’s necessary. Software was successful before telemetry was a thing, and tools without telemetry continue to be successful. Plenty of independent developers ship zero telemetry in their products and continue to be successful.
Probably all describe problems stem from the developers using agent coding; including using TypeScript, since these tools are usually more familiar with Js/Js adjacent web development languages.
Perhaps the use of coding agents may have encouraged this behavior, but it is perfectly possible to do the opposite with agents as well — for instance, to use agents to make it easier to set up and maintain a good testing scaffold for TUI stuff, a comprehensive test suite top to bottom, in a way maintainers may not have had the time/energy/interest to do before, or to rewrite in a faster and more resource efficient language that you may find more verbose, be less familiar with, or find annoying to write — and nothing is forcing them to release as often as they are, instead of just having a high commit velocity. I've personally found AIs to be just as good at Go or Rust as TypeScript, perhaps better, as well, so I don't think there was anything forcing them to go with TypeScript. I think they're just somewhat irresponsible devs.
> I think they're just somewhat irresponsible devs.
Before coding agents it took quite a lot more experience before most people could develop and ship a successful product. The average years of experience of both core team and contributors was higher and this reflected in product and architecture choices that really have an impact, especially on non-functional requirements.
They could have had better design and architecture in this project if they had asked the AI for more help with it, but they did not even know what to ask or how to validate the responses.
Of course, lots of devs with more years of experience would do just as badly or worse. What we are seeing here though is a filter removed that means a lot of projects now are the first real product everyone the team has ever developed.
I agree that Opencodr is using a lot of RAM, but regarding the features, I am ak only using the built in features and I wouldn't say they are too many, they are just enough for a complete workflow. If you need more you can install plugins, which I haven't done yet and it's my daily driver for four months.
You must never rely on AI itself for authorization… don’t let it run on an environment where it can do that. I can’t believe this needs to be said but everyone seems to have lost their mind and decided to give all their permissions away to a non deterministic thing that when prompted correctly will send it all out to whoever asks it nicely.
The value of having (and executing) a coherent product vision is extremely undervalued in FOSS, and IMO the difference between a successful project in the long-term and the kind of sploogeware that just snowballs with low-value features.
> The value of having (and executing) a coherent product vision is extremely undervalued in FOSS
Interesting you say this because I'd say the opposite is true historically, especially in the systems software community and among older folks. "Do one thing and do it well" seems to be the prevailing mindset behind many foundational tools. I think this why so many are/were irked by systemd. On the other hand newer tools that are more heavily marketed and often have some commercial angle seem to be in a perpetual state of tacking on new features in lieu of refining their raison d'etre.
Is there a name for these types of "overbearing" and visually busy "TUIs"? It seems like all the other agents have the same aesthetic and it is unlike traditional nurses or plain text interfaces in a bad way IMO. The constant spinners, sidebars and needless margins are a nuisance to me. Especially over an ssh connection in a tmux session it feels wrong.
I’m a little surprised by your description of constant releases and instability. That matches how I would describe Claude Code, and has been one of the main reasons I tend to use OpenCode more than Claude Code.
OpenCode has been much more stable for me in the 6 months or so that I’ve been comparing the two in earnest.
I use Droid specifically because Claude Code breaks too often for me. And then Droid broke too (but rarely), and I just stuck to not upgrading (like I don't upgrade WebStorm. Dev tools are so fragile)
I’ve been testing opencode and it feels TUI in appearance only. I prefer commandline and TUIs and in my mind TUI idea is to be low level, extremely portable interface and to get out of the way. Opencode does not have low color, standard terminal theme so had to switch to a proper terminal program. Copy paste is hijacked so I need to write code out to file in order to get a snippet. The enter key (as in the return by the keypad) does not work for sending a line. I have not tested but don’t think this would work over SSH even. I have been googling around to find if I am holding it wrong but it feels to break expectations of a terminal app in a way that I wish they would have made it a gui. Makes me sad because I think the goods are there and it’s otherwise good.
I don’t think good TUI’s are the same as good command line programs. Great tui apps would to me be things like Norton/midnight commander, borlands turbo pascal, vim, eMacs and things like that
Yes cli and tui are not the same, but I expect TUI to work decent in general terminal emulator and not acitvely block copying and pasting. Having to install supported terminal emulator goes against the vibe.
Yeah every time I want to like it, scrolling is glitched vs codex and Claude. And other various things like: why is this giant model list hard coded for ollama or other local methods vs loading what I actually have...
On top of that. Open code go was a complete scam. It was not advertised as having lower quality models when I paid and glm5 was broken vs another provider, returning gibberish and very dumb on the same prompt
The biggest reason is I don't like being locked into an ecosystem. I can use whatever I want with OpenCode, not so much with Codex and Claude Code. Right now I'm only using GPT with it, but I like the option.
CC I have the least experience with. It just seemed buggy and unpolished to me. Codex was fine, but there was something about it that just didn't feel right. It seemed fined for code tasks but just as often I want to do research or discuss the code base, and for whatever reason I seemed to get terse less useful answers using Codex even when it's backed by the same model.
OpenCode works well, I haven't had any issues with bugs or things breaking, and it just felt comfortable to use right from the jump.
That is very disappointing coz I've been wanting to try an alternative to Gemini CLI for exactly these reasons. The AI is great but the actual software is a buggy, slow, bloated blob of TypeScript (on a custom Node runtime IIUC!) that I really hate running. It takes multiple seconds to start, requires restarting to apply settings, constantly fucks up the terminal, often crashes due to JS heap overflows, doesn't respect my home dir (~/.gemini? Come on folks are we serious?), has an utterly unusable permission system, etc etc. Yet they had plenty of energy to inject silly terminal graphics and have dumb jokes and tips scroll across the screen.
Is Claude Code like this too? I wonder if Pi is any better.
A big downside would be paying actual cost price for tokens but on the other hand, I wouldn't be tied to Google's model backend which is also extremely flaky and unable to meet demand a lot of the time. If I could get real work done with open models (no idea if that's the case yet) and switch providers when a given provider falls over, that would be great.
Claude will also happily write a huge pile of junk into your home directory, I am sad to report. The permissions are idiotic as well, but I always use it in a container anyway. But I have not had it crash and it hasn't been slow starting for me.
> they're constantly releasing at an extremely high cadence, where they don't even spend the time to test or fix things
Tbf, this seems exactly like Claude Code, they are releasing about one new version per day, sometimes even multiple per day. It’s a bit annoying constantly getting those messages saying to upgrade cc to the latest version
This is why I'm taking a wait-and-see approach to these tools on HN myself. My month with Claude Code (the TUI, not the GUI) was amazing from an IT POV, just slop-generating niche tools I could quickly implement and audit (not giant-ass projects), but I ain't outsourcing that to another company when Qwen et al are right there for running on my M1 Pro or RTX 3090.
I'm looking forward to more folks building these kinds of tools with a stronger focus on portability via API or loading local models, as means of having a genuinely useful assistant or co-programmer rather than paying some big corp way too much money (and letting them use my data) for roughly the same experience.
Yeah I tried using it when oh-my-opencode (now oh-my-openagent) started popping off and found it had highly unstable. I just stick with internal tooling now.
For serious coding work I use the Zed Agent; for everything else I use pi with a few skills. Overall, though, I'd recommend Pi plus a few extensions for any features you miss extremely highly. It's also TypeScript, but doesn't suffer from the other problems OC has IME. It's a beautiful little program.
Big +1 to Pi[1]. The simplicity makes it really easy to extend yourself too, so at this point I have a pretty nice little setup that's very specific to my personal workflows. The monorepo for the project also has other nice utilities like a solid agent SDK. I also use other tools like Claude Code for "serious" work, but I do find myself reaching for Pi more consistently as I've gotten more confident with my setup.
pi.dev is worth checking out. The basic idea is they provide a minimalist coding agent that's designed to be easy to extend, so you can tailor the harness to suit your needs without any bloat.
One of the best features is they haven't been noticed by Anthropic yet so you can still use your Claude subscription.
I've been building VT Code (https://github.com/vinhnx/vtcode), a Rust-based semantic coding agent. Just landed Codex OAuth with PKCE exchange, credentials go into the system keyring.
I build VT Code with Tree-sitter for semantic understanding and OS-native sandboxing. It's still early but I confident it usable. I hope you'll give it a try.
I tried it briefly and the practice - argued for strategy for operation actually - to override my working folder seelction and altering to the parent root git folder is a no go.
Isn't this pretty much the standard across projects that make heavy use of AI code generation?
Using AI to generate all your code only really makes sense if you prioritize shipping features as fast as possible over the quality, stability and efficiency of the code, because that's the only case in which the actual act of writing code is the bottleneck.
I don't think that's true at all. As I said, in a response to another person blaming it on agentic coding above, there are a very large number of ways to use coding agents to make your programs faster, more efficient, more reliable, and more refined that also benefit from agents making the code writing research, data piping, and refactoring process quicker and less exhausting. For instance, by helping you set up testing scaffolding, handling the boilerplate around tests while you specify some example features or properties you want to test and expands them, rewriting into a more efficient language, large-scale refactors to use better data structures or architectures, or allowing you to use a more efficient or reliable language that you don't know as well or find to have too much boilerplate or compiler annoyance to otherwise deal with yourself. Then there are sort of higher level more phenomenological or subjective benefits, such as helping you focus on the system architecture and data flow, and only zoom in on particular algorithms or areas of the code base that are specifically relevant, instead of forever getting lost in the weeds of thinking about specific syntax and compiler errors or looking up a bunch of API documentation that isn't super important for the core of what you're trying to do and so on.
Personally, I find this idea that "coding isn't the bottleneck" completely preposterous. Getting all of the API documentation, the syntax, organizing and typing out all of the text, finding the correct places in the code base and understanding the code base in general, dealing with silly compiler errors and type errors, writing a ton of error handling, dealing with the inevitable and inoraticable boilerplate of programming (unless you're one of those people that believe macros are actually a good idea and would meaningfully solve this), all are a regular and substantial occurrence, even if you aren't writing thousands of lines of code a day. And you need to write code in order to be able to get a sense for the limitations of the technology you're using and the shape of the problem you're dealing with in order to then come up with and iterate on a better architecture or approach to the problem. And you need to see your program running in order to evaluate whether it's functionality and design a satisfactory and then to iterate on that. So coding is actually the upfront costs that you need to pay in order to and even start properly thinking about a problem. So being able to get a prototype out quickly is very important. Also, I find it hard to believe that you've never been in a situation where you wanted to make a simple change or refactor that would have resulted in needing to update 15 different call sites to do properly in a way that was just slightly variable enough or complex enough that editor macros or IDE refactoring capabilities wouldn't be capable of.
That's not to mention the fact that if agentic coding can make deploying faster, then it can also make deploying the same amount at the same cadence easier and more relaxing.
You're both right. AI can be used to do either fast releases or well designed code. Don't say both, as you're not making time, you're moving time between those two.
Which one you think companies prefer? Or if you're a consulting business, which one do you think your clients prefer?
> AI can be used to do either fast releases or well designed code
I have yet to actually see a single example of the latter, though. OpenCode isn't an isolated case - every project with heavy AI involvement that I've personally examined or used suffers from serious architectural issues, tons of obvious bugs and quirks, or both. And these are mostly independent open source projects, where corporate interests are (hopefully) not an influence.
I will continue to believe it's not actually possible until I am proven wrong with concrete examples. The incentives just aren't there. It's easy to say "just mindlessly follow X principle and your software will be good", where X is usually some variation of "just add more tests", "just add more agents", "just spend more time planning" etc. but I choose to believe that good software cannot be created without the involvement of someone who has a passion for writing good software - someone who wouldn't want to let an LLM do the job for them in the first place.
> It's easy to say "just mindlessly follow X principle and your software will be good", where X is usually some variation of "just add more tests", "just add more agents", "just spend more time planning" etc
That's a complete strawman of what I — or others trying to learn how to use coding agents to increase quality, like Simon Willison or the Oxide team — am saying.
> but I choose to believe that good software cannot be created without the involvement of someone who has a passion for writing good software - someone who wouldn't want to let an LLM do the job for them in the first place.
This is just a no true Scotsman. I prefer to use coding agents because they don't forget details, or get exhausted, or overwhelmed, or lazy, or give up, ever — whereas I might. Therefore, they allow me to do all of the things that improve code and software quality more extensively and thoroughly, like refactors, performance improvements, and tests among other things (because yes, there is no single panacea). Furthermore, I do still care about the clarity, concision, modularity, referential transparency, separation of concerns, local reasonability, cognitive load, and other good qualities of the code, because if those aren't kept up a) I can't review the code effectively or debug things as easily when they go wrong, b) the agent itself will struggle to male changes without breaking other things, and struggle to debug, c) those things often eventually effect the quality of the end state software.
Additionally, what you say is empirically false. Many people who do deeply value quality software and code quality, such as the creators of Flask, Redis, and SerenityOS/Ladybird, all use and value agentic coding.
Just because you haven't seen good quality software with a large amount of agentic influence doesn't mean it isn't possible. That's very close minded.
Show me an example then. I want to see an example of quality software that makes heavy use of AI generated code (as in, basically written entirely by AI similar to OpenCode), led by developer(s) who care deeply about software quality but still choose to not write code themselves.
I tried running Opencode on my 7$/yr 512mb vps but it had the OOM issue and yes it needs 1GB of ram or more.
I then tried running other options like picoclaw/picocode etc but they were all really hard to manage/create
The UI/UX I want is that I can just put my free openrouter api key in and then I am ready to go to get access to free models like Arcee AI right now
After reading your comments/I read this thread, I tried crush by charmbracelet again and it gives the UI/UX that I want.
I am definitely impressed by crush/ the charm team. They are on HN and they work great for me, highly recommended if you want something which can work on low constrained devices
I do feel like Charm's TUI's are too beautiful in the sense that running a connection over SSH can delay so when I tried to copy some things, the delay made things less copy-able but overall, I think that I am using Crush and I am happy for the most part :-)
Edit: That being said, just as I was typing this, Crush took all the Free requests from Openrouter that I get for free so it might be a bit of minor issue but overall its not much of an issue from Crush side, so still overall, my point is that Crush is worth checking out
Kudos to the CharmBracelet team for making awesome golang applications!
This is my main problem I have with it: It sends data and loads code left and right by default. For instance, the latest plugin packages are automatically installed on every startup. Their “Zen” provider is enabled by default so you might accidentally upload your code base to their servers. Better yet: The web UI has a button that just uploads the entire session to their servers WITH A SINGLE CLICK for sharing.
The situation is ... pretty bad. But I don’t think this is particularly malicious or even a really well considered stance, but just a compromise in order to move fast and ship useful features.
To make it easily adoptable by anyone privacy conscious without hours of tweaking, there should be an effort to massively improve this situation. Luckily, unlike Claude Code, the project is open source and can he changed!
There is some kind of fitting irony around agentic coding harnesses mainly being maintained by coding agents themselves, and as a result they are all a chaotic mess.
The model selection for title generation works as follows (prompt.ts:1956-1960):
1. If the title agent has an explicit model configured — that model is used.
2. Otherwise, it tries Provider.getSmallModel(providerID) — which picks a "small" model from the same provider as the current session, using this priority list (provider.ts:1396-1402):
- claude-haiku-4-5 / claude-haiku-4.5 / 3-5-haiku / 3.5-haiku
- gemini-3-flash / gemini-2.5-flash
- gpt-5-nano
- (Copilot adds gpt-5-mini at the front; opencode provider uses only gpt-5-nano)
3. If no small model is found — it falls back to the same model currently being used for the session.
So by default, title generation uses a cheaper/faster small model from the same provider (e.g., Haiku if on Anthropic, Flash if on Google, nano if on OpenAI), and if none are available, it just uses whatever model the user is chatting with. You can also override this entirely by configuring a model on the title agent.
When I did this, I used a single local llama.cpp server instance as my main model without setting a small model and it did not use it for chat titles while I used it for prompts.
Chat titles would work even when the local llama.cpp server hadn't started, and it was never in the the llama.cpp logs, it used an external model I hadn't set up and had not intended to use.
It was only when I set `small_model` that I was able to route title generation to my own models.
Fwiw this got changed about a week ago, where they changed the logic to match the documentation rather than default to sending your prompts to their servers. This is why so many people have noticed this happening but if you ask an AI about it right now it will say this is not true.
Personally I think it's necessary to run opencode itself inside a sandbox, and if you do that you can see all of the rejected network calls it's trying to make even in local mode. I use srt and it was pretty straightforward to set up
Also, even when using local models in ollama or lmstudio, prompts are proxied via their domain, so never put anything sensitive even when using local setup
To be clear, that seems to be about the webui only, the TUI doesn't seem affected. I haven't fully investigated this myself, but when I run opencode (1.2.27-a6ef9e9-dirty) + mitmproxy and using LM Studio as the backend, when starting opencode + executing a prompt, I only see two requests, both to my LM Studio instance, both normal inference requests (one for the chat itself + one for generating the title).
Everything you read on the internet seems exaggerated today. Especially true for reddit, and especially especially true for r/LocalLllama which is a former shadow of itself. Today it's mostly sockpuppets pushing various tools and models, and other sockpuppets trying to push misinformation about their competitors tools/models.
Geez there should be a big warning on the tin about this. They’re so neatly integrated with copilot that I assumed (and told others) that they had all the privacy guarantees of copilot :(
I can tell that you’re doing all of this in the name of first-use UX. It’s working: The out of the box experience is really seamless.
But for serious (“grown up”) use, stuff like this just doesn’t fly. At all. We have to know and be able to control exactly where data gets sent. You can’t just exfiltrate our data to random unvetted endpoints.
Given the hurt trust of the past, there also needs to be a communication campaign (“actually we’re secure now”), because otherwise people will keep going around claiming that OpenCode sends all of your data to Grok. This would really unnecessarily hurt the project in the long run.
More importantly, the current dev branch source for packages/opencode/src/session/summary.ts shows summarizeMessage() now only computes diffs and updates the message summary object; it does not make an LLM call there anymore. The current code path calls summarizeSession() and summarizeMessage(), and summarizeMessage() just filters messages, computes diffs, sets userMsg.summary.diffs, and saves the message.
Yikes... sending prompts to a third party by default with no disclosure in the setup flow is a rough look for a tool that positions itself as the open sources alternative. "Open" loses meaning fast if the defaults work against the user.
To the provider you select in the UI, I agree. But OpenCode automatically sends prompts to their free "Zen" proxy, even without choosing it in the UI.
Imagine someone using it at work, where they are only allowed to use a GitHub Copilot Business subscription (which is supported in OpenCode). Now they have sent proprietary code to a third party, and don't even know they're doing it.
This is exactly me considering what I might have leaked to god knows who via grok. I was hyped by opencode but now I’m thinking of alternatives. A huge red flag… at best irresponsible?
Are you using Grok for the coding? Because I have Copilot connected and I can see the request to Copilot for the summaries - with no "small model" setting even visible in my settings.
I found out about OpenCode through the Anthropic feud. I now spend most of my AI time in it, both at work and at home. It turns out to be pretty great for general chat too, with the ability to easily integrate various tools you might need (search being the top one of course).
I have things to criticize about it, their approach to security and pulling in code being my main one, but over all it’s the most complete solution I’ve found.
They have a server/client architecture, a client SDK, a pretty good web UI and use pretty standard technologies.
The extensibility story is good and just seems like the right paradigms mostly, with agents, skills, plugins and providers.
They also ship very fast, both for good and bad, I’ve personally enjoyed the rapid improvements (~2 days from criticizing not being able to disable the default provider in the web ui to being able to).
I think OpenCode has a pretty bright future and so far I think that my issues with it should be pretty fixable. The amount of tasteful choices they’ve made dwarfs the few untasteful ones for me so far.
The team also is not breathlessly talking about how coding is dead. They have pretty sane takes on AI coding including trying to help people who care about code quality.
I do like OpenCode, and have been using it in and off since last July. But I feel like they’re trying to stuff too much GUI into a TUI? Due to this I find myself using Codex and Pi more often. But am still glad OpenCode and their Zen product exist.
opencode stands out as one of the few agents with a proper client server architecture that allows something like openchambers great vscode extension so its possible to seamlessly switch between tui, vscode, webapp, desktop app. i think there is hardly a usable alternative for most coding agent usecases (assuming agents from model providers are a no go, they cannot be allowed to own the tools AND the models). But its also far from perfect: the webui is secretly served from their servers instead of locally for no reason. worse the fallback route gets also sent to their servers so any unknown request to opencode api ends up being sent to opencode servers potentially leaking data. the security defaults are horrific, its impossible to use it safely outside a controlled container. it will just serve your whole hard drive via rest endpoint and not constrain to project folders. the share feature uploading your conversations to their servers is also so weirdly communicated and implemented that it leaves a bad taste. I dont think this will become much better until the agent ecosystem is more modular and less monolith, acp, a2a and mcp need to become good enough so tools, prompts, skills, subagent setups and workflow engines and UIs are completely swappable and the agent core has to only focus on the essentials like runtime and glue architecture. i really hope we dont see all of these grow into full agent oses with artificial lock in effects and big effort buy in.
The Agent that is blacklisted from Anthropic AI, soon more to come.
I really like how their subagents work, as a bonus I get to choose which model is in which agent. Sadly I have to resort to the mess that Anthropic calls Claude Code
They are not blacklisted. You are allowed to use the API at commercial usage pricing. You are just not allowed to use your Claude Code subscription with OpenCode (or any other third‑party harness for the record).
If you're not paying full-fat API prices, then probably.
From what I've heard, the metrics used by Anthropic to detect unauthorized clients is pretty easy to sidestep if you look at the existing solutions out there. Better than getting your account banned.
The highest in in the industry for API pricing right now is GPT-5.4-Pro, OpenRouter adding that as an option in their Auto Router was when I had to go customise the routing settings because it was not even close to providing $30/m input tokens and $180/m output tokens of value (for context Opus 4.6 is $5/m input and $25/m output)
(Ok, technically o1-pro is even more expensive, but I'm assuming that's a "please move on" pricing)
Sometimes people want to be real pedants about licensing terms when it comes to OSS, assuming such terms are completely bulletproof, other times people don't think the terms of their agreement with a service provider should have any force at all.
With Anthropic, you either pay per token with an API key (expensive), or use their subscription, but only with the tools that they provide you - Claude, Claude Cowork and Claude Code (both GUI and CLI variants). Individuals generally get to use the subscriptions, companies, especially the ones building services on top of their models, are expected to pay per token. Same applies to various third party tools.
The belief is that the subscriptions are subsidized by them (or just heavily cut into profit margins) so for whatever reason they're trying to maintain control over the harness - maybe to gather more usage analytics and gain an edge over competitors and improve their models better to work with it, or perhaps to route certain requests to Haiku or Sonnet instead of using Opus for everything, to cut down on the compute.
Given the ample usage limits, I personally just use Claude Code now with their 100 USD per month subscription because it gives me the best value - kind of sucks that they won't support other harnesses though (especially custom GUIs for managing parallel tasks/projects). OpenCode never worked well for me on Windows though, also used Codex and Gemini CLI.
>or perhaps to route certain requests to Haiku or Sonnet instead of using Opus for everything, to cut down on the compute
You can point Claude Code at a local inference server (e.g. llama.cpp, vLLM) and see which model names it sends each request to. It's not hard to do a MITM against it either. Claude Code does send some requests to Haiku, but not the ones you're making with whatever model you have it set to - these are tool result processing requests, conversation summary / title generation requests, etc - low complexity background stuff.
Now, Anthropic could simply take requests to their Opus model and internally route them to Sonnet on the server side, but then it wouldn't really matter which harness was used or what the client requests anyway, as this would be happening server-side.
Sounds pretty sane, the same way how OpenWebUI and probably other software out there also has a concept of “tool models”, something you use for all the lower priority stuff.
Actually curious to hear what others think about why Anthropic is so set on disallowing 3rd party tools on subscriptions.
The sota models are largely undifferentiated from each other in performance right now. And it’s possible open weight models will get “good enough” relatively soonish. This creates a classic case where inference becomes a commodity. Commodities have very low margins. Training puts them in an economic hole where low margins will kill them.
So they have to move up the stack to higher margin business solutions. Which is why they offer subsidized subscription plans in the first place. It’s a marketing cost. But they want those marketing dollars to drive up the stack not commodity inference use cases.
Anthropic's model deployments for Claude Code are likely optimized for Claude Code. I wouldn't be surprised if they had optimizations like sharing of system prompt KV-cache across users, or a speculative execution model specifically fine-tuned for the way Claude Code does tool calls.
When setting your token limits, their economics calculations likely assume that those optimizations are going to work. If you're using a different agent, you're basically underpaying for your tokens.
It’s probably a mixture of things including direct control over how the api is called and used as pointed out above and giving a discount for using their ecosystem. They are in fact a business so it should not surprise anyone they act as one.
It might well be a mixture, but 95% of that mixture is vendor lock in. Same reason they don't support AGENTS.md, they want to add friction in switching.
It's very straightforward to instrument CC under tmux with send-keys and capturep. You could easily use that for distillation, IMO. There are also detailed I/O logs.
Yup. And right now I'm straight-up breaking Claude's TOS by modifying OpenCode to still accept tokens. But I only have a few days left and don't care if they ban me. I'm using what I paid for.
Anthropic has an API, you can use any client but they charge per input/output/cache token.
One-price-per-month subscriptions (Claude Code Pro/MAX @ $20/$100/$200 a month) use a different authentication mechanism, OAUTH. The useful difference is you get a lot more inference than you can for the same cost using the API but they require you to use Claude Code as a client.
Some clients have made it simple to use your subscription key with them and they are getting cease and desist letters.
I pay $100/mo to Anthropic. Yesterday I coded one small feature via an API key by accident and it cost $6. At this rate, it will cost me $1000/mo to develop with Opus. I might as well code by hand, or switch to the $20 Codex plan, which will probably be more than enough.
I'd rather switch to OpenAI than give up my favorite harness.
My monthly "connection fee" is more than that (no solar, just EV). Your cartel needs to step it up!
For me it's $0.8/kWh during peak, $0.47 off peak, and super off peak of $0.15. I accidentally left a little mini 500W heater on all day, while I was out, costing > 5% of your whole month!
Yeah I had a similar experience one time. Which is why I laugh when people suggest Anthropic is profitable. Sure, maybe if everyone does API pricing. Which they won’t because it’s so damn expensive. Another way to think about it is API pricing is a glimpse into the future when everyone is dependent on these services and the subscription model price increases start.
probably more agents to be blocked by anthropic. i've seen theo from t3.gg go through a bunch of loopholes to support claude in his t3code app just so anthropic doesn't sue their asses.
There are boards starting in the $1500-$2000 range, and complete systems in the $2500-$2700 range. I actually don't know of any Strix Halo mini PCs that cost $3000, do you?
EDIT: The system I bought last summer for $1980 and just took delivery of in October, Beelink GTR 9 Pro, is now $2999.... wow...
JS is not something that was developed with CLI in mind and on top of that that language does not lend itself to be good for LLM generation as it has pretty weak validation compared to e.g. Rust, or event C, even python.
It’s simply one of the most productive languages. It actually has a very strong type system, while still being a dynamic language that doesn’t have to be compiled, leading to very fast iteration. It’s also THE language you use when writing UIs. Execution is actually pretty fast through the runtimes we have available nowadays.
The only other interpreted language is Python and that thoroughly feels like a toy in comparison (typing situation still very much in progress, very weak ORM situation, not even a usable package manger until recently!).
I'm unsure that I agree with this, for my smaller tools with a UI I have been using rust for business logic code and then platform native languages, mostly swift/C#.
I feel like with a modern agentic workflow it is actually trivial to generate UIs that just call into an agnostic layer, and keeping time small and composable has been crucial for this.
That way I get platform native integration where possible and actual on the metal performance.
If Python has a "very weak ORM situation", what is it about the TS ORM scene that makes it stronger by comparison? Is there one library in particular that stands out?
pnpm is amazing for speed and everybody should use it! but even with npm before it, at least it was correct. I had very few (none?) mysterious issues with it that could only be solved by nuking the entire environment. That is more than I can say about the python package managers before uv.
For a TUI agent, runtime performance is not the bottleneck, not by far. Hackability is the USP. Pi has extensions hotreloading which comes almost for free with jiti. The fact that the source is the shipped artifact (unlike Go/Rust) also helps the agent seeing its own code and the ability to write and load its own extensions based on that. A fact that OpenClaw’s success is in part based on IMO.
I can’t find the tweet from Mario (the author), but he prefers the Typescript/npm ecosystem for non-performance critical systems because it hits a sweet spot for him. I admire his work and he’s a real polyglot, so I tend to think he has done his homework. You’ll find pi memory usage quite low btw.
OK, make sense, but there are also claw clones that are in Rust (and self modifying).
Also python ones would also allow self modifying. I'm always puzzled (and worried) when JS is used outside of browsers.
I'm biased as I find JS/TS rather ugly language compared to anything other basically (PHP is close second). Python is clean, C has performance, Rust is clean and has performance, Java has the biggest library and can run anywhere.
In pi’s case there is a plugin system. It’s much easier to make a self extending agent work with Python or JavaScript than most other languages. JavaScript has the benefit that it has a great typing system on top with TypeScript.
Pi is refreshingly minimal in terms of system prompts, but still works really well and that makes me wonder whether other harnesses are overdoing. Look at OpenCode's prompts, for instance - long, mostly based on feels and IMO unnecessary. I would've liked to just overwrite OC's system prompts with Pi's (to get other features that Pi doesn't have) but that isn't possible today (without maintaining a custom fork)
I just found out about pi yesterday. It's the only agent that I was able to run on RISC-V. It's quite scary that it runs commands without asking though.
The simplicity of extending pi is in itself addictive, but even in its raw form it does the job well.
Before finding pi I had written a lot of custom stuff on top of all the provider specific CLI tools (codex, Claude, cursor-agent, Gemini) - but now I don’t have to anymore (except if I want to use my anthropic sub, which I will now cancel for that exact reason)
Pi is good stuff and refreshingly simple and malleable.
I used it recently inside a CI workflow in GitLab to automatically create ChangeLog.md entries for commits. That + Qwen 3.5 has been pretty successful. The job starts up Pi programatically, points it at the commits in question, and tells it to explore and get all the context it needs within 600 seconds... and it works. I love that this is possible.
I love OpenCode! I wrote a plugin that adds two tools: prune and retrieve. Prune lets the LLM select messages to remove from the conversation and replace with a summary and key terms. The retrieve tool lets it get those original messages back in case they're needed. I've been livestreaming the development and using it on side projects to make sure it's actually effective... And it turns out it really is! It feels like working with an infinite context window.
Long tool outputs/command outputs everything in my harness is spilled over to the filesystem. Context messages are truncated and split to filesystem with a breadcrumb for retrieving the full message.
The infinite context window framing is the right way to think about it. Running inside Claude Code continuously, the prune step matters more than retrieve in practice — most of what gets dropped stays dropped. More useful is being deliberate about what goes in at the start of each loop iteration rather than managing what comes out at the end.
Assuming you pay per token, which seems like a really strange workflow to lock yourself into at this point. Neither paid monthly plans nor local models suffer from that issue.
I tried once to use APIs for agents but seeing a counter of money go up and eventually landing at like $20 for one change, made it really hard to justify. I'd rather pay $200/month before I'd be OK with that sort of experience.
Yes I use the $200 per month plan for Claude Code and it's amazing
I assume the usage varies based on prompt caching, but I could be wrong. Why would you assume prompt caching would have zero effect on the subscription usage?
The $20-per-change problem is a workflow problem, not a pricing problem. Batching work into larger well-scoped sessions rather than interactive back-and-forth changes the unit economics significantly. Most people use these tools like a terminal — one command at a time — which is the worst possible cost profile.
I’ve been extraordinarily productive with this, their $10 Go plan, and a rigorous spec-driven workflow. Haven’t touched Claude in 2 months.
I sprinkle in some billed API usage to power my task-planner and reviewer subagents (both use GPT 5.4 now).
The ability to switch models is very useful and a great learning experience. GLM, Kimi and their free models surprised me. Not the best, not perfect, but still very productive. I would be a wary shareholder if I owned a stake in the frontier labs… that moat seems to be shrinking fast.
It's been a moving target for years at this point.
Both open and closed source models have been getting better, but not sure if the open source models have really been closing the gap since DeepSeek R1.
But yes: If the top closed source models were to stop getting better today, it wouldn't take long for open source to catch up.
The moat is having researchers that can produce frontier models. When OpenCode starts building frontier models, then I'd be worried; otherwise they're just another wrapper
"OpenCode Go" (a subscription) lets you use lots of hosted open-weights frontier AI models, such as GLM-5 (currently right up there in the frontier model leaderboards) for $10 per month.
Can you talk more about how you leverage higher quality models for the stuff that counts? Anywhere I can read more on the philosophy of when to use each?
Sure happy to share. It’s been trial and error, but I’ve learned that for agents to reliably ship a large feature or refactor, I need a good spec (functional acceptance criteria) and I need a good plan for sequencing the work.
The big expensive models are great at planning tasks and reviewing the implementation of a task. They can better spot potential gotchas, performance or security gaps, subtle logic and nuance that cheaper models fail to notice.
The small cheap models are actually great (and fast) at generating decent code if they have the right direction up front.
So I do all the spec writing myself (with some LLM assistance), and I hand it to a Supervisor agent who coordinates between subagents. Plan -> implement -> review -> repeat until the planner says “all done”.
I switch up my models all the time (actively experimenting) but today I was using GPT 5.4 for review and planning, costing me about $0.4-$1 for a good sized task, and Kimi for implementation. Sometimes my spec takes 4-5 review loops and the cost can add up over an 8 hour day. Still cheaper than Claude Max (for now, barely).
Each agent retains a fairly small context window which seems to keep costs down and improves output. Full context can be catastrophic for some models.
As for the spec writing, this is the fun part for me, and I’ve been obsessing over this process, and the process of tracking acceptance criteria and keeping my agents aligned to it. I have a toolkit cooking, you can find in my comment history (aiming to open source it this week).
I'm building a full stack web app, simple but with real API integrations with CC.
Moving so fast that I can barely keep a hold on what I'm testing and building at the same time, just using Sonnet. It's not bad at all. A lot of the specs develop as I'm testing the features, either as an immediate or a todo / gh issue.
I don't use it for coding but as an agent backend. Maybe opencode was thought for coding mainly, but for me, it's incredibly good as an agent, especially when paired with skills, a fastapi server, and opencode go(minimax) is just so much intelligence at an incredibly cheap price. Plus, you can talk to it via channels if you use a claw.
I'd really like to get more clarification on offline mode and privacy. The github issues related to privacy did not leave a good feeling, despite being initially excited. Is offline mode a thing yet? I want to use this, but I don't want my code to leave my device.
The only thing I'm wondering is if they have eval frameworks (for lack of a better word). Their prompts don't seem to have changed for a while and I find greater success after testing and writing my own system prompts + modification to the harness to have the smallest most concise system prompt + dynamic prompt snippets per project.
I feel that if you want to build a coding agent / harness the first thing you should do is to build an evaluation framework to track performance for coding by having your internal metrics and task performance, instead I see most coding agents just fiddle with adding features that don't improve the core ability of a coding agent.
I've forked it locally, to be honest I haven't merged upstream in a while as I haven't seen any commits that I found relevant and would improve my usage, they seem to work on the web and desktop version which I don't use.
The changes I've made locally are:
- Added a discuss mode with almost on tools except read file, ask tool, web search only based no heuristics + being able to switch from discuss to plan mode.
Experiments:
- hashline: it doesn't bring that much benefit over the default with gpt-5.4.
- tried scribe [0]: It seems worth it as it saves context space but in worst case scenarios it fails by reading the whole file, probably worth it but I would need to experiment more with it and probably rewrite some parts.
The nice thing about opencode is that it uses sqlite and you can do experiments and then go through past conversation through code, replay and compare.
Now I just started looking into OpenCode yesterday, but seems you can override the system prompts by basically overloading the templates used in for example `~/.opencode/agents/build.md`, then that'd be used instead of the default "Build" system prompt.
At least from what I gathered skimming the docs earlier, might not actually work in practice, or not override all of it, but seems to be the way it works.
I wish the team would be more responsive to popular issues - like inability to provide a dynamic api key helper like claude has. This one even has a PR open: https://github.com/anomalyco/opencode/issues/1302
i've been using this as my primary harness for llama.cpp models, Claude, and Gemini for a few months now. the LSP integration is great. i also built a plugin to enable a very minimal OpenClaw alternative as a self modifying hook system over IPC as a plugin for OpenCode: https://github.com/khimaros/opencode-evolve -- and here's a deployment ready example making use of it which runs in an Incus container/VM: https://github.com/khimaros/persona
Very cool! I have been using opencode, as almost everybody else in the lab is using codex. I found the tools thing inside your own repo amazing but somehow I could not get it to reliably get opencode to write its own tools. Seems also a bit scary as there is pretty much not much security by default. I am using it in a NixOS WSL2 VM
I'm actually moving to containerised isolation. I realised the agents waste too much time trying to correctly install dependencies, not unlike a normal nixos user.
I've used both. I stuck with Claude Code, the ergonomics are better and the internals are clearly optimized for Opus which I use daily, you can feel it. That said OpenCode is still a very good alternative, well above Codex, Gemini CLI or Mistral Vibe in my experience.
What would be the advantage using this over say VSCode with Copilot or Roo Code? I need to make some time to compare, but just curious if others have a good insight on things.
I started out using VSCode with their Claude plugin; it seemed like a totally unnecessary integration. A better workflow seems to just run Claude Code directly on my machine where there are fewer restrictions - it just opens a lot more possibilities on what it can do
Ok I get it now, same with the vim comment above, it seems VSCode has the more IDE setup while OpenCode is giving the vim nerdtree vibe? I'll have to take a look, it makes sense to possibly have both for different use cases I guess.
Stupid question, but are there models worth using that specialize in a particular programming language? For instance, I'd love to be able to run a local model on my GPU that is specific to C/C++ or Python. If such a thing exists, is it worth it vs one of the cloud-based frontier models?
I'm guessing that a model which only covers a single language might be more compact and efficient vs a model trained across many languages and non-programming data.
Months ago I tested a concept revolving this issue and made a weird MCP-LSP-LocalLLM hybrid thing that attempts to enhance unlucky, fast changing, or unpopular languages (mine attempts with Zig)
I'm currently experimenting with (trying to) fine tune Qwen3.5 to make it better at a given language (Nim in this case); but I am quite bad at this, and honestly am unsure if it's even really fully feasible at the scale I have access to. Certainly been fun so far though, and I have a little Asus GX10 box on the way to experiment some more!
Been playing around with fine-tuning models for specific languages as well (Clojure and Rust mostly), but the persistent problem is high quality data sets, mostly I've been generating my own based on my own repositories and chat sessions, what approach are you taking for gathering the data?
My own experience trying many different models is that general intelligence of the model is more important.
If you want it to stick to better practices you have to write skills, provide references (example code it can read), and provide it with harnessing tools (linters, debuggers, etc) so the agent can iterate on its own output.
OpenCode works awesome for me. The BigPickle model is all I want. I do not throw some large work at the agent that requires lot of reasoning, thinking or decision making. It's my role to chop the work down to bite-size and ask the fantastic BigPickle to just do the damn coding or bit of explaining. It works very well with interactive sessions with small tasks. Not giving something to work over night.
I used Claude with paid subscription and codex as well and settled to OpenCode with free models.
Can someone explain how Claude Code can instantly determine what file I have open and what lines I have selected in VS Code even if it's just running in a VS Code terminal instance, yet I cannot for the life of me get OpenCode to come anywhere close to that same experience?
The OpenCode docs suggest its possible, but it only works with their extension (not in an already open VS Code terminal) with a very specific keyboard shortcut and only barely at that.
What does well: helps context switching by using one window to control many repos with many worktrees each.
What can do better?
It's putting AI too much in control? What if I want to edit a function myself in the workspace I'm working on? or select a snippet and refer that in the promp? without that I feel it's missing a non-negotiable feature.
Do you think the design direction of “chat first” is compatible with editor first? I don’t know if any tools do both well. Seems like a fork in the road, design wise.
Since this is blowing up, gonna plug my opencode/claude-code plugin that allows you to annotate LLMs plans like a Google doc with strikethroughs, comments, etc. and loop with your agent until you're happy with the plan.
The decision to build this as a TUI rather than a web app is interesting. Terminal-native tools tend to get out of the way and let you stay in flow -- curious how the context management works when you have a large codebase, do you chunk by file or do something smarter?
It’s both! The core is implemented as a server and any UI (the TUI being one) can connect to it.
It’s actually “dumber” than any of your suggestions - they just let the agent explore to build up context on its own. “ls” and “grep” are among the most used discovery tools. This works extraordinarily well and is pretty much the standard nowadays because it lets the agent be pretty smart about what context it pulls in.
That's my favorite CLI agent, over codex, claude, copilot and qwen-code.
It has beautified markdown output, much more subagents, and access to free models. Unlike claude and codex. Best is opencode with GitHub opus 4.6, but the fun only lasts for a day, then you're out of tokens for a month.
I use it with Qwen 3.5 running locally when my daily limits run out on my other subscriptions.
The harness is great. Local models are just slow enough that the subscription models are easier to use. For most of my tasks these days, the model's capability is sufficient; it is just not as snappy.
Could you say more about the differences between Aider and OpenCode?
I briefly dabbled with Aider some months back but never got any real work done with it. Without installing each one of these new tools I'm having trouble grokking what is changing about them that moves the LLM-assisted software dev experience forward.
One thing I like with Aider is the fact that I can control the context by using /add explicitly on a subset of files. Can you achieve the same wit OpenCode ?
I'm curious: I'venever touched cloud models beyond a few seconds. I run a AMD395+ with the new qwen coder. Is there any intelligence difference, or is it just speed and context? At 128GB, it takes quite awhile before getting context wall.
There's a difference in intelligence. However for 90% of what I'm doing I don't really need it. The online models are just faster.
I just did a one hour vibe session today, ripping out a library dependency and replacing it with another and pushing the library to pypi. I should take my task list and let the local model replicate the work and see how it works out.
The security concerns here are real but not unique to OpenCode. Most AI coding agents have the same fundamental problem: they need broad file system access to be useful, but that access surface is also the attack surface. The config-from-web issue is particularly bad because it's essentially remote code execution through prompt injection.
What I'd want to see from any of these tools is a clear permissions model — which files the agent can read vs write, whether it can execute commands, and an audit log of what it actually did. Claude Code's hooks system at least gives you deterministic guardrails before/after agent actions, but it's still early days for this whole category.
Same thoughts - I wanted a "permission manager" that defines a set of policies agnostic to coding agents. It also comes with "monitor mode" that shows operations blocked, but not quite an audit log yet though.
This is another one of OpenCode’s current weak points in the security complex: They consider permissions a “UX feature” rather than actual guardrails. The reasoning is that you’re giving the agent access to the shell, so it’ll be able to sidestep everything.
This is of course a cop-out: They’re not considering the case in which you’re not blindly doing that.
Fun fact: In the default setup, the agent can fully edit all of the harnesses files, including permissions and session history. So it’s pretty trivial for it to a) escalate privileges and then even b) delete evidence of something nefarious happening.
It’s pretty reckless and even pretty easy to solve with chroot and user permissions. There just has been (from what I see currently) relatively little interest from the project in solving this issue.
Granted, I just started playing around with OpenCode (but been using Codex and Claude Code since they were initially available, so not first time with agents), but anyways:
> they need broad file system access to be useful, but that access surface is also the attack surface
Do they? You give them access to one directory typically (my way is to create a temporary docker container that literally only has that directory available, copied into the container on boot, copied back to the host once the agent completed), and I don't think I've needed them to have "broad file system access" at any point, to be useful or otherwise.
So that leads me to think I'm misunderstanding either what you're saying, or what you're doing?
This is the way. If you’re not running your agent harness/framework in a container with explicit bind mounts or copy-on-build then you’re doing it wrong. Whenever I see someone complain about filesystem access and sequirity risk it’s a clear signal of incompetence imo.
Someone correct me if I'm wrong, but if you're doing bind-mounts, ensure you do read-only, if you're doing bi-directional bind mounts with docker, the agent could (and most likely know how to) create a symlink that allows them to browse outside the bind mount.
That's why I explicitly made my tooling do "Create container, copy over $PWD, once agent completes, copy back to $PWD" rather than the bind-mount stuff.
> create a symlink that allows them to browse outside the bind mount
Could you reproduce that? iiuc the symlink that the agent creates should follow to the path that's still inside the container.
I built a product solving this problem about a year ago, basically a serverless, container-based, NATed VScode where you can eg "run Claude Code" (or this) in your browser on a remote container.
There's a reason I basically stopped marketing it, Cursor took off so much then, and now people are running Claude/Codex locally. First, this is something people only actually start to care about once they've been bitten by it hard enough to remember how much it hurt, and most people haven't got there yet (but it will happen more as the models get better).
Also, the people who simultaneously care a lot about security and systems work AND are AI enthusiasts AND generally highly capable are potentially building in the space, but not really customers. The people who care a lot about security and systems work aren't generally decision makers or enthusiastic adopters of AI products (only just now are they starting to do so) and the people who are super enthusiastic about AI generally aren't interested in spending a lot of time on security stuff. To the extent they do care about security, they want it to Just Work and let them keep building super fast. The people who are decision makers but less on the security/AI trains need to this happen more, and hear about the problem from other executives, before they're willing to spend on it.
To the extent most people actualy care about this, they still want to Just Work like they do now and either keep building super fast or not thinking about AI at all. It's actually extremely difficult to give granular access to agents because the entire point is them acting autonomously or keeping you in a flow state. You either need to have a really compatible threat model to doing so (eg open source work, developer credentials only used for development and kept separate from production/corp/customer data), spend a lot of time setting things up so that agents can work within your constraints (which also requires a willingness to commit serious amounts of time or resources to security, and understanding of it), or spend a lot of time approving things and nannying it.
So right now everybody is just saying, fuck it, I trust Anthropic or Microsoft or OpenAI or Cursor enough to just take my chances with them. And people who care about security are of course appalled at the idea of just giving another company full filesystem access and developer credentials in enterprises where the lack of development velocity and high process/overhead culture was actually of load-bearing importance. But really it's just that secure agentic development requires significant upfront investment in changing the way developers work, which nobody is willing to pay for yet, and has no perfect solutions yet. Dev containers were always a good idea and not that much adopted either, btw.
It takes a lot more investment in actually providing good permissions/security for agent development environments still too, which even the big companies are still working on. And I am still working on it as well. There's just not that much demand for it, but I think it's close.
What caused the switch was that we're building AI solutions for sometimes price-conscious customers, so I was already familiar with the pattern of "Use a superior model for setting a standard, then fine-tuning a cheaper one to do that same work".
So I brought that into my own workflows (kind of) by using Opus 4.6 to do detailed planning and one 'exemplar' execution (with 'over documentation' of the choices), then after that, use Opus 4.6 only for planning, then "throw a load of MiniMax M2.5s at the problem".
They tend to do 90% of the job well, then I sometimes do a final pass with Opus 4.6 again to mop up any issues, this saves me a lot of tokens/money.
This pattern wasn't possible with Claude Code, thus my move to Open Code.
I've used it but recently moved back to plain claude code. We use claude at the company and weirdly the experience has become less and less productive using opencode. I'm a bit sad about it as it was the first experience that really clicked and got great results out of. I'm actually curious if Anthropic knows which client is used and if they negatively influence the experience on purpose. It's very difficult to prove because nothing about this is exact science.
I think Anthropic just highly RL’s their model to work best with it’s Claude Code’s particular ways of going about things.
All the background capability Claude code now has makes things way more complex and I saw a meaningful improvement with 4.6 versus 4.5, so imagine other harnesses will take time to catch up.
I tried to use it but OpenCode won't even open for me on Wayland (Ubuntu 24.04), whichever terminal emulator I use. I wasn't even aware TUI could have compatibility issues with Wayland
Definitely not Wayland related, or so I doubt. I'm on wayland and never had any issues, and it's a TUI, where the terminal emulator does or does not do GPU work. What led you to that conclusion?
> On Linux, some Wayland setups can cause blank windows or compositor errors.
> If you’re on Wayland and the app is blank/crashing, try launching with OC_ALLOW_WAYLAND=1.
> If that makes things worse, remove it and try launching under an X11 session instead.
OC_ALLOW_WAYLAND=1 didn't work for me (Ubuntu 24.04)
Suggesting to use a different display server to use a TUI (!!) seems a bit wild to me. I didn't put a lot of time into investigating this so maybe there is another reason than Wayland. Anyway I'm using Pi now
That issue points out that it is probably a dependency problem.
The other problem is that they let a package manager block the UI and either swallow hard errors or unable to progress on soft errors. The errors are probably (hopefully) in some logs.
A dev oriented TUI should report unrecoverable errors on screen or at least direct you to the logs. It's not easy to get right, but if you dare to do it isn't rocket science either. They didn't dare.
I had to abandon it because of the memory leak, it would fill up all my memory in a matter of minutes. The devs don't seem to pay it much attention: https://github.com/anomalyco/opencode/issues/5363
I've been using opencode for a few months and really like it, both from a UX and a results perspective.
It started getting increasingly flaky with Anthropic's API recently, so I switched back to Claude Code for a couple of days. Oh my, what a night and day difference. Tokens, MCP use, everything.
For anyone reading at OpenAI, your support for OpenCode is the reason I now pay you 200 bucks a month instead.
I've been paying OpenAI 200 bucks a month for what feels like forever by now, but used OpenCode for the first time yesterday, been using Codex (and Claude Code from time to time, to see if they've caught up with Codex) since then.
But I don't use MCP, don't need anything complicated, and not sure what OpenCode actually offers on top. The UI is slightly nicer (but oh so much heavier resource usage), both projects source code seems vibecoded and the architecture is held together with hopes and dreams, but in reality, minor difference really.
Also, didn't find a way in OpenCode to do the "Fast Mode" that Codex has available, is that just not possible or am I missing some setting? Not Codex-Spark but the mode that toggles faster inference.
If it was a somewhat unique name, then yeah maybe. But "opencode" is probably as generic as you could make it, hard to claim to be "squatting" something so well used already... Earliest project on GitHub named "opencode" seems to date back to 2010, but I'm sure there are even earlier projects too: https://github.com/search?q=opencode&type=repositories&s=upd...
you'll be surprised the name was actually a controversy on x/twitter since opencode was originally another dev's idea who joined the charmcli team. they wanted to keep that name but dax somehow (?) ended up squatting it. the charmcli team has renamed their tool to "crush" which matches their other tools a lot better than "opencode"
I'd love for all these tools to standardise on the structure of plugins / skills / commands / hooks etc., so I can swap between them to compare without feeling handicapped!
I wish they would add back support for anthropic max/pro plans via calling the claude cli in -p mode. As I understand thats still very much allowed usage of claude code cli (as you are still using claude cli as it was intended anyway and fixes the issue of cache hits which I believe were the primary reason anthropic sent them the c&d). I love the UX from OpenCode (I loved setting it up in web mode on my home server and code from the web browser vs doing claude code over ssh) but until I can use my pro/max subscription I can't go back, the API pricing is way too much for my third world country wallet.
They had that?! I saw that some people wrote skills and plugins to call claude cli and gemini cli to still be able to use the subscription.
I would also wish that this was supported out of the box, something similar to goose cli providers or acp providers (https://block.github.io/goose/docs/guides/acp-providers).
But I don't want to spend testing yet another agent harness or change the workflow when I somewhat got used to one way of working on things (the churn is real).
- GH copilot API is a first class citizen with access to multiple providers’ models at a very good price with a pro plan
- no terminal flicker
- it seems really good with subagents
- I can’t see any terminal history inside my emacs vterm :(
Question: How do we use Agents to Schedule and Orchestrate Farming and Agricultural production, or Manufacturing assembly machines, or Train rail transportation, or mineral and energy deposit discovery and extraction or interplanetary terraforming and mining, or nuclear reactor modulation, or water desalination automation, or plutonium electric fuel cell production with a 24,000 year half-life radiation decay, or interplanetary colonization, or physics equation creation and solving for faster-than-light travel?
Yeah, support the company that promised to help your government illegally mass surveil and mass kill people, because they support a use case slightly better than the non-mass-murdering option.
You are absolutely correct that both are evil ... as are most corporations.
Still, I feel like "will commit illegal mass murder against their own citizens" is a significant enough degree more evil. I think lots of corporations will help their government murder citizens of other countries, but very few would go so far as to agree to murder their own (fellow) citizens ... just to get a juicy contract.
I see your viewpoint but, to me, "both will happily murder you but one is better because they won't murder ME!" isn't very compelling. Like, I get it, but also it changes nothing for me. They're both bad.
It's not about "won't murder me" it's about "won't murder their own tribe". Humans are very tribal creatures, and we have all sorts of built-in societal taboos about betraying our tribe.
We also have taboos against betraying/murdering/whatever people of other tribes, but those taboos are much weaker and get relaxed sometimes (eg. in war). My point is, it takes significantly more anti-social (ie. evil) behavior to betray your own tribe, in the deepest way possible, than it does to do horrible things to other tribes.
This is just as much true for Russians murdering Ukranians as Ukranians murdering Russians, or any other conflict group: almost all Russians would consider a Russian who helps kill Russians to be more evil than a Russian who kills Ukranians (and vice versa).
Right, but I consider someone who'll murder exclusively other tribes to be infinitely closer to someone who'll murder their own tribe than to someone who won't murder anyone.
That a gross exaggeration. But to your point, I could say the same for almost any product I use from Big Tech, every laptop company I buy my hardware from, etc. I'm sure the same applies to you. I can't fight every vendor all the time. For now I pick what works best for my use case.
You're right, Anthropic shouldn't have even taken a moral stance here at all. They should have just gone full send and allowed everything, because there will never be satisfying some people. Why even try?
Many folks from other tools are only getting exposed to the same functionality they got used to, but it offers much more than other harnesses, especially for remote coding.
You can start a service via `opencode serve`, it can be accessed from anywhere and has great experience on mobile except a few bugs. It's a really good way to work with your agents remotely, goes really well with TailScale.
The WebUI that they have can connect to multiple OpenCode backends at once, so you may use multiple VPS-es for various projects you have and control all of them from a single place.
Lastly, there's a desktop app, but TBH I find it redundant when WebUI has everything needed.
Make no mistakes though, it's not a perfect tool, my gripes with it:
- There are random bugs with loading/restoring state of the session
- Model/Provider selection switch across sessions/projects is often annoying
- I had a bug making Sonnet/Opus unusable from mobile phone because phone's clock was 150ms ahead of laptop's (ID generation)
- Sometimes agent get randomly stuck. It especially sucks for long/nested sessions
- WebUI on laptop just completely forgot all the projects at
one day
- `opencode serve` doesn't pick up new skills automatically, it needs to be restarted
Interesting timing — I've been building on Cloudflare Workers
with edge-first constraints, and the resource footprint of most
AI coding tools is striking by comparison. A TypeScript agent
that uses 1GB+ RAM for a TUI feels like the wrong abstraction.
The edge computing model forces you to think differently about
state, memory, and execution — maybe that's where lighter
agentic tools will emerge.
Being able to assign different models to subagents is the feature I've been wanting. I use Claude Code daily and burning the same expensive model on simple file lookups hurts. Any way to set default model routing rules, or is it manual per task?
With OpenCode, I've found that I can do this by defining agents, assigning each agent a specifically model to use. Then K manually flip to that agent when I want it or define some might rules in my global AGENTS.nd file to gives some direction and OpenCode will automatically subtask out to the agent, which then forces the use of the defined model.
The maintaining team is incredibly petty though. Tantrums when they weren't allowed to abuse Claude subscriptions and had to use the API instead. They just removed API support entirely.
Anthropic has zero problems with API billing, there's no chance they told him to rip that out.
Reading through his X comments and GitHub comments he is behaving immaturely. I don't trust what he's saying here. Ripping out Claude API support was just throwing a tantrum. Weird given his age - he's old enough to be more mature.
‘abuse’. The same rate limits apply, the requests still go to the same endpoints.
Even as a CC user I’m glad someone is forcing the discussion.
My prediction: within two years ‘model neutrality’ will be a topic of debate. Creating lock-in through discount pricing is anti-competitive. The model provider is the ISP; the tool, the website.
> The same rate limits apply, the requests still go to the same endpoints.
That is not the point. That is a mere technicality.
You signed a contract. If you don't ignore the terms of the contract to use the product in a way that is explicitly prohibited, you're abusing the product. It is as simple as that.
They offer a separate product (API) if you don't like the terms of the contract.
Also, if you really want to get technical: the limits are under the assumption that caching works as intended, which requires control of the client. 3P clients suck at caching and increase costs. But that is not the overarching point.
> Creating lock-in through discount pricing is anti-competitive.
Literally everyone does this. OpenAI is doing this with Codex, far more than Anthropic is. It's not great but players much bigger than Anthropic are using discount pricing to create an anti-competitive advantage.
> Because that could be easily resolved by factoring % cache hits into the usage limits.
Absolutely not, you are not thinking from a product perspective at all.
You might not want to capture cache % hits in usage limits because there may be some edge cases you want to support that have low hits even with an optimized client. Maybe your caching strategy isn't perfect yet, so you don't count hits to keep a good product experience going.
OSS clients that freeload on the subscription break your ability to support these use cases entirely. Now you have to count cache hits at the expense of everyone else. It is a classic case of some people ruining the experience for everyone.
> Why is the 'Apple electric company' selling cheaper electricity to households with Apple devices?
Why does Netflix not let you use your OSS hacked client of choice with your subscription?
> Literally everyone does this. OpenAI is doing this with Codex, far more than Anthropic is.
And yet, OpenAI have publicly said they welcome OpenCode users to use their subscription package. So how are they being anti-competitive "far more" than Anthropic?
I had been using open code and admire they effort to create something huge and help a lot of developers around the world, connecting LLM our daily work without use a browser!
The MCP (Model Context Protocol) support is what makes this interesting to me. Most coding agents treat the file system and shell as the only surfaces — MCP opens up the possibility of connecting to any structured data source or API as a first-class tool without custom integration work each time.
Curious how the context window management works in practice. With large repos, the "what files to include" problem tends to dominate — does it have a strategy beyond embedding-based retrieval, or is that the main approach here?
I want to love this, but the "just install it globally, what could go wrong?" is simply not happening for an AI-written codebase. Open Source was never truly "you can trust it because everyone can vet it", so you had to do your due diligence. Now with AI code bases, that's "it might be open source, but no one actually knows how it works and only other AIs can check if it's safe because no one can read the code". Who's getting the data? No idea. How would you find out? I guess you can wireshark your network? This is not a great feeling.
Things that make an an OpenCode fanboy
1. OpenCode source code is even more awesome. I have learned so much from the way they have organized tools, agents, settings and prompts.
2. models.dev is an amazing free resource of LLM endpoints these guys have put together
3. OpenCode Zen almost always has a FREE coding model that you can use for all kinds of work. I recently used the free tier to organize and rename all my documents.
I use bubblewrap. This ensures it only has access to the current working directory and its own configuration. No ability to commit or push (since it doesn't have access to ssh keys) or try to run aws commands (no access to awscli configuration) and so on. It can read anything from my .envrc, since it doesn't have access to direnv or the parent directory. You could lock down the network even further if you wanted to limit web searches.
Honestly I was a Claude code only guy for a while. I switched to opencode and I’m not going back.
IMO, the web UI is a killer feature - it’s got just enough to be an agent manager - without any fluff. I run it on my remote VMs and connect over HTTP.
This is very interesting. This could allow custom harnesses to be used economically with Opus. Depending on the usage limits, this may be cheaper than their API.
You can scroll down literally two messages in the Github issue you linked:
> there isnt any telemetry, the open telemetry thing is if you want to get spans like the ai sdk has spans to track tokens and stuff but we dont send them anywhere and they arent enabled either
> most likely these requests are for models.dev (our models api which allows us to update the models list without needing new releases)
> There is currently no option to change this behavior, no startup flag, nothing. You do not have the option to serve the web app locally, using `opencode web` just automatically opens the browser with the proxied web app, not a true locally served UI.
That is the address of their hosted WebUI which connects to an OpenCode server on your localhost. Would be nice if there was an option to selfhost it, but it is nowhere near as bad as "proxying all requests".
I used Codex for a long time. It's definitely better than Claude Code due to being open source, but opencode is nicer to use. Good hotkeys, plan/build modes, fast and easy model switching, good mcp support. Supports skills, is not the fastest but good enough.
Just a data point, I would need to use it for my workflows. I do have a monorepo with a root level claude.md, and project level claude.md files for backend/frontend.
I use this. I run it in a sandbox[0]. I run it inside Emacs vterm so it's really quick for me to jump back and forth between this and magit, which I use to review what it's done.
I really should look into more "native" Emacs options as I find using vterm a bit of a clunky hack. But I'm just not that excited about this stuff right now. I use it because I'm lazy, that's all. Right now I'm actually getting into woodwork.
I started with Codex, then switched to OpenCode, then switched to Codex.
OpenCode just has more bugs, it's incredibly derivative so it doesn't really do anything else than Codex.
The advantage of OpenCode is that it can use any underlying model, but that's a disadvantage because it breaks the native integration. If you use Opus + Claude Code, or Gpt-Codex + Codex App, you are using it the way it was designed to be used.
If you don't actually use different models, or plan to switch, or somehow value vendor neutrality strategically, you are paying a large cost without much reward.
This is in general a rule, vendor neutrality is often seen as a generic positive, but it is actually a tradeoff. If you just build on top of AWS for example, you make use of it's features and build much faster and simpler than if you use Terraform.
You do not "write" code. Stop these euphemisms. It is an intellectual prosthetic for feeble minded people that plagiarizes code by written by others. And it connects to the currently "free" providers who own the means of plagiarizing.
There is nothing open about it. Please do not abuse the term "open" like in OpenBSD.
What I don't understand is that, if coding agents are making coding obsolete, why do these vibe coders not choose a language that doesn't set their users' compute resources on fire? Just vibe rust or golang for their cli tools, no one reviews code slop nowadays anyway /s.
I do not understand the insistence on using JavaScript for command line tools. I don't use rust at all, but if I'm making a vibe coded cli I'm picking rust or golang. Not zig because coding agents can't handle the breaking changes. What better test of agentic coders' conviction in their belief in AI than to vibe a language they can't read.
Just remember, OpenCode is sending telemetry to their own servers, even when you're using your own locally hosted models. There are no environment variables, flags, or other configuration options to disable this behavior.¹
At least you can easily turn off telemetry in Claude Code - just set CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC to 1.
You can use Claude Code with llama.cpp and vLLM, too right out of the box with no additional software necessary, just point ANTHROPIC_BASE_URL at your inference server of choice, with any value in ANTHROPIC_API_KEY.
Some people think that Anthropic could disable this at any time, but that's not really true - you can disable automatic updates and back up and reuse native Claude Code binaries, ensuring Anthropic cannot change your existing local Claude Code binary's behavior.
With that said, I like the idea of an open source TUI agent that won't spy on me without my consent and no way to disable it much better than a closed source TUI agent that I can effectively neuter telemetry on, but sadly, OpenCode is not the former. It's just another piece of VC-funded spyware that's destined for enshittification.
Are you sure that endpoint is sending all traffic to opencode? I'm not familiar with Hono but it looks like a catch all route if none of the above match and is used to serve the front-end web interface?
OpenCode was the first open source agent I used, and my main workhorse after experimenting briefly with Claude Code and realizing the potential of agentic coding. Due to that, and because it's a popular an open source alternative, I want to be able to recommend it and be enthusiastic about it. The problem for me is that the development practices of the people that are working on it are suboptimal at best; they're constantly releasing at an extremely high cadence, where they don't even spend the time to test or fix things (or even build a proper list of changes for each release), and they add, remove, refine, change, fix, and break features constantly at that accelerated pace.
More than that, it's an extremely large and complex TypeScript code base — probably larger and more complex than it needs to be — and (partly as a result) it's fairly resource inefficient (often uses 1GB of RAM or more. For a TUI).
On top of that, at least I personally find the TUI to be overbearing and a little bit buggy, and the agent to be so full of features that I don't really need — also mildly buggy — that it sort of becomes hard to use and remember how everything is supposed to work and interact.
> Due to that, and because it's a popular an open source alternative, I want to be able to recommend it and be enthusiastic about it. The problem for me is that the development practices of the people that are working on it are suboptimal at best;
This is my experience with most AI tools that I spend more than a few weeks with. It's happening so often it's making me question my own judgement: "if everything smells of shit, check your own shoes." I left professional software engineering a couple of years ago, and I don't know how much of this is also just me losing touch with the profession, or being an old man moaning about how we used to do it better.
It reminds me of social media: there was a time where social media platforms were defined by their features, Vine was short video, snapchat was disappearing pictures, twitter was short status posts etc. but now they're all bloated messes that try do everything.
The same looks to be happening with AI and agent software. They start off as defined by one features, and then become messes trying to implement the latest AI approach (skills, or tools, or functions, or RAG, or AGENTS.md, or claws etc. etc.)
> and (partly as a result) it's fairly resource inefficient (often uses 1GB of RAM or more. For a TUI).
That's (one of the reasons) why I'm favoring Codex over Claude Code.
Claude Code is an... Electron app (for a TUI? WTH?) and Codex is Rust. The difference is tangible: the former feels sluggish and does some odd redrawing when the terminal size changes, while the latter definitely feels more snappy to me (leaving aside that GPT's responses also seem more concise). At some point, I had both chewing concurrently on the same machine and same project, and Claude Code was using multiple GBs of RAM and 100% CPU whereas Codex was happy with 80 MB and 6%.
Performance _is_ a feature and I'm afraid the amounts of code AI produces without supervision lead to an amount of bloat we haven't seen before...
I think you’re confusing capital c Claude Code, the desktop Electron app, and lowercase c `claude`, the command line tool with an interactive TUI. They’re both TypeScript under the hood, but the latter is React + Ink rendered into the terminal.
The redraw glitches you’re referring to are actually signs of what I consider to be a pretty major feature, a reason to use `claude` instead of `codex` or `opencode`: `claude` doesn’t use the alternate screen, whereas the other two do. Meaning that it uses the standard screen buffer, meaning that your chat history is in the terminal (or multiplexer) scrollback. I much prefer that, and I totally get why they’ve put so much effort into getting it to work well.
In that context handling SIGWINCH has some issues and trickiness. Well worth the tradeoff, imo.
Codex is using its app server protocol to build a nice client/server separation that I enjoy on top of the predictable Rust performance.
You can run a codex instance on machine A and connect the TUI to it from machine B. The same open source core and protocol is shared between the Codex app, VS Code and Xcode.
OpenCode works this way too
not sure if same reason but window resize feels better in claude than codex.
on my m1, claude is noticeably slower when starting, but it feels ok after that.
Anthropic needs to spend some tokens rewriting Claude Code in Rust (yes, really).
The difference in feel between Codex and Claude Code is obvious.
The whole thing is vibed anyway, I'm sure they could get it done in a week or two for their quality standards.
Java (incl. Scala, Closure, Groovy, Jython, etc.) is better suited to running as a server. Let agents write clean readable code and leave performance concerns to the JIT compiler. If you really want you can let agents rewrite components at runtime without losing context.
Erlang would offer similar benefits, because what we're doing with these things is more message passing than processing.
Rust is what I'd want agents writing for edge devices, things I don't want to have to monitor. Granted, our devices are edge devices to Anthropic, but they're more tightly coupled to their services.
I'd suggest Go ahead of Rust. It's more accessible to contributors.
I think Go might be a better choice but not for that reason at all.
Go could implement something like this with no dependencies outside the standard library. It would make sense to take on a few, but a comparable Rust project would have at least several dozens.
Also, Go can deliver a single binary that works on every Linux distribution right out of the box. In Rust, its possible but you have to static compile with muslc and that is a far less well-trodden path with some significant differences to the glibc that most Rust libraries have been tested with.
Most of, if not every commit of claude code is now written by claude code itself without any human writing code, only promoting.
Because of all these obvious Go benefits, wonder why they are instead always doing these tools in typescript? Must be some reason?
Because is all the current generation of devs know unfortunately.
Most developers find it more pleasant.
Claude Code is closed source so this isn’t a concern they should have as Opus is great at Rust.
I think go will make it easier for more developers to contribute, bit rust would probably attract higher quality contributions.
If anything, the stricter the compiler the better for vibe coding the language
> It's more accessible to contributors.
What would make go more "accessible to contributors" than Rust?
My personal opinion is that I like Rust much more than Go, but I can’t deny that Rust is a big, and more dauntingly to newcomers, pretty unopinionated language compared to Go.
There are more syntax features, more and more complex semantics, and while rustc and clippy do a great job of explaining like 90% of errors, the remaining 10% suuuuuck.
There’s also some choices imposed by the build system (like cargo allowing multiple versions of the same dep in a workspace) and by the macro system (axum has some unintuitive extractor ordering needs that you won’t find unless you know to look for them), and those things and the hurdles they present become intuitive after a time but just while getting started? Oof
Go is a language one can learn and become functional in an afternoon. Rust is way more involved.
Frankly I don't think one even needs to learn it, if you know a bunch of other languages and the codebase is good. I was able to just make a useful change to an open source project by just doing it, without having written any lines of Go before. Granted the MR needed some revisions.
Rust is my favorite, though. There are values beyond ease of contribution. I can't replicate the experience with a Rust project anymore, but I suspect it would have been tougher.
To vibe coders it doesn't matter, right?
Even then, you still need to read that code and Rust is way less read friendly than Go.
I have the impression that most vibe coders don't read code. I guess they would probably use something accessible to them, just in case.
Successful vibe coders read code.
If you already know some php python javascript and/or c, you can pretty much just wing with Claude code.
Mature tui packages like bubbletea, lipgloss. Besides TS resemblance to go could push the movement of rewrite, not necessarily easier though.
CC isn't foss in the first place, so the previous comment falls short.
> TS resemblance to go
This is the second time I've seen claims like this in the last 24 hours and I'm afraid I might have lost contact with reality.
agents don't really care and they're doing anywhere between 90-100% of the work on CC. if anything, rust is better as it has more built-in verification out of the box.
This is a terrible suggestion.
Rust is accessible to everyone now that Claude Code and Opus can emit it at a high proficiency level.
Rust is designed so the error handling is ergonomic and fits into the flow of the language and the type system. Rust code will be lower defect rate by default.
Plus it's faster and doesn't have a GC.
You can use Rust now even if you don't know the language. It's the best way to start learning Rust.
The learning curve is not as bad as people say. It's really gentle.
Rust is the best AI language. Bar none.
already done, this is what I use now: https://github.com/leonardcser/agent
Claude Code is a Rust app now.
I run many instances of Claude Code simultaneously and have not experienced what you are seeing. It sounds like you have a bias of Rust over Typescript.
No, they are describing a typical experience with the two apps. Just open both apps, run a few queries, and take a look at the difference in resource management yourself. It sounds like you have a bias of Claude Code over Codex.
Uh, it sounds like you're having trouble understanding that people in this thread are talking about two wildly different "claude code" applications. Those who are claiming the resources issues don't apply to them are referring to the cli application, ie: `claude` and those are saying things like "Just open both apps..." are surely referring to their GUI versions.
No, I've never used the GUI version. I literally just had to close and reopen the terminal running the Claude Code CLI on my Mac yesterday because it was taking too many resources. It generally happens when I ask Claude to use multiple sub agents. It's an obvious memory leak.
On the 100% cpu issue, I’m curious to know, what is the processor and was it performing any other cpu intensive work?
Totally agree. I'm baffled by those who don't clearly see that Codex works better than C.C. in many ways.
Codex being faster is not at all equivalent to working better. Claude Code does what I need from it most of the time.
Claude Code is not an electron app.
It does use React for rendering the terminal UI.
Did not realize this. That's bizarre!
I am more concerned about their, umm, gallant approach to security. Not only that OpenCode is permissive by default in what it is allowed to do, but that it apparently tries to pull its config from the web (provider-based URL) by default [1]. There is also this open GitHub issue [2], which I find quite concerning (worst case, it's an RCE vulnerability).
[1] https://opencode.ai/docs/config/#precedence-order
[2] https://github.com/anomalyco/opencode/issues/10939
It also sends all of your prompts to Grok's free tier by default, and the free tier trains on your submitted information, X AI can do whatever they want with that, including building ad profiles, etc.
You need to set an explicit "small model" in OpenCode to disable that.
This. I work on projects that warrant a self hosted model to ensure nothing is leaked to the cloud. Imagine my surprise when I discovered that even though the only configured model is local, all my prompts are sent to the cloud to... generate a session title. Fortunately caught during testing phase.
I’m curious if there’s a reason you’re not just coding in a container without access to the internet, or some similar setup? If I was worried about things in my dev chain accessing any cloud service, I’d be worried about IDE plugins, libraries included in imports, etc. and probably not want internet access at all.
Ok wow.
I mean the default model being Grok, whatever - that everyone sets to their favorite.
But the hidden use of a different model is wow.
Documentation [1] says:
The small_model option configures a separate model for lightweight tasks like title generation. By default, OpenCode tries to use a cheaper model if one is available from your provider, otherwise it falls back to your main model.
I would expect that if you set a local model it would just use the same model. Or if for example you set GPT as main model, it would use something else from OpenAI. I see no mentions of Grok as default
[1] https://opencode.ai/docs/config/
i ran it through mitmproxy, i am using pinned version 1.2.20, 6 march 2026, set up with local chat completions.
on that version, it does not fall back to the main model. it silently calls opencode zen and uses gpt-5-nano, which is listed as having 30 day retention plus openai policy, which is plain text human review by openai AND 3rd party contractors.
i see they removed the title model on v1.2.23.
i was so annoyed i made an account here today
From the code, this does not seem to be true anymore. It falls back to the current model if no small model is identified with the current provider. https://github.com/anomalyco/opencode/blob/9b805e1cc4ba4a984...
It uses a model called "Big Pickle" by default which is an alias for minimax 2.5, as far as I've been able to tell.
Wait what, so are you saying if I am on some other model, it still sends my prompts to Grok??
Wait what. For real? I knew their security posture was bad, but this bad??
They're talking about before it's configured by the user. It defaults to 'free' models so that the user can ask a question immediately on startup. Once you configure a provider, the default models aren't used.
I second that.
Have fun on windows - automatic no from me. https://github.com/anomalyco/opencode/issues?q=is%3Aissue%20...
No surprise that a tool that can run shell scripts, open URLs, etc. is flagged down on Windows where AV try to detect such trojan methods.
Who cares about Windows?
people who don't make OS preferences their entire personality
I do: they're important for ventilation in this heat wave.
People who don't like messing around with drivers and like running Linux VMs on a Windows OS.
Driver issues are way more of a thing on Windows than Linux or MacOS.
Getting hardware to work is MUCH harder on Linux
Last years I have had more problem with hardware in windows than in linux. It is not so trivial anymore.
Please provide examples.
I think the parent meant vs MacOS, not vs Linux.
Users of MacOS rarely have an active dislike for Windows, nor are they likely to announce this.
I use macos and I do actively dislike windows: here I announce it.
I liked the apple II, and the TRS 80 as I rather like basic. And then I didn’t hate DOS, and then I actively hated the graphical shell of Windows 3, but could not afford a Macintosh -so suffered through it where I had to, but mainly used DOS. Then I discovered UNIX, and did almost all of my work on a timeshare - in the early 90s!
Then Windows 95 came out and I actively hated it, but did think it was amazingly pretty - somehow this was the impetus for me to get a pc again, which I put Windows NT on. Which was profitable for freelance gigs in college. Soon after that, I dual booted it to Linux and spent most of my time in Slackware.
After that, I graduated and had enough money to buy a second rig, which I installed OS/2 warp on - which was good for side gigs. And I really liked. A lot. But my day job required that I have a Windows NT box to shell into the Solaris servers as we ran. Then I got a better class of employer and the next several let me run a Linux box to connect to our solaris (or Aix) servers.
Next my girlfriend at the time got a PowerBook G4 and installed OS X on it. It was obviously amazing. Windows XP came out, and it was once again so much worse than Windows NT - and crashed so much more - which was odd as it was based on Windows NT. (yes 98 was before this but it was really bad). Anyhow, right about here the Linux box I was running at home, died. And it was obvious that I was not going to buy an XP box, so I bought my first Mac.
And it’s been the same for the last 25 years - every time I look at a Windows box it’s horrible. I pretty much always have a Linux box headless somewhere in the house, and one rented in the cloud, and a Mac for interacting with the world.
And like the parent I actively dislike windows. And that’s interesting because I’ve liked most other operating systems I’ve used in my life, including MS-DOS. Modern windows is uniquely bad.
DOS was bad by UNIX standards too. Only Windows NT/2000 was decent.
I use windows and absolutely hate the mac UI. Having the current window title bar always at the top of the screen doesn't make any sense when you have a very big monitor. It only made sense with the tiny monitors available when the mac UI was originally created.
What? Drivers?
RCE is exactly the feature of coding agents. I'm happy with it that I don't need to launch OpenCode with --dangerously-skip every time.
No, it is still configurable. You can specify in your opencode.json config that it should be able to run everything. I think they just argued that it shouldn't be the default. Which I agree with.
No, the problem is that when logging in, the provider's website can provide an authentication shell command that OpenCode will send to the shell sight unseen, even if it is "rm -rf /home". This "feature" is completely unnecessary for the agent to function as an agent, or even for authentication. It's not about it being the default, it's about it being there at all and being designed that way.
And in the webui there is a don't ask button
I assign a specific user for it, which doesn't have much access to my files. So what I want is complete autonomy.
> The problem for me is that the development practices of the people that are working on it are suboptimal at best; they're constantly releasing at an extremely high cadence, where they don't even spend the time to test or fix things (or even build a proper list of changes for each release), and they add, remove, refine, change, fix, and break features constantly at that accelerated pace.
this is what i notice with openclaw as well. there have been releases where they break production features. unfortunately this is what happens when code becomes a commidity, everyone thinks that shipping fast is the moat but at the expense of suboptimality since they know a fix can be implemented quickly on the next release.
Openclaw has 20k commits, almost 700k lines of code, and it is only four months old. I feel confident that that sort of code base would have a no coherent architecture at all, and also that no human has a good mental model of how the various subsystems interact.
I’m sure we’ll all learn a lot from these early days of agentic coding.
> I’m sure we’ll all learn a lot from these early days of agentic coding.
So far what I am learning (from watching all of this) is that our constant claims that quality and security matter seem to not be true on average. Depressingly.
> So far what I am learning (from watching all of this) is that our constant claims that quality and security matter seem to not be true on average.
Only for the non-pro users. After all, those users were happy to use excel to write the programs.
What we're seeing now is that more and more developers find they are happy with even less determinism than the Excel process.
Maybe they're right; maybe software doesn't need any coherence, stability, security or even correctness. Maybe the class of software they produce doesn't need those things.
I, unfortunately, am unable to adopt this view.
I think what we're seeing is a phase transition. In the early days of any paradigm shift, velocity trumps stability because the market rewards first movers.
But as agents move from prototypes to production, the calculus changes. Production systems need: - Memory continuity across sessions - Predictable behavior across updates - Security boundaries that don't leak
The tools that prioritize these will win the enterprise market. The ones that don't will stay in the prototype/hobbyist space.
We're still in the "move fast" phase, but the "break things" part is starting to hurt real users. The pendulum will swing back.
This makes sense. Development velocity is bought by having a short product life with few users. As you gain users that depend on your product, velocity must drop by definition.
The reason for this is that product development involves making decisions which can later be classified as good or bad decisions.
The good decisions must remain stable, while the bad decisions must remain open to change and therefore remain unstable.
The AI doesn't know anything about the user experience, which means it will inevitably change the good decisions as well.
> our constant claims that quality and security matter
I'm 13 years into this industry, this is the first I'm hearing of this.
I’ve heard the "S" in IoT stands for Security.
same with openclaw
20 for me, and let's not exaggerate. We've given lip service to it this entire time. Hell look at any of the corps we're talking about (including where I work) and they're demanding "velocity without lowering the quality bar", but it's a lie: they don't care about the quality bar in the slightest.
One of my main lessons after a decent long while in security, is that most orgs care about security, *as long as it doesn't get in the way of other priorities* like shipping new features. So when we get something like Agentic LLM tooling where everything moves super fast, security is inevitably going to suffer.
I’m learning that projects, developed with the help of agents, even when developers claim that they review and steer everything, ultimately are not fully understood or owned by the developers, and very soon turns into a thousand reinvented wheels strapped together by tape.
> very soon turns into a thousand reinvented wheels strapped together by tape.
Also most of the long running enterprise projects I’ve seen - there was one that had been around for like 10 years and like about 75% of the devs I hadn’t even heard of and none of the original ones were in the project at all.
The thing had no less than three auditing mechanisms, three ways of interacting with the database, mixed naming conventions, like two validation mechanisms none of which were what Spring recommended and also configurations versioned for app servers that weren’t even in use.
This was all before AI, it’s not like you need it for projects to turn into slop and AI slop isn’t that much different from human slop (none of them gave a shit about ADRs or proper docs on why things are done a certain way, though Wiki had some fossilized meeting notes with nothing actually useful) except that AI can produce this stuff more quickly.
When encountered, I just relied on writing tests and reworking the older slop with something newer (with better AI models and tooling) and the overall quality improved.
Claude Code breaks production features and doesn't say anything about it. The product has just shifted gears with little to no ceremony.
I expect that from something guiding the market, but there have been times where stuff changes, and it isn't even clear if it is a bug or a permanent decision. I suspect they don't even know.
We're still in the very early days of generative AI, and people and markets are already prioritizing quality over quantity. Quantity is irrelevant when it comes value.
All code is not fungible, "irreverent code that kinda looks okay at first glance" might be a commodity, but well-tested, well-designed and well-understood code is what's valuable.
Generative what? Code is not a thing anymore, in fact it never really was, but now it's definitely not.
Code today can be as verbose and ugly as ever, because from here on out, fewer people are going to read it, understand and care about it.
What's valuable, and you know this I think, is how much money your software will sell for, not how fine and polished your code is.
Code was a liability. Today it's a liability that cost much much less.
and once you've got your wish: ugly code without tests or a way to comprehend it, but cheap!
How much value are you going to be able to extract over its lifetime once your customers want to see some additional features or improvements?
How much expensive maintenance burden are you incurring once any change (human or LLM generated) is likely to introduce bugs you have no better way of identifying than shipping to your paying customers?
Maybe LLM+tooling is going to get there with producing a comprehensible and well tested system but my anectodal experience is not promising. I find that AI is great until you hit its limit on a topic and then it will merrily generate tokens in a loop suggesting the same won't-work-fix forever.
What you wrote aligns with my experience so far. It's fast and easy to get something working, but in a number of cases it (Opus) just gets stuck 'spinning' and no number of prompts is going to fix that. Moreover - when creating things from scratch it tends to use average/insecure/ inefficient approaches that later take a lot of time to fix.
The whole thing reminds me a bit of the many RAD tools that were supposed to 'solve' programming. While it was easy to start and produce something with those tools, at some point you started spending way too much time working around the limitations and wished you started from scratch without it.
I'm of the opinion that the diligence of experts is part of what makes code valuable assets, and that the market does an alright job of eventually differentiating between reliable products/brands and operations that are just winging it with AI[1].
[1] https://museumoffailure.com/exhibition/wonka-chocolate-exper...
I would think that the better the code is designed and factored and refactored, the easier it is to maintain and evolve, detect and remove bugs and security vulnerabilties from it. The ease of maintenance helps both AI and humans.
There are limits to what even AI can do to code, within practical time-limits. Using AI also costs money. So, easier it is to maintain and evolve a piece of software, the cheaper it will be to the owners of that application.
You may not need to read it, but you still need to test it.
Code that has not been thoroughly tested is a greater liability, not a lesser one.l, the faster you can write it.
It's understandable and even desirable that a new piece of code rapidly evolves as they iterate and fix bugs. I'd only be concerned if they keep this pattern for too long. In the early phases, I like keeping up with all the cutting edge developments. Projects where dev get afraid to ship because of breaking things end up becoming bloated with unnecessary backward compatibility.
I recently listened to this episode from the Claude Code creator (here is the video version: https://www.youtube.com/watch?v=PQU9o_5rHC4) and it sounded like their development process was somewhat similar - he said something like their entire codebase has 100% churn every 6 months. But I would assume they have a more professional software delivery process.
I would (incorrectly) assume that a product like this would be heavily tested via AI - why not? AI should be writing all the code, so why would the humans not invest in and require extreme levels of testing since AI is really good at that?
I've gotta say, it shows. Claude Code has a lot of stupid regressions on a regular basis, shit that the most basic test harness should catch.
I feel like our industry goes through these phases where there's an obvious thought leader that everyone's copying because they are revolutionary.
Like Rails/DHH was one phase, Git/GitHub another.
And right now it's kinda Claude Code. But they're so obviously really bad at development that it feels like a MLM scam.
I'm just describing the feeling I'm getting, perhaps badly. I use Claude, I recommended Claude for the company I worked at. But by god they're bloody awful at development.
It feels like the point where someone else steps in with a rock solid, dependable, competitor and then everyone forgets Claude Code ever existed.
I use Claude Code because Anthropic requires me to in order to get the generous subscription tokens. But better tools exist. If I was allowed to use Cursor with my Claude sub I would in a heartbeat.
There are plenty of competitors! I’ve been using Copilot, RovoCLI, Gemni, and there’s OpenAI thing.
This aren't competitors, they're clones, it's a different thing.
CC leads and they follow.
I mean, I'm slowly trying to learn lightweight formal methods (i.e. what stuff like Alloy or Quint do), behavior driven development, more advanced testing systems for UIs, red-green TDD, etc, which I never bothered to learn as much before, precisely because they can handle the boilerplate aspects of these things, so I can focus on specifying the core features or properties I need for the system, or thinking through the behavior, information flow, and architecture of the system, and it can translate that into machine-verifiable stuff, so that my code is more reliable! I'm very early on that path, though. It's hard!
I heard from somebody inside Anthropic that it's really two companies, one which are using AI for everything and the other which spends all their time putting out fires.
Highly recommend trying pi.dev
It's fully open, fairly minimal, very extensible and (while getting very frequent updates) never has broken on me so far.
Been using it more and more in the last two months, switching more and more from codex to it now.
OpenCode's creator acknowledged that the ease of shipping has let them ship prototype features that probably weren't worth shipping and that they need to invest more time cleaning up and fixing things.
https://x.com/thdxr/status/2031377117007454421
Uff. This is exactly what Casey Muratori and his friend was talking about in of their more recent podcast. Features that would never get implemented because of time constraints now do thanks to LLMs and now they have a huge codebase to maintain
Not terrible if they proactively depricate slop features
Well that's good to hear, maybe they'll improve moving forward on the release aspect at least.
What to release > What to build > Build anything faster
I'm still trying to figure out how "open" it really is; There are reports that it phones home a lot[0], and there is even a fork that claims to remove this behavior[1]:
[0] https://www.reddit.com/r/LocalLLaMA/comments/1rv690j/opencod...
[1] https://github.com/standardnguyen/rolandcode
the fact that somebody was able to fork it and remove behaviour they didn't want suggests that it is very open
that #12446 PR hasn't even been resolved to won't merge and last change was a week ago (in a repo with 1.8k+ open PRs)
I think there’s a conflict between “open” as in “open source”, and “open” as in “open about the practice” paired with the fact we usually don’t review software’s source scrupulously enough to spot unwanted behaviors”.
Must be a karmic response from “Free” /s
so how is telemetry not open? If you don't like telemetry for dogmatic reasons then don't use it. Find the alternative magical product whose dev team is able to improve the software blindfolded
> Find the alternative magical product whose dev team is able to improve the software blindfolded
The choice isn't "telemetry or you're blindfolded", the other options include actually interacting with your userbase. Surveys exist, interviews exist, focus groups exist, fostering communities that you can engage is a thing, etc.
For example, I was recruited and paid $500 to spend an hour on a panel discussing what developers want out of platforms like DigitalOcean, what we don't like, where our pain points are. I put the dollar amount there only to emphasize how valuable such information is from one user. You don't get that kind of information from telemetry.
> Surveys exist, interviews exist, focus groups exist, fostering communities that you can engage is a thing, etc.
We all know it’s extremely, extremely hard to interact with your userbase.
> For example I was paid $500 an hour
+the time to find volunteers doubled that, so for $1000 an hour x 10 user interviews, a free software can have feedback from 0.001% of their users. I dislike telemetry, but it’s a lie to say it’s optional.
—a company with no telemetry on neither of our downloadable or cloud product.
> We all know it’s extremely, extremely hard to interact with your userbase.
On the contrary, your users will tell you what you need to know, you just have to pay attention.
> I dislike telemetry, but it’s a lie to say it’s optional.
The lie is believing it’s necessary. Software was successful before telemetry was a thing, and tools without telemetry continue to be successful. Plenty of independent developers ship zero telemetry in their products and continue to be successful.
Or by testing it themselves.
Probably all describe problems stem from the developers using agent coding; including using TypeScript, since these tools are usually more familiar with Js/Js adjacent web development languages.
Perhaps the use of coding agents may have encouraged this behavior, but it is perfectly possible to do the opposite with agents as well — for instance, to use agents to make it easier to set up and maintain a good testing scaffold for TUI stuff, a comprehensive test suite top to bottom, in a way maintainers may not have had the time/energy/interest to do before, or to rewrite in a faster and more resource efficient language that you may find more verbose, be less familiar with, or find annoying to write — and nothing is forcing them to release as often as they are, instead of just having a high commit velocity. I've personally found AIs to be just as good at Go or Rust as TypeScript, perhaps better, as well, so I don't think there was anything forcing them to go with TypeScript. I think they're just somewhat irresponsible devs.
> I think they're just somewhat irresponsible devs.
Before coding agents it took quite a lot more experience before most people could develop and ship a successful product. The average years of experience of both core team and contributors was higher and this reflected in product and architecture choices that really have an impact, especially on non-functional requirements.
They could have had better design and architecture in this project if they had asked the AI for more help with it, but they did not even know what to ask or how to validate the responses.
Of course, lots of devs with more years of experience would do just as badly or worse. What we are seeing here though is a filter removed that means a lot of projects now are the first real product everyone the team has ever developed.
I agree that Opencodr is using a lot of RAM, but regarding the features, I am ak only using the built in features and I wouldn't say they are too many, they are just enough for a complete workflow. If you need more you can install plugins, which I haven't done yet and it's my daily driver for four months.
The moment that OpenCode, after helping fix a Dockerfile issue, decided it was time to deploy to prod without asking for consent, I was out.
You must never rely on AI itself for authorization… don’t let it run on an environment where it can do that. I can’t believe this needs to be said but everyone seems to have lost their mind and decided to give all their permissions away to a non deterministic thing that when prompted correctly will send it all out to whoever asks it nicely.
The value of having (and executing) a coherent product vision is extremely undervalued in FOSS, and IMO the difference between a successful project in the long-term and the kind of sploogeware that just snowballs with low-value features.
> The value of having (and executing) a coherent product vision is extremely undervalued in FOSS
Interesting you say this because I'd say the opposite is true historically, especially in the systems software community and among older folks. "Do one thing and do it well" seems to be the prevailing mindset behind many foundational tools. I think this why so many are/were irked by systemd. On the other hand newer tools that are more heavily marketed and often have some commercial angle seem to be in a perpetual state of tacking on new features in lieu of refining their raison d'etre.
negative values even.
Is there a name for these types of "overbearing" and visually busy "TUIs"? It seems like all the other agents have the same aesthetic and it is unlike traditional nurses or plain text interfaces in a bad way IMO. The constant spinners, sidebars and needless margins are a nuisance to me. Especially over an ssh connection in a tmux session it feels wrong.
I’ve pretty much ended up with a pi.dev+gpt-5 and Claude combo. Sometimes I use GLM with Pi if I run out of quota or need some simple changes.
I tried Opencode but it was just too much? Same with Crush, 10/10 pretty but lacking in features I need. LSP support was cool though.
Can you expands on the cool part of LSP support ? I"m curious and "on paper" it sounds desirable but I'm unclear on the pluses
That sounds a lot like my experience with claude code. IDK about OpenCode, but claude code is also largely written by LLMs, and you can tell.
I’m a little surprised by your description of constant releases and instability. That matches how I would describe Claude Code, and has been one of the main reasons I tend to use OpenCode more than Claude Code.
OpenCode has been much more stable for me in the 6 months or so that I’ve been comparing the two in earnest.
I use Droid specifically because Claude Code breaks too often for me. And then Droid broke too (but rarely), and I just stuck to not upgrading (like I don't upgrade WebStorm. Dev tools are so fragile)
I’ve been testing opencode and it feels TUI in appearance only. I prefer commandline and TUIs and in my mind TUI idea is to be low level, extremely portable interface and to get out of the way. Opencode does not have low color, standard terminal theme so had to switch to a proper terminal program. Copy paste is hijacked so I need to write code out to file in order to get a snippet. The enter key (as in the return by the keypad) does not work for sending a line. I have not tested but don’t think this would work over SSH even. I have been googling around to find if I am holding it wrong but it feels to break expectations of a terminal app in a way that I wish they would have made it a gui. Makes me sad because I think the goods are there and it’s otherwise good.
I don’t think good TUI’s are the same as good command line programs. Great tui apps would to me be things like Norton/midnight commander, borlands turbo pascal, vim, eMacs and things like that
Yes cli and tui are not the same, but I expect TUI to work decent in general terminal emulator and not acitvely block copying and pasting. Having to install supported terminal emulator goes against the vibe.
Yeah every time I want to like it, scrolling is glitched vs codex and Claude. And other various things like: why is this giant model list hard coded for ollama or other local methods vs loading what I actually have...
On top of that. Open code go was a complete scam. It was not advertised as having lower quality models when I paid and glm5 was broken vs another provider, returning gibberish and very dumb on the same prompt
I agree. Since tools like Codex let you use SOTA models more cheaply and with looser weekly limits, I think they’re the smarter choice.
Drives me nuts that we have TUIs written in friggin TS now.
That being said, I do prefer OpenCode to Codex and Claude Code.
Why to you prefer? I have a different experience, and want to learn.
(I'm also hating on TS/JS: but some day some AI will port it to Rust, right?)
I find it more configurable, for defining (sub)agent abilities, plugins, and different models/providers, of course.
The biggest reason is I don't like being locked into an ecosystem. I can use whatever I want with OpenCode, not so much with Codex and Claude Code. Right now I'm only using GPT with it, but I like the option.
CC I have the least experience with. It just seemed buggy and unpolished to me. Codex was fine, but there was something about it that just didn't feel right. It seemed fined for code tasks but just as often I want to do research or discuss the code base, and for whatever reason I seemed to get terse less useful answers using Codex even when it's backed by the same model.
OpenCode works well, I haven't had any issues with bugs or things breaking, and it just felt comfortable to use right from the jump.
> they add, remove, refine, change, fix, and break features constantly at that accelerated pace.
I wonder how much of this is because the maintainers are using OpenCode to vibe the code for OpenCode.
claude code easily uses 10+GB in single session :) 1Gb sounds very efficient by comparison
That is very disappointing coz I've been wanting to try an alternative to Gemini CLI for exactly these reasons. The AI is great but the actual software is a buggy, slow, bloated blob of TypeScript (on a custom Node runtime IIUC!) that I really hate running. It takes multiple seconds to start, requires restarting to apply settings, constantly fucks up the terminal, often crashes due to JS heap overflows, doesn't respect my home dir (~/.gemini? Come on folks are we serious?), has an utterly unusable permission system, etc etc. Yet they had plenty of energy to inject silly terminal graphics and have dumb jokes and tips scroll across the screen.
Is Claude Code like this too? I wonder if Pi is any better.
A big downside would be paying actual cost price for tokens but on the other hand, I wouldn't be tied to Google's model backend which is also extremely flaky and unable to meet demand a lot of the time. If I could get real work done with open models (no idea if that's the case yet) and switch providers when a given provider falls over, that would be great.
I use Pi with Aliyun, which cost a flat ¥40 (~€5) per month for GLM-5, Kimi K2.5, Minmax and a few other models.
Honestly, these models seem quite on par with Claude. Some days they seem slightly worse, some days I can't tell the difference.
AFAIK, the usage quota is comparable to the Claude $200 subscription.
> Is Claude Code like this too? I wonder if Pi is any better.
I'm very happy with Pi myself (running it on a small VPS so that I don't need to do sandboxing shenanigans).
you can use subscriptions with pi.
Claude will also happily write a huge pile of junk into your home directory, I am sad to report. The permissions are idiotic as well, but I always use it in a container anyway. But I have not had it crash and it hasn't been slow starting for me.
You are describing a typical state of a wibecoded project.
> they're constantly releasing at an extremely high cadence, where they don't even spend the time to test or fix things
Tbf, this seems exactly like Claude Code, they are releasing about one new version per day, sometimes even multiple per day. It’s a bit annoying constantly getting those messages saying to upgrade cc to the latest version
Oh wow. I got multiple messages in a day and just assumed it was a cache bug.
It's annoying how I always get that "claude code has a native installer xyz please upgrade" message
I think it goes away if you actually use the native installer ...
I've never gotten that message?
This is why I'm taking a wait-and-see approach to these tools on HN myself. My month with Claude Code (the TUI, not the GUI) was amazing from an IT POV, just slop-generating niche tools I could quickly implement and audit (not giant-ass projects), but I ain't outsourcing that to another company when Qwen et al are right there for running on my M1 Pro or RTX 3090.
I'm looking forward to more folks building these kinds of tools with a stronger focus on portability via API or loading local models, as means of having a genuinely useful assistant or co-programmer rather than paying some big corp way too much money (and letting them use my data) for roughly the same experience.
The types of models you can run locally on that hardware are toys in comparison to the foundation models
Curious about your setup of qwen on m1 pro. Care to share the toolkit?
Do you have a setup with a local Qwen that can write out niche tools pretty well? I have been curious about how much I could do local.
Yeah I tried using it when oh-my-opencode (now oh-my-openagent) started popping off and found it had highly unstable. I just stick with internal tooling now.
Why not just code your own agent harness
How much of the development is being done by humans?
yeah I agree is way too buggy, nice tho and I appreciate the effort but really feels sloppy
What is a better option?
For serious coding work I use the Zed Agent; for everything else I use pi with a few skills. Overall, though, I'd recommend Pi plus a few extensions for any features you miss extremely highly. It's also TypeScript, but doesn't suffer from the other problems OC has IME. It's a beautiful little program.
Big +1 to Pi[1]. The simplicity makes it really easy to extend yourself too, so at this point I have a pretty nice little setup that's very specific to my personal workflows. The monorepo for the project also has other nice utilities like a solid agent SDK. I also use other tools like Claude Code for "serious" work, but I do find myself reaching for Pi more consistently as I've gotten more confident with my setup.
[1] https://github.com/badlogic/pi-mono/tree/main/packages/codin...
pi.dev is worth checking out. The basic idea is they provide a minimalist coding agent that's designed to be easy to extend, so you can tailor the harness to suit your needs without any bloat.
One of the best features is they haven't been noticed by Anthropic yet so you can still use your Claude subscription.
I've been building VT Code (https://github.com/vinhnx/vtcode), a Rust-based semantic coding agent. Just landed Codex OAuth with PKCE exchange, credentials go into the system keyring.
I build VT Code with Tree-sitter for semantic understanding and OS-native sandboxing. It's still early but I confident it usable. I hope you'll give it a try.
https://charm.land/crush
I tried crush when it first came out - the vibes were fun but it didn’t seem to be particularly good even vs aider. Is it better now?
Disclaimer: I work for Charm, so my opinion may be biased.
But we did a lot of work on improving the experience, both on UX, performance, and the actual reliability of the agent itself.
I would suggest you to give it a try.
Will do thanks - any standout features or clever things for me to look out for?
We just launched this: https://charm.land/blog/crush-and-docker-mcp/
Also, non-interactive support, useful for some workflows:
https://github.com/charmbracelet/crush/releases/tag/v0.48.0
https://github.com/charmbracelet/crush/releases/tag/v0.50.0
Yeah just try to select text to copy. Nope. Try to scroll back in terminal or tmux. Nope. Overbearing for sure.
this is a bot comment
its hard not to wonder if they are taking their own medicine, but not quite properly
I tried it briefly and the practice - argued for strategy for operation actually - to override my working folder seelction and altering to the parent root git folder is a no go.
Isn't this pretty much the standard across projects that make heavy use of AI code generation?
Using AI to generate all your code only really makes sense if you prioritize shipping features as fast as possible over the quality, stability and efficiency of the code, because that's the only case in which the actual act of writing code is the bottleneck.
I don't think that's true at all. As I said, in a response to another person blaming it on agentic coding above, there are a very large number of ways to use coding agents to make your programs faster, more efficient, more reliable, and more refined that also benefit from agents making the code writing research, data piping, and refactoring process quicker and less exhausting. For instance, by helping you set up testing scaffolding, handling the boilerplate around tests while you specify some example features or properties you want to test and expands them, rewriting into a more efficient language, large-scale refactors to use better data structures or architectures, or allowing you to use a more efficient or reliable language that you don't know as well or find to have too much boilerplate or compiler annoyance to otherwise deal with yourself. Then there are sort of higher level more phenomenological or subjective benefits, such as helping you focus on the system architecture and data flow, and only zoom in on particular algorithms or areas of the code base that are specifically relevant, instead of forever getting lost in the weeds of thinking about specific syntax and compiler errors or looking up a bunch of API documentation that isn't super important for the core of what you're trying to do and so on.
Personally, I find this idea that "coding isn't the bottleneck" completely preposterous. Getting all of the API documentation, the syntax, organizing and typing out all of the text, finding the correct places in the code base and understanding the code base in general, dealing with silly compiler errors and type errors, writing a ton of error handling, dealing with the inevitable and inoraticable boilerplate of programming (unless you're one of those people that believe macros are actually a good idea and would meaningfully solve this), all are a regular and substantial occurrence, even if you aren't writing thousands of lines of code a day. And you need to write code in order to be able to get a sense for the limitations of the technology you're using and the shape of the problem you're dealing with in order to then come up with and iterate on a better architecture or approach to the problem. And you need to see your program running in order to evaluate whether it's functionality and design a satisfactory and then to iterate on that. So coding is actually the upfront costs that you need to pay in order to and even start properly thinking about a problem. So being able to get a prototype out quickly is very important. Also, I find it hard to believe that you've never been in a situation where you wanted to make a simple change or refactor that would have resulted in needing to update 15 different call sites to do properly in a way that was just slightly variable enough or complex enough that editor macros or IDE refactoring capabilities wouldn't be capable of.
That's not to mention the fact that if agentic coding can make deploying faster, then it can also make deploying the same amount at the same cadence easier and more relaxing.
You're both right. AI can be used to do either fast releases or well designed code. Don't say both, as you're not making time, you're moving time between those two.
Which one you think companies prefer? Or if you're a consulting business, which one do you think your clients prefer?
> AI can be used to do either fast releases or well designed code
I have yet to actually see a single example of the latter, though. OpenCode isn't an isolated case - every project with heavy AI involvement that I've personally examined or used suffers from serious architectural issues, tons of obvious bugs and quirks, or both. And these are mostly independent open source projects, where corporate interests are (hopefully) not an influence.
I will continue to believe it's not actually possible until I am proven wrong with concrete examples. The incentives just aren't there. It's easy to say "just mindlessly follow X principle and your software will be good", where X is usually some variation of "just add more tests", "just add more agents", "just spend more time planning" etc. but I choose to believe that good software cannot be created without the involvement of someone who has a passion for writing good software - someone who wouldn't want to let an LLM do the job for them in the first place.
> It's easy to say "just mindlessly follow X principle and your software will be good", where X is usually some variation of "just add more tests", "just add more agents", "just spend more time planning" etc
That's a complete strawman of what I — or others trying to learn how to use coding agents to increase quality, like Simon Willison or the Oxide team — am saying.
> but I choose to believe that good software cannot be created without the involvement of someone who has a passion for writing good software - someone who wouldn't want to let an LLM do the job for them in the first place.
This is just a no true Scotsman. I prefer to use coding agents because they don't forget details, or get exhausted, or overwhelmed, or lazy, or give up, ever — whereas I might. Therefore, they allow me to do all of the things that improve code and software quality more extensively and thoroughly, like refactors, performance improvements, and tests among other things (because yes, there is no single panacea). Furthermore, I do still care about the clarity, concision, modularity, referential transparency, separation of concerns, local reasonability, cognitive load, and other good qualities of the code, because if those aren't kept up a) I can't review the code effectively or debug things as easily when they go wrong, b) the agent itself will struggle to male changes without breaking other things, and struggle to debug, c) those things often eventually effect the quality of the end state software.
Additionally, what you say is empirically false. Many people who do deeply value quality software and code quality, such as the creators of Flask, Redis, and SerenityOS/Ladybird, all use and value agentic coding.
Just because you haven't seen good quality software with a large amount of agentic influence doesn't mean it isn't possible. That's very close minded.
Show me an example then. I want to see an example of quality software that makes heavy use of AI generated code (as in, basically written entirely by AI similar to OpenCode), led by developer(s) who care deeply about software quality but still choose to not write code themselves.
I tried running Opencode on my 7$/yr 512mb vps but it had the OOM issue and yes it needs 1GB of ram or more.
I then tried running other options like picoclaw/picocode etc but they were all really hard to manage/create
The UI/UX I want is that I can just put my free openrouter api key in and then I am ready to go to get access to free models like Arcee AI right now
After reading your comments/I read this thread, I tried crush by charmbracelet again and it gives the UI/UX that I want.
I am definitely impressed by crush/ the charm team. They are on HN and they work great for me, highly recommended if you want something which can work on low constrained devices
I do feel like Charm's TUI's are too beautiful in the sense that running a connection over SSH can delay so when I tried to copy some things, the delay made things less copy-able but overall, I think that I am using Crush and I am happy for the most part :-)
Edit: That being said, just as I was typing this, Crush took all the Free requests from Openrouter that I get for free so it might be a bit of minor issue but overall its not much of an issue from Crush side, so still overall, my point is that Crush is worth checking out
Kudos to the CharmBracelet team for making awesome golang applications!
Rust > TS Codex > OpenCode
By default OpenCode sends all of your prompts to Grok's free tier to come up with chat summaries for the UI.
To change that, you need to set a custom "small model" in the settings.
This is my main problem I have with it: It sends data and loads code left and right by default. For instance, the latest plugin packages are automatically installed on every startup. Their “Zen” provider is enabled by default so you might accidentally upload your code base to their servers. Better yet: The web UI has a button that just uploads the entire session to their servers WITH A SINGLE CLICK for sharing.
The situation is ... pretty bad. But I don’t think this is particularly malicious or even a really well considered stance, but just a compromise in order to move fast and ship useful features.
To make it easily adoptable by anyone privacy conscious without hours of tweaking, there should be an effort to massively improve this situation. Luckily, unlike Claude Code, the project is open source and can he changed!
There is some kind of fitting irony around agentic coding harnesses mainly being maintained by coding agents themselves, and as a result they are all a chaotic mess.
I had to double check this. Here is summary:
The model selection for title generation works as follows (prompt.ts:1956-1960): 1. If the title agent has an explicit model configured — that model is used. 2. Otherwise, it tries Provider.getSmallModel(providerID) — which picks a "small" model from the same provider as the current session, using this priority list (provider.ts:1396-1402): - claude-haiku-4-5 / claude-haiku-4.5 / 3-5-haiku / 3.5-haiku - gemini-3-flash / gemini-2.5-flash - gpt-5-nano - (Copilot adds gpt-5-mini at the front; opencode provider uses only gpt-5-nano) 3. If no small model is found — it falls back to the same model currently being used for the session. So by default, title generation uses a cheaper/faster small model from the same provider (e.g., Haiku if on Anthropic, Flash if on Google, nano if on OpenAI), and if none are available, it just uses whatever model the user is chatting with. You can also override this entirely by configuring a model on the title agent.
When I did this, I used a single local llama.cpp server instance as my main model without setting a small model and it did not use it for chat titles while I used it for prompts.
Chat titles would work even when the local llama.cpp server hadn't started, and it was never in the the llama.cpp logs, it used an external model I hadn't set up and had not intended to use.
It was only when I set `small_model` that I was able to route title generation to my own models.
Fwiw this got changed about a week ago, where they changed the logic to match the documentation rather than default to sending your prompts to their servers. This is why so many people have noticed this happening but if you ask an AI about it right now it will say this is not true.
Personally I think it's necessary to run opencode itself inside a sandbox, and if you do that you can see all of the rejected network calls it's trying to make even in local mode. I use srt and it was pretty straightforward to set up
Also, even when using local models in ollama or lmstudio, prompts are proxied via their domain, so never put anything sensitive even when using local setup
https://old.reddit.com/r/LocalLLaMA/comments/1rv690j/opencod...
They also don't let you run all local models, but specific whitelisted by another 3rd party: https://github.com/anomalyco/opencode/issues/4232
To be clear, that seems to be about the webui only, the TUI doesn't seem affected. I haven't fully investigated this myself, but when I run opencode (1.2.27-a6ef9e9-dirty) + mitmproxy and using LM Studio as the backend, when starting opencode + executing a prompt, I only see two requests, both to my LM Studio instance, both normal inference requests (one for the chat itself + one for generating the title).
Everything you read on the internet seems exaggerated today. Especially true for reddit, and especially especially true for r/LocalLllama which is a former shadow of itself. Today it's mostly sockpuppets pushing various tools and models, and other sockpuppets trying to push misinformation about their competitors tools/models.
Geez there should be a big warning on the tin about this. They’re so neatly integrated with copilot that I assumed (and told others) that they had all the privacy guarantees of copilot :(
this isn't true
it will use whatever small model there is in your provider
we had a fallback where we provided free small models if your provider did not have one (gpt nano)
some configs fell back to this unexpectedly which upset people so we removed it
I can tell that you’re doing all of this in the name of first-use UX. It’s working: The out of the box experience is really seamless.
But for serious (“grown up”) use, stuff like this just doesn’t fly. At all. We have to know and be able to control exactly where data gets sent. You can’t just exfiltrate our data to random unvetted endpoints.
Given the hurt trust of the past, there also needs to be a communication campaign (“actually we’re secure now”), because otherwise people will keep going around claiming that OpenCode sends all of your data to Grok. This would really unnecessarily hurt the project in the long run.
Not true according to a CGPT question:
More importantly, the current dev branch source for packages/opencode/src/session/summary.ts shows summarizeMessage() now only computes diffs and updates the message summary object; it does not make an LLM call there anymore. The current code path calls summarizeSession() and summarizeMessage(), and summarizeMessage() just filters messages, computes diffs, sets userMsg.summary.diffs, and saves the message.
https://github.com/anomalyco/opencode/blob/dev/packages/open...
Yikes... sending prompts to a third party by default with no disclosure in the setup flow is a rough look for a tool that positions itself as the open sources alternative. "Open" loses meaning fast if the defaults work against the user.
Seems like an anti-pattern to me to run AI models without user’s consent.
? The whole idea of a coding assistant is to send all your interactions with the program to the llm model.
To the provider you select in the UI, I agree. But OpenCode automatically sends prompts to their free "Zen" proxy, even without choosing it in the UI.
Imagine someone using it at work, where they are only allowed to use a GitHub Copilot Business subscription (which is supported in OpenCode). Now they have sent proprietary code to a third party, and don't even know they're doing it.
This is exactly me considering what I might have leaked to god knows who via grok. I was hyped by opencode but now I’m thinking of alternatives. A huge red flag… at best irresponsible?
My understanding is that it’s best to set a whitelist in enabled_providers, which prevents it from using providers you don’t anticipate.
Are you using Grok for the coding? Because I have Copilot connected and I can see the request to Copilot for the summaries - with no "small model" setting even visible in my settings.
I found out about OpenCode through the Anthropic feud. I now spend most of my AI time in it, both at work and at home. It turns out to be pretty great for general chat too, with the ability to easily integrate various tools you might need (search being the top one of course).
I have things to criticize about it, their approach to security and pulling in code being my main one, but over all it’s the most complete solution I’ve found.
They have a server/client architecture, a client SDK, a pretty good web UI and use pretty standard technologies.
The extensibility story is good and just seems like the right paradigms mostly, with agents, skills, plugins and providers.
They also ship very fast, both for good and bad, I’ve personally enjoyed the rapid improvements (~2 days from criticizing not being able to disable the default provider in the web ui to being able to).
I think OpenCode has a pretty bright future and so far I think that my issues with it should be pretty fixable. The amount of tasteful choices they’ve made dwarfs the few untasteful ones for me so far.
Try pi.dev+gpt-5, it works amazingly well
Just note that you need to either create any special features yourself or find an implementation by someone else. It’s pretty bare bones by default
The team also is not breathlessly talking about how coding is dead. They have pretty sane takes on AI coding including trying to help people who care about code quality.
Couldn’t tell by the way they write their software.
They probably don't have to write OKRs every quarter saying the opposite.
Do you follow them? They most definitely pump out insane takes on twitter. But maybe that’s just engagement bait for a check, of course.
I do like OpenCode, and have been using it in and off since last July. But I feel like they’re trying to stuff too much GUI into a TUI? Due to this I find myself using Codex and Pi more often. But am still glad OpenCode and their Zen product exist.
opencode stands out as one of the few agents with a proper client server architecture that allows something like openchambers great vscode extension so its possible to seamlessly switch between tui, vscode, webapp, desktop app. i think there is hardly a usable alternative for most coding agent usecases (assuming agents from model providers are a no go, they cannot be allowed to own the tools AND the models). But its also far from perfect: the webui is secretly served from their servers instead of locally for no reason. worse the fallback route gets also sent to their servers so any unknown request to opencode api ends up being sent to opencode servers potentially leaking data. the security defaults are horrific, its impossible to use it safely outside a controlled container. it will just serve your whole hard drive via rest endpoint and not constrain to project folders. the share feature uploading your conversations to their servers is also so weirdly communicated and implemented that it leaves a bad taste. I dont think this will become much better until the agent ecosystem is more modular and less monolith, acp, a2a and mcp need to become good enough so tools, prompts, skills, subagent setups and workflow engines and UIs are completely swappable and the agent core has to only focus on the essentials like runtime and glue architecture. i really hope we dont see all of these grow into full agent oses with artificial lock in effects and big effort buy in.
The Agent that is blacklisted from Anthropic AI, soon more to come.
I really like how their subagents work, as a bonus I get to choose which model is in which agent. Sadly I have to resort to the mess that Anthropic calls Claude Code
They are not blacklisted. You are allowed to use the API at commercial usage pricing. You are just not allowed to use your Claude Code subscription with OpenCode (or any other third‑party harness for the record).
I have my own harness I wrap Claude CLI in, I wonder if I'm breaking the rules...
If you're not paying full-fat API prices, then probably.
From what I've heard, the metrics used by Anthropic to detect unauthorized clients is pretty easy to sidestep if you look at the existing solutions out there. Better than getting your account banned.
No, they specifically said it’s only if you’re trying to build a whole other product for public consumption on top of it
If you’re just essentially calling claude -p you’re fine
So it's less 'blacklist' and more a licensing gotcha designed to crush price arbitrage, basically rent-seeking by toggling where the tollbooth sits.
Has it occurred to anyone that Anthropic highest in the industry API pricing is a play to drive you into their subscription? For the lock-in?
The highest in in the industry for API pricing right now is GPT-5.4-Pro, OpenRouter adding that as an option in their Auto Router was when I had to go customise the routing settings because it was not even close to providing $30/m input tokens and $180/m output tokens of value (for context Opus 4.6 is $5/m input and $25/m output)
(Ok, technically o1-pro is even more expensive, but I'm assuming that's a "please move on" pricing)
Sometimes people want to be real pedants about licensing terms when it comes to OSS, assuming such terms are completely bulletproof, other times people don't think the terms of their agreement with a service provider should have any force at all.
I dont understand this, what is the difference, technically!
With Anthropic, you either pay per token with an API key (expensive), or use their subscription, but only with the tools that they provide you - Claude, Claude Cowork and Claude Code (both GUI and CLI variants). Individuals generally get to use the subscriptions, companies, especially the ones building services on top of their models, are expected to pay per token. Same applies to various third party tools.
The belief is that the subscriptions are subsidized by them (or just heavily cut into profit margins) so for whatever reason they're trying to maintain control over the harness - maybe to gather more usage analytics and gain an edge over competitors and improve their models better to work with it, or perhaps to route certain requests to Haiku or Sonnet instead of using Opus for everything, to cut down on the compute.
Given the ample usage limits, I personally just use Claude Code now with their 100 USD per month subscription because it gives me the best value - kind of sucks that they won't support other harnesses though (especially custom GUIs for managing parallel tasks/projects). OpenCode never worked well for me on Windows though, also used Codex and Gemini CLI.
>or perhaps to route certain requests to Haiku or Sonnet instead of using Opus for everything, to cut down on the compute
You can point Claude Code at a local inference server (e.g. llama.cpp, vLLM) and see which model names it sends each request to. It's not hard to do a MITM against it either. Claude Code does send some requests to Haiku, but not the ones you're making with whatever model you have it set to - these are tool result processing requests, conversation summary / title generation requests, etc - low complexity background stuff.
Now, Anthropic could simply take requests to their Opus model and internally route them to Sonnet on the server side, but then it wouldn't really matter which harness was used or what the client requests anyway, as this would be happening server-side.
Sounds pretty sane, the same way how OpenWebUI and probably other software out there also has a concept of “tool models”, something you use for all the lower priority stuff.
Actually curious to hear what others think about why Anthropic is so set on disallowing 3rd party tools on subscriptions.
The sota models are largely undifferentiated from each other in performance right now. And it’s possible open weight models will get “good enough” relatively soonish. This creates a classic case where inference becomes a commodity. Commodities have very low margins. Training puts them in an economic hole where low margins will kill them.
So they have to move up the stack to higher margin business solutions. Which is why they offer subsidized subscription plans in the first place. It’s a marketing cost. But they want those marketing dollars to drive up the stack not commodity inference use cases.
Anthropic's model deployments for Claude Code are likely optimized for Claude Code. I wouldn't be surprised if they had optimizations like sharing of system prompt KV-cache across users, or a speculative execution model specifically fine-tuned for the way Claude Code does tool calls.
When setting your token limits, their economics calculations likely assume that those optimizations are going to work. If you're using a different agent, you're basically underpaying for your tokens.
- OR - it's about lock-in.
Build the single pane of glass everyone uses. Offer it under cost. Salt the earth and kill everything else that moves.
Nobody can afford to run alternative interfaces, so they die. This game is as old as time. Remember Reddit apps? Alternative Twitter clients?
In a few years, CC will be the only survivor and viable option.
It also kneecaps attempts to distill Opus.
It’s probably a mixture of things including direct control over how the api is called and used as pointed out above and giving a discount for using their ecosystem. They are in fact a business so it should not surprise anyone they act as one.
It might well be a mixture, but 95% of that mixture is vendor lock in. Same reason they don't support AGENTS.md, they want to add friction in switching.
They can try add as much as friction they want. A simple rename in the files and directories like .claude makes the thing work to move out of CC.
It’s not like moving from android to iOS.
You'd be surprised how effective small bits of friction are.
If it was lock in they wouldn't make it absolutely trivial to change inference providers in Claude Code.
The goal is to use Anthropic subscriptions outside of Claude Code!! That is the lock in.
It's very straightforward to instrument CC under tmux with send-keys and capturep. You could easily use that for distillation, IMO. There are also detailed I/O logs.
Subscription = token that requires refreshing 1-2x/day, and you get the freedom to use your subscription-level usage amount any way you want.
API = way more expensive, allowed to use on your terms without anthropic hindering you.
Also, Subscription: against the TOS of Claude Code, need to spoof a token and possibly get banned due to it.
Yup. And right now I'm straight-up breaking Claude's TOS by modifying OpenCode to still accept tokens. But I only have a few days left and don't care if they ban me. I'm using what I paid for.
Anthropic has an API, you can use any client but they charge per input/output/cache token.
One-price-per-month subscriptions (Claude Code Pro/MAX @ $20/$100/$200 a month) use a different authentication mechanism, OAUTH. The useful difference is you get a lot more inference than you can for the same cost using the API but they require you to use Claude Code as a client.
Some clients have made it simple to use your subscription key with them and they are getting cease and desist letters.
about 30 times more cost
Was it not obvious what the OP meant by blacklisted?
Blacklisted usually means something is banned. OpenCode is not banned from using Anthropic's API.
No, it was not? For those whose native language is English, "blacklisted" implies Claude API will not allow OpenCode.
API will, they just can spoof Claude Code OAUTH credentials
You can still use OpenCode with the Anthropic API.
Yep. That's what I do. Just API keys and you can switch from Opus to GPT especially this week when Opus has been kind of wonky.
I pay $100/mo to Anthropic. Yesterday I coded one small feature via an API key by accident and it cost $6. At this rate, it will cost me $1000/mo to develop with Opus. I might as well code by hand, or switch to the $20 Codex plan, which will probably be more than enough.
I'd rather switch to OpenAI than give up my favorite harness.
This is the intention. They do not want folks that can’t pay to use their service.
SOTA models cost SOTA prices. Nothing new there
Out of curiosity, what's your next monthly subscription in terms of price?
Electricity, $95/mo.
Now you got me thinking my electric company should start offering subscription tiers in these uncertain energy times...
Ours never will, they're a cartel, sadly. If you mean fixed subscription, next one is Netflix, I think, or my server provider at $40 or so.
My monthly "connection fee" is more than that (no solar, just EV). Your cartel needs to step it up!
For me it's $0.8/kWh during peak, $0.47 off peak, and super off peak of $0.15. I accidentally left a little mini 500W heater on all day, while I was out, costing > 5% of your whole month!
Wow, what the hell.
Yeah I had a similar experience one time. Which is why I laugh when people suggest Anthropic is profitable. Sure, maybe if everyone does API pricing. Which they won’t because it’s so damn expensive. Another way to think about it is API pricing is a glimpse into the future when everyone is dependent on these services and the subscription model price increases start.
I don't get why people talk about ChatGPT as some great saviour though, they're in the same boat but just have more money to burn.
Or have Claude write the code and Gemini review it. (Was using GPT for review until the recent Pentagon thing.)
You can also review the code you ship yourself.
'just API key' lol. just hundreds of dollars at a minimum
Yes. And many companies pay that.
More what to come?
probably more agents to be blocked by anthropic. i've seen theo from t3.gg go through a bunch of loopholes to support claude in his t3code app just so anthropic doesn't sue their asses.
a $3000 AMD395+ will get you pretty close to a open development environment.
There are boards starting in the $1500-$2000 range, and complete systems in the $2500-$2700 range. I actually don't know of any Strix Halo mini PCs that cost $3000, do you?
EDIT: The system I bought last summer for $1980 and just took delivery of in October, Beelink GTR 9 Pro, is now $2999.... wow...
RAM has gone up a lot since last summer.
the boards now are pricier, at least the framework one. I got it for 1700, and now its ~$2400.
not mini PCs, no, but there are laptops that do
I bought mine, a mini PC, for $1400 just six months ago. This bubble will pass.
I'm a https://pi.dev man myself.
Why most of those tools are written in js/ts?
JS is not something that was developed with CLI in mind and on top of that that language does not lend itself to be good for LLM generation as it has pretty weak validation compared to e.g. Rust, or event C, even python.
Not to mention memory usage or performance.
TS is just a boring default.
It’s simply one of the most productive languages. It actually has a very strong type system, while still being a dynamic language that doesn’t have to be compiled, leading to very fast iteration. It’s also THE language you use when writing UIs. Execution is actually pretty fast through the runtimes we have available nowadays.
The only other interpreted language is Python and that thoroughly feels like a toy in comparison (typing situation still very much in progress, very weak ORM situation, not even a usable package manger until recently!).
> It’s also THE language you use when writing UIs
I'm unsure that I agree with this, for my smaller tools with a UI I have been using rust for business logic code and then platform native languages, mostly swift/C#.
I feel like with a modern agentic workflow it is actually trivial to generate UIs that just call into an agnostic layer, and keeping time small and composable has been crucial for this.
That way I get platform native integration where possible and actual on the metal performance.
If Python has a "very weak ORM situation", what is it about the TS ORM scene that makes it stronger by comparison? Is there one library in particular that stands out?
I was going to say that pnpm isn't that old but wikipedia says 2017!
pnpm is amazing for speed and everybody should use it! but even with npm before it, at least it was correct. I had very few (none?) mysterious issues with it that could only be solved by nuking the entire environment. That is more than I can say about the python package managers before uv.
uv + PEP723 is amazing for CLI tools
You download one .py, run it and uv automatically downloads and installs any requirements to a virtual environment and runs it
Has the developer tooling been fixed? Doesn't it use an ephemeral environment? How do editors/LSPs know where to get dependency information?
For a TUI agent, runtime performance is not the bottleneck, not by far. Hackability is the USP. Pi has extensions hotreloading which comes almost for free with jiti. The fact that the source is the shipped artifact (unlike Go/Rust) also helps the agent seeing its own code and the ability to write and load its own extensions based on that. A fact that OpenClaw’s success is in part based on IMO.
I can’t find the tweet from Mario (the author), but he prefers the Typescript/npm ecosystem for non-performance critical systems because it hits a sweet spot for him. I admire his work and he’s a real polyglot, so I tend to think he has done his homework. You’ll find pi memory usage quite low btw.
OK, make sense, but there are also claw clones that are in Rust (and self modifying).
Also python ones would also allow self modifying. I'm always puzzled (and worried) when JS is used outside of browsers.
I'm biased as I find JS/TS rather ugly language compared to anything other basically (PHP is close second). Python is clean, C has performance, Rust is clean and has performance, Java has the biggest library and can run anywhere.
In pi’s case there is a plugin system. It’s much easier to make a self extending agent work with Python or JavaScript than most other languages. JavaScript has the benefit that it has a great typing system on top with TypeScript.
Same.
Pi is refreshingly minimal in terms of system prompts, but still works really well and that makes me wonder whether other harnesses are overdoing. Look at OpenCode's prompts, for instance - long, mostly based on feels and IMO unnecessary. I would've liked to just overwrite OC's system prompts with Pi's (to get other features that Pi doesn't have) but that isn't possible today (without maintaining a custom fork)
Pi is the Emacs of coding AI agents.
It's a pity it's written in TS, but at least it can draw from a big contributor pool.
There is https://eca.dev/ too, which might worth considering, which is a UI agnostic agent, a bit like LSP servers.
I just found out about pi yesterday. It's the only agent that I was able to run on RISC-V. It's quite scary that it runs commands without asking though.
It has zero safeguards by default
But the magic is that it knows how to modify itself, if you need a plan mode you can ask it to implement it :)
Same here!
The simplicity of extending pi is in itself addictive, but even in its raw form it does the job well.
Before finding pi I had written a lot of custom stuff on top of all the provider specific CLI tools (codex, Claude, cursor-agent, Gemini) - but now I don’t have to anymore (except if I want to use my anthropic sub, which I will now cancel for that exact reason)
Same.
I’m sure there’s a more elegant way to say this, but OpenCode feels like an open source Claude Code, while pi feels like an open source coding agent.
> Sessions are stored as trees
that is actually really nice
Pi is a great project, and the lightweight Agent development is really recommended to refer to Pi's implementation method.
Pi is good stuff and refreshingly simple and malleable.
I used it recently inside a CI workflow in GitLab to automatically create ChangeLog.md entries for commits. That + Qwen 3.5 has been pretty successful. The job starts up Pi programatically, points it at the commits in question, and tells it to explore and get all the context it needs within 600 seconds... and it works. I love that this is possible.
I love OpenCode! I wrote a plugin that adds two tools: prune and retrieve. Prune lets the LLM select messages to remove from the conversation and replace with a summary and key terms. The retrieve tool lets it get those original messages back in case they're needed. I've been livestreaming the development and using it on side projects to make sure it's actually effective... And it turns out it really is! It feels like working with an infinite context window.
https://www.youtube.com/live/z0JYVTAqeQM?si=oLvyLlZiFLTxL7p0
Hey I built that into my harness! http://github.com/computerex/z
Long tool outputs/command outputs everything in my harness is spilled over to the filesystem. Context messages are truncated and split to filesystem with a breadcrumb for retrieving the full message.
Works really well.
The infinite context window framing is the right way to think about it. Running inside Claude Code continuously, the prune step matters more than retrieve in practice — most of what gets dropped stays dropped. More useful is being deliberate about what goes in at the start of each loop iteration rather than managing what comes out at the end.
That doesn't sound all that useful to be honest and would likely increase costs overall due to the hit to prompt caching by removing messages
> would likely increase costs overall
Assuming you pay per token, which seems like a really strange workflow to lock yourself into at this point. Neither paid monthly plans nor local models suffer from that issue.
I tried once to use APIs for agents but seeing a counter of money go up and eventually landing at like $20 for one change, made it really hard to justify. I'd rather pay $200/month before I'd be OK with that sort of experience.
Yes I use the $200 per month plan for Claude Code and it's amazing
I assume the usage varies based on prompt caching, but I could be wrong. Why would you assume prompt caching would have zero effect on the subscription usage?
The $20-per-change problem is a workflow problem, not a pricing problem. Batching work into larger well-scoped sessions rather than interactive back-and-forth changes the unit economics significantly. Most people use these tools like a terminal — one command at a time — which is the worst possible cost profile.
Have a look how pi.dev implements /tree. Super useful
That borks the cache and costs you more.
Seems interesting, but at a glance I can't find a repo or a package manager download for this. Have you made it available anywhere?
I found the opencode fork repo, but no plugin seems available so far
https://github.com/Vibecodelicious/opencode
I’ve been extraordinarily productive with this, their $10 Go plan, and a rigorous spec-driven workflow. Haven’t touched Claude in 2 months.
I sprinkle in some billed API usage to power my task-planner and reviewer subagents (both use GPT 5.4 now).
The ability to switch models is very useful and a great learning experience. GLM, Kimi and their free models surprised me. Not the best, not perfect, but still very productive. I would be a wary shareholder if I owned a stake in the frontier labs… that moat seems to be shrinking fast.
> Moat seems to be shrinking fast.
It's been a moving target for years at this point.
Both open and closed source models have been getting better, but not sure if the open source models have really been closing the gap since DeepSeek R1.
But yes: If the top closed source models were to stop getting better today, it wouldn't take long for open source to catch up.
The moat is having researchers that can produce frontier models. When OpenCode starts building frontier models, then I'd be worried; otherwise they're just another wrapper
Of course, my point is that these trailing models are close behind, and cost me a lot less, and work great with harnesses like OpenCode.
"OpenCode Go" (a subscription) lets you use lots of hosted open-weights frontier AI models, such as GLM-5 (currently right up there in the frontier model leaderboards) for $10 per month.
GLM is benchmaxxed, leaderboards don't mean much anymore
Also most of the development experience is in the harness, the models aren’t as important anymore
Can you talk more about how you leverage higher quality models for the stuff that counts? Anywhere I can read more on the philosophy of when to use each?
Sure happy to share. It’s been trial and error, but I’ve learned that for agents to reliably ship a large feature or refactor, I need a good spec (functional acceptance criteria) and I need a good plan for sequencing the work.
The big expensive models are great at planning tasks and reviewing the implementation of a task. They can better spot potential gotchas, performance or security gaps, subtle logic and nuance that cheaper models fail to notice.
The small cheap models are actually great (and fast) at generating decent code if they have the right direction up front.
So I do all the spec writing myself (with some LLM assistance), and I hand it to a Supervisor agent who coordinates between subagents. Plan -> implement -> review -> repeat until the planner says “all done”.
I switch up my models all the time (actively experimenting) but today I was using GPT 5.4 for review and planning, costing me about $0.4-$1 for a good sized task, and Kimi for implementation. Sometimes my spec takes 4-5 review loops and the cost can add up over an 8 hour day. Still cheaper than Claude Max (for now, barely).
Each agent retains a fairly small context window which seems to keep costs down and improves output. Full context can be catastrophic for some models.
As for the spec writing, this is the fun part for me, and I’ve been obsessing over this process, and the process of tracking acceptance criteria and keeping my agents aligned to it. I have a toolkit cooking, you can find in my comment history (aiming to open source it this week).
How are you managing context?
I'm building a full stack web app, simple but with real API integrations with CC.
Moving so fast that I can barely keep a hold on what I'm testing and building at the same time, just using Sonnet. It's not bad at all. A lot of the specs develop as I'm testing the features, either as an immediate or a todo / gh issue.
How can you manage an agentic flow?
I wrote something about that: https://www.stavros.io/posts/how-i-write-software-with-llms/
I don't use it for coding but as an agent backend. Maybe opencode was thought for coding mainly, but for me, it's incredibly good as an agent, especially when paired with skills, a fastapi server, and opencode go(minimax) is just so much intelligence at an incredibly cheap price. Plus, you can talk to it via channels if you use a claw.
I see great potential in this use case, but haven’t found that many documented cases of people doing this.
Do you have resources you can point to / mind sharing your setup? What were the biggest problems / delights doing this?
By "agent" you mean what?
Coding is mostly "agentic" so I'm bit puzzled.
It's defined in opencode docs, but it's an overall cross industry term for custom system prompt with it's own permissions:
https://opencode.ai/docs/agents/
I'd really like to get more clarification on offline mode and privacy. The github issues related to privacy did not leave a good feeling, despite being initially excited. Is offline mode a thing yet? I want to use this, but I don't want my code to leave my device.
Related https://github.com/anomalyco/opencode/issues/10416
The only thing I'm wondering is if they have eval frameworks (for lack of a better word). Their prompts don't seem to have changed for a while and I find greater success after testing and writing my own system prompts + modification to the harness to have the smallest most concise system prompt + dynamic prompt snippets per project.
I feel that if you want to build a coding agent / harness the first thing you should do is to build an evaluation framework to track performance for coding by having your internal metrics and task performance, instead I see most coding agents just fiddle with adding features that don't improve the core ability of a coding agent.
You can't write your system prompt in opencode, there's no API to override the default anthropic.txt as far as I'm aware.
I considered creating a PR for that, but found that creating new agents instead worked fine for me.
I've forked it locally, to be honest I haven't merged upstream in a while as I haven't seen any commits that I found relevant and would improve my usage, they seem to work on the web and desktop version which I don't use.
The changes I've made locally are:
- Added a discuss mode with almost on tools except read file, ask tool, web search only based no heuristics + being able to switch from discuss to plan mode.
Experiments:
- hashline: it doesn't bring that much benefit over the default with gpt-5.4.
- tried scribe [0]: It seems worth it as it saves context space but in worst case scenarios it fails by reading the whole file, probably worth it but I would need to experiment more with it and probably rewrite some parts.
The nice thing about opencode is that it uses sqlite and you can do experiments and then go through past conversation through code, replay and compare.
[0] https://github.com/sibyllinesoft/scribe
> You can't write your system prompt in opencode
Now I just started looking into OpenCode yesterday, but seems you can override the system prompts by basically overloading the templates used in for example `~/.opencode/agents/build.md`, then that'd be used instead of the default "Build" system prompt.
At least from what I gathered skimming the docs earlier, might not actually work in practice, or not override all of it, but seems to be the way it works.
I wish the team would be more responsive to popular issues - like inability to provide a dynamic api key helper like claude has. This one even has a PR open: https://github.com/anomalyco/opencode/issues/1302
i've been using this as my primary harness for llama.cpp models, Claude, and Gemini for a few months now. the LSP integration is great. i also built a plugin to enable a very minimal OpenClaw alternative as a self modifying hook system over IPC as a plugin for OpenCode: https://github.com/khimaros/opencode-evolve -- and here's a deployment ready example making use of it which runs in an Incus container/VM: https://github.com/khimaros/persona
Very cool! I have been using opencode, as almost everybody else in the lab is using codex. I found the tools thing inside your own repo amazing but somehow I could not get it to reliably get opencode to write its own tools. Seems also a bit scary as there is pretty much not much security by default. I am using it in a NixOS WSL2 VM
You could try something like this https://github.com/andersonjoseph/jailed-agents
I'm actually moving to containerised isolation. I realised the agents waste too much time trying to correctly install dependencies, not unlike a normal nixos user.
I've used both. I stuck with Claude Code, the ergonomics are better and the internals are clearly optimized for Opus which I use daily, you can feel it. That said OpenCode is still a very good alternative, well above Codex, Gemini CLI or Mistral Vibe in my experience.
Dax post on x:
"we see occasional complaints about memory issues in opencode
if you have this can you press ctrl+p and then "Write heap snapshot"
Upload here: https://romulus.warg-snake.ts.net/upload
Original post:https://x.com/i/status/2035333823173447885
What would be the advantage using this over say VSCode with Copilot or Roo Code? I need to make some time to compare, but just curious if others have a good insight on things.
In terms of output, it's comparable. In terms of workflow, it suits my needs a lot more as a VIM terminal user.
I started out using VSCode with their Claude plugin; it seemed like a totally unnecessary integration. A better workflow seems to just run Claude Code directly on my machine where there are fewer restrictions - it just opens a lot more possibilities on what it can do
Aren’t those in-editor tools? Opencode is a CLI
Ok I get it now, same with the vim comment above, it seems VSCode has the more IDE setup while OpenCode is giving the vim nerdtree vibe? I'll have to take a look, it makes sense to possibly have both for different use cases I guess.
No that’s not it. Opencode is a pure terminal app, your interaction is by typing prompts and slash commands. You can also script prompts to it.
There are probably IDE plugins that feed prompts or context in based on your interaction with the editor.
Stupid question, but are there models worth using that specialize in a particular programming language? For instance, I'd love to be able to run a local model on my GPU that is specific to C/C++ or Python. If such a thing exists, is it worth it vs one of the cloud-based frontier models?
I'm guessing that a model which only covers a single language might be more compact and efficient vs a model trained across many languages and non-programming data.
Months ago I tested a concept revolving this issue and made a weird MCP-LSP-LocalLLM hybrid thing that attempts to enhance unlucky, fast changing, or unpopular languages (mine attempts with Zig)
Give it a look, maybe it could inspire you: https://github.com/fulgidus/zignet
Bottom-line: fine-tuning looks like the best option atm
I'm currently experimenting with (trying to) fine tune Qwen3.5 to make it better at a given language (Nim in this case); but I am quite bad at this, and honestly am unsure if it's even really fully feasible at the scale I have access to. Certainly been fun so far though, and I have a little Asus GX10 box on the way to experiment some more!
Been playing around with fine-tuning models for specific languages as well (Clojure and Rust mostly), but the persistent problem is high quality data sets, mostly I've been generating my own based on my own repositories and chat sessions, what approach are you taking for gathering the data?
Have you looked on HF? Here's one that is fine tuned on Rust https://huggingface.co/Fortytwo-Network/Strand-Rust-Coder-14...
Yeah I have a body of work of 15 years, plus I’ve been building labelled sets from open source (the source code isn’t quite enough on its own)
Now I’m using that to generate synthetic sets and clean it up, but man I’m struggling hah. Fun though.
My own experience trying many different models is that general intelligence of the model is more important.
If you want it to stick to better practices you have to write skills, provide references (example code it can read), and provide it with harnessing tools (linters, debuggers, etc) so the agent can iterate on its own output.
I'd be interested in this too. I think that's what post-training can achieve but I've never looked into it.
OpenCode works awesome for me. The BigPickle model is all I want. I do not throw some large work at the agent that requires lot of reasoning, thinking or decision making. It's my role to chop the work down to bite-size and ask the fantastic BigPickle to just do the damn coding or bit of explaining. It works very well with interactive sessions with small tasks. Not giving something to work over night.
I used Claude with paid subscription and codex as well and settled to OpenCode with free models.
Can someone explain how Claude Code can instantly determine what file I have open and what lines I have selected in VS Code even if it's just running in a VS Code terminal instance, yet I cannot for the life of me get OpenCode to come anywhere close to that same experience?
The OpenCode docs suggest its possible, but it only works with their extension (not in an already open VS Code terminal) with a very specific keyboard shortcut and only barely at that.
I started my own fully containerized coding agent 100% in Go recently. Looking for testers: https://github.com/aduermael/herm
i like the containerization idea. i wish you used the opencode cli as the actual underlying agent.
What do you like particularly about the opencode cli?
OpenCode is the almost good IDE I need.
What does well: helps context switching by using one window to control many repos with many worktrees each.
What can do better? It's putting AI too much in control? What if I want to edit a function myself in the workspace I'm working on? or select a snippet and refer that in the promp? without that I feel it's missing a non-negotiable feature.
Do you think the design direction of “chat first” is compatible with editor first? I don’t know if any tools do both well. Seems like a fork in the road, design wise.
Since this is blowing up, gonna plug my opencode/claude-code plugin that allows you to annotate LLMs plans like a Google doc with strikethroughs, comments, etc. and loop with your agent until you're happy with the plan.
https://github.com/ndom91/open-plan-annotator
The decision to build this as a TUI rather than a web app is interesting. Terminal-native tools tend to get out of the way and let you stay in flow -- curious how the context management works when you have a large codebase, do you chunk by file or do something smarter?
It’s both! The core is implemented as a server and any UI (the TUI being one) can connect to it.
It’s actually “dumber” than any of your suggestions - they just let the agent explore to build up context on its own. “ls” and “grep” are among the most used discovery tools. This works extraordinarily well and is pretty much the standard nowadays because it lets the agent be pretty smart about what context it pulls in.
That's my favorite CLI agent, over codex, claude, copilot and qwen-code.
It has beautified markdown output, much more subagents, and access to free models. Unlike claude and codex. Best is opencode with GitHub opus 4.6, but the fun only lasts for a day, then you're out of tokens for a month.
This replaced Aider for me a couple months back.
I use it with Qwen 3.5 running locally when my daily limits run out on my other subscriptions.
The harness is great. Local models are just slow enough that the subscription models are easier to use. For most of my tasks these days, the model's capability is sufficient; it is just not as snappy.
Could you say more about the differences between Aider and OpenCode?
I briefly dabbled with Aider some months back but never got any real work done with it. Without installing each one of these new tools I'm having trouble grokking what is changing about them that moves the LLM-assisted software dev experience forward.
One thing I like with Aider is the fact that I can control the context by using /add explicitly on a subset of files. Can you achieve the same wit OpenCode ?
Yes, using at @ sign; CC and Codex use this too.
I feel like I haven't really needed to manage context with newer models. Rarely I will restart the session to clear out out.
I'm curious: I'venever touched cloud models beyond a few seconds. I run a AMD395+ with the new qwen coder. Is there any intelligence difference, or is it just speed and context? At 128GB, it takes quite awhile before getting context wall.
There's a difference in intelligence. However for 90% of what I'm doing I don't really need it. The online models are just faster.
I just did a one hour vibe session today, ripping out a library dependency and replacing it with another and pushing the library to pypi. I should take my task list and let the local model replicate the work and see how it works out.
The security concerns here are real but not unique to OpenCode. Most AI coding agents have the same fundamental problem: they need broad file system access to be useful, but that access surface is also the attack surface. The config-from-web issue is particularly bad because it's essentially remote code execution through prompt injection.
What I'd want to see from any of these tools is a clear permissions model — which files the agent can read vs write, whether it can execute commands, and an audit log of what it actually did. Claude Code's hooks system at least gives you deterministic guardrails before/after agent actions, but it's still early days for this whole category.
I created a tool for this: https://github.com/Use-Tusk/fence
Same thoughts - I wanted a "permission manager" that defines a set of policies agnostic to coding agents. It also comes with "monitor mode" that shows operations blocked, but not quite an audit log yet though.
This is another one of OpenCode’s current weak points in the security complex: They consider permissions a “UX feature” rather than actual guardrails. The reasoning is that you’re giving the agent access to the shell, so it’ll be able to sidestep everything.
This is of course a cop-out: They’re not considering the case in which you’re not blindly doing that.
Fun fact: In the default setup, the agent can fully edit all of the harnesses files, including permissions and session history. So it’s pretty trivial for it to a) escalate privileges and then even b) delete evidence of something nefarious happening.
It’s pretty reckless and even pretty easy to solve with chroot and user permissions. There just has been (from what I see currently) relatively little interest from the project in solving this issue.
Granted, I just started playing around with OpenCode (but been using Codex and Claude Code since they were initially available, so not first time with agents), but anyways:
> they need broad file system access to be useful, but that access surface is also the attack surface
Do they? You give them access to one directory typically (my way is to create a temporary docker container that literally only has that directory available, copied into the container on boot, copied back to the host once the agent completed), and I don't think I've needed them to have "broad file system access" at any point, to be useful or otherwise.
So that leads me to think I'm misunderstanding either what you're saying, or what you're doing?
This is the way. If you’re not running your agent harness/framework in a container with explicit bind mounts or copy-on-build then you’re doing it wrong. Whenever I see someone complain about filesystem access and sequirity risk it’s a clear signal of incompetence imo.
> container with explicit bind mounts
Someone correct me if I'm wrong, but if you're doing bind-mounts, ensure you do read-only, if you're doing bi-directional bind mounts with docker, the agent could (and most likely know how to) create a symlink that allows them to browse outside the bind mount.
That's why I explicitly made my tooling do "Create container, copy over $PWD, once agent completes, copy back to $PWD" rather than the bind-mount stuff.
> create a symlink that allows them to browse outside the bind mount Could you reproduce that? iiuc the symlink that the agent creates should follow to the path that's still inside the container.
Codex has some OS-level sandboxing by default that confines its actions to the current workspace [1].
OpenCode has no sandboxing, as far as I know.
That makes Codex a much better choice for security.
[1] https://developers.openai.com/codex/concepts/sandboxing
Greywall/Greyproxy aims to address this. I haven't tried it yet though.
https://greywall.io/
Or just run it in your VPS?
I built a product solving this problem about a year ago, basically a serverless, container-based, NATed VScode where you can eg "run Claude Code" (or this) in your browser on a remote container.
There's a reason I basically stopped marketing it, Cursor took off so much then, and now people are running Claude/Codex locally. First, this is something people only actually start to care about once they've been bitten by it hard enough to remember how much it hurt, and most people haven't got there yet (but it will happen more as the models get better).
Also, the people who simultaneously care a lot about security and systems work AND are AI enthusiasts AND generally highly capable are potentially building in the space, but not really customers. The people who care a lot about security and systems work aren't generally decision makers or enthusiastic adopters of AI products (only just now are they starting to do so) and the people who are super enthusiastic about AI generally aren't interested in spending a lot of time on security stuff. To the extent they do care about security, they want it to Just Work and let them keep building super fast. The people who are decision makers but less on the security/AI trains need to this happen more, and hear about the problem from other executives, before they're willing to spend on it.
To the extent most people actualy care about this, they still want to Just Work like they do now and either keep building super fast or not thinking about AI at all. It's actually extremely difficult to give granular access to agents because the entire point is them acting autonomously or keeping you in a flow state. You either need to have a really compatible threat model to doing so (eg open source work, developer credentials only used for development and kept separate from production/corp/customer data), spend a lot of time setting things up so that agents can work within your constraints (which also requires a willingness to commit serious amounts of time or resources to security, and understanding of it), or spend a lot of time approving things and nannying it.
So right now everybody is just saying, fuck it, I trust Anthropic or Microsoft or OpenAI or Cursor enough to just take my chances with them. And people who care about security are of course appalled at the idea of just giving another company full filesystem access and developer credentials in enterprises where the lack of development velocity and high process/overhead culture was actually of load-bearing importance. But really it's just that secure agentic development requires significant upfront investment in changing the way developers work, which nobody is willing to pay for yet, and has no perfect solutions yet. Dev containers were always a good idea and not that much adopted either, btw.
It takes a lot more investment in actually providing good permissions/security for agent development environments still too, which even the big companies are still working on. And I am still working on it as well. There's just not that much demand for it, but I think it's close.
I‘m a big fan of OpenCode. I’m mostly using it via https://github.com/prokube/pk-opencode-webui which I built with my colleague (using OpenCode).
Open Code has been the backbone of our entire operation (we used Claude Code before it, and Cursor before that).
Hugely grateful for what they do.
What caused the switch? Also, are you still trying to use Claude models in OpenCode?
Sorry, I missed part of your question:
What caused the switch was that we're building AI solutions for sometimes price-conscious customers, so I was already familiar with the pattern of "Use a superior model for setting a standard, then fine-tuning a cheaper one to do that same work".
So I brought that into my own workflows (kind of) by using Opus 4.6 to do detailed planning and one 'exemplar' execution (with 'over documentation' of the choices), then after that, use Opus 4.6 only for planning, then "throw a load of MiniMax M2.5s at the problem".
They tend to do 90% of the job well, then I sometimes do a final pass with Opus 4.6 again to mop up any issues, this saves me a lot of tokens/money.
This pattern wasn't possible with Claude Code, thus my move to Open Code.
You can access anthropic models with subscription pricing via a copilot license.
Pretty sure that's against TOS.
Edit: it's not. https://github.blog/changelog/2026-01-16-github-copilot-now-...
They must be eating insane amounts of $$$ for this. I wouldn't expect it to last
No, Claude on GitHub Copilot is billed at 3X the usage rate of the other models e.g. GPT-5.4 and you get an extremely truncated context window.
See https://models.dev for a comparison against the normal "vanilla" API.
Yes I regularly plan in Opus 4.6 and execute in “lesser” models ie MiniMax
I've used it but recently moved back to plain claude code. We use claude at the company and weirdly the experience has become less and less productive using opencode. I'm a bit sad about it as it was the first experience that really clicked and got great results out of. I'm actually curious if Anthropic knows which client is used and if they negatively influence the experience on purpose. It's very difficult to prove because nothing about this is exact science.
I think Anthropic just highly RL’s their model to work best with it’s Claude Code’s particular ways of going about things.
All the background capability Claude code now has makes things way more complex and I saw a meaningful improvement with 4.6 versus 4.5, so imagine other harnesses will take time to catch up.
I tried to use it but OpenCode won't even open for me on Wayland (Ubuntu 24.04), whichever terminal emulator I use. I wasn't even aware TUI could have compatibility issues with Wayland
> I wasn't even aware TUI could have compatibility issues with Wayland
They shouldn't, as long as your terminal emulator doesn't. Why do you think it's Wayland related?
Strange. I've been running it on several different ubuntu 24 04 machines with standard terminal with no issues.
This shouldn't be related to Wayland.
It works perfectly fine on Niri, Hyprland and other Wayland WMs.
What problem do you have?
Blank screen, and it's referenced in the official docs as potentially a Wayland issue https://opencode.ai/docs/troubleshooting/#linux-wayland--x11...
I didn't dig further
Seems like there's many github issues about this actually
https://github.com/anomalyco/opencode/issues/14336
https://github.com/anomalyco/opencode/issues/14636
https://github.com/anomalyco/opencode/issues/14335
I've run into that issue while developing https://soloterm.com.
If you respond twice to their theme query probes, the whole thing bricks. Or if you're slightly out of order. It's very delicate.
Definitely not Wayland related, or so I doubt. I'm on wayland and never had any issues, and it's a TUI, where the terminal emulator does or does not do GPU work. What led you to that conclusion?
This issue: https://github.com/anomalyco/opencode/issues/9505
And then the official docs: https://opencode.ai/docs/troubleshooting/#linux-wayland--x11...
> Linux: Wayland / X11 issues
> On Linux, some Wayland setups can cause blank windows or compositor errors.
> If you’re on Wayland and the app is blank/crashing, try launching with OC_ALLOW_WAYLAND=1.
> If that makes things worse, remove it and try launching under an X11 session instead.
OC_ALLOW_WAYLAND=1 didn't work for me (Ubuntu 24.04)
Suggesting to use a different display server to use a TUI (!!) seems a bit wild to me. I didn't put a lot of time into investigating this so maybe there is another reason than Wayland. Anyway I'm using Pi now
https://github.com/anomalyco/opencode/issues/14636
That issue points out that it is probably a dependency problem.
The other problem is that they let a package manager block the UI and either swallow hard errors or unable to progress on soft errors. The errors are probably (hopefully) in some logs.
A dev oriented TUI should report unrecoverable errors on screen or at least direct you to the logs. It's not easy to get right, but if you dare to do it isn't rocket science either. They didn't dare.
There's a desktop app which uses Tauri. Unrelated to the TUI.
That is wild. Thanks for the info.
Probably vibe coded
Some of the more recent versions of it had memory leaks so you couldn't just leave it on in the background
I had to abandon it because of the memory leak, it would fill up all my memory in a matter of minutes. The devs don't seem to pay it much attention: https://github.com/anomalyco/opencode/issues/5363
I've been using opencode for a few months and really like it, both from a UX and a results perspective.
It started getting increasingly flaky with Anthropic's API recently, so I switched back to Claude Code for a couple of days. Oh my, what a night and day difference. Tokens, MCP use, everything.
For anyone reading at OpenAI, your support for OpenCode is the reason I now pay you 200 bucks a month instead.
I've been paying OpenAI 200 bucks a month for what feels like forever by now, but used OpenCode for the first time yesterday, been using Codex (and Claude Code from time to time, to see if they've caught up with Codex) since then.
But I don't use MCP, don't need anything complicated, and not sure what OpenCode actually offers on top. The UI is slightly nicer (but oh so much heavier resource usage), both projects source code seems vibecoded and the architecture is held together with hopes and dreams, but in reality, minor difference really.
Also, didn't find a way in OpenCode to do the "Fast Mode" that Codex has available, is that just not possible or am I missing some setting? Not Codex-Spark but the mode that toggles faster inference.
Have they "squatted" the name? It's the same name for the digital Sovereignty initiative in Germany
https://opencode.de/
If it was a somewhat unique name, then yeah maybe. But "opencode" is probably as generic as you could make it, hard to claim to be "squatting" something so well used already... Earliest project on GitHub named "opencode" seems to date back to 2010, but I'm sure there are even earlier projects too: https://github.com/search?q=opencode&type=repositories&s=upd...
you'll be surprised the name was actually a controversy on x/twitter since opencode was originally another dev's idea who joined the charmcli team. they wanted to keep that name but dax somehow (?) ended up squatting it. the charmcli team has renamed their tool to "crush" which matches their other tools a lot better than "opencode"
oh yea that whole drama turned me off from this project. dax guy seems to be some sort of grumpy cat.
I'd love for all these tools to standardise on the structure of plugins / skills / commands / hooks etc., so I can swap between them to compare without feeling handicapped!
I wish they would add back support for anthropic max/pro plans via calling the claude cli in -p mode. As I understand thats still very much allowed usage of claude code cli (as you are still using claude cli as it was intended anyway and fixes the issue of cache hits which I believe were the primary reason anthropic sent them the c&d). I love the UX from OpenCode (I loved setting it up in web mode on my home server and code from the web browser vs doing claude code over ssh) but until I can use my pro/max subscription I can't go back, the API pricing is way too much for my third world country wallet.
They had that?! I saw that some people wrote skills and plugins to call claude cli and gemini cli to still be able to use the subscription. I would also wish that this was supported out of the box, something similar to goose cli providers or acp providers (https://block.github.io/goose/docs/guides/acp-providers). But I don't want to spend testing yet another agent harness or change the workflow when I somewhat got used to one way of working on things (the churn is real).
I guess you could look into my plugin for that use case of CC inside opencode: https://github.com/unixfox/opencode-claude-code-plugin
Is there any initiative to port it to rust (or preferably golang) and remove weird tracking/telemetry?
I guess golang is better since we need goroutines that will basically wait for i/o and api calls.
https://github.com/charmbracelet/crush ?
You can do it.
Anecdotal pros and one annoyance:
- GH copilot API is a first class citizen with access to multiple providers’ models at a very good price with a pro plan - no terminal flicker - it seems really good with subagents - I can’t see any terminal history inside my emacs vterm :(
Question: How do we use Agents to Schedule and Orchestrate Farming and Agricultural production, or Manufacturing assembly machines, or Train rail transportation, or mineral and energy deposit discovery and extraction or interplanetary terraforming and mining, or nuclear reactor modulation, or water desalination automation, or plutonium electric fuel cell production with a 24,000 year half-life radiation decay, or interplanetary colonization, or physics equation creation and solving for faster-than-light travel?
- With love The Official Pink Eye #ThereIsNoOther
I don’t know why people use opencode. I mean it’s better than copilot but it’s pretty terrible in general when there are better options available.
Rather than listing what tooling you think is worse than OpenCode, wouldn't it make sense to list what tooling you think is better?
Amp. CC. Codex. They all have a better harness.
Interested if these TUI agent systems have any unique features. They all seem to be shipping the standard agent swarm/bg agent approach.
"This guy is coding everything in the terminal, he must be really good!"
I’ve been having a very good experience with OpenCode and Kimi 2.5. It’s fast enough and smart enough that I can stay in a state of flow.
The reason I'm switching again next month, from Claude back to OpenAI.
Yeah, support the company that promised to help your government illegally mass surveil and mass kill people, because they support a use case slightly better than the non-mass-murdering option.
Both of them promised to help their government illegally mass surveil and mass kill people. One of them just didn't want it done to US citizens.
I'm not a US citizen, so both companies are the same, as far as I'm concerned.
You are absolutely correct that both are evil ... as are most corporations.
Still, I feel like "will commit illegal mass murder against their own citizens" is a significant enough degree more evil. I think lots of corporations will help their government murder citizens of other countries, but very few would go so far as to agree to murder their own (fellow) citizens ... just to get a juicy contract.
I see your viewpoint but, to me, "both will happily murder you but one is better because they won't murder ME!" isn't very compelling. Like, I get it, but also it changes nothing for me. They're both bad.
It's not about "won't murder me" it's about "won't murder their own tribe". Humans are very tribal creatures, and we have all sorts of built-in societal taboos about betraying our tribe.
We also have taboos against betraying/murdering/whatever people of other tribes, but those taboos are much weaker and get relaxed sometimes (eg. in war). My point is, it takes significantly more anti-social (ie. evil) behavior to betray your own tribe, in the deepest way possible, than it does to do horrible things to other tribes.
This is just as much true for Russians murdering Ukranians as Ukranians murdering Russians, or any other conflict group: almost all Russians would consider a Russian who helps kill Russians to be more evil than a Russian who kills Ukranians (and vice versa).
Right, but I consider someone who'll murder exclusively other tribes to be infinitely closer to someone who'll murder their own tribe than to someone who won't murder anyone.
watching trump get elected twice; you can see why americanos have no problemos with mental backflips when choosing.
But you're still choosing evil when you could try local models
Will you send me an H100?
Are you doing something that actually demands it? Have you tried local models on either the mac or AMD395+?
I will be able to do something that demands it once I have it ;)
Most people who win the lottery are poor again within the decade.
Will you send me an AMD395+ or a new Mac that can handle the local models? That would probably be enough for me.
That a gross exaggeration. But to your point, I could say the same for almost any product I use from Big Tech, every laptop company I buy my hardware from, etc. I'm sure the same applies to you. I can't fight every vendor all the time. For now I pick what works best for my use case.
> mass kill people
https://www.washingtonpost.com/technology/2026/03/04/anthrop...
You're right, Anthropic shouldn't have even taken a moral stance here at all. They should have just gone full send and allowed everything, because there will never be satisfying some people. Why even try?
OpenCode is an awesome tool.
Many folks from other tools are only getting exposed to the same functionality they got used to, but it offers much more than other harnesses, especially for remote coding.
You can start a service via `opencode serve`, it can be accessed from anywhere and has great experience on mobile except a few bugs. It's a really good way to work with your agents remotely, goes really well with TailScale.
The WebUI that they have can connect to multiple OpenCode backends at once, so you may use multiple VPS-es for various projects you have and control all of them from a single place.
Lastly, there's a desktop app, but TBH I find it redundant when WebUI has everything needed.
Make no mistakes though, it's not a perfect tool, my gripes with it:
- There are random bugs with loading/restoring state of the session
- Model/Provider selection switch across sessions/projects is often annoying
- I had a bug making Sonnet/Opus unusable from mobile phone because phone's clock was 150ms ahead of laptop's (ID generation)
- Sometimes agent get randomly stuck. It especially sucks for long/nested sessions
- WebUI on laptop just completely forgot all the projects at one day
- `opencode serve` doesn't pick up new skills automatically, it needs to be restarted
Interesting timing — I've been building on Cloudflare Workers with edge-first constraints, and the resource footprint of most AI coding tools is striking by comparison. A TypeScript agent that uses 1GB+ RAM for a TUI feels like the wrong abstraction. The edge computing model forces you to think differently about state, memory, and execution — maybe that's where lighter agentic tools will emerge.
aider.chat was my entry to agentic coding. OpenCode followed. Not looking back.
Can anyone clarify how this compares with Aider?
Being able to assign different models to subagents is the feature I've been wanting. I use Claude Code daily and burning the same expensive model on simple file lookups hurts. Any way to set default model routing rules, or is it manual per task?
In Claude Code, you can use the (undocumented command) "/model opusplan" to use opus for planning and sonnet for development
With OpenCode, I've found that I can do this by defining agents, assigning each agent a specifically model to use. Then K manually flip to that agent when I want it or define some might rules in my global AGENTS.nd file to gives some direction and OpenCode will automatically subtask out to the agent, which then forces the use of the defined model.
If you are doing data engineering, there is a specific fork of Open Code with an agentic harness for data tasks: https://github.com/AltimateAI/altimate-code
The maintaining team is incredibly petty though. Tantrums when they weren't allowed to abuse Claude subscriptions and had to use the API instead. They just removed API support entirely.
> we did our best to convince anthropic to support developer choice but they sent lawyers
https://x.com/i/status/2034730036759339100
Anthropic has zero problems with API billing, there's no chance they told him to rip that out.
Reading through his X comments and GitHub comments he is behaving immaturely. I don't trust what he's saying here. Ripping out Claude API support was just throwing a tantrum. Weird given his age - he's old enough to be more mature.
‘abuse’. The same rate limits apply, the requests still go to the same endpoints.
Even as a CC user I’m glad someone is forcing the discussion.
My prediction: within two years ‘model neutrality’ will be a topic of debate. Creating lock-in through discount pricing is anti-competitive. The model provider is the ISP; the tool, the website.
> The same rate limits apply, the requests still go to the same endpoints.
That is not the point. That is a mere technicality.
You signed a contract. If you don't ignore the terms of the contract to use the product in a way that is explicitly prohibited, you're abusing the product. It is as simple as that.
They offer a separate product (API) if you don't like the terms of the contract.
Also, if you really want to get technical: the limits are under the assumption that caching works as intended, which requires control of the client. 3P clients suck at caching and increase costs. But that is not the overarching point.
> Creating lock-in through discount pricing is anti-competitive.
Literally everyone does this. OpenAI is doing this with Codex, far more than Anthropic is. It's not great but players much bigger than Anthropic are using discount pricing to create an anti-competitive advantage.
> But that is not the overarching point.
Because that could be easily resolved by factoring % cache hits into the usage limits.
> Literally everyone does this.
Never a strong justification, much as I like Anthropic in general.
Why is the 'Mercedes gas station' selling gas 85% cheaper but only to Mercedes drivers?
Why is the 'Apple electric company' selling cheaper electricity to households with Apple devices?
They're not the strongest analogies, I'll admit, but that's what it smells like to me.
> Because that could be easily resolved by factoring % cache hits into the usage limits.
Absolutely not, you are not thinking from a product perspective at all.
You might not want to capture cache % hits in usage limits because there may be some edge cases you want to support that have low hits even with an optimized client. Maybe your caching strategy isn't perfect yet, so you don't count hits to keep a good product experience going.
OSS clients that freeload on the subscription break your ability to support these use cases entirely. Now you have to count cache hits at the expense of everyone else. It is a classic case of some people ruining the experience for everyone.
> Why is the 'Apple electric company' selling cheaper electricity to households with Apple devices?
Why does Netflix not let you use your OSS hacked client of choice with your subscription?
> Literally everyone does this. OpenAI is doing this with Codex, far more than Anthropic is.
And yet, OpenAI have publicly said they welcome OpenCode users to use their subscription package. So how are they being anti-competitive "far more" than Anthropic?
> And yet, OpenAI have publicly said they welcome OpenCode users to use their subscription package.
It's a PR stunt. They'll eat the costs for a bit, once they've cornered the market they'll do the same thing as Anthropic.
Agree, I find it hard to support them when the team is so obnoxious on X.
API support was never removed
For some reason opencode does not have option to disable streaming http client, which renders some inference providers unavailable...
There's also a request and a PR to add such option but it was closed due to "not adhering to community standards"
I had been using open code and admire they effort to create something huge and help a lot of developers around the world, connecting LLM our daily work without use a browser!
The MCP (Model Context Protocol) support is what makes this interesting to me. Most coding agents treat the file system and shell as the only surfaces — MCP opens up the possibility of connecting to any structured data source or API as a first-class tool without custom integration work each time.
Curious how the context window management works in practice. With large repos, the "what files to include" problem tends to dominate — does it have a strategy beyond embedding-based retrieval, or is that the main approach here?
Why is this upvoted again on hacker news this is an old thing
Because this site is basically dead for any other subject than vibecoding and AI agents.
I want to love this, but the "just install it globally, what could go wrong?" is simply not happening for an AI-written codebase. Open Source was never truly "you can trust it because everyone can vet it", so you had to do your due diligence. Now with AI code bases, that's "it might be open source, but no one actually knows how it works and only other AIs can check if it's safe because no one can read the code". Who's getting the data? No idea. How would you find out? I guess you can wireshark your network? This is not a great feeling.
why is this trending, we've been using it since its beta
Use it with zed
Does it support hybrid models, for e.g deep research by Model 1 vs faster response from Model2
Yes
I haven't been able to successfully get their CLI to reliably edit files when using local models, anybody else having the same problem?
I've been using opencode for months with codex. best combo I've tried so far
I reach for OpenCode + Kimi to save tokens on lower priority stuff and because it's quite fast on Fireworks AI.
I'm 90% sure Fireworks serves up quantized models.
OpenX is becoming a bit like that hindu symbol associated with well being..
Geminis cli is clearly a fork of it btw
No because Gemini CLI is slow and barely functioning.
It is clearly not. Why would you think so?
easily the best one
yep
Claude Code subscription is still usable, but requires plugin like https://github.com/griffinmartin/opencode-claude-auth
Or just don't abuse the subscription and use the API instead.
Sure but will you get banned by anthropic anyway?
Things that make an an OpenCode fanboy 1. OpenCode source code is even more awesome. I have learned so much from the way they have organized tools, agents, settings and prompts. 2. models.dev is an amazing free resource of LLM endpoints these guys have put together 3. OpenCode Zen almost always has a FREE coding model that you can use for all kinds of work. I recently used the free tier to organize and rename all my documents.
What is this pervasive use of yield*? I've been writing Typescript for quiet some time and I've never seen yield* used in this way: https://github.com/anomalyco/opencode/blob/dev/packages/open...
They are using effect-ts which uses yield as their equivalent to haskells Do notation.
I personally like this better than claude code
Do they have any sandbox out of the box?
I use bubblewrap. This ensures it only has access to the current working directory and its own configuration. No ability to commit or push (since it doesn't have access to ssh keys) or try to run aws commands (no access to awscli configuration) and so on. It can read anything from my .envrc, since it doesn't have access to direnv or the parent directory. You could lock down the network even further if you wanted to limit web searches.
I built Fence for this! https://github.com/Use-Tusk/fence
fence -t code -- opencode
nope - most folks wrap it in nono: https://nono.sh/docs/cli/clients/opencode
I’m happy with the one I built. (ZDX)
Honestly I was a Claude code only guy for a while. I switched to opencode and I’m not going back.
IMO, the web UI is a killer feature - it’s got just enough to be an agent manager - without any fluff. I run it on my remote VMs and connect over HTTP.
I feel like Anthropic really need to fork this for Claude Code or something. The render bugs in Claude Code drive me nuts.
OpenCode feels like the “open-source Copilot agent” moment the more control, hackability, and no black-box lock-in.
opus/sonnet 4.6 can be used in opencode with a github copilot subscription
Does github copilot ToS allow this?
They officially support OpenCode: https://github.blog/changelog/2026-01-16-github-copilot-now-...
This is very interesting. This could allow custom harnesses to be used economically with Opus. Depending on the usage limits, this may be cheaper than their API.
I don't see why not. It's just using the Github Copilot API.
OpenCode vs Aider vs Crush?
OpenCode, by reason of plugins alone, is better than all of them.
isn't this the one with default-on need code change to turn off telemetry?
No
https://github.com/anomalyco/opencode/issues/5554
https://www.reddit.com/r/LocalLLaMA/comments/1rv690j/opencod...
?
You can scroll down literally two messages in the Github issue you linked:
> there isnt any telemetry, the open telemetry thing is if you want to get spans like the ai sdk has spans to track tokens and stuff but we dont send them anywhere and they arent enabled either
> most likely these requests are for models.dev (our models api which allows us to update the models list without needing new releases)
You should really look at the 2nd link, its much worse than telemetry..
> opencode will proxy all requests internally to https://app.opencode.ai
> There is currently no option to change this behavior, no startup flag, nothing. You do not have the option to serve the web app locally, using `opencode web` just automatically opens the browser with the proxied web app, not a true locally served UI.
> https://github.com/anomalyco/opencode/blob/4d7cbdcbef92bb696...
That is the address of their hosted WebUI which connects to an OpenCode server on your localhost. Would be nice if there was an option to selfhost it, but it is nowhere near as bad as "proxying all requests".
For open models with limited context, Swival works really well: https://swival.dev
If I wanted to switch from Claude Code to this - what openai model is comparable to opus 4.6? And is it the same speed or slower/faster? Thank you!
GPT 5.4 has been the winner this week. Last week Opus 4.6. You can use both in OpenCode.
5.4 kind of falls apart in big/large projects.
How does it compare to using GPT 5.4 inside Codex?
I used Codex for a long time. It's definitely better than Claude Code due to being open source, but opencode is nicer to use. Good hotkeys, plan/build modes, fast and easy model switching, good mcp support. Supports skills, is not the fastest but good enough.
Well not anymore with Claude pro…
It is "not supported" for two month at this point, yet somehow opencode + claude max is still my main workflow today
If you want faster, anything running on a Cerebras machine will do.
Never tried it for much coding though.
Outside of their (hard to buy) GLM 4.7 coding plans, it's also extremely expensive.
do you care about harness benchmarks or no?
Just a data point, I would need to use it for my workflows. I do have a monorepo with a root level claude.md, and project level claude.md files for backend/frontend.
thats not at all the question i asked
Love opencode!
I see it uses massive resources for TUI interfaces like 1GB+ of RAM.
I wonder why did they use Typescript and not a more resource efficient language like C, C++, Rust, Zig?
Since their code is generated by AI human preferences shouldn't matter much and AI is happy to work with any language.
I use this. I run it in a sandbox[0]. I run it inside Emacs vterm so it's really quick for me to jump back and forth between this and magit, which I use to review what it's done.
I really should look into more "native" Emacs options as I find using vterm a bit of a clunky hack. But I'm just not that excited about this stuff right now. I use it because I'm lazy, that's all. Right now I'm actually getting into woodwork.
[0] https://blog.gpkb.org/posts/ai-agent-sandbox/
The fact that I wasn’t able to link llama.cpp server locally without fuss kinda beats the whole open point. Open for proprietary APIs only?
I started with Codex, then switched to OpenCode, then switched to Codex.
OpenCode just has more bugs, it's incredibly derivative so it doesn't really do anything else than Codex.
The advantage of OpenCode is that it can use any underlying model, but that's a disadvantage because it breaks the native integration. If you use Opus + Claude Code, or Gpt-Codex + Codex App, you are using it the way it was designed to be used.
If you don't actually use different models, or plan to switch, or somehow value vendor neutrality strategically, you are paying a large cost without much reward.
This is in general a rule, vendor neutrality is often seen as a generic positive, but it is actually a tradeoff. If you just build on top of AWS for example, you make use of it's features and build much faster and simpler than if you use Terraform.
Codex is 15MB of memory per process. Just sayin'
You do not "write" code. Stop these euphemisms. It is an intellectual prosthetic for feeble minded people that plagiarizes code by written by others. And it connects to the currently "free" providers who own the means of plagiarizing.
There is nothing open about it. Please do not abuse the term "open" like in OpenBSD.
minus Claude login
nice
If you have to post something like this line already loser the plot
I only boot my windows 11 gaming machine for drm games that don’t work with proton. Otherwise it’s hot garbage
I fucking love OpenCode.
What I don't understand is that, if coding agents are making coding obsolete, why do these vibe coders not choose a language that doesn't set their users' compute resources on fire? Just vibe rust or golang for their cli tools, no one reviews code slop nowadays anyway /s.
I do not understand the insistence on using JavaScript for command line tools. I don't use rust at all, but if I'm making a vibe coded cli I'm picking rust or golang. Not zig because coding agents can't handle the breaking changes. What better test of agentic coders' conviction in their belief in AI than to vibe a language they can't read.
Just remember, OpenCode is sending telemetry to their own servers, even when you're using your own locally hosted models. There are no environment variables, flags, or other configuration options to disable this behavior.¹
At least you can easily turn off telemetry in Claude Code - just set CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC to 1.
You can use Claude Code with llama.cpp and vLLM, too right out of the box with no additional software necessary, just point ANTHROPIC_BASE_URL at your inference server of choice, with any value in ANTHROPIC_API_KEY.
Some people think that Anthropic could disable this at any time, but that's not really true - you can disable automatic updates and back up and reuse native Claude Code binaries, ensuring Anthropic cannot change your existing local Claude Code binary's behavior.
With that said, I like the idea of an open source TUI agent that won't spy on me without my consent and no way to disable it much better than a closed source TUI agent that I can effectively neuter telemetry on, but sadly, OpenCode is not the former. It's just another piece of VC-funded spyware that's destined for enshittification.
¹https://github.com/anomalyco/opencode/blob/4d7cbdcbef92bb696...
Are you sure that endpoint is sending all traffic to opencode? I'm not familiar with Hono but it looks like a catch all route if none of the above match and is used to serve the front-end web interface?
You are correct, it is indeed a route for the web interface
updated post accordingly
That linked code is not used by the opencode agent instance though right? Looks related to their web server?
They don't. That is just the route for their WebUI, which is completely optional.
I've point thought about making things that just send garbage to any data collecting service.
You'd be surprised how useless datasets become with like 10% garbage data when you don't know which data is garbage
Does opencode still work if you blackhole the telemetry?
this is a big red flag
Sadly Antropic have blocked the usage of claude on it.
You can use Github Copilot and also use Claude that way.
No, they haven’t. You can use claude like any other model via API, you just can’t reuse your subscription token.
There’s plenty of options to get around that.
This is extremely cool; will download now and check it out. Thank you!