Codex is still in minority use, but it has taken many customers from them over a short period.
Long ago in the ancient days of punchcards and IBM mainframes, you’d write your programs during the day, then submit them to run overnight and pick up your results in the morning. It would be funny and sort of romantic if time-based LLM pricing returned us to that: write your specs all day, run agents on them overnight, check out the results in the morning.
I don't really understand why AI providers don't charge like the electric company, or AWS. Instead of increasing usage limits, just charge less for off-hours use.
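Utility-style time-of-day pricing is simple to implement, which makes its absence notable. A minimal sketch of such a rate card (the 13:00–19:00 UTC peak window and the dollar figures are invented for illustration, not any provider's real prices):

```python
from datetime import datetime, timezone

# Hypothetical rates and window, purely for illustration.
PEAK_START, PEAK_END = 13, 19          # peak window, UTC hours
PEAK_RATE, OFFPEAK_RATE = 15.00, 9.00  # $ per million output tokens

def rate_at(ts: datetime) -> float:
    """Return the $/Mtok rate in effect at a given timestamp."""
    hour = ts.astimezone(timezone.utc).hour
    return PEAK_RATE if PEAK_START <= hour < PEAK_END else OFFPEAK_RATE

def cost(tokens: int, ts: datetime) -> float:
    """Cost of a request billed at the rate in effect when it ran."""
    return tokens / 1_000_000 * rate_at(ts)
```

Electric utilities have billed this way for decades; the comment's point is that the same two-function lookup would work for tokens.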
LLM inference is much more geographically fungible than electricity, so maybe it's just not worth the complexity yet, and there is enough (not highly latency-sensitive) load on average globally.
It's basically the whole time Wall Street and the stock markets run, and the entire afternoon and early evening of Europe. Plenty of usage in this window; it's the AWS East / Azure East peak-usage window.
These promos should be based on when more renewable energy is available for inference, not on when fewer people are likely to be using the AI. We need to shift usage to times when supply is more renewable, for both training and inference, to better protect the grid and the planet.
So we now have just pure marketing slop on the HN front page? How is this interesting or "curious" again? The AI slop season is affecting HN in clever ways.
Would be cool to have a $5-10/month plan that only works off-peak, for people who want to do the occasional side project after work. Right now it's hard to justify anything but Copilot (because it's cheaper, offers the same models, and I'm nowhere near the usage limits).
I suspect that any GPU cycle not spent on inference will just be dedicated to training (which as I understand it can “soak up” essentially unlimited compute at constant value per token), and I’d not expect to see time-based billing until that changes.
Isn't this post an announcement of time-based billing? Just in an indirect way (time-based not-billing, rather than time-based billing).
Also, my (extremely naive) understanding is that at the cutting edge, hardware is diverging for training vs inference. That might not be true for Anthropic though.
This is an announcement of time-based billing.
Would be better if they simply made it free for open source developers. I can barely justify spending time on my hobby projects. If I paid for this, I'd be paying to work for them since they're using our data for training.
How would this work? How would they verify that someone is an “open source developer”?
Anthropic is taking applications here:
https://claude.com/contact-sales/claude-for-oss
Huh. This is cool. Sadly I'm not big enough to benefit from this. Here's to hoping that changes in the future.
They could probably fairly easily identify all the authors of the open-source software they hoovered up and used in their training set.
But admitting that was possible is a little too close to being able to be held accountable...
They would use the "trust me bro" verification mechanism
Hard to justify? $20/month for something like 5x the output is a great deal (be it Claude or Codex or whatever), even if it only lasts 2-3 hours per day.
I canceled my plan today and wrote my reason as: now that I have a job again I don’t have the time or needs for the pro plan. If there was a $5 a month option, I would gladly take it to make use of Opus for my rare side ideas.
Pay as you go. I never spent more than $10/month working on my side project (usually a few evenings per month).
On my gamedev side project I have a Ralph loop going on the $100 5x plan; it hits the session limit 4-5 times a day and the weekly limit in 3 days. Token usage is around $750 a week, or $3,000 a month, according to "npx ccusage". I would have to be insane to pay that instead of $100.
I have the enterprise plan and get to use it for both work and some personal stuff.
I mainly use it for side projects and doing research for writing stuff on my blog.
I use Opus 4.6 with Claude Code and 1M context and consistently burn through $150-200 worth of tokens per day. I wonder how you manage to do anything with a $10/mo plan.
You’re not using Claude Code?
The $20 Pro plan would also have doubled off-peak limits - just set it to Sonnet and you'll get a reasonable level of output.
A $50-per-week Codex Pro/Claude Max plan would be perfect for solo gamedevs/open-source devs who have existing code that would benefit from an occasional review pass or subsystem experiments/brainstorming with the most powerful models, but don't need to use one for a whole month.
Isn't Copilot tied to VS Code?
No, it's in Notepad, Edge, all MS Office products, Azure, M365. And you also get to choose your model: MS, OpenAI, Anthropic.
You can use it w/ things like opencode. They also have their own Claude-Code-like CLI (but I find opencode to be better).
Claude Pro is $20/month.
Set up an API key and use that.
Pricing will soon be structured around energy costs and on/off-peak power rates; I'm actually surprised it hasn't happened sooner. Even with behind-the-meter generation, you're not completely insulated from peak (daily) power prices. Being able to shift at least some demand around will help from a pure energy-cost perspective.
Most of these behind-the-meter generation projects will be gas generation. Guess what happens during a cold snap like the one we experienced in the Northeast US a few weeks ago? Natural gas prices jumped 10x in the daily market. You say they're hedged? Hedges do not matter during Operational Flow Order (OFO), force majeure, or pipeline curtailment events, and then they are exposed to the daily market. (I do this for a living.)
This is a psyop to recruit more Australians I'm sure of it
I was thinking the same, but (and correct me if I'm wrong), the timezone means this is only really useful between 11pm and 5am AEST? - EDIT: yup - I _completely_ missed the "outside" of those US hours. yay!
Great for me in Japan.
I was thinking the same thing. Would be interesting to see their usage over timezones.
Can't complain honestly!
Presumably they have unused compute in those hours and figure they may as well enable people to use it and get more invested into their ecosystem.
What I wish Anthropic would do is be a lot more explicit about what windows apply when. Surely they have the data to say "you get X usage from hours A to B, Y usage from B to C"
or see if they can shift some end user workloads to off hours so that they aren't bogging things down during peak hours.
I just know there has to be some psychology in play with these promos. The promo during December got me to upgrade to the $100 plan, and I know I'm not the only one.
I suspect it’s much more about understanding user behavior, i.e: given more allowance off-peak, do users change when they use Claude? And from there, that will inform how plans are designed long term. If they discover that offering higher off-peak limits meaningfully changes how/when users interact with the service, they can use discounted off-peak plans to flatten usage. I would be very surprised if this promotion had anything to do with encouraging people to upgrade.
I found that when I have “infinite” tokens my behaviour changed. 3-5 tabs so I’m not waiting, free side quests, huge review skills over whole codebase, skills that wrap 10 other skills. It’s like going from expensive data to uncapped.
I think these token doublings are there to kick you into an abundance mindset (for want of a better term), so going back feels painful. Stop counting tokens; focus on your project and the cost of your own time.
I use the enterprise plan for work and often burn ~$150 worth of tokens per day. I have noticed myself exhibiting similar behaviors.
When you say nearly unlimited tokens, do you mean the $100 or $200 subscription?
$200, over December it was doubled. I tried my best in between family time and friends to burn a hole in it. Never got near doing so.
Is it possible to link/wrap several skills together? I haven't managed to get Claude to react to a reference to another skill within a skill.
I have this as a skill Claude created to run the rest. It mentions each skill in turn, see below. It’s not deterministic but it definitely runs each skill and it’s raised a bunch of issues, which I then selectively deal with. Where I can, once an issue is identified, I make deterministic tests.
Text includes:
Invoke each review/audit skill in sequence. Each skill runs its own comprehensive checks and returns findings. Capture the findings from each and incorporate them into the final report.
IMPORTANT: Invoke each skill using the Skill tool. Each skill is independently runnable and will produce its own detailed output. Summarize findings per skill into the unified report format.
4. Architecture Health
Invoke: Skill(architecture-review)
Covers: module boundaries, cross-module communication, dependency direction, infrastructure layer rules, hexagonal architecture compliance.
5. Security Health
Invoke: Skill(security-review)
Covers: hardcoded secrets, SQL injection, authorization, HTTPS, CORS, input validation, authentication patterns.
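The control flow that prompt describes is just a sequential loop with result aggregation. A hypothetical sketch (`run_skill` is a stub standing in for whatever the Skill tool actually does; the skill names are taken from the excerpt above):

```python
# Skills listed in the wrapper prompt above; each would be a real
# review/audit skill in the actual setup.
SKILLS = ["architecture-review", "security-review"]

def run_skill(name: str) -> list[str]:
    """Stub: a real implementation would invoke the Skill tool here."""
    return [f"{name}: no findings"]

def unified_report(skills=SKILLS) -> dict[str, list[str]]:
    """Run each skill in sequence and fold its findings into one report."""
    report = {}
    for name in skills:  # sequential, as the prompt instructs
        report[name] = run_skill(name)
    return report
```

As the parent notes, the model's execution of this loop isn't deterministic the way this code is, which is why it helps to turn confirmed findings into deterministic tests afterward.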
You're probably right. I've been thinking about why Anthropic's revenue keeps soaring. In terms of "new users trying the product" we're definitely somewhere in the slowing part of the S-curve (at least in the US), but there are other growth contributors. Two big ones are people finding new use-cases and people figuring out how to scale up current use-cases to use more tokens. Perhaps little temporary usage boosts like this give people permission to attempt new use-cases or more scale, and realize they could use a higher-tiered plan.
It's faster to change user behavior than to buy and set up new hardware. I bet this is just to bleed off their growing pains from the influx of users.
There's definitely psychology in play, but I think it might be less "trying to get you to spend more" and more "trying to incentivize load-shifting", which (to me at least) is a lot less sinister-- my utility does this too for electricity, and nobody attributes malicious intent to it.
We all know these services see huge load spikes and sometimes service degradation when America wakes up, and I bet they'd appreciate it if as many "chug-and-plug" agent workflows moved to overnight hours as possible.
My assumption was always that the December promo was a combination – they were presumably way under capacity because everyone was on holiday given how enterprise-heavy they are, so giving people a bunch of extra usage with a loud promo meant a whole bunch of people would try Claude and see how good it had gotten at very little cost to Anthropic.
Interesting - the first thing my mind went to was the DoD supply-chain risk designation, and wanting to boost metrics to calm investors' nerves.
The psychology is to hook you on the usage. A lot of people see a little movement in the usage meter and get cold feet about heavy usage. The prior $70 credit deal and now this offering are to try to get people to dive in, and hopefully retain that usage pattern afterwards.
Anthropic's models are obviously superior at coding right now but using 2-3 $20 accounts between different providers is still a very effective way to get good value. Gemini CLI and Codex seem to be at least 2x more permissive on usage. The models are good enough.
Plus we are technologists, we want to try out different stuff and compare.
That's precisely what I do, with subscriptions to all of them. Gemini almost seems unlimited...like I never hit limits with it. Don't even know how to check my usage for the subscription plans on that.
But increasingly I'm using Claude for basically all real coding. I ask Gemini and Codex questions, but I'm honestly in awe at Opus' ridiculous capabilities.
I tried Gemini the other day and after asking it three things I hit a limit. That's after Gemini CLI crashed my terminal twice for some reason (just opening it with `gemini` caused the freeze -> crash). I must be doing something wrong, because using Gemini Flash over OpenRouter I barely spend credits, yet my subscription ran out almost instantly.
Gemini 3.1 Pro under a Google AI Pro subscription has just recently started imposing really small weekly limits. I went from it feeling unlimited to hitting a 4-day quota in 2 hours of use. Very odd. I wonder if too many people jumped on with the 3.1 Pro release.
`/stats session` shows you the remaining quota in Gemini CLI and when the quota resets, and they dropped the quota badly in the last few days.
Before that I would totally agree with you; it felt really endless.
I found the $250 in free credit for Claude Code hard to actually use before it expired. I think I got down to less than $50
That is doubled usage between 5AM and 11PM for anyone playing along from Sydney/Melbourne.
JST here; it's basically all day.
Are you sure? It's a 6h window on the page
Not sure what you mean by "it", but the doubled usage would be 18h a day.
What I mean is that "it" - the advertised window during which the offer applies - is not 5am to 11pm as mentioned by the poster above.
Using a local timezone instead of UTC for a global service is a crime, especially mixed with daylight saving.
This is how they say Wall St is all using Anthropic without saying Wall St is all using Anthropic.
Regular price window around the world: https://www.worldtimebuddy.com/?qm=1&lid=5368361,5128581,316...
Yes, it overlaps well with the market open time. But I thought Claude was good with coding... Does this mean major trading agents write code using Claude to make trading decisions? Or Claude models are relatively better than other models in non-coding trading work?
The US trading day overlaps well with the US business day in general (and to a lesser extent, European demand).
"Claude" is their chatbot product, so a peer of ChatGPT and used for everything. It by default uses their "Claude Sonnet" models. "Claude Code" is their code-writing client application, which uses "Claude Opus" models.
https://platform.claude.com/docs/en/about-claude/models/over...
There's only one country in the world.
Damn. Please use UTC.
From my understanding:
Peak time (non-promo): UTC 12:00–18:00 / KST (UTC+9): 21:00–03:00
Off-peak time (promo): UTC 18:00–12:00 / KST (UTC+9): 03:00–21:00
I guess I’ll need to do more coding during the daytime.
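These conversions are easy to botch by hand, especially across DST boundaries. A small helper (assuming, per the comment above, a 12:00–18:00 UTC peak window; KST has no DST, so the result holds year-round):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

def convert_hour(hour_utc: int, tz: str) -> str:
    """Render a UTC wall-clock hour as local time in an IANA zone.
    Uses a fixed reference date; zones with DST may differ by date."""
    base = datetime(2026, 3, 2, hour_utc, 0, tzinfo=timezone.utc)
    return base.astimezone(ZoneInfo(tz)).strftime("%H:%M")

# The 12:00-18:00 UTC peak window, rendered in KST:
print(convert_hour(12, "Asia/Seoul"), "-", convert_hour(18, "Asia/Seoul"))
```

Swapping in "Australia/Sydney" or "Europe/Berlin" reproduces the other conversions in this thread, DST caveats and all.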
I'm in their time zone, and was just planning to stop with my bad habit of staying up working till 4 am and waking up at noon.
So much for that plan.
Travelling salesman problem in 2026 is Travelling Engineer Problem to find optimal location to maximize tokens usage.
Very interesting. As I wrote in this article https://martinalderson.com/posts/is-the-ai-compute-crunch-he... a couple of weeks ago:
"One thing I really suspect we'll see a lot more of is much more generous rate limits at 'off peak' times - likely to be early morning UTC - as there is no doubt a lot of "idle" compute sitting there"
I strongly suspect this will end up with the opposite happening - where peak tokens are far more "expensive" (whether through usage limits or API costs) than off-peak.
PS: Anthropic have managed to improve reliability, but they're absolutely shredding Opus tok/s at peak times. It absolutely crawls on the web (maybe 2-3 tok/s?), and I believe that on non-Max plans it's also incredibly slow in Claude Code.
“I strongly suspect this will end up with the opposite happening - where peak tokens are far more "expensive" (whether through usage limits or API costs) than off-peak.”
This only happens once/if competition eases up. Until then, it’s a race to the bottom
Living in Tasmania as competitive advantage
Tassie represent
Literally dozens of us on here!
Interesting to see more demand-shaping mechanisms applied to LLM inference, even though the "batch processing" feature is already available. I guess this "promotion" is to test the hypothesis of sliding along the spectrum toward more "real-time" demand shaping.
Dear line manager, I will be taking a very long lunch 12-6pm in London's Chinatown then heading back to the office half cut to vibe code
I need something in between pro and max (about 2-3x pro not 5x). Really hoping this usage promotion is a permanent fixture. I have Claude through work and more tokens than I know what to do with. But on personal projects, I tend to want a lot of tokens all at once at late hours.
Converted to AEDT (Australian Eastern Daylight Time):
Peak hours (normal usage): 8 AM – 2 PM ET → 12 AM (midnight) – 6 AM AEDT (next day)
Off-peak hours (2x usage): All other times → 6 AM – 11:59 PM AEDT
So afternoon in Germany or am I misreading?
DST shenanigans aside (we're in the "US has changed but Europe hasn't" window), 10:00 in SF is 18:00 in London. Meaning their peak time window is 13:00–19:00 London time, or 14:00–20:00 Berlin time.
So us European folks get promotional rates during the morning and evening.
EDIT: Actually, because the promo ends at the end of March, it'll all be within DST shenanigans. So peak times are 12:00–18:00 London, 13:00–19:00 Berlin.
Outside 4pm to 10pm
This is great, but I guess they are feeling the heat from Codex resetting limits in the last month quite a bit.
I think they're feeling the heat from growing too quickly so they want to incentivize people to spread the load more evenly.
Very much like electric utility time of day pricing, using economic incentives to shift demand to trough periods.
Perhaps an opportunity for them to improve workload scheduling orchestration, like submitting a job to a distributed computing cluster queue, to smooth demand and maximize utilization.
Everything bursty will use economic incentives to smooth the load. I'm not sure how they'd do that with workload scheduling orchestration when you have latency-sensitive loads and there are e.g. twice as many requests at midday as at midnight.
You decouple the workloads from human interaction (i.e., when you submit the job to the queue vs. when it is scheduled to execute), so when they run is not a consideration, if possible. The economic incentives encourage solving this, and if it can't be solved, they bucket customer cohorts by willingness (or unwillingness) to pay for access during peak times.
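A minimal sketch of that decoupling, assuming an invented 18:00–12:00 UTC off-peak window: jobs submitted during peak hours are held in a priority queue until the window opens, while off-peak submissions run immediately.

```python
import heapq
from datetime import datetime, timezone

OFFPEAK_START = 18  # hypothetical: off-peak begins at 18:00 UTC
OFFPEAK_END = 12    # and runs until 12:00 UTC the next day

def next_offpeak(now: datetime) -> datetime:
    """Earliest time at or after `now` that falls in the off-peak window."""
    if now.hour >= OFFPEAK_START or now.hour < OFFPEAK_END:
        return now  # already off-peak: run immediately
    return now.replace(hour=OFFPEAK_START, minute=0, second=0, microsecond=0)

class DeferredQueue:
    """Min-heap of (run_at, seq, job); peak-hour submissions get shifted."""
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so jobs are never compared directly

    def submit(self, job, now: datetime):
        heapq.heappush(self._heap, (next_offpeak(now), self._seq, job))
        self._seq += 1

    def due(self, now: datetime):
        """Pop and return every job whose scheduled time has arrived."""
        out = []
        while self._heap and self._heap[0][0] <= now:
            out.append(heapq.heappop(self._heap)[2])
        return out
```

This is the "when they run is not a consideration" case; the peak-willingness bucketing would sit on top as a bypass for customers paying for immediate execution.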
Sure, but if I ask the LLM a question, I'd like it to respond now, instead of tonight.
Certainly, interactive workloads aren’t realistic for time shifting, but agentic coding likely is. Package everything up and ship it as a job, getting a bundle back asynchronously.
I don't know, my agentic coding is pretty interactive. Maybe once the plan is done, sure. That would be interesting, though OpenAI already does this with batch workloads.
The insanely competitive market for LLMs is great for us, but if I were one of the investors in these companies it wouldn't exactly fill me with confidence that my $500 billion spent on datacenters and Nvidia cards is going to get repaid ten times over like they're claiming. I'm still getting very strong "this is a commodity; margins will be driven inexorably to zero" vibes from these products.
I’m trying to figure out how this affects weekly limits, since those overlap peak hours. My observation is that it doesn’t. But I could be wrong.
If they are doing it “right” I think any off peak usage should count 50% toward your weekly limits.
Edit: it does look like they are doing it the "right" way.
> Does bonus usage count against my weekly usage limit?
> No. The additional usage you get during off-peak hours doesn’t count toward any weekly usage limits on your plan.
So the first 100% of the five-hour usage counts against the weekly limit at normal rates, but the additional 100% on top doesn't?
I just watched my "weekly limit" get used while I ran a claude code command.
I'm not sure how to square that with the quote you gave.
Did you exhaust the five-hour usage limit already? As I understand it, the ”additional usage” refers to anything beyond the standard five-hour usage limit.
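The accounting rule being discussed can be stated as a tiny function. This is a sketch under the assumption that the normal five-hour allotment is 1.0 (100%) and the off-peak bonus doubles it; the function name and units are hypothetical:

```python
def weekly_counted(session_usage: float, off_peak: bool, base: float = 1.0) -> float:
    """How much of a session's usage counts toward the weekly limit.

    `session_usage` is in multiples of the normal five-hour allotment
    (1.0 == 100%). Off-peak, usage beyond the base allotment is bonus
    and does not count toward the weekly limit; on-peak there is no
    bonus, so everything counts.
    """
    if off_peak:
        return min(session_usage, base)
    return session_usage
```

So a session that burns 180% of its allotment off-peak counts as 100% toward the weekly limit, which squares the FAQ quote with still seeing the weekly meter move during normal usage.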
> Does bonus usage count against my weekly usage limit?
> No. The additional usage you get during off-peak hours doesn’t count toward any weekly usage limits on your plan.
Oops! Looks like we posted at the same time.
all weekend is off-peak
Who are these guys even competing with that they are going so hard with the deals? Like the 1M context window, is Gemini offering that? In any case, they seem to have no real competition today.
pretty sure Gemini has shipped with a 1m context window for a long time
Codex offered the 1M context window (without markup, and via subscription) first, and is now wrapping up a two-month promo of 2x usage rates. They've also provided free-tier access, which Claude Code lacks, and have shipped a desktop app for Mac and Windows (unlike Claude Code).

Codex is also beating them on many benchmarks and has influencers like @steipete (before they hired him) proclaiming that he uses Codex exclusively for code, after having been a Claude Code user and initially popularizing openclaw on top of Claude Code (though never for writing openclaw, only running it).

Codex is still in minority use but has taken many customers from them over a short period.
Ah gotcha, thanks
Long ago in the ancient days of punchcards and IBM mainframes, you’d write your programs during the day, then submit them to run overnight and pick up your results in the morning. It would be funny and sort of romantic if time-based LLM pricing returned us to that: write your specs all day, run agents on them overnight, check out the results in the morning.
They have this. It’s called batch pricing and it’s 50% off.
I find that incredibly optimistic.
This company is clearly on a mission. I would just like to know what that mission is. I mean this in a good way.
I didn't understand "your five-hour usage". I thought plans were per interaction or per token, not per hour.
There's a limit that resets every five hours and one that resets every week.
My usage only shows daily and weekly, though. I never got that.
It has "current session" and "weekly". If you notice, "current session" is never more than five hours away from expiration.
Oh, you're right. I don't know why I've always misread "current session" as daily.
Thanks for clearing that up. It'll help me schedule stuff in the future.
For Claude Code, you use up 12% of your weekly allotment every session, so 8 sessions per week.
If you are only using a session a day, you're wasting a session. :)
You can pay either for API usage or a fixed monthly plan (which is way cheaper but you can't use it for applications, just personal use).
Nice. I have 18 hours to my weekly reset and 10% left. Can maximize.
Would be happy to use this, but I didn't see a promo code or voucher. Do I need more coffee?
Have they accounted for time zones? This is my daytime in Australia/Sydney
Around the world, the regular price window: https://www.worldtimebuddy.com/?qm=1&lid=5368361,5128581,316...
Just open the page; they specify the UTC-4 band, so you can adjust accordingly (you'll be fine).
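Converting the band to a local time zone is a one-liner with `zoneinfo`. A sketch that assumes only what the comment above says (the band is expressed in UTC-4); the boundary hour you pass in is whatever you read off the announcement:

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# The page expresses the band in UTC-4; this fixed offset sidesteps
# US DST rules entirely.
PEAK_TZ = timezone(timedelta(hours=-4))

def band_hour_in(local_tz: str, hour_utc4: int, day: date) -> datetime:
    """Translate an hour of the UTC-4 band on `day` into local time."""
    t = datetime(day.year, day.month, day.day, hour_utc4, tzinfo=PEAK_TZ)
    return t.astimezone(ZoneInfo(local_tz))
```

For Sydney, a 10:00 UTC-4 boundary lands in the small hours of the next local day, which is why an Australian daytime can sit entirely inside someone else's off-peak window.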
Australia here we come.
I still hate Claude for turning down limits. I use z.ai in Claude code now, haven't hit the limit yet.
It's eerie how similar these trends are to early phone/text usage limits.
But the best part is, those usage levels are hidden, arbitrary, and they change them all the time.
So they could “double” your usage by keeping it the same and then simply halving peak usage.
I don't really understand why AI providers don't charge like the electric company, or AWS. Instead of increasing usage limits, just charge less for off-hours use.
LLM inference is much more geographically fungible than electricity, so maybe it’s just not worth the complexity yet and there is enough (not highly latency sensitive) load on average globally.
I guess extra compute opened up after they were canned by the Department of War.
They are learning from Codex
https://hascodexratelimitreset.today
Wild conspiracy theory: this is targeted at decreasing usage from Indian users.
There is no way 5-11 AM PT is peak traffic
It's basically the whole time Wall Street and the stock markets run, plus the entire afternoon and early evening of Europe. Plenty of usage in this window; it's the max-usage window for AWS East and Azure East.
Wtf is ET? Is it an alien time?
Now is the window of our discontent*: https://www.worldtimebuddy.com/?qm=1&lid=5368361,5128581,316...
* This is the regular price window, the rest is the promo usage.
My fellow Californians would agree that, yes, ET is an alien time. https://en.wikipedia.org/wiki/Eastern_Time_Zone
Brazil has 3 time zones, but officially we use just one. We are a vertical country :)
You and China both https://www.timeanddate.com/time/zone/china
Ah crap, I was hoping to benefit more from my sub because I'm in an off-hours tz.
These promos should be based on when more renewable energy is available for inference, not on when fewer people are likely to be using the AI. We need to shift usage, for both training and inference, to when the supply is more renewable, to better protect the grid and the planet.
Is this going to cause another outage?
AI psychosis intensifies
changes sleep schedule
So we now have just pure marketing slop on the HN front page? How is this interesting or "curious" again? The AI slop season is affecting HN in clever ways.