They need cloud storage. Object store. For design files.
Some WebSockets with CRDT processes running.
What else?
It's insane spends like that which are the main reason that reducing cloud costs is the main money maker for my consulting business at the moment.
It's extremely rare to come across people using cloud services where you can't cut their costs by at least 30%, often more than half, sometimes as much as 90%+, and usually it results in reduced devops spends as well.
The interesting thing is that the hardest part of the sales process is to convince people that cloud isn't cheap, because there's a near religious belief that cloud providers must be cost effective in some companies.
We're discussing Figma's spend here, which is a bit of a cherry-pick. At their size, the savings from going on-prem can be substantial. For the vast majority of companies without such heavy bandwidth needs, any savings need to be weighed against the cost of additional maintenance and staffing.
Savings can be substantial at much smaller sizes, and devops maintenance/staffing costs generally go down, if anything.
This is literally my bread and butter, and we make more money off customers who insist on staying on cloud providers, because they consistently need more help.
I'm curious what size and use case you're describing here. Up to a certain point, devops effort and cost is negligible when offloading to the cloud.
Is this production only? Based on my personal experience, many orgs could save a decent chunk just by cleaning up or spinning down dev environments.
We'd typically look at everything, and you're absolutely right. Dev environments are often poorly managed too, and you can find lots of resources that nobody can even identify an owner for.
This, 100%. SRE at $dayjob periodically has to ask us to clean up dev and UAT Kubernetes deployments.
What percentage of clients do you think don't even need cloud?
For the vast majority of them, rented servers at a managed hosting provider would beat cloud on price hands down, but for some the difference might be small enough, once they've actually optimised their cloud setup, that it's a judgement call. I usually recommend clients plan to optimise their cloud setup first, both to get a fair comparison and because much of the assessment work it requires (understanding base load vs. spikes, durability requirements for different subsets of storage, etc.) also makes it easier to figure out what you'd pay for a managed hosting setup.
For some of them, a colo facility would be cheaper, but that's highly dependent on where you want to host it (e.g. I'm in London - putting things in a colo in London is really hard to make cost effective vs. renting servers somewhere with lower land costs; data centre operators are real-estate plays)
However, you can usually make managed hosting/colo even cheaper by sprinkling some cloud in. E.g. a "trick" that can work amazingly well is to set up the bare minimum to let you spin up what you need to handle traffic spikes in a cloud environment, and then set up monitoring for your load balancer so that you start scaling into the cloud environment once load hits a certain level, but use only the managed hosting below that level.
That way, you almost never end up actually spinning up cloud instances, but you gain the ability to run the managed hosting environment far closer to the wire than you could otherwise safely do, and drive your cost per request down accordingly.
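A minimal sketch of that threshold logic, with made-up capacity numbers; the function name and the 80% burst threshold are illustrative, not taken from any real autoscaler:

```python
import math

def cloud_instances_needed(current_rps, baseline_capacity_rps,
                           per_instance_rps, burst_threshold=0.8):
    """Hypothetical burst policy: stay entirely on the managed-hosting
    fleet until load passes `burst_threshold` of its capacity, then add
    just enough cloud instances to absorb the overflow."""
    burst_point = burst_threshold * baseline_capacity_rps
    if current_rps <= burst_point:
        return 0  # managed hosting alone absorbs this load
    overflow = current_rps - burst_point
    return math.ceil(overflow / per_instance_rps)  # round partial instances up

# Normal day: well under the burst point, no cloud spend at all.
assert cloud_instances_needed(500, 1000, 200) == 0
# Spike: 300 rps of overflow past the 800 rps burst point -> 2 instances.
assert cloud_instances_needed(1100, 1000, 200) == 2
```

In practice, the monitoring side would feed real load-balancer metrics into something like this and call the provider's API to launch or terminate instances.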
At the startup I work at, we have an AWS instance that only runs our GitLab server. The damn thing runs at 10-15% CPU nearly constantly (because GitLab's founding assumption is that everyone wants to scale to ten million global users), so our spend racks up to $70 every month.
And yes, that's absolute peanuts for any business. But in my view, spending almost $1k annually for a GitLab server for a team of ten is ridiculous.
We could accomplish exactly the same thing on-prem with hardware we already own. It would take a couple of engineer-hours per year. As long as it's under 40 hours of maintenance per year, we come out ahead. And over the last two years, I've spent a total of maybe 10 hours with hands on this box.
I just don't get the thinking that leads businesses to put unnecessary crap in the cloud. Just save your money: on-prem is cheaper for pretty much everyone who isn't running a SaaS or has fewer than a few hundred employees.
It often starts that way, yes. And then you end up accreting more and more.
I usually tell people that if they don't want something on-prem, then at least pick a cheaper provider.
But as long as it's a VM and you don't build in reliance on other AWS services, at least you can move off it as you scale. The real problem comes when you start depending on a single cloud.
Not only do you lose the ability to do an easy move, but you also lose almost all negotiating leverage as your bill increases.
My mental model of startups is that they're free of the bureaucracy that would cause you to whine in comments versus Just Fucking Do It. So, what's the impediment to using your time to test your "10 hours" theory?
Also, my experience with GitLab isn't that the thing is hurp-durping because of scaling to ten million, it's that ruby gonna ruby
Your estimate is off by an order of magnitude unless you value developer time at $21 an hour. I pay union electricians 6x more than that to work for me.
> $70 a month times 12 is $840
> As long as it’s under 40 hours of maintenance (??)
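For what it's worth, the arithmetic behind that objection, using only figures from the thread itself:

```python
annual_cloud_cost = 70 * 12   # $70/month GitLab instance -> $840/year
break_even_hours = 40         # the parent's stated maintenance budget
implied_hourly_rate = annual_cloud_cost / break_even_hours
assert implied_hourly_rate == 21.0  # hence the "$21 an hour" figure
```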
Am I the only one that finds that not that crazy?
Any chance we can get a high-level breakdown of the AWS services they are paying for?
No. The figure is from their S-1 filing and further detail is not provided.
I don't understand how it takes 3 years to get off the cloud. I'm not a cloud developer, though; the most I've done is run code on free hosts or compute instances. Presumably there's something to the microservices and lambdas and distributed compute that makes this hard. I'm thinking if this was a monolith (like AWS themselves admit is cheaper), they could just run it locally? What a giant waste of money. I'm very glad to start seeing XaaS die out. At the end of the day it's just looking like more middle-men instead of what I've always assumed it was intended to be: economies of scale.
However, missing from this article and the discussion so far is their revenue. If they pay $4/day and make $2 in revenue, that's bad. They pay $300k/day but make roughly $2.25M/day in revenue. I don't know what the ratio is supposed to be, but at first blush that doesn't actually seem too bad. I'll let the more qualified take over; I'm struggling to find out how big a % of their total expenses this is.
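A quick sanity check on that ratio, using the figures from this comment (not independently verified):

```python
aws_spend_per_day = 300_000
revenue_per_day = 2_250_000  # the ~$2.25M/day figure cited above
share = aws_spend_per_day / revenue_per_day
assert round(share, 3) == 0.133  # AWS is roughly 13% of revenue
```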
A mistake I see commonly whenever someone says to "just move off of the cloud" is that they see the cloud as just a VM provider. If it was, then yeah, moving to another provider wouldn't be such a big deal.
In reality, the cloud creeps into your systems in all sorts of ways. Your permissions use cloud identities, your firewalls are based on security group referencing, your cross-region connectivity relies on cloud networking products, you're managing secrets and secret rotation using cloud secrets management, your observability is based on cloud metrics, your partners have whitelisted static IP ranges that belong to the cloud provider, your database upgrades are automated by the cloud provider, your VM images are built specifically for your cloud provider, your auditing is based on the cloud provider's logs, half the items in your security compliance audit reference things that are solved by your cloud provider, your applications are running on a container scheduler managed by your cloud provider, your serverless systems are strongly coupled distributed monoliths dependent on events on cloud-specific event buses, your disaster recovery plans depend on your cloud provider's backup or region failover capabilities, etc.

Not to mention that when you have several hundred systems, you're not going to be moving them all at the same time. They still need to be able to communicate during the transition period (extra fun when your service-to-service authentication is dependent on your cloud) without any downtime.
It's not just a matter of dropping a server binary onto a VM from a different provider. If I think about how long it would take my org to move fully off of _a_ cloud (just to a different cloud with somewhat similar capabilities), 3 years doesn't sound unrealistic.
> A mistake I see commonly whenever someone says to "just move off of the cloud" is that they see the cloud as just a VM provider. If it was, then yeah, moving to another provider wouldn't be such a big deal.
I think it can still be a big deal depending on the overall system architecture, where all the data stores live, how many services you run, and what constraints you're dealing with.
For example, when you are mid-migration between two cloud providers, more often than not you will have to replace internal calls with external calls, at least during the migration stage. That has important impacts on reliability and performance. In some scenarios, the performance hit is not acceptable and might require re-architecting services.
This is why I have a hard rule of just doing everything in the VM for stuff I build. I'm able to move between cloud providers and even self host often with near zero effort because of this.
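One sketch of what that rule can look like in code: write against a thin interface backed by the VM's own disk, so nothing couples to a provider-specific object store. The class and method names here are illustrative, not from any real library:

```python
import os
import tempfile


class LocalBlobStore:
    """Stores blobs on the VM's own disk; swapping providers means
    copying this directory, not rewriting every call site."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        with open(os.path.join(self.root, key), "wb") as f:
            f.write(data)

    def get(self, key: str) -> bytes:
        with open(os.path.join(self.root, key), "rb") as f:
            return f.read()


store = LocalBlobStore(tempfile.mkdtemp())
store.put("design.bin", b"payload")
assert store.get("design.bin") == b"payload"
```

If you ever did outgrow a single VM, only this one class needs a new backend; the rest of the application never knows the difference.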
But you're also not the CTO of Figma; they probably have use for at least $150k a day of cloud :)
Which is exactly why you don't make the mistake of relying on all that.
If you can't run it locally, don't use it unless you have absolutely no choice.
Cloud is a drug for companies… once they get on and become addicted, getting off is almost impossible.
People dig themselves in hard, with reliances on all kinds of proprietary services, and complex relationships.
My experience helping people with cloud migrations is that companies also often quickly lose oversight of which cloud services they're even still running. Sometimes systems that should've been shut down years ago are still hanging around, or parts of them anyway, like S3 buckets etc. Most companies that use cloud systems under-provision their devops because they think a cloud system doesn't need much (in fact, it typically needs more to be done well).
Even for their revenue, $300k/day is crazy high.
During good times, many companies simply don't care about which services or how much. I've worked at several startups where they got a large funding round and the word was "we don't care about cloud costs, just get it done fast / make it scale." Unfortunately, one bad mistake (like storing all data in a proprietary service like DynamoDB) can make this difficult to unwind when things get bad...
> I've always assumed it was intended to be: economies of scale
IMO, the value proposition these days is more about avoiding maintenance, i.e. keeping your infrastructure up to date with the latest patches.
To me it's the same thing. You can pay somebody to care about that, but they might be underutilized for the majority of time so it's not worth it. If you have a service, instead of your security expert being used idk 1/x of full time, they can be y/x where y is the number of contracts. For me and my time we are just way too small to have somebody full-time dedicated. So that's how I think about it
It is a reasonable point, but I think it is not exactly that. Having your organisation focus on maintenance is a certain type of opportunity cost. It is pretty often one of your most knowledgeable engineers that does this, and it also interrupts the flow of many of your other engineers.
All of these services, be it in or out of the cloud, are trivially available as service contracts with external providers.
I'd love to see some real figures on that. My gut feeling is most companies spend as much on AWS experts as they previously did on people running in house facilities but I really don't know.
I have never had a client that got away with less maintenance because they used cloud.
In fact, those of my clients who insist on relying on cloud tend to spend far more with me for systems of similar complexity. I love taking their money, but I'd frankly rather help them save it, because longer term it's better.
The services are the "easy" part; moving data out of a cloud provider is slow and expensive. For a _really_ big dataset it can take months, sometimes years, just to complete the data transfer.
> I don't understand how it takes 3 years to get off the cloud.
Because in fact cloud is not just someone else's computer.
When you code your entire app to proprietary APIs like Lambda or DynamoDB, it becomes complex to migrate.
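A hedged illustration of that coupling, assuming the standard API Gateway proxy event shape; the function names are made up for this sketch:

```python
def resize_design(file_id: str) -> dict:
    # Provider-agnostic business logic: trivially reusable behind nginx,
    # a queue worker, or any other entry point after a migration.
    return {"file_id": file_id, "status": "resized"}


def lambda_handler(event, context):
    # Thin AWS-specific shim: if the business logic stays out here in
    # plain functions, this wrapper is all that has to be rewritten.
    file_id = event["pathParameters"]["file_id"]
    return {"statusCode": 200, "body": str(resize_design(file_id))}


# Simulate an API Gateway proxy invocation locally.
result = lambda_handler({"pathParameters": {"file_id": "abc"}}, None)
assert result["statusCode"] == 200
```

The migration pain comes when the logic and the provider-specific plumbing are interleaved instead of separated like this; the same applies to data access against something like DynamoDB.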
On what jeeeeeeezus
Utterly bananas.
Seriously the CTO should be fired.
edit - i read it wrong, removing my comment.
Two different companies. Figma is not 37Signals.
oh, woops. thanks for correcting me, will edit my comment.