Making Postgres scale

(pgdog.dev)

225 points | by levkk 4 months ago ago

192 comments

Eikon 4 months ago
I run a 100 billion+ rows Postgres database [0], that is around 16TB, it's pretty painless!
There are a few tricks that make it run well (PostgreSQL compiled with a non-standard block size, ZFS, careful VACUUM planning). But nothing too out of the ordinary.
ATM, I insert about 150,000 rows a second, run 40,000 transactions a second, and read 4 million rows a second.
Isn't "Postgres does not scale" a strawman?
[0] https://www.merklemap.com/
[-]
- mmontagna9 4 months ago
  People run postgres at scale which is much "larger" than what you are running, which isn't to say that your workload isn't substantial, it is. But there are folks who push it harder, both in terms of writes/rows and workload complexity. It's one thing to write a ton of rows into a single table, it's another thing entirely to write into many tables, index all of them and then query them at scale; you didn't mention much about your workload complexity so I'm not trying to suggest it isn't complex, but there are certainly plenty of folks on this forum who can attest to it being tricky to scale high write workloads, in large organizations with many stakeholders and complex query loads.
  [-]
  - Eikon 4 months ago
    > it's another thing entirely to write into many tables, index all of them and then query them at scale
    Well, that’s pretty much what I am doing.
- Shorn 4 months ago
  > PostgreSQL compiled with a non-standard block size
  Do you think this could become less important for your use case the new PG17 "I/O combining" stuff?
  https://medium.com/@hnasr/combining-i-os-in-postgresql-17-39...
  [-]
  - Eikon 4 months ago
    No, because the custom block size is about reducing zfs r/w amplification.
- kelafoja 4 months ago
  One of things I find challenging is understand the meaning of the word "scales". It is sometimes used differently in different contexts.
  Can it be performant in high load situations? Certainly. Can is elastically scale up and down based on demand? As far as I'm aware it cannot.
  What I'm most interested in is how operations are handled. For example, if it's deployed in a cloud environment and you need more CPU and/or memory, you have to eat the downtime to scale it up. What if it's deployed to bare metal and it cannot handle the increasing load anymore? How costly (in terms of both time and money) is it to migrate it to bigger hardware?
  [-]
  - nine_k 4 months ago
    When it "scales", it usually means "scales up". A scalable solution is such that can withstand a large and increasing load, past the usual limitations of obvious solutions.
    Being elastic is nice, but not always needed. In most cases of database usage, downsizing never happens, or expected to happen: logically, data are only added, and any packaging and archiving only exists to keep the size manageable.
    [-]
    - prng2021 4 months ago
      You’re conflating things. The question was about scaling compute and memory up and down based on load and you’re commenting about never needing to downsize on storage.
  - pluto_modadic 4 months ago
    a database scaling dramatically up and down /under load/ and expecting it to perform the same as steady state seems a bit weird, vs a single, beefy database with a beefy ingest job and a bunch of read only clients searching it?
    like you're more likely to encounter two phases (building the DB in heavy growth mode, and using the DB in light growth heavy read mode).
    A business that doesn't quite yet know what size the DB needs to be has a frightening RDS bill incoming.
- rednafi 4 months ago
  Damn, that’s a chonky database. Have you written anything about the setup? I’d love to know more— is it running on a single machine? How many reader and writer DBs? What does the replication look like? What are the machine specs? Is it self-hosted or on AWS?
  By the way, really cool website.
  [-]
  - Eikon 4 months ago
    I'll try to get a blog post out soon!
    > Damn, that’s a chonky database. Have you written anything about the setup? I’d love to know more— is it running on a single machine? How many reader and writer DBs? What does the replication look like? What are the machine specs? Is it self-hosted or on AWS?
    It's self-hosted on bare metal, with standby replication, normal settings, nothing "weird" there.
    6 NVMe drives in raidz-1, 1024GB of memory, a 96 core AMD EPYC cpu.
    A single database with no partitioning (I avoid PostgreSQL partitioning as it complicates queries and weakens constraint enforcement, and IHMO is not providing much benefits outside of niche use-cases).
    > By the way, really cool website.
    Thank you!
    [-]
    - tracker1 4 months ago
      > A single database with no partitioning (I avoid PostgreSQL partitioning as it complicates queries and weakens constraint enforcement, and IHMO is not providing much benefits outside of niche use-cases).
      That's kind of where I'm at now... you can vertically scale a server so much now (compared to even a decade ago) that there's really no need to bring a lot of complexity in IMO for Databases. Simple read replicas or hot spare should be sufficient for the vast majority of use cases and the hardware is way cheaper than a few years ago, relatively speaking.
      I spent a large part of the past decade and a half using and understanding all the no-sql options (including sharding with pg) and where they're better or not. At this point my advice is start with PG, grow that DB as far as real hardware will let you... if you grow to the point you need more, then you have the money to deal with your use case properly.
      So few applications have the need for beyond a few million simultaneous users, and avoiding certain pitfalls, it's not that hard. Especially if you're flexible enough to leverage JSONB and a bit of denormalization for fewer joins, you'll go a very, very long way.
      [-]
      - arkh 4 months ago
        > you can vertically scale a server so much now
        And you often don't really need to.
        Just last week for some small application and checking the performance of some queries I add to get random data on a dev setup. Which is a dockerized postgres (with no tuning at all) in a VM on a basic windows laptop. I inserted enough data to represent what could maybe be there in 20 years (like some tables got half a billion rows, small internal app). Still no problem chugging along.
        It is crazy when you compare what you can do with databases now on modern hardware with how other software do not feel as having benefited as much. Especially on the frontend side.
        [-]
        tracker1 4 months ago
        Considering the front end today is an amazingly flexible client app platform with flexible rendering styling and accessibility compared to a VB app a few decades ago... It's kind of amazing.
        Only if my favorite websites in the late 90s was 15 seconds make because that's how long people would wait for a webpage to load at the time. Things have improved dramatically.
        [-]
        arkh 4 months ago
        > accessibility
        I'd like to see those accessible frontends. The majority is not usable keyboard-only.
        [-]
        tracker1 4 months ago
        I have done a lot of work for eLearning, govt and banking which required good usability. Also, mui is very good out of the box.
        https://mui.com/material-ui/all-components/
        [-]
        arkh 4 months ago
        I'm gonna doubt you when on they own website navigating is really bad.
        Only tab and shift-tab. Arrow keys are a bust. And the only visible shortcut is ctrl-K for the search input and I think it's because it comes as an algolia default.
        For something better I only have to watch around the page at the browser itself: underlined letters in the menu tells me what alt+letter will open said menu. Then I can navigate using arrow keys and most menu items are shown with a key combination shortcut.
        [-]
        tracker1 4 months ago
        If I could show you some of the apps I've built with it would probably change your mind. A few had to go through testing and validation for accessibility. That and I'm pretty firm on keyboard navigation for all things. Had to tweak the corporate colors a little bit to fit WCAG compliance on the contrasts.
        One thing that was crazy was having to go through verification for blind usability, when the core function (validating scanned documents) requires a well sighted user.
        I won't say MUI is perfect... it isn't... but you can definitely go a lot farther in a browser than you can with what's in the box with most ui component libraries is the only real point.
    - rednafi 4 months ago
      > I'll try to get a blog post out soon!
      Please do.
      > It’s self-hosted on bare metal, with standby replication, normal settings, nothing “weird” there.
      16TB without nothing weird is pretty impressive. Our devops team reached for Aurora way before that.
      > 6 NVMe drives in raidz-1, 1024GB of memory, a 96-core AMD EPYC CPU.
      Since you’re self hosted, I’m you aren’t on AWS. How much is this setup costing you now if you don’t mind sharing.
      > A single database with no partitioning (I avoid PostgreSQL partitioning as it complicates queries and weakens constraint enforcement, and IMHO does not provide many benefits outside of niche use cases).
      Beautiful!
      [-]
      - Eikon 4 months ago
        > Since you’re self hosted, I’m you aren’t on AWS. How much is this setup costing you now if you don’t mind sharing.
        About 28K euros of hardware per replica IIRC + colo costs.
        [-]
        rednafi 4 months ago
        Yearly, 28k Euros I presume.
        Damn. I hope you make enough revenue to continue. This is pretty impressive.
        [-]
        Eikon 4 months ago
        No, one time + ongoing colocation costs.
        [-]
        fulafel 4 months ago
        So that's about 467 eur per month per server assuming a 5 year term. Anyone know what it would be on AWS with Aurora? I had a quick go with https://calculator.aws/ and ended up with a 5-figure sum per month.
        [-]
        Eikon 4 months ago
        I tried for fun:
        https://calculator.aws/#/estimate?id=cfc9b9e8207961f777766e1...
        Seems like it would be 160k USD a month.
        I could not input my actual IO stats there, I was getting:
        Baseline IO rate can't be more than 1000000000 per hour.
        tracker1 4 months ago
        The CPU itself is around $8-10k for a top-end AMD Epyc, $15-20k for the rest of the server, including memory and storage is probably about right. There are still $100k+ servers, but they tend to be AI equipment at this point, not the general purpose stuff, which is sub $30-50k now.
        sgarland 4 months ago
        I mean no disrespect, but it is stunning how hard the idea of owning your own hardware is to a large percentage of the tech population.
        You can just… own servers. I have five in a rack in my house. I could pay a colo a relatively small fee per month for a higher guarantee of power and connectivity. This idea also scales.
      - outworlder 4 months ago
        > 16TB without nothing weird is pretty impressive. Our devops team reached for Aurora way before that.
        Probably depends on the usage patterns too. Our developers commit atrocities in their 'microservices' (which are not micro, or services, but that's another discussion).
        [-]
        sgarland 4 months ago
        I continue to find horror shows during incidents; sometimes not even the cause, merely a tangential rabbit hole I wandered down.
        “Are you… are you storing images in BLOBS?”
        “Yes. Is that bad?”
    - JohnBooty 4 months ago
      Your replies are really valuable and informative. Thank you so much.
      Question - what is your peak utilization % like? How close are you to saturating these boxes in terms of CPU etc?
      [-]
      - Eikon 4 months ago
        I’d say 60-70% overall cpu usage, including database, ingest workers, web app and search.
        > Your replies are really valuable and informative. Thank you so much.
        Thank you!
    - GordonS 4 months ago
      I'm also self-hosting Postgres, and the project is getting to the point where a standby would be a good idea to ensure higher availability.
      Did you use any particular guide for setting up replication? Also, how do you handle failover/fallback to/from standby please?
      [-]
      - Tostino 4 months ago
        Not OP, but managed a ~1tb Postgres install for years. You should use something like pgbackrest or barman to help with both replication (replicas can pull WAL from your backups when catching up), backups, and failovers.
        At least for pgbackrest, set up a spool directory which allows async wal push / fetch.
    - justinclift 4 months ago
      > 6 NVMe drives in raidz-1
      Did you benchmark io rate with different ZFS layouts?
      6 NVMe drives in mirrored pairs would probably be substantially higher latency and throughput
      Though you'd probably need more pairs of drives to match your current storage size. Or get higher capacity NVMe drives. :)
    - stuartjohnson12 4 months ago
      > It's self-hosted on bare metal, with standby replication, normal settings, nothing "weird" there.
      I can build scalable data storage without a flexible scalable redundant resilient fault-tolerant available distributed containerized serverless microservice cloud-native managed k8-orchestrated virtualized load balanced auto-scaled multi-region pubsub event-based stateless quantum-ready vectorized private cloud center? I won't believe it.
      [-]
      - tracker1 4 months ago
        +1 as I'm hoping this is sarcastic humor.
        [-]
        Keyframe 4 months ago
        from riches to RAG.
- chrisandchris 4 months ago
  And here I am, having an ERP software at hand whose database I "should not query" with some (3-6) joins and likely in the 100'000s of records in total in the larger tables because "the DB server will go down".
  [-]
  - WorldWideWebb 4 months ago
    PeopleSoft?
    [-]
    - chrisandchris 4 months ago
      Progress
- frollogaston 4 months ago
  "Postgres does not scale" means that you can't just give it more machines, which is true. At some point you've got the fastest available machine, maybe you've played with the settings, and that's it. Then you can embark on the kind of manual sharding journey the article describes.
  But most of the time, an RDBMS is the right tool for the job anyway, you just have to deal with it.
  [-]
  - Eikon 4 months ago
    > "Postgres does not scale" means that you can't just give it more machines, which is true.
    Well, it’s only true for writes.
    [-]
    - frollogaston 4 months ago
      It's still true for reads if you need them to be fully consistent, but yeah, read replicas are the answer sometimes.
    - 4 months ago
      [deleted]
- MR4D 4 months ago
  Having some issues with your numbers, but I’m probably just missing something…
  If you insert 150K rows per second, that’s roughly 13 Billion rows per day.
  So you’re inserting 10%+ of your database size every day?
  That seems weird to me. Are you pruning somewhere? If not, is your database less than a month old? I’m confused.
  [-]
  - Eikon 4 months ago
    Well, that's why I said "ATM", it's not a sustained rate, all the time. And yes, there's a bunch of DELETEs too.
- vanviegen 4 months ago
  Regarding MerkleMap: Ouch! I never realized letsencrypt is leaking all of the sub domains i'm creating. :-/ I thought the names were giving me a bit of extra security through obscurity.
  Why does it do that? I thought only revocations need to be published?
  [-]
  - jorams 4 months ago
    It does that because it's required to. The system is called Certificate Transparency[1] and browsers require certificates to be in CT logs to be accepted.
    If you want to hide what subdomains you have you can use a wildcard certificate, though it can be a bit harder to set up.
    [1]: https://developer.mozilla.org/en-US/docs/Web/Security/Certif...
    [-]
    - vanviegen 4 months ago
      Thanks for explaining!
- ambicapter 4 months ago
  > ATM, I insert about 150,000 rows a second, run 40,000 transactions a second, and read 4 million rows a second.
  Silly question but is this at the same time, regular daily numbers or is that what you've benchmarked?
- sa46 4 months ago
  I used to run a bunch of Postgres nodes at a similar scale. The most painful parts (by far) were restoring to a new node and major version upgrades.
  Any tricks you used for those parts?
  [-]
  - Eikon 4 months ago
    > were restoring to a new node
    Zfs send / recv or replication.
    > I used to run a bunch of Postgres nodes at a similar scale. The most painful parts (by far) were restoring to a new node and major version upgrades. Any tricks you used for those parts?
    Replication makes this pretty painless :)
    [-]
    - wg0 4 months ago
      So run a replica? Is there more literature on that?
      [-]
      - Eikon 4 months ago
        > So run a replica?
        Basically, yes.
        https://www.postgresql.org/docs/current/runtime-config-repli...
- plavreshin 4 months ago
  You have decent documentation portal https://www.merklemap.com/documentation with related articles and API examples, great work! By the way, did you use any markdown2html converter here to built developer portal or this is just a simplified Redocly version of it?
  [-]
  - Eikon 4 months ago
    It’s mdx, with nextjs.
    Thanks for the kind words!
- tnorgaard 4 months ago
  Super interesting compiling pg, I assume, with same as the zfs block size! It was always on our todo to try, but never got around to it. If possible, what block size did you end up with? Have you tried zfs direct io in 2.3.x, if so, could you share any findings? Thanks for sharing - and cool website!
  [-]
  - Eikon 4 months ago
    I don’t think Postgres will be able to benefit from direct io? I might be wrong though!
    I use Postgres with 32K BLKSZ.
    I am actually using default 128K zfs recordsize, in a mixed workload, I found overall performance nicer than matching at 32K, and compression is way better.
    > Thanks for sharing - and cool website!
    Thank you!
- amw-zero 4 months ago
  Curious. At that scale and transaction rate, are you deleting / offloading rows to another storage solution after some amount of time? I’m assuming you’re not just letting 150,000 rows a second accumulate indefinitely.
- 4 months ago
  [deleted]
- shayonj 4 months ago
  > I insert about 150,000 rows a second
  That's amazing - I would love to know if you have done careful data modeling, indexing, etc that allows you to get to this and what kind of data is being insert ed?
  [-]
  - Eikon 4 months ago
    I am not optimizing too much around insertion speed. I avoid GIN, GIST and hash indexes.
    The schema is nicely normalized.
    I had troubles with hash indexes requiring hundreds of gigabytes of memory to rebuild.
    Postgres B-Trees are painless and (very) fast.
    Eg. querying one table by id (redacted):
```
    EXPLAIN ANALYZE SELECT * FROM table_name WHERE id = [ID_VALUE];

    Index Scan using table_name_pkey on table_name  (cost=0.71..2.93 rows=1 width=32) (actual time=0.042..0.042 rows=0 loops=1)
     Index Cond: (id =  '[ID_VALUE]'::bigint)
    Planning Time: 0.056 ms
    Execution Time: 0.052 ms
```
    Here’s a zpool iostat 1
```
    # zpool iostat 1
    operations     bandwidth 
     read  write   read  write
    -----  -----  -----  -----
     148K   183K  1.23G  2.43G
     151K   180K  1.25G  2.36G
     151K   177K  1.25G  2.33G
     148K   153K  1.23G  2.13G
```
    [-]
    - chatmasta 4 months ago
      Have you measured _transactions_ per second? For both “insert only” and “mixed insert/update/delete?” This might be a more telling metric in terms of the stress it puts on the system. There’s a big difference between 1 transaction with 150k inserts, and 150k transactions with 1 insert each – especially when the workload shifts to include updates in the middle of those.
      If you’re doing insert only, you might benefit from copying directly to disk rather than going through the typical query interface.
    - shayonj 4 months ago
      thank you for the response and stats! very cool
  - dist1ll 4 months ago
    150k rows/second is pretty ordinary if you do batching, especially for a machine of that size. I was able to get 250k+ row inserts on a 16vCPU on a table with 16 columns, a third of which are strings >30 bytes. Pretty sure you can push that much higher with COPY.
    [-]
    - shayonj 4 months ago
      that's true, you can get very far with COPY
- nickf 4 months ago
  Curious as to why it’s that many inserts? The whole CT ecosystem has about 30-40,000 new certs/precerts a minute IIRC). Love merkelmap though!
  [-]
  - Eikon 4 months ago
    There’s backlog I still need to process :)
- nesarkvechnep 4 months ago
  Because no one else cares to ask, are you running FreeBSD? I ask because you use ZFS.
  [-]
  - Eikon 4 months ago
    Debian!
    [-]
    - nesarkvechnep 4 months ago
      Damn, a stud! ZFS on Linux.
      [-]
      - Eikon 4 months ago
        Ahaha :)
- oa335 4 months ago
  Can you please share some tips and tricks for achieving such high throughput?
- marsavar 4 months ago
  That's amazing! Could you elaborate more on your VACUUM planning?
  [-]
  - Eikon 4 months ago
    Mostly, be careful with long-running transactions (hours long), and modify your autovacuum settings to be as aggressive as possible based on your workload. Be careful with the freezing threshold too.
    I ran into
```
    multixact "members" limit exceeded
```
    Quite a bit when starting out :)
    [-]
    - whitepoplar 4 months ago
      Can you recommend a good rule of thumb for autovacuum settings given a large workload like yours?
      [-]
      - Eikon 4 months ago
        Here's mine:
        - autovacuum_max_workers to my number of tables (Only do so if you have enough IO capacity and CPU...).
        - autovacuum_naptime 10s
        - autovacuum_vacuum_cost_delay 1ms
        - autovacuum_vacuum_cost_limit 2000
        You probably should read https://www.postgresql.org/docs/current/routine-vacuuming.ht... it's pretty well written and easy to parse!
        [-]
        ahoka 4 months ago
        Do you ever need to VACUUM FULL?
        [-]
        Eikon 4 months ago
        It’s too slow at that scale, pg_squeeze works wonders though.
        I only “need” that because one of my table requires batch deletion, and I want to reclaim the space. I need to refactor that part. Otherwise nothing like that would be required.
        [-]
        ahoka 4 months ago
        Yes, I was asking because you have mention deletes. Thanks for the answer, cool stuff!
- throwawaytodey 4 months ago
  [dead]
aprdm 4 months ago
99.9% of the companies in the world will never need more than 1 beefy box running postgres with a replica for a manual failover and/or reads.
[-]
- frollogaston 4 months ago
  Availability is trickier than scalability. An async replica can lose a few recent writes during a failover, and a synchronous replica is safer but slower. A company using some platform might not even know which one they're using until it bites them.
  [-]
  - wavemode 4 months ago
    99.9% of companies also aren't going to feel the performance difference of synchronous replication.
    That being said, the setups I typically see don't even go that far. Most companies don't mitigate for the database going down in the first place. If the db goes down they just eat the downtime and fix it.
fmajid 4 months ago
Skype open-sourced their architecture way back, using PL/Proxy to route calls based on shard. It works, is quite elegant, handled 50% of all international phone calls in the noughties. My old company used it to provide real-time analytics on about 300M mobile devices.
https://wiki.postgresql.org/images/2/28/Moskva_DB_Tools.v3.p...
https://s3.amazonaws.com/apsalar_docs/presentations/Apsalar_...
[-]
- Keyframe 4 months ago
  Skype has had from the beginning the requirement that all database access must be implemented through stored procedures.
  That presentation starts with hard violence.
  [-]
  - fmajid 4 months ago
    Righteous violence.
    One of the benefits of stored procedures they don't mention is SECURITY DEFINER, which is like setuid.
    You can for instance have a user table with login and hashed password, have a stored procedure that can verify login and password, without giving SELECT access to the user table to the database user your application use.
    Stored procedures also block SQL injection attacks.
  - Tostino 4 months ago
    If the database team designed a thoughtful API with stored procedures, this can actually be a quite nice way to interact with a database for specific uses.
    Being 100% hard and fast on that rule seems like a bad idea though.
    [-]
    - dboreham 4 months ago
      Fashionable 20 years ago but thankfully everyone who had that bee in their bonnet seems to have retired.
      [-]
      - tomnipotent 4 months ago
        I think it's a perfectly fine approach in 2025 now that that CI/CD have proliferated and you're less likely to run into a human DBA arbitrarily blocking deployments. It was always the feudal lord mentality of DB operations that made relying so much on stored procedures undesirable.
      - Kinrany 4 months ago
        It'll make a comeback once stored procedures can be easily written in real programming languages using standard tools.
        [-]
        sgarland 4 months ago
        You already can [0]. C, PL/Perl, PL/Python, and PL/Tcl exist out of the box, in addition to PL/pgSQL, which I assume you were implying isn’t a “real programming language.”
        [0]: https://www.postgresql.org/docs/current/server-programming.h...
        [-]
        Kinrany 4 months ago
        Sure, even PL/Rust is available. But what does the deployment pipeline look like, from version control to a live instance?
        [-]
        sgarland 4 months ago
        That is described in the link, in section 36.18, albeit for C. If you have another language, i assume you get to figure out how to make it work.
        [0]: https://www.postgresql.org/docs/current/extend-pgxs.html
        fulafel 4 months ago
        Few people seem to be using these. At a guess the dev experience is not great.
        [-]
        sgarland 4 months ago
        Frankly, I think devs need to stop being such primmadonnas about DX. Not everything has a slick UI that does most of the work for you. Take Linux, for example: you have to email your PR as a patch, inline as plaintext to a mailing list. Yet somehow, they manage, and it had an incredibly active community.
        [-]
        fulafel 4 months ago
        Alternative programming languages are all about DX. PL/pgsql is turing complete after all.
        [-]
        sgarland 4 months ago
        It is indeed, but I’ll be the first to admit it’s a bit clunky to use beyond a few variables and actions.
        Sorry for snapping. I’m exhausted with devs complaining that some older and well-established piece of tech (Postgres, HAProxy, nginx to name a few) isn’t easy enough to use, and then using something demonstrably worse, or writing their own terrible version of it. Work trauma.
        4 months ago
        [deleted]
  - sgarland 4 months ago
    I would much rather this than letting devs write their own queries, ORM or not. But then, I own the DBs, and I am constantly dealing with the result of poor schema and query decisions.
craigkerstiens 4 months ago
Probably as useful is the overview of what pgdog is and the docs. From their docs[1]: "PgDog is a sharder, connection pooler and load balancer for PostgreSQL. Written in Rust, PgDog is fast, reliable and scales databases horizontally without requiring changes to application code."
[1] https://docs.pgdog.dev/
rednafi 4 months ago
Another option is going full-scale with CockroachDB. We had a Django application backed by PostgreSQL, which we migrated to CockroachDB using their official backend.
The data migration was a pain, but it was still less painful than manually sharding the data or dealing with 3rd party extensions. Since then, we’ve had a few hiccups with autogenerated migration scripts, but overall, the experience has been quite seamless. We weren’t using any advanced PostgreSQL features, so CockroachDB has worked well.
[-]
- skunkworker 4 months ago
  Unless their pricing has changed, it’s quite exorbitant when you need a lot of data. To the point that one year of cockroachdb would cost 5x the cost of the server it was running on.
  [-]
  - CharlesW 4 months ago
    I think you're referring to the CockroachDB Cloud DBaaS offering vs. CockroachDB itself, correct?
    [-]
    - rednafi 4 months ago
      Not the parent but yeah, most likely. But then again, you probably don’t want to maintain your own deployment of Cockroach fleet.
  - rednafi 4 months ago
    This is still true. I wouldn’t use Cockroach if it were my own business. Also, they don’t offer any free version to try out the product. All you get is a short trial period and that’s it.
    [-]
    - CharlesW 4 months ago
      > Also, they don’t offer any free version to try out the product.
      The site makes it seems as if I can install CockroachDB on Mac, Linux, or Windows and try it out for as long as I like. https://www.cockroachlabs.com/docs/v25.1/install-cockroachdb... Additionally, they claim CockroachDB Cloud is free for use "up to 10 GiB of storage and 50M RUs per organization per month".
      [-]
      - re-thc 4 months ago
        > The site makes it seems as if I can install CockroachDB on Mac, Linux, or Windows and try it out for as long as I like.
        Therare limitations in terms of licenses [0].
        [0]: https://www.cockroachlabs.com/docs/v25.1/licensing-faqs
        [-]
        benjiro 4 months ago
        Licenses are also yearly renewed. Its not like you get one license and can use it forever. No, you need to yearly enter the new license in your instance, if you do not, after 2 weeks it goes into cripple mode (as in, you can not run anything that is not a personal blog).
        And those free-licenses have this dirty little clause that you are not entitled to a license, they need to APPROVE a free-license. Now that is even more scary.
        They pull all this because people kept using the free-core version and people simply never upgrade/wanted more. That is why all these changed happened. Coincidentally, the buzz around CRDB has died down to the point that most talks about CRDB are these rare mentions here (even reddit is as good as dead). 98% of CRDB mentioning how great it is, is all origination from CRDBLabs. Do a google and limit in time range, and then go page by page, ... They are a enterprise only company at this point.
      - benjiro 4 months ago
        > Additionally, they claim CockroachDB Cloud is free for use "up to 10 GiB of storage and 50M RUs per organization per month".
        Take in account, that the constant CPU/Mem/Query monitoring that CRDB does, eats up around 20 a 30M RUs per month. There are some people that complained as to why there free instances lost so much capacity. And those RU are not 1:1, like, you do a insert, its 1RU, oooo, no ... Its like 9 to 12RU or something.
        Its very easy to eat all those RUs on a simply website. Let alone something that needs to scale. Trust me, your better of self deploying but then you enjoy the issue of the new licenses / forced telemetric / forced phone home, or spend 125$+ / vcpu (good luck finding out the price, we only know these numbers from people breaking nda offers). They are very aggressive in sales.
        Its not worth it to tie your company to a product, that can chance licenses on a whim, that charges Oracle prices (and uses the same tactics). I am very sure that some of their sales staff is ex-Oracle employees. ;)
      - rednafi 4 months ago
        Oh yeah, you can run docker-compose and play with the local version as long as you want. But their cloud offers are limited and quite expensive.
        [-]
        Ocha 4 months ago
        You can use official binary in production deployment - you just need to manage it yourself like you would manage Postgres.
- traceroute66 4 months ago
  > Another option is going full-scale with CockroachDB
  Just beware that CockroachDB is not a drop-in replacement for PostgreSQL.
  Last time I looked it was missing basic stuff. Like stored functions. I don't call stored functions an "advanced feature".
  [-]
  - frollogaston 4 months ago
    I also wonder what limitations you get on queries. Cause in Spanner, there's something about cross-directory queries, or Citus can't do cross-shard FKs iirc. There's no magic way to shard a relational DB, you have to deal with it one way or another.
    [-]
    - vvern 4 months ago
      Cockroach really doesn’t have limitations when it comes to queries. It is kinda magic in that regard. The single partition queries are the special case and are treated as rare. Cockroach ranges are quite small compared to other systems (~256MiB).
      [-]
      - frollogaston 4 months ago
        Interesting, I'll have to try it out at some point.
  - benjiro 4 months ago
    > Just beware that CockroachDB is not a drop-in replacement for PostgreSQL.
    O, its way worse then that... A ton of small functionality tend to be missing or work differently. Sometimes even small stuff like column[1:2] does not even exist in CRDB and other times its things like ROW LEVEL SECURITY ... you know, something you may want on a mass distributed database then will be use for tenant setups (i hear they are finally going to implement it this year).
    The main issue is the performance... You are spending close to ~4x the resources on a similar performing setup. And this is a 2x Pgres (sync) vs 3x CRDB (where every node is just a replica). Its not the replication itself but the massive overhead on the raft protocol + the way less optimized query planner.
    To match Pgres, your required to deploy around 10 a 12 nodes, on hardware that needs about twice the resources. That is the point where CRDB performance on a similar level.
    The issue is, well, you increase the resources to your Postgresql server and the gap is back. You can scale Postgresql to insane levels with 64, 96 CPUs ... Or imagine, you know, just planning your app in advance in such a way, that you spread your load over multiple Postgresql instances. Rocket science folks ;)
    CRDB is really fun to watch, the build in GUI, the replication being very visual, but its a resource hog. The storage system (peble) eats a ton of resources to compact the data, when you can simply solve that with a PostgreSQL instance with zfs (with ironically, often better compression).
    I do not joke when i say, that seeing those step cpu spikes in the night hours of the compacter working, is painful. Even a basic empty CRDB instance, just logging its own usage, runs between 7 to 50% on a quad ARM N1, constantly.
    PostgreSQL? You do not even know its running. Barely any CPU usage, what memory usage?
    And we have not talked license/payment issues ... Of "free enterprise" version with FORCED telemetric on, AND your not allowed to hide the server (aka, if it can not call home, it goes into restricted mode in 7 days, with like 50 queries / second ... aka, the same as just shutting down your DB ). By the way, some people reported that its 125+ dollar/vcore payment. Given that even the most basic CRDB gets you 3x instances, with minimum 4 cores ... Do the math. Yea, after 10M income but they chance their licenses every few years, so who knows next year or the year after that.
    Interesting product, shitty sales company behind it. I am more interested to see when the postgresql storage extension orioledb comes out, so it solve the main issue that prevents postgresql scaling even more, namely the write/vacuum issue. And ofcourse a better solution to upgrade postgresql versions, there CRDB is leaps and bound better.
- levkk 4 months ago
  I'm glad you brought up the migration, because one of the main goals behind our project is to automate migrating to a sharded deployment. You can think of your DB as the base case (num_shards = 1), and PgDog as the recursive solution.
  [-]
  - rednafi 4 months ago
    Automatic data transfer would be super cool to get out of the box. We had a custom multi-tenancy solution for our application that heavily used PostgreSQL schemas. One schema for each customer.
    It was a pain to get that work with Cockroach since it doesn’t optimize cross schema queries and suggests one DB per customer. This was a deal breaker for us and we had to duplicate data to avoid cross schema queries.
    Being able to live within Postgres has its advantages.
mindcrash 4 months ago
Well, ofcourse it does! :)
Another (battle tested * ) solution is to deploy the (open source) Postgres distribution created by Citus (subsidiary of Microsoft) on nodes running on Ubuntu, Debian or Red Hat and you are pretty much done: https://www.citusdata.com/product/community
Slap good old trusty PgBounce in front of it if you want/need (and you probably do) connection pooling: https://www.citusdata.com/blog/2017/05/10/scaling-connection...
*) Citus was purchased by Microsoft more or less solely to provide easy scale out on Azure through Cosmos DB for PostgreSQL
[-]
- gigatexal 4 months ago
  Is it really that easy? What are the edge cases?
  [-]
  - levkk 4 months ago
    It's not. We tried. Plus, it doesn't work on RDS, where most of production databases are. I think Citus was a great first step in the right direction, but it's time to scale the 99% of databases that don't run on Azure Citus already.
    [-]
    - mindcrash 4 months ago
      That's because Amazon wants to do whatever they like themselves... you apparently can get stuff to work by running your own masters (w/ citus extension) in EC2 backed by workers (Postgres RDS) in RDS:
      https://www.citusdata.com/blog/2015/07/15/scaling-postgres-r... (note that this is a old blog post -- pg_shard has been succeeded by citus, but the architecture diagram still applies)
      And me saying "Apparently" because I have no experience dealing with large databases on AWS.
      Personally had no issues with Citus too, both on bare metal/VMs and as SaaS on Azure...
  - caffeinated_me 4 months ago
    Depends on your schema, really. The hard part is choosing a distribution key to use for sharding- if you've got something like tenant ID that's in most of your queries and big tables, it's pretty easy, but can be a pain otherwise.
    [-]
    - mindcrash 4 months ago
      Same pain as with good old (native) partitioning, right? :)
      As with partitioning, in my experience something like a common key (identifying data sets), tenant id and/or partial date (yyyy-mm) work pretty great
      [-]
      - caffeinated_me 4 months ago
        For a multi-tenant use case, yeah, pretty close to thinking about partitioning.
        For other use cases, there can be big gains from cross-shard queries that you can't really match with partitioning, but that's super use case dependent and not a guaranteed result.
gourneau 4 months ago
I’m working with several Postgres databases that share identical schemas, and I want to make their data accessible from a single interface.
Currently, I’m using Postgres FDWs to import the tables from those databases. I then create views that UNION ALL the relevant tables, adding a column to indicate the source database for each row.
This works, but I’m wondering if there’s a better way — ideally something that can query multiple databases in parallel and merge the results with a source database column included.
Would tools like pgdog, pgcat, pganimal be a good fit for this? I’m open to suggestions for more efficient approaches.
Thanks!
999900000999 4 months ago
I'm kind of interested in why we can't make a better database with all of our modern technology.
Postgres is a fantastic workhorse, but it was also released in the late 80s. Who, who among you will create the database of the future... And not lock it behind bizarro licenses which force me to use telemetry.
[-]
- HighlandSpring 4 months ago
  There are "better" databases but they're better given some particular definition that may not be relevant to your needs. If SQL/the relational model and ACID semantics is what you need then postgres is simply the best in class. The fact it dates back to the 80s is probably an advantage (requirement?) when it comes to solving a problem really well
- trescenzi 4 months ago
  I guess I’d ask why is something having been first released in the late 80s, or any decade, as positive or negative? Some things are still used because they solve the problems people have. Some things are still used just because of industry capture. I’m not honestly sure where I’d put Postgres.
  Are there specific things you’d want from a modern database?
  [-]
  - 999900000999 4 months ago
    Relating to the article, better scaling. Saying run it on a bigger box is a very brute force way to optimize an application.
    While they come up with some other tricks here, that's ultimately what's scaling postgres means.
    If I imagine a better database, it would have native support for scaling, a postgres compatible data layer as well as first party support for NoSQL( JSONB columns don't cut it since if you have simultaneous writes unpredictable behavior tends to occur).
    It needs to also have a permissible license
    [-]
    - mike_hearn 4 months ago
      What does permissible license mean? If you mean open source, no such database exists AFAIK. If you mean you can run it locally for free for dev purposes, on prem without telemetry etc, then Oracle is clearly the best option.
      Compared to Postgres, Oracle DB:
      • Scales horizontally with full SQL and transactional consistency. That means both write and read masters, not replicas - you can use database nodes with storage smaller than your database, or with no storage, and they are fully ACID.
      • Has full transactional MQ support, along with many other features.
      • Can scale elastically.
      • Doesn't require vacuuming or have problems with XID wraparound. These are all Postgresisms that don't affect Oracle due to its better MVCC engine design.
      • Has first party support for NoSQL that resolves your concern (see SODA and JSON duality views).
      I should note that I have a COI because I work part time at Oracle Labs (and this post is my own yadda yadda), but you're asking why does no such database exist and whether anyone can make one. The database you're asking for not only exists but is one of the world's most popular databases. It's also actually quite cheap thanks to the elastic scaling. Spec out a managed Oracle DB in Oracle's cloud using the default elastic scaling option and you'll find it's cheaper than an Amazon Postgres RDS for similar max specs!
      [-]
      - 999900000999 4 months ago
        Does Oracle offer a Firebase like application/layer with authentication and stuff.
        Can you get me some Oracle cloud credits ?
        First positive thing I’ve ever heard about an oracle project !
        [-]
        mike_hearn 4 months ago
        Oracle Cloud (called OCI) has an always-free offering, you don't need credits. You can just sign up and use some small quantity of resources for nothing indefinitely. That includes a managed Oracle database:
        https://docs.oracle.com/en/cloud/paas/autonomous-database/se...
        There's also some sort of startup credits program with a brochure here, apparently you can just fill out a form and get some credits with an option to apply for more. But I don't know much about that. I've used the always-free programme for some personal stuff and it worked fine so I never needed to think about credits.
        https://www.oracle.com/a/ocom/docs/free-cloud.pdf
        I have to admit I'm not really familiar with Firebase, I thought that was some managed service for mobile apps, but Oracle DB comes with some stuff that sounds similar. And Oracle Cloud is an AWS-style cloud, it has a ton of high level services for things.
        ORDS is a REST binding layer that lets you export tables, views, stored procedures and NoSQL JSON document stores over HTTP without writing a middleman server yourself. You can drive it directly from the browser. ORDS supports OAuth2 or can be integrated with custom auth schemes from what I understand. I've not used ORDS myself yet but probably will in the near future.
        https://www.oracle.com/database/technologies/appdev/rest.htm...
        Firebase IIRC when it first launched was known for push streaming of changes. Oracle DB lets you subscribe to the results of SQL queries and get push notifications when they change, either directly via driver callbacks or into a message queue for async processing later. It's pretty easy to hook such notifications up to web sockets or SSE or similar, in fact I've done that in my current project.
        There's also a thing called APEX which is a bundled visual low-code app builder. I've never used it but I've used apps built with it, and it must be quite flexible as they all had a lot of features and looked very different. You can tell you're using an APEX app because they have a lot of colons in the URLs for some reason. Here's a random example of one from outside of Oracle that exports a database of dubious scientific research papers:
        https://dbrech.irit.fr/pls/apex/f?p=9999:1::::::
        I'm not holding it up as a great example, there are probably better examples out there, it's just a one that came to mind that's public and I used before.
        [-]
        999900000999 4 months ago
        Firebase, is basically a back-end server in a box. Authentication, check, file storage , check. Databases, check. Firebase functions which allow for complex logic written in JavaScript, check. Hosting, check. Supabase comes really close, but has been much more difficult in actual use.
        I'm working on a fully open source game right now, and I can't ethically tell people to hook into a closed source service. But I must admit, having to use supabase instead of firebase has made this much harder than it needs to be.
        [-]
        mike_hearn 4 months ago
        Hm right well you can certainly write apps that run fully inside the database and use its services, so with ORDS doing OAuth2 I guess that gets close to something like Firebase. You can run logic written in JS easily too, it has an embedding of graaljs which is a fast JS engine that's well integrated with the rest of the database environment.
        kiwicopple 4 months ago
        > Supabase comes really close, but has been much more difficult in actual use
        supabase team here - can you share more about the challenges? We'd love your feedback so that we can fix any difficulties for the future
        [-]
        999900000999 4 months ago
        It's not just one thing, but it's much harder to use vs firebase.
        For example, Firebase doesn't care about captchas for anonymous authentication. Supabase explicitly warns you to enable this before allowing anonymous authentication.
        The problem here is not every client is going to be able to use captchas. You can't just enable it for new signups, but disable it for signing in.
        If user XYZ used a captcha to register, they probably aren't a bot when they sign in later. Say they sign in using a game engine client. Unless I want to open up a webview a captcha won't work.
        Honestly I might not be using Supabase correctly. Basically I'm working on a small open source card game. I created a very similar game in Firebase previously. I used the database to manage state and then wrote some basic logic in Firebase functions.
        I'm not sure exactly why, but this has been much harder in Supabase. I will admit my SQL isn't the best, so maybe I'm just used to NoSQL...
    - throwaway7783 4 months ago
      Can you please expand on the JSONB unpredictable behavior? We are about to embark on a journey to move some of our data from MongoDB to postgres (with some JSONB). While we don't have significant concurrent writes to a table, would be very helpful to understand the issues
      [-]
      - 999900000999 4 months ago
        https://github.com/shrinerb/shrine/discussions/665
        I've never personally encountered this, but I've seen other HN contributors mention it. https://news.ycombinator.com/item?id=43189535
        From what I can tell, unlike mongo, some postgres queries will try to update the entire JSONB data object vs a single field. This can lead to race conditions.
        [-]
        throwaway7783 4 months ago
        Ah, thanks. The first link seems something specific to Shrine. The bottomline is concurrent updates to different parts of JSONB need row level locking for correct behavior in Postgresql. This is not an important issue for us. Thank you for the pointers
        [-]
        ahoka 4 months ago
        You can trivially do it without locking using a version column.
      - leonidasv 4 months ago
        You may be interested in FerretDB[1].
        [1] https://www.ferretdb.com/
    - Reefersleep 4 months ago
      What's an example of non-brute force scaling?
      [-]
      - HDThoreaun 4 months ago
        The problem with postgres scaling is that you have to have a single master which means horizontal scaling really only gives you more reads and a failover. Eventually you wont be able to find a server big enough to handle all the writes, and if you get enough reads with even a small number of writes single master setups fall over. Distributed computing gets complicated very quickly but the gist here is basically that you need to be able to have multiple instances that can accept writes. Lots of literature on this but good starting points imo would be the paxos paper https://lamport.azurewebsites.net/pubs/time-clocks.pdf and dynamo db paper https://www.allthingsdistributed.com/files/amazon-dynamo-sos...
      - 999900000999 4 months ago
        Something like a queuing system, re-architecting the database to handle a higher load without just throwing more machines on it. A cache.
    - aprdm 4 months ago
      Why is it brute force and why is it bad ?
- thinkingtoilet 4 months ago
  Postgres 17.4 was released last month. Show some respect to the devs.
- wmf 4 months ago
  Anyone who creates a better database is going to want to get paid for it which either means DBaaS or those weird licenses.
- HDThoreaun 4 months ago
  Postgres is the most worked on database in the world right now. Its original release date doesnt mean work stopped. No-sql was a new thing a decade-ish ago but most companies probably dont need it. New data platforms focused on scaling like snowflake and cockroach have come too but again for most use cases postgres is better.
- moltar 4 months ago
  Look at AWS presentation/talk about Aurora DSQL [1]
  It’s a Postgres facade. But everything beyond that is a complete reimagining and a rewrite to scale independently.
  I personally think it’s going to eat a lot of market share when it solves some remaining limitations.
  [1] https://youtu.be/huGmR_mi5dQ?si=ALw4XjdDJBxkZWRv
- DoctorOW 4 months ago
  Have you looked at CockroachDB? PostgreSQL compatibility with modern comforts (e.g. easy cloud deployments, horizontal scaling, memory safe language)
  [-]
  - rednafi 4 months ago
    Came here to say this. Cockroach solves the sharding issue by adopting consistent hashing-based data distribution, as described in the Dynamo paper. However, their cloud solution is a bit expensive to get started with.
    [-]
    - I_am_tiberius 4 months ago
      Does CockroachdB already support ltree?
      [-]
      - rednafi 4 months ago
        It doesn’t. It doesn’t support most of the Postgres extensions. We got away with it because we weren’t doing anything beyond vanilla Postgres.
eximius 4 months ago
Something I don't see in the pgdog documentation is how cross-shard joins work. Okay, if I do a simple `select * from users order by id`, you'll in-memory order the combined results for me. But if I have group by and aggregations and such? Will it resolve that correctly?
[-]
- levkk 4 months ago
  Aggregates are a work in progress. We're going to implement them in this order:
  1. count
  2. max, min, sum
  3. avg (needs a query rewrite to include count)
  Eventually, we'll do all of these: https://www.postgresql.org/docs/current/functions-aggregate..... If you got a specific use case, reach out and we'll prioritize.
  [-]
  - eximius 4 months ago
    Heh, no chance I can introduce this at work and hard to have a personal project requiring it. :)
    I think you probably need some documentation to the effect of the current state of affairs, as well as prescriptions as to how to work around it. _Most_ live workloads, even if the total dataset is huge, have a pretty small working set. So limiting DB operations to simple fetches and doing any complex operations in memory is viable, but should be prescribed as the solution or people will consider it's omission as a fault instead of a choice.
    [-]
    - levkk 4 months ago
      I ended up adding support for GROUP BY: https://github.com/pgdogdev/pgdog/pull/43
      I had it in the back of my mind for a while, nice to have it in code. Works pretty well, as long as columns in GROUP BY are present in the result set. Otherwise, we would need to rewrite the query to include them, and remove them once we're done.
    - levkk 4 months ago
      No worries. It's early days, the code and docs are just a few months in the making. I'm happy to keep you updated on the progress. If you want, send your contact info to hi@pgdog.dev.
      - Lev
briankelly 4 months ago
People talk about scale frequently as a single dimension (and usually volume as it relates to users) but that can be oversimplifying for many kinds of applications. For instance, as you are thinking about non-trivial partitioning schemes (like if there is high coupling between entities of the same kind - as you see in graphs) is when you should consider alternatives like the Bigtable-inspired DBs, since those are (relatively) more batteries included for you.
> It’s funny to write this. The Internet contains at least 1 (or maybe 2) meaty blog posts about how this is done
It would’ve been great to link those here. I’m guessing one refers to StackOverflow which has/had one of the more famous examples of scaled Postgres.
[-]
- banashark 4 months ago
  Notion wrote about their experience before:
  https://www.notion.com/blog/sharding-postgres-at-notion
  https://www.notion.com/blog/the-great-re-shard
- levkk 4 months ago
  I was thinking of the Instagram post years ago. And maybe the Instacart one.
  [-]
  - briankelly 4 months ago
    Maybe this one for Instagram? https://instagram-engineering.com/sharding-ids-at-instagram-...
fourseventy 4 months ago
I run a postgresql db with a few billion rows at about 2TB right now. We don't need sharding yet but when we do I was considering Citus. Does anyone have experience implementing Citus that could comment?
[-]
- caffeinated_me 4 months ago
  It can be great, depending on your schema and planned growth. Questions I'd be asking in your shoes:
  1. Does the schema have an obvious column to use for distribution? You'll probably want to fit one of the 2 following cases, but these aren't exclusive:
```
    1a. A use case where most traffic is scoped to a subset of data. (e.g. a multitenant system). This is the easiest use case- just make sure most of your queries contain the column (most likely tenant ID or equivalent), and partially denormalize to have it in tables where it's implicit to make your life easier. Do not use a timestamp. 

    1b. A rollup/analytics based use case that needs heavy parallelism (e.g. a large IoT system where you want to do analytics across a fleet). For this, you're looking for a column that has high cardinality witout too many major hot spots- in the IoT use case mentioned, this would probably be a device ID or similar
```
  2. Are you sure you're going to grow to the scale where you need Citus? Depending on workload, it's not too hard to have a 20TB single-server PG database, and that's more than enough for a lot of companies these days.
  3. When do you want to migrate? Logical replication in should work these days (haven't tested myself), but the higher the update rate and larger the database, the more painful this gets. There's not a lot of tools that are very useful for the more difficult scenarios here, but the landscape has changed since I've last had to do this
  4. Do you want to run this yourself? Azure does offer a managed service, and Crunchy offers Citus on any cloud, so you have options.
  5. If you're running this yourself, how are you managing HA? pg_auto_failover has some Citus support, but can be a bit tricky to get started with.
  I did get my Citus cluster over 1 PB at my previous job, and that's not the biggest out out there, so there's definitely room to scale, but the migration can be tricky.
  Disclaimer: former Citus employee
saisrirampur 4 months ago
Interesting technology. Similar to Citus but not built as an extension. The Citus coordinator, which is a Postgres database with the Citus extension, is replaced by a proxy layer written in Rust. That might provide more flexibility and velocity implementing distributed planning and execution than being tied to the extension ecosystem. It would indeed be a journey to catch up with Postgres on compatibility, but it's a good start.
[-]
- sroussey 4 months ago
  So like the MySQL proxies of long ago?
  There are definitely advantages of not running inside the system you wish to orchestrate.
  Better keep up with the parser though!
  [-]
  - levkk 4 months ago
    We use the Postgres parser directly, thanks to the great work of pg_query [1].
    [1] https://github.com/pganalyze/pg_query.rs
Karupan 4 months ago
Tangentially related: is there a good guide or setup scripts to run self hosted Postgres with backups and secondary standby? Like I just want something I can deploy to a VPS/dedicated box for all my side projects.
If not is supabase the most painless way to get started?
akshayshah 4 months ago
Apart from being backed by Postgres instead of MySQL, is this different from Vitess (and its commercial vendor PlanetScale)?
https://vitess.io/
[-]
- levkk 4 months ago
  The goal for this project is to be analogous to Vitess for Postgres.
sourtrident 4 months ago
Funny how everyone eventually hits this point and thinks they're inventing fire - but then again, pushing your trusty old tools way past comfort is where cool engineering actually happens.
remram 4 months ago
Never heard of pgdog before. How does it compare to citus?
[-]
- caffeinated_me 4 months ago
  Seems like this is a similar philosophy, but is missing a bunch of things the Citus coordinator provides. From the article, I'm guessing Citus is better at cross-shard queries, SQL support, central management of workers, keeping schemas in sync, and keeping small join tables in sync across the fleet, and provides a single point of ingestion.
  That being said, this does seem to handle replicas better than Citus ever really did, and most of the features it's lacking aren't relevant for the sort of multitenant use case this blog is describing, so it's not a bad tradeoff. This also avoids the coordinator as a central point of failure for both outages and connection count limitations, but we never really saw those be a problem often in practice.
  [-]
  - levkk 4 months ago
    We certainly have a way to go to support all cross-shard use cases, especially complex aggregates (like percentiles). In OLTP, where PgDog will focus on first, it's good to have a sharding key and a single shard in mind, 99% of the time. The 1% will be divided between easy things we already support, like sorting, and slightly more complex things like aggregates (avg, count, max, min, etc.), which are on the roadmap.
    For everything else, and until we cover what's left, postgres_fdw can be a fallback. It actually works pretty well.
- MuffinFlavored 4 months ago
  What are the drawbacks of Citus/why isn't it perfect/what would you look for from a competitor/alternative?
  [-]
  - craigkerstiens 4 months ago
    Citus works really well *if* you have your schema well defined and slightly denormalized (meaning you have the shard key materialized on every table), and you ensure you're always joining on that as part of querying. For a lot of existing applications that were not designed with this in mind if can be several months of database and application code changes to get things into shape to work with Citus.
    If you're designing from scratch and make it worth with Citus then (specifically for a multi-tenant/SaaS sharded app) it can make scaling seem a bit magical.
- levkk 4 months ago
  Great question and we will publish something for this later on. TLDR: Citus doesn't provide an easy migration path for existing DBs and the managed deployments are pretty much exclusive to Azure. A large percentage of production DBs run elsewhere and don't have an easy way to shard.
moribvndvs 4 months ago
DynamoDB is most certainly not the way.
Nelkins 4 months ago
How does PgDog handle a column like:
`id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY`
[-]
- levkk 4 months ago
  Replace it with a function that makes sure the id coming out matches the sharding schema. Assuming it's coming from a sequence, we're consuming it until we get a matching number. It would be good to know what is behind the generated column in your use case.
roark_howard 4 months ago
Shouldn't the title be 'Learning to scale Postgres'?
kristianpaul 4 months ago
Logical replication works both ways, thats a good start.
octernion 4 months ago
i was reading through this and was going "huh this sounds familiar" until i read who wrote it :)
neat piece of tech! excited to try it out.
rohan_ 4 months ago
Couldn't they have just moved to Aurora DSQL and saved all the headache?
[-]
- levkk 4 months ago
  We actually found Aurora to be about 3x slower than community PG for small queries. That was back then, maybe things are better now. Migrating to another database (and Aurora is Postgres-compatible, it's not Postgres) is very risky when you've been using yours for years and know where the edge cases are.
  [-]
  - sgarland 4 months ago
    I’ve consistently found Aurora MySQL and PG to be slower than everything, including my 12 year old Dell R620s. You can’t beat data locality, and the 4/6 quorum requirement of Aurora combined with the physical distance kills any hope of speed.
  - SahAssar 4 months ago
    Aurora is postgres but with a different storage layer, no? It uses the postgres engine, which other postgres-compatible databases like cockroach do not, right?
    [-]
    - tudorg 4 months ago
      That’s right, Aurora Postgres is quite close to vanilla Postgres. Aurora DSQL is a different story though.
      For a more scientific answer, there is this project: https://pgscorecard.com/
      Note that Aurora scores 93% while Cockroach scores only 40%.
      [-]
      - SahAssar 4 months ago
        I actually wasn't aware of Aurora DSQL, that's incredibly bad product naming.
  - rednafi 4 months ago
    This!
    Postgres to pg-compatible DBs are never as smooth as they advertise it to be.
- re-thc 4 months ago
  > Aurora DSQL
  There isn't even pricing on it...